
Infer matmul dimension roles to compute vectorization#2303

Merged
jacobhinkle merged 31 commits into main from flexible_tensor_roles on Jun 6, 2024

Conversation

@jacobhinkle
Collaborator

@jacobhinkle jacobhinkle commented May 24, 2024

This PR does the following:

  1. Rename RolesMap to TensorRolesMap and introduce DimRolesMap, a mapping from ValGroup to MatmulDomain.
  2. Compute a canonical dim ordering on ValGroups based on the allocation domains of inputs and outputs. This is used to compute vectorization properly, and in a future PR it can also be used to canonicalize loop domains in scheduleMatmul.
  3. Properly infer vectorization for every operand, epilogue input, and output based on the canonical dim ordering.

This is in preparation for further generalization to accommodate multiple MmaOps in a single Fusion.

Fixes #2169.
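
For concreteness, here is a rough Python analogue of the DimRolesMap idea and the canonical ordering (the real implementation is C++ inside nvFuser; the enum values, group ids, and the Batch/M/N/K ordering rule below are illustrative assumptions, not the actual code):

```python
from enum import Enum

class MatmulDimRole(Enum):
    # Illustrative stand-in for the MatmulDomain roles
    BATCH = 0
    M = 1
    N = 2
    K = 3

# DimRolesMap analogue: each ValGroup (here just a string id standing in
# for an exact-mapped group of IterDomains) maps to a single dim role.
dim_roles = {
    "g_b": MatmulDimRole.BATCH,
    "g_m": MatmulDimRole.M,
    "g_n": MatmulDimRole.N,
    "g_k": MatmulDimRole.K,
}

def canonical_order(groups, roles):
    """Order ValGroups by role (Batch, then M, N, K) -- a simplified
    stand-in for an ordering derived from allocation domains."""
    return sorted(groups, key=lambda g: roles[g].value)

print(canonical_order(["g_k", "g_n", "g_b", "g_m"], dim_roles))
# -> ['g_b', 'g_m', 'g_n', 'g_k']
```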

NOTE: this is a WIP draft that will likely be split into multiple
smaller PRs.

This is an attempt to generalize our matmul scheduler by doing the
following:
1. Support more than 2 operands in our
   MatmulParams.SupportedVectorization struct
2. Properly infer vectorization for every operand, epilogue input, and
   output.
3. Compute a canonical dim ordering on ValGroups. This is used to
   compute vectorization properly but can be used for canonicalization
   of loop domains in scheduleMatmul in a future PR.
4. Schedule each tensor according to its supported vectorization. This
   might imply a new loop. For example if there are two outputs and one
   supports only vectorization width of 4 and the other 8, then we will
   unroll a loop of size 2 for the width-4 writes so that the outer
   loops are still inlined properly.
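
The unrolling idea in item 4 can be sketched in pure Python (a toy model of the loop structure, not scheduler code; the widths and sizes are just example values):

```python
# Two outputs of one fusion: one supports width-8 vectorized stores,
# the other only width-4. To keep the outer loop aligned (and hence
# inlinable) across both outputs, we unroll a loop of size 2 for the
# width-4 writes inside each width-8 iteration.
src = list(range(32))
out8 = [None] * 32
out4 = [None] * 32

for i in range(0, len(src), 8):        # shared outer loop, step 8
    out8[i:i + 8] = src[i:i + 8]       # one width-8 store
    for u in range(2):                 # unrolled loop of size 2
        j = i + 4 * u
        out4[j:j + 4] = src[j:j + 4]   # two width-4 stores
```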

This is in preparation for further generalization to accommodate multiple
MmaOps in a single Fusion.
@jacobhinkle
Collaborator Author

!build

@jacobhinkle jacobhinkle changed the title from "[WIP] Flexible matmul tensor and domain roles" to "Flexible matmul tensor and dimension roles" on May 28, 2024
@jacobhinkle
Collaborator Author

!build

@jacobhinkle
Collaborator Author

!build

@jacobhinkle jacobhinkle changed the title from "Flexible matmul tensor and dimension roles" to "Infer matmul dimension roles to compute vectorization" on May 31, 2024
@jacobhinkle
Collaborator Author

!build --diff

@jacobhinkle jacobhinkle marked this pull request as ready for review May 31, 2024 18:56
@jacobhinkle
Collaborator Author

I think the code diffs are just due to the uncommented tests.

@jacobhinkle jacobhinkle requested a review from zasdfgbnm May 31, 2024 19:59
@zasdfgbnm
Collaborator

What if my mma output is (M=1024, N=1024), and my epilogue is:

```
T1[1024, 128, 2, 2', 2"] = view(mma_output);
T2[1024, 128, 2', 2, 2"] = transpose(T1);
T3[1024, 1024] = view(T2);
output T3
```

@jacobhinkle
Collaborator Author

> What if my mma output is (M=1024, N=1024), and my epilogue is:
>
>     T1[1024, 128, 2, 2', 2"] = view(mma_output);
>     T2[1024, 128, 2', 2, 2"] = transpose(T1);
>     T3[1024, 1024] = view(T2);
>     output T3

In that case, I would expect T3 not to be mapped as an OUTPUT_D tensor, in which case we currently refuse to segment this because we do not allow tensors without known roles. In the future, we could allow cases like this by loosening the restriction that output dims must exactly map to M, N, K, or Batch dims. Then we could do something like what @protonu did for #2315: use unmapped dims to traverse backward through producers to find mapped dimensions and determine vectorization. In that case we could accept this fusion, since the inner dimension of T3 corresponds to a shuffled N dimension, and we'd select a vectorization width of 2 for T3. This particular example would also require us to accept epilogues containing ViewOp, which we currently block, but I think it's at least feasible.
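
The shuffled-N effect can be reproduced with numpy (an illustration of the shape algebra only, not nvFuser code; the element values encode the original column indices so the shuffle is visible):

```python
import numpy as np

# mma output (M=1024, N=1024); values encode original positions.
mma_output = np.arange(1024 * 1024, dtype=np.int64).reshape(1024, 1024)
t1 = mma_output.reshape(1024, 128, 2, 2, 2)  # split N -> 128 * 2 * 2' * 2"
t2 = t1.transpose(0, 1, 3, 2, 4)             # swap the 2 and 2' dims
t3 = t2.reshape(1024, 1024)                  # merge back to N=1024

# Only the innermost size-2 factor (2") stays contiguous, so the inner
# dimension of t3 is contiguous in memory only in runs of 2 elements,
# capping the selectable vectorization width at 2.
print(t3[0, :8])  # -> [0 1 4 5 2 3 6 7]
```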

@zasdfgbnm
Collaborator

Oh, I see. We are effectively rejecting view ops inside the fusion. And this rejection saves us from having to use the vectorization helper.

@jacobhinkle
Collaborator Author

!build

This failure was just due to using the exact graph, whereas we now use the
permissive graph instead.
@jacobhinkle jacobhinkle merged commit 4382bf3 into main Jun 6, 2024
@jacobhinkle jacobhinkle deleted the flexible_tensor_roles branch June 6, 2024 14:07
jacobhinkle added a commit that referenced this pull request Jun 25, 2024
This updates `scheduleMatmul` to use the dimension roles introduced in
#2303 to reorder and merge dims within each role. That means we can
schedule fusions with more than one tensor in each role.

Two tests are included so far:
- MultipleMDims, which performs [M1, M2, K] @ [N, K]. This corresponds
to `torch.linear` with a 3D input, i.e. a Linear layer with two "batch"
dimensions.
- MultipleMDimsBatch, which performs [M1, B, M2, K] @ [B, N, K]. This
shows that M dimensions need not be contiguous. Note that vectorization
is unaffected in this example since K is innermost.
I plan to add more tests to explore more scenarios.
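
The MultipleMDims pattern can be checked with a small numpy sketch (illustrative only; the shapes are arbitrary example sizes). Merging the same-role M dims and doing one 2D matmul gives the same result, which is what the scheduler's per-role merge amounts to:

```python
import numpy as np

# [M1, M2, K] @ [N, K] -> [M1, M2, N]: a Linear layer on a 3D input,
# i.e. two "batch-like" M dimensions.
M1, M2, K, N = 3, 5, 8, 4
rng = np.random.default_rng(0)
a = rng.random((M1, M2, K))
w = rng.random((N, K))

out = np.einsum("abk,nk->abn", a, w)

# Equivalent: flatten the M dims, one 2D matmul, restore the shape.
ref = (a.reshape(M1 * M2, K) @ w.T).reshape(M1, M2, N)
assert np.allclose(out, ref)
```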

Note that we cannot yet handle cases where K is not the innermost
dimension, and I have not yet tested cases with M as the innermost
output dimension. These will likely require us to modify the fusion
definition to place those dimensions in the right spot in the MmaOp.

---------

Co-authored-by: Gao, Xiang <qasdfgtyuiop@gmail.com>
protonu pushed a commit that referenced this pull request Jun 25, 2024
Successfully merging this pull request may close these issues.

Analyze matmul epilogue to determine vectorization
