Infer matmul dimension roles to compute vectorization #2303

jacobhinkle merged 31 commits into `main`
Conversation
NOTE: this is a WIP draft that will likely be split into multiple smaller PRs. This is an attempt to generalize our matmul scheduler by doing the following:

1. Support more than 2 operands in our `MatmulParams.SupportedVectorization` struct.
2. Properly infer vectorization for every operand, epilogue input, and output.
3. Compute a canonical dim ordering on `ValGroup`s. This is used to compute vectorization properly, but can also be used for canonicalization of loop domains in `scheduleMatmul` in a future PR.
4. Schedule each tensor according to its supported vectorization. This might imply a new loop. For example, if there are two outputs and one supports only a vectorization width of 4 and the other 8, then we will unroll a loop of size 2 for the width-4 writes so that the outer loops are still inlined properly.

This is in preparation for further generalization to accommodate multiple MmaOps in a single Fusion.
!build
!build
!build
I had mistakenly thought we needed the leaf domain to handle the multi-device cases, but we don't, and that is confusing. I think this way is more reliable.
!build --diff
I think the code diffs are just due to the uncommented tests.
|
What if my mma output is (M=1024, N=1024), and my epilogue is:

In that case, I would expect T3 to not be mapped as an
Oh, I see. We are effectively rejecting view ops inside the fusion. And this rejection saves us from having to use the vectorization helper. |
Co-authored-by: Gao, Xiang <qasdfgtyuiop@gmail.com>
!build |
This failure was just due to using the exact graph, while we now use the permissive graph instead.
This updates `scheduleMatmul` to use the dimension roles introduced in #2303 to reorder and merge dims within each role. That means we can schedule fusions with more than one tensor in each role. Two tests are included so far:

- MultipleMDims, which performs [M1, M2, K] @ [N, K]. This corresponds to `torch.linear` with a 3D input, i.e. a Linear layer with two "batch" dimensions.
- MultipleMDimsBatch, which performs [M1, B, M2, K] @ [B, N, K]. This shows that M dimensions need not be contiguous. Note that vectorization is unaffected in this example since K is innermost.

I plan to add more tests to explore more scenarios. Note that we cannot yet handle cases where K is not the innermost dimension, and I have not yet tested cases with M as the innermost output dimension. These will likely require us to modify the fusion definition to place those dimensions in the right spot in the MmaOp.

---------

Co-authored-by: Gao, Xiang <qasdfgtyuiop@gmail.com>
This PR does the following:
- Renames `RolesMap` to `TensorRolesMap` and introduces `DimRolesMap`, which is a mapping from `ValGroup` to `MatmulDomain`.
- Computes a canonical dim ordering on `ValGroup`s based on allocation domains of inputs and outputs. This is used to compute vectorization properly, but can also be used for canonicalization of loop domains in `scheduleMatmul` in a future PR.

This is in preparation for further generalization to accommodate multiple MmaOps in a single Fusion.
Fixes #2169.