
Canonicalize matmul dims in scheduleMatmul using dimension roles#2376

Merged
jacobhinkle merged 23 commits into main from canonicalize_dims on Jun 25, 2024

Conversation

@jacobhinkle
Collaborator

@jacobhinkle jacobhinkle commented Jun 10, 2024

This updates scheduleMatmul to use the dimension roles introduced in #2303 to reorder and merge dims within each role. That means we can schedule fusions with more than one tensor in each role.

Two tests are included so far:

  • MultipleMDims, which performs [M1, M2, K] @ [N, K]. This corresponds to torch.nn.functional.linear with a 3D input, i.e., a Linear layer with two "batch" dimensions.
  • MultipleMDimsBatch, which performs [M1, B, M2, K] @ [B, N, K]. This shows that M dimensions need not be contiguous. Note that vectorization is unaffected in this example since K is innermost.

I plan to add more tests to explore more scenarios.

Note that we cannot yet handle cases where K is not the innermost dimension, and I have not yet tested cases with M as the innermost output dimension. These will likely require us to modify the fusion definition to place those dimensions in the right spot in the MmaOp.
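The reorder-and-merge step described above can be pictured with a small stand-alone sketch (the enum and `canonicalizeByRole` are illustrative names, not nvFuser's actual API): compute a permutation that groups same-role dims adjacently in a fixed [Batch, M, N, K] order, after which the dims within each role can be merged.

```cpp
#include <cassert>
#include <vector>

// Illustrative role enum; nvFuser's real dimension roles come from #2303.
enum class MatmulDomain { Batch, M, N, K };

// Hypothetical sketch: given each dimension's role, return a permutation
// that places same-role dims adjacent in canonical [Batch, M, N, K] order,
// preserving the original relative order within each role.
std::vector<int> canonicalizeByRole(const std::vector<MatmulDomain>& roles) {
  std::vector<int> perm;
  for (MatmulDomain r :
       {MatmulDomain::Batch, MatmulDomain::M, MatmulDomain::N, MatmulDomain::K}) {
    for (int i = 0; i < (int)roles.size(); ++i) {
      if (roles[i] == r) {
        perm.push_back(i);
      }
    }
  }
  return perm;
}
```

For example, the MultipleMDimsBatch layout [M1, B, M2, K] has roles {M, Batch, M, K}, and the sketch yields the permutation {1, 0, 2, 3}: the batch dim moves outermost and M1, M2 become adjacent so they can be merged.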

@jacobhinkle
Collaborator Author

!build

Comment on lines -81 to -90
//! Matches the following matmul patterns.
//! Matmul: A x B, alpha * A x B
//! Matmul + Bias (addmm): A x B + C, alpha * A x B + C, A x B + beta * C,
//! alpha * A x B + beta * C
//! Linear: A x B / A x B + C
//! Assumptions:
//! 1. For simplicity, we assume the MmaOp to be in the first operand.
//! 2. For linear ([M, K], [N, K]), alpha, beta parameters are nullptr.
bool matchMatmulPatterns(const UnaryOp* cast_op, MatmulInputs* matmul_inp);

@jacobhinkle (Collaborator Author)

This is declared but never defined or called, so it should have been removed in a previous PR.

@jacobhinkle jacobhinkle marked this pull request as ready for review June 13, 2024 20:03
Comment on lines +299 to +302
// Also check that dims within each role are consecutive with one another
// for this pattern.
// TODO: Lift this requirement by modifying the definition or setting
// allocation domains to support this setting in MmaOp
@jacobhinkle (Collaborator Author)

I believe this condition is sufficient to avoid the problematic cases like the included test MultipleNonConsecutiveNDims.

Comment on lines +1890 to +1891
// Insert the device dims first, then skip them when inserting dims from each
// other role
@jacobhinkle (Collaborator Author)

This ensures device dim groups are placed outermost, even if that means non-consecutive dims within each role.

}

// This is a tougher test where we insert a batch dim between the M dims
TEST_F(GPUTTensorCoreTest, MultipleMDimsBatch) {
@jacobhinkle (Collaborator Author)

A batch dim can be inserted between M dims, but the K dim cannot. This is because the batch dimension gets parallelized, so M1 and M2 will be contiguous with one another in the smem tensor.

@jacobhinkle jacobhinkle requested a review from zasdfgbnm June 13, 2024 20:11
Comment on lines +3239 to +3240
tv0 = broadcast(tv0, {false, false, true, false});
tv1 = broadcast(tv1, {true, true, false, false});
@zasdfgbnm (Collaborator)

Why are the Ms discontiguous?

@jacobhinkle (Collaborator Author)

Oops, error in the comment: this is actually the consecutive-Ms case. I'll rename it accordingly.

Also add a disabled non-consecutive version
NVF_ERROR(it != id_roles.end());
role_order.pushBack(it->second);
}
NVF_ERROR(
@zasdfgbnm (Collaborator), Jun 24, 2024

So scheduleMatmul can now partially support having multiple M/N dims, but we are still rejecting it in the scheduler for now?

@zasdfgbnm (Collaborator)

Wait, should this be the following?

if (role_order.size() != 3 && role_order.size() != 4) {
  return "Expected either….";
}

@jacobhinkle (Collaborator Author)

> Wait, should this be the following?

Ah! Good catch!

@jacobhinkle (Collaborator Author)

> So scheduleMatmul can now partially support having multiple M/N dims, but we are still rejecting it in the scheduler for now?

This check will not complain if there are multiple M or N dims, since we build role_order as the ordering of the roles, not of all the dimensions that constitute those roles. For example, we might have A[M1, K1, M2] and B[K1, N1] with ordering [M, N, K].

@zasdfgbnm (Collaborator)

> So scheduleMatmul can now partially support having multiple M/N dims, but we are still rejecting it in the scheduler for now?
>
> This check will not complain if there are multiple M or N dims, since we build role_order as the ordering of the roles not all the dimensions that constitute all the roles. For example we might have A[M1, K1, M2] and B[K1, N1] and ordering [M, N, K].

But wouldn't the role_order.size() check reject the fusion?

@jacobhinkle (Collaborator Author)

No, role_order would be [M, N, K] in that case. It doesn't hold dimensions, but roles.
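The distinction can be made concrete with a stand-alone sketch (illustrative names, not nvFuser's actual implementation): role_order records each role once, in order of first appearance, so multiple dims sharing a role do not inflate its size.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Illustrative role enum; nvFuser's real dimension roles come from #2303.
enum class MatmulDomain { Batch, M, N, K };

// Hypothetical sketch: deduplicate per-dimension roles into an ordered
// list of distinct roles. A problem with dims [M1, K1, M2, N1, K1] still
// yields a role_order of size 3, so a size check of 3 or 4 still passes.
std::vector<MatmulDomain> buildRoleOrder(
    const std::vector<MatmulDomain>& dim_roles) {
  std::vector<MatmulDomain> role_order;
  for (MatmulDomain r : dim_roles) {
    if (std::find(role_order.begin(), role_order.end(), r) ==
        role_order.end()) {
      role_order.push_back(r);
    }
  }
  return role_order;
}
```

With per-dim roles {M, K, M, N, K}, this produces {M, K, N}: three entries regardless of how many M or K dims the fusion has.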

.reshape({M, N1, N2});
NVF_CHECK(cg_outputs[0].allclose(tref, 0.0001, 0.0001));
}

@zasdfgbnm (Collaborator)

Do we also need a MultipleConsecutiveKDims? Or, maybe combine all these tests together to do a (B, B, M, M, N, N, K, K) matmul?

@jacobhinkle (Collaborator Author)

Yeah they are pretty orthogonal so I think I could just combine them into one.

@jacobhinkle (Collaborator Author)

MmaOp currently has a restriction that there can only be a single K dimension. I added a check for this in isMatmulFusionDefinitionSupported since such a pattern could be created as a mul-sum; it cannot be created with matmul or linear.

Co-authored-by: Gao, Xiang <qasdfgtyuiop@gmail.com>
fusion.addInput(tv0);
fusion.addInput(tv1);

// M1, N, K, M2
@zasdfgbnm (Collaborator)

Should this be M1, N, M2, K?

@jacobhinkle (Collaborator Author)

Actually, this is M1, N, K, M2; it's just that I screwed up the definition. I think I will combine this with the other disabled test and have non-consecutive M, N, and K dims. Similarly, I'll combine the enabled consecutive tests.


fusion.addInput(tv0);
fusion.addInput(tv1);

@zasdfgbnm (Collaborator)

// M1, B, M2, N, K

@jacobhinkle jacobhinkle merged commit 7bc7a08 into main Jun 25, 2024
@jacobhinkle jacobhinkle deleted the canonicalize_dims branch June 25, 2024 15:34
protonu pushed a commit that referenced this pull request Jun 25, 2024