Canonicalize matmul dims in scheduleMatmul using dimension roles #2376
jacobhinkle merged 23 commits into main
Conversation
!build
//! Matches the following matmul patterns.
//! Matmul: A x B, alpha * A x B
//! Matmul + Bias (addmm): A x B + C, alpha * A x B + C, A x B + beta * C,
//!                        alpha * A x B + beta * C
//! Linear: A x B / A x B + C
//! Assumptions:
//! 1. For simplicity, we assume the MmaOp to be in the first operand.
//! 2. For linear ([M, K], [N, K]), alpha, beta parameters are nullptr.
bool matchMatmulPatterns(const UnaryOp* cast_op, MatmulInputs* matmul_inp);
This is undefined/uncalled, so it should have been removed in a previous PR.

One test is failing because we don't modify the allocation domain of smem tensors.
// Also check that dims within each role are consecutive with one another
// for this pattern.
// TODO: Lift this requirement by modifying the definition or setting
// allocation domains to support this setting in MmaOp
I believe this condition is sufficient to avoid the problematic cases like the included test MultipleNonConsecutiveNDims.
// Insert the device dims first, then skip them when inserting dims from each
// other role
Ensuring device dim groups are placed outside, even if that means we have non-consecutive dims within each role.
// This is a tougher test where we insert a batch dim between the M dims
TEST_F(GPUTTensorCoreTest, MultipleMDimsBatch) {
A batch dim can be inserted between M dims, but not the K dim. This is because the batch dimension gets parallelized, so that M1 and M2 will be contiguous with one another in the smem tensor.
tests/cpp/test_gpu_tensorcore.cpp
tv0 = broadcast(tv0, {false, false, true, false});
tv1 = broadcast(tv1, {true, true, false, false});
Why are the Ms discontiguous?
Oops. Error in the comment. This is the consecutive Ms case. I'll rename it as such as well.
Also add a disabled non-consecutive version
csrc/scheduler/matmul_utils.cpp
NVF_ERROR(it != id_roles.end());
role_order.pushBack(it->second);
}
NVF_ERROR(
So scheduleMatmul can now partially support having multiple M/N dims, but we are still rejecting it in the scheduler for now?
Wait, should this be the following?

if (role_order.size() != 3 && role_order.size() != 4) {
  return "Expected either….";
}

> Wait, should this be the following?

Ah! Good catch!
> So scheduleMatmul can now partially support having multiple M/N dims, but we are still rejecting it in the scheduler for now?

This check will not complain if there are multiple M or N dims, since we build role_order as the ordering of the roles, not all the dimensions that constitute all the roles. For example we might have A[M1, K1, M2] and B[K1, N1] and ordering [M, N, K].
> So scheduleMatmul can now partially support having multiple M/N dims, but we are still rejecting it in the scheduler for now?
>
> This check will not complain if there are multiple M or N dims, since we build role_order as the ordering of the roles, not all the dimensions that constitute all the roles. For example we might have A[M1, K1, M2] and B[K1, N1] and ordering [M, N, K].

But wouldn't the role_order.size() check reject the fusion?
No, role_order would be [M, N, K] in that case. It doesn't hold dimensions, but roles.
        .reshape({M, N1, N2});
NVF_CHECK(cg_outputs[0].allclose(tref, 0.0001, 0.0001));
}
Do we also need a MultipleConsecutiveKDims? Or, maybe combine all these tests together to do a (B, B, M, M, N, N, K, K) matmul?
Yeah they are pretty orthogonal so I think I could just combine them into one.
MmaOp currently has a restriction that there can only be a single K dimension. I added a check for this in isMatmulFusionDefinitionSupported since such a pattern could be created as a mul-sum; it cannot be created with matmul or linear.
Co-authored-by: Gao, Xiang <qasdfgtyuiop@gmail.com>
fusion.addInput(tv0);
fusion.addInput(tv1);

// M1, N, K, M2
Should this be M1, N, M2, K?
Actually, this is M1, N, K, M2; it's just that I screwed up the definition. I think I will combine this with the other disabled test and have non-consecutive M, N, and K dims. Similarly, I'll combine the enabled consecutive tests.
fusion.addInput(tv0);
fusion.addInput(tv1);
This updates `scheduleMatmul` to use the dimension roles introduced in #2303 to reorder and merge dims within each role. That means we can schedule fusions with more than one tensor in each role.

Two tests are included so far:
- MultipleMDims, which performs [M1, M2, K] @ [N, K]. This corresponds to `torch.linear` with 3D input, i.e. a Linear layer with two "batch" dimensions.
- MultipleMDimsBatch, which performs [M1, B, M2, K] @ [B, N, K]. This shows that M dimensions need not be contiguous. Note that vectorization is unaffected in this example since K is innermost.

I plan to add more tests to explore more scenarios.

Note that we cannot yet handle cases where K is not the innermost dimension, and I have not yet tested cases with M as the innermost output dimension. These will likely require us to modify the fusion definition to place those dimensions in the right spot in the MmaOp.

---------

Co-authored-by: Gao, Xiang <qasdfgtyuiop@gmail.com>