
Matmul default scheduling [1]#1743

Merged
Priya2698 merged 17 commits into main from pm/mma_default
Feb 14, 2024

Conversation

@Priya2698 (Collaborator) commented Feb 8, 2024

Initial PR for Issue #1669.

  1. Adds an EnableOption::MatmulExprEval to turn on expression evaluation for matmul while the API is in progress.
  2. Currently, we only evaluate the MmaOp. The next PRs will amend this to look ahead and evaluate Mma + Cast, which is what we should see in fusion definitions (see discussion here). In the absence of this, we may have casts such as bfloat->float->bfloat.
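The cast-roundtrip issue in point 2 can be sketched with a toy Python model (illustrative only; the function name and op strings are made up for this sketch, not nvFuser's IR):

```python
# A matmul in a fusion definition typically looks like:
#   half inputs -> MmaOp (float accumulator) -> CastOp -> half output
# If the expression evaluator replaces only the MmaOp with at::matmul
# (which returns half for half inputs), the surrounding casts survive.

def ops_after_evaluation(evaluate_mma_and_cast: bool) -> list:
    if evaluate_mma_and_cast:
        # Look ahead: fold MmaOp + CastOp into one at::matmul call.
        return ["at::matmul(half,half)->half"]
    # Evaluate only the MmaOp: its half result must be cast back up to
    # float to feed the remaining CastOp, a half->float->half roundtrip.
    return ["at::matmul(half,half)->half",
            "CastOp(half)->float",
            "CastOp(float)->half"]

print(ops_after_evaluation(False))  # roundtrip casts remain
print(ops_after_evaluation(True))   # single call, no roundtrip
```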

@Priya2698 Priya2698 changed the title Matmul default scheduling [1] [WIP] Matmul default scheduling [1] Feb 8, 2024
@Priya2698 Priya2698 changed the title [WIP] Matmul default scheduling [1] Matmul default scheduling [1] Feb 9, 2024
mma_ops.size());

// Skip scheduling if Matmul will be expression evaluated.
if (isOptionEnabled(EnableOption::MatmulExprEval)) {
Collaborator

Does this mean any matmul pattern including mma + epilogue will be handled by the expression evaluator? Shouldn't it only take care of the mma part?

@wujingyue (Collaborator) Feb 13, 2024

There are two problems with taking care of just the mma:

  1. at::matmul doesn't do HH->S. We could plug in another backend that supports HH->S for EE though.
  2. MMA is never alone in these GPT models (e.g. LLaMA). It's always part of a linear layer or an SDPA. nvFuser doesn't do SDPA well and we will have to offload it to another executor for quite some time, so scratch that. A linear layer however comes with this MMA->BiasAdd pattern. In order for its performance to be on par with framework-not-giving-nvFuser-the-matmul, we have to execute MMA+epilogue in one call.

Wdyt, @naoyam? Extending EE to do MMA+epilogue is the most obvious way to me to solve the above problems. But I could definitely be wrong.
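The HH->S point (half inputs, single-precision accumulator) above is about accumulation precision. A stdlib-only Python sketch of why the accumulator dtype matters, using struct to emulate fp16 rounding (a toy model, not how ATen or nvFuser compute matmul):

```python
import struct

def to_half(x: float) -> float:
    # Round a Python float to the nearest IEEE fp16 value and back.
    return struct.unpack('e', struct.pack('e', x))[0]

def dot(a, b, accumulate_in_half: bool) -> float:
    acc = 0.0
    for x, y in zip(a, b):
        acc = acc + to_half(x) * to_half(y)
        if accumulate_in_half:
            acc = to_half(acc)  # HH->H: every partial sum rounded to fp16
    return acc

a = [0.001] * 4096
b = [1.0] * 4096
# Wide accumulation (Python's double here, standing in for fp32) vs fp16:
hh_s = dot(a, b, accumulate_in_half=False)
hh_h = dot(a, b, accumulate_in_half=True)
# The fp16 accumulator stalls once each addend is below half an ulp of
# the running sum, so hh_h undershoots hh_s.
print(hh_s, hh_h)
```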

Collaborator

Extending EE to do MMA+epilogue

How would it do that? Does ATen support matmul with some epilogue op?

Collaborator

Oh yeah, it's called at::addmm. I realized I was wrong about Relu: torch.nn.Linear doesn't do Relu, so the pattern would be matmul+biasadd, which is what at::addmm does.
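For reference, at::addmm computes beta * bias + alpha * (mat1 @ mat2) in a single call, which is exactly the matmul+biasadd pattern of a linear layer. A pure-Python sketch of those semantics (toy model with nested lists and a 1-D bias; the real at::addmm operates on tensors and broadcasts the bias):

```python
def addmm(bias, mat1, mat2, *, beta=1.0, alpha=1.0):
    # out[i][j] = beta * bias[j] + alpha * sum_p mat1[i][p] * mat2[p][j]
    m, k = len(mat1), len(mat1[0])
    n = len(mat2[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = sum(mat1[i][p] * mat2[p][j] for p in range(k))
            out[i][j] = beta * bias[j] + alpha * acc
    return out

mat1 = [[1.0, 2.0], [3.0, 4.0]]
mat2 = [[1.0, 0.0], [0.0, 1.0]]  # identity, so mat1 @ mat2 == mat1
bias = [10.0, 20.0]
print(addmm(bias, mat1, mat2))  # [[11.0, 22.0], [13.0, 24.0]]
```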

Collaborator

So, how would you handle matmul+epilogue patterns that are accepted by the nvFuser matmul scheduler but have no corresponding ATen version? Would that end up doing the mma and the epilogue op separately?

Collaborator

That makes sense. Or maybe we want a separate scheduler for EE? We try the native matmul scheduler first, then the EE matmul scheduler, and then the other schedulers. No particular preference, just my two cents.

Collaborator

Yes, that's certainly a great option to consider. It makes things more composable, at the risk of making it harder to share logic with MatmulScheduler. I hope the preference will become more obvious once we know what the heuristics look like!

Collaborator Author

Adding on to @wujingyue's comments, the plan for the next PRs is:

  1. Support Mma + Cast -> avoid roundtrip casting (half -> float -> half) by checking mma_out->uses(): if the use is either a castOp or a pointwise op with inputs of the same type (half), then skip casting the output of at::matmul back. This will not execute matmul + bias in a single call.
  2. Handle common epilogue fusions: we will need to pattern match and evaluate within the MmaOp accordingly. test_matmul_scheduler.cpp currently has a few cases that I will start with: mma + bias, mma + bias + relu/gelu, mma + relu.
  3. Epilogue fusions that are not supported: they can still be computed through EE but should ideally not be plumbed down to the matmul scheduler.
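Step 1's look-ahead can be sketched as follows (hypothetical Python pseudomodel; Op and can_skip_output_cast are illustrative names, not nvFuser's API):

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    kind: str             # "MmaOp", "CastOp", "PointwiseOp", ...
    dtype: str            # output dtype
    uses: list = field(default_factory=list)

def can_skip_output_cast(mma_out: Op, input_dtype: str) -> bool:
    # Only fold when the MmaOp output has exactly one consumer.
    if len(mma_out.uses) != 1:
        return False
    use = mma_out.uses[0]
    # A CastOp narrowing back to the input dtype, or a pointwise op whose
    # operands are all the input dtype: no float intermediate is needed,
    # so at::matmul's half output can be used directly.
    if use.kind == "CastOp" and use.dtype == input_dtype:
        return True
    return use.kind == "PointwiseOp" and use.dtype == input_dtype

cast_back = Op("CastOp", "half")
mma_out = Op("MmaOp", "float", uses=[cast_back])
print(can_skip_output_cast(mma_out, "half"))  # True
```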

Collaborator

Do 2 only when it's needed. You should double-check this, but I think Llama, for example, uses linear with bias off, and none of the linear layers in our benchmarks do relu or gelu.

Collaborator Author

Llama2 has bias=False in the linear layers but some GPT configs have bias=True.

@wujingyue (Collaborator) left a comment

Nice work! The PR currently shows as a draft; I'll hold off on my review until it's ready.

@Priya2698 Priya2698 marked this pull request as ready for review February 13, 2024 05:43
@wujingyue (Collaborator) left a comment

LGTM with some comments to be resolved.

@jacobhinkle (Collaborator) left a comment

LGTM. Just some comments on the tests.

@drzejan2 (Contributor)

Thanks for preparing this change. It looks good to me after catching up on the latest discussions around the matmul scheduler.

@protonu (Collaborator) commented Feb 14, 2024

Nit: would it make sense to make this C++ test file part of the test_matmul target?
https://github.com/NVIDIA/Fuser/blob/9fac12bdd98a63fdc88dde265b8add6a0e3f41cf/CMakeLists.txt#L480C1-L489C41

@Priya2698 (Author)

Nit: would it make sense to make this C++ test file part of the test_matmul target? https://github.com/NVIDIA/Fuser/blob/9fac12bdd98a63fdc88dde265b8add6a0e3f41cf/CMakeLists.txt#L480C1-L489C41

Thanks for the suggestion! Moved the test file.

@Priya2698 Priya2698 merged commit a0cb47a into main Feb 14, 2024
@Priya2698 Priya2698 deleted the pm/mma_default branch February 14, 2024 22:19
@jjsjann123 jjsjann123 mentioned this pull request Feb 15, 2024
@jjsjann123 (Collaborator)

FYI, this PR seems to break CI on V100.

@Priya2698 (Author)

Thanks for pointing this out. Let's revert this until I identify the patch.

Priya2698 added a commit that referenced this pull request Feb 15, 2024
@jjsjann123 (Collaborator)

FYI, these are the failing tests from CI:

00:00:43 [  FAILED  ] 3 tests, listed below:
00:00:43 [  FAILED  ] MatmulATenEvaluationTest.SingleMmaOp
00:00:43 [  FAILED  ] MatmulATenEvaluationTest.MmaOpAndCast
00:00:43 [  FAILED  ] MatmulATenEvaluationTest.MatmulWithBias

@Priya2698 (Author)

Yes, these are the tests I added to check functionality.
Looking into it.

@wujingyue (Collaborator)

Thanks for pointing this out. Let's revert this until I identify the patch.

Thank you! I can't say enough great things about revert-and-debug-later!

jjsjann123 pushed a commit that referenced this pull request Feb 15, 2024
Reverts #1743.

This is breaking on V100.
@Priya2698 Priya2698 restored the pm/mma_default branch February 15, 2024 01:18
@Priya2698 (Author) commented Feb 15, 2024

Are we looking to support V100 through the default path? CC: @kevinstephano
NVFUSER_TEST_CUDA_ARCH_RANGE_GUARD can be used in the tests to fix the error. The error surfaced because the matmul scheduler does not support V100 (see inline std::optional<MmaMacro> getMmaOp).

If we wish to support V100, we can add appropriate checks in the heuristic verification to allow other architectures when we are using the expression evaluator.
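The check being proposed could look roughly like this (illustrative Python sketch; the macro names and the supported-architecture list are placeholders, not nvFuser's actual getMmaOp logic):

```python
def get_mma_macro(compute_capability: tuple):
    # Placeholder: pretend only Ampere (sm_8x) and Turing (sm_75) have a
    # supported MMA macro; V100 is sm_70 and gets None.
    major, minor = compute_capability
    if major == 8:
        return "Ampere_16_8_16"
    if (major, minor) == (7, 5):
        return "Turing_16_8_16"
    return None

def can_schedule_matmul(compute_capability, expr_eval_enabled: bool) -> bool:
    if expr_eval_enabled:
        # Expression evaluation (ATen fallback) works on any architecture,
        # so the heuristic check is skipped entirely.
        return True
    return get_mma_macro(compute_capability) is not None

print(can_schedule_matmul((7, 0), expr_eval_enabled=False))  # False: V100
print(can_schedule_matmul((7, 0), expr_eval_enabled=True))   # True
```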

@wujingyue (Collaborator)

Are we looking to support V100 through the default path?

I'd do it. This is actually something that can be supported with much less effort in fallback mode than in codegen. Sounds like a low-hanging fruit to me.

@wujingyue (Collaborator)

Are we looking to support V100 through the default path? CC: @kevinstephano NVFUSER_TEST_CUDA_ARCH_RANGE_GUARD can be used in the tests to fix the error. The error surfaced because the matmul scheduler does not support V100 (see inline std::optional<MmaMacro> getMmaOp). If we wish to support V100, we can have appropriate checks in the heuristic verification to allow other architectures when we are using the expression evaluator.

FYI, I suspect it's not just V100. https://github.com/NVIDIA/Fuser/actions/runs/7910657123/job/21593586846 seems to be the same error but for H100.

@protonu (Collaborator) commented Feb 15, 2024

One option could be to use something like this (https://github.com/NVIDIA/Fuser/blob/88727dc828684f5a62d7f1837a610b7589f629d1/test/test_combine_mul_sum.cpp#L40C1-L57C3) to restrict the machines these tests run on, in case you don't plan on adding support for V100/H100.

@Priya2698 (Author) commented Feb 15, 2024

One option could be to use something like this (https://github.com/NVIDIA/Fuser/blob/88727dc828684f5a62d7f1837a610b7589f629d1/test/test_combine_mul_sum.cpp#L40C1-L57C3) to restrict the machines these tests run on, in case you don't plan on adding support for V100/H100.

Thanks for the suggestion; I am moving forward with supporting any architecture since it's simple enough.
For this PR, the fix will be to skip the heuristic computation when MatmulExprEval is set. In the next PRs, I'll refactor some code to always use the default scheduling for any unsupported architecture.

7 participants