MmaOp - consistency checks for basic matmul #131
Force-pushed from 5aa83df to 34f252e
!build
Force-pushed from 34f252e to b624b73
!build
zasdfgbnm left a comment:
Posting my existing comments.
Force-pushed from 944d183 to 2f6e541
!build
```cpp
TORCH_CHECK(
    ir_utils::getMmaOps(fusion.get()).front()->inputLayout().has_value(),
    "input layout has not be set for MmaOp");
TORCH_CHECK(
    layout ==
        ir_utils::getMmaOps(fusion.get()).front()->inputLayout().value(),
    "input layout from test and MmaOp do not match");
```
nit: I think `layout == ir_utils::getMmaOps(fusion.get()).front()->inputLayout()` should be sufficient.
The tricky part is that `inputLayout()` returns an optional. I made it like this because `MmaOp` can be created for Vals that are instances of `TensorView` or `TensorIndex`. For the first, a layout can be created (unless the inputs are incorrectly defined); for the second, I'm not sure how to handle it.
So currently, if `MmaOp` is created with a `TensorIndex`, then the MAxes/NAxes/KAxes/BatchAxes/layout attributes are default-initialized (an empty `std::vector<int>` or an empty `c10::optional<MmaOptions::MmaInputLayout>`).
`TensorIndex` should only appear during lowering; if it appears elsewhere, that is an error. So you don't need to worry about `TensorIndex` in fusion IR.
Force-pushed from 6b36c7e to 4a6571f
!build
Force-pushed from 4a6571f to 2f823e9
!build
zasdfgbnm left a comment:
Posting some final comments.
```cpp
  }
};

const auto validateOutputDetails = [](const TensorViewDetails& details,
```
For the case of batched matmul, should we allow having broadcast and more than two concrete domains?
We don't have support for batch yet, but I checked what the domains are in the `MmaOp` output for a single test with strided batches:
`NVFuserTest.FusionAmpereStridedBatchedMatmulTN_CUDA`
and it looks like this:
`T4_l [ iS18{i0}, iS19{i2}, iS20{i6}, iS21{i3}, rS22{i4} ]`
I'm not sure if the test represents the final approach for strided batches, so broadcasts could appear there.
For now I will keep the current implementation, but I will add a comment:
`// TODO: revise rules when add support for batch gemms`
Force-pushed from 2f823e9 to d31dda6
!build
csrc/ir_nodes.cpp (Outdated)
```cpp
if (details.bcasts.empty()) {
  TORCH_INTERNAL_ASSERT(false, desc, ": has no broadcast domains.");
}
```
nit, suggested change:
```cpp
TORCH_INTERNAL_ASSERT(!details.bcasts.empty(), desc, ": has no broadcast domains.");
```
I missed this, good point. I updated the PR and, if there are no issues, I will merge.
Thanks!
csrc/ir_nodes.cpp (Outdated)
```cpp
if (!details.rdomains.empty()) {
  TORCH_INTERNAL_ASSERT(false, desc, ": has reduction domains.");
}
```
nit, suggested change:
```cpp
TORCH_INTERNAL_ASSERT(details.rdomains.empty(), desc, ": has reduction domains.");
```
csrc/ir_nodes.cpp (Outdated)
```cpp
if (details.cdomains.size() < expected_gemm_cdomains) {
  TORCH_INTERNAL_ASSERT(
      false,
      desc,
      ": has unsupported number of concrete domains, expected at least ",
      expected_gemm_cdomains,
      ", got ",
      details.cdomains.size());
}
```
nit, suggested change:
```cpp
TORCH_INTERNAL_ASSERT(
    details.cdomains.size() >= expected_gemm_cdomains,
    desc,
    ": has unsupported number of concrete domains, expected at least ",
    expected_gemm_cdomains,
    ", got ",
    details.cdomains.size());
```
csrc/ir_nodes.cpp (Outdated)
```cpp
if (!details.bcasts.empty()) {
  TORCH_INTERNAL_ASSERT(false, desc, ": has broadcast domains.");
}
```
nit, suggested change:
```cpp
TORCH_INTERNAL_ASSERT(details.bcasts.empty(), desc, ": has broadcast domains.");
```
csrc/ir_nodes.cpp (Outdated)
```cpp
if (details.rdomains.empty()) {
  TORCH_INTERNAL_ASSERT(false, desc, ": has no reduction domains.");
}
```
nit, suggested change:
```cpp
TORCH_INTERNAL_ASSERT(!details.rdomains.empty(), desc, ": has no reduction domains.");
```
csrc/ir_nodes.cpp (Outdated)
```cpp
if (details.cdomains.size() < expected_gemm_cdomains) {
  TORCH_INTERNAL_ASSERT(
      false,
      desc,
      ": has unsupported number of concrete domains, expected at least ",
      expected_gemm_cdomains,
      ", got ",
      details.cdomains.size());
}
```
nit, suggested change:
```cpp
TORCH_INTERNAL_ASSERT(
    details.cdomains.size() >= expected_gemm_cdomains,
    desc,
    ": has unsupported number of concrete domains, expected at least ",
    expected_gemm_cdomains,
    ", got ",
    details.cdomains.size());
```
- move some checks from compile-time checks in the matmul scheduler to the `MmaOp` constructor,
- move the input layout check from the matmul scheduler compile-time checks to the `MmaOp` class,
- extend the set of attributes associated with `MmaOp`,
- update compile-time checks in the matmul scheduler.
Force-pushed from d31dda6 to a992c0e
The goal of this PR:
The scope of changes:
- `MmaOp` constructor,
Tests will be re-enabled with a follow-up PR that will add in `MmaOp` handling of the scenarios covered by these tests.
Verification results:
- `.build/bin/nvfuser_tests`: all tests passed

cc @mmigdal-nv