
Add a new ATen Matmul IR node#2175

Merged
Priya2698 merged 34 commits into main from pm/linear_node
May 9, 2024

Conversation

Collaborator

@Priya2698 Priya2698 commented May 1, 2024

Issue #2149.

Adds a new MatmulOp IR node with the same functionality as torch.matmul.

  1. Both tensors are 1D: the dot product (sum(mul(a, b))) is returned, without creating a MatmulOp node.
  2. One of the tensors is 1D: [M, K] x [K] -> [M] / [K] x [K, N] -> [N]
  3. Both tensors are 2D: [M, K] x [K, N] -> [M, N]
  4. Both tensors are at least 1D and one of them is > 2D: [B, M, K] x [K, N] -> [B, M, N]

csrc/ops/utils/mapMatmulOpIterDomains defines the logic for mapping the input operands to the output. It is used to create the new output TensorView for the MatmulOp from the input iterdomains, and in PairwiseRootDomainMap to accurately map the MatmulOp inputs/outputs. This is required since the inputs are no longer broadcast, which affects the alignment of the inputs with the output.

Collaborator

@jacobhinkle jacobhinkle left a comment


First pass. Haven't looked at the mapping code in detail yet.

Comment on lines +252 to +257
// Adding these pragmas since gcc-12.2.1
// incorrectly reports a warning with the use of evaluate
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wfree-nonheap-object"
#endif
Collaborator


CC @protonu. Please check that this GCC check is in the right place. Specifically, should it instead be at outIterDomain()? I checked the original PR #643 but I didn't find any mention of this code so I'm not sure what the original problem was.

Collaborator Author


Based on the comment about the issues with using evaluate, it should be moved to surround the new outIterDomain function.

Collaborator Author

> I made some drive-by comments, but I'll really defer approval to others because I'm not a matmul expert and I have a bit too many PRs to review this week :)

Thanks @wujingyue, for the helpful comments.

@Priya2698 Priya2698 requested a review from jjsjann123 May 7, 2024 19:18
Collaborator

@jacobhinkle jacobhinkle left a comment


LGTM other than some minor comments

bool is_lhs,
size_t out_size);

IterDomain* newOutputIterDomain(const std::vector<IterDomain*>& ids);
Collaborator


Please add a comment above this function indicating what you just said here.

}

// Add key-value iterdomain pair to the map.
void updatePairwiseRootDomainMap(
Collaborator


You could make this a lambda inside map() so that you could capture the last three arguments instead of passing them explicitly.

std::make_tuple(Sizes({m, k}), Sizes({k})),
std::make_tuple(Sizes({k}), Sizes({b, k, n})),
std::make_tuple(Sizes({b, m, k}), Sizes({k})),
std::make_tuple(Sizes({b, 1, m, k}), Sizes({b, k, n}))));
Collaborator


Is it true that we can accept any combination of A and B where A is one of {k}, {m, k}, {b, m, k}, {b, 1, m, k} and B is one of {k}, {k, n}, {b, k, n}? If so, maybe we could parametrize each of those separately and hit all combos.

Collaborator


Out of curiosity. What would happen if k/m/n happen to be 1?

Collaborator Author


> Out of curiosity. What would happen if k/m/n happen to be 1?

Do you mean if the output shape is different? -- It will be the same as any other case.

Or how we intend to handle those cases?
@jacobhinkle pointed out that we could special-case these to increase the opportunity for fusion. (For example, for [M, 1] x [1, N], we can simply return the outer product without creating the MatmulOp node.)

Collaborator


> For [M, 1] x [1, N]

This is the K=1 case, but I think Jie is asking about M and N, which we aren't testing here. I would also add that we're not really testing this case properly either, since we are creating the input tensors with makeSymbolicTensor(a_shape.size()), which makes all dimensions IterType::Iteration. I suggested a code change that I think will address that; then we can add shape combos that have 1s in each position.

Collaborator Author


I added cases for M=1/N=1. At present, they will behave the same way as when M/N > 1.

Collaborator Author


> I would also add that we're not really testing this case properly either since we are creating the input tensors with makeSymbolicTensor(a_shape.size()) which will make all dimensions IterType::Iteration.

makeSymbolicTensor(a_shape) -> Does this mark dimensions as Broadcast if the extent is 1?


FusionExecutor fe;
fusion->aliasOutputToInput(
fusion->outputs()[0], /*input=*/nullptr, AllocationType::Evaluate);
Collaborator


This is a strange API for marking something as AllocationType::Evaluate...

Collaborator Author

@Priya2698 Priya2698 May 9, 2024


We used the existing framework for aliasing, hence some of the API names may seem odd.
I'll make a note to revisit this in a future cleanup PR.


jacobhinkle added a commit that referenced this pull request May 9, 2024
This PR restricts the accepted matmul segments for the nvfuser matmul
scheduler to only those containing pointwise epilogues. Additionally, it
rules out cases for which we cannot yet reliably determine epilogue input
vectorization due to transposes (TODO, see #2169).

Note that this check can be lifted when more epilogue cases are
supported, e.g. #2213.

Fixes #2167.

This is stacked on #2175 and its follow-up PR introducing LinearOp,
because segmentation currently fails for matmuls unless the complete
fusion can be scheduled (see #1707). The MatmulOp and LinearOp IR nodes
remove the need to inspect operand producer branches, so segmentation
should work fine once that work is merged. This PR will be marked as
draft until then.
Collaborator Author

!build

Comment on lines +448 to +449
auto tv0 = makeSymbolicTensor(a_shape.size(), DataType::Half);
auto tv1 = makeSymbolicTensor(b_shape.size(), DataType::Half);
Collaborator


Suggested change
auto tv0 = makeSymbolicTensor(a_shape.size(), DataType::Half);
auto tv1 = makeSymbolicTensor(b_shape.size(), DataType::Half);
auto tv0 = makeSymbolicTensor(a_shape, DataType::Half);
auto tv1 = makeSymbolicTensor(b_shape, DataType::Half);

This will ensure the tensors are defined in the fusion almost the way they would be using fd.from_pytorch in case the shapes contain 1s. That could be useful here since you might translate some ops to non-MatmulOp if they are trivial. To be more precise though, this will still not declare them as contiguous. For that you might want to do

  auto tv0 = TensorViewBuilder().shape(sym_shape).dtype(DataType::Half).contiguity(true).build();

where sym_shape is a vector of 1 and -1.

Collaborator Author


> To be more precise though, this will still not declare them as contiguous

Why do we need this?

Collaborator


We probably don't.

Collaborator


However, if in the future we try to evaluate this test by forcing the nvfuser matmul scheduler, we might use an inefficient kernel since we won't be able to tell the inputs are contiguous. It's not a high priority, but it would be nice to have a test utility that could take an at::Tensor and create a TensorView* that matches it, just like we do in fd.from_pytorch.


Collaborator Author

!build

@Priya2698 Priya2698 merged commit d214286 into main May 9, 2024
@Priya2698 Priya2698 deleted the pm/linear_node branch May 9, 2024 22:01
jacobhinkle added a commit that referenced this pull request May 16, 2024
This PR does the following:
1. Adds `MatmulOp` to `ir_utils::isTvOp` so that its `IterDomain`s will
be automatically propagated by `IdModel`.
2. Updates the tests to check that all non-Broadcast axes are properly
mapped by `IdModel` through the `MatmulOp`.
3. Changes the output of `MatmulOp` to have an `IterType::Reduction`
axis in the last position of its root domain to represent the `K`
dimension. This change was motivated by needing a way to have both
operand K dimensions exact mapped together, as they would be if the op
were translated to a mul+sum+cast.
4. Updates the `matmul` op to translate trivial cases where K=1 to
simple multiply+cast patterns.

Fixes #1707. In fact, that test was actually fixed by #2175 but the test
validation was failing because `isTvOp` was not picking up the matmul as
a reduction.
