Conversation
jacobhinkle left a comment
First pass. Haven't looked in detail at the mapping code yet.
csrc/ops/utils.cpp
Outdated
```cpp
// Adding these pragmas since gcc-12.2.1
// incorrectly reports a warning with the use of evaluate
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wfree-nonheap-object"
#endif
```
Based on the comment about issues with using evaluate, it should be around the new outIterDomain function.

Thanks, @wujingyue, for the helpful comments.
jacobhinkle left a comment
LGTM other than some minor comments
```cpp
    bool is_lhs,
    size_t out_size);

IterDomain* newOutputIterDomain(const std::vector<IterDomain*>& ids);
```
Please add a comment above this function indicating what you just said here.
csrc/root_domain_map.cpp
Outdated
```cpp
}

// Add key-value iterdomain pair to the map.
void updatePairwiseRootDomainMap(
```
You could make this a lambda inside map() so that you could capture the last three arguments instead of passing them explicitly.
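The suggested refactor — replacing the free `updatePairwiseRootDomainMap` helper with a lambda declared inside `map()` so the trailing arguments are captured instead of passed — can be sketched roughly as below. The types and signatures here are illustrative stand-ins, not the actual nvFuser code (which maps `IterDomain*` keys and values):

```cpp
#include <initializer_list>
#include <unordered_map>
#include <utility>

// Illustrative stand-ins for the real nvFuser types.
using Id = int;
using DomainMap = std::unordered_map<Id, Id>;

// Before: a free function threads the map (and other state) through
// every call site:
//   void updatePairwiseRootDomainMap(DomainMap& m, Id key, Id val, ...);
DomainMap map(std::initializer_list<std::pair<Id, Id>> pairs) {
  DomainMap id_map;
  // After: a lambda inside map() captures id_map (and any other
  // locals), so each call site only supplies the key-value pair.
  auto update = [&id_map](Id producer_id, Id consumer_id) {
    id_map.emplace(producer_id, consumer_id);
  };
  for (const auto& [p, c] : pairs) {
    update(p, c);
  }
  return id_map;
}
```

The capture list is the whole point: the three arguments that were previously passed explicitly become captured locals.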
```cpp
    std::make_tuple(Sizes({m, k}), Sizes({k})),
    std::make_tuple(Sizes({k}), Sizes({b, k, n})),
    std::make_tuple(Sizes({b, m, k}), Sizes({k})),
    std::make_tuple(Sizes({b, 1, m, k}), Sizes({b, k, n}))));
```
Is it true that we can accept any combination of A and B where A is one of {k}, {m, k}, {b, m, k}, {b, 1, m, k} and B is one of {k}, {k, n}, {b, k, n}? If so maybe we could parametrize each of those separately and hit all combos.
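Parametrizing the A shapes and B shapes separately and crossing them could be sketched with plain loops as below (the real tests would likely use GTest's `testing::Combine`; the helper name and shape values here are illustrative):

```cpp
#include <cstdint>
#include <utility>
#include <vector>

using Sizes = std::vector<int64_t>;

// Build every (A, B) shape combination from independently
// parametrized operand shape lists, mirroring testing::Combine.
std::vector<std::pair<Sizes, Sizes>> allShapeCombos(
    const std::vector<Sizes>& a_shapes,
    const std::vector<Sizes>& b_shapes) {
  std::vector<std::pair<Sizes, Sizes>> combos;
  for (const auto& a : a_shapes) {
    for (const auto& b : b_shapes) {
      combos.emplace_back(a, b);
    }
  }
  return combos;
}
```

With A in {`{k}`, `{m, k}`, `{b, m, k}`, `{b, 1, m, k}`} and B in {`{k}`, `{k, n}`, `{b, k, n}`}, this hits all 12 combinations instead of the 4 hand-picked tuples above.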
Out of curiosity. What would happen if k/m/n happen to be 1?
> Out of curiosity. What would happen if k/m/n happen to be 1?
Do you mean if the output shape is different? -- It will be the same as any other case.
Or how we intend to handle those cases?
@jacobhinkle pointed out we could special-case these to increase the opportunity for fusion. (For example, for [M, 1] x [1, N], we can simply return the outer product without creating the MatmulOp node.)
> For [M, 1] x [1, N]
This is the K=1 case but I think Jie is asking what about M and N, which we aren't testing here. I would also add that we're not really testing this case properly either since we are creating the input tensors with makeSymbolicTensor(a_shape.size()) which will make all dimensions IterType::Iteration. I suggested a code change that I think will address that, then we can add shape combos that have 1s in each position.
I added cases for M=1/N=1. At present, they will behave the same way as when M/N > 1.
> I would also add that we're not really testing this case properly either since we are creating the input tensors with `makeSymbolicTensor(a_shape.size())`, which will make all dimensions `IterType::Iteration`.

`makeSymbolicTensor(a_shape)` -> Does this mark dimensions as Broadcast if the extent is 1?
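The rule being asked about — extent-1 dimensions becoming Broadcast rather than Iteration — can be sketched as below. This is only an illustration of that assumed semantics, not nvFuser's actual `makeSymbolicTensor` implementation:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: when a tensor is built from concrete extents,
// dimensions of extent 1 are marked Broadcast and all others
// Iteration. Illustrative only; not the real nvFuser test utility.
enum class IterType { Iteration, Broadcast };

std::vector<IterType> iterTypesFromShape(const std::vector<int64_t>& shape) {
  std::vector<IterType> types;
  for (int64_t extent : shape) {
    types.push_back(extent == 1 ? IterType::Broadcast : IterType::Iteration);
  }
  return types;
}
```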
```cpp
FusionExecutor fe;
fusion->aliasOutputToInput(
    fusion->outputs()[0], /*input=*/nullptr, AllocationType::Evaluate);
```
This is a strange API for marking something as AllocationType::Evaluate...
We used the existing framework for aliasing, hence some of the API names may seem odd.
I'll make a note to modify this in a future cleanup PR.
This PR restricts the accepted matmul segments for the nvfuser matmul scheduler to only those containing pointwise epilogues. Additionally, it rules out cases for which we cannot yet reliably determine epilogue input vectorization due to transposes (TODO, see #2169). Note that this check can be lifted when more epilogue cases are supported, e.g. #2213. Fixes #2167.

This PR is stacked on #2175 and its follow-up PR introducing LinearOp, because segmentation currently fails for matmuls unless the complete fusion can be scheduled (see #1707). The MatmulOp and LinearOp IR nodes remove the need to inspect operand producer branches, so segmentation should work fine once that work is merged. This PR will be marked as draft until then.
|
!build |
```cpp
auto tv0 = makeSymbolicTensor(a_shape.size(), DataType::Half);
auto tv1 = makeSymbolicTensor(b_shape.size(), DataType::Half);
```

Suggested change:

```cpp
auto tv0 = makeSymbolicTensor(a_shape, DataType::Half);
auto tv1 = makeSymbolicTensor(b_shape, DataType::Half);
```
This will ensure the tensors are defined in the fusion almost the way they would be using `fd.from_pytorch` in case the shapes contain 1s. That could be useful here since you might translate some ops to non-`MatmulOp` if they are trivial. To be more precise though, this will still not declare them as contiguous. For that you might want to do

```cpp
auto tv0 = TensorViewBuilder().shape(sym_shape).dtype(DataType::Half).contiguity(true).build();
```

where `sym_shape` is a vector of 1s and -1s.
> To be more precise though, this will still not declare them as contiguous.

Why do we need this?
However, if in the future we try to evaluate this test by forcing the nvfuser matmul scheduler, we might use an inefficient kernel since we won't be able to tell the inputs are contiguous. It's not a high priority, but it would be nice to have a test utility that could take an `at::Tensor` and create a matching `TensorView*`, just like we do in `fd.from_pytorch`.
|
!build |
This PR does the following:

1. Add `MatmulOp` to `ir_utils::isTvOp` so that its `IterDomain`s will be automatically propagated by `IdModel`.
2. Updates the tests to check that all non-Broadcast axes are properly mapped by `IdModel` through the `MatmulOp`.
3. Changes the output of `MatmulOp` to have an `IterType::Reduction` axis in the last position of its root domain to represent the `K` dimension. This change was motivated by needing a way to have both operand K dimensions exact mapped together, as they would be if the op were translated to a mul+sum+cast.
4. Updates the `matmul` op to translate trivial cases where K=1 to simple multiply+cast patterns.

Fixes #1707. In fact, that test was actually fixed by #2175, but the test validation was failing because `isTvOp` was not picking up the matmul as a reduction.
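The K=1 translation in item 4 rests on the fact that a matmul with K=1 reduces over a single element, so it is just a broadcasted multiply: for [M, 1] x [1, N], `out[i][j] = a[i] * b[j]` (an outer product). A minimal numeric sketch of that identity, illustrative only and not nvFuser code:

```cpp
#include <cstddef>
#include <vector>

// matmul([M,1], [1,N]) with K = 1: the reduction is over one element,
// so the result is the outer product of the two vectors; no sum needed.
std::vector<std::vector<float>> outerProduct(
    const std::vector<float>& a, // the [M, 1] operand, flattened to [M]
    const std::vector<float>& b) { // the [1, N] operand, flattened to [N]
  std::vector<std::vector<float>> out(a.size(), std::vector<float>(b.size()));
  for (size_t i = 0; i < a.size(); ++i) {
    for (size_t j = 0; j < b.size(); ++j) {
      out[i][j] = a[i] * b[j];
    }
  }
  return out;
}
```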
Issue #2149.

Adds a new `MatmulOp` IR node that has the same functionality as `torch.matmul`. Trivial cases are computed as `sum(mul(a, b))` without creating a `MatmulOp` node. Supported shape combinations include:

- [M, K] x [K] -> [M] / [K] x [K, N] -> [N]
- [M, K] x [K, N] -> [M, N]
- [B, M, K] x [K, N] -> [B, M, N]

`mapMatmulOpIterDomains` in `csrc/ops/utils` defines the logic to map the input operands to the output. This is used to create the new output `TensorView` for the `MatmulOp` from the input iterdomains, and in `PairwiseRootDomainMap` to accurately map the `MatmulOp` inputs/outputs. This is required since the inputs are no longer broadcasted, which affects the alignment of the inputs with the output.
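The shape rules listed above can be sketched as a small shape-inference helper. This follows `torch.matmul`'s conventions (1-D operands are unsqueezed, batch dimensions broadcast right-aligned, K is reduced away); it is an illustration only, not the actual `mapMatmulOpIterDomains` implementation:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using Sizes = std::vector<int64_t>;

Sizes matmulOutputShape(Sizes a, Sizes b) {
  const bool a_is_1d = a.size() == 1;
  const bool b_is_1d = b.size() == 1;
  if (a_is_1d) {
    a.insert(a.begin(), 1); // [K] -> [1, K]
  }
  if (b_is_1d) {
    b.push_back(1); // [K] -> [K, 1]
  }
  // Broadcast the batch dimensions (all but the last two), aligned
  // from the right; extent 1 broadcasts against any extent.
  Sizes a_batch(a.begin(), a.end() - 2);
  Sizes b_batch(b.begin(), b.end() - 2);
  const size_t nbatch = std::max(a_batch.size(), b_batch.size());
  Sizes out;
  for (size_t i = 0; i < nbatch; ++i) {
    const size_t a_pad = nbatch - a_batch.size();
    const size_t b_pad = nbatch - b_batch.size();
    const int64_t ae = i < a_pad ? 1 : a_batch[i - a_pad];
    const int64_t be = i < b_pad ? 1 : b_batch[i - b_pad];
    out.push_back(std::max(ae, be));
  }
  if (!a_is_1d) {
    out.push_back(a[a.size() - 2]); // M survives unless A was 1-D
  }
  if (!b_is_1d) {
    out.push_back(b.back()); // N survives unless B was 1-D
  }
  return out; // K never appears: it is the reduced dimension
}
```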