Description
The following test fails when trying to create an MmaOp:
// Single batch dimension which is broadcast
TEST_F(GPUTTensorCoreTest, FusionAmpereBroadcastBatchMatmul_CUDA) {
  auto layout = MmaLayout::TN;
  Fusion fusion;
  FusionGuard fg(&fusion);
  auto shapes = matmulAtInputShape3DTuring(-1, -1, -1, layout);
  auto tv0 = makeContigConcreteTensor(shapes.first, DataType::Half);
  auto tv1 = makeContigConcreteTensor(shapes.second, DataType::Half);
  fusion.addInput(tv0);
  fusion.addInput(tv1);
  tv0 = canonicalizeInputToBMNK(tv0, layout, MmaOperand::A);
  tv1 = canonicalizeInputToBMNK(tv1, layout, MmaOperand::B);
  auto tv2 = fusedMultiplySum(
      broadcast(tv0, {true, false, false, false}),
      broadcast(tv1, {true, false, false, false}),
      {-1});
  /*
  C++ exception with description "details.bcasts.empty() INTERNAL ASSERT FAILED at "/opt/pytorch/nvfuser/csrc/ir/utils.cpp":1268, please report a bug with repro script to NVFuser at https://github.com/NVIDIA/Fuser/issues. MmaOp output: has broadcast domains.
  Exception raised from operator() at /opt/pytorch/nvfuser/csrc/ir/utils.cpp:1268 (most recent call first):
  */
  fusion.addOutput(tv2);
}

This caused the failure of Fuser/tests/cpp/test_combine_mul_sum.cpp, line 121 in 5228f89:
TEST_F(CombineMulSumAsMmaTest, AmpereMulSumToMatmul_Fail2) {
MmaOp ctor to not balk at such cases.
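
To make the failing condition concrete, here is a minimal standalone sketch. It is not nvFuser code: `Axis`, `mulSumOutput`, and `hasBroadcast` are hypothetical names, and the rule that an output axis is broadcast only when it is broadcast in both operands is an assumption made for illustration. Under that assumption, a batch axis that is broadcast in both inputs stays broadcast in the multiply-sum output, which appears to be the situation the `details.bcasts.empty()` check rejects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an iteration domain: it only records
// whether the axis is a broadcast axis.
struct Axis {
  bool is_broadcast;
};

// Models an elementwise multiply followed by a sum over the last axis.
// Assumption: an output axis is broadcast only if it is broadcast in
// BOTH operands. The reduced (last) axis is dropped from the result.
std::vector<Axis> mulSumOutput(
    const std::vector<Axis>& a,
    const std::vector<Axis>& b) {
  assert(a.size() == b.size() && !a.empty());
  std::vector<Axis> out;
  for (std::size_t i = 0; i + 1 < a.size(); ++i) {
    out.push_back({a[i].is_broadcast && b[i].is_broadcast});
  }
  return out;
}

// Returns true if any axis is a broadcast axis, i.e. the condition
// under which the assert in the report would fire in this model.
bool hasBroadcast(const std::vector<Axis>& axes) {
  for (const Axis& ax : axes) {
    if (ax.is_broadcast) {
      return true;
    }
  }
  return false;
}
```

With operands modeled as `[bB, M, bN, K]` and `[bB, bM, N, K]` (where `b` marks a broadcast axis), `mulSumOutput` yields `[bB, M, N]`, so `hasBroadcast` returns true. Without the broadcast batch axis, the output has no broadcast axes.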