This is a corner case for Resize ops that hasn't received much attention yet: using pad with a negative pad amount produces a broadcast domain that is subsequently resolved.
```cpp
TEST_F(NVFuserTest, PadToBroadcast) {
  auto fusion = std::make_unique<Fusion>();
  FusionGuard fg(fusion.get());

  auto tv0 = makeConcreteTensor({2});
  auto tv1 = makeConcreteTensor({3});
  fusion->addInput(tv0);
  fusion->addInput(tv1);

  // Negative right pad shrinks the size-2 axis down to extent 1
  auto tv2 = pad(tv0, {fusion->zeroVal(), IrBuilder::create<Scalar>(-1)});
  auto tv3 = mul(tv1, tv2);
  fusion->addOutput(tv3);

  // Fusion is not dynamic
  EXPECT_FALSE(fusion->hasDynamicTransform());

  fusion->printMath();
  /*
  Inputs:
    T0_g[ iS0{2} ], float
    T1_g[ iS1{3} ], float
  Outputs:
    T3_g[ iS4{3} ], float

  %kernel_math {
  T2_l[ bS3{1}rf ]
     = pad( T0_g[ iS0{2} ], {0, -1} )
  T3_g[ iS4{3} ]
     = T1_g[ iS1{3} ]
     * T2_l[ bS3{1}rf ];
  }
  */

  auto options = at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0);
  auto t0 = at::randn({2}, options);
  auto t1 = at::randn({3}, options);
  std::vector<c10::IValue> aten_inputs({t0, t1});
  auto args = KernelArgumentHolder::createKernelArgumentHolder(aten_inputs);

  FusionKernelRuntime runtime(std::move(fusion), args);
  /*
  terminate called after throwing an instance of 'c10::Error'
    what(): it != broadcast_origin_map_.end() INTERNAL ASSERT FAILED at "/opt/pytorch/nvfuser/csrc/device_lower/analysis/trivial_broadcast.cpp":98, please report a bug to PyTorch.
    Broadcast origin info not found for producer broadcast domain: bS3{1}rf of T2_l[ bS3{1}rf ]
  Exception raised from handle at /opt/pytorch/nvfuser/csrc/device_lower/analysis/trivial_broadcast.cpp:98 (most recent call first)
  */
  runtime.compileFusionParallel(args);
  auto cg_outputs = runtime.runWithInputs(args);

  auto t2_padded = at::pad(t0, {0, -1});
  auto ref_t2 = t1 * t2_padded;

  testValidate(
      runtime.fusionSegments()->completeFusion(),
      cg_outputs,
      aten_inputs,
      {ref_t2},
      __LINE__,
      __FILE__);
}
```

Simply determining the output IterType is not sufficient to actually resolve the broadcast. Instead, we need an actual tensor allocated to act as the broadcast origin. In this case, that can be done by translating the pad to
```cpp
auto tv2 = broadcast(select(tv0, 0, fusion->zeroVal()), {true});
```

in which case `select` introduces the needed intermediate tensor, which becomes the origin of the following broadcast.
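To see why the two formulations agree on the values, here is a plain-Python sketch (the `pad1d` helper is hypothetical, not an nvfuser or ATen API) of 1-D pad semantics, where a negative amount trims instead of pads. Padding a size-2 axis by `{0, -1}` leaves extent 1, exactly what `select(tv0, 0, 0)` followed by a size-1 broadcast produces:

```python
def pad1d(t, left, right, value=0.0):
    # Positive amounts pad with `value`; negative amounts trim,
    # mirroring at::pad's treatment of negative pad widths.
    out = [value] * max(left, 0) + list(t) + [value] * max(right, 0)
    if left < 0:
        out = out[-left:]
    if right < 0:
        out = out[: len(out) + right]
    return out

t0 = [2.0, 5.0]
t1 = [1.0, 10.0, 100.0]

padded = pad1d(t0, 0, -1)  # extent 1: behaves like a broadcast domain
selected = [t0[0]]         # select(t0, 0, 0) then broadcast gives the same
assert padded == selected

# Resolving the size-1 domain against t1 (size 3) by replication,
# as the mul in the fusion does:
resolved = [x * padded[0] for x in t1]
```

The sketch only illustrates the value-level equivalence; the point of the translation in the fusion IR is that `select` materializes an intermediate tensor to serve as the broadcast origin.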
Note that dynamic pad faces a similar issue, but we should probably fix this static case first. Static and dynamic slice will face a similar problem (related to #511).