Fix overzealous concretization root->rfactor propagation by jacobhinkle · Pull Request #397 · NVIDIA/Fuser

jacobhinkle · 2023-05-25T14:36:28Z

Repro:

TEST_F(NVFuserTest, TMP) {
  std::unique_ptr<Fusion> fusion_ptr = std::make_unique<Fusion>();
  Fusion& fusion = *fusion_ptr.get();
  FusionGuard fg(fusion_ptr.get());
  auto tv0 = makeSymbolicTensor(1);
  fusion.addInput(tv0);
  // tv0[:2] introduces symbolic IterDomain
  auto tv1 = slice(
      tv0, {{fusion.zeroVal(), IrBuilder::create<Int>(2), fusion.oneVal()}});
  auto tv2 = slice(
      tv1,
      {{fusion.zeroVal(), fusion.oneVal(), fusion.oneVal()}});
  fusion.addOutput(tv2);

  fusion.print();
  /*
    %kernel {
    T1_l[ ?S2{2}rf ]
       = slice( T0_g[ iS0{i0} ], { {0, 2, 1} } )
    T2_g[ bS4{1}rf ]
       = slice( T1_l[ ?S2{2}rf ], { {0, 1, 1} } )

    TransformPrinter :
    T0_g[ iS0{i0} ]
     root domain : (iS0{i0})
     contiguity: f
     leaf domain : (iS0{i0})
    T1_l[ ?S2{2}rf ]
     root domain : (iS1{i0}rf)
      Resize: iS1{i0}rf by 0 and ( 2 - i0 ) -> ?S2{2}rf
     rfactor domain : (?S2{2}rf)
     contiguity: t
     leaf domain : (?S2{2}rf)
    T2_g[ bS4{1}rf ]
     root domain : (?S3{2}rf)
      Resize: ?S3{2}rf by 0 and ( 1 - 2 ) -> bS4{1}rf
     rfactor domain : (bS4{1}rf)
     contiguity: n
     leaf domain : (bS4{1}rf)
    }
   */

  FusionExecutorCache fusion_executor_cache(std::move(fusion_ptr));
  auto options = at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0);
  at::Tensor at0 = at::randn({5}, options);
  std::vector<c10::IValue> aten_inputs = {at0};
  auto outputs = fusion_executor_cache.runFusionWithInputs(aten_inputs);
  auto at1 = at::slice(at0, 0, 0, 2);
  auto at2 = at::slice(at1, 1, 0, 1);
  testValidate(&fusion, outputs, aten_inputs, {at2}, __LINE__, __FILE__);
}

Running this we encounter an error at runFusionWithInputs():

"The contiguity of a broadcast/reduction dimension must be None. The contiguity of a non-broadcast/reduction dimension must be true/false
Exception raised from validateContiguity at /opt/pytorch/nvfuser/csrc/ir/nodes.cpp:2679

The error occurs during concretization when creating a new TensorDomain for T2. We are replacing the old tensor domain:

root_dom: ?S3{2}rf              
rfactor_dom: bS4{1}rf           
leaf_dom: bS4{1}rf              
contig: nullopt

with a new one:

root_dom: iS7{2}rf              
rfactor_dom: iS8{1}rf           
leaf_dom: iS8{1}rf              
contig: nullopt

The IterType is being changed from Broadcast to Iteration when concretizing this slice op. The root domain ?S3{2}rf is concretized to iS5{2}rf, which is correct, but when it's propagated from root to rfactor inside DynamicTransformConcretizer::mutate(TensorView* tv) we should not overwrite each op's outputs unless it is Symbolic. Even then, it may be best to delegate this to the op instead. For now, this PR skips non-Symbolic outputs when doing this propagation.

jacobhinkle · 2023-05-25T14:36:42Z

!build

naoyam · 2023-05-25T18:36:06Z

The IterType is being changed from Broadcast to Iteration when concretizing this slice op

Why is this? Aren't we generating a broadcast domain when the extent is 1?

naoyam · 2023-05-25T18:37:45Z

I wonder why this needs to be symbolic:

root_dom: ?S3{2}rf

Its extent is known to be 2, so it's definitely not a broadcast domain, right?

jacobhinkle · 2023-05-25T20:28:41Z

Why is this? Aren't we generating a broadcast domain when the extent is 1?

Yes. Sorry I should've been more clear. The root should get concretized to Iteration, and it does. This is correct. However, currently we also propagate that to rfactor, assuming that if we updated the root then all uses up until rfactor will need an update. In this case, the rfactor is already a concrete Broadcast, so we need to avoid overwriting it.

jacobhinkle · 2023-05-25T20:30:55Z

I wonder why this needs to be symbolic:

One reason to keep it symbolic would be if we allow our slice to operate like PyTorch: i.e. if you have a size [1] tensor x, then x[:2] is size 1 (it doesn't throw an error). In that case, we would not actually know the output extent since it's the minimum between 2 and the input extent. Currently I think this case is pretty much ignored.

naoyam · 2023-05-25T21:41:21Z

Why is this? Aren't we generating a broadcast domain when the extent is 1?

Yes. Sorry I should've been more clear. The root should get concretized to Iteration, and it does. This is correct. However, currently we also propagate that to rfactor, assuming that if we updated the root then all uses up until rfactor will need an update. In this case, the rfactor is already a concrete Broadcast, so we need to avoid overwriting it.

Yes, but even if we overwrite it, shouldn't it be overwritten as a broadcast domain since the extent is 1? I agree we shouldn't overwrite it from the beginning, but I wonder if there's anything wrong in the propagation step of fusion concretization.

naoyam · 2023-05-25T21:45:28Z

I wonder why this needs to be symbolic:

One reason to keep it symbolic would be if we allow our slice to operate like PyTorch: i.e. if you have a size [1] tensor x, then x[:2] is size 1 (it doesn't throw an error). In that case, we would not actually know the output extent since it's the minimum between 2 and the input extent. Currently I think this case is pretty much ignored.

So, the ID should really be something like:

root_dom: ?S3{x}rf

where x is min(2, i0) and is determined at the concretization time. Shouldn't we change how we setup symbolic resize ops?

jacobhinkle · 2023-05-26T12:15:46Z

Shouldn't we change how we setup symbolic resize ops?

Yes, I think we probably should address this in the ops. I don't think Resize or pad have an issue, since those will always have extent in_extent + left_pad + right_pad. Similarly, I think the extent computation for cat is fine since it's based on pad.

However, with slice it's true that unless both the input extent and (start, stop, step) are constant, we have to use out_extent = (min(stop, in_extent) - start) / step. At concretization we can determine which branch of the min we're on and replace the min with its proven val. This is a dynamic resize op anyway, so it is already tracked in concretization due to the fact it may switch between Iteration and Broadcast, but I think we'd want to track the slice op itself and handle that first before the resize is concretized. Our current dynamic resize triggers a recompile when the output toggles between Broadcast and Iteration; this would additionally recompile whenever in_extent - stop changes sign, for slice.

jacobhinkle · 2023-05-26T12:24:52Z

Yes, but even if we overwrite it, shouldn't it be overwritten as a broadcast domain since the extent is 1? I agree we shouldn't overwrite it from the beginning, but I wonder if there's anything wrong in the propagation step of fusion concretization.

Yes that is a bit of an issue I think. The way it works now is we work from root to rfactor and we do not infer the "proper" iter_type at each step, we just copy the input iter_type if the intermediate ID is Symbolic. There is no expr_eval at this stage since we need the concretized Fusion to still apply for a range of input shapes/vals. So if we want to get the proper iter_types for all those intermediates, we need to do this scan when building DynamicTransformConcretizationInfo, record those intermediate iter_types in an unordered_map<IterDomain*, IterType>, then when we actually do the concretization we can just look up the iter_types we need to change. That map would include any Symbolic rfactor IDs as well.

naoyam · 2023-05-26T16:22:01Z

Shouldn't we change how we setup symbolic resize ops?

Yes, I think we probably should address this in the ops. I don't think Resize or pad have an issue, since those will always have extent in_extent + left_pad + right_pad. Similarly, I think the extent computation for cat is fine since it's based on pad.

However, with slice it's true that unless both the input extent and (start, stop, step) are constant, we have to use out_extent = (min(stop, in_extent) - start) / step. At concretization we can determine which branch of the min we're on and replace the min with its proven val. This is a dynamic resize op anyway, so it is already tracked in concretization due to the fact it may switch between Iteration and Broadcast, but I think we'd want to track the slice op itself and handle that first before the resize is concretized. Our current dynamic resize triggers a recompile when the output toggles between Broadcast and Iteration; this would additionally recompile whenever in_extent - stop changes sign, for slice.

Yeah, I kind of expected there would be some op-specific corner cases.

naoyam · 2023-05-26T16:29:43Z

Yes, but even if we overwrite it, shouldn't it be overwritten as a broadcast domain since the extent is 1? I agree we shouldn't overwrite it from the beginning, but I wonder if there's anything wrong in the propagation step of fusion concretization.

Yes that is a bit of an issue I think. The way it works now is we work from root to rfactor and we do not infer the "proper" iter_type at each step, we just copy the input iter_type if the intermediate ID is Symbolic. There is no expr_eval at this stage since we need the concretized Fusion to still apply for a range of input shapes/vals. So if we want to get the proper iter_types for all those intermediates, we need to do this scan when building DynamicTransformConcretizationInfo, record those intermediate iter_types in an unordered_map<IterDomain*, IterType>, then when we actually do the concretization we can just look up the iter_types we need to change. That map would include any Symbolic rfactor IDs as well.

Ah, I see. So, we should just skip non-symbolic IDs. For any symbolic ID, if it requires concretization with an expr eval, it should be taken care before the propagation by concreteizeReshape and concretizeResize, so if there's any remaining symbolic IDs, their actual types can just be copied from their corresponding producer IDs. Does this sound correct?

jacobhinkle · 2023-05-26T17:05:20Z

Ah, I see. So, we should just skip non-symbolic IDs. For any symbolic ID, if it requires concretization with an expr eval, it should be taken care before the propagation by concreteizeReshape and concretizeResize, so if there's any remaining symbolic IDs, their actual types can just be copied from their corresponding producer IDs. Does this sound correct?

I think that's correct. One thing I've been trying to sort out is how far we need to traverse to find the IDs that need an expr_eval to decide, before concretization. Currently this problematic code is in root->rfactor that happens during forward propagation after we have run concretize{Reshape,Resize}. So it is not necessarily being run only on the dynamic ops that we have identified. It seems unlikely that it would affect other ops but I just want to be sure. Anyway, I will take a stab at doing this conversion.

naoyam

LGTM. Thanks for the fix. Please create an issue for the slice problem.

jacobhinkle added 2 commits May 25, 2023 10:32

Add failing test

933e53f

Skip concretizing non-symbolic IDs in root->rfactor

33f4eae

jacobhinkle requested a review from naoyam May 25, 2023 14:36

jacobhinkle mentioned this pull request May 25, 2023

Transpose scheduler fails with Resize ops #398

Closed

jacobhinkle added a commit that referenced this pull request May 25, 2023

Cherry-pick from #397

73e9c58

Merge branch 'main' into overzealous_concretization_propagation

6d4ff86

Merge branch 'main' into overzealous_concretization_propagation

1633939

naoyam approved these changes May 26, 2023

View reviewed changes

Merge branch 'main' into overzealous_concretization_propagation

181f924

jacobhinkle mentioned this pull request Jun 1, 2023

Slice should clip to end of tensor #439

Closed

Merge branch 'main' into overzealous_concretization_propagation

2d869c1

jacobhinkle merged commit fd254b8 into main Jun 2, 2023

jacobhinkle deleted the overzealous_concretization_propagation branch June 2, 2023 13:39

Conversation

jacobhinkle commented May 25, 2023

Uh oh!

jacobhinkle commented May 25, 2023

Uh oh!

naoyam commented May 25, 2023

Uh oh!

naoyam commented May 25, 2023

Uh oh!

jacobhinkle commented May 25, 2023

Uh oh!

jacobhinkle commented May 25, 2023

Uh oh!

naoyam commented May 25, 2023

Uh oh!

naoyam commented May 25, 2023

Uh oh!

jacobhinkle commented May 26, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jacobhinkle commented May 26, 2023

Uh oh!

naoyam commented May 26, 2023

Uh oh!

naoyam commented May 26, 2023

Uh oh!

jacobhinkle commented May 26, 2023

Uh oh!

naoyam left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jacobhinkle commented May 26, 2023 •

edited

Loading