Skip to content

Fix overzealous concretization root->rfactor propagation#397

Merged
jacobhinkle merged 6 commits intomainfrom
overzealous_concretization_propagation
Jun 2, 2023
Merged

Fix overzealous concretization root->rfactor propagation#397
jacobhinkle merged 6 commits intomainfrom
overzealous_concretization_propagation

Conversation

@jacobhinkle
Copy link
Collaborator

Repro:

TEST_F(NVFuserTest, TMP) {
  std::unique_ptr<Fusion> fusion_ptr = std::make_unique<Fusion>();
  Fusion& fusion = *fusion_ptr.get();
  FusionGuard fg(fusion_ptr.get());
  auto tv0 = makeSymbolicTensor(1);
  fusion.addInput(tv0);
  // tv0[:2] introduces symbolic IterDomain
  auto tv1 = slice(
      tv0, {{fusion.zeroVal(), IrBuilder::create<Int>(2), fusion.oneVal()}});
  auto tv2 = slice(
      tv1,
      {{fusion.zeroVal(), fusion.oneVal(), fusion.oneVal()}});
  fusion.addOutput(tv2);

  fusion.print();
  /*
    %kernel {
    T1_l[ ?S2{2}rf ]
       = slice( T0_g[ iS0{i0} ], { {0, 2, 1} } )
    T2_g[ bS4{1}rf ]
       = slice( T1_l[ ?S2{2}rf ], { {0, 1, 1} } )

    TransformPrinter :
    T0_g[ iS0{i0} ]
     root domain : (iS0{i0})
     contiguity: f
     leaf domain : (iS0{i0})
    T1_l[ ?S2{2}rf ]
     root domain : (iS1{i0}rf)
      Resize: iS1{i0}rf by 0 and ( 2 - i0 ) -> ?S2{2}rf
     rfactor domain : (?S2{2}rf)
     contiguity: t
     leaf domain : (?S2{2}rf)
    T2_g[ bS4{1}rf ]
     root domain : (?S3{2}rf)
      Resize: ?S3{2}rf by 0 and ( 1 - 2 ) -> bS4{1}rf
     rfactor domain : (bS4{1}rf)
     contiguity: n
     leaf domain : (bS4{1}rf)
    }
   */

  FusionExecutorCache fusion_executor_cache(std::move(fusion_ptr));
  auto options = at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0);
  at::Tensor at0 = at::randn({5}, options);
  std::vector<c10::IValue> aten_inputs = {at0};
  auto outputs = fusion_executor_cache.runFusionWithInputs(aten_inputs);
  auto at1 = at::slice(at0, 0, 0, 2);
  auto at2 = at::slice(at1, 1, 0, 1);
  testValidate(&fusion, outputs, aten_inputs, {at2}, __LINE__, __FILE__);
}

Running this we encounter an error at runFusionWithInputs():

"The contiguity of a broadcast/reduction dimension must be None. The contiguity of a non-broadcast/reduction dimension must be true/false
Exception raised from validateContiguity at /opt/pytorch/nvfuser/csrc/ir/nodes.cpp:2679

The error occurs during concretization when creating a new TensorDomain for T2. We are replacing the old tensor domain:

root_dom: ?S3{2}rf              
rfactor_dom: bS4{1}rf           
leaf_dom: bS4{1}rf              
contig: nullopt

with a new one:

root_dom: iS7{2}rf              
rfactor_dom: iS8{1}rf           
leaf_dom: iS8{1}rf              
contig: nullopt

The IterType is being changed from Broadcast to Iteration when concretizing this slice op. The root domain ?S3{2}rf is concretized to iS5{2}rf, which is correct, but when it's propagated from root to rfactor inside DynamicTransformConcretizer::mutate(TensorView* tv) we should not overwrite each op's outputs unless it is Symbolic. Even then, it may be best to delegate this to the op instead. For now, this PR skips non-Symbolic outputs when doing this propagation.

@jacobhinkle jacobhinkle requested a review from naoyam May 25, 2023 14:36
@jacobhinkle
Copy link
Collaborator Author

!build

jacobhinkle added a commit that referenced this pull request May 25, 2023
@naoyam
Copy link
Collaborator

naoyam commented May 25, 2023

The IterType is being changed from Broadcast to Iteration when concretizing this slice op

Why is this? Aren't we generating a broadcast domain when the extent is 1?

@naoyam
Copy link
Collaborator

naoyam commented May 25, 2023

I wonder why this needs to be symbolic:

root_dom: ?S3{2}rf

Its extent is known to be 2, so it's definitely not a broadcast domain, right?

@jacobhinkle
Copy link
Collaborator Author

Why is this? Aren't we generating a broadcast domain when the extent is 1?

Yes. Sorry I should've been more clear. The root should get concretized to Iteration, and it does. This is correct. However, currently we also propagate that to rfactor, assuming that if we updated the root then all uses up until rfactor will need an update. In this case, the rfactor is already a concrete Broadcast, so we need to avoid overwriting it.

@jacobhinkle
Copy link
Collaborator Author

I wonder why this needs to be symbolic:

One reason to keep it symbolic would be if we allow our slice to operate like PyTorch: i.e. if you have a size [1] tensor x, then x[:2] is size 1 (it doesn't throw an error). In that case, we would not actually know the output extent since it's the minimum between 2 and the input extent. Currently I think this case is pretty much ignored.

@naoyam
Copy link
Collaborator

naoyam commented May 25, 2023

Why is this? Aren't we generating a broadcast domain when the extent is 1?

Yes. Sorry I should've been more clear. The root should get concretized to Iteration, and it does. This is correct. However, currently we also propagate that to rfactor, assuming that if we updated the root then all uses up until rfactor will need an update. In this case, the rfactor is already a concrete Broadcast, so we need to avoid overwriting it.

Yes, but even if we overwrite it, shouldn't it be overwritten as a broadcast domain since the extent is 1? I agree we shouldn't overwrite it from the beginning, but I wonder if there's anything wrong in the propagation step of fusion concretization.

@naoyam
Copy link
Collaborator

naoyam commented May 25, 2023

I wonder why this needs to be symbolic:

One reason to keep it symbolic would be if we allow our slice to operate like PyTorch: i.e. if you have a size [1] tensor x, then x[:2] is size 1 (it doesn't throw an error). In that case, we would not actually know the output extent since it's the minimum between 2 and the input extent. Currently I think this case is pretty much ignored.

So, the ID should really be something like:

root_dom: ?S3{x}rf

where x is min(2, i0) and is determined at the concretization time. Shouldn't we change how we setup symbolic resize ops?

@jacobhinkle
Copy link
Collaborator Author

jacobhinkle commented May 26, 2023

Shouldn't we change how we setup symbolic resize ops?

Yes, I think we probably should address this in the ops. I don't think Resize or pad have an issue, since those will always have extent in_extent + left_pad + right_pad. Similarly, I think the extent computation for cat is fine since it's based on pad.

However, with slice it's true that unless both the input extent and (start, stop, step) are constant, we have to use out_extent = (min(stop, in_extent) - start) / step. At concretization we can determine which branch of the min we're on and replace the min with its proven val. This is a dynamic resize op anyway, so it is already tracked in concretization due to the fact it may switch between Iteration and Broadcast, but I think we'd want to track the slice op itself and handle that first before the resize is concretized. Our current dynamic resize triggers a recompile when the output toggles between Broadcast and Iteration; this would additionally recompile whenever in_extent - stop changes sign, for slice.

@jacobhinkle
Copy link
Collaborator Author

Yes, but even if we overwrite it, shouldn't it be overwritten as a broadcast domain since the extent is 1? I agree we shouldn't overwrite it from the beginning, but I wonder if there's anything wrong in the propagation step of fusion concretization.

Yes that is a bit of an issue I think. The way it works now is we work from root to rfactor and we do not infer the "proper" iter_type at each step, we just copy the input iter_type if the intermediate ID is Symbolic. There is no expr_eval at this stage since we need the concretized Fusion to still apply for a range of input shapes/vals. So if we want to get the proper iter_types for all those intermediates, we need to do this scan when building DynamicTransformConcretizationInfo, record those intermediate iter_types in an unordered_map<IterDomain*, IterType>, then when we actually do the concretization we can just look up the iter_types we need to change. That map would include any Symbolic rfactor IDs as well.

@naoyam
Copy link
Collaborator

naoyam commented May 26, 2023

Shouldn't we change how we setup symbolic resize ops?

Yes, I think we probably should address this in the ops. I don't think Resize or pad have an issue, since those will always have extent in_extent + left_pad + right_pad. Similarly, I think the extent computation for cat is fine since it's based on pad.

However, with slice it's true that unless both the input extent and (start, stop, step) are constant, we have to use out_extent = (min(stop, in_extent) - start) / step. At concretization we can determine which branch of the min we're on and replace the min with its proven val. This is a dynamic resize op anyway, so it is already tracked in concretization due to the fact it may switch between Iteration and Broadcast, but I think we'd want to track the slice op itself and handle that first before the resize is concretized. Our current dynamic resize triggers a recompile when the output toggles between Broadcast and Iteration; this would additionally recompile whenever in_extent - stop changes sign, for slice.

Yeah, I kind of expected there would be some op-specific corner cases.

@naoyam
Copy link
Collaborator

naoyam commented May 26, 2023

Yes, but even if we overwrite it, shouldn't it be overwritten as a broadcast domain since the extent is 1? I agree we shouldn't overwrite it from the beginning, but I wonder if there's anything wrong in the propagation step of fusion concretization.

Yes that is a bit of an issue I think. The way it works now is we work from root to rfactor and we do not infer the "proper" iter_type at each step, we just copy the input iter_type if the intermediate ID is Symbolic. There is no expr_eval at this stage since we need the concretized Fusion to still apply for a range of input shapes/vals. So if we want to get the proper iter_types for all those intermediates, we need to do this scan when building DynamicTransformConcretizationInfo, record those intermediate iter_types in an unordered_map<IterDomain*, IterType>, then when we actually do the concretization we can just look up the iter_types we need to change. That map would include any Symbolic rfactor IDs as well.

Ah, I see. So, we should just skip non-symbolic IDs. For any symbolic ID, if it requires concretization with an expr eval, it should be taken care before the propagation by concreteizeReshape and concretizeResize, so if there's any remaining symbolic IDs, their actual types can just be copied from their corresponding producer IDs. Does this sound correct?

@jacobhinkle
Copy link
Collaborator Author

Ah, I see. So, we should just skip non-symbolic IDs. For any symbolic ID, if it requires concretization with an expr eval, it should be taken care before the propagation by concreteizeReshape and concretizeResize, so if there's any remaining symbolic IDs, their actual types can just be copied from their corresponding producer IDs. Does this sound correct?

I think that's correct. One thing I've been trying to sort out is how far we need to traverse to find the IDs that need an expr_eval to decide, before concretization. Currently this problematic code is in root->rfactor that happens during forward propagation after we have run concretize{Reshape,Resize}. So it is not necessarily being run only on the dynamic ops that we have identified. It seems unlikely that it would affect other ops but I just want to be sure. Anyway, I will take a stab at doing this conversion.

Copy link
Collaborator

@naoyam naoyam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the fix. Please create an issue for the slice problem.

@jacobhinkle jacobhinkle merged commit fd254b8 into main Jun 2, 2023
@jacobhinkle jacobhinkle deleted the overzealous_concretization_propagation branch June 2, 2023 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants