
Make selfReplay stricter #5221

Merged
wujingyue merged 13 commits into main from wjy/replay
Sep 30, 2025

Conversation

@wujingyue (Collaborator) commented Sep 24, 2025

With this change, ignore_reductions is computed instead of given as a knob.

@wujingyue (Collaborator Author)

!test

@github-actions bot commented Sep 24, 2025

Review updated until commit f2ed3b9

Description

  • Make selfReplay automatically handle reduction mismatches

  • Remove explicit ignore_reductions parameter

  • Improve axis type compatibility checks

  • Update call sites to omit ignore_reductions


Changes walkthrough 📝

Relevant files

Enhancement (8 files)

  • dynamic_transform.cpp: Remove ignore_reductions from selfReplay call (+1/-3)
  • fusion.cpp: Simplify selfReplay by removing parameter (+1/-3)
  • fusion_segmenter.cpp: Update selfReplay to infer reductions (+1/-2)
  • decompose_reshardings.cpp: Remove explicit ignore_reductions flag (+1/-2)
  • mark_aliases_prepare.cpp: Adjust selfReplay calls for new logic (+1/-2)
  • remove_bcast_squeeze.cpp: Simplify selfReplay usage (+1/-2)
  • reorder_sharded_axis.cpp: Update selfReplay to omit parameter (+2/-4)
  • transform_replay.h: Simplify selfReplay interface (+9/-12)

Bug fix (1 file)

  • transform_replay.cpp: Automatically detect reduction handling (+43/-40)

Tests (1 file)

  • test_replay.cpp: Update test calls to selfReplay (+2/-4)

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review

Possible Issue

The logic for determining ignore_reductions based on logical domain size differences may incorrectly remove reduction dimensions when the size mismatch is due to reasons other than reductions, leading to incorrect domain mappings.

bool ignore_reductions = logical.size() != new_logical.size();
if (logical.size() > new_logical.size()) {
  logical = TensorDomain::noReductions(logical);
} else if (logical.size() < new_logical.size()) {
  new_logical = TensorDomain::noReductions(new_logical);
}
NVF_ERROR_EQ(logical.size(), new_logical.size());
Logic Error

The comparison between loop and self->logical() uses the original self->loop() instead of the filtered logical vector, which may lead to incorrect replay behavior when reduction domains are involved.

if (loop != self->logical()) {
Logic Error

The allocation domain replay uses self->allocation() instead of the filtered logical domains, which may cause mismatches when reduction dimensions are present in one domain but not the other.

const std::vector<IterDomain*>& allocation = self->allocation();

@wujingyue (Collaborator Author)

!test

Otherwise,
thunder/tests/test_grad.py::test_phantom_grad_vs_torch_consistency_outer_nvfuser_cuda_thunder::dtypes::float32
fails.
@wujingyue (Collaborator Author)

!test

@wujingyue wujingyue changed the title from "Smoke test for selfReplay" to "Make selfReplay stricter" Sep 24, 2025
@wujingyue (Collaborator Author)

!test

@wujingyue (Collaborator Author)

!test

@wujingyue wujingyue marked this pull request as ready for review September 25, 2025 05:43
@wujingyue (Collaborator Author)

!test

@Priya2698 (Collaborator)

@wujingyue do you plan on refactoring this based on yesterday's discussion?

@wujingyue (Collaborator Author)

@wujingyue do you plan on refactoring this based on yesterday's discussion?

This PR is orthogonal to that.

@Priya2698 (Collaborator)

API-wise, keeping the knob seems simpler to me than inferring it in the program.

For example:

self = [i0, i1, r0, r1]
new_self = [i0', i1', r2]

The current changes will raise an error here. Keeping the knob is simpler: selfReplay doesn't have to decide whether the reduction IDs should be considered mapped; the caller can specify that based on context.

For the original case in discussion #5177 (comment):

  1. We could assert on IDs having the same itertype if mapped.
  2. Or, if a non-reduction ID is mapped to a reduction ID, allocation contiguity can be set to false if the replayed ID is non-reduction.

@wujingyue (Collaborator Author)

For example:
self = [i0, i1, r0, r1]
new_self = [i0', i1', r2]
The current changes will raise an error.

Yes, this is intentional. All tests passing means we don't have a use case for this. (We might in the future but YAGNI.) Therefore, removing the knob is strictly simpler because the function has fewer parameters and the user has fewer chances to make mistakes.

  1. We could assert on IDs having the same itertype if mapped.
  2. Or, if a non-reduction ID is mapped to a reduction ID, allocation contiguity can be set to false if the replayed ID is non-reduction.

Good ideas! Yes, I'll add these checks in this PR.

@wujingyue (Collaborator Author)

!test

@wujingyue (Collaborator Author)

Good ideas! Yes, I'll add these checks in this PR.

Done

@naoyam (Collaborator) left a comment


I'm ambivalent about this change. On one hand, I agree that this version would be easier to use because there's no option and its functionality is sufficient. On the other hand, automatic removal of reduction IDs is not something we do commonly, so I consider it a deviation from the common behavior. Also, since this is a common building block that could be used throughout the whole system, I'd generally prefer an explicit and verbose interface over an implicit and smarter one.

@wujingyue (Collaborator Author)

On the other hand, automatic removal of reduction IDs is not something we do commonly

The current code in fact uses ignore_reductions=true more often than false:

// In practice, `ignore_reductions=true` is used more often than `false`.

I'd generally prefer an explicit and verbose interface over an implicit and smarter interface.

Thanks! I hear your concerns. I'll keep those in mind and revert this change when we run into the potential problems you suggested.

@wujingyue (Collaborator Author)

!test

Comment on lines +256 to +263
if (logical.size() > new_logical.size()) {
logical = TensorDomain::noReductions(logical);
ignore_reductions = true;
} else if (logical.size() < new_logical.size()) {
new_logical = TensorDomain::noReductions(new_logical);
ignore_reductions = true;
} else {
ignore_reductions = false;
Collaborator

Why not -> if same size, ignore_reductions=false, else, true

Collaborator Author

Done

Collaborator

My earlier comment was intended as:

if (logical.size() != new_logical.size()) {
  logical = TensorDomain::noReductions(logical);
  new_logical = TensorDomain::noReductions(new_logical);
  ignore_reductions = true;
} else {
  ignore_reductions = false;
}

Sorry about the ambiguity. The current implementation still conditionally clears reduction axes from only one of logical and new_logical; I meant that we should clear all reduction axes from both sides whenever the sizes are unequal.

Collaborator Author

Got it. It's related to #5221 (comment). This PR makes the contract stricter: when the sizes don't match, the extra IterDomains, and only them, are reductions.

Therefore, this PR raises an NVF_ERROR on the following example

self = [i0, i1, r0, r1]
new_self = [i0', i1', r2]

because the expected behavior is unclear: if r0 and r1 are loop-transformed, should we transform r2 like r0, like r1, or not transform it at all? (Recall that when the sizes do match, we map and replay reductions as well.)

@wujingyue (Collaborator Author)

!test

@wujingyue (Collaborator Author)

!test

@wujingyue wujingyue merged commit 06f565d into main Sep 30, 2025
49 of 51 checks passed
@wujingyue wujingyue deleted the wjy/replay branch September 30, 2025 04:48
4 participants