
Make selfReplay stricter #5221

Merged
wujingyue merged 13 commits into main from wjy/replay
Sep 30, 2025

Conversation

@wujingyue (Collaborator) commented Sep 24, 2025

With this change, ignore_reductions is computed instead of given as a knob.

@wujingyue (Collaborator Author)

!test

@github-actions bot commented Sep 24, 2025

Review updated until commit f2ed3b9

Description

  • Make selfReplay automatically handle reduction mismatches

  • Remove explicit ignore_reductions parameter

  • Improve axis type compatibility checks

  • Update call sites to omit ignore_reductions


Changes walkthrough 📝

Relevant files

Enhancement (8 files)

  • dynamic_transform.cpp: Remove ignore_reductions from selfReplay call (+1/-3)
  • fusion.cpp: Simplify selfReplay by removing parameter (+1/-3)
  • fusion_segmenter.cpp: Update selfReplay to infer reductions (+1/-2)
  • decompose_reshardings.cpp: Remove explicit ignore_reductions flag (+1/-2)
  • mark_aliases_prepare.cpp: Adjust selfReplay calls for new logic (+1/-2)
  • remove_bcast_squeeze.cpp: Simplify selfReplay usage (+1/-2)
  • reorder_sharded_axis.cpp: Update selfReplay to omit parameter (+2/-4)
  • transform_replay.h: Simplify selfReplay interface (+9/-12)

Bug fix (1 file)

  • transform_replay.cpp: Automatically detect reduction handling (+43/-40)

Tests (1 file)

  • test_replay.cpp: Update test calls to selfReplay (+2/-4)

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review

Possible Issue

The logic for determining ignore_reductions based on logical domain size differences may incorrectly remove reduction dimensions when the size mismatch is due to reasons other than reductions, leading to incorrect domain mappings.

bool ignore_reductions = logical.size() != new_logical.size();
if (logical.size() > new_logical.size()) {
  logical = TensorDomain::noReductions(logical);
} else if (logical.size() < new_logical.size()) {
  new_logical = TensorDomain::noReductions(new_logical);
}
NVF_ERROR_EQ(logical.size(), new_logical.size());
Logic Error

The comparison between loop and self->logical() uses the original self->loop() instead of the filtered logical vector, which may lead to incorrect replay behavior when reduction domains are involved.

if (loop != self->logical()) {
Logic Error

The allocation domain replay uses self->allocation() instead of the filtered logical domains, which may cause mismatches when reduction dimensions are present in one domain but not the other.

const std::vector<IterDomain*>& allocation = self->allocation();

@wujingyue (Collaborator Author)

!test

Otherwise,
thunder/tests/test_grad.py::test_phantom_grad_vs_torch_consistency_outer_nvfuser_cuda_thunder::dtypes::float32
fails.
@wujingyue (Collaborator Author)

!test

@wujingyue wujingyue changed the title from "Smoke test for selfReplay" to "Make selfReplay stricter" Sep 24, 2025
@wujingyue (Collaborator Author)

!test

@wujingyue (Collaborator Author)

!test

@wujingyue wujingyue marked this pull request as ready for review September 25, 2025 05:43
@wujingyue (Collaborator Author)

!test

@Priya2698 (Collaborator)

@wujingyue do you plan on refactoring this based on yesterday's discussion?

@wujingyue (Collaborator Author)

@wujingyue do you plan on refactoring this based on yesterday's discussion?

This PR is orthogonal to that.

@Priya2698 (Collaborator)

API-wise, keeping the knob seems simpler to me than inferring it in the program.

For example:

self = [i0, i1, r0, r1]
new_self = [i0', i1', r2]

The current changes will raise an error here. Keeping the knob is simpler: selfReplay doesn't have to decide whether the reduction IDs should be considered mapped; the caller can specify that based on context.

For the original case in discussion #5177 (comment):

  1. We could assert on IDs having the same itertype if mapped.
  2. Or, if a non-reduction ID is mapped to a reduction ID, allocation contiguity can be set to false if the replayed ID is non-reduction.

@wujingyue (Collaborator Author)

For example:
self = [i0, i1, r0, r1]
new_self = [i0', i1', r2]
The current changes will raise an error.

Yes, this is intentional. All tests passing means we don't have a use case for this. (We might in the future but YAGNI.) Therefore, removing the knob is strictly simpler because the function has fewer parameters and the user has fewer chances to make mistakes.

  1. We could assert on IDs having the same itertype if mapped.
  2. Or, if a non-reduction ID is mapped to a reduction ID, allocation contiguity can be set to false if the replayed ID is non-reduction.

Good ideas! Yes, I'll add these checks in this PR.

@wujingyue (Collaborator Author)

!test

@wujingyue (Collaborator Author)

Good ideas! Yes, I'll add these checks in this PR.

Done

@naoyam (Collaborator) left a comment


I'm ambivalent about this change. On one hand, I agree that this version would be easier to use because there's no option and its functionality is sufficient. On the other hand, automatic removal of reduction IDs is not something we do commonly, so I consider it a deviation from the common behavior. Also, since this is a common building block that could be used throughout the whole system, I'd generally prefer an explicit and verbose interface over an implicit and smarter one.

@wujingyue (Collaborator Author)

On the other hand, automatic removal of reduction IDs is not something we do commonly

The current code in fact uses ignore_reductions=true more often than false:

// In practice, `ignore_reductions=true` is used more often than `false`.

I'd generally prefer an explicit and verbose interface over an implicit and smarter interface.

Thanks! I hear your concerns. I'll keep those in mind and revert this change when we run into the potential problems you suggested.

@wujingyue (Collaborator Author)

!test

Comment on lines +256 to +263
if (logical.size() > new_logical.size()) {
logical = TensorDomain::noReductions(logical);
ignore_reductions = true;
} else if (logical.size() < new_logical.size()) {
new_logical = TensorDomain::noReductions(new_logical);
ignore_reductions = true;
} else {
ignore_reductions = false;
Collaborator

Why not -> if same size, ignore_reductions=false, else, true

Collaborator Author

Done

Collaborator

My earlier comment was intended as:

if (logical.size() != new_logical.size()) {
  logical = TensorDomain::noReductions(logical);
  new_logical = TensorDomain::noReductions(new_logical);
  ignore_reductions = true;
} else {
  ignore_reductions = false;
}

Sorry about the ambiguity. The current implementation still conditionally clears reduction axes from only one of logical and new_logical; I meant that we should clear all reduction axes from both sides whenever the sizes are unequal.

Collaborator Author

Got it. It's related to #5221 (comment). This PR makes the contract stricter: when the sizes don't match, the extra IterDomains, and only them, are reductions.

Therefore, this PR raises an NVF_ERROR on the following example

self = [i0, i1, r0, r1]
new_self = [i0', i1', r2]

because the expected behavior is unclear: if r0 and r1 are loop-transformed, should we transform r2 like r0, like r1, or not transform it at all? (Recall that when the sizes do match, we map and replay reductions as well.)

@wujingyue (Collaborator Author)

!test

@wujingyue (Collaborator Author)

!test

@wujingyue wujingyue merged commit 06f565d into main Sep 30, 2025
49 of 51 checks passed
@wujingyue wujingyue deleted the wjy/replay branch September 30, 2025 04:48
4 participants