
Schedule loop domains such that reshape transforms are cancelled #3679

Merged
naoyam merged 7 commits into main from cancel_reshape on Jan 10, 2025

Conversation

@naoyam
Collaborator

@naoyam naoyam commented Jan 8, 2025

This PR adds a scheduling primitive, cancelReshapeInLoopDomains(TensorView* from_tv), which schedules the loop domains of tensors between from_tv and the fusion outputs so that all reshape transforms appearing along those paths are effectively cancelled. Please see the comment for a motivating example.

This could be used to lift the restriction on interfering reshapes in reduction/normalization fusions.
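
For illustration, here is a minimal sketch of how the primitive might be invoked. This is not code from the PR: makeConcreteTensor is the nvFuser test-suite helper, and the scheduler_tools namespace qualification is an assumption based on where the other loop-domain scheduling utilities live.

```cpp
// Sketch only; assumes the nvFuser C++ test API.
Fusion fusion;
FusionGuard fg(&fusion);

// t0: [4, 8]
auto tv0 = makeConcreteTensor({4, 8});
fusion.addInput(tv0);

// t1 = reshape(t0): [32]
auto tv1 = reshape(tv0, {4, 8}, {32});
auto tv2 = add(tv1, tv1);
fusion.addOutput(tv2);

// Reschedule the loop domains of tv1 and tv2 so that the reshape is
// effectively cancelled: both keep loop domains equivalent to tv0's
// [4, 8] logical domain, which can minimize strided accesses.
scheduler_tools::cancelReshapeInLoopDomains(tv0);
```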

@naoyam
Collaborator Author

naoyam commented Jan 8, 2025

!test

@naoyam
Collaborator Author

naoyam commented Jan 9, 2025

!test

@naoyam naoyam marked this pull request as ready for review January 9, 2025 02:01
@naoyam naoyam requested a review from jacobhinkle January 9, 2025 02:01
     const std::vector<TensorView*>& tvs,
-    Expr* transform);
+    Expr* transform,
+    Direction replay_dir = Direction::Undefined);
Collaborator Author

Added an optional parameter to restrict the direction.
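
As a usage sketch (hedged: tv1, tv2, and transform are placeholders, and Direction is assumed to be the enum from nvFuser's BFS utilities, with Undefined, Forward, and Backward as its values):

```cpp
// Replay `transform` onto the loop domains of tv1 and tv2, restricting
// the replay to the forward direction. The default Direction::Undefined
// leaves the direction unrestricted, preserving the previous behavior.
scheduler_tools::scheduleLoopDomainsBy({tv1, tv2}, transform, Direction::Forward);
```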

@naoyam
Collaborator Author

naoyam commented Jan 9, 2025

!test

@jacobhinkle
Collaborator

jacobhinkle left a comment

LGTM. Just one small clarifying question.

// as t0, which could minimize strided accesses.
//
// This scheduling is not always feasible. Specifically, if a reshape
// output iter domain is resized, the loop domain needs to keep using
Collaborator

Could you give an example of what the output would look like if some transforms can't be cancelled? For example, if we had a further tensor

// t4 = pad(t3) // [i1, i0*i2 + 2]

then we cannot cancel anything. Presumably, if there is a decomposition of the reshape where one part is cancellable and another is not, we will cancel the part that can be cancelled.

Collaborator Author

How about this test?

https://github.com/NVIDIA/Fuser/pull/3679/files#diff-add3baa66fa88dd28b1baec00ec023373d88630908bff6583a0a4d61379e17cbR705

The reshape for tv1 is not cancelled, as the following slice depends on it, whereas the tv3 reshape is cancelled.
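
Paraphrasing that test as a sketch (shapes and exact ops are illustrative, not copied from the test; the slice overload taking start/stop index vectors is assumed):

```cpp
auto tv0 = makeConcreteTensor({16});
fusion.addInput(tv0);

auto tv1 = reshape(tv0, {16}, {4, 4});
// The slice resizes the reshape-output iter domains, so the tv1
// reshape cannot be cancelled; tv1's loop domain must keep the
// reshaped iter domains.
auto tv2 = slice(tv1, {0, 0}, {4, 2});
// No resize depends on this reshape, so it can be cancelled.
auto tv3 = reshape(tv2, {4, 2}, {8});
fusion.addOutput(tv3);

// Only the tv3 reshape is cancelled; tv3's loop domain is scheduled
// to follow tv2's logical domain.
scheduler_tools::cancelReshapeInLoopDomains(tv0);
```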

@naoyam
Collaborator Author

naoyam commented Jan 10, 2025

!build

@naoyam
Collaborator Author

naoyam commented Jan 10, 2025

!build

@naoyam naoyam merged commit 05ec62b into main Jan 10, 2025
6 checks passed
@naoyam naoyam deleted the cancel_reshape branch January 10, 2025 18:56
naoyam added a commit that referenced this pull request Jan 15, 2025
Depends on #3674, #3675, #3679

Reorder tensors to align with the largest input. This should improve
memory accesses by minimizing strides. Store throughputs may be lowered,
but it should generally be more important to optimize load accesses.

I do not have actual performance results for this change; I just remember it was effective in some cases while manually trying out different optimization strategies. We may eventually need to enable or disable this reordering based on some heuristic.
naoyam added a commit that referenced this pull request Jul 25, 2025
…4823)

I decided to disable the cancellation of reshape in the resize
scheduler. It was originally added in #3679.

It results in about a 10% perf regression in the RoPE benchmarks (http://nv/eO-).

The optimization should be re-enabled, but rather than ad-hoc patching, I feel we should investigate fixing the root cause of the issue, which is cycles in the exact graph. Tracking issue: #4839
nsarka pushed a commit to nsarka/Fuser that referenced this pull request Jul 28, 2025