Schedule loop domains such that reshape transforms are cancelled#3679
Merged
Conversation
naoyam (Author) commented:
!test
naoyam (Author) commented:
!test
naoyam commented on Jan 9, 2025:
      const std::vector<TensorView*>& tvs,
-     Expr* transform);
+     Expr* transform,
+     Direction replay_dir = Direction::Undefined);
naoyam (Author) replied:
Added an optional parameter to restrict the direction.
naoyam (Author) commented:
!test
jacobhinkle approved these changes on Jan 10, 2025, and left a comment:
LGTM. Just one small clarifying question.
// as t0, which could minimize strided accesses.
//
// This scheduling is not always feasible. Specifically, if a reshape
// output iter domain is resized, the loop domain needs to keep using
jacobhinkle:
Could you give an example of what the output would look like if some transforms can't be cancelled? For example, if we had a further tensor

// t4 = pad(t3) // [i1, i0*i2 + 2]

then we cannot cancel anything. Presumably, if there is a decomposition of the reshape where one part is cancellable and another part is not, we will cancel the part that is possible to cancel.
naoyam (Author) replied:
How about this test?
The reshape for tv1 is not cancelled as the following slice depends on it, whereas the tv3 reshape is cancelled.
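To make the distinction concrete, here is a toy sketch in plain Python (the `cancellable` helper is hypothetical and not part of nvFuser's API) of the rule being discussed: a reshape can be cancelled in the loop domain only if no downstream op, such as a slice or pad, resizes its output iter domains.

```python
# Toy model (hypothetical helper, not nvFuser's API): decide whether a
# reshape's loop-domain cancellation is legal, given the ops that consume
# the reshape's output.

def cancellable(ops_after_reshape):
    """A reshape can be cancelled only if no later op resizes
    (slices or pads) the reshape's output iter domains."""
    return not any(op in ("slice", "pad") for op in ops_after_reshape)

# tv1's reshape is followed by a slice that depends on it -> keep the reshape.
print(cancellable(["slice"]))  # False
# tv3's reshape has no resizing consumer -> cancel it.
print(cancellable([]))  # True
```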
naoyam (Author) commented:
!build
naoyam (Author) commented:
!build
naoyam added a commit that referenced this pull request on Jan 15, 2025:
Depends on #3674, #3675, #3679. Reorder tensors to align with the largest input. This should improve memory accesses by minimizing strides. Store throughput may be lowered, but it is generally more important to optimize load accesses. I do not have actual performance results for this change; I just remember this was effective in some cases while manually trying out different optimization strategies. We may eventually need to enable or disable this reordering with some heuristic.
naoyam added a commit that referenced this pull request on Jul 25, 2025:
…4823) I decided to disable the cancellation of reshape in the resize scheduler. It was originally added in #3679. It results in about a 10% perf regression in the RoPE benchmarks (http://nv/eO-). The optimization should be re-enabled, but rather than ad-hoc patching, I feel we should investigate fixing the root cause of the issue, which is cycles in the exact graph. Tracking issue: #4839
nsarka pushed a commit to nsarka/Fuser that referenced this pull request on Jul 28, 2025 (same message as the commit above, with NVIDIA#-prefixed references).
This PR adds a scheduling primitive, cancelReshapeInLoopDomains(TensorView* from_tv), which effectively cancels, in their loop domains, all reshape transforms appearing between from_tv and fusion outputs. Please see the comment for a motivating example. This could be used to remove the restriction on interfering reshapes in reduction/normalization fusions.
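As a rough illustration of the idea (a toy model only: the `Tensor` class and function names below are hypothetical, not nvFuser's actual API), cancelling a reshape in a consumer's loop domain means scheduling the consumer to iterate over the producer's pre-reshape iter domains, while its logical (root) domain still reflects the reshape output:

```python
# Toy model of "reshape cancellation" in loop domains.
# All names are hypothetical; this is not the nvFuser API.

class Tensor:
    def __init__(self, root_domains, producer=None):
        self.root = list(root_domains)  # logical domains, e.g. ["i0", "i1"]
        self.loop = list(root_domains)  # loop domain initially mirrors root
        self.producer = producer

def reshape_merge_all(tv, merged_name):
    # Model a reshape that merges all of tv's domains into one output domain.
    return Tensor([merged_name], producer=tv)

def cancel_reshape_in_loop_domain(tv):
    # "Cancel" the reshape for scheduling purposes: the consumer's loop
    # domain reuses the producer's (pre-reshape) iter domains, so the
    # generated loop nest matches the producer's layout.
    tv.loop = list(tv.producer.root)

t0 = Tensor(["i0", "i1"])
t1 = reshape_merge_all(t0, "i0*i1")   # t1 = reshape(t0) -> [i0*i1]
cancel_reshape_in_loop_domain(t1)
print(t1.root)  # ['i0*i1']    -- the logical shape is unchanged
print(t1.loop)  # ['i0', 'i1'] -- but t1 is iterated like t0
```

The point of the sketch is only the split between the two domains: the reshape still exists logically, but the loop structure no longer pays for it.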