Conversation
Collaborator
Author
|
!build |
wujingyue
added a commit
that referenced
this pull request
Dec 9, 2023
This prepares #1478 for landing. We'll need to use AliasAnalysis before segmentation and during scheduling, so it's good to move common logic into AliasAnalysis.
ec54056 to
2ee42e0
Compare
3b37c9c to
0c42bb4
Compare
wujingyue
commented
Dec 11, 2023
| ExprSegmentationSorter sorter(fusion); | ||
| sorter.sort(); | ||
| auto sorted_exprs = sorter.getExprs(); | ||
| NVF_ERROR( |
Collaborator
Author
There was a problem hiding this comment.
FYI, ExprSegmentationSorter skips generating code for output aliases, so it's valid to have an empty expression list.
9d89ba8 to
e62714d
Compare
Collaborator
Author
|
!build |
wujingyue
added a commit
that referenced
this pull request
Dec 11, 2023
This prepares #1478 for landing. We'll need to use AliasAnalysis before segmentation and during scheduling, so it's good to move common logic into AliasAnalysis.
wujingyue
added a commit
that referenced
this pull request
Dec 11, 2023
This prepares #1478 for landing. We'll need to use AliasAnalysis before segmentation and during scheduling, so it's good to move common logic into AliasAnalysis.
jjsjann123
reviewed
Dec 11, 2023
Collaborator
jjsjann123
left a comment
There was a problem hiding this comment.
errr. code diff looks wrong on github GUI. I'll try to do a local diff and review that way.
Collaborator
|
Ha I think it's just the conflicts that's making it extra confusing to github. @wujingyue would you mind resolving that first and I'll review afterwards 🙇 |
wujingyue
added a commit
that referenced
this pull request
Dec 12, 2023
Currently, `MarkAliasPass` skips an output when its allocation domain is set at all. This PR makes it smarter to check compliance. This PR prepares #1478 for landing. Before segmentation, we'll run alias analysis on the un-segmented fusion and mark layouts for aliasing. During scheduling, we'll run alias analysis again on segmented fusions and have to recognize the preferred layout is compliant with the previously marked layout.
Collaborator
Author
|
!build |
jjsjann123
approved these changes
Dec 12, 2023
| // `contiguity=[f,f]` but not vice versa. | ||
| // `contiguity=[f,f]` but not vice versa. As a special case, | ||
| // an empty `required.allocation` indicates no requirements, i.e., the method | ||
| // always returns true. |
Collaborator
There was a problem hiding this comment.
Thanks for the nice comment here!
| fusion->aliasOutputToInput( | ||
| out, | ||
| // NOLINTNEXTLINE(cppcoreguidelines-pro-type-const-cast) | ||
| const_cast<Val*>(in), |
Collaborator
There was a problem hiding this comment.
nitpick, not your problem, but I should probably have updated the signature in aliasOutputToInput to take const Val* instead.
Collaborator
Author
|
!build |
jacobhinkle
pushed a commit
that referenced
this pull request
Dec 15, 2023
This is one of the many steps to solve #1401. Currently, we run alias analysis and mark aliases before segmentation. This can't handle the output-to-output alias patterns we observed in the nanogpt2 benchmark, because some schedulers fork outputs and therefore break the aliasing chain observed before segmentation (and thus before scheduling). #1401 (comment) shows an example for curious minds. This PR changes the algorithm so it runs alias analysis both before segmentation and during scheduling. The former is used to change layouts to enable aliases, and the latter is used to detect and mark aliases in each segment. This allows segmentation to split out meta-op-only regions so the rest is easier to optimize. This PR doesn't solve the aforementioned output-to-output alias patterns. I'll need to add logic to segment out meta-op-only output-to-output regions in later PRs.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is one of the many steps to solve #1401.
Currently, we run alias analysis and mark aliases before segmentation. This can't handle the output-to-output alias patterns we observed in the nanogpt2 benchmark, because some schedulers fork outputs and therefore break the aliasing chain observed before segmentation (and thus before scheduling). #1401 (comment) shows an example for curious minds.
This PR changes the algorithm so it runs alias analysis both before segmentation and during scheduling. The former is used to change layouts to enable aliases, and the latter is used to detect and mark aliases in each segment. This allows segmentation to split out meta-op-only regions so the rest is easier to optimize.
This PR doesn't solve the aforementioned output-to-output alias patterns. I'll need to add logic to segment out meta-op-only output-to-output regions in later PRs.