Mark aliases during scheduling. #1478

Merged
wujingyue merged 6 commits into main from wjy/noop
Dec 13, 2023

@wujingyue wujingyue commented Dec 7, 2023

This is one of the many steps to solve #1401.

Currently, we run alias analysis and mark aliases before segmentation. This can't handle the output-to-output alias patterns we observed in the nanogpt2 benchmark, because some schedulers fork outputs and therefore break the aliasing chain observed before segmentation (and thus before scheduling). #1401 (comment) shows an example for curious minds.

This PR changes the algorithm so it runs alias analysis both before segmentation and during scheduling. The former is used to change layouts to enable aliases, and the latter is used to detect and mark aliases in each segment. This allows segmentation to split out meta-op-only regions so the rest is easier to optimize.

This PR doesn't solve the aforementioned output-to-output alias patterns. I'll need to add logic to segment out meta-op-only output-to-output regions in later PRs.
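
To illustrate why running the analysis again on each segment can find aliases that the whole-fusion analysis cannot, here is a toy model. It is not nvFuser's implementation: `Op`, `ToyFusion`, and `findAliasSource` are hypothetical names, each op is assumed to have a single input and a single output, and "meta op" is modeled as a plain boolean flag.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical toy IR: each op produces one tensor from one input tensor.
struct Op {
  std::string in;
  std::string out;
  bool is_meta;  // true for view-like ops that move no data
};

// A toy "fusion" (or segment): a list of ops plus the names of its inputs.
struct ToyFusion {
  std::vector<Op> ops;
  std::unordered_set<std::string> inputs;
};

// Walks backwards from `tensor` through meta ops only. Returns the name of the
// fusion input that `tensor` could alias, or an empty string if a non-meta op
// (or an unknown producer) is found on the way. Assumes the ops form a DAG.
std::string findAliasSource(const ToyFusion& fusion, std::string tensor) {
  std::unordered_map<std::string, const Op*> producer;
  for (const Op& op : fusion.ops) {
    producer[op.out] = &op;
  }
  while (fusion.inputs.count(tensor) == 0) {
    auto it = producer.find(tensor);
    if (it == producer.end() || !it->second->is_meta) {
      return "";
    }
    tensor = it->second->in;
  }
  return tensor;
}
```

In this model, a fusion `in -> (compute) -> t1 -> (meta) -> out` yields no alias for `out` when analyzed as a whole, while a segment containing only the meta op, with `t1` as its input, reports that `out` can alias `t1`.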

@wujingyue

!build

wujingyue added a commit that referenced this pull request Dec 9, 2023
This prepares #1478 for landing. We'll need to use AliasAnalysis
before segmentation and during scheduling, so it's good to move common
logic into AliasAnalysis.
@wujingyue wujingyue force-pushed the wjy/noop branch 2 times, most recently from ec54056 to 2ee42e0 on December 9, 2023 06:25
@wujingyue wujingyue force-pushed the wjy/noop branch 4 times, most recently from 3b37c9c to 0c42bb4 on December 11, 2023 08:15
ExprSegmentationSorter sorter(fusion);
sorter.sort();
auto sorted_exprs = sorter.getExprs();
NVF_ERROR(

@wujingyue wujingyue Dec 11, 2023

FYI, ExprSegmentationSorter skips generating code for output aliases, so it's valid to have an empty expression list.
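
A minimal sketch of why an empty list can be valid, under the assumption that alias outputs are simply filtered out before code generation. `ToyExpr` and `exprsNeedingCode` are hypothetical names for illustration and are not part of `ExprSegmentationSorter`.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an expression that produces one named tensor.
struct ToyExpr {
  std::string output;
};

// Keeps only the expressions that need generated code, i.e. those whose
// output is not marked as an alias. The result may legitimately be empty.
std::vector<ToyExpr> exprsNeedingCode(
    const std::vector<ToyExpr>& exprs,
    const std::unordered_set<std::string>& alias_outputs) {
  std::vector<ToyExpr> kept;
  for (const ToyExpr& e : exprs) {
    if (alias_outputs.count(e.output) == 0) {
      kept.push_back(e);
    }
  }
  return kept;
}
```

If every output of a segment is an alias, `exprsNeedingCode` returns an empty vector, so a caller should not treat emptiness as an error.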

@wujingyue wujingyue changed the base branch from main to wjy/alloc December 11, 2023 08:17
@wujingyue wujingyue force-pushed the wjy/noop branch 2 times, most recently from 9d89ba8 to e62714d on December 11, 2023 19:53
@wujingyue wujingyue requested a review from jjsjann123 December 11, 2023 20:54
@wujingyue

!build

wujingyue added a commit that referenced this pull request Dec 11, 2023
This prepares #1478 for landing. We'll need to use AliasAnalysis
before segmentation and during scheduling, so it's good to move common
logic into AliasAnalysis.

@jjsjann123 jjsjann123 left a comment

Errr, the code diff looks wrong in the GitHub GUI. I'll try to do a local diff and review that way.

@jjsjann123

Ha, I think it's just the conflicts that are making it extra confusing for GitHub. @wujingyue, would you mind resolving those first? I'll review afterwards 🙇

Base automatically changed from wjy/alloc to main December 12, 2023 01:34
wujingyue added a commit that referenced this pull request Dec 12, 2023
Currently, `MarkAliasPass` skips an output when its allocation domain is
set at all. This PR makes it smarter to check compliance.

This PR prepares #1478 for landing. Before segmentation, we'll run alias
analysis on the un-segmented fusion and mark layouts for aliasing.
During scheduling, we'll run alias analysis again on segmented fusions
and have to recognize the preferred layout is compliant with the
previously marked layout.
@wujingyue

!build


@jjsjann123 jjsjann123 left a comment

LGTM

// `contiguity=[f,f]` but not vice versa.
// `contiguity=[f,f]` but not vice versa. As a special case,
// an empty `required.allocation` indicates no requirements, i.e., the method
// always returns true.

Thanks for the nice comment here!
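
The uni-directional check described in that code comment can be sketched as follows. This is a simplified model, not the real `Layout` class: `ToyLayout` is a hypothetical type, allocation is reduced to a list of dimension indices, and the rule that a contiguous dimension satisfies a non-contiguous requirement (but not the reverse) is an assumption read from the partially shown comment.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified layout: an allocation order (dimension indices)
// plus one contiguity flag per allocated dimension.
struct ToyLayout {
  std::vector<int> allocation;
  std::vector<bool> contiguity;

  // Uni-directional: does this layout satisfy everything `required` asks for?
  bool isCompliantWith(const ToyLayout& required) const {
    // Special case: an empty required allocation means no requirements.
    if (required.allocation.empty()) {
      return true;
    }
    if (allocation != required.allocation ||
        contiguity.size() != required.contiguity.size()) {
      return false;
    }
    for (std::size_t i = 0; i < contiguity.size(); ++i) {
      // Assumption: only a required "contiguous" constrains this layout.
      // A contiguous dimension also satisfies a non-contiguous requirement.
      if (required.contiguity[i] && !contiguity[i]) {
        return false;
      }
    }
    return true;
  }
};
```

Under these assumptions, a layout with contiguity `[t,t]` complies with a requirement of `[f,f]` on the same allocation order, the reverse fails, and any layout complies with a default-constructed `ToyLayout`.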

fusion->aliasOutputToInput(
out,
// NOLINTNEXTLINE(cppcoreguidelines-pro-type-const-cast)
const_cast<Val*>(in),

nitpick, not your problem, but I should probably have updated the signature in aliasOutputToInput to take const Val* instead.
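
A small sketch of what that signature change would buy, using stand-in types rather than nvFuser's own: `Val` here is a hypothetical placeholder struct and `AliasRegistry` is a hypothetical stand-in for the fusion object. When the input parameter is `const Val*`, a caller that only holds a const pointer needs no `const_cast` and no lint suppression.

```cpp
#include <utility>
#include <vector>

// Hypothetical placeholder for an IR value.
struct Val {
  int id;
};

// Hypothetical registry that records output-to-input aliases.
class AliasRegistry {
 public:
  // The input is only read, so it is taken as a pointer to const.
  void aliasOutputToInput(Val* out, const Val* in) {
    aliases_.emplace_back(out, in);
  }

  const std::vector<std::pair<Val*, const Val*>>& aliases() const {
    return aliases_;
  }

 private:
  std::vector<std::pair<Val*, const Val*>> aliases_;
};

// A caller holding only a const pointer to the input can pass it directly.
void markAlias(AliasRegistry& registry, Val* out, const Val* in) {
  registry.aliasOutputToInput(out, in);
}
```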


@naoyam naoyam left a comment

LGTM as well

@wujingyue

!build

@wujingyue wujingyue merged commit 38caecd into main Dec 13, 2023
@wujingyue wujingyue deleted the wjy/noop branch December 13, 2023 20:47
jacobhinkle pushed a commit that referenced this pull request Dec 15, 2023