preserving allocation domain transform on aliased output #3970

jjsjann123 merged 37 commits into main
Conversation
!test
Review updated until commit 0fe131d
csrc/fusion.cpp (outdated):

```cpp
NVF_CHECK(
    output->isA<TensorView>() && input->isA<TensorView>(),
    "aliasing output to input is only supported for TensorView");
auto new_domain = TransformReplay::selfAllocationReplay(
    output->as<TensorView>()->domain(), input->as<TensorView>()->domain());
output->as<TensorView>()->setAllocationDomain(
    new_domain->allocation(), new_domain->contiguity());
```
When we are aliasing an output to an input, the two need to have identical allocation domains and contiguity. (Technically we only need compatible allocation domains, but it's easier to have a fixed target.)
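As a rough sketch of that requirement, here is a toy model (all names here are hypothetical illustrations, not nvFuser's actual TensorDomain API): aliasing an output to an input copies the input's allocation order and contiguity verbatim, so both views describe the same memory.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Toy stand-in for a tensor layout: an allocation order (a permutation of
// logical axes) plus a per-axis contiguity flag, where nullopt marks a
// broadcast axis. Hypothetical names, for illustration only.
struct ToyLayout {
  std::vector<int> allocation_order;            // e.g. {1, 0} = column-major
  std::vector<std::optional<bool>> contiguity;  // nullopt for broadcast axes
};

// Aliasing an output to an input: the output takes the input's exact
// allocation order and contiguity -- the "fixed target" mentioned above.
ToyLayout alias_layout(const ToyLayout& input) {
  return ToyLayout{input.allocation_order, input.contiguity};
}

bool same_layout(const ToyLayout& a, const ToyLayout& b) {
  return a.allocation_order == b.allocation_order &&
      a.contiguity == b.contiguity;
}
```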
!test
keeping my 🤞
!test --diff

!test --diff

!test --diff

!test --diff
csrc/transform_replay.cpp (outdated):

```cpp
      new_self_root->contiguity());
}

TensorDomain* TransformReplay::selfAllocationReplay(
```
I tagged @wujingyue because I think you asked for something similar to this?
Though I'm not totally sure it does what you want, because I'm addressing a requirement from remove_bcast_squeeze. We could also refactor this one and put some of the logic on the user side if we need a more generic replay later.
wujingyue left a comment:
LGTM overall -- there are some details I want to make sure I understand, so that I have a good mental model.
csrc/transform_replay.cpp (outdated):

```cpp
      new_self_root->contiguity());
}

TensorDomain* TransformReplay::selfAllocationReplay(
```
Note: this can probably be consolidated with the existing replay in csrc/transform_replay.cpp, line 1220 at 7524c9f.
```cpp
  }
}

// If we are replacing an output, we need to preserve the memory layout by
```
Can we do this check in Fusion::replaceOutput?
That check needs to be opt-in. We have places where we explicitly alter the memory layout: mark-alias prepare and sharding propagation both intentionally change it.
But it would be nice to have a check there.
The check would actually be verifying compatible transforms, i.e. the two transforms might not necessarily be the same, but they should be equivalent... This feels like something @wujingyue and @zasdfgbnm were discussing together.
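One plausible notion of "equivalent but not identical" layouts can be sketched as follows (a toy model with hypothetical names, not code from this PR): treat two layouts as compatible if they agree after dropping broadcast axes, since broadcast axes occupy no memory.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Toy layout: allocation order over logical axes plus per-axis contiguity;
// nullopt marks a broadcast axis. Hypothetical names, for illustration only.
struct Layout {
  std::vector<int> order;
  std::vector<std::optional<bool>> contiguity;
};

// Broadcast axes do not affect the physical memory layout, so drop them
// before comparing.
Layout strip_broadcasts(const Layout& l) {
  Layout out;
  for (size_t i = 0; i < l.order.size(); ++i) {
    if (l.contiguity[i].has_value()) {
      out.order.push_back(l.order[i]);
      out.contiguity.push_back(l.contiguity[i]);
    }
  }
  return out;
}

// "Compatible": not necessarily the same transforms, but equivalent in how
// the real (non-broadcast) axes are laid out in memory.
bool compatible(const Layout& a, const Layout& b) {
  Layout sa = strip_broadcasts(a);
  Layout sb = strip_broadcasts(b);
  return sa.order == sb.order && sa.contiguity == sb.contiguity;
}
```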
!test --diff
I'm surprised to see a segfault from the python_bc_advisory thing. I didn't change any Python API; no idea where that segfault is coming from.
!test --diff
```cpp
NVF_ERROR(
    it != replay.getReplay().end(),
    "Error during replay, didn't replay an axis.");
if (it->second->isBroadcast() == self_contiguity[i].has_value()) {
```
This feels like a bug or a misuse. Is there a good use case for is-broadcast to be inconsistent between self and new_self?
This is for the same reason as the comment above about symbolic IDs. If we are dealing with a symbolic ID, we just need to make sure that the contiguity flag on the given ID is consistent with itself.
Later, during concretization, the consistency will be fixed globally.
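The invariant that check enforces can be restated as a standalone sketch (a hypothetical helper, not code from the PR): a replayed axis should carry a contiguity flag exactly when it is not a broadcast axis, since broadcast axes occupy no memory; the error path above fires when `isBroadcast()` and `has_value()` agree.

```cpp
#include <cassert>
#include <optional>

// Hypothetical restatement of the replay check: an axis is locally
// consistent iff "is broadcast" and "has a contiguity flag" disagree.
// The check in the PR errors out precisely when they agree.
bool consistent(bool is_broadcast, std::optional<bool> contiguity) {
  return is_broadcast != contiguity.has_value();
}
```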
Co-authored-by: Jingyue Wu <wujingyue@gmail.com>
!test --diff
The failure doesn't seem to be relevant, and we have all tests clearing at 4e6d81e. I'm merging this PR as-is.
Fixes #3957
The issue was that when we specify an output alias, we need to update the allocation domain and contiguity of the alias to be identical to the source's.