Conversation
!test --diff
Review updated until commit c71bc71
csrc/fusion_segmenter.cpp (outdated)

```cpp
for (const auto& id : logical) {
  if (id->isRFactorProduct()) {
    // Create new symbolic extents for logical iterDomains
```
Question: What is the reasoning behind this code snippet? Why do we create a symbolic ID for rfactored IterDomains if not all IDs are concrete?
I think this is because we don't need to keep the history of transformations. I vaguely remember @jacobhinkle did something around here.
If it's "concrete", we know the concrete size, so that will be used instead anyway.
That's correct. This was needed because we sometimes had a reshaped extent in a segmentation edge. Those extents depended on scalars from other segments such as the original input shape, which led to errors evaluating those segments.
@jjsjann123 I am putting this PR up for review. We can discuss if/how it may interfere with your work on padding, and identify any changes that should go before this PR.

!test --diff

Thanks! LGTM overall
csrc/transform_replay.cpp (outdated)

```cpp
const std::vector<std::optional<bool>>& self_contiguity =
    self->contiguity();
NVF_ERROR_EQ(self_allocation.size(), self_contiguity.size());
// Replay maybeAllocation and contiguity.
```
The change here is that I am replaying maybeAllocation unconditionally.
Why? It seems reasonable to skip below if nothing is set for the allocation domain. What am I missing?
It was because of: #5177 (comment)
Even if there is no allocation domain specified, there may be a contiguity set. By default, TensorDomain is created with false contiguity. This replay makes sure that if the original domain has a contiguity of true (or any non-default value), it is correctly replayed.
FWIW, I am moving this change to another PR since it is breaking some tests. It would be clearer to do it separately.
Even if there is no allocation domain specified, there may be a contiguity set. By default, TensorDomain is created with false contiguity. This replay makes sure that if the original domain has a contiguity of true (or any non-default value), it is correctly replayed.
Not sure why replay is necessary. Can't we just copy the bool vector?
That's why I suggested we should keep the API as simple as possible and make each function do minimal work so that it could be easily composable. The original concept of selfReplay is quite simple, and I'm worried that by adding more stuff into the function, it could be getting less flexible to use.
I think we are on the same page here. I was responding to your earlier comment about what happens if a non-reduction ID is mapped to a reduction ID. The base replay class would not do the correct thing here, generating IDs with the iter type of the original ID instead. Hence my comment on not trying to support such a case here, and maybe adding a check to ensure a hard fail.
I disagree, since there's no absolute definition of "mostly identical". Instead of defining what's identical, this interface is meant to let each caller decide which IDs are considered mapped, and that should not be dictated by selfReplay itself.
Do you disagree on the part about rejecting reduction and non-reduction IDs being mapped, and letting the caller ensure that instead?
I'm worried that by adding more stuff into the function, it could be getting less flexible to use.
This PR reverted the changes around contiguity replay and it is now in the calling code instead (segmentor)
We do replay contiguity anyway if allocation is present. However, contiguity can exist even without allocation, so what is incorrect with making allocation and contiguity replay independent (given that selfReplay does not handle ambiguous cases such as a reduction ID mapped to a non-reduction ID)?
The base replay class would not do the correct thing here, generating IDs with the iter type of the original ID instead.
Ah, that's a bit of dirty behavior we have been living with: reduction rfactor creates non-reduction IDs out of a reduction ID. I believe that's the only case where split changes the input iter type. That's more of an exception, so I wouldn't let it determine the overall API design.
Do you disagree on the part about rejecting reduction and non-reduction IDs being mapped, and letting the caller ensure that instead?
My point is that the core implementation should be free from that decision. We could have multiple wrappers around it tailored for various particular use cases, but we should keep the core component as simple as possible.
so what is incorrect with making allocation and contiguity replay independent (given that selfReplay does not handle ambiguous cases such as a reduction ID mapped to a non-reduction ID)?
Sorry, I'm not sure what your question is. Can you rephrase it, please?
Ah, that's a bit of dirty behavior we have been living with: reduction rfactor creates non-reduction IDs out of a reduction ID. I believe that's the only case where split changes the input iter type. That's more of an exception, so I wouldn't let it determine the overall API design.
Interesting. Do we have a case where we need this? My understanding was we do that because we expect them to match anyway.
My point is that the core implementation should be free from that decision. We could have multiple wrappers around it tailored for various particular use cases, but we should keep the core component as simple as possible.
When we replay allocation, currently we do not handle (or expect?) this case. If we have mapped reduction and non-reduction IDs, then we could set the contiguity to be false to be conservative. That is, if the replay of a reduction ID is a non-reduction ID during allocation domain replay, the corresponding contiguity flag is false.
However, if we do not have a case where we do this, it seems simpler to be non-ambiguous and not allow this.
so what is incorrect with making allocation and contiguity replay independent (given that selfReplay does not handle ambiguous cases such as a reduction ID mapped to a non-reduction ID)?
The motivation for replaying contiguity in selfReplay (not in this PR) was: the TensorDomain created using IrBuilder::create.. has a default contiguity of false. It is possible that the original TensorDomain had some other contiguity even if we do not have an allocation domain, so we should replay that contiguity to the new TensorDomain. Right now, I am doing this in the caller code, that is, in the segmenter, so that in the process of cloning TensorDomains I do not lose that information.
The motivation for replaying contiguity in selfReplay (not in this PR) was: the TensorDomain created using IrBuilder::create.. has a default contiguity of false.
That's right. Since maybeAllocation and contiguity go hand in hand, I think it's a bug to replay allocation but not contiguity. So this change itself is fine.
To the larger question about the contract of selfReplay, I hope #5221 will simplify the contract and make it more strict. I'm still testing it out...
Interesting. Do we have a case where we need this?
Yes, it's how rfactor works, IIRC.
When we replay allocation, currently we do not handle (or expect?) this case. If we have mapped reduction and non-reduction IDs, then we could set the contiguity to be false to be conservative. That is, if the replay of a reduction ID is a non-reduction ID during allocation domain replay, the corresponding contiguity flag is false.
However, if we do not have a case where we do this, it seems simpler to be non-ambiguous and not allow this.
That's exactly why I'm suggesting keeping selfReplay as simple as possible and moving the responsibility of what to propagate and how to update contiguity to something else. I was chatting with @jjsjann123 about #5184, and it seems we would like to use selfReplay from a fusion output but only with its logical domain since the allocation domain of the output doesn't make any sense to propagate, whereas in the use case of the fusion segmenter, we do want to keep the allocation domain propagated.
These different use cases have been pretty common, and that's why we have options like bool propagate_allocation. That's not ideal, but I disagree with removing it just because that would be fine for one case; it would make it difficult to use in other cases.
!test --diff

!test --diff

!test --diff

!test --diff
```cpp
auto compare_result = ir_utils::compareDomains(
    tv->getLogicalDomain(),
    tv->getLoopDomain(),
    /*additional_ids=*/{},
    /*ignore_broadcast=*/false);
bool has_disjoint_loop_logical = compare_result.dom0_has_unreachable_ids ||
    compare_result.dom1_has_unreachable_ids;
```

```cpp
new_td = IrBuilder::create<TensorDomain>(
    /*root_domain=*/std::vector<IterDomain*>(),
    new_logical_domain,
    new_alloc,
    new_loop,
    tv->domain()->contiguity());
if (has_disjoint_loop_logical) {
```
@wujingyue Had to make this new change after your review, if you would like to take another look.
Fine with me. However, I've no idea what's going on. I can see the superficial reason that Scatter and Gather have disjoint logical and loop, but I've yet to understand the reason behind that. I'm sure it's my first time trying to understand how Scatter and Gather are codegen'ed. cc @naoyam and @jjsjann123 who might know the answer or any pointers.
I don't mind the assert here on a gather/scatter ID. Conceptually, we only expect gather/scatter to create such a scenario.
Anything else, like padding, that could affect the mapping from allocation to logical is only going to show up between logical->allocation; it shouldn't affect loop.
For #4381.
This allows us to replay both loop and allocation instead of assuming loop is the same as either allocation or logical.