[TIR] Fix plan buffer allocation location for loop carried dependencies #12757
The pass `PlanAndUpdateBufferAllocationLocation` seems to have a problem when the indices of a buffer access carry a loop-carried dependency.

As an example, consider a block `b1` whose read access to the intermediate buffer `C` on iteration `i` depends on `b0`'s writes to `C` on both iteration `i` and iteration `i-1`. We therefore should not put the allocation of `C` under loop `i`, which is the LCA position chosen by the current planning strategy.

To fix the issue, we change the behavior of `DetectBufferLCA` to be aware of opaque block iters (loop-carried dependencies and other more complex behaviors are categorized as `opaque` in the iter type annotation). It enforces that every legal "ancestor" of the buffer accesses dominates all loops related to the opaque block iters used within the buffer indices. E.g., since `vi` is opaque and the indices of buffer `C` use `vi`, loop `i` must be under the planned allocation point of `C`.

For an interesting workload related to loop-carried dependencies, refer to https://discuss.tvm.apache.org/t/rfc-introducing-a-rolling-buffer-scheduling-primitive/9836, where the intermediate results of the previous iteration are reused as much as possible.
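
The rule above can be illustrated with a small, self-contained model. This is only a sketch in plain Python, not the TVM implementation: the scope tree `PARENT`, the node names (`root`, `loop_i`, `b0`, `b1`), and the helpers `path_to_root`, `lca`, and `plan_alloc_site` are all hypothetical and chosen for illustration. It assumes each access records which loops are bound to opaque block iters used in its indices, and it forces the allocation point to sit strictly above those loops.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical scope tree (child -> parent). "root" stands for the function body.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "loop_i": "root",
    "b0": "loop_i",
    "b1": "loop_i",
}

# Each access to buffer C: (accessing block, loops bound to opaque iters
# that appear in the access indices). Both blocks index C through `vi`.
ACCESSES: List[Tuple[str, List[str]]] = [
    ("b0", ["loop_i"]),
    ("b1", ["loop_i"]),
]


def path_to_root(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain node, parent(node), ..., root (deepest first)."""
    path = []
    cur: Optional[str] = node
    while cur is not None:
        path.append(cur)
        cur = parent[cur]
    return path


def lca(nodes: List[str], parent: Dict[str, Optional[str]]) -> str:
    """Lowest common ancestor of all nodes in the scope tree."""
    common = path_to_root(nodes[0], parent)
    for n in nodes[1:]:
        ancestors = set(path_to_root(n, parent))
        # Filtering keeps the deepest-first order of `common`.
        common = [s for s in common if s in ancestors]
    return common[0]


def plan_alloc_site(
    accesses: List[Tuple[str, List[str]]],
    parent: Dict[str, Optional[str]],
    opaque_aware: bool,
) -> str:
    """Pick the scope under which the buffer would be allocated."""
    sites: List[str] = []
    for block, opaque_loops in accesses:
        sites.append(block)
        if opaque_aware:
            # Adding the parent of each opaque-related loop forces the
            # result to be an ancestor of that parent, so the loop itself
            # ends up strictly under the allocation point.
            sites.extend(parent[loop] for loop in opaque_loops)
    return lca(sites, parent)


if __name__ == "__main__":
    print(plan_alloc_site(ACCESSES, PARENT, opaque_aware=False))
    print(plan_alloc_site(ACCESSES, PARENT, opaque_aware=True))
```

In this model the plain LCA places `C` under `loop_i`, while the opaque-aware variant lifts the allocation to `root`, so values written on iteration `i-1` remain available on iteration `i`.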
cc @Hzfengsy @junrushao1994