Predicate indexing for circular buffering#2677
Conversation
!build

CC: @zasdfgbnm, @rdspring1

!build

!build
Pinging @jacobhinkle |
jacobhinkle left a comment:

LGTM. Tests make sense.
```cpp
inline std::unordered_set<IdModelEnableOption> getIdModelEnabledOptions() {
  std::unordered_set<IdModelEnableOption> opts;

  if (hasEnableOptionArgument(EnableOption::IdModel, "consumer_index") ||
      hasEnableOptionArgument(EnableOption::IdModel, "index") ||
      hasEnableOptionArgument(EnableOption::IdModel, "all")) {
    opts.insert(IdModelEnableOption::ConsumerIndex);
  }

  if (hasEnableOptionArgument(EnableOption::IdModel, "producer_index") ||
      hasEnableOptionArgument(EnableOption::IdModel, "index") ||
      hasEnableOptionArgument(EnableOption::IdModel, "all")) {
    opts.insert(IdModelEnableOption::ProducerIndex);
  }

  if (hasEnableOptionArgument(EnableOption::IdModel, "inline_predicate") ||
      hasEnableOptionArgument(EnableOption::IdModel, "predicate") ||
      hasEnableOptionArgument(EnableOption::IdModel, "all")) {
    opts.insert(IdModelEnableOption::InlinePredicate);
  }

  if (hasEnableOptionArgument(EnableOption::IdModel, "unswitch_predicate") ||
      hasEnableOptionArgument(EnableOption::IdModel, "predicate") ||
      hasEnableOptionArgument(EnableOption::IdModel, "all")) {
    opts.insert(IdModelEnableOption::UnswitchPredicate);
  }

  return opts;
}

inline bool isIdModelOptionEnabled(IdModelEnableOption option) {
  const auto opts = getIdModelEnabledOptions();
  return opts.find(option) != opts.end();
}
```
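The helper above follows an umbrella-option pattern: coarse arguments such as "index", "predicate", or "all" switch on whole groups of fine-grained options. A minimal standalone sketch of that pattern follows; the enum values mirror the snippet, but `hasArg` and the string-set argument storage are hypothetical stand-ins for `hasEnableOptionArgument`:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <unordered_set>

enum class IdModelEnableOption {
  ConsumerIndex,
  ProducerIndex,
  InlinePredicate,
  UnswitchPredicate
};

using Args = std::set<std::string>;

// hasArg: hypothetical stand-in for hasEnableOptionArgument.
// "all" enables everything.
bool hasArg(const Args& args, const std::string& name) {
  return args.count(name) > 0 || args.count("all") > 0;
}

std::unordered_set<IdModelEnableOption> enabledOptions(const Args& args) {
  std::unordered_set<IdModelEnableOption> opts;
  // "index" is an umbrella for both index options.
  if (hasArg(args, "consumer_index") || hasArg(args, "index")) {
    opts.insert(IdModelEnableOption::ConsumerIndex);
  }
  if (hasArg(args, "producer_index") || hasArg(args, "index")) {
    opts.insert(IdModelEnableOption::ProducerIndex);
  }
  // "predicate" is an umbrella for both predicate options.
  if (hasArg(args, "inline_predicate") || hasArg(args, "predicate")) {
    opts.insert(IdModelEnableOption::InlinePredicate);
  }
  if (hasArg(args, "unswitch_predicate") || hasArg(args, "predicate")) {
    opts.insert(IdModelEnableOption::UnswitchPredicate);
  }
  return opts;
}
```

For example, `enabledOptions({"index"})` yields both `ConsumerIndex` and `ProducerIndex` but neither predicate option.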
Nit: you may have other intended uses of this utility that would make use of getIdModelEnabledOptions() directly. If not, you could instead replace this with a switch (option) in isIdModelOptionEnabled.
I don't have any particular plan for this except that it should be retired once everything about IdModel is enabled by default, so I'll leave it as is.
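For illustration, the switch-based alternative suggested in the nit could look roughly like the sketch below. This is hypothetical, not code from the PR; `hasOpt` is a stand-in for `hasEnableOptionArgument(EnableOption::IdModel, ...)`, injected here so the sketch is self-contained:

```cpp
#include <cassert>
#include <functional>
#include <string>

enum class IdModelEnableOption {
  ConsumerIndex,
  ProducerIndex,
  InlinePredicate,
  UnswitchPredicate
};

// Hypothetical sketch of the reviewer's suggestion: dispatch on the
// queried option directly instead of materializing the whole set.
bool isIdModelOptionEnabled(
    IdModelEnableOption option,
    const std::function<bool(const std::string&)>& hasOpt) {
  if (hasOpt("all")) {
    return true; // "all" enables every option
  }
  switch (option) {
    case IdModelEnableOption::ConsumerIndex:
      return hasOpt("consumer_index") || hasOpt("index");
    case IdModelEnableOption::ProducerIndex:
      return hasOpt("producer_index") || hasOpt("index");
    case IdModelEnableOption::InlinePredicate:
      return hasOpt("inline_predicate") || hasOpt("predicate");
    case IdModelEnableOption::UnswitchPredicate:
      return hasOpt("unswitch_predicate") || hasOpt("predicate");
  }
  return false;
}
```

The trade-off is avoiding the temporary `unordered_set` per query, at the cost of duplicating the option-to-argument mapping if `getIdModelEnabledOptions()` is also needed elsewhere.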
csrc/id_model/predicate_indexing.cpp (Outdated)
```cpp
  return nullptr;
}

// Epilog should not hit this part
```
Is this specifically because there is now a different IterDomain mapped to each ForLoop, where those IterDomains are not loop-mapped with one another, and getCircularBufferAxis(tv) returns the IterDomain of the main loop only?
I'll expand the comment here, but that's just because predication of an expr is based on its consumer. If control reaches here, the tensor is circular buffered. In the epilogue loop, a circular-buffered tensor should never appear as a consumer, only as a producer. So what this part means is that this tensor is a circular-buffered consumer, and therefore the loop stage can never be the epilogue.
Ah ok. That makes sense. Thanks for the explanation.
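The invariant from this exchange can be sketched as a small predicate. This is a sketch only: the stage names follow the discussion, and the actual nvFuser CircularBufferLoopStage enum and checks may differ:

```cpp
#include <cassert>

// Loop stages of a circular-buffered loop (hypothetical layout,
// following the discussion above).
enum class CircularBufferLoopStage { Prolog, Main, Epilog, NotApplicable };

// Predicates are generated for the consumer tensor of an expression.
// A circular-buffered tensor is written (i.e. appears as a consumer)
// only in the prolog and main loops; in the epilog it is only read as
// a producer. So when predicating a circular-buffered consumer, the
// stage can never be Epilog.
bool mayPredicateCircularBufferedConsumer(CircularBufferLoopStage stage) {
  return stage == CircularBufferLoopStage::Prolog ||
      stage == CircularBufferLoopStage::Main;
}
```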
```cpp
  // (number_of_stages - 1) elements ahead.
  auto replace_for_circular_buffering = [&](ForLoop* fl,
                                            Val* original_index) -> Val* {
    auto circular_buffer_axis =
```
Nit: probably a good idea to check this here like in getCircularBufferLoopStage:

```cpp
NVF_ERROR(
    GpuLower::hasCurrent(),
    "Circular buffering info of GpuLower is required but GpuLower is missing");
auto circular_buffer_axis =
```
I started this new indexing without assuming the existence of GpuLower, but I realized that's not reasonable, so hasCurrent() is assumed everywhere. Also, GpuLower::current() does the check too, so there should be an assertion error if the assumption is broken.
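To make the "(number_of_stages - 1) elements ahead" comment concrete, here is a scalar sketch of the index replacement, assuming a simple main-loop prefetch scheme. `prefetchIndex` and `inBounds` are hypothetical names; the real `replace_for_circular_buffering` builds symbolic Vals rather than plain integers:

```cpp
#include <cassert>

// In the main loop of a circular-buffered tensor with `stages` stages,
// the load issued at iteration i actually fetches the element for
// iteration i + (stages - 1), so the predicate must be evaluated at
// that advanced index, not at the original loop index.
long prefetchIndex(long original_index, long stages) {
  return original_index + (stages - 1);
}

// The bounds check then compares the advanced index against the extent.
bool inBounds(long original_index, long stages, long extent) {
  return prefetchIndex(original_index, stages) < extent;
}
```

For example, with 3 stages and an extent of 100, iteration 97 fetches element 99 and still passes the predicate, while iteration 98 would fetch element 100 and must be masked out.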
…through merge-inner or split-outer domains (#2689) (Stacked on top of #2677.) The original issue is #681, which was addressed in #687. This PR is NOT as comprehensive as #687, but my gut feeling is that it should be good enough, in particular since contig indexing would avoid backward traversals through merge in many cases. I'll do a final, more comprehensive comparison with the legacy indexing once contig indexing is done. Since the original PR and issue were reviewed by @zasdfgbnm, could you please review this too?
(Stacked on #2677) Basically just porting what's already done with the current predication.
Adding support for predicate indexing with circular buffering.

Circular buffering itself doesn't need many changes, but combining circular buffering with unswitch/unroll is a bit more complicated. There's also an existing bug, which is fixed here.

#2663 could simplify this PR, but we probably don't want to enforce epilogue generation, so this PR doesn't rely on it.

Fixes #2159