Enable sliding window in registers#5815
Conversation
…alization bounds.
…/Halide into dsharletg/slide-registers
|
Ping on a code review, please take a look at this when you get a chance @abadams. I've confirmed that this does address the use case we had in mind well. |
| switch (type) { | ||
| case ForType::Extern: | ||
| case ForType::GPUBlock: | ||
| case ForType::GPUThread: |
There was a problem hiding this comment.
gpu loops absolutely need to be rebased to zero, and there's a pass inside FuseGPUThreadLoops.cpp that does it. Is that mutator now redundant?
There was a problem hiding this comment.
That mutator works a little differently. It substitutes loop_var + min, rather than makes a new let, and that seems like maybe something that might matter for another pass/logic (there's a lot of stuff happening in FuseGPUThreadLoops). I would also be careful about changing when loops get rebased to 0.
Maybe this PR should make it so rebase_loops_to_zero accepts a set of ForType that get rebased, and use that in FuseGPUThreadLoops? That would at least keep the rebasing happening at the same time, and the only change would be substitute vs. let.
There was a problem hiding this comment.
I think the new mutator is just more correct than the old one. I guess the old one happens earlier though, so we can't just rely on the new one to do the mutation. Your proposal sounds good, but I have no strong feelings one way or the other.
There was a problem hiding this comment.
I think the duplication here is minimal and there are non-minimal risks in messing with this, so I'll save it for a separate PR (will file an issue).
| rewrite(select(y, z, x + c0) < x + c1, y && (z < x + c1), c0 >= c1) || | ||
| rewrite(select(y, z, x + c0) < x + c1, !y || (z < x + c1), c0 < c1) || | ||
|
|
||
| rewrite(c0 < select(x, c1, c2), select(x, fold(c0 < c1), fold(c0 < c2))) || |
There was a problem hiding this comment.
These rules all formally verify ok
|
Generally lgtm. I think the substitute call is fishy though. |
|
The OSX failure is unrelated (will be fixed by #5841), should be good to land |
This PR enables an alternative strategy for sliding window (used when the producer of a sliding window is
store_in(MemoryType::Register)). The alternative strategy works by shifting the producer such that the previously computed values are copied from the previous iteration, and the newly computed values are always computed in the same place. This avoids non-constant addresses, which are a prerequisite for storing things in registers (on most targets).This mostly fixes #1820, though it currently requires bending over backwards in the schedule a bit for sliding vectorized dimensions.
@abadams contributed heavily to the ideas implemented here.
This PR has some other related changes:
if_then_elseintrinsics likeselectwhen partitioning loops.boxes_touched, rebase some loops to have a min of 0, to enable stronger simplifications.