-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Enable sliding window in registers #5815
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
27 commits
Select commit
Hold shift + click to select a range
e81714e
Sliding in registers
dsharletg 89ef82a
Fix some failure cases.
dsharletg c8a3fb1
Handle if_then_else in loop partitioning.
dsharletg d05c72b
Add rebase_loops_to_zero pass.
dsharletg 975d700
Use select instead of if_then_else.
dsharletg 85c0ab5
Merge branch 'master' of github.com:halide/Halide into dsharletg/slid…
dsharletg 085ba48
Add select comparison simplifications.
dsharletg 411e0cb
Don't rewrite lets
dsharletg ce56515
Rebase producer loops of register slides to 0, and don't overwrite re…
dsharletg 4c0a6c5
Add rules for ramp < broadcast
dsharletg f422129
Put the likely on the old value instead of the new value.
dsharletg 8466e4d
New rules for comparing ramps and broadcasts
abadams 0f2c9e9
Merge branch 'dsharletg/slide-registers' of https://github.com/halide…
abadams f7111ca
Switch back to if_then_else
dsharletg 16ad4e0
Update comments.
dsharletg bc6a7c6
Don't try to fold dimensions with a constant min or max.
dsharletg 944ab79
More comments.
dsharletg b94a59b
Make the vectorized register sliding window test tighter.
dsharletg 70f9d7a
Remove debug helper.
dsharletg e54dd90
Fix tests broken by loop rebasing.
dsharletg 86b4fd6
Move rebasing after loop partitioning
dsharletg afe379d
clang-format
dsharletg b68bdcf
clang-tidy
dsharletg c142b77
Also put MemoryType::Register on the stack.
dsharletg 2e1f91e
Merge branch 'master' into dsharletg/slide-registers
steven-johnson 25b3997
Merge branch 'master' of github.com:halide/Halide into dsharletg/slid…
dsharletg 084aba3
Expand arg before substitute.
dsharletg File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,54 @@ | ||
| #include "RebaseLoopsToZero.h" | ||
| #include "IRMutator.h" | ||
| #include "IROperator.h" | ||
|
|
||
| namespace Halide { | ||
| namespace Internal { | ||
|
|
||
| using std::string; | ||
|
|
||
| namespace { | ||
|
|
||
| bool should_rebase(ForType type) { | ||
| switch (type) { | ||
| case ForType::Extern: | ||
| case ForType::GPUBlock: | ||
| case ForType::GPUThread: | ||
| case ForType::GPULane: | ||
| return false; | ||
| default: | ||
| return true; | ||
| } | ||
| } | ||
|
|
||
| class RebaseLoopsToZero : public IRMutator { | ||
| using IRMutator::visit; | ||
|
|
||
| Stmt visit(const For *op) override { | ||
| if (!should_rebase(op->for_type)) { | ||
| return IRMutator::visit(op); | ||
| } | ||
| Stmt body = mutate(op->body); | ||
| string name = op->name; | ||
| if (!is_const_zero(op->min)) { | ||
| // Renaming the loop (intentionally) invalidates any .loop_min/.loop_max lets. | ||
| name = op->name + ".rebased"; | ||
| Expr loop_var = Variable::make(Int(32), name); | ||
| body = LetStmt::make(op->name, loop_var + op->min, body); | ||
| } | ||
| if (body.same_as(op->body)) { | ||
| return op; | ||
| } else { | ||
| return For::make(name, 0, op->extent, op->for_type, op->device_api, body); | ||
| } | ||
| } | ||
| }; | ||
|
|
||
| } // namespace | ||
|
|
||
| Stmt rebase_loops_to_zero(const Stmt &s) { | ||
| return RebaseLoopsToZero().mutate(s); | ||
| } | ||
|
|
||
| } // namespace Internal | ||
| } // namespace Halide | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| #ifndef HALIDE_REBASE_LOOPS_TO_ZERO_H | ||
| #define HALIDE_REBASE_LOOPS_TO_ZERO_H | ||
|
|
||
| /** \file | ||
| * Defines the lowering pass that rewrites loop mins to be 0. | ||
| */ | ||
|
|
||
| #include "Expr.h" | ||
|
|
||
| namespace Halide { | ||
| namespace Internal { | ||
|
|
||
| /** Rewrite the mins of most loops to 0. */ | ||
| Stmt rebase_loops_to_zero(const Stmt &); | ||
|
|
||
| } // namespace Internal | ||
| } // namespace Halide | ||
|
|
||
| #endif |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -285,9 +285,43 @@ Expr Simplify::visit(const LT *op, ExprInfo *bounds) { | |
| rewrite(select(y, z, x + c0) < x + c1, y && (z < x + c1), c0 >= c1) || | ||
| rewrite(select(y, z, x + c0) < x + c1, !y || (z < x + c1), c0 < c1) || | ||
|
|
||
| rewrite(c0 < select(x, c1, c2), select(x, fold(c0 < c1), fold(c0 < c2))) || | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These rules all formally verify ok |
||
| rewrite(select(x, c1, c2) < c0, select(x, fold(c1 < c0), fold(c2 < c0))) || | ||
|
|
||
| // Normalize comparison of ramps to a comparison of a ramp and a broadacst | ||
| rewrite(ramp(x, y, lanes) < ramp(z, w, lanes), ramp(x - z, y - w, lanes) < 0))) || | ||
| rewrite(ramp(x, y, lanes) < ramp(z, w, lanes), ramp(x - z, y - w, lanes) < 0) || | ||
|
|
||
| // Rules of the form: | ||
| // rewrite(ramp(x, y, lanes) < broadcast(z, lanes), ramp(x - z, y, lanes) < 0) || | ||
| // where x and z cancel usefully | ||
| rewrite(ramp(x + z, y, lanes) < broadcast(x + w, lanes), ramp(z, y, lanes) < broadcast(w, lanes)) || | ||
| rewrite(ramp(z + x, y, lanes) < broadcast(x + w, lanes), ramp(z, y, lanes) < broadcast(w, lanes)) || | ||
| rewrite(ramp(x + z, y, lanes) < broadcast(w + x, lanes), ramp(z, y, lanes) < broadcast(w, lanes)) || | ||
| rewrite(ramp(z + x, y, lanes) < broadcast(w + x, lanes), ramp(z, y, lanes) < broadcast(w, lanes)) || | ||
|
|
||
| // z = 0 | ||
| rewrite(ramp(x, y, lanes) < broadcast(x + w, lanes), ramp(0, y, lanes) < broadcast(w, lanes)) || | ||
| rewrite(ramp(x, y, lanes) < broadcast(w + x, lanes), ramp(0, y, lanes) < broadcast(w, lanes)) || | ||
|
|
||
| // w = 0 | ||
| rewrite(ramp(x + z, y, lanes) < broadcast(x, lanes), ramp(z, y, lanes) < 0) || | ||
| rewrite(ramp(z + x, y, lanes) < broadcast(x, lanes), ramp(z, y, lanes) < 0) || | ||
|
|
||
| // With the args flipped | ||
| rewrite(broadcast(x + w, lanes) < ramp(x + z, y, lanes), broadcast(w, lanes) < ramp(z, y, lanes)) || | ||
| rewrite(broadcast(x + w, lanes) < ramp(z + x, y, lanes), broadcast(w, lanes) < ramp(z, y, lanes)) || | ||
| rewrite(broadcast(w + x, lanes) < ramp(x + z, y, lanes), broadcast(w, lanes) < ramp(z, y, lanes)) || | ||
| rewrite(broadcast(w + x, lanes) < ramp(z + x, y, lanes), broadcast(w, lanes) < ramp(z, y, lanes)) || | ||
|
|
||
| // z = 0 | ||
| rewrite(broadcast(x + w, lanes) < ramp(x, y, lanes), broadcast(w, lanes) < ramp(0, y, lanes)) || | ||
| rewrite(broadcast(w + x, lanes) < ramp(x, y, lanes), broadcast(w, lanes) < ramp(0, y, lanes)) || | ||
|
|
||
| // w = 0 | ||
| rewrite(broadcast(x, lanes) < ramp(x + z, y, lanes), 0 < ramp(z, y, lanes)) || | ||
| rewrite(broadcast(x, lanes) < ramp(z + x, y, lanes), 0 < ramp(z, y, lanes)) || | ||
|
|
||
| false)) || | ||
| (no_overflow_int(ty) && EVAL_IN_LAMBDA | ||
| (rewrite(x * c0 < y * c1, x < y * fold(c1 / c0), c1 % c0 == 0 && c0 > 0) || | ||
| rewrite(x * c0 < y * c1, x * fold(c0 / c1) < y, c0 % c1 == 0 && c1 > 0) || | ||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gpu loops absolutely need to be rebased to zero, and there's a pass inside FuseGPUThreadLoops.cpp that does it. Is that mutator now redundant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That mutator works a little differently. It substitutes loop_var + min, rather than makes a new let, and that seems like maybe something that might matter for another pass/logic (there's a lot of stuff happening in FuseGPUThreadLoops). I would also be careful about changing when loops get rebased to 0.
Maybe this PR should make it so
rebase_loops_to_zeroaccepts a set ofForTypethat get rebased, and use that in FuseGPUThreadLoops? That would at least keep the rebasing happening at the same time, and the only change would be substitute vs. let.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the new mutator is just more correct than the old one. I guess the old one happens earlier though, so we can't just rely on the new one to do the mutation. Your proposal sounds good, but I have no strong feelings one way or the other.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the duplication here is minimal and there are non-minimal risks in messing with this, so I'll save it for a separate PR (will file an issue).