egraphs: Add bmask bit pattern optimization rule#6196
egraphs: Add bmask bit pattern optimization rule#6196jameysharp merged 8 commits intobytecodealliance:mainfrom
bmask bit pattern optimization rule#6196Conversation
aa76a67 to
56ed912
Compare
| ;; bmask(input) = !(((input | ((!input) + 1)) >> 63) - 1) | ||
| (rule (simplify (bnot ty | ||
| (isub ty | ||
| (ushr ty | ||
| (bor ty | ||
| input | ||
| (iadd ty | ||
| (bnot ty input) | ||
| (iconst ty (u64_from_imm64 1)))) | ||
| (iconst ty (u64_from_imm64 shift_amt))) | ||
| (iconst ty (u64_from_imm64 1))))) | ||
| (if-let $true (u64_eq shift_amt (u64_sub (ty_bits ty) 1))) | ||
| (bmask ty input)) |
There was a problem hiding this comment.
Is this actually correct? I couldn't convince myself, so I asked Z3 to find an x for which bmask(x) and !(((x | ((!x) + 1)) >> (ty_bits - 1)) - 1) were not equal (that is, I asked it to find a counterexample):
(declare-const x (_ BitVec 32))
(assert (! (= (ite (= x #x00000000)
#x00000000
#xffffffff)
;; !(((x | ((!x) + 1)) >> 31) - 1)
(bvnot (bvsub (bvlshr (bvor x
(bvadd (bvnot x)
#x00000001))
#x0000001f)
#x00000001)))))
(check-sat)
(get-model)It reports sat, meaning that it found a counterexample, and it says the counterexample is zero:
sat
(
(define-fun x () (_ BitVec 32)
#x00000000)
)So unless I'm missing something and/or translated the expression incorrectly, I think this optimization is incorrect?
There was a problem hiding this comment.
Then again, there is an explicit test for 0 in the run tests, so I must have done something wrong...
There was a problem hiding this comment.
The SMT counterexample is confusing me too:
x1 := (bvnot x)withx := 0->0xffffffffx2 := (bvadd x1 1)->0x3 := (bvor x2 x)->(bvor 0 0)->0x4 := (bvlshr x3 31)->0x5 := (bvsub x4 1)->0xffffffffx6 := (bvnot x5)->0
and on the LHS for the if-then-else we have (ite (= 0 0) 0 0xffffffff) -> 0. Seems like the optimization did what it was supposed to? I too am confused why we get SAT here.
There was a problem hiding this comment.
That Z3 expression looks like a correct translation of that pattern to my eyes, so I'm confused too.
I think the rule seems correct:
!input + 1is equivalent to-inputin two's-complement.- 0 is the only value which, when negated, leaves the sign bit clear. (
INT_MINleaves the sign bit set; all other cases flip the sign bit.) - Shifting the sign bit down after
x | -xthen gives us 0 if the input was 0, and 1 otherwise. - Subtracting 1 means we get -1 if the input was 0, and 0 otherwise.
- Inverting the result should then give us the right answer.
There was a problem hiding this comment.
And I think this is a reasonable informal proof, by cases:
Prove: ;; bmask(input) = !(((input | ((!input) + 1)) >> 63) - 1) (for 64-bit input)
- Take the three cases
input == 0: we have the above evaluation; the RHS produces0andbmask(0) == 0.input < 0:inputhas MSB = 1, thusinput | ...has MSB = 1. Right-shift by 63, we have1;1 - 1=0; outer bnot gives us all-ones.input > 0:inputhas MSB = 0, thus!inputhas MSB = 1. Furthermore!input != 0xffff...ffff(else we would have hadinput == 0). So!input + 1also has MSB = 1 (because it will not wrap around, because of previous statement). Soinput | ((!input) + 1)has MSB = 1. Right-shift by 63, we have1;1 - 1=0; outer bnot gives us all-ones.
There was a problem hiding this comment.
(@afonso360 a comment to this effect in the source would be very useful for the future!)
There was a problem hiding this comment.
Ah, and after posting I see @jameysharp 's comment too. Both of us arrived at the same reasoning more or less so I'm fairly confident here :-)
There was a problem hiding this comment.
We can extract out more broadly-applicable rules that should compose nicely into this one.
Rewrite both of these patterns to (ineg ty input):
(iadd ty (bnot ty input) (iconst ty (u64_from_imm64 1)))(bnot ty (isub ty input (iconst ty (u64_from_imm64 1))))
In a 64-bit word, -(x >> 63), where right-shift is unsigned, should I think be equivalent to x >> 63 with a signed right-shift. Or in general, rewrite negation of unsigned shifts by N-1 bits in N-bit words.
Then this rule only needs to match (x | -x) >> (N-1), with a signed right-shift.
That should make all these rules more broadly applicable as well as (somewhat) easier to understand.
x | -x leaves the trailing 0s alone and sets the remaining bits to 1. (I found the inverse of this pattern in "Hacker's Delight", on page 12.) I don't think there's a useful rule we can extract from that pattern alone.
There was a problem hiding this comment.
The problem with the smtlib version was the use of !, which is for annotating expressions: switching to not fixes the problem:
(declare-const x (_ BitVec 32))
(assert (not (= (ite (= x #x00000000)
#x00000000
#xffffffff)
;; !(((x | ((!x) + 1)) >> 31) - 1)
(bvnot (bvsub (bvlshr (bvor x
(bvadd (bvnot x)
#x00000001))
#x0000001f)
#x00000001)))))
(check-sat)
(get-model)
produces the following output:
% z3 -smt2 test.smt
unsat
(error "line 14 column 10: model is not available")
fitzgen
left a comment
There was a problem hiding this comment.
LGTM with the nitpicks addressed. Thanks @afonso360!
| (if-let -1 (i64_sextend_imm64 ty k)) | ||
| (bnot ty x)) | ||
|
|
||
| ;; bmask(input) = !(((input | ((!input) + 1)) >> 63) - 1) |
There was a problem hiding this comment.
Nitpick: instead of hard coding 63 lets make this bits(ty) - 1.
Also, lets add a comment with an informal proof / argument about why this is correct, as @cfallin mentioned.
There was a problem hiding this comment.
I think the correctness argument is much less important to write out if this is split into the four separate rules I suggested. They're individually much easier to verify.
|
#6140 has been merged, so the first 2 commits can be removed |
|
That is a really neat way of decomposing this op @jameysharp, Thanks! And thank you everyone for proving this works! I've added some comments that hopefully explain the new rule in a way that makes sense to anyone reading, but it's a lot simpler now. |
jameysharp
left a comment
There was a problem hiding this comment.
I'd like to add various commutative versions of these rules and make some comments consistent with elsewhere in the egraph rules (details below) but even if you merge without any changes I think this PR would be great.
I'm super pleased to see those nine-instruction sequences collapse into just one! And bmask has two-instruction lowerings in many cases on the backends I checked. Overall I find this very satisfying.
I'd like to know if this has any performance impact but I still haven't gotten around to adding myself to the list of people who can use /bench_x64 😅
afonso360
left a comment
There was a problem hiding this comment.
Let's see if this has any impact on benchmarks! It seems a lot easier to match now.
/bench_x64
|
The /bench_x64 thing didn't run but I've just run a few Sightglass benchmarks on this locally. On bz2, pulldown-cmark, and spidermonkey, there's no significant difference between this PR at e1d2f51 and main at 2d25db0, except in compilation speed, where this PR seems to be faster (by <1%) for no obvious reason. (There are a few extra commits on the version of I'm going ahead and merging this. These are good rewrites even if they don't happen to help these particular benchmarks, after all. |
…6196) * egraphs: Add a bmask bit pattern optimization * egraphs: Add more `ineg` rules * egraphs: Add sshr rule * egraphs: Simplify bmask rule * egraphs: Add comutative version of bmask rule * egraphs: Add more testcases * egraphs: Cleanup rule comments * egraphs: Add more `ineg` optimizations
👋 Hey,
This is something I wrote in #5888 and then @elliottt pointed out that It's just
bmask. I checked if we had a rule for this in the mid end and we don't.This is a really long rule, and I'm not sure how often it's going to fire. But I also got that lowering from here so it is out there in the real world.
This is based on top of #6140 so that we don't have to rebase that again.
I'm going to let the fuzzer run on this for a while.