egraphs: Add `bmask` bit pattern optimization rule by afonso360 · Pull Request #6196 · bytecodealliance/wasmtime

afonso360 · 2023-04-11T10:54:44Z

👋 Hey,

This is something I wrote in #5888 and then @elliottt pointed out that It's just bmask. I checked if we had a rule for this in the mid end and we don't.

This is a really long rule, and I'm not sure how often it's going to fire. But I also got that lowering from here so it is out there in the real world.

This is based on top of #6140 so that we don't have to rebase that again.

I'm going to let the fuzzer run on this for a while.

fitzgen · 2023-04-11T17:59:44Z

cranelift/codegen/src/opts/bitops.isle

+;; bmask(input) = !(((input | ((!input) + 1)) >> 63) - 1)
+(rule (simplify (bnot ty
+                      (isub ty
+                            (ushr ty
+                                  (bor ty
+                                        input
+                                        (iadd ty
+                                              (bnot ty input)
+                                              (iconst ty (u64_from_imm64 1))))
+                                  (iconst ty (u64_from_imm64 shift_amt)))
+                            (iconst ty (u64_from_imm64 1)))))
+      (if-let $true (u64_eq shift_amt (u64_sub (ty_bits ty) 1)))
+      (bmask ty input))


Is this actually correct? I couldn't convince myself, so I asked Z3 to find an x for which bmask(x) and !(((x | ((!x) + 1)) >> (ty_bits - 1)) - 1) were not equal (that is, I asked it to find a counterexample):

(declare-const x (_ BitVec 32)) (assert (! (= (ite (= x #x00000000) #x00000000 #xffffffff) ;; !(((x | ((!x) + 1)) >> 31) - 1) (bvnot (bvsub (bvlshr (bvor x (bvadd (bvnot x) #x00000001)) #x0000001f) #x00000001))))) (check-sat) (get-model)

It reports sat, meaning that it found a counterexample, and it says the counterexample is zero:

sat ( (define-fun x () (_ BitVec 32) #x00000000) )

So unless I'm missing something and/or translated the expression incorrectly, I think this optimization is incorrect?

Then again, there is an explicit test for 0 in the run tests, so I must have done something wrong...

The SMT counterexample is confusing me too:

x1 := (bvnot x) with x := 0 -> 0xffffffff

x2 := (bvadd x1 1) -> 0

x3 := (bvor x2 x) -> (bvor 0 0) -> 0

x4 := (bvlshr x3 31) -> 0

x5 := (bvsub x4 1) -> 0xffffffff

x6 := (bvnot x5) -> 0

and on the LHS for the if-then-else we have (ite (= 0 0) 0 0xffffffff) -> 0. Seems like the optimization did what it was supposed to? I too am confused why we get SAT here.

That Z3 expression looks like a correct translation of that pattern to my eyes, so I'm confused too.

I think the rule seems correct:

!input + 1 is equivalent to -input in two's-complement.

0 is the only value which, when negated, leaves the sign bit clear. (INT_MIN leaves the sign bit set; all other cases flip the sign bit.)

Shifting the sign bit down after x | -x then gives us 0 if the input was 0, and 1 otherwise.

Subtracting 1 means we get -1 if the input was 0, and 0 otherwise.

Inverting the result should then give us the right answer.

And I think this is a reasonable informal proof, by cases:

Prove: ;; bmask(input) = !(((input | ((!input) + 1)) >> 63) - 1) (for 64-bit input)

Take the three cases

input == 0: we have the above evaluation; the RHS produces 0 and bmask(0) == 0.

input < 0: input has MSB = 1, thus input | ... has MSB = 1. Right-shift by 63, we have 1; 1 - 1 = 0; outer bnot gives us all-ones.

input > 0: input has MSB = 0, thus !input has MSB = 1. Furthermore !input != 0xffff...ffff (else we would have had input == 0). So !input + 1 also has MSB = 1 (because it will not wrap around, because of previous statement). So input | ((!input) + 1) has MSB = 1. Right-shift by 63, we have 1; 1 - 1 = 0; outer bnot gives us all-ones.

(@afonso360 a comment to this effect in the source would be very useful for the future!)

Ah, and after posting I see @jameysharp 's comment too. Both of us arrived at the same reasoning more or less so I'm fairly confident here :-)

We can extract out more broadly-applicable rules that should compose nicely into this one.

Rewrite both of these patterns to (ineg ty input):

(iadd ty (bnot ty input) (iconst ty (u64_from_imm64 1)))

(bnot ty (isub ty input (iconst ty (u64_from_imm64 1))))

In a 64-bit word, -(x >> 63), where right-shift is unsigned, should I think be equivalent to x >> 63 with a signed right-shift. Or in general, rewrite negation of unsigned shifts by N-1 bits in N-bit words.

Then this rule only needs to match (x | -x) >> (N-1), with a signed right-shift.

That should make all these rules more broadly applicable as well as (somewhat) easier to understand.

x | -x leaves the trailing 0s alone and sets the remaining bits to 1. (I found the inverse of this pattern in "Hacker's Delight", on page 12.) I don't think there's a useful rule we can extract from that pattern alone.

The problem with the smtlib version was the use of !, which is for annotating expressions: switching to not fixes the problem:

(declare-const x (_ BitVec 32)) (assert (not (= (ite (= x #x00000000) #x00000000 #xffffffff) ;; !(((x | ((!x) + 1)) >> 31) - 1) (bvnot (bvsub (bvlshr (bvor x (bvadd (bvnot x) #x00000001)) #x0000001f) #x00000001))))) (check-sat) (get-model)

produces the following output:

% z3 -smt2 test.smt unsat (error "line 14 column 10: model is not available")

fitzgen

LGTM with the nitpicks addressed. Thanks @afonso360!

fitzgen · 2023-04-11T19:07:10Z

cranelift/codegen/src/opts/bitops.isle

  (if-let -1 (i64_sextend_imm64 ty k))
  (bnot ty x))
+
+;; bmask(input) = !(((input | ((!input) + 1)) >> 63) - 1)


Nitpick: instead of hard coding 63 lets make this bits(ty) - 1.

Also, lets add a comment with an informal proof / argument about why this is correct, as @cfallin mentioned.

I think the correctness argument is much less important to write out if this is split into the four separate rules I suggested. They're individually much easier to verify.

Kmeakin · 2023-04-12T17:11:44Z

#6140 has been merged, so the first 2 commits can be removed

afonso360 · 2023-04-13T13:13:08Z

That is a really neat way of decomposing this op @jameysharp, Thanks! And thank you everyone for proving this works!

I've added some comments that hopefully explain the new rule in a way that makes sense to anyone reading, but it's a lot simpler now.

jameysharp

I'd like to add various commutative versions of these rules and make some comments consistent with elsewhere in the egraph rules (details below) but even if you merge without any changes I think this PR would be great.

I'm super pleased to see those nine-instruction sequences collapse into just one! And bmask has two-instruction lowerings in many cases on the backends I checked. Overall I find this very satisfying.

I'd like to know if this has any performance impact but I still haven't gotten around to adding myself to the list of people who can use /bench_x64 😅

cranelift/codegen/src/opts/bitops.isle

cranelift/codegen/src/opts/arithmetic.isle

cranelift/filetests/filetests/runtests/bitops.clif

cranelift/codegen/src/opts/arithmetic.isle

afonso360

Let's see if this has any impact on benchmarks! It seems a lot easier to match now.

/bench_x64

jameysharp · 2023-04-14T18:50:39Z

The /bench_x64 thing didn't run but I've just run a few Sightglass benchmarks on this locally. On bz2, pulldown-cmark, and spidermonkey, there's no significant difference between this PR at e1d2f51 and main at 2d25db0, except in compilation speed, where this PR seems to be faster (by <1%) for no obvious reason. (There are a few extra commits on the version of main that I compared against, so it's possible one of those slowed compilation down.)

I'm going ahead and merging this. These are good rewrites even if they don't happen to help these particular benchmarks, after all.

…6196) * egraphs: Add a bmask bit pattern optimization * egraphs: Add more `ineg` rules * egraphs: Add sshr rule * egraphs: Simplify bmask rule * egraphs: Add comutative version of bmask rule * egraphs: Add more testcases * egraphs: Cleanup rule comments * egraphs: Add more `ineg` optimizations

afonso360 requested a review from a team as a code owner April 11, 2023 10:54

afonso360 requested review from fitzgen and removed request for a team April 11, 2023 10:54

afonso360 force-pushed the long-bmask branch 4 times, most recently from aa76a67 to 56ed912 Compare April 11, 2023 11:24

github-actions bot added the cranelift Issues related to the Cranelift code generator label Apr 11, 2023

fitzgen reviewed Apr 11, 2023

View reviewed changes

fitzgen approved these changes Apr 11, 2023

View reviewed changes

afonso360 added 2 commits April 13, 2023 14:10

egraphs: Add a bmask bit pattern optimization

e1d6b5a

egraphs: Add more ineg rules

31fb6ed

afonso360 force-pushed the long-bmask branch from 56ed912 to 1ded3a9 Compare April 13, 2023 13:10

afonso360 force-pushed the long-bmask branch from 1ded3a9 to e782154 Compare April 13, 2023 14:09

afonso360 added 2 commits April 13, 2023 15:16

egraphs: Add sshr rule

e0afbab

egraphs: Simplify bmask rule

36910b6

afonso360 force-pushed the long-bmask branch from e782154 to 36910b6 Compare April 13, 2023 14:17

jameysharp approved these changes Apr 13, 2023

View reviewed changes

afonso360 added 4 commits April 13, 2023 20:15

egraphs: Add comutative version of bmask rule

6190b97

egraphs: Add more testcases

498abbb

egraphs: Cleanup rule comments

7da307b

egraphs: Add more ineg optimizations

e1d2f51

afonso360 commented Apr 13, 2023

View reviewed changes

jameysharp added this pull request to the merge queue Apr 14, 2023

Merged via the queue into bytecodealliance:main with commit 9e1ff97 Apr 14, 2023

Conversation

afonso360 commented Apr 11, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fitzgen left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Kmeakin commented Apr 12, 2023

Uh oh!

afonso360 commented Apr 13, 2023

Uh oh!

jameysharp left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

afonso360 left a comment

Choose a reason for hiding this comment

Uh oh!

jameysharp commented Apr 14, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

afonso360 commented Apr 11, 2023 •

edited

Loading