Skip to content

egraphs: Add bmask bit pattern optimization rule#6196

Merged
jameysharp merged 8 commits intobytecodealliance:mainfrom
afonso360:long-bmask
Apr 14, 2023
Merged

egraphs: Add bmask bit pattern optimization rule#6196
jameysharp merged 8 commits intobytecodealliance:mainfrom
afonso360:long-bmask

Conversation

@afonso360
Copy link
Contributor

@afonso360 afonso360 commented Apr 11, 2023

👋 Hey,

This is something I wrote in #5888 and then @elliottt pointed out that It's just bmask. I checked if we had a rule for this in the mid end and we don't.

This is a really long rule, and I'm not sure how often it's going to fire. But I also got that lowering from here so it is out there in the real world.

This is based on top of #6140 so that we don't have to rebase that again.

I'm going to let the fuzzer run on this for a while.

@afonso360 afonso360 requested a review from a team as a code owner April 11, 2023 10:54
@afonso360 afonso360 requested review from fitzgen and removed request for a team April 11, 2023 10:54
@afonso360 afonso360 force-pushed the long-bmask branch 4 times, most recently from aa76a67 to 56ed912 Compare April 11, 2023 11:24
@github-actions github-actions bot added the cranelift Issues related to the Cranelift code generator label Apr 11, 2023
Comment on lines 96 to 108
;; bmask(input) = !(((input | ((!input) + 1)) >> 63) - 1)
(rule (simplify (bnot ty
(isub ty
(ushr ty
(bor ty
input
(iadd ty
(bnot ty input)
(iconst ty (u64_from_imm64 1))))
(iconst ty (u64_from_imm64 shift_amt)))
(iconst ty (u64_from_imm64 1)))))
(if-let $true (u64_eq shift_amt (u64_sub (ty_bits ty) 1)))
(bmask ty input))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually correct? I couldn't convince myself, so I asked Z3 to find an x for which bmask(x) and !(((x | ((!x) + 1)) >> (ty_bits - 1)) - 1) were not equal (that is, I asked it to find a counterexample):

(declare-const x (_ BitVec 32))

(assert (! (= (ite (= x #x00000000)
                   #x00000000
                   #xffffffff)
              ;; !(((x | ((!x) + 1)) >> 31) - 1)
              (bvnot (bvsub (bvlshr (bvor x
                                          (bvadd (bvnot x)
                                                 #x00000001))
                                    #x0000001f)
                            #x00000001)))))

(check-sat)
(get-model)

It reports sat, meaning that it found a counterexample, and it says the counterexample is zero:

sat
(
  (define-fun x () (_ BitVec 32)
    #x00000000)
)

So unless I'm missing something and/or translated the expression incorrectly, I think this optimization is incorrect?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then again, there is an explicit test for 0 in the run tests, so I must have done something wrong...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SMT counterexample is confusing me too:

  • x1 := (bvnot x) with x := 0 -> 0xffffffff
  • x2 := (bvadd x1 1) -> 0
  • x3 := (bvor x2 x) -> (bvor 0 0) -> 0
  • x4 := (bvlshr x3 31) -> 0
  • x5 := (bvsub x4 1) -> 0xffffffff
  • x6 := (bvnot x5) -> 0

and on the LHS for the if-then-else we have (ite (= 0 0) 0 0xffffffff) -> 0. Seems like the optimization did what it was supposed to? I too am confused why we get SAT here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That Z3 expression looks like a correct translation of that pattern to my eyes, so I'm confused too.

I think the rule seems correct:

  • !input + 1 is equivalent to -input in two's-complement.
  • 0 is the only value which, when negated, leaves the sign bit clear. (INT_MIN leaves the sign bit set; all other cases flip the sign bit.)
  • Shifting the sign bit down after x | -x then gives us 0 if the input was 0, and 1 otherwise.
  • Subtracting 1 means we get -1 if the input was 0, and 0 otherwise.
  • Inverting the result should then give us the right answer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I think this is a reasonable informal proof, by cases:

Prove: ;; bmask(input) = !(((input | ((!input) + 1)) >> 63) - 1) (for 64-bit input)

  • Take the three cases
    • input == 0: we have the above evaluation; the RHS produces 0 and bmask(0) == 0.
    • input < 0: input has MSB = 1, thus input | ... has MSB = 1. Right-shift by 63, we have 1; 1 - 1 = 0; outer bnot gives us all-ones.
    • input > 0: input has MSB = 0, thus !input has MSB = 1. Furthermore !input != 0xffff...ffff (else we would have had input == 0). So !input + 1 also has MSB = 1 (because it will not wrap around, because of previous statement). So input | ((!input) + 1) has MSB = 1. Right-shift by 63, we have 1; 1 - 1 = 0; outer bnot gives us all-ones.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(@afonso360 a comment to this effect in the source would be very useful for the future!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, and after posting I see @jameysharp 's comment too. Both of us arrived at the same reasoning more or less so I'm fairly confident here :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can extract out more broadly-applicable rules that should compose nicely into this one.

Rewrite both of these patterns to (ineg ty input):

  • (iadd ty (bnot ty input) (iconst ty (u64_from_imm64 1)))
  • (bnot ty (isub ty input (iconst ty (u64_from_imm64 1))))

In a 64-bit word, -(x >> 63), where right-shift is unsigned, should I think be equivalent to x >> 63 with a signed right-shift. Or in general, rewrite negation of unsigned shifts by N-1 bits in N-bit words.

Then this rule only needs to match (x | -x) >> (N-1), with a signed right-shift.

That should make all these rules more broadly applicable as well as (somewhat) easier to understand.

x | -x leaves the trailing 0s alone and sets the remaining bits to 1. (I found the inverse of this pattern in "Hacker's Delight", on page 12.) I don't think there's a useful rule we can extract from that pattern alone.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with the smtlib version was the use of !, which is for annotating expressions: switching to not fixes the problem:

(declare-const x (_ BitVec 32))                                                  
                                                                                 
(assert (not (= (ite (= x #x00000000)                                            
                   #x00000000                                                    
                   #xffffffff)                                                   
              ;; !(((x | ((!x) + 1)) >> 31) - 1)                                 
              (bvnot (bvsub (bvlshr (bvor x                                      
                                          (bvadd (bvnot x)                       
                                                 #x00000001))                    
                                    #x0000001f)                                  
                            #x00000001)))))                                      
                                                                                 
(check-sat)                                                                      
(get-model)  

produces the following output:

% z3 -smt2 test.smt
unsat
(error "line 14 column 10: model is not available")

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with the nitpicks addressed. Thanks @afonso360!

(if-let -1 (i64_sextend_imm64 ty k))
(bnot ty x))

;; bmask(input) = !(((input | ((!input) + 1)) >> 63) - 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: instead of hard coding 63 lets make this bits(ty) - 1.

Also, lets add a comment with an informal proof / argument about why this is correct, as @cfallin mentioned.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the correctness argument is much less important to write out if this is split into the four separate rules I suggested. They're individually much easier to verify.

@Kmeakin
Copy link
Contributor

Kmeakin commented Apr 12, 2023

#6140 has been merged, so the first 2 commits can be removed

@afonso360
Copy link
Contributor Author

That is a really neat way of decomposing this op @jameysharp, Thanks! And thank you everyone for proving this works!

I've added some comments that hopefully explain the new rule in a way that makes sense to anyone reading, but it's a lot simpler now.

Copy link
Contributor

@jameysharp jameysharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to add various commutative versions of these rules and make some comments consistent with elsewhere in the egraph rules (details below) but even if you merge without any changes I think this PR would be great.

I'm super pleased to see those nine-instruction sequences collapse into just one! And bmask has two-instruction lowerings in many cases on the backends I checked. Overall I find this very satisfying.

I'd like to know if this has any performance impact but I still haven't gotten around to adding myself to the list of people who can use /bench_x64 😅

Copy link
Contributor Author

@afonso360 afonso360 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see if this has any impact on benchmarks! It seems a lot easier to match now.

/bench_x64

@jameysharp
Copy link
Contributor

The /bench_x64 thing didn't run but I've just run a few Sightglass benchmarks on this locally. On bz2, pulldown-cmark, and spidermonkey, there's no significant difference between this PR at e1d2f51 and main at 2d25db0, except in compilation speed, where this PR seems to be faster (by <1%) for no obvious reason. (There are a few extra commits on the version of main that I compared against, so it's possible one of those slowed compilation down.)

I'm going ahead and merging this. These are good rewrites even if they don't happen to help these particular benchmarks, after all.

@jameysharp jameysharp added this pull request to the merge queue Apr 14, 2023
Merged via the queue into bytecodealliance:main with commit 9e1ff97 Apr 14, 2023
eduardomourar pushed a commit to eduardomourar/wasmtime that referenced this pull request Apr 16, 2023
…6196)

* egraphs: Add a bmask bit pattern optimization

* egraphs: Add more `ineg` rules

* egraphs: Add sshr rule

* egraphs: Simplify bmask rule

* egraphs: Add comutative version of bmask rule

* egraphs: Add more testcases

* egraphs: Cleanup rule comments

* egraphs: Add more `ineg` optimizations
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift Issues related to the Cranelift code generator

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants