Block order and value number affect whether we get valid CLIF after optimizations #7857

@alexcrichton

Last night we got a fuzz bug. Everything below is relative to Wasmtime at 0d662c9.

This input:

(module
  (func
    (local f32)
    f32.const 100
    f32.sqrt
    i32.const 0
    if
      f32.const 100
      f32.sqrt
      block
        i32.const 1
        br_if 0
        f32.const 0
        local.set 0
      end
      local.get 0
      i32.const 1
      select
      i32.reinterpret_f32
      global.set 0
    end
    i32.reinterpret_f32
    global.set 0
  )
  (global (;0;) (mut i32) i32.const 0)
)

panics in regalloc:

$ cargo run compile -C cache=n -W nan-canonicalization ./foo.wat
...
thread '<unnamed>' panicked at cranelift/codegen/src/machinst/compile.rs:76:14:
register allocation: SSA(VReg(vreg = 198, class = Float), Inst(33))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

With the CLIF verifier enabled, the same input fails earlier with a verifier error:

$ cargo run compile -C cache=n -C cranelift-debug-verifier -W nan-canonicalization ./foo.wat
...
                                    v19 = f32const +NaN
                                    v4 = select v18, v19, v17  ; v19 = +NaN
@0043                               v12 = select v8, v4, v10  ; v8 = 1
;~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
; error: inst12 (v12 = select.f32 v8, v4, v10  ; v8 = 1): uses value arg from non-dominating block4

@0044                               v13 = bitcast.i32 v12
@0049                               store notrap aligned table v13, v0+80
@004b                               jump block1

                                block1:
@004b                               return
}

; 1 verifier error detected (see above). Compilation aborted.

This has been further reduced to this CLIF test case:

test optimize
set enable_verifier=true
set opt_level=speed
target x86_64

function %foo(i64) {
block0(v0: i64):
    v3 = f32const 0x1.900000p6
    v17 = sqrt v3
    v18 = fcmp ne v17, v17
    v19 = f32const +NaN
    v4 = select v18, v19, v17
    v5 = iconst.i32 0
    brif v5, block2, block3

block2:
    v6 = f32const 0x1.900000p6
    v20 = sqrt v6
    v21 = fcmp ne v20, v20
    v22 = f32const +NaN
    v7 = select v21, v22, v20
    v8 = iconst.i32 1
    v2 = f32const 0.0
    brif v8, block4(v2), block5

block5:
    v9 = f32const 0.0
    jump block4(v9)

block4(v10: f32):
    v11 = iconst.i32 1
    v12 = select.f32 v11, v7, v10
    v13 = bitcast.i32 v12
    store notrap aligned table v13, v0+80
    return

block3:
    v15 = bitcast.i32 v4
    store notrap aligned table v15, v0+80
    return
}

which can be reproduced with:

$ cd cranelift && cargo run test ./foo.clif
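
For readers less familiar with the verifier's "non-dominating block" message, here is a minimal sketch of the dominance rule it enforces, applied to the control-flow graph of the reduced test case above. This is not Cranelift's implementation: the function names (`reduced_cfg`, `dominators`, `def_dominates_use`) are hypothetical, the edges are simply read off the `brif`/`jump` instructions in the listing, and the simple iterative set-based algorithm ignores instruction order within a block.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Successor lists for the reduced CLIF test case (block numbers only).
/// Hypothetical helper: edges are transcribed by hand from the listing.
fn reduced_cfg() -> BTreeMap<u32, Vec<u32>> {
    BTreeMap::from([
        (0, vec![2, 3]), // brif v5, block2, block3
        (2, vec![4, 5]), // brif v8, block4(v2), block5
        (5, vec![4]),    // jump block4(v9)
        (4, vec![]),     // return
        (3, vec![]),     // return
    ])
}

/// Iterative dominator sets: dom(entry) = {entry}, and for every other
/// block b, dom(b) = {b} plus the intersection of dom(p) over all
/// predecessors p of b. Repeats until nothing changes.
fn dominators(cfg: &BTreeMap<u32, Vec<u32>>, entry: u32) -> BTreeMap<u32, BTreeSet<u32>> {
    let all: BTreeSet<u32> = cfg.keys().copied().collect();

    // Invert the successor lists into predecessor lists.
    let mut preds: BTreeMap<u32, Vec<u32>> = cfg.keys().map(|&b| (b, Vec::new())).collect();
    for (&b, succs) in cfg {
        for &s in succs {
            preds.get_mut(&s).expect("successor must be a block in the CFG").push(b);
        }
    }

    // Start from the most optimistic assumption for non-entry blocks.
    let mut dom: BTreeMap<u32, BTreeSet<u32>> = cfg
        .keys()
        .map(|&b| (b, if b == entry { BTreeSet::from([entry]) } else { all.clone() }))
        .collect();

    let mut changed = true;
    while changed {
        changed = false;
        for &b in cfg.keys() {
            if b == entry {
                continue;
            }
            let mut meet: Option<BTreeSet<u32>> = None;
            for p in &preds[&b] {
                let pd = &dom[p];
                meet = Some(match meet {
                    None => pd.clone(),
                    Some(mut acc) => {
                        acc.retain(|x| pd.contains(x));
                        acc
                    }
                });
            }
            let mut next = meet.unwrap_or_default();
            next.insert(b);
            if next != dom[&b] {
                dom.insert(b, next);
                changed = true;
            }
        }
    }
    dom
}

/// True if a value defined in `def_block` may legally be used in `use_block`
/// (block-level check only; intra-block ordering is not modeled).
fn def_dominates_use(dom: &BTreeMap<u32, BTreeSet<u32>>, def_block: u32, use_block: u32) -> bool {
    dom[&use_block].contains(&def_block)
}

fn main() {
    let dom = dominators(&reduced_cfg(), 0);
    // v7 is defined in block2 and used in block4: block2 dominates block4.
    println!("block2 dominates block4: {}", def_dominates_use(&dom, 2, 4));
    // v10 is a parameter of block4, so it is only usable inside block4.
    println!("block4 dominates block2: {}", def_dominates_use(&dom, 4, 2));
}
```

Under these assumptions, block4 dominates only itself, so any instruction that reads `v10` has to stay in block4. The verifier error above is consistent with the optimizer having placed the `select` that reads `v10` somewhere block4 does not dominate.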

I've been investigating this with @elliottt and @fitzgen in person for a bit now. So far we've found that any one of the following "fixes" makes the failure go away:

  • One fix is to renumber the original v12 input to v1 in CLIF.
  • Another fix is to move block3 to be beneath block0.
  • The final fix is to change this line to (subsume x)

Naturally, none of these are actual fixes; they only hide symptoms of the "real" issue. We're still figuring things out, but I wanted to open this issue in the meantime.

Trevor and Nick also tell me that this is possibly related to #6126.


Labels: bug (incorrect behavior in the current implementation that needs fixing), cranelift (issues related to the Cranelift code generator), fuzz-bug (bugs found by a fuzzer)
