Hey,
I'm seeing crashes during `finalize_definitions` calls related to x86_64 call relocations:

```
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: TryFromIntError(())', cranelift/jit/src/compiled_blob.rs:55:80
```
Cranelift emits 32-bit relocations for calls on x86_64, and can therefore only address targets within a relative ±2 GiB range. Code memory is allocated with the normal system allocator, which may place different allocations in distant parts of the address space.
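To illustrate where the panic comes from, here is a simplified sketch of the kind of conversion the relocation patching performs (this is my own illustration with made-up names, not Cranelift's actual code in `compiled_blob.rs`):

```rust
use std::num::TryFromIntError;

// Sketch: patching an x86_64 call site means writing `target - call_site`
// as a signed 32-bit displacement into the instruction stream.
fn pcrel32(call_site: usize, target: usize) -> Result<i32, TryFromIntError> {
    let diff = (target as isize).wrapping_sub(call_site as isize);
    // This conversion is what panics (via `.unwrap()`) when the two
    // allocations end up more than ±2 GiB apart.
    i32::try_from(diff)
}

fn main() {
    // Nearby allocations: the displacement fits in 32 bits.
    assert!(pcrel32(0x1000, 0x2000).is_ok());
    // Allocations more than 2 GiB apart: the conversion fails.
    assert!(pcrel32(0x1000, 0x1_0000_0000).is_err());
    println!("ok");
}
```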
I'm seeing irregular crashes in a heavily multithreaded program, but the problem can be reproduced with this abridged `jit-minimal.rs` example:
```rust
use cranelift::prelude::*;
use cranelift_codegen::settings;
use cranelift_jit::{JITBuilder, JITModule};
use cranelift_module::{default_libcall_names, Linkage, Module};

fn main() {
    let isa_builder = cranelift_native::builder().unwrap();
    let isa = isa_builder
        .finish(settings::Flags::new(settings::builder()))
        .unwrap();
    let mut m = JITModule::new(JITBuilder::with_isa(isa, default_libcall_names()));
    let mut ctx = m.make_context();
    let mut func_ctx = FunctionBuilderContext::new();

    let func_a = m
        .declare_function("a", Linkage::Local, &m.make_signature())
        .unwrap();
    let func_b = m
        .declare_function("b", Linkage::Local, &m.make_signature())
        .unwrap();

    // Define a dummy function `func_a`.
    ctx.func.name = ExternalName::user(0, func_a.as_u32());
    {
        let mut bcx: FunctionBuilder = FunctionBuilder::new(&mut ctx.func, &mut func_ctx);
        let block = bcx.create_block();
        bcx.switch_to_block(block);
        bcx.ins().return_(&[]);
        bcx.seal_all_blocks();
        bcx.finalize();
    }
    m.define_function(func_a, &mut ctx).unwrap();
    m.clear_context(&mut ctx);

    // Allocate a bunch (~4GB) to stretch the address space.
    let mut allocations: Vec<Vec<u8>> = Vec::new();
    for _ in 0..999999 {
        allocations.push(Vec::with_capacity(4096));
    }

    // Define `func_b` in a new allocation and reference `func_a`.
    ctx.func.name = ExternalName::user(0, func_b.as_u32());
    {
        let mut bcx: FunctionBuilder = FunctionBuilder::new(&mut ctx.func, &mut func_ctx);
        let block = bcx.create_block();
        bcx.switch_to_block(block);
        let local_func = m.declare_func_in_func(func_a, &mut bcx.func);
        // Emit a call with a relocation for `func_a`.
        bcx.ins().call(local_func, &[]);
        // Make sure that this function's body is larger than page_size
        // and will require a new allocation.
        for _ in 0..1024 {
            bcx.ins().call(local_func, &[]);
        }
        bcx.ins().return_(&[]);
        bcx.seal_all_blocks();
        bcx.finalize();
    }
    m.define_function(func_b, &mut ctx).unwrap();
    m.clear_context(&mut ctx);

    // Perform linking.
    m.finalize_definitions();
}
```
It might be possible to trigger this from small-ish WebAssembly modules via glibc's mmap threshold, which places allocations larger than 128 KiB outside of the heap, though I haven't had any luck reproducing that because glibc's dynamic threshold scaling raises this limit before code is emitted.
Possible approaches:

- Determine the total size of the finalized code before allocating; allocate one large chunk. An implementation of this seems doable, though I'm not sure whether the current per-function allocation is by design. (This would be incompatible with features like hot function replacement.)
- Don't allocate on the heap. Cranelift's `selinux-fix` feature uses mmap allocations. The underlying issue still persists, though since mmap allocations are separate from the heap, they're mostly sequential and it would take >2GB of generated machine code to cause problems.
- (Change the relocation style? There's no 64-bit relative jump on x86_64, and blowing up code size for this seems like a bad idea.)
AArch64 runs into a related issue with its 26-bit relative jumps: #3277

I'm not sure veneers are applicable to x86_64, but they seem like an interesting and more general approach to relative jump range limits.
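For what it's worth, an x86_64 veneer would be easy to encode: a small island of code within ±2 GiB of the call site that the 32-bit relative call targets, which then jumps to the real destination through an absolute 64-bit address. A hedged sketch of the encoding (my own illustration, not something Cranelift emits today):

```rust
/// Encode a 13-byte x86_64 veneer: `movabs r11, target; jmp r11`.
/// R11 is a scratch register in both the System V and Windows x64 ABIs,
/// so it is safe to clobber on the way into a call target.
fn encode_veneer(target: u64) -> [u8; 13] {
    let mut v = [0u8; 13];
    v[0] = 0x49; // REX.WB prefix
    v[1] = 0xBB; // mov r11, imm64
    v[2..10].copy_from_slice(&target.to_le_bytes());
    v[10] = 0x41; // REX.B prefix
    v[11] = 0xFF; // jmp r/m64 ...
    v[12] = 0xE3; // ... with r11 as the operand
    v
}

fn main() {
    let v = encode_veneer(0x1122_3344_5566_7788);
    assert_eq!(&v[..2], &[0x49, 0xBB]);
    assert_eq!(&v[10..], &[0x41, 0xFF, 0xE3]);
    // imm64 is stored little-endian.
    assert_eq!(v[2], 0x88);
    println!("ok");
}
```

The trade-off is the same one veneers have on AArch64: an extra indirect branch on out-of-range calls, but no code-size cost for the (common) in-range case.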