Hey,
I'm seeing crashes during `finalize_definitions` calls related to x86_64 call relocations:

```
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: TryFromIntError(())', cranelift/jit/src/compiled_blob.rs:55:80
```
Cranelift emits 32-bit relocations for calls on x86_64, and can therefore only address targets within a relative ±2 GiB range. Code memory is allocated with the normal system allocator, which may place different allocations in distant parts of the address space.
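To illustrate where the panic comes from, here is a simplified sketch of the kind of conversion the relocation patching performs (this is my own illustration with made-up names, not Cranelift's actual code in `compiled_blob.rs`):

```rust
use std::num::TryFromIntError;

// Sketch: patching an x86_64 call site means writing `target - call_site`
// as a signed 32-bit displacement into the instruction stream.
fn pcrel32(call_site: usize, target: usize) -> Result<i32, TryFromIntError> {
    let diff = (target as isize).wrapping_sub(call_site as isize);
    // This conversion is what panics (via `.unwrap()`) when the two
    // allocations end up more than ±2 GiB apart.
    i32::try_from(diff)
}

fn main() {
    // Nearby allocations: the displacement fits in 32 bits.
    assert!(pcrel32(0x1000, 0x2000).is_ok());
    // Allocations more than 2 GiB apart: the conversion fails.
    assert!(pcrel32(0x1000, 0x1_0000_0000).is_err());
    println!("ok");
}
```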
I'm seeing irregular crashes in a heavily multithreaded program, but the problem can be reproduced with this abridged `jit-minimal.rs` example:
```rust
use cranelift::prelude::*;
use cranelift_codegen::settings;
use cranelift_jit::{JITBuilder, JITModule};
use cranelift_module::{default_libcall_names, Linkage, Module};

fn main() {
    let isa_builder = cranelift_native::builder().unwrap();
    let isa = isa_builder
        .finish(settings::Flags::new(settings::builder()))
        .unwrap();
    let mut m = JITModule::new(JITBuilder::with_isa(isa, default_libcall_names()));
    let mut ctx = m.make_context();
    let mut func_ctx = FunctionBuilderContext::new();

    let func_a = m
        .declare_function("a", Linkage::Local, &m.make_signature())
        .unwrap();
    let func_b = m
        .declare_function("b", Linkage::Local, &m.make_signature())
        .unwrap();

    // Define a dummy function `func_a`.
    ctx.func.name = ExternalName::user(0, func_a.as_u32());
    {
        let mut bcx: FunctionBuilder = FunctionBuilder::new(&mut ctx.func, &mut func_ctx);
        let block = bcx.create_block();
        bcx.switch_to_block(block);
        bcx.ins().return_(&[]);
        bcx.seal_all_blocks();
        bcx.finalize();
    }
    m.define_function(func_a, &mut ctx).unwrap();
    m.clear_context(&mut ctx);

    // Allocate a bunch (~4GB) to stretch the address space.
    let mut allocations: Vec<Vec<u8>> = Vec::new();
    for _ in 0..999999 {
        allocations.push(Vec::with_capacity(4096));
    }

    // Define `func_b` in a new allocation and reference `func_a`.
    ctx.func.name = ExternalName::user(0, func_b.as_u32());
    {
        let mut bcx: FunctionBuilder = FunctionBuilder::new(&mut ctx.func, &mut func_ctx);
        let block = bcx.create_block();
        bcx.switch_to_block(block);
        let local_func = m.declare_func_in_func(func_a, &mut bcx.func);
        // Emit a call with a relocation for `func_a`.
        bcx.ins().call(local_func, &[]);
        // Make sure that this function's body is larger than page_size
        // and will require a new allocation.
        for _ in 0..1024 {
            bcx.ins().call(local_func, &[]);
        }
        bcx.ins().return_(&[]);
        bcx.seal_all_blocks();
        bcx.finalize();
    }
    m.define_function(func_b, &mut ctx).unwrap();
    m.clear_context(&mut ctx);

    // Perform linking.
    m.finalize_definitions();
}
```
It might be possible to trigger this from small-ish WebAssembly modules via glibc's mmap threshold, which places allocations larger than 128 KiB outside of the heap, though I haven't had any luck reproducing that because glibc's dynamic threshold scaling raises this limit before code is emitted.
Possible approaches:

- Determine the total size of the finalized code before allocating; allocate one large chunk. An implementation of this seems doable, though I'm not sure whether the current per-function allocation is by design. (This would be incompatible with features like hot function replacement.)
- Don't allocate on the heap. Cranelift's `selinux-fix` feature uses mmap allocations. The underlying issue still persists, though since mmap allocations are separate from the heap, they're mostly sequential and it would take >2GB of generated machine code to cause problems.
- (Change the relocation style? There's no 64-bit relative jump on x86_64, and blowing up code size for this seems like a bad idea.)
AArch64 runs into a related issue with its 26-bit relative jumps: #3277

I'm not sure veneers are applicable to x86_64, but they seem like an interesting and more general approach to relative jump range limits.
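For what it's worth, an x86_64 veneer would be easy to encode: a small island of code within ±2 GiB of the call site that the 32-bit relative call targets, which then jumps to the real destination through an absolute 64-bit address. A hedged sketch of the encoding (my own illustration, not something Cranelift emits today):

```rust
/// Encode a 13-byte x86_64 veneer: `movabs r11, target; jmp r11`.
/// R11 is a scratch register in both the System V and Windows x64 ABIs,
/// so it is safe to clobber on the way into a call target.
fn encode_veneer(target: u64) -> [u8; 13] {
    let mut v = [0u8; 13];
    v[0] = 0x49; // REX.WB prefix
    v[1] = 0xBB; // mov r11, imm64
    v[2..10].copy_from_slice(&target.to_le_bytes());
    v[10] = 0x41; // REX.B prefix
    v[11] = 0xFF; // jmp r/m64 ...
    v[12] = 0xE3; // ... with r11 as the operand
    v
}

fn main() {
    let v = encode_veneer(0x1122_3344_5566_7788);
    assert_eq!(&v[..2], &[0x49, 0xBB]);
    assert_eq!(&v[10..], &[0x41, 0xFF, 0xE3]);
    // imm64 is stored little-endian.
    assert_eq!(v[2], 0x88);
    println!("ok");
}
```

The trade-off is the same one veneers have on AArch64: an extra indirect branch on out-of-range calls, but no code-size cost for the (common) in-range case.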