Description
To reproduce, run correctness_gpu_dynamic_shared with HL_JIT_TARGET=host-cuda-disable_llvm_loop_opt; on (at least) x86-64-Linux systems, the test crashes with an illegal memory access. (Note that only the case in the test with per_thread=1, memory_type=GPUShared fails; editing the test to run only this case makes debugging a bit simpler.)
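For convenience, the reproduction step might be scripted roughly as below. This is only a sketch: the `TEST_BIN` variable and its default path are assumptions (the location of the test binary depends on how Halide was built), while the `HL_JIT_TARGET` value is the one given above.

```shell
# Hypothetical location of the test binary; override TEST_BIN to match your build tree.
TEST_BIN="${TEST_BIN:-./correctness_gpu_dynamic_shared}"

# JIT target from the report: host CPU + CUDA, with LLVM loop optimizations disabled.
export HL_JIT_TARGET=host-cuda-disable_llvm_loop_opt

if [ -x "$TEST_BIN" ]; then
    # Capture the exit status without aborting the script if the test crashes.
    status=0
    "$TEST_BIN" || status=$?
    echo "exit status: $status"
else
    echo "test binary not found at $TEST_BIN" >&2
fi
```

A nonzero exit status on an affected machine would be consistent with the crash described here; it does not by itself say whether LLVM or the driver is at fault.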
It's not at all clear yet whether the culprit here is in LLVM or in the NVIDIA driver. (It's almost certainly not Halide per se, as our IR is identical whether you use disable_llvm_loop_opt or not.)
@abadams and I both suspect the driver, because:
- we've only seen this on x86-64-linux systems running recent "real" NVidia drivers (not the open-source variant)
- looking at the PTX disassembly and hand-walking through it doesn't show anything obviously wrong to our eyes
- the failure appears to be a write that is one-past-the-end of the shared memory block
- running under cuda-memcheck and cuda-gdb hasn't enlightened us any further
- the same behavior is seen when building Halide with LLVM 11, 12, or 13
We'd like to run this to ground so that we can consider landing #5019, but are a bit at a loss as to how to do so. The next step might be to see if we have a contact inside NVIDIA (or perhaps a PTX ninja who knows more than we do) to help take a look.