Using `DisableLLVMLoopOpt` can generate crashy CUDA code #6061

To see this, run `correctness_gpu_dynamic_shared` with `HL_JIT_TARGET=host-cuda-disable_llvm_loop_opt`; on (at least) x86-64 Linux systems, the test crashes with an illegal memory access. (Note that only the case in the test with `per_thread=1`, `memory_type=GPUShared` fails; editing the test to run only this case makes debugging a bit simpler.)
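For reference, a minimal sketch of that repro as a shell snippet. The `HALIDE_BUILD_DIR` variable and the location of the test binary inside the build tree are assumptions for illustration, not something this issue specifies; adjust them to match your build.

```shell
# Assumption: HALIDE_BUILD_DIR is a hypothetical variable pointing at your
# Halide build tree, and the test binary path below is an assumed layout.
HALIDE_BUILD_DIR="${HALIDE_BUILD_DIR:-build}"
TEST_BIN="${HALIDE_BUILD_DIR}/test/correctness/correctness_gpu_dynamic_shared"

# The JIT target string from this issue: host + CUDA + disable_llvm_loop_opt.
export HL_JIT_TARGET=host-cuda-disable_llvm_loop_opt

if [ -x "$TEST_BIN" ]; then
    # Run the test and report its exit status; a nonzero status is expected
    # on affected systems.
    "$TEST_BIN"
    echo "exit status: $?"
else
    echo "test binary not found at $TEST_BIN" >&2
fi
```

Running the same binary with `HL_JIT_TARGET=host-cuda` should give a baseline for comparison, since the only difference is the `disable_llvm_loop_opt` feature.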

It's not at all clear yet whether the culprit here is in LLVM or in the NVIDIA driver. (It's almost certainly not Halide per se, as our IR is identical whether you use `disable_llvm_loop_opt` or not.)

@abadams and I both suspect the driver, because:
- we've only seen this on x86-64 Linux systems running recent "real" NVIDIA drivers (not the open-source variant)
- hand-walking through the PTX disassembly doesn't show anything obviously wrong to our eyes
- the failure appears to be a write that is one past the end of the shared memory block
- running under cuda-memcheck and cuda-gdb hasn't enlightened us any further
- the same behavior is seen when building Halide with LLVM 11, 12, and 13

We'd like to run this to ground so that we can consider landing #5019, but we are a bit at a loss as to how to do so. The next step might be to see whether we have a contact inside NVIDIA (or perhaps a PTX ninja who knows more than we do) who could help take a look.

