
Make GPU kernel compilation caching consistent across GPU backends. #5546

Merged
zvookin merged 46 commits into master from gpu_context_consistency2 on Feb 1, 2021
Conversation

@zvookin
Member

@zvookin zvookin commented Dec 11, 2020

This is a continuation of #5474 .

Move to using common code for kernel compilation caching for CUDA, OpenCL, Metal, and D3D12 GPU runtimes. New caching endeavors to be robust in not using a kernel compiled for one context on another and uses a hash table to avoid small allocations across multiple pages of VM. OpenCL was particularly broken in that code using two contexts was almost guaranteed to fail. This PR also opens the door to allowing better client control of caching, such as setting a size limit or allowing eviction of specific kernels, and is pretty close to allowing runtime overloads of the kernel compilation itself to allow persistent caching across process invocations for GPU APIs that allow this. (The compile_kernel function in multiple files needs to be promoted to a client visible runtime overload for each GPU API.)

Tests are added to cover many kernels and more than one context. A test that uses multiple contexts across multiple threads both exercises behavior that did not necessarily work before and provides an example of a common use case.
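The shape of that multi-context, multi-threaded use case can be illustrated with a minimal sketch (hypothetical names, not the actual test or runtime code): several threads, each with its own context, sharing one mutex-guarded cache, where repeated lookups within a context must always return that context's own module.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// A shared cache keyed per (context, kernel name); the mutex makes it safe
// to use from several threads holding different contexts at once.
struct SharedKernelCache {
    std::mutex mu;
    std::map<std::pair<int, std::string>, int> cache;  // -> module handle
    int next_handle = 1;

    int get_or_compile(int ctx, const std::string &name) {
        std::lock_guard<std::mutex> lock(mu);
        auto key = std::make_pair(ctx, name);
        auto it = cache.find(key);
        if (it != cache.end()) return it->second;
        int handle = next_handle++;  // stand-in for a real compile
        cache[key] = handle;
        return handle;
    }
};

// Each thread repeatedly requests the same kernel under its own context;
// within one context every request must yield the same module handle.
inline bool run_multi_context_threads(int n_threads, int n_iters) {
    SharedKernelCache cache;
    std::vector<std::thread> threads;
    std::vector<int> ok(n_threads, 0);
    for (int t = 0; t < n_threads; t++) {
        threads.emplace_back([&cache, &ok, n_iters, t] {
            int first = cache.get_or_compile(t, "my_kernel");
            bool all_same = true;
            for (int i = 0; i < n_iters; i++) {
                all_same = all_same && (cache.get_or_compile(t, "my_kernel") == first);
            }
            ok[t] = all_same ? 1 : 0;
        });
    }
    for (auto &th : threads) th.join();
    for (int t = 0; t < n_threads; t++) {
        if (!ok[t]) return false;
    }
    return true;
}
```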

Two small fixes to CUDA prevent a crash in a very rare error case and make device release work when the CUDA library is linked directly into the app. (The latter would previously have shown up as a crash under static linking due to allocation caching, because the code that releases allocations when freeing a context did not run.)

OpenGL and OpenGLCompute are not addressed in this PR, due both to time limitations and to more significant issues in these runtimes around this area. OpenGL is basically a Superfund site at this point and should be deleted. OpenGLCompute may or may not be worth preserving, though it needs similar work on how kernels are communicated to the runtime and compiled.

Kernel compilations are now reference counted: a compilation is marked as held when the initialize-kernels call is made for a filter, and released via a new finalization call made in the destructor section of a filter invocation. This is required to get both object lifetime and multi-context cache release working with per-device cached APIs such as Metal and D3D12.
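The acquire/release discipline above might look roughly like this (an illustrative sketch only, with hypothetical names, not the actual runtime code): each cached module carries a use count, bumped when a filter's kernels are initialized and dropped by the matching finalization call, and a module is only eligible for unloading once no filter holds it.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical cached module with a use count tracking how many live
// filter invocations currently require it to stay defined.
struct KernelModule {
    int use_count = 0;
};

struct ModuleCache {
    std::map<uint64_t, KernelModule> modules;

    // Called when a filter's kernels are initialized: mark the module held.
    void acquire(uint64_t id) {
        modules[id].use_count++;
    }

    // Called from the filter's finalization path: drop the hold.
    void release(uint64_t id) {
        KernelModule &m = modules[id];
        assert(m.use_count > 0);
        m.use_count--;
    }

    // Eviction (e.g. when tearing down a Metal/D3D12 device's cache) is
    // only legal for modules no filter currently holds.
    bool can_unload(uint64_t id) const {
        auto it = modules.find(id);
        return it != modules.end() && it->second.use_count == 0;
    }
};
```

The key property is that unloading is gated on the count reaching zero, which is what lets per-device caches be torn down safely while another thread's filter invocation is still in flight.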

Opportunistic fix to a syntax error in the output of the C++ codegen back end.

Z Stern added 16 commits December 7, 2020 13:01
…all by removing Comdat IR annotations in runtime on Mac OS and iOS.
cache for kernels. Introduces a finalization routine for kernel
compilation to indicate when kernels are not strictly required to be
defined, thus allowing them to be unloaded or discarded, but not while
they are needed.
Quick fix for syntax error in C codegen.

Tab fixes.

Makefile fixes.
Contributor

@steven-johnson steven-johnson left a comment


LGTM pending green

@alexreinking alexreinking added this to the v12.0.0 milestone Dec 11, 2020
@steven-johnson
Contributor

Updated to master just to tickle the buildbots.

@steven-johnson
Contributor

https://buildbot.halide-lang.org/master/#/builders/25/builds/15

CMake Error at cmake/AddCudaToTarget.cmake:3 (target_link_libraries):
Cannot specify link libraries for target
"generator_aot_multi_context_threaded" which is not built by this project.
Call Stack (most recent call first):
test/generator/CMakeLists.txt:127 (add_cuda_to_target)

Z Stern and others added 6 commits January 6, 2021 02:04
@steven-johnson
Contributor

Looks like async_device_copy is failing on Linux for host-opencl

@steven-johnson
Contributor

Looks like we now have only one failure: correctness_gpu_many_kernels for D3D12Compute

@steven-johnson
Contributor

At this point I assume we want to bring in the D3D12 experts to help figure out the last gotcha here?

@shoaibkamil
Contributor

Going to take a look at the correctness_gpu_many_kernels failure after SIGGRAPH submissions (unless @slomp gets to it first).

@slomp
Contributor

slomp commented Jan 25, 2021

I think I can give it a shot tomorrow!

@slomp
Contributor

slomp commented Jan 26, 2021

Status report:

So far, it's still inconclusive...:(
From the d3d12 tracelog, a bunch of code gets executed, and eventually, there's a new call to d3d12_create_context().
In there, the crash seems to happen after new_command_queue().

Curiously, I can build the project with msbuild from the command-line and run the executable.
When I try to attach the debugger, or run it directly from Visual Studio, I get an obscure vcruntime error when Halide.dll is being loaded.

@slomp
Contributor

slomp commented Jan 26, 2021

Status report:

OK, the issue seems to be related to releasing the device and what happens the next time d3d12_create_context() is called.
If I comment out the following line, it runs to completion without issues:

device->device_release(nullptr, device);

I'll investigate further.

@slomp
Contributor

slomp commented Jan 26, 2021

Ok, I think I found the issue (a very silly one).
I took the liberty of pushing the changes directly to this branch.

@slomp
Contributor

slomp commented Jan 27, 2021

As for the new failure case: correctness_interpreter runs just fine here for me.

@steven-johnson
Contributor

correctness_interpreter is clearly a flake of some sort -- it's due to be investigated after SIGGRAPH deadlines pass. It shouldn't block landing this.

@steven-johnson
Contributor

Ready to land?

@slomp
Contributor

slomp commented Jan 29, 2021

Ready to land?

Fine by me!

@zvookin zvookin merged commit 9743fca into master Feb 1, 2021
@zvookin zvookin deleted the gpu_context_consistency2 branch February 1, 2021 21:00