Make GPU kernel compilation caching consistent across GPU backends. by zvookin · Pull Request #5546 · halide/Halide

zvookin · 2020-12-11T01:13:37Z

This is a continuation of #5474.

Move to using common code for kernel compilation caching in the CUDA, OpenCL, Metal, and D3D12 GPU runtimes. The new caching endeavors to be robust in never using a kernel compiled for one context on another, and it uses a hash table to avoid small allocations spread across multiple pages of VM. OpenCL was particularly broken in that code using two contexts was almost guaranteed to fail. This PR also opens the door to better client control of caching, such as setting a size limit or evicting specific kernels. It is also pretty close to allowing runtime overloads of the kernel compilation itself, which would enable persistent caching across process invocations for GPU APIs that support it. (The compile_kernel function in multiple files needs to be promoted to a client-visible runtime overload for each GPU API.)
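To illustrate the idea of a cache that never hands a kernel compiled for one context to another, here is a minimal sketch of an open-addressed hash table keyed on the (context, module id) pair. It is not the code in this PR: `KernelCache`, `CompiledKernel`, `Context`, and `ModuleId` are hypothetical names, the table has a fixed capacity, and there is no eviction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins; none of these names come from the Halide runtime.
using Context = const void *;  // plays the role of CUcontext, cl_context, ...
using ModuleId = uint32_t;     // one id per filter's kernel source

struct CompiledKernel {
    Context context;  // context the kernel was compiled for
    int handle;       // plays the role of CUmodule, cl_program, ...
};

// Fixed-capacity open-addressed table: all slots live in one allocation.
class KernelCache {
public:
    explicit KernelCache(size_t capacity) : slots_(capacity) {
        assert(capacity > 0);
    }

    // Returns the kernel cached for exactly (context, id), or nullptr on a miss.
    const CompiledKernel *lookup(Context context, ModuleId id) const {
        size_t i = hash(context, id) % slots_.size();
        for (size_t probes = 0; probes < slots_.size(); probes++) {
            const Slot &s = slots_[i];
            if (!s.used) return nullptr;  // empty slot ends the probe chain
            if (s.id == id && s.kernel.context == context) return &s.kernel;
            i = (i + 1) % slots_.size();
        }
        return nullptr;
    }

    // Inserts or overwrites the entry for (kernel.context, id).
    // Returns false if the table is full.
    bool insert(ModuleId id, const CompiledKernel &kernel) {
        size_t i = hash(kernel.context, id) % slots_.size();
        for (size_t probes = 0; probes < slots_.size(); probes++) {
            Slot &s = slots_[i];
            if (!s.used || (s.id == id && s.kernel.context == kernel.context)) {
                s.used = true;
                s.id = id;
                s.kernel = kernel;
                return true;
            }
            i = (i + 1) % slots_.size();
        }
        return false;
    }

private:
    struct Slot {
        bool used = false;
        ModuleId id = 0;
        CompiledKernel kernel{nullptr, 0};
    };

    // Simple illustrative hash mixing the context pointer and the module id.
    static size_t hash(Context context, ModuleId id) {
        return static_cast<size_t>(reinterpret_cast<uintptr_t>(context) * 31u + id);
    }

    std::vector<Slot> slots_;
};
```

Because the context is part of the key, a lookup for the same module id under a second context is a miss rather than a hit on the first context's kernel, which is the failure mode described for OpenCL above.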

Tests are added to cover many kernels and more than one context. A test using multiple contexts across multiple threads both tests things that didn't necessarily work before and provides an example for a common use case.
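The shape of a multi-context, multi-thread check can be sketched roughly as follows. This is only an illustration and not the test added in this PR: `FakeContext`, `FakeKernel`, `SharedKernelCache`, and `count_cross_context_hits` are hypothetical names, and "compiling" just records which context owns the kernel.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-ins for a GPU context and a compiled kernel.
struct FakeContext {
    int id;
};

struct FakeKernel {
    const FakeContext *owner;  // context the kernel was "compiled" for
};

// One cache shared by all threads, keyed by context and then by module id.
class SharedKernelCache {
public:
    FakeKernel get_or_compile(const FakeContext *context, uint32_t module_id) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto &per_context = kernels_[context];
        auto it = per_context.find(module_id);
        if (it == per_context.end()) {
            // Cache miss: "compile" the kernel for this context only.
            it = per_context.emplace(module_id, FakeKernel{context}).first;
        }
        return it->second;
    }

private:
    std::mutex mutex_;
    std::map<const FakeContext *, std::map<uint32_t, FakeKernel>> kernels_;
};

// Each thread owns one context and requests every kernel twice (a miss pass,
// then a hit pass). Returns how many lookups produced another context's kernel.
int count_cross_context_hits(int num_threads, int num_kernels) {
    assert(num_threads > 0 && num_kernels > 0);
    SharedKernelCache cache;
    std::vector<FakeContext> contexts(num_threads);
    for (int t = 0; t < num_threads; t++) {
        contexts[t].id = t;
    }
    std::vector<int> errors(num_threads, 0);
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; t++) {
        threads.emplace_back([&, t]() {
            for (int pass = 0; pass < 2; pass++) {
                for (int k = 0; k < num_kernels; k++) {
                    FakeKernel kernel = cache.get_or_compile(&contexts[t], k);
                    if (kernel.owner != &contexts[t]) {
                        errors[t]++;  // each thread writes only its own slot
                    }
                }
            }
        });
    }
    for (auto &thread : threads) {
        thread.join();
    }
    int total = 0;
    for (int e : errors) {
        total += e;
    }
    return total;
}
```

A call such as `count_cross_context_hits(4, 16)` should report zero cross-context hits if the cache keys on the context correctly.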

Two small fixes to CUDA prevent a crash in a very rare error case and make device release work if the CUDA library is linked directly into the app. (With static linking, the latter would have shown up as a crash caused by allocation caching, because the code that releases allocations when freeing a context did not run.)

OpenGL and OpenGLCompute were not addressed in this PR, both because of time limitations and because there are more significant issues in these runtimes around this area. OpenGL is basically a Superfund site at this point and should be deleted. OpenGLCompute may or may not be worth preserving, though similar work is needed on how kernels are communicated to the runtime and compiled.

Kernel compilations are now reference counted: they are marked as held when the initialize-kernels call is made for a filter, and released via a new finalization call made in the destructor section of a filter invocation. This is required to get both object lifetime and multiple-context cache releasing to work with per-device cached APIs such as Metal and D3D.
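The hold/release discipline can be sketched as below, under the assumption that a cached compilation may only be discarded once no filter invocation holds it. The names `RefCountedKernelCache`, `hold`, `release`, and `evict_unused` are hypothetical and are not the runtime's actual entry points; compilation is replaced by handing out an integer handle.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-ins; these are not Halide runtime types.
using Context = const void *;
using ModuleId = uint32_t;

class RefCountedKernelCache {
public:
    // Models the "initialize kernels" step: compile on first use, then mark
    // the compilation as held by one more filter invocation.
    int hold(Context context, ModuleId id) {
        Entry &e = entries_[context][id];
        if (e.handle == 0) {
            e.handle = next_handle_++;  // stands in for a real compile
        }
        e.use_count++;
        return e.handle;
    }

    // Models the finalization call made when a filter invocation ends.
    void release(Context context, ModuleId id) {
        Entry &e = entries_.at(context).at(id);
        assert(e.use_count > 0);
        e.use_count--;
    }

    // Discards every compilation for `context` that nobody holds.
    // Returns the number of entries discarded.
    int evict_unused(Context context) {
        auto ctx = entries_.find(context);
        if (ctx == entries_.end()) return 0;
        int evicted = 0;
        for (auto it = ctx->second.begin(); it != ctx->second.end();) {
            if (it->second.use_count == 0) {
                it = ctx->second.erase(it);
                evicted++;
            } else {
                ++it;
            }
        }
        if (ctx->second.empty()) entries_.erase(ctx);
        return evicted;
    }

    bool contains(Context context, ModuleId id) const {
        auto ctx = entries_.find(context);
        return ctx != entries_.end() && ctx->second.count(id) != 0;
    }

private:
    struct Entry {
        int handle = 0;     // 0 means "not compiled yet"
        int use_count = 0;  // number of outstanding holds
    };

    std::map<Context, std::map<ModuleId, Entry>> entries_;
    int next_handle_ = 1;
};
```

In this sketch a kernel held by two invocations survives `evict_unused` until both have called `release`, which mirrors the lifetime guarantee described above.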

Opportunistic fix to a syntax error in the output of the C++ codegen back end.

cache for kernels. Introduces a finalization routine for kernel compilation to indicate when kernels are not strictly required to be defined, allowing them to be unloaded or discarded, but not while they are needed.

steven-johnson

LGTM pending green

Makefile

src/runtime/cuda.cpp

src/runtime/d3d12compute.cpp

steven-johnson · 2020-12-15T02:36:54Z

Updated to master just to tickle the buildbots.

steven-johnson · 2020-12-16T22:21:08Z

https://buildbot.halide-lang.org/master/#/builders/25/builds/15

CMake Error at cmake/AddCudaToTarget.cmake:3 (target_link_libraries):
  Cannot specify link libraries for target
  "generator_aot_multi_context_threaded" which is not built by this project.
Call Stack (most recent call first):
  test/generator/CMakeLists.txt:127 (add_cuda_to_target)

steven-johnson · 2021-01-15T21:59:04Z

Looks like async_device_copy is failing on Linux for host-opencl.

steven-johnson · 2021-01-22T17:09:34Z

Looks like we now have only one failure: correctness_gpu_many_kernels for D3D12Compute.

steven-johnson · 2021-01-25T17:56:44Z

At this point I assume we want to bring in the D3D12 experts to help figure out the last gotcha here?

shoaibkamil · 2021-01-25T19:06:40Z

Going to take a look at the correctness_gpu_many_kernels failure after SIGGRAPH submissions (unless @slomp gets to it first).

slomp · 2021-01-25T21:35:29Z

I think I can give it a shot tomorrow!

slomp · 2021-01-26T21:45:36Z

Status report:

So far, it's still inconclusive...:(
From the d3d12 tracelog, a bunch of code gets executed, and eventually, there's a new call to d3d12_create_context().
In there, the crash seems to happen after new_command_queue().

Curiously, I can build the project with msbuild from the command-line and run the executable.
When I try to attach the debugger, or run it directly from Visual Studio, I get an obscure vcruntime error when Halide.dll is being loaded.

slomp · 2021-01-26T22:02:51Z

Status report:

OK, the issue seems to be related to releasing the device and then calling d3d12_create_context() the next time.
If I comment out the following line, it runs to completion without issues:

device->device_release(nullptr, device);

I'll investigate further.

slomp · 2021-01-26T22:35:52Z

Ok, I think I found the issue (a very silly one).
I took the liberty of pushing the changes directly to this branch.

slomp · 2021-01-27T00:37:44Z

As for the new failure case: correctness_interpreter runs just fine here for me.

steven-johnson · 2021-01-28T17:53:44Z

correctness_interpreter is clearly a flake of some sort -- it's due to be investigated after SIGGRAPH deadlines pass. It shouldn't block landing this.

steven-johnson · 2021-01-29T17:28:05Z

Ready to land?

slomp · 2021-01-29T17:30:01Z

Ready to land?

Fine by me!

Z Stern added 16 commits December 7, 2020 13:01

Revert "Revert "Make context handling in GPU runtimes more consistent…

d6f6053

… and robust. (#5474)" (#5515)" This reverts commit 2ddd0b0.

Revert "Revert "Fix broken destroy_context() in gpu_multi_context_thr…

f8df8eb

…eaded_aottest.cpp (#5512)" (#5514)" This reverts commit 2c8e3ea.

Solve the COMDAT in runtime failing on Mac OS X problem once and for …

805e14b

…all by removing Comdat IR annotations in runtime on Mac OS and iOS.

Improve comment.

fbea278

Merge branch 'master' into gpu_context_consistency2

312c05e

Merge branch 'remove_runtime_comdats_macos_ios' into gpu_context_cons…

1070d3f

…istency2

Fix tabs in indentation.

b851d20

Merge branch 'remove_runtime_comdats_macos_ios' into gpu_context_cons…

8b9017f

…istency2

Merge branch 'master' into gpu_context_consistency2

961fd42

Merge branch 'master' into gpu_context_consistency2

3a4c606

Merge branch 'master' into gpu_context_consistency2

4f55416

Add CUDA finalizer method.

03062dd

Quick fix for syntax error in C codegen. Tab fixes. Makefile fixes.

Conditionalize Objective C support.

42f9ccc

Fix clang-format complaints.

b634ac0

Fix clang-format complaints.

285750a

zvookin requested a review from steven-johnson December 11, 2020 02:10

Z Stern added 2 commits December 10, 2020 20:53

Attempt to fix new test failure with cmake.

e78ce14

Merge branch 'gpu_context_consistency2' of https://github.com/halide/…

0c0ff56

…Halide into gpu_context_consistency2

steven-johnson approved these changes Dec 11, 2020

alexreinking added this to the v12.0.0 milestone Dec 11, 2020

Z Stern and others added 2 commits December 14, 2020 14:52

Add Metal support to acquire_release test and make it so it doesn't f…

1046ccc

…ail without a GPU.

Merge branch 'master' into gpu_context_consistency2

21d39d2

steven-johnson added 2 commits December 15, 2020 14:19

Merge branch 'master' into gpu_context_consistency2

348d2bf

Merge branch 'master' into gpu_context_consistency2

7afefcb

steven-johnson and others added 3 commits December 16, 2020 15:39

Merge branch 'master' into gpu_context_consistency2

71b4e23

Fix CMake cuda target issue. Add comment to Makefile per review feedb…

3352bf5

…ack.

Add a couple more locals initializations for safety.

05c8c7c

Z Stern and others added 6 commits January 6, 2021 02:04

Fix errors for Metal case in acquire_release_aottest.cpp.

1962b25

Make Metal context creation test API consistent with CUDA and OpenCL by having it return a success/fail indication instead of asserting internally.

Fix formatting.

054a535

Merge branch 'master' into gpu_context_consistency2

2163090

Fix D3D runtime to work like Metal does and not reentrantly acquire

cf862ce

the device context when compiling a kernel.

Merge branch 'master' into gpu_context_consistency2

22e336d

Merge branch 'master' into gpu_context_consistency2

c669a85

Z Stern and others added 3 commits January 18, 2021 18:36

Merge branch 'master' into gpu_context_consistency2

8156bec

Merge branch 'master' into gpu_context_consistency2

f911bdb

Merge branch 'master' into gpu_context_consistency2

68dbd62

steven-johnson added 2 commits January 25, 2021 11:38

trigger buildbots

871acc3

trigger buildbots

593a59a

steven-johnson added 2 commits January 25, 2021 14:22

trigger buildbots

6b877df

Merge branch 'master' into gpu_context_consistency2

0452473

bugfix

c61bf39

steven-johnson added 2 commits January 27, 2021 15:02

trigger buildbots

6b79b65

Merge branch 'master' into gpu_context_consistency2

3028478

zvookin merged commit 9743fca into master Feb 1, 2021

zvookin deleted the gpu_context_consistency2 branch February 1, 2021 21:00
