Make context handling in GPU runtimes more consistent and robust.#5474
Merged
added 8 commits
November 18, 2020 16:56
(shader/kernel/etc.) compilations.
the initial support and adds tests. Currently the gpu_multi test only has context creation code for CUDA and OpenCL. This should be added for the other GPU runtimes, but some coverage is provided by using the default context for these APIs. Fixes a bug in the CUDA runtime where some error message text in cuda_do_multidimensional_copy was not initialized. Fixes a bug in the CUDA runtime where device release code did not run if the CUDA libraries are linked directly into the executable. (This would have caused crashes due to device allocation caching, among other issues.)
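As a hypothetical illustration of the bug class behind the cuda_do_multidimensional_copy fix (not the actual runtime code), this sketch writes a default message into the error buffer before any branch runs, so no path can leave it uninitialized. `CopyStatus` and `describe_copy_error` are invented names for the example.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical status codes for a multidimensional copy; not CUDA's enum.
enum class CopyStatus { Ok, BadDimension, DriverFailure };

// Writes a message for `status` into `buf`. The default text is written
// first, so every path leaves NUL-terminated text, including statuses the
// switch does not handle explicitly.
void describe_copy_error(CopyStatus status, char *buf, std::size_t size) {
    if (buf == nullptr || size == 0) {
        return;
    }
    std::snprintf(buf, size, "%s", "unknown copy error");
    switch (status) {
    case CopyStatus::Ok:
        std::snprintf(buf, size, "%s", "no error");
        break;
    case CopyStatus::BadDimension:
        std::snprintf(buf, size, "%s", "copy has too many dimensions");
        break;
    default:
        // Leave the default text in place rather than uninitialized bytes.
        break;
    }
}
```

Calling `describe_copy_error` with a status the switch does not handle yields the default text instead of whatever happened to be in the buffer.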
Add initial comments explaining what tests do.
shoaibkamil
reviewed
Nov 24, 2020
added 3 commits
November 24, 2020 09:31
it to stick closer to the naming pattern and work with the CMake rules code.
Contributor
See #5475
Contributor
steven-johnson
left a comment
Looks good from a quick skim -- gonna wait for buildbots to look clean(er) before reviewing more.
added 4 commits
November 25, 2020 14:37
added 5 commits
December 2, 2020 12:02
Member
Author
Possibly will be used to control conditional compilation at some point. Really, it should probably nerf the test entirely outside of the GPU APIs it can make contexts for. But it does get a little coverage on e.g. Metal and Direct3D, so...
Move to using common code for kernel compilation caching in the CUDA, OpenCL, Metal, and D3D12 GPU runtimes. The new caching is designed never to use a kernel compiled for one context on another, and it uses a hash table to avoid scattering small allocations across multiple pages of VM. OpenCL was particularly broken: code using two contexts was almost guaranteed to fail. This PR also opens the door to better client control of caching, such as setting a size limit or evicting specific kernels, and comes close to allowing runtime overloads of the kernel compilation itself, which would enable persistent caching across process invocations for GPU APIs that support it. (The `compile_kernel` function in multiple files needs to be promoted to a client-visible runtime overload for each GPU API.)
Tests are added to cover many kernels and more than one context. A test using multiple contexts across multiple threads both exercises things that didn't necessarily work before and provides an example for a common use case.
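A minimal sketch of the context-aware caching idea, assuming a single hash table keyed on the pair (context, kernel id). `KernelCache`, `lookup_or_compile`, and `release_context` are hypothetical names, not Halide's runtime API; the template parameters stand in for API handles such as a CUDA context/module or an OpenCL context/program.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <unordered_map>
#include <utility>

// Hypothetical per-context kernel compilation cache (illustrative only).
template <typename ContextT, typename ModuleT>
class KernelCache {
public:
    // Returns the module for (context, kernel_id), calling `compile` only on
    // the first request for that pair. Because the context is part of the
    // key, a module built for one context is never returned for another.
    ModuleT lookup_or_compile(ContextT context, std::uint32_t kernel_id,
                              const std::function<ModuleT(ContextT)> &compile) {
        std::lock_guard<std::mutex> lock(mutex_);
        Key key{context, kernel_id};
        auto it = table_.find(key);
        if (it != table_.end()) {
            return it->second;
        }
        ModuleT module = compile(context);
        table_.emplace(key, module);
        return module;
    }

    // Removes every entry owned by `context`, passing each module to
    // `release` so the API object can be freed before the context is.
    void release_context(ContextT context, const std::function<void(ModuleT)> &release) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (auto it = table_.begin(); it != table_.end();) {
            if (it->first.first == context) {
                release(it->second);
                it = table_.erase(it);
            } else {
                ++it;
            }
        }
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return table_.size();
    }

private:
    using Key = std::pair<ContextT, std::uint32_t>;

    // Combines the context and kernel-id hashes into one bucket index.
    struct KeyHash {
        std::size_t operator()(const Key &k) const {
            std::size_t h1 = std::hash<ContextT>{}(k.first);
            std::size_t h2 = std::hash<std::uint32_t>{}(k.second);
            return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
        }
    };

    mutable std::mutex mutex_;
    std::unordered_map<Key, ModuleT, KeyHash> table_;
};
```

Holding the lock across the `compile` callback keeps the sketch simple and guarantees one compilation per key, at the cost of serializing compiles across threads; a production cache might compile outside the lock and resolve races on insert.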
Two small fixes to CUDA prevent a crash in a very rare error case and make device release work when the CUDA library is linked directly into the app. (The latter would have shown up as a crash due to allocation caching under static linking, because the code that releases allocations when a context is freed did not run.)
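A minimal sketch, assuming a simple per-context allocation cache, of why the release step matters: cached blocks only go back to the system when `release_context` runs, so a teardown path that skips it (as the statically linked CUDA path did, per the description above) leaves stale blocks behind. `AllocationCache` and its methods are hypothetical, and `std::malloc`/`std::free` stand in for device allocation calls.

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical per-context allocation cache (illustrative only).
class AllocationCache {
public:
    // Reuses a cached block of exactly `bytes` for `context` if one exists;
    // otherwise allocates a new one.
    void *allocate(void *context, std::size_t bytes) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<void *> &blocks = cached_[context][bytes];
        if (!blocks.empty()) {
            void *p = blocks.back();
            blocks.pop_back();
            return p;
        }
        return std::malloc(bytes);
    }

    // Instead of freeing, returns the block to the cache for later reuse.
    void deallocate(void *context, void *ptr, std::size_t bytes) {
        std::lock_guard<std::mutex> lock(mutex_);
        cached_[context][bytes].push_back(ptr);
    }

    // Frees every cached block owned by `context` and returns how many were
    // freed. This must run on every context teardown path; if it is skipped,
    // the cache keeps blocks tied to a context that no longer exists.
    std::size_t release_context(void *context) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = cached_.find(context);
        if (it == cached_.end()) {
            return 0;
        }
        std::size_t freed = 0;
        for (auto &entry : it->second) {
            for (void *p : entry.second) {
                std::free(p);
                ++freed;
            }
        }
        cached_.erase(it);
        return freed;
    }

private:
    std::mutex mutex_;
    // context -> (block size in bytes -> cached blocks of that size)
    std::unordered_map<void *, std::unordered_map<std::size_t, std::vector<void *>>> cached_;
};
```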
OpenGL and OpenGLCompute were not addressed in this PR, both because of time limitations and because these runtimes have more significant issues in this area. OpenGL is basically a Superfund site at this point and should be deleted. OpenGLCompute may or may not be worth preserving, though it needs similar work re: how kernels are communicated to the runtime and compiled.