[SYCL][Graph] Support for native-command #16871
Pennycook
left a comment
Specification changes LGTM. Thanks for adding the other examples!
steffenlarsen
left a comment
Few small comments but otherwise LGTM!
martygrant
left a comment
UR changes LGTM. The new CTS test checks the new urCommandBufferGetNativeHandleExp function, but AppendNativeCommandExp was also added. Should there be a test for this too?
Good question. I briefly looked into this, but getting testing set up for all of the L0, CUDA, HIP, and OpenCL adapters was a fair amount of work because of the backend-specific code required, so I stuck with E2E tests for verification. I agree this is something we should do, so I created #17448 as follow-on work to add UR CTS testing.
hvdijk
left a comment
Marking this as unsupported on NativeCPU looks fine.
Support [sycl_ext_codeplay_enqueue_native_command](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_codeplay_enqueue_native_command.asciidoc) with SYCL-Graph.

Introduces `interop_handle::ext_codeplay_get_native_graph<backend>()` to give the user access to the native graph object to which native commands can be appended.

To use CUDA as an example, code that uses `ext_codeplay_enqueue_native_command` eagerly can be updated from:

```cpp
CGH.ext_codeplay_enqueue_native_command([=](interop_handle IH) {
  auto NativeStream = IH.get_native_queue<cuda>();
  myNativeLibraryCall(NativeStream);
});
```

to:

```cpp
CGH.ext_codeplay_enqueue_native_command([=](interop_handle IH) {
  if (IH.ext_codeplay_has_graph()) {
    auto NativeGraph = IH.ext_codeplay_get_native_graph<cuda>();
    auto NativeStream = IH.get_native_queue<cuda>();

    // Start capturing stream calls into the graph
    cuStreamBeginCaptureToGraph(NativeStream, NativeGraph, nullptr, nullptr, 0,
                                CU_STREAM_CAPTURE_MODE_GLOBAL);

    myNativeLibraryCall(NativeStream);

    // Stop capturing stream calls into the graph
    cuStreamEndCapture(NativeStream, &NativeGraph);
  } else {
    auto NativeStream = IH.get_native_queue<cuda>();
    myNativeLibraryCall(NativeStream);
  }
});
```

Example of how this integration could work in GROMACS: https://gitlab.com/gromacs/gromacs/-/merge_requests/4954
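For readers without a CUDA toolchain to hand, the eager/graph branch in the description above can be modelled with a small toy that needs neither SYCL nor CUDA. This is only a sketch of the control flow: `ToyStream`, `ToyGraph`, `ToyInteropHandle`, `toyLibraryCall`, `nativeCommand`, and `replay` are all hypothetical stand-ins invented for illustration, and the toy makes no claim about when the real runtime invokes the native-command callable.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a native stream (not a CUDA or SYCL type).
struct ToyStream {
  std::vector<std::string> Executed;                  // work that has actually run
  std::vector<std::string> *CaptureTarget = nullptr;  // non-null while capturing

  void launch(const std::string &Name) {
    if (CaptureTarget)
      CaptureTarget->push_back(Name);  // recorded for later replay, not run
    else
      Executed.push_back(Name);        // eager execution
  }
};

// Hypothetical stand-in for a native graph: an ordered list of recorded work.
struct ToyGraph {
  std::vector<std::string> Nodes;
};

// Hypothetical stand-in for interop_handle.
struct ToyInteropHandle {
  ToyStream *Stream;
  ToyGraph *Graph;  // null when the command group is submitted eagerly
  bool has_graph() const { return Graph != nullptr; }
};

// Plays the role of myNativeLibraryCall: it only knows about the stream.
void toyLibraryCall(ToyStream &S) { S.launch("library_kernel"); }

// Same shape as the updated native command: capture when a graph is
// present, otherwise run directly on the stream.
void nativeCommand(ToyInteropHandle IH) {
  if (IH.has_graph()) {
    IH.Stream->CaptureTarget = &IH.Graph->Nodes;  // begin capture
    toyLibraryCall(*IH.Stream);
    IH.Stream->CaptureTarget = nullptr;           // end capture
  } else {
    toyLibraryCall(*IH.Stream);
  }
}

// Replaying the graph runs every recorded node on the stream.
void replay(const ToyGraph &G, ToyStream &S) {
  for (const auto &Node : G.Nodes)
    S.launch(Node);
}
```

In this model the library call is unchanged in both branches; only the state of the stream decides whether work runs immediately or is appended to the graph, which is why the same callable can serve both eager submission and graph recording.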
@intel/llvm-gatekeepers This is ready to merge, thanks
Support sycl_ext_codeplay_enqueue_native_command with SYCL-Graph for all of the L0, CUDA, HIP, and OpenCL backends.

Introduces `interop_handle::ext_oneapi_get_native_graph<backend>()` to give the user access to the native graph object to which native commands can be appended. Implemented using the new UR command-buffer entry points `urCommandBufferAppendNativeCommandExp` and `urCommandBufferGetNativeHandleExp`.

To use CUDA as an example, code that uses `ext_codeplay_enqueue_native_command` eagerly can be updated from:

    CGH.ext_codeplay_enqueue_native_command([=](interop_handle IH) {
      auto NativeStream = IH.get_native_queue<cuda>();
      myNativeLibraryCall(NativeStream);
    });

to:

    CGH.ext_codeplay_enqueue_native_command([=](interop_handle IH) {
      if (IH.ext_oneapi_has_graph()) {
        auto NativeGraph = IH.ext_oneapi_get_native_graph<cuda>();
        auto NativeStream = IH.get_native_queue<cuda>();

        // Start capturing stream calls into the graph
        cuStreamBeginCaptureToGraph(NativeStream, NativeGraph, nullptr, nullptr, 0,
                                    CU_STREAM_CAPTURE_MODE_GLOBAL);

        myNativeLibraryCall(NativeStream);

        // Stop capturing stream calls into the graph
        cuStreamEndCapture(NativeStream, &NativeGraph);
      } else {
        auto NativeStream = IH.get_native_queue<cuda>();
        myNativeLibraryCall(NativeStream);
      }
    });

Example of how this integration could work in GROMACS: https://gitlab.com/gromacs/gromacs/-/merge_requests/4954