This repository was archived by the owner on Dec 10, 2025. It is now read-only.


@rafbiels
Contributor

Improve SYCL performance on CUDA and HIP backends with the two changes below. There is no functional change for Intel backends.

1. Add CMake option to use in-order queue

Add the queue properties in_order and, if available, discard_events. This is controlled by a CMake build option, IN_ORDER_QUEUE, which defaults to ON for NVIDIA and AMD backends and OFF for all other backends. An in-order queue corresponds more directly to how the CUDA/HIP variants of the benchmark are written, since they use the default stream.

2. Use extensions to submit native commands when available

Add new wrapper functions in infrastructure/SYCL.h that submit work through a native command submission extension when one is available and fall back to host_task otherwise. This is essentially the same solution as used in oneMath.
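For readers unfamiliar with the pattern, here is a minimal sketch of what such a wrapper could look like. It is not the code from this PR. The wrapper name `submit_native_or_host_task` and the macro `EXAMPLE_HAS_NATIVE_COMMAND` are hypothetical, and the extension call `ext_codeplay_enqueue_native_command` is assumed to be the handler method that DPC++ provides for native command submission. The handler type is a template parameter so the sketch does not depend on SYCL headers; in real use it would be `sycl::handler`.

```cpp
#include <utility>

// Hypothetical project-level switch. A real build would derive it from the
// SYCL implementation's feature-test macro for the native command extension.
// #define EXAMPLE_HAS_NATIVE_COMMAND 1

// Hypothetical wrapper: forwards the callable either to the native command
// extension or to the standard SYCL 2020 host_task, chosen at compile time.
template <typename Handler, typename F>
void submit_native_or_host_task(Handler& cgh, F&& f) {
#ifdef EXAMPLE_HAS_NATIVE_COMMAND
    // Extension path (assumed DPC++ API): f enqueues native CUDA/HIP work.
    cgh.ext_codeplay_enqueue_native_command(std::forward<F>(f));
#else
    // Portable fallback: run f as a host task.
    cgh.host_task(std::forward<F>(f));
#endif
}
```

A call site would typically sit inside `queue.submit`, passing the `sycl::handler` and a lambda that takes a `sycl::interop_handle`, so application code does not need to know which path was compiled in.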

@rafbiels
Contributor Author

I also have PRs lined up using these new host_task wrappers in svm, lc0 and tsne. These will be submitted after this one is approved and merged.

@rmukhopa
Contributor

Thanks @rafbiels, I'll review these changes.

@rafbiels force-pushed the dl-cifar-inorder-nativecmd branch from 555133e to 5d180df on April 17, 2025 10:37