SYCL: Avoid using SYCL-Graph for unsupported nodes #13587
Merged
NeoZhangJianyu merged 1 commit into ggml-org:master on May 22, 2025
Rbiessy
reviewed
May 19, 2025
Currently, when running `GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0` on a CUDA backend to SYCL, two operations throw an exception because of blocking waits during queue recording:

* `-o CONCAT`: uses blocking waits on a queue that is being recorded, in `ggml_sycl_op_concat`: https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/concat.cpp#L185-L187
* `-o MUL_MAT_ID`: performs a blocking wait on a recording queue for a copy to host memory, in `ggml_sycl_mul_mat_id`: https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/ggml-sycl.cpp#L3072-L3074

We've noticed that `ggml-cuda.cu` has the [check_node_graph_compatibility_and_refresh_copy_ops](https://github.com/ggml-org/llama.cpp/blob/39e73ae0d69f882d7e29cecc6dd8f5052fca6731/ggml/src/ggml-cuda/ggml-cuda.cu#L2458-L2458) method for checking whether a graph can be used, even when graphs are enabled. I've taken a similar approach in this PR by adding a method to `ggml-sycl.cpp` that checks whether a graph can be used for the operations, even if the user has asked for it to be enabled.
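For illustration only, here is a minimal sketch of the kind of compatibility check described above. It uses stand-in types rather than the real ggml structures, and every name in it (`op_kind`, `node`, `compute_graph`, `graph_is_recordable`, `should_use_graph`) is hypothetical and not taken from the PR. The only assumption carried over from the description is that `CONCAT` and `MUL_MAT_ID` are the two ops that cannot currently be recorded.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for ggml's op enum, tensor, and graph types.
// The real definitions in ggml.h are richer than this.
enum class op_kind { ADD, MUL_MAT, CONCAT, MUL_MAT_ID };

struct node {
    op_kind op;
};

struct compute_graph {
    std::vector<node> nodes;
};

// Returns false if any node uses an op whose current implementation
// performs a blocking wait, which is not valid while a queue is recording.
bool graph_is_recordable(const compute_graph & g) {
    for (const node & n : g.nodes) {
        switch (n.op) {
            case op_kind::CONCAT:
            case op_kind::MUL_MAT_ID:
                return false;
            default:
                break;
        }
    }
    return true;
}

// The graph path is taken only when the user has not disabled it
// AND every node in the graph can be recorded.
bool should_use_graph(bool graph_disabled_by_user, const compute_graph & g) {
    return !graph_disabled_by_user && graph_is_recordable(g);
}
```

With a check of this shape, a graph containing either op would fall back to the regular, non-graph execution path even when `GGML_SYCL_DISABLE_GRAPH=0` is set, instead of throwing during recording.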
289597c to
baf7b65
Rbiessy
approved these changes
May 21, 2025
Contributor
So, all LLMs that include CONCAT or MUL_MAT_ID can't use SYCL-Graph.
Contributor
Author
Yup that's correct, the way these operations are implemented isn't valid usage for creating a sycl-graph by queue recording, leading to exceptions being thrown. I think the CONCAT/MUL_MAT_ID implementations could be reworked to make them valid for recording in a sycl-graph, but that's a larger/future task. So instead this PR avoids the …
Contributor
OK! It's clear to me! Maybe draft a special version of concat and mul_mat_id for sycl graph.
NeoZhangJianyu
approved these changes
May 22, 2025
Seunghhon
pushed a commit
to Seunghhon/llama.cpp
that referenced
this pull request
Apr 26, 2026