SYCL: using graphs is configurable by environment variable and compile option by lslusarczyk · Pull Request #12371 · ggml-org/llama.cpp

lslusarczyk · 2025-03-13T14:23:37Z

Changes:

use of SYCL graphs is configurable by compile option and environment variable
fixed warnings in ggml-sycl

EwanC · 2025-03-13T15:31:20Z

    if (!initialized) {
        g_ggml_sycl_debug = get_sycl_env("GGML_SYCL_DEBUG", 0);
        g_ggml_sycl_disable_optimize= get_sycl_env("GGML_SYCL_DISABLE_OPT", 0);
+        g_ggml_sycl_graphs = get_sycl_env("GGML_SYCL_GRAPHS", 0);


Looks like there is some doc here where we could add this

llama.cpp/docs/backend/SYCL.md

Line 668 in be7c303

#### Runtime

I've updated the doc and renamed runtime and compile time variable names to align with similar names. Please check if it looks OK to you.
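For readers unfamiliar with the pattern in the diff above, here is a minimal sketch of an integer environment lookup with a default. `read_env_int` and the variable name used in the usage note are hypothetical stand-ins; this is not the actual `get_sycl_env` implementation, whose body is not shown in this thread.

```cpp
#include <cstdlib>
#include <exception>
#include <string>

// Hypothetical stand-in for get_sycl_env: returns the integer value of an
// environment variable, or default_val when it is unset or not a number.
static int read_env_int(const char * name, int default_val) {
    const char * value = std::getenv(name);
    if (value == nullptr) {
        return default_val;  // variable not set
    }
    try {
        return std::stoi(value);
    } catch (const std::exception &) {
        return default_val;  // not parseable as an int, or out of range
    }
}
```

Calling `read_env_int("EXAMPLE_SYCL_FLAG", 0)` yields 0 when the variable is unset, so a feature guarded this way stays off unless the user opts in.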

NeoZhangJianyu · 2025-03-14T01:39:54Z

Great job!

What is the performance data with SYCL graph enabled?
Why add static to the functions?
If SYCL graph improves performance in most cases, enable it by default for both building and running. Users could then use the env variable to disable it rather than to enable it.

Rbiessy

I have no concern with the changes here. I tested without the graph enabled and confirmed no issues.

lslusarczyk · 2025-03-14T11:26:05Z

Great job!

Thank you. Credit also goes to @Alcpz, whose work I've continued here.

What is the performance data with SYCL graph enabled?

It is currently slower than the non-graph version because of the whole-graph update approach, which is going to be replaced by small modifications of the existing graph. I'd like to have this merged in this first shape so that we can set up internal benchmarking of llama with graphs, and so that I can split up my work on making the graph version the fastest one.

Why add static to the functions?

If a function is not used outside its cpp compilation unit, its symbol does not need to be exported and can be hidden. The newest llvm warns about the missing 'static' keyword in such a case.

If SYCL graph improves performance in most cases, enable it by default for both building and running. Users could then use the env variable to disable it rather than to enable it.

You are right. This is the goal, and I hope to be able to change the default values of the variable soon.
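The point about `static` above can be illustrated with a small sketch. The function names are made up for illustration and do not come from ggml-sycl.cpp.

```cpp
// Internal linkage: visible only inside this translation unit, so no symbol
// has to be exported for it and the compiler is free to inline it away.
static int scale_by_two(int x) {
    return 2 * x;
}

// External linkage: other translation units may call this function, so its
// symbol must stay visible to the linker.
int scaled_sum(int a, int b) {
    return scale_by_two(a) + scale_by_two(b);
}
```

Marking file-local helpers `static` also documents intent: a reader can tell at a glance that `scale_by_two` is not part of the file's public interface.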

Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>

arthw · 2025-03-15T13:59:44Z

Great job!

Thank you. Credit also goes to @Alcpz, whose work I've continued here.

What is the performance data with SYCL graph enabled?

It is currently slower than the non-graph version because of the whole-graph update approach, which is going to be replaced by small modifications of the existing graph. I'd like to have this merged in this first shape so that we can set up internal benchmarking of llama with graphs, and so that I can split up my work on making the graph version the fastest one.

Why add static to the functions?

If a function is not used outside its cpp compilation unit, its symbol does not need to be exported and can be hidden. The newest llvm warns about the missing 'static' keyword in such a case.

If SYCL graph improves performance in most cases, enable it by default for both building and running. Users could then use the env variable to disable it rather than to enable it.

You are right. This is the goal, and I hope to be able to change the default values of the variable soon.

What is the next step to make SYCL graph improve performance?
Updating the application code, or using a newer compiler/runtime?

If it doesn't bring a performance gain, that will hurt SYCL graph's reputation.
I suggest highlighting the status in the guide to avoid user misunderstanding.

arthw · 2025-03-15T14:32:37Z

 | GGML_SYCL_TARGET   | INTEL *(default)* \| NVIDIA \| AMD    | Set the SYCL target device type.            |
 | GGML_SYCL_DEVICE_ARCH | Optional (except for AMD)          | Set the SYCL device architecture, optional except for AMD. Setting the device architecture can improve the performance. See the table [--offload-arch](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/OffloadDesign.md#--offload-arch) for a list of valid architectures. |
 | GGML_SYCL_F16      | OFF *(default)* \|ON *(optional)*     | Enable FP16 build with SYCL code path.      |
+| GGML_SYCL_GRAPH    | OFF *(default)* \|ON *(Optional)*     | Enable build with [SYCL Graph extension](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc). |


I suggest enabling SYCL graph by default for both building and running, and renaming GGML_SYCL_ENABLE_GRAPH to GGML_SYCL_DISABLE_GRAPH.

That will keep the user experience simple:
1. GGML_SYCL_GRAPH: turns on SYCL graph for building and running with one variable.
2. GGML_SYCL_DISABLE_GRAPH: turns off SYCL graph for debugging.

A couple of points to consider when making this decision.

Enabling building by default: The SYCL-Graph extension API is still in the experimental phase. This means that it is (very) unlikely, but not impossible, that SYCL may make an API-breaking change that will break the llama build.

Running by default: As discussed in the other thread, performance is worse with SYCL-Graph enabled right now. If we enable this code path by default, users may see their application performance drop after this commit, which is both surprising and undesirable from a user perspective.

Enabling building by default: The SYCL-Graph extension API is still in the experimental phase. This means that it is (very) unlikely, but not impossible, that SYCL may make an API-breaking change that will break the llama build.

I find it confusing for the user to have to enable SYCL-Graph at two different stages (building and running).
What do you think of having SYCL GRAPH support always built and have usage decided at runtime (disabled by default)? Does SYCL GRAPH have a big impact in terms of compilation times?

My reasoning is that the SYCL backend of Llama.cpp should target the current release compiler SYCL-Graph API, not the latest (DPCPP open source) API (although ideally we should support both, some incompatibilities will happen at some point). While API breaking changes could happen, those shouldn't come at a fast pace, but in between releases and should be manageable.

What do you think of having SYCL GRAPH support always built and have usage decided at runtime (disabled by default)? Does SYCL GRAPH have a big impact in terms of compilation times?

I would agree this is probably the best path forward. As you say, two different switches introduce the chance of misconfiguration. I just wanted to note the experimental API point so that maintainers can make an informed choice; the impact on compilation time should be negligible.

I've merged this discussion with #12371 (comment) and decided to:

rename GGML_SYCL_ENABLE_GRAPH to GGML_SYCL_DISABLE_GRAPH (as @arthw suggested)

build graph support by default (as @EwanC and @Alcpz suggested)

keep running with graphs disabled by default (as @EwanC and @Alcpz suggested; temporarily contrary to @arthw's suggestion, until graph performance is better than non-graph)
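The decision above combines a compile-time switch with a runtime one. A minimal sketch of how the two could interact follows, assuming `GGML_SYCL_GRAPH` reaches the code as a compile definition and `GGML_SYCL_DISABLE_GRAPH` is read as an integer with a default of 1. `sycl_graph_enabled` is a hypothetical helper, not the code merged in this PR.

```cpp
#include <cstdlib>

// Assumption: GGML_SYCL_GRAPH is normally supplied by the build system.
// It is defined here only so that this sketch builds on its own.
#ifndef GGML_SYCL_GRAPH
#define GGML_SYCL_GRAPH 1
#endif

// Hypothetical helper: true when the graph code path should be taken.
static bool sycl_graph_enabled() {
#ifdef GGML_SYCL_GRAPH
    const char * value = std::getenv("GGML_SYCL_DISABLE_GRAPH");
    // Unset means "disabled": graph support is compiled in, but the runtime
    // default keeps it off until the user sets the variable to 0.
    const int disable = (value != nullptr) ? std::atoi(value) : 1;
    return disable == 0;
#else
    // Built without graph support: the runtime variable has no effect.
    return false;
#endif
}
```

With this shape, a user opts in at run time by setting `GGML_SYCL_DISABLE_GRAPH=0`, and flipping the default later only requires changing the fallback value.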

arthw · 2025-03-15T14:34:35Z

        stream->parallel_for(
            sycl::nd_range<3>(num_blocks * block_size, block_size),
-            [=](sycl::nd_item<3> item_ct1) [[intel::reqd_sub_group_size(WARP_SIZE)]] {
+            [=](sycl::nd_item<3> item_ct1) [[sycl::reqd_sub_group_size(WARP_SIZE)]] {


Why is this change needed?
Are you using the official oneAPI 2025.0 compiler, or an internal compiler?

I agree it's not strictly needed in this PR. I believe intel::reqd_sub_group_size will become deprecated in the next compiler release. This works fine with 2025.0 so I don't mind updating this now.

I've been facing a lot of warnings when building llama.cpp, so I find these changes a positive thing.

I am compiling with both oneAPI and the latest intel llvm compiler (the next oneAPI candidate). The latest llvm issues a deprecation warning, hence the change.
See the llvm release notes: Deprecated intel::reqd_sub_group_size, the official SYCL 2020 spelling should be used instead (with sycl:: namespace).

arthw · 2025-03-15T14:37:24Z

                }
                return false;
-            } break;
+            }


After removing so many breaks, the code path may change.
Have you tested this with the CI?
This change will impact the CI tests.
I suggest running the CI locally.

I ran our CI and got no issues, see #12371 (review)
The breaks are not needed, as there is always a return before them.

These breaks used to show as unreachable code warnings notified by the compiler. I don't know why they went away.
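To illustrate why the removed breaks cannot change the code path, here is a small sketch with made-up names (not taken from ggml-sycl.cpp). Every case ends in a return, so a `break` placed after the closing brace of a case could never execute.

```cpp
enum class op_kind { add, mul, unknown };

// Every case returns, so control never reaches the end of a case block and
// a trailing `break` there would be unreachable code.
static bool op_supported(op_kind op) {
    switch (op) {
        case op_kind::add: {
            return true;
        }
        case op_kind::mul: {
            return true;
        }
        default: {
            return false;
        }
    }
}
```

The function behaves identically with or without `break` after each case block, which matches the CI result reported above.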

Alcpz · 2025-03-17T09:45:51Z

 | GGML_SYCL_TARGET   | INTEL *(default)* \| NVIDIA \| AMD    | Set the SYCL target device type.            |
 | GGML_SYCL_DEVICE_ARCH | Optional (except for AMD)          | Set the SYCL device architecture, optional except for AMD. Setting the device architecture can improve the performance. See the table [--offload-arch](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/OffloadDesign.md#--offload-arch) for a list of valid architectures. |
 | GGML_SYCL_F16      | OFF *(default)* \|ON *(optional)*     | Enable FP16 build with SYCL code path.      |
+| GGML_SYCL_GRAPH    | OFF *(default)* \|ON *(Optional)*     | Enable build with [SYCL Graph extension](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc). |


Enabling building by default: The SYCL-Graph extension API is still in the experimental phase. This means that it is (very) unlikely, but not impossible, that SYCL may make an API-breaking change that will break the llama build.

I find it confusing for the user to have to enable SYCL-Graph at two different stages (building and running).
What do you think of having SYCL GRAPH support always built and have usage decided at runtime (disabled by default)? Does SYCL GRAPH have a big impact in terms of compilation times?

My reasoning is that the SYCL backend of Llama.cpp should target the current release compiler SYCL-Graph API, not the latest (DPCPP open source) API (although ideally we should support both, some incompatibilities will happen at some point). While API breaking changes could happen, those shouldn't come at a fast pace, but in between releases and should be manageable.

Alcpz · 2025-03-17T09:47:24Z

        stream->parallel_for(
            sycl::nd_range<3>(num_blocks * block_size, block_size),
-            [=](sycl::nd_item<3> item_ct1) [[intel::reqd_sub_group_size(WARP_SIZE)]] {
+            [=](sycl::nd_item<3> item_ct1) [[sycl::reqd_sub_group_size(WARP_SIZE)]] {


I've been facing a lot of warnings when building llama.cpp, so I find these changes a positive thing.

Alcpz · 2025-03-17T10:35:35Z

                }
                return false;
-            } break;
+            }


These breaks used to show as unreachable code warnings notified by the compiler. I don't know why they went away.

lslusarczyk · 2025-03-17T15:16:13Z

What is the next step to make SYCL graph improve performance? Updating the application code, or using a newer compiler/runtime?

We are going to use a new feature in graphs that makes it possible to update the graph more often rather than recreate it. It needs some work outside llama.

If it doesn't bring a performance gain, that will hurt SYCL graph's reputation. I suggest highlighting the status in the guide to avoid user misunderstanding.

Good point. I did it.

NeoZhangJianyu · 2025-03-18T03:47:47Z

OK, it's clear to me.

Because the feature is not yet good enough to be enabled, I suggest holding this PR until the feature brings a positive performance impact.
Unused code will increase maintenance cost and technical debt.

Thank you!

Rbiessy · 2025-03-18T10:18:02Z

Merged now as all the comments have been addressed. This will simplify the work to make SYCL-Graph more performant. There is already a comment to manage the expectations.

…e option (ggml-org#12371)

* alberto changes
* enable sycl graphs by env variable
* fixed compilation warnings in ggml-sycl.cpp
* renamed graph variables
* fix markdown in docs/backend/SYCL.md

Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>

* fix markdown in docs/backend/SYCL.md again
* compiling graphs by default, renamed graph_enable to graph_disable

---------

Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>

lslusarczyk added 3 commits March 13, 2025 15:15

alberto changes

2896987

enable sycl graphs by env variable

4efab98

fixed compilation warnings in ggml-sycl.cpp

d02a0d2

github-actions Bot added the ggml and SYCL labels Mar 13, 2025

EwanC reviewed Mar 13, 2025

Comment thread ggml/src/ggml-sycl/ggml-sycl.cpp Outdated

Rbiessy reviewed Mar 14, 2025

renamed graph variables

89ac171

github-actions Bot added the documentation label Mar 14, 2025

lslusarczyk marked this pull request as ready for review March 14, 2025 11:26

Rbiessy reviewed Mar 14, 2025

Comment thread docs/backend/SYCL.md Outdated

fix markdown in docs/backend/SYCL.md

d75f29a

Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>

Rbiessy approved these changes Mar 14, 2025

fix markdown in docs/backend/SYCL.md again

55a49ba

arthw reviewed Mar 15, 2025

Alcpz reviewed Mar 17, 2025

compiling graphs by default, renamed graph_enable to graph_disable

1693dde

Rbiessy merged commit 35cae5b into ggml-org:master Mar 18, 2025

reble mentioned this pull request May 20, 2025

[KHR] Adding command graph extension KhronosGroup/SYCL-Docs#825

Closed
