SYCL: using graphs is configurable by environment variable and compile option #12371

Merged
Rbiessy merged 7 commits into ggml-org:master from lslusarczyk:sycl-graphs
Mar 18, 2025

@lslusarczyk
Contributor

Changes:

  • use of SYCL graphs is configurable by compile option and environment variable
  • fixed warnings in ggml-sycl

@github-actions bot added the labels ggml (changes relating to the ggml tensor library for machine learning) and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) Mar 13, 2025
Comment thread ggml/src/ggml-sycl/ggml-sycl.cpp Outdated
if (!initialized) {
g_ggml_sycl_debug = get_sycl_env("GGML_SYCL_DEBUG", 0);
g_ggml_sycl_disable_optimize= get_sycl_env("GGML_SYCL_DISABLE_OPT", 0);
g_ggml_sycl_graphs = get_sycl_env("GGML_SYCL_GRAPHS", 0);
Contributor

Looks like there is some doc here where we could add this

#### Runtime

Contributor Author

I've updated the doc and renamed runtime and compile time variable names to align with similar names. Please check if it looks OK to you.
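
For context, the runtime switch discussed in this thread is read from the environment with an integer default. A minimal sketch of such a helper is shown below. `read_env_int` is a hypothetical stand-in and is not claimed to match the signature or parsing behavior of the backend's `get_sycl_env`.

```cpp
#include <cstdlib>

// Hypothetical helper (not the backend's get_sycl_env): returns the integer
// value of an environment variable, or default_val if the variable is unset
// or does not start with a number.
static int read_env_int(const char * name, int default_val) {
    const char * text = std::getenv(name);
    if (text == nullptr) {
        return default_val;
    }
    char * end = nullptr;
    const long value = std::strtol(text, &end, 10);
    // strtol leaves end == text when no digits were consumed.
    if (end == text) {
        return default_val;
    }
    return static_cast<int>(value);
}
```

Falling back to the default on an unset or malformed value keeps the behavior predictable when a user mistypes the variable.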

Comment thread ggml/src/ggml-sycl/ggml-sycl.cpp Outdated
@NeoZhangJianyu
Contributor

Great job!

  1. What is the performance data after using SYCL graph?
  2. Why add `static` to the functions?
  3. If SYCL graph can increase performance in more cases, set it as the default for building and running. Users could use the env variable to disable it, instead of enabling it.

@Rbiessy left a comment
Contributor

I have no concern with the changes here. I tested without the graph enabled and confirmed no issues.

@github-actions bot added the documentation (Improvements or additions to documentation) label Mar 14, 2025
@lslusarczyk
Contributor Author

> Great job!

Thank you. This also goes to @Alcpz, whose work I've continued here.

> What is the performance data after using SYCL graph?

It is currently slower than the non-graph version because of the whole-graph update approach, which is going to be changed into small modifications of the existing graph. I'd like to have this merged in this first shape in order to set up some internal benchmarking on llama with graphs, and to split up my work of making the graph version finally the fastest one.

> Why add `static` to the functions?

If a function is not used outside the cpp file's compilation unit, then its symbol does not need to be exported and can be hidden. The newest LLVM complains in a warning about the lack of the `static` keyword in such a case.

> If SYCL graph can increase performance in more cases, set it as the default for building and running. Users could use the env variable to disable it, instead of enabling it.

You are right. This is the goal, and I hope I will be able to change the default values of the variable soon.
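
The point about `static` can be illustrated with a small translation unit. This is a sketch with made-up function names, not code from ggml-sycl.cpp. Which warning flag is involved is an assumption; Clang's `-Wmissing-prototypes` is a likely candidate.

```cpp
// Would normally live in a header included by other translation units.
int padded_row_size(int n_elements);

// Hypothetical file-local helper: `static` gives it internal linkage, so no
// symbol is exported from this translation unit.
static int round_up_to_multiple(int value, int multiple) {
    return ((value + multiple - 1) / multiple) * multiple;
}

// Hypothetical public entry point: external linkage, matching the
// declaration above.
int padded_row_size(int n_elements) {
    return round_up_to_multiple(n_elements, 32);
}
```

Without `static`, and without a prior declaration, the helper would look like an accidentally exported function, which is the situation the warning is meant to catch.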

@lslusarczyk lslusarczyk marked this pull request as ready for review March 14, 2025 11:26
Comment thread docs/backend/SYCL.md Outdated
Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>
@arthw
Contributor

arthw commented Mar 15, 2025

What is the next step to make SYCL graph improve performance?
Updating the application code, or using a newer compiler/runtime?

If it doesn't bring performance, that will impact the SYCL graph reputation.
I suggest highlighting the status in the guide to avoid user misunderstanding.

Comment thread docs/backend/SYCL.md Outdated
| GGML_SYCL_TARGET | INTEL *(default)* \| NVIDIA \| AMD | Set the SYCL target device type. |
| GGML_SYCL_DEVICE_ARCH | Optional (except for AMD) | Set the SYCL device architecture, optional except for AMD. Setting the device architecture can improve the performance. See the table [--offload-arch](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/OffloadDesign.md#--offload-arch) for a list of valid architectures. |
| GGML_SYCL_F16 | OFF *(default)* \|ON *(optional)* | Enable FP16 build with SYCL code path. |
| GGML_SYCL_GRAPH | OFF *(default)* \|ON *(Optional)* | Enable build with [SYCL Graph extension](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc). |
Contributor

I suggest enabling SYCL graph in both building and running, and renaming GGML_SYCL_ENABLE_GRAPH to GGML_SYCL_DISABLE_GRAPH.

That will keep the user experience simple:
1. GGML_SYCL_GRAPH: enables SYCL graph in building and running with one variable.
2. GGML_SYCL_DISABLE_GRAPH: disables SYCL graph for debugging.

Contributor

A couple of points to consider when making this decision:

  • Enabling building by default: The SYCL-Graph extension API is still in an experimental phase. This means that it is (very) unlikely, but not impossible, that SYCL may make an API-breaking change that will break the llama build.
  • Running by default: As discussed in the other thread, performance is worse with SYCL-Graph enabled right now. If we enable this code path by default, users may see their application performance drop after this commit, which is both surprising and undesirable from a user perspective.

Collaborator

> Enabling building by default: The SYCL-Graph extension API is still in an experimental phase. This means that it is (very) unlikely, but not impossible, that SYCL may make an API-breaking change that will break the llama build.

I find it confusing for the user to have to enable SYCL-Graph at two different stages (building and running).
What do you think of having SYCL-Graph support always built and having usage decided at runtime (disabled by default)? Does SYCL-Graph have a big impact on compilation times?

My reasoning is that the SYCL backend of llama.cpp should target the SYCL-Graph API of the current release compiler, not the latest (DPCPP open source) API (although ideally we should support both, some incompatibilities will happen at some point). While API-breaking changes could happen, those shouldn't come at a fast pace, but in between releases, and should be manageable.

Contributor

> What do you think of having SYCL-Graph support always built and having usage decided at runtime (disabled by default)? Does SYCL-Graph have a big impact on compilation times?

I would agree this is probably the best path forward; as you say, two different switches introduce the chance of misconfiguration. I just wanted to note the experimental API point so that maintainers can make an informed choice. The impact on compilation time should be negligible.

Contributor Author

I've merged this discussion with #12371 (comment) and decided to:

  • rename GGML_SYCL_ENABLE_GRAPH to GGML_SYCL_DISABLE_GRAPH (as @arthw suggested)
  • enable building graphs by default (as @EwanC and @Alcpz suggested)
  • disable running graphs by default (as @EwanC and @Alcpz suggested; temporarily contrary to @arthw, until graph performance is better than non-graph)
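
The outcome of this thread can be sketched as a two-level check: graph support is compiled in when a build macro is defined, and a runtime variable then decides whether the graph path is taken. The names `GGML_SYCL_GRAPH` and `GGML_SYCL_DISABLE_GRAPH` come from the discussion. The function `should_use_graph`, the use of `GGML_SYCL_GRAPH` as a preprocessor macro, and the rule that only the exact value `0` opts in are assumptions for illustration, not the merged code.

```cpp
#include <cstdlib>
#include <cstring>

// Returns true when the graph code path should be used.
// Assumption: graphs stay disabled at runtime unless the user explicitly sets
// GGML_SYCL_DISABLE_GRAPH=0, matching "disable running graphs by default".
static bool should_use_graph() {
#ifdef GGML_SYCL_GRAPH
    const char * text = std::getenv("GGML_SYCL_DISABLE_GRAPH");
    return text != nullptr && std::strcmp(text, "0") == 0;
#else
    // Graph support was not compiled in, so the runtime variable has no effect.
    return false;
#endif
}
```

With this shape, a user only has to think about one runtime variable, and a build without graph support behaves the same as a build with graphs disabled.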

  stream->parallel_for(
      sycl::nd_range<3>(num_blocks * block_size, block_size),
-     [=](sycl::nd_item<3> item_ct1) [[intel::reqd_sub_group_size(WARP_SIZE)]] {
+     [=](sycl::nd_item<3> item_ct1) [[sycl::reqd_sub_group_size(WARP_SIZE)]] {
Contributor

Why is this change needed?
Are you using the official oneAPI compiler 2025.0, or an internal compiler?

Contributor

I agree it's not strictly needed in this PR. I believe intel::reqd_sub_group_size will become deprecated in the next compiler release. This works fine with 2025.0 so I don't mind updating this now.

Collaborator

I've been facing a lot of warnings when building llama.cpp, so I find these changes a positive thing.

Contributor Author

I am compiling with both oneAPI and the latest Intel LLVM compiler (the next oneAPI candidate). The latest LLVM issues a deprecation warning, hence the change.
See the LLVM release notes: Deprecated intel::reqd_sub_group_size, the official SYCL 2020 spelling should be used instead (with sycl:: namespace).

}
return false;
} break;
}
Contributor

After removing so many `break` statements, the code path will change.
Have you tested with the CI?
This change will impact the CI tests.
I suggest running the CI locally.

Contributor

I ran our CI and got no issues, see #12371 (review)
The `break` statements are not needed, as there is always a `return` before them.
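
The reasoning here can be shown in a few lines: when every path through a `case` ends in `return`, a trailing `break` can never execute, so removing it does not change the code path. The enum and function below are made up for illustration and are not taken from ggml-sycl.cpp.

```cpp
enum class Op { Add, Mul, Unknown };

// Hypothetical support check: every case returns, so no `break` is needed.
static bool is_supported(Op op, int n_dims) {
    switch (op) {
        case Op::Add:
            return true;
        case Op::Mul: {
            if (n_dims <= 4) {
                return true;
            }
            return false;
        }  // a `break` placed here would be unreachable
        default:
            return false;
    }
}
```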

Collaborator

These breaks used to show as unreachable code warnings notified by the compiler. I don't know why they went away.

@lslusarczyk
Contributor Author

> What is the next step to make SYCL graph improve performance? Updating the application code, or using a newer compiler/runtime?

We are going to use a new feature in graphs that makes it possible to update the graph more often rather than recreate it. It needs some work outside llama.

> If it doesn't bring performance, that will impact the SYCL graph reputation. I suggest highlighting the status in the guide to avoid user misunderstanding.

Good point. I did it.

@NeoZhangJianyu
Contributor

OK, it's clear to me.

Because the feature is not good enough to be enabled, I suggest holding this PR until the feature brings a positive performance impact.
Unused code will increase the maintenance cost and technical debt.

Thank you!

@Rbiessy Rbiessy merged commit 35cae5b into ggml-org:master Mar 18, 2025
@Rbiessy
Contributor

Rbiessy commented Mar 18, 2025

Merged now as all the comments have been addressed. This will simplify the work to make SYCL-Graph more performant. There is already a comment to manage the expectations.

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
…e option (ggml-org#12371)

* alberto changes

* enable sycl graphs by env variable

* fixed compilation warnings in ggml-sycl.cpp

* renamed graph variables

* fix markdown in docs/backend/SYCL.md

Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>

* fix markdown in docs/backend/SYCL.md again

* compiling graphs by default, renamed graph_enable to graph_disable

---------

Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>