
sycl: add RMS_NORM_BACK operation support #16808

Merged
NeoZhangJianyu merged 9 commits into ggml-org:master from YaelLogic:rms_norm_back_sycl
Oct 29, 2025

@YaelLogic (Contributor)

Summary

Add SYCL backend support for RMS_NORM_BACK using a single FP32 compensated parallel reduction path.
No changes to the public API. Default numerical accuracy is preserved; a fast opt-in macro is also available.


Implementation

Algorithm (consistent with existing backend behavior)

  • inv_r = 1 / sqrt( (Σ x²) / D + eps )
  • coeff = − (Σ x·dz) / (Σ x² + D·eps)
  • dx[i] = (dz[i] + coeff * x[i]) * inv_r
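As a cross-check on the three formulas, here is a small CPU reference for a single row, plus a forward pass for a finite-difference test. It is a sketch, not the SYCL code from this PR: `rms_norm_back_ref` and `rms_norm_fwd` are hypothetical names, and sums are accumulated in double for clarity.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for one row of RMS_NORM_BACK, written directly
// from inv_r, coeff and dx above. Not the SYCL kernel from the PR.
std::vector<float> rms_norm_back_ref(const std::vector<float>& x,
                                     const std::vector<float>& dz,
                                     float eps) {
    assert(x.size() == dz.size() && !x.empty());
    const double D = static_cast<double>(x.size());

    double sum_xx = 0.0;   // Σ x²
    double sum_xdz = 0.0;  // Σ x·dz
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum_xx += static_cast<double>(x[i]) * x[i];
        sum_xdz += static_cast<double>(x[i]) * dz[i];
    }

    const double inv_r = 1.0 / std::sqrt(sum_xx / D + eps);
    // Σ x² + D·eps equals D·r², i.e. D times the squared RMS.
    const double coeff = -sum_xdz / (sum_xx + D * eps);

    std::vector<float> dx(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        dx[i] = static_cast<float>((dz[i] + coeff * x[i]) * inv_r);
    }
    return dx;
}

// Forward RMS norm y = x / sqrt(mean(x²) + eps), used only to check the gradient.
std::vector<double> rms_norm_fwd(const std::vector<double>& x, double eps) {
    double sum_xx = 0.0;
    for (double v : x) sum_xx += v * v;
    const double inv_r = 1.0 / std::sqrt(sum_xx / x.size() + eps);
    std::vector<double> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = x[i] * inv_r;
    return y;
}
```

Comparing `dx[j]` with the central difference (L(x + h·e_j) − L(x − h·e_j)) / 2h of L = Σ dz·y is a quick way to confirm the sign of coeff and that its denominator is Σ x² + D·eps rather than the mean.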

What was implemented

  • Per-thread accumulation of Σ x² and Σ x·dz with Kahan-style compensation.
  • Warp (sub_group) reduction via warp_reduce_sum.
  • Cross-warp reduction using local memory (one value per warp) with a single barrier.
  • group_broadcast used to distribute inv_r and coeff across the work-group.
  • Work-group size: multiple of WARP_SIZE, capped by device limit (≤256), not larger than D.
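The reduction structure listed above can be simulated on the CPU to make the data flow concrete. This is plain C++, not the SYCL kernel from the PR: `pick_work_group_size` and `simulated_group_sum` are hypothetical names, Kahan compensation is left out for brevity, and the rounding-down rule for the work-group size (including the single-warp fallback when D is smaller than a warp) is only one possible reading of the last bullet.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: one reading of "multiple of WARP_SIZE, capped by the
// device limit (<= 256), not larger than D". When D < warp_size this falls
// back to a single warp whose extra lanes contribute zero (an assumption).
std::size_t pick_work_group_size(std::size_t D, std::size_t warp_size,
                                 std::size_t device_max) {
    assert(warp_size > 0);
    std::size_t cap = std::min<std::size_t>(device_max, 256);
    cap = std::min(cap, D);
    const std::size_t wg = (cap / warp_size) * warp_size;  // round down to a warp multiple
    return std::max(wg, warp_size);
}

// CPU simulation of the two-level group sum of v over `wg` work-items.
float simulated_group_sum(const std::vector<float>& v, std::size_t wg,
                          std::size_t warp_size) {
    assert(wg % warp_size == 0);

    // Step 1: per-work-item strided accumulation (item t reads t, t+wg, ...).
    std::vector<float> lane(wg, 0.0f);
    for (std::size_t t = 0; t < wg; ++t)
        for (std::size_t i = t; i < v.size(); i += wg) lane[t] += v[i];

    // Step 2: sub-group ("warp") reduction, one partial per warp.
    const std::size_t n_warps = wg / warp_size;
    std::vector<float> warp_partial(n_warps, 0.0f);  // stands in for local memory
    for (std::size_t w = 0; w < n_warps; ++w)
        for (std::size_t l = 0; l < warp_size; ++l)
            warp_partial[w] += lane[w * warp_size + l];

    // Step 3: after the single barrier, combine the per-warp partials.
    float total = 0.0f;
    for (float p : warp_partial) total += p;

    // Step 4: in the real kernel this value is broadcast to every work-item.
    return total;
}
```

On a device the lanes of a warp are combined in parallel, likely in a tree order rather than the sequential loop used here, so rounding can differ; the integer-valued test input keeps every partial sum exact.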

Optional fast path

  • Define GGML_SYCL_RMS_BACK_FAST to disable compensated summation and use plain FP32 accumulation.
  • Default remains high-accuracy compensated mode.
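To show what the opt-in switch trades away, here is a minimal sketch of a per-thread accumulator that selects Kahan or plain FP32 summation at compile time. Only the macro name GGML_SYCL_RMS_BACK_FAST comes from this PR; `Accumulator`, `plain_sum`, and `accumulated_sum` are hypothetical and do not reproduce the kernel's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-thread accumulator; GGML_SYCL_RMS_BACK_FAST selects the path.
struct Accumulator {
    float sum = 0.0f;
    float c = 0.0f;  // running compensation (low-order bits lost so far)

    void add(float v) {
#ifdef GGML_SYCL_RMS_BACK_FAST
        sum += v;                 // fast path: plain FP32 accumulation
#else
        const float y = v - c;    // apply the correction from the previous step
        const float t = sum + y;  // low-order bits of y may be lost here...
        c = (t - sum) - y;        // ...and are recovered into c
        sum = t;
#endif
    }
};

// Plain FP32 sum, for comparison with the default compensated path.
float plain_sum(const std::vector<float>& v) {
    float s = 0.0f;
    for (float x : v) s += x;
    return s;
}

// Sum through Accumulator (compensated unless the macro is defined).
float accumulated_sum(const std::vector<float>& v) {
    Accumulator acc;
    for (float x : v) acc.add(x);
    return acc.sum;
}
```

Built without the macro, adding 1e-8f one million times to 1.0f gives about 1.01 through `accumulated_sum`, while `plain_sum` stays at exactly 1.0f because each increment is below half an ULP of 1.0f. Value-unsafe floating-point optimizations such as -ffast-math may simplify (t - sum) - y to zero and silently remove the compensation.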

Validation

Focused tests executed locally:

| Test Suite | Result |
|---|---|
| RMS_NORM_BACK (CPU) | 4 / 4 passed |
| RMS_NORM_BACK (SYCL host/GPU) | 4 / 4 passed |
| Sanity check (NMSE) | ≈ 1e-11 |

Build is warning-free for this code path.
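To help read the NMSE figure, here is a minimal sketch of normalized mean squared error: the summed squared difference between two outputs divided by the reference output's sum of squares. The function name and the choice to normalize by the reference are assumptions; the exact definition inside test-backend-ops may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical NMSE helper: sum((ref - out)^2) / sum(ref^2), computed in double.
double nmse(const std::vector<float>& ref, const std::vector<float>& out) {
    assert(ref.size() == out.size());
    double err = 0.0;   // accumulated squared error
    double norm = 0.0;  // accumulated squared magnitude of the reference
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double d = static_cast<double>(ref[i]) - out[i];
        err += d * d;
        norm += static_cast<double>(ref[i]) * ref[i];
    }
    return err / norm;
}
```

Under this definition, an NMSE of about 1e-11 corresponds to an RMS relative error of roughly sqrt(1e-11) ≈ 3.2e-6.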


Reproduce (build + test)

```sh
# Configure & build with SYCL
cmake -B build -DGGML_SYCL=ON && cmake --build build -j"$(nproc)"

# Focused RMS_NORM_BACK tests
./build/bin/test-backend-ops test -o RMS_NORM_BACK -b CPU
SYCL_DEVICE_FILTER=host ./build/bin/test-backend-ops test -o RMS_NORM_BACK -b SYCL0
./build/bin/test-backend-ops test -o RMS_NORM_BACK -b SYCL0

# Optional fast path (less numerically stable)
# add -DGGML_SYCL_RMS_BACK_FAST to your compiler definitions, e.g.:
cmake -B build -DGGML_SYCL=ON -DCMAKE_CXX_FLAGS="-DGGML_SYCL_RMS_BACK_FAST" && cmake --build build -j"$(nproc)"
```

Files Changed (minimal scope only)

| File | Purpose |
|---|---|
| ggml/src/ggml-sycl/norm.cpp | Implementation of ggml_sycl_op_rms_norm_back |
| ggml/src/ggml-sycl/ggml-sycl.cpp | Operation dispatch registration |
| ggml/src/ggml-sycl/norm.hpp | Function declaration |
| docs/ops.md | Mark RMS_NORM_BACK as ✅ for SYCL |
| docs/ops/SYCL.csv | Mark RMS_NORM_BACK entries as supported |

No unrelated files or personal data included.


Notes & Risks

  • Default path gives high numerical accuracy using compensated FP32 sums.
  • Fast path is fully optional (disabled by default).
  • The GPU reduction order differs from the CPU's, so results are not bitwise-identical; the observed NMSE against the CPU result is ≈ 1e-11.
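Results differ bitwise because FP32 addition is not associative, so grouping the same terms differently changes the rounded sum. A minimal illustration, assuming a 2-item strided split; the values and the split are illustrative and not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential left-to-right FP32 sum, as a simple CPU loop would do it.
float sequential_sum(const std::vector<float>& v) {
    float s = 0.0f;
    for (float x : v) s += x;
    return s;
}

// Strided grouping, as in per-work-item accumulation: item t sums
// v[t], v[t + n_items], ..., then the per-item partials are added together.
float strided_sum(const std::vector<float>& v, std::size_t n_items) {
    assert(n_items > 0);
    std::vector<float> partial(n_items, 0.0f);
    for (std::size_t i = 0; i < v.size(); ++i) partial[i % n_items] += v[i];
    float s = 0.0f;
    for (float p : partial) s += p;
    return s;
}
```

For {1e8f, 1.0f, -1e8f, 1.0f} on a typical IEEE-754 target that evaluates float in single precision, the sequential loop returns 1 (the first 1.0f is absorbed by 1e8f, whose float spacing is 8) while the strided grouping returns the exact answer 2. Neither order is a bitwise ground truth, which is why backends are compared with a tolerance such as NMSE.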

Reviewers

cc @CISC @NeoZhangJianyu
Looking forward to your feedback. Thanks in advance!

@github-actions added the documentation, ggml, and SYCL labels on Oct 27, 2025
@YaelLogic (Contributor, Author)

This PR is ready for review.
Tagging @CISC and @NeoZhangJianyu — your feedback would be greatly appreciated whenever you have the chance.
Thanks for your work on maintaining and improving the SYCL backend!

Comment thread ggml/src/ggml-sycl/norm.cpp Outdated
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
@NeoZhangJianyu (Contributor) left a comment

Good job!

Thank you!

@NeoZhangJianyu (Contributor)

@YaelLogic
Please fix the EditorConfig issue.

@YaelLogic (Contributor, Author)

Hi @NeoZhangJianyu,
the EditorConfig issue has been fixed.
Please let me know if anything else is needed. Thank you.

@NeoZhangJianyu NeoZhangJianyu merged commit 338074c into ggml-org:master Oct 29, 2025
76 of 79 checks passed
Anico2 added a commit to Anico2/llama.cpp that referenced this pull request Jan 15, 2026
* sycl: add RMS_NORM_BACK operation support

* sycl: rms_norm_back: add dual reduction paths (FP64 and FP32) and savepoint before further changes

* sycl: add RMS_NORM_BACK support

Implement RMS_NORM_BACK for the SYCL backend using FP32 compensated parallel reduction. Minimal docs updates (ops.md / SYCL.csv).

* revert: restore .gitignore and tools/run/CMakeLists.txt to upstream

* revert: restore tests/CMakeLists.txt to upstream

* sycl: optimize rms_norm_back

* fix: restore SYCL.csv to correct state with RMS_NORM_BACK support

* Update ggml/src/ggml-sycl/norm.cpp

Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>

* fix: remove trailing whitespace and add missing newline (EditorConfig)

---------

Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

Labels: documentation, ggml, SYCL


2 participants