metal: Implement ROLL op #21946

Merged
ggerganov merged 6 commits into ggml-org:master from kushagharahi:impl-roll-op on Apr 16, 2026

kushagharahi (Contributor) commented Apr 15, 2026

Overview

Per #21941, the ROLL op was not implemented for the Metal backend, so it fell back to the CPU.
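For readers unfamiliar with the op, here is a minimal reference sketch of what a roll does along one axis, assuming the usual circular-shift semantics (elements shifted past the end wrap around to the beginning). The function name `roll_1d` is hypothetical and this is not the Metal kernel from this PR, only an illustration of the expected behavior.

```python
def roll_1d(values, shift):
    """Circularly shift `values` by `shift` positions (reference sketch only)."""
    n = len(values)
    if n == 0:
        return []
    out = [None] * n
    for i, v in enumerate(values):
        # Python's % yields a non-negative result for a positive modulus,
        # so negative shifts wrap around correctly.
        out[(i + shift) % n] = v
    return out


print(roll_1d([0, 1, 2, 3, 4], 2))   # [3, 4, 0, 1, 2]
print(roll_1d([0, 1, 2, 3, 4], -1))  # [1, 2, 3, 4, 0]
```

A positive shift moves elements toward higher indices, and a negative shift moves them toward lower indices.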

I also regenerated the ops docs (on an M1) with the following commands:

./build/bin/test-backend-ops support -b MTL0 --output csv > ./docs/ops/Metal.csv

python3 ./scripts/create_ops_docs.py

Testing

After this change, llama-server no longer prints the "CLIP graph uses unsupported operators by the backend" log for ROLL.

./build/bin/test-backend-ops -o ROLL -b MTL0

ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.016 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name:   MTL0
ggml_metal_device_init: GPU family: MTLGPUFamilyApple7  (1007)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: has tensor            = false
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  =  5726.63 MB
Testing 3 devices

ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1
ggml_metal_init: picking default device: Apple M1
ggml_metal_init: use fusion         = true
ggml_metal_init: use concurrency    = true
ggml_metal_init: use graph optimize = true
Backend 1/3: MTL0
  Device description: Apple M1
  Device memory: 5461 MB (5460 MB free)

ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_roll_f32', name = 'kernel_roll_f32'
ggml_metal_library_compile_pipeline: loaded kernel_roll_f32                               0x12ee0b290 | th_max = 1024 | th_width =   32
  ROLL(shift0=3,shift1=-2,shift3=1,shift4=-1): OK
  1/1 tests passed
  Backend MTL0: OK
ggml_metal_free: deallocating
Backend 2/3: BLAS
  Skipping
Backend 3/3: CPU
  Skipping
3/3 backends passed
OK
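The test case in the log above uses four shift values (3, -2, 1, -1). As a rough sketch of what such a case exercises, the following rolls a flat 4-D buffer with one shift per dimension, written in "gather" form (each output element computes the source index it reads from), which is the shape a one-thread-per-element GPU kernel would typically take. The names `roll_4d` and `flat_index`, the shape `(10, 5, 4, 3)`, and the mapping of the four shifts to the four dimensions in order are all assumptions made for illustration; they are not taken from the PR's kernel source.

```python
from itertools import product


def flat_index(i0, i1, i2, i3, ne):
    """Offset into a contiguous buffer where dimension 0 varies fastest."""
    return i0 + ne[0] * (i1 + ne[1] * (i2 + ne[2] * i3))


def roll_4d(src, ne, shifts):
    """Roll a flat, contiguous 4-D buffer by one shift per dimension (sketch)."""
    assert len(src) == ne[0] * ne[1] * ne[2] * ne[3]
    dst = [None] * len(src)
    for i3, i2, i1, i0 in product(range(ne[3]), range(ne[2]),
                                  range(ne[1]), range(ne[0])):
        # Gather form: find the wrapped source coordinate for this output element.
        s0 = (i0 - shifts[0]) % ne[0]
        s1 = (i1 - shifts[1]) % ne[1]
        s2 = (i2 - shifts[2]) % ne[2]
        s3 = (i3 - shifts[3]) % ne[3]
        dst[flat_index(i0, i1, i2, i3, ne)] = src[flat_index(s0, s1, s2, s3, ne)]
    return dst


ne = (10, 5, 4, 3)                # hypothetical shape, chosen for illustration
data = list(range(10 * 5 * 4 * 3))
rolled = roll_4d(data, ne, (3, -2, 1, -1))

# Rolling back by the negated shifts should restore the original buffer.
restored = roll_4d(rolled, ne, (-3, 2, -1, 1))
print(restored == data)
```

Mixing positive and negative shifts, as the test case does, checks that the index wrap-around is handled in both directions.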

Requirements

  • I have read and agree with the contributing guidelines: Yes
  • AI usage disclosure: Opus was used for assistance in the implementation. Verification, tests, and final changes were done by me.

See also #14909.

kushagharahi requested a review from a team as a code owner April 15, 2026 08:26
github-actions bot added the documentation, ggml, and Apple Metal labels Apr 15, 2026
ngxson (Contributor) commented Apr 15, 2026

CC @ggerganov, if you have a bit of time: this can significantly improve the speed of conformer-based audio models (LFM and Gemma 4).

ggerganov added the merge ready label Apr 16, 2026
ggerganov merged commit ae2d348 into ggml-org:master Apr 16, 2026
54 of 58 checks passed
cnsiva pushed a commit to saas-home/llama.cpp that referenced this pull request Apr 17, 2026
* nix: support unified apple-sdk

* Impl roll op for Metal

* Revert "nix: support unified apple-sdk"

This reverts commit abfa473.

* update ops.md

* update op docs
mengqin pushed a commit to mengqin/llama.cpp that referenced this pull request Apr 20, 2026
ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Apr 21, 2026
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Apr 23, 2026
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
jimbothigpen pushed a commit to jimbothigpen/frankenturbo2 that referenced this pull request May 2, 2026
Labels

Apple Metal, documentation, ggml, merge ready
