metal: Implement ROLL op by kushagharahi · Pull Request #21946 · ggml-org/llama.cpp

kushagharahi · 2026-04-15T08:26:44Z

Overview

Per #21941, the ROLL op was not implemented for the Metal backend, so it fell back to the CPU.

I also regenerated the ops docs (on an M1) with the following commands:

./build/bin/test-backend-ops support -b MTL0 --output csv > ./docs/ops/Metal.csv

python3 ./scripts/create_ops_docs.py

Testing

After the implementation, llama-server no longer prints the "the CLIP graph uses unsupported operators by the backend" log for ROLL.

./build/bin/test-backend-ops -o ROLL -b MTL0

ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.016 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name:   MTL0
ggml_metal_device_init: GPU family: MTLGPUFamilyApple7  (1007)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: has tensor            = false
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  =  5726.63 MB
Testing 3 devices

ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1
ggml_metal_init: picking default device: Apple M1
ggml_metal_init: use fusion         = true
ggml_metal_init: use concurrency    = true
ggml_metal_init: use graph optimize = true
Backend 1/3: MTL0
  Device description: Apple M1
  Device memory: 5461 MB (5460 MB free)

ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_roll_f32', name = 'kernel_roll_f32'
ggml_metal_library_compile_pipeline: loaded kernel_roll_f32                               0x12ee0b290 | th_max = 1024 | th_width =   32
  ROLL(shift0=3,shift1=-2,shift3=1,shift4=-1): OK
  1/1 tests passed
  Backend MTL0: OK
ggml_metal_free: deallocating
Backend 2/3: BLAS
  Skipping
Backend 3/3: CPU
  Skipping
3/3 backends passed
OK
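
For readers unfamiliar with the op, here is a minimal reference sketch of what a roll (circular shift) computes. It assumes ROLL follows the same convention as `numpy.roll`, where the element at index `i` moves to index `(i + shift) mod n`. The helper names `roll_1d` and `roll_2d` are hypothetical and are not part of ggml or of this PR's Metal kernel.

```python
def roll_1d(values, shift):
    """Circularly shift a list: the element at index i moves to (i + shift) % n.

    Assumption: this mirrors numpy.roll semantics; it is an illustration,
    not the kernel added in this PR.
    """
    n = len(values)
    if n == 0:
        return []
    # Output index i reads from source index (i - shift) mod n.
    # Python's % always returns a non-negative result for positive n,
    # so negative shifts wrap correctly.
    return [values[(i - shift) % n] for i in range(n)]


def roll_2d(rows, shift_cols, shift_rows):
    """Apply an independent circular shift along each of two dimensions."""
    rolled_rows = roll_1d(rows, shift_rows)
    return [roll_1d(row, shift_cols) for row in rolled_rows]


if __name__ == "__main__":
    print(roll_1d([0, 1, 2, 3, 4], 3))
    print(roll_2d([[1, 2, 3], [4, 5, 6]], 1, 1))
```

A shift of 3 and a shift of -2 give the same result on a length-5 axis, since only the shift modulo the axis length matters.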

Requirements

I have read and agree with the contributing guidelines: Yes
AI usage disclosure: Opus was used for assistance in the implementation. Verification, tests and final changes done by me.

ref #14909 too

This reverts commit abfa473.

ngxson · 2026-04-15T15:27:57Z

CC @ggerganov, if you have a bit of time: this can significantly improve the speed of conformer-based audio models (LFM and Gemma 4).

* nix: support unified apple-sdk
* Impl roll op for Metal
* Revert "nix: support unified apple-sdk"
  This reverts commit abfa473.
* update ops.md
* update op docs

kushagharahi added 5 commits April 14, 2026 18:20

nix: support unified apple-sdk

abfa473

Impl roll op for Metal

986793b

Revert "nix: support unified apple-sdk"

f7c80b5

This reverts commit abfa473.

update ops.md

e567782

update op docs

800b85c

kushagharahi requested a review from a team as a code owner April 15, 2026 08:26

github-actions bot added the documentation, ggml, and Apple Metal labels Apr 15, 2026

ngxson approved these changes Apr 15, 2026

Merge branch 'ggml-org:master' into impl-roll-op

74d989d

ggerganov approved these changes Apr 16, 2026

ggerganov added the merge ready label Apr 16, 2026

ggerganov merged commit ae2d348 into ggml-org:master Apr 16, 2026
54 of 58 checks passed

ggerganov merged 6 commits into ggml-org:master from kushagharahi:impl-roll-op
