Metal backend: SDPA metal implementation #16086
base: main
Conversation
Stack from ghstack (oldest at bottom):
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16086
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 2 Unrelated Failures as of commit a9108f8 with merge base c00d726
NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
Pull request overview
This pull request replaces the MPSGraph-based implementation of Scaled Dot Product Attention (SDPA) with a custom Metal kernel implementation, ported from PyTorch and influenced by MLX.
Key Changes
- Custom Metal kernel: Implements a one-pass SDPA algorithm embedded as a 200+ line inline shader with template instantiations for float, half, and bfloat types across head dimensions of 64, 96, and 128
- Enhanced Metal API: Adds new setArg overloads for uint32_t, float, bool, and uint3 types, plus a new dispatchThreadgroups method for explicit threadgroup dispatch (a raw-Metal usage sketch follows below)
- Stride-aware computation: The new kernel handles transposed tensor layouts by decomposing batch and head indices and using explicit stride information
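The new wrappers presumably forward to Metal's compute-encoder API. A minimal sketch of the equivalent raw calls, assuming an already-configured MTLComputeCommandEncoder named encoder and host-side scalars chosen only for illustration; the actual wrapper signatures in this PR may differ:

```objc
// Illustrative only: raw Metal calls that setArg/dispatchThreadgroups-style wrappers
// would typically issue. Buffer indices mirror the shader's [[buffer(n)]] slots for
// gqa_factor (4), scale (8), and has_mask (11).
uint32_t gqa_factor = 1;    // assumed host-side values for the sketch
float scale = 0.125f;
bool has_mask = false;
[encoder setBytes:&gqa_factor length:sizeof(uint32_t) atIndex:4];
[encoder setBytes:&scale length:sizeof(float) atIndex:8];
[encoder setBytes:&has_mask length:sizeof(bool) atIndex:11];
// One threadgroup per (batch*head, query token); 32 simdgroups x 32 threads = 1024.
[encoder dispatchThreadgroups:MTLSizeMake(batchSize * num_heads, qSize, 1)
        threadsPerThreadgroup:MTLSizeMake(1024, 1, 1)];
```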
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| backends/apple/metal/runtime/shims/et_metal_ops.mm | Replaces ~400 lines of MPSGraph code with inline Metal shader source and direct kernel dispatch; adds shader library caching |
| backends/apple/metal/runtime/shims/et_metal.mm | Implements new setArg overloads for scalar types and uint3 structs; adds dispatchThreadgroups for explicit threadgroup control |
| backends/apple/metal/runtime/shims/et_metal.h | Declares new Metal kernel function methods for argument setting and threadgroup dispatch |
| allocate_mtl_buffer(&out_contents_ptr, out_size_bytes); | ||
| void* attn_contents_ptr = nullptr; | ||
| allocate_mtl_buffer(&attn_contents_ptr, attn_size_bytes); | ||
| // Use MLX-style Metal kernels instead of MPSGraph | ||
| ET_LOG(Debug, "aoti_torch_mps__scaled_dot_product_attention_math_for_mps: Implementing using MLX Metal kernels"); | ||
| // Get shader library | ||
| ETMetalShaderLibrary* library = get_sdpa_shader_library(); | ||
| if (!library) { | ||
| ET_LOG(Error, "aoti_torch_mps__scaled_dot_product_attention_math_for_mps: Failed to get shader library"); | ||
| throw std::runtime_error("Failed to get SDPA shader library"); | ||
| } | ||
| // Determine kernel name based on dtype and head_dim (PyTorch format) | ||
| std::string type_name; | ||
| if (dtype == static_cast<int32_t>(SupportedDTypes::FLOAT32)) { | ||
| type_name = "float"; | ||
| } else if (dtype == static_cast<int32_t>(SupportedDTypes::BFLOAT16)) { |
Copilot
AI
Dec 4, 2025
Bug: Memory leak on error paths. If allocate_mtl_buffer succeeds for attn_contents_ptr (line 1290) but then an exception is thrown before the tensor handle is created (e.g., at line 1299-1300, 1309-1310, 1316-1317, or 1326-1327), the allocated memory is never freed.
Additionally, if attn_contents_ptr allocation succeeds but there's an error, both out_contents_ptr and attn_contents_ptr need to be freed. Consider using RAII or ensuring all error paths properly clean up both allocations.
| allocate_mtl_buffer(&out_contents_ptr, out_size_bytes); | |
| void* attn_contents_ptr = nullptr; | |
| allocate_mtl_buffer(&attn_contents_ptr, attn_size_bytes); | |
| // Use MLX-style Metal kernels instead of MPSGraph | |
| ET_LOG(Debug, "aoti_torch_mps__scaled_dot_product_attention_math_for_mps: Implementing using MLX Metal kernels"); | |
| // Get shader library | |
| ETMetalShaderLibrary* library = get_sdpa_shader_library(); | |
| if (!library) { | |
| ET_LOG(Error, "aoti_torch_mps__scaled_dot_product_attention_math_for_mps: Failed to get shader library"); | |
| throw std::runtime_error("Failed to get SDPA shader library"); | |
| } | |
| // Determine kernel name based on dtype and head_dim (PyTorch format) | |
| std::string type_name; | |
| if (dtype == static_cast<int32_t>(SupportedDTypes::FLOAT32)) { | |
| type_name = "float"; | |
| } else if (dtype == static_cast<int32_t>(SupportedDTypes::BFLOAT16)) { | |
| void* attn_contents_ptr = nullptr; | |
| try { | |
| allocate_mtl_buffer(&out_contents_ptr, out_size_bytes); | |
| allocate_mtl_buffer(&attn_contents_ptr, attn_size_bytes); | |
| // Use MLX-style Metal kernels instead of MPSGraph | |
| ET_LOG(Debug, "aoti_torch_mps__scaled_dot_product_attention_math_for_mps: Implementing using MLX Metal kernels"); | |
| // Get shader library | |
| ETMetalShaderLibrary* library = get_sdpa_shader_library(); | |
| if (!library) { | |
| ET_LOG(Error, "aoti_torch_mps__scaled_dot_product_attention_math_for_mps: Failed to get shader library"); | |
| throw std::runtime_error("Failed to get SDPA shader library"); | |
| } | |
| // Determine kernel name based on dtype and head_dim (PyTorch format) | |
| std::string type_name; | |
| if (dtype == static_cast<int32_t>(SupportedDTypes::FLOAT32)) { | |
| type_name = "float"; | |
| } else if (dtype == static_cast<int32_t>(SupportedDTypes::BFLOAT16)) { | |
| type_name = "bfloat16"; | |
| } | |
| // ... rest of the code ... | |
| } catch (...) { | |
| if (out_contents_ptr) { | |
| free_mtl_buffer(out_contents_ptr); | |
| } | |
| if (attn_contents_ptr) { | |
| free_mtl_buffer(attn_contents_ptr); | |
| } | |
| throw; | |
| } |
| if (dtype == static_cast<int32_t>(SupportedDTypes::FLOAT32)) { | ||
| type_name = "float"; | ||
| } else if (dtype == static_cast<int32_t>(SupportedDTypes::BFLOAT16)) { | ||
| type_name = "bfloat"; | ||
| } else { | ||
| ET_LOG(Error, "aoti_torch_mps__scaled_dot_product_attention_math_for_mps: Unsupported dtype for Metal kernel"); | ||
| throw std::runtime_error("Unsupported dtype for Metal SDPA kernel"); | ||
| } |
Copilot
AI
Dec 4, 2025
Bug: Missing support for half/float16 dtype. The shader instantiates kernels for half type (line 337), but the host code only handles FLOAT32 and BFLOAT16 dtypes (lines 1304-1310). If a float16/half tensor is passed, it will throw an "Unsupported dtype" error even though the kernel exists.
Either add support for half/float16 in the host code (checking for the appropriate SupportedDTypes enum value and using type_name = "half"), or remove the half instantiation from the shader if it's not intended to be supported.
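A minimal sketch of the first option, assuming the shim's enum exposes a float16 member (the enumerator name SupportedDTypes::FLOAT16 is a guess and should be checked against the shim's definitions):

```cpp
// Sketch: extend the dtype dispatch so half tensors reach the existing half kernels.
if (dtype == static_cast<int32_t>(SupportedDTypes::FLOAT32)) {
  type_name = "float";
} else if (dtype == static_cast<int32_t>(SupportedDTypes::FLOAT16)) {
  type_name = "half";   // matches INSTANTIATE_SDPA_VECTOR_HEADS(half) in the shader
} else if (dtype == static_cast<int32_t>(SupportedDTypes::BFLOAT16)) {
  type_name = "bfloat";
} else {
  ET_LOG(Error, "aoti_torch_mps__scaled_dot_product_attention_math_for_mps: Unsupported dtype for Metal kernel");
  throw std::runtime_error("Unsupported dtype for Metal SDPA kernel");
}
```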
| // Dispatch using threadgroups (PyTorch uses grid: [batch*heads, qSize, 1], group: [1024, 1, 1]) | ||
| // Note: We need to use dispatchThreadgroups, not dispatchThreads | ||
| // Each threadgroup processes one query token across all key-value tokens | ||
| kernel_func->dispatchThreadgroups( | ||
| batchSize * num_heads, // gridX | ||
| qSize, // gridY | ||
| 1, // gridZ | ||
| 1024, // threadsX | ||
| 1, // threadsY | ||
| 1); // threadsZ |
Copilot
AI
Dec 4, 2025
Missing validation: The dispatch uses a hardcoded threadgroup size of 1024 threads (line 1484), but doesn't verify that the Metal device supports this. Different Metal devices have different maximum threadgroup sizes (typically 512-1024). The kernel should either:
- Query the device's maxTotalThreadsPerThreadgroup and use min(1024, maxThreads)
- Document that this implementation requires devices with >= 1024 threads per threadgroup
- Add a runtime check and throw an error if the device doesn't support 1024 threads
The kernel assumes BN=32 (line 165) which means 32 simdgroups with 32 threads each = 1024 total. If the device doesn't support this, the kernel will fail or produce incorrect results.
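A sketch of the runtime-check option, querying the compiled pipeline's limit before dispatch (pipeline_state and encoder are assumed to be the already-created MTLComputePipelineState and compute encoder):

```objc
// Sketch: verify the device/pipeline supports the 1024-thread threadgroup that the
// kernel's BN = 32 simdgroups x 32 threads layout assumes.
NSUInteger max_threads = [pipeline_state maxTotalThreadsPerThreadgroup];
if (max_threads < 1024) {
  ET_LOG(Error, "SDPA Metal kernel requires 1024 threads per threadgroup, device supports %lu",
         (unsigned long)max_threads);
  throw std::runtime_error("Device does not support 1024 threads per threadgroup for SDPA");
}
[encoder dispatchThreadgroups:MTLSizeMake(batchSize * num_heads, qSize, 1)
        threadsPerThreadgroup:MTLSizeMake(1024, 1, 1)];
```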
| return R"( | ||
| // Ported from PyTorch's Attention.metal | ||
| // https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/mps/kernels/Attention.metal | ||
| // Largely influeneced by |
Copilot
AI
Dec 4, 2025
Spelling error: "influeneced" should be "influenced"
| // Largely influeneced by | |
| // Largely influenced by |
| auto* out_tensor = reinterpret_cast<Tensor*>(out_tensor_handle); | ||
| // Prepare kernel arguments (PyTorch format) |
Copilot
AI
Dec 4, 2025
Bug: Potential division by zero if key_tensor->sizes()[1] is 0. The gqa_factor calculation num_heads / key_tensor->sizes()[1] will cause a division by zero error. This should be validated before the division, especially since the code validates other dimensions like headSize.
| // Prepare kernel arguments (PyTorch format) | |
| // Prepare kernel arguments (PyTorch format) | |
| if (key_tensor->sizes()[1] == 0) { | |
| ET_LOG(Error, "aoti_torch_mps__scaled_dot_product_attention_math_for_mps: key_tensor->sizes()[1] is zero, division by zero error"); | |
| aoti_torch_mps_free(out_contents_ptr); | |
| aoti_torch_mps_free(attn_contents_ptr); | |
| throw std::runtime_error("Division by zero: key_tensor->sizes()[1] is zero"); | |
| } |
| uint mask_kv_seq_stride = 0; | ||
| uint mask_q_seq_stride = 0; | ||
| if (has_mask_val) { | ||
| auto* mask_tensor = reinterpret_cast<Tensor*>(*attn_mask); |
Copilot
AI
Dec 4, 2025
Inconsistent indentation: Line 1419 has extra indentation (appears to use spaces instead of the surrounding code's indentation style). This should match the indentation of the surrounding code for consistency.
| auto* mask_tensor = reinterpret_cast<Tensor*>(*attn_mask); | |
| auto* mask_tensor = reinterpret_cast<Tensor*>(*attn_mask); |
| static std::string get_sdpa_metal_source() { | ||
| return R"( | ||
| // Ported from PyTorch's Attention.metal | ||
| // https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/mps/kernels/Attention.metal | ||
| // Largely influeneced by | ||
| // https://github.com/ml-explore/mlx/blob/main/mlx/backend/metal/kernels/scaled_dot_product_attention.metal | ||
| // Modified to support floating point masks and transposed middle dimensions (dims 1 & 2) | ||
| #include <metal_stdlib> | ||
| #include <metal_simdgroup> | ||
| #include <metal_math> | ||
| using namespace metal; | ||
| typedef half float16_t; | ||
| typedef bfloat bfloat16_t; | ||
| // PyTorch's sdpa_vector kernel (one-pass variant) | ||
| template <typename T, int D, int V = D> | ||
| [[kernel]] void sdpa_vector( | ||
| const device T* queries [[buffer(0)]], | ||
| const device T* keys [[buffer(1)]], | ||
| const device T* values [[buffer(2)]], | ||
| device T* out [[buffer(3)]], | ||
| constant uint& gqa_factor [[buffer(4)]], | ||
| constant uint& N [[buffer(5)]], | ||
| constant uint3& qkv_head_strides [[buffer(6)]], | ||
| constant uint3& qkv_seq_strides [[buffer(7)]], | ||
| constant float& scale [[buffer(8)]], | ||
| const device T* mask [[buffer(9)]], // Changed from bool* to T* for floating point masks | ||
| constant uint3& mask_strides [[buffer(10)]], | ||
| constant bool& has_mask [[buffer(11)]], | ||
| constant uint3& qkv_batch_strides [[buffer(12)]], // NEW: batch strides for Q, K, V | ||
| constant uint& num_q_heads [[buffer(13)]], // NEW: number of query heads | ||
| uint3 tid [[threadgroup_position_in_grid]], | ||
| uint3 tpg [[threadgroups_per_grid]], | ||
| uint simd_gid [[simdgroup_index_in_threadgroup]], | ||
| uint simd_lid [[thread_index_in_simdgroup]]) { | ||
| constexpr uint BN = 32; | ||
| constexpr uint BD = 32; | ||
| constexpr uint qk_per_thread = D / BD; | ||
| constexpr uint v_per_thread = V / BD; | ||
| const uint q_head_stride = qkv_head_strides.x; | ||
| const uint q_seq_stride = qkv_seq_strides.x; | ||
| const uint q_batch_stride = qkv_batch_strides.x; | ||
| const uint k_head_stride = qkv_head_strides.y; | ||
| const uint k_seq_stride = qkv_seq_strides.y; | ||
| const uint k_batch_stride = qkv_batch_strides.y; | ||
| const uint v_head_stride = qkv_head_strides.z; | ||
| const uint v_seq_stride = qkv_seq_strides.z; | ||
| const uint v_batch_stride = qkv_batch_strides.z; | ||
| const uint mask_head_stride = mask_strides.x; | ||
| const uint mask_kv_seq_stride = mask_strides.y; | ||
| const uint mask_q_seq_stride = mask_strides.z; | ||
| uint inner_k_stride = BN * int(k_seq_stride); | ||
| uint inner_v_stride = BN * int(v_seq_stride); | ||
| typedef float U; | ||
| thread U q[qk_per_thread]; | ||
| thread U k[qk_per_thread]; | ||
| thread U o[v_per_thread]; | ||
| threadgroup U outputs[BN * BD]; | ||
| threadgroup U max_scores[BN]; | ||
| threadgroup U sum_exp_scores[BN]; | ||
| // Adjust positions | ||
| const int head_idx = tid.x; // Flattened batch*heads index | ||
| const int q_seq_idx = tid.y; | ||
| // Decompose flattened head_idx into batch and head indices | ||
| const int batch_idx = head_idx / num_q_heads; | ||
| const int head_in_batch = head_idx % num_q_heads; | ||
| const int kv_head_idx = head_in_batch / gqa_factor; | ||
| const int Q = tpg.y; | ||
| const int group_offset = head_idx * Q + q_seq_idx; | ||
| const int o_offset = group_offset; | ||
| // Use decomposed indices with separate batch and head strides | ||
| queries += batch_idx * q_batch_stride + head_in_batch * q_head_stride + q_seq_idx * q_seq_stride + | ||
| simd_lid * qk_per_thread; | ||
| keys += batch_idx * k_batch_stride + kv_head_idx * k_head_stride + simd_gid * k_seq_stride + | ||
| simd_lid * qk_per_thread; | ||
| values += batch_idx * v_batch_stride + kv_head_idx * v_head_stride + simd_gid * v_seq_stride + | ||
| simd_lid * v_per_thread; | ||
| if (has_mask) { | ||
| mask += head_idx * mask_head_stride + simd_gid * mask_kv_seq_stride + | ||
| q_seq_idx * mask_q_seq_stride; | ||
| } | ||
| out += o_offset * V + simd_gid * v_per_thread; | ||
| // Read the query and 0 the output accumulator | ||
| for (uint i = 0; i < qk_per_thread; i++) { | ||
| q[i] = scale * static_cast<U>(queries[i]); | ||
| } | ||
| for (uint i = 0; i < v_per_thread; i++) { | ||
| o[i] = 0; | ||
| } | ||
| U max_score = -INFINITY; | ||
| U sum_exp_score = 0; | ||
| // For each key | ||
| for (uint i = simd_gid; i < N; i += BN) { | ||
| // Check mask: for floating point masks, values > -1e9 are considered valid (not masked) | ||
| // Masked positions typically have -inf or very negative values | ||
| const bool is_valid = !has_mask || (static_cast<U>(mask[0]) > -1e9f); | ||
| if (is_valid) { | ||
| // Read the key | ||
| for (uint j = 0; j < qk_per_thread; j++) { | ||
| k[j] = static_cast<U>(keys[j]); | ||
| } | ||
| // Compute the i-th score | ||
| U score = 0; | ||
| for (uint j = 0; j < qk_per_thread; j++) { | ||
| score += q[j] * k[j]; | ||
| } | ||
| score = simd_sum(score); | ||
| // Add mask value to score if mask is present | ||
| if (has_mask) { | ||
| score += static_cast<U>(mask[0]); | ||
| } | ||
| // Update the accumulators | ||
| U new_max = max(max_score, score); | ||
| U factor = metal::fast::exp(max_score - new_max); | ||
| U exp_score = metal::fast::exp(score - new_max); | ||
| max_score = new_max; | ||
| sum_exp_score = sum_exp_score * factor + exp_score; | ||
| // Update the output accumulator | ||
| for (uint j = 0; j < v_per_thread; j++) { | ||
| o[j] = o[j] * factor + exp_score * static_cast<U>(values[j]); | ||
| } | ||
| } | ||
| // Move the pointers to the next kv | ||
| keys += inner_k_stride; | ||
| values += inner_v_stride; | ||
| if (has_mask) { | ||
| mask += BN * mask_kv_seq_stride; | ||
| } | ||
| } | ||
| // Each thread has a partial part of the output so we need to combine them. | ||
| // First let's communicate the max and sum_exp | ||
| if (simd_lid == 0) { | ||
| max_scores[simd_gid] = max_score; | ||
| sum_exp_scores[simd_gid] = sum_exp_score; | ||
| } | ||
| threadgroup_barrier(mem_flags::mem_threadgroup); | ||
| max_score = max_scores[simd_lid]; | ||
| U new_max = simd_max(max_score); | ||
| U factor = metal::fast::exp(max_score - new_max); | ||
| sum_exp_score = simd_sum(sum_exp_scores[simd_lid] * factor); | ||
| // Now we need to aggregate all the outputs | ||
| for (uint i = 0; i < v_per_thread; i++) { | ||
| outputs[simd_lid * BD + simd_gid] = o[i]; | ||
| threadgroup_barrier(mem_flags::mem_threadgroup); | ||
| const U safe_sum = (sum_exp_score == 0 ? 1e-6f : sum_exp_score); | ||
| o[i] = simd_sum(outputs[simd_gid * BD + simd_lid] * factor) / safe_sum; | ||
| threadgroup_barrier(mem_flags::mem_threadgroup); | ||
| } | ||
| // And write the output | ||
| if (simd_lid == 0) { | ||
| for (uint i = 0; i < v_per_thread; i++) { | ||
| out[i] = static_cast<T>(o[i]); | ||
| } | ||
| } | ||
| } | ||
| #define INSTANTIATE_SDPA_VECTOR(DTYPE, QK_DIM, VALUE_DIM) \ | ||
| template [[host_name("sdpa_vector_" #DTYPE "_" #QK_DIM \ | ||
| "_" #VALUE_DIM)]] kernel void \ | ||
| sdpa_vector<DTYPE, QK_DIM, VALUE_DIM>( \ | ||
| const device DTYPE* queries [[buffer(0)]], \ | ||
| const device DTYPE* keys [[buffer(1)]], \ | ||
| const device DTYPE* values [[buffer(2)]], \ | ||
| device DTYPE* out [[buffer(3)]], \ | ||
| constant uint& gqa_factor [[buffer(4)]], \ | ||
| constant uint& N [[buffer(5)]], \ | ||
| constant uint3& qkv_head_strides [[buffer(6)]], \ | ||
| constant uint3& qkv_seq_strides [[buffer(7)]], \ | ||
| constant float& scale [[buffer(8)]], \ | ||
| const device DTYPE* mask [[buffer(9)]], \ | ||
| constant uint3& mask_strides [[buffer(10)]], \ | ||
| constant bool& has_mask [[buffer(11)]], \ | ||
| constant uint3& qkv_batch_strides [[buffer(12)]], \ | ||
| constant uint& num_q_heads [[buffer(13)]], \ | ||
| uint3 tid [[threadgroup_position_in_grid]], \ | ||
| uint3 tpg [[threadgroups_per_grid]], \ | ||
| uint simd_gid [[simdgroup_index_in_threadgroup]], \ | ||
| uint simd_lid [[thread_index_in_simdgroup]]); | ||
| #define INSTANTIATE_SDPA_VECTOR_HEADS(DTYPE) \ | ||
| INSTANTIATE_SDPA_VECTOR(DTYPE, 64, 64); \ | ||
| INSTANTIATE_SDPA_VECTOR(DTYPE, 96, 96); \ | ||
| INSTANTIATE_SDPA_VECTOR(DTYPE, 128, 128); | ||
| INSTANTIATE_SDPA_VECTOR_HEADS(float); | ||
| INSTANTIATE_SDPA_VECTOR_HEADS(half); | ||
| INSTANTIATE_SDPA_VECTOR_HEADS(bfloat); | ||
| )"; | ||
| } |
Copilot
AI
Dec 4, 2025
Maintainability concern: The Metal shader source code is embedded as a 200+ line raw string literal within the C++ source file. This makes the shader code harder to maintain because:
- No syntax highlighting for Metal shading language
- Difficult to debug shader compilation errors (line numbers won't match)
- Increases compilation time for the C++ file
- Makes version control diffs harder to read
Consider moving the shader source to a separate .metal file and either:
- Using a build-time tool to embed it as a string constant
- Loading it at runtime from a resource file
- Using Metal's offline shader compilation if available
This is especially important for a 200+ line shader that may need future modifications.
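As one example of the offline route, a prebuilt .metallib could be loaded at runtime instead of compiling the inline string; a rough sketch, with the library path and packaging treated as assumptions:

```objc
// Sketch: load an offline-compiled Metal library rather than JIT-compiling source.
NSError* error = nil;
NSURL* lib_url = [NSURL fileURLWithPath:@"/path/to/sdpa_kernels.metallib"];  // assumed path
id<MTLLibrary> mtl_library = [device newLibraryWithURL:lib_url error:&error];
if (!mtl_library) {
  throw std::runtime_error("Failed to load precompiled SDPA metallib");
}
id<MTLFunction> fn = [mtl_library newFunctionWithName:@"sdpa_vector_float_128_128"];
```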
| // Check mask: for floating point masks, values > -1e9 are considered valid (not masked) | ||
| // Masked positions typically have -inf or very negative values | ||
| const bool is_valid = !has_mask || (static_cast<U>(mask[0]) > -1e9f); | ||
| if (is_valid) { | ||
| // Read the key | ||
| for (uint j = 0; j < qk_per_thread; j++) { | ||
| k[j] = static_cast<U>(keys[j]); | ||
| } | ||
| // Compute the i-th score | ||
| U score = 0; | ||
| for (uint j = 0; j < qk_per_thread; j++) { | ||
| score += q[j] * k[j]; | ||
| } | ||
| score = simd_sum(score); | ||
| // Add mask value to score if mask is present | ||
| if (has_mask) { | ||
| score += static_cast<U>(mask[0]); | ||
| } |
Copilot
AI
Dec 4, 2025
Missing documentation: The mask handling logic is subtle and could benefit from better documentation. The code:
- Checks if mask[0] > -1e9f to determine validity (line 236)
- If valid, adds mask[0] to the score (line 253)
This implements an optimization where extremely negative mask values (like -inf) skip processing entirely, while other mask values (including moderately negative ones) are added to scores. The comment should clarify:
- Why -1e9f is the threshold (vs -inf)
- That this is an optimization to skip computation for masked-out positions
- How this differs from standard additive masking where all mask values are added
This will help future maintainers understand the intended behavior and avoid "fixing" this seemingly redundant check.
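A sketch of how the shader comment might be expanded (wording is a suggestion, not the PR's text):

```metal
// Additive floating-point mask. Positions whose mask value is <= -1e9 are treated as
// fully masked out: their softmax contribution would be negligible, so the key read
// and score computation are skipped entirely as an optimization. The -1e9 threshold
// (rather than a strict -INFINITY check) also catches the large negative fill values
// frameworks commonly use in place of -inf. All other positions use standard
// additive masking: the mask value is simply added to the score.
const bool is_valid = !has_mask || (static_cast<U>(mask[0]) > -1e9f);
```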
| typedef half float16_t; | ||
| typedef bfloat bfloat16_t; |
Copilot
AI
Dec 4, 2025
Missing documentation: The type aliases float16_t and bfloat16_t are defined but only bfloat is actually used in the instantiations (line 338). The half type is used directly rather than float16_t. Consider either:
- Using the aliases consistently (use float16_t instead of half)
- Removing the unused float16_t typedef
- Adding a comment explaining why float16_t is defined but not used
This inconsistency could cause confusion for future maintainers.
| typedef half float16_t; | |
| typedef bfloat bfloat16_t; |
| void* out_contents_ptr = nullptr; | ||
| allocate_mtl_buffer(&out_contents_ptr, out_size_bytes); | ||
| void* attn_contents_ptr = nullptr; | ||
| allocate_mtl_buffer(&attn_contents_ptr, attn_size_bytes); |
Copilot
AI
Dec 4, 2025
Bug: Memory leak on error paths. If allocate_mtl_buffer succeeds for out_contents_ptr (line 1287) but then an exception is thrown before the tensor handle is created (e.g., at line 1299-1300, 1309-1310, 1316-1317, or 1326-1327), the allocated memory is never freed.
The code should free out_contents_ptr if any error occurs after its allocation. Consider adding proper cleanup using RAII (e.g., unique_ptr with custom deleter) or ensure all error paths after line 1287 call aoti_torch_mps_free(out_contents_ptr).
| void* out_contents_ptr = nullptr; | |
| allocate_mtl_buffer(&out_contents_ptr, out_size_bytes); | |
| void* attn_contents_ptr = nullptr; | |
| allocate_mtl_buffer(&attn_contents_ptr, attn_size_bytes); | |
| // Use RAII to manage allocated memory and avoid leaks on error paths | |
| struct MTLBufferDeleter { | |
| void operator()(void* ptr) const { | |
| if (ptr) { | |
| aoti_torch_mps_free(ptr); | |
| } | |
| } | |
| }; | |
| std::unique_ptr<void, MTLBufferDeleter> out_contents_ptr_raii; | |
| void* out_contents_ptr = nullptr; | |
| allocate_mtl_buffer(&out_contents_ptr, out_size_bytes); | |
| out_contents_ptr_raii.reset(out_contents_ptr); | |
| std::unique_ptr<void, MTLBufferDeleter> attn_contents_ptr_raii; | |
| void* attn_contents_ptr = nullptr; | |
| allocate_mtl_buffer(&attn_contents_ptr, attn_size_bytes); | |
| attn_contents_ptr_raii.reset(attn_contents_ptr); |
mergennachin left a comment:
- Are you ignoring "is_causal" altogether? (`int32_t is_causal,`)
- Can we compile the metal shader at build time? Isn't it JIT compiling?
| if (head_dim != 64 && head_dim != 96 && head_dim != 128) { | ||
| ET_LOG(Error, "aoti_torch_mps__scaled_dot_product_attention_math_for_mps: Unsupported head_dim %lld (must be 64, 96, or 128)", head_dim); | ||
| throw std::runtime_error("Unsupported head_dim for Metal SDPA kernel - must be exactly 64, 96, or 128"); | ||
| } |
What's the main reason for only limiting to these head sizes?
| U factor = metal::fast::exp(max_score - new_max); | ||
| U exp_score = metal::fast::exp(score - new_max); |
Nitpick/micro-optimization: Can you save on exponentiation if max_score == score?
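One possible shape of that saving: new_max always equals either max_score or score, so one of the two exp() arguments is zero and that call can be skipped (a sketch only; whether the branch beats the extra exp on GPU would need measuring):

```metal
// Sketch: avoid one exp() per score, since exp(0) == 1 in one branch or the other.
U new_max, factor, exp_score;
if (score > max_score) {
  new_max = score;
  factor = metal::fast::exp(max_score - new_max);
  exp_score = 1.0f;                      // exp(score - new_max) == exp(0)
} else {
  new_max = max_score;
  factor = 1.0f;                         // exp(max_score - new_max) == exp(0)
  exp_score = metal::fast::exp(score - new_max);
}
```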
Replaces the MPSGraph-based SDPA implementation with a custom Metal kernel implementation (adapted from the MLX implementation, with several modifications to support transposed middle dimensions and floating-point attention masks).
Speeds up Voxtral/Whisper by 2-3x.
Fixes the BFloat16 issue on macOS 26.1.