[MPS] Fix SDPA output shape when value head dim differs #176843

Closed

hvaara wants to merge 1 commit into pytorch:main from hvaara:mps-sdpa-ev-shape-fix
Conversation

hvaara (Contributor) commented Mar 8, 2026

This fixes the MPS SDPA output shape for cases where `value.size(-1) != query.size(-1)`, so the output now follows `(..., L, Ev)` as expected. I also added guards in the Metal kernel paths that assume equal Q/K/V head dims.

Also updated the meta shape inference for the `sdpa_general_mps` path, which seems to have been left out initially.

Added regression coverage in `test/test_transformers.py` for the shape semantics, and a similar test in `test/test_mps.py` that also checks numerical parity with CPU.

Fixes #176767
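
A minimal repro of the fixed shape semantics (a sketch with illustrative sizes, assuming an MPS device is available). Per the SDPA contract, the output shape follows the value head dim, `(..., L, Ev)`; before this fix, the MPS path returned a wrongly shaped output here:

```python
import torch
import torch.nn.functional as F

# Q/K share head dim E; V has a different head dim Ev (illustrative sizes).
B, H, L, S, E, Ev = 2, 4, 8, 8, 64, 32
q = torch.randn(B, H, L, E, device="mps")
k = torch.randn(B, H, S, E, device="mps")
v = torch.randn(B, H, S, Ev, device="mps")

out = F.scaled_dot_product_attention(q, k, v)
# The documented output shape follows the value head dim: (..., L, Ev).
assert out.shape == (B, H, L, Ev)
```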

hvaara requested a review from malfet as a code owner March 8, 2026 20:22
pytorch-bot (Bot) commented Mar 8, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/176843

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit cf60249 with merge base 7643509:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

malfet (Contributor) commented Mar 9, 2026

@pytorchbot merge -f "Lint + MPS is green"

pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here.

hvaara deleted the mps-sdpa-ev-shape-fix branch March 9, 2026 21:55
pytorchmergebot pushed a commit that referenced this pull request Mar 17, 2026
Fix proposed by @mergennachin in #177603. The issue was introduced in #176843.

Remove data-dependent branching in the MPS SDPA meta kernel so export supports dynamic sequence lengths.

Update meta-dispatch test to compare only the first output and add an export regression test.

@angelayi, you wrote the original meta registration and tests in #159695. Does this LGTY?

Fixes #177603
Pull Request resolved: #177620
Approved by: https://github.com/malfet
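
For context, a minimal sketch of the constraint at play (the helper below is hypothetical, not the actual PyTorch meta registration): under torch.export, a meta kernel computes output shapes symbolically, so branching on a dimension like seq_len guards on a symbolic value and breaks dynamic shapes. The SDPA shape rule needs no branch at all:

```python
import torch

# Hypothetical meta shape helper for SDPA, shown for illustration only.
# The output shape (..., L, Ev) is derived purely from the input sizes;
# an `if seq_len <= 8:` style branch would compare against a symbolic
# dimension and fail under torch.export with dynamic sequence lengths.
def sdpa_meta_shape(query: torch.Tensor, value: torch.Tensor) -> torch.Size:
    return torch.Size((*query.shape[:-1], value.shape[-1]))
```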
mergennachin added a commit that referenced this pull request Mar 17, 2026
Summary

Add float mask test coverage to `test_sdpa_export_dynamic_seq_len`, complementing the fix in #177620.

The original regression (#177603, introduced by #176843) was triggered in practice by float attention masks: Metal SDPA requires the mask dtype to match the Q/K/V dtype, so real models use float masks, not bool. The existing test only covered bool masks. This also verifies behavior across the <= 8 / > 8 seq_len boundary that was the branching condition in the buggy meta kernel.

Test plan

- `python test/test_mps.py TestSDPA.test_sdpa_export_dynamic_seq_len`
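
A sketch of what this coverage exercises (shapes and sizes are illustrative, not the exact test): an additive float mask whose dtype matches Q/K/V, run on both sides of the seq_len boundary described above:

```python
import torch
import torch.nn.functional as F

# Float additive mask (0.0 = attend), matching the Q/K/V dtype as Metal
# SDPA requires. The seq_len values straddle the <= 8 / > 8 boundary that
# the buggy meta kernel branched on.
for seq_len in (8, 9):
    q = torch.randn(1, 2, seq_len, 16)
    k = torch.randn(1, 2, seq_len, 16)
    v = torch.randn(1, 2, seq_len, 16)
    mask = torch.zeros(1, 1, seq_len, seq_len, dtype=q.dtype)
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
    assert out.shape == (1, 2, seq_len, 16)
```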
Jah-yee pushed a commit to Jah-yee/transformers-jy that referenced this pull request Apr 16, 2026

Workaround for PyTorch < 2.12 bug (pytorch/pytorch#176767, pytorch/pytorch#176843)
where scaled_dot_product_attention on MPS produces incorrect output when
value head dim != query head dim.

DeepSeek models (MLA) are affected, as they use different Q/K and V head dims. The fix pads v to match q's head dim before SDPA, then truncates the output back to the original v size.

Fixes huggingface#44554
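
A minimal sketch of the described workaround (the wrapper name is hypothetical, not the actual transformers code; it assumes the value head dim is no larger than the query head dim, as in the DeepSeek case):

```python
import torch
import torch.nn.functional as F

def sdpa_with_padded_value(q, k, v, **kwargs):
    # Zero-pad v's head dim up to q's, run SDPA, then truncate the output
    # back to v's original head dim. Since attention is softmax(QK^T) @ V,
    # zero columns in V yield zero columns in the output, so truncating
    # recovers the exact result.
    ev = v.size(-1)
    if ev < q.size(-1):
        v = F.pad(v, (0, q.size(-1) - ev))
    out = F.scaled_dot_product_attention(q, k, v, **kwargs)
    return out[..., :ev]
```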

Labels

ciflow/mps (run MPS tests, subset of trunk), Merged, open source, release notes: mps, topic: bug fixes


Development

Successfully merging this pull request may close these issues.

MPS: scaled_dot_product_attention returns wrong output shape when value dim != query/key dim

4 participants