Use custom SDPA for decoder-only HF Transformers #46
guangy10 merged 3 commits into huggingface:main
Conversation
Will gate access to custom_sdpa in the 0.4.0 release.
@larryliu0820 Updated: The issue is that the eager attention implementation was using
Force-pushed from 8e9e3c2 to 1b2cb42
Transformers version bump has been merged in #47.
Force-pushed from 1b2cb42 to cadd829
cc: @larryliu0820 @kimishpatel for review
Can you change this: https://github.com/huggingface/optimum-executorch/blob/main/optimum/executorch/modeling.py#L181 to use the new Python API: https://pytorch.org/executorch/stable/index.html
@larryliu0820 Yeah, I'm going to do it in a separate PR. |
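For reference, a minimal sketch of what moving to the new ExecuTorch Python runtime API could look like; the `.pte` path, method name, and example input below are illustrative, not taken from this PR:

```python
import torch
from executorch.runtime import Runtime

# Get the runtime singleton and load the exported ExecuTorch program.
runtime = Runtime.get()
program = runtime.load_program("model.pte")  # illustrative path

# Load the entry point and execute it on an example input.
method = program.load_method("forward")
outputs = method.execute([torch.randint(0, 100, (1, 8), dtype=torch.long)])
print(outputs)
```

This Runtime/Program/Method flow would presumably replace the lower-level pybindings call currently at modeling.py#L181.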
Force-pushed from 754dd57 to e8f5263
Force-pushed from e8f5263 to eb2c840
Rebased and fixed conflicts |
@larryliu0820 @kimishpatel good to merge? |
Force-pushed from 14a6bbd to aab448f

Support export with custom_sdpa using the AttentionInterface. This requires transformers >= 4.51.0 in order to use the AttentionInterface, which has been addressed in Bump Transformers version #47. optimum-cli export executorch supports custom SDPA. 3x speedup using custom SDPA for HF smollm2 (XNNPACK fp32).
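A rough sketch of how a custom SDPA kernel plugs in through the Transformers AttentionInterface; the function name and model id below are illustrative, and plain torch SDPA stands in for the ExecuTorch custom_sdpa op:

```python
import torch
from transformers import AttentionInterface, AutoModelForCausalLM

def my_sdpa_attention(module, query, key, value, attention_mask,
                      scaling=None, dropout=0.0, **kwargs):
    # query/key/value arrive as (batch, num_heads, seq_len, head_dim).
    # Handle grouped-query attention by repeating KV heads if needed.
    num_kv_groups = query.shape[1] // key.shape[1]
    if num_kv_groups > 1:
        key = key.repeat_interleave(num_kv_groups, dim=1)
        value = value.repeat_interleave(num_kv_groups, dim=1)

    # Plain torch SDPA stands in here for the ExecuTorch custom_sdpa op.
    attn_output = torch.nn.functional.scaled_dot_product_attention(
        query, key, value, attn_mask=attention_mask, dropout_p=dropout, scale=scaling
    )
    # Transformers expects (batch, seq_len, num_heads, head_dim) back.
    attn_output = attn_output.transpose(1, 2).contiguous()
    return attn_output, None

# Register under a name, then select it via attn_implementation.
AttentionInterface.register("my_custom_sdpa", my_sdpa_attention)
model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-135M",  # illustrative model id
    attn_implementation="my_custom_sdpa",
)
```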

Generally applicable to all causal LMs. For encoder-decoder models, it may apply to the self-attention layers in the decoder; we can experiment with that in a follow-up PR.
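For completeness, a hedged usage sketch with the optimum-executorch Python API; the model id, prompt, and max_seq_len are illustrative, the recipe string follows the repo's XNNPACK examples, and the exact way custom SDPA is enabled may differ once the 0.4.0 gating mentioned above lands:

```python
from transformers import AutoTokenizer
from optimum.executorch import ExecuTorchModelForCausalLM

model_id = "HuggingFaceTB/SmolLM2-135M"  # illustrative model id

# Export on the fly (or load a cached export) with the XNNPACK recipe.
model = ExecuTorchModelForCausalLM.from_pretrained(model_id, recipe="xnnpack")

tokenizer = AutoTokenizer.from_pretrained(model_id)
generated = model.text_generation(
    tokenizer=tokenizer,
    prompt="Simply put, custom SDPA",
    max_seq_len=64,
)
print(generated)
```

The same export can be produced ahead of time via `optimum-cli export executorch` as mentioned in the description.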