
[FlexAttn] Fix models with unique characteristics#38433

Merged
vasqu merged 7 commits into main from vas-flexattn-test-remaining-changes
Jun 4, 2025
Conversation

@vasqu
Contributor

@vasqu vasqu commented May 28, 2025

For context, flex attention cannot work with dimensions smaller than 16; hence, the test config is manipulated to ensure the test works. Before this change, most models failed the test, including Llama.
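The constraint above can be illustrated with a small sketch. The helper below is hypothetical (not the actual test code) and assumes the limit applies to the per-head dimension, which is why bumping `hidden_size` in the test config fixes it:

```python
MIN_HEAD_DIM = 16  # flex attention kernels reject head dims smaller than this


def adjusted_hidden_size(hidden_size: int, num_attention_heads: int) -> int:
    """Return a hidden size whose per-head dimension is at least MIN_HEAD_DIM."""
    head_dim = hidden_size // num_attention_heads
    if head_dim >= MIN_HEAD_DIM:
        return hidden_size  # already large enough, leave the config alone
    return MIN_HEAD_DIM * num_attention_heads


# A typical tiny test config: hidden_size=32 with 4 heads gives head_dim=8,
# which is too small, so the test would scale it up to 64 (head_dim=16).
print(adjusted_hidden_size(32, 4))  # -> 64
```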

Some models, such as Idefics 2+3 and SmolVLM, do not have the `_is_composite` flag, and since I do not want to affect other tests, I added a new condition to skip the test. They may have passed before, but it's not future-proof. For Zamba2, I overrode the test since some other dims don't add up when changing `hidden_size`.

There are other options:

  • Rewrite the test to handle sub-configs --> I tried that, but there are so many edge cases and weird configs that something breaks one way or another.
  • Adjust the dimensions in all models and avoid the hidden dim manipulation in the first place. Not sure if this is a good idea, as it would strain the tests even more imo 👀

Edit: #38434 took care of the composite models. This PR is left to fix some of the more unique models such as zamba2 and deepseek3.

@vasqu vasqu requested review from ArthurZucker and ydshieh May 28, 2025 10:11
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@vasqu
Contributor Author

vasqu commented May 28, 2025

Maybe cc @zucchini-nlp (composite configs / models on vlms)

@vasqu
Contributor Author

vasqu commented May 28, 2025

#38434 seems to handle the vlms - will probably close this / only keep it for zamba2

@zucchini-nlp
Member

Good timing 😄

Collaborator

@ArthurZucker ArthurZucker left a comment


`_is_composite` is no longer needed / used, but `hasattr(config, "sub_configs")` would be equivalent!

@vasqu
Contributor Author

vasqu commented May 28, 2025

I guess I unintentionally did that with `len(config.sub_configs) > 1` lol
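The two composite-model checks discussed here can be sketched as follows. `DummyConfig` is a stand-in for illustration only, not the real transformers `PretrainedConfig`:

```python
class DummyConfig:
    """Minimal stand-in for a config object carrying sub-configs."""

    def __init__(self, sub_configs=None):
        # composite (multimodal) configs carry more than one sub-config
        self.sub_configs = sub_configs or {}


def is_composite(config) -> bool:
    # Combines both checks from the discussion: the attribute exists
    # and there is more than one sub-config.
    return hasattr(config, "sub_configs") and len(config.sub_configs) > 1


text_only = DummyConfig({"text_config": object()})
vlm = DummyConfig({"text_config": object(), "vision_config": object()})
print(is_composite(text_only), is_composite(vlm))  # -> False True
```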

Should I merge this temporarily, or should we wait for #38434? Raushan's PR is trying to tackle the overall test to be composite compatible.

@ydshieh
Collaborator

ydshieh commented Jun 3, 2025

RUN_SLOW=1 python3 -m pytest -v tests/models/deepseek_v3/test_modeling_deepseek_v3.py::DeepseekV3ModelTest::test_flex_attention_with_grads

gives

ValueError: NYI: Currently non power of 2 embedding dimension are not supported. Got E=48 and Ev=32.

If you want to work on this too :-)

cc @zucchini-nlp too
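The error message suggests flex attention currently supports only power-of-two embedding dimensions. A hypothetical workaround (not necessarily what this PR does) would pad the offending dim up to the next power of two:

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two >= n (requires n > 0)."""
    return 1 << (n - 1).bit_length()


# The failing DeepseekV3 test has E=48 (query/key dim) and Ev=32 (value dim);
# 32 already qualifies, while 48 would need padding up to 64.
print(next_power_of_two(48))  # -> 64
print(next_power_of_two(32))  # -> 32
```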

@zucchini-nlp
Member

Btw @vasqu , my PR is merged so should be good to merge this one as well :)

@vasqu
Contributor Author

vasqu commented Jun 3, 2025

Nice, I'll take a look (maybe tomorrow, not sure) :D the deepseek issue looks like another hidden emb size issue 👀

I guess I don't need to add the skip condition then?

@ydshieh
Collaborator

ydshieh commented Jun 3, 2025

no need to skip

@vasqu vasqu changed the title from "[FlexAttn] Skip models with multiple configs (composite models)" to "[FlexAttn] Fix models with unique characteristics" Jun 4, 2025
@vasqu
Contributor Author

vasqu commented Jun 4, 2025

@ydshieh @zucchini-nlp If you want to take another look. The PR changed basically to fixing some more unique models which would be edge cases in the general test.

Member

@zucchini-nlp zucchini-nlp left a comment


LGTM, thanks

@vasqu vasqu merged commit 1dc619e into main Jun 4, 2025
15 checks passed
@vasqu vasqu deleted the vas-flexattn-test-remaining-changes branch June 4, 2025 11:37
bvantuan pushed a commit to bvantuan/transformers that referenced this pull request Jun 12, 2025
* fix

* style

* check

* check 2

* add deepseek workaround
