
[FlexAttn] Fix models with unique characteristics#38433

Merged
vasqu merged 7 commits into main from vas-flexattn-test-remaining-changes
Jun 4, 2025
Conversation

@vasqu
Contributor

@vasqu vasqu commented May 28, 2025

For context, flex attention cannot work with dimensions smaller than 16; hence, the test config is manipulated to ensure the test works. Before this change, most models failed the test, including Llama.
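The constraint above can be illustrated with a small sketch. The helper below is hypothetical (not the actual test code) and assumes the limit applies to the per-head dimension, which is why bumping `hidden_size` in the test config fixes it:

```python
MIN_HEAD_DIM = 16  # flex attention kernels reject head dims smaller than this


def adjusted_hidden_size(hidden_size: int, num_attention_heads: int) -> int:
    """Return a hidden size whose per-head dimension is at least MIN_HEAD_DIM."""
    head_dim = hidden_size // num_attention_heads
    if head_dim >= MIN_HEAD_DIM:
        return hidden_size  # already large enough, leave the config alone
    return MIN_HEAD_DIM * num_attention_heads


# A typical tiny test config: hidden_size=32 with 4 heads gives head_dim=8,
# which is too small, so the test would scale it up to 64 (head_dim=16).
print(adjusted_hidden_size(32, 4))  # -> 64
```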

Some models, such as Idefics 2+3 and SmolVLM, do not have the `_is_composite` flag, and since I do not want to affect other tests, I added a new condition to skip the test. They may have passed before, but it's not future-proof. For Zamba2, I overrode the test since some other dims don't add up when changing `hidden_size`.

There are other options:

  • Rewrite the test to handle sub-configs --> I tried that, but there are so many edge cases and weird configs that something breaks one way or another.
  • Adjust the dimensions in all models and avoid the hidden dim manipulation in the first place. Not sure if this is a good idea, as it would strain the tests even more imo 👀

Edit: #38434 took care of the composite models. This PR is left to fix some of the more unique models such as zamba2 and deepseek3.

@vasqu vasqu requested review from ArthurZucker and ydshieh May 28, 2025 10:11
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@vasqu
Contributor Author

vasqu commented May 28, 2025

Maybe cc @zucchini-nlp (composite configs / models on vlms)

@vasqu
Contributor Author

vasqu commented May 28, 2025

#38434 seems to handle the vlms - will probably close this / only keep it for zamba2

@zucchini-nlp
Member

Good timing 😄

Collaborator

@ArthurZucker ArthurZucker left a comment


`_is_composite` is no longer needed / used, but `hasattr(config, "sub_configs")` would be equivalent!

@vasqu
Contributor Author

vasqu commented May 28, 2025

I guess I unintentionally did that with `len(config.sub_configs) > 1` lol
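The two composite-model checks discussed here can be sketched as follows. `DummyConfig` is a stand-in for illustration only, not the real transformers `PretrainedConfig`:

```python
class DummyConfig:
    """Minimal stand-in for a config object carrying sub-configs."""

    def __init__(self, sub_configs=None):
        # composite (multimodal) configs carry more than one sub-config
        self.sub_configs = sub_configs or {}


def is_composite(config) -> bool:
    # Combines both checks from the discussion: the attribute exists
    # and there is more than one sub-config.
    return hasattr(config, "sub_configs") and len(config.sub_configs) > 1


text_only = DummyConfig({"text_config": object()})
vlm = DummyConfig({"text_config": object(), "vision_config": object()})
print(is_composite(text_only), is_composite(vlm))  # -> False True
```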

Should I merge this temporarily, or should we wait for #38434? Raushan's PR is trying to tackle the overall test to be composite compatible.

@ydshieh
Collaborator

ydshieh commented Jun 3, 2025

RUN_SLOW=1 python3 -m pytest -v tests/models/deepseek_v3/test_modeling_deepseek_v3.py::DeepseekV3ModelTest::test_flex_attention_with_grads

gives

ValueError: NYI: Currently non power of 2 embedding dimension are not supported. Got E=48 and Ev=32.

If you want to work on this too :-)

cc @zucchini-nlp too
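The error message suggests flex attention currently supports only power-of-two embedding dimensions. A hypothetical workaround (not necessarily what this PR does) would pad the offending dim up to the next power of two:

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two >= n (requires n > 0)."""
    return 1 << (n - 1).bit_length()


# The failing DeepseekV3 test has E=48 (query/key dim) and Ev=32 (value dim);
# 32 already qualifies, while 48 would need padding up to 64.
print(next_power_of_two(48))  # -> 64
print(next_power_of_two(32))  # -> 32
```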

@zucchini-nlp
Member

Btw @vasqu , my PR is merged so should be good to merge this one as well :)

@vasqu
Contributor Author

vasqu commented Jun 3, 2025

Nice, I'll take a look (maybe tomorrow, not sure) :D the deepseek issue looks like another hidden emb size issue 👀

I guess I don't need to add the skip condition then?

@ydshieh
Collaborator

ydshieh commented Jun 3, 2025

no need to skip

@vasqu vasqu changed the title from "[FlexAttn] Skip models with multiple configs (composite models)" to "[FlexAttn] Fix models with unique characteristics" Jun 4, 2025
@vasqu
Contributor Author

vasqu commented Jun 4, 2025

@ydshieh @zucchini-nlp If you want to take another look. The PR changed basically to fixing some more unique models which would be edge cases in the general test.

Member

@zucchini-nlp zucchini-nlp left a comment


LGTM, thanks

@vasqu vasqu merged commit 1dc619e into main Jun 4, 2025
15 checks passed
@vasqu vasqu deleted the vas-flexattn-test-remaining-changes branch June 4, 2025 11:37
bvantuan pushed a commit to bvantuan/transformers that referenced this pull request Jun 12, 2025
* fix

* style

* check

* check 2

* add deepseek workaround
