[FlexAttn] Fix models with unique characteristics#38433
|
Maybe cc @zucchini-nlp (composite configs / models on vlms) |
|
#38434 seems to handle the vlms - will probably close this / only keep it for zamba2 |
|
Good timing 😄 |
ArthurZucker
left a comment
_is_composite is no longer needed / used, but hasattr(config, "subconfigs") would be equivalent!
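For illustration, a minimal sketch of that kind of check on stand-in objects. Both classes below are hypothetical and are not transformers classes. The attribute name follows the spelling in the comment above, and the real attribute on the library's config classes may be spelled differently. Checking for a non-empty value rather than plain hasattr is an assumption, made so that an empty mapping does not count as composite.

```python
from dataclasses import dataclass, field


@dataclass
class TextConfig:
    """Hypothetical single-model config (stand-in, not a transformers class)."""
    hidden_size: int = 64


@dataclass
class CompositeConfig:
    """Hypothetical composite config that nests other configs."""
    # Attribute name copied from the review comment; assumed, not verified.
    subconfigs: dict = field(default_factory=lambda: {"text_config": TextConfig})


def is_composite(config) -> bool:
    # Composite means: the config exposes a non-empty `subconfigs` mapping.
    return bool(getattr(config, "subconfigs", None))
```

A test could then call something like is_composite(config) in its skip condition instead of reading a dedicated flag.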
|
I guess I unintentionally did that with [...]. Should I merge this temporarily, or should we wait for #38434? Raushan's PR is trying to make the overall test compatible with composite models. |
gives
If you want to work on this too :-) cc @zucchini-nlp too |
|
Btw @vasqu , my PR is merged so should be good to merge this one as well :) |
|
Nice, I'll take a look (maybe tomorrow, not sure) :D the deepseek issue looks like another hidden emb size issue 👀 Ig, I don't need to add the skip condition then? |
|
no need to skip |
Title changed from "[FlexAttn] Skip models with multiple configs (composite models)" to "[FlexAttn] Fix models with unique characteristics"
|
@ydshieh @zucchini-nlp If you want to take another look. The PR changed basically to fixing some more unique models which would be edge cases in the general test. |
* fix
* style
* check
* check 2
* add deepseek workaround
For context, flex attention cannot work with dimensions less than 16; hence, the config was manipulated to ensure the test works. Before, most models failed, including llama.
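A rough sketch of what such a config adjustment could look like. It assumes the limit of 16 applies to the per-head dimension (hidden_size // num_attention_heads). TinyConfig and ensure_min_head_dim are hypothetical names for illustration, not the helper used in this PR.

```python
from dataclasses import dataclass

MIN_FLEX_HEAD_DIM = 16  # lower bound mentioned in the PR description


@dataclass
class TinyConfig:
    """Hypothetical test config; field names mirror common model configs."""
    hidden_size: int = 32
    num_attention_heads: int = 4


def ensure_min_head_dim(config: TinyConfig, min_head_dim: int = MIN_FLEX_HEAD_DIM) -> TinyConfig:
    # Assumption: the constrained dimension is the per-head dimension.
    head_dim = config.hidden_size // config.num_attention_heads
    if head_dim < min_head_dim:
        # Grow hidden_size so that every head gets exactly min_head_dim channels.
        config.hidden_size = min_head_dim * config.num_attention_heads
    return config
```

With the defaults above, the head dimension is 8, so hidden_size would be raised from 32 to 64. Models whose other dimensions are derived from hidden_size would need more than this single change.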
There are some models, such as idefics 2+3 and smolvlm, which do not have the _is_composite flag. As I do not want to affect other tests, I added a new condition to skip the test. They may have passed before, but that's not future-proof. For Zamba2, I overwrote the test since some other dims don't add up when changing hidden_size.
There are other options:
Edit: #38434 took care of the composite models. This PR is left to fix some of the more unique models such as zamba2 and deepseek3.