Add packed tensor format support for flex/sdpa/eager through the mask! #39194
Cyrilvallez merged 16 commits into main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
ArthurZucker left a comment
Very nice, just missing a test 😉
```python
# Packed format is always on batch of size 1 so we can early exit if not the case
if not position_ids.shape[0] == 1:
```
There really shouldn't be a restriction on this. It should also work with 2D packed tensors.

Hmm, alright, I can lift it easily; I thought it was always packed with all sequences along a batch of size 1.
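For illustration, a minimal sketch of how the batch-size-1 restriction can be lifted, assuming packed rows are signalled by position_ids resetting mid-row; the helper name is hypothetical, not the PR's actual code:

```python
import torch

def contains_packed_sequence(position_ids: torch.Tensor) -> bool:
    """A row contains packed sequences if its position ids are not a single
    monotonically increasing range, i.e. they reset somewhere mid-row.
    Works for any batch size, not only 1."""
    resets = position_ids[:, 1:] < position_ids[:, :-1]
    return bool(resets.any())

# Batch of 2: the second row packs two sequences of lengths 3 and 2.
position_ids = torch.tensor([
    [0, 1, 2, 3, 4],
    [0, 1, 2, 0, 1],
])
print(contains_packed_sequence(position_ids))  # True
```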
[For maintainers] Suggested jobs to run (before merge): run-slow: arcee, aria, bitnet, cohere, cohere2, csm, deepseek_v3, dia, diffllama, dots1, emu3, gemma, gemma2, gemma3, gemma3n, glm
winglian left a comment
Perfect! This fixes the regression, and flex + packing works again for us now.
Add packed tensor format support for flex/sdpa/eager through the mask (#39194)

* Add the necessary logic to mask_utils
* add it everywhere
* Update masking_utils.py
* style
* Update masking_utils.py
* Update modeling_mimi.py
* Update masking_utils.py
* add support for more than batch size 1
* Update masking_utils.py
* add test
* style
* Update test_masking_utils.py
* Update masking_utils.py
* add require_token
* fix tests
* fix
Hey @Cyrilvallez, the docstring of create_mask_for_generate suggests that position_ids is optional, but it's actually a required argument, breaking existing code that calls this function. Is it intentional that it's required, or should it have a default of None?
We use create_mask_for_generate from transformers. It was introduced in v4.53.0, but in v4.53.1 the function signature was changed to make position_ids a mandatory argument: huggingface/transformers#39194. This breaks our function call in PEFT. This PR fixes the call by passing position_ids. That in turn would break with transformers v4.53.0, so a strict version check for >= v4.53.1 is used. Moreover, the check has been moved inside the if-branch that actually needs it instead of being performed at the start of the function, so no error is raised if we never visit this branch.
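For context, a minimal sketch of the version-gated call described above; this is not PEFT's actual code, and the import path and surrounding argument names are assumptions:

```python
from packaging import version

import transformers


def build_generate_mask(config, input_embeds, attention_mask, cache_position,
                        past_key_values, position_ids):
    # Check and import only inside the branch that needs the new API, so no
    # error is raised on code paths that never build this mask.
    if version.parse(transformers.__version__) < version.parse("4.53.1"):
        raise ImportError("position_ids requires transformers >= v4.53.1 (see #39194)")
    # Import path assumed; the discussion refers to the function by this name.
    from transformers.masking_utils import create_mask_for_generate

    # Since v4.53.1, position_ids is a mandatory argument.
    return create_mask_for_generate(
        config, input_embeds, attention_mask, cache_position,
        past_key_values, position_ids,
    )
```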
We use create_mask_for_generate from transformers. It was introduced in v4.53.0 but in v4.53.1, the function signature was changed to include position_ids as mandatory argument: huggingface/transformers#39194 This breaks our function call in PEFT. This PR fixes the function call by passing position_ids. This in turn would break the function call with transformers v4.53.0, thus a strict version check is being used for >= v4.53.1.
Hi, I would like to ask whether you could implement the attention_mask with the pattern [1,1,2,2,2,3,3,3] and support packed tensors with FlashAttention for scenarios requiring a sparse mask. This approach would let us handle variable-length attention via a universal method (2D attention mask / position IDs). Additionally, support could be extended to 4D masks for more complex cases, building on SDPA (scaled dot-product attention) and eager attention.
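For what it's worth, a sketch of how such a segment-id attention_mask could be expanded into a block-diagonal causal 4D mask for SDPA/eager; this only illustrates the idea and is not the library's implementation:

```python
import torch

def segment_ids_to_causal_mask(segment_ids: torch.Tensor) -> torch.Tensor:
    """Expand a [batch, seq] tensor of segment ids such as [1,1,2,2,2,3,3,3]
    into a [batch, 1, seq, seq] boolean mask that is block-diagonal across
    segments and causal within each segment (True = may attend)."""
    same_segment = segment_ids[:, :, None] == segment_ids[:, None, :]  # [b, q, k]
    seq_len = segment_ids.shape[-1]
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                   device=segment_ids.device))
    return (same_segment & causal)[:, None]  # add the head dimension

mask = segment_ids_to_causal_mask(torch.tensor([[1, 1, 2, 2, 2, 3, 3, 3]]))
print(mask.shape)  # torch.Size([1, 1, 8, 8])
```

A boolean mask of this shape can be passed as attn_mask to torch.nn.functional.scaled_dot_product_attention, where True marks positions that may attend.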
Hey @BenjaminBossan! It's optional in the sense that it can be None, but indeed I did not provide a default of None, to force the models to pass the argument so the packed format is always allowed (same as past_key_values).
@Snowdar for your usage of FA2, you should not pass any mask but forward the seqlens directly 🤗
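A sketch of what forwarding the seqlens can look like; deriving cu_seqlens from a segment-id layout is the illustrative part, and the commented call assumes flash-attn is installed:

```python
import torch

# Derive cumulative sequence lengths from a segment-id layout such as
# [1,1,2,2,2,3,3,3]; FA2's varlen kernels consume these instead of a mask.
segment_ids = torch.tensor([1, 1, 2, 2, 2, 3, 3, 3])
lengths = torch.unique_consecutive(segment_ids, return_counts=True)[1]
cu_seqlens = torch.nn.functional.pad(lengths.cumsum(0), (1, 0)).to(torch.int32)
print(cu_seqlens)  # tensor([0, 2, 5, 8], dtype=torch.int32)

# With flash-attn installed, q/k/v of shape [total_tokens, n_heads, head_dim]
# can then be passed along these boundaries (illustrative call):
# from flash_attn import flash_attn_varlen_func
# out = flash_attn_varlen_func(q, k, v, cu_seqlens, cu_seqlens,
#                              max_seqlen_q=int(lengths.max()),
#                              max_seqlen_k=int(lengths.max()),
#                              causal=True)
```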
Thanks for explaining. I think in this case it would have been better to provide a default, given that the signature was changed in a backwards-incompatible way and the change shipped in a patch release, where the expectation as a user is that I can always upgrade without fear of breakage. I'm not sure if this function is considered "private", but even so, providing a default when there is a reasonable one would have been better. Now that the patch release is out, it's too late, so I don't have a strong opinion either way.
Yep, I think we need to patch again to add a default @Cyrilvallez
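The follow-up patch would presumably be little more than adding the default; a sketch, with the other parameters guessed from the discussion rather than taken from the source:

```python
from typing import Optional

import torch

def create_mask_for_generate(
    config,
    input_embeds,
    attention_mask,
    cache_position,
    past_key_values,
    position_ids: Optional[torch.Tensor] = None,  # default restores v4.53.0 call sites
):
    ...
```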
What does this PR do?
As per the title.
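Concretely, the packed format this enables looks roughly like the following from the caller's side; the model id and token ids are illustrative only:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B",
                                             attn_implementation="sdpa")

# Two sequences of lengths 3 and 2 packed into a single row of 5 tokens;
# position_ids resetting to 0 marks where the second sequence starts.
input_ids = torch.tensor([[101, 102, 103, 201, 202]])
position_ids = torch.tensor([[0, 1, 2, 0, 1]])

# With this PR, the mask built for sdpa/eager/flex keeps the two packed
# sequences from attending to each other instead of treating the row as
# one long causal sequence.
out = model(input_ids=input_ids, position_ids=position_ids)
```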