Blockwise mask fn as opt arg in all masking functions#45477
Blockwise mask fn as opt arg in all masking functions#45477zucchini-nlp wants to merge 26 commits intohuggingface:mainfrom
Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
run-slow: gemma3, gemma4, git, paligemma, pi0 |
|
This comment contains models: ["models/gemma3", "models/gemma4", "models/git", "models/paligemma", "models/pi0"] |
CI ResultsCommit Info
Model CI Report❌ 17 new failed tests from this PR 😭
|
|
oh, no, I forgot about conversion. Will wait until merged and trigger test again |
|
run-slow: gemma3, gemma4, git, paligemma, pi0 |
|
This comment contains models: ["models/gemma3", "models/gemma4", "models/git", "models/paligemma", "models/pi0"] |
CI ResultsCommit Info
Model CI Report❌ 6 new failed tests from this PR 😭
|
|
run-slow: gemma3, paligemma, pi0 |
|
This comment contains models: ["models/gemma3", "models/paligemma", "models/pi0"] |
CI ResultsCommit Info
Model CI Report❌ 5 new failed tests from this PR 😭
|
|
run-slow: gemma3, paligemma, pi0 |
1 similar comment
|
run-slow: gemma3, paligemma, pi0 |
|
Workflow Run ⚙️💔 This comment contains |
Cyrilvallez
left a comment
There was a problem hiding this comment.
Nice, I like having this natively!
Just a few things, most particularly I think it's best to pad the mask directly once at the beginning, similar to https://github.com/huggingface/transformers/blob/main/src/transformers/masking_utils.py#L185-L194
|
run-slow: gemma3, gemma4, git, paligemma, pi0 |
|
This comment contains models: ["models/gemma3", "models/gemma4", "models/git", "models/paligemma", "models/pi0"] |
CI ResultsCommit Info
Model CI Report❌ 3 new failed tests from this PR 😭
|
|
Addressed the comments @Cyrilvallez |
|
I think we wait on Cyril for a final pass? I can also take a look if you want but I don't think it's rushing |
Cyrilvallez
left a comment
There was a problem hiding this comment.
I believe we still need to take the offset into account (see comment), but otherwise looks good to me! cc @vasqu here for last check and approval as you checked it as well!
|
[For maintainers] Suggested jobs to run (before merge) run-slow: gemma3, gemma4, git, paligemma, pi0 |
|
final pass with slow CI on rebased branch |
|
run-slow: gemma3, gemma4, git, paligemma, pi0 |
|
This comment contains models: ["models/gemma3", "models/gemma4", "models/git", "models/paligemma", "models/pi0"] |
|
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45477&sha=c4c5ce |
|
run-slow: gemma3, gemma4, git, paligemma, pi0 |
|
This comment contains models: ["models/gemma3", "models/gemma4", "models/git", "models/paligemma", "models/pi0"] |
CI ResultsCommit Info
Model CI Report❌ 3 new failed tests from this PR 😭
|
What does this PR do?
As per title, I think this pattern is used quite often and deserves to be a public mask-fn. Used currently in gemma/paligemma family, GIT, PI0 and will be used in two upcoming models (deepseekOcr and Molmo2)
This PR allows all these models ton go
non-vmappath, which iirc is more preferable for us.I opted for adding 'blockwise_mask' as an arg in existing mask fn, it doesn't seems like a new mask type of itself. We might have to create otherwise -
create_blockwise_causal_mask,create_blockwise_sliding_causal_mask, etc.