Blockwise mask fn as opt arg in all masking functions#45477

Open
zucchini-nlp wants to merge 26 commits into huggingface:main from zucchini-nlp:split-out-gemma-style-mask

@zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Apr 16, 2026

What does this PR do?

As per the title: this pattern is used quite often, and I think it deserves to be a public mask function. It is currently used in the gemma/paligemma family, GIT, and PI0, and will be used in two upcoming models (deepseekOcr and Molmo2).

This PR allows all these models to take the non-vmap path, which, IIRC, is preferable for us.

I opted for adding 'blockwise_mask' as an argument to the existing mask functions, since it doesn't seem like a new mask type in itself. Otherwise we might have to create separate functions: create_blockwise_causal_mask, create_blockwise_sliding_causal_mask, etc.
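
For readers unfamiliar with the pattern, here is a minimal pure-Python sketch of what a blockwise mask combined with a causal mask does. It is an illustration under stated assumptions, not the code in this PR: the helper names (`causal_mask_fn`, `make_blockwise_mask_fn`, `or_masks`, `build_mask`) and the `block_ids` layout are hypothetical, and only the `(batch_idx, head_idx, q_idx, kv_idx) -> bool` calling convention is meant to mirror the mask functions in `masking_utils.py`.

```python
from typing import Callable, List

# Assumed calling convention: (batch_idx, head_idx, q_idx, kv_idx) -> bool
MaskFn = Callable[[int, int, int, int], bool]


def causal_mask_fn(batch_idx: int, head_idx: int, q_idx: int, kv_idx: int) -> bool:
    """A query may attend to itself and to earlier positions only."""
    return kv_idx <= q_idx


def make_blockwise_mask_fn(block_ids: List[List[int]]) -> MaskFn:
    """Hypothetical helper: block_ids[b][i] is the block id of token i in
    batch item b, or -1 if the token belongs to no block (e.g. plain text)."""

    def inner(batch_idx: int, head_idx: int, q_idx: int, kv_idx: int) -> bool:
        q_block = block_ids[batch_idx][q_idx]
        kv_block = block_ids[batch_idx][kv_idx]
        # Tokens in the same block see each other in both directions.
        return q_block >= 0 and q_block == kv_block

    return inner


def or_masks(*mask_fns: MaskFn) -> MaskFn:
    """Attention is allowed if any of the given mask functions allows it."""

    def inner(batch_idx: int, head_idx: int, q_idx: int, kv_idx: int) -> bool:
        return any(fn(batch_idx, head_idx, q_idx, kv_idx) for fn in mask_fns)

    return inner


def build_mask(mask_fn: MaskFn, batch_size: int, seq_len: int) -> List[List[List[bool]]]:
    """Materialize a [batch, q, kv] boolean mask with plain loops (no vmap)."""
    return [
        [[mask_fn(b, 0, q, kv) for kv in range(seq_len)] for q in range(seq_len)]
        for b in range(batch_size)
    ]


# One sequence: text, three tokens of one block (e.g. an image), text.
block_ids = [[-1, 0, 0, 0, -1]]
mask = build_mask(or_masks(causal_mask_fn, make_blockwise_mask_fn(block_ids)), 1, 5)
```

In this toy example, position 1 may attend to position 3 even though it lies in the future, because both sit in the same block, while the text token at position 0 still cannot see position 1.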

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp zucchini-nlp changed the title from "[WIP] Add blockwise mask fn as opt arg for all masking functions" to "Blockwise mask fn as opt arg for all masking functions" Apr 17, 2026
@zucchini-nlp zucchini-nlp changed the title from "Blockwise mask fn as opt arg for all masking functions" to "Blockwise mask fn as opt arg in all masking functions" Apr 17, 2026
@zucchini-nlp
Member Author

run-slow: gemma3, gemma4, git, paligemma, pi0

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/gemma3", "models/gemma4", "models/git", "models/paligemma", "models/pi0"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN 36c679a5 workflow commit (merge commit)
PR e8f06b29 branch commit (from PR)
main 77de8dd8 base commit (on main)

Model CI Report

17 new failed tests from this PR 😭

  • gemma3:
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_model_4b_batch (❌ ⟹ ❌)
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_model_4b_batch_crops (❌ ⟹ ❌)
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_model_4b_bf16 (❌ ⟹ ❌)
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_model_4b_crops (❌ ⟹ ❌)
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_model_4b_multiimage (❌ ⟹ ❌)

  • gemma4:
    tests/models/gemma4/test_modeling_gemma4.py::Gemma4IntegrationTest::test_export_text_only (❌ ⟹ ❌)

  • paligemma:
    tests/models/paligemma/test_modeling_paligemma.py::PaliGemmaForConditionalGenerationIntegrationTest::test_integration_detection_bug (❌ ⟹ ❌)
    tests/models/paligemma/test_modeling_paligemma.py::PaliGemmaForConditionalGenerationIntegrationTest::test_small_model_integration_test (❌ ⟹ ❌)
    tests/models/paligemma/test_modeling_paligemma.py::PaliGemmaForConditionalGenerationIntegrationTest::test_small_model_integration_test_multiimage (❌ ⟹ ❌)
    tests/models/paligemma/test_modeling_paligemma.py::PaliGemmaForConditionalGenerationIntegrationTest::test_small_model_integration_test_paligemma_VQA (❌ ⟹ ❌)
    tests/models/paligemma/test_modeling_paligemma.py::PaliGemmaForConditionalGenerationIntegrationTest::test_small_model_integration_test_paligemma_batched (❌ ⟹ ❌)
    tests/models/paligemma/test_modeling_paligemma.py::PaliGemmaForConditionalGenerationIntegrationTest::test_small_model_integration_test_paligemma_batched_bf16 (❌ ⟹ ❌)
    tests/models/paligemma/test_modeling_paligemma.py::PaliGemmaForConditionalGenerationIntegrationTest::test_small_model_integration_test_paligemma_batched_f16 (❌ ⟹ ❌)
    tests/models/paligemma/test_modeling_paligemma.py::PaliGemmaForConditionalGenerationIntegrationTest::test_small_model_integration_test_paligemma_empty_prompt (❌ ⟹ ❌)

  • pi0:
    tests/models/pi0/test_modeling_pi0.py::PI0ModelIntegrationTest::test_pi0_base_libero (❌ ⟹ ❌)
    tests/models/pi0/test_modeling_pi0.py::PI0ModelIntegrationTest::test_pi0_base_reference_values (❌ ⟹ ❌)
    tests/models/pi0/test_modeling_pi0.py::PI0ModelIntegrationTest::test_train_pi0_base_libero (❌ ⟹ ❌)

@zucchini-nlp
Member Author

oh, no, I forgot about conversion. Will wait until merged and trigger test again

@zucchini-nlp
Member Author

run-slow: gemma3, gemma4, git, paligemma, pi0

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/gemma3", "models/gemma4", "models/git", "models/paligemma", "models/pi0"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN 43e0b2ad workflow commit (merge commit)
PR 0e17f9a4 branch commit (from PR)
main c3ec5ff4 base commit (on main)

Model CI Report

6 new failed tests from this PR 😭

  • gemma3:
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_model_4b_multiimage (✅ ⟹ ❌)

  • gemma4:
    tests/models/gemma4/test_modeling_gemma4.py::Gemma4IntegrationTest::test_export_text_only (❌ ⟹ ❌)

  • paligemma:
    tests/models/paligemma/test_modeling_paligemma.py::PaliGemmaForConditionalGenerationIntegrationTest::test_integration_detection_bug (✅ ⟹ ❌)

  • pi0:
    tests/models/pi0/test_modeling_pi0.py::PI0ModelIntegrationTest::test_pi0_base_libero (✅ ⟹ ❌)
    tests/models/pi0/test_modeling_pi0.py::PI0ModelIntegrationTest::test_pi0_base_reference_values (✅ ⟹ ❌)
    tests/models/pi0/test_modeling_pi0.py::PI0ModelIntegrationTest::test_train_pi0_base_libero (❌ ⟹ ❌)

@zucchini-nlp
Member Author

run-slow: gemma3, paligemma, pi0

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/gemma3", "models/paligemma", "models/pi0"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN e73db476 workflow commit (merge commit)
PR 3617c364 branch commit (from PR)
main 0db33792 base commit (on main)

Model CI Report

5 new failed tests from this PR 😭

  • gemma3:
    tests/models/gemma3/test_modeling_gemma3.py::Gemma3IntegrationTest::test_model_4b_multiimage (✅ ⟹ ❌)

  • paligemma:
    tests/models/paligemma/test_modeling_paligemma.py::PaliGemmaForConditionalGenerationIntegrationTest::test_integration_detection_bug (✅ ⟹ ❌)

  • pi0:
    tests/models/pi0/test_modeling_pi0.py::PI0ModelIntegrationTest::test_pi0_base_libero (✅ ⟹ ❌)
    tests/models/pi0/test_modeling_pi0.py::PI0ModelIntegrationTest::test_pi0_base_reference_values (✅ ⟹ ❌)
    tests/models/pi0/test_modeling_pi0.py::PI0ModelIntegrationTest::test_train_pi0_base_libero (❌ ⟹ ❌)

@zucchini-nlp
Member Author

run-slow: gemma3, paligemma, pi0

1 similar comment
@zucchini-nlp
Member Author

run-slow: gemma3, paligemma, pi0

@github-actions
Contributor

Workflow Run ⚙️

💔 This comment contains run-slow, but an unknown error occurred and the workflow run was aborted!

Member

@Cyrilvallez Cyrilvallez left a comment

Nice, I like having this natively!
Just a few things; most importantly, I think it's best to pad the mask directly once at the beginning, similar to https://github.com/huggingface/transformers/blob/main/src/transformers/masking_utils.py#L185-L194

Comment thread src/transformers/masking_utils.py Outdated
Comment thread src/transformers/masking_utils.py Outdated
Comment thread src/transformers/masking_utils.py
Comment thread src/transformers/masking_utils.py
@zucchini-nlp
Member Author

run-slow: gemma3, gemma4, git, paligemma, pi0

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/gemma3", "models/gemma4", "models/git", "models/paligemma", "models/pi0"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN 1033cdbc workflow commit (merge commit)
PR 9b9706ce branch commit (from PR)
main bbb51c83 base commit (on main)

Model CI Report

3 new failed tests from this PR 😭

  • gemma4:
    tests/models/gemma4/test_modeling_gemma4.py::Gemma4IntegrationTest::test_export_text_only (❌ ⟹ ❌)

  • pi0:
    tests/models/pi0/test_modeling_pi0.py::PI0ModelIntegrationTest::test_pi0_base_reference_values (✅ ⟹ ❌)
    tests/models/pi0/test_modeling_pi0.py::PI0ModelIntegrationTest::test_train_pi0_base_libero (❌ ⟹ ❌)

@zucchini-nlp
Member Author

Addressed the comments @Cyrilvallez

Comment thread src/transformers/masking_utils.py Outdated
@vasqu
Contributor

vasqu commented Apr 27, 2026

Should we wait on Cyril for a final pass? I can also take a look if you want, but I don't think there's a rush.

Member

@Cyrilvallez Cyrilvallez left a comment

I believe we still need to take the offset into account (see comment), but otherwise looks good to me! cc @vasqu here for last check and approval as you checked it as well!
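
To illustrate why an offset can matter for a blockwise lookup, here is a small sketch. It rests on an assumption about what "offset" means here, since the referenced inline comment is not shown: it assumes the offset is the absolute position of the first query token when earlier tokens are already cached. The function `blockwise_allowed` and its `q_offset` parameter are hypothetical names, not the PR's API.

```python
from typing import List


def blockwise_allowed(block_ids: List[int], q_idx: int, kv_idx: int, q_offset: int = 0) -> bool:
    """Hypothetical check: block_ids is indexed by absolute position, while
    q_idx is relative to the chunk of queries currently being processed."""
    q_abs = q_idx + q_offset  # shift the local query index to an absolute position
    q_block = block_ids[q_abs]
    return q_block >= 0 and q_block == block_ids[kv_idx]


# Positions 1..3 form one block; -1 marks tokens outside any block.
block_ids = [-1, 0, 0, 0, -1]

# Assume two tokens are already cached, so the local query 0 is absolute position 2.
with_offset = blockwise_allowed(block_ids, q_idx=0, kv_idx=3, q_offset=2)
# Ignoring the offset looks up position 0 instead, which is not in the block.
without_offset = blockwise_allowed(block_ids, q_idx=0, kv_idx=3)
```

Under that assumption, dropping the offset silently reads the block id of the wrong token, so the query loses access to the rest of its own block.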

Comment thread src/transformers/masking_utils.py Outdated
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: gemma3, gemma4, git, paligemma, pi0

@zucchini-nlp
Member Author

final pass with slow CI on rebased branch

@zucchini-nlp
Member Author

run-slow: gemma3, gemma4, git, paligemma, pi0

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/gemma3", "models/gemma4", "models/git", "models/paligemma", "models/pi0"]
quantizations: []

@huggingface huggingface deleted a comment from github-actions Bot Apr 28, 2026
@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45477&sha=c4c5ce

@zucchini-nlp
Member Author

run-slow: gemma3, gemma4, git, paligemma, pi0

@zucchini-nlp zucchini-nlp requested a review from vasqu April 29, 2026 09:35
@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/gemma3", "models/gemma4", "models/git", "models/paligemma", "models/pi0"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN 4e759e21 workflow commit (merge commit)
PR c4c5ce1a branch commit (from PR)
main 9120f5e4 base commit (on main)

Model CI Report

3 new failed tests from this PR 😭

  • gemma4:
    tests/models/gemma4/test_modeling_gemma4.py::Gemma4IntegrationTest::test_export_text_only (❌ ⟹ ❌)

  • pi0:
    tests/models/pi0/test_modeling_pi0.py::PI0ModelIntegrationTest::test_pi0_base_libero (✅ ⟹ ❌)
    tests/models/pi0/test_modeling_pi0.py::PI0ModelIntegrationTest::test_train_pi0_base_libero (❌ ⟹ ❌)

4 participants