
Remove unnecessary expand_as in get_placeholder_mask across VLMs#44907

Open
syncdoth wants to merge 2 commits into huggingface:main from syncdoth:remove-expand-as-placeholder-mask

Conversation

@syncdoth

Fixes #44906

Summary

  • Remove `.expand_as(inputs_embeds)` from placeholder mask creation in `get_placeholder_mask` and from equivalent inline patterns across all VLM models. `masked_scatter` natively broadcasts a `(B, S, 1)` mask against a `(B, S, H)` tensor, making the expansion unnecessary.
  • Replace the `inputs_embeds[special_image_mask].numel() == image_features.numel()` validation with the equivalent arithmetic `n_tokens * inputs_embeds.shape[-1] == image_features.numel()`, which avoids data-dependent boolean indexing and is more `torch.compile`-friendly.
  • 71 files changed across llava, qwen2_vl, paligemma, gemma3n, chameleon, video_llava, idefics2/3, instructblip, blip_2, and many more.
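The broadcasting claim is easy to verify with a standalone sketch (the shapes and tensors below are illustrative, not taken from any model file):

```python
import torch

B, S, H = 2, 5, 4
inputs_embeds = torch.zeros(B, S, H)

# Illustrative placeholder positions: True where an image token sits.
special_mask = torch.tensor(
    [[1, 0, 1, 0, 0],
     [0, 1, 0, 0, 1]], dtype=torch.bool
).unsqueeze(-1)                        # (B, S, 1)

n_tokens = int(special_mask.sum())     # 4 placeholder tokens
image_features = torch.randn(n_tokens, H)

# masked_scatter broadcasts the (B, S, 1) mask to (B, S, H) internally,
# so both variants scatter the same feature rows into the same slots.
out_broadcast = inputs_embeds.masked_scatter(special_mask, image_features)
out_expanded = inputs_embeds.masked_scatter(
    special_mask.expand_as(inputs_embeds), image_features
)
assert torch.equal(out_broadcast, out_expanded)
```

The expanded variant materializes a full `B × S × H` boolean tensor before scattering; the broadcast variant keeps the mask at `B × S × 1` and yields a bit-identical result.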

How this was developed

The core fix was first implemented and verified on llava/modeling_llava.py, then expanded to all other models following the same pattern using Claude Code. Each file was reviewed to ensure the transformation was appropriate — files with genuinely different expand_as usage (e.g., pe_audio where the mask is later .reshape()-ed) were left unchanged.

Test plan

  • Correctness verified: masked_scatter with broadcast (B,S,1) mask produces identical results to expanded (B,S,H) mask
  • pytest tests/models/llava/test_modeling_llava.py -x -v -k "not slow" — 136 passed
  • pytest tests/models/qwen2_vl/test_modeling_qwen2_vl.py -x -v -k "not slow" — 137 passed
  • pytest tests/models/paligemma/test_modeling_paligemma.py -x -v -k "not slow" — 124 passed
  • ruff check — all checks passed
  • check_modular_conversion.py --fix_and_overwrite — all generated files consistent
  • No duplicate PRs found (gh pr list --search "expand_as placeholder mask" / "get_placeholder_mask")

This PR uses AI assistance (Claude Code). I have reviewed all changes and validated the behavior end-to-end.

The placeholder mask was being expanded from (B, S, 1) to (B, S, H) via
`.expand_as(inputs_embeds)` before being passed to `masked_scatter`. Since
`masked_scatter` natively supports broadcasting, this expansion materializes
a large boolean tensor unnecessarily.

Changes:
- Remove `.expand_as(inputs_embeds)` from mask creation, keeping masks as
  (B, S, 1) and relying on `masked_scatter`/`torch.where` broadcasting
- Replace `inputs_embeds[mask].numel() == features.numel()` validation with
  equivalent arithmetic `n_tokens * hidden_dim == features.numel()`, which
  avoids data-dependent boolean indexing and is more torch.compile-friendly
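The validation change can be sanity-checked the same way; the names and shapes below are illustrative stand-ins, not the actual model code:

```python
import torch

B, S, H = 2, 6, 8
inputs_embeds = torch.randn(B, S, H)

special_image_mask = torch.zeros(B, S, 1, dtype=torch.bool)
special_image_mask[0, :3] = True       # pretend 3 image tokens
image_features = torch.randn(3, H)

# Old check: boolean indexing yields a data-dependent output shape,
# which tends to force a graph break under torch.compile.
old_ok = (
    inputs_embeds[special_image_mask.expand_as(inputs_embeds)].numel()
    == image_features.numel()
)

# New check: pure shape arithmetic, no data-dependent indexing.
n_image_tokens = int(special_image_mask.sum())
new_ok = n_image_tokens * inputs_embeds.shape[-1] == image_features.numel()

assert old_ok and new_ok and old_ok == new_ok
```

Both checks count the same quantity (number of placeholder positions times hidden size), so they accept and reject exactly the same inputs.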
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: aria, aya_vision, blip_2, chameleon, cohere2_vision, colqwen2, deepseek_vl, deepseek_vl_hybrid, emu3, ernie4_5_vl_moe, fast_vlm, florence2, fuyu, gemma3, gemma3n, glm46v

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@zucchini-nlp zucchini-nlp self-requested a review March 23, 2026 10:19
@Rocketknight1
Member

cc @zucchini-nlp it's an agent PR but looks high-quality to me, mostly because I'm a sucker for ways to avoid materializing broadcasts and tensor expansions 😅

@zucchini-nlp
Member

Yea, this should be mostly copy-paste. I'm coming to this later today

