Remove cache_position in more models #44330
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
e69f373 to 2c58c77
cache_position in more models
run-slow: got_ocr2 marian lfm2_vl modernbert_decoder video_llama_3 qwen2_5_omni qwen3_omni_moe biogpt janus dbrx fast_vlm bart perception_lm blenderbot llava_onevision roberta gemma2 llava ovis2 deepseek_vl_hybrid mistral3 paligemma glm_ocr t5gemma2 ernie bigbird_pegasus pegasus t5gemma glm4v_moe time_series_transformer timesfm2_5 ernie4_5_moe voxtral llava_next_video nanochat qwen2_5_vl gpt_neox xlm_roberta_xl deepseek_vl vipllava cohere2 big_bird gemma3n granitemoehybrid internvl git lighton_ocr gpt_oss colqwen2 m2m_100 glm46v plbart aya_vision bamba metaclip_2 camembert zamba2 blt informer glm_image gemma3 mbart blenderbot_small cohere2_vision phi4_multimodal glm4v roberta_prelayernorm aimv2 smolvlm florence2 vaultgemma data2vec_text jetmoe pegasus_x

This comment contains models: ["models/aimv2", "models/aya_vision", "models/bamba", "models/bart", "models/big_bird", "models/bigbird_pegasus", "models/biogpt", "models/blenderbot", "models/blenderbot_small", "models/blt", "models/camembert", "models/cohere2", "models/cohere2_vision", "models/colqwen2", "models/dbrx", "models/deepseek_vl", "models/deepseek_vl_hybrid", "models/ernie", "models/ernie4_5_moe", "models/fast_vlm", "models/florence2", "models/gemma2", "models/gemma3", "models/gemma3n", "models/git", "models/glm46v", "models/glm4v", "models/glm4v_moe", "models/glm_image", "models/glm_ocr", "models/got_ocr2", "models/gpt_neox", "models/gpt_oss", "models/granitemoehybrid", "models/informer", "models/internvl", "models/janus", "models/jetmoe", "models/lfm2_vl", "models/lighton_ocr", "models/llava", "models/llava_next_video", "models/llava_onevision", "models/m2m_100", "models/marian", "models/mbart", "models/metaclip_2", "models/mistral3", "models/modernbert_decoder", "models/nanochat", "models/ovis2", "models/paligemma", "models/pegasus", "models/pegasus_x", "models/perception_lm", "models/phi4_multimodal", "models/plbart", "models/qwen2_5_omni", "models/qwen2_5_vl", "models/qwen3_omni_moe", "models/roberta", "models/roberta_prelayernorm", "models/smolvlm", "models/t5gemma", "models/t5gemma2", "models/time_series_transformer", "models/timesfm2_5", "models/vaultgemma", "models/video_llama_3", "models/vipllava", "models/voxtral", "models/xlm_roberta_xl", "models/zamba2"]

CI Results · Commit Info

The test failure analysis could not be completed. Please check the workflow run for details.
run-slow: got_ocr2 marian lfm2_vl modernbert_decoder video_llama_3 qwen2_5_omni qwen3_omni_moe janus dbrx fast_vlm bart perception_lm blenderbot roberta gemma2 llava ovis2 deepseek_vl_hybrid mistral3 paligemma glm_ocr t5gemma2 pegasus t5gemma glm4v_moe time_series_transformer timesfm2_5 ernie4_5_moe voxtral llava_next_video nanochat qwen2_5_vl gemma3n

This comment contains models: ["models/bart", "models/blenderbot", "models/dbrx", "models/deepseek_vl_hybrid", "models/ernie4_5_moe", "models/fast_vlm", "models/gemma2", "models/gemma3n", "models/glm4v_moe", "models/glm_ocr", "models/got_ocr2", "models/janus", "models/lfm2_vl", "models/llava", "models/llava_next_video", "models/marian", "models/mistral3", "models/modernbert_decoder", "models/nanochat", "models/ovis2", "models/paligemma", "models/pegasus", "models/perception_lm", "models/qwen2_5_omni", "models/qwen2_5_vl", "models/qwen3_omni_moe", "models/roberta", "models/t5gemma", "models/t5gemma2", "models/time_series_transformer", "models/timesfm2_5", "models/video_llama_3", "models/voxtral"]

CI Results · Commit Info

The test failure analysis could not be completed. Please check the workflow run for details.
run-slow: lfm2_vl modernbert_decoder qwen2_5_omni qwen3_omni_moe bart gemma2 llava gemma3n

This comment contains models: ["models/bart", "models/gemma2", "models/gemma3n", "models/lfm2_vl", "models/llava", "models/modernbert_decoder", "models/qwen2_5_omni", "models/qwen3_omni_moe"]

CI Results · Commit Info

The test failure analysis could not be completed. Please check the workflow run for details.
vasqu left a comment
The only things I would address:
- Hybrid models nit
- Some simplification of the input_ids/embeds check on a small number of models (IIRC the 2 I commented on)

The rest is mostly me commenting to myself a bit, or nits.
```diff
-    and cache_position is not None
-    and cache_position[0] > 0
```
Honestly we never needed that condition at all, given we have `cache_params.has_previous_state`.
Yup, no idea why it was there... but that's why I fully removed it!
```diff
-cache_position = torch.arange(
-    past_key_values_length, past_key_values_length + seq_length, device=inputs_embeds.device
-)
+position_ids = torch.arange(seq_length, device=inputs_embeds.device) + past_key_values_length
```
Position ids are always None, I assume?
Yup! For some reason, later on it was doing `positions = self.embed_positions(..., position_ids=cache_position)`, and `cache_position` can never be anything other than a very simple arange based on the current length (i.e. it does not have the subtlety of `position_ids`, which differ based on padding, packing format, etc.), so it's the same to recreate the arange this way, independently of whether the "real" `position_ids` are passed as a kwarg. This is because the model recreates the positions anyway.
```diff
 # embed positions
-if position_ids is None:
-    position_ids = cache_position.unsqueeze(0)
+position_ids = torch.arange(seq_length, device=inputs_embeds.device) + past_key_values_length
```
Seeing it multiple times, but is there any difference for torch between doing `arange` + int vs a single `arange`?
Yes! It's a subtlety, but because `past_key_values_length` will be a tensor if and only if the cache is a `StaticCache`, doing it this way avoids a precious CUDA synchronization in those cases! See #42433 and the issue mentioned there for more details!
```diff
-# Copied from transformers.models.bert.modeling_bert.BertModel._create_attention_masks
+# No longer Copied from transformers.models.bert.modeling_bert.BertModel._create_attention_masks
```
Hmm a bit weird to change this model but not bert 🤔 how come?
I was focusing on finishing the models using modular, and a LOT of other models are copied from bert, so I'll keep it for the next batch!
[For maintainers] Suggested jobs to run (before merge)

run-slow: aimv2, aya_vision, bamba, bart, big_bird, bigbird_pegasus, biogpt, blenderbot, blenderbot_small, blt, camembert, cohere2, cohere2_vision, colqwen2, data2vec, dbrx
What does this PR do?
As per the title! Follow-up of #44181 with more models!