Prepare and keep track of position ids in generate #43734

zucchini-nlp merged 26 commits into huggingface:main

Conversation
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43734&sha=929e2c

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
run-slow: colqwen2, ernie4_5_vl_moe, gemma3, glm46v, glm4v, glm4v_moe, glm_image, glm_ocr, gpt_neo, paddleocr_vl, qwen2_5_vl, qwen2_vl, qwen3_vl, qwen3_vl_moe, reformer, video_llama_3

This comment contains models: ["models/colqwen2", "models/ernie4_5_vl_moe", "models/gemma3", "models/glm46v", "models/glm4v", "models/glm4v_moe", "models/glm_image", "models/glm_ocr", "models/gpt_neo", "models/paddleocr_vl", "models/qwen2_5_vl", "models/qwen2_vl", "models/qwen3_vl", "models/qwen3_vl_moe", "models/reformer", "models/video_llama_3"]
```diff
- position_ids.masked_fill_(attention_mask == 0, 1)
+ position_ids.masked_fill_(attention_mask == 0, 0)
```
It doesn't make a difference which value we use, because the token is masked anyway. Using 0 makes more sense: otherwise, when the sequence has only one unmasked token, we get position ids with a max value of 1 instead of 0.
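A minimal PyTorch sketch of the point above (not code from the PR): the fill value never reaches attention, but it does leak into the max position id, which downstream code (e.g. the mrope deltas in the diff further below) relies on.

```python
import torch

# Left-padded sequence with a single unmasked token.
attention_mask = torch.tensor([[0, 0, 1]])

# New behavior: fill padded slots with 0.
position_ids = attention_mask.long().cumsum(-1) - 1
position_ids.masked_fill_(attention_mask == 0, 0)
print(position_ids)  # tensor([[0, 0, 0]]) -> max position id is 0, as expected

# Old behavior: fill with 1, giving a spurious max position id of 1.
position_ids_old = attention_mask.long().cumsum(-1) - 1
position_ids_old.masked_fill_(attention_mask == 0, 1)
print(position_ids_old)  # tensor([[1, 1, 0]]) -> max position id is 1
```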
```diff
- model_input = model_input[:, -current_input_length:]
+ model_input = model_input[..., -current_input_length:]
  model_input = model_input.clone(memory_format=torch.contiguous_format)
```
3D positions support
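A small illustration with hypothetical shapes of why the ellipsis matters: position ids can now be 3D, `(3, batch, seq)`, and `[..., -n:]` slices the sequence axis for both ranks, while `[:, -n:]` would slice the batch axis of a 3D tensor.

```python
import torch

n = 1
ids_2d = torch.arange(6).view(2, 3)        # (batch=2, seq=3)
ids_3d = torch.arange(18).view(3, 2, 3)    # (3, batch=2, seq=3), mrope-style ids

print(ids_2d[..., -n:].shape)  # torch.Size([2, 1])
print(ids_3d[..., -n:].shape)  # torch.Size([3, 2, 1])
print(ids_3d[:, -n:].shape)    # torch.Size([3, 1, 3]) -- sliced the batch axis, not the sequence
```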
```diff
- position_ids, rope_deltas = self.vlm.model.get_rope_index(
-     input_ids=input_ids,
-     image_grid_thw=image_grid_thw,
-     video_grid_thw=None,
-     attention_mask=attention_mask,
- )
```
No idea why we called it here. Better to let `self.vlm` handle everything.
```diff
- else:
-     if attention_mask is not None:
-         position_ids = attention_mask.long().cumsum(-1) - 1
-         position_ids.masked_fill_(attention_mask == 0, 1)
-         position_ids = position_ids.unsqueeze(0).expand(3, -1, -1).to(attention_mask.device)
-         max_position_ids = position_ids.max(0, keepdim=False)[0].max(-1, keepdim=True)[0]
-         mrope_position_deltas = max_position_ids + 1 - attention_mask.shape[-1]
-     else:
-         position_ids = (
-             torch.arange(input_ids.shape[1], device=input_ids.device)
-             .view(1, 1, -1)
-             .expand(3, input_ids.shape[0], -1)
-         )
-         mrope_position_deltas = torch.zeros(
-             [input_ids.shape[0], 1],
-             device=input_ids.device,
-             dtype=input_ids.dtype,
-         )
  return position_ids, mrope_position_deltas
```
Same as it was for all `get_rope_index` implementations; I just deleted this part.
vasqu left a comment
Just left a few questions / nits. I have a feeling we can use modular a tad more re `compute_3d_position_ids`?
```diff
  image_outputs.pooler_output = image_embeds
  return image_outputs

+ def compute_3d_position_ids(
```
Could we not inherit from `Qwen2_5_VLModel`? Or is there something specific? Let's avoid rewriting where possible.
Ernie uses `mm_token_type_ids` but Qwen2.5-VL has `second_grid_ts`. We can do it if we hide the extra kwargs as `**kwargs`, which will basically look like the above comment.
Ah, yeah, that's a good point. On another note, do we want to change the other VLMs to use mm token type ids here? IIRC, it's much faster(?)
we do! That is my next PR after this one is merged. I have some stuff locally :)
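For reference, a hypothetical sketch of what the `**kwargs` idea from this thread could look like. The body is adapted from the text-only fallback in the deleted diff above, and the keyword names are just the ones mentioned in this thread, not the actual PR code.

```python
import torch

def compute_3d_position_ids(input_ids, attention_mask=None, **mm_kwargs):
    # Model-specific extras (Ernie's mm_token_type_ids, Qwen2.5-VL's
    # second_grid_ts) travel through **mm_kwargs, so the shared signature
    # can be inherited without rewriting the helper per model.
    mm_token_type_ids = mm_kwargs.get("mm_token_type_ids")  # unused in this text-only sketch

    # Text-only fallback: identical positions on all three rope axes.
    position_ids = (
        torch.arange(input_ids.shape[1], device=input_ids.device)
        .view(1, 1, -1)
        .expand(3, input_ids.shape[0], -1)
    )
    return position_ids

ids = torch.tensor([[5, 6, 7]])
print(compute_3d_position_ids(ids, mm_token_type_ids=torch.zeros_like(ids)).shape)
# torch.Size([3, 1, 3])
```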
run-slow might have been broken, so better to rerun after merging with main 👀
vasqu left a comment
Forgot to approve. Can you also check whether qwen3_5(_moe) have inherited as expected? They essentially come from qwen3 vl.
Okay, will run a few slow tests and merge
run-slow: colqwen2, ernie4_5_vl_moe, gemma3, glm46v, glm4v, glm4v_moe, glm_image, glm_ocr, gpt_neo, idefics, paddleocr_vl, qwen2_5_vl, qwen2_vl, qwen3_vl, qwen3_vl_moe, reformer

This comment contains models: ["models/colqwen2", "models/ernie4_5_vl_moe", "models/gemma3", "models/glm46v", "models/glm4v", "models/glm4v_moe", "models/glm_image", "models/glm_ocr", "models/gpt_neo", "models/idefics", "models/paddleocr_vl", "models/qwen2_5_vl", "models/qwen2_vl", "models/qwen3_vl", "models/qwen3_vl_moe", "models/reformer"]
CI Results: Commit Info
Model CI Report: ❌ 7 new failed tests from this PR 😭
Hey! Sorry I'm a bit late to the party! Instead of having a dedicated …
If we make each model compute its position ids in forward (which already happens now, in a not-so-correct way), we can't just build upon it by incrementing to the next position. Position ids aren't returned from the model the way the cache is, so we would have to start returning them from forward to be able to re-use them. Otherwise we just have to let each model re-compute positions from scratch every time, which is basically what happens now. Actually, I thought at first to make each …
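To make the trade-off concrete, a minimal sketch of the "prepare once, then increment" approach this PR takes (illustrative only, not the actual `generate()` internals): prefill positions come from the attention mask, and each decode step appends last + 1.

```python
import torch

attention_mask = torch.tensor([[0, 1, 1, 1]])  # left-padded prefill

# Prepare positions once for the prefill.
position_ids = attention_mask.long().cumsum(-1) - 1
position_ids.masked_fill_(attention_mask == 0, 0)  # [[0, 0, 1, 2]]

# Decoding: the next position is simply the last one plus one.
for _ in range(3):  # three decode steps
    next_position = position_ids[:, -1:] + 1
    position_ids = torch.cat([position_ids, next_position], dim=-1)

print(position_ids)  # tensor([[0, 0, 1, 2, 3, 4, 5]])
```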
Ah, that was the issue with the padding side: it's supposed to be "left". We don't recompute positions every time, so it doesn't work well with right padding.
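And a matching sketch of why right padding breaks that scheme (illustrative, not the actual code): the last column of the prepared positions belongs to a pad token, so "last + 1" no longer points at the true next position.

```python
import torch

attention_mask = torch.tensor([[1, 1, 1, 0]])  # right-padded prefill

position_ids = attention_mask.long().cumsum(-1) - 1
position_ids.masked_fill_(attention_mask == 0, 0)  # [[0, 1, 2, 0]]

next_position = position_ids[:, -1:] + 1
print(next_position)  # tensor([[1]]) -- the real next position is 3
```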
Updated on the hub! So ernie should behave somewhat normally now; no idea why the default padding changed, tbh.
run-slow: colqwen2, ernie4_5_vl_moe, qwen2_5_vl

This comment contains models: ["models/colqwen2", "models/ernie4_5_vl_moe", "models/qwen2_5_vl"]
CI Results: Commit Info
Model CI Report: ❌ 5 new failed tests from this PR 😭
run-slow: ernie4_5_vl_moe, qwen2_5_vl

This comment contains models: ["models/ernie4_5_vl_moe", "models/qwen2_5_vl"]
Arhhh, you're right 🥲 Nevermind then!
[For maintainers] Suggested jobs to run (before merge)

run-slow: colqwen2, ernie4_5_vl_moe, gemma3, glm46v, glm4v, glm4v_moe, glm_image, glm_ocr, gpt_neo, paddleocr_vl, qwen2_5_vl, qwen2_vl, qwen3_5, qwen3_5_moe, qwen3_vl, qwen3_vl_moe
What does this PR do?
As per the title; lays the groundwork for unifying 3D position ids in qwen-style VLMs.
PR adds a single entrypoint to prepare position ids in `GenerationMixin`, which models can override if needed (qwen-vl for example). This allows users to prepare their own position ids and pass them to `generate()`. In decoding stages, the position ids are simply incremented by one to build the next positions.

Along with it, the PR starts a light unification of 3D positions by splitting them into their own utility fn. Now we have only two or three models with their own `compute_3d_positions`, and all other models copy from there. In the next PR, I will split `get_rope_index` into smaller components, allowing us to copy similarities easily. I am working on it locally, but it's blocked by the current branch.

Review starting from `transformers/generation` and the models from which we copy (qwen2-vl and ernie4_5_vl).

Fixes #29149
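For readers skimming the diff, a hypothetical sketch of the shape of such an entrypoint (method names are illustrative, not the actual `GenerationMixin` API): prefill positions are prepared once, and decoding just increments the last prepared position.

```python
import torch

class GenerationMixinSketch:
    def prepare_position_ids(self, input_ids, attention_mask=None):
        # Prefill: build positions from the attention mask (left padding),
        # or a plain arange when no mask is given. VLMs with 3D rope
        # (qwen-vl style) would override this hook.
        if attention_mask is not None:
            position_ids = attention_mask.long().cumsum(-1) - 1
            position_ids.masked_fill_(attention_mask == 0, 0)
        else:
            batch, seq_len = input_ids.shape
            position_ids = torch.arange(seq_len, device=input_ids.device).expand(batch, -1)
        return position_ids

    def next_position_ids(self, position_ids):
        # Decoding: the next position is simply the last one plus one.
        return position_ids[..., -1:] + 1
```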