Fix rope_deltas corruption in Qwen2.5VL during CFG generation #39756

Draft

notkisk wants to merge 29 commits into huggingface:main from notkisk:fix-qwen2-5-vl-rope-deltas-cfg

Conversation

notkisk (Contributor) commented on Jul 29, 2025

Summary

Fixes #39749

This PR addresses a critical issue where the rope_deltas attribute in Qwen2.5VL models is corrupted during Classifier-Free Guidance (CFG) generation because of shared mutable state between forward passes.

Problem: During CFG generation, the model performs two forward passes per step (conditional and unconditional). rope_deltas was modified during the first pass and incorrectly reused in the second, corrupting the position embeddings and producing incorrect generation results.

Solution: Implemented proper state management (sketched after this list) by:

  • Adding _update_model_kwargs_for_generation method to preserve rope_deltas in model kwargs
  • Modifying prepare_inputs_for_generation to restore rope_deltas from kwargs when available
  • Ensuring each forward pass gets the correct rope_deltas state
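Roughly, the approach looks like this (a minimal sketch with simplified signatures; the real implementation lives in modular_qwen2_5_vl.py, and both methods belong to Qwen2_5_VLForConditionalGeneration):

```python
def _update_model_kwargs_for_generation(self, outputs, model_kwargs, **kwargs):
    model_kwargs = super()._update_model_kwargs_for_generation(outputs, model_kwargs, **kwargs)
    if getattr(outputs, "rope_deltas", None) is not None:
        # Preserve the deltas computed during this forward pass so the next
        # pass (e.g. the unconditional CFG pass) can restore them.
        model_kwargs["rope_deltas"] = outputs.rope_deltas
    return model_kwargs


def prepare_inputs_for_generation(self, input_ids, rope_deltas=None, **kwargs):
    model_inputs = super().prepare_inputs_for_generation(input_ids, **kwargs)
    if rope_deltas is not None:
        # Restore the preserved value instead of trusting self.model.rope_deltas,
        # which the other CFG pass may have overwritten in the meantime.
        model_inputs["rope_deltas"] = rope_deltas
    return model_inputs
```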

Impact: Fixes CFG generation for Qwen2.5VL models, ensuring correct position calculations and generation quality.

Changes Made

  • Modified src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py with the fix
  • Regenerated src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py from the modular source
  • Added comprehensive state management for rope_deltas during generation

Test Plan

  • Verified the fix addresses the original issue reproduction case
  • Tested CFG generation with multiple forward passes
  • Confirmed rope_deltas state is properly preserved and restored
  • Ran quality checks and style formatting
  • Ensured backward compatibility with existing functionality

The fix references issue #39749 as requested, and the branch contains all the changes needed to resolve the rope_deltas corruption during CFG generation.
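For reference, a reproduction sketch of the CFG path (checkpoint id, image path, and prompt are illustrative; any Qwen2.5-VL checkpoint exercises the same code path):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct", torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")

messages = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "Describe the image."}]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
image = Image.open("sample.jpg")  # any local test image (illustrative path)
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

# guidance_scale > 1 makes generate() run a second, unconditional forward
# pass per step; before this fix, that pass clobbered rope_deltas.
out = model.generate(**inputs, guidance_scale=1.5, max_new_tokens=32)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```

Setting guidance_scale above 1 is what makes generate() interleave conditional and unconditional forward passes, which is the scenario that corrupted rope_deltas before this fix.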

notkisk force-pushed the fix-qwen2-5-vl-rope-deltas-cfg branch 4 times, most recently from b9c0d6a to b742be0 on July 29, 2025 at 14:36
notkisk (Contributor, Author) commented on Jul 29, 2025

@Rocketknight1 @gante

notkisk (Contributor, Author) commented on Jul 29, 2025

@sgugger @thomwolf

notkisk force-pushed the fix-qwen2-5-vl-rope-deltas-cfg branch from b742be0 to b21c0cf on July 30, 2025 at 11:37
Comment on lines +1538 to +1547:

```python
# Restore rope_deltas from kwargs if available (needed for CFG generation)
if "rope_deltas" in kwargs and kwargs["rope_deltas"] is not None:
    self.model.rope_deltas = kwargs["rope_deltas"]
```

Member:

Looks quite hacky to me.

What if we just pass rope_deltas to the model's forward instead of re-assigning the self attribute? AFAIR something similar was done in the first version and it had one failing test. We can just overwrite the test if it still fails.
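In other words, something like this toy contrast (illustrative only, not the real Qwen2.5-VL code):

```python
from typing import Optional

import torch


class TinyModel(torch.nn.Module):
    """Toy illustration of the two designs; not the real Qwen2.5-VL code."""

    def __init__(self):
        super().__init__()
        self.rope_deltas = None  # shared mutable state (the old design)

    def forward_stateful(self, seq_len: int) -> torch.Tensor:
        # Old design: stash deltas on self; a second CFG pass re-enters
        # here and overwrites the value the first pass still relies on.
        self.rope_deltas = torch.tensor([seq_len])
        return torch.arange(seq_len) + self.rope_deltas

    def forward_functional(self, seq_len: int, rope_deltas: Optional[torch.Tensor] = None):
        # Suggested design: deltas travel with the call, so the conditional
        # and unconditional passes cannot clobber each other's state.
        if rope_deltas is None:
            rope_deltas = torch.tensor([seq_len])
        return torch.arange(seq_len) + rope_deltas, rope_deltas
```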

notkisk (Contributor, Author) replied on Jul 31, 2025:

Thanks for the feedback, I revisited it and did a tiny refactor. @zucchini-nlp please take a look and let me know.

notkisk force-pushed the fix-qwen2-5-vl-rope-deltas-cfg branch from 144b3c9 to 724d81f on July 31, 2025 at 11:55
notkisk requested a review from zucchini-nlp on August 1, 2025 at 00:00
zucchini-nlp (Member) left a comment:

Thanks! Can you add the same changes in qwen2-vl and add a small test as well?

Comment on lines +1304 to +1317:

```python
# Only update model state if rope_deltas parameter was not provided
if rope_deltas is None:
    self.rope_deltas = calculated_rope_deltas
    current_rope_deltas = calculated_rope_deltas
```
Member:

I think we can assign self.rope_deltas whenever we run this code block; we don't need to check whether the user provided rope_deltas. The only case where this is possible is prefill, and we need to recompute the correct self.rope_deltas at every prefill stage.
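That is, something like this (illustrative sketch; get_rope_index is the existing helper, the standalone wrapper is hypothetical):

```python
def prefill_rope_deltas(model, input_ids, attention_mask=None):
    # Every prefill recomputes the deltas, so the unconditional assignment
    # is safe and the `rope_deltas is None` guard from the snippet above
    # can be dropped.
    position_ids, rope_deltas = model.get_rope_index(input_ids, attention_mask=attention_mask)
    model.rope_deltas = rope_deltas
    return position_ids, rope_deltas
```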

Comment on lines +1598 to +1601:

```python
# Add rope_deltas parameter for clean CFG generation
if rope_deltas_param is not None:
    model_inputs["rope_deltas"] = rope_deltas_param
```

Member:

This will be passed along in the super() call, I think, so we don't need to manually pop and re-add it. Can you check?

Comment on lines +1758 to +1762:

```python
# Preserve rope_deltas for CFG and multi-pass generation
if hasattr(outputs, "rope_deltas") and outputs.rope_deltas is not None:
    model_kwargs["rope_deltas"] = outputs.rope_deltas
elif hasattr(self.model, "rope_deltas") and self.model.rope_deltas is not None:
    model_kwargs["rope_deltas"] = self.model.rope_deltas
```
Member:

Checking only the output is more reliable; let's not read model attributes here.
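That is, keep only the outputs-based branch from the snippet above:

```python
# Trust only the forward outputs; never fall back to self.model.rope_deltas.
if getattr(outputs, "rope_deltas", None) is not None:
    model_kwargs["rope_deltas"] = outputs.rope_deltas
```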

notkisk added 8 commits August 1, 2025 15:22
This commit addresses an issue where the rope_deltas attribute in Qwen2.5VL
models gets corrupted during Classifier-Free Guidance (CFG) generation due
to shared mutable state between forward passes.

The problem occurred because:
1. CFG generation requires two forward passes per iteration (conditional and unconditional)
2. self.rope_deltas was modified during forward passes (line 643: self.rope_deltas = rope_deltas)
3. The second forward pass used corrupted rope_deltas, leading to incorrect position calculations

Changes made:
- Added _update_model_kwargs_for_generation method to preserve rope_deltas state
- Modified prepare_inputs_for_generation to restore rope_deltas from model kwargs
- Both changes implemented in modular_qwen2_5_vl.py and regenerated to modeling_qwen2_5_vl.py

This ensures that each forward pass in CFG generation has access to the correct
rope_deltas state, preventing position embedding corruption and maintaining
generation quality.
GLM4V inherits from Qwen2.5VL but doesn't need the rope_deltas CFG fix.
Added override to call grandparent method instead of parent to skip
the rope_deltas logic.
…ritance

- Implement proper state management for rope_deltas in Qwen2.5VL models during CFG generation
- Add _update_model_kwargs_for_generation to preserve rope_deltas state between forward passes
- Restore rope_deltas in prepare_inputs_for_generation to ensure correct position calculations
- Override rope_deltas logic in GLM4V to prevent unwanted inheritance from Qwen2.5VL
- Regenerate modular-derived files to maintain consistency

Fixes huggingface#39749
- Add missing imports for Any and ModelOutput type hints
- Remove trailing whitespace from blank line
- Maintain GLM4V rope_deltas override functionality
- Run ruff format to ensure consistent code style
- Maintain rope_deltas override functionality
- No functional changes, formatting only
This commit addresses the maintainer feedback to replace the "hacky" state
mutation approach with a clean parameter-based solution for rope_deltas
during CFG generation.

Changes made:
- Modified forward() to accept rope_deltas as optional parameter
- Use parameter when provided, fall back to model state when not provided
- Only update model state when rope_deltas parameter is not provided
- Updated _update_model_kwargs_for_generation to prioritize outputs.rope_deltas
- Updated prepare_inputs_for_generation to pass rope_deltas as parameter
- Eliminates state corruption during multi-pass CFG generation
- Maintains full backward compatibility with existing code

The fix transforms the approach from mutating model state during forward passes
to a functional parameter-based approach that prevents shared mutable state
issues while preserving all existing functionality.

Fixes rope_deltas corruption in Qwen2.5VL during CFG generation by addressing
the root cause with clean architecture instead of state mutation workarounds.
The previous implementation used locals() to check if current_rope_deltas
was defined, but torch.compile cannot trace builtin functions like locals().

Changes made:
- Initialize current_rope_deltas at function start to ensure it's always defined
- Remove locals() check from output construction
- Maintain same logical behavior while being compilation-friendly

Fixes torch._dynamo.exc.Unsupported: Failed to trace builtin operator 'locals'
in test_generate_compile_model_forward and test_generate_compilation_all_outputs
…2.5-VL

- Simplify rope_deltas assignment logic in prefill stage
- Remove unnecessary conditional check for rope_deltas
- Update _update_model_kwargs_for_generation method implementation
- Add CFG generation support to Qwen2-VL model
- Add test_rope_deltas_preserved_for_cfg_generation test for both models

Fixes huggingface#39749
notkisk force-pushed the fix-qwen2-5-vl-rope-deltas-cfg branch from 0ff06dd to e141e77 on August 1, 2025 at 15:23
- Apply ruff formatting to modular_qwen2_5_vl.py
- Regenerate auto-generated modeling files from modular source
notkisk requested a review from zucchini-nlp on August 1, 2025 at 15:56
LysandreJik and others added 8 commits August 1, 2025 18:06
* Quick responses fix

* [serve] Fix responses API and add tests

* Remove typo

* Remove typo

* Tests
…ingface#39739)

* add fast image processor Janus, deepseek_vl, deepseek_vl_hybrid

* fix after review
…e_keypoint_matching` as a standard (huggingface#39830)

* fix: deprecate plot_keypoint_matching and make visualize_keypoint_matching for all Keypoint Matching models

* refactor: added copied from

* fix: make style

* fix: repo consistency

* fix: make style

* docs: added missing method in SuperGlue docs
…face#39851)

Allow TrackioCallback to work when pynvml is not installed
* remove dtensors, not explicit

Co-authored-by: 3outeille <3outeille@users.noreply.github.com>

* style

* fix test

* update

* as we broke saving try to fix

* output layouts should exit

* nit

* devicemesh exists if it was distributed

* use _device_mesh of self

* update

* lol

* fix

* nit

* update

* fix!

* this???

* grumble grumble

* ?

* fuck me

---------

Co-authored-by: 3outeille <3outeille@users.noreply.github.com>
…uggingface#39875)

Improve `is_wandb_available` function to verify WandB installation by checking for a key attribute
…gface#39265)

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* try

* try

* try

* try

* try

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
zucchini-nlp (Member) left a comment:

One tiny last bit: can we update the test to actually run a CFG generation? That way we'll be sure CFG works and doesn't generate gibberish.

You can add a tiny test under IntegrationTests with a 3B model.

Comment on lines +442 to +457:

```python
def test_rope_deltas_preserved_for_cfg_generation(self):
    """Test that rope_deltas are preserved during CFG generation to prevent corruption."""
    config, _ = self.model_tester.prepare_config_and_inputs_for_common()
    model = Qwen2_5_VLForConditionalGeneration(config).eval()

    # Create mock outputs with rope_deltas
    class MockOutput:
        def __init__(self):
            self.rope_deltas = torch.tensor([[1.0, 2.0, 3.0]])
            self.past_key_values = None

        def __contains__(self, key):
            return hasattr(self, key)

    mock_outputs = MockOutput()
    initial_kwargs = {"cache_position": torch.tensor([0, 1, 2])}
```
Member:

I meant a test that does CFG generation with Qwen-VL, to make sure it runs correctly.
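Something along these lines (illustrative sketch; the fixture path, prompt, and assertion are placeholders, and the decorators follow the repo's slow-test conventions):

```python
import unittest

import torch
from PIL import Image

from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from transformers.testing_utils import require_torch_gpu, slow


@slow
@require_torch_gpu
class Qwen2_5_VLCfgIntegrationTest(unittest.TestCase):
    def test_cfg_generation_runs(self):
        model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
            "Qwen/Qwen2.5-VL-3B-Instruct", torch_dtype=torch.float16, device_map="auto"
        )
        processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")

        messages = [
            {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "Describe this image."}]}
        ]
        prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
        image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
        inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

        # guidance_scale > 1 exercises the second, unconditional forward pass.
        out = model.generate(**inputs, guidance_scale=1.5, max_new_tokens=20, do_sample=False)
        text = processor.batch_decode(out, skip_special_tokens=True)[0]
        self.assertGreater(len(text), 0)  # sanity check: CFG ran and decoded non-empty text
```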

```python
class Glm4vForConditionalGeneration(Qwen2_5_VLForConditionalGeneration):
    _checkpoint_conversion_mapping = {}

    # Override to prevent rope_deltas inheritance from Qwen2.5VL - GLM4V doesn't need this
```
Member:

Why not? GLM4V uses the same RoPE as Qwen-VL.

akibjawad and others added 12 commits August 4, 2025 11:49
…pply_chat_template (huggingface#39494)

* added code for handling video object ,as dictionary of frames and metadata, in chat template

* added new test where videos are passed as objects (dict of frames, metadata) in the chat template

* modified hardcoded video_len check that does not match with increased number of tests cases.

* Modify hardcoded video_len check that fails with increased number of tests

* update documentation of multi-modal chat templating with extra information about including video object in chat template.

* add array handling in load_video()

* temporary test video inlcuded

* skip testing smolvlm with videos that are list of frames

* update documentation & make fixup

* Address review comments
github-actions (bot) commented on Aug 4, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, deepseek_vl, deepseek_vl_hybrid, efficientloftr, glm4v, janus, lightglue


Development

Successfully merging this pull request may close this issue:
Qwen2_5_VLForConditionalGeneration cfg forward twice error (#39749)

9 participants