Cumulative feature and defect fixes from recent Transformers PRs#42
Open
…rom_pretrained

When loading with device_map, HF's _move_missing_keys_from_meta_to_device replaces all non-persistent buffers with torch.empty_like() (garbage memory). Add an _init_weights handler for Granite4VisionTextRotaryEmbedding that recomputes inv_freq and original_inv_freq from the config, so _initialize_missing_keys restores correct values after the corruption. Also add Granite4VisionTextRotaryEmbedding as an explicit subclass in the modular file so the isinstance check resolves correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
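The recomputation the handler has to perform can be sketched in isolation. This is a minimal, torch-free sketch: `RotaryConfig`, `RotaryEmbedding`, and `init_weights` are illustrative stand-ins, not the actual Granite4Vision classes; the point is that the buffers are rebuilt from config alone, since after the corruption the old buffer contents cannot be trusted.

```python
# Hypothetical stand-ins for the real config and rotary-embedding module.
class RotaryConfig:
    def __init__(self, head_dim=8, rope_theta=10000.0):
        self.head_dim = head_dim
        self.rope_theta = rope_theta


def compute_inv_freq(config):
    """Recompute rotary inverse frequencies from the config alone.

    This mirrors what the _init_weights handler must do: after non-persistent
    buffers have been replaced with uninitialized memory, the config is the
    only safe source of truth.
    """
    dim = config.head_dim
    return [1.0 / (config.rope_theta ** (i / dim)) for i in range(0, dim, 2)]


class RotaryEmbedding:
    def __init__(self, config):
        self.config = config
        self.inv_freq = compute_inv_freq(config)       # non-persistent buffer
        self.original_inv_freq = list(self.inv_freq)   # kept for rope rescaling


def init_weights(module):
    # The handler keys off isinstance, which is why the embedding class must
    # be an explicit, resolvable subclass in the modular file.
    if isinstance(module, RotaryEmbedding):
        module.inv_freq = compute_inv_freq(module.config)
        module.original_inv_freq = list(module.inv_freq)


emb = RotaryEmbedding(RotaryConfig())
emb.inv_freq = [float("nan")] * len(emb.inv_freq)  # simulate empty_like() garbage
init_weights(emb)
assert emb.inv_freq == compute_inv_freq(emb.config)
```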
…ert to pure functions

- Delete downsampling_granite4_vision.py; move WindowQFormerDownsampler, interpolate_downsample, and spatial_offset_downsample into the modular file
- Replace the stateless InterpolateDownsampler/SpatialOffsetDownsampler classes with plain functions (items 2 and 4 from reviewer feedback)
- Add config.qformer_config (Blip2QFormerConfig) as a proper sub-config field on Granite4VisionConfig, following the Blip2Config pattern; remove the inline Blip2QFormerConfig construction from WindowQFormerDownsampler.__init__ (item 3)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
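The sub-config pattern referenced above can be sketched without depending on Transformers itself. The class and field names below are illustrative stand-ins modeled on the Blip2Config pattern, not the real Granite4Vision API; the key behavior is that the parent config accepts either a dict (as loaded from JSON) or an already-built sub-config object, so callers never construct the sub-config inline.

```python
# Hypothetical shape of the sub-config pattern; names are illustrative.
class QFormerConfig:
    def __init__(self, hidden_size=768, num_hidden_layers=2, **kwargs):
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers


class VisionConfig:
    # Declaring the sub-config type lets save/load round-trips resolve it.
    sub_configs = {"qformer_config": QFormerConfig}

    def __init__(self, qformer_config=None, **kwargs):
        if qformer_config is None:
            qformer_config = {}
        # Accept a raw dict (e.g. from a serialized config) or an object.
        if isinstance(qformer_config, dict):
            qformer_config = QFormerConfig(**qformer_config)
        self.qformer_config = qformer_config


cfg = VisionConfig(qformer_config={"hidden_size": 512})
assert isinstance(cfg.qformer_config, QFormerConfig)
```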
Replace the raw list-of-tuples return from get_image_features with a proper @dataclass ModelOutput subclass (Granite4VisionImageFeaturesOutput), following the Qwen3-VL BaseModelOutputWithDeepstackFeatures pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
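A minimal sketch of the output class, assuming field names modeled on Qwen3-VL's BaseModelOutputWithDeepstackFeatures (the real class inherits from transformers' ModelOutput; a plain dataclass stands in for it here). The gain over a raw list of tuples is named, order-independent attribute access.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative stand-in for a ModelOutput subclass; field names are assumptions.
@dataclass
class ImageFeaturesOutput:
    image_features: Optional[list] = None
    deepstack_features: Optional[list] = None


out = ImageFeaturesOutput(image_features=[[0.1, 0.2]], deepstack_features=[[0.3]])
# Named access replaces positional unpacking of an ad-hoc list of tuples.
assert out.image_features == [[0.1, 0.2]]
assert out.deepstack_features == [[0.3]]
```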
…Next

The image processors are identical to LlavaNextImageProcessor and LlavaNextImageProcessorPil; there is no need to re-define them. Map 'granite4_vision' to the LlavaNext processors in image_processing_auto.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Item 8: move query/image_positions init to _init_weights (embed_std pattern)
- Item 9: rename _win/_unwin to _windowed_raster/_unwindowed_raster; replace single-letter variables with descriptive names
- Item 10: add a deepstack_features field to Granite4VisionModelOutputWithPast and Granite4VisionCausalLMOutputWithPast instead of reusing image_hidden_states
- Item 11: use TransformersKwargs instead of FlashAttentionKwargs in Granite4VisionModel.forward; remove the unused FlashAttentionKwargs import
- Item 12: raise ValueError instead of warning_once for a patch-shape mismatch; remove the now-unused logger
- Item 13: drop use_image_newline_parameter (not used in the released checkpoint)
- Item 14: read pad_token_id from config.text_config instead of the top-level config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Item 15: fix the copyright to "2026 IBM and The HuggingFace Team"
- Item 16: remove the bibtex entry from the docs
- Item 17: remove torch_dtype/device_map from the docs examples
- Item 18: move Notes to a "Usage Tips" section before the code examples
- Item 19: remove model_type from Granite4VisionProcessor
- Item 20: revert AttributeError() (converter-incompatible); keep del self.
- Item 21: remove granite4_vision from conversion_mapping (PrefixWeights handles it)
- Item 22: remove granite4_vision from check_repo's DOC_MODEL_NAMES_NOT_IN_AUTO and from HARDCODED_CONFIG_FOR_MODELS in auto_docstring (bad rebase entries)
- Item 23: update the test copyright, remove use_image_newline_parameter from the tester, and update skip reasons for the get_image_features tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Item 20: drop the get_image_token_mask override; use the parent's get_placeholder_mask
- Item 29: delete test_image_processing_granite4_vision.py (identical to the LlavaNext tests)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Pass output_attentions/output_hidden_states explicitly to language_model in Granite4VisionModel.forward (they were swallowed as explicit params and not forwarded via **kwargs)
- Collect all_hidden_states and all_self_attns in the Granite4VisionTextModel layer loop; add output_attentions/output_hidden_states params
- Fix the qformer_config dict→object conversion to run before super().__post_init__() so the _attn_implementation setter doesn't hit a raw dict during sub_configs iteration
- Use Blip2QFormerConfig directly in sub_configs (instead of AutoConfig) so the save/load round trip resolves the type correctly; add the missing import to the generated configuration_granite4_vision.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
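The layer-loop fix follows the standard Transformers accumulation pattern. A minimal sketch, using a toy callable in place of real decoder layers (the real loop operates on tensors and passes many more arguments): hidden states are recorded before each layer plus once after the last, and attentions are recorded per layer.

```python
# Sketch of a decoder layer loop that collects hidden states and attentions
# instead of silently dropping them. `layers` are stand-ins for real modules.
def run_layers(layers, hidden_states, output_attentions=False, output_hidden_states=False):
    all_hidden_states = () if output_hidden_states else None
    all_self_attns = () if output_attentions else None
    for layer in layers:
        if output_hidden_states:
            all_hidden_states += (hidden_states,)  # input to this layer
        hidden_states, attn = layer(hidden_states, output_attentions=output_attentions)
        if output_attentions:
            all_self_attns += (attn,)
    if output_hidden_states:
        all_hidden_states += (hidden_states,)  # final layer output
    return hidden_states, all_hidden_states, all_self_attns


# Toy layer: doubles its input; its "attention" is the input it saw.
toy_layer = lambda h, output_attentions=False: (h * 2, h if output_attentions else None)

final, hs, attns = run_layers([toy_layer, toy_layer], 1, True, True)
assert (final, hs, attns) == (4, (1, 2, 4), (1, 2))
```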
…eech_plus.py From a review by eustlb Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
…eech_plus.py From a review by eustlb Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
…eech_plus.py From a review by eustlb Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
…eech_plus.py From a review by eustlb Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
…eech_plus.py From a review by eustlb Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
…eech_plus.py From a review by eustlb Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
- Remove unused imports
- Add docs for params
- Move the encoder's post_init into the general post_init; otherwise it doesn't work
…rleaving renames and converts ops
Qwen3.5 MoE GGUF files identify their architecture as "qwen35moe"
(matches gguf-py>=0.18.0's MODEL_ARCH.QWEN35MOE). Without an entry
here, loading a Qwen3.5 MoE GGUF dies with:
RuntimeError: Architecture qwen35moe not supported
Wire up four spots so Qwen3_5MoeForCausalLM can load a "qwen35moe"
GGUF end to end:
* integrations/ggml.py: GGUF_CONFIG_MAPPING["qwen3_5_moe_text"] for
the metadata-to-config translation. Includes feed_forward_length
for the non-MoE layers in the hybrid stack and the regular
expert_*_length keys.
* integrations/ggml.py: GGUF_TO_FAST_CONVERTERS["qwen3_5_moe_text"] =
GGUFQwen2Converter (Qwen3.5 reuses the qwen2/qwen3 BPE convention).
* integrations/ggml.py: GGUF_CONFIG_DEFAULTS_MAPPING["qwen3_5_moe_text"]
with norm_topk_prob=True. Same trap as qwen3_moe; llama.cpp's
qwen35moe.cpp normalizes routed expert weights, so the HF default
has to be overridden to keep routing math consistent.
* modeling_gguf_pytorch_utils.py: TENSOR_PROCESSORS["qwen35moe"] =
Qwen2MoeTensorProcessor for the fused 3-D ffn_*_exps splitting
into per-expert {gate,up,down}_proj. Without this, MoE expert
weights silently fall through to the default processor and aren't
sliced. Plus the matching qwen3_5_moe_text -> qwen35moe alias in
get_gguf_hf_weights_map.
Text-only target (qwen3_5_moe_text, not qwen3_5_moe) is intentional:
Qwen3_5MoeForCausalLM is backed by Qwen3_5MoeTextConfig, and Qwen3.5
MoE GGUF distributions ship as text-only; vision weights, when
present, ride in a co-located mmproj-*.gguf.
Adds a smoke test in tests/quantization/ggml/test_ggml.py marked
@unittest.skip because the only public Qwen3.5 MoE GGUF (~12.7 GB)
is too large for routine CI. Maintainers with a smaller fixture in
hand can drop the skip.
Signed-off-by: Lucas <lucas@eliteaero.com.br>
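The four wiring points above share one lookup shape. This is an illustrative sketch only: the table names echo the commit message, but the entries, the alias table, and the resolver function are simplified stand-ins for the real Transformers internals, not their actual contents.

```python
# Sketch of the GGUF architecture wiring; values are illustrative.
GGUF_CONFIG_MAPPING = {
    "qwen3_5_moe_text": {
        # metadata-to-config translation, including feed_forward_length for
        # the non-MoE layers and the expert_*_length keys for the MoE layers
        "feed_forward_length": "intermediate_size",
        "expert_feed_forward_length": "moe_intermediate_size",
    },
}
GGUF_CONFIG_DEFAULTS_MAPPING = {
    # llama.cpp normalizes routed expert weights, so override the HF default
    "qwen3_5_moe_text": {"norm_topk_prob": True},
}
TENSOR_PROCESSORS = {
    # splits fused 3-D ffn_*_exps tensors into per-expert projections
    "qwen35moe": "Qwen2MoeTensorProcessor",
}
ARCH_ALIASES = {"qwen3_5_moe_text": "qwen35moe"}  # hypothetical alias table


def resolve_architecture(gguf_arch):
    """Map a GGUF architecture string back to its HF config key."""
    hf_key = next((k for k, v in ARCH_ALIASES.items() if v == gguf_arch), None)
    if hf_key is None or hf_key not in GGUF_CONFIG_MAPPING:
        raise RuntimeError(f"Architecture {gguf_arch} not supported")
    return hf_key


assert resolve_architecture("qwen35moe") == "qwen3_5_moe_text"
```

Without the `TENSOR_PROCESSORS` entry the loader does not error; the fused expert tensors just skip the splitting step, which is why the failure described above is silent.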
Follow-up to 73aa1cb. Map the remaining qwen35moe metadata that convert_hf_to_gguf actually writes (via Qwen3NextModel.set_gguf_parameters):
* full_attention_interval -> kwarg consumed by Qwen3_5MoeTextConfig.__post_init__ to derive the hybrid layer_types list. Without this, layer_types silently falls back to the default interval of 4. The 35B-A3B target happens to use 4, but any future GGUF with a different cadence would load with the wrong attention pattern.
* ssm.conv_kernel -> linear_conv_kernel_dim
* ssm.state_size -> linear_key_head_dim
* ssm.group_count -> linear_num_key_heads
* ssm.time_step_rank -> linear_num_value_heads
ssm.inner_size is derived (linear_value_head_dim * linear_num_value_heads) and has no direct config field, so it's mapped to None; linear_value_head_dim falls back to its config default, which matches the writer's contract.
The keys flagged by review but not actually emitted by the qwen35moe converter path (rope.scaling.*, attention.sliding_window) are left unmapped on purpose; adding them now would be speculative.
Signed-off-by: Lucas <lucas@eliteaero.com.br>
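The None-mapping convention above can be sketched as a small translation pass. The key names follow the commit; the translation helper itself is hypothetical, showing only the contract that a key mapped to None is recognized but deliberately dropped rather than treated as unsupported.

```python
# Sketch of translating GGUF metadata keys to HF config kwargs.
GGUF_KEY_MAP = {
    "full_attention_interval": "full_attention_interval",
    "ssm.conv_kernel": "linear_conv_kernel_dim",
    "ssm.state_size": "linear_key_head_dim",
    "ssm.group_count": "linear_num_key_heads",
    "ssm.time_step_rank": "linear_num_value_heads",
    # Derived value (linear_value_head_dim * linear_num_value_heads):
    # consumed but intentionally not mapped to any config field.
    "ssm.inner_size": None,
}


def translate_metadata(metadata):
    """Turn GGUF metadata into config kwargs, skipping None-mapped keys."""
    kwargs = {}
    for gguf_key, value in metadata.items():
        config_key = GGUF_KEY_MAP.get(gguf_key)
        if config_key is not None:
            kwargs[config_key] = value
    return kwargs


assert translate_metadata({"ssm.conv_kernel": 4, "ssm.inner_size": 4096}) == {
    "linear_conv_kernel_dim": 4
}
```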
# Conflicts:
#	src/transformers/models/pi0/image_processing_pi0.py

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

# Conflicts:
#	src/transformers/models/auto/configuration_auto.py
#	src/transformers/models/auto/image_processing_auto.py
#	src/transformers/models/auto/modeling_auto.py

# Conflicts:
#	src/transformers/models/colqwen2/modeling_colqwen2.py
#	src/transformers/models/colqwen2/modular_colqwen2.py

# Conflicts:
#	src/transformers/integrations/flex_attention.py
#	src/transformers/utils/generic.py

# Conflicts:
#	src/transformers/models/qwen3_5/modeling_qwen3_5.py

# Conflicts:
#	src/transformers/conversion_mapping.py
#	src/transformers/models/auto/configuration_auto.py
#	src/transformers/models/auto/image_processing_auto.py
#	utils/check_repo.py

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

# Conflicts:
#	src/transformers/models/gemma3n/modeling_gemma3n.py
#	src/transformers/models/gemma3n/modular_gemma3n.py

# Conflicts:
#	src/transformers/models/auto/configuration_auto.py
#	utils/check_config_attributes.py

# Conflicts:
#	tests/test_processing_common.py
In-flight mergeability status: this comment is a visibility snapshot of the current local run state at the time it was posted.
Classification counts
Terminal status counts
Notes
Recent local terminal records include:
Generated by the mergeability feature+defect flow. This PR publishes the committed snapshot at 64c1fe9 from the local features-and-defects-750 branch. Runtime .mergeability state remains local; uncommitted/conflicted worktree state is not included.