Cumulative feature and defect fixes from recent Transformers PRs#42
Open
…rom_pretrained

When loading with device_map, HF's _move_missing_keys_from_meta_to_device replaces all non-persistent buffers with torch.empty_like() (garbage memory). Add an _init_weights handler for Granite4VisionTextRotaryEmbedding that recomputes inv_freq and original_inv_freq from the config, so _initialize_missing_keys restores correct values after the corruption. Also add Granite4VisionTextRotaryEmbedding as an explicit subclass in the modular file so the isinstance check resolves correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
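The recomputation the handler has to perform can be sketched in isolation. This is a minimal, torch-free sketch: `RotaryConfig`, `RotaryEmbedding`, and `init_weights` are illustrative stand-ins, not the actual Granite4Vision classes; the point is that the buffers are rebuilt from config alone, since after the corruption the old buffer contents cannot be trusted.

```python
# Hypothetical stand-ins for the real config and rotary-embedding module.
class RotaryConfig:
    def __init__(self, head_dim=8, rope_theta=10000.0):
        self.head_dim = head_dim
        self.rope_theta = rope_theta


def compute_inv_freq(config):
    """Recompute rotary inverse frequencies from the config alone.

    This mirrors what the _init_weights handler must do: after non-persistent
    buffers have been replaced with uninitialized memory, the config is the
    only safe source of truth.
    """
    dim = config.head_dim
    return [1.0 / (config.rope_theta ** (i / dim)) for i in range(0, dim, 2)]


class RotaryEmbedding:
    def __init__(self, config):
        self.config = config
        self.inv_freq = compute_inv_freq(config)       # non-persistent buffer
        self.original_inv_freq = list(self.inv_freq)   # kept for rope rescaling


def init_weights(module):
    # The handler keys off isinstance, which is why the embedding class must
    # be an explicit, resolvable subclass in the modular file.
    if isinstance(module, RotaryEmbedding):
        module.inv_freq = compute_inv_freq(module.config)
        module.original_inv_freq = list(module.inv_freq)


emb = RotaryEmbedding(RotaryConfig())
emb.inv_freq = [float("nan")] * len(emb.inv_freq)  # simulate empty_like() garbage
init_weights(emb)
assert emb.inv_freq == compute_inv_freq(emb.config)
```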
…ert to pure functions

- Delete downsampling_granite4_vision.py; move WindowQFormerDownsampler, interpolate_downsample, and spatial_offset_downsample into the modular file
- Replace the stateless InterpolateDownsampler/SpatialOffsetDownsampler classes with plain functions (items 2 and 4 from reviewer feedback)
- Add config.qformer_config (Blip2QFormerConfig) as a proper sub-config field on Granite4VisionConfig, following the Blip2Config pattern; remove the inline Blip2QFormerConfig construction from WindowQFormerDownsampler.__init__ (item 3)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
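The sub-config pattern referenced above can be sketched without depending on Transformers itself. The class and field names below are illustrative stand-ins modeled on the Blip2Config pattern, not the real Granite4Vision API; the key behavior is that the parent config accepts either a dict (as loaded from JSON) or an already-built sub-config object, so callers never construct the sub-config inline.

```python
# Hypothetical shape of the sub-config pattern; names are illustrative.
class QFormerConfig:
    def __init__(self, hidden_size=768, num_hidden_layers=2, **kwargs):
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers


class VisionConfig:
    # Declaring the sub-config type lets save/load round-trips resolve it.
    sub_configs = {"qformer_config": QFormerConfig}

    def __init__(self, qformer_config=None, **kwargs):
        if qformer_config is None:
            qformer_config = {}
        # Accept a raw dict (e.g. from a serialized config) or an object.
        if isinstance(qformer_config, dict):
            qformer_config = QFormerConfig(**qformer_config)
        self.qformer_config = qformer_config


cfg = VisionConfig(qformer_config={"hidden_size": 512})
assert isinstance(cfg.qformer_config, QFormerConfig)
```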
Replace the raw list-of-tuples return from get_image_features with a proper @dataclass ModelOutput subclass (Granite4VisionImageFeaturesOutput), following the Qwen3-VL BaseModelOutputWithDeepstackFeatures pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
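A minimal sketch of the output class, assuming field names modeled on Qwen3-VL's BaseModelOutputWithDeepstackFeatures (the real class inherits from transformers' ModelOutput; a plain dataclass stands in for it here). The gain over a raw list of tuples is named, order-independent attribute access.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative stand-in for a ModelOutput subclass; field names are assumptions.
@dataclass
class ImageFeaturesOutput:
    image_features: Optional[list] = None
    deepstack_features: Optional[list] = None


out = ImageFeaturesOutput(image_features=[[0.1, 0.2]], deepstack_features=[[0.3]])
# Named access replaces positional unpacking of an ad-hoc list of tuples.
assert out.image_features == [[0.1, 0.2]]
assert out.deepstack_features == [[0.3]]
```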
…Next

The image processors are identical to LlavaNextImageProcessor and LlavaNextImageProcessorPil; there is no need to re-define them. Map 'granite4_vision' to the LlavaNext processors in image_processing_auto.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Item 8: move query/image_positions init to _init_weights (embed_std pattern)
- Item 9: rename _win/_unwin to _windowed_raster/_unwindowed_raster; replace single-letter variables with descriptive names
- Item 10: add a deepstack_features field to Granite4VisionModelOutputWithPast and Granite4VisionCausalLMOutputWithPast instead of reusing image_hidden_states
- Item 11: use TransformersKwargs instead of FlashAttentionKwargs in Granite4VisionModel.forward; remove the unused FlashAttentionKwargs import
- Item 12: raise ValueError instead of warning_once for a patch-shape mismatch; remove the now-unused logger
- Item 13: drop use_image_newline_parameter (not used in the released checkpoint)
- Item 14: read pad_token_id from config.text_config instead of the top-level config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Item 15: fix the copyright to "2026 IBM and The HuggingFace Team"
- Item 16: remove the bibtex entry from the docs
- Item 17: remove torch_dtype/device_map from the docs examples
- Item 18: move Notes to a "Usage Tips" section before the code examples
- Item 19: remove model_type from Granite4VisionProcessor
- Item 20: revert AttributeError() (converter-incompatible); keep del self.
- Item 21: remove granite4_vision from conversion_mapping (PrefixWeights handles it)
- Item 22: remove granite4_vision from check_repo's DOC_MODEL_NAMES_NOT_IN_AUTO and from HARDCODED_CONFIG_FOR_MODELS in auto_docstring (bad rebase entries)
- Item 23: update the test copyright, remove use_image_newline_parameter from the tester, and update skip reasons for the get_image_features tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Item 20: drop the get_image_token_mask override; use the parent's get_placeholder_mask
- Item 29: delete test_image_processing_granite4_vision.py (identical to the LlavaNext tests)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Pass output_attentions/output_hidden_states explicitly to language_model in Granite4VisionModel.forward (they were swallowed as explicit params and not forwarded via **kwargs)
- Collect all_hidden_states and all_self_attns in the Granite4VisionTextModel layer loop; add output_attentions/output_hidden_states params
- Fix the qformer_config dict→object conversion to run before super().__post_init__() so the _attn_implementation setter doesn't hit a raw dict during sub_configs iteration
- Use Blip2QFormerConfig directly in sub_configs (instead of AutoConfig) so the save/load round trip resolves the type correctly; add the missing import to the generated configuration_granite4_vision.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
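The layer-loop fix follows the standard Transformers accumulation pattern. A minimal sketch, using a toy callable in place of real decoder layers (the real loop operates on tensors and passes many more arguments): hidden states are recorded before each layer plus once after the last, and attentions are recorded per layer.

```python
# Sketch of a decoder layer loop that collects hidden states and attentions
# instead of silently dropping them. `layers` are stand-ins for real modules.
def run_layers(layers, hidden_states, output_attentions=False, output_hidden_states=False):
    all_hidden_states = () if output_hidden_states else None
    all_self_attns = () if output_attentions else None
    for layer in layers:
        if output_hidden_states:
            all_hidden_states += (hidden_states,)  # input to this layer
        hidden_states, attn = layer(hidden_states, output_attentions=output_attentions)
        if output_attentions:
            all_self_attns += (attn,)
    if output_hidden_states:
        all_hidden_states += (hidden_states,)  # final layer output
    return hidden_states, all_hidden_states, all_self_attns


# Toy layer: doubles its input; its "attention" is the input it saw.
toy_layer = lambda h, output_attentions=False: (h * 2, h if output_attentions else None)

final, hs, attns = run_layers([toy_layer, toy_layer], 1, True, True)
assert (final, hs, attns) == (4, (1, 2, 4), (1, 2))
```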
…eech_plus.py From a review by eustlb Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
…eech_plus.py From a review by eustlb Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
…eech_plus.py From a review by eustlb Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
…eech_plus.py From a review by eustlb Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
…eech_plus.py From a review by eustlb Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
…eech_plus.py From a review by eustlb Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
- Remove unused imports
- Add docs for params
- Move the encoder's post_init into the general post_init; otherwise it doesn't work
…rleaving renames and converts ops
Qwen3.5 MoE GGUF files identify their architecture as "qwen35moe"
(matches gguf-py>=0.18.0's MODEL_ARCH.QWEN35MOE). Without an entry
here, loading a Qwen3.5 MoE GGUF dies with:
RuntimeError: Architecture qwen35moe not supported
Wire up four spots so Qwen3_5MoeForCausalLM can load a "qwen35moe"
GGUF end to end:
* integrations/ggml.py: GGUF_CONFIG_MAPPING["qwen3_5_moe_text"] for
the metadata-to-config translation. Includes feed_forward_length
for the non-MoE layers in the hybrid stack and the regular
expert_*_length keys.
* integrations/ggml.py: GGUF_TO_FAST_CONVERTERS["qwen3_5_moe_text"] =
GGUFQwen2Converter (Qwen3.5 reuses the qwen2/qwen3 BPE convention).
* integrations/ggml.py: GGUF_CONFIG_DEFAULTS_MAPPING["qwen3_5_moe_text"]
with norm_topk_prob=True. Same trap as qwen3_moe; llama.cpp's
qwen35moe.cpp normalizes routed expert weights, so the HF default
has to be overridden to keep routing math consistent.
* modeling_gguf_pytorch_utils.py: TENSOR_PROCESSORS["qwen35moe"] =
Qwen2MoeTensorProcessor for the fused 3-D ffn_*_exps splitting
into per-expert {gate,up,down}_proj. Without this, MoE expert
weights silently fall through to the default processor and aren't
sliced. Plus the matching qwen3_5_moe_text -> qwen35moe alias in
get_gguf_hf_weights_map.
Text-only target (qwen3_5_moe_text, not qwen3_5_moe) is intentional:
Qwen3_5MoeForCausalLM is backed by Qwen3_5MoeTextConfig, and Qwen3.5
MoE GGUF distributions ship as text-only; vision weights, when
present, ride in a co-located mmproj-*.gguf.
Adds a smoke test in tests/quantization/ggml/test_ggml.py marked
@unittest.skip because the only public Qwen3.5 MoE GGUF (~12.7 GB)
is too large for routine CI. Maintainers with a smaller fixture in
hand can drop the skip.
Signed-off-by: Lucas <lucas@eliteaero.com.br>
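The four wiring points above share one lookup shape. This is an illustrative sketch only: the table names echo the commit message, but the entries, the alias table, and the resolver function are simplified stand-ins for the real Transformers internals, not their actual contents.

```python
# Sketch of the GGUF architecture wiring; values are illustrative.
GGUF_CONFIG_MAPPING = {
    "qwen3_5_moe_text": {
        # metadata-to-config translation, including feed_forward_length for
        # the non-MoE layers and the expert_*_length keys for the MoE layers
        "feed_forward_length": "intermediate_size",
        "expert_feed_forward_length": "moe_intermediate_size",
    },
}
GGUF_CONFIG_DEFAULTS_MAPPING = {
    # llama.cpp normalizes routed expert weights, so override the HF default
    "qwen3_5_moe_text": {"norm_topk_prob": True},
}
TENSOR_PROCESSORS = {
    # splits fused 3-D ffn_*_exps tensors into per-expert projections
    "qwen35moe": "Qwen2MoeTensorProcessor",
}
ARCH_ALIASES = {"qwen3_5_moe_text": "qwen35moe"}  # hypothetical alias table


def resolve_architecture(gguf_arch):
    """Map a GGUF architecture string back to its HF config key."""
    hf_key = next((k for k, v in ARCH_ALIASES.items() if v == gguf_arch), None)
    if hf_key is None or hf_key not in GGUF_CONFIG_MAPPING:
        raise RuntimeError(f"Architecture {gguf_arch} not supported")
    return hf_key


assert resolve_architecture("qwen35moe") == "qwen3_5_moe_text"
```

Without the `TENSOR_PROCESSORS` entry the loader does not error; the fused expert tensors just skip the splitting step, which is why the failure described above is silent.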
Follow-up to 73aa1cb. Map the remaining qwen35moe metadata that convert_hf_to_gguf actually writes (via Qwen3NextModel.set_gguf_parameters):
* full_attention_interval -> kwarg consumed by Qwen3_5MoeTextConfig.__post_init__ to derive the hybrid layer_types list. Without this, layer_types silently falls back to the default interval of 4. The 35B-A3B target happens to use 4, but any future GGUF with a different cadence would load with the wrong attention pattern.
* ssm.conv_kernel -> linear_conv_kernel_dim
* ssm.state_size -> linear_key_head_dim
* ssm.group_count -> linear_num_key_heads
* ssm.time_step_rank -> linear_num_value_heads
ssm.inner_size is derived (linear_value_head_dim * linear_num_value_heads) and has no direct config field, so it's mapped to None; linear_value_head_dim falls back to its config default, which matches the writer's contract.
The keys flagged by review but not actually emitted by the qwen35moe converter path (rope.scaling.*, attention.sliding_window) are left unmapped on purpose; adding them now would be speculative.
Signed-off-by: Lucas <lucas@eliteaero.com.br>
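The None-mapping convention above can be sketched as a small translation pass. The key names follow the commit; the translation helper itself is hypothetical, showing only the contract that a key mapped to None is recognized but deliberately dropped rather than treated as unsupported.

```python
# Sketch of translating GGUF metadata keys to HF config kwargs.
GGUF_KEY_MAP = {
    "full_attention_interval": "full_attention_interval",
    "ssm.conv_kernel": "linear_conv_kernel_dim",
    "ssm.state_size": "linear_key_head_dim",
    "ssm.group_count": "linear_num_key_heads",
    "ssm.time_step_rank": "linear_num_value_heads",
    # Derived value (linear_value_head_dim * linear_num_value_heads):
    # consumed but intentionally not mapped to any config field.
    "ssm.inner_size": None,
}


def translate_metadata(metadata):
    """Turn GGUF metadata into config kwargs, skipping None-mapped keys."""
    kwargs = {}
    for gguf_key, value in metadata.items():
        config_key = GGUF_KEY_MAP.get(gguf_key)
        if config_key is not None:
            kwargs[config_key] = value
    return kwargs


assert translate_metadata({"ssm.conv_kernel": 4, "ssm.inner_size": 4096}) == {
    "linear_conv_kernel_dim": 4
}
```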
# Conflicts:
#	src/transformers/models/pi0/image_processing_pi0.py

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

# Conflicts:
#	src/transformers/models/auto/configuration_auto.py
#	src/transformers/models/auto/image_processing_auto.py
#	src/transformers/models/auto/modeling_auto.py

# Conflicts:
#	src/transformers/models/colqwen2/modeling_colqwen2.py
#	src/transformers/models/colqwen2/modular_colqwen2.py

# Conflicts:
#	src/transformers/integrations/flex_attention.py
#	src/transformers/utils/generic.py

# Conflicts:
#	src/transformers/models/qwen3_5/modeling_qwen3_5.py

# Conflicts:
#	src/transformers/conversion_mapping.py
#	src/transformers/models/auto/configuration_auto.py
#	src/transformers/models/auto/image_processing_auto.py
#	utils/check_repo.py

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

# Conflicts:
#	src/transformers/models/gemma3n/modeling_gemma3n.py
#	src/transformers/models/gemma3n/modular_gemma3n.py

# Conflicts:
#	src/transformers/models/auto/configuration_auto.py
#	utils/check_config_attributes.py

# Conflicts:
#	tests/test_processing_common.py
In-flight mergeability status: this comment is a visibility snapshot of the current local run state at the time it was posted.
Classification counts
Terminal status counts
Notes
Recent local terminal records include:
Generated by the mergeability feature+defect flow. This PR publishes the committed snapshot at 64c1fe9 from the local features-and-defects-750 branch. Runtime .mergeability state remains local; uncommitted/conflicted worktree state is not included.