
Cumulative feature and defect fixes from recent Transformers PRs#42

Open
evalstate wants to merge 763 commits into main from features-and-defects-750

Conversation

@evalstate
Owner

Generated by the mergeability feature+defect flow. This PR publishes the committed snapshot at 64c1fe9 from the local features-and-defects-750 branch. Runtime .mergeability state remains local; uncommitted/conflicted worktree state is not included.

github-actions Bot and others added 30 commits April 27, 2026 14:06
…rom_pretrained

When loading with device_map, HF's _move_missing_keys_from_meta_to_device
replaces all non-persistent buffers with torch.empty_like() (garbage memory).
Add a _init_weights handler for Granite4VisionTextRotaryEmbedding that
recomputes inv_freq and original_inv_freq from config, so _initialize_missing_keys
restores correct values after the corruption. Also adds
Granite4VisionTextRotaryEmbedding as an explicit subclass in the modular file
so the isinstance check resolves correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
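
A minimal sketch of the recomputation such a handler performs, assuming the standard RoPE formulation; the config field names (head_dim, rope_theta) and the commented restore step are illustrative, not verbatim from the merged code:

```python
import torch

def recompute_rope_inv_freq(head_dim: int, rope_theta: float) -> torch.Tensor:
    """Rebuild the rotary inverse-frequency buffer from config values, mirroring
    what an _init_weights branch restores after the buffers were clobbered."""
    exponents = torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim
    return 1.0 / (rope_theta ** exponents)

# Illustrative restore step (module/config attribute names are assumptions):
# inv_freq = recompute_rope_inv_freq(config.head_dim, config.rope_theta)
# module.inv_freq.copy_(inv_freq)
# module.original_inv_freq.copy_(inv_freq)
```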
…ert to pure functions

- Delete downsampling_granite4_vision.py; move WindowQFormerDownsampler,
  interpolate_downsample, and spatial_offset_downsample into modular
- Replace stateless InterpolateDownsampler/SpatialOffsetDownsampler classes
  with plain functions (items 2 and 4 from reviewer feedback)
- Add config.qformer_config (Blip2QFormerConfig) as a proper sub-config field
  on Granite4VisionConfig following the Blip2Config pattern; remove inline
  Blip2QFormerConfig construction from WindowQFormerDownsampler.__init__ (item 3)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
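
A hedged sketch of the "stateless class to plain function" conversion described above; the signature and the bilinear mode are assumptions, only the square-grid reshape convention of LLaVA-style pipelines is assumed from context:

```python
import torch
import torch.nn.functional as F

def interpolate_downsample(features: torch.Tensor, target_side: int) -> torch.Tensor:
    """Downsample a (batch, seq, hidden) patch grid to target_side**2 tokens.
    Assumes seq is a perfect square so it can be reshaped to a 2-D grid."""
    batch, seq_len, hidden = features.shape
    side = int(seq_len ** 0.5)
    grid = features.transpose(1, 2).reshape(batch, hidden, side, side)
    grid = F.interpolate(grid, size=(target_side, target_side), mode="bilinear", align_corners=False)
    return grid.flatten(2).transpose(1, 2)  # back to (batch, target_side**2, hidden)
```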
Replace the raw list-of-tuples return from get_image_features with a
proper @dataclass ModelOutput subclass (Granite4VisionImageFeaturesOutput),
following the Qwen3-VL BaseModelOutputWithDeepstackFeatures pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
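
A hedged sketch of the structured return type, following the pattern named above; the field names here are illustrative, not the exact fields of the merged class:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

import torch
from transformers.utils import ModelOutput

@dataclass
class Granite4VisionImageFeaturesOutput(ModelOutput):
    """Structured replacement for a raw list-of-tuples return (field names illustrative)."""
    image_features: Optional[torch.FloatTensor] = None
    deepstack_features: Optional[Tuple[torch.FloatTensor, ...]] = None
```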
…Next

The image processors are identical to LlavaNextImageProcessor and
LlavaNextImageProcessorPil; no need to re-define them. Map
'granite4_vision' to the LlavaNext processors in image_processing_auto.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
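
A hedged sketch of the auto-mapping reuse; the dict name and tuple layout mimic image_processing_auto.py but are simplified stand-ins here:

```python
# Stand-in for the module-level mapping in image_processing_auto.py; the real
# file populates an ordered mapping alongside every other model type.
IMAGE_PROCESSOR_MAPPING_NAMES = {}

# granite4_vision reuses the LlavaNext processors instead of defining its own.
IMAGE_PROCESSOR_MAPPING_NAMES["granite4_vision"] = ("LlavaNextImageProcessor", "LlavaNextImageProcessorPil")
```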
- Item 8: move query/image_positions init to _init_weights (embed_std pattern; see the sketch below)
- Item 9: rename _win/_unwin to _windowed_raster/_unwindowed_raster, replace
  single-letter vars with descriptive names
- Item 10: add deepstack_features field to Granite4VisionModelOutputWithPast and
  Granite4VisionCausalLMOutputWithPast instead of reusing image_hidden_states
- Item 11: use TransformersKwargs instead of FlashAttentionKwargs in
  Granite4VisionModel.forward; remove unused FlashAttentionKwargs import
- Item 12: raise ValueError instead of warning_once for patch shape mismatch;
  remove now-unused logger
- Item 13: drop use_image_newline_parameter (not used in released checkpoint)
- Item 14: read pad_token_id from config.text_config instead of top-level config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
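
A hedged sketch of the embed_std pattern referenced in item 8: learned query/position embeddings are initialized inside _init_weights with std 1/sqrt(hidden_size) rather than at construction time. The helper name and the commented call site are illustrative:

```python
import math
import torch

def init_learned_embedding(param: torch.nn.Parameter, hidden_size: int) -> None:
    """Initialize a learned embedding parameter with the embed_std pattern."""
    embed_std = 1.0 / math.sqrt(hidden_size)
    with torch.no_grad():
        param.normal_(mean=0.0, std=embed_std)

# Illustrative use inside an _init_weights branch (attribute names assumed):
# if isinstance(module, Granite4VisionModel):
#     init_learned_embedding(module.query_positions, module.config.hidden_size)
```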
- Item 15: fix copyright to "2026 IBM and The HuggingFace Team"
- Item 16: remove bibtex entry from docs
- Item 17: remove torch_dtype/device_map from docs examples
- Item 18: move Notes to "Usage Tips" section before code examples
- Item 19: remove model_type from Granite4VisionProcessor
- Item 20: revert AttributeError() (converter incompatible); keep del self.
- Item 21: remove granite4_vision from conversion_mapping (PrefixWeights handles it)
- Item 22: remove granite4_vision from check_repo DOC_MODEL_NAMES_NOT_IN_AUTO and
  HARDCODED_CONFIG_FOR_MODELS in auto_docstring (bad rebase entries)
- Item 23: update test copyright, remove use_image_newline_parameter from tester,
  update skip reasons for get_image_features tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Item 20: drop get_image_token_mask override, use parent's get_placeholder_mask
- Item 29: delete test_image_processing_granite4_vision.py (identical to LlavaNext)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Pass output_attentions/output_hidden_states explicitly to language_model
  in Granite4VisionModel.forward (they were captured as explicit parameters
  and therefore never forwarded via **kwargs)
- Collect all_hidden_states and all_self_attns in Granite4VisionTextModel
  layer loop; add output_attentions/output_hidden_states params
- Fix qformer_config dict→object conversion to run before super().__post_init__()
  so _attn_implementation.setter doesn't hit a raw dict during sub_configs iteration
- Use Blip2QFormerConfig directly in sub_configs (instead of AutoConfig) so
  save/load round-trip resolves the type correctly; add missing import to
  generated configuration_granite4_vision.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
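
A minimal sketch of the ordering fix for the sub-config conversion, using stand-in classes rather than the real PretrainedConfig machinery; only the ordering point (convert the dict before the parent hook runs) is taken from the commit:

```python
from transformers import Blip2QFormerConfig

class _ParentConfig:
    """Stand-in for the parent whose __post_init__ walks sub_configs and
    assumes every entry is already a config object, not a raw dict."""
    sub_configs = {"qformer_config": Blip2QFormerConfig}

    def __post_init__(self):
        for name in self.sub_configs:
            assert not isinstance(getattr(self, name), dict)

class _Granite4VisionConfigSketch(_ParentConfig):
    def __init__(self, qformer_config=None):
        # Convert first, so the parent hook never sees a raw dict.
        if isinstance(qformer_config, dict):
            qformer_config = Blip2QFormerConfig(**qformer_config)
        self.qformer_config = qformer_config if qformer_config is not None else Blip2QFormerConfig()
        super().__post_init__()
```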
…eech_plus.py


From a review by eustlb

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
…eech_plus.py


From a review by eustlb

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
…eech_plus.py


From a review by eustlb

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
…eech_plus.py


From a review by eustlb

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
…eech_plus.py


From a review by eustlb

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
…eech_plus.py


From a review by eustlb

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
- Remove unused imports
- Add docs for params
- Move the encode-specific post_init into the general post_init; otherwise it doesn't work
Qwen3.5 MoE GGUF files identify their architecture as "qwen35moe"
(matches gguf-py>=0.18.0's MODEL_ARCH.QWEN35MOE). Without an entry
here, loading a Qwen3.5 MoE GGUF dies with:

    RuntimeError: Architecture qwen35moe not supported

Wire up four spots so Qwen3_5MoeForCausalLM can load a "qwen35moe"
GGUF end to end:

* integrations/ggml.py: GGUF_CONFIG_MAPPING["qwen3_5_moe_text"] for
  the metadata-to-config translation. Includes feed_forward_length
  for the non-MoE layers in the hybrid stack and the regular
  expert_*_length keys.
* integrations/ggml.py: GGUF_TO_FAST_CONVERTERS["qwen3_5_moe_text"] =
  GGUFQwen2Converter (Qwen3.5 reuses the qwen2/qwen3 BPE convention).
* integrations/ggml.py: GGUF_CONFIG_DEFAULTS_MAPPING["qwen3_5_moe_text"]
  with norm_topk_prob=True. Same trap as qwen3_moe; llama.cpp's
  qwen35moe.cpp normalizes routed expert weights, so the HF default
  has to be overridden to keep routing math consistent.
* modeling_gguf_pytorch_utils.py: TENSOR_PROCESSORS["qwen35moe"] =
  Qwen2MoeTensorProcessor for the fused 3-D ffn_*_exps splitting
  into per-expert {gate,up,down}_proj. Without this, MoE expert
  weights silently fall through to the default processor and aren't
  sliced. Plus the matching qwen3_5_moe_text -> qwen35moe alias in
  get_gguf_hf_weights_map.

Text-only target (qwen3_5_moe_text, not qwen3_5_moe) is intentional:
Qwen3_5MoeForCausalLM is backed by Qwen3_5MoeTextConfig, and Qwen3.5
MoE GGUF distributions ship as text-only; vision weights, when
present, ride in a co-located mmproj-*.gguf.

Adds a smoke test in tests/quantization/ggml/test_ggml.py marked
@unittest.skip because the only public Qwen3.5 MoE GGUF (~12.7 GB)
is too large for routine CI. Maintainers with a smaller fixture in
hand can drop the skip.

Signed-off-by: Lucas <lucas@eliteaero.com.br>
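
A hedged sketch of the four wiring points described in this commit. The dict names follow the commit text; the metadata-key pairs shown are a partial, illustrative subset, and the converter/processor references are written as strings so the snippet stands alone:

```python
# src/transformers/integrations/ggml.py (stand-in dicts for the module-level ones)
GGUF_CONFIG_MAPPING = {}
GGUF_TO_FAST_CONVERTERS = {}
GGUF_CONFIG_DEFAULTS_MAPPING = {}

GGUF_CONFIG_MAPPING["qwen3_5_moe_text"] = {
    "block_count": "num_hidden_layers",
    "feed_forward_length": "intermediate_size",             # dense layers in the hybrid stack
    "expert_feed_forward_length": "moe_intermediate_size",
    "expert_count": "num_experts",
    "expert_used_count": "num_experts_per_tok",
}
GGUF_TO_FAST_CONVERTERS["qwen3_5_moe_text"] = "GGUFQwen2Converter"        # qwen2/qwen3 BPE convention
GGUF_CONFIG_DEFAULTS_MAPPING["qwen3_5_moe_text"] = {"norm_topk_prob": True}

# src/transformers/modeling_gguf_pytorch_utils.py (stand-in dict)
TENSOR_PROCESSORS = {}
TENSOR_PROCESSORS["qwen35moe"] = "Qwen2MoeTensorProcessor"                # splits fused ffn_*_exps per expert
```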
Follow-up to 73aa1cb. Map the remaining qwen35moe metadata that
convert_hf_to_gguf actually writes (via Qwen3NextModel.set_gguf_parameters):

* full_attention_interval -> kwarg consumed by
  Qwen3_5MoeTextConfig.__post_init__ to derive the hybrid layer_types
  list. Without this, layer_types silently falls back to the default
  interval of 4. The 35B-A3B target happens to use 4, but any future
  GGUF with a different cadence would load with the wrong attention
  pattern.
* ssm.conv_kernel    -> linear_conv_kernel_dim
* ssm.state_size     -> linear_key_head_dim
* ssm.group_count    -> linear_num_key_heads
* ssm.time_step_rank -> linear_num_value_heads

ssm.inner_size is derived (linear_value_head_dim *
linear_num_value_heads) and has no direct config field, so it's
mapped to None — linear_value_head_dim falls back to its config
default, which matches the writer's contract.

The keys flagged by review but not actually emitted by the qwen35moe
converter path (rope.scaling.*, attention.sliding_window) are left
unmapped on purpose; adding them now would be speculative.

Signed-off-by: Lucas <lucas@eliteaero.com.br>
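
Continuing the sketch above, the extra metadata pairs this follow-up maps might look like the following; the flat key spellings are illustrative, while the left/right pairing and the deliberate None for ssm.inner_size follow the commit text:

```python
# Extends the illustrative GGUF_CONFIG_MAPPING entry from the previous sketch.
GGUF_CONFIG_MAPPING["qwen3_5_moe_text"].update({
    "full_attention_interval": "full_attention_interval",  # consumed by __post_init__ to derive layer_types
    "ssm.conv_kernel": "linear_conv_kernel_dim",
    "ssm.state_size": "linear_key_head_dim",
    "ssm.group_count": "linear_num_key_heads",
    "ssm.time_step_rank": "linear_num_value_heads",
    "ssm.inner_size": None,  # derived value; linear_value_head_dim keeps its config default
})
```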
evalstate and others added 29 commits April 29, 2026 08:29
# Conflicts:
#	src/transformers/models/pi0/image_processing_pi0.py
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
# Conflicts:
#	src/transformers/models/auto/configuration_auto.py
#	src/transformers/models/auto/image_processing_auto.py
#	src/transformers/models/auto/modeling_auto.py
# Conflicts:
#	src/transformers/models/colqwen2/modeling_colqwen2.py
#	src/transformers/models/colqwen2/modular_colqwen2.py
# Conflicts:
#	src/transformers/integrations/flex_attention.py
#	src/transformers/utils/generic.py
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
# Conflicts:
#	src/transformers/models/qwen3_5/modeling_qwen3_5.py
# Conflicts:
#	src/transformers/conversion_mapping.py
#	src/transformers/models/auto/configuration_auto.py
#	src/transformers/models/auto/image_processing_auto.py
#	utils/check_repo.py
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
# Conflicts:
#	src/transformers/models/gemma3n/modeling_gemma3n.py
#	src/transformers/models/gemma3n/modular_gemma3n.py
# Conflicts:
#	src/transformers/models/auto/configuration_auto.py
#	utils/check_config_attributes.py
# Conflicts:
#	tests/test_processing_common.py
@evalstate
Owner Author

evalstate commented Apr 29, 2026

In-flight mergeability status

This PR is a visibility snapshot for the running features-and-defects-750 cumulative mergeability job. It is not intended as a normal review-ready upstream PR.

Current local run state at the time of this comment:

Classification counts

  • defect: 98
  • feature: 93
  • documentation: 16
  • other: 13

Terminal status counts

  • merged: 101
  • applied: 21
  • already_present: 10
  • validation_failed: 16
  • aborted: 43
  • skipped: 29

Notes

  • This run includes both defect and feature PRs; documentation/other categories are classified but skipped.
  • Some validation_failed entries reflect known light-validation/environment limitations rather than genuinely bad patches. In particular, audio-related targets requiring torchcodec/system FFmpeg have been removed from the light selector going forward.
  • The local worktree may be temporarily dirty/conflicted while the agent processes a PR. Only committed branch snapshots are pushed to this PR.
  • The context-handover hook is now installed in the card pack so future batches can stop cleanly before context pressure gets too high.

Recent local terminal records include:
