Cumulative defect fixes from recent Transformers PRs #41
`flash_attention_forward` unconditionally called `s_aux.to(query.dtype)`, which crashed with `AttributeError: 'NoneType' object has no attribute 'to'` for models that don't use attention sinks (e.g. Gemma). Mirrors the parallel guard added in huggingface#40434 for `flash_paged.py`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
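As a minimal sketch, the guard looks like the following, assuming a heavily simplified `flash_attention_forward` (the real signature in transformers takes key/value tensors and many more arguments):

```python
import torch

def flash_attention_forward(query: torch.Tensor, s_aux=None, **kwargs):
    # Simplified excerpt for illustration only. Before the fix,
    # s_aux.to(query.dtype) ran unconditionally and raised
    # AttributeError: 'NoneType' object has no attribute 'to'
    # for models without attention sinks (s_aux is None, e.g. Gemma).
    if s_aux is not None:
        s_aux = s_aux.to(query.dtype)
    # ... rest of the flash-attention computation elided ...
```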
`_init_weights()` on `NemotronHPreTrainedModel` unconditionally overwrites `dt_bias` (random `inv_softplus(dt)`) and `out_proj.weight` (kaiming_uniform scaled by `1/sqrt(n_layer)`) every time it is invoked on a mamba block. It sets `module.dt_bias._no_reinit = True` after the copy, but the flag is never checked by either code path (only the Linear-bias branch reads it).

On transformers>=5.0, `_init_weights` is triggered a second time after `from_pretrained()` has loaded the checkpoint (the post-load safety pass that initializes any tensors still left on the `meta` device). For `NemotronHForCausalLM` this silently overwrites the checkpoint values of `dt_bias` and `out_proj.weight` with fresh random draws. The model then outputs repetitive stop-word streams like ` and and and and ,` for any input.
Minimal repro with any Nemotron-H checkpoint:

```python
from transformers import AutoConfig, AutoModelForCausalLM
from safetensors.torch import load_file
import json, pathlib

path = ".../NVIDIA-Nemotron-Cascade-2-30B-A3B-BF16"  # or Nano
cfg = AutoConfig.from_pretrained(path)
cfg._attn_implementation = "eager"
m = AutoModelForCausalLM.from_pretrained(path, config=cfg, torch_dtype="bfloat16")

# Compare the on-disk checkpoint value of dt_bias with what actually
# ended up in the loaded model.
idx = json.loads((pathlib.Path(path) / "model.safetensors.index.json").read_text())["weight_map"]
k = "backbone.layers.0.mixer.dt_bias"
on_disk = load_file(f"{path}/{idx[k]}")[k]
in_mem = m.backbone.layers[0].mixer.dt_bias
print((on_disk.float() - in_mem.float().cpu()).abs().max())  # ~26.8 instead of 0
```
This patch makes `_init_weights` honour `_no_reinit` on both `dt_bias` and `out_proj.weight` (the only two params that re-init unconditionally), and sets `_no_reinit = True` on `out_proj.weight` after the initial kaiming scale, so a second pass is a no-op. Ordinary fresh-init training is unaffected; the change only makes repeated invocations idempotent.
Signed-off-by: Min Zhou <minzhou@virtueai.com>
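A minimal, self-contained sketch of the guarded re-init described above, using a toy module (names like `ToyMixer` and `init_mixer_weights` are illustrative, not the actual transformers code):

```python
import math
import torch
import torch.nn as nn

class ToyMixer(nn.Module):
    """Stand-in for the mamba mixer: only the two affected parameters."""
    def __init__(self, d_inner=8, d_model=4):
        super().__init__()
        self.dt_bias = nn.Parameter(torch.empty(d_inner))
        self.out_proj = nn.Linear(d_inner, d_model, bias=False)

def init_mixer_weights(module, n_layer=2):
    # dt_bias: random draw mapped through the inverse of softplus,
    # skipped when a previous pass already flagged the tensor.
    if not getattr(module.dt_bias, "_no_reinit", False):
        dt = torch.empty_like(module.dt_bias).uniform_(1e-3, 0.1)
        inv_softplus = dt + torch.log(-torch.expm1(-dt))
        with torch.no_grad():
            module.dt_bias.copy_(inv_softplus)
        module.dt_bias._no_reinit = True
    # out_proj: kaiming_uniform scaled by 1/sqrt(n_layer), then flagged
    # so the post-load safety pass becomes a no-op.
    if not getattr(module.out_proj.weight, "_no_reinit", False):
        nn.init.kaiming_uniform_(module.out_proj.weight, a=math.sqrt(5))
        with torch.no_grad():
            module.out_proj.weight /= math.sqrt(n_layer)
        module.out_proj.weight._no_reinit = True

m = ToyMixer()
init_mixer_weights(m)            # fresh init (e.g. training from scratch)
snapshot = m.dt_bias.detach().clone()
init_mixer_weights(m)            # second pass (post-load safety pass)
assert torch.equal(snapshot, m.dt_bias.detach())  # values survive
```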
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
# Conflicts:
#	src/transformers/loss/loss_utils.py
Direct merge conflicted after Trainer refactors; applied the minimal config-saving change from 57cb2b9.
All-defects flow status

Processed terminal records: 500
Merged / applied defect records: 204
Rejected / not included records: 296
Cumulative defect fixes from recent Transformers PRs
This PR is generated by the all-defects mergeability flow. It accumulates defect-fix PRs from huggingface/transformers that could be applied cleanly to the current base (flow: `all-defects`; base: `evalstate/transformers:main` at `ae9e74bc2a`).

Status counts
Category counts
Validation
Each applied defect fix was followed by the configured lightweight validation profile:

- `compileall -q src/transformers`
- `utils/checkers.py` (ruff_check, ruff_format, init_isort, sort_auto_mappings)
- `utils/tests_fetcher.py ... && pytest ...` when impacted pytest targets are selected

Note: this is intentionally not an end-to-end or slow-test validation pass.
Details
A detailed status table is posted as a PR comment and is also available locally in:
- `.mergeability/defect-merge-state.jsonl`
- `.mergeability/pr-classifications.jsonl`
- `all-defects-report.md`