Cumulative defect fixes from recent Transformers PRs by evalstate · Pull Request #43 · evalstate/transformers

evalstate · 2026-04-29T22:37:12Z

Cumulative defect fixes from recent Transformers PRs

This PR is generated by the all-defects mergeability flow. It accumulates defect-fix PRs from huggingface/transformers that could be applied cleanly to the current base.

Source branch: all-defects-750
Base: evalstate/transformers:main
Head: 2143f93a30
PRs classified: 762
PRs with terminal state: 762
Applied/merged/already-present defect fixes: 244
Aborted defect fixes: 50
Validation failures reverted: 10
Non-defect skipped: 458

Status counts

aborted: 50
already_present: 51
applied: 43
merged: 150
skipped: 458
validation_failed: 10

Category counts

defect: 304
documentation: 99
feature: 253
other: 106
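The summary figures above reconcile with each other. The three landed statuses (already_present, applied, merged) add up to the 244 applied/merged/already-present fixes. Adding the aborted and validation-failed records gives the 304 defect PRs. The three non-defect categories sum to the 458 skipped PRs. The check below uses numbers copied from this description. The check itself is an illustration only and is not part of the mergeability flow.

```python
# Counts copied from the summary above; this reconciliation is illustrative only.
status_counts = {
    "aborted": 50,
    "already_present": 51,
    "applied": 43,
    "merged": 150,
    "skipped": 458,
    "validation_failed": 10,
}
category_counts = {"defect": 304, "documentation": 99, "feature": 253, "other": 106}

# Defect fixes that ended up in the branch.
landed = status_counts["already_present"] + status_counts["applied"] + status_counts["merged"]
# Every defect PR either landed, was aborted, or was reverted after failing validation.
defect_total = landed + status_counts["aborted"] + status_counts["validation_failed"]
# Everything that is not a defect was skipped.
non_defect_total = sum(n for cat, n in category_counts.items() if cat != "defect")

assert sum(status_counts.values()) == 762  # PRs classified / terminal records
assert landed == 244
assert defect_total == category_counts["defect"] == 304
assert non_defect_total == status_counts["skipped"] == 458
print(landed, defect_total, non_defect_total)  # 244 304 458
```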

Validation

Each applied defect fix was followed by the configured lightweight validation profile:

compileall -q src/transformers
utils/checkers.py ruff_check,ruff_format,init_isort,sort_auto_mappings
utils/tests_fetcher.py ... && pytest ... when impacted pytest targets are selected

Note: this is intentionally not an end-to-end or slow-test validation pass.
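The first step of the profile, `compileall -q src/transformers`, can be reproduced in-process with the standard-library `compileall` module. The sketch below is minimal. The helper name `compile_check` is hypothetical, and the checker and tests_fetcher/pytest steps are not modelled. `quiet=1` corresponds to the CLI's single `-q`: per-file listing is suppressed, but compile errors are still reported.

```python
import compileall
import pathlib
import tempfile


def compile_check(src_dir: str) -> bool:
    """Byte-compile every .py file under src_dir; False if any file fails.

    Hypothetical helper approximating `python -m compileall -q <src_dir>`.
    """
    return bool(compileall.compile_dir(src_dir, quiet=1))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pkg = pathlib.Path(tmp)
        (pkg / "ok.py").write_text("x = 1\n")
        print(compile_check(tmp))  # True: the only file compiles
        (pkg / "broken.py").write_text("def f(:\n")
        print(compile_check(tmp))  # False: broken.py has a syntax error
```

Running it in-process like this avoids spawning a subprocess. The trade-off is that `.pyc` files are written next to the sources, just as the CLI does.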

Details

A detailed status table is posted as a PR comment and is also available locally in:

.mergeability/defect-merge-state.jsonl
.mergeability/pr-classifications.jsonl
all-defects-report.md
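The per-status counts can be recomputed from the JSONL state file. The record schema is not documented here, so the sketch below assumes that each line is one JSON object with its terminal state under a `"status"` key. That key name and the helper `tally_statuses` are hypothetical. If the file holds exactly one terminal record per PR, the tally should line up with the status counts above.

```python
import json
from collections import Counter
from typing import Iterable


def tally_statuses(lines: Iterable[str], field: str = "status") -> Counter:
    """Count JSONL records per status value.

    ASSUMPTION: the status lives under `field` ("status" by default);
    records without it are counted under "<missing>".
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:  # tolerate blank lines
            continue
        record = json.loads(line)
        counts[record.get(field, "<missing>")] += 1
    return counts


# Usage against the local file listed above:
# with open(".mergeability/defect-merge-state.jsonl", encoding="utf-8") as fh:
#     print(tally_statuses(fh).most_common())
```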

@Rocketknight1

Per @Rocketknight1's review: replace the ad-hoc `_no_reinit` flag with the existing `_is_hf_initialized` flag that `from_pretrained` already sets on checkpoint-loaded parameters. Guard each Mamba2 init target (A_log / D / dt_bias) and the residual-scaled `out_proj.weight` independently, so parameters restored from a checkpoint survive any subsequent `_init_weights` pass.
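A minimal sketch of the per-parameter guard described above, written in plain Python so it runs without torch. Only the `_is_hf_initialized` flag name and the four target names come from the text. `FakeParam`, `init_unless_loaded`, `INIT_TARGETS`, and the constant fill values are hypothetical stand-ins, not the real Mamba2 initialisers. The key point is that each target checks the flag on its own, so a loaded `dt_bias` keeps its values while its siblings are still initialised.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FakeParam:
    """Stand-in for torch.nn.Parameter so the sketch runs without torch."""

    data: List[float]
    _is_hf_initialized: bool = False  # set on checkpoint-loaded params


def init_unless_loaded(param: FakeParam, init_fn: Callable[[FakeParam], None]) -> bool:
    """Apply init_fn unless the parameter was restored from a checkpoint."""
    if getattr(param, "_is_hf_initialized", False):
        return False  # keep the loaded values untouched
    init_fn(param)
    return True


def fill(value: float) -> Callable[[FakeParam], None]:
    """Build an initialiser that overwrites every element with `value`."""

    def _fill(p: FakeParam) -> None:
        p.data = [value] * len(p.data)

    return _fill


# Hypothetical placeholder initialisers, NOT the real Mamba2 formulas.
INIT_TARGETS: Dict[str, Callable[[FakeParam], None]] = {
    "A_log": fill(0.0),
    "D": fill(1.0),
    "dt_bias": fill(0.5),
    "out_proj.weight": fill(0.02),
}


def init_weights(params: Dict[str, FakeParam]) -> Dict[str, bool]:
    """Guard each target independently; report which ones were initialised."""
    return {name: init_unless_loaded(params[name], fn) for name, fn in INIT_TARGETS.items()}


if __name__ == "__main__":
    params = {name: FakeParam([9.9, 9.9]) for name in INIT_TARGETS}
    params["dt_bias"]._is_hf_initialized = True  # pretend dt_bias was loaded
    print(init_weights(params))
    # {'A_log': True, 'D': True, 'dt_bias': False, 'out_proj.weight': True}
    print(params["dt_bias"].data)  # [9.9, 9.9]
```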

Fixed 4 test(s):
- tests/models/qianfan_ocr/test_modeling_qianfan_ocr.py::QianfanOCRIntegrationTest::test_model_integration_batched_generate
- tests/models/qianfan_ocr/test_modeling_qianfan_ocr.py::QianfanOCRIntegrationTest::test_model_integration_forward
- tests/models/qianfan_ocr/test_modeling_qianfan_ocr.py::QianfanOCRIntegrationTest::test_model_integration_generate
- tests/models/qianfan_ocr/test_modeling_qianfan_ocr.py::QianfanOCRIntegrationTest::test_model_integration_generate_text_only

Installing only `transformers[serving]` was not enough to launch `transformers serve`, because the `requests` dependency was missing.

Add 'requests' to serving extras dependencies
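To confirm this kind of missing-dependency diagnosis in an environment where only the serving extra is installed, you can check whether `requests` resolves at all. Below is a minimal sketch using the standard-library `importlib.util.find_spec`. The helper name `missing_modules` is hypothetical, and this is not how `transformers serve` itself checks its dependencies.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # An empty list means `requests` is importable in this environment.
    print(missing_modules(["requests"]))
```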

…ined

When a caller passes a pre-built sub-processor via kwargs to `AutoProcessor.from_pretrained` (e.g. `tokenizer=tok` or `bpe_tokenizer=tok`), use the instance directly instead of silently forwarding it into the sub-loader calls. Exact attribute names take precedence; the canonical modality name is also accepted as an alias when a single sub-processor has that modality.
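The resolution rule above can be sketched as a pure function that splits pre-built sub-processors out of the kwargs. Exact attribute names are consumed first. A modality name is then treated as an alias only when exactly one attribute has that modality. The function `split_prebuilt_subprocessors` and the `attributes` mapping (attribute name to modality) are hypothetical illustrations, not the code from huggingface#45627.

```python
from typing import Any, Dict, List, Mapping, Tuple


def split_prebuilt_subprocessors(
    attributes: Mapping[str, str], kwargs: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (prebuilt instances by attribute, kwargs left for the sub-loaders)."""
    remaining = dict(kwargs)
    prebuilt: Dict[str, Any] = {}

    # 1) Exact attribute names take precedence.
    for attr in attributes:
        if attr in remaining:
            prebuilt[attr] = remaining.pop(attr)

    # 2) A modality name is an alias only if a single attribute has that modality.
    by_modality: Dict[str, List[str]] = {}
    for attr, modality in attributes.items():
        by_modality.setdefault(modality, []).append(attr)
    for modality, attrs in by_modality.items():
        if len(attrs) == 1 and attrs[0] not in prebuilt and modality in remaining:
            prebuilt[attrs[0]] = remaining.pop(modality)

    return prebuilt, remaining


if __name__ == "__main__":
    attrs = {"image_processor": "image_processor", "bpe_tokenizer": "tokenizer"}
    tok = object()
    prebuilt, rest = split_prebuilt_subprocessors(attrs, {"tokenizer": tok, "trust_remote_code": True})
    print(list(prebuilt), rest)  # ['bpe_tokenizer'] {'trust_remote_code': True}
```

With two tokenizer-modality attributes, a bare `tokenizer=` kwarg would be ambiguous. In that case it is left in the forwarded kwargs rather than guessed.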

…g logs

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

…from FSDP

…erter

…rleaving renames and converts ops

# Conflicts: # src/transformers/modeling_utils.py

# Conflicts: # src/transformers/loss/loss_utils.py

# Conflicts: # src/transformers/models/gpt2/configuration_gpt2.py

# Conflicts: # src/transformers/models/doge/modeling_doge.py # src/transformers/models/doge/modular_doge.py

Adapted from huggingface#39999 because tensor parallel setup moved from modeling_utils.py to integrations/tensor_parallel.py.

# Conflicts: # src/transformers/trainer.py

Direct merge of 9ec50bc conflicted after Trainer refactors; apply equivalent current-code guard.

Direct merge of ee690bc conflicted after Trainer/TrainingArguments refactors; apply the current-code equivalent for misaligned save/eval steps.

# Conflicts: # setup.py

# Conflicts: # src/transformers/utils/generic.py

evalstate · 2026-04-29T22:37:14Z

All-defects flow status

Processed terminal records: 762

Merged / applied defect records (244)

PR	Status	Method	Validation	Original PR summary / goal	Merge note
huggingface#45692	merged	merge	passed	[Fix Phi4 test] Add option to override image_processor_auto_map with local code when trust_remote_code is True	clean merge; validation passed with bounded light pytest targets
huggingface#45691	merged	merge	passed	[serve] cb error	clean merge; validation passed with bounded light pytest targets
huggingface#45687	merged	merge	passed	fix: Made histc_input robust for broader hardware	clean merge; validation passed with bounded light pytest targets
huggingface#45686	merged	merge	passed	Fix custom-module copies inheriting read-only permissions	clean merge; validation passed with bounded light pytest targets
huggingface#45683	merged	merge	passed	Exclude audio modules from conversion process	clean merge; fixed PR-local whitespace/formatting and validation passed
huggingface#45682	merged	merge	unavailable	FIX Restore LoRA hotswapping functionality	merge clean; compile and style passed; light validation unavailable because torchcodec cannot load without system FFmpeg libraries
huggingface#45681	already_present	none	not_run	Restore TokenizersBackend override for DeepSeek V3/R1 tokenizer dispatch	current branch already has equivalent DeepSeek TokenizersBackend override and regression coverage; PR conflicts only because it edits the same implemented block
huggingface#45678	merged	merge	unavailable	Fix shared config mutation issue in flash_attn_from_config	merge clean; compile and style passed; light validation unavailable because torchcodec cannot load without system FFmpeg libraries
huggingface#45671	merged	merge	unavailable	Update latest revision for Phi-4-multimodal test	merge clean; compile and style passed; light validation unavailable because torchcodec cannot load without system FFmpeg libraries
huggingface#45670	merged	merge	unavailable	[nit] glmasr should be in AutoModelForMultimodalLM	merge clean; compile and style passed; light validation unavailable because torchcodec cannot load without system FFmpeg libraries
huggingface#45662	merged	merge	passed	Fix EP + FSDP2: experts silently overwritten by rank-0 broadcast	clean merge; validation passed with bounded light pytest targets
huggingface#45661	merged	merge	passed	[Weight Converter] More fine-grained mappings on classes, scoping for every transforms (including weight converter)	clean merge; validation passed with bounded light pytest targets
huggingface#45649	merged	merge	passed	Fix OOM regression for FSDP2 + cpu_ram_efficient_loading on large models	clean merge; compile/style and light validation passed (202 pytest passed, 107 skipped)
huggingface#45645	merged	merge	passed	Fix xdist collisions for captured_info artifacts and preserve CI debug logs	clean merge; compile/style and light validation passed (202 pytest passed, 107 skipped)
huggingface#45642	merged	merge	passed	Fix trust_remote_code local cache collisions for local models (huggingface#45632)	merge required an additive test-file conflict resolution with prior dynamic-module test coverage; validation passed
huggingface#45639	already_present	none	not_run	Make patched testing debug logs xdist-safe	equivalent xdist-safe captured_info path/reset behavior already merged via huggingface#45645; direct merge conflicted only with the broader implementation/tests
huggingface#45694	merged	merge	passed	Fix train_batch_size and eval_batch_size to respect split_batches config	merge required PR-local ruff formatting; validation passed
huggingface#45627	merged	merge	passed	Processing Utils: honor pre-built sub-processor kwargs in from_pretrained	clean merge; validation passed with bounded light pytest targets
huggingface#45615	merged	merge	passed	fix(qianfan_ocr): add XPU expectations	clean merge; validation passed with bounded light pytest targets
huggingface#45697	applied	cherry-pick	passed	fix(testing): check torch.cuda.is_available() before get_device_capability	cherry-picked PR fix to avoid dragging unrelated upstream/main commits; removed stale local import so ruff passed; validation passed
huggingface#45614	merged	merge	passed	Add missing requests dependency to transformers[serving]	clean merge; fixed PR-local setup.py formatting and validation passed
huggingface#45594	merged	merge	passed	fix(utils): Resolve backbone utils test regressions	clean merge; validation passed with bounded light validation
huggingface#45591	merged	merge	passed	[nemotron_h] respect _no_reinit flag on dt_bias and out_proj.weight	clean merge; compile/style and bounded light validation passed
huggingface#45578	merged	merge	passed	Remove attribute_map from GptOssConfig	clean merge; compile/style and bounded light validation passed (202 pytest passed, 103 skipped)
huggingface#45570	merged	merge	passed	Fix whisper long-form generation when eos_token_id is a list	clean merge; compile/style and bounded light validation passed (202 pytest passed, 103 skipped)
huggingface#45568	merged	merge	passed	Gemma4: fix failed test cases	clean merge; compile/style and bounded light validation passed (202 pytest passed, 103 skipped)
huggingface#45552	merged	merge	passed	Remove warnings for modernbert	clean merge; compile/style and bounded light validation passed (202 pytest passed, 103 skipped)
huggingface#45549	merged	merge	passed	fix: apply channel averaging correctly in audio feature extractors	clean merge; compile/style and bounded light validation passed (202 pytest passed, 103 skipped)
huggingface#45548	merged	merge	passed	Fix EP + DeepSpeed ZeRO-3 loading via accelerate launch	clean merge; removed duplicate has_ep introduced by overlap with cumulative branch and validation passed
huggingface#45541	merged	merge	passed	Fix local_files_only tokenizer fallback when tokenizer files are missing (Issue 45538)	clean merge; validation passed
huggingface#45524	merged	merge	passed	utils: handle flash_attn missing from importlib packages_distributions without crashing	clean merge; validation passed
huggingface#45523	merged	merge	passed	Fix Seq2SeqLM ExecuTorch export: add encoder_attention_mask to decoder and use static encoder shapes	clean merge; validation passed
huggingface#45487	merged	merge	passed	Fix model parallel issue for altclip model and ChineseClip model	clean merge; validation passed
huggingface#45423	merged	merge	passed	Fix void segmentation map label reduction	clean merge; validation passed
huggingface#45422	merged	merge	passed	Drop from messages in	clean merge; validation passed
huggingface#45413	merged	merge	passed	Fix EtaLogitsWarper on fully masked logits	clean merge; validation passed
huggingface#45389	merged	merge	passed	Require input_ids for repetition penalty	clean merge; validation passed
huggingface#45379	merged	merge	passed	fix(config): add deepstack_visual_indexes to Qwen3_5MoeVisionConfig	clean merge; validation passed
huggingface#45378	merged	merge	passed	fix(mistral): guard ReasoningEffort import for older mistral_common versions	clean merge; ruff format applied to changed file and amended; validation passed
huggingface#45360	merged	merge	passed	Replace deprecated references with	clean merge; validation passed
huggingface#45351	already_present	none	not_run	fix(testing_utils): guard get_device_capability with torch.cuda.is_available()	current branch already guards CUDA/ROCM get_device_capability with torch.cuda.is_available(); direct PR merge only conflicted on later get_device_properties refactor/XPU availabil…
huggingface#45346	merged	merge	passed	Fix Double Application of Softmax for Router Logits in MoE models	clean merge; validation passed
huggingface#45342	merged	merge	passed	Use `_keys_to_ignore_on_load_unexpected/missing` recursively from children	clean merge; ruff format applied to changed file and amended; validation passed
huggingface#45321	merged	merge	passed	Remove references to torchao's AffineQuantizedTensor	merged with trivial import conflict resolved by removing obsolete AffineQuantizedTensor import; validation passed
huggingface#45317	merged	merge	passed	Fix AttributeError in _patch_mistral_regex when fix_mistral_regex=True	clean merge; validation passed
huggingface#45221	applied	patch	passed	user friendly error when loading audio from video	applied narrow video-container error check for audio loading; direct PR-head merge would drag hundreds of unrelated old-main changes
huggingface#45202	merged	merge	passed	Fix gemma4 has flash-attention incompatible head-dim=512	merged Gemma4 FlashAttention guard fix; validation passed
huggingface#45193	merged	merge	passed	Config can apply pydantic validation without torch-dependence	merged pydantic type-validator fix; validation passed
huggingface#45170	applied	patch	passed	`layrnorm` -> `layernorm`	applied CLIP-like pre_layernorm typo fix; validation passed
huggingface#45147	applied	patch	passed	Fix broken HQQ support	applied HQQ quantization and serialization fixes; direct merge avoided because PR head would bring 734 unrelated upstream-history files
huggingface#45128	applied	patch	passed	Fix: handle future annotations in _process_kwargs_parameters	applied future-annotations auto_docstring crash fix and amended formatting; direct merge avoided because PR head diff included 1049 unrelated upstream-history files
huggingface#45114	applied	patch	passed	fix: lets fix all doctests	applied documentation doctest fixes; resolved a simple one-line markdown conflict; no pytest targets selected
huggingface#45105	applied	patch	passed	Fix @auto_docstring crash with from future import annotations in _process_kwargs_parameters	applied focused auto_docstring string-annotation fix and tests; validation passed
huggingface#45086	merged	merge	passed	fix AttributeError in _patch_mistral_regex	merged tokenizer regex AttributeError fix; compileall, repo checkers, and light validation passed
huggingface#45060	merged	merge	passed	Fix PIL backend fallback when torchvision is unavailable	merged PIL backend fallback fix; compileall, repo checkers, and light validation passed
huggingface#45056	merged	merge	unavailable	[`auto_docstring`] needs to be only run on doc	merged auto_docstring lazy doc fix with straightforward conflict resolution; compileall and repo checkers passed, light validation timed out as baseline did
huggingface#45055	applied	patch	passed	Save model config in Trainer checkpoints for non-PreTrainedModel models	applied narrow Trainer _save config persistence fix; direct merge conflicted across trainer refactor, but patch was localized and validation passed
huggingface#45034	merged	merge	passed	Pass packed boundary metadata to Qwen3.5 linear-attention fast kernels from data collator	merged Qwen3.5 packed boundary metadata fix; resolved conflicts by combining prior cached multi-token fix with seq_idx/cu_seqlens fast-path changes; validation passed
huggingface#45017	merged	merge	passed	[WIP][Fix] GLM 5 set `apply_rotary_pos_emb` to `is_neox_style=False` && remove `F.relu()`	merged GLM-5 DSA rotary/FP8 indexer fix; resolved modular/generated cache/import conflicts and validation passed
huggingface#44981	merged	merge	unavailable	Trainer: set skip_logits for loss-only eval when liger enabled	merged Trainer skip_logits loss-only eval fix; compileall and repo checkers passed, light validation timed out as baseline did
huggingface#44973	merged	merge	unavailable	Fix max_seqlen type in vision attention for torch.compile + FA2	merged vision attention max_seqlen .item() fix; compileall and repo checkers passed, light validation timed out as baseline did
huggingface#44958	merged	merge	unavailable	fixed import error with PILImageResampling	merged PILImageResampling import error fix; amended auto-fix for unused import; compileall and repo checkers passed, light validation timed out as baseline did
huggingface#44952	merged	merge	unavailable	Fix: Add correct return behaviour when output_hidden_states=True for CLIP and SIGLIP vision models	merged CLIP/SigLIP hidden_states output fix; compileall and repo checkers passed, light validation timed out as baseline did
huggingface#44940	merged	merge	unavailable	Fix tie_weights skipping logic is not tied to model thread scope	merged tie_weights thread-scope fix; resolved initialization.py overlap with meta_device_safe_creation_ops; compileall and repo checkers passed, light validation timed out as base…
huggingface#44923	merged	merge	unavailable	fix: avoid unconditional model_info call in _patch_mistral_regex	merged offline/local _patch_mistral_regex model_info guard; resolved is_local overlap and amended formatting; compileall and repo checkers passed, light validation timed out as ba…
huggingface#44907	merged	merge	unavailable	Remove unnecessary expand_as in get_placeholder_mask across VLMs	merged VLM placeholder mask broadcast/compile fix; resolved ColQwen2 visual API overlap while keeping current self.vlm.visual call; compileall and repo checkers passed, light vali…
huggingface#44893	merged	merge	unavailable	add `StaticLayer.crop()` to match `DynamicLayer` API	merged StaticLayer.crop API fix; compileall and repo checkers passed, light validation timed out as baseline did
huggingface#44889	merged	merge	unavailable	[DeepSpeed] Fix evaluate()/predict() before train()	merged DeepSpeed evaluate-before-train fix; compileall and repo checkers passed, light validation timed out as baseline did
huggingface#44836	merged	merge	unavailable	Add cu_seqlens support to OlmoHybridGatedDeltaNet for packed sequences	merged OlmoHybrid packed-sequence cu_seqlens fix; compileall and repo checkers passed, light validation timed out as baseline did
huggingface#44827	merged	merge	unavailable	Fix Mistral4 tests	merged Mistral4 grouped-mm alignment/test fix; compileall and repo checkers passed, light validation timed out as baseline did
huggingface#44781	applied	cherry-pick	passed	Fix `_set_model_specific_special_tokens` to accept list-format `extra_special_tokens`	cherry-picked targeted tokenizer fix; direct PR-head merge would drag unrelated upstream history and conflicted on tests/test_tokenization_common.py; validation passed after autof…
huggingface#44731	merged	merge	unavailable	[Tests] Fix slow video tensor creation from list of numpy arrays in SmolVLM	merge clean; compileall and repo checkers passed, but light validation/tests_fetcher timed out as baseline light validation did
huggingface#44697	applied	cherry-pick	passed	fix: torch_float should return float, not int	cherry-picked targeted torch_float fix; compileall, repo checkers, and light validation passed (255 passed, 102 skipped)
huggingface#44680	merged	merge	passed	Allow kernel modules to declare their preferred mask function	merge clean; validation commands passed
huggingface#44664	merged	merge	passed	🚨 Generic Sequence Classifier works for multimodal models	merge clean; validation commands passed
huggingface#44641	merged	merge	passed	Conditionally passing and_mask_function arg to create_causal_mask	merge clean; compileall, repo checkers, and light validation passed
huggingface#44626	already_present	none	not_run	don't break legacy behavior when enforced!	PR head is already contained in the cumulative branch; no merge was needed
huggingface#44615	merged	merge	unavailable	Restore is_torch_fx_available for trust_remote_code backwards compatibility	merge clean; compileall and repo checkers passed; light validation unavailable because tests_fetcher timed out as in baseline
huggingface#44606	merged	merge	unavailable	optionally override tokenizer class with serialized tokenizer	merge clean; compileall and repo checkers passed; light validation unavailable because tests_fetcher timed out as in baseline
huggingface#44603	merged	merge	unavailable	fixed dockerfile for arm64 systems	merge clean; compileall and repo checkers passed; light validation unavailable because tests_fetcher timed out as in baseline
huggingface#44587	merged	merge	unavailable	Fix: Handling fused qkv result tensor slicing for tp sharded qkv weights	merge clean; narrowed ruff format was amended into merge commit; compileall and repo checkers passed; light validation unavailable because tests_fetcher timed out as in baseline
huggingface#44585	merged	merge	unavailable	Fix missing rms_norm_eps in DeepseekV3 MLA layernorms	merge clean; compileall and repo checkers passed; light validation unavailable because tests_fetcher timed out as in baseline
huggingface#44535	already_present	none	unavailable	Fix crash in Qwen2_5_VLProcessor when using batched input with padding=False	current branch already avoids the ragged np.array crash via ProcessorMixin.create_mm_token_type_ids iterating per input; direct merge conflicted with the newer helper-based implem…
huggingface#44385	merged	merge	unavailable	Fix make check-repo	merge clean; validation unavailable: baseline and post-merge light validation timed out in tests_fetcher
huggingface#44270	merged	merge	unavailable	Add correct typing to custom images_kwargs in ProcessorsKwargs	merge clean after resolving Ernie VL Moe naming conflict and narrow import-order fixes; light validation unavailable because tests_fetcher timed out as in baseline
huggingface#44257	merged	merge	unavailable	use nanmean for aggregating loss	merge clean; compileall and style passed; light validation unavailable because tests_fetcher timed out as in baseline
huggingface#44228	merged	merge	unavailable	[Quantisation] account for nested tensors from quantisers	merge clean; compileall and style passed after removing transient untracked validation checkout blockers; light validation unavailable because tests_fetcher timed out as in baseli…
huggingface#44189	merged	merge	unavailable	fix: don't move model to device under other dist train backends	merge resolved by carrying the current full-eval move block and adding the distributed-backend guards; compileall and style passed; light validation unavailable because tests_fetc…
huggingface#43989	merged	merge	unavailable	Fix AutoVideoProcessor class lookup when torchvision is unavailable	merged AutoVideoProcessor None-mapping guard; resolved current string-valued mapping conflict; compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43967	merged	merge	unavailable	Fix AttributeError in run_classification.py when detecting multi-label data	merge clean; compileall and repo checkers passed; removed transient tests_fetcher checkout blockers, then light validation timed out as in baseline
huggingface#43961	merged	merge	unavailable	Replace mutable default arguments with None	merged mutable-default cleanup; resolved Idefics defaults conflict; compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43911	merged	merge	unavailable	add Llama to mapping names in tokenization_auto.py	merge clean; compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43875	merged	merge	unavailable	Improve handling of QuantizedLayer.reset	merge clean; compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43842	already_present	none	unavailable	fix(cli): Fix TypeAdapter NameError when pydantic is not installed	serve CLI has been refactored to cli/serving modules and no current TypeAdapter runtime annotation/import remains; direct old-file merge conflicted, so no code change needed
huggingface#43836	already_present	none	unavailable	fix: wrapped TypeAdapter in string literals (for now)	same TypeAdapter serve import defect as huggingface#43842; current refactored serve entrypoint no longer has runtime TypeAdapter annotations, and old-file merge conflicted without needing a …
huggingface#43833	merged	merge	unavailable	fix: ensure dtype consistency in grouped_mm under autocast	merge clean; compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43826	merged	merge	unavailable	fix: error message of pipeline	resolved pipeline check_task conflict by retaining translation alias handling; compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43816	already_present	none	unavailable	fix: add id and resume parameters to SwanLab integration	current SwanLab callback already supports SWANLAB_RUN_ID and SWANLAB_RESUME, with an additional resume_from_checkpoint default; old merge conflicted only in the same already-porte…
huggingface#43779	merged	merge	unavailable	SwanLab: Add support for id and resume arguments in SwanLabCallback	merge clean on top of existing SwanLab env-var resume support; compileall and repo checkers passed, removed transient tests_fetcher checkout blockers, then light validation timed …
huggingface#43775	merged	merge	unavailable	fix(moe): normalize auxiliary loss by top_k for correct load balancing	merge clean; compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43757	already_present	none	unavailable	Avoid hard failure for gpt-oss GGUF architecture by falling back to g…	current GGUF loader has native gpt_oss/gpt-oss mapping and GptOssTensorProcessor support; old fallback-to-gpt-neox change conflicts with the newer support, so no code change was n…
huggingface#43747	merged	merge	unavailable	Remove CompressedLinear support for compressed-tensors > 0.13	resolved compressed-tensors 0.14 conflict by combining newer QuantizationStatus-based tests with automatic decompression; compileall and repo checkers passed, removed transient do…
huggingface#43656	already_present	none	unavailable	Fix TypeAdapter NameError in transformers CLI	same TypeAdapter serve import defect as huggingface#43842/huggingface#43836; current serve CLI uses Typer/Annotated and no TypeAdapter runtime annotation remains
huggingface#43654	merged	merge	unavailable	fix(tokenizer): Avert special token property overwrites in batch add_tokens calls	resolved tokenizer test conflict by retaining protobuf tests and adding BigBird mask-token regression; compileall and repo checkers passed, removed transient docs checkout blocker…
huggingface#43651	merged	merge	unavailable	Add _loss_is_scaled_for_ga to allow custom trainers to control gradient accumulation loss scaling	merge clean; compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43549	merged	merge	unavailable	[kernels] exception handling for fa kernels	merge clean; compileall and repo checkers passed, removed transient docs checkout blockers, then light validation timed out as in baseline
huggingface#43543	merged	merge	unavailable	Fix fp16 underflow in MoE load balancing loss by enforcing fp32 softmax	merge clean; compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43542	already_present	none	unavailable	fix: output router capture wrong router logits in qwen moe models	current MoE router implementations already preserve raw router_logits in a separate variable and use router_probs for top-k routing; PR merge only conflicted on a variable rename …
huggingface#43492	merged	merge	unavailable	Perception Encoder follow up PR	merge conflicted only with newer strict config/conversion mapping layout; resolved locally, compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43466	merged	merge	unavailable	Fix mask loss to ignore padding areas in object detection	merge clean; compileall and repo checkers passed, light validation unavailable because, after transient checkout blockers were cleaned up, the rerun timed out as in baseline
huggingface#43395	merged	merge	unavailable	Fix label truncation for per-sample nested structures in Trainer	merge conflict was limited to preserving both trainer_pt_utils imports; compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43382	merged	merge	unavailable	Allow Path type in transformers.image_utils.load_image function	resolved moved-on import/signature conflict to allow pathlib.Path in load_image; compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43378	merged	merge	unavailable	feat(models): Make MimiModel encoding padding-aware to ensure batch-to-individual consistency	clean merge; compileall and repo checkers passed; light validation first hit transient tests_fetcher checkout blockers, then timed out as in baseline after cleanup
huggingface#43291	merged	merge	unavailable	Fix whisper tests	clean merge; compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43270	merged	merge	unavailable	fix _retrieve_segment timestamps offset bug	clean merge; fixed PR-introduced trailing whitespace/formatting and amended merge commit; compileall and repo checkers passed; light validation hit transient tests_fetcher checkou…
huggingface#43254	merged	merge	unavailable	Add supported kwargs to fixed_cross_entropy	clean merge; compileall and repo checkers passed; light validation unavailable because tests_fetcher timed out as in baseline
huggingface#43251	already_present	none	unavailable	Fix(43240): pass kwargs to nn.functional.cross_entropy	same issue and API change already merged from PR 43254; direct merge only conflicted on equivalent formatting in loss_utils.py, so no additional code change needed
huggingface#43238	applied	patch	unavailable	Fix ObjectDetectionPipeline batch processing bug huggingface#31356	patch applied cleanly; compileall and repo checkers passed; light validation unavailable due to tests_fetcher reverse dependency map error
huggingface#43212	merged	merge	unavailable	Add regression test for offline tokenizer loading (fixes huggingface#43200)	clean merge; compileall and repo checkers passed; light validation unavailable because tests_fetcher timed out as in baseline
huggingface#43151	merged	merge	unavailable	Make TF32 tests hardware-aware for PyTorch 2.9+	clean merge; compileall and repo checkers passed; light validation unavailable because tests_fetcher timed out as in baseline
huggingface#43149	already_present	none	unavailable	docs(serving): add minimal Python client examples for chat completion…	stop_strings serving fix is already present in the refactored serving handlers; direct merge conflicts with the newer split serving implementation and deleted docs page
huggingface#43133	applied	patch	unavailable	Fix flaky SAM-HQ integration tests by adding set_seed	patch applied; compileall and repo checkers passed; light validation unavailable due to tests_fetcher reverse dependency map error
huggingface#43096	applied	patch	unavailable	Fix save_pretrained for quantized models with custom serialization	patch applied; compileall and repo checkers passed; light validation unavailable due to tests_fetcher reverse dependency map error
huggingface#43028	applied	patch	unavailable	Fix default interpolation to BICUBIC for ViT, EfficientNet, PVT	ViT image processor defaults patched to BICUBIC; EfficientNet/PVT were already BICUBIC; compileall/checkers passed, light validation unavailable due to tests_fetcher reverse dependen…
huggingface#43015	merged	merge	unavailable	FIX: TF32 warning (huggingface#43012)	clean merge replacing RoPE outer-product matmul with broadcast multiplication; compileall/checkers passed, light validation unavailable due to tests_fetcher timeout
huggingface#42979	merged	merge	unavailable	Fix dtype mismatch in modeling_llava_next	clean merge casting LlavaNext hidden states to lm_head dtype before logits; compileall/checkers passed, light validation unavailable due to tests_fetcher timeout
huggingface#42942	merged	merge	unavailable	Fix result retrieval starvation and terminate request-scoped iteration on completion	merged starvation-safe queue scanning and request iterator completion/cancellation handling; conflict resolved for current OutputRouter API; compileall/checkers passed, light vali…
huggingface#42916	already_present	none	not_run	Fix: Apply clean_up_tokenization_spaces in TokenizersBackend._decode	current branch already applies clean_up_tokenization_spaces in TokenizersBackend._decode, with additional BPE safeguards; PR head conflicted only with that evolved implementation
huggingface#42900	merged	merge	unavailable	Fix: Set clean_up_tokenization_spaces	merged default clean_up_tokenization_spaces=True while preserving current BPE override guard; compileall/checkers passed, light validation unavailable due tests_fetcher timeout
huggingface#42881	merged	merge	unavailable	[GGUF] Add attn_logit_softcapping to Gemma2/Gemma3 config mapping	merge clean; compileall/checkers passed; light validation unavailable because tests_fetcher timed out after 90s (exit 75)
huggingface#42865	merged	merge	unavailable	Raise explicit error when FP8 is requested via from_config	merge conflict in modeling_utils.py resolved by keeping current name_or_path assignment and adding quantization_config guard; compileall/checkers passed; light validation unavaila…
huggingface#42824	applied	patch	unavailable	Fix torch only support for fast Processors	applied narrower MLX BatchFeature support; compileall/checkers passed; light validation unavailable because tests_fetcher failed on missing generated fast image processor map entry
huggingface#42793	merged	merge	unavailable	[Quantization] Fixing last issues for a green 2026 CI hopefully	merge clean; compileall/checkers passed; light validation unavailable because tests_fetcher timed out after 90s (exit 75)
huggingface#42774	already_present	none	not_run	Fix: add None check for extractors in video_processor_class_from_name	current branch already skips None VIDEO_PROCESSOR_MAPPING_NAMES entries before comparing the requested video processor class; PR head conflicted only because it targeted the older…
huggingface#42717	merged	merge	unavailable	image_transforms: fix tensor annotations	merge clean; compileall/checkers passed; light validation unavailable because tests_fetcher timed out after 90s (exit 75), matching baseline
huggingface#42631	merged	merge	unavailable	Make GraniteMoeHybridModel compatible with torch.export	merged with a local conflict resolution adapting the torch.export guard to the current past_key_values-based mask helper; compileall/checkers passed; light validation unavailable …
huggingface#42598	merged	merge	unavailable	[pipeline] fix unwanted import failure	merge clean; compileall and repository checkers passed; light validation unavailable because tests_fetcher timed out after 90s (exit 75), matching baseline
huggingface#42542	already_present	none	passed	handle get_input_embeddings() on models like gemma3 gracefully	current branch already catches NotImplementedError from get_input_embeddings and skips modules without hook support; no code change needed
huggingface#42521	merged	merge	unavailable	Fix FSDP2 defaulting to version 1 in TrainingArguments; add dynamic plugin param passthrough	merged with local conflict resolution against current FSDP plugin construction; compileall and repository checkers passed; light validation unavailable because tests_fetcher timed…
huggingface#42493	merged	merge	unavailable	fix: remove trailing os sep in local pretrained model path	merged with local test conflict resolution; removed stale untracked files that blocked tests_fetcher; compileall and repository checkers passed; light validation unavailable becau…
huggingface#42446	merged	merge	unavailable	Fix DataParallel dtype access in smolvlm	merge clean; compile and style passed; light validation unavailable because tests_fetcher timed out with exit 124, matching baseline
huggingface#42311	merged	merge	unavailable	Fix: Guard against None num_query_tokens in Blip2Processor (to avoid TypeError)	merge completed with a small Blip2Processor conflict resolution; compileall and repository checkers passed; light validation unavailable because tests_fetcher timed out after 90s,…
huggingface#42228	merged	merge	unavailable	Support .to(device) or Device Aware Handling for Segmentation Labels in EOMTImageProcessor huggingface#42205	merge clean; compileall and repository checkers passed; light validation unavailable because tests_fetcher timed out after 90s, matching baseline
huggingface#42133	merged	merge	unavailable	Fix qwen moe Load balancing loss calculation outside training	merge clean; compileall and repository checkers passed after removing untracked validation checkout leftovers; light validation unavailable because tests_fetcher timed out after 9…
huggingface#42131	already_present	none	not_run	Mistral Tokenizer Converter Script - Initialization with Pattern Argument	PR conflicted only because current branch already extracts the Mistral Tekken pattern and passes it to MistralConverter while using the newer PreTrainedTokenizerFast path; no code…
huggingface#42130	already_present	none	not_run	Try refactoring logging to make type checks pass	current branch already resolves the logging typing issue with TransformersLogger typing and explicit type ignores on logger method injection; the older PR subclass refactor confli…
huggingface#42127	merged	merge	unavailable	Standardize conv len function for audio models	merge required straightforward import-conflict resolution preserving current deprecation/flash-attention imports plus new _conv_out_length helper; compileall and repository checke…
huggingface#42098	merged	merge	unavailable	Fix mel length computation in Qwen2-Audio	merge required straightforward docstring conflict resolution and applied odd-length mel sequence formula fix; compileall and repository checkers passed; light validation unavailab…
huggingface#42051	merged	merge	unavailable	Fix model_input_names singleton issue causing shared state	merge clean; compileall and repository checkers passed after removing untracked validation artifacts; light validation unavailable because tests_fetcher timed out after 90s, match…
huggingface#41980	already_present	none	not_run	Correct type hint in config models	current branch already has rms_norm_eps annotated as float in the affected configuration and modular files
huggingface#41973	merged	merge	unavailable	Fix import error with huggingface_hub v1.0.0+	resolved import-only conflicts to use huggingface_hub.errors; compileall and repository checkers passed after narrow ruff fix and removing untracked validation artifacts; light va…
huggingface#41928	merged	merge	unavailable	fix: add clear error message when mistral-common is missing for AutoTokenizer loading Voxtral	resolved AutoTokenizer conflict by adding Voxtral mistral-common availability check to current v5 tokenizer mapping path; compileall and repository checkers passed after narrow im…
huggingface#41904	applied	patch	unavailable	Fix inaccurate eval and train loss computation with variable batch sizes	applied focused loss-weighting patch after direct merge conflicted in moved Trainer utility/save_model region; compileall and repository checkers passed; light validation unavaila…
huggingface#41901	merged	merge	unavailable	[executorch] Update pytree registration for DynamicCache	clean merge; compileall and repository checkers passed; light validation unavailable because tests_fetcher timed out after 90s, consistent with baseline validation unavailability
huggingface#41879	already_present	none	unavailable	Fix/processor multiple tokenizers	direct merge conflicted in processing_utils.py, but the cumulative branch already excludes tokenizer attributes before deepcopy, saves secondary tokenizers in per-attribute subfol…
huggingface#41855	already_present	none	unavailable	Add Mistral tokenizer missing methods	direct merge conflicted in tokenization_mistral_common.py docs/code from an older branch, but the cumulative branch already has get_vocab, convert_ids_to_tokens, convert_tokens_to…
huggingface#41851	already_present	none	unavailable	Fix deepcopy in ProcessorMixin.to_dict for GemmaTokenizerFast	same head as PR 41879; direct merge conflicted in processing_utils.py, but the cumulative branch already avoids deepcopying tokenizer attributes and has the multiple-tokenizer sav…
huggingface#41844	applied	patch	unavailable	Fix FSDPv2 checkpoint saving on TPU by using recursive unwrap	direct merge conflicted after TPU saving moved to integrations/tpu.py; adapted recursive FSDPv2 unwrap fix there; compileall and repository checkers passed; light validation unava…
huggingface#41827	applied	patch	unavailable	[`Flash Attention`] Disable packed sequences with pos ids only during torch compile	direct merge conflicted in evolved FlashAttention utility; applied compile/tracing guard for packed position ids; compileall and repository checkers passed; light validation unava…
huggingface#41754	applied	patch	unavailable	Add pytree registration for static cache	direct merge conflicted in executorch cache export code; adapted generalized DynamicCache and StaticCache pytree registration to current cache layer classes; compileall and reposi…
huggingface#41734	merged	merge	unavailable	Fix CUDA errors in sharded generation with Qwen3	direct merge conflicted in generation/utils.py but was resolved by placing the sharded sampling invalid-value guard after current generation-config handling; compileall and reposi…
huggingface#41724	merged	merge	unavailable	Fix confusing warning in EncoderDecoderModel when creating decoder_input_ids from labels	direct merge conflicted in EncoderDecoderModel mask creation but was resolved by preserving the current mask dtype path and adding the PR warning split; compileall and repository …
huggingface#41721	merged	merge	unavailable	Fix Qwen3-VL Processor flattening multi-image batches (fix huggingface#41709)	clean merge; removed unrelated untracked legacy/generated artifacts that were making ruff scan obsolete files; compileall and repository checkers passed; light validation unavaila…
huggingface#41718	already_present	none	unavailable	AutoTokenizer: clear ImportError when loading Voxtral without mistral-common + unit test	direct merge conflicted in auto tokenization and the Voxtral tokenizer test, but the cumulative branch already has the clear mistral-common ImportError guard and matching unit tes…
huggingface#41701	merged	merge	unavailable	Fix qwen3_vl mix precision dtype	clean merge; compileall and repository checkers passed; light validation unavailable as at baseline
huggingface#41698	merged	merge	unavailable	Fix tokenizer check script: safe dataset access, default checkpoints, and tested in dry-run mode	direct merge conflicted in scripts/check_tokenizers.py debug-output removal; resolved with the PR dry-run/default-checkpoint changes and amended narrow ruff fixes; compileall and …
huggingface#41687	merged	merge	unavailable	fix(data): Handle integer labels in DataCollatorWithFlattening	direct merge conflicted in DataCollatorWithFlattening and the refactored data-collator tests; adapted scalar-label broadcasting to the current collator API and added a focused tes…
huggingface#41631	already_present	none	not_run	Incorrect access of dataset field fixed	current scripts/check_tokenizers.py already accesses XNLI premise and hypothesis directly and keeps the newer dry_run handling, so the PR fix is present without merging the older …
huggingface#41609	already_present	none	not_run	Fix gemma gguf tokenizer	current AutoTokenizer GGUF path loads config from GGUF and maps gemma model types to GemmaTokenizer, so the intended Gemma GGUF tokenizer-class fix is already covered; direct PR b…
huggingface#41606	already_present	none	not_run	fix(processing): Filter kwargs in ProcessorMixin call to prevent Type…	current ProcessorMixin.call already merges kwargs by modality and filters them against each sub-processor call signature before dispatch, so the TypeError fix is present; …
huggingface#41592	already_present	none	not_run	Improve AutoTokenizer error message for Voxtral models missing mistral-common	current AutoTokenizer already raises a clear ImportError for model_type 'voxtral' when mistral-common is unavailable, covering the intended missing-dependency error-message fix
huggingface#41521	merged	merge	unavailable	Fix forced_bos_token_id not set in generation_config	merge clean; compileall and repository checkers passed, but light validation unavailable as in baseline: tests_fetcher timed out after 90s (exit 124) and run-light-validation exit…
huggingface#41485	applied	patch	unavailable	Fix smolvlm2 dtype mismatch final	applied minimal dtype conversion in SmolVLM inputs_merger; compileall passed, but full validation unavailable as in baseline due to pre-existing ruff failures in untracked benchmark_…
huggingface#41441	merged	merge	unavailable	Enhance the handling of Union types in HfArgumentParser	merge clean; compileall passed, but full validation unavailable as in baseline due to pre-existing ruff failures and tests_fetcher checkout failure on untracked files
huggingface#41330	merged	merge	unavailable	Unskip and fix offline mode tests, use HF_HUB_OFFLINE, make hermetic	merge completed with one test_offline conflict resolved to current transformers.utils is_offline_mode semantics; compileall passed, full validation unavailable as in baseline due …
huggingface#41319	already_present	none	not_run	Use torch._check instead of a test in Gemma3Model	current Gemma3Model already implements the PR intent with torch_compilable_check for the image token/feature validation, so the older torch._check patch conflicts only because the…
huggingface#41313	merged	merge	unavailable	Jitter noise PR	merge completed after resolving router conflict to preserve current classifier dtype casting while applying jitter only to a routing copy; compileall passed, full validation unava…
huggingface#41312	already_present	none	not_run	tests: unskip and fix offline mode test using HF_HUB_OFFLINE + hermetic cache warmup	offline-mode test is already unskipped and uses HF_HUB_OFFLINE/cache warmup after merged PR huggingface#41330, which supersedes this narrower duplicate fix
huggingface#41273	already_present	none	not_run	fix(quantization): Skip weight initialization for quantized models	current v5 loading path already removes loaded quantized targets from missing_keys and marks them _is_hf_initialized in core_model_loading, superseding this older modeling_utils/H…
huggingface#41239	already_present	none	not_run	Add num_hidden_layers to t5gemma's top level config	current T5GemmaConfig already exposes top-level num_hidden_layers and uses it to build both encoder and decoder layers
huggingface#41169	merged	merge	unavailable	Fix TorchDynamo crash in StaticCache by validating offloading and offload_only_non_sliding arguments	merge clean; compileall passed, full validation unavailable as in baseline due to pre-existing untracked benchmark/examples lint failures and tests_fetcher checkout blockers
huggingface#41132	applied	cherry-pick	unavailable	fix(SpeechT5Config): missing annotation on `inputs_to_logits_ratio` property	applied @property fix for SpeechT5Config.inputs_to_logits_ratio; compileall passed, full validation unavailable as in baseline due to pre-existing untracked lint failures and tests_f…
huggingface#41121	applied	cherry-pick	unavailable	fix: resolve the unexpected video frame drop issue of the InternVL model with multiple video inputs	adapted InternVL multi-video frame indexing fix and updated test expectation; compileall passed, full validation unavailable as in baseline due to pre-existing untracked lint failure…
huggingface#41105	applied	patch	unavailable	Fix is_torch_neuroncore_available	patched is_torch_neuroncore_available to respect check_device via is_torch_xla_available(check_is_gpu=check_device); compileall passed, full validation unavailable as in baseline …
huggingface#41097	applied	patch	unavailable	Delay and probably avoid unnecessary graph breaks in _upad_input of modeling_flash_attention_utils.py	patched _upad_input to delay nonzero index materialization in the equal query/key length path; compileall passed, full validation unavailable as in baseline due to pre-existing untra…
huggingface#41077	merged	merge	unavailable	Fix: add num_hidden_layers property to T5GemmaConfig and add test for use_cache	merge clean; compileall passed, full validation unavailable as in baseline due to pre-existing untracked lint failures and tests_fetcher checkout/dependency-map failure
huggingface#41075	applied	cherry-pick	unavailable	Fix Qwen3 deterministic generation when do_sample=False and num_beams=1 for Greedy Decoding	applied Qwen3 greedy deterministic generation fix and regression test; compileall and narrow ruff checks for changed files passed, full validation unavailable as in baseline due to p…
huggingface#40954	already_present	none	not_run	Fix Issue huggingface#40913: Respect user-provided chat_template parameter in processor creation	current processing_utils.py already applies user kwargs after loaded processor/chat template defaults, so a user-provided chat_template overrides model defaults; PR merge conflict…
huggingface#40908	merged	merge	unavailable	Fix `load_balancing_loss_func` incompatible with `past_key_values`	merge clean; compileall passed, but baseline/checkers already fail on pre-existing untracked benchmark_v2/examples legacy lint, and light validation is blocked by pre-existing unt…
huggingface#40857	already_present	none	not_run	Token	current Trainer already snapshots initial num_input_tokens_seen at train start and reports train speed from current-session tokens; PR direct merge conflicts with extensively refa…
huggingface#40790	applied	patch	unavailable	Handle loading non-existent checkpoints or corrupted checkpoints.	applied checkpoint resume handling: missing output dir returns no checkpoint, resume_from_checkpoint=True warns instead of raising when no checkpoint exists, and saves latest chec…
huggingface#40783	merged	merge	unavailable	Fix None quantization_config equivalence with omitted param in AutoModel.from_pretrained	merge clean; compileall and ruff on changed file passed, but full validation unavailable due to pre-existing benchmark_v2/examples lint failures and tests_fetcher checkout blocked by…
huggingface#40695	already_present	none	not_run	remove vision2seq vs image-text-to-text	current branch already has no MODEL_FOR_VISION_2_SEQ mapping, so Qwen entries are not tagged vision2seq; merge aborted after conflict against removed mapping
huggingface#40666	merged	merge	unavailable	Remove overwritten GitModelTest::test_beam_search_generate	merge clean; compileall passed, but baseline lint already fails on pre-existing untracked benchmark/examples artifacts and light validation cannot checkout because of pre-existing…
huggingface#40563	already_present	none	not_run	fix to get output of intermediate output of dinov3 for more use case	current DINOv3 ViT already forwards output_hidden_states through DINOv3ViTEncoder and returns hidden_states in BaseModelOutputWithPooling; smoke check produced three hidden-state …
huggingface#40492	applied	patch	unavailable	avoid divid zero errors.	direct merge conflicted in masking_utils after API changes; applied divide-by-zero guard and prefix update locally; compile passed, style/light validation unavailable due to pre-exis…
huggingface#40438	applied	patch	unavailable	Resolve automatic label name detection when single label provided	direct merge conflicted after Trainer init refactor; applied equivalent args.label_names fallback before self.args assignment; compile passed, style/light validation unavailable f…
huggingface#40392	merged	merge	unavailable	Remove debug print statement from ShieldGemma2 conversion script	merge clean; compile passed; style/light validation unavailable for baseline environmental reasons
huggingface#40385	merged	merge	unavailable	Fix typo: 'seperate' -> 'separate' in mm_grounding_dino conversion sc…	merge clean; compile passed; style/light validation unavailable for baseline environmental reasons
huggingface#40358	applied	patch	unavailable	Fix MXFP4 mlp_forward to handle 2D and 3D hidden_states shapes for multi-turn chat	direct merge conflicted but the source fix was portable; applied 2D/3D shape-preserving MXFP4 mlp_forward change; compile passed; style/light validation unavailable for baseline e…
huggingface#40244	already_present	none	not_run	add-loftr-keypoints-to-map	current cumulative branch already contains EfficientLoFTRForKeypointMatching in MODEL_FOR_KEYPOINT_MATCHING_MAPPING_NAMES
huggingface#40221	already_present	none	not_run	FIX: enable load_best_model_at_end within SaveStrategy.BEST and initialize metric_for_best_model as loss when SaveStrategy.BEST	current cumulative branch already includes SaveStrategy.BEST in best_global_step handling and defaults metric_for_best_model for save_strategy=best
huggingface#40208	applied	patch	unavailable	Save only model sharded sd	direct cherry-pick conflicted against current Trainer refactors, but the remaining missing PR change was portable; removed the stale FSDP save_only_model SHARDED_STATE_DICT reject…
huggingface#40148	merged	merge	unavailable	Update utils.py: fix nan	merge clean; compile passed; style/light validation unavailable for baseline environmental reasons
huggingface#40123	already_present	none	not_run	Lazily import torchao Int4WeightOnlyConfig to avoid side effects	current modeling_utils no longer eagerly imports torchao Int4WeightOnlyConfig at module import time; loading code has been refactored so the PR side-effect fix is already present/…
huggingface#40114	applied	patch	unavailable	Fix torch.export compatibility for Mixtral MoE models	direct PR head contains old main merge history; applied the Mixtral inference expert loop fix to the current modular and generated files; compile passed; style/light validation un…
huggingface#40090	merged	merge	unavailable	Fix RuntimeError when loading quantized models with int8 weights (huggingface#39366)	merge required resolving modeling_utils initialization conflict; compile passes; style/light validation unavailable due to pre-existing untracked benchmark/examples/doc artifacts and…
huggingface#40065	merged	merge	unavailable	Delay float32 upcast in ForCausalLMLoss after filtering ignore_index	merge required resolving loss_utils conflict against current union type syntax; compile passes; style/light validation unavailable due to pre-existing untracked benchmark/examples/do…
huggingface#40059	merged	merge	unavailable	Fix Inefficient GELU implementation in GPT2	merge required resolving GPT2Config conflict against current declarative config form by changing activation_function default to gelu; compile passes; style/light validation unavai…
huggingface#40022	merged	merge	unavailable	fix: resolve dropout type error in DogeDecoder	merge required resolving Doge init conflict using current init helpers; dropout tuple guard present; compile passes; style/light validation unavailable due to pre-existing untracked …
huggingface#39999	applied	patch	unavailable	allow TP to work in ND-parallel with fsdp cpu ram efficient loading	direct merge conflicted after tensor-parallel setup moved; adapted meta device_map fix to integrations/tensor_parallel.py; compile passed; style/light validation unavailable from …
huggingface#39997	merged	merge	unavailable	make sure position_ids are passed in for causal mask creation for gpt-oss	clean merge; compile passed; style/light validation unavailable due to the same baseline environmental untracked-artifact blockage
huggingface#39962	already_present	none	unavailable	Use torch._check instead of a test to make the model Gemma3 exportable	direct merge conflicted across copied VLM files, but current code already uses torch_compilable_check, a torch._check-based export-compatible helper, for these image-token checks;…
huggingface#39866	merged	merge	unavailable	make sure model.save_pretrained has the correct is_main_process	merge required resolving Trainer._save conflict by adding save_pretrained is_main_process gate to current save_safetensors call; compile passes; style/light validation unavailable…
huggingface#39794	applied	patch	unavailable	Fix ProphetNet forward to handle tuple encoder_outputs	direct merge conflicted across ProphetNet signatures and generic kwargs utilities after v5 API changes; applied minimal tuple-to-BaseModelOutput conversion in current ProphetNetMo…
huggingface#39793	merged	merge	unavailable	Fix DAC conversion script	clean merge; compile passes; style/light validation unavailable due to the same baseline untracked benchmark/examples/doc artifacts and tests_fetcher checkout blockage
huggingface#39756	already_present	none	not_run	Fix rope_deltas corruption in Qwen2.5VL during CFG generation	core rope_deltas generation fix is already present in current qwen2_vl/qwen2_5_vl/glm4v code; direct PR head also contains unrelated merged upstream history, so no code change was…
huggingface#39741	merged	merge	unavailable	Fix HfArgumentParser to filter out dict types from Union	merge clean after narrow ruff-compatible amendment; compile passed, but baseline ruff/test selection remain unavailable due to pre-existing untracked artifacts
huggingface#39698	applied	patch	unavailable	Fix exaone4 layer_types ZeroDivision/TypeError when sliding_window_pattern is None/"LLLG"	patch applied; compile passed, while ruff/light validation remain unavailable due to pre-existing dirty-baseline artifacts and tests_fetcher mapping issue
huggingface#39697	merged	merge	unavailable	use untyped storage for dtensors due to deprecation	merge clean; compile passed, ruff/light validation unavailable due to pre-existing dirty-baseline artifacts
huggingface#39683	applied	patch	unavailable	Fix issue huggingface#39191 respect accelerate config to disable torch.dynamo compilation	patch applied; compile passed, validation otherwise unavailable due to baseline ruff artifacts/tests_fetcher failure
huggingface#39675	already_present	none	not_run	[BugFix]: Support dict and config file path for deepspeed	current TrainingArguments already types deepspeed as dict | str | None and preserves dict/path handling; direct PR merge conflicted against the newer dataclass layout
huggingface#39674	applied	patch	unavailable	Fix loss scaling and token aggregation to use only data parallel group	patch applied; compile passed, validation otherwise unavailable due to baseline ruff artifacts/tests_fetcher failure
huggingface#39625	already_present	none	not_run	Fix: allow Union[str, dict, None] fields like deepspeed to be passed via CLI	current HfArgumentParser already filters dict from Union types before CLI parsing, covering Union[str, dict, None] fields such as deepspeed
huggingface#39617	already_present	none	not_run	Fix FSDP v1 bug: trainer incorrectly uses an unwrapped model	current Trainer already assigns accelerator.prepare(self.model) to the local model and then sets self.model/model_wrapped to the wrapped model for FSDP
huggingface#39599	applied	patch	unavailable	Fix: check TrainerState file exists before loading during resume	patch applied; compile passed, ruff/light validation unavailable due to baseline untracked-artifact ruff failures and tests_fetcher reverse-map failure
huggingface#39560	applied	patch	unavailable	fix load_model_end = true work when save_steps < eval_steps	patch applied; compile passed, ruff/light validation unavailable due to baseline untracked-artifact ruff failures and tests_fetcher reverse-map failure
huggingface#39493	merged	merge	unavailable	[Voxtral] nit + pin correct mistral common version	merged dependency update; validation unavailable because baseline ruff already fails on pre-existing untracked benchmark_v2/examples legacy files
huggingface#39491	merged	merge	unavailable	Fix: Skip weight initialization for quantized int8 models	merged cleanly; validation unavailable because baseline ruff already fails on pre-existing untracked benchmark_v2/examples legacy files
huggingface#39468	applied	patch	unavailable	Fix quantized model dispatch with device_map=auto	ported bitsandbytes device_map dispatch skip; validation unavailable due to pre-existing baseline ruff failures
huggingface#39464	already_present	none	unavailable	Skipping `initialize_weights` when model is quantized	quantized initialization fix already present from PR 39491; direct merge conflicted in refactored modeling_utils.py
huggingface#39449	already_present	none	unavailable	Fix logger warnings in Gemma model test files	Gemma logger.warning_once/warn occurrences targeted by the PR are already absent in current files; direct merge conflicted across refactored/deleted Gemma files
huggingface#39297	already_present	none	not_run	Fix bug with deepspeed and accelerator args in training_args.py	current training_args.py already supports dict|str for deepspeed and accelerator_config via _VALID_DICT_FIELDS and post-init conversion, so the CLI parsing defect is already addr…
huggingface#39264	already_present	none	not_run	Fix: Add version check for timm to support mobilenetv5 models (fixes huggingface#39208)	current timm wrapper already routes model construction through _create_timm_model_with_error_handling and raises an ImportError suggesting timm upgrade for unknown architectures, …
huggingface#39257	merged	merge	unavailable	Fix to tuple conversion with config	merged cleanly after preserving current helper functions and applying return_dict=True wrapper fix; compile passed, but full validation remains unavailable due pre-existing style …
huggingface#39211	already_present	none	not_run	Add mobilenet_v5 stub implementation to fix "Unknown Model" error	the current timm wrapper already uses _create_timm_model_with_error_handling and raises an actionable ImportError for unknown timm architectures, covering the Gemma3n MobileNetV5 …
huggingface#39206	applied	patch	unavailable	fix: filter None router logits in Qwen3 MoE and handle empty router logits (huggingface#39203)	applied the current-code equivalent guard for empty Qwen3 MoE router logits; direct PR-head merge/patch would drag stale generated-model and unrelated upstream changes. Compile pa…
huggingface#39103	applied	patch	unavailable	Fix audio-related config naming for Gemma3n	applied the current-code equivalent Gemma3n audio_soft_tokens_per_audio rename; compile passed, but validation remains unavailable due to pre-existing ruff failures in untracked benc…
huggingface#39046	applied	patch	unavailable	Fix deprecated max_size parameter handling in DETR image processors	applied the current-code equivalent max_size handling to DETR-family torchvision/PIL processors; compile passed, but validation remains unavailable due to pre-existing ruff failures …
huggingface#39037	applied	patch	unavailable	fix `kosmos2` tests	applied the current-code equivalent Kosmos2 non-causal image-to-text attention override; compile passed, but validation remains unavailable due to pre-existing ruff failures in untra…
huggingface#39012	already_present	none	unavailable	[WIP] Fix DeepseekV3ModelTest::test_torch_compile_for_training	PR conflicts because DeepseekV3 MoE has since moved to DeepseekV3NaiveMoe/current modular code; the stale expert loop targeted by the PR is no longer present, so the defect fix is…
huggingface#38999	already_present	none	unavailable	Use deep copies instead of shallow copies for bbox_embed in GroundingDINO decoder (huggingface#37333).	current GroundingDINO already constructs a fresh GroundingDinoMLPPredictionHead per decoder layer and uses tied-weight metadata when sharing is requested, so the shallow ModuleLis…
huggingface#38888	merged	merge	unavailable	continue to fix distributed_type from TPU to XLA in LM examples (huggingface#38652)	merge clean; validation unavailable from pre-existing untracked/runtime artifacts: ruff fails on benchmark_v2/examples legacy artifacts and light validation/tests_fetcher cannot c…
huggingface#38884	already_present	none	not_run	Llama 4 conversion fix for moe models	current converter already avoids treating missing/None moe_args as an MoE configuration via params.get('moe_args', False); no code change needed
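
To cross-check the status counts in this description against the individual records, the tab-separated rows can be tallied directly. The following is a minimal sketch, assuming each record is one line whose fields are separated by tabs, that the status is the second field of an included record and the third field of a rejected record (as in the "PR	Category	Status" header of the rejected table), and that data rows start with a `huggingface#` PR reference. The helper name `tally_column` is hypothetical and is not part of the mergeability tooling.

```python
from collections import Counter
from typing import Iterable


def tally_column(rows: Iterable[str], column: int) -> Counter:
    """Count the values of one tab-separated column across record rows.

    Blank lines and header lines are skipped because only data rows
    start with a PR reference such as "huggingface#43133".
    """
    counts: Counter = Counter()
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) <= column or not fields[0].startswith("huggingface#"):
            continue
        counts[fields[column]] += 1
    return counts


if __name__ == "__main__":
    included = [
        "huggingface#43133\tapplied\tpatch\tunavailable\tFix flaky SAM-HQ tests\tpatch applied",
        "huggingface#43015\tmerged\tmerge\tunavailable\tFIX: TF32 warning\tclean merge",
        "huggingface#42916\talready_present\tnone\tnot_run\tApply cleanup\talready present",
    ]
    rejected = [
        "PR\tCategory\tStatus\tOriginal PR summary / goal\tRejection reason",
        "huggingface#45690\tfeature\tskipped\t[serve] Support for reasoning\tnot configured",
    ]
    # Status is field 1 for included records and field 2 for rejected records.
    print(tally_column(included, column=1))
    print(tally_column(rejected, column=2))
```

Passing the column index explicitly keeps one helper usable for both tables, since their layouts differ by the extra Category field in the rejected records.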

Rejected / not included records (518)

PR	Category	Status	Original PR summary / goal	Rejection reason
huggingface#45690	feature	skipped	[serve] Support for reasoning	category not configured for this cumulative branch
huggingface#45679	other	skipped	TST Run fast PEFT tests in normal CI	category not configured for this cumulative branch
huggingface#45668	feature	skipped	[GGUF] Add support for Qwen3.5 MoE (qwen35moe arch)	category not configured for this cumulative branch
huggingface#45667	other	skipped	chore(typing): add ty type checking for 3 pipeline files	category not configured for this cumulative branch
huggingface#45666	feature	skipped	Extended n-to-1 kernel fusion via `KernelConfig`	category not configured for this cumulative branch
huggingface#45664	documentation	skipped	Doc translate to Persian(farsi)	category not configured for this cumulative branch
huggingface#45654	other	skipped	[CB] Refactor any model-related code in a separate class	category not configured for this cumulative branch
huggingface#45653	feature	skipped	[CB] Better overall script and decode bucketting	category not configured for this cumulative branch
huggingface#45651	feature	skipped	[Trainer] Optimize LengthGroupedSampler computation with select_columns and tqdm	category not configured for this cumulative branch
huggingface#45643	feature	skipped	Add DeepSeek V4	category not configured for this cumulative branch
huggingface#45640	feature	skipped	🚨🚨🚨 [Trainer] Default to FSDP2, simplify API around fsdp + fsdp_config	category not configured for this cumulative branch
huggingface#45638	feature	skipped	Add Multi-Token Prediction (MTP) support for Qwen3.5	category not configured for this cumulative branch
huggingface#45695	feature	skipped	Support for a new Granite-Speech-Plus model	category not configured for this cumulative branch
huggingface#45635	other	skipped	qa: speed up dtype regex weight load + reduce dtype tests to 3 random	category not configured for this cumulative branch
huggingface#45634	feature	skipped	DeepGEMM BF16, isolation, refactor	category not configured for this cumulative branch
huggingface#45630	feature	skipped	Add new model: Kimi2-6	category not configured for this cumulative branch
huggingface#45626	feature	skipped	[Model] Add PP-FormulaNet Model Support	category not configured for this cumulative branch
huggingface#45621	feature	skipped	Better Grouped GEMM + EP	category not configured for this cumulative branch
huggingface#45618	feature	skipped	Add MTP speculative decoding via MTPCandidateGenerator	category not configured for this cumulative branch
huggingface#45613	feature	skipped	[New Model] Add MiniCPM3 support	category not configured for this cumulative branch
huggingface#45612	documentation	skipped	[docs] update model cards	category not configured for this cumulative branch
huggingface#45609	feature	skipped	make it possible to ser/deser HF MoE models with torchao	category not configured for this cumulative branch
huggingface#45608	documentation	skipped	Python code in model docs	category not configured for this cumulative branch
huggingface#45604	feature	skipped	Agent first cli with skill	category not configured for this cumulative branch
huggingface#45599	other	skipped	qa: more lazy loading	category not configured for this cumulative branch
huggingface#45597	feature	skipped	Add Granite 4.1 Vision (granite4_vision)	category not configured for this cumulative branch
huggingface#45699	feature	skipped	Add FP8 kernel acceleration for compressed-tensors quantized models	category not configured for this cumulative branch
huggingface#45586	feature	skipped	Add Audio-Visual Flamingo model	category not configured for this cumulative branch
huggingface#45569	feature	skipped	Proper nemotron H and 3 and 2	category not configured for this cumulative branch
huggingface#45550	other	skipped	Add runner selection for mi325 GPU type	category not configured for this cumulative branch
huggingface#45546	feature	skipped	feat: Add GGUF loading support for Llama 4 (text)	category not configured for this cumulative branch
huggingface#45543	other	skipped	ci: OTEL support	category not configured for this cumulative branch
huggingface#45534	feature	skipped	🚨 [ALM] Add base model without head	category not configured for this cumulative branch
huggingface#45512	feature	skipped	[OutputRecorder] re.search on layer_name	category not configured for this cumulative branch
huggingface#45497	feature	skipped	Add V-JEPA 2.1 inference support	category not configured for this cumulative branch
huggingface#45493	feature	skipped	Modularize `ProcessorMixin` into smaller components	category not configured for this cumulative branch
huggingface#45490	feature	skipped	Add ctsm model	category not configured for this cumulative branch
huggingface#45477	feature	skipped	Blockwise mask fn as opt arg in all masking functions	category not configured for this cumulative branch
huggingface#45476	other	skipped	[Don't merge] Call CI workflow	category not configured for this cumulative branch
huggingface#45471	feature	skipped	Add EXAONE 4.5 implementations	category not configured for this cumulative branch
huggingface#45465	documentation	skipped	[docs] contributing	category not configured for this cumulative branch
huggingface#45462	other	skipped	chore(sec): added a handful of security checks	category not configured for this cumulative branch
huggingface#45453	other	skipped	Draft commit	category not configured for this cumulative branch
huggingface#45452	other	skipped	refactor: replace wildcard imports with explicit imports in model init.py files	category not configured for this cumulative branch
huggingface#45438	feature	skipped	Add Gemma4ForSequenceClassification	category not configured for this cumulative branch
huggingface#45426	feature	skipped	Feature/add axk1	category not configured for this cumulative branch
huggingface#45421	defect	aborted	Improve nested base_model_prefix handling in weight conversion and loading	codebase moved on: core_model_loading.py now uses unified WeightTransform ordering while PR expects separate renamings/converters and nested-prefix helpers; tests also conflict in test_modeling_common.py, so adapting wo…
huggingface#45415	other	skipped	Adds type checking to src/transformers/*py	category not configured for this cumulative branch
huggingface#45700	documentation	skipped	Add docstrings to type validator functions	category not configured for this cumulative branch
huggingface#45401	feature	skipped	Add support for Voxtral-4B-TTS-2603 to transformers	category not configured for this cumulative branch
huggingface#45396	feature	skipped	Extract dynamic vision/audio tensors into standalone pure functions	category not configured for this cumulative branch
huggingface#45391	feature	skipped	audio tester class	category not configured for this cumulative branch
huggingface#45382	feature	skipped	Add AudioGen (AudioCraft) to MusicGen conversion scripts	category not configured for this cumulative branch
huggingface#45363	feature	skipped	n-to-1 kernel fusion via	category not configured for this cumulative branch
huggingface#45355	feature	skipped	Add universal phone recognition model - PhoneticXeus	category not configured for this cumulative branch
huggingface#45350	feature	skipped	WIP: Add support for Granite4VisionForConditionalGeneration	category not configured for this cumulative branch
huggingface#45333	feature	skipped	Add heterogeneous config support (per-layer configuration)	category not configured for this cumulative branch
huggingface#45332	feature	skipped	Add heterogeneous model support (per-layer config and modeling)	category not configured for this cumulative branch
huggingface#45300	defect	validation_failed	Fix Nemotron-H: add mlp layer type support	merge conflicts were resolved, but light validation timed out in tests_fetcher/run-light-validation after the merge; reset merge commit
huggingface#45296	feature	skipped	Add GGUF support to Gemma4 (31B & 26B-A4B) text	category not configured for this cumulative branch
huggingface#45294	feature	skipped	feat: add Gemma4ForSequenceClassification	category not configured for this cumulative branch
huggingface#45293	defect	validation_failed	Fix "AttributeError: NewTokenizer has no attribute special_attribute_present" (Remove `REGISTERED_FAST_ALIASES`)	clean merge, but light validation timed out in tests_fetcher/run-light-validation after the merge; reset merge commit
huggingface#45273	defect	validation_failed	fix: liger unnecessarily materializes logits in VRAM during eval, causing OOM	clean merge, but light validation timed out in tests_fetcher/run-light-validation after the merge; reset merge commit
huggingface#45270	feature	skipped	[Trainer] Support multi-loss component logging	category not configured for this cumulative branch
huggingface#45267	documentation	skipped	Add docstring to FFN.forward in DistilBERT	category not configured for this cumulative branch
huggingface#45254	defect	aborted	Fix more integration tests for important models	codebase moved on: PR head has no current file list from GitHub but differs from cumulative HEAD by hundreds of unrelated repo-wide changes from an old main snapshot; no narrow defect patch is identifiable to apply safe…
huggingface#45244	other	skipped	Let's CI go great	category not configured for this cumulative branch
huggingface#45233	defect	validation_failed	feat: make timesfm2_5 onnx export compatible	clean merge, but light validation timed out after the merge; reset merge commit
huggingface#45218	feature	skipped	Proposal: Agent-first CLI	category not configured for this cumulative branch
huggingface#45213	feature	skipped	DO NOT MERGE - model creation skill	category not configured for this cumulative branch
huggingface#45189	feature	skipped	Add doc test CI workflow reusing existing model job infrastructure	category not configured for this cumulative branch
huggingface#45186	feature	skipped	Add new model: Isaac	category not configured for this cumulative branch
huggingface#45181	feature	skipped	Make the cli a top-level package	category not configured for this cumulative branch
huggingface#45176	feature	skipped	added efficietvitsam model to HF	category not configured for this cumulative branch
huggingface#45168	feature	skipped	Update min_lr and max_lr default values to better defaults	category not configured for this cumulative branch
huggingface#45167	feature	skipped	Add anthropic style of function schema	category not configured for this cumulative branch
huggingface#45157	feature	skipped	[WIP] PrismML Bonsai model support	category not configured for this cumulative branch
huggingface#45153	feature	skipped	[`FA`] Native torch integration	category not configured for this cumulative branch
huggingface#45152	documentation	skipped	[docs] model testing	category not configured for this cumulative branch
huggingface#45149	feature	skipped	DO NOT MERGE adding SAML3-LiteText with a skill, first pass	category not configured for this cumulative branch
huggingface#45144	feature	skipped	Add Xiaomi MiMo-V2	category not configured for this cumulative branch
huggingface#45134	feature	skipped	Optimize Parakeet feature extraction on CUDA	category not configured for this cumulative branch
huggingface#45133	feature	skipped	Add sarvam model	category not configured for this cumulative branch
huggingface#45115	other	skipped	Refactor/nemotron h inherit granitemoehybrid	category not configured for this cumulative branch
huggingface#45113	feature	skipped	Add GDS support for safetensors loading	category not configured for this cumulative branch
huggingface#45110	feature	skipped	Add SAM 3.1	category not configured for this cumulative branch
huggingface#45101	feature	skipped	Adding support for Nandi Models	category not configured for this cumulative branch
huggingface#45097	feature	skipped	Add old InternVL2-1B/2B support to the InternVL conversion script huggingface#45092	category not configured for this cumulative branch
huggingface#45082	feature	skipped	[VidEoMT] Update conversion script	category not configured for this cumulative branch
huggingface#45077	other	skipped	fix: pin 50 unpinned actions to commit SHA, extract 1 secret to env var	category not configured for this cumulative branch
huggingface#45075	feature	skipped	Add Deepseek-OCR-2 model	category not configured for this cumulative branch
huggingface#45073	other	skipped	Refactor OwlViT to modular Transformers	category not configured for this cumulative branch
huggingface#45067	feature	skipped	feat: trainer resume_from_checkpoint support hub downloads (huggingface#43375)	category not configured for this cumulative branch
huggingface#45064	other	skipped	refactor: shard checkers	category not configured for this cumulative branch
huggingface#45037	documentation	skipped	add missing colon in custom_attention function signature in attention…	category not configured for this cumulative branch
huggingface#45028	feature	skipped	TP refactor for FSDP + TP integration	category not configured for this cumulative branch
huggingface#44989	feature	skipped	🚨 Distributed training API	category not configured for this cumulative branch
huggingface#44979	feature	skipped	Module Fusion API	category not configured for this cumulative branch
huggingface#44974	feature	skipped	Refactor core_model_loading to support FSDP shard-on-read loading	category not configured for this cumulative branch
huggingface#44965	other	skipped	try	category not configured for this cumulative branch
huggingface#44956	feature	skipped	Add HyperCLOVAX SEED Think 14B	category not configured for this cumulative branch
huggingface#44942	feature	skipped	Add inference time layer fusion optimisations via `PreTrainedModel.from_pretrained(fuse_layers=True)`	category not configured for this cumulative branch
huggingface#44891	feature	skipped	[Trainer] add MoERouterHealthCallback Callback	category not configured for this cumulative branch
huggingface#44875	other	skipped	refactor: improved the cli server module code organization	category not configured for this cumulative branch
huggingface#44872	documentation	skipped	Fix: Update outdated sampler comment in generation/utils.py	category not configured for this cumulative branch
huggingface#44830	feature	skipped	Add AudioFlamingoNext model	category not configured for this cumulative branch
huggingface#44815	defect	aborted	Dequant fix	codebase moved on: PR expects the older finegrained_fp8 CUTLASS/Triton loader structure and edits Fp8Dequantize/FP8Experts around _get_triton_kernel, while current cumulative code has the newer lazy_load_kernel plus Dee…
huggingface#44794	feature	skipped	Refacto GGUF weight conversion	category not configured for this cumulative branch
huggingface#44775	documentation	skipped	[docs] n-d parallelism	category not configured for this cumulative branch
huggingface#44772	documentation	skipped	bitsandbytes: Update links and docs	category not configured for this cumulative branch
huggingface#44771	defect	aborted	wtf	codebase moved on: PR edits the old generated src/transformers/models/pi0/image_processing_pi0_fast.py to inherit SiglipImageProcessorFast, but current cumulative code has PI0 under modular generation with src/transform…
huggingface#44729	defect	aborted	Avoid floating point math for ceil operations	codebase moved on: PR applies a broad int_div_ceil migration across many generated image processors/configs and older fast image processor files; current cumulative branch has deleted several *_fast.py files and regener…
huggingface#44724	defect	aborted	Fix some missing / incorrect entries in auto files	codebase moved on: PR edits generated auto mapping files using an older tuple/fast-processor mapping layout, while current cumulative branch uses the newer backend-keyed auto image processor mappings and has many additi…
huggingface#44722	feature	skipped	Refactor gptj output tracing to use standardized decorators	category not configured for this cumulative branch
huggingface#44713	feature	skipped	[ColQwen2] Refactor output tracing (issue huggingface#43979)	category not configured for this cumulative branch
huggingface#44682	feature	skipped	transformers serve + llamacpp	category not configured for this cumulative branch
huggingface#44676	defect	validation_failed	fix(gpt2): Resolve NaN/Inf issue in lm_head on Python 3.13 with tied weights	compileall and repo checkers passed, but light validation/tests_fetcher timed out after the merge
huggingface#44662	feature	skipped	[model] Add PenguinVL implementation	category not configured for this cumulative branch
huggingface#44660	defect	validation_failed	Fix: avoid late CUDA OOM in load_best_model_at_end with PEFT models	compileall and repo checkers passed, but light validation timed out after the merge
huggingface#44659	documentation	skipped	docs: remove outdated use_diff docstring from DistributedConfig.to_js…	category not configured for this cumulative branch
huggingface#44650	defect	validation_failed	Fix Seq2SeqTrainer generation path for decoder-only models	compileall and repo checkers passed, but light validation timed out after the merge
huggingface#44646	documentation	skipped	Fix typo: seperate -> separate	category not configured for this cumulative branch
huggingface#44642	documentation	skipped	Clarify that causal LM labels are shifted internally	category not configured for this cumulative branch
huggingface#44635	other	skipped	[Gemma] Modular-friendly buffers	category not configured for this cumulative branch
huggingface#45702	defect	aborted	Reorder decorators for autodoc and dataclass	codebase moved on: PR reorders dataclass/auto_docstring decorators broadly, but cumulative branch has additional qwen3_5 sequence-classification changes adjacent to Qwen3_5CausalLMOutputWithPast causing a content confli…
huggingface#44601	feature	skipped	[Distributed] Add PP support natively	category not configured for this cumulative branch
huggingface#44594	feature	skipped	[Pipeline] Add top_k, label filtering, box_format and score sorting to ObjectDetectionPipeline	category not configured for this cumulative branch
huggingface#44569	feature	skipped	Add SarvamMLA model (sarvamai/sarvam-105b)	category not configured for this cumulative branch
huggingface#44553	feature	skipped	[`FA`] Refactor FA CB kwargs	category not configured for this cumulative branch
huggingface#44550	documentation	skipped	Improve clarity and grammar in Auto Classes documentation	category not configured for this cumulative branch
huggingface#44547	documentation	skipped	Fix position_ids docstring in modeling_flash_attention_utils.py	category not configured for this cumulative branch
huggingface#44543	defect	aborted	Fix assistant_masks for multimodal inputs in apply_chat_template	merge conflict in ProcessorTesterMixin tests: PR adds multimodal assistant mask regression at the same location where current branch has a newer tool-call chat template regression test; processing_utils change staged cl…
huggingface#44517	feature	skipped	Add qwen3 tts	category not configured for this cumulative branch
huggingface#44495	other	skipped	[`Gradient Ckpting`] Remove unnecessary attribute definitions	category not configured for this cumulative branch
huggingface#44467	defect	aborted	Placeholder tokens update	tokenizer stack has moved on substantially: conflicts span convert_slow_tokenizer, tokenization_auto incorrect-class handling, tokenization_utils_base/tokenizers backend, LASR/SigLIP2 tokenizers, and tokenizer regressio…
huggingface#44445	feature	skipped	Adding support for GraniteDoclingHybrid	category not configured for this cumulative branch
huggingface#44438	feature	skipped	Add flashoptim	category not configured for this cumulative branch
huggingface#44420	documentation	skipped	[docs] distributed training	category not configured for this cumulative branch
huggingface#44408	feature	skipped	Add option to export encoder hidden states for Granite-speech	category not configured for this cumulative branch
huggingface#44407	documentation	skipped	docs: add energy efficiency considerations to bitsandbytes quantization guide	category not configured for this cumulative branch
huggingface#44394	feature	skipped	🚨🚧 FeatureExtractor → AudioProcessor	category not configured for this cumulative branch
huggingface#44375	feature	skipped	Add RF-DETR	category not configured for this cumulative branch
huggingface#44369	documentation	skipped	Feature/integrations docs fix	category not configured for this cumulative branch
huggingface#44348	feature	skipped	Enable MetalConfig to load pre-quantized MLX models from HuggingFace Hub	category not configured for this cumulative branch
huggingface#44314	feature	skipped	add HyperClovaX Vision	category not configured for this cumulative branch
huggingface#44298	defect	aborted	Auto detect wrong mapping models	codebase moved on: PR targets older tokenizer conversion/mapping code; current branch already has newer serialized-tokenizer override, padding/truncation preservation, SentencePiece charsmap extraction, and additional a…
huggingface#44264	defect	aborted	[`Moe`] Enable aux loss automatically when in training + coef is not 0	codebase moved on: PR globally changes MoE aux-loss output behavior across generated and modular MoE model files, but current branch has newer decorator/output patterns and model-specific changes causing 32 conflicts; s…
huggingface#44259	feature	skipped	Async data producer	category not configured for this cumulative branch
huggingface#44252	feature	skipped	Timm unification continued	category not configured for this cumulative branch
huggingface#44215	feature	skipped	Add sequence classification capability to Granite models	category not configured for this cumulative branch
huggingface#44184	feature	skipped	feat: add OpenAI CircuitGPT core architecture and sparse linear layers	category not configured for this cumulative branch
huggingface#44178	feature	skipped	Add xcodec2 model	category not configured for this cumulative branch
huggingface#44171	feature	skipped	Parakeet tdt	category not configured for this cumulative branch
huggingface#44161	other	skipped	Refactor LongT5 to use @capture_outputs and @can_return_tuple decorators for unified output handling (Fixes huggingface#43979)	category not configured for this cumulative branch
huggingface#44159	feature	skipped	Add SDPA and Flash Attention support for OWL-ViT	category not configured for this cumulative branch
huggingface#44154	other	skipped	Refactored vits to match standardized output collection interface	category not configured for this cumulative branch
huggingface#44142	feature	skipped	[voxtral-realtime] get more perfs!	category not configured for this cumulative branch
huggingface#44129	other	skipped	Refactor SpeechT5 output tracing to standardized output capture	category not configured for this cumulative branch
huggingface#44123	feature	skipped	Avoid device sync in training loss accumulation	category not configured for this cumulative branch
huggingface#44116	other	skipped	[WIP] [Flaubert] Refactor output tracing to decorator-based interface	category not configured for this cumulative branch
huggingface#44114	other	skipped	Migrate wav2vec2, wav2vec2_conformer, and wav2vec2_bert to standardized output collection decorators	category not configured for this cumulative branch
huggingface#44101	other	skipped	[XLM] Refactor output tracing to align with capture_outputs standardized architecture	category not configured for this cumulative branch
huggingface#44098	other	skipped	[ViLT] Refactor output handling to align with standardized patterns	category not configured for this cumulative branch
huggingface#44086	other	skipped	[MGP-STR] Refactor output tracing to use capture_outputs/can_return_tuple decorators	category not configured for this cumulative branch
huggingface#44085	other	skipped	Refactor RemBERT to use output tracing decorators	category not configured for this cumulative branch
huggingface#44083	feature	skipped	FSDP2 native support in transformers	category not configured for this cumulative branch
huggingface#44076	other	skipped	Refectored modeling_imagegpt.py to enable hooks to capture_outputs	category not configured for this cumulative branch
huggingface#44074	other	skipped	[TextNet] Refactor output tracing using capture_outputs decorator	category not configured for this cumulative branch
huggingface#44073	other	skipped	[VisualBert] Refactor output tracing using capture_outputs and can_return_tuple decorators	category not configured for this cumulative branch
huggingface#44072	other	skipped	refactor efficientnet output tracing with @capture_outputs and @can_r…	category not configured for this cumulative branch
huggingface#44071	other	skipped	[Refactor] Migrate MPT to standardized output tracing decorators	category not configured for this cumulative branch
huggingface#44070	feature	skipped	Add GGUF loading support for Qwen3-Next (qwen3_next) architecture	category not configured for this cumulative branch
huggingface#44068	other	skipped	Refactor GPT-Neo to use `@capture_outputs` and `@can_return_tuple` decorators	category not configured for this cumulative branch
huggingface#44066	other	skipped	Refactor GPT-J to use standardized output tracing (huggingface#43979)	category not configured for this cumulative branch
huggingface#44059	other	skipped	[GPT2] Refactor output tracing to use capture_outputs/can_return_tuple decorators	category not configured for this cumulative branch
huggingface#44056	other	skipped	[MPNet] Refactor output tracing using capture_outputs decorator	category not configured for this cumulative branch
huggingface#44054	feature	skipped	Flash mla interface	category not configured for this cumulative branch
huggingface#44044	other	skipped	Refactor DeBERTa's output tracing interface	category not configured for this cumulative branch
huggingface#44030	other	skipped	refactor output tracing in `dpr`	category not configured for this cumulative branch
huggingface#44029	other	skipped	refactor output tracing in `rwkv`	category not configured for this cumulative branch
huggingface#44028	other	skipped	refactor output tracing for `superpoint`	category not configured for this cumulative branch
huggingface#44027	other	skipped	refactor output tracing in `speech_encoder_decoder`	category not configured for this cumulative branch
huggingface#44026	other	skipped	refactor output tracing for `vision_encoder_decoder`	category not configured for this cumulative branch
huggingface#44025	other	skipped	refactor output tracing for `depth_anything`	category not configured for this cumulative branch
huggingface#44024	other	skipped	Focalnet standardized outputs	category not configured for this cumulative branch
huggingface#44019	other	skipped	Refactor `resnet` to use `@capture_outputs` / `@can_return_tuple` output tracing	category not configured for this cumulative branch
huggingface#44018	other	skipped	Refactor GPT-Neo output tracing to use capture_outputs/can_return_tuple	category not configured for this cumulative branch
huggingface#44017	other	skipped	Refactor output tracing in segformers (huggingface#43979)	category not configured for this cumulative branch
huggingface#44015	other	skipped	Refactor GPT2-based models to standardized output collection interface	category not configured for this cumulative branch
huggingface#44013	other	skipped	Ouptut tracing: Standardizing MobileNetv2	category not configured for this cumulative branch
huggingface#44010	other	skipped	[SqueezeBert] Migrate to standardized output collection decorators	category not configured for this cumulative branch
huggingface#44007	other	skipped	[ResNet] Refactor output tracing to decorator-based interface	category not configured for this cumulative branch
huggingface#44004	other	skipped	refactor output tracing for `codegen`	category not configured for this cumulative branch
huggingface#44003	other	skipped	refactor output tracing in `mamba`	category not configured for this cumulative branch
huggingface#44002	other	skipped	refactor output tracing in `upernet`	category not configured for this cumulative branch
huggingface#44001	other	skipped	refactor output tracing in `univnet`	category not configured for this cumulative branch
huggingface#44000	other	skipped	refactor output tracing in `vision_text_dual_encoder`	category not configured for this cumulative branch
huggingface#43999	other	skipped	refactor output tracing in `mobilenet_v1`	category not configured for this cumulative branch
huggingface#43998	other	skipped	refactor output tracing in `timm_backbone`	category not configured for this cumulative branch
huggingface#43997	other	skipped	Migrate RegNet to standardized output tracing	category not configured for this cumulative branch
huggingface#43996	other	skipped	Refactor FNet and CVT output tracing	category not configured for this cumulative branch
huggingface#43995	other	skipped	Refactoring falcon model to match standardized output collection interface	category not configured for this cumulative branch
huggingface#43973	feature	skipped	Add lfm2.5 audio	category not configured for this cumulative branch
huggingface#43924	defect	aborted	[`Attn`] More old mask APIs	codebase moved on: PR ports old attention-mask API helpers across many model files, but current branch has newer standardized output/TransformersKwargs paths and renamed create_bidirectional_mask call sites; 11 content …
huggingface#43915	feature	skipped	add PaddleOCR-VL conversion	category not configured for this cumulative branch
huggingface#43888	feature	skipped	Support for BharatGen's Param2MoE model architecture	category not configured for this cumulative branch
huggingface#43863	feature	skipped	[whisper] allow to pass text/audio specific kwargs	category not configured for this cumulative branch
huggingface#45703	other	skipped	chore(typing): add ty type checking for 10 utility files	category not configured for this cumulative branch
huggingface#43838	feature	skipped	Qwen3 ASR and Forced Aligner	category not configured for this cumulative branch
huggingface#43823	feature	skipped	Add	category not configured for this cumulative branch
huggingface#43785	defect	aborted	Fix FSDP_CPU_RAM_EFFICIENT_LOADING (huggingface#43749)	codebase moved on: PR targets older FSDP/model-loading flow; cumulative branch already has newer FSDP2 cpu_ram_efficient_loading changes, and direct merge placed the core_model_loading early-return block inside the curr…
huggingface#43751	other	skipped	Fix ruff warnings	category not configured for this cumulative branch
huggingface#43743	feature	skipped	Modular playground	category not configured for this cumulative branch
huggingface#43665	other	skipped	fix	category not configured for this cumulative branch
huggingface#43663	feature	skipped	Add _get_signature_columns method to allow custom trainers to override column filtering	category not configured for this cumulative branch
huggingface#43649	other	skipped	Check new failures reporting 5	category not configured for this cumulative branch
huggingface#43636	feature	skipped	Add _metrics dict to Trainer for custom metric logging	category not configured for this cumulative branch
huggingface#43613	feature	skipped	Add Promptable Visual Segmentation pipeline	category not configured for this cumulative branch
huggingface#43612	feature	skipped	Add Promptable Concept Segmentation pipeline	category not configured for this cumulative branch
huggingface#43532	other	skipped	[do not merge] Show diff	category not configured for this cumulative branch
huggingface#43506	feature	skipped	Add RishAI model with full transformers integration	category not configured for this cumulative branch
huggingface#43498	defect	validation_failed	fix/backward compatibility for tie_weights	merge clean but repo checkers failed: added tie_embeddings_and_encoder_decoder calls undefined tie_weights and import ordering was unformatted; reverted top merge
huggingface#43488	other	skipped	[don't merge] bad format to check repo bot	category not configured for this cumulative branch
huggingface#43484	feature	skipped	Optimize Ernie 4.5 VL timestamp rendering with cached overlays	category not configured for this cumulative branch
huggingface#43469	feature	skipped	argparser: Allow optional bool flags without values	category not configured for this cumulative branch
huggingface#43451	feature	skipped	Add Molmo2	category not configured for this cumulative branch
huggingface#43448	feature	skipped	Add Molmo	category not configured for this cumulative branch
huggingface#43446	feature	skipped	[`typings`] Automatically type decorator return types as `tuple \| X`	category not configured for this cumulative branch
huggingface#43424	other	skipped	Add test to ensure executorch exportability with dynamic shapes	category not configured for this cumulative branch
huggingface#43363	feature	skipped	[Improvement] Update `DistributedLengthGroupedSampler` to allow customizing length function	category not configured for this cumulative branch
huggingface#43340	documentation	skipped	Claude code skills for transformers-api	category not configured for this cumulative branch
#43333	documentation	skipped	Fix typo: interupted -> interrupted	category not configured for this cumulative branch
#43310	other	skipped	Replace regex with re	category not configured for this cumulative branch
#43297	feature	skipped	[Feat] Reduces redundant tokenization of tags to accelerate Qwen3VL.	category not configured for this cumulative branch
#43271	documentation	skipped	Fix typo: necesary → necessary	category not configured for this cumulative branch
#43267	documentation	skipped	Add auto_docstring decorator to Sam3ImageProcessorFast	category not configured for this cumulative branch
#43265	feature	skipped	Adding Omnilingual ASR models	category not configured for this cumulative branch
#43249	feature	skipped	[WIP] Processor moves to `device` in `__call__`	category not configured for this cumulative branch
#43246	other	skipped	GptOss slow tests	category not configured for this cumulative branch
#43213	feature	skipped	feat: allow output_hidden_states and output_attensions to record outputs of specific layers	category not configured for this cumulative branch
#43192	feature	skipped	[Trackio] support trackio gpu logging	category not configured for this cumulative branch
#43139	feature	skipped	[perf] optimize whisper GPU performance	category not configured for this cumulative branch
#43104	documentation	skipped	docs: clarify tokenizer decoder behavior in v5 (#43066)	category not configured for this cumulative branch
#43102	documentation	skipped	Add CPU vs GPU performance comparison example	category not configured for this cumulative branch
#43094	feature	skipped	Avoid inline .item() sync in decoder start token check	category not configured for this cumulative branch
#43088	feature	skipped	Skip attention_mask.all() GPU-CPU sync during generation	category not configured for this cumulative branch
#43085	feature	skipped	Add async_stopping_criteria flag to reduce GPU-CPU syncs during generation	category not configured for this cumulative branch
#43077	other	skipped	compileable=>compilable	category not configured for this cumulative branch
#43063	documentation	skipped	Improve documentation for SegFormer image processor	category not configured for this cumulative branch
#43056	feature	skipped	Perf: enable pin_memory in DataLoader for CLM no_trainer example	category not configured for this cumulative branch
#43044	feature	skipped	[SAM3] Enable single-scale input support in Mask Decoder	category not configured for this cumulative branch
#43036	documentation	skipped	Docs: fix grammar in Pipeline section	category not configured for this cumulative branch
#43020	feature	skipped	Add mimo v2 flash	category not configured for this cumulative branch
#42982	feature	skipped	Add HumanV: decoder-only causal LM	category not configured for this cumulative branch
#42978	feature	skipped	Add ViT NEPA	category not configured for this cumulative branch
#42976	other	skipped	Upgrade GitHub Actions to latest versions	category not configured for this cumulative branch
#42975	other	skipped	Upgrade GitHub Actions for Node 24 compatibility	category not configured for this cumulative branch
#42944	feature	skipped	[Quantization] From config Quantization for FP8	category not configured for this cumulative branch
#42919	feature	skipped	[WIP] Video support in vLLM backend	category not configured for this cumulative branch
#42908	defect	aborted	Fix gguf tokenizers	codebase moved on: PR conflicts in generation/utils.py, auto/tokenization_auto.py, and tokenization_utils_base.py; current branch already contains the main GGUF tokenizer native-format guard, while the PR head also comm…
#42887	feature	skipped	[Quantization] [Compressed Tensors] Support Transforms, Fix Tests	category not configured for this cumulative branch
#42876	documentation	skipped	Document tensor parallelism configuration with Trainer	category not configured for this cumulative branch
#42829	feature	skipped	[WIP] End-to-end exportable pipelines (object detection)	category not configured for this cumulative branch
#42816	defect	aborted	validate tokenizer components	codebase moved on: PR adds tokenizer component validation in tokenization_utils_tokenizers.py, but current branch has newer tokenizer loading/GGUF and minimal-tokenizer extraction logic in the same area; resolving would…
#42785	documentation	skipped	Fixing wrong information in Mimi Docs	category not configured for this cumulative branch
#42781	feature	skipped	Add VibeVoice Realtime	category not configured for this cumulative branch
#42767	defect	aborted	fix: add mapping of deepseek_v32 model type	codebase moved on: PR adds deepseek_v32 as an older-style generated model plus direct auto mapping edits, but current branch derives auto mappings from generated auto_mappings.py/modular model metadata; resolving would …
#42765	other	skipped	Add distributed training CI	category not configured for this cumulative branch
#42744	other	skipped	[FP8 Devstral 24B] Repro PR	category not configured for this cumulative branch
#42742	other	skipped	Remove redundant else in activations.py	category not configured for this cumulative branch
#42706	other	skipped	Nit parakeet	category not configured for this cumulative branch
#42668	defect	aborted	More robust processor from pretrained	codebase moved on: processor auto/tokenizer mappings and ProcessorMixin loading/validation paths have diverged substantially; direct PR-head merge also tried to bring unrelated upstream-era artifacts, so safely resolvin…
#42665	feature	skipped	Some optimizations for offloading	category not configured for this cumulative branch
#42655	feature	skipped	New Feature: Enabling Speculative Decoding with Batch Size > 1 (If draft and target model share tokenizer)	category not configured for this cumulative branch
#42588	documentation	skipped	Document the /v1/models endpoint	category not configured for this cumulative branch
#42572	documentation	skipped	docs: add doctest for SqueezeBERT	category not configured for this cumulative branch
#42527	documentation	skipped	Added doctests for SwiftFormer model	category not configured for this cumulative branch
#42496	feature	skipped	feat: allow CP with trainer	category not configured for this cumulative branch
#42467	defect	aborted	Fixes StaticCache Crashes	codebase moved on: StaticLayer/StaticSlidingWindowLayer now use cumulative_length state, a key/value lazy_initialization signature, and crop/get_mask_sizes APIs that conflict with the PR's older max_batch_size keys_full…
#42461	defect	aborted	Refactor RMSNorm implementations to use torch.nn.functional.rms_norm	codebase moved on: the PR rewrites RMSNorm classes across 65 model files, many of which are now generated/copied or have diverged type signatures and local changes; resolving would require applying the rms_norm migratio…
#42453	feature	skipped	Add SDPA and FlashAttention support to T5	category not configured for this cumulative branch
#42437	other	skipped	One tok typing	category not configured for this cumulative branch
#42432	feature	skipped	Add VideoToTextPipeline with smart frame sampling and system prompts	category not configured for this cumulative branch
#42430	defect	aborted	🚨 Clean up `image-text-to-text` pipeline	codebase moved on: source changes merged, but pipeline tests now contain device-specific Expectations and additional text-only coverage that conflict with the PR's older expected-output rewrites; resolving would require…
#42424	defect	validation_failed	[WIP] attempt to fix ooms in tests	after resolving small conflicts by keeping the PR's added use_cache/cache_position kwargs, ruff failed with undefined cache_position in qwen3_omni_moe, qwen3_vl, modular_qwen3_vl, and qwen3_vl_moe; current model APIs no…
#42415	other	skipped	initial clean	category not configured for this cumulative branch
#42413	feature	skipped	Add chatterbox support	category not configured for this cumulative branch
#42412	other	skipped	Replace Optional and Union typing with | in examples	category not configured for this cumulative branch
#42403	feature	skipped	Sam3 release on v4.57.3	category not configured for this cumulative branch
#42385	defect	aborted	Fix weight tying logic between `_tied_weights_keys` and `tie_word_embeddings`	codebase moved on: the PR branch contains many upstream merge commits plus tied-weight changes and conflicts in core_model_loading, hub_kernels, FSMT and VLM/audio config/model files; safely extracting the weight-tying …
#42345	feature	skipped	GPT-OSS Flash Attention and memory-efficient attention via Native PyTorch SDPA	category not configured for this cumulative branch
#42310	feature	skipped	[WIP] Add moondream3 model	category not configured for this cumulative branch
#42292	documentation	skipped	docs: clarify recommended usage of max_new_tokens in generate()	category not configured for this cumulative branch
#42277	documentation	skipped	doc(kernels): update kernels integration documentation	category not configured for this cumulative branch
#42256	feature	skipped	Integrate Core AutoAWQ Inference Components into Transformers	category not configured for this cumulative branch
#42244	defect	aborted	[core] Fix torchao loading	codebase moved on: the torchao loading PR rewrites core loading/quantization plus hundreds of generated and modular modeling files, producing 477 conflicts including add/add conflicts in core_model_loading, conversion_m…
#42229	feature	skipped	Add openpangu_moe model	category not configured for this cumulative branch
#42210	feature	skipped	[WIP] started adding support for evo2	category not configured for this cumulative branch
#42166	feature	skipped	add internvl_flash model	category not configured for this cumulative branch
#42134	feature	skipped	Add AutoMergeAdapters utility for merging multiple LoRA adapters with…	category not configured for this cumulative branch
#42124	documentation	skipped	📚 docs(qwen3): add comprehensive usage examples and model details	category not configured for this cumulative branch
#42112	feature	skipped	Add max_thinking_tokens for reasoning models (issue #42111)	category not configured for this cumulative branch
#42039	other	skipped	[WIP] 🚨 clean xcodec 🧼	category not configured for this cumulative branch
#42000	documentation	skipped	Fix Mixtral: Docstring uses consistent 'top_k_index' and 'top_k_weights' in MixtralSparseMoeBlock	category not configured for this cumulative branch
#41992	feature	skipped	[PoC] HF exporters	category not configured for this cumulative branch
#41977	feature	skipped	Add Phi3.5 Vision Model	category not configured for this cumulative branch
#41967	defect	aborted	feat: RoPE-related typing improvements	codebase moved on: current modeling_rope_utils has a newer RoPE standardization/validation layout and RopeParameters definition, while the PR rewrites standardize_rope_params and moves/retypes RopeParameters from an old…
#45705	feature	skipped	[skills] model doc	category not configured for this cumulative branch
#41899	feature	skipped	Testing checkpoint limit changes from PR #37196	category not configured for this cumulative branch
#41895	feature	skipped	Add Telugu Sentiment Classification Example using DistilBERT	category not configured for this cumulative branch
#41886	feature	skipped	ADD FG-CLIP2	category not configured for this cumulative branch
#41882	feature	skipped	Support fdma for models with attention bias	category not configured for this cumulative branch
#41880	documentation	skipped	Indonesian Language Support for ReadMe	category not configured for this cumulative branch
#41823	feature	skipped	Lfm2-VL vllm	category not configured for this cumulative branch
#41807	documentation	skipped	git commit -m "Fix: corrected outdated documentation link in README.md"	category not configured for this cumulative branch
#41800	documentation	skipped	Increasing clarity	category not configured for this cumulative branch
#41798	feature	skipped	p-less Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding	category not configured for this cumulative branch
#41797	feature	skipped	Add deepseek ocr	category not configured for this cumulative branch
#41794	other	skipped	Enable flake8-pie rules	category not configured for this cumulative branch
#41776	feature	skipped	Add safety checking infrastructure for text generation	category not configured for this cumulative branch
#41733	documentation	skipped	transformers CLI documentation issue	category not configured for this cumulative branch
#41710	documentation	skipped	🌐 [i18n-KO] Translated to Korean	category not configured for this cumulative branch
#41693	feature	skipped	🚨 Refactor ViT to updated standards	category not configured for this cumulative branch
#41654	defect	aborted	Improve LLaMA tokenizer error when vocab is missing: suggest installi…	codebase moved on: LLaMA tokenizer has been reduced to tokenizers-backend wiring with only `__all__`, while PR modifies legacy SentencePiece get_spm_processor and adds a slow-tokenizer missing-vocab test; resolving would …
#41611	documentation	skipped	Docs add custom loss example	category not configured for this cumulative branch
#41597	documentation	skipped	Standardize RoBERTa model card following issue #36979	category not configured for this cumulative branch
#41594	feature	skipped	Add beginner-friendly sentiment analysis example	category not configured for this cumulative branch
#41593	feature	skipped	examples: add multi-label text classification (BCEWithLogitsLoss, met…	category not configured for this cumulative branch
#41584	defect	aborted	Add clear error message for missing SentencePiece model in `get_spm_processor` (fix #41553)	codebase moved on: PR inserts a guard in legacy LlamaTokenizer.get_spm_processor, but current tokenization_llama.py no longer contains that SentencePiece implementation and only exposes the tokenizers-backend class; res…
#41565	documentation	skipped	🌐 [i18n-KO] Updated `perf_train_gpu_many.md`	category not configured for this cumulative branch
#41561	feature	skipped	Optimize Mamba2 memory usage by replacing broadcast with einsum	category not configured for this cumulative branch
#41557	defect	aborted	Update modeling_llama4.py	PR does not implement the reported Llama4 RMSNorm/L2Norm defect; its only model change is an inline question/comment, while direct merge would also drag unrelated upstream/main history. No practical defect fix to apply.
#41531	documentation	skipped	🌐 [i18n-KO] Translated `video_processor.md` to Korean	category not configured for this cumulative branch
#41528	feature	skipped	Add position encoding interpolation to DeiT	category not configured for this cumulative branch
#41527	documentation	skipped	🌐 [i18n-KO] Translated selecting.md to Korean	category not configured for this cumulative branch
#41524	feature	skipped	Add max_eval_batches argument to TrainingArguments	category not configured for this cumulative branch
#41523	other	skipped	Add test coverage for ConvNextImageProcessorFast	category not configured for this cumulative branch
#41522	defect	aborted	Fix _init_weights to safely skip int8 quantized weights	codebase moved on: PR directly edits generated qwen2_5_vl modeling and replaces the current modular _init_weights override with an ad-hoc initializer plus root-level test scripts. Current tree has modular_qwen2_5_vl as …
#41491	feature	skipped	Add skip_unnecessary_grad_clip to TrainingArguments for optimized gradient clipping	category not configured for this cumulative branch
#41490	defect	aborted	Fix _init_weights to safely skip int8 tensors in Qwen2_5_VL model	codebase moved on: PR edits generated qwen2_5_vl modeling _init_weights and adds root-level tester scripts, while current tree uses modular_qwen2_5_vl as source of truth and generated modeling has diverged. Resolving sa…
#41488	other	skipped	Create 1	category not configured for this cumulative branch
#41458	feature	skipped	Adding ScatterMoE kernel support for Granite models.	category not configured for this cumulative branch
#41419	feature	skipped	First QAT for Finegrained FP8	category not configured for this cumulative branch
#41406	feature	skipped	🚨 [v5] Remove deprecated cache classes	category not configured for this cumulative branch
#41362	documentation	skipped	Added Hacktoberfest banner image to README.md	category not configured for this cumulative branch
#41356	feature	skipped	Add DEIMv2 model, image processor, and basic tests	category not configured for this cumulative branch
#41349	feature	skipped	Create (3d_parrallel_v2.py) - Add 3D parallelism training example script	category not configured for this cumulative branch
#41333	feature	skipped	Add DeepseekVLV2 Model	category not configured for this cumulative branch
#41329	defect	aborted	Fix GIL=0 segfault and Add GIL=0 compat for regex paths	codebase moved on: PR rewrites many legacy slow tokenizer files to import a new utils.safe regex wrapper, but current v5 tree has converted several of those files to tokenizers-backend compatibility shims and deleted ot…
#41315	feature	skipped	[model deprecations] Define new version-based model deprecation/deletions with user warnings/exceptions	category not configured for this cumulative branch
#41304	defect	aborted	Fix equality-vs-assignment bug in GptqHfQuantizer.update_device_map	codebase moved on: the buggy auto-gptq CPU-to-CUDA assignment branch targeted by the PR has been removed; current GPTQ validation requires gptqmodel and update_device_map only fills a default CPU map, so applying the PR…
#41299	feature	skipped	copied changes from soghomon-b:add-eval-step-limit-31561	category not configured for this cumulative branch
#41291	feature	skipped	Add-Deimv2	category not configured for this cumulative branch
#41272	feature	skipped	feat: Add HRM Model	category not configured for this cumulative branch
#41254	documentation	skipped	docs: Add Hacktoberfest banner, Contributing and License sections to README	category not configured for this cumulative branch
#41251	feature	skipped	Add deepseek 3.2 exp	category not configured for this cumulative branch
#41224	feature	skipped	Add DINOv3ViTForImageClassification support	category not configured for this cumulative branch
#41215	defect	aborted	Fix CLIP memory leak causing 600-800MB accumulation per batch	codebase moved on: PR deletes local text_outputs/vision_outputs after extracting projected tensors, but current CLIP/Aimv2/MetaClip2 get_*_features APIs now return BaseModelOutputWithPooling objects with projected poole…
#41202	feature	skipped	[WIP] standardize audio kwargs	category not configured for this cumulative branch
#41162	documentation	skipped	Add Sinhala (සිංහල) translation of README	category not configured for this cumulative branch
#41160	documentation	skipped	[docs] update tips syntax	category not configured for this cumulative branch
#41159	feature	skipped	Support setting total_train_batch_size.	category not configured for this cumulative branch
#41144	feature	skipped	Support automatic conversion from zero checkpoint to universal checkpoint.	category not configured for this cumulative branch
#41116	feature	skipped	Add MiniCPM3	category not configured for this cumulative branch
#41095	feature	skipped	Add LLaVA-OneVision-1.5 model and related configurations	category not configured for this cumulative branch
#41053	feature	skipped	Qwen3 moe	category not configured for this cumulative branch
#41041	feature	skipped	[WIP] Add YuE model	category not configured for this cumulative branch
#41040	feature	skipped	Add Keye vl 8b 1.5	category not configured for this cumulative branch
#41037	other	skipped	Tests: Apertus integration tests	category not configured for this cumulative branch
#41035	documentation	skipped	docs: update speech recognition examples to use modern Common Voice d…	category not configured for this cumulative branch
#41033	feature	skipped	feat: make audio feature extractors torch.export-able	category not configured for this cumulative branch
#41024	feature	skipped	Deprecate `max_size` in ConditionalDetrImageProcessor with warning	category not configured for this cumulative branch
#41022	documentation	skipped	🌐 [i18n-KO] Translated `backbones.md` to Korean	category not configured for this cumulative branch
#41021	documentation	skipped	🌐 [i18n-KO] Translated `video_processors.md` to Korean	category not configured for this cumulative branch
#41009	feature	skipped	Add Lexa-Delta model support	category not configured for this cumulative branch
#40976	defect	aborted	Better defaults for assisted generation	codebase moved on: assisted generation internals in candidate_generator.py and generation utils have diverged; PR removes/restores fields around assistant_generation_config, min/max length, and logits processors, making…
#40962	feature	skipped	perceptron: Isaac-0.1 implementation	category not configured for this cumulative branch
#40898	feature	skipped	Adding [T5/MT5/UMT5]EncoderForSequenceClassification	category not configured for this cumulative branch
#40888	documentation	skipped	DOC Fix help for chat and serve commands	category not configured for this cumulative branch
#40887	other	skipped	Refactor output handling in generate for cleaner decoding methods	category not configured for this cumulative branch
#40877	other	skipped	Bug #40833: Fix for kv_offset calculation for mixed padding	category not configured for this cumulative branch
#40871	other	skipped	Refactor benchmark utils: add type hints, GPU metrics helper, and con…	category not configured for this cumulative branch
#40870	feature	skipped	Reduce vRAM usage during generation by allowing to transfer logits to CPU	category not configured for this cumulative branch
#40861	feature	skipped	Support n_groups>1 for mamba2	category not configured for this cumulative branch
#40840	feature	skipped	feat: add qwen2 pruning support	category not configured for this cumulative branch
#40820	feature	skipped	Add models to benchmarks	category not configured for this cumulative branch
#40759	feature	skipped	feat: add qwen3 pruning support	category not configured for this cumulative branch
#40756	feature	skipped	[WIP] Add Canary	category not configured for this cumulative branch
#40755	feature	skipped	[TimesFM] Add support for forecasting with covariates	category not configured for this cumulative branch
#40740	defect	validation_failed	Configure assistant model generation_config with user parameters	merge introduced F821 undefined name use_model_defaults in src/transformers/generation/utils.py; reset top merge
#40738	documentation	skipped	Docs: Clarify rjieba installation for RoFormerTokenizer	category not configured for this cumulative branch
#40736	documentation	skipped	🌐 [i18n-KO] Translated jan.md to Korean	category not configured for this cumulative branch
#40728	feature	skipped	feat(serve): add OTEL	category not configured for this cumulative branch
#40714	documentation	skipped	Remove TF and Flax from README	category not configured for this cumulative branch
#40670	feature	skipped	Add ability to run Gemma 2 models without post layer norm	category not configured for this cumulative branch
#40640	defect	aborted	Resume training by trained samples to avoid elastic job loss or over-reading of data.	codebase moved on: trainer training loop is refactored into _setup_training/_run_epoch on cumulative branch while PR modifies the older monolithic loop and adds broad trainer tests; resolving sample-based resume safely …
#40648	other	skipped	Bump torch from 2.7.1 to 2.8.0 in /examples/flax/vision	category not configured for this cumulative branch
#40637	feature	skipped	[WIP]Add openpangu_dense model	category not configured for this cumulative branch
#40633	feature	skipped	Add support for Custom Accelerate Instance in Trainer	category not configured for this cumulative branch
#40587	feature	skipped	feat(utils): add vision utils for embedding images and getting the hidden size	category not configured for this cumulative branch
#40546	feature	skipped	Implement VibeVoice	category not configured for this cumulative branch
#40524	documentation	skipped	Use begin_of_sequence token in all sliding windows for correct model behaviour	category not configured for this cumulative branch
#40520	feature	skipped	[generate] add faster `stop_strings` stopping criteria	category not configured for this cumulative branch
#40515	feature	skipped	Add Context-Aware Tokenizer Selection Utility Based on Corpus Analysis	category not configured for this cumulative branch
#40505	other	skipped	Refactor Siglip-like models	category not configured for this cumulative branch
#40493	defect	aborted	Update dtypes to suit colab bf16 -> fp16 -> fp32.	codebase moved on: PR patches legacy modeling_utils dtype resolution around imports and _get_dtype; current branch has refactored imports and dtype handling, and resolving the bf16 fallback semantics would require non-m…
#40473	other	skipped	[tests] Unskip DeBERTaV2 tokenizer parity tests; re-enable fast/slow checks	category not configured for this cumulative branch
#40471	documentation	skipped	DOC: Standardize CodeGen model card (issue #36979)	category not configured for this cumulative branch
#40465	documentation	skipped	🌐 [i18n-KO] Translated `tools.md` to Korean	category not configured for this cumulative branch
#40464	documentation	skipped	🌐 [i18n-KO] Translated `agents.md` to Korean	category not configured for this cumulative branch
#40448	feature	skipped	[model] Support MiniCPM-V 4.5	category not configured for this cumulative branch
#40446	defect	aborted	Add convert_segmentation_map_to_binary_masks_sorted function for hand…	codebase moved on: PR adds a NumPy sorted-mask helper to the old mask2former image processor, but current branch has split Mask2Former processing into PIL and fast/torch implementations; resolving would require designin…
#40425	defect	aborted	Fix check_quantized_param method when param_value is a safetensors slice	codebase moved on: PR updates check_quantized_param methods in multiple quantizer classes, but the current branch has refactored quantization loading and those methods are no longer present in the touched quantizer file…
#40404	documentation	skipped	update model card for gpt-j	category not configured for this cumulative branch
#40403	feature	skipped	Customizable Logit Warping Strategies for Generation #40010	category not configured for this cumulative branch
#40400	documentation	skipped	fixed redundant words in readme.md	category not configured for this cumulative branch
#40395	documentation	skipped	🌐 [i18n-KO] Updated `text_generation.md`	category not configured for this cumulative branch
#40390	documentation	skipped	Fix typo: 'lenght' to 'length'	category not configured for this cumulative branch
#40388	feature	skipped	Add fromjson filter to Jinja2 chat templates	category not configured for this cumulative branch
#40328	feature	skipped	[rfc] Prototype to make torch.compile work with DynamicCache	category not configured for this cumulative branch
#40299	feature	skipped	Remove deprecated max_size parameter from ConditionalDetr image processors	category not configured for this cumulative branch
#40286	feature	skipped	Add MOSS-TTSD with XY-Tokenizer	category not configured for this cumulative branch
#40265	other	skipped	Enable PLW and PLE rules	category not configured for this cumulative branch
#40225	documentation	skipped	docs: clarify decoder_input_ids vs decoder_inputs_embeds usage (#39542)	category not configured for this cumulative branch
#40209	feature	skipped	Add fast image processor for ViViT	category not configured for this cumulative branch
#40180	feature	skipped	Enable native mxfp4 training support for GPT-OSS models	category not configured for this cumulative branch
#40177	defect	aborted	Revert text_positions in Qwen25VL	codebase moved on: PR targets old Qwen2.5-VL prepare_inputs_for_generation text/vision position concatenation, but current code moved position-id construction into _prepare_position_ids_for_generation and Qwen2_5_VLMode…
#40171	feature	skipped	Rename to CamelCase	category not configured for this cumulative branch
#40155	documentation	skipped	🌐 [i18n-KO] Translated `t5.md` to Korean	category not configured for this cumulative branch
#40149	feature	skipped	Implemented fast image processor for VitPose	category not configured for this cumulative branch
#40131	documentation	skipped	add missing Arabic translations	category not configured for this cumulative branch
#40115	feature	skipped	[layer_types] update layer_types with conv	category not configured for this cumulative branch
#40102	documentation	skipped	🌐 [i18n-KO] Translated `auto_docstring.md` to Korean	category not configured for this cumulative branch
#40092	feature	skipped	Optimize LlamaAttention by fusing QKV projections	category not configured for this cumulative branch
#40064	documentation	skipped	🌐 [i18n-KO] Translated `videomae.md` to Korean	category not configured for this cumulative branch
#40061	documentation	skipped	🌐 [i18n-KO] Translated `vitdet.md` to Korean	category not configured for this cumulative branch
#40058	feature	skipped	GGUF Qwen2VL	category not configured for this cumulative branch
#40055	feature	skipped	Auto-log parallelism info to wandb.config using HF Accelerate	category not configured for this cumulative branch
#40047	documentation	skipped	Update wavlm.md to match new model card template	category not configured for this cumulative branch
#40023	feature	skipped	Add support for SDPA for OWLViT and OWLv2	category not configured for this cumulative branch
#39987	feature	skipped	Add a VGGT(Visual Geometry Grounded Transformer) model compatible with huggingface transfromers	category not configured for this cumulative branch
#39941	other	skipped	fixing image_utils.py todo	category not configured for this cumulative branch
#39931	feature	skipped	Registers StaticCache serialization functions for torch.export.export	category not configured for this cumulative branch
#39922	documentation	skipped	🌐 [i18n-KO] Translated `attention_interface.md` to Korean	category not configured for this cumulative branch
#39920	documentation	skipped	🌐 [i18n-KO] Updated ko/perf_train_special.md	category not configured for this cumulative branch
#39917	documentation	skipped	🌐 [i18n-KO] Updated ko/perf_train_cpu.md	category not configured for this cumulative branch
#39901	documentation	skipped	🌐 [i18n-KO] Translated `fp_quant` to Korean	category not configured for this cumulative branch
#39899	feature	skipped	[model] Support MiniCPM-V 4.0	category not configured for this cumulative branch
#39895	feature	skipped	Add Videoprism	category not configured for this cumulative branch
#39886	documentation	skipped	🌐 [i18n-KO] Translated `perf_train_gaudi.md` to Korean	category not configured for this cumulative branch
#39859	feature	skipped	WIP: Initial support for bnb 4bit on any nn.Parameter	category not configured for this cumulative branch
#39831	other	skipped	refactor(modeling_llama): make RotaryEmbedding default path explicit	category not configured for this cumulative branch
#39807	documentation	skipped	🌐 [i18n-KO] Translated `bamba.md` to Korean	category not configured for this cumulative branch
#39796	feature	skipped	[pipelines] text-to-audio pipeline standardization	category not configured for this cumulative branch
#39792	defect	aborted	Served models handle with nested content	codebase moved on: PR modifies legacy src/transformers/commands/serving.py and tests/commands/test_serving.py, but both paths are deleted on the cumulative branch/current v5-era tree; resolving would require porting the…
#39785	defect	aborted	fix mllama integration tests	codebase moved on: PR changes mllama hidden-state/intermediate-state return wiring in modeling_mllama.py, but that section has diverged substantially on the cumulative branch; resolving would require understanding the n…
#39772	defect	aborted	Fix missing initializations for models created in 2022	codebase moved on: PR touches weight-initialization code across dozens of model files, many of which have diverged or are generated/modular; direct merge produced broad conflicts that are not safe to resolve mechanicall…
#39760	feature	skipped	[Draft] Add Llasa TTS family of models	category not configured for this cumulative branch
#39751	documentation	skipped	🌐 [i18n-KO] Translated `text-to-speech.md` to Korean	category not configured for this cumulative branch
#39735	defect	aborted	handle multimodal models with tp_plan on the text_config	codebase moved on: PR expects older distribute_model(model, distributed_config, device_mesh, tp_size) API and injects tp_plan lookup there; current code has distribute_model(model, tp_plan, distributed_config, device_me…
#39722	feature	skipped	[Feat] Adding Intern-S1	category not configured for this cumulative branch
#39718	documentation	skipped	Fix SigLIP2 documentation model/processor mismatch	category not configured for this cumulative branch
#39708	documentation	skipped	🌐[i18n-bn] Introduce Bengali version of Transformers documentation	category not configured for this cumulative branch
#39690	feature	skipped	Allow custom hf_quantizer in from_pretrained	category not configured for this cumulative branch
#39632	documentation	skipped	fix dead NVIDIA link	category not configured for this cumulative branch
#39631	feature	skipped	[serve] Add speech-to-text	category not configured for this cumulative branch
#39588	feature	skipped	WIP, reference modeling	category not configured for this cumulative branch
#39575	documentation	skipped	🌐 [i18n-KO] Translated `vitpose.md` to Korean	category not configured for this cumulative branch
#39563	documentation	skipped	🌐 [i18n-KO] Translated `vision-encoder-decoder.md` to Korean	category not configured for this cumulative branch
#39559	documentation	skipped	🌐 [i18n-KO] Translated `main_classes/deepspeed.md` to Korean	category not configured for this cumulative branch
#39555	other	skipped	[WIP] try to relax the tie_weights method	category not configured for this cumulative branch
#39544	documentation	skipped	🌐 [i18n-KO] Translated feature_extractors.md to Korea	category not configured for this cumulative branch
#39541	feature	skipped	Add Muon optimizer implementation and integration	category not configured for this cumulative branch
#39534	feature	skipped	Add Beit3 model	category not configured for this cumulative branch
#39517	documentation	skipped	🌐 [i18n-KO] Translated `compressed_tensor.md` to Korean	category not configured for this cumulative branch
#39480	feature	skipped	Add model arcinstitute state	category not configured for this cumulative branch
#39466	documentation	skipped	README: Update Bert Japanese model card	category not configured for this cumulative branch
#39456	defect	aborted	Fix quantized model initialization for int8 dtypes	codebase moved on: quantized initialization path was refactored/already addressed by PR 39491, and PR also modifies fast image processor files that are deleted in current v5 tree
#39435	defect	aborted	Add a unit test for BartModel to compare eager, sdpa on one particular set of inputs	codebase moved on: cherry-pick of test-only regression PR conflicts in tests/models/bart/test_modeling_bart.py; PR states it is not a fix yet and current file structure differs, so resolving a known-failing test without…
#39403	feature	skipped	Add Vocos model	category not configured for this cumulative branch
#39357	documentation	skipped	Update docstring for glm4v	category not configured for this cumulative branch
#39353	defect	aborted	fix colpali mapping	codebase moved on: conflicts in modeling_utils VLM compatibility area removed/reworked in current tree and ColPali now uses base_model_prefix=vlm; applying old checkpoint mapping would need non-trivial review of the cur…
#39309	defect	aborted	Fix audio pipeline with torchcodec input	codebase moved on: audio pipeline torch/tensor handling has been reworked (ASR now accepts torch tensors and averages multi-channel audio), causing conflicts in pipeline code and test expectations; resolving would requi…
#39303	documentation	skipped	Fix critical typos in code example	category not configured for this cumulative branch
#39293	feature	skipped	Add T5LA models	category not configured for this cumulative branch
#39251	defect	aborted	Fix slow test_moshika_greedy_unconditional_fp16	codebase moved on: PR modifies old cache_utils Cache class/helper layout and an older Moshi prepare_inputs_for_generation implementation, while current branch uses CacheLayerMixin and delegates Moshi input preparation t…
#39236	feature	skipped	added moment_p sampling	category not configured for this cumulative branch
#39222	other	skipped	Enable granite 4 hybrid integration tests	category not configured for this cumulative branch
#39212	documentation	skipped	Add Ukrainian translation of README.md	category not configured for this cumulative branch
#39209	other	skipped	Standardize FSMT class naming: PretrainedFSMTModel → PreTrainedFSMTModel	category not configured for this cumulative branch
#39183	feature	skipped	Add a chat extra	category not configured for this cumulative branch
#39150	feature	skipped	Efficient Expert Weight Fusion for Moe deepseek v3	category not configured for this cumulative branch
#39140	feature	skipped	feat(trainer): emergency checkpointing on crashes & SIGTERM/SIGINT	category not configured for this cumulative branch
#39109	defect	aborted	Fix: rename 'eval_strategy' to 'evaluation_strategy' in TrainingArgum…	codebase moved on: PR renames TrainingArguments eval_strategy to evaluation_strategy in a stale one-file version, but current TrainingArguments has been extensively reorganized and still intentionally exposes eval_strat…
#39108	defect	aborted	Disable static cache on certain MoE models	codebase moved on: PR branch has stale DeepseekV3/Dots1 generated code and conflicts in modular_deepseek_v3; current files no longer contain the targeted static-cache override and the PR head diff only removes stale _ca…
#39084	other	skipped	Refactor gemma3n	category not configured for this cumulative branch
#39047	feature	skipped	RFC: refactor causal lm loss to handle lm_head in loss function	category not configured for this cumulative branch
#39009	other	skipped	Add submodels support check function	category not configured for this cumulative branch
#38991	feature	skipped	Remove `return_dict` kwarg from all the models	category not configured for this cumulative branch
#38988	defect	aborted	Check docstring inside modular files as well	codebase moved on: PR mixes a checker change with many stale modular/generated model cleanups; direct merge conflicts across Makefile, utils/check_docstrings.py, deleted fast-image/args_doc artifacts, and dozens of modu…
#38962	other	skipped	Update test_candidate_generator.py	category not configured for this cumulative branch
#38959	documentation	skipped	Updated the model card for wav2vec2-phoneme	category not configured for this cumulative branch
#38958	documentation	skipped	Updated model card for wav2vec2-conformer	category not configured for this cumulative branch
#38957	documentation	skipped	Update wav2vec2-bert model card	category not configured for this cumulative branch
#38956	documentation	skipped	Updating model card for wav2vec2	category not configured for this cumulative branch
#38955	documentation	skipped	docs: Musicgen melody model card	category not configured for this cumulative branch
#38926	documentation	skipped	Clarify Python and framework version support in installation.md	category not configured for this cumulative branch
#38923	other	skipped	Remove deprecated max_size support from YOLOS image processor	category not configured for this cumulative branch
#38908	defect	aborted	Add support to use config dtype in HybridChunkedCache	codebase moved on: PR changes HybridChunkedCache dtype initialization, but current v5 cache_utils has removed HybridChunkedCache/HybridCache and ends with StaticCache plus a SlidingWindowCache alias; applying the one-li…
#38893	other	skipped	Fix/deprecate max size conditional detr	category not configured for this cumulative branch
#38886	feature	skipped	Allow compile with bnb	category not configured for this cumulative branch
#38877	documentation	skipped	DOC: Clarify attention_mask usage in BertModel forward method	category not configured for this cumulative branch
#38861	feature	skipped	Add SamImageProcessorFast with 4x performance improvement	category not configured for this cumulative branch
#38859	feature	skipped	Add MobileViT fast image processor	category not configured for this cumulative branch
#38839	other	skipped	[DO NOT MERGE] Testing saftensors 0.6.0	category not configured for this cumulative branch
#38810	feature	skipped	Add kwargs support in WhisperForConditionalGeneration	category not configured for this cumulative branch
#38805	feature	skipped	Add Dust3R	category not configured for this cumulative branch
#38793	documentation	skipped	Fix Typos in Comments and Improve Clarity	category not configured for this cumulative branch

vai-minzhou and others added 30 commits April 24, 2026 02:00

Merge branch 'main' into fix/seq2seq-decoder-encoder-attention-mask

b8ef47c

Add 'requests' to serving extras dependencies

b7689c6

Only installing transformers[serving] failed to launch transformers serve due to the lack of requests dependency

Merge pull request #1 from Oneirag/Oneirag-missing-requests-serving

b15878b

Add 'requests' to serving extras dependencies

Merge branch 'main' into fix/eta-warper-all-inf

4df9607

Fix local trust_remote_code cache key collisions

7889d44

Move repetition penalty guard to logits processor

08ac3d8

Merge branch 'main' into fix-repetition-penalty-inputs-embeds

0361926

Fix xdist collisions for captured_info artifacts and preserve CI debug logs

47a512b

Truncate hash to 16 chars to prevent Windows path length issues

9abd5e7

Skip CPU param materialization on non-rank-0 FSDP ranks to avoid OOM

74480d4

Merge branch 'main' into gemma4-fix

388ad09

update

6165de2

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

Merge branch 'main' into main

d94ced8

Merge branch 'main' into torch-type

7a52c40

Fix EP+FSDP2: wrap EP-sharded params as DTensors and exclude experts from FSDP

deb916e

mappings on classes, scoping for every transforms

7ad712a

fix style

c63a7d8

cleanup imports

17de22d

Merge remote-tracking branch 'upstream/main' into improve-weight-converter

f2d7154

Fix deduplication removes submodel mappings of the same type

8f726c7

Fix scoped WeightConverter not applied in the correct order, now interleaving renames and converts ops

fd2c613

temp fix paligemma

28ed270

Apply _local() to expert biases under EP

24660f6

Fix import ordering

37c106b

Fix incompatible mappings between head and base model for VLMs

c75244e

glmasr should be in AutoModelForMultimodalLM

dcf9519

add dia to MODEL_FOR_TEXT_TO_WAVEFORM_MAPPING_NAMES

cb7ba4d

evalstate added 29 commits April 29, 2026 22:48

Apply PR huggingface#40358 MXFP4 MLP shape fix

3eda195

Apply PR huggingface#40208 FSDP save_only_model sharded state fix

d6fd902

Merge branch 'mergeability-pr-40148' into all-defects-750

b9f367e

Apply Mixtral torch.export expert loop fix (huggingface#40114)

66137a3

Merge branch 'mergeability-pr-40090' into all-defects-750

0402db0

# Conflicts: # src/transformers/modeling_utils.py

Merge branch 'mergeability-pr-40065' into all-defects-750

9b016f5

# Conflicts: # src/transformers/loss/loss_utils.py

Merge branch 'mergeability-pr-40059' into all-defects-750

f375ccf

# Conflicts: # src/transformers/models/gpt2/configuration_gpt2.py

Merge branch 'mergeability-pr-40022' into all-defects-750

37c36d6

# Conflicts: # src/transformers/models/doge/modeling_doge.py # src/transformers/models/doge/modular_doge.py

Apply PR 39999 tensor parallel meta device-map fix

753cdc7

Adapted from huggingface#39999 because tensor parallel setup moved from modeling_utils.py to integrations/tensor_parallel.py.

Merge branch 'mergeability-pr-39997' into all-defects-750

90242c2

Merge branch 'mergeability-pr-39866' into all-defects-750

254ff61

# Conflicts: # src/transformers/trainer.py

Apply PR huggingface#39794 fix for ProphetNet tuple encoder outputs

cd54479

Merge branch 'mergeability-pr-39793' into all-defects-750

856779d

Merge branch 'mergeability-pr-39741' into all-defects-750

b883fe4

Apply PR huggingface#39698 Exaone4 sliding window pattern fix

6267345

Merge branch 'mergeability-pr-39697' into all-defects-750

4edb60c

Apply PR huggingface#39683 respect dynamo disable env

d4d2a70

Apply PR huggingface#39674 scale loss by data parallel size

534f37c

Apply PR huggingface#39599: guard missing TrainerState on resume

f8340cd

Direct merge of 9ec50bc conflicted after Trainer refactors; apply equivalent current-code guard.

Apply PR huggingface#39560: save best model on eval

a826098

Direct merge of ee690bc conflicted after Trainer/TrainingArguments refactors; apply the current-code equivalent for misaligned save/eval steps.

Merge branch 'mergeability-pr-39493' into all-defects-750

91dc36c

# Conflicts: # setup.py

Merge branch 'mergeability-pr-39491' into all-defects-750

050c5b7

Apply quantized dispatch fix from PR 39468

ab69093

Merge branch 'mergeability-pr-39257' into all-defects-750

7675317

# Conflicts: # src/transformers/utils/generic.py

Apply PR 39206: handle empty Qwen3 MoE router logits

a83d5b3

Apply PR 39103 Gemma3n audio token config rename

cf3dd54

Apply PR 39046 DETR max_size handling

fe5e536

Apply PR 39037 Kosmos2 attention causality fix

f827509

Merge branch 'mergeability-pr-38888' into all-defects-750

2143f93


Cumulative defect fixes from recent Transformers PRs #43

evalstate wants to merge 633 commits into main from all-defects-750

20 participants

Conversation

evalstate commented Apr 29, 2026

Cumulative defect fixes from recent Transformers PRs

Status counts

Category counts

Validation

Details

evalstate commented Apr 29, 2026

All-defects flow status

Merged / applied defect records (244)

Rejected / not included records (518)
