Cumulative defect fixes from recent Transformers PRs #43

Open
evalstate wants to merge 633 commits into main from all-defects-750

Conversation

@evalstate
Owner

Cumulative defect fixes from recent Transformers PRs

This PR is generated by the all-defects mergeability flow. It accumulates defect-fix PRs from huggingface/transformers that could be applied cleanly to the current base.

  • Source branch: all-defects-750
  • Base: evalstate/transformers:main
  • Head: 2143f93a30
  • PRs classified: 762
  • PRs with terminal state: 762
  • Applied/merged/already-present defect fixes: 244
  • Aborted defect fixes: 50
  • Validation failures reverted: 10
  • Non-defect skipped: 458

Status counts

  • aborted: 50
  • already_present: 51
  • applied: 43
  • merged: 150
  • skipped: 458
  • validation_failed: 10

Category counts

  • defect: 304
  • documentation: 99
  • feature: 253
  • other: 106
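
The summary, status, and category figures above can be cross-checked against each other. The following is a minimal sketch that uses only the numbers quoted in this description; the variable names are illustrative and are not taken from the flow's own tooling.

```python
# Counts copied from the PR description above.
status_counts = {
    "aborted": 50,
    "already_present": 51,
    "applied": 43,
    "merged": 150,
    "skipped": 458,
    "validation_failed": 10,
}
category_counts = {"defect": 304, "documentation": 99, "feature": 253, "other": 106}
total_classified = 762

# "Applied/merged/already-present defect fixes" from the summary list.
landed = sum(status_counts[s] for s in ("applied", "merged", "already_present"))

# Every defect PR ends up landed, aborted, or reverted after failed validation.
defect_total = landed + status_counts["aborted"] + status_counts["validation_failed"]

# Everything that is not a defect is skipped.
non_defect = sum(v for k, v in category_counts.items() if k != "defect")

assert sum(status_counts.values()) == total_classified
assert sum(category_counts.values()) == total_classified
assert landed == 244
assert defect_total == category_counts["defect"]
assert non_defect == status_counts["skipped"]
```

All five assertions hold, so the three lists describe the same 762 PRs.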

Validation

Each applied defect fix was followed by the configured lightweight validation profile:

  1. compileall -q src/transformers
  2. utils/checkers.py ruff_check,ruff_format,init_isort,sort_auto_mappings
  3. utils/tests_fetcher.py ... && pytest ... when impacted pytest targets are selected

Note: this is intentionally not an end-to-end or slow-test validation pass.
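
The profile above runs its steps in order and treats the first failure as a validation failure. A rough sketch of that control flow follows. It assumes a fail-fast runner (the description does not say whether later steps are skipped), and `run_profile` is a hypothetical helper, not part of the flow. The demo commands are placeholders so that the sketch runs anywhere. In the real flow they would be the compileall, `utils/checkers.py`, and `utils/tests_fetcher.py`/pytest invocations listed above.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_profile(steps: Sequence[Sequence[str]]) -> Optional[int]:
    """Run each command in order; return the index of the first failing step, or None."""
    for index, command in enumerate(steps):
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            return index  # stop at the first failure; later steps are not run
    return None


# Placeholder steps (assumption): the second one fails on purpose.
demo_steps = [
    [sys.executable, "-c", "print('step 1 ok')"],
    [sys.executable, "-c", "import sys; sys.exit(1)"],
    [sys.executable, "-c", "print('never reached')"],
]
first_failure = run_profile(demo_steps)
```

A caller would presumably revert the just-applied fix whenever `run_profile` returns an index rather than `None`, which would correspond to the `validation_failed` status.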

Details

A detailed status table is posted as a PR comment and is also available locally in:

  • .mergeability/defect-merge-state.jsonl
  • .mergeability/pr-classifications.jsonl
  • all-defects-report.md
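
The schema of the `.jsonl` files is not shown in this PR. Assuming each line is a JSON object with a `status` field (a hypothetical field name, as is `pr`), the status counts could be recomputed with a sketch like this one, which reads from an in-memory sample instead of the real file:

```python
import json
from collections import Counter
from io import StringIO
from typing import Iterable


def tally_statuses(lines: Iterable[str]) -> Counter:
    """Count records per status, skipping blank lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        counts[record["status"]] += 1  # "status" is an assumed field name
    return counts


# Three illustrative records; the real file layout may differ.
sample = StringIO(
    '{"pr": 45692, "status": "merged"}\n'
    '{"pr": 45681, "status": "already_present"}\n'
    '{"pr": 45691, "status": "merged"}\n'
)
counts = tally_statuses(sample)
```

To run it on the real state file, pass an open file handle for `.mergeability/defect-merge-state.jsonl` in place of `sample`, after adjusting the field name to the actual schema.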

vai-minzhou and others added 30 commits April 24, 2026 02:00
Per @Rocketknight1's review: replace the ad-hoc `_no_reinit` flag with the
existing `_is_hf_initialized` flag that `from_pretrained` already sets on
checkpoint-loaded parameters. Guard each Mamba2 init target
(A_log / D / dt_bias) and the residual-scaled `out_proj.weight`
independently, so parameters restored from a checkpoint survive any
subsequent `_init_weights` pass.
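
The guard described in this commit message can be illustrated without PyTorch. The sketch below uses a stand-in parameter class and placeholder default values. `FakeParam`, `init_if_needed`, and `init_mixer` are hypothetical names, and only the `_is_hf_initialized` flag comes from the commit message. It is not the actual model code.

```python
class FakeParam:
    """Stand-in for a parameter: holds a value and, if loaded, the flag."""

    def __init__(self, value, loaded=False):
        self.value = value
        if loaded:
            # Flag set on checkpoint-loaded parameters, per the commit message.
            self._is_hf_initialized = True


def init_if_needed(param, fresh_value):
    """Overwrite param only when it was not restored from a checkpoint."""
    if getattr(param, "_is_hf_initialized", False):
        return False
    param.value = fresh_value
    return True


def init_mixer(params):
    # Each target is guarded independently, so a partially loaded module
    # still gets fresh values for the parameters the checkpoint lacked.
    # The default values here are arbitrary placeholders.
    defaults = {"A_log": 0.0, "D": 1.0, "dt_bias": 0.5, "out_proj.weight": 0.02}
    return {name: init_if_needed(params[name], value) for name, value in defaults.items()}


params = {
    "A_log": FakeParam(3.0, loaded=True),
    "D": FakeParam(None),
    "dt_bias": FakeParam(7.0, loaded=True),
    "out_proj.weight": FakeParam(None),
}
touched = init_mixer(params)
```

Here `A_log` and `dt_bias` keep their checkpoint values, while `D` and `out_proj.weight` receive fresh ones.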
Fixed 4 test(s):
- tests/models/qianfan_ocr/test_modeling_qianfan_ocr.py::QianfanOCRIntegrationTest::test_model_integration_batched_generate
- tests/models/qianfan_ocr/test_modeling_qianfan_ocr.py::QianfanOCRIntegrationTest::test_model_integration_forward
- tests/models/qianfan_ocr/test_modeling_qianfan_ocr.py::QianfanOCRIntegrationTest::test_model_integration_generate
- tests/models/qianfan_ocr/test_modeling_qianfan_ocr.py::QianfanOCRIntegrationTest::test_model_integration_generate_text_only
Installing only `transformers[serving]` failed to launch `transformers serve` because the `requests` dependency was missing
Add 'requests' to serving extras dependencies
…ined

When a caller passes a pre-built sub-processor via kwargs to
`AutoProcessor.from_pretrained` (e.g. `tokenizer=tok` or `bpe_tokenizer=tok`),
use the instance directly instead of silently forwarding it into the
sub-loader calls. Exact attribute names take precedence; the canonical
modality name is also accepted as an alias when a single sub-processor has
that modality.
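
The precedence rule in this commit message (exact attribute name first, modality alias only when unambiguous) might look roughly like the following. This is a sketch under assumptions, not the code in `AutoProcessor.from_pretrained`: `resolve_prebuilt` and the `attribute_modalities` mapping are hypothetical.

```python
def resolve_prebuilt(kwargs, attribute_modalities):
    """Map sub-processor attribute names to pre-built instances found in kwargs.

    attribute_modalities maps each attribute name to its canonical modality,
    e.g. {"bpe_tokenizer": "tokenizer", "image_processor": "image_processor"}.
    """
    resolved = {}
    # Pass 1: exact attribute names take precedence.
    for attribute in attribute_modalities:
        if attribute in kwargs:
            resolved[attribute] = kwargs[attribute]
    # Pass 2: accept the modality name as an alias, but only when exactly
    # one sub-processor has that modality.
    for attribute, modality in attribute_modalities.items():
        if attribute in resolved or modality not in kwargs:
            continue
        owners = [a for a, m in attribute_modalities.items() if m == modality]
        if len(owners) == 1:
            resolved[attribute] = kwargs[modality]
    return resolved


single = resolve_prebuilt(
    {"tokenizer": "TOK"},
    {"bpe_tokenizer": "tokenizer", "image_processor": "image_processor"},
)
ambiguous = resolve_prebuilt(
    {"tokenizer": "TOK"},
    {"tokenizer": "tokenizer", "bpe_tokenizer": "tokenizer"},
)
```

In the first call the alias resolves to `bpe_tokenizer`. In the second, two attributes share the modality, so only the exact name `tokenizer` is filled.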
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
# Conflicts:
#	src/transformers/modeling_utils.py
# Conflicts:
#	src/transformers/loss/loss_utils.py
# Conflicts:
#	src/transformers/models/gpt2/configuration_gpt2.py
# Conflicts:
#	src/transformers/models/doge/modeling_doge.py
#	src/transformers/models/doge/modular_doge.py
Adapted from huggingface#39999 because tensor parallel setup moved from modeling_utils.py to integrations/tensor_parallel.py.
# Conflicts:
#	src/transformers/trainer.py
Direct merge of 9ec50bc conflicted after Trainer refactors; apply equivalent current-code guard.
Direct merge of ee690bc conflicted after Trainer/TrainingArguments refactors; apply the current-code equivalent for misaligned save/eval steps.
# Conflicts:
#	src/transformers/utils/generic.py
@evalstate
Owner Author

All-defects flow status

Processed terminal records: 762

Merged / applied defect records (244)

PR Status Method Validation Original PR summary / goal Merge note
huggingface#45692 merged merge passed [Fix Phi4 test] Add option to override image_processor_auto_map with local code when trust_remote_code is True clean merge; validation passed with bounded light pytest targets
huggingface#45691 merged merge passed [serve] cb error clean merge; validation passed with bounded light pytest targets
huggingface#45687 merged merge passed fix: Made histc_input robust for broader hardware clean merge; validation passed with bounded light pytest targets
huggingface#45686 merged merge passed Fix custom-module copies inheriting read-only permissions clean merge; validation passed with bounded light pytest targets
huggingface#45683 merged merge passed Exclude audio modules from conversion process clean merge; fixed PR-local whitespace/formatting and validation passed
huggingface#45682 merged merge unavailable FIX Restore LoRA hotswapping functionality merge clean; compile and style passed; light validation unavailable because torchcodec cannot load without system FFmpeg libraries
huggingface#45681 already_present none not_run Restore TokenizersBackend override for DeepSeek V3/R1 tokenizer dispatch current branch already has equivalent DeepSeek TokenizersBackend override and regression coverage; PR conflicts only because it edits the same implemented block
huggingface#45678 merged merge unavailable Fix shared config mutation issue in flash_attn_from_config merge clean; compile and style passed; light validation unavailable because torchcodec cannot load without system FFmpeg libraries
huggingface#45671 merged merge unavailable Update latest revision for Phi-4-multimodal test merge clean; compile and style passed; light validation unavailable because torchcodec cannot load without system FFmpeg libraries
huggingface#45670 merged merge unavailable [nit] glmasr should be in AutoModelForMultimodalLM merge clean; compile and style passed; light validation unavailable because torchcodec cannot load without system FFmpeg libraries
huggingface#45662 merged merge passed Fix EP + FSDP2: experts silently overwritten by rank-0 broadcast clean merge; validation passed with bounded light pytest targets
huggingface#45661 merged merge passed [Weight Converter] More fine-grained mappings on classes, scoping for every transforms (including weight converter) clean merge; validation passed with bounded light pytest targets
huggingface#45649 merged merge passed Fix OOM regression for FSDP2 + cpu_ram_efficient_loading on large models clean merge; compile/style and light validation passed (202 pytest passed, 107 skipped)
huggingface#45645 merged merge passed Fix xdist collisions for captured_info artifacts and preserve CI debug logs clean merge; compile/style and light validation passed (202 pytest passed, 107 skipped)
huggingface#45642 merged merge passed Fix trust_remote_code local cache collisions for local models (huggingface#45632) merge required an additive test-file conflict resolution with prior dynamic-module test coverage; validation passed
huggingface#45639 already_present none not_run Make patched testing debug logs xdist-safe equivalent xdist-safe captured_info path/reset behavior already merged via huggingface#45645; direct merge conflicted only with the broader implementation/tests
huggingface#45694 merged merge passed Fix train_batch_size and eval_batch_size to respect split_batches config merge required PR-local ruff formatting; validation passed
huggingface#45627 merged merge passed Processing Utils: honor pre-built sub-processor kwargs in from_pretrained clean merge; validation passed with bounded light pytest targets
huggingface#45615 merged merge passed fix(qianfan_ocr): add XPU expectations clean merge; validation passed with bounded light pytest targets
huggingface#45697 applied cherry-pick passed fix(testing): check torch.cuda.is_available() before get_device_capability cherry-picked PR fix to avoid dragging unrelated upstream/main commits; removed stale local import so ruff passed; validation passed
huggingface#45614 merged merge passed Add missing requests dependency to transformers[serving] clean merge; fixed PR-local setup.py formatting and validation passed
huggingface#45594 merged merge passed fix(utils): Resolve backbone utils test regressions clean merge; validation passed with bounded light validation
huggingface#45591 merged merge passed [nemotron_h] respect _no_reinit flag on dt_bias and out_proj.weight clean merge; compile/style and bounded light validation passed
huggingface#45578 merged merge passed Remove attribute_map from GptOssConfig clean merge; compile/style and bounded light validation passed (202 pytest passed, 103 skipped)
huggingface#45570 merged merge passed Fix whisper long-form generation when eos_token_id is a list clean merge; compile/style and bounded light validation passed (202 pytest passed, 103 skipped)
huggingface#45568 merged merge passed Gemma4: fix failed test cases clean merge; compile/style and bounded light validation passed (202 pytest passed, 103 skipped)
huggingface#45552 merged merge passed Remove warnings for modernbert clean merge; compile/style and bounded light validation passed (202 pytest passed, 103 skipped)
huggingface#45549 merged merge passed fix: apply channel averaging correctly in audio feature extractors clean merge; compile/style and bounded light validation passed (202 pytest passed, 103 skipped)
huggingface#45548 merged merge passed Fix EP + DeepSpeed ZeRO-3 loading via accelerate launch clean merge; removed duplicate has_ep introduced by overlap with cumulative branch and validation passed
huggingface#45541 merged merge passed Fix local_files_only tokenizer fallback when tokenizer files are missing (Issue 45538) clean merge; validation passed
huggingface#45524 merged merge passed utils: handle flash_attn missing from importlib packages_distributions without crashing clean merge; validation passed
huggingface#45523 merged merge passed Fix Seq2SeqLM ExecuTorch export: add encoder_attention_mask to decoder and use static encoder shapes clean merge; validation passed
huggingface#45487 merged merge passed Fix model parallel issue for altclip model and ChineseClip model clean merge; validation passed
huggingface#45423 merged merge passed Fix void segmentation map label reduction clean merge; validation passed
huggingface#45422 merged merge passed Drop from messages in clean merge; validation passed
huggingface#45413 merged merge passed Fix EtaLogitsWarper on fully masked logits clean merge; validation passed
huggingface#45389 merged merge passed Require input_ids for repetition penalty clean merge; validation passed
huggingface#45379 merged merge passed fix(config): add deepstack_visual_indexes to Qwen3_5MoeVisionConfig clean merge; validation passed
huggingface#45378 merged merge passed fix(mistral): guard ReasoningEffort import for older mistral_common versions clean merge; ruff format applied to changed file and amended; validation passed
huggingface#45360 merged merge passed Replace deprecated references with clean merge; validation passed
huggingface#45351 already_present none not_run fix(testing_utils): guard get_device_capability with torch.cuda.is_available() current branch already guards CUDA/ROCM get_device_capability with torch.cuda.is_available(); direct PR merge only conflicted on later get_device_properties refactor/XPU availabil…
huggingface#45346 merged merge passed Fix Double Application of Softmax for Router Logits in MoE models clean merge; validation passed
huggingface#45342 merged merge passed Use _keys_to_ignore_on_load_unexpected/missing recursively from children clean merge; ruff format applied to changed file and amended; validation passed
huggingface#45321 merged merge passed Remove references to torchao's AffineQuantizedTensor merged with trivial import conflict resolved by removing obsolete AffineQuantizedTensor import; validation passed
huggingface#45317 merged merge passed Fix AttributeError in _patch_mistral_regex when fix_mistral_regex=True clean merge; validation passed
huggingface#45221 applied patch passed user friendly error when loading audio from video applied narrow video-container error check for audio loading; direct PR-head merge would drag hundreds of unrelated old-main changes
huggingface#45202 merged merge passed Fix gemma4 has flash-attention incompatbile head-dim=512 merged Gemma4 FlashAttention guard fix; validation passed
huggingface#45193 merged merge passed Config can apply pyndatic validation without torch-dependence merged pydantic type-validator fix; validation passed
huggingface#45170 applied patch passed layrnorm -> layernorm applied CLIP-like pre_layernorm typo fix; validation passed
huggingface#45147 applied patch passed Fix broken HQQ support applied HQQ quantization and serialization fixes; direct merge avoided because PR head would bring 734 unrelated upstream-history files
huggingface#45128 applied patch passed Fix: handle future annotations in _process_kwargs_parameters applied future-annotations auto_docstring crash fix and amended formatting; direct merge avoided because PR head diff included 1049 unrelated upstream-history files
huggingface#45114 applied patch passed fix: lets fix all doctests applied documentation doctest fixes; resolved a simple one-line markdown conflict; no pytest targets selected
huggingface#45105 applied patch passed Fix @auto_docstring crash with from future import annotations in _process_kwargs_parameters applied focused auto_docstring string-annotation fix and tests; validation passed
huggingface#45086 merged merge passed fix AttributeError in _patch_mistral_regex merged tokenizer regex AttributeError fix; compileall, repo checkers, and light validation passed
huggingface#45060 merged merge passed Fix PIL backend fallback when torchvision is unavailable merged PIL backend fallback fix; compileall, repo checkers, and light validation passed
huggingface#45056 merged merge unavailable [auto_docstring] needs to be only run on doc merged auto_docstring lazy doc fix with straightforward conflict resolution; compileall and repo checkers passed, light validation timed out as baseline did
huggingface#45055 applied patch passed Save model config in Trainer checkpoints for non-PreTrainedModel models applied narrow Trainer _save config persistence fix; direct merge conflicted across trainer refactor, but patch was localized and validation passed
huggingface#45034 merged merge passed Pass packed boundary metadata to Qwen3.5 linear-attention fast kernels from data collator merged Qwen3.5 packed boundary metadata fix; resolved conflicts by combining prior cached multi-token fix with seq_idx/cu_seqlens fast-path changes; validation passed
huggingface#45017 merged merge passed [WIP][Fix] GLM 5 set apply_rotary_pos_emb to is_neox_style=False && remove F.relu() merged GLM-5 DSA rotary/FP8 indexer fix; resolved modular/generated cache/import conflicts and validation passed
huggingface#44981 merged merge unavailable Trainer: set skip_logits for loss-only eval when liger enabled merged Trainer skip_logits loss-only eval fix; compileall and repo checkers passed, light validation timed out as baseline did
huggingface#44973 merged merge unavailable Fix max_seqlen type in vision attention for torch.compile + FA2 merged vision attention max_seqlen .item() fix; compileall and repo checkers passed, light validation timed out as baseline did
huggingface#44958 merged merge unavailable fixed import error with PILImageResampling merged PILImageResampling import error fix; amended auto-fix for unused import; compileall and repo checkers passed, light validation timed out as baseline did
huggingface#44952 merged merge unavailable Fix: Add correct return behaviour when output_hidden_states=True for CLIP and SIGLIP vision models merged CLIP/SigLIP hidden_states output fix; compileall and repo checkers passed, light validation timed out as baseline did
huggingface#44940 merged merge unavailable Fix tie_weights skipping logic is not tied to model thread scope merged tie_weights thread-scope fix; resolved initialization.py overlap with meta_device_safe_creation_ops; compileall and repo checkers passed, light validation timed out as base…
huggingface#44923 merged merge unavailable fix: avoid unconditional model_info call in _patch_mistral_regex merged offline/local _patch_mistral_regex model_info guard; resolved is_local overlap and amended formatting; compileall and repo checkers passed, light validation timed out as ba…
huggingface#44907 merged merge unavailable Remove unnecessary expand_as in get_placeholder_mask across VLMs merged VLM placeholder mask broadcast/compile fix; resolved ColQwen2 visual API overlap while keeping current self.vlm.visual call; compileall and repo checkers passed, light vali…
huggingface#44893 merged merge unavailable add StaticLayer.crop() to match DynamicLayer API merged StaticLayer.crop API fix; compileall and repo checkers passed, light validation timed out as baseline did
huggingface#44889 merged merge unavailable [DeepSpeed] Fix evaluate()/predict() before train() merged DeepSpeed evaluate-before-train fix; compileall and repo checkers passed, light validation timed out as baseline did
huggingface#44836 merged merge unavailable Add cu_seqlens support to OlmoHybridGatedDeltaNet for packed sequences merged OlmoHybrid packed-sequence cu_seqlens fix; compileall and repo checkers passed, light validation timed out as baseline did
huggingface#44827 merged merge unavailable Fix Mistral4 tests merged Mistral4 grouped-mm alignment/test fix; compileall and repo checkers passed, light validation timed out as baseline did
huggingface#44781 applied cherry-pick passed Fix _set_model_specific_special_tokens to accept list-format extra_special_tokens cherry-picked targeted tokenizer fix; direct PR-head merge would drag unrelated upstream history and conflicted on tests/test_tokenization_common.py; validation passed after autof…
huggingface#44731 merged merge unavailable [Tests] Fix slow video tensor creation from list of numpy arrays in SmolVLM merge clean; compileall and repo checkers passed, but light validation/tests_fetcher timed out as baseline light validation did
huggingface#44697 applied cherry-pick passed fix: torch_float should return float, not int cherry-picked targeted torch_float fix; compileall, repo checkers, and light validation passed (255 passed, 102 skipped)
huggingface#44680 merged merge passed Allow kernel modules to declare their preferred mask function merge clean; validation commands passed
huggingface#44664 merged merge passed 🚨 Generic Sequence Classifier works for multimodal models merge clean; validation commands passed
huggingface#44641 merged merge passed Conditinally passing and_mask_function arg to create_causal_mask merge clean; compileall, repo checkers, and light validation passed
huggingface#44626 already_present none not_run don't break legacy behavior when enforced! PR head is already contained in the cumulative branch; no merge was needed
huggingface#44615 merged merge unavailable Restore is_torch_fx_available for trust_remote_code backwards compatibility merge clean; compileall and repo checkers passed; light validation unavailable because tests_fetcher timed out as in baseline
huggingface#44606 merged merge unavailable optionally override tokenizer class with serialized tokenizer merge clean; compileall and repo checkers passed; light validation unavailable because tests_fetcher timed out as in baseline
huggingface#44603 merged merge unavailable fixed dockerfile for arm64 systems merge clean; compileall and repo checkers passed; light validation unavailable because tests_fetcher timed out as in baseline
huggingface#44587 merged merge unavailable Fix: Handling fused qkv result tensor slicing for tp sharded qkv weights merge clean; narrowed ruff format was amended into merge commit; compileall and repo checkers passed; light validation unavailable because tests_fetcher timed out as in baseline
huggingface#44585 merged merge unavailable Fix missing rms_norm_eps in DeepseekV3 MLA layernorms merge clean; compileall and repo checkers passed; light validation unavailable because tests_fetcher timed out as in baseline
huggingface#44535 already_present none unavailable Fix crash in Qwen2_5_VLProcessor when using batched input with padding=False current branch already avoids the ragged np.array crash via ProcessorMixin.create_mm_token_type_ids iterating per input; direct merge conflicted with the newer helper-based implem…
huggingface#44385 merged merge unavailable Fix make check-repo merge clean; validation unavailable: baseline and post-merge light validation timed out in tests_fetcher
huggingface#44270 merged merge unavailable Add correct typing to custom images_kwargs in ProcessorsKwargs merge clean after resolving Ernie VL Moe naming conflict and narrow import-order fixes; light validation unavailable because tests_fetcher timed out as in baseline
huggingface#44257 merged merge unavailable use nanmean for aggregating loss merge clean; compileall and style passed; light validation unavailable because tests_fetcher timed out as in baseline
huggingface#44228 merged merge unavailable [Quantisation] account for nested tensors from quantisers merge clean; compileall and style passed after removing transient untracked validation checkout blockers; light validation unavailable because tests_fetcher timed out as in baseli…
huggingface#44189 merged merge unavailable fix: don't move model to device under other dist train backends merge resolved by carrying the current full-eval move block and adding the distributed-backend guards; compileall and style passed; light validation unavailable because tests_fetc…
huggingface#43989 merged merge unavailable Fix AutoVideoProcessor class lookup when torchvision is unavailable merged AutoVideoProcessor None-mapping guard; resolved current string-valued mapping conflict; compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43967 merged merge unavailable Fix AttributeError in run_classification.py when detecting multi-label data merge clean; compileall and repo checkers passed; removed transient tests_fetcher checkout blockers, then light validation timed out as in baseline
huggingface#43961 merged merge unavailable Replace mutable default arguments with None merged mutable-default cleanup; resolved Idefics defaults conflict; compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43911 merged merge unavailable add Llama to mapping names in tokenization_auto.py merge clean; compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43875 merged merge unavailable Improve handling of QuantizedLayer.reset merge clean; compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43842 already_present none unavailable fix(cli): Fix TypeAdapter NameError when pydantic is not installed serve CLI has been refactored to cli/serving modules and no current TypeAdapter runtime annotation/import remains; direct old-file merge conflicted, so no code change needed
huggingface#43836 already_present none unavailable fix: wrapped TypeAdpater in string literals (for now) same TypeAdapter serve import defect as huggingface#43842; current refactored serve entrypoint no longer has runtime TypeAdapter annotations, and old-file merge conflicted without needing a …
huggingface#43833 merged merge unavailable fix: ensure dtype consistency in grouped_mm under autocast merge clean; compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43826 merged merge unavailable fix: error message of pipeline resolved pipeline check_task conflict by retaining translation alias handling; compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43816 already_present none unavailable fix: add id and resume parameters to SwanLab integration current SwanLab callback already supports SWANLAB_RUN_ID and SWANLAB_RESUME, with an additional resume_from_checkpoint default; old merge conflicted only in the same already-porte…
huggingface#43779 merged merge unavailable SwanLab: Add support for id and resume arguments in SwanLabCallback merge clean on top of existing SwanLab env-var resume support; compileall and repo checkers passed, removed transient tests_fetcher checkout blockers, then light validation timed …
huggingface#43775 merged merge unavailable fix(moe): normalize auxiliary loss by top_k for correct load balancing merge clean; compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43757 already_present none unavailable Avoid hard failure for gpt-oss GGUF architecture by falling back to g… current GGUF loader has native gpt_oss/gpt-oss mapping and GptOssTensorProcessor support; old fallback-to-gpt-neox change conflicts with the newer support, so no code change was n…
huggingface#43747 merged merge unavailable Remove CompressedLinear support for compressed-tensors > 0.13 resolved compressed-tensors 0.14 conflict by combining newer QuantizationStatus-based tests with automatic decompression; compileall and repo checkers passed, removed transient do…
huggingface#43656 already_present none unavailable Fix TypeAdapter NameError in transformers CLI same TypeAdapter serve import defect as huggingface#43842/huggingface#43836; current serve CLI uses Typer/Annotated and no TypeAdapter runtime annotation remains
huggingface#43654 merged merge unavailable fix(tokenizer): Avert special token property overwrites in batch add_tokens calls resolved tokenizer test conflict by retaining protobuf tests and adding BigBird mask-token regression; compileall and repo checkers passed, removed transient docs checkout blocker…
huggingface#43651 merged merge unavailable Add _loss_is_scaled_for_ga to allow custom trainers to control gradient accumulation loss scaling merge clean; compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43549 merged merge unavailable [kernels] exception handling for fa kernels merge clean; compileall and repo checkers passed, removed transient docs checkout blockers, then light validation timed out as in baseline
huggingface#43543 merged merge unavailable Fix fp16 underflow in MoE load balancing loss by enforcing fp32 softmax merge clean; compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43542 already_present none unavailable fix: output router capture wrong router logits in qwen moe models current MoE router implementations already preserve raw router_logits in a separate variable and use router_probs for top-k routing; PR merge only conflicted on a variable rename …
huggingface#43492 merged merge unavailable Perception Encoder follow up PR merge conflicted only with newer strict config/conversion mapping layout; resolved locally, compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43466 merged merge unavailable Fix mask loss to ignore padding areas in object detection merge clean; compileall and repo checkers passed, light validation unavailable after transient checkout blockers were cleaned and rerun timed out as in baseline
huggingface#43395 merged merge unavailable Fix label truncation for per-sample nested structures in Trainer merge conflict was limited to preserving both trainer_pt_utils imports; compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43382 merged merge unavailable Allow Path type in transformers.image_utils.load_image function resolved moved-on import/signature conflict to allow pathlib.Path in load_image; compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43378 merged merge unavailable feat(models): Make MimiModel encoding padding-aware to ensure batch-to-individual consistency clean merge; compileall and repo checkers passed; light validation first hit transient tests_fetcher checkout blockers, then timed out as in baseline after cleanup
huggingface#43291 merged merge unavailable Fix whisper tests clean merge; compileall and repo checkers passed, light validation timed out as in baseline
huggingface#43270 merged merge unavailable fix _retrieve_segment timestamps offset bug clean merge; fixed PR-introduced trailing whitespace/formatting and amended merge commit; compileall and repo checkers passed; light validation hit transient tests_fetcher checkou…
huggingface#43254 merged merge unavailable Add supported kwargs to fixed_cross_entropy clean merge; compileall and repo checkers passed; light validation unavailable because tests_fetcher timed out as in baseline
huggingface#43251 already_present none unavailable Fix(43240): pass kwargs to nn.functional.cross_entropy same issue and API change already merged from PR 43254; direct merge only conflicted on equivalent formatting in loss_utils.py, so no additional code change needed
huggingface#43238 applied patch unavailable Fix ObjectDetectionPipeline batch processing bug huggingface#31356 patch applied cleanly; compileall and repo checkers passed; light validation unavailable due to tests_fetcher reverse dependency map error
huggingface#43212 merged merge unavailable Add regression test for offline tokenizer loading (fixes huggingface#43200) clean merge; compileall and repo checkers passed; light validation unavailable because tests_fetcher timed out as in baseline
huggingface#43151 merged merge unavailable Make TF32 tests hardware-aware for PyTorch 2.9+ clean merge; compileall and repo checkers passed; light validation unavailable because tests_fetcher timed out as in baseline
huggingface#43149 already_present none unavailable docs(serving): add minimal Python client examples for chat completion… stop_strings serving fix is already present in the refactored serving handlers; direct merge conflicts with the newer split serving implementation and deleted docs page
huggingface#43133 applied patch unavailable Fix flaky SAM-HQ integration tests by adding set_seed patch applied; compileall and repo checkers passed; light validation unavailable due to tests_fetcher reverse dependency map error
huggingface#43096 applied patch unavailable Fix save_pretrained for quantized models with custom serialization patch applied; compileall and repo checkers passed; light validation unavailable due to tests_fetcher reverse dependency map error
huggingface#43028 applied patch unavailable Fix default interpolation to BICUBIC for ViT, EfficientNet, PVT ViT image processor defaults patched to BICUBIC; EfficientNet/PVT were already BICUBIC; compileall/checkers passed, light validation unavailable due to tests_fetcher reverse dependen…
huggingface#43015 merged merge unavailable FIX: TF32 warning (huggingface#43012) clean merge replacing RoPE outer-product matmul with broadcast multiplication; compileall/checkers passed, light validation unavailable due to tests_fetcher timeout
huggingface#42979 merged merge unavailable Fix dtype mismatch in in modeling_llava_next clean merge casting LlavaNext hidden states to lm_head dtype before logits; compileall/checkers passed, light validation unavailable due to tests_fetcher timeout
huggingface#42942 merged merge unavailable Fix result retrieval starvation and terminate request-scoped iteration on completion merged starvation-safe queue scanning and request iterator completion/cancellation handling; conflict resolved for current OutputRouter API; compileall/checkers passed, light vali…
huggingface#42916 already_present none not_run Fix: Apply clean_up_tokenization_spaces in TokenizersBackend._decode current branch already applies clean_up_tokenization_spaces in TokenizersBackend._decode, with additional BPE safeguards; PR head conflicted only with that evolved implementation
huggingface#42900 merged merge unavailable Fix: Set clean_up_tokenization_spaces merged default clean_up_tokenization_spaces=True while preserving current BPE override guard; compileall/checkers passed, light validation unavailable due tests_fetcher timeout
huggingface#42881 merged merge unavailable [GGUF] Add attn_logit_softcapping to Gemma2/Gemma3 config mapping merge clean; compileall/checkers passed; light validation unavailable because tests_fetcher timed out after 90s (exit 75)
huggingface#42865 merged merge unavailable Raise explicit error when FP8 is requested via from_config merge conflict in modeling_utils.py resolved by keeping current name_or_path assignment and adding quantization_config guard; compileall/checkers passed; light validation unavaila…
huggingface#42824 applied patch unavailable Fix torch only support for fast Processors applied narrower MLX BatchFeature support; compileall/checkers passed; light validation unavailable because tests_fetcher failed on missing generated fast image processor map entry
huggingface#42793 merged merge unavailable [Quantization] Fixing last issues for a green 2026 CI hopefully merge clean; compileall/checkers passed; light validation unavailable because tests_fetcher timed out after 90s (exit 75)
huggingface#42774 already_present none not_run Fix: add None check for extractors in video_processor_class_from_name current branch already skips None VIDEO_PROCESSOR_MAPPING_NAMES entries before comparing the requested video processor class; PR head conflicted only because it targeted the older…
huggingface#42717 merged merge unavailable image_transforms: fix tensor annotations merge clean; compileall/checkers passed; light validation unavailable because tests_fetcher timed out after 90s (exit 75), matching baseline
huggingface#42631 merged merge unavailable Make GraniteMoeHybridModel compatible with torch.export merged with a local conflict resolution adapting the torch.export guard to the current past_key_values-based mask helper; compileall/checkers passed; light validation unavailable …
huggingface#42598 merged merge unavailable [pipeline] fix unwanted import failure merge clean; compileall and repository checkers passed; light validation unavailable because tests_fetcher timed out after 90s (exit 75), matching baseline
huggingface#42542 already_present none passed handle get_input_embeddings() on models like gemma3 gracefully current branch already catches NotImplementedError from get_input_embeddings and skips modules without hook support; no code change needed
huggingface#42521 merged merge unavailable Fix FSDP2 defaulting to version 1 in TrainingArguments; add dynamic plugin param passthrough merged with local conflict resolution against current FSDP plugin construction; compileall and repository checkers passed; light validation unavailable because tests_fetcher timed…
huggingface#42493 merged merge unavailable fix: remove trailing os sep in local pretrained model path merged with local test conflict resolution; removed stale untracked files that blocked tests_fetcher; compileall and repository checkers passed; light validation unavailable becau…
huggingface#42446 merged merge unavailable Fix DataParallel dtype access in smolvlm merge clean; compile and style passed; light validation unavailable because tests_fetcher timed out with exit 124, matching baseline
huggingface#42311 merged merge unavailable Fix: Guard against None num_query_tokens in Blip2Processor (to avoid TypeError) merge completed with a small Blip2Processor conflict resolution; compileall and repository checkers passed; light validation unavailable because tests_fetcher timed out after 90s,…
huggingface#42228 merged merge unavailable Support .to(device) or Device Aware Handling for Segmentation Labels in EOMTImageProcessor huggingface#42205 merge clean; compileall and repository checkers passed; light validation unavailable because tests_fetcher timed out after 90s, matching baseline
huggingface#42133 merged merge unavailable Fix qwen moe Load balancing loss calculation outside training merge clean; compileall and repository checkers passed after removing untracked validation checkout leftovers; light validation unavailable because tests_fetcher timed out after 9…
huggingface#42131 already_present none not_run Mistral Tokenizer Converter Script - Initialization with Pattern Argument PR conflicted only because current branch already extracts the Mistral Tekken pattern and passes it to MistralConverter while using the newer PreTrainedTokenizerFast path; no code…
huggingface#42130 already_present none not_run Try refactoring logging to make type checks pass current branch already resolves the logging typing issue with TransformersLogger typing and explicit type ignores on logger method injection; the older PR subclass refactor confli…
huggingface#42127 merged merge unavailable Standardize conv len function for audio models merge required straightforward import-conflict resolution preserving current deprecation/flash-attention imports plus new _conv_out_length helper; compileall and repository checke…
huggingface#42098 merged merge unavailable Fix mel length computation in Qwen2-Audio merge required straightforward docstring conflict resolution and applied odd-length mel sequence formula fix; compileall and repository checkers passed; light validation unavailab…
huggingface#42051 merged merge unavailable Fix model_input_names singleton issue causing shared state merge clean; compileall and repository checkers passed after removing untracked validation artifacts; light validation unavailable because tests_fetcher timed out after 90s, match…
huggingface#41980 already_present none not_run Correct type hint in config models current branch already has rms_norm_eps annotated as float in the affected configuration and modular files
huggingface#41973 merged merge unavailable Fix import error with huggingface_hub v1.0.0+ resolved import-only conflicts to use huggingface_hub.errors; compileall and repository checkers passed after narrow ruff fix and removing untracked validation artifacts; light va…
huggingface#41928 merged merge unavailable fix: add clear error message when mistral-common is missing for AutoTokenizer loading Voxtral resolved AutoTokenizer conflict by adding Voxtral mistral-common availability check to current v5 tokenizer mapping path; compileall and repository checkers passed after narrow im…
huggingface#41904 applied patch unavailable Fix inaccurate eval and train loss computation with variable batch sizes applied focused loss-weighting patch after direct merge conflicted in moved Trainer utility/save_model region; compileall and repository checkers passed; light validation unavaila…
huggingface#41901 merged merge unavailable [executorch] Update pytree registration for DynamicCache clean merge; compileall and repository checkers passed; light validation unavailable because tests_fetcher timed out after 90s, consistent with baseline validation unavailability
huggingface#41879 already_present none unavailable Fix/processor multiple tokenizers direct merge conflicted in processing_utils.py, but the cumulative branch already excludes tokenizer attributes before deepcopy, saves secondary tokenizers in per-attribute subfol…
huggingface#41855 already_present none unavailable Add Mistral tokenizer missing methods direct merge conflicted in tokenization_mistral_common.py docs/code from an older branch, but the cumulative branch already has get_vocab, convert_ids_to_tokens, convert_tokens_to…
huggingface#41851 already_present none unavailable Fix deepcopy in ProcessorMixin.to_dict for GemmaTokenizerFast same head as PR 41879; direct merge conflicted in processing_utils.py, but the cumulative branch already avoids deepcopying tokenizer attributes and has the multiple-tokenizer sav…
huggingface#41844 applied patch unavailable Fix FSDPv2 checkpoint saving on TPU by using recursive unwrap direct merge conflicted after TPU saving moved to integrations/tpu.py; adapted recursive FSDPv2 unwrap fix there; compileall and repository checkers passed; light validation unava…
huggingface#41827 applied patch unavailable [Flash Attention] Disable packed sequences with pos ids only during torch compile direct merge conflicted in evolved FlashAttention utility; applied compile/tracing guard for packed position ids; compileall and repository checkers passed; light validation unava…
huggingface#41754 applied patch unavailable Add pytree registration for static cache direct merge conflicted in executorch cache export code; adapted generalized DynamicCache and StaticCache pytree registration to current cache layer classes; compileall and reposi…
huggingface#41734 merged merge unavailable Fix CUDA errors in sharded generation with Qwen3 direct merge conflicted in generation/utils.py but was resolved by placing the sharded sampling invalid-value guard after current generation-config handling; compileall and reposi…
huggingface#41724 merged merge unavailable Fix confusing warning in EncoderDecoderModel when creating decoder_input_ids from labels direct merge conflicted in EncoderDecoderModel mask creation but was resolved by preserving the current mask dtype path and adding the PR warning split; compileall and repository …
huggingface#41721 merged merge unavailable Fix Qwen3-VL Processor flattening multi-image batches (fix huggingface#41709) clean merge; removed unrelated untracked legacy/generated artifacts that were making ruff scan obsolete files; compileall and repository checkers passed; light validation unavaila…
huggingface#41718 already_present none unavailable AutoTokenizer: clear ImportError when loading Voxtral without mistral-common + unit test direct merge conflicted in auto tokenization and the Voxtral tokenizer test, but the cumulative branch already has the clear mistral-common ImportError guard and matching unit tes…
huggingface#41701 merged merge unavailable Fix qwen3_vl mix precision dtype clean merge; compileall and repository checkers passed; light validation unavailable as at baseline
huggingface#41698 merged merge unavailable Fix tokenizer check script: safe dataset access, default checkpoints, and tested in dry-run mode direct merge conflicted in scripts/check_tokenizers.py debug-output removal; resolved with the PR dry-run/default-checkpoint changes and amended narrow ruff fixes; compileall and …
huggingface#41687 merged merge unavailable fix(data): Handle integer labels in DataCollatorWithFlattening direct merge conflicted in DataCollatorWithFlattening and the refactored data-collator tests; adapted scalar-label broadcasting to the current collator API and added a focused tes…
huggingface#41631 already_present none not_run Incorrect access of dataset field fixed current scripts/check_tokenizers.py already accesses XNLI premise and hypothesis directly and keeps the newer dry_run handling, so the PR fix is present without merging the older …
huggingface#41609 already_present none not_run Fix gemma gguf tokenizer current AutoTokenizer GGUF path loads config from GGUF and maps gemma model types to GemmaTokenizer, so the intended Gemma GGUF tokenizer-class fix is already covered; direct PR b…
huggingface#41606 already_present none not_run fix(processing): Filter kwargs in ProcessorMixin __call__ to prevent Type… current ProcessorMixin.__call__ already merges kwargs by modality and filters them against each sub-processor call signature before dispatch, so the TypeError fix is present; …
huggingface#41592 already_present none not_run Improve AutoTokenizer error message for Voxtral models missing mistral-common current AutoTokenizer already raises a clear ImportError for model_type 'voxtral' when mistral-common is unavailable, covering the intended missing-dependency error-message fix
huggingface#41521 merged merge unavailable Fix forced_bos_token_id not set in generation_config merge clean; compileall and repository checkers passed, but light validation unavailable as in baseline: tests_fetcher timed out after 90s (exit 124) and run-light-validation exit…
huggingface#41485 applied patch unavailable Fix smolvlm2 dtype mismatch final applied minimal dtype conversion in SmolVLM inputs_merger; compileall passed, but full validation unavailable as in baseline due to pre-existing ruff failures in untracked benchmark_…
huggingface#41441 merged merge unavailable Enhance the handling of Union types in HfArgumentParser merge clean; compileall passed, but full validation unavailable as in baseline due to pre-existing ruff failures and tests_fetcher checkout failure on untracked files
huggingface#41330 merged merge unavailable Unskip and fix offline mode tests, use HF_HUB_OFFLINE, make hermetic merge completed with one test_offline conflict resolved to current transformers.utils is_offline_mode semantics; compileall passed, full validation unavailable as in baseline due to …
huggingface#41319 already_present none not_run Use torch._check instead of a test in Gemma3Model current Gemma3Model already implements the PR intent with torch_compilable_check for the image token/feature validation, so the older torch._check patch conflicts only because the…
huggingface#41313 merged merge unavailable Jitter noise PR merge completed after resolving router conflict to preserve current classifier dtype casting while applying jitter only to a routing copy; compileall passed, full validation unava…
huggingface#41312 already_present none not_run tests: unskip and fix offline mode test using HF_HUB_OFFLINE + hermetic cache warmup offline-mode test is already unskipped and uses HF_HUB_OFFLINE/cache warmup after merged PR huggingface#41330, which supersedes this narrower duplicate fix
huggingface#41273 already_present none not_run fix(quantization): Skip weight initialization for quantized models current v5 loading path already removes loaded quantized targets from missing_keys and marks them _is_hf_initialized in core_model_loading, superseding this older modeling_utils/H…
huggingface#41239 already_present none not_run Add num_hidden_layers to t5gemma's top level config current T5GemmaConfig already exposes top-level num_hidden_layers and uses it to build both encoder and decoder layers
huggingface#41169 merged merge unavailable Fix TorchDynamo crash in StaticCache by validating offloading and offload_only_non_sliding arguments merge clean; compileall passed, full validation unavailable as in baseline due to pre-existing untracked benchmark/examples lint failures and tests_fetcher checkout blockers
huggingface#41132 applied cherry-pick unavailable fix(SpeechT5Config): missing annotation on inputs_to_logits_ratio property applied @property fix for SpeechT5Config.inputs_to_logits_ratio; compileall passed, full validation unavailable as in baseline due to pre-existing untracked lint failures and tests_f…
huggingface#41121 applied cherry-pick unavailable fix: resolve the unexpected video frame drop issue of the InternVL model with multiple video inputs adapted InternVL multi-video frame indexing fix and updated test expectation; compileall passed, full validation unavailable as in baseline due to pre-existing untracked lint failure…
huggingface#41105 applied patch unavailable Fix is_torch_neuroncore_available patched is_torch_neuroncore_available to respect check_device via is_torch_xla_available(check_is_gpu=check_device); compileall passed, full validation unavailable as in baseline …
huggingface#41097 applied patch unavailable Delay and probably avoid unnecessary graph breaks in _upad_input of modeling_flash_attention_utils.py patched _upad_input to delay nonzero index materialization in the equal query/key length path; compileall passed, full validation unavailable as in baseline due to pre-existing untra…
huggingface#41077 merged merge unavailable Fix: add num_hidden_layers property to T5GemmaConfig and add test for use_cache merge clean; compileall passed, full validation unavailable as in baseline due to pre-existing untracked lint failures and tests_fetcher checkout/dependency-map failure
huggingface#41075 applied cherry-pick unavailable Fix Qwen3 deterministic generation when do_sample=False and num_beams=1 for Greedy Decoding applied Qwen3 greedy deterministic generation fix and regression test; compileall and narrow ruff checks for changed files passed, full validation unavailable as in baseline due p…
huggingface#40954 already_present none not_run Fix Issue huggingface#40913: Respect user-provided chat_template parameter in processor creation current processing_utils.py already applies user kwargs after loaded processor/chat template defaults, so a user-provided chat_template overrides model defaults; PR merge conflict…
huggingface#40908 merged merge unavailable Fix load_balancing_loss_func incompatible with past_key_values merge clean; compileall passed, but baseline/checkers already fail on pre-existing untracked benchmark_v2/examples legacy lint, and light validation is blocked by pre-existing unt…
huggingface#40857 already_present none not_run Token current Trainer already snapshots initial num_input_tokens_seen at train start and reports train speed from current-session tokens; PR direct merge conflicts with extensively refa…
huggingface#40790 applied patch unavailable Handle loading non-existent checkpoints or corrupted checkpoints. applied checkpoint resume handling: missing output dir returns no checkpoint, resume_from_checkpoint=True warns instead of raising when no checkpoint exists, and saves latest chec…
huggingface#40783 merged merge unavailable Fix None quantization_config equivalence with omitted param in AutoModel.from_pretrained merge clean; compileall and ruff on changed file passed, but full validation unavailable due to pre-existing benchmark_v2/examples lint failures and tests_fetcher checkout blocked by…
huggingface#40695 already_present none not_run remove vision2seq vs image-text-to-text current branch already has no MODEL_FOR_VISION_2_SEQ mapping, so Qwen entries are not tagged vision2seq; merge aborted after conflict against removed mapping
huggingface#40666 merged merge unavailable Remove overwritten GitModelTest::test_beam_search_generate merge clean; compileall passed, but baseline lint already fails on pre-existing untracked benchmark/examples artifacts and light validation cannot checkout because of pre-existing…
huggingface#40563 already_present none not_run fix to get output of intermediate output of dinov3 for more use case current DINOv3 ViT already forwards output_hidden_states through DINOv3ViTEncoder and returns hidden_states in BaseModelOutputWithPooling; smoke check produced three hidden-state …
huggingface#40492 applied patch unavailable avoid divid zero errors. direct merge conflicted in masking_utils after API changes; applied divide-by-zero guard and prefix update locally; compile passed, style/light validation unavailable due pre-exis…
huggingface#40438 applied patch unavailable Resolve automatic label name detection when single label provided direct merge conflicted after Trainer init refactor; applied equivalent args.label_names fallback before self.args assignment; compile passed, style/light validation unavailable f…
huggingface#40392 merged merge unavailable Remove debug print statement from ShieldGemma2 conversion script merge clean; compile passed; style/light validation unavailable for baseline environmental reasons
huggingface#40385 merged merge unavailable Fix typo: 'seperate' -> 'separate' in mm_grounding_dino conversion sc… merge clean; compile passed; style/light validation unavailable for baseline environmental reasons
huggingface#40358 applied patch unavailable Fix MXFP4 mlp_forward to handle 2D and 3D hidden_states shapes for multi-turn chat direct merge conflicted but the source fix was portable; applied 2D/3D shape-preserving MXFP4 mlp_forward change; compile passed; style/light validation unavailable for baseline e…
huggingface#40244 already_present none not_run add-loftr-keypoints-to-map current cumulative branch already contains EfficientLoFTRForKeypointMatching in MODEL_FOR_KEYPOINT_MATCHING_MAPPING_NAMES
huggingface#40221 already_present none not_run FIX: enable load_best_model_at_end within SaveStrategy.BEST and initialize metric_for_best_model as loss when SaveStrategy.BEST current cumulative branch already includes SaveStrategy.BEST in best_global_step handling and defaults metric_for_best_model for save_strategy=best
huggingface#40208 applied patch unavailable Save only model sharded sd direct cherry-pick conflicted against current Trainer refactors, but the remaining missing PR change was portable; removed the stale FSDP save_only_model SHARDED_STATE_DICT reject…
huggingface#40148 merged merge unavailable Update utils.py: fix nan merge clean; compile passed; style/light validation unavailable for baseline environmental reasons
huggingface#40123 already_present none not_run Lazily import torchao Int4WeightOnlyConfig to avoid side effects current modeling_utils no longer eagerly imports torchao Int4WeightOnlyConfig at module import time; loading code has been refactored so the PR side-effect fix is already present/…
huggingface#40114 applied patch unavailable Fix torch.export compatibility for Mixtral MoE models direct PR head contains old main merge history; applied the Mixtral inference expert loop fix to the current modular and generated files; compile passed; style/light validation un…
huggingface#40090 merged merge unavailable Fix RuntimeError when loading quantized models with int8 weights (huggingface#39366) merge required resolving modeling_utils initialization conflict; compile passes; style/light validation unavailable due to pre-existing untracked benchmark/examples/doc artifacts and…
huggingface#40065 merged merge unavailable Delay float32 upcast in ForCausalLMLoss after filtering ignore_index merge required resolving loss_utils conflict against current union type syntax; compile passes; style/light validation unavailable due to pre-existing untracked benchmark/examples/do…
huggingface#40059 merged merge unavailable Fix Inefficient GELU implementation in GPT2 merge required resolving GPT2Config conflict against current declarative config form by changing activation_function default to gelu; compile passes; style/light validation unavai…
huggingface#40022 merged merge unavailable fix: resolve dropout type error in DogeDecoder merge required resolving Doge init conflict using current init helpers; dropout tuple guard present; compile passes; style/light validation unavailable due pre-existing untracked …
huggingface#39999 applied patch unavailable allow TP to work in ND-parallel with fsdp cpu ram efficient loading direct merge conflicted after tensor-parallel setup moved; adapted meta device_map fix to integrations/tensor_parallel.py; compile passed; style/light validation unavailable from …
huggingface#39997 merged merge unavailable make sure position_ids are passed in for causal mask creation for gpt-oss clean merge; compile passed; style/light validation unavailable due to the same baseline environmental untracked-artifact blockage
huggingface#39962 already_present none unavailable Use torch._check instead of a test to make the model Gemma3 exportable direct merge conflicted across copied VLM files, but current code already uses torch_compilable_check, a torch._check-based export-compatible helper, for these image-token checks;…
huggingface#39866 merged merge unavailable make sure model.save_pretrained has the correct is_main_process merge required resolving Trainer._save conflict by adding save_pretrained is_main_process gate to current save_safetensors call; compile passes; style/light validation unavailable…
huggingface#39794 applied patch unavailable Fix ProphetNet forward to handle tuple encoder_outputs direct merge conflicted across ProphetNet signatures and generic kwargs utilities after v5 API changes; applied minimal tuple-to-BaseModelOutput conversion in current ProphetNetMo…
huggingface#39793 merged merge unavailable Fix DAC conversion script clean merge; compile passes; style/light validation unavailable due to the same baseline untracked benchmark/examples/doc artifacts and tests_fetcher checkout blockage
huggingface#39756 already_present none not_run Fix rope_deltas corruption in Qwen2.5VL during CFG generation core rope_deltas generation fix is already present in current qwen2_vl/qwen2_5_vl/glm4v code; direct PR head also contains unrelated merged upstream history, so no code change was…
huggingface#39741 merged merge unavailable Fix HfArgumentParser to filter out dict types from Union merge clean after narrow ruff-compatible amendment; compile passed, but baseline ruff/test selection remain unavailable due to pre-existing untracked artifacts
huggingface#39698 applied patch unavailable Fix exaone4 layer_types ZeroDivision/TypeError when sliding_window_pattern is None/"LLLG" patch applied; compile passed, while ruff/light validation remain unavailable due to pre-existing dirty-baseline artifacts and tests_fetcher mapping issue
huggingface#39697 merged merge unavailable use untyped storage for dtensors due to deprecation merge clean; compile passed, ruff/light validation unavailable due to pre-existing dirty-baseline artifacts
huggingface#39683 applied patch unavailable Fix issue huggingface#39191 respect accelerate config to disable torch.dynamo compilation patch applied; compile passed, validation otherwise unavailable due to baseline ruff artifacts/tests_fetcher failure
huggingface#39675 already_present none not_run [BugFix]: Support dict and config file path for deepspeed current TrainingArguments already types deepspeed as dict | str | None and preserves dict/path handling; direct PR merge conflicted against the newer dataclass layout
huggingface#39674 applied patch unavailable Fix loss scaling and token aggregation to use only data parallel group patch applied; compile passed, validation otherwise unavailable due to baseline ruff artifacts/tests_fetcher failure
huggingface#39625 already_present none not_run Fix: allow Union[str, dict, None] fields like deepspeed to be passed via CLI current HfArgumentParser already filters dict from Union types before CLI parsing, covering Union[str, dict, None] fields such as deepspeed
huggingface#39617 already_present none not_run Fix FSDP v1 bug: trainer incorrectly uses an unwrapped model current Trainer already assigns accelerator.prepare(self.model) to the local model and then sets self.model/model_wrapped to the wrapped model for FSDP
huggingface#39599 applied patch unavailable Fix: check TrainerState file exists before loading during resume patch applied; compile passed, ruff/light validation unavailable due to baseline untracked artifact ruff failures and tests_fetcher reverse-map failure
huggingface#39560 applied patch unavailable fix load_model_end = true work when save_steps < eval_steps patch applied; compile passed, ruff/light validation unavailable due to baseline untracked artifact ruff failures and tests_fetcher reverse-map failure
huggingface#39493 merged merge unavailable [Voxtral] nit + pin correct mistral common version merged dependency update; validation unavailable because baseline ruff already fails on pre-existing untracked benchmark_v2/examples legacy files
huggingface#39491 merged merge unavailable Fix: Skip weight initialization for quantized int8 models merged cleanly; validation unavailable because baseline ruff already fails on pre-existing untracked benchmark_v2/examples legacy files
huggingface#39468 applied patch unavailable Fix quantized model dispatch with device_map=auto ported bitsandbytes device_map dispatch skip; validation unavailable due to pre-existing baseline ruff failures
huggingface#39464 already_present none unavailable Skipping initialize_weights when model is quantized quantized initialization fix already present from PR 39491; direct merge conflicted in refactored modeling_utils.py
huggingface#39449 already_present none unavailable Fix logger warnings in Gemma model test files Gemma logger.warning_once/warn occurrences targeted by the PR are already absent in current files; direct merge conflicted across refactored/deleted Gemma files
huggingface#39297 already_present none not_run Fix bug with deepspeed and accelerator args in training_args.py current training_args.py already supports dict|str for deepspeed and accelerator_config via _VALID_DICT_FIELDS and post-init conversion, so the CLI parsing defect is already addr…
huggingface#39264 already_present none not_run Fix: Add version check for timm to support mobilenetv5 models (fixes huggingface#39208) current timm wrapper already routes model construction through _create_timm_model_with_error_handling and raises an ImportError suggesting timm upgrade for unknown architectures, …
huggingface#39257 merged merge unavailable Fix to tuple conversion with config merged cleanly after preserving current helper functions and applying return_dict=True wrapper fix; compile passed, but full validation remains unavailable due pre-existing style …
huggingface#39211 already_present none not_run Add mobilenet_v5 stub implementation to fix "Unknown Model" error the current timm wrapper already uses _create_timm_model_with_error_handling and raises an actionable ImportError for unknown timm architectures, covering the Gemma3n MobileNetV5 …
huggingface#39206 applied patch unavailable fix: filter None router logits in Qwen3 MoE and handle empty router logits (huggingface#39203) applied the current-code equivalent guard for empty Qwen3 MoE router logits; direct PR-head merge/patch would drag stale generated-model and unrelated upstream changes. Compile pa…
huggingface#39103 applied patch unavailable Fix audio-related config naming for Gemma3n applied the current-code equivalent Gemma3n audio_soft_tokens_per_audio rename; compile passed, but validation remains unavailable due pre-existing ruff failures in untracked benc…
huggingface#39046 applied patch unavailable Fix deprecated max_size parameter handling in DETR image processors applied the current-code equivalent max_size handling to DETR-family torchvision/PIL processors; compile passed, but validation remains unavailable due pre-existing ruff failures …
huggingface#39037 applied patch unavailable fix kosmos2 tests applied the current-code equivalent Kosmos2 non-causal image-to-text attention override; compile passed, but validation remains unavailable due pre-existing ruff failures in untra…
huggingface#39012 already_present none unavailable [WIP] Fix DeepseekV3ModelTest::test_torch_compile_for_training PR conflicts because DeepseekV3 MoE has since moved to DeepseekV3NaiveMoe/current modular code; the stale expert loop targeted by the PR is no longer present, so the defect fix is…
huggingface#38999 already_present none unavailable Use deep copies instead of shallow copies for bbox_embed in GroundingDINO decoder (huggingface#37333). current GroundingDINO already constructs a fresh GroundingDinoMLPPredictionHead per decoder layer and uses tied-weight metadata when sharing is requested, so the shallow ModuleLis…
huggingface#38888 merged merge unavailable continue to fix distributed_type from TPU to XLA in LM examples (huggingface#38652) merge clean; validation unavailable from pre-existing untracked/runtime artifacts: ruff fails on benchmark_v2/examples legacy artifacts and light validation/tests_fetcher cannot c…
huggingface#38884 already_present none not_run Llama 4 conversion fix for moe models current converter already avoids treating missing/None moe_args as an MoE configuration via params.get('moe_args', False); no code change needed
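
Each row above appears to start with four single-token columns (PR reference, status, method, validation result), followed by the PR title and a free-text note. As a rough sketch only, assuming the rows are available as plain space-separated text like this, the statuses and validation results could be tallied as follows. The helper names `parse_row` and `tally` are hypothetical and not part of the mergeability tooling, and the sample rows are shortened copies of three rows from the table.

```python
from collections import Counter

# Sample rows copied from the table above, cut off after the PR title.
ROWS = """\
huggingface#43133 applied patch unavailable Fix flaky SAM-HQ integration tests by adding set_seed
huggingface#42916 already_present none not_run Fix: Apply clean_up_tokenization_spaces in TokenizersBackend._decode
huggingface#42542 already_present none passed handle get_input_embeddings() on models like gemma3 gracefully
"""


def parse_row(line):
    """Split one row into its four leading columns plus the remaining text.

    Assumes the first four columns never contain spaces (hypothetical helper).
    """
    pr, status, method, validation, rest = line.split(" ", 4)
    return {
        "pr": pr,
        "status": status,
        "method": method,
        "validation": validation,
        "rest": rest,  # title and note; the boundary is not recoverable here
    }


def tally(text, column):
    """Count the values of one leading column across all non-empty rows."""
    return Counter(
        parse_row(line)[column] for line in text.splitlines() if line.strip()
    )


if __name__ == "__main__":
    print(tally(ROWS, "status"))
    print(tally(ROWS, "validation"))
```

The title and note are kept together in `rest` because the flattened rows carry no delimiter between them; the `.mergeability/*.jsonl` files mentioned in the PR description would be a more reliable source, but their field names are not shown here.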

Rejected / not included records (518)

PR Category Status Original PR summary / goal Rejection reason
huggingface#45690 feature skipped [serve] Support for reasoning category not configured for this cumulative branch
huggingface#45679 other skipped TST Run fast PEFT tests in normal CI category not configured for this cumulative branch
huggingface#45668 feature skipped [GGUF] Add support for Qwen3.5 MoE (qwen35moe arch) category not configured for this cumulative branch
huggingface#45667 other skipped chore(typing): add ty type checking for 3 pipeline files category not configured for this cumulative branch
huggingface#45666 feature skipped Extended n-to-1 kernel fusion via KernelConfig category not configured for this cumulative branch
huggingface#45664 documentation skipped Doc translate to Persian(farsi) category not configured for this cumulative branch
huggingface#45654 other skipped [CB] Refactor any model-related code in a separate class category not configured for this cumulative branch
huggingface#45653 feature skipped [CB] Better overall script and decode bucketting category not configured for this cumulative branch
huggingface#45651 feature skipped [Trainer] Optimize LengthGroupedSampler computation with select_columns and tqdm category not configured for this cumulative branch
huggingface#45643 feature skipped Add DeepSeek V4 category not configured for this cumulative branch
huggingface#45640 feature skipped 🚨🚨🚨 [Trainer] Default to FSDP2, simplify API around fsdp + fsdp_config category not configured for this cumulative branch
huggingface#45638 feature skipped Add Multi-Token Prediction (MTP) support for Qwen3.5 category not configured for this cumulative branch
huggingface#45695 feature skipped Support for a new Granite-Speech-Plus model category not configured for this cumulative branch
huggingface#45635 other skipped qa: speed up dtype regex weight load + reduce dtype tests to 3 random category not configured for this cumulative branch
huggingface#45634 feature skipped DeepGEMM BF16, isolation, refactor category not configured for this cumulative branch
huggingface#45630 feature skipped Add new model: Kimi2-6 category not configured for this cumulative branch
huggingface#45626 feature skipped [Model] Add PP-FormulaNet Model Support category not configured for this cumulative branch
huggingface#45621 feature skipped Better Grouped GEMM + EP category not configured for this cumulative branch
huggingface#45618 feature skipped Add MTP speculative decoding via MTPCandidateGenerator category not configured for this cumulative branch
huggingface#45613 feature skipped [New Model] Add MiniCPM3 support category not configured for this cumulative branch
huggingface#45612 documentation skipped [docs] update model cards category not configured for this cumulative branch
huggingface#45609 feature skipped make it possible to ser/deser HF MoE models with torchao category not configured for this cumulative branch
huggingface#45608 documentation skipped Python code in model docs category not configured for this cumulative branch
huggingface#45604 feature skipped Agent first cli with skill category not configured for this cumulative branch
huggingface#45599 other skipped qa: more lazy loading category not configured for this cumulative branch
huggingface#45597 feature skipped Add Granite 4.1 Vision (granite4_vision) category not configured for this cumulative branch
huggingface#45699 feature skipped Add FP8 kernel acceleration for compressed-tensors quantized models category not configured for this cumulative branch
huggingface#45586 feature skipped Add Audio-Visual Flamingo model category not configured for this cumulative branch
huggingface#45569 feature skipped Proper nemotron H and 3 and 2 category not configured for this cumulative branch
huggingface#45550 other skipped Add runner selection for mi325 GPU type category not configured for this cumulative branch
huggingface#45546 feature skipped feat: Add GGUF loading support for Llama 4 (text) category not configured for this cumulative branch
huggingface#45543 other skipped ci: OTEL support category not configured for this cumulative branch
huggingface#45534 feature skipped 🚨 [ALM] Add base model without head category not configured for this cumulative branch
huggingface#45512 feature skipped [OutputRecorder] re.search on layer_name category not configured for this cumulative branch
huggingface#45497 feature skipped Add V-JEPA 2.1 inference support category not configured for this cumulative branch
huggingface#45493 feature skipped Modularize ProcessorMixin into smaller components category not configured for this cumulative branch
huggingface#45490 feature skipped Add ctsm model category not configured for this cumulative branch
huggingface#45477 feature skipped Blockwise mask fn as opt arg in all masking functions category not configured for this cumulative branch
huggingface#45476 other skipped [Don't merge] Call CI workflow category not configured for this cumulative branch
huggingface#45471 feature skipped Add EXAONE 4.5 implementations category not configured for this cumulative branch
huggingface#45465 documentation skipped [docs] contributing category not configured for this cumulative branch
huggingface#45462 other skipped chore(sec): added a handful of security checks category not configured for this cumulative branch
huggingface#45453 other skipped Draft commit category not configured for this cumulative branch
huggingface#45452 other skipped refactor: replace wildcard imports with explicit imports in model init.py files category not configured for this cumulative branch
huggingface#45438 feature skipped Add Gemma4ForSequenceClassification category not configured for this cumulative branch
huggingface#45426 feature skipped Feature/add axk1 category not configured for this cumulative branch
huggingface#45421 defect aborted Improve nested base_model_prefix handling in weight conversion and loading codebase moved on: core_model_loading.py now uses unified WeightTransform ordering while PR expects separate renamings/converters and nested-prefix helpers; tests also conflict in test_modeling_common.py, so adapting wo…
huggingface#45415 other skipped Adds type checking to src/transformers/*py category not configured for this cumulative branch
huggingface#45700 documentation skipped Add docstrings to type validator functions category not configured for this cumulative branch
huggingface#45401 feature skipped Add support for Voxtral-4B-TTS-2603 to transformers category not configured for this cumulative branch
huggingface#45396 feature skipped Extract dynamic vision/audio tensors into standalone pure functions category not configured for this cumulative branch
huggingface#45391 feature skipped audio tester class category not configured for this cumulative branch
huggingface#45382 feature skipped Add AudioGen (AudioCraft) to MusicGen conversion scripts category not configured for this cumulative branch
huggingface#45363 feature skipped n-to-1 kernel fusion via category not configured for this cumulative branch
huggingface#45355 feature skipped Add universal phone recognition model - PhoneticXeus category not configured for this cumulative branch
huggingface#45350 feature skipped WIP: Add support for Granite4VisionForConditionalGeneration category not configured for this cumulative branch
huggingface#45333 feature skipped Add heterogeneous config support (per-layer configuration) category not configured for this cumulative branch
huggingface#45332 feature skipped Add heterogeneous model support (per-layer config and modeling) category not configured for this cumulative branch
huggingface#45300 defect validation_failed Fix Nemotron-H: add mlp layer type support merge conflicts were resolved, but light validation timed out in tests_fetcher/run-light-validation after the merge; reset merge commit
huggingface#45296 feature skipped Add GGUF support to Gemma4 (31B & 26B-A4B) text category not configured for this cumulative branch
huggingface#45294 feature skipped feat: add Gemma4ForSequenceClassification category not configured for this cumulative branch
huggingface#45293 defect validation_failed Fix "AttributeError: NewTokenizer has no attribute special_attribute_present" (Remove REGISTERED_FAST_ALIASES) clean merge, but light validation timed out in tests_fetcher/run-light-validation after the merge; reset merge commit
huggingface#45273 defect validation_failed fix: liger unnecessarily materializes logits in VRAM during eval, causing OOM clean merge, but light validation timed out in tests_fetcher/run-light-validation after the merge; reset merge commit
huggingface#45270 feature skipped [Trainer] Support multi-loss component logging category not configured for this cumulative branch
huggingface#45267 documentation skipped Add docstring to FFN.forward in DistilBERT category not configured for this cumulative branch
huggingface#45254 defect aborted Fix more integration tests for important models codebase moved on: PR head has no current file list from GitHub but differs from cumulative HEAD by hundreds of unrelated repo-wide changes from an old main snapshot; no narrow defect patch is identifiable to apply safe…
huggingface#45244 other skipped Let's CI go great category not configured for this cumulative branch
huggingface#45233 defect validation_failed feat: make timesfm2_5 onnx export compatible clean merge, but light validation timed out after the merge; reset merge commit
huggingface#45218 feature skipped Proposal: Agent-first CLI category not configured for this cumulative branch
huggingface#45213 feature skipped DO NOT MERGE - model creation skill category not configured for this cumulative branch
huggingface#45189 feature skipped Add doc test CI workflow reusing existing model job infrastructure category not configured for this cumulative branch
huggingface#45186 feature skipped Add new model: Isaac category not configured for this cumulative branch
huggingface#45181 feature skipped Make the cli a top-level package category not configured for this cumulative branch
huggingface#45176 feature skipped added efficietvitsam model to HF category not configured for this cumulative branch
huggingface#45168 feature skipped Update min_lr and max_lr default values to better defaults category not configured for this cumulative branch
huggingface#45167 feature skipped Add anthropic style of function schema category not configured for this cumulative branch
huggingface#45157 feature skipped [WIP] PrismML Bonsai model support category not configured for this cumulative branch
huggingface#45153 feature skipped [FA] Native torch integration category not configured for this cumulative branch
huggingface#45152 documentation skipped [docs] model testing category not configured for this cumulative branch
huggingface#45149 feature skipped DO NOT MERGE adding SAML3-LiteText with a skill, first pass category not configured for this cumulative branch
huggingface#45144 feature skipped Add Xiaomi MiMo-V2 category not configured for this cumulative branch
huggingface#45134 feature skipped Optimize Parakeet feature extraction on CUDA category not configured for this cumulative branch
huggingface#45133 feature skipped Add sarvam model category not configured for this cumulative branch
huggingface#45115 other skipped Refactor/nemotron h inherit granitemoehybrid category not configured for this cumulative branch
huggingface#45113 feature skipped Add GDS support for safetensors loading category not configured for this cumulative branch
huggingface#45110 feature skipped Add SAM 3.1 category not configured for this cumulative branch
huggingface#45101 feature skipped Adding support for Nandi Models category not configured for this cumulative branch
huggingface#45097 feature skipped Add old InternVL2-1B/2B support to the InternVL conversion script huggingface#45092 category not configured for this cumulative branch
huggingface#45082 feature skipped [VidEoMT] Update conversion script category not configured for this cumulative branch
huggingface#45077 other skipped fix: pin 50 unpinned actions to commit SHA, extract 1 secret to env var category not configured for this cumulative branch
huggingface#45075 feature skipped Add Deepseek-OCR-2 model category not configured for this cumulative branch
huggingface#45073 other skipped Refactor OwlViT to modular Transformers category not configured for this cumulative branch
huggingface#45067 feature skipped feat: trainer resume_from_checkpoint support hub downloads (huggingface#43375) category not configured for this cumulative branch
huggingface#45064 other skipped refactor: shard checkers category not configured for this cumulative branch
huggingface#45037 documentation skipped add missing colon in custom_attention function signature in attention… category not configured for this cumulative branch
huggingface#45028 feature skipped TP refactor for FSDP + TP integration category not configured for this cumulative branch
huggingface#44989 feature skipped 🚨 Distributed training API category not configured for this cumulative branch
huggingface#44979 feature skipped Module Fusion API category not configured for this cumulative branch
huggingface#44974 feature skipped Refactor core_model_loading to support FSDP shard-on-read loading category not configured for this cumulative branch
huggingface#44965 other skipped try category not configured for this cumulative branch
huggingface#44956 feature skipped Add HyperCLOVAX SEED Think 14B category not configured for this cumulative branch
huggingface#44942 feature skipped Add inference time layer fusion optimisations via PreTrainedModel.from_pretrained(fuse_layers=True) category not configured for this cumulative branch
huggingface#44891 feature skipped [Trainer] add MoERouterHealthCallback Callback category not configured for this cumulative branch
huggingface#44875 other skipped refactor: improved the cli server module code organization category not configured for this cumulative branch
huggingface#44872 documentation skipped Fix: Update outdated sampler comment in generation/utils.py category not configured for this cumulative branch
huggingface#44830 feature skipped Add AudioFlamingoNext model category not configured for this cumulative branch
huggingface#44815 defect aborted Dequant fix codebase moved on: PR expects the older finegrained_fp8 CUTLASS/Triton loader structure and edits Fp8Dequantize/FP8Experts around _get_triton_kernel, while current cumulative code has the newer lazy_load_kernel plus Dee…
huggingface#44794 feature skipped Refacto GGUF weight conversion category not configured for this cumulative branch
huggingface#44775 documentation skipped [docs] n-d parallelism category not configured for this cumulative branch
huggingface#44772 documentation skipped bitsandbytes: Update links and docs category not configured for this cumulative branch
huggingface#44771 defect aborted wtf codebase moved on: PR edits the old generated src/transformers/models/pi0/image_processing_pi0_fast.py to inherit SiglipImageProcessorFast, but current cumulative code has PI0 under modular generation with src/transform…
huggingface#44729 defect aborted Avoid floating point math for ceil operations codebase moved on: PR applies a broad int_div_ceil migration across many generated image processors/configs and older fast image processor files; current cumulative branch has deleted several *_fast.py files and regener…
huggingface#44724 defect aborted Fix some missing / incorrect entries in auto files codebase moved on: PR edits generated auto mapping files using an older tuple/fast-processor mapping layout, while current cumulative branch uses the newer backend-keyed auto image processor mappings and has many additi…
huggingface#44722 feature skipped Refactor gptj output tracing to use standardized decorators category not configured for this cumulative branch
huggingface#44713 feature skipped [ColQwen2] Refactor output tracing (issue huggingface#43979) category not configured for this cumulative branch
huggingface#44682 feature skipped transformers serve + llamacpp category not configured for this cumulative branch
huggingface#44676 defect validation_failed fix(gpt2): Resolve NaN/Inf issue in lm_head on Python 3.13 with tied weights compileall and repo checkers passed, but light validation/tests_fetcher timed out after the merge
huggingface#44662 feature skipped [model] Add PenguinVL implementation category not configured for this cumulative branch
huggingface#44660 defect validation_failed Fix: avoid late CUDA OOM in load_best_model_at_end with PEFT models compileall and repo checkers passed, but light validation timed out after the merge
huggingface#44659 documentation skipped docs: remove outdated use_diff docstring from DistributedConfig.to_js… category not configured for this cumulative branch
huggingface#44650 defect validation_failed Fix Seq2SeqTrainer generation path for decoder-only models compileall and repo checkers passed, but light validation timed out after the merge
huggingface#44646 documentation skipped Fix typo: seperate -> separate category not configured for this cumulative branch
huggingface#44642 documentation skipped Clarify that causal LM labels are shifted internally category not configured for this cumulative branch
huggingface#44635 other skipped [Gemma] Modular-friendly buffers category not configured for this cumulative branch
huggingface#45702 defect aborted Reorder decorators for autodoc and dataclass codebase moved on: PR reorders dataclass/auto_docstring decorators broadly, but cumulative branch has additional qwen3_5 sequence-classification changes adjacent to Qwen3_5CausalLMOutputWithPast causing a content confli…
huggingface#44601 feature skipped [Distributed] Add PP support natively category not configured for this cumulative branch
huggingface#44594 feature skipped [Pipeline] Add top_k, label filtering, box_format and score sorting to ObjectDetectionPipeline category not configured for this cumulative branch
huggingface#44569 feature skipped Add SarvamMLA model (sarvamai/sarvam-105b) category not configured for this cumulative branch
huggingface#44553 feature skipped [FA] Refactor FA CB kwargs category not configured for this cumulative branch
huggingface#44550 documentation skipped Improve clarity and grammar in Auto Classes documentation category not configured for this cumulative branch
huggingface#44547 documentation skipped Fix position_ids docstring in modeling_flash_attention_utils.py category not configured for this cumulative branch
huggingface#44543 defect aborted Fix assistant_masks for multimodal inputs in apply_chat_template merge conflict in ProcessorTesterMixin tests: PR adds multimodal assistant mask regression at the same location where current branch has a newer tool-call chat template regression test; processing_utils change staged cl…
huggingface#44517 feature skipped Add qwen3 tts category not configured for this cumulative branch
huggingface#44495 other skipped [Gradient Ckpting] Remove unnecessary attribute definitions category not configured for this cumulative branch
huggingface#44467 defect aborted Placeholder tokens update tokenizer stack has moved on substantially: conflicts span convert_slow_tokenizer, tokenization_auto incorrect-class handling, tokenization_utils_base/tokenizers backend, LASR/SigLIP2 tokenizers, and tokenizer regressio…
huggingface#44445 feature skipped Adding support for GraniteDoclingHybrid category not configured for this cumulative branch
huggingface#44438 feature skipped Add flashoptim category not configured for this cumulative branch
huggingface#44420 documentation skipped [docs] distributed training category not configured for this cumulative branch
huggingface#44408 feature skipped Add option to export encoder hidden states for Granite-speech category not configured for this cumulative branch
huggingface#44407 documentation skipped docs: add energy efficiency considerations to bitsandbytes quantization guide category not configured for this cumulative branch
huggingface#44394 feature skipped 🚨🚧 FeatureExtractor → AudioProcessor category not configured for this cumulative branch
huggingface#44375 feature skipped Add RF-DETR category not configured for this cumulative branch
huggingface#44369 documentation skipped Feature/integrations docs fix category not configured for this cumulative branch
huggingface#44348 feature skipped Enable MetalConfig to load pre-quantized MLX models from HuggingFace Hub category not configured for this cumulative branch
huggingface#44314 feature skipped add HyperClovaX Vision category not configured for this cumulative branch
huggingface#44298 defect aborted Auto detect wrong mapping models codebase moved on: PR targets older tokenizer conversion/mapping code; current branch already has newer serialized-tokenizer override, padding/truncation preservation, SentencePiece charsmap extraction, and additional a…
huggingface#44264 defect aborted [Moe] Enable aux loss automatically when in training + coef is not 0 codebase moved on: PR globally changes MoE aux-loss output behavior across generated and modular MoE model files, but current branch has newer decorator/output patterns and model-specific changes causing 32 conflicts; s…
huggingface#44259 feature skipped Async data producer category not configured for this cumulative branch
huggingface#44252 feature skipped Timm unification continued category not configured for this cumulative branch
huggingface#44215 feature skipped Add sequence classification capability to Granite models category not configured for this cumulative branch
huggingface#44184 feature skipped feat: add OpenAI CircuitGPT core architecture and sparse linear layers category not configured for this cumulative branch
huggingface#44178 feature skipped Add xcodec2 model category not configured for this cumulative branch
huggingface#44171 feature skipped Parakeet tdt category not configured for this cumulative branch
huggingface#44161 other skipped Refactor LongT5 to use @capture_outputs and @can_return_tuple decorators for unified output handling (Fixes huggingface#43979) category not configured for this cumulative branch
huggingface#44159 feature skipped Add SDPA and Flash Attention support for OWL-ViT category not configured for this cumulative branch
huggingface#44154 other skipped Refactored vits to match standardized output collection interface category not configured for this cumulative branch
huggingface#44142 feature skipped [voxtral-realtime] get more perfs! category not configured for this cumulative branch
huggingface#44129 other skipped Refactor SpeechT5 output tracing to standardized output capture category not configured for this cumulative branch
huggingface#44123 feature skipped Avoid device sync in training loss accumulation category not configured for this cumulative branch
huggingface#44116 other skipped [WIP] [Flaubert] Refactor output tracing to decorator-based interface category not configured for this cumulative branch
huggingface#44114 other skipped Migrate wav2vec2, wav2vec2_conformer, and wav2vec2_bert to standardized output collection decorators category not configured for this cumulative branch
huggingface#44101 other skipped [XLM] Refactor output tracing to align with capture_outputs standardized architecture category not configured for this cumulative branch
huggingface#44098 other skipped [ViLT] Refactor output handling to align with standardized patterns category not configured for this cumulative branch
huggingface#44086 other skipped [MGP-STR] Refactor output tracing to use capture_outputs/can_return_tuple decorators category not configured for this cumulative branch
huggingface#44085 other skipped Refactor RemBERT to use output tracing decorators category not configured for this cumulative branch
huggingface#44083 feature skipped FSDP2 native support in transformers category not configured for this cumulative branch
huggingface#44076 other skipped Refectored modeling_imagegpt.py to enable hooks to capture_outputs category not configured for this cumulative branch
huggingface#44074 other skipped [TextNet] Refactor output tracing using capture_outputs decorator category not configured for this cumulative branch
huggingface#44073 other skipped [VisualBert] Refactor output tracing using capture_outputs and can_return_tuple decorators category not configured for this cumulative branch
huggingface#44072 other skipped refactor efficientnet output tracing with @capture_outputs and @can_r… category not configured for this cumulative branch
huggingface#44071 other skipped [Refactor] Migrate MPT to standardized output tracing decorators category not configured for this cumulative branch
huggingface#44070 feature skipped Add GGUF loading support for Qwen3-Next (qwen3_next) architecture category not configured for this cumulative branch
huggingface#44068 other skipped Refactor GPT-Neo to use @capture_outputs and @can_return_tuple decorators category not configured for this cumulative branch
huggingface#44066 other skipped Refactor GPT-J to use standardized output tracing (huggingface#43979) category not configured for this cumulative branch
huggingface#44059 other skipped [GPT2] Refactor output tracing to use capture_outputs/can_return_tuple decorators category not configured for this cumulative branch
huggingface#44056 other skipped [MPNet] Refactor output tracing using capture_outputs decorator category not configured for this cumulative branch
huggingface#44054 feature skipped Flash mla interface category not configured for this cumulative branch
huggingface#44044 other skipped Refactor DeBERTa's output tracing interface category not configured for this cumulative branch
huggingface#44030 other skipped refactor output tracing in dpr category not configured for this cumulative branch
huggingface#44029 other skipped refactor output tracing in rwkv category not configured for this cumulative branch
huggingface#44028 other skipped refactor output tracing for superpoint category not configured for this cumulative branch
huggingface#44027 other skipped refactor output tracing in speech_encoder_decoder category not configured for this cumulative branch
huggingface#44026 other skipped refactor output tracing for vision_encoder_decoder category not configured for this cumulative branch
huggingface#44025 other skipped refactor output tracing for depth_anything category not configured for this cumulative branch
huggingface#44024 other skipped Focalnet standardized outputs category not configured for this cumulative branch
huggingface#44019 other skipped Refactor resnet to use @capture_outputs / @can_return_tuple output tracing category not configured for this cumulative branch
huggingface#44018 other skipped Refactor GPT-Neo output tracing to use capture_outputs/can_return_tuple category not configured for this cumulative branch
huggingface#44017 other skipped Refactor output tracing in segformers (huggingface#43979) category not configured for this cumulative branch
huggingface#44015 other skipped Refactor GPT2-based models to standardized output collection interface category not configured for this cumulative branch
huggingface#44013 other skipped Ouptut tracing: Standardizing MobileNetv2 category not configured for this cumulative branch
huggingface#44010 other skipped [SqueezeBert] Migrate to standardized output collection decorators category not configured for this cumulative branch
huggingface#44007 other skipped [ResNet] Refactor output tracing to decorator-based interface category not configured for this cumulative branch
huggingface#44004 other skipped refactor output tracing for codegen category not configured for this cumulative branch
huggingface#44003 other skipped refactor output tracing in mamba category not configured for this cumulative branch
huggingface#44002 other skipped refactor output tracing in upernet category not configured for this cumulative branch
huggingface#44001 other skipped refactor output tracing in univnet category not configured for this cumulative branch
huggingface#44000 other skipped refactor output tracing in vision_text_dual_encoder category not configured for this cumulative branch
huggingface#43999 other skipped refactor output tracing in mobilenet_v1 category not configured for this cumulative branch
huggingface#43998 other skipped refactor output tracing in timm_backbone category not configured for this cumulative branch
huggingface#43997 other skipped Migrate RegNet to standardized output tracing category not configured for this cumulative branch
huggingface#43996 other skipped Refactor FNet and CVT output tracing category not configured for this cumulative branch
huggingface#43995 other skipped Refactoring falcon model to match standardized output collection interface category not configured for this cumulative branch
huggingface#43973 feature skipped Add lfm2.5 audio category not configured for this cumulative branch
huggingface#43924 defect aborted [Attn] More old mask APIs codebase moved on: PR ports old attention-mask API helpers across many model files, but current branch has newer standardized output/TransformersKwargs paths and renamed create_bidirectional_mask call sites; 11 content …
huggingface#43915 feature skipped add PaddleOCR-VL conversion category not configured for this cumulative branch
huggingface#43888 feature skipped Support for BharatGen's Param2MoE model architecture category not configured for this cumulative branch
huggingface#43863 feature skipped [whisper] allow to pass text/audio specific kwargs category not configured for this cumulative branch
huggingface#45703 other skipped chore(typing): add ty type checking for 10 utility files category not configured for this cumulative branch
huggingface#43838 feature skipped Qwen3 ASR and Forced Aligner category not configured for this cumulative branch
huggingface#43823 feature skipped Add category not configured for this cumulative branch
huggingface#43785 defect aborted Fix FSDP_CPU_RAM_EFFICIENT_LOADING (huggingface#43749) codebase moved on: PR targets older FSDP/model-loading flow; cumulative branch already has newer FSDP2 cpu_ram_efficient_loading changes, and direct merge placed the core_model_loading early-return block inside the curr…
huggingface#43751 other skipped Fix ruff warnings category not configured for this cumulative branch
huggingface#43743 feature skipped Modular playground category not configured for this cumulative branch
huggingface#43665 other skipped fix category not configured for this cumulative branch
huggingface#43663 feature skipped Add _get_signature_columns method to allow custom trainers to override column filtering category not configured for this cumulative branch
huggingface#43649 other skipped Check new failures reporting 5 category not configured for this cumulative branch
huggingface#43636 feature skipped Add _metrics dict to Trainer for custom metric logging category not configured for this cumulative branch
huggingface#43613 feature skipped Add Promptable Visual Segmentation pipeline category not configured for this cumulative branch
huggingface#43612 feature skipped Add Promptable Concept Segmentation pipeline category not configured for this cumulative branch
huggingface#43532 other skipped [do not merge] Show diff category not configured for this cumulative branch
huggingface#43506 feature skipped Add RishAI model with full transformers integration category not configured for this cumulative branch
huggingface#43498 defect validation_failed fix/backward compatibility for tie_weights merge clean but repo checkers failed: added tie_embeddings_and_encoder_decoder calls undefined tie_weights and import ordering was unformatted; reverted top merge
huggingface#43488 other skipped [don't merge] bad format to check repo bot category not configured for this cumulative branch
huggingface#43484 feature skipped Optimize Ernie 4.5 VL timestamp rendering with cached overlays category not configured for this cumulative branch
huggingface#43469 feature skipped argparser: Allow optional bool flags without values category not configured for this cumulative branch
huggingface#43451 feature skipped Add Molmo2 category not configured for this cumulative branch
huggingface#43448 feature skipped Add Molmo category not configured for this cumulative branch
huggingface#43446 feature skipped [typings] Automatically type decorator return types as tuple | X category not configured for this cumulative branch
huggingface#43424 other skipped Add test to ensure executorch exportability with dynamic shapes category not configured for this cumulative branch
huggingface#43363 feature skipped [Improvement] Update DistributedLengthGroupedSampler to allow customizing length function category not configured for this cumulative branch
huggingface#43340 documentation skipped Claude code skills for transformers-api category not configured for this cumulative branch
#43333 documentation skipped Fix typo: interupted -> interrupted category not configured for this cumulative branch
#43310 other skipped Replace regex with re category not configured for this cumulative branch
#43297 feature skipped [Feat] Reduces redundant tokenization of tags to accelerate Qwen3VL. category not configured for this cumulative branch
#43271 documentation skipped Fix typo: necesary → necessary category not configured for this cumulative branch
#43267 documentation skipped Add auto_docstring decorator to Sam3ImageProcessorFast category not configured for this cumulative branch
#43265 feature skipped Adding Omnilingual ASR models category not configured for this cumulative branch
#43249 feature skipped [WIP] Processor moves to device in __call__ category not configured for this cumulative branch
#43246 other skipped GptOss slow tests category not configured for this cumulative branch
#43213 feature skipped feat: allow output_hidden_states and output_attensions to record outputs of specific layers category not configured for this cumulative branch
#43192 feature skipped [Trackio] support trackio gpu logging category not configured for this cumulative branch
#43139 feature skipped [perf] optimize whisper GPU performance category not configured for this cumulative branch
#43104 documentation skipped docs: clarify tokenizer decoder behavior in v5 (#43066) category not configured for this cumulative branch
#43102 documentation skipped Add CPU vs GPU performance comparison example category not configured for this cumulative branch
#43094 feature skipped Avoid inline .item() sync in decoder start token check category not configured for this cumulative branch
#43088 feature skipped Skip attention_mask.all() GPU-CPU sync during generation category not configured for this cumulative branch
#43085 feature skipped Add async_stopping_criteria flag to reduce GPU-CPU syncs during generation category not configured for this cumulative branch
#43077 other skipped compileable=>compilable category not configured for this cumulative branch
#43063 documentation skipped Improve documentation for SegFormer image processor category not configured for this cumulative branch
#43056 feature skipped Perf: enable pin_memory in DataLoader for CLM no_trainer example category not configured for this cumulative branch
#43044 feature skipped [SAM3] Enable single-scale input support in Mask Decoder category not configured for this cumulative branch
#43036 documentation skipped Docs: fix grammar in Pipeline section category not configured for this cumulative branch
#43020 feature skipped Add mimo v2 flash category not configured for this cumulative branch
#42982 feature skipped Add HumanV: decoder-only causal LM category not configured for this cumulative branch
#42978 feature skipped Add ViT NEPA category not configured for this cumulative branch
#42976 other skipped Upgrade GitHub Actions to latest versions category not configured for this cumulative branch
#42975 other skipped Upgrade GitHub Actions for Node 24 compatibility category not configured for this cumulative branch
#42944 feature skipped [Quantization] From config Quantization for FP8 category not configured for this cumulative branch
#42919 feature skipped [WIP] Video support in vLLM backend category not configured for this cumulative branch
#42908 defect aborted Fix gguf tokenizers codebase moved on: PR conflicts in generation/utils.py, auto/tokenization_auto.py, and tokenization_utils_base.py; current branch already contains the main GGUF tokenizer native-format guard, while the PR head also comm…
#42887 feature skipped [Quantization] [Compressed Tensors] Support Transforms, Fix Tests category not configured for this cumulative branch
#42876 documentation skipped Document tensor parallelism configuration with Trainer category not configured for this cumulative branch
#42829 feature skipped [WIP] End-to-end exportable pipelines (object detection) category not configured for this cumulative branch
#42816 defect aborted validate tokenizer components codebase moved on: PR adds tokenizer component validation in tokenization_utils_tokenizers.py, but current branch has newer tokenizer loading/GGUF and minimal-tokenizer extraction logic in the same area; resolving would…
#42785 documentation skipped Fixing wrong information in Mimi Docs category not configured for this cumulative branch
#42781 feature skipped Add VibeVoice Realtime category not configured for this cumulative branch
#42767 defect aborted fix: add mapping of deepseek_v32 model type codebase moved on: PR adds deepseek_v32 as an older-style generated model plus direct auto mapping edits, but current branch derives auto mappings from generated auto_mappings.py/modular model metadata; resolving would …
#42765 other skipped Add distributed training CI category not configured for this cumulative branch
#42744 other skipped [FP8 Devstral 24B] Repro PR category not configured for this cumulative branch
#42742 other skipped Remove redundant else in activations.py category not configured for this cumulative branch
#42706 other skipped Nit parakeet category not configured for this cumulative branch
#42668 defect aborted More robust processor from pretrained codebase moved on: processor auto/tokenizer mappings and ProcessorMixin loading/validation paths have diverged substantially; direct PR-head merge also tried to bring unrelated upstream-era artifacts, so safely resolvin…
#42665 feature skipped Some optimizations for offloading category not configured for this cumulative branch
#42655 feature skipped New Feature: Enabling Speculative Decoding with Batch Size > 1 (If draft and target model share tokenizer) category not configured for this cumulative branch
#42588 documentation skipped Document the /v1/models endpoint category not configured for this cumulative branch
#42572 documentation skipped docs: add doctest for SqueezeBERT category not configured for this cumulative branch
#42527 documentation skipped Added doctests for SwiftFormer model category not configured for this cumulative branch
#42496 feature skipped feat: allow CP with trainer category not configured for this cumulative branch
#42467 defect aborted Fixes StaticCache Crashes codebase moved on: StaticLayer/StaticSlidingWindowLayer now use cumulative_length state, a key/value lazy_initialization signature, and crop/get_mask_sizes APIs that conflict with the PR's older max_batch_size keys_full…
#42461 defect aborted Refactor RMSNorm implementations to use torch.nn.functional.rms_norm codebase moved on: the PR rewrites RMSNorm classes across 65 model files, many of which are now generated/copied or have diverged type signatures and local changes; resolving would require applying the rms_norm migratio…
#42453 feature skipped Add SDPA and FlashAttention support to T5 category not configured for this cumulative branch
#42437 other skipped One tok typing category not configured for this cumulative branch
#42432 feature skipped Add VideoToTextPipeline with smart frame sampling and system prompts category not configured for this cumulative branch
#42430 defect aborted 🚨 Clean up image-text-to-text pipeline codebase moved on: source changes merged, but pipeline tests now contain device-specific Expectations and additional text-only coverage that conflict with the PR's older expected-output rewrites; resolving would require…
#42424 defect validation_failed [WIP] attempt to fix ooms in tests after resolving small conflicts by keeping the PR's added use_cache/cache_position kwargs, ruff failed with undefined cache_position in qwen3_omni_moe, qwen3_vl, modular_qwen3_vl, and qwen3_vl_moe; current model APIs no…
#42415 other skipped initial clean category not configured for this cumulative branch
#42413 feature skipped Add chatterbox support category not configured for this cumulative branch
#42412 other skipped Replace Optional and Union typing with | in examples category not configured for this cumulative branch
#42403 feature skipped Sam3 release on v4.57.3 category not configured for this cumulative branch
#42385 defect aborted Fix weight tying logic between _tied_weights_keys and tie_word_embeddings codebase moved on: the PR branch contains many upstream merge commits plus tied-weight changes and conflicts in core_model_loading, hub_kernels, FSMT and VLM/audio config/model files; safely extracting the weight-tying …
#42345 feature skipped GPT-OSS Flash Attention and memory-efficient attention via Native PyTorch SDPA category not configured for this cumulative branch
#42310 feature skipped [WIP] Add moondream3 model category not configured for this cumulative branch
#42292 documentation skipped docs: clarify recommended usage of max_new_tokens in generate() category not configured for this cumulative branch
#42277 documentation skipped doc(kernels): update kernels integration documentation category not configured for this cumulative branch
#42256 feature skipped Integrate Core AutoAWQ Inference Components into Transformers category not configured for this cumulative branch
#42244 defect aborted [core] Fix torchao loading codebase moved on: the torchao loading PR rewrites core loading/quantization plus hundreds of generated and modular modeling files, producing 477 conflicts including add/add conflicts in core_model_loading, conversion_m…
#42229 feature skipped Add openpangu_moe model category not configured for this cumulative branch
#42210 feature skipped [WIP] started adding support for evo2 category not configured for this cumulative branch
#42166 feature skipped add internvl_flash model category not configured for this cumulative branch
#42134 feature skipped Add AutoMergeAdapters utility for merging multiple LoRA adapters with… category not configured for this cumulative branch
#42124 documentation skipped 📚 docs(qwen3): add comprehensive usage examples and model details category not configured for this cumulative branch
#42112 feature skipped Add max_thinking_tokens for reasoning models (issue #42111) category not configured for this cumulative branch
#42039 other skipped [WIP] 🚨 clean xcodec 🧼 category not configured for this cumulative branch
#42000 documentation skipped Fix Mixtral: Docstring uses consistent 'top_k_index' and 'top_k_weights' in MixtralSparseMoeBlock category not configured for this cumulative branch
#41992 feature skipped [PoC] HF exporters category not configured for this cumulative branch
#41977 feature skipped Add Phi3.5 Vision Model category not configured for this cumulative branch
#41967 defect aborted feat: RoPE-related typing improvements codebase moved on: current modeling_rope_utils has a newer RoPE standardization/validation layout and RopeParameters definition, while the PR rewrites standardize_rope_params and moves/retypes RopeParameters from an old…
#45705 feature skipped [skills] model doc category not configured for this cumulative branch
#41899 feature skipped Testing checkpoint limit changes from PR #37196 category not configured for this cumulative branch
#41895 feature skipped Add Telugu Sentiment Classification Example using DistilBERT category not configured for this cumulative branch
#41886 feature skipped ADD FG-CLIP2 category not configured for this cumulative branch
#41882 feature skipped Support fdma for models with attention bias category not configured for this cumulative branch
#41880 documentation skipped Indonesian Language Support for ReadMe category not configured for this cumulative branch
#41823 feature skipped Lfm2-VL vllm category not configured for this cumulative branch
#41807 documentation skipped git commit -m "Fix: corrected outdated documentation link in README.md" category not configured for this cumulative branch
#41800 documentation skipped Increasing clarity category not configured for this cumulative branch
#41798 feature skipped p-less Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding category not configured for this cumulative branch
#41797 feature skipped Add deepseek ocr category not configured for this cumulative branch
#41794 other skipped Enable flake8-pie rules category not configured for this cumulative branch
#41776 feature skipped Add safety checking infrastructure for text generation category not configured for this cumulative branch
#41733 documentation skipped transformers CLI documentation issue category not configured for this cumulative branch
#41710 documentation skipped 🌐 [i18n-KO] Translated to Korean category not configured for this cumulative branch
#41693 feature skipped 🚨 Refactor ViT to updated standards category not configured for this cumulative branch
#41654 defect aborted Improve LLaMA tokenizer error when vocab is missing: suggest installi… codebase moved on: LLaMA tokenizer has been reduced to tokenizers-backend wiring with only __all__, while PR modifies legacy SentencePiece get_spm_processor and adds a slow-tokenizer missing-vocab test; resolving would …
#41611 documentation skipped Docs add custom loss example category not configured for this cumulative branch
#41597 documentation skipped Standardize RoBERTa model card following issue #36979 category not configured for this cumulative branch
#41594 feature skipped Add beginner-friendly sentiment analysis example category not configured for this cumulative branch
#41593 feature skipped examples: add multi-label text classification (BCEWithLogitsLoss, met… category not configured for this cumulative branch
#41584 defect aborted Add clear error message for missing SentencePiece model in get_spm_processor (fix #41553) codebase moved on: PR inserts a guard in legacy LlamaTokenizer.get_spm_processor, but current tokenization_llama.py no longer contains that SentencePiece implementation and only exposes the tokenizers-backend class; res…
#41565 documentation skipped 🌐 [i18n-KO] Updated perf_train_gpu_many.md category not configured for this cumulative branch
#41561 feature skipped Optimize Mamba2 memory usage by replacing broadcast with einsum category not configured for this cumulative branch
#41557 defect aborted Update modeling_llama4.py PR does not implement the reported Llama4 RMSNorm/L2Norm defect; its only model change is an inline question/comment, while direct merge would also drag unrelated upstream/main history. No practical defect fix to apply.
#41531 documentation skipped 🌐 [i18n-KO] Translated video_processor.md to Korean category not configured for this cumulative branch
#41528 feature skipped Add position encoding interpolation to DeiT category not configured for this cumulative branch
#41527 documentation skipped 🌐 [i18n-KO] Translated selecting.md to Korean category not configured for this cumulative branch
#41524 feature skipped Add max_eval_batches argument to TrainingArguments category not configured for this cumulative branch
#41523 other skipped Add test coverage for ConvNextImageProcessorFast category not configured for this cumulative branch
#41522 defect aborted Fix _init_weights to safely skip int8 quantized weights codebase moved on: PR directly edits generated qwen2_5_vl modeling and replaces the current modular _init_weights override with an ad-hoc initializer plus root-level test scripts. Current tree has modular_qwen2_5_vl as …
#41491 feature skipped Add skip_unnecessary_grad_clip to TrainingArguments for optimized gradient clipping category not configured for this cumulative branch
#41490 defect aborted Fix _init_weights to safely skip int8 tensors in Qwen2_5_VL model codebase moved on: PR edits generated qwen2_5_vl modeling _init_weights and adds root-level tester scripts, while current tree uses modular_qwen2_5_vl as source of truth and generated modeling has diverged. Resolving sa…
#41488 other skipped Create 1 category not configured for this cumulative branch
#41458 feature skipped Adding ScatterMoE kernel support for Granite models. category not configured for this cumulative branch
#41419 feature skipped First QAT for Finegrained FP8 category not configured for this cumulative branch
#41406 feature skipped 🚨 [v5] Remove deprecated cache classes category not configured for this cumulative branch
#41362 documentation skipped Added Hacktoberfest banner image to README.md category not configured for this cumulative branch
#41356 feature skipped Add DEIMv2 model, image processor, and basic tests category not configured for this cumulative branch
#41349 feature skipped Create (3d_parrallel_v2.py) - Add 3D parallelism training example script category not configured for this cumulative branch
#41333 feature skipped Add DeepseekVLV2 Model category not configured for this cumulative branch
#41329 defect aborted Fix GIL=0 segfault and Add GIL=0 compat for regex paths codebase moved on: PR rewrites many legacy slow tokenizer files to import a new utils.safe regex wrapper, but current v5 tree has converted several of those files to tokenizers-backend compatibility shims and deleted ot…
#41315 feature skipped [model deprecations] Define new version-based model deprecation/deletions with user warnings/exceptions category not configured for this cumulative branch
#41304 defect aborted Fix equality-vs-assignment bug in GptqHfQuantizer.update_device_map codebase moved on: the buggy auto-gptq CPU-to-CUDA assignment branch targeted by the PR has been removed; current GPTQ validation requires gptqmodel and update_device_map only fills a default CPU map, so applying the PR…
#41299 feature skipped copied changes from soghomon-b:add-eval-step-limit-31561 category not configured for this cumulative branch
#41291 feature skipped Add-Deimv2 category not configured for this cumulative branch
#41272 feature skipped feat: Add HRM Model category not configured for this cumulative branch
#41254 documentation skipped docs: Add Hacktoberfest banner, Contributing and License sections to README category not configured for this cumulative branch
#41251 feature skipped Add deepseek 3.2 exp category not configured for this cumulative branch
#41224 feature skipped Add DINOv3ViTForImageClassification support category not configured for this cumulative branch
#41215 defect aborted Fix CLIP memory leak causing 600-800MB accumulation per batch codebase moved on: PR deletes local text_outputs/vision_outputs after extracting projected tensors, but current CLIP/Aimv2/MetaClip2 get_*_features APIs now return BaseModelOutputWithPooling objects with projected poole…
#41202 feature skipped [WIP] standardize audio kwargs category not configured for this cumulative branch
#41162 documentation skipped Add Sinhala (සිංහල) translation of README category not configured for this cumulative branch
#41160 documentation skipped [docs] update tips syntax category not configured for this cumulative branch
#41159 feature skipped Support setting total_train_batch_size. category not configured for this cumulative branch
#41144 feature skipped Support automatic conversion from zero checkpoint to universal checkpoint. category not configured for this cumulative branch
#41116 feature skipped Add MiniCPM3 category not configured for this cumulative branch
#41095 feature skipped Add LLaVA-OneVision-1.5 model and related configurations category not configured for this cumulative branch
#41053 feature skipped Qwen3 moe category not configured for this cumulative branch
#41041 feature skipped [WIP] Add YuE model category not configured for this cumulative branch
#41040 feature skipped Add Keye vl 8b 1.5 category not configured for this cumulative branch
#41037 other skipped Tests: Apertus integration tests category not configured for this cumulative branch
#41035 documentation skipped docs: update speech recognition examples to use modern Common Voice d… category not configured for this cumulative branch
#41033 feature skipped feat: make audio feature extractors torch.export-able category not configured for this cumulative branch
#41024 feature skipped Deprecate max_size in ConditionalDetrImageProcessor with warning category not configured for this cumulative branch
#41022 documentation skipped 🌐 [i18n-KO] Translated backbones.md to Korean category not configured for this cumulative branch
#41021 documentation skipped 🌐 [i18n-KO] Translated video_processors.md to Korean category not configured for this cumulative branch
#41009 feature skipped Add Lexa-Delta model support category not configured for this cumulative branch
#40976 defect aborted Better defaults for assisted generation codebase moved on: assisted generation internals in candidate_generator.py and generation utils have diverged; PR removes/restores fields around assistant_generation_config, min/max length, and logits processors, making…
#40962 feature skipped perceptron: Isaac-0.1 implementation category not configured for this cumulative branch
#40898 feature skipped Adding [T5/MT5/UMT5]EncoderForSequenceClassification category not configured for this cumulative branch
#40888 documentation skipped DOC Fix help for chat and serve commands category not configured for this cumulative branch
#40887 other skipped Refactor output handling in generate for cleaner decoding methods category not configured for this cumulative branch
#40877 other skipped Bug #40833: Fix for kv_offset calculation for mixed padding category not configured for this cumulative branch
#40871 other skipped Refactor benchmark utils: add type hints, GPU metrics helper, and con… category not configured for this cumulative branch
#40870 feature skipped Reduce vRAM usage during generation by allowing to transfer logits to CPU category not configured for this cumulative branch
#40861 feature skipped Support n_groups>1 for mamba2 category not configured for this cumulative branch
#40840 feature skipped feat: add qwen2 pruning support category not configured for this cumulative branch
#40820 feature skipped Add models to benchmarks category not configured for this cumulative branch
#40759 feature skipped feat: add qwen3 pruning support category not configured for this cumulative branch
#40756 feature skipped [WIP] Add Canary category not configured for this cumulative branch
#40755 feature skipped [TimesFM] Add support for forecasting with covariates category not configured for this cumulative branch
#40740 defect validation_failed Configure assistant model generation_config with user parameters merge introduced F821 undefined name use_model_defaults in src/transformers/generation/utils.py; reset top merge
#40738 documentation skipped Docs: Clarify rjieba installation for RoFormerTokenizer category not configured for this cumulative branch
#40736 documentation skipped 🌐 [i18n-KO] Translated jan.md to Korean category not configured for this cumulative branch
#40728 feature skipped feat(serve): add OTEL category not configured for this cumulative branch
#40714 documentation skipped Remove TF and Flax from README category not configured for this cumulative branch
#40670 feature skipped Add ability to run Gemma 2 models without post layer norm category not configured for this cumulative branch
#40640 defect aborted Resume training by trained samples to avoid elastic job loss or over-reading of data. codebase moved on: trainer training loop is refactored into _setup_training/_run_epoch on cumulative branch while PR modifies the older monolithic loop and adds broad trainer tests; resolving sample-based resume safely …
#40648 other skipped Bump torch from 2.7.1 to 2.8.0 in /examples/flax/vision category not configured for this cumulative branch
#40637 feature skipped [WIP]Add openpangu_dense model category not configured for this cumulative branch
#40633 feature skipped Add support for Custom Accelerate Instance in Trainer category not configured for this cumulative branch
#40587 feature skipped feat(utils): add vision utils for embedding images and getting the hidden size category not configured for this cumulative branch
#40546 feature skipped Implement VibeVoice category not configured for this cumulative branch
#40524 documentation skipped Use begin_of_sequence token in all sliding windows for correct model behaviour category not configured for this cumulative branch
#40520 feature skipped [generate] add faster stop_strings stopping criteria category not configured for this cumulative branch
#40515 feature skipped Add Context-Aware Tokenizer Selection Utility Based on Corpus Analysis category not configured for this cumulative branch
#40505 other skipped Refactor Siglip-like models category not configured for this cumulative branch
#40493 defect aborted Update dtypes to suit colab bf16 -> fp16 -> fp32. codebase moved on: PR patches legacy modeling_utils dtype resolution around imports and _get_dtype; current branch has refactored imports and dtype handling, and resolving the bf16 fallback semantics would require non-m…
#40473 other skipped [tests] Unskip DeBERTaV2 tokenizer parity tests; re-enable fast/slow checks category not configured for this cumulative branch
#40471 documentation skipped DOC: Standardize CodeGen model card (issue #36979) category not configured for this cumulative branch
#40465 documentation skipped 🌐 [i18n-KO] Translated tools.md to Korean category not configured for this cumulative branch
#40464 documentation skipped 🌐 [i18n-KO] Translated agents.md to Korean category not configured for this cumulative branch
#40448 feature skipped [model] Support MiniCPM-V 4.5 category not configured for this cumulative branch
#40446 defect aborted Add convert_segmentation_map_to_binary_masks_sorted function for hand… codebase moved on: PR adds a NumPy sorted-mask helper to the old mask2former image processor, but current branch has split Mask2Former processing into PIL and fast/torch implementations; resolving would require designin…
#40425 defect aborted Fix check_quantized_param method when param_value is a safetensors slice codebase moved on: PR updates check_quantized_param methods in multiple quantizer classes, but the current branch has refactored quantization loading and those methods are no longer present in the touched quantizer file…
#40404 documentation skipped update model card for gpt-j category not configured for this cumulative branch
#40403 feature skipped Customizable Logit Warping Strategies for Generation #40010 category not configured for this cumulative branch
#40400 documentation skipped fixed redundant words in readme.md category not configured for this cumulative branch
#40395 documentation skipped 🌐 [i18n-KO] Updated text_generation.md category not configured for this cumulative branch
#40390 documentation skipped Fix typo: 'lenght' to 'length' category not configured for this cumulative branch
#40388 feature skipped Add fromjson filter to Jinja2 chat templates category not configured for this cumulative branch
#40328 feature skipped [rfc] Prototype to make torch.compile work with DynamicCache category not configured for this cumulative branch
#40299 feature skipped Remove deprecated max_size parameter from ConditionalDetr image processors category not configured for this cumulative branch
#40286 feature skipped Add MOSS-TTSD with XY-Tokenizer category not configured for this cumulative branch
#40265 other skipped Enable PLW and PLE rules category not configured for this cumulative branch
#40225 documentation skipped docs: clarify decoder_input_ids vs decoder_inputs_embeds usage (#39542) category not configured for this cumulative branch
#40209 feature skipped Add fast image processor for ViViT category not configured for this cumulative branch
#40180 feature skipped Enable native mxfp4 training support for GPT-OSS models category not configured for this cumulative branch
#40177 defect aborted Revert text_positions in Qwen25VL codebase moved on: PR targets old Qwen2.5-VL prepare_inputs_for_generation text/vision position concatenation, but current code moved position-id construction into _prepare_position_ids_for_generation and Qwen2_5_VLMode…
#40171 feature skipped Rename to CamelCase category not configured for this cumulative branch
#40155 documentation skipped 🌐 [i18n-KO] Translated t5.md to Korean category not configured for this cumulative branch
#40149 feature skipped Implemented fast image processor for VitPose category not configured for this cumulative branch
#40131 documentation skipped add missing Arabic translations category not configured for this cumulative branch
#40115 feature skipped [layer_types] update layer_types with conv category not configured for this cumulative branch
#40102 documentation skipped 🌐 [i18n-KO] Translated auto_docstring.md to Korean category not configured for this cumulative branch
#40092 feature skipped Optimize LlamaAttention by fusing QKV projections category not configured for this cumulative branch
#40064 documentation skipped 🌐 [i18n-KO] Translated videomae.md to Korean category not configured for this cumulative branch
#40061 documentation skipped 🌐 [i18n-KO] Translated vitdet.md to Korean category not configured for this cumulative branch
#40058 feature skipped GGUF Qwen2VL category not configured for this cumulative branch
#40055 feature skipped Auto-log parallelism info to wandb.config using HF Accelerate category not configured for this cumulative branch
#40047 documentation skipped Update wavlm.md to match new model card template category not configured for this cumulative branch
#40023 feature skipped Add support for SDPA for OWLViT and OWLv2 category not configured for this cumulative branch
#39987 feature skipped Add a VGGT(Visual Geometry Grounded Transformer) model compatible with huggingface transfromers category not configured for this cumulative branch
#39941 other skipped fixing image_utils.py todo category not configured for this cumulative branch
#39931 feature skipped Registers StaticCache serialization functions for torch.export.export category not configured for this cumulative branch
#39922 documentation skipped 🌐 [i18n-KO] Translated attention_interface.md to Korean category not configured for this cumulative branch
#39920 documentation skipped 🌐 [i18n-KO] Updated ko/perf_train_special.md category not configured for this cumulative branch
#39917 documentation skipped 🌐 [i18n-KO] Updated ko/perf_train_cpu.md category not configured for this cumulative branch
#39901 documentation skipped 🌐 [i18n-KO] Translated fp_quant to Korean category not configured for this cumulative branch
#39899 feature skipped [model] Support MiniCPM-V 4.0 category not configured for this cumulative branch
#39895 feature skipped Add Videoprism category not configured for this cumulative branch
#39886 documentation skipped 🌐 [i18n-KO] Translated perf_train_gaudi.md to Korean category not configured for this cumulative branch
#39859 feature skipped WIP: Initial support for bnb 4bit on any nn.Parameter category not configured for this cumulative branch
#39831 other skipped refactor(modeling_llama): make RotaryEmbedding default path explicit category not configured for this cumulative branch
#39807 documentation skipped 🌐 [i18n-KO] Translated bamba.md to Korean category not configured for this cumulative branch
#39796 feature skipped [pipelines] text-to-audio pipeline standardization category not configured for this cumulative branch
#39792 defect aborted Served models handle with nested content codebase moved on: PR modifies legacy src/transformers/commands/serving.py and tests/commands/test_serving.py, but both paths are deleted on the cumulative branch/current v5-era tree; resolving would require porting the…
#39785 defect aborted fix mllama integration tests codebase moved on: PR changes mllama hidden-state/intermediate-state return wiring in modeling_mllama.py, but that section has diverged substantially on the cumulative branch; resolving would require understanding the n…
#39772 defect aborted Fix missing initializations for models created in 2022 codebase moved on: PR touches weight-initialization code across dozens of model files, many of which have diverged or are generated/modular; direct merge produced broad conflicts that are not safe to resolve mechanicall…
#39760 feature skipped [Draft] Add Llasa TTS family of models category not configured for this cumulative branch
#39751 documentation skipped 🌐 [i18n-KO] Translated text-to-speech.md to Korean category not configured for this cumulative branch
#39735 defect aborted handle multimodal models with tp_plan on the text_config codebase moved on: PR expects older distribute_model(model, distributed_config, device_mesh, tp_size) API and injects tp_plan lookup there; current code has distribute_model(model, tp_plan, distributed_config, device_me…
#39722 feature skipped [Feat] Adding Intern-S1 category not configured for this cumulative branch
#39718 documentation skipped Fix SigLIP2 documentation model/processor mismatch category not configured for this cumulative branch
#39708 documentation skipped 🌐[i18n-bn] Introduce Bengali version of Transformers documentation category not configured for this cumulative branch
#39690 feature skipped Allow custom hf_quantizer in from_pretrained category not configured for this cumulative branch
#39632 documentation skipped fix dead NVIDIA link category not configured for this cumulative branch
#39631 feature skipped [serve] Add speech-to-text category not configured for this cumulative branch
#39588 feature skipped WIP, reference modeling category not configured for this cumulative branch
#39575 documentation skipped 🌐 [i18n-KO] Translated vitpose.md to Korean category not configured for this cumulative branch
#39563 documentation skipped 🌐 [i18n-KO] Translated vision-encoder-decoder.md to Korean category not configured for this cumulative branch
#39559 documentation skipped 🌐 [i18n-KO] Translated main_classes/deepspeed.md to Korean category not configured for this cumulative branch
#39555 other skipped [WIP] try to relax the tie_weights method category not configured for this cumulative branch
#39544 documentation skipped 🌐 [i18n-KO] Translated feature_extractors.md to Korea category not configured for this cumulative branch
#39541 feature skipped Add Muon optimizer implementation and integration category not configured for this cumulative branch
#39534 feature skipped Add Beit3 model category not configured for this cumulative branch
#39517 documentation skipped 🌐 [i18n-KO] Translated compressed_tensor.md to Korean category not configured for this cumulative branch
#39480 feature skipped Add model arcinstitute state category not configured for this cumulative branch
#39466 documentation skipped README: Update Bert Japanese model card category not configured for this cumulative branch
#39456 defect aborted Fix quantized model initialization for int8 dtypes codebase moved on: quantized initialization path was refactored/already addressed by PR 39491, and PR also modifies fast image processor files that are deleted in current v5 tree
#39435 defect aborted Add a unit test for BartModel to compare eager, sdpa on one particular set of inputs codebase moved on: cherry-pick of test-only regression PR conflicts in tests/models/bart/test_modeling_bart.py; PR states it is not a fix yet and current file structure differs, so resolving a known-failing test without…
#39403 feature skipped Add Vocos model category not configured for this cumulative branch
#39357 documentation skipped Update docstring for glm4v category not configured for this cumulative branch
#39353 defect aborted fix colpali mapping codebase moved on: conflicts in modeling_utils VLM compatibility area removed/reworked in current tree and ColPali now uses base_model_prefix=vlm; applying old checkpoint mapping would need non-trivial review of the cur…
#39309 defect aborted Fix audio pipeline with torchcodec input codebase moved on: audio pipeline torch/tensor handling has been reworked (ASR now accepts torch tensors and averages multi-channel audio), causing conflicts in pipeline code and test expectations; resolving would requi…
#39303 documentation skipped Fix critical typos in code example category not configured for this cumulative branch
#39293 feature skipped Add T5LA models category not configured for this cumulative branch
#39251 defect aborted Fix slow test_moshika_greedy_unconditional_fp16 codebase moved on: PR modifies old cache_utils Cache class/helper layout and an older Moshi prepare_inputs_for_generation implementation, while current branch uses CacheLayerMixin and delegates Moshi input preparation t…
#39236 feature skipped added moment_p sampling category not configured for this cumulative branch
#39222 other skipped Enable granite 4 hybrid integration tests category not configured for this cumulative branch
#39212 documentation skipped Add Ukrainian translation of README.md category not configured for this cumulative branch
#39209 other skipped Standardize FSMT class naming: PretrainedFSMTModel → PreTrainedFSMTModel category not configured for this cumulative branch
#39183 feature skipped Add a chat extra category not configured for this cumulative branch
#39150 feature skipped Efficient Expert Weight Fusion for Moe deepseek v3 category not configured for this cumulative branch
#39140 feature skipped feat(trainer): emergency checkpointing on crashes & SIGTERM/SIGINT category not configured for this cumulative branch
#39109 defect aborted Fix: rename 'eval_strategy' to 'evaluation_strategy' in TrainingArgum… codebase moved on: PR renames TrainingArguments eval_strategy to evaluation_strategy in a stale one-file version, but current TrainingArguments has been extensively reorganized and still intentionally exposes eval_strat…
#39108 defect aborted Disable static cache on certain MoE models codebase moved on: PR branch has stale DeepseekV3/Dots1 generated code and conflicts in modular_deepseek_v3; current files no longer contain the targeted static-cache override and the PR head diff only removes stale _ca…
#39084 other skipped Refactor gemma3n category not configured for this cumulative branch
#39047 feature skipped RFC: refactor causal lm loss to handle lm_head in loss function category not configured for this cumulative branch
#39009 other skipped Add submodels support check function category not configured for this cumulative branch
#38991 feature skipped Remove return_dict kwarg from all the models category not configured for this cumulative branch
#38988 defect aborted Check docstring inside modular files as well codebase moved on: PR mixes a checker change with many stale modular/generated model cleanups; direct merge conflicts across Makefile, utils/check_docstrings.py, deleted fast-image/args_doc artifacts, and dozens of modu…
#38962 other skipped Update test_candidate_generator.py category not configured for this cumulative branch
#38959 documentation skipped Updated the model card for wav2vec2-phoneme category not configured for this cumulative branch
#38958 documentation skipped Updated model card for wav2vec2-conformer category not configured for this cumulative branch
#38957 documentation skipped Update wav2vec2-bert model card category not configured for this cumulative branch
#38956 documentation skipped Updating model card for wav2vec2 category not configured for this cumulative branch
#38955 documentation skipped docs: Musicgen melody model card category not configured for this cumulative branch
#38926 documentation skipped Clarify Python and framework version support in installation.md category not configured for this cumulative branch
#38923 other skipped Remove deprecated max_size support from YOLOS image processor category not configured for this cumulative branch
#38908 defect aborted Add support to use config dtype in HybridChunkedCache codebase moved on: PR changes HybridChunkedCache dtype initialization, but current v5 cache_utils has removed HybridChunkedCache/HybridCache and ends with StaticCache plus a SlidingWindowCache alias; applying the one-li…
#38893 other skipped Fix/deprecate max size conditional detr category not configured for this cumulative branch
#38886 feature skipped Allow compile with bnb category not configured for this cumulative branch
#38877 documentation skipped DOC: Clarify attention_mask usage in BertModel forward method category not configured for this cumulative branch
#38861 feature skipped Add SamImageProcessorFast with 4x performance improvement category not configured for this cumulative branch
#38859 feature skipped Add MobileViT fast image processor category not configured for this cumulative branch
#38839 other skipped [DO NOT MERGE] Testing saftensors 0.6.0 category not configured for this cumulative branch
#38810 feature skipped Add kwargs support in WhisperForConditionalGeneration category not configured for this cumulative branch
#38805 feature skipped Add Dust3R category not configured for this cumulative branch
#38793 documentation skipped Fix Typos in Comments and Improve Clarity category not configured for this cumulative branch

