Commits
633 commits
008d9e9
Switch to canonical _is_hf_initialized flag per review
vai-minzhou Apr 24, 2026
b8ef47c
Merge branch 'main' into fix/seq2seq-decoder-encoder-attention-mask
duyhv-qualgo Apr 24, 2026
c3ef3d6
fix(qianfan_ocr): auto-fix failing tests
kaixuanliu Apr 24, 2026
b7689c6
Add 'requests' to serving extras dependencies
Oneirag Apr 24, 2026
b15878b
Merge pull request #1 from Oneirag/Oneirag-missing-requests-serving
Oneirag Apr 24, 2026
4df9607
Merge branch 'main' into fix/eta-warper-all-inf
Cyrilvallez Apr 24, 2026
343af8e
Processing Utils: honor pre-built sub-processor kwargs in from_pretra…
javierdejesusda Apr 24, 2026
7889d44
Fix local trust_remote_code cache key collisions
Jeevang1-epic Apr 24, 2026
08ac3d8
Move repetition penalty guard to logits processor
ruben-aghayan Apr 25, 2026
0361926
Merge branch 'main' into fix-repetition-penalty-inputs-embeds
ruben-aghayan Apr 25, 2026
47a512b
Fix xdist collisions for captured_info artifacts and preserve CI debu…
stationeros Apr 25, 2026
9abd5e7
Truncate hash to 16 chars to prevent Windows path length issues
Jeevang1-epic Apr 25, 2026
74480d4
Skip CPU param materialization on non-rank-0 FSDP ranks to avoid OOM
AmineDiro Apr 25, 2026
388ad09
Merge branch 'main' into gemma4-fix
kaixuanliu Apr 27, 2026
6165de2
update
kaixuanliu Apr 27, 2026
d94ced8
Merge branch 'main' into main
stationeros Apr 27, 2026
7a52c40
Merge branch 'main' into torch-type
zucchini-nlp Apr 27, 2026
deb916e
Fix EP+FSDP2: wrap EP-sharded params as DTensors and exclude experts …
AmineDiro Apr 27, 2026
7ad712a
mappings on classes, scoping for every transforms
yonigozlan Apr 27, 2026
c63a7d8
fix style
yonigozlan Apr 27, 2026
17de22d
cleanup imports
AmineDiro Apr 27, 2026
f2d7154
Merge remote-tracking branch 'upstream/main' into improve-weight-conv…
yonigozlan Apr 27, 2026
8f726c7
Fix deduplication removes submodel mappings of the same type
yonigozlan Apr 27, 2026
fd2c613
Fix scoped WeightConverter not applied in the correct order, now inte…
yonigozlan Apr 27, 2026
28ed270
temp fix paligemma
yonigozlan Apr 27, 2026
24660f6
Apply _local() to expert biases under EP
AmineDiro Apr 27, 2026
37c106b
Fix import ordering
AmineDiro Apr 27, 2026
c75244e
Fix incompatible mappings between head and base model for VLMs
yonigozlan Apr 28, 2026
dcf9519
glmasr should be in AutoModelForMultimodalLM
eustlb Apr 28, 2026
cb7ba4d
add dia to MODEL_FOR_TEXT_TO_WAVEFORM_MAPPING_NAMES
eustlb Apr 28, 2026
ba51f15
update revision for Phi-4 model to make it run w/o remote code
kaixuanliu Apr 28, 2026
1174779
update
kaixuanliu Apr 28, 2026
9c712a5
Refactor EP sharding to apply DTensor wrapping during loading
AmineDiro Apr 28, 2026
c2f5df2
Fix shared config mutation issue in flash_attn_from_config
kaixuanliu Apr 28, 2026
1101322
FIX Restore LoRA hotswapping functionality
BenjaminBossan Apr 28, 2026
7d87f95
Merge branch 'main' into main
stationeros Apr 28, 2026
a65c934
Exclude audio modules from conversion process
softguy777 Apr 28, 2026
e999543
update code
kaixuanliu Apr 28, 2026
9ea76fe
fix: Made histc_input robust for broader hardware
rigen1048 Apr 28, 2026
1dbd3da
fix gemma3 mapping
yonigozlan Apr 28, 2026
d8fe583
Merge branch 'main' into main
stationeros Apr 28, 2026
78e2c2d
Merge branch 'main' into OSI
rigen1048 Apr 28, 2026
cf88e4f
Use basename/hash for local trust_remote_code cache paths
Jeevang1-epic Apr 28, 2026
bbc60ad
Merge branch 'main' into main
stationeros Apr 28, 2026
75dad67
Fix lint issues with Ruff
rigen1048 Apr 28, 2026
c9cc099
cb error
SunMarc Apr 28, 2026
30f65e4
Fix custom-module copies inheriting read-only permissions (#45684)
nurpax Apr 28, 2026
b799f32
Fix more issues, address reviews
yonigozlan Apr 28, 2026
a702b56
Add option to override image_processor_auto_map with local code when …
yonigozlan Apr 28, 2026
f4118e2
Merge branch 'main' into fix-phi4-test
yonigozlan Apr 28, 2026
f0eeb8f
Merge branch 'main' into improve-weight-converter
yonigozlan Apr 28, 2026
4bcb04f
Merge branch 'mergeability-pr-45692' into all-defects-750
evalstate Apr 28, 2026
7236eaa
Merge branch 'mergeability-pr-45691' into all-defects-750
evalstate Apr 28, 2026
9fda37a
Merge branch 'mergeability-pr-45687' into all-defects-750
evalstate Apr 28, 2026
453831a
Merge branch 'mergeability-pr-45686' into all-defects-750
evalstate Apr 28, 2026
47edcd1
Merge branch 'mergeability-pr-45683' into all-defects-750
evalstate Apr 28, 2026
8ade0f4
Merge branch 'mergeability-pr-45682' into all-defects-750
evalstate Apr 28, 2026
8a78618
Merge branch 'mergeability-pr-45678' into all-defects-750
evalstate Apr 28, 2026
bb76ed2
Merge branch 'mergeability-pr-45671' into all-defects-750
evalstate Apr 28, 2026
5ba38b9
Merge branch 'mergeability-pr-45670' into all-defects-750
evalstate Apr 28, 2026
286da16
Merge branch 'mergeability-pr-45662' into all-defects-750
evalstate Apr 28, 2026
3eaec5b
Merge branch 'mergeability-pr-45661' into all-defects-750
evalstate Apr 28, 2026
584d284
Merge branch 'mergeability-pr-45649' into all-defects-750
evalstate Apr 28, 2026
0fda7a4
Merge branch 'mergeability-pr-45645' into all-defects-750
evalstate Apr 28, 2026
41be4e1
Merge branch 'mergeability-pr-45642' into all-defects-750
evalstate Apr 28, 2026
73326e2
refactor: Relocate tests
harshaljanjani Apr 29, 2026
936f92c
Fix train_batch_size and eval_batch_size to respect split_batches config
MinuriRajapakse Apr 29, 2026
4d0f5ea
Merge branch 'mergeability-pr-45694' into all-defects-750
evalstate Apr 29, 2026
bb4c2b1
Merge branch 'mergeability-pr-45627' into all-defects-750
evalstate Apr 29, 2026
7d56e34
Merge branch 'mergeability-pr-45615' into all-defects-750
evalstate Apr 29, 2026
a79d37e
fix(testing): check torch.cuda.is_available() before get_device_capab…
PHclaw Apr 29, 2026
6b7af19
Merge branch 'mergeability-pr-45614' into all-defects-750
evalstate Apr 29, 2026
40690cb
Merge branch 'mergeability-pr-45594' into all-defects-750
evalstate Apr 29, 2026
261a099
Merge branch 'mergeability-pr-45591' into all-defects-750
evalstate Apr 29, 2026
d4276fd
Merge branch 'mergeability-pr-45578' into all-defects-750
evalstate Apr 29, 2026
90ad846
Merge branch 'mergeability-pr-45570' into all-defects-750
evalstate Apr 29, 2026
9e51704
Merge branch 'mergeability-pr-45568' into all-defects-750
evalstate Apr 29, 2026
07a4e07
Merge branch 'mergeability-pr-45552' into all-defects-750
evalstate Apr 29, 2026
a91cf72
Merge branch 'mergeability-pr-45549' into all-defects-750
evalstate Apr 29, 2026
a9a88df
Merge branch 'mergeability-pr-45548' into all-defects-750
evalstate Apr 29, 2026
0712093
Merge branch 'mergeability-pr-45541' into all-defects-750
evalstate Apr 29, 2026
3cea3ff
Merge branch 'mergeability-pr-45524' into all-defects-750
evalstate Apr 29, 2026
693e178
Merge branch 'mergeability-pr-45523' into all-defects-750
evalstate Apr 29, 2026
353c382
Merge branch 'mergeability-pr-45487' into all-defects-750
evalstate Apr 29, 2026
bf3b2fc
Merge branch 'mergeability-pr-45423' into all-defects-750
evalstate Apr 29, 2026
8cb099f
Merge branch 'mergeability-pr-45422' into all-defects-750
evalstate Apr 29, 2026
0e524b8
Merge branch 'mergeability-pr-45413' into all-defects-750
evalstate Apr 29, 2026
1a4aa46
Merge branch 'mergeability-pr-45389' into all-defects-750
evalstate Apr 29, 2026
f0de1b3
Merge branch 'mergeability-pr-45379' into all-defects-750
evalstate Apr 29, 2026
917a893
Merge branch 'mergeability-pr-45378' into all-defects-750
evalstate Apr 29, 2026
7b68c9c
Merge branch 'mergeability-pr-45360' into all-defects-750
evalstate Apr 29, 2026
74797e6
Merge branch 'mergeability-pr-45346' into all-defects-750
evalstate Apr 29, 2026
87e7cb9
Merge branch 'mergeability-pr-45342' into all-defects-750
evalstate Apr 29, 2026
998f785
Merge branch 'mergeability-pr-45321' into all-defects-750
evalstate Apr 29, 2026
75512d8
Merge branch 'mergeability-pr-45317' into all-defects-750
evalstate Apr 29, 2026
627aafb
Apply PR #45221 audio video error fix
evalstate Apr 29, 2026
7d4f2ff
Merge branch 'mergeability-pr-45202' into all-defects-750
evalstate Apr 29, 2026
5d13b83
Merge branch 'mergeability-pr-45193' into all-defects-750
evalstate Apr 29, 2026
7240f99
Apply PR #45170: fix pre_layernorm typo
evalstate Apr 29, 2026
da190f0
Apply HQQ support fixes from PR #45147
evalstate Apr 29, 2026
aa9d4d3
Apply future-annotations auto_docstring fix from PR #45128
evalstate Apr 29, 2026
49d1cc4
Apply doctest fixes from PR #45114
evalstate Apr 29, 2026
b2fd525
Apply auto_docstring string-annotation fix from PR #45105
evalstate Apr 29, 2026
5d6cb0a
Merge branch 'mergeability-pr-45086' into all-defects-750
evalstate Apr 29, 2026
2e034ea
Merge branch 'mergeability-pr-45060' into all-defects-750
evalstate Apr 29, 2026
2ad8865
Merge branch 'mergeability-pr-45056' into all-defects-750
evalstate Apr 29, 2026
0d7dccf
Apply Trainer custom model checkpoint config fix (#45055)
evalstate Apr 29, 2026
f424c17
Merge branch 'mergeability-pr-45034' into all-defects-750
evalstate Apr 29, 2026
94bdbb1
Merge branch 'mergeability-pr-45017' into all-defects-750
evalstate Apr 29, 2026
a193191
Merge branch 'mergeability-pr-44981' into all-defects-750
evalstate Apr 29, 2026
841ee16
Merge branch 'mergeability-pr-44973' into all-defects-750
evalstate Apr 29, 2026
8b19118
Merge branch 'mergeability-pr-44958' into all-defects-750
evalstate Apr 29, 2026
a25ba13
Merge branch 'mergeability-pr-44952' into all-defects-750
evalstate Apr 29, 2026
3c7ab21
Merge branch 'mergeability-pr-44940' into all-defects-750
evalstate Apr 29, 2026
ca7bbbd
Merge branch 'mergeability-pr-44923' into all-defects-750
evalstate Apr 29, 2026
141c7ab
Merge branch 'mergeability-pr-44907' into all-defects-750
evalstate Apr 29, 2026
d3000bb
Merge branch 'mergeability-pr-44893' into all-defects-750
evalstate Apr 29, 2026
42395b4
Merge branch 'mergeability-pr-44889' into all-defects-750
evalstate Apr 29, 2026
6c2136c
Merge branch 'mergeability-pr-44836' into all-defects-750
evalstate Apr 29, 2026
0dcce01
Merge branch 'mergeability-pr-44827' into all-defects-750
evalstate Apr 29, 2026
61340a9
Fix `_set_model_specific_special_tokens` to accept list-format `extra…
bensons Mar 17, 2026
7752002
Merge branch 'mergeability-pr-44731' into all-defects-750
evalstate Apr 29, 2026
0651254
fix: torch_float should return float, not int
LincolnBurrows2017 Mar 14, 2026
4b53359
Merge branch 'mergeability-pr-44680' into all-defects-750
evalstate Apr 29, 2026
3e46789
Merge branch 'mergeability-pr-44664' into all-defects-750
evalstate Apr 29, 2026
a7c76bb
Merge branch 'mergeability-pr-44641' into all-defects-750
evalstate Apr 29, 2026
336874b
Merge branch 'mergeability-pr-44626' into all-defects-750
evalstate Apr 29, 2026
d157a9f
Merge branch 'mergeability-pr-44615' into all-defects-750
evalstate Apr 29, 2026
c8bc9ff
Merge branch 'mergeability-pr-44606' into all-defects-750
evalstate Apr 29, 2026
266b04f
Merge branch 'mergeability-pr-44603' into all-defects-750
evalstate Apr 29, 2026
3f4dc09
Merge branch 'mergeability-pr-44587' into all-defects-750
evalstate Apr 29, 2026
5e41492
Merge branch 'mergeability-pr-44585' into all-defects-750
evalstate Apr 29, 2026
8babe48
Merge branch 'mergeability-pr-44385' into all-defects-750
evalstate Apr 29, 2026
c386bfa
Merge branch 'mergeability-pr-44270' into all-defects-750
evalstate Apr 29, 2026
37bfe7e
Merge branch 'mergeability-pr-44257' into all-defects-750
evalstate Apr 29, 2026
e67239e
Merge branch 'mergeability-pr-44228' into all-defects-750
evalstate Apr 29, 2026
10dcfac
Merge branch 'mergeability-pr-44189' into all-defects-750
evalstate Apr 29, 2026
1cdecb7
Merge branch 'mergeability-pr-43989' into all-defects-750
evalstate Apr 29, 2026
5bdff6c
Merge branch 'mergeability-pr-43967' into all-defects-750
evalstate Apr 29, 2026
3b6fea8
Merge branch 'mergeability-pr-43961' into all-defects-750
evalstate Apr 29, 2026
2498545
Merge branch 'mergeability-pr-43911' into all-defects-750
evalstate Apr 29, 2026
9d58077
Merge branch 'mergeability-pr-43875' into all-defects-750
evalstate Apr 29, 2026
f07b909
Merge branch 'mergeability-pr-43833' into all-defects-750
evalstate Apr 29, 2026
08cd0b0
Merge branch 'mergeability-pr-43826' into all-defects-750
evalstate Apr 29, 2026
748d311
Merge branch 'mergeability-pr-43779' into all-defects-750
evalstate Apr 29, 2026
0574327
Merge branch 'mergeability-pr-43775' into all-defects-750
evalstate Apr 29, 2026
834e4f4
Merge branch 'mergeability-pr-43747' into all-defects-750
evalstate Apr 29, 2026
54d8f82
Merge branch 'mergeability-pr-43654' into all-defects-750
evalstate Apr 29, 2026
7d98f17
Merge branch 'mergeability-pr-43651' into all-defects-750
evalstate Apr 29, 2026
3ca1fc4
Merge branch 'mergeability-pr-43549' into all-defects-750
evalstate Apr 29, 2026
a3a5c52
Merge branch 'mergeability-pr-43543' into all-defects-750
evalstate Apr 29, 2026
3371349
Merge branch 'mergeability-pr-43492' into all-defects-750
evalstate Apr 29, 2026
8e2a101
Merge branch 'mergeability-pr-43466' into all-defects-750
evalstate Apr 29, 2026
09e1bd2
Merge branch 'mergeability-pr-43395' into all-defects-750
evalstate Apr 29, 2026
78316ff
Merge branch 'mergeability-pr-43382' into all-defects-750
evalstate Apr 29, 2026
d653ca9
Merge branch 'mergeability-pr-43378' into all-defects-750
evalstate Apr 29, 2026
db7652a
Merge branch 'mergeability-pr-43291' into all-defects-750
evalstate Apr 29, 2026
19445cb
Merge branch 'mergeability-pr-43270' into all-defects-750
evalstate Apr 29, 2026
92a4ab8
Merge branch 'mergeability-pr-43254' into all-defects-750
evalstate Apr 29, 2026
874da4b
Apply PR #43238 object detection batch fix
evalstate Apr 29, 2026
755aeff
Merge branch 'mergeability-pr-43212' into all-defects-750
evalstate Apr 29, 2026
3ed2716
Merge branch 'mergeability-pr-43151' into all-defects-750
evalstate Apr 29, 2026
b98390d
Apply SAM-HQ positional embedding sharing fix
evalstate Apr 29, 2026
256b576
Skip weight conversion when quantizer provides save state
evalstate Apr 29, 2026
8df09cf
Apply ViT BICUBIC default interpolation fix from PR #43028
evalstate Apr 29, 2026
64bd3ce
Merge branch 'mergeability-pr-43015' into all-defects-750
evalstate Apr 29, 2026
19760b2
Merge branch 'mergeability-pr-42979' into all-defects-750
evalstate Apr 29, 2026
56e26eb
Merge branch 'mergeability-pr-42942' into all-defects-750
evalstate Apr 29, 2026
d49826a
Merge branch 'mergeability-pr-42900' into all-defects-750
evalstate Apr 29, 2026
54d5437
Merge branch 'mergeability-pr-42881' into all-defects-750
evalstate Apr 29, 2026
dbf2fd6
Merge branch 'mergeability-pr-42865' into all-defects-750
evalstate Apr 29, 2026
497d85c
Apply MLX BatchFeature tensor conversion from PR #42824
evalstate Apr 29, 2026
9c796c8
Merge branch 'mergeability-pr-42793' into all-defects-750
evalstate Apr 29, 2026
590c45a
Merge branch 'mergeability-pr-42717' into all-defects-750
evalstate Apr 29, 2026
45cdf80
Merge branch 'mergeability-pr-42631' into all-defects-750
evalstate Apr 29, 2026
677c1d1
Merge branch 'mergeability-pr-42598' into all-defects-750
evalstate Apr 29, 2026
c5c59c3
Merge branch 'mergeability-pr-42521' into all-defects-750
evalstate Apr 29, 2026
613c833
Merge branch 'mergeability-pr-42493' into all-defects-750
evalstate Apr 29, 2026
2fd9a4e
Merge branch 'mergeability-pr-42446' into all-defects-750
evalstate Apr 29, 2026
8759c03
Merge branch 'mergeability-pr-42311' into all-defects-750
evalstate Apr 29, 2026
dddf40d
Merge branch 'mergeability-pr-42228' into all-defects-750
evalstate Apr 29, 2026
90f5cd5
Merge branch 'mergeability-pr-42133' into all-defects-750
evalstate Apr 29, 2026
2e8cc24
Merge branch 'mergeability-pr-42127' into all-defects-750
evalstate Apr 29, 2026
5ec95c0
Merge branch 'mergeability-pr-42098' into all-defects-750
evalstate Apr 29, 2026
787693c
Merge branch 'mergeability-pr-42051' into all-defects-750
evalstate Apr 29, 2026
fb3f1fb
Merge branch 'mergeability-pr-41973' into all-defects-750
evalstate Apr 29, 2026
22ceea3
Merge branch 'mergeability-pr-41928' into all-defects-750
evalstate Apr 29, 2026
9892b6d
Apply PR #41904 loss averaging fix
evalstate Apr 29, 2026
1524f7b
Merge branch 'mergeability-pr-41901' into all-defects-750
evalstate Apr 29, 2026
8d16856
Apply PR #41844 FSDPv2 TPU checkpoint unwrap fix
evalstate Apr 29, 2026
3e60b11
Apply PR #41827 FlashAttention compile guard
evalstate Apr 29, 2026
797464c
Apply PR #41754 cache pytree registration
evalstate Apr 29, 2026
ce06fb9
Merge branch 'mergeability-pr-41734' into all-defects-750
evalstate Apr 29, 2026
6ec76f7
Merge branch 'mergeability-pr-41724' into all-defects-750
evalstate Apr 29, 2026
284cd4c
Merge branch 'mergeability-pr-41721' into all-defects-750
evalstate Apr 29, 2026
3e44ff0
Merge branch 'mergeability-pr-41701' into all-defects-750
evalstate Apr 29, 2026
dafb36c
Merge branch 'mergeability-pr-41698' into all-defects-750
evalstate Apr 29, 2026
333839c
Merge branch 'mergeability-pr-41687' into all-defects-750
evalstate Apr 29, 2026
324617d
Merge branch 'mergeability-pr-41521' into all-defects-750
evalstate Apr 29, 2026
4b661c1
Apply SmolVLM quantization dtype fix from PR #41485
evalstate Apr 29, 2026
04bb5c7
Merge branch 'mergeability-pr-41441' into all-defects-750
evalstate Apr 29, 2026
c04f9f2
Merge branch 'mergeability-pr-41330' into all-defects-750
evalstate Apr 29, 2026
f42311c
Merge branch 'mergeability-pr-41313' into all-defects-750
evalstate Apr 29, 2026
0ae6fe3
Merge branch 'mergeability-pr-41169' into all-defects-750
evalstate Apr 29, 2026
5eaf62d
fix(SpeechT5Config): missing @property annotation
sw00 Sep 24, 2025
f071007
fix: Resolve unexpected video frame dropping for multi-video inputs
WesKwong Sep 24, 2025
5767e04
fix: update video output length calculation
WesKwong Sep 24, 2025
8c04115
Fix torch neuroncore availability check
evalstate Apr 29, 2026
ab9e694
Delay flash attention unpad index materialization
evalstate Apr 29, 2026
aba1bb7
Merge branch 'mergeability-pr-41077' into all-defects-750
evalstate Apr 29, 2026
aa92e98
Fix Qwen3 deterministic generation when do_sample=False
Flakes342 Sep 22, 2025
c70a1c5
Fix Qwen3 deterministic generation when do_sample=False
Flakes342 Sep 22, 2025
16f7f3a
Iamashamed
Flakes342 Sep 22, 2025
d6ef47b
Merge branch 'mergeability-pr-40908' into all-defects-750
evalstate Apr 29, 2026
1f2fd45
Apply PR #40790 checkpoint resume handling
evalstate Apr 29, 2026
edd6e3d
Merge branch 'mergeability-pr-40783' into all-defects-750
evalstate Apr 29, 2026
201e995
Merge branch 'mergeability-pr-40666' into all-defects-750
evalstate Apr 29, 2026
c0e181c
Apply PR #40492 divide-by-zero guards
evalstate Apr 29, 2026
ad6ce8a
Apply PR #40438 label_names fallback
evalstate Apr 29, 2026
c03d8e9
Merge branch 'mergeability-pr-40392' into all-defects-750
evalstate Apr 29, 2026
328ee57
Merge branch 'mergeability-pr-40385' into all-defects-750
evalstate Apr 29, 2026
3eda195
Apply PR #40358 MXFP4 MLP shape fix
evalstate Apr 29, 2026
d6fd902
Apply PR #40208 FSDP save_only_model sharded state fix
evalstate Apr 29, 2026
b9f367e
Merge branch 'mergeability-pr-40148' into all-defects-750
evalstate Apr 29, 2026
66137a3
Apply Mixtral torch.export expert loop fix (#40114)
evalstate Apr 29, 2026
0402db0
Merge branch 'mergeability-pr-40090' into all-defects-750
evalstate Apr 29, 2026
9b016f5
Merge branch 'mergeability-pr-40065' into all-defects-750
evalstate Apr 29, 2026
f375ccf
Merge branch 'mergeability-pr-40059' into all-defects-750
evalstate Apr 29, 2026
37c36d6
Merge branch 'mergeability-pr-40022' into all-defects-750
evalstate Apr 29, 2026
753cdc7
Apply PR 39999 tensor parallel meta device-map fix
evalstate Apr 29, 2026
90242c2
Merge branch 'mergeability-pr-39997' into all-defects-750
evalstate Apr 29, 2026
254ff61
Merge branch 'mergeability-pr-39866' into all-defects-750
evalstate Apr 29, 2026
cd54479
Apply PR #39794 fix for ProphetNet tuple encoder outputs
evalstate Apr 29, 2026
856779d
Merge branch 'mergeability-pr-39793' into all-defects-750
evalstate Apr 29, 2026
b883fe4
Merge branch 'mergeability-pr-39741' into all-defects-750
evalstate Apr 29, 2026
6267345
Apply PR #39698 Exaone4 sliding window pattern fix
evalstate Apr 29, 2026
4edb60c
Merge branch 'mergeability-pr-39697' into all-defects-750
evalstate Apr 29, 2026
d4d2a70
Apply PR #39683 respect dynamo disable env
evalstate Apr 29, 2026
534f37c
Apply PR #39674 scale loss by data parallel size
evalstate Apr 29, 2026
f8340cd
Apply PR #39599: guard missing TrainerState on resume
evalstate Apr 29, 2026
a826098
Apply PR #39560: save best model on eval
evalstate Apr 29, 2026
91dc36c
Merge branch 'mergeability-pr-39493' into all-defects-750
evalstate Apr 29, 2026
050c5b7
Merge branch 'mergeability-pr-39491' into all-defects-750
evalstate Apr 29, 2026
ab69093
Apply quantized dispatch fix from PR 39468
evalstate Apr 29, 2026
7675317
Merge branch 'mergeability-pr-39257' into all-defects-750
evalstate Apr 29, 2026
a83d5b3
Apply PR 39206: handle empty Qwen3 MoE router logits
evalstate Apr 29, 2026
cf3dd54
Apply PR 39103 Gemma3n audio token config rename
evalstate Apr 29, 2026
fe5e536
Apply PR 39046 DETR max_size handling
evalstate Apr 29, 2026
f827509
Apply PR 39037 Kosmos2 attention causality fix
evalstate Apr 29, 2026
2143f93
Merge branch 'mergeability-pr-38888' into all-defects-750
evalstate Apr 29, 2026
13 changes: 12 additions & 1 deletion .github/workflows/model_jobs.yml
@@ -186,7 +186,18 @@ jobs:
env:
report_name_prefix: ${{ inputs.report_name_prefix }}
run: |
cat "/transformers/reports/${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports/captured_info.txt"
shopt -s nullglob
captured_info_files=("/transformers/reports/${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports"/captured_info*.txt)

if [ ${#captured_info_files[@]} -eq 0 ]; then
echo "No captured information files found."
exit 0
fi

for captured_info_file in "${captured_info_files[@]}"; do
echo "===== ${captured_info_file##*/} ====="
cat "$captured_info_file"
done

- name: Copy test_outputs.txt
if: ${{ always() }}
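The workflow step above enables `nullglob` so that a missing report file no longer fails the job, then prints each xdist worker's capture under a banner. The same logic can be sketched in Python (the directory layout here is an illustrative assumption, not the real CI path):

```python
from pathlib import Path
import tempfile

def print_captured_info(report_dir: str) -> int:
    """Print every captured_info*.txt under report_dir; return how many were found."""
    files = sorted(Path(report_dir).glob("captured_info*.txt"))
    if not files:
        # mirrors the workflow's graceful exit when no worker wrote a capture
        print("No captured information files found.")
        return 0
    for f in files:
        print(f"===== {f.name} =====")  # banner, like `${captured_info_file##*/}`
        print(f.read_text())
    return len(files)

# Illustrative usage: a temp dir stands in for the CI reports directory
with tempfile.TemporaryDirectory() as reports:
    (Path(reports) / "captured_info_gw0.txt").write_text("worker 0 output")
    (Path(reports) / "captured_info_gw1.txt").write_text("worker 1 output")
    assert print_captured_info(reports) == 2
```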
98 changes: 98 additions & 0 deletions all_requirements.txt
@@ -0,0 +1,98 @@
gpustat==1.1.1
psutil==6.0.0
psycopg2==2.9.9
pandas>=1.5.0
numpy>=1.21.0
psutil>=5.8.0
nvidia-ml-py>=12.0.0
torch>=2.0.0
datasets>=2.10.0
huggingface_hub>=0.16.0
amdsmi>=7.0.2
git+https://github.com/huggingface/transformers.git@main # install main, or adjust it with vX.X.X to install a specific transformers version
datasets==1.8.0
accelerate >= 0.12.0
datasets >= 1.8.0
torch >= 1.3.0
evaluate
accelerate >= 0.21.0
sentencepiece != 0.1.92
protobuf
torch >= 1.3
datasets[audio]>=1.14.0
evaluate
librosa
torchaudio
torch>=1.6
accelerate >= 0.12.0
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
sacrebleu >= 1.4.12
py7zr
torch >= 1.3
evaluate
datasets >= 2.0.0
torch >= 1.3
accelerate
evaluate
Pillow
albumentations >= 1.4.16
accelerate >= 0.12.0
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
rouge-score
nltk
py7zr
torch >= 1.3
evaluate
torch>=1.5.0
torchvision>=0.6.0
datasets>=1.8.0
accelerate >= 0.12.0
datasets >= 1.8.0
sentencepiece != 0.1.92
scipy
scikit-learn
protobuf
torch >= 1.3
evaluate
accelerate>=0.12.0
torch>=1.5.0
torchvision>=0.6.0
datasets>=2.14.0
evaluate
scikit-learn
accelerate >= 0.12.0
torch >= 1.3
datasets >= 2.14.0
sentencepiece != 0.1.92
protobuf
evaluate
scikit-learn
accelerate >= 0.12.0
seqeval
datasets >= 1.8.0
torch >= 1.3
evaluate
albumentations >= 1.4.16
timm
datasets>=4.0
torchmetrics
pycocotools
datasets[audio] >= 1.18.0
torch >= 1.5
torchaudio
librosa
jiwer
evaluate
datasets[audio] >= 1.12.0
torch >= 1.5
torchaudio
accelerate >= 0.12.0
librosa
torch>=1.5.0
torchvision>=0.6.0
datasets>=1.8.0
albumentations >= 1.4.16
timm
datasets
torchmetrics
pycocotools
accelerate >= 0.12.0
sentencepiece != 0.1.92
protobuf
torch >= 1.3
evaluate
19 changes: 17 additions & 2 deletions docker/transformers-all-latest-gpu/Dockerfile
@@ -18,9 +18,20 @@ ARG TORCHCODEC='0.11.0'

ARG FLASH_ATTN='false'

# 'x86_64' or 'arm64'
ARG ARCHITECTURE='x86_64'

RUN apt update
RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg git-lfs
RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg git-lfs curl
RUN git lfs install

RUN set -e; \
if [ "$ARCHITECTURE" = "arm64" ]; then \
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y;\
PATH="/root/.cargo/bin:${PATH}";\
rustc --version;\
fi;

RUN python3 -m pip install --no-cache-dir --upgrade pip

ARG REF=main
@@ -36,7 +47,11 @@ RUN set -e; \
# Determine torch version
if [ ${#PYTORCH} -gt 0 ] && [ "$PYTORCH" != "pre" ]; then \
VERSION="torch==${PYTORCH}.*"; \
TORCHCODEC_VERSION="torchcodec==${TORCHCODEC}.*"; \
if [ "$ARCHITECTURE" = "arm64" ]; then \
TORCHCODEC_VERSION="torchcodec"; \
else \
TORCHCODEC_VERSION="torchcodec==${TORCHCODEC}.*"; \
fi; \
else \
VERSION="torch"; \
TORCHCODEC_VERSION="torchcodec"; \
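The Dockerfile change pins `torchcodec` only on x86_64 and falls back to the unpinned package on arm64, where the pinned wheel series may not be published. That selection can be sketched as (the version string comes from the diff; the helper itself is hypothetical):

```python
def torchcodec_spec(architecture: str, pinned: str = "0.11.0") -> str:
    """Pick a pip requirement string mirroring the Dockerfile branch."""
    if architecture == "arm64":
        return "torchcodec"           # no pinned wheel assumed; take latest
    return f"torchcodec=={pinned}.*"  # pin to the tested patch series

assert torchcodec_spec("x86_64") == "torchcodec==0.11.0.*"
assert torchcodec_spec("arm64") == "torchcodec"
```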
8 changes: 4 additions & 4 deletions docs/source/en/auto_docstring.md
@@ -134,11 +134,11 @@ class MyModelConfig(PreTrainedConfig):
Description of another model-specific parameter.

```python
>>> from transformers import MyModelConfig, MyModel
from transformers import MyModelConfig, MyModel

>>> configuration = MyModelConfig()
>>> model = MyModel(configuration)
>>> configuration = model.config
configuration = MyModelConfig()
model = MyModel(configuration)
configuration = model.config
```
"""

14 changes: 10 additions & 4 deletions docs/source/en/internal/import_utils.md
@@ -29,18 +29,24 @@ This object is still importable:

```python
>>> from transformers import DetrImageProcessor
>>> print(DetrImageProcessor)
<class 'DetrImageProcessor'>
>>> print(DetrImageProcessor) # doctest: +ELLIPSIS
<class '...DetrImageProcessor'>
```

However, no method can be called on that object:

```python
>>> from transformers.utils.import_utils import BACKENDS_MAPPING, DummyObject
>>> _torchvision_backend = BACKENDS_MAPPING["torchvision"]
>>> BACKENDS_MAPPING["torchvision"] = (lambda: False, _torchvision_backend[1].lstrip("\n"))
>>> DetrImageProcessor = DummyObject("DetrImageProcessor", (), {"_backends": ["torchvision"]})
>>> DetrImageProcessor.from_pretrained()
ImportError:
DetrImageProcessor requires the Torchvision library but it was not found in your environment. Check out the instructions on the
Traceback (most recent call last):
...
ImportError: DetrImageProcessor requires the Torchvision library but it was not found in your environment. Check out the instructions on the
installation page: https://pytorch.org/get-started/locally/ and follow the ones that match your environment.
Please note that you may need to restart your runtime after installation.
>>> BACKENDS_MAPPING["torchvision"] = _torchvision_backend
```

Let's see how to specify specific object dependencies.
1 change: 1 addition & 0 deletions docs/source/en/main_classes/pipelines.md
@@ -34,6 +34,7 @@ pipeline but can provide additional quality of life.
Simple call on one item:

```python
>>> from transformers import pipeline
>>> pipe = pipeline("text-classification")
>>> pipe("This restaurant is awesome")
[{'label': 'POSITIVE', 'score': 0.9998743534088135}]
42 changes: 41 additions & 1 deletion docs/source/en/model_doc/pe_audio_video.md
@@ -26,7 +26,47 @@ TODO
### Basic usage

```py
TODO

model = PeAudioVideoModel.from_pretrained("facebook/pe-av-large", device_map="cuda", dtype=torch.bfloat16)
processor = PeAudioVideoProcessor.from_pretrained("facebook/pe-av-large")

from huggingface_hub import hf_hub_download

video_path = hf_hub_download(
repo_id="eustlb/dummy-video-dataset", filename="audiobox.mp4", repo_type="dataset"
)

video_path2 = hf_hub_download(
repo_id="eustlb/dummy-video-dataset", filename="glass_breaking.mp4", repo_type="dataset"
)

audio_path = hf_hub_download(
repo_id="eustlb/dummy-video-dataset", filename="audiobox.mp4", repo_type="dataset"
)

audio_path2 = hf_hub_download(
repo_id="eustlb/dummy-video-dataset", filename="glass_breaking.mp4", repo_type="dataset"
)

video_files = [video_path, video_path2]
descriptions = ["A woman and a man speaking", "A glass breaking"]
audio_files = [audio_path, audio_path2]

inputs = processor(
videos=video_files, text=descriptions, audio=audio_files, return_tensors="pt", padding=True
)

with torch.inference_mode(), torch.autocast(model.device.type, dtype=torch.bfloat16):
outputs = model(**inputs.to(model.device, dtype=model.dtype))

audio_embeds = outputs.audio_embeds # Audio-only embeddings
video_embeds = outputs.video_embeds # Video-only embeddings
audio_video_embeds = outputs.audio_video_embeds # Joint audio-video embeddings
text_audio_embeds = outputs.text_audio_embeds # Text embeddings aligned to audio
text_video_embeds = outputs.text_video_embeds # Text embeddings aligned to video
text_audio_video_embeds = outputs.text_audio_video_embeds # Text embeddings aligned to audio-video
audio_plus_text_embeds = outputs.audio_plus_text_embeds # Joint audio and text embedding
video_plus_text_embeds = outputs.video_plus_text_embeds # Joint video and text embedding
```

## PeAudioVideoProcessor
9 changes: 7 additions & 2 deletions docs/source/en/model_doc/qwen3_5.md
@@ -70,14 +70,19 @@ TODO
[[autodoc]] Qwen3_5ForCausalLM
- forward

## Qwen3_5ForConditionalGeneration

[[autodoc]] Qwen3_5ForConditionalGeneration
- forward

## Qwen3_5ForSequenceClassification

[[autodoc]] Qwen3_5ForSequenceClassification
- forward

## Qwen3_5ForConditionalGeneration
## Qwen3_5TextForSequenceClassification

[[autodoc]] Qwen3_5ForConditionalGeneration
[[autodoc]] Qwen3_5TextForSequenceClassification
- forward

## Qwen3_5Tokenizer
3 changes: 1 addition & 2 deletions docs/source/en/tasks/zero_shot_object_detection.md
@@ -168,8 +168,7 @@ boxes have the correct coordinates relative to the original image:
... outputs = model(**inputs)

>>> results = processor.post_process_grounded_object_detection(
... outputs, threshold=0.50, target_sizes=[(image.height, image.width)], text_labels=text_labels,
... )[0]
... outputs, threshold=0.50, target_sizes=[(image.height, image.width)], text_labels=text_labels)[0]

>>> draw = ImageDraw.Draw(image)

10 changes: 4 additions & 6 deletions examples/modular-transformers/modeling_new_task_model.py
@@ -160,10 +160,8 @@ def create_causal_mask_mapping(
# from `forward` call. If users run a `forward` call, we have no option to infer `is_first_iteration` because users may be
# running generation with custom loop. Thus we need to infer it in a `non-perfect` way
# NOTE: Determining prefill in that case requires checking data values, which is not compile-compatible.
is_first_iteration = (
is_first_iteration
if is_first_iteration
else (past_key_values is None or not past_key_values.is_initialized or pixel_values is not None)
is_first_iteration = is_first_iteration or (
past_key_values is None or not past_key_values.is_initialized or pixel_values is not None
)

if is_first_iteration or not kwargs.get("use_cache", True):
@@ -256,9 +254,9 @@ def get_placeholder_mask(

n_image_tokens = special_image_mask.sum()
n_image_features = image_features.shape[0] * image_features.shape[1]
special_image_mask = special_image_mask.unsqueeze(-1).expand_as(inputs_embeds).to(inputs_embeds.device)
special_image_mask = special_image_mask.unsqueeze(-1).to(inputs_embeds.device)
torch_compilable_check(
inputs_embeds[special_image_mask].numel() == image_features.numel(),
n_image_tokens * inputs_embeds.shape[-1] == image_features.numel(),
f"Image features and image tokens do not match, tokens: {n_image_tokens}, features: {n_image_features}",
)
return special_image_mask
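The first hunk above collapses `x if x else y` into `x or y`; for the boolean flag involved the two forms agree, which a tiny sketch (with hypothetical names) can verify exhaustively:

```python
from itertools import product

def conditional_form(flag: bool, fallback: bool) -> bool:
    # original pattern from the diff: `flag if flag else fallback`
    return flag if flag else fallback

def or_form(flag: bool, fallback: bool) -> bool:
    # simplified pattern from the diff: `flag or fallback`
    return flag or fallback

# Exhaustively check that all four boolean combinations agree
for flag, fallback in product([False, True], repeat=2):
    assert conditional_form(flag, fallback) == or_form(flag, fallback)
```

Note the equivalence holds only because the guard is a boolean; for general truthy values the two forms can differ in which object they return.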
2 changes: 1 addition & 1 deletion examples/pytorch/image-pretraining/run_mim_no_trainer.py
@@ -633,7 +633,7 @@ def preprocess_images(examples):
)

# On TPU, the tie weights in our model have been disconnected, so we need to restore the ties.
if accelerator.distributed_type == DistributedType.TPU:
if accelerator.distributed_type == DistributedType.XLA:
model.tie_weights()

# We need to recalculate our total training steps as the size of the training dataloader may have changed.
Expand Down
12 changes: 8 additions & 4 deletions examples/pytorch/language-modeling/run_clm_no_trainer.py
@@ -553,7 +553,7 @@ def group_texts(examples):
)

# On TPU, the tie weights in our model have been disconnected, so we need to restore the ties.
if accelerator.distributed_type == DistributedType.TPU:
if accelerator.distributed_type == DistributedType.XLA:
model.tie_weights()

# We need to recalculate our total training steps as the size of the training dataloader may have changed.
@@ -627,6 +627,7 @@ def group_texts(examples):
model.train()
if args.with_tracking:
total_loss = 0
total_samples = 0
if args.resume_from_checkpoint and epoch == starting_epoch and resume_step is not None:
# We skip the first `n` batches in the dataloader when resuming from a checkpoint
active_dataloader = accelerator.skip_first_batches(train_dataloader, resume_step)
@@ -638,7 +639,9 @@ def group_texts(examples):
loss = outputs.loss
# We keep track of the loss at each epoch
if args.with_tracking:
total_loss += loss.detach().float()
batch_size = batch["input_ids"].shape[0]
total_loss += loss.detach().float() * batch_size
total_samples += batch_size
accelerator.backward(loss)
optimizer.step()
lr_scheduler.step()
@@ -665,7 +668,8 @@ def group_texts(examples):
outputs = model(**batch)

loss = outputs.loss
losses.append(accelerator.gather_for_metrics(loss.repeat(args.per_device_eval_batch_size)))
batch_size = batch["input_ids"].shape[0]
losses.append(accelerator.gather_for_metrics(loss.repeat(batch_size)))

losses = torch.cat(losses)
try:
@@ -681,7 +685,7 @@ def group_texts(examples):
{
"perplexity": perplexity,
"eval_loss": eval_loss,
"train_loss": total_loss.item() / len(train_dataloader),
"train_loss": total_loss.item() / total_samples,
"epoch": epoch,
"step": completed_steps,
},
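The hunks above weight each batch's loss by its size before averaging, because dividing an unweighted sum of per-batch means by the number of batches over-weights a smaller final batch. A minimal sketch with made-up numbers:

```python
# Two batches: 4 samples with mean loss 2.0, then 1 sample with loss 8.0
batch_losses = [(2.0, 4), (8.0, 1)]  # (mean loss, batch size)

# Old behavior: average of per-batch means, ignoring batch size
naive = sum(loss for loss, _ in batch_losses) / len(batch_losses)

# New behavior: weight each mean by its batch size, divide by total samples
total_loss = sum(loss * n for loss, n in batch_losses)
total_samples = sum(n for _, n in batch_losses)
weighted = total_loss / total_samples

assert naive == 5.0          # over-weights the single-sample batch
assert weighted == 16.0 / 5  # true per-sample mean loss: 3.2
```

The same reasoning motivates `loss.repeat(batch_size)` in the eval loop: repeating by the actual batch size keeps `gather_for_metrics` per-sample even when the last batch is short.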