apex amp is deprecated
Num parameters in index.json
* Remove type annotation * remove print statement
* fix * fix * skip-ci (repeated) * update * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* fix 1: not sure * fix 2: _supports_flex_attn = False * fix 3: embedding_output = self.layernorm(query_embeds.to(self.layernorm.weight.dtype)) * fix 4: query_embeds = query_embeds.to(self.layernorm.weight.dtype) * fix 5: text_embeds = text_embeds.to(dtype=torch.float16) * fix 5: question_embeds.to(dtype=torch.float16) * fix 6: text_embeds = text_embeds.to(dtype=self.itm_head.weight.dtype) * fix 7: image_embeds and question_embeds * fix 8: fix other 2 fp16 tests * fix 9: fix T5 OOM * fix 10: fix T5 OOM * fix 11: fix T5 * fix 11: fix T5 beam * fix 12: _supports_sdpa=False * fix 12: style and expect * revert * revert --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* expand the test for VLMs * typo * mark models `supports_flex` + expand test for additional kwargs * flex attn for refactored vision models * fix copies * fix * unskip * style * address comments
* don't use default attn if pre-set in sib-config * style * add a test maybe
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* Update * Update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
…38491) [bugfix] fix apply_rotary_emb error on Ascend NPU
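For context on the fix above: rotary position embeddings rotate each consecutive pair of channels by a position-dependent angle, and the Ascend NPU bug was in the backend kernel, not the math. A minimal pure-Python sketch of the rotation (the function name and pair layout here are illustrative, not the library's actual API):

```python
import math

def apply_rotary_emb(x, pos, theta=10000.0):
    """Rotate consecutive channel pairs of x by position-dependent angles."""
    dim = len(x)
    out = []
    for i in range(0, dim, 2):
        # each channel pair gets its own frequency, decreasing with index
        freq = pos / (theta ** (i / dim))
        c, s = math.cos(freq), math.sin(freq)
        x0, x1 = x[i], x[i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out

# at position 0 every rotation angle is zero, so the input is unchanged
print(apply_rotary_emb([1.0, 0.0, 2.0, 3.0], pos=0))  # [1.0, 0.0, 2.0, 3.0]
```

Because each step is a pure rotation, vector norms are preserved at every position, which is what parity tests across backends typically check.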
* Fix: change to `python3` * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
…e#38553) Update tokenization_utils_base.py Add encoding explicitly
* fix * fix * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* bc * style
fix table
* Fix multiple devices error on Janus * Fix AttributeError on Janus BOI token * Initialize lm first in Janus to get correct device map * Added expectations for Janus test_model_generate_images * Fixed JanusVisionEncoderLayer being split across devices * Code formatting * Adding modeling file * Reverted changes out of scope for this PR
* end-to-end architecture * lightning-attn: refactor, clean, optimize * put minimax_text_01 in other files * use latest __init__ standards and auto-generate modular * support attention_mask for lightning-attn * Revert "use latest __init__ standards and auto-generate modular" This reverts commit d8d3c40. * fix modular conversion * pass both attention masks instead of tuple * formatting * Updated Dynamic Cache * created MiniMaxText01Cache * fix hardcoded slope_rate * update attn_type_list in config * fix lightning when use_cache=False * copy tests from mixtral * (checkpoint) all tests pass for normal attention * fix all unittests * fix import sorting * fix consistency and formatting tests * fix config * update tests, since changes in main * fix seq_len error * create dummy docs * fix checkpoint * add checkpoint in config docstring * run modular_conversion * update docs * fix checkpoint path and update tests * fix ruff * remove repeated expected_slice * update docs * rename "minimax-text-01" to "minimax" * inherit config from mixtral * remove from docs in other languages * undo files that should be untouched * move minimax to end in conversation docs * use MiniMaxForCausalLM as it is * ruff fixes * run modular * fix docstring example in causallm * refactor attention loop and decay factors * refactor config in modular * run modular * refactor cache * rename static_cache to linear_cache * make positional embeddings necessary * remove unnecessary layernorms declarations * fix import in tests * refactor attention in next tokens * remove outdated code * formatting and modular * update tests * rename layernorm alpha/beta factors * register decay factors as buffers * remove unused declarations of decay factors * update config for alpha/beta factors * run modular * remove head_dim in tests * remove minimax from fx.py * remove stuff that is not really needed * update __init__ * update qkv torch.split Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * fix qkv torch.split * 
quality fixes * remove mistakenly added dummy * purge unused ModelTester code * fix-copies * run fix-copies * fix head_dim * write cache formatting tests * remove postnorm * avoid contiguous in attention current states * update expected_slice * add generation test for integration * fix dtype in generation test * update authors * update with changes in main * update gradient checkpointing and minor fixes * fix mutable attn_type_list * rename: attn_type -> layer_type * update for layer_types * update integration tests * update checkpoint * clean overview in docs --------- Co-authored-by: Shakib-IO <shakib.khan17@northsouth.edu> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
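The lightning-attention work above (linear cache, decay factors registered as buffers) replaces the full attention matrix with a decayed running state. A toy single-feature recurrence showing the idea (the decay value and function name are illustrative, not MiniMax's actual kernel):

```python
def linear_attention_step(state, q, k, v, decay):
    """One step of a decayed linear-attention recurrence.

    `state` accumulates k*v contributions, discounted by `decay` each step;
    the output is the query's read of that state.
    """
    state = decay * state + k * v
    return q * state, state

state = 0.0
outputs = []
for q, k, v in [(1.0, 1.0, 2.0), (1.0, 1.0, 3.0)]:
    out, state = linear_attention_step(state, q, k, v, decay=0.5)
    outputs.append(out)
print(outputs)  # step 1: 2.0; step 2: 0.5*2.0 + 3.0 = 4.0
```

The key property is O(1) state per step regardless of sequence length, which is why the per-layer slope/decay rates end up as registered buffers rather than recomputed tensors.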
…#38563) update Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* fix 1 * fix 2 * fix 3 * fix 4 * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* fix * style * check * check 2 * add deepseek workaround
allow custom head_dim Co-authored-by: ryan.agile <ryan.agile@kakaobrain.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
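Allowing a custom head_dim means the per-head width is no longer forced to equal hidden_size // num_attention_heads. A hypothetical config sketch of the fallback logic (class and attribute names are illustrative, not the actual transformers config):

```python
class AttentionConfig:
    def __init__(self, hidden_size, num_attention_heads, head_dim=None):
        self.hidden_size = hidden_size
        self.num_attention_heads = num_attention_heads
        # fall back to the conventional derivation only when head_dim
        # is not given explicitly
        self.head_dim = (
            head_dim if head_dim is not None
            else hidden_size // num_attention_heads
        )

default_cfg = AttentionConfig(hidden_size=4096, num_attention_heads=32)
custom_cfg = AttentionConfig(hidden_size=4096, num_attention_heads=32, head_dim=256)
print(default_cfg.head_dim, custom_cfg.head_dim)  # 128 256
```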
* feat: add `repository` field to benchmarks table * fix: remove unwanted `,`
* Fix: resolve import order and duplicate import (ruff I001, F811) * Format: clean up Dinov2 test file with ruff formatter * Add _no_split_modules = ['Dinov2Layer'] to enable device_map='auto' * Revert dinov2_with_registers _no_split_modules to original state * Remove redundant device_map test as suggested * Remove unused import after deleting test * removed import torch and the redundant test function * Update tests/models/dinov2/test_modeling_dinov2.py --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Fix "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu" error running the following roformer tests on GPUs (CUDA or XPU): ``` tests/models/roformer/test_modeling_roformer.py::RoFormerSinusoidalPositionalEmbeddingTest::test_basic tests/models/roformer/test_modeling_roformer.py::RoFormerSelfAttentionRotaryPositionEmbeddingTest::test_apply_rotary_position_embeddings ``` Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
* Updated BERTweet model card. * Update docs/source/en/model_doc/bertweet.md (applied repeated review suggestions) Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * updated toctree (EN). * Commit for new_gpt_model_card. * Update docs/source/en/model_doc/gpt_neo.md (applied repeated review suggestions) Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Add Arcee model support to transformers - Add ArceeConfig and model mappings for all task types (CausalLM, SequenceClassification, QuestionAnswering, TokenClassification) - Add auto-loading support through AutoModel, AutoConfig, and AutoTokenizer - Use LlamaTokenizer for tokenization - Add FX graph support for Arcee models - Create lazy loading module structure for Arcee * feat: update YARN scaling and RoPE validation for Arcee model * feat: add auto_docstring checkpoint config to Arcee model classes * docs: add pre-trained model weights reference to Arcee configuration files * refactor: move RoPE utilities to dedicated modeling_rope_utils module * Add comprehensive test suite for Arcee model - Add test_modeling_arcee.py following standard transformers test patterns - Include tests for all model variants (CausalLM, SequenceClassification, QuestionAnswering, TokenClassification) - Add specific test for ReLU² activation in ArceeMLP - Add RoPE scaling tests including YARN support - Follow CausalLMModelTest pattern used by similar models * Add documentation for Arcee model - Add comprehensive model documentation with usage examples - Include all model variants in autodoc - Add to table of contents in proper alphabetical order - Fixes documentation coverage for Arcee model classes * Make style/fixup * fix copyright year * Sync modular conversion * revert in legacy supported models in src/transformers/utils/fx * cleaned redundant code in modular_arcee.py * cleaned testing * removed pretraining tp * fix styles * integration testing --------- Co-authored-by: Pranav <veldurthipranav@gmail.com> Co-authored-by: Pranav <56645758+pranav4501@users.noreply.github.com>
* fix modular * Update modular_arcee.py * fix
…xt (huggingface#37506) Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* fix(qwen3moe): skip experts with no workload * avoid tolist and also update other moe models * fix: should squeeze 0-dim only
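The qwen3moe change above avoids dispatching to experts that received no tokens. A pure-Python sketch of the pattern (the routing table and expert functions are toy stand-ins for the real gated experts):

```python
def moe_forward(tokens, token_expert_ids, expert_fns):
    """Apply each token's routed expert, skipping experts with no workload."""
    # group token indices by the expert they were routed to
    buckets = {}
    for tok_idx, exp_idx in enumerate(token_expert_ids):
        buckets.setdefault(exp_idx, []).append(tok_idx)

    out = list(tokens)
    for exp_idx, expert in enumerate(expert_fns):
        tok_ids = buckets.get(exp_idx, [])
        if not tok_ids:  # the optimization: idle experts are never invoked
            continue
        for t in tok_ids:
            out[t] = expert(tokens[t])
    return out

experts = [lambda x: x + 1, lambda x: x * 10, lambda x: x - 5]
# expert 2 receives no tokens, so it is skipped entirely
print(moe_forward([1, 2, 3], [0, 1, 0], experts))  # [2, 20, 4]
```

In the real model the skipped work is a matrix multiply per expert, so with sparse routing the savings can be substantial.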
* sort correctly * Update modeling_minimax.py * Update modular_model_converter.py
…uggingface#38833) * ensure the query is updated during training avoid unused parameters that DDP does not like * avoid a crash when `kwargs` contain `padding=True` trainers often pass this argument automatically * minor * Remove mel_spec lazy init, and rename to mel_filters. this ensures save_pretrained will not crash when saving the processor during training https://github.com/huggingface/transformers/blob/d5d007a1a0f0c11a726a54c8f00bd71825f84d02/src/transformers/feature_extraction_utils.py#L595 * minor - most feature extractors have a `sampling_rate` property
* fix * fix * fix flow * remove non compiling path * change * style * fix * update * update pin * revert
* first draft * cleaner version * update tests + modeling * add tests * init * update test_modeling_common * fix tests * csm Processor draft * conversion update * mimi cache padding convolutions draft * mimi streaming updates * update mimi padding cache test * update cache padding mimi test * make style mimi * updates generate moshi asr * moshi asr integration tests (single + batched) * update tests * update conversion script * good default sliding window value * update generate * update test checkpoint * nit * fix mimi * fix codec prefix * revert * revert * update config * update config * unnecessary mimi input restriction * remove delay in tokens * remove _prepare_4d_causal_attention_mask_with_cache_position and _update_causal_mask * test update * modular update * make style * nit * rename * create codec model generation config at init * remove delay * max_new_tokens/length warning * correct conv1 padding cache import for modular * nit * fix on encoder_past_key_values * convert modular * move frame_size to config * move frame_size to config * update test name * handle first token is bos * better handling of max_new_tokens * fix * fix batch size in test input prep * update docstring * convert modular * make style * make style * add feature extractor * correct modular convention name for feature_extraction file * update conversion script * doc processor * update doc * update init * update model type * fixes * update tests * fix * make * add doc * nit * fix * doc * auto mappings * doc * nit * convert modular * doc * nit * extend _keep_in_fp32_modules to enforce fp32 * renaming to stt * doc update + test update * doc fixes * doc fix * doc fix * fix musicgen tests * fix musicgen tests * make style * fix musicgen tests * correct frame_rate config param for mimi * update mimi test * revert update mimi test * enforce cpu test * move cache init in cache class * convert modular * docstring update * update model id * feature_extractor -> feature_extraction (SEW) * convert modular * update model id
* Fix bugs in DynamicCache * Update * Update * Lint * lint * Rename test * update * update
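For readers unfamiliar with DynamicCache: it stores per-layer key/value states that grow as generation proceeds. A minimal pure-Python sketch of the behavior (not the actual transformers implementation, which concatenates tensors rather than lists):

```python
class DynamicCache:
    """Toy per-layer key/value cache that grows with each decoding step."""

    def __init__(self):
        self.key_cache = []
        self.value_cache = []

    def update(self, keys, values, layer_idx):
        # lazily create slots the first time a layer index is seen
        while len(self.key_cache) <= layer_idx:
            self.key_cache.append([])
            self.value_cache.append([])
        self.key_cache[layer_idx].extend(keys)
        self.value_cache[layer_idx].extend(values)
        return self.key_cache[layer_idx], self.value_cache[layer_idx]

    def get_seq_length(self, layer_idx=0):
        if layer_idx >= len(self.key_cache):
            return 0
        return len(self.key_cache[layer_idx])

cache = DynamicCache()
cache.update(["k0"], ["v0"], layer_idx=0)
cache.update(["k1"], ["v1"], layer_idx=0)
print(cache.get_seq_length())  # 2
```

The lazy slot creation and the empty-cache length query are exactly the kind of edge cases such bugfix PRs tend to touch.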
add ivarflakstad to self-comment-ci.yml
…-processing (huggingface#39002) * ThreadPool instead of Pool for parallel pre-processing * ThreadPool only if hpu available
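Swapping multiprocessing.Pool for ThreadPool keeps the same map-style API while avoiding process spawn and pickling overhead, which matters when workers mostly wait on accelerator I/O (as on HPU). A stdlib sketch of the pattern:

```python
from multiprocessing.pool import ThreadPool

def preprocess(sample):
    # stand-in for a per-sample pre-processing step
    return sample * 2

# ThreadPool shares memory with the caller: no pickling of inputs or
# outputs, and no child-process startup cost
with ThreadPool(processes=4) as pool:
    results = pool.map(preprocess, range(8))
print(results)  # [0, 2, 4, 6, 8, 10, 12, 14]
```

Note the trade-off: threads are subject to the GIL, so this wins for I/O-bound or native-extension workloads but not for pure-Python CPU-bound work.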
huggingface#38954) * Add Hugging Face authentication procedure for IDEs (PyCharm, VS Code, etc.) * Update quicktour.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
…ector_descriptor_dim (huggingface#39021) fix: fix descriptor dimension handling in LightGlue model
* Add zero dim tensor check when using flash_attention Signed-off-by: ranzhejiang <zhejiang.ran@intel.com> * Add zero dim tensor check when using flash_attention Signed-off-by: ranzhejiang <zhejiang.ran@intel.com> --------- Signed-off-by: ranzhejiang <zhejiang.ran@intel.com>
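The check above guards flash-attention calls against empty inputs, which some fused kernels do not handle. A schematic pure-Python version of the guard (the names and fallback are illustrative, not the actual transformers code path):

```python
def attention_with_empty_guard(query, key, value, attention_fn):
    """Skip the fused kernel when either sequence is empty."""
    if len(query) == 0 or len(key) == 0:
        # an empty query or key/value sequence yields an empty output
        return []
    return attention_fn(query, key, value)

def dummy_attention(q, k, v):
    # toy stand-in for the fused attention kernel
    return [sum(v) / len(v)] * len(q)

print(attention_with_empty_guard([], [1], [1], dummy_attention))        # []
print(attention_with_empty_guard([0, 0], [1], [4.0], dummy_attention))  # [4.0, 4.0]
```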
…one and batch size > 1 (huggingface#37332) * Fix graph break in torch.compile when using FA2 with attention_mask=None and batch size > 1 * fix code format * add test; replace position_ids with query_states because position_ids.shape[0] is always 1 * add assert loss is not nan
Remove duplicate code
…uggingface#38880) * don't move the whole video to GPU * add torchcodec * add tests * make style * instructblip as well * consistency * Update src/transformers/utils/import_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/utils/import_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/video_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Drop unnecessary tokens in GPT2Model generation. Co-authored-by: Yi Pan <conlesspan@outlook.com>
* 20250508 Model Architecture * Update modeling_glm4v.py * Update modeling_glm4v.py * Update modeling_glm4v.py * update 1447 * 0526 * update * format * problem * update * update with only image embed diff * Final * upload * update * 1 * upload with ruff * update * update * work * 1 * 1 * update with new note * 2 * Update convert_glm4v_mgt_weights_to_hf.py * Update tokenization_auto.py * update with new format * remove rmsnorm * draft with videos * draft * update * update * fix for review problem * try to remove min_pixel * update * for test * remove timestamps * remove item * update with remove * change * update 2200 * update * Delete app.py * format * update * Update test_video_processing_glm4v.py * 1 * 2 * use new name * Update test_video_processing_glm4v.py * remove docs * change * update for image processors update * 2108 * 2128 * Update modular_glm4v.py * 1 * update some * update * rename * 1 * remove tests output * 2 * add configuration * update * Update test_video_processing_glm4v.py * fix simple forward tests * update with modular * 1 * fix more tests * fix generation test * fix beam search and init * modular changed * fix beam search in case of single-image/video. Fails if multiple visuals per text * update processor * update test * pass * fix beam search * update * param correct * Update convert_glm4v_mgt_weights_to_hf.py * 1 * Update test_modeling_glm4v.py * 4 * 2 * 2123 video process * 2 * revert * 1 * 2 * revert processing * update preprocessor * changed * 1 * update * update * 6 * update * update * update * Delete tmp.txt * config * Update video_processing_glm4v.py * apply modular correctly * move functions * fix order * update the longest_edge * style * simplify a lot * fix random order of classes * skip integration tests * correctly fix the tests * fix TP plan --------- Co-authored-by: raushan <raushan@huggingface.co> Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* Initial submit * Fix bugs: 1. add __init__ file 2. tied word embedding 3. support flash/flex attention 4. model saving and loading * Code refactor: * Rename encdecgemma to t5gemma. * Split attention into self- and cross-attention * Split stack into encoder and decoder * Add test cases * Add auto configuration * Update configurations. * Fix bugs related to copy and attribute checks * Fix type union * Fix merge errors * run ruff format * Run make style and update tests. * Add t5gemma model doc. * ruff and style formatting. * Add missed module config. * Add dummy checkpoint link to pass tests (needs updating when real checkpoints are uploaded). * Update model doc. * Minor updates following Arthur's comments: * replace docstrings with auto_docstrings * remove checkpoint layers * remove deprecate_kwargs * fix rebase errors * Fix docstring issues. * fix t5gemma doc issue. * run ruff format * Updates: * split encoder-only model out * make t5gemmamodel encoder-decoder only * update token and sequence classification * update tests
* add dots1 * address comments * fix * add link to dots1 doc * format --------- Co-authored-by: taishan <rgtjf1@163.com>
* Fix the seamless_m4t cannot work on Gaudi Signed-off-by: yuanwu <yuan.wu@intel.com> * Refine the patch Signed-off-by: yuanwu <yuan.wu@intel.com> * Fix seamless_m4t_v2 crash Signed-off-by: yuanwu <yuan.wu@intel.com> * Use the patched_gather Signed-off-by: yuanwu <yuan.wu@intel.com> * Remove debug logs Signed-off-by: yuanwu <yuan.wu@intel.com> * Remove useless modifications Signed-off-by: yuanwu <yuan.wu@intel.com> * Add hpu check Signed-off-by: yuanwu <yuan.wu@intel.com> * Add comments Signed-off-by: yuanwu <yuan.wu@intel.com> --------- Signed-off-by: yuanwu <yuan.wu@intel.com> Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
* Support `flash_attn_3` Implements fwd and tests for Flash Attention 3 https://github.com/Dao-AILab/flash-attention/commits/main/hopper - Includes checks for dropout>0 and ALiBi in `modeling_utils.PreTrainedModel._check_and_enable_flash_attn_3` (Dropout will likely be supported soon, so this will need to be updated and `modeling_flash_attention_utils._flash_attention_forward` at the `if _IS_FLASH_ATTN_3_AVAILABLE: ...` An example Llama implementation is included in `modeling_llama.py` but other models would still need to be updated Based on huggingface#36190 which has model implementations and examples which could be merged * Add tests for Flash Attention 2 and 3 parity * ci fix * FA2 compatibility - `_prepare_flash_attention_from_position_ids` -> `prepare_fa2_from_position_ids` - Remove bettertransformer check in Flash Attention 3 - Merge tests - Add licensing * ci fix * Test naming consistency * ci fix * Deprecation warning for `prepare_fa2_from_position_ids` * ci fix
What does this PR do?
Fixes # (issue)
Before submitting
* This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
* Did you read the contributor guideline, Pull Request section?
* Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
* Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
* Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.