
Refacto GGUF weight conversion#44794

Draft
ArthurZucker wants to merge 281 commits into main from update-gguf

Conversation


@ArthurZucker ArthurZucker commented Mar 17, 2026

What does this PR do?

Adds support for a more generic path, aligned with the rest of the loading!

| model | PR | main |
|---|---|---|
| "gdax/Qwen1.5-MoE-A2.7B_gguf" | 1min 5s | 1min 18s |

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: ggml

KartikPawade and others added 28 commits March 18, 2026 16:27
* Fix unexpected `position_ids` keys when loading OwlViT models

Older OwlViT checkpoints saved `position_ids` as buffers in the text and
vision embedding modules. These tensors are simple integer ranges and are
now recomputed dynamically during initialization.

This results in `UNEXPECTED` key warnings when loading models such as
`google/owlvit-base-patch32`.

Add the corresponding patterns to `_keys_to_ignore_on_load_unexpected`
to suppress these warnings.
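The mechanism can be sketched as follows. This is a minimal illustration, not the actual loader code: the key names and helper are hypothetical, showing how regex patterns like those added to `_keys_to_ignore_on_load_unexpected` filter the unexpected-key warnings.

```python
import re

# Illustrative patterns mirroring the fix: regexes matched against
# checkpoint keys so stale position_ids buffers no longer warn.
KEYS_TO_IGNORE_ON_LOAD_UNEXPECTED = [
    r"text_model\.embeddings\.position_ids",
    r"vision_model\.embeddings\.position_ids",
]

def filter_unexpected_keys(unexpected_keys):
    """Drop keys matching any ignore pattern, as the loader would."""
    return [
        k for k in unexpected_keys
        if not any(re.search(p, k) for p in KEYS_TO_IGNORE_ON_LOAD_UNEXPECTED)
    ]
```

With this guard, a key such as `owlvit.text_model.embeddings.position_ids` is silently dropped while real mismatches still surface.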

* Fix OwlViT copy consistency for owlv2
* Add GreedyLR adaptive learning rate scheduler

Add GreedyLR, a metric-based scheduler that increases LR on improvement
and decreases on plateau, based on arxiv.org/abs/2512.14527.

- Add GreedyLR class and get_greedy_schedule() to optimization.py
- Add StreamingAverage helper for metric smoothing
- Integrate with Trainer via ReduceLROnPlateau-style metric stepping
- Add GREEDY to SchedulerType enum and TrainingArguments validation
- Add comprehensive tests in tests/optimization/test_greedy_lr.py
- Add example script and documentation
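The core stepping rule can be sketched in a few lines. This is a toy approximation of the idea (grow LR on improvement, shrink on plateau), not the `GreedyLR` implementation itself; the class name and parameters here are illustrative.

```python
class GreedyLRSketch:
    """Toy sketch of a greedy metric-based LR rule: scale the learning
    rate up when the monitored metric improves, down otherwise."""

    def __init__(self, lr, factor=1.1, min_lr=1e-6, max_lr=1.0):
        self.lr = lr
        self.factor = factor
        self.min_lr = min_lr
        self.max_lr = max_lr
        self.best = float("inf")

    def step(self, metric):
        if metric < self.best:
            # Improvement: greedily increase the learning rate.
            self.best = metric
            self.lr = min(self.lr * self.factor, self.max_lr)
        else:
            # Plateau or regression: back the learning rate off.
            self.lr = max(self.lr / self.factor, self.min_lr)
        return self.lr
```

The real scheduler additionally smooths the metric (the `StreamingAverage` helper) before deciding which direction to step.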

* Address review comments: rename examples/greedy-lr to examples/scheduler, delete .gitignore, add trainer integration tests
* feat(ci): added a network debug report

* xdist-aware for parallel runs

* fix fmt

* moved the hooks to tests/utils/test_network_logging.py

* forgot to add the new file

* use plugin approach

* rename env variables

* narrow public API

* fix the env name in circleci
* Added Model Documentation.

* Added conversion_mapping weight renamings

* Added Auto Mappings.

* init

* Modular jina_embeddings_v3

* modular -> modeling + config

* __init__.py

* Created folder for tests

* Added documentation for the jina-embeddings-v3 Model

* Tests

* Update Tests

* Update Tests

* Update modular

* Fix failing test

* scope

* Update modular, Add docstring for adapter_mask

* Testing

* Fix failing test

* Added IntegrationTests

* Updated model doc date

* post_init()

* make style.

* adapter_mask gone

* Better Modular

* Add conversion_mapping

* Modular -> Modeling + Config

* Update model doc

* Update tests

* Small fix

* make fix-repo

* fix _tied_weights_keys

* self.is_causal=False

* Add tie_word_embeddings in configuration class

* small fix in configuration doc-string

* config update

* fix check_docstrings.py

* ruff: Reformat

* Remove extra args from config

* update tests + model doc

* Better, modern modular

* make fix-repo

* Update conversion mapping

* fix dropout

* Better modular

* Update conversion mapping

* Update tests

* Update docs

* Better modular

* Fix license

* Fix date

* Better modular, Configuration

* make fix-repo

* Fix config

* Use autodocstring

* lets use auto

* hmm is it this

* make hf version

* my bad...

* retry whats up with ci

* ci pls

---------

Co-authored-by: vasqu <antonprogamer@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
#44808)

* init

* fix

* add image processor test

* add mobile_rec

* fix

* fix

* fix code style

* add mobile_rec

* fix

* fix toctree

* update

* cleanup inits and docs etc

* dang

* make separate auto model for text recognition

---------

Co-authored-by: vasqu <antonprogamer@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
…enizer class on the hub (#44801)

* deepseek and modernbert

* deepseek v2
Fix formatting of code block in weightconverter.md
* Avoid multiple fix_mistral_regex (KeyError)

* add regression test

* style

* fix

---------

Co-authored-by: Leonardo Emili <lemili@apple.com>
Co-authored-by: vasqu <antonprogamer@gmail.com>
* Stacked commits

* New config

* good logged and silent deletion

* nits

* better config

* review

* Update example script

* Test for new flag

* Better benchmarking of CB

* Rebase fixes

* Avoid deleting a non-existing arg

* No referring to a non-existing var

* Remove useless wrapper

* Fix a bug where padding was not handled

* Update tests

* Trying to solve tests

* Fix tests

* style

* Claude review

* Explanations in cache paged

* grammar

* style2

* Review compliance

* New doc

* Fix docs

* Fix repo consistency
#44782)

fix: pass device to torch.arange in XLNet relative_positional_encoding
When loading tokenizers like vesteinn/ScandiBERT whose tokenizer_config
specifies XLMRobertaTokenizer (model=Unigram) but whose tokenizer.json
contains a dict-type vocab, the expression vocab[0] raises KeyError
because dict keys are strings, not integers. Add an isinstance(vocab,
list) guard so the list-to-tuple conversion is only attempted on list
vocabs.
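The guard can be sketched directly. A minimal illustration, assuming a hypothetical `normalize_vocab` helper: Unigram vocabs arrive as `[piece, score]` lists that need tuple conversion, while dict vocabs (string keys) must pass through untouched, since `vocab[0]` on a dict raises `KeyError`.

```python
def normalize_vocab(vocab):
    """Only list vocabs (Unigram [piece, score] pairs) get the
    list-to-tuple conversion; dict vocabs are returned unchanged."""
    if isinstance(vocab, list):
        return [tuple(entry) for entry in vocab]
    return vocab
```

Usage: `normalize_vocab([["<s>", 0.0]])` yields `[("<s>", 0.0)]`, while a dict vocab like `{"<s>": 0}` is returned as-is instead of crashing.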
* v5-style AFMoE impl

* don't unnecessarily return router logits

* inherit MoE code and refactor for stylistic consistency

* remove pointless type alias

* remove legacy cache reference

* type and lint

---------

Co-authored-by: Wing Lian <wing@axolotl.ai>
* remove from generation

* update tests

* more tests

* fix

* doc

* last changes

* aqlm slipped through

* add bc for remote code models

* anton's review

* add warning
* init refactor

* Fix llava

* changes after review

* update first batch of image processors

* refactor part 2

* improve base image processor class, move backends to separate file

* refactor to have backends in separate files, with backends now inheriting from BaseImageProcessor

* fix docstrings

* update some image processors to new refactored standards

* refactor more image processors

* refactor more image processors

* refactor more fast image processors

* refactor more image processors

* refactor more image processor

* improve compatibility with video processors

* refactor more image processors

* add more image processors, improve compatibility with video processors

* support for modular

* refactor modular image proc

* refactor more modular image processors

* adjustments before merge

* finish image processors refactor

* update docs

* add fallback to Pil backend for backward compat

* fix repo

* Fix all processors and image processors tests

* fix modular and style

* fix docs

* fix remote code backward compatibility + super in lists

* Update docs and add new model like cli

* fix processor tests

* relax test tvp (used to be skipped)

* fix 4 channels oneformer

* Changes after review

* Fixes after review

* Fix tests

* Change imports in modeling tests to minimize integration tests changes

* fix wrong import

* fix import and missing doc

* fix typo PI0

* Fix all integration tests

* Fix after review, enforce protected torch/torchvision imports in pil image processors (directly in modular model converter)

* Fix style

* Fix test modeling depth pro

* Fix processing_idefics

* Fixes after merge

* _rescale_and_normalize -> rescale_and_normalize

* fix-repo
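The backend layout described above can be sketched as a small class hierarchy. All names here are illustrative, not the real transformers API: one base image processor class, with each backend (PIL, torchvision, ...) as a subclass living in its own file.

```python
class BaseImageProcessorSketch:
    """Toy base class standing in for the refactored BaseImageProcessor."""
    backend = None

    def preprocess(self, image):
        raise NotImplementedError

class PilBackendSketch(BaseImageProcessorSketch):
    """Hypothetical PIL backend; real code would resize/rescale/normalize."""
    backend = "pil"

    def preprocess(self, image):
        return {"pixel_values": image, "backend": self.backend}
```

Keeping backends as subclasses of one base is what lets the loader fall back to the PIL backend for backward compatibility, as the commits above describe.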
* enable tp for benchmark

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

* refine code

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

---------

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>
* update eos and q-lora-rank

* oops, wrong name for class
* Propagate the model loading from transformers serve to chat

* Docs and tests

* Apply suggestions from code review

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* logging update

* Adjust docs re Marc's comment

* Remove model name if too long for current console size

* Refactor dual model loading w/ locks

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* init

* fix doc

* update

* update

* update

* update

* update

* update

* update

* update

* refactor image_processor_fast

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* small fixes

* more explicit skip msg

* some quick fixes

* fix

* quick cleanups

* update

* update

* update

* update

* update

* update

* update

* fixup after new refactor

* fix

* update

* update

* last fixups

* update

* remove my todos I left there

---------

Co-authored-by: vasqu <antonprogamer@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* added cache

* added make typing

* use explicit call
* align to other mambas

* oops

* fix
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
…44873)

* Fix Qwen3.5 rope_deltas persistence causing crash in online RL training

* Extend

* Extend
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
tarekziade and others added 27 commits April 13, 2026 08:18
Init with zeros instead of empty in _move_missing_keys_from_meta_to_device
* Fix MoE routers returning probabilities instead of logits

* Propagate modular fix to modeling files via make fix-repo
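The distinction this fixes can be shown in a few lines. A hedged sketch with made-up names, not the modeling code: the router head should return raw logits, and callers that need probabilities apply softmax themselves.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(hidden, weights):
    """Return raw router *logits* (one score per expert) via a simple
    matrix-vector product; no softmax applied here."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in weights]
```

Returning probabilities from the router would double-apply softmax downstream; returning logits keeps the contract consistent across MoE models.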

---------

Co-authored-by: Arthur <arthur.zucker@gmail.com>
* Fix unintended Hub metadata calls from _patch_mistral_regex

* ruff fixes

* pass local files only

* Cache and fail-closed model_info call, add regression tests

- Wrap is_base_mistral with lru_cache so repeated loads of the same repo
  id (notebooks, rollout loops, DDP workers) don't each hit the Hub.
- Swallow any Hub error in model_info — a 5xx/ratelimit/network hiccup
  must not block tokenizer init for non-Mistral models.
- Add regression tests: (a) local_files_only=True never calls
  model_info, (b) a Hub failure does not break _patch_mistral_regex.
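The caching and fail-closed behavior can be sketched as follows. This is an illustration under stated assumptions, not the shipped code: `model_info` here is a stand-in for the Hub call, and the tag check is hypothetical.

```python
from functools import lru_cache

def model_info(repo_id):
    # Stand-in for the Hub call; the real one can raise on 5xx,
    # rate limits, or network failures.
    raise ConnectionError("hub unreachable")

@lru_cache(maxsize=None)
def is_base_mistral(repo_id, local_files_only=False):
    """Cache per repo id; swallow Hub errors so tokenizer init
    never fails for non-Mistral models."""
    if local_files_only:
        return False  # never hit the Hub in offline mode
    try:
        info = model_info(repo_id)
    except Exception:
        return False  # fail closed: a Hub hiccup must not block loading
    return "mistral" in (getattr(info, "tags", None) or [])
```

Because of `lru_cache`, repeated loads of the same repo id (notebooks, rollout loops, DDP workers) resolve from the cache instead of each hitting the Hub.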

---------

Co-authored-by: vasqu <antonprogamer@gmail.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
* suppress warning if int

* remove override, not needed anymore

* also youtu

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
…rmatting) (#45370)

docs: fix docstring errors in Gemma3nTextConfig

Fix five documentation errors in Gemma3nTextConfig docstring:
- Typo: "emebeddings" → "embeddings"
- Incomplete sentence for altup_active_idx (truncated at "or correct")
- Grammar: "should be make" → "should make" in altup_num_inputs
- Grammar: "number of layer" → "number of layers" in num_kv_shared_layers
- Formatting: add missing backticks around type annotations for
  laurel_rank and activation_sparsity_pattern to match HF docstring
  conventions

Both modular_gemma3n.py (source of truth) and the generated
configuration_gemma3n.py are updated in sync.

Built by Rudrendu Paul, developed with Claude Code

Co-authored-by: Rudrendu <RudrenduPaul@users.noreply.github.com>
…44949)

* Fix NotebookProgressCallback to allow evaluate() before and after train

* Add unit test for NotebookProgressCallback evaluating before and after training

* Skip NotebookProgressCallback tests when IPython is not installed

* Display eval metrics when training tracker is None on NotebookProgressCallback

* Add is_ipython_available and require_ipython test decorator

* Filter model_preparation_time metric and add code comments in on_eval
…ock.forward (#45352)

* fix(qwen3_moe): correct return type annotation on Qwen3MoeSparseMoeBlock.forward

* fix: propagate Qwen3MoeSparseMoeBlock forward return type fix to generated vl_moe and omni_moe files

Built by Rudrendu Paul, developed with Claude Code

---------

Co-authored-by: Rudrendu <RudrenduPaul@users.noreply.github.com>
…training (#45329)

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* Apply repo consistency fixes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* chore: empty commit

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Fix #45305 + add regression test GAS

* Refine test model_accepts_loss_kwargs

* fix style

* Fix properly setup model_accepts_loss_kwargs+True

* Update tests/trainer/test_trainer.py

remove unnecessary parameters

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* fix: simplify error messages, back to a simpler test

* feat: add new test with actual training

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
… generation (#45368)

ProcessorMixin subclasses (e.g. Qwen3VLProcessor) expose the fast tokenizer
at .tokenizer, not ._tokenizer. Use getattr() to handle both ProcessorMixin
and PreTrainedTokenizerFast when extracting the rust tokenizer backend for
DirectStreamer and CBStreamer.

Fixes #45362
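The attribute-lookup fix can be sketched with dummy classes. These class names are illustrative, not the transformers types: the point is that the extraction tries `.tokenizer` (processors) before falling back to `._tokenizer` (fast tokenizers).

```python
class ProcessorLike:
    """Stands in for a ProcessorMixin subclass exposing .tokenizer."""
    def __init__(self, tok):
        self.tokenizer = tok

class FastTokenizerLike:
    """Stands in for PreTrainedTokenizerFast exposing ._tokenizer."""
    def __init__(self, tok):
        self._tokenizer = tok

def extract_backend(obj):
    """Try the processor attribute first, then the fast-tokenizer one."""
    backend = getattr(obj, "tokenizer", None)
    if backend is None:
        backend = getattr(obj, "_tokenizer", None)
    return backend
```

Either wrapper now yields the rust tokenizer backend, so the streamers work with processors and plain fast tokenizers alike.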

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…Error (#45359)

Fixes #45356

Remove `kimi_k25` from `MODELS_WITH_INCORRECT_HUB_TOKENIZER_CLASS` — its
remote `TikTokenTokenizer` is the only correct backend (no `tokenizer.json`,
non-sequential added-token IDs that `TokenizersBackend` cannot reproduce).

Also fix `_patch_mistral_regex`: the method receives the raw
`tokenizers.Tokenizer` object, which has `.pre_tokenizer` directly,
not `.backend_tokenizer.pre_tokenizer`.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rebased onto current main to resolve 274 commits behind
- Added pytest-benchmark to _deps list and dependency_versions_table
- Fixed ruff linting issues (quoted type annotations, formatting)
- Resolved rebase conflicts in core_model_loading.py and modeling_utils.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>