
Refacto GGUF weight conversion#44794

Draft
ArthurZucker wants to merge 281 commits into main from update-gguf

Conversation


@ArthurZucker ArthurZucker commented Mar 17, 2026

What does this PR do?

Adds support for a more generic path, aligned with the rest of the loading!

| model | PR | main |
|---|---|---|
| "gdax/Qwen1.5-MoE-A2.7B_gguf" | 1min 5s | 1min 18s |

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: ggml

KartikPawade and others added 28 commits March 18, 2026 16:27
* Fix unexpected `position_ids` keys when loading OwlViT models

Older OwlViT checkpoints saved `position_ids` as buffers in the text and
vision embedding modules. These tensors are simple integer ranges and are
now recomputed dynamically during initialization.

This results in `UNEXPECTED` key warnings when loading models such as
`google/owlvit-base-patch32`.

Add the corresponding patterns to `_keys_to_ignore_on_load_unexpected`
to suppress these warnings.
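The mechanism can be sketched as follows. This is a minimal illustration, not the actual loader code: the key names and helper are hypothetical, showing how regex patterns like those added to `_keys_to_ignore_on_load_unexpected` filter the unexpected-key warnings.

```python
import re

# Illustrative patterns mirroring the fix: regexes matched against
# checkpoint keys so stale position_ids buffers no longer warn.
KEYS_TO_IGNORE_ON_LOAD_UNEXPECTED = [
    r"text_model\.embeddings\.position_ids",
    r"vision_model\.embeddings\.position_ids",
]

def filter_unexpected_keys(unexpected_keys):
    """Drop keys matching any ignore pattern, as the loader would."""
    return [
        k for k in unexpected_keys
        if not any(re.search(p, k) for p in KEYS_TO_IGNORE_ON_LOAD_UNEXPECTED)
    ]
```

With this guard, a key such as `owlvit.text_model.embeddings.position_ids` is silently dropped while real mismatches still surface.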

* Fix OwlViT copy consistency for owlv2
* Add GreedyLR adaptive learning rate scheduler

Add GreedyLR, a metric-based scheduler that increases LR on improvement
and decreases on plateau, based on arxiv.org/abs/2512.14527.

- Add GreedyLR class and get_greedy_schedule() to optimization.py
- Add StreamingAverage helper for metric smoothing
- Integrate with Trainer via ReduceLROnPlateau-style metric stepping
- Add GREEDY to SchedulerType enum and TrainingArguments validation
- Add comprehensive tests in tests/optimization/test_greedy_lr.py
- Add example script and documentation
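The core stepping rule can be sketched in a few lines. This is a toy approximation of the idea (grow LR on improvement, shrink on plateau), not the `GreedyLR` implementation itself; the class name and parameters here are illustrative.

```python
class GreedyLRSketch:
    """Toy sketch of a greedy metric-based LR rule: scale the learning
    rate up when the monitored metric improves, down otherwise."""

    def __init__(self, lr, factor=1.1, min_lr=1e-6, max_lr=1.0):
        self.lr = lr
        self.factor = factor
        self.min_lr = min_lr
        self.max_lr = max_lr
        self.best = float("inf")

    def step(self, metric):
        if metric < self.best:
            # Improvement: greedily increase the learning rate.
            self.best = metric
            self.lr = min(self.lr * self.factor, self.max_lr)
        else:
            # Plateau or regression: back the learning rate off.
            self.lr = max(self.lr / self.factor, self.min_lr)
        return self.lr
```

The real scheduler additionally smooths the metric (the `StreamingAverage` helper) before deciding which direction to step.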

* Address review comments: rename examples/greedy-lr to examples/scheduler, delete .gitignore, add trainer integration tests
* feat(ci): added a network debug report

* xdist-aware for parallel runs

* fix fmt

* moved the hooks to tests/utils/test_network_logging.py

* forgot to add the new file

* use plugin approach

* rename env variables

* narrow public API

* fix the env name in circleci
* Added Model Documentation.

* Added conversion_mapping weight renamings

* Added Auto Mappings.

* init

* Modular jina_embeddings_v3

* modular -> modeling + config

* __init__.py

* Created folder for tests

* Added documentation for the jina-embeddings-v3 Model

* Tests

* Update Tests

* Update Tests

* Update modular

* Fix failing test

* scope

* Update modular, Add docstring for adapter_mask

* Testing

* Fix failing test

* Added IntegrationTests

* Updated model doc date

* post_init()

* make style.

* adapter_mask gone

* Better Modular

* Add conversion_mapping

* Modular -> Modeling + Config

* Update model doc

* Update tests

* Small fix

* make fix-repo

* fix _tied_weights_keys

* self.is_causal=False

* Add tie_word_embeddings in configuration class

* small fix in configuration doc-string

* config update

* fix check_docstrings.py

* ruff: Reformat

* Remove extra args from config

* update tests + model doc

* Better, modern modular

* make fix-repo

* Update conversion mapping

* fix dropout

* Better modular

* Update conversion mapping

* Update tests

* Update docs

* Better modular

* Fix license

* Fix date

* Better modular, Configuration

* make fix-repo

* Fix config

* Use autodocstring

* lets use auto

* hmm is it this

* make hf version

* my bad...

* retry whats up with ci

* ci pls

---------

Co-authored-by: vasqu <antonprogamer@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
#44808)

* init

* fix

* add image processor test

* add mobile_rec

* fix

* fix

* fix code style

* add mobile_rec

* fix

* fix toctree

* update

* cleanup inits and docs etc

* dang

* make separate auto model for text recognition

---------

Co-authored-by: vasqu <antonprogamer@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
…enizer class on the hub (#44801)

* deepseek and modernbert

* deepseek v2
Fix formatting of code block in weightconverter.md
* Avoid multiple fix_mistral_regex (KeyError)

* add regression test

* style

* fix

---------

Co-authored-by: Leonardo Emili <lemili@apple.com>
Co-authored-by: vasqu <antonprogamer@gmail.com>
* Stacked commits

* New config

* good logged and silent deletion

* nits

* better config

* review

* Update example script

* Test for new flag

* Better benchmarking of CB

* Rebase fixes

* Avoid deleting a non-existing arg

* No referring to a non-existing var

* Remove useless wrapper

* Fix a bug where padding was not handled

* Update tests

* Trying to solve tests

* Fix tests

* style

* Claude review

* Explanations in cache paged

* grammar

* style2

* Review compliance

* New doc

* Fix docs

* Fix repo consistency
#44782)

fix: pass device to torch.arange in XLNet relative_positional_encoding
When loading tokenizers like vesteinn/ScandiBERT whose tokenizer_config
specifies XLMRobertaTokenizer (model=Unigram) but whose tokenizer.json
contains a dict-type vocab, the expression vocab[0] raises KeyError
because dict keys are strings, not integers. Add an isinstance(vocab,
list) guard so the list-to-tuple conversion is only attempted on list
vocabs.
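The guard can be sketched directly. A minimal illustration, assuming a hypothetical `normalize_vocab` helper: Unigram vocabs arrive as `[piece, score]` lists that need tuple conversion, while dict vocabs (string keys) must pass through untouched, since `vocab[0]` on a dict raises `KeyError`.

```python
def normalize_vocab(vocab):
    """Only list vocabs (Unigram [piece, score] pairs) get the
    list-to-tuple conversion; dict vocabs are returned unchanged."""
    if isinstance(vocab, list):
        return [tuple(entry) for entry in vocab]
    return vocab
```

Usage: `normalize_vocab([["<s>", 0.0]])` yields `[("<s>", 0.0)]`, while a dict vocab like `{"<s>": 0}` is returned as-is instead of crashing.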
* v5-style AFMoE impl

* don't unnecessarily return router logits

* inherit MoE code and refactor for stylistic consistency

* remove pointless type alias

* remove legacy cache reference

* type and lint

---------

Co-authored-by: Wing Lian <wing@axolotl.ai>
* remove from generation

* update tests

* more tests

* fix

* doc

* last changes

* aqlm slipped through

* add bc for remote code models

* anton's review

* add warning
* init refactor

* Fix llava

* changes after review

* update first batch of image processors

* refactor part 2

* improve base image processor class, move backends to separate file

* refactor to have backends in separate files, with backends now inheriting from BaseImageProcessor

* fix docstrings

* update some image processors to new refactored standards

* refactor more image processors

* refactor more image processors

* refactor more fast image processors

* refactor more image processors

* refactor more image processor

* improve compatibility with video processors

* refactor more image processors

* add more image processors, improve compatibility with video processors

* support for modular

* refactor modular image proc

* refactor more modular image processors

* adjustments before merge

* finish image processors refactor

* update docs

* add fallback to Pil backend for backward compat

* fix repo

* Fix all processors and image processors tests

* fix modular and style

* fix docs

* fix remote code backward compatibility + super in lists

* Update docs and add new model like cli

* fix processor tests

* relax test tvp (used to be skipped)

* fix 4 channels oneformer

* Changes after review

* Fixes after review

* Fix tests

* Change imports in modeling tests to minimize integration tests changes

* fix wrong import

* fix import and missing doc

* fix typo PI0

* Fix all integration tests

* Fix after review, enforce protected torch/torchvision imports in pil image processors (directly in modular model converter)

* Fix style

* Fix test modeling depth pro

* Fix processing_idefics

* Fixes after merge

* _rescale_and_normalize -> rescale_and_normalize

* fix-repo
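The backend layout described above can be sketched as a small class hierarchy. All names here are illustrative, not the real transformers API: one base image processor class, with each backend (PIL, torchvision, ...) as a subclass living in its own file.

```python
class BaseImageProcessorSketch:
    """Toy base class standing in for the refactored BaseImageProcessor."""
    backend = None

    def preprocess(self, image):
        raise NotImplementedError

class PilBackendSketch(BaseImageProcessorSketch):
    """Hypothetical PIL backend; real code would resize/rescale/normalize."""
    backend = "pil"

    def preprocess(self, image):
        return {"pixel_values": image, "backend": self.backend}
```

Keeping backends as subclasses of one base is what lets the loader fall back to the PIL backend for backward compatibility, as the commits above describe.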
* enable tp for benchmark

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

* refine code

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

---------

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>
* update eos and q-lora-rank

* oops, wrong name for class
* Propagate the model loading from transformers serve to chat

* Docs and tests

* Apply suggestions from code review

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* logging update

* Adjust docs re Marc's comment

* Remove model name if too long for current console size

* Refactor dual model loading w/ locks

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* init

* fix doc

* update

* update

* update

* update

* update

* update

* update

* update

* refactor image_processor_fast

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* small fixes

* more explicit skip msg

* some quick fixes

* fix

* quick cleanups

* update

* update

* update

* update

* update

* update

* update

* fixup after new refactor

* fix

* update

* update

* last fixups

* update

* remove my todos I left there

---------

Co-authored-by: vasqu <antonprogamer@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* added cache

* added make typing

* use explicit call
* align to other mambas

* oops

* fix
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
…44873)

* Fix Qwen3.5 rope_deltas persistence causing crash in online RL training

* Extend

* Extend
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
tarekziade and others added 27 commits April 13, 2026 08:18
Init with zeros instead of empty in _move_missing_keys_from_meta_to_device
* Fix MoE routers returning probabilities instead of logits

* Propagate modular fix to modeling files via make fix-repo
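The distinction this fixes can be shown in a few lines. A hedged sketch with made-up names, not the modeling code: the router head should return raw logits, and callers that need probabilities apply softmax themselves.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(hidden, weights):
    """Return raw router *logits* (one score per expert) via a simple
    matrix-vector product; no softmax applied here."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in weights]
```

Returning probabilities from the router would double-apply softmax downstream; returning logits keeps the contract consistent across MoE models.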

---------

Co-authored-by: Arthur <arthur.zucker@gmail.com>
* Fix unintended Hub metadata calls from _patch_mistral_regex

* ruff fixes

* pass local files only

* Cache and fail-closed model_info call, add regression tests

- Wrap is_base_mistral with lru_cache so repeated loads of the same repo
  id (notebooks, rollout loops, DDP workers) don't each hit the Hub.
- Swallow any Hub error in model_info — a 5xx/ratelimit/network hiccup
  must not block tokenizer init for non-Mistral models.
- Add regression tests: (a) local_files_only=True never calls
  model_info, (b) a Hub failure does not break _patch_mistral_regex.
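The caching and fail-closed behavior can be sketched as follows. This is an illustration under stated assumptions, not the shipped code: `model_info` here is a stand-in for the Hub call, and the tag check is hypothetical.

```python
from functools import lru_cache

def model_info(repo_id):
    # Stand-in for the Hub call; the real one can raise on 5xx,
    # rate limits, or network failures.
    raise ConnectionError("hub unreachable")

@lru_cache(maxsize=None)
def is_base_mistral(repo_id, local_files_only=False):
    """Cache per repo id; swallow Hub errors so tokenizer init
    never fails for non-Mistral models."""
    if local_files_only:
        return False  # never hit the Hub in offline mode
    try:
        info = model_info(repo_id)
    except Exception:
        return False  # fail closed: a Hub hiccup must not block loading
    return "mistral" in (getattr(info, "tags", None) or [])
```

Because of `lru_cache`, repeated loads of the same repo id (notebooks, rollout loops, DDP workers) resolve from the cache instead of each hitting the Hub.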

---------

Co-authored-by: vasqu <antonprogamer@gmail.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
* suppress warning if int

* remove override, not needed anymore

* also youtu

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
…rmatting) (#45370)

docs: fix docstring errors in Gemma3nTextConfig

Fix five documentation errors in Gemma3nTextConfig docstring:
- Typo: "emebeddings" → "embeddings"
- Incomplete sentence for altup_active_idx (truncated at "or correct")
- Grammar: "should be make" → "should make" in altup_num_inputs
- Grammar: "number of layer" → "number of layers" in num_kv_shared_layers
- Formatting: add missing backticks around type annotations for
  laurel_rank and activation_sparsity_pattern to match HF docstring
  conventions

Both modular_gemma3n.py (source of truth) and the generated
configuration_gemma3n.py are updated in sync.

Built by Rudrendu Paul, developed with Claude Code

Co-authored-by: Rudrendu <RudrenduPaul@users.noreply.github.com>
…44949)

* Fix NotebookProgressCallback to allow evaluate() before and after train

* Add unit test for NotebookProgressCallback evaluating before and after training

* Skip NotebookProgressCallback tests when IPython is not installed

* Display eval metrics when training tracker is None on NotebookProgressCallback

* Add is_ipython_available and require_ipython test decorator

* Filter model_preparation_time metric and add code comments in on_eval
…ock.forward (#45352)

* fix(qwen3_moe): correct return type annotation on Qwen3MoeSparseMoeBlock.forward

* fix: propagate Qwen3MoeSparseMoeBlock forward return type fix to generated vl_moe and omni_moe files

Built by Rudrendu Paul, developed with Claude Code

---------

Co-authored-by: Rudrendu <RudrenduPaul@users.noreply.github.com>
…training (#45329)

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* Apply repo consistency fixes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* changes

* chore: empty commit

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Fix #45305 + add regression test GAS

* Refine test model_accepts_loss_kwargs

* fix style

* Fix properly setup model_accepts_loss_kwargs+True

* Update tests/trainer/test_trainer.py

remove unnecessary parameters

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* fix: simplify error messages, back to a simpler test

* feat: add new test with actual training

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
… generation (#45368)

ProcessorMixin subclasses (e.g. Qwen3VLProcessor) expose the fast tokenizer
at .tokenizer, not ._tokenizer. Use getattr() to handle both ProcessorMixin
and PreTrainedTokenizerFast when extracting the rust tokenizer backend for
DirectStreamer and CBStreamer.

Fixes #45362
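The attribute-lookup fix can be sketched with dummy classes. These class names are illustrative, not the transformers types: the point is that the extraction tries `.tokenizer` (processors) before falling back to `._tokenizer` (fast tokenizers).

```python
class ProcessorLike:
    """Stands in for a ProcessorMixin subclass exposing .tokenizer."""
    def __init__(self, tok):
        self.tokenizer = tok

class FastTokenizerLike:
    """Stands in for PreTrainedTokenizerFast exposing ._tokenizer."""
    def __init__(self, tok):
        self._tokenizer = tok

def extract_backend(obj):
    """Try the processor attribute first, then the fast-tokenizer one."""
    backend = getattr(obj, "tokenizer", None)
    if backend is None:
        backend = getattr(obj, "_tokenizer", None)
    return backend
```

Either wrapper now yields the rust tokenizer backend, so the streamers work with processors and plain fast tokenizers alike.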

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…Error (#45359)

Fixes #45356

Remove `kimi_k25` from `MODELS_WITH_INCORRECT_HUB_TOKENIZER_CLASS` — its
remote `TikTokenTokenizer` is the only correct backend (no `tokenizer.json`,
non-sequential added-token IDs that `TokenizersBackend` cannot reproduce).

Also fix `_patch_mistral_regex`: the method receives the raw
`tokenizers.Tokenizer` object, which has `.pre_tokenizer` directly,
not `.backend_tokenizer.pre_tokenizer`.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rebased onto current main to resolve 274 commits behind
- Added pytest-benchmark to _deps list and dependency_versions_table
- Fixed ruff linting issues (quoted type annotations, formatting)
- Resolved rebase conflicts in core_model_loading.py and modeling_utils.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>