Add full GGUF loading support for GPT‑OSS (fixes #43366, supersedes #43757) by sirzechs66 · Pull Request #45118 · huggingface/transformers

sirzechs66 · 2026-03-30T13:10:36Z

What does this PR do?

This PR adds full GGUF loading support for GPT‑OSS models (20B/120B). It allows Transformers (and consequently vLLM) to directly load GPT‑OSS GGUF files without falling back to a wrong architecture. The changes include:

Architecture registration in GGUF mappings.
A custom GptOssTensorProcessor to handle MoE expert splitting and gate/up interleaving.
Reconstruction of nested rope_scaling (YaRN) from flat GGUF metadata.
Tests: fast registration test + slow integration test using a real 20B GGUF file.
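
The gate/up interleaving step named in the list above can be illustrated with a minimal sketch. It assumes the GGUF file keeps the gate and up projections of each expert as separate matrices, and that the target layout alternates gate and up columns. The function names `interleave_gate_up` and `fuse_experts` are hypothetical, and plain nested lists stand in for tensors; this is not the PR's GptOssTensorProcessor.

```python
from typing import List

Matrix = List[List[float]]  # rows x columns, a stand-in for a 2-D tensor


def interleave_gate_up(gate: Matrix, up: Matrix) -> Matrix:
    """Return a matrix whose even columns come from `gate` and odd columns from `up`."""
    if len(gate) != len(up):
        raise ValueError("gate and up must have the same number of rows")
    fused: Matrix = []
    for gate_row, up_row in zip(gate, up):
        if len(gate_row) != len(up_row):
            raise ValueError("gate and up rows must have the same width")
        row: List[float] = []
        for g, u in zip(gate_row, up_row):
            row.extend((g, u))  # gate value first, then the matching up value
        fused.append(row)
    return fused


def fuse_experts(gate_experts: List[Matrix], up_experts: List[Matrix]) -> List[Matrix]:
    """Apply the interleaving to each expert of a stacked (per-expert) list."""
    if len(gate_experts) != len(up_experts):
        raise ValueError("gate and up must cover the same number of experts")
    return [interleave_gate_up(g, u) for g, u in zip(gate_experts, up_experts)]
```

Slicing a fused row with `row[::2]` recovers the gate values and `row[1::2]` recovers the up values, which is a quick way to check that a layout conversion of this kind round-trips.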

Fixes #43366, supersedes #43757.
Related vLLM issue: vllm-project/vllm#22353

Code Agent Policy

I confirm that this is not a pure code agent PR.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). – Not applicable, it adds a feature.
Did you read the contributor guideline, Pull Request section? – Yes.
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case. – Issue GGUF model with architecture gpt-oss support #43366, discussion in comments.
Did you make sure to update the documentation with your changes? – yes!
Did you write any new necessary tests? – Yes, in tests/quantization/ggml/test_ggml.py

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Tagging:

@SunMarc (original issue tagger, quantization)
@Cyrilvallez (model loading)
@ArthurZucker (tokenizers, model structure)

sirzechs66 · 2026-03-30T16:44:08Z

@SunMarc, @Cyrilvallez please feel free to take a look. The failing tests I am seeing come from files this PR does not edit.
Thank you.

SunMarc

thanks ! did a first pass

sirzechs66 · 2026-03-31T18:22:23Z

The branch was behind, and it merged without any conflicts. @SunMarc, please review, and let me know if any more changes are required.
Apologies for my naivety.
Happy to contribute.

Added dynamic rope config fetching, because gpt_oss and the current GGUF version carry different metadata (GGUF only has 3 rope configs, while gpt_oss has "factor", "attention_factor", "beta_fast", "beta_slow").
Changed the tests to follow the existing ggml test conventions and removed the incorrect test function.
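
As a rough illustration of the dynamic rope config fetching described above, the sketch below copies whichever rope fields are present in flat, dotted GGUF-style metadata into a nested `rope_scaling` dict. The flat key names, the `gpt-oss` prefix, and the helper `build_rope_scaling` are assumptions made for this example and may not match the keys the PR actually reads.

```python
from typing import Any, Dict, Optional

# Hypothetical mapping from flat metadata suffixes to nested rope_scaling fields.
FLAT_TO_NESTED = {
    "rope.scaling.type": "rope_type",
    "rope.scaling.factor": "factor",
    "rope.scaling.original_context_length": "original_max_position_embeddings",
    "rope.scaling.attention_factor": "attention_factor",
    "rope.scaling.beta_fast": "beta_fast",
    "rope.scaling.beta_slow": "beta_slow",
}


def build_rope_scaling(metadata: Dict[str, Any], arch: str = "gpt-oss") -> Optional[Dict[str, Any]]:
    """Collect the rope fields that exist in `metadata` into one nested dict.

    Fields missing from the file are skipped rather than filled with guesses,
    so the config defaults can still apply to them.
    """
    nested: Dict[str, Any] = {}
    for flat_suffix, nested_key in FLAT_TO_NESTED.items():
        value = metadata.get(f"{arch}.{flat_suffix}")
        if value is not None:
            nested[nested_key] = value
    # Without a scaling type there is nothing meaningful to reconstruct.
    if "rope_type" not in nested:
        return None
    return nested
```

Skipping absent fields, instead of inventing values for them, keeps the reconstruction honest when the file exposes fewer rope entries than the model config expects.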

please refer to the final commit for review

sirzechs66 · 2026-04-05T06:39:39Z

@SunMarc @Cyrilvallez @ArthurZucker Please review; the requested changes have been made.
Thanks.

SunMarc

Thanks !

SunMarc · 2026-04-09T15:20:21Z

@bot /style

github-actions · 2026-04-09T15:20:56Z

Style fix bot fixed some files and pushed the changes.

HuggingFaceDocBuilderDev · 2026-04-15T13:54:59Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sirzechs66 · 2026-04-16T03:42:33Z

@SunMarc please review. There are some unexpected errors in "test_processors" in the CI run, which made the merge fail.
Thank you.

github-actions · 2026-04-18T08:03:31Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, ggml

github-actions · 2026-04-18T08:16:37Z

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45118&sha=ae93f5

sirzechs66 · 2026-04-18T08:18:35Z

@SunMarc please verify these merges, and the related CI errors

sirzechs66 · 2026-04-18T09:00:54Z

@SunMarc this PR is outdated and could pollute the GitHub history, so I am closing it.
New PR: #45506

sirzechs66 force-pushed the fix-transformers-issue branch from 10463a1 to a5eef4f Compare March 30, 2026 16:43

sirzechs66 mentioned this pull request Mar 30, 2026

Add full GGUF loading support for GPT‑OSS (fixes #43366) #45116

Closed

6 tasks

SunMarc reviewed Mar 31, 2026

sirzechs66 requested a review from SunMarc April 2, 2026 11:22

SunMarc approved these changes Apr 9, 2026

SunMarc enabled auto-merge April 15, 2026 11:39

auto-merge was automatically disabled April 15, 2026 12:40
Head branch was pushed to by a user without write access

SunMarc enabled auto-merge April 15, 2026 13:46

SunMarc added this pull request to the merge queue Apr 15, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 15, 2026

SunMarc added this pull request to the merge queue Apr 16, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 16, 2026

sirzechs66 and others added 11 commits April 18, 2026 13:02

Add GPT-OSS GGUF support with YaRN rope scaling reconstruction

fffd5a6

Add GGUF loading test suite for GPT‑OSS

c10bb13

docs: add GGUF loading section to gpt_oss.md

aceb8f0

fix: correct import of GptOssTensorProcessor in test; remove from model __all__

50ed466

Finalize GPT‑OSS GGUF support: move test, adjust config reconstruction

8b00561

Add parse_response to Processor, make it a bit more official (huggingface#45143)

dfbfcb2

* Add parse_response to Processor, make it a bit more official * Make the parse_response annotation a string to avoid torch import issues * Add the same logic to any-to-any

Re-add regex substitutions to the response parsing spec (huggingface#45166)

f348895

* Re-add regex substitutions to the spec * make fix-repo * Update the test schema to drop the empty content block * Trigger tests

RudrenduPaul and others added 23 commits April 18, 2026 13:13

[AMD CI] Fix torch.compile/export failures on AMD CI due to untraceable set.__contains__ (huggingface#45282)

72c43ea

* fix torch.compile/export failures on amd * test * move imports

Fix apply_chat_template crash on tool_call messages without content (huggingface#45348)

f9a970f

* Make content truly optional * style + test * improve test

Fix the response schema for the gemma4 converter (huggingface#45411)

8039d33

[fix] PEFT integration fixes preventing save/load & integration (huggingface#45428)

2ba0248

* PEFT integration fixes preventing save/load & integration * Rerun make style with newer ruff --------- Co-authored-by: Raushan Turganbay <raushan@huggingface.co>

Fix Qwen2.5VL temporal grid positions (huggingface#45400)

88cdf4e

* interesting * oops * test uses better temporal positions now * fix repo * re-unite glm and qwen3-vl * add some fast tests * dummy import * missed another dummy import * move comments around and add more comments

Fix ty for transformers cli (huggingface#45190)

f035445

* Fix ty for transformers cli * ty * await class * style * adress comments * Fix ! * fix

[Doc] Correct checkpoint path in Dinov2 model_docs (huggingface#45430)

48dda87

"docs Dinov2: fix typo in checkpoint path from google/dinov2-base-patch16-224 to facebook/dinov2-base"

Update workflow references to new commit hash (huggingface#45442)

68daf33

fixed docs not closing example bracket

170394a

sirzechs66 force-pushed the fix-transformers-issue branch from ba3ebfc to 170394a Compare April 18, 2026 07:52

Merge branch 'main' into fix-transformers-issue

ae93f50

sirzechs66 closed this Apr 18, 2026
