
fix(ernie4_5_vl_moe): resolve three config loading failures for ERNIE-4.5-VL MoE models #45275

Closed

avarga1 wants to merge 5 commits into huggingface:main from avarga1:fix/ernie4-5-vl-moe-config-loading

Conversation

@avarga1

@avarga1 avarga1 commented Apr 7, 2026

Problem

AutoConfig.from_pretrained("baidu/ERNIE-4.5-VL-28B-A3B-Paddle", trust_remote_code=True) raises errors that prevent the model from loading at all. Three separate bugs compound each other:

Bug 1 — model_type mismatch (KeyError on load)

The published checkpoint uses "model_type": "ernie4_5_moe_vl" in its config.json, but the transformers class is registered as "ernie4_5_vl_moe". Since there is no auto_map in the checkpoint's config, the mapping lookup fails with a KeyError, which AutoConfig surfaces as:

ValueError: The checkpoint you are trying to load has model type `ernie4_5_moe_vl`
but Transformers does not recognize this architecture.

Fix: Add "ernie4_5_moe_vl" alias in SPECIAL_MODEL_TYPE_TO_MODULE_NAME (pointing to the ernie4_5_vl_moe module), CONFIG_MAPPING_NAMES, and MODEL_NAMES_MAPPING.
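
Roughly, the new entries look like this (a sketch based on the mapping names in this PR; the exact surrounding entries in configuration_auto.py may differ):

SPECIAL_MODEL_TYPE_TO_MODULE_NAME = {
    "ernie4_5_moe_vl": "ernie4_5_vl_moe",  # checkpoint's model_type -> transformers module name
}

CONFIG_MAPPING_NAMES = {
    "ernie4_5_vl_moe": "Ernie4_5_VLMoeConfig",  # existing entry
    "ernie4_5_moe_vl": "Ernie4_5_VLMoeConfig",  # new alias matching the published checkpoint
}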

Bug 2 — rope_theta validation skipped (silent misconfiguration)

PreTrainedConfig.__init__ only triggered convert_rope_params_to_dict when rope_theta was present in **kwargs. However, Ernie4_5Config.__init__ consumes rope_theta as a named parameter (sets self.rope_theta = 500000) before calling super().__init__(**kwargs) — so rope_theta is never in kwargs. The RoPE standardization branch never fires.

Fix: Also check getattr(self, "rope_theta", None) is not None so the conversion path fires correctly when rope_theta was already set as an instance attribute.
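
A toy illustration of why the kwargs-only check never sees the value (hypothetical classes, not the real transformers code):

class Base:
    def __init__(self, **kwargs):
        # Stand-in for PreTrainedConfig.__init__: only reacts to rope_theta passed via **kwargs.
        if kwargs.get("rope_theta") is not None:
            print("standardizing RoPE params")

class Child(Base):
    def __init__(self, rope_theta=500000, **kwargs):
        self.rope_theta = rope_theta  # consumed as a named parameter...
        super().__init__(**kwargs)    # ...so it never appears in Base's **kwargs

Child()  # prints nothing: the standardization branch is skipped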

Bug 3 — moe_num_experts type too narrow (StrictDataclassFieldValidationError)

Ernie4_5_VLMoeTextConfig declares moe_num_experts: int | None = 64, but the published checkpoint supplies "moe_num_experts": [64, 64] — a per-layer list. The @strict dataclass validator rejects the list.

Fix: Widen the type annotation to int | list[int] | None.
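
Sketched as a standalone dataclass (the real class derives from Ernie4_5_MoeConfig and carries many more fields; only the field touched here is shown):

from dataclasses import dataclass

@dataclass
class Ernie4_5_VLMoeTextConfig:
    # Accept a single expert count or a per-layer list such as [64, 64].
    moe_num_experts: int | list[int] | None = 64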

Verification

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("baidu/ERNIE-4.5-VL-28B-A3B-Paddle", trust_remote_code=True)
# Before: raises ValueError (model_type not recognized)
# After:  Ernie4_5_VLMoeConfig — loads cleanly

assert type(cfg).__name__ == "Ernie4_5_VLMoeConfig"
assert cfg.text_config.rope_parameters["rope_theta"] == 500000
assert cfg.text_config.moe_num_experts == [64, 64]

Related

…L MoE models

Three issues prevented AutoConfig from loading baidu/ERNIE-4.5-VL-28B-A3B-Paddle:

1. model_type mismatch: the published checkpoint uses "ernie4_5_moe_vl" but
   transformers registers the class as "ernie4_5_vl_moe". Add "ernie4_5_moe_vl"
   alias in SPECIAL_MODEL_TYPE_TO_MODULE_NAME, CONFIG_MAPPING_NAMES, and
   MODEL_NAMES_MAPPING so AutoConfig resolves it to Ernie4_5_VLMoeConfig.

2. rope_theta validation failure: PreTrainedConfig.__init__ only triggered
   convert_rope_params_to_dict when rope_theta was present in **kwargs, but
   Ernie4_5Config.__init__ consumes rope_theta as a named parameter before
   calling super().__init__(). Also check getattr(self, "rope_theta", None)
   so the RoPE standardization path fires correctly.

3. moe_num_experts type error: Ernie4_5_VLMoeTextConfig declared the field as
   int | None but the checkpoint supplies a list [64, 64] for per-layer expert
   counts. Widen the type to int | list[int] | None.
@avarga1 avarga1 force-pushed the fix/ernie4-5-vl-moe-config-loading branch from 98ad854 to 1219f83 on April 7, 2026 02:56
@zucchini-nlp
Member

We support the model without trust_remote_code=True though, is there any reason you want to load with custom code?

@avarga1
Author

avarga1 commented Apr 7, 2026

Fair point — trust_remote_code=True was just leftover from my debugging session and shouldn't have been in the verification snippet. The three bugs are all in transformers' native code (auto-mapping alias, PreTrainedConfig rope_theta path, and the config dataclass type annotation) — none of them require remote code to reproduce.

Updated the snippet:

from transformers import AutoConfig
cfg = AutoConfig.from_pretrained("baidu/ERNIE-4.5-VL-28B-A3B-Paddle")
assert type(cfg).__name__ == "Ernie4_5_VLMoeConfig"
assert cfg.text_config.rope_parameters["rope_theta"] == 500000
assert cfg.text_config.moe_num_experts == [64, 64]

Also just pushed a fix for the check_repository_consistency CI failure — the moe_num_experts type override needed to be in the modular source file (modular_ernie4_5_vl_moe.py), not just the generated config.
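
Roughly what that override looks like in the modular source (a sketch; field and class names are taken from this PR's diff, and the import path is assumed):

# modular_ernie4_5_vl_moe.py -- sketch of the override only
from transformers import Ernie4_5_MoeConfig  # assumed top-level export

class Ernie4_5_VLMoeTextConfig(Ernie4_5_MoeConfig):
    # Widened here in the modular source so the auto-generated
    # configuration_ernie4_5_vl_moe.py is regenerated with the same type.
    moe_num_experts: int | list[int] | None = 64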

("ernie", "ErnieConfig"),
("ernie4_5", "Ernie4_5Config"),
("ernie4_5_moe", "Ernie4_5_MoeConfig"),
("ernie4_5_moe_vl", "Ernie4_5_VLMoeConfig"),
Member

@vasqu for this, I remember you were changing model types recently

Contributor

Yea this looks like the only valid change IF we change the config model type of the hub PRs - this would imply that we support 2 model types for one model which is not in our code base, i.e. a dirty workaround.

Imo, we should sync with vLLM support first / change the model type there. But that needs v5 support first, so I'd like to withhold on this PR for now and potentially "fix" on vLLM side instead

Comment thread src/transformers/configuration_utils.py Outdated
Comment on lines +268 to +270
-elif kwargs.get("rope_scaling") and kwargs.get("rope_theta"):
+elif kwargs.get("rope_scaling") and (
+    kwargs.get("rope_theta") or getattr(self, "rope_theta", None) is not None
+):
Member

the rope is set to 500k even without this line. I see that the text config class has a rope_parameters field with default None so we will go by the first if path

@avarga1
Author

avarga1 commented Apr 7, 2026

Good catch — reverted. The rope_theta path is already handled via rope_parameters on the text config, so that change was unnecessary. PR is now just the two remaining fixes: the model_type alias and the moe_num_experts type widening.

@github-actions
Contributor

github-actions Bot commented Apr 7, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, ernie4_5_vl_moe

@avarga1
Author

avarga1 commented Apr 7, 2026

The tests_processors failure (AttributeError: NewTokenizer has no attribute special_attribute_present) appears to be a pre-existing flaky test unrelated to this PR — it's in test_processor_auto.py::AutoFeatureExtractorTest::test_from_pretrained_dynamic_processor and involves dynamic Hub tokenizer registration, which this PR doesn't touch. Happy to rerun if needed.

@github-actions
Contributor

github-actions Bot commented Apr 7, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45275&sha=c55854

@zucchini-nlp
Member

Thanks, let's wait for @vasqu who added the model and has more context on the recent naming changes

Contributor

@vasqu vasqu left a comment

I would like to wait for v5 getting into vLLM and then see how we go about it + adjust the config for the integration there. As of now, it does not make much sense to me to have this merged as we need to rely on revisions either way and these are possible without this

Comment on lines 120 to +152
@@ -149,6 +149,7 @@ class Ernie4_5_VLMoeTextConfig(Ernie4_5_MoeConfig):
 pad_token_id: int | None = None
 eos_token_id: int | list[int] | None = None
 bos_token_id: int | None = None
+moe_num_experts: int | list[int] | None = 64
Contributor

This is still for remote only, no? The transformers version should not have a list for these as they are always the same size

Member

@zucchini-nlp zucchini-nlp Apr 9, 2026

nope, this one is valid for strict-type validation 😓

update: actually no, hf-converted configs won't have list of ints (see https://huggingface.co/baidu/ERNIE-4.5-VL-28B-A3B-PT/discussions/11/files)

Contributor

discussed internally, it's only the remote configs as well

("ernie", "ErnieConfig"),
("ernie4_5", "Ernie4_5Config"),
("ernie4_5_moe", "Ernie4_5_MoeConfig"),
("ernie4_5_moe_vl", "Ernie4_5_VLMoeConfig"),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea this looks like the only valid change IF we change the config model type of the hub PRs - this would imply that we support 2 model types for one model which is not in our code base, i.e. a dirty workaround.

Imo, we should sync with vLLM support first / change the model type there. But that needs v5 support first, so I'd like to withhold on this PR for now and potentially "fix" on vLLM side instead

@avarga1
Author

avarga1 commented Apr 9, 2026

Thanks, that makes sense.

I agree that if this only applies to remote configs and the cleaner fix belongs on the vLLM / hub side after v5 support lands, then it probably shouldn't be forced into core Transformers right now.

I opened this mainly because the current loading path exposed a few compounding mismatches, but I'm happy to defer if the right place to resolve them is upstream in the integration flow instead of here.

If helpful, I can narrow this PR to only the change(s) that are still considered valid, or close it and revisit once the vLLM side is aligned.

@vasqu
Contributor

vasqu commented Apr 9, 2026

Hey @avarga1, is there anything needed from this PR then?

Everything should work fine as long as you don't use trust_remote_code=True and pass the correct revision (see the docs for usage examples). Let us know if there is indeed something breaking. It is indeed smarter to wait for now imo

@avarga1
Author

avarga1 commented Apr 9, 2026

Makes sense — I ran into this while integrating a model I'm training locally and trust_remote_code was required for it to load. Happy to keep that workaround on my end for now and revisit properly post-v5. Closing.
