Add EXAONE-MoE implementations (#43080)
Conversation
Force-pushed 930a3b7 to cf89e66
vasqu left a comment
Leaving some initial comments
- Missing tests but you already say that they will be added
- Our MoE implementation has changed for v5 <-- this is the biggest thing to change IMO, but it comes with nice benefits (fullgraph compile, boosted MoE performance, fp8 support OOB, etc.); rough sketch of the pattern below
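(For context, the v5 pattern stacks all expert weights into 3D tensors so the forward pass is plain tensor ops; a minimal sketch of the idea, with illustrative names and shapes rather than the actual transformers classes:)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BatchedExperts(nn.Module):
    """Illustrative batched-experts MLP: weights for all experts live in 3D tensors."""

    def __init__(self, num_experts: int, hidden: int, intermediate: int):
        super().__init__()
        # (num_experts, hidden, 2 * intermediate) and (num_experts, intermediate, hidden)
        self.gate_up_proj = nn.Parameter(torch.randn(num_experts, hidden, 2 * intermediate) * 0.02)
        self.down_proj = nn.Parameter(torch.randn(num_experts, intermediate, hidden) * 0.02)

    def forward(self, x: torch.Tensor, router_logits: torch.Tensor, top_k: int) -> torch.Tensor:
        # x: (tokens, hidden); router_logits: (tokens, num_experts)
        weights = router_logits.softmax(dim=-1)
        top_w, top_i = weights.topk(top_k, dim=-1)
        # zero out non-selected experts so the compute below stays static-shape
        full_w = torch.zeros_like(weights).scatter(-1, top_i, top_w)
        # dense compute over all experts for clarity; real kernels dispatch only routed tokens
        gate_up = torch.einsum("th,ehi->eti", x, self.gate_up_proj)
        gate, up = gate_up.chunk(2, dim=-1)
        expert_out = torch.einsum("eti,eih->eth", F.silu(gate) * up, self.down_proj)
        # weighted sum over experts; no data-dependent control flow, hence compile-friendly
        return torch.einsum("eth,te->th", expert_out, full_w)
```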
@@ -0,0 +1,200 @@
<!--Copyright 2025 The LG AI Research and The HuggingFace Team. All rights reserved.
- <!--Copyright 2025 The LG AI Research and The HuggingFace Team. All rights reserved.
+ <!--Copyright 2026 The LG AI Research and The HuggingFace Team. All rights reserved.
probably elsewhere as well then, happy new year :D
| ("ernie4_5_vl_moe", "TokenizersBackend" if is_tokenizers_available() else None), | ||
| ("esm", "EsmTokenizer"), | ||
| ("exaone4", "GPT2Tokenizer" if is_tokenizers_available() else None), | ||
| ("exaone_moe", "GPT2Tokenizer" if is_tokenizers_available() else None), |
| ("exaone_moe", "GPT2Tokenizer" if is_tokenizers_available() else None), |
I suspect that you need the tokenizers backend, please see #42894 for more details. Can you double-check?
As a side note, this does not require any changes on the hub repo (we autodetect this). Only if you notice that you indeed need the GPT2 tokenizer will we need to add it to the mapping here.
Sure. I will check whether the tokenizer backend works well with EXAONE MoE (and EXAONE 4 as well).
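(A quick way to sanity-check the auto-detected backend, as a sketch; the repo id is the one used in this PR's tests:)

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("LGAI-EXAONE/K-EXAONE-236B-A23B")
print(type(tok).__name__)  # shows which backend was auto-detected

sample = "안녕하세요, EXAONE MoE!"
ids = tok(sample)["input_ids"]
# roundtrip sanity check; a BPE tokenizer should decode back to the original text
assert tok.decode(ids, skip_special_tokens=True) == sample
```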
Any update here? Can this be removed?
    for i in range(self.num_hidden_layers)
]
- if "sliding_window" in self.layer_types:
+ if "sliding_attention" in self.layer_types:
Oh wow, that's a good catch 😅
self.is_moe_layer = is_moe_layer
if self.is_moe_layer is None:
    self.is_moe_layer = [0] * self.first_k_dense_replace + [1] * (
        self.num_hidden_layers - self.first_k_dense_replace
    )
Similar to attention layers (sliding window, full, etc.), we introduced the same pattern for MoE layers, see `mlp_layer_types` (discussed further down). Can you change it to that logic?
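(A sketch of that logic, using the `mlp_layer_types` naming from the discussion below; defaults are illustrative:)

```python
# Sketch: per-layer MLP types instead of an is_moe_layer 0/1 list,
# mirroring how layer_types works for attention. "dense"/"sparse" per this thread.
if mlp_layer_types is None:
    mlp_layer_types = ["dense"] * self.first_k_dense_replace + ["sparse"] * (
        self.num_hidden_layers - self.first_k_dense_replace
    )
self.mlp_layer_types = mlp_layer_types
```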
| if "sliding_attention" in self.layer_types: | ||
| self.cache_implementation = "hybrid" |
Unsure if we still need this
class ExaoneMoEDecoderLayer(OlmoeDecoderLayer):
    def __init__(self, config: ExaoneMoEConfig, layer_idx: int):
        super().__init__(config, layer_idx)
        self.self_attn = ExaoneMoEAttention(config=config, layer_idx=layer_idx)
Any reason we need this? Should also be inheritable with modular, no?
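(With modular, a bare subclass should be enough since the converter rewrites the referenced class names automatically; a sketch:)

```python
# Sketch: the modular converter swaps OlmoeAttention -> ExaoneMoEAttention inside
# the generated decoder layer, so no explicit __init__ override should be needed.
class ExaoneMoEAttention(OlmoeAttention):
    pass


class ExaoneMoEDecoderLayer(OlmoeDecoderLayer):
    pass
```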
def __init__(self, config: ExaoneMoEConfig, layer_idx: int):
    super().__init__(config, layer_idx)
    self.self_attn = ExaoneMoEAttention(config=config, layer_idx=layer_idx)
    self.mlp = ExaoneMoESparseMoEBlock(config) if config.is_moe_layer[layer_idx] else ExaoneMoEMLP(config)
See my comment about mlp_layer_types (in the config)
| "attentions": ExaoneMoEAttention, | ||
| "router_logits": ExaoneMoESparseMoEBlock, | ||
| } | ||
| _can_compile_fullgraph = False |
See transformers/src/transformers/models/deepseek_v3/modeling_deepseek_v3.py, lines 546 to 548 (at 61d7f8a).
If we get the conversion working, we can compile fullgraph.
class ExaoneMoEForSequenceClassification(Exaone4ForSequenceClassification):
    pass


class ExaoneMoEForTokenClassification(Exaone4ForTokenClassification):
    pass


class ExaoneMoEForQuestionAnswering(Exaone4ForQuestionAnswering):
    pass
Nit: Do we really need this? If we can, I'd like to avoid these
Force-pushed cf89e66 to 2bf942b
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43080&sha=e7d79e
Force-pushed e7d79e8 to dd1754e
@nuxlear just ping me again when it's ready for review
Force-pushed 000ebd5 to 540eba0
@vasqu I think it's ready for review, but
vasqu left a comment
Looks already super clean, just a few small nits + a dummy model for our CI
@@ -0,0 +1,200 @@
<!--Copyright 2025 The LG AI Research and The HuggingFace Team. All rights reserved.
The K-EXAONE model is compatible with both OpenAI and HuggingFace tool calling specifications.
The example below demonstrates tool calling using HuggingFace’s docstring-to-tool-schema utility.

Please check the [example file](examples/example_output_search.txt) for an example of a search agent conversation using K-EXAONE.
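(The utility referenced here is `transformers.utils.get_json_schema`, which builds a tool schema from a function's signature and docstring; a sketch with an illustrative tool:)

```python
from transformers import AutoTokenizer
from transformers.utils import get_json_schema

def get_weather(city: str):
    """
    Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "sunny"  # placeholder body; only the signature/docstring matter for the schema

tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/K-EXAONE-236B-A23B")
messages = [{"role": "user", "content": "What's the weather in Seoul?"}]

# get_json_schema turns the docstring + type hints into an OpenAI-style tool schema
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_json_schema(get_weather)],
    add_generation_prompt=True,
    tokenize=False,
)
```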
| ("ernie4_5_vl_moe", "TokenizersBackend" if is_tokenizers_available() else None), | ||
| ("esm", "EsmTokenizer"), | ||
| ("exaone4", "GPT2Tokenizer" if is_tokenizers_available() else None), | ||
| ("exaone_moe", "GPT2Tokenizer" if is_tokenizers_available() else None), |
|
|
||
| @require_torch | ||
| class ExaoneMoeIntegrationTest(unittest.TestCase): | ||
| TEST_MODEL_ID = "LGAI-EXAONE/K-EXAONE-236B-A23B" |
This will be too big for our CI, can we create a dummy model instead? (up to 24GB VRAM as it's an A10 GPU)
Is it necessary to upload a dummy model to the HF hub?
We don't have a proper model for this, and it feels a bit awkward to upload dummy weights under our official organization.
Would it be okay if I uploaded it under my personal account instead?
Yes sure, I can also move it to our internal testing repo afterwards
https://huggingface.co/nuxlear/EXAONE-MoE-Dummy-7B-A1B
just uploaded, but I need to run more tests with it.
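(For reference, such a dummy checkpoint is typically just randomly initialized weights from a scaled-down config; a sketch, with illustrative field values:)

```python
# Sketch: random-weight checkpoint for CI. The config values are illustrative;
# the real dummy uses a scaled-down version of the K-EXAONE config.
from transformers import ExaoneMoeConfig, ExaoneMoeForCausalLM

config = ExaoneMoeConfig(
    hidden_size=256,
    intermediate_size=512,
    num_hidden_layers=4,
    num_attention_heads=4,
    first_k_dense_replace=1,  # from this PR's config
)
model = ExaoneMoeForCausalLM(config)
model.push_to_hub("nuxlear/EXAONE-MoE-Dummy-7B-A1B")  # requires being logged in to the Hub
```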
@slow
@require_torch_large_accelerator
def test_model_generation_beyond_sliding_window_flash(self):
    EXPECTED_OUTPUT_TOKEN_IDS = [21605, 2711]
    input_ids = [72861, 2711] * 2048
    model = self.get_model()
    input_ids = torch.tensor([input_ids]).to(model.model.embed_tokens.weight.device)

    with torch.no_grad():
        generated_ids = model.generate(input_ids, max_new_tokens=4, temperature=0)
    self.assertEqual(EXPECTED_OUTPUT_TOKEN_IDS, generated_ids[0][-2:].tolist())
Would need to change get_model to pass the implementation? It should load with sdpa currently this way - we can also just rename the test
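(e.g., a sketch of what that could look like; `attn_implementation` is the standard from_pretrained kwarg:)

```python
# Sketch: forward the attention backend so the "flash" test actually exercises
# flash attention instead of the default sdpa load.
@classmethod
def get_model(cls, attn_implementation: str = "sdpa"):
    cls.model = ExaoneMoeForCausalLM.from_pretrained(
        cls.TEST_MODEL_ID,
        device_map="auto",
        attn_implementation=attn_implementation,
    )
    return cls.model
```

and in the test itself, `model = self.get_model(attn_implementation="flash_attention_2")`.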
input_ids = input_ids.to(model.model.embed_tokens.weight.device)

with torch.no_grad():
    generated_ids = model.generate(**input_ids, max_new_tokens=20, temperature=0)
- generated_ids = model.generate(**input_ids, max_new_tokens=20, temperature=0)
+ generated_ids = model.generate(**input_ids, max_new_tokens=20, do_sample=False)
nit: just our preferred way to do it
sliding_window_pattern=4,
layer_types=None,
mlp_layer_types=None,
first_k_dense_replace=1,
Ah, missed this: this should be mlp layer types with a list of the types (similar to layer types for attention).
You mean one of 'dense' and 'sparse', right?
You can ping me when it's ready for review

Should I update the test code with a dummy model? I think everything else is ready.

Yes, please 🙏 taking a look in a second then

It seems the current dummy model needs to be updated, so I’ll notify you when it’s ready.
vasqu left a comment
Leaving some small last comments, imo it looks very much ready! Let's clean up the config a tad more and wrap up the integration tests, then we are good to go.
Just ping me again when ready, great work
| ("ernie4_5_vl_moe", "TokenizersBackend" if is_tokenizers_available() else None), | ||
| ("esm", "EsmTokenizer"), | ||
| ("exaone4", "GPT2Tokenizer" if is_tokenizers_available() else None), | ||
| ("exaone_moe", "GPT2Tokenizer" if is_tokenizers_available() else None), |
Any update here? Can this be removed?
@@ -0,0 +1,27 @@
# Copyright 2025 The LG AI Research and The HuggingFace Team. All rights reserved.
- # Copyright 2025 The LG AI Research and The HuggingFace Team. All rights reserved.
+ # Copyright 2026 The LG AI Research and The HuggingFace Team. All rights reserved.
sliding_window_pattern (`str`, *optional*, defaults to 4):
    The pattern to use for sliding window attention. Can be one of:
    - `None`: No sliding window attention is used
    - `int`: Every `sliding_window` layers, use global attention, else use local attention.
    - `str`: A sequence of "L" (local attention) and "G" (global attention) characters that defines the
      attention pattern. The pattern starts from layer 0 and repeats every `sliding_window` layers. The
      final layer always uses global attention regardless of the pattern.
    For instance, sliding_window_pattern="LLLG" same as sliding_window=4, which means:
    - Layer 0, 1, 2: local attention,
    - Layer 3: global attention,
    ...(repeated)
I'd like to avoid this if possible, and just use layer_types directly. We also start to do the same for mlp layers (moe) and it gives more flexibility with other attention flavors (e.g. linear attention (gated delta net)).
I understand, and it would be better to remove them.
However, since these configs (including those below) are often used by other libraries such as llama.cpp, they should remain in the model’s config.json.
If that is acceptable, we have no reason to keep them in the config implementation. 😃
Yea, no worries not super important 👍 would be just the ideal case
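(For illustration, the pattern-to-layer_types mapping is mechanical; a sketch, with the type strings following the `sliding_attention` convention seen earlier:)

```python
# Sketch: derive explicit layer_types from the legacy "LLLG"-style pattern.
def pattern_to_layer_types(pattern: str, num_hidden_layers: int) -> list[str]:
    types = [
        "sliding_attention" if pattern[i % len(pattern)] == "L" else "full_attention"
        for i in range(num_hidden_layers)
    ]
    types[-1] = "full_attention"  # final layer is always global per the docstring above
    return types

# e.g. pattern_to_layer_types("LLLG", 8)
# -> ['sliding_attention']*3 + ['full_attention'] + ['sliding_attention']*3 + ['full_attention']
```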
first_k_dense_replace (`int`, *optional*, defaults to 1):
    Number of dense layers in shallow layers (embed->dense->dense->...->dense->moe->moe...->lm_head).
                                                    \--k dense layers--/
In the same spirit to my comment before, let's remove this and only use mlp layer types directly
from ...configuration_utils import PreTrainedConfig, layer_type_validation


class ExaoneMoeConfig(PreTrainedConfig):
Probably needs to sync with main, I recently made the RoPE mixin explicit for models that use it - can you check?
E.g.
(modular should do it automatically for you, just need to merge with main and apply modular again)

PreTrainedConfig.__init__(
    bos_token_id=bos_token_id, eos_token_id=eos_token_id, tie_word_embeddings=tie_word_embeddings, **kwargs
)
Sorry I commented directly on the config file but should be done here ofc
Also sorry about the CI, it's still flaky here and there but it should be more stable on main
Co-authored-by: Junwon Hwang <nuclear1221@gmail.com> Co-authored-by: Kibong Choi <rlqhd26@naver.com>
Force-pushed d51d5ca to 9948661
I’ve updated the dummy test model and the docstrings.
@ArthurZucker @Rocketknight1 could you kindly review this PR?
Yes!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
run-slow: exaone4, exaone_moe
This comment contains models: ["models/exaone4", "models/exaone_moe"]
CI Results / Commit Info
Model CI Report: ❌ 7 new failed tests from this PR 😭
vasqu left a comment
Some last comments from my side, fixed a few smaller issues (checking with run slow again in a second)
bos_token_id (`int`, *optional*, defaults to 1):
    Beginning of stream token id.
eos_token_id (`int`, *optional*, defaults to 53):
    End of stream token id.
pad_token_id (`int`, *optional*, defaults to 0):
    Padding token id.
Took this from https://huggingface.co/LGAI-EXAONE/K-EXAONE-236B-A23B/blob/main/generation_config.json
A bit confused since the values were different; it would be nice if you could confirm these, or whether it should be the previous values (see 0e1e5bc).
We use 53 as the end-of-turn token, while 2 is used as EOS.
Either can be used as the default value, so you can set it to 53.
    return cls.model

def test_model_logits(self):
Logits don't match on our CI, I think it's a GPU diff so let me know if I should update them myself
I agree with that. It looks like you’ll need to update them in your CI environment.
Gotcha, let me update them tomorrow then 👍 (and also copy the repo to our internal testing)
run-slow: exaone4, exaone_moe
This comment contains models: ["models/exaone4", "models/exaone_moe"]
CI Results / Commit Info
Model CI Report: ❌ 1 new failed test from this PR 😭
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, exaone4, exaone_moe
run-slow: exaone_moe
This comment contains models: ["models/exaone_moe"]
What does this PR do?
Add EXAONE-MoE architecture for the K-EXAONE model released by LG AI Research.
This PR adds the modeling code of EXAONE-MoE (K-EXAONE), which is available in the LG AI Research fork:
https://github.com/Aim-Highest/transformers
Test code and documentation will be updated.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker