Add EXAONE-MoE implementations (#43080)
Conversation
Force-pushed 930a3b7 to cf89e66
vasqu left a comment
Leaving some initial comments
- Missing tests but you already say that they will be added
- Our MoE implementation has changed for v5 <-- this is the biggest thing to change IMO, but it comes with nice benefits (fullgraph compile, boosted MoE performance, fp8 support OOB, etc.); rough sketch of the pattern below
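(For context, the v5 pattern stacks all expert weights into 3D tensors so the forward pass is plain tensor ops; a minimal sketch of the idea, with illustrative names and shapes rather than the actual transformers classes:)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BatchedExperts(nn.Module):
    """Illustrative batched-experts MLP: weights for all experts live in 3D tensors."""

    def __init__(self, num_experts: int, hidden: int, intermediate: int):
        super().__init__()
        # (num_experts, hidden, 2 * intermediate) and (num_experts, intermediate, hidden)
        self.gate_up_proj = nn.Parameter(torch.randn(num_experts, hidden, 2 * intermediate) * 0.02)
        self.down_proj = nn.Parameter(torch.randn(num_experts, intermediate, hidden) * 0.02)

    def forward(self, x: torch.Tensor, router_logits: torch.Tensor, top_k: int) -> torch.Tensor:
        # x: (tokens, hidden); router_logits: (tokens, num_experts)
        weights = router_logits.softmax(dim=-1)
        top_w, top_i = weights.topk(top_k, dim=-1)
        # zero out non-selected experts so the compute below stays static-shape
        full_w = torch.zeros_like(weights).scatter(-1, top_i, top_w)
        # dense compute over all experts for clarity; real kernels dispatch only routed tokens
        gate_up = torch.einsum("th,ehi->eti", x, self.gate_up_proj)
        gate, up = gate_up.chunk(2, dim=-1)
        expert_out = torch.einsum("eti,eih->eth", F.silu(gate) * up, self.down_proj)
        # weighted sum over experts; no data-dependent control flow, hence compile-friendly
        return torch.einsum("eth,te->th", expert_out, full_w)
```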
@@ -0,0 +1,200 @@
<!--Copyright 2025 The LG AI Research and The HuggingFace Team. All rights reserved.
- <!--Copyright 2025 The LG AI Research and The HuggingFace Team. All rights reserved.
+ <!--Copyright 2026 The LG AI Research and The HuggingFace Team. All rights reserved.
probably elsewhere as well then, happy new year :D
| ("ernie4_5_vl_moe", "TokenizersBackend" if is_tokenizers_available() else None), | ||
| ("esm", "EsmTokenizer"), | ||
| ("exaone4", "GPT2Tokenizer" if is_tokenizers_available() else None), | ||
| ("exaone_moe", "GPT2Tokenizer" if is_tokenizers_available() else None), |
| ("exaone_moe", "GPT2Tokenizer" if is_tokenizers_available() else None), |
I suspect that you need the tokenizers backend, please see #42894 for more details. Can you double-check?
As a side note, this does not require any changes on the hub repo (we autodetect this). Only if you notice that you indeed need the GPT2 tokenizer will we need to add it to the mapping here.
Sure. I will check whether the tokenizer backend works well with EXAONE MoE (and EXAONE 4 as well).
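(A quick way to sanity-check the auto-detected backend, as a sketch; the repo id is the one used in this PR's tests:)

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("LGAI-EXAONE/K-EXAONE-236B-A23B")
print(type(tok).__name__)  # shows which backend was auto-detected

sample = "안녕하세요, EXAONE MoE!"
ids = tok(sample)["input_ids"]
# roundtrip sanity check; a BPE tokenizer should decode back to the original text
assert tok.decode(ids, skip_special_tokens=True) == sample
```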
Any update here? Can this be removed?
    for i in range(self.num_hidden_layers)
]
- if "sliding_window" in self.layer_types:
+ if "sliding_attention" in self.layer_types:
Oh wow, that's a good catch 😅
self.is_moe_layer = is_moe_layer
if self.is_moe_layer is None:
    self.is_moe_layer = [0] * self.first_k_dense_replace + [1] * (
        self.num_hidden_layers - self.first_k_dense_replace
    )
Similar to attention layers (sliding window, full, etc.), we introduced the same pattern for MoE layers, see `mlp_layer_types` (discussed further down). Can you change it to that logic?
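(A sketch of that logic, using the `mlp_layer_types` naming from the discussion below; defaults are illustrative:)

```python
# Sketch: per-layer MLP types instead of an is_moe_layer 0/1 list,
# mirroring how layer_types works for attention. "dense"/"sparse" per this thread.
if mlp_layer_types is None:
    mlp_layer_types = ["dense"] * self.first_k_dense_replace + ["sparse"] * (
        self.num_hidden_layers - self.first_k_dense_replace
    )
self.mlp_layer_types = mlp_layer_types
```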
| if "sliding_attention" in self.layer_types: | ||
| self.cache_implementation = "hybrid" |
Unsure if we still need this
class ExaoneMoEDecoderLayer(OlmoeDecoderLayer):
    def __init__(self, config: ExaoneMoEConfig, layer_idx: int):
        super().__init__(config, layer_idx)
        self.self_attn = ExaoneMoEAttention(config=config, layer_idx=layer_idx)
Any reason we need this? Should also be inheritable with modular, no?
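(With modular, a bare subclass should be enough since the converter rewrites the referenced class names automatically; a sketch:)

```python
# Sketch: the modular converter swaps OlmoeAttention -> ExaoneMoEAttention inside
# the generated decoder layer, so no explicit __init__ override should be needed.
class ExaoneMoEAttention(OlmoeAttention):
    pass


class ExaoneMoEDecoderLayer(OlmoeDecoderLayer):
    pass
```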
def __init__(self, config: ExaoneMoEConfig, layer_idx: int):
    super().__init__(config, layer_idx)
    self.self_attn = ExaoneMoEAttention(config=config, layer_idx=layer_idx)
    self.mlp = ExaoneMoESparseMoEBlock(config) if config.is_moe_layer[layer_idx] else ExaoneMoEMLP(config)
See my comment about mlp_layer_types (in the config)
| "attentions": ExaoneMoEAttention, | ||
| "router_logits": ExaoneMoESparseMoEBlock, | ||
| } | ||
| _can_compile_fullgraph = False |
See transformers/src/transformers/models/deepseek_v3/modeling_deepseek_v3.py, lines 546 to 548 (at 61d7f8a).
If we get the conversion working, we can compile fullgraph.
class ExaoneMoEForSequenceClassification(Exaone4ForSequenceClassification):
    pass


class ExaoneMoEForTokenClassification(Exaone4ForTokenClassification):
    pass


class ExaoneMoEForQuestionAnswering(Exaone4ForQuestionAnswering):
    pass
Nit: Do we really need this? If we can, I'd like to avoid these
Force-pushed cf89e66 to 2bf942b
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43080&sha=e7d79e
Force-pushed e7d79e8 to dd1754e
@nuxlear just ping me again when it's ready for review
Force-pushed 000ebd5 to 540eba0
@vasqu I think it's ready for review, but
vasqu left a comment
Looks already super clean, just a few small nits + a dummy model for our CI
@@ -0,0 +1,200 @@
<!--Copyright 2025 The LG AI Research and The HuggingFace Team. All rights reserved.
The K-EXAONE model is compatible with both OpenAI and HuggingFace tool calling specifications.
The example below demonstrates tool calling using HuggingFace’s docstring-to-tool-schema utility.

Please check the [example file](examples/example_output_search.txt) for an example of a search agent conversation using K-EXAONE.
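(The utility referenced here is `transformers.utils.get_json_schema`, which builds a tool schema from a function's signature and docstring; a sketch with an illustrative tool:)

```python
from transformers import AutoTokenizer
from transformers.utils import get_json_schema

def get_weather(city: str):
    """
    Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "sunny"  # placeholder body; only the signature/docstring matter for the schema

tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/K-EXAONE-236B-A23B")
messages = [{"role": "user", "content": "What's the weather in Seoul?"}]

# get_json_schema turns the docstring + type hints into an OpenAI-style tool schema
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_json_schema(get_weather)],
    add_generation_prompt=True,
    tokenize=False,
)
```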
| ("ernie4_5_vl_moe", "TokenizersBackend" if is_tokenizers_available() else None), | ||
| ("esm", "EsmTokenizer"), | ||
| ("exaone4", "GPT2Tokenizer" if is_tokenizers_available() else None), | ||
| ("exaone_moe", "GPT2Tokenizer" if is_tokenizers_available() else None), |
|
|
||
| @require_torch | ||
| class ExaoneMoeIntegrationTest(unittest.TestCase): | ||
| TEST_MODEL_ID = "LGAI-EXAONE/K-EXAONE-236B-A23B" |
This will be too big for our CI, can we create a dummy model instead? (up to 24GB VRAM as it's an A10 GPU)
Is it necessary to upload a dummy model to the HF hub?
We don't have a proper model for this, and it feels a bit awkward to upload dummy weights under our official organization.
Would it be okay if I uploaded it under my personal account instead?
Yes sure, I can also move it to our internal testing repo afterwards
https://huggingface.co/nuxlear/EXAONE-MoE-Dummy-7B-A1B
just uploaded, but I need to run more tests with it.
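(For reference, such a dummy checkpoint is typically just randomly initialized weights from a scaled-down config; a sketch, with illustrative field values:)

```python
# Sketch: random-weight checkpoint for CI. The config values are illustrative;
# the real dummy uses a scaled-down version of the K-EXAONE config.
from transformers import ExaoneMoeConfig, ExaoneMoeForCausalLM

config = ExaoneMoeConfig(
    hidden_size=256,
    intermediate_size=512,
    num_hidden_layers=4,
    num_attention_heads=4,
    first_k_dense_replace=1,  # from this PR's config
)
model = ExaoneMoeForCausalLM(config)
model.push_to_hub("nuxlear/EXAONE-MoE-Dummy-7B-A1B")  # requires being logged in to the Hub
```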
@slow
@require_torch_large_accelerator
def test_model_generation_beyond_sliding_window_flash(self):
    EXPECTED_OUTPUT_TOKEN_IDS = [21605, 2711]
    input_ids = [72861, 2711] * 2048
    model = self.get_model()
    input_ids = torch.tensor([input_ids]).to(model.model.embed_tokens.weight.device)

    with torch.no_grad():
        generated_ids = model.generate(input_ids, max_new_tokens=4, temperature=0)
    self.assertEqual(EXPECTED_OUTPUT_TOKEN_IDS, generated_ids[0][-2:].tolist())
Would need to change get_model to pass the implementation? It should load with sdpa currently this way - we can also just rename the test
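(e.g., a sketch of what that could look like; `attn_implementation` is the standard from_pretrained kwarg:)

```python
# Sketch: forward the attention backend so the "flash" test actually exercises
# flash attention instead of the default sdpa load.
@classmethod
def get_model(cls, attn_implementation: str = "sdpa"):
    cls.model = ExaoneMoeForCausalLM.from_pretrained(
        cls.TEST_MODEL_ID,
        device_map="auto",
        attn_implementation=attn_implementation,
    )
    return cls.model
```

and in the test itself, `model = self.get_model(attn_implementation="flash_attention_2")`.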
input_ids = input_ids.to(model.model.embed_tokens.weight.device)

with torch.no_grad():
    generated_ids = model.generate(**input_ids, max_new_tokens=20, temperature=0)
- generated_ids = model.generate(**input_ids, max_new_tokens=20, temperature=0)
+ generated_ids = model.generate(**input_ids, max_new_tokens=20, do_sample=False)
nit: just our preferred way to do it
sliding_window_pattern=4,
layer_types=None,
mlp_layer_types=None,
first_k_dense_replace=1,
Ah, missed this: this should be mlp layer types with a list of the types (similar to layer types for attention).
You mean one of 'dense' and 'sparse', right?
You can ping me when it's ready for review

Should I update the test code with a dummy model? I think everything else is ready.

Yes, please 🙏 taking a look in a second then

It seems the current dummy model needs to be updated, so I’ll notify you when it’s ready.
vasqu left a comment
Leaving some small last comments, imo it looks very much ready! Let's clean up the config a tad more and wrap up the integration tests, then we are good to go.
Just ping me again when ready, great work
| ("ernie4_5_vl_moe", "TokenizersBackend" if is_tokenizers_available() else None), | ||
| ("esm", "EsmTokenizer"), | ||
| ("exaone4", "GPT2Tokenizer" if is_tokenizers_available() else None), | ||
| ("exaone_moe", "GPT2Tokenizer" if is_tokenizers_available() else None), |
Any update here? Can this be removed?
@@ -0,0 +1,27 @@
# Copyright 2025 The LG AI Research and The HuggingFace Team. All rights reserved.
- # Copyright 2025 The LG AI Research and The HuggingFace Team. All rights reserved.
+ # Copyright 2026 The LG AI Research and The HuggingFace Team. All rights reserved.
sliding_window_pattern (`str`, *optional*, defaults to 4):
    The pattern to use for sliding window attention. Can be one of:
    - `None`: No sliding window attention is used
    - `int`: Every `sliding_window` layers, use global attention, else use local attention.
    - `str`: A sequence of "L" (local attention) and "G" (global attention) characters that defines the
      attention pattern. The pattern starts from layer 0 and repeats every `sliding_window` layers. The
      final layer always uses global attention regardless of the pattern.
    For instance, sliding_window_pattern="LLLG" same as sliding_window=4, which means:
    - Layer 0, 1, 2: local attention,
    - Layer 3: global attention,
    ...(repeated)
I'd like to avoid this if possible, and just use layer_types directly. We also start to do the same for mlp layers (moe) and it gives more flexibility with other attention flavors (e.g. linear attention (gated delta net)).
I understand, and it would be better to remove them.
However, since these configs (including those below) are often used by other libraries such as llama.cpp, they should remain in the model’s config.json.
If that is acceptable, we have no reason to keep them in the config implementation. 😃
Yea, no worries not super important 👍 would be just the ideal case
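(For illustration, the pattern-to-layer_types mapping is mechanical; a sketch, with the type strings following the `sliding_attention` convention seen earlier:)

```python
# Sketch: derive explicit layer_types from the legacy "LLLG"-style pattern.
def pattern_to_layer_types(pattern: str, num_hidden_layers: int) -> list[str]:
    types = [
        "sliding_attention" if pattern[i % len(pattern)] == "L" else "full_attention"
        for i in range(num_hidden_layers)
    ]
    types[-1] = "full_attention"  # final layer is always global per the docstring above
    return types

# e.g. pattern_to_layer_types("LLLG", 8)
# -> ['sliding_attention']*3 + ['full_attention'] + ['sliding_attention']*3 + ['full_attention']
```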
first_k_dense_replace (`int`, *optional*, defaults to 1):
    Number of dense layers in shallow layers (embed->dense->dense->...->dense->moe->moe...->lm_head).
                                                    \--k dense layers--/
In the same spirit to my comment before, let's remove this and only use mlp layer types directly
from ...configuration_utils import PreTrainedConfig, layer_type_validation


class ExaoneMoeConfig(PreTrainedConfig):
Probably needs to sync with main, I recently made the RoPE mixin explicit for models that use it - can you check?
E.g.
(modular should do it automatically for you, just need to merge with main and apply modular again)

PreTrainedConfig.__init__(
    bos_token_id=bos_token_id, eos_token_id=eos_token_id, tie_word_embeddings=tie_word_embeddings, **kwargs
)
Sorry I commented directly on the config file but should be done here ofc
Also sorry about the CI, it's still flaky here and there but it should be more stable on main
Co-authored-by: Junwon Hwang <nuclear1221@gmail.com> Co-authored-by: Kibong Choi <rlqhd26@naver.com>
Force-pushed d51d5ca to 9948661
I’ve updated the dummy test model and the docstrings.
@ArthurZucker @Rocketknight1 could you kindly review this PR?
Yes!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
run-slow: exaone4, exaone_moe
This comment contains models: ["models/exaone4", "models/exaone_moe"]
CI Results / Commit Info
Model CI Report: ❌ 7 new failed tests from this PR 😭
vasqu left a comment
Some last comments from my side, fixed a few smaller issues (checking with run slow again in a second)
bos_token_id (`int`, *optional*, defaults to 1):
    Beginning of stream token id.
eos_token_id (`int`, *optional*, defaults to 53):
    End of stream token id.
pad_token_id (`int`, *optional*, defaults to 0):
    Padding token id.
Took this from https://huggingface.co/LGAI-EXAONE/K-EXAONE-236B-A23B/blob/main/generation_config.json
A bit confused since the values were different; it would be nice if you could confirm these, or whether it should be the previous values (see 0e1e5bc).
We use 53 as the end-of-turn token, while 2 is used as EOS.
Either can be used as the default value, so you can set it to 53.
    return cls.model

def test_model_logits(self):
Logits don't match on our CI, I think it's a GPU diff so let me know if I should update them myself
I agree with that. It looks like you’ll need to update them in your CI environment.
Gotcha, let me update them tomorrow then 👍 (and also copy the repo to our internal testing)
run-slow: exaone4, exaone_moe
This comment contains models: ["models/exaone4", "models/exaone_moe"]
CI Results / Commit Info
Model CI Report: ❌ 1 new failed test from this PR 😭
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, exaone4, exaone_moe
run-slow: exaone_moe
This comment contains models: ["models/exaone_moe"]
What does this PR do?
Add EXAONE-MoE architecture for the K-EXAONE model released by LG AI Research.
This PR adds the modeling code of EXAONE-MoE (K-EXAONE), which is available in the LG AI Research fork:
https://github.com/Aim-Highest/transformers
Test code and documentation will be updated.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker