Add EXAONE 4.5 implementations #45471
Conversation
Hey @nuxlear, great addition!
I'm seeing that the model is almost Qwen2.5-VL vision + EXAONE LM, with a small difference in the number of kv groups. Can you confirm whether the official ckpt has different values, or whether we can drop it and fully copy from Qwen (sketch below)?
If we can drop it, I left a comment on how to clean it up further :)
Ah, and also, I see the doc page is missing; it should be in docs/source/en/model_doc.
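If the kv-groups difference does turn out to be droppable, the modular-transformers version of "fully copy from Qwen" could be as small as the sketch below. The class name and import are my guesses at what the final modular file would contain, not code from this PR:

```python
# Hypothetical modular definition: inherit the Qwen2.5-VL vision tower
# unchanged, so the modular converter generates the full modeling code from it.
from ..qwen2_5_vl.modeling_qwen2_5_vl import Qwen2_5_VisionTransformerPretrainedModel


class Exaone4_5_VisionModel(Qwen2_5_VisionTransformerPretrainedModel):
    pass
```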
```python
class Exaone4_5_TextConfig(Exaone4Config):
    model_type = "exaone4_5_text"
    base_config_key = "text_config"
    keys_to_ignore_at_inference = ["past_key_values"]
```
This looks identical; we should be able to load Exaone4Config directly. For example, see how we load "llama" in llava by default.
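For reference, this is roughly the pattern llava's config uses to resolve its text backbone by `model_type` instead of subclassing it (quoted from memory, so details may differ slightly from the actual file):

```python
# Inside the composite config's __init__ (CONFIG_MAPPING comes from
# transformers.models.auto): resolve the text config by model_type,
# defaulting to "llama" when none is given.
if isinstance(text_config, dict):
    text_config["model_type"] = text_config.get("model_type", "llama")
    text_config = CONFIG_MAPPING[text_config["model_type"]](**text_config)
elif text_config is None:
    text_config = CONFIG_MAPPING["llama"]()
```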
```python
from ..qwen2_vl.video_processing_qwen2_vl import Qwen2VLVideoProcessor


@strict
```
All configs have to be `@strict` and also have `@auto_docstring(checkpoint="my-hub-repo")`.
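A minimal sketch of what that request looks like on a config class; the class body is illustrative, and the checkpoint string reuses the repo name that appears later in this thread:

```python
# Both decorators on the config class: `strict` enables attribute validation
# and `auto_docstring` generates the docstring, pointing at the given hub
# checkpoint (imports assumed as elsewhere in the PR).
@strict
@auto_docstring(checkpoint="LGAI-EXAONE/EXAONE-4.5-33B")
class Exaone4_5_VisionConfig(PretrainedConfig):
    model_type = "exaone4_5_vision"
```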
zucchini-nlp left a comment
Nice work @nuxlear 🤩
I think the PR looks very good and we just need one final clean-up for style before asking for a core maintainer review.
Ping me when you're ready and CI is green for the model (ignore unrelated CI failures), and I will ask for a core maintainer review.
| ("depth_anything", "DepthAnythingConfig"), | ||
| ("depth_pro", "DepthProConfig"), | ||
| ("detr", "DetrConfig"), | ||
| ("detr", "MaskFormerDetrConfig"), |
was that done manually or after fix-repo 🤔 very weird if automatically, I need to check
can you revert it, looks like a bad rebase
```python
outputs = self.model(
    input_ids=input_ids,
    pixel_values=pixel_values,
    pixel_values_videos=pixel_values_videos,
    image_grid_thw=image_grid_thw,
    video_grid_thw=video_grid_thw,
    second_per_grid_ts=second_per_grid_ts,
    position_ids=position_ids,
    attention_mask=attention_mask,
    past_key_values=past_key_values,
    inputs_embeds=inputs_embeds,
    use_cache=use_cache,
    **kwargs,
)

hidden_states = outputs.last_hidden_state
slice_indices = slice(-logits_to_keep, None) if isinstance(logits_to_keep, int) else logits_to_keep
logits = self.lm_head(hidden_states[:, slice_indices, :])

loss = None
if labels is not None:
```
We don't need forward to override just a docstring; instead we can return super().forward(**super_kwargs).
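In a modular file, that pattern would look roughly like this (the parent class here is my assumption based on the rest of the thread):

```python
# No custom logic and no docstring override: delegating keeps the
# generated modeling file identical to the parent's forward.
class Exaone4_5_ForConditionalGeneration(Qwen2_5_VLForConditionalGeneration):
    def forward(self, **super_kwargs):
        return super().forward(**super_kwargs)
```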
We don't use rope_deltas or mm_token_type_ids in forward(), so can we drop these kwargs and use CausalLMOutputWithPast instead of the generated Exaone4_5_CausalLMOutputWithPast (which has rope_deltas unnecessarily)?
Ah I see, indeed, we don't have a way to easily drop args from signature
```python
class Exaone4_5_ProcessorKwargs(ProcessingKwargs, total=False):
    _defaults = {
        "text_kwargs": {
            "padding": False,
        },
        "videos_kwargs": {"return_metadata": True},
    }


class Exaone4_5_Processor(Qwen2_5_VLProcessor):
    tokenizer_class = "AutoTokenizer"
```
nah, I see you added it in processing_auto, so now we can just delete these
You might also need a rebase, and at last run the style/repo-consistency checks.

@zucchini-nlp Sorry for the delay. I'm starting to address the feedback. BTW, since EXAONE 4.5 was recently released in vLLM (v0.20.0), we need to keep some class names as-is for compatibility. Is there a recommended way to keep the model config unchanged while mapping the old names to the Transformers convention? If not, can I just patch this by aliasing the class names?
@nuxlear Not sure I understand the vLLM part. We didn't yet release the model in transformers, so vLLM should be using its own integration without importing anything from this PR. What part do we need to keep without changing?
Yes, we understand that it would be better for vLLM to use its own integration. However, in this case, it would require updating our model config (e.g., the model_type). So if there is a simple way to map or alias the existing config values and class names, we'd prefer that. Otherwise, we'd need to update the EXAONE 4.5 config, which would break compatibility with vLLM v0.20.0 and require additional changes.
@nuxlear so vLLM is importing smth that doesn't yet exist and isn't released? 🫠 In that case, I don't think it falls under breaking BC, since it wasn't released yet and there is nothing to break. Seems like vLLM can't support EXAONE before the transformers release anyway.
@zucchini-nlp I understand. Then we will update the model config and open a patch PR later. I'll continue addressing the feedback 😃
@zucchini-nlp I think it's almost done, but I can't figure out why utils/check_repo.py is failing. It passes in my environment with the latest commit. (Edit: this was my bad. Never mind.) To get the tests fully passing, we need to update the config on the Hub (e.g., the model_type).
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, exaone4_5
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45471&sha=a8cebd |
zucchini-nlp left a comment
Great work, we can request a core maintainer review as the last step before merging. I left some nit-picky comments.
Btw, are we deleting the unused processor class?
```python
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
```
nit: let's delete torch_dtype, we just merged a PR cleaning all docs. It defaults to the dtype from the config when loading.
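The doc snippet would then shrink to something like this (assuming `model_id` is defined earlier in the example):

```python
# No torch_dtype argument: the dtype stored in the checkpoint's config
# is used automatically at load time.
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id)
```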
| ("depth_anything", "DepthAnythingConfig"), | ||
| ("depth_pro", "DepthProConfig"), | ||
| ("detr", "DetrConfig"), | ||
| ("detr", "MaskFormerDetrConfig"), |
can you revert it, looks like a bad rebase
```python
self.num_key_value_groups = self.num_heads // self.num_key_value_heads
self.q_dim = self.num_heads * self.head_dim
self.kv_dim = self.num_key_value_heads * self.head_dim
self.qkv = nn.Linear(self.dim, self.q_dim + (self.kv_dim * 2), bias=True)
```
It was pointed out to me recently that we usually prefer unfused qkv, but I will leave it for a core maintainer to decide. In any case, you won't have to change state dicts; we will fuse/unfuse on the fly.
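For illustration, the unfused variant would look roughly like this, reusing the dims computed in the snippet above (a sketch of the alternative, not a requested change):

```python
# Unfused alternative: same total parameter count, split into three
# separate projections instead of one fused nn.Linear.
self.q_proj = nn.Linear(self.dim, self.q_dim, bias=True)
self.k_proj = nn.Linear(self.dim, self.kv_dim, bias=True)
self.v_proj = nn.Linear(self.dim, self.kv_dim, bias=True)
```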
```python
self,
hidden_states: torch.Tensor,
cu_seqlens: torch.Tensor,
rotary_pos_emb: torch.Tensor | None = None,
```
nit: unused arg rotary_pos_emb (I realize it is copied from qwen, but let's delete it)
```python
elif position_ids.ndim > 2:
    position_ids = position_ids[-1]
```
is this possible? I think we shouldn't allow it and let it error out naturally at some point
```python
)


@auto_docstring(checkpoint="LGAI-EXAONE/EXAONE-4.5-33B")
```
ultra nit: we don't need to add a ckpt everywhere, only in config classes :)
```python
class Exaone4_5_ProcessorKwargs(ProcessingKwargs, total=False):
    _defaults = {
        "text_kwargs": {
            "padding": False,
        },
        "videos_kwargs": {"return_metadata": True},
    }


class Exaone4_5_Processor(Qwen2_5_VLProcessor):
    tokenizer_class = "AutoTokenizer"
```
You mean the configs on the Hub? Yes, looking at the tests, it looks like changing the model type will fix it. If you cannot change it, something like this should work:

```python
# inline comment here explaining why we override the model type
if isinstance(text_config, dict):
    model_type = text_config.get("model_type", "exaone4")
    if model_type == "exaone4_5_text":
        model_type = "exaone4"
    text_config = CONFIG_MAPPING[model_type](**text_config)
elif text_config is None:
    text_config = CONFIG_MAPPING["exaone4"]()
```
What does this PR do?
Add EXAONE 4.5 architecture for the EXAONE 4.5 model released by LG AI Research.
This PR adds the modeling code for EXAONE 4.5, which uses the same LLM architecture as EXAONE 4.
Documentation will be updated.
Code Agent Policy
The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by
code agents. We are currently bottlenecked by our ability to review and respond to them. As a result,
we ask that new users do not submit pure code agent PRs at this time.
You may use code agents in drafting or to help you diagnose issues. We'd also ask autonomous "OpenClaw"-like agents
not to open any PRs or issues for the moment.
PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this
repeatedly or maliciously.
This is a rapidly-evolving situation that's causing significant shockwaves in the open-source community. As a result,
this policy is likely to be updated regularly in the near future. For more information, please read
CONTRIBUTING.md.

Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? See the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@zucchini-nlp