Add EXAONE 4.5 implementations #45471

Open
nuxlear wants to merge 11 commits into huggingface:main from nuxlear:add-exaone4_5

Conversation

@nuxlear
Contributor

@nuxlear nuxlear commented Apr 16, 2026

What does this PR do?

Add EXAONE 4.5 architecture for the EXAONE 4.5 model released by LG AI Research.

This PR adds the modeling code for EXAONE 4.5, which uses the same LLM architecture as EXAONE 4.
Documentation will be updated.

Code Agent Policy

The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by
code agents. We are currently bottlenecked by our ability to review and respond to them. As a result,
we ask that new users do not submit pure code agent PRs at this time.
You may use code agents in drafting or to help you diagnose issues. We'd also ask autonomous "OpenClaw"-like agents
not to open any PRs or issues for the moment.

PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this
repeatedly or maliciously.

This is a rapidly-evolving situation that's causing significant shockwaves in the open-source community. As a result,
this policy is likely to be updated regularly in the near future. For more information, please read CONTRIBUTING.md.

  • I confirm that this is not a pure code agent PR.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@zucchini-nlp

Member

@zucchini-nlp zucchini-nlp left a comment

Hey @nuxlear , great addition!

I see that the model is almost Qwen2.5-VL vision + EXAONE LM, with a small difference in the number of KV groups. Can you confirm whether the official checkpoints have different values, or whether we can drop it and fully copy from Qwen?

If we can drop it, I left a comment on how to clean it up further :)

Ah, and also, I see the doc page is missing; it should be in docs/source/en/model_doc

Comment on lines +48 to +51
class Exaone4_5_TextConfig(Exaone4Config):
    model_type = "exaone4_5_text"
    base_config_key = "text_config"
    keys_to_ignore_at_inference = ["past_key_values"]
Member

This looks identical; we should be able to load Exaone4Config directly. For example, see how we load "llama" in llava by default:

https://github.com/huggingface/transformers/blob/main/src/transformers/models/llava/configuration_llava.py
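
A rough sketch of the pattern in that file, assuming it still matches configuration_llava.py on main (LlavaConfig falls back to a "llama" text config when none is given):

from transformers.models.auto.configuration_auto import CONFIG_MAPPING

# Inside LlavaConfig.__init__, roughly: default the text config to "llama"
# instead of defining a separate, identical config class.
if isinstance(text_config, dict):
    text_config["model_type"] = text_config.get("model_type", "llama")
    text_config = CONFIG_MAPPING[text_config["model_type"]](**text_config)
elif text_config is None:
    text_config = CONFIG_MAPPING["llama"]()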

from ..qwen2_vl.video_processing_qwen2_vl import Qwen2VLVideoProcessor


@strict
Member

All configs have to be @strict and also have @auto_docstring(checkpoint="my-hub-repo")
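
For concreteness, a fragment of the requested shape (the class name, base class, and model_type are placeholders; @strict and @auto_docstring are the decorators already used elsewhere in this PR):

@strict
@auto_docstring(checkpoint="my-hub-repo")
class Exaone4_5_Config(PretrainedConfig):
    model_type = "exaone4_5"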

Member

@zucchini-nlp zucchini-nlp left a comment

Nice work @nuxlear 🤩

I think the PR looks very good, and we just need one final clean-up for style before asking for a core maintainer review

Ping me when you're ready and CI is green for the model (ignore unrelated CI failures), and I will ask for a core maintainer review

("depth_anything", "DepthAnythingConfig"),
("depth_pro", "DepthProConfig"),
("detr", "DetrConfig"),
("detr", "MaskFormerDetrConfig"),
Member

Was that done manually or after fix-repo? 🤔 Very weird if it happened automatically; I need to check

Member

Can you revert it? Looks like a bad rebase

Comment on lines +449 to +469
outputs = self.model(
    input_ids=input_ids,
    pixel_values=pixel_values,
    pixel_values_videos=pixel_values_videos,
    image_grid_thw=image_grid_thw,
    video_grid_thw=video_grid_thw,
    second_per_grid_ts=second_per_grid_ts,
    position_ids=position_ids,
    attention_mask=attention_mask,
    past_key_values=past_key_values,
    inputs_embeds=inputs_embeds,
    use_cache=use_cache,
    **kwargs,
)

hidden_states = outputs.last_hidden_state
slice_indices = slice(-logits_to_keep, None) if isinstance(logits_to_keep, int) else logits_to_keep
logits = self.lm_head(hidden_states[:, slice_indices, :])

loss = None
if labels is not None:
Member

We don't need forward to override a docstring; instead we can return super().forward(**super_kwargs)
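
A minimal sketch of that suggestion, assuming the model inherits from Qwen2.5-VL in the modular file (the class name here is illustrative; the modular converter expands **super_kwargs into the full parent signature):

from transformers import Qwen2_5_VLForConditionalGeneration

class Exaone4_5_ForConditionalGeneration(Qwen2_5_VLForConditionalGeneration):
    # No docstring or signature override needed: forward everything
    # to the parent class unchanged.
    def forward(self, **super_kwargs):
        return super().forward(**super_kwargs)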

Contributor Author

We don't use rope_deltas or mm_token_type_ids in forward(), so can we drop these kwargs and use CausalLMOutputWithPast instead of the generated Exaone4_5_CausalLMOutputWithPast (which has rope_deltas unnecessarily)?

Member

Ah I see, indeed, we don't have a way to easily drop args from the signature

Comment on lines +520 to +530
class Exaone4_5_ProcessorKwargs(ProcessingKwargs, total=False):
    _defaults = {
        "text_kwargs": {
            "padding": False,
        },
        "videos_kwargs": {"return_metadata": True},
    }


class Exaone4_5_Processor(Qwen2_5_VLProcessor):
    tokenizer_class = "AutoTokenizer"
Member

Nah, I see you added it in processing_auto, so now we can just delete these

Member

Are we deleting it?

@zucchini-nlp
Member

You might also need a rebase, and lastly run make fix-repo for the repo consistency CI

@nuxlear
Contributor Author

nuxlear commented Apr 27, 2026

@zucchini-nlp Sorry for the delay. I'm starting to address the feedback.

BTW, since EXAONE 4.5 was recently released in vLLM (v0.20.0), we need to keep some class names as-is for compatibility, including Exaone4_5_ImageProcessor.

Is there a recommended way to keep the model config unchanged while mapping the old name to the Transformers convention (e.g., Exaone4_5_ImageProcessor → Qwen2VLImageProcessor)?

If not, can I just patch this by aliasing the class name, e.g.,
from transformers import Qwen2VLImageProcessor as Exaone4_5_ImageProcessor in modeling_exaone4_5.py?

@zucchini-nlp
Member

@nuxlear not sure I understand the vLLM part. We didn't yet release the model in transformers, so vLLM should be using its own integration without importing anything like from transformers import Exaone4_5_ImageProcessor

What part do we need to keep without changing?

@nuxlear
Contributor Author

nuxlear commented Apr 27, 2026

Yes, we understand that it would be better for vLLM to use its own integration.

However, in this case, it would require updating our model config (e.g., exaone4_5_text → exaone4, and Exaone4_5_[Image/Video]Processor → Qwen2VL[Image/Video]Processor) and submitting a separate PR to fix the imports in vLLM (see here).

So if there is a simple way to map or alias the existing config values and class names, we’d prefer that. Otherwise, we’d need to update the EXAONE 4.5 config, which would break compatibility with vLLM v0.20.0 and require additional changes.

@zucchini-nlp
Member

@nuxlear so vLLM is importing something that doesn't yet exist and isn't released? 🫠

In that case, I don't think it falls into breaking BC since it wasn't released yet and there is nothing to break. Seems like vLLM can't support Exaone before transformers release anyway

@nuxlear
Contributor Author

nuxlear commented Apr 27, 2026

@zucchini-nlp I understand. Then we will update the model config and open a patch PR later. I'll continue addressing the feedback 😃

@nuxlear
Contributor Author

nuxlear commented Apr 29, 2026

@zucchini-nlp I think it's almost done, but I can't figure out why utils/check_repo.py is failing. It passes in my environment with the latest commit. (Edit: this was my bad. Never mind.)

To get the tests fully passing, we need to update the config from exaone4_5_text to exaone4, which may break the current vLLM job. Could you check the failing tests locally with this change? If it works on your side, we’ll proceed with updating the config and docs accordingly.

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, exaone4_5

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45471&sha=a8cebd

@nuxlear nuxlear requested a review from zucchini-nlp April 29, 2026 05:58
Member

@zucchini-nlp zucchini-nlp left a comment

Great work, we can request a core maintainer review as the last step before merging. I left some nit-picky comments

Btw, are we deleting the unused processor class?

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
Member

nit: let's delete torch_dtype; we just merged a PR cleaning all docs. It defaults to the dtype from the config when loading

("depth_anything", "DepthAnythingConfig"),
("depth_pro", "DepthProConfig"),
("detr", "DetrConfig"),
("detr", "MaskFormerDetrConfig"),
Member

Can you revert it? Looks like a bad rebase

self.num_key_value_groups = self.num_heads // self.num_key_value_heads
self.q_dim = self.num_heads * self.head_dim
self.kv_dim = self.num_key_value_heads * self.head_dim
self.qkv = nn.Linear(self.dim, self.q_dim + (self.kv_dim * 2), bias=True)
Member

It was pointed out to me recently that we usually prefer unfused QKV, but I will leave it for a core maintainer to decide. In any case, you won't have to change state dicts; we will fuse/unfuse on the fly
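
For reference, a minimal sketch of the unfused alternative, mirroring the dim/q_dim/kv_dim attributes computed in the quoted snippet (the module name and standalone packaging are illustrative):

import torch
import torch.nn as nn

class UnfusedQKV(nn.Module):
    # Three separate projections instead of one fused nn.Linear; fusing or
    # unfusing on the fly then amounts to concatenating or splitting the
    # weight (and bias) along the output dimension.
    def __init__(self, dim: int, q_dim: int, kv_dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, q_dim, bias=True)
        self.k_proj = nn.Linear(dim, kv_dim, bias=True)
        self.v_proj = nn.Linear(dim, kv_dim, bias=True)

    def forward(self, hidden_states: torch.Tensor):
        return self.q_proj(hidden_states), self.k_proj(hidden_states), self.v_proj(hidden_states)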

self,
hidden_states: torch.Tensor,
cu_seqlens: torch.Tensor,
rotary_pos_emb: torch.Tensor | None = None,
Member

nit: unused arg rotary_pos_emb (I realize it is copied from Qwen, but let's delete it)

Comment on lines +279 to +280
elif position_ids.ndim > 2:
    position_ids = position_ids[-1]
Member

Is this possible? I think we shouldn't allow it and let it error out naturally at some point

)


@auto_docstring(checkpoint="LGAI-EXAONE/EXAONE-4.5-33B")
Member

ultra nit: we don't need to add a ckpt everywhere, only in config classes :)

Comment on lines +520 to +530
class Exaone4_5_ProcessorKwargs(ProcessingKwargs, total=False):
    _defaults = {
        "text_kwargs": {
            "padding": False,
        },
        "videos_kwargs": {"return_metadata": True},
    }


class Exaone4_5_Processor(Qwen2_5_VLProcessor):
    tokenizer_class = "AutoTokenizer"
Member

Are we deleting it?

@zucchini-nlp
Member

zucchini-nlp commented Apr 29, 2026

To get the tests fully passing, we need to update the config from exaone4_5_text to exaone4, which may break the current vLLM job. Could you check the failing tests locally with this change? If it works on your side, we’ll proceed with updating the config and docs accordingly.

You mean the configs on the Hub? Yes, looking at the tests, it looks like changing the model type will fix it. If you cannot change config.json due to BC with vLLM etc., I recommend changing it directly in code. For example, something like this inside config.__post_init__:

# inline comment here explaining why we override the model type
if isinstance(text_config, dict):
    model_type = text_config.get("model_type", "exaone4")
    if model_type == "exaone4_5_text":
        model_type = "exaone4"
    text_config = CONFIG_MAPPING[model_type](**text_config)
elif text_config is None:
    text_config = CONFIG_MAPPING["exaone4"]()
