Add EXAONE 4.5 implementations #45471
Conversation
Hey @nuxlear, great addition!
I'm seeing that the model is almost Qwen2.5-VL vision + EXAONE LM, with a small difference in the number of kv groups. Can you confirm whether the official ckpt has different values, or whether we can drop it and fully copy from Qwen (sketch below)?
If we can drop it, I left a comment on how to clean it up further :)
Ah, and also, I see the doc page is missing; it should be in docs/source/en/model_doc.
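If the kv-groups difference does turn out to be droppable, the modular-transformers version of "fully copy from Qwen" could be as small as the sketch below. The class name and import are my guesses at what the final modular file would contain, not code from this PR:

```python
# Hypothetical modular definition: inherit the Qwen2.5-VL vision tower
# unchanged, so the modular converter generates the full modeling code from it.
from ..qwen2_5_vl.modeling_qwen2_5_vl import Qwen2_5_VisionTransformerPretrainedModel


class Exaone4_5_VisionModel(Qwen2_5_VisionTransformerPretrainedModel):
    pass
```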
```python
class Exaone4_5_TextConfig(Exaone4Config):
    model_type = "exaone4_5_text"
    base_config_key = "text_config"
    keys_to_ignore_at_inference = ["past_key_values"]
```
This looks identical; we should be able to load Exaone4Config directly. For example, see how we load "llama" in llava by default.
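For reference, this is roughly the pattern llava's config uses to resolve its text backbone by `model_type` instead of subclassing it (quoted from memory, so details may differ slightly from the actual file):

```python
# Inside the composite config's __init__ (CONFIG_MAPPING comes from
# transformers.models.auto): resolve the text config by model_type,
# defaulting to "llama" when none is given.
if isinstance(text_config, dict):
    text_config["model_type"] = text_config.get("model_type", "llama")
    text_config = CONFIG_MAPPING[text_config["model_type"]](**text_config)
elif text_config is None:
    text_config = CONFIG_MAPPING["llama"]()
```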
```python
from ..qwen2_vl.video_processing_qwen2_vl import Qwen2VLVideoProcessor


@strict
```
All configs have to be `@strict` and also have `@auto_docstring(checkpoint="my-hub-repo")`.
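A minimal sketch of what that request looks like on a config class; the class body is illustrative, and the checkpoint string reuses the repo name that appears later in this thread:

```python
# Both decorators on the config class: `strict` enables attribute validation
# and `auto_docstring` generates the docstring, pointing at the given hub
# checkpoint (imports assumed as elsewhere in the PR).
@strict
@auto_docstring(checkpoint="LGAI-EXAONE/EXAONE-4.5-33B")
class Exaone4_5_VisionConfig(PretrainedConfig):
    model_type = "exaone4_5_vision"
```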
zucchini-nlp left a comment
Nice work @nuxlear 🤩
I think the PR looks very good and we just need one final clean-up for style before asking for a core maintainer review.
Ping me when you're ready and CI is green for the model (ignore unrelated CI failures), and I will ask for a core maintainer review.
| ("depth_anything", "DepthAnythingConfig"), | ||
| ("depth_pro", "DepthProConfig"), | ||
| ("detr", "DetrConfig"), | ||
| ("detr", "MaskFormerDetrConfig"), |
was that done manually or after fix-repo 🤔 very weird if automatically, I need to check
can you revert it, looks like a bad rebase
```python
outputs = self.model(
    input_ids=input_ids,
    pixel_values=pixel_values,
    pixel_values_videos=pixel_values_videos,
    image_grid_thw=image_grid_thw,
    video_grid_thw=video_grid_thw,
    second_per_grid_ts=second_per_grid_ts,
    position_ids=position_ids,
    attention_mask=attention_mask,
    past_key_values=past_key_values,
    inputs_embeds=inputs_embeds,
    use_cache=use_cache,
    **kwargs,
)

hidden_states = outputs.last_hidden_state
slice_indices = slice(-logits_to_keep, None) if isinstance(logits_to_keep, int) else logits_to_keep
logits = self.lm_head(hidden_states[:, slice_indices, :])

loss = None
if labels is not None:
```
We don't need forward to override just a docstring; instead we can return super().forward(**super_kwargs).
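In a modular file, that pattern would look roughly like this (the parent class here is my assumption based on the rest of the thread):

```python
# No custom logic and no docstring override: delegating keeps the
# generated modeling file identical to the parent's forward.
class Exaone4_5_ForConditionalGeneration(Qwen2_5_VLForConditionalGeneration):
    def forward(self, **super_kwargs):
        return super().forward(**super_kwargs)
```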
We don't use rope_deltas or mm_token_type_ids in forward(), so can we drop these kwargs and use CausalLMOutputWithPast instead of the generated Exaone4_5_CausalLMOutputWithPast (which has rope_deltas unnecessarily)?
Ah I see, indeed, we don't have a way to easily drop args from signature
```python
class Exaone4_5_ProcessorKwargs(ProcessingKwargs, total=False):
    _defaults = {
        "text_kwargs": {
            "padding": False,
        },
        "videos_kwargs": {"return_metadata": True},
    }


class Exaone4_5_Processor(Qwen2_5_VLProcessor):
    tokenizer_class = "AutoTokenizer"
```
nah, I see you added it in processing_auto, so now we can just delete these
You might also need a rebase, and at last run the style/repo-consistency checks.

@zucchini-nlp Sorry for the delay. I'm starting to address the feedback. BTW, since EXAONE 4.5 was recently released in vLLM (v0.20.0), we need to keep some class names as-is for compatibility. Is there a recommended way to keep the model config unchanged while mapping the old names to the Transformers convention? If not, can I just patch this by aliasing the class names?
@nuxlear Not sure I understand the vLLM part. We didn't yet release the model in transformers, so vLLM should be using its own integration without importing anything from this PR. What part do we need to keep without changing?
Yes, we understand that it would be better for vLLM to use its own integration. However, in this case, it would require updating our model config (e.g., the model_type). So if there is a simple way to map or alias the existing config values and class names, we'd prefer that. Otherwise, we'd need to update the EXAONE 4.5 config, which would break compatibility with vLLM v0.20.0 and require additional changes.
@nuxlear so vLLM is importing smth that doesn't yet exist and isn't released? 🫠 In that case, I don't think it falls under breaking BC, since it wasn't released yet and there is nothing to break. Seems like vLLM can't support EXAONE before the transformers release anyway.
@zucchini-nlp I understand. Then we will update the model config and open a patch PR later. I'll continue addressing the feedback 😃
@zucchini-nlp I think it's almost done, but I can't figure out why utils/check_repo.py is failing. It passes in my environment with the latest commit. (Edit: this was my bad. Never mind.) To get the tests fully passing, we need to update the config on the Hub (e.g., the model_type).
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, exaone4_5
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45471&sha=a8cebd |
zucchini-nlp left a comment
Great work, we can request a core maintainer review as the last step before merging. I left some nit-picky comments.
Btw, are we deleting the unused processor class?
```python
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
```
nit: let's delete torch_dtype, we just merged a PR cleaning all docs. It defaults to the dtype from the config when loading.
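The doc snippet would then shrink to something like this (assuming `model_id` is defined earlier in the example):

```python
# No torch_dtype argument: the dtype stored in the checkpoint's config
# is used automatically at load time.
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id)
```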
| ("depth_anything", "DepthAnythingConfig"), | ||
| ("depth_pro", "DepthProConfig"), | ||
| ("detr", "DetrConfig"), | ||
| ("detr", "MaskFormerDetrConfig"), |
can you revert it, looks like a bad rebase
```python
self.num_key_value_groups = self.num_heads // self.num_key_value_heads
self.q_dim = self.num_heads * self.head_dim
self.kv_dim = self.num_key_value_heads * self.head_dim
self.qkv = nn.Linear(self.dim, self.q_dim + (self.kv_dim * 2), bias=True)
```
It was pointed out to me recently that we usually prefer unfused qkv, but I will leave it for a core maintainer to decide. In any case, you won't have to change state dicts; we will fuse/unfuse on the fly.
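For illustration, the unfused variant would look roughly like this, reusing the dims computed in the snippet above (a sketch of the alternative, not a requested change):

```python
# Unfused alternative: same total parameter count, split into three
# separate projections instead of one fused nn.Linear.
self.q_proj = nn.Linear(self.dim, self.q_dim, bias=True)
self.k_proj = nn.Linear(self.dim, self.kv_dim, bias=True)
self.v_proj = nn.Linear(self.dim, self.kv_dim, bias=True)
```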
```python
self,
hidden_states: torch.Tensor,
cu_seqlens: torch.Tensor,
rotary_pos_emb: torch.Tensor | None = None,
```
nit: unused arg rotary_pos_emb (I realize it is copied from qwen, but let's delete it)
```python
elif position_ids.ndim > 2:
    position_ids = position_ids[-1]
```
is this possible? I think we shouldn't allow it and let it error out naturally at some point
```python
)


@auto_docstring(checkpoint="LGAI-EXAONE/EXAONE-4.5-33B")
```
ultra nit: we don't need to add a ckpt everywhere, only in config classes :)
```python
class Exaone4_5_ProcessorKwargs(ProcessingKwargs, total=False):
    _defaults = {
        "text_kwargs": {
            "padding": False,
        },
        "videos_kwargs": {"return_metadata": True},
    }


class Exaone4_5_Processor(Qwen2_5_VLProcessor):
    tokenizer_class = "AutoTokenizer"
```
You mean the configs on the Hub? Yes, looking at the tests, it looks like changing the model type will fix it. If you cannot change it, something like this should work:

```python
# inline comment here explaining why we override the model type
if isinstance(text_config, dict):
    model_type = text_config.get("model_type", "exaone4")
    if model_type == "exaone4_5_text":
        model_type = "exaone4"
    text_config = CONFIG_MAPPING[model_type](**text_config)
elif text_config is None:
    text_config = CONFIG_MAPPING["exaone4"]()
```
What does this PR do?
Add EXAONE 4.5 architecture for the EXAONE 4.5 model released by LG AI Research.
This PR adds the modeling code for EXAONE 4.5, which uses the same LLM architecture as EXAONE 4.
Documentation will be updated.
Code Agent Policy
The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by
code agents. We are currently bottlenecked by our ability to review and respond to them. As a result,
we ask that new users do not submit pure code agent PRs at this time.
You may use code agents in drafting or to help you diagnose issues. We'd also ask autonomous "OpenClaw"-like agents
not to open any PRs or issues for the moment.
PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this
repeatedly or maliciously.
This is a rapidly-evolving situation that's causing significant shockwaves in the open-source community. As a result,
this policy is likely to be updated regularly in the near future. For more information, please read
CONTRIBUTING.md.

Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? See the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@zucchini-nlp