Fix exaone4 layer_types ZeroDivision/TypeError when sliding_window_pattern is None/"LLLG" #39698
wheeze01 wants to merge 2 commits into huggingface:main
Conversation
…g, add safe pattern handling
[For maintainers] Suggested jobs to run (before merge): run-slow: exaone4
This PR infers and implements the intended behavior from LG AI Research's existing code and PR discussion for EXAONE-4.0. It may differ slightly from the original developer's intent, so any feedback is greatly appreciated. We also verified that inference works correctly with the following script:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
# model_name = "LGAI-EXAONE/EXAONE-4.0-1.2B" # same result
model_name = "LGAI-EXAONE/EXAONE-4.0-32B"
model = AutoModelForCausalLM.from_pretrained(
model_name, torch_dtype="bfloat16", device_map=None
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_name)
# choose your prompt
prompt = "너가 얼마나 대단한지 설명해 봐"  # "Explain how great you are"
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(
input_ids.to(model.device),
max_new_tokens=128,
do_sample=False,
)
print(tokenizer.decode(output[0]))
```

output:
Hello, @wheeze01. Thank you for your attention and contribution! Your PR appears to align with our intentions, except for one point:
By the way, we will update the models' configuration with proper `layer_types`.
ArthurZucker left a comment
Yes, as @lgai-exaone mentioned, the best is to align with `layer_types`, which should be explicit!
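For illustration, a minimal sketch of what an explicit `layer_types` could look like from the user side, assuming the standard `AutoConfig` API and an "LLLG"-style layout (the pattern shown is only an example; each checkpoint's actual layout may differ):

```python
from transformers import AutoConfig

# Sketch only: build an explicit per-layer list for an "LLLG"-style scheme
# (three sliding-attention layers followed by one full-attention layer),
# instead of deriving it implicitly from sliding_window_pattern.
config = AutoConfig.from_pretrained("LGAI-EXAONE/EXAONE-4.0-32B")
config.layer_types = [
    "full_attention" if (i + 1) % 4 == 0 else "sliding_attention"
    for i in range(config.num_hidden_layers)
]
```

An explicit list like this makes each layer's attention type visible in the config itself, which is what "explicit" means here.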
@ArthurZucker, thank you for your response! Apart from handling …
Hey! I am wondering what `attn_implementation="hybrid"` would refer to? Currently, all attention implementations in transformers support both sliding and non-sliding attention.
@ArthurZucker, the current EXAONE 4.0 configuration includes this implementation at `src/transformers/models/exaone4/modular_exaone4.py`, lines 248 to 250 (commit 551a89a).
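For reference, an approximate paraphrase of the check under discussion, based on this thread's description rather than the verbatim source:

```python
# Approximate paraphrase (not the verbatim source). layer_types only ever
# holds "sliding_attention" / "full_attention", so a membership test for
# "sliding_window" never matches and _attn_implementation stays unset.
layer_types = ["sliding_attention", "sliding_attention", "sliding_attention", "full_attention"]
attn_implementation = None
if "sliding_window" in layer_types:  # never true -> the bug this PR removes
    attn_implementation = "hybrid"
```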
Ah! Sorry, what I wanted to write is …
What does this PR do?
Fixes a crash in `Exaone4Config.__init__` when `sliding_window_pattern` is `None` (EXAONE-4.0-1.2B) or a string like `"LLLG"` (EXAONE-4.0-32B). The original code unconditionally performed a modulo operation on `sliding_window_pattern`, causing either a `ZeroDivisionError` or a `TypeError`. This PR also removes an incorrect `"sliding_window"` key check that left `_attn_implementation` unset.

Now:

- We branch safely on three cases for `sliding_window_pattern`:
  - `None` or `0` → all layers use `"full_attention"`.
  - `str` (e.g. `"LLLG"`) → map each character (`L` → `"sliding_attention"`, others → `"full_attention"`), repeat the pattern to cover all layers, and force the final layer to `"full_attention"`.
  - `int` (e.g. `4`) → every n-th layer is `"full_attention"`, the others `"sliding_attention"`, with the final layer forced to `"full_attention"`.
- We remove the incorrect check for `"sliding_window"` in `layer_types` and no longer force `_attn_implementation="hybrid"`; we let Hugging Face's internal `_check_and_adjust_attn_implementation` decide the proper backend (e.g., `"eager"`, `"sdpa"`, `"flash_attention_*"`).

This resolves both the division/modulo crash and the risk of `_attn_implementation` remaining `None` downstream.

Fixes #39696
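A minimal, self-contained sketch of the three-way branching described above (the function name and exact expressions are illustrative; the merged code may differ):

```python
def derive_layer_types(sliding_window_pattern, num_hidden_layers):
    # Case 1: None or 0 -> every layer uses full attention.
    if not sliding_window_pattern:
        return ["full_attention"] * num_hidden_layers
    if isinstance(sliding_window_pattern, str):
        # Case 2: e.g. "LLLG" -> L maps to sliding_attention, anything else
        # to full_attention, repeating the pattern across all layers.
        pattern = sliding_window_pattern
        layer_types = [
            "sliding_attention" if pattern[i % len(pattern)] == "L" else "full_attention"
            for i in range(num_hidden_layers)
        ]
    else:
        # Case 3: int, e.g. 4 -> every n-th layer is full attention.
        n = sliding_window_pattern
        layer_types = [
            "full_attention" if (i + 1) % n == 0 else "sliding_attention"
            for i in range(num_hidden_layers)
        ]
    layer_types[-1] = "full_attention"  # final layer is always full attention
    return layer_types
```

For example, `derive_layer_types("LLLG", 8)` yields three `"sliding_attention"` layers, one `"full_attention"` layer, and then the same block again, matching the 3-sliding-to-1-full layout described above.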
Before submitting

- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Models: