Skip to content

fix(gpt2): Resolve NaN/Inf issue in lm_head on Python 3.13 with tied weights#44676

Open
JokeYoonic wants to merge 1 commit intohuggingface:mainfrom
JokeYoonic:fix/python313-nan-lmhead
Open

fix(gpt2): Resolve NaN/Inf issue in lm_head on Python 3.13 with tied weights#44676
JokeYoonic wants to merge 1 commit intohuggingface:mainfrom
JokeYoonic:fix/python313-nan-lmhead

Conversation

@JokeYoonic
Copy link
Copy Markdown

@JokeYoonic JokeYoonic commented Mar 13, 2026

Problem:

  • On macOS ARM64 + Python 3.13 + transformers 5.x, GPT-2 model's lm_head forward pass produces NaN/Inf values during inference
  • Root cause: lm_head.weight is tied to transformer.wte.weight, and the shared memory reference causes numerical instability in Python 3.13

Solution:

  • Clone the lm_head weight before passing to F.linear in GPT2LMHeadModel and GPT2DoubleHeadsModel forward methods
  • This breaks the memory sharing and resolves the NaN issue

Changes:

  • src/transformers/models/gpt2/modeling_gpt2.py: Modified GPT2LMHeadModel.forward() and GPT2DoubleHeadsModel.forward() to use self.lm_head.weight.clone()

Testing:

  • Verified fix with gpt2-medium model on Python 3.13.5 + PyTorch 2.6.0
  • All existing GPT-2 model tests pass

Reproduction Code

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained("gpt2-medium")
model.eval()
tokenizer = AutoTokenizer.from_pretrained("gpt2-medium")
inputs = tokenizer("Hello world", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

print(f"Has NaN: {torch.isnan(logits).any().item()}")  # Should be False after fix
print(f"Has Inf: {torch.isinf(logits).any().item()}")  # Should be False after fix

What does this PR do?

Fixes # (issue)

Before submitting

  • [x ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x ] Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

…weights

Problem:
- On macOS ARM64 + Python 3.13 + transformers 5.x, GPT-2 model's lm_head
  forward pass produces NaN/Inf values during inference
- Root cause: lm_head.weight is tied to transformer.wte.weight, and the
  shared memory reference causes numerical instability in Python 3.13

Solution:
- Clone the lm_head weight before passing to F.linear in GPT2LMHeadModel
  and GPT2DoubleHeadsModel forward methods
- This breaks the memory sharing and resolves the NaN issue

Changes:
- src/transformers/models/gpt2/modeling_gpt2.py: Modified GPT2LMHeadModel.forward()
  and GPT2DoubleHeadsModel.forward() to use self.lm_head.weight.clone()

Testing:
- Verified fix with gpt2-medium model on Python 3.13.5 + PyTorch 2.6.0
- All existing GPT-2 model tests pass
@github-actions
Copy link
Copy Markdown
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: gpt2

gambletan

This comment was marked as spam.

@Rocketknight1
Copy link
Copy Markdown
Member

Hi, "shared memory reference causes numerical instability" doesn't make any sense to me, because "numerical instability" usually refers to errors in the least significant bits caused by floating point precision. Is the tied weight just not being handled correctly and we're getting randomly-initialized values instead?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants