fix(gpt2): Resolve NaN/Inf issue in lm_head on Python 3.13 with tied weights by JokeYoonic · Pull Request #44676 · huggingface/transformers

JokeYoonic · 2026-03-13T16:28:01Z

Problem:

On macOS ARM64 + Python 3.13 + transformers 5.x, GPT-2 model's lm_head forward pass produces NaN/Inf values during inference
Root cause: lm_head.weight is tied to transformer.wte.weight, and the shared memory reference causes numerical instability in Python 3.13

Solution:

Clone the lm_head weight before passing to F.linear in GPT2LMHeadModel and GPT2DoubleHeadsModel forward methods
This breaks the memory sharing and resolves the NaN issue

Changes:

src/transformers/models/gpt2/modeling_gpt2.py: Modified GPT2LMHeadModel.forward() and GPT2DoubleHeadsModel.forward() to use self.lm_head.weight.clone()

Testing:

Verified fix with gpt2-medium model on Python 3.13.5 + PyTorch 2.6.0
All existing GPT-2 model tests pass

Reproduction Code

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained("gpt2-medium")
model.eval()
tokenizer = AutoTokenizer.from_pretrained("gpt2-medium")
inputs = tokenizer("Hello world", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

print(f"Has NaN: {torch.isnan(logits).any().item()}")  # Should be False after fix
print(f"Has Inf: {torch.isinf(logits).any().item()}")  # Should be False after fix

What does this PR do?

Fixes # (issue)

Before submitting

[x ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
[x ] Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

…weights Problem: - On macOS ARM64 + Python 3.13 + transformers 5.x, GPT-2 model's lm_head forward pass produces NaN/Inf values during inference - Root cause: lm_head.weight is tied to transformer.wte.weight, and the shared memory reference causes numerical instability in Python 3.13 Solution: - Clone the lm_head weight before passing to F.linear in GPT2LMHeadModel and GPT2DoubleHeadsModel forward methods - This breaks the memory sharing and resolves the NaN issue Changes: - src/transformers/models/gpt2/modeling_gpt2.py: Modified GPT2LMHeadModel.forward() and GPT2DoubleHeadsModel.forward() to use self.lm_head.weight.clone() Testing: - Verified fix with gpt2-medium model on Python 3.13.5 + PyTorch 2.6.0 - All existing GPT-2 model tests pass

github-actions · 2026-03-13T16:29:29Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: gpt2

Rocketknight1 · 2026-03-18T17:16:49Z

Hi, "shared memory reference causes numerical instability" doesn't make any sense to me, because "numerical instability" usually refers to errors in the least significant bits caused by floating point precision. Is the tied weight just not being handled correctly and we're getting randomly-initialized values instead?

This comment was marked as spam.

Sign in to view

This was referenced Apr 29, 2026

Cumulative feature and defect updates from recent Transformers PRs evalstate/transformers#42

Open

Cumulative defect fixes from recent Transformers PRs evalstate/transformers#43

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(gpt2): Resolve NaN/Inf issue in lm_head on Python 3.13 with tied weights#44676

fix(gpt2): Resolve NaN/Inf issue in lm_head on Python 3.13 with tied weights#44676
JokeYoonic wants to merge 1 commit intohuggingface:mainfrom
JokeYoonic:fix/python313-nan-lmhead

JokeYoonic commented Mar 13, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Mar 13, 2026

Uh oh!

This comment was marked as spam.

Uh oh!

Rocketknight1 commented Mar 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

JokeYoonic commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reproduction Code

What does this PR do?

Before submitting

Who can review?

Uh oh!

github-actions Bot commented Mar 13, 2026

Uh oh!

This comment was marked as spam.

Uh oh!

Rocketknight1 commented Mar 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

JokeYoonic commented Mar 13, 2026 •

edited

Loading