Add ForSequenceClassification heads for the OLMo family #45551

Merged

Rocketknight1 merged 2 commits into huggingface:main from earino:add-olmo-family-classification on Apr 22, 2026

Conversation

@earino (Contributor) commented Apr 21, 2026

What does this PR do?

Adds `ForSequenceClassification` heads for Olmo, Olmo2, and Olmo3 so that `AutoModelForSequenceClassification.from_pretrained("allenai/OLMo-2-0425-1B")` (and the Olmo/Olmo3 equivalents) works.
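For reviewers, a minimal usage sketch of what this enables (the `num_labels` value and the example sentence are illustrative, not part of the PR):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "allenai/OLMo-2-0425-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# The classification head is freshly initialized; num_labels=2 is just for illustration.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

inputs = tokenizer("OLMo is a fully open language model.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, num_labels)
```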

Structure follows the same pattern as Gemma2, Qwen3, and Glm4: one `OlmoForSequenceClassification(LlamaForSequenceClassification): pass` in `modular_olmo.py`, then Olmo2 and Olmo3 each subclass the previous one. After `make fix-repo`, the generated `modeling_*.py` files use the `GenericForSequenceClassification` mixin, same as Jamba, JetMoe, Ministral3, and Gemma3Text.
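Concretely, the entire hand-written surface is three one-line subclasses (imports omitted; the modular converter expands these into the generated `modeling_*.py` files):

```python
# modular_olmo.py
class OlmoForSequenceClassification(LlamaForSequenceClassification):
    pass

# modular_olmo2.py
class Olmo2ForSequenceClassification(OlmoForSequenceClassification):
    pass

# modular_olmo3.py
class Olmo3ForSequenceClassification(Olmo2ForSequenceClassification):
    pass
```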

Scope is the dense chain only. OlmoHybrid and the MoE branch (Olmoe, FlexOlmo) can come as follow-up PRs.

Fixes #45529

Code Agent Policy

This was drafted with Claude Code. @Rocketknight1 explicitly opted in for this specific change on the coordination issue:

> This is welcome! Sequence classification heads are often not included in the initial PR adding a new causal LM, but we're happy to add them. Your reason for needing it is good, and PRs like this are usually very easy to automate, so I'm happy for it to be mostly AI-written.

#45529 (comment)

I read every changed line before each commit, ran the local test suite on my machine, and did GPU validation on real checkpoints before opening this. I can defend each change.

  • I confirm that this is not a pure code agent PR.

Changes

  • `src/transformers/models/olmo/modular_olmo.py`: `class OlmoForSequenceClassification(LlamaForSequenceClassification): pass`
  • `src/transformers/models/olmo2/modular_olmo2.py`: subclass of `OlmoForSequenceClassification`
  • `src/transformers/models/olmo3/modular_olmo3.py`: subclass of `Olmo2ForSequenceClassification`
  • Regenerated `modeling_olmo.py`, `modeling_olmo2.py`, `modeling_olmo3.py` via `make fix-repo`
  • Registered all three in `MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES` in `models/auto/modeling_auto.py` (see the sketch after this list)
  • Added autodoc sections to `docs/source/en/model_doc/olmo{,2,3}.md`
  • Tests:
    • Olmo and Olmo2 use the older `ModelTesterMixin` pattern. Added the new class to `all_model_classes` and `text-classification`/`zero-shot` to `pipeline_model_mapping`.
    • Olmo3 uses the newer `CausalLMModelTester`. Set `sequence_classification_class = Olmo3ForSequenceClassification`, which auto-enables the three `test_sequence_classification_model*` tests from the base class.
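A sketch of the auto-mapping registration (neighboring entries elided; the real file imports `OrderedDict` and maps model type strings to class-name strings):

```python
# models/auto/modeling_auto.py (sketch; the mapping has hundreds of entries)
MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
    [
        # ...
        ("olmo", "OlmoForSequenceClassification"),
        ("olmo2", "Olmo2ForSequenceClassification"),
        ("olmo3", "Olmo3ForSequenceClassification"),
        # ...
    ]
)
```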

Test plan

Local, on MPS:

  • `make style`, `make typing`, `make check-repo` all pass
  • `pytest tests/models/olmo tests/models/olmo2 tests/models/olmo3`: 413 passed, 0 failed; 4 `test_tp_*` tests deselected because they require CUDA multi-GPU
  • Olmo3's three `test_sequence_classification_model*` tests pass

GPU validation notebook with outputs: https://gist.github.com/earino/2bc6f246eef21a36c3c64d64150b9510

Ran on an NVIDIA RTX PRO 6000 Blackwell. For each of `allenai/OLMo-1B-hf`, `allenai/OLMo-2-0425-1B`, and `allenai/Olmo-3-7B-Instruct` (a condensed sketch of these checks follows the list):

  • `AutoModelForSequenceClassification.from_pretrained(...)` dispatches to the right class
  • Forward returns logits of shape `(batch, num_labels)`
  • Loss is finite at random init and sits near ln(num_labels), as expected when an untrained head predicts roughly uniformly
  • Backward produces finite gradients for every trainable parameter
  • The library's LOAD REPORT correctly shows `score.weight | MISSING` (new head) and `lm_head.weight | UNEXPECTED` (causal-LM head unused)
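Condensed, the per-checkpoint checks look roughly like this (the pad-token fallback and label values are illustrative; the real run is in the gist):

```python
import math
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def smoke_test(model_id: str, num_labels: int = 2) -> None:
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=num_labels)
    # Decoder-only classifiers need a pad token to locate the last real token.
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    model.config.pad_token_id = tokenizer.pad_token_id

    inputs = tokenizer(["great movie", "terrible movie"], return_tensors="pt", padding=True)
    out = model(**inputs, labels=torch.tensor([1, 0]))
    assert out.logits.shape == (2, num_labels)
    # A random-init head predicts ~uniformly, so loss should sit near ln(num_labels).
    assert math.isfinite(out.loss.item())
    out.loss.backward()
    assert all(
        torch.isfinite(p.grad).all() for p in model.parameters() if p.grad is not None
    )

for model_id in ("allenai/OLMo-1B-hf", "allenai/OLMo-2-0425-1B", "allenai/Olmo-3-7B-Instruct"):
    smoke_test(model_id)
```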

A LoRA fine-tune on IMDB with `allenai/OLMo-2-0425-1B` (4.2M trainable params, 250 steps) brings the mean loss from 1.48 over the first 20 steps down to 0.0005 over the last 20. I only ran the full training loop on one of the three because the classification head implementation is identical across them (same `GenericForSequenceClassification` mixin, different backbones and `*PreTrainedModel` bases), so the training-loop plumbing is shared; the smoke-test forward/backward covers the other two. Happy to extend the LoRA run to Olmo and Olmo3 if you'd prefer.
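For context, the shape of that setup with peft (the rank, alpha, and target modules here are assumptions, not copied from the notebook):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "allenai/OLMo-2-0425-1B", num_labels=2
)
# Hyperparameters are illustrative; see the gist for the actual run.
lora_config = LoraConfig(
    task_type="SEQ_CLS",
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # on the order of a few million trainable params
```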

Who can review?

cc @Rocketknight1 as requested on #45529.

earino and others added 2 commits April 21, 2026 15:18
Adds sequence-classification heads to the OLMo family so
`AutoModelForSequenceClassification.from_pretrained("allenai/OLMo-2-0425-1B")`
(and the Olmo/Olmo3 equivalents) work out of the box.

Implementation follows the canonical modular-inheritance pattern used by
Gemma/Gemma2, Qwen2/Qwen3, and Glm/Glm4: a single hand-written subclass in
`modular_olmo.py` cascades trivially to Olmo2 and Olmo3 via the modular
tooling, which resolves to the `GenericForSequenceClassification` mixin.

Also registers the three classes in `MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES`
and adds autodoc entries to each model's doc page.

Coordination: huggingface#45529
Maintainer approval: @Rocketknight1 ("This is welcome! ... happy for it to
be mostly AI-written. Just ping me on the PR for review when it's ready!")

AI assistance: yes, per issue huggingface#45529.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

For Olmo and Olmo2 (older `ModelTesterMixin` pattern), adds the new class
to `all_model_classes` and wires up `text-classification` + `zero-shot` in
`pipeline_model_mapping`, so standard forward/gradient tests run against
the classification head.

For Olmo3 (newer `CausalLMModelTester` pattern), sets
`sequence_classification_class = Olmo3ForSequenceClassification` on the
model tester, which auto-enables `test_sequence_classification_model`,
`test_sequence_classification_model_for_single_label`, and
`test_sequence_classification_model_for_multi_label` from the base class.

Local verification on MPS: 413 non-TP tests pass; Olmo3's three
classification tests pass specifically. TP tests (`test_tp_*`) are
deselected on MPS hardware — CUDA-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
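For reference, the Olmo3 tester wiring described in the second commit amounts to roughly this (the tester class name `Olmo3ModelTester` is an assumption; only the `sequence_classification_class` attribute and its value are from the PR):

```python
# tests/models/olmo3/test_modeling_olmo3.py (sketch; surrounding tester config elided)
class Olmo3ModelTester(CausalLMModelTester):
    ...
    # Opting in to the shared sequence-classification tests from the base tester.
    sequence_classification_class = Olmo3ForSequenceClassification
```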
@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, olmo, olmo2, olmo3

@Rocketknight1 (Member)

run-slow: auto, olmo, olmo2, olmo3

@github-actions (Contributor)

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/auto", "models/olmo", "models/olmo2", "models/olmo3"]
quantizations: []

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Rocketknight1 (Member) left a comment

Yes, LGTM! I'll merge as soon as tests pass, if we're lucky we might get in before the branch cut for today's release.

@github-actions
Copy link
Copy Markdown
Contributor

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit   | Description                    |
|---------|----------|--------------------------------|
| RUN     | 22f14e8d | workflow commit (merge commit) |
| PR      | 812cba00 | branch commit (from PR)        |
| main    | 8fb7c7e5 | base commit (on main)          |

✅ No failing test specific to this PR 🎉 👏 !

@Rocketknight1 Rocketknight1 added this pull request to the merge queue Apr 22, 2026
Merged via the queue into huggingface:main with commit 864db66 Apr 22, 2026
30 checks passed

Development

Successfully merging this pull request may close these issues.

Add Olmo2ForSequenceClassification (and ideally OlmoForSequenceClassification / Olmo3ForSequenceClassification)
