Add ForSequenceClassification heads for the OLMo family #45551
Merged
Rocketknight1 merged 2 commits into huggingface:main (Apr 22, 2026)
Conversation
Adds sequence-classification heads to the OLMo family so
`AutoModelForSequenceClassification.from_pretrained("allenai/OLMo-2-0425-1B")`
(and the Olmo/Olmo3 equivalents) work out of the box.
Implementation follows the canonical modular-inheritance pattern used by
Gemma/Gemma2, Qwen2/Qwen3, and Glm/Glm4: a single hand-written subclass in
`modular_olmo.py` cascades trivially to Olmo2 and Olmo3 via the modular
tooling, which resolves to the `GenericForSequenceClassification` mixin.
Also registers the three classes in `MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES`
and adds autodoc entries to each model's doc page.
Coordination: huggingface#45529
Maintainer approval: @Rocketknight1 ("This is welcome! ... happy for it to
be mostly AI-written. Just ping me on the PR for review when it's ready!")
AI assistance: yes, per issue huggingface#45529.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
For Olmo and Olmo2 (older `ModelTesterMixin` pattern), adds the new class to `all_model_classes` and wires up `text-classification` + `zero-shot` in `pipeline_model_mapping`, so the standard forward/gradient tests run against the classification head.

For Olmo3 (newer `CausalLMModelTester` pattern), sets `sequence_classification_class = Olmo3ForSequenceClassification` on the model tester, which auto-enables `test_sequence_classification_model`, `test_sequence_classification_model_for_single_label`, and `test_sequence_classification_model_for_multi_label` from the base class.

Local verification on MPS: 413 non-TP tests pass, including Olmo3's three classification tests. TP tests (`test_tp_*`) are deselected on MPS hardware since they are CUDA-only.
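The `sequence_classification_class` hook described above can be sketched with a minimal stand-in (the attribute and test names come from the PR; the base-tester body here is hypothetical, not the real `CausalLMModelTester` from transformers):

```python
# Toy sketch of the opt-in pattern: the base tester runs a classification
# test only when a subclass sets sequence_classification_class.

class CausalLMModelTester:
    sequence_classification_class = None  # subclasses opt in by setting this

    def test_sequence_classification_model(self):
        if self.sequence_classification_class is None:
            return "skipped"
        return f"ran against {self.sequence_classification_class.__name__}"


class Olmo3ForSequenceClassification:  # placeholder for the real model class
    pass


class Olmo3ModelTester(CausalLMModelTester):
    # The one line the PR adds for Olmo3's tests:
    sequence_classification_class = Olmo3ForSequenceClassification


assert CausalLMModelTester().test_sequence_classification_model() == "skipped"
assert "Olmo3" in Olmo3ModelTester().test_sequence_classification_model()
```

Because the hook lives on the base class, no per-model test code is needed beyond the single attribute assignment.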
Contributor
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, olmo, olmo2, olmo3
Member
run-slow: auto, olmo, olmo2, olmo3
Contributor
This comment contains models: ["models/auto", "models/olmo", "models/olmo2", "models/olmo3"]
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Rocketknight1
approved these changes
Apr 22, 2026
Member
Rocketknight1
left a comment
Yes, LGTM! I'll merge as soon as tests pass, if we're lucky we might get in before the branch cut for today's release.
What does this PR do?
Adds ForSequenceClassification for Olmo, Olmo2, and Olmo3 so `AutoModelForSequenceClassification.from_pretrained("allenai/OLMo-2-0425-1B")` (and the Olmo/Olmo3 equivalents) works.

Structure follows the same pattern as Gemma2, Qwen3, and Glm4: one `OlmoForSequenceClassification(LlamaForSequenceClassification): pass` in `modular_olmo.py`, then Olmo2 and Olmo3 each subclass the previous one. After `make fix-repo`, the generated `modeling_*.py` files use the `GenericForSequenceClassification` mixin, same as Jamba, JetMoe, Ministral3, and Gemma3Text.

Scope is the dense chain only. OlmoHybrid and the MoE branch (Olmoe, FlexOlmo) can come as follow-up PRs.
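The cascading-subclass structure can be illustrated with plain Python (class names from the PR; the `Generic...` body below is a stand-in for illustration only, not the real transformers mixin):

```python
# Toy sketch of the modular-inheritance cascade: one hand-written subclass
# propagates the shared classification mixin down the whole model family.

class GenericForSequenceClassification:
    """Stand-in for the shared sequence-classification mixin."""
    def head_name(self):
        return "score"


class LlamaForSequenceClassification(GenericForSequenceClassification):
    pass


# The only hand-written class in modular_olmo.py:
class OlmoForSequenceClassification(LlamaForSequenceClassification):
    pass


# Olmo2 and Olmo3 each subclass the previous one, so the mixin cascades:
class Olmo2ForSequenceClassification(OlmoForSequenceClassification):
    pass


class Olmo3ForSequenceClassification(Olmo2ForSequenceClassification):
    pass


assert Olmo3ForSequenceClassification().head_name() == "score"
```

In the real codebase the modular tooling flattens this chain, so each generated `modeling_*.py` references `GenericForSequenceClassification` directly rather than the intermediate classes.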
Fixes #45529
Code Agent Policy
This was drafted with Claude Code. @Rocketknight1 explicitly opted in for this specific change on the coordination issue:
#45529 (comment)
I read every changed line before each commit, ran the local test suite on my machine, and did GPU validation on real checkpoints before opening this. I can defend each change.
Before submitting
- Discussed and approved on the issue: Add `Olmo2ForSequenceClassification` (and ideally `OlmoForSequenceClassification` / `Olmo3ForSequenceClassification`) #45529
- Documentation updated in `docs/source/en/model_doc/olmo{,2,3}.md`.

Changes
- `src/transformers/models/olmo/modular_olmo.py`: `class OlmoForSequenceClassification(LlamaForSequenceClassification): pass`
- `src/transformers/models/olmo2/modular_olmo2.py`: subclass of `OlmoForSequenceClassification`
- `src/transformers/models/olmo3/modular_olmo3.py`: subclass of `Olmo2ForSequenceClassification`
- Generated `modeling_olmo.py`, `modeling_olmo2.py`, `modeling_olmo3.py` via `make fix-repo`
- Registered in `MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES` in `models/auto/modeling_auto.py`
- Autodoc entries in `docs/source/en/model_doc/olmo{,2,3}.md`
- Olmo/Olmo2 tests: older `ModelTesterMixin` pattern. Added the new class to `all_model_classes` and `text-classification`/`zero-shot` to `pipeline_model_mapping`.
- Olmo3 tests: newer `CausalLMModelTester`. Set `sequence_classification_class = Olmo3ForSequenceClassification`, which auto-enables the three `test_sequence_classification_model*` tests from the base class.

Test plan
Local, on MPS:
- `make style`, `make typing`, `make check-repo` all pass
- `pytest tests/models/olmo tests/models/olmo2 tests/models/olmo3`: 413 passed, 0 failing. 4 `test_tp_*` tests deselected because they require CUDA multi-GPU.
- `test_sequence_classification_model*` tests pass

GPU validation notebook with outputs: https://gist.github.com/earino/2bc6f246eef21a36c3c64d64150b9510
Ran on an NVIDIA RTX PRO 6000 Blackwell. For each of `allenai/OLMo-1B-hf`, `allenai/OLMo-2-0425-1B`, and `allenai/Olmo-3-7B-Instruct`:
- `AutoModelForSequenceClassification.from_pretrained(...)` dispatches to the right class
- logits have shape `(batch, num_labels)` and the untrained loss is near `ln(num_labels)`, as expected
- `LOAD REPORT` correctly shows `score.weight | MISSING` (new head) and `lm_head.weight | UNEXPECTED` (causal-LM head unused)
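The `ln(num_labels)` sanity check rests on a simple fact: a freshly initialized head predicts roughly uniform probabilities, and the cross-entropy of a uniform prediction over n classes is ln(n). A quick check, independent of any model:

```python
import math

def uniform_cross_entropy(num_labels: int) -> float:
    # Uniform prediction assigns probability 1/num_labels to the true
    # class, so cross-entropy is -ln(1/num_labels) = ln(num_labels).
    return -math.log(1.0 / num_labels)

# Binary classification: expected untrained loss ~0.693
assert abs(uniform_cross_entropy(2) - math.log(2)) < 1e-12
```

So for a 2-label head, initial loss near 0.693 (and near 1.099 for 3 labels) is the expected signature of correctly wired, untrained weights.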
allenai/OLMo-2-0425-1B(4.2M trainable params, 250 steps) brings loss from 1.48 over the first 20 steps down to 0.0005 over the last 20. I only ran the full training loop on one of the three because the classification head implementation is identical across them (sameGenericForSequenceClassificationmixin, different backbones and*PreTrainedModelbases), so the training-loop plumbing is shared; smoke-test forward/backward covers the other two. Happy to extend the LoRA run to Olmo and Olmo3 if you'd prefer.Who can review?
cc @Rocketknight1 as requested on #45529.