
Add Gemma4ForSequenceClassification (missing from gemma4 module — Gemma 2/3 have it) #45373

@LarsKlawitter

Description


Feature request

Please add `Gemma4ForSequenceClassification` (and ideally `Gemma4TextForSequenceClassification`) to `transformers.models.gemma4`. The current `__all__` in `modeling_gemma4.py` contains only:

```python
__all__ = [
    "Gemma4AudioModel",
    "Gemma4ForCausalLM",
    "Gemma4ForConditionalGeneration",
    "Gemma4Model",
    "Gemma4PreTrainedModel",
    "Gemma4TextModel",
    "Gemma4VisionModel",
]
```

No classification head variants are exported, which is inconsistent with every prior Gemma release:

| Model family | ForSequenceClassification | TextForSequenceClassification |
| -- | -- | -- |
| gemma | ✅ | N/A (text-only) |
| gemma2 | ✅ | N/A (text-only) |
| gemma3 | ✅ | ✅ |
| gemma3n | ✅ | ✅ |
| gemma4 | ❌ | ❌ |

As a result, `AutoModelForSequenceClassification.from_pretrained("google/gemma-4-E4B", num_labels=3)` raises:

```
ValueError: Unrecognized configuration class <class 'transformers.models.gemma4.configuration_gemma4.Gemma4Config'>
for this kind of AutoModel: AutoModelForSequenceClassification.
```

Motivation

AutoModelForSequenceClassification is the standard entry point for fine-tuning causal LMs as classifiers — a common use case across research (domain classification, sentiment, reward modelling) and production (content moderation, intent detection, financial signal prediction). Excluding it from gemma4 forces users to either:

1. Roll a custom classifier-head wrapper (breaks the AutoModel contract and duplicates logic that already exists in `GenericForSequenceClassification`);
2. Fine-tune generatively and parse the argmax over label tokens (worse calibration, and loses the softmax-head properties); or
3. Skip Gemma 4 entirely and stay on Gemma 3 or a different family.

We are currently working around this in a financial-NLP fine-tuning pipeline by subclassing `GenericForSequenceClassification` and registering the result via `AutoModelForSequenceClassification.register()`. It works, but it is exactly the kind of boilerplate the AutoModel registry is supposed to eliminate, and it has to be duplicated by every project that wants to fine-tune Gemma 4 as a classifier.
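For readers unfamiliar with the mechanism, the failure and the workaround both come down to the Auto class being a config-class → model-class mapping. The following is a toy sketch of that dispatch in plain Python — `ToyAutoModel` and the two stub classes are illustrative names, not transformers internals:

```python
# Toy sketch of the Auto-class dispatch: a mapping from config class to
# model class. All names here are illustrative stubs, not transformers code.
class Gemma4Config:
    """Stub standing in for transformers' Gemma4Config."""

class Gemma4TextForSequenceClassification:
    """Stub standing in for the classifier subclass described above."""
    def __init__(self, config):
        self.config = config

class ToyAutoModel:
    _registry = {}

    @classmethod
    def register(cls, config_cls, model_cls):
        cls._registry[config_cls] = model_cls

    @classmethod
    def from_config(cls, config):
        try:
            return cls._registry[type(config)](config)
        except KeyError:
            raise ValueError(
                f"Unrecognized configuration class {type(config)} "
                f"for this kind of AutoModel: {cls.__name__}."
            ) from None

# Before registering, the lookup fails just like the reported error:
try:
    ToyAutoModel.from_config(Gemma4Config())
except ValueError as err:
    print(err)

# After registering -- the boilerplate every downstream project currently
# has to carry, and the step the upstream change would make unnecessary:
ToyAutoModel.register(Gemma4Config, Gemma4TextForSequenceClassification)
model = ToyAutoModel.from_config(Gemma4Config())
print(type(model).__name__)  # Gemma4TextForSequenceClassification
```

The real workaround is the same shape: one subclass plus one `register()` call per project.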

Gemma 4 E4B is a Mixture-of-Experts model (`Gemma4TextExperts` + `Gemma4TextRouter`, confirmed in `modeling_gemma4.py`), which makes it particularly inconvenient for users to hand-roll a classification head: the correct pooling behaviour and `last_non_pad_token` handling that `GenericForSequenceClassification` provides for free become something every user has to re-implement while also navigating MoE-specific LoRA pitfalls. Adding `Gemma4(Text)ForSequenceClassification` upstream would spare every downstream user from reinventing the same wheel.
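To make concrete what users would otherwise re-implement, here is a minimal plain-Python sketch of last-non-pad-token pooling — toy data structures rather than tensors, illustrating the logic rather than the library's actual code:

```python
# Minimal sketch of last-non-pad-token pooling: for each sequence, select the
# hidden state at the position of the last non-padding token. This is a plain
# Python stand-in for the tensor logic a classification head needs.
def pool_last_non_pad(hidden_states, attention_mask):
    """hidden_states: [batch][seq_len][hidden]; attention_mask: [batch][seq_len] of 0/1."""
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        # Index of the last position where the mask is 1 (right padding assumed).
        last = max(i for i, m in enumerate(mask) if m == 1)
        pooled.append(states[last])
    return pooled

# Two sequences of length 4 with hidden size 2; the second is right-padded.
hidden = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]],
    [[1.0, 1.1], [1.2, 1.3], [9.0, 9.0], [9.0, 9.0]],
]
mask = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
]
print(pool_last_non_pad(hidden, mask))  # [[0.7, 0.8], [1.2, 1.3]]
```

Getting this wrong (e.g. always taking position -1) silently classifies from padding embeddings, which is exactly the footgun the generic head exists to prevent.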

Your contribution

The implementation should be trivial — Gemma 3 provides the exact template. In `transformers/models/gemma3/modeling_gemma3.py`:

```python
class Gemma3TextForSequenceClassification(GenericForSequenceClassification, Gemma3PreTrainedModel):
    config: Gemma3TextConfig
    input_modalities = ("text",)
```

And its multimodal sibling `Gemma3ForSequenceClassification`, which wires up a `Gemma3Model` + a `nn.Linear(text_config.hidden_size, num_labels)` head. The same pattern applied to Gemma 4 should require only a few lines of code plus:

- Adding the class(es) to `__all__` in `modeling_gemma4.py`
- Registering them in `auto/modeling_auto.py` against `Gemma4Config` (and `Gemma4TextConfig`, if it exists)
I'm happy to open a PR if that would help. The change is mechanical, but I'd like to confirm first which of the two variants the maintainers prefer (multimodal + text-only, or text-only alone) — Gemma 3 shipped both, so matching that seems sensible.
