Feature request
Please add `Gemma4ForSequenceClassification` (and ideally `Gemma4TextForSequenceClassification`) to `transformers.models.gemma4`. The current `__all__` in `modeling_gemma4.py` contains only:

```python
__all__ = [
    "Gemma4AudioModel",
    "Gemma4ForCausalLM",
    "Gemma4ForConditionalGeneration",
    "Gemma4Model",
    "Gemma4PreTrainedModel",
    "Gemma4TextModel",
    "Gemma4VisionModel",
]
```
No classification head variants are exported, which is inconsistent with every prior Gemma release:
Model family | ForSequenceClassification | TextForSequenceClassification
-- | -- | --
gemma | ✅ | N/A (text-only)
gemma2 | ✅ | N/A (text-only)
gemma3 | ✅ | ✅
gemma3n | ✅ | ✅
gemma4 | ❌ | ❌
As a result, `AutoModelForSequenceClassification.from_pretrained("google/gemma-4-E4B", num_labels=3)` raises:

```
ValueError: Unrecognized configuration class <class 'transformers.models.gemma4.configuration_gemma4.Gemma4Config'>
for this kind of AutoModel: AutoModelForSequenceClassification.
```
Motivation
`AutoModelForSequenceClassification` is the standard entry point for fine-tuning causal LMs as classifiers, a common use case across research (domain classification, sentiment, reward modelling) and production (content moderation, intent detection, financial signal prediction). Excluding it from `gemma4` forces users to either:
1. Roll a custom classifier-head wrapper (breaks the `AutoModel` contract and duplicates logic that already exists in `GenericForSequenceClassification`),
2. Fine-tune generatively and parse the argmax over label tokens (worse calibration, loses softmax-head properties), or
3. Skip Gemma 4 entirely and stay on Gemma 3 or a different family.
We are currently working around this in a financial-NLP fine-tuning pipeline by subclassing `GenericForSequenceClassification` and registering the result via `AutoModelForSequenceClassification.register()`. It works, but it is exactly the kind of boilerplate the `AutoModel` registry is supposed to eliminate, and it has to be duplicated by every project that wants to fine-tune Gemma 4 as a classifier.
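For concreteness, a minimal sketch of that workaround is below. The import path for `GenericForSequenceClassification` may vary by transformers version, and the patched class name is ours; only `Gemma4Config` and `Gemma4PreTrainedModel` are confirmed by the module above:

```python
# Sketch of the downstream workaround (not the proposed upstream code).
# Assumes GenericForSequenceClassification lives in transformers.modeling_layers,
# which may differ across transformers versions.
from transformers import AutoModelForSequenceClassification
from transformers.modeling_layers import GenericForSequenceClassification
from transformers.models.gemma4.configuration_gemma4 import Gemma4Config
from transformers.models.gemma4.modeling_gemma4 import Gemma4PreTrainedModel


class PatchedGemma4ForSequenceClassification(GenericForSequenceClassification, Gemma4PreTrainedModel):
    """Local stand-in until Gemma4(Text)ForSequenceClassification lands upstream."""

    config: Gemma4Config


# Make the Auto* entry point resolve Gemma4Config to the patched class.
AutoModelForSequenceClassification.register(Gemma4Config, PatchedGemma4ForSequenceClassification)

model = AutoModelForSequenceClassification.from_pretrained("google/gemma-4-E4B", num_labels=3)
```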
Gemma 4 E4B is a Mixture-of-Experts model (`Gemma4TextExperts` + `Gemma4TextRouter` confirmed in `modeling_gemma4.py`), which makes it particularly inconvenient for users to hand-roll a classification head: the correct pooling behaviour and `last_non_pad_token` handling that `GenericForSequenceClassification` provides for free (sketched below) become something every user has to re-implement while also navigating MoE-specific LoRA pitfalls. Adding `Gemma4(Text)ForSequenceClassification` upstream would spare every downstream user from reinventing the same wheel.
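For reference, the pooling in question is roughly the following last-non-pad-token selection; this is an illustrative restatement of what the generic head does, not the verbatim upstream code:

```python
import torch


def pool_last_non_pad(hidden_states: torch.Tensor, input_ids: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    """Pick each sequence's last non-pad hidden state, as the generic
    sequence-classification head does before its linear score layer."""
    non_pad_mask = (input_ids != pad_token_id).int()
    token_positions = torch.arange(input_ids.shape[-1], device=input_ids.device)
    # Highest position that is still a real token; works for left or right padding.
    last_non_pad = (token_positions * non_pad_mask).argmax(dim=-1)
    batch_rows = torch.arange(hidden_states.shape[0], device=hidden_states.device)
    return hidden_states[batch_rows, last_non_pad]  # shape: (batch, hidden_size)
```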
Your contribution
The implementation should be trivial: Gemma 3 provides the exact template. In `transformers/models/gemma3/modeling_gemma3.py`:

```python
class Gemma3TextForSequenceClassification(GenericForSequenceClassification, Gemma3PreTrainedModel):
    config: Gemma3TextConfig
    input_modalities = ("text",)
```
And its multimodal sibling `Gemma3ForSequenceClassification`, which wires up a `Gemma3Model` + a `nn.Linear(text_config.hidden_size, num_labels)` head. The same pattern applied to Gemma 4 should require only a few lines of code (see the sketch after this list) plus:

1. Adding the class(es) to `__all__` in `modeling_gemma4.py`
2. Registering them in `auto/modeling_auto.py` against `Gemma4Config` (and `Gemma4TextConfig` if it exists)
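A hypothetical sketch of the text-only variant, transcribing the Gemma 3 template above; all Gemma 4 names here are the ones this issue proposes, and `Gemma4TextConfig` plus the `model_type` strings are assumptions:

```python
# Proposed addition to transformers/models/gemma4/modeling_gemma4.py, where
# GenericForSequenceClassification and Gemma4PreTrainedModel are already in scope.
class Gemma4TextForSequenceClassification(GenericForSequenceClassification, Gemma4PreTrainedModel):
    config: Gemma4TextConfig  # assumption: a Gemma4TextConfig exists, as in Gemma 3
    input_modalities = ("text",)


__all__ = [
    # ... existing exports ...
    "Gemma4TextForSequenceClassification",
]

# And in auto/modeling_auto.py, MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES
# would gain entries along these lines (model_type strings assumed):
#     ("gemma4", "Gemma4ForSequenceClassification"),
#     ("gemma4_text", "Gemma4TextForSequenceClassification"),
```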
I'm happy to open a PR if that would help. The change is mechanical but I'd like to confirm first which of the two variants (multimodal-path + text-only, or just text-only) the maintainers prefer — Gemma 3 shipped both, so matching that seems sensible.