Feature request
While #39465 added support for the larger Gemma 3 models (4b parameters or more) via Gemma3ForSequenceClassification, it did not add a Gemma3TextForSequenceClassification, as proposed for example in #36755 (comment). However, such a class is necessary to use the smaller, text-only model variants with 1b and 270m parameters.
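To make the gap concrete, here is a minimal sketch of the usage we would like to see work once a Gemma3TextForSequenceClassification exists and is registered with the auto classes. The checkpoint id google/gemma-3-270m, the number of labels, and the dtype are only illustrative assumptions; today this call fails because no sequence-classification head is mapped for the text-only Gemma 3 configs.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed id of the smallest text-only Gemma 3 checkpoint.
model_id = "google/gemma-3-270m"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Desired behavior: load the text-only backbone with a classification head,
# analogous to what Gemma3ForSequenceClassification provides for the 4b+ models.
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=2,                 # e.g. a binary classification task
    torch_dtype=torch.bfloat16,
)
# Decoder-only classification heads pool on the last non-padding token,
# so the pad token must be set for batched inputs.
model.config.pad_token_id = tokenizer.pad_token_id

inputs = tokenizer(["an example sentence"], return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits   # shape: (batch_size, num_labels)
```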
Motivation
In particular, the smallest model with 270m parameters could be an interesting replacement for older architectures such as DistilBERT in resource-constrained environments. We have seen reasonable performance with full-parameter conversational fine-tuning, but being able to use a task-specific sequence-classification head would improve things considerably.
Your contribution
We could test proposed changes by adapting our existing training scripts, which currently only work with the 4b-parameter or larger model variants.