Add sequence classification support for small Gemma 3 models #40554

@adamreichold

Feature request

While #39465 added sequence classification support for the larger Gemma 3 models (4b parameters or more) via Gemma3ForSequenceClassification, it did not add a Gemma3TextForSequenceClassification, as proposed for example in #36755 (comment). That text-only class is needed to use the small model variants with 1b and 270m parameters.
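To make the request concrete, here is a minimal, framework-free sketch of what a text-only classification head might compute. It assumes the head would follow the convention of other decoder-only `*ForSequenceClassification` classes in transformers: take the hidden state of the last non-padding token and project it to one logit per label with a bias-free linear layer. The function names, toy hidden states, and weights are hypothetical; a real Gemma3TextForSequenceClassification would wrap the text-only backbone and operate on tensors.

```python
from typing import List, Sequence


def last_non_pad_index(input_ids: Sequence[int], pad_token_id: int) -> int:
    """Return the index of the rightmost token that is not padding."""
    for i in range(len(input_ids) - 1, -1, -1):
        if input_ids[i] != pad_token_id:
            return i
    raise ValueError("sequence contains only padding tokens")


def classify(
    hidden_states: Sequence[Sequence[float]],  # one vector per token (hypothetical backbone output)
    input_ids: Sequence[int],
    score_weight: Sequence[Sequence[float]],   # shape: num_labels x hidden_size (hypothetical weights)
    pad_token_id: int = 0,                     # assumed pad id, for illustration only
) -> List[float]:
    """Pool the last non-padding token and project it to per-label logits."""
    pooled = hidden_states[last_non_pad_index(input_ids, pad_token_id)]
    # Bias-free linear projection: one dot product per label.
    return [sum(w * h for w, h in zip(row, pooled)) for row in score_weight]


if __name__ == "__main__":
    ids = [5, 7, 9, 0, 0]  # two trailing pad tokens
    hidden = [[1.0, 0.0], [0.0, 1.0], [2.0, 3.0], [9.0, 9.0], [9.0, 9.0]]
    weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three labels
    print(classify(hidden, ids, weights))
```

The pooling step is the part that matters for batched inputs: padded positions are skipped, so the logits depend only on real tokens.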

Motivation

The smallest model, with 270m parameters, could be an interesting replacement for older model architectures like DistilBERT in resource-constrained environments. We have seen reasonable performance from full-parameter conversational fine-tuning, but being able to use a task-specific head for sequence classification should improve results considerably.

Your contribution

We could test proposed changes by adapting our existing training scripts, which currently work only with the 4b or larger model variants.
