Feature request
While #39465 added support for the larger Gemma 3 models (4b parameters or more) via Gemma3ForSequenceClassification, it did not add a Gemma3TextForSequenceClassification, as proposed for example in #36755 (comment). However, such a class is necessary to use the smaller, text-only model variants with 1b and 270m parameters.
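To make the gap concrete, here is a minimal sketch of the usage we would like to see work once a Gemma3TextForSequenceClassification exists and is registered with the auto classes. The checkpoint id google/gemma-3-270m, the number of labels, and the dtype are only illustrative assumptions; today this call fails because no sequence-classification head is mapped for the text-only Gemma 3 configs.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed id of the smallest text-only Gemma 3 checkpoint.
model_id = "google/gemma-3-270m"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Desired behavior: load the text-only backbone with a classification head,
# analogous to what Gemma3ForSequenceClassification provides for the 4b+ models.
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=2,                 # e.g. a binary classification task
    torch_dtype=torch.bfloat16,
)
# Decoder-only classification heads pool on the last non-padding token,
# so the pad token must be set for batched inputs.
model.config.pad_token_id = tokenizer.pad_token_id

inputs = tokenizer(["an example sentence"], return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits   # shape: (batch_size, num_labels)
```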
Motivation
In particular, the smallest model with 270m parameters could be an interesting replacement for older architectures such as DistilBERT in resource-constrained environments. We have seen reasonable performance with full-parameter conversational fine-tuning, but being able to use a task-specific sequence-classification head would improve things considerably.
Your contribution
We could test proposed changes by adapting our existing training scripts, which currently only work with the 4b-parameter or larger model variants.