
Raise clear error for problem_type="single_label_classification" with num_labels=1#45611

Merged
Rocketknight1 merged 4 commits into huggingface:main from gaurav0107:fix-45479-single-label-num-labels-1-error
Apr 24, 2026

Conversation

@gaurav0107
Contributor

What

Adds validation in PreTrainedConfig.__post_init__ that rejects the combination problem_type="single_label_classification" + num_labels=1 with a clear ValueError pointing users to the correct setup.

Closes #45479

Why

Before this change, constructing a sequence-classification model with this combination silently produced a degenerate zero cross-entropy loss (softmax over a single logit is always 1, so CrossEntropyLoss always returns 0). Training appeared to run but accomplished nothing.
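The arithmetic behind the degeneracy can be shown with the standard library alone (the helper name below is illustrative, not from the PR):

```python
import math

# With a single logit z, softmax([z]) = [exp(z) / exp(z)] = [1.0], so the
# per-example cross-entropy -log(softmax(logits)[label]) is -log(1.0) = 0,
# regardless of z or the label value.
def cross_entropy_single_logit(z: float) -> float:
    prob = math.exp(z) / math.exp(z)  # softmax over a one-element vector
    return -math.log(prob)

for z in (-3.0, 0.0, 7.5):
    assert cross_entropy_single_logit(z) == 0.0  # zero loss for every input
```

The gradient of a constant loss is zero everywhere, which is why training appears to run but never updates toward anything.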

Reproducer (confirmed on main, both on shared-loss models like BERT and inlined-loss models like ModernBERT):

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64,
                    num_labels=1, problem_type="single_label_classification")
model = BertForSequenceClassification(config)
model.eval()
out = model(input_ids=torch.tensor([[1, 2, 3, 4]]), labels=torch.tensor([0]))
print(out.loss)  # tensor(0.)
```

Fix

PreTrainedConfig.__post_init__ now raises a ValueError for this combination after num_labels is resolved from id2label. The error message directs users to num_labels=2 (binary classification) or problem_type="regression" (single-output regression).
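A minimal standalone sketch of the check (the exact wording and placement inside PreTrainedConfig.__post_init__ may differ from the merged diff):

```python
from typing import Optional

# Hedged sketch: the real check lives in PreTrainedConfig.__post_init__ and
# runs after num_labels has been resolved from id2label.
def validate_classification_config(problem_type: Optional[str], num_labels: int) -> None:
    if problem_type == "single_label_classification" and num_labels == 1:
        raise ValueError(
            "problem_type='single_label_classification' with num_labels=1 always "
            "produces zero cross-entropy loss. Use num_labels=2 for binary "
            "classification, or problem_type='regression' for a single output."
        )

validate_classification_config("single_label_classification", 2)  # valid, no error
validate_classification_config("regression", 1)                   # valid, no error
```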

The check sits at the config layer so it covers every sequence-classification model uniformly — both models that delegate to ForSequenceClassificationLoss in src/transformers/loss/loss_utils.py and models that inline the loss selection (e.g. ModernBERT, ~40 others).

Coordination / approach

This matches the maintainer direction in this comment on #45479:

"maybe we could raise a clearer error if users set num_labels=1 and problem_type="single_label_classification" to tell them not to do that?"

Tests

New regression test tests/utils/test_configuration_utils.py::ConfigTestUtils::test_single_label_classification_requires_more_than_one_label asserts the degenerate combination raises, and verifies the three valid shapes (num_labels=2 + single_label, num_labels=1 + regression, num_labels=1 with no explicit problem_type) still construct.
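The test's shape can be sketched against a stand-in class (so the sketch runs without the merged fix; the real test constructs actual config objects):

```python
# Stand-in mimicking a config whose constructor validates the combination.
class FakeConfig:
    def __init__(self, num_labels=2, problem_type=None):
        if problem_type == "single_label_classification" and num_labels == 1:
            raise ValueError("single_label_classification requires num_labels > 1")
        self.num_labels = num_labels
        self.problem_type = problem_type

def test_single_label_classification_requires_more_than_one_label():
    # The degenerate combination must raise:
    try:
        FakeConfig(num_labels=1, problem_type="single_label_classification")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for the degenerate combination")
    # The three valid shapes still construct:
    FakeConfig(num_labels=2, problem_type="single_label_classification")
    FakeConfig(num_labels=1, problem_type="regression")
    FakeConfig(num_labels=1)

test_single_label_classification_requires_more_than_one_label()
```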

Verified that the test FAILS on main without the fix and PASSES with it.

Ran locally:

```shell
pytest tests/utils/test_configuration_utils.py
pytest tests/models/bert/test_modeling_bert.py -k "for_sequence or problem_type"
pytest tests/models/modernbert/test_modeling_modernbert.py -k "problem_type or sequence_classification"
pytest tests/models/roberta/test_modeling_roberta.py::RobertaModelTest::test_problem_types
pytest tests/models/albert/test_modeling_albert.py::AlbertModelTest::test_problem_types
```

All pass. `ruff check` and `ruff format --check` are clean on the two touched files.

The existing common test_problem_types continues to pass: it assigns problem_type and num_labels as attributes on an already-constructed config, which bypasses __post_init__ by design, and the test only asserts loss.backward() doesn't error.
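The bypass is ordinary Python semantics: attribute assignment on an already-constructed object never re-runs constructor-time validation. A stand-in illustration (class name is illustrative):

```python
class ValidatedConfig:
    """Stand-in for a config that validates only at construction time."""
    def __init__(self, num_labels=2, problem_type=None):
        if problem_type == "single_label_classification" and num_labels == 1:
            raise ValueError("degenerate combination")
        self.num_labels = num_labels
        self.problem_type = problem_type

cfg = ValidatedConfig()                           # valid construction, check runs
cfg.num_labels = 1                                # plain attribute writes...
cfg.problem_type = "single_label_classification"  # ...skip the __init__ check
assert cfg.num_labels == 1                        # no error was raised
```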

Compatibility

  • Valid configurations are unaffected.
  • Save/load round-trip of valid configs verified.
  • A saved Hub config containing this exact bad combination will now fail to load with a clear error. Any such checkpoint can never have been trained successfully (loss was always 0), so no working workflow regresses.

AI disclosure

AI-assisted: the fix was drafted with Claude, then human-reviewed end-to-end including the reproducer, diff, and test outputs listed above. Disclosed per CONTRIBUTING.md "AI-assisted and agentic contributions".

… num_labels=1

This combination is mathematically degenerate: applying cross-entropy loss to a
single logit always yields zero loss, so training silently accomplishes nothing.
Validate the combination in PreTrainedConfig.__post_init__ so users get a clear
error at config construction with a pointer to the correct setup (num_labels=2
for binary classification, or problem_type="regression" for a single-output
regression head).

Closes huggingface#45479
@Rocketknight1 (Member) left a comment


Looks good, but the code agent was a little verbose! I'll trim it down

Comment thread src/transformers/configuration_utils.py Outdated
Comment thread tests/utils/test_configuration_utils.py Outdated
Comment thread src/transformers/configuration_utils.py
@Rocketknight1 enabled auto-merge April 24, 2026 11:40
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Rocketknight1 added this pull request to the merge queue Apr 24, 2026
@github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 24, 2026
@gaurav0107
Contributor Author

gaurav0107 commented Apr 24, 2026

Hi @Rocketknight1,
Would you mind taking a look at the CI results and helping me understand what caused the failure? The tests_processors job is reporting a worker crash (crashed and worker restarting disabled, exit status 1).

@Rocketknight1
Member

@gaurav0107 probably just an intermittent CI error, I'll throw it into the queue again

@Rocketknight1 added this pull request to the merge queue Apr 24, 2026
Merged via the queue into huggingface:main with commit c472755 Apr 24, 2026
28 checks passed


Development

Successfully merging this pull request may close these issues.

problem_type="single_label_classification" with num_labels=1 leads to degenerate zero loss across multiple sequence-classification models

3 participants