Fix AttributeError in run_classification.py when detecting multi-label data #43967

Open
shtse8 wants to merge 1 commit into huggingface:main from shtse8:fix-run-classification-multilabel-detection

Conversation

shtse8 commented Feb 12, 2026

What does this PR do?

Fixes the AttributeError: 'List' object has no attribute 'dtype' crash in run_classification.py when loading JSON data with list-type labels for multi-label classification (reported in #43116).

Problem

When label data is in list format (e.g. "labels": ["4359", "5181"]), the HuggingFace datasets library infers the feature type as Sequence/List, which does not have a dtype attribute. The existing code crashes in two places:

  1. is_regression check (line 416): raw_datasets["train"].features["label"].dtype in ["float32", "float64"] raises AttributeError
  2. Multi-label detection (line 442): ...features["label"].dtype == "list" raises AttributeError
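
The failure mode can be illustrated without the datasets library. The sketch below uses two hypothetical stand-in classes (ScalarFeature and ListFeature are illustration names, not real datasets types) to show why an unguarded .dtype access fails once the label column is a list container:

```python
from dataclasses import dataclass


# Hypothetical stand-ins for feature objects; NOT the real datasets classes.
@dataclass
class ScalarFeature:
    dtype: str  # scalar features expose a dtype string


@dataclass
class ListFeature:
    feature: ScalarFeature  # the container itself has no dtype attribute


def is_regression_unsafe(label_feature) -> bool:
    # Same shape as the original check: direct attribute access.
    return label_feature.dtype in ["float32", "float64"]


features = {"label": ListFeature(feature=ScalarFeature(dtype="string"))}

try:
    is_regression_unsafe(features["label"])
    crashed = False
except AttributeError as err:
    # The list container has no dtype, so the check never completes.
    crashed = True
    print(f"AttributeError: {err}")
```

Scalar label columns still work with the unguarded check; only list-type labels reach the except branch.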

Fix

  1. Use getattr(label_feature, "dtype", None) for the regression check, so list-type features safely return None (not regression)
  2. Use isinstance(feature, datasets.Sequence) for multi-label detection instead of checking .dtype == "list"
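
A minimal sketch of the two guards, again using hypothetical stand-in classes rather than the real library types. In the actual script the isinstance target is datasets.Sequence, as stated in the fix description; ListFeature here only plays that role for illustration:

```python
from dataclasses import dataclass


# Hypothetical stand-ins for feature objects; NOT the real datasets classes.
@dataclass
class ScalarFeature:
    dtype: str


@dataclass
class ListFeature:
    feature: ScalarFeature  # no dtype attribute on the container


def is_regression(label_feature) -> bool:
    # getattr with a default returns None for list-type features,
    # and None is never in the list of float dtypes.
    return getattr(label_feature, "dtype", None) in ["float32", "float64"]


def is_multi_label(label_feature) -> bool:
    # Type check instead of comparing a dtype string to "list".
    return isinstance(label_feature, ListFeature)


list_label = ListFeature(feature=ScalarFeature(dtype="string"))
float_label = ScalarFeature(dtype="float32")

print(is_regression(list_label), is_multi_label(list_label))
print(is_regression(float_label), is_multi_label(float_label))
```

With these guards a list-type label is classified as multi-label and not as regression, while a float label keeps its existing regression behavior.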

Reproduction

# With JSON data like: [{"text": "...", "labels": ["a", "b"]}, ...]
# Before fix:
#   AttributeError: 'List' object has no attribute 'dtype'
# After fix:
#   Correctly detects multi-label classification

Note: This PR only fixes the crash in multi-label detection. The separate thresholding issue for empty predictions is tracked in #43135.

When loading JSON data with list-type labels for multi-label
classification, the label feature is a datasets.Sequence/List object
which does not have a 'dtype' attribute, causing:

  AttributeError: 'List' object has no attribute 'dtype'

Two fixes:
1. Use getattr(feature, 'dtype', None) for the is_regression check
   so list-type features don't crash (they're not regression)
2. Use isinstance(feature, datasets.Sequence) for the multi-label
   detection instead of checking .dtype == 'list'

Fixes part of huggingface#43116