
[Pipeline] Add top_k, label filtering, box_format and score sorting to ObjectDetectionPipeline#44594

Open
vimal-crypto wants to merge 1 commit into huggingface:main from vimal-crypto:pipeline/object-detection-enhancements

Conversation

@vimal-crypto

What this PR does

This PR brings ObjectDetectionPipeline in line with its sister pipelines (ZeroShotObjectDetectionPipeline, ImageClassificationPipeline) by adding four enhancements to the postprocessing stage.

Changes

1. Score sorting — results are now always returned sorted by descending confidence score, consistent with ZeroShotObjectDetectionPipeline and ImageClassificationPipeline. Previously ObjectDetectionPipeline was the only vision pipeline that returned results in raw model-output order (anchor order), which was unintuitive and undocumented.

2. top_k parameter — allows users to cap the number of returned detections to the N highest-confidence results. Defaults to None (return all detections above threshold), preserving full backward compatibility.

3. labels parameter — accepts a list of class-name strings; only detections whose label appears in the list are returned. This is a novel filtering capability with no prior equivalent in the standard object detection pipeline. Defaults to None (return all detected classes).

4. box_format parameter — controls the coordinate format of the returned bounding boxes. Supported values:

  • "xyxy" (default, backward-compatible): {xmin, ymin, xmax, ymax} in pixels
  • "xywh": {x_center, y_center, width, height} in pixels
  • "normalized": {xmin, ymin, xmax, ymax} as floats in [0, 1] relative to image dimensions

The _get_bounding_box helper is extended to handle all three formats in one place.
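As a rough illustration of the three formats described above, here is a minimal sketch of the conversion logic; `get_bounding_box`, its signature, and the `image_size` argument are assumptions for this example, not the PR's actual `_get_bounding_box` code:

```python
# Hypothetical sketch of the extended bounding-box conversion described in
# this PR. Names and exact behavior are assumptions, not the merged code.
def get_bounding_box(box, box_format="xyxy", image_size=None):
    """Convert an (xmin, ymin, xmax, ymax) pixel box to the requested format.

    box: tuple of four numbers in pixel xyxy order.
    image_size: (width, height); required when box_format="normalized".
    """
    xmin, ymin, xmax, ymax = box
    if box_format == "xyxy":
        # Backward-compatible default: integer pixel corners.
        return {"xmin": int(xmin), "ymin": int(ymin), "xmax": int(xmax), "ymax": int(ymax)}
    if box_format == "xywh":
        # Center point plus width/height, still in pixels.
        return {
            "x_center": (xmin + xmax) / 2,
            "y_center": (ymin + ymax) / 2,
            "width": xmax - xmin,
            "height": ymax - ymin,
        }
    if box_format == "normalized":
        # Corners scaled into [0, 1] by the image dimensions.
        if image_size is None:
            raise ValueError('image_size is required for box_format="normalized"')
        w, h = image_size
        return {"xmin": xmin / w, "ymin": ymin / h, "xmax": xmax / w, "ymax": ymax / h}
    raise ValueError(f"Unsupported box_format: {box_format!r}")
```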

Motivation

  • ZeroShotObjectDetectionPipeline already has top_k and sorted outputs (see postprocess)
  • ImageClassificationPipeline already has top_k and sorted outputs
  • ObjectDetectionPipeline was the only vision pipeline missing these features, creating an inconsistency across the pipeline family
  • box_format and labels are new capabilities addressing common downstream use-cases (COCO eval tools, Roboflow, OpenCV, torchvision draw utilities all expect different box formats; class filtering is a very common need)

Breaking changes

None. All four new parameters are fully optional with safe defaults:

  • top_k=None → no truncation
  • labels=None → no filtering
  • box_format="xyxy" → identical output to before
  • Score sorting is the only behavioral change to default output; results are now consistently ordered from highest to lowest confidence, the order virtually all downstream consumers already expect

Existing slow test expected outputs in test_large_model_pt and test_integration_torch_object_detection are updated to reflect the new sorted order.
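The sections above describe a threshold → label filter → sort → top_k flow over the raw detections. A minimal sketch, under the assumption that the steps compose in that order (`filter_detections` is a hypothetical stand-in, not the PR's `postprocess` method):

```python
# Assumed composition of the new postprocessing steps: threshold first,
# then the labels allowlist, then descending score sort, then top_k.
def filter_detections(detections, threshold=0.5, labels=None, top_k=None):
    # Keep only detections at or above the confidence threshold.
    results = [d for d in detections if d["score"] >= threshold]
    # labels=None means no filtering (backward-compatible default).
    if labels is not None:
        results = [d for d in results if d["label"] in labels]
    # Always return highest-confidence detections first.
    results = sorted(results, key=lambda d: d["score"], reverse=True)
    # top_k=None means no truncation (backward-compatible default).
    if top_k is not None:
        results = results[:top_k]
    return results
```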

Tests added

8 new unit tests covering all new parameters and edge cases:

| Test | Enhancement | Speed |
| --- | --- | --- |
| test_top_k | top_k truncation to N results | Fast |
| test_results_sorted_by_score | score sort order guarantee | Fast |
| test_label_filter | label allowlist filtering (match) | Slow |
| test_label_filter_excludes_all | label allowlist (no match → empty list) | Fast |
| test_box_format_xyxy | xyxy format returns int keys | Fast |
| test_box_format_xywh | xywh format returns correct keys and positive dims | Fast |
| test_box_format_normalized | normalized format returns floats in [0, 1] | Fast |
| test_box_format_invalid_raises | unsupported format raises ValueError | Fast |
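The fast tests above mostly exercise pure postprocessing logic. As one example, the invalid-format check covered by test_box_format_invalid_raises might look roughly like this; `validate_box_format` is a hypothetical stand-in for the pipeline's internal check, not the PR's actual test code:

```python
# Toy stand-in for the box_format validation this PR is described as adding.
SUPPORTED_FORMATS = ("xyxy", "xywh", "normalized")

def validate_box_format(box_format):
    # Reject anything outside the three documented formats.
    if box_format not in SUPPORTED_FORMATS:
        raise ValueError(
            f"box_format must be one of {SUPPORTED_FORMATS}, got {box_format!r}"
        )

def test_box_format_invalid_raises():
    # An unsupported format string should raise ValueError, per the table above.
    try:
        validate_box_format("cxcywh")
    except ValueError:
        return
    raise AssertionError("expected ValueError for unsupported box_format")
```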

Files changed

  • src/transformers/pipelines/object_detection.py
  • tests/pipelines/test_pipelines_object_detection.py
