
[Pipeline] Add top_k, label filtering, box_format and score sorting to ObjectDetectionPipeline#44594

Open
vimal-crypto wants to merge 1 commit into huggingface:main from vimal-crypto:pipeline/object-detection-enhancements

Conversation

@vimal-crypto

What this PR does

This PR brings ObjectDetectionPipeline in line with its sister pipelines (ZeroShotObjectDetectionPipeline, ImageClassificationPipeline) by adding four enhancements to the postprocessing stage.

Changes

1. Score sorting — results are now always returned sorted by descending confidence score, consistent with ZeroShotObjectDetectionPipeline and ImageClassificationPipeline. Previously ObjectDetectionPipeline was the only vision pipeline that returned results in raw model-output order (anchor order), which was unintuitive and undocumented.

2. top_k parameter — allows users to cap the number of returned detections to the N highest-confidence results. Defaults to None (return all detections above threshold), preserving full backward compatibility.

3. labels parameter — accepts a list of class-name strings; only detections whose label appears in the list are returned. This is a novel filtering capability with no prior equivalent in the standard object detection pipeline. Defaults to None (return all detected classes).

4. box_format parameter — controls the coordinate format of the returned bounding boxes. Supported values:

  • "xyxy" (default, backward-compatible): {xmin, ymin, xmax, ymax} in pixels
  • "xywh": {x_center, y_center, width, height} in pixels
  • "normalized": {xmin, ymin, xmax, ymax} as floats in [0, 1] relative to image dimensions

The _get_bounding_box helper is extended to handle all three formats in one place.
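As a rough illustration of the three formats described above, here is a minimal sketch of the conversion logic; `get_bounding_box`, its signature, and the `image_size` argument are assumptions for this example, not the PR's actual `_get_bounding_box` code:

```python
# Hypothetical sketch of the extended bounding-box conversion described in
# this PR. Names and exact behavior are assumptions, not the merged code.
def get_bounding_box(box, box_format="xyxy", image_size=None):
    """Convert an (xmin, ymin, xmax, ymax) pixel box to the requested format.

    box: tuple of four numbers in pixel xyxy order.
    image_size: (width, height); required when box_format="normalized".
    """
    xmin, ymin, xmax, ymax = box
    if box_format == "xyxy":
        # Backward-compatible default: integer pixel corners.
        return {"xmin": int(xmin), "ymin": int(ymin), "xmax": int(xmax), "ymax": int(ymax)}
    if box_format == "xywh":
        # Center point plus width/height, still in pixels.
        return {
            "x_center": (xmin + xmax) / 2,
            "y_center": (ymin + ymax) / 2,
            "width": xmax - xmin,
            "height": ymax - ymin,
        }
    if box_format == "normalized":
        # Corners scaled into [0, 1] by the image dimensions.
        if image_size is None:
            raise ValueError('image_size is required for box_format="normalized"')
        w, h = image_size
        return {"xmin": xmin / w, "ymin": ymin / h, "xmax": xmax / w, "ymax": ymax / h}
    raise ValueError(f"Unsupported box_format: {box_format!r}")
```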

Motivation

  • ZeroShotObjectDetectionPipeline already has top_k and sorted outputs (see postprocess)
  • ImageClassificationPipeline already has top_k and sorted outputs
  • ObjectDetectionPipeline was the only vision pipeline missing these features, creating an inconsistency across the pipeline family
  • box_format and labels are new capabilities addressing common downstream use-cases (COCO eval tools, Roboflow, OpenCV, torchvision draw utilities all expect different box formats; class filtering is a very common need)

Breaking changes

None. All four new parameters are fully optional with safe defaults:

  • top_k=None → no truncation
  • labels=None → no filtering
  • box_format="xyxy" → identical output to before
  • Score sorting is the only behavioral change to default output; results are now consistently ordered from highest to lowest confidence, the order virtually all downstream consumers already expect

Existing slow test expected outputs in test_large_model_pt and test_integration_torch_object_detection are updated to reflect the new sorted order.
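The sections above describe a threshold → label filter → sort → top_k flow over the raw detections. A minimal sketch, under the assumption that the steps compose in that order (`filter_detections` is a hypothetical stand-in, not the PR's `postprocess` method):

```python
# Assumed composition of the new postprocessing steps: threshold first,
# then the labels allowlist, then descending score sort, then top_k.
def filter_detections(detections, threshold=0.5, labels=None, top_k=None):
    # Keep only detections at or above the confidence threshold.
    results = [d for d in detections if d["score"] >= threshold]
    # labels=None means no filtering (backward-compatible default).
    if labels is not None:
        results = [d for d in results if d["label"] in labels]
    # Always return highest-confidence detections first.
    results = sorted(results, key=lambda d: d["score"], reverse=True)
    # top_k=None means no truncation (backward-compatible default).
    if top_k is not None:
        results = results[:top_k]
    return results
```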

Tests added

8 new unit tests covering all new parameters and edge cases:

| Test | Enhancement | Speed |
| --- | --- | --- |
| test_top_k | top_k truncation to N results | Fast |
| test_results_sorted_by_score | score sort order guarantee | Fast |
| test_label_filter | label allowlist filtering (match) | Slow |
| test_label_filter_excludes_all | label allowlist (no match → empty list) | Fast |
| test_box_format_xyxy | xyxy format returns int keys | Fast |
| test_box_format_xywh | xywh format returns correct keys and positive dims | Fast |
| test_box_format_normalized | normalized format returns floats in [0, 1] | Fast |
| test_box_format_invalid_raises | unsupported format raises ValueError | Fast |
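The fast tests above mostly exercise pure postprocessing logic. As one example, the invalid-format check covered by test_box_format_invalid_raises might look roughly like this; `validate_box_format` is a hypothetical stand-in for the pipeline's internal check, not the PR's actual test code:

```python
# Toy stand-in for the box_format validation this PR is described as adding.
SUPPORTED_FORMATS = ("xyxy", "xywh", "normalized")

def validate_box_format(box_format):
    # Reject anything outside the three documented formats.
    if box_format not in SUPPORTED_FORMATS:
        raise ValueError(
            f"box_format must be one of {SUPPORTED_FORMATS}, got {box_format!r}"
        )

def test_box_format_invalid_raises():
    # An unsupported format string should raise ValueError, per the table above.
    try:
        validate_box_format("cxcywh")
    except ValueError:
        return
    raise AssertionError("expected ValueError for unsupported box_format")
```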

Files changed

  • src/transformers/pipelines/object_detection.py
  • tests/pipelines/test_pipelines_object_detection.py
