[Pipeline] Add top_k, label filtering, box_format and score sorting to ObjectDetectionPipeline #44594
Open
vimal-crypto wants to merge 1 commit into huggingface:main from
Conversation
This PR brings ObjectDetectionPipeline in line with its sister pipelines
(ZeroShotObjectDetectionPipeline, ImageClassificationPipeline) by adding
four enhancements:
1. Score sorting: results are now always returned sorted by descending
confidence score, consistent with ZeroShotObjectDetectionPipeline and
ImageClassificationPipeline.
2. top_k parameter: allows users to cap the number of returned detections
to the N highest-confidence results.
3. labels parameter: accepts a list of class-name strings; only detections
whose label appears in the list are returned. This is a novel filtering
capability with no prior equivalent in the standard detection pipeline.
4. box_format parameter: controls the coordinate format of the returned
bounding boxes. Supported values:
- 'xyxy' (default): {xmin, ymin, xmax, ymax} in pixels (backward compat)
- 'xywh': {x_center, y_center, width, height} in pixels
- 'normalized': {xmin, ymin, xmax, ymax} as floats in [0, 1]
All new parameters are optional with safe defaults, preserving 100%
backward compatibility. _get_bounding_box is extended to handle all three
formats in one place.
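The three output formats can be sketched in plain Python. The function and argument names below are illustrative only (the PR's actual helper is `_get_bounding_box`, whose exact signature is not shown here):

```python
def convert_box(box, box_format="xyxy", image_width=None, image_height=None):
    """Convert an (xmin, ymin, xmax, ymax) pixel box to the requested format.

    Illustrative sketch of the conversion logic, not the PR's implementation.
    """
    xmin, ymin, xmax, ymax = box
    if box_format == "xyxy":
        # Default: corner coordinates in pixels, identical to the old output.
        return {"xmin": xmin, "ymin": ymin, "xmax": xmax, "ymax": ymax}
    if box_format == "xywh":
        # Center coordinates plus width/height, still in pixels.
        return {
            "x_center": (xmin + xmax) / 2,
            "y_center": (ymin + ymax) / 2,
            "width": xmax - xmin,
            "height": ymax - ymin,
        }
    if box_format == "normalized":
        # Corner coordinates scaled into [0, 1] by the image dimensions.
        return {
            "xmin": xmin / image_width,
            "ymin": ymin / image_height,
            "xmax": xmax / image_width,
            "ymax": ymax / image_height,
        }
    raise ValueError(f"Unsupported box_format: {box_format!r}")
```

For example, `convert_box((10, 20, 50, 60), "xywh")` yields `{"x_center": 30.0, "y_center": 40.0, "width": 40, "height": 40}`.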
8 new unit tests added covering:
- top_k truncation
- score sort order guarantee
- label allowlist filtering (match and no-match)
- all three box_format values
- invalid box_format raises ValueError
Fixes inconsistency where ObjectDetectionPipeline was the only vision
pipeline without top_k or sorted outputs.
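The combined sort/filter/truncate behavior described above can be sketched in plain Python. The function name and detection shape here are illustrative; the real logic lives in the pipeline's postprocessing step:

```python
def filter_detections(detections, top_k=None, labels=None):
    """Sort detections by descending score, apply an optional label
    allowlist, then optionally keep only the top_k results.

    Illustrative sketch of the proposed behavior, not the pipeline's code.
    """
    # Results are always sorted by descending confidence first.
    results = sorted(detections, key=lambda d: d["score"], reverse=True)
    # Optional allowlist of class-name strings.
    if labels is not None:
        results = [d for d in results if d["label"] in labels]
    # Optional cap on the number of returned detections.
    if top_k is not None:
        results = results[:top_k]
    return results
```

With `top_k=None` and `labels=None` (the defaults), every detection above the score threshold is returned, only reordered.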
This was referenced Apr 29, 2026
## What this PR does

This PR brings `ObjectDetectionPipeline` in line with its sister pipelines (`ZeroShotObjectDetectionPipeline`, `ImageClassificationPipeline`) by adding four enhancements to the postprocessing stage.

### Changes
1. **Score sorting** — results are now always returned sorted by descending confidence score, consistent with `ZeroShotObjectDetectionPipeline` and `ImageClassificationPipeline`. Previously `ObjectDetectionPipeline` was the only vision pipeline that returned results in raw model-output order (anchor order), which was unintuitive and undocumented.
2. **`top_k` parameter** — allows users to cap the number of returned detections to the N highest-confidence results. Defaults to `None` (return all detections above `threshold`), preserving full backward compatibility.
3. **`labels` parameter** — accepts a list of class-name strings; only detections whose label appears in the list are returned. This is a novel filtering capability with no prior equivalent in the standard object detection pipeline. Defaults to `None` (return all detected classes).
4. **`box_format` parameter** — controls the coordinate format of the returned bounding boxes. Supported values:
   - `"xyxy"` (default, backward-compatible): `{xmin, ymin, xmax, ymax}` in pixels
   - `"xywh"`: `{x_center, y_center, width, height}` in pixels
   - `"normalized"`: `{xmin, ymin, xmax, ymax}` as floats in `[0, 1]` relative to image dimensions

The `_get_bounding_box` helper is extended to handle all three formats in one place.

### Motivation
- `ZeroShotObjectDetectionPipeline` already has `top_k` and sorted outputs (see `postprocess`)
- `ImageClassificationPipeline` already has `top_k` and sorted outputs
- `ObjectDetectionPipeline` was the only vision pipeline missing these features, creating an inconsistency across the pipeline family
- `box_format` and `labels` are new capabilities addressing common downstream use cases (COCO eval tools, Roboflow, OpenCV, and torchvision draw utilities all expect different box formats; class filtering is a very common need)

### Breaking changes
None. All three new parameters are fully optional with safe defaults:

- `top_k=None` → no truncation
- `labels=None` → no filtering
- `box_format="xyxy"` → identical output to before

Existing slow-test expected outputs in `test_large_model_pt` and `test_integration_torch_object_detection` are updated to reflect the new sorted order.

### Tests added
8 new unit tests covering all new parameters and edge cases:

- `test_top_k`
- `test_results_sorted_by_score`
- `test_label_filter`
- `test_label_filter_excludes_all`
- `test_box_format_xyxy`
- `test_box_format_xywh`
- `test_box_format_normalized`
- `test_box_format_invalid_raises`

### Files changed

- `src/transformers/pipelines/object_detection.py`
- `tests/pipelines/test_pipelines_object_detection.py`
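As a sketch of the error-path test (`test_box_format_invalid_raises`), the check below uses a standalone validator for illustration; the actual test exercises the pipeline itself, and the helper name here is hypothetical:

```python
VALID_BOX_FORMATS = ("xyxy", "xywh", "normalized")  # formats proposed in this PR

def validate_box_format(box_format):
    """Reject unknown box_format values early (illustrative helper)."""
    if box_format not in VALID_BOX_FORMATS:
        raise ValueError(
            f"box_format must be one of {VALID_BOX_FORMATS}, got {box_format!r}"
        )

def test_box_format_invalid_raises():
    # An unsupported format string must raise ValueError, per the PR's tests.
    try:
        validate_box_format("cxcywh")
    except ValueError:
        return  # expected
    raise AssertionError("invalid box_format did not raise ValueError")
```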