[RFC] NMS API Change #2535

To support the GluonCV object detection models, the NMS operator API needs to be changed.
The old API is nms(data, valid_count, overlap_threshold, force_suppress, topk); the new API is non_max_suppression(data, valid_count, return_indices, iou_threshold, force_suppress, topk, id_axis, invalid_to_bottom).

- overlap_threshold is renamed to iou_threshold to align with intersection over union (IoU), the standard overlap measure in object detection.
- id_axis is the axis of class categories.
- invalid_to_bottom decides whether to move invalid boxes to the bottom of the output.
- return_indices indicates whether to return the boxes themselves or their indices.

This new API supports both the MXNet legacy SSD model and the GluonCV box_nms op.
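For reference, iou_threshold is compared against the IoU of two boxes. A minimal sketch, assuming boxes in the [left, top, right, bottom] corner format used by the proposed data layout and continuous coordinates (no "+1" pixel convention); the function name iou is illustrative only:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as [left, top, right, bottom]."""
    # Width/height of the overlap rectangle, clamped to 0 when the boxes are disjoint.
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes shifted by 1 along x: overlap 2, union 4 + 4 - 2 = 6, so IoU = 1/3.
print(iou([0, 0, 2, 2], [1, 0, 3, 2]))
```

A box is suppressed when its IoU with a higher-scoring kept box exceeds iou_threshold.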

A survey of NMS implementations in other frameworks:
```
Tensorflow and Pytorch:
non_max_suppression(
    boxes,
    scores,
    max_output_size,
    iou_threshold=0.5,
    score_threshold=float('-inf'),
)
Note that this NMS works on a single instance, so boxes/scores do not include a batch axis:
boxes: A 2-D float Tensor of shape [num_boxes, 4].
scores: A 1-D float Tensor of shape [num_boxes] representing a single score corresponding to each box (each row of boxes).
The output is the selected indices, whose length varies with the input data:
selected_indices: A 1-D integer Tensor of shape [M] representing the selected indices from the boxes tensor, where M <= max_output_size.

Keras:
DecodeDetections Layer(
    confidence_thresh=0.01,
    iou_threshold=0.45,
    top_k=200,
    nms_max_output_size=400,
    coords='centroids',
    normalize_coords=True,
    img_height=None,
    img_width=None,
)
Input shape:
    3D tensor of shape (batch_size, n_boxes, n_classes + 12).
Output shape:
    3D tensor of shape (batch_size, top_k, 6).
This layer does more than NMS; it also includes other preprocessing steps.

Proposed TVM non_max_suppression(
    data,
    valid_count,
    max_output_size=-1,
    iou_threshold=0.5,
    force_suppress=False,
    top_k=-1,
    id_index=0,
    return_indices=True,
    invalid_to_bottom=True,
)
data : tvm.Tensor
    3-D tensor with shape [batch_size, num_anchors, 6].
    The last dimension should be in the format
    [class_id, score, box_left, box_top, box_right, box_bottom].
valid_count : tvm.Tensor
    1-D tensor with the number of valid boxes in each batch.
out : tvm.Tensor
    3-D tensor with shape [batch_size, num_anchors, 6].
```
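To make the proposed parameters concrete, here is a pure-Python reference sketch for a single batch element. It is a hypothetical illustration, not TVM's implementation, and several semantics are assumptions: only the first valid_count rows are considered; the score sits at index 1; top_k limits candidates before suppression; max_output_size=-1 means no limit; with invalid_to_bottom=False, suppressed rows keep their sorted position and are filled with -1; and the returned indices are padded with -1. The name nms_reference is illustrative.

```python
def _iou(a, b):
    # a, b: [left, top, right, bottom]
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_reference(data, valid_count, max_output_size=-1, iou_threshold=0.5,
                  force_suppress=False, top_k=-1, id_index=0,
                  return_indices=True, invalid_to_bottom=True):
    """Hypothetical single-batch NMS with a fixed-length output padded with -1.

    data: num_anchors rows of [class_id, score, left, top, right, bottom].
    """
    num_anchors = len(data)
    score_index = 1  # assumption: score directly follows class_id
    # Candidates: the first valid_count rows, highest score first.
    order = sorted(range(valid_count), key=lambda i: data[i][score_index], reverse=True)
    if top_k > 0:
        order = order[:top_k]

    keep, suppressed = [], set()
    for pos, i in enumerate(order):
        if i in suppressed:
            continue
        keep.append(i)
        if 0 < max_output_size <= len(keep):
            break  # output budget reached; remaining candidates are dropped
        for j in order[pos + 1:]:
            same_class = data[i][id_index] == data[j][id_index]
            # Across classes only when force_suppress is set.
            if (force_suppress or same_class) and _iou(data[i][2:6], data[j][2:6]) > iou_threshold:
                suppressed.add(j)

    kept = set(keep)
    if invalid_to_bottom:
        slots = list(keep)  # kept boxes packed at the top
    else:
        slots = [i if i in kept else -1 for i in order]  # holes stay in place
    slots += [-1] * (num_anchors - len(slots))  # fixed output length

    if return_indices:
        return slots
    return [list(data[i]) if i >= 0 else [-1.0] * 6 for i in slots]


boxes = [
    [0, 0.9, 0.0, 0, 2.0, 2],  # highest score
    [0, 0.8, 0.1, 0, 2.1, 2],  # same class, IoU ~0.9 with box 0
    [1, 0.7, 0.0, 0, 2.0, 2],  # same location, different class
    [0, 0.6, 5.0, 5, 6.0, 6],  # far from every other box
]
print(nms_reference(boxes, valid_count=4))  # [0, 2, 3, -1]
```

With force_suppress=True, box 2 is also suppressed by box 0 despite its different class, giving [0, 3, -1, -1].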
One key difference between the TVM and TF/PyTorch implementations is that TVM always returns a fixed-shape output and pads invalid boxes with -1, while TF/PyTorch return a variable-shape tensor whose size depends on the input data values.
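Converting between the two conventions is mechanical. A minimal sketch with hypothetical helper names (to_fixed_shape, to_variable_shape), assuming -1 is the padding value and never a real index:

```python
def to_fixed_shape(selected_indices, num_anchors, pad_value=-1):
    """Pad a variable-length index list (TF/PyTorch style) to a fixed length (TVM style)."""
    if len(selected_indices) > num_anchors:
        raise ValueError("more selected indices than anchors")
    return list(selected_indices) + [pad_value] * (num_anchors - len(selected_indices))


def to_variable_shape(padded_indices, pad_value=-1):
    """Drop padding entries, keeping valid indices in their original order.

    Works whether padding sits at the bottom or is interleaved.
    """
    return [i for i in padded_indices if i != pad_value]


print(to_fixed_shape([0, 2, 3], 6))      # [0, 2, 3, -1, -1, -1]
print(to_variable_shape([0, -1, 2, 3]))  # [0, 2, 3]
```

One likely motivation for the fixed shape is that a compiler can then allocate static buffers and keep downstream shapes static, at the cost of carrying padding rows that later ops must skip.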

@zhreshold @tqchen @Laurawly @vinx13 Do you have concerns about naming or other aspects?
