[RFC] NMS API Change #2535

@kevinthesun

To support the GluonCV object detection models, the nms operator API needs to change.
The old API is nms(data, valid_count, overlap_threshold, force_suppress, topk); the new API is non_max_suppression(data, valid_count, return_indices, iou_threshold, force_suppress, topk, id_axis, invalid_to_bottom).

  • overlap_threshold is renamed to iou_threshold to match the intersection over union (IoU) terminology used in object detection.
  • id_axis is the axis of class categories.
  • invalid_to_bottom decides whether to move invalid boxes to the bottom.
  • return_indices indicates whether to return the boxes or the box indices.
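
For readers less familiar with the term, here is a minimal sketch of the IoU computation that iou_threshold is compared against. It assumes boxes are given as (left, top, right, bottom); the function name iou is illustrative and not part of the proposed API.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (left, top, right, bottom)."""
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) boxes are treated as having no overlap.
    return inter / union if union > 0 else 0.0
```

Two boxes whose IoU exceeds iou_threshold are treated as overlapping, and the lower-scoring one is suppressed.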

This new API can support both the legacy MXNet SSD model and the GluonCV box_nms op.

Some investigation of NMS implementations in other frameworks:

Tensorflow and Pytorch:
non_max_suppression(
    boxes,
    scores,
    max_output_size,
    iou_threshold=0.5,
    score_threshold=float('-inf'),
)
Note that this NMS operates on a single instance; boxes and scores do not include a batch axis:
boxes: A 2-D float Tensor of shape [num_boxes, 4].
scores: A 1-D float Tensor of shape [num_boxes] representing a single score corresponding to each box (each row of boxes).
The output is the selected indices, whose length varies with the input data:
selected_indices: A 1-D integer Tensor of shape [M] representing the selected indices from the boxes tensor, where M <= max_output_size.
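
To make the variable-length output concrete, here is a rough pure-Python sketch of greedy NMS with the same parameter names as the signature above. It is an illustration only, not the TensorFlow or PyTorch implementation; the helper names iou and nms_indices and the (left, top, right, bottom) box layout are assumptions.

```python
def iou(a, b):
    """IoU of two boxes given as (left, top, right, bottom)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_indices(boxes, scores, max_output_size,
                iou_threshold=0.5, score_threshold=float("-inf")):
    """Return indices of kept boxes, highest score first (length M <= max_output_size)."""
    # Drop low-scoring boxes, then visit the rest in descending score order.
    order = sorted(
        (i for i in range(len(boxes)) if scores[i] > score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    selected = []
    for i in order:
        if len(selected) >= max_output_size:
            break
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in selected):
            selected.append(i)
    return selected


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms_indices(boxes, scores, max_output_size=10))  # [0, 2]
```

The length of the returned list depends on how many boxes survive suppression, which is why the output shape cannot be known ahead of time.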

Keras:
DecodeDetections Layer(
    confidence_thresh=0.01,
    iou_threshold=0.45,
    top_k=200,
    nms_max_output_size=400,
    coords='centroids',
    normalize_coords=True,
    img_height=None,
    img_width=None,
)
Input shape:
    3D tensor of shape (batch_size, n_boxes, n_classes + 12).
Output shape:
    3D tensor of shape (batch_size, top_k, 6).
This layer contains not only NMS but also some other preprocessing steps.

Proposed TVM non_max_suppression(
    data,
    valid_counts,
    max_output_size=-1,
    iou_threshold=0.5,
    force_suppress=False,
    top_k=-1,
    id_index=0,
    return_indices=True,
    invalid_to_bottom=True,
)
data : tvm.Tensor
    3-D tensor with shape [batch_size, num_anchors, 6].
    The last dimension should be in the format
    [class_id, score, box_left, box_top, box_right, box_bottom].
valid_count : tvm.Tensor
    1-D tensor for valid number of boxes.
out : tvm.Tensor
    3-D tensor with shape [batch_size, num_anchors, 6].

One key difference between the TVM implementation and the TF/PT implementations is that TVM always returns a fixed-shape output and pads invalid boxes with -1, while TF/PT return a variable-shape tensor depending on the input data values.
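
As a rough illustration of the fixed-shape behaviour, here is a pure-Python sketch that keeps the [batch_size, num_anchors, 6] shape and fills suppressed or invalid rows with -1. It is not the TVM implementation. The function name padded_nms is hypothetical, and the sketch assumes that the first valid_count[b] rows of each batch are the valid ones, that the score sits at index 1 and the coordinates at indices 2 to 5, that force_suppress=False restricts suppression to boxes of the same class, and that the behaviour corresponds to return_indices=False with invalid_to_bottom=True.

```python
def iou(a, b):
    """IoU of two boxes given as (left, top, right, bottom)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def padded_nms(data, valid_count, iou_threshold=0.5,
               force_suppress=False, top_k=-1, id_index=0):
    """Hypothetical sketch: NMS whose output has the same shape as `data`."""
    out = []
    for rows, n_valid in zip(data, valid_count):
        # Assumption: only the first n_valid rows are valid boxes.
        candidates = sorted(rows[:n_valid], key=lambda r: r[1], reverse=True)
        if top_k > 0:
            candidates = candidates[:top_k]

        kept = []
        for row in candidates:
            suppressed = any(
                # Assumption: without force_suppress, only same-class boxes compete.
                (force_suppress or k[id_index] == row[id_index])
                and iou(row[2:6], k[2:6]) > iou_threshold
                for k in kept
            )
            if not suppressed:
                kept.append(list(row))

        # Pad with -1 rows so the output keeps num_anchors entries per batch.
        padding = [[-1.0] * 6 for _ in range(len(rows) - len(kept))]
        out.append(kept + padding)
    return out
```

Because every batch entry is padded back to num_anchors rows, the output shape is known before the data is seen; the consumer has to skip rows filled with -1.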

@zhreshold @tqchen @Laurawly @vinx13 Do you have concerns about naming or other aspects?
