Closed
To support GluonCV object detection models, the nms operator API needs to be changed.
The old API is nms(data, valid_count, overlap_threshold, force_suppress, topk); the new API is non_max_suppression(data, valid_count, return_indices, iou_threshold, force_suppress, topk, id_axis, invalid_to_bottom).
- overlap_threshold is renamed to iou_threshold to align with intersection over union (IoU) as used in the object detection context.
- id_axis is the axis of the class categories.
- invalid_to_bottom decides whether to move invalid boxes to the bottom.
- return_indices indicates whether to return boxes or box indices.
The new API can support both the MXNet legacy SSD model and the GluonCV box_nms op.
Some investigation of NMS implementations in other frameworks:
TensorFlow and PyTorch:
non_max_suppression(
boxes,
scores,
max_output_size,
iou_threshold=0.5,
score_threshold=float('-inf'),
)
Note that this NMS works on a single instance, and boxes/scores do not include a batch axis:
boxes: A 2-D float Tensor of shape [num_boxes, 4].
scores: A 1-D float Tensor of shape [num_boxes] representing a single score corresponding to each box (each row of boxes).
The output is the selected indices, whose length varies depending on the input data:
selected_indices: A 1-D integer Tensor of shape [M] representing the selected indices from the boxes tensor, where M <= max_output_size.
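The single-instance behavior described above can be sketched in plain Python. This is a minimal greedy NMS under stated assumptions, not the TensorFlow or PyTorch implementation: boxes are assumed to be in [x1, y1, x2, y2] corner format, and the helper names iou and greedy_nms are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes in [x1, y1, x2, y2] format (assumed)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, max_output_size, iou_threshold=0.5,
               score_threshold=float("-inf")):
    """Return indices of kept boxes; the result has at most max_output_size entries."""
    # Visit candidates from highest to lowest score, dropping low-score boxes.
    order = sorted((i for i, s in enumerate(scores) if s > score_threshold),
                   key=lambda i: scores[i], reverse=True)
    selected = []
    for i in order:
        if len(selected) >= max_output_size:
            break
        # Keep box i only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in selected):
            selected.append(i)
    return selected


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]  # shape [num_boxes, 4]
scores = [0.9, 0.8, 0.7]                                    # shape [num_boxes]
selected = greedy_nms(boxes, scores, max_output_size=10)
print(selected)
```

The length of the returned list depends on the box values themselves, which is the variable-shape behavior noted above.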
Keras:
DecodeDetections Layer(
confidence_thresh=0.01,
iou_threshold=0.45,
top_k=200,
nms_max_output_size=400,
coords='centroids',
normalize_coords=True,
img_height=None,
img_width=None,
)
Input shape:
3D tensor of shape (batch_size, n_boxes, n_classes + 12).
Output shape:
3D tensor of shape (batch_size, top_k, 6).
This layer does not only perform NMS; it also includes some other preprocessing steps.
Proposed TVM API:
non_max_suppression(
data,
valid_count,
max_output_size=-1,
iou_threshold=0.5,
force_suppress=False,
top_k=-1,
id_index=0,
return_indices=True,
invalid_to_bottom=True,
)
data : tvm.Tensor
3-D tensor with shape [batch_size, num_anchors, 6].
The last dimension should be in format of
[class_id, score, box_left, box_top, box_right, box_bottom].
valid_count : tvm.Tensor
1-D tensor for valid number of boxes.
out : tvm.Tensor
3-D tensor with shape [batch_size, num_anchors, 6].
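To make the data layout concrete, here is a small sketch that packs TF-style per-box inputs (boxes, scores) plus class ids into the [batch_size, num_anchors, 6] layout described above. The helper name pack_detections is hypothetical, and treating every packed row as valid when filling valid_count is an assumption for illustration only.

```python
def pack_detections(class_ids, scores, boxes):
    """Pack one image's detections into rows of
    [class_id, score, box_left, box_top, box_right, box_bottom]."""
    return [[float(c), float(s)] + [float(v) for v in box]
            for c, s, box in zip(class_ids, scores, boxes)]


class_ids = [0, 0, 1]
scores = [0.9, 0.8, 0.7]
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]

# A batch of one image: nested lists standing in for a 3-D tensor.
data = [pack_detections(class_ids, scores, boxes)]
# Assumption: all packed rows count as valid boxes.
valid_count = [len(rows) for rows in data]
print(data[0][2], valid_count)
```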
One key difference between the TVM implementation and the TF/PT implementations is that TVM always returns a fixed-shape output and pads invalid boxes with -1, while TF/PT return a variable-shape tensor depending on the input data values.
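The fixed-shape behavior can be illustrated with a plain-Python sketch. This is not the TVM implementation: the function name padded_nms is hypothetical, only a subset of the proposed parameters is modeled, and it assumes that the first valid_count[b] rows of each batch entry are the valid candidates and that kept boxes are moved to the top (as with invalid_to_bottom=True).

```python
def iou(a, b):
    """Intersection over union of two [left, top, right, bottom] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def padded_nms(data, valid_count, iou_threshold=0.5, force_suppress=False,
               id_index=0):
    """Return a result with the same shape as data; non-kept rows are all -1."""
    out = []
    for batch, n_valid in zip(data, valid_count):
        # Assumption: the first n_valid rows are the valid candidates.
        rows = sorted(batch[:n_valid], key=lambda r: r[1], reverse=True)
        kept = []
        for row in rows:
            # Without force_suppress, only boxes of the same class suppress each other.
            suppressed = any(
                (force_suppress or k[id_index] == row[id_index])
                and iou(k[2:6], row[2:6]) > iou_threshold
                for k in kept)
            if not suppressed:
                kept.append(list(row))
        # Pad so every batch entry keeps num_anchors rows.
        padding = [[-1.0] * 6 for _ in range(len(batch) - len(kept))]
        out.append(kept + padding)
    return out


data = [[
    [0, 0.9, 0, 0, 10, 10],
    [0, 0.8, 1, 1, 11, 11],    # same class as row 0, heavy overlap
    [1, 0.7, 1, 1, 11, 11],    # different class, same location
    [0, 0.1, 50, 50, 60, 60],  # beyond valid_count, treated as invalid
]]
per_class = padded_nms(data, valid_count=[3])
forced = padded_nms(data, valid_count=[3], force_suppress=True)
print(per_class[0])
print(forced[0])
```

In both calls the output keeps the [batch_size, num_anchors, 6] shape of the input; only the number of rows filled with -1 changes.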
@zhreshold @tqchen @Laurawly @vinx13 Do you have concerns about naming or other aspects?