
Conversation

@kevinthesun
Contributor

This PR limits the resources used by dynamic-shape GPU kernels to avoid runtime errors. It also skips CallPacked in the VM when a kernel has only one output and that output is empty, e.g. with shape (1, 0, 6).

After this PR, TF and PT object detection models should be runnable on NVIDIA GPUs.
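
To make the empty-output condition concrete, here is a minimal sketch of the check (illustrative Python only; the actual change lives in the VM's packed-call dispatch, and should_skip_call is a hypothetical name):

import numpy as np

# Sketch of the empty-output skip for a single-output kernel.
# A tensor is empty when any dimension of its shape is 0, so the
# packed function has nothing to compute and the call can be skipped.
def should_skip_call(out_shape):
    # e.g. (1, 0, 6) -> 1 * 0 * 6 == 0 elements
    return int(np.prod(out_shape)) == 0

assert should_skip_call((1, 0, 6))
assert not should_skip_call((1, 5, 6))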

@zhiics @Laurawly @mbrookhart

Contributor

@mbrookhart left a comment

A couple of nitpicks, but overall it looks great. Awesome work.

mod,
[np_indices_result, np_valid_box_count],
only_vm=False,
disable_targets=["nvptx"],
Contributor

This tests the empty output VM change 👍
Why disable nvptx?

Contributor Author

There is an issue causing a segfault in dynamic NMS for nvptx, and in general we need Thrust for any dynamic-shape sorting. For now, nvptx is not ready for these operations.
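
As a side note for anyone trying this locally: the Thrust-backed sort kernels require TVM to be built with USE_THRUST ON in config.cmake. A build can be probed for Thrust support roughly as follows (the registry name is taken from tvm.contrib.thrust and may vary across versions):

import tvm

def thrust_available():
    # Returns False when the build was compiled without Thrust support.
    return tvm.get_global_func("tvm.contrib.thrust.sort", allow_missing=True) is not None

print("Thrust sort available:", thrust_available())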

Contributor

Makes sense. I'm trying to fix the default sort kernel in #7099, if you want to take a look.


num_inputs = 25
x = [relay.var("x", shape=(relay.Any(),), dtype="float32") for _ in range(num_inputs)]
z = relay.op.concatenate(x, axis=0)
Contributor

This tests the injective schedule 👍
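
For context, a self-contained version of that test might look like the sketch below (the real test harness differs; only standard Relay VM APIs are used, and the device handle may be tvm.cuda(0) on newer TVM versions):

import numpy as np
import tvm
from tvm import relay

# Sketch: run a 25-input dynamic-shape concatenate through the Relay VM
# on CUDA, exercising the injective schedule with a large fused workload.
num_inputs = 25
x = [relay.var("x%d" % i, shape=(relay.Any(),), dtype="float32") for i in range(num_inputs)]
z = relay.op.concatenate(x, axis=0)
mod = tvm.IRModule.from_expr(relay.Function(x, z))

exe = relay.vm.compile(mod, target="cuda")
vm = tvm.runtime.vm.VirtualMachine(exe, tvm.gpu(0))
args = [
    np.random.uniform(size=(np.random.randint(1, 10),)).astype("float32")
    for _ in range(num_inputs)
]
out = vm.invoke("main", *args)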

score_axis = score_index
score_shape = (batch_size, num_anchors)
score_tensor = te.compute(score_shape, lambda i, j: data[i, j, score_axis], tag=tag.ELEMWISE)
data_buf = tvm.tir.decl_buffer(data.shape, data.dtype, "data_buf", data_alignment=8)
Contributor

This looks fine, but I'm a little surprised it's necessary. Do you have a test case that breaks the current code, or is this mostly for performance?

Contributor Author

When the NMS workload is large, as in RCNN models, the generic CUDA injective schedule can still cause runtime errors even with the improvements in this PR. In fact, any dynamic injective op can hit runtime issues with the current uniform CUDA injective schedule.

This problem is not directly related to NMS but to the CUDA injective schedule. Later we might need to revisit this part for GPU dynamic ops and come up with a better, more general solution (together with more tests).
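
To illustrate the kind of resource limiting involved, here is a sketch of a grid-capped injective schedule for a dynamic extent (the op and the caps are illustrative, not the exact values from this PR). Because the extent is unknown at compile time, each thread strides over multiple elements so the grid can never exceed a fixed block count:

import tvm
from tvm import te

n = te.var("n")  # dynamic extent, unknown at compile time
A = te.placeholder((n,), name="A", dtype="float32")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")

s = te.create_schedule(B.op)
num_thread = 256  # threads per block, safely under the device limit
max_block = 256   # hard cap on the number of thread blocks

fused = s[B].fuse(*s[B].op.axis)
xo, xi = s[B].split(fused, factor=num_thread * max_block)
bx, tx = s[B].split(xi, factor=num_thread)
s[B].reorder(bx, tx, xo)  # the serial xo loop runs inside each thread
s[B].bind(bx, te.thread_axis("blockIdx.x"))
s[B].bind(tx, te.thread_axis("threadIdx.x"))
f = tvm.build(s, [A, B], target="cuda")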

if cfg.is_fallback:
N, F, Y, X = get_const_tuple(conv.shape)
if not isinstance(N, int):
N = 1
Contributor

Can we add a test that hits this change?

Contributor Author

Yeah, we do have a test for this. I've now enabled it for all targets.
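
For illustration, a test that reaches this fallback needs a dynamic batch dimension so that N is not a constant int; a sketch along these lines (shapes and names are hypothetical):

import numpy as np
import tvm
from tvm import relay

# Sketch: conv2d with a relay.Any() batch dimension, which forces the
# fallback config path where the batch size is not a constant int.
data = relay.var("data", shape=(relay.Any(), 3, 32, 32), dtype="float32")
weight = relay.var("weight", shape=(8, 3, 3, 3), dtype="float32")
y = relay.nn.conv2d(data, weight, kernel_size=(3, 3), padding=(1, 1))
mod = tvm.IRModule.from_expr(relay.Function([data, weight], y))

exe = relay.vm.compile(mod, target="cuda")
vm = tvm.runtime.vm.VirtualMachine(exe, tvm.gpu(0))  # tvm.cuda(0) on newer TVM
out = vm.invoke(
    "main",
    np.random.uniform(size=(2, 3, 32, 32)).astype("float32"),
    np.random.uniform(size=(8, 3, 3, 3)).astype("float32"),
)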

Contributor

@mbrookhart left a comment

LGTM

@kevinthesun merged commit bad149e into apache:main on Dec 17, 2020
@kevinthesun
Contributor Author

Thanks @mbrookhart

masahi pushed a commit to masahi/tvm that referenced this pull request Dec 18, 2020
* Fix GPU dynamic op schedules

* Fix dynamic shape nms

* Fix

* Fix test format
TusharKanekiDey pushed a commit to TusharKanekiDey/tvm that referenced this pull request Jan 20, 2021
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Jan 21, 2021
electriclilies pushed a commit to electriclilies/tvm that referenced this pull request Feb 18, 2021