Conversation

@zhiics (Member) commented Sep 30, 2020

This PR enables dynamic conv2d for CUDA.

CC @kevinthesun @icemelon9 @mbrookhart @comaniac
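
For context, a minimal sketch (not from this PR) of how the batch dimension becomes symbolic at the Relay level; the shapes and layer sizes here are arbitrary assumptions:

```python
import tvm
from tvm import relay

# The batch dimension is relay.Any(), so the shape reaching the conv2d
# schedule carries a symbolic variable for N instead of a Python int.
data = relay.var("data", shape=(relay.Any(), 3, 224, 224), dtype="float32")
weight = relay.var("weight", shape=(64, 3, 3, 3), dtype="float32")
out = relay.nn.conv2d(data, weight, kernel_size=(3, 3), channels=64, padding=(1, 1))
mod = tvm.IRModule.from_expr(relay.Function([data, weight], out))
# A module like this is compiled with the Relay VM rather than the graph
# runtime, since the graph runtime requires static shapes.
```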

    )
-    cfg.add_flop(2 * N * CO * H * W * CI * KH * KW)
+    if isinstance(N, int):
+        cfg.add_flop(2 * N * CO * H * W * CI * KH * KW)
@zhiics (Member Author) commented:

@kevinthesun @icemelon9 @comaniac is this okay for AutoTVM?

Contributor commented:

It's okay in terms of functionality, but the output message would be weird. Since the AutoTVM progress bar shows throughput instead of latency, users will always see 0 GFLOPS during the tuning process (https://github.com/apache/incubator-tvm/blob/master/python/tvm/autotvm/tuner/callback.py#L159).

Maybe we can still keep the FLOPS with N=1 and pop a message saying we are tuning the kernel with N=1, but the tuned kernel can be used with any batch size?
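
A rough sketch of what that suggestion could look like (illustrative only; `add_conv2d_flops` is a hypothetical helper, and the merged change keeps the simpler guard shown in the diff):

```python
import logging

logger = logging.getLogger("autotvm")

# Hypothetical sketch of the N=1 fallback suggested above; not the merged code.
def add_conv2d_flops(cfg, N, CO, H, W, CI, KH, KW):
    if isinstance(N, int):
        cfg.add_flop(2 * N * CO * H * W * CI * KH * KW)
    else:
        # Batch size is symbolic: count FLOPS as if N == 1 so the tuning
        # progress bar shows a nonzero GFLOPS figure, and tell the user the
        # tuned kernel can be reused for any batch size.
        logger.warning(
            "Dynamic batch size detected; tuning with N=1. "
            "The tuned kernel can be used with any batch size."
        )
        cfg.add_flop(2 * CO * H * W * CI * KH * KW)
```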

@zhiics (Member Author) commented Sep 30, 2020

Yeah, I thought about 1 as well, but the actual batch size may not be 1.

@kevinthesun (Contributor) commented Oct 1, 2020

I think it's fine, since AutoTVM generally can't be used for dynamic-shape ops. Users won't see any FLOPS info when N is symbolic.
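
To make that concrete, a small illustrative check (assuming the TVM API of that era) of why the `isinstance(N, int)` guard skips the FLOP count when the batch is dynamic:

```python
import tvm

n = tvm.te.var("n")            # symbolic batch dimension from a dynamic shape
assert not isinstance(n, int)  # so the guard skips cfg.add_flop entirely
assert isinstance(n, tvm.tir.Var)
```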

@comaniac changed the title from "[RELAY][OP] Dynamic conv2d for cuda" to "[RELAY][OP] Dynamic conv2d batch size for cuda" Sep 30, 2020
@kevinthesun (Contributor) left a review comment:

LGTM

@comaniac comaniac merged commit e78aa61 into apache:master Oct 1, 2020
@comaniac (Contributor) commented Oct 1, 2020

Thanks @zhiics @kevinthesun

TusharKanekiDey pushed commits to TusharKanekiDey/tvm that referenced this pull request Oct 13-16, 2020
@zhiics deleted the dynamic_conv2d_cuda branch Oct 17, 2020
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Oct 19, 2020