[microTVM] Generalize depthwise_conv2d schedule #12856
Conversation
Force-pushed from 711a213 to 5bba3d0.
areusch left a comment:
thanks @guberti, this looks pretty good. leaving a couple of comments. @tkonolige, could you give the AlterOpLayout part a look?
```python
KH, KW, _, _ = get_const_tuple(kernel.shape)
simd_width = get_dtype_simd_width(data.dtype)
```
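For context, `get_dtype_simd_width` (a helper in the PR's DSP micro-kernel code) presumably reports how many lanes of a given dtype fit in one 32-bit Cortex-M register. A minimal sketch under that assumption, not the PR's actual implementation:

```python
def get_dtype_simd_width(dtype: str) -> int:
    """Lanes of `dtype` per 32-bit DSP register: 4 for int8, 2 for int16.

    Sketch only -- the PR's real helper may behave differently.
    """
    assert dtype in ("int8", "int16"), "DSP SIMD operates on 8/16-bit lanes"
    return 32 // int(dtype[len("int"):])
```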
```python
HWOI_kernel_np = inputs[1].data.numpy()
```
You'll need to check that the kernel is a constant and fall back to a different implementation if it is not.
I'm not sure how easy it is to check if the kernel is a constant from python/tvm/relay/op/strategy/arm_cpu.py, but you're right that it is a thing we should check. I've added an assertion, though it is a bit of a stopgap solution.
Could you add a message to the assert, and a comment about what needs to be done so it is no longer a stopgap?
Unfortunately, a clean solution is hard, as the strategy function does not have access to the needed information. When `conv2d_alter_op` is called, `inputs[1]` (the kernel) has the form:

```
meta[relay.Constant][0] /* ty=Tensor[(3, 3, 3, 8), int16] */
```

However, when the Relay strategy functions are called, `inputs[1]` (the kernel) looks like:

```
Tensor(shape=[3, 3, 8, 1], op.name=placeholder)
```

Nowhere inside relay/op/strategy do any of the strategy functions check whether the relevant tensors are constant, so there's not much we can do. I've added comments explaining this, but please let me know if you have ideas for how this could be done.
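For illustration only: a minimal sketch (not the PR's actual code) of the kind of constant check that is possible at the `conv2d_alter_op` level, where the kernel is still a Relay expression. The function name and the fallback behavior here are hypothetical:

```python
from tvm import relay

def _alter_depthwise_conv2d_layout(attrs, inputs, tinfos, out_type):
    """Sketch: skip the layout rewrite when the kernel is not a constant."""
    kernel = inputs[1]
    # At conv2d_alter_op time the kernel is still a Relay expression,
    # so a compile-time constant can be detected directly.
    if not isinstance(kernel, relay.Constant):
        # Returning None leaves the op unchanged, falling back to the
        # default (non-DSP) depthwise implementation.
        return None
    kernel_np = kernel.data.numpy()  # safe only because kernel is constant
    # ... rewrite to the layout the DSP schedule expects ...
```

At the strategy level, by contrast, only placeholder tensors are visible, which is why the assertion there is a stopgap.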
Resolved review comments (outdated) on python/tvm/topi/arm_cpu/mprofile/dsp/micro_kernel/multi_channel_convolve.py.
Thanks for the detailed comments @areusch @tkonolige! I've addressed your comments with dee04b1 - please take another look.
Force-pushed from dee04b1 to 5198aea.
ekalda left a comment:
LGTM and interesting discussion about using DSP instructions for depthwise! cc @ashutosh-arm for visibility
tkonolige left a comment:
Thanks @guberti!
Commits:

* Method without SMLAD
* Remove kernel packing without decreasing speed
* Finish removing weights reorg
* Unit tests for larger kernels
* Prototype int16 depthwise schedule
* Bugfixes and unit tests
* Formatting and linting
* Linting fix
* Address comments from code review
* Fix accidental winograd bug
* Clarifying comment about Relay constant assertion
* Another round of code review comments
This pull request removes a number of restrictions on the usage of the `depthwise_conv2d` schedule for microTVM. It:

* adds support for the `int16` input data type
* allows the `int16` version to run on Cortex-M0 and M3 (which do not have SIMD instructions)

It also removes use of the `SMLAD` instruction, which cannot improve performance on `NHWC` layouts (discussion about why is in the comments below). This results in a slight performance boost for depthwise convolutions where `kernel_width * kernel_height` is odd, and no change otherwise. It also contains some readability improvements.

Known issue: my call to `topi.reshape` in `depthwise_conv2d.py` results in useless C code being generated that unnecessarily duplicates an array. The performance hit from this is small, but it is a gross issue.
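As a usage illustration (not taken from the PR), here is a hedged sketch of the kind of `int16` `NHWC` depthwise workload the generalized schedule is meant to cover; the shapes, the random kernel, and the `micro("host")` target are placeholder assumptions:

```python
import numpy as np
import tvm
from tvm import relay

# Placeholder workload: 3x3 depthwise convolution over int16 NHWC data,
# with a constant HWOI kernel (shape matching the review discussion above).
data = relay.var("data", shape=(1, 48, 48, 8), dtype="int16")
kernel = relay.const(np.random.randint(-8, 8, size=(3, 3, 8, 1)).astype("int16"))
out = relay.nn.conv2d(
    data,
    kernel,
    kernel_size=(3, 3),
    channels=8,
    groups=8,  # groups == channels makes this a depthwise convolution
    data_layout="NHWC",
    kernel_layout="HWOI",
    out_dtype="int32",
)
mod = tvm.IRModule.from_expr(relay.Function([data], out))

# A real Cortex-M target would select the DSP schedule; "host" is a stand-in.
target = tvm.target.target.micro("host")
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target)
```

Because the kernel above is a `relay.Constant`, the alter-op path discussed in the review thread can read its values at compile time; a non-constant kernel would trip the stopgap assertion instead.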