Closed
Relevant discuss post - https://discuss.tvm.ai/t/topi-using-x86-schedules-for-arm-conv2d/6365
Currently, TVM has different conv2d schedules for ARM and Intel. The discuss post listed above shows that, for many TFLite networks, running the Intel conv2d NCHWc schedule on ARM gives better end-to-end latency than the ARM NCHW conv2d spatial pack schedule.
However, this is just one opportunity, and there are more ideas that we should pursue. This issue lists those ideas, and anybody interested can pick them up. The list is a result of the discussion in the above post.
- Try ARM winograd NCHW schedule on Intel servers.
- Write NCHWc winograd for Intel servers to comply with existing NCHWc conv2d implementation - work started by @ajtulloch - [X86] [NNVM] [TOPI] Implement NCHWc Winograd convolutions #2111
- Work on NHWC ARM schedule - tuning and optimization - work started by @jackwish - [TOPI][AutoTVM] NHWC conv2d templates for ARM #3859
- Investigate NHWC vs NCHWc schedules - NCHWc can introduce data layout transforms. Check whether NHWC can achieve the same performance as NCHWc conv2d while eliminating the data layout conversion overhead.
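
For readers less familiar with the layouts in the last item, here is a minimal NumPy sketch of what the NCHW to NCHWc packing does and why it implies an extra transform at the graph boundary. It is not TVM code: the helper names (`nchw_to_nchwc`, `nchwc_to_nchw`, `nchw_to_nhwc`) and the channel block size of 4 are assumptions chosen for illustration only.

```python
import numpy as np


def nchw_to_nchwc(x, c_block):
    """Pack an NCHW tensor into NCHW[c_block]c (hypothetical helper)."""
    n, c, h, w = x.shape
    assert c % c_block == 0, "channels must be divisible by the block size"
    # Split C into (C // c_block, c_block), then move the inner block last.
    return x.reshape(n, c // c_block, c_block, h, w).transpose(0, 1, 3, 4, 2)


def nchwc_to_nchw(x):
    """Undo the packing above (hypothetical helper)."""
    n, c_outer, h, w, c_inner = x.shape
    return x.transpose(0, 1, 4, 2, 3).reshape(n, c_outer * c_inner, h, w)


def nchw_to_nhwc(x):
    """Plain NCHW -> NHWC permutation (hypothetical helper)."""
    return x.transpose(0, 2, 3, 1)


# Small example tensor: batch 1, 8 channels, 2x2 spatial.
x = np.arange(1 * 8 * 2 * 2, dtype=np.float32).reshape(1, 8, 2, 2)

packed = nchw_to_nchwc(x, c_block=4)   # assumed block size, for illustration
restored = nchwc_to_nchw(packed)
nhwc = nchw_to_nhwc(x)

print(packed.shape)  # (1, 2, 2, 2, 4)
print(nhwc.shape)    # (1, 2, 2, 8)
```

In this sketch, every pack and unpack is a full copy of the tensor. That copy is the layout conversion overhead the item refers to; a model that is already NHWC (as TFLite models typically are) would avoid it if an NHWC schedule could match NCHWc performance.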