
Conversation

FrozenGene (Member) commented on Oct 29, 2018

This PR leverages the existing spatial pack schedule and adds a tunable compute_at knob to re-implement the ARM CPU depthwise convolution schedule.
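For readers new to the idea, below is a minimal sketch of what a tunable compute_at knob looks like in an AutoTVM template. It uses a toy element-wise operator and the current tvm.te/autotvm API rather than the actual depthwise convolution schedule in this PR, and the knob names (tile_i, mid_compute_at) are purely illustrative.

```python
from tvm import autotvm, te


@autotvm.template("example/tunable_compute_at")
def toy_template(n):
    data = te.placeholder((n,), name="data")
    # "mid" stands in for an intermediate stage (e.g. padded/packed data)
    # whose placement we want the tuner to decide.
    mid = te.compute((n,), lambda i: data[i] * 2.0, name="mid")
    out = te.compute((n,), lambda i: mid[i] + 1.0, name="out")

    s = te.create_schedule(out.op)
    cfg = autotvm.get_config()

    cfg.define_knob("tile_i", [8, 16, 32])
    xo, xi = s[out].split(out.op.axis[0], factor=cfg["tile_i"].val)

    # The tunable compute_at knob: the tuner picks where "mid" is computed.
    cfg.define_knob("mid_compute_at", ["outer", "inner", "inline"])
    if cfg["mid_compute_at"].val == "outer":
        s[mid].compute_at(s[out], xo)
    elif cfg["mid_compute_at"].val == "inner":
        s[mid].compute_at(s[out], xi)
    else:
        s[mid].compute_inline()

    return s, [data, out]
```

A task for this template could then be created with something like autotvm.task.create("example/tunable_compute_at", args=(1024,), target="llvm") and tuned the same way as the real depthwise tasks.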

On my A53@2.0GHz ARM CPU (MTK6763), this boosts depthwise convolution performance by 1.6X compared with the previous schedule in the MobileNet V1 model (I have also verified the correctness of this schedule).

The following is the AutoTVM tuning GFLOPS log for the TensorFlow MobileNet V1 model.
Before this PR:
[Task 2/20] Current/Best: 0.98/ 2.32 GFLOPS | Progress: (1427/2000) | 2679.82 s Done.
[Task 4/20] Current/Best: 0.56/ 1.15 GFLOPS | Progress: (1072/2000) | 2461.27 s Done.
[Task 6/20] Current/Best: 1.08/ 2.78 GFLOPS | Progress: (1084/2000) | 1987.91 s Done.
[Task 8/20] Current/Best: 0.39/ 1.19 GFLOPS | Progress: (1815/2000) | 2744.70 s Done.
[Task 10/20] Current/Best: 1.09/ 2.33 GFLOPS | Progress: (1222/2000) | 1866.02 s Done.
[Task 12/20] Current/Best: 0.42/ 0.90 GFLOPS | Progress: (1716/2000) | 2528.94 s Done.
[Task 14/20] Current/Best: 1.89/ 2.63 GFLOPS | Progress: (1284/2000) | 2288.55 s Done.
[Task 16/20] Current/Best: 0.47/ 0.96 GFLOPS | Progress: (1467/2000) | 2282.65 s Done.
[Task 18/20] Current/Best: 1.43/ 2.61 GFLOPS | Progress: (1007/2000) | 1525.76 s Done.

After this PR's optimization:
[Task 2/20] Current/Best: 0.00/ 4.83 GFLOPS | Progress: (1682/2000) | 1470.40 s Done.
[Task 4/20] Current/Best: 1.35/ 3.17 GFLOPS | Progress: (1257/2000) | 1032.80 s Done.
[Task 6/20] Current/Best: 2.04/ 5.49 GFLOPS | Progress: (1904/2000) | 1623.10 s Done.
[Task 8/20] Current/Best: 0.75/ 3.15 GFLOPS | Progress: (1885/2000) | 1546.22 s Done.
[Task 10/20] Current/Best: 2.09/ 6.07 GFLOPS | Progress: (2000/2000) | 1640.41 s Done.
[Task 12/20] Current/Best: 2.99/ 3.80 GFLOPS | Progress: (1853/2000) | 1547.13 s Done.
[Task 14/20] Current/Best: 4.59/ 6.06 GFLOPS | Progress: (1355/2000) | 1091.93 s Done.
[Task 16/20] Current/Best: 1.96/ 4.01 GFLOPS | Progress: (2000/2000) | 1586.18 s Done.
[Task 18/20] Current/Best: 2.33/ 4.63 GFLOPS | Progress: (2000/2000) | 1599.89 s Done.

The total depthwise convolution execution time on a single A53@2.0GHz core goes from 45.3839ms down to 28.1945ms.

One thing you must note when using this schedule: you MUST set the XGBTuner constructor's feature type argument to feature_type='knob', i.e. XGBTuner(tsk, loss_type='rank', feature_type='knob'). Otherwise your program may hang forever.
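For concreteness, a minimal sketch of the tuner setup is shown below, assuming the standard AutoTVM Python API; the task list, measurement settings, and log file name are placeholders rather than the exact script used for the numbers above.

```python
from tvm import autotvm
from tvm.autotvm.tuner import XGBTuner

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=10, repeat=1, timeout=10),
)

# "tasks" would come from task extraction on the model beforehand,
# e.g. autotvm.task.extract_from_program(...).
for tsk in tasks:
    # feature_type='knob' is required for this schedule; with the default
    # feature type the tuning program may hang forever.
    tuner = XGBTuner(tsk, loss_type="rank", feature_type="knob")
    tuner.tune(
        n_trial=2000,
        measure_option=measure_option,
        callbacks=[autotvm.callback.log_to_file("depthwise_tuning.log")],
    )
```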

@merrymercy @tqchen Please review it.

FrozenGene (Member, Author) commented on Oct 29, 2018

The CI failure in test_topi_depthwise_conv2d.py happens because I have modified the schedule: the previous schedule does not have tile_c and related knobs, and TopHub (which the fallback uses) doesn't contain the config for my new schedule. I may need @merrymercy's help to handle this situation.
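For reference, the sketch below shows the usual way a schedule guards against a missing TopHub entry, using cfg.is_fallback and a default split. This assumes the AutoTVM ConfigSpace API; apart from the tile_c name, the shapes and factors are illustrative only and not necessarily the fix this PR will use.

```python
from tvm import autotvm, te


def schedule_sketch(outs):
    """Illustrative only: guard a new split knob (tile_c) against a missing
    TopHub entry by filling in a default when the config is a fallback."""
    out = outs[0]
    s = te.create_schedule(out.op)
    cfg = autotvm.get_config()

    c = out.op.axis[-1]                      # channel-like axis of the output
    cfg.define_split("tile_c", c, num_outputs=2)

    if cfg.is_fallback:
        # TopHub has no entry for the new template, so pick a reasonable
        # default split instead of failing on the unknown knob.
        cfg.fallback_split("tile_c", [-1, 4])

    co, ci = cfg["tile_c"].apply(s, out, c)  # apply the (possibly default) split
    return s
```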

icemelon and others added 20 commits December 20, 2018 14:40
* Add Eddie to committer

* Fix order
* Add MXNet test example for relay
* Fix a bug in BiasAddSimplifier
The dtype of pad's output should follow the input, but if the input dtype is not float, the output will still be float because pad_value is float.
…and ssd ops (apache#2322)

* add ssd ops to mxnet.py

* add result check for multibox and nms unit tests

* address @kevinthesun's comments

* Disable cuda test for nms for now.
The dtype of count is the same as the dtype of inputs[0] when created, but its type may change when multiplied by inputs[0]->shape[i], which causes the output dtype to differ from the input dtype.
* Add cast op
* Rename dtype_cast to cast
* Add additional safety check for String2TVMType
* Add missing relay op docs
FrozenGene force-pushed the arm_cpu_depthwise_convolution branch from d95a24c to aa73419 on December 27, 2018 12:28
FrozenGene force-pushed the arm_cpu_depthwise_convolution branch from aa73419 to bfc259b on December 27, 2018 12:41
FrozenGene closed this on December 27, 2018
FrozenGene deleted the arm_cpu_depthwise_convolution branch on December 27, 2018 12:51
FrozenGene (Member, Author) commented
I'm very sorry that I committed the merge code previously. I hope this doesn't disrupt you.

I have now opened a new PR, #2345, to continue this work, and added this PR as a reference in case people are interested in the background.

Sorry again for my mistake.
