
Performance regression of quantization on CUDA after [Relay][AutoTVM] Relay op strategy (#4644) #4972

@w-zr

My environment:

Linux ziran-pc 5.5.6-1-MANJARO #1 SMP Mon Feb 24 09:24:51 UTC 2020 x86_64 GNU/Linux
CUDA Version: 10.2
Python 3.8.1
gcc (Arch Linux 9.2.1+20200130-2) 9.2.1 20200130

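Here is my code. It loads the ResNet-18 v1 ONNX model, quantizes it with Relay, builds it for CUDA, and profiles one batch with the debug graph runtime.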

```python
import onnx
import tvm
from tvm import relay
from tvm.contrib.debugger import debug_runtime
from tvm.relay.quantize import quantize

resnetv1 = onnx.load('models/resnet18v1.onnx')
input_blob = resnetv1.graph.input[0]
input_shape = tuple(dim.dim_value for dim in input_blob.type.tensor_type.shape.dim)
shape_dict = {input_blob.name: input_shape}
mod_resnetv1, params_resnetv1 = relay.frontend.from_onnx(resnetv1, shape_dict)

# Quantize with the default qconfig
mod_q_resnetv1 = quantize(mod_resnetv1, params_resnetv1)

# build() returns (graph JSON, compiled library, params) in this TVM version
graph, lib, params = relay.build_module.build(mod_q_resnetv1, target='cuda', params=params_resnetv1)

ctx = tvm.gpu(0)
# Create the debug runtime once; it writes its dump files under dump_root
m = debug_runtime.create(graph, lib, ctx, dump_root='tvmdbg')
m.set_input(**params)

# get_val_data() is a user-defined data-loading helper (not shown); only the first batch is profiled
batch = next(iter(get_val_data()))
data = batch['data']
m.set_input(input_blob.name, tvm.nd.array(data.astype('float32')))
m.run()
tvm_out = m.get_output(0, tvm.nd.empty((1, 1000), 'float32')).asnumpy()
```

Output when TVM is at "[Fix] Fix get_valid_count flaky test for cuda (#4901)" (before the regression):

[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:92: Iteration: 0
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #0 fused_nn_conv2d_multiply_add_nn_relu: 1685.52 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #1 fused_nn_max_pool2d_1: 32.843 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #2 fused_multiply_round_clip_cast: 13.9443 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #3 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_: 320.88 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #4 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__3: 321.255 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #5 fused_multiply_round_clip_cast_cast_left_shift_multiply_add_right_shift_cast_add_2320814265661055830_: 16.196 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #6 fused_cast_25: 12.0867 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #7 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__1: 319.658 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #8 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__4: 322.954 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #9 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__2: 15.1093 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #10 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__2: 63.3707 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #11 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2: 482.38 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #12 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__5: 508.352 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #13 fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__2: 12.5682 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #14 fused_cast_24: 10.7158 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #15 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__3: 506.871 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #16 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__6: 510.042 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #17 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__1: 12.363 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #18 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__1: 77.2029 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #19 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__4: 691.62 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #20 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__7: 532.286 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #21 fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__1: 10.7689 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #22 fused_cast_23: 9.9673 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #23 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__5: 538.167 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #24 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__8: 540.056 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #25 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948_: 11.4951 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #26 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588_: 104.663 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #27 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__6: 962.534 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #28 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__9: 1023.26 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #29 fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089_: 9.9758 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #30 fused_cast_22: 9.3292 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #31 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__7: 1025.56 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #32 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__10: 1024.85 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #33 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast: 10.0607 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #34 fused_nn_global_avg_pool2d_cast_multiply: 12.0975 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #35 fused_nn_batch_flatten_nn_batch_flatten_multiply: 9.2545 us/iter
[22:55:22] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #36 fused_nn_dense_nn_bias_add: 21.2773 us/iter
Node Name                                                                                                  Ops                                                                                                        Time(us)   Time(%)  Shape              Inputs  Outputs
---------                                                                                                  ---                                                                                                        --------   -------  -----              ------  -------
fused_nn_conv2d_multiply_add_nn_relu                                                                       fused_nn_conv2d_multiply_add_nn_relu                                                                       1685.52    14.294   (1, 64, 112, 112)  4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__7   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__7   1025.56    8.697    (1, 512, 7, 7)     4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__10  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__10  1024.85    8.691    (1, 512, 7, 7)     4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__9   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__9   1023.26    8.678    (1, 512, 7, 7)     4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__6   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__6   962.534    8.163    (1, 512, 7, 7)     4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__4   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__4   691.62     5.865    (1, 256, 14, 14)   4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__8   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__8   540.056    4.58     (1, 256, 14, 14)   4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__5   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__5   538.167    4.564    (1, 256, 14, 14)   4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__7   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__7   532.286    4.514    (1, 256, 14, 14)   4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__6   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__6   510.042    4.325    (1, 128, 28, 28)   4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__5   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__5   508.352    4.311    (1, 128, 28, 28)   4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__3   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__3   506.871    4.299    (1, 128, 28, 28)   4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2   482.38     4.091    (1, 128, 28, 28)   4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__4   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__4   322.954    2.739    (1, 64, 56, 56)    4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__3   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__3   321.255    2.724    (1, 64, 56, 56)    4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_     fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_     320.88     2.721    (1, 64, 56, 56)    4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__1   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__1   319.658    2.711    (1, 64, 56, 56)    4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588_     fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588_     104.663    0.888    (1, 512, 7, 7)     4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__1   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__1   77.203     0.655    (1, 256, 14, 14)   4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__2   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__2   63.371     0.537    (1, 128, 28, 28)   4       1
fused_nn_max_pool2d_1                                                                                      fused_nn_max_pool2d_1                                                                                      32.843     0.279    (1, 64, 56, 56)    1       1
fused_nn_dense_nn_bias_add                                                                                 fused_nn_dense_nn_bias_add                                                                                 21.277     0.18     (1, 1000)          3       1
fused_multiply_round_clip_cast_cast_left_shift_multiply_add_right_shift_cast_add_2320814265661055830_      fused_multiply_round_clip_cast_cast_left_shift_multiply_add_right_shift_cast_add_2320814265661055830_      16.196     0.137    (1, 64, 56, 56)    2       1
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__2    fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__2    15.109     0.128    (1, 64, 56, 56)    2       1
fused_multiply_round_clip_cast                                                                             fused_multiply_round_clip_cast                                                                             13.944     0.118    (1, 64, 56, 56)    1       1
fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__2   fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__2   12.568     0.107    (1, 128, 28, 28)   2       1
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__1    fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__1    12.363     0.105    (1, 128, 28, 28)   2       1
fused_nn_global_avg_pool2d_cast_multiply                                                                   fused_nn_global_avg_pool2d_cast_multiply                                                                   12.097     0.103    (1, 512, 1, 1)     1       1
fused_cast_25                                                                                              fused_cast_25                                                                                              12.087     0.103    (1, 64, 56, 56)    1       1
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948_      fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948_      11.495     0.097    (1, 256, 14, 14)   2       1
fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__1   fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__1   10.769     0.091    (1, 256, 14, 14)   2       1
fused_cast_24                                                                                              fused_cast_24                                                                                              10.716     0.091    (1, 128, 28, 28)   1       1
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast                                  fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast                                  10.061     0.085    (1, 512, 7, 7)     2       1
fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089_     fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089_     9.976      0.085    (1, 512, 7, 7)     2       1
fused_cast_23                                                                                              fused_cast_23                                                                                              9.967      0.085    (1, 256, 14, 14)   1       1
fused_cast_22                                                                                              fused_cast_22                                                                                              9.329      0.079    (1, 512, 7, 7)     1       1
fused_nn_batch_flatten_nn_batch_flatten_multiply                                                           fused_nn_batch_flatten_nn_batch_flatten_multiply                                                           9.254      0.078    (1, 512)           1       1
Total_time                                                                                                 -                                                                                                          11791.534  -        -                  -       -
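To see where the time goes in a debug-runtime dump like the one above, a minimal stdlib-only sketch can parse the per-op "Op #N <name>: <time> us/iter" lines and compute the share of time spent in fused conv2d ops. It assumes exactly the log format shown here; `parse_op_times` and `conv2d_share` are hypothetical helpers, not TVM APIs, and the sample string is an excerpt of the baseline log with the timestamp and source-path prefix dropped.

```python
import re

# Matches the per-op lines printed by the debug graph runtime, e.g.
# "Op #0 fused_nn_conv2d_multiply_add_nn_relu: 1685.52 us/iter"
OP_LINE = re.compile(r"Op #(\d+) ([^\s:]+): ([0-9.]+) us/iter")


def parse_op_times(log_text):
    """Return {op_index: (fused_op_name, microseconds_per_iter)}."""
    times = {}
    for match in OP_LINE.finditer(log_text):
        index, name, micros = match.groups()
        times[int(index)] = (name, float(micros))
    return times


def conv2d_share(times):
    """Fraction of the summed per-op time spent in fused ops that contain nn_conv2d."""
    total = sum(us for _, us in times.values())
    conv = sum(us for name, us in times.values() if "nn_conv2d" in name)
    return conv / total if total else 0.0


# Excerpt from the baseline log above (timestamp and source path dropped).
SAMPLE_LOG = """\
Op #0 fused_nn_conv2d_multiply_add_nn_relu: 1685.52 us/iter
Op #1 fused_nn_max_pool2d_1: 32.843 us/iter
Op #3 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_: 320.88 us/iter
"""

if __name__ == "__main__":
    times = parse_op_times(SAMPLE_LOG)
    print(f"{len(times)} ops, conv2d share: {conv2d_share(times):.1%}")
```

Pasting the full per-op log into `parse_op_times` instead of the excerpt gives the conv2d share for the whole network.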

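Output when TVM is at "[Relay][AutoTVM] Relay op strategy (#4644)". Most quantized conv2d ops are now roughly 11x to 19x slower (e.g. Op #3 goes from 320.88 us to 5281.79 us), and Op #0 (fused_nn_conv2d_multiply_add_nn_relu) is about 2.7x slower: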

[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:92: Iteration: 0
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #0 fused_nn_conv2d_multiply_add_nn_relu: 4584.26 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #1 fused_nn_max_pool2d_1: 30.2865 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #2 fused_multiply_round_clip_cast: 14.6314 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #3 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_: 5281.79 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #4 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__3: 5251.26 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #5 fused_multiply_round_clip_cast_cast_left_shift_multiply_add_right_shift_cast_add_2320814265661055830_: 19.2247 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #6 fused_cast_25: 12.4631 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #7 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__1: 5161.39 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #8 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__4: 5320.71 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #9 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__2: 107.187 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #10 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__2: 59.8113 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #11 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2: 426.696 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #12 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__5: 9036.95 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #13 fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__2: 18.7588 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #14 fused_cast_24: 13.5717 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #15 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__3: 9323.67 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #16 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__6: 9690.43 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #17 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__1: 76.843 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #18 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__1: 70.4272 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #19 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__4: 596.825 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #20 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__7: 9047.68 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #21 fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__1: 56.8034 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #22 fused_cast_23: 10.0938 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #23 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__5: 8854.5 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #24 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__8: 9212.74 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #25 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948_: 14.1323 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #26 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588_: 93.6364 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #27 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__6: 843.468 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #28 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__9: 11918 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #29 fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089_: 56.1085 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #30 fused_cast_22: 10.012 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #31 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__7: 11729.8 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #32 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__10: 12051.1 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #33 fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast: 38.601 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #34 fused_nn_global_avg_pool2d_cast_multiply: 22.1764 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #35 fused_nn_batch_flatten_nn_batch_flatten_multiply: 9.9415 us/iter
[22:43:06] /home/ziran/repositories/incubator-tvm/src/runtime/graph/debug/graph_runtime_debug.cc:97: Op #36 fused_nn_dense_nn_bias_add: 22.5578 us/iter
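With both per-op logs available, a minimal stdlib-only sketch can compare them op by op and report a slowdown ratio for each fused op. It assumes both builds produce the same graph partitioning, so the same op index refers to the same fused kernel; pairs whose names differ are skipped. The helper names are hypothetical, and the sample strings are excerpts from the two logs (timestamp and source-path prefix dropped).

```python
import re

# Matches "Op #<index> <fused op name>: <time> us/iter" from the debug graph runtime.
OP_LINE = re.compile(r"Op #(\d+) ([^\s:]+): ([0-9.]+) us/iter")


def parse_op_times(log_text):
    """Return {op_index: (fused_op_name, microseconds_per_iter)}."""
    return {
        int(m.group(1)): (m.group(2), float(m.group(3)))
        for m in OP_LINE.finditer(log_text)
    }


def slowdowns(before_log, after_log):
    """List (index, name, before_us, after_us, ratio) for ops in both logs, worst first.

    Ops are matched by index; a pair is skipped if the fused names differ,
    since the index would then not refer to the same kernel.
    """
    before = parse_op_times(before_log)
    after = parse_op_times(after_log)
    rows = []
    for index in sorted(before.keys() & after.keys()):
        name, before_us = before[index]
        after_name, after_us = after[index]
        if name != after_name or before_us == 0:
            continue
        rows.append((index, name, before_us, after_us, after_us / before_us))
    rows.sort(key=lambda row: row[4], reverse=True)
    return rows


# Excerpts from the two logs (timestamp and source path dropped).
BEFORE_LOG = """\
Op #3 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_: 320.88 us/iter
Op #11 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2: 482.38 us/iter
"""
AFTER_LOG = """\
Op #3 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_: 5281.79 us/iter
Op #11 fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2: 426.696 us/iter
"""

if __name__ == "__main__":
    for index, name, before_us, after_us, ratio in slowdowns(BEFORE_LOG, AFTER_LOG):
        print(f"Op #{index:<3} {ratio:6.2f}x  {before_us:>9.2f} -> {after_us:>9.2f} us  {name}")
```

Fed the two complete logs, the same comparison also lists the ops that did not regress, for example Op #11 (482.38 us before, 426.696 us after), which may help narrow down which conv2d workloads are affected.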
Node Name                                                                                                  Ops                                                                                                        Time(us)    Time(%)  Shape              Inputs  Outputs
---------                                                                                                  ---                                                                                                        --------    -------  -----              ------  -------
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__10  fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__10  12051.1     10.119   (1, 512, 7, 7)     4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__9   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__9   11918.0     10.008   (1, 512, 7, 7)     4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__7   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__7   11729.8     9.85     (1, 512, 7, 7)     4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__6   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__6   9690.43     8.137    (1, 128, 28, 28)   4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__3   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__3   9323.67     7.829    (1, 128, 28, 28)   4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__8   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__8   9212.74     7.736    (1, 256, 14, 14)   4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__7   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__7   9047.68     7.597    (1, 256, 14, 14)   4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__5   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__5   9036.95     7.588    (1, 128, 28, 28)   4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__5   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__5   8854.5      7.435    (1, 256, 14, 14)   4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__4   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__4   5320.71     4.468    (1, 64, 56, 56)    4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_     fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036_     5281.79     4.435    (1, 64, 56, 56)    4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__3   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__3   5251.26     4.41     (1, 64, 56, 56)    4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__1   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__1   5161.39     4.334    (1, 64, 56, 56)    4       1
fused_nn_conv2d_multiply_add_nn_relu                                                                       fused_nn_conv2d_multiply_add_nn_relu                                                                       4584.26     3.849    (1, 64, 112, 112)  4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__6   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__6   843.468     0.708    (1, 512, 7, 7)     4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__4   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__4   596.825     0.501    (1, 256, 14, 14)   4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_nn_relu_cas_14207774232819154036__2   426.696     0.358    (1, 128, 28, 28)   4       1
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__2    fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__2    107.187     0.09     (1, 64, 56, 56)    2       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588_     fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588_     93.636      0.079    (1, 512, 7, 7)     4       1
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__1    fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948__1    76.843      0.065    (1, 128, 28, 28)   2       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__1   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__1   70.427      0.059    (1, 256, 14, 14)   4       1
fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__2   fused_nn_conv2d_cast_multiply_add_right_shift_clip_cast_multiply_add_cast_multip_12768018879016187588__2   59.811      0.05     (1, 128, 28, 28)   4       1
fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__1   fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__1   56.803      0.048    (1, 256, 14, 14)   2       1
fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089_     fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089_     56.108      0.047    (1, 512, 7, 7)     2       1
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast                                  fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast                                  38.601      0.032    (1, 512, 7, 7)     2       1
fused_nn_max_pool2d_1                                                                                      fused_nn_max_pool2d_1                                                                                      30.287      0.025    (1, 64, 56, 56)    1       1
fused_nn_dense_nn_bias_add                                                                                 fused_nn_dense_nn_bias_add                                                                                 22.558      0.019    (1, 1000)          3       1
fused_nn_global_avg_pool2d_cast_multiply                                                                   fused_nn_global_avg_pool2d_cast_multiply                                                                   22.176      0.019    (1, 512, 1, 1)     1       1
fused_multiply_round_clip_cast_cast_left_shift_multiply_add_right_shift_cast_add_2320814265661055830_      fused_multiply_round_clip_cast_cast_left_shift_multiply_add_right_shift_cast_add_2320814265661055830_      19.225      0.016    (1, 64, 56, 56)    2       1
fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__2   fused_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multiply_ad_12564017943341662089__2   18.759      0.016    (1, 128, 28, 28)   2       1
fused_multiply_round_clip_cast                                                                             fused_multiply_round_clip_cast                                                                             14.631      0.012    (1, 64, 56, 56)    1       1
fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948_      fused_cast_cast_left_shift_multiply_add_right_shift_cast_add_nn_relu_cast_multip_3103932645001264948_      14.132      0.012    (1, 256, 14, 14)   2       1
fused_cast_24                                                                                              fused_cast_24                                                                                              13.572      0.011    (1, 128, 28, 28)   1       1
fused_cast_25                                                                                              fused_cast_25                                                                                              12.463      0.01     (1, 64, 56, 56)    1       1
fused_cast_23                                                                                              fused_cast_23                                                                                              10.094      0.008    (1, 256, 14, 14)   1       1
fused_cast_22                                                                                              fused_cast_22                                                                                              10.012      0.008    (1, 512, 7, 7)     1       1
fused_nn_batch_flatten_nn_batch_flatten_multiply                                                           fused_nn_batch_flatten_nn_batch_flatten_multiply                                                           9.941       0.008    (1, 512)           1       1
Total_time                                                                                                 -                                                                                                          119088.537  -        -                  -       -
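To see where the time goes, the rows above can be aggregated by op kind. The sketch below uses a hypothetical helper, `summarize_debug_rows`, which is not part of TVM. It assumes the column order of the debug runtime table above: node name, op, time (us), time (%), shape, inputs, outputs. It sums per-node time for fused ops whose name contains `conv2d` and for all other ops.

```python
from collections import defaultdict


def summarize_debug_rows(lines):
    """Sum per-node time (us) for conv2d-containing fused ops vs. everything else.

    Hypothetical helper. Assumes each row starts with: node name, op, time in us.
    """
    totals = defaultdict(float)
    for line in lines:
        cols = line.split()
        # Skip blank or separator lines that lack a time column.
        if len(cols) < 3 or cols[0] == "Total_time":
            continue
        try:
            time_us = float(cols[2])
        except ValueError:
            # Header rows such as "Node Name Ops Time(us) ..." have no numeric time.
            continue
        key = "conv2d" if "conv2d" in cols[0] else "other"
        totals[key] += time_us
    return dict(totals)
```

Fed the full table, this shows how much of `Total_time` the quantized conv2d kernels account for. Those are the kernels to compare against the pre-#4644 profile.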

Besides the slowdown, accuracy on the ILSVRC2012_img_val dataset is close to zero after this commit.
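The loop in the reproduction script stops after one batch and does not score the output. The following is a minimal sketch of a top-1 accuracy check. It assumes `tvm_out` is a `(batch, 1000)` array of class scores and that the labels are integer class indices. The helper name `top1_accuracy` is illustrative only.

```python
import numpy as np


def top1_accuracy(logits, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    logits = np.asarray(logits)
    labels = np.asarray(labels).reshape(-1)
    preds = logits.argmax(axis=1)  # predicted class index per sample
    return float((preds == labels).mean())
```

Accumulating this over the validation batches, for builds before and after #4644, would quantify the accuracy drop alongside the timing regression.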
