Closed
Labels: needs-triage, type: bug
Description
I encountered this when trying to run this script over RPC on machines with V100s. Although I was using Relax, @zxybazh thinks this can probably be triggered on mainline as well.
I ran ResNet-50 on a V100 with an input shape of (1, 3, 224, 224), using 5 tuning trials. Tuning started hanging on task #1, fused_conv2d_add_relu. Failures appear to have been encountered during that task.
Output from the host:
input_name: input0
input_shape: [1, 3, 224, 224]
input_dtype: float32
/home/ubuntu/tvm-runtime/python/tvm/driver/build_module.py:267: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
warnings.warn(
INFO:tvm.meta_schedule.runner.rpc_runner:RPCRunner: max_workers = 2
INFO:tvm.meta_schedule.tune:Working directory: /home/ubuntu/dump/
2022-08-05 12:13:55.897 INFO Logging directory: /home/ubuntu/dump/logs
2022-08-05 12:13:55.897 INFO Working directory: /home/ubuntu/dump/
2022-08-05 12:13:55.898 INFO Creating JSONDatabase. Workload at: /home/ubuntu/dump/database_workload.json. Tuning records at: /home/ubuntu/dump/database_tuning_record.json
2022-08-05 12:13:56.063 INFO LocalBuilder: max_workers = 24
2022-08-05 12:13:56.388 INFO Initializing Task #0: "layout_transform"
2022-08-05 12:13:56.459 INFO Initializing Task #1: "fused_conv2d_add_relu"
2022-08-05 12:13:56.726 INFO Initializing Task #2: "max_pool2d"
2022-08-05 12:13:56.866 INFO Initializing Task #3: "fused_conv2d1_add1_relu1"
2022-08-05 12:13:57.114 INFO Initializing Task #4: "fused_contrib_conv2d_winograd_without_weight_transform_add1_relu1"
2022-08-05 12:13:58.024 INFO Initializing Task #5: "fused_conv2d2_add2"
2022-08-05 12:13:58.231 INFO Initializing Task #6: "fused_conv2d2_add2_add3_relu2"
2022-08-05 12:13:58.532 INFO Initializing Task #7: "fused_conv2d3_add1_relu1"
2022-08-05 12:13:58.784 INFO Initializing Task #8: "fused_conv2d4_add4_relu3"
2022-08-05 12:13:59.033 INFO Initializing Task #9: "fused_conv2d5_add5_relu4"
2022-08-05 12:13:59.301 INFO Initializing Task #10: "fused_conv2d7_add6"
2022-08-05 12:13:59.518 INFO Initializing Task #11: "fused_conv2d6_add6_add7_relu5"
2022-08-05 12:13:59.823 INFO Initializing Task #12: "fused_conv2d8_add5_relu4"
2022-08-05 12:14:00.077 INFO Initializing Task #13: "fused_contrib_conv2d_winograd_without_weight_transform1_add5_relu4"
2022-08-05 12:14:00.771 INFO Initializing Task #14: "fused_conv2d9_add8_relu6"
2022-08-05 12:14:01.022 INFO Initializing Task #15: "fused_conv2d10_add9_relu7"
2022-08-05 12:14:01.290 INFO Initializing Task #16: "fused_conv2d12_add10"
2022-08-05 12:14:01.504 INFO Initializing Task #17: "fused_conv2d11_add10_add11_relu8"
2022-08-05 12:14:01.806 INFO Initializing Task #18: "fused_conv2d13_add9_relu7"
2022-08-05 12:14:02.057 INFO Initializing Task #19: "fused_contrib_conv2d_winograd_without_weight_transform2_add9_relu7"
2022-08-05 12:14:02.753 INFO Initializing Task #20: "fused_conv2d14_add12_relu9"
2022-08-05 12:14:03.003 INFO Initializing Task #21: "fused_conv2d15_add13_relu10"
2022-08-05 12:14:03.272 INFO Initializing Task #22: "fused_conv2d17_add14"
2022-08-05 12:14:03.486 INFO Initializing Task #23: "fused_conv2d16_add14_add15_relu11"
2022-08-05 12:14:03.788 INFO Initializing Task #24: "fused_conv2d18_add13_relu10"
2022-08-05 12:14:04.039 INFO Initializing Task #25: "fused_contrib_conv2d_winograd_without_weight_transform3_add13_relu10"
2022-08-05 12:14:04.739 INFO Initializing Task #26: "adaptive_avg_pool2d"
2022-08-05 12:14:04.865 INFO Initializing Task #27: "fused_layout_transform1_reshape_squeeze"
2022-08-05 12:14:05.006 INFO Initializing Task #28: "fused_dense_add16"
2022-08-05 12:14:05.113 INFO
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0 | layout_transform | 1 | 1 | N/A | N/A | N/A | 0 |
1 | fused_conv2d_add_relu | 237633536 | 1 | N/A | N/A | N/A | 0 |
2 | max_pool2d | 1806336 | 1 | N/A | N/A | N/A | 0 |
3 | fused_conv2d1_add1_relu1 | 26091520 | 1 | N/A | N/A | N/A | 0 |
4 | fused_contrib_conv2d_winograd_without_weight_transform_add1_relu1 | 128651264 | 3 | N/A | N/A | N/A | 0 |
5 | fused_conv2d2_add2 | 103563264 | 1 | N/A | N/A | N/A | 0 |
6 | fused_conv2d2_add2_add3_relu2 | 105168896 | 3 | N/A | N/A | N/A | 0 |
7 | fused_conv2d3_add1_relu1 | 103161856 | 2 | N/A | N/A | N/A | 0 |
8 | fused_conv2d4_add4_relu3 | 206323712 | 1 | N/A | N/A | N/A | 0 |
9 | fused_conv2d5_add5_relu4 | 231411712 | 1 | N/A | N/A | N/A | 0 |
10 | fused_conv2d7_add6 | 205922304 | 1 | N/A | N/A | N/A | 0 |
11 | fused_conv2d6_add6_add7_relu5 | 103964672 | 4 | N/A | N/A | N/A | 0 |
12 | fused_conv2d8_add5_relu4 | 102961152 | 3 | N/A | N/A | N/A | 0 |
13 | fused_contrib_conv2d_winograd_without_weight_transform1_add5_relu4 | 127045632 | 3 | N/A | N/A | N/A | 0 |
14 | fused_conv2d9_add8_relu6 | 205922304 | 1 | N/A | N/A | N/A | 0 |
15 | fused_conv2d10_add9_relu7 | 231311360 | 1 | N/A | N/A | N/A | 0 |
16 | fused_conv2d12_add10 | 205721600 | 1 | N/A | N/A | N/A | 0 |
17 | fused_conv2d11_add10_add11_relu8 | 103362560 | 6 | N/A | N/A | N/A | 0 |
18 | fused_conv2d13_add9_relu7 | 102860800 | 5 | N/A | N/A | N/A | 0 |
19 | fused_contrib_conv2d_winograd_without_weight_transform2_add9_relu7 | 114903040 | 5 | N/A | N/A | N/A | 0 |
20 | fused_conv2d14_add12_relu9 | 205721600 | 1 | N/A | N/A | N/A | 0 |
21 | fused_conv2d15_add13_relu10 | 231261184 | 1 | N/A | N/A | N/A | 0 |
22 | fused_conv2d17_add14 | 205621248 | 1 | N/A | N/A | N/A | 0 |
23 | fused_conv2d16_add14_add15_relu11 | 103061504 | 3 | N/A | N/A | N/A | 0 |
24 | fused_conv2d18_add13_relu10 | 102810624 | 2 | N/A | N/A | N/A | 0 |
25 | fused_contrib_conv2d_winograd_without_weight_transform3_add13_relu10 | 142132224 | 2 | N/A | N/A | N/A | 0 |
26 | adaptive_avg_pool2d | 102400 | 1 | N/A | N/A | N/A | 0 |
27 | fused_layout_transform1_reshape_squeeze | 1 | 1 | N/A | N/A | N/A | 0 |
28 | fused_dense_add16 | 4097000 | 1 | N/A | N/A | N/A | 0 |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total trials: 0
Total latency (us): 0
2022-08-05 12:14:05.114 INFO Scheduler picks Task #0: "layout_transform"
2022-08-05 12:14:06.380 INFO Sending 6 sample(s) to builder
2022-08-05 12:14:06.713 INFO Sending 6 sample(s) to runner
2022-08-05 12:14:06.713 INFO Scheduler picks Task #1: "fused_conv2d_add_relu"
The tail of the log of task 1 (excerpted, as it goes on for a long time):
[etc]
2022-08-05 12:36:14.188 INFO Sample-Init-Population summary:
Postproc #0 [meta_schedule.DisallowDynamicLoop(0x423d1d8)]: 0 failure(s)
Postproc #1 [meta_schedule.RewriteCooperativeFetch(0xf2ad228)]: 0 failure(s)
Postproc #2 [meta_schedule.RewriteUnboundBlock(0xf2ad258)]: 0 failure(s)
Postproc #3 [meta_schedule.RewriteParallelVectorizeUnroll(0x3d6c8e8)]: 0 failure(s)
Postproc #4 [meta_schedule.RewriteReductionBlock(0x4d449e8)]: 0 failure(s)
Postproc #5 [meta_schedule.VerifyGPUCode(0x45933f8)]: 1685504 failure(s)
2022-08-05 12:36:15.803 INFO Sample-Init-Population summary:
Postproc #0 [meta_schedule.DisallowDynamicLoop(0x423d1d8)]: 0 failure(s)
Postproc #1 [meta_schedule.RewriteCooperativeFetch(0xf2ad228)]: 0 failure(s)
Postproc #2 [meta_schedule.RewriteUnboundBlock(0xf2ad258)]: 0 failure(s)
Postproc #3 [meta_schedule.RewriteParallelVectorizeUnroll(0x3d6c8e8)]: 0 failure(s)
Postproc #4 [meta_schedule.RewriteReductionBlock(0x4d449e8)]: 0 failure(s)
Postproc #5 [meta_schedule.VerifyGPUCode(0x45933f8)]: 1687552 failure(s)
2022-08-05 12:36:17.411 INFO Sample-Init-Population summary:
Postproc #0 [meta_schedule.DisallowDynamicLoop(0x423d1d8)]: 0 failure(s)
Postproc #1 [meta_schedule.RewriteCooperativeFetch(0xf2ad228)]: 0 failure(s)
Postproc #2 [meta_schedule.RewriteUnboundBlock(0xf2ad258)]: 0 failure(s)
Postproc #3 [meta_schedule.RewriteParallelVectorizeUnroll(0x3d6c8e8)]: 0 failure(s)
Postproc #4 [meta_schedule.RewriteReductionBlock(0x4d449e8)]: 0 failure(s)
Postproc #5 [meta_schedule.VerifyGPUCode(0x45933f8)]: 1689600 failure(s)
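To make the trend in the excerpt above easier to see, here is a minimal sketch that pulls the VerifyGPUCode failure counts out of the log tail and computes how much they grow between consecutive Sample-Init-Population summaries. The helper name `verify_gpu_failures` and the two regexes are my own and only assume the line format shown in the excerpt; this is not part of TVM. In the three summaries above, the count grows by 2048 each time, roughly every 1.6 seconds, while all other postprocessors report 0 failures.

```python
import re
from datetime import datetime

# Trimmed copy of the log tail above: only the summary header and the
# VerifyGPUCode line of each summary are kept.
LOG_TAIL = """\
2022-08-05 12:36:14.188 INFO Sample-Init-Population summary:
Postproc #5 [meta_schedule.VerifyGPUCode(0x45933f8)]: 1685504 failure(s)
2022-08-05 12:36:15.803 INFO Sample-Init-Population summary:
Postproc #5 [meta_schedule.VerifyGPUCode(0x45933f8)]: 1687552 failure(s)
2022-08-05 12:36:17.411 INFO Sample-Init-Population summary:
Postproc #5 [meta_schedule.VerifyGPUCode(0x45933f8)]: 1689600 failure(s)
"""

# Assumed line formats, taken from the excerpt (not from any TVM spec).
TS_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) INFO "
    r"Sample-Init-Population summary:"
)
FAIL_RE = re.compile(
    r"meta_schedule\.VerifyGPUCode\(0x[0-9a-f]+\)\]: (\d+) failure\(s\)"
)


def verify_gpu_failures(log_text):
    """Return (timestamp, failure_count) pairs, one per summary block."""
    samples = []
    current_ts = None
    for line in log_text.splitlines():
        header = TS_RE.match(line)
        if header:
            current_ts = datetime.strptime(header.group(1), "%Y-%m-%d %H:%M:%S.%f")
            continue
        failure = FAIL_RE.search(line)
        if failure and current_ts is not None:
            samples.append((current_ts, int(failure.group(1))))
    return samples


samples = verify_gpu_failures(LOG_TAIL)
# Growth in failure count between consecutive summaries.
deltas = [later[1] - earlier[1] for earlier, later in zip(samples, samples[1:])]
# Seconds elapsed between consecutive summaries.
gaps = [(later[0] - earlier[0]).total_seconds()
        for earlier, later in zip(samples, samples[1:])]

for delta, gap in zip(deltas, gaps):
    print(f"+{delta} VerifyGPUCode failures in {gap:.3f} s")
```

A steadily growing count with no successful candidates would be consistent with the sampler retrying indefinitely, but that is an inference from the log rather than something confirmed here.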