@comaniac
Contributor

@comaniac comaniac commented Oct 21, 2020

  • Disable the failing check when reading the log file.
  • Add logging to track why the log is invalid on CI.

We can merge this PR first if it passes the CI. Then we can have a follow-up PR to fix the root cause and remove the logging.

cc @merrymercy @jcf94 @junrushao1994

@comaniac comaniac changed the title [Bugfix] Auto scheduler tutorial failure on CI [DO NOT MERGE][Bugfix] Auto scheduler tutorial failure on CI Oct 21, 2020
@comaniac comaniac changed the title [DO NOT MERGE][Bugfix] Auto scheduler tutorial failure on CI [Bugfix] Auto scheduler tutorial failure on CI Oct 21, 2020
@jroesch
Member

jroesch commented Oct 21, 2020

@comaniac thanks for doing this!

@junrushao
Member

Yeah, we should probably unblock the CI first; then we can work together to find the root cause in exactly the same Docker environment that the CI uses.

@comaniac
Contributor Author

Although CI still failed, I've finally got the incorrect JSON. Here are the instructions to reproduce:

  1. Put the following line into conv2d.json.
{"i": [["[\"conv2d_layer\", 1, 7, 7, 512, 512, 3, 3, [1, 1], [1, 1]]", "cuda -keys=cuda,gpu -max_num_threads=1024 -thread_warp_size=32", [-1, 16, 64, 49152, 65536, 1024, 8, 32]], [[], [["CI", 5], ["SP", 3, 0, 1, [1, 1, 1, 1], 1], ["SP", 3, 5, 512, [4, 32, 1, 1], 1], ["SP", 3, 10, 7, [1, 1, 1, 7], 1], ["SP", 3, 15, 7, [1, 1, 7, 1], 1], ["SP", 3, 20, 512, [8, 1], 1], ["SP", 3, 23, 3, [1, 1], 1], ["SP", 3, 26, 3, [1, 3], 1], ["RE", 3, [0, 5, 10, 15, 1, 6, 11, 16, 2, 7, 12, 17, 20, 23, 26, 21, 24, 27, 3, 8, 13, 18, 22, 25, 28, 4, 9, 14, 19]], ["FSP", 6, 0, 1, 3], ["FSP", 6, 4, 2, 3], ["FSP", 6, 8, 3, 3], ["FSP", 6, 12, 4, 3], ["RE", 6, [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]], ["CA", 3, 6, 11], ["CHR", 2, "shared", [3]], ["CA", 3, 4, 14], ["CHR", 1, "shared", [4]], ["CA", 2, 5, 14], ["CI", 1], ["FU", 8, [0, 1, 2, 3]], ["AN", 8, 0, 5], ["FU", 8, [1, 2, 3, 4]], ["AN", 8, 1, 4], ["FU", 8, [2, 3, 4, 5]], ["AN", 8, 2, 6], ["FU", 4, [0, 1, 2, 3]], ["SP", 4, 0, 16, [1], 1], ["AN", 4, 1, 2], ["FFSP", 4, 0, [4, 3, 2, 1], 1, 1], ["AN", 4, 1, 6], ["FU", 2, [0, 1, 2, 3]], ["SP", 2, 0, 784, [7], 1], ["AN", 2, 1, 2], ["FFSP", 2, 0, [4, 3, 2, 1], 1, 1], ["AN", 2, 1, 6], ["PR", 5, 0, "auto_unroll_max_step$0"]]]], "r": [[0.001027474], 0, 1.97181, 1603158399], "v": "v0.2"}
  2. Run the following code.
import tvm
from tvm import auto_scheduler
# Load the best record from the log file written in step 1.
auto_scheduler.load_best("conv2d.json")

I'll try to fix it tomorrow. @jcf94 will help investigate as well.
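
Before handing a log file to `load_best`, a quick stdlib-only check can show whether every line parses as JSON and which `"v"` value each record carries. The following is a minimal sketch, assuming one record per line with the top-level keys `"i"`, `"r"`, and `"v"` seen in the record above; `summarize_log` is a hypothetical helper, not part of TVM.

```python
import json
from collections import Counter


def summarize_log(lines):
    """Count records per "v" field and collect lines that do not fit.

    Assumes one JSON record per line with top-level keys "i", "r", "v",
    as in the record above. Hypothetical helper, not a TVM API.
    """
    versions = Counter()
    bad_lines = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad_lines.append(lineno)  # not valid JSON at all
            continue
        if not isinstance(record, dict) or not {"i", "r", "v"} <= record.keys():
            bad_lines.append(lineno)  # valid JSON, but not the expected shape
            continue
        versions[record["v"]] += 1
    return versions, bad_lines
```

It could be used as `with open("conv2d.json") as f: print(summarize_log(f))`; more than one key in the returned counter would suggest that records of different formats share one file.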

@jcf94
Contributor

jcf94 commented Oct 21, 2020

@comaniac Thanks!

@junrushao
Member

Nice, we finally have the JSON file to reproduce it!

@jcf94
Contributor

jcf94 commented Oct 21, 2020

Nice, we finally have the JSON file to reproduce it!

😢 Unfortunately, it seems we still can't reproduce this bug locally. I still can't figure out how this log was generated.

@comaniac
Contributor Author

I disabled the tutorials to make the CI green first. Will file another PR to fix the issue.

@comaniac
Contributor Author

We found the root cause: the log file generated by the tutorial is never removed, so each CI run appends several lines of log to the same file. On top of that, #6671 changed the log format and appended records in the new format to that file, which is then read by other CI runs. After this PR, I'll file another PR to make sure every CI run is independent.
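
One way to keep runs independent, sketched here under assumptions, is to remove any leftover log before tuning so that new records are never appended to old ones. `fresh_log` is a hypothetical helper and not part of TVM; the follow-up PR may well take a different approach.

```python
import os


def fresh_log(path):
    """Remove a leftover log file so this run starts from an empty log.

    Hypothetical helper for illustration; returns the path unchanged so it
    can be passed straight to whatever writes the log.
    """
    try:
        os.remove(path)
    except FileNotFoundError:
        pass  # nothing left over from a previous run
    return path
```

Calling `fresh_log("conv2d.json")` at the top of a tutorial script would discard records written by earlier runs, at the cost of losing any tuning history that someone wanted to keep.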

@tqchen tqchen merged commit 9ae386c into apache:main Oct 21, 2020
@comaniac comaniac deleted the ansor_fix_ci branch October 21, 2020 21:04
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Oct 29, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 2, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 4, 2020
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Dec 4, 2020