This repository was archived by the owner on Nov 17, 2023. It is now read-only.


@akarbown
Contributor

Description

This PR proposes using the pytest-parallel plugin as a temporary workaround (WA) for the test slowdown (#20092) that appeared after the removal of LLVM OpenMP.
Using the pytest-parallel plugin (instead of pytest-xdist):

  • requires the Python-related workaround described here. Hence the '--timeout-method=thread' option on the command line and the fixes in the dataloader.py file.
  • might result in deadlocks, as some of the tests may not be thread-safe. The one I've encountered so far is test_ndarray_saveload, which is why I've marked that test as serial. There might be others, but finding them definitely needs longer testing.
  • makes the tests run faster than with pytest-xdist.
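
As a rough illustration of the points above (not the actual CI change in this PR), the test run could be split into a threaded pass that skips tests marked as serial and a plain pass that runs only those tests. The sketch assumes pytest-parallel's `--workers` and `--tests-per-worker` options and pytest-timeout's `--timeout` and `--timeout-method=thread` options; the helper name `build_pytest_commands`, the worker counts, the timeout value, and the test directory are placeholders chosen for the example.

```python
import shlex

# Marker name for tests that are not thread-safe (e.g. test_ndarray_saveload).
# Assumed here to be registered as a custom pytest marker called "serial".
SERIAL_MARKER = "serial"


def build_pytest_commands(test_dir, workers=4, tests_per_worker=2, timeout_s=1200):
    """Return two pytest argument lists: a threaded pass and a serial pass.

    All numeric defaults are illustrative placeholders, not tuned values.
    """
    # pytest-timeout: use the thread-based method in both passes.
    common = ["pytest", f"--timeout={timeout_s}", "--timeout-method=thread"]

    # Pass 1: pytest-parallel runs everything that is NOT marked serial.
    parallel = common + [
        "-m", f"not {SERIAL_MARKER}",
        "--workers", str(workers),
        "--tests-per-worker", str(tests_per_worker),
        test_dir,
    ]

    # Pass 2: tests marked serial run one at a time, without pytest-parallel options.
    serial = common + ["-m", SERIAL_MARKER, test_dir]
    return parallel, serial


if __name__ == "__main__":
    # The directory below is a placeholder path for the example.
    for cmd in build_pytest_commands("tests/python/unittest"):
        print(shlex.join(cmd))
```

Keeping the serial tests in a separate invocation means a thread-unsafe test never shares a process with the threaded workers, at the cost of one extra pytest start-up.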

For now, this is only a draft to check and test the fix in production. As a next step, I'll try to find the root cause of the slowdown when MXNet is compiled with GNU OpenMP.

Comments

Where can I find an installation guide or other documentation that should mention this change? I'm asking because I couldn't find any.

@mxnet-bot

Hey @akarbown , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [windows-cpu, miscellaneous, sanity, centos-gpu, unix-cpu, edge, clang, unix-gpu, centos-cpu, windows-gpu, website]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@mseth10 mseth10 added the pr-work-in-progress PR is still work in progress label Jun 15, 2021
@akarbown akarbown force-pushed the openmp-slowdown branch 3 times, most recently from efe41a9 to acb4be5 Compare June 15, 2021 13:38
Using pytest-parallel makes the testing times be more or less equal
for LLVM OpenMP & GNU OpenMP.
@akarbown
Contributor Author

@mxnet-bot run ci [unix-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-gpu]

@Zha0q1
Contributor

Zha0q1 commented Jun 21, 2021

hi @akarbown , what's the status of this pr? Thanks!

@akarbown
Contributor Author

I guess that @bgawrych fixed the issue in PR #20367. However, there may still be potential to speed up the CI, though working out the details will take more time and effort. So I'm wondering whether this PR is still useful and I should dig into it further (in particular the GPU cases), or whether I should withdraw it. @Zha0q1, what do you think?

@Zha0q1
Contributor

Zha0q1 commented Jun 22, 2021

> I guess that @bgawrych fixed the issue in PR #20367. However, there may still be potential to speed up the CI, though working out the details will take more time and effort. So I'm wondering whether this PR is still useful and I should dig into it further (in particular the GPU cases), or whether I should withdraw it. @Zha0q1, what do you think?

Thanks for the reply! Do we know whether the issue fixed in #20367 was the only cause of the slowdown? I think a faster CI would be favorable, but I can monitor the CI runtime dashboard for a few days and report back.

