Optimize parallel test execution for unit tests #30705
Conversation
This one is the next optimisation after #29223, and it should bring our "main" test execution time on self-hosted runners below 20 minutes (I hope - let's see). Over the last couple of months the Amazon tests grew, and with moto being a powerful (but rather slow) mocking framework, they started to take the biggest chunk of Provider test time, turning the Providers test type into by far the longest one to run when tests are run in parallel on our self-hosted runners. Looking at the test output, I saw that often the last few minutes of a run were only the "Providers" test type still running, while all the others had completed. This brought utilisation of our 8-CPU machines down quite a lot and increased the elapsed execution time. With this change, which separates amazon and google out from the "other" provider tests, the test types should become more-or-less even again (with amazon likely still being the slowest), so machine utilization should get back to close to 100% all the time and elapsed time should go down.
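To illustrate the splitting idea, here is a minimal hypothetical sketch - the function and chunk names are made up, and this is not the actual selective_checks code:

```python
# Hypothetical sketch: carve the slowest providers (amazon, google) out
# of the single "Providers" test type so each becomes its own parallel
# unit, leaving everything else as one "other" chunk.
def split_provider_tests(providers: set[str]) -> list[str]:
    chunks = []
    if "amazon" in providers:
        chunks.append("Providers-amazon")
    if "google" in providers:
        chunks.append("Providers-google")
    if providers - {"amazon", "google"}:
        chunks.append("Providers-other")
    return chunks

print(split_provider_tests({"amazon", "google", "http", "ftp"}))
# ['Providers-amazon', 'Providers-google', 'Providers-other']
```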
I also fixed a few small things I found along the way: an unused flag on parallel tests, more accurate unit tests for selective_checks (I found they were doing non-full-line matching on the output), and moving the sorting order for parallel tests into selective checks (which is better anyway, as the unit tests now cover it too).
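As a made-up illustration of why non-full-line matching is too permissive (this is not the real selective_checks test code, just the general pattern):

```python
# Made-up example: a substring assertion passes on output it was never
# meant to match, while a full-entry comparison does not.
output_lines = ["Providers-amazon", "Providers-google", "Providers-other"]
output = " ".join(output_lines)

assert "Providers" in output             # loose: passes although no test
                                         # type is literally "Providers"
assert "Providers" not in output_lines   # strict: comparing whole entries
                                         # correctly distinguishes them
```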
I need to fix the mssql tests with this one - but it seems my observation and optimisation were right. This change has a great impact: the full test suite on our self-hosted runners goes down from 18 minutes to 12 minutes, which makes it roughly 30% faster.
We can now remove this whole part - each test chunk is "smaller", so the memory issues we experienced in the past should be far less frequent.
This parameter was unused.
We needed that to handle several Providers* test types running in parallel - each parallel docker-compose needs a unique volume name.
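For illustration, a minimal sketch of the isolation trick (the service and chunk names are made up): docker-compose prefixes default volume names with the project name, so giving each parallel chunk its own `--project-name` keeps their volumes from clashing.

```bash
# Sketch only: one isolated compose project per test chunk; the project
# name prefix makes each chunk's default volume names unique.
for chunk in providers-amazon providers-google providers-other; do
  docker-compose --project-name "airflow-test-${chunk}" up --detach &
done
wait
```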
This is all pretty specific... Is running tests in parallel for all providers, regardless, not good?
We cannot run tests in parallel, because far too many of our tests rely on a shared database (for example, connections are not mocked, DagRuns are created, etc. etc.). Simply speaking, a HUGE percentage of our tests are not pure unit tests with everything mocked; they rely on a shared database being there (and they prepare data, use it, and sometimes delete it and sometimes not). We even run all our tests WITH a specific database. New tests that continue using the shared database are added/modified/updated in every PR.
If we ran them in parallel, the tests would start overriding each other's data in the database.
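A tiny self-contained illustration of the clash, using plain sqlite3 rather than Airflow's real models or fixtures:

```python
import sqlite3

# Two tests sharing one database: whichever inserts 'run1' second hits
# a UNIQUE constraint error, and test_a's DELETE can wipe data that
# test_b still depends on if they interleave.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag_run (run_id TEXT PRIMARY KEY)")

def test_a():
    conn.execute("DELETE FROM dag_run")
    conn.execute("INSERT INTO dag_run VALUES ('run1')")

def test_b():
    conn.execute("INSERT INTO dag_run VALUES ('run1')")

test_a()
test_b()  # sqlite3.IntegrityError: UNIQUE constraint failed: dag_run.run_id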
So we can either:
-
review our 12,500 tests, separate the "real unit tests" from the "DB tests", and add mechanisms to keep the separation - then we would be able to parallelise the "real unit tests". Possibly even rewrite the tests to be "real unit tests" and mock the DB access.
-
or do what we are doing - i.e. split the tests into more-or-less equal chunks (in terms of execution time) and run them sequentially, each test type with its own database (this is what we do now - see the sketch below).
Option 1) seems to require an enormous effort - but if you (or anyone) would like to take on the task, it is a good idea. I would love to have it, but it seems not feasible (though I would love to be proven wrong).
Option 2) is to make a deliberate effort to split the tests and balance-optimize them from time to time with a few hours' effort and a custom parallel-running framework (this is what we have now).
Option 3) ... I do not see a 3rd option.
But maybe there is one? Curious to hear your thoughts :)
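A rough bash sketch of option 2 under stated assumptions (the environment variable name and test paths are illustrative, not the real CI scripts): each chunk runs its tests sequentially against its own database, while the chunks themselves run concurrently.

```bash
# Illustrative only: not the actual CI scripts.
for chunk in providers_amazon providers_google core; do
  (
    export SQL_ALCHEMY_CONN="postgresql://airflow@localhost/test_${chunk}"
    pytest "tests/${chunk}"    # sequential inside the chunk
  ) &
done
wait   # chunks run in parallel, each against its own database
```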
BTW. If you want to see what's going on when you try to parallelize our tests, I encourage you to:
1) enter breeze with a lot of CPUs available
2) run `pytest -n auto tests` (see below)
3) have fun
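For reference, the commands from the list above (`-n auto` comes from the pytest-xdist plugin and spawns one worker per available CPU; assuming `breeze shell` as the entrypoint - adjust the CPUs available to Docker in its settings):

```bash
breeze shell          # enter the breeze container environment
pytest -n auto tests  # pytest-xdist: one worker per CPU
```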
And yes, that question gets asked from time to time (including by myself) :). The answer has not changed since the day I joined the project.
cc: @vandonr-amz - I just ran `pytest -n auto tests` for the fun of it. This has always been an interesting experience (and pretty much always a different one).
This time it failed at pytest collection:
2 skipped, 2520 warnings, 407 errors in 313.56s (0:05:13)
Most of them:
E botocore.exceptions.ClientError: An error occurred (UnrecognizedClientException) when calling the GetParameter operation: The security token included in the request is invalid.
Seems that the AWS tests have problems even being collected in parallel :)
BTW. I am really not trying to mock you :D - it's just that our tests REALLY are meant to be collected and executed sequentially, and it would be a HUGE effort to change that. That's why I figured the more feasible way to utilise multiple CPUs - one that does not cost an enormous amount of engineering time (not to mention the mental breakdown while doing so) - is to split the tests into sequential chunks that are run in parallel, each with its own database and in an isolated container.
This is a way to keep sanity.
Oh yeah, ok, I was not aware of those shortcomings. I'm just a bit wary of introducing some `if amazon:` in the code, but I understand how the situation here is... special ^^
It is :)
But maybe some day someone will make it less specific :). I just had a conversation where we discussed how things worked when I joined the project 5 years ago... and well... A LOT has changed since. So maybe in a few years' time :D. Who knows.
Now we only need #29223 to get the full speedup/feedback (without it, the sdist test takes 27 minutes on main).
We are running the tests in parallel test types in order to speed
up their execution. However, some test types and subsets of tests
take far longer to execute than other test types.
The longest tests to run are the Providers and WWW tests, and the
longest tests within Providers are by far the Amazon tests, then
Google. "All other" Provider tests take about the same time as the
Amazon tests - also, after splitting the provider tests, the Core
tests take the longest time.
When we run tests in parallel on multiple CPUs, often the
longest-running tests remain running on their own while the other
CPUs sit idle. We could run a separate test type per provider, but
the overhead of starting the database and collecting and
initializing tests for each of them is too big to achieve speedups
- especially for Public runners, where having 80 separate databases
with 80 subsequent container runs is slower than running all
Provider tests together.
However, we can split the Provider tests into a smaller number of
chunks and prioritize running the long chunks first. This should
improve the effect of parallelisation and the utilization of our
multi-CPU machines.
This PR aims to do that:
* Split Provider tests (if amazon or google are part of the
provider tests) into amazon, google, and all-other chunks
* Move sorting of the test types to selective_check, to sort the
test types according to expected running time (the longest tests
to run are added first)
This should improve the CPU utilization of our multi-CPU runners
and make the tests involving the complete Provider set (or even sets
containing amazon, google and a few other providers) execute quite a
few minutes faster on average.
We could also get rid of some sequential processing for the Public
PRs, because each test type we run will be less demanding overall.
We used to get a lot of 137 exit codes (memory errors), but with
Providers split out, the risk of resources being exhausted by two
test types running in parallel is low.
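To make the sorting step concrete, a hypothetical sketch - the chunk names and minute estimates are made up, not measured values, and this is not the actual selective_check code:

```python
# Hypothetical: schedule the chunks expected to run longest first, so a
# slow chunk never starts last and finishes alone at the end of the run.
EXPECTED_MINUTES = {
    "Providers-amazon": 12,
    "Core": 10,
    "Providers-other": 9,
    "WWW": 8,
    "Providers-google": 7,
}

def sort_longest_first(test_types: list[str]) -> list[str]:
    return sorted(test_types, key=lambda t: EXPECTED_MINUTES.get(t, 0), reverse=True)

print(sort_longest_first(["WWW", "Providers-google", "Providers-amazon", "Core"]))
# ['Providers-amazon', 'Core', 'WWW', 'Providers-google']
```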