Optimize parallel test execution for unit tests #30705
Conversation
This one is the next optimisation after #29223, and it should bring our "main" test execution time on self-hosted runners below 20 minutes (I hope - let's see). Over the last couple of months the Amazon tests grew, and with moto being a powerful (but rather slow) mocking framework, they started to take the biggest chunk of Provider test time, turning the Providers test type into by far the longest one to run when tests are run in parallel on our self-hosted runners. Looking at the test output, I saw that often the last few minutes of a run were only the "Providers" test type still running, while all the others had completed. This brought utilisation of our 8-CPU machines down quite a lot and increased the elapsed execution time. With this change, which separates amazon and google out from the "other" provider tests, the test types should become more-or-less even again (with amazon likely still being the slowest), so machine utilization should get back to close to 100% all the time and elapsed time should go down.
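To illustrate the splitting idea, here is a minimal hypothetical sketch - the function and chunk names are made up, and this is not the actual selective_checks code:

```python
# Hypothetical sketch: carve the slowest providers (amazon, google) out
# of the single "Providers" test type so each becomes its own parallel
# unit, leaving everything else as one "other" chunk.
def split_provider_tests(providers: set[str]) -> list[str]:
    chunks = []
    if "amazon" in providers:
        chunks.append("Providers-amazon")
    if "google" in providers:
        chunks.append("Providers-google")
    if providers - {"amazon", "google"}:
        chunks.append("Providers-other")
    return chunks

print(split_provider_tests({"amazon", "google", "http", "ftp"}))
# ['Providers-amazon', 'Providers-google', 'Providers-other']
```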
I also fixed a few small things I found along the way: an unused flag on parallel tests, more accurate unit tests for selective_checks (I found they were doing non-full-line matching on the output), and moving the sorting order for parallel tests into selective checks (which is better anyway, as the unit tests now cover it too).
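As a made-up illustration of why non-full-line matching is too permissive (this is not the real selective_checks test code, just the general pattern):

```python
# Made-up example: a substring assertion passes on output it was never
# meant to match, while a full-entry comparison does not.
output_lines = ["Providers-amazon", "Providers-google", "Providers-other"]
output = " ".join(output_lines)

assert "Providers" in output             # loose: passes although no test
                                         # type is literally "Providers"
assert "Providers" not in output_lines   # strict: comparing whole entries
                                         # correctly distinguishes them
```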
I need to fix the mssql tests with this one - but it seems my observation and optimisation were right. This change has a great impact: the full test suite on our self-hosted runners goes down from 18 minutes to 12 minutes, which makes it roughly 30% faster.
We can now remove this whole part - each test chunk is "smaller", so the memory issues we experienced in the past should be far less frequent.
This parameter was unused.
We needed that to handle several Providers* test types running in parallel - each parallel docker-compose needs a unique volume name.
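For illustration, a minimal sketch of the isolation trick (the service and chunk names are made up): docker-compose prefixes default volume names with the project name, so giving each parallel chunk its own `--project-name` keeps their volumes from clashing.

```bash
# Sketch only: one isolated compose project per test chunk; the project
# name prefix makes each chunk's default volume names unique.
for chunk in providers-amazon providers-google providers-other; do
  docker-compose --project-name "airflow-test-${chunk}" up --detach &
done
wait
```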
This is all pretty specific... Is running tests in parallel for all providers, regardless, not good?
We cannot run tests in parallel, because far too many of our tests rely on a shared database (for example, connections are not mocked, DagRuns are created, etc. etc.). Simply speaking, a HUGE percentage of our tests are not pure unit tests with everything mocked; they rely on a shared database being there (and they prepare data, use it, and sometimes delete it and sometimes not). We even run all our tests WITH a specific database. New tests that continue using the shared database are added/modified/updated in every PR.
If we ran them in parallel, the tests would start overriding each other's data in the database.
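A tiny self-contained illustration of the clash, using plain sqlite3 rather than Airflow's real models or fixtures:

```python
import sqlite3

# Two tests sharing one database: whichever inserts 'run1' second hits
# a UNIQUE constraint error, and test_a's DELETE can wipe data that
# test_b still depends on if they interleave.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag_run (run_id TEXT PRIMARY KEY)")

def test_a():
    conn.execute("DELETE FROM dag_run")
    conn.execute("INSERT INTO dag_run VALUES ('run1')")

def test_b():
    conn.execute("INSERT INTO dag_run VALUES ('run1')")

test_a()
test_b()  # sqlite3.IntegrityError: UNIQUE constraint failed: dag_run.run_id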
So we can either:
-
review our 12,500 tests, separate the "real unit tests" from the "DB tests", and add mechanisms to keep the separation - then we would be able to parallelise the "real unit tests". Possibly even rewrite the tests to be "real unit tests" and mock the DB access.
-
or do what we are doing - i.e. split the tests into more-or-less equal chunks (in terms of execution time) and run them sequentially, each test type with its own database (this is what we do now - see the sketch below).
Option 1) seems to require an enormous effort - but if you (or anyone) would like to take on the task, it is a good idea. I would love to have it, but it seems not feasible (though I would love to be proven wrong).
Option 2) is to make a deliberate effort to split the tests and balance-optimize them from time to time with a few hours' effort and a custom parallel-running framework (this is what we have now).
Option 3) ... I do not see a 3rd option.
But maybe there is one? Curious to hear your thoughts :)
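A rough bash sketch of option 2 under stated assumptions (the environment variable name and test paths are illustrative, not the real CI scripts): each chunk runs its tests sequentially against its own database, while the chunks themselves run concurrently.

```bash
# Illustrative only: not the actual CI scripts.
for chunk in providers_amazon providers_google core; do
  (
    export SQL_ALCHEMY_CONN="postgresql://airflow@localhost/test_${chunk}"
    pytest "tests/${chunk}"    # sequential inside the chunk
  ) &
done
wait   # chunks run in parallel, each against its own database
```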
BTW. If you want to see what's going on when you try to parallelize our tests, I encourage you to:
1) enter breeze with a lot of CPUs available
2) run `pytest -n auto tests` (see below)
3) have fun
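For reference, the commands from the list above (`-n auto` comes from the pytest-xdist plugin and spawns one worker per available CPU; assuming `breeze shell` as the entrypoint - adjust the CPUs available to Docker in its settings):

```bash
breeze shell          # enter the breeze container environment
pytest -n auto tests  # pytest-xdist: one worker per CPU
```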
And yes, that question gets asked from time to time (including by myself) :). The answer has not changed since the day I joined the project.
cc: @vandonr-amz - I just ran `pytest -n auto tests` for the fun of it. This has always been an interesting experience (and pretty much always a different one).
This time it failed at pytest collection:
2 skipped, 2520 warnings, 407 errors in 313.56s (0:05:13)
Most of them:
E botocore.exceptions.ClientError: An error occurred (UnrecognizedClientException) when calling the GetParameter operation: The security token included in the request is invalid.
Seems that the AWS tests have problems even being collected in parallel :)
BTW. I am really not trying to mock you :D - it's just that our tests REALLY are meant to be collected and executed sequentially, and it would be a HUGE effort to change that. That's why I figured the more feasible way to utilise multiple CPUs - one that does not cost an enormous amount of engineering time (not to mention the mental breakdown while doing so) - is to split the tests into sequential chunks that are run in parallel, each with its own database and in an isolated container.
This is a way to keep sanity.
Oh yeah, ok, I was not aware of those shortcomings. I'm just a bit wary of introducing some `if amazon:` in the code, but I understand how the situation here is... special ^^
It is :)
But maybe some day someone will make it less specific :). I just had a conversation where we discussed how things worked when I joined the project 5 years ago... and well... A LOT has changed since. So maybe in a few years' time :D. Who knows.
Now we only need #29223 to get the full speedup/feedback (without it, the sdist test takes 27 minutes on main).
We are running the tests in parallel test types in order to speed
up their execution. However, some test types and subsets of tests
take far longer to execute than other test types.
The longest tests to run are the Providers and WWW tests, and the
longest tests within Providers are by far the Amazon tests, then
Google. "All other" Provider tests take about the same time as the
Amazon tests - also, after splitting the provider tests, the Core
tests take the longest time.
When we run tests in parallel on multiple CPUs, often the
longest-running tests remain running on their own while the other
CPUs sit idle. We could run a separate test type per provider, but
the overhead of starting the database and collecting and
initializing tests for each of them is too big to achieve speedups
- especially for Public runners, where having 80 separate databases
with 80 subsequent container runs is slower than running all
Provider tests together.
However, we can split the Provider tests into a smaller number of
chunks and prioritize running the long chunks first. This should
improve the effect of parallelisation and the utilization of our
multi-CPU machines.
This PR aims to do that:
* Split Provider tests (if amazon or google are part of the
provider tests) into amazon, google, and all-other chunks
* Move sorting of the test types to selective_check, to sort the
test types according to expected running time (the longest tests
to run are added first)
This should improve the CPU utilization of our multi-CPU runners
and make the tests involving the complete Provider set (or even sets
containing amazon, google and a few other providers) execute quite a
few minutes faster on average.
We could also get rid of some sequential processing for the Public
PRs, because each test type we run will be less demanding overall.
We used to get a lot of 137 exit codes (memory errors), but with
Providers split out, the risk of resources being exhausted by two
test types running in parallel is low.
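To make the sorting step concrete, a hypothetical sketch - the chunk names and minute estimates are made up, not measured values, and this is not the actual selective_check code:

```python
# Hypothetical: schedule the chunks expected to run longest first, so a
# slow chunk never starts last and finishes alone at the end of the run.
EXPECTED_MINUTES = {
    "Providers-amazon": 12,
    "Core": 10,
    "Providers-other": 9,
    "WWW": 8,
    "Providers-google": 7,
}

def sort_longest_first(test_types: list[str]) -> list[str]:
    return sorted(test_types, key=lambda t: EXPECTED_MINUTES.get(t, 0), reverse=True)

print(sort_longest_first(["WWW", "Providers-google", "Providers-amazon", "Core"]))
# ['Providers-amazon', 'Core', 'WWW', 'Providers-google']
```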