Conversation

@potiuk potiuk commented Apr 18, 2023

We run the tests in parallel test types in order to speed
up their execution. However, some test types and subsets of tests
take far longer to execute than others.

The longest test types to run are Providers and WWW, and the
longest tests within Providers are by far the Amazon tests,
followed by Google. The "All Other" Provider tests take about the
same time as the Amazon tests - and after splitting the provider
tests, Core tests take the longest.

When we run tests in parallel on multiple CPUs, the
longest-running tests often remain running on their own while the
other CPUs sit idle. We could run a separate test type
per provider, but the overhead of starting the database and collecting
and initializing tests for each of them is too big to achieve any
speedup - especially on Public runners, where 80 separate
databases with 80 subsequent container runs would be slower than
running all Provider tests together.

However, we can split the Provider tests into a smaller number of
chunks and prioritize running the long chunks first. This
should improve the effect of parallelisation and the utilization of
our multi-CPU machines.

This PR aims to do that:

  • Split the Provider tests (when amazon or google are among the
    selected provider tests) into amazon, google and all-other chunks

  • Move sorting of the test types to selective_checks, so that the
    test types are sorted by expected running time (the longest
    tests are added first)
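The splitting step could be sketched roughly like this (the function name and the chunk-naming convention are illustrative, not the actual selective_checks code):

```python
def split_provider_tests(selected_providers: set[str]) -> list[str]:
    """Illustrative sketch: carve amazon and google out of the
    Providers test type, leaving an "all other" chunk."""
    big_providers = ("amazon", "google")
    chunks = []
    remaining = set(selected_providers)
    for provider in big_providers:
        if provider in remaining:
            chunks.append(f"Providers[{provider}]")
            remaining.discard(provider)
    if remaining:
        # exclude the already-split providers from the "all other" chunk
        excluded = ",".join(
            f"-{p}" for p in big_providers if p in selected_providers
        )
        chunks.append(f"Providers[{excluded}]" if excluded else "Providers")
    return chunks
```

For example, selecting amazon, google and one small provider would yield three chunks that can be scheduled independently, while a selection without the two big providers stays as a single Providers chunk.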

This should improve the CPU utilization of our multi-CPU runners
and make test runs involving the complete Provider set (or even sets
containing amazon, google and a few other providers)
execute quite a few minutes faster on average.

We can also get rid of some sequential processing for Public PRs,
because each test type we run will be less demanding overall. We
used to get a lot of 137 exit codes (memory errors), but with the
Providers split out, the risk of exhausting resources by two test
types running in parallel is low.
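The effect of running the longest chunks first can be illustrated with a tiny greedy-scheduling simulation (the per-chunk minutes below are made up for illustration, not measured):

```python
import heapq

def makespan(durations, workers):
    """Greedy list scheduling: each chunk goes to the worker that
    frees up first; returns the total elapsed (wall-clock) time."""
    finish_times = [0] * workers
    heapq.heapify(finish_times)
    for d in durations:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + d)
    return max(finish_times)

chunks = [2, 3, 4, 5, 6, 18]  # made-up minutes; the big chunk lands last
print(makespan(chunks, 2))                        # submitted as-is: 26
print(makespan(sorted(chunks, reverse=True), 2))  # longest first: 20
```

When the biggest chunk starts last, it dangles alone at the end of the run; starting it first lets the shorter chunks fill in around it.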



Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.


potiuk commented Apr 18, 2023

This one is the next optimisation after #29223 - and it should bring our "main" test execution time on self-hosted runners below 20 minutes (I hope - let's see).

Over the last couple of months the Amazon tests grew, and with moto being a powerful (but rather slow) mocking framework, they started to take the biggest chunk of Provider test time, turning the Providers test type into by far the longest one to run when tests are executed in parallel on our self-hosted runners. Looking at the test output, I saw that often during the last few minutes only the "Providers" test type was still running while all the others had completed. This brought the utilisation of our 8-CPU machines down quite a lot and increased the elapsed execution time.

With this change, where I separate amazon and google out from the "other" provider tests, we should get back to a situation where the test types are more-or-less even (with amazon likely still being the slowest), so that machine utilization can get back to close to 100% all the time and elapsed time should go down.


potiuk commented Apr 18, 2023

I also fixed a few small things I found: an unused flag on parallel tests, more accurate unit tests for selective_checks (I found it was doing non-full-line matching on the output), and moving the sorting order for parallel tests to selective checks (which is better, as the unit tests now cover it too).

@potiuk potiuk force-pushed the optimize-provider-tests-execution branch from 377f1a9 to e67a47f Compare April 18, 2023 12:18

potiuk commented Apr 18, 2023

I need to fix the mssql tests with this one - but it seems my observation and optimisation were right.

This change has a great impact: the full test suite on our self-hosted runners goes down from 18 minutes to 12 minutes, which makes it about a third faster.

@potiuk potiuk requested a review from o-nikolas April 18, 2023 13:07
@potiuk potiuk force-pushed the optimize-provider-tests-execution branch 3 times, most recently from af5ede7 to a314440 Compare April 18, 2023 14:57
@potiuk potiuk added the use public runners Makes sure that Public runners are used even if commiters creates the PR (useful for testing) label Apr 18, 2023
@potiuk potiuk force-pushed the optimize-provider-tests-execution branch 3 times, most recently from 8f2d686 to 025fabc Compare April 18, 2023 16:11
@potiuk potiuk force-pushed the optimize-provider-tests-execution branch from 025fabc to a85fbf9 Compare April 18, 2023 18:06
@potiuk potiuk changed the title Optimize parallel test execution for Google, Amazon, WWW tests Optimize parallel test execution for unit tests Apr 18, 2023
@potiuk potiuk removed the use public runners Makes sure that Public runners are used even if commiters creates the PR (useful for testing) label Apr 18, 2023
@potiuk potiuk force-pushed the optimize-provider-tests-execution branch 2 times, most recently from daa4376 to 7607b26 Compare April 18, 2023 18:08
Member Author

We can now remove this whole part - each test type is "smaller", so the memory issues we experienced in the past should be far less frequent.

Member Author

This parameter was unused.

Member Author

We needed that to handle several Providers* test types running in parallel - each parallel docker-compose needs a unique volume name.

Comment on lines 612 to 620
Contributor

this is all pretty specific... is running the tests in parallel for all providers, regardless, not good?

@potiuk potiuk (Member Author) Apr 18, 2023

We cannot run the tests in parallel, because far too many of them rely on a shared database (for example, connections are not mocked, DagRuns are created, etc. etc.). Simply speaking, a HUGE percentage of our tests are not pure unit tests with everything mocked; they rely on a shared database being there (and they prepare data, use it, and sometimes delete it and sometimes not). We even run all our tests WITH a specific database. New tests that continue using the shared database are added/modified/updated in every PR.

If we run them in parallel, the tests will start overriding each other's data in the database.

So we can either:

  1. review our 12,500 tests and separate the "real unit tests" from the "DB tests", and add mechanisms to keep the separation - then we would be able to parallelise the "real unit tests". Possibly even rewrite the tests to be "real unit tests" and mock the DB access

  2. or do what we are doing - i.e. split the tests into more-or-less equal chunks (in terms of execution time) and run them sequentially, each test type with its own database (this is what we do now).

Option 1) seems to require enormous effort - but if you (or anyone) would like to take on the task, it is a good idea. I would love to have it, but it seems not feasible (though I would love to be proven wrong).

Option 2) means a deliberate effort to split the tests and balance-optimize them from time to time with a few hours of effort and a custom parallel running framework (this is what we have now).

Option 3) ... I do not see a third option.

But maybe there is one? Curious to hear your thoughts :)
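A minimal sketch of what option 1) would need, just to make it concrete (the db_test marker name is hypothetical - nothing like this exists in the repo today): tag DB-bound tests with a marker so the remaining pure unit tests could safely run with pytest -n auto, while the marked ones stay sequential.

```python
# Hypothetical sketch of option 1: mark tests that touch the shared
# database (connections, DagRuns, ...) so they can be deselected
# when running the parallel-safe pure unit tests.
import pytest

db_test = pytest.mark.db_test  # marker name is an assumption

@db_test
def test_creates_dagrun():
    """Touches the shared database - must stay in the sequential chunk."""

def test_pure_logic():
    """Fully mocked / pure - safe for pytest -n auto."""
    assert 1 + 1 == 2
```

With something like this in place, `pytest -m "not db_test" -n auto` could run the pure tests in parallel and `pytest -m db_test` the DB-bound ones sequentially (marker registration in the pytest config omitted here). The hard part, of course, is triaging all 12,500 tests and keeping the separation enforced.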

@potiuk potiuk (Member Author) Apr 18, 2023

BTW, if you want to see what's going on when you try to parallelize our tests, I encourage you to:

1) enter breeze with a lot of CPUs available
2) run pytest -n auto tests
3) have fun

Member Author

And yes, that question is asked from time to time (including by myself) :) . The answer has not changed since the day I joined the project.

Member Author

cc: @vandonr-amz - I just ran pytest -n auto tests for the fun of it. This has always been an interesting experience (and pretty much always a different one).

This time it failed during pytest collection:

2 skipped, 2520 warnings, 407 errors in 313.56s (0:05:13)

Most of them:

E   botocore.exceptions.ClientError: An error occurred (UnrecognizedClientException) when calling the GetParameter operation: The security token included in the request is invalid.

It seems the AWS tests have problems even being collected in parallel :)

Member Author

BTW, I am really not trying to mock you :D - it's just that our tests REALLY are meant to be collected and executed sequentially, and it would be a HUGE effort to change that. That's why I figured that the more feasible way to utilise multiple CPUs - one that does not cost an enormous amount of engineering time (not to mention the mental breakdown while doing so) - is to split the tests into sequential chunks that are run in parallel, each with its own database and in an isolated container.

This is a way to keep sanity.

Contributor

Oh yeah, ok, I was not aware of those shortcomings. I'm just a bit wary of introducing some if amazon: in the code, but I understand how the situation here is... special ^^

Member Author

It is :)

Member Author

But maybe some day someone will make it less specific :). I just had a conversation where we discussed how things worked when I joined the project 5 years ago... and well, A LOT has changed since. So maybe in a few years' time :D. Who knows.

@potiuk potiuk force-pushed the optimize-provider-tests-execution branch from 7607b26 to 53ba5cf Compare April 19, 2023 00:18
@potiuk potiuk merged commit 310eaf9 into apache:main Apr 19, 2023
@potiuk potiuk deleted the optimize-provider-tests-execution branch April 19, 2023 00:54

potiuk commented Apr 19, 2023

Now we only need #29223 to get the full speedup/feedback (without it, the sdist test takes 27 minutes on main).

@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Apr 23, 2023
@ephraimbuddy ephraimbuddy added this to the Airflow 2.6.0 milestone Apr 23, 2023
ephraimbuddy pushed a commit that referenced this pull request Apr 23, 2023
(cherry picked from commit 310eaf9)
Labels

area:dev-tools changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..)
