Skip to content

Conversation

@potiuk
Copy link
Member

@potiuk potiuk commented Jan 29, 2023

Sdist provider installation takes a lot of time because pip cannot parallelise the sdist package building. But we still want to test all our provider's installation as sdist packages.

This can be achieved by running N parallel installations with only subset of providers being installed in each chunk.

This is what we do in this PR.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Mar 16, 2023
@potiuk potiuk added pinned Protect from Stalebot auto closing and removed stale Stale PRs per the .github/workflows/stale.yml policy file labels Mar 17, 2023
@potiuk
Copy link
Member Author

potiuk commented Mar 17, 2023

Not stale

@potiuk potiuk force-pushed the run-sdist-providers-installation-in-parallel branch 2 times, most recently from 101d6a8 to 2a09c18 Compare April 16, 2023 16:13
@potiuk potiuk marked this pull request as ready for review April 16, 2023 16:48
@potiuk potiuk requested review from eladkal and ephraimbuddy and removed request for ashb, jedcunningham and kaxil April 16, 2023 16:48
@potiuk potiuk force-pushed the run-sdist-providers-installation-in-parallel branch from 2a09c18 to f8f243d Compare April 16, 2023 16:48
@potiuk
Copy link
Member Author

potiuk commented Apr 16, 2023

This one speeds up one of the longest CI jobs we have on main / self-hosted-runners from 26 minutes to 6 minutes (whoping 5x faster). The `sdist' packages were only instaleld (just for regression) already - we did not run import verification on them (we run import verification for .whl packages only).

Installing sdist packages has been always super-slow (and recent pip changes made it even slower - because now they always go sdist -> whl - > install and the installation cannot be parallelised to use multiple CPUS.

However - since we only want to see if provider sdist packages can be installed, we can parallelise the installation ourselves - by splitting the list of providers into chunks, and running separate docker container in parallel for each chunk.

This way we can install all sdist packages 5 times faster on seff-hosted machine with 8 CPUS (it's not 8x faster because of some common overhead repeated for each one - we install airflow from sdist package in each of the containers.

This job is only running for self-hosted runners, so it will have no impact on non-committer PRS (but when we will speed up the remaning Helm Unit test that is very slow as well, this will give faster feedback for commiter PRs and faster builds for main "canary" builds (and some 20 minutes build-time cost for those.

@potiuk
Copy link
Member Author

potiuk commented Apr 16, 2023

@potiuk potiuk force-pushed the run-sdist-providers-installation-in-parallel branch from f8f243d to d9b5c8c Compare April 16, 2023 19:54
@potiuk
Copy link
Member Author

potiuk commented Apr 17, 2023

We also need this one to get feedback improvements (and cc: @hussein-awala - to your doubts from #30672 , this one has only positive impact - it just uses mulitple CPUS of our self-hosted runners much more effectively) we get close to 100% utilisation thanks to splitting sdist installation into chunks and parallelising them in multiple containers. Pretty much same "build time" but with 5x decrease of "elapsed time".

@potiuk
Copy link
Member Author

potiuk commented Apr 17, 2023

Would apreciate to get that one merged. Tt will speed up some of the main builds and constraint refreshing, I think sdist and #30672 are the ones that delay (and cancel completion) of quite a number of those complete "canary" builds (because the jobs are cancelled by subsequent merges), and merging those will make the refreshing our cache and constraints more often.

@potiuk potiuk force-pushed the run-sdist-providers-installation-in-parallel branch 3 times, most recently from f46c942 to 869f9aa Compare April 17, 2023 17:05
@potiuk potiuk requested a review from pierrejeambrun April 17, 2023 20:42
@potiuk
Copy link
Member Author

potiuk commented Apr 17, 2023

Last speedup for CI in the current series.

@potiuk potiuk force-pushed the run-sdist-providers-installation-in-parallel branch 3 times, most recently from 30b4df4 to 87a1471 Compare April 17, 2023 21:03
@potiuk potiuk force-pushed the run-sdist-providers-installation-in-parallel branch 3 times, most recently from 1a7c363 to 7b3e734 Compare April 19, 2023 00:59
@potiuk
Copy link
Member Author

potiuk commented Apr 19, 2023

Conflict after merging #30705 resolved.

@potiuk
Copy link
Member Author

potiuk commented Apr 19, 2023

Reaaly nice speedup: from 20 to 6 mins without any other impact (just better use of multiple CPUS). I hope to get a quick review on that one :)

@potiuk potiuk force-pushed the run-sdist-providers-installation-in-parallel branch 3 times, most recently from d7c9db8 to a119984 Compare April 19, 2023 12:25
Sdist provider installation takes a lot of time because pip cannot
parallelise the sdist package building. But we still want to test all
our provider's installation as sdist packages.

This can be achieved by running N parallel installations with only
subset of providers being installed in each chunk.

This is what we do in this PR.
@potiuk potiuk force-pushed the run-sdist-providers-installation-in-parallel branch from a119984 to 51d701b Compare April 19, 2023 15:15
@potiuk potiuk requested a review from o-nikolas April 19, 2023 21:34
Copy link
Member

@hussein-awala hussein-awala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good for me, LGTM

@potiuk potiuk merged commit a9328ef into apache:main Apr 20, 2023
@potiuk potiuk deleted the run-sdist-providers-installation-in-parallel branch April 20, 2023 22:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:dev-tools pinned Protect from Stalebot auto closing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants