
Conversation

@Kami Kami commented Mar 26, 2021

This pull request is an attempt to speed up unit and integration test runs on Github Actions CI by parallelizing the runs and utilizing multiple jobs.

Background, Context

Github actions offers 20 concurrent job runs (https://docs.github.com/en/actions/reference/usage-limits-billing-and-administration#usage-limits) for free plans / open source projects.

Since right now we only have a couple of jobs in this workflow, it makes sense to try to parallelize the tests to speed things up. Slow tests affect everyone: they slow down the development cycle and, in aggregate over time, result in tons of wasted time waiting for CI.

This pull request tries to do that by utilizing the nose parallel plugin and splitting tests across multiple jobs (we split both the unit test and integration test runs into 4 chunks / jobs).
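For illustration, here is a minimal sketch of what such a split can look like in a GitHub Actions workflow. The matrix values, helper script name, and its arguments are assumptions for the example, not the exact targets used by this PR:

```yaml
# Hypothetical sketch: split the unit test run into 4 chunks, one job each.
jobs:
  unit-tests:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        chunk: [0, 1, 2, 3]   # 4 chunks -> 4 parallel jobs
    steps:
      - uses: actions/checkout@v2
      - name: Run unit test chunk
        # Each job runs only its slice of the test files; nose can additionally
        # parallelize within the job (e.g. nosetests --processes=N).
        run: ./scripts/ci/run-unit-tests-chunk.sh ${{ matrix.chunk }} 4
```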

Another benefit of splitting tests into multiple chunks is that it helps identify undesired cross-test pollution and dependencies - we had a lot of issues in the past related to cross-test pollution and state sharing (e.g. test X, usually not intentionally, relies on some state from test Y).

Having cross-test dependencies means we can't run a test in isolation by itself, which is not desired - each test should be able to run by itself, and all of its state should be created by the test itself and its setUp / setUpClass methods.

This change already identified a couple of issues like that, which I have fixed (this is also nothing new; we have already fixed tons of issues like that in the past when we identified them - usually when manually trying to run some test in isolation).

Results

With those changes, the whole workflow run now takes ~8-9 minutes instead of ~13-15 minutes.

I was actually hoping for an even bigger speedup, but this is still better than nothing.

I'm not exactly sure why the speedup is not bigger and I need to dig in some more (there is of course some overhead involved with spinning up each job, running common steps, etc., but still), but every minute counts.

It's worth noting that even if and when we start utilizing GitHub Actions for more projects, I don't think we should decrease the number of concurrent job runs in this workflow - this is the primary workflow which runs the most often and it needs to be fast. It doesn't matter that much if st2-rbac or some other repo's job which runs maybe a couple of times per month needs to wait a bit for the jobs from this workflow to finish.

Gotchas, etc.

I don't think there should be any gotchas and things should just work out of the box.

The only potential issue could be code coverage, but we utilize codecov.io, which supports incremental result uploading and combining of results on the server side, so as long as every test job uploads its results to codecov.io, it should work fine without any issues.
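As a rough illustration of how the per-job upload can look (a sketch, not the exact step from this PR; the flag name and chunk variable are assumptions):

```yaml
      # Hypothetical per-job coverage upload; codecov.io merges the partial
      # reports from all jobs on the server side.
      - name: Upload coverage to codecov.io
        if: always()
        run: |
          # the bash uploader submits whatever coverage data this job produced;
          # the -F flag just tags the partial report for easier filtering
          bash <(curl -s https://codecov.io/bash) -F "unit_chunk_${{ matrix.chunk }}"
```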

TODO

  • Document this setup in the GitHub Actions workflow file
  • Update GitHub status checks

@pull-request-size pull-request-size bot added the size/S label (PR that changes 10-29 lines. Very easy to review.) Mar 26, 2021
@Kami Kami force-pushed the nose_parallel_experiment branch 2 times, most recently from 5e32c22 to 12b2047 Compare March 26, 2021 23:00
@pull-request-size pull-request-size bot added the size/M label (PR that changes 30-99 lines. Good size to review.) and removed the size/S label (PR that changes 10-29 lines. Very easy to review.) Mar 26, 2021
@Kami Kami force-pushed the nose_parallel_experiment branch from 12b2047 to d20ef53 Compare March 26, 2021 23:05
Kami commented Mar 26, 2021

I was hoping for bigger performance improvements, but there doesn't appear to be all that much of a gain.

@Kami Kami force-pushed the nose_parallel_experiment branch from d20ef53 to 8730a84 Compare March 26, 2021 23:37
…test. This way the test method can run in a standalone fashion.
Kami commented Mar 27, 2021

@amanda11 Just a heads up - 78baa3b.

Trying to parallelize the test run uncovered a small issue with a unit test which was added recently.

The test case / method relies on state (a pack being registered) from a previous test method, which is usually bad practice and results in tests which can't run in a standalone fashion, etc.

Similar "cross test pollution" has bitten us many times in the past already (and I'm sure will bite us many more times in the future).

@amanda11 amanda11 (Contributor)

Don't think that was my test! But agree with point.

Kami commented Mar 27, 2021

OK so after some more runs, there are some speed gains. Around 8-9 minutes for parallelized runs vs ~13-15 minutes for non-parallelized.

I was hoping for a bigger gain, but that's probably still better than nothing.

If we do decide to go with this approach, I just need to confirm code coverage works correctly - codecov.io supports multiple submissions and incremental updates (aka each job submits the partial coverage which it generated), so I believe it should work out of the box, given that the current setup already worked before.

For OSS projects, GitHub claims to offer 20 concurrent jobs at once, so until we migrate more projects to GHA I think we should try to utilize as much concurrency as possible for this main workflow to make tests faster.

Granted end to end tests will still be slow, but we need to start somewhere and can't fix everything at once.

@Kami Kami changed the title [WIP] Experiment with running tests in parallel on GHA Experiment with running tests in parallel on GHA Mar 27, 2021
@pull-request-size pull-request-size bot added the size/L label (PR that changes 100-499 lines. Requires some effort to review.) and removed the size/M label (PR that changes 30-99 lines. Good size to review.) Mar 27, 2021
Kami commented Mar 28, 2021

I made a couple more optimizations and improvements, including caching apt packages (since that step takes up to 50 seconds).

On that note, I also noticed a weird issue with the cache not being saved ("Unable to reserve cache with key , another job may be creating this cache.") - actions/toolkit#658 - even when only a single job runs at once 🤷
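For reference, a minimal sketch of what apt package caching with actions/cache can look like; the cache path, key, and package-list file are assumptions for the example rather than the exact setup in this branch:

```yaml
      # Hypothetical apt cache steps: keep downloaded .deb archives between runs
      # so apt-get only has to fetch them again when the package list changes.
      - name: Cache APT packages
        uses: actions/cache@v2
        with:
          path: ~/apt_cache
          key: ${{ runner.os }}-apt-v1-${{ hashFiles('scripts/github/apt-packages.txt') }}
      - name: Install APT dependencies
        run: |
          mkdir -p "$HOME/apt_cache"
          sudo apt-get update
          # point apt at the cached archive directory (illustrative package list file)
          sudo apt-get -o dir::cache::archives="$HOME/apt_cache" -y install $(cat scripts/github/apt-packages.txt)
```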

@Kami Kami force-pushed the nose_parallel_experiment branch from 549f2f9 to d755ba1 Compare March 28, 2021 11:42
@Kami Kami force-pushed the nose_parallel_experiment branch from 9e04340 to 581a61e Compare March 28, 2021 12:03
Kami commented Mar 28, 2021

This is RFR now.

Run time is now just under 8 minutes with primed cache (vs around 12-15 minutes before - e.g. https://github.com/StackStorm/st2/actions?query=event%3Apull_request+branch%3A5060_pip_to_latest, https://github.com/StackStorm/st2/actions?query=event%3Apull_request+branch%3Adependabot%2Fpip%2Fpyyaml-5.4).

I'll update github checks after it's reviewed.

@arm4b arm4b left a comment

> With those changes, the whole workflow run now takes ~8-9 minutes instead of ~13-15 minutes.

That's indeed an improvement in timing, but I'm wondering whether multiplying the CI checks by such a significant number (x4-5) pollutes the UI here, making scrolling excessive and the indicators harder to observe:

[Screenshot: the PR checks list with the parallelized jobs]

vs the old version:

[Screenshot: the previous, shorter PR checks list]

Is that level of complexity and UX cost a good price to pay for a couple of minutes of improvement?

Another point: with the unit + integration tests optimized to sub 10 mins, the bottleneck is still the CircleCI packaging, which takes ~17 mins, and the e2e tests, which take ~30 mins. So it feels to me that while the GH CI check optimizations are nice, in reality users still have to wait for the other jobs to finish. Our CI is as slow as the longest CI job on the checklist.

WDYT? Any ideas on how to find the balance here?

Kami commented Mar 28, 2021

Yes, it does make the UX a bit more complex, but I think it's a small price to pay (and well worth it) - scrolling a couple of pages is still much faster than waiting multiple minutes for the test results :)

And yes, sadly end to end tests will still be the bottleneck, but this doesn't mean we should not try to improve and speed up other checks.

As far as packages go - IIRC, I already mentioned in the past that we should cache more things there (aka Python packages we download to speed up the build - /tmp/wheelhouse, etc.).
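For context, caching the wheelhouse on CircleCI could look roughly like this (a sketch under assumed paths and cache keys, not the actual st2-packages config):

```yaml
      # Hypothetical CircleCI steps: reuse pre-built wheels between runs so
      # pip does not rebuild every dependency from source.
      - restore_cache:
          keys:
            - v1-wheelhouse-{{ checksum "requirements.txt" }}
      - run:
          name: Build wheels into the wheelhouse
          command: pip wheel -r requirements.txt -w /tmp/wheelhouse
      - run:
          name: Install from the wheelhouse
          command: pip install --find-links /tmp/wheelhouse -r requirements.txt
      - save_cache:
          key: v1-wheelhouse-{{ checksum "requirements.txt" }}
          paths:
            - /tmp/wheelhouse
```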

arm4b commented Mar 28, 2021

I think splitting both the unit and integration tests into 2 jobs each would be good enough to on-board the functionality you'd like while avoiding CI checks spam.

Kami commented Mar 28, 2021

@armab That wouldn't get us much of a performance improvement (I started with 2).

And again, I really don't think "CI checks spam" is a big issue - aka it's a compromise and it's still much better than the alternative (slower build).

And yes, from the perspective of a hypothetical "person who maybe contributes a couple of times per year", a slightly longer build is not a big deal (and in that case people may be willing to trade it off for fewer checks), but for people who contribute on a regular basis, even a couple of minutes of speedup is well worth the compromise.

Long term, if we want to reduce the number of parallel jobs, we can probably do that by switching to self-hosted runners and using some faster, CPU-optimized VM / instance.

Kami commented Mar 28, 2021

I did (more than) my part - StackStorm/st2-packages#697 (comment).

Now someone just needs to fix / improve end to end tests (as mentioned many times already, first step should be upgrading cicd server to 3.5dev packages which include my performance improvements - hoping that will speed things up a bit) :P

@arm4b arm4b left a comment

With the CircleCI optimizations you did (StackStorm/st2-packages#697), which reduced the entire deb/rpm build time to ~10 mins, let's please balance the build timing by lowering the GitHub Actions parallelism so the total wait time for all independent tasks is around 10 mins and so it's practical.

[Screenshot: workflow run timing after the changes]

With that, I think it's also a good time to remove the e2e checks from every PR so they are not a blocker; we can request them on demand via a GitHub comment/request in the PR.

Kami commented Mar 31, 2021

@armab Yeah, will remove 1 or 2 parallel jobs and see how it goes.

As far as end to end tests go - I will still leave those enabled for the time being, until an on-demand workflow trigger via PR comment is implemented.
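If and when that gets implemented, one possible shape for such an on-demand trigger is sketched below; the workflow name, comment keyword, and make target are hypothetical:

```yaml
# Hypothetical on-demand e2e workflow, triggered by a "/run-e2e" PR comment.
name: e2e-on-demand
on:
  issue_comment:
    types: [created]
jobs:
  e2e:
    # only react to comments on pull requests that contain the trigger keyword
    if: github.event.issue.pull_request && contains(github.event.comment.body, '/run-e2e')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Run end to end tests
        run: make e2e-tests   # placeholder target
```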

Kami commented Mar 31, 2021

@armab OK, so I removed two parallel jobs for each set of tests (integration, unit) and the overall workflow time was still around 8 minutes (with primed cache) which is still quite decent.

I still hope that at some point in the future we will be able to utilize on demand runner on EC2 / Azure with a larger VM and faster CPU to speed things up even more.

To put things into context - locally unit tests take around 5-6 minutes without any parallelization (granted, I do have a very fast CPU).

Kami commented Mar 31, 2021

I'll update github checks for master branch once this PR is merged.

@Kami Kami added this to the 3.5.0 milestone Mar 31, 2021
@arm4b arm4b left a comment

LGTM and thanks a lot for the optimization! 👍

@arm4b arm4b force-pushed the nose_parallel_experiment branch from 36a4709 to 76d0422 Compare April 1, 2021 18:24
@Kami Kami merged commit 77e20dc into master Apr 1, 2021
@Kami Kami deleted the nose_parallel_experiment branch April 1, 2021 21:03
Kami commented Apr 1, 2021

Merged into master, updated github checks.


Labels

infrastructure: ci/cd, performance, size/L (PR that changes 100-499 lines. Requires some effort to review.)
