Add step to load pods in cache before building and testing #19551
Conversation
It occurred to me that, because of our step parallelism, we can run into situations where a commit that changes the `Podfile.lock` results in multiple steps in the same CI build installing the pods from scratch. For example, see this build, where all three IPA deployment steps spend 15+ minutes in the CocoaPods part of the build because there was no cached version of the pods available: https://buildkite.com/automattic/wordpress-ios/builds/10636. By introducing a step at the start of the pipeline, on which the following steps depend, to install the pods, we can make sure parallel steps won't waste time building the same new version of the Pods.
Force-pushed from c923ffc to d88a72e
You can test the changes in WordPress from this Pull Request by:
You can test the changes in Jetpack from this Pull Request by:
As you said in the PR description, I'm not entirely sure that this is worth it, especially if it adds ≈4 min to each build when there's a cache hit.
Besides, the solution you implemented in this PR wouldn't work, because independent steps can run on different agents, so we'd also need to upload and then download a tar of the Pods folder between those new steps for it to work.
One alternative approach one could try, though, would be to add a step similar to your ':cocoapods: Pre-load Pods' one, but that would instead:
- Only check whether the cache would hit or miss, ideally without having to set up Ruby and run the `pod` command. Like, maybe we can just compute the `Podfile` checksum like here, and check if there exists a cache for it on S3… but without downloading it.
- If there's a cache hit, do nothing (in particular, don't bother downloading and installing said cache in that step – we just needed the check here, not the actual Pods; the subsequent dependent steps – which might run on different agents – will just use the existing cache).
- If there's a cache miss, then set up Ruby, then install the (new) pods, then save and upload them to a new cache. The subsequent dependent steps would then just detect and use that cache automatically.

If we can make this work, the pre-check step we would add to validate that the Pods cache is up to date should only take a couple of seconds (compute the `Podfile` hash, then check if a cache entry for it exists, without downloading it) in the most common case of a cache hit, while still saving those duplicated ≈15 min on subsequent steps if there was a cache miss.
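A minimal sketch of what such a pre-check step could look like, assuming the cache is stored on S3 and keyed by the `Podfile.lock` checksum. Everything here – the `CACHE_BUCKET` variable, the key format, and the `preload_pods` helper names – is hypothetical and not the actual `bash-cache` implementation; `aws s3api head-object` is just one way to test for an object's existence without downloading it:

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical cache key derived from the Podfile.lock checksum.
# (sha256sum on Linux, shasum on macOS.)
pods_cache_key() {
  local lockfile="$1" checksum
  checksum=$({ sha256sum "$lockfile" 2>/dev/null || shasum -a 256 "$lockfile"; } | cut -d ' ' -f 1)
  echo "pods-cache-${checksum}.tar.gz"
}

# Check whether a cache entry exists on S3 *without* downloading it.
# CACHE_BUCKET is an assumed environment variable, not a real setting.
pods_cache_exists() {
  aws s3api head-object --bucket "${CACHE_BUCKET}" --key "$1" > /dev/null 2>&1
}

# The pre-check step itself: do nothing on a hit; install and upload on a miss.
preload_pods() {
  local key
  key=$(pods_cache_key Podfile.lock)
  if pods_cache_exists "$key"; then
    echo "Pods cache hit for $key, nothing to do."
  else
    echo "Pods cache miss, installing pods and uploading a new cache."
    install_gems                 # bash-cache helper, per the discussion above
    bundle exec pod install
    tar -czf /tmp/pods.tar.gz Pods
    aws s3 cp /tmp/pods.tar.gz "s3://${CACHE_BUCKET}/$key"
  fi
}
```

On the common cache-hit path this only hashes one file and makes one HTTP HEAD-style request, which is where the "couple of seconds" estimate comes from.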
```yaml
# Make sure there is a cached version of the pods available before testing
# the app or deploying installable builds to avoid parallel steps wasting
# time building the same `Podfile.lock` from scratch.
- label: ':cocoapods: Pre-load Pods'
  key: pods
  command: .buildkite/commands/load-pods.sh
  env: *common_env
  plugins: *common_plugins
```
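For context, subsequent steps would gate on this one via its `pods` key using Buildkite's `depends_on`; a sketch, where the label and command are illustrative rather than taken from the actual pipeline:

```yaml
- label: ':hammer: Build for Testing'
  depends_on: pods   # wait for the ':cocoapods: Pre-load Pods' step above
  command: .buildkite/commands/build-for-testing.sh
  env: *common_env
  plugins: *common_plugins
```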
Does this really work? I thought that each step in a Buildkite pipeline was independent from the others in terms of the agents it might use, which means that this step downloading / installing pods might run on one agent, while the following steps that depend on it might run on a different one:
> However you add steps to your pipeline, keep in mind that steps may run on different agents. It is good practice to install your dependencies in the same step that you run them.

— Source: https://buildkite.com/docs/pipelines/defining-steps#adding-steps
So if we want this to work as expected, we'd also have to e.g. upload a `.tar` of the `Pods/` folder that got installed by this step as an artefact, then download and uncompress that `Pods.tar` back in the subsequent steps (similar to how we do it for build-for-testing uploading the Build Products here, then `unit-tests.sh` downloading them back here)
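That tar-through-artifacts dance could look roughly like this, using the real `buildkite-agent artifact upload` / `artifact download` commands; the function names and the `pods.tar.gz` file name are made up for illustration:

```shell
#!/bin/bash
set -euo pipefail

# Run in the pre-load step: archive the installed Pods/ folder and upload it
# as a build artifact so steps on other agents can fetch it.
save_pods_artifact() {
  tar -czf pods.tar.gz Pods
  buildkite-agent artifact upload pods.tar.gz
}

# Run in each subsequent step, possibly on a different agent: fetch the
# archive back from the build's artifacts and unpack it in place.
restore_pods_artifact() {
  buildkite-agent artifact download pods.tar.gz .
  tar -xzf pods.tar.gz
}
```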
So if we want this to work as expected, we'd also have to e.g...
That's what `install_cocoapods` does at the `bash-cache` plugin level.
Or, I should say, "`install_cocoapods` should be doing". I haven't gotten to the point of testing a build that would trigger a new cache generation. But I'm pretty sure it works, as `install_cocoapods` is tried and tested.
Oh, I didn't realize that `install_cocoapods` was already doing the whole cache dance! Your initial approach makes more sense now 🙂
So I guess the only thing left to solve is how to tell `install_cocoapods` to only install them (and thus ultimately rebuild the cache) if there was a cache miss… but not bother installing them during this check step in case of a cache hit.
@AliSoftware thank you for having a look here and validating my doubts on the ROI of the setup as it is right now. Your suggestion of doing a quick cache check at the start and only if we get a miss do the hard work of setting things up is exactly what I was thinking 😄 I'll have a look at some point.
Testing
I'll add a few example builds here if we decide to move forward with this idea
RFC – Is this worth it & How to improve it
Based on the crude build time numbers mentioned above, on a release CI build that doesn't have a cached `Pods/` available, this could save ~30 minutes: 15 minutes for the cache, then run 3 app builds with the cache available, vs. running 3 app builds each building the cache. The savings should be similar on the regular pipeline, where we have 2 installable build steps plus the one that builds the app for testing. We'll also soon have the step to validate the `.strings` generation, which will run CocoaPods, too.

However, on builds that do have a cache available, this will introduce a ~4 minute delay. That means we will have caught up with the saving on non-cached builds every 7–8 cached builds. I don't have stats but, intuitively, I don't think we change the `Podfile.lock` that frequently.

The time saved might even be less than ~30 minutes on a 3-step build because, let's not forget, each step will still need to download the cache.

So, I wonder whether this is worth pursuing. From one point of view, I "can't unsee" the time waste that occurs when the cache is not available. From the other, my back-of-the-napkin math points to the time saving becoming irrelevant once we take into account that `Podfile.lock` doesn't change that often.

Improvement ideas:

- … `install_gems` and `install_pods`. To do so, we'd have to extract the logic that checks that in a dedicated command at the `bash-cache` level.
- `Podfile.lock` doesn't change often in development builds, but the build-to-change ratio is much higher for release builds. The code freeze build will almost always change the pods. We always run at least two additional CI release builds: localization smoke testing and release finalization. On top of those, we may or may not have additional betas to ship. Even accounting for 3 additional betas, the number of CI release builds after the cache is still such that …. I need a break! I'll finish this later because I'm becoming more and more unsure of my math.
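The back-of-the-napkin break-even math above can be sanity-checked quickly, plugging in the estimates from this RFC (~30 min saved per cache miss, ~4 min of overhead per cache hit):

```shell
#!/bin/bash
set -eu

SAVED_PER_MISS=30    # ~minutes saved on a build that hits a cache miss
OVERHEAD_PER_HIT=4   # ~minutes added to every build with a cache hit

# Ceiling division: after this many cached builds, the accumulated
# overhead has eaten the savings of one non-cached build.
BREAK_EVEN=$(( (SAVED_PER_MISS + OVERHEAD_PER_HIT - 1) / OVERHEAD_PER_HIT ))
echo "break-even after ${BREAK_EVEN} cached builds"
```

which lands on 8 cached builds, consistent with the "every 7–8 cached builds" figure above.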
Regression Notes
… `RELEASE-NOTES.txt` if necessary. N.A.