Conversation
|
I wonder if we should switch from |
|
That sounds like a great idea @keewis Edit: I can hopefully have a go at this next week |
| - name: Install conda dependencies | ||
| run: | | ||
| mamba env update -f $CONDA_ENV_FILE | ||
| if: steps.cache-env.outputs.cache-hit != 'true' |
There was a problem hiding this comment.
I think this needs to be removed. A dependency's version can change without any change in the cache key.
There was a problem hiding this comment.
Thanks @dcherian
My understanding of conda (which I only use when developing xarray, so don't know) is that running the update step updates all the minor versions. And so that would cause cache misses on any patch version increase.
So my understanding of this if is that to avoid updating patch versions, so we still get a hit, at the cost of being out-of-date for a few days (my understanding is that the cache has 1 week lifetime, so it doesn't last forever).
Is your point that this is bad, because we do want to update the patch version in these cases?
There was a problem hiding this comment.
Is your point that this is bad, because we do want to update the patch version in these cases?
Yes.
There was a problem hiding this comment.
at the cost of being out-of-date for a few days (my understanding is that the cache has 1 week lifetime, so it doesn't last forever).
Actually they last a week if they haven't been accessed, so this doesn't work.
I don't think it will helpful if we remove the if, because GHA caches are write-once, so as soon as a new version comes out, we'll have to reinstall it until the cache key changes.
In other projects I've used a poetry lock file, which is useful to cache on, and then updated at some frequency. But I'm not sure if that approach is supported by conda.
I can't seem to figure out whether we can key on a file that's generated as part of the run rather than in the repo — that would let us make the minimum changes to the process while getting the cache to work. We could try it if people agree it would be a good idea.
There was a problem hiding this comment.
If we remove the if, then we'll spend time "solving" the environment every time but will save time because we aren't downloading + installing every single package, just those that have updated since the cache was created (IIUC). Please correct me if I am wrong here.
This should still be a good improvement, no?
There was a problem hiding this comment.
just those that have updated since the cache was created
Yes but it's only created the first time the key hit. Caches are never overwritten. So this would decay a lot.
The docs suggest we add the date onto the key, which seems not that ambitious, but maybe better than nothing.
There was a problem hiding this comment.
I had a thought that we could add the merge's base ref onto the key (i.e. the commit on main that it branches off), so it updated every time we merged. I need to think through this more though.
Reverts #6543 as discussed.