cache: ensure random prefixes are in the exported cache #4468
Signed-off-by: Justin Chadwell <me@jedevc.com>
tonistiigi
left a comment
I think if you want the sessionid/uniq-based cache key for the local source (I think this is the only case where these appear) to be exported, then this needs to be specifically marked in the LLB of that local step, either as a special field or through some convention where a specific prefix means "stable key". We should also recommend that different tools always add their product name to the stable key (instead of just using some file path, for example) to avoid accidental collisions. We do not want true random identifiers in the cache manifest: they will never match, and the cache manifest would change on every rebuild and grow forever.
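To make the "stable key" convention above concrete, here is a minimal sketch in Go. Everything in it is an assumption for illustration: the `stable:` prefix, `StableKey`, and `IsStableKey` are hypothetical names and are not part of BuildKit or LLB. It only shows the shape of the idea: a tool namespaces its key with its product name, and an exporter could then distinguish such keys from true random identifiers.

```go
package main

import (
	"errors"
	"strings"
)

// stableKeyPrefix is a hypothetical marker meaning "this key is stable across
// builds and may be exported". BuildKit does not define this prefix.
const stableKeyPrefix = "stable:"

// StableKey builds a namespaced key such as "stable:dagger/src".
// Putting the product name first avoids accidental collisions between tools
// that would otherwise both use something generic like a file path.
func StableKey(product, name string) (string, error) {
	if product == "" || name == "" {
		return "", errors.New("product and name must be non-empty")
	}
	if strings.ContainsAny(product, "/:") {
		return "", errors.New("product must not contain '/' or ':'")
	}
	return stableKeyPrefix + product + "/" + name, nil
}

// IsStableKey reports whether a key follows the hypothetical convention.
// Under this sketch, anything else (for example a "random:" identifier)
// would be kept out of the exported cache manifest.
func IsStableKey(key string) bool {
	return strings.HasPrefix(key, stableKeyPrefix)
}
```

With this sketch, `StableKey("dagger", "src")` yields `stable:dagger/src`, while a key like `random:abc123` is not treated as stable. Whether the marker ends up being a prefix or a dedicated LLB field is exactly the open question in the comment above.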
What's frustrating is that the cache for the expensive sleep step is right there in the manifest (I can see it, laughing at me 😢) - I just can't actually seem to match against it.
Is this because the cache key is only for the "sleep" and not for the "local"? I also assume that if you make the mount readonly then you get cache matches. If this is the case, then it looks like a bug in the normalization of the cache manifest, as there is no need to write unreachable cache records to the manifest.
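As a rough illustration of what "no need to write unreachable cache records" could mean, the sketch below prunes a toy manifest. It is a simplified model under stated assumptions, not BuildKit's manifest format or its normalization code: `Record`, its `Random` flag, and `PruneUnmatchable` are hypothetical, and every input of a record is assumed to be required for a match.

```go
package main

import "sort"

// Record is a hypothetical, simplified cache record.
type Record struct {
	ID     string
	Random bool     // keyed by a per-session/random identifier, so it can never match
	Inputs []string // IDs of the records this one depends on
}

// PruneUnmatchable returns the sorted IDs of records that a later build could
// still match: the record itself is not random, and neither is anything it
// depends on, directly or transitively.
func PruneUnmatchable(records []Record) []string {
	byID := make(map[string]Record, len(records))
	for _, r := range records {
		byID[r.ID] = r
	}

	memo := map[string]bool{}
	var matchable func(id string) bool
	matchable = func(id string) bool {
		if v, ok := memo[id]; ok {
			return v
		}
		memo[id] = false // pessimistic default; also guards against cycles
		r, ok := byID[id]
		if !ok || r.Random {
			return false
		}
		for _, in := range r.Inputs {
			if !matchable(in) {
				return false
			}
		}
		memo[id] = true
		return true
	}

	var kept []string
	for _, r := range records {
		if matchable(r.ID) {
			kept = append(kept, r.ID)
		}
	}
	sort.Strings(kept)
	return kept
}
```

In this model, a "sleep" record that depends on both an "apk add curl" record and a random-keyed "local" record is dropped along with "local", while the image and "apk add curl" records are kept.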
I have no real memory of what I was specifically trying to do here - this was a weird edge case anyways.
Ok, this is the result of a lot of deep-dive investigations with @sipsma into some caching-related issues in dagger, that I've eventually tracked here 🎉 Disclaimer: I'm pretty sure that this open PR is not the right solution, but I'm not quite sure what the right kind of approach is here. Figured it might be easier to talk with some actual code examples instead of just showing up empty-handed 😄
First, some context:
- session: CacheKey from a Source. This is sort of not ideal, but we do this so that we don't end up exporting the cache for that Source.
- nopRecord processing as introduced in Make exported build cache deterministic #780, it makes records that depend on that one essentially unable to be cached - while including the cache values for those records (essentially we push cache that won't ever match).
- random: prefixes in the cache with no content associated (what this PR implements, and resolves our issue downstream)
- random: (which is also fine, but then in dagger we'd need to add something that allows us to not cache the source that we have without needing to play around with session: - maybe that's llb.WithoutExportCache which is the partial motivation for my question in https://dockercommunity.slack.com/archives/C7S7A40MP/p1701860805180019 🎉).
You can see a contrived example here:
When importing an exported cache manifest from here, we end up correctly caching the apk add curl step, but recomputing the expensive sleep step even when the LLB and cache-keys are completely identical between runs (I've got a sort-of test for this in the PR).
What's frustrating is that the cache for the expensive sleep step is right there in the manifest (I can see it, laughing at me 😢) - I just can't actually seem to match against it.