Skip to content

cache: ensure random prefixes are in the exported cache#4468

Closed
jedevc wants to merge 1 commit into
moby:masterfrom
jedevc:fix-export-cache-random
Closed

cache: ensure random prefixes are in the exported cache#4468
jedevc wants to merge 1 commit into
moby:masterfrom
jedevc:fix-export-cache-random

Conversation

@jedevc
Copy link
Copy Markdown
Member

@jedevc jedevc commented Dec 6, 2023

Ok, this is the result of a lot of deep-dive investigations with @sipsma into some caching-related issues in dagger, that I've eventually tracked here 🎉 Disclaimer: I'm pretty sure that this open PR is not the right solution, but not quite sure what the right kind of approach here. Figured it might be easier to talk with some actual code examples instead of just showing up empty handed 😄

First, some context:

  • Essentially, in Dagger we're in a position where we end up reusing a session: CacheKey from a Source. This is sort of not ideal, but we do this so that we don't end up exporting the cache for that Source.
  • Everything works "as expected" - except, that the exported cache manifests are a little odd. Specifically, where we do the nopRecord processing as introduced in Make exported build cache deterministic #780, it makes records that depend on that one essentially unable to be cached - while including the cache values for those records (essentially we push cache that won't ever match).
  • This feels wrong to me. IMO there are two "correct" behaviors here:
    1. We should keep the random: prefixes in the cache with no content associated (what this PR implements, and resolves our issue downstream)
    2. We should also remove these dependent records that won't match because one of their links was a random: (which is also fine, but then in dagger we'd need to add something that allows us to not cache the source that we have without needing to play around with session: - maybe that's llb.WithoutExportCache which is the partial motivation for my question in https://dockercommunity.slack.com/archives/C7S7A40MP/p1701860805180019 🎉).

You can see a contrived example here:

img := llb.Image("alpine:latest@sha256:34871e7290500828b39e22294660bee86d966bc0017544e848dd9a255cdf59e0")
img = img.Run(llb.Args([]string{"apk", "add", "curl"})).Root()

run := img.Run(llb.Args([]string{"/bin/sh", "-c", "sleep 10 && echo foo > /foo"}))
// NOTE: yes, the right thing to do here is to set llb.Readonly/etc, but by not doing
// it we force it to use the definition-based fast cache, which includes the fixed
// "session" prefix we encounter the issue with (content-based slow cache doesn't have
// this problem)
run.AddMount("/mount", llb.Local("context", llb.SessionID("fixed")))
img = run.Root()

When importing an exported cache manifest from here, we end up correctly caching the apk add curl step, but recomputing the expensive sleep step even when the LLB and cache-keys are completely identical between runs (I've got a sort-of test for this in the PR).

What's frustrating is that the cache for the expensive sleep step is right there in the manifest (I can see it, laughing at me 😢) - I just can't actually seem to match against it.

Signed-off-by: Justin Chadwell <me@jedevc.com>
Copy link
Copy Markdown
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you want to the sessionid/uniq based cache key for local source (I think this is the only case where these appear) to be exported then this needs to be specifically marked in the LLB of that local step. Either as a special field or we can use some convention that a specific prefix means "stable key". We should also have a recommendation that different tools always add their product name to stable key (instead of just using some file path for example) to avoid accidental collisions. We do not want true random identifiers in the cache manifest - they will never match, cache manifest will change in every rebuild and grow forever.

What's frustrating is that the cache for the expensive sleep step is right there in the manifest (I can see it, laughing at me 😢) - I just can't actually seem to match against it.

Is this because the cache key is only for the "sleep" and not for the "local"? I also assume the if you make the mount readonly then you get cache matches. If this is the case then it looks like a bug in normalization of cache manifest as no need to write unreachable cache records to the manifest.

@jedevc
Copy link
Copy Markdown
Member Author

jedevc commented Nov 13, 2025

I have no real memory of what I was specifically trying to do here - this was a weird edge case anyways.

@jedevc jedevc closed this Nov 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants