Add content-based instruction cache #58

@tonistiigi

I'll tackle this next. Writing down some thoughts.

One of the problems with Docker's instruction cache is that it only defines a cache between two steps. You have to solve the definition up to a certain point before you can see whether the next step might be cached.

BuildKit should attempt to find all the cache keys as early as possible. You don't have to solve the whole graph or all branches to find out that a vertex's data has been cached. For example, when two branches are merged, you don't need the data for the original branches to verify that you have a cache for the merged part, as long as you can verify that the sources and the graph definition have not changed.
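To illustrate the idea, here is a minimal sketch of deriving a cache key from the graph definition alone. The `Vertex` type and `DefinitionKey` function are hypothetical names made up for this example, not BuildKit's actual API, and the hashing scheme is only an assumption. Nothing is executed: the key for the merged vertex depends only on source identifiers and operation metadata.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Vertex is a hypothetical build-graph node used only for illustration.
type Vertex struct {
	Meta      string    // operation definition, e.g. the command
	SourceKey string    // set only for sources, e.g. a commit SHA
	Inputs    []*Vertex // upstream vertexes
}

// DefinitionKey hashes the vertex metadata together with the definition
// keys of its inputs. No vertex is solved and no data is downloaded.
// A real implementation would likely memoize results for shared inputs.
func DefinitionKey(v *Vertex) string {
	h := sha256.New()
	fmt.Fprintf(h, "meta:%s\n", v.Meta)
	fmt.Fprintf(h, "source:%s\n", v.SourceKey)
	for i, in := range v.Inputs {
		fmt.Fprintf(h, "input%d:%s\n", i, DefinitionKey(in))
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	git := &Vertex{Meta: "git", SourceKey: "commit-a"}
	img := &Vertex{Meta: "image", SourceKey: "chain-1"}
	left := &Vertex{Meta: "exec make", Inputs: []*Vertex{git}}
	right := &Vertex{Meta: "exec apt-get install", Inputs: []*Vertex{img}}
	merged := &Vertex{Meta: "copy", Inputs: []*Vertex{left, right}}

	// The key for the merged vertex is known before either branch runs.
	fmt.Println(DefinitionKey(merged))
}
```

If the returned key is already present in the cache, neither branch has to be solved. If a source identifier or any piece of metadata changes, the key changes with it.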

Another difference is that vertexes should have multiple cache keys. For example, COPY should be cached by both definition and source content. In docker build, COPY is keyed only on content, while other commands use only the meta definition. This is because in docker build there is no unique cache key for the root of the context source. Also, cache keys based on content should never need to be recalculated, even with the --no-cache option.

Some definitions for the cache keys:
Image source: ChainID
Git source: commit-sha
Local file source: session-id
Exec: meta + cachekey of inputs, possibly meta + cachekey of input contents
Copy: meta + cachekey of inputs, meta + cachekey of input contents
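A rough sketch of what multiple cache keys per vertex could look like for a Copy step follows. The `Cache` type, the `digest` helper, and the key layout are all assumptions for illustration and do not reflect an existing implementation. The result is stored under both a definition key and a content key, so a lookup succeeds if either one matches.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// digest hashes length-prefixed parts so that part boundaries are unambiguous.
func digest(parts ...string) string {
	h := sha256.New()
	for _, p := range parts {
		fmt.Fprintf(h, "%d:%s,", len(p), p)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// Cache is a hypothetical store mapping any cache key of a vertex to its result.
type Cache map[string]string

// Store records the result under every key the vertex has.
func (c Cache) Store(result string, keys ...string) {
	for _, k := range keys {
		c[k] = result
	}
}

// Lookup returns the first result found under any of the given keys.
func (c Cache) Lookup(keys ...string) (string, bool) {
	for _, k := range keys {
		if r, ok := c[k]; ok {
			return r, true
		}
	}
	return "", false
}

func main() {
	meta := "COPY src /app"
	cache := Cache{}

	// Definition key: meta + cache key of the input (here a commit SHA).
	defKey := digest("def", meta, "commit-a")
	// Content key: meta + digest of the input contents.
	contentKey := digest("content", meta, digest("file contents"))
	cache.Store("layer-1", defKey, contentKey)

	// A new commit that leaves the copied files untouched: the definition
	// key misses, but the content key still matches.
	newDefKey := digest("def", meta, "commit-b")
	result, ok := cache.Lookup(newDefKey, contentKey)
	fmt.Println(result, ok)
}
```

The definition key can be checked before the input is solved, while the content key only becomes available afterwards. Keeping both allows an early hit when nothing changed and a late hit when only the definition changed.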

A complication is that keys based on contents can't be found until the input has been solved. In the case of sources, the cache key can usually be found without fully downloading the source data. The source interface would need to be updated with an extra method for that.
