docker-archive generates docker legacy compatible images #370
This is related to containers/skopeo#425.
Force-pushed 7edbab1 to dd91c5c
I'm not able to get the error message from the CI. Any ideas?
The CI failure is … Probably reformat the file using … As for the local …
Force-pushed 1968d6e to c4c8480
Thanks @mtrmac. Moreover, I cannot reproduce it when I use the docker registry. Is there a trick with the test registry? (maybe it is using …)
Force-pushed c4c8480 to 5e0bcdd
Ok, I've finally fixed the tests.
mtrmac left a comment
I’m sorry, I’m afraid I didn’t have time to review this in detail yet (the core changes in docker/tarfile in particular); here are a few mostly API-focused notes for now.
if isCompressed && ic.dest.ShouldDecompressLayers() {
    logrus.Debugf("Blob will be decompressed")
    destStream, err = decompressor(destStream)
    if err != nil {
There is already a decompression code around getOriginalLayerCopyWriter. If at all possible, we should only decompress the data once and share the output between the two consumers, if they exist at the same time.
@mtrmac getOriginalLayerCopyWriter is only used when the format of the source image is Docker SchemaV1. So, we could have two decompressions if the source image format is Docker SchemaV1 and the destination is docker-archive.
But I think it's not trivial to change this behavior since getOriginalLayerCopyWriter is already a little bit tricky...
Do you know if SchemaV1 is still widely used?
schema1 is still around (and old images will keep using it), but hopefully its use will diminish over time… just like using the repositories archive format :)
I really don’t like decompressing the possibly gigabytes of data twice, but you’re right that the code is tricky and I don’t quite have the extra time necessary to get this right, so I guess let’s live with the two separate decompressions.
if err != nil {
    return types.BlobInfo{}, err
}
inputInfo.Digest = digester.Digest()
It would probably be cheaper to either compute the digest always, or use io.TeeReader/io.MultiWriter, so that we make as few passes over the data as possible.
types/types.go (outdated)
type BlobInfo struct {
    Digest digest.Digest // "" if unknown.
    Size   int64         // -1 if unknown
    Config bool          // true if the blob is a config blob
This is not very elegant... conceptually, the information is already available in MediaType, and more importantly if Config:true is only set by Image.ConfigInfo(), the caller structurally knows that the blob is a config without having to ask.
Maybe we should have separate ImageDestination.{PutLayer,PutBlob}, or at the very least move this Config bool into a separate parameter of PutBlob which can be set by the caller directly instead of relying on ConfigInfo. (@runcom WDYT?)
…separate imageDestination.{PutLayer,PutConfig} that is.
If this is the only use case that needs the distinction between layers and the config, I think it would be less intrusive to add a bool parameter to the putBlob method.
Do you think there are other use cases that need this distinction?
types/types.go (outdated)
// ShouldCompressLayers returns true iff it is desirable to compress layer blobs written to this destination.
ShouldCompressLayers() bool
// ShouldDecompressLayers returns true iff it is desirable to decompress layer blobs written to this destination.
ShouldDecompressLayers() bool
Please use a single enum (with cases compress/decompress/preserveoriginal) instead of two booleans, to make structurally sure ShouldCompressLayers&&ShouldDecompressLayers can’t be true at the same time.
Ok.
Force-pushed ff42919 to bb5fb7a
CI is failing because Skopeo needs to be patched. Could someone still have a look at this PR?
I'm a nixos / docker tooling user and I would very much love to see this merged :)
This PR has never really been reviewed. I have had some comments from @mtrmac but I don't know if this "feature" could be merged.
Sorry about that, I was unavailable for most of December and getting to this through my backlog took a while. Here’s an in-progress review; I’m afraid I have to context switch to something else for the rest of today…
copy/copy.go (outdated)
func (c *copier) copyBlobFromStream(srcStream io.Reader, srcInfo types.BlobInfo,
    getOriginalLayerCopyWriter func(decompressor compression.DecompressorFunc) io.Writer,
    canCompress bool) (types.BlobInfo, error) {
    canCompress bool, isConfig bool) (types.BlobInfo, error) {
The canCompress value comes from canModifyManifest in callers, and applies just as well to decompression; please rename the parameter, perhaps to canModifyBlob, and make it prevent both compression and decompression.
Ok.
copy/copy.go (outdated)
// === Decompress the layer if it is compressed and decompression is desired
// This is currently only used by docker-archive
if isCompressed && c.dest.CompressesLayers() == types.Decompress {
(This if /*complex*/ {} else if {}; if {} sequence feels like something that should be possible to express in a clearer way. I haven’t actually tried yet, though.)
Exactly, the sequence can be rewritten as:

if compression is desired and it is possible
    compress()
else if decompression is desired and it is possible
    decompress()
else
    continue
docker/tarfile/dest.go (outdated)
if inputInfo.Size == -1 { // Ouch, we need to stream the blob into a temporary file just to determine the size.
// Ouch, we need to stream the blob into a temporary file just to determine the size.
// When the layer is decompressed, we also have to generate the digest on uncompressed datas.
if inputInfo.Size == -1 {
Shouldn’t this be inputInfo.Size == -1 || inputInfo.Digest == ""?
I don't think inputInfo.Digest == "" && inputInfo.Size != -1 can ever be true, but I added a guard just in case.
You’re right that the copy.Image code won’t have inputInfo.Digest == "" && inputInfo.Size != -1, but the documented behavior of the interface does allow it.
The added check below (Can not stream a blob with unknown digest to docker tarfile) is too late; check it in this condition, and then at the location of the current check a digest will always be available.
docker/tarfile/dest.go (outdated)
}
if inputInfo.Digest == "" {
    inputInfo.Digest = digester.Digest()
}
It would be more consistent to keep the update of inputInfo.Digest and inputInfo.Size close together.
Not sure I understand what you mean... but I've tried.
Yes, this. Thanks.
docker/tarfile/dest.go (outdated)
if err := d.sendFile(inputInfo.Digest.String(), inputInfo.Size, tee); err != nil {
    return types.BlobInfo{}, err
// When the digest is generated on the uncompressed layer, we
// have to recheck if the layer has been already sent.
Why? inputInfo.Digest, if present, must match exactly the contents of the input stream. It should never happen that the stream is uncompressed but the digest value matches a compressed version and needs to be recomputed just to be correct. Is that broken somewhere?
Maybe the HasBlob check at the top should just be moved to this place.
Yes, you are right.
docker/tarfile/dest.go (outdated)
t := make(map[string]string)
t[repoTag[1]] = rootLayerID
r := make(map[string]map[string]string)
r[repoTag[0]] = t
All of the above can be expressed as

repositories := map[string]map[string]string{
    repoTag[0]: {repoTag[1]: rootLayerID},
}

which seems much more readable, and does not even need the “This file looks like” comment.
Thanks
docker/tarfile/dest.go (outdated)
}
func createRepositoriesFile(d *Destination, rootLayerID string) error {
    // We generate the repositories file
The function name already says that, no need to repeat it here. Or, if you want to, move it one line up and make it a proper Golang doc string.
docker/tarfile/dest.go (outdated)
func createRepositoriesFile(d *Destination, rootLayerID string) error {
    // We generate the repositories file
    // This file looks like '{repo : { tag : rootLayerSha }}'
    repoTag := strings.Split(d.repoTag, ":")
NewDestination already works with the name and tag as separate strings extracted from a well-typed reference.NamedTagged; it is awkward to concatenate them to a string and then parse them out of the string again.
Done
docker/tarfile/types.go (outdated)
// legacyConfigFileName = "json"
// legacyVersionFileName = "VERSION"
legacyVersionFileName = "VERSION"
// legacyRepositoriesFileName = "repositories"
Either use the legacyRepositoriesFileName and legacyLayerFileName constants, or remove them from here if you want the code to hard-code the names directly.
I use these constants.
docker/tarfile/dest.go (outdated)
// The legacy format requires a config file per layer
layerConfig := make(map[string]*json.RawMessage)
id := l.Digest.Hex()
idJSON := json.RawMessage("\"" + id + "\"")
This is awkward and potentially unsafe. Couldn't layerConfig be a map[string]interface{}, assigning some string and some *json.RawMessage values to individual entries of the map? Or actually a typed structure?
(It’s not yet clear to me whether all of this should share code with c/I/image/v1ConfigFromConfigJSON.)
I agree it's not clean, but the objective is just to extract some parts of the image configuration file and use them to create layer configuration files, without really knowing what is inside these parts. Since the contents of these parts don't seem to be relevant, I think it's not necessary to create a complex data structure just to copy/paste these JSON blobs.
I now use a map[string]interface{} where the extracted attributes are *json.RawMessage while the attributes that are created are string (id and parent).
nlewo left a comment
Thanks for your comments @mtrmac.
Note, I didn't run your validation targets yet.
if isConfig {
    buf := new(bytes.Buffer)
    buf.ReadFrom(stream)
    d.config = buf
Ok, Thanks.
docker/tarfile/dest.go (outdated)
for i, l := range man.LayersDescriptors {
    layerPaths = append(layerPaths, l.Digest.Hex()+"/layer.tar")
    b := []byte("1.0")
    if err := d.sendFile(filepath.Join(l.Digest.Hex(), legacyVersionFileName), int64(len(b)), bytes.NewReader(b)); err != nil {
done
docker/tarfile/dest.go (outdated)
    }
}
if err := createRepositoriesFile(d, man.LayersDescriptors[len(man.LayersDescriptors)-1].Digest.Hex()); err != nil {
I added a guard to only create the repositories file if there is at least one layer.
Force-pushed bbff322 to 50a38b3
copy/copy.go (outdated)
if canModifyBlob && c.dest.CompressesLayers() == types.Compress && !isCompressed {
    logrus.Debugf("Compressing blob on the fly")
    pipeReader, pipeWriter := io.Pipe()
Nit: please drop this line, to keep the defer immediately after creating the object.
Done
directory/directory_test.go (outdated)
compress := dest.ShouldCompressLayers()
assert.False(t, compress)
info, err := dest.PutBlob(bytes.NewReader(blob), types.BlobInfo{Digest: digest.Digest("sha256:digest-test"), Size: int64(9)})
assert.Equal(t, dest.CompressesLayers(), types.PreserveOriginal)
The order of arguments should be assert.Equal(t, expectedValue, valueBeingTested)
Done
// a hostname-qualified reference.
// See https://github.com/containers/image/issues/72 for a more detailed
// analysis and explanation.
refString := fmt.Sprintf("%s:%s", ref.Name(), ref.Tag())
Please move the big comment above this line to PutManifest as well, along with the refString variable.
Done
docker/tarfile/dest.go (outdated)
tee := io.TeeReader(stream, digester.Hash())
if err := d.sendFile(inputInfo.Digest.String(), inputInfo.Size, tee); err != nil {
    return types.BlobInfo{}, err
if inputInfo.Digest.String() == "" {
This should always be false at this point, at least after the previous if is updated (and the code below already depends on inputInfo.Digest being valid).
Not sure I'm fixing it... Let me know if you meant something else.
Right, at this point the code can unconditionally assume inputInfo.Digest is valid.
The start of PutBlob could, I think, be

func (d *Destination) PutBlob(stream io.Reader, inputInfo types.BlobInfo, isConfig bool) (types.BlobInfo, error) {
    // Ouch, we need to stream the blob into a temporary file just to determine the size or digest.
    if inputInfo.Size == -1 || inputInfo.Digest.String() == "" {
        logrus.Debugf("docker tarfile: input with unknown size or digest, streaming to disk first ...")

and transparently handle the hypothetical case of known size && unknown digest instead of failing.
docker/tarfile/dest.go (outdated)
if err := d.sendFile(filepath.Join(inputInfo.Digest.Hex(), "layer.tar"), inputInfo.Size, stream); err != nil {
    return types.BlobInfo{}, err
}
d.blobs[inputInfo.Digest] = types.BlobInfo{Digest: inputInfo.Digest, Size: inputInfo.Size}
d.blobs can be updated in the isConfig case as well. HasBlob is, strictly speaking, not restricted to layer blobs.
ah yes, exactly!
So, why not move the d.blobs[inputInfo.Digest] = … line one line lower, out of the if?
types/types.go (outdated)
// ShouldCompressLayers returns true iff it is desirable to compress layer blobs written to this destination.
ShouldCompressLayers() bool
// CompressesLayers is used to know what kind of compression should be applied on layers
CompressesLayers() LayerCompression
CompressesLayers, e.g. when compared with SupportsSignatures above it, reads as if it answers the question ”does the destination compress layers”, which is not the case. Name it perhaps something like DesiredLayerCompression?
copy/copy.go (outdated)
    originalLayerReader = destStream
}
// === Compress the layer if it is uncompressed and compression is desired
This comment now does not apply to the whole section; it is more of an alternative to the two new indented // == comments than a higher-level description of the two.
The original idea was that the // == comments delineate separate stages of the pipeline; so maybe rewrite this one as something like // == Deal with layer compression/decompression if necessary, and drop the two indented // == ones.
Done
docker/tarfile/dest.go (outdated)
// All layers expect the root one have a parent
if i != 0 {
    layerConfig["parent"] = man.LayersDescriptors[i-1].Digest.Hex()
An image may in principle use exactly the same layers several times, in a pathological case even in sequence; in that case this would create multiple versions of files with the same name in a single tar archive, and a loop in the “parent” links with some of the versions.
Instead, the IDs used in here need to be distinct even if the same l.Digest.Hex() value is repeated, see ChainID in https://gist.github.com/aaronlehmann/b42a2eaf633fc949f93b or the implementation in moby/moby/image/tarexport.
I guess, take any existing image (e.g. skopeo copy docker://whatever dir:tmpdir to have an easy way to edit it), edit its manifest to repeat one item of the layers array, edit its config.json to correspondingly list the layer twice in rootfs.diff_ids, and update the manifest with the updated config digest.
(To be honest, this would probably be the first time anyone is actually testing this corner case, so something else might turn out to be broken.)
Thanks. I think I successfully built such an image. Currently, when I docker load this image (after copying it with skopeo to docker-archive), Docker sees 5 layers while there are 6 layers in the rootfs array.
I will see how we could handle this kind of image.
@mtrmac I've modified an image to reuse a layer, using the skopeo dir transport. I've tried to copy this image to docker-daemon and it prints:
Skipping fetch of repeat blob sha256:6fbc4a21b806838b63b774b338c6ad19d696a9e655f50b4e358cc4006c3baa79
In fact, the same behavior is observed if we use the proposed docker-archive implementation: the duplicated layer is ignored.
To be honest, I don't really know if the generated image reproduces this use case well, and I'm not sure what the expected behavior is when this kind of image is loaded by Docker... It would be easier to have a real image.
Moreover, in the Docker implementation, they use the chainID to generate the path of the layer in the tar stream. This is hard for us to do because the layer is sent to the tar stream by the PutBlob method, which is designed to handle each layer independently from the other ones: we cannot easily use the previous layer's sha to generate the current layer's path.
Finally, it seems this kind of image would probably never be used!
So instead of adding complexity to the code for this corner case, we could print a message that says "Repeated layer is ignored".
Skipping fetch of repeat blob sha256:…
That part is an optional optimization to PutBlob and more or less known to work; I am more worried about the correctness of the generated metadata (in this case, primarily the parent links).
(I need to test this scenario out myself, maybe I am talking nonsense; hopefully tomorrow. For now just this quick note.)
OK, tested this. TL;DR: the legacy metadata in the generated tar file crashes the Docker daemon on docker load, so this really needs to be fixed.
(To be fair, the unmodified tar file which includes the new metadata as well works fine with docker load; but generating the legacy format is the whole point of this PR, so the legacy format should be correct.)
# Build skopeo with this PR, updating layers.go as necessary
$ ./skopeo --override-os linux --policy default-policy.json copy docker://busybox dir:edits
$ python -mjson.tool < edits/5b0d59026729b68570d99bc4f3f7c31a2e4f2a5736435641565d93e7c25bd2c3.tar > _
# To duplicate the `rootfs.diff_ids` array entry
$ vi _
$ shasum --algorithm 256 _ | cut -d ' ' -f 1
e2f0703c8b534d097a822e895b0852ceedccb83a7a122aa25f92757b0ee1fcc3
$ ls -l _
$ mv _ edits/$(shasum --algorithm 256 _ | cut -d ' ' -f 1).tar
# To update the size and digest of the config, and duplicate the layer entry
$ vi edits/manifest.json
# Test that the result is minimally consistent
$ ./skopeo --override-os linux --policy default-policy.json copy dir:edits dir:t
$ rm -rf t
$ ./skopeo --override-os linux --policy default-policy.json copy dir:edits docker-archive:archive0.tar:busybox:v0
$ mkdir archive0
$ cd archive0/
$ tar xf ../archive0.tar
$ ls
4febd3792a1fb2153108b4fa50161c6ee5e3d16aa483a63215f936a113a88e9a manifest.json
e2f0703c8b534d097a822e895b0852ceedccb83a7a122aa25f92757b0ee1fcc3.json repositories
# The layer refers to itself as a parent!
$ python -mjson.tool 4febd3792a1fb2153108b4fa50161c6ee5e3d16aa483a63215f936a113a88e9a/json
…
"id": "4febd3792a1fb2153108b4fa50161c6ee5e3d16aa483a63215f936a113a88e9a",
…
"parent": "4febd3792a1fb2153108b4fa50161c6ee5e3d16aa483a63215f936a113a88e9a"
…
# Make sure the legacy loader is used
$ rm manifest.json
override r--r--r-- mitr/staff for manifest.json? y
$ tar cf ../archive1.tar .
$ cd ..
# Verify that skopeo can’t use a legacy-only archive
$ ./skopeo --policy default-policy.json copy docker-archive:archive1.tar:busybox:v0 dir:t
WARN[0000] docker-archive: references are not supported for sources (ignoring)
FATA[0000] Error determining manifest MIME type for docker-archive:archive1.tar:docker.io/library/busybox:v0: Error loading tar component manifest.json: file does not exist
# Try using the file with docker. This hangs for a while, eventually failing with
# docker load < ~mitr/archive1.tar
error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.26/images/load?quiet=0: EOF
and the system logs show the daemon aborting with a stack overflow due to an infinite recursion in github.com/docker/docker/image/tarexport.(*tarexporter).legacyLoadImage().
Moreover, the Docker implementation uses the `chainID` to generate the path of the layer in the tar stream. This is hard for us to do because the layer is sent to the tar stream by the `PutBlob` method, which is designed to handle each layer independently of the others: we cannot easily use the previous layer's digest to generate the current layer path.
The metadata is created not in PutBlob, but in writeLegacyLayerMetadata, at which point the full ordered set of layers is available as layerDescriptors.
docker/tarfile/dest.go
Outdated
return false
// CompressesLayers indicates if layers must be compressed, decompressed or preserved
func (d *Destination) CompressesLayers() types.LayerCompression {
	return types.Decompress
Please move this out of docker/tarfile.Destination into the users, docker/daemon.daemonImageDestination and docker/archive.archiveImageDestination, and return Decompress only in the docker/archive implementation; for docker/daemon, let’s keep the current PreserveOriginal behavior.
Done
if isCompressed && ic.dest.ShouldDecompressLayers() {
	logrus.Debugf("Blob will be decompressed")
	destStream, err = decompressor(destStream)
	if err != nil {
schema1 is still around (and old images will keep using it), but hopefully its use will diminish over time… just like the repositories archive format :)
I really don’t like decompressing the possibly gigabytes of data twice, but you’re right that the code is tricky and I don’t quite have the extra time necessary to get this right, so I guess let’s live with the two separate decompressions.
mtrmac
left a comment
Thanks!
This is a complete review, unlike the previous in-progress/partial ones.
The one critical blocker is the layer ID computation; otherwise various smaller things. From earlier, only the handling of inputInfo.Digest() == "" is outstanding (and the double decompression of schema1 images, though that is non-blocking).
Thanks for your review. I've addressed all of your comments, except the one related to the `inputInfo.Digest` handling.
docker/tarfile/dest.go
Outdated
}

// WriteLegacyLayerMetadata writes legacy VERSION and configuration files for all layers
func (d *Destination) WriteLegacyLayerMetadata(layerDescriptors []manifest.Schema2Descriptor) (layerPaths []string, err error) {
Nit: please make this method private (lowercase writeLegacy…)
docker/tarfile/dest.go
Outdated
tee := io.TeeReader(stream, digester.Hash())
if err := d.sendFile(inputInfo.Digest.String(), inputInfo.Size, tee); err != nil {
	return types.BlobInfo{}, err
if inputInfo.Digest.String() == "" {
Right, at this point the code can unconditionally assume inputInfo.Digest is valid.
The start of PutBlob could, I think, be
func (d *Destination) PutBlob(stream io.Reader, inputInfo types.BlobInfo, isConfig bool) (types.BlobInfo, error) {
	// Ouch, we need to stream the blob into a temporary file just to determine the size or digest.
	if inputInfo.Size == -1 || inputInfo.Digest.String() == "" {
		logrus.Debugf("docker tarfile: input with unknown size or digest, streaming to disk first ...")
and transparently handle the hypothetical case of known size && unknown digest instead of failing.
20a6720 to
8bb449d
Compare
@gilligan yes, it should work with and without this patch (I'm already using skopeo to push images that have been produced with
huh? Last I tried, that did not work since skopeo was complaining about a missing manifest file, I believe.
@gilligan I'm pushing to a registry several images generated by
@mtrmac Thanks for your how-to on "building crazy images" :) When I had tried, I didn't remove the manifest.json file, which explains why it was working fine for me. I pushed a commit that avoids the loop by skipping repeated layers. A warning is printed to inform the user.
7106151 to
e107475
Compare
That changes the content of the image; the duplicates matter (see below for an example). Please compute a chain-ID-like value instead.

Actually, there is a far simpler way to produce such an image: create an empty file and ADD it twice. This results in the following rootfs.diff_ids:

"sha256:d32459d9ce237564fb93573b85cbc707600d43fbe5e46e8eeef22cad914bb516",
"sha256:4b5c23136932e4a87cecc06f674ff3f66ca21c8a61653b97aea095b5822ded60",
"sha256:46c039e328386fe951548c428c030f15bf6c764aa3222fedb9d114d2b3270bb8",
"sha256:4b5c23136932e4a87cecc06f674ff3f66ca21c8a61653b97aea095b5822ded60"

i.e. the two 4b5c23… entries are identical.
ca67b84 to
0dac8ec
Compare
Would https://github.com/mtrmac/image/tree/docker-archive-chainid (testable via https://github.com/mtrmac/skopeo/tree/docker-archive-chainid ) work for you? At least

(Note how the layers end up in the root of the tarball again; the legacy paths all become symlinks.)
@mtrmac thanks for your patches and the skopeo branch :)
0dac8ec to
a490e69
Compare
@mtrmac I've integrated your patches and rebased this PR on master.
👍 Thanks! @runcom PTAL
Failing
@runcom Could you please have a look?
@mtrmac it would be really nice to get this PR merged, since I'm waiting for it to improve our Docker tooling. How can we move forward?
@runcom ping again, this has been waiting for you for 24 days now.
@nlewo this needs one last rebase - I'm reviewing it
docker save generates an image compatible with the legacy format, i.e. layers are tars, they have a configuration file, and a repositories file is created. There are some external tools that still rely on this old format, such as Mesos [1] and NixOS [2].
[1] https://github.com/apache/mesos/blob//7ca46e24b3339ba27e88b99ea95362c956ef03c1/src/slave/containerizer/mesos/provisioner/docker/local_puller.cpp#L168
[2] https://github.com/NixOS/nixpkgs/blob/5c6dc717a66b6555e5757f92376527da7a553fec/pkgs/build-support/docker/default.nix#L143
Signed-off-by: Antoine Eiche <lewo@abesis.fr>
A new layer id is generated by using the current layer id and its parent layer id. Signed-off-by: Antoine Eiche <lewo@abesis.fr>
Duplicate IDs would otherwise result in duplicate files in the tarball, with some of the versions possibly causing an import loop.
Duplicate layer IDs can happen
- using layer-at-a-time editing tools
- naively by accident, e.g.
  > FROM fedora
  >
  > ADD empty /
  > RUN rm /empty
  > ADD empty /
  creates two identical layers
- by a malicious actor
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Instead, keep all layers in the root of the tarball, and create subdirectories only in the legacy metadata writer. Signed-off-by: Miloslav Trmač <mitr@redhat.com>
e992967 to
7e96dbf
Compare