Skip to content

Conversation

@erikh
Copy link
Contributor

@erikh erikh commented Jan 19, 2017

This is a proof-of-concept and still needs tests and other completeness changes
(like for things that are not docker). Please let me know if you'd like to
accept the feature so I know to continue work towards getting this fleshed out
in the other image systems, tests, etc.

erikh/box has a layer editor inside it that allows it to filter out layers as
necessary. This is made possible currently with some hand-rolled image code.
I'd love to move to containers/image but it has a very coarse copy
implementation.

This adds a simple hook that provides the source layer information to a hook in
the copier which returns bool on whether to proceed to continue to copy
(contrast with filepath.Walk).

Here is some example code that demonstrates how it can be used: in this case,
the golang image is pulled from the host docker, the first layer is clipped
off the top and pushed to a new image named erikh/test. The other layer
relationships have been removed and naturally the image is much smaller.

This has proven useful for building minimal images of containers in the past.

package main

import (
	"fmt"

	"github.com/containers/image/copy"
	"github.com/containers/image/docker/daemon"
	"github.com/containers/image/docker/reference"
	"github.com/containers/image/types"
)

func main() {
	ref, err := daemon.ParseReference("docker.io/library/golang:latest")
	if err != nil {
		panic(err)
	}

	img, err := ref.NewImage(nil)
	if err != nil {
		panic(err)
	}
	defer img.Close()

	tgtRef, _ := reference.ParseNamed("docker.io/erikh/test:latest")
	tgt, err := daemon.NewReference("", tgtRef)
	if err != nil {
		panic(err)
	}

	b, _, err := img.Manifest()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))

	var i int
	err = copy.Image(nil, tgt, ref, &copy.Options{
		RemoveSignatures: true,
		LayerCopyHook: func(srcLayer types.BlobInfo) bool {
			i++
			return i < 2 // only the first layer
		},
	})

	if err != nil {
		panic(err)
	}
}

@runcom
Copy link
Member

runcom commented Jan 19, 2017

I've not yet done a full code review but high level concept sounds really good to me. Maybe we can use the callback to also skip layers we're not interested in (or the other way around). I like it. @mtrmac WDYT? I'm not totally sure about the implications this code could have to the way we work with signatures.

@erikh
Copy link
Contributor Author

erikh commented Jan 19, 2017 via email

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very similar to #159 , except that it uses a function instead of a prescribed set, and that it tries to update the manifest as well, whereas #159 just punts on that.

At a high level, there’s only so far copy.Image can go as a generic image editing system, without it becoming a completely unmanageable mess (and it is fairly close to that already, and it will get worse with things like #105 ). I don’t really know how flexible we can or want to make it…

For example, a plausible alternative could be that we would instead make some part of imageCopier public (the top-level copylayers, which would in the future handle parallelization? Almost certainly the non-trivial copyLayer should be made available.), and leave it to the callers to coordinate getting the necessary information for types.Image.UpdatedImage.

OTOH it does make sense for UpdatedImage to support adding/removing layers, and that’s, probably, the most difficult part of this.

Overall, I really don’t know right now.

copy/copy.go Outdated
// Please keep this policy check BEFORE reading any other information about the image.
if allowed, err := policyContext.IsRunningImageAllowed(unparsedImage); !allowed || err != nil { // Be paranoid and fail if either return value indicates so.
return errors.Wrap(err, "Source image rejected")
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d rather force the caller to provide a policy. It is… not too difficult… to do:

signature.Policy{Default: []signature.PolicyRequirement{signature.NewPRInsecureAcceptAnything()}}

and passing nil here by mistake could be disastrous.

copy/copy.go Outdated
}

canModifyManifest := len(sigs) == 0
canModifyManifest := len(sigs) == 0 || options.RemoveSignatures
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary? Per

var sigs [][]byte
if options != nil && options.RemoveSignatures {
    sigs = [][]byte{}

above, this seems redundant.

(Also, options can be nil.)

copy := *m // NOTE: This is not a deep copy, it still shares slices etc.

if options.LayerInfos != nil {
if len(copy.LayersDescriptors) != len(options.LayerInfos) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do add support for this, it should land for all manifest schemas simultaneously.

// Edits the layers in the manifest when called by replacing them with the
// appropriate infos provided. This is metadata-only -- the actual layers still
// have to get to the image.
func (m *manifestSchema2) performEdits(infos []types.BlobInfo) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

performEdits is too generic a name for this limited operation.

m.LayersDescriptors = make([]descriptor, infolen)
for i, info := range infos {
imageConfig.History[i] = imageHistory{}
imageConfig.RootFS.DiffIDs[i] = info.Digest
Copy link
Collaborator

@mtrmac mtrmac Jan 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect if the layer is compressed. DiffID values must be the digests of the compressedEDIT uncompressed layers.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(See also the UpdatedImageNeedsLayerDiffIDs hack. That decision mechanism would obviously not work with the “anything can be returned” hook, and it is also too late by that time; so “should we compute DiffIDs” would need a complete rethinking.)

m.LayersDescriptors[i].Size = info.Size
m.LayersDescriptors[i].URLs = info.URLs
}
imageConfig.RootFS.BaseLayer = m.LayersDescriptors[0].Digest.String()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Note to self: I didn’t review the config.json edits in detail here, … but this is inconsistent with what manifestSchema1.convertToManifestSchema2 does.)

copy/copy.go Outdated
ReportWriter io.Writer
SourceCtx *types.SystemContext
DestinationCtx *types.SystemContext
LayerCopyHook func(types.BlobInfo) bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should really have a more descriptive name; shouldLayerBeCopied or something shorter to that effect; “hook” does not describe the function at all.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… and does this really have to be a function? Could this be a set, as per #159 ?

Or, more generally, if we add “delete layer” here, pretty soon we will be adding an “add a layer at this index”. What would that interface look like, and can we define that instead right now? (Yeah, that would be making this PR more complex. Perhaps we should just merge this, or the “layer subset” non-hook variant, right now, with a // Warning: API likely to change, and not block on the higher-level discussions?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we pass the layer id so really you could do anything you want, including keeping a table of images you want to keep, which is exactly what box does.

}

// regurgitate the image configuration for recalculation of layers in the event the layer list has been edited.
imageConfig := &image{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a fairly high risk that this would drop any fields of config.json added in the future.

@erikh
Copy link
Contributor Author

erikh commented Jan 19, 2017

The two security changes I made seemed necessary at the time but I will back them out.

@erikh
Copy link
Contributor Author

erikh commented Jan 19, 2017

Ok. I will not close this while you decide. I realize the code is not in the best shape; it's merely meant as a discussion-starter.

For the rest of it, I'm focused on it being doable; we accomplish this in box with a layer list we manage outside of the layer list in the image (and then repackage at the end) so there's no reason we couldn't do it that way too -- I just felt a function was a safer bet given how generic they can be. No objection to doing it the other way.

@mtrmac
Copy link
Collaborator

mtrmac commented Jan 19, 2017

Ok. I will not close this while you decide. I realize the code is not in the best shape; it's merely meant as a discussion-starter.

Yeah; this strongly depends on how you want to use it in general.

Vaguely, the two major directions I see (but I may well be missing something)

  • Expose enough of copyLayers that a caller who exactly knows what it needs, and doesn’t care about general API availability (e.g. is restricted to schema2, only uncompressed layers, and the like) can use the copy/compress/verify-digest/future-parallel-copy functionality, and manually do the high-level image manipulation (copying/editing configs, manifests), using some but perhaps not all of the types.Image functions. Negative: Need to know much more about the image and the formats. Positive: almost infinite flexibility for editing the image.

  • Add an options.LayerEditor hook, which would receive ic.src.LayerInfos() and return an edited value, with some layers deleted, and some added (in which case, probably also supplying an io.Reader for the contents of that layer).:

    LayerEditor func([]types.BlobInfo) []struct{info types.BlobInfo; sourceIfNewLayer io.Reader}`

    Then, copy.Image would be on the hook for making these edits work. Negative: More work to support the various edits in containers/image (or a confusing API where some combinations work and some don’t); only those kinds of edits would be possible; Positive: Much easier to use for callers who want to do a single well-targeted edit to the layers only.

Does any of this make sense? Are there other options I have missed? @runcom ?

(And would the LayerSubset parameter instead of a hook from #159 work for you, or is there something about the hook which is more useful?)

@mtrmac
Copy link
Collaborator

mtrmac commented Jan 19, 2017

(admittedly the LayerEditor name doesn’t mean anything and in general sucks.)

@erikh
Copy link
Contributor Author

erikh commented Jan 19, 2017 via email

@mtrmac
Copy link
Collaborator

mtrmac commented Jan 19, 2017

I don't think doing this just in schema2 is the right answer, if that's what you were suggesting.

To be fair, my initial comment on the docker_schema2.go file was probably too strong. It would definitely not be the first case where some types.Image.UpdatedImage functionality is only available for some schemas or schema combinations, and if nobody cared to implement this for $obscure_format, well, nobody would exactly be hurt. But it does make the interface confusing when we can’t easily explain to users what is and is not supposed to work.

@erikh
Copy link
Contributor Author

erikh commented Jan 19, 2017 via email

@erikh
Copy link
Contributor Author

erikh commented Jan 20, 2017

Hmm; maybe make the hook a part of the interface somehow, so it has to at least be considered when implementing a destination?

@erikh
Copy link
Contributor Author

erikh commented Jan 20, 2017

Also please note that I have elided the security changes, so we can focus on the meat. :)

@mtrmac
Copy link
Collaborator

mtrmac commented Jan 20, 2017

Hmm; maybe make the hook a part of the interface somehow, so it has to at least be considered when implementing a destination?

As far as layers go, it seems to me that ManifestUpdateOptions.LayerInfos along with the data in .InformationOnly should allow any edits (additions, deletions, reorders). It would be up to copy.Image / copyLayers along with any hooks / edit parameters to supply the necessary info for the ManifestUpdateOptions. Am I missing anything?

@erikh
Copy link
Contributor Author

erikh commented Jan 20, 2017

Sure; I'm talking about the requirement for the hook to be implemented (or at least, considered when implementing a destination).

Unrelated, is there an irc or slack I can join to collaborate with you folks?

@erikh
Copy link
Contributor Author

erikh commented Jan 20, 2017

I'm sorry, I'm not being clear.

For a destination to work with copy, I think it should at least implement or at minimum, do something (perhaps by return an error) to indicate that it cannot do layer edits.

This could be done by satisfying an interface.

I hope that's clearer, sorry.

@erikh
Copy link
Contributor Author

erikh commented Feb 15, 2017

@mtrmac I'm going to try implementing option #2 in #218 (comment) tonight -- please let me know if you'd prefer I do it some other way.

I am already running with a fork with this PR in erikh/box so I could get moving, but I want to realign as soon as we can come to a conclusion of how to implement this.

@erikh
Copy link
Contributor Author

erikh commented Feb 15, 2017

... the layer editing hook, to be precise.

@mtrmac
Copy link
Collaborator

mtrmac commented Feb 15, 2017

@mtrmac I'm going to try implementing option #2 in #218 (comment) tonight -- please let me know if you'd prefer I do it some other way.

No, I continue to think that an interface vaguely like that would be general and flexible enough to be useful long-term.

And it is quite acceptable to implement this kind of editing only for one of the image manifest formats in the initial PR, as long as an attempt to do it with the unsupported formats is clearly rejected.

@erikh
Copy link
Contributor Author

erikh commented Feb 15, 2017 via email

@erikh
Copy link
Contributor Author

erikh commented Feb 19, 2017

I'm going to leave the new layer reader off for now. I really think this would be better done as a separate method on the src that accepts a reader for each layer.

The rest I'm working on now.

@erikh erikh force-pushed the copy-hook branch 2 times, most recently from 798114b to 1a0fcb4 Compare February 19, 2017 06:17
@erikh
Copy link
Contributor Author

erikh commented Feb 19, 2017

I've taken a stab at this (which I have overwritten the previous patch with)
but the crux of the problem nwo seems to be that the config is copied verbatim
(see ConfigBlob() in manifestSchema2 and copyConfig in copy.Image) and editing
it requires knowledge of the manifest format, which is hard to reap in a
portable way here.

Looking for some suggestions on how to resolve this.

if err := dest.PutSignatures(sigs); err != nil {
return errors.Wrap(err, "Error writing signatures")
}
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no idea what happened here. I'll investigate.

srcInfos := ic.src.LayerInfos()

if ic.layerEditor != nil {
srcInfos = ic.layerEditor(ic.src.LayerInfos())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling LayerInfos three times is a bit much… at the very least srcInfos = ic.layerEditor(srcInfos)

retval := types.BlobInfo{Digest: digester.Digest(), Size: inputInfo.Size}

d.blobs[inputInfo.Digest] = retval
return retval, nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(WRT this, comment changes, empty lines, etc…) I guess, why not, but please, in the final version, one idea per commit. It should be clear why every change was introduced, and in big commits it is often difficult whether a small change is essential for the primary task of the commit or really a typo in an unrelated cleanup which was not supposed to change behavior.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… OTOH, in the extreme, having 10 one-line commits is just fine with me. Obviously correct, easy to review.

@mtrmac
Copy link
Collaborator

mtrmac commented Feb 21, 2017

but the crux of the problem nwo seems to be that the config is copied verbatim
(see ConfigBlob() in manifestSchema2 and copyConfig in copy.Image) and editing
it requires knowledge of the manifest format, which is hard to reap in a
portable way here.

Yeah. Still doing that is really necessary.

A possible step towards making this possible could be to replace the existing ManifestUpdateOptions.LayerInfos with something like LayerInfosPreservingDiffIDs, with the existing functionality (editing the manifest only, not the config; we need to preserve this for all currently supported formats to be able compress blobs on upload), and a new LayerInfosReplacement (or a better name?), which requires a config update, and initially fail for all formats. The caller would presumably be allowed to set only one of the two. (Or perhaps ManifestUpdateOptions.LayerInfos just needs to be richer so that it can capture the necessary information in both of these cases.)

And then, we do need to figure out what the required config updates are. Right now, when the layers must come from the original src, the edits can only be a (possibly permuted) subset of the original layers; so we might be able to build the config updates by creating an equivalent (possibly permuted) subset of the original config entries. That seems fairly easy, or I may well be missing something essential about the config format.

The more general case, with arbitrary new layers being introduced, is noticeably harder; luckily we do have the DiffID computation code already, so it would “only” have to be enabled if necessary, but the hook might still want to provide the Created/Author/CreatedBy fields (from schema2 history). Or perhaps nobody cares about this metadata?

@erikh
Copy link
Contributor Author

erikh commented Feb 22, 2017

The problem is not editing the config, it's converting the config to the new layers; which means it has to be unmarshalled first and its properties changed and remarshalled, but to what format? At that level there's no indicator of what format to use, much less convert to.

@mtrmac
Copy link
Collaborator

mtrmac commented Feb 22, 2017

At that level there's no indicator of what format to use, much less convert to.

I guess I don’t understand the problem; is there any ambiguity about config formats? We are already parsing / creating config.json files in manifestSchema[12].convertToManifestSchema[12]. It’s somewhat hairy but it seems perfectly doable.

Or are you perhaps saying that a schema2 manifest could refer to an OCI config, or an OCi manifest could refer to a schema1 config?

@erikh
Copy link
Contributor Author

erikh commented Feb 22, 2017 via email

@erikh
Copy link
Contributor Author

erikh commented Feb 22, 2017 via email

@mtrmac
Copy link
Collaborator

mtrmac commented Feb 22, 2017

Yes but the abstraction doesn't support that, unless I'm missing something, by the time you're in ConfigBlob() you don't have access to the manifest format to unmarshal/marshal it.

What happens is that the copy code, after supplying all needed data, calls .UpdatedImage; that creates an in-memory-only variant of the original manifest (a memoryImage with a new manifest object); that new manifest object is initialized with an edited version of configBlob from the start. Follow e.g. https://github.com/containers/image/blob/master/image/docker_schema1.go#L171https://github.com/containers/image/blob/master/image/docker_schema1.go#L284https://github.com/containers/image/blob/master/image/docker_schema2.go#L56 .

(And of course the abstraction is not set in stone, image.genericManifest is a private type perfectly subject to change, and even the public types in types.go have been modified fairly frequently in the past. The existing structure does seem to be more or less suitable for this in principle, concentrating all the true complexity in .UpdatedImage and its helper methods, and leaving the rest as fairly simple parsers / getters. It can very likely be improved.)

@erikh
Copy link
Contributor Author

erikh commented Feb 22, 2017 via email

@erikh
Copy link
Contributor Author

erikh commented Feb 22, 2017 via email

@mtrmac
Copy link
Collaborator

mtrmac commented Feb 22, 2017

Right, but if you read that code, it takes it right out of the original image

What path exactly? In the conversion case I have demonstrated above, https://github.com/containers/image/blob/master/image/docker_schema2.go#L84 uses the m.configBlob != nil path, with m.configBlob prepopulated with a memory-only version which does not exist in the source (obviously, when the source is not a schema2 image and there is no separate config.json in the original at all).

@erikh
Copy link
Contributor Author

erikh commented Feb 22, 2017 via email

@erikh
Copy link
Contributor Author

erikh commented Feb 22, 2017 via email

@mtrmac
Copy link
Collaborator

mtrmac commented Feb 22, 2017

That added fmt.Println is in the wrong place; again, in the in-memory case, m.configBlob is non-nil from the start and that is never reached.

If that’s not the case, I’m not immediately sure what is going on. Here’s what I have used to test that I am not going crazy:

diff --git a/vendor/github.com/containers/image/copy/copy.go b/vendor/github.com/containers/image/copy/copy.go
index d27e634..a0f3f8a 100644
--- a/vendor/github.com/containers/image/copy/copy.go
+++ b/vendor/github.com/containers/image/copy/copy.go
@@ -180,7 +180,7 @@ func Image(policyContext *signature.PolicyContext, destRef, srcRef types.ImageRe
        }
 
        pendingImage := src
-       if !reflect.DeepEqual(manifestUpdates, types.ManifestUpdateOptions{InformationOnly: manifestUpdates.InformationOnly}) {
+       if _ = reflect.DeepEqual; true { //} !reflect.DeepEqual(manifestUpdates, types.ManifestUpdateOptions{InformationOnly: manifestUpdates.InformationOnly}) {
                if !canModifyManifest {
                        return errors.Errorf("Internal error: copy needs an updated manifest but that was known to be forbidden")
                }
diff --git a/vendor/github.com/containers/image/image/docker_schema2.go b/vendor/github.com/containers/image/image/docker_schema2.go
index 76ceea7..1188cd9 100644
--- a/vendor/github.com/containers/image/image/docker_schema2.go
+++ b/vendor/github.com/containers/image/image/docker_schema2.go
@@ -151,6 +151,24 @@ func (m *manifestSchema2) UpdatedImageNeedsLayerDiffIDs(options types.ManifestUp
 // This does not change the state of the original Image object.
 func (m *manifestSchema2) UpdatedImage(options types.ManifestUpdateOptions) (types.Image, error) {
        copy := *m // NOTE: This is not a deep copy, it still shares slices etc.
+
+       if true {
+               logrus.Errorf("HERE")
+               blob, err := copy.ConfigBlob()
+               if err != nil {
+                       return nil, err
+               }
+               configJSON := bytes.Join([][]byte{[]byte("INVALIDPREFIX"), blob}, nil)
+               configDescriptor := descriptor{
+                       MediaType: "application/vnd.docker.container.image.v1+json",
+                       Size:      int64(len(configJSON)),
+                       Digest:    digest.FromBytes(configJSON),
+               }
+
+               m2 := manifestSchema2FromComponents(configDescriptor, nil, configJSON, copy.LayersDescriptors)
+               return memoryImageFromManifest(m2), nil
+       }
+
        if options.LayerInfos != nil {
                if len(copy.LayersDescriptors) != len(options.LayerInfos) {
                        return nil, errors.Errorf("Error preparing updated manifest: layer count changed from %d to %d", len(copy.LayersDescriptors), len(options.LayerInfos))

and ./skopeo --policy=default-policy.json copy docker://busybox:latest dir:t.

My best guess (but just a guess) is that the reflect.DeepEqual test (which the above stupid hack patches out) is preventing the run of UpdatedImage at all.

@mtrmac
Copy link
Collaborator

mtrmac commented Feb 22, 2017

… actually, let’s step back: This PR, as far as manifest/config editing goes, only updates ManifestUpdateOptions.LayerInfos. That of course does not edit config.json simply because that code doesn’t exist. .LayerInfos has been created to update the manifest after blobs are compressed (changing their digests but not DiffIDs). Editing the config as necessary would be a new functionality which needs to be written. It isn’t already there just to enable.

@erikh
Copy link
Contributor Author

erikh commented Feb 22, 2017 via email

Erik Hollensbe added 4 commits February 25, 2017 02:45
Signed-off-by: Erik Hollensbe <github@hollensbe.org>
Signed-off-by: Erik Hollensbe <github@hollensbe.org>
Signed-off-by: Erik Hollensbe <github@hollensbe.org>
Specifically, with the layer editing changes. The layers are variable
and unvalidated now so this test will no longer pass.

Signed-off-by: Erik Hollensbe <github@hollensbe.org>
@rhatdan
Copy link
Member

rhatdan commented Jan 29, 2018

@erikh Is this still something you want. Could you rebase the PR and @mtrmac PTAL/

@erikh
Copy link
Contributor Author

erikh commented Jan 29, 2018

it's ok, thanks and sorry to keep this hanging up things for so long

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants