
c8d/push: Push multi-platform images #56

Closed

vvoland wants to merge 1 commit into rumpl:c8d from vvoland:containerd-push-multi-platform

Conversation

@vvoland
Collaborator

@vvoland vvoland commented Aug 11, 2022

Now that we have the option to build multi-platform images, it would be nice to be able to push them.
Before this, only the default platform was pushed, so the image in the registry ended up being single-platform again.

Signed-off-by: Paweł Gronowski pawel.gronowski@docker.com

- What I did

- How I did it

- How to verify it

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@vvoland vvoland requested review from ndeloof, rumpl and thaJeztah August 11, 2022 13:10

@tonistiigi tonistiigi left a comment


The behavior in here seems to be that images are modified into new images before pushing unless all manifests are present. Is this really the behavior we want?

That could have quite unexpected results for the user if some random set of platforms happened to be present because of previous commands, or if, for example, a pull of the image is running at the same time from another request.

To me that kind of defeats the containerd storage model as well, where everything is immutable and digests are persistent. But I see the conversion package is actually part of the containerd repo, so I guess somebody thought there are useful cases for it.

The alternative I would suggest is that the part of the data that is not available locally is cross-repo mounted. The source location is saved to the blob labels on pull anyway, to make regular cross-repo push work. If we hit the case where a cross-repo mount can't work (pushing to a different registry), then it would pull the missing data from one registry and push it to the other. If that case is hit, we could print a warning to the user as well, suggesting that if they only wished to push a single platform they should do docker push --platform or docker push image:tag@sha-of-the-specific-platform.
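
(Not code from this PR: a minimal sketch of reading the distribution-source labels that containerd sets on pull, which is what would make the cross-repo mount possible. The function name and the label-prefix constant are assumptions based on containerd's labeling convention.)

```go
// Sketch: list the registries/repositories a blob is known to come from,
// based on the "containerd.io/distribution.source.<host>" labels that
// containerd attaches to content on pull.
package sketch

import (
	"context"
	"strings"

	"github.com/containerd/containerd/content"
	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

const distributionSourcePrefix = "containerd.io/distribution.source."

func blobSources(ctx context.Context, store content.Store, desc ocispec.Descriptor) (map[string]string, error) {
	info, err := store.Info(ctx, desc.Digest)
	if err != nil {
		return nil, err
	}
	sources := make(map[string]string)
	for k, v := range info.Labels {
		if strings.HasPrefix(k, distributionSourcePrefix) {
			// e.g. sources["docker.io"] = "library/busybox"
			sources[strings.TrimPrefix(k, distributionSourcePrefix)] = v
		}
	}
	return sources, nil
}
```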

This is also how I imagine docker run would work but I haven't checked the current state. If I do docker pull alpine on arm64 it pulls only one arch by default. But then if I do docker run --platform=amd64 alpine then it isn't an error but will just pull the missing part of the image before running it (same as if I didn't have the alpine image at all).


store := i.client.ContentStore()

children, err := containerdimages.Children(ctx, store, index)

Image indexes can be nested, so more than a single level should be accounted for here.
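
(Not part of the PR: a minimal sketch of handling that nesting by walking the tree with containerd's images.Walk instead of a single Children call. The helper name allManifests is made up; the media-type constants come from containerd/images and the OCI image-spec.)

```go
// Sketch: collect every manifest reachable from a descriptor, recursing
// through nested indexes rather than stopping at the direct children.
package sketch

import (
	"context"

	"github.com/containerd/containerd/content"
	containerdimages "github.com/containerd/containerd/images"
	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

func allManifests(ctx context.Context, store content.Store, root ocispec.Descriptor) ([]ocispec.Descriptor, error) {
	var manifests []ocispec.Descriptor
	handler := containerdimages.HandlerFunc(func(ctx context.Context, desc ocispec.Descriptor) ([]ocispec.Descriptor, error) {
		switch desc.MediaType {
		case ocispec.MediaTypeImageManifest, containerdimages.MediaTypeDockerSchema2Manifest:
			manifests = append(manifests, desc)
		}
		// Returning the children makes Walk descend into nested indexes.
		return containerdimages.Children(ctx, store, desc)
	})
	if err := containerdimages.Walk(ctx, handler, root); err != nil {
		return nil, err
	}
	return manifests, nil
}
```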

Collaborator Author


Thanks, good to know! Do you remember any specific images that have this kind of nesting so I could test with them?


Comment thread daemon/containerd/image.go Outdated

// Create a new manifest list which contains only the manifests we have in store.
srcRef := img.Name
targetRef := srcRef + "-tmp"

What happens if multiple requests end up in here at around the same time? Does this need to be named?

Could we avoid leaking an image at all? Containerd probably just needs a digest to the root manifest for the source, not an image around it.

Collaborator Author


Yeah, if multiple requests end up in here then funny things can happen. The Name field in Image is required, so we need it.
For now I will add a UUID here, just like I did in the other case.

We could avoid the whole convert and just extract the manifest for a particular platform, but only if there is exactly one matching manifest.
Otherwise we need to construct a new manifest list with the manifests that are present.
Of course we could also create the manifest list by hand, but for now I think we don't want to implement this ourselves, especially since I'm not sure whether we'll want this behaviour in the future.
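
(A minimal sketch, not the PR's actual code: deriving a uniquely named temporary image with containerd's converter package so concurrent pushes don't collide. Here the filter is a platform matcher via converter.WithPlatform; the PR instead keeps whatever manifests are present in the store. The helper name, the use of github.com/google/uuid, and the cleanup callback are assumptions for illustration.)

```go
// Sketch: convert srcRef into a temporary image limited to the platforms
// matched by the given matcher, under a unique name, and return a cleanup
// function that removes the temporary image record afterwards.
package sketch

import (
	"context"
	"fmt"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/images/converter"
	"github.com/containerd/containerd/platforms"
	"github.com/google/uuid"
)

func convertForPush(ctx context.Context, client *containerd.Client, srcRef string, matcher platforms.MatchComparer) (string, func(), error) {
	// Unique name so concurrent pushes of the same ref don't race on it.
	tmpRef := fmt.Sprintf("%s-tmp-%s", srcRef, uuid.NewString())

	img, err := converter.Convert(ctx, client, tmpRef, srcRef,
		converter.WithPlatform(matcher))
	if err != nil {
		return "", nil, err
	}
	cleanup := func() {
		_ = client.ImageService().Delete(ctx, img.Name)
	}
	return tmpRef, cleanup, nil
}
```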

@vvoland
Collaborator Author

vvoland commented Aug 12, 2022

The conversion doesn't affect the original images - it creates a new one, so it's just the image on the registry side that will have a different digest.
I will need to take a closer look at the cross-repo mounting, but from my tests, attempting to perform the push with containerd when blobs for other platforms are missing ends with an error in some cases (I tested both the Hub and a localhost registry).
So for now the conversion is an easy way to make the standard push use-case work for a standard Docker user, without forcing them to download all the manifests when they try to push their tag.

I agree that we need to rethink how to handle this overall. I think that in the end we will get rid of the conversion, but we still need to think about how to fit multi-platform images into the current Docker UX.

@vvoland
Collaborator Author

vvoland commented Aug 12, 2022

Current behaviour also has funny side effects, consider this:

docker pull busybox:latest            # pulls busybox for current platform (linux/arm64/v8), but image targets a whole manifest list
docker tag busybox:latest ext/busybox-new:latest
docker push ext/busybox-new:latest    # ext/busybox-new:latest pushes only manifest for linux/arm64/v8 platform - as we don't have any other downloaded

echo 'FROM busybox:latest' | docker buildx build - -t test --platform linux/arm64/v8 --platform linux/amd64
# ^ a new image is built, but additionally busybox:latest for linux/amd64 is fetched into the store

docker push ext/busybox-new:latest    # ext/busybox-new:latest is now a manifest list with a manifest for linux/arm64/v8 and a manifest for linux/amd64

So even though we didn't directly change ext/busybox-new:latest at all, the image now has additional platforms which can now be pushed!

@vvoland
Collaborator Author

vvoland commented Aug 12, 2022

> This is also how I imagine docker run would work but I haven't checked the current state. If I do docker pull alpine on arm64 it pulls only one arch by default. But then if I do docker run --platform=amd64 alpine then it isn't an error but will just pull the missing part of the image before running it (same as if I didn't have the alpine image at all).

Yeah, that's how it works currently.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
@vvoland vvoland force-pushed the containerd-push-multi-platform branch from 93d6593 to 9df4272 August 12, 2022 11:56
@thaJeztah
Collaborator

> The behavior in here seems to be that images are modified into new images before pushing unless all manifests are present. Is this really the behavior we want?

Hmm... so, it's actually a good one. I think with the "non-containerd" snapshotter we may already have a similar issue (although in that case it's perhaps more logical).

Say, I have a multi-arch image;

docker buildx create --use --driver=docker-container
nostalgic_raman

docker buildx build --platform=linux/amd64,linux/arm64 --push -t thajeztah/multifoo:latest -<<'EOF'
FROM alpine
RUN echo hello > /foo.txt
EOF
=> => exporting layers                                                                                                                                                                                0.0s
 => => exporting manifest sha256:5e2d6fef09495657aebe71fd6ef55e47d1598eaefb0d07f9427ed7e6d70a19cc                                                                                                      0.0s
 => => exporting config sha256:cb269d7c0c1ca22fb5a70342c3ed2196c57a825f94b3f0e5ce3aa8c55baee829                                                                                                        0.0s
 => => exporting manifest sha256:e2500bf6b43600fdd62e72f945ed7443844e380bbcaac2f42d10930ca50bbd25                                                                                                      0.0s
 => => exporting config sha256:60fecf06714a8249d962e318e19bfaf2e0454866bba9b0f1d9702bcc2b996df0                                                                                                        0.0s
 => => exporting manifest list sha256:94c333fa15282d57c802f5ce4d37208674d367dbe424a8b7b0ff53847c7a8231                                                                                                 0.0s
 => => pushing layers                                                                                                                                                                                  0.6s

Then pull the image;

docker pull thajeztah/multifoo:latest
latest: Pulling from thajeztah/multifoo
213ec9aee27d: Pull complete
c7ec7661263e: Pull complete
Digest: sha256:94c333fa15282d57c802f5ce4d37208674d367dbe424a8b7b0ff53847c7a8231
Status: Downloaded newer image for thajeztah/multifoo:latest
docker.io/thajeztah/multifoo:latest

Notice that docker always shows the digest that was resolved from the name; in this case that's the digest of the manifest list, which is also shown in docker image ls as digest;

docker image ls --digests
REPOSITORY                                                          TAG                DIGEST                                                                    IMAGE ID       CREATED         SIZE
thajeztah/multifoo                                                  latest             sha256:94c333fa15282d57c802f5ce4d37208674d367dbe424a8b7b0ff53847c7a8231   cb269d7c0c1c   6 minutes ago   5.54MB

But (pre containerd integration) at this moment, I only have 1 image from the manifest list, and the ID of the image refers to the uncompressed image;

docker image inspect thajeztah/multifoo
[
    {
        "Id": "sha256:cb269d7c0c1ca22fb5a70342c3ed2196c57a825f94b3f0e5ce3aa8c55baee829",
        "RepoTags": [
            "thajeztah/multifoo:latest"
        ],
        "RepoDigests": [
            "thajeztah/multifoo@sha256:94c333fa15282d57c802f5ce4d37208674d367dbe424a8b7b0ff53847c7a8231"
        ],

But while it shows the digest of the manifest list, that's not used when pushing the same image again;

docker push thajeztah/multifoo:latest
The push refers to repository [docker.io/thajeztah/multifoo]
cd4e02e5bfde: Layer already exists
994393dc58e7: Layer already exists
latest: digest: sha256:09902b434e82972ad529df8baf34cb4dcebe70bf0f78dbf4d2b90895c18636b4 size: 735

When pushing, it converts the multi-arch image into a single-arch image without a manifest list. This of course makes sense if the image was built locally (so no "multi-arch" ever existed), but pushing an image that you pulled (without any changes), and thereby (effectively) deleting images, is not ideal.

Even with the pre-containerd case, we could possibly (if we keep all metadata):

  • check the manifest-list (sha256:94c333fa15282d57c802f5ce4d37208674d367dbe424a8b7b0ff53847c7a8231)
  • if the digest for the linux/amd64 image we have locally still matches the digest in the manifest-list, then either we don't have to push, or we only have to push the manifest-list (I think it would be rejected if the digests referred to in it don't exist; but perhaps that's where the "mounts" come into play; a rough sketch of such a check follows below).

We should definitely have a look at what we want the UX to be like for this, because there are different scenarios possible, and we may not be able to "automatically" decide for the user (so we may need options to override behavior); we likely don't want to "force" having to pull all image variants (including Windows).
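
(Not from this PR: a rough sketch of the check from the first bullet above, against the containerd content store. It reads the stored manifest list and splits its entries into locally present and missing manifests; content.ReadBlob, Store.Info and errdefs.IsNotFound are containerd APIs, while the function name is made up.)

```go
// Sketch: given the descriptor of a pulled manifest list, report which of the
// referenced manifests exist in the local content store and which are missing
// (and would need a cross-repo mount or a pull before the list can be pushed).
package sketch

import (
	"context"
	"encoding/json"

	"github.com/containerd/containerd/content"
	"github.com/containerd/containerd/errdefs"
	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

func splitPresentMissing(ctx context.Context, store content.Store, indexDesc ocispec.Descriptor) (present, missing []ocispec.Descriptor, err error) {
	blob, err := content.ReadBlob(ctx, store, indexDesc)
	if err != nil {
		return nil, nil, err
	}
	var index ocispec.Index
	if err := json.Unmarshal(blob, &index); err != nil {
		return nil, nil, err
	}
	for _, m := range index.Manifests {
		_, infoErr := store.Info(ctx, m.Digest)
		switch {
		case errdefs.IsNotFound(infoErr):
			missing = append(missing, m)
		case infoErr != nil:
			return nil, nil, infoErr
		default:
			present = append(present, m)
		}
	}
	return present, missing, nil
}
```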

@tonistiigi

> The conversion doesn't affect the original images - it creates a new one, so it's just the image on the registry side that will have a different digest.

Yes, but it changes the image in the registry. So if the pushed image is compared with the original one, it is different now.

> Current behaviour also has funny side effects, consider this:

Yeah, I think that is the reason not to convert. If we don't convert, then ext/busybox-new:latest never changes and will always point to the full multi-platform image from Docker Hub. It doesn't matter what platforms were actually pulled locally and what only appears on the registry.

> This is also how I imagine docker run would work but I haven't checked the current state. If I do docker pull alpine on arm64 it pulls only one arch by default. But then if I do docker run --platform=amd64 alpine then it isn't an error but will just pull the missing part of the image before running it (same as if I didn't have the alpine image at all).
> Yeah, that's how it works currently.

To clarify: this is how it should work when alpine in the registry did not change between the pull and run commands. If, before running docker run --platform=amd64, the image in the registry had been replaced, then the user should now have two images: one called alpine that only has the amd64 platform, and another, dangling image that holds the arm64 part of the old image.

> I think with the "non-containerd" snapshotter we already may have a similar issue

Yes, these were the issues containerd was designed to solve. Without it, Docker does not keep the manifest, so we always need to recreate one. We could check based on the root digest we store, but that would mean that if there is any change on the registry, or we are pushing to a different repo, the push behavior would be completely different and would push a new single manifest instead.

What I forgot to mention is that since Docker 1.10 (2015) we have had a strong guarantee that when we do the pull(load)-push(save)-pull(load) cycle, the image ID is persistent and never changes. In containerd the image ID (e.g. in image ls) is the root manifest digest and should have similar guarantees.

@vvoland
Collaborator Author

vvoland commented Aug 16, 2022

> To clarify: this is how it should work when alpine in the registry did not change between the pull and run commands. If, before running docker run --platform=amd64, the image in the registry had been replaced, then the user should now have two images: one called alpine that only has the amd64 platform, and another, dangling image that holds the arm64 part of the old image.

I'm not sure what should be regarded as a dangling image under containerd.
I don't think a dangling image can be represented in the ImageStore - all images must have a unique name. If a newer image is pulled with the same name, the old one is removed. If the content blobs of the old image aren't referenced by anything, they are removed from the content store.
We would have to explicitly create an image that targets the old content with something like busybox@sha256:oldhash and pull the newer image as busybox:latest.
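
(Not from this PR: a minimal sketch of what is described above, i.e. explicitly registering a digested name for the old content in the containerd image store before the tag is repointed. The function name is made up; ImageService().Create is containerd's image-store API.)

```go
// Sketch: keep the old image content reachable under a digested name
// (e.g. "docker.io/library/busybox@sha256:oldhash") so it survives the tag
// being repointed to a newer pull.
package sketch

import (
	"context"
	"fmt"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/images"
)

func preserveAsDangling(ctx context.Context, client *containerd.Client, old images.Image) error {
	danglingName := fmt.Sprintf("%s@%s", old.Name, old.Target.Digest)
	_, err := client.ImageService().Create(ctx, images.Image{
		Name:   danglingName,
		Target: old.Target, // same root descriptor; no content is copied
	})
	return err
}
```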

@thaJeztah
Collaborator

oh, crap; this needs a rebase; tried it locally, and it's not too complicated, but in case it's useful, here's the rebased version I pushed to a temporary branch: https://github.com/rumpl/moby/compare/c8d...thaJeztah:rumpl_containerd_push_multi_platform_TEMP_rebase?expand=1

@vvoland
Collaborator Author

vvoland commented Aug 24, 2022

For now I think we'll go with #54 #74 instead. We can fall back to the convert in case it fails, though.

@thaJeztah
Collaborator

did you mean #74?

@vvoland
Collaborator Author

vvoland commented Aug 24, 2022

Yes! Sorry, must have copied the wrong tab 😆

@rumpl
Owner

rumpl commented Aug 26, 2022

Closing this one, we have #74

@rumpl rumpl closed this Aug 26, 2022
rumpl pushed a commit that referenced this pull request Dec 30, 2024
cross-compiling for arm/v5 was failing;

    #56 84.12 /usr/bin/arm-linux-gnueabi-clang -marm -o $WORK/b001/exe/a.out -Wl,--export-dynamic-symbol=_cgo_panic -Wl,--export-dynamic-symbol=_cgo_topofstack -Wl,--export-dynamic-symbol=crosscall2 -Qunused-arguments -Wl,--compress-debug-sections=zlib /tmp/go-link-759578347/go.o /tmp/go-link-759578347/000000.o /tmp/go-link-759578347/000001.o /tmp/go-link-759578347/000002.o /tmp/go-link-759578347/000003.o /tmp/go-link-759578347/000004.o /tmp/go-link-759578347/000005.o /tmp/go-link-759578347/000006.o /tmp/go-link-759578347/000007.o /tmp/go-link-759578347/000008.o /tmp/go-link-759578347/000009.o /tmp/go-link-759578347/000010.o /tmp/go-link-759578347/000011.o /tmp/go-link-759578347/000012.o /tmp/go-link-759578347/000013.o /tmp/go-link-759578347/000014.o /tmp/go-link-759578347/000015.o /tmp/go-link-759578347/000016.o /tmp/go-link-759578347/000017.o /tmp/go-link-759578347/000018.o -O2 -g -O2 -g -O2 -g -lpthread -O2 -g -no-pie -static
    #56 84.12 ld.lld: error: undefined symbol: __atomic_load_4
    #56 84.12 >>> referenced by gcc_libinit.c
    #56 84.12 >>>               /tmp/go-link-759578347/000009.o:(_cgo_wait_runtime_init_done)
    #56 84.12 >>> referenced by gcc_libinit.c
    #56 84.12 >>>               /tmp/go-link-759578347/000009.o:(_cgo_wait_runtime_init_done)
    #56 84.12 >>> referenced by gcc_libinit.c
    #56 84.12 >>>               /tmp/go-link-759578347/000009.o:(_cgo_wait_runtime_init_done)
    #56 84.12 >>> referenced 2 more times
    #56 84.12
    #56 84.12 ld.lld: error: undefined symbol: __atomic_store_4
    #56 84.12 >>> referenced by gcc_libinit.c
    #56 84.12 >>>               /tmp/go-link-759578347/000009.o:(_cgo_wait_runtime_init_done)
    #56 84.12 >>> referenced by gcc_libinit.c
    #56 84.12 >>>               /tmp/go-link-759578347/000009.o:(x_cgo_notify_runtime_init_done)
    #56 84.12 >>> referenced by gcc_libinit.c
    #56 84.12 >>>               /tmp/go-link-759578347/000009.o:(x_cgo_set_context_function)
    #56 84.12 clang: error: linker command failed with exit code 1 (use -v to see invocation)

From discussion on GitHub;
moby#46982 (comment)

The arm/v5 build failure looks to be due to libatomic not being included
in the link. For reasons probably buried in mailing list archives,
[gcc](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81358) and clang don't
bother to implicitly auto-link libatomic. This is not a big deal on many
modern platforms with atomic intrinsics as the compiler generates inline
instruction sequences, avoiding any libcalls into libatomic. ARMv5 is not
one of those platforms: all atomic operations require a libcall.

In theory, adding `CGO_LDFLAGS=-latomic` should fix arm/v5 builds.

While it could be argued that cgo should automatically link against
libatomic in the same way that it automatically links against libpthread,
the Go maintainers would have a valid counter-argument that it should be
the C toolchain's responsibility to link against libatomic automatically,
just like it does with libgcc or compiler-rt.

Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Cory Snider <csnider@mirantis.com>
rumpl pushed a commit that referenced this pull request Jan 2, 2025
(cherry picked from commit 4cd5c2b)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Projects

Status: Done


4 participants