support for rootless containers with an external rootfs#871

Closed
giuseppe wants to merge 16 commits into containers:master from giuseppe:rootfs-rootless

@giuseppe
Member

@giuseppe giuseppe commented Jun 1, 2018

This is a hacky proof-of-concept attempt at having rootless containers in podman. A proper implementation will take much more work than this PR (the biggest issue is that containers/storage must handle rootless access and management for the images; most likely we will need some changes in containers/image as well). In any case, we would only be able to use the vfs storage backend until overlayfs can be used by an unprivileged user.

Given these limitations, the current implementation expects an exploded container rootfs. My tests were limited to running a container and checking that it appears in podman ps.

The last patch must land in containers/storage; I've added it here so that the PR can be used.

$ bin/podman run -v /tmp:/tmp --rootfs /path/to/an/exploded/container/rootfs sh -c "echo it works > /tmp/out"; cat /tmp/out
it works
$ bin/podman run -v /tmp:/tmp --rootfs /path/to/an/exploded/container/rootfs cat /proc/self/uid_map
         0       1000          1
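The uid_map output above reads as: container UID 0 maps to host UID 1000, and the mapped range is only one ID long. A minimal Go sketch of how such a mapping translates IDs follows; idMapEntry, parseIDMap and toHost are hypothetical names for illustration, not podman code. Any ID outside the single mapped range (for example ID 5) has no host counterpart.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// idMapEntry is one line of /proc/<pid>/uid_map (or gid_map):
// first ID inside the namespace, first ID outside it, length of the range.
// Hypothetical helper type for illustration.
type idMapEntry struct {
	inside, outside, length uint32
}

// parseIDMap parses the whitespace-separated uid_map/gid_map format.
func parseIDMap(data string) ([]idMapEntry, error) {
	var entries []idMapEntry
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue // tolerate blank lines
		}
		if len(fields) != 3 {
			return nil, fmt.Errorf("malformed map line %q", sc.Text())
		}
		var v [3]uint32
		for i, f := range fields {
			n, err := strconv.ParseUint(f, 10, 32)
			if err != nil {
				return nil, err
			}
			v[i] = uint32(n)
		}
		entries = append(entries, idMapEntry{inside: v[0], outside: v[1], length: v[2]})
	}
	return entries, sc.Err()
}

// toHost translates an ID inside the namespace to the host ID; ok is false
// when no range covers the ID.
func toHost(entries []idMapEntry, id uint32) (hostID uint32, ok bool) {
	for _, e := range entries {
		if id >= e.inside && id-e.inside < e.length {
			return e.outside + (id - e.inside), true
		}
	}
	return 0, false
}

func main() {
	m, err := parseIDMap("         0       1000          1\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(toHost(m, 0)) // 1000 true
	fmt.Println(toHost(m, 5)) // 0 false
}
```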

@giuseppe giuseppe force-pushed the rootfs-rootless branch 2 times, most recently from 3db4525 to c75e705, June 1, 2018 12:17
@rhatdan
Member

rhatdan commented Jun 1, 2018

@nalind @mtrmac PTAL

Comment thread on libpod/container_internal.go (outdated)
Member

For one of these (rootFsSize() and rwSize()) we should probably compute the size of the user-specified rootfs so we can report something meaningful via Inspect.

Member Author

added implementation for rwSize

Comment thread on libpod/container_internal.go (outdated)
Member

We should probably make Mounted always true if config.Rootfs is set. It's used as a check that the container's rootfs is accessible, and with an exploded rootfs that's always the case.

Member Author

thanks, this check is already superfluous and I am going to drop it as c.State.Mounted should be set at this point.

Comment thread on pkg/spec/createconfig.go (outdated)
Contributor

It would be nice if these guards were based on capabilities instead of UIDs. In this case, I'm guessing that's CAP_NET_BIND_SERVICE and/or CAP_NET_ADMIN in the host network namespace.

Member

This can't be our leading if clause: it will always match as long as we are not rootless, but we don't want to include a network namespace if we are using --net=host, --net=container, or --net=none.
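The capability-based guard suggested in this thread could start from the process's effective capability mask. A minimal Linux-only sketch follows; hasCap and effectiveCaps are hypothetical helpers, and the capability numbers come from linux/capability.h. Caveat: inside a user namespace these bits only grant power over namespaces owned by that user namespace, so a set CAP_NET_ADMIN bit alone does not prove the process holds it in the host network namespace.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Capability numbers from linux/capability.h.
const (
	capNetBindService = 10
	capNetAdmin       = 12
)

// hasCap reports whether capability number capNum is set in a hex mask
// like the CapEff field of /proc/self/status.
func hasCap(maskHex string, capNum uint) (bool, error) {
	mask, err := strconv.ParseUint(strings.TrimSpace(maskHex), 16, 64)
	if err != nil {
		return false, err
	}
	return mask&(1<<capNum) != 0, nil
}

// effectiveCaps returns the raw CapEff value of the current process (Linux only).
func effectiveCaps() (string, error) {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		return "", err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "CapEff:") {
			return strings.TrimSpace(strings.TrimPrefix(line, "CapEff:")), nil
		}
	}
	if err := sc.Err(); err != nil {
		return "", err
	}
	return "", errors.New("CapEff not found in /proc/self/status")
}

func main() {
	caps, err := effectiveCaps()
	if err != nil {
		fmt.Println("cannot read capabilities:", err)
		return
	}
	// These bits say nothing about which network namespace they apply to.
	netAdmin, _ := hasCap(caps, capNetAdmin)
	bind, _ := hasCap(caps, capNetBindService)
	fmt.Println("CAP_NET_ADMIN:", netAdmin, "CAP_NET_BIND_SERVICE:", bind)
}
```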

Comment thread on libpod/container_internal.go (outdated)
Contributor

You can just set it to nil.

Comment thread on libpod/container_internal.go (outdated)
Contributor

Is this a workaround for hooks that lack sufficient permission checks? I'd rather patch the hooks, or wrap them with something that does permission checks, or allow users to set the hook directory (which we currently restrict to internal testing).

Member Author

the hooks we have now assume root access; that is why I've temporarily disabled them. We can revisit this once we have hooks that can run as an unprivileged user.

Comment thread on libpod/oci.go (outdated)
Contributor

Is this a runc-ism?

Member Author

yes, runc uses it to find the path where it writes the status of rootless containers.

You can manually override with --root. XDG_RUNTIME_DIR is just used as the default (instead of /run which is the default when running as root).
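A rough sketch of the default described here: /run/runc as root, $XDG_RUNTIME_DIR/runc when rootless. It approximates runc's behaviour in simplified form (runc's real check has extra conditions), and defaultRuncRoot is a hypothetical name.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// defaultRuncRoot approximates how runc picks its state directory:
// /run/runc as root, $XDG_RUNTIME_DIR/runc when not root and the variable
// is set. Simplified, hypothetical helper; runc's real logic has more cases.
func defaultRuncRoot(euid int, xdgRuntimeDir string) string {
	if euid != 0 && xdgRuntimeDir != "" {
		return filepath.Join(xdgRuntimeDir, "runc")
	}
	return "/run/runc"
}

func main() {
	root := defaultRuncRoot(os.Geteuid(), os.Getenv("XDG_RUNTIME_DIR"))
	// Passing --root explicitly overrides whatever default runc would choose.
	fmt.Println("runc --root", root, "list")
}
```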

Comment thread on libpod/runtime.go (outdated)
Contributor

This seems unrelated to rootless-ness?

Member Author

it is related, since at the moment I don't read the system configuration files (they use paths that are not writable by a non-root user). The default configuration file already has this path, probably it was just not updated here.

Contributor

The default configuration file already has this path, probably it was just not updated here.

Sounds like this commit could be pulled out into a separate PR and landed immediately then. Is there a reason to hold it back with the rootless stuff?

Member Author

moved to: #872

Comment thread on cmd/podman/libpodruntime/runtime.go (outdated)
Contributor

This seems unrelated to rootless-ness?

Member Author

true, I've added it for local testing and I thought it could be in general useful to expose this setting

Contributor

@wking wking Jun 1, 2018

... I thought it could be in general useful to expose this setting

So make it a separate PR? It will probably settle faster that way without the rest of the rootless stuff.

@giuseppe giuseppe force-pushed the rootfs-rootless branch 4 times, most recently from 593dbd5 to cf73656, June 1, 2018 17:37
@giuseppe
Member Author

giuseppe commented Jun 1, 2018

@nalind could you have a look at the patch "[VENDOR-FIX]: containers/storage do not chown if not root". Does it look fine for containers/storage?

@rhatdan
Member

rhatdan commented Jun 1, 2018

Rebase please

Contributor

@mtrmac mtrmac left a comment

I by no means understand the codebase, so just a few localized comments. I’m sure several of them are very premature.

Comment thread on cmd/podman/create.go (outdated)
Contributor

This is a bit ugly. It would be nice for c.StringSlice("[ug]idmap") to be parsed exactly once, and the rest should deal with well-typed values (including the initialization of mappings here) instead of strings.

Also, separate the implementation logic / defaulting from the CLI integration, e.g. to make testing possible.

Member Author

I've refactored it so that it is done only in ParseIDMapping
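The "parse once into well-typed values" suggestion can be sketched like this, assuming the --uidmap/--gidmap value format container_id:host_id:amount. IDMap and parseIDMaps are hypothetical names; this is not the signature of podman's ParseIDMapping.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// IDMap holds the three numbers of one --uidmap/--gidmap value:
// container ID, host ID, and size of the range. Hypothetical type.
type IDMap struct {
	ContainerID, HostID, Size int
}

// parseIDMaps converts the raw CLI strings into typed values exactly once,
// so later code never handles "a:b:c" strings directly.
func parseIDMaps(raw []string) ([]IDMap, error) {
	maps := make([]IDMap, 0, len(raw))
	for _, r := range raw {
		parts := strings.Split(r, ":")
		if len(parts) != 3 {
			return nil, fmt.Errorf("invalid mapping %q: want container:host:size", r)
		}
		var v [3]int
		for i, p := range parts {
			n, err := strconv.Atoi(p)
			if err != nil || n < 0 {
				return nil, fmt.Errorf("invalid mapping %q: %q is not a non-negative integer", r, p)
			}
			v[i] = n
		}
		if v[2] == 0 {
			return nil, fmt.Errorf("invalid mapping %q: size must be positive", r)
		}
		maps = append(maps, IDMap{ContainerID: v[0], HostID: v[1], Size: v[2]})
	}
	return maps, nil
}

func main() {
	maps, err := parseIDMaps([]string{"0:1000:1"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", maps) // [{ContainerID:0 HostID:1000 Size:1}]
}
```

Validation errors surface at the CLI boundary, and the rest of the code, including any defaulting, can then be unit-tested against []IDMap values without going through the CLI.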

Comment thread on cmd/podman/create.go (outdated)
Contributor

(From a quick search this will probably mess up podman ps formatting.)

Member

Agree. My preference is probably to manually handle the no-image case and pop up a <none> or similar to indicate it's not using an image

Comment thread on cmd/podman/create.go (outdated)
Contributor

This works, but the interaction between data and c.Args() will be very non-obvious later, without the context of this PR. Could the inputCommand determination be moved to the caller’s if rootfs == "" condition, or maybe the if rootfs == "" condition into this function?

Member Author

I am still unsure about what the CLI should look like. The image argument doesn't make any sense when rootfs is specified (as in the current PR), but I am not yet sure this is the nicest way to handle it. @rhatdan @mheon @baude @wking what do you think?

Member

What if the last field was an image or a path? The path would then be the rootfs.

Use path for rootfs and skip all image handling?
podman run -ti ~/mycontainer /bin/sh

versus

podman run -ti fedora /bin/sh

But why can't we get the second example to work with containers/storage? Just have podman realize it is not running with CAP_SYS_ADMIN, so that it sets up local storage in ~/containers/storage and pulls the images into a vfs layer (hopefully an overlayfs at some point, if UserNS gets support).

Member Author

so do we want --rootfs to not accept an argument but change the behaviour of the first argument?

podman run -ti --rootfs -v /foo:/bar --net host --other-args [...] /path/to/rootfs echo foo

vs what it is done now:

podman run -ti -v /foo:/bar --net host --other-args [...] --rootfs /path/to/rootfs echo foo

Thinking about it now, the first makes more sense to me, as --rootfs can simply mean "treat the image argument as a rootfs".

Member Author

unprivileged overlayfs is still WIP. FUSE, on the other hand, is going to be supported in a userNS soon.

I have a very hacky implementation of overlay in FUSE, but I got overlayfs to work with buildah in a userNS. I'll do some cleanup and publish it somewhere later this week. containers/storage will need to learn how to mount an overlayfs using FUSE instead of "mount -t overlay".

Member

I was thinking more that you would not need --rootfs at all. If the "image" field is a path then it implies --rootfs.

Member Author

that is a good idea, although don't we still need --rootfs when the directory is a relative path? Or is it mandatory to use an absolute path?

Contributor

Personally I’d prefer explicit options for deciding the semantics, instead of arguments that can be parsed multiple ways and a heuristic guess at what the user meant.

Member

Agree with @mtrmac here. Parsing image names is an absolute mess because of the ambiguity of shortname vs fullname amongst other things. Let's not make the same mistake here.

Comment thread on cmd/podman/create.go (outdated)
Contributor

fs should be capitalized (throughout), for consistency with libpod.WithRootFSFromImage.

Comment thread on cmd/podman/create.go (outdated)
Contributor

(3 copies of identical DefaultStoreOptions+GetRootlessStorageOpts — 3 copies is my rough threshold for almost certainly benefiting from a helper function.)

Member Author

I've added a helper function

Comment thread on libpod/runtime.go (outdated)
Contributor

("libpod" or "podman", then? Or are both paths pre-existing by now?)

Member Author

I've seen both used but "libpod" seems to be more correct (as it is already used for TmpDir), going to change it.

Comment thread on libpod/storage.go (outdated)
Contributor

This used to be outside of the if imageName == "" … condition; was the move intentional?

Member Author

no, I moved it back out of the if block. Thanks for noticing it!

Contributor

It seems to me that this decision should happen at some higher layer (which will understand that the change has not actually taken place) instead of silently doing nothing and subverting expectations. Up to @nalind , though.

Member Author

yes I agree, I've added this patch only to enable this use case, but it is definitely not how this should ultimately be handled in containers/storage

Contributor

This looks very incorrect; chdir("/") is an essential part of the chrooting process, and by no means a replacement for it!

Member Author

yes, sorry, I messed up the conditions. I think the Chroot errors should be ignored, in which case only the Chdir is performed (as the code comments suggest it should be).

Contributor

All of this also seems like a decision that should be visible at the caller level, or possibly a few levels higher.

@giuseppe giuseppe force-pushed the rootfs-rootless branch 6 times, most recently from 4e70919 to f03dc93, June 4, 2018 09:50
@giuseppe
Member Author

giuseppe commented Jun 4, 2018

I've simplified the changes in containers/storage, adding a dummy driver that can be used with --rootfs.

@mtrmac
Contributor

mtrmac commented Jun 4, 2018

I've simplified the changes in containers/storage, adding a dummy driver that can be used with --rootfs.

That’s very nice.

Comment thread on cmd/podman/common.go (outdated)
Member

Reword: "but the exploded root filesystem of the container"

Comment thread on cmd/podman/create.go (outdated)
Member

We should only do this if Rootfs is actually set. The WithRootFS() option should validate its input to make sure an empty-string rootfs is not passed, and the given rootfs directory actually exists

Contributor

I’d conceptually prefer that as well—but from a quick check quite a few of the calls above silently ignore empty strings, from the critical WithRootFSFromImage to the more trivial WithConmonPidFile or WithShmDir.

(Yeah, Golang not having proper nullable types sucks.)

Member

We need to fix WithRootFSFromImage() as well at a minimum (make it require non-empty input and only set it if we're using an image rootfs).

The rest should also be hit by a validation pass at some point.

Comment thread on cmd/podman/create.go (outdated)
Member

We should add a final if block at the end here to make sure command is not empty, if there isn't already one.

@rhatdan
Member

rhatdan commented Jun 4, 2018

@mheon @mtrmac I understand your concern, but you guys are making what I believe is the classic engineering mistake and making the UI more complicated so your job is easier.
If we enforced a full path for the image, that would be better than adding a --rootfs flag.

But if the code did a simple check to see if the path exists on disk before doing any image processing, I don't see that as being all that complicated.

Yes there will be corner cases when I could have a fedora directory in the current working directory, but then just make the user specify full path.

Comment thread on cmd/podman/create.go (outdated)
Member

At some point we should consolidate all the image based changes into a single function and call it at the beginning, to avoid all of these if data != nil blocks

Member

Does not necessarily need to happen in this PR, as this entire section of code really needs to be refactored - we have extensive duplication in three places (create, run, varlink create) that needs to be folded into here

@mtrmac
Contributor

mtrmac commented Jun 4, 2018

but you guys are making what I believe is the classic engineering mistake and making the UI more complicated so your job is easier.

It’s not obvious to me that the --rootfs option is less code, and that was not really a concern to me at all.

The way I think about this, UI complexity is not as much a matter of number of options, or even individual characters to type, as a matter of number of concepts and conditions the user needs to keep track of.

In one sense, (--rootfs explicitA vs. explicitB) and (guessingA vs. guessingB) are conceptually both two options, so a “similar cognitive load”. But as far as what the user must understand, the --rootfs one is much easier to Google for, and clearly a separate alternative; it’s safe for the user to learn about explicitB without knowing about --rootfs explicitA at all. With guessingA vs. guessingB, the user must know both, and never forget (when, in practice, many will be copy&pasting commands without being aware of the two formats at all).

But if the code did a simple check to see if the path exists on disk before doing any image processing, I don't see that as being all that complicated.

Yes there will be corner cases when I could have a fedora directory in the current working directory, but then just make the user specify full path.

That’s not making the corner case situation any better: the user writes a script that works just fine for months with one of the semantics, maybe blissfully unaware of the other feature at all, and suddenly the script stops working just because someone (possibly a different person) created an entirely unrelated object on the same machine.

Comment thread on libpod/container.go (outdated)
Member

Move this up with RootfsImageName and RootfsImageID

@cyphar

cyphar commented Jun 15, 2018

@rhatdan Yes it was broken very recently with some rootless changes (see opencontainers/runc#1819).

If you can get umoci into containers/storage I would be all for it.

Awesome, I will shoot an email to @nalind and add you to cc to make sure we agree how the integration should work.

@mheon
Member

mheon commented Jun 15, 2018

@giuseppe I'm a little iffy on merging this without a bit more documentation on --rootfs: it automatically configures a separate store and database if you're not UID 0, so you can't see containers created with it with podman ps from root. This deserves some sort of change to the manpages, I think?

Otherwise, LGTM.
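One way to sketch picking a separate, per-user store when not running as UID 0. storageRoot is a hypothetical helper; the rootless location assumes $XDG_DATA_HOME falling back to ~/.local/share, which matches the storage path that appears in an error message later in this thread, and /var/lib/containers/storage is assumed for UID 0.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// storageRoot picks a system-wide store for UID 0 and a per-user store
// otherwise. Hypothetical helper; the paths are assumptions (see lead-in).
func storageRoot(euid int, xdgDataHome, home string) string {
	if euid == 0 {
		return "/var/lib/containers/storage"
	}
	if xdgDataHome == "" {
		xdgDataHome = filepath.Join(home, ".local", "share")
	}
	return filepath.Join(xdgDataHome, "containers", "storage")
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		panic(err)
	}
	// Root and rootless users end up with different stores, which is why
	// containers created by one are not listed by podman ps for the other.
	fmt.Println(storageRoot(os.Geteuid(), os.Getenv("XDG_DATA_HOME"), home))
}
```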

@giuseppe
Member Author

@mheon this PR is needed mostly in preparation for #936. If you are fine with it, I can add more documentation about rootless containers there, as rootless containers are not really usable yet without the changes in containers/storage for skipping chowns. The alternative solution of using a userNS, which is what really enables this use case, is implemented in the other PR.

@mheon
Member

mheon commented Jun 15, 2018

@giuseppe Alright. We still need manpages + bash completions for --rootfs in this PR, though.

giuseppe added 16 commits June 15, 2018 16:11
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The default /dev/pts has the option gid=5 that might not be mapped in
the rootless case.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
so that the user has rw access to it.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe
Member Author

@mheon sure, I've added --rootfs to the man page and the bash autocompletion

@mheon
Member

mheon commented Jun 15, 2018

bot, retest this please

@mheon
Member

mheon commented Jun 15, 2018

Tests are green, code LGTM.

@mheon
Member

mheon commented Jun 15, 2018

@rh-atomic-bot r+

@rh-atomic-bot
Collaborator

📌 Commit 4932a89 has been approved by mheon

@rh-atomic-bot
Collaborator

⚡ Test exempted: pull fully rebased and already tested.

@mtrmac
Contributor

mtrmac commented Jun 15, 2018

One other unrelated point is that I think the OCI code in containers/image should just use umoci's OCI parsing and handling code (but I'll talk to @mtrmac about that separately).

(Sounds good: containers/image/oci, and the WIP work on OCI multi-arch, handle only a small subset of the possible OCI layouts, to keep the code manageable at the possible cost of interoperability. If we can’t have a simple OCI format, standardizing on a single implementation of the lookup/multi-image/… semantics is the next best way to get widespread interoperability.)

@pauldotknopf

Can someone enlighten me as to what an "exploded container rootfs" is? Assuming I have run debootstrap and have a simple rootfs, what am I to do?

➜ podman run --rootfs ./rootfs /bin/bash
Error: rootfs (/home/pknopf/.local/share/containers/storage/vfs-containers/ca37fcd22cedf112c349183200e28040d93b7d299cded3a4d9d2f68129885583/userdata/rootfs) does not exist: OCI runtime error
➜ ls rootfs
bin   dev  home  lib64  mnt  proc  run   srv  tmp  var
boot  etc  lib   media  opt  root  sbin  sys  usr

@pauldotknopf

Digging into the generated config.json, it looks like the relative path ./rootfs isn't expanded. Using an absolute path with --rootfs worked. It might be worth auto-expanding relative paths in the future.

@rhatdan
Member

rhatdan commented Apr 20, 2020

Please open a PR to do this, or at least open a separate issue.

Labels

locked - please file new issue/PR

9 participants