support for rootless containers with an external rootfs#871
Conversation
3db4525 to
c75e705
Compare
There was a problem hiding this comment.
For one of these (rootFsSize() and rwSize()) we should probably compute the size of the user-specified rootfs so we can report something meaningful via Inspect.
There was a problem hiding this comment.
added implementation for rwSize
There was a problem hiding this comment.
We should probably make Mounted always true if config.Rootfs is set. It's used as a check that the container's rootfs is accessible, and with an exploded rootfs that's always the case.
There was a problem hiding this comment.
thanks, this check is already superfluous and I am going to drop it as c.State.Mounted should be set at this point.
There was a problem hiding this comment.
It would be nice if these guards were based on capabilities instead of UIDs. In this case, I'm guessing that's CAP_NET_BIND_SERVICE and/or CAP_NET_ADMIN in the host network namespace.
There was a problem hiding this comment.
This can't be our leading if clause - it will match always as long as we are not rootless, when we don't want to include a network ns if we are --net=host, --net=container, --net=none
There was a problem hiding this comment.
Is this a workaround for hooks that lack sufficient permission checks? I'd rather patch the hooks, or wrap them with something that does permission checks, or allow users to set the hook directory (which we currently restrict to internal testing).
There was a problem hiding this comment.
the hooks we have now assume root access, that is why I've temporarily disabled them. We can revisit this once we have hooks that can run as a non privileged user
There was a problem hiding this comment.
yes, it is used by runc to find the path where to write the status for the rootless containers.
There was a problem hiding this comment.
You can manually override with --root. XDG_RUNTIME_DIR is just used as the default (instead of /run which is the default when running as root).
There was a problem hiding this comment.
This seems unrelated to rootless-ness?
There was a problem hiding this comment.
it is related since at the moment I don't read the system configuration file (as they use paths that are not writeable to a non root user). The default configuration file already has this path, probably it was just not updated here.
There was a problem hiding this comment.
The default configuration file already has this path, probably it was just not updated here.
Sounds like this commit could be pulled out into a separate PR and landed immediately then. Is there a reason to hold it back with the rootless stuff?
There was a problem hiding this comment.
This seems unrelated to rootless-ness?
There was a problem hiding this comment.
true, I've added it for local testing and I thought it could be in general useful to expose this setting
There was a problem hiding this comment.
... I thought it could be in general useful to expose this setting
So make it a separate PR? It will probably settle faster that way without the rest of the rootless stuff.
593dbd5 to
cf73656
Compare
|
@nalind could you have a look at the patch "[VENDOR-FIX]: containers/storage do not chown if not root". Does it look fine for containers/storage? |
|
Rebase please |
mtrmac
left a comment
There was a problem hiding this comment.
I by no means understand the codebase, so just a few localized comments. I’m sure several of them are very premature.
There was a problem hiding this comment.
This is a bit ugly. It would be nice for c.StringSlice("[ug]idmap") to be parsed exactly once, and the rest should deal with well-typed values (including the initialization of mappings here) instead of strings.
Also, separate the implementation logic / defaulting from the CLI integration, e.g. to make testing possible.
There was a problem hiding this comment.
I've refactored it so that it is done only in ParseIDMapping
There was a problem hiding this comment.
(From a quick search this will probably mess up podman ps formatting.)
There was a problem hiding this comment.
Agree. My preference is probably to manually handle the no-image case and pop up a <none> or similar to indicate it's not using an image
There was a problem hiding this comment.
This works, but the interaction between data and c.Args() will be very non-obvious after without the context of this PR. Could the inputCommand determination be moved to the caller’s if rootfs == "" condition, or maybe the if rootfs == "" condition into this function?
There was a problem hiding this comment.
What if the last field was an image or a path? The path would then be the rootfs.
Use path for rootfs and skip all image handling?
podman run -ti ~/mycontainer /bin/sh
versus
podman run -ti fedora /bin/sh
But why can't we get the second example to work with containers/storage, Just have podman realize it is not running with CAP_SYS_ADMIN and so it sets up a local storage ~/containers/storage and pulls the images to a vfs layer (Hopefully at some point an overlayfs, if UserNS gets support).
There was a problem hiding this comment.
so do we want --rootfs to not accept an argument but change the behaviour of the first argument?
podman run -ti --rootfs -v /foo:/bar --net host --other-args [...] /path/to/rootfs echo foo
vs what it is done now:
podman run -ti -v /foo:/bar --net host --other-args [...] --rootfs /path/to/rootfs echo foo
Thinking of it now, the first makes more sense to me. As --rootfs can simply be "treat the image argument as a rootfs`
There was a problem hiding this comment.
unprivileged overlayfs is still WIP. Differently FUSE is going to be soon supported in an userNS.
I've a very hacky implementation of overlay in FUSE, but I got overlayfs to work with buildah in an userNS. I'll do some cleanup and publish it somewhere later this week. containers/storage will need to learn how to mount an overlayfs using FUSE instead of "mount -t overlay"
There was a problem hiding this comment.
I was thinking more that you would not need --rootfs at all. If the "image" field is a path then it implies --rootfs.
There was a problem hiding this comment.
that is a good idea, although don't we still need --rootfs when the directory is a relative path? Or is it mandatory to use an absolute path?
There was a problem hiding this comment.
Personally I’d prefer explicit options for deciding the semantics, instead of arguments that can be parsed multiple ways and a heuristic guess what the user meant.
There was a problem hiding this comment.
Agree with @mtrmac here. Parsing image names is an absolute mess because of the ambiguity of shortname vs fullname amongst other things. Let's not make the same mistake here.
There was a problem hiding this comment.
fs should be capitalized (throughout), for consistency with libpod.WithRootFSFromImage.
There was a problem hiding this comment.
(3 copies of identical DefaultStoreOptions+GetRootlessStorageOpts — 3 copies is my rough threshold for almost certainly benefiting from a helper function.)
There was a problem hiding this comment.
I've added a helper function
There was a problem hiding this comment.
("libpod" or "podman", then? Or are both paths pre-existing by now?)
There was a problem hiding this comment.
I've seen both used but "libpod" seems to be more correct (as it is already used for TmpDir), going to change it.
There was a problem hiding this comment.
This used to be outside of the if imageName == "" … condition; was the move intentional?
There was a problem hiding this comment.
no, I moved it back of the if block. Thanks for noticing it!
There was a problem hiding this comment.
It seems to me that this decision should happen at some higher layer (which will understand that the change has not actually taken place) instead of silently doing nothing and subverting expectations. Up to @nalind , though.
There was a problem hiding this comment.
yes I agree, I've added this patch only to enable this use case but it is definitely not the final way on how this should be handled in containers/storage
There was a problem hiding this comment.
This looks very incorrect; chdir("/") is an essential part of the chrooting process, and by no means a replacement for it!
There was a problem hiding this comment.
yes sorry, I messed up the conditions. I think the Chroot errors should be ignored, in that case only the Chdir is performed (as it looks it should be from the code comments)
There was a problem hiding this comment.
All of this also seems like a decision that should be visible at the caller level, of possibly a few levels higher.
4e70919 to
f03dc93
Compare
|
I've simplified the changes in containers/storage, adding a dummy driver that can be used with |
That’s very nice. |
There was a problem hiding this comment.
Reword: "but the exploded root filesystem of the container"
There was a problem hiding this comment.
We should only do this if Rootfs is actually set. The WithRootFS() option should validate its input to make sure an empty-string rootfs is not passed, and the given rootfs directory actually exists
There was a problem hiding this comment.
I’d conceptually prefer that as well—but from a quick check quite a few of the calls above silently ignore empty strings, from the critical WithRootFSFromImage to the more trivial WithConmonPidFile or WithShmDir.
(Yeah, Golang not having proper nullable types sucks.)
There was a problem hiding this comment.
We need to fix WithRootFSFromImage() as well at a minimum (make it require non-empty input and only set it if we're using an image rootfs).
The rest should also be hit by a validation pass as well at some point.
There was a problem hiding this comment.
We should add a final if block at the end here to make sure command is not empty, if there isn't already one.
|
@mheon @mtrmac I understand your concern. but you guys are making what I believe is the classic engineering mistake and making the UI more complicated so your job is easier. But if the code did a simple check to see if the path exists on disk before doing any image processing, I don't see that as being all that complicated. Yes there will be corner cases when I could have a fedora directory in the current working directory, but then just make the user specify full path. |
There was a problem hiding this comment.
At some point we should consolidate all the image based changes into a single function and call it at the beginning, to avoid all of these if data != nil blocks
There was a problem hiding this comment.
Does not necessarily need to happen in this PR, as this entire section of code really needs to be refactored - we have extensive duplication in three places (create, run, varlink create) that needs to be folded into here
It’s not obvious to me that the The way I think about this, UI complexity is not as much a matter of number of options, or even individual characters to type, as a matter of number of concepts and conditions the user needs to keep track of. In one sense, (
That’s not making the corner case situation any better: the user writes a script that works just fine with for months with one of the semantics, maybe blissfully unaware of the other feature at all, and suddenly the script stops working just because someone (possibly a different person) created an entirely unrelated object on the same machine. |
There was a problem hiding this comment.
Move this up with RootfsImageName and RootfsImageID
|
@rhatdan Yes it was broken very recently with some rootless changes (see opencontainers/runc#1819).
Awesome, I will shoot an email to @nalind and add you to cc to make sure we agree how the integration should work. |
|
@giuseppe I'm a little iffy on merging this without a bit more documentation on Otherwise, LGTM. |
|
@mheon this PR is needed mostly in preparation of #936. If you are fine with it, I can add more documentation about rootless containers there as rootless containers are not really usable yet without the changes in containers/storage for skipping Chown's. The alternative solution of using an userNS, and that really enables this user case, is implemented in the other PR |
|
@giuseppe Alright. We still needs manpages + bash completions for |
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The default /dev/pts has the option gid=5 that might not be mapped in the rootless case. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
so that the user has rw access to it. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
|
@mheon sure, I've added |
|
bot, retest this please |
|
Tests are green, code LGTM. |
|
📌 Commit 4932a89 has been approved by |
|
⚡ Test exempted: pull fully rebased and already tested. |
(Sounds good: containers/image/oci , and the WIP work on OCI multi-arch, handles only a small subset of the possible OCI layouts, to keep the code manageable at the possible cost of interoperability. If we can’t have a simple OCI format, standardizing on a single implementation of the lookup/multi-image/… semantics is the next best way to get widespread interoperability.) |
|
Can someone enlighten me as to what an "exploded container rootfs" is? Assuming I have ran |
|
Digging into the generated |
|
Please open a PR to do this? or at least open a separate issue. |
This is a hacky and PoC attempt at having rootless containers in podman. A proper implementation will take much more work than this PR (biggest issue, containers/storage must handle rootless access and management for the images, most likely we will need some changes in containers/image as well). In any case we would be able to use only the vfs backend storage until overlayfs can be used by a not privileged user.
Given these limitations, the current implementation expects an exploded container rootfs. My tests were limited to run a container and see it appears in
podman ps.The last patch must land in containers/storage, I've added it here so that the PR can be used.