
rootless: Rearrange setup of rootless containers ***CIRRUS: TEST IMAGES*** #3310

Merged
openshift-merge-robot merged 2 commits into containers:master from gabibeyer:rootlessKata on Aug 5, 2019

@gabibeyer

In order to run Podman unprivileged with VM-based runtimes, the
network must be set up before the container is created. Therefore
this commit modifies Podman to:

  1. create a network namespace
  2. pass the netns persistent mount path to slirp4netns
    so it can create the tap interface
  3. pass the netns path to the OCI spec, so the runtime can
    enter the netns
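A minimal Go sketch of steps 2 and 3, assuming slirp4netns v0.4.0's `--netns-type=path` option (slirp4netns is handed a namespace path instead of a PID) and an OCI `linux.namespaces` entry that carries a `path`. The helpers `slirpArgs` and `networkNamespaceJSON`, the `ociNamespace` type, and the `tap0` device name are illustrative assumptions, not Podman's actual code.

```go
package main

import (
	"encoding/json"
	"strconv"
)

// slirpArgs builds a slirp4netns argument list (hypothetical helper).
// "-c" asks slirp4netns to bring up and configure the tap interface;
// "-e 3" and "-r 4" name the exit and ready fds, which the caller must
// supply through cmd.ExtraFiles.
func slirpArgs(netnsPath string, pid int, preCreated bool) []string {
	args := []string{"-c", "-e", "3", "-r", "4"}
	if preCreated {
		// New flow: Podman created the netns, so point slirp4netns at
		// its persistent bind-mount path.
		return append(args, "--netns-type=path", netnsPath, "tap0")
	}
	// Old flow: the runtime created the netns; join it via the container PID.
	return append(args, strconv.Itoa(pid), "tap0")
}

// ociNamespace mirrors the shape of one entry in the OCI runtime spec's
// linux.namespaces array.
type ociNamespace struct {
	Type string `json:"type"`
	Path string `json:"path,omitempty"`
}

// networkNamespaceJSON returns the spec entry that tells the runtime to
// join an existing netns instead of creating a new one.
func networkNamespaceJSON(netnsPath string) (string, error) {
	b, err := json.Marshal(ociNamespace{Type: "network", Path: netnsPath})
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```

For example, `slirpArgs("/run/user/1000/netns/example", 0, true)` ends with `--netns-type=path /run/user/1000/netns/example tap0`, and `networkNamespaceJSON` for the same path yields `{"type":"network","path":"/run/user/1000/netns/example"}`.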

Closes #2897

Signed-off-by: Gabi Beyer <gabrielle.n.beyer@intel.com>

@openshift-ci-robot
Collaborator

Hi @gabibeyer. Thanks for your PR.

I'm waiting for a containers or openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 12, 2019
@gabibeyer
Author

cc @egernst @amshinde

@haircommander
Collaborator

/ok-to-test

@openshift-ci-robot openshift-ci-robot added ok-to-test and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 12, 2019
@rh-atomic-bot
Collaborator

Can one of the admins verify this patch?
I understand the following commands:

  • bot, add author to whitelist
  • bot, test pull request
  • bot, test pull request once

@mheon
Member

mheon commented Jun 12, 2019

If we're doing this, do we need to retain the old postConfigureNetNS codepath? It'd be much simpler if every container used the same path, configuring the netns before launching the container.

@giuseppe WDYT?

@AkihiroSuda
Collaborator

Seems we should cut a new release of slirp4netns from master for this PR to work.

Should it be v0.3.1 or v0.4.0?

@cyphar @giuseppe

Comment thread libpod/container_internal.go Outdated
Member

I feel like we've entirely defeated the point of postConfigureNetNS with this change. If we're unconditionally doing it before the container is created, there's no further point to the code - we should remove it and simplify the logic.

Author

@gabibeyer gabibeyer Jun 12, 2019

Would we want Podman to create a network namespace for runc as well, and pass the netns path? Or should we keep the logic of passing an empty netns path, and use a more appropriately named variable to differentiate the two flows?


@egernst egernst Jun 12, 2019

I like the idea of simplifying and having a consistent flow.

I would weigh in more if the scenario/use case for postConfigureNetNS were clear (to me). Not to nit, but can someone describe it so the code comment can explain it? Happy to send a PR to add it here:
https://github.com/containers/libpod/blob/master/libpod/container.go#L396-L398

Member

I'd make creating the networking namespace and passing it into the runtime unconditional.

Member

I am not sure how easily we can get rid of postConfigureNetNS. It is also needed when a user namespace is created by the OCI runtime. If we create the network namespace before the OCI runtime runs and configures the user namespace, then the network namespace is owned by the wrong user namespace.

Keeping a consistent flow for runc and Kata sounds good to me: it is easier to understand, maintain, and debug.
@gabibeyer, while you are at it, and although it is not strictly related to this PR, it would be useful to add the above explanation from @giuseppe as a doc comment for the postConfigureNetNS flag.
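One possible wording for the requested doc comment, folding in @giuseppe's explanation, together with a hypothetical `setupOrder` helper that spells out the two flows side by side. `ContainerConfig` here is a trimmed-down stand-in, not libpod's real config type, and the step strings are illustrative.

```go
package main

// ContainerConfig is a hypothetical, trimmed-down stand-in for libpod's
// container config; only the field under discussion is shown.
type ContainerConfig struct {
	// PostConfigureNetNS indicates that the network namespace must be
	// configured after the OCI runtime has created the container. This is
	// needed, among other cases, when the OCI runtime creates the user
	// namespace: a network namespace created by Podman beforehand would be
	// owned by the wrong user namespace. When false, Podman creates and
	// configures the netns first and passes its path to the runtime, which
	// VM-based runtimes need when running rootless.
	PostConfigureNetNS bool
}

// setupOrder lists, in order, the steps each flow performs.
func setupOrder(cfg ContainerConfig) []string {
	if cfg.PostConfigureNetNS {
		return []string{
			"create container (runtime creates netns)",
			"configure netns via container PID (slirp4netns)",
			"start container",
		}
	}
	return []string{
		"create netns",
		"configure netns via its path (slirp4netns)",
		"create container (runtime joins netns path)",
		"start container",
	}
}
```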

Comment thread libpod/networking_linux.go Outdated
Comment thread pkg/netns/netns_linux.go Outdated
Comment thread libpod/networking_linux.go Outdated
@gabibeyer, I suppose you will be dropping the if condition for PostConfigureNetNS here and everywhere else.

Comment thread libpod/networking_linux.go Outdated
I would move the pipe creation up, before the command-line args for slirp4netns are built, so that it is clear what file descriptors "3" and "4" passed to the slirp4netns process are.
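A sketch of the suggested ordering: create the pipes first, then build the arguments, so `-e 3` and `-r 4` visibly correspond to `cmd.ExtraFiles`. It relies on the documented `os/exec` behavior that `ExtraFiles[i]` becomes file descriptor `3+i` in the child. `newSlirpCmd` and `slirpPipes` are hypothetical names. In the PR the write end of the exit pipe is handed to conmon rather than kept by Podman, and the parent should close its copies of the child-side ends after `cmd.Start`.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// slirpPipes holds the parent-side pipe ends (hypothetical type).
type slirpPipes struct {
	ExitW  *os.File // closing every copy of this tells slirp4netns to exit
	ReadyR *os.File // slirp4netns writes to the other end once it is ready
}

// newSlirpCmd creates both pipes before building the argument list.
func newSlirpCmd(path string, rest ...string) (*exec.Cmd, *slirpPipes, error) {
	exitR, exitW, err := os.Pipe()
	if err != nil {
		return nil, nil, fmt.Errorf("creating exit pipe: %w", err)
	}
	readyR, readyW, err := os.Pipe()
	if err != nil {
		// Do not leak the first pipe on failure.
		exitR.Close()
		exitW.Close()
		return nil, nil, fmt.Errorf("creating ready pipe: %w", err)
	}
	// ExtraFiles[0] becomes fd 3 (exit fd), ExtraFiles[1] becomes fd 4 (ready fd).
	args := append([]string{"-c", "-e", "3", "-r", "4"}, rest...)
	cmd := exec.Command(path, args...)
	cmd.ExtraFiles = []*os.File{exitR, readyW}
	return cmd, &slirpPipes{ExitW: exitW, ReadyR: readyR}, nil
}
```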

Comment thread libpod/networking_linux.go Outdated
@mheon
Member

mheon commented Jun 12, 2019

Per @giuseppe we might still need PostConfigureNetNS for user namespaces - let's hold off on removing for now while we figure that out.

Comment thread libpod/networking_linux.go Outdated
Author

@giuseppe Do you know why the container PID was appended with api-socket? I couldn't figure out the reasoning behind it, but some of the port mapping tests are failing and I want to make sure it's not because I removed it.

Member

@gabibeyer
Author

After some debugging of the failed pod tests, it seems the infra pod either isn't storing or isn't configuring the network namespace correctly. When attempting to execute slirp4netns, the netns is nil: ctr.state.NetNS:<nil>. Will update as I go!

@gabibeyer gabibeyer force-pushed the rootlessKata branch 2 times, most recently from d8c1c99 to e4e3835 June 19, 2019 22:11
@gabibeyer
Author

@mheon @giuseppe I made some changes that I believe will satisfy the postConfiguration logic for OCI specified namespaces. I am having a little trouble with a healthcheck unit test; when I ran them locally I was getting this failure:

ERRO[0000] unable to get systemd connection to add healthchecks: dbus: authentication failed 
ERRO[0000] unable to get systemd connection to start healthchecks: read unix @->/run/systemd/private: read: connection reset by peer 

However, when I ran the tests against the current master, they failed as well, so I'm guessing it may be my environment. I'm having a hard time finding the reported error in the CI logs.

Thank you for the help!

@mheon
Member

mheon commented Jun 21, 2019

@baude Can you take a look at the healthcheck test issues here?

@gabibeyer
Author

@baude Do you have any ideas on how I can proceed? I'm a little stuck. Thank you!

@baude
Member

baude commented Jul 8, 2019

I don't see the failures here... is this only local? Can you push so I can see what you are talking about?

@gabibeyer
Author

gabibeyer commented Jul 8, 2019

@baude Ah, sorry, yes. I have been trying to find where the failures in the "special_testing_rootless" tests are happening. I tried running the RunNginxWithHealthCheck command locally from the command line (podman run --name test -dt -P --healthcheck-command CMD-SHELL "curl http://localhost/" nginx), since it returns exit code 127 in the tests. The errors in my earlier comment come from that, which makes sense since it's probably my environment, so disregard my initial question.

Instead, it may be something with seccomp and the OCI runtime: starting container process caused "seccomp: config provided but seccomp not supported". Also, running port commands manually seems to work. Is there a way to see more details/logs for the failed tests?

@baude
Member

baude commented Jul 8, 2019

I can help you debug them if you force-push your code.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 31, 2019
@lsm5
Member

lsm5 commented Jul 31, 2019

@gabibeyer slirp4netns 0.4.0-beta.2 should land in updates-testing soon:
https://bodhi.fedoraproject.org/updates/FEDORA-2019-cfa64128f7
Please add karma when you can.

@amshinde

lgtm @gabibeyer

@rhatdan
Member

rhatdan commented Aug 1, 2019

@mheon @vrothberg @cevich PTAL


// Setup rootless networking, requires c.state.NetNS to be set
if rootless.IsRootless() {
rootlessSetupErr = c.runtime.setupRootlessNetNS(c)
Member

Question: why isn't this fatal? Are we expecting a lot of spurious errors here? I would expect a slirp4netns failure to prevent container creation.

Author

Because it runs in a goroutine, it is fatal once the other goroutine completes and the error is checked.

cmd := exec.Command(path, cmdArgs...)
cmdArgs = append(cmdArgs, "-c", "-e", "3", "-r", "4")
if !ctr.config.PostConfigureNetNS {
ctr.rootlessSlirpSyncR, ctr.rootlessSlirpSyncW, err = os.Pipe()
Member

Don't we need defer Close... on these two in here?

Comment thread libpod/networking_linux.go
return errors.Wrapf(err, "failed to create rootless network sync pipe")
}
} else {
defer errorhandling.CloseQuiet(ctr.rootlessSlirpSyncR)
Member

So we have two places where we initialize the pipes for slirp, and two places where we do a defer close on the pipes, and those places are not the same... Why is this? It doesn't seem to make sense.

Author

Right, this took me a minute to figure out. slirp4netns is connected to conmon via this pipe; that's how it knows to die (when the container and conmon stop, the pipe closes at the conmon end, and slirp4netns stops). So one end of the pipe must be passed to the conmon command, and the other to the slirp4netns command. Depending on whether the new default logic runs (set up network -> create container) or the PostConfigureNetNS logic (create container -> set up network), the timing of creating and closing the pipe changes.

I looked a little bit into creating the pipe somewhere before both operations, and then closing after both, but didn't see a good place in the code that popped out. I'm definitely open to suggestions!

Comment thread libpod/oci_linux.go
stopSignal = uint(syscall.SIGTERM)
}

defer func() {
Member

This is not needed or desired. The cleanup process should take care of it.

Author

In the case of restart, the previous network namespaces are not getting cleaned up. I am slightly baffled as to where the netns bind mount is removed in the stop path in general. It is probably horribly obvious, but directly after the call to c.StopWithTimeout, the bind mount to the netns (/run/user/1000/netns/...) is gone. However, when I add a debug statement at the end of the c.StopWithTimeout function, it still exists.
I'll continue to look for the appropriate place to get this cleaned up.

Member

stop itself does not remove them. On the container stopping, the cleanup process (podman container cleanup, spawned by conmon) will fire, removing the network namespace, mounts, etc. On restart, we've just been reusing the old network namespace, instead of configuring a new one.

Author

With the new flow, Podman now creates the network namespace instead of the runtime, and bind-mounts it so that it can be passed to slirp4netns and the runtime. I'm storing this in c.state.NetNS, which wasn't set or used in the previous rootless flow. So there needs to be an extra step that explicitly cleans up the NetNS that Podman created and persisted. I traced the podman container cleanup call, specifically the CleanupContainers function in pkg/adapter/containers.go, and I don't see it being called in either model with a restart command, so I'm not entirely sure how to move forward.

Member

Does restart need to clean up the old namespace? We've been reusing them for containers running as root.

Comment thread pkg/netns/netns_linux.go
defer func() {
if err := origNS.Set(); err != nil {
logrus.Errorf("unable to set namespace: %q", err)
logrus.Warnf("unable to set namespace: %q", err)
Member

Does this always trigger on rootless?

Author

As far as I have seen, yes, since it is attempting to switch back to the Go thread's namespace at /proc/...

Comment thread pkg/netns/netns_linux.go

// UnmountNS unmounts the NS held by the netns object
func UnmountNS(ns ns.NetNS) error {
nsRunDir, err := getNSRunDir()
Member

Hm. Where is this used? It seems like it ought to be complaining that this is unused...

Author

@gabibeyer gabibeyer Aug 1, 2019

@gabibeyer
Author

gabibeyer commented Aug 1, 2019

@AkihiroSuda

do you mean this setns(CLONE_NEWNET)? https://github.com/rootless-containers/slirp4netns/blob/aacef69a52dfa8b3ab005f80ce1f2ec8f7e352f6/main.c#L69

Probably you need to specify userns-path as well

Yeah, it's giving an Invalid argument error, even with the userns path set.

@gabibeyer
Author

@AkihiroSuda Disregard that; it's giving an ioctl(TUNSETIFF): Device or resource busy error.

@AkihiroSuda
Collaborator

@gabibeyer could you open an issue (or PR) in https://github.com/rootless-containers/slirp4netns ?
I'd like to satisfy Kata requirements before slirp4netns v0.4.0 GA (but if it doesn't require the addition of a new CLI flag, it might be deferred to v0.4.1+).

@giuseppe giuseppe self-requested a review August 5, 2019 08:51
Member

@giuseppe giuseppe left a comment

sorry for the late review. Great work! LGTM

@rhatdan
Member

rhatdan commented Aug 5, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 5, 2019
@openshift-merge-robot openshift-merge-robot merged commit e2f38cd into containers:master Aug 5, 2019
@mheon
Member

mheon commented Aug 5, 2019

Woah woah woah. I am pretty convinced that the deferred removal of network namespaces in stop is not a good thing. This is potentially breaking cleanup under the covers.

@mheon
Member

mheon commented Aug 5, 2019

I don't feel comfortable with this in 1.5.0 as such. We need to either make a release branch and revert in there, or figure out if there are blockers to reusing network namespaces for rootless containers during restart.

@giuseppe @gabibeyer Will slirp4netns handle this properly if we don't configure the network namespace twice and instead just keep using it?

@gabibeyer
Author

@mheon from what I hacked around with, it seems possible. I can submit a PR to slirp4netns, and have @AkihiroSuda review to see if it is the appropriate way of handling.

@mheon
Member

mheon commented Aug 5, 2019

Ack, works for me.

For reference, we're going to make a new branch on this repo for 1.5.0, and revert these patches in the branch (not from master). Once this is fixed we can cut a 1.5.1 release from master, including these patches, for testing.

@mheon
Member

mheon commented Aug 6, 2019

We just had to revert on master - CI was completely broken (generic 'slirp4netns failed' errors)

@gabibeyer
Author

gabibeyer commented Aug 6, 2019 via email

@github-actions github-actions Bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 26, 2023
@github-actions github-actions Bot locked as resolved and limited conversation to collaborators Sep 26, 2023

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm: Indicates that a PR is ready to be merged.
locked - please file new issue/PR: Assist humans wanting to comment on an old issue or PR with locked comments.
ok-to-test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create and configure network before container is created in case of rootless Podman