rootless: Rearrange setup of rootless containers ***CIRRUS: TEST IMAGES*** by gabibeyer · Pull Request #3310 · containers/podman

gabibeyer · 2019-06-12T18:44:19Z

In order to run Podman with VM-based runtimes unprivileged, the
network must be set up prior to the container creation. Therefore
this commit modifies Podman to:

- create a network namespace
- pass the netns persistent mount path to slirp4netns to create the tap interface
- pass the netns path to the OCI spec, so the runtime can enter the netns
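
As an illustration of the last step, here is a minimal sketch of pointing the network namespace entry of an OCI spec at an existing netns path. The `namespace` type and the `withNetNSPath` helper are hypothetical stand-ins written for this example, not Podman's code; only the `linux.namespaces` shape (`type` plus optional `path`) comes from the OCI runtime spec.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// namespace mirrors one entry of the OCI runtime spec's linux.namespaces list.
// It is a local stand-in so the example needs no external modules.
type namespace struct {
	Type string `json:"type"`
	Path string `json:"path,omitempty"`
}

// withNetNSPath (hypothetical helper) returns a copy of nss in which the
// network namespace entry points at path, adding the entry if it is missing.
// When a path is set, an OCI runtime joins that namespace instead of
// creating a new one.
func withNetNSPath(nss []namespace, path string) []namespace {
	out := make([]namespace, 0, len(nss)+1)
	found := false
	for _, n := range nss {
		if n.Type == "network" {
			n.Path = path
			found = true
		}
		out = append(out, n)
	}
	if !found {
		out = append(out, namespace{Type: "network", Path: path})
	}
	return out
}

func main() {
	// Assumed example path; the real bind-mount location is chosen by Podman.
	nss := withNetNSPath([]namespace{{Type: "pid"}, {Type: "network"}},
		"/run/user/1000/netns/example")
	b, err := json.Marshal(nss)
	if err != nil {
		fmt.Fprintln(os.Stderr, "marshal:", err)
		os.Exit(1)
	}
	fmt.Println(string(b))
}
```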

Closes #2897

Signed-off-by: Gabi Beyer <gabrielle.n.beyer@intel.com>

openshift-ci-robot · 2019-06-12T18:44:40Z

Hi @gabibeyer. Thanks for your PR.

I'm waiting for a containers or openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

gabibeyer · 2019-06-12T18:45:27Z

cc @egernst @amshinde

haircommander · 2019-06-12T18:46:18Z

/ok-to-test

rh-atomic-bot · 2019-06-12T18:52:43Z

Can one of the admins verify this patch?
I understand the following commands:

bot, add author to whitelist
bot, test pull request
bot, test pull request once

mheon · 2019-06-12T19:00:55Z

If we're doing this, do we need to retain the old postConfigureNetNS codepath? It'd be much simpler if every container used the same path, configuring the netns before launching the container.

@giuseppe WDYT?

AkihiroSuda · 2019-06-12T19:01:59Z

Seems we should cut a new release of slirp4netns from master for this PR to work.

It should be v0.3.1? or 0.4.0?

@cyphar @giuseppe

mheon · 2019-06-12T19:01:58Z

I feel like we've entirely defeated the point of postConfigureNetNS with this change. If we're unconditionally doing it before the container is created, there's no further point to the code - we should remove it and simplify the logic.

Would we want Podman to create a network namespace for runc as well, and pass the netns path? Or keep the logic of passing the empty netns path, and have a more appropriately named variable to differentiate the logic?

I like the idea of simplifying and having a consistent flow.

I would weigh in more if the scenario/use-case around postConfigureNetNS were clear (to me) - not to nit, but can someone describe the comment on that? Happy to send a PR to add here:
https://github.com/containers/libpod/blob/master/libpod/container.go#L396-L398

I'd make creating the networking namespace and passing it into the runtime unconditional.

giuseppe · 2019-06-12T19:35:47Z

I am not sure how easily we can get rid of postConfigureNetNS. It is also needed when a user namespace is created by the OCI runtime. If we create the network namespace before the OCI runtime runs and configures the user namespace, then the network namespace is owned by the wrong user namespace.
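
The constraint described above can be sketched as a small decision function. This is only an illustration of the reasoning, assuming the spec's namespace list is the only input; `namespace` and `needsPostConfigure` are hypothetical names and this is not how Podman computes `PostConfigureNetNS`.

```go
package main

import "fmt"

// namespace is a local stand-in for an OCI spec namespace entry.
type namespace struct {
	Type string
	Path string
}

// needsPostConfigure (hypothetical) reports whether network setup should wait
// until after the OCI runtime has created the container. That is the case
// when the runtime itself creates a new user namespace (a "user" entry with
// no path to join): a netns created beforehand would be owned by the wrong
// user namespace.
func needsPostConfigure(nss []namespace) bool {
	for _, n := range nss {
		if n.Type == "user" && n.Path == "" {
			return true
		}
	}
	return false
}

func main() {
	withUserNS := []namespace{{Type: "user"}, {Type: "network"}}
	withoutUserNS := []namespace{{Type: "network"}}
	fmt.Println("runtime creates userns:", needsPostConfigure(withUserNS))
	fmt.Println("no new userns:", needsPostConfigure(withoutUserNS))
}
```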

amshinde · 2019-06-12T21:35:42Z

Keeping a consistent flow for runc and kata sounds good to me, easier to understand, maintain and debug.
While you are at it @gabibeyer, while not strictly related to this PR, it will be useful to add the above explanation from @giuseppe as a doc comment for the postConfigureNetNS flag

amshinde · 2019-06-12T21:37:09Z

I suppose you will be dropping the if condition for PostConfigureNetNS here and everywhere else @gabibeyer

amshinde · 2019-06-12T21:42:14Z

I would move the pipe creation above before the command-line args for slirp4netns are created, so that we know what file-descriptors "3" and "4" passed to the slirp process are.
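
For context on where "3" and "4" come from: Go's `exec.Cmd` maps `ExtraFiles[i]` to file descriptor `3+i` in the child, so the order of the slice fixes the numbers. The sketch below creates the pipes first and then builds the command, using `sh` as a stand-in for slirp4netns; the function name `readyMessage` and the shell command are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"os/exec"
)

// readyMessage starts a child with two pipe ends as extra files and returns
// whatever the child writes to its fd 4. ExtraFiles[0] becomes fd 3 and
// ExtraFiles[1] becomes fd 4 in the child.
func readyMessage() (string, error) {
	// Pipe whose read end the child holds as fd 3 (the "exit" side).
	exitR, exitW, err := os.Pipe()
	if err != nil {
		return "", err
	}
	defer exitR.Close()
	defer exitW.Close()

	// Pipe whose write end the child holds as fd 4 (the "ready" side).
	readyR, readyW, err := os.Pipe()
	if err != nil {
		return "", err
	}
	defer readyR.Close()

	// Stand-in for slirp4netns: report readiness on fd 4.
	cmd := exec.Command("sh", "-c", "echo ready >&4")
	cmd.ExtraFiles = []*os.File{exitR, readyW} // fd 3, fd 4
	if err := cmd.Start(); err != nil {
		readyW.Close()
		return "", err
	}
	// Drop the parent's copy so the read below ends when the child exits.
	readyW.Close()

	msg, readErr := io.ReadAll(readyR)
	if err := cmd.Wait(); err != nil {
		return "", err
	}
	return string(msg), readErr
}

func main() {
	msg, err := readyMessage()
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Printf("child wrote %q on its fd 4\n", msg)
}
```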

mheon · 2019-06-12T22:45:19Z

Per @giuseppe we might still need PostConfigureNetNS for user namespaces - let's hold off on removing for now while we figure that out.

gabibeyer · 2019-06-13T19:01:11Z

@giuseppe Do you know the reasoning for appending the container PID with api-socket? I couldn't figure out the reasoning behind it, but some of the port mapping tests are failing and I want to make sure it's not because of removing it.

@giuseppe ^^

gabibeyer · 2019-06-13T23:53:32Z

After some debugging of the failed pod tests, it seems the infra pod either isn't storing or configuring the network namespace correctly. When attempting to execute slirp4netns the netns is nil: ctr.state.NetNS:<nil>. Will update as I go!

gabibeyer · 2019-06-21T16:11:46Z

@mheon @giuseppe I made some changes that I believe will satisfy the postConfiguration logic for OCI specified namespaces. I am having a little trouble with a healthcheck unit test; when I ran them locally I was getting this failure:

ERRO[0000] unable to get systemd connection to add healthchecks: dbus: authentication failed 
ERRO[0000] unable to get systemd connection to start healthchecks: read unix @->/run/systemd/private: read: connection reset by peer

However, when I ran the tests against the current master, they seemed to fail as well, so I'm guessing it may be my environment. I'm having a hard time finding the reported error in the CI logs.

Thank you for the help!

mheon · 2019-06-21T16:33:37Z

@baude Can you take a look at the healthcheck test issues here?

gabibeyer · 2019-07-08T17:19:29Z

@baude Do you have any ideas on how I can proceed further? I'm a little stuck. Thank you!

baude · 2019-07-08T17:40:49Z

i don't see the failures here ... is this only local? can you push so i can see what you are talking about?

gabibeyer · 2019-07-08T18:52:44Z

@baude ah sorry, yeah I have been attempting to find where the failures in the "special_testing_rootless" tests are happening. I was attempting to run the RunNginxWithHealthCheck command locally via cmdline (podman run --name test -dt -P --healthcheck-command CMD-SHELL "curl http://localhost/" nginx), since that is returning a 127 error code within the unit tests. The error I reported comes from that local run, which makes sense since it's probably my environment, so disregard my initial question.

Instead it seems to be something with seccomp and OCI runtime, maybe: starting container process caused "seccomp: config provided but seccomp not supported". Also, manually running port commands seems to be working. Is there a way to see more details/logs for the failed tests?

baude · 2019-07-08T21:36:53Z

i can help you debug them if you force push your code

lsm5 · 2019-07-31T20:32:13Z

@gabibeyer slirp4netns 0.4.0-beta.2 should land in updates-testing soon
https://bodhi.fedoraproject.org/updates/FEDORA-2019-cfa64128f7 .. please add karma when you can.

amshinde · 2019-07-31T21:21:34Z

lgtm @gabibeyer

rhatdan · 2019-08-01T07:55:58Z

@mheon @vrothberg @cevich PTAL

mheon · 2019-08-01T13:35:50Z

+
+			// Setup rootless networking, requires c.state.NetNS to be set
+			if rootless.IsRootless() {
+				rootlessSetupErr = c.runtime.setupRootlessNetNS(c)


Question: why isn't this fatal? Are we expecting a lot of spurious errors here? I would expect a slirp failure to prevent container creation

Because it is a goroutine, it is fatal once the other goroutine completes and the error is checked.

mheon · 2019-08-01T13:40:09Z

-	cmd := exec.Command(path, cmdArgs...)
+	cmdArgs = append(cmdArgs, "-c", "-e", "3", "-r", "4")
+	if !ctr.config.PostConfigureNetNS {
+		ctr.rootlessSlirpSyncR, ctr.rootlessSlirpSyncW, err = os.Pipe()


Don't we need defer Close... on these two in here?

mheon · 2019-08-01T13:44:30Z

+				return errors.Wrapf(err, "failed to create rootless network sync pipe")
+			}
+		} else {
+			defer errorhandling.CloseQuiet(ctr.rootlessSlirpSyncR)


So we have two places where we initialize the pipes for slirp, and two places where we do a defer close on the pipes, and those places are not the same... Why is this? It doesn't seem to make sense.

Right, this took me a minute to figure out. Slirp4netns is connected to conmon via this pipe; that's how it knows to die (when the container + conmon stops, the pipe closes at the conmon end, and slirp4netns stops). So, one end of the pipe must be in the conmon command, and the other in the slirp4netns command. Depending on whether the new default logic is run (setup network -> create container) or the PostConfigureNetNS logic (create container -> setup network), the timing of creating and closing the pipe changes.

I looked a little bit into creating the pipe somewhere before both operations, and then closing after both, but didn't see a good place in the code that popped out. I'm definitely open to suggestions!

mheon · 2019-08-01T13:44:57Z

 		stopSignal = uint(syscall.SIGTERM)
 	}

+	defer func() {


This is not needed or desired. The cleanup process should take care of it.

In the case of restart, the previous network namespaces are not getting cleaned up. I am slightly baffled as to where the removal of the netns bind mount occurs in the stop call in general. It is probably horribly obvious, but I see that directly after the call to c.StopWithTimeout the bind mount to the netns (/run/user/1000/netns/...) is gone. However, when I throw a debug statement at the end of the c.StopWithTimeout function it still exists.
I'll continue to look for the appropriate place to get this cleaned up.

stop itself does not remove them. On the container stopping, the cleanup process (podman container cleanup, spawned by conmon) will fire, removing the network namespace, mounts, etc. On restart, we've just been reusing the old network namespace, instead of configuring a new one.

With the new flow, Podman is now creating the network namespace instead of the runtime, and it is being bind mounted so that it can be passed to slirp and the runtime. I'm storing this in c.state.NetNS, which wasn't being set/used with the previous rootless flow. So, there needs to be an extra step of explicitly cleaning up the Podman-created and persisted NetNS. I traced the podman container cleanup call, specifically the CleanupContainers function in the pkg/adapter/containers.go file, and I don't see it being called in either model with a restart cmd, so I'm not entirely sure how to move forward.

Does restart need to clean up the old namespace? We've been reusing them for containers running as root.

mheon · 2019-08-01T13:46:56Z

 		defer func() {
 			if err := origNS.Set(); err != nil {
-				logrus.Errorf("unable to set namespace: %q", err)
+				logrus.Warnf("unable to set namespace: %q", err)


Does this always trigger on rootless?

As far as I have seen, since it is attempting to set it back to the golang thread at /proc/...

mheon · 2019-08-01T13:47:59Z


 // UnmountNS unmounts the NS held by the netns object
 func UnmountNS(ns ns.NetNS) error {
+	nsRunDir, err := getNSRunDir()


Hm. Where is this used? It seems like it ought to be complaining that this is unused...

https://github.com/gabibeyer/libpod/blob/80dcd4bebcdc8e280f6b43228561d09c194c328b/pkg/netns/netns_linux.go#L179

gabibeyer · 2019-08-01T23:51:03Z

@AkihiroSuda

do you mean this setns(CLONE_NEWNET)? https://github.com/rootless-containers/slirp4netns/blob/aacef69a52dfa8b3ab005f80ce1f2ec8f7e352f6/main.c#L69

Probably you need to specify userns-path as well

Yeah, it's giving an Invalid argument error, even with userns path set as well.

gabibeyer · 2019-08-02T00:02:09Z

@AkihiroSuda disregard that...it's giving an ioctl(TUNSETIFF): Device or resource busy error

AkihiroSuda · 2019-08-02T02:04:38Z

@gabibeyer could you open an issue (or PR) in https://github.com/rootless-containers/slirp4netns ?
I'd like to satisfy kata requirements before slirp4netns v0.4.0 GA (but if it doesn't require addition of new CLI flag, it might be deferred to v0.4.1+)

giuseppe

sorry for the late review. Great work! LGTM

rhatdan · 2019-08-05T12:13:40Z

/lgtm

mheon · 2019-08-05T12:50:52Z

Woah woah woah. I am pretty convinced that the deferred removal of network namespaces in stop is not a good thing. This is potentially breaking cleanup under the covers.

mheon · 2019-08-05T13:09:14Z

I don't feel comfortable with this in 1.5.0 as such. We need to either make a release branch and revert in there, or figure out if there are blockers to reusing network namespaces for rootless containers during restart.

@giuseppe @gabibeyer Will slirp4netns handle this properly, assuming we don't configure the network namespace twice and just keep using it?

gabibeyer · 2019-08-05T16:25:25Z

@mheon from what I hacked around with, it seems possible. I can submit a PR to slirp4netns, and have @AkihiroSuda review to see if it is the appropriate way of handling.

mheon · 2019-08-05T16:30:25Z

Ack, works for me.

For reference, we're going to make a new branch on this repo for 1.5.0, and revert these patches in the branch (not from master). Once this is fixed we can cut a 1.5.1 release from master, including these patches, for testing.

mheon · 2019-08-06T16:25:04Z

We just had to revert on master - CI was completely broken (generic 'slirp4netns failed' errors)

gabibeyer · 2019-08-06T21:34:58Z

Right, it requires a version of slirp4netns only released within the test repo

openshift-ci-robot requested review from giuseppe and jwhonce June 12, 2019 18:44

openshift-ci-robot added the size/M label Jun 12, 2019

openshift-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 12, 2019

openshift-ci-robot added ok-to-test and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 12, 2019

mheon reviewed Jun 12, 2019

giuseppe reviewed Jun 12, 2019

amshinde reviewed Jun 12, 2019

gabibeyer force-pushed the rootlessKata branch from 5c1b1db to 37eb4b9 Compare June 12, 2019 23:30

openshift-ci-robot added size/L and removed size/M labels Jun 12, 2019

gabibeyer commented Jun 13, 2019

gabibeyer force-pushed the rootlessKata branch 2 times, most recently from d8c1c99 to e4e3835 Compare June 19, 2019 22:11

gabibeyer force-pushed the rootlessKata branch from e4e3835 to 258578d Compare July 10, 2019 17:26

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 31, 2019

mheon reviewed Aug 1, 2019

Comment thread libpod/networking_linux.go

mheon reviewed Aug 1, 2019

giuseppe self-requested a review August 5, 2019 08:51

giuseppe reviewed Aug 5, 2019

openshift-ci-robot assigned rhatdan Aug 5, 2019

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 5, 2019

openshift-merge-robot merged commit e2f38cd into containers:master Aug 5, 2019

rh-atomic-bot mentioned this pull request Aug 5, 2019

networking: use firewall plugin #2940

Merged

basilgello mentioned this pull request Jun 17, 2021

Implement 'networksetup' extension stage #10704

Closed

github-actions Bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 26, 2023

github-actions Bot locked as resolved and limited conversation to collaborators Sep 26, 2023
