Skip to content

Remove global device mknod cgroup rules#1623

Closed
dasizeman wants to merge 1 commit intoopencontainers:mainfrom
dasizeman:global_cgroup_mknod_fix
Closed

Remove global device mknod cgroup rules#1623
dasizeman wants to merge 1 commit intoopencontainers:mainfrom
dasizeman:global_cgroup_mknod_fix

Conversation

@dasizeman
Copy link

These global rules seem like they aren't needed and they cause conflicts with parent cgroup policies.

Fixes #1618

@cyphar
Copy link
Member

cyphar commented Oct 28, 2017

You need to add a Signed-off-by: line to your commit(s) which indicates that you attest the Developer Certificate of Origin a statement about your contributions that you must read before signing (don't worry, it's quite short and easy-to-read). You can add it to your commits with git commit --amend -s, and then doing a git push --force.

Signed-off-by: Dave Sizer <dave@sizetron.net>
@dasizeman dasizeman force-pushed the global_cgroup_mknod_fix branch from eb6aff5 to f8b2d32 Compare October 30, 2017 20:22
@dasizeman
Copy link
Author

I have signed off on the commit, thanks.

@cyphar
Copy link
Member

cyphar commented Oct 31, 2017

/cc @crosbymichael -- Do you remember why this was around originally?

@crosbymichael
Copy link
Member

One of the reasons is that many things mknod devices and install fails. Things like java mknod fuse, etc. Mknod is safe across the board as long as rw is blocked via cgroups.

Also mknod is save because there is nothing stopping an image from bringing in device nodes so we have a wildcard to allow mknod and block access

@dasizeman
Copy link
Author

I see, so it seems like really the issue I was originally having in #1618 is because SLURM shouldn't be blocking mknod on the devices in the parent policy.

@crosbymichael Do you have any more details/references on how you are sure that global mknod is safe, and maybe some examples of things that fail without it? I would like to understand this better and also I am going to need a compelling argument if I want a change in SLURM to be accepted.

@crosbymichael
Copy link
Member

@dasizeman

I think this will give you all the history and information that you need. Everything from kernel devs to users involved ;)

moby/moby#2979

@rhatdan
Copy link
Contributor

rhatdan commented Oct 31, 2017

Personally I would prefer that this was off. There are hardly any containers that would ever need mknod, and I believe in relying on as many security mechanisms as possible to protect against container break out. Perhaps having separate capabilities for build and run would be helpful. But when far less then 1 % of all containers needs mknod, we should not enable it by default. But that is water under the bridge at this point for docker/moby I am afraid.

@cyphar
Copy link
Member

cyphar commented Nov 1, 2017

@crosbymichael I agree with you from the Docker side of things, but it feels as though these default should be added to the specs that Docker generates rather than having them in runc -- especially since if you wanted to make different mknod policies you'd need to mess around with devices.deny a fair bit in order to make runc act nicely. Not to mention that tools (like umoci) that generate root filesystems for runtime bundles might do filtering of possible inode major+minor numbers.

And while a process inside a container cannot open(...) the device inode, I have found some quite worrying kernel bugs (which I've reported and am discussing upstream) that would make it theoretically possible to escalate the ability to access an inode into being able to trick another (more privileged) process into opening it.

@mrunalp mrunalp added this to the 1.0.0 milestone Nov 1, 2017
@dasizeman
Copy link
Author

@crosbymichael Thanks for pointing me toward that PR, it did a great job of catching me up on the history.

I actually tend to agree with @cyphar here. Based on the original need for these permissions in Docker, it seems like they only need to be present at image build time. The correct course of action in my opinion would be to remove them in runc, and have Docker only implicitly enable them during image builds, not when running containers (if they were needed at runtime, they could be set like any other special permission).

If everyone agrees, I would be happy to help with getting the needed changes merged on the moby side of things.

@crosbymichael
Copy link
Member

I'm fine with it as long as we update projects that rely on runc so that no regressions are introduced. This change is not about security, its about where the responsibility lies. Also the moby change should not apply to just build time, it should apply to all, I'm not comfortable saying that people are not ever going to drop into a bash session and run an apt-get install and it breaks after this change.

@dasizeman
Copy link
Author

dasizeman commented Nov 2, 2017

Good point about apt-get installs from the shell, I forgot about that. I think you're right that it's a responsibility issue, and I think that the responsibility of allowing mknod to enable certain package installs should lie with moby, not runc. Perhaps the best move forward here is to get this merged in runc and update any downstream projects.

Unfortunately, on the moby side of things, this is a more complicated issue. These mknod permissions are required for certain use cases, but at the same time they are really an inconsistency as far as container security goes: they allow certain package installs, while simultaneously breaking --cgroup-parent.

I will open an issue in moby to start a discussion on how to integrate this change if it were merged.

@crosbymichael What other projects are there downstream? I believe moby actually uses runc indirectly through containerd correct? Also, what would the process be for releasing a change like this and making sure we make the needed changes in downstream projects? If we can't figure out a good way to "solve" the inconsistency on the moby side anyway, then I would imagine it may not be worth going through the process of integrating this right now, since there would be no motivating use case (my original use case with Docker would still be broken if no change was made on that side).

@cyphar
Copy link
Member

cyphar commented Nov 3, 2017

@dasizeman

Unfortunately, on the moby side of things, this is a more complicated issue. These mknod permissions are required for certain use cases, but at the same time they are really an inconsistency as far as container security goes: they allow certain package installs, while simultaneously breaking --cgroup-parent.

[Insert common complaint that --cgroup-parent is usually misused.]

This is actually situation that can be detected somewhat trivially by whoever is making decisions on what devices should be allowed. devices.list gives you the effective set of devices that are allowed -- and userspace (whether that be runc or Docker) should be able to figure out whether a *:* m is going to be permitted in a sub-cgroup (of course you'd need to replicate the kernel parsing code and other set-related logic for devices). We could add some functions in libcontainer than Docker could then use, since we already have a lot of code in runc that does these sorts of things.

There are some slight complications for blacklist devices cgroups -- but in that situation there's nothing we can do in userspace other than just try the bad configuration and then adjust afterwards. And ultimately, if you're using a blacklist you're already doing things wrong.

What other projects are there downstream? I believe moby actually uses runc indirectly through containerd correct?

That is correct, but there are other downstreams. cri-o uses runc. Cloud Foundry has also switched to using runc. There's also tools like orca-build which use runc (and there are certainly many others).

@cyphar cyphar removed this from the 1.0.0 milestone Nov 14, 2018
@cyphar
Copy link
Member

cyphar commented Nov 14, 2018

I'm removing this from 1.0 since we don't have time to land other external-to-the-project commits.

@kolyshkin
Copy link
Contributor

I'm inclined to close this one, as every time we remove some permissions we end up realizing some upper-level tools depend on them and/or have no way to re-add those when needed, thus bringing in the unnecessary turmoil. Last time it happened then we tried to remove tun/tap device from the default set (see #3468, #4555).

Feel free to reopen or file a new PR/issue if you disagree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Default linux cgroup devices.allow rules too general

6 participants