diff --git a/README.md b/README.md
index 8cbe1fe6878..50fcd4e9222 100644
--- a/README.md
+++ b/README.md
@@ -113,8 +113,6 @@ The following build tags were used earlier, but are now obsoleted:
 - **apparmor** (since runc v1.0.0-rc93 the feature is always enabled)
 - **selinux** (since runc v1.0.0-rc93 the feature is always enabled)
 
-[contrib-memfd-bind]: /contrib/cmd/memfd-bind/README.md
-
 ### Running the test suite
 
 `runc` currently supports running its test suite via Docker.
diff --git a/contrib/cmd/memfd-bind/README.md b/contrib/cmd/memfd-bind/README.md
index a83cc78208c..93229250259 100644
--- a/contrib/cmd/memfd-bind/README.md
+++ b/contrib/cmd/memfd-bind/README.md
@@ -1,6 +1,15 @@
 ## memfd-bind ##
 
-`runc` normally has to make a binary copy of itself when constructing a
+> **NOTE**: Since runc 1.2.0, runc will now use a private overlayfs mount to
+> protect the runc binary (if you are on Linux 5.1 or later). This protection
+> is far more light-weight than memfd-bind, and for most users this should
+> obviate the need for `memfd-bind` entirely. Rootless containers will still
+> make a memfd copy (unless you are using `runc` itself inside a user namespace
+> -- a-la [`rootlesskit`][rootlesskit] -- and are on Linux 5.11 or later), but
+> `memfd-bind` is not particularly useful for rootless container users anyway
+> (see [Caveats](#Caveats) for more details).
+
+`runc` sometimes has to make a binary copy of itself when constructing a
 container process in order to defend against certain container runtime attacks
 such as CVE-2019-5736.
 
@@ -38,6 +47,8 @@ much memory usage they can use:
 container process setup takes up about 10MB per process spawned inside the
 container by runc (both pid1 and `runc exec`).
 
+[rootlesskit]: https://github.com/rootless-containers/rootlesskit
+
 ### Caveats ###
 
 There are several downsides with using `memfd-bind` on the `runc` binary:
diff --git a/libcontainer/dmz/overlayfs_linux.go b/libcontainer/dmz/overlayfs_linux.go
index 92cb1944e59..b81b7025895 100644
--- a/libcontainer/dmz/overlayfs_linux.go
+++ b/libcontainer/dmz/overlayfs_linux.go
@@ -84,6 +84,13 @@ func sealedOverlayfs(binPath, tmpDir string) (_ *os.File, Err error) {
 		return nil, fmt.Errorf("fsconfig set overlayfs lowerdir=%s: %w", lowerDirStr, err)
 	}
 
+	// We don't care about xino (Linux 4.17) but it will be auto-enabled on
+	// some systems (if /run/runc and /usr/bin are on different filesystems)
+	// and this produces spurious dmesg log entries. We can safely ignore
+	// errors when disabling this because we don't actually care about the
+	// setting and we're just opportunistically disabling it.
+	_ = unix.FsconfigSetString(int(overlayCtx.Fd()), "xino", "off")
+
 	// Get an actual handle to the overlayfs.
 	if err := unix.FsconfigCreate(int(overlayCtx.Fd())); err != nil {
 		return nil, os.NewSyscallError("fsconfig create overlayfs", err)