Skip to content

Implement best-effort mount clean up when using host mount namespace#3118

Open
SteeleDesmond wants to merge 2 commits intoopencontainers:mainfrom
SteeleDesmond:clean-mounts-no-ns
Open

Implement best-effort mount clean up when using host mount namespace#3118
SteeleDesmond wants to merge 2 commits intoopencontainers:mainfrom
SteeleDesmond:clean-mounts-no-ns

Conversation

@SteeleDesmond
Copy link
Copy Markdown

The goal in the runtime spec is to unmount container mounts created during container create processing.

Below shows an example case where mounts persist on the host after container delete. With this commit, a best-effort clean up is made to remove mounts in LIFO order during container delete when the host mount namespace is used. This is done by matching the container rootfs and mount destination paths given in the container config.

root@srd-server:/# tree runc-test-bundles/ -L 2
runc-test-bundles/
└── busybox-mount-bundle
    ├── config.json
    └── rootfs
root@srd-server:/# cat runc-test-bundles/busybox-mount-bundle/config.json | grep namespaces -A 13
		"namespaces": [
			{
				"type": "pid"
			},
			{
				"type": "network"
			},
			{
				"type": "ipc"
			},
			{
				"type": "uts"
			}
		]
root@srd-server:/# cat runc-test-bundles/busybox-mount-bundle/config.json | grep mounts -A 75
	"mounts": [
		{
			"destination": "/proc",
			"type": "proc",
			"source": "proc"
		},
		{
			"destination": "/dev",
			"type": "tmpfs",
			"source": "tmpfs",
			"options": [
				"nosuid",
				"strictatime",
				"mode=755",
				"size=65536k"
			]
		},
		{
			"destination": "/dev/pts",
			"type": "devpts",
			"source": "devpts",
			"options": [
				"nosuid",
				"noexec",
				"newinstance",
				"ptmxmode=0666",
				"mode=0620",
				"gid=5"
			]
		},
		{
			"destination": "/dev/shm",
			"type": "tmpfs",
			"source": "shm",
			"options": [
				"nosuid",
				"noexec",
				"nodev",
				"mode=1777",
				"size=65536k"
			]
		},
		{
			"destination": "/dev/mqueue",
			"type": "mqueue",
			"source": "mqueue",
			"options": [
				"nosuid",
				"noexec",
				"nodev"
			]
		},
		{
			"destination": "/sys",
			"type": "sysfs",
			"source": "sysfs",
			"options": [
				"nosuid",
				"noexec",
				"nodev",
				"ro"
			]
		},
		{
			"destination": "/sys/fs/cgroup",
			"type": "cgroup",
			"source": "cgroup",
			"options": [
				"nosuid",
				"noexec",
				"nodev",
				"relatime",
				"ro"
			]
		}
	],
root@srd-server:/# ./runc create -b /runc-test-bundles/busybox-mount-bundle/ testid1
root@srd-server:/# mount | grep /runc-test-bundles/
/dev/mapper/ubuntu--vg-ubuntu--lv on /runc-test-bundles/busybox-mount-bundle/rootfs type ext4 (rw,relatime)
proc on /runc-test-bundles/busybox-mount-bundle/rootfs/proc type proc (rw,relatime)
tmpfs on /runc-test-bundles/busybox-mount-bundle/rootfs/dev type tmpfs (rw,nosuid,size=65536k,mode=755)
devpts on /runc-test-bundles/busybox-mount-bundle/rootfs/dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)
shm on /runc-test-bundles/busybox-mount-bundle/rootfs/dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
mqueue on /runc-test-bundles/busybox-mount-bundle/rootfs/dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
sysfs on /runc-test-bundles/busybox-mount-bundle/rootfs/sys type sysfs (ro,nosuid,nodev,noexec,relatime)
tmpfs on /runc-test-bundles/busybox-mount-bundle/rootfs/sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=755)
cgroup on /runc-test-bundles/busybox-mount-bundle/rootfs/sys/fs/cgroup/systemd type cgroup (ro,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /runc-test-bundles/busybox-mount-bundle/rootfs/sys/fs/cgroup/blkio type cgroup (ro,nosuid,nodev,noexec,relatime,blkio)
cgroup on /runc-test-bundles/busybox-mount-bundle/rootfs/sys/fs/cgroup/freezer type cgroup (ro,nosuid,nodev,noexec,relatime,freezer)
cgroup on /runc-test-bundles/busybox-mount-bundle/rootfs/sys/fs/cgroup/cpu,cpuacct type cgroup (ro,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /runc-test-bundles/busybox-mount-bundle/rootfs/sys/fs/cgroup/cpuset type cgroup (ro,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /runc-test-bundles/busybox-mount-bundle/rootfs/sys/fs/cgroup/perf_event type cgroup (ro,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /runc-test-bundles/busybox-mount-bundle/rootfs/sys/fs/cgroup/memory type cgroup (ro,nosuid,nodev,noexec,relatime,memory)
cgroup on /runc-test-bundles/busybox-mount-bundle/rootfs/sys/fs/cgroup/net_cls,net_prio type cgroup (ro,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /runc-test-bundles/busybox-mount-bundle/rootfs/sys/fs/cgroup/devices type cgroup (ro,nosuid,nodev,noexec,relatime,devices)
cgroup on /runc-test-bundles/busybox-mount-bundle/rootfs/sys/fs/cgroup/hugetlb type cgroup (ro,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /runc-test-bundles/busybox-mount-bundle/rootfs/sys/fs/cgroup/rdma type cgroup (ro,nosuid,nodev,noexec,relatime,rdma)
cgroup on /runc-test-bundles/busybox-mount-bundle/rootfs/sys/fs/cgroup/pids type cgroup (ro,nosuid,nodev,noexec,relatime,pids)
root@srd-server:/# ./runc delete testid1
# The above container mounts will persist after container delete. With the change they are unmounted during delete.

See the references below for similar discussions around this issue.
References:
Issue #2095
Issue #1909
libcontainer: containers with host fs root

Signed-off-by: Steele Ray Desmond steele.desmond@ibm.com

@kolyshkin
Copy link
Copy Markdown
Contributor

@SteeleDesmond this calls for an integration test (see tests/integration/ for examples).

@kolyshkin
Copy link
Copy Markdown
Contributor

@SteeleDesmond please rebase

@SteeleDesmond SteeleDesmond force-pushed the clean-mounts-no-ns branch 2 times, most recently from cfa1a71 to 389d9f7 Compare August 3, 2021 18:50
@kolyshkin
Copy link
Copy Markdown
Contributor

@SteeleDesmond ^^^

Comment thread libcontainer/state_linux.go Outdated
config := c.Config()

// Unmount recursive
err := unix.Unmount(config.Rootfs, unix.MNT_DETACH)
Copy link
Copy Markdown
Contributor

@kolyshkin kolyshkin Aug 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB: this relies on the assumption that Rootfs is a mount point, and that assumption is correct.

Comment thread libcontainer/state_linux.go Outdated

// If recursive unmount fails, try best-effort unmount
for i := len(config.Mounts) - 1; i >= 0; i-- {
mountpoint := config.Rootfs + config.Mounts[i].Destination
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use securejoin (or, better, WithProcfd()) here?

Probably not, since we're in a host mountns and we can't have a situation where the mount point is an absolute symlink which only make sense inside the mountns (something like #3047).

@cyphar PTAL 🙏🏻

mountpoint := config.Rootfs + config.Mounts[i].Destination
err := unix.Unmount(mountpoint, unix.MNT_DETACH)
if err != nil {
logrus.Warn(err)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unix.Unmount errors are bare (e.g. ENOENT), so logging it would not make any sense.

Please use unmount which wraps the error.

Comment thread libcontainer/state_linux.go Outdated
logrus.Warn(err)
}
}
err = unix.Unmount(config.Rootfs, unix.MNT_DETACH)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Copy Markdown
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your work, @SteeleDesmond!

Aside from a nit about using unmount vs unix.Unmount, this definitely needs a test case (e.g. an integration test, look into tests/integration/mounts.bats).

@kolyshkin
Copy link
Copy Markdown
Contributor

This still needs a rebase (to pick up latest and greatest CI).

Comment thread libcontainer/state_linux.go Outdated
Comment on lines +57 to +58
if !c.config.Namespaces.Contains(configs.NEWNS) ||
c.config.Namespaces.PathOf(configs.NEWNS) != "" {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use IsPrivate now (introduced by commit 2a7dcbb).

@kolyshkin
Copy link
Copy Markdown
Contributor

@SteeleDesmond do you intend to keep working on this? If yes, I can help with a test case.

@SteeleDesmond SteeleDesmond force-pushed the clean-mounts-no-ns branch 3 times, most recently from 4fb26b3 to ba14a1c Compare December 20, 2025 00:55
Signed-off-by: Steele Ray Desmond <steele@desmond.sh>
Signed-off-by: Steele Ray Desmond <steele@desmond.sh>
@SteeleDesmond
Copy link
Copy Markdown
Author

@kolyshkin Apologies, I had a change of employment and priorities but I went and took another look at this. Feel free to look it over!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants