Allow mounting of /proc/sys/kernel/ns_last_pid #3451
Merged
thaJeztah merged 1 commit into opencontainers:main from dsouzai:ns_last_pid on Apr 22, 2022
The CAP_CHECKPOINT_RESTORE Linux capability provides the ability to update /proc/sys/kernel/ns_last_pid. However, this file is under /proc, and both K8s and CRI-O mount /proc/sys read-only by default, so a process cannot write to ns_last_pid even when the capability is granted.

To get around this, a pod author can specify a volume mount and a host path to bind-mount /proc/sys/kernel/ns_last_pid. However, runc does not allow specifying mounts under /proc.

This commit adds /proc/sys/kernel/ns_last_pid to the validProcMounts string array to enable a pod author to mount ns_last_pid as read-write. The default remains unchanged: unless explicitly requested as a volume mount, ns_last_pid will remain read-only regardless of whether or not CAP_CHECKPOINT_RESTORE is specified.

Signed-off-by: Irwin D'Souza <dsouzai.gh@gmail.com>
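The core of the change is an allowlist check on mount destinations under /proc. The sketch below is a simplified, hypothetical reimplementation in Go, not runc's actual code: the function name checkProcMountSketch and every allowlist entry other than /proc/sys/kernel/ns_last_pid are illustrative assumptions, and special cases such as mounting procfs on /proc itself are deliberately ignored.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// validProcMountsSketch is a hypothetical, trimmed-down allowlist.
// Only the ns_last_pid entry comes from this PR; the others are illustrative.
var validProcMountsSketch = []string{
	"/proc/cpuinfo",
	"/proc/meminfo",
	"/proc/uptime",
	"/proc/sys/kernel/ns_last_pid", // the entry added by this PR
}

// checkProcMountSketch returns nil if dest (an absolute path inside rootfs)
// may be used as a mount destination, and an error if it is under /proc
// without being explicitly allowlisted. Hypothetical helper, not runc's API.
func checkProcMountSketch(rootfs, dest string) error {
	rel, err := filepath.Rel(filepath.Join(rootfs, "/proc"), dest)
	if err != nil {
		return err
	}
	// Destinations outside /proc are not restricted by this check.
	if rel == ".." || strings.HasPrefix(rel, "../") {
		return nil
	}
	// Under /proc: allow only exact matches against the allowlist.
	for _, valid := range validProcMountsSketch {
		if filepath.Join(rootfs, valid) == filepath.Clean(dest) {
			return nil
		}
	}
	return fmt.Errorf("%q cannot be mounted because it is inside /proc", dest)
}

func main() {
	rootfs := "/run/containers/rootfs" // hypothetical container rootfs
	for _, d := range []string{
		"/proc/sys/kernel/ns_last_pid", // allowed after this PR
		"/proc/sys/kernel/pid_max",     // still rejected
		"/etc/hosts",                   // outside /proc, not checked
	} {
		fmt.Println(d, checkProcMountSketch(rootfs, filepath.Join(rootfs, d)))
	}
}
```

With the ns_last_pid entry present, a bind mount targeting exactly that file passes, while every other path under /proc/sys is still rejected. This matches the commit's point that nothing becomes writable unless the pod author explicitly asks for the mount.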
@dsouzai, it looks like your idea is to use CRIU from inside the container. Do you think that is feasible?
@kolyshkin, see cri-o/cri-o#5776 for a discussion of possible use cases.
@kolyshkin Yeah, we have a prototype where we
In K8s, I'm able to successfully run the restore container with the runc change in this PR and the following pod spec:
@opencontainers/runc-maintainers PTAL
@AkihiroSuda PTAL
@dsouzai sent a mail to oci-dev asking when this will be in a release. Given that the patch is very simple and is a bugfix, maybe we should backport it to release-1.1?
This was referenced May 26, 2022