Allow mounting of /proc/sys/kernel/ns_last_pid by dsouzai · Pull Request #3451 · opencontainers/runc

dsouzai · 2022-04-07T18:33:49Z

The CAP_CHECKPOINT_RESTORE Linux capability provides the ability to update /proc/sys/kernel/ns_last_pid. However, because this "file" is under /proc, and both K8s and CRI-O mount /proc/sys read-only by default, a process cannot write to ns_last_pid even when it holds the capability.

To get around this, a pod author can specify a volume mount and a host path to bind-mount /proc/sys/kernel/ns_last_pid. However, runc does not allow specifying mounts under /proc.

This PR adds /proc/sys/kernel/ns_last_pid to the validProcMounts string array to enable a pod author to mount ns_last_pid as read-write. The default remains unchanged; unless explicitly requested as a volume mount, ns_last_pid will remain read-only regardless of whether or not CAP_CHECKPOINT_RESTORE is specified.

Signed-off-by: Irwin D'Souza <dsouzai.gh@gmail.com>
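
To illustrate the kind of allow-list check the description refers to, here is a minimal sketch in Go. It is not runc's actual code: the function name `checkProcMount`, the variable `allowedProcMounts`, and every list entry other than /proc/sys/kernel/ns_last_pid are assumptions chosen for illustration. The idea is only that a mount destination under /proc is rejected unless it matches an allow-listed path exactly.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// allowedProcMounts is a hypothetical allow-list for this sketch.
// Only the ns_last_pid entry comes from the PR text; the other
// entries are illustrative assumptions.
var allowedProcMounts = []string{
	"/proc/cpuinfo",
	"/proc/meminfo",
	"/proc/uptime",
	"/proc/sys/kernel/ns_last_pid",
}

// checkProcMount returns an error when dest lies strictly under /proc
// and is not an exact match for an allow-listed path. Paths outside
// /proc (and /proc itself) are not this check's concern.
func checkProcMount(dest string) error {
	clean := filepath.Clean(dest)
	if !strings.HasPrefix(clean, "/proc/") {
		return nil
	}
	for _, ok := range allowedProcMounts {
		if clean == ok {
			return nil
		}
	}
	return fmt.Errorf("%q cannot be mounted because it is inside /proc", dest)
}

func main() {
	for _, dest := range []string{
		"/proc/sys/kernel/ns_last_pid",
		"/proc/sys/kernel/hostname",
		"/var/lib/data",
	} {
		fmt.Printf("%s -> %v\n", dest, checkProcMount(dest))
	}
}
```

With a check of this shape, enabling a new bind-mount target under /proc is a one-line change to the list, which matches how small the PR is described to be.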

dsouzai · 2022-04-07T18:34:12Z

fyi @mrunalp @haircommander

kolyshkin · 2022-04-08T01:41:33Z

@dsouzai looks like your idea is to use criu from inside the container. Do you think it is feasible to do so?

adrianreber · 2022-04-08T07:34:33Z

@kolyshkin see cri-o/cri-o#5776 for a discussion about possible use cases.

dsouzai · 2022-04-08T13:39:20Z

> Do you think it is feasible to do so?

@kolyshkin Yeah, we have a prototype where we:

1. Build CRIU with the change to allow rootless checkpointing, as well as other changes.
2. Build the JVM, which links the criu lib and exposes an API that lets Java applications self-dump.
3. Build a container in which the criu binary has the following capabilities set:

   setcap cap_checkpoint_restore,cap_net_admin,cap_sys_ptrace=eip /usr/sbin/criu

4. Start said container, in which the Java application checkpoints itself and the process ends. The container is then committed to create a restore image.
5. Start a new container based on the restore image, wherein a script launches criu restore.

In K8s, I'm able to successfully run the restore container with the runc change in this PR and the following pod spec:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: liberty-criu
  namespace: criu
spec:
  replicas: 1
  selector:
    matchLabels:
      name: liberty-criu
  template:
    metadata:
      labels:
        name: liberty-criu
    spec:
      serviceAccount: criusvcacct
      serviceAccountName: criusvcacct
      containers:
        - name: liberty-criu
          image: localhost/liberty-criu:latest
          imagePullPolicy: Always
          volumeMounts:
          - mountPath: /proc/sys/kernel/ns_last_pid
            name: ns-last-pid-mount
          securityContext:
            capabilities:
              add: [ "CHECKPOINT_RESTORE", "NET_ADMIN", "SYS_PTRACE" ]
      volumes:
      - name: ns-last-pid-mount
        hostPath:
          path: /proc/sys/kernel/ns_last_pid
          type: File
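
For context on why the restore container wants a writable ns_last_pid: writing a value N to that file asks the kernel to try N+1 as the next PID allocated in the namespace, which lets a restorer recreate a process under its original PID. The Go sketch below shows only that write. The helper name `requestNextPid` and the example PID 1000 are assumptions for illustration, and the sketch ignores the race with other processes forking at the same time, which a real restorer has to handle.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// nsLastPidPath is the file the pod spec above bind-mounts read-write.
const nsLastPidPath = "/proc/sys/kernel/ns_last_pid"

// requestNextPid (hypothetical helper) writes wantPid-1 to path so that
// the next process created in this PID namespace is given wantPid,
// provided that PID is free. It fails if the file cannot be opened for
// writing (for example on a read-only mount) or if the write is refused
// (for example when the caller lacks the required capability).
func requestNextPid(path string, wantPid int) error {
	if wantPid < 1 {
		return fmt.Errorf("wantPid must be >= 1, got %d", wantPid)
	}
	f, err := os.OpenFile(path, os.O_WRONLY, 0)
	if err != nil {
		return fmt.Errorf("open %s: %w", path, err)
	}
	defer f.Close()
	if _, err := f.WriteString(strconv.Itoa(wantPid - 1)); err != nil {
		return fmt.Errorf("write %s: %w", path, err)
	}
	return nil
}

func main() {
	// 1000 is an arbitrary example PID.
	if err := requestNextPid(nsLastPidPath, 1000); err != nil {
		fmt.Println("cannot set ns_last_pid:", err)
		return
	}
	fmt.Println("ns_last_pid updated; the next PID should be 1000 if it is free")
}
```

Run inside a container without the bind mount, the open is expected to fail because /proc/sys is read-only, which is the situation this PR addresses.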

kolyshkin

LGTM

kolyshkin · 2022-04-14T05:35:38Z

@opencontainers/runc-maintainers PTAL

kolyshkin · 2022-04-21T23:43:07Z

@AkihiroSuda PTAL

rst0git

LGTM

thaJeztah

LGTM

cyphar · 2022-05-26T10:56:26Z

@dsouzai sent a mail on oci-dev asking when this will be in a release -- given that the patch is very simple and a bugfix, maybe we should backport it to release-1.1?

kolyshkin approved these changes Apr 8, 2022

dsouzai mentioned this pull request Apr 12, 2022

CRIU: Checkpoint/Restore Feature Status eclipse-openj9/openj9#14361 (Open)

rst0git approved these changes Apr 22, 2022

thaJeztah approved these changes Apr 22, 2022

thaJeztah merged commit 062fc87 into opencontainers:main Apr 22, 2022

dsouzai mentioned this pull request May 25, 2022

Podman and Docker options for running the CRIU restore image eclipse-openj9/openj9#15117 (Closed)

This was referenced May 26, 2022

Release 1.1.3 #3490 (Merged)

[1.1] Allow mounting of /proc/sys/kernel/ns_last_pid #3493 (Merged)

cyphar added the backport/1.1-todo label (a PR in main branch which needs to be backported to release-1.1) May 26, 2022

kolyshkin added the backport/1.1-done label (a PR in main branch which has been backported to release-1.1) and removed the backport/1.1-todo label May 27, 2022

natalie-bernhard mentioned this pull request Mar 21, 2023

InstantOn beta in Simplified Chinese Draft OpenLiberty/blogs#3013 (Merged)

thaJeztah merged 1 commit into opencontainers:main from dsouzai:ns_last_pid


6 participants
