Skip to content

Conversation

@tolusha
Copy link
Contributor

@tolusha tolusha commented Sep 29, 2025

What does this PR do?

This PR enables running nested containers by leveraging Linux kernel user namespaces for isolation.
See for details: https://github.com/cgruver/ocp-4-18-nested-container-tech-preview?tab=readme-ov-file

Added configuration (with defaults values):

spec:
  devEnvironments:
    disableContainerRunCapabilities: true
    containerRunConfiguration:
      openShiftSecurityContextConstraint: container-run
      containerSecurityContext:
        allowPrivilegeEscalation: true
        procMount: "Unmasked"
        capabilities:
          add:
          - SETGID
          - SETUID        
      workspacesPodAnnotations: 
        "io.kubernetes.cri-o.Devices": "/dev/fuse,/dev/net/tun"

IMPORTANT

Previously created workspace can't be started, see eclipse-che/che#23605

Screenshot/screencast of this PR

$ podman run -d --rm --name webserver -p 8080:80 quay.io/libpod/banner
Trying to pull quay.io/libpod/banner:latest...
Getting image source signatures
Copying blob 92ec11331c38 done   | 
Copying blob 2408cc74d12b done   | 
Copying blob ef4966331ce5 done   | 
Copying blob 64dc81575282 done   | 
Copying config 5ba9aec95f done   | 
Writing manifest to image destination
137310a73363f922ff3b1a3734eee3b8f0a0cf516cd397a909e54786ac486d0b
$ podman ps
CONTAINER ID  IMAGE                         COMMAND               CREATED         STATUS         PORTS                 NAMES
137310a73363  quay.io/libpod/banner:latest  nginx -g daemon o...  10 seconds ago  Up 10 seconds  0.0.0.0:8080->80/tcp  webserver

What issues does this PR fix or reference?

https://issues.redhat.com/browse/CRW-8320

How to test this PR?

OpenShift

  1. Deploy the operator from sources:
build/scripts/docker-run.sh /bin/bash -c "oc login --token=<...> --server=<...> --insecure-skip-tls-verify=true && build/scripts/olm/test-catalog-from-sources.sh"
  1. chectl dashboard:open

  2. Opens Start / stop a workspace

  3. Enable container run capabilities
    oc patch -n eclipse-che "checluster/eclipse-che" --type=json -p="[{"op": "replace", "path": "/spec/devEnvironments/disableContainerRunCapabilities", "value": false}]"

  4. Start another workspace

  5. Try podman build & podman run commands in the terminal

podman build:

cd /projects
cat > Dockerfile <<EOF
FROM python:3.12
WORKDIR /usr/local/app
EOF
podman build .

podman run:

podman run -d --rm --name webserver -p 8080:80 quay.io/libpod/banner
  1. Check user id
$ id
uid=1000(user) gid=0(root) groups=0(root),1000
  1. Check /etc/sub*id files
$ cat /etc/sub*id 
user:1001:64535
user:1001:64535

PR Checklist

As the author of this Pull Request I made sure that:

Reviewers

Reviewers, please comment how you tested the PR when approving it.

@openshift-ci
Copy link

openshift-ci bot commented Sep 29, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@codecov
Copy link

codecov bot commented Oct 1, 2025

Codecov Report

❌ Patch coverage is 65.70842% with 167 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.03%. Comparing base (f1d91dc) to head (af61a27).
⚠️ Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
...y/container-capabilities/container_capabilities.go 65.50% 60 Missing and 19 partials ⚠️
api/v2/zz_generated.deepcopy.go 0.00% 30 Missing ⚠️
...trollers/usernamespace/usernamespace_controller.go 52.63% 15 Missing and 3 partials ⚠️
api/v2/checluster_webhook.go 0.00% 13 Missing ⚠️
controllers/che/checluster_controller.go 0.00% 11 Missing ⚠️
pkg/common/test/deploy_context.go 0.00% 6 Missing ⚠️
...eploy/dev-workspace-config/dev_workspace_config.go 90.00% 2 Missing and 1 partial ⚠️
api/v2/checluster_types.go 0.00% 2 Missing ⚠️
...g/deploy/container-capabilities/container_build.go 96.36% 2 Missing ⚠️
pkg/deploy/container-capabilities/container_run.go 97.05% 2 Missing ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2046      +/-   ##
==========================================
- Coverage   58.22%   50.03%   -8.19%     
==========================================
  Files          90       94       +4     
  Lines       12401    12506     +105     
==========================================
- Hits         7220     6258     -962     
- Misses       4679     5821    +1142     
+ Partials      502      427      -75     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tolusha tolusha requested a review from Copilot October 1, 2025 11:50
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for nested containers by introducing container run capabilities to complement the existing container build capabilities. This extends the security configuration options for workspaces to allow more granular control over container permissions.

  • Container run capabilities with dedicated SecurityContextConstraints and RBAC
  • Consolidation of container capabilities logic into a unified reconciler
  • Extension of annotation merging for workspace pods

Reviewed Changes

Copilot reviewed 35 out of 35 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
api/v2/checluster_types.go Adds container run configuration types and capability check methods
pkg/deploy/container-capabilities/ New unified reconciler replacing separate container build logic
pkg/deploy/dev-workspace-config/dev_workspace_config.go Updates workspace config with container run capabilities support
pkg/common/diffs/diffs.go Adds RBAC diff options for consistent role/binding comparisons
pkg/common/k8s-client/ Removes owner reference parameters from Sync/Create methods
Various CRD files Updates custom resource definitions with new container run fields
Comments suppressed due to low confidence (3)

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@tolusha
Copy link
Contributor Author

tolusha commented Oct 6, 2025

/retest

1 similar comment
@tolusha
Copy link
Contributor Author

tolusha commented Oct 7, 2025

/retest

@ibuziuk
Copy link
Member

ibuziuk commented Oct 7, 2025

@cgruver could you please review? due to scc nature once the nested container toggle is enabled on CR level, it would not be possible to start old workspaces.

@tolusha
Copy link
Contributor Author

tolusha commented Oct 9, 2025

/retest

1 similar comment
@tolusha
Copy link
Contributor Author

tolusha commented Oct 10, 2025

/retest

@rohanKanojia
Copy link
Contributor

I tested with the steps mentioned above and can confirm it's working as expected:

I used empty workspace from the Select a workspace section.

Steps Performed

  1. Built a container image inside the workspace:

    FROM python:3.12
    WORKDIR /usr/local/app
$ podman build .
STEP 1/2: FROM python:3.12
Resolved "python" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull docker.io/library/python:3.12...
Getting image source signatures
Copying blob 41837409009d done   | 
Copying blob cae3b572364a done   | 
Copying blob f0c9d6d993ac done   | 
Copying blob bd090f42c4b7 done   | 
Copying blob a2ade626d67a done   | 
Copying blob 6e8f84dd3b31 done   | 
Copying blob 93ead2e1dc36 done   | 
Copying config 2b819d06d0 done   | 
Writing manifest to image destination
STEP 2/2: WORKDIR /usr/local/app
COMMIT
--> d30325a8936d
d30325a8936d2560a3363ab95ef4ee8574631a0b0a3283e0790a1b8e26e873ef

✔️ Image built successfully, pulled layers from Docker Hub.

  1. Ran a test container:
$ podman run -d --rm --name webserver -p 8080:80 quay.io/libpod/banner
Trying to pull quay.io/libpod/banner:latest...

Getting image source signatures
Copying blob 92ec11331c38 done   | 
Copying blob 64dc81575282 done   | 
Copying blob 2408cc74d12b done   | 
Copying blob ef4966331ce5 done   | 
Copying config 5ba9aec95f done   | 
Writing manifest to image destination
6da58ee669a3ce0ca9abceb6794e0d28e863597c1edeb594bb3d16a7dfb1562c

✔️ Container started and is running as expected.

  1. Verified user namespace mapping:

    $ id
    uid=1000(user) gid=0(root) groups=0(root),1000(user)
    
    $ cat /etc/sub*id
    user:1001:64535
    user:1001:64535

✔️ UID/GID remapping confirms correct user namespace setup.

@dkwon17
Copy link
Contributor

dkwon17 commented Oct 22, 2025

I was able to verify the testing steps.

I've also tested this following scenario:

  1. Create a workspace with https://github.com/che-incubator/quarkus-api-example
  2. Run the following devfile tasks
  • Package
  • Build Image
  1. Run the built image by running:
podman run --network=host  quay.io/che-incubator/quarkus-api-example:latest
  1. In a new terminal, access the service started from step 3:
$ curl http://0.0.0.0:8080/food

[{"id":2,"name":"Apple pie","restaurantName":"Fruit Bistro","price":5.99},{"id":1,"name":"Orange","restaurantName":"Fruit Bistro","price":0.99},{"id":4,"name":"Sandwich","restaurantName":"Local Deli","price":4.99},{"id":5,"name":"Soup","restaurantName":"Local Deli","price":1.99},{"id":3,"name":"Strawberry cake","restaurantName":"Fruit Bistro","price":5.99}]quarkus-api-example

DevWorkspaceOperatorName = "devworkspace-operator"
DevWorkspaceServiceAccountName = "devworkspace-controller-serviceaccount"
DefaultContainerBuildSccName = "container-build"
DefaultContainerRunSccName = "container-run"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tolusha, wondering if we need to create a new SCC? maybe we can simply fall back on restricted-v3 or nested-container, that are specifically designed for use with user namespaces according to the RNs - https://docs.redhat.com/en/documentation/openshift_container_platform/4.20/html-single/release_notes/index#ocp-release-notes-machine-config-operator-namespace_release-notes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I have to test

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, please check, but my understanding is that for the default arbitrary / random UID is used, and we want it to be 1000 for pre-configuring /etc/subuid, /etc/subgid

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it does't work for our case.

image

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I prefer to use a custom SCC that pre-determines the UID/GID of the workspace. That allows you to lock down all of the /etc files instead of having to modify them at run time.

I updated my reference if that is helpful - https://github.com/cgruver/ocp-4-20-nested-containers

Signed-off-by: Anatolii Bazko <abazko@redhat.com>
Signed-off-by: Anatolii Bazko <abazko@redhat.com>
Signed-off-by: Anatolii Bazko <abazko@redhat.com>
Signed-off-by: Anatolii Bazko <abazko@redhat.com>
Signed-off-by: Anatolii Bazko <abazko@redhat.com>
Signed-off-by: Anatolii Bazko <abazko@redhat.com>
Signed-off-by: Anatolii Bazko <abazko@redhat.com>
Signed-off-by: Anatolii Bazko <abazko@redhat.com>
Signed-off-by: Anatolii Bazko <abazko@redhat.com>
Signed-off-by: Anatolii Bazko <abazko@redhat.com>
Signed-off-by: Anatolii Bazko <abazko@redhat.com>
@tolusha
Copy link
Contributor Author

tolusha commented Oct 23, 2025

/retest

// So, remove priority. See details https://issues.redhat.com/browse/CRW-3894
scc.Priority = nil

if err := ctx.ClusterAPI.NonCachingClientWrapper.Sync(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, when should we use NonCachingClientWrapper or ClientWrapper?

From what I understand ClientWrapper might read from the cache for get actions, is that the only difference here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that's the only difference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't cache SCC, that's why we need to use noncaching client

@tolusha
Copy link
Contributor Author

tolusha commented Oct 27, 2025

/retest

Signed-off-by: Anatolii Bazko <abazko@redhat.com>
Copy link
Member

@ibuziuk ibuziuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@openshift-ci
Copy link

openshift-ci bot commented Oct 28, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dkwon17, ibuziuk, mkuznyetsov, tolusha

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tolusha tolusha merged commit dd0065d into main Oct 29, 2025
21 of 24 checks passed
@tolusha tolusha deleted the nestedcontainers branch October 29, 2025 09:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants