Try to join the cgroup of the init process of the parent container when apply_cgroup for a tenant container fails due to a "Device or resource busy" error by logica0419 · Pull Request #3347 · youki-dev/youki

logica0419 · 2026-01-03T14:03:30Z

Description

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Refactoring (no functional changes)
Performance improvement
Test updates
CI/CD related changes
Other (please describe):

Testing

Added new unit tests
Added new integration tests
Ran existing test suite
Tested manually (please provide steps)
- Followed the Steps to Reproduce of [Bug]: Nested containers: exec process not added to cgroup v2 #3342 and got the expected result

Related Issues

Fixes #3342

Additional Context

tommady · 2026-01-03T21:18:14Z

Thanks for opening this PR — I ran into the same issue while working on
#3210 and also needed a retry to re-join the cgroup for exec.

This problem is not about the init process, but about exec processes under cgroup v2 when domain controllers are enabled. Once a controller is turned on, the container’s configured cgroup may no longer be joinable (kernel returns EBUSY / EPERM), and exec is expected to fall back to joining the init process’s cgroup.

This behavior is explicitly documented by runc:

Note for cgroup v2: in case the process can’t join the top level cgroup, runc exec fallback is to try joining the cgroup of container’s init.
https://github.com/opencontainers/runc/blob/main/man/runc-exec.8.md

Importantly, this fallback is exec-only:

- init process cgroup placement must still fail hard
- only exec processes may retry using the init process’s leaf cgroup

Because this is policy, not cgroup mechanism, runc implements it in the container execution path, not inside the cgroup manager itself. This avoids:

- accidentally applying fallback to init
- duplicating logic across systemd vs cgroupfs managers
- diverging behavior depending on the cgroup backend

For youki, I think the correct place to implement it is here:

// crates/libcontainer/src/process/container_intermediate_process.rs
fn apply_cgroups<
    C: CgroupManager<Error = E> + ?Sized,
    E: std::error::Error + Send + Sync + 'static,
>(
    cmanager: &C,
    resources: Option<&LinuxResources>,
    init: bool,
) -> Result<()> { ... }

where we know:

- whether the process is init or exec
- the init PID
- and where we can enforce exec-only fallback semantics

Handling this inside libcgroups (or only for systemd) is insufficient and environment-dependent. The expected behavior should be:

- init process: no fallback, fail on cgroup join error
- exec process + cgroup v2 + EBUSY/EPERM: retry by joining init’s cgroup
- all other errors: fail as before

Without implementing this retry at the libcontainer level (as runc does), exec under cgroup v2 with domain controllers enabled will continue to fail for cgroupfs users.
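The exec-only policy described above boils down to a small decision function. This is only an illustrative sketch: JoinError and should_fallback_to_init_cgroup are hypothetical names, not youki APIs, and real code would derive the error kind from the cgroup manager's error type.

```rust
/// Hypothetical classification of why joining the configured cgroup failed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JoinError {
    /// EBUSY ("Device or resource busy")
    Busy,
    /// EPERM ("Operation not permitted")
    Permission,
    /// Any other failure
    Other,
}

/// Returns true if a failed cgroup join may be retried by joining the
/// container init process's cgroup (hypothetical helper).
pub fn should_fallback_to_init_cgroup(is_init: bool, cgroup_v2: bool, err: JoinError) -> bool {
    // The init process must fail hard; the fallback is a cgroup v2, exec-only behavior.
    if is_init || !cgroup_v2 {
        return false;
    }
    // Only EBUSY/EPERM qualify; every other error fails as before.
    matches!(err, JoinError::Busy | JoinError::Permission)
}
```

Keeping the policy in one pure function like this makes the "init never falls back" rule trivially unit-testable, independent of the cgroup backend.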

WDYT? Thanks again.

utam0k · 2026-01-03T22:16:10Z

crates/libcgroups/src/systemd/manager.rs

+    /// The init process PID of the parent container if the container is created as a tenant.
+    parent_init_pid: Option<Pid>,


ContainerType should have parent_init_pid.

utam0k · 2026-01-03T22:25:20Z

crates/libcgroups/src/systemd/manager.rs

+                Err(e) => {
+                    // If adding the process to the cgroup fails due to a "Device or resource busy" error,
+                    // manager tries to join the cgroup of the init process of the tenant container.
+                    if e.to_string().contains("Device or resource busy")


How about getting the error (EBUSY) from the D-Bus client instead of parsing the error message?

I really wanted to, but I couldn't achieve that just by putting the following code here.

impl From<nix::Error> for SystemdClientError {
    fn from(err: nix::Error) -> SystemdClientError {
        match err {
            nix::Error::EBUSY => DbusError::DeviceOrResourceBusy(err.to_string()).into(),
            _ => DbusError::ConnectionError(err.to_string()).into(),
        }
    }
}

It seems that socket::sendmsg in dbus_native::DbusConnection::send_message() never returns nix::Error::EBUSY. Instead, the call itself succeeds (there is no error in the Result), and the failure comes back as a D-Bus error message.

Could you give me some advice on what I should do here?

To clarify, the goal of this PR’s initial implementation is to serve as a conceptual demo, providing a starting point for discussing a more suitable implementation.

Since you mentioned it, I'll stop focusing on the detailed code of this PR for now. I'll review the detailed code once we've clarified and implemented the non-demo aspects.

utam0k · 2026-01-03T22:31:29Z

crates/libcgroups/src/common.rs

+// is empty string ("") and the value is the cgroup path the <pid> is in.
+//
+// ref: https://github.com/opencontainers/cgroups/blob/main/utils.go#L171-L219
+pub fn parse_proc_cgroup_file(path: &str) -> Result<HashMap<String, String>, ParseProcCgroupError> {


Could we use the procfs crate? Be careful: if it reads inside the container, please use ProcfsHandle for safety.
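For reference, a minimal sketch of what parsing /proc/<pid>/cgroup involves, assuming the hierarchy-ID:controller-list:cgroup-path line format documented in cgroups(7). parse_proc_cgroup is a hypothetical helper that works on the file's text only; as suggested above, real code should prefer the procfs crate (the later revision of this PR uses ProcessCGroups) and open the file through ProcfsHandle.

```rust
use std::collections::HashMap;

/// Hypothetical helper: parse the text of /proc/<pid>/cgroup into a map from
/// controller list to cgroup path. On the cgroup v2 unified hierarchy the line
/// is "0::/path", so its key is the empty string "".
pub fn parse_proc_cgroup(content: &str) -> Result<HashMap<String, String>, String> {
    let mut map = HashMap::new();
    for line in content.lines().filter(|l| !l.trim().is_empty()) {
        // splitn(3, ..) keeps any ':' inside the cgroup path itself intact.
        let mut parts = line.splitn(3, ':');
        let (_id, controllers, path) = match (parts.next(), parts.next(), parts.next()) {
            (Some(id), Some(c), Some(p)) => (id, c, p),
            _ => return Err(format!("malformed cgroup line: {line:?}")),
        };
        map.insert(controllers.to_string(), path.to_string());
    }
    Ok(map)
}
```

Unlike runc's utils.go, this simplified version keys cgroup v1 entries by the whole controller list (e.g. "cpu,cpuacct") rather than splitting it per controller.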

logica0419 · 2026-01-06T08:33:47Z

@utam0k @tommady
Thanks for the quick feedback! I didn’t expect comments to come in so fast 😅
I was planning to write the explanation today (I was pretty exhausted last night), so this was a nice surprise.

To clarify, the goal of this PR’s initial implementation is to serve as a conceptual demo, providing a starting point for discussing a more suitable implementation.

I’m still not very familiar with youki, or even Rust itself.
Please feel free to point out any issues, including basic ones or anything related to “Rust-ish” coding style.

Also, this may be a bit off-topic, but I want to clarify the terminology here. I made a table of the terms as I understand them.

Perspective	Container A	Container B	A's init process	B's init process
Container A	self (InitContainer)	child	init_process	child_init_process
Container B	parent	self (TenantContainer)	parent_init_process	init_process
tommady's comment	-	-	init process	exec process
runc	initProcess (containerProcess)	setnsProcess (containerProcess)	linuxStandardInit	linuxSetnsInit

What confused me here is that the word init process used in Container B's context can mean Container A's init process or B's init process. That's why I used the name parent_init_process for Container A's init process in the implementation.

FYI: in runc, Container A's init process is called initProcessPid even in the context of Container B.
https://github.com/opencontainers/runc/blob/main/libcontainer/process_linux.go#L175

WDYT about this? Should I use the name init process as runc does?

logica0419 · 2026-01-06T08:42:23Z

@tommady
Thank you too for finding this PR! I'm happy that I can help you solve the issue.
And, thanks again for the precise explanation of what's happening. I managed to get an abstract understanding, but your explanation helped me strengthen it so much.

For youki, the correct place to implement, I think, is here:

I strongly agree with that. I'll re-implement the logic there.
Thank you so much for the advice.

logica0419 · 2026-01-06T08:48:48Z

Just to let you know: because I forgot to add sign-offs to the previous commits and have now pushed a complete re-implementation, I force-pushed the branch.

utam0k · 2026-01-06T10:35:36Z

Please set it to "ready for review" once the discussion has settled and the code is ready for a detailed review.

tommady · 2026-01-06T11:04:24Z

Thanks for the table — that actually helped me realize part of the confusion is on my side too 😅 I think I’ve been a bit sloppy with naming.

Referring to your table, when I said “init process” I meant Container B’s init process (the TenantContainer being exec’d into), not Container A’s init. In your terms, this is the exec case for Container B: if joining the configured cgroup fails under cgroup v2, exec should fall back to B’s init process cgroup, not the parent’s.

Sorry about the naming confusion 🤪 that’s on me. I’d really appreciate hearing others’ opinions on whether to use runc-style naming.

utam0k · 2026-01-06T11:09:34Z

This isn't a separate “Container B”; it's an exec/tenant process joining the existing container's cgroup. So calling it “parent” is confusing. How about landlord_init_pid (landlord = parent init in your context)?

utam0k · 2026-03-29T12:24:28Z

@logica0419 What is the status of this PR?


Copilot

Pull request overview

This PR addresses nested-container cgroup v2 behavior when joining a tenant container’s cgroup via systemd fails with an EBUSY (“Device or resource busy”) error, by falling back to joining the parent (landlord) init process’s cgroup.

Changes:

- Add an EBUSY-specific fallback in apply_cgroups to join the landlord init process cgroup by reading /proc/<pid>/cgroup and writing to cgroup.procs.
- Extend tenant container process args to carry landlord_init_pid and plumb it from the tenant builder.
- Teach the systemd dbus layer to classify System.Error.EBUSY distinctly and surface it via SystemdManagerError::is_ebusy().
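The fallback's write target is built by joining the cgroup path from /proc/<pid>/cgroup onto the cgroup root. One detail worth calling out: Path::join with an absolute argument replaces the base path, which is why the diff strips the leading '/' first. The sketch below assumes the unified hierarchy is mounted at the cgroup_root passed in (typically /sys/fs/cgroup) and uses a hypothetical helper name, cgroup_procs_path.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: build the cgroup.procs path for a cgroup v2 path
/// taken from /proc/<pid>/cgroup (e.g. "/system.slice/foo.scope").
pub fn cgroup_procs_path(cgroup_root: &Path, proc_cgroup_path: &str) -> Option<PathBuf> {
    // Path::join with an absolute argument would discard cgroup_root entirely,
    // so the leading '/' must be removed first. Non-absolute input is rejected.
    let relative = proc_cgroup_path.strip_prefix('/')?;
    Some(cgroup_root.join(relative).join("cgroup.procs"))
}
```

Writing the PID into the resulting file is what actually moves the process; in the diff that step is done by libcgroups::common::write_cgroup_file.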

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
crates/libcontainer/src/process/container_intermediate_process.rs	Adds EBUSY detection and tenant fallback to parent init cgroup during cgroup application.
crates/libcontainer/src/process/args.rs	Extends `ContainerType::TenantContainer` with `landlord_init_pid`.
crates/libcontainer/src/container/tenant_builder.rs	Populates `landlord_init_pid` from the parent container state.
crates/libcgroups/src/systemd/manager.rs	Adds `SystemdManagerError::is_ebusy()` for EBUSY classification.
crates/libcgroups/src/systemd/dbus_native/utils.rs	Introduces a `DbusError::DeviceOrResourceBusy` variant.
crates/libcgroups/src/systemd/dbus_native/proxy.rs	Maps DBus `ErrorName=System.Error.EBUSY` to the new EBUSY error variant.


Copilot · 2026-04-04T21:00:21Z

crates/libcgroups/src/systemd/dbus_native/proxy.rs

+            // to check if error is EBUSY at process/container_intermediate_process.rs:262,
+            // we check the ErrorName header and switch error type
+            // see https://github.com/youki-dev/youki/issues/3342


The comment references a specific file path and line number ("process/container_intermediate_process.rs:262"), which will quickly become stale as the code moves. Prefer referencing the issue link (#3342) or describing the call site without hard-coded line numbers/paths.

Suggested change

-// to check if error is EBUSY at process/container_intermediate_process.rs:262,
-// we check the ErrorName header and switch error type
-// see https://github.com/youki-dev/youki/issues/3342
+// For the relevant intermediate-process call path, detect EBUSY via the
+// ErrorName header and map it to the more specific error type.
+// See https://github.com/youki-dev/youki/issues/3342 for context.

Copilot · 2026-04-04T21:00:21Z

crates/libcontainer/src/process/container_intermediate_process.rs

+    if let Err(err) = cmanager.add_task(pid) {
+        if !init && is_ebusy(&err) {
+            // If adding the process to the cgroup fails due to a "Device or resource busy" error,
+            // manager tries to join the cgroup of the init process of the parent container.
+            tracing::debug!(
+                "failed to add task to cgroup, trying to join parent's init process cgroup"
+            );


This adds a new EBUSY-specific fallback path (joining the parent init process's cgroup via /proc and writing to cgroup.procs), but the unit tests in this module don't exercise it. Add a unit test that simulates an EBUSY add_task error and asserts the fallback behavior is taken (and that non-EBUSY errors still fail).

Copilot · 2026-04-04T21:00:21Z

crates/libcgroups/src/systemd/manager.rs

+        )
+    }
+}
+


The new is_ebusy() helper is used to control a behavior-changing fallback, but it's not covered by tests. Add a unit test that constructs a SystemdManagerError::SystemdClient(SystemdClientError::DBus(DbusError::DeviceOrResourceBusy(_))) and verifies is_ebusy() returns true, and false for other DBus errors.

Suggested change

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_is_ebusy_true_for_device_or_resource_busy() {
        let err = SystemdManagerError::SystemdClient(SystemdClientError::DBus(
            DbusError::DeviceOrResourceBusy("busy".into()),
        ));
        assert!(err.is_ebusy());
    }

    #[test]
    fn test_is_ebusy_false_for_other_dbus_errors() {
        let err = SystemdManagerError::SystemdClient(SystemdClientError::DBus(
            DbusError::Failed("not busy".into()),
        ));
        assert!(!err.is_ebusy());
    }
}

Should this function be tested?


utam0k

I took a quick look at it.

utam0k · 2026-04-04T21:53:47Z

crates/libcgroups/src/systemd/dbus_native/proxy.rs

+            let mut ctr = 0;
+            let message = String::deserialize(&msg.body, &mut ctr)?;
+
+            // to check if error is EBUSY at process/container_intermediate_process.rs:262,


process/container_intermediate_process.rs:262

This is not permanent.

Yep, that was my mistake 😩
I rewrote them based on Copilot's suggestion.

utam0k · 2026-04-04T21:56:08Z

crates/libcontainer/src/process/args.rs

+        exec_notify_fd: RawFd,
+        landlord_init_pid: Option<Pid>,


Please provide descriptions for these fields.

Also, let's update the developer documentation.
https://youki-dev.github.io/youki/developer/introduction.html

As for exec_notify_fd, I’m not the one who implemented it, so it’s a bit hard for me to explain.
It was added in #1252, so maybe @YJDoc2 could suggest a comment for it?

I'll work on the dev doc later as well.
Which pages should be updated specifically?
I’d also appreciate any advice on what kind of information should be included.

utam0k · 2026-04-04T21:58:12Z

crates/libcontainer/src/process/container_intermediate_process.rs

+    #[cfg(not(feature = "systemd"))]
+    {
+        false
+    }
+
+    #[cfg(feature = "systemd")]


Why was it necessary to separate them in this function?

Because when feature = "systemd" is disabled, e (of type libcgroups::common::AnyManagerError::Systemd) doesn't have the .is_ebusy() method.

By the way, I’m personally not very happy with how information is propagated in manager.rs, or with the complex type casting here.
Could you think of another approach?
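One possible alternative, sketched under assumptions: give the error enum itself a backend-agnostic is_ebusy() and put #[cfg(feature = "systemd")] on the variant and on its match arm, so callers never need their own cfg blocks. The types below are simplified, hypothetical stand-ins for AnyManagerError and SystemdManagerError, and the hard-coded value 16 is the Linux errno for EBUSY.

```rust
use std::io;

/// EBUSY ("Device or resource busy") on Linux.
const EBUSY: i32 = 16;

/// Hypothetical stand-in for the systemd manager error.
#[cfg(feature = "systemd")]
#[derive(Debug)]
pub struct SystemdError {
    pub busy: bool,
}

/// Hypothetical stand-in for libcgroups::common::AnyManagerError.
#[derive(Debug)]
pub enum AnyManagerError {
    Fs(io::Error),
    #[cfg(feature = "systemd")]
    Systemd(SystemdError),
}

impl AnyManagerError {
    /// Backend-agnostic EBUSY check; the cfg lives here instead of at the call site.
    pub fn is_ebusy(&self) -> bool {
        match self {
            AnyManagerError::Fs(e) => e.raw_os_error() == Some(EBUSY),
            // This arm only exists when the systemd feature (and variant) is compiled in.
            #[cfg(feature = "systemd")]
            AnyManagerError::Systemd(e) => e.busy,
        }
    }
}
```

With this shape the fs (cgroupfs) backend could report EBUSY through the same method, which would also cover the cgroupfs case raised earlier in the thread.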

utam0k · 2026-04-04T21:59:21Z

crates/libcgroups/src/systemd/dbus_native/proxy.rs

+            let message = String::deserialize(&msg.body, &mut ctr)?;
+
+            // to check if error is EBUSY at process/container_intermediate_process.rs:262,
+            // we check the ErrorName header and switch error type


Suggested change

-// we check the ErrorName header and switch error type
+// we check the ErrorName header and switch error type.

utam0k · 2026-04-04T22:01:24Z

crates/libcgroups/src/systemd/dbus_native/proxy.rs

+            // to check if error is EBUSY at process/container_intermediate_process.rs:262,
+            // we check the ErrorName header and switch error type
+            // see https://github.com/youki-dev/youki/issues/3342
+            const EBUSY_ERROR_NAME: &str = "System.Error.EBUSY";


Wouldn't it be more appropriate to define it in message.rs?

I thought so too, so I moved it 👍

utam0k · 2026-04-04T22:03:30Z

crates/libcgroups/src/systemd/dbus_native/proxy.rs

+                .and_then(|h| match &h.value {
+                    HeaderValue::String(s) => Some(s.as_str()),
+                    _ => None,
+                })


How about using map instead of the match syntax?

How would that work?
Since I’m not very familiar with Rust, I’d really appreciate it if you could share an example snippet 😃
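Two ways the header lookup could be written without the nested match, as a sketch: Header, HeaderKind and HeaderValue below are simplified stand-ins for the dbus_native types in the diff, not their real definitions. The first uses Iterator::find_map; the second uses Option::map together with matches!, which fits when only a yes/no EBUSY answer is needed.

```rust
/// Simplified stand-ins for the dbus_native header types.
#[derive(Debug, PartialEq)]
pub enum HeaderKind {
    Path,
    ErrorName,
}

#[derive(Debug)]
pub enum HeaderValue {
    String(String),
    U32(u32),
}

#[derive(Debug)]
pub struct Header {
    pub kind: HeaderKind,
    pub value: HeaderValue,
}

pub const ERROR_NAME_EBUSY: &str = "System.Error.EBUSY";

/// Variant 1: find_map does the kind check and the value extraction in one step.
pub fn error_name(headers: &[Header]) -> Option<&str> {
    headers.iter().find_map(|h| match (&h.kind, &h.value) {
        (HeaderKind::ErrorName, HeaderValue::String(s)) => Some(s.as_str()),
        _ => None,
    })
}

/// Variant 2: map to the header value, then test it with matches! and a guard.
pub fn is_ebusy(headers: &[Header]) -> bool {
    let value = headers
        .iter()
        .find(|h| h.kind == HeaderKind::ErrorName)
        .map(|h| &h.value);
    matches!(value, Some(HeaderValue::String(s)) if s == ERROR_NAME_EBUSY)
}
```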


logica0419

Hi, thanks for the review!

First of all, I’m really sorry for leaving this PR unattended for so long.
I’ve been crazily busy over the past three months, and I even got seriously sick last month.
I have more free time now, so I’ll actively work on this until it’s merged.

【Field naming】

How about landlord_init_pid (landlord = parent init in your context)?

This naming is good, I love it! I've renamed the field as you suggested in your review.

【About Dbus error】
With GitHub Copilot, I found that I can retrieve the original error information from the ErrorName header.
On Ubuntu 24.04, it was "System.Error.EBUSY", so I guess it's common across Linux distributions (not sure)... Please comment if you have any concerns.

Also, as I wrote in the review comment, I'm not really satisfied with the current approach of how DbusError::DeviceOrResourceBusy is propagated to apply_cgroup().
I’m not very experienced with Rust, so I’d really appreciate any suggestions.

【Testing】
I think some integration or end-to-end tests are necessary.
I’d appreciate any advice on what those tests should look like, since I’m not very familiar with the project structure yet.

logica0419 · 2026-04-04T22:36:09Z

crates/libcgroups/src/systemd/dbus_native/proxy.rs

+            let mut ctr = 0;
+            let message = String::deserialize(&msg.body, &mut ctr)?;
+
+            // to check if error is EBUSY at process/container_intermediate_process.rs:262,


Yep, that was my mistake 😩
I rewrote them based on Copilot's suggestion.

logica0419 · 2026-04-04T22:49:39Z

crates/libcgroups/src/systemd/dbus_native/proxy.rs

+            // to check if error is EBUSY at process/container_intermediate_process.rs:262,
+            // we check the ErrorName header and switch error type
+            // see https://github.com/youki-dev/youki/issues/3342
+            const EBUSY_ERROR_NAME: &str = "System.Error.EBUSY";


I thought so too, so I moved it 👍

logica0419 · 2026-04-04T22:51:16Z

crates/libcgroups/src/systemd/dbus_native/proxy.rs

+                .and_then(|h| match &h.value {
+                    HeaderValue::String(s) => Some(s.as_str()),
+                    _ => None,
+                })


How would that work?
Since I’m not very familiar with Rust, I’d really appreciate it if you could share an example snippet 😃

logica0419 · 2026-04-04T22:51:50Z

crates/libcgroups/src/systemd/manager.rs

+        )
+    }
+}
+


Should this function be tested?

logica0419 · 2026-04-04T22:57:10Z

crates/libcontainer/src/process/args.rs

+        exec_notify_fd: RawFd,
+        landlord_init_pid: Option<Pid>,


As for exec_notify_fd, I’m not the one who implemented it, so it’s a bit hard for me to explain.
It was added in #1252, so maybe @YJDoc2 could suggest a comment for it?

logica0419 · 2026-04-04T23:00:26Z

crates/libcontainer/src/process/container_intermediate_process.rs

+    #[cfg(not(feature = "systemd"))]
+    {
+        false
+    }
+
+    #[cfg(feature = "systemd")]


Because when feature = "systemd" is disabled, e (of type libcgroups::common::AnyManagerError::Systemd) doesn't have the .is_ebusy() method.

By the way, I’m personally not very happy with how information is propagated in manager.rs, or with the complex type casting here.
Could you think of another approach?

logica0419 · 2026-04-04T23:15:03Z

crates/libcontainer/src/process/args.rs

+        exec_notify_fd: RawFd,
+        landlord_init_pid: Option<Pid>,


I'll work on the dev doc later as well.
Which pages should be updated specifically?
I’d also appreciate any advice on what kind of information should be included.

tommady · 2026-04-07T09:00:48Z

crates/libcgroups/src/systemd/dbus_native/proxy.rs

        if !error_message.is_empty() {
            let msg = error_message[0];
            if msg.body.is_empty() {
                // this should rarely be the case
                return Err(DbusError::MethodCallErr("Unknown Dbus Error".into()).into());
-            } else {
-                // in error message, first item of the body (if present) is always a string
-                // indicating the error
-                let mut ctr = 0;
-                let msg = String::deserialize(&msg.body, &mut ctr)?;
-                return Err(DbusError::MethodCallErr(msg).into());
            }
+
+            // in error message, first item of the body (if present) is always a string
+            // indicating the error
+            let mut ctr = 0;
+            let message = String::deserialize(&msg.body, &mut ctr)?;
+
+            // To check if error is EBUSY in the intermediate_process, detect EBUSY via the
+            // ErrorName header and map it to the more specific error type.
+            // See https://github.com/youki-dev/youki/issues/3342 for context.
+            if let Some(error_name) = msg
+                .headers
+                .iter()
+                .find(|h| h.kind == HeaderKind::ErrorName)
+                .and_then(|h| match &h.value {
+                    HeaderValue::String(s) => Some(s.as_str()),
+                    _ => None,
+                })
+                && error_name == ERROR_NAME_EBUSY
+            {
+                return Err(DbusError::DeviceOrResourceBusy(message).into());
+            }
+
+            return Err(DbusError::MethodCallErr(message).into());
        }


from the D-Bus spec (Message Protocol -> Message Types -> Error)
https://dbus.freedesktop.org/doc/dbus-specification.html

An ERROR may have any arguments, but if the first argument is a STRING, it must be an error message. The error message may be logged or shown to the user in some way.

from my understanding that means an error might have 0 arguments (an empty body).

therefore, I’d suggest keeping the else branch, but avoiding an early return when the body is empty. Instead, handle the empty body explicitly, for example:

if !error_message.is_empty() {
    let msg = error_message[0];
    let message = if msg.body.is_empty() {
        "Unknown Dbus Error".into()
    } else {
        // in error message, first item of the body (if present) is always a string
        // indicating the error
        let mut ctr = 0;
        String::deserialize(&msg.body, &mut ctr)?
    };

    // To check if error is EBUSY in the intermediate_process, detect EBUSY via the
    // ErrorName header and map it to the more specific error type.
    // See https://github.com/youki-dev/youki/issues/3342 for context.
    if let Some(error_name) = msg
    ...
}

That's a pretty smart Rust technique, I love that!
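A self-contained sketch of the restructuring suggested above: an empty body no longer triggers an early return, so the ErrorName header is still checked. DbusError and classify_error are simplified, hypothetical stand-ins; in the real code, first_arg would come from String::deserialize on the message body and error_name from the ErrorName header.

```rust
/// Simplified stand-in for the dbus_native error type.
#[derive(Debug, PartialEq)]
pub enum DbusError {
    MethodCallErr(String),
    DeviceOrResourceBusy(String),
}

pub const ERROR_NAME_EBUSY: &str = "System.Error.EBUSY";

/// Hypothetical helper: classify a D-Bus ERROR reply. first_arg is the first body
/// argument if it is a STRING; the D-Bus spec also allows an ERROR with no arguments.
pub fn classify_error(first_arg: Option<String>, error_name: Option<&str>) -> DbusError {
    // An empty body is legal, so use a placeholder message instead of returning early;
    // this keeps the ErrorName check below reachable.
    let message = first_arg.unwrap_or_else(|| "Unknown Dbus Error".to_string());
    if error_name == Some(ERROR_NAME_EBUSY) {
        DbusError::DeviceOrResourceBusy(message)
    } else {
        DbusError::MethodCallErr(message)
    }
}
```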

tommady · 2026-04-07T09:31:52Z

crates/libcontainer/src/process/container_intermediate_process.rs

+            if let ContainerType::TenantContainer {
+                exec_notify_fd: _,
+                landlord_init_pid,
+            } = container_type
+                && let Some(landlord_init_pid) = landlord_init_pid
+                && let Some(landlord_init_proc_cgroup) =
+                    ProcessCGroups::from_read(ProcfsHandle::new()?.open(
+                        ProcfsBase::ProcPid(landlord_init_pid.as_raw() as u32),
+                        "cgroup",
+                        OpenFlags::O_RDONLY | OpenFlags::O_CLOEXEC,
+                    )?)?
+                    .into_iter()
+                    .find(|c| c.controllers.is_empty())
+                && let Some(landlord_init_proc_cgroup_path) =
+                    landlord_init_proc_cgroup.pathname.strip_prefix("/")
+            {
+                libcgroups::common::write_cgroup_file(
+                    Path::new(libcgroups::common::DEFAULT_CGROUP_ROOT)
+                        .join(Path::new(landlord_init_proc_cgroup_path))
+                        .join(libcgroups::common::CGROUP_PROCS),
+                    pid,
+                )
+                .map_err(|err| IntermediateProcessError::Cgroup(err.to_string()))?;
+                return Ok(());
+            }


I think there’s a subtle issue with how the fallback logic is currently written.

right now it relies quite a bit on the ? operator, which means any failure during the fallback immediately bubbles up.
the problem is that this ends up overwriting the original error, which is usually the one I actually care about.

that can make debugging pretty confusing

scenario:

systemd returns an EBUSY (“Device or resource busy”) error when youki attempts to join the cgroup.

youki then enters the fallback path and tries to join the landlord’s init process cgroup.

however, the landlord process exits just before the fallback executes, so /proc/{pid}/cgroup no longer exists (ENOENT).

as a result, the function exits early with the fallback error, the user ends up seeing only:

ERROR youki::process::container_intermediate_process error during container init: Cgroup("Procfs Error: No such file or directory")

the original "Device or resource busy" error is completely gone.
the user thinks youki failed because a file was missing, not because of a cgroup conflict.

what I think would feel better:

the fallback is more of a “best effort” thing, so it probably shouldn’t be able to override the original failure.
it would be much easier to understand what’s going on if we kept the original error and just logged what happened during the fallback. something like:

DEBUG youki::process::container_intermediate_process failed to add task to cgroup, trying to join parent's init process cgroup
DEBUG youki::process::container_intermediate_process failed to read landlord cgroup: Procfs Error: No such file or directory
ERROR youki::process::container_intermediate_process failed to add task to cgroup pid=1234 err="Device or resource busy" init=false

this way:

we still see the fallback attempt (useful for debugging),

but we don’t lose the original reason things failed.

WDYT?
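A sketch of the best-effort shape proposed above, assuming generic closures in place of the real add_task call and the landlord-cgroup write. with_best_effort_fallback is a hypothetical helper, and eprintln! stands in for tracing::debug!. The key point is that a failing fallback is only logged, and the original error is what the caller sees.

```rust
use std::fmt::Display;

/// Hypothetical helper: run `primary`; if it fails and `should_fallback` accepts
/// the error (e.g. !init && is_ebusy(&err)), try `fallback` as a best-effort step.
/// A failing fallback never replaces the original error.
pub fn with_best_effort_fallback<E, F>(
    primary: impl FnOnce() -> Result<(), E>,
    should_fallback: impl FnOnce(&E) -> bool,
    fallback: impl FnOnce() -> Result<(), F>,
) -> Result<(), E>
where
    E: Display,
    F: Display,
{
    let err = match primary() {
        Ok(()) => return Ok(()),
        Err(err) => err,
    };
    if !should_fallback(&err) {
        return Err(err);
    }
    eprintln!("DEBUG failed to add task to cgroup ({err}), trying landlord init cgroup");
    match fallback() {
        Ok(()) => Ok(()),
        Err(fallback_err) => {
            // Log the fallback failure, but surface the original root cause.
            eprintln!("DEBUG fallback failed: {fallback_err}");
            Err(err)
        }
    }
}
```

In the ENOENT scenario described above, the caller would still get "Device or resource busy", while the debug log records why the fallback did not help.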

tommady added the kind/bug label Jan 3, 2026

utam0k requested a review from tommady January 3, 2026 21:43

utam0k requested changes Jan 3, 2026


logica0419 mentioned this pull request Jan 4, 2026

[Bug]: Nested containers: exec process not added to cgroup v2 #3342

Open

logica0419 force-pushed the retyry-systemd-cgroup-EBUSY branch from 2b24f7f to e1e62b0 January 6, 2026 08:44

utam0k marked this pull request as draft January 6, 2026 10:34

tommady mentioned this pull request Jan 6, 2026

fix(3207, 3209) Difference between the exec command in runc and youki #3210

Merged


logica0419 added 6 commits April 5, 2026 04:36

add parent_init_pid field in ContainerType

374522e

Signed-off-by: Takuto Nagami <logica0419@gmail.com>

try to join the cgroup of the init process of the parent container when add_process_to_unit fails

9c4f8f6

Signed-off-by: Takuto Nagami <logica0419@gmail.com>

fix lint

e34da97

Signed-off-by: Takuto Nagami <logica0419@gmail.com>

fix field name to landlord

6bb1b0c

Signed-off-by: Takuto Nagami <logica0419@gmail.com>

define resource busy error and propagate it to container_intermediate_process.rs

126a000

Signed-off-by: Takuto Nagami <logica0419@gmail.com>

switch dbus proxy method_call error with ErrorName header

b2c5320

Signed-off-by: Takuto Nagami <logica0419@gmail.com>

logica0419 force-pushed the retyry-systemd-cgroup-EBUSY branch from fc1ad10 to b2c5320 April 4, 2026 20:56

logica0419 marked this pull request as ready for review April 4, 2026 20:56

Copilot AI review requested due to automatic review settings April 4, 2026 20:56

Copilot started reviewing on behalf of logica0419 April 4, 2026 20:57

Copilot AI reviewed Apr 4, 2026


check is_ebusy only if systemd feature is enabled

30c3935

Signed-off-by: Takuto Nagami <logica0419@gmail.com>

utam0k requested changes Apr 4, 2026


utam0k added this to the v1.0.0 milestone Apr 4, 2026

logica0419 added 3 commits April 5, 2026 07:22

fix comment

8457b4f

Signed-off-by: Takuto Nagami <logica0419@gmail.com>

move ERROR_NAME_EBUSY to message.rs

7041fa7

Signed-off-by: Takuto Nagami <logica0419@gmail.com>

add comment to landlord_init_pid field

cd91198

Signed-off-by: Takuto Nagami <logica0419@gmail.com>

logica0419 commented Apr 4, 2026


tommady reviewed Apr 7, 2026


