Try to join the cgroup of the init process of the parent container when apply_cgroup for a tenant container fails due to a "Device or resource busy" error#3347

Open
logica0419 wants to merge 10 commits into youki-dev:main from logica0419:retyry-systemd-cgroup-EBUSY

Conversation

@logica0419
Contributor

Description

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test updates
  • CI/CD related changes
  • Other (please describe):

Testing

Related Issues

Fixes #3342

Additional Context

@tommady
Collaborator

tommady commented Jan 3, 2026

Thanks for opening this PR. I ran into the same issue while working on #3210 and also needed a retry to re-join the cgroup for exec.

This problem is not about the init process, but about exec processes under cgroup v2 when domain controllers are enabled. Once a controller is turned on, the container’s configured cgroup may no longer be joinable (kernel returns EBUSY / EPERM), and exec is expected to fall back to joining the init process’s cgroup.

This behavior is explicitly documented by runc:

Note for cgroup v2: in case the process can’t join the top level cgroup, runc exec fallback is to try joining the cgroup of container’s init.
https://github.com/opencontainers/runc/blob/main/man/runc-exec.8.md

Importantly, this fallback is exec-only:

  • init process cgroup placement must still fail hard
  • only exec processes may retry using the init process’s leaf cgroup

Because this is policy, not cgroup mechanism, runc implements it in the container execution path, not inside the cgroup manager itself. This avoids:

  • accidentally applying fallback to init
  • duplicating logic across systemd vs cgroupfs managers
  • diverging behavior depending on the cgroup backend

For youki, the correct place to implement, I think, is here:

// crates/libcontainer/src/process/container_intermediate_process.rs
fn apply_cgroups<
    C: CgroupManager<Error = E> + ?Sized,
    E: std::error::Error + Send + Sync + 'static,
>(
    cmanager: &C,
    resources: Option<&LinuxResources>,
    init: bool,
) -> Result<()> { ... }

where we know:

  • whether the process is init or exec
  • the init PID
  • and can enforce exec-only fallback semantics

Handling this inside libcgroups (or only for systemd) is insufficient and environment-dependent. The expected behavior should be:

  • init process: no fallback, fail on cgroup join error
  • exec process + cgroup v2 + EBUSY/EPERM: retry by joining init’s cgroup
  • all other errors: fail as before
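
The three rules above can be sketched as a small decision function. This is only an illustration of the policy, not youki's or runc's code: `ProcessKind`, `JoinError`, `Decision`, and `decide` are hypothetical names introduced here, and the real implementation would work with the actual error types returned by the cgroup manager.

```rust
/// Which kind of process is being placed into a cgroup (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProcessKind {
    Init,
    Exec,
}

/// Simplified stand-in for a cgroup join failure (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JoinError {
    Busy,         // EBUSY
    NotPermitted, // EPERM
    Other,        // anything else
}

/// What the caller should do after a failed join (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    FailHard,
    RetryInInitCgroup,
}

/// Encodes the exec-only fallback policy described in the comment above.
pub fn decide(kind: ProcessKind, cgroup_v2: bool, err: JoinError) -> Decision {
    match (kind, cgroup_v2, err) {
        // Only exec processes on cgroup v2 may retry, and only on EBUSY/EPERM.
        (ProcessKind::Exec, true, JoinError::Busy | JoinError::NotPermitted) => {
            Decision::RetryInInitCgroup
        }
        // The init process and every other error keep failing as before.
        _ => Decision::FailHard,
    }
}
```

Keeping the decision in one pure function makes the "init must still fail hard" rule easy to unit test independently of the cgroup backend.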

Without implementing this retry at the libcontainer level (as runc does), exec under cgroup v2 with domain controllers enabled will continue to fail for cgroupfs users.

WDYT? Thanks again.

@utam0k utam0k requested a review from tommady January 3, 2026 21:43
Comment on lines +56 to +57
/// The init process PID of the parent container if the container is created as a tenant.
parent_init_pid: Option<Pid>,
Member

ContainerType should have parent_init_pid.

Err(e) => {
// If adding the process to the cgroup fails due to a "Device or resource busy" error,
// manager tries to join the cgroup of the init process of the tenant container.
if e.to_string().contains("Device or resource busy")
Member

How about getting the error (EBUSY) from the D-Bus client instead of parsing the error message?

Contributor Author

I really wanted to, but I couldn't achieve that just by putting the following code here.

impl From<nix::Error> for SystemdClientError {
    fn from(err: nix::Error) -> SystemdClientError {
        match err {
            nix::Error::EBUSY => DbusError::DeviceOrResourceBusy(err.to_string()).into(),
            _ => DbusError::ConnectionError(err.to_string()).into(),
        }
    }
}

It seems that socket::sendmsg in dbus_native::DbusConnection::send_message() doesn't return nix::Error::EBUSY. Instead, the call succeeds, and the failure only shows up as an error message rather than as an Err in the Result.

Could you give me some advice on what I should do here?

Member

To clarify, the goal of this PR’s initial implementation is to serve as a conceptual demo, providing a starting point for discussing a more suitable implementation.

Since you mentioned it, I'll stop focusing on the detailed code of this PR for now. I'll review the detailed code once we've clarified and implemented the non-demo aspects.

// is empty string ("") and the value is the cgroup path the <pid> is in.
//
// ref: https://github.com/opencontainers/cgroups/blob/main/utils.go#L171-L219
pub fn parse_proc_cgroup_file(path: &str) -> Result<HashMap<String, String>, ParseProcCgroupError> {
Member

Could we use the procfs crate? Be careful: if it reads inside the container, please use ProcfsHandle for safety.
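
For readers unfamiliar with the file being discussed: each line of /proc/<pid>/cgroup has the form `hierarchy-ID:controller-list:path`, and the cgroup v2 entry is the one with an empty controller list (`0::<path>`). The sketch below parses that text by hand purely to show the format; `unified_cgroup_path` and `cgroup_procs_file` are hypothetical helpers, and the PR itself would rely on the procfs crate as suggested in the review.

```rust
use std::path::PathBuf;

/// Returns the cgroup v2 (unified hierarchy) path found in the text of
/// /proc/<pid>/cgroup, i.e. the entry whose controller list is empty.
pub fn unified_cgroup_path(proc_cgroup: &str) -> Option<&str> {
    proc_cgroup.lines().find_map(|line| {
        // Split into at most three fields so a ':' inside the path is kept.
        let mut fields = line.splitn(3, ':');
        let _hierarchy_id = fields.next()?;
        let controllers = fields.next()?;
        let path = fields.next()?;
        controllers.is_empty().then_some(path)
    })
}

/// Builds the location of `cgroup.procs` for a cgroup path under a mount point.
pub fn cgroup_procs_file(cgroup_root: &str, cgroup_path: &str) -> PathBuf {
    PathBuf::from(cgroup_root)
        // Strip the leading '/' so `join` does not replace the root.
        .join(cgroup_path.trim_start_matches('/'))
        .join("cgroup.procs")
}
```

Writing a PID into the resulting `cgroup.procs` file is what moves a process into that cgroup.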

@logica0419 logica0419 changed the title Try to join the cgroup of the init process of the tenant container when add_process_to_unit fails due to a "Device or resource busy" error Try to join the cgroup of the init process of the parent container when apply_cgroup for tenant container fails due to a "Device or resource busy" error Jan 4, 2026
@logica0419 logica0419 changed the title Try to join the cgroup of the init process of the parent container when apply_cgroup for tenant container fails due to a "Device or resource busy" error Try to join the cgroup of the init process of the parent container when apply_cgroup for a tenant container fails due to a "Device or resource busy" error Jan 4, 2026
@logica0419
Contributor Author

logica0419 commented Jan 6, 2026

@utam0k @tommady
Thanks for the quick feedback! I didn’t expect comments to come in so fast 😅
I was planning to write the explanation today (I was pretty exhausted last night), so this was a nice surprise.

To clarify, the goal of this PR’s initial implementation is to serve as a conceptual demo, providing a starting point for discussing a more suitable implementation.

I’m still not very familiar with youki, or even Rust itself.
Please feel free to point out any issues, including basic ones or anything related to “Rust-ish” coding style.


Also, this may be off-topic, but I want to clarify the wording here. I made a table of the terminology as I understand it.

| Perspective | Container A | Container B | A's init process | B's init process |
| --- | --- | --- | --- | --- |
| Container A | self (InitContainer) | child | init_process | child_init_process |
| Container B | parent | self (TenantContainer) | parent_init_process | init_process |
| tommady's comment | - | - | init process | exec process |
| runc | initProcess (containerProcess) | setnsProcess (containerProcess) | linuxStandardInit | linuxSetnsInit |

What confused me here is that the word init process used in Container B's context can mean Container A's init process or B's init process. That's why I used the name parent_init_process for Container A's init process in the implementation.

FYI: in runc, Container A's init process is called initProcessPid even in the context of Container B.
https://github.com/opencontainers/runc/blob/main/libcontainer/process_linux.go#L175

WDYT about this? Should I use the name init process as runc does?

@logica0419
Contributor Author

@tommady
Thank you too for finding this PR! I'm happy that I can help you solve the issue.
And, thanks again for the precise explanation of what's happening. I managed to get an abstract understanding, but your explanation helped me strengthen it so much.

For youki, the correct place to implement, I think, is here:

I strongly agree with that. I'll re-implement the logic there.
Thank you so much for the advice.

@logica0419 logica0419 force-pushed the retyry-systemd-cgroup-EBUSY branch from 2b24f7f to e1e62b0 Compare January 6, 2026 08:44
@logica0419
Contributor Author

Just to clarify: I force-pushed the branch because I forgot to add sign-offs to the previous commits and have now pushed a complete re-implementation.

@utam0k utam0k marked this pull request as draft January 6, 2026 10:34
@utam0k
Member

utam0k commented Jan 6, 2026

Please mark it as "ready for review" once it is ready for a detailed code review after the discussion.

@tommady
Collaborator

tommady commented Jan 6, 2026

Thanks for the table — that actually helped me realize part of the confusion is on my side too 😅 I think I’ve been a bit sloppy with naming.

Referring to your table, when I said “init process” I meant Container B’s init process (the TenantContainer being exec’d into), not Container A’s init. In your terms, this is the exec case for Container B: if joining the configured cgroup fails under cgroup v2, exec should fall back to B’s init process cgroup, not the parent’s.

Sorry about the naming confusion 🤪 that’s on me. I’d really appreciate hearing others’ opinions on whether to use runc-style naming.

@utam0k
Member

utam0k commented Jan 6, 2026

This isn't a separate “Container B”; it's an exec/tenant process joining the existing container's cgroup. So calling it “parent” is confusing. How about landlord_init_pid (landlord = parent init in your context)?

@utam0k
Member

utam0k commented Mar 29, 2026

@logica0419 What is the status of this PR?

Signed-off-by: Takuto Nagami <logica0419@gmail.com>
…en add_process_to_unit fails

Signed-off-by: Takuto Nagami <logica0419@gmail.com>
Signed-off-by: Takuto Nagami <logica0419@gmail.com>
Signed-off-by: Takuto Nagami <logica0419@gmail.com>
… process.rs

Signed-off-by: Takuto Nagami <logica0419@gmail.com>
Signed-off-by: Takuto Nagami <logica0419@gmail.com>
@logica0419 logica0419 force-pushed the retyry-systemd-cgroup-EBUSY branch from fc1ad10 to b2c5320 Compare April 4, 2026 20:56
@logica0419 logica0419 marked this pull request as ready for review April 4, 2026 20:56
Copilot AI review requested due to automatic review settings April 4, 2026 20:56
Contributor

Copilot AI left a comment

Pull request overview

This PR addresses nested-container cgroup v2 behavior when joining a tenant container’s cgroup via systemd fails with an EBUSY (“Device or resource busy”) error, by falling back to joining the parent (landlord) init process’s cgroup.

Changes:

  • Add an EBUSY-specific fallback in apply_cgroups to join the landlord init process cgroup by reading /proc/<pid>/cgroup and writing to cgroup.procs.
  • Extend tenant container process args to carry landlord_init_pid and plumb it from the tenant builder.
  • Teach the systemd dbus layer to classify System.Error.EBUSY distinctly and surface it via SystemdManagerError::is_ebusy().

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| crates/libcontainer/src/process/container_intermediate_process.rs | Adds EBUSY detection and tenant fallback to parent init cgroup during cgroup application. |
| crates/libcontainer/src/process/args.rs | Extends ContainerType::TenantContainer with landlord_init_pid. |
| crates/libcontainer/src/container/tenant_builder.rs | Populates landlord_init_pid from the parent container state. |
| crates/libcgroups/src/systemd/manager.rs | Adds SystemdManagerError::is_ebusy() for EBUSY classification. |
| crates/libcgroups/src/systemd/dbus_native/utils.rs | Introduces a DbusError::DeviceOrResourceBusy variant. |
| crates/libcgroups/src/systemd/dbus_native/proxy.rs | Maps DBus ErrorName=System.Error.EBUSY to the new EBUSY error variant. |

Comment on lines +107 to +109
// to check if error is EBUSY at process/container_intermediate_process.rs:262,
// we check the ErrorName header and switch error type
// see https://github.com/youki-dev/youki/issues/3342
Copilot AI Apr 4, 2026

The comment references a specific file path and line number ("process/container_intermediate_process.rs:262"), which will quickly become stale as the code moves. Prefer referencing the issue link (#3342) or describing the call site without hard-coded line numbers/paths.

Suggested change
- // to check if error is EBUSY at process/container_intermediate_process.rs:262,
- // we check the ErrorName header and switch error type
- // see https://github.com/youki-dev/youki/issues/3342
+ // For the relevant intermediate-process call path, detect EBUSY via the
+ // ErrorName header and map it to the more specific error type.
+ // See https://github.com/youki-dev/youki/issues/3342 for context.

Comment on lines +283 to +289
if let Err(err) = cmanager.add_task(pid) {
if !init && is_ebusy(&err) {
// If adding the process to the cgroup fails due to a "Device or resource busy" error,
// manager tries to join the cgroup of the init process of the parent container.
tracing::debug!(
"failed to add task to cgroup, trying to join parent's init process cgroup"
);
Copilot AI Apr 4, 2026

This adds a new EBUSY-specific fallback path (joining the parent init process's cgroup via /proc and writing to cgroup.procs), but the unit tests in this module don't exercise it. Add a unit test that simulates an EBUSY add_task error and asserts the fallback behavior is taken (and that non-EBUSY errors still fail).

)
}
}

Copilot AI Apr 4, 2026

The new is_ebusy() helper is used to control a behavior-changing fallback, but it's not covered by tests. Add a unit test that constructs a SystemdManagerError::SystemdClient(SystemdClientError::DBus(DbusError::DeviceOrResourceBusy(_))) and verifies is_ebusy() returns true, and false for other DBus errors.

Suggested change
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_is_ebusy_true_for_device_or_resource_busy() {
        let err = SystemdManagerError::SystemdClient(SystemdClientError::DBus(
            DbusError::DeviceOrResourceBusy("busy".into()),
        ));
        assert!(err.is_ebusy());
    }

    #[test]
    fn test_is_ebusy_false_for_other_dbus_errors() {
        let err = SystemdManagerError::SystemdClient(SystemdClientError::DBus(
            DbusError::Failed("not busy".into()),
        ));
        assert!(!err.is_ebusy());
    }
}

Contributor Author

Should this function be tested?

Signed-off-by: Takuto Nagami <logica0419@gmail.com>
Member

@utam0k utam0k left a comment

I took a quick look at it.

let mut ctr = 0;
let message = String::deserialize(&msg.body, &mut ctr)?;

// to check if error is EBUSY at process/container_intermediate_process.rs:262,
Member

process/container_intermediate_process.rs:262

This is not permanent.

Contributor Author

Yep, that was my mistake 😩
I rewrote them based on Copilot's suggestion.

Comment on lines +18 to +19
exec_notify_fd: RawFd,
landlord_init_pid: Option<Pid>,
Member

Please provide descriptions for these fields.

Member

Also, let's update the developer documentation.
https://youki-dev.github.io/youki/developer/introduction.html

Contributor Author

As for exec_notify_fd, I’m not the one who implemented it, so it’s a bit hard for me to explain.
It was added in #1252, so maybe @YJDoc2 could suggest a comment for it?

Contributor Author

I'll work on the dev doc later as well.
Which pages should be updated specifically?
I’d also appreciate any advice on what kind of information should be included.

Comment on lines +266 to +271
#[cfg(not(feature = "systemd"))]
{
false
}

#[cfg(feature = "systemd")]
Member

Why was it necessary to separate them in this function?

Contributor Author

@logica0419 logica0419 Apr 4, 2026

Yes, because when feature = "systemd" is disabled, e (of type libcgroups::common::AnyManagerError::Systemd) doesn't have the .is_ebusy() method.

By the way, I’m personally not very happy with how information is propagated in manager.rs, or with the complex type casting here.
Could you think of another approach?

let message = String::deserialize(&msg.body, &mut ctr)?;

// to check if error is EBUSY at process/container_intermediate_process.rs:262,
// we check the ErrorName header and switch error type
Member

Suggested change
- // we check the ErrorName header and switch error type
+ // we check the ErrorName header and switch error type.

// to check if error is EBUSY at process/container_intermediate_process.rs:262,
// we check the ErrorName header and switch error type
// see https://github.com/youki-dev/youki/issues/3342
const EBUSY_ERROR_NAME: &str = "System.Error.EBUSY";
Member

Wouldn't it be more appropriate to define it in message.rs?

Contributor Author

I thought so too, so I moved it 👍

Comment on lines +115 to +118
.and_then(|h| match &h.value {
HeaderValue::String(s) => Some(s.as_str()),
_ => None,
})
Member

How about using map instead of the match syntax?

Contributor Author

@logica0419 logica0419 Apr 4, 2026

How would that work?
Since I’m not very familiar with Rust, I’d really appreciate it if you could share an example snippet 😃
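
One possible shape for this, as a hedged sketch rather than the reviewer's actual intent: a plain `map` with a closure returning `Option` would produce a nested `Option<Option<&str>>`, so the chain usually stays on `and_then` and the `match` moves into a small accessor. The types below are simplified stand-ins for the ones in dbus_native, and `HeaderValue::as_str` and `error_name` are hypothetical helpers that may not exist in youki.

```rust
/// Simplified stand-ins for the dbus_native header types (assumptions).
#[derive(Debug, PartialEq, Eq)]
pub enum HeaderKind {
    ErrorName,
    Other,
}

#[derive(Debug)]
pub enum HeaderValue {
    String(String),
    U32(u32),
}

impl HeaderValue {
    /// Hypothetical accessor: borrows the inner string if this value is one.
    pub fn as_str(&self) -> Option<&str> {
        match self {
            HeaderValue::String(s) => Some(s.as_str()),
            _ => None,
        }
    }
}

#[derive(Debug)]
pub struct Header {
    pub kind: HeaderKind,
    pub value: HeaderValue,
}

/// Finds the ErrorName header and returns its value when it is a string.
pub fn error_name(headers: &[Header]) -> Option<&str> {
    headers
        .iter()
        .find(|h| h.kind == HeaderKind::ErrorName)
        // `and_then` keeps the result flat because `as_str` returns an Option.
        .and_then(|h| h.value.as_str())
}
```

With the accessor in place, the call site no longer needs an inline `match`, and the comparison against the EBUSY error name can follow directly.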

@utam0k utam0k added this to the v1.0.0 milestone Apr 4, 2026
Signed-off-by: Takuto Nagami <logica0419@gmail.com>
Signed-off-by: Takuto Nagami <logica0419@gmail.com>
Signed-off-by: Takuto Nagami <logica0419@gmail.com>
Contributor Author

@logica0419 logica0419 left a comment

Hi, thanks for the review!

First of all, I’m really sorry for leaving this PR unattended for so long.
I’ve been crazily busy over the past three months, and I even got seriously sick last month.
I have more free time now, so I’ll actively work on this until it’s merged.


【Field naming】

How about landlord_init_pid (landlord = parent init in your context)?

This naming is good, I love it! I've fixed the naming, as you kindly reviewed.


【About Dbus error】
With GitHub Copilot, I found that I can retrieve the original error information from the ErrorName header.
On Ubuntu 24.04, it was "System.Error.EBUSY", so I guess it's common across Linux distributions (not sure)... Please comment if you have any concerns.

Also, as I wrote in the review comment, I'm not really satisfied with how DbusError::DeviceOrResourceBusy is currently propagated to apply_cgroups().
I’m not very experienced with Rust, so I’d really appreciate any suggestions.


【Testing】
I think some integration or end-to-end tests are necessary.
I’d appreciate any advice on what those tests should look like, since I’m not very familiar with the project structure yet.

Comment on lines 95 to 124
if !error_message.is_empty() {
let msg = error_message[0];
if msg.body.is_empty() {
// this should rarely be the case
return Err(DbusError::MethodCallErr("Unknown Dbus Error".into()).into());
} else {
// in error message, first item of the body (if present) is always a string
// indicating the error
let mut ctr = 0;
let msg = String::deserialize(&msg.body, &mut ctr)?;
return Err(DbusError::MethodCallErr(msg).into());
}

// in error message, first item of the body (if present) is always a string
// indicating the error
let mut ctr = 0;
let message = String::deserialize(&msg.body, &mut ctr)?;

// To check if error is EBUSY in the intermediate_process, detect EBUSY via the
// ErrorName header and map it to the more specific error type.
// See https://github.com/youki-dev/youki/issues/3342 for context.
if let Some(error_name) = msg
.headers
.iter()
.find(|h| h.kind == HeaderKind::ErrorName)
.and_then(|h| match &h.value {
HeaderValue::String(s) => Some(s.as_str()),
_ => None,
})
&& error_name == ERROR_NAME_EBUSY
{
return Err(DbusError::DeviceOrResourceBusy(message).into());
}

return Err(DbusError::MethodCallErr(message).into());
}
Collaborator

@tommady tommady Apr 7, 2026

from the D-Bus spec (Message Protocol -> Message Types -> Error)
https://dbus.freedesktop.org/doc/dbus-specification.html

An ERROR may have any arguments, but if the first argument is a STRING, it must be an error message. The error message may be logged or shown to the user in some way.

from my understanding that means an error might have 0 arguments (an empty body).

therefore, I’d suggest keeping the else branch, but avoiding an early return when the body is empty. Instead, handle the empty body explicitly, for example:

        if !error_message.is_empty() {
            let msg = error_message[0];

            let message = if msg.body.is_empty() {
                "Unknown Dbus Error".into()
            } else {
                // in error message, first item of the body (if present) is always a string
                // indicating the error
                let mut ctr = 0;
                String::deserialize(&msg.body, &mut ctr)?
            };

            // To check if error is EBUSY in the intermediate_process, detect EBUSY via the
            // ErrorName header and map it to the more specific error type.
            // See https://github.com/youki-dev/youki/issues/3342 for context.
            if let Some(error_name) = msg
            ...
        }
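
The suggestion above, reduced to a self-contained sketch: an empty body falls back to a default message, and the ErrorName header alone decides which error variant is returned. `classify` and its two `Option` parameters are hypothetical simplifications of the real message type; only the variant names and the "System.Error.EBUSY" string are taken from the discussion.

```rust
/// Simplified mirror of the two error variants discussed in this thread.
#[derive(Debug, PartialEq, Eq)]
pub enum DbusError {
    DeviceOrResourceBusy(String),
    MethodCallErr(String),
}

/// Error name observed in the ErrorName header for EBUSY replies.
pub const ERROR_NAME_EBUSY: &str = "System.Error.EBUSY";

/// Classifies an error reply.
/// `error_name` is the ErrorName header value, if present.
/// `body_message` is the first string argument, if the body has one.
pub fn classify(error_name: Option<&str>, body_message: Option<&str>) -> DbusError {
    // An ERROR message may legally carry no arguments, so default the text
    // instead of returning early.
    let message = body_message.unwrap_or("Unknown Dbus Error").to_string();
    match error_name {
        Some(ERROR_NAME_EBUSY) => DbusError::DeviceOrResourceBusy(message),
        _ => DbusError::MethodCallErr(message),
    }
}
```

The point of this shape is that an EBUSY reply with an empty body is still classified as busy, which an early return on the empty body would miss.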

Contributor Author

That's a pretty smart Rust technique, I love that!

Comment on lines +299 to +323
if let ContainerType::TenantContainer {
exec_notify_fd: _,
landlord_init_pid,
} = container_type
&& let Some(landlord_init_pid) = landlord_init_pid
&& let Some(landlord_init_proc_cgroup) =
ProcessCGroups::from_read(ProcfsHandle::new()?.open(
ProcfsBase::ProcPid(landlord_init_pid.as_raw() as u32),
"cgroup",
OpenFlags::O_RDONLY | OpenFlags::O_CLOEXEC,
)?)?
.into_iter()
.find(|c| c.controllers.is_empty())
&& let Some(landlord_init_proc_cgroup_path) =
landlord_init_proc_cgroup.pathname.strip_prefix("/")
{
libcgroups::common::write_cgroup_file(
Path::new(libcgroups::common::DEFAULT_CGROUP_ROOT)
.join(Path::new(landlord_init_proc_cgroup_path))
.join(libcgroups::common::CGROUP_PROCS),
pid,
)
.map_err(|err| IntermediateProcessError::Cgroup(err.to_string()))?;
return Ok(());
}
Collaborator

@tommady tommady Apr 7, 2026

I think there’s a subtle issue with how the fallback logic is currently written.

right now it relies quite a bit on the ? operator, which means any failure during the fallback immediately bubbles up.
the problem is that this ends up overwriting the original error, which is usually the one I actually care about.

that can make debugging pretty confusing

scenario:

  1. systemd returns an EBUSY (“Device or resource busy”) error when youki attempts to join the cgroup.
  2. youki then enters the fallback path and tries to join the landlord’s init process cgroup.
  3. however, the landlord process exits just before the fallback executes, so /proc/{pid}/cgroup no longer exists (ENOENT).

as a result, the function exits early with the fallback error, the user ends up seeing only:

ERROR  youki::process::container_intermediate_process  error during container init: Cgroup("Procfs Error: No such file or directory")

the original "Device or resource busy" error is completely gone.
the user thinks youki failed because a file was missing, not because of a cgroup conflict.

what I think would feel better:

the fallback is more of a “best effort” thing, so it probably shouldn’t be able to override the original failure.
it would be much easier to understand what’s going on if we kept the original error and just logged what happened during the fallback. something like:

DEBUG  youki::process::container_intermediate_process  failed to add task to cgroup, trying to join parent's init process cgroup
DEBUG  youki::process::container_intermediate_process  failed to read landlord cgroup: Procfs Error: No such file or directory
ERROR  youki::process::container_intermediate_process  failed to add task to cgroup pid=1234 err="Device or resource busy" init=false

this way:

  1. we still see the fallback attempt (useful for debugging),
  2. but we don’t lose the original reason things failed.

WDYT?
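
A minimal sketch of the "best effort" behavior proposed above, assuming the join and the fallback can each be expressed as a closure. `join_with_fallback` is a hypothetical helper, not youki code; a real version would call the cgroup manager and use tracing instead of the `log` callback.

```rust
/// Runs `primary`; if it fails with a retryable error, runs `fallback`.
/// A fallback failure is only logged, and the ORIGINAL error is returned.
pub fn join_with_fallback<E, F>(
    primary: impl FnOnce() -> Result<(), E>,
    is_retryable: impl FnOnce(&E) -> bool,
    fallback: impl FnOnce() -> Result<(), F>,
    mut log: impl FnMut(String),
) -> Result<(), E>
where
    F: std::fmt::Display,
{
    let original = match primary() {
        Ok(()) => return Ok(()),
        Err(err) => err,
    };
    if !is_retryable(&original) {
        // Not an EBUSY-style failure: fail exactly as before.
        return Err(original);
    }
    log("failed to add task to cgroup, trying the fallback cgroup".to_string());
    match fallback() {
        Ok(()) => Ok(()),
        Err(fallback_err) => {
            // Keep the fallback failure visible without replacing the cause.
            log(format!("fallback failed: {fallback_err}"));
            Err(original)
        }
    }
}
```

In the scenario described above, the caller would still receive the "Device or resource busy" error, while the ENOENT from reading the landlord's cgroup would appear only in the debug log.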

Development

Successfully merging this pull request may close these issues.

[Bug]: Nested containers: exec process not added to cgroup v2

4 participants