Skip to content

The state.json should be generated prior to the creation of the cgroup.#4535

Closed
jianghao65536 wants to merge 2 commits into
opencontainers:mainfrom
jianghao65536:generate-state-before-procHooks
Closed

The state.json should be generated prior to the creation of the cgroup.#4535
jianghao65536 wants to merge 2 commits into
opencontainers:mainfrom
jianghao65536:generate-state-before-procHooks

Conversation

@jianghao65536
Copy link
Copy Markdown

Fix 4534

Make sure that the state.json is in place before setting up the cgroup or writing 'THAWED' into the freezer.state. This way, the 'runc delete --force' command will work as expected.

@wxx213
Copy link
Copy Markdown

wxx213 commented Nov 19, 2024

@kolyshkin Could you help to check this?

Comment thread libcontainer/process_linux.go Outdated

// A SIGKILL can happen at any time, and without the state.json,
// the 'runc delete --force' command won't be able to clear the cgroup.
_, err = p.container.updateState(p)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.
We should know that we will write the pid of init process to state.json, so when we do delete -f, once this init process has dead, but runc stage 2 process is alive, I think maybe we still can't remove this cgroup path either because there is still one process in this cgroup.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my testing, I discovered that when systemd is used to handle cgroup, it gets cleaned up once runc init has exited. However, if we don't use systemd, some remnants of cgroup are left behind.

If we're not using systemd to manage, we might need to do a check in runc delete to see if runc init is still listed in cgroups.procs. This shouldn't take too much time since runc init is bound to exit due to an error. This error happens because when runc init gets to parts like procHooks that require synchronization with the parent process, it fails as the parent process has already been terminated, leading to errors when runc init tries to write or read values from the pipeline, and consequently, it exits.

{BF5FEB57-6023-4623-A04F-7C96FC189661} {F284436E-31E8-4967-AE89-18A124B1C24C}

@lifubang
Copy link
Copy Markdown
Member

I paste the CI error msgs here, you can refer it if you can't see the logs.

  • libcontainer/process_linux.go:564: File is not gofumpt-ed (gofumpt)
    // A SIGKILL can happen at any time, and without the state.json,
  • libcontainer/process_linux.go:564: // A SIGKILL can happen at any time, and without the state.json,
    ^^^ extra whitespace at EOL, please fix
  • Commit sha: 2026161, Author: jianghao53, Committer: jianghao53; The sign-off is missing.

To add your Signed-off-by line to every commit in this branch:

Ensure you have a local copy of your branch by checking out the pull request locally via command line.
In your local branch, run: git rebase HEAD~1 --signoff
Force push your changes to overwrite the branch: git push --force-with-lease origin generate-state-before-procHooks

@jianghao65536 jianghao65536 force-pushed the generate-state-before-procHooks branch from 2026161 to 7e6327b Compare November 24, 2024 16:41
Copy link
Copy Markdown
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I generally agree this is a bug which should be fixed, I don't like the way it is fixed. The issues are:

  • lot of code duplication;
  • API bloat (we now have LoadCreatingState and DestroyCreating -- does libcontainer user really has to care about all this?);
  • maybe some bugs (like, creating-state.json is removed after state.json is written, not at the same time).

Can we reuse the same state.json, and consider the state is "creating" if init pid is not known?

@kolyshkin
Copy link
Copy Markdown
Contributor

Also, it would be nice to have a test case added (somehow).

@jianghao65536
Copy link
Copy Markdown
Author

Thank you, I'll make some adjustments. However, I'm still not sure how to add a test case. This bug isn't easily reproducible unless we simulate a timeout by adding a sleep command before runc creates the state.json file.

@kolyshkin
Copy link
Copy Markdown
Contributor

@jianghao65536 are you still working on this?

@jianghao65536
Copy link
Copy Markdown
Author

@kolyshkin Yes, I've been a bit busy recently and haven't had time to submit. I'll resubmit this week

@jianghao65536
Copy link
Copy Markdown
Author

jianghao65536 commented Feb 26, 2025

My apologies for the late submission. I inadvertently merged the latest code from the main branch into this one, which made things a bit chaotic. To rectify this, I've submitted a new pr, please close this pr

@kolyshkin
Copy link
Copy Markdown
Contributor

I inadvertently merged the latest code from the main branch into this one, which made things a bit chaotic

You can use git pull --rebase origin/main to pick the latest upstream changes without creating a merge commit.

In case you messed up, you can always fix things locally and do git push --force to your repo/branch on github.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The 'runc delete --force' command can't delete container if runc receives a SIGKILL before it can generate the state.json file.

4 participants