The state.json should be generated prior to the creation of the cgroup. #4535
jianghao65536 wants to merge 2 commits into
@kolyshkin Could you help check this?
    // A SIGKILL can happen at any time, and without the state.json,
    // the 'runc delete --force' command won't be able to clear the cgroup.
    _, err = p.container.updateState(p)
Thanks.
Note that we write the PID of the init process to state.json. So when we run delete -f, if this init process is already dead but the runc stage 2 process is still alive, I think we still might not be able to remove the cgroup path, because there is still one process in the cgroup.
In my testing, I found that when systemd manages the cgroup, it gets cleaned up once runc init has exited. However, if we don't use systemd, some remnants of the cgroup are left behind.
If we're not using systemd, we might need a check in runc delete to see whether runc init is still listed in cgroup.procs. This shouldn't take long, since runc init is bound to exit with an error: when it reaches steps such as procHooks that require synchronization with the parent process, the parent has already been terminated, so reads from or writes to the pipe fail and runc init exits.
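The check described above could look roughly like the following sketch. It is not runc code: `pidListed` and `waitPidGone` are hypothetical helpers, and the cgroup directory, PID, timeout, and polling interval are placeholder assumptions.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"time"
)

// pidListed reports whether pid appears in the contents of a cgroup.procs
// file, which lists one PID per line.
func pidListed(procs string, pid int) bool {
	for _, field := range strings.Fields(procs) {
		if n, err := strconv.Atoi(field); err == nil && n == pid {
			return true
		}
	}
	return false
}

// waitPidGone polls <cgroupDir>/cgroup.procs until pid is no longer listed
// or the timeout expires. Hypothetical helper, not part of runc.
func waitPidGone(cgroupDir string, pid int, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		data, err := os.ReadFile(filepath.Join(cgroupDir, "cgroup.procs"))
		if os.IsNotExist(err) {
			return nil // the cgroup directory is already gone
		}
		if err != nil {
			return err
		}
		if !pidListed(string(data), pid) {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("pid %d still in %s after %v", pid, cgroupDir, timeout)
		}
		time.Sleep(10 * time.Millisecond) // arbitrary polling interval
	}
}

func main() {
	// Placeholder path and PID, for illustration only.
	err := waitPidGone("/sys/fs/cgroup/example", 12345, time.Second)
	fmt.Println("wait result:", err)
}
```

Only after `waitPidGone` returns nil would the cgroup directory be removed; on timeout the caller could report the leftover process instead of failing silently.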
I've pasted the CI error messages here; you can refer to them if you can't see the logs.
To add your Signed-off-by line to every commit in this branch: Ensure you have a local copy of your branch by checking out the pull request locally via command line.
Signed-off-by: jh <jianghao65536@gmail.com>
2026161 to 7e6327b
kolyshkin left a comment
While I generally agree this is a bug which should be fixed, I don't like the way it is fixed. The issues are:
- lot of code duplication;
- API bloat (we now have `LoadCreatingState` and `DestroyCreating` -- does a libcontainer user really have to care about all this?);
- maybe some bugs (like, `creating-state.json` is removed after `state.json` is written, not at the same time).
Can we reuse the same state.json, and consider the state to be "creating" if the init pid is not known?
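One way to read this suggestion is sketched below. The `state` struct, the `status` values, and `statusOf` are simplified, hypothetical stand-ins rather than libcontainer's actual types; the only idea taken from the comment is that an unknown init pid means "creating".

```go
package main

import "fmt"

// state is a simplified, hypothetical stand-in for what state.json holds.
type state struct {
	ID      string
	InitPid int // 0 means the init pid is not known yet
}

type status string

const (
	statusCreating status = "creating"
	statusRunning  status = "running"
	statusStopped  status = "stopped"
)

// statusOf derives a status from a single state record. The alive callback
// is an assumption of this sketch; it abstracts "is this pid still running".
func statusOf(s state, alive func(pid int) bool) status {
	if s.InitPid <= 0 {
		// state.json was written before init started: still creating.
		return statusCreating
	}
	if alive(s.InitPid) {
		return statusRunning
	}
	return statusStopped
}

func main() {
	neverAlive := func(int) bool { return false }
	fmt.Println(statusOf(state{ID: "demo"}, neverAlive))
	fmt.Println(statusOf(state{ID: "demo", InitPid: 42}, neverAlive))
}
```

With this shape there is a single file and a single load path, so no separate "creating" file has to be removed at the right moment.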
Also, it would be nice to have a test case added (somehow).
Thank you, I'll make some adjustments. However, I'm still not sure how to add a test case. This bug isn't easily reproducible unless we simulate a timeout by adding a sleep command before runc creates the state.json file.
@jianghao65536 are you still working on this?
@kolyshkin Yes, I've been a bit busy recently and haven't had time to submit. I'll resubmit this week.
My apologies for the late submission. I inadvertently merged the latest code from the main branch into this one, which made things a bit chaotic. To rectify this, I've submitted a new PR; please close this one.
You can use In case you messed up, you can always fix things locally and do
Fix #4534
Make sure that the state.json is in place before setting up the cgroup or writing 'THAWED' into the freezer.state. This way, the 'runc delete --force' command will work as expected.