libct: fix a race with systemd removal by kolyshkin · Pull Request #3812 · opencontainers/runc

kolyshkin · 2023-04-05T01:48:32Z

For the previous attempt to fix that (and added test cases), see commit 9087f2e (PR #2338).

Alas, it does not always work because of a TOCTOU (time-of-check to time-of-use) race on the cgroup directory.

To solve this and avoid the race, check for the error after the operation. Implement the check as a method that ignores only the error that should be ignored. Instead of currentStatus(), use the faster runType(), since we are not interested in the Paused status here.

For Processes(), remove the pre-op check, and only use it after getting an error, making the non-error path more straightforward.

For Signal(), add a second check after getting an error. The first check is left as is because signalAllProcesses might print a warning if the cgroup does not exist, and we'd like to avoid that.

This should fix an occasional failure like this one:

not ok 84 kill detached busybox
# (in test file tests/integration/kill.bats, line 27)
#   `[ "$status" -eq 0 ]' failed
....
# runc kill test_busybox KILL (status=0):
# runc kill -a test_busybox 0 (status=1):
# time="2023-04-04T18:24:27Z" level=error msg="lstat /sys/fs/cgroup/devices/system.slice/runc-test_busybox.scope: no such file or directory"

Fixes: #3372
Fixes: #3744
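
The check-after-the-operation idea described above can be sketched as follows. This is a minimal, self-contained model and not the code from this PR: the types container, fakeCgroup and runState, and the method processes, are hypothetical stand-ins for the libcontainer container, its cgroup manager and its run state. Only the name ignoreCgroupError and the wrapped error message come from the discussion.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
)

// runState is a hypothetical stand-in for the container's run state.
type runState int

const (
	stateRunning runState = iota
	stateStopped
)

// fakeCgroup is a hypothetical model of a cgroup directory that
// systemd may remove at any moment after the last process exits.
type fakeCgroup struct {
	exists bool
}

// Exists reports whether the cgroup directory is still present.
func (c *fakeCgroup) Exists() bool { return c.exists }

// GetAllPids fails with a "not exist" error once the cgroup is gone.
func (c *fakeCgroup) GetAllPids() ([]int, error) {
	if !c.exists {
		return nil, fmt.Errorf("read cgroup: %w", fs.ErrNotExist)
	}
	return []int{1, 2}, nil
}

// container is a hypothetical, stripped-down container object.
type container struct {
	state runState
	cg    *fakeCgroup
}

// ignoreCgroupError returns nil if err is a "not exist" error, the
// container is stopped, and the cgroup is gone. Because it runs after
// the failed operation, there is no window between check and use.
func (c *container) ignoreCgroupError(err error) error {
	if err == nil {
		return nil
	}
	if errors.Is(err, fs.ErrNotExist) && c.state == stateStopped && !c.cg.Exists() {
		return nil
	}
	return err
}

// processes has no pre-operation check; the error path decides
// whether a failure is expected.
func (c *container) processes() ([]int, error) {
	pids, err := c.cg.GetAllPids()
	if err = c.ignoreCgroupError(err); err != nil {
		return nil, fmt.Errorf("unable to get all container pids: %w", err)
	}
	return pids, nil
}

func main() {
	// Stopped container whose cgroup was already removed: not an error.
	stopped := &container{state: stateStopped, cg: &fakeCgroup{exists: false}}
	pids, err := stopped.processes()
	fmt.Println(len(pids), err)

	// Running container with a missing cgroup: the error is reported.
	running := &container{state: stateRunning, cg: &fakeCgroup{exists: false}}
	_, err = running.processes()
	fmt.Println(err != nil)
}
```

In this model the non-error path stays a plain call, and the state and cgroup lookups happen only when the operation has already failed.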

kolyshkin · 2023-04-06T22:15:29Z

@lifubang @thaJeztah PTAL

lifubang · 2023-04-06T23:40:23Z

libcontainer/container_linux.go

-	// for systemd cgroup, the unit's cgroup path will be auto removed if container's all processes exited
-	if status == Stopped && !c.cgroupManager.Exists() {
+	pids, err := c.cgroupManager.GetAllPids()
+	if c.ignoreCgroupError(err) == nil {


Maybe there is a more readable implementation, like this:

change the return type of ignoreCgroupError to bool;

if err == nil || c.ignoreCgroupError(err) {

Otherwise the reader needs to look at the function body of ignoreCgroupError to find out what happens here.

Perhaps reverse the logic, which seems more natural;

if err := c.ignoreCgroupError(err); err != nil {
	return nil, fmt.Errorf("unable to get all container pids: %w", err)
}
return pids, nil

Oh well, my initial implementation was called isIgnorableCgroupError and returned bool. Then I took the approach used by func ignoreTerminateErrors.

Either way is fine with me.

lifubang · 2023-04-06T23:43:31Z

Yes, I think there is a race condition here: because systemd is a separate service, we can't know exactly when it removes the cgroup path once there are no pids left in the cgroup.

thaJeztah · 2023-04-06T23:59:41Z

libcontainer/container_linux.go

-	// for systemd cgroup, the unit's cgroup path will be auto removed if container's all processes exited
-	if status == Stopped && !c.cgroupManager.Exists() {
+	pids, err := c.cgroupManager.GetAllPids()
+	if c.ignoreCgroupError(err) == nil {


Perhaps reverse the logic, which seems more natural;

if err := c.ignoreCgroupError(err); err != nil {
	return nil, fmt.Errorf("unable to get all container pids: %w", err)
}
return pids, nil

kolyshkin · 2023-04-07T00:44:00Z

Now, here's an interesting question: why does runc kill <CTID> <SIG> fail if the container is stopped, while runc kill -a <CTID> <SIG> does not?

lifubang · 2023-04-08T02:43:29Z

Now, here's an interesting question: why does runc kill <CTID> <SIG> fail if the container is stopped, while runc kill -a <CTID> <SIG> does not?

I think this happened when randomly hitting this issue.
Both kill with and without -a may hit this issue randomly, because ps and kill -a both call m.GetAllPids.

kolyshkin · 2023-04-11T20:52:52Z

Now, here's an interesting question: why does runc kill <CTID> <SIG> fail if the container is stopped, while runc kill -a <CTID> <SIG> does not?

I think this happened when randomly hitting this issue. Both kill with and without -a may hit this issue randomly, because ps and kill -a both call m.GetAllPids.

Let me rephrase my question.

Currently, on a stopped container, runc kill fails, but runc kill -a does not. See:

[root@kir-rhat runc-tst]# ./runc list
ID          PID         STATUS      BUNDLE                                                CREATED                          OWNER
123         0           stopped     /home/kir/go/src/github.com/opencontainers/runc-tst   2023-04-06T21:51:20.521334579Z   root
[root@kir-rhat runc-tst]# ./runc kill 123; echo $?
ERRO[0000] container not running                        
1
[root@kir-rhat runc-tst]# ./runc kill -a 123; echo $?
0

My question was -- is this (the fact that adding -a makes the error go away) an intended behavior, and why?

lifubang · 2023-04-12T01:21:58Z

My question was -- is this (the fact that adding -a makes the error go away) an intended behavior, and why?

As described in runtime-spec, please see https://github.com/opencontainers/runtime-spec/blob/main/runtime.md?plain=1#L131-L137
Attempting to send a signal to a container that is neither created nor running MUST have no effect on the container and MUST generate an error.
I think this is an intended behavior.

For -a, it's not defined in runtime-spec. But I think it's also right for a stopped container, for example with shared namespace containers. So, should we add a description of -a to runtime-spec?

kolyshkin · 2023-04-18T19:21:38Z

For -a, it's not defined in runtime-spec. But I think it's also right for a stopped container, for example with shared namespace containers. So, should we add a description of -a to runtime-spec?

Good question. So, it looks like kill -a currently violates the spec.

There are a few ways to look at how runc kill -a should work:

1. Adhere to runtime-spec description quoted above, and return a "container not running" error (i.e. same behavior as without -a option).
2. Amend the -a description with "Ignore the container-not-running error". This makes it possible to use the command in a scenario where we want to kill and destroy a container no matter what.
3. Add --ignore-stopped boolean option to achieve the same scenario as in 2 (and optionally have -a implying --ignore-stopped, which can be reverted by specifying --ignore-stopped=false or smth).

kolyshkin · 2023-04-18T19:22:04Z

@giuseppe WDYT? (see previous comment)

giuseppe · 2023-04-18T20:11:16Z

I've no strong opinion, I've followed the same runc behavior in crun.

I am afraid to change the current behavior if anyone depends on it (although the entire mechanism seems like a big race condition).

-a is useful only when there is no PID namespace, since otherwise it is fine to just kill the first process. If -a would follow the runtime spec and fail when the container is stopped, then it wouldn't be able to terminate the other processes. Perhaps the behavior is fine and should just be documented?

kolyshkin · 2023-04-18T20:28:03Z

OK let's agree to merely document the existing behavior of -a.

thaJeztah

LGTM

thaJeztah · 2023-04-19T07:23:22Z

OK let's agree to merely document the existing behavior of -a.

Are you planning to open a PR for that, or do we need a tracking ticket?

(looks like this PR may need a rebase, as it's marked "outdated")

For a previous attempt to fix that (and added test cases), see commit 9087f2e.

Alas, it's not always working because of cgroup directory TOCTOU.

To solve this and avoid the race, add an error check _after_ the operation. Implement it as a method that ignores the error that should be ignored. Instead of currentStatus(), use faster runType(), since we are not interested in Paused status here.

For Processes(), remove the pre-op check, and only use it after getting an error, making the non-error path more straightforward.

For Signal(), add a second check after getting an error. The first check is left as is because signalAllProcesses might print a warning if the cgroup does not exist, and we'd like to avoid that.

This should fix an occasional failure like this one:

not ok 84 kill detached busybox
# (in test file tests/integration/kill.bats, line 27)
#   `[ "$status" -eq 0 ]' failed
....
# runc kill test_busybox KILL (status=0):
# runc kill -a test_busybox 0 (status=1):
# time="2023-04-04T18:24:27Z" level=error msg="lstat /sys/fs/cgroup/devices/system.slice/runc-test_busybox.scope: no such file or directory"

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

kolyshkin · 2023-04-21T01:01:30Z

Are you planning to open a PR for that, or do we need a tracking ticket?

Opened #3834

(looks like this PR may need a rebase, as it's marked "outdated")

Rebased

kolyshkin · 2023-05-22T21:20:05Z

1.1 backport: #3877

kolyshkin mentioned this pull request Apr 5, 2023

libct/cg: rm GetInitCgroup[Path] #3810

Merged

kolyshkin added this to the 1.2.0 milestone Apr 5, 2023

AkihiroSuda previously approved these changes Apr 6, 2023

kolyshkin added the area/systemd, backport/1.1-pr, and backport/1.1-todo labels and removed the backport/1.1-pr label Apr 6, 2023

lifubang reviewed Apr 6, 2023

kolyshkin mentioned this pull request Apr 6, 2023

CI Flaky test: not ok 98 ps after the container stopped [centos-7] #3744

Closed

thaJeztah reviewed Apr 7, 2023

kolyshkin dismissed AkihiroSuda’s stale review via f4e94f3 April 7, 2023 00:35

kolyshkin force-pushed the sd-rm-race branch 2 times, most recently from f4e94f3 to 463675a Compare April 7, 2023 00:39

kolyshkin force-pushed the sd-rm-race branch from 463675a to 78d5ec3 Compare April 7, 2023 00:56

kolyshkin force-pushed the sd-rm-race branch from 78d5ec3 to b229e2a Compare April 11, 2023 20:58

thaJeztah approved these changes Apr 19, 2023

kolyshkin force-pushed the sd-rm-race branch 2 times, most recently from be1682d to fe278b9 Compare April 21, 2023 00:59

kolyshkin mentioned this pull request Apr 21, 2023

runc-kill(8): amend the --all description #3834

Merged

kolyshkin requested a review from AkihiroSuda April 21, 2023 01:01

AkihiroSuda approved these changes Apr 21, 2023

AkihiroSuda merged commit dac3852 into opencontainers:main Apr 21, 2023

kolyshkin mentioned this pull request May 22, 2023

[1.1] libct: fix a race with systemd removal #3877

Merged

kolyshkin added the backport/1.1-done label and removed the backport/1.1-todo label May 22, 2023
