fix path error in systemd when stopped #2338

Merged
kolyshkin merged 2 commits into opencontainers:master from lifubang:systemdcgroupv2
Jun 16, 2020

@lifubang
Member

@lifubang lifubang commented Apr 21, 2020

Fixes #2337

When the container is in the stopped state, the cgroup path has already been deleted by the systemd driver.

Signed-off-by: lifubang <lifubang@acmcoder.com>

@lifubang
Member Author

lifubang commented Apr 21, 2020

This PR only fixes cgroup v2.
cgroup v1 has the same problem; should we fix it there as well?

Contributor

@kolyshkin kolyshkin left a comment

I'd rather check if path exists in callers, whenever deemed necessary (or ignore ENOENT in some cases)

@lifubang lifubang force-pushed the systemdcgroupv2 branch 3 times, most recently from 0e410f7 to 157518f Compare April 23, 2020 13:45
@lifubang
Member Author

> I'd rather check if path exists in callers

Force pushed.
And the label is "area/systemd", not "area/cgroupv2". @kolyshkin

@lifubang lifubang force-pushed the systemdcgroupv2 branch 4 times, most recently from f90a6ec to cd9408b Compare April 24, 2020 08:47
@lifubang
Member Author

Travis CI is green now!

Contributor

@kolyshkin kolyshkin left a comment

I think it's better to add an Exists() method to cgroupManager and use it.

}
}

pids, err = c.cgroupManager.GetAllPids()
Member

Couldn't this already check for os.IsNotExist(err)? (Also wondering if c.cgroupManager.GetAllPids() should ignore os.IsNotExist(err) by default: if the path doesn't exist, just return an empty list of pids?)
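
To make the question concrete, here is a minimal sketch of a PID reader that treats a missing cgroup directory as "no processes". The helper name `readPids` and the example path are hypothetical and are not runc's actual implementation; only the `cgroup.procs` file name comes from the kernel's cgroup interface.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strconv"
)

// readPids is a hypothetical helper (not runc code). It reads PIDs from
// <dir>/cgroup.procs and treats a missing file or directory as an empty
// list instead of an error.
func readPids(dir string) ([]int, error) {
	f, err := os.Open(filepath.Join(dir, "cgroup.procs"))
	if err != nil {
		if errors.Is(err, fs.ErrNotExist) {
			// The cgroup is already gone, so there is nothing to list.
			return []int{}, nil
		}
		return nil, err
	}
	defer f.Close()

	pids := []int{}
	s := bufio.NewScanner(f)
	for s.Scan() {
		line := s.Text()
		if line == "" {
			continue
		}
		pid, err := strconv.Atoi(line)
		if err != nil {
			return nil, err
		}
		pids = append(pids, pid)
	}
	return pids, s.Err()
}

func main() {
	// Assumed-missing path, used only for illustration.
	pids, err := readPids("/nonexistent/cgroup/path")
	fmt.Println(len(pids), err)
}
```

The trade-off raised in the replies applies here: silently returning an empty list also hides the case where the path is missing for a container that is still running.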

Member Author

@lifubang lifubang May 2, 2020

> wouldn't this already be able to have a check for os.IsNotExist(err)?

Yes, but with the fs cgroup driver there is no error when the container is in the stopped state. And I think we should not ignore an IsNotExist error when the container is not in the stopped state, or when it is not in a private pid namespace.

> (Also wondering if c.cgroupManager.GetAllPids() should ignore os.IsNotExist(err) by default (if the path doesn't exist, just return an empty list of pids?))

I think we should not ignore an IsNotExist error when the container is not in the stopped state, or when it is not in a private pid namespace.

@lifubang lifubang force-pushed the systemdcgroupv2 branch 4 times, most recently from 146afb1 to 9e6d835 Compare May 3, 2020 12:02
@kolyshkin
Contributor

kolyshkin commented May 14, 2020

@lifubang this needs a rebase

@lifubang lifubang force-pushed the systemdcgroupv2 branch 2 times, most recently from d946961 to b565295 Compare May 15, 2020 22:57
pid := 1
stat, err := system.Stat(pid)
if err != nil {
t.Fatalf("can't state pid %d", pid)
Contributor

nit: s/state/stat/
nit: include the actual error as well

Member Author

done

cgroupManager: &mockCgroupManager{
allPids: []int{1, 2, 3},
paths: map[string]string{
"device": "/proc/self/cgroups",
Contributor

this looks kind of weird.

Member Author

It's mock data; I don't have any idea how to improve it.


func (m *manager) Exists() bool {
paths := m.GetPaths()
return cgroups.PathExists(paths["devices"])
Contributor

I think you can just do

return cgroups.PathExists(m.Path("devices"))

Member Author

done

}

func (m *legacyManager) Exists() bool {
path, err := getSubsystemPath(m.cgroups, "devices")
Contributor

getSubsystemPath is super expensive: it parses the whole /proc/self/mountinfo.

I think you should be using m.Path("devices") here.

Member Author

done

@AkihiroSuda
Member

AkihiroSuda commented Jun 2, 2020

@lifubang PTAL at Kir's comments. Also, can we have an integration test?

When we use cgroups with the systemd driver, the cgroup path is automatically
removed by systemd once all processes have exited. So we should check that the
cgroup path exists before we access it, for example in `kill/ps`; otherwise we
will get an error.

Signed-off-by: lifubang <lifubang@acmcoder.com>
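
The check described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR: the trimmed `cgroupManager` interface, the `removedCgroup` type, and the `listPids` function are hypothetical names, and only the `Exists()` and `GetAllPids()` method names come from the discussion.

```go
package main

import (
	"errors"
	"fmt"
)

// cgroupManager is a trimmed-down, hypothetical stand-in for the interface
// discussed in this PR; only the two methods needed here are included.
type cgroupManager interface {
	Exists() bool
	GetAllPids() ([]int, error)
}

// removedCgroup simulates a cgroup that systemd has already cleaned up.
type removedCgroup struct{}

func (removedCgroup) Exists() bool { return false }

func (removedCgroup) GetAllPids() ([]int, error) {
	return nil, errors.New("cgroup path does not exist")
}

// listPids returns no PIDs for a stopped container whose cgroup is gone.
// In every other case it defers to the manager, so real errors still surface.
func listPids(m cgroupManager, stopped bool) ([]int, error) {
	if stopped && !m.Exists() {
		return nil, nil
	}
	return m.GetAllPids()
}

func main() {
	pids, err := listPids(removedCgroup{}, true)
	fmt.Println(len(pids), err)
}
```

Tying the shortcut to the stopped state keeps a missing path an error while the container is supposed to be running.
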
@lifubang
Member Author

lifubang commented Jun 2, 2020

> @lifubang PTAL at Kir's comments. Also, can we have an integration test?

finished now, PTAL.


@test "ps after the container stopped" {
# ps is not supported, it requires cgroups
requires root
Member

should support rootless cgroup

@AkihiroSuda
Member

CI failing

Signed-off-by: lifubang <lifubang@acmcoder.com>
Member

@AkihiroSuda AkihiroSuda left a comment

Thanks

func (m *mockCgroupManager) Exists() bool {
paths := m.GetPaths()
if paths != nil {
_, err := os.Lstat(paths["devices"])
Contributor

While it is technically OK to use m.GetPaths() here, and it's mock code so it doesn't really matter, I'd still like to have m.Path("devices") used here, because since commit 714c91e we're not supposed to use GetPaths() for anything other than state save/restore.

}

func (m *manager) Exists() bool {
return cgroups.PathExists(m.paths["devices"])
Contributor

It probably doesn't matter in practice, but we're not supposed to access a map without holding a lock. This is why I have suggested using m.Path("devices") earlier -- it takes a lock before accessing m.paths.

Alternatively, you can leave this code as is but add taking a lock (same as Path()).
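
The locking concern can be illustrated with a small sketch. The struct below is a hypothetical simplification: the names `manager`, `paths`, `Path`, and `Exists` follow the review, while the `mu` field, the method bodies, and the local `pathExists` helper (standing in for something like cgroups.PathExists) are assumptions made for the example.

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// manager is a hypothetical, simplified v1-style manager. The bodies are
// illustrative only and do not reproduce runc's code.
type manager struct {
	mu    sync.Mutex
	paths map[string]string
}

// Path returns the path for a subsystem, holding the lock while the map
// is read.
func (m *manager) Path(subsys string) string {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.paths[subsys]
}

// pathExists is a local stand-in for a helper such as cgroups.PathExists.
func pathExists(p string) bool {
	_, err := os.Stat(p)
	return err == nil
}

// Exists goes through Path, so the map is never read without the lock.
func (m *manager) Exists() bool {
	return pathExists(m.Path("devices"))
}

func main() {
	// "." is used as a path that exists, purely for demonstration.
	m := &manager{paths: map[string]string{"devices": "."}}
	fmt.Println(m.Exists())
}
```

Routing the lookup through `Path` keeps the locking in one place; the alternative mentioned above, taking the lock directly inside `Exists`, would behave the same as long as every reader of the map does so.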

}

func (m *legacyManager) Exists() bool {
return cgroups.PathExists(m.paths["devices"])
Contributor

ditto

Contributor

@kolyshkin kolyshkin left a comment

A couple of nits about using Path() vs GetPaths() and locking when accessing the m.paths map in v1 code.

@mrunalp
Contributor

mrunalp commented Jun 11, 2020

@lifubang could you update the PR?

@AkihiroSuda
Member

@kolyshkin @mrunalp
Let's merge this as-is and cover the nits in a follow-up PR.

@mrunalp
Contributor

mrunalp commented Jun 15, 2020

I am fine with that.

@AkihiroSuda
Member

@kolyshkin WDYT?

@kolyshkin kolyshkin merged commit 5b247e7 into opencontainers:master Jun 16, 2020
@kolyshkin kolyshkin mentioned this pull request Jun 16, 2020
Development

Successfully merging this pull request may close these issues.

systemd: ps/kill can't work when container is in stopped state

5 participants