[WIP] Refactor cgroup handling #416
jingxiaolu wants to merge 2 commits into kata-containers:master from jingxiaolu:refactor_cgroup_handling
This PR is re-submitted according to @sboeuf's comments at #405. @sboeuf @jodh-intel @grahamwhaley @devimc although this PR is still WIP, could you spare some time to review it? Many thanks~ cc @WeiZhang555 @jshachm

CI is failing, let me fix it first...
packages = [
  "libcontainer/configs",
  "libcontainer/cgroups",
  "libcontainer/cgroups/fs",
Why is "libcontainer/cgroups/systemd" not included? Is it unnecessary?
Should we support systemd containers? If so, I'll add "libcontainer/cgroups/systemd".
@@ -0,0 +1,111 @@
// +build linux
Could you remove this comment and just rename the file to virtcontainers/cgroups_linux.go? That's much clearer IMHO.
You also need a new file named "virtcontainers/cgroups_unsupported.go" that provides exactly the same interface, backed by the current (old) implementation, for platforms other than Linux. Otherwise it won't compile on ppc64le.
You can imitate
vendor/github.com/opencontainers/runc/libcontainer/cgroups/cgroups_unsupported.go:
@@ -0,0 +1,3 @@
+// +build !linux
+
+package cgroups
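A minimal sketch of what the non-Linux counterpart could look like, assuming the Linux file exposes the four methods discussed in this PR (newManager, addSandbox, addContainer, deleteSandbox) on cgroupsManager. Sandbox and Container are hypothetical stand-ins for the virtcontainers types, and the bodies are no-op placeholders: per the review above, the real file would keep the old (pre-refactor) behaviour instead. The build constraint is described in a comment rather than applied, so the sketch builds on any platform.

```go
package main

import "fmt"

// Sandbox and Container are hypothetical stand-ins for the virtcontainers types.
type Sandbox struct{ id string }

type Container struct{ id string }

// cgroupsManager keeps the same method set as the Linux implementation, so
// callers compile unchanged on every platform.
// In the real cgroups_unsupported.go the "+build !linux" constraint would sit
// above the package clause; it is left out so this sketch builds anywhere.
type cgroupsManager struct{}

// newManager has nothing to set up on this platform.
func (cgm *cgroupsManager) newManager(s *Sandbox) error { return nil }

// addSandbox is a placeholder; per the review, the real file would keep the
// pre-refactor behaviour here rather than doing nothing.
func (cgm *cgroupsManager) addSandbox(s *Sandbox) error { return nil }

// addContainer is a placeholder, like addSandbox.
func (cgm *cgroupsManager) addContainer(c *Container) error { return nil }

// deleteSandbox has no cgroup directories to clean up.
func (cgm *cgroupsManager) deleteSandbox(s *Sandbox) {}

func main() {
	var cgm cgroupsManager
	s := &Sandbox{id: "sandbox-1"}
	c := &Container{id: "container-1"}
	fmt.Println(cgm.newManager(s) == nil, cgm.addSandbox(s) == nil, cgm.addContainer(c) == nil)
	cgm.deleteSandbox(s)
}
```

Keeping the method set identical is what lets the rest of virtcontainers call the cgroup code without platform checks. The file suffix or build constraint alone decides which implementation gets compiled.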
}

state, _ := s.storage.fetchSandboxState(s.id)
if state.CgroupPaths == nil {
I don't really understand this code. Why can't it be just...
state, err := s.storage.fetchSandboxState(s.id)
if err != nil {
	return err
}
cgm.libcontainerManager = &fs.Manager{
	Cgroups: cgm.libcontainerConfig.Cgroups,
	Paths:   state.CgroupPaths,
}
Maybe a comment in the code would help here.
Accepted~ The code is cleaner now.
But I think we can ignore the error from fetchSandboxState() here: if state.CgroupPaths is nil, it means we are creating the first container. That's why I check whether state.CgroupPaths == nil.
I'll update it as follows, so that on the first container creation state.CgroupPaths is nil:
state, _ := s.storage.fetchSandboxState(s.id)
cgm.libcontainerManager = &fs.Manager{
	Cgroups: cgm.libcontainerConfig.Cgroups,
	Paths:   state.CgroupPaths,
}
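A possible middle ground between ignoring the error and returning it: treat only "no state saved yet" as the first container creation and propagate every other storage error. This is a minimal sketch. It assumes the storage layer wraps fs.ErrNotExist when nothing has been saved. fetchState, sandboxState and manager are hypothetical stand-ins for fetchSandboxState, the persisted sandbox state and libcontainer's fs.Manager.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
)

// sandboxState is a hypothetical stand-in for the persisted sandbox state.
type sandboxState struct {
	CgroupPaths map[string]string // subsystem -> cgroup path
}

// manager stands in for libcontainer's fs.Manager: nil Paths means the
// cgroups have not been created yet.
type manager struct {
	Paths map[string]string
}

// fetchState is a hypothetical storage lookup that wraps fs.ErrNotExist
// when no state has been saved for the sandbox yet.
func fetchState(store map[string]sandboxState, id string) (sandboxState, error) {
	st, ok := store[id]
	if !ok {
		return sandboxState{}, fmt.Errorf("sandbox %s: %w", id, fs.ErrNotExist)
	}
	return st, nil
}

// newManager treats "no saved state" as the first container creation and
// propagates any other storage error instead of silently dropping it.
func newManager(store map[string]sandboxState, id string) (*manager, error) {
	st, err := fetchState(store, id)
	if err != nil && !errors.Is(err, fs.ErrNotExist) {
		return nil, err
	}
	// On first creation st is the zero value, so Paths stays nil.
	return &manager{Paths: st.CgroupPaths}, nil
}

func main() {
	store := map[string]sandboxState{
		"existing": {CgroupPaths: map[string]string{"cpu": "/sys/fs/cgroup/cpu/kata/existing"}},
	}
	first, _ := newManager(store, "new")
	again, _ := newManager(store, "existing")
	fmt.Println(first.Paths == nil, again.Paths["cpu"])
}
```

This keeps the nil-paths behaviour described above for the first container, while a corrupted or unreadable state store still surfaces as an error.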
return fmt.Errorf("apply %d to host cgroups of sandbox %s failed with %s", shimPid, s.id, err)
}

if cgm.libcontainerConfig == nil {
Shouldn't this test be the first one in the function (fail fast)?
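The fail-fast ordering suggested here can be sketched as follows: the precondition is checked before any other work, so a manager without config returns immediately and leaves no partial state behind. cgroupConfig, apply and the error message are hypothetical simplifications, not the PR's code.

```go
package main

import (
	"errors"
	"fmt"
)

// cgroupConfig is a hypothetical stand-in for libcontainer's cgroup config.
type cgroupConfig struct{ parent string }

type cgroupsManager struct {
	libcontainerConfig *cgroupConfig
	applied            []int // pids accepted so far (illustration only)
}

var errNoConfig = errors.New("cgroup manager has no config")

// apply checks its precondition first (fail fast): with no config it
// returns before touching any state.
func (cgm *cgroupsManager) apply(pid int) error {
	if cgm.libcontainerConfig == nil {
		return errNoConfig
	}
	// Only reached with a valid config; the real code would apply pid here.
	cgm.applied = append(cgm.applied, pid)
	return nil
}

func main() {
	var cgm cgroupsManager
	fmt.Println(cgm.apply(1234), len(cgm.applied))

	cgm.libcontainerConfig = &cgroupConfig{parent: "/kata"}
	fmt.Println(cgm.apply(1234), len(cgm.applied))
}
```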
}

// deleteSandbox cleanup cgroup folders
func (cgm *cgroupsManager) deleteSandbox(s *Sandbox) {
It looks like this function should return an error as there are error scenarios it has to deal with?
I think that during deletion we shouldn't return an error that would abort the shutdown procedure; that's why I just log a warning here~
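The warn-and-continue cleanup described here can be sketched as below. deleteCgroupDirs is a hypothetical helper, not the PR's code. It assumes the manager holds a subsystem-to-path map like libcontainer's Paths, and uses a temporary directory instead of a real cgroup hierarchy. A failed removal is logged and the loop carries on. The failure count is returned only so callers can observe failures without being forced to abort.

```go
package main

import (
	"log"
	"os"
	"path/filepath"
)

// deleteCgroupDirs removes every cgroup directory it is given. A failure on
// one path is logged as a warning and the loop continues, so a stuck cgroup
// never aborts sandbox shutdown. Already-missing paths are not failures.
func deleteCgroupDirs(paths map[string]string) int {
	failures := 0
	for subsystem, p := range paths {
		if err := os.Remove(p); err != nil && !os.IsNotExist(err) {
			log.Printf("warning: could not remove %s cgroup %s: %v", subsystem, p, err)
			failures++
		}
	}
	return failures
}

func main() {
	dir, err := os.MkdirTemp("", "cgroup-sketch")
	if err != nil {
		log.Fatal(err)
	}
	defer os.RemoveAll(dir)

	cpu := filepath.Join(dir, "cpu")
	if err := os.Mkdir(cpu, 0o755); err != nil {
		log.Fatal(err)
	}
	// A non-empty directory makes os.Remove fail, like a cgroup with children.
	busy := filepath.Join(dir, "memory")
	if err := os.MkdirAll(filepath.Join(busy, "child"), 0o755); err != nil {
		log.Fatal(err)
	}

	n := deleteCgroupDirs(map[string]string{
		"cpu":    cpu,
		"memory": busy,
		"pids":   filepath.Join(dir, "gone"),
	})
	log.Printf("failures: %d", n)
}
```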
PSS Measurement: Memory inside container:

// newManager setup cgroup manager for sandbox
func (cgm *cgroupsManager) newManager(s *Sandbox) error {
	ociConfigStr, err := s.Annotations(annotations.ConfigJSONKey)
Where is s.Annotations() defined? I can't find it in the PR. Ideally we should unmarshal the OCI JSON only once per container, since it is really slow. And for an empty sandbox without containers (e.g. in the CRI case), there is no container OCI spec at all, so you need to handle that case as well. So IMO the cgroup manager needs to be created upon first container creation instead.
So we should create this "cgroupManager" when the first container is created, rather than when the sandbox is created?
Yes, because you rely on a container OCI spec to create the cgroup manager and we won't have a container spec until the first container is to be created.
Currently the only place that unmarshals the OCI JSON is at the kata-agent level, but I think newManager() should be called at the sandbox level.
I would like to:
- add a pointer named ociSpec to the Sandbox struct;
- unmarshal in newSandbox() and assign the result to ociSpec;
- when createContainer() is called at the kata-agent level, get the spec from ociSpec.
Please share your comments, thanks~
- Yes, a pointer to the first OCI spec in Sandbox makes sense, since it avoids unmarshalling the same JSON twice.
- As I stated above, one issue with calling newManager() in newSandbox() is that there may be no containers in sandboxConfig, in which case you do not have the OCI spec you need to create the cgroup manager. The right place to do it is createContainer().
- Yes, it makes sense.
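The agreed direction can be sketched like this: each container spec is unmarshalled once, the first one is kept on the sandbox, and the cgroup manager is created lazily in createContainer() instead of newSandbox(). This is a sketch under assumptions. ociSpec and linuxSpec are hypothetical, heavily trimmed stand-ins for the OCI runtime spec types, decoding only linux.cgroupsPath. The simplified createContainer signature is illustrative too.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// linuxSpec and ociSpec are hypothetical, trimmed stand-ins for the OCI
// runtime spec types; only linux.cgroupsPath is decoded.
type linuxSpec struct {
	CgroupsPath string `json:"cgroupsPath"`
}

type ociSpec struct {
	Linux *linuxSpec `json:"linux"`
}

type cgroupsManager struct{ cgroupsPath string }

// Sandbox remembers the first container's parsed spec and builds the cgroup
// manager from it; an empty sandbox (CRI case) has neither.
type Sandbox struct {
	ociSpec *ociSpec
	cgroups *cgroupsManager
}

// createContainer unmarshals the container's spec exactly once and returns it,
// so later steps (e.g. the agent call) can reuse it instead of parsing again.
func (s *Sandbox) createContainer(specJSON string) (*ociSpec, error) {
	var spec ociSpec
	if err := json.Unmarshal([]byte(specJSON), &spec); err != nil {
		return nil, fmt.Errorf("parse OCI spec: %w", err)
	}
	if s.ociSpec == nil {
		// First container: keep its spec and create the manager from it.
		s.ociSpec = &spec
		path := ""
		if spec.Linux != nil {
			path = spec.Linux.CgroupsPath
		}
		s.cgroups = &cgroupsManager{cgroupsPath: path}
	}
	return &spec, nil
}

func main() {
	s := &Sandbox{}
	fmt.Println("manager before first container:", s.cgroups != nil)

	for _, js := range []string{
		`{"linux":{"cgroupsPath":"/kata/sandbox-1"}}`,
		`{"linux":{"cgroupsPath":"/kata/other"}}`,
	} {
		if _, err := s.createContainer(js); err != nil {
			fmt.Println(err)
			return
		}
	}
	fmt.Println("manager path:", s.cgroups.cgroupsPath)
}
```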
// addContainer adding shim pid of container to sandbox's host cgroups
func (cgm *cgroupsManager) addContainer(c *Container) error {
	shimPid := c.process.Pid
Need to check for shimPid > 0 to exclude the builtin shim case.
accepted~ 👍
When shimPid == 0, should we just return nil or report an error?
shimPid == 0 happens in the noop_shim case. You can just return nil IMO.
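The agreed guard, as a minimal sketch: a zero (or negative) shim pid means there is no separate shim process to move, so addContainer returns nil instead of failing. process and Container are hypothetical stand-ins. The pids slice only records what the real code would hand to the libcontainer manager's Apply.

```go
package main

import "fmt"

// process and Container are hypothetical stand-ins; only the shim pid matters.
type process struct{ Pid int }

type Container struct{ process process }

type cgroupsManager struct {
	pids []int // pids that would be written to the sandbox's host cgroups
}

// addContainer adds the container's shim pid to the sandbox's host cgroups.
// A pid <= 0 means there is no separate shim (the noop/builtin shim case),
// so there is nothing to move and it is not an error.
func (cgm *cgroupsManager) addContainer(c *Container) error {
	shimPid := c.process.Pid
	if shimPid <= 0 {
		return nil
	}
	// The real code would apply shimPid through the libcontainer manager here.
	cgm.pids = append(cgm.pids, shimPid)
	return nil
}

func main() {
	var cgm cgroupsManager
	_ = cgm.addContainer(&Container{})                            // noop shim: pid 0
	_ = cgm.addContainer(&Container{process: process{Pid: 4242}}) // regular shim
	fmt.Println(cgm.pids)
}
```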
// addSandbox adding shim pid to host cgroups and set the resource limitation with cgroups
func (cgm *cgroupsManager) addSandbox(s *Sandbox) error {
	shimPid := s.state.Pid
What is s.state.Pid? A sandbox does not have a corresponding shim. All shims are associated with containers instead.
sandbox.state.Pid is the shim pid of the first container in the pod, in other words the pause container.
First, a sandbox can be empty, in which case there is no container in it at all. Second, you cannot assume that the first container in a sandbox is always a pause container that never exits. Such an assumption breaks in the docker and frakti cases.
Sorry, I missed the frakti case.
	sandbox.setSandboxPid(c.process.Pid)
}

if ann[annotations.ContainerTypeKey] == string(PodContainer) {
PodSandbox and PodContainer are annotations used by the kata CLI to recover the sandbox abstraction that is missing from runc-compatible command lines. There is no need to use such annotations in virtcontainers, where we know clearly what is a sandbox and what is a container.
Which means: once createContainer() is called, we are clearly creating a container, so I don't need to check PodContainer or anything else; I just add the container's pid to the cgroup.
Is that right?
Yes, your understanding is correct.
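Following that conclusion, a sketch of per-container bookkeeping. Every container created through createContainer() contributes its shim pid, no annotation is inspected, and nothing depends on the first container staying alive (the docker and frakti cases raised above). The members map, the simplified addContainer(id, shimPid) signature and the removeContainer and pids helpers are all hypothetical, not part of the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// cgroupsManager tracks which shim pids belong to the sandbox cgroup,
// keyed by container ID (hypothetical bookkeeping for illustration).
type cgroupsManager struct {
	members map[string]int
}

func newCgroupsManager() *cgroupsManager {
	return &cgroupsManager{members: make(map[string]int)}
}

// addContainer is called from createContainer for every container. It does
// not look at ContainerTypeKey, and it does not assume the first container is
// a long-lived pause container. Pids <= 0 (noop shim) are skipped.
func (cgm *cgroupsManager) addContainer(id string, shimPid int) {
	if shimPid <= 0 {
		return
	}
	cgm.members[id] = shimPid
}

// removeContainer is bookkeeping only: the kernel itself drops exited
// processes from the cgroup. Other members are unaffected.
func (cgm *cgroupsManager) removeContainer(id string) {
	delete(cgm.members, id)
}

// pids returns the tracked shim pids in ascending order.
func (cgm *cgroupsManager) pids() []int {
	out := make([]int, 0, len(cgm.members))
	for _, pid := range cgm.members {
		out = append(out, pid)
	}
	sort.Ints(out)
	return out
}

func main() {
	cgm := newCgroupsManager()
	cgm.addContainer("first", 100)
	cgm.addContainer("app", 200)
	cgm.removeContainer("first") // the first container exits; the sandbox cgroup remains valid
	fmt.Println(cgm.pids())
}
```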
shmSize uint64

cgroups cgroupsManager
Yes, I'm adding this to encapsulate the implementation and data of cgroups handling in cgroups.go (which will become cgroups_linux.go and cgroups_unsupported.go).
@jingxiaolu is currently too busy to spend enough time on this. I'll carry on his work and take over the PR.

Thanks @WeiZhang555. The branch also needs updating due to conflicts, btw.

@WeiZhang555 Any updates on this one?

@bergwolf I'll update it soon; I haven't had enough time for it in recent days.

Ping @WeiZhang555 :)

@WeiZhang555 I bet you're very busy, but just checking: is this something you're planning to look at?

@sboeuf Sorry for the delay, I'll try to finish it in the next few days. It has been blocked for so many months!

Closing this. The new implementation is #734.
# Kata Containers 1.4.0
According to #344, this PR is trying to refactor cgroups handling in the runtime.

What I've done in this PR:
- libcontainer/cgroups package for cgroups handling;
- libcontainer/cgroups;

Work still to be done:
- UpdateContainer();

Fixes: #344

Signed-off-by: Jingxiao Lu <lujingxiao@huawei.com>