This repository was archived by the owner on May 12, 2021. It is now read-only.

fix container fail to start #331

Closed
Ace-Tang wants to merge 1 commit into kata-containers:master from Ace-Tang:fix_start_fail

Conversation

@Ace-Tang
Contributor

copyParentCPUSet copies cpuset.cpus and cpuset.mems
from the parent, but if the parent directory's files
are also empty, it fails. Recursively copy the data from
the parent directory up to the cpuset cgroup root path.

Rename the variable cgroupsDirPath to cgroupsRootPath (cli/oci.go),
so the name is more accurate and does not conflict with cgroupsDirPath (cli/create.go).

fixes: #330

Signed-off-by: Ace-Tang aceapril@126.com

@jshachm
Member

jshachm commented May 22, 2018

@Ace-Tang Please use the "modules: summary" format for the commit message.

@sboeuf

sboeuf commented May 22, 2018

/cc @devimc

@Ace-Tang
Contributor Author

@jshachm, thanks for the reminder. I have updated the commit message.

@devimc

devimc commented May 22, 2018

CI is not happy

docker: Error response from daemon: OCI runtime create failed: Fail to copy cpuset from parent: open /sys/fs/cgroup/cpu/docker/cpuset.cpus: no such file or directory: unknown.

and this patch makes me wonder if we need integration tests for pouch

@Ace-Tang
Contributor Author

docker: Error response from daemon: OCI runtime create failed: Fail to copy cpuset from parent: open /sys/fs/cgroup/cpu/docker/cpuset.cpus: no such file or directory: unknown.

The data should be copied from /sys/fs/cgroup/cpuset/docker/cpuset.cpus; reading it from the cpu hierarchy instead is what causes this error. On my machine the cpu cgroup is a symlink to cpuset,cpu,cpuacct, so I do not get this error.

lrwxrwxrwx  1 root root 18 May  8 03:52 cpu -> cpuset,cpu,cpuacct
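
One way to avoid depending on how a distro co-mounts or symlinks the controllers is to look up the cpuset mount point instead of assuming a path. The sketch below is an illustration, not code from this patch: findCgroupMount is a hypothetical helper that takes text in the /proc/self/mountinfo format and returns the mount point of the cgroup v1 hierarchy that has the given subsystem attached.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// findCgroupMount (hypothetical helper) scans mountinfo-formatted text and
// returns the mount point of the cgroup v1 hierarchy with subsystem attached.
func findCgroupMount(mountinfo, subsystem string) (string, error) {
	scanner := bufio.NewScanner(strings.NewReader(mountinfo))
	for scanner.Scan() {
		// Each line is "<mount fields> - <fstype> <source> <super options>".
		parts := strings.SplitN(scanner.Text(), " - ", 2)
		if len(parts) != 2 {
			continue
		}
		pre, post := strings.Fields(parts[0]), strings.Fields(parts[1])
		if len(pre) < 5 || len(post) < 3 || post[0] != "cgroup" {
			continue
		}
		// For cgroup v1 the attached subsystems appear in the super options.
		// Exact comparison keeps "cpu" from matching "cpuacct" or "cpuset".
		for _, opt := range strings.Split(post[2], ",") {
			if opt == subsystem {
				return pre[4], nil // field 5 is the mount point
			}
		}
	}
	if err := scanner.Err(); err != nil {
		return "", err
	}
	return "", fmt.Errorf("no cgroup v1 mount found for subsystem %q", subsystem)
}
```

Passing the contents of /proc/self/mountinfo and "cpuset" would yield the cpuset hierarchy's real mount point both on machines where cpu and cpuset share one hierarchy and on machines where they are mounted separately.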

copyParentCPUSet copies the cpuset.cpus and cpuset.mems
from the parent, but if parent directory's file content
is also zero, it will make pid write into cpuset cgroup
error. So recursive copy data from parent directory until
cpuset cgroup root path.

Modify variable cgroupsDirPath to cgroupsRootPath(cli/oci.go),
make it more reasonable and not conflict with cgroupsDirPath(cli/create.go).

Fixes #330

Signed-off-by: Ace-Tang <aceapril@126.com>

@katabuilder

PSS Measurement:
Qemu: 143850 KB
Proxy: 6765 KB
Shim: 8931 KB

Memory inside container:
Total Memory: 2045972 KB
Free Memory: 2013040 KB

@codecov

codecov bot commented May 23, 2018

Codecov Report

Merging #331 into master will decrease coverage by 0.11%.
The diff coverage is 24%.

@@            Coverage Diff             @@
##           master     #331      +/-   ##
==========================================
- Coverage   63.89%   63.78%   -0.12%     
==========================================
  Files          87       87              
  Lines        8623     8640      +17     
==========================================
+ Hits         5510     5511       +1     
- Misses       2529     2545      +16     
  Partials      584      584

Impacted Files   Coverage Δ
cli/create.go    72.37% <0%> (-7.02%) ⬇️
cli/oci.go       90.81% <85.71%> (ø) ⬆️
cli/config.go    89.09% <0%> (+0.04%) ⬆️

@katabuilder

PSS Measurement:
Qemu: 141854 KB
Proxy: 4722 KB
Shim: 10939 KB

Memory inside container:
Total Memory: 2045972 KB
Free Memory: 2011676 KB

@devimc

devimc commented May 23, 2018

@Ace-Tang please try to fix code coverage

lgtm

thanks @Ace-Tang

Approved with PullApprove

@sboeuf

sboeuf commented May 24, 2018

@devimc I don't follow why you approved this PR while your comment on the corresponding issue #330 (comment) was saying that you could not reproduce ?

@devimc

devimc commented May 24, 2018

@sboeuf read the description of the problem

I test container with pouch, a container engine like moby.
Start a kata container with a clean cgroup path which has no child directory, it will fail

I couldn't reproduce it because I'm using docker, not pouch (#330 (comment)). This patch doesn't affect compatibility with docker but fixes compatibility with pouch, right @Ace-Tang?

@devimc devimc left a comment

waiting for @Ace-Tang response

@Ace-Tang
Contributor Author

No, @devimc, the error is not related to whether you use pouch or moby. I also reproduced it with moby, as I commented in #330.

Maybe it is related to the Linux distro. Could you try again on a CentOS machine, @devimc? I ran a simple test on an Ubuntu machine, and it seems that a newly created cgroup on Ubuntu inherits cpuset.cpus and cpuset.mems, but on CentOS it does not.

root@www:/sys/fs/cgroup/cpuset# mkdir newtest
root@www:/sys/fs/cgroup/cpuset# cd newtest/
root@www:/sys/fs/cgroup/cpuset/newtest# cat cpuset.cpus
0-3

here is my test with moby

#docker run --runtime=kata-runtime -d  719f2024853e top
5de544c966b7e3e7ca3e3b0af96c9c4c1f884c1cd30b296aa713a55844c98f01

#docker run --runtime=kata-runtime -d --cgroup-parent=cleanparent 719f2024853e top
c00494cb9a7fa516d56972a04507df6653a9cf0dcc55ebc48e55c24d5c473c14
docker: Error response from daemon: OCI runtime create failed: write /sys/fs/cgroup/cpu/cleanparent/c00494cb9a7fa516d56972a04507df6653a9cf0dcc55ebc48e55c24d5c473c14/tasks: no space left on device: unknown.

#cat /etc/redhat-release 
Alibaba Group Enterprise Linux Server release 7.2 (Paladin)

the moby version is :

#docker version
Client:
 Version:      18.03.1-ce
 API version:  1.37
 Go version:   go1.9.5
 Git commit:   9ee9f40
 Built:        Thu Apr 26 07:20:16 2018
 OS/Arch:      linux/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.03.1-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.5
  Git commit:   9ee9f40
  Built:        Thu Apr 26 07:23:58 2018
  OS/Arch:      linux/amd64
  Experimental: false

the kata version is

[root@z07a05275.sqa.zth /home/huamin.thm]
#kata-runtime -v
kata-runtime  : 0.3.0
   commit   : 2245e67f934edaa2d410384b29ab07cca305f363
   OCI specs: 1.0.1

@devimc

devimc commented May 25, 2018

sorry @Ace-Tang I can't reproduce it in centOS

$ docker run --cgroup-parent=cleanparent --runtime=kata-runtime -d centos top
3e7ee9c50d8fe76c58a2fc8bf566772e23434aef0c710ad767335a16c8001f39
$ docker run --cgroup-parent=cleanparent --cpus 3 --runtime=kata-runtime -d centos top
ede36031486ddd7cb29cd0caaae123a13e1f0c917b840cb230bd615978495067

centOS info

$ cat /etc/os-release 
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

docker info

$ docker info
Containers: 1
 Running: 0
 Paused: 0
 Stopped: 1
Images: 1
Server Version: 18.03.1-ce
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: kata-runtime runc
Default Runtime: kata-runtime
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 94b7982a1ea6bb511c74c8bc18ebfb194e87b33a (expected: 4fc53a81fb7c994640722ac585fa9ca548971871)
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-862.2.3.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 3.701GiB
Name: thoughtful-glatisant
ID: XXXXXXXXXXXXXXXXxx
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 19
 Goroutines: 33
 System Time: 2018-05-25T13:01:42.683880895Z
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

@sboeuf

sboeuf commented May 31, 2018

What's the status on this ?

@egernst
Member

egernst commented Jun 11, 2018

@Ace-Tang - any updates here?

@jshachm
Member

jshachm commented Jun 20, 2018

@Ace-Tang any new updates here ? And maybe this pr is now related to #416 since the whole hostcgroup will be refactored

@egernst
Member

egernst commented Jul 23, 2018

@Ace-Tang - I appreciate your efforts, but see this has gone very stale. I'm going to close the PR. If you have updates, please reopen! Thanks for the contributions.

@egernst egernst closed this Jul 23, 2018
@Ace-Tang Ace-Tang deleted the fix_start_fail branch July 24, 2018 01:57
zklei pushed a commit to zklei/runtime that referenced this pull request Jun 13, 2019
agent: add support for online memory and cpu separately
fidencio added a commit to fidencio/kata-runtime that referenced this pull request Sep 28, 2020
Fixes: #0

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>