[RFC] cgroup: Move all qemu to sandbox cgroup #1431
Conversation
Instead of moving only the vCPUs, move the whole qemu process. This will reduce network and IO performance for CPU-limited pods, but will provide better performance isolation so the sandbox is not a noisy neighbor. Fixes: kata-containers#1430 Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Force-pushed 0c4caed to e631f2b
@Weichen81 you added this optimization in your first implementation; I'd like to hear your comments about this. @kata-containers/architecture-committee please take a look. |
|
/test |
mcastelino
left a comment
This makes sense.
To clarify: as long as all other containers are running without requiring the Guaranteed QoS class and the static CPU manager policy, this will have no impact?
|
Adding only the vCPUs was first introduced by me, I guess: #1189 (comment). There's a gap between the container world and the virtualization world: namespace-based containers normally have only a tiny overhead which can be ignored, while virtualization-based containers (AKA Kata) carry considerable overhead from the hypervisor. Before this commit, resource limits were applied only to the vCPU threads. After this commit, the qemu IO threads will be limited too. This raises two questions:
@egernst mentioned he is working on a new K8s proposal that can add pod overhead to the Pod spec; I hope it can help close the overhead gap. This commit helps solve the noisy neighbor problem, but I doubt whether it's worth the performance downgrade. Let's see what others think about it. @kata-containers/runtime @kata-containers/architecture-committee |
bergwolf
left a comment
IMO, this change will actually break user expectations when they require dedicated CPUs for their application through the k8s static CPU manager. I think we need to look at the bigger picture and see how to integrate with the k8s CPU manager policies being discussed in #878.
```go
	}); err != nil {
		return err
	}
	if err := cgroup.Add(cgroups.Process{Pid: pid}); err != nil {
```
The pid is also added above to a no-constraints cgroup. There is no need to do it anymore.
The no-constraints cgroup only adds the memory subsystem; to actually limit qemu in the rest of the subsystems we need to add it to this cgroup.
But it makes me wonder: would it be better to add qemu only to the cpuset subsystem rather than all the subsystems this cgroup has? That way, a container that is limited by CPU quota but not by cpuset will not have performance limitations.
Yes, we should not share the container CPU quota with the qemu process. Please take a look at #878 (comment).
That is an interesting idea. It requires making k8s a bit more aware of Kata Containers, so it sounds like a longer-term final solution. But anything that makes the ecosystem aware of the limitations we have in Kata is better. I can bring this topic to the next architecture meeting.
Hi @jcvenegas - did this happen? If so, can you summarise the discussion here?
|
Errors:
- Nemu CI error, failed to stop container
- Metrics memory degradation
- Failed factory cleanup
- Suse failed due to network issue, restarting
- fedora & vsock: docker state timeout, failed in 2 fedora jobs, race condition? http://jenkins.katacontainers.io/job/kata-containers-runtime-fedora-vsocks-PR/464/console
- ARM linter issues, not related
Agreed, this actually comes from that gap. If I could rework how Kubernetes schedules and requests pods through CRI, I would add sandbox overhead limits, and check whether a container creation request carries the pod sandbox annotation; the container from that request would then be limited together with the sandbox's additional bits: qemu, proxy, and other components.
This is a tricky question: if a user pays for 2 CPUs, would I give him an additional one for free because I know there is overhead? xD
This kind of overhead accounting or scheduling is a good item to add to the proposal, +1. If Kubernetes could assign a best-effort cgroup for the iothreads that would be awesome :).
I understand the worry about requested sandbox resources; the same happens with memory, where we do not account for the extra kernel or agent memory usage when providing memory to the containers. Again, @egernst's proposal sounds like the key for it. This topic is like a double-edged sword: on one side, the container has to pay for the sandbox overhead; on the other, we limit the sandbox so that overhead is not noisy and we keep the promise of using only the resources that were assigned. Something I need to confirm is whether k8s restricts cpusets at the pod level of its cgroup hierarchy; I don't think so, but that would simplify the issue a bit.
|
|
This will lead to a problem which we have met in production. |
|
@jcvenegas nudge. |
|
@jcvenegas any updates? Thx! |
|
ping @jcvenegas |
|
Ping @jcvenegas |
|
Closing the PR -- this is handled in #1880 |
Instead of moving only the vcpus, move the qemu process.
This will reduce network and IO performance for
cpu-limited pods but will provide better performance
isolation so the sandbox is not a noisy neighbor.
Fixes: #1430
Signed-off-by: Jose Carlos Venegas Munoz jose.carlos.venegas.munoz@intel.com