test: Make tests aware of the pod cgroup option #1824
Conversation
Force-pushed from 92a5698 to b9d958f
* config: add option to use only the pod cgroup.

- Get the pod cgroup from the parent of the SANDBOX container.
- Do not create a host cgroup for each container.
- Place all Kata processes in that cgroup:
  * Proxy
  * Shim
  * Runtime (?)
  * Hypervisor

This may reduce the performance of small pods, as the hypervisor is totally constrained (no longer only partially, to vCPUs). It gives the container administrator more accurate data on resource usage.

Memory limitations: because the VMM is constrained in the pod sandbox it could be OOM-killed, so the "pod admin" should consider adding extra memory for the overhead. From the Kata perspective we should investigate what the overhead of Kata is, at least for memory, as this is a critical resource that could get all the pod workloads killed if the VM gets OOM-killed. Ideally the guest memory is large enough that the container workloads running on it get killed first and there is still room for Kata on the host to work correctly.

It may be helpful to provide the pod overhead as part of the CLI or another API:

$ kata-runtime overhead
Memory: 200 MB (Proxy, VM and Shim)

This will probably be hardcoded (as config or code), so the runtime can check the pod cgroup and fail nicely, asking for more memory in the parent cgroup for correct functionality. Additionally, we could have a metrics test verifying that it is at least possible to run a `true` or `bash` command with that defined memory.

- If the test fails due to a change that increases the memory, it should be updated. Ideally it should also be updated when the overhead is reduced.

info: docker cgroup before

Directory /sys/fs/cgroup/cpu/docker/:
└─kata_f12efaa8d43d55904f5fad8344a640536c74276bacff8bb8b0020fd056a4d5cc
  ├─29198 /opt/kata/bin/qemu-system-x86_64 -name sandbox-f12efaa8d43d55904f5f...
  └─29221 /opt/kata/libexec/kata-containers/kata-shim -agent unix:///run/vc/s...

- This mode uses all cgroup types in the pod sandbox. This config enables the option to use the cgroup parent of the container with the Sandbox annotation. To make debugging and CI easy, just name it kata-sandbox:

pod-cgroup
kata-sandbox
  2487 /opt/kata/bin/qemu-system-x86_64 -name sandbox-090142d008e431ac5d4a...
 22494 /opt/kata/libexec/kata-containers/kata-proxy -listen-socket unix://...
  2654 /opt/kata/libexec/kata-containers/kata-shim -agent unix:///run/vc/s...

Depends-on: github.com/kata-containers/tests#1824
Fixes: kata-containers#1879
Signed-off-by: Carlos Venegas <vmjcarlos@gmail.com>
Force-pushed from b9d958f to 94b106a
* config: add option to use only the pod cgroup.

- Get the pod cgroup from the parent of the SANDBOX container.
- Do not create a host cgroup for each container.
- Place all Kata processes in that cgroup:
  * Proxy
  * Shim
  * Runtime
  * Hypervisor

Moving the hypervisor and the other processes after they are launched does not provide the expected behavior for resource accounting: if a process is moved after it is created, the cgroup only starts accounting for resources used after the move. For better host accounting, move the runtime into the sandbox cgroup before anything is created; that way, all the processes created by the runtime are accounted for as expected.

Limitation: for very small memory cgroups the runtime may not work as expected, as the processes are going to get OOM-killed. TODO: a minimal pod cgroup memory should be defined and validated.

This may reduce the performance of small pods, as the hypervisor is totally constrained (no longer only partially, to vCPUs). It gives the container administrator more accurate data on resource usage, by checking the pod cgroup directly:

cat /sys/fs/cgroup/<type>/pod-cgroup/data

Intra-container usage is still available via kata-runtime events.

Memory limitations: because the VMM is constrained in the pod sandbox it could be OOM-killed, so the "pod admin" should consider adding extra memory for the overhead.

TODO:
- [ ] Provide a minimal expected Kata memory usage.

From the Kata perspective we should investigate what the overhead of Kata is, at least for memory, as this is a critical resource that could get all the pod workloads killed if the VM gets OOM-killed. Ideally the guest memory is large enough that the container workloads running on it get killed first and there is still room for Kata on the host to work correctly. It may be helpful to provide the pod overhead as part of the CLI or another API:

```
$ kata-runtime overhead
Memory: 200 MB (Proxy, VM and Shim)
```

This will probably be hardcoded (as config or code), so the runtime can check the pod cgroup and fail nicely, asking for more memory in the parent cgroup for correct functionality. Additionally, we could have a metrics test verifying that it is at least possible to run a `true` or `bash` command with that defined memory.

OR

```
Dynamic: cgroup usage - containers inside guest usage
```

- If the test fails due to a change that increases the memory, it should be updated. Ideally it should also be updated when the overhead is reduced.

info: docker cgroup before

Directory /sys/fs/cgroup/cpu/docker/:
└─kata_f12efaa8d43d55904f5fad8344a640536c74276bacff8bb8b0020fd056a4d5cc
  ├─29198 /opt/kata/bin/qemu-system-x86_64 -name sandbox-f12efaa8d43d55904f5f...
  └─29221 /opt/kata/libexec/kata-containers/kata-shim -agent unix:///run/vc/s...

- This mode uses all cgroup types in the pod sandbox. This config enables the option to use the cgroup parent of the container with the Sandbox annotation. To make debugging and CI easy, just name it kata-sandbox:

pod-cgroup
kata-sandbox-<cid-sandboxcontainer>
  2487 /opt/kata/bin/qemu-system-x86_64 -name sandbox-090142d008e431ac5d4a...
 22494 /opt/kata/libexec/kata-containers/kata-proxy -listen-socket unix://...
  2654 /opt/kata/libexec/kata-containers/kata-shim -agent unix:///run/vc/s...

Depends-on: github.com/kata-containers/tests#1824
Fixes: kata-containers#1879
Signed-off-by: Carlos Venegas <vmjcarlos@gmail.com>
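The "check the pod cgroup and fail nicely" idea above can be sketched in shell. This is an illustrative sketch only: the 200 MB figure comes from the example output above, and the function and variable names are hypothetical, not the runtime's actual API.

```shell
#!/bin/bash
# Hypothetical sketch: compare a pod cgroup memory limit against an assumed
# fixed Kata overhead, and fail with a useful message when there is no room.
# KATA_OVERHEAD_BYTES and check_pod_memory are illustrative names.

KATA_OVERHEAD_BYTES=$((200 * 1024 * 1024))

check_pod_memory() {
	# $1: path to the pod cgroup's memory.limit_in_bytes file
	local limit_file="$1"
	local limit
	limit=$(cat "${limit_file}")
	if [ "${limit}" -le "${KATA_OVERHEAD_BYTES}" ]; then
		echo "pod cgroup memory limit ${limit} leaves no room for the Kata overhead (${KATA_OVERHEAD_BYTES} bytes)" >&2
		return 1
	fi
	return 0
}
```

A metrics test could then call something like `check_pod_memory "/sys/fs/cgroup/memory/<pod-cgroup>/memory.limit_in_bytes"` before launching a `true` workload.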
Force-pushed from 94b106a to 4e97883
Ping @jcvenegas.
* config: add option to use only pod cgroup (SandboxCgroupOnly)
1) Given a PodSandbox container creation, let
```
podCgroup=Parent(cgroupPath)
```
and
```
KataSandboxCgroup=${podCgroup}/kata-sandbox-<PodSandbox-id>
```
2) Then create the KataSandboxCgroup cgroup
3) Join the KataSandboxCgroup
Any process created by the runtime will be created in the KataSandboxCgroup:
* Proxy
* Shim
* Runtime
* Hypervisor
Examples:
Limitations
* For very small memory cgroups the runtime may not work as
expected, as the processes are going to get OOM-killed.
* This may reduce the performance of small pods, where
the pod cgroup is sized to the same resources as the containers (no
overhead is considered).
Depends-on: github.com/kata-containers/tests#1824
Fixes: kata-containers#1879
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
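The three numbered steps above can be sketched in shell. The function and variable names here are illustrative only; the actual implementation lives in the Go runtime, and the `mkdir`/`cgroup.procs` lines are shown as comments because they require root.

```shell
#!/bin/bash
# Sketch of steps 1-3 above (illustrative names, not the runtime's code).

# Step 1: the pod cgroup is the parent of the sandbox container's cgroup
# path, and the Kata sandbox cgroup hangs off it with a predictable name.
derive_kata_sandbox_cgroup() {
	local container_cgroup_path="$1"
	local sandbox_id="$2"
	local pod_cgroup
	pod_cgroup=$(dirname "${container_cgroup_path}")
	echo "${pod_cgroup}/kata-sandbox-${sandbox_id}"
}

# Steps 2 and 3 would then create the cgroup and join it, e.g. for the cpu
# controller (requires root, shown only for illustration):
#   mkdir -p "/sys/fs/cgroup/cpu${kata_cgroup}"
#   echo $$ > "/sys/fs/cgroup/cpu${kata_cgroup}/cgroup.procs"
```

Because the runtime joins the cgroup before launching anything, the proxy, shim and hypervisor all inherit it.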
Force-pushed from 89ed58f to 458cd9e
/test
Force-pushed from 074d649 to c6d3f09
/test
sudo systemctl start ${cri_runtime}
for i in {1..5}; do
	sleep 5
	if [ -f "${cri_runtime_socket}" ]; then
I see some other whitespace cleanup, as well as this retry. I'm okay with the whitespace changes being included in the commit, but what's the retry for?
Because the cri-containerd job runs k8s twice, the containerd service is restarted, and it seems kubeadm tries to connect before it is ready. I added some retries to check that the socket exists.
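The retry in the diff above can be generalized into a small helper. This is a sketch; the function name and defaults are illustrative, and it keeps the `-f` test used in the original diff.

```shell
#!/bin/bash
# Poll for a file to appear instead of assuming a restarted service is
# immediately ready. wait_for_file is an illustrative name.

wait_for_file() {
	local path="$1"
	local retries="${2:-5}"
	local delay="${3:-5}"
	local i
	for ((i = 0; i < retries; i++)); do
		# Same -f check as the original diff
		[ -f "${path}" ] && return 0
		sleep "${delay}"
	done
	return 1
}
```

Usage in the script above could then read `wait_for_file "${cri_runtime_socket}" || die "socket not ready"`, making the intent of the loop explicit.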
echo "Replacing ${option_false} for ${option_true}"
# If it is false
sudo sed -i "s,^${option_false},${option_true},g" "${KATA_ETC_CONFIG_PATH}"
I would recommend using crudini here; that way we don't need to verify whether the option is commented out or set to true/false.
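Absent crudini, the sed approach can cover both the commented-out and the true/false cases in one expression. A sketch, assuming GNU sed and illustrative option/file names:

```shell
#!/bin/bash
# Toggle a "key=value" option in a TOML-ish config file, matching either an
# existing value or a commented-out line. Requires GNU sed (\? and in-place -i).

toggle_option() {
	local file="$1"
	local option="$2"
	local value="$3"
	# Matches "option=...", "#option=..." or "# option=..." at line start
	sed -i "s|^#\?[[:space:]]*${option}[[:space:]]*=.*|${option}=${value}|" "${file}"
}
```

With this, the caller never needs to know the option's current state, which is the point of the crudini suggestion as well.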
if [ -f "${KATA_ETC_CONFIG_PATH}" ]; then
	bk_file="${KATA_ETC_CONFIG_PATH}-${bk_suffix}"
	echo "backup ${KATA_ETC_CONFIG_PATH} in ${bk_file}"
add option to enable only the pod cgroup (SandboxCgroupOnly) Depends-on: github.com/kata-containers/tests#1824 Fixes: kata-containers#1879 Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
When a new sandbox is created, join its cgroup path; this will create all the proxy, shim, etc. processes in the sandbox cgroup. Depends-on: github.com/kata-containers/tests#1824 Fixes: kata-containers#1879 Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Force-pushed from ac225d8 to 7b4d416
/test
kata-containers/runtime#1880 is merged; now this can be merged too.
Force-pushed from 7b4d416 to 9c965ec
/test
Force-pushed from 9c965ec to 5ed0047
/test
Force-pushed from 5ed0047 to 9d03488
/test
@jodh-intel @egernst @devimc ready to merge, PTAL
info "Validate option is enabled"
current_value=$(kata-runtime kata-env --json | jq ".Runtime.SandboxCgroupOnly")
if [ "$current_value" != "${option_value}" ]; then
Could be simplified to:

[ "$current_value" != "${option_value}" ] && die "The option was not updated"

@@ -0,0 +1,35 @@
#!/bin/bash
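The suggested one-liner relies on a `die` helper; a self-contained sketch follows. The `die` definition mirrors the helper commonly available in the CI scripts, and `check_option_updated` is an illustrative name wrapping the comparison.

```shell
#!/bin/bash
# Minimal sketch of the suggested simplification. die() is defined here so
# the example is self-contained; check_option_updated is hypothetical.

die() {
	echo >&2 "ERROR: $*"
	exit 1
}

check_option_updated() {
	local current_value="$1"
	local option_value="$2"
	[ "${current_value}" != "${option_value}" ] && die "The option was not updated"
	return 0
}
```

The `&& die` form replaces the `if`/`fi` block from the diff with a single line while keeping the same failure message.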
This script is very similar to the enable one. To avoid the code duplication, I'd be tempted to merge them together into .ci/toggle_sandbox_cgroup_only.sh (or similar) and require the caller to specify an enable or disable argument. That will guarantee the two scripts won't diverge in future.
Done, looks better now, thanks.
Force-pushed from 9d03488 to 547940b
/test
- Add a script to enable it if needed.
- Add a script to roll the config back after enabling it.
- Add a CI job case for the pod cgroup.
- Run the Kubernetes tests again with the pod cgroup enabled for K8S_CONTAINERD_JOB.
- In the Docker tests, check whether the option is enabled so cgroups are not checked on the host.

Fixes: kata-containers#1810
Signed-off-by: Jose Carlos Venegas Munoz <jcvenega@jcvenega-nuc.zpn.intel.com>
Force-pushed from 547940b to 29618b3
/test

/test
host.
Depends-on: github.com/kata-containers/runtime#1880
Fixes: #1810
Signed-off-by: Jose Carlos Venegas Munoz <jcvenega@jcvenega-nuc.zpn.intel.com>