This repository was archived by the owner on May 12, 2021. It is now read-only.

support for static configuration of k8s cpu manager - container level cpu affinity #878

@egernst

Description of problem

Today we only think in terms of vCPUs, that is, CPU shares and quota.

Expected result

We need to look at CPU affinity, and in particular the Cpus field of the OCI runtime spec's LinuxCPU struct:

https://github.com/kubernetes-sigs/cri-o/blob/master/vendor/github.com/opencontainers/runtime-spec/specs-go/config.go#L304

In a default Kubernetes configuration, this mask should cover all CPUs. However, if the CPU manager is configured with the static policy, kubelet can start setting CPU affinity at container granularity on a best-effort basis. With this in place, each container can receive its own specific mask.

Actual result

Today, if a user sets up a mixed cluster with runc and Kata, the Kata runtime ignores the CPU set passed in, so the vCPU (and vhost) threads run across all available CPUs (no CPU isolation is in place; affinity is managed by kubelet itself). As a result, Kata-based containers not only miss out on performance-tuned affinity, but also likely use CPUs that kubelet wanted dedicated to other containers.

Proposal

Mandatory:

The following would need to be done in order to make sure we aren't utilizing CPUs dedicated to other pods.

  • Augment virtcontainers to track the Cpus field provided as part of the runtime spec passed to UpdateContainer. The sandbox-level cpuset mask, which would be the bitwise OR (union) of all container cpuset masks, would be utilized to constrain vCPU threads (and perhaps vhost threads).
  • Track the PID(s) associated with the sandbox's QEMU process, and set their affinity (as taskset does) to the OR of each individual container's cpuset mask. The same would be needed for vhost threads and I/O threads.
  • Update this sandbox-level mask each time an UpdateContainer call includes an updated cpuset mask.

Ideally, though secondary to the mandatory changes:

With the mandatory changes in place, we'll be honoring the CPU set provided, but we won't be providing CPU affinity on a per-container basis. To fully support CPU affinity in Kubernetes, we'd also need to:

  • Track the mapping of physical CPUs to vCPUs inside the guest.
  • Pin container processes to a particular vCPU set based on that CPU mapping.
  • Look into documenting how to allocate more CPUs to the system pool for running the non-container threads associated with Kata (i.e., run vhost threads and/or shim processes on a particular CPU set as well).

Labels: feature (New functionality)