Is your feature request related to a problem? Please describe.
A number of modern high-performance apps want to access PCI devices directly from userspace via VFIO; network functions written with DPDK are one of the most prominent examples. Such apps can run easily enough in runc (or other conventional container runtimes) simply by passing the /dev/vfio/vfio control device and the /dev/vfio/NN group devices into the container (using the devices part of the Linux-specific section in the container's config.json).
At present that won't work with Kata. Although Kata has some support for passing in VFIO devices, these will typically be bound to the guest kernel's native driver, and so will need specialised logic within the container itself to access them (either via the guest kernel's driver interface, or by rebinding the devices to the vfio-pci driver within the guest).
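For reference, the runc-style passthrough described above amounts to listing the VFIO character devices under linux.devices in config.json. The following is a minimal sketch, not taken from any runtime's code, of how such entries could be generated from the host's device nodes. The function names and the dev_root and stat_fn parameters are hypothetical and exist only for illustration; the entry fields (path, type, major, minor, fileMode, uid, gid) are those defined by the OCI runtime spec.

```python
import os
import stat


def vfio_device_entry(path, st):
    """Build one OCI linux.devices entry from a stat result for `path`."""
    if not stat.S_ISCHR(st.st_mode):
        raise ValueError(f"{path} is not a character device")
    return {
        "path": path,
        "type": "c",  # VFIO nodes are character devices
        "major": os.major(st.st_rdev),
        "minor": os.minor(st.st_rdev),
        "fileMode": stat.S_IMODE(st.st_mode),  # permission bits only
        "uid": st.st_uid,
        "gid": st.st_gid,
    }


def vfio_devices(group_ids, dev_root="/dev/vfio", stat_fn=os.stat):
    """Return entries for the control device plus each listed VFIO group.

    dev_root and stat_fn are hypothetical hooks so the sketch can be
    exercised without real VFIO devices.
    """
    paths = [os.path.join(dev_root, "vfio")]
    paths += [os.path.join(dev_root, str(g)) for g in group_ids]
    return [vfio_device_entry(p, stat_fn(p)) for p in paths]
```

On a host where group 12 is bound to vfio-pci, json.dumps(vfio_devices([12]), indent=2) would produce a fragment suitable for pasting into the linux.devices array.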
Describe the solution you'd like
Kata should itself perform the necessary steps so that it behaves much more like a regular OCI runtime in this regard. That is, if VFIO devices are specified in the runtime spec that is given to Kata, corresponding devices should appear within the container. This means that Kata needs to:
- Rebind the guest PCI devices belonging to the passed-in VFIO groups to the vfio-pci driver within the guest
- Configure libcontainer within the guest to create the appropriate device nodes within the "inner" container
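The rebinding step above can be sketched with the kernel's standard sysfs driver_override mechanism. This is only an illustration under stated assumptions, not a proposal for how kata-agent should implement it: it assumes the guest PCI address of each device is already known, that the vfio-pci driver is available in the guest, and the function names and the sysfs parameter are hypothetical. The second helper shows how the guest-side group node, which the inner container would need, could be found once the device has an IOMMU group.

```python
import os


def rebind_to_vfio_pci(bdf, sysfs="/sys"):
    """Rebind one PCI device (e.g. "0000:00:02.0") to vfio-pci via sysfs."""
    dev_dir = os.path.join(sysfs, "bus/pci/devices", bdf)

    # 1. Tell the PCI core that only vfio-pci may claim this device.
    with open(os.path.join(dev_dir, "driver_override"), "w") as f:
        f.write("vfio-pci")

    # 2. Detach the device from whichever driver currently owns it, if any.
    unbind = os.path.join(dev_dir, "driver", "unbind")
    if os.path.exists(unbind):
        with open(unbind, "w") as f:
            f.write(bdf)

    # 3. Ask the PCI core to re-probe the device; the override set in
    #    step 1 means vfio-pci is the driver that gets matched.
    with open(os.path.join(sysfs, "bus/pci/drivers_probe"), "w") as f:
        f.write(bdf)


def guest_vfio_node(bdf, sysfs="/sys"):
    """Return the /dev/vfio/NN path for the device's guest IOMMU group."""
    link = os.path.join(sysfs, "bus/pci/devices", bdf, "iommu_group")
    group = os.path.basename(os.readlink(link))
    return os.path.join("/dev/vfio", group)
```

Note that the group number seen inside the guest generally differs from the host's, so the device nodes created in the inner container would have to be derived from the guest's view rather than copied from the runtime spec.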
Additional context
This will require changes in both kata-runtime and kata-agent. A corresponding issue for kata-agent will follow.
It requires that the Kata guest image be built to include the necessary VFIO drivers (on x86, typically vfio, vfio_pci, vfio_iommu_type1 and vfio_virqfd).
It requires that the Kata guest run with a virtual IOMMU to allow the guest's VFIO drivers to operate. The existing enable_iommu option in configuration.toml can be used for this.