Is your feature request related to a problem? Please describe.
A number of modern high-performance apps want to access PCI devices directly from userspace via VFIO; network functions written with DPDK are among the most prominent examples. Such apps run easily enough under runc (or other conventional container runtimes) simply by passing the /dev/vfio/vfio control device and the /dev/vfio/NN group devices into the container, using the devices part of the Linux-specific section of the container's config.json.
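For illustration, here is a rough sketch of how those config.json device entries could be generated on the host. The helper names `oci_device_entry` and `vfio_devices` are made up for this example and are not part of runc or Kata. The field names follow the `linux.devices` section of the OCI runtime spec.

```python
import os
import stat


def oci_device_entry(path: str) -> dict:
    """Build one OCI `linux.devices` entry from an existing character device node.

    Hypothetical helper for illustration only.
    """
    st = os.stat(path)
    if not stat.S_ISCHR(st.st_mode):
        raise ValueError(f"{path} is not a character device")
    return {
        "path": path,
        "type": "c",
        "major": os.major(st.st_rdev),
        "minor": os.minor(st.st_rdev),
        "fileMode": stat.S_IMODE(st.st_mode),  # permission bits only
        "uid": st.st_uid,
        "gid": st.st_gid,
    }


def vfio_devices(groups: list[str]) -> list[dict]:
    """Entries for the VFIO control device plus one node per VFIO group."""
    paths = ["/dev/vfio/vfio"] + [f"/dev/vfio/{g}" for g in groups]
    return [oci_device_entry(p) for p in paths]
```

On a host where group 42 (an arbitrary example number) is bound to vfio-pci, `{"linux": {"devices": vfio_devices(["42"])}}` would give the fragment to merge into config.json.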
At present that won't work with Kata. Although Kata has some support for passing in VFIO devices, these will typically be bound to the guest kernel's native driver, and so will need specialised logic within the container itself to access them (either via the guest kernel's driver interface, or by rebinding the devices to the vfio-pci driver within the guest).
Describe the solution you'd like
Kata should itself perform the necessary steps so that it behaves much more like a regular OCI runtime in this regard. That is, if VFIO devices are specified in the runtime spec that is given to Kata, corresponding devices should appear within the container. This means that Kata needs to:
- Rebind the guest PCI devices belonging to the passed-in VFIO groups to the vfio-pci driver within the guest
- Configure libcontainer within the guest to create the appropriate device nodes within the "inner" container
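The rebind step could look roughly like the sketch below, which uses the kernel's standard sysfs interfaces (`driver_override`, `unbind`, `drivers_probe`). It is not the kata-agent implementation. The function names and the `sysfs` parameter are assumptions made for this example, and the sketch assumes the vfio-pci module is already loaded in the guest. Note that group numbers inside the guest need not match those on the host.

```python
import os
from pathlib import Path


def group_devices(group: str, sysfs: Path = Path("/sys")) -> list[str]:
    """Return the PCI addresses (e.g. 0000:00:02.0) in the given IOMMU group."""
    return sorted(os.listdir(sysfs / "kernel" / "iommu_groups" / group / "devices"))


def rebind_to_vfio_pci(bdf: str, sysfs: Path = Path("/sys")) -> None:
    """Detach a PCI device from its current driver and hand it to vfio-pci.

    Hypothetical helper; `sysfs` is overridable so it can be tried on a fake tree.
    """
    dev = sysfs / "bus" / "pci" / "devices" / bdf
    # Restrict driver matching for this device to vfio-pci.
    (dev / "driver_override").write_text("vfio-pci")
    # Unbind the currently bound driver, if there is one.
    unbind = dev / "driver" / "unbind"
    if unbind.exists():
        unbind.write_text(bdf)
    # Ask the PCI core to probe again; the override makes vfio-pci claim it.
    (sysfs / "bus" / "pci" / "drivers_probe").write_text(bdf)
```

Calling `rebind_to_vfio_pci` for each address returned by `group_devices` would cover a whole group. After that, the matching /dev/vfio/NN node should exist in the guest and can be passed on to the inner container.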
Additional context
This will require changes in both kata-runtime and kata-agent. The runtime side issue is tracked at: kata-containers/runtime#2938
This depends on fixing the rescan vs. hotplug race which breaks VFIO hotplug entirely (#781)