Provide information to agent to let it safely wait for VFIO devices to complete hotplug #2981
dgibson wants to merge 5 commits into
|
/test-ubuntu |
|
/test-ubuntu |
Codecov Report
@@            Coverage Diff             @@
##           master    #2981      +/-   ##
==========================================
+ Coverage   50.35%   50.63%   +0.27%
==========================================
  Files         120      119       -1
  Lines       15918    16719     +801
==========================================
+ Hits         8016     8466     +450
- Misses       6812     7117     +305
- Partials     1090     1136      +46 |
jodh-intel
left a comment
Thanks @dgibson - a few comments.
|
/test |
|
/test |
|
/retest |
|
/test-vfio |
|
/test-vfio |
|
/test-vfio |
|
/test-vfio |
|
Bother. I've fixed the failure with clh, but now there's another failure in the VFIO check which I don't have any theory to explain. |
|
/test-vfio |
|
@dgibson this is the error: I have restarted the job and fixed the teardown process. The runtime logs will be available here http://jenkins.katacontainers.io/job/kata-containers-runtime-vfio-PR/261/ once the job finishes. |
|
@dgibson this is the log http://jenkins.katacontainers.io/job/kata-containers-runtime-vfio-PR/261/artifact/artifacts/kata-runtime_00.gz - I hope it can help you to debug this issue |
Yeah, I got that far. I don't know why that error is occurring - it doesn't happen when I try it by hand, it's not obvious what's different about the CI environment, and it's far from easy to replicate exactly what the CI is doing. |
|
@dgibson - there are references to that device in http://jenkins.katacontainers.io/job/kata-containers-runtime-vfio-PR/261/artifact/artifacts/kernel_00.gz if that helps? |
|
/test-vfio |
Bring in the QomGet function.

shortlog:

Chenbin (1):
      typo fix

David Gibson (1):
      Add qom-get function

Jakob-Naucke (1):
      Add support for hot-plugging IBM VFIO-AP devices

Julio Montes (3):
      travis: Run coveralls after success
      github: enable github actions
      travis: disable amd64 jobs

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
hotplugVFIODevice() has several different paths, depending on whether we're plugging into a root port or a PCIE<->PCI bridge and whether we're using a regular or mediated VFIO device. We're going to want some common code on the successful exit path here, so refactor the function to allow that without duplication.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
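To illustrate the shape of such a refactor, here is a minimal Go sketch in which four plug paths converge on one shared success step. It is not the code from this commit: `plugFn`, `hotplugSketch`, the path keys, and the `onSuccess` callback are all hypothetical names invented for the example.

```go
package main

import "fmt"

// plugFn is a hypothetical stand-in for one VMM-specific hotplug call.
type plugFn func() error

// hotplugSketch selects one of four plug paths (root port or bridge,
// regular or mediated), runs it, and only then runs the shared
// success-path code. All names here are assumptions for illustration.
func hotplugSketch(rootPort, mediated bool, paths map[string]plugFn, onSuccess func() error) error {
	key := "bridge"
	if rootPort {
		key = "root-port"
	}
	if mediated {
		key += "/mediated"
	} else {
		key += "/regular"
	}

	plug, found := paths[key]
	if !found {
		return fmt.Errorf("no hotplug path for %s", key)
	}
	if err := plug(); err != nil {
		return fmt.Errorf("hotplug via %s: %w", key, err)
	}

	// Single exit point for every successful path, so follow-up work
	// is written once rather than repeated in each branch.
	return onSuccess()
}
```

Because every branch funnels into the final `onSuccess()` call, later additions to the success path need to be made in only one place.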
For several device types which correspond to a PCI device in the guest, we record the device's PCI path in the guest. We don't currently do that for VFIO devices, but we're going to need to for better handling of SR-IOV devices.

To accomplish this, we have to determine the guest PCI path from the information the VMM gives us:

- For qemu, we query the slot of the device and its bridge from QMP.

- For cloud-hypervisor, the device add interface gives us a guest PCI address. In fact this represents a design error in the clh API - there's no way it can really know the guest PCI address in general. It works in this case, because clh doesn't use PCI bridges, so the device will always be on the root bus. Based on that, the PCI path is simply the device's slot number.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
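As a rough illustration of the path construction described in this commit message, the following Go sketch joins slot numbers into a path string. The `pciSlot` type, the `guestPCIPath` function, and the two-digit hex `bridge/device` format are assumptions made for the example, not the types or encoding used by the patch itself.

```go
package main

import (
	"fmt"
	"strings"
)

// pciSlot is a hypothetical type holding a PCI slot number (0-31).
type pciSlot uint8

// guestPCIPath joins slot numbers, outermost bridge first, into a path
// such as "02/03". The string format is an assumption for this sketch.
func guestPCIPath(slots ...pciSlot) (string, error) {
	if len(slots) == 0 {
		return "", fmt.Errorf("PCI path needs at least one slot")
	}
	parts := make([]string, 0, len(slots))
	for _, s := range slots {
		// A PCI bus has 32 slots, so anything above 31 is invalid.
		if s > 31 {
			return "", fmt.Errorf("PCI slot %d out of range (0-31)", s)
		}
		parts = append(parts, fmt.Sprintf("%02x", s))
	}
	return strings.Join(parts, "/"), nil
}
```

Under these assumptions, the qemu case would call `guestPCIPath(bridgeSlot, deviceSlot)` with the two slots queried from QMP, while the cloud-hypervisor case would call `guestPCIPath(deviceSlot)` alone, since the device always sits on the root bus.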
We send information about several kinds of devices to the agent so that it can apply specific handling. We don't currently do this with VFIO devices. However, we need to do that so that the agent can properly wait for VFIO devices to be ready (previously it did that using a PCI rescan, which may not be reliable and has some very bad side effects). This patch collates and sends the relevant information.

Depends-on: github.com/kata-containers/agent#850

fixes #2664

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
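The collation step might look roughly like the Go sketch below. This is only an assumption about the general shape: the `vfioDev` and `agentDevice` types, their fields, the `"vfio"` type string, and `collateVFIO` are hypothetical and do not reproduce the runtime's or the agent's actual message definitions.

```go
package main

import "fmt"

// vfioDev is a hypothetical record of one hotplugged VFIO device.
type vfioDev struct {
	ID           string // identifier chosen by the runtime (assumed)
	GuestPCIPath string // e.g. "02/03"; empty if not yet determined
}

// agentDevice is a hypothetical message describing devices to the agent.
type agentDevice struct {
	Type     string
	ID       string
	PCIPaths []string
}

// collateVFIO gathers the guest PCI paths of a set of VFIO devices into
// one message, so the agent could wait for exactly those paths to appear.
func collateVFIO(groupID string, devs []vfioDev) (agentDevice, error) {
	msg := agentDevice{Type: "vfio", ID: groupID}
	for _, d := range devs {
		// Without a guest path the agent would have nothing to wait on.
		if d.GuestPCIPath == "" {
			return agentDevice{}, fmt.Errorf("device %s has no guest PCI path", d.ID)
		}
		msg.PCIPaths = append(msg.PCIPaths, d.GuestPCIPath)
	}
	return msg, nil
}
```

The point of the sketch is that the agent receives explicit paths to wait for, rather than triggering a PCI rescan and hoping the devices have appeared.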
The "lazy attach" mechanism [1] was added to hotplug LBS (Large BAR space) devices after re-scanning the PCI bus, fixing LBS hotplug in kata containers. Since PCI rescan is removed in kata-containers/agent#782, lazy attach is no longer needed.

fixes #2664

[1] #2461

Signed-off-by: Julio Montes <julio.montes@intel.com>
|
I'm no longer planning to pursue this in Kata 1; I'll be following up in Kata 2 instead. |
This is the runtime part of the fix for #2664 and kata-containers/agent#781.
The agent part is kata-containers/agent#850