Skip to content
This repository was archived by the owner on May 12, 2021. It is now read-only.

Fix rescan PCI conflicts with shpchp / pciehp#2461

Merged
lifupan merged 1 commit into
kata-containers:masterfrom
Jimmy-Xu:support-lazy-attach-device
Feb 26, 2020
Merged

Fix rescan PCI conflicts with shpchp / pciehp#2461
lifupan merged 1 commit into
kata-containers:masterfrom
Jimmy-Xu:support-lazy-attach-device

Conversation

@Jimmy-Xu
Copy link
Copy Markdown
Contributor

@Jimmy-Xu Jimmy-Xu commented Feb 17, 2020

This is a solution to issue #2460 .

Since the pci rescan will conflict with shpchp / pciehp, we can attach the Nvidia GPU after creating the container, and then the pci rescan will not include the Nvidia GPU.
shpchp / pciehp will handle hot plugging of Nvidia GPU correctly.

There may be better solutions, but this is a working solution.

@Jimmy-Xu Jimmy-Xu force-pushed the support-lazy-attach-device branch 4 times, most recently from 29c7f51 to 01e1a87 Compare February 17, 2020 10:45
Copy link
Copy Markdown
Member

@bergwolf bergwolf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@bergwolf
Copy link
Copy Markdown
Member

/test

@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 17, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@0c3b2c0). Click here to learn what that means.
The diff coverage is 16.66%.

@@            Coverage Diff            @@
##             master    #2461   +/-   ##
=========================================
  Coverage          ?   50.57%           
=========================================
  Files             ?      116           
  Lines             ?    16597           
  Branches          ?        0           
=========================================
  Hits              ?     8394           
  Misses            ?     7177           
  Partials          ?     1026

@devimc
Copy link
Copy Markdown

devimc commented Feb 17, 2020

@Jimmy-Xu does pci-rescan is still needed? I added it in clear-containers because a bug in the pcie-pci-bridges (q35) that was fixed long time ago (qemu > 3.0)

@Jimmy-Xu
Copy link
Copy Markdown
Contributor Author

Jimmy-Xu commented Feb 18, 2020

@devimc
I tried to comment out rescanPciBus in kata-agent and this solved my problem. However, I am not sure if it will affect the hot-plugging of other types of devices, so I did not remove it.

So, I added a parameter enable_lazy_attach_device in the kata config, This solution works only when "enable_lazy_attach_device" is true. Currently, only vfio display devices will delay attach.

@devimc
Copy link
Copy Markdown

devimc commented Feb 18, 2020

@Jimmy-Xu I think we can simplify the things a lot with a simple agent's option in the kernel command like, how about adding a new option (agent.no_pci_rescan) in order to don't rescan the PCI bus?

cc @bergwolf

@Jimmy-Xu
Copy link
Copy Markdown
Contributor Author

@devimc This is also a solution.

@amshinde
Copy link
Copy Markdown
Member

@Jimmy-Xu I am not in favor of adding a config option in the configuration.toml for delaying attaching devices.
Can we instead have logic in the runtime itself to detect {devices that are hotplugged using pcie native hotplug}/{devices that have a large configuration space} and instead attach those devices in a delayed fashion.
We typically want to have configuration options for supporting diff configurations to the user, I dont think that is the case here.

Also, @devimc pointed out about not rescanning PCI bus again. Here is the background for why the rescan was added in the first place:
kata-containers/agent#380 (comment)

According to that comment and the one following it, we do not need to rescan the PCI bus at all for pc machine type. (@devimc We can probably fix that right away for pc machine). The rescan was added to support attaching devices in case of q35 on a pci-bridge to account for the 5 sec SHPC delay so that we do not have to wait for 5 sec for a device to be detected by the agent. It is certainly not required for devices that are attached using native pcie hotplug.
I dont think we hotplug all devices on pcie ports in case of q35, I think today it is a mix of pci-bridges and pcie ports. So until we get rid of attaching any devices on pci-bridges in case of q35, I dont think we can get rid of the pci bus rescan in case of q35.

@lifupan
Copy link
Copy Markdown
Member

lifupan commented Feb 19, 2020

Hi @devimc @amshinde,

In summary, we couldn't remove the rescan for q35 to avoid the 5 sec SHPC delay.
But for those pci devices need much more io spaces shouldn't be rescaned, since the rescan
couldn't reallocate enough IO space for it, and it must depend on the Hotplug mechanism(pcie native hotplug or SHPC). Thus, we should delay those devices hotplug until the "rescan" action had done.

I think @Jimmy-Xu 's way is a good choice: dividing those device into two classes: one was didn't need big io space and could be rescaned, we could hotplug them before the "rescan" action, and the other class is those devices need a big io spaces and depened on the hotplug
mechanism, do the hotplug after the "rescan" action done.

Of course, it's better to do this classification automatically, but I could't figure out an easy way to recognize whether the specific device needs a big IO space or not.

@devimc
Copy link
Copy Markdown

devimc commented Feb 19, 2020

@lifupan

In summary, we couldn't remove the rescan for q35 to avoid the 5 sec SHPC delay.

does kata-agent wait 5 seconds for the device before creating the container? otherwise the workload has to wait 5 seconds until the device is visible in the container

@lifupan
Copy link
Copy Markdown
Member

lifupan commented Feb 20, 2020

@lifupan

In summary, we couldn't remove the rescan for q35 to avoid the 5 sec SHPC delay.

does kata-agent wait 5 seconds for the device before creating the container? otherwise the workload has to wait 5 seconds until the device is visible in the container

Hi, @devimc , the answer is no. That's why we shouldn't remove the "rescan" action.

@Jimmy-Xu
Copy link
Copy Markdown
Contributor Author

@amshinde

pci rescan is used to ensure that certain devices are connected when the container is created, it will discover all devices connected to the VM via QMP.
shpchp / pciehp is required to hot-plug PCIe devices with large BAR space, with a delay of 5 seconds when the guest operating system starts.

By default, the pci rescan runs before shpchp / pciehp, which prevents proper hot-plugging of PCIe devices.

Since the pci rescan cannot be removed, and the Large BAR space devices require shpchp / pciehp. The solution might be to run pci rescan after shpchp / pciehp woks, or attach the large BAR space devices after creating the container.

For the first solution, the kata-agent needs to check when shpchp / pciehp works, but
Not all solutions require shpchp / pciehp, so kata-agent also needs to distinguish whether shpchp / pciehp is required.

For the second solution, the kata shim can identify large BAR space devices and delay attachment.

In this PR:
I identified all PCIe display VFIO device, then delay to attach them in q35.
enable_lazy_attach_device is just a switch, which is off by default. After all, few devices have Large BAR space.

Another way:
Remove the enable_lazy_attach_device parameter in config. Identify all Large BAR space devices, then delay to attach them by default in q35.

Maybe there is a way to identify Large BAR space device
(run the following command line on the host):

$ lspci -nn  | grep -i nvidia
04:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:15f8] (rev a1)
84:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:15f8] (rev a1)

$lspci -s 04:00.0 -vv | grep Region
	Region 0: Memory at c6000000 (32-bit, non-prefetchable) [disabled] [size=16M]
	Region 1: Memory at 383800000000 (64-bit, prefetchable) [disabled] [size=16G] <<<<<<<
	Region 3: Memory at 383c00000000 (64-bit, prefetchable) [disabled] [size=32M]

$cat /sys/bus/pci/devices/0000\:04\:00.0/resource
0x00000000c6000000 0x00000000c6ffffff 0x0000000000040200
0x0000383800000000 0x0000383bffffffff 0x000000000014220c   <<<<<<<
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000383c00000000 0x0000383c01ffffff 0x000000000014220c
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000

@Jimmy-Xu Jimmy-Xu force-pushed the support-lazy-attach-device branch 3 times, most recently from 317bc1a to 3e53560 Compare February 24, 2020 06:45
@Jimmy-Xu
Copy link
Copy Markdown
Contributor Author

/test

@Jimmy-Xu
Copy link
Copy Markdown
Contributor Author

@amshinde @devimc @bergwolf @lifupan:
I updated the solution:
detect VFIO device, if it is a device with large BAR space, delay to attach it.

Copy link
Copy Markdown

@devimc devimc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @Jimmy-Xu , this pr looks much better 😄

I left some comments

Comment thread virtcontainers/device/manager/utils.go Outdated
Comment thread virtcontainers/device/manager/utils.go Outdated
Comment thread virtcontainers/device/manager/utils.go Outdated
Comment thread virtcontainers/device/manager/utils.go
Comment thread virtcontainers/device/manager/utils.go Outdated
@Jimmy-Xu Jimmy-Xu force-pushed the support-lazy-attach-device branch from 3e53560 to 39ee2a6 Compare February 25, 2020 02:24
Copy link
Copy Markdown
Member

@amshinde amshinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the solution:
detect VFIO device, if it is a device with large BAR space, delay to attach it.

Thanks @Jimmy-Xu This is what I was looking for, detecting devices that require large BAR space and attaching them after PCI scan. PR looks lot better. I have left a few comments.

Comment thread virtcontainers/device/manager/utils.go
Comment thread virtcontainers/device/manager/utils.go Outdated
Comment thread virtcontainers/device/manager/utils.go Outdated
@Jimmy-Xu
Copy link
Copy Markdown
Contributor Author

/test

Copy link
Copy Markdown

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Jimmy-Xu

Comment thread virtcontainers/container.go
Comment thread virtcontainers/device/manager/utils.go
Comment thread virtcontainers/device/manager/utils.go Outdated
Comment thread virtcontainers/device/manager/utils.go Outdated
- support attach large bar space vfio devices after create container

fixes kata-containers#2460

Signed-off-by: Jimmy Xu <junming.xjm@antfin.com>
@Jimmy-Xu Jimmy-Xu force-pushed the support-lazy-attach-device branch from 39ee2a6 to c6cc8b9 Compare February 26, 2020 03:56
@Jimmy-Xu
Copy link
Copy Markdown
Contributor Author

/test

Copy link
Copy Markdown

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Jimmy-Xu.

lgtm

@lifupan lifupan merged commit fde6447 into kata-containers:master Feb 26, 2020
devimc pushed a commit to devimc/kata-agent that referenced this pull request May 7, 2020
PCI bus rescan code was added long time ago in Clear Containers due to lack of
ACPI support in QEMU 2.9 + q35 [1]. Now this code is messing up PCIe hotplug
in Kata Containers. A workaround to this issue is the "lazy attach"
mechanism [2] that hotplugs LBS (Large BAR space) devices after re-scanning the
PCI bus, unfourtunately some non-LBS devices are being affected too, for
instance SR-IOV devices. It would not make sense to lazy-attach non-LBS
devices because kata will end up lazy-attaching all the devices, having said
that, the PCI bus rescan code and the "lazy attach" mechanism should be removed

fixes kata-containers#781
fixes kata-containers/runtime#2664

[1]: clearcontainers/agent#139
[2]: kata-containers/runtime#2461

Signed-off-by: Julio Montes <julio.montes@intel.com>
devimc pushed a commit to devimc/kata-runtime that referenced this pull request May 7, 2020
The "lazy attach" mechanism [1] was added to hotplugs LBS (Large BAR space)
devices after re-scanning the PCI bus, fixing LBS hotplug in kata containers.
Since PCI rescan is removed in kata-containers/agent#782, lazy attach is not
longer needed.

Depends-on: github.com/kata-containers/agent#782
fixes kata-containers#2664

[1] kata-containers#2461

Signed-off-by: Julio Montes <julio.montes@intel.com>
devimc pushed a commit to devimc/kata-agent that referenced this pull request May 7, 2020
PCI bus rescan code was added long time ago in Clear Containers due to lack of
ACPI support in QEMU 2.9 + q35 [1]. Now this code is messing up PCIe hotplug
in Kata Containers. A workaround to this issue is the "lazy attach"
mechanism [2] that hotplugs LBS (Large BAR space) devices after re-scanning the
PCI bus, unfourtunately some non-LBS devices are being affected too, for
instance SR-IOV devices. It would not make sense to lazy-attach non-LBS
devices because kata will end up lazy-attaching all the devices, having said
that, the PCI bus rescan code and the "lazy attach" mechanism should be removed

Depends-on: github.com/kata-containers/runtime#2670
fixes kata-containers#781
fixes kata-containers/runtime#2664

[1] clearcontainers/agent#139
[2] kata-containers/runtime#2461

Signed-off-by: Julio Montes <julio.montes@intel.com>
devimc pushed a commit to devimc/kata-runtime that referenced this pull request May 8, 2020
The "lazy attach" mechanism [1] was added to hotplugs LBS (Large BAR space)
devices after re-scanning the PCI bus, fixing LBS hotplug in kata containers.
Since PCI rescan is removed in kata-containers/agent#782, lazy attach is not
longer needed.

Depends-on: github.com/kata-containers/agent#782
fixes kata-containers#2664

[1] kata-containers#2461

Signed-off-by: Julio Montes <julio.montes@intel.com>
devimc pushed a commit to devimc/kata-agent that referenced this pull request May 8, 2020
PCI bus rescan code was added long time ago in Clear Containers due to lack of
ACPI support in QEMU 2.9 + q35 [1]. Now this code is messing up PCIe hotplug
in Kata Containers. A workaround to this issue is the "lazy attach"
mechanism [2] that hotplugs LBS (Large BAR space) devices after re-scanning the
PCI bus, unfourtunately some non-LBS devices are being affected too, for
instance SR-IOV devices. It would not make sense to lazy-attach non-LBS
devices because kata will end up lazy-attaching all the devices, having said
that, the PCI bus rescan code and the "lazy attach" mechanism should be removed

Depends-on: github.com/kata-containers/runtime#2670
fixes kata-containers#781
fixes kata-containers/runtime#2664

[1] clearcontainers/agent#139
[2] kata-containers/runtime#2461

Signed-off-by: Julio Montes <julio.montes@intel.com>
devimc pushed a commit to devimc/kata-agent that referenced this pull request Sep 4, 2020
PCI bus rescan code was added long time ago in Clear Containers due to lack of
ACPI support in QEMU 2.9 + q35 [1]. Now this code is messing up PCIe hotplug
in Kata Containers. A workaround to this issue is the "lazy attach"
mechanism [2] that hotplugs LBS (Large BAR space) devices after re-scanning the
PCI bus, unfourtunately some non-LBS devices are being affected too, for
instance SR-IOV devices. It would not make sense to lazy-attach non-LBS
devices because kata will end up lazy-attaching all the devices, having said
that, the PCI bus rescan code and the "lazy attach" mechanism should be removed

Depends-on: github.com/kata-containers/runtime#2670
fixes kata-containers#781
fixes kata-containers/runtime#2664

[1] clearcontainers/agent#139
[2] kata-containers/runtime#2461

Signed-off-by: Julio Montes <julio.montes@intel.com>
bpradipt pushed a commit to bpradipt/runtime that referenced this pull request Nov 27, 2020
The "lazy attach" mechanism [1] was added to hotplugs LBS (Large BAR space)
devices after re-scanning the PCI bus, fixing LBS hotplug in kata containers.
Since PCI rescan is removed in kata-containers/agent#782, lazy attach is not
longer needed.

fixes kata-containers#2664

[1] kata-containers#2461

Signed-off-by: Julio Montes <julio.montes@intel.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants