
nodedev-detach hangs #22

@Moonlight63

(creating a new issue with the same body as my reply to the old one)
Originally posted by @Moonlight63 in #16 (comment)

Hello, thank you for the guide. I have been running passthrough for a while, but I just did a fresh install of Pop and thought it would be nice to be able to use my second GPU on the host when not running VMs. I am having the same issue as others here.

Running
virsh nodedev-detach $VIRSH_GPU_VIDEO
where, for me

VIRSH_GPU_VIDEO=pci_0000_02_00_0
VIRSH_GPU_AUDIO=pci_0000_02_00_1

causes a hang.
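
Because the command never returns, it also never reports an error. A minimal sketch of one way to turn the hang into a non-zero exit status is to wrap the call in `timeout` from GNU coreutils. The helper names `nodedev_to_pci` and `detach_with_timeout` and the 30-second default are assumptions made for illustration, not part of the guide's scripts, and killing the `virsh` client may still leave the underlying unbind stuck inside libvirtd.

```shell
# Hypothetical helpers (not from the guide); function definitions only.

# Convert a libvirt node device name such as pci_0000_02_00_0
# into the sysfs PCI address form 0000:02:00.0.
nodedev_to_pci() {
    printf '%s\n' "$1" |
        sed -E 's/^pci_([0-9a-fA-F]{4})_([0-9a-fA-F]{2})_([0-9a-fA-F]{2})_([0-7])$/\1:\2:\3.\4/'
}

# Run the detach under a time limit (default 30 s, an arbitrary choice).
# GNU timeout exits with status 124 if the limit is reached.
detach_with_timeout() {
    timeout "${2:-30}" virsh nodedev-detach "$1"
}
```

Running `detach_with_timeout "$VIRSH_GPU_VIDEO" 30; echo "exit status: $?"` would then print 124 on a hang instead of blocking the terminal, and `nodedev_to_pci "$VIRSH_GPU_VIDEO"` gives the address to look up under `/sys/bus/pci/devices/`.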

My GPUs are in their own IOMMU groups:

IOMMU Group 34 02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1070] [10de:1b81] (rev a1)
IOMMU Group 34 02:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)
IOMMU Group 35 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80] (rev a1)
IOMMU Group 35 01:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)
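
For anyone who wants to produce a comparable listing, the sketch below walks `/sys/kernel/iommu_groups` and prints each device with `lspci -nns`. The function names are hypothetical, and it assumes `lspci` (pciutils) is installed and that the IOMMU is enabled so the sysfs directory is populated.

```shell
# Hypothetical helpers; function definitions only.

# Extract the group number from a path like
# /sys/kernel/iommu_groups/34/devices/0000:02:00.0
iommu_group_of() {
    n=${1#*/iommu_groups/}   # strip everything up to and including "iommu_groups/"
    printf '%s\n' "${n%%/*}" # keep only the first path component (the group number)
}

# Print one "IOMMU Group N <lspci line>" per device.
list_iommu_groups() {
    for d in /sys/kernel/iommu_groups/*/devices/*; do
        [ -e "$d" ] || continue   # glob matched nothing: IOMMU likely disabled
        printf 'IOMMU Group %s ' "$(iommu_group_of "$d")"
        lspci -nns "${d##*/}"     # -nn: names plus numeric IDs, -s: select this address
    done
}
```

Calling `list_iommu_groups | grep -i nvidia` narrows the output to the GPU functions.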

I have modified my xorg.conf so that the host only uses the 1080, and I have disabled AutoAddGPU. I have a 3-monitor setup with all 3 monitors plugged into the 3 DisplayPort outputs on the 1080. My full config is this:

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig:  version 460.73.01

Section "ServerFlags"
	Option "AutoAddGPU" "off"
EndSection

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0" 0 0
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
    Option         "Xinerama" "0"
EndSection

Section "Files"
EndSection

Section "InputDevice"

    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"

    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "SAC DP"
    HorizSync       30.0 - 222.0
    VertRefresh     30.0 - 144.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 1080"
    BusID          "PCI:1:0:0"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "Stereo" "0"
    Option         "nvidiaXineramaInfoOrder" "DFP-6"
    Option         "metamodes" "DP-4: 2560x1440_144 +2560+0, DP-0: 2560x1440_144 +0+0, DP-2: 2560x1440_144 +5120+0"
    Option         "SLI" "Off"
    Option         "MultiGPU" "Off"
    Option         "BaseMosaic" "off"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Finally, I have used nvidia-smi to verify that nothing is using the 1070:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01    Driver Version: 460.73.01    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   60C    P0    46W / 210W |    429MiB /  8116MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1070    Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   35C    P8    11W / 230W |      2MiB /  8119MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      4462      G   /usr/lib/xorg/Xorg                347MiB |
|    0   N/A  N/A      4957      G   /usr/bin/gnome-shell               78MiB |
+-----------------------------------------------------------------------------+
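
nvidia-smi lists processes with graphics or compute contexts, so as a cross-check it may help to look at which kernel driver is currently bound to the device and which processes have the NVIDIA device nodes open. This is only a sketch under assumptions: the function names are hypothetical, `fuser` comes from psmisc and may not be installed, and nothing here is confirmed to be the cause of the hang.

```shell
# Hypothetical helpers; function definitions only.

# Print the name of the driver bound to a sysfs device directory,
# e.g. /sys/bus/pci/devices/0000:02:00.0, or "none" if it is unbound.
bound_driver() {
    if [ -L "$1/driver" ]; then
        basename "$(readlink "$1/driver")"
    else
        echo none
    fi
}

# List processes that have any /dev/nvidia* node open (needs psmisc; run as root
# to see processes owned by other users).
nvidia_node_users() {
    fuser -v /dev/nvidia* 2>&1
}
```

For the 1070 in this report, `bound_driver /sys/bus/pci/devices/0000:02:00.0` would be expected to print `nvidia` before the detach and `vfio-pci` after a successful one.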

As a side note, my CPU doesn't list the virtualization option as VT-d, but rather calls it by its full name in dmesg:

[    0.302075] DMAR: IOMMU enabled
...
[    0.543400] DMAR-IR: IOAPIC id 8 under DRHD base  0xfbffc000 IOMMU 1
[    0.543401] DMAR-IR: IOAPIC id 9 under DRHD base  0xfbffc000 IOMMU 1
[    0.543402] DMAR-IR: HPET id 0 under DRHD base 0xfbffc000
[    0.543403] DMAR-IR: x2apic is disabled because BIOS sets x2apic opt out bit.
[    0.543404] DMAR-IR: Use 'intremap=no_x2apic_optout' to override the BIOS setting.
[    0.544009] DMAR-IR: Enabled IRQ remapping in xapic mode
[    5.119927] DMAR: [Firmware Bug]: RMRR entry for device 06:00.0 is broken - applying workaround
[    5.119931] DMAR: dmar0: Using Queued invalidation
[    5.119937] DMAR: dmar1: Using Queued invalidation
[    5.129946] DMAR: Intel(R) Virtualization Technology for Directed I/O

Possibly because it's a Xeon? Just thought I would mention it for others who come here.

Anyway, as far as I can tell, I've done everything mentioned, and I can't find anything else that would be stopping the unload. Any ideas? Yes, my scripts are executable. I've been running the commands one by one in a terminal to see if I can find a failing exit code, but since virsh nodedev-detach never completes and just hangs, no error is reported. Any help is greatly appreciated.
