Skip to content

TLB Invalidation time out on i915 SR-IOV passthrough #54

@tchevrie

Description

@tchevrie

Host environment

Operating system: Gentoo Base System release 2.14
OS/kernel version: https://github.com/intel/linux-intel-lts/tree/lts-v6.6.34-linux-240626T131354Z
Architecture: x86_64
QEMU flavor: qemu-system-x86_64
QEMU version: latest qemu (master branch)
CPU: 12th Gen Intel(R) Core(TM) i7-1270P
igpu: Alder Lake-P
firmware: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/snapshot/linux-firmware-20241110.tar.gz

Emulated/Virtualized environment

Operating system: Windows 10 21H1

Description of problem

Hello,
After setting up SR-IOV (kernel compilation, kernel cmdline, vfio-pci driver attribution to the new pci..)
I've got my two new pci.


00:02.0 VGA compatible controller: Intel Corporation Alder Lake-P Integrated Graphics Controller (rev 0c)
DeviceName: Onboard IGD

Subsystem: Hewlett-Packard Company Alder Lake-P Integrated Graphics Controller
Kernel driver in use: i915

00:02.1 VGA compatible controller: Intel Corporation Alder Lake-P Integrated Graphics Controller (rev 0c)
Subsystem: Hewlett-Packard Company Alder Lake-P Integrated Graphics Controller
Kernel driver in use: vfio-pci

00:02.2 VGA compatible controller: Intel Corporation Alder Lake-P Integrated Graphics Controller (rev 0c)
Subsystem: Hewlett-Packard Company Alder Lake-P Integrated Graphics Controller
Kernel driver in use: vfio-pci

I gave one of those pci to my VM with this qemu cmdline:


-cpu host,migratable=on,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-passthrough,hv-vendor-id=IrisXE
...
-device vfio-pci-nohotplug,host=0000:00:02.1,id=hostdev0,bus=pci.4,addr=0x0

Sometimes it working properly when I start the qemu cmdline but most of the time I've got those kernel errors and a GPU hang:

    kernel [ 2252.208134] i915 0000:00:02.0: [drm] ERROR GT0: GUC: TLB invalidation response timed out for seqno 9679
    kernel [ 2252.208134] i915 0000:00:02.0: [drm] ERROR GT0: GUC: TLB invalidation response timed out for seqno 9679
    kernel i915 0000:00:02.0: [drm] ERROR GT0: GUC: TLB invalidation response timed out for seqno 9679
    kernel i915 0000:00:02.0: [drm] ERROR GT0: GUC: TLB invalidation response timed out for seqno 9679
    ....
    kernel Fence expiration time out i915-0000:00:02.0:renderThread22381:6e0!
    kernel i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 70.13.1
    kernel i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
    kernel i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
    kernel i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
    kernel i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
    kernel [ 2730.991019] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:85dfbfff, in renderThread [22381]
    kernel [ 2730.991084] i915 0000:00:02.0: [drm] renderThread22381 context reset due to GPU hang

It mostly appears when Qemu is starting..
Any help would be appreciate, thanks a lot

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions