RTX 3080 laptop GPU not detected - GPU0 RmInitAdapter: Cannot initialize GSP firmware RM #1058

@MacroMelon

NVIDIA Open GPU Kernel Modules Version

595.45.04

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • I confirm that this does not happen with the proprietary driver package.

Operating System and Version

Arch Linux

Kernel Release

Linux 6.19.6-arch1-1

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • I am running on a stable kernel release.

Hardware: GPU

RTX 3080 mobile

Describe the bug

My RTX 3080 mobile (device: Lenovo T15g Gen 2i, Intel i9 + RTX 3080 + Intel iGPU with hybrid graphics, running the latest BIOS Lenovo provides) is not detected. The GPU doesn't show up in nvidia-smi, but the kernel modules and drivers were loaded correctly according to lspci -k -d ::03xx.

Seems like the GSP firmware isn't getting initialised right - in dmesg, I get a bunch of:

...
[   42.023690] NVRM: GPU0 gpuHandleSanityCheckRegReadError_GM107: Possible bad register read: addr: 0x110100,  regvalue: 0xbadf5720,  error code: Unknown SYS_PRI_ERROR_CODE
[   42.023692] NVRM: GPU0 gpuHandleSanityCheckRegReadError_GM107: Possible bad register read: addr: 0x110100,  regvalue: 0xbadf5720,  error code: Unknown SYS_PRI_ERROR_CODE
[   42.023694] NVRM: GPU0 gpuHandleSanityCheckRegReadError_GM107: Possible bad register read: addr: 0x110100,  regvalue: 0xbadf5720,  error code: Unknown SYS_PRI_ERROR_CODE
[   42.023696] NVRM: GPU0 gpuHandleSanityCheckRegReadError_GM107: Possible bad register read: addr: 0x110100,  regvalue: 0xbadf5720,  error code: Unknown SYS_PRI_ERROR_CODE
[   42.023698] NVRM: GPU0 gpuHandleSanityCheckRegReadError_GM107: Possible bad register read: addr: 0x110100,  regvalue: 0xbadf5720,  error code: Unknown SYS_PRI_ERROR_CODE
....

followed by

[   42.023712] NVRM: GPU0 kflcnWaitForHalt_TU102: Timeout waiting for Falcon to halt
[   42.023714] NVRM: GPU0 gpuWaitForGfwBootComplete_TU102: GSP failed to halt with GFW_BOOT: (progress 0xff)
[   42.023715] NVRM: GPU0 kgspWaitForGfwBootOk_TU102: failed to wait for GFW boot complete: 0x65 VBIOS version 94.04.51.00.2E
[   42.023716] NVRM: GPU0 kgspWaitForGfwBootOk_TU102: (the GPU may be in a bad state and may need to be reset)
[   42.023718] NVRM: GPU0 nvCheckOkFailedNoLog: Check failed: Call timed out [NV_ERR_TIMEOUT] (0x00000065) returned from kgspWaitForGfwBootOk_HAL(pGpu, pKernelGsp) @ kernel_gsp.c:4904
[   42.023746] NVRM: GPU0 RmInitAdapter: Cannot initialize GSP firmware RM
[   42.024746] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x65:2168)
[   42.025458] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
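Because the same register-read message is repeated many times, it can help to collapse the NVRM lines before reading them. The following is only a minimal sketch: it assumes the dmesg output has been saved as text, and the function name `summarize_nvrm` and the `SAMPLE` excerpt (copied from the lines above) are illustrative, not part of any NVIDIA tool.

```python
import re
from collections import Counter

# Matches kernel log lines such as "[   42.023690] NVRM: GPU0 ...".
# Group 1 is the timestamp, group 2 is the NVRM message itself.
_NVRM_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(NVRM: .*)$")


def summarize_nvrm(log_text):
    """Return (message, count) pairs in first-seen order, ignoring timestamps."""
    counts = Counter()
    for raw in log_text.splitlines():
        match = _NVRM_LINE.match(raw.strip())
        if match:
            counts[match.group(2)] += 1
    # Counter keeps insertion order, so messages stay in log order.
    return list(counts.items())


# Short excerpt taken from the dmesg output quoted in this report.
SAMPLE = """\
[   42.023690] NVRM: GPU0 gpuHandleSanityCheckRegReadError_GM107: Possible bad register read: addr: 0x110100,  regvalue: 0xbadf5720,  error code: Unknown SYS_PRI_ERROR_CODE
[   42.023692] NVRM: GPU0 gpuHandleSanityCheckRegReadError_GM107: Possible bad register read: addr: 0x110100,  regvalue: 0xbadf5720,  error code: Unknown SYS_PRI_ERROR_CODE
[   42.023712] NVRM: GPU0 kflcnWaitForHalt_TU102: Timeout waiting for Falcon to halt
"""

for message, count in summarize_nvrm(SAMPLE):
    print(f"{count:>4}x {message}")
```

Lines that do not start with a bracketed timestamp followed by `NVRM:` are skipped, so the full dmesg output can be passed in unfiltered.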

[edit] - Sometimes this is also followed by:

[ 2184.373341] NVRM: GPU0 nvAssertOkFailedNoLog: Assertion failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from RPC_HDR->rpc_result @ kernel_gsp.c:6068
[ 2184.373347] NVRM: GPU0 nvAssertOkFailedNoLog: Assertion failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from kgspWaitForRmInitDone(pGpu, pKernelGsp) @ kernel_gsp_tu102.c:569
[ 2184.373356] NVRM: GPU0 _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP
[ 2184.373356] NVRM: GPU0 _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset)
[ 2184.373378] NVRM: GPU0 RmInitAdapter: Cannot initialize GSP firmware RM
[ 2184.374153] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:2168)
[ 2184.375003] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
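The two failures end with different codes in the `RmInitAdapter failed!` line (0x62:0x65:2168 and 0x62:0x40:2168). The sketch below pulls those fields out of saved dmesg text so the variants can be compared. It rests on assumptions: the field names are placeholders, the idea that the middle field is an NV status code is inferred only from 0x65 also appearing next to NV_ERR_TIMEOUT above, and the name table contains only the codes that these logs spell out.

```python
import re

# Status names exactly as they appear next to their codes in the logs above.
# Codes the logs do not name (for example 0x62 and 0x40) are left undecoded.
STATUS_NAMES = {
    0x65: "NV_ERR_TIMEOUT",
    0xFFFF: "NV_ERR_GENERIC",
}

_INIT_FAILED = re.compile(
    r"NVRM: GPU (\S+): RmInitAdapter failed! "
    r"\((0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(\d+)\)"
)


def parse_init_failures(log_text):
    """Extract every 'RmInitAdapter failed!' line as a dict of its fields."""
    failures = []
    for match in _INIT_FAILED.finditer(log_text):
        status = int(match.group(3), 16)
        failures.append({
            "pci_address": match.group(1),
            "first_field": int(match.group(2), 16),   # meaning not assumed
            "status": status,                          # assumed NV status code
            "status_name": STATUS_NAMES.get(status, "unknown"),
            "third_field": int(match.group(4)),        # meaning not assumed
        })
    return failures


# The two failure lines quoted in this report.
SAMPLE = """\
[   42.024746] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x65:2168)
[ 2184.374153] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:2168)
"""

for failure in parse_init_failures(SAMPLE):
    print(failure["pci_address"], hex(failure["status"]), failure["status_name"])
```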

Running sudo nvidia-smi --gpu-reset returns:

Unlisted GPU 00000000:01:00.0 was successfully reset.
All done.

However, the GPU remains undetected and the above errors are repeated in dmesg.

This has been a long-running problem. My research led me to install the proprietary driver and disable the GSP entirely by creating the file /etc/modprobe.d/nvidia.conf containing:

options nvidia NVreg_EnableGpuFirmware=0

and then rebuilding initramfs using mkinitcpio -P

This worked, but it is no longer possible: Arch has dropped newer versions of the NVIDIA proprietary driver and only offers nvidia-open, which cannot disable the GSP firmware. If I need my GPU to work, I am stuck on the 580xx proprietary drivers from the Arch User Repository.
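For anyone reproducing the workaround on the proprietary driver, a quick way to confirm the option is actually present in the config file before rebuilding the initramfs is sketched below. This is a minimal sketch under assumptions: the helper `parse_modprobe_options` is hypothetical, it only reads the text of a modprobe.d-style file, and it does not check what the loaded module ends up using.

```python
def parse_modprobe_options(conf_text, module="nvidia"):
    """Collect key=value options set for `module` in modprobe.d-style text."""
    options = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        tokens = line.split()
        # Expected shape: options <module> <key>=<value> [<key>=<value> ...]
        if len(tokens) >= 3 and tokens[0] == "options" and tokens[1] == module:
            for token in tokens[2:]:
                key, _, value = token.partition("=")
                options[key] = value
    return options


# Contents of /etc/modprobe.d/nvidia.conf as described in this report.
CONF = "options nvidia NVreg_EnableGpuFirmware=0\n"

opts = parse_modprobe_options(CONF)
print("GSP firmware disabled in config:", opts.get("NVreg_EnableGpuFirmware") == "0")
```

To check the real file, the text could be read with `open("/etc/modprobe.d/nvidia.conf").read()` and passed to the same function.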

To Reproduce

  • Install the nvidia-open driver
  • Start PC

Bug Incidence

Always

nvidia-bug-report.log.gz

NOTE - The attached bug report was generated with 595.44.03 installed; it's virtually the same for 595.45.04.

nvidia-bug-report.log.gz

More Info

No response

Labels: bug (Something isn't working)
