Reloading nvidia_drm breaks power management if nvidia-powerd is stopped #1059

@nitinkmr333

Description

NVIDIA Open GPU Kernel Modules Version

595.45.04

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • I confirm that this does not happen with the proprietary driver package.

Operating System and Version

Bazzite

Kernel Release

Linux bazzite 6.17.7-ba28.fc43.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Mar 8 17:54:59 UTC 2026 x86_64 GNU/Linux

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • I am running on a stable kernel release.

Hardware: GPU

NVIDIA GeForce RTX 4060 Laptop GPU

Describe the bug

Power management on NVIDIA GPUs is currently broken, so I am using the script from here (which is based on this comment) as a workaround. However, on my laptop running in hybrid mode, if I stop the nvidia-powerd service, remove the nvidia_drm module, load it again, and start nvidia-powerd, GPU power management stops working and errors appear in dmesg.

To Reproduce

Case 1:
Exact commands to reproduce the error (after making sure nothing is running on the NVIDIA GPU):

❯ sudo systemctl stop nvidia-powerd.service
❯ sudo rmmod nvidia_drm
❯ sudo modprobe nvidia_drm
❯ sudo systemctl start nvidia-powerd.service

Power management is now broken, with these errors in dmesg:

[ 1938.613341] NVRM: _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 3963908968 heartbeat 0 heartbeatWithOffsetMs 0 diff 3963908968 timeout 5200
[ 1938.613347] NVRM: _kgspRpcRecvPoll: GSP RM heartbeat timed out
[ 1941.428328] [drm] [nvidia-drm] [GPU ID 0x00000100] Unloading driver
[ 1948.632392] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 1950.913667] NVRM: GPU0 nvAssertOkFailedNoLog: Assertion failed: Invalid data passed [NV_ERR_INVALID_DATA] (0x00000025) returned from PlatformRequestHandler failed to get target temp from SBIOS @ platform_request_handler_ctrl.c:2171
[ 1950.913678] NVRM: GPU0 nvAssertOkFailedNoLog: Assertion failed: Invalid data passed [NV_ERR_INVALID_DATA] (0x00000025) returned from PlatformRequestHandler failed to get platform power mode from SBIOS @ platform_request_handler_ctrl.c:2114
[ 1951.087658] nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-2
[ 1951.100513] nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-2
[ 1951.105468] [drm] Initialized nvidia-drm 0.0.0 for 0000:01:00.0 on minor 0
[ 1951.105999] nvidia 0000:01:00.0: [drm] Cannot find any crtc or sizes
[ 1972.148621] NVRM: _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
[ 1972.148630] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
[ 1972.227421] NVRM: _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
[ 1972.227423] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
[ 1972.237287] NVRM: _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 3944143704 heartbeat 0 heartbeatWithOffsetMs 0 diff 3944143704 timeout 5200
[ 1972.237289] NVRM: _kgspRpcRecvPoll: GSP RM heartbeat timed out
[ 1978.253184] NVRM: _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
[ 1978.253191] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
[ 1978.332087] NVRM: _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
[ 1978.332090] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
[ 1978.341988] NVRM: _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 3944154196 heartbeat 0 heartbeatWithOffsetMs 0 diff 3944154196 timeout 5200
[ 1978.341990] NVRM: _kgspRpcRecvPoll: GSP RM heartbeat timed out
[ 1984.403990] NVRM: _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
[ 1984.403996] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
[ 1984.482683] NVRM: _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
[ 1984.482685] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
[ 1984.492597] NVRM: _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 3944157516 heartbeat 0 heartbeatWithOffsetMs 0 diff 3944157516 timeout 5200
[ 1984.492599] NVRM: _kgspRpcRecvPoll: GSP RM heartbeat timed out
[ 1990.516030] NVRM: _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
[ 1990.516036] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
[ 1990.594743] NVRM: _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
[ 1990.594745] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
[ 1990.604595] NVRM: _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 3944164122 heartbeat 0 heartbeatWithOffsetMs 0 diff 3944164122 timeout 5200
[ 1990.604597] NVRM: _kgspRpcRecvPoll: GSP RM heartbeat timed out

(The log keeps repeating while the GPU constantly tries to enter low-power mode.)
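Because the same messages repeat every few seconds, it can help to count them rather than read the raw log. The following is a minimal sketch, not part of the driver tooling: `summarize_gsp_errors` is a hypothetical helper name, and it only matches the two message strings that appear in the log above.

```shell
# Hypothetical helper: summarize the two recurring NVRM messages from this report.
# Reads kernel log text on stdin, so it works on live output or a saved log file.
summarize_gsp_errors() {
    awk '
        # One "_kgspRpcRecvPoll" line is logged per heartbeat timeout event.
        /GSP RM heartbeat timed out/        { hb++ }
        # RPC events processed "during bootup without API lock".
        /PFM_REQ_HNDLR_STATE_SYNC_CALLBACK/ { pfm++ }
        END {
            printf "heartbeat_timeouts=%d\n", hb
            printf "pfm_sync_without_lock=%d\n", pfm
        }
    '
}

# Usage (reading dmesg may require root):
#   sudo dmesg | summarize_gsp_errors
```

Running it before and after the Case 1 commands shows whether the counters keep growing, which is the symptom described here.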

Case 2:
This does not happen (power management keeps working properly) if I only reload the nvidia_drm module without stopping and starting the nvidia-powerd service:

❯ sudo rmmod nvidia_drm
❯ sudo modprobe nvidia_drm

There are no errors in dmesg.

Case 3:
It also does not happen if I just restart the nvidia-powerd service without removing the nvidia_drm module:

❯ sudo systemctl stop nvidia-powerd.service
❯ sudo systemctl start nvidia-powerd.service

Here, power management keeps working.
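One way to compare the three cases is to read the kernel's runtime power state for the GPU after each one. This sketch assumes that "power management" here refers to PCI runtime power management, which the report does not state explicitly. `gpu_runtime_status` is a hypothetical helper, and the default device path uses the `0000:01:00.0` address from the dmesg output above.

```shell
# Hypothetical helper: print the runtime PM state of a PCI device.
# $1: sysfs device directory (defaults to the GPU address seen in dmesg).
gpu_runtime_status() {
    local dev_dir="${1:-/sys/bus/pci/devices/0000:01:00.0}"
    local status_file="$dev_dir/power/runtime_status"
    if [ -r "$status_file" ]; then
        # Typical values include "active" and "suspended".
        cat "$status_file"
    else
        echo "unknown"
        return 1
    fi
}

# Usage:
#   gpu_runtime_status
```

With nothing using the GPU, a device that stays `active` indefinitely would be consistent with the broken state in Case 1, though that interpretation is an assumption.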

Use case for reloading the nvidia_drm and nvidia modules: sometimes I need to remove all NVIDIA modules to temporarily pass the GPU through with VFIO (nvidia-powerd must be stopped first, otherwise the nvidia module cannot be removed), and then reload all modules after the VM shuts down. This breaks power management, and the only fix I have found is to reboot the whole system.
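The VFIO sequence described above can be sketched as a helper that prints the commands instead of running them. This is an illustration under assumptions: `nvidia_vfio_plan` is a hypothetical name, and the module list (nvidia_drm, nvidia_modeset, nvidia_uvm, nvidia) is the usual set but was not listed in this report, so check `lsmod | grep nvidia` on the actual system.

```shell
# Hypothetical helper: print (not run) the commands for releasing or
# reclaiming the GPU around a VFIO session.
nvidia_vfio_plan() {
    case "$1" in
        release)
            # nvidia-powerd holds the nvidia module open, so stop it first.
            echo "systemctl stop nvidia-powerd.service"
            # Assumed module set, unloaded from dependents down to the core module.
            for m in nvidia_drm nvidia_modeset nvidia_uvm nvidia; do
                echo "rmmod $m"
            done
            ;;
        reclaim)
            # modprobe resolves dependencies, so nvidia_drm also loads
            # nvidia_modeset and nvidia.
            echo "modprobe nvidia_drm"
            echo "modprobe nvidia_uvm"
            echo "systemctl start nvidia-powerd.service"
            ;;
        *)
            echo "usage: nvidia_vfio_plan release|reclaim" >&2
            return 2
            ;;
    esac
}

# Usage: review the plan, then run it as root if it looks right.
#   nvidia_vfio_plan release
#   nvidia_vfio_plan reclaim
```

This only documents the order of operations. On the affected system, the reclaim step still ends in the broken state from Case 1.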

Bug Incidence

Always

nvidia-bug-report.log.gz

More Info

A few notes on my system:

  • My laptop is a Lenovo Legion Slim 5 16APH8 (7840HS, RTX 4060). It is running in hybrid mode, so the display is connected to the iGPU (AMD Radeon 780M). Removing the NVIDIA modules after boot should therefore not be an issue.
  • I made sure that nothing was using the NVIDIA GPU while I ran the tests. Here is the output of nvidia-smi:
❯ nvidia-smi
Sat Mar 14 12:27:02 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.45.04              Driver Version: 595.45.04      CUDA Version: 13.2     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 ...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   47C    P8            588W /   60W |       2MiB /   8188MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
  • Dynamic Boost works fine after reloading the modules and restarting the nvidia-powerd service. Only power management is broken. CUDA also works fine.
  • Exact Bazzite version in use:
❯ rpm-ostree status
State: idle
Deployments:
● ostree-image-signed:docker://ghcr.io/ublue-os/bazzite-dx-nvidia:stable
                   Digest: sha256:8e0bdd2406ee29ff1c0e6a688a4fda0069af3103017694792b07f191318e5643
                  Version: 43.20260313 (2026-03-13T08:35:48Z)
          LayeredPackages: ufw
