Reloading nvidia_drm breaks power management if nvidia-powerd is stopped #1059

### NVIDIA Open GPU Kernel Modules Version

595.45.04

### Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

- [ ] I confirm that this does not happen with the proprietary driver package.

### Operating System and Version

Bazzite

### Kernel Release

Linux bazzite 6.17.7-ba28.fc43.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Mar  8 17:54:59 UTC 2026 x86_64 GNU/Linux

### Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

- [x] I am running on a stable kernel release.

### Hardware: GPU

NVIDIA GeForce RTX 4060 Laptop GPU

### Describe the bug

Power management on NVIDIA GPUs is currently broken, so I am using the [script from here](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/905#issuecomment-3706044916) (which is based on [this comment](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/905#issuecomment-3360970555)) as a workaround. With that workaround in place, if I stop the `nvidia-powerd` service (my system is a laptop running in hybrid mode), remove the `nvidia_drm` module, load it again, and start `nvidia-powerd`, GPU power management stops working and errors appear in `dmesg`.

### To Reproduce

Case 1:
Exact commands to reproduce the error (after making sure nothing is running on the NVIDIA GPU):
```
❯ sudo systemctl stop nvidia-powerd.service
❯ sudo rmmod nvidia_drm
❯ sudo modprobe nvidia_drm
❯ sudo systemctl start nvidia-powerd.service
```
Power management is now broken, with these errors in `dmesg`:
```
[ 1938.613341] NVRM: _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 3963908968 heartbeat 0 heartbeatWithOffsetMs 0 diff 3963908968 timeout 5200
[ 1938.613347] NVRM: _kgspRpcRecvPoll: GSP RM heartbeat timed out
[ 1941.428328] [drm] [nvidia-drm] [GPU ID 0x00000100] Unloading driver
[ 1948.632392] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 1950.913667] NVRM: GPU0 nvAssertOkFailedNoLog: Assertion failed: Invalid data passed [NV_ERR_INVALID_DATA] (0x00000025) returned from PlatformRequestHandler failed to get target temp from SBIOS @ platform_request_handler_ctrl.c:2171
[ 1950.913678] NVRM: GPU0 nvAssertOkFailedNoLog: Assertion failed: Invalid data passed [NV_ERR_INVALID_DATA] (0x00000025) returned from PlatformRequestHandler failed to get platform power mode from SBIOS @ platform_request_handler_ctrl.c:2114
[ 1951.087658] nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-2
[ 1951.100513] nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-2
[ 1951.105468] [drm] Initialized nvidia-drm 0.0.0 for 0000:01:00.0 on minor 0
[ 1951.105999] nvidia 0000:01:00.0: [drm] Cannot find any crtc or sizes
[ 1972.148621] NVRM: _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
[ 1972.148630] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
[ 1972.227421] NVRM: _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
[ 1972.227423] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
[ 1972.237287] NVRM: _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 3944143704 heartbeat 0 heartbeatWithOffsetMs 0 diff 3944143704 timeout 5200
[ 1972.237289] NVRM: _kgspRpcRecvPoll: GSP RM heartbeat timed out
[ 1978.253184] NVRM: _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
[ 1978.253191] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
[ 1978.332087] NVRM: _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
[ 1978.332090] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
[ 1978.341988] NVRM: _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 3944154196 heartbeat 0 heartbeatWithOffsetMs 0 diff 3944154196 timeout 5200
[ 1978.341990] NVRM: _kgspRpcRecvPoll: GSP RM heartbeat timed out
[ 1984.403990] NVRM: _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
[ 1984.403996] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
[ 1984.482683] NVRM: _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
[ 1984.482685] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
[ 1984.492597] NVRM: _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 3944157516 heartbeat 0 heartbeatWithOffsetMs 0 diff 3944157516 timeout 5200
[ 1984.492599] NVRM: _kgspRpcRecvPoll: GSP RM heartbeat timed out
[ 1990.516030] NVRM: _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
[ 1990.516036] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
[ 1990.594743] NVRM: _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
[ 1990.594745] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
[ 1990.604595] NVRM: _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 3944164122 heartbeat 0 heartbeatWithOffsetMs 0 diff 3944164122 timeout 5200
[ 1990.604597] NVRM: _kgspRpcRecvPoll: GSP RM heartbeat timed out
```
(The log keeps repeating while the GPU constantly tries to enter low-power mode.)
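To check quickly whether a system is in this state, the repeating message can be counted in the kernel log. The helper below is a minimal sketch and not part of the driver or of the linked workaround script; the function name `count_gsp_heartbeat_timeouts` is hypothetical, and the pattern is copied from the log above.

```shell
# Hypothetical helper: reads kernel log text on stdin and prints the number
# of "GSP RM heartbeat timed out" lines, matching the message in the log above.
count_gsp_heartbeat_timeouts() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the function's exit status at 0 in that case.
    grep -c '_kgspRpcRecvPoll: GSP RM heartbeat timed out' || true
}
```

For example, `sudo dmesg | count_gsp_heartbeat_timeouts` run twice a few seconds apart should show a growing count in Case 1 and an unchanged count in Cases 2 and 3.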

Case 2:
This does **not** happen (power management keeps working properly) if I only reload the `nvidia_drm` module without restarting the `nvidia-powerd` service:
```
❯ sudo rmmod nvidia_drm
❯ sudo modprobe nvidia_drm
```
There are no errors in `dmesg`.

Case 3:
It also does **not** happen if I just restart the `nvidia-powerd` service without removing the `nvidia_drm` module:
```
❯ sudo systemctl stop nvidia-powerd.service
❯ sudo systemctl start nvidia-powerd.service
```
Here, power management keeps working.

Use case for reloading the `nvidia_drm` and `nvidia` modules: sometimes I need to remove all NVIDIA modules (`nvidia-powerd` has to be stopped first, otherwise the `nvidia` module cannot be removed) to temporarily pass the GPU through to a VM via VFIO, and then reload all modules after the VM shuts down. Doing this breaks power management, and the only solution I have found is to restart the whole system.
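The sequence for this use case looks roughly like the sketch below. It only prints the commands and does not run them. The module list (`nvidia_drm`, `nvidia_modeset`, `nvidia_uvm`, `nvidia`) is an assumption about what is typically loaded and should be checked with `lsmod | grep nvidia`; the function name `nvidia_vfio_cycle_plan` is hypothetical, and the VFIO binding step is left as a placeholder comment.

```shell
# Hypothetical helper: prints (does not execute) the unload/reload sequence
# described in the use case. On the affected setup, running these commands
# is expected to leave power management broken, as in Case 1.
nvidia_vfio_cycle_plan() {
    cat <<'EOF'
systemctl stop nvidia-powerd.service
rmmod nvidia_drm
rmmod nvidia_modeset
rmmod nvidia_uvm
rmmod nvidia
# ... bind the GPU to vfio-pci, run the VM, unbind after VM shutdown ...
modprobe nvidia_drm
modprobe nvidia_uvm
systemctl start nvidia-powerd.service
EOF
}
```

`modprobe nvidia_drm` pulls in `nvidia_modeset` and `nvidia` as dependencies, which is why they are not listed separately in the reload half.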

### Bug Incidence

Always

### nvidia-bug-report.log.gz

[nvidia-bug-report.log.gz](https://github.com/user-attachments/files/25992180/nvidia-bug-report.log.gz)

### More Info

A few notes on my system:

- My laptop is a Lenovo Legion Slim 5 16APH8 (7840HS, RTX 4060). It is running in hybrid mode, so the display is connected to the iGPU (AMD Radeon 780M). Removing the NVIDIA modules after boot should therefore not be an issue.
- I made sure that nothing was using the NVIDIA GPU while I ran the tests. Here is the output of `nvidia-smi`:
```
❯ nvidia-smi
Sat Mar 14 12:27:02 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.45.04              Driver Version: 595.45.04      CUDA Version: 13.2     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 ...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   47C    P8            588W /   60W |       2MiB /   8188MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```
- Dynamic Boost works fine after reloading the modules and restarting the `nvidia-powerd` service. Only power management is broken. CUDA also works fine.
- Exact Bazzite version in use:
```
❯ rpm-ostree status
State: idle
Deployments:
● ostree-image-signed:docker://ghcr.io/ublue-os/bazzite-dx-nvidia:stable
                   Digest: sha256:8e0bdd2406ee29ff1c0e6a688a4fda0069af3103017694792b07f191318e5643
                  Version: 43.20260313 (2026-03-13T08:35:48Z)
          LayeredPackages: ufw
```
