Description
NVIDIA Open GPU Kernel Modules Version
595.45.04
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
- I confirm that this does not happen with the proprietary driver package.
Operating System and Version
Bazzite
Kernel Release
Linux bazzite 6.17.7-ba28.fc43.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Mar 8 17:54:59 UTC 2026 x86_64 GNU/Linux
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
- I am running on a stable kernel release.
Hardware: GPU
NVIDIA GeForce RTX 4060 Laptop GPU
Describe the bug
Power management on NVIDIA GPUs is currently broken, so I am using the script from here (which is based on this comment) as a workaround. However, if I stop the nvidia-powerd service (my system is a laptop running in hybrid mode), remove the nvidia_drm module, load it again, and start nvidia-powerd, GPU power management stops working and errors appear in dmesg.
To Reproduce
Case 1:
Exact commands to reproduce the error (after making sure nothing is running on the NVIDIA GPU):
❯ sudo systemctl stop nvidia-powerd.service
❯ sudo rmmod nvidia_drm
❯ sudo modprobe nvidia_drm
❯ sudo systemctl start nvidia-powerd.service
Power management is now broken, with these errors in dmesg:
[ 1938.613341] NVRM: _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 3963908968 heartbeat 0 heartbeatWithOffsetMs 0 diff 3963908968 timeout 5200
[ 1938.613347] NVRM: _kgspRpcRecvPoll: GSP RM heartbeat timed out
[ 1941.428328] [drm] [nvidia-drm] [GPU ID 0x00000100] Unloading driver
[ 1948.632392] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 1950.913667] NVRM: GPU0 nvAssertOkFailedNoLog: Assertion failed: Invalid data passed [NV_ERR_INVALID_DATA] (0x00000025) returned from PlatformRequestHandler failed to get target temp from SBIOS @ platform_request_handler_ctrl.c:2171
[ 1950.913678] NVRM: GPU0 nvAssertOkFailedNoLog: Assertion failed: Invalid data passed [NV_ERR_INVALID_DATA] (0x00000025) returned from PlatformRequestHandler failed to get platform power mode from SBIOS @ platform_request_handler_ctrl.c:2114
[ 1951.087658] nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-2
[ 1951.100513] nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-2
[ 1951.105468] [drm] Initialized nvidia-drm 0.0.0 for 0000:01:00.0 on minor 0
[ 1951.105999] nvidia 0000:01:00.0: [drm] Cannot find any crtc or sizes
[ 1972.148621] NVRM: _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
[ 1972.148630] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
[ 1972.227421] NVRM: _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
[ 1972.227423] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
[ 1972.237287] NVRM: _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 3944143704 heartbeat 0 heartbeatWithOffsetMs 0 diff 3944143704 timeout 5200
[ 1972.237289] NVRM: _kgspRpcRecvPoll: GSP RM heartbeat timed out
[ 1978.253184] NVRM: _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
[ 1978.253191] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
[ 1978.332087] NVRM: _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
[ 1978.332090] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
[ 1978.341988] NVRM: _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 3944154196 heartbeat 0 heartbeatWithOffsetMs 0 diff 3944154196 timeout 5200
[ 1978.341990] NVRM: _kgspRpcRecvPoll: GSP RM heartbeat timed out
[ 1984.403990] NVRM: _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
[ 1984.403996] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
[ 1984.482683] NVRM: _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
[ 1984.482685] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
[ 1984.492597] NVRM: _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 3944157516 heartbeat 0 heartbeatWithOffsetMs 0 diff 3944157516 timeout 5200
[ 1984.492599] NVRM: _kgspRpcRecvPoll: GSP RM heartbeat timed out
[ 1990.516030] NVRM: _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
[ 1990.516036] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
[ 1990.594743] NVRM: _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
[ 1990.594745] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
[ 1990.604595] NVRM: _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 3944164122 heartbeat 0 heartbeatWithOffsetMs 0 diff 3944164122 timeout 5200
[ 1990.604597] NVRM: _kgspRpcRecvPoll: GSP RM heartbeat timed out
(the log keeps repeating while the GPU constantly tries to enter low-power mode)
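To tell quickly whether the failure is still recurring, the repeating messages can be counted rather than read by eye. The following is only a minimal sketch: the function names `count_matches`, `count_gsp_timeouts`, and `count_api_lock_asserts` are hypothetical helpers, and the match strings are copied from the dmesg output above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: count kernel log lines (read from stdin) that
# contain a fixed string. Nothing here touches the driver or the service.

count_matches() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the function's exit status clean.
    grep -c -F -- "$1" || true
}

# Match string taken from the "_kgspRpcRecvPoll" lines in the log above.
count_gsp_timeouts() {
    count_matches 'GSP RM heartbeat timed out'
}

# Match string taken from the "_kgspProcessRpcEvent" lines in the log above.
count_api_lock_asserts() {
    count_matches 'during bootup without API lock'
}
```

For example, running `sudo dmesg | count_gsp_timeouts` twice a few seconds apart should show a growing number in Case 1 and an unchanged number in Cases 2 and 3, assuming the kernel ring buffer has not wrapped in between.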
Case 2:
This does not happen (power management keeps working properly) if I only reload the nvidia_drm module without restarting the nvidia-powerd service:
❯ sudo rmmod nvidia_drm
❯ sudo modprobe nvidia_drm
There are no errors in dmesg.
Case 3:
It also does not happen if I just restart the nvidia-powerd service without removing the nvidia_drm module:
❯ sudo systemctl stop nvidia-powerd.service
❯ sudo systemctl start nvidia-powerd.service
Here, power management keeps working.
Use case for reloading the nvidia_drm and nvidia modules: Sometimes I need to remove all nvidia modules (nvidia-powerd must be stopped first, otherwise the nvidia module cannot be removed) to temporarily pass the GPU through with VFIO, and then reload all modules after the VM shuts down. This breaks power management, and the only solution for me is to restart the whole system.
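The unload/reload cycle described in this use case could be sketched roughly as follows. This is an illustration under assumptions, not the exact script I run: the helper names (`run`, `nvidia_unload`, `nvidia_reload`) and the `DRY_RUN` variable are hypothetical, the module list assumes nvidia_drm, nvidia_modeset, nvidia_uvm and nvidia are all loaded (rmmod fails for any that are not), and it assumes that `modprobe nvidia_drm` pulls in the modules it depends on. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the VFIO unload/reload cycle.
# DRY_RUN=1 (the default) prints each step; DRY_RUN=0 runs it via sudo.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        sudo "$@"
    fi
}

# Stop nvidia-powerd first, then remove modules from the dependents
# down to the core nvidia module.
nvidia_unload() {
    run systemctl stop nvidia-powerd.service
    local m
    for m in nvidia_drm nvidia_modeset nvidia_uvm nvidia; do
        run rmmod "$m"
    done
}

# Load nvidia_drm again (assumed to load its dependencies) and restart
# the service. This is the sequence after which power management breaks.
nvidia_reload() {
    run modprobe nvidia_drm
    run systemctl start nvidia-powerd.service
}
```

Calling `nvidia_unload` before starting the VM and `nvidia_reload` after it shuts down matches the order used in Case 1, which is why this cycle ends in the broken state.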
Bug Incidence
Always
nvidia-bug-report.log.gz
More Info
A few notes on my system:
- My laptop is a Lenovo Legion Slim 5 16APH8 (7840HS, RTX 4060). It is running in hybrid mode, so the display is connected to the iGPU (AMD Radeon 780M). Removing the nvidia modules after boot should not be an issue.
- I made sure that nothing was using the NVIDIA GPU while I ran the tests. Here's the output of nvidia-smi:
❯ nvidia-smi
Sat Mar 14 12:27:02 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.45.04 Driver Version: 595.45.04 CUDA Version: 13.2 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4060 ... Off | 00000000:01:00.0 Off | N/A |
| N/A 47C P8 588W / 60W | 2MiB / 8188MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
- Dynamic Boost is working fine after reloading the modules and restarting the nvidia-powerd service. Only power management is broken. CUDA also works fine.
- Exact Bazzite version being used:
❯ rpm-ostree status
State: idle
Deployments:
● ostree-image-signed:docker://ghcr.io/ublue-os/bazzite-dx-nvidia:stable
Digest: sha256:8e0bdd2406ee29ff1c0e6a688a4fda0069af3103017694792b07f191318e5643
Version: 43.20260313 (2026-03-13T08:35:48Z)
LayeredPackages: ufw