drm/compositor: fall back to primary-plane cursor on map_mut failure#2033

Open
poelzi wants to merge 1 commit into Smithay:master from poelzi:nomem

@poelzi poelzi commented May 14, 2026

Summary

  • The cursor-plane fast path in DrmCompositor::render_cursor_plane calls .expect("Lost track of cursor device") on the outer Result of gbm::BufferObject::map_mut. That panic was written assuming the only failure mode was a vanished gbm device, but on NVIDIA the kernel allocator can return ENOMEM here, typically after a misbehaving client (e.g. Chromium's video decoder hitting a CHECK and dying under SIGILL) leaks GPU memory in nvkms. A transient per-frame allocation failure then takes the entire compositor down through render_frame.
  • This patch handles the outer error the same way the four sibling failure points immediately above already do (create_buffer, add_framebuffer, copy_element_to_cursor_bo, missing underlying_storage): log at debug! and return None. The caller falls back to compositing the cursor on the primary plane for this frame, and the cursor-plane fast path is automatically retried on subsequent frames once kernel memory frees up.
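
For readers who want the shape of the change without opening the diff, here is a minimal, self-contained sketch of the pattern. It is not the smithay code: `map_cursor_buffer`, `CursorPlaneState`, `CursorPath`, `choose_cursor_path` and the `simulate_enomem` flag are hypothetical stand-ins, `eprintln!` stands in for the `debug!` log, and the real `map_mut` signature is not reproduced here.

```rust
use std::io;

/// Hypothetical stand-in for the state a successfully prepared cursor plane yields.
#[derive(Debug, PartialEq)]
struct CursorPlaneState {
    bytes_written: usize,
}

/// Hypothetical stand-in for the buffer mapping call. The outer `Result`
/// reports whether the mapping itself succeeded.
fn map_cursor_buffer(simulate_enomem: bool) -> io::Result<Vec<u8>> {
    if simulate_enomem {
        // errno 12 is ENOMEM on Linux; this mimics the failure in the report.
        Err(io::Error::from_raw_os_error(12))
    } else {
        // A 64x64 cursor with 4 bytes per pixel (size chosen for illustration).
        Ok(vec![0u8; 64 * 64 * 4])
    }
}

/// Sketch of the fast path: a mapping failure is logged and turned into `None`
/// instead of panicking through `.expect(...)`.
fn render_cursor_plane(simulate_enomem: bool) -> Option<CursorPlaneState> {
    let mut mapping = match map_cursor_buffer(simulate_enomem) {
        Ok(mapping) => mapping,
        Err(err) => {
            // Stand-in for the `debug!` log mentioned in the summary.
            eprintln!("failed to map cursor buffer: {err}");
            return None;
        }
    };
    // Pretend to copy the cursor image into the mapped buffer.
    mapping.fill(0xff);
    Some(CursorPlaneState {
        bytes_written: mapping.len(),
    })
}

/// Which path ends up drawing the cursor for this frame.
#[derive(Debug, PartialEq)]
enum CursorPath {
    CursorPlane,
    PrimaryPlane,
}

/// Sketch of the caller: `None` from the fast path means "composite the
/// cursor on the primary plane for this frame".
fn choose_cursor_path(simulate_enomem: bool) -> CursorPath {
    match render_cursor_plane(simulate_enomem) {
        Some(_) => CursorPath::CursorPlane,
        None => CursorPath::PrimaryPlane,
    }
}
```

Because the decision is made per call, a frame where `choose_cursor_path(true)` yields `CursorPath::PrimaryPlane` does not prevent a later frame from taking the cursor-plane path again once mapping succeeds.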

Repro

Observed in the wild on a niri + NVIDIA setup running smithay 0.7.0:

panicked at smithay-0.7.0/src/backend/drm/compositor/mod.rs:3353:18:
Lost track of cursor device: Os { code: 12, kind: OutOfMemory, message: "Cannot allocate memory" }

Sequence: chromium Media thread SIGILL → kernel logs [drm:__nv_drm_gem_nvkms_map [nvidia_drm]] *ERROR* Failed to map NvKmsKapiMemory → 5s later niri panics at the line above → every Wayland client loses its socket.
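
As a rough illustration of where the panic text comes from, the sketch below builds an `io::Error` from errno 12 and formats it the way `.expect(msg)` does (message, then `: `, then the error's `Debug` output). `simulated_map_failure` and `describe_failure` are hypothetical helpers; the `OutOfMemory` kind and the trailing message string are what Linux reports for that errno and may differ on other platforms.

```rust
use std::io;

/// Hypothetical stand-in for a mapping call that fails with ENOMEM (errno 12 on Linux).
fn simulated_map_failure() -> io::Result<()> {
    Err(io::Error::from_raw_os_error(12))
}

/// Builds the same text `.expect("Lost track of cursor device")` would panic with,
/// without actually panicking.
fn describe_failure() -> String {
    match simulated_map_failure() {
        Ok(()) => String::from("mapped"),
        // `expect` appends the Debug representation of the error after ": ".
        Err(err) => format!("Lost track of cursor device: {err:?}"),
    }
}
```

Printing `describe_failure()` on Linux should give a line of the same form as the panic message quoted above.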

Test plan

  • cargo check --no-default-features --features "backend_drm,backend_gbm,renderer_pixman" — clean.
  • cargo test --lib under the default + relevant features — 71/71 pass, no behavior change.
  • cargo clippy — no new lints (one pre-existing io_other_error warning in gbm.rs:215 is unrelated).
  • Real-world: rebuild niri (or your compositor of choice) against this and confirm the Chromium → NVIDIA-ENOMEM sequence no longer kills the session. The cursor may briefly fall back to primary-plane compositing until GPU memory recovers, which is the intended behavior.

Notes

  • Scope is intentionally minimal. The other .unwrap() / .expect() sites in DrmCompositor were audited and are state-machine invariants (HashMap entries the code just inserted, Options the conditional just proved Some); they do not touch the GPU allocator path.
  • This does not address the upstream NVIDIA driver leak, which cannot be fixed here; it only changes how the compositor reacts to it.

The cursor-plane fast path called .expect("Lost track of cursor device")
on the result of gbm::BufferObject::map_mut. That panic was written
assuming the only way the outer Result could fail was a vanished gbm
device handle. In practice, on NVIDIA the kernel allocator can return
ENOMEM here — typically after a misbehaving client (e.g. Chromium's
video decoder crashing under SIGILL) leaks GPU memory in nvkms — and
the panic propagated all the way up render_frame, killing the entire
compositor over a transient per-frame allocation failure.

Treat the outer map_mut error the same way the four sibling failure
points immediately above already do (create_buffer, add_framebuffer,
copy_element_to_cursor_bo, missing underlying_storage): log at debug!
and return None, letting the caller composite the cursor on the
primary plane for this frame. Once kernel memory frees up the cursor
plane is re-tried automatically.

Repro path on the user's machine: chromium SIGILL in the Media thread →
nv_drm_gem_nvkms_map starts returning ENOMEM → niri panics at
mod.rs:3353 with "Lost track of cursor device: Os { code: 12,
kind: OutOfMemory }" → every Wayland client loses its socket.