
Hotfix: probe torch.cuda via subprocess + opaque install log #25

Merged
cryptopoly merged 1 commit into main from fix/torch-dll-lock-and-zindex on May 1, 2026
Conversation

@cryptopoly (Owner)

Summary

Two regressions surfaced when smoke-testing the v0.7.2 rebuild on Windows / RTX 4090 — both blockers for shipping.

#1 — torch DLL lock prevents GPU bundle install

PR #22 added _snapshot_torch_cuda which did import torch in the backend process. On Windows that pins torch/lib/*.dll (asmjit, cublas, cudnn, ...) into the process handle table. The next click on Install GPU runtime runs pip install --target which calls shutil.rmtree on the existing torch dir, hits the locked DLLs, and crashes:

```
PermissionError: [WinError 5] Access is denied:
'...\extras\cp312\site-packages\torch\lib\asmjit.dll'
```

DiffusersImageEngine.probe() already documents this exact trap (it deliberately uses find_spec instead of importing torch). PR #22 was undoing that protection.

Fix: spawn a short-lived Python subprocess that imports torch, prints {gpu_name, total, used} as JSON, and exits. The OS releases the DLL handles on subprocess exit so the next install can swap torch in place. Prefer the embedded sidecar Python (CHAOSENGINE_EMBED_PYTHON_BIN); fall back to sys.executable.

Skip the probe entirely on macOS — Apple Silicon has no torch.cuda; _snapshot_macos owns that path.

#2 — Install log appears to overlap Prompt + Recent Outputs

PR #23's position: relative; z-index: 5 won the stacking battle, but .install-log-panel kept its translucent rgba(0, 0, 0, 0.22) background, so the Prompt and Recent Outputs panel headers bled through visually whenever the install log sat adjacent to them. This reads as "overlap" even when the layout doesn't actually intersect.

Fix: switch the background to var(--surface) for a fully opaque card, and add contain: layout so the panel's growth during a long torch download can't leak into sibling grid rows.
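The style change amounts to two declarations on the existing panel class (the `position`/`z-index` lines are from PR #23; surrounding structure is illustrative):

```css
.install-log-panel {
  position: relative;          /* from PR #23: win the stacking battle */
  z-index: 5;                  /* from PR #23 */
  background: var(--surface);  /* opaque card: sibling headers no longer bleed through */
  contain: layout;             /* growth during a long torch download stays in this box */
}
```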

Changes

  • backend_service/helpers/gpu.py: _snapshot_torch_cuda now spawns a Python subprocess; new _resolve_python_executable picks the embedded sidecar Python first; macOS short-circuits to None.
  • tests/test_gpu_detection.py — rewritten to mock subprocess.run instead of sys.modules['torch']. Adds an explicit assertion that the probe never imports torch in the main process — if anyone reverts to an in-process import, this test catches it.
  • src/styles.css: .install-log-panel gets background: var(--surface) + contain: layout.
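The regression guard can be sketched roughly as follows (pytest-style; the in-file `snapshot_torch_cuda` stand-in and all names are assumptions, not the actual tests/test_gpu_detection.py contents):

```python
import json
import subprocess
import sys
from unittest import mock


def snapshot_torch_cuda():
    """Minimal stand-in for the real probe: shells out, parses JSON stdout."""
    result = subprocess.run(
        # Probe script elided; never executed below because subprocess.run is mocked.
        [sys.executable, "-c", "import json, torch; print(json.dumps({}))"],
        capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def test_probe_never_imports_torch_in_process():
    fake = subprocess.CompletedProcess(
        args=[], returncode=0,
        stdout=json.dumps({"gpu_name": "RTX 4090", "total": 24576, "used": 3072}),
        stderr="",
    )
    # Mock the subprocess boundary instead of sys.modules['torch'].
    with mock.patch("subprocess.run", return_value=fake) as run:
        snapshot = snapshot_torch_cuda()
    assert snapshot["gpu_name"] == "RTX 4090"
    run.assert_called_once()
    # The hard guarantee: a revert to an in-process import fails here.
    assert "torch" not in sys.modules
```

Mocking `subprocess.run` rather than `sys.modules['torch']` means the test exercises the real process boundary, and the final `sys.modules` assertion makes any in-process import an immediate failure.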

Test plan

  • .venv/bin/python -m pytest tests/test_gpu.py tests/test_gpu_detection.py tests/test_inference.py tests/test_setup_routes.py -q — all pass (30 in test_gpu* alone, 1 expected skip)
  • Manual verify on Windows / RTX 4090: install GPU runtime → restart backend → click Install GPU runtime AGAIN → confirm no PermissionError; uninstall + reinstall → confirm extras dir survives.
  • Manual verify on Windows: while a long torch install is streaming, scroll Image Studio → confirm install log card is opaque, no Prompt headers showing through.
  • Manual verify on macOS: VRAM detection still reports unified memory via _snapshot_macos (untouched).

@cryptopoly merged commit 0f84066 into main on May 1, 2026
1 of 2 checks passed