
Fix Windows CUDA detection + image gen#22

Merged
cryptopoly merged 1 commit into main from fix/windows-cuda-detection on May 1, 2026

Conversation

@cryptopoly (Owner)

Summary

Two related Windows bugs surfaced during the v0.7.2 smoke test on an RTX 4090 / 24 GB box:

  • Bug #6 — GPU memory misreported as 12 GB on a 24 GB card. GPUMonitor._snapshot_nvidia() shelled out to nvidia-smi, and on Windows boxes without it on PATH (driver installed but no CUDA toolkit) it fell through to _fallback_psutil(), which returns system RAM. The image / video safety estimators then read that as the GPU budget and produced spurious "Likely to crash" warnings.
  • Bug #7 — Image gen produces a gibberish placeholder after install. DiffusersImageEngine.probe() uses importlib.util.find_spec to choose between the placeholder engine and the real diffusers pipeline. After the GPU bundle install lands new packages in the extras dir, importlib's negative-lookup cache still answers None until invalidate_caches() is called, so the probe kept reporting realGenerationAvailable=False and the SVG placeholder leaked through.

Changes

backend_service/helpers/gpu.py

  • New _snapshot_torch_cuda() reads VRAM via torch.cuda.get_device_properties(0).total_memory first — it works whenever the GPU bundle is installed, with no PATH dependency.
  • _snapshot_nvidia() now tries torch.cuda → nvidia-smi → returns vram_total_gb=None (no system-RAM lie); see the sketch after this list.
  • The old _fallback_psutil() is kept untouched but is no longer reachable from the live path.
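
A minimal sketch of the new ordering, assuming the helper names and units described above (the real gpu.py differs in detail):

```python
import subprocess
from typing import Optional


def _snapshot_torch_cuda() -> Optional[float]:
    """Read total VRAM through the CUDA driver (no PATH dependency)."""
    try:
        import torch  # present only once the GPU bundle is installed
        if torch.cuda.is_available():
            props = torch.cuda.get_device_properties(0)
            return props.total_memory / (1024 ** 3)  # bytes -> GiB
    except ImportError:
        pass
    return None


def _snapshot_nvidia_smi() -> Optional[float]:
    """Fallback: nvidia-smi, which may be missing from PATH on Windows."""
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            text=True, timeout=5,
        )
        return float(out.strip().splitlines()[0]) / 1024  # MiB -> GiB
    except (OSError, subprocess.SubprocessError, ValueError):
        return None


def _vram_total_gb() -> Optional[float]:
    # torch.cuda first, nvidia-smi second, honest None last; never system RAM.
    for probe in (_snapshot_torch_cuda, _snapshot_nvidia_smi):
        total = probe()
        if total is not None:
            return total
    return None
```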

backend_service/image_runtime.py

  • DiffusersImageEngine.probe() calls importlib.invalidate_caches() before the find_spec checks so newly installed packages from the GPU bundle install are visible without a backend restart; see the sketch below.
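
A minimal sketch of the probe fix, assuming probe() keys availability off find_spec as described; the module names checked here are illustrative:

```python
import importlib
import importlib.util


def probe() -> bool:
    # Drop importlib's caches (including cached negative lookups) so
    # packages installed into the extras dir after startup are visible.
    importlib.invalidate_caches()
    return all(
        importlib.util.find_spec(mod) is not None
        for mod in ("torch", "diffusers")
    )
```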

backend_service/routes/setup.py

  • _gpu_bundle_job_worker invalidates the import cache and resets the VRAM total cache when transitioning to phase=done, so the immediately following capabilities snapshot reflects freshly importable torch; see the sketch below.
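
A sketch of the done-phase hook. The lru_cache shape for the VRAM cache and the _on_gpu_bundle_done / job names are assumptions; only the invalidate-and-reset behavior at phase=done comes from this PR:

```python
import importlib
from functools import lru_cache


@lru_cache(maxsize=1)
def get_device_vram_total_gb():
    # Process-lifetime cache around the snapshot chain sketched earlier.
    return _vram_total_gb()


def _on_gpu_bundle_done(job) -> None:
    # Freshly pip-installed packages stay invisible to the import
    # machinery until the caches are dropped.
    importlib.invalidate_caches()
    # Forget the memoized VRAM total so the next capabilities snapshot
    # re-probes through the now-importable torch.cuda.
    get_device_vram_total_gb.cache_clear()
    job.phase = "done"  # illustrative job/state shape
```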

tests/test_gpu_detection.py (new)

Nine unit tests covering:

  • torch.cuda returns the full 24 GB for a mocked RTX 4090 (example after this list).
  • torch.cuda unavailable / not installed returns None.
  • _snapshot_nvidia falls back to {"gpu_name": "No GPU detected", "vram_total_gb": None} when both torch.cuda and nvidia-smi fail.
  • _snapshot_nvidia does NOT fall back to system RAM via psutil any more.
  • torch.cuda takes precedence over nvidia-smi when both are available.
  • get_device_vram_total_gb caches result for process lifetime.
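
As an example, the mocked-4090 case could look roughly like this; the module path follows the Changes section above, while the mock shape is an assumption:

```python
import sys
from types import SimpleNamespace
from unittest import mock

from backend_service.helpers import gpu


def test_torch_cuda_reports_full_24_gb():
    fake_torch = mock.MagicMock()
    fake_torch.cuda.is_available.return_value = True
    fake_torch.cuda.get_device_properties.return_value = SimpleNamespace(
        total_memory=24 * 1024 ** 3,  # a mocked RTX 4090
    )
    with mock.patch.dict(sys.modules, {"torch": fake_torch}):
        assert gpu._snapshot_torch_cuda() == 24.0
```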

Test plan

  • .venv/bin/python -m pytest tests/test_gpu_detection.py -v — 9/9 pass
  • .venv/bin/python -m pytest tests/test_setup_routes.py tests/test_inference.py tests/test_services.py -q — pre-existing tests still pass
  • Manual verify on Windows / RTX 4090: Settings → Diagnostics reports 24 GB VRAM after restart; FLUX.1 Dev no longer triggers "Likely to crash" warning; clicking Generate after a fresh GPU bundle install produces a real image instead of the placeholder.
  • Manual verify on macOS (regression): VRAM detection still reports unified memory correctly via the existing _snapshot_macos path (untouched).

Out of scope

  • The diffusers/safetensors version-incompatibility warning ("safetensors 0.7.0 vs >=0.8.0-rc.0") observed in the install log is a separate issue. safetensors 0.8.0-rc.0 is a pre-release that pip won't install by default; in practice 0.7.0 works for the FLUX pipeline, so this PR leaves the pin at safetensors>=0.4.5 and treats the pip-resolver warning as benign.

Two related Windows-only bugs surfaced by the v0.7.2 smoke test on
an RTX 4090 box:

Bug #6 — RTX 4090 reported as 12 GB total
  GPUMonitor._snapshot_nvidia() shells out to nvidia-smi, and on
  Windows boxes without it on PATH (driver installed but no CUDA
  toolkit) it fell through to _fallback_psutil() which returns
  psutil.virtual_memory().total — system RAM, not VRAM. The image /
  video safety estimators then read that as the GPU budget and
  produced 'Likely to crash' warnings on a 24 GB card holding an
  11 GB FLUX model.

  Fix:
  - Try torch.cuda.get_device_properties(0).total_memory first.
    When the GPU bundle is installed this is the most reliable
    source — it reads through the CUDA driver, no PATH needed.
  - Fall back to nvidia-smi as before.
  - Drop the psutil fallback. When neither answers we now return
    {'vram_total_gb': None}, which the TS estimators
    (utils/images.ts, utils/videos.ts) already treat as 'unknown'
    via the DEFAULT_*_MEMORY_GB fallbacks. Better an honest
    'unknown' than a wrong 12 GB.

Bug #7 — Image gen produces gibberish placeholder after install
  DiffusersImageEngine.probe() uses importlib.util.find_spec to
  decide between the placeholder engine and the real diffusers
  pipeline. Once the GPU bundle install lands new packages into
  the extras dir, importlib's negative-lookup cache still answers
  None for the new modules until invalidate_caches() is called.
  The probe kept reporting realGenerationAvailable=False and the
  generation pipeline returned the SVG placeholder, which lands as
  a gibberish image when the frontend renders it as data:image/svg+xml.

  Fix:
  - probe() now calls importlib.invalidate_caches() before
    find_spec so newly-installed packages are picked up without a
    backend restart.
  - The GPU bundle worker (_gpu_bundle_job_worker) now also calls
    invalidate_caches and resets the VRAM total cache when it
    transitions to phase=done, so the immediately-following
    capabilities snapshot reflects the freshly-importable torch.

Tests
  tests/test_gpu_detection.py — 9 unit tests covering
  torch.cuda detection, nvidia-smi precedence, the new
  no-system-RAM fallback path, and the process-lifetime cache.
  All pass; existing pytest suite still green.
cryptopoly merged commit 3967db3 into main on May 1, 2026
1 of 2 checks passed
cryptopoly added a commit that referenced this pull request on May 1, 2026
PR #22 (Fix Windows CUDA detection) replaced the system-RAM-as-VRAM
fallback in _snapshot_nvidia with a no-GPU response that returns
{'vram_total_gb': None, 'vram_used_gb': None}. The pre-existing
test_snapshot_vram_values_are_numeric still required (int, float),
which broke on the Linux CI runner where neither torch.cuda nor
nvidia-smi is available.

Loosen the type check to (int, float, type(None)) so the no-GPU path
is accepted (sketched below). Garbage values still fail the test, e.g.
a string in vram_used_gb.

Renamed the test to ..._numeric_or_none to make the intent loud at
the call site.
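
The loosened assertion, roughly sketched; constructing GPUMonitor directly is an assumption, while the class, method, and key names come from the commit text:

```python
from backend_service.helpers.gpu import GPUMonitor


def test_snapshot_vram_values_are_numeric_or_none():
    snap = GPUMonitor()._snapshot_nvidia()
    for key in ("vram_total_gb", "vram_used_gb"):
        # Accept the honest no-GPU None; reject garbage like a string.
        assert isinstance(snap[key], (int, float, type(None)))
```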
cryptopoly added a commit that referenced this pull request on May 1, 2026
PR #22's no-system-RAM-fallback path returns {'vram_total_gb': None}
on Linux CI runners (no torch.cuda, no nvidia-smi). The pre-existing
test_snapshot_vram_values_are_numeric required (int, float) which
breaks on those runners.

This fix originally landed in branch fix/test-host-platform-mock
(commit 3b147a9) but was pushed after PR #24 had already merged, so
only the imageDiscoverMemoryEstimate Mac pin (commit 7bbeeef) made
it into main. The orphan commit went unnoticed until run 25223969487
on this PR's first CI ride resurfaced the same failure.

Loosen the type check to (int, float, type(None)) and rename the
test to ..._numeric_or_none so the intent is loud at the call site.