Fix Windows CUDA detection + image gen #22
Merged
cryptopoly merged 1 commit into main on May 1, 2026
Conversation
Two related Windows-only bugs surfaced by the v0.7.2 smoke test on an RTX 4090 box:

**Bug #6: RTX 4090 reported as 12 GB total.** `GPUMonitor._snapshot_nvidia()` shells out to `nvidia-smi`, and on Windows boxes without it on PATH (driver installed but no CUDA toolkit) it fell through to `_fallback_psutil()`, which returns `psutil.virtual_memory().total`: system RAM, not VRAM. The image / video safety estimators then read that as the GPU budget and produced "Likely to crash" warnings on a 24 GB card holding an 11 GB FLUX model.

Fix:
- Try `torch.cuda.get_device_properties(0).total_memory` first. When the GPU bundle is installed this is the most reliable source: it reads through the CUDA driver, no PATH needed.
- Fall back to `nvidia-smi` as before.
- Drop the psutil fallback. When neither answers we now return `{'vram_total_gb': None}`, which the TS estimators (`utils/images.ts`, `utils/videos.ts`) already treat as "unknown" via the `DEFAULT_*_MEMORY_GB` fallbacks. Better an honest "unknown" than a wrong 12 GB.

**Bug #7: Image gen produces gibberish placeholder after install.** `DiffusersImageEngine.probe()` uses `importlib.util.find_spec` to decide between the placeholder engine and the real diffusers pipeline. Once the GPU bundle install lands new packages in the extras dir, importlib's negative-lookup cache still answers `None` for the new modules until `invalidate_caches()` is called. The probe kept reporting `realGenerationAvailable=False` and the generation pipeline returned the SVG placeholder, which shows up as a gibberish image when the frontend renders it as `data:image/svg+xml`.

Fix:
- `probe()` now calls `importlib.invalidate_caches()` before `find_spec`, so newly installed packages are picked up without a backend restart.
- The GPU bundle worker (`_gpu_bundle_job_worker`) now also calls `invalidate_caches()` and resets the VRAM total cache when it transitions to `phase=done`, so the immediately following capabilities snapshot reflects the freshly importable torch.
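The detection order described above (torch.cuda first, then `nvidia-smi`, then an honest `None`) can be sketched roughly as follows. The function name `get_vram_total_gb` and the return shape are illustrative, not the project's actual API:

```python
import shutil
import subprocess


def get_vram_total_gb():
    """Return total VRAM in GB, or None when no GPU source answers."""
    # 1. Prefer torch.cuda: it reads through the CUDA driver directly,
    #    so it works even when nvidia-smi is not on PATH.
    try:
        import torch
        if torch.cuda.is_available():
            props = torch.cuda.get_device_properties(0)
            return round(props.total_memory / (1024 ** 3), 1)
    except Exception:
        pass  # torch missing or CUDA unusable; fall through

    # 2. Fall back to nvidia-smi when it is on PATH.
    if shutil.which("nvidia-smi"):
        try:
            out = subprocess.check_output(
                ["nvidia-smi", "--query-gpu=memory.total",
                 "--format=csv,noheader,nounits"],
                text=True, timeout=5,
            )
            return round(int(out.strip().splitlines()[0]) / 1024, 1)
        except (subprocess.SubprocessError, ValueError):
            pass

    # 3. No psutil fallback: returning None (unknown) beats reporting
    #    system RAM as VRAM.
    return None
```

Note the deliberate absence of a psutil branch; callers must treat `None` as "unknown" rather than substituting a number.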
Tests

`tests/test_gpu_detection.py`: 9 unit tests covering torch.cuda detection, `nvidia-smi` precedence, the new no-system-RAM fallback path, and the process-lifetime cache. All pass; the existing pytest suite is still green.
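The cache-invalidation side of the Bug #7 fix described above amounts to a one-line change before the `find_spec` checks. A minimal sketch, with an illustrative function name and package list rather than the project's actual probe:

```python
import importlib
import importlib.util


def probe_real_generation_available():
    """True when the real diffusers pipeline can be imported."""
    # Drop importlib's finder caches (including negative lookups) first.
    # Without this, packages installed after process start, e.g. by the
    # GPU bundle install, keep answering None from find_spec until the
    # backend restarts.
    importlib.invalidate_caches()
    required = ("torch", "diffusers")
    return all(importlib.util.find_spec(name) is not None
               for name in required)
```

`invalidate_caches()` is cheap relative to a probe, so calling it unconditionally on every probe is simpler than tracking install events.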
cryptopoly added a commit that referenced this pull request on May 1, 2026
PR #22 (Fix Windows CUDA detection) replaced the system-RAM-as-VRAM fallback in `_snapshot_nvidia` with a no-GPU response that returns `{'vram_total_gb': None, 'vram_used_gb': None}`. The pre-existing `test_snapshot_vram_values_are_numeric` still required `(int, float)`, which broke on the Linux CI runner where neither torch.cuda nor nvidia-smi is available. Loosen the type check to `(int, float, type(None))` so the no-GPU path is accepted; a garbage value such as a string for `vram_used_gb` still fails the test. Renamed the test to `..._numeric_or_none` to make the intent loud at the call site.
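The loosened check described above can be sketched like this; the hardcoded snapshot dict stands in for the real `_snapshot_nvidia` output on a no-GPU runner and is illustrative only:

```python
def test_snapshot_vram_values_are_numeric_or_none():
    # Illustrative no-GPU response; the real test would call the snapshot.
    snapshot = {"vram_total_gb": None, "vram_used_gb": None}
    for key in ("vram_total_gb", "vram_used_gb"):
        # (int, float) alone broke Linux CI runners with no torch.cuda or
        # nvidia-smi. Adding type(None) admits the honest no-GPU response,
        # while a garbage string still fails.
        assert isinstance(snapshot[key], (int, float, type(None)))
```

Note that `isinstance(x, (int, float, type(None)))` also accepts `bool` (a subclass of `int`); a stricter test could exclude it explicitly if that matters.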
cryptopoly added a commit that referenced this pull request on May 1, 2026
PR #22's no-system-RAM-fallback path returns `{'vram_total_gb': None}` on Linux CI runners (no torch.cuda, no nvidia-smi). The pre-existing `test_snapshot_vram_values_are_numeric` required `(int, float)`, which breaks on those runners. This fix originally landed in branch fix/test-host-platform-mock (commit 3b147a9) but was pushed after PR #24 had already merged, so only the imageDiscoverMemoryEstimate Mac pin (commit 7bbeeef) made it into main. The orphan commit went unnoticed until run 25223969487 on this PR's first CI ride resurfaced the same failure. Loosen the type check to `(int, float, type(None))` and rename the test to `..._numeric_or_none` so the intent is loud at the call site.
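The process-lifetime VRAM cache and the reset that `_gpu_bundle_job_worker` performs at `phase=done` can be sketched as below. The cache structure and the injected `detect` callable are illustrative; the real helpers presumably detect inline:

```python
# Process-lifetime cache: detection is expensive (subprocess / CUDA init),
# so it runs once and the value is served for the rest of the process.
_vram_total_cache = {"value": None, "filled": False}


def get_device_vram_total_gb(detect):
    """Detect once, then serve the cached value for the process lifetime."""
    if not _vram_total_cache["filled"]:
        _vram_total_cache["value"] = detect()
        _vram_total_cache["filled"] = True
    return _vram_total_cache["value"]


def reset_vram_total_cache():
    """Invoked after the GPU bundle install finishes, so the next
    capabilities snapshot re-detects with the freshly importable torch."""
    _vram_total_cache["filled"] = False
```

Caching `filled` separately from `value` matters here: `None` is a legitimate cached answer (no GPU detected), so "value is None" cannot double as "not yet detected".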
Summary
Two related Windows bugs surfaced during the v0.7.2 smoke test on an RTX 4090 / 24 GB box:
- `GPUMonitor._snapshot_nvidia()` shelled out to `nvidia-smi`, and on Windows boxes without it on PATH (driver installed but no CUDA toolkit) it fell through to `_fallback_psutil()`, which returns system RAM. The image / video safety estimators then read that as the GPU budget and produced spurious "Likely to crash" warnings.
- `DiffusersImageEngine.probe()` uses `importlib.util.find_spec` to choose between the placeholder engine and the real diffusers pipeline. After the GPU bundle install lands new packages in the extras dir, importlib's negative-lookup cache still answers `None` until `invalidate_caches()` is called, so the probe kept reporting `realGenerationAvailable=False` and the SVG placeholder leaked through.

Changes
`backend_service/helpers/gpu.py`
- `_snapshot_torch_cuda()` reads VRAM via `torch.cuda.get_device_properties(0).total_memory` first; works whenever the GPU bundle is installed, no PATH dependency.
- `_snapshot_nvidia()` now tries torch.cuda → nvidia-smi → returns `vram_total_gb=None` (no system-RAM lie).
- `_fallback_psutil()` is kept untouched but no longer reachable from the live path.

`backend_service/image_runtime.py`
- `DiffusersImageEngine.probe()` calls `importlib.invalidate_caches()` before the `find_spec` checks so newly installed packages from the GPU bundle install are visible without a backend restart.

`backend_service/routes/setup.py`
- `_gpu_bundle_job_worker` invalidates the import cache and resets the VRAM total cache when transitioning to `phase=done`, so the immediately following capabilities snapshot reflects freshly importable torch.

`tests/test_gpu_detection.py` (new)
Nine unit tests covering:
- … `None`.
- `_snapshot_nvidia` falls back to `{"gpu_name": "No GPU detected", "vram_total_gb": None}` when both torch.cuda and nvidia-smi fail.
- `_snapshot_nvidia` does NOT fall back to system RAM via psutil any more.
- `get_device_vram_total_gb` caches its result for the process lifetime.

Test plan
- `.venv/bin/python -m pytest tests/test_gpu_detection.py -v`: 9/9 pass
- `.venv/bin/python -m pytest tests/test_setup_routes.py tests/test_inference.py tests/test_services.py -q`: pre-existing tests still pass
- Settings → Diagnostics reports 24 GB VRAM after restart; FLUX.1 Dev no longer triggers the "Likely to crash" warning; clicking Generate after a fresh GPU bundle install produces a real image instead of the placeholder.
- `_snapshot_macos` path (untouched).

Out of scope
- … `safetensors>=0.4.5` and treats the pip-resolver warning as benign.