Feature/video gen #4
Merged
cryptopoly merged 15 commits into main on Apr 21, 2026
Conversation
Four targeted fixes from the Windows diagnostic run:

- video_runtime._find_missing: catch ModuleNotFoundError when find_spec walks a dotted name whose parent namespace is absent. The probe was crashing with "No module named 'google'" while checking google.protobuf on a clean install, which made /api/video/runtime return 500 forever and kept the UI stuck on "Video runtime did not respond". Added a regression test covering the exact scenario. (A minimal sketch of the pattern follows this list.)
- useSettings.handleStopServer / handleRestartServer: decide on isTauri() rather than tauriBackend?.managedByTauri. The latter is false until the background bootstrap thread populates it, so clicking Restart before that finished (or after a stale-state refresh) sent the browser down the web-fallback branch that POSTs /api/server/shutdown and then polls for a respawn that never came. Now every desktop-context click goes through restart_backend_sidecar so Rust can own the shutdown + relaunch.
- CI (.github/workflows/build.yml): install the [images] extras in the Python test job. Tests for the diffusers video runtime import torch and diffusers directly; without them every ready-path assertion failed on main after the feature/video-gen merge.
- build.ps1 / build.sh: flip the CUDA torch check from opt-in strict (CHAOSENGINE_REQUIRE_CUDA_TORCH=1) to abort-by-default on NVIDIA hosts, with CHAOSENGINE_ALLOW_CPU_TORCH=1 as the escape. The build now prints the actual Python version + bundled torch version in the error, lists three concrete fix paths (switch to Py 3.13, override the CUDA index, or deliberately ship CPU), and extends the CUDA index candidate list up to cu130/cu129 so newer drivers get picked up.
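The fixed helper is internal to ``video_runtime``; the following is only a minimal sketch of the pattern, with the function name and return shape assumed rather than copied from the real module:

```python
import importlib.util

def find_missing(module_names: list[str]) -> list[str]:
    """Return the subset of module_names that cannot be resolved.

    find_spec("google.protobuf") does not simply return None when the
    parent "google" namespace is absent; it raises ModuleNotFoundError,
    so the probe has to treat that exception as "missing" too.
    """
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ModuleNotFoundError, ImportError, ValueError):
            spec = None
        if spec is None:
            missing.append(name)
    return missing

# On a clean install without protobuf this returns ["google.protobuf"]
# instead of crashing the /api/video/runtime probe with a 500.
print(find_missing(["google.protobuf"]))
```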
Drops torch + diffusers + accelerate + transformers and all CUDA DLLs from the bundled installer (saves ~1.3 GB). The app now ships chat + llama.cpp out of the box (<~500 MB installer), and Image / Video Studio offer a one-click "Install GPU runtime" button the first time the user visits them. The install writes to a PERSISTENT user-local directory that survives app updates — no more re-downloading 2 GB of CUDA torch every version bump.

## Build-time changes
- ``build.ps1`` / ``build.sh``: drop ``[images]`` from the default venv install + kill the CUDA torch install/verify blocks. Users who want a one-installer GPU build can set ``CHAOSENGINE_BUNDLE_GPU=1`` to restore the old 1.9 GB behaviour (useful for air-gapped deployments).
- ``scripts/stage-runtime.mjs``: ``images`` moves from required → optional in ``validateBundledPythonPackages``. Only ``desktop`` (fastapi, uvicorn, psutil, pypdf, huggingface_hub) is strictly required now.

## Runtime plumbing
- ``src-tauri/src/lib.rs``: compute a platform-native persistent extras path (``%LOCALAPPDATA%\\ChaosEngineAI\\extras\\site-packages`` on Windows, ``~/Library/Application Support/ChaosEngineAI/extras/...`` on macOS, ``$XDG_DATA_HOME/ChaosEngineAI/extras/...`` on Linux), create it on bootstrap, prepend it to ``PYTHONPATH`` so its packages shadow anything in the bundled site-packages, and export ``CHAOSENGINE_EXTRAS_SITE_PACKAGES`` so the backend knows where to install. This path lives OUTSIDE the ephemeral ``%TEMP%`` runtime extraction, so app updates don't wipe the 2 GB of CUDA libraries the user just downloaded.

## New async install endpoint
- ``backend_service/routes/setup.py``: ``POST /api/setup/install-gpu-bundle`` starts a background thread that walks the CUDA index list for torch (cu124 → cu126 → cu128 → cu121 → nightly cu128) and then pip-installs diffusers + accelerate + transformers + safetensors + pillow + huggingface-hub + imageio + imageio-ffmpeg + sentencepiece + tiktoken + protobuf + ftfy from PyPI. Target is the persistent extras dir, so we never touch the DLLs the running backend has loaded (goodbye WinError 5). A disk-space preflight (5 GB required) fails cleanly up front. (A minimal sketch of the job/status pattern follows this summary.)
- ``GET /api/setup/install-gpu-bundle/status`` returns live progress: phase, current package, percent, attempts list, ``noWheelForPython`` flag, and ``cudaVerified`` after the post-install ``torch.cuda.is_available()`` probe. Frontend polls at 1.5 Hz.
- ``GET /api/setup/gpu-bundle-info`` exposes the target dir + package list + approx download size + free disk for a pre-install banner.

## Frontend wiring
- ``src/api.ts``: typed ``startGpuBundleInstall``, ``getGpuBundleStatus``, ``fetchGpuBundleInfo`` helpers + the ``GpuBundleJobState`` interface.
- ``src/hooks/useImageState.ts``: ``handleInstallImageRuntime`` rewritten to use the async bundle job and poll until done, surfacing live progress through ``imageBusyLabel`` (picked up by the existing "Installing..." text in ImageStudioTab).
- ``src/hooks/useVideoState.ts``: new ``handleInstallVideoGpuRuntime`` mirroring the image one. The existing per-package install path stays for targeted fixes (mp4 encoder, tokenizers) that don't need the full 2 GB.
- ``src/features/video/VideoStudioTab.tsx``: adds an "Install GPU runtime" banner when the video probe reports the engine as unavailable (was previously just a Restart Backend button that couldn't fix anything). Also fixes a latent bug where the parent passed a 0-arg wrapper that swallowed the package list from the tokenizer-deps install button.
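The real endpoints live in ``backend_service/routes/setup.py``; this is only a minimal sketch of the background-job / status-polling shape, with the state fields and helper names assumed rather than taken from the actual module:

```python
import threading
from fastapi import APIRouter, HTTPException

router = APIRouter(prefix="/api/setup")

# Single shared job record; the real implementation tracks more fields
# (attempts, noWheelForPython, cudaVerified, per-step pip output, ...).
_job = {"phase": "idle", "current": None, "percent": 0, "message": ""}
_job_lock = threading.Lock()

def _run_install() -> None:
    """Worker thread: walk CUDA indexes for torch, then install the rest of the bundle."""
    try:
        _job.update(phase="installing", current="torch", percent=5)
        # ... pip install --target <persistent extras dir> for each package ...
        _job.update(phase="done", current=None, percent=100,
                    message="GPU bundle installed.")
    except Exception as exc:  # surface *something* even for blank exceptions
        _job.update(phase="error", message=str(exc) or type(exc).__name__)

@router.post("/install-gpu-bundle")
def start_install() -> dict:
    with _job_lock:  # concurrent-start guard
        if _job["phase"] == "installing":
            raise HTTPException(status_code=409, detail="Install already running.")
        _job.update(phase="installing", percent=0, message="")
    threading.Thread(target=_run_install, daemon=True).start()
    return {"started": True}

@router.get("/install-gpu-bundle/status")
def install_status() -> dict:
    return dict(_job)  # frontend polls this at ~1.5 Hz
```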
## Tests
- ``tests/test_setup_routes.py``: new ``InstallGpuBundleTests`` covers the idle-state status response, bundle-info contract, happy-path end-to-end install, disk-space preflight failure, ``noWheelForPython`` flagging after all-indexes fail, and the concurrent-start guard that prevents parallel installs.

466 Python tests pass; 171 TypeScript tests pass; tsc + cargo check clean.
The previous flow collapsed every install failure into a single toast
("GPU bundle install failed."), which is the "failed without reason"
the user reported. The backend already tracks every pip step's stdout
+ stderr in the job state — this exposes that as a collapsible log
panel under the Install GPU Runtime button in both Image Studio and
Video Studio.
UX:
- Auto-expands when the install enters the ``error`` phase so users
see the real pip output without clicking — by far the most common
thing they need when the install fails.
- Stays collapsed on success so the UI doesn't get noisy.
- Shows target dir + Python version + winning CUDA index up top
(everything you'd need to paste into a support thread).
- Each attempt is one row: OK/FAIL badge + step label
(``torch (from cu124)``, ``pip install diffusers``, ``Verify
torch.cuda.is_available()``…) + scrollable terminal-style ``<pre>``
with the last 60 lines of pip output.
- Final status row mirrors ``state.message`` so partial successes
surface cleanly ("CUDA verified, but ftfy + sentencepiece failed —
retry individually").
Backend hardening (``backend_service/routes/setup.py``; a minimal sketch of these message rules follows the list):
- ``except Exception`` block now falls back to the exception type name
when ``str(exc)`` is empty, so the UI never sees a blank ``error``.
- "All CUDA index candidates failed" message now appends the last
three lines of the most recent pip output so the toast itself is
actionable, not just a pointer to the log.
- Non-fatal package failures (after torch lands) are tracked in a
``non_fatal_failures`` list and summarised in the final ``message``,
so success/restart prompts don't lie when half the bundle is missing.
- Verify-CUDA failure path now includes the verify subprocess's last
two output lines in the message — drives the user toward the actual
cause (driver mismatch vs CPU torch) without expanding the log.
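The rules above are plain string handling; this sketch uses assumed names and field shapes, not the real job-state structures:

```python
def error_message(exc: Exception, last_output: str = "") -> str:
    """Never hand the UI a blank error string."""
    msg = str(exc) or type(exc).__name__           # fall back to the exception type name
    tail = "\n".join(last_output.splitlines()[-3:])  # last few lines of the most recent pip output
    return f"{msg}\n{tail}" if tail else msg

def final_message(cuda_verified: bool, non_fatal_failures: list[str]) -> str:
    """Summarise partial success instead of claiming a clean install."""
    if not non_fatal_failures:
        return "GPU bundle installed." + (" CUDA verified." if cuda_verified else "")
    failed = " + ".join(non_fatal_failures)
    prefix = "CUDA verified, but" if cuda_verified else "Torch installed, but"
    return f"{prefix} {failed} failed — retry individually."
```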
Frontend (``src/components/InstallLogPanel.tsx``,
``src/hooks/use{Image,Video}State.ts``,
``src/features/{images,video}/{ImageStudio,VideoStudio}Tab.tsx``,
``src/App.tsx``):
- New ``InstallLogPanel`` component (no emojis, monochrome, matches
existing Studio styling).
- Both hooks now keep a ``gpuBundleJob: GpuBundleJobState | null``
React state, updated each poll tick. Exposed via the hook return
value, plumbed through ``App.tsx`` to the Studio tabs.
- Both Studio tabs render ``<InstallLogPanel job={gpuBundleJob} />``
inside the runtime-actions block whenever
``realGenerationAvailable === false``, so it appears the moment the
install starts and stays around until the user resolves it.
- Image Studio install button text changed from "Install image
runtime" to "Install GPU runtime" for consistency with Video Studio
and the new bundle naming. Description updated to mention the
~2.5 GB persistent download.
- Hook ``handleInstallImageRuntime`` / ``handleInstallVideoGpuRuntime``
now append " See the install log below for per-step pip output.
Target: <dir>" to the toast message on failure, so even users who
don't expand the log get pointed at where the bytes were going.
Image Studio intentionally stays at one install button (vs Video
Studio's three): mp4-encoder + tokenizer-deps buttons in Video Studio
exist because video pipelines lazy-load specific packages at preload
time. Image gen uses diffusers' bundled tokenizers and writes PNG
directly — its needs are covered entirely by the GPU bundle.
466 Python tests pass; 171 TypeScript tests pass; tsc clean.
Verbatim from ~/andrej-karpathy-skills/CLAUDE.md so the rules show up without needing the source file in scope. Lives at the top of the project guide because they govern how to work, not what's in the codebase.
Chat model downloads now appear in My Models as soon as they start, with a live progress badge, and the Discover card header surfaces the same state without requiring the user to expand the card. Mirrors the behaviour video models already had via ``activeVideoDownloads``.

- src/App.tsx: synthesise library rows from ``activeDownloads`` entries that haven't landed in ``workspace.library`` yet (matched by repo via inferred HF cache path / catalog variant), so MyModelsTab renders them alongside real rows.
- src/features/models/MyModelsTab.tsx: treat rows with a ``download://<repo>`` sentinel path as synthetic — hide the reveal button and the expanded library-path line.
- src/features/models/OnlineModelsTab.tsx: add Downloading/Paused/Failed badges to both the curated family and hub card headers so the state is visible on the collapsed row.
- src/features/models/*.test.tsx: cover the new synthetic-row rendering and the new collapsed header badges for curated + hub cards.

npm test: 174 passed. npx tsc --noEmit: clean. pytest tests/: all pass.
Two different meanings of 'backend' were colliding on screen: the footer's green 'BACKEND ONLINE' badge means the Python API sidecar, while the Dashboard's 'Runtime engine: No backend' meant no inference engine was active. Users reasonably read the collision as a bug report. 'Idle' matches RuntimeStatus.state already used elsewhere and doesn't claim the sidecar is down. No logic branches on the literal string.

- backend_service/inference.py: MockInferenceEngine.engine_label
- src/mockData.ts: matching test fixture
- tests/test_backend_service.py: matching FakeRuntime stub

466 Python / 174 TS tests pass, tsc clean.
Two related guardrails.

1. backend_service/state.py::load_model — raise a clear RuntimeError when the request's modelRef isn't in the library AND no path was provided. Previously the request fell through to llama-server's built-in HuggingFace auto-fetch, which on Windows blew up with 'HTTPLIB failed: SSL server verification failed' because the bundled llama-server.exe ships without a CA bundle. That was the root cause of the phantom Nemotron load the user hit. Whatever UI path triggered that load, users will now see 'Download it from the Discover tab first' instead of a confusing SSL error. (A minimal sketch of the guard follows below.)
2. src/features/chat/ChatTab.tsx — Send button is disabled and the textarea Enter-key handler is a no-op when loadedModelRef is null. Placeholder text changes to 'Load a model first...' so the disabled state is self-explanatory. Title attribute provides a hover hint.

Also:
- tests/test_backend_service.py — new test_model_load_rejects_models_not_on_disk locks in the guard behaviour (500 + actionable detail, runtime.load_model never invoked).
- test_compare_stream_uses_exclusive_loading_and_clears_warm_pool updated to pass an explicit path for modelB, since the new guard requires either a library entry or a path (the test isn't exercising the guard itself, just warm-pool eviction).

467 Python / 174 TS tests pass, tsc clean.
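The guard lives in ``backend_service/state.py::load_model``; the sketch below only illustrates the check, and the request/library shapes are assumptions:

```python
def resolve_model_path(model_ref: str, path: str | None, library: dict[str, str]) -> str:
    """Refuse to fall through to llama-server's HuggingFace auto-fetch.

    Without this, an unknown modelRef with no path reaches the bundled
    llama-server.exe, which ships without a CA bundle on Windows and fails
    with an opaque 'SSL server verification failed' error.
    """
    if path:
        return path
    if model_ref in library:
        return library[model_ref]
    raise RuntimeError(
        f"Model '{model_ref}' is not in the local library and no path was "
        "provided. Download it from the Discover tab first."
    )
```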
Three related fixes for the Image/Video Studio install UX.

1. Eager torch warmup was pinning DLLs in the backend process, making it impossible for pip to reinstall torch via the install-gpu-bundle flow. Removed start_torch_warmup() from create_app() and updated probe() in both image and video runtimes to use importlib.util.find_spec instead of an actual import torch. The torch import is now deferred to preload()/generate() where it's actually needed — first-generate pays the cold-import cost (~30s on Windows cold disk) but install-gpu-bundle is no longer blocked by locked torch/lib/*.dll.
2. Image Studio's install button was gated on activeEngine === "unavailable" but the backend never returns that literal — it returns "placeholder" both when packages are missing AND when they're installed-but-broken (the user's "cannot import name 'autocast' from 'torch.amp'" case). The button was effectively invisible in production. Gate is now just !realGenerationAvailable, matching the Video Studio pattern.
3. Added a DLL-lock detector in the install worker: if pip rmtree fails with WinError 5 / Access is denied on a torch DLL (which still happens in one edge case — user already triggered a torch import before clicking Install GPU runtime), we now raise a clear actionable error pointing at the Restart Backend button instead of burying the cause under pip's stack trace. (A minimal sketch of the detector follows below.)

Backend plumbing:
- backend_service/app.py: drop start_torch_warmup() from create_app(). Comment explains why — the warmup infrastructure still exists for future explicit callers.
- backend_service/image_runtime.py: probe() uses find_spec only. Device is reported from the currently-loaded pipeline (None when nothing is loaded) instead of speculatively calling detect_device.
- backend_service/video_runtime.py: same treatment. Message now lists missing_core packages explicitly so the preload() RuntimeError string tells callers what to install.
- backend_service/routes/setup.py: new _looks_like_dll_lock() + short-circuit in _install_torch_walking_indexes. Message points at Restart Backend.

Frontend:
- src/features/images/ImageStudioTab.tsx: install button always shows when runtime isn't available (removed dead activeEngine gate).

Tests:
- tests/test_video_runtime.py: renamed test_probe_reports_ready_when_all_deps_and_torch_import_cleanly -> test_probe_reports_ready_when_all_deps_findable, asserts device is None. Removed test_probe_returns_initializing_fast_path_when_warmup_in_progress (that code path no longer exists). Added test_probe_does_not_import_torch as a regression lock so nobody reintroduces the DLL-pin bug.
- tests/test_setup_routes.py: new DllLockDetectionTests covering the three cases that matter — WinError 5 on a torch DLL (match), unrelated pip failure (no match), WinError 5 on a non-torch file (no match).

470 Python / 174 TS tests pass, tsc clean.
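The real ``_looks_like_dll_lock()`` may match more patterns; this sketch only illustrates the intent described above, and its exact signature and the second helper are invented for the example:

```python
def looks_like_dll_lock(pip_output: str) -> bool:
    """Heuristic: pip failed because a loaded torch DLL could not be removed.

    Matches only when an access-denied error points at a torch DLL, so an
    unrelated pip failure (or a lock on a non-torch file) falls through to
    the normal error path.
    """
    text = pip_output.lower()
    access_denied = "winerror 5" in text or "access is denied" in text
    return access_denied and "torch" in text and ".dll" in text

def raise_if_dll_locked(pip_output: str) -> None:
    """How the install worker might use it: replace pip's stack trace with an actionable message."""
    if looks_like_dll_lock(pip_output):
        raise RuntimeError(
            "A torch DLL is locked by the running backend. "
            "Click Restart Backend, then retry the GPU runtime install."
        )
```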
Replaces the 'ask user to run PowerShell' troubleshooting loop with a one-click snapshot the user can copy as Markdown and paste into a support thread. Everything we'd otherwise manually query — OS + hardware + driver versions, runtime paths, env vars, GPU package presence, torch subprocess status, extras size + free disk, backend log tail — lives in one collapsible panel with a Copy diagnostics button.

Backend endpoints (backend_service/routes/diagnostics.py):
- GET /api/diagnostics/snapshot — structured JSON with app, os, hardware (CPU/memory/swap/disks/GPU from nvidia-smi/sysctl), python, runtime, gpu (find_spec + torch-in-subprocess probe so we never pin DLLs), extras, environment (pinned + redacted), logs tail.
- GET /api/diagnostics/log-tail?lines=N — clamped at 500 lines.
- POST /api/diagnostics/reextract-runtime — deletes the ephemeral %TEMP% embedded-runtime extraction so the next backend restart re-extracts from scratch. Useful when the runtime tree is corrupted from a crashed install.

Frontend (src/features/settings/DiagnosticsPanel.tsx):
- Renders the snapshot as collapsible sections with monospace key/value tables. Empty / missing values render as '—' so 'not applicable' is visually distinct from 'broken'.
- Copy button formats the entire snapshot as a Markdown document (tables for each section, fenced code block for the log tail) and writes it to the clipboard. Status flips to 'Copied!' for 2.5s.
- Re-extract runtime button (with confirm) + Restart Backend button grouped under a muted 'Advanced' note so users don't hit them by accident.

Redaction (sketched below):
- Env vars whose name contains token / secret / password / api_key come back as ***redacted***. Pinned CHAOSENGINE_* / PYTHON* / PATH / VIRTUAL_ENV vars appear even when unset so users can see what the backend inherited vs didn't.

Regression locks:
- test_snapshot_gpu_uses_find_spec_without_importing_torch asserts the snapshot endpoint never pulls torch into sys.modules. The whole install-flow fix this week hinges on keeping the backend torch-free; this test prevents a future diagnostics addition from regressing it.
- test_snapshot_redacts_secret_env_vars locks in the redaction rules so users can safely paste payloads into public chat.

Wiring:
- New Diagnostics subtab added to SettingsTab alongside General / Storage / Providers / Integrations.
- App.tsx threads backendOnline / onRestartServer / busyAction into SettingsTab so the Refresh + Restart buttons gate correctly.

478 Python / 174 TS tests pass, tsc clean.
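A minimal sketch of the redaction rule, assuming a hypothetical pinned-variable list (the real list lives in ``backend_service/routes/diagnostics.py``):

```python
import os

SECRET_MARKERS = ("token", "secret", "password", "api_key")
# Hypothetical pinned list for the sketch; the real module pins the
# CHAOSENGINE_* / PYTHON* / PATH / VIRTUAL_ENV names it cares about.
PINNED = ("CHAOSENGINE_EXTRAS_SITE_PACKAGES", "PYTHONPATH", "PATH", "VIRTUAL_ENV")

def environment_section() -> dict[str, str | None]:
    """Pinned vars always appear (None when unset); secret-looking values are masked."""
    section: dict[str, str | None] = {name: os.environ.get(name) for name in PINNED}
    for name in os.environ:
        if any(marker in name.lower() for marker in SECRET_MARKERS):
            section[name] = "***redacted***"
    return section
```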
The user reported two 28 GB llama-server.exe processes surviving ChaosEngineAI closing all windows. Root cause: cleanup_orphaned_backends only matched 'backend_service.app' commandlines, so when the Python sidecar crashed before taskkill /F /T could cascade to its children, the llama.cpp binary children were reparented and left running.

Now sweeps four process shapes on both platforms, parent-liveness-checked the same way as before:
- backend_service.app (Python sidecar, unchanged)
- llama-server (standard llama.cpp)
- llama-server-turbo (TurboQuant / RotorQuant fork)
- llama-cli (defensive — not currently spawned but may be)

Unix: extended the ps-based pattern match into a const marker array. Turbo listed before 'llama-server' so substring match doesn't swallow the more specific name. Windows: split the single WMIC query into two filters (commandline LIKE for Python, name= for the llama binaries) because those can't compose cleanly in one WHERE clause. Lifted the shared parse + tasklist + taskkill logic into sweep_orphans_by_wmic_filter. (A rough illustration of the parent-liveness idea follows below.)

478 Python tests still pass. cargo check --release clean.
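The sweep itself is implemented in Rust inside the Tauri shell; purely as an illustration of the parent-liveness idea (not the actual code), a rough Python equivalent using psutil, with the app/parent check simplified:

```python
import psutil

ORPHAN_MARKERS = ("backend_service.app", "llama-server-turbo", "llama-server", "llama-cli")

def _parent_is_our_app(ppid: int) -> bool:
    """Is the parent still a live ChaosEngineAI / sidecar process?"""
    try:
        parent = psutil.Process(ppid)
        blob = parent.name() + " " + " ".join(parent.cmdline())
        return "ChaosEngineAI" in blob or "backend_service.app" in blob
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        return False

def sweep_orphans() -> None:
    """Kill matching processes whose parent (the app or its sidecar) is gone."""
    for proc in psutil.process_iter(["pid", "name", "cmdline", "ppid"]):
        blob = " ".join(proc.info["cmdline"] or []) + " " + (proc.info["name"] or "")
        if any(marker in blob for marker in ORPHAN_MARKERS) and not _parent_is_our_app(proc.info["ppid"]):
            proc.kill()  # orphaned: parent died without cleaning up its children
```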
User hit 'Failed to fetch' in the Diagnostics panel — the very panel
meant to troubleshoot failures fetch-failed. Most likely one of the
helpers (psutil disk_usage on a funky Windows mountpoint, a hung
nvidia-smi, a serialization edge case) raised and uvicorn returned an
unparseable response before the 500 handler could serialize.
Every section builder now runs inside a _guarded() wrapper that
catches any exception and returns {'error': '<Type>: <msg>', 'section':
'<name>'} in place of the normal section dict. The frontend's
Record<string, unknown> typing accepts that shape unchanged, so empty
values just render as '—' and the user still sees the other sections.
The panel's whole point is to stay useful when things break; it can't
itself be fragile. 478 Python tests still pass.
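A minimal sketch of that wrapper, assuming the section builders take no arguments (the real signature may differ):

```python
from typing import Any, Callable

def guarded(name: str, builder: Callable[[], dict[str, Any]]) -> dict[str, Any]:
    """Run one snapshot-section builder; never let it take down the whole endpoint."""
    try:
        return builder()
    except Exception as exc:  # psutil quirks, hung nvidia-smi, serialization edge cases
        return {"error": f"{type(exc).__name__}: {exc}", "section": name}

# snapshot = {name: guarded(name, fn) for name, fn in SECTION_BUILDERS.items()}
```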
Two fixes surfaced by the user's first working diagnostics snapshot on
their Windows VM.
1. find_spec('google.protobuf') raises ModuleNotFoundError (not returns
None) when the 'google' namespace package isn't installed at all.
That crashed _gpu_info() whole — the section came back as
{error: 'ModuleNotFoundError: No module named google'} in the
snapshot. Same edge case we already fixed in video_runtime._find_missing;
the diagnostics helper didn't get the treatment. Now has a
_has_module() wrapper that catches ModuleNotFoundError / ValueError /
ImportError and returns False, matching the caller's 'is it
installed?' semantic.
2. The Diagnostics panel body wasn't scrollable. The snapshot has ~9
sections plus a 200-line log tail, so on a 1080p screen the last
few sections AND the Re-extract button at the bottom were below
the fold, unreachable. Capped the body at 75vh with overflow-y:
auto so users can scroll through the whole thing. The Copy button
already worked regardless so support flows were fine, just
frustrating to verify visually.
Both trivial fixes, shipped together.
Closes the main memory-leak failure mode the user reported: 28 GB
llama-server.exe processes surviving app close on Windows. The
existing reactive sweep (cleanup_orphaned_backends) catches these on
the next app launch, but only if the user actually relaunches. While
the app stays closed the orphans are pinning VRAM + RAM indefinitely.
Now three proactive mechanisms — each handling the case its platform
is best at — layered so no single crash path can leak children.
## Windows (new): Job Objects with KILL_ON_JOB_CLOSE
- Added windows-sys crate with Win32_Foundation + Win32_System_JobObjects
features.
- New src-tauri/src/lib.rs::windows_job module creates a singleton Job
Object at first spawn via CreateJobObjectW, flips the
JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE bit via SetInformationJobObject,
holds the handle in a process-lifetime OnceLock<usize>.
- After command.spawn() we call AssignProcessToJobObject on the
child. Grandchildren (llama-server spawned by Python) auto-inherit
job membership on Windows 8+.
- When Tauri exits FOR ANY REASON — graceful close, crash, Task
Manager's End Task, OOM killer, power loss handshake, panic —
Windows kernel closes the handle and the KILL_ON_JOB_CLOSE flag
makes it atomically kill every process in the job. No user-mode
code runs; nothing can leak.
- Best-effort: if Job creation fails the reactive sweep still catches
anything that leaks.
## Linux (new): PR_SET_PDEATHSIG in pre_exec
- Added libc::prctl(PR_SET_PDEATHSIG, SIGKILL) inside the existing
setsid pre_exec block. Kernel now delivers SIGKILL to the backend
Python when Tauri dies, even if the Python watchdog hasn't ticked
yet (closes the worst-case 500ms race in the getppid poll).
- Python then dies, which signals its session (llama-server +
anything else) via the watchdog's SIGTERM+SIGKILL cascade below.
## macOS / Linux (hardened): watchdog now SIGTERM+SIGKILL
- backend_service/app.py::_watch_parent_and_exit was already sending
SIGTERM to the process group on parent death, but graceful SIGTERM
can be ignored indefinitely by a buggy / stuck llama-server. Now
we SIGTERM first (gives llama-server a chance to flush caches +
release GPU handles), wait 300ms, then SIGKILL the whole group as
belt-and-braces. killpg(pgrp, SIGKILL) kills us too so we never
need to loop; if SIGKILL is somehow ignored we fall through to
os._exit(0). (A minimal sketch of this cascade follows below.)
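A minimal sketch of the cascade, with names assumed; the real watchdog in ``backend_service/app.py::_watch_parent_and_exit`` also handles parent-death detection and platform guards:

```python
import os
import signal
import time

def terminate_process_group() -> None:
    """SIGTERM first so llama-server can flush and release GPU handles, then SIGKILL."""
    # Assumption for this sketch: ignore SIGTERM in our own process so we
    # survive our own group-wide SIGTERM long enough to escalate.
    signal.signal(signal.SIGTERM, signal.SIG_IGN)
    pgrp = os.getpgrp()
    os.killpg(pgrp, signal.SIGTERM)   # polite shutdown for the whole session
    time.sleep(0.3)                   # 300 ms grace period
    os.killpg(pgrp, signal.SIGKILL)   # belt-and-braces: kills us too, so no loop needed
    os._exit(0)                       # reached only if SIGKILL was somehow a no-op
```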
## Coverage matrix
| | Graceful close | Tauri crash | Tauri SIGKILL |
| --- | --- | --- | --- |
| macOS | setsid+shutdown | watchdog | watchdog |
| Linux | setsid+shutdown | PDEATHSIG | PDEATHSIG |
| Windows | taskkill /T | Job Object | Job Object |

Fallback: cleanup_orphaned_backends on next launch (all OSes).
cargo check clean on macOS (Windows cfg-gated code not compiled
locally but mechanically audited against windows-sys 0.61 API).
478 Python + 174 TS tests still pass.
Rolls all four version-declaring files forward together so nothing goes out of sync:
- package.json (frontend + vite build)
- package-lock.json (npm lockfile mirror)
- pyproject.toml (backend Python package)
- src-tauri/Cargo.toml (Rust shell package)
- src-tauri/Cargo.lock (Cargo picked up the new windows-sys dep for the orphan-prevention Job Object work AND the chaosengineai package version change in one regen)
- src-tauri/tauri.conf.json (bundle + updater manifest version)

Cargo.lock was going to need committing regardless — the orphan-prevention commit added a top-level windows-sys dependency in Cargo.toml and the lockfile must mirror that for reproducible builds. Rolling it into the version bump keeps the diff single-commit and bisect-friendly.

478 Python / 174 TS tests pass, cargo check clean.
User reported the app hanging on 'Loading workspace state...' after
installing 0.6.2 over 0.6.0. Diagnostic showed the extraction tree in
a partial state:
backend 21/04 16:51 <-- stale, from 0.6.0
bin 21/04 17:53 <-- stale
python 21/04 18:13 <-- stale
manifest.json 21/04 19:57 <-- current, from 0.6.2
And backend/ was EMPTY inside — all the Python source that boots the
FastAPI sidecar had been rmtree'd but the unpack never re-populated
it. Root cause: ensure_embedded_runtime_extracted had this line:
if extraction_root.exists() {
let _ = fs::remove_dir_all(&extraction_root); // swallows err
}
On Windows, fs::remove_dir_all fails with 'Access denied' when a DLL
is held open — our own torch/lib/*.dll or llama-server.exe from a
previous session, or even a brief Defender scan lock. The let _
discarded that error; unpack then ran into a partially-emptied tree
and wrote only whatever entries came BEFORE the locked file. Result
shipped above: backend/ cleared, nothing re-extracted, manifest
written as if everything was fine.
## Fix: key extraction dir by manifest content hash
Each unique build manifest lands in its own directory:
chaosengine-embedded-runtime/
windows-x86_64-a1b2c3d4/ <-- 0.6.0 build
windows-x86_64-e5f6g7h8/ <-- 0.6.2 build
Fresh extraction is always to a fresh directory. No existing tree is
ever overwritten, so no rmtree race. If a specific-hash directory
exists in a partial state (e.g. we crashed mid-unpack), we DO rmtree
it — but only that one dir, and we propagate errors now instead of
swallowing them so users see a clear message.
Hash is a 32-bit DefaultHasher fingerprint (4B possible values,
collision chance negligible for the handful of manifest versions a
user ever sees). 8 hex chars keeps Windows MAX_PATH headroom for
deep torch/lib paths. (The keying idea is illustrated below.)
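The implementation is Rust in src-tauri; purely to illustrate the keying idea (not the actual code), here is a Python stand-in that uses ``zlib.crc32`` in place of Rust's ``DefaultHasher``:

```python
import zlib
from pathlib import Path

def extraction_dir_for(manifest_text: str, platform_tag: str, root: Path) -> Path:
    """Each unique build manifest gets its own extraction directory."""
    fingerprint = zlib.crc32(manifest_text.encode("utf-8"))  # 32-bit stand-in for DefaultHasher
    return root / f"{platform_tag}-{fingerprint:08x}"        # 8 hex chars keeps MAX_PATH headroom

# e.g. chaosengine-embedded-runtime/windows-x86_64-a1b2c3d4
print(extraction_dir_for("manifest for 0.6.2", "windows-x86_64",
                         Path("chaosengine-embedded-runtime")))
```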
## Cleanup for upgrading users
cleanup_legacy_extraction_root deletes the pre-0.6.2 unsuffixed
path (chaosengine-embedded-runtime/<platform>/) on bootstrap,
best-effort. Idempotent — once it's gone it's a no-op forever. This
means users upgrading from 0.6.0/0.6.1 don't have to manually nuke
their extraction dir before the first 0.6.2 launch.
## Diagnostics re-extract button
Updated Python's _runtime_extraction_root to return the PARENT
directory (chaosengine-embedded-runtime/) instead of computing a
specific platform-tag path. After the hash-suffix switch, Python
can't know the current hash without access to the manifest. Deleting
the whole parent matches the button's semantics anyway — 'clear
everything and re-extract from scratch on next launch'.
478 Python tests pass, cargo check clean.
cryptopoly added a commit that referenced this pull request on Apr 21, 2026:
User asked for: fixed-width terminal, single scroll region, step
counter showing progress. Previous per-step <details> cards were OK
on a 3-package install but stacked too tall on the 13-package GPU
bundle — output scrolled off-screen and users lost track of which
step was current.
New layout:
- Single monospace <pre> region, max-height 380px, auto-scrolls to
bottom on new attempts (tail -f behaviour). Doesn't steal scroll
on phase transitions — only on new output — so a user reading
earlier lines doesn't get yanked forward.
- Step line above the terminal shows 'Step 3/13: accelerate · 42%'
while running, 'Final: 12/13 packages · 100%' when done.
- Per-attempt markers ([ OK ], [FAIL], [....] for in-progress) line
up on the left edge so failures are scannable.
Also strips pip's dep-resolver noise from the displayed output. The
user hit this on their Windows box where their .venv's leftover
turboquant-mlx-full declared an mlx>= constraint that will never be
satisfied on Windows — pip prints a scary-looking ERROR block
('chaos-engine-compressor ... requires safetensors, which is not
installed'), cosmetic but alarming. Raw attempt.output still has
the noise; we just filter it from the rendered terminal. Users who
want the full pip trace still get it via the attempts array.
Per CLAUDE.md #2 (Simplicity First): single component file, no new
abstractions, no new deps. Per #3 (Surgical Changes): only touched
InstallLogPanel.tsx + its CSS block. Per #4 (Goal-Driven): visible
improvement (scroll works, step counter visible, noise suppressed)
verifiable at a glance on next build.
174 TS tests still pass, tsc clean.