feat: 0.5.0 Capture release — dictation, MCP, personalities#544
Ships the Capture release end to end. Global-hotkey dictation with synthetic paste into the focused app on macOS and Windows, an on-screen pill across recording / transcribing / refining, customizable push-to-talk and toggle chords, and an accessibility-permission prompt scoped to Settings → Captures with inline re-check feedback. Voice profiles gain optional personalities that power compose / rewrite / respond actions via a local Qwen3 LLM — shared with refinement, so there is one local LLM in the app, not two. Refinement is hardened with deterministic Whisper-loop collapse before the LLM sees the transcript, per-capture flag snapshots for re-runs, and a ten-transcript evaluation harness across every bundled refinement size. Version bump 0.4.5 → 0.5.0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mounts FastMCP at /mcp (Streamable HTTP) so Claude Code, Cursor, Windsurf, and the VS Code MCP extensions can call voicebox.speak, voicebox.transcribe, voicebox.list_captures, and voicebox.list_profiles against the running Voicebox server.
Backend
- new backend/mcp_server package (tools, middleware, profile resolve, pub/sub events); named mcp_server to avoid shadowing the installed mcp PyPI package FastMCP imports internally
- app.py migrated from @app.on_event to lifespan= so FastMCP's session manager cohabits with Voicebox's startup/shutdown
- new MCPClientBinding table + /mcp/bindings CRUD; ClientIdMiddleware reads X-Voicebox-Client-Id into a ContextVar and stamps last_seen_at
- profile resolution precedence: explicit -> per-client binding -> capture_settings.default_playback_voice_id
- POST /speak REST wrapper for non-MCP callers (shell, ACP, A2A)
- GET /events/speak SSE broadcasts speak-start / speak-end so the pill surfaces agent-initiated speech
- backend/mcp_shim proxy (plain httpx) for stdio-only MCP clients
- PyInstaller spec updates + new --shim build target (~18 MB)
Frontend
- Settings -> MCP page with HTTP / stdio / claude-mcp-add copy snippets, default voice picker, per-client bindings table, connection status
- useMCPBindings, useSpeakEvents hooks
- CapturePill gains 'speaking' state; DictateWindow subscribes to SSE and emits dictate:show so the Rust side surfaces the pill window
Native
- tauri.conf.json externalBin now includes voicebox-mcp
- show_dictate_window helper + dictate:show listener in main.rs
- (also in this commit: InputMonitoringGate UX, hotkey_monitor tweaks, landing footer/navbar updates, new overview docs for captures / dictation / mcp-server / voice-personalities)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pill window now surfaces for agent-initiated speech without main-window
involvement. Rust subscribes to /events/speak via a tokio task + reqwest
streaming body (speak_monitor.rs), shows the pill, and forwards events to
the dictate webview over Tauri's event bus. The pill plays audio via a
plain HTMLAudioElement and emits dictate:hide when playback ends. The
pill stays hidden through the ~1 s generation wait and only surfaces when
audio actually starts, with the counter armed at that moment.
Fixes a shared-dict mutation in mcp_server/events.publish() that caused
the second subscriber (Rust speak_monitor) to receive `event: message`
instead of named speak-start/speak-end frames. Also teaches the speak_monitor
parser to handle CRLF framing (sse-starlette default). Main-window
AudioPlayer now skips autoplay for source in {mcp, rest} to avoid
double-play when both windows are alive.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
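The shared-dict failure mode is easy to reproduce in miniature: if publish() hands every subscriber the same dict and the first consumer's serialisation path mutates it, the second consumer sees a bare `event: message` frame. A minimal sketch of the copy-per-subscriber fix, with illustrative names rather than the actual mcp_server/events.py code:

```python
import asyncio

subscribers: list[asyncio.Queue] = []

def publish(event: str, data: dict) -> None:
    """Hand each subscriber its own payload dict so one consumer's
    mutation can't strip the event name seen by the next."""
    for queue in subscribers:
        # shallow copy per subscriber instead of sharing one dict
        queue.put_nowait({"event": event, "data": dict(data)})
```

With a shared dict, popping a key in one consumer would corrupt the frame for the other; with per-subscriber copies each queue sees the full named event.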
Stops the "stuck pill" failure where pressing the chord with missing STT/LLM models triggers a recording that has nowhere to land. The hotkey now stays disarmed until every gate (models downloaded, Input Monitoring + Accessibility granted) is green; the empty-state checklist in CapturesTab surfaces each unmet gate with a one-click action and auto-arms the chord once everything turns green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Collapse intent tri-state (respond/rewrite/compose) to `personality: bool` on /generate, /speak, and voicebox.speak. Drop respond entirely; keep compose as a standalone button via /profiles/{id}/compose. Remove /rewrite, /respond, and /speak profile endpoints.
- FloatingGenerateBox: Wand2 persona toggle + Dices compose button appear when the selected profile has a personality. ProfileCard badges Wand2 alongside the effects Sparkles.
- MCP bindings: default_intent column → default_personality: bool. Migration drops the legacy column.
- i18n: en / ja / zh-CN / zh-TW translation files filled out and wired through the capture, server, and profile UI.
```ts
voicebox.speak({
text: "Deploy complete.",
profile: "Morgan",
personality: true, // rewrite through the profile's personality LLM
});
```
BOOL moved from windows::Win32::Foundation to windows::core in the windows crate 0.62. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
macOS apps match Cmd+V against the layout-translated character via NSMenu key equivalents, so posting kVK_ANSI_V (= 9, the QWERTY V position) on Dvorak produces Cmd+. and never triggers Paste. New keyboard_layout module resolves the active layout's V keycode via TISCopyCurrentKeyboardLayoutInputSource + UCKeyTranslate, caches it in an AtomicU16, and refreshes on kTISNotifySelectedKeyboardInputSourceChanged. All TIS calls run on the main thread (init from Tauri setup; observer callback delivered to the main runloop); synthetic_keys::send_paste reads the cached value once per paste. Falls back to kVK_ANSI_V when resolution fails or the active input source carries no Unicode key layout data. Windows is intentionally left on hardcoded VK_V — SendInput delivers WM_KEYDOWN with wParam = VK_V to the target regardless of the active layout, which is why `Send "^v"` works for AutoHotkey on Dvorak Windows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
macOS 14 deprecated NSRunningApplication.activateWithOptions: in favour of a cooperative-activation pattern: the caller first yields activation rights to the target, then the target activate()s against the tightened Sonoma foreground rules. Without the yield, activate() on 14+ sometimes silently fails or only bounces the dock icon — the exact "paste lands in the wrong app" symptom we were previously one API break away from. activate_pid now discovers the 14+ selector via respondsToSelector: and branches: on 14+ it calls yieldActivationToApplication: from NSRunningApplication.current and then calls -activate on the target; on 11–13 it stays on -activateWithOptions: (still the only option). Both branches propagate the BOOL return: if activation is refused we error out before clobbering the clipboard instead of silently proceeding. The respondsToSelector: result is cached in a OnceLock so the probe isn't repeated on every paste. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs in paste_final_text's clipboard handling: 1. Restore was unconditional. If the user ⌘C'd in the target app during the 400 ms paste-consume window, or a clipboard history tool (Paste, Pastebot, Maccy) or Universal Clipboard sync snapshotted our staged text, the blind restore overwrote their newer content with the pre-paste snapshot, silently losing user data. 2. send_paste's errors were propagated with ? before the restore, so a CGEventPost / SendInput failure left the user's clipboard stuck on the transcript. The fix folds both into one pattern: capture the post-write change count, re-read it after paste-consume, and restore only when they match (a change-count read failure is treated as "unknown, don't overwrite"). Isolate send_paste's error so the restore runs regardless of paste success, then propagate the paste error after. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
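The restore-only-when-unchanged pattern can be sketched against a hypothetical clipboard interface (change_count / write here are stand-ins for the NSPasteboard changeCount and Win32 sequence-number calls, not Voicebox's actual API):

```python
def restore_clipboard(cb, snapshot: str, post_write_count: int) -> bool:
    """Restore the pre-paste snapshot only if nothing else wrote to the
    clipboard since we staged our transcript. An unreadable change count
    means "unknown", so we refuse to overwrite in that case too."""
    current = cb.change_count()
    if current is None or current != post_write_count:
        return False  # someone else wrote (or state unknown): keep theirs
    cb.write(snapshot)
    return True
```

The key property: user data written mid-window always wins over our snapshot.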
Upstream Narsil/rdev has shipped no release since 2023-06 (crates.io still serves 0.5.3), so the Sonoma main-thread fix we depend on — PR #147, applied at hotkey_monitor.rs:184 — is only reachable via a git pin. A pin to a third-party repo breaks the build whenever the remote force-pushes, renames, or is taken down, and Cargo does not durably cache git-dep archives the way it does crates.io tarballs. Forking to jamiepine/rdev at the same SHA removes that failure mode without changing crate behavior and gives us a place to cherry-pick future OS-compatibility fixes on our own timeline. The SHA was verified to exist on the fork before re-pinning. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two reliability gaps in the /events/speak subscriber: 1. resp.chunk().await had no idle timeout. A backend that accepts the TCP connection but stops producing frames (deadlocked SSE endpoint, zombie process) would block the task forever without reconnecting. The pill window would never surface for agent-initiated speech and there would be nothing to log. Backend emits a `:ping` heartbeat every 15 s, so 45 s without any data is now treated as a dead stream — the task errors out and the reconnect loop takes over. 2. Flat 2 s backoff escalates nowhere. Logs fill with reconnect lines when the backend is down for minutes, and a backend that accepts + immediately closes connections (no data) spins the loop tightly. Backoff now escalates 500 ms → 30 s on unproductive rounds and resets only when at least one frame arrives (the connection was genuinely productive, not just accepted). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
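The escalation policy reduces to a small pure function. A Python sketch of the Rust reconnect loop's backoff logic (names illustrative):

```python
def next_backoff_ms(current_ms: int, frames_this_round: int) -> int:
    """500 ms -> 30 s doubling on unproductive rounds; reset only when
    at least one frame actually arrived (connection was productive,
    not just accepted)."""
    if frames_this_round > 0:
        return 500
    return min(max(current_ms, 500) * 2, 30_000)
```

An accept-then-close backend never delivers a frame, so its rounds keep doubling toward the 30 s ceiling instead of spinning the loop tightly.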
The word-level pass catches single-word Whisper loops ("URL URL URL…")
but misses two common hallucination patterns the PR had to claim as
"edge cases":
1. Multi-word English loops — "thanks for watching thanks for watching…"
× 6 sails through because no two consecutive tokens are identical
after text.split().
2. CJK loops — "謝謝觀看" × 7 sails through because text.split() returns
a single unsplit token for the whole loop (no whitespace between
characters).
Add a character-level second pass: a non-greedy regex finds any 2–60
char substring that repeats min_run+ times immediately after itself and
strips the run. The 2-char floor keeps emphasised single-letter runs
("wooooooow") intact. The 60-char ceiling covers every observed
Whisper tail hallucination ("Please like and subscribe to my
channel.", "Subtitles by the Amara.org community") while staying short
enough that coincidental long-phrase repetition in legitimate speech
doesn't hit the threshold. Whitespace normalisation only runs when the
pass actually stripped something, so untouched transcripts keep their
original spacing.
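A minimal sketch of such a character-level pass (the shipped pre-processor's exact pattern and threshold may differ; the single-repeated-character guard stands in for however the real code spares emphasised runs):

```python
import re

def collapse_char_loops(text: str, min_run: int = 3) -> str:
    """Collapse any 2-60 char unit repeated min_run+ times immediately
    after itself down to one occurrence. Units made of one repeated
    character ("ooooo") are left intact so emphasis survives."""
    pattern = re.compile(rf"(.{{2,60}}?)\1{{{min_run - 1},}}", re.DOTALL)
    changed = False

    def repl(match: re.Match) -> str:
        nonlocal changed
        unit = match.group(1)
        if len(set(unit)) == 1:  # emphasised single-letter run: keep
            return match.group(0)
        changed = True
        return unit

    out = pattern.sub(repl, text)
    # normalise whitespace only when the pass actually stripped something
    return " ".join(out.split()) if changed else out
```

Because the pass works on characters, not tokens, it catches both the multi-word English loop and the whitespace-free CJK loop that slip past text.split().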
New test_refinement_collapse.py gives the pre-processor its first
deterministic unit-test coverage: 17 tests pinning the word-level
legacy behaviour plus the new multi-word English / CJK / Japanese /
emphasis-preservation cases.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SQLite gained ALTER TABLE … DROP COLUMN in 3.35 (Mar 2021). Production PyInstaller builds bundle Python 3.12 which links to SQLite 3.40+ so that path is always safe, but a dev running the backend directly on Ubuntu 20.04 (3.31) or Debian 11 (3.34) would crash on first startup trying to drop the legacy default_intent column. Add _supports_drop_column(engine) — returns True on non-SQLite dialects (Postgres / MySQL have supported DROP COLUMN for decades) and gates on the runtime sqlite_version for SQLite. When unsupported, log a warning and leave the unused column in place: SQLAlchemy only maps declared columns, so a stray default_intent column does no reads or writes and can't interfere with runtime behaviour. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The inline lifespan ran _run_shutdown inside the MCP context, so the TTS / Whisper / LLM models were unloaded *before* FastMCP's __aexit__ got a chance to cancel its in-flight session tasks. Any MCP request mid-generate at shutdown time would crash on "model unloaded" instead of receiving a clean session-cancelled error.
Rewire via compose_lifespan (which was already defined in mcp_server.server for exactly this purpose but never used): AsyncExitStack enters factories in order and exits in LIFO, so MCP teardown fires first, cancelling sessions, and _run_shutdown runs after nothing is holding the models.
Smoke test shows the log order flipped as expected:
  Ready
  StreamableHTTP session manager started
  ... running ...
  StreamableHTTP session manager shutting down   ← was last, now first
  Voicebox server shutting down...               ← was first, now last
As a side benefit, _run_shutdown is now paired with _run_startup via try/finally inside voicebox_lifespan, so a partial startup (models half-loaded, MCP __aenter__ fails) still unloads whatever was loaded instead of leaking it to process exit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
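The LIFO ordering is exactly what AsyncExitStack provides. A sketch of a compose_lifespan-style helper (illustrative; not the exact mcp_server.server code):

```python
from contextlib import AsyncExitStack, asynccontextmanager

def compose_lifespan(*factories):
    """Build one ASGI lifespan from several: factories enter in order
    and exit in reverse, so the last-entered (MCP) tears down first
    while earlier contexts (models) are still alive."""
    @asynccontextmanager
    async def lifespan(app):
        async with AsyncExitStack() as stack:
            for factory in factories:
                await stack.enter_async_context(factory(app))
            yield
    return lifespan
```

Passing the model-owning lifespan first and the MCP session manager second yields the shutdown order the smoke test above confirms.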
POST /speak is a REST wrapper around voicebox.speak for agents that
don't talk MCP (shell scripts, ACP, A2A). It reads X-Voicebox-Client-Id
and uses it for the same per-client profile resolution + default
personality lookup the MCP tool does (speak.py:39-64), so its callers
are first-class clients — but the ClientIdMiddleware only stamped
last_seen_at on /mcp* paths. REST speak callers showed up as "never
seen" in Settings → MCP despite actively acting on their bindings.
Widen the stamp predicate to an explicit ("/mcp", "/speak") prefix
list, and require a path boundary on match so future routes named
/mcpfoo or /speakers don't silently inherit the stamp via the prefix.
New test_client_id_middleware.py pins the scope with 17 parametrised
cases (both the allowed set and the overlap cases that must not match).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the placeholder fake-waveform + play button in CapturesTab's audio card with a real CaptureInlinePlayer (wavesurfer.js). The player renders the actual waveform, lets users scrub through the clip, and shows a proper current/total timestamp pair in place of the duration-only label. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wrap useUIStore in zustand/middleware's persist under the key voicebox-ui. partialize only selectedProfileId so volatile UI state (dialog open flags, form drafts, engine/voice pickers, sidebar) stays in-memory as before — but reopening the app no longer loses whichever profile the user was last working with. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough
Adds Captures/dictation and personality-driven generation, an MCP server (HTTP + stdio shim) with per-client bindings, local Qwen3 LLM backends, capture CRUD/refinement pipelines, extensive Tauri native integrations for hotkeys/permissions/clipboard/focus, many frontend UI pages/components, API surface and DB migrations, and bumps the version to 0.5.0.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant Hotkey as Hotkey Monitor<br/>(Tauri/Rust)
    participant Dictate as DictateWindow<br/>(React)
    participant Backend as Backend<br/>(Python)
    participant STT as Whisper<br/>(STT)
    participant LLM as Qwen3<br/>(LLM)
    participant Clipboard as Clipboard<br/>(Tauri)
    User->>Hotkey: Hold/Toggle chord
    Hotkey->>Dictate: dictate:start (with FocusSnapshot)
    Dictate->>Dictate: Show CapturePill (recording)
    Dictate->>Dictate: Record audio
    User->>Hotkey: Release chord
    Hotkey->>Dictate: dictate:stop
    Dictate->>Backend: POST /captures (upload)
    Backend->>STT: Transcribe audio
    STT-->>Backend: Raw transcript
    alt auto_refine true
        Backend->>LLM: refine/rewrite transcript
        LLM-->>Backend: refined transcript
    end
    Backend-->>Dictate: Capture response (id, text, flags)
    alt allow_auto_paste
        Dictate->>Clipboard: write/paste final text
    end
    Dictate->>Dictate: CapturePill -> completed -> rest
```
```mermaid
sequenceDiagram
    actor Agent
    participant MCP as MCP Server (FastMCP)
    participant Backend as Backend (Python)
    participant LLM as Qwen3 (LLM)
    participant TTS as TTS Engine
    participant Frontend as DictateWindow (React)
    participant SSE as SSE (/events/speak)
    Agent->>MCP: voicebox.speak(profile?, personality?)
    MCP->>MCP: resolve_profile (explicit -> binding -> default)
    MCP->>Backend: POST /generate (personality=true)
    Backend->>LLM: rewrite/compose if personality
    LLM-->>Backend: in-character text
    Backend->>TTS: generate audio
    TTS-->>Backend: WAV
    Backend->>Backend: persist generation, publish speak-start
    MCP-->>Agent: generation_id + status URL
    Frontend->>SSE: subscribe
    SSE-->>Frontend: speak-start
    Frontend->>Backend: GET /audio/{id}
    Backend-->>Frontend: stream audio
    Frontend->>Frontend: play audio, show pill
    Backend->>SSE: publish speak-end
    SSE-->>Frontend: speak-end -> cleanup
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 687ab2a.
The six-gate checklist only rendered in the CapturesTab empty state, so a user already on the settings page had no single surface showing which gate was red — the inline InputMonitoringNotice covered one, the model pickers covered another, and Accessibility was only hinted at by the auto-paste toggle. Mirror the same component into the right sidebar of the settings page so every gate (STT model, LLM model, Input Monitoring, Accessibility, plus the hotkey toggle in the main column) is always visible while the user configures dictation. New compact prop on DictationReadinessChecklist drops the centered header and empty-state max-width so it fits the 280 px sidebar next to the existing About / Differences blocks. Callers in compact mode own the heading — CapturesPage reuses the existing captures.readiness.title key (present in en / ja / zh-CN / zh-TW already) as an h3 matching the sibling sidebar sections. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Actionable comments posted: 10
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
app/src/components/ServerTab/GenerationPage.tsx (1)

64-95: ⚠️ Potential issue | 🔴 Critical: Slider writes are unthrottled and will spam the settings endpoint on every pointer move during drag.

Dragging `maxChunkChars` or `crossfadeMs` triggers `onValueChange` dozens of times per second, each calling `update()` → `mutation.mutate()` → `apiClient.updateGenerationSettings(patch)` with no debounce, throttle, or request coalescing. This produces a stream of unthrottled HTTP PATCH requests. The optimistic `onMutate` update masks latency but does not prevent the request storm.

Additionally, if any mid-drag request fails and `onError` reverts the cache to a previous state, a subsequent successful request could overwrite that reversion, leaving the persisted value out of sync with the UI. Add debounce (e.g., 500–1000 ms) to the update call or use `useDebouncedCallback` to coalesce slider changes during drag.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `app/src/components/ServerTab/GenerationPage.tsx` around lines 64-95: the Slider onValueChange handlers call update(...) which immediately invokes mutation.mutate (via apiClient.updateGenerationSettings) on every pointer move; debounce or coalesce those calls so the server is only patched after pauses (e.g., 500–1000 ms) and only send the latest value. Modify the update function used by the Slider components (referenced as update and the Slider id props "maxChunkChars" and "crossfadeMs") to buffer rapid changes and call mutation.mutate once after a debounce delay (or use useDebouncedCallback), cancel/replace previous pending debounced calls, and ensure the mutation's onMutate/onError handlers correctly reconcile optimistic state with the final settled value to avoid reverting to stale values.

app/src/components/VoiceProfiles/ProfileForm.tsx (1)
825-835: ⚠️ Potential issue | 🟡 Minor: Discard reset is missing `personality` (and `avatarFile`).

The other three `form.reset(...)` calls for this form (lines 333, 347, 372) all include `personality: ''`. The discard-draft reset below omits it, so after clicking Discard a draft's `personality` value stays in the form state (and the rendered textarea) rather than being cleared. Same for `avatarFile`. Align with the other resets for consistency.

🔧 Proposed fix

```diff
 form.reset({
   name: '',
   description: '',
   language: 'en',
+  personality: '',
   sampleFile: undefined,
   referenceText: '',
+  avatarFile: undefined,
 });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `app/src/components/VoiceProfiles/ProfileForm.tsx` around lines 825-835: the discard handler currently resets the draft via setProfileFormDraft(null) and calls form.reset(...) but omits the personality and avatarFile fields, leaving stale values; update the form.reset call inside the onClick to include personality: '' and avatarFile: undefined (matching the other resets) so the textarea and avatar state are cleared, keeping setProfileFormDraft and setSampleMode('record') as they are.
🧹 Nitpick comments (41)
landing/src/components/Personalities.tsx (1)

175-235: Optional: respect `prefers-reduced-motion` for the 4.5 s auto-cycle.

The section auto-cycles continuously via `setInterval`, which can be uncomfortable for motion-sensitive visitors. Consider pausing the rotation (and/or skipping the framer-motion entrance) when `window.matchMedia('(prefers-reduced-motion: reduce)').matches`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `landing/src/components/Personalities.tsx` around lines 175-235: the auto-rotation in Personalities uses setInterval inside the useEffect which ignores users' prefers-reduced-motion; update the effect to detect window.matchMedia('(prefers-reduced-motion: reduce)') and if it matches, do not start the interval (or clear it immediately), and instead keep idx fixed; also pass a prop like reducedMotion or prefersReducedMotion into ModeDemo (and any child components that use framer-motion) so they can skip entrance/animation when true. Ensure you reference Personalities, its useEffect that creates iv, and ModeDemo (add a boolean prop) when making the changes.

app/src/components/ServerSettings/ModelManagement.tsx (2)
86-91: Model descriptions remain hardcoded English while surrounding UI is i18n'd.

These strings (like the pre-existing TTS/Whisper entries) bypass `t()` so they won't be translated in ja / zh-CN / zh-TW. Consistent with the existing pattern here, so not a regression for this PR; flagging for a future refactor to move `MODEL_DESCRIPTIONS` behind an i18n key namespace (e.g. `models.descriptions.qwen3-0.6b`).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `app/src/components/ServerSettings/ModelManagement.tsx` around lines 86-91: MODEL_DESCRIPTIONS contains hardcoded English strings (e.g., keys 'qwen3-0.6b', 'qwen3-1.7b', 'qwen3-4b') that bypass i18n; update the code so descriptions are retrieved via the translation function t() from an i18n namespace (suggested keys like models.descriptions.qwen3-0.6b etc.) instead of inline text, and refactor any use-sites in ModelManagement.tsx to call t(`models.descriptions.${modelKey}`) (or fallback to the English string) so descriptions are translatable.
420-420: Prefix `qwen3-` is fine today but narrow; consider an explicit allowlist.

If a future model is named e.g. `qwen3-tts-*` or `qwen3-embed-*`, it will be silently categorized as a language model. Since the three qwen3 LLM variants are enumerated in `MODEL_DESCRIPTIONS` just above, you could tighten this by matching against that known set, or by introducing a `model.kind === 'llm'` field returned by the backend.

♻️ Optional: filter against known LLM ids

```diff
-  const llmModels = modelStatus?.models.filter((m) => m.model_name.startsWith('qwen3-')) ?? [];
+  const LLM_MODEL_NAMES = new Set(['qwen3-0.6b', 'qwen3-1.7b', 'qwen3-4b']);
+  const llmModels = modelStatus?.models.filter((m) => LLM_MODEL_NAMES.has(m.model_name)) ?? [];
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `app/src/components/ServerSettings/ModelManagement.tsx` at line 420: the current llmModels filter uses a broad prefix check (llmModels = modelStatus?.models.filter((m) => m.model_name.startsWith('qwen3-'))), which can misclassify future qwen3 variants; change the filter to match only the explicit, known LLM ids from MODEL_DESCRIPTIONS (or another local allowlist) by comparing model.model_name against that set (e.g. build a Set from Object.keys(MODEL_DESCRIPTIONS) or the enumerated ids) or, if available, use a backend-provided discriminator like model.kind === 'llm' instead; update the llmModels assignment to use that allowlist-based check against modelStatus.models so only the intended qwen3 LLMs are included.

backend/mcp_shim/__init__.py (1)
6-6: Docstring hardcodes `127.0.0.1:17493`.

If the backend port is configurable elsewhere (env var / CLI flag in `__main__.py`), consider softening the docstring to "the Voicebox server's `/mcp/` endpoint" to avoid it drifting from reality later. Minor doc nit.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `backend/mcp_shim/__init__.py` at line 6: the module-level docstring currently hardcodes "http://127.0.0.1:17493/mcp/"; update that docstring to a generic description such as "the Voicebox server's /mcp/ endpoint" (or otherwise reference the configurable backend host/port) so it doesn't drift if the address is configurable elsewhere (e.g., __main__.py or env vars); edit the module docstring text in __init__.py to remove the literal IP:port and replace it with the softer phrasing.

scripts/setup-dev-sidecar.js (1)
369-379: Optional: derive sidecar list from `tauri.conf.json` instead of duplicating.

The in-code comment already acknowledges this list must be kept in sync with `externalBin` in `tauri.conf.json`. Reading and parsing that file would eliminate the drift risk the comment warns about. Not a blocker; fine to keep as-is for now.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `scripts/setup-dev-sidecar.js` around lines 369-379: replace the hard-coded SIDECAR_BASE_NAMES with a dynamically derived list by reading and parsing tauri.conf.json: in main() (which currently calls getTargetTriple() and loops over SIDECAR_BASE_NAMES to call createPlaceholderBinary), load and parse tauri.conf.json, extract the externalBin keys/names from the configuration (or from build.externalBin depending on schema), map/normalize them to the same baseName format used today, and iterate that array instead of SIDECAR_BASE_NAMES; keep getTargetTriple() and createPlaceholderBinary usages unchanged and add minimal error handling if tauri.conf.json is missing or malformed.

app/src/router.tsx (1)
116-116: Stale comment: captures is no longer a prototype.

AudioTab has already been removed from the router and sidebar in this PR, so the "prototype — will replace AudioTab once the new flow is ready" wording is misleading. Consider updating or removing the comment.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `app/src/router.tsx` at line 116: the inline comment above the captures route is stale and misleading because AudioTab has already been removed; update or delete the comment near the captures route declaration in router.tsx so it no longer says "prototype — will replace AudioTab once the new flow is ready" (either remove the prototype note or replace it with a brief current description of the captures route/component and its purpose, referencing the captures route/component name and AudioTab for clarity).

backend/routes/events.py (1)
35-40: Optional: use the builtin `TimeoutError` alias (Ruff UP041).

Since Python 3.11, `asyncio.TimeoutError` is an alias for the builtin `TimeoutError`. Small cleanup:

```diff
-try:
-    event = await asyncio.wait_for(queue.get(), timeout=15.0)
-except asyncio.TimeoutError:
+try:
+    event = await asyncio.wait_for(queue.get(), timeout=15.0)
+except TimeoutError:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `backend/routes/events.py` around lines 35-40: the except block catches asyncio.TimeoutError but Python 3.11 provides a builtin alias, so replace the exception reference with the builtin TimeoutError in the async generator that awaits asyncio.wait_for(queue.get(), timeout=15.0); update the except clause from "except asyncio.TimeoutError:" to "except TimeoutError:" (no other behavior changes) to satisfy the Ruff UP041 suggestion while keeping the heartbeat yield logic intact.

landing/src/app/page.tsx (1)
84-88: Nit: hardcoded `text-white` inside a themed paragraph.

The surrounding subtitle uses `text-muted-foreground`; `text-white` on the bolded span will look slightly off if the landing site ever gets a light-mode variant. Consider `text-foreground` (or a dedicated token) instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `landing/src/app/page.tsx` around lines 84-88: the bold span uses a hardcoded class "text-white" inside a paragraph that uses "text-muted-foreground", which will break theming; replace the "text-white" class on the <b> element with a theme-aware token like "text-foreground" (or a dedicated token) so the emphasized text inherits the correct foreground color in both light and dark modes and keep the surrounding "text-muted-foreground" intact.

backend/routes/generations.py (1)
11-14: Move `logger` below the imports to satisfy E402.

Ruff flags `from ..services import …` at line 14 as a module-level import after non-import code (the `logger = …` at line 11). Swap the ordering so all imports sit at the top of the module.

🔧 Proposed fix

```diff
-logger = logging.getLogger(__name__)
-
 from .. import models
 from ..services import history, personality, profiles, tts
 from ..database import Generation as DBGeneration, VoiceProfile as DBVoiceProfile, get_db
 from ..services.generation import run_generation
 from ..services.task_queue import cancel_generation as cancel_generation_job, enqueue_generation
 from ..utils.tasks import get_task_manager
+
+logger = logging.getLogger(__name__)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `backend/routes/generations.py` around lines 11-14: the logger assignment is placed before module imports causing an E402 import-order error; move the line "logger = logging.getLogger(__name__)" so it appears after the import block (after the "from .. import models" and "from ..services import history, personality, profiles, tts" lines) so all imports are at the top of the module and then initialize logger.

app/src/lib/hooks/useSettings.ts (1)
18-99: Clean optimistic-update pattern; consider extracting the shared shape.Both
useCaptureSettingsanduseGenerationSettingsfollow identical optimistic-patch + rollback + server-reconcile logic — the only variations are the query key, fetcher, and mutator. If a third settings surface appears (and the MCP/captures trajectory suggests one will), a tinycreateSettingsHook({ key, fetch, patch })factory would eliminate the duplicated branches. Not blocking — fine to defer.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/lib/hooks/useSettings.ts` around lines 18 - 99, Both hooks useCaptureSettings and useGenerationSettings duplicate the same optimistic-update/rollback/reconcile logic; extract that shared pattern into a factory like createSettingsHook({ key, queryFn, mutationFn }) which encapsulates the useQuery + useMutation setup (including cancelQueries, getQueryData, setQueryData, onMutate/onError/onSettled behavior) and returns { settings, isLoading, update }; then replace useCaptureSettings to call createSettingsHook with CAPTURE_SETTINGS_KEY, apiClient.getCaptureSettings, apiClient.updateCaptureSettings and replace useGenerationSettings to call it with GENERATION_SETTINGS_KEY, apiClient.getGenerationSettings, apiClient.updateGenerationSettings so future setting hooks reuse the same implementation.

backend/mcp_server/server.py (1)
13-13: Prefer `collections.abc.Callable` (Ruff UP035).

Per static analysis and PEP 585 guidance, import `Callable` from `collections.abc` rather than `typing`.

```diff
-from typing import Callable
+from collections.abc import Callable
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/mcp_server/server.py` at line 13, The import currently uses Callable from typing; update the import to use collections.abc.Callable to satisfy Ruff UP035/PEP585 guidance—replace the line "from typing import Callable" with an import of Callable from collections.abc so any references to Callable in this module continue to work while avoiding the deprecated typing import.

backend/mcp_server/resolve.py (1)
55-57: `with_db()` name implies a context manager but it isn't.

Returning `next(get_db())` produces a bare `Session` that the caller must remember to `.close()` — the `with_` prefix suggests `with with_db() as db:` usage which will fail. Consider renaming to `new_db_session()` (or similar) or converting to an actual `@contextmanager`.

♻️ Option: context manager form

```diff
+from contextlib import contextmanager
+
-def with_db() -> Session:
-    """Utility for tool handlers that aren't managed by FastAPI's Depends."""
-    return next(get_db())
+@contextmanager
+def with_db():
+    """Utility for tool handlers that aren't managed by FastAPI's Depends."""
+    db = next(get_db())
+    try:
+        yield db
+    finally:
+        db.close()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/mcp_server/resolve.py` around lines 55 - 57, The function with_db() is misleading because it returns a raw Session from next(get_db()) but is named like a context manager; change it so callers don't leak sessions by either (a) convert with_db into a real context manager (e.g., decorate with contextlib.contextmanager or return a contextmanager object that yields the Session and ensures session.close() after use) using get_db() to obtain and finally close the session, or (b) rename the function to new_db_session() (or acquire_db_session()) and document that it returns a plain Session that the caller must close; update all call sites accordingly (references: with_db, get_db).

backend/services/settings.py (2)
21-38: `_get_or_create_*` has a narrow race on first creation.

Two concurrent requests hitting the endpoint before the singleton row exists can both miss the `first()` check and both `db.add(row)` → commit; the second loses to a `PRIMARY KEY` conflict. Under the single-user desktop assumption this is essentially impossible, but a single-row `INSERT OR IGNORE` (or a try/except IntegrityError that re-queries) would make it bulletproof and cheap.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/services/settings.py` around lines 21 - 38, The _get_or_create_capture_row and _get_or_create_generation_row functions have a race when two requests concurrently hit the first() check; fix by performing an atomic insert-or-ignore or by catching IntegrityError on commit and re-querying: attempt to create the singleton row (DBCaptureSettings/DBGenerationSettings with id=SINGLETON_ID), commit in a try block, on IntegrityError rollback and run the original query again to return the existing row; ensure you import IntegrityError from sqlalchemy.exc and apply the same pattern to both functions so the code is race-proof.
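The `INSERT OR IGNORE` variant suggested above can be sketched with the stdlib `sqlite3` module — table and column names here are illustrative, not Voicebox's real schema:

```python
import sqlite3

def get_or_create_settings(conn: sqlite3.Connection):
    # INSERT OR IGNORE is atomic in SQLite: if another caller already
    # created the singleton row, this statement is a no-op instead of
    # raising a PRIMARY KEY conflict.
    conn.execute(
        "INSERT OR IGNORE INTO capture_settings (id, hotkey) VALUES (1, 'default')"
    )
    conn.commit()
    return conn.execute(
        "SELECT id, hotkey FROM capture_settings WHERE id = 1"
    ).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE capture_settings (id INTEGER PRIMARY KEY, hotkey TEXT)")
first = get_or_create_settings(conn)
second = get_or_create_settings(conn)  # simulates the "losing" concurrent caller
assert first == second == (1, "default")
```

In SQLAlchemy terms the equivalent is catching `IntegrityError` on commit, rolling back, and re-querying — either shape closes the window.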
46-53: `value is not None` filter prevents clearing nullable columns.

Callers cannot set `default_playback_voice_id` (or any other nullable field) back to `NULL` through this API since `None` values are unconditionally skipped. If clearing isn't a desired operation, document the contract; otherwise use a sentinel (`MISSING`) so explicit `None` flows through:

♻️ Option — sentinel-based patch filtering

```diff
-def update_capture_settings(db: Session, patch: dict[str, Any]) -> DBCaptureSettings:
-    row = _get_or_create_capture_row(db)
-    for key, value in patch.items():
-        if value is not None and hasattr(row, key):
-            setattr(row, key, value)
+_MISSING = object()
+
+def update_capture_settings(db: Session, patch: dict[str, Any]) -> DBCaptureSettings:
+    row = _get_or_create_capture_row(db)
+    for key, value in patch.items():
+        if value is _MISSING or not hasattr(row, key):
+            continue
+        setattr(row, key, value)
     db.commit()
     db.refresh(row)
     return row
```

…with callers (Pydantic models) producing `_MISSING` for unset fields and `None` for explicit clears.

Also applies to: 61-68
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/services/settings.py` around lines 46 - 53, The current update_capture_settings function skips any keys whose value is None, which prevents callers from clearing nullable columns (e.g., default_playback_voice_id); change the logic to treat None as a valid explicit value by either (a) using a sentinel (e.g., _MISSING) such that callers send _MISSING for absent fields and None for explicit clears and then only skip when value is _MISSING, or (b) stop filtering on value and instead check key presence (if key in patch) and hasattr(row, key) then setattr(row, key, value). Apply the same sentinel/presence-based fix to the analogous code block referenced at lines 61-68 so explicit None values are persisted rather than ignored.

backend/database/migrations.py (1)
269-274: Redundant local `import sqlite3`.

`_supports_drop_column` already imports `sqlite3`; a top-of-file import would remove the duplicate local imports here and at line 283. Not a bug — purely tidiness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/database/migrations.py` around lines 269 - 274, The local import sqlite3 inside the _supports_drop_column context is redundant; remove the inline import statements (both occurrences in this function) and instead add a single top-level import sqlite3 at the module scope, so functions like _supports_drop_column and the logger.warning call can use the module-level sqlite3.sqlite_version without duplicate local imports.

app/src/components/CapturesTab/CaptureInlinePlayer.tsx (2)
90-109: Minor: in-flight `load` on a previous `audioUrl` can land after a new load starts.

When `audioUrl` changes rapidly, the prior `ws.load(url).catch(...)` may reject with an abort error after the new load has already reset state, briefly flashing the old URL's error. Low severity since WaveSurfer aborts pending loads on `load()`/`destroy()`, but worth guarding with an `isCurrent` flag if it proves visible.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/components/CapturesTab/CaptureInlinePlayer.tsx` around lines 90 - 109, When audioUrl changes fast a previous ws.load(audioUrl) can reject after a newer load has started causing stale error/state flashes; in the useEffect that references wavesurferRef and ws.load, introduce a local boolean flag (e.g. isCurrent) set true at start and set false in the effect cleanup, and only call setError, setIsLoading, setCurrentTime, setDuration, setIsPlaying (and any other state updates) inside ws.load.then/catch if isCurrent is still true; ensure you also guard the try/catch reset block (ws.pause(), ws.seekTo(0)) if needed so they only affect the current load lifecycle for the current audioUrl.
33-57: Waveform colors are captured once and won't follow theme changes.
`cssHsla` reads `--foreground`/`--accent` at instantiation; switching theme (light ↔ dark) at runtime leaves the existing WaveSurfer instance with stale colors until the component remounts. If runtime theme switching is supported, subscribe to the theme change and call `ws.setOptions({ waveColor, progressColor })`, or key the component on the theme.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/components/CapturesTab/CaptureInlinePlayer.tsx` around lines 33 - 57, The WaveSurfer instance in CaptureInlinePlayer captures CSS variables once inside the useEffect (cssHsla) so it won't update when the app theme changes; modify the hook to react to theme changes by either adding the theme variable as a dependency (or subscribing to your theme provider) and when theme changes call ws.setOptions({ waveColor: cssHsla('--foreground', 0.25), progressColor: cssHsla('--accent', 1) }) or recreate the WaveSurfer instance, and ensure you reference the existing ws instance created in the useEffect so colors are updated at runtime.

backend/routes/llm.py (1)
27-40: Private-method access (`backend._is_model_cached`) leaks implementation detail.

Reaching into `backend._is_model_cached(...)` from a route handler couples the HTTP layer to a protected member of the LLM backend. Either promote it to a public `is_model_cached(size)` on the `LLMBackend` protocol (so other backends implement it consistently) or fold the "cached?" check into a higher-level service call — e.g. `llm.ensure_loaded_or_schedule_download(model_size)` returning `("loaded" | "scheduled", task_name)` — so the route doesn't need to reason about cache layout at all.

While you're here, the `except Exception` at lines 71-72 also trips Ruff's B904 and spills the backend exception string to the client; chaining with `from e` preserves the traceback for logs, and the client-facing detail should probably be a generic "LLM generation failed" with the real exception logged server-side:

♻️ Smaller inline tweak for B904 + message hygiene

```diff
 except Exception as e:
-    raise HTTPException(status_code=500, detail=str(e))
+    logger.exception("LLM generation failed")
+    raise HTTPException(status_code=500, detail="LLM generation failed") from e
```

(Requires a module-level `logger = logging.getLogger(__name__)`.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/llm.py` around lines 27 - 40, The route is reaching into a protected backend method backend._is_model_cached and also catches Exception then exposes the raw error; replace this by adding a public is_model_cached(self, size) to the LLMBackend protocol (or implement a higher-level llm.ensure_loaded_or_schedule_download(model_size) that returns ("loaded" | "scheduled", task_name)) and update the route to call that public API instead of backend._is_model_cached; move download logic into a backend/llm service method (e.g., download_llm_background or ensure_loaded_or_schedule_download) so the route only starts the background task via task_manager.start_download and create_background_task with the service coroutine, and change the exception handling inside the background task to log the real exception with logger.exception(...) or re-raise using "from e" while returning/sending a generic client-facing message like "LLM generation failed" to avoid leaking internals (update task_manager.error_download(progress_model_name, ...) to receive only sanitized message while logging the original).

backend/routes/speak.py (1)
66-77: In-function import hints at a circular-import workaround — consider refactoring.

Importing `generate_speech` inside the handler suggests a circular dependency between `routes/speak.py` and `routes/generations.py`. If that's the case, the cleaner fix is to extract the generation orchestration (`generate_speech`) into a `services/` module that both route modules depend on, so both REST surfaces and the MCP tool share one canonical call site and the handler can import it at module top.

Not blocking — the inline import works — but it's a latent source of ordering bugs and complicates the test surface.
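A minimal single-file analogue of the suggested extraction — function names and signatures here are illustrative, not the real Voicebox API:

```python
# Hypothetical layout: both REST routes and the MCP tool call one shared
# orchestration function instead of importing each other's modules.

def generate_speech(text: str, voice_id: str) -> dict:
    # services/speech.py — the single canonical call site; it imports
    # nothing from routes/, so no import cycle can form.
    return {"text": text, "voice_id": voice_id, "status": "queued"}

def speak_route(payload: dict) -> dict:
    # routes/speak.py — imports the service at module top, not inline.
    return generate_speech(payload["text"], payload.get("voice_id", "default"))

def generations_route(payload: dict) -> dict:
    # routes/generations.py — same service, identical behavior.
    return generate_speech(payload["text"], payload["voice_id"])

result = speak_route({"text": "hello"})
assert result == {"text": "hello", "voice_id": "default", "status": "queued"}
```

The point of the shape is that the dependency arrows all run routes → services, never routes → routes.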
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/speak.py` around lines 66 - 77, The inline import of generate_speech inside the handler indicates a circular-import workaround between routes/speak.py and routes/generations.py; refactor by moving the orchestration function (generate_speech) into a new services module (e.g., services/generation_service or services/speech_service) that exposes the same function signature (accepting models.GenerationRequest and db) so both routes/speak.py and routes/generations.py can import it at module top; update the imports in both route modules to import generate_speech from the new services module and keep the call site using models.GenerationRequest(...) unchanged to preserve behavior and tests.

backend/app.py (1)
214-214: `application` parameter is unused.

`_run_startup` takes `application: FastAPI` but never references it. Drop the parameter (and update the call site on line 83) for clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app.py` at line 214, The _run_startup function currently declares an unused parameter application: FastAPI; remove that parameter from the _run_startup signature and update any callers to call _run_startup() without arguments (specifically the place that invoked _run_startup previously using the application variable). Ensure the function definition matches the new no-argument call and run tests / linter to confirm no remaining references to the removed parameter.

backend/tests/test_personality_samples.py (1)
1-23: Filename starts with `test_` but this is a manual evaluator, not a pytest test.

Pytest will collect it during test runs. It currently has no top-level side effects and no `test_*` functions, so collection is harmless today, but a future reader may mistake it for CI coverage of the personality service. Consider renaming (e.g., `eval_personality_samples.py`) or moving it out of `backend/tests/`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tests/test_personality_samples.py` around lines 1 - 23, The file is named with a pytest-discovered prefix but is an interactive evaluator, so rename or relocate it to avoid accidental test collection; specifically, change backend/tests/test_personality_samples.py to a non-test name like eval_personality_samples.py (or move it out of backend/tests/) and update any README/usage comments or scripts that reference the original filename so invocations (e.g., the top-level docstring examples) still work; ensure no import paths or tooling expect the old test_ prefix before committing.

app/src/components/DictateWindow/DictateWindow.tsx (1)
190-194: `source.onerror` silently hopes EventSource recovers — consider a bounded retry cap.

The handler is intentionally empty so `EventSource` can auto-reconnect on transient drops. If the backend pod is actually gone (generation row deleted, server restart mid-flight), the browser will reconnect forever and the pill sits on "speaking" until `dictate:speak-end` rescues it. Worth adding either a soft timeout (e.g., dismiss if no `completed`/`failed` after N seconds) or a retry counter, so a missed `speak-end` can't leave the pill stuck permanently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/components/DictateWindow/DictateWindow.tsx` around lines 190 - 194, The EventSource onerror handler currently does nothing (source.onerror), letting the browser reconnect indefinitely and potentially leaving the UI stuck; add a bounded retry/timeout: in the component/function that creates the EventSource (DictateWindow where source is created), track a retry counter and/or start a soft timeout when opening the EventSource, increment the counter in source.onerror and if it exceeds N (e.g., 3) or the timeout elapses without receiving a terminal event ('completed'/'failed' or the existing dictate:speak-end), programmatically dismiss the pill by dispatching the same action used for speak-end (or calling the existing handler that ends speaking), clear timers and close source; also ensure the 'completed'/'failed' message handlers and the dictate:speak-end flow clear the timer and reset the retry counter so normal reconnections don't trigger the fallback.

app/src/components/AccessibilityGate/AccessibilityGate.tsx (1)
105-107: `<Trans>` `path` component renders as an unstyled `<span>`.

`components={{ path: <span /> }}` means the `<path>System Settings → …</path>` markers in the translations add no visual affordance (bold, monospace, color) — they just become inert spans. If the intent is to make the settings breadcrumb visually distinct (common in i18n'd instructional copy), apply a class; if there's no intent, drop the wrapper and simplify the translations.

📝 Example

```diff
-<Trans i18nKey="captures.permissions.accessibility.body" components={{ path: <span /> }} />
+<Trans
+  i18nKey="captures.permissions.accessibility.body"
+  components={{ path: <span className="font-medium text-foreground" /> }}
+/>
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/components/AccessibilityGate/AccessibilityGate.tsx` around lines 105 - 107, The <Trans> usage in AccessibilityGate.tsx (i18nKey "captures.permissions.accessibility.body") passes components={{ path: <span /> }} which renders unstyled spans for the "path" markers; update the components prop to either remove the wrapper and adjust the translation to plain text, or replace the empty span with a styled element (e.g., add a clear visual affordance via className such as monospace/bold/color) so the settings breadcrumb is visually distinct; ensure the change targets the <Trans> call in AccessibilityGate.tsx and the "path" component only.

CHANGELOG.md (1)
62-64: Add `shell` tag to code fence (MD040).

Same markdownlint nit as the other docs — specify the language for the Claude Code one-liner block.

📝 Proposed fix

````diff
-```
+```shell
 claude mcp add voicebox --transport http --url http://127.0.0.1:17493/mcp --header "X-Voicebox-Client-Id: claude-code"
 ```
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG.md` around lines 62 - 64, The code fence around the one-line Claude Code command is missing a language tag (MD040); update the fenced block that contains "claude mcp add voicebox --transport http --url http://127.0.0.1:17493/mcp --header \"X-Voicebox-Client-Id: claude-code\"" to include the shell tag (i.e., change ``` to ```shell) so the block is fenced as a shell snippet.

backend/mcp_server/context.py (2)
51-52: Redundant `__init__` override.

`BaseHTTPMiddleware.__init__` already accepts `(app, dispatch=None)`. Overriding just to call `super().__init__(app)` adds noise without behavior change — drop it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/mcp_server/context.py` around lines 51 - 52, Remove the redundant __init__ override from the middleware class in context.py: delete the __init__(self, app: ASGIApp) -> None: super().__init__(app) method so the class inherits BaseHTTPMiddleware.__init__(app, dispatch=None) unchanged; this removes noise without changing behavior and relies on the base class's constructor (identify the class by its use of BaseHTTPMiddleware and method name __init__).
54-101: Sync DB work runs on the event loop in every request.
`_stamp_last_seen` does a SQLAlchemy SELECT + INSERT/UPDATE + COMMIT synchronously right before returning the response, so every `/mcp/*` and `/speak` request blocks the async event loop on SQLite I/O. For a chatty agent (batched `voicebox.speak` calls) this serializes requests behind the stamp commit and adds tail latency to SSE streams. Since stamping is already best-effort (the broad `except Exception` swallows everything), fire-and-forget into a worker thread is a clean fit.

⚡ Proposed fix — dispatch stamping off the event loop

```diff
+import asyncio
@@
         if client_id and _is_stamped_path(request.url.path):
-            _stamp_last_seen(client_id)
+            # Best-effort; don't block the event loop on SQLite commit.
+            asyncio.create_task(asyncio.to_thread(_stamp_last_seen, client_id))
         return response
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/mcp_server/context.py` around lines 54 - 101, dispatch currently calls the synchronous _stamp_last_seen on the request path, blocking the async event loop; change dispatch to schedule stamping off the event loop (e.g., use asyncio.get_running_loop().run_in_executor or asyncio.to_thread / loop.run_in_executor) so the DB work runs in a worker thread and is fire-and-forget. Move or wrap the DB-heavy logic in _stamp_last_seen (or create a new helper like _stamp_last_seen_blocking) so it contains the imports, query/insert/commit/rollback/close and retains the existing broad try/except behavior, and have dispatch call asyncio.create_task or loop.run_in_executor to invoke that blocking helper when client_id and _is_stamped_path(request.url.path) are true; ensure dispatch does not await the task and still resets current_client_id as before.

backend/mcp_server/README.md (1)
41-41: Add language identifiers to fenced code blocks (MD040).

markdownlint flags the `claude mcp add`, `npx @modelcontextprotocol/inspector`, `curl`, and package-tree code blocks. Tag them (`shell` for the three commands, `text` for the tree) to silence the lint and get syntax highlighting in the MDX-rendered docs.

Also applies to: 68-68, 80-80, 89-89
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/mcp_server/README.md` at line 41, The README has multiple fenced code blocks missing language identifiers causing MD040 warnings; update the fenced blocks containing the commands "claude mcp add", "npx `@modelcontextprotocol/inspector`", and the "curl" example to use the shell language tag (```shell) and update the package-tree block (the ASCII tree) to use the text language tag (```text); apply the same change to the other flagged occurrences referenced (the other blocks at the same locations) so all command blocks are tagged shell and the tree block is tagged text.

app/src/components/CapturesTab/DictationReadinessChecklist.tsx (1)
94-113: `downloadByModel` in the effect deps runs the effect on every render.

`downloadByModel` is constructed as a fresh `Map` on each render, so including it in the dependency array effectively makes the effect run on every render rather than only when `activeTasks` changes. The inner set-diff still guards against spurious invalidations so this isn't a correctness bug, but drop `downloadByModel` from deps and recompute the set inside the effect — that keeps the ref-identity semantics clean and makes intent obvious.

♻️ Proposed fix

```diff
   const prevActive = useRef<Set<string>>(new Set());
   useEffect(() => {
-    const current = new Set(downloadByModel.keys());
+    const current = new Set<string>();
+    for (const dl of activeTasks?.downloads ?? []) {
+      if (dl.status === 'downloading') current.add(dl.model_name);
+    }
     for (const name of prevActive.current) {
       if (!current.has(name)) {
         queryClient.invalidateQueries({ queryKey: ['capture-readiness'] });
         queryClient.invalidateQueries({ queryKey: ['modelStatus'] });
         break;
       }
     }
     prevActive.current = current;
-  }, [activeTasks, queryClient, downloadByModel]);
+  }, [activeTasks, queryClient]);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/components/CapturesTab/DictationReadinessChecklist.tsx` around lines 94 - 113, downloadByModel is rebuilt each render causing the useEffect dependency array to trigger every time; remove downloadByModel from the deps and instead compute a Set of model names inside the effect. Specifically, in the useEffect that references prevActive/current, use activeTasks (and queryClient) in the dependency array only, construct const current = new Set((activeTasks?.downloads ?? []).filter(dl => dl.status === 'downloading').map(dl => dl.model_name)) inside the effect, keep the same diff logic that invalidates via queryClient.invalidateQueries(['capture-readiness']) and ['modelStatus'] when a name disappears, and then assign prevActive.current = current at the end.

backend/routes/mcp_bindings.py (1)
59-59: Replace deprecated `datetime.utcnow()` with a timezone-aware alternative.

`datetime.utcnow()` is deprecated as of Python 3.12 and scheduled for removal; it returns a naïve datetime which loses the UTC contract at the type boundary. The same pattern appears in `backend/mcp_server/context.py` at line 93, so both should be updated together.

🕒 Proposed replacement

```diff
-from datetime import datetime
+from datetime import datetime, timezone
@@
-    row.updated_at = datetime.utcnow()
+    row.updated_at = datetime.now(timezone.utc)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/mcp_bindings.py` at line 59, Replace the naïve UTC timestamp assignment using datetime.utcnow() with a timezone-aware UTC datetime (e.g., datetime.now(timezone.utc)); update the import to bring in timezone (or use datetime.now with datetime.timezone) and set row.updated_at = timezone-aware datetime in the module where row.updated_at is assigned, and make the same change to the analogous timestamp assignment in the MCP server context code (the datetime assignment at the other location) so both places use timezone-aware UTC datetimes.

app/src/components/ChordPicker/ChordPicker.tsx (1)
62-72: Nit: local `t` shadows the translation function.

Inside the effect, `const t = window.setTimeout(...)` shadows `const { t } = useTranslation()` from line 51. It's harmless today (the effect body doesn't call `t(...)`), but the shadow is a footgun for future edits. Consider renaming to `timeoutId`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/components/ChordPicker/ChordPicker.tsx` around lines 62 - 72, The effect in ChordPicker.tsx shadows the translation function t (from const { t } = useTranslation()) by declaring const t = window.setTimeout(...) inside useEffect; rename that local to a non-conflicting name (e.g., timeoutId) in the useEffect closure where setTimeout is assigned and cleared (the code that calls captureRef.current?.focus() and window.clearTimeout), so the translation function t remains unshadowed for future edits.

app/src/components/CapturesTab/CapturesTab.tsx (1)
116-117: Dead `fileInputRef`: declared but never triggered.

`fileInputRef` (line 116) is attached to a hidden `<input>` (line 322) but no code calls `fileInputRef.current?.click()` — only `uploadInputRef` is triggered (via `handleUploadClick`). The two inputs share the same accept and change handler, so the second is dead UI. Either remove it or wire the intended code path.

🔧 Proposed fix

```diff
-  const fileInputRef = useRef<HTMLInputElement>(null);
   const uploadInputRef = useRef<HTMLInputElement>(null);
 ...
-      <input
-        ref={fileInputRef}
-        type="file"
-        accept={CAPTURE_AUDIO_MIME}
-        onChange={(e) => handleUploadFile(e, 'file')}
-        className="hidden"
-      />
```

Also applies to: 321-327
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/components/CapturesTab/CapturesTab.tsx` around lines 116 - 117, The dead ref fileInputRef is never used (only uploadInputRef is clicked), so either remove the redundant hidden input and fileInputRef declaration, or wire the intended click path: update the upload trigger (e.g., the button or handler that currently calls uploadInputRef.current?.click()) to call fileInputRef.current?.click() (or both) and ensure the shared accept and onChange handlers (the change handler function and the input props) remain attached to the active input; also remove fileInputRef and its unused hidden <input> if you choose the removal option to avoid unused refs.

backend/database/models.py (1)
275-275: Use `JSON` column type for `refinement_flags` to match `CaptureSettings` and eliminate manual serialization.

`refinement_flags` is stored as `Text` and requires manual `json.loads()`/`json.dumps()` in callers, while `CaptureSettings.chord_push_to_talk_keys` uses SQLAlchemy's `JSON` column type. Switching to `JSON` removes manual serialization boilerplate and makes the intent self-evident.

If adopted, update callers in `backend/services/captures.py`:

- Line 36: Change `json.loads(row.refinement_flags)` to `row.refinement_flags` (already deserialized by SQLAlchemy)
- Line 185: Change `json.dumps(flags.to_dict())` to `flags.to_dict()` (JSON column auto-serializes)

Proposed schema change

```diff
-    refinement_flags = Column(Text, nullable=True)  # JSON blob
+    refinement_flags = Column(JSON, nullable=True)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/database/models.py` at line 275, Change the refinement_flags column from Text to SQLAlchemy JSON so SQLAlchemy handles (de)serialization: update the Column definition named refinement_flags in models.py to use JSON(nullable=True) matching CaptureSettings.chord_push_to_talk_keys; then remove manual json.loads/json.dumps usage in backend/services/captures.py by replacing json.loads(row.refinement_flags) with just row.refinement_flags and replacing json.dumps(flags.to_dict()) with flags.to_dict(), ensuring callers expect native Python dicts.

backend/tests/test_refinement_samples.py (2)
1-25: This is an eval harness but will be collected by pytest.

Because the filename matches `test_*.py`, `pytest backend/tests` will import the module (running `sys.path.insert(0, str(REPO_ROOT))`) and report zero tests collected from it — harmless but potentially confusing in CI output. The docstring already says "this is an interactive evaluation harness, not a pass/fail unit test".

Consider either renaming (`eval_refinement_samples.py`), moving out of `backend/tests/`, or adding `collect_ignore = ["test_refinement_samples.py"]` to `backend/tests/conftest.py` so regular test runs don't touch it.

Also applies to: 451-452
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tests/test_refinement_samples.py` around lines 1 - 25, The test harness file test_refinement_samples.py is being picked up by pytest because it matches the test_*.py pattern; either rename or move the file (e.g., eval_refinement_samples.py or into a non-tests directory) so it is not auto-collected, or add a pytest collection exclusion by appending the filename to collect_ignore in backend/tests/conftest.py (use the collect_ignore list/variable name) so pytest will skip importing test_refinement_samples.py during regular test runs.
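If the `collect_ignore` route is chosen, the conftest addition is a one-liner (sketch assuming `backend/tests/conftest.py` exists; pytest reads this module-level list during collection):

```python
# backend/tests/conftest.py
# pytest consults this module-level list during collection and skips the
# named paths (relative to this conftest's directory) entirely, so the
# eval harness is never imported by a plain `pytest backend/tests` run.
collect_ignore = ["test_refinement_samples.py"]
```

The same list could also absorb `test_personality_samples.py` if that file stays under `backend/tests/`.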
37-37: `Iterable` should come from `collections.abc` (Ruff UP035).

Preferred on Python 3.9+.

♻️ Fix

```diff
-from typing import Iterable, Optional
+from collections.abc import Iterable
+from typing import Optional
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tests/test_refinement_samples.py` at line 37, Replace the typing-sourced Iterable with the collections.abc one to satisfy Ruff UP035: update the import so that Iterable is imported from collections.abc while keeping Optional from typing (i.e. change the current "from typing import Iterable, Optional" usage to "from collections.abc import Iterable" and "from typing import Optional" or a combined import statement that reflects that separation). This targets the import statement referencing Iterable and Optional.

backend/routes/captures.py (1)
33-36: Chunked read offers no streaming benefit — simplify.
`file.read(UPLOAD_CHUNK_SIZE)` in a loop followed by `b"".join(chunks)` is functionally identical to `await file.read()`, since `audio_bytes` is held entirely in memory before being passed to `create_capture`. If you eventually want a real streaming path, it would need to pass an iterator through to the service, not pre-materialize.

♻️ Simplification
```diff
-    chunks = []
-    while chunk := await file.read(UPLOAD_CHUNK_SIZE):
-        chunks.append(chunk)
-    audio_bytes = b"".join(chunks)
+    audio_bytes = await file.read()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes/captures.py` around lines 33 - 36, The chunked loop reading into chunks and joining them is unnecessary because audio_bytes is fully materialized; replace the while chunk := await file.read(UPLOAD_CHUNK_SIZE) / chunks append / b"".join(chunks) logic with a single await file.read() to produce audio_bytes and pass that to create_capture (or, if you want true streaming later, change create_capture to accept and consume an async iterator instead). Target symbols: file.read, UPLOAD_CHUNK_SIZE, chunks, audio_bytes, create_capture.

backend/mcp_shim/__main__.py (1)
171-186: One transport hiccup kills the whole MCP session.

If `_handle_request` raises for any reason (connection reset mid-stream, malformed SSE from the server, `httpx.ReadError`, etc.), the outer `except Exception` returns 1 and the client loses its MCP server until the parent process respawns it. Consider catching and logging per-request so a single bad request doesn't take down long-lived sessions.

♻️ Proposed resilience
```diff
-                await _handle_request(
-                    client, url, line, forward_headers, session_id
-                )
+                try:
+                    await _handle_request(
+                        client, url, line, forward_headers, session_id
+                    )
+                except Exception as exc:
+                    _err(f"request failed, continuing: {exc!r}")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/mcp_shim/__main__.py` around lines 171 - 186, The main loop currently exits the whole process if any call to _handle_request raises; wrap the per-request call in its own try/except so transient errors don’t kill the session: inside the while True loop (around await _handle_request(client, url, line, forward_headers, session_id)) catch Exception as exc, call _err with a descriptive message including exc, then continue the loop; keep the existing outer except for KeyboardInterrupt/SystemExit and the check that _read_stdin_line() returning None exits with 0.

backend/mcp_server/tools.py (2)
133-141: Check base64 length before decoding.
`b64.b64decode` allocates the full decoded buffer before the length check on line 138, so a 270 MB base64 payload materializes ~200 MB before being rejected. Gate on the encoded length first (4 base64 chars ≈ 3 raw bytes) to reject cheaply.

♻️ Proposed fix
```diff
+    if len(audio_base64) > MAX_TRANSCRIBE_BYTES // 3 * 4 + 4:
+        raise ValueError(
+            f"Audio exceeds {MAX_TRANSCRIBE_BYTES // (1024 * 1024)} MB limit."
+        )
     try:
         raw = b64.b64decode(audio_base64, validate=True)
     except Exception as exc:
         raise ValueError(f"Invalid audio_base64: {exc}") from exc
-    if len(raw) > MAX_TRANSCRIBE_BYTES:
-        raise ValueError(
-            f"Audio exceeds {MAX_TRANSCRIBE_BYTES // (1024 * 1024)} MB limit."
-        )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/mcp_server/tools.py` around lines 133 - 141, The code decodes audio_base64 into raw before checking size, allowing huge memory allocation; change the logic in the Base64 branch to first compute an estimated decoded length from len(audio_base64) (use the 4:3 ratio and account for padding) and compare it to MAX_TRANSCRIBE_BYTES, raising ValueError if too large, and only then call b64.b64decode(audio_base64, validate=True) to produce raw (and keep the existing exception handling and subsequent length check/cleanup around raw).
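The arithmetic behind the cheap gate can be sketched standalone — `exceeds_limit` and the 12-byte limit here are illustrative names and values, not the tool's actual constants:

```python
import base64

def exceeds_limit(encoded: str, max_raw: int) -> bool:
    # 4 base64 chars decode to at most 3 raw bytes, so any payload longer
    # than this bound must exceed max_raw; the +4 leaves slack for padding
    return len(encoded) > max_raw // 3 * 4 + 4

small = base64.b64encode(b"a" * 10).decode()   # within a 12-byte limit
big = base64.b64encode(b"a" * 100).decode()    # far over it
assert not exceeds_limit(small, 12)
assert exceeds_limit(big, 12)   # rejected without ever decoding
```

The bound is deliberately lenient, so the post-decode length check still matters for payloads near the limit; the gate only exists to refuse grossly oversized input cheaply.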
290-296: Replace `whisper._is_model_cached()` call with the public `is_model_cached()` API.

The code directly accesses a private backend method. Use the public `is_model_cached(hf_repo)` function from `backend.backends.base` instead, similar to how `backend/routes/captures.py` does it. To refactor, map `model_size` to the corresponding Hugging Face repository ID (following the pattern in backend implementations), then call the public API.
Verify each finding against the current code and only fix it if needed. In `@backend/mcp_server/tools.py` around lines 290 - 296, Current check uses the private whisper._is_model_cached(model_size); change it to use the public is_model_cached(hf_repo) API from backend.backends.base by mapping the local model_size string to the corresponding Hugging Face repo ID (same mapping/pattern used in backend implementations and backend/routes/captures.py), then call is_model_cached(mapped_hf_repo) in place of whisper._is_model_cached; keep the existing conditions using whisper.is_loaded() and whisper.model_size but replace the private check with the mapped public API call (use the same mapping logic found near other backend whisper model checks to derive the HF repo ID).

backend/services/captures.py (1)
90-111: Minor style cleanups flagged by Ruff.
- Line 90: `_WHISPER_NATIVE_FORMATS` is a function-local constant; Ruff N806 expects lowercase in function scope. Either hoist it to module scope next to `VALID_SOURCES` (more useful — it's conceptually a constant and won't be re-allocated on every call), or rename it.
- Lines 108–111: `try/except OSError: pass` can be `contextlib.suppress(OSError)` for readability (SIM105).

♻️ Proposed diff
```diff
 VALID_SOURCES = {"dictation", "recording", "file"}
+_WHISPER_NATIVE_FORMATS = (".wav", ".mp3", ".flac", ".ogg")
@@
-    _WHISPER_NATIVE_FORMATS = (".wav", ".mp3", ".flac", ".ogg")
-
     if audio is None or sr is None:
@@
-    audio_path = config.get_captures_dir() / f"{capture_id}.wav"
-    sf.write(str(audio_path), audio, sr, format="WAV")
-    try:
-        raw_path.unlink()
-    except OSError:
-        pass
+    audio_path = config.get_captures_dir() / f"{capture_id}.wav"
+    sf.write(str(audio_path), audio, sr, format="WAV")
+    with contextlib.suppress(OSError):
+        raw_path.unlink()
```

Add `import contextlib` at the top.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/services/captures.py` around lines 90 - 111, Move the function-local constant _WHISPER_NATIVE_FORMATS out to module scope (e.g., next to VALID_SOURCES) or rename it to lowercase to satisfy Ruff N806, and replace the try/except OSError: pass block that attempts raw_path.unlink() with contextlib.suppress(OSError): raw_path.unlink() while adding an import for contextlib at the top of the module so captures.py uses the suppress pattern for readability (ensure you update references if you relocate the constant).

app/src/lib/api/types.ts (1)
193-201: Field name inconsistency between refine/retranscribe requests.
`CaptureRefineRequest.model_size` vs `CaptureRetranscribeRequest.model` — I understand one targets the LLM and the other Whisper, so the semantics differ, but consumers now have to remember which endpoint uses which key. Consider `stt_model` on retranscribe to match the backend's `retranscribe_capture(stt_model=...)` parameter and to parallel `llm_model` on the response type. Purely a naming nit; skip if the wire format is already locked.
Verify each finding against the current code and only fix it if needed. In `@app/src/lib/api/types.ts` around lines 193 - 201, Rename the inconsistent field on the retranscribe request to match backend parameter naming: change the CaptureRetranscribeRequest property currently named model to stt_model (type WhisperModelSize) and update all usages to pass/read stt_model (e.g., where retranscribe_capture(stt_model=...) is called or the request is constructed); this parallels the llm model naming used for refine responses and avoids consumers having to remember different keys. Ensure type references and any serialization/deserialization that expect model are updated to stt_model.

backend/backends/qwen_llm_backend.py (1)
238-246: Consider clearing MLX's Metal cache on unload.
`del self.model; self.model = None` drops the Python ref but MLX's Metal allocator may hold the GPU buffers until a subsequent allocation reuses them. For explicit memory recovery (important when users toggle between 0.6B / 1.7B / 4B or unload before switching to a TTS engine), `mx.metal.clear_cache()` is the complement to `empty_device_cache(self.device)` in the PyTorch path above.
Verify each finding against the current code and only fix it if needed. In `@backend/backends/qwen_llm_backend.py` around lines 238 - 246, The unload_model method currently deletes Python references to the MLX model/tokenizer but doesn't free MLX's Metal GPU buffers; after setting self.model and self.tokenizer to None in unload_model, call mx.metal.clear_cache() (or the equivalent MLX API) to explicitly clear Metal allocator caches so memory is released when switching models or before TTS usage; update unload_model to import/use mx.metal.clear_cache() and run it after the deletions and before logging "Qwen3 (MLX) unloaded".
```python
def mount_into(
    app: FastAPI,
    *,
    extra_startup: Callable[[], None] | None = None,
) -> None:
    """Attach the MCP app to ``app`` at ``/mcp`` and install the client-id middleware.

    ``extra_startup`` — if provided, runs during the FastAPI lifespan. This
    is the hook that lets ``app.py`` keep its existing startup/shutdown
    bodies while also driving FastMCP's session manager.
    """
    mcp = build_mcp_server()
    mcp_app = mcp.http_app(path="/", transport="http")

    # ClientIdMiddleware must run before FastMCP so the ContextVar is set
    # by the time tool handlers execute. Starlette composes middlewares
    # outermost-first, so adding here on the parent app is correct.
    app.add_middleware(ClientIdMiddleware)
    app.mount("/mcp", mcp_app)
    app.state.mcp_lifespan = mcp_app.router.lifespan_context
    logger.info("MCP: mounted at /mcp (FastMCP %s)", getattr(mcp, "version", ""))
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find callers of mount_into to see whether extra_startup is passed anywhere.
rg -nP --type=py -C3 '\bmount_into\s*\('

Repository: jamiepine/voicebox
Length of output: 397
🏁 Script executed:
#!/bin/bash
# Search for all calls to mount_into (not just definition)
rg -nP --type=py 'mount_into' | head -20

Repository: jamiepine/voicebox
Length of output: 233
🏁 Script executed:
#!/bin/bash
# Read the full mount_into and compose_lifespan functions to check if extra_startup is referenced
sed -n '40,80p' backend/mcp_server/server.py

Repository: jamiepine/voicebox
Length of output: 1608
🏁 Script executed:
#!/bin/bash
# Search for mount_into calls in backend/app.py and other files
rg -nP --type=py -B2 -A2 'mount_into\(' backend/

Repository: jamiepine/voicebox
Length of output: 265
🏁 Script executed:
#!/bin/bash
# Search more broadly for mount_into references without the opening paren
rg -nP --type=py 'mount_into' backend/ | grep -v 'def mount_into\|from.*mount_into\|__all__'

Repository: jamiepine/voicebox
Length of output: 44
🏁 Script executed:
#!/bin/bash
# Check backend/app.py specifically for mount_into usage
if [ -f backend/app.py ]; then
echo "=== backend/app.py ==="
cat backend/app.py | head -100
fi

Repository: jamiepine/voicebox
Length of output: 3768
🏁 Script executed:
#!/bin/bash
# Read the rest of create_app function
sed -n '75,150p' backend/app.py

Repository: jamiepine/voicebox
Length of output: 3054
extra_startup parameter is declared and documented but never invoked.
The docstring states extra_startup "runs during the FastAPI lifespan", but the function body never references it. Either wire it into the lifespan or remove the parameter and its doc paragraph to avoid misleading callers.
🛠️ Option A — remove the dead parameter
-def mount_into(
- app: FastAPI,
- *,
- extra_startup: Callable[[], None] | None = None,
-) -> None:
- """Attach the MCP app to ``app`` at ``/mcp`` and install the client-id middleware.
-
- ``extra_startup`` — if provided, runs during the FastAPI lifespan. This
- is the hook that lets ``app.py`` keep its existing startup/shutdown
- bodies while also driving FastMCP's session manager.
- """
+def mount_into(app: FastAPI) -> None:
+    """Attach the MCP app to ``app`` at ``/mcp`` and install the client-id middleware."""

Option B — actually run it (e.g., wrap app.state.mcp_lifespan so extra_startup() is called inside the composed lifespan). The current pattern in create_app uses compose_lifespan for this; match whichever approach is intended here.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def mount_into(app: FastAPI) -> None:
    """Attach the MCP app to ``app`` at ``/mcp`` and install the client-id middleware."""
    mcp = build_mcp_server()
    mcp_app = mcp.http_app(path="/", transport="http")

    # ClientIdMiddleware must run before FastMCP so the ContextVar is set
    # by the time tool handlers execute. Starlette composes middlewares
    # outermost-first, so adding here on the parent app is correct.
    app.add_middleware(ClientIdMiddleware)
    app.mount("/mcp", mcp_app)
    app.state.mcp_lifespan = mcp_app.router.lifespan_context
    logger.info("MCP: mounted at /mcp (FastMCP %s)", getattr(mcp, "version", ""))
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/mcp_server/server.py` around lines 40 - 60, The declared
extra_startup parameter on mount_into is never used; either remove it and its
doc paragraph or wire it into the FastAPI lifespan so it actually runs. To fix
by wiring: when you set app.state.mcp_lifespan =
mcp_app.router.lifespan_context, if extra_startup is not None replace/compose
that lifespan with one that runs extra_startup during startup (use the existing
compose_lifespan helper or the same pattern used in create_app) so mount_into
installs the composed lifespan that calls extra_startup; alternatively delete
the extra_startup parameter, its docstring mention, and any callers relying on
it. Ensure you update the mount_into signature/comments to match the chosen
approach.
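Option B's shape can be sketched with stdlib async context managers — `compose_lifespan` here is a hypothetical stand-in mirroring the helper the review mentions, not the project's actual code:

```python
import asyncio
import contextlib

calls = []

@contextlib.asynccontextmanager
async def inner_lifespan(app):
    # stands in for FastMCP's session-manager lifespan
    calls.append("inner-start")
    yield
    calls.append("inner-stop")

def compose_lifespan(base, extra_startup):
    # wrap base so extra_startup() runs during startup, after base has started
    @contextlib.asynccontextmanager
    async def composed(app):
        async with base(app):
            if extra_startup is not None:
                extra_startup()
            yield
    return composed

async def main():
    lifespan = compose_lifespan(inner_lifespan, lambda: calls.append("extra"))
    async with lifespan(None):
        calls.append("running")

asyncio.run(main())
print(calls)  # ['inner-start', 'extra', 'running', 'inner-stop']
```

The same ordering holds when FastAPI drives the lifespan: startup hooks fire inside the composed context before the app starts serving, and teardown unwinds in reverse.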
…all green Two small follow-ups to the sidebar checklist placement. Move it below the What's different section so the sticky top of the sidebar stays the page's narrative context (About → differences) and the checklist reads as a status panel rather than preamble. Gate the whole block on !readiness.allReady so once every gate is green the sidebar drops back to just About + What's different — no value in real estate full of checkmarks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
useCaptureSettings updated its own cache optimistically but never invalidated ['capture-readiness'], so for up to 5 s (the poll interval) after switching stt_model or llm_model the checklist kept showing the previous model's ready/missing state. The backend endpoint resolves the model live on each call — it was just the frontend cache that lagged. Invalidate in onSettled only when the patch touched a model field, so unrelated updates (chord keys, toggles) don't pay for a refetch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🧹 Nitpick comments (1)
app/src/components/CapturesTab/DictationReadinessChecklist.tsx (1)
114-125: Unstable `downloadByModel` in effect deps causes re-run every render.

`downloadByModel` is a fresh `Map` instance on every render (built at lines 106–109), so including it in the dep array makes this effect fire on every commit — not just when `activeTasks` actually changes. The logic is idempotent (when data is unchanged, `current` equals `prevActive.current` so no invalidation occurs), but every render still rebuilds the Set and reassigns the ref.

Prefer `useMemo` for the map, or key the effect to `activeTasks` alone and derive the key set inside:

♻️ Proposed refactor
```diff
-  const downloadByModel = new Map<string, ActiveDownloadTask>();
-  for (const dl of activeTasks?.downloads ?? []) {
-    if (dl.status === 'downloading') downloadByModel.set(dl.model_name, dl);
-  }
+  const downloadByModel = useMemo(() => {
+    const map = new Map<string, ActiveDownloadTask>();
+    for (const dl of activeTasks?.downloads ?? []) {
+      if (dl.status === 'downloading') map.set(dl.model_name, dl);
+    }
+    return map;
+  }, [activeTasks]);

   // When a download disappears from activeTasks, it just finished — refetch
   // readiness immediately so the row flips to ✓ instead of waiting up to 5s
   // for the next readiness poll.
   const prevActive = useRef<Set<string>>(new Set());
   useEffect(() => {
     const current = new Set(downloadByModel.keys());
     for (const name of prevActive.current) {
       if (!current.has(name)) {
         queryClient.invalidateQueries({ queryKey: ['capture-readiness'] });
         queryClient.invalidateQueries({ queryKey: ['modelStatus'] });
         break;
       }
     }
     prevActive.current = current;
-  }, [activeTasks, queryClient, downloadByModel]);
+  }, [downloadByModel, queryClient]);
```

(`activeTasks` can be dropped from deps since `downloadByModel` is now memoized against it.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/components/CapturesTab/DictationReadinessChecklist.tsx` around lines 114 - 125, The effect currently reruns every render because downloadByModel is a new Map each render; memoize downloadByModel with useMemo (keyed by activeTasks) or remove downloadByModel from the effect deps and compute the current Set from activeTasks inside the useEffect instead; update the code around prevActive, useEffect, and downloadByModel so downloadByModel is stable (useMemo([...], [activeTasks])) or so the effect depends only on activeTasks and queryClient, then call queryClient.invalidateQueries(['capture-readiness']) and queryClient.invalidateQueries(['modelStatus']) when a name was removed as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@app/src/components/CapturesTab/DictationReadinessChecklist.tsx`:
- Around line 114-125: The effect currently reruns every render because
downloadByModel is a new Map each render; memoize downloadByModel with useMemo
(keyed by activeTasks) or remove downloadByModel from the effect deps and
compute the current Set from activeTasks inside the useEffect instead; update
the code around prevActive, useEffect, and downloadByModel so downloadByModel is
stable (useMemo([...], [activeTasks])) or so the effect depends only on
activeTasks and queryClient, then call
queryClient.invalidateQueries(['capture-readiness']) and
queryClient.invalidateQueries(['modelStatus']) when a name was removed as
before.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 044f8174-cd4e-498e-8697-ab56b4b8cb18
📒 Files selected for processing (3)
- app/src/components/CapturesTab/DictationReadinessChecklist.tsx
- app/src/components/ServerTab/CapturesPage.tsx
- app/src/lib/hooks/useSettings.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- app/src/lib/hooks/useSettings.ts
- app/src/components/ServerTab/CapturesPage.tsx
Two surfaces leaked macOS-specific copy onto other platforms:

1. The Input Monitoring + Accessibility rows in the readiness checklist rendered everywhere. On Windows/Linux the Rust permission stubs return true, so the rows showed as permanent green checkmarks with copy like "macOS allows Voicebox to detect your global shortcut." — nonsense when you're on Windows. Gate both rows on a userAgent-based isMacOS check so they only render where the underlying TCC permission actually exists.

2. The global-shortcut setting description ended with "macOS will ask for Input Monitoring permission the first time you turn this on." That sentence rendered on every platform. The readiness checklist already surfaces the TCC requirement at the right moment on macOS, so the description doesn't need the platform note — drop it from en / ja / zh-CN / zh-TW.

Other macOS strings (AccessibilityNotice, InputMonitoringNotice, their "stillMissing" hints) are already gated behind the Rust permission booleans returning false, which never happens on Windows/Linux, so they stay inert without further changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pill subscribed to /generation/{id}/status to know when to start
playback, but EventSource.onerror was a no-op — auto-reconnect was the
intended recovery for transient drops. The gap: if the backend deletes
the gen row mid-flight or the connection silently dies in a way the
browser keeps retrying without ever getting a status event, the pill
sits in 'speaking' forever and the user has no way to clear it.
Added a 60-second hard cap that arms when the SSE opens and clears the
moment any real status event lands. If it fires while the pill is still
on the same id and audio never started, it force-dismisses. Same idea
as the existing post-speak-end 15s grace, but covers the case where the
backend never says anything at all.
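The arm-then-clear shape of that hard cap can be sketched in Python asyncio rather than the pill's actual EventSource code (timeout shortened for the demo):

```python
import asyncio

async def speak_watchdog(events: asyncio.Queue, cap_seconds: float) -> bool:
    """Return True if the pill should force-dismiss (no status event in time)."""
    try:
        await asyncio.wait_for(events.get(), cap_seconds)
        return False  # a real status event landed; the hard cap is cleared
    except asyncio.TimeoutError:
        return True   # backend never said anything — force-dismiss

async def demo() -> bool:
    silent_backend = asyncio.Queue()  # simulate: no status events ever arrive
    return await speak_watchdog(silent_backend, cap_seconds=0.01)

dismissed = asyncio.run(demo())
assert dismissed
```

The key property matches the commit text: the timer arms when the stream opens, and the first real event cancels it, so the cap only ever fires when the backend stays completely silent.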
…shadow
- DictationReadinessChecklist was constructing downloadByModel as a fresh Map every render and listing it in the cleanup effect's deps. With the 1 s polling cadence and arbitrary parent rerenders the effect ran more often than it needed to. Memoised the Map on activeTasks; the effect now keys off the memo's identity.
- zh-CN persona tooltipActive/ariaLabelActive matched their inactive twins byte-for-byte ("以人物设定朗读"). The other locales differentiate the active state with a -ing / -中 suffix; zh-CN now reads "正以人物设定朗读" when active.
- personalityPlaceholder was a ~290-character paragraph that doubled as both the example text and the explanation, repeating most of what personalityHint already said. Trimmed to the example only and folded the explanation + leave-blank consequence into the hint, across all four locales.
- Refinement model size keys were size06 / size17 / size4. Renamed the 4B variant to size40 so the decimal padding is consistent.
- ChordPicker's open-effect bound a window.setTimeout id to a local `t`, shadowing the i18n `t` from useTranslation. Renamed to timeoutId.
…r-move Both sliders on the generation settings page were calling update() — which is a React Query mutation that PATCHes /settings/generation — inside onValueChange. Dragging the chunk-limit slider from 800 to 3000 fired a request per pointer-move pixel, and a mid-drag failure plus optimistic rollback would leave persisted state visibly out of sync with the thumb position. Local state now mirrors each slider during a drag and the persist happens once on Radix's onValueCommit (pointer-up / keyboard-release). useEffects keep the local state in sync if the persisted value changes out-of-band — another window editing the same setting still updates the slider position cleanly.
…ad patterns

Mechanical sweep of items called out in the PR review:

- qwen_llm_backend: AutoModelForCausalLM.from_pretrained(torch_dtype=…) is deprecated in transformers ≥4.41 in favor of dtype=. Renamed.
- routes/llm: try/except around backend.generate() raised HTTPException(500, detail=str(e)) which leaks stack traces / paths to clients and trips Ruff B904. Now logs the original exception server-side and hands the client a generic message; chained via `from e` to preserve traceback context.
- mcp_bindings + mcp_server/context: datetime.utcnow() is deprecated since 3.12. Switched the two assignment sites to datetime.now(timezone.utc). The schema-level `default=datetime.utcnow` defaults in database/models.py are left for a later schema-aware pass.
- routes/generations: `logger = …` sat between two import blocks (Ruff E402). Moved below imports.
- mcp_server/server + tests/test_refinement_samples: typing.Callable / typing.Iterable have been preferred via collections.abc since 3.9 (Ruff UP035).
- routes/events: `except asyncio.TimeoutError` aliases plain `TimeoutError` since 3.11 (UP041).
- services/captures: hoisted WHISPER_NATIVE_FORMATS to module scope (was a function-local UPPER_SNAKE that tripped N806) and replaced the raw_path.unlink try/except OSError-pass with contextlib.suppress (SIM105). Semantic equivalence preserved — written_files.remove(raw_path) still only runs when unlink succeeds because it sits inside the suppressed block after the unlink call.
- database/migrations: hoisted the duplicate `import sqlite3` from inside two helper bodies to a single module-level import.
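The utcnow swap from the sweep above, in isolation — the replacement is timezone-aware where the deprecated call returned a naive value:

```python
from datetime import datetime, timezone

# deprecated since Python 3.12, and naive (no tzinfo):
#   datetime.utcnow()
# aware replacement:
stamp = datetime.now(timezone.utc)
assert stamp.tzinfo is timezone.utc
```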
The track editor's clip toolbar now has a regenerate icon next to Delete; clicking it kicks a fresh take of the selected clip's underlying generation through the same /generate/{id}/regenerate path the History table uses, and pushes the id into the global pending set so the SSE watcher picks it up. The chat list's per-item dropdown gets the same action between Play-from-here and Remove. Translation keys added under storyContent.itemActions / storyContent.toast across all four locales.
…icker) You can now drop a music file onto the story content area or pick one through the new "Import audio" button in the add-clip popover. Both call POST /generate/import which writes the file to data/generations/<id>.<ext>, probes duration via librosa, and inserts a Generation row pointing at a singleton "Imported Audio" profile (created lazily on first import). The existing addStoryItem flow takes over from there — the timeline doesn't care that the row didn't come out of TTS. Engine field on the row is "import"; it's surfaced on StoryItemDetail so the chat list shows a music icon instead of the (missing) profile avatar and both the dropdown and the track-editor toolbar hide the Regenerate action — there's nothing to regenerate. Accepted formats: wav/mp3/flac/ogg/m4a/aac/webm, capped at 200 MB. Translation keys added across en/ja/zh-CN/zh-TW.
/audio/{id} and /audio/version/{id} hardcoded media_type="audio/wav" on
the FileResponse. That was a no-op when every generation came out of
TTS (everything on disk was a .wav anyway), but imported audio keeps
its source format — .mp3 / .m4a / .ogg — and the WaveSurfer MediaElement
backend uses an <audio> tag that checks Content-Type before letting the
clip play, so an MP3 announced as audio/wav silently failed to load.
Both endpoints now derive the type via mimetypes.guess_type and fall
back to audio/wav for unknown suffixes. Download filenames also keep
the real extension instead of always saying ".wav".
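The derivation described above can be sketched directly; `media_type_for` is an illustrative name, not the endpoint's actual code, and the fallback value matches the commit text:

```python
import mimetypes

def media_type_for(filename: str) -> str:
    # guess from the file suffix; unknown suffixes fall back to WAV,
    # matching the old hardcoded behavior for TTS output
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "audio/wav"

assert media_type_for("clip.mp3") == "audio/mpeg"
assert media_type_for("clip.xyzaudio") == "audio/wav"
```

Passing the derived value as `media_type=` on the FileResponse is what lets the browser's `<audio>` element accept an imported MP3 that was previously mislabeled as `audio/wav`.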
… scope The track editor's zoom was clamped to a hardcoded [10, 200] pixels-per-second range, which had no relationship to the project — on a 4-minute story a "max zoom out" of 200 px/s still required scrolling, and on a 5-second story you could zoom all the way in to where every clip was a tiny sliver. Reframed the bounds in the unit the user actually thinks in: how many seconds of timeline are visible at once. Min scope is 10 s (most zoomed in), max scope is the entire project, and the default lands on a 60 s scope (or the full project, whichever is shorter) once the editor measures its visible track width on first mount. The pixels-per-second value still lives in component state (because every downstream calculation already uses it) but minPps/maxPps are computed from `containerWidth − LABEL_COL_WIDTH` and the project's effective duration, so the +/- buttons and the edge-drag handles on the scrollbar all clamp to bounds that move with the project. Re-clamping fires whenever those bounds shift — adding a long clip or resizing the window pulls the current zoom inside the new range instead of leaving the user parked outside it.
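The bounds arithmetic can be sketched as follows; `LABEL_COL_WIDTH` and the concrete numbers are illustrative — only the min-scope / full-project relationship comes from the description:

```python
LABEL_COL_WIDTH = 80  # illustrative; the real constant lives in the editor

def zoom_bounds(container_width: float, project_seconds: float,
                min_scope_s: float = 10.0) -> tuple[float, float]:
    visible = container_width - LABEL_COL_WIDTH
    max_pps = visible / min_scope_s        # most zoomed in: min scope fills the view
    min_pps = visible / project_seconds    # most zoomed out: whole project fits
    return min_pps, max_pps

# a 4-minute story with 800 px of usable track width
min_pps, max_pps = zoom_bounds(880, 240.0)
assert max_pps == 80.0
assert abs(min_pps - 800 / 240) < 1e-9
```

Because both bounds derive from the measured width and the project duration, adding a long clip or resizing the window shifts the range, which is why the re-clamp step the commit describes is needed.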
Imports were rendering as "Imported Audio" everywhere because every import points at the singleton voice profile. The filename was already being stored on the generation row (in the `text` field), so the chat item title and the timeline clip label now read from `text` when `engine === 'import'` and fall back to the profile name otherwise. The chat item also drops the language pill (always "en" on imports — not informative) and skips the transcript textarea since imports have no spoken text to show.
handleSplit was sending currentTimeMs - item.start_time_ms straight to the backend, which rejects it because StoryItemSplit.split_time_ms is typed as int and the playhead's currentTimeMs is a float (it's driven from HTMLAudioElement.currentTime, which carries sub-millisecond precision). Pydantic surfaced the mismatch as "Input should be a valid integer, got a number with a fractional part" and the toast read "Failed to split clip". Math.round at the call site, matching what the trim and move handlers already do.
Each story item now carries a volume column (linear gain, default 1.0,
clamped 0.0–2.0 server-side). New PUT /stories/{}/items/{}/volume route
+ useUpdateStoryItemVolume hook + a Volume2 icon in the clip-edit
toolbar that opens a popover with a 0–200% slider. Local slider state
drives the visual during a drag; the persist fires once on
onValueCommit, mirroring the generation-page slider pattern.
Web Audio playback inserts a per-clip GainNode between source and
master so volume changes apply live without re-decoding the buffer
(source -> clipGain -> masterGain -> destination). Server-side
mixdown in export multiplies the trimmed clip by its volume before
summing into the timeline. Split + duplicate carry the volume forward
to the new clips so trimming a faded section keeps the level you set.
Migration adds the volume column with default 1.0 so existing rows
read as full volume.
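The clamp-then-scale mixdown step above, as a plain-Python sketch (the real mixdown operates on audio arrays, not lists):

```python
def apply_gain(samples: list[float], volume: float) -> list[float]:
    # server-side clamp: linear gain restricted to 0.0–2.0
    v = max(0.0, min(2.0, volume))
    return [s * v for s in samples]

clip = [0.5, -0.25, 1.0]
assert apply_gain(clip, 2.0) == [1.0, -0.5, 2.0]
assert apply_gain(clip, 5.0) == apply_gain(clip, 2.0)  # out-of-range clamps to 2.0
```

During live playback the same scaling happens in the per-clip GainNode instead, so the buffer never needs re-decoding.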
…d audio The clip waveforms drawn inside each timeline track use WaveSurfer with the default MediaElement backend, which creates an internal <audio> element to drive playback timing. Web Audio in useStoryPlayback is what actually produces sound, but WaveSurfer's element was happily preloading and — after the first user gesture unlocked browser autoplay — playing the source URL through the page output too. For TTS clips it was masked: they're short, both sources start at the same time, and stopping the BufferSourceNode at pause coincides with the natural end of the audio element. For long imports (a four-minute MP3) the BufferSourceNode stops on pause but WaveSurfer's element keeps going on its own track — which is exactly the "music keeps playing when I pause" symptom. Hand WaveSurfer a muted <audio> element via the `media` option so the visual still loads peaks but the element itself can never produce sound. preload="metadata" keeps the load lightweight.
…ly halt source.stop() was the only thing happening when a clip was halted, and on long imported buffers (multi-minute MP3s scheduled via source.start with a duration argument) it was silently failing to halt the buffer in some browsers — pause left the music playing and seek stacked another source on top of the original. The mute-the-WaveSurfer-element fix was a different bug along the same path; this is the one that actually addresses the duplicated audio. ActiveSource now carries the per-clip GainNode alongside the source, and stopSource detaches the onended handler before calling stop() (so the natural-end callback can't race with explicit teardown and re-delete a freshly rescheduled entry at the same id), then disconnects both nodes inside their own try/catch blocks. Even when stop() doesn't actually halt the buffer the graph is severed — no path from source to destination, no audio.
Tiny `+` strips sit at the top of the topmost label cell and the bottom of the bottommost one, sticky-positioned in the label column so they follow horizontal scroll. Clicking either extends the visible track stack in that direction by one — above adds max(existing)+1, below adds min(existing)-1. Both compute against the full set (defaults + item-derived + previously-added) so successive clicks keep extending instead of fighting over the same number.

Empty extras live in component state because a track only earns its keep once a clip lands on it. Once one does, item.track carries the number forward and the row keeps deriving from items naturally; if nothing lands there before reload, the empty row simply isn't there next time, which matches what the user expects of an unused affordance.
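The numbering rule reduces to one pure function. This is a sketch under the assumptions in the paragraph above; `nextTrack` and its parameter names are hypothetical:

```typescript
// Next track number in a direction, computed against the full set of
// track numbers — defaults, numbers derived from items, and empties added
// earlier this session — so repeated clicks keep extending instead of
// fighting over the same number.
function nextTrack(
  direction: "above" | "below",
  defaults: number[],
  itemTracks: number[],
  extras: number[],
): number {
  const all = [...defaults, ...itemTracks, ...extras];
  return direction === "above" ? Math.max(...all) + 1 : Math.min(...all) - 1;
}
```

Including `extras` in the union is what makes a second click on the same strip produce a new number rather than the same one again.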
Branch review findings for the 0.5 release:

- High: Accessibility is treated as required to arm the hotkey. That blocks macOS dictation when auto-paste permission is missing, even though release notes say transcripts should still land in Captures, and it blocks Linux entirely because the Linux stub returns
- Medium: Default chords are macOS-only (
- Medium: The first agent speech can be dropped if the dictate window has not already been created.
- Medium: Copy-to-clipboard, archive audio, and retention controls are explicitly mock-only local state. Users can set 7-day retention or disable archiving, but captures still persist forever with audio. Wire these to backend settings/enforcement or hide them for 0.5.

Checks:
Comment drafted and saved. Quality assessment: 4/5 - precise technical observation about a real auditability gap in the MCP client-identity design, and an open question that invites discussion without a product pitch.

Summary
The 0.5.0 Capture release. Voicebox stops being just a voice-cloning studio and becomes a full AI voice studio: your voice goes into your computer through a global hotkey, and any agent's voice comes back out through a voice you own. Three pillars on top of one shared local LLM:
- Dictation: global-hotkey capture with synthetic paste into the focused app, an on-screen pill, and customizable push-to-talk / toggle chords.
- MCP server: `voicebox.speak`, `voicebox.transcribe`, `voicebox.list_captures`, `voicebox.list_profiles` exposed to Claude Code, Cursor, Cline, Spacebot, etc. via Streamable HTTP plus a stdio shim binary.
- Personalities: `personality: bool` on `/generate` and `voicebox.speak` that rewrites text through the profile's LLM before TTS.

Everything still runs on-device. One Qwen3 instance powers refinement and personality, so there's one local LLM, not two.
What's in here
Dictation pipeline (Rust + app webview + backend)
- `keytap` — our own cross-platform observe-only tap (CGEventTap on macOS, SetWindowsHookEx on Windows, evdev on Linux). Lazy-spawned only when the user enables it, so the macOS Input Monitoring TCC prompt fires from an explicit gesture instead of app startup. The chord state machine (Momentary push-to-talk + Toggle hands-free + PTT→Toggle upgrade) lives in keytap's `ChordMatcher`; voicebox is a thin `ChordEvent` → Tauri event dispatcher.
- `ChordPicker`: holding PTT then tapping Space upgrades to hands-free mid-recording without a gap. macOS defaults (right-Cmd + right-Option) deliberately avoid Cmd+Opt+I, Cmd+Opt+Esc, Cmd+Opt+Space. Windows defaults (right-Ctrl + right-Shift) route around AltGr collisions on DE/FR/ES layouts.
- Synthetic paste (`tauri/src-tauri/src/{clipboard,synthetic_keys,focus_capture,keyboard_layout}.rs`): snapshot focused PID at chord-start → activate (cooperative on macOS 14+) → 120ms settle → save full clipboard (every UTI/format) → write text → post Cmd+V/Ctrl+V at HID level with layout-resolved V keycode → 400ms paste-consume → conditional restore. Short-circuits if the focused app is Voicebox itself or Accessibility is untrusted.
- `useDictationReadiness` + `DictationReadinessChecklist` surface each unmet gate with a one-click action; `useChordSync` auto-arms the chord the moment the last gate flips green. Stops the "stuck pill" failure where the chord fires but has nowhere to land.
- Pill state machine (`useCaptureRecordingSession`, 328 LOC): recording → transcribing → refining → completed → rest → hidden, with calibrated dwell/fade timers and ref-stable closures so late-arriving mutations still see the right flags.

MCP server
- FastMCP mounted at `/mcp` (Streamable HTTP). `app.py` migrated from `@app.on_event` to `lifespan=` via `compose_lifespan`, so FastMCP's session manager nests inside FastAPI's startup/shutdown with correct LIFO teardown (MCP drains in-flight sessions before models unload).
- `ClientIdMiddleware` reads `X-Voicebox-Client-Id` into a `ContextVar`; `MCPClientBinding` table maps client → profile + `default_personality` bool; `resolve_profile()` picks explicit arg → per-client binding → global default. `last_seen_at` stamped on both `/mcp` (tool calls) and `/speak` (REST wrapper) so REST callers show up in the Settings UI.
- `/events/speak` broadcasts `speak-start` / `speak-end`. Rust `speak_monitor.rs` subscribes via a reqwest streaming body (CRLF-aware parser, 45s idle timeout catches zombie streams, 500ms→30s escalating backoff) and surfaces the pill window when agent-initiated speech actually starts.
- `voicebox-mcp` binary (`backend/mcp_shim/`, ~18 MB, built via `python build_binary.py --shim`) bridges stdio-only clients to the HTTP endpoint. Wired into `tauri.conf.json` `externalBin`.
- `MCPPage.tsx` with HTTP / stdio / `claude-mcp-add` snippets, default-voice picker, per-client bindings table.

Personalities
- `VoiceProfile.personality` (free-form text, ≤2000 chars) describes what the voice says and how, orthogonal to how it sounds (the existing preset/cloning metadata).
- `POST /profiles/{id}/compose` produces a fresh in-character utterance the user can edit (Dices button in `FloatingGenerateBox`, temp 0.9).
- `personality: true` on `/generate` (and `voicebox.speak`) routes text through `personality.rewrite_as_profile()` before TTS and marks the row `source = "personality_speak"` (Wand2 toggle in `FloatingGenerateBox`, temp 0.3).

Refinement hardening
- Deterministic Whisper-loop collapse runs before the LLM sees the transcript: a word-level pass catches runs like "URL, URL, URL, URL, URL, URL"; a phrase-level pass catches repeats ("thanks for watching" × 6) that the word pass misses, and CJK / Japanese loops ("謝謝觀看" × 7, "ご視聴ありがとうございました" × 6) where `text.split()` returns one unsplit token.
- `RefinementFlags` (`smart_cleanup`, `self_correction`, `preserve_technical`) snapshotted per capture so re-refine is reproducible.
- `backend/tests/test_refinement_samples.py` (10 transcripts, scores prompt leaks / answer leaks / loop echoes / question-mark survival / substring preservation across every bundled model size) and `test_personality_samples.py`. Plus `test_refinement_collapse.py` (17 deterministic unit tests pinning both passes against regression).

Capture persistence + UI
- `Capture` model + `routes/captures.py` (`POST /captures` upload+STT+optional refine, `GET /captures` list, `POST /captures/{id}/refine`).
- `CapturesTab.tsx` (834 LOC) — two-pane settings UI: searchable list ↔ detail with a scrubbable WaveSurfer audio player, Raw/Refined transcript toggle, "Play as" voice, Refine / Copy / Delete. Listens for `capture:created` / `capture:updated` events from the dictate window so new captures appear instantly.
- `CapturesPage.tsx` (607 LOC) — settings page (different from the tab): hotkey toggle, dual `ChordPicker`, model pickers, auto-paste / auto-refine toggles, inline Accessibility + Input Monitoring notices, live pill preview.
- `useUIStore` now persists `selectedProfileId` via `zustand/middleware.persist` so reopening the app keeps the last-selected profile.

Database
New tables:
- `Capture`, `MCPClientBinding`, singleton `CaptureSettings` (chord JSON, model picks, refinement flags, default voice), singleton `GenerationSettings`.
- New `VoiceProfile.personality` column. New `Generation.source` column. Idempotent column-existence migrations in `backend/database/migrations.py` (single-user desktop, no Alembic).
- `MCPClientBinding`'s legacy `default_intent` column is dropped on startup; SQLite < 3.35 (Ubuntu 20.04, Debian 11) falls back to leaving the unused column in place rather than crashing — SQLAlchemy only maps declared columns, so the stray does no work.

i18n
`app/src/i18n/locales/{en,ja,zh-CN,zh-TW}/translation.json` each grow ~375 lines. CapturesTab, CapturesPage, MCPPage, ChordPicker, both gates, the readiness checklist, ProfileCard / ProfileForm, FloatingGenerateBox, GenerationPage sidebar, Sidebar — all wired through `t()`.

Landing + docs
New `/capture` route (`CaptureHero`, `CaptureSection`, `CapturesMockup`, `AgentIntegration`, `SupportedModels`, `Personalities`). New overview docs: captures, dictation, mcp-server, voice-personalities. Design notes in `docs/plans/MCP_SERVER.md` and `docs/plans/VOICE_IO.md`.

Notable design decisions
- `personality: bool` over a tri-state. Earlier iterations exposed `intent: "respond" | "rewrite" | "compose"` on `voicebox.speak`. Collapsed to a boolean — `respond` is dropped (callers that want a reply should call an LLM), `compose` is a UI-only standalone button (`POST /profiles/{id}/compose`), and `rewrite` is the only meaning of `personality: true`. `MCPClientBinding.default_intent` migrated to `default_personality: bool` (legacy column dropped when SQLite supports it).
- `compose_lifespan`. AsyncExitStack enters factories in order (Voicebox startup → MCP startup) and exits LIFO (MCP teardown → `_run_shutdown`). MCP cancels in-flight sessions before models unload, so an agent mid-`voicebox.speak` at shutdown gets a clean session-cancelled error instead of crashing on "model unloaded."
- `ContextVar` for client_id. Avoids plumbing the request through every service; middleware sets it, tool handlers read it.
- `_supports_drop_column` guards DDL that needs a newer SQLite.
- `keytap`. Upstream `rdev` hasn't shipped a release since 2023-06, and the fix for its macOS 14 main-thread crash only lives on an unreleased branch. Rather than maintain a git-pinned fork, we wrote `keytap` from the spec: observe-only cross-platform taps, left/right modifier fidelity, clean shutdown via Drop, Sonoma-safe by design, and a `ChordMatcher` with both `Momentary` and `Toggle` modes. Voicebox's PTT→hands-free upgrade falls out of longest-match resolution plus Toggle sticky-end semantics — voicebox no longer carries 80+ LOC of bespoke chord state machine, and supply-chain risk drops to a regular crates.io dep with semver.

Follow-up / known limitations
Not blocking merge, but worth noting:
- The Windows paste path posts `VK_V`. `SendInput` delivers `wParam = VK_V` to the target regardless of active layout, which is why AutoHotkey's `Send "^v"` works on Dvorak Windows, so empirically this is fine — but if a Windows-on-Dvorak user reports breakage, `VkKeyScanExW(L'V', layout)` is the parallel fix (~30 lines).
- `speak_monitor` reconnects silently after the idle timeout / backoff leave the pill offline. No user-facing "backend unreachable" toast; that belongs in a separate server-health surface, not this loop.

Test plan
- macOS: activation (`yieldActivationToApplication:` → `activate`) brings the target app forward; paste lands in the original window even when Voicebox wasn't frontmost.
- Windows: the `AttachThreadInput` foreground-lock dance lands the paste in the original window. `BOOL` import from `windows::core` compiles against windows-rs 0.62.
- MCP: `claude-mcp-add` snippet from Settings → MCP, call `voicebox.speak({text, profile})`, pill surfaces with agent speech. `last_seen_at` advances on the binding row.
- stdio: `voicebox-mcp` binary.
- `/speak`: `curl -H "X-Voicebox-Client-Id: test"`, binding shows up in the Settings UI as last-seen.
- `voicebox.speak({text})` with no profile, hear Morgan; override with `profile=` and hear that voice instead.
- `voicebox.speak({text, personality: true})`, output is in character. `personality: false` (or unset with no binding default) is plain TTS.
- Migration: with a legacy `default_intent` column, verify it's dropped (or left in place on SQLite < 3.35 with a warning) and `default_personality` exists with default `false`.
- `pytest backend/tests/test_refinement_samples.py` and `test_personality_samples.py` produce reasonable output across bundled model sizes.
- `pytest backend/tests/test_refinement_collapse.py backend/tests/test_client_id_middleware.py` — both pass (34 tests).