fix: CoreAudio device enumeration, input device visibility, AudioTab UX #91
rgr4y wants to merge 34 commits into jamiepine:main
- Update mlx-audio to 0.3.1 with tiktoken for whisper models
- Suppress upstream tokenizer warnings from mlx-audio/transformers
- Close audio player when starting new generation
- Add progress completion notifications for model downloads
Spawned child processes in frozen binaries inherit sys.argv and choke on the parent's --data-dir/--port arguments. freeze_support() is the standard fix for PyInstaller-bundled Python apps.
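A minimal sketch of the pattern this commit message describes. The `--data-dir` and `--port` flags come from the message itself; `parse_args`, `main`, and the defaults are illustrative assumptions, not the project's actual entry point. The key point is that `multiprocessing.freeze_support()` runs before any argument parsing:

```python
import argparse
import multiprocessing


def parse_args(argv=None):
    # Hypothetical parser: only the two flags named in the commit message.
    parser = argparse.ArgumentParser(prog="server")
    parser.add_argument("--data-dir", default=".")
    parser.add_argument("--port", type=int, default=17493)
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    return args.data_dir, args.port


if __name__ == "__main__":
    # Must come first: in a frozen binary, a spawned child is taken over by
    # multiprocessing here, before it can reach the parent's argument parser.
    # In a normal (non-frozen) run this call does nothing.
    multiprocessing.freeze_support()
    main()
```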
The (Recommended) label was specific to Mac16,10 hardware. Remove it to keep model selection neutral for upstream contribution.
Models that haven't been downloaded now show their size from the HF API. Results are cached in memory for the server lifetime.
Generation progress is reported as a percentage (0-100) via Server-Sent Events. The MLX backend reports per-chunk progress; PyTorch reports start/complete. Use ?stream=true on /generate to opt into async mode. Backward compatible: without the param, the endpoint blocks as before.
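As a rough illustration of consuming such a stream on the client side, the sketch below parses Server-Sent Events lines into progress values. The JSON payload shape (`{"progress": N}`) and both function names are assumptions for illustration; the commit message does not specify the event format.

```python
import json
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event."""
    buf: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buf:
                yield "\n".join(buf)
                buf = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one leading space after the colon.
            buf.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (starting with ":") and other fields are ignored here.


def progress_values(lines: Iterable[str]) -> List[int]:
    # Assumption: each payload is JSON with a "progress" key in the 0-100 range.
    return [json.loads(data)["progress"] for data in iter_sse_data(lines)]
```

An event that is still incomplete when the stream ends (no trailing blank line) is dropped, which matches how SSE clients are expected to treat truncated events.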
Rename the self-contained CLI script from `voicebox` to `voicebox-cli` to avoid ambiguity with the project directory name.
Replace the simple RMS-based normalize_audio with proper EBU R128 LUFS normalization using pyloudnorm. This matches the ffmpeg loudnorm filter (I=-16, TP=-2, LRA=11) but runs entirely in Python.

Changes:
- normalize_audio() now uses pyloudnorm for LUFS measurement and normalization with -16 LUFS target and -2 dBTP true-peak limiting
- Falls back to RMS normalization if pyloudnorm unavailable or audio too short (<0.4s)
- save_audio() now auto-normalizes by default (normalize=True)
- Generated audio is automatically normalized before saving
- validate_reference_audio() no longer rejects audio for "clipping" (peak > 0.99 was too aggressive). Instead, normalizes reference audio in-place on upload for consistent quality
- Backend callers pass sample_rate explicitly to normalize_audio()
- Add pyloudnorm>=0.1.0 to requirements.txt
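The gain arithmetic behind a -16 LUFS target with a -2 dB ceiling can be sketched as follows. This is an illustration only, not the PR's `normalize_audio()`: the loudness measurement itself (K-weighting and gating, which pyloudnorm performs) is taken as an input, the function name is hypothetical, and the ceiling is applied to the plain sample peak rather than an oversampled true peak.

```python
def loudness_gain(measured_lufs: float, peak: float,
                  target_lufs: float = -16.0, ceiling_db: float = -2.0) -> float:
    """Linear gain that reaches target_lufs without the peak passing ceiling_db."""
    # A loudness difference in LU maps to a linear amplitude factor of 10^(dB/20).
    gain = 10.0 ** ((target_lufs - measured_lufs) / 20.0)
    ceiling = 10.0 ** (ceiling_db / 20.0)
    # If the boosted peak would exceed the ceiling, back the gain off so the
    # peak lands exactly on it (the result is then quieter than the target).
    if peak > 0.0 and peak * gain > ceiling:
        gain = ceiling / peak
    return gain
```

For example, audio measured at -22 LUFS needs +6 dB, a factor of about 2.0; with a sample peak of 0.5 that boost would overshoot the ceiling, so the gain is reduced instead.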
Replace the ffmpeg-only normalize_audio_sample with a cascading approach:
1. pyloudnorm (EBU R128 LUFS, best quality, no binary dep)
2. ffmpeg fallback (if pyloudnorm unavailable)
3. stdlib wave peak-normalization (last resort, zero deps)

Simplify upload_profile_sample_with_retry: the backend now normalizes reference audio on upload instead of rejecting for clipping, so the elaborate 4-attempt attenuation cascade is replaced with a single retry.
The reference audio validator was too strict and user-hostile:
- Rejected audio at exactly 30.0s (off-by-one: > vs >=)
- Rejected any audio over 30s with no helpful guidance
- Rejected most real recordings for "clipping" (peak > 0.99)

Now it does the right thing automatically:
- Clips up to 45s are auto-trimmed to 30s (takes first 30s)
- Clips over 45s get a clear error with the auto-trim threshold
- Duration check is inclusive (30.0s is valid)
- Audio is loudness-normalized on upload instead of rejected
- Error messages include actual duration for debugging
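The duration rules above can be sketched as a small decision function. The thresholds are the ones stated in the commit message; the function name, constants, and error wording are hypothetical.

```python
MAX_DURATION_S = 30.0      # longest clip accepted as-is (inclusive)
AUTO_TRIM_LIMIT_S = 45.0   # longest clip that is trimmed rather than rejected


def plan_reference_clip(duration_s: float) -> float:
    """Return how many seconds of the clip to keep, or raise ValueError."""
    if duration_s <= MAX_DURATION_S:
        # Inclusive check: exactly 30.0s is valid.
        return duration_s
    if duration_s <= AUTO_TRIM_LIMIT_S:
        # Keep only the first 30 seconds.
        return MAX_DURATION_S
    raise ValueError(
        f"Reference audio is {duration_s:.1f}s long; clips up to "
        f"{AUTO_TRIM_LIMIT_S:.0f}s are auto-trimmed to {MAX_DURATION_S:.0f}s, "
        "longer clips must be shortened first."
    )
```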
- Set default status filter for /jobs endpoint (queued,generating,cancelling,complete)
- Fix immediate deletion: invalidate home-jobs query and remove from pending store on delete
- Add keyboard accessibility to history cards (Enter/Space to play)
- Fix all lint errors:
  - Add <title> elements to SVG icons in landing page
  - Use for...of instead of forEach to avoid callback return warnings
  - Add accessibility role attributes and keyboard handlers
  - Add biome-ignore comments for custom components requiring non-semantic elements
  - Improve button semantics in AudioSampleUpload (div -> button)
  - Add aria-labels to platform icons for screen readers
- Add copy button to History Card textarea with hover effect
- Add import warning modal before file picker (informs users about .voicebox.zip requirement)
- Fix force-cancel causing queue to get stuck (signal job worker after unload)
- Fix MLX backend attempting fallback when model is None after cancel
Previously markApiEmission() was called before fetch(), causing the health check to think the server was connected even when API calls were failing. This prevented the connection gate from showing when the server was down. Now markApiEmission() is only called after a successful response, allowing proper detection of server disconnection.

Minor changes CUDA=1 on .cuda .gitignore Docker simplify Tweaks
CUDA tweaks build with /opt to prevent interference tweaks
- Use transformers 5.0.0rc3 on Apple Silicon (required by MLX)
- Install MLX deps first, then filter transformers from requirements.txt
- Install qwen-tts from git with --no-deps to avoid version conflict
- Extract qwen-tts dependencies into requirements.txt
- Set Python version to 3.12.12 (compatible with all packages)
- Remove backend/.python-version (use root-level only)

Resolves dependency conflicts on macOS Apple Silicon while maintaining compatibility.
- serverless_handler.py: RunPod handler that embeds the FastAPI server in a background thread and proxies jobs as HTTP requests; binary audio responses are base64-encoded
- idle_timer.py: skip scheduling when timeout <= 0 (serverless mode)
- pytorch_backend/mlx_backend: read SERVERLESS env var to set idle timeouts to 0, keeping models loaded for the worker's lifetime
- Dockerfile: add SERVERLESS build arg + conditional final stage that bakes source in, disables healthcheck, and uses serverless entrypoint; COPY backend/ added so image works without volume mount
- requirements.txt: add runpod>=1.7.0
- scripts/serverless-build.sh: helper to build/push the serverless image
- SERVERLESS.md: deployment guide (endpoint settings, request format, billing rationale, limitations)
- .env: add RUNPOD_API_KEY placeholder
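A minimal sketch of the idle-timer behaviour described in this list, where a timeout of zero or less disables unloading altogether. The class, its method names, the default timeout, and the way the `SERVERLESS` variable is interpreted are all assumptions for illustration; the actual idle_timer.py may differ.

```python
import threading
from typing import Callable, Mapping, Optional


def idle_timeout_from_env(env: Mapping[str, str], default_s: float = 300.0) -> float:
    # Assumption: any value other than empty, "0" or "false" means serverless mode.
    flag = env.get("SERVERLESS", "").strip().lower()
    return 0.0 if flag not in ("", "0", "false") else default_s


class IdleTimer:
    """Calls on_idle after timeout_s seconds without a touch()."""

    def __init__(self, timeout_s: float, on_idle: Callable[[], None]):
        self.timeout_s = timeout_s
        self.on_idle = on_idle
        self._timer: Optional[threading.Timer] = None

    def touch(self) -> bool:
        """Restart the countdown; returns False when the timer is disabled."""
        self.cancel()
        if self.timeout_s <= 0:
            # Serverless mode: never schedule, so the model stays loaded.
            return False
        self._timer = threading.Timer(self.timeout_s, self.on_idle)
        self._timer.daemon = True
        self._timer.start()
        return True

    def cancel(self) -> None:
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
```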
- Serialize MLX model loads with _MLX_LOAD_LOCK to prevent Metal command buffer crashes from concurrent preload + job worker loads
- Switch MLX TTS models to 4-bit quantized Base variants (900MB vs 3.4GB bf16); fixes swap pressure on 16GB, enables real 0.6B model
- Suppress transformers verbosity at module import time to silence qwen3_tts/mistral-regex warnings cleanly
- Unpin transformers (>=4.57.3); qwen-tts works fine with 5.0.0rc3
- voicebox-cli: cmd_generate now uses async job queue + SSE progress bar instead of blocking sync POST; better connection error messages
- voicebox-cli: resolve_profile saves/restores last-used voice to config
- dev-backend-watch.sh: load .env from voicebox/ and ../
- LOG_LEVEL env var respected by uvicorn and Python logging
- Model status endpoint pulls IDs from backend map dynamically
- Resolve requirements.txt conflict: keep torchaudio, einops, gradio, onnxruntime, sox from main; keep runpod>=1.7.0 from serverless
- Remove tauri/src-tauri/gen/Assets.car from repo (compiled binary); add to .gitignore
- Fix progress toast showing stale "complete" immediately: clear progress entry from ProgressManager after mark_complete so future SSE subscribers don't see old state
- Emit status="loading" during cached model loads so UI shows a spinner
- Fix useGenerationForm: subscribe to progress when model not loaded (not just when not downloaded)
- Add "loading" status case to useModelDownloadToast
… progress
- Rewrite useModelDownloadToast to poll /models/progress-snapshot instead of SSE
- Fix HF_HUB_OFFLINE not taking effect (patch huggingface_hub.constants directly at runtime)
- Fix tqdm monkey-patch: patch .update() on hf_tqdm class directly instead of sys.modules replacement
- Fix start_download() to not overwrite existing entries (preserves asyncio_task for cancel)
- Add _is_model_cached guard before generation to prevent silent re-downloads
- Fix model size and language selects using defaultValue (uncontrolled) → value (controlled)
- Fix form.reset() wiping model size and language; preserve sticky fields across generations
- Add ModelManagement useEffect to clear local downloadingModel when server reports done
- Add voicebox-cli say usage help when no text provided
- Add useAudioDevices hook with devicechange listener and localStorage persistence
- Add test_tqdm_patch.py for tqdm patching verification
Audio device fixes:
- Replace cpal output enumeration with direct CoreAudio AudioObjectGetPropertyData calls on macOS; cpal misses HDMI/DisplayPort devices (known bug)
- Add list_input_devices() via CoreAudio on macOS, cpal fallback on Windows/Linux
- Expose default input device in AudioTab with 3s polling; toast on change
- Refresh button animates spinner and shows success toast

AudioTab UX:
- Optimistic updates on channel device toggles so checkmarks appear immediately
- Optimistic updates on setChannelVoices so assigned voices update without relaunch
- Add feature explanation subtitle: "Route different voices to dedicated speakers..."
- Note TODO in useStoryPlayback for per-voice device routing (not yet implemented)

Build info:
- Inject __GIT_HASH__ and __GIT_COMMIT_COUNT__ at build time via vite define
- Show shortened git hash + commit count next to version in App Updates
- Dev mode shows "dev-<hash>"

Production server pre-check:
- In production builds, health-check port 17493 before spawning sidecar
- If server already running, reuse it (skips redundant startup)

Bug fixes (pre-existing TS errors):
- useAutoUpdater dep array was garbled (isTauricheckOnMountcheckForUpdates)
- useAutoUpdater called with object instead of boolean in App.tsx
- useGenerationForm missing isQueueLimitReached return value
- StoryTrackEditor handleTimelineClick typed as HTMLDivElement, used on <button>
- Add input device selector to voice recording UI (ProfileForm + SampleUpload): shows a mic dropdown above the waveform when multiple input devices are available, re-acquires the preview stream when the selection changes
- Thread selected deviceId through useAudioRecording so actual recordings use the chosen mic (MediaTrackConstraints.deviceId exact)
- Hide Audio Channels tab on web (tauriOnly: true): device routing requires Tauri
- Log model size on MLX TTS/STT unload for easier memory debugging
Mac MLX generation went from 45s -> 5s on my benchmark! This is a HUGE commit - I've been making changes as I use it. I'm using the server portion as the backend for a webapp my friends and I use to make TTS's of each other.
Pull request overview
This pull request fixes CoreAudio device enumeration on macOS, adds input device visibility and monitoring, improves AudioTab UX with optimistic updates, displays git build information, pre-checks for running servers, and fixes several pre-existing TypeScript errors.
Changes:
- CoreAudio direct enumeration via AudioObjectGetPropertyData to capture HDMI/DisplayPort devices that cpal misses
- Input device listing and default input monitoring with 3-second polling and change notifications
- AudioTab optimistic updates for instant UI feedback on channel device toggles and voice assignments
- Git hash and commit count injected at build time and displayed in App Updates
- Production server health check before spawning sidecar to avoid duplicate server processes
- Pre-existing TypeScript errors fixed in useAutoUpdater, App.tsx, useGenerationForm, and StoryTrackEditor
Reviewed changes
Copilot reviewed 82 out of 88 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tauri/src-tauri/src/audio_output.rs | Added CoreAudio FFI bindings for direct device enumeration on macOS, input device listing |
| tauri/vite.config.ts | Added git hash and commit count extraction for build info |
| tauri/src/platform/audio.ts | Added listInputDevices method to Tauri audio platform |
| app/src/components/AudioTab/AudioTab.tsx | Added default input display, refresh button, optimistic updates for channel operations |
| app/src/hooks/useAutoUpdater.ts | Fixed garbled dependency array |
| app/src/App.tsx | Added server pre-check, fixed wrong call signature |
| app/src/lib/hooks/useGenerationForm.ts | Added isQueueLimitReached return value |
| backend/backends/pytorch_backend.py | Added idle timer, improved caching detection, better logging |
| backend/backends/mlx_backend.py | Added load lock, idle timer, improved error handling and progress reporting |
| backend/database.py | Added generation metadata columns, job queue tables, migrations |
| backend/utils/* | New utility modules for idle timers, HF size queries, progress tracking improvements |
| Dockerfile, docker-compose.yml | Added Docker support with CUDA/CPU variants and serverless mode |
| scripts/* | Added serverless build script, backend watch script, Linux setup script |
            if column_name not in columns:
                logger.info(f"Migrating generation_jobs: adding {column_name} column")
                with engine.connect() as conn:
                    conn.execute(text(f"ALTER TABLE generation_jobs ADD COLUMN {column_name} {column_type}"))
                    conn.commit()
                logger.info(f"Added {column_name} column to generation_jobs")

    # Migration: add generation metadata columns
    if 'generations' in inspector.get_table_names():
        columns = {col['name'] for col in inspector.get_columns('generations')}
        generation_additions = [
            ('generation_time_seconds', 'FLOAT'),
            ('model_size', 'VARCHAR'),
            ('backend_type', 'VARCHAR'),
            ('request_user_id', 'VARCHAR'),
            ('request_user_first_name', 'VARCHAR'),
            ('request_ip', 'VARCHAR'),
            ('deleted_at', 'DATETIME'),
        ]
        for column_name, column_type in generation_additions:
            if column_name not in columns:
                logger.info(f"Migrating generations: adding {column_name} column")
                with engine.connect() as conn:
                    conn.execute(text(f"ALTER TABLE generations ADD COLUMN {column_name} {column_type}"))
                    conn.commit()
SQL injection vulnerability: The database migration code uses string formatting to construct SQL statements with column names from variables. While these are currently from trusted sources, this pattern is dangerous. Use parameterized queries or whitelist the allowed column names to prevent potential SQL injection if this code is modified in the future.
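One way to act on this comment is sketched below. Bound parameters cannot stand in for identifiers in DDL, so the practical option is the allowlist: validate the table name and look the column up in a fixed mapping before building the statement. The sketch uses the stdlib `sqlite3` module rather than the project's SQLAlchemy engine, and `ALLOWED_COLUMNS` and `add_column_ddl` are hypothetical names with only a subset of the PR's columns.

```python
import re
import sqlite3

# Allowlist of columns the migration may add, with their fixed types.
ALLOWED_COLUMNS = {
    "generation_time_seconds": "FLOAT",
    "model_size": "VARCHAR",
    "deleted_at": "DATETIME",
}
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def add_column_ddl(table: str, column: str) -> str:
    """Build an ALTER TABLE statement from validated identifiers only."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"column not in allowlist: {column!r}")
    # The type comes from the allowlist, never from the caller.
    return f'ALTER TABLE "{table}" ADD COLUMN "{column}" {ALLOWED_COLUMNS[column]}'


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE generations (id INTEGER PRIMARY KEY)")
    conn.execute(add_column_ddl("generations", "model_size"))
```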
    function getGitCommitCount(): number {
      try {
        return parseInt(execSync('git rev-list --count HEAD', { encoding: 'utf8' }).trim(), 10);
      } catch {
        return 0;
      }
Missing validation: The getGitCommitCount function parses the output of git rev-list as an integer without validating the format. If git returns unexpected output, parseInt could return NaN which will break the build info display. Add validation and return 0 if parsing fails.
    fn device_name(id: AudioDeviceID) -> Option<String> {
        unsafe {
            let addr = AudioObjectPropertyAddress {
                mSelector: kAudioDevicePropertyDeviceNameCFString,
                mScope: kAudioObjectPropertyScopeGlobal,
                mElement: kAudioObjectPropertyElementMaster,
            };
            let mut cf_str: CFStringRef = std::ptr::null();
            let mut size = mem::size_of::<CFStringRef>() as u32;
            if AudioObjectGetPropertyData(
                id,
                &addr,
                0,
                std::ptr::null(),
                &mut size,
                &mut cf_str as *mut _ as *mut _,
            ) != 0 { return None; }
            if cf_str.is_null() { return None; }
            let ptr = CFStringGetCStringPtr(cf_str, kCFStringEncodingUTF8);
            if ptr.is_null() { return None; }
            Some(CStr::from_ptr(ptr).to_string_lossy().into_owned())
        }
Memory leak: CFStringRef returned from AudioObjectGetPropertyData is not released. CoreFoundation strings need to be explicitly released with CFRelease when done to avoid memory leaks. Add CFRelease(cf_str) before returning.
            if AudioObjectGetPropertyDataSize(
                kAudioObjectSystemObject,
                &addr,
                0,
                std::ptr::null(),
                &mut size,
            ) != 0 { return vec![]; }

            let count = size as usize / mem::size_of::<AudioDeviceID>();
            let mut ids: Vec<AudioDeviceID> = vec![0u32; count];
            if AudioObjectGetPropertyData(
                kAudioObjectSystemObject,
                &addr,
                0,
                std::ptr::null(),
                &mut size,
                ids.as_mut_ptr() as *mut _,
            ) != 0 { return vec![]; }
Missing error handling: AudioObjectGetPropertyData return values are checked for != 0 but the error codes are not logged or handled meaningfully. Consider logging the actual CoreAudio error codes to aid debugging when device enumeration fails.
                let dev_id = format!("device_{}", name.replace(' ', "_").to_lowercase());
                let is_default = default_id.map_or(false, |d| d == id);
                Some(CoreAudioDevice { id: dev_id, name, is_default })
            })
            .collect()
    }

    pub fn list_input_devices() -> Vec<CoreAudioDevice> {
        let default_id = default_device_id(kAudioHardwarePropertyDefaultInputDevice);
        get_all_device_ids()
            .into_iter()
            .filter(|&id| has_streams(id, kAudioObjectPropertyScopeInput))
            .filter_map(|id| {
                let name = device_name(id)?;
                let dev_id = format!("input_{}", name.replace(' ', "_").to_lowercase());
                let is_default = default_id.map_or(false, |d| d == id);
                Some(CoreAudioDevice { id: dev_id, name, is_default })
Inconsistent device ID format: Output devices use "device_" prefix while input devices use "input_" prefix. This makes it impossible to distinguish device types solely from the ID structure when both are stored together. Consider using a more structured format like "output:{name}" and "input:{name}" for clarity.
    // Check if a server is already running before trying to start one.
    // This handles the case where a dev server (or a previous instance) is
    // already listening — we can skip the sidecar startup entirely.
    const tryExistingServer = async (): Promise<boolean> => {
      try {
        const res = await fetch(`${SERVER_URL}/health`, { signal: AbortSignal.timeout(1500) });
        if (res.ok) {
          console.log('Production mode: Found server already running, reusing it.');
          useServerStore.getState().setServerUrl(SERVER_URL);
          setServerReady(true);
          window.__voiceboxServerStartedByApp = false;
          return true;
        }
      } catch {
        // Not running — fall through to sidecar startup
      }
Race condition: The server health check uses a 1.5 second timeout but immediately starts the sidecar if the check fails. If the existing server is slow to respond (but still running), this will attempt to start a duplicate server which will fail due to port binding. Consider increasing the timeout to 3-5 seconds or retrying the health check before starting the sidecar.
    const HEALTHCHECK_TIMEOUT_MS = 1500;
    const HEALTHCHECK_MAX_ATTEMPTS = 3;
    const HEALTHCHECK_RETRY_DELAY_MS = 500;
    const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));
    // Check if a server is already running before trying to start one.
    // This handles the case where a dev server (or a previous instance) is
    // already listening — we can skip the sidecar startup entirely.
    const tryExistingServer = async (): Promise<boolean> => {
      for (let attempt = 1; attempt <= HEALTHCHECK_MAX_ATTEMPTS; attempt++) {
        try {
          const res = await fetch(`${SERVER_URL}/health`, {
            signal: AbortSignal.timeout(HEALTHCHECK_TIMEOUT_MS),
          });
          if (res.ok) {
            console.log('Production mode: Found server already running, reusing it.');
            useServerStore.getState().setServerUrl(SERVER_URL);
            setServerReady(true);
            window.__voiceboxServerStartedByApp = false;
            return true;
          }
        } catch {
          // Not running or slow to respond on this attempt — retry if attempts remain.
        }
        if (attempt < HEALTHCHECK_MAX_ATTEMPTS) {
          await sleep(HEALTHCHECK_RETRY_DELAY_MS);
        }
      }
- Catch httpx.RequestError (not just ConnectError) in _wait_for_server so transient timeouts don't abort the startup polling loop
- Clear _server_ready before (re)starting server thread to prevent stale ready state if uvicorn dies and restarts
- Remove duplicate database.init_db() in _start_server; app startup hook handles DB init
- Move `import os` to top of mlx_backend.py and pytorch_backend.py
- Fix script name in serverless-build.sh usage comments (dash not underscore)
- Guard --tag arg: print clear error if value is missing
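The first item can be illustrated with a stdlib analogue. In httpx, `RequestError` is the base class of both `ConnectError` and the timeout exceptions, so catching it keeps a polling loop alive across either kind of transient failure. The sketch below shows the same idea with `OSError`, the stdlib base class of `ConnectionRefusedError` and `TimeoutError`. The `probe` callable, the function name, and the defaults are hypothetical, not the PR's `_wait_for_server`.

```python
import time
from typing import Callable


def wait_for_server(probe: Callable[[], bool], attempts: int = 30,
                    delay_s: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll probe() until it returns True or the attempts run out."""
    for attempt in range(attempts):
        try:
            if probe():
                return True
        except OSError:
            # Catching the base class covers refused connections *and*
            # timeouts; catching only ConnectionRefusedError would let a
            # single slow response abort the whole loop.
            pass
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

Passing `sleep` as a parameter keeps the loop testable without real delays.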
…ringPtr null
- Delete app/src/hooks/useAutoUpdater.ts (dead code; .tsx takes module resolution precedence and has the full implementation)
- Fix malformed dependency array in useAutoUpdater.tsx line 76: [platform.metadata.isTauricheckOnMountcheckForUpdates] → [platform.metadata.isTauri, checkOnMount, checkForUpdates]
- Fix CFStringGetCStringPtr null dereference in audio_output.rs: the CoreAudio API may return NULL from CFStringGetCStringPtr even for valid strings when internal storage isn't UTF-8; fall back to CFStringGetCString with a stack buffer
…t lint hook
- Remove unused imports and fix ruff lint issues in backend/main.py:
StaticFiles, os (x2), f-strings without placeholders, config loop var
shadowing module import, unused stt_size variable
- Inline MLX default model ID constant instead of instantiating MLXTTSBackend
- Fix useAutoUpdater call in App.tsx: pass { checkOnMount: true, showToast: true }
so toast notifications actually fire (boolean API always set showToast=false)
- Add .githooks/pre-commit: ruff for Python, biome for JS/TS staged files
Enable with: git config core.hooksPath .githooks
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…utput.rs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ne#88, jamiepine#91, jamiepine#95, jamiepine#97) Co-authored-by: Patrick-DE <14962702+Patrick-DE@users.noreply.github.com>
Summary
- CoreAudio output enumeration: `output_devices()` with `AudioObjectGetPropertyData` calls on macOS; cpal misses HDMI/DisplayPort devices (known upstream bug). Full device list now matches Audio MIDI Setup.
- Input devices: `list_input_devices()` with CoreAudio on macOS (cpal fallback on Windows/Linux). AudioTab shows default input device with 3s polling and toasts when it changes.
- Build info: shortened git hash and commit count shown next to the version in App Updates (v0.1.12 abc1234 #264).
- Pre-existing TypeScript errors: garbled dependency array in `useAutoUpdater`, wrong call signature in `App.tsx`, missing `isQueueLimitReached` return from `useGenerationForm`, wrong mouse event type in `StoryTrackEditor`.
- TODO in `useStoryPlayback` marking where per-voice device routing needs to be wired up for Stories mode.

Test plan
- App Updates shows the version with git hash and commit count (v0.1.12 abc1234 #264)
- `tsc --noEmit` passes with zero errors

🤖 Generated with Claude Code