CUDA GPU Support - External Provider Binaries#33
Conversation
- Added support for TTS providers in the backend, including endpoints for listing, starting, stopping, and downloading providers.
- Enhanced the release workflow to build and upload TTS provider binaries for both Windows and Linux platforms.
- Updated the architecture documentation to reflect the new provider system and its benefits for modularity and user experience.
- Introduced a new `ProviderSettings` component in the frontend for managing provider configurations.

…ild scripts
- Introduced a new attribute `_current_model_size` in `LocalProvider` to store the current model size, allowing for dynamic configuration during generation.
- Updated the `generate` method to use the current model size instead of a hardcoded value.
- Modified the `load_model` method to track the requested model size.
- Removed platform-specific extension handling from the build scripts for both CPU and CUDA providers to streamline the build process.

…om 0.1.0 to 0.0.5

- Updated the release workflow to include a new configuration for the Ubuntu 22.04 platform without TTS bundled.
- Added the @radix-ui/react-radio-group dependency to package.json.
- Implemented a new RadioGroup component for better UI handling of radio inputs.

- Commented out the PyTorch CPU configuration in the release workflow for Ubuntu 22.04.
- Updated TTS provider documentation to clarify options for Windows and Linux users.
- Enhanced build scripts for both CPU and CUDA providers by excluding large unused modules to reduce binary size.

- Introduced a new `linux.rs` module for audio capture, indicating that audio capture is not supported on Linux at this time.
- Updated `mod.rs` to include the Linux module conditionally based on the target OS.

- Renamed `load_model` to `load_model_async` in TTS provider classes for clarity and consistency.
- Added a compatibility alias for `load_model` to maintain existing functionality.
- Enhanced `get_model_status` to handle both synchronous and asynchronous check functions.
- Updated version numbers in `bun.lock` and `Cargo.lock` to 0.1.12, reflecting recent changes.

- Added macOS support for PyTorch CPU providers in the release workflow.
- Updated the ProviderSettings component to handle macOS-specific conditions and improve UI interactions.
- Refactored the radio group component styles for better accessibility and visual consistency.
- Improved provider management logic to ensure proper handling of available providers across different platforms.

- Renamed `bundled-mlx` to `apple-mlx` for clarity in provider types.
- Updated the ProviderSettings component to reflect the new provider naming.
- Improved logging for provider startup and error handling in the backend.
- Added scripts for building and installing PyTorch CPU and CUDA providers locally.
- Enhanced the documentation to include details on TTS provider architecture and development setup.

- Updated the `ProviderSettings` component to log the current active provider.
- Changed the provider health status to use specific names for MLX and PyTorch backends.
- Removed unnecessary exclusions from the build scripts for both PyTorch CPU and CUDA providers.
- Ensured consistency in the `.spec` files for PyTorch providers by aligning exclusion lists.
@deathreaperz you can test GPU easily by cloning the repo (on main, not this PR) and running the server with `bun run dev:server --port 17493`. Make sure the production server is not running in Task Manager when you do this — others are using this method to get GPU on Windows and Linux right now. Opening the Voicebox app will then connect to the dev server instead of the bundled CPU version.
- Clarified the bundling of PyTorch CPU providers for Windows and macOS Intel builds in documentation.
- Improved handling of platform-specific dependencies in the build process, including asyncio support for PyInstaller.
- Updated backend logic to gracefully handle missing dependencies and provide clearer error messages.
- Enhanced progress management to ensure compatibility with PyInstaller's async handling.
- Removed unnecessary exclusions from the build scripts for PyTorch providers to streamline the build process.

- Changed the backend setting for Windows from "none" to "pytorch" to ensure compatibility with bundled PyTorch CPU providers.
- Updated comments for clarity regarding the Windows setup.

- Included a conditional installation of CPU-only PyTorch packages for Ubuntu 22.04 to reduce unnecessary CUDA dependencies.
- Updated the release workflow to ensure compatibility with CPU-focused builds on Linux.

- Removed `lucide-react` version 0.454.0 and downgraded to version 0.316.0 in `bun.lock`.
- Added `@tailwindcss/vite` and `tailwindcss` as development dependencies in `package.json`.
- Updated Vite configuration to include the Tailwind CSS plugin.
- Set the HTML document to use a dark theme by adding the `class="dark"` attribute to the `<html>` tag.

- Introduced Docker support with CPU-only and GPU-enabled configurations via Dockerfiles and docker-compose files.
- Added a .dockerignore file to exclude unnecessary files from Docker images.
- Updated bun.lock and package.json to include new dependencies for icon handling.
- Enhanced README with Docker usage instructions and deployment options.
- Refactored components to utilize new icon libraries for improved UI consistency.

- Added functionality to create log files for provider output, improving debugging on Windows.
- Updated error handling to read from log files instead of using subprocess output directly.
- Enhanced logging messages to include log file locations for easier troubleshooting.

- Revised README to include links for downloading the latest releases for macOS, Windows, and Linux.
- Added detailed instructions for running Voicebox with Docker, including Docker Compose usage.
- Updated installation documentation to reflect Linux availability and provide specific download options for AppImage and Deb packages.
- Enhanced clarity in Docker documentation regarding accessing the web UI.

- Added support for packaging provider archives in the release workflow, creating platform-specific zip and tar.gz files for distribution.
- Updated the `.gitignore` to exclude `.spec` files.
- Introduced a new `CudaDownloadSection` component to manage CUDA downloads, including progress tracking and error handling.
- Refactored provider download logic to handle archive extraction and cleanup after download.
- Improved subprocess output handling in the provider manager for better logging and error reporting.

…e data
- Added environment variables to Dockerfiles to set non-interactive mode and configure the timezone to UTC.
- Included installation of `tzdata` in both Dockerfiles to support timezone configuration during the build process.

… interactions
- Introduced a loading state to indicate when a provider is starting, enhancing user experience.
- Disabled radio buttons and action buttons during the loading state to prevent user interaction.
- Updated UI elements to reflect the loading state, including a spinner and appropriate cursor styles.
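The loading-state logic described in that commit can be sketched as a single pure function: while a provider is starting, every control derives its disabled/spinner state from the same flag, so no button can fire an overlapping start/stop request. The names below are illustrative, not the component's real API.

```typescript
// Hypothetical derived UI state for ProviderSettings while a provider starts.
// A single `starting` flag drives every control, so the radios, the action
// buttons, and the spinner can never disagree with each other.
interface ProviderUiState {
  starting: boolean;
  radiosDisabled: boolean;   // radio inputs locked during startup
  actionsDisabled: boolean;  // start/stop/download buttons locked
  showSpinner: boolean;      // spinner shown while starting
}

export function uiStateFor(starting: boolean): ProviderUiState {
  return {
    starting,
    radiosDisabled: starting,
    actionsDisabled: starting,
    showSpinner: starting,
  };
}
```

Deriving the state in one place (rather than toggling each control separately) is what prevents the "button still clickable while the spinner shows" class of bug.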
Cursor Bugbot has reviewed your changes and found 5 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
```ts
queryFn: () => apiClient.getCudaStatus(),
enabled: isWindows && platform.metadata.isTauri && gpuAvailable,
retry: false,
});
```
React hooks called after early return statement
High Severity
The component has useQuery (line 33) and useEffect (line 58) called after an early return statement (lines 28-30). This violates React's Rules of Hooks which require hooks to be called unconditionally and in the same order on every render. When the early return condition is met, fewer hooks are called, causing React to throw a "Rendered fewer hooks than expected" error.
Additional Locations (1)
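The failure mode Bugbot describes can be demonstrated without React itself: React tracks hooks by call order per render, and a render that calls fewer hooks than the committed render throws. The sketch below is a minimal simulation of that bookkeeping (not React's real internals), with a buggy component shape matching the review and the fixed shape where all hooks run before any early return.

```typescript
// Minimal simulation of React's per-render hook bookkeeping, showing why an
// early return placed before a hook call throws "Rendered fewer hooks than
// expected". This is an illustration, not React's actual implementation.
class HookStore {
  private cells: { value: unknown }[] = [];
  private cursor = 0;
  private committedCount: number | null = null;

  useHook<T>(initial: T): T {
    if (this.cursor >= this.cells.length) this.cells.push({ value: initial });
    return this.cells[this.cursor++].value as T;
  }

  render<T>(component: () => T): T {
    this.cursor = 0;
    const out = component();
    // React compares this render's hook count against the committed render.
    if (this.committedCount !== null && this.cursor < this.committedCount) {
      throw new Error("Rendered fewer hooks than expected.");
    }
    this.committedCount = this.cursor;
    return out;
  }
}

// Buggy shape: the second hook is skipped when the early-return branch runs.
function buggy(store: HookStore, isTauri: boolean): string {
  const platform = store.useHook("windows");
  if (!isTauri) return "unsupported";       // early return BEFORE the next hook
  const cudaStatus = store.useHook("idle"); // skipped on the early-return path
  return `${platform}:${cudaStatus}`;
}

// Fixed shape: every hook runs unconditionally; branching happens afterwards.
function fixed(store: HookStore, isTauri: boolean): string {
  const platform = store.useHook("windows");
  const cudaStatus = store.useHook("idle"); // always called, same order
  if (!isTauri) return "unsupported";
  return `${platform}:${cudaStatus}`;
}
```

The practical fix for the flagged component is the `fixed` shape: move the `useQuery`/`useEffect` calls above the early return (relying on the query's `enabled` option to keep it inert), then branch on the result.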
```ts
// Determine current active provider
const currentProvider = activeProvider?.provider;
console.log('currentProvider', currentProvider);
const selectedProvider = currentProvider as ProviderType;
```
Debug console.log statement left in production code
Low Severity
A console.log('currentProvider', currentProvider) statement is present in the ProviderSettings component. This debugging output will appear in production browser consoles, which is unprofessional and could leak implementation details to users inspecting the console.
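One common remedy, besides deleting the line outright, is routing debug output through a logger that is a no-op outside development builds. The helper below is a hypothetical sketch (the `isDev` flag stands in for whatever build-mode signal the app exposes, e.g. Vite's `import.meta.env.DEV`):

```typescript
// Hypothetical dev-only logger: debug output is suppressed in production
// instead of being deleted by hand before each release. Returns whether it
// logged, purely so the behavior is easy to assert in a test.
export function debugLog(isDev: boolean, ...args: unknown[]): boolean {
  if (!isDev) return false; // production: silent no-op
  console.log(...args);
  return true;
}

// The flagged line in ProviderSettings would then become something like:
//   debugLog(import.meta.env.DEV, "currentProvider", currentProvider);
```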
…tainability
- Removed unnecessary line breaks in the rendering of radio buttons and labels for a cleaner code structure.
- Added missing imports for numpy and scipy in build_binary.py to ensure proper functionality.
- Updated binary Assets.car file to reflect recent changes.

…r paths
- Introduced a new DataFolders component to display and manage paths for application data, models, and providers.
- Implemented a FolderRow component for individual folder display, including loading states and open folder functionality.
- Added API client methods and hooks to fetch system folder paths from the backend.
- Updated ServerTab to include the new DataFolders component, enhancing server settings management.

- Replaced python3-pip with python3.12-venv in the installation process.
- Added commands to ensure pip is upgraded and installed after setting Python 3.12 as the default.
- Streamlined the installation of PyTorch by removing the redundant pip upgrade command.

- Enhanced text descriptions for TTS providers, clarifying the functionality of PyTorch CUDA and Apple MLX.
- Removed outdated references to availability and added notes regarding model version differences.
- Improved UI structure by ensuring consistent labeling and disabling options based on system compatibility.

- Added platform detection for macOS and Windows to improve user experience.
- Updated UI to conditionally disable options and provide clearer guidance based on installed providers.
- Refactored button states and labels for PyTorch CUDA and CPU to reflect availability and download status accurately.
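The platform-aware button logic those commits describe can be condensed into a small pure function. This is an illustrative sketch, not the component's real code — the labels and the `Platform` type are assumptions:

```typescript
// Illustrative availability logic: what the PyTorch CUDA control should show
// on each platform. CUDA is never offered on macOS (Apple MLX covers it);
// elsewhere the label tracks installed/downloading state.
type Platform = "macos" | "windows" | "linux";

export function cudaButtonLabel(
  platform: Platform,
  installed: boolean,
  downloading: boolean,
): string {
  if (platform === "macos") return "Unavailable on macOS"; // MLX path instead
  if (installed) return "Installed";
  return downloading ? "Downloading…" : "Download";
}
```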
… UI consistency
- Removed unnecessary class from the FloatingGenerateBox button for cleaner styling.
- Updated HistoryTable to remove a redundant class from the delete button.
- Reorganized imports in dropdown-menu for better readability.
- Enhanced useAutoUpdater hook with improved dependency management and updated icon usage.
- Deleted the obsolete useAutoUpdater.ts file to streamline the codebase.

- Introduced useMemo to calculate bottom padding based on the visibility of the StoryTrackEditor and FloatingGenerateBox.
- Updated imports to include the useStory hook for fetching the current story data.
- Adjusted the StoryList layout to accommodate the new padding logic, improving UI consistency.



Overview
Works around GitHub's 2 GB release-asset limit by splitting the TTS providers into separately downloadable binaries served from downloads.voicebox.sh. The desktop app can now dynamically download and switch between PyTorch CPU, PyTorch CUDA, and future provider types (OpenAI, custom servers).
Problem
The CUDA backend binary is ~2.4GB, which exceeds GitHub's 2GB release asset limit. This blocked GPU users from getting official releases and forced everyone to re-download massive binaries for every app update.
Reference: https://github.com/jamiepine/voicebox/issues/[issue-number]
Solution
Split the monolithic backend into modular components:
Main App (~150MB Win/Linux, ~300MB macOS with MLX)
External Providers (downloadable on-demand)
Providers are downloaded from Cloudflare R2 at downloads.voicebox.sh and run as standalone HTTP servers that communicate with the main app.
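Since each provider runs as a standalone HTTP server, the main app needs to wait for it to come up after launch before routing generation requests to it. A minimal sketch of that handshake, assuming a `/health` endpoint (the real client lives in `backend/providers/local.py`; the endpoint name and retry count here are illustrative):

```typescript
// Sketch of the main app polling an external provider server until it is
// ready. The fetch function is injected so the loop can be tested with a
// stub; a real implementation would also sleep between attempts.
type Fetch = (url: string) => Promise<{ ok: boolean }>;

export async function waitForProvider(
  baseUrl: string,
  fetchFn: Fetch,
  attempts = 5,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetchFn(`${baseUrl}/health`); // hypothetical endpoint
      if (res.ok) return true;                        // provider is serving
    } catch {
      // server not listening yet; retry
    }
  }
  return false; // caller surfaces a startup error to the UI
}
```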
Architecture
Backend:
backend/providers/ - Provider management system
- base.py - TTSProvider protocol
- bundled.py - Uses bundled MLX/PyTorch (macOS/fallback)
- local.py - HTTP client for external provider servers
- installer.py - Downloads providers from R2
- __init__.py - ProviderManager lifecycle

Provider Binaries:
- providers/pytorch-cpu/ - Standalone CPU inference server
- providers/pytorch-cuda/ - Standalone GPU inference server

Frontend:
- ProviderSettings.tsx - UI for downloading/managing providers

CI/CD:
- build-providers (uploads to R2) and release (main app)

Changes
Backend
- GET /providers - List all providers
- GET /providers/installed - List installed
- GET /providers/active - Get active provider info
- POST /providers/start - Start a provider
- POST /providers/download - Download a provider
- DELETE /providers/{type} - Delete a provider

Frontend
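On the frontend side, the endpoints above map naturally onto a thin client wrapper. The paths come from this PR, but the method names, the `http` transport shape, and the factory below are assumptions, not the app's real `apiClient`:

```typescript
// Hypothetical thin client over the provider endpoints. The transport is
// injected, which keeps the path/method mapping trivially testable.
type Json = Record<string, unknown>;
type HttpFn = (method: string, path: string, body?: Json) => Promise<Json>;

export function makeProviderClient(http: HttpFn) {
  return {
    list: () => http("GET", "/providers"),
    listInstalled: () => http("GET", "/providers/installed"),
    getActive: () => http("GET", "/providers/active"),
    start: (type: string) => http("POST", "/providers/start", { type }),
    download: (type: string) => http("POST", "/providers/download", { type }),
    remove: (type: string) => http("DELETE", `/providers/${type}`),
  };
}
```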
CI/CD
Status
Implemented:
Not Yet Implemented:
Note
High Risk
Changes core inference packaging/runtime selection (provider start/download) and release/distribution pipelines (R2 uploads + Docker publishing), so misconfiguration can break voice generation or shipping artifacts across platforms.
Overview
Introduces a provider-based TTS architecture so large PyTorch CPU/CUDA backends can be downloaded and started separately instead of being shipped inside every desktop release, including new backend provider endpoints and a new Server Settings UI (ProviderSettings) for install/start/delete with progress tracking.

Updates the GitHub release workflow to build provider binaries across OSes and upload them to R2 (with the main app no longer bundling TTS on Linux), and adds CPU/CUDA Docker images plus .dockerignore/README docs for container deployment.

Also bumps the app version to 0.1.13, tightens backend import errors when bundled ML dependencies are missing, and does a broad UI icon/loader refresh plus a couple of small TS correctness fixes.

Written by Cursor Bugbot for commit af7e981.