CUDA GPU Support - External Provider Binaries #33

Open
jamiepine wants to merge 34 commits into main from external-provider-binaries

Conversation

@jamiepine (Owner) commented Feb 1, 2026

Overview

Solves the 2GB GitHub release limit by splitting TTS providers into separate downloadable binaries served from downloads.voicebox.sh. The desktop app can now dynamically download and switch between PyTorch CPU, PyTorch CUDA, and future provider types (OpenAI, custom servers).

[Screenshot omitted]

Problem

The CUDA backend binary is ~2.4GB, which exceeds GitHub's 2GB release asset limit. This blocked GPU users from getting official releases and forced everyone to re-download massive binaries for every app update.

Reference: https://github.com/jamiepine/voicebox/issues/[issue-number]

Solution

Split the monolithic backend into modular components:

  1. Main App (~150MB Win/Linux, ~300MB macOS with MLX)

    • Tauri + React UI
    • FastAPI backend without PyTorch
    • Whisper (bundled, ~50MB)
    • macOS includes MLX for out-of-the-box functionality
  2. External Providers (downloadable on-demand)

    • PyTorch CPU (~300MB)
    • PyTorch CUDA (~2.4GB, NVIDIA GPU)
    • Future: OpenAI API, custom remote servers

Providers are downloaded from Cloudflare R2 at downloads.voicebox.sh and run as standalone HTTP servers that communicate with the main app.
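
The request/response shape between the main app and a provider server is not spelled out here, so the following is a minimal sketch of what that HTTP handshake could look like. The port, endpoint paths, and payload fields (`/generate`, `{"text": ...}`) are assumptions for illustration, not the actual Voicebox API.

```python
import json
import urllib.request

PROVIDER_URL = "http://127.0.0.1:17493"  # assumed local port; the real one may differ

def build_generate_request(text: str, base_url: str = PROVIDER_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to the provider's assumed /generate endpoint."""
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def call_provider(text: str, base_url: str = PROVIDER_URL) -> bytes:
    """Send the request and return the provider's raw audio bytes."""
    with urllib.request.urlopen(build_generate_request(text, base_url), timeout=120) as resp:
        return resp.read()
```

Keeping request construction separate from I/O makes the client testable without a running provider process.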

Architecture

Backend:

  • backend/providers/ - Provider management system
    • base.py - TTSProvider protocol
    • bundled.py - Uses bundled MLX/PyTorch (macOS/fallback)
    • local.py - HTTP client for external provider servers
    • installer.py - Downloads providers from R2
    • __init__.py - ProviderManager lifecycle
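
As a rough sketch of what the `TTSProvider` protocol in `base.py` might look like, the snippet below uses `typing.Protocol` so `bundled.py` and `local.py` can each satisfy the same interface. The method names and signatures are assumptions based on the PR description, not the actual code.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class TTSProvider(Protocol):
    """Assumed shape of the provider interface; names are illustrative."""
    name: str

    def is_available(self) -> bool: ...
    def load_model(self, size: str) -> None: ...
    def generate(self, text: str, voice: str) -> bytes: ...

class DummyProvider:
    """Trivial conforming implementation, only to show the protocol's shape."""
    name = "dummy"

    def is_available(self) -> bool:
        return True

    def load_model(self, size: str) -> None:
        self._size = size  # remember the requested model size

    def generate(self, text: str, voice: str) -> bytes:
        return b"\x00" * len(text)  # placeholder "audio"
```

A structural protocol lets the `ProviderManager` swap bundled and external backends without inheritance.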

Provider Binaries:

  • providers/pytorch-cpu/ - Standalone CPU inference server
  • providers/pytorch-cuda/ - Standalone GPU inference server
  • Each includes FastAPI server + TTS backend
  • Built via PyInstaller, uploaded to R2 in CI

Frontend:

  • ProviderSettings.tsx - UI for downloading/managing providers
  • Shows installed providers, download buttons with progress
  • Start/stop/delete controls

CI/CD:

  • Separate jobs: build-providers (uploads to R2) and release (main app)
  • Main app binaries stay under 2GB and upload to GitHub releases
  • Provider binaries bypass GitHub entirely

Changes

Backend

  • Provider abstraction layer with pluggable backends
  • HTTP-based communication between app and external providers
  • Progress tracking for provider downloads (SSE)
  • New API endpoints:
    • GET /providers - List all providers
    • GET /providers/installed - List installed
    • GET /providers/active - Get active provider info
    • POST /providers/start - Start a provider
    • POST /providers/download - Download a provider
    • DELETE /providers/{type} - Delete a provider
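
The PR mentions SSE-based progress tracking for `POST /providers/download`. As a hedged sketch (the event fields `downloaded`/`total`/`percent` are assumptions), a download endpoint could stream progress frames like this:

```python
import json
from typing import Iterator

def sse_progress_events(total_bytes: int, chunk_size: int) -> Iterator[str]:
    """Yield Server-Sent Event frames reporting download progress.

    Field names are illustrative; the real endpoint may use a different schema.
    """
    done = 0
    while done < total_bytes:
        done = min(done + chunk_size, total_bytes)
        payload = {
            "downloaded": done,
            "total": total_bytes,
            "percent": round(100 * done / total_bytes, 1),
        }
        # SSE frames are "data: <json>" followed by a blank line
        yield f"data: {json.dumps(payload)}\n\n"
```

The frontend's progress toasts would then parse each `data:` frame as it arrives.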

Frontend

  • Provider settings UI with download management
  • Progress toasts for downloads
  • Platform detection (hides CUDA on macOS, shows MLX bundled message)
  • Placeholders for OpenAI and custom server support
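
The platform-gating rule above (CUDA hidden on macOS, MLX bundled there) can be sketched as a small pure function. The provider names mirror this PR's types; the function itself is illustrative, not the actual implementation.

```python
import platform
from typing import List, Optional

def available_providers(system: Optional[str] = None, has_nvidia: bool = False) -> List[str]:
    """Return the provider types the settings UI could offer on a given OS."""
    system = system or platform.system()
    if system == "Darwin":
        # CUDA is hidden on macOS; MLX ships bundled with the app
        return ["apple-mlx", "pytorch-cpu"]
    providers = ["pytorch-cpu"]
    if has_nvidia:
        providers.append("pytorch-cuda")
    return providers
```

Passing `system` explicitly keeps the gating logic testable off the target platform.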

CI/CD

  • Multi-stage release workflow
  • Provider binaries built and uploaded to R2
  • Main app excludes PyTorch on Windows/Linux
  • Independent versioning (app version vs provider version)

Status

Implemented:

  • ✅ Provider architecture and abstraction
  • ✅ PyTorch CPU and CUDA provider binaries
  • ✅ Download management with progress tracking
  • ✅ Provider lifecycle (start/stop/delete)
  • ✅ CI/CD pipeline for R2 uploads
  • ✅ Frontend UI for provider management
  • ✅ macOS MLX bundled support

Not Yet Implemented:

  • ⏳ OpenAI API provider
  • ⏳ Custom remote server provider
  • ⏳ Provider health monitoring and auto-failover
  • ⏳ Provider versioning and compatibility checks

Note

High Risk
Changes core inference packaging/runtime selection (provider start/download) and release/distribution pipelines (R2 uploads + Docker publishing), so misconfiguration can break voice generation or shipping artifacts across platforms.

Overview
Introduces a provider-based TTS architecture so large PyTorch CPU/CUDA backends can be downloaded and started separately instead of being shipped inside every desktop release, including new backend provider endpoints and a new Server Settings UI (ProviderSettings) for install/start/delete with progress tracking.

Updates the GitHub release workflow to build provider binaries across OSes and upload them to R2 (with the main app no longer bundling TTS on Linux), and adds CPU/CUDA Docker images plus .dockerignore/README docs for container deployment.

Also bumps the app version to 0.1.13, tightens backend import errors when bundled ML dependencies are missing, and does a broad UI icon/loader refresh plus a couple small TS correctness fixes.

Written by Cursor Bugbot for commit af7e981.

- Added support for TTS providers in the backend, including endpoints for listing, starting, stopping, and downloading providers.
- Enhanced the release workflow to build and upload TTS provider binaries for both Windows and Linux platforms.
- Updated the architecture documentation to reflect the new provider system and its benefits for modularity and user experience.
- Introduced a new `ProviderSettings` component in the frontend for managing provider configurations.
…ild scripts

- Introduced a new attribute `_current_model_size` in `LocalProvider` to store the current model size, allowing for dynamic configuration during generation.
- Updated the `generate` method to use the current model size instead of a hardcoded value.
- Modified the `load_model` method to track the requested model size.
- Removed platform-specific extension handling from the build scripts for both CPU and CUDA providers to streamline the build process.
- Updated the release workflow to include a new configuration for the Ubuntu 22.04 platform without TTS bundled.
- Added the @radix-ui/react-radio-group dependency to package.json.
- Implemented a new RadioGroup component for better UI handling of radio inputs.
- Commented out the PyTorch CPU configuration in the release workflow for Ubuntu 22.04.
- Updated TTS provider documentation to clarify options for Windows and Linux users.
- Enhanced build scripts for both CPU and CUDA providers by excluding large unused modules to reduce binary size.
- Introduced a new `linux.rs` module for audio capture, indicating that audio capture is not supported on Linux at this time.
- Updated `mod.rs` to include the Linux module conditionally based on the target OS.
- Renamed `load_model` to `load_model_async` in TTS provider classes for clarity and consistency.
- Added compatibility alias for `load_model` to maintain existing functionality.
- Enhanced `get_model_status` to handle both synchronous and asynchronous check functions.
- Updated version numbers in `bun.lock` and `Cargo.lock` to 0.1.12, reflecting recent changes.
@jamiepine jamiepine mentioned this pull request Feb 1, 2026
@jamiepine jamiepine changed the title External Provider Binaries - Solve 2GB GitHub Limit CUDA GPU Support - External Provider Binaries Feb 1, 2026
- Added macOS support for PyTorch CPU providers in the release workflow.
- Updated the ProviderSettings component to handle macOS-specific conditions and improve UI interactions.
- Refactored the radio group component styles for better accessibility and visual consistency.
- Improved provider management logic to ensure proper handling of available providers across different platforms.
- Renamed `bundled-mlx` to `apple-mlx` for clarity in provider types.
- Updated the ProviderSettings component to reflect the new provider naming.
- Improved logging for provider startup and error handling in the backend.
- Added scripts for building and installing PyTorch CPU and CUDA providers locally.
- Enhanced the documentation to include details on TTS provider architecture and development setup.
- Updated the `ProviderSettings` component to log the current active provider.
- Changed the provider health status to use specific names for MLX and PyTorch backends.
- Removed unnecessary exclusions from the build scripts for both PyTorch CPU and CUDA providers.
- Ensured consistency in the `.spec` files for PyTorch providers by aligning exclusion lists.
@deathreaperz commented Feb 1, 2026


Idk if it's related or not, but having 3 voice clones with lots of samples causes longer rendering times and memory leaks; generation often doesn't finish even after hours of waiting, forcing me to kill voicebox-server. That's probably because I'm using CPU instead of GPU. Wish I could test this PR to prove me right. 😔

@jamiepine (Owner, Author) commented Feb 1, 2026

@deathreaperz you can test GPU easily by cloning the repo (on main, not this PR) and running the server with

bun run dev:server --port 17493

Make sure the production server is not running in Task Manager when you do this. Others are using this method to get GPU on Windows and Linux right now.

Opening the Voicebox app will then connect to the dev server instead of the bundled CPU version.

- Clarified the bundling of PyTorch CPU providers for Windows and macOS Intel builds in documentation.
- Improved handling of platform-specific dependencies in the build process, including asyncio support for PyInstaller.
- Updated backend logic to gracefully handle missing dependencies and provide clearer error messages.
- Enhanced progress management to ensure compatibility with PyInstaller's async handling.
- Removed unnecessary exclusions from the build scripts for PyTorch providers to streamline the build process.
- Changed the backend setting for Windows from "none" to "pytorch" to ensure compatibility with bundled PyTorch CPU providers.
- Updated comments for clarity regarding the Windows setup.
- Included a conditional installation of CPU-only PyTorch packages for Ubuntu 22.04 to reduce unnecessary CUDA dependencies.
- Updated the release workflow to ensure compatibility with CPU-focused builds on Linux.
- Removed `lucide-react` version 0.454.0 and downgraded to version 0.316.0 in `bun.lock`.
- Added `@tailwindcss/vite` and `tailwindcss` as development dependencies in `package.json`.
- Updated Vite configuration to include Tailwind CSS plugin.
- Set the HTML document to use a dark theme by adding the `class="dark"` attribute to the `<html>` tag.
- Introduced Docker support with CPU-only and GPU-enabled configurations via Dockerfiles and docker-compose files.
- Added a .dockerignore file to exclude unnecessary files from Docker images.
- Updated bun.lock and package.json to include new dependencies for icon handling.
- Enhanced README with Docker usage instructions and deployment options.
- Refactored components to utilize new icon libraries for improved UI consistency.
- Added functionality to create log files for provider output, improving debugging on Windows.
- Updated error handling to read from log files instead of using subprocess output directly.
- Enhanced logging messages to include log file locations for easier troubleshooting.
- Revised README to include links for downloading the latest releases for macOS, Windows, and Linux.
- Added detailed instructions for running Voicebox with Docker, including Docker Compose usage.
- Updated installation documentation to reflect Linux availability and provide specific download options for AppImage and Deb packages.
- Enhanced clarity in Docker documentation regarding accessing the web UI.
- Added support for packaging provider archives in the release workflow, creating platform-specific zip and tar.gz files for distribution.
- Updated the `.gitignore` to exclude `.spec` files.
- Introduced a new `CudaDownloadSection` component to manage CUDA downloads, including progress tracking and error handling.
- Refactored provider download logic to handle archive extraction and cleanup after download.
- Improved subprocess output handling in the provider manager for better logging and error reporting.
…e data

- Added environment variables to Dockerfiles to set non-interactive mode and configure the timezone to UTC.
- Included installation of `tzdata` in both Dockerfiles to support timezone configuration during the build process.
… interactions

- Introduced a loading state to indicate when a provider is starting, enhancing user experience.
- Disabled radio buttons and action buttons during the loading state to prevent user interaction.
- Updated UI elements to reflect the loading state, including a spinner and appropriate cursor styles.
@jamiepine jamiepine marked this pull request as ready for review February 2, 2026 13:59
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 5 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

```ts
  queryFn: () => apiClient.getCudaStatus(),
  enabled: isWindows && platform.metadata.isTauri && gpuAvailable,
  retry: false,
});
```
React hooks called after early return statement

High Severity

The component has useQuery (line 33) and useEffect (line 58) called after an early return statement (lines 28-30). This violates React's Rules of Hooks which require hooks to be called unconditionally and in the same order on every render. When the early return condition is met, fewer hooks are called, causing React to throw a "Rendered fewer hooks than expected" error.

Additional Locations (1)


```ts
// Determine current active provider
const currentProvider = activeProvider?.provider;
console.log('currentProvider', currentProvider);
const selectedProvider = currentProvider as ProviderType;
```
Debug console.log statement left in production code

Low Severity

A console.log('currentProvider', currentProvider) statement is present in the ProviderSettings component. This debugging output will appear in production browser consoles, which is unprofessional and could leak implementation details to users inspecting the console.


…tainability

- Removed unnecessary line breaks in the rendering of radio buttons and labels for a cleaner code structure.
- Added missing imports for numpy and scipy in build_binary.py to ensure proper functionality.
- Updated binary Assets.car file to reflect recent changes.
…r paths

- Introduced a new DataFolders component to display and manage paths for application data, models, and providers.
- Implemented a FolderRow component for individual folder display, including loading states and open folder functionality.
- Added API client methods and hooks to fetch system folder paths from the backend.
- Updated ServerTab to include the new DataFolders component, enhancing server settings management.
- Replaced python3-pip with python3.12-venv in the installation process.
- Added commands to ensure pip is upgraded and installed after setting Python 3.12 as the default.
- Streamlined the installation of PyTorch by removing redundant pip upgrade command.
- Enhanced text descriptions for TTS providers, clarifying the functionality of PyTorch CUDA and Apple MLX.
- Removed outdated references to availability and added notes regarding model version differences.
- Improved UI structure by ensuring consistent labeling and disabling options based on system compatibility.
- Added platform detection for macOS and Windows to improve user experience.
- Updated UI to conditionally disable options and provide clearer guidance based on installed providers.
- Refactored button states and labels for PyTorch CUDA and CPU to reflect availability and download status accurately.
… UI consistency

- Removed unnecessary class from the FloatingGenerateBox button for cleaner styling.
- Updated HistoryTable to remove redundant class from the delete button.
- Reorganized imports in dropdown-menu for better readability.
- Enhanced useAutoUpdater hook with improved dependency management and updated icon usage.
- Deleted obsolete useAutoUpdater.ts file to streamline codebase.
- Introduced useMemo to calculate bottom padding based on the visibility of the StoryTrackEditor and FloatingGenerateBox.
- Updated imports to include useStory hook for fetching the current story data.
- Adjusted the StoryList layout to accommodate the new padding logic, improving UI consistency.