
Upgrade CUDA backend from cu126 to cu128, fix GPU settings UI #316

Merged
jamiepine merged 1 commit into main from fix/cuda-cu128-upgrade on Mar 18, 2026

Conversation

@jamiepine jamiepine (Owner) commented Mar 18, 2026

Summary

  • Upgrade CUDA toolkit from 12.6 (cu126) to 12.8 (cu128) for proper RTX 50-series (Blackwell) GPU support — users with RTX 5070/5080/5090 were reporting CUDA detection failures with cu126 (a short check sketch follows below).
  • Fix the GPU Acceleration settings panel where the "Switch to CPU Backend" button was unreachable once running on CUDA (the button was inside a !isCurrentlyCuda guard, making it impossible to switch back).
  • Bump minimum torch version to >=2.7.0 (required for cu128 compatibility).
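
For context, a minimal check sketch (standard torch APIs, assuming the cu128 build and a CUDA-visible GPU; not code from this PR):

    import torch

    # RTX 50-series (Blackwell) consumer GPUs report compute capability 12.0 (sm_120);
    # cu126 wheels do not ship sm_120 kernels, which is why detection failed.
    major, minor = torch.cuda.get_device_capability(0)   # (12, 0) on an RTX 5090
    arch = f"sm_{major}{minor}"
    print(arch, "supported by this wheel:", arch in torch.cuda.get_arch_list())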

Changes

CUDA cu126 → cu128 (8 locations)

  • backend/services/cuda.py — CUDA_LIBS_VERSION constant (see the sketch after this list)
  • .github/workflows/release.yml — PyTorch install, packaging args, release asset filenames
  • scripts/package_cuda.py — CLI defaults and docstring
  • backend/build_binary.py — CUDA torch restore URL
  • justfile — dev setup PyTorch install
  • backend/requirements.txt — torch minimum version
  • docs/content/docs/developer/building.mdx — developer docs

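As a rough illustration of the version bump in code (CUDA_LIBS_VERSION and the old/new values come from the diff summary; the other two names are illustrative only, not symbols from the repo):

    # backend/services/cuda.py (sketch, not the actual file contents)
    CUDA_LIBS_VERSION = "cu128-v1"                              # was "cu126-v1"
    TORCH_INDEX_URL = "https://download.pytorch.org/whl/cu128"  # standard PyTorch cu128 wheel index
    TORCH_MIN_VERSION = "2.7.0"                                 # requirements floor, was 2.1.0
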
GPU Settings UI fix

  • app/src/components/ServerSettings/GpuAcceleration.tsx — moved "Switch to CPU Backend" into its own top-level conditional block that renders when isCurrentlyCuda is true, so users can actually switch back to CPU

Related issues

CUDA/GPU compatibility (#314, #313, #310, #301) — issues with CUDA 13.1, DirectML/AMD, RTX 5070, and CUDA errors during generation.

Closes #315

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced GPU acceleration interface with live status displays and improved controls for managing CPU/GPU backend switching, including better error handling and status feedback.
  • Chores

    • Updated CUDA toolkit to version 12.8 and PyTorch to version 2.7.0 for improved GPU support and performance compatibility.

@coderabbitai coderabbitai Bot (Contributor) commented Mar 18, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3bec47e0-8bc6-4d72-b081-dca3cce88f14

📥 Commits

Reviewing files that changed from the base of the PR and between 2925355 and fc5ed1f.

📒 Files selected for processing (8)
  • .github/workflows/release.yml
  • app/src/components/ServerSettings/GpuAcceleration.tsx
  • backend/build_binary.py
  • backend/requirements.txt
  • backend/services/cuda.py
  • docs/content/docs/developer/building.mdx
  • justfile
  • scripts/package_cuda.py

📝 Walkthrough

This PR updates CUDA toolkit support from version 12.6 to 12.8 across build workflows, backend services, and packaging scripts. It also bumps PyTorch from 2.1.0 to 2.7.0 and refactors the GPU acceleration UI to better handle active CUDA states with improved status indicators.

Changes

  • CUDA 12.8 Build & Deployment Configuration (.github/workflows/release.yml, backend/build_binary.py, docs/content/docs/developer/building.mdx, justfile): updated CUDA wheel installation and archive references from cu126 to cu128; adjusted paths and artifact names accordingly.
  • CUDA Version Constants & Tooling (backend/services/cuda.py, scripts/package_cuda.py): updated the CUDA_LIBS_VERSION constant from "cu126-v1" to "cu128-v1" and adjusted default argument values and help text to reflect the new version.
  • PyTorch Version Upgrade (backend/requirements.txt, scripts/package_cuda.py): bumped torch from 2.1.0 to 2.7.0 and updated the torch compatibility constraint from ">=2.6.0,<2.11.0" to ">=2.7.0,<2.11.0".
  • GPU Acceleration UI Refactor (app/src/components/ServerSettings/GpuAcceleration.tsx): introduced a CUDA-active branch with a live restart/status indicator and a "Switch to CPU Backend" button; restructured conditional rendering to show the delete option when CUDA is available; consolidated error display and status messaging logic.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰 Cuda springs forward, eight and two in flight,
Torches burn brighter at version twenty-seven's height,
Switches now glow when the GPU takes the lead,
Delete buttons ready for those who change their creed.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

  • Out of Scope Changes check (⚠️ Warning): the PR includes significant CosyVoice TTS engine integration changes that are unrelated to the linked issue #315, which focuses on CUDA 12.8 support and GPU settings UI fixes. Resolution: consider separating CosyVoice integration into a dedicated PR, or clarify in the issue description if this was the intended scope.
  • Docstring Coverage (⚠️ Warning): docstring coverage is 47.06%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

  • Title check (✅ Passed): the title directly describes the main changes: CUDA upgrade from cu126 to cu128 and the GPU settings UI fix, matching the primary objectives.
  • Linked Issues check (✅ Passed): all primary requirements from issue #315 are addressed: CUDA 12.8 support is added throughout the backend, and the GPU settings UI now includes a reachable "Switch to CPU Backend" button.
  • Description check (✅ Passed): check skipped; CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Upgrade CUDA toolkit from 12.6 (cu126) to 12.8 (cu128) for proper
RTX 50-series (Blackwell) GPU support. Users with RTX 5070/5080/5090
were reporting CUDA detection failures with cu126.

Also fix the GPU Acceleration settings panel where the 'Switch to CPU
Backend' button was unreachable — it was inside a conditional block
that required !isCurrentlyCuda, making it impossible to switch back
to CPU once running on CUDA.

Closes #315
@jamiepine jamiepine force-pushed the fix/cuda-cu128-upgrade branch from 2925355 to fc5ed1f on March 18, 2026 at 14:49
@coderabbitai coderabbitai Bot (Contributor) left a comment


Actionable comments posted: 3

🧹 Nitpick comments (11)
backend/backends/cosyvoice_backend.py (6)

98-98: Use a def instead of assigning a lambda.

Per PEP 8 / Ruff E731, named functions should use def statements.

💅 Suggested fix
-    _noop = lambda *a, **kw: None
+    def _noop(*a, **kw):
+        pass
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/cosyvoice_backend.py` at line 98, Replace the assigned
lambda _noop = lambda *a, **kw: None with a proper named function using def;
locate the symbol _noop in cosyvoice_backend.py and change it to a def _noop(*a,
**kw): return None so it follows PEP 8 / Ruff E731 and is clearer to readers and
tooling.

183-185: Parameter format shadows Python builtin.

While this matches torchaudio.load's signature, consider renaming to format_ with an underscore to avoid shadowing the builtin.

💅 Suggested fix
-    def _sf_load(uri, frame_offset=0, num_frames=-1, normalize=True,
-                 channels_first=True, format=None, buffer_size=4096,
-                 backend=None):
+    def _sf_load(uri, frame_offset=0, num_frames=-1, normalize=True,
+                 channels_first=True, format_=None, buffer_size=4096,
+                 backend=None):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/cosyvoice_backend.py` around lines 183 - 185, The parameter
name format in function _sf_load shadows the Python builtin; rename it to
format_ (update the function signature for _sf_load and every internal reference
to format -> format_) and update any callers to pass format_ (or accept both
names by adding **kwargs handling and mapping format -> format_ if backward
compatibility is required). Also update the function docstring/parameter list to
reflect format_ and run tests to ensure no call sites were missed (search for
"_sf_load(" and references to the format parameter).

425-426: Consider logging a warning for empty audio output.

When no audio chunks are produced, returning 1 second of silence is a reasonable fallback, but this may indicate an upstream issue worth logging.

💡 Suggested improvement
             if not audio_chunks:
+                logger.warning("CosyVoice produced no audio chunks for text: %s", text[:60])
                 return np.zeros(COSYVOICE_SAMPLE_RATE, dtype=np.float32), COSYVOICE_SAMPLE_RATE
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/cosyvoice_backend.py` around lines 425 - 426, The code
returns 1 second of silence when audio_chunks is empty but doesn't log this
event; add a warning log before the return to record that no audio was produced
(e.g., use logger.warning or self.logger.warning depending on the surrounding
scope) and include context such as the function name and that
COSYVOICE_SAMPLE_RATE silent buffer is being returned (reference audio_chunks
and COSYVOICE_SAMPLE_RATE to locate the code). Ensure the log is emitted only
when audio_chunks is falsy and then return the same np.zeros(...) as before.

23-23: Use modern type hints for Python 3.9+.

typing.List and typing.Tuple are deprecated. Since the project targets Python 3.12, use the built-in list and tuple directly.

💅 Suggested fix
-from typing import ClassVar, List, Optional, Tuple
+from typing import ClassVar, Optional

Then update usages throughout the file:

  • List[str] → list[str]
  • Tuple[dict, bool] → tuple[dict, bool]
  • Tuple[np.ndarray, str] → tuple[np.ndarray, str]
  • Tuple[np.ndarray, int] → tuple[np.ndarray, int]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/cosyvoice_backend.py` at line 23, Replace deprecated typing
aliases with built-in generics for Python 3.9+: remove typing.List and
typing.Tuple usage in the import line and type hints in this module; keep
ClassVar and Optional if still needed. Specifically, update the import from
"from typing import ClassVar, List, Optional, Tuple" to drop List and Tuple, and
change all occurrences like "List[str]" → "list[str]", "Tuple[dict, bool]" →
"tuple[dict, bool]", "Tuple[np.ndarray, str]" → "tuple[np.ndarray, str]", and
"Tuple[np.ndarray, int]" → "tuple[np.ndarray, int]" (apply similar conversions
elsewhere in functions/methods such as any function that returns tuple[...] or
accepts list[...] types).

372-372: Defensive load is good, but consider the variant fallback.

If _variant is None (model unloaded), this defaults to "v2". This could be surprising if the user intended to use v3. Consider requiring an explicit model_size parameter or raising an error if the model is not loaded.

Note: The retrieved learning indicates model reload/unload race conditions are a pre-existing design issue tracked for future follow-up.
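
One possible shape for that guard, as a sketch (built only from the call described above, not verified against the file):

    # Fail loudly instead of silently defaulting to "v2".
    if self._variant is None:
        raise ValueError(
            "CosyVoice variant not set; call load_model('v2') or load_model('v3') "
            "before generating."
        )
    await self.load_model(self._variant)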

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/cosyvoice_backend.py` at line 372, The call uses a silent
fallback to "v2" when self._variant is None which can misselect the model;
update the code around the invocation of load_model (the call at await
self.load_model(self._variant or "v2")) to avoid implicit fallback: either
require an explicit model_size/model_variant parameter and pass it to
load_model, or check if self._variant is None and raise a clear error (e.g.,
ValueError) instructing the caller to load the desired variant first; touch the
load_model caller site and any public API that will pass model_size so the
selection is explicit and no silent "v2" default remains.

274-298: Thread-safety concern with global torch.load patch.

Patching torch.load globally affects all threads. While _model_load_lock serializes model loading within this backend, other backends or code paths could call torch.load during this window and unexpectedly receive map_location="cpu".

This is a pre-existing pattern (also used in chatterbox_backend.py), but worth noting for future refactoring.
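
For illustration, one way to at least scope the patch and guarantee restoration (still process-global while active; a sketch, not the PR's code):

    import contextlib
    import functools
    import torch

    @contextlib.contextmanager
    def cpu_map_location_patch():
        # Temporarily force torch.load(..., map_location="cpu"); always restore,
        # even if model construction raises.
        original = torch.load
        torch.load = functools.partial(original, map_location="cpu")
        try:
            yield
        finally:
            torch.load = original

    # Hypothetical usage inside the loading path, held under _model_load_lock:
    # with cpu_map_location_patch():
    #     model = CosyVoice2(model_dir)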

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/cosyvoice_backend.py` around lines 274 - 298, The global
patch of torch.load (_orig_torch_load/_patched_load) is not thread-safe;
instead, make the patch local to the import of cosyvoice.cli.cosyvoice by
temporarily replacing the torch entry in sys.modules with a small wrapper module
that exposes the same attributes but a patched load (only while importing
CosyVoice2/CosyVoice3), then restore sys.modules; locate the block that checks
device == "cpu" and currently sets torch.load and change it to: create a shallow
wrapper module or clone of the real torch with a patched load, inject it into
sys.modules["torch"], import the desired class (CosyVoice2 or CosyVoice3) and
instantiate model, and finally restore the original sys.modules["torch"]; keep
references to the existing symbols _orig_torch_load/_patched_load, torch.load,
variant, CosyVoice2/CosyVoice3 and _model_load_lock when implementing.
Dockerfile (1)

42-42: Consider pinning CosyVoice to a specific commit for reproducible builds.

The clone fetches the latest commit from the default branch, which could lead to build inconsistencies over time if the upstream CosyVoice repository introduces breaking changes.

💡 Suggested improvement
-RUN git clone --recursive --depth 1 https://github.com/FunAudioLLM/CosyVoice.git /build/CosyVoice
+# Pin to a known-working commit for reproducible builds
+RUN git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git /build/CosyVoice && \
+    cd /build/CosyVoice && git checkout <COMMIT_SHA>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile` at line 42, The Dockerfile's RUN git clone line currently pulls
the latest default branch; change it to pin CosyVoice to a specific commit or
tag to ensure reproducible builds by cloning and then checking out a known
commit/hash (or cloning the specific tag/branch with --branch and
--single-branch) instead of the floating default branch; update the RUN step
that references "git clone --recursive --depth 1
https://github.com/FunAudioLLM/CosyVoice.git /build/CosyVoice" so it checks out
a provided commit SHA or tag immediately after clone (or clones the tag
directly) and document the chosen commit SHA/tag in the Dockerfile comment.
.github/workflows/release.yml (1)

66-66: Same reproducibility concern applies here.

Consider pinning CosyVoice to a specific commit to ensure release builds are reproducible.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/release.yml at line 66, The git clone command fetching
CosyVoice is not pinned and makes releases non-reproducible; update the clone
step that currently targets https://github.com/FunAudioLLM/CosyVoice to clone a
specific commit/SHA instead of the default branch—fetch the repo (as now), then
checkout a fixed commit SHA (or use the repository URL with the exact commit
reference) for CosyVoice so every release uses the same immutable version and
rerun CI to verify.
app/src/lib/hooks/useGenerationForm.ts (1)

18-22: Add cross-field validation between engine and modelSize.

Current schema allows invalid pairs (e.g., engine: 'luxtts' with modelSize: 'v3'). A superRefine check would prevent bad combinations before submission.

Possible schema refinement
-const generationSchema = z.object({
+const generationSchema = z
+  .object({
   text: z.string().min(1, '').max(50000),
   language: z.enum(LANGUAGE_CODES as [LanguageCode, ...LanguageCode[]]),
   seed: z.number().int().optional(),
   modelSize: z.enum(['1.7B', '0.6B', '1B', '3B', 'v2', 'v3']).optional(),
   instruct: z.string().max(500).optional(),
   engine: z
     .enum(['qwen', 'luxtts', 'chatterbox', 'chatterbox_turbo', 'tada', 'cosyvoice'])
     .optional(),
-});
+  })
+  .superRefine((data, ctx) => {
+    const engine = data.engine ?? 'qwen';
+    const size = data.modelSize;
+    if (!size) return;
+
+    const allowed: Record<string, string[]> = {
+      qwen: ['1.7B', '0.6B'],
+      tada: ['1B', '3B'],
+      cosyvoice: ['v2', 'v3'],
+      luxtts: [],
+      chatterbox: [],
+      chatterbox_turbo: [],
+    };
+
+    if (!allowed[engine]?.includes(size)) {
+      ctx.addIssue({
+        code: z.ZodIssueCode.custom,
+        path: ['modelSize'],
+        message: 'Invalid model size for selected engine',
+      });
+    }
+  });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/lib/hooks/useGenerationForm.ts` around lines 18 - 22, The Zod schema
in useGenerationForm (the schema object that defines modelSize, instruct,
engine) permits invalid engine/modelSize combinations; add a schema.superRefine
that checks the engine and modelSize pair and calls ctx.addIssue when the
combination is invalid (e.g., forbid luxtts with v2/v3, or enforce that certain
engines only accept numeric sizes like '0.6B','1B', etc.). Locate the schema
variable in useGenerationForm.ts and implement the cross-field rules inside
superRefine using the ctx.addIssue API with a descriptive message and path
['modelSize'] or ['engine'] so the form shows the validation error. Ensure
optional values are handled (skip check when one is undefined) and keep the
logic centralized in one refinement function.
app/src/components/Generation/EngineModelSelector.tsx (1)

77-81: Consider extracting shared language-compatibility fallback.

This fallback block is repeated across multiple engine branches. A small helper (e.g., ensureSupportedLanguage(form, engine)) would reduce drift when engine language support changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/Generation/EngineModelSelector.tsx` around lines 77 - 81,
Extract the repeated language-compatibility fallback into a reusable helper
(e.g., ensureSupportedLanguage(form, engine)) that calls
getLanguageOptionsForEngine(engine), compares form.getValues('language') against
available options, and sets form.setValue('language', available[0]?.value ??
'en') when current language is unsupported; replace the inline block in
EngineModelSelector (currently using getLanguageOptionsForEngine('cosyvoice')
and form) with a call to ensureSupportedLanguage(form, 'cosyvoice') so all
engine branches use the same centralized logic.
app/src/components/ServerSettings/GpuAcceleration.tsx (1)

247-280: Consider: Non-Tauri users running CUDA see no management options.

When isCurrentlyCuda is true but platform.metadata.isTauri is false (e.g., user manually started the CUDA backend and accesses via browser), neither the "Switch to CPU" section (lines 247-277) nor the download/manage section (lines 280-379) renders. Users would see only the GPU status with no actions available.

This is likely acceptable since restart functionality requires Tauri, but you might consider adding a brief informational message for this edge case explaining that backend management requires the desktop app.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/ServerSettings/GpuAcceleration.tsx` around lines 247 -
280, When isCurrentlyCuda is true but platform.metadata.isTauri is false (user
started CUDA backend outside Tauri) there are no management controls shown; add
a small informational UI branch for this edge case so users know backend
management requires the desktop app. Specifically, inside the condition that
currently checks isCurrentlyCuda && platform.metadata.isTauri, add an else-if or
adjacent block for isCurrentlyCuda && !platform.metadata.isTauri that renders a
brief message (using the same styling as other notices) explaining that
switching to CPU or restarting requires the Tauri desktop app and that no
in-browser controls are available; reference the existing symbols
isCurrentlyCuda, platform.metadata.isTauri, and handleSwitchToCpu (to clarify
why the button is unavailable) and surface any existing error via the error
variable as in the other branch.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/src/components/Generation/EngineModelSelector.tsx`:
- Around line 74-77: The code currently trusts value.split(':') and writes
modelSize into form state; validate the parsed modelSize before calling
form.setValue. After const [, modelSize] = value.split(':'), check that
modelSize is one of the allowed values ('v2' or 'v3') and only then call
form.setValue('modelSize', modelSize as 'v2' | 'v3'); if it is invalid or
missing, handle it safely (e.g. skip setting, set a safe default, or early
return) and avoid propagating an invalid value to form state; update the block
around value.split, modelSize, form.setValue('engine', 'cosyvoice'), and
form.setValue('modelSize', ...) accordingly and consider logging or triggering
validation for malformed inputs.

In `@app/src/lib/hooks/useGenerationForm.ts`:
- Around line 88-92: The qwen branch builds model IDs and display labels using
data.modelSize which can be undefined (yielding "qwen-tts-undefined" vs a
display like "Qwen TTS 0.6B"); fix by defaulting modelSize to '1.7B' when
constructing the model id and the display label: use (data.modelSize ?? '1.7B')
wherever model id is computed (the ternary that sets modelId for engine ===
'qwen') and wherever the human-facing label is generated (the block that outputs
"Qwen TTS ...", lines around the display text), and make the same change in the
other occurrence referenced (the second block at lines ~104-110) so both model
selection and display remain consistent.

In `@scripts/package_cuda.py`:
- Around line 217-218: The help text for the CLI option defining the torch
compatibility range is stale: update the help string for the argument (the
parser.add_argument call that sets default=">=2.7.0,<2.11.0" for the
--torch-compat option in scripts/package_cuda.py) so it matches the new default
(change the displayed range from ">=2.6.0,<2.11.0" to ">=2.7.0,<2.11.0"); ensure
the --torch-compat help message and the default value are consistent.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9bacccaa-4a86-4d26-8f63-a5a0eee63fda

📥 Commits

Reviewing files that changed from the base of the PR and between c9f38dd and 2925355.

📒 Files selected for processing (19)
  • .github/workflows/release.yml
  • .gitignore
  • Dockerfile
  • app/src/components/Generation/EngineModelSelector.tsx
  • app/src/components/ServerSettings/GpuAcceleration.tsx
  • app/src/components/ServerSettings/ModelManagement.tsx
  • app/src/lib/api/types.ts
  • app/src/lib/constants/languages.ts
  • app/src/lib/hooks/useGenerationForm.ts
  • backend/backends/__init__.py
  • backend/backends/cosyvoice_backend.py
  • backend/build_binary.py
  • backend/models.py
  • backend/requirements.txt
  • backend/server.py
  • backend/services/cuda.py
  • docs/content/docs/developer/building.mdx
  • justfile
  • scripts/package_cuda.py

Comment thread app/src/components/Generation/EngineModelSelector.tsx Outdated
Comment thread app/src/lib/hooks/useGenerationForm.ts Outdated
Comment thread scripts/package_cuda.py
Comment on lines +217 to 218
default=">=2.7.0,<2.11.0",
help="Torch version compatibility range (default: >=2.6.0,<2.11.0)",
@coderabbitai coderabbitai Bot (Contributor)

⚠️ Potential issue | 🟡 Minor

Help text shows stale default value.

The --torch-compat default was updated to >=2.7.0,<2.11.0 on Line 217, but the help text on Line 218 still shows >=2.6.0,<2.11.0.

Proposed fix
     parser.add_argument(
         "--torch-compat",
         type=str,
         default=">=2.7.0,<2.11.0",
-        help="Torch version compatibility range (default: >=2.6.0,<2.11.0)",
+        help="Torch version compatibility range (default: >=2.7.0,<2.11.0)",
     )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change:
    parser.add_argument(
        "--torch-compat",
        type=str,
        default=">=2.7.0,<2.11.0",
        help="Torch version compatibility range (default: >=2.7.0,<2.11.0)",
    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/package_cuda.py` around lines 217 - 218, The help text for the CLI
option defining the torch compatibility range is stale: update the help string
for the argument (the parser.add_argument call that sets
default=">=2.7.0,<2.11.0" for the --torch-compat option in
scripts/package_cuda.py) so it matches the new default (change the displayed
range from ">=2.6.0,<2.11.0" to ">=2.7.0,<2.11.0"); ensure the --torch-compat
help message and the default value are consistent.

@jamiepine jamiepine merged commit ffc1b54 into main Mar 18, 2026
1 check was pending
@CustardFlan

NVIDIA GeForce RTX 5060 Ti here
I receive the following log:

INFO: 127.0.0.1:61924 - "GET /history?limit=20 HTTP/1.1" 200 OK
INFO: 127.0.0.1:52080 - "DELETE /profiles/c66a05a6-9fcc-4c31-a74d-d63d32c7d1b4 HTTP/1.1" 200 OK
INFO: 127.0.0.1:63480 - "GET /profiles HTTP/1.1" 200 OK
INFO: 127.0.0.1:52080 - "GET /profiles/c66a05a6-9fcc-4c31-a74d-d63d32c7d1b4 HTTP/1.1" 404 Not Found
INFO: 127.0.0.1:52080 - "GET /profiles/c66a05a6-9fcc-4c31-a74d-d63d32c7d1b4 HTTP/1.1" 404 Not Found
INFO: 127.0.0.1:53544 - "GET /tasks/active HTTP/1.1" 200 OK
INFO: 127.0.0.1:57480 - "POST /profiles HTTP/1.1" 200 OK
INFO: 127.0.0.1:57480 - "GET /profiles HTTP/1.1" 200 OK
INFO: 127.0.0.1:49153 - "GET /profiles/c66a05a6-9fcc-4c31-a74d-d63d32c7d1b4 HTTP/1.1" 404 Not Found
INFO: 127.0.0.1:49153 - "GET /profiles/c66a05a6-9fcc-4c31-a74d-d63d32c7d1b4 HTTP/1.1" 404 Not Found
INFO: 127.0.0.1:51621 - "POST /profiles/9ace0801-ba36-44cc-b7f7-22b1de4cc2f0/samples HTTP/1.1" 200 OK
INFO: 127.0.0.1:51621 - "OPTIONS /profiles/9ace0801-ba36-44cc-b7f7-22b1de4cc2f0 HTTP/1.1" 200 OK
INFO: 127.0.0.1:51621 - "GET /profiles/9ace0801-ba36-44cc-b7f7-22b1de4cc2f0 HTTP/1.1" 200 OK
INFO: 127.0.0.1:60394 - "GET /models/status HTTP/1.1" 200 OK
INFO: 127.0.0.1:60394 - "POST /generate HTTP/1.1" 200 OK
INFO: 127.0.0.1:60394 - "GET /history?limit=20 HTTP/1.1" 200 OK
INFO: 127.0.0.1:60394 - "GET /generate/807254ac-55f2-45b1-a294-8638bb529fc0/status HTTP/1.1" 200 OK
Traceback (most recent call last):
File "backend\services\generation.py", line 65, in run_generation
File "backend\services\profiles.py", line 407, in create_voice_prompt_for_profile
File "backend\backends\pytorch_backend.py", line 172, in create_voice_prompt
File "asyncio\threads.py", line 25, in to_thread
File "concurrent\futures\thread.py", line 59, in run
File "backend\backends\pytorch_backend.py", line 165, in _create_prompt_sync
File "torch\utils_contextlib.py", line 124, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "qwen_tts\inference\qwen3_tts_model.py", line 427, in create_voice_clone_prompt
File "qwen_tts\inference\qwen3_tts_tokenizer.py", line 248, in encode
File "transformers\feature_extraction_utils.py", line 246, in to
self.data = {k: maybe_to(v) for k, v in self.items()}
^^^^^^^^^^^
File "transformers\feature_extraction_utils.py", line 240, in maybe_to
return v.to(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
torch.AcceleratorError: CUDA error: no kernel image is available for execution on the device
Search for `cudaErrorNoKernelImageForDevice' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

So, I cannot generate audio using the GPU.

nvidia-smi:
NVIDIA-SMI 595.79 Driver Version: 595.79 CUDA Version: 13.2
NVIDIA GeForce RTX 5060 Ti WDDM

Thanks in advance
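
For anyone hitting the same "no kernel image" error, a small diagnostic sketch (standard torch APIs; note that the "CUDA Version: 13.2" reported by nvidia-smi is only the driver's maximum supported runtime, not the toolkit the installed wheel was built against):

    import torch

    print("torch:", torch.__version__)                # should be >= 2.7.0 after this PR
    print("built against CUDA:", torch.version.cuda)  # "12.8" for cu128 wheels
    print("kernel archs:", torch.cuda.get_arch_list())
    # If "sm_120" is missing from the arch list, the running backend is still on a
    # pre-cu128 build, and any kernel launch on an RTX 50-series card will raise
    # "no kernel image is available for execution on the device".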

@Astral793

Same issue

