
fix(build): stabilize cross-platform tts packaging and ci #6

Open
rishiskhare wants to merge 7 commits into main from pr/windows-build-clean

Conversation

@rishiskhare (Owner) commented Apr 12, 2026

  • feat(selection): add configurable text capture pipeline
  • fix(tts): improve kokoro runtime and text normalization
  • fix(ci): stabilize cross-platform packaging and workflows

Summary by CodeRabbit

  • New Features

    • Bundled ONNX Runtime support and added Kokoro neural TTS engine with model/voice loading
    • New settings UI and commands for selection capture method, clipboard handling, and model-unload timeout
    • Windows NSIS installer packaging enabled
  • Bug Fixes

    • More reliable selection capture (accessibility + clipboard) and safer Windows process handling
    • Improved text normalization and spacing around quotes
  • Documentation

    • Added Windows TTS process verification guide
  • Chores

    • CI/workflow upgrades and build target adjustments

@coderabbitai (Bot) commented Apr 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a vendored Kokoro ONNX TTS engine and runtime bundling; moves selection-capture into a new module with accessibility/clipboard strategies and new settings/commands; threads a bundled ONNX Runtime path into TTS initialization; expands UI settings; and updates multiple GitHub Actions and Windows NSIS packaging options.

Changes

Cohort / File(s) — Summary

• Workflows: CI action/version & packaging args
  Files: .github/workflows/.../build.yml, .../build-test.yml, .../pr-test-build.yml, .../release.yml, .../playwright.yml, .../lint.yml, .../prettier.yml, .../test.yml
  Bumped GitHub Action versions (e.g., actions/checkout v4→v5, upload-artifact v4→v7, github-script v7→v9). Reworked the Vulkan SDK cache/prepare flow and added ONNX Runtime download/extract steps. Enabled NSIS bundling (--bundles nsis) for Windows matrix entries.

• Tauri bundle config
  Files: src-tauri/tauri.conf.json
  Changed bundle.targets from "all" to the explicit array ["appimage","deb","dmg","nsis","rpm"].

• ONNX runtime resolution & TTS plumbing
  Files: src-tauri/src/lib.rs, src-tauri/src/managers/tts.rs, src-tauri/Cargo.toml
  Added resolve_bundled_onnxruntime() and threaded onnxruntime_path: Option<PathBuf> into initialization and TTSManager::new. Implemented a one-time ONNX init helper (OnceLock) and ensure_onnxruntime_initialized. Added the ort dependency and a [patch.crates-io] override for the vendored tts-rs.

• Selection capture refactor
  Files: src-tauri/src/selection.rs, src-tauri/src/actions.rs, src-tauri/src/settings.rs
  Moved selection capture into selection.rs, exposing pub fn capture_selected_text(app: &AppHandle) -> Option<String> with Auto/Accessibility/Clipboard strategies. Removed the prior in-file helpers and clipboard plugin usage; actions now call the new API. Added SelectionCaptureMethod and ClipboardHandling settings with defaults and serde compatibility.

• Settings commands, bindings & UI
  Files: src-tauri/src/shortcut/mod.rs, src/bindings.ts, src/stores/settingsStore.ts, src/components/settings/*, src/components/settings/advanced/AdvancedSettings.tsx
  Added Tauri commands and TS wrappers to change selection_capture_method, clipboard_handling, and model_unload_timeout. Introduced React components for selection-capture and clipboard-handling settings, updated ModelUnloadTimeout behavior, wired store updaters, and rendered the new controls in Advanced settings.

• Text normalization
  Files: src-tauri/src/text_normalization.rs
  Refined fragment-leading-whitespace handling, added quote-aware normalization utilities, updated spacing rules, and added unit tests for quote/apostrophe edge cases.

• Vendored tts-rs: crate scaffold & metadata
  Files: vendor/tts-rs/Cargo.toml, vendor/tts-rs/src/lib.rs, vendor/tts-rs/.cargo*, vendor/tts-rs/LICENSE, vendor/tts-rs/README.md, vendor/tts-rs/examples/kokoro.rs, .prettierignore
  Added the vendored tts-rs crate scaffold: SynthesisEngine trait, SynthesisResult helpers, license, README, example, .cargo metadata, and a .prettierignore update to skip vendored metadata.

• Vendored tts-rs: Kokoro engine implementation
  Files: vendor/tts-rs/src/engines/kokoro/mod.rs, .../engine.rs, .../model.rs, .../phonemizer.rs, .../voices.rs, .../vocab.rs
  Implemented the Kokoro engine: ONNX-based model loading/inference, voice .npz parsing, vocab loading, and an espeak-ng phonemizer. Key additions include model/session lifecycle, tensor dtype handling, chunking/crossfade, and phonemization piping (which suppresses the Windows console window), plus numerous public types and errors.

• Vendor maintenance files
  Files: vendor/tts-rs/*
  Added vendor maintenance files: .cargo_vcs_info.json, .gitignore, .cargo-ok, Cargo.toml.orig.

• Documentation
  Files: docs/windows-tts-process-test-plan.md
  Added a Windows-specific TTS process test plan documenting CREATE_NO_WINDOW expectations and manual/observability verification steps.

Sequence Diagram(s)

sequenceDiagram
    participant App as Tauri App
    participant Init as initialize_core_logic
    participant Resolver as resolve_bundled_onnxruntime
    participant TTS as TTSManager
    participant Loader as Model Loader Thread
    participant ORT as ONNX Runtime

    App->>Init: start/run(cli_args)
    Init->>Resolver: resolve_bundled_onnxruntime(app_handle)
    Resolver-->>Init: Option<PathBuf> (bundled onnx path)
    Init->>TTS: TTSManager::new(..., onnxruntime_path)
    TTS->>Loader: spawn model loader thread
    Loader->>ORT: ensure_onnxruntime_initialized(onnxruntime_path)
    ORT-->>Loader: init result (OnceLock)
    Loader->>TTS: load model / emit events or emit tts-error

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐇 I hopped through runtimes, bundled and bright,

Kokoro whispered in ONNX light,
Clipboard secrets, accessibility’s art,
New settings and voices take part,
A rabbit cheered — the TTS starts!

🚥 Pre-merge checks — ✅ 3 passed

• Description check — ✅ Passed (check skipped: CodeRabbit's high-level summary is enabled)
• Title check — ✅ Passed. The title 'fix(build): stabilize cross-platform tts packaging and ci' directly summarizes the main changes: stabilizing cross-platform TTS packaging and CI workflows through GitHub Actions version updates, NSIS bundler configuration, Vulkan SDK handling improvements, and ONNX Runtime bundling.
• Docstring coverage — ✅ Passed. Coverage is 100.00%, above the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai (Bot) left a comment


Actionable comments posted: 15

🧹 Nitpick comments (5)
src/components/settings/SelectionCaptureMethod.tsx (2)

26-49: Consider memoizing the options array.

Similar to SelectionClipboardHandling.tsx, the options array could be memoized since it depends on isMacOS and t.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/components/settings/SelectionCaptureMethod.tsx` around lines 26 - 49, The
options array in SelectionCaptureMethod depends on isMacOS and t and should be
memoized to avoid recreating it on each render; update the component
(SelectionCaptureMethod) to wrap the options creation in a useMemo that lists
[isMacOS, t] as dependencies so the array (and its objects with values "auto",
"accessibility", "clipboard") is only recomputed when those inputs change.

22-24: Consider memoizing the macOS detection.

The isMacOS check runs on every render but the user agent doesn't change. Consider moving this outside the component or memoizing it.

♻️ Optional improvement
// Move outside component or use useMemo
const isMacOS = typeof window !== "undefined" && /mac/i.test(window.navigator.userAgent || "");

// Or if you need it inside the component:
const isMacOS = useMemo(
  () => typeof window !== "undefined" && /mac/i.test(window.navigator.userAgent || ""),
  []
);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/components/settings/SelectionCaptureMethod.tsx` around lines 22 - 24, The
macOS detection assigned to isMacOS in SelectionCaptureMethod.tsx is recomputed
on every render; move the expression const isMacOS = typeof window !==
"undefined" && /mac/i.test(window.navigator.userAgent || ""); outside the
component (module scope) or wrap it in useMemo with an empty dependency array
inside the component and import useMemo if needed so the userAgent check runs
once and not on every render.
src/components/settings/SelectionClipboardHandling.tsx (1)

21-36: Consider memoizing the options array.

The options array is recreated on every render. While this is a minor performance concern, memoizing it with useMemo would be consistent with best practices for components wrapped in React.memo.

♻️ Optional improvement
+import React, { useMemo } from "react";
-import React from "react";
 // ...
 
-    const options = [
+    const options = useMemo(() => [
       {
         value: "dont_modify",
         label: t(
           "settings.advanced.clipboardHandling.options.dontModify",
           "Don't Modify Clipboard",
         ),
       },
       {
         value: "copy_to_clipboard",
         label: t(
           "settings.advanced.clipboardHandling.options.copyToClipboard",
           "Copy Selection To Clipboard",
         ),
       },
-    ];
+    ], [t]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/components/settings/SelectionClipboardHandling.tsx` around lines 21 - 36,
The options array in SelectionClipboardHandling is recreated on every render;
wrap its creation with React's useMemo to memoize it (e.g., const options =
useMemo(() => [ ... ], [t]) ) so it only recalculates when dependencies change;
reference the options variable and the i18n translator function t and ensure
SelectionClipboardHandling imports React/useMemo and uses the memoized options
where currently used.
src-tauri/src/lib.rs (1)

105-110: Hardcoded ONNX Runtime version in candidate filenames.

The version 1.23.1 is hardcoded in the candidate library names. When updating the bundled ONNX Runtime, these will need to be updated manually. Consider extracting this as a constant or relying solely on the fallback scan.

♻️ Consider version-agnostic approach

The fallback scan (lines 120-146) already handles version mismatches, so the versioned candidates primarily serve as an optimization for the common case. This is a reasonable trade-off, but documenting the version coupling would help future maintainers.

+// NOTE: Update these version numbers when upgrading the bundled ONNX Runtime
 #[cfg(target_os = "macos")]
 let candidates = ["libonnxruntime.dylib", "libonnxruntime.1.23.1.dylib"];
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src-tauri/src/lib.rs` around lines 105 - 110, The candidate library filenames
in lib.rs are hardcoding ONNX Runtime version "1.23.1" (the candidates array
used under #[cfg(target_os = "...")]), which forces manual updates when the
bundled runtime changes; update this by introducing a single version constant
(e.g., ONNX_RUNTIME_VERSION) and build the versioned candidate names from that
constant (or remove the versioned entries and rely only on the existing fallback
scan), then replace the current candidates arrays in the blocks referencing
candidates so future updates require changing only the constant (or none at all
if relying on the scan).
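A minimal sketch of the version-constant approach suggested above (the constant and helper names are illustrative, not code from lib.rs):

```rust
// Hypothetical sketch: derive the versioned candidate filenames from one
// constant, so upgrading the bundled ONNX Runtime touches a single line.
const ONNX_RUNTIME_VERSION: &str = "1.23.1";

// Candidate library names for the macOS case; other targets would build
// their own lists (e.g. onnxruntime.dll, libonnxruntime.so) the same way.
fn macos_candidates() -> [String; 2] {
    [
        "libonnxruntime.dylib".to_string(),
        format!("libonnxruntime.{ONNX_RUNTIME_VERSION}.dylib"),
    ]
}
```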
vendor/tts-rs/src/engines/kokoro/model.rs (1)

189-206: Consider reducing code duplication for speed tensor handling.

The if/else branches duplicate the entire input construction and session run, differing only in the speed tensor type.

♻️ Reduce duplication with a helper or macro

One option is to extract a helper that abstracts over the speed value type, or use a trait object. However, given the ORT API constraints and the relatively small duplication, the current approach is acceptable for clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vendor/tts-rs/src/engines/kokoro/model.rs` around lines 189 - 206, The two
branches duplicate inputs construction and session.run; instead, create the
speed tensor once and reuse it: match on self.speed_is_int32 to build a
TensorRef for "speed" (e.g. create speed_arr as ndarray::arr1(&[speed as i32])
or ndarray::arr1(&[speed]) and wrap with TensorRef::from_array_view), store that
TensorRef in a variable (e.g. speed_tensor), then build inputs using
self.tokens_input_name, "style", and the speed_tensor and call
self.session.run(inputs) once; reference symbols: self.speed_is_int32,
TensorRef::from_array_view, tokens_arr, style_view, self.tokens_input_name, and
self.session.run.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src-tauri/src/selection.rs`:
- Around line 90-106: The buffer sizing in unsafe fn cf_to_string is incorrect
because CFStringGetLength returns UTF-16 code units; replace the fixed len*4+1
calculation with a call to CFStringGetMaximumSizeForEncoding(len,
kCFStringEncodingUTF8) (or the equivalent constant in your bindings) plus one
for the NUL terminator to compute buf_size, allocate the Vec accordingly, call
CFStringGetCString as before, and keep CFRelease(ptr); this ensures cf_to_string
handles surrogate pairs and supplementary Unicode correctly.

In `@src-tauri/src/text_normalization.rs`:
- Around line 371-425: normalize_quote_spacing currently does repeated scans
over chars (prev_non_space/next_non_space and position/find), causing O(n²)
behavior; make it linear by precomputing nearest non-space character info
(indices or Option<char>) for every index before the main loop or by converting
the logic into a single-pass state machine. Specifically, in
normalize_quote_spacing compute arrays like nearest_non_space_left and
nearest_non_space_right (or equivalent index maps) once, then replace all uses
of chars[..].iter().rev().find(), chars[idx+1..].iter().position(), and
chars[..].iter().rev().copied().find() with O(1) lookups into those arrays so
the existing checks that call is_quote_char, is_opening_quote, and
should_trim_space_before_opening_quote remain unchanged but run in linear time.
- Around line 427-442: The branch that inserts a missing space before opening
quotes only checks smart single quotes (matches!(ch, '‘' | '’'))—add smart
double quote and straight double quote to that match (e.g., include '“' and '"'
in the matches! set) so the same logic applies to opening double quotes; also
extend the replacement chain on out.replace(...) to include the
colon-to-opening-double-quote cases (e.g., add .replace(":“", ": “") and
optionally .replace(":\"", ": \"") ) so sequences like “:“quoted”” or : "quoted"
get the missing space inserted.
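The linear-time precomputation described in the first comment can be sketched as follows (function and variable names are illustrative, not the actual text_normalization.rs code): two linear passes record, for each index, the nearest non-space character on either side, turning the repeated rescans into O(1) lookups.

```rust
// Hypothetical sketch: precompute, for every position, the nearest non-space
// character to the left and to the right. The quote-spacing rules can then
// consult neighbours in O(1) instead of rescanning the slice per character.
fn nearest_non_space(chars: &[char]) -> (Vec<Option<char>>, Vec<Option<char>>) {
    let n = chars.len();

    // Forward pass: left[i] is the nearest non-space char strictly before i.
    let mut left = vec![None; n];
    let mut prev = None;
    for i in 0..n {
        left[i] = prev;
        if chars[i] != ' ' {
            prev = Some(chars[i]);
        }
    }

    // Backward pass: right[i] is the nearest non-space char strictly after i.
    let mut right = vec![None; n];
    let mut next = None;
    for i in (0..n).rev() {
        right[i] = next;
        if chars[i] != ' ' {
            next = Some(chars[i]);
        }
    }

    (left, right)
}
```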

In `@vendor/tts-rs/.cargo_vcs_info.json`:
- Around line 1-6: The vendored JSON metadata (keys "git", "sha1",
"path_in_vcs") is failing Prettier; fix by either adding the expected formatting
(ensure the JSON ends with a newline and matches Prettier style) in this file or
update Prettier configuration to ignore vendored cargo metadata (add a rule in
.prettierignore to exclude files like vendor/**/.cargo_vcs_info.json or
vendor/**/.cargo_vcs_info*). Modify the file to include the trailing newline and
valid JSON formatting if choosing to edit the file, otherwise add the ignore
pattern to the project's Prettier ignore/config so CI stops checking these
generated vendor metadata files.

In `@vendor/tts-rs/Cargo.toml`:
- Around line 60-68: The ort dependency is pinned to "2.0.0-rc.11" in
Cargo.toml; update the version string for the [dependencies.ort] entry to
"2.0.0-rc.12" if you want the latest RC, or alternatively evaluate production
readiness and pin to a stable release (or a known-good 1.x) instead—ensure you
update the version value for the ort dependency and run cargo update + cargo
test to validate compatibility with features ("std","ndarray","load-dynamic")
and optional/default-features settings used in the current manifest.

In `@vendor/tts-rs/examples/kokoro.rs`:
- Around line 4-7: The example imports from the wrong crate name: replace the
non-existent "transcribe_rs" path with the vendored crate name "tts_rs" wherever
KokoroEngine, KokoroInferenceParams, KokoroModelParams and SynthesisEngine are
imported (check the use statement that references
KokoroEngine/KokoroInferenceParams/KokoroModelParams and SynthesisEngine) so the
example compiles against the actual library defined in Cargo.toml.

In `@vendor/tts-rs/README.md`:
- Line 1: The README.md in the tts-rs package is failing Prettier; run the
repository's formatter on that file (e.g., execute prettier --write README.md or
your project's formatting script) to normalize markdown (wrapping, spacing, line
endings) so it passes prettier --check ., then commit the formatted README.md to
unblock CI.
- Around line 15-18: The README shows the wrong dependency version and an
outdated synthesize call signature; update the toml dependency version from
2026.2.1 to 2026.2.3 and change the example call using engine.synthesize so it
passes a single Option<SynthesisParams> as the second argument (e.g., build a
SynthesisParams with the desired voice "af_heart" and pass Some(params) instead
of two separate args), referencing the synthesize function and the
SynthesisParams struct in the example.

In `@vendor/tts-rs/src/engines/kokoro/mod.rs`:
- Around line 52-62: The doc examples import the wrong crate name; update the
example imports to use the actual crate/lib name (tts_rs or tts-rs as used by
the crate) instead of transcribe_rs so doctests compile; specifically change the
use statements that reference transcribe_rs::{SynthesisEngine,
engines::kokoro::{KokoroEngine, KokoroInferenceParams}} to use tts_rs (matching
the crate's lib name) in both occurrences surrounding the KokoroEngine and
KokoroInferenceParams examples.

In `@vendor/tts-rs/src/engines/kokoro/phonemizer.rs`:
- Around line 30-44: The function voice_lang uses byte slicing
(&voice[..voice.len().min(2)]) which can panic on multi-byte UTF‑8 names; change
the prefix extraction in voice_lang to be character-safe (e.g., take the first
two chars using voice.chars().take(2).collect::<String>() or an equivalent
char-indexed slice) and then match on prefix.as_str() (or otherwise convert to
&str) instead of the current byte-slice; keep the rest of the match arms and
returned &'static str values unchanged.
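A minimal sketch of the char-safe fix (the match arms here are illustrative; the real mapping lives in phonemizer.rs):

```rust
// Hypothetical sketch: take the first two *characters* rather than slicing
// the first two bytes, so a multi-byte UTF-8 voice name can never panic by
// splitting the string mid-codepoint.
fn voice_lang(voice: &str) -> &'static str {
    let prefix: String = voice.chars().take(2).collect();
    match prefix.as_str() {
        "af" | "am" => "en-us",
        "bf" | "bm" => "en-gb",
        _ => "en-us", // illustrative fallback; the real arms are in phonemizer.rs
    }
}
```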

In `@vendor/tts-rs/src/engines/kokoro/vocab.rs`:
- Around line 22-31: The loop that constructs the vocab map currently truncates
multi-character keys by taking only the first char (in the for (k, v) in
vocab_obj block), which can overwrite entries; update the validation in that
loop to reject any key whose character count is not exactly one (e.g., check
k.chars().count() != 1) and return a KokoroError::Config with a clear message
like "Vocab key must be a single character" including the offending key, keeping
the existing checks for empty keys and non-integer values (the error should be
raised before inserting into map and before parsing v with as_i64), so
map.insert(ch, id) only runs for valid single-character keys.
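The single-character validation can be sketched like this (helper name and error type are illustrative, not the vocab.rs code):

```rust
use std::collections::HashMap;

// Hypothetical sketch: reject any vocab key that is not exactly one character
// instead of silently truncating to the first char, which could overwrite an
// earlier map entry.
fn insert_vocab_key(map: &mut HashMap<char, i64>, k: &str, id: i64) -> Result<(), String> {
    let mut it = k.chars();
    match (it.next(), it.next()) {
        // Exactly one character: safe to insert.
        (Some(ch), None) => {
            map.insert(ch, id);
            Ok(())
        }
        // Empty or multi-character key: refuse with a clear message.
        _ => Err(format!("Vocab key must be a single character: {k:?}")),
    }
}
```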

In `@vendor/tts-rs/src/engines/kokoro/voices.rs`:
- Around line 34-43: The voice name extraction currently uses
raw_name.trim_end_matches('/') and trim_end_matches(".npy") which leaves
directory prefixes for nested entries; update the logic around
entry.name().to_string() and voice_name so you first get the path basename
(e.g., via Path::new(&raw_name).file_name()) and then strip the ".npy" extension
(or validate the extension) to derive the voice_name; keep the existing guard
that skips empty names or directory entries (raw_name ending with '/'), and
reference raw_name, voice_name, and entry.name().to_string() when making the
change.
- Around line 61-69: get_style may panic when a found voice has an empty styles
vector because clamping uses saturating_sub(1) then indexing; guard by checking
if styles.is_empty() after retrieving with self.voices.get(voice) and return an
Err instead of indexing: e.g., return
Err(KokoroError::NoStyles(voice.to_string())) or add a suitable KokoroError
variant (update the KokoroError enum and any callers), otherwise compute clamped
= idx.min(styles.len() - 1) and return styles[clamped].
- Around line 97-106: The parser reads header_len from bytes 8..9 assuming numpy
1.0 but never checks the version bytes at positions 6..7; update the logic
(around header_len, data_offset, and the KokoroError::VoiceParse check) to first
read the version bytes (data[6], data[7]) and branch: if major==1 read a 2-byte
little-endian u16 header_len from data[8..10], if major==2 read a 4-byte
little-endian u32 header_len from data[8..12] (cast to usize), then set
data_offset = 10 + header_len for v1 and data_offset = 12 + header_len for v2;
add bounds checks to ensure data has those version/header bytes before reading
and return a clear KokoroError::VoiceParse on unsupported/unknown major versions
or truncated header.
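The version-aware header parsing can be sketched as follows (a standalone illustration, not the voices.rs code; it returns the computed data offset under the assumptions stated in the comment above):

```rust
// Hypothetical sketch of version-aware .npy header parsing: numpy format
// v1.x stores a 2-byte little-endian header length at offset 8, while
// v2.x/v3.x store a 4-byte little-endian length; the array data begins
// immediately after the header text.
fn npy_data_offset(data: &[u8]) -> Result<usize, String> {
    if data.len() < 10 || &data[..6] != b"\x93NUMPY" {
        return Err("truncated or invalid .npy magic".into());
    }
    let major = data[6];
    match major {
        1 => {
            let header_len = u16::from_le_bytes([data[8], data[9]]) as usize;
            Ok(10 + header_len)
        }
        2 | 3 => {
            if data.len() < 12 {
                return Err("truncated v2/v3 .npy header".into());
            }
            let header_len =
                u32::from_le_bytes([data[8], data[9], data[10], data[11]]) as usize;
            Ok(12 + header_len)
        }
        v => Err(format!("unsupported .npy major version {v}")),
    }
}
```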

In `@vendor/tts-rs/src/lib.rs`:
- Around line 62-66: The duration_secs method can divide by zero if sample_rate
== 0; update the duration_secs(&self) function to guard against a zero
sample_rate (e.g., if self.sample_rate == 0 { return 0.0 } or otherwise handle
gracefully) so that calculating self.samples.len() as f64 / self.sample_rate as
f64 never produces infinity/NaN; locate and modify the duration_secs method on
the struct that holds samples and sample_rate to implement this check and return
a sensible default when sample_rate is zero.
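A minimal sketch of the guard (field types are assumptions; the crate's actual SynthesisResult may differ):

```rust
// Hypothetical sketch: guard the duration calculation against a zero
// sample rate so it returns 0.0 instead of infinity or NaN.
struct SynthesisResult {
    samples: Vec<f32>, // assumed element type
    sample_rate: u32,
}

impl SynthesisResult {
    fn duration_secs(&self) -> f64 {
        if self.sample_rate == 0 {
            return 0.0; // sensible default for an uninitialized result
        }
        self.samples.len() as f64 / self.sample_rate as f64
    }
}
```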

---

Nitpick comments:
In `@src-tauri/src/lib.rs`:
- Around line 105-110: The candidate library filenames in lib.rs are hardcoding
ONNX Runtime version "1.23.1" (the candidates array used under #[cfg(target_os =
"...")]), which forces manual updates when the bundled runtime changes; update
this by introducing a single version constant (e.g., ONNX_RUNTIME_VERSION) and
build the versioned candidate names from that constant (or remove the versioned
entries and rely only on the existing fallback scan), then replace the current
candidates arrays in the blocks referencing candidates so future updates require
changing only the constant (or none at all if relying on the scan).

In `@src/components/settings/SelectionCaptureMethod.tsx`:
- Around line 26-49: The options array in SelectionCaptureMethod depends on
isMacOS and t and should be memoized to avoid recreating it on each render;
update the component (SelectionCaptureMethod) to wrap the options creation in a
useMemo that lists [isMacOS, t] as dependencies so the array (and its objects
with values "auto", "accessibility", "clipboard") is only recomputed when those
inputs change.
- Around line 22-24: The macOS detection assigned to isMacOS in
SelectionCaptureMethod.tsx is recomputed on every render; move the expression
const isMacOS = typeof window !== "undefined" &&
/mac/i.test(window.navigator.userAgent || ""); outside the component (module
scope) or wrap it in useMemo with an empty dependency array inside the component
and import useMemo if needed so the userAgent check runs once and not on every
render.

In `@src/components/settings/SelectionClipboardHandling.tsx`:
- Around line 21-36: The options array in SelectionClipboardHandling is
recreated on every render; wrap its creation with React's useMemo to memoize it
(e.g., const options = useMemo(() => [ ... ], [t]) ) so it only recalculates
when dependencies change; reference the options variable and the i18n translator
function t and ensure SelectionClipboardHandling imports React/useMemo and uses
the memoized options where currently used.

In `@vendor/tts-rs/src/engines/kokoro/model.rs`:
- Around line 189-206: The two branches duplicate inputs construction and
session.run; instead, create the speed tensor once and reuse it: match on
self.speed_is_int32 to build a TensorRef for "speed" (e.g. create speed_arr as
ndarray::arr1(&[speed as i32]) or ndarray::arr1(&[speed]) and wrap with
TensorRef::from_array_view), store that TensorRef in a variable (e.g.
speed_tensor), then build inputs using self.tokens_input_name, "style", and the
speed_tensor and call self.session.run(inputs) once; reference symbols:
self.speed_is_int32, TensorRef::from_array_view, tokens_arr, style_view,
self.tokens_input_name, and self.session.run.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4c10f6a9-e3df-4dee-98a4-9678c91d222b

📥 Commits

Reviewing files that changed from the base of the PR and between 528afa0 and 73bfc2b.

⛔ Files ignored due to path filters (6)
  • src-tauri/Cargo.lock is excluded by !**/*.lock
  • vendor/tts-rs/output.wav is excluded by !**/*.wav
  • vendor/tts-rs/samples/dots.wav is excluded by !**/*.wav
  • vendor/tts-rs/samples/jfk.wav is excluded by !**/*.wav
  • vendor/tts-rs/samples/product_names.wav is excluded by !**/*.wav
  • vendor/tts-rs/samples/russian.wav is excluded by !**/*.wav
📒 Files selected for processing (41)
  • .github/workflows/build-test.yml
  • .github/workflows/build.yml
  • .github/workflows/lint.yml
  • .github/workflows/playwright.yml
  • .github/workflows/pr-test-build.yml
  • .github/workflows/prettier.yml
  • .github/workflows/release.yml
  • .github/workflows/test.yml
  • docs/windows-tts-process-test-plan.md
  • src-tauri/Cargo.toml
  • src-tauri/src/actions.rs
  • src-tauri/src/lib.rs
  • src-tauri/src/managers/tts.rs
  • src-tauri/src/selection.rs
  • src-tauri/src/settings.rs
  • src-tauri/src/shortcut/mod.rs
  • src-tauri/src/text_normalization.rs
  • src-tauri/tauri.conf.json
  • src/bindings.ts
  • src/components/settings/ModelUnloadTimeout.tsx
  • src/components/settings/SelectionCaptureMethod.tsx
  • src/components/settings/SelectionClipboardHandling.tsx
  • src/components/settings/advanced/AdvancedSettings.tsx
  • src/stores/settingsStore.ts
  • vendor/tts-rs/.cargo-ok
  • vendor/tts-rs/.cargo/config.toml
  • vendor/tts-rs/.cargo_vcs_info.json
  • vendor/tts-rs/.gitignore
  • vendor/tts-rs/Cargo.toml
  • vendor/tts-rs/Cargo.toml.orig
  • vendor/tts-rs/LICENSE
  • vendor/tts-rs/README.md
  • vendor/tts-rs/examples/kokoro.rs
  • vendor/tts-rs/src/engines/kokoro/engine.rs
  • vendor/tts-rs/src/engines/kokoro/mod.rs
  • vendor/tts-rs/src/engines/kokoro/model.rs
  • vendor/tts-rs/src/engines/kokoro/phonemizer.rs
  • vendor/tts-rs/src/engines/kokoro/vocab.rs
  • vendor/tts-rs/src/engines/kokoro/voices.rs
  • vendor/tts-rs/src/engines/mod.rs
  • vendor/tts-rs/src/lib.rs

@rishiskhare (Owner, Author) commented:

@coderabbitai autofix

@coderabbitai (Bot) commented Apr 12, 2026

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

Fixes Applied Successfully

Fixed 13 file(s) based on 15 unresolved review comments.

Files modified:

  • .prettierignore
  • src-tauri/Cargo.lock
  • src-tauri/Cargo.toml
  • src-tauri/src/selection.rs
  • src-tauri/src/text_normalization.rs
  • vendor/tts-rs/Cargo.toml
  • vendor/tts-rs/README.md
  • vendor/tts-rs/examples/kokoro.rs
  • vendor/tts-rs/src/engines/kokoro/mod.rs
  • vendor/tts-rs/src/engines/kokoro/phonemizer.rs
  • vendor/tts-rs/src/engines/kokoro/vocab.rs
  • vendor/tts-rs/src/engines/kokoro/voices.rs
  • vendor/tts-rs/src/lib.rs

Commit: 4e2e5f07f798b16395e44d1ce0555a65a394b11a

The changes have been pushed to the pr/windows-build-clean branch.

Time taken: 10m 51s


Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
@coderabbitai (Bot) left a comment


Actionable comments posted: 5

♻️ Duplicate comments (1)
vendor/tts-rs/src/engines/kokoro/phonemizer.rs (1)

208-210: ⚠️ Potential issue | 🟠 Major

Preserve the existing LD_LIBRARY_PATH when adding the bundled directory.

This replaces the parent process's library search path outright. If the app was already launched with required entries in LD_LIBRARY_PATH, the bundled espeak invocation can fail even though those paths were valid before. Prepend/append bin_dir instead of overwriting it.

💡 Suggested fix
     #[cfg(target_os = "linux")]
     if let Some(bin_dir) = espeak.bin_path.as_deref().and_then(|p| p.parent()) {
-        cmd.env("LD_LIBRARY_PATH", bin_dir);
+        let library_path = if let Some(existing) = std::env::var_os("LD_LIBRARY_PATH") {
+            let mut paths = vec![bin_dir.as_os_str().to_owned()];
+            paths.extend(std::env::split_paths(&existing).map(|p| p.into_os_string()));
+            std::env::join_paths(paths).map_err(|e| {
+                KokoroError::Config(format!("Invalid LD_LIBRARY_PATH: {e}"))
+            })?
+        } else {
+            bin_dir.as_os_str().to_owned()
+        };
+        cmd.env("LD_LIBRARY_PATH", library_path);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vendor/tts-rs/src/engines/kokoro/phonemizer.rs` around lines 208 - 210, The
current code overwrites LD_LIBRARY_PATH with only the espeak parent dir; change
it to preserve the existing value by reading the parent process env
(std::env::var_os("LD_LIBRARY_PATH")), and set cmd.env("LD_LIBRARY_PATH",
combined) where combined is bin_dir prepended (or appended) to the existing
value separated by ':' (use bin_dir.as_os_str() and join with the existing
OsString safely); keep the same conditional block around espeak.bin_path and use
cmd.env rather than env_remove so prior library paths are retained.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src-tauri/src/selection.rs`:
- Line 139: The code currently treats any failure of clipboard.read_text() as
None (previous_clipboard) and later clears the clipboard when previous_clipboard
is None, which can destructively erase unreadable/non-text clipboard contents
(especially in DontModify mode); change the logic so previous_clipboard is only
None when there truly was no clipboard content and NOT simply when read_text()
failed—capture read_text() success into Some(text) and on failure preserve a
sentinel meaning "unreadable" (or use a tri-state) and when restoring only call
clipboard.set_text(...) if previous_clipboard is Some(text); ensure the
DontModify/path that currently clears on None instead skips any write when the
previous content was unreadable or read failed, updating the variables/branches
around previous_clipboard and the restore/clear code to only clear or set_text
when you have a concrete String from clipboard.read_text().
- Around line 154-165: The code uses fallible operations
(app.try_state::<EnigoState>()? and .lock().ok()? ) after priming the clipboard,
so those early returns can skip restore_clipboard; change those ?/ok() usages to
explicit matches that on Err/None call restore_clipboard(&clipboard,
previous_clipboard.as_ref()) and then return None. Specifically, in the block
using EnigoState and send_copy_ctrl_c, replace app.try_state::<EnigoState>()?
and let mut enigo = enigo_state.0.lock().ok()? with match/if let forms that
restore the clipboard before returning, keeping the existing send_copy_ctrl_c
error path intact.

In `@src-tauri/src/text_normalization.rs`:
- Around line 510-512: The match arm in normalize_quote_spacing() currently
treats '"' and '”' as requiring a separator after an alphanumeric left (pattern
(_, Some(left), Some(right @ ('"' | '”' | '’'))) ...), which forces an extra
space before closing quotes; update that arm to only match the curly apostrophe
'’' (e.g. (_, Some(left), Some(right @ ('’'))) if left.is_alphanumeric() => { /*
same logic */ }) so that '”' and '"' are no longer treated as needing a
separator and closing quotes won't get an inserted space.
- Around line 464-469: The global .replace call on out that does
.replace("'and", "' and") is too broad and corrupts tokens like 'and' or words
like android; update the post-pass so you only insert a space when 'and' is a
standalone quoted token or token boundary. Replace that chained .replace("'and",
"' and") with a contextual check over out (either a small manual scan that
checks surrounding chars or a targeted regex) that only matches "'and" when the
following character is a word-boundary or a closing quote/whitespace/punctuation
(e.g., match patterns like "'and'" or "'and " or start-of-string cases) and
leave occurrences inside larger words (like "android") untouched; modify the
code operating on the out variable where the replace chain appears to use that
contextual replacement.
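A minimal standalone sketch of that contextual replacement (the function name `fix_quoted_and` is hypothetical; the real code operates on the `out` variable in text_normalization.rs):

```rust
// Rewrite "'and" to "' and" only at word boundaries, so tokens like
// "'and'" are fixed while words like "'android" are left untouched.
fn fix_quoted_and(input: &str) -> String {
    let mut out = String::with_capacity(input.len() + 4);
    let mut i = 0;
    while i < input.len() {
        let rest = &input[i..];
        if rest.starts_with("'and") {
            // Only rewrite when the character after "and" is a boundary.
            let next = rest["'and".len()..].chars().next();
            let at_boundary = next.map_or(true, |c| !c.is_alphanumeric());
            if at_boundary {
                out.push_str("' and");
                i += "'and".len();
                continue;
            }
        }
        let ch = rest.chars().next().expect("non-empty remainder");
        out.push(ch);
        i += ch.len_utf8();
    }
    out
}

fn main() {
    assert_eq!(fix_quoted_and("cats 'and dogs"), "cats ' and dogs");
    assert_eq!(fix_quoted_and("the 'android app"), "the 'android app");
    println!("ok");
}
```

A manual scan like this avoids pulling in a regex dependency for a single post-pass.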

In `@vendor/tts-rs/src/engines/kokoro/phonemizer.rs`:
- Around line 112-118: The loop over text.char_indices() treats '\r' and '\n'
separately causing two boundary tokens for CRLF; modify the block around
map_boundary_punctuation / is_numeric_connector_between_digits so that when ch
== '\r' and the next character is '\n' you treat them as one boundary:
consume/skip the '\r' and handle only the combined CRLF as a single punctuation
token (ensure you account for combined length when calling
is_numeric_connector_between_digits and when calling flush_text_part/parts.push
so only one TextPart::Punct is produced). Make this change inside the same for
(idx, ch) in text.char_indices() loop using the existing helpers
(map_boundary_punctuation, is_numeric_connector_between_digits, flush_text_part,
parts, current).
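The CRLF-collapsing idea can be sketched independently of the real splitter (token types are simplified here; the actual code emits `TextPart::Punct` via `map_boundary_punctuation`):

```rust
// Sketch: collapse "\r\n" into a single boundary token while scanning.
// Boundaries are represented as "." strings for brevity.
fn split_on_newlines(text: &str) -> Vec<String> {
    let mut parts = Vec::new();
    let mut current = String::new();
    let mut skip_lf = false; // set after '\r' so the '\n' of a CRLF pair is consumed
    for ch in text.chars() {
        if skip_lf {
            skip_lf = false;
            if ch == '\n' {
                continue; // second half of CRLF: boundary already emitted
            }
        }
        if ch == '\r' || ch == '\n' {
            if !current.is_empty() {
                parts.push(std::mem::take(&mut current));
            }
            parts.push(".".to_string()); // one boundary token per line break
            skip_lf = ch == '\r';
        } else {
            current.push(ch);
        }
    }
    if !current.is_empty() {
        parts.push(current);
    }
    parts
}

fn main() {
    assert_eq!(split_on_newlines("Hello\r\nworld"), vec!["Hello", ".", "world"]);
    assert_eq!(split_on_newlines("a\nb"), vec!["a", ".", "b"]);
    println!("ok");
}
```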

---

Duplicate comments:
In `@vendor/tts-rs/src/engines/kokoro/phonemizer.rs`:
- Around line 208-210: The current code overwrites LD_LIBRARY_PATH with only the
espeak parent dir; change it to preserve the existing value by reading the
parent process env (std::env::var_os("LD_LIBRARY_PATH")), and set
cmd.env("LD_LIBRARY_PATH", combined) where combined is bin_dir prepended (or
appended) to the existing value separated by ':' (use bin_dir.as_os_str() and
join with the existing OsString safely); keep the same conditional block around
espeak.bin_path and use cmd.env rather than env_remove so prior library paths
are retained.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 84504253-c954-43fb-94e4-0b039134beae

📥 Commits

Reviewing files that changed from the base of the PR and between 73bfc2b and 4e2e5f0.

⛔ Files ignored due to path filters (1)
  • src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • .prettierignore
  • src-tauri/Cargo.toml
  • src-tauri/src/selection.rs
  • src-tauri/src/text_normalization.rs
  • vendor/tts-rs/Cargo.toml
  • vendor/tts-rs/README.md
  • vendor/tts-rs/examples/kokoro.rs
  • vendor/tts-rs/src/engines/kokoro/mod.rs
  • vendor/tts-rs/src/engines/kokoro/phonemizer.rs
  • vendor/tts-rs/src/engines/kokoro/vocab.rs
  • vendor/tts-rs/src/engines/kokoro/voices.rs
  • vendor/tts-rs/src/lib.rs
✅ Files skipped from review due to trivial changes (5)
  • .prettierignore
  • src-tauri/Cargo.toml
  • vendor/tts-rs/examples/kokoro.rs
  • vendor/tts-rs/Cargo.toml
  • vendor/tts-rs/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • vendor/tts-rs/src/engines/kokoro/mod.rs
  • vendor/tts-rs/src/lib.rs

Repository owner deleted a comment from chatgpt-codex-connector Bot Apr 12, 2026
@rishiskhare
Owner Author

@CodeRabbit autofix the rest of the issues!

@coderabbitai

coderabbitai Bot commented Apr 12, 2026

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

Fixes Applied Successfully

Fixed 3 file(s) based on 5 unresolved review comments.

Files modified:

  • src-tauri/src/selection.rs
  • src-tauri/src/text_normalization.rs
  • vendor/tts-rs/src/engines/kokoro/phonemizer.rs

Commit: 31a2ca401b32e614ec8595e90c9fe12422bf74e7

The changes have been pushed to the pr/windows-build-clean branch.

Time taken: 11m 1s

@coderabbitai

coderabbitai Bot commented Apr 12, 2026

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

⚠️ Branch updated during autofix.

The branch was updated while autofix was in progress. Please try again.

Fixed 3 file(s) based on 5 unresolved review comments.

Co-authored-by: CodeRabbit <noreply@coderabbit.ai>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src-tauri/src/text_normalization.rs (1)

212-243: ⚠️ Potential issue | 🟠 Major

Track trailing-fragment whitespace instead of guessing before ’.

had_leading_whitespace only preserves separators when the incoming fragment starts with whitespace. When the separator was trimmed off the previous fragment, both join sites fall back to needs_space_between(), and the Some(right @ '’') branch then turns joins like don + ’t or user + ’s into don ’t / user ’s. Please carry a trailing-whitespace bit from the previous raw fragment (or per-buffer state) and use that to decide whether a separator existed, rather than inferring it from the apostrophe shape.

Suggested direction
  • capture had_trailing_whitespace before calling normalize_inline_whitespace()
  • store that bit alongside SpeechTextRenderer.output, LinkContext.text, and ImageContext.alt_text
  • consult that state before needs_space_between(...)
  • drop the unconditional “incoming ’ after alnum means insert a space” heuristic
  • add a regression case for a split fragment such as don + ’t

Also applies to: 324-342, 527-543

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src-tauri/src/text_normalization.rs` around lines 212 - 243, The push_text
logic uses had_leading_whitespace only and then falls back to
needs_space_between which mis-detects joins before right single-quote (’);
capture a had_trailing_whitespace boolean from the incoming raw fragment before
calling normalize_inline_whitespace() and store that trailing-whitespace bit
alongside the buffer state for SpeechTextRenderer.output, LinkContext.text, and
ImageContext.alt_text (i.e. where append_segment appends); when deciding to
insert a separator consult the previous buffer's had_trailing_whitespace rather
than inferring from the upcoming apostrophe, remove the special-case “incoming ’
after alnum => insert space” heuristic in needs_space_between usage, and add a
regression test covering a split fragment like "don" + "’t" to ensure no
spurious space is inserted.
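The trailing-whitespace bit the comment asks for can be sketched with a toy buffer (struct and field names are illustrative, not the actual SpeechTextRenderer):

```rust
// Carry a trailing-whitespace flag between fragments so the join decision
// no longer depends on the shape of the upcoming apostrophe.
struct Buffer {
    output: String,
    had_trailing_whitespace: bool,
}

impl Buffer {
    fn new() -> Self {
        Buffer { output: String::new(), had_trailing_whitespace: false }
    }

    fn push_fragment(&mut self, raw: &str) {
        let had_leading = raw.starts_with(|c: char| c.is_whitespace());
        let had_trailing = raw.ends_with(|c: char| c.is_whitespace());
        let trimmed = raw.trim();
        if trimmed.is_empty() {
            // A purely-whitespace fragment still records that a separator existed.
            self.had_trailing_whitespace = self.had_trailing_whitespace || !raw.is_empty();
            return;
        }
        if !self.output.is_empty() && (had_leading || self.had_trailing_whitespace) {
            self.output.push(' ');
        }
        self.output.push_str(trimmed);
        self.had_trailing_whitespace = had_trailing;
    }
}

fn main() {
    let mut b = Buffer::new();
    b.push_fragment("don");
    b.push_fragment("’t");
    assert_eq!(b.output, "don’t"); // no spurious space at the split

    let mut b = Buffer::new();
    b.push_fragment("hello ");
    b.push_fragment("world");
    assert_eq!(b.output, "hello world"); // separator survives trimming
    println!("ok");
}
```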
♻️ Duplicate comments (1)
src-tauri/src/selection.rs (1)

149-155: ⚠️ Potential issue | 🟠 Major

DontModify still overwrites unreadable clipboard contents.

Line 149 collapses every read_text() failure into ClipboardState::Unreadable, but Line 166 still primes the clipboard. Because restore_clipboard() intentionally no-ops for Unreadable, the default DontModify path leaves image/binary clipboard contents replaced by the sentinel or copied text. ClipboardState::Empty is also never constructed, so the clear branch is currently dead.

🛠️ Suggested fix
     let previous_clipboard = match clipboard.read_text() {
-        Ok(text) => ClipboardState::Text(text),
-        Err(_) => {
-            // Try to determine if clipboard has content but is unreadable (e.g., image)
-            // vs. truly empty. Since we can't reliably distinguish, assume unreadable.
-            ClipboardState::Unreadable
-        }
+        Ok(text) if text.is_empty() => ClipboardState::Empty,
+        Ok(text) => ClipboardState::Text(text),
+        Err(err) => {
+            debug!("Unable to read existing clipboard text: {}", err);
+            if handling == ClipboardHandling::DontModify {
+                warn!("Clipboard is unreadable; skipping capture in DontModify mode");
+                return None;
+            }
+            ClipboardState::Unreadable
+        }
     };

Also applies to: 166-213, 222-237

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src-tauri/src/selection.rs` around lines 149 - 155, The code collapses all
read_text() failures into ClipboardState::Unreadable and never produces
ClipboardState::Empty, causing DontModify paths to prime/overwrite unreadable
clipboards; change the read_text handling in the block that sets
previous_clipboard so that: when read_text() returns Ok(text) and
text.is_empty() set previous_clipboard = ClipboardState::Empty, when
Ok(non-empty) set ClipboardState::Text(text), and when Err keep
ClipboardState::Unreadable; then update the logic that primes/sets the clipboard
(the code that runs before calling restore_clipboard() and any DontModify
handling) to only write the sentinel or copied text when previous_clipboard ==
ClipboardState::Text or previous_clipboard == ClipboardState::Empty (i.e., never
prime when previous_clipboard == ClipboardState::Unreadable). Ensure
restore_clipboard() behavior stays unchanged for Unreadable.
🧹 Nitpick comments (2)
src-tauri/src/selection.rs (1)

197-204: Poll for clipboard change instead of sleeping once.

A single 120 ms sleep makes this path timing-dependent. On slower apps or remote sessions the clipboard can update after the read, which turns a successful copy into a spurious None.

♻️ Suggested approach
-    std::thread::sleep(std::time::Duration::from_millis(CLIPBOARD_COPY_DELAY_MS));
-
-    let copied_text = clipboard.read_text().ok();
+    let deadline =
+        std::time::Instant::now() + std::time::Duration::from_millis(CLIPBOARD_COPY_DELAY_MS);
+    let copied_text = loop {
+        let text = clipboard.read_text().ok();
+        if text.as_deref() != Some(sentinel.as_str()) || std::time::Instant::now() >= deadline {
+            break text;
+        }
+        std::thread::sleep(std::time::Duration::from_millis(15));
+    };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src-tauri/src/selection.rs` around lines 197 - 204, Replace the single
sleep-based wait with a short polling loop that repeatedly calls
clipboard.read_text() and compares the result to the sentinel until it changes
or a timeout elapses (use CLIPBOARD_COPY_DELAY_MS as the total timeout and a
small per-iteration interval); update the logic around clipboard.read_text(),
copied_text, and captured so the code returns the first non-empty, trimmed,
non-sentinel value when observed or None after timeout, ensuring you preserve
the sentinel comparison and string trimming/map-to-owned behavior.
vendor/tts-rs/src/engines/kokoro/phonemizer.rs (1)

316-355: Add a CRLF regression test.

The implementation now collapses \r\n into a single TextPart::Punct('.'), but the split tests still only cover plain punctuation and numeric separators. A small Hello\r\nworld assertion would keep that Windows-specific fix from regressing.

🧪 Suggested test
     #[test]
     fn still_splits_comma_when_not_between_digits() {
         let parts = split_text_parts("Value 2, next");
         assert_eq!(
             parts,
             vec![
                 TextPart::Text("Value 2".to_string()),
                 TextPart::Punct(','),
                 TextPart::Text("next".to_string()),
             ]
         );
     }
+
+    #[test]
+    fn collapses_crlf_into_single_boundary() {
+        let parts = split_text_parts("Hello\r\nworld");
+        assert_eq!(
+            parts,
+            vec![
+                TextPart::Text("Hello".to_string()),
+                TextPart::Punct('.'),
+                TextPart::Text("world".to_string()),
+            ]
+        );
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vendor/tts-rs/src/engines/kokoro/phonemizer.rs` around lines 316 - 355, Add a
new unit test that ensures split_text_parts treats Windows CRLF ("\r\n") as the
single punctuation produced by the current implementation: call
split_text_parts("Hello\r\nworld") and assert the result equals
vec![TextPart::Text("Hello".to_string()), TextPart::Punct('.'),
TextPart::Text("world".to_string())]; name the test something like
splits_crlf_into_single_punct and place it alongside the other tests for
split_text_parts.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src-tauri/src/text_normalization.rs`:
- Around line 503-516: The issue: straight apostrophes produced by
decode_html_entity("apos") are not recognized as quotes, so update the quote
helpers to treat '\'' as a quote: add '\'' to the match arms in is_opening_quote
(treat it like '‘'/'’' handling) and include '\'' in is_quote_char's matches so
the quote-tightening pass recognizes &apos; as a quote; alternatively, normalize
&apos; to ’ at decode time — pick one consistent approach and update the
corresponding code paths (is_opening_quote and is_quote_char or the HTML decode)
so straight single quotes are handled.
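One hedged sketch of treating the straight apostrophe as a quote character (the helper names mirror those mentioned in the review, but the opening heuristic here is an assumption, not the project's actual logic):

```rust
// Include the ASCII apostrophe among recognized quote characters so text
// decoded from &apos; participates in the quote-tightening pass.
fn is_quote_char(c: char) -> bool {
    matches!(c, '"' | '“' | '”' | '‘' | '’' | '\'')
}

// A straight quote opens when preceded by whitespace/start-of-text and
// followed by a word character; otherwise it closes or is an apostrophe.
fn straight_quote_opens(prev: Option<char>, next: Option<char>) -> bool {
    let left_ok = prev.map_or(true, |c| c.is_whitespace());
    let right_ok = next.map_or(false, |c| c.is_alphanumeric());
    left_ok && right_ok
}

fn main() {
    assert!(is_quote_char('\''));
    assert!(straight_quote_opens(None, Some('h')));
    assert!(!straight_quote_opens(Some('n'), Some('t'))); // "don't": apostrophe
    println!("ok");
}
```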

In `@vendor/tts-rs/src/engines/kokoro/phonemizer.rs`:
- Around line 237-260: The espeak-ng child process in phonemizer.rs (spawned
into variable child) is currently unbounded and can block in write_all() and
wait_with_output(); update phonemizer.rs to perform the write + wait under a
hard timeout (use std::time::Duration and a timeout mechanism such as
std::thread + join with timeout or a wait_timeout helper), ensure stdin (after
canonicalize_espeak_stdin_payload) is flushed/closed before waiting, and if the
timeout expires kill the child (child.kill()/child.wait()) and return a new
KokoroError variant (e.g., KokoroError::Timeout) instead of hanging; wire this
into callers like phonemize() so a wedged espeak-ng fails fast.
- Around line 226-229: The current block that sets LD_LIBRARY_PATH via
cmd.env("LD_LIBRARY_PATH", bin_dir) overwrites any inherited value; change it to
prepend bin_dir to the existing LD_LIBRARY_PATH instead: read the current
LD_LIBRARY_PATH from std::env::var_os (or Command's inherited env), build a new
value that is "bin_dir:<existing>" when an existing value is present (or just
"bin_dir" when not), and pass that combined string to cmd.env. Update the code
around espeak.bin_path.as_deref().and_then(|p| p.parent()) and the cmd.env call
so the bundled directory is prepended rather than clobbering the parent's
LD_LIBRARY_PATH.

---

Outside diff comments:
In `@src-tauri/src/text_normalization.rs`:
- Around line 212-243: The push_text logic uses had_leading_whitespace only and
then falls back to needs_space_between which mis-detects joins before right
single-quote (’); capture a had_trailing_whitespace boolean from the incoming
raw fragment before calling normalize_inline_whitespace() and store that
trailing-whitespace bit alongside the buffer state for
SpeechTextRenderer.output, LinkContext.text, and ImageContext.alt_text (i.e.
where append_segment appends); when deciding to insert a separator consult the
previous buffer's had_trailing_whitespace rather than inferring from the
upcoming apostrophe, remove the special-case “incoming ’ after alnum => insert
space” heuristic in needs_space_between usage, and add a regression test
covering a split fragment like "don" + "’t" to ensure no spurious space is
inserted.

---

Duplicate comments:
In `@src-tauri/src/selection.rs`:
- Around line 149-155: The code collapses all read_text() failures into
ClipboardState::Unreadable and never produces ClipboardState::Empty, causing
DontModify paths to prime/overwrite unreadable clipboards; change the read_text
handling in the block that sets previous_clipboard so that: when read_text()
returns Ok(text) and text.is_empty() set previous_clipboard =
ClipboardState::Empty, when Ok(non-empty) set ClipboardState::Text(text), and
when Err keep ClipboardState::Unreadable; then update the logic that primes/sets
the clipboard (the code that runs before calling restore_clipboard() and any
DontModify handling) to only write the sentinel or copied text when
previous_clipboard == ClipboardState::Text or previous_clipboard ==
ClipboardState::Empty (i.e., never prime when previous_clipboard ==
ClipboardState::Unreadable). Ensure restore_clipboard() behavior stays unchanged
for Unreadable.

---

Nitpick comments:
In `@src-tauri/src/selection.rs`:
- Around line 197-204: Replace the single sleep-based wait with a short polling
loop that repeatedly calls clipboard.read_text() and compares the result to the
sentinel until it changes or a timeout elapses (use CLIPBOARD_COPY_DELAY_MS as
the total timeout and a small per-iteration interval); update the logic around
clipboard.read_text(), copied_text, and captured so the code returns the first
non-empty, trimmed, non-sentinel value when observed or None after timeout,
ensuring you preserve the sentinel comparison and string trimming/map-to-owned
behavior.

In `@vendor/tts-rs/src/engines/kokoro/phonemizer.rs`:
- Around line 316-355: Add a new unit test that ensures split_text_parts treats
Windows CRLF ("\r\n") as the single punctuation produced by the current
implementation: call split_text_parts("Hello\r\nworld") and assert the result
equals vec![TextPart::Text("Hello".to_string()), TextPart::Punct('.'),
TextPart::Text("world".to_string())]; name the test something like
splits_crlf_into_single_punct and place it alongside the other tests for
split_text_parts.
🪄 Autofix (Beta)

✅ Autofix completed


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 36c74298-c5e2-425d-85ab-dc54716530c8

📥 Commits

Reviewing files that changed from the base of the PR and between 4e2e5f0 and 31a2ca4.

📒 Files selected for processing (3)
  • src-tauri/src/selection.rs
  • src-tauri/src/text_normalization.rs
  • vendor/tts-rs/src/engines/kokoro/phonemizer.rs

@rishiskhare
Owner Author

@CodeRabbit fix the major issues

@coderabbitai

coderabbitai Bot commented Apr 12, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get all review comments on PR `#6`
gh pr view 6 --repo rishiskhare/parrot --json reviews,comments

echo "---REVIEW COMMENTS---"
gh api repos/rishiskhare/parrot/pulls/6/comments --jq '.[] | {id: .id, path: .path, body: .body, line: .line, resolved: .resolved}'

echo "---PR COMMENTS---"
gh api repos/rishiskhare/parrot/issues/6/comments --jq '.[] | {id: .id, body: .body, user: .user.login}'

Length of output: 198246


@rishiskhare Here are the remaining major issues and their fixes:

Here are the 4 remaining unresolved major issues I'll fix:

  • Fix straight ' bypassing the quote-tightening pass in text_normalization.rs
  • Fix LD_LIBRARY_PATH overwrite in phonemizer.rs (Linux)
  • Add timeout to espeak-ng subprocess in phonemizer.rs
  • Fix trailing-whitespace tracking in SpeechTextRenderer to avoid spurious spaces before '

🏁 Script executed:

#!/bin/bash
# Read the current implementations of is_opening_quote and is_quote_char
rg -n "fn is_opening_quote|fn is_quote_char|fn is_quote_word_char|fn should_trim_space" src-tauri/src/text_normalization.rs -A 10

Length of output: 1080


🏁 Script executed:

#!/bin/bash
# Read the current LD_LIBRARY_PATH block
rg -n "LD_LIBRARY_PATH|bin_dir|cmd\.env" vendor/tts-rs/src/engines/kokoro/phonemizer.rs -B 3 -A 3

Length of output: 469


🏁 Script executed:

#!/bin/bash
# Read the run_espeak function and surrounding context
rg -n "fn run_espeak|child|write_all|wait_with_output|kill" vendor/tts-rs/src/engines/kokoro/phonemizer.rs -B 2 -A 5
# Also check KokoroError enum for existing variants
rg -n "pub enum KokoroError|Timeout|PhonemizerFailed" vendor/tts-rs/src/engines/kokoro/model.rs vendor/tts-rs/src/engines/kokoro/phonemizer.rs -A 2

Length of output: 2424


🏁 Script executed:

#!/bin/bash
# Read the SpeechTextRenderer struct and push_text method, and the append_segment helper
rg -n "struct SpeechTextRenderer|struct LinkContext|struct ImageContext|fn push_text|fn append_segment|had_leading_whitespace|had_trailing" src-tauri/src/text_normalization.rs -B 2 -A 15

Length of output: 3982


🏁 Script executed:

#!/bin/bash
# Read the full needs_space_between function and surrounding context
sed -n '527,580p' src-tauri/src/text_normalization.rs

echo "---"
# Also check the KokoroError enum to see all variants for adding Timeout
sed -n '25,60p' vendor/tts-rs/src/engines/kokoro/model.rs

Length of output: 3313


🏁 Script executed:

#!/bin/bash
# Read all call sites of append_segment and LinkContext/ImageContext construction
rg -n "append_segment|LinkContext {|ImageContext {" src-tauri/src/text_normalization.rs -B 1 -A 4

echo "---LINK_CONTEXT_INIT---"
# How LinkContext is created and text_had_trailing would be initialized
rg -n "LinkContext" src-tauri/src/text_normalization.rs -A 3

echo "---IMAGE_CONTEXT_INIT---"
rg -n "ImageContext" src-tauri/src/text_normalization.rs -A 3

Length of output: 1014


@rishiskhare Here is the full plan for the 4 remaining major issues:

Remaining Major Issues

  • Issue 1 (text_normalization.rs): Add '\'' to is_opening_quote / is_quote_char for &apos; handling
  • Issue 2 (phonemizer.rs L228): Prepend bin_dir to existing LD_LIBRARY_PATH instead of clobbering it
  • Issue 3 (phonemizer.rs L260 + model.rs): Add KokoroError::Timeout and bound espeak-ng with a hard timeout
  • Issue 4 (text_normalization.rs L212-243): Track per-buffer trailing-whitespace state to prevent spurious spaces before ' when fragments are split (e.g. don + 't)

Failed to handle agent chat message. Please try again.

@coderabbitai

coderabbitai Bot commented Apr 12, 2026

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

Fixes Applied Successfully

Fixed 3 file(s) based on 3 unresolved review comments.

Files modified:

  • src-tauri/src/text_normalization.rs
  • vendor/tts-rs/src/engines/kokoro/model.rs
  • vendor/tts-rs/src/engines/kokoro/phonemizer.rs

Commit: 1e55f85b97da9e98c5b9a1a7854c502fd55142cf

The changes have been pushed to the pr/windows-build-clean branch.

Time taken: 8m 10s

Fixed 3 file(s) based on 3 unresolved review comments.

Co-authored-by: CodeRabbit <noreply@coderabbit.ai>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
vendor/tts-rs/src/engines/kokoro/phonemizer.rs (1)

259-320: ⚠️ Potential issue | 🔴 Critical

The subprocess timeout can still hang instead of failing fast.

The 30s deadline only starts after Line 265 returns, so a wedged child can still block indefinitely in write_all(). Then, if recv_timeout() fires, the timeout branch on Line 315 tries to lock the same mutex that the waiter thread is holding across wait_with_output(), so the kill path can deadlock and never return KokoroError::Timeout(_).

Please restructure this so the timeout covers the full write+wait lifecycle and the thread responsible for timing out still owns the Child handle it needs to kill.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vendor/tts-rs/src/engines/kokoro/phonemizer.rs` around lines 259 - 320, The
current timeout doesn't cover the blocking write and deadlocks because the
waiter thread holds the Child; fix by moving the entire write+wait lifecycle
into the spawned worker so the main thread can enforce the timeout and kill by
PID: change wait_with_timeout (and its call site where
canonicalize_espeak_stdin_payload is used) so it captures child.id() first, then
spawn a thread that takes ownership of the Child and the stdin payload, performs
write_all(stdin_payload) and then wait_with_output() and sends the Result over a
channel; in the main thread use recv_timeout(timeout) to wait, and on timeout
invoke a platform-appropriate kill of the child by the saved child_id (then
return KokoroError::Timeout), ensuring no mutex is held across blocking waits
and the timeout covers both write and wait.
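The ownership pattern the prompt describes — the worker thread owns the entire blocking lifecycle, the caller only blocks on `recv_timeout` — can be sketched generically (in the real fix the worker would own the `Child` and the caller would kill it by saved PID on timeout; that platform-specific part is omitted):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// The spawned thread performs all blocking work (write_all + wait_with_output
// in the actual phonemizer) and reports over a channel. No mutex is held
// across a blocking wait, so the timeout path cannot deadlock.
fn run_with_timeout<T, F>(work: F, timeout: Duration) -> Result<T, &'static str>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the receiver already timed out and dropped, send fails harmlessly.
        let _ = tx.send(work());
    });
    rx.recv_timeout(timeout).map_err(|_| "timeout")
}

fn main() {
    assert_eq!(run_with_timeout(|| 42, Duration::from_secs(2)), Ok(42));
    let slow = run_with_timeout(
        || {
            thread::sleep(Duration::from_millis(300));
            1
        },
        Duration::from_millis(30),
    );
    assert!(slow.is_err());
    println!("ok");
}
```

The key property is that the deadline starts before any blocking call, covering both the write and the wait.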
🧹 Nitpick comments (1)
vendor/tts-rs/src/engines/kokoro/model.rs (1)

235-241: Make the fallback .onnx selection deterministic.

std::fs::read_dir() does not guarantee ordering. If kokoro-quant-convinteger.onnx is absent and the directory contains multiple .onnx files, different filesystems can pick different models on different platforms. Sorting the candidates before choosing the fallback would make packaging behavior stable.

💡 Example refactor
-    // Scan for any .onnx file
-    for entry in std::fs::read_dir(model_dir)? {
-        let entry = entry?;
-        let path = entry.path();
-        if path.extension().and_then(|e| e.to_str()) == Some("onnx") {
-            log::info!("Using ONNX file: {}", path.display());
-            return Ok(path);
-        }
-    }
+    let mut candidates = std::fs::read_dir(model_dir)?
+        .filter_map(|entry| entry.ok().map(|e| e.path()))
+        .filter(|path| path.extension().and_then(|e| e.to_str()) == Some("onnx"))
+        .collect::<Vec<_>>();
+    candidates.sort();
+
+    if let Some(path) = candidates.into_iter().next() {
+        log::info!("Using ONNX file: {}", path.display());
+        return Ok(path);
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vendor/tts-rs/src/engines/kokoro/model.rs` around lines 235 - 241, The
current loop using std::fs::read_dir(model_dir) in
vendor/tts-rs/src/engines/kokoro/model.rs returns the first found .onnx file
non-deterministically; change the logic in that scanning block (the code that
iterates entries and returns a path) to collect all entries whose extension is
"onnx" into a Vec<PathBuf>, sort them deterministically (e.g., by file_name() or
display() string), then choose the first sorted candidate (after the existing
explicit check for "kokoro-quant-convinteger.onnx" is still applied) and return
it, so the fallback selection is stable across filesystems.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@vendor/tts-rs/src/engines/kokoro/model.rs`:
- Around line 337-358: split_chunks currently hardcodes punctuation IDs
([1,2,3,4,5,6]) which breaks on vocab layouts loaded from config; change the API
so the caller passes the punctuation ID set (derived from self.vocab) into
split_chunks (or add a new split_chunks_with_punct(ids: &[i64], punct_ids:
&[i64]) and use that) and update calls where split_chunks is used to compute
punct_ids from self.vocab (e.g., map characters like ';',':',',','.','!','?' to
their IDs) before invoking the chunker, keeping MAX_PHONEME_LEN behavior
unchanged.
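Deriving the punctuation IDs from the loaded vocab rather than hardcoding them might look like the following (the vocab is assumed here to be a `char → i64` map; the real `self.vocab` layout may differ):

```rust
use std::collections::HashMap;

// Map the chunk-boundary punctuation characters through the vocab so the
// chunker works with any vocab layout loaded from config.
fn punct_ids(vocab: &HashMap<char, i64>) -> Vec<i64> {
    let mut ids: Vec<i64> = [';', ':', ',', '.', '!', '?']
        .iter()
        .filter_map(|c| vocab.get(c).copied())
        .collect();
    ids.sort_unstable();
    ids
}

fn main() {
    let vocab: HashMap<char, i64> = [('.', 3), (',', 2), ('!', 5)].into_iter().collect();
    assert_eq!(punct_ids(&vocab), vec![2, 3, 5]);
    println!("ok");
}
```

Characters absent from the vocab are simply skipped, so the chunker degrades gracefully instead of panicking.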

---

Duplicate comments:
In `@vendor/tts-rs/src/engines/kokoro/phonemizer.rs`:
- Around line 259-320: The current timeout doesn't cover the blocking write and
deadlocks because the waiter thread holds the Child; fix by moving the entire
write+wait lifecycle into the spawned worker so the main thread can enforce the
timeout and kill by PID: change wait_with_timeout (and its call site where
canonicalize_espeak_stdin_payload is used) so it captures child.id() first, then
spawn a thread that takes ownership of the Child and the stdin payload, performs
write_all(stdin_payload) and then wait_with_output() and sends the Result over a
channel; in the main thread use recv_timeout(timeout) to wait, and on timeout
invoke a platform-appropriate kill of the child by the saved child_id (then
return KokoroError::Timeout), ensuring no mutex is held across blocking waits
and the timeout covers both write and wait.

---

Nitpick comments:
In `@vendor/tts-rs/src/engines/kokoro/model.rs`:
- Around line 235-241: The current loop using std::fs::read_dir(model_dir) in
vendor/tts-rs/src/engines/kokoro/model.rs returns the first found .onnx file
non-deterministically; change the logic in that scanning block (the code that
iterates entries and returns a path) to collect all entries whose extension is
"onnx" into a Vec<PathBuf>, sort them deterministically (e.g., by file_name() or
display() string), then choose the first sorted candidate (after the existing
explicit check for "kokoro-quant-convinteger.onnx" is still applied) and return
it, so the fallback selection is stable across filesystems.
🪄 Autofix (Beta)

✅ Autofix completed


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3fdc6627-3ce9-4eaa-87b5-4b54aab2d352

📥 Commits

Reviewing files that changed from the base of the PR and between 31a2ca4 and 1e55f85.

📒 Files selected for processing (3)
  • src-tauri/src/text_normalization.rs
  • vendor/tts-rs/src/engines/kokoro/model.rs
  • vendor/tts-rs/src/engines/kokoro/phonemizer.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src-tauri/src/text_normalization.rs

Comment thread vendor/tts-rs/src/engines/kokoro/model.rs
@coderabbitai

coderabbitai Bot commented Apr 12, 2026

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

Fixes Applied Successfully

Fixed 1 file(s) based on 1 unresolved review comment.

Files modified:

  • vendor/tts-rs/src/engines/kokoro/model.rs

Commit: dbbc14f1ba3d6b3381d73f4398b41717b8f9a10a

The changes have been pushed to the pr/windows-build-clean branch.

Time taken: 10m 27s

Co-authored-by: CodeRabbit <noreply@coderabbit.ai>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@vendor/tts-rs/src/engines/kokoro/model.rs`:
- Around line 241-247: The current loop over std::fs::read_dir(model_dir)
returns the first .onnx entry in nondeterministic filesystem order; instead
collect all paths whose extension == "onnx" into a Vec, sort them
deterministically (e.g., by file_name() or display string using
to_string_lossy()), and then return the first element of the sorted list; update
the code around the read_dir loop that uses entry.path() and the ONNX check to
perform collection + sort before returning to ensure deterministic model
selection.
- Around line 318-339: Change detect_tokens_input(session: &Session) and
detect_speed_type(session: &Session) to return Result<T, KokoroError> instead of
silently returning defaults: detect_tokens_input should return Result<String,
KokoroError> and return Err(KokoroError::MissingInput("input_ids/tokens")) if
neither "input_ids" nor "tokens" is present; detect_speed_type should return
Result<bool, KokoroError> and return Err(KokoroError::MissingInput("speed")) if
"speed" is absent (or Err with a clear message if dtype cannot be determined).
Update the callers in the model loader to use the ? operator to propagate errors
during model load so missing inputs fail fast. Ensure you reference the Session
input iteration (session.inputs()), input.name(), and input.dtype() when
implementing the checks and error construction.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4bf1ea05-ace5-4e06-b6b6-cb63f0e632f3

📥 Commits

Reviewing files that changed from the base of the PR and between 1e55f85 and dbbc14f.

📒 Files selected for processing (1)
  • vendor/tts-rs/src/engines/kokoro/model.rs

Comment on lines +241 to +247
```rust
    for entry in std::fs::read_dir(model_dir)? {
        let entry = entry?;
        let path = entry.path();
        if path.extension().and_then(|e| e.to_str()) == Some("onnx") {
            log::info!("Using ONNX file: {}", path.display());
            return Ok(path);
        }
```

⚠️ Potential issue | 🟠 Major

Fallback ONNX selection is nondeterministic.

Lines 241-247 pick the first .onnx from read_dir, but directory iteration order is not stable across platforms/filesystems, so different models can be loaded in CI/package environments.

💡 Suggested fix
```diff
-    // Scan for any .onnx file
-    for entry in std::fs::read_dir(model_dir)? {
-        let entry = entry?;
-        let path = entry.path();
-        if path.extension().and_then(|e| e.to_str()) == Some("onnx") {
-            log::info!("Using ONNX file: {}", path.display());
-            return Ok(path);
-        }
-    }
+    // Scan for any .onnx file deterministically
+    let mut candidates = Vec::new();
+    for entry in std::fs::read_dir(model_dir)? {
+        let entry = entry?;
+        let path = entry.path();
+        if path.extension().and_then(|e| e.to_str()) == Some("onnx") {
+            candidates.push(path);
+        }
+    }
+    candidates.sort();
+    if let Some(path) = candidates.into_iter().next() {
+        log::info!("Using ONNX file: {}", path.display());
+        return Ok(path);
+    }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vendor/tts-rs/src/engines/kokoro/model.rs` around lines 241 - 247, The
current loop over std::fs::read_dir(model_dir) returns the first .onnx entry in
nondeterministic filesystem order; instead collect all paths whose extension ==
"onnx" into a Vec, sort them deterministically (e.g., by file_name() or display
string using to_string_lossy()), and then return the first element of the sorted
list; update the code around the read_dir loop that uses entry.path() and the
ONNX check to perform collection + sort before returning to ensure deterministic
model selection.

Comment on lines +318 to +339
/// Detect the token input name ("input_ids" or "tokens") from session inputs.
fn detect_tokens_input(session: &Session) -> String {
for input in session.inputs() {
if input.name() == "input_ids" || input.name() == "tokens" {
return input.name().to_string();
}
}
// Default to "input_ids" if neither is found
"input_ids".to_string()
}

/// Detect whether the speed input expects int32 (true) or float32 (false).
fn detect_speed_type(session: &Session) -> bool {
for input in session.inputs() {
if input.name() == "speed" {
// Check the type description
let type_str = format!("{:?}", input.dtype());
return type_str.contains("Int32") || type_str.contains("int32");
}
}
// Default: modern Kokoro models use int32
true

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```shell
# First, let's verify the file exists and check the exact code at those lines
cat -n vendor/tts-rs/src/engines/kokoro/model.rs | sed -n '318,340p'
```

Repository: rishiskhare/parrot

Length of output: 1053


🏁 Script executed:

```shell
# Search for KokoroError definition to verify the Config variant exists
rg "enum KokoroError|struct KokoroError" vendor/tts-rs/src/
```

Repository: rishiskhare/parrot

Length of output: 126


🏁 Script executed:

```shell
# Find all usages of detect_tokens_input and detect_speed_type
rg "detect_tokens_input|detect_speed_type" vendor/tts-rs/src/
```

Repository: rishiskhare/parrot

Length of output: 454


🏁 Script executed:

```shell
# Get the full KokoroError enum definition
rg -A 20 "pub enum KokoroError" vendor/tts-rs/src/engines/kokoro/model.rs
```

Repository: rishiskhare/parrot

Length of output: 870


🏁 Script executed:

```shell
# Find the context where these functions are called (get surrounding lines)
rg -B 5 -A 5 "let tokens_input_name = detect_tokens_input" vendor/tts-rs/src/engines/kokoro/model.rs
```

Repository: rishiskhare/parrot

Length of output: 498


🏁 Script executed:

```shell
# Get the function signature of the calling function
rg -B 20 "let tokens_input_name = detect_tokens_input" vendor/tts-rs/src/engines/kokoro/model.rs | head -25
```

Repository: rishiskhare/parrot

Length of output: 846


Fail fast if required model inputs are not found.

The functions detect_tokens_input() and detect_speed_type() use silent defaults ("input_ids" and int32) that mask model schema mismatches. When the model lacks expected inputs, the functions succeed but inference fails later with obscure errors. Since these are called during model load—which already returns Result<Self, KokoroError>—both functions should return Result<T, KokoroError> and propagate errors via the ? operator instead.

Suggested fix
```diff
-fn detect_tokens_input(session: &Session) -> String {
+fn detect_tokens_input(session: &Session) -> Result<String, KokoroError> {
     for input in session.inputs() {
         if input.name() == "input_ids" || input.name() == "tokens" {
-            return input.name().to_string();
+            return Ok(input.name().to_string());
         }
     }
-    "input_ids".to_string()
+    Err(KokoroError::Config(
+        "Missing required token input: expected 'input_ids' or 'tokens'".to_string(),
+    ))
 }

-fn detect_speed_type(session: &Session) -> bool {
+fn detect_speed_type(session: &Session) -> Result<bool, KokoroError> {
     for input in session.inputs() {
         if input.name() == "speed" {
             let type_str = format!("{:?}", input.dtype());
-            return type_str.contains("Int32") || type_str.contains("int32");
+            return Ok(type_str.contains("Int32") || type_str.contains("int32"));
         }
     }
-    true
+    Err(KokoroError::Config(
+        "Missing required 'speed' input in model graph".to_string(),
+    ))
 }
-        let tokens_input_name = detect_tokens_input(&session);
-        let speed_is_int32 = detect_speed_type(&session);
+        let tokens_input_name = detect_tokens_input(&session)?;
+        let speed_is_int32 = detect_speed_type(&session)?;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vendor/tts-rs/src/engines/kokoro/model.rs` around lines 318 - 339, Change
detect_tokens_input(session: &Session) and detect_speed_type(session: &Session)
to return Result<T, KokoroError> instead of silently returning defaults:
detect_tokens_input should return Result<String, KokoroError> and return
Err(KokoroError::MissingInput("input_ids/tokens")) if neither "input_ids" nor
"tokens" is present; detect_speed_type should return Result<bool, KokoroError>
and return Err(KokoroError::MissingInput("speed")) if "speed" is absent (or Err
with a clear message if dtype cannot be determined). Update the callers in the
model loader to use the ? operator to propagate errors during model load so
missing inputs fail fast. Ensure you reference the Session input iteration
(session.inputs()), input.name(), and input.dtype() when implementing the checks
and error construction.

