Skip to content

fix(voice): enable GPU detection and Metal acceleration for whisper (#558)#571

Merged
senamakel merged 4 commits into
tinyhumansai:mainfrom
oxoxDev:fix/558-gpu-detection-acceleration
Apr 14, 2026
Merged

fix(voice): enable GPU detection and Metal acceleration for whisper (#558)#571
senamakel merged 4 commits into
tinyhumansai:mainfrom
oxoxDev:fix/558-gpu-detection-acceleration

Conversation

@oxoxDev
Copy link
Copy Markdown
Contributor

@oxoxDev oxoxDev commented Apr 14, 2026

Summary

  • Enable Metal GPU acceleration for whisper-rs on macOS via target-specific Cargo feature
  • Expand GPU detection to cover Intel Mac, Apple Silicon, and NVIDIA (via nvidia-smi)
  • Thread DeviceProfile GPU info through both bootstrap and lazy-load whisper engine paths
  • Whisper transcription now runs on Metal backend (~20x realtime on Apple M2)

Problem

Voice model GPU detection existed (DeviceProfile.has_gpu) but was never passed to the whisper engine. Meanwhile, whisper-rs was compiled without any GPU backend feature, so WhisperContextParameters::default() always set use_gpu: false. Result: voice transcription always ran on CPU regardless of available hardware acceleration.

Root causes:

  1. Cargo.toml:83whisper-rs = "0.16" declared without GPU features, so the internal _gpu flag was never set
  2. whisper_engine.rs:63WhisperContextParameters::default() used without configuring GPU params from the detected device profile
  3. device.rs:73-84 — GPU detection was Apple Silicon only; no Intel Mac distinction, no NVIDIA probe for Windows/Linux

Solution

  1. Cargo.toml — Added [target.'cfg(target_os = "macos")'.dependencies] section enabling metal feature on whisper-rs. Cargo merges features from both declarations so non-macOS builds are unaffected.
  2. device.rs — Expanded detect_gpu() with Intel Mac detection (no Metal for whisper), nvidia-smi subprocess probe for Windows/Linux NVIDIA GPUs, and tracing::debug at each decision point.
  3. whisper_engine.rs — Updated load_engine() to accept has_gpu and gpu_description params; explicitly configures use_gpu and flash_attn on WhisperContextParameters instead of relying on compile-time defaults.
  4. bootstrap.rs — Passes DeviceProfile GPU info to whisper engine at bootstrap startup.
  5. speech.rs — Detects device profile at lazy-load time and passes GPU info through.

Submission Checklist

  • Unit testscargo test passes (11/11 for device, whisper_engine, bootstrap modules)
  • N/A — E2E test infrastructure does not cover whisper GPU acceleration; verified manually via tauri dev with RUST_LOG=debug
  • Doc comments — Updated rustdoc on load_engine() and detect_gpu()
  • Inline comments — Added comments explaining GPU param configuration and feature merging

Manual verification:

  • Confirmed whisper_backend_init_gpu: using Metal backend in logs
  • Confirmed GPU name: Apple M2, MTLGPUFamilyMetal3 in Metal init
  • Confirmed transcription at 159ms for 3.1s audio (~20x realtime)
  • Confirmed voice dictation end-to-end working (recording → transcription → clipboard)

Impact

  • macOS (Apple Silicon): Whisper now uses Metal GPU — significant transcription speedup
  • macOS (Intel): Correctly reports no GPU acceleration available, falls back to CPU
  • Windows/Linux (NVIDIA): GPU detection via nvidia-smi; CUDA acceleration requires future cuda feature enablement
  • Windows/Linux (no NVIDIA): No behavior change, continues using CPU
  • No breaking API changes; load_engine signature change is crate-internal

Related

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Summary by CodeRabbit

  • New Features

    • macOS builds include Metal GPU support for accelerated speech transcription.
    • In-process speech engine now respects detected GPU presence and description when initializing.
  • Bug Fixes

    • Improved GPU detection across macOS (Apple/Intel), Windows, and Linux (NVIDIA).
    • Enhanced logging for clearer hardware-acceleration diagnostics.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 14, 2026

📝 Walkthrough

Walkthrough

Device GPU detection was expanded (Apple Metal, Intel Mac, NVIDIA via nvidia-smi) and GPU availability/description are now passed through service bootstrap/speech into the Whisper engine; load_engine was extended to accept and apply GPU settings (use_gpu, flash_attn).

Changes

Cohort / File(s) Summary
Platform deps & device detection
Cargo.toml, src/openhuman/local_ai/device.rs
Added macOS-only whisper-rs dependency (metal feature). Enhanced GPU detection: Apple Silicon detection, Intel Mac branch, NVIDIA probing via nvidia-smi, and explicit CPU fallback; added related debug logs and probe_nvidia_smi() helper.
Service integration
src/openhuman/local_ai/service/bootstrap.rs, src/openhuman/local_ai/service/speech.rs
Detect device profile earlier and pass has_gpu + gpu_description into the blocking whisper engine load task; adjusted spawn closure to forward GPU parameters.
Whisper engine
src/openhuman/local_ai/service/whisper_engine.rs
Extended load_engine signature to accept has_gpu: bool and gpu_description: Option<&str>; set WhisperContextParameters (use_gpu, enable flash_attn when GPU present) and updated logs/tests to reflect backend selection.

Sequence Diagram(s)

sequenceDiagram
  participant App
  participant DeviceDetector
  participant Service
  participant WhisperEngine
  participant System as "Host (nvidia-smi)"

  App->>Service: request init/load model
  Service->>DeviceDetector: detect_device_profile()
  DeviceDetector->>System: (on Windows/Linux) run `nvidia-smi`
  System-->>DeviceDetector: GPU name or failure
  DeviceDetector-->>Service: DeviceProfile(has_gpu, gpu_description)
  Service->>WhisperEngine: spawn_blocking load_engine(handle, model, has_gpu, gpu_description)
  WhisperEngine-->>Service: load result (success/failure)
  Service-->>App: return initialization result
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐇 I sniffed the silicon and heard a hum,
Metal, CUDA, whispers on the run,
nvidia-smi named the GPU true,
I hopped to pass the info through,
Now models hum faster — carrot crumbs for you! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(voice): enable GPU detection and Metal acceleration for whisper' clearly summarizes the main objective of the PR: enabling GPU detection and Metal acceleration for the whisper voice engine.
Linked Issues check ✅ Passed All coding requirements from issue #558 are met: GPU detection is expanded across platforms (Apple Silicon, Intel Mac, NVIDIA via nvidia-smi), Metal acceleration is enabled on macOS, GPU info is threaded through engine initialization, CPU fallback is explicit with logging, and test coverage is provided.
Out of Scope Changes check ✅ Passed All changes directly support GPU detection and acceleration objectives. Cargo.toml adds Metal feature targeting, device.rs expands GPU detection logic, whisper_engine.rs configures GPU parameters at runtime, and bootstrap.rs/speech.rs thread device info through initialization paths.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

senamakel
senamakel previously approved these changes Apr 14, 2026
oxoxDev and others added 4 commits April 15, 2026 01:16
…inyhumansai#558)

Add Intel Mac detection (no Metal GPU for whisper), nvidia-smi probe
for Windows/Linux NVIDIA GPUs, and diagnostic tracing at each
decision point in detect_gpu().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
)

Add target-specific dependency for macOS that enables the `metal`
feature on whisper-rs, compiling whisper.cpp with Metal GPU support.
Cargo merges features from both declarations so non-macOS builds
are unaffected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
)

Accept has_gpu and gpu_description in load_engine() and explicitly
set use_gpu and flash_attn on WhisperContextParameters instead of
relying on the compile-time default. Log the selected acceleration
backend at startup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ai#558)

Thread DeviceProfile has_gpu and gpu_description through both the
bootstrap (startup) and speech (lazy) whisper engine load calls so
the engine can configure Metal or CUDA acceleration at runtime.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@oxoxDev oxoxDev force-pushed the fix/558-gpu-detection-acceleration branch from bb070cc to 9a7e75e Compare April 14, 2026 19:48
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/openhuman/local_ai/device.rs (1)

99-124: Consider adding a timeout to the nvidia-smi subprocess.

The synchronous Command::output() call could block indefinitely if nvidia-smi hangs (e.g., due to driver issues). While rare, this would prevent startup from completing. A timeout would improve resilience.

♻️ Suggested timeout approach
fn probe_nvidia_smi() -> Option<String> {
    use std::time::Duration;
    
    let mut child = std::process::Command::new("nvidia-smi")
        .args(["--query-gpu=name", "--format=csv,noheader"])
        .stdout(std::process::Stdio::piped())
        .stderr(std::process::Stdio::null())
        .spawn()
        .ok()?;

    // Wait up to 5 seconds for nvidia-smi to complete.
    let start = std::time::Instant::now();
    loop {
        match child.try_wait() {
            Ok(Some(status)) if status.success() => break,
            Ok(Some(_)) => return None, // Non-zero exit
            Ok(None) if start.elapsed() > Duration::from_secs(5) => {
                let _ = child.kill();
                tracing::debug!("nvidia-smi timed out");
                return None;
            }
            Ok(None) => std::thread::sleep(Duration::from_millis(50)),
            Err(_) => return None,
        }
    }

    let output = child.wait_with_output().ok()?;
    let name = String::from_utf8_lossy(&output.stdout)
        .lines()
        .next()?
        .trim()
        .to_string();

    if name.is_empty() {
        return None;
    }
    Some(format!("NVIDIA {name} (CUDA)"))
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/local_ai/device.rs` around lines 99 - 124, The probe_nvidia_smi
function should avoid blocking indefinitely by replacing the synchronous
Command::output() call with spawn() and a timed wait: spawn the process in
probe_nvidia_smi, poll with child.try_wait() in a short sleep loop up to a
chosen timeout (e.g., 5s), handle Ok(Some(status)) (return None on non-zero), on
timeout kill the child and return None, then call child.wait_with_output() to
read stdout and proceed with the existing UTF-8/empty checks; include a debug
log (e.g., tracing::debug!) when a timeout occurs and ensure all error cases
still return None.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/openhuman/local_ai/service/whisper_engine.rs`:
- Around line 69-80: The code calls WhisperContextParameters.use_gpu(has_gpu)
and WhisperContextParameters.flash_attn(true) but those methods may not exist on
the custom whisper-rs-sys fork; inspect the forked definition of
WhisperContextParameters and confirm the presence and signatures of use_gpu and
flash_attn, and if absent either add equivalent setter methods to the fork
(preserving API and safety) or change this code to use the supported GPU
configuration mechanism provided by whisper-rs (e.g., setting explicit fields on
WhisperContextParameters, using a builder or passing GPU/backend options to the
context creation function); specifically update the call sites around params,
WhisperContextParameters and the has_gpu logic to match the verified API.

---

Nitpick comments:
In `@src/openhuman/local_ai/device.rs`:
- Around line 99-124: The probe_nvidia_smi function should avoid blocking
indefinitely by replacing the synchronous Command::output() call with spawn()
and a timed wait: spawn the process in probe_nvidia_smi, poll with
child.try_wait() in a short sleep loop up to a chosen timeout (e.g., 5s), handle
Ok(Some(status)) (return None on non-zero), on timeout kill the child and return
None, then call child.wait_with_output() to read stdout and proceed with the
existing UTF-8/empty checks; include a debug log (e.g., tracing::debug!) when a
timeout occurs and ensure all error cases still return None.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 96b46e0f-6b3a-4954-aedc-f0796553d49e

📥 Commits

Reviewing files that changed from the base of the PR and between 7ae2bf8 and bb070cc.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • Cargo.toml
  • src/openhuman/local_ai/device.rs
  • src/openhuman/local_ai/service/bootstrap.rs
  • src/openhuman/local_ai/service/speech.rs
  • src/openhuman/local_ai/service/whisper_engine.rs

Comment on lines +69 to +80
let mut params = WhisperContextParameters::default();

// Explicitly configure GPU acceleration based on device profile.
// The default `use_gpu` is `cfg!(feature = "_gpu")` which is only true
// when a GPU backend feature (metal, cuda, etc.) is compiled in.
params.use_gpu(has_gpu);

// Enable flash attention when GPU is available — improves throughput
// on both Metal and CUDA backends.
if has_gpu {
params.flash_attn(true);
}
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

whisper-rs WhisperContextParameters use_gpu method signature Rust

💡 Result:

WhisperContextParameters does not have a public use_gpu method. The official documentation at docs.rs/whisper-rs/latest/whisper_rs/struct.WhisperContextParameters.html lists no such method. Usage examples consistently use WhisperContextParameters::default without modification, as in WhisperContext::new_with_params(model_path, WhisperContextParameters::default). GPU support is enabled via crate feature flags like cuda, metal, vulkan, etc., which implicitly set GPU flags at runtime per the README. A set_use_gpu method exists on WhisperVadContextParams, but that's for VAD only. Code snippets setting ctx_params.use_gpu = true appear to be incorrect or from forks/other crates.

Citations:


🏁 Script executed:

# First, find and check the whisper_engine.rs file
cat -n src/openhuman/local_ai/service/whisper_engine.rs | head -100

Repository: tinyhumansai/openhuman

Length of output: 4169


🏁 Script executed:

# Check the Cargo.toml for whisper-rs version
grep -r "whisper-rs" Cargo.toml app/src-tauri/Cargo.toml 2>/dev/null

Repository: tinyhumansai/openhuman

Length of output: 343


🏁 Script executed:

# Search for any other uses of WhisperContextParameters in the codebase
rg "WhisperContextParameters" --type rust -B 2 -A 2

Repository: tinyhumansai/openhuman

Length of output: 930


🏁 Script executed:

# Search for any trait implementations or extensions to WhisperContextParameters in the codebase
rg "impl.*WhisperContextParameters|trait.*Whisper" --type rust -B 2 -A 5

Repository: tinyhumansai/openhuman

Length of output: 48


🏁 Script executed:

# Check if there are any builder pattern methods defined locally
rg "fn use_gpu|fn flash_attn" --type rust -B 2 -A 3

Repository: tinyhumansai/openhuman

Length of output: 48


🏁 Script executed:

# Look for whisper-rs imports and usage patterns
rg "use_gpu|flash_attn" --type rust -B 1 -A 1

Repository: tinyhumansai/openhuman

Length of output: 1041


🏁 Script executed:

# Check the Cargo.lock or any dependency information for whisper-rs versions
find . -name "Cargo.lock" -type f | head -1 | xargs grep -A 5 "whisper-rs" 2>/dev/null | head -30

Repository: tinyhumansai/openhuman

Length of output: 48


🏁 Script executed:

# Check if there's any git history or information about the custom fork
git log --oneline -- src/openhuman/local_ai/service/whisper_engine.rs 2>/dev/null | head -10

Repository: tinyhumansai/openhuman

Length of output: 138


🏁 Script executed:

# Look for test files related to whisper
fd ".*whisper.*test.*" --type f 2>/dev/null

Repository: tinyhumansai/openhuman

Length of output: 48


🏁 Script executed:

# Check if there are any comments or documentation about WhisperContextParameters methods
rg "WhisperContextParameters|use_gpu|flash_attn" -B 3 -A 3 --type rust

Repository: tinyhumansai/openhuman

Length of output: 2645


🏁 Script executed:

# Check the git config or remotes to see if the custom fork URL is documented
git config --list 2>/dev/null | grep -i whisper || echo "No git config info"

Repository: tinyhumansai/openhuman

Length of output: 84


🏁 Script executed:

# Get the full commit details to see what changed
git show 9a7e75e --stat 2>/dev/null | head -50

Repository: tinyhumansai/openhuman

Length of output: 2935


🏁 Script executed:

# Check the actual diff of the whisper_engine.rs changes
git show 9a7e75e -- src/openhuman/local_ai/service/whisper_engine.rs 2>/dev/null | head -100

Repository: tinyhumansai/openhuman

Length of output: 3726


🏁 Script executed:

# Check if there's any documentation or notes about the custom fork in the repo
find . -name "*.md" -type f | xargs grep -l "whisper-rs-sys\|whisper_rs" 2>/dev/null | head -5

Repository: tinyhumansai/openhuman

Length of output: 85


🏁 Script executed:

# Check the memory file for any context
cat ./.claude/memory.md | head -100

Repository: tinyhumansai/openhuman

Length of output: 14761


🏁 Script executed:

# Try to find any references to whisper-rs fork or custom implementations
rg "use_gpu|flash_attn" --type rs -B 5 -A 5 | head -80

Repository: tinyhumansai/openhuman

Length of output: 92


🏁 Script executed:

# Check if there's a recent successful build
find target -name "libwhisper_engine*" -o -name "*whisper_engine*.rlib" 2>/dev/null | head -5

Repository: tinyhumansai/openhuman

Length of output: 48


🏁 Script executed:

# Look for any GitHub issues or PRs mentioning these methods
rg "use_gpu|flash_attn" --type md --type txt 2>/dev/null | head -20

Repository: tinyhumansai/openhuman

Length of output: 48


🏁 Script executed:

# Check if the code actually compiles by looking for the compiled file
ls -la target/debug/deps/ 2>/dev/null | grep whisper || echo "No compiled artifact found"

Repository: tinyhumansai/openhuman

Length of output: 92


🏁 Script executed:

# Look for any git submodules or fork references
find . -name ".gitmodules" -o -name "*.toml" | xargs grep -l "whisper-rs-sys" 2>/dev/null | head -5

Repository: tinyhumansai/openhuman

Length of output: 78


🏁 Script executed:

# Check the actual Cargo.toml to see the custom fork reference
cat app/src-tauri/Cargo.toml | grep -A 3 "whisper-rs"

Repository: tinyhumansai/openhuman

Length of output: 48


🏁 Script executed:

# Search for any documentation about the custom fork
find . -type f \( -name "*.md" -o -name "*.txt" \) -exec grep -l "whisper-rs-sys\|github.com/tinyhumansai" {} \; 2>/dev/null

Repository: tinyhumansai/openhuman

Length of output: 415


🏁 Script executed:

# Check BUILDING.md for information about whisper-rs setup
grep -i "whisper" docs/BUILDING.md

Repository: tinyhumansai/openhuman

Length of output: 48


🏁 Script executed:

# Check the main Cargo.toml to see the fork reference context
cat Cargo.toml | grep -B 2 -A 5 "whisper-rs-sys"

Repository: tinyhumansai/openhuman

Length of output: 683


🏁 Script executed:

# Check the architecture documentation
grep -i "whisper" gitbooks/technology/architecture.md | head -10

Repository: tinyhumansai/openhuman

Length of output: 48


🏁 Script executed:

# Look for any mentions in README
grep -i "whisper" README.md | head -10

Repository: tinyhumansai/openhuman

Length of output: 48


🏁 Script executed:

# Check if there's any GitHub issue or discussion about the fork
grep -r "use_gpu\|flash_attn" --include="*.md" --include="*.rs" .

Repository: tinyhumansai/openhuman

Length of output: 491


🏁 Script executed:

# Look for any build.rs files that might indicate what the fork does
find . -name "build.rs" -type f -exec cat {} \;

Repository: tinyhumansai/openhuman

Length of output: 1444


🏁 Script executed:

# Check if there are any feature flags related to GPU configuration
rg "use_gpu|flash_attn|gpu.*acceleration" --type rs -B 3 -A 3

Repository: tinyhumansai/openhuman

Length of output: 92


Verify that use_gpu() and flash_attn() methods exist on WhisperContextParameters.

The code calls params.use_gpu(has_gpu) and params.flash_attn(true) without capturing return values (lines 74 and 79). However, the official whisper-rs 0.16 API documentation does not list these methods on WhisperContextParameters. While this project uses a custom fork of whisper-rs-sys for Windows MSVC compatibility, that fork's documented changes do not include adding GPU or flash attention configuration methods.

Verify:

  1. These methods exist on the custom fork's WhisperContextParameters
  2. If they don't exist, either add them to the fork or use an alternative GPU configuration approach supported by whisper-rs
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/local_ai/service/whisper_engine.rs` around lines 69 - 80, The
code calls WhisperContextParameters.use_gpu(has_gpu) and
WhisperContextParameters.flash_attn(true) but those methods may not exist on the
custom whisper-rs-sys fork; inspect the forked definition of
WhisperContextParameters and confirm the presence and signatures of use_gpu and
flash_attn, and if absent either add equivalent setter methods to the fork
(preserving API and safety) or change this code to use the supported GPU
configuration mechanism provided by whisper-rs (e.g., setting explicit fields on
WhisperContextParameters, using a builder or passing GPU/backend options to the
context creation function); specifically update the call sites around params,
WhisperContextParameters and the has_gpu logic to match the verified API.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods exist on WhisperContextParameters in whisper-rs 0.16. See whisper_ctx.rs:487-493:

pub fn use_gpu(&mut self, use_gpu: bool) -> &mut Self {
    self.use_gpu = use_gpu;
    self
}
pub fn flash_attn(&mut self, flash_attn: bool) -> &mut Self {
    self.flash_attn = flash_attn;
    self
}

The web query result was incorrect — these are public setter methods, not fields on WhisperVadContextParams. Code compiles and Metal acceleration is confirmed working (~159ms for 3.1s audio on Apple M2).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oxoxDev, thank you for the correction and for pointing directly to the upstream source. You're right — use_gpu() and flash_attn() are confirmed public setter methods on WhisperContextParameters in whisper-rs v0.16.0, and my web search result was incorrect (it confused WhisperVadContextParams with WhisperContextParameters). The implementation is valid, and the confirmed ~159ms transcription of 3.1s audio on Apple M2 is an excellent result. Apologies for the noise.


✏️ Learnings added
Learnt from: oxoxDev
Repo: tinyhumansai/openhuman PR: 571
File: src/openhuman/local_ai/service/whisper_engine.rs:69-80
Timestamp: 2026-04-14T19:59:04.826Z
Learning: In whisper-rs v0.16.0 (crate `whisper-rs`), `WhisperContextParameters` has two public builder-style setter methods: `use_gpu(&mut self, use_gpu: bool) -> &mut Self` and `flash_attn(&mut self, flash_attn: bool) -> &mut Self` (see whisper-rs/blob/v0.16.0/src/whisper_ctx.rs:487-493). Do not flag calls to these methods as missing/invalid — they exist in the upstream crate and are not limited to `WhisperVadContextParams`.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: graycyrus
Repo: tinyhumansai/openhuman PR: 271
File: app/src/pages/onboarding/Onboarding.tsx:65-85
Timestamp: 2026-04-02T13:53:41.803Z
Learning: When reviewing `local_ai` Tauri backend RPCs in `src/openhuman/local_ai/**`, note that `local_ai_download` and `local_ai_download_all_assets` both delegate to the same `service.download_all_models()` obtained from `local_ai::global(config)`, which uses shared internal state (via a shared `Arc`) to make concurrent/duplicate downloads an in-flight no-op. Because the backend is already idempotent for concurrent calls, don’t require or introduce separate frontend cross-component/concurrency guards to prevent duplicate downloads.

Learnt from: sanil-23
Repo: tinyhumansai/openhuman PR: 416
File: src/openhuman/memory/relex.rs:441-464
Timestamp: 2026-04-07T15:49:51.275Z
Learning: When using the `ort` Rust crate v2.x with the `load-dynamic` feature enabled, don’t require individual execution-provider feature flags (e.g., `directml`, `coreml`, `cuda`) alongside `load-dynamic` to get EP registration code. The `ort` crate already compiles EP registration via `#[cfg(any(feature = "load-dynamic", feature = "<ep_name>"))]` guards, and adding per-EP feature flags can pull in static-linking dependencies that conflict with the dynamic loading approach. At runtime, EP availability is determined by what the dynamically loaded ONNX Runtime library (`onnxruntime.dll`/`.so`/`.dylib`) supports; ort docs indicate providers like `directml`/`xnnpack`/`coreml` are available in builds when the platform supports them.

Learnt from: sanil-23
Repo: tinyhumansai/openhuman PR: 416
File: src/openhuman/memory/relex.rs:441-464
Timestamp: 2026-04-07T15:49:51.275Z
Learning: When integrating the `ort` Rust crate v2.x with the `load-dynamic` feature enabled, do NOT also require/enable individual provider EP Cargo features like `directml`, `coreml`, or `cuda`. In `ort` v2.x, EP registration for providers (e.g., DirectML, CoreML, CUDA, etc.) is already compiled in under source-level `#[cfg(any(feature = "load-dynamic", feature = "<provider>"))]` guards, such as in `ep/directml.rs`. Adding provider feature flags alongside `load-dynamic` can pull in static-linking dependencies that conflict with the dynamic-loading approach. Provider availability should be treated as runtime-determined by what the loaded `onnxruntime` library (`onnxruntime.dll`/`libonnxruntime.so`/`libonnxruntime.dylib`) actually supports.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
src/openhuman/local_ai/device.rs (2)

79-95: Add structured log prefixes for grep-friendly diagnostics.

The debug logs at lines 79, 85, 91, and 95 lack a consistent prefix. Per coding guidelines, logs should use stable prefixes like [device] or [local_ai] for easier filtering and correlation. The related whisper_engine.rs uses a LOG_PREFIX pattern.

♻️ Suggested fix: add a structured prefix
+const LOG_PREFIX: &str = "[device]";
+
 fn detect_gpu(cpu_brand: &str, os_name: &str) -> (bool, Option<String>) {
     let brand_lower = cpu_brand.to_ascii_lowercase();
     let os_lower = os_name.to_ascii_lowercase();

     // Apple Silicon detection: brand contains "apple" or we're on macOS with an ARM chip.
     if brand_lower.contains("apple") || (os_lower.contains("mac") && brand_lower.contains("arm")) {
-        tracing::debug!("GPU detected: Apple Silicon (Metal)");
+        tracing::debug!("{LOG_PREFIX} GPU detected: Apple Silicon (Metal)");
         return (true, Some("Apple Silicon (Metal)".to_string()));
     }

     // Intel Mac: macOS with Intel CPU — no Metal GPU acceleration for whisper.
     if os_lower.contains("mac") {
-        tracing::debug!("Intel Mac detected — no GPU acceleration available for whisper");
+        tracing::debug!("{LOG_PREFIX} Intel Mac detected — no GPU acceleration available for whisper");
         return (false, Some("Intel Mac (no Metal GPU)".to_string()));
     }

     // Windows / Linux: probe for NVIDIA GPU via nvidia-smi.
     if let Some(desc) = probe_nvidia_smi() {
-        tracing::debug!("GPU detected via nvidia-smi: {desc}");
+        tracing::debug!("{LOG_PREFIX} GPU detected via nvidia-smi: {desc}");
         return (true, Some(desc));
     }

-    tracing::debug!("no GPU detected — voice model will use CPU");
+    tracing::debug!("{LOG_PREFIX} no GPU detected — voice model will use CPU");
     (false, None)
 }

As per coding guidelines: "Use structured, grep-friendly context in logs with stable prefixes (e.g., [domain], [rpc], [ui-flow]) and include correlation fields."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/local_ai/device.rs` around lines 79 - 95, The device detection
debug logs lack the standardized grep-friendly prefix; update the
tracing::debug! calls in src/openhuman/local_ai/device.rs (the Apple Silicon,
Intel Mac, nvidia-smi branch, and final no-GPU message) to include a stable
prefix such as "[device]" (or reuse an existing LOG_PREFIX constant if
available, e.g., LOG_PREFIX) and keep the original message and any variable
interpolation (e.g., include {desc} in the nvidia-smi log); ensure each call
uses the same prefix and optionally include a correlation field like device_id
or trace_id if present in scope.

101-124: Add trace-level logging for nvidia-smi probe failures.

When nvidia-smi is not found or fails, the function silently returns None. Adding a trace or debug log on the failure paths would help diagnose GPU detection issues on Windows/Linux systems without NVIDIA drivers or with misconfigured environments.

♻️ Suggested fix: log probe failures
 fn probe_nvidia_smi() -> Option<String> {
-    let output = std::process::Command::new("nvidia-smi")
+    let output = match std::process::Command::new("nvidia-smi")
         .args(["--query-gpu=name", "--format=csv,noheader"])
         .stdout(std::process::Stdio::piped())
         .stderr(std::process::Stdio::null())
         .output()
-        .ok()?;
+    {
+        Ok(o) => o,
+        Err(e) => {
+            tracing::trace!("{LOG_PREFIX} nvidia-smi not available: {e}");
+            return None;
+        }
+    };

     if !output.status.success() {
+        tracing::trace!("{LOG_PREFIX} nvidia-smi exited with non-zero status");
         return None;
     }

     let name = String::from_utf8_lossy(&output.stdout)
         .lines()
         .next()?
         .trim()
         .to_string();

     if name.is_empty() {
+        tracing::trace!("{LOG_PREFIX} nvidia-smi returned empty GPU name");
         return None;
     }

     Some(format!("NVIDIA {name} (CUDA)"))
 }

As per coding guidelines: "include logs at... external calls... and error handling paths."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/local_ai/device.rs` around lines 101 - 124, probe_nvidia_smi
currently returns None silently on failures; add trace-level logs on each
failure path to aid debugging: when Command::new("nvidia-smi").output() fails
(log the io::Error), when output.status.success() is false (log the exit status
and maybe stdout/stderr), and when the parsed name is empty (log the raw output
bytes from output.stdout). Use the existing tracing/log facility in the project
(e.g., trace! or log::trace!) and reference probe_nvidia_smi, the
Command::new("nvidia-smi") call, output.status, and output.stdout when adding
these messages.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/openhuman/local_ai/device.rs`:
- Around line 79-95: The device detection debug logs lack the standardized
grep-friendly prefix; update the tracing::debug! calls in
src/openhuman/local_ai/device.rs (the Apple Silicon, Intel Mac, nvidia-smi
branch, and final no-GPU message) to include a stable prefix such as "[device]"
(or reuse an existing LOG_PREFIX constant if available, e.g., LOG_PREFIX) and
keep the original message and any variable interpolation (e.g., include {desc}
in the nvidia-smi log); ensure each call uses the same prefix and optionally
include a correlation field like device_id or trace_id if present in scope.
- Around line 101-124: probe_nvidia_smi currently returns None silently on
failures; add trace-level logs on each failure path to aid debugging: when
Command::new("nvidia-smi").output() fails (log the io::Error), when
output.status.success() is false (log the exit status and maybe stdout/stderr),
and when the parsed name is empty (log the raw output bytes from output.stdout).
Use the existing tracing/log facility in the project (e.g., trace! or
log::trace!) and reference probe_nvidia_smi, the Command::new("nvidia-smi")
call, output.status, and output.stdout when adding these messages.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a5ae07ef-5f23-44f4-9c1b-c9f906b687eb

📥 Commits

Reviewing files that changed from the base of the PR and between bb070cc and 9a7e75e.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • Cargo.toml
  • src/openhuman/local_ai/device.rs
  • src/openhuman/local_ai/service/bootstrap.rs
  • src/openhuman/local_ai/service/speech.rs
  • src/openhuman/local_ai/service/whisper_engine.rs
✅ Files skipped from review due to trivial changes (2)
  • Cargo.toml
  • src/openhuman/local_ai/service/whisper_engine.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/openhuman/local_ai/service/speech.rs
  • src/openhuman/local_ai/service/bootstrap.rs

@senamakel senamakel merged commit b2c7445 into tinyhumansai:main Apr 14, 2026
8 checks passed
AusAgentSmith pushed a commit to AusAgentSmith/openhuman that referenced this pull request May 23, 2026
…inyhumansai#558) (tinyhumansai#571)

* fix(device): expand GPU detection with Intel Mac and NVIDIA probes (tinyhumansai#558)

Add Intel Mac detection (no Metal GPU for whisper), nvidia-smi probe
for Windows/Linux NVIDIA GPUs, and diagnostic tracing at each
decision point in detect_gpu().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* build(cargo): enable whisper-rs Metal feature on macOS (tinyhumansai#558)

Add target-specific dependency for macOS that enables the `metal`
feature on whisper-rs, compiling whisper.cpp with Metal GPU support.
Cargo merges features from both declarations so non-macOS builds
are unaffected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(whisper): configure GPU params from device profile (tinyhumansai#558)

Accept has_gpu and gpu_description in load_engine() and explicitly
set use_gpu and flash_attn on WhisperContextParameters instead of
relying on the compile-time default. Log the selected acceleration
backend at startup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(local_ai): pass GPU info to whisper engine load paths (tinyhumansai#558)

Thread DeviceProfile has_gpu and gpu_description through both the
bootstrap (startup) and speech (lazy) whisper engine load calls so
the engine can configure Metal or CUDA acceleration at runtime.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix voice model GPU detection and acceleration selection

2 participants