fix(voice): enable GPU detection and Metal acceleration for whisper (#558) by oxoxDev · Pull Request #571 · tinyhumansai/openhuman

oxoxDev · 2026-04-14T19:41:40Z

Summary

Enable Metal GPU acceleration for whisper-rs on macOS via target-specific Cargo feature
Expand GPU detection to cover Intel Mac, Apple Silicon, and NVIDIA (via nvidia-smi)
Thread DeviceProfile GPU info through both bootstrap and lazy-load whisper engine paths
Whisper transcription now runs on Metal backend (~20x realtime on Apple M2)

Problem

Voice model GPU detection existed (DeviceProfile.has_gpu) but was never passed to the whisper engine. Meanwhile, whisper-rs was compiled without any GPU backend feature, so WhisperContextParameters::default() always set use_gpu: false. Result: voice transcription always ran on CPU regardless of available hardware acceleration.

Root causes:

Cargo.toml:83 — whisper-rs = "0.16" declared without GPU features, so the internal _gpu flag was never set
whisper_engine.rs:63 — WhisperContextParameters::default() used without configuring GPU params from the detected device profile
device.rs:73-84 — GPU detection was Apple Silicon only; no Intel Mac distinction, no NVIDIA probe for Windows/Linux

Solution

Cargo.toml — Added [target.'cfg(target_os = "macos")'.dependencies] section enabling metal feature on whisper-rs. Cargo merges features from both declarations so non-macOS builds are unaffected.
device.rs — Expanded detect_gpu() with Intel Mac detection (no Metal for whisper), nvidia-smi subprocess probe for Windows/Linux NVIDIA GPUs, and tracing::debug at each decision point.
whisper_engine.rs — Updated load_engine() to accept has_gpu and gpu_description params; explicitly configures use_gpu and flash_attn on WhisperContextParameters instead of relying on compile-time defaults.
bootstrap.rs — Passes DeviceProfile GPU info to whisper engine at bootstrap startup.
speech.rs — Detects device profile at lazy-load time and passes GPU info through.

Submission Checklist

Unit tests — cargo test passes (11/11 for device, whisper_engine, bootstrap modules)
N/A — E2E test infrastructure does not cover whisper GPU acceleration; verified manually via tauri dev with RUST_LOG=debug
Doc comments — Updated rustdoc on load_engine() and detect_gpu()
Inline comments — Added comments explaining GPU param configuration and feature merging

Manual verification:

Confirmed whisper_backend_init_gpu: using Metal backend in logs
Confirmed GPU name: Apple M2, MTLGPUFamilyMetal3 in Metal init
Confirmed transcription at 159ms for 3.1s audio (~20x realtime)
Confirmed voice dictation end-to-end working (recording → transcription → clipboard)

Impact

macOS (Apple Silicon): Whisper now uses Metal GPU — significant transcription speedup
macOS (Intel): Correctly reports no GPU acceleration available, falls back to CPU
Windows/Linux (NVIDIA): GPU detection via nvidia-smi; CUDA acceleration requires future cuda feature enablement
Windows/Linux (no NVIDIA): No behavior change, continues using CPU
No breaking API changes; load_engine signature change is crate-internal

Summary by CodeRabbit

New Features
- macOS builds include Metal GPU support for accelerated speech transcription.
- In-process speech engine now respects detected GPU presence and description when initializing.
Bug Fixes
- Improved GPU detection across macOS (Apple/Intel), Windows, and Linux (NVIDIA).
- Enhanced logging for clearer hardware-acceleration diagnostics.

coderabbitai · 2026-04-14T19:41:57Z

📝 Walkthrough

Walkthrough

Device GPU detection was expanded (Apple Metal, Intel Mac, NVIDIA via nvidia-smi) and GPU availability/description are now passed through service bootstrap/speech into the Whisper engine; load_engine was extended to accept and apply GPU settings (use_gpu, flash_attn).

Changes

Cohort / File(s)	Summary
Platform deps & device detection `Cargo.toml`, `src/openhuman/local_ai/device.rs`	Added macOS-only `whisper-rs` dependency (metal feature). Enhanced GPU detection: Apple Silicon detection, Intel Mac branch, NVIDIA probing via `nvidia-smi`, and explicit CPU fallback; added related debug logs and `probe_nvidia_smi()` helper.
Service integration `src/openhuman/local_ai/service/bootstrap.rs`, `src/openhuman/local_ai/service/speech.rs`	Detect device profile earlier and pass `has_gpu` + `gpu_description` into the blocking whisper engine load task; adjusted spawn closure to forward GPU parameters.
Whisper engine `src/openhuman/local_ai/service/whisper_engine.rs`	Extended `load_engine` signature to accept `has_gpu: bool` and `gpu_description: Option<&str>`; set `WhisperContextParameters` (`use_gpu`, enable `flash_attn` when GPU present) and updated logs/tests to reflect backend selection.

Sequence Diagram(s)

sequenceDiagram
  participant App
  participant DeviceDetector
  participant Service
  participant WhisperEngine
  participant System as "Host (nvidia-smi)"

  App->>Service: request init/load model
  Service->>DeviceDetector: detect_device_profile()
  DeviceDetector->>System: (on Windows/Linux) run `nvidia-smi`
  System-->>DeviceDetector: GPU name or failure
  DeviceDetector-->>Service: DeviceProfile(has_gpu, gpu_description)
  Service->>WhisperEngine: spawn_blocking load_engine(handle, model, has_gpu, gpu_description)
  WhisperEngine-->>Service: load result (success/failure)
  Service-->>App: return initialization result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

fix(voice): reduce dictation hallucinations and improve Fn/focus reliability (#385) #409 — Modifies whisper_engine.rs and service/speech.rs (related signature/call-site changes).

Poem

🐇 I sniffed the silicon and heard a hum,
Metal, CUDA, whispers on the run,
nvidia-smi named the GPU true,
I hopped to pass the info through,
Now models hum faster — carrot crumbs for you! 🥕

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix(voice): enable GPU detection and Metal acceleration for whisper' clearly summarizes the main objective of the PR: enabling GPU detection and Metal acceleration for the whisper voice engine.
Linked Issues check	✅ Passed	All coding requirements from issue `#558` are met: GPU detection is expanded across platforms (Apple Silicon, Intel Mac, NVIDIA via nvidia-smi), Metal acceleration is enabled on macOS, GPU info is threaded through engine initialization, CPU fallback is explicit with logging, and test coverage is provided.
Out of Scope Changes check	✅ Passed	All changes directly support GPU detection and acceleration objectives. Cargo.toml adds Metal feature targeting, device.rs expands GPU detection logic, whisper_engine.rs configures GPU parameters at runtime, and bootstrap.rs/speech.rs thread device info through initialization paths.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…inyhumansai#558) Add Intel Mac detection (no Metal GPU for whisper), nvidia-smi probe for Windows/Linux NVIDIA GPUs, and diagnostic tracing at each decision point in detect_gpu(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

) Add target-specific dependency for macOS that enables the `metal` feature on whisper-rs, compiling whisper.cpp with Metal GPU support. Cargo merges features from both declarations so non-macOS builds are unaffected. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

) Accept has_gpu and gpu_description in load_engine() and explicitly set use_gpu and flash_attn on WhisperContextParameters instead of relying on the compile-time default. Log the selected acceleration backend at startup. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ai#558) Thread DeviceProfile has_gpu and gpu_description through both the bootstrap (startup) and speech (lazy) whisper engine load calls so the engine can configure Metal or CUDA acceleration at runtime. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/openhuman/local_ai/device.rs (1)

99-124: Consider adding a timeout to the nvidia-smi subprocess.

The synchronous Command::output() call could block indefinitely if nvidia-smi hangs (e.g., due to driver issues). While rare, this would prevent startup from completing. A timeout would improve resilience.

♻️ Suggested timeout approach

fn probe_nvidia_smi() -> Option<String> {
    use std::time::Duration;
    
    let mut child = std::process::Command::new("nvidia-smi")
        .args(["--query-gpu=name", "--format=csv,noheader"])
        .stdout(std::process::Stdio::piped())
        .stderr(std::process::Stdio::null())
        .spawn()
        .ok()?;

    // Wait up to 5 seconds for nvidia-smi to complete.
    let start = std::time::Instant::now();
    loop {
        match child.try_wait() {
            Ok(Some(status)) if status.success() => break,
            Ok(Some(_)) => return None, // Non-zero exit
            Ok(None) if start.elapsed() > Duration::from_secs(5) => {
                let _ = child.kill();
                tracing::debug!("nvidia-smi timed out");
                return None;
            }
            Ok(None) => std::thread::sleep(Duration::from_millis(50)),
            Err(_) => return None,
        }
    }

    let output = child.wait_with_output().ok()?;
    let name = String::from_utf8_lossy(&output.stdout)
        .lines()
        .next()?
        .trim()
        .to_string();

    if name.is_empty() {
        return None;
    }
    Some(format!("NVIDIA {name} (CUDA)"))
}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/local_ai/device.rs` around lines 99 - 124, The probe_nvidia_smi
function should avoid blocking indefinitely by replacing the synchronous
Command::output() call with spawn() and a timed wait: spawn the process in
probe_nvidia_smi, poll with child.try_wait() in a short sleep loop up to a
chosen timeout (e.g., 5s), handle Ok(Some(status)) (return None on non-zero), on
timeout kill the child and return None, then call child.wait_with_output() to
read stdout and proceed with the existing UTF-8/empty checks; include a debug
log (e.g., tracing::debug!) when a timeout occurs and ensure all error cases
still return None.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/openhuman/local_ai/service/whisper_engine.rs`:
- Around line 69-80: The code calls WhisperContextParameters.use_gpu(has_gpu)
and WhisperContextParameters.flash_attn(true) but those methods may not exist on
the custom whisper-rs-sys fork; inspect the forked definition of
WhisperContextParameters and confirm the presence and signatures of use_gpu and
flash_attn, and if absent either add equivalent setter methods to the fork
(preserving API and safety) or change this code to use the supported GPU
configuration mechanism provided by whisper-rs (e.g., setting explicit fields on
WhisperContextParameters, using a builder or passing GPU/backend options to the
context creation function); specifically update the call sites around params,
WhisperContextParameters and the has_gpu logic to match the verified API.

---

Nitpick comments:
In `@src/openhuman/local_ai/device.rs`:
- Around line 99-124: The probe_nvidia_smi function should avoid blocking
indefinitely by replacing the synchronous Command::output() call with spawn()
and a timed wait: spawn the process in probe_nvidia_smi, poll with
child.try_wait() in a short sleep loop up to a chosen timeout (e.g., 5s), handle
Ok(Some(status)) (return None on non-zero), on timeout kill the child and return
None, then call child.wait_with_output() to read stdout and proceed with the
existing UTF-8/empty checks; include a debug log (e.g., tracing::debug!) when a
timeout occurs and ensure all error cases still return None.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 96b46e0f-6b3a-4954-aedc-f0796553d49e

📥 Commits

Reviewing files that changed from the base of the PR and between 7ae2bf8 and bb070cc.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (5)

Cargo.toml
src/openhuman/local_ai/device.rs
src/openhuman/local_ai/service/bootstrap.rs
src/openhuman/local_ai/service/speech.rs
src/openhuman/local_ai/service/whisper_engine.rs

coderabbitai · 2026-04-14T19:52:17Z

+    let mut params = WhisperContextParameters::default();
+
+    // Explicitly configure GPU acceleration based on device profile.
+    // The default `use_gpu` is `cfg!(feature = "_gpu")` which is only true
+    // when a GPU backend feature (metal, cuda, etc.) is compiled in.
+    params.use_gpu(has_gpu);
+
+    // Enable flash attention when GPU is available — improves throughput
+    // on both Metal and CUDA backends.
+    if has_gpu {
+        params.flash_attn(true);
+    }


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

whisper-rs WhisperContextParameters use_gpu method signature Rust

💡 Result:

WhisperContextParameters does not have a public use_gpu method. The official documentation at docs.rs/whisper-rs/latest/whisper_rs/struct.WhisperContextParameters.html lists no such method. Usage examples consistently use WhisperContextParameters::default without modification, as in WhisperContext::new_with_params(model_path, WhisperContextParameters::default). GPU support is enabled via crate feature flags like cuda, metal, vulkan, etc., which implicitly set GPU flags at runtime per the README. A set_use_gpu method exists on WhisperVadContextParams, but that's for VAD only. Code snippets setting ctx_params.use_gpu = true appear to be incorrect or from forks/other crates.

Citations:

1: https://docs.rs/whisper-rs/latest/whisper_rs/struct.WhisperVadContextParams.html

2: https://docs.rs/whisper-rs/latest/whisper_rs/struct.WhisperContext.html

3: https://docs.rs/whisper-rs/latest/whisper_rs/

4: https://docs.rs/whisper-rs

5: https://crates.io/crates/whisper-rs

6: https://github.com/tazz4843/whisper-rs

7: https://docs.rs/whisper-rs/latest/src/whisper_rs/lib.rs.html

🏁 Script executed:

# First, find and check the whisper_engine.rs file cat -n src/openhuman/local_ai/service/whisper_engine.rs | head -100

Repository: tinyhumansai/openhuman

Length of output: 4169

🏁 Script executed:

# Check the Cargo.toml for whisper-rs version grep -r "whisper-rs" Cargo.toml app/src-tauri/Cargo.toml 2>/dev/null

Repository: tinyhumansai/openhuman

Length of output: 343

🏁 Script executed:

# Search for any other uses of WhisperContextParameters in the codebase rg "WhisperContextParameters" --type rust -B 2 -A 2

Repository: tinyhumansai/openhuman

Length of output: 930

🏁 Script executed:

# Search for any trait implementations or extensions to WhisperContextParameters in the codebase rg "impl.*WhisperContextParameters|trait.*Whisper" --type rust -B 2 -A 5

Repository: tinyhumansai/openhuman

Length of output: 48

🏁 Script executed:

# Check if there are any builder pattern methods defined locally rg "fn use_gpu|fn flash_attn" --type rust -B 2 -A 3

Repository: tinyhumansai/openhuman

Length of output: 48

🏁 Script executed:

# Look for whisper-rs imports and usage patterns rg "use_gpu|flash_attn" --type rust -B 1 -A 1

Repository: tinyhumansai/openhuman

Length of output: 1041

🏁 Script executed:

# Check the Cargo.lock or any dependency information for whisper-rs versions find . -name "Cargo.lock" -type f | head -1 | xargs grep -A 5 "whisper-rs" 2>/dev/null | head -30

Repository: tinyhumansai/openhuman

Length of output: 48

🏁 Script executed:

# Check if there's any git history or information about the custom fork git log --oneline -- src/openhuman/local_ai/service/whisper_engine.rs 2>/dev/null | head -10

Repository: tinyhumansai/openhuman

Length of output: 138

🏁 Script executed:

# Look for test files related to whisper fd ".*whisper.*test.*" --type f 2>/dev/null

Repository: tinyhumansai/openhuman

Length of output: 48

🏁 Script executed:

# Check if there are any comments or documentation about WhisperContextParameters methods rg "WhisperContextParameters|use_gpu|flash_attn" -B 3 -A 3 --type rust

Repository: tinyhumansai/openhuman

Length of output: 2645

🏁 Script executed:

# Check the git config or remotes to see if the custom fork URL is documented git config --list 2>/dev/null | grep -i whisper || echo "No git config info"

Repository: tinyhumansai/openhuman

Length of output: 84

🏁 Script executed:

# Get the full commit details to see what changed git show 9a7e75e --stat 2>/dev/null | head -50

Repository: tinyhumansai/openhuman

Length of output: 2935

🏁 Script executed:

# Check the actual diff of the whisper_engine.rs changes git show 9a7e75e -- src/openhuman/local_ai/service/whisper_engine.rs 2>/dev/null | head -100

Repository: tinyhumansai/openhuman

Length of output: 3726

🏁 Script executed:

# Check if there's any documentation or notes about the custom fork in the repo find . -name "*.md" -type f | xargs grep -l "whisper-rs-sys\|whisper_rs" 2>/dev/null | head -5

Repository: tinyhumansai/openhuman

Length of output: 85

🏁 Script executed:

# Check the memory file for any context cat ./.claude/memory.md | head -100

Repository: tinyhumansai/openhuman

Length of output: 14761

🏁 Script executed:

# Try to find any references to whisper-rs fork or custom implementations rg "use_gpu|flash_attn" --type rs -B 5 -A 5 | head -80

Repository: tinyhumansai/openhuman

Length of output: 92

🏁 Script executed:

# Check if there's a recent successful build find target -name "libwhisper_engine*" -o -name "*whisper_engine*.rlib" 2>/dev/null | head -5

Repository: tinyhumansai/openhuman

Length of output: 48

🏁 Script executed:

# Look for any GitHub issues or PRs mentioning these methods rg "use_gpu|flash_attn" --type md --type txt 2>/dev/null | head -20

Repository: tinyhumansai/openhuman

Length of output: 48

🏁 Script executed:

# Check if the code actually compiles by looking for the compiled file ls -la target/debug/deps/ 2>/dev/null | grep whisper || echo "No compiled artifact found"

Repository: tinyhumansai/openhuman

Length of output: 92

🏁 Script executed:

# Look for any git submodules or fork references find . -name ".gitmodules" -o -name "*.toml" | xargs grep -l "whisper-rs-sys" 2>/dev/null | head -5

Repository: tinyhumansai/openhuman

Length of output: 78

🏁 Script executed:

# Check the actual Cargo.toml to see the custom fork reference cat app/src-tauri/Cargo.toml | grep -A 3 "whisper-rs"

Repository: tinyhumansai/openhuman

Length of output: 48

🏁 Script executed:

# Search for any documentation about the custom fork find . -type f \( -name "*.md" -o -name "*.txt" \) -exec grep -l "whisper-rs-sys\|github.com/tinyhumansai" {} \; 2>/dev/null

Repository: tinyhumansai/openhuman

Length of output: 415

🏁 Script executed:

# Check BUILDING.md for information about whisper-rs setup grep -i "whisper" docs/BUILDING.md

Repository: tinyhumansai/openhuman

Length of output: 48

🏁 Script executed:

# Check the main Cargo.toml to see the fork reference context cat Cargo.toml | grep -B 2 -A 5 "whisper-rs-sys"

Repository: tinyhumansai/openhuman

Length of output: 683

🏁 Script executed:

# Check the architecture documentation grep -i "whisper" gitbooks/technology/architecture.md | head -10

Repository: tinyhumansai/openhuman

Length of output: 48

🏁 Script executed:

# Look for any mentions in README grep -i "whisper" README.md | head -10

Repository: tinyhumansai/openhuman

Length of output: 48

🏁 Script executed:

# Check if there's any GitHub issue or discussion about the fork grep -r "use_gpu\|flash_attn" --include="*.md" --include="*.rs" .

Repository: tinyhumansai/openhuman

Length of output: 491

🏁 Script executed:

# Look for any build.rs files that might indicate what the fork does find . -name "build.rs" -type f -exec cat {} \;

Repository: tinyhumansai/openhuman

Length of output: 1444

🏁 Script executed:

# Check if there are any feature flags related to GPU configuration rg "use_gpu|flash_attn|gpu.*acceleration" --type rs -B 3 -A 3

Repository: tinyhumansai/openhuman

Length of output: 92

Verify that use_gpu() and flash_attn() methods exist on WhisperContextParameters.

The code calls params.use_gpu(has_gpu) and params.flash_attn(true) without capturing return values (lines 74 and 79). However, the official whisper-rs 0.16 API documentation does not list these methods on WhisperContextParameters. While this project uses a custom fork of whisper-rs-sys for Windows MSVC compatibility, that fork's documented changes do not include adding GPU or flash attention configuration methods.

Verify:

These methods exist on the custom fork's WhisperContextParameters

If they don't exist, either add them to the fork or use an alternative GPU configuration approach supported by whisper-rs

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/openhuman/local_ai/service/whisper_engine.rs` around lines 69 - 80, The code calls WhisperContextParameters.use_gpu(has_gpu) and WhisperContextParameters.flash_attn(true) but those methods may not exist on the custom whisper-rs-sys fork; inspect the forked definition of WhisperContextParameters and confirm the presence and signatures of use_gpu and flash_attn, and if absent either add equivalent setter methods to the fork (preserving API and safety) or change this code to use the supported GPU configuration mechanism provided by whisper-rs (e.g., setting explicit fields on WhisperContextParameters, using a builder or passing GPU/backend options to the context creation function); specifically update the call sites around params, WhisperContextParameters and the has_gpu logic to match the verified API.

These methods exist on WhisperContextParameters in whisper-rs 0.16. See whisper_ctx.rs:487-493:

pub fn use_gpu(&mut self, use_gpu: bool) -> &mut Self { self.use_gpu = use_gpu; self } pub fn flash_attn(&mut self, flash_attn: bool) -> &mut Self { self.flash_attn = flash_attn; self }

The web query result was incorrect — these are public setter methods, not fields on WhisperVadContextParams. Code compiles and Metal acceleration is confirmed working (~159ms for 3.1s audio on Apple M2).

@oxoxDev, thank you for the correction and for pointing directly to the upstream source. You're right — use_gpu() and flash_attn() are confirmed public setter methods on WhisperContextParameters in whisper-rs v0.16.0, and my web search result was incorrect (it confused WhisperVadContextParams with WhisperContextParameters). The implementation is valid, and the confirmed ~159ms transcription of 3.1s audio on Apple M2 is an excellent result. Apologies for the noise.

✏️ Learnings added

Learnt from: oxoxDev Repo: tinyhumansai/openhuman PR: 571 File: src/openhuman/local_ai/service/whisper_engine.rs:69-80 Timestamp: 2026-04-14T19:59:04.826Z Learning: In whisper-rs v0.16.0 (crate `whisper-rs`), `WhisperContextParameters` has two public builder-style setter methods: `use_gpu(&mut self, use_gpu: bool) -> &mut Self` and `flash_attn(&mut self, flash_attn: bool) -> &mut Self` (see whisper-rs/blob/v0.16.0/src/whisper_ctx.rs:487-493). Do not flag calls to these methods as missing/invalid — they exist in the upstream crate and are not limited to `WhisperVadContextParams`.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

🧠 Learnings used

Learnt from: graycyrus Repo: tinyhumansai/openhuman PR: 271 File: app/src/pages/onboarding/Onboarding.tsx:65-85 Timestamp: 2026-04-02T13:53:41.803Z Learning: When reviewing `local_ai` Tauri backend RPCs in `src/openhuman/local_ai/**`, note that `local_ai_download` and `local_ai_download_all_assets` both delegate to the same `service.download_all_models()` obtained from `local_ai::global(config)`, which uses shared internal state (via a shared `Arc`) to make concurrent/duplicate downloads an in-flight no-op. Because the backend is already idempotent for concurrent calls, don’t require or introduce separate frontend cross-component/concurrency guards to prevent duplicate downloads.

Learnt from: sanil-23 Repo: tinyhumansai/openhuman PR: 416 File: src/openhuman/memory/relex.rs:441-464 Timestamp: 2026-04-07T15:49:51.275Z Learning: When using the `ort` Rust crate v2.x with the `load-dynamic` feature enabled, don’t require individual execution-provider feature flags (e.g., `directml`, `coreml`, `cuda`) alongside `load-dynamic` to get EP registration code. The `ort` crate already compiles EP registration via `#[cfg(any(feature = "load-dynamic", feature = "<ep_name>"))]` guards, and adding per-EP feature flags can pull in static-linking dependencies that conflict with the dynamic loading approach. At runtime, EP availability is determined by what the dynamically loaded ONNX Runtime library (`onnxruntime.dll`/`.so`/`.dylib`) supports; ort docs indicate providers like `directml`/`xnnpack`/`coreml` are available in builds when the platform supports them.

Learnt from: sanil-23 Repo: tinyhumansai/openhuman PR: 416 File: src/openhuman/memory/relex.rs:441-464 Timestamp: 2026-04-07T15:49:51.275Z Learning: When integrating the `ort` Rust crate v2.x with the `load-dynamic` feature enabled, do NOT also require/enable individual provider EP Cargo features like `directml`, `coreml`, or `cuda`. In `ort` v2.x, EP registration for providers (e.g., DirectML, CoreML, CUDA, etc.) is already compiled in under source-level `#[cfg(any(feature = "load-dynamic", feature = "<provider>"))]` guards, such as in `ep/directml.rs`. Adding provider feature flags alongside `load-dynamic` can pull in static-linking dependencies that conflict with the dynamic-loading approach. Provider availability should be treated as runtime-determined by what the loaded `onnxruntime` library (`onnxruntime.dll`/`libonnxruntime.so`/`libonnxruntime.dylib`) actually supports.

coderabbitai

🧹 Nitpick comments (2)

src/openhuman/local_ai/device.rs (2)

79-95: Add structured log prefixes for grep-friendly diagnostics.

The debug logs at lines 79, 85, 91, and 95 lack a consistent prefix. Per coding guidelines, logs should use stable prefixes like [device] or [local_ai] for easier filtering and correlation. The related whisper_engine.rs uses a LOG_PREFIX pattern.

♻️ Suggested fix: add a structured prefix

+const LOG_PREFIX: &str = "[device]";
+
 fn detect_gpu(cpu_brand: &str, os_name: &str) -> (bool, Option<String>) {
     let brand_lower = cpu_brand.to_ascii_lowercase();
     let os_lower = os_name.to_ascii_lowercase();

     // Apple Silicon detection: brand contains "apple" or we're on macOS with an ARM chip.
     if brand_lower.contains("apple") || (os_lower.contains("mac") && brand_lower.contains("arm")) {
-        tracing::debug!("GPU detected: Apple Silicon (Metal)");
+        tracing::debug!("{LOG_PREFIX} GPU detected: Apple Silicon (Metal)");
         return (true, Some("Apple Silicon (Metal)".to_string()));
     }

     // Intel Mac: macOS with Intel CPU — no Metal GPU acceleration for whisper.
     if os_lower.contains("mac") {
-        tracing::debug!("Intel Mac detected — no GPU acceleration available for whisper");
+        tracing::debug!("{LOG_PREFIX} Intel Mac detected — no GPU acceleration available for whisper");
         return (false, Some("Intel Mac (no Metal GPU)".to_string()));
     }

     // Windows / Linux: probe for NVIDIA GPU via nvidia-smi.
     if let Some(desc) = probe_nvidia_smi() {
-        tracing::debug!("GPU detected via nvidia-smi: {desc}");
+        tracing::debug!("{LOG_PREFIX} GPU detected via nvidia-smi: {desc}");
         return (true, Some(desc));
     }

-    tracing::debug!("no GPU detected — voice model will use CPU");
+    tracing::debug!("{LOG_PREFIX} no GPU detected — voice model will use CPU");
     (false, None)
 }

As per coding guidelines: "Use structured, grep-friendly context in logs with stable prefixes (e.g., [domain], [rpc], [ui-flow]) and include correlation fields."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/local_ai/device.rs` around lines 79 - 95, The device detection
debug logs lack the standardized grep-friendly prefix; update the
tracing::debug! calls in src/openhuman/local_ai/device.rs (the Apple Silicon,
Intel Mac, nvidia-smi branch, and final no-GPU message) to include a stable
prefix such as "[device]" (or reuse an existing LOG_PREFIX constant if
available, e.g., LOG_PREFIX) and keep the original message and any variable
interpolation (e.g., include {desc} in the nvidia-smi log); ensure each call
uses the same prefix and optionally include a correlation field like device_id
or trace_id if present in scope.

101-124: Add trace-level logging for nvidia-smi probe failures.

When nvidia-smi is not found or fails, the function silently returns None. Adding a trace or debug log on the failure paths would help diagnose GPU detection issues on Windows/Linux systems without NVIDIA drivers or with misconfigured environments.

♻️ Suggested fix: log probe failures

 fn probe_nvidia_smi() -> Option<String> {
-    let output = std::process::Command::new("nvidia-smi")
+    let output = match std::process::Command::new("nvidia-smi")
         .args(["--query-gpu=name", "--format=csv,noheader"])
         .stdout(std::process::Stdio::piped())
         .stderr(std::process::Stdio::null())
         .output()
-        .ok()?;
+    {
+        Ok(o) => o,
+        Err(e) => {
+            tracing::trace!("{LOG_PREFIX} nvidia-smi not available: {e}");
+            return None;
+        }
+    };

     if !output.status.success() {
+        tracing::trace!("{LOG_PREFIX} nvidia-smi exited with non-zero status");
         return None;
     }

     let name = String::from_utf8_lossy(&output.stdout)
         .lines()
         .next()?
         .trim()
         .to_string();

     if name.is_empty() {
+        tracing::trace!("{LOG_PREFIX} nvidia-smi returned empty GPU name");
         return None;
     }

     Some(format!("NVIDIA {name} (CUDA)"))
 }

As per coding guidelines: "include logs at... external calls... and error handling paths."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/local_ai/device.rs` around lines 101 - 124, probe_nvidia_smi
currently returns None silently on failures; add trace-level logs on each
failure path to aid debugging: when Command::new("nvidia-smi").output() fails
(log the io::Error), when output.status.success() is false (log the exit status
and maybe stdout/stderr), and when the parsed name is empty (log the raw output
bytes from output.stdout). Use the existing tracing/log facility in the project
(e.g., trace! or log::trace!) and reference probe_nvidia_smi, the
Command::new("nvidia-smi") call, output.status, and output.stdout when adding
these messages.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/openhuman/local_ai/device.rs`:
- Around line 79-95: The device detection debug logs lack the standardized
grep-friendly prefix; update the tracing::debug! calls in
src/openhuman/local_ai/device.rs (the Apple Silicon, Intel Mac, nvidia-smi
branch, and final no-GPU message) to include a stable prefix such as "[device]"
(or reuse an existing LOG_PREFIX constant if available, e.g., LOG_PREFIX) and
keep the original message and any variable interpolation (e.g., include {desc}
in the nvidia-smi log); ensure each call uses the same prefix and optionally
include a correlation field like device_id or trace_id if present in scope.
- Around line 101-124: probe_nvidia_smi currently returns None silently on
failures; add trace-level logs on each failure path to aid debugging: when
Command::new("nvidia-smi").output() fails (log the io::Error), when
output.status.success() is false (log the exit status and maybe stdout/stderr),
and when the parsed name is empty (log the raw output bytes from output.stdout).
Use the existing tracing/log facility in the project (e.g., trace! or
log::trace!) and reference probe_nvidia_smi, the Command::new("nvidia-smi")
call, output.status, and output.stdout when adding these messages.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a5ae07ef-5f23-44f4-9c1b-c9f906b687eb

📥 Commits

Reviewing files that changed from the base of the PR and between bb070cc and 9a7e75e.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (5)

Cargo.toml
src/openhuman/local_ai/device.rs
src/openhuman/local_ai/service/bootstrap.rs
src/openhuman/local_ai/service/speech.rs
src/openhuman/local_ai/service/whisper_engine.rs

✅ Files skipped from review due to trivial changes (2)

Cargo.toml
src/openhuman/local_ai/service/whisper_engine.rs

🚧 Files skipped from review as they are similar to previous changes (2)

src/openhuman/local_ai/service/speech.rs
src/openhuman/local_ai/service/bootstrap.rs

…inyhumansai#558) (tinyhumansai#571) * fix(device): expand GPU detection with Intel Mac and NVIDIA probes (tinyhumansai#558) Add Intel Mac detection (no Metal GPU for whisper), nvidia-smi probe for Windows/Linux NVIDIA GPUs, and diagnostic tracing at each decision point in detect_gpu(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * build(cargo): enable whisper-rs Metal feature on macOS (tinyhumansai#558) Add target-specific dependency for macOS that enables the `metal` feature on whisper-rs, compiling whisper.cpp with Metal GPU support. Cargo merges features from both declarations so non-macOS builds are unaffected. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(whisper): configure GPU params from device profile (tinyhumansai#558) Accept has_gpu and gpu_description in load_engine() and explicitly set use_gpu and flash_attn on WhisperContextParameters instead of relying on the compile-time default. Log the selected acceleration backend at startup. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(local_ai): pass GPU info to whisper engine load paths (tinyhumansai#558) Thread DeviceProfile has_gpu and gpu_description through both the bootstrap (startup) and speech (lazy) whisper engine load calls so the engine can configure Metal or CUDA acceleration at runtime. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

senamakel previously approved these changes Apr 14, 2026

View reviewed changes

oxoxDev and others added 4 commits April 15, 2026 01:16

oxoxDev dismissed senamakel’s stale review via 9a7e75e April 14, 2026 19:48

oxoxDev force-pushed the fix/558-gpu-detection-acceleration branch from bb070cc to 9a7e75e Compare April 14, 2026 19:48

coderabbitai Bot reviewed Apr 14, 2026

View reviewed changes

senamakel approved these changes Apr 14, 2026

View reviewed changes

coderabbitai Bot reviewed Apr 14, 2026

View reviewed changes

senamakel merged commit b2c7445 into tinyhumansai:main Apr 14, 2026
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(voice): enable GPU detection and Metal acceleration for whisper (#558)#571

fix(voice): enable GPU detection and Metal acceleration for whisper (#558)#571
senamakel merged 4 commits into
tinyhumansai:mainfrom
oxoxDev:fix/558-gpu-detection-acceleration

oxoxDev commented Apr 14, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 14, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 14, 2026 •

edited

Loading

Uh oh!

oxoxDev Apr 14, 2026

Uh oh!

coderabbitai Bot Apr 14, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

oxoxDev commented Apr 14, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Submission Checklist

Impact

Related

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

oxoxDev Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

oxoxDev commented Apr 14, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 14, 2026 •

edited

Loading

coderabbitai Bot Apr 14, 2026 •

edited

Loading