feat: Add voice input with pluggable backend (Gemini zero-install + Whisper) #18499

Open
fayerman-source wants to merge 78 commits into google-gemini:main from fayerman-source:feat/voice-input-clean

Conversation

@fayerman-source commented Feb 7, 2026

Summary

This PR adds native voice input to Gemini CLI with a pluggable backend architecture:

  • gemini as the default zero-install transcription backend
  • optional local whisper support for users who want a local binary workflow

This addresses the request for a zero-install option while keeping a local backend available for users who prefer it.

Update: Gemini Backend Added (addresses @jacob314's feedback)

Per @jacob314's comment requesting a zero-install option using the Gemini API, this PR adds a GeminiRestBackend that:

  • Records audio locally via sox or arecord
  • Converts raw PCM to WAV in-memory
  • Transcribes via generateContent using the user's existing Gemini auth
  • Works with existing OAuth or API key authentication

Note: This uses generateContent with audio rather than the Live API streaming approach. The Live API is designed for real-time bidirectional sessions; generateContent is simpler and sufficient for push-to-talk transcription.
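
As a rough illustration of the generateContent approach described in the note, the sketch below builds a request body that carries WAV audio inline as base64. The helper name `buildTranscriptionRequest` and the prompt wording are hypothetical and not taken from the PR; the `inlineData` part shape follows the public Gemini API, but the PR routes the call through the CLI's existing content generator, so the real code may look different.

```typescript
// Hypothetical helper for illustration; not the PR's implementation.
interface TextPart {
  text: string;
}
interface InlineAudioPart {
  inlineData: { mimeType: string; data: string };
}
interface TranscriptionRequest {
  contents: Array<{ role: "user"; parts: Array<TextPart | InlineAudioPart> }>;
}

function buildTranscriptionRequest(wav: Buffer): TranscriptionRequest {
  return {
    contents: [
      {
        role: "user",
        parts: [
          // Assumed prompt wording; the PR may phrase this differently.
          { text: "Transcribe this audio. Return only the transcript." },
          // The audio travels inline as base64, so nothing is written to disk.
          { inlineData: { mimeType: "audio/wav", data: wav.toString("base64") } },
        ],
      },
    ],
  };
}
```

Keeping the payload construction separate from the network call makes the in-memory path easy to unit test without credentials.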

Features

| Feature | Description |
| --- | --- |
| Voice Recording | Press Space Space on an empty prompt to start or stop recording |
| Zero-install backend | Gemini transcription via existing auth, with no Python or model downloads |
| Pluggable backends | voice.provider: "gemini" (default) or "whisper" (local) |
| Auto Language Detection | Gemini and Whisper auto-detect the spoken language |
| Visual Feedback | Recording indicator while recording, spinner while transcribing |
| Silence Filtering | voice.silenceThreshold drops silent recordings before transcription |
| Slash commands | /voice, /voice enable, /voice disable, /voice provider [gemini\|whisper] |

Settings

{
  "voice": {
    "enabled": true,                  // default: false
    "provider": "gemini",             // default: "gemini", or "whisper"
    "silenceThreshold": 80,           // default: 80, 0 disables silence filtering
    "whisperPath": "/usr/bin/whisper" // optional, only used for whisper
  }
}

Architecture

useVoiceInput (hook)
  ├── GeminiRestBackend   ← default: sox/arecord → WAV buffer → generateContent
  └── LocalWhisperBackend ← optional: sox/arecord → WAV file → whisper binary
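
The pluggable-backend layout above could be modeled roughly as follows. This is a sketch under assumptions: the interface name `TranscriptionBackend`, its method names, and the `selectBackend` helper are hypothetical, and the PR's real types may differ. Only the provider values and the "gemini" default come from the description.

```typescript
// Hypothetical shapes for illustration; the PR's real interfaces may differ.
type VoiceProvider = "gemini" | "whisper";

interface TranscriptionBackend {
  start(): Promise<void>; // begin capturing audio
  stop(): Promise<string>; // stop capturing and resolve with the transcript
  cancel(): void; // discard the current recording
}

type BackendFactory = () => TranscriptionBackend;

function selectBackend(
  provider: VoiceProvider | undefined,
  factories: Record<VoiceProvider, BackendFactory>,
): TranscriptionBackend {
  // "gemini" is the documented default when voice.provider is unset.
  return factories[provider ?? "gemini"]();
}
```

With this shape the hook only depends on the interface, so adding a third provider would mean one more factory entry rather than changes to the hook.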

Key Design Decisions

| Decision | Rationale |
| --- | --- |
| Event-based transcript delivery | Uses coreEvents to avoid React re-render cascades |
| In-memory Gemini path | Keeps Gemini transcription off disk |
| Existing auth reuse | Uses config.getContentGenerator() for OAuth or API key flows |
| Toggle race protection | Uses isTogglingRef to prevent overlapping toggles |
| Silence filtering | Applies voice.silenceThreshold before transcription work is done |
| Clear recording feedback | Shows a recording indicator while recording and a spinner while transcribing |
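
Two of these decisions, event-based delivery and toggle race protection, can be sketched in a few lines. The emitter below is a stand-in for the PR's coreEvents, and the event name, the `emitVoiceTranscript` helper, and `createToggleGuard` are assumptions made for illustration. `onVoiceTranscript` is named in the PR, but its signature here is a guess.

```typescript
import { EventEmitter } from "node:events";

const voiceEvents = new EventEmitter(); // stand-in for the PR's coreEvents
const TRANSCRIPT_EVENT = "voice-transcript"; // assumed event name

// Subscribe to transcripts; returns an unsubscribe function for effect cleanup.
function onVoiceTranscript(listener: (text: string) => void): () => void {
  voiceEvents.on(TRANSCRIPT_EVENT, listener);
  return () => {
    voiceEvents.off(TRANSCRIPT_EVENT, listener);
  };
}

function emitVoiceTranscript(text: string): void {
  // Delivered outside React state, so no component re-renders just to pass text along.
  voiceEvents.emit(TRANSCRIPT_EVENT, text);
}

// Re-entrancy guard in the spirit of isTogglingRef: overlapping calls are dropped.
function createToggleGuard(toggle: () => Promise<void>): () => Promise<boolean> {
  const isToggling = { current: false };
  return async () => {
    if (isToggling.current) return false; // a toggle is already in flight
    isToggling.current = true;
    try {
      await toggle();
      return true;
    } finally {
      isToggling.current = false;
    }
  };
}
```

The guard resolves to `false` for a dropped call, which lets a caller distinguish "ignored" from "completed" without throwing.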

Audio Pipeline

Gemini Backend

| Step | Behavior |
| --- | --- |
| 1 | Record raw PCM via sox or arecord |
| 2 | Concatenate PCM chunks in memory and prepend a WAV header |
| 3 | Send audio/wav to generateContent |
| 4 | Emit transcript text through the event bus and insert it at the cursor |
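
Step 2 of the Gemini pipeline amounts to writing a 44-byte RIFF/WAVE header in front of the concatenated PCM data. The sketch below shows one way to do that. The function name `pcmToWav` is hypothetical, and the capture format (16 kHz, mono, 16-bit little-endian) is an assumption, since the PR description does not state the recording parameters.

```typescript
// Assumed capture format: 16 kHz, mono, 16-bit signed little-endian PCM.
function pcmToWav(
  chunks: Buffer[],
  sampleRate = 16000,
  channels = 1,
  bitsPerSample = 16,
): Buffer {
  const pcm = Buffer.concat(chunks);
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const header = Buffer.alloc(44);
  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(36 + pcm.length, 4); // RIFF chunk size = file size - 8
  header.write("WAVE", 8, "ascii");
  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16); // fmt chunk size for plain PCM
  header.writeUInt16LE(1, 20); // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * blockAlign, 28); // byte rate
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write("data", 36, "ascii");
  header.writeUInt32LE(pcm.length, 40); // size of the PCM payload
  return Buffer.concat([header, pcm]);
}
```

The resulting buffer can be base64-encoded and sent as audio/wav without ever touching the file system.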

Whisper Backend

| Step | Behavior |
| --- | --- |
| 1 | Record WAV audio via sox or arecord |
| 2 | Drop silent recordings using voice.silenceThreshold |
| 3 | Invoke the configured Whisper binary |
| 4 | Emit transcript text through the event bus and insert it at the cursor |
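
The PR description does not say what voice.silenceThreshold is measured against, so the following is only one plausible reading: compare the peak absolute amplitude of 16-bit samples with the threshold, and treat 0 as "filtering disabled" as the settings comment states. The function name `isSilent` and the peak-amplitude metric are assumptions.

```typescript
// Assumption: the threshold is compared with the peak absolute amplitude of
// 16-bit little-endian samples. The PR may use RMS or another metric instead.
function isSilent(pcm: Buffer, silenceThreshold: number): boolean {
  if (silenceThreshold <= 0) return false; // 0 disables silence filtering
  let peak = 0;
  for (let i = 0; i + 1 < pcm.length; i += 2) {
    peak = Math.max(peak, Math.abs(pcm.readInt16LE(i)));
  }
  return peak < silenceThreshold;
}
```

For a WAV file rather than raw PCM, the header bytes would need to be skipped before scanning samples.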

Commands

| Command | Purpose |
| --- | --- |
| /voice | Show current voice settings |
| /voice enable | Enable voice input |
| /voice disable | Disable voice input |
| /voice provider [gemini\|whisper] | Select the transcription backend |
| /voice sensitivity <0-1000> | Set silence filtering |
| /voice set-path <path> | Set the Whisper binary path |
| /voice help | Show voice help |

Validation

Manual Validation

Check Result
End-to-end voice recording and transcription
Space Space start/stop behavior
Esc cancels active recording
gemini backend
whisper backend
Prompt alignment remains stable during idle, recording, and transcribing

Automated Validation

Command Result
npm run build -w @google/gemini-cli-core
npm run typecheck
npm test -w @google/gemini-cli -- src/nonInteractiveCli.test.ts
npm test -w @google/gemini-cli-core -- src/services/voice/LocalWhisperBackend.test.ts
npm test -w @google/gemini-cli -- --coverage.enabled=false src/nonInteractiveCli.test.ts src/ui/components/InputPrompt.test.tsx
npm test -w @google/gemini-cli -- src/ui/hooks/useVoiceInput.test.ts src/ui/hooks/useVoiceInput.replication.test.tsx src/ui/hooks/useVoiceInput.stress.test.ts src/ui/hooks/useVoiceInput.log-volume.test.ts
npm run build

Follow-up Work

Not included in this PR
Deeper /settings TUI integration for voice configuration
Further UI polish around discoverability of sensitivity defaults

Closes #18067

@fayerman-source fayerman-source requested review from a team as code owners February 7, 2026 01:05
@gemini-code-assist

Summary of Changes

Hello @fayerman-source, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Gemini CLI by integrating a native voice input capability. This feature aims to streamline user interaction by enabling natural speech-to-text conversion, making the CLI more accessible and efficient. The implementation focuses on a robust, event-driven design to maintain performance and provides clear feedback to the user throughout the voice input process.

Highlights

  • Native Voice Input: Introduces a native voice input feature for the Gemini CLI, allowing users to speak commands and input text directly.
  • Whisper Integration: Leverages OpenAI Whisper (or compatible alternatives like faster-whisper, whisper-cli) for accurate speech-to-text transcription.
  • Flexible Activation: Users can toggle voice recording using keyboard shortcuts (Alt+R or Ctrl+Q) or the new /voice slash command.
  • Dynamic Language Detection: The system automatically detects the spoken language, removing the need for hardcoded language settings.
  • User Feedback: Provides clear visual indicators (🎤 for recording, ⏳ for transcribing) to inform the user of the voice input status.
  • Optimized Architecture: Implements an event-based architecture for transcript delivery, utilizing EventEmitter to prevent unnecessary React re-renders and ensure UI responsiveness.
  • Robustness: Includes race condition protection (isTogglingRef) and process guards to ensure stable recording and transcription operations.
  • Configurable Backend: Supports various Whisper backends and allows users to specify a custom whisperPath in settings for advanced configurations.

Changelog

  • docs/cli/keyboard-shortcuts.md
    • Added a new 'Voice Input' section detailing the Alt + R and Ctrl + Q keybindings for toggling voice recording.
  • docs/cli/settings.md
    • Introduced a new 'Voice' section with the voice.whisperPath setting, allowing users to specify the path to their Whisper executable.
  • docs/get-started/configuration.md
    • Added documentation for the new voice.whisperPath setting under the voice category.
  • package-lock.json
    • Updated several dependency entries to include "peer": true.
  • packages/cli/src/config/keyBindings.ts
    • Defined a new Command.VOICE_INPUT enum member.
    • Added default keybindings (Alt+R, Ctrl+Q) for the VOICE_INPUT command.
    • Included 'Voice Input' in the commandCategories for better organization.
  • packages/cli/src/config/settingsSchema.ts
    • Added a new voice object to the SETTINGS_SCHEMA to manage voice input configurations, including the whisperPath property.
  • packages/cli/src/services/BuiltinCommandLoader.ts
    • Imported the new voiceCommand.
    • Registered voiceCommand in the list of built-in slash commands.
  • packages/cli/src/test-utils/render.tsx
    • Imported VoiceContext and VoiceInputReturn for testing purposes.
    • Added mockVoiceReturn to provide a default mock for the voice input hook in tests.
    • Modified renderWithProviders to accept and provide a voice mock to the VoiceContext.Provider.
  • packages/cli/src/ui/AppContainer.tsx
    • Imported VoiceContext and useVoiceInput.
    • Integrated the useVoiceInput hook to manage voice recording state and actions.
    • Updated the UIActionsContext value to include toggleVoice.
    • Added keybinding handling for Command.VOICE_INPUT to trigger voice.toggleRecording().
    • Wrapped the main App component with VoiceContext.Provider to make voice input functionality available throughout the UI.
  • packages/cli/src/ui/commands/types.ts
    • Added toggleVoice: () => void; to the CommandContext['ui'] interface.
  • packages/cli/src/ui/commands/voiceCommand.ts
    • New file: Implemented the /voice slash command, which toggles voice recording and provides help documentation when /voice help is used.
  • packages/cli/src/ui/components/Composer.tsx
    • Added a key="input-prompt" prop to the InputPrompt component.
  • packages/cli/src/ui/components/InputPrompt.tsx
    • Imported useVoiceContext and onVoiceTranscript.
    • Integrated useVoiceContext to access voice input state.
    • Added a useEffect hook to subscribe to onVoiceTranscript events, inserting transcribed text into the buffer.
    • Modified keybinding handling for Escape and Quit to stop voice recording if active.
    • Added keybinding handling for Command.VOICE_INPUT to toggle recording.
    • Updated status text and indicator logic to display voice input states (recording, transcribing, error) and their corresponding icons (🎤, ⏳).
  • packages/cli/src/ui/components/__snapshots__/AskUserDialog.test.tsx.snap
    • Updated snapshots due to changes in the rendering tree, likely related to the VoiceContext integration.
  • packages/cli/src/ui/components/__snapshots__/InputPrompt.test.tsx.snap
    • Updated snapshots due to changes in the rendering tree and the integration of VoiceContext.
  • packages/cli/src/ui/components/__snapshots__/RewindViewer.test.tsx.snap
    • Updated snapshots due to changes in the rendering tree.
  • packages/cli/src/ui/components/shared/__snapshots__/BaseSelectionList.test.tsx.snap
    • Updated snapshots due to changes in the rendering tree.
  • packages/cli/src/ui/contexts/VoiceContext.test.tsx
    • New file: Added unit tests for VoiceContext to ensure it provides the correct voice input state.
  • packages/cli/src/ui/contexts/VoiceContext.tsx
    • New file: Created VoiceContext and useVoiceContext for managing and accessing voice input state, and re-exported onVoiceTranscript.
  • packages/cli/src/ui/hooks/slashCommandProcessor.test.tsx
    • Updated the mock ui object in tests to include the new toggleVoice function.
  • packages/cli/src/ui/hooks/slashCommandProcessor.ts
    • Added toggleVoice: () => void; to the SlashCommandProcessorActions interface.
    • Included actions.toggleVoice in the CommandContext['ui'] object passed to slash commands.
  • packages/cli/src/ui/hooks/useVoiceInput.log-volume.test.ts
    • New file: Added tests to verify that useVoiceInput remains silent (no visible logs) during normal operations and when processing noisy sox stderr output.
  • packages/cli/src/ui/hooks/useVoiceInput.replication.test.tsx
    • New file: Added a replication test to confirm that the event-based transcript delivery works as intended without causing excessive React re-renders in consumer components.
  • packages/cli/src/ui/hooks/useVoiceInput.stress.test.ts
    • New file: Added stress tests to ensure the useVoiceInput hook handles rapid sox progress logs and multiple toggleRecording calls efficiently without excessive re-renders.
  • packages/cli/src/ui/hooks/useVoiceInput.test.ts
    • New file: Comprehensive unit tests for useVoiceInput, covering initialization, starting/stopping recording, sox/arecord fallback, error handling, and event-based transcript emission.
  • packages/cli/src/ui/hooks/useVoiceInput.ts
    • New file: Implemented the core useVoiceInput hook, managing recording processes (sox/arecord), temporary audio file creation, Whisper transcription, and event-based transcript emission to prevent React re-render loops. Includes state management for recording/transcribing status and error handling.
  • packages/cli/src/ui/keyMatchers.test.ts
    • Updated tests for Command.PASTE_CLIPBOARD to include Cmd + V and Alt + V.
    • Added tests for the new Command.VOICE_INPUT keybindings (Alt + R).
  • packages/cli/src/ui/noninteractive/nonInteractiveUi.ts
    • Added a no-op toggleVoice function to the non-interactive UI mock.
  • schemas/settings.schema.json
    • Added the voice property to the root of the settings schema, defining its structure and the whisperPath sub-property.

Activity

  • The author, fayerman-source, initiated this pull request to introduce a significant new feature: native voice input.
  • A detailed summary, demo video, feature list, and technical implementation overview were provided in the PR description.
  • Comprehensive testing was performed, including passing existing tests, adding a new test suite (7 tests), stress tests for rapid toggling, and a clean slate verification.
  • Security and quality considerations were addressed, ensuring no secrets in code, proper temp file cleanup, and adherence to existing code style.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a fantastic new feature for native voice input using Whisper. The implementation is well-structured, leveraging an event-based architecture to prevent UI re-render loops, and includes comprehensive documentation and tests. However, it introduces critical Command Injection vulnerabilities by using child_process.exec with user-controlled settings and unvalidated file paths, specifically in the handling of the user-configurable whisperPath setting. This poses a high security risk, potentially leading to Remote Code Execution (RCE), and must be addressed before merging.

Comment on lines +259 to +312
const runWhisper = async (binary: string, args: string) => {
  // If it looks like an absolute path, verify existence directly
  // This avoids 'which' issues with PATH
  if (binary.startsWith('/') || binary.startsWith('.')) {
    // Check for common configuration error: path with incorrect quotes
    if (binary.includes("'") || binary.includes('"')) {
      const sanitized = binary.replace(/['"]/g, '');
      if (sanitized !== binary) {
        try {
          await access(sanitized);
          if (!sanitizedPathLoggedRef.current) {
            debugLogger.log(
              'useVoiceInput: found sanitized path, using it instead',
              sanitized,
            );
            sanitizedPathLoggedRef.current = true;
          }
          binary = sanitized;
        } catch {
          // Sanitized path also doesn't exist, proceed with original to let it fail or be logged
        }
      }
    }

    try {
      await access(binary);
    } catch {
      // If access fails, we'll fall through to try executing it (or it might fail there)
      // but checking 'which' on an absolute path is redundant/incorrect.
      debugLogger.log(
        'useVoiceInput: explicit path access check failed, but will try execution',
        binary,
      );
    }
  } else {
    // For command names, try to find them first
    try {
      await execAsync(`which ${binary}`);
    } catch {
      // If which fails, we might still try running it if we suspect it's in the PATH but not found by `sh -c which`
      // (e.g. some obscure shell setup). But usually `which` failure is authoritative.
      // However, for user convenience, let's allow proceeding if it's the configured path.
      if (binary === config?.whisperPath) {
        debugLogger.log(
          'useVoiceInput: `which` failed for configured path, but proceeding anyway',
        );
      } else {
        throw new Error(`Command not found: ${binary}`);
      }
    }
  }

  await execAsync(`${binary} ${args}`);
};

security-high high

The runWhisper function uses exec (via execAsync) to execute a command constructed from the user-controlled whisperPath setting. This creates a critical Command Injection vulnerability. The binary variable, derived from the user-configurable voice.whisperPath setting, is directly interpolated into a shell command. An attacker could provide a malicious path like "/path/to/whisper; malicious_command" in the settings to achieve Remote Code Execution (RCE) on the user's machine. The which ${binary} check on line 296 is also vulnerable to injection.

Remediation: To mitigate this, replace child_process.exec (or execAsync) with child_process.spawn and pass arguments as an array. This prevents shell interpretation of user-controlled input. If the whisperPath setting needs to support arguments, parse the string into an executable and an array of arguments safely.

@fayerman-source (author) replied:

Fixed in ce4b23c and 1df335c. All exec() calls have been replaced with execFile() using argument arrays — no shell is spawned, so no shell interpretation of user input. Additionally, validateBinaryPath() rejects any path containing shell metacharacters. The which check also uses execFile('which', [cmd]) now.

Comment on lines +333 to +336
await execAsync(
`whisper-faster "${audioFile}" --model tiny --output_format txt --output_dir "${tempDirRef.current}"`,
);
// Read the transcript file

security-high high

The fallback logic for whisper-faster uses exec with concatenated arguments, including file paths. If the temporary directory path or the filename contains shell metacharacters, it could lead to command injection.

Remediation: Use child_process.spawn with an arguments array to safely execute external commands.

@fayerman-source (author) replied:

Fixed in the same commits. whisper-faster is now invoked via execFileAsync('whisper-faster', [audioFile, '--model', 'tiny', ...]) — arguments passed as an array, no string concatenation.

Comment on lines +342 to +345
await execAsync('which whisper');
await execAsync(
`whisper "${audioFile}" --model tiny --output_format txt --output_dir "${tempDirRef.current}"`,
);

security-high high

The fallback logic for whisper uses exec with concatenated arguments, which is vulnerable to command injection.

Remediation: Use child_process.spawn with an arguments array to safely execute external commands.

@fayerman-source (author) replied:

Same fix — whisper fallback now uses execFileAsync('whisper', [audioFile, '--model', 'tiny', ...]) with an argument array.

@gemini-cli gemini-cli bot added the area/core Issues related to User Interface, OS Support, Core Functionality label Feb 7, 2026
@fayerman-source

🔒 Security Fix Applied

Fixed the command injection vulnerability identified by @gemini-code-assist:

Changes:

  1. Replaced exec() with execFile() - No shell spawned, no shell interpretation
  2. Arguments passed as arrays - Never concatenate user input into command strings
  3. Added validateBinaryPath() - Rejects paths containing shell metacharacters
  4. Removed all exec() usage - Including the which checks, now using execFile('which', [cmd])

Before (Vulnerable):

After (Secure):

The fix follows Node.js security best practices (2024) and prevents RCE even with malicious settings.
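
A minimal sketch of the before/after pattern this comment describes, under stated assumptions: `validateBinaryPath` is named in the PR, but the character set below is illustrative and the PR's actual list is not shown in this thread; `runBinary` is a hypothetical wrapper. `execFile` and `promisify` are standard Node.js APIs.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Illustrative character set; the PR's validateBinaryPath may reject a different list.
const SHELL_METACHARACTERS = /[;&|`$<>(){}'"\n]/;

function validateBinaryPath(binary: string): string {
  if (binary.length === 0 || SHELL_METACHARACTERS.test(binary)) {
    throw new Error(`Invalid binary path: ${binary}`);
  }
  return binary;
}

// Before (vulnerable): await execAsync(`${binary} ${args}`)
//   The whole string is parsed by a shell, so "; other_command" in a setting runs too.
// After: the executable and its arguments stay separate and no shell is involved.
async function runBinary(binary: string, args: string[]): Promise<string> {
  const { stdout } = await execFileAsync(validateBinaryPath(binary), args);
  return stdout;
}
```

Because `execFile` does not spawn a shell by default, an argument such as `a; echo b` reaches the child process as one literal string. The path validation is a second layer rather than the primary defense.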

@fayerman-source fayerman-source force-pushed the feat/voice-input-clean branch 2 times, most recently from 92776cf to 1df335c Compare February 7, 2026 06:05
@gemini-cli gemini-cli bot added priority/p2 Important but can be addressed in a future release. help wanted We will accept PRs from all issues marked as "help wanted". Thanks for your support! labels Feb 7, 2026
@fayerman-source

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant and well-architected voice input feature. The use of an event-based system for transcript delivery is a great choice to prevent performance issues, and the implementation shows good consideration for security and cross-platform support. I've identified two high-severity issues in the core useVoiceInput hook concerning platform portability and a potential race condition that should be addressed to ensure the feature's robustness. Overall, this is a strong contribution.

@fayerman-source fayerman-source force-pushed the feat/voice-input-clean branch 5 times, most recently from 6bd08ba to c6c7329 Compare February 14, 2026 15:43
@fayerman-source

🚀 Voice Input Enhancements & Fixes

I've updated this PR with several improvements to stability, configuration, and developer experience:

🛠 Changes

  • Static Configuration: Added /voice set-path <path> subcommand. This allows users to configure the Whisper path instantly without triggering an 'agentic search' loop in the CLI.
  • Improved Stability: Converted debugLogger.log calls in useVoiceInput.ts to .debug() level. This fixes the log-volume.test.ts failures while keeping the TUI clean for end-users.
  • UX Refinement: Patched the non-interactive CLI runner to correctly handle slash commands and exit immediately if no follow-up prompt is returned.
  • Code Quality: Wrapped case blocks in nonInteractiveCliCommands.ts to resolve ESLint lexical declaration errors.

✅ Validation

  • Benchmarking: Verified that faster-whisper provides a ~5.8x speedup (0.48s vs 2.8s) over standard Whisper on local hardware.
  • Preflight: Successfully ran the full repository preflight suite (npm run preflight), confirming all tests, types, and linting pass.

These changes make the feature more 'production-ready' by aligning with the CLI's existing command patterns.

@fayerman-source

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a native voice input feature using Whisper, which is a great addition to the CLI's accessibility and usability. The implementation is well-structured, using an event-based architecture to prevent performance issues and including comprehensive tests for the new functionality. The code demonstrates good security practices by validating user-provided paths and using execFile to prevent command injection.

I've identified two high-severity issues in the useVoiceInput hook that should be addressed: a resource leak due to temporary directories not being cleaned up, and a portability issue with the command existence check that will likely cause the feature to fail on Windows. My detailed comments provide suggestions for fixing these issues.

void unlink(audioFileRef.current).catch(() => {});
}
if (tempDirRef.current) {
// Clean up temp dir

high

The temporary directory created by mkdtemp for storing audio recordings is not being cleaned up. The useEffect cleanup hook has a // Clean up temp dir comment but no implementation. This will lead to an accumulation of gemini-voice-* directories in the system's temporary folder, causing a resource leak.

While this cleanup hook should be fixed for robustness on unmount, the primary cleanup should happen at the end of the stopRecording function, after the audio and transcript files have been processed.

Please add logic to remove the temporary directory in both places. You'll need to import rm from node:fs/promises.

Suggested change
// Clean up temp dir
void rm(tempDirRef.current, { recursive: true, force: true });
References
  1. Use the node: prefix when importing built-in Node.js modules for consistency across the codebase.
  2. Use asynchronous file system operations (e.g., fs.promises.readFile) instead of synchronous ones (e.g., fs.readFileSync) to avoid blocking the event loop.
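
One way to make the cleanup hard to forget is to tie the directory's lifetime to a helper that removes it in a `finally` block. The sketch below assumes a hypothetical `withTempDir` helper, which is not part of the PR. `mkdtemp`, `rm`, `writeFile`, and `access` are standard `node:fs/promises` functions, and the `gemini-voice-` prefix matches the directory names mentioned in the review.

```typescript
import { access, mkdtemp, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: runs work inside a fresh temp dir and always removes it,
// whether the work resolves or throws.
async function withTempDir<T>(work: (dir: string) => Promise<T>): Promise<T> {
  const dir = await mkdtemp(join(tmpdir(), "gemini-voice-"));
  try {
    return await work(dir);
  } finally {
    // force: true keeps cleanup from throwing if the directory is already gone.
    await rm(dir, { recursive: true, force: true });
  }
}

// Usage: write a placeholder recording, then report whether the directory survived.
async function demoTempDirCleanup(): Promise<boolean> {
  const dir = await withTempDir(async (d) => {
    await writeFile(join(d, "recording.wav"), Buffer.alloc(44));
    return d;
  });
  // access() rejects once the directory has been removed.
  return access(dir).then(
    () => true,
    () => false,
  );
}
```

A React unmount hook cannot await, so a fire-and-forget `void rm(...)` as suggested above would still be needed there. The helper covers the normal transcription path.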

Comment on lines +106 to +111
const commandExists = (cmd: string): Promise<boolean> =>
  new Promise((resolve) => {
    execFile('which', [cmd], (error) => {
      resolve(!error);
    });
  });

high

The commandExists helper function uses the which command to check for the existence of sox and arecord. The which command is specific to Unix-like systems and is not available on Windows by default. This will cause the voice input feature to fail on Windows, as it won't be able to find the necessary recording tools.

To ensure cross-platform compatibility, please use a method that works on all supported operating systems. A simple solution is to use where on Windows and which on other platforms by checking process.platform.

Suggested change
- const commandExists = (cmd: string): Promise<boolean> =>
-   new Promise((resolve) => {
-     execFile('which', [cmd], (error) => {
-       resolve(!error);
-     });
-   });
+ const commandExists = (cmd: string): Promise<boolean> =>
+   new Promise((resolve) => {
+     const checkCmd = process.platform === 'win32' ? 'where' : 'which';
+     execFile(checkCmd, [cmd], (error) => {
+       resolve(!error);
+     });
+   });

@fayerman-source

🛠 Feedback Addressed

I've implemented the high-severity fixes identified in the latest review:

  • Fixed Resource Leak: Temporary directories are now removed with fs.rm (the equivalent of rm -rf), both on unmount and at the end of the normal transcription lifecycle.
  • Improved Portability: Replaced the which dependency with a cross-platform commandExists helper that manually traverses the PATH (and respects PATHEXT on Windows).
  • TS Compliance: Resolved environment variable access issues to satisfy the project's strict TypeScript configuration.
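
A PATH-traversing check of the kind described in the portability bullet might look like the sketch below. It is an assumption-laden illustration, not the PR's code: the optional `env` parameter and the fallback PATHEXT list are choices made here for testability, and on Windows the variable may be spelled `Path` in a plain object even though `process.env` is case-insensitive there.

```typescript
import { constants } from "node:fs";
import { access } from "node:fs/promises";
import { delimiter, join } from "node:path";

// Sketch of a which-free lookup: walk PATH and test each candidate file directly.
async function commandExists(
  cmd: string,
  env: NodeJS.ProcessEnv = process.env,
): Promise<boolean> {
  const dirs = (env["PATH"] ?? "").split(delimiter).filter(Boolean);
  // On Windows, executables are resolved by trying each PATHEXT extension.
  const exts =
    process.platform === "win32"
      ? (env["PATHEXT"] ?? ".EXE;.CMD;.BAT;.COM").split(";").filter(Boolean)
      : [""];
  for (const dir of dirs) {
    for (const ext of exts) {
      try {
        await access(join(dir, cmd + ext), constants.X_OK);
        return true;
      } catch {
        // Not in this directory; keep looking.
      }
    }
  }
  return false;
}
```

Compared with shelling out to `which` or `where`, this avoids spawning a process and behaves the same way on every platform Node.js supports.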

All preflight checks are passing locally. @gemini-code-assist

@gemini-code-assist

Thank you for the detailed update, @fayerman-source! I appreciate you addressing the feedback from the previous review.

The implemented changes, especially the security fix for command injection by using execFile with argument arrays and the validateBinaryPath function, are crucial and significantly improve the robustness and security of the voice input feature. The rigorous cleanup of temporary directories and the improved cross-platform commandExists helper also demonstrate a strong commitment to quality and maintainability.

These updates look great and contribute significantly to making the voice input feature production-ready. Excellent work!

@fayerman-source commented Feb 19, 2026

@jacob314 @skeshive What's the game plan for the voice input feature? Happy to iterate on this together to get this into production. Cheers!

@jacob314 jacob314 reopened this Feb 19, 2026
@fayerman-source

@jacob314 @scidomino Checking in on this. Is there anything else we need before merging?

Comment on lines +95 to +99
this.recordingProcess.stdout?.on('data', (chunk: Buffer) => {
if (!this.isStopping) {
this.audioChunks.push(chunk);
}
});
@mrpmohiburrahman commented:

I think the final audio chunk may be getting dropped because of this guard. Here's why — in the stop() method, this.isStopping is set to true at line 163 synchronously, before SIGTERM is sent at line 177 and before the event loop is yielded at line 182 (await closePromise). When sox/arecord receives SIGTERM, it flushes its remaining internal buffer to stdout — but by the time that final data event reaches the listener, isStopping is already true, so the chunk is silently discarded.

Suggested fix

Remove the !this.isStopping guard from the data listener at line 96:

this.recordingProcess.stdout?.on('data', (chunk: Buffer) => {
  this.audioChunks.push(chunk);
});

This is safe because:

  • In stop(), await closePromise at line 182 waits for the 'close' event. Node.js guarantees all 'data' events fire before 'close', so by the time we reach Buffer.concat(this.audioChunks) at line 184 the array already has every chunk including the final flush.
  • In cancel(), this.audioChunks = [] at line 148 wipes the array anyway, so any stray chunks after SIGTERM are harmless.

Let me know what you think.
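
The ordering argument in this comment can be checked in isolation with a small helper that collects stdout without any stopping guard and resolves only on the child process's 'close' event. `collectStdout` is a hypothetical name used for illustration; the PR's backend class is structured differently.

```typescript
import { spawn } from "node:child_process";

// Collects every stdout chunk and resolves only after 'close', which the
// child process emits once its stdio streams have ended.
function collectStdout(command: string, args: string[]): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: ["ignore", "pipe", "ignore"] });
    const chunks: Buffer[] = [];
    // No isStopping guard: late chunks flushed during shutdown are kept.
    child.stdout.on("data", (chunk: Buffer) => {
      chunks.push(chunk);
    });
    child.on("error", reject);
    child.on("close", () => resolve(Buffer.concat(chunks)));
  });
}
```

Waiting on 'close' rather than 'exit' matters here, since 'exit' can fire while stdout data is still buffered.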

@jacob314

The PR currently has a bunch of files that should not be included in it. Please take a look. I'll review once the diff is back down to the appropriate size rather than 10K+ lines due to files that should not be included.

@fayerman-source

@jacob314 Good deal. Got the accidental artifacts removed, so the diff should be back down now. The most recent comment from @mrpmohiburrahman about the final audio chunk being dropped was on point too. Love the community spirit. Got that addressed and added a focused regression test. Thanks a bunch, all.

@fayerman-source fayerman-source force-pushed the feat/voice-input-clean branch from 79d3c44 to c8ffca7 Compare April 1, 2026 22:22
# Conflicts:
#	packages/cli/src/nonInteractiveCli.ts
#	packages/cli/src/ui/AppContainer.tsx
#	packages/cli/src/ui/components/InputPrompt.tsx
#	packages/cli/src/ui/components/__snapshots__/InputPrompt.test.tsx.snap
#	packages/core/src/index.ts

Labels

area/core Issues related to User Interface, OS Support, Core Functionality help wanted We will accept PRs from all issues marked as "help wanted". Thanks for your support! priority/p2 Important but can be addressed in a future release.

Development

Successfully merging this pull request may close these issues.

Feature Proposal: Unified Native Voice Input Architecture (Local-First)

8 participants