feat: Add voice input with pluggable backend (Gemini zero-install + Whisper) by fayerman-source · Pull Request #18499 · google-gemini/gemini-cli

fayerman-source · 2026-02-07T01:05:06Z

Summary

This PR adds native voice input to Gemini CLI with a pluggable backend architecture:

gemini as the default zero-install transcription backend
optional local whisper support for users who want a local binary workflow

This addresses the request for a zero-install option while keeping a local backend available for users who prefer it.

Update: Gemini Backend Added (addresses @jacob314's feedback)

Per @jacob314's comment requesting a zero-install option using the Gemini API, this PR adds a GeminiRestBackend that:

Records audio locally via sox or arecord
Converts raw PCM to WAV in-memory
Transcribes via generateContent using the user's existing Gemini auth
Works with existing OAuth or API key authentication

Note: This uses generateContent with audio rather than the Live API streaming approach. The Live API is designed for real-time bidirectional sessions; generateContent is simpler and sufficient for push-to-talk transcription.
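As a rough illustration of the push-to-talk approach described in the note, the sketch below builds a request body for a single recording. It assumes the public Gemini `generateContent` request shape, where audio travels as a base64 `inlineData` part. The helper name `buildTranscriptionRequest` and the prompt wording are hypothetical and are not taken from this PR.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical types describing only the fields this sketch uses.
interface TextPart {
  text: string;
}
interface InlineDataPart {
  inlineData: { mimeType: string; data: string };
}
interface GenerateContentBody {
  contents: Array<{ role: 'user'; parts: Array<TextPart | InlineDataPart> }>;
}

// Hypothetical helper: wraps one WAV recording in a generateContent body.
// The prompt text is an assumption, not the PR's actual prompt.
function buildTranscriptionRequest(wav: Uint8Array): GenerateContentBody {
  return {
    contents: [
      {
        role: 'user',
        parts: [
          { text: 'Transcribe this audio verbatim. Return only the transcript.' },
          {
            inlineData: {
              mimeType: 'audio/wav',
              // Inline audio is sent as base64 text.
              data: Buffer.from(wav).toString('base64'),
            },
          },
        ],
      },
    ],
  };
}
```

Because the whole recording is sent in one request after the user stops talking, no streaming session has to be kept open, which is the trade-off the note describes.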

Features

Feature	Description
Voice Recording	Press `Space Space` on an empty prompt to start or stop recording
Zero-install backend	Gemini transcription via existing auth with no Python or model downloads
Pluggable backends	`voice.provider: "gemini"` (default) or `"whisper"` (local)
Auto Language Detection	Gemini and Whisper auto-detect spoken language
Visual Feedback	`●` while recording, spinner while transcribing
Silence Filtering	`voice.silenceThreshold` drops silent recordings before transcription
Slash commands	`/voice`, `/voice enable`, `/voice disable`, `/voice provider [gemini

Settings

{
  "voice": {
    "enabled": true,                  // default: false
    "provider": "gemini",             // default: "gemini", or "whisper"
    "silenceThreshold": 80,           // default: 80, 0 disables silence filtering
    "whisperPath": "/usr/bin/whisper" // optional, only used for whisper
  }
}
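The defaults in the comments above can be expressed as a small resolver. This is a minimal sketch, assuming user settings are merged over defaults; `VoiceSettings`, `VOICE_DEFAULTS`, and `resolveVoiceSettings` are hypothetical names, and falling back to the default for an invalid threshold is an assumption rather than documented behavior.

```typescript
// Hypothetical shape mirroring the "voice" settings block above.
interface VoiceSettings {
  enabled: boolean;
  provider: 'gemini' | 'whisper';
  silenceThreshold: number;
  whisperPath?: string;
}

// Defaults taken from the comments in the settings example.
const VOICE_DEFAULTS: VoiceSettings = {
  enabled: false,
  provider: 'gemini',
  silenceThreshold: 80,
};

// Hypothetical helper: merges user settings over the defaults.
// Assumption: a negative or non-finite threshold falls back to the default.
function resolveVoiceSettings(user: Partial<VoiceSettings> = {}): VoiceSettings {
  const merged: VoiceSettings = { ...VOICE_DEFAULTS, ...user };
  if (!Number.isFinite(merged.silenceThreshold) || merged.silenceThreshold < 0) {
    merged.silenceThreshold = VOICE_DEFAULTS.silenceThreshold;
  }
  return merged;
}
```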

Architecture

useVoiceInput (hook)
  ├── GeminiRestBackend   ← default: sox/arecord → WAV buffer → generateContent
  └── LocalWhisperBackend ← optional: sox/arecord → WAV file → whisper binary
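One way to picture the pluggable layout is a shared interface plus a registry keyed by provider name. The sketch below is illustrative only: `VoiceBackend`, its method names, `selectBackend`, and `FakeBackend` are hypothetical and may not match the PR's actual classes.

```typescript
// Hypothetical contract that both backends could satisfy.
interface VoiceBackend {
  start(): Promise<void>;
  stop(): Promise<string>; // resolves with the transcript
  cancel(): void;
}

type BackendName = 'gemini' | 'whisper';
type BackendFactory = () => VoiceBackend;

// Picks a backend by name; the hook only ever talks to the interface.
function selectBackend(
  name: BackendName,
  registry: Record<BackendName, BackendFactory>,
): VoiceBackend {
  return registry[name]();
}

// Stand-in backend used only to exercise the selector in this sketch.
class FakeBackend implements VoiceBackend {
  private readonly transcript: string;
  constructor(transcript: string) {
    this.transcript = transcript;
  }
  async start(): Promise<void> {}
  async stop(): Promise<string> {
    return this.transcript;
  }
  cancel(): void {}
}
```

With this shape, adding a third provider would mean adding one registry entry, without touching the hook.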

Key Design Decisions

Decision	Rationale
Event-based transcript delivery	Uses `coreEvents` to avoid React re-render cascades
In-memory Gemini path	Keeps Gemini transcription off disk
Existing auth reuse	Uses `config.getContentGenerator()` for OAuth or API key flows
Toggle race protection	Uses `isTogglingRef` to prevent overlapping toggles
Silence filtering	Applies `voice.silenceThreshold` before transcription work is done
Clear recording feedback	Shows `●` while recording and a spinner while transcribing
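Two of the decisions above, event-based delivery and toggle race protection, can be sketched without React. This is a simplified illustration: the PR routes transcripts through `coreEvents` and guards toggles with `isTogglingRef`, whereas the event name, `voiceEvents`, `onTranscript`, and `createToggleGuard` below are hypothetical stand-ins.

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical event name and emitter standing in for the shared event bus.
const TRANSCRIPT_EVENT = 'voice-transcript';
const voiceEvents = new EventEmitter();

// The producer emits text instead of storing it in React state.
function emitTranscript(text: string): void {
  voiceEvents.emit(TRANSCRIPT_EVENT, text);
}

// Consumers subscribe and receive an unsubscribe function,
// which fits naturally into a useEffect cleanup.
function onTranscript(listener: (text: string) => void): () => void {
  voiceEvents.on(TRANSCRIPT_EVENT, listener);
  return () => {
    voiceEvents.off(TRANSCRIPT_EVENT, listener);
  };
}

// Toggle guard: ignores calls that arrive while a toggle is still in flight.
function createToggleGuard(toggle: () => Promise<void>): () => Promise<boolean> {
  let busy = false;
  return async () => {
    if (busy) return false; // overlapping call is dropped
    busy = true;
    try {
      await toggle();
      return true;
    } finally {
      busy = false;
    }
  };
}
```

Since the transcript never passes through component state, only the subscriber that inserts text at the cursor does any work when a transcript arrives.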

Audio Pipeline

Gemini Backend

Step	Behavior
1	Record raw PCM via `sox` or `arecord`
2	Concatenate PCM chunks in memory and prepend a WAV header
3	Send `audio/wav` to `generateContent`
4	Emit transcript text through the event bus and insert it at the cursor
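Step 2 of the Gemini pipeline can be sketched as follows. The 44-byte header layout is the standard RIFF/WAVE PCM header; the capture format (16 kHz, mono, 16-bit little-endian) and the helper name `pcmToWav` are assumptions, since the PR's actual recording parameters are not shown in this thread.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical helper: concatenates raw PCM chunks and prepends a WAV header.
// Assumed capture format: 16 kHz, mono, 16-bit signed little-endian PCM.
function pcmToWav(
  chunks: Uint8Array[],
  sampleRate = 16000,
  channels = 1,
  bitsPerSample = 16,
): Buffer {
  const pcm = Buffer.concat(chunks);
  const blockAlign = (channels * bitsPerSample) / 8;
  const header = Buffer.alloc(44);

  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + pcm.length, 4); // file size minus the first 8 bytes
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16); // size of the fmt chunk for PCM
  header.writeUInt16LE(1, 20); // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * blockAlign, 28); // byte rate
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(pcm.length, 40); // size of the sample data

  return Buffer.concat([header, pcm]);
}
```

Everything stays in memory, which matches the "In-memory Gemini path" decision: no temporary file is needed before the audio is sent.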

Whisper Backend

Step	Behavior
1	Record WAV audio via `sox` or `arecord`
2	Drop silent recordings using `voice.silenceThreshold`
3	Invoke the configured Whisper binary
4	Emit transcript text through the event bus and insert it at the cursor
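The silence check in step 2 could look roughly like this. The thread does not say which loudness metric `voice.silenceThreshold` is compared against, so the RMS amplitude of 16-bit samples used here is purely an assumption, as are the helper names. The only behavior taken from the settings section is that a threshold of 0 disables filtering.

```typescript
import { Buffer } from 'node:buffer';

// Root-mean-square amplitude of 16-bit little-endian PCM samples.
function rmsAmplitude(pcm: Buffer): number {
  const sampleCount = Math.floor(pcm.length / 2);
  if (sampleCount === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < sampleCount; i++) {
    const sample = pcm.readInt16LE(i * 2);
    sumSquares += sample * sample;
  }
  return Math.sqrt(sumSquares / sampleCount);
}

// Hypothetical check. Assumption: the threshold is compared against RMS amplitude.
function isSilent(pcm: Buffer, threshold: number): boolean {
  if (threshold <= 0) return false; // 0 disables silence filtering
  return rmsAmplitude(pcm) < threshold;
}
```

Running a check like this before transcription means a silent recording costs no API call or Whisper invocation.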

Commands

Command	Purpose
`/voice`	Show current voice settings
`/voice enable`	Enable voice input
`/voice disable`	Disable voice input
`/voice provider [gemini|whisper]`	Select the transcription backend
`/voice sensitivity <0-1000>`	Set silence filtering
`/voice set-path <path>`	Set the Whisper binary path
`/voice help`	Show voice help
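The command table above maps onto a small argument parser. The sketch below is hypothetical (`parseVoiceCommand` and the `VoiceCommand` type are not from the PR) and assumes that `provider`, `sensitivity`, and `set-path` each require an argument, which the thread does not confirm.

```typescript
// Hypothetical parsed form of everything after "/voice".
type VoiceCommand =
  | { kind: 'status' | 'enable' | 'disable' | 'help' }
  | { kind: 'provider'; provider: 'gemini' | 'whisper' }
  | { kind: 'sensitivity'; value: number }
  | { kind: 'set-path'; path: string }
  | { kind: 'error'; message: string };

function parseVoiceCommand(input: string): VoiceCommand {
  const trimmed = input.trim();
  if (trimmed === '') return { kind: 'status' }; // bare "/voice" shows settings

  // Split into the subcommand and the rest, keeping spaces inside the rest.
  const spaceAt = trimmed.search(/\s/);
  const sub = spaceAt === -1 ? trimmed : trimmed.slice(0, spaceAt);
  const rest = spaceAt === -1 ? '' : trimmed.slice(spaceAt).trim();

  switch (sub) {
    case 'enable':
      return { kind: 'enable' };
    case 'disable':
      return { kind: 'disable' };
    case 'help':
      return { kind: 'help' };
    case 'provider':
      if (rest === 'gemini' || rest === 'whisper') {
        return { kind: 'provider', provider: rest };
      }
      return { kind: 'error', message: 'Usage: /voice provider [gemini|whisper]' };
    case 'sensitivity': {
      // Only plain decimal integers in the documented 0-1000 range are accepted.
      if (/^\d+$/.test(rest) && Number(rest) <= 1000) {
        return { kind: 'sensitivity', value: Number(rest) };
      }
      return { kind: 'error', message: 'Usage: /voice sensitivity <0-1000>' };
    }
    case 'set-path':
      if (rest !== '') return { kind: 'set-path', path: rest };
      return { kind: 'error', message: 'Usage: /voice set-path <path>' };
    default:
      return { kind: 'error', message: `Unknown subcommand: ${sub}` };
  }
}
```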

Validation

Manual Validation

Check	Result
End-to-end voice recording and transcription	✅
`Space Space` start/stop behavior	✅
`Esc` cancels active recording	✅
`gemini` backend	✅
`whisper` backend	✅
Prompt alignment remains stable during idle, recording, and transcribing	✅

Automated Validation

Command	Result
`npm run build -w @google/gemini-cli-core`	✅
`npm run typecheck`	✅
`npm test -w @google/gemini-cli -- src/nonInteractiveCli.test.ts`	✅
`npm test -w @google/gemini-cli-core -- src/services/voice/LocalWhisperBackend.test.ts`	✅
`npm test -w @google/gemini-cli -- --coverage.enabled=false src/nonInteractiveCli.test.ts src/ui/components/InputPrompt.test.tsx`	✅
`npm test -w @google/gemini-cli -- src/ui/hooks/useVoiceInput.test.ts src/ui/hooks/useVoiceInput.replication.test.tsx src/ui/hooks/useVoiceInput.stress.test.ts src/ui/hooks/useVoiceInput.log-volume.test.ts`	✅
`npm run build`	✅

Follow-up Work

Not included in this PR:
Deeper `/settings` TUI integration for voice configuration
Further UI polish around discoverability of sensitivity defaults

Closes #18067

gemini-code-assist · 2026-02-07T01:05:39Z

Summary of Changes

Hello @fayerman-source, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Gemini CLI by integrating a native voice input capability. This feature aims to streamline user interaction by enabling natural speech-to-text conversion, making the CLI more accessible and efficient. The implementation focuses on a robust, event-driven design to maintain performance and provides clear feedback to the user throughout the voice input process.

Highlights

Native Voice Input: Introduces a native voice input feature for the Gemini CLI, allowing users to speak commands and input text directly.
Whisper Integration: Leverages OpenAI Whisper (or compatible alternatives like faster-whisper, whisper-cli) for accurate speech-to-text transcription.
Flexible Activation: Users can toggle voice recording using keyboard shortcuts (Alt+R or Ctrl+Q) or the new /voice slash command.
Dynamic Language Detection: The system automatically detects the spoken language, removing the need for hardcoded language settings.
User Feedback: Provides clear visual indicators (🎤 for recording, ⏳ for transcribing) to inform the user of the voice input status.
Optimized Architecture: Implements an event-based architecture for transcript delivery, utilizing EventEmitter to prevent unnecessary React re-renders and ensure UI responsiveness.
Robustness: Includes race condition protection (isTogglingRef) and process guards to ensure stable recording and transcription operations.
Configurable Backend: Supports various Whisper backends and allows users to specify a custom whisperPath in settings for advanced configurations.

Changelog

docs/cli/keyboard-shortcuts.md
- Added a new 'Voice Input' section detailing the Alt + R and Ctrl + Q keybindings for toggling voice recording.
docs/cli/settings.md
- Introduced a new 'Voice' section with the voice.whisperPath setting, allowing users to specify the path to their Whisper executable.
docs/get-started/configuration.md
- Added documentation for the new voice.whisperPath setting under the voice category.
package-lock.json
- Updated several dependency entries to include "peer": true.
packages/cli/src/config/keyBindings.ts
- Defined a new Command.VOICE_INPUT enum member.
- Added default keybindings (Alt+R, Ctrl+Q) for the VOICE_INPUT command.
- Included 'Voice Input' in the commandCategories for better organization.
packages/cli/src/config/settingsSchema.ts
- Added a new voice object to the SETTINGS_SCHEMA to manage voice input configurations, including the whisperPath property.
packages/cli/src/services/BuiltinCommandLoader.ts
- Imported the new voiceCommand.
- Registered voiceCommand in the list of built-in slash commands.
packages/cli/src/test-utils/render.tsx
- Imported VoiceContext and VoiceInputReturn for testing purposes.
- Added mockVoiceReturn to provide a default mock for the voice input hook in tests.
- Modified renderWithProviders to accept and provide a voice mock to the VoiceContext.Provider.
packages/cli/src/ui/AppContainer.tsx
- Imported VoiceContext and useVoiceInput.
- Integrated the useVoiceInput hook to manage voice recording state and actions.
- Updated the UIActionsContext value to include toggleVoice.
- Added keybinding handling for Command.VOICE_INPUT to trigger voice.toggleRecording().
- Wrapped the main App component with VoiceContext.Provider to make voice input functionality available throughout the UI.
packages/cli/src/ui/commands/types.ts
- Added toggleVoice: () => void; to the CommandContext['ui'] interface.
packages/cli/src/ui/commands/voiceCommand.ts
- New file: Implemented the /voice slash command, which toggles voice recording and provides help documentation when /voice help is used.
packages/cli/src/ui/components/Composer.tsx
- Added a key="input-prompt" prop to the InputPrompt component.
packages/cli/src/ui/components/InputPrompt.tsx
- Imported useVoiceContext and onVoiceTranscript.
- Integrated useVoiceContext to access voice input state.
- Added a useEffect hook to subscribe to onVoiceTranscript events, inserting transcribed text into the buffer.
- Modified keybinding handling for Escape and Quit to stop voice recording if active.
- Added keybinding handling for Command.VOICE_INPUT to toggle recording.
- Updated status text and indicator logic to display voice input states (recording, transcribing, error) and their corresponding icons (🎤, ⏳).
packages/cli/src/ui/components/__snapshots__/AskUserDialog.test.tsx.snap
- Updated snapshots due to changes in the rendering tree, likely related to the VoiceContext integration.
packages/cli/src/ui/components/__snapshots__/InputPrompt.test.tsx.snap
- Updated snapshots due to changes in the rendering tree and the integration of VoiceContext.
packages/cli/src/ui/components/__snapshots__/RewindViewer.test.tsx.snap
- Updated snapshots due to changes in the rendering tree.
packages/cli/src/ui/components/shared/__snapshots__/BaseSelectionList.test.tsx.snap
- Updated snapshots due to changes in the rendering tree.
packages/cli/src/ui/contexts/VoiceContext.test.tsx
- New file: Added unit tests for VoiceContext to ensure it provides the correct voice input state.
packages/cli/src/ui/contexts/VoiceContext.tsx
- New file: Created VoiceContext and useVoiceContext for managing and accessing voice input state, and re-exported onVoiceTranscript.
packages/cli/src/ui/hooks/slashCommandProcessor.test.tsx
- Updated the mock ui object in tests to include the new toggleVoice function.
packages/cli/src/ui/hooks/slashCommandProcessor.ts
- Added toggleVoice: () => void; to the SlashCommandProcessorActions interface.
- Included actions.toggleVoice in the CommandContext['ui'] object passed to slash commands.
packages/cli/src/ui/hooks/useVoiceInput.log-volume.test.ts
- New file: Added tests to verify that useVoiceInput remains silent (no visible logs) during normal operations and when processing noisy sox stderr output.
packages/cli/src/ui/hooks/useVoiceInput.replication.test.tsx
- New file: Added a replication test to confirm that the event-based transcript delivery works as intended without causing excessive React re-renders in consumer components.
packages/cli/src/ui/hooks/useVoiceInput.stress.test.ts
- New file: Added stress tests to ensure the useVoiceInput hook handles rapid sox progress logs and multiple toggleRecording calls efficiently without excessive re-renders.
packages/cli/src/ui/hooks/useVoiceInput.test.ts
- New file: Comprehensive unit tests for useVoiceInput, covering initialization, starting/stopping recording, sox/arecord fallback, error handling, and event-based transcript emission.
packages/cli/src/ui/hooks/useVoiceInput.ts
- New file: Implemented the core useVoiceInput hook, managing recording processes (sox/arecord), temporary audio file creation, Whisper transcription, and event-based transcript emission to prevent React re-render loops. Includes state management for recording/transcribing status and error handling.
packages/cli/src/ui/keyMatchers.test.ts
- Updated tests for Command.PASTE_CLIPBOARD to include Cmd + V and Alt + V.
- Added tests for the new Command.VOICE_INPUT keybindings (Alt + R).
packages/cli/src/ui/noninteractive/nonInteractiveUi.ts
- Added a no-op toggleVoice function to the non-interactive UI mock.
schemas/settings.schema.json
- Added the voice property to the root of the settings schema, defining its structure and the whisperPath sub-property.

Activity

The author, fayerman-source, initiated this pull request to introduce a significant new feature: native voice input.
A detailed summary, demo video, feature list, and technical implementation overview were provided in the PR description.
Comprehensive testing was performed, including passing existing tests, adding a new test suite (7 tests), stress tests for rapid toggling, and a clean slate verification.
Security and quality considerations were addressed, ensuring no secrets in code, proper temp file cleanup, and adherence to existing code style.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a fantastic new feature for native voice input using Whisper. The implementation is well-structured, leveraging an event-based architecture to prevent UI re-render loops, and includes comprehensive documentation and tests. However, it introduces critical Command Injection vulnerabilities by using child_process.exec with user-controlled settings and unvalidated file paths, specifically in the handling of the user-configurable whisperPath setting. This poses a high security risk, potentially leading to Remote Code Execution (RCE), and must be addressed before merging.

gemini-code-assist · 2026-02-07T01:09:24Z

packages/cli/src/ui/hooks/useVoiceInput.ts

+      const runWhisper = async (binary: string, args: string) => {
+        // If it looks like an absolute path, verify existence directly
+        // This avoids 'which' issues with PATH
+        if (binary.startsWith('/') || binary.startsWith('.')) {
+          // Check for common configuration error: path with incorrect quotes
+          if (binary.includes("'") || binary.includes('"')) {
+            const sanitized = binary.replace(/['"]/g, '');
+            if (sanitized !== binary) {
+              try {
+                await access(sanitized);
+                if (!sanitizedPathLoggedRef.current) {
+                  debugLogger.log(
+                    'useVoiceInput: found sanitized path, using it instead',
+                    sanitized,
+                  );
+                  sanitizedPathLoggedRef.current = true;
+                }
+                binary = sanitized;
+              } catch {
+                // Sanitized path also doesn't exist, proceed with original to let it fail or be logged
+              }
+            }
+          }
+
+          try {
+            await access(binary);
+          } catch {
+            // If access fails, we'll fall through to try executing it (or it might fail there)
+            // but checking 'which' on an absolute path is redundant/incorrect.
+            debugLogger.log(
+              'useVoiceInput: explicit path access check failed, but will try execution',
+              binary,
+            );
+          }
+        } else {
+          // For command names, try to find them first
+          try {
+            await execAsync(`which ${binary}`);
+          } catch {
+            // If which fails, we might still try running it if we suspect it's in the PATH but not found by `sh -c which`
+            // (e.g. some obscure shell setup). But usually `which` failure is authoritative.
+            // However, for user convenience, let's allow proceeding if it's the configured path.
+            if (binary === config?.whisperPath) {
+              debugLogger.log(
+                'useVoiceInput: `which` failed for configured path, but proceeding anyway',
+              );
+            } else {
+              throw new Error(`Command not found: ${binary}`);
+            }
+          }
+        }
+
+        await execAsync(`${binary} ${args}`);
+      };


The runWhisper function uses exec (via execAsync) to execute a command constructed from the user-controlled whisperPath setting. This creates a critical Command Injection vulnerability. The binary variable, derived from the user-configurable voice.whisperPath setting, is directly interpolated into a shell command. An attacker could provide a malicious path like "/path/to/whisper; malicious_command" in the settings to achieve Remote Code Execution (RCE) on the user's machine. The which ${binary} check on line 296 is also vulnerable to injection.

Remediation: To mitigate this, replace child_process.exec (or execAsync) with child_process.spawn and pass arguments as an array. This prevents shell interpretation of user-controlled input. If the whisperPath setting needs to support arguments, parse the string into an executable and an array of arguments safely.

Fixed in ce4b23c and 1df335c. All exec() calls have been replaced with execFile() using argument arrays — no shell is spawned, so no shell interpretation of user input. Additionally, validateBinaryPath() rejects any path containing shell metacharacters. The which check also uses execFile('which', [cmd]) now.
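To make the difference concrete, here is a minimal sketch of running a binary through `execFile` with an argument array. `execFile` and `promisify` are real Node.js APIs; the wrapper `runBinary` is a hypothetical name, and the demo launches the current Node executable rather than Whisper so that it runs anywhere.

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Hypothetical wrapper: runs a binary with an argument array.
// No shell is involved, so characters such as ";" inside an
// argument reach the child process as literal text.
async function runBinary(binary: string, args: string[]): Promise<string> {
  const { stdout } = await execFileAsync(binary, args);
  return stdout;
}

// Demo: the child echoes back its first argument unchanged.
// With exec() and string interpolation, the "; echo injected" part
// would have been interpreted by the shell as a second command.
async function demoNoShellInterpretation(): Promise<string> {
  return runBinary(process.execPath, [
    '-e',
    'process.stdout.write(process.argv[1])',
    'a.wav; echo injected',
  ]);
}
```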

gemini-code-assist · 2026-02-07T01:09:24Z

packages/cli/src/ui/hooks/useVoiceInput.ts

+            await execAsync(
+              `whisper-faster "${audioFile}" --model tiny --output_format txt --output_dir "${tempDirRef.current}"`,
+            );
+            // Read the transcript file


The fallback logic for whisper-faster uses exec with concatenated arguments, including file paths. If the temporary directory path or the filename contains shell metacharacters, it could lead to command injection.

Remediation: Use child_process.spawn with an arguments array to safely execute external commands.

Fixed in the same commits. whisper-faster is now invoked via execFileAsync('whisper-faster', [audioFile, '--model', 'tiny', ...]) — arguments passed as an array, no string concatenation.

gemini-code-assist · 2026-02-07T01:09:24Z

packages/cli/src/ui/hooks/useVoiceInput.ts

+              await execAsync('which whisper');
+              await execAsync(
+                `whisper "${audioFile}" --model tiny --output_format txt --output_dir "${tempDirRef.current}"`,
+              );


The fallback logic for whisper uses exec with concatenated arguments, which is vulnerable to command injection.

Remediation: Use child_process.spawn with an arguments array to safely execute external commands.

Same fix — whisper fallback now uses execFileAsync('whisper', [audioFile, '--model', 'tiny', ...]) with an argument array.

fayerman-source · 2026-02-07T01:59:29Z

🔒 Security Fix Applied

Fixed the command injection vulnerability identified by @gemini-code-assist:

Changes:

Replaced exec() with execFile() - No shell spawned, no shell interpretation
Arguments passed as arrays - Never concatenate user input into command strings
Added validateBinaryPath() - Rejects paths with shell metacharacters
Removed all exec() usage - Including the which checks, now using execFile('which', [cmd])

Before (Vulnerable):

After (Secure):

The fix follows Node.js security best practices (2024) and prevents RCE even with malicious settings.
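The thread names a `validateBinaryPath()` check but does not show its body. The sketch below is one possible shape for such a check, not the PR's implementation; in particular, the set of rejected characters is an assumption.

```typescript
// Assumed character set; the PR's actual list is not shown in this thread.
const SHELL_METACHARACTERS = /[;&|`$<>(){}\n\r'"]/;

// Hypothetical sketch of a path check in the spirit of validateBinaryPath().
// Returns the trimmed path, or throws if it looks unsafe.
function validateBinaryPath(candidate: string): string {
  const trimmed = candidate.trim();
  if (trimmed === '') {
    throw new Error('Binary path is empty');
  }
  if (SHELL_METACHARACTERS.test(trimmed)) {
    throw new Error(
      `Binary path contains shell metacharacters: ${JSON.stringify(candidate)}`,
    );
  }
  return trimmed;
}
```

A check like this is defense in depth: with `execFile` and argument arrays no shell sees the path anyway, but rejecting suspicious values early gives the user a clear configuration error.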

fayerman-source · 2026-02-07T14:32:18Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a significant and well-architected voice input feature. The use of an event-based system for transcript delivery is a great choice to prevent performance issues, and the implementation shows good consideration for security and cross-platform support. I've identified two high-severity issues in the core useVoiceInput hook concerning platform portability and a potential race condition that should be addressed to ensure the feature's robustness. Overall, this is a strong contribution.

packages/cli/src/ui/hooks/useVoiceInput.ts

fayerman-source · 2026-02-14T18:37:05Z

🚀 Voice Input Enhancements & Fixes

I've updated this PR with several improvements to stability, configuration, and developer experience:

🛠 Changes

Static Configuration: Added /voice set-path <path> subcommand. This allows users to configure the Whisper path instantly without triggering an 'agentic search' loop in the CLI.
Improved Stability: Converted debugLogger.log calls in useVoiceInput.ts to .debug() level. This fixes the log-volume.test.ts failures while keeping the TUI clean for end-users.
UX Refinement: Patched the non-interactive CLI runner to correctly handle slash commands and exit immediately if no follow-up prompt is returned.
Code Quality: Wrapped case blocks in nonInteractiveCliCommands.ts to resolve ESLint lexical declaration errors.

✅ Validation

Benchmarking: Verified that faster-whisper provides a ~5.8x speedup (0.48s vs 2.8s) over standard Whisper on local hardware.
Preflight: Successfully ran the full repository preflight suite (npm run preflight), confirming all tests, types, and linting pass.

These changes make the feature more 'production-ready' by aligning with the CLI's existing command patterns.

fayerman-source · 2026-02-14T18:45:55Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a native voice input feature using Whisper, which is a great addition to the CLI's accessibility and usability. The implementation is well-structured, using an event-based architecture to prevent performance issues and including comprehensive tests for the new functionality. The code demonstrates good security practices by validating user-provided paths and using execFile to prevent command injection.

I've identified two high-severity issues in the useVoiceInput hook that should be addressed: a resource leak due to temporary directories not being cleaned up, and a portability issue with the command existence check that will likely cause the feature to fail on Windows. My detailed comments provide suggestions for fixing these issues.

gemini-code-assist · 2026-02-14T18:48:30Z

packages/cli/src/ui/hooks/useVoiceInput.ts

+        void unlink(audioFileRef.current).catch(() => {});
+      }
+      if (tempDirRef.current) {
+        // Clean up temp dir


The temporary directory created by mkdtemp for storing audio recordings is not being cleaned up. The useEffect cleanup hook has a // Clean up temp dir comment but no implementation. This will lead to an accumulation of gemini-voice-* directories in the system's temporary folder, causing a resource leak.

While this cleanup hook should be fixed for robustness on unmount, the primary cleanup should happen at the end of the stopRecording function, after the audio and transcript files have been processed.

Please add logic to remove the temporary directory in both places. You'll need to import rm from node:fs/promises.

Suggested change

        // Clean up temp dir
        void rm(tempDirRef.current, { recursive: true, force: true });

References

Use the node: prefix when importing built-in Node.js modules for consistency across the codebase.

Use asynchronous file system operations (e.g., fs.promises.readFile) instead of synchronous ones (e.g., fs.readFileSync) to avoid blocking the event loop.
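The cleanup requested here can be made hard to forget by scoping the directory to a helper. This is a minimal sketch using the real `mkdtemp` and `rm` APIs from `node:fs/promises`; the helper name `withVoiceTempDir` is hypothetical, and the `gemini-voice-` prefix is taken from the review comment above.

```typescript
import { mkdtemp, rm } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical helper: creates a gemini-voice-* temp directory, runs the
// given work inside it, and always removes the directory afterwards,
// whether the work succeeds or throws.
async function withVoiceTempDir<T>(work: (dir: string) => Promise<T>): Promise<T> {
  const dir = await mkdtemp(join(tmpdir(), 'gemini-voice-'));
  try {
    return await work(dir);
  } finally {
    // recursive + force: remove contents and ignore "already gone" errors.
    await rm(dir, { recursive: true, force: true });
  }
}
```

The unmount path still needs its own cleanup, as the review notes, because a component can unmount while a recording is in progress and before this `finally` block runs.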

gemini-code-assist · 2026-02-14T18:48:30Z

packages/cli/src/ui/hooks/useVoiceInput.ts

+      const commandExists = (cmd: string): Promise<boolean> =>
+        new Promise((resolve) => {
+          execFile('which', [cmd], (error) => {
+            resolve(!error);
+          });
+        });


The commandExists helper function uses the which command to check for the existence of sox and arecord. The which command is specific to Unix-like systems and is not available on Windows by default. This will cause the voice input feature to fail on Windows, as it won't be able to find the necessary recording tools.

To ensure cross-platform compatibility, please use a method that works on all supported operating systems. A simple solution is to use where on Windows and which on other platforms by checking process.platform.

Suggested change

-      const commandExists = (cmd: string): Promise<boolean> =>
-        new Promise((resolve) => {
-          execFile('which', [cmd], (error) => {
-            resolve(!error);
-          });
-        });
+      const commandExists = (cmd: string): Promise<boolean> =>
+        new Promise((resolve) => {
+          const checkCmd = process.platform === 'win32' ? 'where' : 'which';
+          execFile(checkCmd, [cmd], (error) => {
+            resolve(!error);
+          });
+        });

fayerman-source · 2026-02-14T19:48:25Z

🛠 Feedback Addressed

I've implemented the high-severity fixes identified in the latest review:

Fixed Resource Leak: Temporary directories are now cleaned up with fs.rm (the equivalent of rm -rf) both on unmount and at the end of the normal transcription lifecycle.
Improved Portability: Replaced the which dependency with a cross-platform commandExists helper that manually traverses the PATH (and respects PATHEXT on Windows).
TS Compliance: Resolved environment variable access issues to satisfy the project's strict TypeScript configuration.

All preflight checks are passing locally. @gemini-code-assist
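A PATH-walking `commandExists` along the lines described above might look like the following. This is a sketch under assumptions, not the PR's code: the fallback PATHEXT list and the optional `env` and `platform` parameters (added to make the function testable) are hypothetical.

```typescript
import { constants } from 'node:fs';
import { access } from 'node:fs/promises';
import { delimiter, join } from 'node:path';

// Hypothetical sketch: looks for an executable by walking PATH directly,
// so neither `which` nor `where` has to exist on the machine.
async function commandExists(
  cmd: string,
  env: NodeJS.ProcessEnv = process.env,
  platform: string = process.platform,
): Promise<boolean> {
  const isWindows = platform === 'win32';
  const dirs = (env['PATH'] ?? '').split(delimiter).filter(Boolean);
  // On Windows, try the bare name (it may already include an extension)
  // and then each PATHEXT suffix. The fallback list is an assumption.
  const exts = isWindows
    ? ['', ...(env['PATHEXT'] ?? '.COM;.EXE;.BAT;.CMD').split(';').filter(Boolean)]
    : [''];
  // Windows has no execute bit, so only existence is checked there.
  const mode = isWindows ? constants.F_OK : constants.X_OK;

  for (const dir of dirs) {
    for (const ext of exts) {
      try {
        await access(join(dir, cmd + ext), mode);
        return true;
      } catch {
        // Not found here; keep looking.
      }
    }
  }
  return false;
}
```

The bracket-style `env['PATH']` access is used because it also compiles under stricter TypeScript settings for index signatures.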

gemini-code-assist · 2026-02-14T19:48:44Z

Thank you for the detailed update, @fayerman-source! I appreciate you addressing the feedback from the previous review.

The implemented changes, especially the security fix for command injection by using execFile with argument arrays and the validateBinaryPath function, are crucial and significantly improve the robustness and security of the voice input feature. The rigorous cleanup of temporary directories and the improved cross-platform commandExists helper also demonstrate a strong commitment to quality and maintainability.

These updates look great and contribute significantly to making the voice input feature production-ready. Excellent work!

fayerman-source · 2026-02-19T16:49:22Z

@jacob314 @skeshive What's the game plan for the voice input feature? Happy to iterate on this together to get this into production. Cheers!


fayerman-source · 2026-03-21T16:37:55Z

@jacob314 @scidomino Checking in on this. Is there anything else we need before merging?

mrpmohiburrahman · 2026-03-25T15:03:01Z

packages/core/src/services/voice/GeminiRestBackend.ts

+      this.recordingProcess.stdout?.on('data', (chunk: Buffer) => {
+        if (!this.isStopping) {
+          this.audioChunks.push(chunk);
+        }
+      });


I think the final audio chunk may be getting dropped because of this guard. Here's why — in the stop() method, this.isStopping is set to true at line 163 synchronously, before SIGTERM is sent at line 177 and before the event loop is yielded at line 182 (await closePromise). When sox/arecord receives SIGTERM, it flushes its remaining internal buffer to stdout — but by the time that final data event reaches the listener, isStopping is already true, so the chunk is silently discarded.

Suggested fix

Remove the !this.isStopping guard from the data listener at line 96:

this.recordingProcess.stdout?.on('data', (chunk: Buffer) => { this.audioChunks.push(chunk); });

This is safe because:

In stop(), await closePromise at line 182 waits for the 'close' event. Node.js guarantees all 'data' events fire before 'close', so by the time we reach Buffer.concat(this.audioChunks) at line 184 the array already has every chunk including the final flush.

In cancel(), this.audioChunks = [] at line 148 wipes the array anyway, so any stray chunks after SIGTERM are harmless.

Let me know what you think.
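The ordering argument above can be illustrated with a collector that has no stopping guard and simply waits for `'close'`. This is a simplified sketch, not the code in `GeminiRestBackend.ts`; `collectStdout` is a hypothetical helper, and it spawns the current Node executable instead of sox or arecord.

```typescript
import { spawn } from 'node:child_process';

// Hypothetical helper: gathers every stdout chunk until the child's
// 'close' event. There is no isStopping-style guard on the data
// listener, so output flushed just before exit is kept.
function collectStdout(command: string, args: string[]): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];
    const child = spawn(command, args);

    child.stdout.on('data', (chunk: Buffer) => {
      chunks.push(chunk);
    });
    child.on('error', reject);
    // 'close' fires after the process has ended and its stdio streams
    // have been closed, so the chunk list is complete at this point.
    child.on('close', () => {
      resolve(Buffer.concat(chunks));
    });
  });
}
```

A cancel path can still discard audio by clearing the chunk array, as the comment notes, so the guard is not needed for correctness there either.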

jacob314 · 2026-03-25T16:38:53Z

The PR currently has a bunch of files that should not be included in it. Please take a look. I'll review once the diff is back down to the appropriate size rather than 10K+ lines due to files that should not be included.

fayerman-source · 2026-03-25T17:24:35Z

@jacob314 Good deal. Got the accidental artifacts removed, so the diff should be back down now. The most recent comment from @mrpmohiburrahman about the final audio chunk being dropped was on point too. Love the community spirit. Got that addressed and added a focused regression test. Thanks a bunch, all.


fayerman-source requested review from a team as code owners February 7, 2026 01:05

gemini-code-assist bot reviewed Feb 7, 2026


gemini-cli bot added the area/core Issues related to User Interface, OS Support, Core Functionality label Feb 7, 2026

fayerman-source force-pushed the feat/voice-input-clean branch 2 times, most recently from 92776cf to 1df335c Compare February 7, 2026 06:05

gemini-cli bot added priority/p2 Important but can be addressed in a future release. help wanted We will accept PRs from all issues marked as "help wanted". Thanks for your support! labels Feb 7, 2026

gemini-code-assist bot reviewed Feb 7, 2026


fayerman-source force-pushed the feat/voice-input-clean branch 5 times, most recently from 6bd08ba to c6c7329 Compare February 14, 2026 15:43

gemini-code-assist bot reviewed Feb 14, 2026


This was referenced Feb 19, 2026

Audio input instead of typing #1982

Open

Enhance Gemini CLI with voice interaction capabilities. #13798

Closed

jacob314 mentioned this pull request Feb 19, 2026

Deflake windows tests. #19511

Merged

jacob314 closed this in #19511 Feb 19, 2026

fayerman-source deleted the feat/voice-input-clean branch February 19, 2026 16:34

jacob314 reopened this Feb 19, 2026

fayerman-source added 4 commits March 19, 2026 19:52

fix(cli): resolve FinishReason build error and missing bundle dependencies

3078d2a

Merge branch 'main' into feat/voice-input-clean

f00d05c

docs: regenerate settings documentation and schema

1fbc052

Merge branch 'main' into feat/voice-input-clean

615ce40

fayerman-source added 6 commits March 21, 2026 17:25

Merge branch 'main' into feat/voice-input-clean

fc5aff7

Merge branch 'main' into feat/voice-input-clean

2f89ac2

Merge branch 'main' into feat/voice-input-clean

414674f

fix(cli): update voice tests for async renderHook

d2ed487

Merge branch 'main' into feat/voice-input-clean

c87dc06

Merge upstream/main and resolve conflicts

52b8352

mrpmohiburrahman reviewed Mar 25, 2026


fayerman-source added 2 commits March 25, 2026 13:10

chore(pr): drop accidental local artifacts from voice diff

77b5dbe

fix(voice): preserve final audio chunk on stop

94739e1

fayerman-source added 10 commits March 25, 2026 13:25

Merge branch 'main' into feat/voice-input-clean

015d28e

chore(pr): trim unrelated voice diff noise

a2f98f5

fix(ci): resolve voice input lint and test drift

25b4781

Merge branch 'main' into feat/voice-input-clean

0fab703

Merge branch 'main' into feat/voice-input-clean

644d4dc

Merge branch 'main' into feat/voice-input-clean

b3305c5

Merge remote-tracking branch 'upstream/main' into feat/voice-input-clean

e9a3156

# Conflicts: # packages/cli/src/ui/components/InputPrompt.tsx

Merge branch 'main' into feat/voice-input-clean

70a4a4c

Merge remote-tracking branch 'upstream/main' into feat/voice-input-clean

1b28c60

# Conflicts: # packages/cli/src/ui/components/InputPrompt.tsx

Merge upstream/main and resolve conflicts

c8ffca7

fayerman-source force-pushed the feat/voice-input-clean branch from 79d3c44 to c8ffca7 Compare April 1, 2026 22:22

fayerman-source added 3 commits April 1, 2026 19:27

fix(cli): convert dynamic devtoolsService imports to static for robustness

a05117f

Merge branch 'main' into feat/voice-input-clean

d77d640
