feat(shell): enable interactive commands with virtual terminal#6694
feat(shell): enable interactive commands with virtual terminal#6694
Conversation
This change introduces a pseudo-terminal (pty) for the `run_shell_command` tool, enabling support for interactive shell commands. Key changes: - Replaces the previous raw stream handling in `ShellExecutionService` with a headless xterm.js terminal. This allows for proper interpretation of terminal escape codes, cursor movements, and screen clearing, providing a more accurate representation of the terminal's state. - Adds `writeToShell` and `resizeShell` methods to `GeminiClient`, allowing clients to send input and resize the virtual terminal. - The `CoreToolScheduler` now manages terminal size and PID updates for shell tools. - The `isBinary` utility is now more sophisticated, better distinguishing between binary files and rich TTY output with ANSI color codes. - Removes the prompt instruction that discouraged the use of interactive commands, as they are now supported. This provides a foundation for a richer, interactive terminal experience within the Gemini CLI.
There was a problem hiding this comment.
Summary of Changes
Hello @galz10, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the Gemini CLI by introducing support for interactive shell commands. By integrating a pseudo-terminal, the CLI can now properly render complex terminal user interfaces (TUIs) from tools like vim or top, and allow real-time user interaction. This foundational change opens up a much richer and more dynamic terminal experience, moving beyond simple command execution to full interactive control.
Highlights
- Interactive Shell Commands: The CLI now supports interactive shell commands like
vim,top, andgit rebase -iby integrating a pseudo-terminal (pty) usingxterm.jsandnode-pty. This allows for correct rendering of terminal UIs and real-time updates. - Enhanced Communication: Two-way communication is enabled between the client and the running shell process through new
writeToShellandresizeShellmethods, allowing user input (keystrokes) and terminal resizing to be sent back to the process. - Smarter Output Handling: The
isBinaryutility has been improved to accurately distinguish between rich TTY output (with ANSI color codes) and actual binary streams, ensuring colorful command-line tool output is preserved as text. - Model Prompt Update: A previous system prompt instruction that discouraged the model from using interactive commands has been removed, fully enabling this new capability.
- New Keybinding for Shell Focus: A new keybinding,
Ctrl+T, has been introduced to toggle input focus between the main prompt and an active interactive shell. The previousCtrl+Tbinding for tool descriptions has been remapped toCtrl+I.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Code Review
This pull request is an impressive piece of work that enables interactive shell commands by integrating a pseudo-terminal. The architectural shift to using a headless xterm.js terminal backed by node-pty is a great move and unlocks a much richer user experience. The changes are extensive but well-organized across the UI, hooks, and core services. The improved isBinary utility is also a notable enhancement. I've found one potential issue regarding the handling of terminal resizing for new interactive sessions. Overall, this is a fantastic feature addition.
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
jacob314
left a comment
There was a problem hiding this comment.
Super excited about this feature. Hopefully Gemini CLI can help address most of these comments. The most important ones are the ones to derisk landing this change.
Other UI issues noticed. Need to make the regular InputPrompt show up as unfocused when focus switches to the terminal. Also curious why the cursor inside the embedded Gemini CLI input prompt doesn't work. Potentially we aren't notifying the sub terminal that it has focus.
Need to align on whether we use a flickering cursor. Either we should use it everywhere or nowhere.
We should discuss the keyboard shortcuts for this. Potentially we should use tab to toggle focus. Can you work with @miguelsolorio to figure out reasonable user interactions for that.
We should show focus by changing the outline color of the shell command area box. We've meant to do that for InputPrompt as well but never got around to it. @miguelsolorio should have ideas on colors to use. I'd suggest the shell has the yellow outline when focused.
|
|
||
| const render = () => { | ||
| renderTimeout = null; | ||
| const newStrippedOutput = getFullText(headlessTerminal); |
There was a problem hiding this comment.
I would expect the headlessTerminal has some apis to notify of changes so we don't have to poll.
This commit refactors how terminal dimensions are managed within the application. Previously, terminal width and height were passed down through multiple components, leading to prop drilling. To simplify this, terminal dimensions are now stored in the `Config` object. This allows any part of the application to access the terminal size directly from the config, eliminating the need to pass it as a prop. Other changes in this commit include: - Updating key bindings for toggling tool descriptions and shell input focus. - Removing the `pidUpdateHandler` as it is no longer used. - Simplifying the `isBinary` utility function. - Adding a guideline to the system prompt regarding interactive shell commands.
The logic for determining the `activeShellPtyId` has been moved from the `App` component into the `useShellCommandProcessor` hook. Previously, the `App` component was responsible for iterating through pending history items to find an executing shell command and derive the active pty ID. This approach was indirect and placed state management logic in a higher-level component. This change centralizes the state management within the `useShellCommandProcessor` hook, which is directly responsible for shell command execution and has immediate access to the process ID. The hook now manages the `activeShellPtyId` state and exposes it to its consumers. This refactor simplifies the `App` component, improves separation of concerns, and allows for more direct testing of the active shell state management.
This refactor removes the useEffect from App.tsx that was responsible for managing the shell input focus. The logic has been moved into the useShellCommandProcessor hook, which now directly calls setShellInputFocused(false) when a shell command's lifecycle ends (on success, error, or cancellation). Additionally, useGeminiStream now handles resetting the focus when a user cancels an ongoing request. This change simplifies the App component and makes the state management more direct and robust. Unit tests for both hooks have been updated to verify the new behavior.
Refactors the terminal shell interface to display a blinking cursor at its actual position within the
terminal buffer, rather than at a static input prompt. This provides a more authentic and intuitive user
experience, accurately reflecting the state of the underlying shell, which is critical for interactive
applications like vim or htop.
Key changes:
- `ShellExecutionService`: The executeWithPty method now extracts cursor coordinates (x, y) from the
node-pty instance and includes them in the data output events. The ShellOutputEvent type has been updated
to support this optional cursor payload.
- `useShell` Hook: The hook now manages the cursor's position in its state, updating it based on the data
received from the execution service.
- `TerminalOutput` Component: A new component has been created to render the terminal output and overlay
the cursor at the correct absolute position. It handles the logic for displaying the character under the
cursor with inverse styling to create a classic block cursor effect.
- `Shell` Screen: The main screen now manages the cursor's blinking visibility (isCursorVisible state and
the blinking interval) and integrates the new TerminalOutput component.
- `ShellInputPrompt` Cleanup: The now-redundant cursor rendering and blinking logic has been removed from
the input prompt component, simplifying it to a pure input handler.
Removes the `setTimeout`-based render scheduling in `ShellExecutionService` in favor of a direct-rendering approach. This change eliminates the potential for a 17ms delay in terminal output by calling the `render` function directly upon cursor movement and after data is written to the pseudo-terminal. This reduces latency and improves the responsiveness of the terminal output stream.
Removes the blinking effect from the cursor in the shell command output. The cursor visibility is no longer toggled on an interval. Instead, it remains consistently visible when the shell input is focused, matching the behavior of the main input prompt. This improves visual consistency and provides a more stable editing experience. The `isCursorVisible` prop has been removed from `TerminalOutput` and its parent components, simplifying the rendering logic.
Removes the unused `config` prop from `ShellInputPrompt` and its usage in `ToolMessage`. This simplifies the component's interface and removes an unnecessary dependency.
packages/core/src/config/config.ts
Outdated
| trustedFolder?: boolean; | ||
| shouldUseNodePtyShell?: boolean; | ||
| skipNextSpeakerCheck?: boolean; | ||
| terminalWidth?: number; |
There was a problem hiding this comment.
can we move the height and width to be a new ShellExecutionConfig sub object similar to SandboxConfig.
That way it is clear this this is only something for spinning up shell execution not something people who are generally interested in the terminal width+height should use. Also makes it clear that clients might set this something arbitrary based on their interest in how big to by default make terminals they will render.
| - **Parallelism:** Execute multiple independent tool calls in parallel when feasible (i.e. searching the codebase). | ||
| - **Command Execution:** Use the '${ShellTool.Name}' tool for running shell commands, remembering the safety rule to explain modifying commands first. | ||
| - **Background Processes:** Use background processes (via \`&\`) for commands that are unlikely to stop on their own, e.g. \`node server.js &\`. If unsure, ask the user. | ||
| - **Interactive Commands:** Try to avoid shell commands that are likely to require user interaction (e.g. \`git rebase -i\`). Use non-interactive versions of commands (e.g. \`npm init -y\` instead of \`npm init\`) when available, and otherwise remind the user that interactive shell commands are not supported and may cause hangs until canceled by the user. |
There was a problem hiding this comment.
Fyi @anj-s for the prompt changes here. Would want to make sure what we do here doesn't regress quality.
I'd suggest we try to make this a bit more terse and a bit more centered on saying what to do rather than what to avoid.
There was a problem hiding this comment.
I reverted the prompt changes to what they are at head, so there shouldn't be any regressions due to the prompt.
| }); | ||
|
|
||
| it('should call onPid with the process id', async () => { | ||
| const onPid = vi.fn(); |
There was a problem hiding this comment.
this test seems misleading with onPid removed.
| expect(result.aborted).toBe(false); | ||
| expect(result.output).toBe('file1.txt\na warning'); | ||
| expect(handle.pid).toBe(12345); | ||
| expect(handle.pid).toBe(undefined); |
There was a problem hiding this comment.
why did this expectation change?
There was a problem hiding this comment.
Same as bellow, we use the pid to focus on the terminal so for child_process we don't need to define the pid.
| error, | ||
| aborted: abortSignal.aborted, | ||
| pid: child.pid, | ||
| pid: undefined, |
There was a problem hiding this comment.
why don't we fallback to child.pid anymore?
There was a problem hiding this comment.
For child_process we don't care about the pid, since this is being used to focus the terminal onto the process which shouldn't happen for child_process
packages/cli/src/ui/App.tsx
Outdated
| const [showEscapePrompt, setShowEscapePrompt] = useState(false); | ||
| const [isProcessing, setIsProcessing] = useState<boolean>(false); | ||
| const [shellInputFocused, setShellInputFocused] = useState(false); | ||
| const [cursorPosition, setCursorPosition] = useState<{ |
There was a problem hiding this comment.
can you keep this login in the shell rather than in App.tsx to keep App.tsx small
packages/cli/src/ui/App.tsx
Outdated
| refreshStatic(); | ||
| }, 300); | ||
|
|
||
| config.setTerminalHeight(terminalHeight*0.5); |
There was a problem hiding this comment.
add constants for the shellExecutionTerminalHeight and width to avoid the repetition of this logic. would like to follow up these prs my changing it to match the actual height and width available rather than setting to 0.5 which is a lot more narrow than would be ideal.
|
Size Change: +24.4 kB (+0.19%) Total Size: 13.2 MB
ℹ️ View Unchanged
|
|
|
||
| let promise: Promise<ToolResult>; | ||
| if (invocation instanceof ShellToolInvocation) { | ||
| const setPidCallback = (pid: number) => { |
There was a problem hiding this comment.
special casing this for ShellToolInvocation is a bit ugly. would be ideal if there is a simpler way but won't block the pr on that. perhaps add a todo.
There was a problem hiding this comment.
Added a todo, i'll follow up on this i think the correct implementation would require some refactors.
| negative: [createKey('return'), createKey('space')], | ||
| }, | ||
| { | ||
| command: Command.TOGGLE_SHELL_INPUT_FOCUS, |
There was a problem hiding this comment.
Fyi @miguelsolorio would be good to get your thoughts on what keyboard shortcut we should use to toggle focus between components in the main GEMINI CLI UI. right now it is just the shell input vs regular input but in the future I could see toggling focus back to review the list of historic messages in the UI.
Other possible followup is automatically moving the focus to the shell when it is active and moving it back.
There was a problem hiding this comment.
I like the idea of moving focus, but I'd also assume using things like Tab allow you to cycle through the options?

TLDR
This pull request enables interactive shell commands by integrating a pseudo-terminal (pty) into the
run_shell_commandtool. This allows Gemini to run commands likevim,top, orgit rebase -iand correctly render their UI, a capability that was previously unsupported.Dive Deeper
The core of this change is the replacement of the previous raw stream handling in
ShellExecutionServicewith a headlessxterm.jsterminal backed bynode-pty. This architectural shift provides several key benefits:writeToShellandresizeShellmethods on theGeminiClient. This creates a two-way communication channel, allowing the client to send user input (keystrokes) and terminal resize events back to the running process.isBinaryutility has been made more sophisticated. The previous implementation would often misclassify rich TTY output with ANSI color codes as a binary stream. The new logic is more discerning, ensuring that colorful command-line tool output is preserved as text.These changes provide the foundational support for a much richer, interactive terminal experience within Gemini.
Reviewer Test Plan
To validate these changes, pull down the branch and run the CLI with prompts that invoke interactive or TUI-based applications.
Basic Interactivity:
ng new.Resizing:
Testing Matrix
Linked issues / bugs