feat(core): refactor shell execution to use node-pty #6088
Conversation
This commit refactors the ShellExecutionService to use node-pty and @xterm/headless instead of the native
child_process.spawn. This change provides a more robust and accurate shell emulation, addressing several
issues with the previous implementation.
Key changes:
- Replaced child_process.spawn with pty.spawn to leverage pseudo-terminals for command execution.
- Integrated @xterm/headless to correctly parse terminal output, strip ANSI escape codes, and handle complex
terminal behavior.
- Unified stdout and stderr into a single, ordered output stream, which simplifies output handling and more
closely mirrors a real terminal experience.
- Implemented a promise-based serialization queue for processing data chunks to prevent race conditions
between text and binary output detection.
- The ShellTool and UI hooks have been updated to pass terminal dimensions down to the execution service,
allowing for correctly formatted output from terminal-aware applications.
This new architecture resolves race conditions, correctly handles binary vs. text data streams, and provides
a much cleaner and more reliable foundation for shell command execution.
There was a problem hiding this comment.
Summary of Changes
Hello @galz10, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly refactors the underlying shell execution mechanism by migrating from Node.js's built-in child_process to node-pty. This change improves the reliability and control over spawned shell commands. Additionally, it integrates @xterm/headless to provide advanced terminal output processing, such as automatic ANSI escape code stripping, ensuring cleaner and more readable output. The system now also passes terminal dimensions to shell commands, enabling better adaptation to the display environment. These updates streamline output handling by consolidating stdout and stderr into a single stream, and the necessary changes have been applied across the codebase to support this new architecture.
Highlights
- Migration to node-pty: The core shell execution mechanism has been refactored to utilize
node-ptyinstead of the nativechild_processmodule. This change provides more robust control over spawned processes and their I/O. - Enhanced Terminal Output Processing: The
@xterm/headlesslibrary has been integrated to handle terminal output processing. This includes automatic stripping of ANSI escape codes, ensuring that raw terminal output is cleanly presented as plain text. - Terminal Dimension Awareness: The shell execution service now accepts and utilizes terminal dimensions (columns and rows). This allows for more accurate and responsive rendering of shell output, especially for interactive commands or those that adapt to terminal size.
- Consolidated Output Streams: The internal handling of shell output has been simplified. Instead of separate
stdoutandstderrstreams, all textual output is now consolidated into a singleoutputstream, streamlining data processing and display. - System-Wide Integration: Changes have been propagated across various modules, including
esbuildconfigurations, CLI hooks, core tool schedulers, and test suites, to ensure compatibility and proper functioning with the new shell execution architecture.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Code Review
This pull request refactors the shell execution logic to use node-pty instead of child_process, which is a great improvement for emulating a real terminal environment. The changes are mostly well-executed, with updates to dependencies, build configurations, and tests to support this new approach.
However, I've found a critical issue in the implementation of live output streaming. The current logic incorrectly assumes that terminal output is always append-only, which will cause garbled live displays for any command that manipulates the screen, such as those with progress bars or full-screen interfaces. I've provided a set of related comments with suggestions to fix this by sending the full terminal state on each update and having the consumers replace their buffers accordingly.
Once this issue is addressed, the PR will be in excellent shape.
This commit hardens the ShellExecutionService to prevent crashes from synchronous errors during process spawning and incorporates feedback from a prior code review. Previously, if pty.spawn threw an exception (e.g., if the underlying shell executable was not found), the error would not be caught, leading to an unhandled promise rejection. The pty.spawn call is now wrapped in a try...catch block. If a synchronous error occurs, the service returns a resolved promise with a ShellExecutionResult that captures the error, ensuring graceful failure. An accompanying test case has been added to validate this new error handling path, and an obsolete test for the old child_process 'error' event has been removed. Minor refactoring was also applied, including removing an unused variable and moving a helper function to the module scope.
f140dde to
c53a330
Compare
ea1308e to
6a9caca
Compare
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
olcan
left a comment
There was a problem hiding this comment.
I'm approving this but please see my feedback, mostly about adding some comments and watching out for certain things like the switch to mixed output in model context, and new dependencies (e.g. on python for node-pty) that likely require robust testing across platforms + environments (incl. sandboxes) + terminals + commands, including commands that push terminal features e.g. various editors, gemini-cli itself, other clis, etc. This would be less for the model but more for human dev running these things via shell mode (which shares the shell execution service w/ shell tool iiuc). I would also test robust termination of subprocess hierarchies given the switch in termination logic.
packages/core/package.json
Outdated
| "undici": "^7.10.0", | ||
| "ws": "^8.18.0" | ||
| "ws": "^8.18.0", | ||
| "node-pty": "1.0.0", |
There was a problem hiding this comment.
One concern I had with node-pty before, when I was experimenting with interactivity, was that it had a dependency on python, which felt too heavy at the time. Is that still the case, and are we good with it now given the benefits?
There was a problem hiding this comment.
It does depend on python but node-pty has installers and stuff to make sure it is installed correctly on all os's, i think in order to support this feature + better ansi key handling, node-pty is need.
One more thing that came to mind is the background PID detection via pgrep, see around gemini-cli/packages/core/src/tools/shell.ts Lines 304 to 305 in f81ccd8 I would also make sure that gemini-cli/packages/core/src/tools/shell.ts Lines 341 to 342 in f81ccd8 kill -pgid and this is indeed the most robust way for the model to kill a background process group.
|
2f01a1d to
3fc6ecb
Compare
3fc6ecb to
d4a69d5
Compare
c18a9ca to
2954282
Compare
2954282 to
4d3b018
Compare
664c63b to
307e696
Compare
This is an empty merge commit to maintain parity with upstream structure. All changes have already been cherry-picked: - feat(shell): Include disallowed commands in block reason (#6278) - Fix license notice generation script (#6272) - IDE integration multi-folder support (#6265, #6177) - Show /ide enable & /ide disable commands (#6248) - Update versioning script (#6075) - Add eslint as recommended extension (#6196) - Add recommended extensions list (#5810) - Generate NOTICES.TXT and surface via command (#5310) - Stylize diff line numbers & characters (#6269) - Various other improvements and fixes Note: feat(core): refactor shell execution to use node-pty (#6088) was cherry-picked but later reverted upstream in #6328. We kept it as it works well with llxprt. Maintains llxprt's multi-provider support, branding, ServerToolsProvider, and authentication differences while staying in sync with upstream improvements.

TLDR
This pull request replaces the standard
child_process.spawnwithnode-ptyfor executing shell commands. This change introduces a pseudo-terminal (pty) that allows for a much more accurate and feature-rich terminal emulation within the Gemini CLI. The primary benefit is the correct handling of complex ANSI escape codes, colors, and cursor movements, which were previously either stripped or not handled correctly.Dive Deeper
The core of this change is in the
ShellExecutionService. Instead of capturingstdoutandstderrstreams fromchild_process, we now spawn a pseudo-terminal usingnode-pty. The output of this pty is then piped into a headless terminal instance from@xterm/headless. This allows us to interpret the raw terminal output stream and get the "rendered" text content, correctly processing complex terminal interactions like cursor movements, screen clearing, and advanced color formatting that a simplestrip-ansicall cannot handle.To ensure the pseudo-terminal behaves as closely as possible to the user's actual terminal, the terminal dimensions (columns and rows) are now detected in the UI layer using a new
useTerminalSizehook. These dimensions are then passed down through the tool scheduling layers (useReactToolScheduler,CoreToolScheduler) all the way to theShellExecutionServicewhen a command is executed.Since a pty provides a single, unified output stream (just like a real terminal), the distinction between
stdoutandstderrat the stream handling level has been removed inshellCommandProcessorandShellExecutionService, simplifying the output processing logic.Reviewer Test Plan
npm installto fetch the newnode-ptyand@xterm/headlessdependencies. Then, runnpm run build.ls -laecho "hello world"npm run preflightnpm installin a project with many dependencies.npx jest --color.sleep 10) and cancel it withCtrl+C. Verify that the process is terminated correctly.Testing Matrix
Linked issues / bugs