feat(cli): implement visual validation framework and TTY smoke tests #22461
Open
mattKorwel wants to merge 7 commits into main from mk-ux-validation
7 commits
- 5833b84 feat(cli): implement visual validation framework and TTY smoke tests (mattKorwel)
- 117d3e2 feat(cli): add welcome message and exercise TDD with TTY smoke tests (mattKorwel)
- 55a138a docs(cli): add TDD welcome message example to visual validation guide (mattKorwel)
- 6880bc2 chore: remove core fixes from validation PR (mattKorwel)
- 1a4fbbf chore: move policy visual tests to core fixes PR (mattKorwel)
- f4f4c5f docs(cli): focus visual validation guide on exposure of issues (mattKorwel)
- a85a2ea Merge branch 'main' into mk-ux-validation (mattKorwel)
@@ -0,0 +1,138 @@
# Visual validation and TTY testing

Gemini CLI uses a multi-layered approach to validate its user interface (UI) and
ensure the CLI boots correctly in real terminal environments. This document
explains the tools and techniques used for visual regression and bootstrap
testing.

## Overview

While standard integration tests focus on logic and file system operations,
visual validation ensures that the terminal output looks correct to the user. We
use two primary methods for this:

1. **TTY Bootstrap Smoke Tests:** Spawns the actual built binary in a real
   pseudo-terminal (PTY) to verify startup and basic interactivity.
2. **Visual Regression (SVG Snapshots):** Renders integrated UI flows inside a
   virtual terminal and compares the output against committed "golden" SVG
   baselines.

## TTY bootstrap smoke tests

These tests validate that the Gemini CLI binary can successfully initialize and
render its Ink-based UI in a real terminal environment. They catch issues like
missing dependencies, broken startup sequences, or TTY-specific crashes.

These tests are located in `packages/cli/integration-tests/`.

### Running TTY tests

To run the bootstrap smoke test, use the following command:

```bash
npm test -w @google/gemini-cli -- integration-tests/bootstrap.test.ts
```

### How it works

The test utility `runInteractive` (found in `@google/gemini-cli-test-utils`)
uses `node-pty` to spawn the CLI. It provides a programmable interface to wait
for specific text markers and send simulated user input.

```typescript
const run = await runInteractive();
const readyMarker = 'Type your message or @path/to/file';
await run.expectText(readyMarker, 30000); // Wait for the main prompt
await run.kill();
```
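
To illustrate the idea behind waiting for a text marker, here is a minimal, self-contained sketch. It is not the implementation in `@google/gemini-cli-test-utils`: `OutputWatcher`, `feed`, and the ANSI-stripping regex are hypothetical names and choices made for illustration, and no real PTY is spawned. The chunks passed to `feed` stand in for what a PTY data callback would deliver.

```typescript
// Hypothetical sketch: accumulates terminal output and resolves once a marker appears.
// Matches ANSI CSI escape sequences (colors, cursor movement).
const ANSI_CSI = /\x1b\[[0-9;?]*[ -\/]*[@-~]/g;

type Waiter = { marker: string; done: () => void };

class OutputWatcher {
  private raw = '';
  private waiters: Waiter[] = [];

  // Plain text seen so far. Stripping happens on the full buffer so that an
  // escape sequence split across two chunks is still removed.
  get text(): string {
    return this.raw.replace(ANSI_CSI, '');
  }

  // Feed one chunk of raw terminal output and wake any satisfied waiters.
  feed(chunk: string): void {
    this.raw += chunk;
    const text = this.text;
    this.waiters = this.waiters.filter((w) => {
      if (!text.includes(w.marker)) return true;
      w.done();
      return false;
    });
  }

  // Resolve when `marker` has been seen, or reject after `timeoutMs`.
  expectText(marker: string, timeoutMs: number): Promise<void> {
    if (this.text.includes(marker)) return Promise.resolve();
    return new Promise<void>((resolve, reject) => {
      const waiter: Waiter = {
        marker,
        done: () => {
          clearTimeout(timer);
          resolve();
        },
      };
      const timer = setTimeout(() => {
        this.waiters = this.waiters.filter((w) => w !== waiter);
        reject(new Error(`Timed out after ${timeoutMs}ms waiting for "${marker}"`));
      }, timeoutMs);
      this.waiters.push(waiter);
    });
  }
}

// Usage: the marker arrives in two styled chunks.
const watcher = new OutputWatcher();
const ready = watcher.expectText('Type your message', 1000);
watcher.feed('\x1b[1mType your ');
watcher.feed('message\x1b[0m or @path/to/file');
ready.then(() => console.log('marker seen'));
```

Keeping the raw buffer and stripping escape codes on read is a deliberate choice in this sketch: terminal UIs interleave styling with text, so a plain substring match on raw output could miss a marker that is visibly present.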

### TDD Example: Adding a Welcome Message

To add a new visual feature like a "Welcome to Gemini CLI!" message:

1. **Write the failing test:** Update `bootstrap.test.ts` to expect the new
   string.
   ```typescript
   const welcomeMessage = 'Welcome to Gemini CLI!';
   await run.expectText(welcomeMessage, 30000);
   ```
2. **Verify failure:** Run `npm test` and observe the TTY rig reporting the
   missing text.
3. **Implement the feature:** Add the message to `AppHeader.tsx`.
4. **Verify success:** Rebuild the binary (`npm run bundle`) and run the test
   again to see it pass.

## Visual regression with SVG snapshots

To automate the verification of complex UI layouts (like tables, progress bars,
or policy warnings), we use **SVG Snapshots**. This approach captures colors,
spacing, and text formatting in a deterministic way.

These tests are located in `packages/cli/src/ui/` and use the `AppRig` utility.

### Running visual tests

To run the visual validation suite, use the following command:

```bash
npm test -w @google/gemini-cli -- src/ui/PolicyVisual.test.tsx
```

### Updating snapshots

If you intentionally change the UI, the visual tests will fail because the
actual output no longer matches the saved snapshot. To "bless" your changes and
update the snapshots, run the tests with the update flag:

```bash
npm test -w @google/gemini-cli -- src/ui/PolicyVisual.test.tsx -u
```

After updating, you must review the resulting `.snap.svg` files in the
`__snapshots__` directory to ensure they look as intended.
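
The compare-or-update cycle behind a golden snapshot can be pictured with a small sketch. This is not how the project's snapshot matcher is implemented: `checkGolden`, its `update` parameter, and the `example.snap.svg` file name are hypothetical, and the real tests rely on the test runner's `-u` flag rather than a hand-written helper. The sketch writes only to a fresh temporary directory.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { dirname, join } from 'node:path';

type SnapshotResult = 'written' | 'matched' | 'mismatch';

// Hypothetical helper: compare rendered SVG against a committed baseline,
// or (re)write the baseline when none exists or an update is requested.
function checkGolden(goldenPath: string, actualSvg: string, update: boolean): SnapshotResult {
  // Normalize line endings so the comparison is stable across platforms.
  const actual = actualSvg.replace(/\r\n/g, '\n');
  if (update || !existsSync(goldenPath)) {
    mkdirSync(dirname(goldenPath), { recursive: true });
    writeFileSync(goldenPath, actual, 'utf8');
    return 'written';
  }
  const golden = readFileSync(goldenPath, 'utf8').replace(/\r\n/g, '\n');
  return golden === actual ? 'matched' : 'mismatch';
}

// Usage, in a throwaway directory.
const dir = mkdtempSync(join(tmpdir(), 'golden-'));
const goldenFile = join(dir, '__snapshots__', 'example.snap.svg');

const first = checkGolden(goldenFile, '<svg>v1</svg>', false); // no baseline yet
const second = checkGolden(goldenFile, '<svg>v1</svg>', false); // unchanged UI
const third = checkGolden(goldenFile, '<svg>v2</svg>', false); // UI changed
const fourth = checkGolden(goldenFile, '<svg>v2</svg>', true); // change is "blessed"
console.log(first, second, third, fourth);
```

The `mismatch` result is the case where a human has to decide whether the change was intended, which is why the updated `.snap.svg` files still need a visual review.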

### New use cases unlocked

This framework allows maintainers to validate scenarios that were previously
difficult to automate:

- **Policy Visibility:** Ensuring that security blocks or "Ask User" prompts are
  clearly rendered and not suppressed by error verbosity settings.
- **Integrated Flow Validation:** Testing the full cycle of a model response
  triggering a tool, which is then handled by the policy engine and displayed in
  the UI.
- **Startup Health:** Verifying that changes to the core scheduler or config
  resolution don't cause the app to hang in the "Initializing..." state.

## Comparison with existing tests

| Test Type             | Rig Used   | Environment       | Best For                            |
| :-------------------- | :--------- | :---------------- | :---------------------------------- |
| **Integration (E2E)** | `TestRig`  | Headless / Binary | File system logic, tool execution   |
| **Bootstrap Smoke**   | `node-pty` | Real PTY / Binary | Startup health, TTY compatibility   |
| **Visual (Snapshot)** | `AppRig`   | Virtual / Ink     | UI layout, colors, integrated flows |
| **Behavioral (Old)**  | `AppRig`   | Virtual / Ink     | Model decision-making and steering  |

## Why this matters

Existing testing layers often miss critical user experience regressions:

- **Integration tests** may pass if the logic is sound, but they won't detect if
  the app hangs during UI initialization or if the binary fails to communicate
  with the TTY.
- **Behavioral evaluations** validate the model's intent, but they don't ensure
  that the resulting state (like a policy violation) is actually visible to the
  user.

The new validation tools bridge these gaps. For example, they were used to
expose critical issues where visual feedback for the Policy Engine was
suppressed in certain modes and the core scheduler was prone to TTY-based race
conditions. The high-fidelity validation provided by these tools was essential
for identifying and verifying the fixes for these issues.

## Next steps

- **Extend Coverage:** Add SVG snapshots for more complex components like
  `DiffRenderer` or `McpStatus`.
- **CI Integration:** Ensure TTY-based tests run in GitHub Actions environments
  that support pseudo-terminals.
@@ -0,0 +1,39 @@
```typescript
/**
 * @license
 * Copyright 2026 Google LLC
 * SPDX-License-Identifier: Apache-2.0
 */

import { describe, it, beforeEach, afterEach } from 'vitest';
import { TestRig } from '@google/gemini-cli-test-utils';

describe('Gemini CLI TTY Bootstrap', () => {
  let rig: TestRig;

  beforeEach(() => {
    rig = new TestRig();
    rig.setup('TTY Bootstrap Smoke Test');
  });

  afterEach(async () => {
    await rig.cleanup();
  });

  it('should render the interactive UI and display the ready marker in a TTY', async () => {
    // Spawning the CLI in a pseudo-TTY with a dummy API key to bypass auth prompt
    const run = await rig.runInteractive({
      env: { GEMINI_API_KEY: 'dummy-key' },
    });

    // The ready marker we expect to see
    const readyMarker = 'Type your message or @path/to/file';
    const welcomeMessage = 'Welcome to Gemini CLI!';

    // Verify the initial render completes and displays the markers
    await run.expectText(welcomeMessage, 30000);
    await run.expectText(readyMarker, 30000);

    // If we reached here, the smoke test passed
    await run.kill();
  });
});
```
There are a couple of improvements that can be made here:

1. **Redundant Check:** The `rig.runInteractive()` helper already waits for the ready marker (`' Type your message or @path/to/file'`) before its promise resolves. Therefore, the call to `run.expectText(readyMarker, ...)` on line 34 is redundant and can be removed. The test only needs to verify the new welcome message.
2. **Hardcoded Timeout:** The hardcoded timeout of `30000` should be avoided. The `expectText` function is designed to use `getDefaultTimeout()` from the test utilities when the timeout argument is omitted. This aligns with the principle of using consistent, managed mechanisms for time-sensitive operations, similar to how `AbortSignal` is preferred for cancellation over separate timeouts. This allows timeouts to adjust automatically for different environments (e.g., 15s locally, 60s in CI), improving test robustness and maintainability.

By removing the redundant check and the hardcoded timeout, the test becomes cleaner and more aligned with the existing test utilities.

References

- `getDefaultTimeout()` in tests instead of hardcoded values.
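
To make the second point concrete, here is a rough sketch of an environment-aware default with an optional override. It is not the actual `getDefaultTimeout()` from the test utilities: `defaultTimeoutMs`, `resolveTimeout`, and the use of a `CI` environment variable are assumptions for illustration. Only the 15s and 60s figures come from the comment above.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical stand-in for getDefaultTimeout(): longer in CI, shorter locally.
// Detecting CI through a `CI` variable is an assumption of this sketch.
function defaultTimeoutMs(env: Env): number {
  return env['CI'] ? 60_000 : 15_000;
}

// Hypothetical helper: an explicit timeout wins, otherwise fall back to the default.
function resolveTimeout(explicitMs: number | undefined, env: Env): number {
  return explicitMs ?? defaultTimeoutMs(env);
}

console.log(resolveTimeout(undefined, {})); // local default
console.log(resolveTimeout(undefined, { CI: 'true' })); // CI default
console.log(resolveTimeout(30000, { CI: 'true' })); // explicit value overrides the default
```

With a fallback like this, a call that passes `30000` pins the same wait everywhere, while a call that omits the argument picks up whatever the environment needs.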