138 changes: 138 additions & 0 deletions docs/cli/visual-validation.md
@@ -0,0 +1,138 @@
# Visual validation and TTY testing

Gemini CLI uses a multi-layered approach to validate its user interface (UI) and
ensure the CLI boots correctly in real terminal environments. This document
explains the tools and techniques used for visual regression and bootstrap
testing.

## Overview

While standard integration tests focus on logic and file system operations,
visual validation ensures that the terminal output looks correct to the user. We
use two primary methods for this:

1. **TTY Bootstrap Smoke Tests:** Spawns the actual built binary in a real
pseudo-terminal (PTY) to verify startup and basic interactivity.
2. **Visual Regression (SVG Snapshots):** Renders integrated UI flows inside a
virtual terminal and compares the output against committed "golden" SVG
baselines.

## TTY bootstrap smoke tests

These tests validate that the Gemini CLI binary can successfully initialize and
render its Ink-based UI in a real terminal environment. They catch issues like
missing dependencies, broken startup sequences, or TTY-specific crashes.

These tests are located in `packages/cli/integration-tests/`.

### Running TTY tests

To run the bootstrap smoke test, use the following command:

```bash
npm test -w @google/gemini-cli -- integration-tests/bootstrap.test.ts
```

### How it works

The `runInteractive` method of `TestRig` (exported from
`@google/gemini-cli-test-utils`) uses `node-pty` to spawn the CLI. It returns a
programmable handle for waiting on specific text markers and sending simulated
user input.

```typescript
import { TestRig } from '@google/gemini-cli-test-utils';

const rig = new TestRig();
rig.setup('TTY Bootstrap Smoke Test');
const run = await rig.runInteractive();
const readyMarker = 'Type your message or @path/to/file';
await run.expectText(readyMarker, 30000); // Wait for the main prompt
await run.kill();
```
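
The waiting behavior can be pictured with a small sketch. This is not the
`TestRig` implementation: `OutputWatcher`, `stripAnsi`, and the polling interval
are hypothetical names and choices. It assumes only that PTY output arrives in
chunks that may contain ANSI escape sequences.

```typescript
// Hypothetical helper, not the real TestRig API: collects PTY output and
// waits until a marker string shows up or a deadline passes.

// Matches ANSI CSI sequences (colors, cursor movement) so that markers can
// be compared against plain text.
const ANSI_CSI = /\u001b\[[0-?]*[ -\/]*[@-~]/g;

function stripAnsi(raw: string): string {
  return raw.replace(ANSI_CSI, '');
}

class OutputWatcher {
  private buffer = '';

  // Call this with each chunk of output received from the terminal.
  append(chunk: string): void {
    this.buffer += chunk;
  }

  // True once the marker appears anywhere in the output seen so far.
  has(marker: string): boolean {
    return stripAnsi(this.buffer).includes(marker);
  }

  // Polls the buffer until the marker appears; rejects after timeoutMs.
  async expectText(
    marker: string,
    timeoutMs: number,
    pollMs = 50,
  ): Promise<void> {
    const deadline = Date.now() + timeoutMs;
    while (!this.has(marker)) {
      if (Date.now() >= deadline) {
        throw new Error(
          `Timed out after ${timeoutMs}ms waiting for "${marker}"`,
        );
      }
      await new Promise((resolve) => setTimeout(resolve, pollMs));
    }
  }
}
```

In a real rig, `append` would be wired to the terminal's data event (for example
`onData` in `node-pty`). Because the whole buffer is searched on every poll, a
marker that is split across two chunks is still found.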

### TDD Example: Adding a Welcome Message

To add a new visual feature like a "Welcome to Gemini CLI!" message:

1. **Write the failing test:** Update `bootstrap.test.ts` to expect the new
   string.
   ```typescript
   const welcomeMessage = 'Welcome to Gemini CLI!';
   await run.expectText(welcomeMessage, 30000);
   ```
2. **Verify failure:** Run `npm test` and observe the TTY rig reporting the
   missing text.
3. **Implement the feature:** Add the message to `AppHeader.tsx`.
4. **Verify success:** Rebuild the binary (`npm run bundle`) and run the test
   again to see it pass.

## Visual regression with SVG snapshots

To automate the verification of complex UI layouts (like tables, progress bars,
or policy warnings), we use **SVG Snapshots**. This approach captures colors,
spacing, and text formatting in a deterministic way.

These tests are located in `packages/cli/src/ui/` and use the `AppRig` utility.
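
To illustrate why an SVG can capture colors and spacing deterministically, here
is a minimal sketch that renders a grid of styled cells to an SVG string. It is
not how `AppRig` produces its snapshots: the `Cell` type, the `renderSvg`
function, and the cell metrics are all assumptions made for this example.

```typescript
// Illustrative only: a tiny model of a terminal screen rendered to SVG.
interface Cell {
  char: string;
  color: string; // CSS color for the glyph, e.g. '#ffffff'
}

const CELL_WIDTH = 9; // px per column (assumed monospace metrics)
const CELL_HEIGHT = 17; // px per row (assumed)

function escapeXml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Renders a grid of cells into an SVG string. The same grid always yields
// the same string, which is what makes an exact comparison possible.
function renderSvg(rows: Cell[][]): string {
  const cols = Math.max(0, ...rows.map((row) => row.length));
  const width = cols * CELL_WIDTH;
  const height = rows.length * CELL_HEIGHT;
  const glyphs: string[] = [];
  rows.forEach((row, y) => {
    row.forEach((cell, x) => {
      if (cell.char === ' ') return; // spacing is encoded by x/y positions
      glyphs.push(
        `<text x="${x * CELL_WIDTH}" y="${(y + 1) * CELL_HEIGHT}" ` +
          `fill="${cell.color}">${escapeXml(cell.char)}</text>`,
      );
    });
  });
  return [
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" ` +
      `height="${height}" font-family="monospace">`,
    ...glyphs,
    '</svg>',
  ].join('\n');
}
```

Every glyph carries its own position and fill color, so a change in spacing or
color changes the output string and shows up in a snapshot comparison.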

### Running visual tests

To run the visual validation suite, use the following command:

```bash
npm test -w @google/gemini-cli -- src/ui/PolicyVisual.test.tsx
```

### Updating snapshots

If you intentionally change the UI, the visual tests will fail because the
actual output no longer matches the saved snapshot. To "bless" your changes and
update the snapshots, run the tests with the update flag:

```bash
npm test -w @google/gemini-cli -- src/ui/PolicyVisual.test.tsx -u
```

After updating, you must review the resulting `.snap.svg` files in the
`__snapshots__` directory to ensure they look as intended.
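
The compare-or-update cycle can be sketched as follows. This is a simplified
model, not the test runner's actual snapshot logic: `checkSnapshot`, its
`update` parameter, and the `policy.snap.svg` file name are hypothetical and
stand in for the behavior of the update flag.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

type SnapshotResult = 'written' | 'match' | 'mismatch';

// Hypothetical helper: compares `actual` with the golden file at `file`.
// With `update` set, or when no golden exists yet, the golden is
// (re)written instead of compared.
function checkSnapshot(
  file: string,
  actual: string,
  update = false,
): SnapshotResult {
  if (update || !fs.existsSync(file)) {
    fs.mkdirSync(path.dirname(file), { recursive: true });
    fs.writeFileSync(file, actual, 'utf8');
    return 'written';
  }
  const expected = fs.readFileSync(file, 'utf8');
  return expected === actual ? 'match' : 'mismatch';
}

// Usage sketch in a throwaway temp directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'snap-'));
const golden = path.join(dir, '__snapshots__', 'policy.snap.svg');
const results = [
  checkSnapshot(golden, '<svg>v1</svg>'), // no baseline yet
  checkSnapshot(golden, '<svg>v1</svg>'), // unchanged UI
  checkSnapshot(golden, '<svg>v2</svg>'), // UI changed
  checkSnapshot(golden, '<svg>v2</svg>', true), // update flag set
  checkSnapshot(golden, '<svg>v2</svg>'), // new baseline matches
];
fs.rmSync(dir, { recursive: true, force: true });
console.log(results.join(', ')); // written, match, mismatch, written, match
```

The update path overwrites the baseline without comparing anything, which is why
the regenerated files need a manual review before they are committed.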

### New use cases unlocked

This framework allows maintainers to validate scenarios that were previously
difficult to automate:

- **Policy Visibility:** Ensuring that security blocks or "Ask User" prompts are
clearly rendered and not suppressed by error verbosity settings.
- **Integrated Flow Validation:** Testing the full cycle of a model response
triggering a tool, which is then handled by the policy engine and displayed in
the UI.
- **Startup Health:** Verifying that changes to the core scheduler or config
resolution don't cause the app to hang in the "Initializing..." state.

## Comparison with existing tests

| Test Type | Rig Used | Environment | Best For |
| :-------------------- | :--------- | :---------------- | :---------------------------------- |
| **Integration (E2E)** | `TestRig` | Headless / Binary | File system logic, tool execution |
| **Bootstrap Smoke** | `node-pty` | Real PTY / Binary | Startup health, TTY compatibility |
| **Visual (Snapshot)** | `AppRig` | Virtual / Ink | UI layout, colors, integrated flows |
| **Behavioral (Old)** | `AppRig` | Virtual / Ink | Model decision-making and steering |

## Why this matters

Existing testing layers often miss critical user experience regressions:

- **Integration tests** may pass if the logic is sound, but they won't detect if
the app hangs during UI initialization or if the binary fails to communicate
with the TTY.
- **Behavioral evaluations** validate the model's intent, but they don't ensure
that the resulting state (like a policy violation) is actually visible to the
user.

The new validation tools bridge these gaps. For example, they exposed two
critical issues: visual feedback for the Policy Engine was suppressed in certain
modes, and the core scheduler was prone to TTY-based race conditions. The same
tools were then essential for verifying the fixes.

## Next steps

- **Extend Coverage:** Add SVG snapshots for more complex components like
`DiffRenderer` or `McpStatus`.
- **CI Integration:** Ensure TTY-based tests run in GitHub Actions environments
that support pseudo-terminals.
3 changes: 3 additions & 0 deletions docs/integration-tests.md
@@ -12,6 +12,9 @@ verify that it behaves as expected when interacting with the file system.
These tests are located in the `integration-tests` directory and are run using a
custom test runner.

For information about visual regression and TTY bootstrap testing, see
[Visual validation and TTY testing](/docs/cli/visual-validation.md).

## Building the tests

Prior to running any integration tests, you need to create a release bundle that
4 changes: 4 additions & 0 deletions docs/sidebar.json
@@ -178,6 +178,10 @@
"items": [
  { "label": "Contribution guide", "slug": "docs/contributing" },
  { "label": "Integration testing", "slug": "docs/integration-tests" },
  {
    "label": "Visual validation and TTY",
    "slug": "docs/cli/visual-validation"
  },
  {
    "label": "Issue and PR automation",
    "slug": "docs/issue-and-pr-automation"
39 changes: 39 additions & 0 deletions packages/cli/integration-tests/bootstrap.test.ts
@@ -0,0 +1,39 @@
/**
 * @license
 * Copyright 2026 Google LLC
 * SPDX-License-Identifier: Apache-2.0
 */

import { describe, it, beforeEach, afterEach } from 'vitest';
import { TestRig } from '@google/gemini-cli-test-utils';

describe('Gemini CLI TTY Bootstrap', () => {
  let rig: TestRig;

  beforeEach(() => {
    rig = new TestRig();
    rig.setup('TTY Bootstrap Smoke Test');
  });

  afterEach(async () => {
    await rig.cleanup();
  });

  it('should render the interactive UI and display the ready marker in a TTY', async () => {
    // Spawning the CLI in a pseudo-TTY with a dummy API key to bypass auth prompt
    const run = await rig.runInteractive({
      env: { GEMINI_API_KEY: 'dummy-key' },
    });

    // The ready marker we expect to see
    const readyMarker = 'Type your message or @path/to/file';
    const welcomeMessage = 'Welcome to Gemini CLI!';

    // Verify the initial render completes and displays the markers
    await run.expectText(welcomeMessage, 30000);
    await run.expectText(readyMarker, 30000);
Comment on lines +33 to +34

high

There are a couple of improvements that can be made here:

  1. Redundant Check: The rig.runInteractive() helper already waits for the ready marker (' Type your message or @path/to/file') before its promise resolves. Therefore, the call to run.expectText(readyMarker, ...) on line 34 is redundant and can be removed. The test only needs to verify the new welcome message.

  2. Hardcoded Timeout: The hardcoded timeout of 30000 should be avoided. The expectText function is designed to use getDefaultTimeout() from the test utilities when the timeout argument is omitted. This aligns with the principle of using consistent, managed mechanisms for time-sensitive operations, similar to how AbortSignal is preferred for cancellation over separate timeouts. This allows timeouts to adjust automatically for different environments (e.g., 15s locally, 60s in CI), improving test robustness and maintainability.

By removing the redundant check and the hardcoded timeout, the test becomes cleaner and more aligned with the existing test utilities.

Suggested change
- await run.expectText(welcomeMessage, 30000);
- await run.expectText(readyMarker, 30000);
+ await run.expectText(welcomeMessage);
References
  1. Asynchronous operations waiting for user input via the MessageBus should rely on the provided AbortSignal for cancellation, rather than implementing a separate timeout, to maintain consistency with existing patterns. This principle extends to using standardized timeout mechanisms like getDefaultTimeout() in tests instead of hardcoded values.
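
As an illustration of the second point in the comment above, an
environment-aware default could look like the sketch below. The names
`defaultTimeoutMs` and `resolveTimeout`, and the use of a `CI` environment
variable for detection, are assumptions; the real `getDefaultTimeout()` in the
test utilities may work differently. The 15s and 60s values are the ones quoted
in the comment.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical stand-in for a shared timeout helper.
function defaultTimeoutMs(env: Env): number {
  const LOCAL_MS = 15_000; // "15s locally" per the review comment
  const CI_MS = 60_000; // "60s in CI" per the review comment
  // Assumption: CI runs are detected through a non-empty CI variable.
  return env['CI'] ? CI_MS : LOCAL_MS;
}

// An explicit timeout still wins; omitting it defers to the environment.
function resolveTimeout(timeoutMs: number | undefined, env: Env): number {
  return timeoutMs ?? defaultTimeoutMs(env);
}
```

With this shape, a call that omits the timeout adapts to slower CI machines,
while a hardcoded value such as `30000` is applied unchanged everywhere.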


    // If we reached here, the smoke test passed
    await run.kill();
  });
});
8 changes: 8 additions & 0 deletions packages/cli/src/ui/components/AppHeader.tsx
@@ -159,6 +159,14 @@ export const AppHeader = ({ version, showDetails = true }: AppHeaderProps) => {
        />
      )}

      {showHeader && (
        <Box paddingLeft={2} marginTop={1}>
          <Text italic color={theme.text.secondary}>
            Welcome to Gemini CLI!
          </Text>
        </Box>
      )}

      {!(settings.merged.ui.hideTips || config.getScreenReader()) &&
        showTips && <Tips config={config} />}
    </Box>