diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..8651bfa --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,59 @@ +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. + +## Project + +OpenCode CLI Enforcer — an OpenCode plugin that orchestrates Claude, Gemini, and Codex CLIs with resilience (circuit breakers, retry with backoff, automatic fallback). Published to npm as `opencode-cli-enforcer`. + +## Commands + +```bash +bun install # Install dependencies +bun test # Run all tests +bun test --watch # Run tests in watch mode +bun test tests/retry.test.ts # Run a single test file +bun run typecheck # Type-check without emitting +bun run build # Build (no bundle) +``` + +Runtime is **Bun** (>=1.3.5), not Node. Tests use Bun's built-in test runner (`bun:test`). There is no linter or formatter configured. + +## Architecture + +The plugin exports four tools (`cli_exec`, `cli_status`, `cli_list`, `cli_route`) via the OpenCode plugin interface. + +**Request flow through `cli_exec`:** + +``` +index.ts (plugin entry, tool definitions, hooks) + → resilience.ts (global time budget, retry + circuit breaker + fallback) + → circuit-breaker.ts (per-CLI isolation: 3 failures OR 5 timeouts → open) + → retry.ts (exponential backoff with jitter, abort-aware sleep) + → executor.ts (execa wrapper: structured results, Windows .cmd handling, PATH augmentation) + → cli-defs.ts (arg builders + dynamic --max-turns for Claude) + → detection.ts (CLI availability via which/where, 5-min cache) + → safe-env.ts (allowlisted env vars only, no API keys) + → redact.ts (strips API keys from output) + → error-classifier.ts (transient/rate_limit/permanent/crash) +``` + +**Key state in `index.ts`:** three `Map`s — `breakers` (circuit breaker per CLI), `cliAvailability` (detection results), `usageStats` (call counts/timing). CLI detection runs non-blocking at startup via `Promise.allSettled`. 
+ +**Global time budget** (`resilience.ts`): a single timeout budget shared across all retries AND fallbacks, preventing timeout multiplication. Process timeouts skip retries and go straight to fallback. + +**Circuit breaker** has separate thresholds: opens after 3 failures OR 5 timeouts (slow ≠ broken), cooldown 60s. **Retry**: max 2 retries, 1s base delay, 10s max, 0.3 jitter factor. + +**Role-based routing** (`cli-defs.ts`): 6 agent roles (manager, coordinator, developer, researcher, reviewer, architect) map to optimal CLI providers via `cli_route`. + +## Cross-Platform + +- `platform.ts` exports `PLATFORM` and `IS_WINDOWS` +- Binary detection uses `which` (Unix) / `where` (Windows) with 5-minute cache +- Windows: `.cmd/.bat` shim handling via `cmd /c`, PATH augmentation (npm, scoop, cargo, pnpm) +- Large prompts (>30KB) delivered via stdin to avoid OS arg-length limits +- CI runs on ubuntu, windows, and macos + +## Release + +The release workflow (`.github/workflows/release.yml`) requires a production environment approval gate, publishes to npm with provenance attestation using Node 22, and creates a GitHub release with a git tag. diff --git a/README.md b/README.md index 1e47ef0..4238e6e 100644 --- a/README.md +++ b/README.md @@ -1,21 +1,92 @@

- opencode-cli-enforcer
- Resilient multi-LLM CLI orchestration for OpenCode + opencode-cli-enforcer +

+ +

opencode-cli-enforcer

+ +

+ Resilient multi-LLM CLI orchestration for OpenCode
+ Execute Claude, Gemini & Codex with circuit breakers, smart retry, automatic fallback, and role-based routing.

CI - npm + npm License + Platform + Bun +

+ +--- + +## Why opencode-cli-enforcer? + +Running AI CLIs in production is fragile. Processes timeout, rate limits hit, binaries disappear. Calling three different CLIs means three different failure modes, arg formats, and platform quirks. + +**opencode-cli-enforcer** wraps all of that into a single, resilient plugin: + + + + + + +
+ +**Without this plugin** +- Manual subprocess management +- No retry on transient failures +- One CLI down = entire workflow blocked +- OS-specific arg handling per CLI +- Secrets leak into error logs +- No visibility into CLI health + + + +**With this plugin** +- 4 tools, zero boilerplate +- Exponential backoff + jitter retry +- Automatic fallback chain across providers +- Cross-platform (Windows `.cmd` shims, PATH augmentation) +- Secret redaction on all output +- Real-time health dashboard + +
+ +--- + +## Architecture + +

+ Architecture diagram

+``` +cli_exec(prompt) + | + v ++----------------------------------------------------------+ +| Resilience Engine | +| | +| Global Time Budget (shared across ALL attempts) | +| +---------+ +---------+ +---------+ | +| | Claude | --> | Gemini | --> | Codex | fallback | +| +---------+ +---------+ +---------+ chain | +| | | | | +| v v v | +| [Circuit Breaker] -----> [Retry w/ Backoff] --> [execa] | +| 3 failures = open max 2 retries 10MB | +| 5 timeouts = open 1s-10s + jitter buffer | +| 60s cooldown abort-aware sleep | ++----------------------------------------------------------+ +``` + --- -Execute Claude, Gemini, and Codex CLIs with automatic OS detection, circuit breaker pattern, retry with exponential backoff, and provider fallback. Cross-platform (Windows/macOS/Linux). +## Quick Start -## Install +### 1. Install as OpenCode plugin (recommended) -### OpenCode plugin (recommended) +Add to your OpenCode configuration: ```json { @@ -23,77 +94,478 @@ Execute Claude, Gemini, and Codex CLIs with automatic OS detection, circuit brea } ``` -### npm +### 2. Install via npm / bun ```bash bun add opencode-cli-enforcer +# or +npm install opencode-cli-enforcer ``` -## Tools +### 3. Prerequisites + +You need at least **one** CLI installed and authenticated: + +| CLI | Install | Auth | +|-----|---------|------| +| [Claude Code](https://docs.anthropic.com/en/docs/claude-code) | `npm i -g @anthropic-ai/claude-code` | `claude login` | +| [Gemini CLI](https://github.com/google-gemini/gemini-cli) | `npm i -g @google/gemini-cli` | `gcloud auth login` | +| [Codex CLI](https://github.com/openai/codex) | `npm i -g @openai/codex` | `codex auth` | + +--- + +## Tools Reference + +### `cli_exec` — Execute with full resilience + +The primary tool. Sends a prompt to a CLI with automatic retry, circuit breaker protection, and fallback. 
-### `cli_exec` — Execute a CLI with full resilience +```typescript +cli_exec({ + cli: "claude", + prompt: "Explain the observer pattern with a TypeScript example", + mode: "generate", // "generate" | "analyze" + timeout_seconds: 300, // Global budget: 10-1800s + allow_fallback: true // Try gemini/codex on failure +}) +``` | Parameter | Type | Default | Description | |-----------|------|---------|-------------| -| `cli` | `"claude" \| "gemini" \| "codex"` | required | Primary CLI | -| `prompt` | `string` | required | Prompt to send | -| `mode` | `"generate" \| "analyze"` | `"generate"` | `analyze` enables file reads (Claude) | -| `timeout_seconds` | `number` | `720` | Max seconds (10-1800) | -| `allow_fallback` | `boolean` | `true` | Try alternatives on failure | +| `cli` | `"claude" \| "gemini" \| "codex"` | *required* | Primary CLI provider | +| `prompt` | `string` | *required* | Prompt to send (max 100KB) | +| `mode` | `"generate" \| "analyze"` | `"generate"` | `analyze` enables file reads (Claude only) | +| `timeout_seconds` | `number` | `720` | Global timeout budget in seconds | +| `allow_fallback` | `boolean` | `true` | Auto-fallback to alternative providers | + +**Response:** + +```jsonc +{ + "success": true, + "cli": "claude", + "stdout": "The Observer pattern is a behavioral design pattern...", + "stderr": "", + "duration_ms": 4523, + "timed_out": false, + "used_fallback": false, + "fallback_chain": ["claude"], + "error": null, + "error_class": null, // "transient" | "rate_limit" | "permanent" | "crash" + "circuit_state": "closed", // "closed" | "open" | "half-open" + "attempt": 1, + "max_attempts": 3 +} +``` + +--- + +### `cli_status` — Health dashboard + +Returns real-time health for all providers: installation status, circuit breaker state, and usage statistics. 
-### `cli_status` — Health check dashboard +```typescript +cli_status({}) +``` + +**Response:** + +```jsonc +{ + "platform": "windows", + "detection_complete": true, + "retry_config": { "max_retries": 2, "base_delay_ms": 1000, "max_delay_ms": 10000 }, + "breaker_config": { "failure_threshold": 3, "timeout_threshold": 5, "cooldown_seconds": 60 }, + "providers": [ + { + "name": "claude", + "installed": true, + "path": "/usr/local/bin/claude", + "version": "1.0.16", + "circuit_breaker": { + "state": "closed", + "consecutive_failures": 0, + "consecutive_timeouts": 0, + "total_executions": 12, + "total_failures": 1, + "total_timeouts": 0 + }, + "usage": { + "total_calls": 12, + "success_rate": "92%", + "avg_duration_ms": 3400 + }, + "fallback_order": ["gemini", "codex"] + } + // ... gemini, codex + ] +} +``` + +--- + +### `cli_list` — List installed providers + +Quick check of which CLIs are available on the system. + +```typescript +cli_list({}) +``` + +**Response:** + +```jsonc +{ + "installed_count": 2, + "providers": [ + { "provider": "claude", "path": "/usr/local/bin/claude", "version": "1.0.16", "strengths": ["reasoning", "code-analysis", "debugging", "architecture", "planning"] }, + { "provider": "gemini", "path": "/usr/local/bin/gemini", "version": "0.1.8", "strengths": ["research", "trends", "knowledge", "large-context", "web-search"] } + ] +} +``` + +--- + +### `cli_route` — Role-based routing -Returns platform info, detection status, circuit breaker states, and usage stats for all providers. +Recommends the best CLI for a task based on agent role. Considers both provider strengths and real-time availability. 
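One plausible reading of "provider strengths and real-time availability" is: walk the role's chain (primary first, then fallbacks) and recommend the first provider that is currently available. The sketch below assumes exactly that. The routing table is copied from `ROLE_ROUTING` in `src/cli-defs.ts`; `pickCli` and the shape of its `availability` argument are hypothetical names used only for illustration and may not match what `cli_route` actually does.

```typescript
// Hypothetical sketch: pickCli is NOT part of the plugin's API.
type CliName = "claude" | "gemini" | "codex"
type AgentRole =
  | "manager" | "coordinator" | "developer"
  | "researcher" | "reviewer" | "architect"

// Same mapping as ROLE_ROUTING in src/cli-defs.ts.
const ROLE_ROUTING: Record<AgentRole, { primary: CliName; fallbacks: CliName[] }> = {
  manager: { primary: "gemini", fallbacks: ["claude", "codex"] },
  coordinator: { primary: "claude", fallbacks: ["gemini", "codex"] },
  developer: { primary: "codex", fallbacks: ["claude", "gemini"] },
  researcher: { primary: "gemini", fallbacks: ["claude", "codex"] },
  reviewer: { primary: "claude", fallbacks: ["gemini", "codex"] },
  architect: { primary: "claude", fallbacks: ["gemini", "codex"] },
}

// Assumption: recommend the first provider in the chain that is available,
// or null when none of them is.
function pickCli(
  role: AgentRole,
  availability: Record<CliName, boolean>,
): { recommended: CliName | null; chain: CliName[] } {
  const { primary, fallbacks } = ROLE_ROUTING[role]
  const chain: CliName[] = [primary, ...fallbacks]
  const recommended = chain.find((cli) => availability[cli]) ?? null
  return { recommended, chain }
}
```

Keeping the full chain in the result, even when the primary is available, gives the caller something to pass along if the recommended provider fails later.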
+ +```typescript +cli_route({ + role: "developer", + task_description: "Refactor the auth module to use JWT" +}) +``` + +| Parameter | Type | Description | +|-----------|------|-------------| +| `role` | `"manager" \| "coordinator" \| "developer" \| "researcher" \| "reviewer" \| "architect"` | Agent role | +| `task_description` | `string?` | Optional context | + +**Routing table:** + +

+ Role routing table +

+ +| Role | Primary CLI | Reasoning | +|------|------------|-----------| +| **Manager** | Gemini | Research, trends, large-context analysis | +| **Coordinator** | Claude | Reasoning, planning, decision-making | +| **Developer** | Codex | Code generation, refactoring, full-auto | +| **Researcher** | Gemini | Knowledge synthesis, web search | +| **Reviewer** | Claude | Code analysis, debugging, quality | +| **Architect** | Claude | System design, architecture planning | + +**Response:** + +```jsonc +{ + "role": "developer", + "task_description": "Refactor the auth module to use JWT", + "recommended_cli": "codex", + "reasoning": "Role \"developer\" maps to codex (code-generation, edits, refactoring, full-auto).", + "fallback_chain": ["codex", "claude", "gemini"], + "availability": { "codex": true, "claude": true, "gemini": false } +} +``` + +--- ## Resilience Pipeline +

+ Resilience pipeline +

+ +### Global Time Budget + +Unlike per-attempt timeouts, the **global time budget** is shared across ALL retries and ALL fallback providers. This prevents timeout multiplication: + ``` -Request --> Circuit Breaker --> Retry (3x, exp backoff) --> Execute (execa) - | | | - v v v - If open: If exhausted: On failure: - skip to try next CLI record + retry - fallback in chain +Traditional: 3 providers x 3 attempts x 300s timeout = 2700s worst case +This plugin: 300s total budget across everything = 300s worst case ``` -**Circuit Breaker States:** +Each attempt receives the **remaining** seconds, not the full budget. When the budget runs out, execution stops immediately. -| State | Behavior | -|-------|----------| -| closed | Normal — requests pass through | -| open | Blocked — 3+ failures, 60s cooldown | -| half-open | Probe — 1 request to test recovery | +### Circuit Breaker -**Fallback Order:** `claude --> gemini --> codex` +Per-CLI failure isolation with **separate thresholds** for failures and timeouts (because slow ≠ broken): -## Supported CLIs +

+ Circuit breaker states +

+ +| State | Behavior | Transition | +|-------|----------|------------| +| **Closed** | Normal operation, requests pass through | 3 failures OR 5 timeouts → Open | +| **Open** | All requests blocked, provider is skipped | After 60s cooldown → Half-Open | +| **Half-Open** | One probe request allowed | Success → Closed / Failure → Open | + +### Retry with Exponential Backoff + +``` +Attempt 0: immediate +Attempt 1: ~1s + jitter (+-30%) +Attempt 2: ~2s + jitter (+-30%) + capped at 10s max +``` + +- **Transient errors** (network, socket): standard retry +- **Rate limits** (429, quota): retry with 3x longer delay +- **Process timeouts**: skip retries entirely, move to next provider +- **Permanent errors** (auth, 401/403): skip retries, move to fallback +- **Crash** (SIGKILL, ENOENT): skip retries, move to fallback + +### Error Classification + +``` +Error arrives + | + +-- exitCode 137 / SIGKILL / ENOENT ---------> CRASH (no retry) + +-- 429 / "rate limit" / "quota" -------------> RATE_LIMIT (retry, 3x delay) + +-- 401 / 403 / "auth" / "not found" --------> PERMANENT (no retry) + +-- everything else --------------------------> TRANSIENT (retry) +``` + +### Fallback Chain + +When a provider fails, the next one in the chain takes over automatically: + +``` +Claude ---[fail]---> Gemini ---[fail]---> Codex +Gemini ---[fail]---> Claude ---[fail]---> Codex +Codex ---[fail]---> Claude ---[fail]---> Gemini +``` + +--- + +## Cross-Platform Support + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Feature | Windows | macOS / Linux |
|---------|---------|---------------|
| Binary detection | `where` | `which` |
| `.cmd`/`.bat` shims | Auto-wrapped with `cmd /c` | N/A |
| PATH augmentation | npm, scoop, cargo, pnpm dirs | Standard PATH |
| Large prompts (>30KB) | Delivered via stdin to avoid OS arg-length limits | Same as Windows |
| Environment | Allowlisted vars only (no secrets leak to subprocesses) | Same as Windows |
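The large-prompt rule can be sketched as a small pure function: below the threshold the prompt travels as an argument, above it the argument list carries a placeholder and the prompt is handed over as stdin input. The threshold value and the Codex argument shapes are taken from `src/executor.ts` and `src/cli-defs.ts`; `planDelivery` and the `Delivery` type are hypothetical names for illustration, not the plugin's actual executor code.

```typescript
// Sketch only: planDelivery and Delivery are hypothetical, not the plugin's API.
const STDIN_THRESHOLD = 30_000 // characters, same value as in src/executor.ts

interface Delivery {
  args: string[]
  input?: string // would be piped to the child process's stdin when set
}

function planDelivery(
  prompt: string,
  buildArgs: (prompt: string) => string[],
  buildStdinArgs: () => string[],
): Delivery {
  // Long prompts stay out of argv so OS arg-length limits never apply.
  if (prompt.length > STDIN_THRESHOLD) {
    return { args: buildStdinArgs(), input: prompt }
  }
  return { args: buildArgs(prompt) }
}

// Arg builders shaped like the Codex entry in CLI_DEFS.
const codexArgs = (prompt: string): string[] => ["exec", prompt, "--full-auto"]
const codexStdinArgs = (): string[] => ["exec", "-", "--full-auto"]

const small = planDelivery("hello", codexArgs, codexStdinArgs)
const large = planDelivery("x".repeat(45_000), codexArgs, codexStdinArgs)
```

Here `small` keeps the prompt in its argument list, while `large` carries `"-"` in place of the prompt and the full text in `input`.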
+ +### Detection Caching + +CLI availability is cached for **5 minutes** to avoid repeated filesystem lookups. The cache covers: +- Binary path resolution +- Version detection +- Both positive and negative results + +--- + +## Security + +| Protection | Description | +|-----------|-------------| +| **Secret redaction** | API keys (`sk-*`, `key-*`, `AIza*`, `ant-api*`) and Bearer tokens stripped from all output | +| **Environment filtering** | Only system essentials + proxy vars passed to subprocesses. No API keys — CLIs handle their own auth. | +| **Input isolation** | Large prompts (>30KB) delivered via stdin, not shell args | +| **No shell interpolation** | All CLI execution via `execa` (no `shell: true`) | + +--- + +## Examples + +### Basic: Ask Claude to review code + +```typescript +const result = await cli_exec({ + cli: "claude", + prompt: "Review this function for bugs:\n\nfunction add(a, b) { return a - b }", + mode: "analyze", + timeout_seconds: 120 +}) +// result.stdout → "Bug found: the function is named `add` but performs subtraction..." 
+``` + +### Fallback: Primary CLI is down + +```typescript +// Claude's circuit breaker is open (too many recent failures) +const result = await cli_exec({ + cli: "claude", + prompt: "Generate a REST API for user management", + allow_fallback: true +}) +// result.cli → "gemini" (automatic fallback) +// result.used_fallback → true +``` + +### Role routing: Pick the right tool for the job + +```typescript +// For a developer task, route to Codex (best at code generation) +const recommendation = await cli_route({ + role: "developer", + task_description: "Implement pagination for the /users endpoint" +}) +// recommendation.recommended_cli → "codex" + +// Then execute with the recommended CLI +const result = await cli_exec({ + cli: recommendation.recommended_cli, + prompt: "Implement pagination for the /users endpoint using cursor-based pagination" +}) +``` -| CLI | Best For | -|-----|----------| -| Claude | Reasoning, code analysis, debugging, architecture | -| Gemini | Research, broad knowledge, large context | -| Codex | Code generation, edits, refactoring | +### Monitor health across providers -## Prerequisites +```typescript +const status = await cli_status({}) -- [Bun](https://bun.sh) runtime -- At least one CLI: [Claude Code](https://docs.anthropic.com/en/docs/claude-code), [Gemini CLI](https://github.com/google-gemini/gemini-cli), or [Codex CLI](https://github.com/openai/codex) +for (const provider of status.providers) { + console.log(`${provider.name}: ${provider.circuit_breaker.state} | ${provider.usage.success_rate}`) +} +// claude: closed | 95% +// gemini: closed | 88% +// codex: open | 60% <-- circuit breaker tripped +``` + +### Large prompt via stdin + +```typescript +const largeCodebase = readFileSync("src/index.ts", "utf-8") // 45KB file +const result = await cli_exec({ + cli: "claude", + prompt: `Analyze this codebase for security vulnerabilities:\n\n${largeCodebase}`, + mode: "analyze", + timeout_seconds: 600 +}) +// Prompt >30KB → automatically 
delivered via stdin (no OS arg-length issues) +``` + +--- + +## Hooks + +The plugin injects two hooks into the OpenCode lifecycle: + +### `experimental.chat.system.transform` + +Automatically injects CLI availability into the system prompt of every agent (except `orchestrator` and `task_decomposer`), so the LLM knows which tools are available and their current health. + +### `tool.execute.after` + +Tracks when agents invoke CLIs directly via `bash` instead of using `cli_exec`, incrementing usage counters for observability. + +--- + +## Provider Strengths + +

+ Provider strengths +

+ +| Provider | Binary | Strengths | +|----------|--------|-----------| +| **Claude** | `claude` | Reasoning, code analysis, debugging, architecture, planning | +| **Gemini** | `gemini` | Research, trends, knowledge, large-context, web search | +| **Codex** | `codex` | Code generation, edits, refactoring, full-auto mode | + +--- + +## Configuration Reference + +### Circuit Breaker Defaults + +| Parameter | Value | Description | +|-----------|-------|-------------| +| `failureThreshold` | `3` | Consecutive failures before opening | +| `timeoutThreshold` | `5` | Consecutive timeouts before opening (slow ≠ broken) | +| `cooldownMs` | `60000` | Milliseconds before open → half-open | +| `halfOpenSuccessThreshold` | `1` | Successes in half-open to close | + +### Retry Defaults + +| Parameter | Value | Description | +|-----------|-------|-------------| +| `maxRetries` | `2` | Maximum retry attempts per provider | +| `baseDelayMs` | `1000` | Base delay for exponential backoff | +| `maxDelayMs` | `10000` | Maximum delay cap | +| `jitterFactor` | `0.3` | Random jitter range (+-30%) | + +### Executor Defaults + +| Parameter | Value | Description | +|-----------|-------|-------------| +| `STDIN_THRESHOLD` | `30000` | Characters before switching to stdin delivery | +| `MAX_BUFFER` | `10MB` | Maximum stdout/stderr buffer | + +--- ## Development ```bash -bun install -bun test +bun install # Install dependencies +bun test # Run all tests (85 tests) +bun test --watch # Watch mode +bun test tests/retry.test.ts # Run a single test file +bun run typecheck # Type-check without emitting +bun run build # Build +``` + +### Project Structure + +``` +src/ + index.ts Plugin entry, 4 tools, 2 hooks + resilience.ts Global time budget, retry + breaker + fallback + circuit-breaker.ts Per-CLI state machine (failures + timeouts) + executor.ts execa wrapper, Windows handling, PATH augmentation + cli-defs.ts Provider configs, arg builders, role routing + detection.ts CLI auto-detection with 5-min 
cache + retry.ts Exponential backoff, abort-aware sleep + error-classifier.ts Error categorization for retry decisions + safe-env.ts Environment variable allowlist + redact.ts Secret redaction + platform.ts OS detection + +tests/ + 8 test files, 85 tests covering all modules ``` +--- + ## Contributing 1. Fork the repo 2. Create a feature branch from `develop`: `git checkout -b feat/my-feature develop` 3. Make your changes and add tests -4. Run `bun test` +4. Run `bun test` (all 85 must pass) 5. Open a PR to `develop` +--- + ## License -MIT +[MIT](LICENSE) © [lleontor705](https://github.com/lleontor705) diff --git a/bun.lock b/bun.lock index 13ea524..57c149d 100644 --- a/bun.lock +++ b/bun.lock @@ -6,8 +6,8 @@ "name": "opencode-cli-enforcer", "dependencies": { "@opencode-ai/plugin": "^1.2.26", - "cockatiel": "^3.2.1", "execa": "^9.6.1", + "zod": "^3.23.0", }, "devDependencies": { "@types/bun": "latest", @@ -30,8 +30,6 @@ "bun-types": ["bun-types@1.3.11", "", { "dependencies": { "@types/node": "*" } }, "sha512-1KGPpoxQWl9f6wcZh57LvrPIInQMn2TQ7jsgxqpRzg+l0QPOFvJVH7HmvHo/AiPgwXy+/Thf6Ov3EdVn1vOabg=="], - "cockatiel": ["cockatiel@3.2.1", "", {}, "sha512-gfrHV6ZPkquExvMh9IOkKsBzNDk6sDuZ6DdBGUBkvFnTCqCxzpuq48RySgP0AnaqQkw2zynOFj9yly6T1Q2G5Q=="], - "cross-spawn": ["cross-spawn@7.0.6", "", { "dependencies": { "path-key": "^3.1.0", "shebang-command": "^2.0.0", "which": "^2.0.1" } }, "sha512-uV2QOWP2nWzsy2aMp8aRibhi9dlzF5Hgh5SHaB9OiTGEyDTiJJyx0uy51QXdyWbtAHNua4XJzUKca3OzKUd3vA=="], "execa": ["execa@9.6.1", "", { "dependencies": { "@sindresorhus/merge-streams": "^4.0.0", "cross-spawn": "^7.0.6", "figures": "^6.1.0", "get-stream": "^9.0.0", "human-signals": "^8.0.1", "is-plain-obj": "^4.1.0", "is-stream": "^4.0.1", "npm-run-path": "^6.0.0", "pretty-ms": "^9.2.0", "signal-exit": "^4.1.0", "strip-final-newline": "^4.0.0", "yoctocolors": "^2.1.1" } }, "sha512-9Be3ZoN4LmYR90tUoVu2te2BsbzHfhJyfEiAVfz7N5/zv+jduIfLrV2xdQXOHbaD6KgpGdO9PRPM1Y4Q9QkPkA=="], @@ -76,7 +74,9 @@ 
"yoctocolors": ["yoctocolors@2.1.2", "", {}, "sha512-CzhO+pFNo8ajLM2d2IW/R93ipy99LWjtwblvC1RsoSUMZgyLbYFr221TnSNT7GjGdYui6P459mw9JH/g/zW2ug=="], - "zod": ["zod@4.1.8", "", {}, "sha512-5R1P+WwQqmmMIEACyzSvo4JXHY5WiAFHRMg+zBZKgKS+Q1viRa0C1hmUKtHltoIFKtIdki3pRxkmpP74jnNYHQ=="], + "zod": ["zod@3.25.76", "", {}, "sha512-gzUt/qt81nXsFGKIFcC3YnfEAx5NkunCfnDlvuBSSFS02bcXu4Lmea0AFIUwbLWxWPx3d9p8S5QoaujKcNQxcQ=="], + + "@opencode-ai/plugin/zod": ["zod@4.1.8", "", {}, "sha512-5R1P+WwQqmmMIEACyzSvo4JXHY5WiAFHRMg+zBZKgKS+Q1viRa0C1hmUKtHltoIFKtIdki3pRxkmpP74jnNYHQ=="], "npm-run-path/path-key": ["path-key@4.0.0", "", {}, "sha512-haREypq7xkM7ErfgIyA0z+Bj4AGKlMSdlQE2jvJo6huWD1EdkKYV+G/T4nq0YEF2vgTT8kqMFKo1uHn950r4SQ=="], } diff --git a/docs/assets/architecture.svg b/docs/assets/architecture.svg new file mode 100644 index 0000000..0570a24 --- /dev/null +++ b/docs/assets/architecture.svg @@ -0,0 +1,124 @@ + + + + + + + + + + + + + + + + OpenCode CLI Enforcer — Architecture + + + + OpenCode Agent + system prompt + hooks + + + + + + + Plugin Entry + index.ts — 4 tools, 2 hooks + + + + + + + cli_exec + + + cli_status + + + cli_list + + + cli_route + + + + + + + Resilience Engine + + + + Global Time Budget (shared) + + + + Claude + reasoning + + + Gemini + research + + + Codex + code-gen + + + + + + + + + + + Circuit Breaker + 3 failures OR 5 timeouts = open | 60s cooldown + + + + + + + Retry + Backoff + max 2 retries | 1s-10s + jitter | abort-aware + + + + + + + Executor (execa) + stdin >30KB | Windows .cmd | 10MB buffer + + + + error-classifier.ts + + + detection.ts (cache) + + + safe-env.ts + + + redact.ts + + + + + + + diff --git a/docs/assets/circuit-breaker.svg b/docs/assets/circuit-breaker.svg new file mode 100644 index 0000000..bf79955 --- /dev/null +++ b/docs/assets/circuit-breaker.svg @@ -0,0 +1,63 @@ + + + + + + + + + + + + + + + + Circuit Breaker State Machine + + + + CLOSED + Requests pass through + + + + OPEN + All requests blocked + + + + HALF-OPEN + One probe allowed + + 
+ + 3 failures + OR 5 timeouts + + + + 60s cooldown + + + + 1 success + + + + failure / timeout + + + + Normal operation + + Provider skipped + + Testing recovery + diff --git a/docs/assets/logo.svg b/docs/assets/logo.svg new file mode 100644 index 0000000..b3ab13b --- /dev/null +++ b/docs/assets/logo.svg @@ -0,0 +1,31 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/assets/providers.svg b/docs/assets/providers.svg new file mode 100644 index 0000000..dcce5c9 --- /dev/null +++ b/docs/assets/providers.svg @@ -0,0 +1,76 @@ + + + + + Supported CLI Providers + + + + Claude + $ claude + + + reasoning + + + code-analysis + + + debugging + + + architecture + + + planning + + fallback: gemini → codex + + + + Gemini + $ gemini + + + research + + + large-context + + + trends + + + knowledge + + + web-search + + fallback: claude → codex + + + + Codex + $ codex + + + code-generation + + + full-auto + + + edits + + + refactoring + + fallback: claude → gemini + diff --git a/docs/assets/resilience-pipeline.svg b/docs/assets/resilience-pipeline.svg new file mode 100644 index 0000000..4cb87e4 --- /dev/null +++ b/docs/assets/resilience-pipeline.svg @@ -0,0 +1,131 @@ + + + + + + + + + + + + + Resilience Pipeline — Request Flow + + + + 1 + Request + cli_exec() + + + + + + + 2 + Budget OK? + remaining > 0 + + + + + + + 3 + Breaker Closed? + check state + + + + + + + 4 + Execute CLI + execa + timeout + + + + + + Success! + + + + + + + All providers failed + + + + + Skip to fallback + + + + + + + Timeout? 
+ + + + + Classify Error + + + + + + Skip retries + + + + + + + Transient + + + + + Permanent + + + + + + Retry (backoff) + + + + + Next provider + + + + + + + + retry loop + + + + Success path + + Failure / fallback + + Timeout (skip retries) + + Retry with backoff + diff --git a/docs/assets/routing.svg b/docs/assets/routing.svg new file mode 100644 index 0000000..bf520ef --- /dev/null +++ b/docs/assets/routing.svg @@ -0,0 +1,108 @@ + + + + + + + + + + + + + Role-Based CLI Routing + + + Agent Role + + + Primary + Fallback 1 + Fallback 2 + + + + + + + + + + + + Manager + + Gemini + + + Claude + + + Codex + + + + Coordinator + + Claude + + + Gemini + + + Codex + + + + Developer + + Codex + + + Claude + + + Gemini + + + + Researcher + + Gemini + + + Claude + + + Codex + + + + Reviewer + + Claude + + + Gemini + + + Codex + + + + Architect + + Claude + + + Gemini + + + Codex + diff --git a/package.json b/package.json index 5dc8504..e05e0b7 100644 --- a/package.json +++ b/package.json @@ -32,8 +32,8 @@ ], "dependencies": { "@opencode-ai/plugin": "^1.2.26", - "cockatiel": "^3.2.1", - "execa": "^9.6.1" + "execa": "^9.6.1", + "zod": "^3.23.0" }, "devDependencies": { "@types/bun": "latest", diff --git a/src/circuit-breaker.ts b/src/circuit-breaker.ts index a56488b..e1e267d 100644 --- a/src/circuit-breaker.ts +++ b/src/circuit-breaker.ts @@ -3,14 +3,14 @@ * * States: * closed → normal operation, requests pass through - * open → too many failures, requests are blocked + * open → too many failures OR too many timeouts, requests are blocked * half-open → cooldown elapsed, one probe request allowed * * Transitions: - * closed →(N failures)→ open - * open →(cooldown)→ half-open - * half-open →(success)→ closed - * half-open →(failure)→ open + * closed →(N failures OR M timeouts)→ open + * open →(cooldown)→ half-open + * half-open →(success)→ closed + * half-open →(failure/timeout)→ open */ export type CircuitState = "closed" | "open" | "half-open" @@ -18,15 +18,21 @@ export type 
CircuitState = "closed" | "open" | "half-open" export interface CircuitBreaker { state: CircuitState failures: number + timeouts: number successes: number lastFailure: number | null lastSuccess: number | null openedAt: number | null + totalExecutions: number + totalFailures: number + totalTimeouts: number } export interface BreakerConfig { /** Consecutive failures before opening the circuit */ failureThreshold: number + /** Consecutive timeouts before opening (higher than failures: slow ≠ broken) */ + timeoutThreshold: number /** Ms to wait before transitioning from open → half-open */ cooldownMs: number /** Successes in half-open needed to close the circuit */ @@ -35,6 +41,7 @@ export interface BreakerConfig { export const DEFAULT_BREAKER_CONFIG: BreakerConfig = { failureThreshold: 3, + timeoutThreshold: 5, cooldownMs: 60_000, halfOpenSuccessThreshold: 1, } @@ -43,10 +50,14 @@ export function createBreaker(): CircuitBreaker { return { state: "closed", failures: 0, + timeouts: 0, successes: 0, lastFailure: null, lastSuccess: null, openedAt: null, + totalExecutions: 0, + totalFailures: 0, + totalTimeouts: 0, } } @@ -75,8 +86,10 @@ export function recordSuccess( config: BreakerConfig = DEFAULT_BREAKER_CONFIG, now: number = Date.now(), ): void { + breaker.totalExecutions++ breaker.lastSuccess = now breaker.failures = 0 + breaker.timeouts = 0 if (breaker.state === "half-open") { breaker.successes++ @@ -93,6 +106,8 @@ export function recordFailure( config: BreakerConfig = DEFAULT_BREAKER_CONFIG, now: number = Date.now(), ): void { + breaker.totalExecutions++ + breaker.totalFailures++ breaker.failures++ breaker.lastFailure = now @@ -107,3 +122,25 @@ export function recordFailure( breaker.openedAt = now } } + +export function recordTimeout( + breaker: CircuitBreaker, + config: BreakerConfig = DEFAULT_BREAKER_CONFIG, + now: number = Date.now(), +): void { + breaker.totalExecutions++ + breaker.totalTimeouts++ + breaker.timeouts++ + breaker.lastFailure = now + + if 
(breaker.state === "half-open") { + breaker.state = "open" + breaker.openedAt = now + return + } + + if (breaker.timeouts >= config.timeoutThreshold) { + breaker.state = "open" + breaker.openedAt = now + } +} diff --git a/src/cli-defs.ts b/src/cli-defs.ts index c1ac716..128e5dd 100644 --- a/src/cli-defs.ts +++ b/src/cli-defs.ts @@ -20,22 +20,22 @@ export const CLI_DEFS: Record = { claude: { name: "claude", description: "Anthropic Claude — strong reasoning, code analysis, complex logic", - strengths: ["reasoning", "code-analysis", "debugging", "architecture"], + strengths: ["reasoning", "code-analysis", "debugging", "architecture", "planning"], binary: "claude", buildArgs: (prompt, mode) => mode === "analyze" - ? ["-p", prompt, "--max-turns", "10"] + ? ["-p", prompt] : ["-p", prompt, "--allowedTools", ""], buildStdinArgs: (mode) => mode === "analyze" - ? ["-p", "-", "--max-turns", "10"] + ? ["-p", "-"] : ["-p", "-", "--allowedTools", ""], fallbackOrder: ["gemini", "codex"], }, gemini: { name: "gemini", description: "Google Gemini — research, trends, broad knowledge, large context", - strengths: ["research", "trends", "knowledge", "large-context"], + strengths: ["research", "trends", "knowledge", "large-context", "web-search"], binary: "gemini", buildArgs: (prompt, _mode) => ["-e", "none", "-p", prompt], buildStdinArgs: (_mode) => ["-e", "none"], @@ -44,7 +44,7 @@ export const CLI_DEFS: Record = { codex: { name: "codex", description: "OpenAI Codex — code generation, edits, refactoring", - strengths: ["code-generation", "edits", "refactoring"], + strengths: ["code-generation", "edits", "refactoring", "full-auto"], binary: "codex", buildArgs: (prompt, _mode) => ["exec", prompt, "--full-auto"], buildStdinArgs: (_mode) => ["exec", "-", "--full-auto"], @@ -53,3 +53,38 @@ export const CLI_DEFS: Record = { } export const ALL_CLI_NAMES: CliName[] = ["claude", "gemini", "codex"] + +/** + * Generate CLI-specific args that hint at timeout constraints. 
+ * Claude: --max-turns scales with available time (~1 turn per 30s). + * Gemini/Codex: no known timeout flags. + */ +export function buildTimeoutArgs( + provider: CliName, + remainingSeconds: number, +): string[] { + switch (provider) { + case "claude": { + const maxTurns = Math.max(2, Math.min(25, Math.floor(remainingSeconds / 30))) + return ["--max-turns", String(maxTurns)] + } + case "gemini": + return [] + case "codex": + return [] + } +} + +// ── Role-based routing ────────────────────────────────────────────────────── + +export const AGENT_ROLES = ["manager", "coordinator", "developer", "researcher", "reviewer", "architect"] as const +export type AgentRole = (typeof AGENT_ROLES)[number] + +export const ROLE_ROUTING: Record = { + manager: { primary: "gemini", fallbacks: ["claude", "codex"] }, + coordinator: { primary: "claude", fallbacks: ["gemini", "codex"] }, + developer: { primary: "codex", fallbacks: ["claude", "gemini"] }, + researcher: { primary: "gemini", fallbacks: ["claude", "codex"] }, + reviewer: { primary: "claude", fallbacks: ["gemini", "codex"] }, + architect: { primary: "claude", fallbacks: ["gemini", "codex"] }, +} diff --git a/src/detection.ts b/src/detection.ts index fff225b..0c453b1 100644 --- a/src/detection.ts +++ b/src/detection.ts @@ -1,11 +1,14 @@ /** * CLI Auto-Detection — probes the system for installed CLI binaries. + * Caches results for 5 minutes to avoid repeated filesystem lookups. 
*/ import { execa } from "execa" import { IS_WINDOWS } from "./platform" import { CLI_DEFS, ALL_CLI_NAMES, type CliName } from "./cli-defs" +const CACHE_TTL_MS = 5 * 60 * 1000 // 5 minutes + export interface CliAvailability { installed: boolean path: string | null @@ -13,25 +16,43 @@ export interface CliAvailability { checkedAt: number } +interface CacheEntry { + result: CliAvailability + timestamp: number +} + +const cache = new Map() + +function isCacheValid(entry: CacheEntry): boolean { + return Date.now() - entry.timestamp < CACHE_TTL_MS +} + export async function detectCli(name: CliName): Promise { + const cached = cache.get(name) + if (cached && isCacheValid(cached)) return cached.result + const def = CLI_DEFS[name] const whichBin = IS_WINDOWS ? "where" : "which" try { - const { stdout } = await execa(whichBin, [def.binary], { timeout: 5_000 }) - const path = stdout.trim().split("\n")[0] ?? null + const { stdout } = await execa(whichBin, [def.binary], { timeout: 5_000, windowsHide: true }) + const path = stdout.trim().split(/\r?\n/)[0] ?? null let version: string | null = null try { - const vResult = await execa(def.binary, ["--version"], { timeout: 5_000 }) - version = vResult.stdout.trim().split("\n")[0] ?? null + const vResult = await execa(def.binary, ["--version"], { timeout: 5_000, windowsHide: true }) + version = vResult.stdout.trim().split(/\r?\n/)[0] ?? 
null } catch { // version check is best-effort } - return { installed: true, path, version, checkedAt: Date.now() } + const result: CliAvailability = { installed: true, path, version, checkedAt: Date.now() } + cache.set(name, { result, timestamp: Date.now() }) + return result } catch { - return { installed: false, path: null, version: null, checkedAt: Date.now() } + const result: CliAvailability = { installed: false, path: null, version: null, checkedAt: Date.now() } + cache.set(name, { result, timestamp: Date.now() }) + return result } } @@ -47,3 +68,7 @@ export async function detectAllClis(): Promise> { } return results } + +export function getDetectionCache(): Map { + return new Map([...cache].map(([k, v]) => [k, v.result])) +} diff --git a/src/executor.ts b/src/executor.ts index b8f886b..7abcfa6 100644 --- a/src/executor.ts +++ b/src/executor.ts @@ -1,66 +1,114 @@ /** * Core Execution Engine — runs a CLI binary via execa with structured output. + * Returns structured results (never throws) to support global time budget. */ import { execa } from "execa" +import path from "node:path" +import os from "node:os" import type { CliDef } from "./cli-defs" +import { buildTimeoutArgs } from "./cli-defs" import { getSafeEnv } from "./safe-env" import { redactSecrets } from "./redact" +import { IS_WINDOWS } from "./platform" /** Prompts longer than this (chars) are delivered via stdin to avoid OS arg-length limits. */ export const STDIN_THRESHOLD = 30_000 +const MAX_BUFFER = 10 * 1024 * 1024 // 10MB export interface ExecResult { stdout: string stderr: string + exitCode: number durationMs: number + timedOut: boolean +} + +/** + * On Windows, CLIs installed via npm/scoop/cargo may be .cmd/.bat shims. + * Wrap with `cmd /c` so execa can execute them without a shell. 
+ */ +function resolveCommand(binary: string): { file: string; prefix: string[] } { + if (!IS_WINDOWS) return { file: binary, prefix: [] } + + const ext = path.extname(binary).toLowerCase() + if (ext === ".cmd" || ext === ".bat") { + return { file: "cmd", prefix: ["/c", binary] } + } + + const pathext = (process.env.PATHEXT || "").toLowerCase() + if (pathext.includes(".cmd") || pathext.includes(".bat")) { + return { file: "cmd", prefix: ["/c", binary] } + } + + return { file: binary, prefix: [] } +} + +/** Enhance PATH on Windows with common CLI install locations */ +function getEnhancedPath(): string | undefined { + if (!IS_WINDOWS) return undefined + + const home = os.homedir() + const extraPaths = [ + path.join(home, "AppData", "Roaming", "npm"), + path.join(home, "scoop", "shims"), + path.join(home, ".cargo", "bin"), + path.join(home, "AppData", "Local", "pnpm"), + ] + + const currentPath = process.env.PATH || "" + return [...extraPaths, currentPath].join(path.delimiter) } export async function executeCliOnce( def: CliDef, prompt: string, mode: string, - timeoutMs: number, + timeoutSeconds: number, signal?: AbortSignal, ): Promise { const useStdin = def.buildStdinArgs != null && prompt.length > STDIN_THRESHOLD - const args = useStdin ? def.buildStdinArgs!(mode) : def.buildArgs(prompt, mode) - const start = Date.now() - - const result = await execa(def.binary, args, { - timeout: timeoutMs, - maxBuffer: 10 * 1024 * 1024, - reject: false, - windowsHide: true, - env: getSafeEnv(), - ...(useStdin ? { input: prompt } : {}), - ...(signal ? { cancelSignal: signal } : {}), - }) + const baseArgs = useStdin ? 
def.buildStdinArgs!(mode) : def.buildArgs(prompt, mode) + const timeoutHints = buildTimeoutArgs(def.name, timeoutSeconds) + const args = [...baseArgs, ...timeoutHints] - const durationMs = Date.now() - start + const { file, prefix } = resolveCommand(def.binary) + const finalArgs = [...prefix, ...args] - if (result.isCanceled) { - throw Object.assign(new Error(`CLI '${def.name}' was canceled`), { canceled: true }) + const env = getSafeEnv() + const enhancedPath = getEnhancedPath() + if (enhancedPath) { + env.PATH = enhancedPath } - if (result.timedOut) { - throw Object.assign(new Error(`CLI '${def.name}' timed out after ${timeoutMs}ms`), { - timedOut: true, - }) - } + const start = Date.now() - if (result.failed && result.exitCode !== 0) { - const rawMsg = result.stderr?.trim() || result.message || `Exit code ${result.exitCode}` - const msg = redactSecrets(rawMsg) - throw Object.assign(new Error(`CLI '${def.name}' failed: ${msg}`), { - exitCode: result.exitCode, + try { + const result = await execa(file, finalArgs, { + timeout: timeoutSeconds * 1000, + maxBuffer: MAX_BUFFER, + reject: false, + windowsHide: true, + env, + ...(useStdin ? { input: prompt } : {}), + ...(signal ? { cancelSignal: signal } : {}), }) - } - return { - stdout: result.stdout ?? "", - stderr: result.stderr ?? "", - durationMs, + return { + stdout: result.stdout || "", + stderr: redactSecrets(result.stderr || ""), + exitCode: result.exitCode ?? 1, + durationMs: Date.now() - start, + timedOut: result.timedOut ?? 
false, + } + } catch (error: any) { + return { + stdout: "", + stderr: redactSecrets(error.message || "Execution failed"), + exitCode: 1, + durationMs: Date.now() - start, + timedOut: !!error.timedOut, + } } } diff --git a/src/index.ts b/src/index.ts index 266c4a7..fb24e5f 100644 --- a/src/index.ts +++ b/src/index.ts @@ -7,6 +7,8 @@ * Tools exposed: * cli_exec — Execute a CLI with full resilience pipeline * cli_status — Health check and observability dashboard + * cli_list — List installed CLI providers + * cli_route — Recommend best CLI by agent role * * Hook: * experimental.chat.system.transform — injects CLI availability into agent prompts @@ -18,7 +20,7 @@ import { tool } from "@opencode-ai/plugin" import { z } from "zod" import { PLATFORM } from "./platform" -import { CLI_DEFS, ALL_CLI_NAMES, type CliName } from "./cli-defs" +import { CLI_DEFS, ALL_CLI_NAMES, AGENT_ROLES, ROLE_ROUTING, type CliName } from "./cli-defs" import { createBreaker, DEFAULT_BREAKER_CONFIG } from "./circuit-breaker" import { DEFAULT_RETRY_CONFIG } from "./retry" import { detectAllClis, type CliAvailability } from "./detection" @@ -89,12 +91,13 @@ export default ((ctx) => { ## External CLI Tools (${PLATFORM}) Use \`cli_exec\` to call external LLMs — it handles OS differences, timeout, retry, and automatic fallback. Use \`cli_status\` to check health and availability of all CLI providers. +Use \`cli_list\` to see installed providers. Use \`cli_route\` for role-based CLI recommendations. | CLI | Description | Strengths | Health | |-----|-------------|-----------|--------| ${rows} ${unavailableNote} -Features: auto-retry (${DEFAULT_RETRY_CONFIG.maxRetries}x with backoff), circuit breaker per CLI, fallback to next available provider. +Features: auto-retry (${DEFAULT_RETRY_CONFIG.maxRetries}x with backoff), circuit breaker per CLI, fallback to next available provider, global time budget. Rules: One concern per call. Split large requests. Include "CLI Consultations" in output. 
` } @@ -107,7 +110,8 @@ Rules: One concern per call. Split large requests. Include "CLI Consultations" i name: "cli_exec", description: "Execute an external CLI (claude, gemini, codex) with automatic OS detection, timeout, " + - "retry with exponential backoff, circuit breaker protection, and fallback to alternative providers.", + "retry with exponential backoff, circuit breaker protection, and fallback to alternative providers. " + + "Uses a global time budget shared across all retries and fallbacks.", parameters: z.object({ cli: z.enum(["claude", "gemini", "codex"]).describe("Primary CLI to invoke"), prompt: z.string().min(1).max(100_000).describe("The prompt to send to the CLI"), @@ -121,7 +125,7 @@ Rules: One concern per call. Split large requests. Include "CLI Consultations" i .min(10) .max(1800) .default(720) - .describe("Max seconds before killing the process"), + .describe("Global timeout budget in seconds (covers all retries and fallbacks)"), allow_fallback: z .boolean() .default(true) @@ -174,7 +178,9 @@ Rules: One concern per call. Split large requests. Include "CLI Consultations" i circuit_breaker: { state: breaker.state, consecutive_failures: breaker.failures, + consecutive_timeouts: breaker.timeouts, failure_threshold: DEFAULT_BREAKER_CONFIG.failureThreshold, + timeout_threshold: DEFAULT_BREAKER_CONFIG.timeoutThreshold, cooldown_seconds: DEFAULT_BREAKER_CONFIG.cooldownMs / 1000, opened_at: breaker.openedAt ? new Date(breaker.openedAt).toISOString() : null, last_failure: breaker.lastFailure @@ -183,6 +189,9 @@ Rules: One concern per call. Split large requests. Include "CLI Consultations" i last_success: breaker.lastSuccess ? new Date(breaker.lastSuccess).toISOString() : null, + total_executions: breaker.totalExecutions, + total_failures: breaker.totalFailures, + total_timeouts: breaker.totalTimeouts, }, usage: { total_calls: stats.calls, @@ -207,12 +216,75 @@ Rules: One concern per call. Split large requests. 
Include "CLI Consultations" i }, breaker_config: { failure_threshold: DEFAULT_BREAKER_CONFIG.failureThreshold, + timeout_threshold: DEFAULT_BREAKER_CONFIG.timeoutThreshold, cooldown_seconds: DEFAULT_BREAKER_CONFIG.cooldownMs / 1000, }, providers, } }, }), + + tool({ + name: "cli_list", + description: "List installed CLI providers with their paths, versions, and strengths.", + parameters: z.object({}), + execute: async () => { + await detectionPromise + + const installed: { provider: CliName; path: string | null; version: string | null; strengths: string[] }[] = [] + for (const name of ALL_CLI_NAMES) { + const avail = cliAvailability.get(name) + if (avail?.installed) { + installed.push({ + provider: name, + path: avail.path, + version: avail.version, + strengths: CLI_DEFS[name].strengths, + }) + } + } + + return { + installed_count: installed.length, + providers: installed, + } + }, + }), + + tool({ + name: "cli_route", + description: + "Suggest the best CLI for a task based on agent role. " + + "Returns recommended provider with reasoning and fallback chain.", + parameters: z.object({ + role: z.enum(AGENT_ROLES).describe("Agent role (manager, coordinator, developer, researcher, reviewer, architect)"), + task_description: z.string().optional().describe("Brief task description for context"), + }), + execute: async ({ role, task_description }) => { + await detectionPromise + + const routing = ROLE_ROUTING[role] + const chain = [routing.primary, ...routing.fallbacks] as CliName[] + + const availability: Record = {} + for (const provider of chain) { + const det = cliAvailability.get(provider) + const breaker = breakers.get(provider)! + availability[provider] = (det?.installed ?? false) && breaker.state !== "open" + } + + const recommended = chain.find((p) => availability[p]) || routing.primary + + return { + role, + task_description: task_description ?? 
null, + recommended_cli: recommended, + reasoning: `Role "${role}" maps to ${routing.primary} (${CLI_DEFS[routing.primary].strengths.join(", ")})${recommended !== routing.primary ? `. Falling back to ${recommended} because ${routing.primary} is unavailable.` : "."}`, + fallback_chain: chain, + availability, + } + }, + }), ], hooks: { diff --git a/src/resilience.ts b/src/resilience.ts index 4a39e85..76027ca 100644 --- a/src/resilience.ts +++ b/src/resilience.ts @@ -1,6 +1,9 @@ /** * Resilience Engine — orchestrates retry + circuit breaker + fallback - * into a single execution pipeline. + * into a single execution pipeline with a global time budget. + * + * The global budget is shared across ALL retries and fallback attempts, + * preventing timeout multiplication (3 providers × 3 attempts × timeout). */ import type { CliName } from "./cli-defs" @@ -11,6 +14,7 @@ import { isBreakerAvailable, recordSuccess, recordFailure, + recordTimeout, } from "./circuit-breaker" import type { RetryConfig } from "./retry" import { DEFAULT_RETRY_CONFIG, calculateDelay, sleep } from "./retry" @@ -21,7 +25,7 @@ import type { CircuitState } from "./circuit-breaker" import { classifyError, type ErrorClass } from "./error-classifier" import { redactSecrets } from "./redact" -// ─── Structured Response (MCP pattern) ───────────────────────────────────── +// ─── Structured Response ────────────────────────────────────────────────── export interface CliResponse { success: boolean @@ -40,7 +44,7 @@ export interface CliResponse { max_attempts: number } -// ─── Usage Stats ─────────────────────────────────────────────────────────── +// ─── Usage Stats ────────────────────────────────────────────────────────── export interface UsageStats { calls: number @@ -48,7 +52,7 @@ export interface UsageStats { totalMs: number } -// ─── Engine ──────────────────────────────────────────────────────────────── +// ─── Engine ─────────────────────────────────────────────────────────────── export interface 
ResilienceContext { breakers: Map @@ -59,6 +63,17 @@ export interface ResilienceContext { breakerConfig: BreakerConfig } +/** Merge caller signal with budget signal (compatible with all runtimes) */ +function mergeAbortSignals(a?: AbortSignal, b?: AbortSignal): AbortSignal | undefined { + if (!a) return b + if (!b) return a + const controller = new AbortController() + const onAbort = () => controller.abort() + a.addEventListener("abort", onAbort, { once: true }) + b.addEventListener("abort", onAbort, { once: true }) + return controller.signal +} + export async function executeWithResilience( ctx: ResilienceContext, targetCli: CliName, @@ -68,121 +83,167 @@ export async function executeWithResilience( allowFallback: boolean, signal?: AbortSignal, ): Promise { - const timeoutMs = timeoutSeconds * 1000 const def = CLI_DEFS[targetCli] const fallbackChain: string[] = [targetCli] + const errors: string[] = [] // Build execution order: target first, then fallbacks const executionOrder: CliName[] = [targetCli] if (allowFallback) { - const available = ALL_CLI_NAMES.filter((n) => { - const avail = ctx.availability.get(n) - return avail?.installed !== false - }) for (const fb of def.fallbackOrder) { - if (available.includes(fb)) executionOrder.push(fb) + const avail = ctx.availability.get(fb) + if (avail?.installed !== false) executionOrder.push(fb) } } - for (const cliName of executionOrder) { - const currentDef = CLI_DEFS[cliName] - const breaker = ctx.breakers.get(cliName)! - const stats = ctx.usageStats.get(cliName)! 
+ // Global time budget: entire chain (retries + fallbacks) must fit within timeoutSeconds + const globalDeadline = Date.now() + timeoutSeconds * 1000 + const budgetController = new AbortController() + const budgetTimeout = setTimeout(() => budgetController.abort(), timeoutSeconds * 1000) + const mergedSignal = mergeAbortSignals(signal, budgetController.signal) + + try { + for (const cliName of executionOrder) { + const remaining = globalDeadline - Date.now() + if (remaining <= 0) { + errors.push(`${cliName}: global budget exhausted`) + break + } - // Check circuit breaker - if (!isBreakerAvailable(breaker, ctx.breakerConfig)) { - if (cliName !== targetCli) fallbackChain.push(`${cliName}(circuit-open)`) - continue - } + const currentDef = CLI_DEFS[cliName] + const breaker = ctx.breakers.get(cliName)! + const stats = ctx.usageStats.get(cliName)! - // Check availability - const avail = ctx.availability.get(cliName) - if (avail?.installed === false) { - if (cliName !== targetCli) fallbackChain.push(`${cliName}(not-installed)`) - continue - } + // Check circuit breaker + if (!isBreakerAvailable(breaker, ctx.breakerConfig)) { + if (cliName !== targetCli) fallbackChain.push(`${cliName}(circuit-open)`) + errors.push(`${cliName}: circuit breaker open`) + continue + } - if (cliName !== targetCli) fallbackChain.push(cliName) + // Check availability + const avail = ctx.availability.get(cliName) + if (avail?.installed === false) { + if (cliName !== targetCli) fallbackChain.push(`${cliName}(not-installed)`) + errors.push(`${cliName}: not installed`) + continue + } - // Retry loop - for (let attempt = 0; attempt <= ctx.retryConfig.maxRetries; attempt++) { - if (signal?.aborted) break + if (cliName !== targetCli) fallbackChain.push(cliName) - if (attempt > 0) { - const delay = calculateDelay(attempt - 1, ctx.retryConfig) - await sleep(delay) - } + // Retry loop + for (let attempt = 0; attempt <= ctx.retryConfig.maxRetries; attempt++) { + if (mergedSignal?.aborted) { + 
errors.push(`${cliName}: aborted`) + break + } + + const remainingSeconds = Math.max(1, Math.floor((globalDeadline - Date.now()) / 1000)) + if (remainingSeconds <= 1) { + errors.push(`${cliName}: global budget exhausted`) + break + } + + if (attempt > 0) { + const delay = calculateDelay(attempt - 1, ctx.retryConfig) + try { + await sleep(delay, mergedSignal) + } catch { + errors.push(`${cliName}: aborted during retry backoff`) + break + } + } - try { stats.calls++ - const result = await executeCliOnce(currentDef, prompt, mode, timeoutMs, signal) - - recordSuccess(breaker, ctx.breakerConfig) - stats.totalMs += result.durationMs - - return { - success: true, - cli: cliName, - platform: ctx.platform, - stdout: result.stdout, - stderr: result.stderr, - duration_ms: result.durationMs, - timed_out: false, - used_fallback: cliName !== targetCli, - fallback_chain: fallbackChain, - error: null, - error_class: null, - circuit_state: breaker.state, - attempt: attempt + 1, - max_attempts: ctx.retryConfig.maxRetries + 1, + const result = await executeCliOnce(currentDef, prompt, mode, remainingSeconds, mergedSignal) + + if (result.exitCode === 0 && result.stdout) { + recordSuccess(breaker, ctx.breakerConfig) + stats.totalMs += result.durationMs + + return { + success: true, + cli: cliName, + platform: ctx.platform, + stdout: result.stdout, + stderr: result.stderr, + duration_ms: result.durationMs, + timed_out: false, + used_fallback: cliName !== targetCli, + fallback_chain: fallbackChain, + error: null, + error_class: null, + circuit_state: breaker.state, + attempt: attempt + 1, + max_attempts: ctx.retryConfig.maxRetries + 1, + } } - } catch (err: unknown) { - stats.failures++ - const errorClass = classifyError(err) + // Process timeout: skip retries, move to next provider immediately + if (result.timedOut) { + const err = redactSecrets(`${cliName}: process timeout (${result.durationMs}ms) — skipping retries`) + errors.push(err) + stats.failures++ + recordTimeout(breaker, 
ctx.breakerConfig) + break + } - // permanent and crash errors: skip retries, fallback immediately + // Classify error for retry decision + const errorClass = classifyError({ message: result.stderr, exitCode: result.exitCode }) + + // Permanent and crash errors: skip retries, fallback immediately if (errorClass === "permanent" || errorClass === "crash") { + const err = redactSecrets(`${cliName}: ${result.stderr || "non-retryable failure"}`) + errors.push(err) + stats.failures++ recordFailure(breaker, ctx.breakerConfig) - break // try next CLI in fallback chain + break } - // rate_limit: wait longer before retrying - if (errorClass === "rate_limit") { + // Rate limit: wait longer before retrying + if (errorClass === "rate_limit" && attempt < ctx.retryConfig.maxRetries) { const rateLimitDelay = calculateDelay(attempt + 1, { ...ctx.retryConfig, baseDelayMs: ctx.retryConfig.baseDelayMs * 3, }) - await sleep(rateLimitDelay) + try { + await sleep(rateLimitDelay, mergedSignal) + } catch { + errors.push(`${cliName}: aborted during rate-limit backoff`) + break + } } const isLastAttempt = attempt === ctx.retryConfig.maxRetries if (isLastAttempt) { + const err = redactSecrets(`${cliName}: exhausted retries — ${result.stderr}`) + errors.push(err) + stats.failures++ recordFailure(breaker, ctx.breakerConfig) - break // try next CLI in fallback chain } - // transient or rate_limit — loop continues with retry } + + if (mergedSignal?.aborted) break } - } - // All CLIs exhausted - return { - success: false, - cli: targetCli, - platform: ctx.platform, - stdout: "", - stderr: "", - duration_ms: 0, - timed_out: false, - used_fallback: fallbackChain.length > 1, - fallback_chain: fallbackChain, - error: redactSecrets( - `All CLI providers exhausted. Tried: ${fallbackChain.join(" → ")}. 
Check cli_status for details.`, - ), - error_class: "transient", - circuit_state: ctx.breakers.get(targetCli)!.state, - attempt: ctx.retryConfig.maxRetries + 1, - max_attempts: ctx.retryConfig.maxRetries + 1, + // All CLIs exhausted + return { + success: false, + cli: targetCli, + platform: ctx.platform, + stdout: "", + stderr: "", + duration_ms: 0, + timed_out: false, + used_fallback: fallbackChain.length > 1, + fallback_chain: fallbackChain, + error: redactSecrets(`All providers failed: ${errors.join("; ")}`), + error_class: "transient", + circuit_state: ctx.breakers.get(targetCli)!.state, + attempt: ctx.retryConfig.maxRetries + 1, + max_attempts: ctx.retryConfig.maxRetries + 1, + } + } finally { + clearTimeout(budgetTimeout) } } diff --git a/src/retry.ts b/src/retry.ts index 0a237b2..8e4e982 100644 --- a/src/retry.ts +++ b/src/retry.ts @@ -31,8 +31,23 @@ export function calculateDelay( return Math.max(0, Math.round(capped + jitter)) } -export function sleep(ms: number): Promise { - return new Promise((resolve) => setTimeout(resolve, ms)) +/** Abort-aware sleep — rejects immediately if signal fires during the wait. */ +export function sleep(ms: number, signal?: AbortSignal): Promise { + return new Promise((resolve, reject) => { + if (signal?.aborted) { + reject(new Error("aborted")) + return + } + const timer = setTimeout(resolve, ms) + signal?.addEventListener( + "abort", + () => { + clearTimeout(timer) + reject(new Error("aborted")) + }, + { once: true }, + ) + }) } export function isRetryableError(error: unknown): boolean { diff --git a/src/safe-env.ts b/src/safe-env.ts index cd3540a..423245f 100644 --- a/src/safe-env.ts +++ b/src/safe-env.ts @@ -1,9 +1,13 @@ /** * Environment Variable Filtering — only passes safe variables to * spawned CLI processes, preventing accidental secret leakage. + * + * CLIs handle their own auth inline (claude login, gcloud auth, etc.) + * so we just need system essentials + proxy settings. 
*/ export const SAFE_ENV_VARS = [ + // System essentials "PATH", "HOME", "USER", @@ -11,14 +15,24 @@ export const SAFE_ENV_VARS = [ "SHELL", "LANG", "LC_ALL", - "ANTHROPIC_API_KEY", - "GOOGLE_API_KEY", - "OPENAI_API_KEY", - "GEMINI_API_KEY", - "CODEX_API_KEY", + // Windows + "USERPROFILE", + "SYSTEMROOT", + "SYSTEMDRIVE", + "COMSPEC", + "PATHEXT", + "TEMP", + "TMP", + "APPDATA", + "LOCALAPPDATA", + "PROGRAMFILES", + // Proxy "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY", + "http_proxy", + "https_proxy", + "no_proxy", ] export function getSafeEnv(): Record { diff --git a/tests/circuit-breaker.test.ts b/tests/circuit-breaker.test.ts index 00ce857..4e99407 100644 --- a/tests/circuit-breaker.test.ts +++ b/tests/circuit-breaker.test.ts @@ -4,12 +4,14 @@ import { isBreakerAvailable, recordSuccess, recordFailure, + recordTimeout, DEFAULT_BREAKER_CONFIG, type BreakerConfig, } from "../src/circuit-breaker" const config: BreakerConfig = { failureThreshold: 3, + timeoutThreshold: 5, cooldownMs: 5_000, halfOpenSuccessThreshold: 1, } @@ -19,6 +21,8 @@ describe("Circuit Breaker", () => { const b = createBreaker() expect(b.state).toBe("closed") expect(b.failures).toBe(0) + expect(b.timeouts).toBe(0) + expect(b.totalExecutions).toBe(0) }) it("remains closed after fewer failures than threshold", () => { @@ -92,4 +96,53 @@ describe("Circuit Breaker", () => { const b = createBreaker() expect(isBreakerAvailable(b, config)).toBe(true) }) + + it("tracks total executions across success and failure", () => { + const b = createBreaker() + recordSuccess(b, config) + recordFailure(b, config) + recordSuccess(b, config) + expect(b.totalExecutions).toBe(3) + expect(b.totalFailures).toBe(1) + }) +}) + +describe("Circuit Breaker — Timeout Tracking", () => { + it("remains closed after fewer timeouts than threshold", () => { + const b = createBreaker() + for (let i = 0; i < 4; i++) recordTimeout(b, config) + expect(b.state).toBe("closed") + expect(b.timeouts).toBe(4) + }) + + it("opens after reaching 
timeout threshold (5)", () => { + const b = createBreaker() + for (let i = 0; i < 5; i++) recordTimeout(b, config) + expect(b.state).toBe("open") + expect(b.totalTimeouts).toBe(5) + }) + + it("resets timeout counter on success", () => { + const b = createBreaker() + recordTimeout(b, config) + recordTimeout(b, config) + recordSuccess(b, config) + expect(b.timeouts).toBe(0) + }) + + it("re-opens immediately on timeout in half-open state", () => { + const b = createBreaker() + const now = 1000 + for (let i = 0; i < 5; i++) recordTimeout(b, config, now) + isBreakerAvailable(b, config, now + 5_000) // trigger half-open + + recordTimeout(b, config, now + 5_001) + expect(b.state).toBe("open") + }) + + it("timeout threshold is higher than failure threshold (slow ≠ broken)", () => { + expect(DEFAULT_BREAKER_CONFIG.timeoutThreshold).toBeGreaterThan( + DEFAULT_BREAKER_CONFIG.failureThreshold, + ) + }) }) diff --git a/tests/cli-defs.test.ts b/tests/cli-defs.test.ts index 9114a9f..4894726 100644 --- a/tests/cli-defs.test.ts +++ b/tests/cli-defs.test.ts @@ -1,5 +1,5 @@ import { describe, it, expect } from "bun:test" -import { CLI_DEFS, ALL_CLI_NAMES, type CliName } from "../src/cli-defs" +import { CLI_DEFS, ALL_CLI_NAMES, AGENT_ROLES, ROLE_ROUTING, buildTimeoutArgs, type CliName } from "../src/cli-defs" describe("CLI Definitions", () => { it("defines all three CLIs", () => { @@ -53,7 +53,6 @@ describe("CLI Definitions", () => { const args = CLI_DEFS.claude.buildStdinArgs!("analyze") expect(args).toContain("-p") expect(args).toContain("-") - expect(args).toContain("--max-turns") }) it("gemini stdin mode does not include prompt", () => { @@ -79,10 +78,10 @@ describe("CLI Definitions", () => { expect(args).toContain("test prompt") }) - it("claude analyze mode includes --max-turns", () => { + it("claude analyze mode uses -p flag", () => { const args = CLI_DEFS.claude.buildArgs("test prompt", "analyze") - expect(args).toContain("--max-turns") - expect(args).toContain("10") + 
expect(args).toContain("-p") + expect(args).toContain("test prompt") }) it("gemini builds correct args", () => { @@ -99,3 +98,66 @@ describe("CLI Definitions", () => { }) }) }) + +describe("buildTimeoutArgs", () => { + it("claude gets --max-turns based on remaining seconds", () => { + const args = buildTimeoutArgs("claude", 300) + expect(args).toContain("--max-turns") + expect(args).toContain("10") // 300/30 = 10 + }) + + it("claude max-turns clamps at minimum 2", () => { + const args = buildTimeoutArgs("claude", 15) + expect(args).toContain("2") + }) + + it("claude max-turns clamps at maximum 25", () => { + const args = buildTimeoutArgs("claude", 1800) + expect(args).toContain("25") + }) + + it("gemini returns empty array", () => { + expect(buildTimeoutArgs("gemini", 300)).toEqual([]) + }) + + it("codex returns empty array", () => { + expect(buildTimeoutArgs("codex", 300)).toEqual([]) + }) +}) + +describe("Role Routing", () => { + it("defines all 6 agent roles", () => { + expect(AGENT_ROLES).toHaveLength(6) + expect(AGENT_ROLES).toContain("manager") + expect(AGENT_ROLES).toContain("developer") + expect(AGENT_ROLES).toContain("architect") + }) + + it("each role maps to a valid primary provider", () => { + for (const role of AGENT_ROLES) { + expect(ALL_CLI_NAMES).toContain(ROLE_ROUTING[role].primary) + } + }) + + it("each role has valid fallbacks", () => { + for (const role of AGENT_ROLES) { + const routing = ROLE_ROUTING[role] + for (const fb of routing.fallbacks) { + expect(ALL_CLI_NAMES).toContain(fb) + } + expect(routing.fallbacks).not.toContain(routing.primary) + } + }) + + it("developer routes to codex", () => { + expect(ROLE_ROUTING.developer.primary).toBe("codex") + }) + + it("researcher routes to gemini", () => { + expect(ROLE_ROUTING.researcher.primary).toBe("gemini") + }) + + it("architect routes to claude", () => { + expect(ROLE_ROUTING.architect.primary).toBe("claude") + }) +}) diff --git a/tests/safe-env.test.ts b/tests/safe-env.test.ts index 
a55ee35..375e9a8 100644 --- a/tests/safe-env.test.ts +++ b/tests/safe-env.test.ts @@ -6,16 +6,26 @@ describe("SAFE_ENV_VARS", () => { expect(SAFE_ENV_VARS).toContain("PATH") }) - it("includes API key vars", () => { - expect(SAFE_ENV_VARS).toContain("ANTHROPIC_API_KEY") - expect(SAFE_ENV_VARS).toContain("GOOGLE_API_KEY") - expect(SAFE_ENV_VARS).toContain("OPENAI_API_KEY") + it("does NOT include API key vars (CLIs handle their own auth)", () => { + expect(SAFE_ENV_VARS).not.toContain("ANTHROPIC_API_KEY") + expect(SAFE_ENV_VARS).not.toContain("GOOGLE_API_KEY") + expect(SAFE_ENV_VARS).not.toContain("OPENAI_API_KEY") }) - it("includes proxy vars", () => { + it("includes Windows system vars", () => { + expect(SAFE_ENV_VARS).toContain("USERPROFILE") + expect(SAFE_ENV_VARS).toContain("SYSTEMROOT") + expect(SAFE_ENV_VARS).toContain("APPDATA") + expect(SAFE_ENV_VARS).toContain("PATHEXT") + }) + + it("includes proxy vars with both casings", () => { expect(SAFE_ENV_VARS).toContain("HTTP_PROXY") expect(SAFE_ENV_VARS).toContain("HTTPS_PROXY") expect(SAFE_ENV_VARS).toContain("NO_PROXY") + expect(SAFE_ENV_VARS).toContain("http_proxy") + expect(SAFE_ENV_VARS).toContain("https_proxy") + expect(SAFE_ENV_VARS).toContain("no_proxy") }) })
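
A note on the `mergeAbortSignals` helper added in `src/resilience.ts`: as written in the diff, it only attaches `abort` listeners, and `abort` events do not fire retroactively. If the caller's signal is already aborted when the merge happens, the merged signal would stay unaborted until the budget timer fires. The sketch below is one possible variant that checks the current state first. The name `mergeSignals` and the listener cleanup are illustrative assumptions, not code from this change.

```typescript
// Hypothetical variant of mergeAbortSignals; a sketch, not the code in the diff.
function mergeSignals(a?: AbortSignal, b?: AbortSignal): AbortSignal | undefined {
  if (!a) return b
  if (!b) return a

  const controller = new AbortController()

  // "abort" events are not replayed for late listeners, so inspect the
  // current state before subscribing.
  if (a.aborted || b.aborted) {
    controller.abort()
    return controller.signal
  }

  const onAbort = () => {
    // Detach from both inputs so a long-lived caller signal does not keep
    // this closure alive after the merged signal has fired.
    a.removeEventListener("abort", onAbort)
    b.removeEventListener("abort", onAbort)
    controller.abort()
  }
  a.addEventListener("abort", onAbort, { once: true })
  b.addEventListener("abort", onAbort, { once: true })
  return controller.signal
}

// Usage: the caller aborted before the budget controller was created.
const caller = new AbortController()
caller.abort()
const budget = new AbortController()
const merged = mergeSignals(caller.signal, budget.signal)
console.log(merged?.aborted) // true
```

Runtimes that provide `AbortSignal.any` could use it instead. The manual version is kept here because the helper in the diff is described as compatible with all runtimes.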