feat: Add external LLM agents (Codex, Gemini) for brainstorming and review

# External LLM Agents for Brainstorming and Review

## Overview

Add the ability to call Codex CLI and Gemini CLI in non-interactive mode from the Compounding Engineering (CE) workflow for brainstorming, planning, and code review. These agents provide alternative AI perspectives without modifying any code.

## Problem Statement

Currently, the CE workflow relies solely on Claude-based agents for planning and code review. Adding perspectives from GPT-5.1 (via Codex CLI) and Gemini 3 Pro (via Gemini CLI) would provide:

1. **Diverse AI perspectives** on architectural decisions and code quality
2. **Second opinions** for critical reviews
3. **Brainstorming variety** for feature planning
4. **Validation** of Claude's analysis through independent models

## Technical Approach

### Architecture

Both Codex and Gemini CLIs are already installed locally:
- **Codex CLI**: v0.63.0 at `/home/ron/.nvm/versions/node/v22.14.0/bin/codex`
- **Gemini CLI**: v0.18.4 at `/home/ron/.nvm/versions/node/v22.14.0/bin/gemini`

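The integration runs both CLIs without write access to prevent any code modifications: Codex uses **read-only sandbox mode** (`--sandbox read-only`) exclusively, and Gemini runs in its default approval mode, never with `--yolo`.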
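
Since every agent depends on both CLIs being present, a preflight check can fail early with a clear message. The following is a minimal sketch, not part of the plan itself; the function name `require_clis` is hypothetical.

```bash
# Hypothetical helper: succeed only if every named CLI is on PATH.
require_clis() {
  local missing=0 cli
  for cli in "$@"; do
    if ! command -v "$cli" >/dev/null 2>&1; then
      # Report each missing CLI on stderr and keep checking the rest.
      echo "missing CLI: $cli" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

An agent could call `require_clis codex gemini` before any invocation and skip the external step when it returns non-zero.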

### Implementation Phases

#### Phase 1: Core LLM Wrapper Agents

Create four new agents in `agents/research/`:

##### 1. `codex-brainstorm.md` - GPT-5.1 for Brainstorming

```markdown
---
name: codex-brainstorm
description: Use this agent to get GPT-5.1's perspective on planning, brainstorming, and architectural decisions. Read-only - no code modifications.
model: haiku
color: green
---

You are a coordinator that invokes the Codex CLI to get GPT-5.1's perspective on brainstorming and planning tasks.

**CRITICAL: This agent is READ-ONLY. It must NEVER modify files.**

## Invocation Pattern

Execute the Codex CLI in non-interactive mode with GPT-5.1 and high reasoning:

\`\`\`bash
codex exec \
  -m gpt-5.1 \
  --sandbox read-only \
  -c model_reasoning_effort=high \
  -o /tmp/codex-brainstorm-output.md \
  "{prompt_with_context}"
\`\`\`

## Key Flags

- `-m gpt-5.1`: Uses the GPT-5.1 model (chosen here for brainstorming)
- `--sandbox read-only`: REQUIRED - prevents any file modifications
- `-c model_reasoning_effort=high`: Enables deep reasoning for quality analysis
- `-o <file>`: Captures the final response for parsing

## Use Cases

1. **Feature brainstorming**: "Propose 5 approaches to implement X"
2. **Architecture review**: "Analyze this architecture for potential issues"
3. **Trade-off analysis**: "Compare approach A vs B for this use case"
4. **Planning validation**: "Review this plan and identify gaps"

## Output Format

Parse the output file and present GPT-5.1's perspective with clear attribution:

---
### GPT-5.1 Perspective (via Codex)

{parsed_response}

---
```
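
The fixed `/tmp/codex-brainstorm-output.md` path could be overwritten if two brainstorm runs overlap. A possible variant, sketched below under the assumption that the `codex exec` flags behave as shown in the invocation pattern, writes to a unique temporary file instead. The wrapper name `codex_brainstorm` is hypothetical.

```bash
# Hypothetical wrapper: run a read-only Codex brainstorm and print the final response.
# Assumes the `codex exec` flags from the invocation pattern in this plan.
codex_brainstorm() {
  local prompt="$1" out status=0
  # Unique output file per run, so concurrent runs do not overwrite each other.
  out=$(mktemp "${TMPDIR:-/tmp}/codex-brainstorm.XXXXXX") || return 1
  # Discard anything printed to stdout; the final message is read from the -o file.
  codex exec \
    -m gpt-5.1 \
    --sandbox read-only \
    -c model_reasoning_effort=high \
    -o "$out" \
    "$prompt" >/dev/null || status=$?
  if [ "$status" -eq 0 ]; then
    cat "$out"
  fi
  rm -f "$out"
  return "$status"
}
```

The coordinator would then call `codex_brainstorm "Propose 5 approaches to implement X"` and wrap the printed text in the attribution block.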

##### 2. `gemini-brainstorm.md` - Gemini 3 Pro for Brainstorming

```markdown
---
name: gemini-brainstorm
description: Use this agent to get Gemini 3 Pro's perspective on planning, brainstorming, and architectural decisions. Read-only - no code modifications. Requires Google AI Ultra.
model: haiku
color: blue
---

You are a coordinator that invokes the Gemini CLI to get Gemini 3 Pro's perspective on brainstorming and planning tasks.

**CRITICAL: This agent is READ-ONLY. It must NEVER modify files.**

## Invocation Pattern

Execute the Gemini CLI in non-interactive mode with the highest-tier model:

\`\`\`bash
gemini -m gemini-3-pro-preview \
  -p "{prompt_with_context}" \
  --output-format json | jq -r '.response'
\`\`\`

## Key Flags

- `-m gemini-3-pro-preview`: Specifies Gemini 3 Pro (requires Ultra subscription)
- `-p "{prompt}"`: Non-interactive prompt mode
- `--output-format json`: Structured output for parsing
- NO `--yolo` or `--approval-mode=yolo`: Never auto-approve actions

**Note:** Requires Google AI Ultra subscription for Gemini 3 Pro access.

## Use Cases

1. **Feature brainstorming**: "Propose 5 approaches to implement X"
2. **Architecture review**: "Analyze this architecture for potential issues"
3. **Trade-off analysis**: "Compare approach A vs B for this use case"
4. **Planning validation**: "Review this plan and identify gaps"

## Output Format

Parse the JSON response and present Gemini's perspective with clear attribution:

---
### Gemini 3 Pro Perspective

{parsed_response}

---
```
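
Parsing the JSON response only works when the parser receives nothing but JSON, so any warnings the CLI prints on stderr should stay out of the pipe. The sketch below shows one way to extract the `response` field with `python3` where `jq` is not installed; the helper name `extract_response` is hypothetical.

```bash
# Hypothetical helper: read Gemini CLI JSON from stdin and print its `response` field.
# Exits non-zero if stdin is not valid JSON or the field is missing.
extract_response() {
  python3 -c 'import json, sys
data = json.load(sys.stdin)
print(data["response"])'
}
```

It would sit at the end of the pipeline, for example `gemini -m gemini-3-pro-preview -p "{prompt_with_context}" --output-format json | extract_response`.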

##### 3. `codex-reviewer.md` - GPT-5.1-Codex-Max for Code Review

```markdown
---
name: codex-reviewer
description: Use this agent to get GPT-5.1-Codex-Max's perspective on code review with maximum reasoning. Read-only analysis only - no code modifications.
model: haiku
color: green
---

You are a coordinator that invokes the Codex CLI to get GPT-5.1-Codex-Max's code review perspective with maximum reasoning effort.

**CRITICAL: This agent is READ-ONLY. It must NEVER modify files.**

## Invocation Pattern

For code review, pipe the diff or file content to Codex with MAXIMUM reasoning:

\`\`\`bash
# Review a PR diff
git diff main...HEAD | codex exec \
  -m gpt-5.1-codex-max \
  --sandbox read-only \
  -c model_reasoning_effort=xhigh \
  -o /tmp/codex-review-output.md \
  "Review this code change for: security issues, performance problems, architectural concerns, and best practices violations. Provide specific line-level feedback."

# Review specific files
cat {file_path} | codex exec \
  -m gpt-5.1-codex-max \
  --sandbox read-only \
  -c model_reasoning_effort=xhigh \
  -o /tmp/codex-review-output.md \
  "Review this code for quality, security, and performance issues."
\`\`\`

## Review Focus Areas

1. Security vulnerabilities (injection, auth bypass, data exposure)
2. Performance issues (N+1 queries, memory leaks, inefficient algorithms)
3. Architectural concerns (coupling, cohesion, patterns)
4. Code quality (readability, maintainability, testing)

## Output Format

---
### GPT-5.1-Codex-Max Code Review (via Codex)

**Security Findings:**
{findings}

**Performance Findings:**
{findings}

**Architecture Findings:**
{findings}

**Quality Findings:**
{findings}

---
```

##### 4. `gemini-reviewer.md` - Gemini 3 Pro for Code Review

```markdown
---
name: gemini-reviewer
description: Use this agent to get Gemini 3 Pro's perspective on code review. Read-only analysis only - no code modifications. Requires Google AI Ultra.
model: haiku
color: blue
---

You are a coordinator that invokes the Gemini CLI to get Gemini 3 Pro's code review perspective.

**CRITICAL: This agent is READ-ONLY. It must NEVER modify files.**

## Invocation Pattern

For code review, pipe the diff or file content to Gemini 3 Pro:

\`\`\`bash
# Review a PR diff
git diff main...HEAD | gemini -m gemini-3-pro-preview \
  -p "Review this code change for: security issues, performance problems, architectural concerns, and best practices violations. Provide specific line-level feedback. Output as structured JSON with categories." \
  --output-format json | jq -r '.response'

# Review specific files
cat {file_path} | gemini -m gemini-3-pro-preview \
  -p "Review this code for quality, security, and performance issues." \
  --output-format json | jq -r '.response'
\`\`\`

## Review Focus Areas

1. Security vulnerabilities (injection, auth bypass, data exposure)
2. Performance issues (N+1 queries, memory leaks, inefficient algorithms)
3. Architectural concerns (coupling, cohesion, patterns)
4. Code quality (readability, maintainability, testing)

## Output Format

---
### Gemini 3 Pro Code Review

**Security Findings:**
{findings}

**Performance Findings:**
{findings}

**Architecture Findings:**
{findings}

**Quality Findings:**
{findings}

---
```

#### Phase 2: Unified Multi-LLM Agent

Create a coordinator agent that can invoke multiple LLMs:

##### `multi-llm-brainstorm.md`

```markdown
---
name: multi-llm-brainstorm
description: Use this agent to get perspectives from multiple LLMs (Claude, GPT-5.1, Gemini) on planning and brainstorming tasks. Synthesizes diverse AI viewpoints.
---

You are a Multi-LLM Coordinator that gathers perspectives from multiple AI models to provide comprehensive analysis.

## Available LLMs

1. **Claude** (current context) - Direct analysis
2. **GPT-5.1** (via Codex CLI) - Brainstorm: `-m gpt-5.1 -c model_reasoning_effort=high` | Review: `-m gpt-5.1-codex-max -c model_reasoning_effort=xhigh`
3. **Gemini 3 Pro** (via Gemini CLI) - `gemini -m gemini-3-pro-preview -p`

## Workflow

1. **Analyze the request** to understand what perspectives would be valuable
2. **Invoke external LLMs in parallel** for efficiency
3. **Synthesize findings** into a unified view highlighting:
   - Points of agreement across models
   - Unique insights from each model
   - Areas of disagreement (with reasoning)

## Invocation Commands

### GPT-5.1 (Codex) - Brainstorming
\`\`\`bash
codex exec -m gpt-5.1 --sandbox read-only -c model_reasoning_effort=high -o /tmp/codex-output.md "{prompt}"
cat /tmp/codex-output.md
\`\`\`

### GPT-5.1-Codex-Max (Codex) - Code Review (Maximum Reasoning)
\`\`\`bash
codex exec -m gpt-5.1-codex-max --sandbox read-only -c model_reasoning_effort=xhigh -o /tmp/codex-output.md "{prompt}"
cat /tmp/codex-output.md
\`\`\`

### Gemini 3 Pro
\`\`\`bash
gemini -m gemini-3-pro-preview -p "{prompt}" --output-format json | python3 -c "import sys,json; print(json.load(sys.stdin)['response'])"
\`\`\`

## Output Format

---
## Multi-LLM Analysis

### Claude's Perspective
{direct_analysis}

### GPT-5.1's Perspective (via Codex)
{codex_output}

### Gemini 3 Pro's Perspective
{gemini_output}

### Synthesis

**Points of Agreement:**
- {common_findings}

**Unique Insights:**
- Claude: {unique}
- GPT-5.1: {unique}
- Gemini: {unique}

**Disagreements:**
- {topic}: Claude says X, GPT-5.1 says Y, Gemini says Z

### Recommended Approach
{synthesized_recommendation}

---
```
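
The "invoke external LLMs in parallel" step can be sketched with background jobs and `wait`. This is an illustration under simplifying assumptions: `run_in_parallel` is a hypothetical name, and each command is a single word (a shell function or executable wrapping one of the invocation commands).

```bash
# Hypothetical helper: run two commands concurrently, each writing to its own file.
# Usage: run_in_parallel <out1> <cmd1> <out2> <cmd2>
run_in_parallel() {
  local out1="$1" cmd1="$2" out2="$3" cmd2="$4" pid1 pid2 rc1=0 rc2=0
  # Start both commands in the background, capturing stdout per model.
  "$cmd1" >"$out1" 2>/dev/null &
  pid1=$!
  "$cmd2" >"$out2" 2>/dev/null &
  pid2=$!
  # Collect each exit status separately.
  wait "$pid1" || rc1=$?
  wait "$pid2" || rc2=$?
  # Succeed if at least one model answered, so a single failure degrades gracefully.
  [ "$rc1" -eq 0 ] || [ "$rc2" -eq 0 ]
}
```

Writing each model's output to a separate file keeps attribution unambiguous when the coordinator builds the synthesis section.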

#### Phase 3: Skills for Direct CLI Access

Create skills in `skills/` for users who want direct CLI access:

##### `skills/codex-cli/SKILL.md`

```markdown
# Codex CLI Skill

Invoke OpenAI's Codex CLI for GPT-5.1 analysis in non-interactive mode.

## Quick Commands

### Brainstorming (GPT-5.1 + High Reasoning)
\`\`\`bash
codex exec -m gpt-5.1 --sandbox read-only -c model_reasoning_effort=high -o output.md "Your brainstorming prompt here"
\`\`\`

### Code Review (GPT-5.1-Codex-Max + Maximum Reasoning)
\`\`\`bash
git diff main...HEAD | codex exec -m gpt-5.1-codex-max --sandbox read-only -c model_reasoning_effort=xhigh -o review.md "Review this code for issues"
\`\`\`

### Architecture Analysis (GPT-5.1-Codex-Max + Maximum Reasoning)
\`\`\`bash
codex exec -m gpt-5.1-codex-max --sandbox read-only -c model_reasoning_effort=xhigh -o analysis.md "Analyze the architecture in this codebase"
\`\`\`

## Safety Notes

- ALWAYS use `--sandbox read-only` to prevent modifications
- NEVER use `--full-auto` or `--dangerously-bypass-approvals-and-sandbox`
- Output is for analysis only, not execution
```

##### `skills/gemini-cli/SKILL.md`

```markdown
# Gemini CLI Skill

Invoke Google's Gemini CLI for Gemini 3 Pro analysis in non-interactive mode. Requires Google AI Ultra subscription.

## Quick Commands

### Brainstorming
\`\`\`bash
gemini -m gemini-3-pro-preview -p "Your brainstorming prompt here" --output-format text
\`\`\`

### Code Review
\`\`\`bash
git diff main...HEAD | gemini -m gemini-3-pro-preview -p "Review this code for issues" --output-format text
\`\`\`

### Architecture Analysis
\`\`\`bash
gemini -m gemini-3-pro-preview -p "Analyze the architecture in this codebase" --output-format text
\`\`\`

## Safety Notes

- NEVER use `--yolo` or `--approval-mode=yolo`
- The default approval mode never auto-approves file modifications
- Output is for analysis only, not execution
```

#### Phase 4: Integration into Existing Workflows

##### Update `/plan` Command

Add optional multi-LLM brainstorming to the plan workflow:

```markdown
### Optional: Multi-LLM Brainstorming

<thinking>
For complex features, getting diverse AI perspectives can reveal blind spots and alternative approaches.
</thinking>

If the feature is complex or architecturally significant, optionally run:

- Task multi-llm-brainstorm(feature_description)

This will gather perspectives from Claude, GPT-5.1, and Gemini 3 Pro, synthesizing their recommendations.
```

##### Update `/review` Command

Add optional external LLM review step:

```markdown
### Optional: External LLM Review

For critical PRs or when a second opinion is valuable:

<parallel_tasks>
Run external LLM reviewers in parallel with existing agents:

- Task codex-reviewer(PR diff)
- Task gemini-reviewer(PR diff)
</parallel_tasks>

Synthesize external findings with internal agent findings in the final report.
```

## Acceptance Criteria

### Functional Requirements

- [ ] `codex-brainstorm` agent can invoke Codex CLI and capture output
- [ ] `gemini-brainstorm` agent can invoke Gemini CLI and capture output
- [ ] `codex-reviewer` agent can review code via Codex CLI
- [ ] `gemini-reviewer` agent can review code via Gemini CLI
- [ ] `multi-llm-brainstorm` agent coordinates all three LLMs
- [ ] All agents use read-only mode exclusively
- [ ] Output is clearly attributed to source LLM
- [ ] Errors from external CLIs are handled gracefully

### Non-Functional Requirements

- [ ] No file modifications ever occur from external LLMs
- [ ] Timeout handling for slow CLI responses (60s default)
- [ ] Graceful degradation if an LLM is unavailable
- [ ] Clear documentation of CLI version requirements
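
The timeout and graceful-degradation requirements could be met with a small wrapper like the one below. It is a sketch that assumes GNU coreutils `timeout`, which exits with status 124 when the limit is reached; the name `run_with_timeout` and the placeholder messages are hypothetical.

```bash
# Hypothetical wrapper: run a command with a time limit and degrade gracefully.
# Assumes GNU coreutils `timeout` (exit status 124 means the limit was hit).
run_with_timeout() {
  local limit="$1" rc=0
  shift
  timeout "$limit" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "[unavailable: timed out after ${limit}s]"
  elif [ "$rc" -ne 0 ]; then
    echo "[unavailable: exited with status ${rc}]"
  fi
  # Always return success so the surrounding workflow continues without this model.
  return 0
}
```

For example, `run_with_timeout 60 gemini -m gemini-3-pro-preview -p "{prompt}"` applies the 60s default, and the placeholder line lets the final report state which model was skipped.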

### Quality Gates

- [ ] Manual testing of each agent in isolation
- [ ] Integration testing within /plan and /review workflows
- [ ] Documentation updated in CLAUDE.md

## Technical Details

### Affected Files

**New Files:**
- `agents/research/codex-brainstorm.md`
- `agents/research/gemini-brainstorm.md`
- `agents/research/codex-reviewer.md`
- `agents/research/gemini-reviewer.md`
- `agents/research/multi-llm-brainstorm.md`
- `skills/codex-cli/SKILL.md`
- `skills/gemini-cli/SKILL.md`

**Modified Files:**
- `commands/workflows/plan.md` - Add optional multi-LLM step
- `commands/workflows/review.md` - Add optional external review step
- `CLAUDE.md` - Document new agents and capabilities
- `local_manifest.yml` - Add new local files

### CLI Requirements

| CLI | Version | Model | Reasoning | Purpose |
|-----|---------|-------|-----------|---------|
| Codex | 0.63.0+ | `gpt-5.1` | `high` | Brainstorming |
| Codex | 0.63.0+ | `gpt-5.1-codex-max` | `xhigh` (maximum) | Code review |
| Gemini | 0.18.4+ | `gemini-3-pro-preview` | Built-in (Gemini 3 Pro) | Brainstorming, code review |
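
The minimum versions in the table can be checked with a version-aware comparison. The sketch below relies on `sort -V` from GNU coreutils; the function name `version_at_least` is hypothetical, and extracting the bare version number from each CLI's `--version` output is left out because that output format is not specified here.

```bash
# Hypothetical helper: succeed if version $1 is at least version $2.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_at_least() {
  local have="$1" want="$2" lowest
  # The smaller of the two versions sorts first; it must be the required one.
  lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
  [ "$lowest" = "$want" ]
}
```

A plain string comparison would wrongly rank `0.100.0` below `0.63.0`, which is why the sketch uses version sort.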

### Safety Constraints

**Codex CLI:**
- MUST use `--sandbox read-only`
- MUST NOT use `--full-auto`
- MUST NOT use `--dangerously-bypass-approvals-and-sandbox`

**Gemini CLI:**
- MUST NOT use `--yolo` or `--approval-mode=yolo`
- Default approval mode (prompts for actions) is sufficient since we don't request actions
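
These constraints can also be enforced mechanically before a command runs. The guard below is a sketch that covers only the flag spellings named in this plan; the function name `assert_safe_args` is hypothetical.

```bash
# Hypothetical guard: fail if an argument list contains a flag this plan forbids.
assert_safe_args() {
  local arg prev=""
  for arg in "$@"; do
    case "$arg" in
      --full-auto|--dangerously-bypass-approvals-and-sandbox|--yolo|--approval-mode=yolo)
        echo "forbidden flag: $arg" >&2
        return 1
        ;;
    esac
    # Also catch the two-argument spelling: --approval-mode yolo
    if [ "$prev" = "--approval-mode" ] && [ "$arg" = "yolo" ]; then
      echo "forbidden flag: --approval-mode yolo" >&2
      return 1
    fi
    prev="$arg"
  done
  return 0
}
```

A wrapper would call `assert_safe_args "$@"` on the exact arguments it is about to pass to `codex` or `gemini` and abort on a non-zero status.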

## Verified CLI Authentication (Tested 2025-12-02)

All configurations verified working with current logged-in credentials:

| CLI | Model | Reasoning | Status | Test Output |
|-----|-------|-----------|--------|-------------|
| **Codex** | `gpt-5.1` | `high` | ✅ Working | Auth/session best practices response |
| **Codex** | `gpt-5.1-codex-max` | `xhigh` | ✅ Working | Security review checklist response |
| **Gemini** | `gemini-3-pro-preview` | Built-in | ✅ Working | Database normalization response |

### Test Commands Used

```bash
# Codex brainstorming (GPT-5.1 + high)
codex exec -m gpt-5.1 --sandbox read-only -c model_reasoning_effort=high \
  -o /tmp/test.txt "In 2 sentences, what's a good approach for adding authentication?"

# Codex code review (GPT-5.1-Codex-Max + xhigh)
codex exec -m gpt-5.1-codex-max --sandbox read-only -c model_reasoning_effort=xhigh \
  -o /tmp/test.txt "In 2 sentences, what should I look for in a security code review?"

# Gemini 3 Pro
gemini -m gemini-3-pro-preview -p "In 2 sentences, what's a good database design principle?" \
  --output-format text
```

**Note:** Gemini shows an IDE extension warning that can be ignored - CLI functions correctly.

## Model Availability Notes

Based on testing:

1. **Codex CLI**: Defaults to `gpt-5.1-codex-max`. High reasoning is enabled with `-c model_reasoning_effort=high`. Premium models such as `o3` require API key authentication.

2. **Gemini CLI**: With Google AI Ultra subscription, `gemini-3-pro-preview` is available for the highest-quality reasoning and analysis.

The plan uses the highest-tier models with appropriate reasoning levels:
- **Codex Brainstorming**: `gpt-5.1` + `model_reasoning_effort=high`
- **Codex Code Review**: `gpt-5.1-codex-max` + `model_reasoning_effort=xhigh` (maximum)
- **Gemini**: `gemini-3-pro-preview` (requires Ultra subscription)

## Resources

### Documentation
- [Codex CLI Reference](https://developers.openai.com/codex/cli/reference/)
- [Codex Exec Mode](https://github.com/openai/codex/blob/main/docs/exec.md)
- [Gemini CLI Headless Mode](https://geminicli.com/docs/cli/headless/)
- [Gemini CLI Model Selection](https://geminicli.com/docs/cli/model/)

### Related Work
- Existing `gpt-5` agent pattern in CE workflow
- Multi-agent parallel execution in `/review` command

