feat: Add external LLM agents (Codex, Gemini) for brainstorming and review #34

@RonRichman

External LLM Agents for Brainstorming and Review

Overview

Add the ability to call Codex CLI and Gemini CLI in non-interactive mode from the Compounding Engineering workflow for brainstorming/planning and code review tasks. These agents provide alternative AI perspectives without performing any code modifications.

Problem Statement

Currently, the CE workflow relies solely on Claude-based agents for planning and code review. Adding perspectives from GPT-5.1 (via Codex CLI) and Gemini 3 Pro (via Gemini CLI) would provide:

  1. Diverse AI perspectives on architectural decisions and code quality
  2. Second opinions for critical reviews
  3. Brainstorming variety for feature planning
  4. Validation of Claude's analysis through independent models

Technical Approach

Architecture

Both Codex and Gemini CLIs are already installed locally:

  • Codex CLI: v0.63.0 at /home/ron/.nvm/versions/node/v22.14.0/bin/codex
  • Gemini CLI: v0.18.4 at /home/ron/.nvm/versions/node/v22.14.0/bin/gemini

The integration will run Codex exclusively in read-only sandbox mode and Gemini in its default approval mode (never --yolo) to prevent any code modifications.
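
A quick preflight can confirm that both binaries resolve on `PATH` before any agent is invoked. The sketch below is illustrative only: `locate_clis` is a hypothetical helper, not part of the CE workflow, and it checks presence rather than version or authentication.

```python
import shutil


def locate_clis(names=("codex", "gemini")):
    """Map each CLI name to its resolved path, or None when it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_clis().items():
        print(f"{name}: {path or 'not found'}")
```

An agent could skip any CLI whose entry is `None` instead of failing the whole run.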

Implementation Phases

Phase 1: Core LLM Wrapper Agents

Create four new agents in agents/research/:

1. codex-brainstorm.md - GPT-5.1 for Brainstorming
---
name: codex-brainstorm
description: Use this agent to get GPT-5.1's perspective on planning, brainstorming, and architectural decisions. Read-only - no code modifications.
model: haiku
color: green
---

You are a coordinator that invokes the Codex CLI to get GPT-5.1's perspective on brainstorming and planning tasks.

**CRITICAL: This agent is READ-ONLY. It must NEVER modify files.**

## Invocation Pattern

Execute the Codex CLI in non-interactive mode with GPT-5.1 and high reasoning:

\`\`\`bash
codex exec \
  -m gpt-5.1 \
  --sandbox read-only \
  -c model_reasoning_effort=high \
  -o /tmp/codex-brainstorm-output.md \
  "{prompt_with_context}"
\`\`\`

## Key Flags

- `-m gpt-5.1`: Uses the GPT-5.1 model (this plan's choice for brainstorming)
- `--sandbox read-only`: REQUIRED - prevents any file modifications
- `-c model_reasoning_effort=high`: Enables deep reasoning for quality analysis
- `-o <file>`: Captures the final response for parsing

## Use Cases

1. **Feature brainstorming**: "Propose 5 approaches to implement X"
2. **Architecture review**: "Analyze this architecture for potential issues"
3. **Trade-off analysis**: "Compare approach A vs B for this use case"
4. **Planning validation**: "Review this plan and identify gaps"

## Output Format

Parse the output file and present GPT-5.1's perspective with clear attribution:

---
### GPT-5.1 Perspective (via Codex)

{parsed_response}

---
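
If the coordinator were driven from a script rather than a shell string, the invocation pattern above could be assembled as an argument list, which sidesteps quoting problems when the prompt contains quotes or newlines. This is a sketch under assumptions: `build_codex_command` is a hypothetical helper, and only the flags already listed in this plan are used.

```python
import shlex


def build_codex_command(prompt, model="gpt-5.1", effort="high",
                        output_path="/tmp/codex-brainstorm-output.md"):
    """Return the argv list for a read-only `codex exec` call."""
    return [
        "codex", "exec",
        "-m", model,
        "--sandbox", "read-only",  # fixed on purpose; callers cannot override it
        "-c", f"model_reasoning_effort={effort}",
        "-o", output_path,
        prompt,  # passed as one argument, so no shell escaping is needed
    ]


if __name__ == "__main__":
    argv = build_codex_command("Propose 5 approaches to implement X")
    print(shlex.join(argv))  # shell-quoted form, handy for logging
```

Keeping the sandbox flag out of the function signature means a caller cannot accidentally switch it off.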

2. gemini-brainstorm.md - Gemini 3 Pro for Brainstorming
---
name: gemini-brainstorm
description: Use this agent to get Gemini 3 Pro's perspective on planning, brainstorming, and architectural decisions. Read-only - no code modifications. Requires Google AI Ultra.
model: haiku
color: blue
---

You are a coordinator that invokes the Gemini CLI to get Gemini 3 Pro's perspective on brainstorming and planning tasks.

**CRITICAL: This agent is READ-ONLY. It must NEVER modify files.**

## Invocation Pattern

Execute the Gemini CLI in non-interactive mode with the highest-tier model:

\`\`\`bash
gemini -m gemini-3-pro-preview \
  -p "{prompt_with_context}" \
  --output-format json | jq -r '.response'
\`\`\`

## Key Flags

- `-m gemini-3-pro-preview`: Specifies Gemini 3 Pro (requires Ultra subscription)
- `-p "{prompt}"`: Non-interactive prompt mode
- `--output-format json`: Structured output for parsing
- NO `--yolo` or `--approval-mode=yolo`: Never auto-approve actions

**Note:** Requires Google AI Ultra subscription for Gemini 3 Pro access.

## Use Cases

1. **Feature brainstorming**: "Propose 5 approaches to implement X"
2. **Architecture review**: "Analyze this architecture for potential issues"
3. **Trade-off analysis**: "Compare approach A vs B for this use case"
4. **Planning validation**: "Review this plan and identify gaps"

## Output Format

Parse the JSON response and present Gemini's perspective with clear attribution:

---
### Gemini 3 Pro Perspective

{parsed_response}

---
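
The JSON parsing step can also be done without `jq`. The sketch below assumes, as the `jq -r '.response'` filter in this plan does, that the CLI emits a JSON object with a `response` field; `extract_response` is a hypothetical helper, and skipping text before the first `{` is a defensive choice in case a warning line precedes the JSON.

```python
import json


def extract_response(raw_output):
    """Return the `response` field from Gemini CLI JSON output.

    Text before the first '{' (for example a warning line) is skipped.
    Raises ValueError when no JSON object with a `response` key is found.
    """
    start = raw_output.find("{")
    if start == -1:
        raise ValueError("no JSON object found in CLI output")
    try:
        payload, _ = json.JSONDecoder().raw_decode(raw_output[start:])
    except json.JSONDecodeError as exc:
        raise ValueError(f"could not parse CLI output as JSON: {exc}") from exc
    if not isinstance(payload, dict) or "response" not in payload:
        raise ValueError("JSON output has no 'response' field")
    return payload["response"]


if __name__ == "__main__":
    # Made-up sample input, not captured from a real CLI run.
    sample = 'some warning line\n{"response": "Normalize first."}\n'
    print(extract_response(sample))
```

Raising `ValueError` for every failure mode gives the coordinator a single exception type to handle.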

3. codex-reviewer.md - GPT-5.1-Codex-Max for Code Review
---
name: codex-reviewer
description: Use this agent to get GPT-5.1-Codex-Max's perspective on code review with maximum reasoning. Read-only analysis only - no code modifications.
model: haiku
color: green
---

You are a coordinator that invokes the Codex CLI to get GPT-5.1-Codex-Max's code review perspective with maximum reasoning effort.

**CRITICAL: This agent is READ-ONLY. It must NEVER modify files.**

## Invocation Pattern

For code review, pipe the diff or file content to Codex with MAXIMUM reasoning:

\`\`\`bash
# Review a PR diff
git diff main...HEAD | codex exec \
  -m gpt-5.1-codex-max \
  --sandbox read-only \
  -c model_reasoning_effort=xhigh \
  -o /tmp/codex-review-output.md \
  "Review this code change for: security issues, performance problems, architectural concerns, and best practices violations. Provide specific line-level feedback."

# Review specific files
cat {file_path} | codex exec \
  -m gpt-5.1-codex-max \
  --sandbox read-only \
  -c model_reasoning_effort=xhigh \
  -o /tmp/codex-review-output.md \
  "Review this code for quality, security, and performance issues."
\`\`\`

## Review Focus Areas

1. Security vulnerabilities (injection, auth bypass, data exposure)
2. Performance issues (N+1 queries, memory leaks, inefficient algorithms)
3. Architectural concerns (coupling, cohesion, patterns)
4. Code quality (readability, maintainability, testing)

## Output Format

---
### GPT-5.1-Codex-Max Code Review (via Codex)

**Security Findings:**
{findings}

**Performance Findings:**
{findings}

**Architecture Findings:**
{findings}

**Quality Findings:**
{findings}

---

4. gemini-reviewer.md - Gemini 3 Pro for Code Review
---
name: gemini-reviewer
description: Use this agent to get Gemini 3 Pro's perspective on code review. Read-only analysis only - no code modifications. Requires Google AI Ultra.
model: haiku
color: blue
---

You are a coordinator that invokes the Gemini CLI to get Gemini 3 Pro's code review perspective.

**CRITICAL: This agent is READ-ONLY. It must NEVER modify files.**

## Invocation Pattern

For code review, pipe the diff or file content to the Gemini CLI:

\`\`\`bash
# Review a PR diff
git diff main...HEAD | gemini -m gemini-3-pro-preview \
  -p "Review this code change for: security issues, performance problems, architectural concerns, and best practices violations. Provide specific line-level feedback. Output as structured JSON with categories." \
  --output-format json | jq -r '.response'

# Review specific files
cat {file_path} | gemini -m gemini-3-pro-preview \
  -p "Review this code for quality, security, and performance issues." \
  --output-format json | jq -r '.response'
\`\`\`

## Review Focus Areas

1. Security vulnerabilities (injection, auth bypass, data exposure)
2. Performance issues (N+1 queries, memory leaks, inefficient algorithms)
3. Architectural concerns (coupling, cohesion, patterns)
4. Code quality (readability, maintainability, testing)

## Output Format

---
### Gemini 3 Pro Code Review

**Security Findings:**
{findings}

**Performance Findings:**
{findings}

**Architecture Findings:**
{findings}

**Quality Findings:**
{findings}

---
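
Both reviewer agents share the same four-category report layout, so one renderer could serve both. A minimal sketch, assuming findings have already been sorted into categories; `format_review` and the "None reported" placeholder are illustrative choices, not part of the plan.

```python
CATEGORIES = ("Security", "Performance", "Architecture", "Quality")


def format_review(source, findings):
    """Render findings (category -> list of strings) in the shared report layout."""
    lines = ["---", f"### {source} Code Review", ""]
    for category in CATEGORIES:
        lines.append(f"**{category} Findings:**")
        items = findings.get(category, [])
        if items:
            lines.extend(f"- {item}" for item in items)
        else:
            lines.append("- None reported")  # keep every heading visible
        lines.append("")
    lines.append("---")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_review("Gemini 3 Pro", {"Security": ["Example finding"]}))
```

Emitting all four headings even when a category is empty makes reports from different models easy to compare side by side.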

Phase 2: Unified Multi-LLM Agent

Create a coordinator agent that can invoke multiple LLMs:

multi-llm-brainstorm.md
---
name: multi-llm-brainstorm
description: Use this agent to get perspectives from multiple LLMs (Claude, GPT-5.1, Gemini) on planning and brainstorming tasks. Synthesizes diverse AI viewpoints.
---

You are a Multi-LLM Coordinator that gathers perspectives from multiple AI models to provide comprehensive analysis.

## Available LLMs

1. **Claude** (current context) - Direct analysis
2. **GPT-5.1** (via Codex CLI) - Brainstorm: `-m gpt-5.1 -c model_reasoning_effort=high` | Review: `-m gpt-5.1-codex-max -c model_reasoning_effort=xhigh`
3. **Gemini 3 Pro** (via Gemini CLI) - `gemini -m gemini-3-pro-preview -p`

## Workflow

1. **Analyze the request** to understand what perspectives would be valuable
2. **Invoke external LLMs in parallel** for efficiency
3. **Synthesize findings** into a unified view highlighting:
   - Points of agreement across models
   - Unique insights from each model
   - Areas of disagreement (with reasoning)

## Invocation Commands

### GPT-5.1 (Codex) - Brainstorming
\`\`\`bash
codex exec -m gpt-5.1 --sandbox read-only -c model_reasoning_effort=high -o /tmp/codex-output.md "{prompt}"
cat /tmp/codex-output.md
\`\`\`

### GPT-5.1-Codex-Max (Codex) - Code Review (Maximum Reasoning)
\`\`\`bash
codex exec -m gpt-5.1-codex-max --sandbox read-only -c model_reasoning_effort=xhigh -o /tmp/codex-output.md "{prompt}"
cat /tmp/codex-output.md
\`\`\`

### Gemini 3 Pro
\`\`\`bash
gemini -m gemini-3-pro-preview -p "{prompt}" --output-format json | python3 -c "import sys,json; print(json.load(sys.stdin)['response'])"
\`\`\`

## Output Format

---
## Multi-LLM Analysis

### Claude's Perspective
{direct_analysis}

### GPT-5.1's Perspective (via Codex)
{codex_output}

### Gemini 3 Pro's Perspective
{gemini_output}

### Synthesis

**Points of Agreement:**
- {common_findings}

**Unique Insights:**
- Claude: {unique}
- GPT-5.1: {unique}
- Gemini: {unique}

**Disagreements:**
- {topic}: Claude says X, GPT-5.1 says Y, Gemini says Z

### Recommended Approach
{synthesized_recommendation}

---
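
The "invoke in parallel, then synthesize" workflow could be prototyped as follows. This is a sketch, not the agent's actual mechanism: `gather_perspectives` and `render_sections` are hypothetical names, and each backend is an ordinary callable so the example runs without either CLI installed. A failing backend is recorded rather than aborting the run, which matches the graceful-degradation goal.

```python
from concurrent.futures import ThreadPoolExecutor


def gather_perspectives(prompt, backends):
    """Call every backend in parallel; backends maps a label to callable(prompt) -> str."""
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(backends))) as pool:
        futures = {label: pool.submit(fn, prompt) for label, fn in backends.items()}
        for label, future in futures.items():
            try:
                results[label] = future.result()
            except Exception as exc:  # one unavailable model must not sink the rest
                results[label] = f"(unavailable: {exc})"
    return results


def render_sections(results):
    """Emit one attributed section per model, in insertion order."""
    parts = ["## Multi-LLM Analysis", ""]
    for label, text in results.items():
        parts += [f"### {label}'s Perspective", text, ""]
    return "\n".join(parts).rstrip() + "\n"


if __name__ == "__main__":
    # Stand-in backends; real ones would wrap the CLI commands listed above.
    demo = {"GPT-5.1": lambda p: f"ideas for {p}", "Gemini 3 Pro": lambda p: f"notes on {p}"}
    print(render_sections(gather_perspectives("feature X", demo)))
```

Threads are sufficient here because each backend would spend its time waiting on an external process.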

Phase 3: Skills for Direct CLI Access

Create skills in skills/ for users who want direct CLI access:

skills/codex-cli/SKILL.md
# Codex CLI Skill

Invoke OpenAI's Codex CLI for GPT-5.1 analysis in non-interactive mode.

## Quick Commands

### Brainstorming (GPT-5.1 + High Reasoning)
\`\`\`bash
codex exec -m gpt-5.1 --sandbox read-only -c model_reasoning_effort=high -o output.md "Your brainstorming prompt here"
\`\`\`

### Code Review (GPT-5.1-Codex-Max + Maximum Reasoning)
\`\`\`bash
git diff main...HEAD | codex exec -m gpt-5.1-codex-max --sandbox read-only -c model_reasoning_effort=xhigh -o review.md "Review this code for issues"
\`\`\`

### Architecture Analysis (GPT-5.1-Codex-Max + Maximum Reasoning)
\`\`\`bash
codex exec -m gpt-5.1-codex-max --sandbox read-only -c model_reasoning_effort=xhigh -o analysis.md "Analyze the architecture in this codebase"
\`\`\`

## Safety Notes

- ALWAYS use `--sandbox read-only` to prevent modifications
- NEVER use `--full-auto` or `--dangerously-bypass-approvals-and-sandbox`
- Output is for analysis only, not execution

skills/gemini-cli/SKILL.md
# Gemini CLI Skill

Invoke Google's Gemini CLI for Gemini 3 Pro analysis in non-interactive mode. Requires Google AI Ultra subscription.

## Quick Commands

### Brainstorming
\`\`\`bash
gemini -m gemini-3-pro-preview -p "Your brainstorming prompt here" --output-format text
\`\`\`

### Code Review
\`\`\`bash
git diff main...HEAD | gemini -m gemini-3-pro-preview -p "Review this code for issues" --output-format text
\`\`\`

### Architecture Analysis
\`\`\`bash
gemini -m gemini-3-pro-preview -p "Analyze the architecture in this codebase" --output-format text
\`\`\`

## Safety Notes

- NEVER use `--yolo` or `--approval-mode=yolo`
- Default approval mode prevents any file modifications
- Output is for analysis only, not execution

Phase 4: Integration into Existing Workflows

Update /plan Command

Add optional multi-LLM brainstorming to the plan workflow:

### Optional: Multi-LLM Brainstorming

<thinking>
For complex features, getting diverse AI perspectives can reveal blind spots and alternative approaches.
</thinking>

If the feature is complex or architecturally significant, optionally run:

- Task multi-llm-brainstorm(feature_description)

This will gather perspectives from Claude, GPT-5.1, and Gemini 3 Pro, synthesizing their recommendations.

Update /review Command

Add optional external LLM review step:

### Optional: External LLM Review

For critical PRs or when a second opinion is valuable:

<parallel_tasks>
Run external LLM reviewers in parallel with existing agents:

- Task codex-reviewer(PR diff)
- Task gemini-reviewer(PR diff)
</parallel_tasks>

Synthesize external findings with internal agent findings in the final report.

Acceptance Criteria

Functional Requirements

  • codex-brainstorm agent can invoke Codex CLI and capture output
  • gemini-brainstorm agent can invoke Gemini CLI and capture output
  • codex-reviewer agent can review code via Codex CLI
  • gemini-reviewer agent can review code via Gemini CLI
  • multi-llm-brainstorm agent coordinates all three LLMs
  • All agents use read-only mode exclusively
  • Output is clearly attributed to source LLM
  • Errors from external CLIs are handled gracefully

Non-Functional Requirements

  • No file modifications ever occur from external LLMs
  • Timeout handling for slow CLI responses (60s default)
  • Graceful degradation if an LLM is unavailable
  • Clear documentation of CLI version requirements
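
The timeout and graceful-degradation requirements could be met by a thin wrapper around `subprocess.run`. A minimal sketch, assuming the 60s default stated above; `run_cli` and the shape of its result dictionary are hypothetical.

```python
import subprocess


def run_cli(argv, stdin_text=None, timeout=60):
    """Run an external CLI and report failure in the result instead of raising."""
    try:
        proc = subprocess.run(
            argv, input=stdin_text, capture_output=True, text=True, timeout=timeout
        )
    except OSError as exc:  # includes FileNotFoundError when the binary is missing
        return {"ok": False, "output": "", "error": f"could not start {argv[0]}: {exc}"}
    except subprocess.TimeoutExpired:
        return {"ok": False, "output": "", "error": f"{argv[0]} timed out after {timeout}s"}
    if proc.returncode != 0:
        error = proc.stderr.strip() or f"exit code {proc.returncode}"
        return {"ok": False, "output": proc.stdout, "error": error}
    return {"ok": True, "output": proc.stdout, "error": ""}


if __name__ == "__main__":
    import sys
    print(run_cli([sys.executable, "-c", "print('hello')"]))
```

A caller can check `result["ok"]` and fall back to the remaining models when it is false. Maximum-reasoning reviews may need a larger `timeout` than the default.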

Quality Gates

  • Manual testing of each agent in isolation
  • Integration testing within /plan and /review workflows
  • Documentation updated in CLAUDE.md

Technical Details

Affected Files

New Files:

  • agents/research/codex-brainstorm.md
  • agents/research/gemini-brainstorm.md
  • agents/research/codex-reviewer.md
  • agents/research/gemini-reviewer.md
  • agents/research/multi-llm-brainstorm.md
  • skills/codex-cli/SKILL.md
  • skills/gemini-cli/SKILL.md

Modified Files:

  • commands/workflows/plan.md - Add optional multi-LLM step
  • commands/workflows/review.md - Add optional external review step
  • CLAUDE.md - Document new agents and capabilities
  • local_manifest.yml - Add new local files

CLI Requirements

| CLI | Version | Model | Reasoning | Purpose |
|-----|---------|-------|-----------|---------|
| Codex | 0.63.0+ | gpt-5.1 | high | Brainstorming |
| Codex | 0.63.0+ | gpt-5.1-codex-max | xhigh (maximum) | Code review |
| Gemini | 0.18.4+ | gemini-3-pro-preview | Built-in (Gemini 3 Pro) | Brainstorming, code review |
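
The minimum versions above could be enforced programmatically. The sketch below assumes the version string has already been obtained somehow (how each CLI reports it is not covered in this plan) and compares numeric tuples instead of strings; `parse_version` and `meets_minimum` are hypothetical helpers.

```python
import re

# Minimums taken from the CLI requirements in this plan.
MINIMUM_VERSIONS = {"codex": (0, 63, 0), "gemini": (0, 18, 4)}


def parse_version(text):
    """Extract the first MAJOR.MINOR.PATCH triple from text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(cli, version_text):
    """True when the reported version is at least the required minimum."""
    return parse_version(version_text) >= MINIMUM_VERSIONS[cli]


if __name__ == "__main__":
    print(meets_minimum("codex", "0.63.0"))
    print(meets_minimum("gemini", "0.18.3"))
```

Comparing integer tuples avoids the string-comparison trap in which "0.9.0" sorts after "0.63.0".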

Safety Constraints

Codex CLI:

  • MUST use --sandbox read-only
  • MUST NOT use --full-auto
  • MUST NOT use --dangerously-bypass-approvals-and-sandbox

Gemini CLI:

  • MUST NOT use --yolo or --approval-mode=yolo
  • Default approval mode (prompts for actions) is sufficient since we don't request actions
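
These constraints could be checked mechanically before any command is executed. A sketch under assumptions: `check_command` is a hypothetical guard that matches only the exact flag spellings named in this plan, so any short aliases or alternative spellings the CLIs accept would need to be added.

```python
FORBIDDEN_FLAGS = {
    "codex": {"--full-auto", "--dangerously-bypass-approvals-and-sandbox"},
    "gemini": {"--yolo", "--approval-mode=yolo"},
}


def check_command(argv):
    """Return a list of safety violations for a codex/gemini argv (empty if none)."""
    cli = argv[0].rsplit("/", 1)[-1]  # accept absolute paths to the binary
    forbidden = FORBIDDEN_FLAGS.get(cli, set())
    problems = [f"forbidden flag: {arg}" for arg in argv[1:] if arg in forbidden]
    # Codex must carry the sandbox flag immediately followed by its value.
    if cli == "codex" and ("--sandbox", "read-only") not in zip(argv, argv[1:]):
        problems.append("missing required: --sandbox read-only")
    return problems


if __name__ == "__main__":
    print(check_command(["codex", "exec", "--full-auto", "Review this"]))
    print(check_command(["gemini", "-p", "Review this"]))
```

Returning a list rather than raising lets the coordinator report every violation at once and refuse to run the command.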

Verified CLI Authentication (Tested 2025-12-02)

All configurations verified working with current logged-in credentials:

| CLI | Model | Reasoning | Status | Test Output |
|-----|-------|-----------|--------|-------------|
| Codex | gpt-5.1 | high | ✅ Working | Auth/session best practices response |
| Codex | gpt-5.1-codex-max | xhigh | ✅ Working | Security review checklist response |
| Gemini | gemini-3-pro-preview | Built-in | ✅ Working | Database normalization response |

Test Commands Used

# Codex brainstorming (GPT-5.1 + high)
codex exec -m gpt-5.1 --sandbox read-only -c model_reasoning_effort=high \
  -o /tmp/test.txt "In 2 sentences, what's a good approach for adding authentication?"

# Codex code review (GPT-5.1-Codex-Max + xhigh)
codex exec -m gpt-5.1-codex-max --sandbox read-only -c model_reasoning_effort=xhigh \
  -o /tmp/test.txt "In 2 sentences, what should I look for in a security code review?"

# Gemini 3 Pro
gemini -m gemini-3-pro-preview -p "In 2 sentences, what's a good database design principle?" \
  --output-format text

Note: Gemini prints an IDE extension warning that can be ignored; the CLI functions correctly.

Model Availability Notes

Based on testing:

  1. Codex CLI: Defaults to gpt-5.1-codex-max; high reasoning is enabled explicitly via -c model_reasoning_effort=high. Premium models like o3 require API key authentication.

  2. Gemini CLI: With Google AI Ultra subscription, gemini-3-pro-preview is available for the highest-quality reasoning and analysis.

The plan uses the highest-tier models with appropriate reasoning levels:

  • Codex Brainstorming: gpt-5.1 + model_reasoning_effort=high
  • Codex Code Review: gpt-5.1-codex-max + model_reasoning_effort=xhigh (maximum)
  • Gemini: gemini-3-pro-preview (requires Ultra subscription)

Resources

Documentation

Related Work

  • Existing gpt-5 agent pattern in CE workflow
  • Multi-agent parallel execution in /review command
