External LLM Agents for Brainstorming and Review
Overview
Add the ability to call Codex CLI and Gemini CLI in non-interactive mode from the Compounding Engineering workflow for brainstorming/planning and code review tasks. These agents provide alternative AI perspectives without performing any code modifications.
Problem Statement
Currently, the CE workflow relies solely on Claude-based agents for planning and code review. Adding perspectives from GPT-5.1 (via Codex CLI) and Gemini 3 Pro (via Gemini CLI) would provide:
- Diverse AI perspectives on architectural decisions and code quality
- Second opinions for critical reviews
- Brainstorming variety for feature planning
- Validation of Claude's analysis through independent models
Technical Approach
Architecture
Both Codex and Gemini CLIs are already installed locally:
- Codex CLI: v0.63.0 at `/home/ron/.nvm/versions/node/v22.14.0/bin/codex`
- Gemini CLI: v0.18.4 at `/home/ron/.nvm/versions/node/v22.14.0/bin/gemini`
The integration will run both CLIs in read-only configurations exclusively (Codex with its read-only sandbox, Gemini without auto-approval) to prevent any code modifications.
Implementation Phases
Phase 1: Core LLM Wrapper Agents
Create four new agents in agents/research/:
1. codex-brainstorm.md - GPT-5.1 for Brainstorming
---
name: codex-brainstorm
description: Use this agent to get GPT-5.1's perspective on planning, brainstorming, and architectural decisions. Read-only - no code modifications.
model: haiku
color: green
---
You are a coordinator that invokes the Codex CLI to get GPT-5.1's perspective on brainstorming and planning tasks.
**CRITICAL: This agent is READ-ONLY. It must NEVER modify files.**
## Invocation Pattern
Execute the Codex CLI in non-interactive mode with GPT-5.1 and high reasoning:
\`\`\`bash
codex exec \
-m gpt-5.1 \
--sandbox read-only \
-c model_reasoning_effort=high \
-o /tmp/codex-brainstorm-output.md \
"{prompt_with_context}"
\`\`\`
## Key Flags
- `-m gpt-5.1`: Uses the general-purpose GPT-5.1 model, chosen here for brainstorming
- `--sandbox read-only`: REQUIRED - prevents any file modifications
- `-c model_reasoning_effort=high`: Enables deep reasoning for quality analysis
- `-o <file>`: Captures the final response for parsing
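The invocation above can be wrapped so that every call pins the read-only flags and fails loudly instead of returning an empty answer. A minimal sketch, assuming `codex` is on `PATH` and `mktemp` is available; `codex_brainstorm` is a hypothetical name, not part of the Codex CLI:

```bash
# Hypothetical wrapper for the codex-brainstorm agent (the function name is
# illustrative). It always passes the read-only flags from this plan, writes
# the final message to a unique temp file, and prints that file.
codex_brainstorm() {
  local prompt="$1"
  local out
  # mktemp gives each call its own output file, so overlapping runs
  # cannot overwrite each other's results.
  out="$(mktemp "${TMPDIR:-/tmp}/codex-brainstorm.XXXXXX")" || return 1

  # Anything codex prints to stdout is discarded; only the -o file is kept.
  if ! codex exec \
      -m gpt-5.1 \
      --sandbox read-only \
      -c model_reasoning_effort=high \
      -o "$out" \
      "$prompt" >/dev/null; then
    echo "codex exec failed" >&2
    rm -f "$out"
    return 1
  fi

  # An empty final message is treated as a failure, not passed on silently.
  if [ ! -s "$out" ]; then
    echo "codex produced no final message" >&2
    rm -f "$out"
    return 1
  fi

  cat "$out"
  rm -f "$out"
}
```

Usage: `codex_brainstorm "Propose 5 approaches to implement X"` prints GPT-5.1's final message, or returns non-zero with a reason on stderr.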
## Use Cases
1. **Feature brainstorming**: "Propose 5 approaches to implement X"
2. **Architecture review**: "Analyze this architecture for potential issues"
3. **Trade-off analysis**: "Compare approach A vs B for this use case"
4. **Planning validation**: "Review this plan and identify gaps"
## Output Format
Parse the output file and present GPT-5.1's perspective with clear attribution:
---
### GPT-5.1 Perspective (via Codex)
{parsed_response}
---
2. gemini-brainstorm.md - Gemini 3 Pro for Brainstorming
---
name: gemini-brainstorm
description: Use this agent to get Gemini 3 Pro's perspective on planning, brainstorming, and architectural decisions. Read-only - no code modifications. Requires Google AI Ultra.
model: haiku
color: blue
---
You are a coordinator that invokes the Gemini CLI to get Gemini 3 Pro's perspective on brainstorming and planning tasks.
**CRITICAL: This agent is READ-ONLY. It must NEVER modify files.**
## Invocation Pattern
Execute the Gemini CLI in non-interactive mode with the highest-tier model:
\`\`\`bash
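# stderr is not merged into the pipe (no 2>&1): warning text there would corrupt the JSON that jq parses
gemini -m gemini-3-pro-preview \
-p "{prompt_with_context}" \
--output-format json | jq -r '.response'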
\`\`\`
## Key Flags
- `-m gemini-3-pro-preview`: Specifies Gemini 3 Pro (requires Ultra subscription)
- `-p "{prompt}"`: Non-interactive prompt mode
- `--output-format json`: Structured output for parsing
- NO `--yolo` or `--approval-mode=yolo`: Never auto-approve actions
**Note:** Requires Google AI Ultra subscription for Gemini 3 Pro access.
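Parsing is where this pattern tends to break: any non-JSON text on the piped stream makes the parser fail. A minimal sketch of a hypothetical `gemini_brainstorm` wrapper that keeps stderr in its own temp file and surfaces failures. It uses `python3` (as the multi-LLM agent's Gemini command does) instead of `jq`; the `response` and `error` field names are assumed from the Gemini CLI headless JSON output:

```bash
# Hypothetical wrapper for gemini-brainstorm. stderr is kept in its own temp
# file so warnings never reach the JSON parser. The "response"/"error" field
# names are an assumption about the Gemini CLI JSON output.
gemini_brainstorm() {
  local prompt="$1" err json
  err="$(mktemp "${TMPDIR:-/tmp}/gemini-err.XXXXXX")" || return 1
  if ! json="$(gemini -m gemini-3-pro-preview -p "$prompt" --output-format json 2>"$err")"; then
    echo "gemini exited with an error:" >&2
    cat "$err" >&2
    rm -f "$err"
    return 1
  fi
  rm -f "$err"
  # Print .response, or exit non-zero with the error message if it is missing.
  printf '%s' "$json" | python3 -c '
import json, sys
data = json.load(sys.stdin)
resp = data.get("response")
if not resp:
    err = data.get("error")
    msg = err.get("message") if isinstance(err, dict) else err
    sys.exit("gemini returned no response: %s" % (msg or "unknown error"))
print(resp)
'
}
```

Usage: `gemini_brainstorm "Compare approach A vs B for this use case"`.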
## Use Cases
1. **Feature brainstorming**: "Propose 5 approaches to implement X"
2. **Architecture review**: "Analyze this architecture for potential issues"
3. **Trade-off analysis**: "Compare approach A vs B for this use case"
4. **Planning validation**: "Review this plan and identify gaps"
## Output Format
Parse the JSON response and present Gemini's perspective with clear attribution:
---
### Gemini 3 Pro Perspective
{parsed_response}
---
3. codex-reviewer.md - GPT-5.1-Codex-Max for Code Review
---
name: codex-reviewer
description: Use this agent to get GPT-5.1-Codex-Max's perspective on code review with maximum reasoning. Read-only analysis only - no code modifications.
model: haiku
color: green
---
You are a coordinator that invokes the Codex CLI to get GPT-5.1-Codex-Max's code review perspective with maximum reasoning effort.
**CRITICAL: This agent is READ-ONLY. It must NEVER modify files.**
## Invocation Pattern
For code review, send the review instructions followed by the diff or file content to Codex on stdin, with MAXIMUM reasoning. The trailing `-` tells `codex exec` to read the whole prompt from stdin:
\`\`\`bash
# Review a PR diff (instructions first, then the diff)
{ printf '%s\n\n' "Review this code change for: security issues, performance problems, architectural concerns, and best practices violations. Provide specific line-level feedback."; git diff main...HEAD; } | codex exec \
-m gpt-5.1-codex-max \
--sandbox read-only \
-c model_reasoning_effort=xhigh \
-o /tmp/codex-review-output.md \
-
# Review specific files
{ printf '%s\n\n' "Review this code for quality, security, and performance issues."; cat {file_path}; } | codex exec \
-m gpt-5.1-codex-max \
--sandbox read-only \
-c model_reasoning_effort=xhigh \
-o /tmp/codex-review-output.md \
-
\`\`\`
## Review Focus Areas
1. Security vulnerabilities (injection, auth bypass, data exposure)
2. Performance issues (N+1 queries, memory leaks, inefficient algorithms)
3. Architectural concerns (coupling, cohesion, patterns)
4. Code quality (readability, maintainability, testing)
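`codex exec` reads its prompt from stdin when the prompt argument is `-`, so these focus areas and the diff can travel together as one prompt. A minimal sketch of a hypothetical `build_review_prompt` helper that assembles that prompt and refuses empty diffs:

```bash
# Hypothetical helper: reads a diff on stdin and prints the review
# instructions followed by that diff. It fails on an empty diff so that no
# reviewer is started when there is nothing to review.
build_review_prompt() {
  local diff
  diff="$(cat)"
  if [ -z "$diff" ]; then
    echo "nothing to review: empty diff" >&2
    return 1
  fi
  printf '%s\n' "Review this code change for: security issues (injection, auth bypass, data exposure), performance problems (N+1 queries, memory leaks, inefficient algorithms), architectural concerns (coupling, cohesion, patterns), and code quality (readability, maintainability, testing). Provide specific line-level feedback."
  printf '\n%s\n' "$diff"
}
```

For example, run `prompt="$(git diff main...HEAD | build_review_prompt)"` and, only if it succeeds, `printf '%s\n' "$prompt" | codex exec -m gpt-5.1-codex-max --sandbox read-only -c model_reasoning_effort=xhigh -o /tmp/codex-review-output.md -`.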
## Output Format
---
### GPT-5.1 Code Review (via Codex)
**Security Findings:**
{findings}
**Performance Findings:**
{findings}
**Architecture Findings:**
{findings}
**Quality Findings:**
{findings}
---
4. gemini-reviewer.md - Gemini 3 Pro for Code Review
---
name: gemini-reviewer
description: Use this agent to get Gemini 3 Pro's perspective on code review. Read-only analysis only - no code modifications. Requires Google AI Ultra.
model: haiku
color: blue
---
You are a coordinator that invokes the Gemini CLI to get Gemini 3 Pro's code review perspective.
**CRITICAL: This agent is READ-ONLY. It must NEVER modify files.**
## Invocation Pattern
For code review, pipe the diff or file content to Gemini 3 Pro:
\`\`\`bash
# Review a PR diff (stderr is not merged into the pipe, so warnings cannot corrupt the JSON that jq parses)
git diff main...HEAD | gemini -m gemini-3-pro-preview \
-p "Review this code change for: security issues, performance problems, architectural concerns, and best practices violations. Provide specific line-level feedback. Output as structured JSON with categories." \
--output-format json | jq -r '.response'
# Review specific files
cat {file_path} | gemini -m gemini-3-pro-preview \
-p "Review this code for quality, security, and performance issues." \
--output-format json | jq -r '.response'
\`\`\`
## Review Focus Areas
1. Security vulnerabilities (injection, auth bypass, data exposure)
2. Performance issues (N+1 queries, memory leaks, inefficient algorithms)
3. Architectural concerns (coupling, cohesion, patterns)
4. Code quality (readability, maintainability, testing)
## Output Format
---
### Gemini 3 Pro Code Review
**Security Findings:**
{findings}
**Performance Findings:**
{findings}
**Architecture Findings:**
{findings}
**Quality Findings:**
{findings}
---
Phase 2: Unified Multi-LLM Agent
Create a coordinator agent that can invoke multiple LLMs:
multi-llm-brainstorm.md
---
name: multi-llm-brainstorm
description: Use this agent to get perspectives from multiple LLMs (Claude, GPT-5.1, Gemini) on planning and brainstorming tasks. Synthesizes diverse AI viewpoints.
---
You are a Multi-LLM Coordinator that gathers perspectives from multiple AI models to provide comprehensive analysis.
## Available LLMs
1. **Claude** (current context) - Direct analysis
2. **GPT-5.1** (via Codex CLI) - Brainstorm: `-m gpt-5.1 -c model_reasoning_effort=high` | Review: `-m gpt-5.1-codex-max -c model_reasoning_effort=xhigh`
3. **Gemini 3 Pro** (via Gemini CLI) - `gemini -m gemini-3-pro-preview -p`
## Workflow
1. **Analyze the request** to understand what perspectives would be valuable
2. **Invoke external LLMs in parallel** for efficiency
3. **Synthesize findings** into a unified view highlighting:
- Points of agreement across models
- Unique insights from each model
- Areas of disagreement (with reasoning)
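Step 2 can be done with plain shell job control. A minimal sketch, assuming both CLIs are on `PATH`; `run_external_perspectives` and the file names inside `outdir` are hypothetical. It uses this agent's brainstorming commands, with Gemini in `text` output mode to avoid JSON parsing:

```bash
# Hypothetical sketch of workflow step 2: run Codex and Gemini concurrently,
# write each result into $outdir, and report failures per model.
run_external_perspectives() {
  local prompt="$1" outdir="$2"
  local codex_status=0 gemini_status=0 codex_pid gemini_pid

  # GPT-5.1 brainstorm in the background; its final message goes to codex.md.
  codex exec -m gpt-5.1 --sandbox read-only -c model_reasoning_effort=high \
    -o "$outdir/codex.md" "$prompt" >/dev/null 2>"$outdir/codex.err" &
  codex_pid=$!

  # Gemini 3 Pro in the background; the plain-text answer goes to gemini.md.
  gemini -m gemini-3-pro-preview -p "$prompt" --output-format text \
    >"$outdir/gemini.md" 2>"$outdir/gemini.err" &
  gemini_pid=$!

  # "wait PID" returns that job's exit status.
  wait "$codex_pid" || codex_status=$?
  wait "$gemini_pid" || gemini_status=$?

  if [ "$codex_status" -ne 0 ]; then
    echo "codex failed (exit $codex_status), see $outdir/codex.err" >&2
  fi
  if [ "$gemini_status" -ne 0 ]; then
    echo "gemini failed (exit $gemini_status), see $outdir/gemini.err" >&2
  fi

  # Succeed if at least one model answered, so synthesis can still run.
  [ "$codex_status" -eq 0 ] || [ "$gemini_status" -eq 0 ]
}
```

Call it with a fresh directory, for example `outdir="$(mktemp -d)"` followed by `run_external_perspectives "Propose 3 approaches to X" "$outdir"`, then read `codex.md` and `gemini.md` from that directory. Succeeding when only one model answers lets synthesis proceed on a partial set; requiring both is an equally reasonable policy.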
## Invocation Commands
### GPT-5.1 (Codex) - Brainstorming
\`\`\`bash
codex exec -m gpt-5.1 --sandbox read-only -c model_reasoning_effort=high -o /tmp/codex-output.md "{prompt}"
cat /tmp/codex-output.md
\`\`\`
### GPT-5.1-Codex-Max (Codex) - Code Review (Maximum Reasoning)
\`\`\`bash
codex exec -m gpt-5.1-codex-max --sandbox read-only -c model_reasoning_effort=xhigh -o /tmp/codex-output.md "{prompt}"
cat /tmp/codex-output.md
\`\`\`
### Gemini 3 Pro
\`\`\`bash
gemini -m gemini-3-pro-preview -p "{prompt}" --output-format json | python3 -c "import sys,json; print(json.load(sys.stdin)['response'])"
\`\`\`
## Output Format
---
## Multi-LLM Analysis
### Claude's Perspective
{direct_analysis}
### GPT-5.1's Perspective (via Codex)
{codex_output}
### Gemini 3 Pro's Perspective
{gemini_output}
### Synthesis
**Points of Agreement:**
- {common_findings}
**Unique Insights:**
- Claude: {unique}
- GPT-5.1: {unique}
- Gemini: {unique}
**Disagreements:**
- {topic}: Claude says X, GPT-5.1 says Y, Gemini says Z
### Recommended Approach
{synthesized_recommendation}
---
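The per-model sections of this template can be filled mechanically before the coordinator writes the synthesis. A minimal sketch with hypothetical `emit_section` and `assemble_report` functions that take one response file per model; the Synthesis and Recommended Approach sections are left to the coordinator:

```bash
# Hypothetical helper: prints one "### title" section from a response file,
# with a placeholder when the model produced nothing.
emit_section() {
  local title="$1" file="$2"
  printf '### %s\n' "$title"
  if [ -s "$file" ]; then
    cat "$file"
    printf '\n'
  else
    printf '(no response)\n'
  fi
}

# Hypothetical helper: the perspective part of the Multi-LLM Analysis layout.
assemble_report() {
  local claude_file="$1" codex_file="$2" gemini_file="$3"
  printf -- '---\n## Multi-LLM Analysis\n'
  emit_section "Claude's Perspective" "$claude_file"
  emit_section "GPT-5.1's Perspective (via Codex)" "$codex_file"
  emit_section "Gemini 3 Pro's Perspective" "$gemini_file"
}
```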
Phase 3: Skills for Direct CLI Access
Create skills in skills/ for users who want direct CLI access:
skills/codex-cli/SKILL.md
# Codex CLI Skill
Invoke OpenAI's Codex CLI for GPT-5.1 analysis in non-interactive mode.
## Quick Commands
### Brainstorming (GPT-5.1 + High Reasoning)
\`\`\`bash
codex exec -m gpt-5.1 --sandbox read-only -c model_reasoning_effort=high -o output.md "Your brainstorming prompt here"
\`\`\`
### Code Review (GPT-5.1-Codex-Max + Maximum Reasoning)
\`\`\`bash
git diff main...HEAD | codex exec -m gpt-5.1-codex-max --sandbox read-only -c model_reasoning_effort=xhigh -o review.md "Review this code for issues"
\`\`\`
### Architecture Analysis (GPT-5.1-Codex-Max + Maximum Reasoning)
\`\`\`bash
codex exec -m gpt-5.1-codex-max --sandbox read-only -c model_reasoning_effort=xhigh -o analysis.md "Analyze the architecture in this codebase"
\`\`\`
## Safety Notes
- ALWAYS use `--sandbox read-only` to prevent modifications
- NEVER use `--full-auto` or `--dangerously-bypass-approvals-and-sandbox`
- Output is for analysis only, not execution
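These safety notes can be enforced by a wrapper rather than by convention. A minimal sketch with hypothetical `check_codex_args` and `safe_codex` functions: they refuse to call `codex` unless `--sandbox read-only` (also accepted as `--sandbox=read-only`, or `-s read-only`, assuming `-s` is the short form) is present and neither forbidden flag appears. Only the flags named in this plan are checked, not `-c` config overrides:

```bash
# Hypothetical guard for the codex-cli skill: refuses to call codex unless the
# read-only sandbox is requested and no forbidden flag is present.
check_codex_args() {
  local arg prev="" has_readonly=0
  for arg in "$@"; do
    case "$arg" in
      --full-auto|--dangerously-bypass-approvals-and-sandbox)
        echo "forbidden codex flag: $arg" >&2
        return 1 ;;
      --sandbox=*)
        if [ "$arg" = "--sandbox=read-only" ]; then
          has_readonly=1
        else
          echo "sandbox must be read-only, got: $arg" >&2
          return 1
        fi ;;
    esac
    # Value following "--sandbox" or its assumed short form "-s".
    if [ "$prev" = "--sandbox" ] || [ "$prev" = "-s" ]; then
      if [ "$arg" = "read-only" ]; then
        has_readonly=1
      else
        echo "sandbox must be read-only, got: $arg" >&2
        return 1
      fi
    fi
    prev="$arg"
  done
  if [ "$has_readonly" -ne 1 ]; then
    echo "missing --sandbox read-only" >&2
    return 1
  fi
}

# Run codex only after the arguments pass the guard.
safe_codex() {
  check_codex_args "$@" || return 1
  codex "$@"
}
```

Usage: `safe_codex exec -m gpt-5.1 --sandbox read-only -c model_reasoning_effort=high -o output.md "Your brainstorming prompt here"`.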
skills/gemini-cli/SKILL.md
# Gemini CLI Skill
Invoke Google's Gemini CLI for Gemini 3 Pro analysis in non-interactive mode. Requires Google AI Ultra subscription.
## Quick Commands
### Brainstorming
\`\`\`bash
gemini -m gemini-3-pro-preview -p "Your brainstorming prompt here" --output-format text
\`\`\`
### Code Review
\`\`\`bash
git diff main...HEAD | gemini -m gemini-3-pro-preview -p "Review this code for issues" --output-format text
\`\`\`
### Architecture Analysis
\`\`\`bash
gemini -m gemini-3-pro-preview -p "Analyze the architecture in this codebase" --output-format text
\`\`\`
## Safety Notes
- NEVER use `--yolo` or `--approval-mode=yolo`
- Default approval mode prevents any file modifications
- Output is for analysis only, not execution
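The same kind of wrapper works for Gemini. A minimal sketch with hypothetical `check_gemini_args` and `safe_gemini` functions. It is deliberately stricter than the notes above: any `--approval-mode` value other than `default` is rejected, and `-y` is rejected on the assumption that it is the short form of `--yolo`:

```bash
# Hypothetical guard for the gemini-cli skill: rejects auto-approval flags
# before gemini is invoked.
check_gemini_args() {
  local arg prev=""
  for arg in "$@"; do
    case "$arg" in
      --yolo|-y)
        echo "forbidden gemini flag: $arg" >&2
        return 1 ;;
      --approval-mode=*)
        if [ "$arg" != "--approval-mode=default" ]; then
          echo "only the default approval mode is allowed: $arg" >&2
          return 1
        fi ;;
    esac
    # Value following a separate "--approval-mode" argument.
    if [ "$prev" = "--approval-mode" ] && [ "$arg" != "default" ]; then
      echo "only the default approval mode is allowed: $arg" >&2
      return 1
    fi
    prev="$arg"
  done
}

# Run gemini only after the arguments pass the guard.
safe_gemini() {
  check_gemini_args "$@" || return 1
  gemini "$@"
}
```

Usage: `safe_gemini -m gemini-3-pro-preview -p "Your brainstorming prompt here" --output-format text`.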
Phase 4: Integration into Existing Workflows
Update /plan Command
Add optional multi-LLM brainstorming to the plan workflow:
### Optional: Multi-LLM Brainstorming
<thinking>
For complex features, getting diverse AI perspectives can reveal blind spots and alternative approaches.
</thinking>
If the feature is complex or architecturally significant, optionally run:
- Task multi-llm-brainstorm(feature_description)
This will gather perspectives from Claude, GPT-5.1, and Gemini 3 Pro, synthesizing their recommendations.
Update /review Command
Add optional external LLM review step:
### Optional: External LLM Review
For critical PRs or when a second opinion is valuable:
<parallel_tasks>
Run external LLM reviewers in parallel with existing agents:
- Task codex-reviewer(PR diff)
- Task gemini-reviewer(PR diff)
</parallel_tasks>
Synthesize external findings with internal agent findings in the final report.
Acceptance Criteria
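Functional Requirements
- `codex-brainstorm` agent can invoke Codex CLI and capture output
- `gemini-brainstorm` agent can invoke Gemini CLI and capture output
- `codex-reviewer` agent can review code via Codex CLI
- `gemini-reviewer` agent can review code via Gemini CLI
- `multi-llm-brainstorm` agent coordinates all three LLMs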
Non-Functional Requirements
Quality Gates
Technical Details
Affected Files
New Files:
agents/research/codex-brainstorm.md
agents/research/gemini-brainstorm.md
agents/research/codex-reviewer.md
agents/research/gemini-reviewer.md
agents/research/multi-llm-brainstorm.md
skills/codex-cli/SKILL.md
skills/gemini-cli/SKILL.md
Modified Files:
commands/workflows/plan.md - Add optional multi-LLM step
commands/workflows/review.md - Add optional external review step
CLAUDE.md - Document new agents and capabilities
local_manifest.yml - Add new local files
CLI Requirements
| CLI | Version | Model | Reasoning | Purpose |
|---|---|---|---|---|
| Codex | 0.63.0+ | gpt-5.1 | high | Brainstorming |
| Codex | 0.63.0+ | gpt-5.1-codex-max | xhigh (maximum) | Code review |
| Gemini | 0.18.4+ | gemini-3-pro-preview | Built-in (Gemini 3 Pro) | Brainstorming, code review |
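The minimum versions can be checked before the agents run. A minimal sketch, assuming a `sort` that supports `-V` (version sort) and that each CLI's `--version` output contains an x.y.z number; all function names are hypothetical:

```bash
# version_ge A B succeeds when version A >= version B (relies on sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Pull the first x.y.z number out of a --version string.
extract_version() {
  printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Check one CLI against its minimum version from the table.
check_min_version() {
  local name="$1" min="$2" raw ver
  raw="$("$name" --version 2>/dev/null)" || { echo "$name: not runnable" >&2; return 1; }
  ver="$(extract_version "$raw")"
  if [ -z "$ver" ] || ! version_ge "$ver" "$min"; then
    echo "$name: need >= $min, found '${ver:-unknown}'" >&2
    return 1
  fi
  echo "$name $ver OK (>= $min)"
}
```

Usage: `check_min_version codex 0.63.0 && check_min_version gemini 0.18.4`.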
Safety Constraints
Codex CLI:
- MUST use `--sandbox read-only`
- MUST NOT use `--full-auto`
- MUST NOT use `--dangerously-bypass-approvals-and-sandbox`
Gemini CLI:
- MUST NOT use `--yolo` or `--approval-mode=yolo`
- The default approval mode (file-modifying actions require approval) is sufficient, since the prompts never request actions
Verified CLI Authentication (Tested 2025-12-02)
All configurations verified working with current logged-in credentials:
| CLI | Model | Reasoning | Status | Test Output |
|---|---|---|---|---|
| Codex | gpt-5.1 | high | ✅ Working | Auth/session best practices response |
| Codex | gpt-5.1-codex-max | xhigh | ✅ Working | Security review checklist response |
| Gemini | gemini-3-pro-preview | Built-in | ✅ Working | Database normalization response |
Test Commands Used
# Codex brainstorming (GPT-5.1 + high)
codex exec -m gpt-5.1 --sandbox read-only -c model_reasoning_effort=high \
-o /tmp/test.txt "In 2 sentences, what's a good approach for adding authentication?"
# Codex code review (GPT-5.1-Codex-Max + xhigh)
codex exec -m gpt-5.1-codex-max --sandbox read-only -c model_reasoning_effort=xhigh \
-o /tmp/test.txt "In 2 sentences, what should I look for in a security code review?"
# Gemini 3 Pro
gemini -m gemini-3-pro-preview -p "In 2 sentences, what's a good database design principle?" \
--output-format text
Note: Gemini shows an IDE extension warning that can be ignored - CLI functions correctly.
Model Availability Notes
Based on testing:
- Codex CLI: Defaults to `gpt-5.1-codex-max`; high reasoning is enabled via `-c model_reasoning_effort=high`. Premium models like `o3` require API key authentication.
- Gemini CLI: With a Google AI Ultra subscription, `gemini-3-pro-preview` is available for the highest-quality reasoning and analysis.
The plan uses the highest-tier models with appropriate reasoning levels:
- Codex brainstorming: `gpt-5.1` + `model_reasoning_effort=high`
- Codex code review: `gpt-5.1-codex-max` + `model_reasoning_effort=xhigh` (maximum)
- Gemini: `gemini-3-pro-preview` (requires Ultra subscription)
Resources
Documentation
Related Work
- Existing `gpt-5` agent pattern in the CE workflow
- Multi-agent parallel execution in the `/review` command