### What would you like to be added?
Agent Arena is a competitive execution feature that allows users to dispatch multiple AI models simultaneously to execute the same task. Users can observe how different models perform on identical tasks, compare their solutions, and select the best result to apply to their main workspace.
### Why is this needed?

#### Current Pain Points
- Model selection difficulty: Users configure multiple model providers but are unsure which model is best for specific task types
- Lack of horizontal comparison: Unable to intuitively compare performance differences between different models on the same task
- Single point of failure: Relying on only one model may lead to suboptimal solutions for specific problem types
#### Expected Value
- Model benchmarking: Evaluate different models' capabilities in actual work scenarios
- Best solution selection: Pick the optimal implementation from multiple solutions
- Learning and insights: Observe different models' reasoning styles and problem-solving approaches
- Improved reliability: Reduce error risks from single models through multi-model validation
### Core Requirements

#### 1. User Entry Point
Provide a slash command interface for users to launch Agent Arena.
- `/arena --models model1,model2 "task description"` to start a new session with the specified models
- Support selecting models from configured providers
- Allow users to specify the task description for all competing Agents
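A minimal sketch of how the proposed command could be parsed. The `ArenaRequest` shape, the function name, and the exact option grammar are illustrative assumptions, not an existing API:

```typescript
// Hypothetical parser for: /arena --models model1,model2 "task description"
interface ArenaRequest {
  models: string[];
  task: string;
}

const MAX_AGENTS = 5; // cap from the parallel-execution requirement below

function parseArenaCommand(input: string): ArenaRequest {
  const match = input.match(/^\/arena\s+--models\s+(\S+)\s+"([^"]+)"\s*$/);
  if (!match) {
    throw new Error('Usage: /arena --models model1,model2 "task description"');
  }
  const models = match[1].split(',').map((m) => m.trim()).filter(Boolean);
  if (models.length === 0 || models.length > MAX_AGENTS) {
    throw new Error(`Between 1 and ${MAX_AGENTS} models must be specified`);
  }
  return { models, task: match[2] };
}
```

Validating the model list at parse time keeps the cap (max 5) enforceable before any worktrees are created.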
#### 2. Multi-Agent Parallel Execution
Run multiple independent Agents simultaneously, each using a different model configuration.
- Support launching N Agents simultaneously (N specified by user or auto-determined by configured models, max 5)
- Each Agent is a complete Main Agent-level instance (not a restricted Subagent)
- Agents are completely independent with no shared state
- Support individual Agent early completion or failure without affecting other Agents
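The independence requirement maps naturally onto `Promise.allSettled`: each agent can finish or fail without affecting the others. The sketch below assumes a `runAgent` callback standing in for the real agent loop:

```typescript
// Each agent runs to completion or failure independently; no shared state.
type AgentResult =
  | { model: string; status: 'succeeded'; output: string }
  | { model: string; status: 'failed'; error: string };

async function runArena(
  models: string[],
  runAgent: (model: string) => Promise<string>, // placeholder for the real agent loop
): Promise<AgentResult[]> {
  // allSettled (unlike Promise.all) never rejects early, so one agent's
  // failure cannot cancel or mask the others' results.
  const settled = await Promise.allSettled(models.map((m) => runAgent(m)));
  return settled.map((result, i) =>
    result.status === 'fulfilled'
      ? { model: models[i], status: 'succeeded', output: result.value }
      : { model: models[i], status: 'failed', error: String(result.reason) },
  );
}
```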
#### 3. Environment Isolation (Git Worktree)
Each Agent must run in a completely isolated environment to prevent interference.
- Use `git worktree` to create independent working directories for each competing model
- Worktrees should be created in a unified management location (e.g., `~/.qwen/arena/<session-id>/<model-name>/`)
- Support automatic cleanup of worktrees after Agent Arena completion, or retention for user inspection
- All file operations (read, write, edit) by each Agent are restricted to their worktree scope
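One possible shape for the worktree layout, assuming the example location above and a hypothetical `arena/<session-id>/<model>` branch-naming scheme. The helpers only build paths and git command strings; the orchestrator would execute them (e.g., via `child_process`):

```typescript
import * as os from 'node:os';
import * as path from 'node:path';

// Unified management location: ~/.qwen/arena/<session-id>/<model-name>/
function worktreePath(sessionId: string, model: string): string {
  return path.join(os.homedir(), '.qwen', 'arena', sessionId, model);
}

// Build the git invocations for creating and cleaning up one agent's worktree.
function worktreeCommands(baseBranch: string, sessionId: string, model: string) {
  const dir = worktreePath(sessionId, model);
  return {
    // -b creates a dedicated branch so each agent's commits stay separate
    create: `git worktree add -b arena/${sessionId}/${model} ${dir} ${baseBranch}`,
    cleanup: `git worktree remove --force ${dir}`,
  };
}
```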
#### 4. TUI Multi-Agent Display
Provide flexible ways to visualize and interact with multiple running Agents.
- Display progress indicators for all running Agents (status, elapsed time, etc.)
- Support two display modes:
  - In-process Mode: within a single terminal window, allow users to switch between different Agents to view their execution progress and interact with them
  - Split-pane Mode: display each Agent's execution in separate terminal windows/panes (e.g., tmux panes) for side-by-side comparison
- Allow users to interact with any Agent (send input, interrupt, etc.) regardless of display mode
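A minimal sketch of the progress indicator (one line per agent with status and elapsed time); the `AgentStatus` shape is an assumption, and a real TUI would redraw this region on a timer rather than return strings:

```typescript
interface AgentStatus {
  model: string;
  state: 'running' | 'done' | 'failed';
  startedAt: number;   // epoch ms
  finishedAt?: number; // set once the agent completes or fails
}

// Render one status line per agent: icon, model name, state, elapsed time.
function renderStatusLines(agents: AgentStatus[], now: number): string[] {
  return agents.map((a) => {
    const elapsedSec = Math.round(((a.finishedAt ?? now) - a.startedAt) / 1000);
    const icon = a.state === 'running' ? '⠋' : a.state === 'done' ? '✔' : '✖';
    return `${icon} ${a.model.padEnd(20)} ${a.state.padEnd(8)} ${elapsedSec}s`;
  });
}
```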
#### 5. Result Comparison and Selection
After Agent Arena completes, allow users to compare outcomes and select preferred solutions.
- **Outcome Summary**: See a brief summary of each Agent's result (success/failure, key output)
- **Execution Metrics**: View execution statistics for each Agent (completion status, elapsed time, etc.)
- **Solution Selection**: Choose one Agent's solution to apply/merge into the main workspace
- **Workspace Management**: Choose to preserve worktrees for further inspection or clean them up after selection
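One simple way to apply the winner's solution is to diff its branch against the base and apply that patch in the main checkout. The sketch below only builds command strings, and the `arena/<session-id>/<model>` branch scheme and patch path are illustrative assumptions:

```typescript
// Hypothetical command sequence for applying the winning agent's changes
// back to the main workspace, then cleaning up its worktree and branch.
function applyWinnerCommands(sessionId: string, model: string, baseBranch: string): string[] {
  const branch = `arena/${sessionId}/${model}`;
  return [
    // Three-dot diff: only the winner's changes since it diverged from base
    `git diff ${baseBranch}...${branch} > /tmp/arena-${sessionId}.patch`,
    `git apply /tmp/arena-${sessionId}.patch`,
    `git worktree remove --force ~/.qwen/arena/${sessionId}/${model}`,
    `git branch -D ${branch}`,
  ];
}
```

If the user chooses to preserve worktrees for inspection, the last two cleanup commands would simply be skipped.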
### Additional context

#### Acceptance Criteria

#### Key Differences vs. Related Features
| | Agent Arena | Agent Team | Agent Swarm |
|---|---|---|---|
| **Goal** | Competitive: find the best solution to the same task | Collaborative: tackle different aspects together | Parallel processing: dynamically spawn workers for batch tasks |
| **Entry Point** | `/arena` slash command with explicit `--models` | Natural language request describing task and team | Tool call during execution (`spawn_swarm_worker`) |
| **Agent Creation** | Pre-configured models compete | Teammates dynamically created with roles | Workers created on-the-fly, no pre-definition |
| **Relationship** | Agents compete; user selects winner | Agents collaborate; lead synthesizes results | Workers execute independently; parent aggregates |
| **Communication** | No agent-to-agent communication | Direct peer-to-peer messaging between teammates | One-way: results aggregated by parent |
| **Coordination** | Parallel execution with final selection | Self-coordination via shared task list | Parent manages spawning and result collection |
| **Context** | Completely isolated (separate worktrees) | Independent sessions with shared task list | Lightweight ephemeral context per worker |
| **Lifecycle** | Session-based with comparison phase | Persistent team with ongoing collaboration | Ephemeral: spawn → execute → return → cleanup |
| **Output** | One selected solution applied to workspace | Synthesized results from multiple perspectives | Aggregated results from parallel batch processing |
| **Best for** | Benchmarking, choosing between model approaches | Research, complex collaboration, cross-layer work | Batch operations, data processing, map-reduce tasks |