
Feature Request: Agent Arena - Multi-Model Competitive Execution #1814

@tanzhenxin

Description


What would you like to be added?

Agent Arena is a competitive execution feature that allows users to dispatch multiple AI models simultaneously to execute the same task. Users can observe how different models perform on identical tasks, compare their solutions, and select the best result to apply to their main workspace.

Why is this needed?

Current Pain Points

  1. Model selection difficulty: Users configure multiple model providers but are unsure which model is best for specific task types
  2. Lack of horizontal comparison: Unable to intuitively compare performance differences between different models on the same task
  3. Single point of failure: Relying on only one model may lead to suboptimal solutions for specific problem types

Expected Value

  1. Model benchmarking: Evaluate different models' capabilities in actual work scenarios
  2. Best solution selection: Pick the optimal implementation from multiple solutions
  3. Learning and insights: Observe different models' reasoning styles and problem-solving approaches
  4. Improved reliability: Reduce error risks from single models through multi-model validation

Core Requirements

1. User Entry Point

Provide a slash command interface for users to launch Agent Arena.

  • `/arena --models model1,model2 "task description"` to start a new session with the specified models
  • Support selecting models from configured providers
  • Allow users to specify the task description for all competing Agents
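The command shape above could be parsed with a small helper. This is an illustrative sketch only; `ArenaRequest` and `parseArenaCommand` are hypothetical names, not existing qwen-code APIs, and the 2–5 model bound mirrors the max-5 limit proposed below.

```typescript
// Hypothetical parser for the proposed /arena slash command.
interface ArenaRequest {
  models: string[]; // competing models, from --models
  task: string;     // shared task description
}

function parseArenaCommand(input: string): ArenaRequest {
  // Expected shape: /arena --models model1,model2 "task description"
  const match = input.match(/^\/arena\s+--models\s+(\S+)\s+"([^"]+)"\s*$/);
  if (!match) {
    throw new Error('Usage: /arena --models model1,model2 "task description"');
  }
  const models = match[1].split(',').map((m) => m.trim()).filter(Boolean);
  if (models.length < 2 || models.length > 5) {
    throw new Error('Specify between 2 and 5 models');
  }
  return { models, task: match[2] };
}
```

A stricter implementation would validate each model name against the configured providers before launching any agents.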

2. Multi-Agent Parallel Execution

Run multiple independent Agents simultaneously, each using a different model configuration.

  • Support launching N Agents simultaneously (N specified by user or auto-determined by configured models, max 5)
  • Each Agent is a complete Main Agent-level instance (not a restricted Subagent)
  • Agents are completely independent with no shared state
  • Support individual Agent early completion or failure without affecting other Agents
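The independence requirements above map naturally onto `Promise.allSettled`, which lets each agent finish or fail without affecting its competitors. A minimal sketch, assuming a hypothetical `runAgent` callback that drives one model's session:

```typescript
// Illustrative sketch of independent parallel agent runs.
// AgentOutcome and runAgent are hypothetical names for this proposal.
interface AgentOutcome {
  model: string;
  status: 'success' | 'failure';
  output: string;
}

async function runArena(
  models: string[],
  runAgent: (model: string) => Promise<string>,
): Promise<AgentOutcome[]> {
  // allSettled (unlike Promise.all) never short-circuits: one agent's
  // failure or early completion leaves the others running untouched.
  const settled = await Promise.allSettled(models.map((m) => runAgent(m)));
  return settled.map((result, i): AgentOutcome => ({
    model: models[i],
    status: result.status === 'fulfilled' ? 'success' : 'failure',
    output: result.status === 'fulfilled' ? result.value : String(result.reason),
  }));
}
```

Because the agents share no state, each `runAgent` call would internally construct its own model client, conversation history, and worktree-scoped toolset.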

3. Environment Isolation (Git Worktree)

Each Agent must run in a completely isolated environment to prevent interference.

  • Use `git worktree` to create independent working directories for each competing model
  • Worktrees should be created in a unified management location (e.g., `~/.qwen/arena/<session-id>/<model-name>/`)
  • Support automatic cleanup of worktrees after Agent Arena completion, or retention for user inspection
  • All file operations (read, write, edit) by each Agent are restricted to their worktree scope
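The isolation scheme above amounts to one `git worktree add` per model on a dedicated branch, plus a matching cleanup command. A sketch under those assumptions; `buildWorktreePlan` and the `arena/<session>/<model>` branch naming are hypothetical, while the `~/.qwen/arena/...` layout comes from the proposal:

```typescript
import * as path from 'node:path';
import * as os from 'node:os';

// Hypothetical helper describing one agent's isolated environment.
interface WorktreePlan {
  dir: string;          // isolated working directory for this agent
  createArgs: string[]; // arguments for `git worktree add`
  removeArgs: string[]; // arguments for cleanup after the session
}

function buildWorktreePlan(sessionId: string, model: string, baseBranch: string): WorktreePlan {
  const dir = path.join(os.homedir(), '.qwen', 'arena', sessionId, model);
  return {
    dir,
    // Each agent gets its own branch, so competing edits never collide
    // and the winning branch can later be merged into the main workspace.
    createArgs: ['worktree', 'add', '-b', `arena/${sessionId}/${model}`, dir, baseBranch],
    removeArgs: ['worktree', 'remove', '--force', dir],
  };
}
```

Restricting each agent's file tools to `plan.dir` then enforces the last bullet: an agent physically cannot touch another agent's worktree or the main checkout.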

4. TUI Multi-Agent Display

Provide flexible ways to visualize and interact with multiple running Agents.

  • Display progress indicators for all running Agents (status, elapsed time, etc.)
  • Support two display modes:
    • In-process Mode: Within a single terminal window, allow users to switch between different Agents to view their execution progress and interact with them
    • Split-pane Mode: Display each Agent's execution in separate terminal windows/panes (e.g., tmux panes) for side-by-side comparison
  • Allow users to interact with any Agent (send input, interrupt, etc.) regardless of display mode
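For the in-process mode, the per-agent progress indicator could be as simple as one formatted line per agent, with a marker showing which agent currently receives user input. All names here (`AgentView`, `formatStatusLine`) are illustrative:

```typescript
// Hypothetical one-line progress indicator for the in-process display mode.
type AgentStatus = 'running' | 'done' | 'failed';

interface AgentView {
  model: string;
  status: AgentStatus;
  elapsedMs: number;
}

function formatStatusLine(view: AgentView, focused: boolean): string {
  const icon = { running: '…', done: '✓', failed: '✗' }[view.status];
  const seconds = (view.elapsedMs / 1000).toFixed(1);
  // The '>' marker flags the agent that currently receives keyboard input.
  return `${focused ? '>' : ' '} ${icon} ${view.model.padEnd(16)} ${seconds}s`;
}
```

The split-pane mode would skip this rendering entirely and instead delegate layout to the multiplexer (e.g., one tmux pane per agent process).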

5. Result Comparison and Selection

After Agent Arena completes, allow users to compare outcomes and select preferred solutions.

  • Outcome Summary: See a brief summary of each Agent's result (success/failure, key output)
  • Execution Metrics: View execution statistics for each Agent (completion status, elapsed time, etc.)
  • Solution Selection: Choose one Agent's solution to apply/merge into the main workspace
  • Workspace Management: Choose to preserve worktrees for further inspection or clean them up after selection
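Because each agent works on its own branch (as sketched in the worktree section), "apply the winner" can reduce to ordinary git commands against the main workspace. A hedged sketch; `applySelectionArgs` and the squash-merge strategy are one possible design, not a settled decision:

```typescript
// Hypothetical helper: git argument lists that apply the selected
// agent's branch to the main workspace as a single reviewable change.
function applySelectionArgs(sessionId: string, model: string): string[][] {
  const branch = `arena/${sessionId}/${model}`;
  return [
    // --squash collapses the winning agent's commits into staged changes,
    // keeping the arena's internal history out of the main branch.
    ['merge', '--squash', branch],
    ['commit', '-m', `arena: apply solution from ${model}`],
  ];
}
```

A cherry-pick or patch-based (`git diff | git apply`) strategy would work equally well; squash-merge is shown because it preserves the solution as one unit regardless of how many commits the agent made.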

Additional context

Acceptance Criteria

  • User can launch Agent Arena with multiple models using `/arena --models model1,model2 "task description"`
  • Each Agent runs in isolated Git worktree with no interference
  • Main UI shows progress indicators for all Agents
  • User can switch between Agents to view their execution progress and interact with them
  • Agent Arena session can be cleaned up or preserved after completion
  • Results from different Agents can be compared and selected

Key Differences vs. Related Features

|  | Agent Arena | Agent Team | Agent Swarm |
| --- | --- | --- | --- |
| Goal | Competitive: find the best solution to the same task | Collaborative: tackle different aspects together | Parallel processing: dynamically spawn workers for batch tasks |
| Entry Point | `/arena` slash command with explicit `--models` | Natural language request describing task and team | Tool call during execution (`spawn_swarm_worker`) |
| Agent Creation | Pre-configured models compete | Teammates dynamically created with roles | Workers created on the fly, no pre-definition |
| Relationship | Agents compete; user selects winner | Agents collaborate; lead synthesizes results | Workers execute independently; parent aggregates |
| Communication | No agent-to-agent communication | Direct peer-to-peer messaging between teammates | One-way: results aggregated by parent |
| Coordination | Parallel execution with final selection | Self-coordination via shared task list | Parent manages spawning and result collection |
| Context | Completely isolated (separate worktrees) | Independent sessions with shared task list | Lightweight ephemeral context per worker |
| Lifecycle | Session-based with comparison phase | Persistent team with ongoing collaboration | Ephemeral: spawn → execute → return → cleanup |
| Output | One selected solution applied to workspace | Synthesized results from multiple perspectives | Aggregated results from parallel batch processing |
| Best for | Benchmarking, choosing between model approaches | Research, complex collaboration, cross-layer work | Batch operations, data processing, map-reduce tasks |
