Feature hasn't been suggested before.
Describe the enhancement you want to request
Problem
Benchmarks for LLM providers and models are biased: existing evaluations (HumanEval, provider-reported metrics) don't reflect real-world coding performance. I want to build a blind-testing plugin that:
- Randomly assigns models from a pool at session start
- Hides model identity during use
- Prompts users to rate after the session
- Reveals the model only after rating
- Collects unbiased benchmark data
This enables organizations and open-source contributors to compare LLMs for coding without brand bias.
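As a sketch of the rate-then-reveal flow described above (all names here are illustrative, not actual OpenCode APIs): the real model identity is attached to a rating only after the score is recorded, so the score cannot be influenced by brand.

```typescript
// Hypothetical sketch of the rate-then-reveal flow; names are illustrative.
type Rating = { sessionID: string; label: string; score: number; revealedModel?: string }

// Blind assignments made at session start (session ID -> real model).
const assignments = new Map<string, string>([["session-1", "provider-x/model-y"]])

// The model identity is filled in only after the score is set.
function rateAndReveal(sessionID: string, score: number): Rating {
  const rating: Rating = { sessionID, label: "Model A", score }
  rating.revealedModel = assignments.get(sessionID) // reveal happens last
  return rating
}

const rating = rateAndReveal("session-1", 4)
```

The key property is ordering: the user sees only the blind label ("Model A") until the score is committed.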
Proposal
Add a `chat.model` plugin hook to override model selection:

```typescript
"chat.model"?: (
  input: { sessionID; agent; model; provider; message },
  output: {
    model?: { providerID: string; modelID: string }
    displayModel?: string
  }
) => Promise<void>
```
- `model`: override the actual model used
- `displayModel`: custom name shown in the UI (e.g., "Model A")

This follows existing patterns like `chat.params` and `chat.headers`.
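A blind-test plugin built on this hook might look like the following sketch. Only the hook signature comes from the proposal; the surrounding plugin object shape and the model pool are assumptions for illustration.

```typescript
// Sketch of a blind-test plugin using the proposed "chat.model" hook.
// The plugin object shape and pool contents are assumptions.
type Model = { providerID: string; modelID: string }

const pool: Model[] = [
  { providerID: "provider-a", modelID: "model-1" },
  { providerID: "provider-b", modelID: "model-2" },
]

// One random assignment per session, stable across messages in that session.
const assignments = new Map<string, Model>()

const blindTestPlugin = {
  "chat.model": async (
    input: { sessionID: string },
    output: { model?: Model; displayModel?: string },
  ): Promise<void> => {
    let assigned = assignments.get(input.sessionID)
    if (!assigned) {
      assigned = pool[Math.floor(Math.random() * pool.length)]
      assignments.set(input.sessionID, assigned)
    }
    output.model = assigned
    output.displayModel = "Model A" // identity stays hidden until after rating
  },
}
```

Keeping the assignment in a per-session map ensures every message in a session goes to the same model, which a blind comparison requires.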
Why OpenCode Benefits
- Zen's mission - OpenCode Zen aims to benchmark the best models. Real-world blind test data would improve recommendations.
- Public benchmark dashboard - OpenCode could host a leaderboard at opencode.ai/benchmark showing unbiased model rankings from community contributions. Differentiates from competitors.
- Thought leadership - "Most comprehensive real-world LLM benchmark for coding" drives blog posts, press, and adoption.
- Generic utility - Beyond benchmarking, the hook enables provider failover, A/B testing, and enterprise model governance.
Implementation
Small scope:
- `packages/plugin/src/index.ts`: add the hook type
- `packages/opencode/src/session/llm.ts`: trigger the hook before the LLM call
- `packages/opencode/src/session/message-v2.ts`: add a `displayModel` field
- TUI components: show `displayModel` when present
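The trigger in `session/llm.ts` could be as small as the following sketch. The `Plugin` and function shapes here are assumptions about the internals, not OpenCode's real types; the intent is just to show the hook running before model resolution, with a fallback to the session default.

```typescript
// Assumed sketch of triggering the hook before the LLM call.
type Model = { providerID: string; modelID: string }
type HookInput = { sessionID: string }
type HookOutput = { model?: Model; displayModel?: string }
type Plugin = { "chat.model"?: (input: HookInput, output: HookOutput) => Promise<void> }

async function resolveModel(
  sessionID: string,
  defaultModel: Model,
  plugins: Plugin[],
): Promise<{ model: Model; displayModel?: string }> {
  const output: HookOutput = {}
  // Later plugins may overwrite earlier ones, mirroring chat.params behavior.
  for (const plugin of plugins) {
    await plugin["chat.model"]?.({ sessionID }, output)
  }
  return { model: output.model ?? defaultModel, displayModel: output.displayModel }
}
```

If no plugin sets `output.model`, behavior is unchanged, so the hook is backward compatible.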
Discussion
- Is the hook signature appropriate?
- Is there interest in OpenCode hosting a public benchmark dashboard?
- I'm willing to implement the core changes and build the blind test plugin.