[FEATURE]: chat.model hook for blind LLM benchmarking feat(plugin) #16932

@igor-voloc

Description

Feature hasn't been suggested before.

  • I have verified this feature I'm about to request hasn't been suggested before.

Describe the enhancement you want to request

Problem
Benchmarks for LLM providers and models are often biased: existing evaluations (HumanEval, provider-reported metrics) don't reflect real-world coding performance. I want to build a blind-testing plugin that:

  • Randomly assigns models from a pool at session start
  • Hides model identity during use
  • Prompts users to rate after the session
  • Reveals the model only after rating
  • Collects unbiased benchmark data
This enables organizations and open-source contributors to compare LLMs for coding without brand bias.

Proposal
Add a chat.model plugin hook to override model selection:

```ts
"chat.model"?: (
  input: {
    sessionID: string
    agent: string
    model: string
    provider: string
    message: unknown // current message; concrete type TBD
  },
  output: {
    model?: { providerID: string; modelID: string }
    displayModel?: string
  },
) => Promise<void>
```
  • model: Override the actual model used
  • displayModel: Custom name shown in the UI (e.g., "Model A")

This follows existing patterns like chat.params and chat.headers.
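
As an illustration, here is a sketch of how a blind-benchmarking plugin could implement the proposed hook. The type names, pool entries, and labeling scheme are assumptions for this sketch, not the actual plugin API:

```typescript
// Sketch of a blind-benchmarking plugin using the proposed chat.model hook.
// Type names and pool entries are illustrative, not the real plugin API.

type ModelRef = { providerID: string; modelID: string }

type ChatModelInput = {
  sessionID: string
  agent: string
  model: string
  provider: string
  message: unknown
}

type ChatModelOutput = {
  model?: ModelRef
  displayModel?: string
}

// Example pool of candidate models to compare (hypothetical entries).
const POOL: ModelRef[] = [
  { providerID: "providerX", modelID: "model-x" },
  { providerID: "providerY", modelID: "model-y" },
]

// One model per session, chosen once at random and reused for every message.
const assignments = new Map<string, ModelRef>()

export const chatModel = async (
  input: ChatModelInput,
  output: ChatModelOutput,
): Promise<void> => {
  let assigned = assignments.get(input.sessionID)
  if (assigned === undefined) {
    assigned = POOL[Math.floor(Math.random() * POOL.length)]
    assignments.set(input.sessionID, assigned)
  }
  output.model = assigned
  // Hide the real identity behind an anonymous label until the user rates.
  output.displayModel = `Model ${String.fromCharCode(65 + POOL.indexOf(assigned))}`
}
```

The per-session map is what makes the test blind but consistent: the user talks to one hidden model for the whole session, and the reveal happens only after rating.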

Why OpenCode Benefits

  1. Zen's mission - OpenCode Zen aims to benchmark the best models. Real-world blind test data would improve recommendations.
  2. Public benchmark dashboard - OpenCode could host a leaderboard at opencode.ai/benchmark showing unbiased model rankings from community contributions. Differentiates from competitors.
  3. Thought leadership - "Most comprehensive real-world LLM benchmark for coding" drives blog posts, press, and adoption.
  4. Generic utility - Beyond benchmarking, the hook enables provider failover, A/B testing, and enterprise model governance.

Implementation
Small scope:

  • packages/plugin/src/index.ts - hook type
  • packages/opencode/src/session/llm.ts - trigger hook before LLM call
  • packages/opencode/src/session/message-v2.ts - add displayModel field
  • TUI components - show displayModel if present
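
On the core side, the trigger before the LLM call could look roughly like this sketch. The resolveModel name, the hook registry, and the fallback behavior are assumptions about internals, not the existing code in session/llm.ts:

```typescript
// Illustrative sketch of triggering chat.model hooks before the LLM call.
// All names here are assumptions, not actual opencode internals.

type ModelRef = { providerID: string; modelID: string }

type ChatModelHook = (
  input: { sessionID: string; agent: string; model: string; provider: string; message: unknown },
  output: { model?: ModelRef; displayModel?: string },
) => Promise<void>

// Hooks registered by loaded plugins (exported here only for the sketch).
export const hooks: ChatModelHook[] = []

export async function resolveModel(
  sessionID: string,
  agent: string,
  selected: ModelRef,
): Promise<{ model: ModelRef; displayModel?: string }> {
  const output: { model?: ModelRef; displayModel?: string } = {}
  for (const hook of hooks) {
    await hook(
      { sessionID, agent, model: selected.modelID, provider: selected.providerID, message: null },
      output,
    )
  }
  // Fall back to the user-selected model when no plugin overrides it.
  return { model: output.model ?? selected, displayModel: output.displayModel }
}
```

With no hooks registered this is a no-op, so the change stays backward compatible; displayModel would then flow into message-v2 and the TUI as proposed above.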

Discussion

  • Is the hook signature appropriate?
  • Interest in OpenCode hosting a public benchmark dashboard?
  • I'm willing to implement the core changes and build the blind test plugin.

Metadata

Labels

  • core: Anything pertaining to core functionality of the application (opencode server stuff)
  • discussion: Used for feature requests, proposals, ideas, etc. Open discussion
