docs/examples: add multi-model x multi-metric x variability benchmark showcase #370

@christso

Canonical Plan

This issue body is the source of truth for implementation.

Objective

We create one canonical showcase that demonstrates the multi-model × multi-metric × variability workflow end to end.

Why This Matters

  • The capabilities exist, but users currently assemble the workflow from multiple separate examples.
  • One integrated showcase lowers onboarding time and reduces misconfiguration risk.
  • It makes AgentV’s benchmark story legible and repeatable.

Why This Location

  • This is a composition and communication problem, not a runtime primitive gap.
  • A docs/examples deliverable can prove workflow value without core expansion.
  • Keeping this in showcase/docs preserves architecture boundaries.

Architecture Boundary

Docs/examples only.

Deliverable Location

  • Showcase root: examples/showcase/multi-model-benchmark/
  • Suggested structure:
    • examples/showcase/multi-model-benchmark/evals/
    • examples/showcase/multi-model-benchmark/README.md
  • Docs links/mentions: apps/web/src/content/docs/

Design Latitude

We choose the final scenario/domain, provider mix, and narrative flow to maximize clarity and practical utility.

Acceptance Signals

  • We add a new showcase that composes a targets matrix, weighted metrics, trials, and a comparison workflow.
  • We include low-cost/default-safe execution guidance.
  • We link it from relevant docs navigation.
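
To make the composition concrete, here is a minimal Python sketch of the arithmetic the showcase is meant to illustrate: per-trial weighted metric scores, aggregated per target into a mean and a spread, then ranked for comparison. It is not AgentV code and does not use AgentV's API or config schema. The metric names, the weights, and the helpers `weighted_score`, `summarize`, and `rank` are all hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev

# Hypothetical metric weights; the names are illustrative, not AgentV identifiers.
WEIGHTS = {"correctness": 0.75, "style": 0.25}


def weighted_score(metrics, weights=WEIGHTS):
    """Combine per-metric scores (each in 0..1) into one weighted score."""
    total = sum(weights.values())
    return sum(weights[name] * metrics[name] for name in weights) / total


def summarize(trials_by_target, weights=WEIGHTS):
    """Return {target: (mean_score, spread)} across each target's trials."""
    summary = {}
    for target, trials in trials_by_target.items():
        scores = [weighted_score(trial, weights) for trial in trials]
        # Sample standard deviation needs at least two trials.
        spread = stdev(scores) if len(scores) > 1 else 0.0
        summary[target] = (mean(scores), spread)
    return summary


def rank(summary):
    """Order targets by mean score (descending), then by lower spread."""
    return sorted(summary, key=lambda t: (-summary[t][0], summary[t][1]))
```

Calling `summarize` on a mapping of target name to a list of per-trial metric dictionaries, then `rank` on the result, gives the comparison order. Reporting the spread alongside the mean is what lets a reader tell a consistently good target from one that is good only on average.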

Non-Goals

  • No new runtime features created only for the showcase.
