Canonical Plan
This issue body is the source of truth for implementation.
Objective
We create one canonical showcase that demonstrates a multi-model × multi-metric × variability workflow end-to-end.
Why This Matters
- The capabilities exist, but users currently assemble the workflow from multiple separate examples.
- One integrated showcase lowers onboarding time and reduces misconfiguration risk.
- It makes AgentV’s benchmark story legible and repeatable.
Why This Location
- This is a composition and communication problem, not a runtime primitive gap.
- A docs/examples deliverable can prove workflow value without core expansion.
- Keeping this in showcase/docs preserves architecture boundaries.
Architecture Boundary
Docs/examples only.
Deliverable Location
- Showcase root:
examples/showcase/multi-model-benchmark/
- Suggested structure:
examples/showcase/multi-model-benchmark/evals/
examples/showcase/multi-model-benchmark/README.md
- Docs links/mentions:
apps/web/src/content/docs/
Design Latitude
We choose the final scenario/domain, provider mix, and narrative flow to maximize clarity and practical utility.
Acceptance Signals
- We add a new showcase that composes a targets matrix, weighted metrics, trials, and a comparison workflow.
- We include low-cost/default-safe execution guidance.
- We link it from relevant docs navigation.
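As a rough illustration of what the composed workflow computes, the sketch below aggregates weighted metrics over repeated trials for each target and ranks the targets by mean score. It is a standalone Python sketch, not AgentV's API or config format: the metric names, weights, and function names are all hypothetical, and it assumes each metric is already normalized so that higher is better.

```python
from statistics import mean, stdev

# Hypothetical metric weights; the names are illustrative, not AgentV's.
WEIGHTS = {"accuracy": 0.6, "latency": 0.1, "cost": 0.3}


def weighted_score(metrics, weights=WEIGHTS):
    """Combine one trial's metrics into a single weighted average."""
    total = sum(weights.values())
    return sum(weights[name] * metrics[name] for name in weights) / total


def summarize(trials_by_target):
    """Map each target to the mean and spread of its per-trial scores."""
    summary = {}
    for target, trials in trials_by_target.items():
        scores = [weighted_score(trial) for trial in trials]
        # The sample standard deviation needs at least two trials.
        spread = stdev(scores) if len(scores) > 1 else 0.0
        summary[target] = {
            "mean": mean(scores),
            "stdev": spread,
            "trials": len(scores),
        }
    return summary


def rank(summary):
    """Order targets from highest to lowest mean weighted score."""
    return sorted(summary, key=lambda t: summary[t]["mean"], reverse=True)
```

Reporting the spread next to the mean is what makes the variability dimension visible: two targets with similar means can differ widely in how stable they are across trials.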
Non-Goals
- No new runtime features created only for the showcase.