feat(plugin): trial output-consistency metric via embedding similarity

## Canonical Plan
This issue body is the source of truth for implementation.

## Objective
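We introduce a trial-output consistency metric, delivered as a plugin and reference implementation, that uses embedding similarity to measure how stable an agent's outputs are across repeated trials.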

## Why This Matters
- Variability in trial scores does not directly measure whether the outputs themselves are semantically consistent.
- We need a dedicated signal for response stability across repeated attempts.
- This unlocks stronger diagnostics for non-deterministic agent behavior.

## Why This Location
- Consistency scoring method choice is specialized and likely to evolve.
- Plugin-first lets us experiment without hardcoding a narrow built-in.
- This aligns with AgentV’s extensibility-first architecture.

## Architecture Boundary
Plugin-first (aligned with AgentV principles).

## Deliverable Location
- Primary location: `examples/features/trial-output-consistency/`
- Plugin/judge location: `examples/features/trial-output-consistency/judges/`
- Runnable eval location: `examples/features/trial-output-consistency/evals/`
- Usage/docs: `examples/features/trial-output-consistency/README.md`

## Design Latitude
We can choose embedding backend, similarity strategy, and artifact flow, provided the solution stays extension-oriented.
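One way to keep the embedding backend swappable is to treat it as a plain callable from text to vector. The sketch below is illustrative only: the `Embedder` alias and `hashed_bow_embedder` are hypothetical names, not AgentV APIs, and the hashed bag-of-words vector is a dependency-free stand-in for a real embedding model, not a recommendation.

```python
import hashlib
import math
from typing import Callable, List

# Hypothetical contract: any backend is just "text in, vector out".
Embedder = Callable[[str], List[float]]


def hashed_bow_embedder(dim: int = 64) -> Embedder:
    """Return a toy embedder that hashes whitespace tokens into `dim` buckets.

    This is a placeholder for a real embedding model; it only captures
    token overlap, not meaning.
    """

    def embed(text: str) -> List[float]:
        vec = [0.0] * dim
        for token in text.lower().split():
            # Use a stable hash so results are reproducible across runs.
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            idx = int.from_bytes(digest[:4], "big") % dim
            vec[idx] += 1.0
        # L2-normalize so cosine similarity reduces to a dot product.
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec

    return embed
```

A judge written against `Embedder` could later swap in a hosted or local embedding model without changing the similarity logic.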

## Acceptance Signals
- We provide a runnable reference implementation that computes consistency across repeated trial outputs.
- We expose the result as a named metric in evaluation workflows.
- We define explicit behavior for low-trial and edge-case inputs.
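As a minimal sketch of the signals above, consistency could be computed as the mean pairwise cosine similarity over the embedded trial outputs. The function names and the edge-case policy here are assumptions for illustration: fewer than two trials returns `None` (the metric is undefined), and a zero vector is treated as having similarity 0.0 to anything. The reference implementation may choose a different similarity strategy or policy.

```python
import math
from itertools import combinations
from typing import Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed policy for empty or degenerate outputs
    return dot / (norm_a * norm_b)


def consistency_score(vectors: Sequence[Sequence[float]]) -> Optional[float]:
    """Mean pairwise cosine similarity across trial embeddings.

    Returns None when fewer than two trials are available, since no
    pair exists to compare (assumed low-trial behavior).
    """
    if len(vectors) < 2:
        return None
    sims = [cosine(a, b) for a, b in combinations(vectors, 2)]
    return sum(sims) / len(sims)
```

Returning `None` rather than 1.0 for a single trial avoids reporting perfect consistency when nothing was actually compared; a caller can then decide whether to skip the metric or fail the eval.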

## Non-Goals
- No built-in core evaluator unless plugin-first proves insufficient.

feat(plugin): trial output-consistency metric via embedding similarity #368