feat(eval): iteration tracking, termination taxonomy, and cross-run regression detection

## Status Update (2026-04-09)

This issue remains relevant, but its scope should continue to center on **core run data and CLI analysis**, not a larger assistant-runtime or Studio expansion.

Important note: AgentV already has **`agentv trend`**. That means part of the original "cross-run regression detection" goal already exists. The remaining core work is mainly around **iteration metadata, termination taxonomy, and artifact-backed run history that other optimization workflows can build on**.

## Revised Scope

### In scope
- `expected_iterations` on tests
- `completion_signal` support where appropriate
- `termination_reason` / related loop metadata in results
- `iteration_efficiency` assertion type (still depends on `#320`)
- strengthening or reusing existing CLI regression-analysis primitives (such as `agentv trend`) where needed
- additive, non-breaking metadata that helps optimization workflows resume and inspect prior cycles
- artifact-backed iteration history primitives where they clearly support CLI analysis and other workflow layers
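
To make the in-scope fields more concrete, here is a minimal TypeScript sketch of how iteration metadata, a termination reason, and an `iteration_efficiency` score might fit together. Every type name, the `classifyTermination` and `iterationEfficiency` helpers, the `stopped_without_signal` reason, and the "expected / actual, capped at 1" scoring rule are hypothetical illustrations. They are not AgentV's actual schema and not the design that `#320` will settle.

```typescript
// HYPOTHETICAL shapes: field names mirror this issue, not AgentV's real schema.
type TerminationReason =
  | "completed"               // the agent emitted the configured completion signal
  | "max_iterations"          // the loop hit its iteration cap
  | "stopped_without_signal"; // the loop ended below the cap with no recognized signal

interface TestCase {
  id: string;
  expected_iterations?: number; // optional, so existing tests stay valid (additive)
  completion_signal?: string;   // e.g. a sentinel string the agent prints when done
}

interface IterationMetadata {
  iterations: number;
  termination_reason: TerminationReason;
}

// Derive loop metadata from per-iteration outputs (hypothetical helper).
function classifyTermination(
  outputs: string[],
  maxIterations: number,
  completionSignal?: string,
): IterationMetadata {
  if (completionSignal !== undefined) {
    const idx = outputs.findIndex((o) => o.includes(completionSignal));
    if (idx !== -1) {
      // Count iterations up to and including the one that signalled completion.
      return { iterations: idx + 1, termination_reason: "completed" };
    }
  }
  if (outputs.length >= maxIterations) {
    return { iterations: outputs.length, termination_reason: "max_iterations" };
  }
  return { iterations: outputs.length, termination_reason: "stopped_without_signal" };
}

// Assumed scoring rule: expected / actual, capped at 1; 0 if the loop never completed.
// Returns undefined when the test declares no expected_iterations.
function iterationEfficiency(test: TestCase, meta: IterationMetadata): number | undefined {
  if (test.expected_iterations === undefined) return undefined;
  if (meta.termination_reason !== "completed" || meta.iterations <= 0) return 0;
  return Math.min(1, test.expected_iterations / meta.iterations);
}
```

With `expected_iterations: 3` and a transcript that prints the completion signal on its fourth iteration, this sketch scores 0.75. Returning `undefined` rather than a default score keeps the assertion opt-in, in line with the "additive, non-breaking" goal above.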

### Deprioritized
- Studio regression alert feed
- Studio regression timeline as a major workstream
- auto-clustering UI inside Studio
- making this issue dependent on a bigger Studio rollout
- persistent personal memory or session-search infrastructure for the core product

## Design Boundary

This issue should provide **portable run/result primitives** that skills, plugins, wrappers, or future workflow tooling can consume. It should not turn AgentV into a long-lived self-improving assistant runtime.

## Dependencies

- `#320` remains the important dependency for `iteration_efficiency`

## Acceptance Signals

- [ ] iteration metadata is represented in result data cleanly and non-breakingly
- [ ] termination reasons are captured for loop-based workflows
- [ ] `iteration_efficiency` can be evaluated once `#320` is available
- [ ] any regression UX added later can reuse existing CLI/data primitives rather than invent a new subsystem
- [ ] result metadata is sufficient for higher-level optimization loops to resume, inspect, and compare prior cycles without requiring chat-memory features
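
As an illustration of the last signal, the following sketch shows how a higher-level loop could compare two cycles using nothing but per-test result records. It does not describe how `agentv trend` works. The `TestResult` record, the `compareRuns` function, the regression kinds, and the default score tolerance of 0.05 are all hypothetical.

```typescript
// HYPOTHETICAL per-test record as it might be stored in a run artifact.
interface TestResult {
  test_id: string;
  iterations: number;
  termination_reason: string;
  score: number;
}

interface Regression {
  test_id: string;
  kind: "termination" | "iterations" | "score";
  before: string | number;
  after: string | number;
}

// Flag tests that got worse between a previous cycle and the current one.
function compareRuns(
  previous: TestResult[],
  current: TestResult[],
  scoreTolerance = 0.05, // assumed threshold; small score drops are treated as noise
): Regression[] {
  const prevById = new Map(previous.map((r) => [r.test_id, r] as const));
  const regressions: Regression[] = [];
  for (const cur of current) {
    const prev = prevById.get(cur.test_id);
    if (!prev) continue; // new test: no prior cycle to compare against
    // A loop that used to complete but no longer does.
    if (prev.termination_reason === "completed" && cur.termination_reason !== "completed") {
      regressions.push({ test_id: cur.test_id, kind: "termination", before: prev.termination_reason, after: cur.termination_reason });
    }
    // Both cycles completed, but the current one needed more iterations.
    if (prev.termination_reason === "completed" && cur.termination_reason === "completed" && cur.iterations > prev.iterations) {
      regressions.push({ test_id: cur.test_id, kind: "iterations", before: prev.iterations, after: cur.iterations });
    }
    // Score dropped by more than the tolerance.
    if (prev.score - cur.score > scoreTolerance) {
      regressions.push({ test_id: cur.test_id, kind: "score", before: prev.score, after: cur.score });
    }
  }
  return regressions;
}
```

Because the comparison only reads stored result fields, it needs no chat memory or session search; any later regression UX could be layered on the same data.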

## Related

- #748 — autoresearch mode in agentv-bench
- #958 — automated keep/discard in the current bench loop
- #1003 — tracking issue for optimization-loop roadmap coordination
