feat(tooling): split combined results JSONL by target (external-first) #364

## Canonical Plan
This issue body is the source of truth for implementation.

## Objective
Improve multi-model benchmarking ergonomics by producing one result artifact per target from a combined JSONL output.

## Why This Matters
- Multi-target runs are harder to compare quickly when all records stay in one combined file.
- Teams repeatedly write ad-hoc filtering scripts before they can run model-vs-model analysis.
- Standardizing this utility reduces operational friction in benchmark workflows.

## Why This Location
- This is an output-shaping concern, not scoring/execution behavior.
- We can solve it as tooling without changing core evaluator/runtime semantics.
- Keeping it external-first preserves AgentV’s lightweight-core design.

## Architecture Boundary
External-first. We prefer wrapper scripts and tooling, and we avoid changes to the core evaluator and runtime.

## Deliverable Location
- Primary location (in-repo tooling path): `examples/features/benchmark-tooling/`
- Script location: `examples/features/benchmark-tooling/scripts/` (for split-by-target utility)
- Usage/docs: `examples/features/benchmark-tooling/README.md`

## Design Latitude
We can choose the exact utility interface and filename strategy as long as output is deterministic and easy to use.
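
One possible filename strategy, sketched under stated assumptions: the function name `safe_target_filename`, the allowed character set, and the short hash suffix are all hypothetical choices for illustration, not a decided interface. The idea is that the same target name always maps to the same filename, and two targets that normalize to the same slug still get distinct files.

```python
import hashlib
import re


def safe_target_filename(target: str, suffix: str = ".jsonl") -> str:
    """Map a target name to a filesystem-safe, deterministic filename.

    Hypothetical helper: the name, allowed characters, and hash length
    are assumptions, not part of the issue.
    """
    # Replace each run of characters outside [A-Za-z0-9._-] with one underscore,
    # then trim leading/trailing dots and underscores.
    slug = re.sub(r"[^A-Za-z0-9._-]+", "_", target).strip("._")
    if not slug:
        slug = "target"
    if slug == target:
        # Already safe: keep the name readable and unchanged.
        return slug + suffix
    # The name was altered, so append a short hash of the original name.
    # This keeps "a/b" and "a:b" apart even though both normalize to "a_b".
    digest = hashlib.sha256(target.encode("utf-8")).hexdigest()[:8]
    return f"{slug}-{digest}{suffix}"
```

Names that are already safe pass through untouched, so the common case stays easy to read; only altered names carry the hash suffix.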

## Acceptance Signals
- We can derive one deterministic JSONL per target from a combined results file.
- We normalize target names that contain filename-unsafe characters (for example `/`, `:`, or spaces) into safe, deterministic filenames.
- We document the downstream `compare` workflow.
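
The first two signals could be met by something like the sketch below. It assumes each record is a JSON object carrying its target name in a top-level `target` field; that field name, the function name `split_by_target`, and the output naming are assumptions for illustration, not the decided interface. Records are written back as their original line text, so no schema change is implied.

```python
import json
import re
from pathlib import Path


def split_by_target(combined: Path, out_dir: Path, key: str = "target") -> dict[str, Path]:
    """Write one JSONL file per target and return {target: output path}.

    Hypothetical sketch: `key` defaults to an assumed "target" field.
    """
    groups: dict[str, list[str]] = {}
    with combined.open(encoding="utf-8") as fh:
        for line_no, raw in enumerate(fh, start=1):
            line = raw.strip()
            if not line:
                continue  # tolerate blank lines in the combined file
            record = json.loads(line)
            if key not in record:
                raise ValueError(f"line {line_no}: missing {key!r} field")
            # Keep the original line text so records pass through unmodified,
            # in the same relative order as the input.
            groups.setdefault(str(record[key]), []).append(line)

    out_dir.mkdir(parents=True, exist_ok=True)
    written: dict[str, Path] = {}
    for target in sorted(groups):  # sorted so processing order is stable
        # Minimal normalization: unsafe character runs become one underscore.
        name = re.sub(r"[^A-Za-z0-9._-]+", "_", target) or "target"
        path = out_dir / f"{name}.jsonl"
        if path in written.values():
            # Fail loudly rather than silently merging two targets.
            raise ValueError(f"filename collision for target {target!r}: {path.name}")
        path.write_text("\n".join(groups[target]) + "\n", encoding="utf-8")
        written[target] = path
    return written
```

Given the same combined file, the function produces the same set of files with the same contents on every run, which is the determinism the first signal asks for.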

## Non-Goals
- No changes to runtime scoring semantics.
- No mandatory schema changes for existing result records.

