Skip to content

feat(cli): static HTML report command for eval results #1079

@christso

Description

@christso

Context

`agentv eval` already writes a live-refreshing HTML report via `html-writer.ts` (light theme, GitHub-style). However there is no way to produce a sharable HTML report from existing JSONL results without re-running the eval or asking a subagent to write HTML from scratch — burning LLM tokens on fully deterministic content.

Concrete user-land workaround that triggered this request:
A user ran an eval pipeline and the subagent produced a polished HTML results page. The HTML was well-structured but required the full LLM call chain to generate, when a static template + data injection would have produced the identical output.

Proposed solution — skill only, no CLI changes

The report content is entirely deterministic. A skill can produce it without any LLM HTML generation:

  1. The skill ships a static HTML template as an asset file (Studio dark theme, data injected via a `DATA_PLACEHOLDER` slot)
  2. When invoked, the agent:
    • Reads the `*.results.jsonl` from the results directory
    • Serialises each line into a JSON array
    • Substitutes the array into the template's placeholder
    • Writes the resulting file to disk

This is a file read + string substitution + file write. No HTML generation by the LLM.

Reusable code

`html-writer.ts` (`apps/cli/src/commands/eval/html-writer.ts`) already contains a complete standalone HTML implementation:

  • Inline CSS and JS (self-contained, zero external requests)
  • Data slot: `var DATA = ${dataJson}`
  • Two tabs: Overview (stats grid, targets table, score histogram) + Test Cases (filterable, sortable, expandable rows with assertion detail)
  • All rendering logic already written

The skill's static template is the `generateHtml()` output with `isLive=false` and the data slot replaced by a literal `DATA_PLACEHOLDER` string, re-skinned to the Studio dark palette.

Note: `plugins/agentv-dev/skills/agentv-bench/assets/eval_review.html` is unrelated — it is an interactive editor for eval set queries, not an eval results viewer.

Template enhancements over html-writer.ts

The existing `html-writer.ts` output covers most of what's needed, but the template should add two things visible in real-world usage:

1. Group tests by eval file

When a results directory contains multiple `*.results.jsonl` files, group the tests under a section header per eval file (e.g. "cw-freight-boolean-registry — 5/5 PASSED"). The eval file name is inferred directly from the JSONL filename — no wire format change needed.

2. Assertion type badges

Show the evaluator type inline with each assertion (e.g. `contains`, `llm-grader`, `regex`). The type is already available as `scores[].name` in the existing JSONL — each assertion belongs to a parent score entry whose `name` is the evaluator type. The template just needs to carry the parent `name` down when rendering individual assertions. No wire format change needed.

Template design requirements

Re-skin the extracted template to follow `apps/studio/DESIGN.md`:

Token Value
Canvas `#030712` (`bg-gray-950`)
Elevated surface `#111827` (`bg-gray-900`)
Border `#1f2937` (`border-gray-800`)
Body text `#d1d5db` / `#9ca3af` (`gray-300` / `gray-400`)
Heading `#ffffff`
Accent (interactive) `#22d3ee` (`cyan-400`)
Pass / warn / fail `#34d399` / `#facc15` / `#f87171`

Additional requirements:

  • System sans-serif only — no webfonts, no Google Fonts
  • `tabular-nums` on every numeric column
  • Pass-rate bar: blue-gradient fill on `#1f2937` track (matching `PassRatePill` appearance)

Scope of change

  • New skill in `plugins/agentv-dev/skills/` (or equivalent) with:
    • A `SKILL.md` that instructs the agent to read JSONL → serialise → substitute → write
    • An `assets/report-template.html` — the Studio-themed static template with `DATA_PLACEHOLDER`
  • No changes to `html-writer.ts`, the CLI, or any TypeScript source
  • No wire format changes — all data needed (eval file grouping, assertion types) is already present in the existing JSONL output

Acceptance signals

  • Invoking the skill produces a valid, self-contained HTML file that opens without network access
  • Report matches Studio design system (dark theme, cyan accent, dense tables, no webfonts)
  • Multiple eval files in a results directory are rendered as separate grouped sections
  • Assertion type badges are shown per assertion, derived from the parent `scores[].name`
  • The agent performs zero HTML generation — only file I/O and string substitution
  • `html-writer.ts` and all CLI commands are untouched

Non-goals

  • A new CLI subcommand (the skill covers the use case without one)
  • Modifying the live `HtmlWriter` used during `agentv eval`
  • PDF export
  • Any JSONL or wire format changes

Related

  • `apps/studio/DESIGN.md` — design system to follow
  • `apps/cli/src/commands/eval/html-writer.ts` — source for the template structure and JS rendering logic
  • `apps/cli/src/commands/eval/artifact-writer.ts` — wire format for JSONL result data
  • `plugins/agentv-dev/skills/agentv-bench/assets/eval_review.html` — unrelated (eval set editor, not results viewer)

Metadata

Metadata

Assignees

No one assigned

    Labels

    coreAnything pertaining to core functionality of AgentVenhancementNew feature or requestin-progressClaimed by an agent — do not duplicate work

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status

    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions