dashboard: add coverage diagnostics and improve run alignment #9

AustinKelsay merged 4 commits into main from
Conversation
📝 Walkthrough

Adds diagnostics and frontier-eval failure tracking across aggregations and UI: a new CoverageDiagnostics component wired into RunDetailPage, diagnostics aggregation helpers, expanded FailureStats and failure UI (including frontier-eval), a timing UI refactor, tooltip entries for new failure types, and minor formatting/import tweaks.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 2
🧹 Nitpick comments (4)
src/harnesses/opencode-adapter.ts (1)
471-473: Avoid silently swallowing cleanup failures.

This `.catch(() => {})` drops errors. Please at least log at debug/warn so failures aren't silent. As per coding guidelines: "Throw on invalid config, invalid schemas, and programmer mistakes; avoid silent fallbacks."

♻️ Proposed change

```diff
- fs.promises
-   .rm(workDir, { recursive: true, force: true })
-   .catch(() => {});
+ fs.promises
+   .rm(workDir, { recursive: true, force: true })
+   .catch((error) => {
+     log.warn({ error, workDir }, "Failed to cleanup OpenCode workDir");
+   });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/harnesses/opencode-adapter.ts` around lines 471-473: The cleanup call `fs.promises.rm(workDir, { recursive: true, force: true }).catch(() => {})` is swallowing errors; change the catch to surface/log the error (e.g., `.catch(err => processLogger?.warn("Failed to remove workDir " + workDir, err))` or at minimum `console.warn`) so failures aren't silent; update the code around `fs.promises.rm` and ensure the log includes the workDir and the error object for debugging.

apps/dashboard/src/components/run-detail/coverage-diagnostics.tsx (2)
31-41: Consider documenting the delta semantics.

The `getDeltaTone` logic treats `actual > base` as warning. While this makes sense for alignment diagnostics (more executed than planned suggests drift), a brief inline comment would clarify the intentional semantics for future maintainers.

📝 Optional: Add clarifying comment

```diff
+/** Returns color class based on alignment: exact match = success, over = warning, under = danger */
 function getDeltaTone(base: number, actual: number): string {
   if (base === actual) return "text-success";
   if (actual > base) return "text-warning";
   return "text-danger";
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/dashboard/src/components/run-detail/coverage-diagnostics.tsx` around lines 31 - 41, Add a brief inline comment above getDeltaTone (and optionally formatDelta) explaining the delta semantics: equal (base === actual) is success, actual > base is a warning because it indicates more was executed than planned (potential drift), and actual < base is danger because less was executed than expected; place the comment immediately above the getDeltaTone function to make intent clear for future maintainers.
166-176: Verify the distinction between "failures" and "missing evals" is clear to users.

The panel displays `frontierEvalFailures` (items where evaluation was attempted but failed) and "missing evals" (`totalItems - frontierEvalItems`, items without any eval data). These are different metrics, but both contribute to incomplete coverage. Consider whether a tooltip would help users understand this distinction.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/dashboard/src/components/run-detail/coverage-diagnostics.tsx` around lines 166-176: The UI shows frontierEvalFailures and computes "missing evals" as coverage.totalItems - coverage.frontierEvalItems, which can confuse users because failures (attempted but failed) and missing evals (no eval attempted) are distinct; update the Frontier Eval Failures card in coverage-diagnostics.tsx to make this distinction explicit (e.g., change the secondary label text and/or add a concise tooltip or info icon) that explains frontierEvalFailures refers to attempted-but-failed evaluations and missing evals are items with no eval data (use the symbols frontierEvalFailures, coverage.totalItems and coverage.frontierEvalItems to locate the values to describe). Ensure the tooltip copy is short, clear, and attached to the warning number or the small description so users can easily see the difference.

apps/dashboard/src/lib/aggregations-core.ts (1)
155-177: Consider consolidating the duplicated `computeTimingStatsFromValues` helper.

This helper is duplicated in `aggregations-diagnostics.ts` (lines 49-71). Consider exporting it from one location to avoid divergent implementations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/dashboard/src/lib/aggregations-core.ts` around lines 155 - 177, The computeTimingStatsFromValues helper is duplicated; export the single implementation (the computeTimingStatsFromValues function) from its current module and replace the duplicate in the diagnostics aggregation with an import to that exported function (or move it to a new shared util and import from there), then remove the duplicated implementation so both aggregations use the same symbol; update any imports/exports to ensure computeTimingStatsFromValues is exported (e.g., add an export) and imported where the diagnostics aggregation currently defines its own copy.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/dashboard/src/components/run-detail/failure-breakdown.tsx`:
- Around line 29-30: The new failure keys missing_export and factory_init_failed
were added to FAILURE_LABELS but have no tooltip text in tooltip-content.ts; add
descriptive entries for these keys in tooltip-content.ts (matching the keys used
by FailureTypeList) so the UI shows consistent tooltips—for example, add tooltip
strings for "missing_export" and "factory_init_failed" alongside the existing
tooltip map entries referenced by FailureTypeList.
In `@apps/dashboard/src/lib/aggregations-core.ts`:
- Around line 161-177: computeTimingStatsFromValues currently computes median
with Math.floor(sortedValues.length / 2) which selects a single middle element
and is incorrect for even-length arrays; update the median calculation in
computeTimingStatsFromValues to match the logic used in the helper in
aggregations-diagnostics.ts by returning the average of the two middle values
when sortedValues.length is even, otherwise return the single middle element,
ensuring p90/other stats remain unchanged.
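The even/odd median fix described in the comment above can be sketched as a standalone function (the real `computeTimingStatsFromValues` also returns min/max/mean/p90/count, which this sketch omits):

```typescript
/**
 * Median that averages the two middle values for even-length samples,
 * matching the helper in aggregations-diagnostics.ts.
 * Standalone sketch; not the project's actual exported helper.
 */
function median(values: number[]): number | null {
  const sorted = [...values].sort((a, b) => a - b);
  if (sorted.length === 0) return null;
  const mid = sorted.length / 2;
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2 // average the two middle values
    : sorted[Math.floor(mid)]; // single middle element
}

// Math.floor(length / 2) alone would pick 30 for [10, 20, 30, 40];
// averaging the two middle values gives 25.
```

This is why the flagged `sortedValues[Math.floor(sortedValues.length / 2)]` is biased upward for even-length samples.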
---
Nitpick comments:
In `@apps/dashboard/src/components/run-detail/coverage-diagnostics.tsx`:
- Around line 31-41: Add a brief inline comment above getDeltaTone (and
optionally formatDelta) explaining the delta semantics: equal (base === actual)
is success, actual > base is a warning because it indicates more was executed
than planned (potential drift), and actual < base is danger because less was
executed than expected; place the comment immediately above the getDeltaTone
function to make intent clear for future maintainers.
- Around line 166-176: The UI shows frontierEvalFailures and computes "missing
evals" as coverage.totalItems - coverage.frontierEvalItems, which can confuse
users because failures (attempted but failed) and missing evals (no eval
attempted) are distinct; update the Frontier Eval Failures card in
coverage-diagnostics.tsx to make this distinction explicit (e.g., change the
secondary label text and/or add a concise tooltip or info icon) that explains
frontierEvalFailures refers to attempted-but-failed evaluations and missing
evals are items with no eval data (use the symbols frontierEvalFailures,
coverage.totalItems and coverage.frontierEvalItems to locate the values to
describe). Ensure the tooltip copy is short, clear, and attached to the warning
number or the small description so users can easily see the difference.
In `@apps/dashboard/src/lib/aggregations-core.ts`:
- Around line 155-177: The computeTimingStatsFromValues helper is duplicated;
export the single implementation (the computeTimingStatsFromValues function)
from its current module and replace the duplicate in the diagnostics aggregation
with an import to that exported function (or move it to a new shared util and
import from there), then remove the duplicated implementation so both
aggregations use the same symbol; update any imports/exports to ensure
computeTimingStatsFromValues is exported (e.g., add an export) and imported
where the diagnostics aggregation currently defines its own copy.
In `@src/harnesses/opencode-adapter.ts`:
- Around line 471-473: The cleanup call fs.promises.rm(workDir, { recursive:
true, force: true }).catch(() => {}) is swallowing errors; change the catch to
surface/log the error (e.g., .catch(err => processLogger?.warn(`Failed to remove
workDir ${workDir}:`, err)) or at minimum console.warn) so failures aren’t
silent—update the code around fs.promises.rm and ensure the log includes the
workDir and the error object for debugging.
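As a runnable illustration of the non-silent cleanup pattern suggested above (using `console.warn` as a stand-in for the project's logger, which isn't shown here):

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

/**
 * Removes a work directory, logging (not swallowing) any failure.
 * Sketch of the review suggestion; the real adapter uses its own logger.
 */
async function cleanupWorkDir(workDir: string): Promise<void> {
  await fs.promises
    .rm(workDir, { recursive: true, force: true })
    .catch((error) => {
      // Surface the failure with both the path and the error object.
      console.warn(`Failed to remove workDir ${workDir}:`, error);
    });
}
```

With `force: true`, calling this on an already-removed directory is a no-op, so the warning only fires on genuine failures (e.g., permission errors).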
🧹 Nitpick comments (1)
apps/dashboard/src/lib/aggregations-core.ts (1)
276-288: Consider extracting the detail item type for clarity.

The inline type for `frontierEvalFailureDetails` is fairly complex. Extracting it to a named interface would improve readability and enable reuse if this shape is needed elsewhere.

♻️ Proposed extraction

```diff
+/** Detail record for a single frontier eval failure. */
+export interface FrontierEvalFailureDetail {
+  id: string;
+  runtime: string;
+  model: string;
+  harness: string;
+  test: string;
+  passType: string;
+  type: string;
+  status?: number;
+  latencyMs?: number;
+  evalModel?: string;
+  attempts?: number;
+}
+
 /** Failure counts by type. */
 export interface FailureStats {
   generationFailures: Map<string, number>;
   scoringFailures: Map<string, number>;
   frontierEvalFailures: Map<string, number>;
-  frontierEvalFailureDetails: Array<{
-    id: string;
-    runtime: string;
-    model: string;
-    harness: string;
-    test: string;
-    passType: string;
-    type: string;
-    status?: number;
-    latencyMs?: number;
-    evalModel?: string;
-    attempts?: number;
-  }>;
+  frontierEvalFailureDetails: FrontierEvalFailureDetail[];
   totalGenerationFailures: number;
   totalScoringFailures: number;
   totalFrontierEvalFailures: number;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/dashboard/src/lib/aggregations-core.ts` around lines 276 - 288, The inline complex object type used for the frontierEvalFailureDetails property should be extracted into a named interface to improve readability and reuse: define an interface (e.g., FrontierEvalFailureDetail) with the fields id, runtime, model, harness, test, passType, type, and optional status, latencyMs, evalModel, attempts, then replace the inline Array<{...}> type on frontierEvalFailureDetails with FrontierEvalFailureDetail[] and update any other places that use this shape to reference the new interface (search for frontierEvalFailureDetails and related usages to change the type annotation).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@apps/dashboard/src/lib/aggregations-core.ts`:
- Around line 276-288: The inline complex object type used for the
frontierEvalFailureDetails property should be extracted into a named interface
to improve readability and reuse: define an interface (e.g.,
FrontierEvalFailureDetail) with the fields id, runtime, model, harness, test,
passType, type, and optional status, latencyMs, evalModel, attempts, then
replace the inline Array<{...}> type on frontierEvalFailureDetails with
FrontierEvalFailureDetail[] and update any other places that use this shape to
reference the new interface (search for frontierEvalFailureDetails and related
usages to change the type annotation).
@CodeRabbit full review

✅ Actions performed: Full review triggered.
Actionable comments posted: 3
🧹 Nitpick comments (3)
apps/dashboard/src/lib/aggregations-diagnostics.ts (1)
49-71: Duplicate `computeTimingStatsFromValues` helper.

This is an exact duplicate of the helper in `aggregations-core.ts` (lines 161-183). To maintain consistency and reduce maintenance burden, consider exporting it from `aggregations-core.ts` and importing it here.

♻️ Suggested refactor

In `aggregations-core.ts`, export the helper:

```diff
-function computeTimingStatsFromValues(values: number[]): TimingStats | null {
+export function computeTimingStatsFromValues(values: number[]): TimingStats | null {
```

Then in this file, import and use it:

```diff
-import type { TimingStats } from "./aggregations-core";
+import { computeTimingStatsFromValues, type TimingStats } from "./aggregations-core";
 import type { MatrixItemResult } from "./types";

-type DimensionCountItem = Pick<
-  MatrixItemResult,
-  "runtime" | "model" | "harness" | "test" | "passType"
->;
-
-/** Timing stats grouped by dimension name. */
-export interface GroupTimingStats {
-  name: string;
-  stats: TimingStats;
-}
-
-...
-
-/**
- * Computes timing statistics from an arbitrary numeric sample.
- *
- * `@param` values - Timing values in milliseconds
- * `@returns` Timing stats or null when no values are present
- */
-function computeTimingStatsFromValues(values: number[]): TimingStats | null {
-  const sortedValues = [...values].sort((a, b) => a - b);
-  if (sortedValues.length === 0) return null;
-
-  const sum = sortedValues.reduce((a, b) => a + b, 0);
-  const rawP90Index = Math.ceil(sortedValues.length * 0.9) - 1;
-  const p90Index = Math.min(sortedValues.length - 1, Math.max(0, rawP90Index));
-
-  const mid = sortedValues.length / 2;
-  const median =
-    sortedValues.length % 2 === 0
-      ? (sortedValues[mid - 1] + sortedValues[mid]) / 2
-      : sortedValues[Math.floor(mid)];
-
-  return {
-    min: sortedValues[0],
-    max: sortedValues[sortedValues.length - 1],
-    median,
-    mean: sum / sortedValues.length,
-    p90: sortedValues[p90Index] || sortedValues[sortedValues.length - 1],
-    count: sortedValues.length,
-  };
-}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/dashboard/src/lib/aggregations-diagnostics.ts` around lines 49-71: The computeTimingStatsFromValues function here is a duplicate; remove the local copy and instead export computeTimingStatsFromValues from aggregations-core.ts (preserving its signature and the TimingStats type) and import it into this file, then update any local references to use the imported function; ensure you delete the duplicate implementation in aggregations-diagnostics.ts so there's a single shared implementation.

apps/dashboard/src/components/run-detail/failure-breakdown.tsx (1)
74-81: Add JSDoc documentation for the exported `FailureBreakdown` function.

The exported function lacks TSDoc documentation. Per coding guidelines, all exported functions must have documentation including purpose, params, returns, and throws.

📝 Suggested JSDoc

```diff
+/**
+ * Renders a breakdown card showing failure types grouped by category.
+ * Displays generation, scoring, and frontier eval failures with tooltips.
+ *
+ * `@param` props - Component props
+ * `@param` props.items - Matrix items to compute failure stats from
+ * `@returns` Card component or null if no failures exist
+ */
 export function FailureBreakdown({ items }: FailureBreakdownProps) {
```

As per coding guidelines: "All exported functions must have TSDoc/JSDoc documentation including purpose, params, returns, and throws"

🤖 Prompt for AI Agents
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/dashboard/src/components/run-detail/failure-breakdown.tsx` around lines 74-81: Add a JSDoc/TSDoc block above the exported FailureBreakdown function describing its purpose (renders a failure summary UI for given items), documenting the parameter `items` via the FailureBreakdownProps type, the return value (JSX.Element | null), and any exceptions/edge-cases (e.g., if computeFailureStats throws); reference the function name FailureBreakdown and the helper computeFailureStats so the doc points to the behavior and inputs used.

apps/dashboard/src/lib/aggregations-core.ts (1)
155-183: Extract `computeTimingStatsFromValues` to a shared utility to avoid duplication.

This internal helper is identical in both files and can be exported from `aggregations-core.ts` and imported by `aggregations-diagnostics.ts`. Since `aggregations-diagnostics.ts` already imports `TimingStats` from core, adding this function import maintains the existing dependency pattern.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/dashboard/src/lib/aggregations-core.ts` around lines 155 - 183, Extract the duplicate helper computeTimingStatsFromValues from aggregations-diagnostics.ts into aggregations-core.ts and export it; then update aggregations-diagnostics.ts to import { computeTimingStatsFromValues } (it already imports TimingStats) and remove the local copy. Ensure the exported function signature and return type remain TimingStats | null and that callers use the imported function instead of the duplicated implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/dashboard/src/components/run-detail/coverage-diagnostics.tsx`:
- Around line 166-175: The "missing evals" count currently uses
(coverage.totalItems - coverage.frontierEvalItems) which double-counts
frontierEvalFailures; update the UI in coverage-diagnostics.tsx so the displayed
missing-evals number excludes failures (e.g., compute missing =
coverage.totalItems - coverage.frontierEvalItems - frontierEvalFailures) or
alternatively change the label to indicate it includes failures (e.g., "missing
or failed evals"); locate the paragraph that renders {coverage.totalItems -
coverage.frontierEvalItems} and adjust the expression or label accordingly to
avoid double-counting.
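A tiny arithmetic sketch of the double-counting described above (the coverage numbers are hypothetical; field names are taken from the comment):

```typescript
// Hypothetical coverage snapshot, using the field names from the comment above.
const coverage = { totalItems: 100, frontierEvalItems: 90 };
const frontierEvalFailures = 6; // evals attempted but failed

// Current label "missing evals" also counts the 6 failures.
const missingOrFailed = coverage.totalItems - coverage.frontierEvalItems;

// Suggested fix: items with no eval data at all.
const trulyMissing =
  coverage.totalItems - coverage.frontierEvalItems - frontierEvalFailures;
```

With these numbers, the card currently shows 10 "missing" while only 4 items genuinely lack eval data; either the subtraction or the label should change.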
In `@apps/dashboard/src/components/run-detail/timing-stats.tsx`:
- Line 154: The exported function TimingStats lacks the required TSDoc/JSDoc;
add a JSDoc block immediately above the TimingStats declaration describing the
function's purpose, its parameter (items: TimingStatsProps) with types and
meaning, the return value (React element / JSX.Element), and any errors/throws
if applicable; ensure the comment uses TSDoc/JSDoc syntax (/** ... */) and
mentions TimingStatsProps so linters and documentation tools recognize the
export.
- Around line 186-203: The tooltip copy used by the TimingGrid is specific to
"generation" and is misleading for scoring metrics; update TimingGrid to accept
a customizable tooltip/description prop (e.g., tooltipText or description) and
use a neutral default ("Timing information for this stage") to preserve current
behavior, then pass a scoring-specific string when rendering scoring blocks (in
the scoringStats branch where TimingGrid is invoked) or rely on the neutral
default for all. Modify the TimingGrid component signature and its internal
tooltip rendering to use the new prop (or default) and update the calls in the
Generation and Scoring usages (the TimingGrid(...) instances) accordingly.
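The tooltip-prop suggestion above could look roughly like this, reduced to a plain function so it runs without React (the prop name, default copy, and scoring string are all illustrative, not the component's actual API):

```typescript
// Illustrative sketch of a customizable TimingGrid tooltip with a neutral
// default. Names here are assumptions, not the real component's props.
interface TimingGridTooltipProps {
  tooltipText?: string;
}

const DEFAULT_TOOLTIP = "Timing information for this stage";

/** Resolves the tooltip copy, falling back to the neutral default. */
function resolveTooltip(props: TimingGridTooltipProps): string {
  return props.tooltipText ?? DEFAULT_TOOLTIP;
}

// Generation block keeps the neutral default:
const generationTooltip = resolveTooltip({});
// Scoring block passes stage-specific copy:
const scoringTooltip = resolveTooltip({
  tooltipText: "How long scoring took for this item",
});
```

The neutral default preserves current behavior for callers that pass nothing, while the scoring branch can opt into accurate copy.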
---
Nitpick comments:
In `@apps/dashboard/src/components/run-detail/failure-breakdown.tsx`:
- Around line 74-81: Add a JSDoc/TSDoc block above the exported FailureBreakdown
function describing its purpose (renders a failure summary UI for given items),
documenting the parameter `items` via the FailureBreakdownProps type, the return
value (JSX.Element | null), and any exceptions/edge-cases (e.g., if
computeFailureStats throws) — reference the function name FailureBreakdown and
the helper computeFailureStats so the doc points to the behavior and inputs
used.
In `@apps/dashboard/src/lib/aggregations-core.ts`:
- Around line 155-183: Extract the duplicate helper computeTimingStatsFromValues
from aggregations-diagnostics.ts into aggregations-core.ts and export it; then
update aggregations-diagnostics.ts to import { computeTimingStatsFromValues }
(it already imports TimingStats) and remove the local copy. Ensure the exported
function signature and return type remain TimingStats | null and that callers
use the imported function instead of the duplicated implementation.
In `@apps/dashboard/src/lib/aggregations-diagnostics.ts`:
- Around line 49-71: The computeTimingStatsFromValues function here is a
duplicate; remove the local copy and instead export computeTimingStatsFromValues
from aggregations-core.ts (preserving its signature and the TimingStats type)
and import it into this file, then update any local references to use the
imported function; ensure you delete the duplicate implementation in
aggregations-diagnostics.ts so there’s a single shared implementation.
Actionable comments posted: 1
🧹 Nitpick comments (1)
apps/dashboard/src/components/run-detail/timing-stats.tsx (1)
1-3: Update file header to reflect expanded scope.

The header mentions only "generation duration metrics" but the component now also handles scoring timing statistics and breakdowns. Consider updating it to reflect the full purpose and exports.

```diff
 /**
- * Purpose: Timing statistics component showing generation duration metrics.
+ * Purpose: Timing statistics component showing generation and scoring duration metrics.
+ * Exports: TimingStats
  */
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/dashboard/src/components/run-detail/timing-stats.tsx` around lines 1 - 3, Update the file header comment to describe the component's expanded responsibilities: mention that TimingStats (and any related exports such as exported types/helpers) displays generation duration metrics, scoring timing statistics, and breakdowns; also note what it exports (e.g., TimingStats component and any helper functions/types) and its purpose in the run-detail UI so the header accurately reflects scope and public API.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/dashboard/src/components/run-detail/coverage-diagnostics.tsx`:
- Around line 43-49: Add a `@throws` tag to the JSDoc for the exported React
component CoverageDiagnostics; update the existing docblock above the
CoverageDiagnostics function to include a "throws" entry (if it never throws,
use "@throws none") so the docblock contains purpose, `@param`(s), `@returns` and
`@throws` as required by the coding guidelines.
---
Nitpick comments:
In `@apps/dashboard/src/components/run-detail/timing-stats.tsx`:
- Around line 1-3: Update the file header comment to describe the component's
expanded responsibilities: mention that TimingStats (and any related exports
such as exported types/helpers) displays generation duration metrics, scoring
timing statistics, and breakdowns; also note what it exports (e.g., TimingStats
component and any helper functions/types) and its purpose in the run-detail UI
so the header accurately reflects scope and public API.
Summary by CodeRabbit
New Features
Improvements