dashboard layout improvements by AustinKelsay · Pull Request #21 · AustinKelsay/plebdev-bench

AustinKelsay · 2026-03-30T12:05:30Z

Summary by CodeRabbit

Bug Fixes
- Corrected duration metric label from "Avg Duration" to "Median Duration" for clarity.
Refactor
- Reorganized leaderboard chart sequence for improved presentation.
- Reordered dashboard overview sections for better content flow.
Documentation
- Added comprehensive project documentation covering architecture and data sources.
Tests
- Expanded test fixtures and checkpoint validation coverage.
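
The relabeling from "Avg Duration" to "Median Duration" matters because the two statistics can diverge sharply when a few runs are slow. The sketch below illustrates the difference on made-up durations. The helper names `mean` and `median` and the sample values are assumptions for illustration, not code or data from the dashboard.

```typescript
// Illustrative helpers only; these names are hypothetical and not taken from the dashboard code.
function mean(values: number[]): number {
  if (values.length === 0) throw new Error("mean of empty list");
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  // Copy before sorting so the caller's array is not mutated.
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// Made-up durations in milliseconds with one slow outlier.
const durationsMs = [900, 1000, 1100, 9000];
console.log(mean(durationsMs)); // 3000
console.log(median(durationsMs)); // 1050
```

A card that displays the second number but is labeled as an average would mislead readers, which is the kind of mismatch the label fix addresses.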

vercel · 2026-03-30T12:05:35Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
plebdev-bench-dashboard	Ready	Preview, Comment	Mar 30, 2026 0:46am

coderabbitai · 2026-03-30T12:05:46Z

📝 Walkthrough

Walkthrough

This pull request reorders dashboard UI components and updates labeling, expands the set of core benchmark library files tracked for checkpoint hashing, documents the project architecture and data state, and updates corresponding tests to reflect the expanded assets.

Changes

Cohort / File(s)	Summary
Dashboard leaderboard components `apps/dashboard/src/components/leaderboard/leaderboard-chart-gallery.tsx`, `apps/dashboard/src/components/leaderboard/leaderboard-summary-cards.tsx`	Repositioned `ModelTestHeatmap` component ordering in chart gallery and updated KPI label from "Avg Duration" to "Median Duration" on summary card without altering logic.
Dashboard run detail layout `apps/dashboard/src/components/run-detail/run-detail-page.tsx`	Reordered overview tab sections, moving `CoverageDiagnostics`, `Results Matrix`, and breakdown components to appear after comparison charts while preserving all event wiring and props.
Benchmark checkpoint assets `src/lib/benchmark-checkpoint.ts`	Expanded `CORE_BENCHMARK_LIB_ASSETS` to include five additional library files (`code-module-scorer.ts`, `workspace-scorer.ts`, `workspace-manifest.ts`, `test-workspace.ts`, `signal-assessment.ts`), affecting checkpoint hash computation.
Test fixtures and cases `test/benchmark-checkpoint.test.ts`, `test/build-index.test.ts`	Updated test fixtures to include new benchmark library files and added validation that modifications to `signal-assessment.ts` produce different checkpoint IDs.
Project documentation `memory/MEMORY.md`	Added comprehensive project documentation covering stack, repository paths, dashboard architecture, composite scoring formula, checkpoint behavior, and current data state.
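
To make the checkpoint rows above concrete, here is a minimal sketch of how a content hash over a tracked asset list could behave. It is not the implementation in `src/lib/benchmark-checkpoint.ts`: the function `computeCheckpointId`, the constant `TRACKED_ASSETS`, the choice of SHA-256, and the 12-character ID length are all assumptions. Only the five file names come from the walkthrough.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-in for CORE_BENCHMARK_LIB_ASSETS. Only these five file
// names appear in the PR summary; the real list and paths may differ.
const TRACKED_ASSETS = [
  "code-module-scorer.ts",
  "workspace-scorer.ts",
  "workspace-manifest.ts",
  "test-workspace.ts",
  "signal-assessment.ts",
];

// Derives an ID from the contents of tracked files only (assumed scheme).
function computeCheckpointId(contents: Record<string, string>): string {
  const hash = createHash("sha256");
  // Sort so the ID does not depend on the order of the asset list.
  for (const name of [...TRACKED_ASSETS].sort()) {
    const body = contents[name];
    if (body === undefined) throw new Error(`missing asset: ${name}`);
    // Separate name and body with NUL bytes so adjacent fields cannot blur together.
    hash.update(name);
    hash.update("\0");
    hash.update(body);
    hash.update("\0");
  }
  return hash.digest("hex").slice(0, 12);
}

// Placeholder file contents for the demonstration.
const base: Record<string, string> = {};
for (const name of TRACKED_ASSETS) base[name] = `// ${name}`;

const edited = { ...base, "signal-assessment.ts": "// changed" };
const withUntracked = { ...base, "untracked.ts": "// anything" };

console.log(computeCheckpointId(base) !== computeCheckpointId(edited)); // true
console.log(computeCheckpointId(base) === computeCheckpointId(withUntracked)); // true
```

Under this scheme a file influences the ID only once it is in the tracked list, which is why expanding the list changes which edits invalidate a checkpoint, and why a test that edits `signal-assessment.ts` and expects a different ID is a natural regression check.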

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

model specifi dashboard view #10: Modifies run-detail-page.tsx with similar component reordering and layout restructuring of the overview section.
Feature/result checkpointing and aggregation #13: Directly modifies src/lib/benchmark-checkpoint.ts and related checkpointing logic affected by the asset expansion.
Implement canonical machine profiles #18: Updates leaderboard-summary-cards.tsx with KPI label and card behavior changes to the same component.

Poem

🐰 Charts reshuffled, cards relabeled with care,
Benchmarks expanded with assets to share,
Hashes now track what they didn't before,
Documentation springs forth, our knowledge is soaring! 📊✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The PR title 'dashboard layout improvements' is too vague and does not accurately reflect the actual changes, which include dashboard UI reorganization, metric label updates, and backend checkpoint asset expansion.	Use a more specific title that captures the main changes, such as 'Reorder dashboard components and update checkpoint tracking' or 'Reorganize dashboard layout and expand core benchmark assets'.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

coderabbitai

🧹 Nitpick comments (1)

memory/MEMORY.md (1)
23-31: Consider adding a note about snapshot data.

The "Current Data State" section contains specific values (checkpoint ID, run counts, model lists) that will become stale as the benchmark evolves. Consider adding a brief note indicating this is a point-in-time snapshot, or document the expectation for keeping this section updated.
📝 Suggested clarification
-## Current Data State (as of 2026-03-30)
+## Current Data State (snapshot — update after significant runs)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@memory/MEMORY.md` around lines 23 - 31, Add a brief clarifying sentence to
the "Current Data State" section indicating these listed values are a
point-in-time snapshot and may become stale; update the section header or
directly append a line under "## Current Data State (as of 2026-03-30)" stating
that the checkpoint ID, run counts, model lists and other metrics reflect the
state on that date and will change over time, and optionally note how/where to
update them in future (e.g., refer to the canonical update process or repository
file) so readers know to treat these as temporal data.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 49398d3c-52b4-4062-ab97-2aba209c8198

📥 Commits

Reviewing files that changed from the base of the PR and between cbd4142 and 5164b65.

📒 Files selected for processing (7)

apps/dashboard/src/components/leaderboard/leaderboard-chart-gallery.tsx
apps/dashboard/src/components/leaderboard/leaderboard-summary-cards.tsx
apps/dashboard/src/components/run-detail/run-detail-page.tsx
memory/MEMORY.md
src/lib/benchmark-checkpoint.ts
test/benchmark-checkpoint.test.ts
test/build-index.test.ts

dashboard layout improvements

5164b65

vercel Bot deployed to Preview March 30, 2026 12:05 View deployment

coderabbitai Bot reviewed Mar 30, 2026

AustinKelsay merged commit 0807904 into main Mar 30, 2026
3 checks passed

coderabbitai Bot mentioned this pull request Apr 5, 2026

[codex] bench: harden signal assessment and retry fairness #23

Merged

AustinKelsay merged 1 commit into main from staging
