Skip to content

feat(tui): add per-severity finding counts to queue, review, and tasks views#671

Open
cpcloud wants to merge 17 commits intoroborev-dev:mainfrom
cpcloud:worktree-vectorized-herding-stroustrup
Open

feat(tui): add per-severity finding counts to queue, review, and tasks views#671
cpcloud wants to merge 17 commits intoroborev-dev:mainfrom
cpcloud:worktree-vectorized-herding-stroustrup

Conversation

@cpcloud
Copy link
Copy Markdown
Contributor

@cpcloud cpcloud commented Apr 26, 2026

Summary

  • Add CountFindings helper that extracts high/medium/low counts from review output (critical+high → H, medium → M, low+info → L) and persist them as new columns on the reviews table in both SQLite and PostgreSQL (Postgres bumped to schema v13).
  • Backfill existing rows on schema upgrade in both databases; sync workers recompute counts from review text on pull so legacy zero rows get populated rather than overwriting freshly counted values.
  • Surface counts in the TUI: new Findings column in the queue table (toggleable, persisted in column config), aggregate Findings: H<n> M<n> L<n> line in queue status header, finding badges in the review detail header, and parent-review counts in the tasks view.
  • Severity badge styling uses red (high), yellow (medium), and blue (low) with light/dark-theme variants; zero counts render dim. Nil pointers are guarded throughout via a shared derefOrZero helper.
  • JobStats carries finding counts so roborev status reflects them; ReviewJob carries both own and parent counts so the tasks view can display the parent review's findings without a second fetch.

cpcloud added 13 commits April 26, 2026 07:53
…ew output

Refactors hasSeverityLabel into a shared classifySeverityLine helper that
recognises critical/high/medium/low/info labels; CountFindings uses it to
return per-bucket counts (high=critical+high, medium, low=low+info) while
hasSeverityLabel remains a thin wrapper that excludes info-only output from
the fail verdict, preserving existing ParseVerdict behaviour.

Also tightens isLegendEntry to stop scanning at non-legend section headers
(e.g. "Findings:") so findings below a legend block are correctly counted.
Add high_count, medium_count, low_count columns to the reviews table
schema and a migration that adds them to existing databases. BackfillFindingCounts
uses CountFindings to populate the columns for legacy rows where all three
are zero and output is non-empty; idempotent across restarts.
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 26, 2026

roborev: Combined Review (0910038)

High: Backfill cannot distinguish unparsed reviews from clean parsed reviews, causing repeated startup work.

High

  • internal/storage/summary.go (BackfillFindingCounts), internal/storage/db.go (schema migration)
    • The backfill identifies unparsed rows with high_count = 0 AND medium_count = 0 AND low_count = 0, but newly processed clean reviews also have all zero counts. Since migrate() runs during initialization, historical clean reviews may be reparsed on every daemon startup, creating a linearly growing performance regression.
    • Suggested fix: make the new count columns nullable, using NULL to mean “not backfilled” and 0 to mean “parsed with no findings,” then update the backfill filter to target high_count IS NULL.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 26, 2026

- HIGH from roborev-ci: make finding count columns nullable in both
  SQLite and Postgres v13 schemas. NULL means "not yet parsed", 0
  means "parsed and zero findings". The backfill now filters on
  IS NULL and writes {h, m, l} unconditionally, so clean parsed
  reviews stop matching the predicate after one pass instead of
  being re-scanned on every daemon startup. SELECT sites that read
  into int wrap the columns in COALESCE(..., 0) so legacy NULLs
  don't fail Scan.
- Lint (testifylint): use assert.Len and assert.Contains in
  severity_badge_test.go.
- Windows test failure in TestCountJobStats_FindingAggregation:
  pass rA.RootPath instead of "/tmp/repo-a" so the filter matches
  the absolute path that GetOrCreateRepo stored after filepath.Abs
  + ToSlash on Windows.
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 26, 2026

roborev: Combined Review (2dc0044)

Summary verdict: No Medium, High, or Critical findings were reported.

No actionable findings at or above Medium severity.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Adds targeted unit tests to lift coverage on PR roborev-dev#671:
- derefOrZero(nil) returns 0 (the previously uncovered branch).
- migrateColumnConfig backfills 'findings' into a custom
  TaskColumnOrder (after 'parent' or appended), and leaves an
  already-migrated TaskColumnOrder alone.
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 26, 2026

roborev: Combined Review (98512c9)

Verdict: No Medium, High, or Critical findings to report.

All reported findings were Low severity, and the remaining agents found no issues.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Drop the H/M/L letters from the in-row badge — they were doubling up
with the column header and the colour already encodes severity. Slashes
render in the same dim grey as zero counts so non-zero values pop in
red/yellow/blue. Plain-text width drops from 8 to 5 for typical
single-digit counts; the column header 'Findings' (still 8 chars) keeps
the column from getting too narrow.
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 26, 2026

roborev: Combined Review (61b8d70)

Summary verdict: No Medium, High, or Critical findings were reported; the reviewed changes look clean.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@cpcloud
Copy link
Copy Markdown
Contributor Author

cpcloud commented Apr 26, 2026

Demo of the new Findings column with the tightened 3/2/5 badge format.

Queue view — status header aggregates findings across visible jobs (Findings: 3/6/11), per-row badges show each review's own counts. Slashes and zeros render dim grey so non-zero counts pop in red (high), yellow (medium), blue (low).

roborev queue

Review detail — header line shows Verdict: ... Findings: 0/2/5 for the selected review.

roborev review

Pushes patch coverage on PR roborev-dev#671 by exercising paths that the prior
suite skipped:

- TestPostgresMigration_FindingCounts (postgres-tagged): bootstraps a
  v12 schema, seeds a review, runs EnsureSchema to migrate to v13,
  asserts pre-existing rows have NULL counts, and verifies UpsertReview
  + PullReviews handle the new columns end-to-end (including the
  COALESCE on legacy NULLs).
- Embeds postgres_v12.sql in the migration test helper so future
  v12 -> N migrations can start there.
- Pins the existing TestPostgresMigration_SkipReasonAndClassify to v11
  explicitly; the old pgSchemaVersion-1 form silently broke when the
  schema bumped to v13 because no v12 schema was embedded.
- TestBackfillFindingCounts_PartialNull, _SkipsEmptyOutput, and
  _ClearsSyncedAt: cover the OR branch of the IS NULL predicate, the
  output != '' guard, and the synced_at = NULL re-queue.
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Apr 26, 2026

roborev: Combined Review (042d921)

Verdict: No Medium, High, or Critical findings were reported; the reviewed changes look clean.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@wesm
Copy link
Copy Markdown
Collaborator

wesm commented Apr 27, 2026

I wonder if the high/medium/low distinction survives language changes (e.g. if someone asks for the results to be in German)

@cpcloud
Copy link
Copy Markdown
Contributor Author

cpcloud commented Apr 27, 2026

Good question, I doubt it. Seems like to account for that there would have to be a deeper refactor to have agents produce structured output with English-based JSON keys, and construct the TUI/UX from that instead of the current approach. I think this would be a backwards compatible change, only internals would see the JSON for now.

@cpcloud
Copy link
Copy Markdown
Contributor Author

cpcloud commented Apr 27, 2026

I'll get clanking on that, and we can wait on this PR until that is in.

@cpcloud
Copy link
Copy Markdown
Contributor Author

cpcloud commented Apr 27, 2026

Well, basically most of the supported review agent CLIs do not support anything like --output-schema (claude code and codex do, gemini has an issue for it), so we're back to the same problem if we actually want to count these across languages. Kind of unfortunate given that many of them support structured outputs at the API layer. They just don't expose it on the CLI.

This PR can be closed out if we think it's too fragile or specific to English-output review text.

@mariusvniekerk
Copy link
Copy Markdown
Collaborator

i think we can keep it in draft until we come up with a better design

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants