Skip to content

feat(ce-resolve-pr-feedback): drop bot noise, centralize test runs#610

Merged
tmchow merged 1 commit intomainfrom
tmchow/optimize-pr-feedback
Apr 20, 2026
Merged

feat(ce-resolve-pr-feedback): drop bot noise, centralize test runs#610
tmchow merged 1 commit intomainfrom
tmchow/optimize-pr-feedback

Conversation

@tmchow
Copy link
Copy Markdown
Collaborator

@tmchow tmchow commented Apr 20, 2026

Summary

Two wins for ce-resolve-pr-feedback, both cutting per-thread work that should happen once per invocation:

  • Review-bot wrapper comments no longer reach the agent. CodeRabbit, Codex, Gemini Code Assist, Copilot, and Codecov post boilerplate the skill repeatedly deliberated on and then narrated dropping. Their actionable asks live in inline review threads, which still come through untouched.
  • Resolver agents no longer run the full project test suite. They run only targeted tests for what they changed; the skill runs project validation once against the combined diff before commit.

Bot wrapper filter

scripts/get-pr-comments drops known review-bot and CI-bot authors from pr_comments and review_bodies in the jq pipeline, before the skill sees them:

  • coderabbitai, chatgpt-codex-connector, gemini-code-assist, copilot-pull-request-reviewer — AI review bots whose top-level output is wrapper text; their actionable asks are inline and still processed as review_threads.
  • codecov — CI summary, never actionable on its own.

The filter list is narrow by design: exact logins, not patterns, so future bots that post genuinely actionable top-level content are not suppressed preemptively. The GraphQL payload also drops url and createdAt fields that weren't consumed downstream.

The skill gains an explicit silent-drop rule for any wrapper that slips through the script — no announcing, counting, or surfacing dropped items. Removes the "ignoring the Codex bot comment" chatter from the skill's output.

Test split between resolver and orchestrator

Parallel resolvers previously ran the full project test suite in isolation, duplicating work (N × full-suite time, N × test output read by the model) and never validating the merged diff.

  • Resolver agent writes tests when warranted and runs only targeted tests for what it changed (specific file, test pattern, or the test it just wrote). Trivial edits (doc, comment, string-literal) skip tests entirely. If targeted tests can't be located, the agent notes that in reason and the orchestrator catches issues in the combined run. No verdict downgrade.
  • Orchestrator gains a new step 6 (Validate Combined State). After all resolvers return, it aggregates files_changed and runs the project's validation command once against the combined diff. Three outcomes: green proceeds to commit, red in touched files gets one inline fix attempt before escalating needs-human, red in untouched files is treated as pre-existing and noted in the commit footer.

Steps 6-9 renumbered to 7-10 to accommodate the new Validate step. Step 10 summary includes a one-line validation outcome (Validation: bun test passed (893/893)).

Test plan

  • bun test green (893/893). Frontmatter and converter tests would catch a malformed skill change or a broken script.
  • bun run release:validate in sync.
  • Manual verification of the bot filter against live PRs: siddharthvaddem/openscreen#471 (CodeRabbit + Codex), GoogleCloudPlatform/cluster-toolkit#5531 (Gemini), jaegertracing/jaeger-ui#3740 (Copilot + Codecov). Every bot wrapper filtered; human reviewer comments preserved in each case.

Compound Engineering
Claude Code

…pers, split test responsibility

- Filter CodeRabbit, Codex, Gemini, Copilot, and Codecov top-level comments
  and review bodies in get-pr-comments. These bots put actionable feedback in
  inline review threads; their wrapper bodies are reliable noise and the
  agent repeatedly narrated ignoring them. Also trim unused url/createdAt
  fields from the GraphQL payload.
- Add silent-drop rule so the agent stops announcing non-actionable items.
- Split test responsibility: resolvers run only targeted tests for what they
  changed (specific file, pattern, or the test they wrote), never the full
  project suite. Orchestrator runs the project's validation once against the
  combined diff before commit. Covers pre-existing failures (note in footer),
  failures in touched files (one inline retry, then escalate), and the
  zero-changes skip path.
@tmchow tmchow merged commit b35de99 into main Apr 20, 2026
2 checks passed
This was referenced Apr 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant