feat(review): integrate codex-reviewer into stable review pipeline #356
mvanhorn wants to merge 2 commits into EveryInc:main
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3f96cf6279
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Branch updated from 76bf361 to 70e8729.
#348 was merged to main, so please rebase onto main.
Branch updated from 3f96cf6 to d7a5270.
💡 Codex Review
Reviewed commit: d7a5270386
Branch updated from d7a5270 to 93ae222.
Rebased onto main. Squashed the branch to a single commit on top of current main, since #348's merge created too many conflicts for a clean commit-by-commit rebase. Tests match main (same 42 pre-existing failures, no new ones).
💡 Codex Review
Reviewed commit: 93ae22240d
Noted. Since review-beta is now promoted to stable, should I retarget this to integrate codex-reviewer into the stable review pipeline instead? Or close this if the integration approach has changed?
Target the stable skill. |
Got it - will retarget the codex-reviewer integration to the stable |
Add codex-reviewer as a cross-cutting conditional persona in ce:review. Retargeted from ce-review-beta (now promoted to stable) per maintainer feedback on PR EveryInc#356.

- Add codex-reviewer.md agent with Codex CLI delegation and environment guards (sandbox detection, CLI availability check)
- Register codex in persona-catalog.md and SKILL.md dispatch table
- Add codex-reviewer contract tests to review-skill-contract.test.ts
- Update README.md agent table with codex-reviewer entry
Branch updated from 93ae222 to ec8ef55.
💡 Codex Review
The PR-mode scope command still falls back to
Branch updated from ec8ef55 to f524f46.
💡 Codex Review
Reviewed commit: f524f46d7f
Rebased onto main and addressed the Codex P1 findings in f524f46:
The P2 about partial tool mapping in
@tmchow this is now targeting stable.
@mvanhorn I like this idea, as I already run ce-review followed by codex:review. I'm about to try it out locally. Are you using this locally with success?
Yeah, been using it locally for a few weeks. Codex review catches different things than ce-review: more script and shell-level findings, fewer architectural callouts. Running both has caught issues I wouldn't have hit with either alone. Holler if anything looks off in the rebase.
Add a conditional review persona that delegates to OpenAI's Codex CLI for an independent second-opinion read on the diff. Codex catches blind spots that same-model reviewers share: different model, different reasoning patterns, different recall.

Reapplies PR EveryInc#356 onto current main after the v3 restructure (ce-review -> ce-code-review, agents/<category>/ -> agents/ce-*.agent.md, ce: -> ce-, new subagent template that passes the pre-computed diff).

Addresses both Codex findings still active on the prior commit:

- P1 (line 54): "Stop assuming ce:review passes base branch context." The new subagent template passes the pre-computed diff directly; ce-codex-reviewer materializes that diff to a tempfile and feeds it to codex. No base-branch resolution, no git symbolic-ref or rev-parse, no fallback chain to break on non-default targets.
- P2 (line 123): "Remove cross-persona suppression rule." The agent no longer instructs codex to suppress findings already covered by other personas. Synthesis dedupes centrally in Stage 5; the parallel subagent has no visibility into other reviewers' outputs anyway.

Pieces:

- agents/ce-codex-reviewer.agent.md: new persona with an anchored confidence rubric (50/75; 100 is never emitted because a second opinion is not direct verification), a two-phase environment guard (CODEX_SANDBOX/SESSION_ID recursion check + `which codex` availability check), and pipe-delimited output translation into the structured findings schema.
- skills/ce-code-review/references/persona-catalog.md: codex registered as conditional persona #9. Selected when the diff is non-trivial (>=25 changed non-test lines) or correctness-sensitive (auth, payments, data, parsing, concurrency, external API contracts) AND the user-level codex CLI is on PATH.
- skills/ce-code-review/SKILL.md: codex added to the cross-cutting conditional dispatch table; reviewer count bumped 18 -> 19.
- plugins/compound-engineering/README.md: codex listed in the Review agents table (inserted in sorted order).
- tests/review-skill-contract.test.ts: codex added to the anchored-rubric persona loop, plus four new contract tests guarding (a) frontmatter shape, (b) the no-base-branch-resolution rule with explicit P1 regression callouts, (c) the no-cross-persona-suppression rule with an explicit P2 regression callout, (d) catalog/SKILL.md registration, and (e) the two environment guards before any codex invocation.

Verification:

- bun test tests/review-skill-contract.test.ts: 28 pass (was 24)
- bun test full suite: 992 pass; 9 pre-existing failures in detect-project-type.sh unchanged (present on main before this PR)
- bun run release:validate: clean (52 agents, 35 skills, 0 MCP)

No release-owned files touched: plugin.json, marketplace.json, and CHANGELOG.md remain untouched per AGENTS.md rules.
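The catalog selection rule and the two-phase guard described in this commit can be sketched in TypeScript as below. This is an illustrative sketch, not the agent's code (the agent is a markdown prompt). It assumes the rule groups as (non-trivial OR correctness-sensitive) AND CLI-on-PATH. The names `shouldSelectCodex`, `codexGuard`, and `DiffStats` are hypothetical, and so is reading the recursion signal from `CODEX_SANDBOX` / `CODEX_SESSION_ID`: the commit only says "CODEX_SANDBOX/SESSION_ID".

```typescript
// Hypothetical shape: what the orchestrator knows about the diff.
interface DiffStats {
  changedNonTestLines: number;
  // True when the diff touches auth, payments, data, parsing,
  // concurrency, or external API contracts.
  correctnessSensitive: boolean;
}

// ">=25 changed non-test lines" per the catalog entry.
const NON_TRIVIAL_THRESHOLD = 25;

// Catalog selection: (non-trivial OR correctness-sensitive) AND codex on PATH.
function shouldSelectCodex(stats: DiffStats, codexOnPath: boolean): boolean {
  const nonTrivial = stats.changedNonTestLines >= NON_TRIVIAL_THRESHOLD;
  return (nonTrivial || stats.correctnessSensitive) && codexOnPath;
}

type GuardResult = { run: true } | { run: false; reason: string };

// Two-phase guard, evaluated before any codex invocation.
// Assumption: env var names; the commit only names "CODEX_SANDBOX/SESSION_ID".
function codexGuard(
  env: Record<string, string | undefined>,
  codexOnPath: boolean,
): GuardResult {
  // Phase 1: recursion check. Already inside a Codex session, so
  // shelling out to codex again would nest reviewers.
  if (env.CODEX_SANDBOX || env.CODEX_SESSION_ID) {
    return { run: false, reason: "already running inside Codex" };
  }
  // Phase 2: availability check (e.g. the outcome of `which codex`).
  if (!codexOnPath) {
    return { run: false, reason: "codex CLI not found on PATH" };
  }
  return { run: true };
}
```

Keeping both checks as pure functions over injected inputs, rather than reading the process environment directly, makes the guard order easy to pin down in a contract-style test.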
Branch updated from f524f46 to a8a5ed2.
💡 Codex Review
Reviewed commit: a8a5ed2673
Rebased onto current main and re-applied as a single commit (a8a5ed2). Main moved a lot since the original branch —
Addresses both Codex findings on a8a5ed2 (April 2026):

P1: Emit schema-valid line numbers for file-level findings. The findings schema requires `line >= 1` (verified in references/findings-schema.json). The previous prompt told codex to emit `LINE=0` for file-level issues, which the merge validator would have dropped as malformed before synthesis, silently losing every file-level Codex finding. Fix: the codex prompt now requires `line` to be a positive integer and adds a `file_level` boolean. When `file_level: true`, codex sets `line: 1` and the agent prepends a "file-level finding (no specific line applies)" string to the evidence array, so synthesis and downstream surfaces can still distinguish "line 1 was the issue" from "this is a whole-file concern." The `file_level` signal is preserved in evidence rather than as a separate field because the findings schema doesn't expose a top-level file_level flag, and inventing one would fail the strict-validator path.

P2: Use structured output so evidence cannot break parsing. The previous pipe-delimited contract dropped any row that wasn't exactly five `|`-separated fields. EVIDENCE is a raw code snippet that can legitimately contain `|` (bitwise OR / union types, shell pipes, markdown tables, regex alternation). When that happened, valid findings disappeared silently. Fix: switch the codex prompt to NDJSON, one JSON object per line. The agent JSON-parses each line independently; embedded pipes in evidence are no longer a parsing hazard because JSON quoting handles them. Lines that fail to parse are skipped (no retry, no inference).

Test coverage:

- New contract test "uses NDJSON output contract so evidence can carry pipes safely" guards the prompt format and asserts the pipe-delimited shape stays gone.
- New contract test "emits schema-valid line numbers for file-level findings" guards the line=1 + file_level=true convention, asserts the old "0 means file-level" wording is gone, and reads the findings schema to verify minimum=1 still holds (so a future schema relaxation triggers a deliberate test update rather than silent drift).

Both tests cite their originating Codex finding inline so future regressions get a clear pointer to PR EveryInc#356.

- bun test tests/review-skill-contract.test.ts: 30 pass (was 28)
- bun run release:validate: clean
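The translation these two fixes describe can be sketched in TypeScript as follows. It parses NDJSON line by line and skips blank lines, the `NO_FINDINGS` token, and unparsable lines. It maps `file_level: true` to `line: 1` and prepends the evidence note. The `CodexRow` and `Finding` shapes (`file`, `title`, `evidence`) are hypothetical stand-ins; the real schema lives in references/findings-schema.json. Dropping rows with a non-positive `line` before validation is also an added assumption of this sketch.

```typescript
// Hypothetical row shape codex is asked to emit, one JSON object per line.
interface CodexRow {
  file: string;
  line: number;
  file_level?: boolean;
  title: string;
  evidence: string;
}

// Hypothetical subset of the findings schema (which requires line >= 1).
interface Finding {
  file: string;
  line: number;
  title: string;
  evidence: string[];
}

const FILE_LEVEL_NOTE = "file-level finding (no specific line applies)";

function translateCodexOutput(stdout: string): Finding[] {
  const findings: Finding[] = [];
  for (const raw of stdout.split("\n")) {
    const text = raw.trim();
    if (text === "" || text === "NO_FINDINGS") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(text);
    } catch {
      continue; // unparsable: skip, no retry, no inference
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const row = parsed as CodexRow;
    const fileLevel = row.file_level === true;
    // Assumption: drop rows whose line would fail the schema's minimum of 1.
    if (!fileLevel && !(Number.isInteger(row.line) && row.line >= 1)) continue;
    findings.push({
      file: row.file,
      line: fileLevel ? 1 : row.line,
      title: row.title,
      // Keep the file-level signal in evidence, since the schema has no flag for it.
      evidence: fileLevel ? [FILE_LEVEL_NOTE, row.evidence] : [row.evidence],
    });
  }
  return findings;
}
```

An evidence string such as `x | y` is the P2 regression case. A pipe-splitting parser would see extra fields, while JSON quoting keeps the snippet intact.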
💡 Codex Review
Reviewed commit: adba28a8f4
## Step 4: Translate codex output into findings
Parse codex's stdout line-by-line as NDJSON. Skip blank lines. Skip the literal `NO_FINDINGS` token. For each remaining line, attempt to JSON-parse it; if parsing fails, skip the line (do not retry, do not infer).
Treat fully unparsable Codex output as reviewer failure
Step 4 instructs the agent to silently skip every non-JSON line from Codex output, but it never marks the run as degraded when stdout is non-empty and none of it parses. In that case the persona can return an empty findings set that looks like a clean review, even though the cross-model review actually failed to produce structured results (for example if Codex emits prose/refusal text). This masks loss of reviewer coverage and can hide real issues that were expected from this pass.
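One way to act on this suggestion is to classify the run before returning findings, so that non-empty stdout with zero parseable rows is reported as degraded rather than clean. The TypeScript sketch below assumes that approach. The `classifyCodexRun` name and its three-state result are hypothetical, and this thread does not specify how the pipeline surfaces a degraded reviewer.

```typescript
type CodexRunStatus = "clean" | "has_findings" | "degraded";

// Counts lines that are neither blank nor the NO_FINDINGS token
// ("candidates"), and how many of those parse as JSON objects.
function classifyCodexRun(stdout: string): CodexRunStatus {
  let candidates = 0;
  let parsed = 0;
  for (const raw of stdout.split("\n")) {
    const text = raw.trim();
    if (text === "" || text === "NO_FINDINGS") continue;
    candidates++;
    try {
      const value: unknown = JSON.parse(text);
      if (typeof value === "object" && value !== null) parsed++;
    } catch {
      // Unparsable line: counted as a candidate, not as parsed.
    }
  }
  if (parsed > 0) return "has_findings";
  // Non-empty output but nothing parsed (e.g. prose or a refusal):
  // reviewer coverage was lost, so do not report a clean pass.
  if (candidates > 0) return "degraded";
  return "clean";
}
```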
## Step 5: Output
Return the findings as JSON matching the contract in the subagent template. Honor the artifact-file write if a Run ID is present: write the full analysis to `/tmp/compound-engineering/ce-code-review/{run_id}/ce-codex-reviewer.json` and return the compact form to the parent.
Write artifact using the reviewer-keyed filename
This hardcodes the artifact filename to ce-codex-reviewer.json, but the same file sets the top-level reviewer key to "codex". The review pipeline’s detail-enrichment steps resolve artifact files by reviewer key ({reviewer_name} / {reviewer}.json), so codex findings can lose why_it_matters and evidence lookups when the file is stored under a different name. Use the reviewer-keyed filename convention to keep downstream matching deterministic.
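This comment asks for a filename derived from the same reviewer key that appears in the returned JSON, so that lookups by `{reviewer}.json` resolve. The TypeScript sketch below shows that convention. The `artifactPath` helper is hypothetical; the directory layout is the one quoted from Step 5.

```typescript
const ARTIFACT_ROOT = "/tmp/compound-engineering/ce-code-review";

// Use the reviewer key from the findings payload (e.g. "codex") as the
// filename, not the agent's file name (e.g. "ce-codex-reviewer").
function artifactPath(runId: string, reviewer: string): string {
  // Reject empty or path-like segments so the result stays inside the run dir.
  if (!runId || !reviewer || runId.includes("/") || reviewer.includes("/")) {
    throw new Error("runId and reviewer must be non-empty path segments");
  }
  return `${ARTIFACT_ROOT}/${runId}/${reviewer}.json`;
}
```

For example, `artifactPath("run-123", "codex")` yields `/tmp/compound-engineering/ce-code-review/run-123/codex.json`. Because both the writer and the enrichment lookup call the same function with the same key, the filename cannot drift from the `reviewer` field.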
Summary

Add `codex-reviewer` as a cross-cutting conditional persona in the stable `ce:review` pipeline. This agent delegates code review to OpenAI's Codex CLI for cross-model validation, catching blind spots that same-model reviewers share.

Retargeted from `ce-review-beta` to stable `ce:review` per @tmchow's feedback, since beta was promoted to stable in #371.

Changes
- `plugins/compound-engineering/agents/review/codex-reviewer.md` - New agent with environment guards (Codex sandbox detection at `codex-reviewer.md:17`, CLI availability check at `codex-reviewer.md:32`), structured findings output matching the findings schema, and conservative confidence calibration (0.65 baseline)
- `plugins/compound-engineering/skills/ce-review/SKILL.md` - Register codex-reviewer in the cross-cutting conditional persona dispatch table (line 97)
- `plugins/compound-engineering/skills/ce-review/references/persona-catalog.md` - Add codex entry to catalog (line 35)
- `tests/review-skill-contract.test.ts` - 3 new tests: agent references stable skill, registered in catalog + SKILL.md, follows structured findings contract
- `plugins/compound-engineering/README.md` - Add codex-reviewer to review agent table

Context
- `ce-review-beta` which was promoted to stable in feat: promote `ce:review-beta` to stable `ce:review` #371
- `ce:review`
- `ce:review` (not `ce:review-beta`), using `mode:autofix` (not `mode:autonomous`)

Testing
All 389 tests pass (`bun test`), including 3 new codex-reviewer contract tests verifying:

- `ce:review` skill (no `ce:review-beta` remnants)

This contribution was developed with AI assistance (Claude Code).