feat(review): integrate codex-reviewer into stable review pipeline by mvanhorn · Pull Request #356 · EveryInc/compound-engineering-plugin

mvanhorn · 2026-03-24T00:41:29Z

Summary

Add codex-reviewer as a cross-cutting conditional persona in the stable ce:review pipeline. This agent delegates code review to OpenAI's Codex CLI for cross-model validation, catching blind spots that same-model reviewers share.

Retargeted from ce-review-beta to stable ce:review per @tmchow's feedback - beta was promoted to stable in #371.

Changes

plugins/compound-engineering/agents/review/codex-reviewer.md - New agent with environment guards (Codex sandbox detection at codex-reviewer.md:17, CLI availability check at codex-reviewer.md:32), structured findings output matching the findings schema, and conservative confidence calibration (0.65 baseline)
plugins/compound-engineering/skills/ce-review/SKILL.md - Register codex-reviewer in the cross-cutting conditional persona dispatch table (line 97)
plugins/compound-engineering/skills/ce-review/references/persona-catalog.md - Add codex entry to catalog (line 35)
tests/review-skill-contract.test.ts - 3 new tests: agent references stable skill, registered in catalog + SKILL.md, follows structured findings contract
plugins/compound-engineering/README.md - Add codex-reviewer to review agent table

Context

The original PR was built against ce-review-beta, which was promoted to stable in #371 (feat: promote ce:review-beta to stable ce:review)
#375 (feat: integrate orphaned stack-specific reviewers into ce:review) integrated the orphaned stack-specific reviewers into stable ce:review
All references now target ce:review (not ce:review-beta), using mode:autofix (not mode:autonomous)

Testing

All 389 tests pass (bun test), including 3 new codex-reviewer contract tests verifying:

Agent file references stable ce:review skill (no ce:review-beta remnants)
Codex-reviewer registered in both persona-catalog.md and SKILL.md dispatch table
Agent follows structured findings contract (confidence calibration, JSON output, suppression rules)

This contribution was developed with AI assistance (Claude Code).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3f96cf6279

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

tmchow · 2026-03-24T05:06:02Z

#348 was merged to main, so rebase off main please

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d7a5270386

mvanhorn · 2026-03-24T18:07:30Z

Rebased onto main. Squashed the branch to a single commit on top of current main since #348's merge created too many conflicts for a clean commit-by-commit rebase. Tests match main (same 42 pre-existing failures, no new ones).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 93ae22240d

tmchow · 2026-03-25T08:19:51Z

ce:review-beta is promoted to stable now.

mvanhorn · 2026-03-25T13:30:09Z

Noted - since review-beta is now promoted to stable, should I retarget this to integrate codex-reviewer into the stable review pipeline instead? Or close this if the integration approach has changed?

tmchow · 2026-03-25T14:04:20Z

> Noted - since review-beta is now promoted to stable, should I retarget this to integrate codex-reviewer into the stable review pipeline instead? Or close this if the integration approach has changed?

Target the stable skill.

mvanhorn · 2026-03-25T16:27:07Z

Got it - will retarget the codex-reviewer integration to the stable ce:review pipeline. I'll rebase onto main, move the agent and references from ce-review-beta/ into ce-review/, and update the orchestration to match the promoted pipeline structure.

Add codex-reviewer as a cross-cutting conditional persona in ce:review. Retargeted from ce-review-beta (now promoted to stable) per maintainer feedback on PR EveryInc#356.

- Add codex-reviewer.md agent with Codex CLI delegation and environment guards (sandbox detection, CLI availability check)
- Register codex in persona-catalog.md and SKILL.md dispatch table
- Add codex-reviewer contract tests to review-skill-contract.test.ts
- Update README.md agent table with codex-reviewer entry

chatgpt-codex-connector · 2026-03-25T23:59:39Z

💡 Codex Review

compound-engineering-plugin/plugins/compound-engineering/skills/ce-review/SKILL.md

Line 167 in ec8ef55

    
           PR_BASE_REF=$(git rev-parse --verify "$PR_BASE_REMOTE_REF" 2>/dev/null || git rev-parse --verify <base> 2>/dev/null || true)

Avoid resolving PR base from unrelated local branches

The PR-mode scope command still falls back to git rev-parse --verify <base> when the actual PR base repository ref cannot be resolved. In fork/backport workflows, a local <base> branch often points at a different remote (for example origin/main), so this path can silently compute BASE against the wrong lineage and produce misleading review diffs instead of failing closed. That contradicts the later requirement to stop when the PR base cannot be resolved and can hide real regressions.
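
The fail-closed behavior this finding asks for can be sketched roughly as below. This is an illustration under assumptions, not the logic in SKILL.md: `resolvePrBase`, `RefLookup`, and the sample refs are hypothetical names, and `RefLookup` merely stands in for a call to `git rev-parse --verify`.

```typescript
// Sketch only. `RefLookup` stands in for `git rev-parse --verify <ref>`:
// it returns a commit SHA, or null when the ref does not exist.
type RefLookup = (ref: string) => string | null;

// Resolve the PR base strictly from the PR's own remote-tracking ref.
function resolvePrBase(prBaseRemoteRef: string, lookup: RefLookup): string {
  const sha = lookup(prBaseRemoteRef);
  if (sha === null) {
    // Fail closed: no fallback to a same-named local branch.
    throw new Error(`cannot resolve PR base ref: ${prBaseRemoteRef}`);
  }
  return sha;
}

// Usage: only a local "main" exists, and it may track a different remote.
const localRefs = new Map<string, string>([["main", "1111111"]]);
const lookupLocal: RefLookup = (ref) => localRefs.get(ref) ?? null;

let failedClosed = false;
try {
  resolvePrBase("upstream/main", lookupLocal);
} catch {
  failedClosed = true; // the review stops instead of diffing against local "main"
}
```

The point of the design is that an unresolved base surfaces as an error the caller must handle, rather than a plausible-looking diff computed against the wrong lineage.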

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f524f46d7f

mvanhorn · 2026-03-27T02:16:04Z

Rebased onto main and addressed the Codex P1 findings in f524f46:

Removed 2>/dev/null error suppression from which codex check
Split the chained git symbolic-ref | sed into separate steps per AGENTS.md shell rules
Added a fail-closed guard for base branch resolution - the agent no longer falls back to git rev-parse --verify against local branch names that may track a different remote

The P2 about partial tool mapping in agent-content.ts is a general utility issue outside this PR's scope - happy to address that separately if you'd like.

@tmchow this is now targeting stable ce:review as requested. Ready for another look.

npwalker · 2026-04-28T03:00:50Z

@mvanhorn I like this idea as I already run ce-review followed by codex:review.

I'm about to try it out locally. Are you using this locally with success?

mvanhorn · 2026-04-28T15:37:40Z

Yeah, been using it locally for a few weeks. Codex review catches different things than ce-review: more script and shell-level findings, fewer architectural callouts. Running both has caught issues I wouldn't have hit with either alone. Holler if anything looks off in the rebase.

Add a conditional review persona that delegates to OpenAI's Codex CLI for an independent second-opinion read on the diff. Codex catches blind spots same-model reviewers share: different model, different reasoning patterns, different recall.

Reapplies PR EveryInc#356 onto current main after the v3 restructure (ce-review -> ce-code-review, agents/<category>/ -> agents/ce-*.agent.md, ce: -> ce-, new subagent template that passes pre-computed diff).

Addresses both Codex findings still active on the prior commit:

- P1 (line 54): "Stop assuming ce:review passes base branch context." The new subagent template passes the pre-computed diff directly; ce-codex-reviewer materializes that diff to a tempfile and feeds it to codex. No base-branch resolution, no git symbolic-ref or rev-parse, no fall-back chain to break on non-default targets.
- P2 (line 123): "Remove cross-persona suppression rule." The agent no longer instructs codex to suppress findings already covered by other personas. Synthesis dedupes centrally in Stage 5; the parallel subagent has no visibility into other reviewers' outputs anyway.

Pieces:

- agents/ce-codex-reviewer.agent.md: new persona, anchored confidence rubric (50/75; 100 never emitted because second-opinion is not direct verification), two-phase environment guard (CODEX_SANDBOX/SESSION_ID recursion check + which codex availability check), pipe-delimited output translation into the structured findings schema.
- skills/ce-code-review/references/persona-catalog.md: codex registered as conditional persona #9. Selected when the diff is non-trivial (>=25 changed non-test lines) or correctness-sensitive (auth, payments, data, parsing, concurrency, external API contracts) AND the user-level codex CLI is on PATH.
- skills/ce-code-review/SKILL.md: codex added to the cross-cutting conditional dispatch table; reviewer count bumped 18 -> 19.
- plugins/compound-engineering/README.md: codex listed in the Review agents table (insertion in sorted order).
- tests/review-skill-contract.test.ts: codex added to the anchored-rubric persona loop, plus four new contract tests guarding (a) frontmatter shape, (b) the no-base-branch-resolution rule with explicit P1 regression callouts, (c) the no-cross-persona-suppression rule with explicit P2 regression callout, (d) catalog/SKILL.md registration, and (e) the two environment guards before any codex invocation.

Verification:

- bun test tests/review-skill-contract.test.ts: 28 pass (was 24)
- bun test full suite: 992 pass, 9 pre-existing failures in detect-project-type.sh unchanged (present on main before this PR)
- bun run release:validate: clean (52 agents, 35 skills, 0 MCP)

No release-owned files touched: plugin.json, marketplace.json, and CHANGELOG.md remain untouched per AGENTS.md rules.
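
The selection rule described for the persona catalog could be expressed as a small predicate. This is a sketch under assumptions: the catalog is prose that an orchestrating agent interprets, so `DiffSummary`, `shouldDispatchCodex`, and the field names are hypothetical, and the reading "(non-trivial OR correctness-sensitive) AND codex on PATH" is one interpretation of the sentence above.

```typescript
// Hypothetical input shape; the field names are illustrative only.
interface DiffSummary {
  changedNonTestLines: number;
  // True when the diff touches auth, payments, data, parsing,
  // concurrency, or external API contracts.
  correctnessSensitive: boolean;
}

// Threshold taken from the commit message: ">=25 changed non-test lines".
const NON_TRIVIAL_THRESHOLD = 25;

function shouldDispatchCodex(diff: DiffSummary, codexOnPath: boolean): boolean {
  const nonTrivial = diff.changedNonTestLines >= NON_TRIVIAL_THRESHOLD;
  // The CLI check gates both branches: without codex there is nothing to dispatch.
  return (nonTrivial || diff.correctnessSensitive) && codexOnPath;
}

// A small auth change still qualifies; a large diff without the CLI does not.
const smallAuthChange = shouldDispatchCodex(
  { changedNonTestLines: 4, correctnessSensitive: true },
  true,
);
const largeDiffNoCli = shouldDispatchCodex(
  { changedNonTestLines: 300, correctnessSensitive: false },
  false,
);
```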

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a8a5ed2673

mvanhorn · 2026-04-28T17:17:03Z

Rebased onto current main and re-applied as a single commit (a8a5ed2). Main moved a lot since the original branch — ce-review → ce-code-review, agents flattened to ce-*.agent.md, ce: → ce-, and the subagent template now passes a pre-computed diff. The re-apply addresses both still-open Codex findings:

P1 (base-branch assumption): the new subagent template passes the diff directly, so ce-codex-reviewer writes it to a tempfile and feeds it to codex exec. No base-branch resolution, no git symbolic-ref / rev-parse fallback chain. A new contract test (tests/review-skill-contract.test.ts) guards the regression.
P2 (cross-persona suppression): dropped. The agent runs as an independent parallel subagent and has no visibility into other reviewers; Stage 5 synthesis dedupes centrally. Also guarded by a contract test.

bun run release:validate clean (52 agents now), full suite has the same 9 pre-existing failures as main. Holler if anything looks off.

Addresses both Codex findings on a8a5ed2 (April 2026):

P1: Emit schema-valid line numbers for file-level findings.

The findings schema requires `line >= 1` (verified in references/findings-schema.json). The previous prompt told codex to emit `LINE=0` for file-level issues, which would have been dropped by the merge validator as malformed before synthesis, silently losing every file-level Codex finding.

Fix: the codex prompt now requires `line` to be a positive integer and adds a `file_level` boolean. When `file_level: true`, codex sets `line: 1` and the agent prepends a "file-level finding (no specific line applies)" string to the evidence array so synthesis and downstream surfaces can still distinguish "line 1 was the issue" from "this is a whole-file concern." The `file_level` signal is preserved in evidence rather than as a separate field because the findings schema doesn't expose a top-level file_level flag, and inventing one would fail the strict-validator path.

P2: Use structured output so evidence cannot break parsing.

The previous pipe-delimited contract dropped any row that wasn't exactly five `|`-separated fields. EVIDENCE is a raw code snippet that can legitimately contain `|` (bitwise OR / union types, shell pipes, markdown tables, regex alternation). When that happened, valid findings disappeared silently.

Fix: switch the codex prompt to NDJSON, one JSON object per line. The agent JSON-parses each line independently; embedded pipes in evidence are no longer a parsing hazard because JSON quoting handles them. Lines that fail to parse are skipped (no retry, no inference).

Test coverage:

- New contract test "uses NDJSON output contract so evidence can carry pipes safely" guards the prompt format and asserts the pipe-delimited shape stays gone.
- New contract test "emits schema-valid line numbers for file-level findings" guards the line=1 + file_level=true convention, asserts the old "0 means file-level" wording is gone, and reads the findings schema to verify minimum=1 still holds (so a future schema relaxation triggers a deliberate test update rather than a silent drift).

Both tests cite their originating Codex finding inline so future regressions get a clear pointer to PR EveryInc#356.

bun test tests/review-skill-contract.test.ts: 30 pass (was 28)
bun run release:validate: clean
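
The file-level convention from the P1 fix might look like the following in code. This is a sketch under assumptions: in the PR the translation is done by the agent following prompt instructions, and `CodexRow`, `Finding`, and `toFinding` are hypothetical names rather than the real findings schema.

```typescript
// Sketch only: CodexRow and Finding are hypothetical shapes, not the real schema.
const FILE_LEVEL_NOTE = "file-level finding (no specific line applies)";

interface CodexRow {
  file: string;
  line: number;
  file_level: boolean;
  evidence: string[];
}

interface Finding {
  file: string;
  line: number;
  evidence: string[];
}

// Returns null for rows that would violate the schema's `line >= 1` rule.
function toFinding(row: CodexRow): Finding | null {
  if (!Number.isInteger(row.line) || row.line < 1) {
    return null;
  }
  // Carry the file-level signal in evidence, since the schema has no such field.
  const evidence = row.file_level
    ? [FILE_LEVEL_NOTE, ...row.evidence]
    : [...row.evidence];
  return { file: row.file, line: row.line, evidence };
}

// A whole-file concern uses line 1 plus the note; the old LINE=0 shape is rejected.
const wholeFile = toFinding({ file: "a.ts", line: 1, file_level: true, evidence: ["no tests"] });
const legacyRow = toFinding({ file: "a.ts", line: 0, file_level: true, evidence: [] });
```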

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: adba28a8f4

chatgpt-codex-connector · 2026-04-28T17:45:45Z

+
+## Step 4: Translate codex output into findings
+
+Parse codex's stdout line-by-line as NDJSON. Skip blank lines. Skip the literal `NO_FINDINGS` token. For each remaining line, attempt to JSON-parse it; if parsing fails, skip the line (do not retry, do not infer).


Treat fully unparsable Codex output as reviewer failure

Step 4 instructs the agent to silently skip every non-JSON line from Codex output, but it never marks the run as degraded when stdout is non-empty and none of it parses. In that case the persona can return an empty findings set that looks like a clean review, even though the cross-model review actually failed to produce structured results (for example if Codex emits prose/refusal text). This masks loss of reviewer coverage and can hide real issues that were expected from this pass.
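
One way to make that failure visible is to have the parser report a degraded state alongside the findings. This is a sketch under assumptions: `parseCodexOutput`, `ParseResult`, and the `degraded` flag are hypothetical, and treating a `NO_FINDINGS` token as a successful run even when other lines fail to parse is a policy choice made here for illustration.

```typescript
interface ParseResult {
  findings: object[];
  skipped: number; // non-blank lines that were not valid JSON objects
  degraded: boolean; // output was non-empty but produced nothing usable
}

function parseCodexOutput(stdout: string): ParseResult {
  const findings: object[] = [];
  let skipped = 0;
  let sawNoFindings = false;

  for (const raw of stdout.split("\n")) {
    const line = raw.trim();
    if (line === "") continue;
    if (line === "NO_FINDINGS") {
      sawNoFindings = true;
      continue;
    }
    try {
      const parsed: unknown = JSON.parse(line);
      if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
        findings.push(parsed);
      } else {
        skipped += 1; // valid JSON, but not a finding object
      }
    } catch {
      skipped += 1; // prose, refusal text, or truncated JSON
    }
  }

  // Empty findings only count as a clean review when codex said so explicitly
  // or emitted nothing at all.
  const degraded = findings.length === 0 && skipped > 0 && !sawNoFindings;
  return { findings, skipped, degraded };
}

// Evidence containing "|" parses fine; a prose-only reply is flagged.
const parsedOk = parseCodexOutput('{"file":"a.ts","line":3,"evidence":["x | y"]}\n');
const parsedProse = parseCodexOutput("I could not review this diff.\n");
```

A caller could then surface `degraded` as a reviewer failure instead of returning an empty findings set that looks like a pass.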

chatgpt-codex-connector · 2026-04-28T17:45:45Z

+
+## Step 5: Output
+
+Return the findings as JSON matching the contract in the subagent template. Honor the artifact-file write if a Run ID is present: write the full analysis to `/tmp/compound-engineering/ce-code-review/{run_id}/ce-codex-reviewer.json` and return the compact form to the parent.


Write artifact using the reviewer-keyed filename

This hardcodes the artifact filename to ce-codex-reviewer.json, but the same file sets the top-level reviewer key to "codex". The review pipeline’s detail-enrichment steps resolve artifact files by reviewer key ({reviewer_name} / {reviewer}.json), so codex findings can lose why_it_matters and evidence lookups when the file is stored under a different name. Use the reviewer-keyed filename convention to keep downstream matching deterministic.
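
A minimal sketch of the reviewer-keyed convention, assuming the pipeline looks artifacts up as `{reviewer}.json` as the comment describes. `REVIEWER_KEY` and `artifactPath` are hypothetical names, and whether the eventual fix renames the file or changes the reviewer key is not settled in this thread.

```typescript
// Hypothetical helper: one constant drives both the reviewer key and the filename,
// so the two cannot drift apart.
const REVIEWER_KEY = "codex";

function artifactPath(runId: string, reviewer: string): string {
  return `/tmp/compound-engineering/ce-code-review/${runId}/${reviewer}.json`;
}

// The writer and any downstream lookup derive the same path from the same key.
const writtenTo = artifactPath("run-1", REVIEWER_KEY);
const lookedUpAt = artifactPath("run-1", REVIEWER_KEY);
```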

chatgpt-codex-connector Bot reviewed Mar 24, 2026

Comment thread plugins/compound-engineering/agents/review/codex-reviewer.md Outdated

Comment thread plugins/compound-engineering/agents/review/codex-reviewer.md Outdated

tmchow force-pushed the feat/compare-review-skills branch from 76bf361 to 70e8729 Compare March 24, 2026 03:02

mvanhorn force-pushed the osc/348-codex-reviewer-review-beta branch from 3f96cf6 to d7a5270 Compare March 24, 2026 17:39

chatgpt-codex-connector Bot reviewed Mar 24, 2026

Comment thread .github/.release-please-manifest.json Outdated

Comment thread plugins/compound-engineering/skills/ce-review-beta/SKILL.md Outdated

mvanhorn force-pushed the osc/348-codex-reviewer-review-beta branch from d7a5270 to 93ae222 Compare March 24, 2026 18:07

chatgpt-codex-connector Bot reviewed Mar 24, 2026

Comment thread src/utils/agent-content.ts Outdated

mvanhorn force-pushed the osc/348-codex-reviewer-review-beta branch from 93ae222 to ec8ef55 Compare March 25, 2026 23:50

mvanhorn changed the title from "feat(review): integrate codex-reviewer into review-beta pipeline" to "feat(review): integrate codex-reviewer into stable review pipeline" Mar 25, 2026

mvanhorn changed the base branch from feat/compare-review-skills to main March 25, 2026 23:50

mvanhorn force-pushed the osc/348-codex-reviewer-review-beta branch from ec8ef55 to f524f46 Compare March 27, 2026 01:54

chatgpt-codex-connector Bot reviewed Mar 27, 2026

Comment thread plugins/compound-engineering/agents/review/codex-reviewer.md Outdated

Comment thread plugins/compound-engineering/agents/review/codex-reviewer.md Outdated

mvanhorn force-pushed the osc/348-codex-reviewer-review-beta branch from f524f46 to a8a5ed2 Compare April 28, 2026 17:11

chatgpt-codex-connector Bot reviewed Apr 28, 2026

Comment thread plugins/compound-engineering/agents/ce-codex-reviewer.agent.md Outdated

Comment thread plugins/compound-engineering/agents/ce-codex-reviewer.agent.md Outdated

chatgpt-codex-connector Bot reviewed Apr 28, 2026
