feat(review): integrate codex-reviewer into stable review pipeline #356

Open
mvanhorn wants to merge 2 commits into EveryInc:main from mvanhorn:osc/348-codex-reviewer-review-beta
Conversation

@mvanhorn
Contributor

@mvanhorn mvanhorn commented Mar 24, 2026

Summary

Add codex-reviewer as a cross-cutting conditional persona in the stable ce:review pipeline. This agent delegates code review to OpenAI's Codex CLI for cross-model validation, catching blind spots that same-model reviewers share.

Retargeted from ce-review-beta to stable ce:review per @tmchow's feedback - beta was promoted to stable in #371.

Changes

  • plugins/compound-engineering/agents/review/codex-reviewer.md - New agent with environment guards (Codex sandbox detection at codex-reviewer.md:17, CLI availability check at codex-reviewer.md:32), structured findings output matching the findings schema, and conservative confidence calibration (0.65 baseline)
  • plugins/compound-engineering/skills/ce-review/SKILL.md - Register codex-reviewer in the cross-cutting conditional persona dispatch table (line 97)
  • plugins/compound-engineering/skills/ce-review/references/persona-catalog.md - Add codex entry to catalog (line 35)
  • tests/review-skill-contract.test.ts - 3 new tests: agent references stable skill, registered in catalog + SKILL.md, follows structured findings contract
  • plugins/compound-engineering/README.md - Add codex-reviewer to review agent table
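
As a rough illustration of the two environment guards listed above, the sketch below checks for a Codex sandbox marker first and for CLI availability second. It is not the content of codex-reviewer.md: the `GuardResult` type, the `checkEnvironment` name, the injected `isOnPath` lookup, and the default marker list are all assumptions made for the example.

```typescript
// Hypothetical sketch; names and shapes are not taken from codex-reviewer.md.
type GuardResult =
  | { ok: true }
  | { ok: false; reason: "inside-codex" | "cli-missing" };

function checkEnvironment(
  env: Record<string, string | undefined>,
  // Injected PATH lookup (stands in for something like `which codex`).
  isOnPath: (binary: string) => boolean,
  // Assumed marker variable(s) that signal "already running inside Codex".
  sandboxMarkers: string[] = ["CODEX_SANDBOX"],
): GuardResult {
  // Guard 1: refuse to delegate when already inside Codex, to avoid recursion.
  if (sandboxMarkers.some((name) => (env[name] ?? "") !== "")) {
    return { ok: false, reason: "inside-codex" };
  }
  // Guard 2: skip the persona when the CLI is not installed.
  if (!isOnPath("codex")) {
    return { ok: false, reason: "cli-missing" };
  }
  return { ok: true };
}
```

Injecting `isOnPath` keeps the guard a pure function, so a contract test can exercise both failure paths without touching the real environment.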

Context

Testing

All 389 tests pass (bun test), including 3 new codex-reviewer contract tests verifying:

  1. Agent file references stable ce:review skill (no ce:review-beta remnants)
  2. Codex-reviewer registered in both persona-catalog.md and SKILL.md dispatch table
  3. Agent follows structured findings contract (confidence calibration, JSON output, suppression rules)

This contribution was developed with AI assistance (Claude Code).


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3f96cf6279

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread plugins/compound-engineering/agents/review/codex-reviewer.md Outdated
Comment thread plugins/compound-engineering/agents/review/codex-reviewer.md Outdated
@tmchow tmchow force-pushed the feat/compare-review-skills branch from 76bf361 to 70e8729 Compare March 24, 2026 03:02
@tmchow
Collaborator

tmchow commented Mar 24, 2026

#348 was merged to main, so rebase off main please

@mvanhorn mvanhorn force-pushed the osc/348-codex-reviewer-review-beta branch from 3f96cf6 to d7a5270 Compare March 24, 2026 17:39

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d7a5270386

Comment thread .github/.release-please-manifest.json Outdated
Comment thread plugins/compound-engineering/skills/ce-review-beta/SKILL.md Outdated
@mvanhorn mvanhorn force-pushed the osc/348-codex-reviewer-review-beta branch from d7a5270 to 93ae222 Compare March 24, 2026 18:07
@mvanhorn
Contributor Author

Rebased onto main. Squashed the branch to a single commit on top of current main since #348's merge created too many conflicts for a clean commit-by-commit rebase. Tests match main (same 42 pre-existing failures, no new ones).


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 93ae22240d

Comment thread src/utils/agent-content.ts Outdated
@tmchow
Collaborator

tmchow commented Mar 25, 2026

ce:review-beta is promoted to stable now.

@mvanhorn
Contributor Author

Noted - since review-beta is now promoted to stable, should I retarget this to integrate codex-reviewer into the stable review pipeline instead? Or close this if the integration approach has changed?

@tmchow
Collaborator

tmchow commented Mar 25, 2026

> Noted - since review-beta is now promoted to stable, should I retarget this to integrate codex-reviewer into the stable review pipeline instead? Or close this if the integration approach has changed?

Target the stable skill.

@mvanhorn
Contributor Author

Got it - will retarget the codex-reviewer integration to the stable ce:review pipeline. I'll rebase onto main, move the agent and references from ce-review-beta/ into ce-review/, and update the orchestration to match the promoted pipeline structure.

mvanhorn added a commit to mvanhorn/compound-engineering-plugin that referenced this pull request Mar 25, 2026
Add codex-reviewer as a cross-cutting conditional persona in ce:review.
Retargeted from ce-review-beta (now promoted to stable) per maintainer
feedback on PR EveryInc#356.

- Add codex-reviewer.md agent with Codex CLI delegation and environment
  guards (sandbox detection, CLI availability check)
- Register codex in persona-catalog.md and SKILL.md dispatch table
- Add codex-reviewer contract tests to review-skill-contract.test.ts
- Update README.md agent table with codex-reviewer entry
@mvanhorn mvanhorn force-pushed the osc/348-codex-reviewer-review-beta branch from 93ae222 to ec8ef55 Compare March 25, 2026 23:50
@mvanhorn mvanhorn changed the title from "feat(review): integrate codex-reviewer into review-beta pipeline" to "feat(review): integrate codex-reviewer into stable review pipeline" Mar 25, 2026
@mvanhorn mvanhorn changed the base branch from feat/compare-review-skills to main March 25, 2026 23:50
@chatgpt-codex-connector

💡 Codex Review

PR_BASE_REF=$(git rev-parse --verify "$PR_BASE_REMOTE_REF" 2>/dev/null || git rev-parse --verify <base> 2>/dev/null || true)

P1: Avoid resolving PR base from unrelated local branches

The PR-mode scope command still falls back to git rev-parse --verify <base> when the actual PR base repository ref cannot be resolved. In fork/backport workflows, a local <base> branch often points at a different remote (for example origin/main), so this path can silently compute BASE against the wrong lineage and produce misleading review diffs instead of failing closed. That contradicts the later requirement to stop when the PR base cannot be resolved and can hide real regressions.

@mvanhorn mvanhorn force-pushed the osc/348-codex-reviewer-review-beta branch from ec8ef55 to f524f46 Compare March 27, 2026 01:54
mvanhorn added a commit to mvanhorn/compound-engineering-plugin that referenced this pull request Mar 27, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f524f46d7f

Comment thread plugins/compound-engineering/agents/review/codex-reviewer.md Outdated
Comment thread plugins/compound-engineering/agents/review/codex-reviewer.md Outdated
@mvanhorn
Contributor Author

Rebased onto main and addressed the Codex P1 findings in f524f46:

  1. Removed the `2>/dev/null` error suppression from the `which codex` check
  2. Split the chained `git symbolic-ref | sed` into separate steps per AGENTS.md shell rules
  3. Added a fail-closed guard for base branch resolution - the agent no longer falls back to `git rev-parse --verify` against local branch names that may track a different remote
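
The fail-closed guard in item 3 can be sketched as follows. This is an illustration rather than the agent's actual shell steps: `resolvePrBase` is a hypothetical helper, and `verify` is an injected stand-in for a ref lookup such as `git rev-parse --verify <ref>` that returns a commit SHA or `null`.

```typescript
// Hypothetical sketch of fail-closed base resolution.
// `verify` stands in for a ref lookup and returns the SHA, or null when the
// ref does not resolve.
function resolvePrBase(
  prBaseRemoteRef: string,
  verify: (ref: string) => string | null,
): string {
  const sha = verify(prBaseRemoteRef);
  if (sha === null) {
    // Fail closed: do not retry with a same-named local branch, which may
    // track a different remote and produce a diff against the wrong lineage.
    throw new Error(`cannot resolve PR base ref: ${prBaseRemoteRef}`);
  }
  return sha;
}
```

The design choice is that an error is preferable to a plausible-looking but wrong base: a review over a misleading diff can hide real regressions, whereas a hard stop is visible.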

The P2 about partial tool mapping in agent-content.ts is a general utility issue outside this PR's scope - happy to address that separately if you'd like.

@tmchow this is now targeting stable ce:review as requested. Ready for another look.

@npwalker

@mvanhorn I like this idea as I already run ce-review followed by codex:review.

I'm about to try it out locally. Are you using this locally with success?

@mvanhorn
Contributor Author

Yeah, been using it locally for a few weeks. Codex review catches different things than ce-review: more script and shell-level findings, fewer architectural callouts. Running both has caught issues I wouldn't have hit with either alone. Holler if anything looks off in the rebase.

Add a conditional review persona that delegates to OpenAI's Codex CLI
for an independent second-opinion read on the diff. Codex catches
blind spots same-model reviewers share — different model, different
reasoning patterns, different recall.

Reapplies PR EveryInc#356 onto current main after the v3 restructure
(ce-review -> ce-code-review, agents/<category>/ -> agents/ce-*.agent.md,
ce: -> ce-, new subagent template that passes pre-computed diff).

Addresses both Codex findings still active on the prior commit:

- P1 (line 54): "Stop assuming ce:review passes base branch context."
  The new subagent template passes the pre-computed diff directly;
  ce-codex-reviewer materializes that diff to a tempfile and feeds
  it to codex. No base-branch resolution, no git symbolic-ref or
  rev-parse, no fall-back chain to break on non-default targets.

- P2 (line 123): "Remove cross-persona suppression rule." The agent
  no longer instructs codex to suppress findings already covered by
  other personas. Synthesis dedupes centrally in Stage 5; the parallel
  subagent has no visibility into other reviewers' outputs anyway.

Pieces:

- agents/ce-codex-reviewer.agent.md — new persona, anchored confidence
  rubric (50/75; 100 never emitted because second-opinion is not direct
  verification), two-phase environment guard (CODEX_SANDBOX/SESSION_ID
  recursion check + which codex availability check), pipe-delimited
  output translation into the structured findings schema.
- skills/ce-code-review/references/persona-catalog.md — codex registered
  as conditional persona EveryInc#9. Selected when the diff is non-trivial
  (>=25 changed non-test lines) or correctness-sensitive (auth,
  payments, data, parsing, concurrency, external API contracts) AND
  the user-level codex CLI is on PATH.
- skills/ce-code-review/SKILL.md — codex added to the cross-cutting
  conditional dispatch table; reviewer count bumped 18 -> 19.
- plugins/compound-engineering/README.md — codex listed in the Review
  agents table (insertion in sorted order).
- tests/review-skill-contract.test.ts — codex added to the anchored-
  rubric persona loop, plus four new contract tests guarding (a)
  frontmatter shape, (b) the no-base-branch-resolution rule with
  explicit P1 regression callouts, (c) the no-cross-persona-suppression
  rule with explicit P2 regression callout, (d) catalog/SKILL.md
  registration, and (e) the two environment guards before any codex
  invocation.
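
A minimal sketch of the persona-selection rule described in the catalog entry
above, assuming the condition reads as "(non-trivial OR correctness-sensitive)
AND codex on PATH". The `DiffSummary` shape and `shouldDispatchCodex` name are
hypothetical; only the 25-line threshold and the category list come from the
text.

```typescript
// Hypothetical summary of a diff; field names are illustrative only.
interface DiffSummary {
  changedNonTestLines: number;
  // True when the diff touches auth, payments, data, parsing, concurrency,
  // or external API contracts.
  touchesSensitiveArea: boolean;
}

function shouldDispatchCodex(diff: DiffSummary, codexOnPath: boolean): boolean {
  const nonTrivial = diff.changedNonTestLines >= 25;
  // Assumed precedence: the PATH requirement gates both size and sensitivity.
  return codexOnPath && (nonTrivial || diff.touchesSensitiveArea);
}
```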

Verification:

- bun test tests/review-skill-contract.test.ts: 28 pass (was 24)
- bun test full suite: 992 pass, 9 pre-existing failures in
  detect-project-type.sh unchanged (present on main before this PR)
- bun run release:validate: clean (52 agents, 35 skills, 0 MCP)

No release-owned files touched: plugin.json, marketplace.json, and
CHANGELOG.md remain untouched per AGENTS.md rules.
@mvanhorn mvanhorn force-pushed the osc/348-codex-reviewer-review-beta branch from f524f46 to a8a5ed2 Compare April 28, 2026 17:11

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a8a5ed2673

Comment thread plugins/compound-engineering/agents/ce-codex-reviewer.agent.md Outdated
Comment thread plugins/compound-engineering/agents/ce-codex-reviewer.agent.md Outdated
@mvanhorn
Contributor Author

Rebased onto current main and re-applied as a single commit (a8a5ed2). Main moved a lot since the original branch: ce-review -> ce-code-review, agents flattened to ce-*.agent.md, ce: -> ce-, and the subagent template now passes a pre-computed diff. The re-apply addresses both still-open Codex findings:

  • P1 (base-branch assumption): the new subagent template passes the diff directly, so ce-codex-reviewer writes it to a tempfile and feeds it to codex exec. No base-branch resolution, no git symbolic-ref / rev-parse fallback chain. A new contract test (tests/review-skill-contract.test.ts) guards the regression.
  • P2 (cross-persona suppression): dropped. The agent runs as an independent parallel subagent and has no visibility into other reviewers; Stage 5 synthesis dedupes centrally. Also guarded by a contract test.

bun run release:validate clean (52 agents now), full suite has the same 9 pre-existing failures as main. Holler if anything looks off.

Addresses both Codex findings on a8a5ed2 (April 2026):

P1 — Emit schema-valid line numbers for file-level findings.
The findings schema requires `line >= 1` (verified in
references/findings-schema.json). The previous prompt told codex to
emit `LINE=0` for file-level issues, which would have been dropped
by the merge validator as malformed before synthesis — silently
losing every file-level Codex finding.

Fix: the codex prompt now requires `line` to be a positive integer
and adds a `file_level` boolean. When `file_level: true`, codex sets
`line: 1` and the agent prepends a "file-level finding (no specific
line applies)" string to the evidence array so synthesis and
downstream surfaces can still distinguish "line 1 was the issue"
from "this is a whole-file concern." The `file_level` signal is
preserved in evidence rather than as a separate field because the
findings schema doesn't expose a top-level file_level flag, and
inventing one would fail the strict-validator path.
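
The convention above could be sketched roughly as below. The `RawFinding` and
`Finding` shapes and the `normalizeFinding` name are assumptions for
illustration (only `line`, `file_level`, and the evidence array are named in
the text), and this sketch forces `line: 1` itself rather than trusting the
value codex emitted.

```typescript
// Hypothetical shapes; only `line`, `file_level`, and `evidence` come from the text.
interface RawFinding {
  file: string;
  line: number;
  file_level?: boolean;
  evidence: string[];
}

interface Finding {
  file: string;
  line: number;
  evidence: string[];
}

const FILE_LEVEL_NOTE = "file-level finding (no specific line applies)";

function normalizeFinding(raw: RawFinding): Finding | null {
  if (raw.file_level === true) {
    // Keep the schema happy (line >= 1) and carry the signal in evidence.
    return { file: raw.file, line: 1, evidence: [FILE_LEVEL_NOTE, ...raw.evidence] };
  }
  // The schema requires a positive integer line; anything else is rejected.
  if (!Number.isInteger(raw.line) || raw.line < 1) {
    return null;
  }
  return { file: raw.file, line: raw.line, evidence: raw.evidence };
}
```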

P2 — Use structured output so evidence cannot break parsing.
The previous pipe-delimited contract dropped any row that wasn't
exactly five `|`-separated fields. EVIDENCE is a raw code snippet
that can legitimately contain `|` (bitwise OR / union types, shell
pipes, markdown tables, regex alternation). When that happened,
valid findings disappeared silently.

Fix: switch the codex prompt to NDJSON — one JSON object per line.
The agent JSON-parses each line independently; embedded pipes in
evidence are no longer a parsing hazard because JSON quoting handles
them. Lines that fail to parse are skipped (no retry, no inference).
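
To make the parsing hazard concrete, here is a small sketch comparing the two
encodings for an evidence string that contains a pipe. The field values and
their order are invented for the example; the source only states that the old
contract expected exactly five `|`-separated fields.

```typescript
// Evidence snippet containing a bitwise OR, as real code often does.
const evidence = "flags = A | B";

// Old style: five fields joined with "|" (field order here is hypothetical).
const pipeRow = ["src/a.ts", "12", "P2", "title", evidence].join("|");
// Splitting yields six fields instead of five, so the row would be dropped.
const pipeFields = pipeRow.split("|");

// NDJSON style: one JSON object per line; quoting keeps the pipe intact.
const ndjsonRow = JSON.stringify({ file: "src/a.ts", line: 12, evidence });
const parsed = JSON.parse(ndjsonRow) as { file: string; line: number; evidence: string };
```

Here `pipeFields.length` is 6 while `parsed.evidence` round-trips unchanged,
which is the property the NDJSON switch relies on.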

Test coverage:

- New contract test "uses NDJSON output contract so evidence can
  carry pipes safely" guards the prompt format and asserts the
  pipe-delimited shape stays gone.
- New contract test "emits schema-valid line numbers for file-level
  findings" guards the line=1 + file_level=true convention, asserts
  the old "0 means file-level" wording is gone, and reads the
  findings schema to verify minimum=1 still holds (so a future
  schema relaxation triggers a deliberate test update rather than a
  silent drift).

Both tests cite their originating Codex finding inline so future
regressions get a clear pointer to PR EveryInc#356.

bun test tests/review-skill-contract.test.ts: 30 pass (was 28)
bun run release:validate: clean

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: adba28a8f4


## Step 4: Translate codex output into findings

Parse codex's stdout line-by-line as NDJSON. Skip blank lines. Skip the literal `NO_FINDINGS` token. For each remaining line, attempt to JSON-parse it; if parsing fails, skip the line (do not retry, do not infer).

P2: Treat fully unparsable Codex output as reviewer failure

Step 4 instructs the agent to silently skip every non-JSON line from Codex output, but it never marks the run as degraded when stdout is non-empty and none of it parses. In that case the persona can return an empty findings set that looks like a clean review, even though the cross-model review actually failed to produce structured results (for example if Codex emits prose/refusal text). This masks loss of reviewer coverage and can hide real issues that were expected from this pass.
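
One way to surface this case is to have the parser return a status alongside the findings. The sketch below is an assumption about how that could look, not the agent's implementation: `ParseOutcome` and `interpretCodexOutput` are hypothetical names, and treating empty stdout as `ok` is itself a choice the maintainers may want to revisit.

```typescript
// Hypothetical result type: a run is either usable or explicitly degraded.
type ParseOutcome =
  | { status: "ok"; findings: object[] }
  | { status: "degraded"; reason: string };

function interpretCodexOutput(stdout: string): ParseOutcome {
  const lines: string[] = stdout
    .split("\n")
    .map((l: string) => l.trim())
    .filter((l: string) => l !== "");
  const findings: object[] = [];
  let sawNoFindings = false;

  for (const line of lines) {
    if (line === "NO_FINDINGS") {
      sawNoFindings = true; // explicit "clean review" token
      continue;
    }
    try {
      const value: unknown = JSON.parse(line);
      if (typeof value === "object" && value !== null) {
        findings.push(value);
      }
    } catch {
      // Unparsable line: skipped, no retry, no inference.
    }
  }

  // Non-empty output with nothing usable means the review failed, not passed.
  if (lines.length > 0 && findings.length === 0 && !sawNoFindings) {
    return { status: "degraded", reason: "codex output contained no parsable findings" };
  }
  return { status: "ok", findings };
}
```

With this shape, prose or refusal text from Codex yields `degraded`, while a bare `NO_FINDINGS` still counts as a clean review.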



## Step 5: Output

Return the findings as JSON matching the contract in the subagent template. Honor the artifact-file write if a Run ID is present: write the full analysis to `/tmp/compound-engineering/ce-code-review/{run_id}/ce-codex-reviewer.json` and return the compact form to the parent.

P2: Write artifact using the reviewer-keyed filename

This hardcodes the artifact filename to ce-codex-reviewer.json, but the same file sets the top-level reviewer key to "codex". The review pipeline’s detail-enrichment steps resolve artifact files by reviewer key ({reviewer_name} / {reviewer}.json), so codex findings can lose why_it_matters and evidence lookups when the file is stored under a different name. Use the reviewer-keyed filename convention to keep downstream matching deterministic.
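
A minimal sketch of the suggested convention, assuming the artifact directory quoted in Step 5 and the `{reviewer}.json` pattern described above. `artifactPath` is a hypothetical helper, not a function from the pipeline.

```typescript
// Hypothetical helper: derive the artifact path from the same reviewer key
// that appears in the findings JSON, so both always agree.
function artifactPath(runId: string, reviewerKey: string): string {
  return `/tmp/compound-engineering/ce-code-review/${runId}/${reviewerKey}.json`;
}

// With reviewer key "codex", downstream lookups by `{reviewer}.json` would match.
const examplePath = artifactPath("run-123", "codex");
```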

