fix(ce-code-review): replace LFG with best-judgment auto-resolve by tmchow · Pull Request #685 · EveryInc/compound-engineering-plugin

tmchow · 2026-04-25T09:01:45Z

Summary

/ce-code-review had a routing option (formerly "LFG") meant for users who wanted the agent to just go — but it filed tickets for findings the agent could fix. On a 13-finding branch, 8 ended up in the file-tickets bucket; when prompted, the agent could propose concrete fixes for all 8 without external context. This PR makes "go fix things" actually do that, drops the approval gates that ran before any action, and renames the path to "Auto-resolve with best judgment" so the label matches what it does (and stops colliding with the /lfg macro skill in this repo).

What changed

Three structural shifts work together:

Personas commit to a fix when they can defend one. The subagent template now treats suggested_fix as the authoritative signal for "the agent can act on this." Personas propose a defensible fix whenever one is reachable from review context (parallel patterns, framework conventions, the cited code itself). Imperfect information is not grounds for omission — propose the most defensible default and name the assumption. manual findings without suggested_fix are reserved for genuinely handoff-only cases (no defensible code change, or purely organizational actions like legal sign-off).

Option B applies fixes directly; no preview gate. Stage 5b validator pre-pass and bulk-preview approval no longer run on this path. The fixer dispatches immediately on the full pending action set (gated_auto + manual + advisory) and returns a {applied, failed, advisory} partition. The fixer's apply/fail outcome is the validation. Items that fail get a one-line reason (fix did not apply cleanly, evidence no longer matches code at <file:line>, no fix proposed by reviewer). One post-run question fires only when failed is non-empty (file tickets / walk through / ignore), fired before the completion report so the report reflects final state.

Renamed: LFG → best-judgment auto-resolve. User-facing label: Auto-resolve with best judgment — apply per-finding fixes the agent can defend, surface the rest. Walk-through option D mirrors it: Auto-resolve with best judgment on the rest. Internal naming uses "best-judgment path" (not "auto-resolve path") to avoid lexical collision with mode:autofix — the two live at different structural levels (autofix is a top-level mode; best-judgment is a routing option within interactive mode), and naming them similarly would invite model confusion in routing tables and prose.

Other paths are unchanged: walk-through option A inherits the action-derivation rule automatically (per-finding recommendations flip from Defer to Apply when the persona proposed a fix); option C (file tickets) keeps Stage 5b and bulk-preview because filing produces durable external state worth previewing; autofix and headless retain their existing single-pass safe_auto contracts.

Why this matters

On uncommitted local edits — where this path runs — reverting an applied edit is cheaper than closing 8 GitHub issues that should never have been filed. The previous defer-bias inverted that cost calculus by treating ticket-filing as the safer default.

Calibration

Whether personas actually populate suggested_fix more often than before is empirical and won't be answered until real review runs land. The new template wording is strong and explicit about the failure mode, but if runs show under-compliance, references/subagent-template.md is where to tune. The richer payload also surfaces a separate question about whether safe_auto itself is undersized — tracked separately in #686.

Test plan

tests/review-skill-contract.test.ts updated with assertions for the new flow contract: no Stage 5b on the best-judgment path, no bulk-preview gate on best-judgment, post-run failure-handling question structure, heterogeneous fixer queue, per-class failure reason taxonomy. Assertions match against the new labels (Auto-resolve with best judgment, best-judgment routing (option B), (B)), not the old LFG strings.
908/908 tests pass; bun run release:validate clean.

References

Plan: docs/plans/2026-04-25-001-fix-ce-code-review-lfg-defer-bias-plan.md
Follow-up: Calibrate safe_auto vs gated_auto boundary in ce-code-review persona output #686 (calibrate safe_auto vs gated_auto boundary)
Follow-up: Rename LFG path to best-judgment in ce-doc-review (parallel to PR #685) #688 (parallel rename for ce-doc-review's LFG terminology)
Prior plan that introduced Stage 5b: docs/plans/2026-04-21-002-refactor-ce-code-review-precision-and-validation-plan.md
Prior plan that introduced the routing question and bulk-preview: docs/plans/2026-04-17-002-feat-ce-review-interactive-judgment-plan.md

LFG previously routed too many findings to ticket-filing when the agent could propose concrete fixes from review context. Stage 5b validation and bulk-preview approval ran before any action, requiring upfront research the user already opted out of by picking LFG. Push personas to commit to a `suggested_fix` whenever any defensible code change is reachable from the diff -- imperfect information is not grounds for omission. `manual` findings with `suggested_fix` now recommend Apply (was Defer). Drop Stage 5b and bulk-preview on the LFG path; dispatch the fixer immediately on the full pending set. Items that fail to apply or lack a proposed fix surface in a `failed` bucket with a one-line reason; one post-run question (file tickets / walk through / ignore) handles them only when the bucket is non-empty, fired before the completion report so it reflects final state. Walk-through option A inherits the action-derivation change automatically -- per-finding recommendations flip from Defer to Apply when the persona proposed a fix. `LFG the rest` from inside the walk-through dispatches one fixer pass on the union of accumulated Apply set and remaining undecided findings, preserving the "one fixer, consistent tree" contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f1e36e990b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Address PR #685 review feedback: - Step 3 fixer queue contract previously defined behavior for gated_auto/manual WITH suggested_fix and manual WITHOUT, but left gated_auto WITHOUT suggested_fix undefined. Extend the no-fix branch to cover both gated_auto and manual so they route to failed instead of being silently skipped, preserving the apply-or-fail contract. - walkthrough.md still documented LFG-the-rest -> Proceed/Cancel and end-of-loop dispatch semantics from the removed bulk-preview path. Reconcile: scope end-of-walk-through dispatch to the run-to-completion path and explicitly point to the LFG-the-rest bullet for that path's single-dispatch semantics. Remove stale Proceed/Cancel framings and the bulk-preview reference in the no-sink adaptation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 530560a3eb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…ollision The interactive routing option B was labeled "LFG" in user-facing prose and "the LFG path" in skill internals. That name collides with the /lfg macro skill in this repo, creating confusion when an agent reads the ce-code-review skill while also having /lfg in scope. Rename: - User-facing menu (option B): "LFG. Apply the agent's best-judgment action per finding" -> "Auto-resolve with best judgment -- apply per-finding fixes the agent can defend, surface the rest" - User-facing per-finding option D: "LFG the rest" -> "Auto-resolve with best judgment on the rest" - Internal naming: "LFG path" / "LFG routing (option B)" -> "best-judgment path" / "best-judgment routing (option B)" - Routing tables, Step 3 fixer queue prose, completion-report bucket list, and other internal references updated to match - Test assertions updated against the new labels The user-facing label leads with "Auto-resolve" so the action is clear on scan; "with best judgment" qualifies how decisions are made. The internal name uses "best-judgment" (not "auto-resolve") to avoid lexical collision with the existing mode:autofix flag, since the two do different things and live at different structural levels -- autofix is a top-level mode while best-judgment is a routing option within interactive mode. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 33e00c4047

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…dering Address PR #685 review feedback: - walkthrough.md: add a "No suggested_fix (Apply suppressed)" adaptation so the entry-point menu omits Apply for findings without a concrete suggested_fix. Stage 5 step 6b already maps these to a Defer recommendation; this matches the suppression applied during the post-run "Walk through these one at a time" re-entry path so the same handling applies regardless of which entry path the user came in through. - SKILL.md Step 3: extend the homogeneous queue contract with a defensive backstop. If a no-suggested_fix finding ever slips into the walk-through Apply set, the fixer routes it to failed with reason "no fix proposed by reviewer", mirroring the heterogeneous queue's apply-or-fail handling. Autofix and headless callers are unaffected (they only ever process safe_auto items). - SKILL.md Step 3: update the "Best-judgment path is single-pass" paragraph to reference Step 2 option B's two-branch ordering (failed empty -> emit report directly; non-empty -> question first, then execute, then report). Removes a contradiction the prior reconciliation commit (530560a) didn't catch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…re-reason taxonomy Two polish fixes surfaced by interactive eval runs of the best-judgment path: - Sink-omission stem text ("tracker sink", "on this platform") was being copied verbatim from spec `e.g.` examples into user-facing output. The phrasing is jargon-y and incorrectly implies a per-agent-platform constraint when the missing piece is per-project. Replaced the three `e.g.` strings (SKILL.md options C and post-run failure handler, walkthrough.md no-sink adaptation) with description-not-prescription guidance that names what to convey (no issue tracker configured for this checkout) and explicitly bans the bad phrasing so the model doesn't regenerate it. - Failure-reason taxonomy in Step 3 was specified as four canonical strings, but real runs produced richer, finding-specific reasons (e.g., "needs intent confirmation; was the field narrowing deliberate, or do clients still need the full payload?") that read better in the post-run question framing. Reframed each canonical string as "default phrasing when nothing more specific applies" and added a worked example showing the level of context-specific detail to prefer. Routing categories stay fixed. Tests: 908/908 pass. Contract assertions still match the canonical phrases (now serving as defaults). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f20954e2ea

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-26T01:15:50Z


 ### Adaptations

+- **No `suggested_fix` (Apply suppressed):** when the finding has no concrete `suggested_fix` (`gated_auto` or `manual` with `suggested_fix == null`), option A (`Apply`) is **omitted from the menu**. Stage 5 step 6b already maps these to a `Defer` recommendation, so the `(recommended)` marker lands on a still-visible option. The menu shows three options: `Defer` / `Skip` / `Auto-resolve with best judgment on the rest` (and reduces to `Skip` / `Auto-resolve with best judgment on the rest` when combined with the no-sink adaptation). When this combines with the advisory variant, the same suppression is moot because option A is already replaced with `Acknowledge`. This rule mirrors the suppression applied during `SKILL.md` Step 2 Interactive option B's post-run `Walk through these one at a time` re-entry, so the same handling applies regardless of which entry path the user came in through.


Handle no-suggested_fix in N=1/no-sink adaptation

The new No suggested_fix (Apply suppressed) rule says Apply should be omitted, but the menu logic still has a separate N=1 + no sink case that keeps Apply / Skip. When a single remaining finding has no suggested_fix and no tracker sink is available (a reachable post-run failed-item scenario), these adaptations conflict and can re-enable Apply for an item the fixer cannot safely execute, violating the intended apply-or-fail guard.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-26T01:15:50Z

+- **Heterogeneous queue (best-judgment path — interactive option B and walk-through's `Auto-resolve with best judgment on the rest`):** the queue mixes `gated_auto`, `manual`, and `advisory` findings. Each item carries: `autofix_class`, `severity`, `file:line`, `title`, `suggested_fix` (may be null), `why_it_matters`, and `evidence`. The fixer routes each item to one of four buckets — the routing categories are fixed; the failure *reason string* should be specific enough that the post-run question's framing (`N findings could not be auto-resolved...`) reads meaningfully to the user. Use the category's default phrasing below when nothing more specific applies; prefer richer, finding-specific reasons that capture *why this particular item didn't land* (e.g., `needs intent confirmation; was the field narrowing deliberate, or do clients still need the full payload?` is more useful than the generic default).
+  - **`safe_auto` / `gated_auto` / `manual` with `suggested_fix`:** light evidence-match check (verify the cited code at `file:line` still resembles the persona's evidence — concretely: at least one identifier or distinctive token from the evidence appears at the cited location, and the line has not been deleted). If the check passes, attempt to apply the fix. On clean apply, route to `applied`. On fix-application failure (line moved, conflicting edit, syntax issue), route to `failed` with a concrete reason — default phrasing `fix did not apply cleanly: <error>` when no richer description fits.


Define fallback when heterogeneous queue lacks evidence payload

This queue contract now requires why_it_matters and evidence on every best-judgment item and makes routing depend on an evidence-match check, but the overall review flow still allows persona artifact writes to fail and proceed with compact returns only. In that documented write-failure path, Option B has no guaranteed evidence source, so the fixer behavior is undefined (it cannot reliably perform the required evidence gate). Add an explicit fallback for missing artifact detail to keep best-judgment routing deterministic.

Useful? React with 👍 / 👎.

…omment truncation Codex review on PR #695 caught that `related_pr: PR #685 ...` in the calibration writeup parses as just `'PR'` — YAML treats unquoted ' #' as a comment delimiter, so everything from the `#` onward was silently dropped. Any tooling that indexes or renders `related_pr` would lose the linkage. Fix: quote the value. Repo-wide sweep for the same pattern in docs/ frontmatter found one other instance with the same risk (and an additional unquoted-colon issue compounding it): docs/plans/2026-04-16-001-fix-ce-polish-beta-detection-gaps-plan.md. Quoted that one too while in here. Verified: both files now parse correctly via yaml.safe_load. Re-sweep across docs/ frontmatter shows zero remaining ' #' instances. Tests: 910/910 pass. release:validate clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ryInc#685) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector Bot reviewed Apr 25, 2026

View reviewed changes

Comment thread plugins/compound-engineering/skills/ce-code-review/SKILL.md Outdated

Comment thread plugins/compound-engineering/skills/ce-code-review/references/walkthrough.md Outdated

tmchow mentioned this pull request Apr 25, 2026

Calibrate safe_auto vs gated_auto boundary in ce-code-review persona output #686

Closed

chatgpt-codex-connector Bot reviewed Apr 25, 2026

View reviewed changes

Comment thread plugins/compound-engineering/skills/ce-code-review/SKILL.md Outdated

Comment thread plugins/compound-engineering/skills/ce-code-review/SKILL.md Outdated

tmchow mentioned this pull request Apr 25, 2026

Rename LFG path to best-judgment in ce-doc-review (parallel to PR #685) #688

Closed

chatgpt-codex-connector Bot reviewed Apr 25, 2026

View reviewed changes

Comment thread plugins/compound-engineering/skills/ce-code-review/SKILL.md Outdated

tmchow changed the title ~~fix(ce-code-review): drop LFG defer-bias and approval gates~~ fix(ce-code-review): replace LFG with best-judgment auto-resolve Apr 25, 2026

tmchow and others added 2 commits April 25, 2026 15:45

tmchow merged commit 9ba41a1 into main Apr 26, 2026
2 checks passed

github-actions Bot mentioned this pull request Apr 26, 2026

chore: release main #684

Merged

chatgpt-codex-connector Bot reviewed Apr 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(ce-code-review): replace LFG with best-judgment auto-resolve#685

fix(ce-code-review): replace LFG with best-judgment auto-resolve#685
tmchow merged 5 commits intomainfrom
tmchow/lfg-defer-bias

tmchow commented Apr 25, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 26, 2026

Uh oh!

chatgpt-codex-connector Bot Apr 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant


		### Adaptations

		- No `suggested_fix` (Apply suppressed): when the finding has no concrete `suggested_fix` (`gated_auto` or `manual` with `suggested_fix == null`), option A (`Apply`) is omitted from the menu. Stage 5 step 6b already maps these to a `Defer` recommendation, so the `(recommended)` marker lands on a still-visible option. The menu shows three options: `Defer` / `Skip` / `Auto-resolve with best judgment on the rest` (and reduces to `Skip` / `Auto-resolve with best judgment on the rest` when combined with the no-sink adaptation). When this combines with the advisory variant, the same suppression is moot because option A is already replaced with `Acknowledge`. This rule mirrors the suppression applied during `SKILL.md` Step 2 Interactive option B's post-run `Walk through these one at a time` re-entry, so the same handling applies regardless of which entry path the user came in through.

		- Heterogeneous queue (best-judgment path — interactive option B and walk-through's `Auto-resolve with best judgment on the rest`): the queue mixes `gated_auto`, `manual`, and `advisory` findings. Each item carries: `autofix_class`, `severity`, `file:line`, `title`, `suggested_fix` (may be null), `why_it_matters`, and `evidence`. The fixer routes each item to one of four buckets — the routing categories are fixed; the failure reason string should be specific enough that the post-run question's framing (`N findings could not be auto-resolved...`) reads meaningfully to the user. Use the category's default phrasing below when nothing more specific applies; prefer richer, finding-specific reasons that capture why this particular item didn't land (e.g., `needs intent confirmation; was the field narrowing deliberate, or do clients still need the full payload?` is more useful than the generic default).
		- `safe_auto` / `gated_auto` / `manual` with `suggested_fix`: light evidence-match check (verify the cited code at `file:line` still resembles the persona's evidence — concretely: at least one identifier or distinctive token from the evidence appears at the cited location, and the line has not been deleted). If the check passes, attempt to apply the fix. On clean apply, route to `applied`. On fix-application failure (line moved, conflicting edit, syntax issue), route to `failed` with a concrete reason — default phrasing `fix did not apply cleanly: <error>` when no richer description fits.

Conversation

tmchow commented Apr 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changed

Why this matters

Calibration

Test plan

References

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

tmchow commented Apr 25, 2026 •

edited

Loading