
fix(ce-code-review): replace LFG with best-judgment auto-resolve#685

Merged
tmchow merged 5 commits into main from tmchow/lfg-defer-bias on Apr 26, 2026

Conversation

@tmchow (Collaborator) commented Apr 25, 2026

Summary

/ce-code-review had a routing option (formerly "LFG") meant for users who wanted the agent to just go — but it filed tickets for findings the agent could fix. On a 13-finding branch, 8 ended up in the file-tickets bucket; when prompted, the agent could propose concrete fixes for all 8 without external context. This PR makes "go fix things" actually do that, drops the approval gates that ran before any action, and renames the path to "Auto-resolve with best judgment" so the label matches what it does (and stops colliding with the /lfg macro skill in this repo).

What changed

Three structural shifts work together:

Personas commit to a fix when they can defend one. The subagent template now treats suggested_fix as the authoritative signal for "the agent can act on this." Personas propose a defensible fix whenever one is reachable from review context (parallel patterns, framework conventions, the cited code itself). Imperfect information is not grounds for omission — propose the most defensible default and name the assumption. manual findings without suggested_fix are reserved for genuinely handoff-only cases (no defensible code change, or purely organizational actions like legal sign-off).

Option B applies fixes directly; no preview gate. Stage 5b validator pre-pass and bulk-preview approval no longer run on this path. The fixer dispatches immediately on the full pending action set (gated_auto + manual + advisory) and returns a {applied, failed, advisory} partition. The fixer's apply/fail outcome is the validation. Items that fail get a one-line reason (fix did not apply cleanly, evidence no longer matches code at <file:line>, no fix proposed by reviewer). One post-run question fires only when failed is non-empty (file tickets / walk through / ignore), and it is asked before the completion report so the report reflects final state.

Renamed: LFG → best-judgment auto-resolve. User-facing label: Auto-resolve with best judgment — apply per-finding fixes the agent can defend, surface the rest. Walk-through option D mirrors it: Auto-resolve with best judgment on the rest. Internal naming uses "best-judgment path" (not "auto-resolve path") to avoid lexical collision with mode:autofix — the two live at different structural levels (autofix is a top-level mode; best-judgment is a routing option within interactive mode), and naming them similarly would invite model confusion in routing tables and prose.

Other paths are unchanged: walk-through option A inherits the action-derivation rule automatically (per-finding recommendations flip from Defer to Apply when the persona proposed a fix); option C (file tickets) keeps Stage 5b and bulk-preview because filing produces durable external state worth previewing; autofix and headless retain their existing single-pass safe_auto contracts.

Why this matters

On uncommitted local edits — where this path runs — reverting an applied edit is cheaper than closing 8 GitHub issues that should never have been filed. The previous defer-bias inverted that cost calculus by treating ticket-filing as the safer default.

Calibration

Whether personas actually populate suggested_fix more often than before is empirical and won't be answered until real review runs land. The new template wording is strong and explicit about the failure mode, but if runs show under-compliance, references/subagent-template.md is where to tune. The richer payload also surfaces a separate question about whether safe_auto itself is undersized — tracked separately in #686.

Test plan

  • tests/review-skill-contract.test.ts updated with assertions for the new flow contract: no Stage 5b on the best-judgment path, no bulk-preview gate on best-judgment, post-run failure-handling question structure, heterogeneous fixer queue, per-class failure reason taxonomy. Assertions match against the new labels (Auto-resolve with best judgment, best-judgment routing (option B), (B)), not the old LFG strings.
  • 908/908 tests pass; bun run release:validate clean.

References


Compound Engineering
Claude Code

LFG previously routed too many findings to ticket-filing when the
agent could propose concrete fixes from review context. Stage 5b
validation and bulk-preview approval ran before any action, requiring
upfront research the user already opted out of by picking LFG.

Push personas to commit to a `suggested_fix` whenever any defensible
code change is reachable from the diff -- imperfect information is
not grounds for omission. `manual` findings with `suggested_fix` now
recommend Apply (was Defer). Drop Stage 5b and bulk-preview on the
LFG path; dispatch the fixer immediately on the full pending set.
Items that fail to apply or lack a proposed fix surface in a `failed`
bucket with a one-line reason; one post-run question (file tickets /
walk through / ignore) handles them only when the bucket is
non-empty, fired before the completion report so it reflects final
state.

Walk-through option A inherits the action-derivation change
automatically -- per-finding recommendations flip from Defer to Apply
when the persona proposed a fix. `LFG the rest` from inside the
walk-through dispatches one fixer pass on the union of accumulated
Apply set and remaining undecided findings, preserving the
"one fixer, consistent tree" contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
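The single-dispatch semantics of "LFG the rest" described above might be modeled as one fixer pass over a set union (a sketch under assumed names, not the skill's actual API):

```python
def dispatch_rest(apply_set: frozenset, undecided: frozenset, run_fixer):
    # One fixer invocation over the union preserves the
    # "one fixer, consistent tree" contract: all edits land in a single pass.
    queue = apply_set | undecided
    return run_fixer(queue)
```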

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f1e36e990b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread plugins/compound-engineering/skills/ce-code-review/SKILL.md Outdated
Comment thread plugins/compound-engineering/skills/ce-code-review/references/walkthrough.md Outdated
Address PR #685 review feedback:

- Step 3 fixer queue contract previously defined behavior for
  gated_auto/manual WITH suggested_fix and manual WITHOUT, but left
  gated_auto WITHOUT suggested_fix undefined. Extend the no-fix branch
  to cover both gated_auto and manual so they route to failed instead
  of being silently skipped, preserving the apply-or-fail contract.

- walkthrough.md still documented LFG-the-rest -> Proceed/Cancel and
  end-of-loop dispatch semantics from the removed bulk-preview path.
  Reconcile: scope end-of-walk-through dispatch to the run-to-completion
  path and explicitly point to the LFG-the-rest bullet for that path's
  single-dispatch semantics. Remove stale Proceed/Cancel framings and
  the bulk-preview reference in the no-sink adaptation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 530560a3eb


Comment thread plugins/compound-engineering/skills/ce-code-review/SKILL.md Outdated
Comment thread plugins/compound-engineering/skills/ce-code-review/SKILL.md Outdated
…ollision

The interactive routing option B was labeled "LFG" in user-facing prose
and "the LFG path" in skill internals. That name collides with the /lfg
macro skill in this repo, creating confusion when an agent reads the
ce-code-review skill while also having /lfg in scope.

Rename:

- User-facing menu (option B): "LFG. Apply the agent's best-judgment
  action per finding" -> "Auto-resolve with best judgment -- apply
  per-finding fixes the agent can defend, surface the rest"
- User-facing per-finding option D: "LFG the rest" -> "Auto-resolve
  with best judgment on the rest"
- Internal naming: "LFG path" / "LFG routing (option B)" ->
  "best-judgment path" / "best-judgment routing (option B)"
- Routing tables, Step 3 fixer queue prose, completion-report bucket
  list, and other internal references updated to match
- Test assertions updated against the new labels

The user-facing label leads with "Auto-resolve" so the action is clear
on scan; "with best judgment" qualifies how decisions are made. The
internal name uses "best-judgment" (not "auto-resolve") to avoid
lexical collision with the existing mode:autofix flag, since the two
do different things and live at different structural levels --
autofix is a top-level mode while best-judgment is a routing option
within interactive mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 33e00c4047


Comment thread plugins/compound-engineering/skills/ce-code-review/SKILL.md Outdated
@tmchow changed the title from "fix(ce-code-review): drop LFG defer-bias and approval gates" to "fix(ce-code-review): replace LFG with best-judgment auto-resolve" on Apr 25, 2026
tmchow and others added 2 commits April 25, 2026 15:45
…dering

Address PR #685 review feedback:

- walkthrough.md: add a "No suggested_fix (Apply suppressed)" adaptation
  so the entry-point menu omits Apply for findings without a concrete
  suggested_fix. Stage 5 step 6b already maps these to a Defer
  recommendation; this matches the suppression applied during the
  post-run "Walk through these one at a time" re-entry path so the
  same handling applies regardless of which entry path the user came
  in through.
- SKILL.md Step 3: extend the homogeneous queue contract with a
  defensive backstop. If a no-suggested_fix finding ever slips into
  the walk-through Apply set, the fixer routes it to failed with
  reason "no fix proposed by reviewer", mirroring the heterogeneous
  queue's apply-or-fail handling. Autofix and headless callers are
  unaffected (they only ever process safe_auto items).
- SKILL.md Step 3: update the "Best-judgment path is single-pass"
  paragraph to reference Step 2 option B's two-branch ordering
  (failed empty -> emit report directly; non-empty -> question first,
  then execute, then report). Removes a contradiction the prior
  reconciliation commit (530560a) didn't catch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…re-reason taxonomy

Two polish fixes surfaced by interactive eval runs of the best-judgment path:

- Sink-omission stem text ("tracker sink", "on this platform") was being copied
  verbatim from spec `e.g.` examples into user-facing output. The phrasing is
  jargon-y and incorrectly implies a per-agent-platform constraint when the
  missing piece is per-project. Replaced the three `e.g.` strings (SKILL.md
  options C and post-run failure handler, walkthrough.md no-sink adaptation)
  with description-not-prescription guidance that names what to convey
  (no issue tracker configured for this checkout) and explicitly bans the
  bad phrasing so the model doesn't regenerate it.

- Failure-reason taxonomy in Step 3 was specified as four canonical strings,
  but real runs produced richer, finding-specific reasons (e.g., "needs intent
  confirmation; was the field narrowing deliberate, or do clients still need
  the full payload?") that read better in the post-run question framing.
  Reframed each canonical string as "default phrasing when nothing more
  specific applies" and added a worked example showing the level of
  context-specific detail to prefer. Routing categories stay fixed.

Tests: 908/908 pass. Contract assertions still match the canonical phrases
(now serving as defaults).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tmchow tmchow merged commit 9ba41a1 into main Apr 26, 2026
2 checks passed
@github-actions github-actions Bot mentioned this pull request Apr 26, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f20954e2ea



### Adaptations

- **No `suggested_fix` (Apply suppressed):** when the finding has no concrete `suggested_fix` (`gated_auto` or `manual` with `suggested_fix == null`), option A (`Apply`) is **omitted from the menu**. Stage 5 step 6b already maps these to a `Defer` recommendation, so the `(recommended)` marker lands on a still-visible option. The menu shows three options: `Defer` / `Skip` / `Auto-resolve with best judgment on the rest` (and reduces to `Skip` / `Auto-resolve with best judgment on the rest` when combined with the no-sink adaptation). When this combines with the advisory variant, the same suppression is moot because option A is already replaced with `Acknowledge`. This rule mirrors the suppression applied during `SKILL.md` Step 2 Interactive option B's post-run `Walk through these one at a time` re-entry, so the same handling applies regardless of which entry path the user came in through.
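The menu derivation in this adaptation can be sketched as (illustrative only; the function and `has_sink` flag are assumptions, the option labels are quoted from this PR):

```python
def walkthrough_menu(finding: dict, has_sink: bool = True) -> list[str]:
    menu = []
    if finding.get("suggested_fix"):  # Apply suppressed without a concrete fix
        menu.append("Apply")
    if has_sink:                      # Defer needs an issue tracker to land in
        menu.append("Defer")
    menu += ["Skip", "Auto-resolve with best judgment on the rest"]
    return menu
```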

P2: Handle no-suggested_fix in N=1/no-sink adaptation

The new No suggested_fix (Apply suppressed) rule says Apply should be omitted, but the menu logic still has a separate N=1 + no sink case that keeps Apply / Skip. When a single remaining finding has no suggested_fix and no tracker sink is available (a reachable post-run failed-item scenario), these adaptations conflict and can re-enable Apply for an item the fixer cannot safely execute, violating the intended apply-or-fail guard.


Comment on lines +783 to +784
- **Heterogeneous queue (best-judgment path — interactive option B and walk-through's `Auto-resolve with best judgment on the rest`):** the queue mixes `gated_auto`, `manual`, and `advisory` findings. Each item carries: `autofix_class`, `severity`, `file:line`, `title`, `suggested_fix` (may be null), `why_it_matters`, and `evidence`. The fixer routes each item to one of four buckets — the routing categories are fixed; the failure *reason string* should be specific enough that the post-run question's framing (`N findings could not be auto-resolved...`) reads meaningfully to the user. Use the category's default phrasing below when nothing more specific applies; prefer richer, finding-specific reasons that capture *why this particular item didn't land* (e.g., `needs intent confirmation; was the field narrowing deliberate, or do clients still need the full payload?` is more useful than the generic default).
- **`safe_auto` / `gated_auto` / `manual` with `suggested_fix`:** light evidence-match check (verify the cited code at `file:line` still resembles the persona's evidence — concretely: at least one identifier or distinctive token from the evidence appears at the cited location, and the line has not been deleted). If the check passes, attempt to apply the fix. On clean apply, route to `applied`. On fix-application failure (line moved, conflicting edit, syntax issue), route to `failed` with a concrete reason — default phrasing `fix did not apply cleanly: <error>` when no richer description fits.
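The "light evidence-match check" described above might be implemented as a token-overlap test: pass if the cited line still exists and shares at least one identifier-like token with the persona's evidence. A minimal sketch (function name and token heuristic are assumptions):

```python
import re

IDENT = r"[A-Za-z_][A-Za-z0-9_]*"

def evidence_matches(file_lines: list[str], line_no: int, evidence: str) -> bool:
    if not (1 <= line_no <= len(file_lines)):
        return False  # line deleted or file shrank
    evidence_tokens = set(re.findall(IDENT, evidence))
    line_tokens = set(re.findall(IDENT, file_lines[line_no - 1]))
    return bool(evidence_tokens & line_tokens)
```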

P2: Define fallback when heterogeneous queue lacks evidence payload

This queue contract now requires why_it_matters and evidence on every best-judgment item and makes routing depend on an evidence-match check, but the overall review flow still allows persona artifact writes to fail and proceed with compact returns only. In that documented write-failure path, Option B has no guaranteed evidence source, so the fixer behavior is undefined (it cannot reliably perform the required evidence gate). Add an explicit fallback for missing artifact detail to keep best-judgment routing deterministic.


tmchow added a commit that referenced this pull request Apr 26, 2026
…omment truncation

Codex review on PR #695 caught that `related_pr: PR #685 ...` in the
calibration writeup parses as just `'PR'` — YAML treats unquoted ' #' as
a comment delimiter, so everything from the `#` onward was silently
dropped. Any tooling that indexes or renders `related_pr` would lose
the linkage.

Fix: quote the value. Repo-wide sweep for the same pattern in docs/
frontmatter found one other instance with the same risk (and an
additional unquoted-colon issue compounding it):
docs/plans/2026-04-16-001-fix-ce-polish-beta-detection-gaps-plan.md.
Quoted that one too while in here.

Verified: both files now parse correctly via yaml.safe_load. Re-sweep
across docs/ frontmatter shows zero remaining ' #' instances.

Tests: 910/910 pass. release:validate clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
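The failure mode the sweep caught is easy to illustrate: in YAML, an unquoted space followed by `#` starts a comment, truncating the plain scalar. A before/after sketch of the frontmatter field (the value text here is illustrative):

```yaml
# Broken: value parses as just "PR"; ' #685 ...' becomes a YAML comment
related_pr: PR #685 best-judgment auto-resolve

# Fixed: quoting keeps the full value intact
related_pr: "PR #685 best-judgment auto-resolve"
```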
michaelvolz pushed a commit to michaelvolz/compound-engineering-plugin-windows-version that referenced this pull request Apr 28, 2026
…ryInc#685)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>