Skip to content

feat(ce-compound): add headless mode:autofix with depth selector#662

Open
benwilson wants to merge 7 commits intoEveryInc:mainfrom
benwilson:feat/ce-compound-autofix
Open

feat(ce-compound): add headless mode:autofix with depth selector#662
benwilson wants to merge 7 commits intoEveryInc:mainfrom
benwilson:feat/ce-compound-autofix

Conversation

@benwilson
Copy link
Copy Markdown

@benwilson benwilson commented Apr 23, 2026

Summary

Adds a headless affordance to /ce-compound for programmatic callers (post-merge hooks, CI, orchestrator pipelines) via two orthogonal tokens:

  • mode:autofix — headless operation. Suppresses all four interactive prompts (mode selection, session-history opt-in, Discoverability consent, "What's next?").
  • depth:lightweight|full|thorough — optional depth selector, defaults to lightweight.
    • depth:lightweight → single-pass, no subagents, no overlap check.
    • depth:full → Phase 1 parallel research, overlap-update, Phase 3 specialized reviewers. Session history off.
    • depth:thorough → Full path plus Session Historian dispatch.

depth: is parsed only when mode:autofix is present. Invalid depth: values halt before any subagent dispatch with ce-compound failed. Reason: unknown depth:<value>. Valid values: lightweight, full, thorough.

Motivation

ce-compound had no headless affordance. Its Execution Strategy enforced four mandatory blocking prompts using the platform's blocking question tool with explicit "Do NOT pre-select a mode. Do NOT skip this prompt" language. A subprocess invocation like claude -p "/ce-compound ..." or codex exec "Run /ce-compound" --ephemeral had no path to completion: the blocking tool errors in headless sessions, the documented fallback is "present options in chat," and there is no user to respond. Result: hang, partial execution, or undefined degradation depending on the harness.

Sibling skills in the same plugin already ship a mode token (ce-compound-refreshmode:autofix; ce-code-reviewmode:autofix, mode:headless, mode:report-only; ce-doc-reviewmode:headless). This PR brings ce-compound in line.

Output contract (for callers)

Autofix output uses stable first-line strings, a Depth: anchor, and labeled fields. The full contract lives at plugins/compound-engineering/skills/ce-compound/references/autofix-output-parse-contract.md (not loaded at runtime — it is for caller authors).

Anchor Shape Meaning
✓ Documentation complete (mode:autofix) success New doc written
✓ Documentation updated (mode:autofix) success Existing doc updated (depth:full|thorough, overlap high)
✓ No documentation written (mode:autofix) no-op Preconditions unmet
Depth: [lightweight|full|thorough] all Execution path taken
File created: / File updated: success Path to the written doc
Overlap detected: updated Path + matched dimensions of the refreshed doc
Track: success bug or knowledge
Category: success Category under docs/solutions/
Refresh candidate: success Older doc that may be stale given this learning
Discoverability recommendation: success Present when AGENTS.md/CLAUDE.md does not surface docs/solutions/
Reason: no-op / refresh block Which precondition failed, or rationale for the refresh candidate
Context considered: no-op Bulleted summary of what was available when the skill declined to write

Testing

  • bun test tests/compound-support-files.test.ts tests/pipeline-review-contract.test.ts — 51 pass / 0 fail.
  • bun run release:validate — clean.
  • bun test overall — 896 pass / 9 fail. The 9 failures are all in tests/skills/ce-polish-beta-project-type.test.ts and predate this branch.

This is a doc-only skill change; existing contract tests cover the shared support files, which are unmodified.

…ded documentation

ce-compound had no headless affordance. Its Execution Strategy
enforces four mandatory blocking prompts (Full-vs-Lightweight
mode selection, session-history opt-in, Discoverability Check
consent, post-completion "What's next?" menu), each using the
platform's blocking question tool with "Never silently skip"
guarantees. This blocked programmatic callers (post-merge hooks,
CI, orchestrator pipelines) — the blocking question tool errors
in headless sessions and the documented fallback is "present in
chat," with no user to respond.

The inconsistency was visible against sibling skills in the same
plugin: ce-compound-refresh ships mode:autofix, ce-code-review
ships mode:autofix/mode:headless/mode:report-only, ce-doc-review
ships mode:headless. ce-compound was the lone holdout.

This PR adds two mode tokens:

- mode:autofix — autofix + Lightweight execution (default headless).
  Low-cost single-pass capture; overlap drift caught later by
  ce-compound-refresh.
- mode:autofix-full — autofix + Full execution (opt-in thorough).
  Phase 1 research subagents, overlap-update, Phase 3 specialized
  reviewers; session-history defaults to off.

Mutually exclusive with a conflict error mirroring ce-code-review.

All four interactive prompts are guarded in both modes. Discoverability
Check becomes a recommendation in the output rather than an instruction-
file edit. "What's next?" menu is suppressed. No-op output shape
explains which precondition blocked when preconditions aren't met.

Learning doc added at docs/solutions/skill-design/compound-skill-improvements.md
covers the design decisions (explicit opt-in, Lightweight as default
depth, two distinct mode tokens vs nested colons, tokenization rule
for the mode:autofix-full prefix ambiguity).

Tests: compound-specific (compound-support-files, pipeline-review-contract)
51/51 pass. release:validate clean. Full test suite: same 896/9 as main
(the 9 failures are pre-existing in tests/skills/ce-polish-beta-project-type.test.ts,
verified by stashing edits and rerunning).
@benwilson benwilson marked this pull request as ready for review April 23, 2026 23:14
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9afa056dbe

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread plugins/compound-engineering/skills/ce-compound/SKILL.md Outdated
Phase 2.5's multi-candidate branch unconditionally asked the user
whether to run a targeted refresh, which would block unattended
mode:autofix-full runs (mode:autofix already dodged it via the
lightweight rule, but the guard was implicit). Add an explicit
autofix guard: recommend ce-compound-refresh via the Follow-up line
instead of prompting, and enumerate this prompt in the shared
autofix skip list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@tmchow tmchow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together. The goal is right and I want to land this, but the execution is roughly five times the size of the sibling pattern it claims to follow, and it crosses a few of the plugin's own authoring rules in plugins/compound-engineering/AGENTS.md. I've flagged the specifics inline.

Headline asks:

  1. Collapse Mode Detection to the shape used by ce-compound-refresh and ce-code-review. One table, one rules block, state each rule once.
  2. Reconsider the two-token design. Orthogonal tokens (mode:autofix plus depth:full) or reusing mode:headless would sidestep the prefix tokenization rule and remove the parallel Use-when blocks.
  3. Unify the four Autofix output templates into one parameterized block, or extract them to references/. Four near-duplicate blocks loaded on every invocation is exactly the case the AGENTS.md extraction rule was written for.
  4. Cut runtime-neutral rationale: the Use-when bullets, the subagent-safety paragraph, the Note: inside the output template, and the caller-parsing instructions.
  5. Drop or radically shrink docs/solutions/skill-design/compound-skill-improvements.md. ce-compound-refresh did not ship a parallel learning doc when it added mode:autofix and I don't want the precedent.

Happy to iterate. Once the size comes down I expect the rest of the review to move quickly.

Comment thread plugins/compound-engineering/skills/ce-compound/SKILL.md
Comment thread plugins/compound-engineering/skills/ce-compound/SKILL.md Outdated
Comment thread plugins/compound-engineering/skills/ce-compound/SKILL.md Outdated
Comment thread plugins/compound-engineering/skills/ce-compound/SKILL.md Outdated
Comment thread plugins/compound-engineering/skills/ce-compound/SKILL.md
Comment thread plugins/compound-engineering/skills/ce-compound/SKILL.md Outdated
Comment thread docs/solutions/skill-design/compound-skill-improvements.md Outdated
Copy link
Copy Markdown
Collaborator

@tmchow tmchow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added comments.

…KILL.md

Responds to @tmchow's PR 662 review. Replaces `mode:autofix` /
`mode:autofix-full` with orthogonal `mode:autofix` + optional
`depth:lightweight|full|thorough` (default `lightweight`). Collapses
Mode Detection to sibling shape (one table + one rules block, 23 lines).
Consolidates four ~90-line output templates into one parameterized
success template + one short no-op template. Cuts runtime-neutral
rationale flagged in review (per-variant Use-when bullets, subagent-
safety paragraph, Note: in the output template, parser-contract
paragraph). Moves the parser-facing contract to a new reference file
not loaded at runtime. Deletes docs/solutions/skill-design/compound-
skill-improvements.md — it does not meet the docs/solutions/
institutional-memory bar.

BREAKING CHANGE: removes `mode:autofix-full`. Callers must use
`mode:autofix depth:full` (full path, session history off) or
`mode:autofix depth:thorough` (full path, session history on).
No alias or migration shim — the token was introduced in this PR
and has no merged users.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@benwilson
Copy link
Copy Markdown
Author

Thanks for the review @tmchow — all five asks addressed in 1988594:

  1. Mode Detection collapsed (line 34) to 23 lines: one two-row table + depth:<preset> bullets + invalid-depth halt + four-bullet Autofix mode rules. Matches the ce-compound-refresh shape.
  2. Token grammar reworked (line 44) to orthogonal mode:autofix + optional depth:lightweight|full|thorough. No prefix trap, no tokenization rule, no conflicting-mode-flags clause.
  3. Output templates consolidated (line 520) to one parameterized success template + one short no-op template (~60 lines combined, stays in-body — 630 × 20% = 126-line extraction threshold).
  4. Runtime-neutral rationale cut (lines 57, 70, 522, 544): per-variant Use-when bullets gone with their parent subsections, subagent-safety paragraph gone, Note: inside the output template gone, parser-contract paragraph moved to references/autofix-output-parse-contract.md (not loaded at runtime; contract also appears in the refreshed PR body for caller authors). "Print the full block" directive survives in the Autofix mode rules block.
  5. Learning doc deletedcompound-skill-improvements.md is gone with no shrunken successor.

Net: -209 / +91 lines in the commit; SKILL.md dropped from 686 to 630. bun test 896/9 (same 9 pre-existing ce-polish-beta-project-type failures); bun run release:validate clean.

One pre-authorized divergence from your line-44 suggestion (follow-up to #discussion_r3140321365): went with three depth presets (lightweight|full|thorough) rather than a binary depth:full. Rationale: depth:thorough is the only way to expose the session-history opt-in to a headless caller without prompting. If you'd rather this collapse to binary, say the word and I'll fold thorough into full (or drop it entirely). Not planning to defend the three-preset choice past one round.

PR body refreshed to match the new grammar and carries the output contract table.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1988594053

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


### Autofix mode rules

- **Skip all user questions.** Never pause — the mode-selection prompt, Phase 2.5 selective-refresh prompt (multi-candidate case), session-history opt-in, Discoverability Check consent, and "What's next?" menu are all suppressed.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Guard autofix from post-run "What's next?" prompt

Autofix mode is documented here as suppressing all user questions, but the Success Output section still has an unconditional "present the What's next? options ... do not continue ... without the user's selection" instruction. In unattended invocations (/ce-compound mode:autofix ...), that conflicting requirement can still trigger a blocking question after writing docs and hang the caller. Add an explicit autofix guard at the post-output routing step so headless runs always terminate without waiting for input.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a8cf13a.

You were right — the Mode Detection suppression alone wasn't strong enough to override the "present the options ... do not end the turn without the user's selection" imperative sitting at the prompt site. A headless claude -p --permission-mode bypassPermissions "/ce-compound mode:autofix depth:thorough ..." run before the fix showed the agent emitting the full numbered menu with a confused inline disclaimer ("Autofix mode was specified, so I'm stopping here rather than prompting"), which is exactly the conflicting-requirements case you flagged.

Added a seven-word guard at the prompt site: "In autofix mode, skip this step." Same test run after the fix returns no "What's next?" header, no options, no prompt — clean termination.

Context: my earlier U2 refactor (commit 1988594) deleted a longer version of this guard that used to sit below the prompt instruction, judging it redundant with the Autofix mode rules block. That was too aggressive — the rule is load-bearing at the point of action even when stated in Mode Detection.


## Mode Detection

Parse `$ARGUMENTS` for `mode:autofix` and an optional `depth:<value>` token. Strip matched tokens; treat the remainder as the brief context hint.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Restrict depth parsing to autofix mode only

These parsing rules treat any depth:<value> token as control input and pair that with a hard failure for unknown values, even when mode:autofix is absent. That means normal interactive calls can fail if the free-form context includes a depth: token (for example from pasted logs or prose) that is not one of the three presets. To avoid regressing interactive usage, only interpret/validate depth: when mode:autofix is present; otherwise leave it in the context hint.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 3d032e1. Two minimal edits:

  1. Parse instruction now conditional: Parse $ARGUMENTS for mode:autofix. If present, also parse an optional depth:<value> token. Strip matched tokens; the remainder is the context hint. — depth: is only parsed when autofix is present.

  2. Halt rule scoped: In autofix mode, invalid depth: values halt before subagent dispatch and emit ... — invalid-depth error no longer fires outside autofix.

Verified headless: /ce-compound depth:bogus (no mode:autofix) now enters Interactive mode instead of emitting the halt envelope. The autofix-path halt still fires correctly on /ce-compound mode:autofix depth:bogus.

Net change: 2 insertions / 2 deletions — zero line growth, since both fixes fit on the existing single lines.

benwilson and others added 4 commits April 24, 2026 15:54
…ofix

Phase 1 step 4 was written for interactive mode only ("only if the user
opted in"); it never had an autofix branch. In `mode:autofix
depth:thorough` — which the Mode Detection contract promises dispatches
the Session Historian — the executing agent correctly followed the
literal Phase 1 text and skipped dispatch.

Rewrites the step 4 header and lead bullets to explicitly list both
dispatch and skip conditions with their mode triggers, so the rule at
the dispatch site matches the contract in Mode Detection.

Caught by a headless `claude -p` test run against a scratch repo, which
showed "Session Historian — mode:autofix arg implies skip" even though
depth:thorough was passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prior commit (6f409f8) restated the dispatch rule at the Phase 1 step 4
site, which was necessary to fix the skipped-dispatch bug, but included
parenthetical rationale that just repeated what the condition already
said: "(autofix mode — depth:thorough is the one autofix path that
dispatches Session Historian)" and "(the depth:lightweight default and
depth:full both skip Session Historian)". Per AGENTS.md rationale
discipline, rationale that would not change runtime behavior if removed
belongs elsewhere, not in SKILL.md.

Also collapses the paired Dispatch/Skip bullets into one: the Skip
condition was the logical inverse of Dispatch, so two bullets doubled
text without adding runtime-actionable information.

Net: 3 lines of dispatch guidance down to 1; same runtime effect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Success Output section still carried an unconditional "present the
'What's next?' options ... Do not continue the workflow or end the turn
without the user's selection" instruction at the prompt site. The
Autofix mode rules block in Mode Detection lists the "What's next?"
menu as suppressed, but that was too weak to override the direct
imperative next to the prompt template, and a headless test run
confirmed agents were emitting the menu anyway.

Adds a four-word guard at the prompt site: "In autofix mode, skip this
step." Load-bearing, no rationale, at the exact location where the
drift was happening.

Addresses EveryInc#662 (comment)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prior text treated any depth:<value> token as control input regardless
of mode, and the invalid-depth halt rule applied unconditionally. An
interactive call whose free-form context happened to include a depth:
token (pasted logs, prose, error messages) would hard-fail with an
autofix-shaped error envelope even though the user never opted into
autofix routing.

Scopes both parsing and the halt rule to mode:autofix being present.
When mode:autofix is absent, depth: is not parsed, not stripped, not
validated — it stays in the context hint.

Verified headless: /ce-compound depth:bogus (no mode:autofix) now
enters Interactive mode instead of emitting the halt envelope.

Addresses EveryInc#662 (comment)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@benwilson
Copy link
Copy Markdown
Author

Follow-ups since 1988594:

  • 6f409f8b — dispatch Session Historian on depth:thorough in autofix. Caught by headless test: Phase 1 step 4 header still said "only if the user opted in," so the autofix dispatch branch never fired even though Mode Detection promised it would.
  • 77d9de66 — tighten the Session Historian dispatch rule per AGENTS.md rationale discipline. Collapses parenthetical restatements 6f409f8 had introduced.
  • a8cf13a4 — guard the "What's next?" prompt at the prompt site. Codex P1 #discussion_r3140572986. Mode Detection suppression alone was too weak to override the direct "present the options" imperative nearby.
  • 3d032e1d — scope depth: parsing and validation to autofix mode. Codex P2 #discussion_r3140572989. Interactive calls whose context happened to contain depth:... no longer hit the invalid-depth halt.

Known drift worth flagging before re-review: in headless claude -p --permission-mode bypassPermissions "/ce-compound mode:autofix depth:full|thorough ..." runs, agents reliably emit ✓ Documentation complete and File created: / File updated: anchors, but frequently drop the (mode:autofix) suffix, Depth:, Track:/Category:, and the structured Phase 1 subagents: / Phase 3 specialized reviewers: blocks — synthesizing their own summary with improvised section names instead. I tested six SKILL.md-text interventions (stronger imperatives, extraction to references/, explicit tool-call directives, structural marker changes) and none produced reliable contract compliance. Looks like a pattern that doesn't yield to modest text changes — a reliable fix would likely need subagent dispatch for the output step or a structured output format (YAML/JSON), both beyond this PR's review scope. Downstream parsers keying on the reliable anchors (first-line string + File created: / File updated:) will work; those depending on the unreliable anchors won't. Flagging so it isn't a surprise if you test headlessly — happy to address in a follow-up PR if it matters.

@benwilson benwilson requested a review from tmchow April 24, 2026 23:56
@tmchow tmchow changed the title feat(ce-compound): add mode:autofix and mode:autofix-full for unattended documentation feat(ce-compound): add headless mode:autofix with depth selector Apr 28, 2026
Copy link
Copy Markdown
Collaborator

@tmchow tmchow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benwilson I updated the PR description to be better as your agent was weirdly making it into a changelog which wasn't necessary.

I'd like to change the mode values to lightweight, standard and deep to match what we do in other skills (vs lightweight, full, and thorough)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants