feat(ce-polish-beta): human-in-the-loop polish phase between /ce:review and merge#568
feat(ce-polish-beta): human-in-the-loop polish phase between /ce:review and merge#568kieranklaassen merged 7 commits intomainfrom
Conversation
Adds /ce:polish-beta — the polish phase that runs AFTER /ce:review but BEFORE merge. Starts a dev server from .claude/launch.json, generates a user-testable checklist, lets the user mark items for fixes, and dispatches polish sub-agents in parallel. Emits stacked-PR seeds for oversized items and a replan seed when the whole batch is too big to polish as one unit. Five phases: input triage, branch/PR acquisition, entry gate, dev-server lifecycle, checklist+size-gate+dispatch, envelope. Headless mode mirrors ce:review's output envelope shape and terminates with `Polish complete`. Cross-cutting changes: - ce:review now writes metadata.json alongside findings so polish can verify the artifact matches the current branch and HEAD. - Tests pin the intentional resolve-base.sh duplication between ce-review and ce-polish-beta (self-contained skill rule). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4c8cc7cb97
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| # Run git diff to get the file list. Status `D` (deleted) files are still | ||
| # polish surfaces -- their removal is part of what the user tests. | ||
| FILES=$(git diff --name-only "${BASE_REF}...HEAD" 2>/dev/null || true) |
There was a problem hiding this comment.
Stop treating diff failures as empty polish scope
This swallows git diff errors and then interprets the empty result as [], which makes polish think there is nothing to do. If BASE_REF is unresolved in the local checkout (for example, only origin/main exists but BASE_REF is main), the script exits successfully and Phase 4 can skip polish entirely with a false "no diff" outcome. Fail fast on diff errors instead of converting them into an empty file list.
Useful? React with 👍 / 👎.
| MATCH=$(jq -c --arg name "$REQUESTED_NAME" '.configurations[] | select(.name == $name)' "$LAUNCH_PATH") | ||
| if [ -z "$MATCH" ]; then |
There was a problem hiding this comment.
Detect duplicate launch config name matches
When launch.json contains duplicate configuration names, this query returns multiple objects and the script prints them as multiple JSON lines, even though callers expect one selected config object. In that case downstream parsing/selection is ambiguous and can launch the wrong runtime command. The script should treat duplicate-name matches as an explicit error sentinel (or require uniqueness) rather than returning multi-object output.
Useful? React with 👍 / 👎.
|
Awesome. Love this direction per our slack convos. Some feedback and thoughts: Web server detection: These all fall through to unknown (prompt) for:
For the stated purpose (polish a PR’s UX), at minimum Nuxt/SvelteKit/Remix/Astro feel like table stakes alongside Next and Vite? Monorepo blind spot?
The cwd escape hatch in launch.json only helps users who already know to author one. Perhaps consider a shallow Package manager detection I think this is documented but not implemented? Next/Vite stubs will always be written as Perhaps either add a Port cascade heuristics are lossy
|
Address PR #568 feedback from @tmchow: 1. Framework coverage: add Nuxt, Astro, Remix, SvelteKit signatures to detect-project-type.sh with recipe reference files for each. 2. Monorepo detection: when root detection finds zero matches, probe subdirectories up to depth 3 for framework signatures. Output grammar extended with type@dir and multiple:type@dir,type@dir forms. 3. Package-manager detection: new resolve-package-manager.sh script with lockfile priority cascade (pnpm > yarn > bun > npm). 4. Port resolution: new resolve-port.sh with 8-probe cascade replacing the inline bash in SKILL.md. Adds framework config parsing, .env quote/comment stripping, and removes unreliable AGENTS.md grep. Includes review-autofix: empty-string guard on framework config port validation (P0), compact -p3000 Procfile syntax support (P1). 128 tests pass (72 new across 3 test files). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: aba8b9da56
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| exit 0 | ||
| fi | ||
|
|
||
| CONFIG_COUNT=$(jq '(.configurations // []) | length' "$LAUNCH_PATH") |
There was a problem hiding this comment.
Validate launch.json configurations is an array
CONFIG_COUNT uses (.configurations // []) | length, which treats non-array values as valid counts. When configurations is mistakenly an object, the script enters the multi-config branch and jq '[.configurations[].name]' errors, so callers receive __MULTIPLE_CONFIGS__ without the required second-line JSON array. That breaks the documented output contract and prevents the selection flow from handling malformed-but-common launch.json shapes gracefully.
Useful? React with 👍 / 👎.
|
|
||
| # Views / components / templates | ||
| case "$path" in | ||
| app/views/*|app/components/*|app/javascript/components/*|src/components/*|components/*|pages/*|src/pages/*|src/app/*|app/*/page.*|app/*/layout.*) |
There was a problem hiding this comment.
Classify Next.js app router root pages as view surface
The view matcher only includes app/*/page.* and app/*/layout.*, which misses common Next.js App Router paths like app/page.tsx (and deeper nested routes such as app/blog/[slug]/page.tsx). Those files are currently labeled other, so polish defaults can treat core UI pages as non-dispatchable instead of view work, causing user-facing polish items to be under-prioritized by the generated checklist.
Useful? React with 👍 / 👎.
…ding - Combine two sed forks into one in parse_env_port (3 forks -> 2) - Eliminate double-grep on framework config files (read file once) - Remove WHAT-narrating comments, keep only the WHY comment - Fix Unicode arrows in dev-server-next.md to ASCII (AGENTS.md convention) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 528f2df05b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| --type) | ||
| PROJ_TYPE="${2:-}" | ||
| shift 2 |
There was a problem hiding this comment.
Reject missing value for --type flag
The argument parser assigns PROJ_TYPE="${2:-}" and then unconditionally runs shift 2; if the caller passes --type without a value, shift fails and positional parameters are not consumed, so the while [ $# -gt 0 ] loop never terminates. In practice bash scripts/resolve-port.sh --type hangs instead of returning a usage error, which can deadlock polish automation when flags are malformed.
Useful? React with 👍 / 👎.
| compose_file="$PROJECT_ROOT/docker-compose.yml" | ||
| if [ -f "$compose_file" ]; then | ||
| # Simple line-anchored grep for port mappings: - "NNNN:NNNN" or - NNNN:NNNN | ||
| compose_port=$(grep -Eo '"[0-9]+:[0-9]+"' "$compose_file" 2>/dev/null | head -1 | grep -Eo '[0-9]+' | head -1) |
There was a problem hiding this comment.
Parse unquoted docker-compose port mappings
This probe only matches quoted mappings ("3000:3000"), but many docker-compose.yml files use the standard unquoted form (- 3000:3000). When that common shape is present, the probe misses and the script falls through to a default port, causing polish to target the wrong dev-server URL/port despite the compose file explicitly declaring one.
Useful? React with 👍 / 👎.
| done <<< "$FILES_ARG" | ||
|
|
||
| if [ ${#FILES_ARRAY[@]} -gt 0 ]; then | ||
| NUMSTAT=$(git diff --numstat "${BASE_REF}...HEAD" -- "${FILES_ARRAY[@]}" 2>/dev/null || true) |
There was a problem hiding this comment.
Propagate git diff failures in oversized classifier
The classifier swallows git diff --numstat errors with || true, so an unresolved base ref produces empty NUMSTAT and leaves diff_lines at 0. That incorrectly biases items toward manageable and can suppress stacked/replan gating; with an invalid base (for example no-such-ref) the script still emits a successful manageable classification instead of surfacing a failure.
Useful? React with 👍 / 👎.
Strip Phase 2 (entry gate) entirely: fork-PR trust boundary, review artifact SHA binding, verdict check, CI status check, and merge-base offer. These belong in a deploy/merge gate, not a polish skill. Polish is "can this be better?" — you shouldn't need a green CI pass or a review artifact to look at your feature in a browser. Removes trust-fork:1 and accept-stale-review:1 arguments. Renumbers phases: 0 (input) -> 1 (branch) -> 2 (dev server) -> 3 (checklist) -> 4 (envelope). 866 -> 753 lines (-113 lines of gate logic). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Strip 90% of the skill: checklist generation, size gates, stacked-PR seeds, replan escalation, dispatch matrices, elaborate envelopes, headless mode ceremony. What's left: start the dev server, open the browser, iterate with the user. 753 -> 82 lines. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4a0e32380b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| pkg_file="$PROJECT_ROOT/package.json" | ||
| if [ -f "$pkg_file" ]; then | ||
| # Look for --port or -p in dev/start scripts | ||
| pkg_port=$(grep -Eo '(-p[= ]+|--port[= ]+)[0-9]+' "$pkg_file" 2>/dev/null | head -1 | grep -Eo '[0-9]+') |
There was a problem hiding this comment.
Restrict package.json port probe to dev/start scripts
This probe scans the entire package.json text and returns the first --port/-p token it finds, even when it comes from an unrelated script (for example e2e or test). In that case resolve-port.sh resolves a non-dev port and polish probes/opens the wrong URL, which can make the dev-server startup flow fail despite a valid scripts.dev. Parse only scripts.dev/scripts.start (e.g., with jq) instead of grepping the whole file.
Useful? React with 👍 / 👎.
| done < <(eval "find . -maxdepth 4 $EXCLUDE_ARGS -name 'Gemfile' -print" 2>/dev/null) | ||
|
|
||
| # Parse found files into (type, relative-dir) pairs | ||
| declare -A MONO_HITS=() # key = "type@dir", value = 1 (dedup) |
There was a problem hiding this comment.
Remove bash-4 associative arrays from project detection
The script declares MONO_HITS with declare -A, which requires Bash 4+, but this skill is invoked via bash and commonly runs on macOS where the default Bash is 3.2. In that environment the script aborts at this line (declare: -A: invalid option) and project-type auto-detection never runs, breaking the no-launch.json startup path. Use a Bash-3-compatible dedup strategy (indexed array/string set) here.
Useful? React with 👍 / 👎.
The SKILL.md rewrite dropped phases that used these files. No remaining references in the skill. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 885afd8df1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| --port) | ||
| EXPLICIT_PORT="${2:-}" | ||
| shift 2 | ||
| ;; |
There was a problem hiding this comment.
Reject missing value for --port flag
The --port parser path mirrors the --type bug: it reads ${2:-} and unconditionally executes shift 2. When the flag is passed without a value (for example, bash scripts/resolve-port.sh --port), shift fails, positional parameters are not consumed, and the while [ $# -gt 0 ] loop never terminates. This can hang the polish workflow instead of returning a clear usage error for malformed input.
Useful? React with 👍 / 👎.
| if [ -n "$value" ]; then | ||
| printf '%s' "$value" |
There was a problem hiding this comment.
Enforce numeric PORT values from .env probes
parse_env_port returns any non-empty PORT= value after trimming, even when it is not a number (e.g. PORT=${PORT:-3000} or PORT=abc). In those common cases the script emits a non-numeric token as the resolved port, which breaks downstream localhost URL/probe logic that expects an integer port. The parser should validate numeric content and fall through when the value is not a literal number.
Useful? React with 👍 / 👎.
| "extract-surfaces.sh", | ||
| "classify-oversized.sh", | ||
| "parse-checklist.sh", |
There was a problem hiding this comment.
Align script catalog test with shipped ce-polish-beta files
This test hard-requires extract-surfaces.sh, classify-oversized.sh, and parse-checklist.sh, but this commit only adds four scripts under ce-polish-beta/scripts/ and none of those three files. As written, the new envelope test fails immediately with ENOENT, which turns the PR red. Either add the missing scripts in this commit or update the catalog expectations to match the actual shipped skill contents.
Useful? React with 👍 / 👎.
…tructure Delete envelope and size-gate test files, remove resolve-base.sh packaging tests from converter.test.ts — all tested features that were removed in the SKILL.md rewrite. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c818a3c988
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| for cfg in \ | ||
| "$PROJECT_ROOT"/next.config.js \ | ||
| "$PROJECT_ROOT"/next.config.ts \ | ||
| "$PROJECT_ROOT"/next.config.mjs \ | ||
| "$PROJECT_ROOT"/next.config.cjs \ |
There was a problem hiding this comment.
Restrict framework config scan to selected --type
This probe always checks next.config.* first, even when the caller explicitly passes a different --type (for example --type vite). In mixed repos that contain multiple framework configs, the script returns the first matching port from an unrelated framework and exits early, which points polish at the wrong dev-server URL. Filter the config-file list by PROJ_TYPE before applying first-hit-wins logic.
Useful? React with 👍 / 👎.
|
|
||
| # Conservative regex: match "port:" + digits, then verify nothing non-numeric | ||
| # follows (rejects variable references like "port: process.env.PORT || 3000"). | ||
| local_line=$(grep -E 'port:[[:space:]]*["'"'"']?[0-9]+' "$cfg" 2>/dev/null | head -1) |
There was a problem hiding this comment.
Skip commented port lines in config parsing
The framework-config matcher accepts any line containing port: and digits, including comments. If a file contains a comment like // port: 4000 before the real config, this code resolves 4000 and exits, even when the actual runtime port differs. That causes false port resolution and failed browser/probe targeting; comment lines should be excluded before extracting the numeric value.
Useful? React with 👍 / 👎.
| # ── Probe 4: Procfile.dev ─────────────────────────────────────────────────── | ||
|
|
||
| if should_probe "$PROJ_TYPE" "procfile"; then | ||
| procfile="$PROJECT_ROOT/Procfile.dev" |
There was a problem hiding this comment.
Probe Procfile when Procfile.dev is absent
Project detection classifies procfile when either Procfile or Procfile.dev exists, but this branch only reads Procfile.dev. In Procfile-only projects, port extraction is skipped and the script falls through to defaults, so an explicit web: ... -p <port> in Procfile is ignored and polish targets the wrong port. Add a Procfile fallback here.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c818a3c988
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| procfile="$PROJECT_ROOT/Procfile.dev" | ||
| if [ -f "$procfile" ]; then |
There was a problem hiding this comment.
Probe Procfile when resolving procfile project ports
detect-project-type.sh classifies a repo as procfile when either Procfile or Procfile.dev exists, but this resolver only reads Procfile.dev. In Procfile-only projects (a common Foreman setup), a declared web port (for example -p 4567) is ignored and the script falls back to 3000, so polish probes/opens the wrong URL and can appear to fail startup even though the Procfile is valid.
Useful? React with 👍 / 👎.
| For framework types that need a package manager, run `bash scripts/resolve-package-manager.sh` and substitute the result into the start command. | ||
|
|
||
| Resolve the port with `bash scripts/resolve-port.sh --type <type>`. |
There was a problem hiding this comment.
Parse monorepo type@dir before running resolvers
The skill calls detect-project-type.sh, but then always runs package-manager and port resolution at repo root and passes a raw <type> token. That breaks the new monorepo output contract (e.g. next@apps/web): in that case resolve-package-manager.sh reads the root (often returning __NO_PACKAGE_JSON__) and resolve-port.sh defaults incorrectly, so polish can start/probe the wrong app context in multi-package repos.
Useful? React with 👍 / 👎.
|
|
||
| # Conservative regex: match "port:" + digits, then verify nothing non-numeric | ||
| # follows (rejects variable references like "port: process.env.PORT || 3000"). | ||
| local_line=$(grep -E 'port:[[:space:]]*["'"'"']?[0-9]+' "$cfg" 2>/dev/null | head -1) |
There was a problem hiding this comment.
Skip commented port literals in config-file probe
The framework-config probe greps for any port: <number> text and takes the first match, so commented lines like // port: 9000 are treated as authoritative port settings. This can override the real runtime port with a stale/example value and send polish to the wrong localhost URL; the scan should ignore comment-only matches (or parse actual config keys).
Useful? React with 👍 / 👎.
Summary
Adds
/ce:polish-beta— the second human-in-the-loop moment in the compound-engineering workflow (paired with brainstorm). Runs after/ce:reviewpasses but before merge. Starts a dev server from.claude/launch.json, generates a user-testable checklist, lets the human mark items for craft fixes, and dispatches polish sub-agents in parallel. Emits stacked-PR seeds when individual items are oversized and a replan seed when the whole batch exceeds a single focus area.Skill structure
Five phases in
plugins/compound-engineering/skills/ce-polish-beta/SKILL.md:mode:headless,trust-fork:1,accept-stale-review:1,allow-port-kill:1,plan:<path>)/ce:reviewran, CI is green, branch is merged with main, fork-PR trust.claude/launch.json(authoritative), auto-detects Rails/Next/Vite/Procfile as fallback, kills port, starts server, opens IDE browser.context/compound-engineering/ce-polish/<run-id>/artifact layout, stacked-PR handoffSize gate (three-tier)
>5 files OR >2 surfaces OR >300 diff lines→stacked-pr-<n>.mdseedaction: stackedon manageable → same seed withuser_judgment: yes>30 files OR >1000 lines OR majority oversized OR 3+ replan actions→replan-seed.mdCross-cutting changes
ce:reviewnow writesmetadata.json(branch, head_sha, verdict, completed_at) alongside findings so polish can verify the review artifact matches the current HEAD. Additive — pre-existing artifacts fall back to mtime.resolve-base.shis duplicated fromce-reviewintoce-polish-betaper the self-contained-skill rule. Two new tests intests/converter.test.tspin the byte-equality invariant so silent drift fails loudly.Tests
tests/skills/ce-polish-beta-envelope.test.ts(18) — envelope shape, argument tokens, reference file catalog, script catalog (owner-execute bit)tests/skills/ce-polish-beta-size-gate.test.ts(21) — extract-surfaces / classify-oversized / parse-checklist against an in-memory git repotests/skills/ce-polish-beta-dev-server.test.ts— launch.json reader and project-type detectorFull suite: 713 pass, 1 pre-existing fail (
resolve-base.sh > prefers the PR base remote from gh metadata over origin— unrelated, confirmed on cleanfeat/ce-polish-commandbefore our changes).Review synthesis
Tier-2
/ce:review mode:autofixran 6 reviewer personas in parallel. Two high-confidence fixes applied:- field:line that terminated anotes: |block was silently dropped, allowing notes to overwrite a later item's action. Extracted field parsing intoparse_field_line()and re-dispatch the terminator through it. Regression test added.action: stackedon manageable items (user judgment the classifier missed), but the parser rejected it. Relaxed the parser; test flipped from rejection to acceptance.Deferred as follow-ups (tracked as todos):
.context/paths are CWD-relative (affects subdir-invocation)references/ide-detection.mdVersioning
Per
AGENTS.md: no manual bump ofplugin.json,marketplace.json, orCHANGELOG.md. Release-please handles linked-versions (cli + compound-engineering) automatically on merge.Test plan
bun test tests/skills/ce-polish-beta-size-gate.test.ts— 21 passbun test tests/skills/ce-polish-beta-envelope.test.ts— 18 passbun test— 713 pass (1 pre-existing fail documented)bun run release:validate— in sync, 43 skills, 49 agents/ce:review mode:autofix— 6 reviewers, 2 safe fixes applied, advisory findings deferred/ce:polish-betaon a real PR (beta —disable-model-invocation: truemeans it won't auto-trigger)Post-Deploy Monitoring & Validation
Beta skill with
disable-model-invocation: true, so there is no production autoload path on merge — users invoke/ce:polish-betaexplicitly. No additional operational monitoring required beyond the standard plugin-release pipeline.🤖 Generated with Claude Code