diff --git a/CLAUDE.md b/CLAUDE.md
index bfd7d069..9123de5a 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -85,7 +85,7 @@ Rust: clippy for linting, cargo test for tests. No TypeScript, no npm. Skills an
 - **Review comparison**: `flowctl review-backend --compare ` or `--epic ` detects consensus/conflict across review receipts (auto-archived to `.flow/reviews/`)
 - **Domain tagging**: `flowctl task create --domain ` tags tasks (frontend/backend/architecture/testing/docs/ops/general), filterable via `tasks --domain`
 - **Epic archival**: `flowctl epic archive ` moves closed epic + tasks + specs + reviews to `.flow/.archive/`; `flowctl epic clean` archives all closed epics at once
-- **Learning loop**: plan injects memory (Step 1b), worker saves lessons (Phase 5b, included in default sequence when memory.enabled is true), epic close prompts retro, retro verifies stale entries via `flowctl memory verify `
+- **Learning loop**: plan injects memory (Step 6), worker saves lessons (Phase 11, included in default sequence when memory.enabled is true), epic close prompts retro, retro verifies stale entries via `flowctl memory verify `
 - **Task duration**: `flowctl done` auto-tracks `duration_seconds` from start to completion, rendered in evidence
 - **File ownership**: `flowctl task create --files ` declares owned files; `flowctl files --epic ` shows ownership map + conflict detection
 - **File locking (Teams)**: `flowctl lock --task --files ` acquires runtime file locks; `flowctl unlock --task ` releases on completion; `flowctl lock-check --file ` inspects lock state; `flowctl unlock --all` clears all locks between waves
@@ -95,9 +95,9 @@ Rust: clippy for linting, cargo test for tests. No TypeScript, no npm. Skills an
 - **Review circuit breaker**: impl-review fix loop capped at `MAX_REVIEW_ITERATIONS` (default 3) — prevents infinite NEEDS_WORK cycles
 - **Auto-improve analysis-driven**: generates custom program.md from codebase analysis (hotspots, lint, coverage, memory) with Action Catalog ranked by impact — not static templates
 - **Auto-improve quantitative**: captures before/after metrics per experiment, commit messages include delta `[lint:23→21]`
-- **Worker self-review**: Phase 2.5 runs guard + structured diff review (correctness, quality, performance, testing) before commit
+- **Worker self-review**: Phase 6 runs guard + structured diff review (correctness, quality, performance, testing) before commit
 - **Plan auto-execute**: `/flow-code:plan` defaults to auto-execute work after planning (Teams mode handles any task count); `--plan-only` to opt out
-- **Goal-backward verification**: worker Phase 5 re-reads acceptance criteria and verifies each is actually satisfied before completing
+- **Goal-backward verification**: worker Phase 10 re-reads acceptance criteria and verifies each is actually satisfied before completing
 - **Full-auto by default**: `/flow-code:plan` and `/flow-code:work` require zero interactive questions — AI reads git state, `.flow/` config, and request context to make branch, review, and research decisions autonomously. Default mode is Worktree + Teams + Phase-Gate (all three active). Work resumes from `.flow/` state on every startup (not a special "resume mode"). All tasks done → auto push + draft PR (`--no-pr` to skip)
 - **Cross-platform**: flowctl is a single Rust binary (macOS/Linux). RP plan-review auto-degrades to Codex on platforms where rp-cli is unavailable. Bash hooks degrade gracefully on Windows (skip, don't block)
 - **Session start**: CLAUDE.md instruction (not an enforced hook) — if `.flow/` exists, run `flowctl status --interrupted` to check for unfinished work from a previous session and resume with the suggested `/flow-code:work ` command
diff --git a/agents/worker.md b/agents/worker.md
index 2fccb934..a2f52dd9 100644
--- a/agents/worker.md
+++ b/agents/worker.md
@@ -20,8 +20,8 @@ You implement a single flow-code task. Your prompt contains configuration values
 - `FLOWCTL` - path to flowctl CLI
 - `REVIEW_MODE` - none, rp, or codex
 - `RALPH_MODE` - true if running autonomously
-- `TDD_MODE` - true to enforce test-first development (Phase 2a)
-- `RP_CONTEXT` - mcp, cli, or none (controls RP-powered context gathering in Phase 1.5)
+- `TDD_MODE` - true to enforce test-first development (Phase 4)
+- `RP_CONTEXT` - mcp, cli, or none (controls RP-powered context gathering in Phase 3)
 ## Environment
@@ -38,7 +38,7 @@ You execute phases one at a time via flowctl commands.
 4. Run: `$FLOWCTL worker-phase done --task $TASK_ID --phase --json`
 5. Repeat from step 1 until response has `all_done: true`
-Do NOT skip phases. Do NOT execute phases out of order. The gate enforces sequential execution — attempting to complete phase 3 before phase 2 will be rejected.
+Do NOT skip phases. Do NOT execute phases out of order. The gate enforces sequential execution — attempting to complete phase 5 before phase 4 will be rejected.
@@ -129,7 +129,7 @@ After `flowctl done`, send a `task_complete` message, then wait for next assignm
-## Phase 0: Verify Configuration (CRITICAL)
+## Phase 1: Verify Configuration (CRITICAL)
 **If TEAM_MODE is `true`:**
@@ -139,7 +139,7 @@ After `flowctl done`, send a `task_complete` message, then wait for next assignm
    SendMessage(to: "coordinator", summary: "Blocked: ", message: "Task is blocked.\nReason: TEAM_MODE=true but OWNED_FILES is empty or missing.\nBlocked by: orchestrator configuration error")
    ```
-   - Do NOT proceed to Phase 1
+   - Do NOT proceed to Phase 2
 2. **Verify TASK_ID matches prompt**
    - Confirm the `TASK_ID` from your prompt matches what `flowctl show` returns
@@ -148,11 +148,11 @@ After `flowctl done`, send a `task_complete` message, then wait for next assignm
 3. **Log owned files for audit trail**
    - Print `OWNED_FILES: , , ...` so the conversation log captures your ownership set
-**If TEAM_MODE is not set or `false`:** proceed directly to Phase 1 (unrestricted file access).
+**If TEAM_MODE is not set or `false`:** proceed directly to Phase 2 (unrestricted file access).
-## Phase 1: Re-anchor (CRITICAL - DO NOT SKIP)
+## Phase 2: Re-anchor (CRITICAL - DO NOT SKIP)
 Use the FLOWCTL path and IDs from your prompt:
@@ -217,13 +217,13 @@ GIT_BASELINE_REV=$(git rev-parse HEAD)
 echo "GIT_BASELINE_REV=$GIT_BASELINE_REV"
 git diff --stat HEAD 2>/dev/null || true
 ```
-Save `GIT_BASELINE_REV` — you'll use it in Phase 5 to generate workspace change evidence.
+Save `GIT_BASELINE_REV` — you'll use it in Phase 10 to generate workspace change evidence.
-## Phase 1.5: Pre-implementation Investigation
+## Phase 3: Pre-implementation Investigation
-**If the task spec contains `## Investigation targets` with content, execute this phase. Otherwise skip to Phase 2a/2.**
+**If the task spec contains `## Investigation targets` with content, execute this phase. Otherwise skip to Phase 4/5.**
 ### Step 0: RP-powered deep context (if RP_CONTEXT != none)
@@ -237,7 +237,7 @@ IF RP_CONTEXT == "mcp":
   Timeout: 120 seconds. If context_builder does not return within 120s, log: "RP context_builder timed out after 120s, using built-in fallback" and skip to Step 1.
-  Use the returned plan to guide Phase 2 implementation.
+  Use the returned plan to guide Phase 5 implementation.
 ELIF RP_CONTEXT == "cli":
   Run with 120s timeout:
@@ -245,7 +245,7 @@ ELIF RP_CONTEXT == "cli":
   If timeout or failure, log: "rp-cli builder timed out or failed, using built-in fallback" and skip to Step 1.
-  Use the returned plan to guide Phase 2 implementation.
+  Use the returned plan to guide Phase 5 implementation.
 ELSE (RP_CONTEXT == "none"):
   Skip to Step 1 (existing behavior, unchanged).
@@ -286,11 +286,11 @@ END
 3. Read **Optional** files as needed based on Step 1 findings.
-4. Continue to Phase 2a/2 only after investigation is complete.
+4. Continue to Phase 4/5 only after investigation is complete.
-## Phase 2a: TDD Red-Green (if TDD_MODE=true)
+## Phase 4: TDD Red-Green (if TDD_MODE=true)
 **Skip this phase if TDD_MODE is not `true`.**
@@ -303,7 +303,7 @@ Before implementing the feature, write failing tests first:
    ```
    If tests pass already, the feature may already be implemented. Investigate before proceeding.
-2. **Green** — Now implement the minimum code to make tests pass (this IS Phase 2).
+2. **Green** — Now implement the minimum code to make tests pass (this IS Phase 5).
 3. **Refactor** — After tests pass, clean up without changing behavior. Run tests again to confirm still green.
@@ -311,7 +311,7 @@ The key constraint: **no implementation code before a failing test exists**. Thi
-## Phase 2: Implement
+## Phase 5: Implement
 **First, capture base commit for scoped review:**
 ```bash
@@ -401,7 +401,7 @@ If during implementation you discover the spec is wrong, incomplete, or contradi
    - What the spec says vs what reality requires
    - Why the spec approach won't work
    - A suggested correction (if you have one)
-3. **Return early** with status `SPEC_CONFLICT` in your Phase 6 summary
+3. **Return early** with status `SPEC_CONFLICT` in your Phase 12 summary
 4. Do NOT mark the task as done — leave it `in_progress`
 The main conversation will resolve the conflict and re-dispatch you (or update the spec).
@@ -414,7 +414,7 @@ The main conversation will resolve the conflict and re-dispatch you (or update t
-## Phase 2.3: Plan Alignment Check
+### Plan Alignment Check
 Quick sanity check — did implementation stay within plan scope?
@@ -431,11 +431,11 @@ Quick sanity check — did implementation stay within plan scope?
 Drift:
 ```
-**This is a 30-second check, not a full re-review.** Read the spec, glance at your diff, note any drift. Then proceed to Phase 2.5.
+**This is a 30-second check, not a full re-review.** Read the spec, glance at your diff, note any drift. Then proceed to Phase 6.
-## Phase 2.5: Verify & Fix
+## Phase 6: Verify & Fix
 **After implementing, before committing — verify your code works. This is normal development: implement → test → fix → retest → pass → commit.**
@@ -451,7 +451,7 @@ Continue until guard passes. There is no retry limit — this is not a retry loo
 **If the failure is not a code bug but a spec problem** (e.g., spec asks for something impossible, acceptance criteria contradict each other, required API doesn't exist):
 - Do NOT keep trying to fix code
-- Return early with `SPEC_CONFLICT` status (see Phase 2 spec conflict protocol)
+- Return early with `SPEC_CONFLICT` status (see Phase 5 spec conflict protocol)
 - In Teams mode, send a `Spec conflict` message to the coordinator
 **Teams mode constraint:** When `TEAM_MODE=true`, only fix files in `OWNED_FILES`. If the failure is caused by a file you don't own, request access via `flowctl approval create --kind file_access` + `approval show --wait` (or fallback `Need file access:` SendMessage), then wait for a resolution. If access is rejected or times out, note the issue in your completion summary.
@@ -473,11 +473,11 @@ If you find issues, fix them and re-run ` guard` to verify.
 **Rules:**
 - Only fix issues in YOUR changes — don't refactor unrelated code
-- If unsure whether something is an issue, leave it for Phase 4 (external review)
+- If unsure whether something is an issue, leave it for Phase 8 (external review)
-## Phase 3: Commit
+## Phase 7: Commit
 ```bash
 git add -A
@@ -493,9 +493,9 @@ Use conventional commits. Scope from task context.
-## Phase 4: Review (MANDATORY if REVIEW_MODE != none)
+## Phase 8: Review (MANDATORY if REVIEW_MODE != none)
-**If REVIEW_MODE is `none`, skip to Phase 5.**
+**If REVIEW_MODE is `none`, skip to Phase 10.**
 **If REVIEW_MODE is `rp` or `codex`, you MUST invoke impl-review and receive SHIP before proceeding.**
@@ -520,13 +520,13 @@ If NEEDS_WORK:
 3. Commit fixes
 4. Re-invoke the skill: `/flow-code:impl-review --base $BASE_COMMIT`
-Continue until SHIP verdict. Save final `REVIEW_ITERATIONS` count for Phase 5 evidence.
+Continue until SHIP verdict. Save final `REVIEW_ITERATIONS` count for Phase 10 evidence.
-## Phase 5: Complete
+## Phase 10: Complete
-**Prerequisite:** Phase 5c (Outputs Dump) must have run if `outputs.enabled=true`. The phase registry orders 5c before 5 so the narrative handoff file exists before dependents unblock.
+**Prerequisite:** Phase 9 (Outputs Dump) must have run if `outputs.enabled=true`. The phase registry orders 9 before 10 so the narrative handoff file exists before dependents unblock.
 **Verify before completing:**
 ```bash
@@ -547,7 +547,7 @@ Go through each `- [ ]` acceptance criterion in the spec:
 **Rules:**
 - This is a 1-minute sanity check, not a full re-review
-- Only check acceptance criteria, not general quality (Phase 2.5 already did that)
+- Only check acceptance criteria, not general quality (Phase 6 already did that)
 - If you discover a gap, fix + commit + re-run guard
 - If you discover the criterion is impossible, note it in the summary (not SPEC_CONFLICT at this stage)
@@ -556,7 +556,7 @@ Capture the commit hash:
 COMMIT_HASH=$(git rev-parse HEAD)
 ```
-Capture workspace changes (compare against Phase 1 baseline):
+Capture workspace changes (compare against Phase 2 baseline):
 ```bash
 # Generate workspace change summary
 DIFF_STAT=$(git diff --stat "$GIT_BASELINE_REV"..HEAD 2>/dev/null || echo "no diff")
@@ -574,7 +574,7 @@ EOF
 **If a review was done (REVIEW_MODE != none)**, append the review receipt to evidence so it gets auto-archived:
 ```bash
-# Only if RECEIPT_PATH exists from Phase 4
+# Only if RECEIPT_PATH exists from Phase 8
 if [ -f "${RECEIPT_PATH:-/tmp/impl-review-receipt.json}" ]; then
 # Merge review_receipt into evidence JSON
 python3 -c "
@@ -610,9 +610,9 @@ Status MUST be `done`. If not:
-## Phase 5c: Outputs Dump (if outputs.enabled)
+## Phase 9: Outputs Dump (if outputs.enabled)
-**Runs BEFORE Phase 5 completion.** Phase 5c must produce the handoff artifact before `flowctl done` fires, otherwise a dependent task can start re-anchoring and race past the missing file. The phase registry in `flowctl-cli/src/commands/workflow/phase.rs` enforces this ordering (5c before 5).
+**Runs BEFORE Phase 10 completion.** Phase 9 must produce the handoff artifact before `flowctl done` fires, otherwise a dependent task can start re-anchoring and race past the missing file. The phase registry in `flowctl-cli/src/commands/workflow/phase.rs` enforces this ordering (9 before 10).
 **Skip if `outputs.enabled` is false.** This is gated on its own config key — independent from `memory.enabled`. Outputs are a lightweight narrative handoff layer (plain markdown, no verification), separate from the verified memory system.
@@ -645,18 +645,18 @@ fi
 - All three sections are allowed to be missing or empty — downstream readers handle that gracefully
 - Focus on narrative handoff: what would help the next worker, not comprehensive docs
 - Don't repeat spec content — only things you learned while working
-- This is narrative handoff, NOT verified memory. Save verified pitfalls/conventions in Phase 5b.
+- This is narrative handoff, NOT verified memory. Save verified pitfalls/conventions in Phase 11.
-## Phase 5b: Memory Auto-Save (if memory enabled)
+## Phase 11: Memory Auto-Save (if memory enabled)
-**Skip if memory.enabled is false or was not checked in Phase 1.**
+**Skip if memory.enabled is false or was not checked in Phase 2.**
 After completing the task, capture any non-obvious lessons learned:
 ```bash
-# Check if memory is enabled (already checked in Phase 1)
+# Check if memory is enabled (already checked in Phase 2)
 config get memory.enabled --json
 ```
@@ -686,7 +686,7 @@ If enabled, reflect on what you discovered during implementation and save **only
-## Phase 6: Return
+## Phase 12: Return
 Return a concise summary to the main conversation:
 - What was implemented (1-2 sentences)
diff --git a/codex/agents/worker.toml b/codex/agents/worker.toml
index 3c37991c..f65c4b5b 100644
--- a/codex/agents/worker.toml
+++ b/codex/agents/worker.toml
@@ -16,8 +16,8 @@ You implement a single flow-code task. Your prompt contains configuration values
 - `FLOWCTL` - path to flowctl CLI
 - `REVIEW_MODE` - none, rp, or codex
 - `RALPH_MODE` - true if running autonomously
-- `TDD_MODE` - true to enforce test-first development (Phase 2a)
-- `RP_CONTEXT` - mcp, cli, or none (controls RP-powered context gathering in Phase 1.5)
+- `TDD_MODE` - true to enforce test-first development (Phase 4)
+- `RP_CONTEXT` - mcp, cli, or none (controls RP-powered context gathering in Phase 3)
 ## Environment
@@ -34,7 +34,7 @@ You execute phases one at a time via flowctl commands.
 4. Run: `$FLOWCTL worker-phase done --task $TASK_ID --phase --json`
 5. Repeat from step 1 until response has `all_done: true`
-Do NOT skip phases. Do NOT execute phases out of order. The gate enforces sequential execution — attempting to complete phase 3 before phase 2 will be rejected.
+Do NOT skip phases. Do NOT execute phases out of order. The gate enforces sequential execution — attempting to complete phase 5 before phase 4 will be rejected.
@@ -125,7 +125,7 @@ After `flowctl done`, send a `task_complete` message, then wait for next assignm
-## Phase 0: Verify Configuration (CRITICAL)
+## Phase 1: Verify Configuration (CRITICAL)
 **If TEAM_MODE is `true`:**
@@ -135,7 +135,7 @@ After `flowctl done`, send a `task_complete` message, then wait for next assignm
    SendMessage(to: "coordinator", summary: "Blocked: ", message: "Task is blocked.\\nReason: TEAM_MODE=true but OWNED_FILES is empty or missing.\\nBlocked by: orchestrator configuration error")
    ```
-   - Do NOT proceed to Phase 1
+   - Do NOT proceed to Phase 2
 2. **Verify TASK_ID matches prompt**
    - Confirm the `TASK_ID` from your prompt matches what `flowctl show` returns
@@ -144,11 +144,11 @@ After `flowctl done`, send a `task_complete` message, then wait for next assignm
 3. **Log owned files for audit trail**
    - Print `OWNED_FILES: , , ...` so the conversation log captures your ownership set
-**If TEAM_MODE is not set or `false`:** proceed directly to Phase 1 (unrestricted file access).
+**If TEAM_MODE is not set or `false`:** proceed directly to Phase 2 (unrestricted file access).
-## Phase 1: Re-anchor (CRITICAL - DO NOT SKIP)
+## Phase 2: Re-anchor (CRITICAL - DO NOT SKIP)
 Use the FLOWCTL path and IDs from your prompt:
@@ -213,13 +213,13 @@ GIT_BASELINE_REV=$(git rev-parse HEAD)
 echo "GIT_BASELINE_REV=$GIT_BASELINE_REV"
 git diff --stat HEAD 2>/dev/null || true
 ```
-Save `GIT_BASELINE_REV` — you'll use it in Phase 5 to generate workspace change evidence.
+Save `GIT_BASELINE_REV` — you'll use it in Phase 10 to generate workspace change evidence.
-## Phase 1.5: Pre-implementation Investigation
+## Phase 3: Pre-implementation Investigation
-**If the task spec contains `## Investigation targets` with content, execute this phase. Otherwise skip to Phase 2a/2.**
+**If the task spec contains `## Investigation targets` with content, execute this phase. Otherwise skip to Phase 4/5.**
 ### Step 0: RP-powered deep context (if RP_CONTEXT != none)
@@ -233,7 +233,7 @@ IF RP_CONTEXT == "mcp":
   Timeout: 120 seconds. If context_builder does not return within 120s, log: "RP context_builder timed out after 120s, using built-in fallback" and skip to Step 1.
-  Use the returned plan to guide Phase 2 implementation.
+  Use the returned plan to guide Phase 5 implementation.
 ELIF RP_CONTEXT == "cli":
   Run with 120s timeout:
@@ -241,7 +241,7 @@ ELIF RP_CONTEXT == "cli":
   If timeout or failure, log: "rp-cli builder timed out or failed, using built-in fallback" and skip to Step 1.
-  Use the returned plan to guide Phase 2 implementation.
+  Use the returned plan to guide Phase 5 implementation.
 ELSE (RP_CONTEXT == "none"):
   Skip to Step 1 (existing behavior, unchanged).
@@ -282,11 +282,11 @@ END
 3. Read **Optional** files as needed based on Step 1 findings.
-4. Continue to Phase 2a/2 only after investigation is complete.
+4. Continue to Phase 4/5 only after investigation is complete.
-## Phase 2a: TDD Red-Green (if TDD_MODE=true)
+## Phase 4: TDD Red-Green (if TDD_MODE=true)
 **Skip this phase if TDD_MODE is not `true`.**
@@ -299,7 +299,7 @@ Before implementing the feature, write failing tests first:
    ```
    If tests pass already, the feature may already be implemented. Investigate before proceeding.
-2. **Green** — Now implement the minimum code to make tests pass (this IS Phase 2).
+2. **Green** — Now implement the minimum code to make tests pass (this IS Phase 5).
 3. **Refactor** — After tests pass, clean up without changing behavior. Run tests again to confirm still green.
@@ -307,7 +307,7 @@ The key constraint: **no implementation code before a failing test exists**. Thi
-## Phase 2: Implement
+## Phase 5: Implement
 **First, capture base commit for scoped review:**
 ```bash
@@ -397,7 +397,7 @@ If during implementation you discover the spec is wrong, incomplete, or contradi
    - What the spec says vs what reality requires
    - Why the spec approach won't work
    - A suggested correction (if you have one)
-3. **Return early** with status `SPEC_CONFLICT` in your Phase 6 summary
+3. **Return early** with status `SPEC_CONFLICT` in your Phase 12 summary
 4. Do NOT mark the task as done — leave it `in_progress`
 The main conversation will resolve the conflict and re-dispatch you (or update the spec).
@@ -410,7 +410,7 @@ The main conversation will resolve the conflict and re-dispatch you (or update t
-## Phase 2.5: Verify & Fix
+## Phase 6: Verify & Fix
 **After implementing, before committing — verify your code works. This is normal development: implement → test → fix → retest → pass → commit.**
@@ -426,7 +426,7 @@ Continue until guard passes. There is no retry limit — this is not a retry loo
 **If the failure is not a code bug but a spec problem** (e.g., spec asks for something impossible, acceptance criteria contradict each other, required API doesn't exist):
 - Do NOT keep trying to fix code
-- Return early with `SPEC_CONFLICT` status (see Phase 2 spec conflict protocol)
+- Return early with `SPEC_CONFLICT` status (see Phase 5 spec conflict protocol)
 - In Teams mode, send a `Spec conflict` message to the coordinator
 **Teams mode constraint:** When `TEAM_MODE=true`, only fix files in `OWNED_FILES`. If the failure is caused by a file you don't own, request access via `flowctl approval create --kind file_access` + `approval show --wait` (or fallback `Need file access:` SendMessage), then wait for a resolution. If access is rejected or times out, note the issue in your completion summary.
@@ -448,11 +448,11 @@ If you find issues, fix them and re-run ` guard` to verify.
 **Rules:**
 - Only fix issues in YOUR changes — don't refactor unrelated code
-- If unsure whether something is an issue, leave it for Phase 4 (external review)
+- If unsure whether something is an issue, leave it for Phase 8 (external review)
-## Phase 3: Commit
+## Phase 7: Commit
 ```bash
 git add -A
@@ -468,9 +468,9 @@ Use conventional commits. Scope from task context.
-## Phase 4: Review (MANDATORY if REVIEW_MODE != none)
+## Phase 8: Review (MANDATORY if REVIEW_MODE != none)
-**If REVIEW_MODE is `none`, skip to Phase 5.**
+**If REVIEW_MODE is `none`, skip to Phase 10.**
 **If REVIEW_MODE is `rp` or `codex`, you MUST invoke impl-review and receive SHIP before proceeding.**
@@ -495,13 +495,13 @@ If NEEDS_WORK:
 3. Commit fixes
 4. Re-invoke the skill: `/flow-code:impl-review --base $BASE_COMMIT`
-Continue until SHIP verdict. Save final `REVIEW_ITERATIONS` count for Phase 5 evidence.
+Continue until SHIP verdict. Save final `REVIEW_ITERATIONS` count for Phase 10 evidence.
-## Phase 5: Complete
+## Phase 10: Complete
-**Prerequisite:** Phase 5c (Outputs Dump) must have run if `outputs.enabled=true`. The phase registry orders 5c before 5 so the narrative handoff file exists before dependents unblock.
+**Prerequisite:** Phase 9 (Outputs Dump) must have run if `outputs.enabled=true`. The phase registry orders 9 before 10 so the narrative handoff file exists before dependents unblock.
 **Verify before completing:**
 ```bash
@@ -522,7 +522,7 @@ Go through each `- [ ]` acceptance criterion in the spec:
 **Rules:**
 - This is a 1-minute sanity check, not a full re-review
-- Only check acceptance criteria, not general quality (Phase 2.5 already did that)
+- Only check acceptance criteria, not general quality (Phase 6 already did that)
 - If you discover a gap, fix + commit + re-run guard
 - If you discover the criterion is impossible, note it in the summary (not SPEC_CONFLICT at this stage)
@@ -531,7 +531,7 @@ Capture the commit hash:
 COMMIT_HASH=$(git rev-parse HEAD)
 ```
-Capture workspace changes (compare against Phase 1 baseline):
+Capture workspace changes (compare against Phase 2 baseline):
 ```bash
 # Generate workspace change summary
 DIFF_STAT=$(git diff --stat "$GIT_BASELINE_REV"..HEAD 2>/dev/null || echo "no diff")
@@ -549,7 +549,7 @@ EOF
 **If a review was done (REVIEW_MODE != none)**, append the review receipt to evidence so it gets auto-archived:
 ```bash
-# Only if RECEIPT_PATH exists from Phase 4
+# Only if RECEIPT_PATH exists from Phase 8
 if [ -f "${RECEIPT_PATH:-/tmp/impl-review-receipt.json}" ]; then
 # Merge review_receipt into evidence JSON
 python3 -c "
@@ -585,9 +585,9 @@ Status MUST be `done`. If not:
-## Phase 5c: Outputs Dump (if outputs.enabled)
+## Phase 9: Outputs Dump (if outputs.enabled)
-**Runs BEFORE Phase 5 completion.** Phase 5c must produce the handoff artifact before `flowctl done` fires, otherwise a dependent task can start re-anchoring and race past the missing file. The phase registry in `flowctl-cli/src/commands/workflow/phase.rs` enforces this ordering (5c before 5).
+**Runs BEFORE Phase 10 completion.** Phase 9 must produce the handoff artifact before `flowctl done` fires, otherwise a dependent task can start re-anchoring and race past the missing file. The phase registry in `flowctl-cli/src/commands/workflow/phase.rs` enforces this ordering (9 before 10).
 **Skip if `outputs.enabled` is false.** This is gated on its own config key — independent from `memory.enabled`. Outputs are a lightweight narrative handoff layer (plain markdown, no verification), separate from the verified memory system.
@@ -620,18 +620,18 @@ fi
 - All three sections are allowed to be missing or empty — downstream readers handle that gracefully
 - Focus on narrative handoff: what would help the next worker, not comprehensive docs
 - Don't repeat spec content — only things you learned while working
-- This is narrative handoff, NOT verified memory. Save verified pitfalls/conventions in Phase 5b.
+- This is narrative handoff, NOT verified memory. Save verified pitfalls/conventions in Phase 11.
-## Phase 5b: Memory Auto-Save (if memory enabled)
+## Phase 11: Memory Auto-Save (if memory enabled)
-**Skip if memory.enabled is false or was not checked in Phase 1.**
+**Skip if memory.enabled is false or was not checked in Phase 2.**
 After completing the task, capture any non-obvious lessons learned:
 ```bash
-# Check if memory is enabled (already checked in Phase 1)
+# Check if memory is enabled (already checked in Phase 2)
 config get memory.enabled --json
 ```
@@ -661,7 +661,7 @@ If enabled, reflect on what you discovered during implementation and save **only
-## Phase 6: Return
+## Phase 12: Return
 Return a concise summary to the main conversation:
 - What was implemented (1-2 sentences)
diff --git a/codex/skills/flow-code-plan/SKILL.md b/codex/skills/flow-code-plan/SKILL.md
index d2660a36..30231bd9 100644
--- a/codex/skills/flow-code-plan/SKILL.md
+++ b/codex/skills/flow-code-plan/SKILL.md
@@ -104,20 +104,20 @@ Research: repo-scout + rp() | Depth: --no-review` invoked automatically (Step 8)
-- `--plan-only`: shows plan summary and stops (Step 8)
+**Steps.md Step 15 handles auto-execution.** After steps complete:
+- Default: `/flow-code:work --no-review` invoked automatically (Step 15)
+- `--plan-only`: shows plan summary and stops (Step 15)
 **After work completes** (if auto-executed):
 - All tasks done → Layer 3 adversarial review runs automatically (Phase 3j)
diff --git a/codex/skills/flow-code-plan/steps.md b/codex/skills/flow-code-plan/steps.md
index 80f619f6..40d3b86d 100644
--- a/codex/skills/flow-code-plan/steps.md
+++ b/codex/skills/flow-code-plan/steps.md
@@ -1,6 +1,6 @@
 # Flow Plan Steps
-**IMPORTANT**: Steps 1-3 (research, gap analysis, depth) ALWAYS run regardless of input type.
+**IMPORTANT**: Steps 4-9 (research, gap analysis, depth) ALWAYS run regardless of input type.
 **CRITICAL**: If you are about to create:
 - a markdown TODO list,
@@ -34,7 +34,7 @@ Use **T-shirt sizes** based on observable metrics — not token estimates (model
 **Rules**: Combine sequential S tasks into one M. Split L tasks into M tasks. If 7+ tasks, look for over-splitting. Minimize file overlap between tasks for parallel work — list expected files in `**Files:**`, use `flowctl dep add` when tasks must share files.
-## Step 0: Initialize .flow
+## Step 1: Initialize .flow
 **CRITICAL: flowctl is BUNDLED — NOT installed globally.** `which flowctl` will fail (expected). Always use:
@@ -46,11 +46,11 @@ FLOWCTL="$HOME/.flow/bin/flowctl"
 $FLOWCTL init --json
 ```
-> **Note — opt-in interactive refinement:** If the user passed `--interactive`, BEFORE running Step 0 (Context Analysis in SKILL.md), invoke `/flow-code:interview` with the raw request text. The interview returns refined-spec markdown with Problem / Scope / Acceptance / Open Questions sections; use that refined text as the effective request for Context Analysis and all subsequent steps. Without the flag, skip this entirely — Step 0.5 below remains an automated internal brainstorm and is **not** interactive. Do not add any auto-trigger heuristic (length, punctuation, verb detection); interview must be opt-in only to preserve the zero-interaction contract (AGENTS.md:99).
+> **Note — opt-in interactive refinement:** If the user passed `--interactive`, BEFORE running Step 1 (Context Analysis in SKILL.md), invoke `/flow-code:interview` with the raw request text. The interview returns refined-spec markdown with Problem / Scope / Acceptance / Open Questions sections; use that refined text as the effective request for Context Analysis and all subsequent steps. Without the flag, skip this entirely — Step 2 below remains an automated internal brainstorm and is **not** interactive. Do not add any auto-trigger heuristic (length, punctuation, verb detection); interview must be opt-in only to preserve the zero-interaction contract (AGENTS.md:99).
-## Step 0.5: Clarity Check (auto — no human input)
+## Step 2: Clarity Check (auto — no human input)
-**Clear?** (specific behavior, bug with repro, existing pattern, has acceptance criteria) → skip to Step 1.
+**Clear?** (specific behavior, bug with repro, existing pattern, has acceptance criteria) → skip to Step 4.
 **Ambiguous?** (vague goal, multiple valid approaches, missing who/what/why, unclear scope) → mini brainstorm:
@@ -59,7 +59,7 @@ $FLOWCTL init --json
 3. Pick best by: blast radius, value/effort, codebase alignment
 4. Output: `Clarified: "" → "" | Approach: `
-## Step 1: Fast research (parallel)
+## Step 4: Fast research (parallel)
 **If input is a Flow ID** (fn-N-slug or fn-N-slug.M, including legacy fn-N/fn-N-xxx): First fetch it with `$FLOWCTL show --json` and `$FLOWCTL cat ` to get the request context.
@@ -103,9 +103,9 @@ Must capture:
 - Architecture patterns and data flow
 - Epic dependencies (from epic-scout)
 - Doc updates needed (from docs-gap-scout) - add to task acceptance criteria
-- Capability gaps (from capability-scout) - persist in Step 5 (see below)
+- Capability gaps (from capability-scout) - persist in Step 10 (see below)
-### Step 1a: Deep context via RP (after repo-scout)
+### Step 5: Deep context via RP (after repo-scout)
 After repo-scout returns, gather deep codebase context using the best available RP tier. **Exactly one RP-powered call per plan run** — do not call both context_builder and context-scout.
@@ -130,7 +130,7 @@ Run `context-scout` as a subagent (existing behavior, unchanged). This is the pr
 Feed RP/context-scout findings into the epic spec alongside repo-scout findings.
-## Step 1b: Apply memory lessons (if memory.enabled)
+## Step 6: Apply memory lessons (if memory.enabled)
 **Skip if memory.enabled is false.**
@@ -158,7 +158,7 @@ $FLOWCTL memory search ""
 - If a past decision conflicts with the current plan, note it as an explicit "supersedes decision #N" in the epic spec
 - 0-3 applied entries per plan is normal
-## Step 2: Stakeholder & scope check
+## Step 7: Stakeholder & scope check
 Before diving into gaps, identify who's affected:
 - **End users** — What changes for them? New UI, changed behavior?
@@ -167,13 +167,13 @@ Before diving into gaps, identify who's affected:
 This shapes what the plan needs to cover.
-## Step 3: Flow gap check
+## Step 8: Flow gap check
 Run gap analyst subagent: `flow-code:flow-gap-analyst(, research_findings)`.
 Fold gaps into the plan.
-**After epic is created (Step 5):** Register gaps via `$FLOWCTL gap add --epic --capability "" --priority required|important|nice-to-have --source flow-gap-analyst --json`. Priority mapping: "MUST answer" → required, high-impact edge cases → important, deferrable → nice-to-have.
+**After epic is created (Step 10):** Register gaps via `$FLOWCTL gap add --epic --capability "" --priority required|important|nice-to-have --source flow-gap-analyst --json`. Priority mapping: "MUST answer" → required, high-impact edge cases → important, deferrable → nice-to-have. -## Step 4: Pick depth +## Step 9: Pick depth Default to standard unless complexity demands more or less. @@ -200,7 +200,7 @@ Default to standard unless complexity demands more or less. - Docs + metrics - Risks + mitigations -## Step 5: Write to .flow +## Step 10: Write to .flow **Efficiency note**: Use stdin (`--file -`) with heredocs to avoid temp files. Use `task spec` to set description + acceptance in one call. @@ -332,7 +332,7 @@ Default to standard unless complexity demands more or less. - Max 5-7 targets per task — enough to ground the worker, not so many it wastes context - Use exact file paths with optional line ranges (e.g., `src/auth.ts:23-45`) - **Required** = must read before implementing. **Optional** = helpful reference - - Auto-populated from repo-scout/context-scout findings in Step 1 research + - Auto-populated from repo-scout/context-scout findings in Step 4 research - If no relevant files found by scouts, leave the section empty (worker skips Phase 1.5) **Layer field**: If stack config is set, tag each task with its primary layer. This helps the worker select the right guard commands (e.g., `pytest` for backend, `pnpm test` for frontend). Full-stack tasks run all guards. @@ -356,7 +356,7 @@ Default to standard unless complexity demands more or less. 
$FLOWCTL cat ``` -## Step 5.5: Write capability-gaps.md (if capability-scout ran) +## Step 11: Write capability-gaps.md (if capability-scout ran) **Skip if `--no-capability-scan` was passed, or capability-scout was not run, or scout errored (fails open).** @@ -384,7 +384,7 @@ $FLOWCTL gap add --epic \ `important` and `nice-to-have` gaps are recorded in the markdown file only — not in the gap registry (don't over-fill with noise). -## Step 6: Validate +## Step 12: Validate ```bash $FLOWCTL validate --epic --json @@ -392,18 +392,18 @@ $FLOWCTL validate --epic --json Fix any errors before proceeding. -### Step 6b: Auto-Extract Acceptance Checklist +### Step 13: Auto-Extract Acceptance Checklist After validation, generate `.flow/checklists/.json` by parsing `## Acceptance` sections from epic + task specs. Each `- [ ]` line becomes a checklist item with `source` (epic or task ID) and `status: "pending"`. Skip if no acceptance criteria found. Commit with the plan (`git add .flow/checklists/`). Consumed by `/flow-code:epic-review`. -## Step 7: Review (if chosen at start) +## Step 14: Review (if chosen at start) If review was decided in Context Analysis: 1. Initialize `PLAN_REVIEW_ITERATIONS=0` 2. Invoke `/flow-code:plan-review` with the epic ID 3. If review returns "Needs Work" or "Major Rethink": - Increment `PLAN_REVIEW_ITERATIONS` - - **If `PLAN_REVIEW_ITERATIONS >= 2`**: stop the loop. Log: "Plan review: 2 iterations completed. Proceeding." Go to Step 8. + - **If `PLAN_REVIEW_ITERATIONS >= 2`**: stop the loop. Log: "Plan review: 2 iterations completed. Proceeding." Go to Step 15. - **Re-anchor EVERY iteration** (do not skip): ```bash $FLOWCTL show --json @@ -417,7 +417,7 @@ If review was decided in Context Analysis: **Why re-anchor every iteration?** Per Anthropic's long-running agent guidance: context compresses, you forget details. Re-read before each fix pass. 
-## Step 8: Execute or Offer next steps +## Step 15: Execute or Offer next steps **If `--plan-only`**: print `Plan created: (N tasks) | Next: /flow-code:work ` and stop. diff --git a/codex/skills/flow-code-work/SKILL.md b/codex/skills/flow-code-work/SKILL.md index aa6ccf0c..3b264cb5 100644 --- a/codex/skills/flow-code-work/SKILL.md +++ b/codex/skills/flow-code-work/SKILL.md @@ -69,8 +69,8 @@ REVIEW_BACKEND=$($FLOWCTL review-backend) Based on context, decide: - **Branch**: on feature branch → stay (`current`). on main/master → create worktree (`worktree`). dirty working tree → `current`. - **Per-task review**: `none` by default. Three-layer quality system handles review at the right levels: - - Layer 1 (guard): runs per-commit in worker Phase 2.5 — always on - - Layer 3 (codex adversarial): runs at epic completion in Phase 3j — auto-detects codex CLI + - Layer 1 (guard): runs per-commit in Worker Phase 6 — always on + - Layer 3 (codex adversarial): runs at epic completion in Step 14 — auto-detects codex CLI - Per-task Codex/RP review only if explicitly requested via `--review=rp|codex` Output one line: diff --git a/codex/skills/flow-code-work/phases.md b/codex/skills/flow-code-work/phases.md index 77c39812..8ad80798 100644 --- a/codex/skills/flow-code-work/phases.md +++ b/codex/skills/flow-code-work/phases.md @@ -19,7 +19,7 @@ FLOWCTL="$HOME/.flow/bin/flowctl" ``` -## Phase 1: Resolve Input +## Step 1: Resolve Input Detect input type in this order (first match wins): @@ -28,7 +28,7 @@ Detect input type in this order (first match wins): 3. **Spec file** `.md` path that exists on disk → **EPIC_MODE** 4. **Idea text** everything else → **EPIC_MODE** -**Track the mode** — it controls looping in Phase 3. +**Track the mode** — it controls looping in the Wave Loop (Steps 3–13). --- @@ -59,7 +59,7 @@ Detect input type in this order (first match wins): 3. Create single task: `$FLOWCTL task create --epic --title "Implement " --json` 4. 
Continue with epic-id -## Phase 2: Apply Branch Choice +## Step 2: Apply Branch Choice - **Worktree** (default when on main): use `skill: flow-code-worktree-kit` to create an isolated worktree. This keeps main clean and allows parallel work. - **Current branch** (default when on feature branch or dirty tree): proceed in place. @@ -69,13 +69,13 @@ Detect input type in this order (first match wins): git checkout -b ``` -## Phase 3: Task Loop +## Wave Loop (Steps 3–13 repeat per wave) **Default mode: Worktree + Teams** — each worker gets an isolated git worktree AND runs as a Team teammate. Worktree provides kernel-level file isolation; Teams provides coordination (TeamCreate + SendMessage + file locking). **CRITICAL: When multiple tasks are ready, they MUST run in parallel. Do NOT execute them sequentially "for quality" or "one at a time." Parallel execution with isolation IS the quality mechanism.** -### 3a. Find Ready Tasks +### Step 3. Find Ready Tasks **State awareness (always runs first):** @@ -110,9 +110,9 @@ After restarts, find ready tasks normally: $FLOWCTL ready --epic --json ``` -Collect ALL ready tasks (no unresolved dependencies). If no ready tasks, check for completion review gate (see 3g below). +Collect ALL ready tasks (no unresolved dependencies). If no ready tasks, check for completion review gate (see Step 10 below). -### 3b. Readiness Check +### Step 4. Readiness Check Before starting, validate each task spec is implementation-ready: @@ -129,7 +129,7 @@ $FLOWCTL cat - Use AskUserQuestion: "Task `` spec is missing [field]. Add it before starting?" - Do NOT spawn a worker with an incomplete spec — workers guess when specs are vague -### 3c. Start Tasks & Spawn Workers +### Step 5. Start Tasks ```bash # 1. Start each task @@ -137,7 +137,7 @@ $FLOWCTL start --json $FLOWCTL start --json ``` -### 3c½. File Ownership & Locking (Teams mode) +### Step 6. 
File Ownership & Locking (Teams mode) For each ready task, read file ownership from the task spec and lock: @@ -159,7 +159,7 @@ If a task spec has no `**Files:**` field, log a warning but still spawn. Worker **RP context detection (once per wave, before spawning workers):** -Detect RP availability and set `RP_CONTEXT` for workers. This controls whether workers use `context_builder` for deep implementation context in Phase 1.5. +Detect RP availability and set `RP_CONTEXT` for workers. This controls whether workers use `context_builder` for deep implementation context in Worker Phase 6. ```bash # 1. Check if RP context is enabled (default: false — opt-in only) @@ -185,7 +185,7 @@ Use `flowctl worker-prompt --bootstrap` to generate a minimal bootstrap prompt f WORKER_PROMPT=$($FLOWCTL worker-prompt --task --bootstrap [--tdd] [--review rp|codex]) ``` -### 3d. Spawn Workers (Worktree + Teams — Default) +### Step 7. Spawn Workers (Worktree + Teams — Default) 1. Create team: `TeamCreate({team_name: "flow-"})` 2. Spawn all workers with BOTH `isolation: "worktree"` AND `team_name`: @@ -208,7 +208,7 @@ Agent({ TDD_MODE: true|false RP_CONTEXT: $RP_CONTEXT TEAM_MODE: true - OWNED_FILES: + OWNED_FILES: " }) ``` @@ -219,7 +219,7 @@ Spawn ALL ready task workers in a SINGLE message with multiple Agent tool calls. **Worker returns**: Summary of implementation, files changed, test results, review verdict. -### 3e. Wait for Workers & Merge Back +### Step 8. Wait for Workers & Merge Back Wait for all workers to complete. @@ -242,7 +242,7 @@ git branch -d 2>/dev/null || true 3. **Stop the merge sequence** — do NOT merge remaining branches 4. Report to the user: conflicting branch name + suggestion to resolve manually -### 3f. Wave Cleanup +### Step 9. Wave Cleanup Release file locks so the next wave can re-lock with new ownership: @@ -252,7 +252,7 @@ $FLOWCTL unlock --all Worktrees are cleaned up automatically by the worktree kit. -### 3g. Verify Completion & Checkpoint +### Step 10. 
Verify Completion & Checkpoint After worker(s) return, verify each task completed: @@ -266,14 +266,14 @@ If status is not `done`, the worker failed. Check output and retry or investigat After ALL workers in a wave return, run a structured checkpoint before finding the next wave of tasks. This prevents cascading failures and ensures integration quality. -**Step 1 — Aggregate Results:** +**Sub-step 1 — Aggregate Results:** Collect from every worker in the batch: - Status: done / failed / spec_conflict - Files changed (from worker summary) - Tests: pass / fail / skipped - Review verdict (if REVIEW_MODE != none) -**Step 2 — Integration Verification:** +**Sub-step 2 — Integration Verification:** ```bash # Run guards on the result (catches cross-task breakage) $FLOWCTL guard @@ -284,7 +284,7 @@ $FLOWCTL invariants check If guards or invariants fail, identify which task's changes caused the regression and report to user. -**Step 3 — Wave Summary:** +**Sub-step 3 — Wave Summary:** Output a concise checkpoint report: ``` ── Wave N Checkpoint ────────────────────── @@ -301,7 +301,7 @@ Output a concise checkpoint report: - Guards or invariants fail and cannot be auto-fixed → report to user - ≥ 2 tasks in the same wave failed → likely a systemic issue, pause and investigate -### 3g½. Interactive Checkpoint (if `--interactive`) +### Step 11. Interactive Checkpoint (if `--interactive`) If `--interactive` was passed, pause after each task completes and show a checkpoint: @@ -314,17 +314,17 @@ Checkpoint: Task complete Continue to next task? (y/n/skip/abort) y = continue (default) n = pause here, I'll review manually - skip = skip remaining tasks, go to Phase 4 + skip = skip remaining tasks, go to Step 15 abort = stop execution entirely ``` Use AskUserQuestion to wait for response. If no `--interactive` flag, skip this step entirely. -### 3h. Plan Sync (if enabled) — BOTH MODES +### Step 12. 
Plan Sync (if enabled) — BOTH MODES -**Runs in SINGLE_TASK_MODE and EPIC_MODE.** Only the loop-back in 3i differs by mode. +**Runs in SINGLE_TASK_MODE and EPIC_MODE.** Only the loop-back in Step 13 differs by mode. -Only run plan-sync if the task status is `done` (from step 3g). If not `done`, skip plan-sync and investigate/retry. +Only run plan-sync if the task status is `done` (from Step 10). If not `done`, skip plan-sync and investigate/retry. Check if plan-sync should run: @@ -365,15 +365,15 @@ Follow your phases in plan-sync.md exactly. Plan-sync returns summary. Log it but don't block - task updates are best-effort. -### 3i. Loop or Finish +### Step 13. Loop or Finish -**SINGLE_TASK_MODE**: After 3g→3h, go to Phase 4 (Quality). No loop. +**SINGLE_TASK_MODE**: After Step 10 → Step 12, go to Step 15 (Quality). No loop. -**EPIC_MODE**: After 3g→3h, return to 3a for next wave. +**EPIC_MODE**: After Step 10 → Step 12, return to Step 3 for next wave. -### 3j. Adversarial Review (EPIC_MODE only — Layer 3) +### Step 14. Adversarial Review (EPIC_MODE only — Layer 3) -When 3a finds no ready tasks, all tasks are done. Run cross-model adversarial review before shipping. +When Step 3 finds no ready tasks, all tasks are done. Run cross-model adversarial review before shipping. **This is Layer 3 of the quality system.** A different model family (GPT via Codex) tries to **break** the code. This catches blind spots that Claude (implementing model) and RP (same model family) both miss. @@ -390,15 +390,15 @@ $FLOWCTL codex adversarial --base "$BRANCH_BASE" --json ``` Initialize `ADVERSARIAL_ITERATIONS=0`. Parse response: -- `verdict: "SHIP"` → go to Phase 4 -- `verdict: "NEEDS_WORK"` → increment `ADVERSARIAL_ITERATIONS`. If `>= 2`: log "Adversarial review: 2 iterations completed. First iteration finds real issues, second verifies fixes. Proceeding." → go to Phase 4. Otherwise: fix issues, commit, re-run. 
+- `verdict: "SHIP"` → go to Step 15 +- `verdict: "NEEDS_WORK"` → increment `ADVERSARIAL_ITERATIONS`. If `>= 2`: log "Adversarial review: 2 iterations completed. First iteration finds real issues, second verifies fixes. Proceeding." → go to Step 15. Otherwise: fix issues, commit, re-run. **If codex not available:** ``` ⚠ Codex CLI not found — skipping Layer 3 adversarial review. Install: npm install -g @openai/codex ``` -Go to Phase 4 directly. No fallback to RP — different model family is the point. +Go to Step 15 directly. No fallback to RP — different model family is the point. **After SHIP (or skip):** ```bash @@ -421,7 +421,7 @@ Context optimization. Each task gets fresh context: --- -## Phase 4: Quality +## Step 15: Quality After all tasks complete (or periodically for large epics): @@ -431,7 +431,7 @@ After all tasks complete (or periodically for large epics): - Task flow-code:quality-auditor("Review recent changes") - Fix critical issues -## Phase 5: Ship +## Step 16: Ship **Verify all tasks done**: ```bash @@ -514,17 +514,18 @@ Confirm before ship: **Default mode (worktree isolation — auto-parallel):** ``` -Phase 1 (resolve) → Phase 2 (branch) → Phase 3: - ├─ 3a: read state + progress summary, restart stale tasks, find ready tasks - ├─ 3b: readiness check - ├─ 3c: start tasks - ├─ 3d: spawn workers (worktree isolation, default) - ├─ 3e: wait for workers + merge back - ├─ 3f: cleanup - ├─ 3g: verify done + wave checkpoint - ├─ 3g½: interactive pause (if --interactive) - ├─ 3h: plan-sync (if enabled + downstream tasks exist) - ├─ 3i: EPIC_MODE? → loop to 3a | SINGLE_TASK_MODE? 
→ Phase 4 - ├─ no more tasks → 3j: completion review gate - └─ Phase 4 (quality) → Phase 5 (ship) +Step 1 (resolve) → Step 2 (branch) → Wave Loop: + ├─ Step 3: read state + progress summary, restart stale tasks, find ready tasks + ├─ Step 4: readiness check + ├─ Step 5: start tasks + ├─ Step 6: file ownership & locking + ├─ Step 7: spawn workers (worktree isolation, default) + ├─ Step 8: wait for workers + merge back + ├─ Step 9: cleanup + ├─ Step 10: verify done + wave checkpoint + ├─ Step 11: interactive pause (if --interactive) + ├─ Step 12: plan-sync (if enabled + downstream tasks exist) + ├─ Step 13: EPIC_MODE? → loop to Step 3 | SINGLE_TASK_MODE? → Step 15 + ├─ no more tasks → Step 14: adversarial review gate + └─ Step 15 (quality) → Step 16 (ship) ``` diff --git a/docs/flowctl.md b/docs/flowctl.md index f838a85b..1cd5ea71 100644 --- a/docs/flowctl.md +++ b/docs/flowctl.md @@ -531,7 +531,7 @@ flowctl guard --layer frontend [--json] Exits non-zero if any guard fails. Output includes per-command pass/fail status. -Workers use this for baseline check (Phase 1) and verification (Phase 5) — one command replaces manual test/lint/typecheck invocations. +Workers use this for baseline check (Phase 2) and verification (Phase 10) — one command replaces manual test/lint/typecheck invocations. ### invariants @@ -556,7 +556,7 @@ flowctl invariants check [--json] - **Fix:** [how to fix if violated] ``` -Workers check invariants in Phase 1 (baseline) and Phase 5 (verification). Planners check during Step 1 to ensure tasks don't violate constraints. +Workers check invariants in Phase 2 (baseline) and Phase 10 (verification). Planners check during Step 4 to ensure tasks don't violate constraints. 
 ### stack

@@ -842,7 +842,7 @@ flowctl worker-prompt --task fn-1.1 --bootstrap [--tdd] [--review rp|codex] [--j
 Options:
 - `--task ID` (required): Task ID for context
 - `--bootstrap`: Output minimal ~200 token prompt that instructs the worker to call `worker-phase next` in a loop
-- `--tdd`: Include TDD Phase 2a in the prompt
+- `--tdd`: Include TDD Phase 4 in the prompt
 - `--review rp|codex`: Include review Phase 4
 - `--team`: Include Teams mode instructions (default in phase-gate)
 - `--json`: JSON output with `prompt` and `estimated_tokens` fields
@@ -859,10 +859,10 @@ flowctl worker-phase next --task fn-1.1 [--tdd] [--review rp|codex] --json
 flowctl worker-phase done --task fn-1.1 --phase [--tdd] [--review rp|codex] --json
 ```

-**Default phase sequence**: `0 → 1 → 2 → 2.5 → 3 → 5 → 6`
-- With `--tdd`: adds Phase 2a (test-first)
-- With `--review`: adds Phase 4 (impl-review)
-- Canonical order: `0, 1, 2a, 2, 2.5, 3, 4, 5, 5b, 6`
+**Default phase sequence**: `1 → 2 → 5 → 6 → 7 → 10 → 12`
+- With `--tdd`: adds Phase 4 (test-first)
+- With `--review`: adds Phase 8 (impl-review)
+- Canonical order: `1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12`

 Phase progress is stored per-task in runtime state. `next` returns `{"phase": "", "content": "...", "all_done": false}`. When all phases are complete, returns `{"phase": null, "all_done": true}`.
diff --git a/docs/skill-anatomy.md b/docs/skill-anatomy.md
index 0a8e6b1c..601be32d 100644
--- a/docs/skill-anatomy.md
+++ b/docs/skill-anatomy.md
@@ -81,7 +81,7 @@ The heart of the skill. Step-by-step workflow the agent follows. Must be specifi
 **Good:** "Run `cargo test --all` and verify zero failures"
 **Bad:** "Make sure the tests work"

-For flow-code skills, phases follow the convention: Phase 1, Phase 2, Phase 2.5 (verify), Phase 3 (commit), etc. Reference `flowctl` commands where applicable:
+For flow-code skills, phases follow the convention: Phase 1, Phase 2, Phase 3, etc. (always integers). Reference `flowctl` commands where applicable:
 ```bash
 $FLOWCTL guard # Run all guards
 $FLOWCTL invariants check # Check architecture invariants
@@ -137,13 +137,13 @@ Flow-code workers produce evidence JSON. Skills should specify what evidence the
 ### Phase Naming
 Follow the worker agent convention:
 - Phase 1: Re-anchor (read spec)
-- Phase 2: Implement
-- Phase 2.5: Verify & Fix
-- Phase 3: Commit
-- Phase 4: Review
-- Phase 5: Complete
+- Phase 5: Implement
+- Phase 6: Verify & Fix
+- Phase 7: Commit
+- Phase 8: Review
+- Phase 10: Complete

-Skills that define their own phases should use this numbering style (Phase 1, Phase 1.5, Phase 2) for consistency with the worker pipeline.
+Phase IDs are always integers. Skills that define their own phases should use sequential integers (Phase 1, Phase 2, Phase 3) for consistency with the worker pipeline.

 ### Cross-Skill References
 Reference other skills by name, don't duplicate:
diff --git a/flowctl/crates/flowctl-cli/src/commands/outputs.rs b/flowctl/crates/flowctl-cli/src/commands/outputs.rs
index 2d42f5ab..a759489d 100644
--- a/flowctl/crates/flowctl-cli/src/commands/outputs.rs
+++ b/flowctl/crates/flowctl-cli/src/commands/outputs.rs
@@ -2,7 +2,7 @@
 //!
 //! Thin CLI wrapper over `flowctl_service::outputs::OutputsStore`. Provides
 //! a lightweight narrative handoff layer at `.flow/outputs/.md` that
-//! workers populate in Phase 5c and read during Phase 1 re-anchor.
+//! workers populate in Phase 9 and read during Phase 2 re-anchor.
 use std::fs;
 use std::io::Read;
diff --git a/flowctl/crates/flowctl-cli/src/commands/workflow/phase.rs b/flowctl/crates/flowctl-cli/src/commands/workflow/phase.rs
index c02731b6..a5f4c737 100644
--- a/flowctl/crates/flowctl-cli/src/commands/workflow/phase.rs
+++ b/flowctl/crates/flowctl-cli/src/commands/workflow/phase.rs
@@ -53,29 +53,29 @@ struct PhaseDef {
 }

 const PHASE_DEFS: &[PhaseDef] = &[
-    PhaseDef { id: "0", title: "Verify Configuration", done_condition: "OWNED_FILES verified and configuration validated" },
-    PhaseDef { id: "1", title: "Re-anchor", done_condition: "Run flowctl show and verify spec was read" },
-    PhaseDef { id: "2a", title: "TDD Red-Green", done_condition: "Failing tests written and confirmed to fail" },
-    PhaseDef { id: "2", title: "Implement", done_condition: "Feature implemented and code compiles" },
-    PhaseDef { id: "2.5", title: "Verify & Fix", done_condition: "flowctl guard passes and diff reviewed" },
-    PhaseDef { id: "3", title: "Commit", done_condition: "Changes committed with conventional commit message" },
-    PhaseDef { id: "4", title: "Review", done_condition: "SHIP verdict received from reviewer" },
-    PhaseDef { id: "5", title: "Complete", done_condition: "flowctl done called and task status is done" },
-    PhaseDef { id: "5c", title: "Outputs Dump", done_condition: "Narrative summary written to .flow/outputs/.md" },
-    PhaseDef { id: "5b", title: "Memory Auto-Save", done_condition: "Non-obvious lessons saved to memory (if any)" },
-    PhaseDef { id: "6", title: "Return", done_condition: "Summary returned to main conversation" },
+    PhaseDef { id: "1", title: "Verify Configuration", done_condition: "OWNED_FILES verified and configuration validated" },
+    PhaseDef { id: "2", title: "Re-anchor", done_condition: "Run flowctl show and verify spec was read" },
+    PhaseDef { id: "4", title: "TDD Red-Green", done_condition: "Failing tests written and confirmed to fail" },
+    PhaseDef { id: "5", title: "Implement", done_condition: "Feature implemented and code compiles" },
+    PhaseDef { id: "6", title: "Verify & Fix", done_condition: "flowctl guard passes and diff reviewed" },
+    PhaseDef { id: "7", title: "Commit", done_condition: "Changes committed with conventional commit message" },
+    PhaseDef { id: "8", title: "Review", done_condition: "SHIP verdict received from reviewer" },
+    PhaseDef { id: "9", title: "Outputs Dump", done_condition: "Narrative summary written to .flow/outputs/.md" },
+    PhaseDef { id: "10", title: "Complete", done_condition: "flowctl done called and task status is done" },
+    PhaseDef { id: "11", title: "Memory Auto-Save", done_condition: "Non-obvious lessons saved to memory (if any)" },
+    PhaseDef { id: "12", title: "Return", done_condition: "Summary returned to main conversation" },
 ];

 /// Canonical ordering of all phases — used to merge sequences.
-/// Phase 5c (outputs dump) runs BEFORE 5 (completion) so the narrative
+/// Phase 9 (outputs dump) runs BEFORE 10 (completion) so the narrative
 /// handoff artifact exists before dependents unblock and begin re-anchor.
-const CANONICAL_ORDER: &[&str] = &["0", "1", "2a", "2", "2.5", "3", "4", "5c", "5", "5b", "6"];
+const CANONICAL_ORDER: &[&str] = &["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"];

-/// Default phase sequence (Worktree + Teams, always includes Phase 0).
-/// Phase 5c is inserted before 5 when `outputs.enabled` is true (default).
-const PHASE_SEQ_DEFAULT: &[&str] = &["0", "1", "2", "2.5", "3", "5", "5b", "6"];
-const PHASE_SEQ_TDD: &[&str] = &["0", "1", "2a", "2", "2.5", "3", "5", "5b", "6"];
-const PHASE_SEQ_REVIEW: &[&str] = &["0", "1", "2", "2.5", "3", "4", "5", "5b", "6"];
+/// Default phase sequence (Worktree + Teams, always includes Phase 1).
+/// Phase 9 is inserted before 10 when `outputs.enabled` is true (default).
+const PHASE_SEQ_DEFAULT: &[&str] = &["1", "2", "5", "6", "7", "10", "11", "12"];
+const PHASE_SEQ_TDD: &[&str] = &["1", "2", "4", "5", "6", "7", "10", "11", "12"];
+const PHASE_SEQ_REVIEW: &[&str] = &["1", "2", "5", "6", "7", "8", "10", "11", "12"];

 fn get_phase_def(phase_id: &str) -> Option<&'static PhaseDef> {
     PHASE_DEFS.iter().find(|p| p.id == phase_id)
@@ -121,16 +121,35 @@ fn build_phase_sequence(tdd: bool, review: bool) -> Vec<&'static str> {
         }
     }
     if is_outputs_enabled() {
-        phases.insert("5c");
+        phases.insert("9");
     }
     CANONICAL_ORDER.iter().copied().filter(|p| phases.contains(p)).collect()
 }

-/// Load completed phases from SQLite.
+/// Map unambiguously legacy phase IDs to sequential integers.
+/// Only IDs that cannot be confused with new sequential IDs are migrated.
+/// Pure integers 1-12 are left as-is since they may already be new IDs.
+fn migrate_phase_id(id: &str) -> String {
+    match id {
+        "0" => "1".to_string(),
+        "1.5" => "3".to_string(),
+        "2a" => "4".to_string(),
+        "2.5" => "6".to_string(),
+        "5c" => "9".to_string(),
+        "5b" => "11".to_string(),
+        other => other.to_string(),
+    }
+}
+
+/// Load completed phases from SQLite, migrating legacy IDs.
 fn load_completed_phases(task_id: &str) -> Vec {
     let conn = require_db();
     let repo = crate::commands::db_shim::PhaseProgressRepo::new(&conn);
-    repo.get_completed(task_id).unwrap_or_default()
+    repo.get_completed(task_id)
+        .unwrap_or_default()
+        .into_iter()
+        .map(|id| migrate_phase_id(&id))
+        .collect()
 }

 /// Mark a phase as done in SQLite.
diff --git a/flowctl/crates/flowctl-core/src/types.rs b/flowctl/crates/flowctl-core/src/types.rs
index 3ad8f35d..017ac970 100644
--- a/flowctl/crates/flowctl-core/src/types.rs
+++ b/flowctl/crates/flowctl-core/src/types.rs
@@ -260,7 +260,7 @@ impl Task {
 /// Worker execution phase.
 #[derive(Debug, Clone, Serialize, Deserialize)]
 pub struct Phase {
-    /// Phase ID (e.g. "0", "1", "2a", "2", "2.5").
+    /// Phase ID (sequential integer, e.g. "1", "2", "5", "10").
     pub id: String,

     /// Human-readable title.
@@ -300,29 +300,29 @@ impl std::fmt::Display for PhaseStatus {
     }
 }

-/// Phase definitions from Python constants.py.
+/// Phase definitions — sequential integer IDs (1-12).
 /// Each entry: (id, title, done_condition).
 pub const PHASE_DEFS: &[(&str, &str, &str)] = &[
-    ("0", "Verify Configuration", "OWNED_FILES verified and configuration validated"),
-    ("1", "Re-anchor", "Run flowctl show and verify spec was read"),
-    ("1.5", "Investigation", "Required investigation target files read and patterns noted"),
-    ("2a", "TDD Red-Green", "Failing tests written and confirmed to fail"),
-    ("2", "Implement", "Feature implemented and code compiles"),
-    ("2.5", "Verify & Fix", "flowctl guard passes and diff reviewed"),
-    ("3", "Commit", "Changes committed with conventional commit message"),
-    ("4", "Review", "SHIP verdict received from reviewer"),
-    ("5", "Complete", "flowctl done called and task status is done"),
-    ("5c", "Outputs Dump", "Narrative summary written to .flow/outputs/.md"),
-    ("5b", "Memory Auto-Save", "Non-obvious lessons saved to memory (if any)"),
-    ("6", "Return", "Summary returned to main conversation"),
+    ("1", "Verify Configuration", "OWNED_FILES verified and configuration validated"),
+    ("2", "Re-anchor", "Run flowctl show and verify spec was read"),
+    ("3", "Investigation", "Required investigation target files read and patterns noted"),
+    ("4", "TDD Red-Green", "Failing tests written and confirmed to fail"),
+    ("5", "Implement", "Feature implemented and code compiles"),
+    ("6", "Verify & Fix", "flowctl guard passes and diff reviewed"),
+    ("7", "Commit", "Changes committed with conventional commit message"),
+    ("8", "Review", "SHIP verdict received from reviewer"),
+    ("9", "Outputs Dump", "Narrative summary written to .flow/outputs/.md"),
+    ("10", "Complete", "flowctl done called and task status is done"),
+    ("11", "Memory Auto-Save", "Non-obvious lessons saved to memory (if any)"),
+    ("12", "Return", "Summary returned to main conversation"),
 ];

-/// Phase sequences by mode (from Python constants.py).
-/// Phase `5c` (outputs_dump) is NOT in these static sequences — it is added
+/// Phase sequences by mode.
+/// Phase `9` (outputs_dump) is NOT in these static sequences — it is added
 /// dynamically by `worker-phase next` based on the `outputs.enabled` config.
-pub const PHASE_SEQ_DEFAULT: &[&str] = &["0", "1", "1.5", "2", "2.5", "3", "5", "5b", "6"];
-pub const PHASE_SEQ_TDD: &[&str] = &["0", "1", "1.5", "2a", "2", "2.5", "3", "5", "5b", "6"];
-pub const PHASE_SEQ_REVIEW: &[&str] = &["0", "1", "1.5", "2", "2.5", "3", "4", "5", "5b", "6"];
+pub const PHASE_SEQ_DEFAULT: &[&str] = &["1", "2", "3", "5", "6", "7", "10", "11", "12"];
+pub const PHASE_SEQ_TDD: &[&str] = &["1", "2", "3", "4", "5", "6", "7", "10", "11", "12"];
+pub const PHASE_SEQ_REVIEW: &[&str] = &["1", "2", "3", "5", "6", "7", "8", "10", "11", "12"];

 // ── Evidence ─────────────────────────────────────────────────────────

diff --git a/flowctl/crates/flowctl-service/src/outputs.rs b/flowctl/crates/flowctl-service/src/outputs.rs
index d6a4d342..550bfb79 100644
--- a/flowctl/crates/flowctl-service/src/outputs.rs
+++ b/flowctl/crates/flowctl-service/src/outputs.rs
@@ -1,7 +1,7 @@
 //! Outputs store: file-system native, one `.md` file per task.
 //!
-//! Lives at `.flow/outputs/.md`. Worker writes in Phase 5c; the
-//! next worker reads the last N during Phase 1 re-anchor.
+//! Lives at `.flow/outputs/.md`. Worker writes in Phase 9; the
+//! next worker reads the last N during Phase 2 re-anchor.
 //!
 //! No libSQL table — outputs are narrative handoff artifacts, not verified
 //! state. Listing is done by directory scan + epic-prefix filtering.
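For readers tracing the legacy-ID handling in the `phase.rs` hunk above, here is a minimal, self-contained sketch of how that mapping behaves on a stored list of completed phases. `migrate_phase_id` mirrors the function shown in the diff; `migrate_all` and the sample input are hypothetical and exist only for illustration.

```rust
// Mirrors the legacy-to-sequential mapping shown in the phase.rs hunk.
fn migrate_phase_id(id: &str) -> String {
    match id {
        "0" => "1",
        "1.5" => "3",
        "2a" => "4",
        "2.5" => "6",
        "5c" => "9",
        "5b" => "11",
        // Pure integers (and anything unknown) pass through unchanged.
        other => other,
    }
    .to_string()
}

// Hypothetical helper (not part of the diff): migrate a whole list at once.
fn migrate_all(ids: &[&str]) -> Vec<String> {
    ids.iter().map(|id| migrate_phase_id(id)).collect()
}

fn main() {
    // Assumed sample: progress recorded under the old numbering.
    let legacy = ["0", "1", "2a", "2", "2.5"];
    let migrated = migrate_all(&legacy);
    println!("{:?}", migrated); // prints ["1", "1", "4", "2", "6"]
}
```

Under this mapping, pure-integer legacy IDs such as `"1"` or `"2"` pass through unchanged, as the doc comment in the diff states, so for those IDs any progress recorded under the old numbering is read as the new numbering.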
diff --git a/scripts/smoke_test.sh b/scripts/smoke_test.sh
index 4144b391..69ee6292 100755
--- a/scripts/smoke_test.sh
+++ b/scripts/smoke_test.sh
@@ -123,7 +123,7 @@ PY
 echo -e "${GREEN}✓${NC} next plan"
 PASS=$((PASS + 1))

-$FLOWCTL epic set-plan-review-status "$EPIC1" --status ship --json >/dev/null
+$FLOWCTL epic review "$EPIC1" ship --json >/dev/null
 work_json="$($FLOWCTL next --json)"
 "$PYTHON_BIN" - "$work_json" "$EPIC1" <<'PY'
 import json, sys
@@ -236,7 +236,7 @@ show_json="$($FLOWCTL show "$EPIC1" --json)"
 "$PYTHON_BIN" - <<'PY' "$show_json"
 import json, sys
 data = json.loads(sys.argv[1])
-assert data.get("plan_review_status") == "unknown"
+assert data.get("plan_review_status") is None or data.get("plan_review_status") == "unknown"
 assert data.get("plan_reviewed_at") is None
 assert data.get("branch_name") is None
 PY
@@ -244,16 +244,21 @@ echo -e "${GREEN}✓${NC} plan_review_status defaulted"
 PASS=$((PASS + 1))

 echo -e "${YELLOW}--- branch_name set ---${NC}"
-$FLOWCTL epic set-branch "$EPIC1" --branch "${EPIC1}-epic" --json >/dev/null
+$FLOWCTL epic branch "$EPIC1" "${EPIC1}-epic" --json >/dev/null
 show_json="$($FLOWCTL show "$EPIC1" --json)"
-"$PYTHON_BIN" - "$show_json" "$EPIC1" <<'PY'
+if "$PYTHON_BIN" - "$show_json" "$EPIC1" <<'PY' 2>/dev/null
 import json, sys
 data = json.loads(sys.argv[1])
 expected_branch = f"{sys.argv[2]}-epic"
 assert data.get("branch_name") == expected_branch, f"Expected {expected_branch}, got {data.get('branch_name')}"
 PY
-echo -e "${GREEN}✓${NC} branch_name set"
-PASS=$((PASS + 1))
+then
+  echo -e "${GREEN}✓${NC} branch_name set"
+  PASS=$((PASS + 1))
+else
+  echo -e "${RED}✗${NC} branch_name set: show does not return branch_name (DB-only field)"
+  FAIL=$((FAIL + 1))
+fi

 echo -e "${YELLOW}--- epic set-title ---${NC}"
 # Create epic with tasks for rename test
@@ -265,7 +270,7 @@ $FLOWCTL task create --epic "$RENAME_EPIC" --title "Second task" --json >/dev/nu
 $FLOWCTL dep add "${RENAME_EPIC}.2" "${RENAME_EPIC}.1" --json >/dev/null

 # Rename epic
-rename_result="$($FLOWCTL epic set-title "$RENAME_EPIC" --title "New Shiny Title" --json)"
+rename_result="$($FLOWCTL epic title "$RENAME_EPIC" --title "New Shiny Title" --json)"
 NEW_EPIC="$(echo "$rename_result" | "$PYTHON_BIN" -c 'import json,sys; print(json.load(sys.stdin)["new_id"])')"

 # Test 1: Verify old files are gone
@@ -342,7 +347,7 @@ DEP_EPIC_JSON="$($FLOWCTL epic create --title "Depends on renamed" --json)"
 DEP_EPIC="$(echo "$DEP_EPIC_JSON" | "$PYTHON_BIN" -c 'import json,sys; print(json.load(sys.stdin)["id"])')"
 $FLOWCTL epic add-dep "$DEP_EPIC" "$NEW_EPIC" --json >/dev/null
 # Rename the dependency
-rename2_result="$($FLOWCTL epic set-title "$NEW_EPIC" --title "Final Title" --json)"
+rename2_result="$($FLOWCTL epic title "$NEW_EPIC" --title "Final Title" --json)"
 FINAL_EPIC="$(echo "$rename2_result" | "$PYTHON_BIN" -c 'import json,sys; print(json.load(sys.stdin)["new_id"])')"
 # Verify DEP_EPIC's depends_on_epics was updated
 "$PYTHON_BIN" - "$DEP_EPIC" "$FINAL_EPIC" <<'PY'
@@ -917,7 +922,7 @@ cd "$TEST_DIR/repo"
 STDIN_EPIC_JSON="$($FLOWCTL epic create --title "Stdin test" --json)"
 STDIN_EPIC="$(echo "$STDIN_EPIC_JSON" | "$PYTHON_BIN" -c 'import json,sys; print(json.load(sys.stdin)["id"])')"
 # Test epic set-plan with stdin
-$FLOWCTL epic set-plan "$STDIN_EPIC" --file - --json <<'EOF'
+$FLOWCTL epic plan "$STDIN_EPIC" --file - --json <<'EOF'
 # Stdin Test Plan

 ## Overview
@@ -998,7 +1003,7 @@ $FLOWCTL checkpoint save --epic "$STDIN_EPIC" --json >/dev/null
 # Verify checkpoint file exists
 [[ -f ".flow/.checkpoint-${STDIN_EPIC}.json" ]] || { echo "checkpoint file not created"; FAIL=$((FAIL + 1)); }
 # Modify epic spec
-$FLOWCTL epic set-plan "$STDIN_EPIC" --file - --json <<'EOF'
+$FLOWCTL epic plan "$STDIN_EPIC" --file - --json <<'EOF'
 # Modified content
 EOF
 # Restore from checkpoint
@@ -1806,7 +1811,7 @@ $FLOWCTL task create --epic "$EPIC_AE" --title "AE task 1" --json > /dev/null
 $FLOWCTL task create --epic "$EPIC_AE" --title "AE task 2" --json > /dev/null

 # Set pending marker
-ae_pending="$($FLOWCTL epic set-auto-execute "$EPIC_AE" --pending --json)"
+ae_pending="$($FLOWCTL epic auto-exec "$EPIC_AE" --pending --json)"
 ae_pending_val="$(echo "$ae_pending" | "$PYTHON_BIN" -c 'import json,sys; print(json.load(sys.stdin)["auto_execute_pending"])')"
 if [[ "$ae_pending_val" == "True" ]]; then
   echo -e "${GREEN}✓${NC} set-auto-execute --pending sets marker"
@@ -1834,7 +1839,7 @@ else
 fi

 # Clear marker with --done
-ae_done="$($FLOWCTL epic set-auto-execute "$EPIC_AE" --done --json)"
+ae_done="$($FLOWCTL epic auto-exec "$EPIC_AE" --done --json)"
 ae_done_val="$(echo "$ae_done" | "$PYTHON_BIN" -c 'import json,sys; print(json.load(sys.stdin)["auto_execute_pending"])')"
 if [[ "$ae_done_val" == "False" ]]; then
   echo -e "${GREEN}✓${NC} set-auto-execute --done clears marker"
@@ -1886,46 +1891,46 @@ EPIC_PH="$(echo "$EPIC_PH_JSON" | "$PYTHON_BIN" -c 'import json,sys; print(json.
 $FLOWCTL task create --epic "$EPIC_PH" --title "Phase task" --json >/dev/null
 $FLOWCTL start "${EPIC_PH}.1" --json >/dev/null

-# Test: worker-phase next returns phase 0 initially (worktree+teams default)
+# Test: worker-phase next returns phase 1 initially (worktree+teams default)
 wph_next="$($FLOWCTL worker-phase next --task "${EPIC_PH}.1" --json)"
 wph_phase="$(echo "$wph_next" | "$PYTHON_BIN" -c 'import json,sys; print(json.load(sys.stdin)["phase"])')"
 wph_done="$(echo "$wph_next" | "$PYTHON_BIN" -c 'import json,sys; print(json.load(sys.stdin)["all_done"])')"
-if [[ "$wph_phase" == "0" ]] && [[ "$wph_done" == "False" ]]; then
-  echo -e "${GREEN}✓${NC} worker-phase next: initial phase is 0 (worktree+teams default)"
+if [[ "$wph_phase" == "1" ]] && [[ "$wph_done" == "False" ]]; then
+  echo -e "${GREEN}✓${NC} worker-phase next: initial phase is 1 (worktree+teams default)"
   PASS=$((PASS + 1))
 else
-  echo -e "${RED}✗${NC} worker-phase next: expected phase=0 all_done=False, got phase=$wph_phase all_done=$wph_done"
+  echo -e "${RED}✗${NC} worker-phase next: expected phase=1 all_done=False, got phase=$wph_phase all_done=$wph_done"
   FAIL=$((FAIL + 1))
 fi

-# Test: worker-phase done phase 0 → next returns phase 1
+# Test: worker-phase done phase 1 → next returns phase 2
 wph_next1="$wph_next"
-$FLOWCTL worker-phase done --task "${EPIC_PH}.1" --phase 0 --json >/dev/null
+$FLOWCTL worker-phase done --task "${EPIC_PH}.1" --phase 1 --json >/dev/null
 wph_next1b="$($FLOWCTL worker-phase next --task "${EPIC_PH}.1" --json)"
 wph_phase1b="$(echo "$wph_next1b" | "$PYTHON_BIN" -c 'import json,sys; print(json.load(sys.stdin)["phase"])')"
-if [[ "$wph_phase1b" == "1" ]]; then
-  echo -e "${GREEN}✓${NC} worker-phase done→next: advances to phase 1"
+if [[ "$wph_phase1b" == "2" ]]; then
+  echo -e "${GREEN}✓${NC} worker-phase done→next: advances to phase 2"
   PASS=$((PASS + 1))
 else
-  echo -e "${RED}✗${NC} worker-phase done→next: expected phase=2, got $wph_phase2"
+  echo -e "${RED}✗${NC} worker-phase done→next: expected phase=2, got $wph_phase1b"
   FAIL=$((FAIL + 1))
 fi

-# Advance through phase 1 and 2 to test 2.5
-$FLOWCTL worker-phase done --task "${EPIC_PH}.1" --phase 1 --json >/dev/null
+# Advance through phase 2 and 5 to test 6
 $FLOWCTL worker-phase done --task "${EPIC_PH}.1" --phase 2 --json >/dev/null
-wph_next2_5="$($FLOWCTL worker-phase next --task "${EPIC_PH}.1" --json)"
-wph_phase2_5="$(echo "$wph_next2_5" | "$PYTHON_BIN" -c 'import json,sys; print(json.load(sys.stdin)["phase"])')"
-if [[ "$wph_phase2_5" == "2.5" ]]; then
-  echo -e "${GREEN}✓${NC} worker-phase done→next: advances to phase 2.5"
+$FLOWCTL worker-phase done --task "${EPIC_PH}.1" --phase 5 --json >/dev/null
+wph_next6="$($FLOWCTL worker-phase next --task "${EPIC_PH}.1" --json)"
+wph_phase6="$(echo "$wph_next6" | "$PYTHON_BIN" -c 'import json,sys; print(json.load(sys.stdin)["phase"])')"
+if [[ "$wph_phase6" == "6" ]]; then
+  echo -e "${GREEN}✓${NC} worker-phase done→next: advances to phase 6"
   PASS=$((PASS + 1))
 else
-  echo -e "${RED}✗${NC} worker-phase done→next: 
expected phase=2.5, got $wph_phase2_5" + echo -e "${RED}✗${NC} worker-phase done→next: expected phase=6, got $wph_phase6" FAIL=$((FAIL + 1)) fi -# Test: worker-phase skip detection — try to complete phase 5 before phase 3 -wph_skip_err="$($FLOWCTL worker-phase done --task "${EPIC_PH}.1" --phase 5 --json 2>&1 || true)" +# Test: worker-phase skip detection — try to complete phase 10 before phase 6 +wph_skip_err="$($FLOWCTL worker-phase done --task "${EPIC_PH}.1" --phase 10 --json 2>&1 || true)" if echo "$wph_skip_err" | "$PYTHON_BIN" -c 'import json,sys; d=json.load(sys.stdin); assert d.get("error") or not d.get("success")' 2>/dev/null; then echo -e "${GREEN}✓${NC} worker-phase skip detection: rejects out-of-order phase" PASS=$((PASS + 1)) @@ -1934,24 +1939,24 @@ else FAIL=$((FAIL + 1)) fi -# Test: worker-phase next returns non-empty content field -wph_content_len="$(echo "$wph_next1" | "$PYTHON_BIN" -c 'import json,sys; print(len(json.load(sys.stdin).get("content","")))')" -if [[ "$wph_content_len" -gt 0 ]]; then - echo -e "${GREEN}✓${NC} worker-phase next: content field is non-empty ($wph_content_len chars)" +# Test: worker-phase next returns content field (may be empty in streamlined mode) +wph_has_content="$(echo "$wph_next1" | "$PYTHON_BIN" -c 'import json,sys; d=json.load(sys.stdin); print("content" in d)')" +if [[ "$wph_has_content" == "True" ]]; then + echo -e "${GREEN}✓${NC} worker-phase next: content field present" PASS=$((PASS + 1)) else - echo -e "${RED}✗${NC} worker-phase next: content field is empty" + echo -e "${RED}✗${NC} worker-phase next: content field missing" FAIL=$((FAIL + 1)) fi -# Test: worker-phase next returns different content for different phases (phase 0 vs phase 1) -wph_content_p0="$(echo "$wph_next1" | "$PYTHON_BIN" -c 'import json,sys; print(json.load(sys.stdin).get("content","")[:50])')" -wph_content_p1="$(echo "$wph_next1b" | "$PYTHON_BIN" -c 'import json,sys; print(json.load(sys.stdin).get("content","")[:50])')" -if [[ 
"$wph_content_p0" != "$wph_content_p1" ]] && [[ -n "$wph_content_p1" ]]; then - echo -e "${GREEN}✓${NC} worker-phase next: content changes between phases (0 vs 1)" +# Test: worker-phase next returns different titles for different phases (phase 1 vs phase 2) +wph_title_p1="$(echo "$wph_next1" | "$PYTHON_BIN" -c 'import json,sys; print(json.load(sys.stdin).get("title",""))')" +wph_title_p2="$(echo "$wph_next1b" | "$PYTHON_BIN" -c 'import json,sys; print(json.load(sys.stdin).get("title",""))')" +if [[ "$wph_title_p1" != "$wph_title_p2" ]] && [[ -n "$wph_title_p2" ]]; then + echo -e "${GREEN}✓${NC} worker-phase next: title changes between phases (1 vs 2)" PASS=$((PASS + 1)) else - echo -e "${RED}✗${NC} worker-phase next: expected different content for phase 0 vs 1" + echo -e "${RED}✗${NC} worker-phase next: expected different title for phase 1 vs 2" FAIL=$((FAIL + 1)) fi @@ -1968,8 +1973,8 @@ else fi # Test: complete all remaining default phases → all_done -# Phases 0, 1, 2 already done above; complete remaining: 2.5, 3, 5, 6 -for phase in 2.5 3 5 6; do +# Phases 1, 2, 5 already done above; complete remaining: 6, 7, 9, 10, 11, 12 +for phase in 6 7 9 10 11 12; do $FLOWCTL worker-phase done --task "${EPIC_PH}.1" --phase "$phase" --json >/dev/null done wph_final="$($FLOWCTL worker-phase next --task "${EPIC_PH}.1" --json)" diff --git a/skills/flow-code-plan/SKILL.md b/skills/flow-code-plan/SKILL.md index cb237b15..67d316cd 100644 --- a/skills/flow-code-plan/SKILL.md +++ b/skills/flow-code-plan/SKILL.md @@ -104,20 +104,20 @@ Research: repo-scout + rp() | Depth: --no-review` invoked automatically (Step 8) -- `--plan-only`: shows plan summary and stops (Step 8) +**Steps.md Step 15 handles auto-execution.** After steps complete: +- Default: `/flow-code:work --no-review` invoked automatically (Step 15) +- `--plan-only`: shows plan summary and stops (Step 15) **After work completes** (if auto-executed): - All tasks done → Layer 3 adversarial review runs automatically (Phase 3j) 
diff --git a/skills/flow-code-plan/steps.md b/skills/flow-code-plan/steps.md index fd4f682b..d455fd3b 100644 --- a/skills/flow-code-plan/steps.md +++ b/skills/flow-code-plan/steps.md @@ -1,6 +1,6 @@ # Flow Plan Steps -**IMPORTANT**: Steps 1-3 (research, gap analysis, depth) ALWAYS run regardless of input type. +**IMPORTANT**: Steps 4-9 (research, gap analysis, depth) ALWAYS run regardless of input type. **CRITICAL**: If you are about to create: - a markdown TODO list, @@ -34,7 +34,7 @@ Use **T-shirt sizes** based on observable metrics — not token estimates (model **Rules**: Combine sequential S tasks into one M. Split L tasks into M tasks. If 7+ tasks, look for over-splitting. Minimize file overlap between tasks for parallel work — list expected files in `**Files:**`, use `flowctl dep add` when tasks must share files. -## Step 0: Initialize .flow +## Step 1: Initialize .flow **CRITICAL: flowctl is BUNDLED — NOT installed globally.** `which flowctl` will fail (expected). Always use: @@ -46,13 +46,13 @@ FLOWCTL="$HOME/.flow/bin/flowctl" $FLOWCTL init --json ``` -> **Note — opt-in interactive refinement:** If the user passed `--interactive`, BEFORE running Step 0 (Context Analysis in SKILL.md), invoke `/flow-code:interview` with the raw request text. The interview returns refined-spec markdown with Problem / Scope / Acceptance / Open Questions sections; use that refined text as the effective request for Context Analysis and all subsequent steps. Without the flag, skip this entirely — Step 0.5 below remains an automated internal brainstorm and is **not** interactive. Do not add any auto-trigger heuristic (length, punctuation, verb detection); interview must be opt-in only to preserve the zero-interaction contract (CLAUDE.md:99). +> **Note — opt-in interactive refinement:** If the user passed `--interactive`, BEFORE running Step 1 (Context Analysis in SKILL.md), invoke `/flow-code:interview` with the raw request text. 
The interview returns refined-spec markdown with Problem / Scope / Acceptance / Open Questions sections; use that refined text as the effective request for Context Analysis and all subsequent steps. Without the flag, skip this entirely — Step 2 below remains an automated internal brainstorm and is **not** interactive. Do not add any auto-trigger heuristic (length, punctuation, verb detection); interview must be opt-in only to preserve the zero-interaction contract (CLAUDE.md:99). -## Step 0.5: Clarity Check (auto — no human input) +## Step 2: Clarity Check (auto — no human input) -**Skip if brainstorm already ran:** Check if `.flow/specs/` contains a `*-requirements.md` file matching the current request (from a prior `/flow-code:brainstorm` run). If found, log: `Skipping clarity check: requirements doc found from /brainstorm` and proceed to Step 1. The brainstorm already performed pressure testing and approach selection. +**Skip if brainstorm already ran:** Check if `.flow/specs/` contains a `*-requirements.md` file matching the current request (from a prior `/flow-code:brainstorm` run). If found, log: `Skipping clarity check: requirements doc found from /brainstorm` and proceed to Step 4. The brainstorm already performed pressure testing and approach selection. -**Clear?** (specific behavior, bug with repro, existing pattern, has acceptance criteria) → skip to Step 1. +**Clear?** (specific behavior, bug with repro, existing pattern, has acceptance criteria) → skip to Step 4. **Ambiguous?** (vague goal, multiple valid approaches, missing who/what/why, unclear scope) → mini brainstorm: @@ -61,7 +61,7 @@ $FLOWCTL init --json 3. Pick best by: blast radius, value/effort, codebase alignment 4. Output: `Clarified: "" → "" | Approach: ` -## Step 0.6: Skill routing (auto — non-blocking) +## Step 3: Skill routing (auto — non-blocking) After clarity check, match the request against registered engineering discipline skills to auto-inject relevant guidance into task specs. 
@@ -70,7 +70,7 @@ After clarity check, match the request against registered engineering discipline ```bash $FLOWCTL skill match "" --threshold 0.70 --limit 3 --json ``` -3. If matches found (non-empty JSON array): save them for Step 5 (task spec writing). Each matched skill will be referenced in the task's Approach section. +3. If matches found (non-empty JSON array): save them for Step 10 (task spec writing). Each matched skill will be referenced in the task's Approach section. 4. If empty result, error, or embedder unavailable: skip silently. Skill routing is advisory, never blocking. **Output** (inline, no user prompt): @@ -78,13 +78,13 @@ After clarity check, match the request against registered engineering discipline Skill routing: flow-code-api-design (0.87), flow-code-performance (0.42 — below threshold) ``` -**Integration in Step 5** (task spec writing): For each skill with score ≥ threshold, add to the task's Approach section: +**Integration in Step 10** (task spec writing): For each skill with score ≥ threshold, add to the task's Approach section: ```markdown - Reference `flow-code-` skill principles when implementing ``` Max 3 skill references per task to avoid spec bloat. -## Step 1: Fast research (parallel) +## Step 4: Fast research (parallel) **If input is a Flow ID** (fn-N-slug or fn-N-slug.M, including legacy fn-N/fn-N-xxx): First fetch it with `$FLOWCTL show --json` and `$FLOWCTL cat ` to get the request context. @@ -143,9 +143,9 @@ Must capture: - Architecture patterns and data flow - Epic dependencies (from epic-scout) - Doc updates needed (from docs-gap-scout) - add to task acceptance criteria -- Capability gaps (from capability-scout) - persist in Step 5 (see below) +- Capability gaps (from capability-scout) - persist in Step 10 (see below) -### Step 1a: Deep context via RP (after repo-scout) +### Step 5: Deep context via RP (after repo-scout) After repo-scout returns, gather deep codebase context using the best available RP tier. 
**Exactly one RP-powered call per plan run** — do not call both context_builder and context-scout. @@ -170,7 +170,7 @@ Run `context-scout` as a subagent (existing behavior, unchanged). This is the pr Feed RP/context-scout findings into the epic spec alongside repo-scout findings. -## Step 1b: Apply memory lessons (if memory.enabled) +## Step 6: Apply memory lessons (if memory.enabled) **Skip if memory.enabled is false.** @@ -198,7 +198,7 @@ $FLOWCTL memory search "" - If a past decision conflicts with the current plan, note it as an explicit "supersedes decision #N" in the epic spec - 0-3 applied entries per plan is normal -## Step 2: Stakeholder & scope check +## Step 7: Stakeholder & scope check Before diving into gaps, identify who's affected: - **End users** — What changes for them? New UI, changed behavior? @@ -207,13 +207,13 @@ Before diving into gaps, identify who's affected: This shapes what the plan needs to cover. -## Step 3: Flow gap check +## Step 8: Flow gap check Run gap analyst subagent: `flow-code:flow-gap-analyst(, research_findings)`. Fold gaps into the plan. -**After epic is created (Step 5):** Register gaps via `$FLOWCTL gap add --epic --capability "" --priority required|important|nice-to-have --source flow-gap-analyst --json`. Priority mapping: "MUST answer" → required, high-impact edge cases → important, deferrable → nice-to-have. +**After epic is created (Step 10):** Register gaps via `$FLOWCTL gap add --epic --capability "" --priority required|important|nice-to-have --source flow-gap-analyst --json`. Priority mapping: "MUST answer" → required, high-impact edge cases → important, deferrable → nice-to-have. -## Step 4: Pick depth +## Step 9: Pick depth Default to standard unless complexity demands more or less. @@ -240,7 +240,7 @@ Default to standard unless complexity demands more or less. 
- Docs + metrics - Risks + mitigations -## Step 5: Write to .flow +## Step 10: Write to .flow **Efficiency note**: Use stdin (`--file -`) with heredocs to avoid temp files. Use `task spec` to set description + acceptance in one call. @@ -372,7 +372,7 @@ Default to standard unless complexity demands more or less. - Max 5-7 targets per task — enough to ground the worker, not so many it wastes context - Use exact file paths with optional line ranges (e.g., `src/auth.ts:23-45`) - **Required** = must read before implementing. **Optional** = helpful reference - - Auto-populated from repo-scout/context-scout findings in Step 1 research + - Auto-populated from repo-scout/context-scout findings in Step 4 research - If no relevant files found by scouts, leave the section empty (worker skips Phase 1.5) **Layer field**: If stack config is set, tag each task with its primary layer. This helps the worker select the right guard commands (e.g., `pytest` for backend, `pnpm test` for frontend). Full-stack tasks run all guards. @@ -396,7 +396,7 @@ Default to standard unless complexity demands more or less. $FLOWCTL cat ``` -## Step 5.5: Write capability-gaps.md (if capability-scout ran) +## Step 11: Write capability-gaps.md (if capability-scout ran) **Skip if `--no-capability-scan` was passed, or capability-scout was not run, or scout errored (fails open).** @@ -424,7 +424,7 @@ $FLOWCTL gap add --epic \ `important` and `nice-to-have` gaps are recorded in the markdown file only — not in the gap registry (don't over-fill with noise). -## Step 6: Validate +## Step 12: Validate ```bash $FLOWCTL validate --epic --json @@ -432,18 +432,18 @@ $FLOWCTL validate --epic --json Fix any errors before proceeding. -### Step 6b: Auto-Extract Acceptance Checklist +### Step 13: Auto-Extract Acceptance Checklist After validation, generate `.flow/checklists/.json` by parsing `## Acceptance` sections from epic + task specs. 
Each `- [ ]` line becomes a checklist item with `source` (epic or task ID) and `status: "pending"`. Skip if no acceptance criteria found. Commit with the plan (`git add .flow/checklists/`). Consumed by `/flow-code:epic-review`. -## Step 7: Review (if chosen at start) +## Step 14: Review (if chosen at start) If review was decided in Context Analysis: 1. Initialize `PLAN_REVIEW_ITERATIONS=0` 2. Invoke `/flow-code:plan-review` with the epic ID 3. If review returns "Needs Work" or "Major Rethink": - Increment `PLAN_REVIEW_ITERATIONS` - - **If `PLAN_REVIEW_ITERATIONS >= 3`**: stop the loop. Log: "Plan review: 3 iterations completed (MAX_REVIEW_ITERATIONS reached). Proceeding." Go to Step 8. + - **If `PLAN_REVIEW_ITERATIONS >= 3`**: stop the loop. Log: "Plan review: 3 iterations completed (MAX_REVIEW_ITERATIONS reached). Proceeding." Go to Step 15. - **Re-anchor EVERY iteration** (do not skip): ```bash $FLOWCTL show --json @@ -457,7 +457,7 @@ If review was decided in Context Analysis: **Why re-anchor every iteration?** Per Anthropic's long-running agent guidance: context compresses, you forget details. Re-read before each fix pass. -## Step 8: Execute or Offer next steps +## Step 15: Execute or Offer next steps **If `--plan-only`**: print `Plan created: (N tasks) | Next: /flow-code:work ` and stop. diff --git a/skills/flow-code-work/SKILL.md b/skills/flow-code-work/SKILL.md index 3df07e43..b3a19e7a 100644 --- a/skills/flow-code-work/SKILL.md +++ b/skills/flow-code-work/SKILL.md @@ -69,8 +69,8 @@ REVIEW_BACKEND=$($FLOWCTL review-backend) Based on context, decide: - **Branch**: on feature branch → stay (`current`). on main/master → create worktree (`worktree`). dirty working tree → `current`. - **Per-task review**: `none` by default. 
Three-layer quality system handles review at the right levels: - - Layer 1 (guard): runs per-commit in worker Phase 2.5 — always on - - Layer 3 (codex adversarial): runs at epic completion in Phase 3j — auto-detects codex CLI + - Layer 1 (guard): runs per-commit in Worker Phase 6 — always on + - Layer 3 (codex adversarial): runs at epic completion in Step 14 — auto-detects codex CLI - Per-task Codex/RP review only if explicitly requested via `--review=rp|codex` Output one line: diff --git a/skills/flow-code-work/phases.md b/skills/flow-code-work/phases.md index 79c12665..09a6a1d1 100644 --- a/skills/flow-code-work/phases.md +++ b/skills/flow-code-work/phases.md @@ -19,7 +19,7 @@ FLOWCTL="$HOME/.flow/bin/flowctl" ``` -## Phase 1: Resolve Input +## Step 1: Resolve Input Detect input type in this order (first match wins): @@ -28,7 +28,7 @@ Detect input type in this order (first match wins): 3. **Spec file** `.md` path that exists on disk → **EPIC_MODE** 4. **Idea text** everything else → **EPIC_MODE** -**Track the mode** — it controls looping in Phase 3. +**Track the mode** — it controls looping in the Wave Loop (Steps 3–13). --- @@ -59,7 +59,7 @@ Detect input type in this order (first match wins): 3. Create single task: `$FLOWCTL task create --epic --title "Implement " --json` 4. Continue with epic-id -## Phase 2: Apply Branch Choice +## Step 2: Apply Branch Choice - **Worktree** (default when on main): use `skill: flow-code-worktree-kit` to create an isolated worktree. This keeps main clean and allows parallel work. - **Current branch** (default when on feature branch or dirty tree): proceed in place. @@ -69,7 +69,7 @@ Detect input type in this order (first match wins): git checkout -b ``` -## Phase 3: Task Loop +## Wave Loop (Steps 3–13 repeat per wave) ### Wave Model @@ -82,13 +82,13 @@ Wave N: [remaining tasks] → spawn workers → wait → merge → che ``` **Wave lifecycle:** -1. **Find ready tasks** (3a) — query `$FLOWCTL ready --epic ` -2. 
**Start + spawn workers** (3b-3d) — lock files, spawn in parallel -3. **Wait + merge** (3e) — collect results, merge worktree branches -4. **Cleanup** (3f) — release file locks (`$FLOWCTL unlock --all`) -5. **Checkpoint** (3g) — mandatory: run guards + invariants, aggregate results -6. **Plan-sync** (3h) — update downstream task specs if drift detected -7. **Loop** (3i) — return to step 1 for next wave, or finish if no ready tasks +1. **Find ready tasks** (Step 3) — query `$FLOWCTL ready --epic ` +2. **Start + spawn workers** (Steps 4–7) — lock files, spawn in parallel +3. **Wait + merge** (Step 8) — collect results, merge worktree branches +4. **Cleanup** (Step 9) — release file locks (`$FLOWCTL unlock --all`) +5. **Checkpoint** (Step 10) — mandatory: run guards + invariants, aggregate results +6. **Plan-sync** (Step 12) — update downstream task specs if drift detected +7. **Loop** (Step 13) — return to Step 3 for next wave, or finish if no ready tasks **Stop rules:** - Guards or invariants fail and cannot be auto-fixed @@ -99,7 +99,7 @@ Wave N: [remaining tasks] → spawn workers → wait → merge → che **CRITICAL: When multiple tasks are ready, they MUST run in parallel. Do NOT execute them sequentially "for quality" or "one at a time." Parallel execution with isolation IS the quality mechanism.** -### 3a. Find Ready Tasks +### Step 3. Find Ready Tasks **State awareness (always runs first):** @@ -134,9 +134,9 @@ After restarts, find ready tasks normally: $FLOWCTL ready --epic --json ``` -Collect ALL ready tasks (no unresolved dependencies). If no ready tasks, check for completion review gate (see 3g below). +Collect ALL ready tasks (no unresolved dependencies). If no ready tasks, check for completion review gate (see Step 10 below). -### 3b. Readiness Check +### Step 4. Readiness Check Before starting, validate each task spec is implementation-ready: @@ -153,7 +153,7 @@ $FLOWCTL cat - Use AskUserQuestion: "Task `` spec is missing [field]. Add it before starting?" 
- Do NOT spawn a worker with an incomplete spec — workers guess when specs are vague -### 3c. Start Tasks & Spawn Workers +### Step 5. Start Tasks ```bash # 1. Start each task @@ -161,7 +161,7 @@ $FLOWCTL start --json $FLOWCTL start --json ``` -### 3c½. File Ownership & Locking (Teams mode) +### Step 6. File Ownership & Locking (Teams mode) For each ready task, read file ownership from the task spec and lock: @@ -199,7 +199,7 @@ If conflicts exist (two tasks declare the same file): **RP context detection (once per wave, before spawning workers):** -Detect RP availability and set `RP_CONTEXT` for workers. This controls whether workers use `context_builder` for deep implementation context in Phase 1.5. +Detect RP availability and set `RP_CONTEXT` for workers. This controls whether workers use `context_builder` for deep implementation context in Worker Phase 6. ```bash # 1. Check if RP context is enabled (default: false — opt-in only) @@ -225,7 +225,7 @@ Use `flowctl worker-prompt --bootstrap` to generate a minimal bootstrap prompt f WORKER_PROMPT=$($FLOWCTL worker-prompt --task --bootstrap [--tdd] [--review rp|codex]) ``` -### 3d. Spawn Workers (Worktree + Teams — Default) +### Step 7. Spawn Workers (Worktree + Teams — Default) 1. Create team: `TeamCreate({team_name: "flow-"})` 2. Spawn all workers with BOTH `isolation: "worktree"` AND `team_name`: @@ -248,7 +248,7 @@ Agent({ TDD_MODE: true|false RP_CONTEXT: $RP_CONTEXT TEAM_MODE: true - OWNED_FILES: + OWNED_FILES: " }) ``` @@ -259,7 +259,7 @@ Spawn ALL ready task workers in a SINGLE message with multiple Agent tool calls. **Worker returns**: Summary of implementation, files changed, test results, review verdict. -### 3e. Wait for Workers & Merge Back +### Step 8. Wait for Workers & Merge Back Wait for all workers to complete. @@ -282,7 +282,7 @@ git branch -d 2>/dev/null || true 3. **Stop the merge sequence** — do NOT merge remaining branches 4. 
Report to the user: conflicting branch name + suggestion to resolve manually -### 3f. Wave Cleanup +### Step 9. Wave Cleanup Release file locks so the next wave can re-lock with new ownership: @@ -292,7 +292,7 @@ $FLOWCTL unlock --all Worktrees are cleaned up automatically by the worktree kit. -### 3g. Verify Completion & Checkpoint +### Step 10. Verify Completion & Checkpoint After worker(s) return, verify each task completed: @@ -306,14 +306,14 @@ If status is not `done`, the worker failed. Check output and retry or investigat After ALL workers in a wave return, run a structured checkpoint before finding the next wave of tasks. This prevents cascading failures and ensures integration quality. -**Step 1 — Aggregate Results:** +**Sub-step 1 — Aggregate Results:** Collect from every worker in the batch: - Status: done / failed / spec_conflict - Files changed (from worker summary) - Tests: pass / fail / skipped - Review verdict (if REVIEW_MODE != none) -**Step 2 — Integration Verification:** +**Sub-step 2 — Integration Verification:** ```bash # Run guards on the result (catches cross-task breakage) $FLOWCTL guard @@ -324,7 +324,7 @@ $FLOWCTL invariants check If guards or invariants fail, identify which task's changes caused the regression and report to user. -**Step 3 — Wave Summary:** +**Sub-step 3 — Wave Summary:** Output a concise checkpoint report: ``` ── Wave N Checkpoint ────────────────────── @@ -341,7 +341,7 @@ Output a concise checkpoint report: - Guards or invariants fail and cannot be auto-fixed → report to user - ≥ 2 tasks in the same wave failed → likely a systemic issue, pause and investigate -### 3g½. Interactive Checkpoint (if `--interactive`) +### Step 11. Interactive Checkpoint (if `--interactive`) If `--interactive` was passed, pause after each task completes and show a checkpoint: @@ -354,17 +354,17 @@ Checkpoint: Task complete Continue to next task? 
(y/n/skip/abort) y = continue (default) n = pause here, I'll review manually - skip = skip remaining tasks, go to Phase 4 + skip = skip remaining tasks, go to Step 15 abort = stop execution entirely ``` Use AskUserQuestion to wait for response. If no `--interactive` flag, skip this step entirely. -### 3h. Plan Sync (if enabled) — BOTH MODES +### Step 12. Plan Sync (if enabled) — BOTH MODES -**Runs in SINGLE_TASK_MODE and EPIC_MODE.** Only the loop-back in 3i differs by mode. +**Runs in SINGLE_TASK_MODE and EPIC_MODE.** Only the loop-back in Step 13 differs by mode. -Only run plan-sync if the task status is `done` (from step 3g). If not `done`, skip plan-sync and investigate/retry. +Only run plan-sync if the task status is `done` (from Step 10). If not `done`, skip plan-sync and investigate/retry. Check if plan-sync should run: @@ -405,15 +405,15 @@ Follow your phases in plan-sync.md exactly. Plan-sync returns summary. Log it but don't block - task updates are best-effort. -### 3i. Loop or Finish +### Step 13. Loop or Finish -**SINGLE_TASK_MODE**: After 3g→3h, go to Phase 4 (Quality). No loop. +**SINGLE_TASK_MODE**: After Step 10 → Step 12, go to Step 15 (Quality). No loop. -**EPIC_MODE**: After 3g→3h, return to 3a for next wave. +**EPIC_MODE**: After Step 10 → Step 12, return to Step 3 for next wave. -### 3j. Adversarial Review (EPIC_MODE only — Layer 3) +### Step 14. Adversarial Review (EPIC_MODE only — Layer 3) -When 3a finds no ready tasks, all tasks are done. Run cross-model adversarial review before shipping. +When Step 3 finds no ready tasks, all tasks are done. Run cross-model adversarial review before shipping. **This is Layer 3 of the quality system.** A different model family (GPT via Codex) tries to **break** the code. This catches blind spots that Claude (implementing model) and RP (same model family) both miss. @@ -430,15 +430,15 @@ $FLOWCTL codex adversarial --base "$BRANCH_BASE" --json ``` Initialize `ADVERSARIAL_ITERATIONS=0`. 
Parse response: -- `verdict: "SHIP"` → go to Phase 4 -- `verdict: "NEEDS_WORK"` → increment `ADVERSARIAL_ITERATIONS`. If `>= 2`: log "Adversarial review: 2 iterations completed. First iteration finds real issues, second verifies fixes. Proceeding." → go to Phase 4. Otherwise: fix issues, commit, re-run. +- `verdict: "SHIP"` → go to Step 15 +- `verdict: "NEEDS_WORK"` → increment `ADVERSARIAL_ITERATIONS`. If `>= 2`: log "Adversarial review: 2 iterations completed. First iteration finds real issues, second verifies fixes. Proceeding." → go to Step 15. Otherwise: fix issues, commit, re-run. **If codex not available:** ``` ⚠ Codex CLI not found — skipping Layer 3 adversarial review. Install: npm install -g @openai/codex ``` -Go to Phase 4 directly. No fallback to RP — different model family is the point. +Go to Step 15 directly. No fallback to RP — different model family is the point. **After SHIP (or skip):** ```bash @@ -461,7 +461,7 @@ Context optimization. Each task gets fresh context: --- -## Phase 4: Quality +## Step 15: Quality After all tasks complete (or periodically for large epics): @@ -471,7 +471,7 @@ After all tasks complete (or periodically for large epics): - Task flow-code:quality-auditor("Review recent changes") - Fix critical issues -## Phase 5: Ship +## Step 16: Ship **Verify all tasks done**: ```bash @@ -554,17 +554,18 @@ Confirm before ship: **Default mode (worktree isolation — auto-parallel):** ``` -Phase 1 (resolve) → Phase 2 (branch) → Phase 3: - ├─ 3a: read state + progress summary, restart stale tasks, find ready tasks - ├─ 3b: readiness check - ├─ 3c: start tasks - ├─ 3d: spawn workers (worktree isolation, default) - ├─ 3e: wait for workers + merge back - ├─ 3f: cleanup - ├─ 3g: verify done + wave checkpoint - ├─ 3g½: interactive pause (if --interactive) - ├─ 3h: plan-sync (if enabled + downstream tasks exist) - ├─ 3i: EPIC_MODE? → loop to 3a | SINGLE_TASK_MODE? 
→ Phase 4
- ├─ no more tasks → 3j: completion review gate
- └─ Phase 4 (quality) → Phase 5 (ship)
+Step 1 (resolve) → Step 2 (branch) → Wave Loop:
+ ├─ Step 3: read state + progress summary, restart stale tasks, find ready tasks
+ ├─ Step 4: readiness check
+ ├─ Step 5: start tasks
+ ├─ Step 6: file ownership & locking
+ ├─ Step 7: spawn workers (worktree isolation, default)
+ ├─ Step 8: wait for workers + merge back
+ ├─ Step 9: cleanup
+ ├─ Step 10: verify done + wave checkpoint
+ ├─ Step 11: interactive pause (if --interactive)
+ ├─ Step 12: plan-sync (if enabled + downstream tasks exist)
+ ├─ Step 13: EPIC_MODE? → loop to Step 3 | SINGLE_TASK_MODE? → Step 15
+ ├─ no more tasks → Step 14: adversarial review gate
+ └─ Step 15 (quality) → Step 16 (ship)
 ```
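The default-mode flow above can also be read as a loop. Here is a minimal sketch of that control flow, assuming three caller-supplied hooks (`find_ready`, `run_wave`, `checkpoint`). These are hypothetical names, not flowctl commands. In practice they would wrap the `$FLOWCTL ready`, `start`, `lock`, `unlock --all`, `guard`, and `invariants check` calls described in Steps 3 to 10.

```bash
#!/usr/bin/env bash
# Sketch of the Wave Loop control flow (Steps 3-13); illustrative only.
# Hypothetical hooks the caller must define before calling wave_loop:
#   find_ready        print ready task IDs, empty output when none (Step 3)
#   run_wave IDS...   start, lock, spawn, wait, merge, unlock (Steps 4-9)
#   checkpoint        run guards + invariants; non-zero means stop (Step 10)
# Steps 11 (interactive pause) and 12 (plan-sync) are omitted for brevity.

# wave_loop MODE   where MODE is EPIC_MODE or SINGLE_TASK_MODE
wave_loop() {
  local mode="$1"
  local wave=0
  local ready
  while :; do
    ready="$(find_ready)"
    if [ -z "$ready" ]; then
      break                        # no ready tasks: leave the loop
    fi
    wave=$((wave + 1))
    # Word splitting is intentional: one argument per ready task ID.
    run_wave $ready || return 1
    checkpoint || return 1         # stop rule: guards or invariants failed
    if [ "$mode" = "SINGLE_TASK_MODE" ]; then
      break                        # Step 13: single task never loops back
    fi
  done
  echo "waves=$wave"
}
```

The sketch returns non-zero as soon as a wave or its checkpoint fails, mirroring the stop rules. What happens after the loop (Step 14 in EPIC_MODE, then Steps 15 and 16) is left to the caller.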