diff --git a/.gitignore b/.gitignore index 895dc97..615e362 100644 --- a/.gitignore +++ b/.gitignore @@ -12,6 +12,7 @@ plugins/*/skills/ # any transitive dist/ dependencies the HUD imports — currently scripts/utils/). scripts/hud/ scripts/utils/ +scripts/package.json # Generated shared agents (copied from shared/agents/ at build time) # Note: plugin-specific agents (catch-up.md, devlog.md) are committed in their plugins diff --git a/package.json b/package.json index 5249430..d1cee43 100644 --- a/package.json +++ b/package.json @@ -14,6 +14,8 @@ "scripts/hooks/", "scripts/hud.sh", "scripts/hud/", + "scripts/utils/", + "scripts/package.json", "src/templates/", "README.md", "LICENSE", diff --git a/plugins/devflow-plan/commands/plan.md b/plugins/devflow-plan/commands/plan.md index 51fab42..7aefeaa 100644 --- a/plugins/devflow-plan/commands/plan.md +++ b/plugins/devflow-plan/commands/plan.md @@ -33,7 +33,7 @@ For **multi-issue** mode: collect all `#N` tokens from `$ARGUMENTS` as `ISSUE_NU | Gate | Phase | Purpose | |------|-------|---------| -| Gate 0 | Phase 1 | Confirm understanding before exploration | +| Gate 0 | Phase 1 | Requirements discovery before exploration | | Gate 1 | Phase 7 | Validate scope + gap analysis results | | Gate 2 | Phase 13 | Confirm final plan + design review | @@ -45,15 +45,24 @@ No gate may be skipped. If user says "proceed" or "whatever you think", state re ### Block 1: Requirements Discovery -#### Phase 1: Gate 0 — Confirm Understanding +#### Phase 1: Gate 0 — Requirements Discovery -Present interpretation using AskUserQuestion: -- Core problem this solves -- Target users -- Expected outcome -- Key assumptions +Explore the user's intent through focused Socratic questioning before spawning agents. -For multi-issue: present unified scope across all issues. +**Skip discovery when** (semantic assessment, not word count): +- User has specified WHAT to build, HOW it should behave, and WHERE it integrates +- User input references an existing design document or detailed issue + +**Process:** + +1. **First question**: Confirm your understanding of the core problem and expected outcome. Frame as multiple choice when 2-3 interpretations exist. +2. **Follow-up questions** (if ambiguity remains): Probe constraints, scope boundaries, or tradeoffs via AskUserQuestion. +3. **Present approaches**: When multiple valid approaches exist, present 2-3 options with explicit tradeoffs. Lead with your recommendation and why. +4. **Confirm scope**: Summarize understanding including: core problem, target users, expected outcome, key assumptions, chosen approach (if applicable). + +For multi-issue: present unified scope across all issues after individual discovery. + +If the user says "skip" or "just proceed" — skip remaining questions, present inferred understanding (core problem, users, outcome, assumptions, recommended approach) in one message for confirmation, then proceed. Gate 0 is satisfied by the confirmation, not by the discovery questions. **MANDATORY**: Do not spawn any agents until Gate 0 is confirmed. @@ -350,7 +359,7 @@ Display completion summary: /plan (orchestrator - spawns agents only) │ ├─ Block 1: Requirements Discovery -│ ├─ Phase 1: GATE 0 - Confirm Understanding ⛔ MANDATORY +│ ├─ Phase 1: GATE 0 - Requirements Discovery ⛔ MANDATORY │ │ └─ AskUserQuestion: Validate interpretation │ ├─ Phase 2: Orient + Load Knowledge │ │ ├─ Skimmer agent (codebase context) diff --git a/scripts/build-hud.js b/scripts/build-hud.js index cd9441b..9b5c51c 100755 --- a/scripts/build-hud.js +++ b/scripts/build-hud.js @@ -80,4 +80,11 @@ while (queue.length > 0) { } } +// All compiled .js files use ESM syntax — Node needs this to resolve imports +// across sibling directories (e.g. hud/ importing from ../utils/). +const pkgJsonPath = path.join(scriptsRoot, 'package.json'); +if (!fs.existsSync(pkgJsonPath)) { + fs.writeFileSync(pkgJsonPath, '{"type": "module"}\n'); +} + console.log(`\u2713 HUD distribution: ${visited.size} files copied to ${scriptsRoot}`); diff --git a/shared/skills/plan:orch/SKILL.md b/shared/skills/plan:orch/SKILL.md index 6e30239..1f38e36 100644 --- a/shared/skills/plan:orch/SKILL.md +++ b/shared/skills/plan:orch/SKILL.md @@ -24,6 +24,7 @@ This is a focused variant of the `/plan` command pipeline for ambient ORCHESTRAT For GUIDED depth, the main session performs planning directly: +0. **Discover** — If the planning question is open-ended, ask clarifying questions via AskUserQuestion and present 2-3 approaches with tradeoffs before orienting. Skip if the user's prompt is already specific. If the user says "skip" or "just proceed": skip remaining questions, present inferred scope for confirmation. 1. **Spawn Skimmer** — `Agent(subagent_type="Skimmer")` targeting the area of interest. Use orientation output to ground design decisions in real file structures and patterns. 2. **Design** — Using Skimmer findings + loaded pattern/design skills, design the approach directly in main session. Apply `devflow:design-review` skill inline to check the plan for anti-patterns before presenting. 3. **Present** — Deliver structured plan using the Output format below. Use AskUserQuestion for ambiguous design choices. @@ -44,6 +45,41 @@ KNOWLEDGE_CONTEXT=$(node scripts/hooks/lib/knowledge-context.cjs index "{worktre This produces a compact index of active ADR/PF entries. Pass `KNOWLEDGE_CONTEXT` to Explorer and Designer agents — prior decisions constrain design, known pitfalls inform gap analysis. Agents use `devflow:apply-knowledge` to Read full entry bodies on demand. +## Phase 0.5: Requirements Discovery + +Before committing to an approach, surface ambiguity through focused Socratic questioning. + +**Skip when** (semantic assessment, not word count): +- User has specified WHAT to build, HOW it should behave, and WHERE it integrates — regardless of prompt length +- Invoked from within another pipeline (pipeline:orch, implement:orch) +- Single clear approach exists with no meaningful alternatives + +**Skip examples** (proceed directly to Phase 1): +- "Add retry with exponential backoff to HttpClient in src/http.ts, max 3 retries, configurable timeout" — specific files, clear behavior, defined parameters +- "Implement the design from .docs/design/caching.md" — pre-existing specification + +**Discover examples** (run Phase 0.5): +- "Add a caching layer" — open-ended, multiple valid approaches +- "Improve the auth flow" — vague scope, unclear what aspects need improvement +- "Design a notification system" — system-level, many architectural choices + +**Process:** + +1. **Assess** — Does the request have meaningful ambiguity or multiple valid approaches? If not, skip to Phase 1. +2. **Question** — Ask clarifying questions via AskUserQuestion. Prefer multiple choice (2-4 options) when tradeoffs exist. +3. **Propose approaches** — Present 2-3 options with explicit tradeoffs: + - Lead with your recommended approach and why + - Each option: 2-3 sentences + key tradeoff (complexity, performance, maintenance) + - Final option: "Other — describe your preferred approach" +4. **Confirm** — Get user's choice, then proceed to Phase 1 with a constrained problem. + +If the user says "skip", "just proceed", or signals impatience — skip remaining questions, present your inferred understanding (problem, scope, recommended approach) in one message for confirmation, then proceed to Phase 1 after confirmation. This matches /plan Gate 0 behavior. + +**Question design:** +- Ask about constraints and goals, not implementation details +- Surface hidden assumptions ("Does this need to handle concurrent writes?") +- Reveal scope boundaries ("Just the API layer, or the UI as well?") + ## Phase 1: Orient Spawn `Agent(subagent_type="Skimmer")` to get codebase overview relevant to the planning question: diff --git a/shared/skills/router/classification-rules.md b/shared/skills/router/classification-rules.md index 4e56e57..bd6f3ca 100644 --- a/shared/skills/router/classification-rules.md +++ b/shared/skills/router/classification-rules.md @@ -21,7 +21,10 @@ Classify each prompt by **intent** and **depth** before responding. Default to ORCHESTRATED for substantive work — it produces better results. Reserve GUIDED for small focused changes where orchestration adds no value. -Prefer GUIDED over QUICK for any prompt involving code changes. +Prefer GUIDED over QUICK for any prompt involving code changes — UNLESS the +change AND target are both explicit and trivial (e.g., "rename foo to bar in +utils.ts", "change color from red to blue in header.tsx"). Such single-site, +obvious edits stay QUICK. ## Action diff --git a/shared/skills/test-driven-development/SKILL.md b/shared/skills/test-driven-development/SKILL.md index ad5c2c8..106e4ed 100644 --- a/shared/skills/test-driven-development/SKILL.md +++ b/shared/skills/test-driven-development/SKILL.md @@ -72,21 +72,81 @@ Apply DRY, extract patterns, improve readability. --- +## Cycle Verification + +After each RED-GREEN-REFACTOR cycle, ALL must hold: + +- [ ] Test existed BEFORE production code (not concurrent, not after) +- [ ] Test failed for the RIGHT reason (expected behavior absent, not syntax/import error) +- [ ] Production code is minimal — no speculative additions beyond what the test demands +- [ ] ALL tests pass, not just the new one +- [ ] Refactoring happened in Step 3 (or code is already clean — state explicitly) +- [ ] No untested production code remains + +If any fails: you skipped TDD. Back up and redo the cycle correctly. + +--- + ## Rationalization Prevention These are the excuses developers use to skip TDD. Recognize and reject them. | Excuse | Why It Feels Right | Why It's Wrong | Correct Action | -|--------|-------------------|---------------|----------------| +|--------|-------------------|----------------|----------------| | "I'll write tests after" | Need to see the shape first | Tests ARE the shape — they define the interface before implementation exists | Write the test first | | "Too simple to test" | It's just a getter/setter | Getters break, defaults change, edge cases hide in "simple" code | Write it — takes 30 seconds | | "I'll refactor later" | Just get it working now | "Later" never comes; technical debt compounds silently | Refactor now in Step 3 | | "Test is too hard to write" | Setup is complex, mocking is painful | Hard-to-test code = bad design; the test is telling you the interface is wrong | Simplify the interface first | | "Need to see the whole picture" | Can't test what I haven't designed yet | TDD IS design; each test reveals the next piece of the interface | Let the test guide the design | | "Tests slow me down" | Faster to just write the code | Faster until the first regression; TDD is faster for anything > 50 lines | Trust the cycle | +| "Framework is hard to set up" | Setup is complex | One-time cost vs recurring regression cost; untested code compounds debt | Set up the framework first — that IS the work | +| "Need the architecture first" | Can't test without structure | Tests DEFINE the architecture; they reveal what interfaces are needed | Let tests drive the structure | +| "This is infrastructure, not logic" | Plumbing doesn't need tests | Infrastructure carries data; broken plumbing floods everything downstream | Test the contract, not the internals | +| "Deadline is tight, no time for tests" | Ship now, test later | Untested code ships bugs; fixing bugs under deadline is slower than TDD | TDD is faster under pressure, not slower | See `references/rationalization-prevention.md` for extended examples with code. +### Red Flags — STOP Immediately + +The rationalization table above catches excuses you make *before starting*. These red flags catch you *mid-work* — thoughts that signal you are about to skip a step. + +| Thought | Correction | +|---------|------------| +| "Let me write the implementation first, tests after" | Delete the code. Write the test. Watch it fail. Then rewrite. | +| "I need to see the shape before I can test" | The test IS the shape. It defines the interface before implementation. | +| "The test setup is too complex for this" | Complex setup = too much coupling. Simplify the design first. | +| "I'll just spike this and add tests later" | Unless the user said "spike" — you're rationalizing, not prototyping. | +| "Let me get it working, then lock it with tests" | Those tests verify your implementation, not the requirement. Backwards. | +| "I know this works, I've written it before" | Past code passed past tests. This code needs its own failing test. | +| "One more function, then I'll write the test" | STOP. Write the test for what you have now. Increments, not batches. | + +### Test-First vs Code-First + +**Code-first** (wrong): +1. Write `parseConfig()` — split lines, filter comments, build map +2. Write test: `parseConfig("key=val")` passes +3. Ship. Undiscovered: empty input, malformed lines, multi-value keys, whitespace +> Test mirrors implementation. Edge cases stay hidden until production. + +**Test-first** (correct): +1. Test: "ignores comment lines" — `parseConfig("# comment\nkey=val")` → `{key: val}` +2. Test: "handles empty input" — `parseConfig("")` → empty result +3. Test: "rejects malformed lines" — `parseConfig("no-equals")` → error +4. Implement `parseConfig()` to satisfy all three +> Tests define the contract. Implementation forced to handle edges from the start. + +The difference: code-first tests verify what you *happened to build*. Test-first tests specify what *should exist*. + +### When Stuck + +| Blocker | Solution | +|---------|----------| +| Don't know what to test first | Test the simplest input/output pair. "Given X, expect Y." Start there. | +| Test needs complex setup/state | Extract the logic into a pure function. Test that. Integrate after. | +| Behavior depends on external state (DB, API, clock) | Inject the dependency as a parameter. Pass a fake in tests. | +| Multiple behaviors tangled together | Decompose. One test = one behavior. If you can't isolate it, the design needs splitting. | +| Modifying legacy code with no tests | Write a characterization test first — a test that captures the current behavior, even if wrong. Then make your change and see what breaks. | + --- ## Process Enforcement @@ -109,7 +169,7 @@ When implementing any feature under ambient IMPLEMENT/GUIDED or IMPLEMENT/ORCHES ### What to Test | Test | Don't Test | -|------|-----------| +|------|------------| | Public API behavior | Private implementation details | | Error conditions and edge cases | Framework internals | | Integration points (boundaries) | Third-party library correctness |