Merged
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -12,6 +12,7 @@ plugins/*/skills/
# any transitive dist/ dependencies the HUD imports — currently scripts/utils/).
scripts/hud/
scripts/utils/
scripts/package.json

# Generated shared agents (copied from shared/agents/ at build time)
# Note: plugin-specific agents (catch-up.md, devlog.md) are committed in their plugins
2 changes: 2 additions & 0 deletions package.json
@@ -14,6 +14,8 @@
"scripts/hooks/",
"scripts/hud.sh",
"scripts/hud/",
"scripts/utils/",
"scripts/package.json",
"src/templates/",
"README.md",
"LICENSE",
27 changes: 18 additions & 9 deletions plugins/devflow-plan/commands/plan.md
@@ -33,7 +33,7 @@ For **multi-issue** mode: collect all `#N` tokens from `$ARGUMENTS` as `ISSUE_NU

| Gate | Phase | Purpose |
|------|-------|---------|
| Gate 0 | Phase 1 | Confirm understanding before exploration |
| Gate 0 | Phase 1 | Requirements discovery before exploration |
| Gate 1 | Phase 7 | Validate scope + gap analysis results |
| Gate 2 | Phase 13 | Confirm final plan + design review |

@@ -45,15 +45,24 @@ No gate may be skipped. If user says "proceed" or "whatever you think", state re

### Block 1: Requirements Discovery

#### Phase 1: Gate 0 — Confirm Understanding
#### Phase 1: Gate 0 — Requirements Discovery

Present interpretation using AskUserQuestion:
- Core problem this solves
- Target users
- Expected outcome
- Key assumptions
Explore the user's intent through focused Socratic questioning before spawning agents.

For multi-issue: present unified scope across all issues.
**Skip discovery when** (semantic assessment, not word count):
- User has specified WHAT to build, HOW it should behave, and WHERE it integrates
- User input references an existing design document or detailed issue

**Process:**

1. **First question**: Confirm your understanding of the core problem and expected outcome. Frame as multiple choice when 2-3 interpretations exist.
2. **Follow-up questions** (if ambiguity remains): Probe constraints, scope boundaries, or tradeoffs via AskUserQuestion.
3. **Present approaches**: When multiple valid approaches exist, present 2-3 options with explicit tradeoffs. Lead with your recommendation and why.
4. **Confirm scope**: Summarize understanding including: core problem, target users, expected outcome, key assumptions, chosen approach (if applicable).

For multi-issue: present unified scope across all issues after individual discovery.

If the user says "skip" or "just proceed" — skip remaining questions, present inferred understanding (core problem, users, outcome, assumptions, recommended approach) in one message for confirmation, then proceed. Gate 0 is satisfied by the confirmation, not by the discovery questions.

**MANDATORY**: Do not spawn any agents until Gate 0 is confirmed.

@@ -350,7 +359,7 @@ Display completion summary:
/plan (orchestrator - spawns agents only)
├─ Block 1: Requirements Discovery
│ ├─ Phase 1: GATE 0 - Confirm Understanding ⛔ MANDATORY
│ ├─ Phase 1: GATE 0 - Requirements Discovery ⛔ MANDATORY
│ │ └─ AskUserQuestion: Validate interpretation
│ ├─ Phase 2: Orient + Load Knowledge
│ │ ├─ Skimmer agent (codebase context)
7 changes: 7 additions & 0 deletions scripts/build-hud.js
@@ -80,4 +80,11 @@ while (queue.length > 0) {
}
}

// All compiled .js files use ESM syntax — Node needs this to resolve imports
// across sibling directories (e.g. hud/ importing from ../utils/).
const pkgJsonPath = path.join(scriptsRoot, 'package.json');
if (!fs.existsSync(pkgJsonPath)) {
  fs.writeFileSync(pkgJsonPath, '{"type": "module"}\n');
}

console.log(`\u2713 HUD distribution: ${visited.size} files copied to ${scriptsRoot}`);
36 changes: 36 additions & 0 deletions shared/skills/plan:orch/SKILL.md
@@ -24,6 +24,7 @@ This is a focused variant of the `/plan` command pipeline for ambient ORCHESTRAT

For GUIDED depth, the main session performs planning directly:

0. **Discover** — If the planning question is open-ended, ask clarifying questions via AskUserQuestion and present 2-3 approaches with tradeoffs before orienting. Skip if the user's prompt is already specific. If the user says "skip" or "just proceed": skip remaining questions, present inferred scope for confirmation.
1. **Spawn Skimmer** — `Agent(subagent_type="Skimmer")` targeting the area of interest. Use orientation output to ground design decisions in real file structures and patterns.
2. **Design** — Using Skimmer findings + loaded pattern/design skills, design the approach directly in main session. Apply `devflow:design-review` skill inline to check the plan for anti-patterns before presenting.
3. **Present** — Deliver structured plan using the Output format below. Use AskUserQuestion for ambiguous design choices.
@@ -44,6 +45,41 @@ KNOWLEDGE_CONTEXT=$(node scripts/hooks/lib/knowledge-context.cjs index "{worktre

This produces a compact index of active ADR/PF entries. Pass `KNOWLEDGE_CONTEXT` to Explorer and Designer agents — prior decisions constrain design, known pitfalls inform gap analysis. Agents use `devflow:apply-knowledge` to Read full entry bodies on demand.

## Phase 0.5: Requirements Discovery

Before committing to an approach, surface ambiguity through focused Socratic questioning.

**Skip when** (semantic assessment, not word count):
- User has specified WHAT to build, HOW it should behave, and WHERE it integrates — regardless of prompt length
- Invoked from within another pipeline (pipeline:orch, implement:orch)
- Single clear approach exists with no meaningful alternatives

**Skip examples** (proceed directly to Phase 1):
- "Add retry with exponential backoff to HttpClient in src/http.ts, max 3 retries, configurable timeout" — specific files, clear behavior, defined parameters
- "Implement the design from .docs/design/caching.md" — pre-existing specification

**Discover examples** (run Phase 0.5):
- "Add a caching layer" — open-ended, multiple valid approaches
- "Improve the auth flow" — vague scope, unclear what aspects need improvement
- "Design a notification system" — system-level, many architectural choices

**Process:**

1. **Assess** — Does the request have meaningful ambiguity or multiple valid approaches? If not, skip to Phase 1.
2. **Question** — Ask clarifying questions via AskUserQuestion. Prefer multiple choice (2-4 options) when tradeoffs exist.
3. **Propose approaches** — Present 2-3 options with explicit tradeoffs:
- Lead with your recommended approach and why
- Each option: 2-3 sentences + key tradeoff (complexity, performance, maintenance)
- Final option: "Other — describe your preferred approach"
4. **Confirm** — Get user's choice, then proceed to Phase 1 with a constrained problem.

If the user says "skip", "just proceed", or signals impatience — skip remaining questions, present your inferred understanding (problem, scope, recommended approach) in one message for confirmation, then proceed to Phase 1 after confirmation. This matches /plan Gate 0 behavior.

**Question design:**
- Ask about constraints and goals, not implementation details
- Surface hidden assumptions ("Does this need to handle concurrent writes?")
- Reveal scope boundaries ("Just the API layer, or the UI as well?")

## Phase 1: Orient

Spawn `Agent(subagent_type="Skimmer")` to get codebase overview relevant to the planning question:
5 changes: 4 additions & 1 deletion shared/skills/router/classification-rules.md
@@ -21,7 +21,10 @@ Classify each prompt by **intent** and **depth** before responding.

Default to ORCHESTRATED for substantive work — it produces better results.
Reserve GUIDED for small focused changes where orchestration adds no value.
Prefer GUIDED over QUICK for any prompt involving code changes.
Prefer GUIDED over QUICK for any prompt involving code changes — UNLESS the
change AND target are both explicit and trivial (e.g., "rename foo to bar in
utils.ts", "change color from red to blue in header.tsx"). Such single-site,
obvious edits stay QUICK.

## Action

64 changes: 62 additions & 2 deletions shared/skills/test-driven-development/SKILL.md
@@ -72,21 +72,81 @@ Apply DRY, extract patterns, improve readability.

---

## Cycle Verification

After each RED-GREEN-REFACTOR cycle, ALL must hold:

- [ ] Test existed BEFORE production code (not concurrent, not after)
- [ ] Test failed for the RIGHT reason (expected behavior absent, not syntax/import error)
- [ ] Production code is minimal — no speculative additions beyond what the test demands
- [ ] ALL tests pass, not just the new one
- [ ] Refactoring happened in Step 3 (or code is already clean — state explicitly)
- [ ] No untested production code remains

If any fails: you skipped TDD. Back up and redo the cycle correctly.

---

## Rationalization Prevention

These are the excuses developers use to skip TDD. Recognize and reject them.

| Excuse | Why It Feels Right | Why It's Wrong | Correct Action |
|--------|-------------------|---------------|----------------|
|--------|-------------------|----------------|----------------|
| "I'll write tests after" | Need to see the shape first | Tests ARE the shape — they define the interface before implementation exists | Write the test first |
| "Too simple to test" | It's just a getter/setter | Getters break, defaults change, edge cases hide in "simple" code | Write it — takes 30 seconds |
| "I'll refactor later" | Just get it working now | "Later" never comes; technical debt compounds silently | Refactor now in Step 3 |
| "Test is too hard to write" | Setup is complex, mocking is painful | Hard-to-test code = bad design; the test is telling you the interface is wrong | Simplify the interface first |
| "Need to see the whole picture" | Can't test what I haven't designed yet | TDD IS design; each test reveals the next piece of the interface | Let the test guide the design |
| "Tests slow me down" | Faster to just write the code | Faster until the first regression; TDD is faster for anything > 50 lines | Trust the cycle |
| "Framework is hard to set up" | Setup is complex | One-time cost vs recurring regression cost; untested code compounds debt | Set up the framework first — that IS the work |
| "Need the architecture first" | Can't test without structure | Tests DEFINE the architecture; they reveal what interfaces are needed | Let tests drive the structure |
| "This is infrastructure, not logic" | Plumbing doesn't need tests | Infrastructure carries data; broken plumbing floods everything downstream | Test the contract, not the internals |
| "Deadline is tight, no time for tests" | Ship now, test later | Untested code ships bugs; fixing bugs under deadline is slower than TDD | TDD is faster under pressure, not slower |

See `references/rationalization-prevention.md` for extended examples with code.

### Red Flags — STOP Immediately

The rationalization table above catches excuses you make *before starting*. These red flags catch you *mid-work* — thoughts that signal you are about to skip a step.

| Thought | Correction |
|---------|------------|
| "Let me write the implementation first, tests after" | Delete the code. Write the test. Watch it fail. Then rewrite. |
| "I need to see the shape before I can test" | The test IS the shape. It defines the interface before implementation. |
| "The test setup is too complex for this" | Complex setup = too much coupling. Simplify the design first. |
| "I'll just spike this and add tests later" | Unless the user said "spike" — you're rationalizing, not prototyping. |
| "Let me get it working, then lock it with tests" | Those tests verify your implementation, not the requirement. Backwards. |
| "I know this works, I've written it before" | Past code passed past tests. This code needs its own failing test. |
| "One more function, then I'll write the test" | STOP. Write the test for what you have now. Increments, not batches. |

### Test-First vs Code-First

**Code-first** (wrong):
1. Write `parseConfig()` — split lines, filter comments, build map
2. Write test: `parseConfig("key=val")` passes
3. Ship. Undiscovered: empty input, malformed lines, multi-value keys, whitespace
> Test mirrors implementation. Edge cases stay hidden until production.

**Test-first** (correct):
1. Test: "ignores comment lines" — `parseConfig("# comment\nkey=val")` → `{key: val}`
2. Test: "handles empty input" — `parseConfig("")` → empty result
3. Test: "rejects malformed lines" — `parseConfig("no-equals")` → error
4. Implement `parseConfig()` to satisfy all three
> Tests define the contract. Implementation forced to handle edges from the start.

The difference: code-first tests verify what you *happened to build*. Test-first tests specify what *should exist*.

### When Stuck

| Blocker | Solution |
|---------|----------|
| Don't know what to test first | Test the simplest input/output pair. "Given X, expect Y." Start there. |
| Test needs complex setup/state | Extract the logic into a pure function. Test that. Integrate after. |
| Behavior depends on external state (DB, API, clock) | Inject the dependency as a parameter. Pass a fake in tests. |
| Multiple behaviors tangled together | Decompose. One test = one behavior. If you can't isolate it, the design needs splitting. |
| Modifying legacy code with no tests | Write a characterization test first — a test that captures the current behavior, even if wrong. Then make your change and see what breaks. |

---

## Process Enforcement
Expand All @@ -109,7 +169,7 @@ When implementing any feature under ambient IMPLEMENT/GUIDED or IMPLEMENT/ORCHES
### What to Test

| Test | Don't Test |
|------|-----------|
|------|------------|
| Public API behavior | Private implementation details |
| Error conditions and edge cases | Framework internals |
| Integration points (boundaries) | Third-party library correctness |