From 3552f14b7d7830af5de88c00574adbbc6b1298cf Mon Sep 17 00:00:00 2001 From: z23cc Date: Tue, 7 Apr 2026 05:14:11 +0800 Subject: [PATCH 1/9] docs: add skill-anatomy.md template specification - Standardized skill structure: frontmatter, 6 required sections - Anti-rationalization as core principle (Common Rationalizations table) - Red Flags and Verification checklist requirements - Flow-code specifics: flowctl integration, evidence, phase naming - Writing principles: process > knowledge, specific > general, evidence > assumption - References flow-code-debug as native exemplar Task: fn-136-borrow-agent-skills-patterns-anti.1 --- docs/skill-anatomy.md | 169 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 169 insertions(+) create mode 100644 docs/skill-anatomy.md diff --git a/docs/skill-anatomy.md b/docs/skill-anatomy.md new file mode 100644 index 00000000..0a8e6b1c --- /dev/null +++ b/docs/skill-anatomy.md @@ -0,0 +1,169 @@ +# Skill Anatomy + +Standard structure for flow-code skill files. Use this when creating or reviewing skills. + +## File Location + +Every skill lives in its own directory under `skills/`: + +``` +skills/flow-code-/ + SKILL.md # Required: The skill definition (<=500 lines) + workflow.md # Optional: Extended workflow if SKILL.md overflows + templates/ # Optional: Scripts, config templates +``` + +Prefix all skill directories with `flow-code-`. Main file is always `SKILL.md` (uppercase). + +## YAML Frontmatter (Required) + +```yaml +--- +name: flow-code- +description: Use when [triggering conditions and symptoms only] +--- +``` + +**Rules:** +- `name`: Lowercase, hyphen-separated. Must match directory name. Always starts with `flow-code-`. +- `description`: Starts with "Use when...". Max 500 characters. Third person. + - Include: triggering conditions, symptoms, contexts. + - Exclude: workflow summary, process steps, what the skill does. + +**Why:** Descriptions are injected into system prompts for skill discovery. 
If the description contains process steps, agents follow the summary and skip the actual skill content. + +## Required Sections + +```markdown +# Skill Title + +## Overview +Core principle in 1-2 sentences. What this skill enforces and why it matters. + +## When to Use +- Triggering conditions (symptoms, task types, failure patterns) +- **When NOT to use:** exclusions to prevent misapplication + +## Core Process +The step-by-step workflow. Numbered phases or steps. +Include inline code examples. Use ASCII flowcharts for decision points. + +## Common Rationalizations + +| Excuse | Reality | +|--------|---------| +| "Too simple to need this" | Simple things break too | +| "I'll do it properly later" | Later never comes | + +## Red Flags +- Observable symptoms indicating the skill is being violated +- Patterns to watch for during self-check and review + +## Verification +After completing the process, confirm: +- [ ] Checklist item with verifiable evidence +- [ ] Another checkpoint (test output, build result, etc.) +``` + +## Section Purposes + +### Overview +The elevator pitch. Answers: what does this skill enforce, and why should an agent follow it? + +### When to Use +Helps agents decide if this skill applies. Include both positive triggers ("Use when X") and negative exclusions ("NOT for Y"). The flow-code-debug exemplar: +> Test failures, bugs, unexpected behavior, performance problems, build failures. +> **Especially when:** under time pressure, "quick fix" seems obvious. + +### Core Process +The heart of the skill. Step-by-step workflow the agent follows. Must be specific and actionable. + +**Good:** "Run `cargo test --all` and verify zero failures" +**Bad:** "Make sure the tests work" + +For flow-code skills, phases follow the convention: Phase 1, Phase 2, Phase 2.5 (verify), Phase 3 (commit), etc. 
Reference `flowctl` commands where applicable: +```bash +$FLOWCTL guard # Run all guards +$FLOWCTL invariants check # Check architecture invariants +``` + +### Common Rationalizations +The most distinctive section. Excuses agents use to skip important steps, paired with factual rebuttals. Think of every time an agent said "I'll add tests later" or "This is simple enough to skip" -- those go here. + +This is the core anti-rationalization principle. Every skip-worthy step needs a counter-argument. Without this section, agents reliably talk themselves out of following the process. + +### Red Flags +Observable signs the skill is being violated. Phrased as quotes or behaviors: +- "Quick fix for now, investigate later" +- "I don't fully understand but this might work" +- Proposing solutions before completing investigation +- Each fix reveals a new problem in a different place + +### Verification +Exit criteria as a checkbox checklist. Every item must be verifiable with evidence (test output, build result, git diff, flowctl output). No subjective items like "code looks good." + +## Supporting Files + +Create supporting files only when: +- SKILL.md exceeds 500 lines (overflow to `workflow.md` or similar) +- Reusable scripts or templates are needed +- Long checklists justify separate files + +Keep patterns and principles inline when under ~50 lines. Most skills need only SKILL.md. + +## Writing Principles + +1. **Process over knowledge.** Skills are workflows, not reference docs. Steps, not facts. +2. **Specific over general.** `$FLOWCTL guard` beats "verify the code works." +3. **Evidence over assumption.** Every verification checkbox requires proof. +4. **Anti-rationalization is core.** Every skip-worthy step needs a counter-argument in the rationalizations table. This is what separates effective skills from advice. +5. **Token-conscious.** Every section must justify its inclusion. If removing it wouldn't change agent behavior, remove it. +6. 
**Progressive disclosure.** SKILL.md is the entry point. Supporting files load on demand. + +## Flow-Code Specifics + +### flowctl Integration +Skills that enforce process should reference flowctl commands: +- `$FLOWCTL guard` for verification gates +- `$FLOWCTL done --summary-file --evidence-json` for evidence-based completion +- `$FLOWCTL invariants check` for architecture invariant enforcement + +### Evidence Requirements +Flow-code workers produce evidence JSON. Skills should specify what evidence their process generates: +```json +{"commits": ["abc123"], "tests": ["cargo test --all"], "prs": []} +``` + +### Phase Naming +Follow the worker agent convention: +- Phase 1: Re-anchor (read spec) +- Phase 2: Implement +- Phase 2.5: Verify & Fix +- Phase 3: Commit +- Phase 4: Review +- Phase 5: Complete + +Skills that define their own phases should use this numbering style (Phase 1, Phase 1.5, Phase 2) for consistency with the worker pipeline. + +### Cross-Skill References +Reference other skills by name, don't duplicate: +```markdown +If the build breaks, use the `flow-code-debug` skill. +For test-first development, see `flow-code-tdd`. +``` + +## Exemplar + +The `flow-code-debug` skill (`skills/flow-code-debug/SKILL.md`) is the reference implementation. It demonstrates all required sections including the Common Rationalizations table (lines 137-151) and Red Flags list (lines 128-135). + +## Checklist for New Skills + +- [ ] Directory created as `skills/flow-code-/` +- [ ] SKILL.md has valid YAML frontmatter with `name` and `description` +- [ ] Description starts with "Use when..." 
(no workflow summary) +- [ ] All six required sections present (Overview, When to Use, Core Process, Common Rationalizations, Red Flags, Verification) +- [ ] Rationalizations table has 3+ entries with factual rebuttals +- [ ] Red flags list has observable symptoms (not vague advice) +- [ ] Verification checklist items are evidence-backed +- [ ] SKILL.md is under 500 lines +- [ ] Registered in plugin.json and README.md skills table From 0b417d75d687261d5fc5463d477ec81d33467e84 Mon Sep 17 00:00:00 2001 From: z23cc Date: Tue, 7 Apr 2026 05:14:30 +0800 Subject: [PATCH 2/9] agents: add Phase 2.3 plan alignment check to worker - New Phase 2.3 between Phase 2 (Implement) and Phase 2.5 (Verify & Fix) - Checks implementation scope against epic spec for drift - Minor drift noted in evidence, major drift escalated via Spec conflict protocol - Lightweight 30-second gate, not a full re-review Task: fn-136-borrow-agent-skills-patterns-anti.6 --- agents/worker.md | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/agents/worker.md b/agents/worker.md index d4006cda..2fccb934 100644 --- a/agents/worker.md +++ b/agents/worker.md @@ -413,6 +413,27 @@ The main conversation will resolve the conflict and re-dispatch you (or update t - Required API endpoint already exists with different signature + +## Phase 2.3: Plan Alignment Check + +Quick sanity check — did implementation stay within plan scope? + +1. Re-read epic spec: ` cat ` +2. Compare implementation scope against epic's scope section: + - Files changed match expected files in task spec? + - No features added beyond what spec described? + - No architectural changes not covered in the plan? +3. If drift detected: + - **Minor** (extra helper, renamed file): note in evidence as `"plan_drift": ""` + - **Major** (new feature, changed architecture, different approach): send protocol message: + ``` + Spec conflict: — implementation diverged from plan. 
+ Drift: + ``` + +**This is a 30-second check, not a full re-review.** Read the spec, glance at your diff, note any drift. Then proceed to Phase 2.5. + + ## Phase 2.5: Verify & Fix From 1ddf9f3f39286d8e4fea776c70659286d26657b5 Mon Sep 17 00:00:00 2001 From: z23cc Date: Tue, 7 Apr 2026 05:15:18 +0800 Subject: [PATCH 3/9] skills: inject Prove-It pattern into flow-code-debug Phase 4 Task: fn-136-borrow-agent-skills-patterns-anti.7 --- skills/flow-code-debug/SKILL.md | 26 ++++++++++++++------------ 1 file changed, 14 insertions(+), 12 deletions(-) diff --git a/skills/flow-code-debug/SKILL.md b/skills/flow-code-debug/SKILL.md index 8fb89322..2463b24b 100644 --- a/skills/flow-code-debug/SKILL.md +++ b/skills/flow-code-debug/SKILL.md @@ -88,19 +88,20 @@ END ## Phase 4: Implementation -1. **Write failing test** (if TDD mode or test framework available): - ```bash - # Test must fail, proving the bug exists - guard --layer - ``` +### Prove-It Pattern (mandatory for bug fixes) -2. **Implement single fix** — address root cause, ONE change, no "while I'm here" improvements. - **No bundling:** Do NOT fix multiple things at once. If you're tempted to "also fix this other thing", STOP — commit the single fix first, verify, then address the next issue separately. +1. **Write reproduction test** — a test that demonstrates the bug (MUST FAIL) +2. **Confirm RED** — run the test, verify it actually fails. If it passes, your test doesn't reproduce the bug +3. **Fix root cause** — implement the fix (not a workaround) +4. **Confirm GREEN** — run the test, verify it now passes +5. **Run full suite** — check for regressions: ` guard` -3. **Verify fix:** - ```bash - guard - ``` +**If the test passes on step 2, your test is wrong.** Go back to Phase 1 and refine your understanding of the bug. + +### Fix Discipline + +- **Implement single fix** — address root cause, ONE change, no "while I'm here" improvements. + **No bundling:** Do NOT fix multiple things at once. 
If you're tempted to "also fix this other thing", STOP — commit the single fix first, verify, then address the next issue separately. 4. **If fix doesn't work — failure escalation:** @@ -148,6 +149,7 @@ END | "Need more context" | You have tools. Search first, ask only what's truly unavailable | | "Suggest handling manually" | This is your bug. Own it. Exhaust all options first | | Same logic, different parameters | Tweaking parameters is NOT a different approach. Change the method. | +| "Bug is too simple for a test" | Simple bugs regress. The test takes 2 minutes. The re-diagnosis takes 2 hours. | ## Quick Reference @@ -157,7 +159,7 @@ END | 1.5 RP Investigate | context_builder(question) with symptoms + hypotheses | Cross-file context gathered (or skipped if no RP) | | 2. Pattern | Find working examples, compare differences | Identified the delta | | 3. Hypothesis | Form theory, test ONE variable | Confirmed or new hypothesis | -| 4. Implement | Write test, fix root cause, verify | Bug resolved, guards pass | +| 4. 
Implement | Prove-It: RED test → fix root cause → GREEN test → full suite | Bug resolved, guards pass | ## After Fix From 4e62c55133fa02e835aecde5c4579100c92971d8 Mon Sep 17 00:00:00 2001 From: z23cc Date: Tue, 7 Apr 2026 05:15:40 +0800 Subject: [PATCH 4/9] docs: add reference checklists for testing, security, performance - testing-patterns.md: AAA pattern, mock boundaries, Rust test patterns, anti-patterns, prove-it workflow - security-checklist.md: OWASP Top 10, secret scanning, cargo audit, SQL safety, input validation - performance-checklist.md: cargo bench/flamegraph, DB anti-patterns, async/await patterns, tokio runtime Task: fn-136-borrow-agent-skills-patterns-anti.2 --- docs/references/performance-checklist.md | 152 +++++++++++++++++++++ docs/references/security-checklist.md | 141 +++++++++++++++++++ docs/references/testing-patterns.md | 167 +++++++++++++++++++++++ 3 files changed, 460 insertions(+) create mode 100644 docs/references/performance-checklist.md create mode 100644 docs/references/security-checklist.md create mode 100644 docs/references/testing-patterns.md diff --git a/docs/references/performance-checklist.md b/docs/references/performance-checklist.md new file mode 100644 index 00000000..936141a5 --- /dev/null +++ b/docs/references/performance-checklist.md @@ -0,0 +1,152 @@ +# Performance Checklist + +Quick reference for performance in flow-code. Rust backend with libSQL, async runtime. 
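Before setting up the tooling below, a std-only probe with `std::time::Instant` can serve as a quick smell test. This is an illustrative sketch, not flow-code code; one-shot wall-clock timings are noisy, so use criterion for real numbers:

```rust
use std::time::Instant;

// Same workload two ways, timed once each. The function names and the
// workload are hypothetical, chosen to mirror the string-building advice below.
fn build_naive(n: u32) -> String {
    let mut s = String::new(); // grows 1, 2, 4, 8... as it reallocates
    for i in 0..n {
        s.push_str(&i.to_string());
    }
    s
}

fn build_preallocated(n: u32) -> String {
    // Rough upper bound on final length (at most 5 digits per number here),
    // which avoids repeated reallocation.
    let mut s = String::with_capacity(n as usize * 5);
    for i in 0..n {
        s.push_str(&i.to_string());
    }
    s
}

fn main() {
    let t = Instant::now();
    let a = build_naive(10_000);
    let naive = t.elapsed();

    let t = Instant::now();
    let b = build_preallocated(10_000);
    let prealloc = t.elapsed();

    assert_eq!(a, b); // same output either way; only allocation behavior differs
    println!("naive: {naive:?}, preallocated: {prealloc:?}");
}
```

Treat the printed numbers as a hint only; confirm with `cargo bench` under `--release` before claiming an improvement.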
+ +## Measurement Commands + +```bash +# Rust benchmarks (add benches/ directory with criterion) +cargo bench + +# Criterion (statistical benchmarking) +cargo bench --bench my_benchmark + +# Flamegraph (CPU profiling) +cargo install flamegraph +cargo flamegraph --bin flowctl -- status --json + +# Memory profiling (Linux) +valgrind --tool=massif target/release/flowctl status --json + +# Binary size +ls -lh target/release/flowctl +cargo bloat --release --crates + +# Compile time +cargo build --release --timings +``` + +| Tool | What It Measures | When to Use | +|---|---|---| +| `cargo bench` (criterion) | Function-level latency | Before/after optimization | +| `cargo flamegraph` | CPU hotspots | Investigating slow commands | +| `cargo bloat` | Binary size by crate | When binary grows unexpectedly | +| `valgrind --tool=massif` | Heap allocations | Investigating memory usage | +| `hyperfine` | CLI command latency | Comparing command performance | + +## Database Anti-Patterns (libSQL) + +| Anti-Pattern | Impact | Fix | +|---|---|---| +| N+1 queries | Linear DB round-trips | Use JOINs or batch queries | +| Unbounded queries | Memory exhaustion, timeouts | Always use LIMIT, paginate | +| Missing indexes | Slow reads as data grows | Add indexes on filtered/sorted columns | +| No connection pooling | Connection overhead per query | Reuse connections | +| SELECT * | Fetches unused columns | Select only needed columns | +| String concatenation in SQL | Injection risk + no query cache | Use parameterized queries | +| No transactions for batch ops | Partial failures, inconsistency | Wrap in BEGIN/COMMIT | + +## Backend CLI Checklist + +| Area | Target | How to Verify | +|---|---|---| +| Command startup | < 50ms | `hyperfine 'flowctl status --json'` | +| Task list (100 tasks) | < 200ms | Benchmark with test data | +| Database open | < 20ms | Profile with `tracing` spans | +| JSON serialization | < 5ms for typical output | `cargo bench` | +| Binary size | < 20MB release | `ls -lh 
target/release/flowctl` | +| Memory usage (idle) | < 10MB | `valgrind` or `/usr/bin/time -v` | + +## Rust-Specific Performance + +### Async/Await Patterns + +| Do | Don't | +|---|---| +| Use `tokio::spawn` for concurrent I/O | Block the async runtime with sync I/O | +| Use `tokio::task::spawn_blocking` for CPU work | Run CPU-heavy code in async context | +| Use `tokio::join!` for parallel awaits | Await sequentially when independent | +| Use bounded channels for backpressure | Use unbounded channels (memory leak risk) | + +```rust +// Good: parallel async operations +let (tasks, epics) = tokio::join!( + db.list_tasks(), + db.list_epics() +); + +// Bad: sequential when independent +let tasks = db.list_tasks().await?; +let epics = db.list_epics().await?; // Waits unnecessarily +``` + +### Memory Patterns + +| Do | Don't | +|---|---| +| Use `&str` over `String` where possible | Clone strings unnecessarily | +| Use `Cow` for maybe-owned data | Allocate when borrowing suffices | +| Pre-allocate with `Vec::with_capacity` | Push to vec without size hint | +| Use iterators over collecting into vecs | `collect()` when you can chain | +| Use `Arc` for shared read-only data | Clone large structs across threads | + +```rust +// Good: pre-allocate +let mut results = Vec::with_capacity(tasks.len()); + +// Bad: repeated reallocation +let mut results = Vec::new(); // Grows 1, 2, 4, 8... 
+``` + +### String Performance + +| Do | Don't | +|---|---| +| Use `write!` or `format!` once | Repeated `push_str` / `+` concatenation | +| Use `String::with_capacity` for building | Build strings without size hint | +| Return `impl Display` for lazy formatting | Format eagerly when output may not be used | +| Use `serde_json::to_writer` for streaming | Serialize to String then write | + +## Tokio Runtime + +| Setting | Recommendation | +|---|---| +| Runtime flavor | `current_thread` for CLI (lower overhead) | +| Worker threads | Default for server, 1 for CLI tools | +| Blocking threads | Default (512), reduce for CLI | +| Task budget | Let tokio manage; avoid manual yielding | + +```rust +// CLI tool: single-threaded runtime (faster startup) +#[tokio::main(flavor = "current_thread")] +async fn main() { ... } + +// Server: multi-threaded (default) +#[tokio::main] +async fn main() { ... } +``` + +## Performance Anti-Patterns + +| Anti-Pattern | Impact | Fix | +|---|---|---| +| Blocking async runtime | Starves other tasks | Use `spawn_blocking` | +| N+1 database queries | Linear latency growth | Batch or JOIN queries | +| Cloning large structs | Memory + CPU waste | Use references or `Arc` | +| Unbounded collections | Memory exhaustion | Add limits, use streaming | +| Sync file I/O in async | Blocks executor thread | Use `tokio::fs` | +| Excessive logging in hot path | I/O overhead | Use log levels, sample | +| Compiling in debug mode for perf tests | 10-100x slower | Always `--release` | +| Not reusing DB connections | Connection overhead | Use connection pool | +| Serializing then parsing JSON | Double work | Pass typed structs | +| Formatting strings nobody reads | CPU waste | Lazy formatting with `Display` | + +## Profiling Workflow + +| Step | Command | Purpose | +|---|---|---| +| 1. Baseline | `hyperfine 'flowctl status --json'` | Measure current perf | +| 2. Profile | `cargo flamegraph --bin flowctl -- status` | Find hotspots | +| 3. 
Optimize | Edit code targeting hotspot | Reduce cost | +| 4. Verify | Re-run baseline command | Confirm improvement | +| 5. Benchmark | `cargo bench` | Prevent regression | diff --git a/docs/references/security-checklist.md b/docs/references/security-checklist.md new file mode 100644 index 00000000..a44bff1a --- /dev/null +++ b/docs/references/security-checklist.md @@ -0,0 +1,141 @@ +# Security Checklist + +Quick reference for security in flow-code. CLI tool handling local state, no web server. + +## Pre-Commit Secret Scan + +```bash +# Check staged files for secrets before committing +git diff --cached | grep -iE \ + "password\s*=|secret\s*=|api_key\s*=|token\s*=|private_key|BEGIN RSA|BEGIN EC" + +# Patterns to catch +grep -rn "AKIA[0-9A-Z]{16}" # AWS access keys +grep -rn "sk-[a-zA-Z0-9]{48}" # OpenAI API keys +grep -rn "ghp_[a-zA-Z0-9]{36}" # GitHub personal tokens +grep -rn "xoxb-[0-9]{10,}" # Slack bot tokens +``` + +## Files to Never Commit + +| File/Pattern | Reason | Mitigation | +|---|---|---| +| `.env` | Contains secrets | Add to `.gitignore` | +| `*.pem`, `*.key` | Private keys | Add to `.gitignore` | +| `.flow/` | Runtime state (may contain tokens) | Add to `.gitignore` | +| `credentials.json` | Service account keys | Add to `.gitignore` | +| `*.upstream` | Backup files with potential secrets | Add to `.gitignore` | + +Commit `.env.example` with placeholder values instead of `.env`. 
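The staged-diff scan above can be wired into a pre-commit hook. A minimal sketch, assuming a POSIX shell; the pattern list is illustrative and deliberately not exhaustive:

```shell
#!/usr/bin/env sh
# Hypothetical .git/hooks/pre-commit: abort the commit if the staged diff
# matches common secret shapes. Extend the pattern list for your providers.
scan_for_secrets() {
    # Exit 0 (match found) if stdin contains a likely secret.
    grep -qiE 'password[[:space:]]*=|api_key[[:space:]]*=|AKIA[0-9A-Z]{16}|ghp_[A-Za-z0-9]{36}|BEGIN (RSA|EC) PRIVATE KEY'
}

if git diff --cached | scan_for_secrets; then
    echo "Possible secret in staged changes; commit blocked." >&2
    exit 1
fi
```

A hook like this is a safety net, not a guarantee: rotate any credential that reaches a commit, even a blocked one, since it already touched disk.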
+ +## Secrets Management + +| Do | Don't | +|---|---| +| Use environment variables for secrets | Hardcode secrets in source | +| Commit `.env.example` with placeholders | Commit `.env` with real values | +| Use `--json` flag for machine parsing | Parse human-readable output with secrets | +| Rotate secrets after exposure | Ignore leaked credentials | +| Scope API keys to minimum permissions | Use admin keys for all operations | + +## OWASP Top 10 Quick Reference + +| # | Vulnerability | Prevention | +|---|---|---| +| 1 | Broken Access Control | Auth checks on every endpoint, ownership verification | +| 2 | Cryptographic Failures | HTTPS, strong hashing, no secrets in code | +| 3 | Injection | Parameterized queries, input validation | +| 4 | Insecure Design | Threat modeling, spec-driven development | +| 5 | Security Misconfiguration | Minimal permissions, audit deps, secure defaults | +| 6 | Vulnerable Components | `cargo audit`, `cargo deny`, minimal deps | +| 7 | Auth Failures | Strong passwords, rate limiting, session management | +| 8 | Data Integrity Failures | Verify updates/dependencies, signed artifacts | +| 9 | Logging Failures | Log security events, don't log secrets | +| 10 | SSRF | Validate/allowlist URLs, restrict outbound requests | + +## Dependency Audit (Rust) + +```bash +# Audit for known vulnerabilities +cargo audit + +# Deny specific licenses or advisories +cargo deny check + +# Check for outdated dependencies +cargo outdated + +# Minimal dependency tree (fewer deps = smaller attack surface) +cargo tree | wc -l +``` + +| Tool | Purpose | When to Run | +|---|---|---| +| `cargo audit` | CVE database check | Pre-commit, CI | +| `cargo deny` | License + advisory policy | CI pipeline | +| `cargo outdated` | Stale dependency check | Weekly/monthly | +| `cargo tree` | Dependency graph audit | When adding deps | + +## Input Validation + +| Rule | Example | +|---|---| +| Validate at boundaries only | CLI args, JSON input, file reads | +| Use allowlists, 
not denylists | Accept known-good, reject everything else | +| Constrain string lengths | Task titles: 1-500 chars | +| Validate file paths | No `..` traversal, stay within `.flow/` | +| Sanitize output for shells | Escape special chars in generated commands | +| Parse, don't validate | Deserialize into typed structs, not raw strings | + +```rust +// Good: parse into typed struct at boundary +let task: TaskInput = serde_json::from_str(&input)?; + +// Bad: pass raw string around and validate later +fn process(raw: &str) { /* who validates this? */ } +``` + +## Error Handling + +| Context | Approach | +|---|---| +| CLI output (user-facing) | Clear error message, no stack traces | +| JSON output (`--json`) | Structured error: `{"error": "message"}` | +| Internal logging | Full context, file/line, but no secrets | +| Panics | Catch at boundaries with `catch_unwind` or `?` | + +```rust +// Good: generic error to user, detail in logs +eprintln!("Error: failed to open database"); +tracing::error!("DB open failed: {:?} at {:?}", err, path); + +// Bad: expose internals +eprintln!("Error: {}", err); // May leak path, SQL, etc. 
+``` + +## SQL Safety (libSQL) + +| Do | Don't | +|---|---| +| Use parameterized queries | Concatenate user input into SQL | +| Use `?` placeholders | Use `format!()` for query building | +| Validate types before query | Trust raw input as safe | +| Use transactions for multi-step ops | Execute multiple queries without atomicity | + +```rust +// Good: parameterized +conn.execute("INSERT INTO tasks (title) VALUES (?1)", [&title]).await?; + +// Bad: string concatenation +conn.execute(&format!("INSERT INTO tasks (title) VALUES ('{}')", title), ()).await?; +``` + +## File System Security + +| Rule | Rationale | +|---|---| +| Validate paths stay within `.flow/` | Prevent directory traversal | +| Use temp dirs for scratch work | Don't pollute user directories | +| Set restrictive permissions on state files | Protect `.flow/` data | +| Clean up temp files on exit | No stale sensitive data | +| Don't follow symlinks blindly | Potential symlink attacks | diff --git a/docs/references/testing-patterns.md b/docs/references/testing-patterns.md new file mode 100644 index 00000000..8b192a85 --- /dev/null +++ b/docs/references/testing-patterns.md @@ -0,0 +1,167 @@ +# Testing Patterns Reference + +Quick reference for testing patterns in flow-code. Rust backend, Bash scripts, Markdown skills. 
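One seam worth showing up front, since the Mock Boundaries section below lists time as something to mock: pass a clock in rather than calling `SystemTime::now()` inline. Names here are illustrative, not the flow-code API:

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Illustrative seam: the caller supplies "now", so tests can pin time
/// instead of sleeping or depending on the wall clock.
fn is_stale(updated_at: SystemTime, now: impl Fn() -> SystemTime) -> bool {
    now()
        .duration_since(updated_at)
        .map(|age| age > Duration::from_secs(3600))
        .unwrap_or(false)
}

fn main() {
    // A production call site would pass `SystemTime::now`.
    let two_hours_later = || UNIX_EPOCH + Duration::from_secs(7200);
    assert!(is_stale(UNIX_EPOCH, two_hours_later));

    let one_minute_later = || UNIX_EPOCH + Duration::from_secs(60);
    assert!(!is_stale(UNIX_EPOCH, one_minute_later));
    println!("clock-injection tests passed");
}
```

The same shape works for any boundary in the mock table: accept the dependency as a parameter, pass the real implementation in production and a fixed one in tests.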
+ +## Test Pyramid + +| Layer | Coverage | What to Test | Tools | +|-------|----------|--------------|-------| +| Unit | ~80% | Pure functions, logic, parsing, serialization | `cargo test`, `#[test]` | +| Integration | ~15% | DB queries, CLI commands, cross-crate calls | `cargo test --test`, temp dirs | +| E2E | ~5% | Full workflows (smoke tests, ralph loops) | `bash scripts/smoke_test.sh` | + +## Test Structure (Arrange-Act-Assert) + +```rust +#[test] +fn creates_task_with_default_status() { + // Arrange: set up test data + let input = TaskInput { title: "Test".into(), ..Default::default() }; + + // Act: perform the action + let result = create_task(input); + + // Assert: verify outcome + assert_eq!(result.status, Status::Todo); + assert!(!result.id.is_empty()); +} +``` + +## Rust Test Patterns + +| Pattern | Usage | Example | +|---------|-------|---------| +| `#[test]` | Basic unit test | `fn test_parse() { assert_eq!(parse("x"), Ok("x")); }` | +| `#[should_panic]` | Expected panic | `#[should_panic(expected = "empty")] fn test_empty()` | +| `#[tokio::test]` | Async test | `async fn test_db_query() { ... 
}` | +| `assert_eq!` | Equality check | `assert_eq!(actual, expected)` | +| `assert!` | Boolean check | `assert!(result.is_ok())` | +| `assert_ne!` | Inequality check | `assert_ne!(id1, id2)` | +| Test modules | Organize tests | `#[cfg(test)] mod tests { use super::*; }` | +| `#[ignore]` | Skip slow tests | Run with `cargo test -- --ignored` | + +## Running Tests + +```bash +# All tests +cd flowctl && cargo test --all + +# Specific crate +cargo test -p flowctl-core +cargo test -p flowctl-db +cargo test -p flowctl-cli + +# Single test by name +cargo test test_create_task + +# With output +cargo test -- --nocapture + +# Bash smoke/e2e tests +bash scripts/smoke_test.sh +bash scripts/ci_test.sh +bash scripts/teams_e2e_test.sh +``` + +## Mock Boundaries + +| Mock These | Don't Mock These | +|------------|-----------------| +| Database calls (libSQL) | Internal utility functions | +| HTTP/network requests | Business logic | +| File system operations | Data transformations | +| External API calls | Validation functions | +| Time/Date (`SystemTime`) | Pure functions | +| Environment variables | Parsing/serialization | + +## Prove-It Pattern (Bug Fixes) + +| Step | Action | Gate | +|------|--------|------| +| 1. Red | Write a test that reproduces the bug | Test MUST fail | +| 2. Confirm | Run test suite, verify only new test fails | No other regressions | +| 3. Fix | Implement the minimal fix | Keep scope tight | +| 4. Green | Run test suite, verify all tests pass | New test now passes | +| 5. Commit | Include test + fix in same commit | Evidence-based fix | + +## Test Naming Conventions + +```rust +#[cfg(test)] +mod tests { + use super::*; + + // Pattern: action_condition_expected_result + #[test] + fn parse_valid_input_returns_ok() { ... } + + #[test] + fn parse_empty_string_returns_error() { ... } + + #[test] + fn create_task_assigns_unique_id() { ... 
} +} +``` + +## Integration Test Structure + +```rust +// tests/integration_test.rs (separate file in tests/ directory) +use tempfile::TempDir; + +#[tokio::test] +async fn full_task_lifecycle() { + // Setup: temp directory for .flow/ state + let tmp = TempDir::new().unwrap(); + let db = Database::open(tmp.path()).await.unwrap(); + + // Create + let task = db.create_task("Test task").await.unwrap(); + assert_eq!(task.status, "todo"); + + // Transition + db.start_task(&task.id).await.unwrap(); + let updated = db.get_task(&task.id).await.unwrap(); + assert_eq!(updated.status, "in_progress"); + + // Complete + db.complete_task(&task.id, "Done").await.unwrap(); + let done = db.get_task(&task.id).await.unwrap(); + assert_eq!(done.status, "done"); +} +``` + +## Bash Test Patterns + +```bash +# Temp directory isolation +TMPDIR=$(mktemp -d) +trap "rm -rf $TMPDIR" EXIT + +# Assert command succeeds +$FLOWCTL create-epic "test" --json || { echo "FAIL: create-epic"; exit 1; } + +# Assert output contains expected string +OUTPUT=$($FLOWCTL show fn-1 --json) +echo "$OUTPUT" | grep -q '"status":"todo"' || { echo "FAIL: status"; exit 1; } + +# Assert command fails (expected) +if $FLOWCTL done nonexistent 2>/dev/null; then + echo "FAIL: should have failed"; exit 1 +fi +``` + +## Test Anti-Patterns + +| Anti-Pattern | Problem | Better Approach | +|---|---|---| +| Testing implementation details | Breaks on refactor | Test inputs/outputs only | +| Shared mutable state between tests | Tests pollute each other | Use temp dirs, fresh DB per test | +| No assertions (test runs = passes) | False confidence | Every test needs `assert!` | +| Hardcoded paths in tests | Breaks on other machines | Use `TempDir`, relative paths | +| Skipping tests to pass CI | Hides real bugs | Fix or delete the test | +| Testing third-party code | Wastes time | Mock the boundary | +| Overly broad assertions (`is_ok()`) | Doesn't catch regressions | Assert specific values | +| Giant test functions (50+ lines) | 
Hard to diagnose failures | One assertion per concept | +| Test depends on execution order | Flaky in parallel | Each test self-contained | +| Sleeping for async (`thread::sleep`) | Slow and flaky | Use proper async awaiting | From d5c154989555ab46467765da17ff1eef883fba7c Mon Sep 17 00:00:00 2001 From: z23cc Date: Tue, 7 Apr 2026 05:16:12 +0800 Subject: [PATCH 5/9] hooks: add simplify-ignore hook for code block protection - Annotation-based protection: simplify-ignore-start/end markers - Replaces blocks with BLOCK_ placeholders before model reads - Expands placeholders back on edit/write, restores on stop - Supports //, /* */, #, comment styles - Opt-in only via .claude/settings.json (not hooks.json) - Documentation in hooks/SIMPLIFY-IGNORE.md Task: fn-136-borrow-agent-skills-patterns-anti.8 --- agents/cross-model-reviewer.md | 43 +++++- agents/quality-auditor.md | 89 ++++++++----- hooks/SIMPLIFY-IGNORE.md | 89 +++++++++++++ hooks/simplify-ignore.sh | 234 +++++++++++++++++++++++++++++++++ 4 files changed, 418 insertions(+), 37 deletions(-) create mode 100644 hooks/SIMPLIFY-IGNORE.md create mode 100755 hooks/simplify-ignore.sh diff --git a/agents/cross-model-reviewer.md b/agents/cross-model-reviewer.md index f58e4cdb..e057f914 100644 --- a/agents/cross-model-reviewer.md +++ b/agents/cross-model-reviewer.md @@ -63,11 +63,50 @@ The `flowctl_review` MCP tool exposes cross-model review: ## Review Types ### ReviewFinding -Individual issue with severity (critical/warning/info), category, description, and optional file/line. +Individual issue with severity, category, dimension, description, and optional file/line. 
+ +**Severity classification** (aligned with quality-auditor dimensions): + +| Severity | Meaning | Ship impact | +|----------|---------|-------------| +| `Critical` | Correctness failure, security vulnerability, data loss risk | Blocks ship | +| `Important` | Architecture violation, readability problem, missing test coverage | Must fix before/shortly after ship | +| `Suggestion` | Performance improvement, naming nit, minor simplification | Optional | + +**Dimension tags** — each finding maps to one of five review dimensions: +- **Correctness** — edge cases, race conditions, state inconsistencies, off-by-one +- **Readability** — naming consistency, control flow clarity, module organization +- **Architecture** — pattern adherence, module boundaries, abstraction levels, dependency direction +- **Security** — injection, auth, data exposure, dependencies +- **Performance** — N+1, unbounded loops, blocking operations + +### Structured Output Format + +Each model's review MUST produce findings in this structure: + +```json +{ + "verdict": "SHIP | NEEDS_WORK | ABSTAIN", + "confidence": 0.0-1.0, + "findings": [ + { + "severity": "Critical | Important | Suggestion", + "dimension": "Correctness | Readability | Architecture | Security | Performance", + "file": "path/to/file.rs", + "line": 42, + "description": "What is wrong", + "suggestion": "How to fix it" + } + ], + "positives": ["At least one positive observation"] +} +``` + +The consensus algorithm uses severity to weight disagreements: a `Critical` finding from any model blocks SHIP regardless of the other model's verdict. `Suggestion`-only findings do not block. 
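The blocking rule can be sketched as a small function. Type and field names are illustrative, not the actual flowctl types:

```rust
#[derive(Clone, Copy, PartialEq)]
enum Severity { Critical, Important, Suggestion }

#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict { Ship, NeedsWork, Abstain }

struct Review { verdict: Verdict, findings: Vec<Severity> }

/// Sketch of the weighting described above: ABSTAIN reviews are excluded,
/// any Critical finding blocks SHIP, and Suggestion-only findings never block.
fn consensus(reviews: &[Review]) -> Verdict {
    let active: Vec<&Review> = reviews
        .iter()
        .filter(|r| r.verdict != Verdict::Abstain)
        .collect();
    if active.is_empty() {
        return Verdict::Abstain; // no model could judge
    }
    if active.iter().any(|r| r.findings.contains(&Severity::Critical)) {
        return Verdict::NeedsWork; // one Critical overrides any SHIP verdict
    }
    let blocks = |r: &&&Review| {
        r.verdict == Verdict::NeedsWork
            && r.findings.iter().any(|f| *f != Severity::Suggestion)
    };
    if active.iter().any(|r| blocks(&r)) { Verdict::NeedsWork } else { Verdict::Ship }
}

fn main() {
    // A SHIP verdict from one model does not save a Critical finding from the other.
    let reviews = [
        Review { verdict: Verdict::Ship, findings: vec![Severity::Suggestion] },
        Review { verdict: Verdict::Ship, findings: vec![Severity::Critical] },
    ];
    assert_eq!(consensus(&reviews), Verdict::NeedsWork);
    println!("critical finding from any model blocks ship");
}
```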
### ReviewVerdict - **SHIP**: Code is ready -- **NEEDS_WORK**: Code needs fixes +- **NEEDS_WORK**: Code needs fixes (at least one Critical or multiple Important findings) - **ABSTAIN**: Model cannot determine (excluded from consensus) ### ConsensusResult diff --git a/agents/quality-auditor.md b/agents/quality-auditor.md index ebd3eaba..3e21b01c 100644 --- a/agents/quality-auditor.md +++ b/agents/quality-auditor.md @@ -17,7 +17,7 @@ You're invoked after implementation, before shipping. Review the changes and fla ## Audit Strategy -### 1. Get the Diff +### Step 0. Get the Diff + Quick Scan ```bash # What changed? git diff main --stat @@ -27,72 +27,91 @@ git diff main --name-only git diff main ``` -### 2. Quick Scan (find obvious issues fast) +Scan the diff for obvious issues before deep review: - **Secrets**: API keys, passwords, tokens in code - **Debug code**: console.log, debugger, TODO/FIXME - **Commented code**: Dead code that should be deleted - **Large files**: Accidentally committed binaries, logs -### 3. Correctness Review -- Does the code match the stated intent? -- Are there off-by-one errors, wrong operators, inverted conditions? -- Do error paths actually handle errors? -- Are promises/async properly awaited? +### Five-Dimension Review + +Evaluate every change across all five dimensions. Each dimension produces findings tagged with severity (see Output Format). -### 4. Security Scan +### 1. Correctness +- Does the code match the stated intent? +- Edge cases: off-by-one errors, wrong operators, inverted conditions +- Race conditions and state inconsistencies in concurrent code +- Error paths: do they actually handle errors or silently swallow? +- Async correctness: are promises/async properly awaited? +- Are new code paths tested? Do tests assert behavior (not just run)? +- Are error paths and edge cases from gap analysis covered? + +### 2. Readability +- Naming consistency: do new names follow existing conventions? 
+- Control flow clarity: can you follow the logic without mental gymnastics? +- Module organization: are things in the right files/directories? +- Could this be simpler? Is there duplicated code that should be extracted? +- Are there unnecessary abstractions or over-engineering for hypothetical futures? + +### 3. Architecture +- Pattern adherence: does the change follow established project patterns? +- Module boundaries: are concerns properly separated? +- Abstraction levels: is the right level of abstraction used? +- Dependency direction: do dependencies flow in the correct direction? + +### 4. Security - **Injection**: SQL, XSS, command injection vectors - **Auth/AuthZ**: Are permissions checked? Can they be bypassed? - **Data exposure**: Is sensitive data logged, leaked, or over-exposed? - **Dependencies**: Any known vulnerable packages added? -### 5. Simplicity Check -- Could this be simpler? -- Is there duplicated code that should be extracted? -- Are there unnecessary abstractions? -- Over-engineering for hypothetical future needs? - -### 6. Test Coverage -- Are new code paths tested? -- Do tests actually assert behavior (not just run)? -- Are edge cases from gap analysis covered? -- Are error paths tested? - -### 7. Performance Red Flags -- N+1 queries or O(n²) loops -- Unbounded data fetching -- Missing pagination/limits +### 5. Performance +- N+1 queries or O(n^2) loops +- Unbounded data fetching, missing pagination/limits - Blocking operations on hot paths +- Resource leaks (unclosed handles, missing cleanup) ## Output Format +Every finding MUST carry a severity prefix. Use exactly these four levels: + +- **`Critical:`** — Blocks ship. Could cause outage, data loss, security breach, or correctness failure. +- **`Important:`** — Must fix before or shortly after ship. Significant quality, readability, or architecture issue. +- **`Nit:`** — Optional improvement. Style, naming, minor simplification. +- **`FYI`** — Informational. 
Context for the author, no action required. + +Every review MUST include a "What's Good" section with at least one positive observation. Acknowledge patterns followed, good design decisions, thorough error handling, or clean naming. + ```markdown ## Quality Audit: [Branch/Feature] ### Summary - Files changed: N +- Dimensions reviewed: Correctness, Readability, Architecture, Security, Performance - Risk level: Low / Medium / High - Ship recommendation: ✅ Ship / ⚠️ Fix first / ❌ Major rework -### Critical (MUST fix before shipping) -- **[File:line]**: [Issue] +### Critical (blocks ship) +- **Critical:** [File:line] — [Issue] (Dimension: Correctness/Security/etc.) - Risk: [What could go wrong] - Fix: [Specific suggestion] -### Should Fix (High priority) -- **[File:line]**: [Issue] - - [Brief fix suggestion] +### Important (must fix) +- **Important:** [File:line] — [Issue] (Dimension: ...) + - Fix: [Brief fix suggestion] -### Consider (Nice to have) -- [Minor improvement suggestion] +### Nit (optional) +- **Nit:** [File:line] — [Improvement suggestion] + +### FYI +- **FYI** [Informational observation] ### Test Gaps - [ ] [Untested scenario] -### Security Notes -- [Any security observations] - ### What's Good -- [Positive observations - patterns followed, good decisions] +- [At least one positive observation — patterns followed, good decisions, clean code] +- [Additional positives as warranted] ``` ## Rules diff --git a/hooks/SIMPLIFY-IGNORE.md b/hooks/SIMPLIFY-IGNORE.md new file mode 100644 index 00000000..ead8ab69 --- /dev/null +++ b/hooks/SIMPLIFY-IGNORE.md @@ -0,0 +1,89 @@ +# simplify-ignore hook + +Annotation-based code protection for Claude Code. Mark code blocks with `simplify-ignore-start` / `simplify-ignore-end` annotations and the hook replaces them with `BLOCK_` placeholders before the model sees them. On edit or write, placeholders are expanded back to the original code. On session stop, all files are fully restored. 
+ +## Setup + +Add to your project's `.claude/settings.json` (NOT `hooks/hooks.json` -- this is opt-in per project): + +```json +{ + "hooks": { + "PreToolUse": [ + { + "matcher": "Read", + "hooks": [{ "type": "command", "command": "bash ${CLAUDE_PROJECT_DIR}/hooks/simplify-ignore.sh" }] + } + ], + "PostToolUse": [ + { + "matcher": "Edit|Write", + "hooks": [{ "type": "command", "command": "bash ${CLAUDE_PROJECT_DIR}/hooks/simplify-ignore.sh" }] + } + ], + "Stop": [ + { + "hooks": [{ "type": "command", "command": "bash ${CLAUDE_PROJECT_DIR}/hooks/simplify-ignore.sh" }] + } + ] + } +} +``` + +Add `.claude/.simplify-ignore-cache/` to your `.gitignore`. + +## Annotation syntax + +```js +/* simplify-ignore-start: perf-critical */ +// manually unrolled XOR -- 3x faster than a loop +result[0] = buf[0] ^ key[0]; +result[1] = buf[1] ^ key[1]; +/* simplify-ignore-end */ +``` + +The reason after the colon is optional. It appears in the placeholder (`BLOCK_a1b2c3d4: perf-critical`) so the model knows why the block is hidden. + +## Supported comment styles + +| Style | Example | +|-------|---------| +| `/* */` | `/* simplify-ignore-start: reason */` | +| `//` | `// simplify-ignore-start: reason` | +| `#` | `# simplify-ignore-start: reason` | +| `<!-- -->` | `<!-- simplify-ignore-start: reason -->` | + +Multiple blocks per file and single-line blocks (start + end on same line) are supported. Placeholders preserve the original comment prefix/suffix. + +## How it works + +| Hook event | Action | +|------------|--------| +| PreToolUse Read | Backs up file, replaces annotated blocks with `BLOCK_` placeholders | +| PostToolUse Edit/Write | Expands placeholders back, saves model changes, re-filters | +| Stop | Restores all files from backup | + +Block hashes are 8 hex chars from `shasum`/`sha1sum` of the block content, making round-trips unambiguous. + +## Crash recovery + +If Claude Code exits without triggering the Stop hook, files may still have `BLOCK_` placeholders.
Restore manually: + +```bash +echo '{}' | bash hooks/simplify-ignore.sh +``` + +Backups are in `.claude/.simplify-ignore-cache/` within your project directory. + +## Known limitations + +- **Single-line blocks hide the entire line.** Use dedicated lines for annotations. +- **Suffix detection covers `*/` and `-->` only.** Non-standard template comment closers (ERB, Blade) may not work; use `#` or `//` instead. +- **Fuzzy fallback on altered placeholders.** If the model changes a placeholder's reason text, the hook tries a hash-only match which may leave cosmetic debris. +- **File renaming leaves placeholders.** Renamed files retain placeholders; originals are saved as `.recovered` on stop. + +## Requirements + +- `jq` +- `shasum` or `sha1sum` (auto-detected) +- Bash 3.2+ diff --git a/hooks/simplify-ignore.sh b/hooks/simplify-ignore.sh new file mode 100755 index 00000000..fadd283c --- /dev/null +++ b/hooks/simplify-ignore.sh @@ -0,0 +1,234 @@ +#!/bin/bash +# simplify-ignore.sh — Opt-in hook for annotation-based code protection +# +# PreToolUse Read → backs up file, replaces annotated blocks with BLOCK_ placeholders +# PostToolUse Edit → expands placeholders back to real code, re-filters +# PostToolUse Write → expands placeholders back to real code, re-filters +# Stop → restores all files from backup (crash recovery) +# +# Dependencies: jq, shasum or sha1sum, Bash 3.2+ + +set -euo pipefail + +if ! 
command -v jq >/dev/null 2>&1; then + printf '%s\n' "error: simplify-ignore requires jq" >&2; exit 1 +fi + +CACHE="${CLAUDE_PROJECT_DIR:-.}/.claude/.simplify-ignore-cache" +if [ -t 0 ]; then INPUT="{}"; else INPUT=$(cat); fi + +# Parse hook input +TOOL_NAME=$(printf '%s' "$INPUT" | jq -r '.tool_name // empty' 2>/dev/null) || TOOL_NAME="" +FILE_PATH=$(printf '%s' "$INPUT" | jq -r '.tool_input.file_path // empty' 2>/dev/null) || FILE_PATH="" + +# ── Hash helpers ───────────────────────────────────────────────────────────── +hash_cmd() { + if command -v shasum >/dev/null 2>&1; then shasum + elif command -v sha1sum >/dev/null 2>&1; then sha1sum + else printf '%s\n' "error: missing shasum or sha1sum" >&2; exit 1; fi +} +file_id() { printf '%s' "$1" | hash_cmd | cut -c1-16; } +block_hash() { printf '%s' "$1" | hash_cmd | cut -c1-8; } + +# Escape glob metacharacters for Bash 3.2 parameter expansion +escape_glob() { + local s="$1" + s="${s//\\/\\\\}" + s="${s//\*/\\*}" + s="${s//\?/\\?}" + s="${s//\[/\\[}" + printf '%s' "$s" +} + +# ── filter_file: replace simplify-ignore blocks with BLOCK_ placeholders +# Reads $1 (source), writes filtered version to $2, saves blocks to cache. +# Returns 0 if blocks found, 1 if none. 
+filter_file() { + local src="$1" dest="$2" fid="$3" + : > "$dest" + rm -f "$CACHE/${fid}".block.* "$CACHE/${fid}".reason.* "$CACHE/${fid}".prefix.* "$CACHE/${fid}".suffix.* + + local count=0 in_block=0 buf="" reason="" prefix="" suffix="" + + while IFS= read -r line || [ -n "$line" ]; do + # Detect start marker + if [ $in_block -eq 0 ]; then + case "$line" in *simplify-ignore-start*) + in_block=1 + buf="$line" + prefix="${line%%simplify-ignore-start*}" + suffix="" + case "$line" in *'*/'*) suffix=" */" ;; *'-->'*) suffix=" -->" ;; esac + reason=$(printf '%s' "$line" | sed -n 's/.*simplify-ignore-start:[[:space:]]*//p' \ + | sed 's/[[:space:]]*\*\/.*$//' | sed 's/[[:space:]]*-->.*$//' | sed 's/[[:space:]]*$//') + # Handle single-line block (start + end on same line) + case "$line" in *simplify-ignore-end*) + in_block=0 + local h; h=$(block_hash "$buf") + count=$((count + 1)) + printf '%s' "$buf" > "$CACHE/${fid}.block.${h}" + [ -n "$reason" ] && printf '%s' "$reason" > "$CACHE/${fid}.reason.${h}" + printf '%s' "$prefix" > "$CACHE/${fid}.prefix.${h}" + printf '%s' "$suffix" > "$CACHE/${fid}.suffix.${h}" + if [ -n "$reason" ]; then + printf '%s\n' "${prefix}BLOCK_${h}: ${reason}${suffix}" >> "$dest" + else + printf '%s\n' "${prefix}BLOCK_${h}${suffix}" >> "$dest" + fi + buf=""; reason=""; prefix=""; suffix="" + continue + ;; *) + continue + ;; + esac + ;; esac + fi + # Accumulate block content + if [ $in_block -eq 1 ]; then + buf="${buf} +${line}" + fi + # Detect end marker + case "$line" in *simplify-ignore-end*) + if [ $in_block -eq 1 ]; then + local h; h=$(block_hash "$buf") + count=$((count + 1)) + printf '%s' "$buf" > "$CACHE/${fid}.block.${h}" + [ -n "$reason" ] && printf '%s' "$reason" > "$CACHE/${fid}.reason.${h}" + printf '%s' "$prefix" > "$CACHE/${fid}.prefix.${h}" + printf '%s' "$suffix" > "$CACHE/${fid}.suffix.${h}" + if [ -n "$reason" ]; then + printf '%s\n' "${prefix}BLOCK_${h}: ${reason}${suffix}" >> "$dest" + else + printf '%s\n' 
"${prefix}BLOCK_${h}${suffix}" >> "$dest" + fi + in_block=0; buf=""; reason=""; prefix=""; suffix="" + continue + fi + ;; + esac + [ $in_block -eq 0 ] && printf '%s\n' "$line" >> "$dest" + done < "$src" + + # Unclosed block: flush as-is with warning + if [ $in_block -eq 1 ] && [ -n "$buf" ]; then + printf 'Warning: unclosed simplify-ignore-start in %s\n' "$src" >&2 + printf '%s\n' "$buf" >> "$dest" + fi + + [ $count -gt 0 ] && return 0 || return 1 +} + +# ── Stop: restore all files from backup ────────────────────────────────────── +if [ -z "$TOOL_NAME" ]; then + [ -d "$CACHE" ] || exit 0 + for bak in "$CACHE"/*.bak; do + [ -f "$bak" ] || continue + fid="${bak##*/}"; fid="${fid%.bak}" + pathfile="$CACHE/${fid}.path" + [ -f "$pathfile" ] || { rm -f "$bak"; continue; } + orig=$(cat "$pathfile") + if [ -f "$orig" ]; then + cat "$bak" > "$orig" + else + mkdir -p "$(dirname "${orig}.recovered")" + mv "$bak" "${orig}.recovered" + printf 'Warning: %s moved/deleted. Recovered to %s.recovered\n' "$orig" "$orig" >&2 + fi + rm -f "$bak" "$pathfile" "$CACHE/${fid}".block.* "$CACHE/${fid}".reason.* \ + "$CACHE/${fid}".prefix.* "$CACHE/${fid}".suffix.* + done + exit 0 +fi + +[ -z "$FILE_PATH" ] && exit 0 + +# ── PreToolUse Read: filter in-place ───────────────────────────────────────── +if [ "$TOOL_NAME" = "Read" ]; then + [ -f "$FILE_PATH" ] || exit 0 + # Don't filter our own files + case "$(basename "$FILE_PATH")" in simplify-ignore*|SIMPLIFY-IGNORE*) exit 0 ;; esac + + mkdir -p "$CACHE" + ID=$(file_id "$FILE_PATH") + + # Already filtered + [ -f "$CACHE/${ID}.bak" ] && exit 0 + + # No annotations in this file + grep -q 'simplify-ignore-start' -- "$FILE_PATH" || exit 0 + + # Back up original + cp -p "$FILE_PATH" "$CACHE/${ID}.bak" 2>/dev/null || cp "$FILE_PATH" "$CACHE/${ID}.bak" + printf '%s' "$FILE_PATH" > "$CACHE/${ID}.path" + + # Filter in-place + FILTERED="$CACHE/${ID}.$$.tmp" + rm -f "$FILTERED" + if filter_file "$FILE_PATH" "$FILTERED" "$ID"; then + cat "$FILTERED" > 
"$FILE_PATH" + rm -f "$FILTERED" + else + # No blocks found (race condition) - clean up + rm -f "$FILTERED" "$CACHE/${ID}.bak" "$CACHE/${ID}.path" + fi + exit 0 +fi + +# ── PostToolUse Edit|Write: expand placeholders, then re-filter ────────────── +if [ "$TOOL_NAME" = "Edit" ] || [ "$TOOL_NAME" = "Write" ]; then + ID=$(file_id "$FILE_PATH") + [ -f "$CACHE/${ID}.bak" ] || exit 0 + ls "$CACHE/${ID}".block.* >/dev/null 2>&1 || exit 0 + + # Expand placeholders back to original blocks + EXPANDED="$CACHE/${ID}.$$.expanded" + rm -f "$EXPANDED" + while IFS= read -r line || [ -n "$line" ]; do + case "$line" in *BLOCK_*) + for bf in "$CACHE/${ID}".block.*; do + [ -f "$bf" ] || continue + h="${bf##*.}" + case "$line" in *"BLOCK_${h}"*) + bp=""; bs=""; br="" + [ -f "$CACHE/${ID}.prefix.${h}" ] && bp=$(cat "$CACHE/${ID}.prefix.${h}") + [ -f "$CACHE/${ID}.suffix.${h}" ] && bs=$(cat "$CACHE/${ID}.suffix.${h}") + [ -f "$CACHE/${ID}.reason.${h}" ] && br=$(cat "$CACHE/${ID}.reason.${h}") + if [ -n "$br" ]; then + placeholder="${bp}BLOCK_${h}: ${br}${bs}" + else + placeholder="${bp}BLOCK_${h}${bs}" + fi + block_content=$(cat "$bf"; printf x); block_content="${block_content%x}" + esc_placeholder=$(escape_glob "$placeholder") + line="${line//$esc_placeholder/$block_content}" + # Fuzzy fallback if model altered reason text + case "$block_content" in *"BLOCK_${h}"*) ;; *) + case "$line" in *"BLOCK_${h}"*) + printf 'Warning: BLOCK_%s modified by model, using fuzzy match\n' "$h" >&2 + line="${line//BLOCK_${h}/$block_content}" + ;; esac + ;; esac + ;; esac + done + ;; esac + printf '%s\n' "$line" >> "$EXPANDED" + done < "$FILE_PATH" + + # Preserve inode and permissions + cat "$EXPANDED" > "$FILE_PATH" + rm -f "$EXPANDED" + + # Save expanded version as new backup (includes model's changes) + cp "$FILE_PATH" "$CACHE/${ID}.bak" + + # Re-filter in-place so file stays with placeholders on disk + FILTERED="$CACHE/${ID}.$$.tmp" + rm -f "$FILTERED" + if filter_file "$FILE_PATH" "$FILTERED" "$ID"; 
then + cat "$FILTERED" > "$FILE_PATH" + rm -f "$FILTERED" + fi + + exit 0 +fi From d997c7b2628314fd4e4a2b9e147c8c00f6d54635 Mon Sep 17 00:00:00 2001 From: z23cc Date: Tue, 7 Apr 2026 05:19:45 +0800 Subject: [PATCH 6/9] skills: update skill-create to enforce skill-anatomy.md template --- skills/flow-code-skill-create/SKILL.md | 35 ++++++++++++++++++++------ 1 file changed, 28 insertions(+), 7 deletions(-) diff --git a/skills/flow-code-skill-create/SKILL.md b/skills/flow-code-skill-create/SKILL.md index 9d17c044..56e1a085 100644 --- a/skills/flow-code-skill-create/SKILL.md +++ b/skills/flow-code-skill-create/SKILL.md @@ -5,6 +5,8 @@ description: Use when creating new flow-code skills, editing existing skills, or # Creating Flow-Code Skills +**Template spec:** See [docs/skill-anatomy.md](../../docs/skill-anatomy.md) for the full specification. + Writing skills IS Test-Driven Development applied to documentation. If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing. ## When to Create @@ -46,12 +48,27 @@ Core principle in 1-2 sentences. ## When to Use Symptoms, triggers, use cases. +- **When NOT to use:** exclusions to prevent misapplication ## Core Process / Pattern The actual workflow — inline code for simple patterns. -## Common Mistakes -What goes wrong + fixes. +## Common Rationalizations + +| Excuse | Reality | +|--------|---------| +| "Too simple to need this" | Simple things break too | +| "I'll do it properly later" | Later never comes | +| "I already know this" | Knowing != doing consistently | + +## Red Flags +- Observable symptoms indicating the skill is being violated +- Patterns to watch for during self-check and review + +## Verification +After completing the process, confirm: +- [ ] Checklist item with verifiable evidence +- [ ] Another checkpoint (test output, build result, etc.) 
``` ## The Iron Law @@ -92,9 +109,9 @@ NO SKILL WITHOUT A FAILING TEST FIRST - Exclude: workflow summary, process steps, what the skill does - Keywords in body for discovery: error messages, symptoms, tool names -## Bulletproofing Discipline Skills +## Bulletproofing All Skills -For skills that enforce rules (debugging, TDD, verification): +Every skill — not just discipline skills — needs rationalization counters and red flags. Agents rationalize skipping steps in ALL skill types (technique, pattern, and reference). The sections below apply universally. **Close every loophole explicitly:** ```markdown @@ -133,14 +150,18 @@ Write code before test? Delete it. Start over. - [ ] Failure patterns identified **GREEN:** -- [ ] SKILL.md with frontmatter, overview, process, mistakes +- [ ] SKILL.md with frontmatter, overview, process - [ ] Description starts with "Use when..." (no workflow summary) - [ ] Scenario re-run with skill — agent complies +- [ ] "When NOT to use" included in When to Use section +- [ ] Common Rationalizations table present (>=3 entries) +- [ ] Red Flags list present (observable symptoms, not opinions) +- [ ] Verification checklist present (checkbox format) **REFACTOR:** - [ ] New rationalizations countered -- [ ] Rationalization table added (if discipline skill) -- [ ] Red flags list added (if discipline skill) +- [ ] Rationalization table complete (covers all observed agent excuses) +- [ ] Red flags list complete (covers all observed violation patterns) **DEPLOY:** - [ ] Committed to git From 56777d0fb9ec5ad341dde6b4bfd66c4085999822 Mon Sep 17 00:00:00 2001 From: z23cc Date: Tue, 7 Apr 2026 05:21:14 +0800 Subject: [PATCH 7/9] skills: add anti-rationalization sections to 4 core skills Add Common Rationalizations (table format), Red Flags (observable symptoms), and Verification (checkbox checklists) sections to: - flow-code-plan - flow-code-work - flow-code-impl-review - flow-code-epic-review Follows the pattern established in flow-code-debug 
SKILL.md. Task: fn-136-borrow-agent-skills-patterns-anti.3 --- skills/flow-code-epic-review/SKILL.md | 26 +++++++++++++++++++++++++ skills/flow-code-impl-review/SKILL.md | 26 +++++++++++++++++++++++++ skills/flow-code-plan/SKILL.md | 28 +++++++++++++++++++++++++++ skills/flow-code-work/SKILL.md | 27 ++++++++++++++++++++++++++ 4 files changed, 107 insertions(+) diff --git a/skills/flow-code-epic-review/SKILL.md b/skills/flow-code-epic-review/SKILL.md index 4ed8474a..01f7096e 100644 --- a/skills/flow-code-epic-review/SKILL.md +++ b/skills/flow-code-epic-review/SKILL.md @@ -188,3 +188,29 @@ Epic review passed. Next: 2) Ship the code: push + create PR 3) Start next epic: `/flow-code:plan ` ``` + +## Common Rationalizations + +| Excuse | Reality | +|--------|---------| +| "All tasks passed, epic must be fine" | Task-level success ≠ epic-level coherence. Integration gaps hide between tasks | +| "Tests pass, ship it" | Tests verify code behavior, not user value. Goal-backward check verifies intent | +| "We already reviewed each task" | Per-task review misses cross-task consistency and architectural drift | +| "The user will catch any issues" | Your job is to catch issues before the user does | +| "Good enough for now" | "Good enough" compounds into technical debt. 
Ship complete or scope down | + +## Red Flags + +- All acceptance criteria marked "Yes" but no test evidence provided +- Implementation diverged from epic plan (compare git diff vs spec scope) +- Goal-backward verification skipped or done superficially +- Epic has open required gaps but no justification for proceeding +- No cross-task integration verification performed + +## Verification + +- [ ] Every acceptance criterion verified with evidence (test output, screenshot, or manual check) +- [ ] Goal-backward verification completed for each criterion +- [ ] No open required gaps in gap registry +- [ ] Epic spec updated if scope changed during implementation +- [ ] Cross-task consistency verified (no conflicting changes between tasks) diff --git a/skills/flow-code-impl-review/SKILL.md b/skills/flow-code-impl-review/SKILL.md index 149a6d41..0df30abe 100644 --- a/skills/flow-code-impl-review/SKILL.md +++ b/skills/flow-code-impl-review/SKILL.md @@ -155,3 +155,29 @@ If verdict is NEEDS_WORK, loop internally until SHIP: 5. **Repeat** until `SHIP` **CRITICAL**: For RP, re-reviews must stay in the SAME chat so reviewer has context. Only use `--new-chat` on the FIRST review. + +## Common Rationalizations + +| Excuse | Reality | +|--------|---------| +| "Reviewer said fix, so it must be right" | Review feedback needs technical verification too. Reviewers can be wrong | +| "This is just cosmetic" | 3 cosmetic issues often signal a structural problem underneath | +| "3 iterations is overkill for this" | The circuit breaker exists for a reason. 
If you hit it, the plan was wrong | +| "I already fixed the important ones" | "Important" is defined by severity, not your estimate of effort | +| "The code works, reviewer is being pedantic" | Working code is the minimum bar, not the finish line | + +## Red Flags + +- Same issue appears across 3+ review iterations (underlying design problem) +- Fix introduces a new Critical finding +- SHIP verdict issued with zero test evidence +- Review only checks code quality but ignores spec compliance +- ≥3 Critical findings in a single review pass + +## Verification + +- [ ] All Critical findings addressed with specific fixes +- [ ] No fix introduced new Critical or Important issues +- [ ] Tests still pass after all fixes applied +- [ ] Review receipt saved to .flow/reviews/ +- [ ] Acceptance criteria verified (not just code quality) diff --git a/skills/flow-code-plan/SKILL.md b/skills/flow-code-plan/SKILL.md index b01eeff0..cb237b15 100644 --- a/skills/flow-code-plan/SKILL.md +++ b/skills/flow-code-plan/SKILL.md @@ -142,3 +142,31 @@ All plans go into `.flow/`: **After work completes** (if auto-executed): - All tasks done → Layer 3 adversarial review runs automatically (Phase 3j) - Then auto push + draft PR (Phase 5) + +## Common Rationalizations + +| Excuse | Reality | +|--------|---------| +| "This is obvious, skip scouts" | Scouts find patterns and reuse points you don't know about | +| "Only one task, no DAG needed" | Even single tasks need dependency analysis and acceptance criteria | +| "Requirements are clear, skip interview" | Your understanding ≠ user's intent. A 2-minute check prevents 2-hour rework | +| "This is too small to plan" | Small plans still need acceptance criteria. A 3-line spec is fine | +| "I'll refine the plan during implementation" | Implementation drift is the #1 cause of rework. 
Plan now | + +## Red Flags + +- Any task estimated >L size without being split +- Scout returns "Cannot analyze" but planning continues without investigation +- Task title contains "and" (likely two tasks combined) +- No acceptance criteria on any task +- Plan has zero file references from repo-scout +- 7+ tasks in a single epic (possible over-splitting) + +## Verification + +- [ ] Every task has ≥1 testable acceptance criterion +- [ ] Every task is ≤M size (L tasks split) +- [ ] File refs from repo-scout included in task specs +- [ ] DAG validated (`flowctl validate`) +- [ ] Quick commands section exists in epic spec +- [ ] No task title contains "and" without justification diff --git a/skills/flow-code-work/SKILL.md b/skills/flow-code-work/SKILL.md index aa6ccf0c..3df07e43 100644 --- a/skills/flow-code-work/SKILL.md +++ b/skills/flow-code-work/SKILL.md @@ -136,3 +136,30 @@ $FLOWCTL restart --force - Don't leave tasks half-done - Never use TodoWrite for task tracking - Never create plan files outside `.flow/` + +## Common Rationalizations + +| Excuse | Reality | +|--------|---------| +| "Spec is good enough, skip re-anchor" | Specs change during plan-review. Always re-read before implementing | +| "Guard passed, definitely no issues" | Guards may not cover your specific change. Manual verification still needed | +| "Small change, no test needed" | Small changes are the #1 source of regressions | +| "I'll test after all tasks are done" | Cross-task integration bugs are 10x harder to diagnose than per-task bugs | +| "One more attempt will fix it" | 3+ failed attempts = architectural problem. 
Escalate, don't retry | + +## Red Flags + +- ≥2 tasks failed in the same wave (systemic issue, stop and investigate) +- Same file needs 3+ fixes across iterations +- Guard passes but manual test reveals broken behavior +- Worker reports "spec unclear" but continues implementing +- >100 lines changed without running any tests +- Wave checkpoint skipped or results not reviewed + +## Verification + +- [ ] All tasks in epic are done or explicitly skipped +- [ ] Guards pass on final state (`flowctl guard`) +- [ ] Evidence files exist for each completed task +- [ ] No unresolved spec conflicts +- [ ] Wave checkpoints completed for each wave From ddb796bf5055bef74b264338b51c25cbf82c1b27 Mon Sep 17 00:00:00 2001 From: z23cc Date: Tue, 7 Apr 2026 05:26:57 +0800 Subject: [PATCH 8/9] fix(hooks): address adversarial review findings in simplify-ignore - Detect missing placeholders before expanding (prevents permanent data loss) - Keep original backup pristine (never overwrite with expanded version) - Require comment prefix for annotations (prevents false triggers in strings/docs) - Document disk-state limitation and missing-placeholder safety --- hooks/SIMPLIFY-IGNORE.md | 3 +++ hooks/simplify-ignore.sh | 28 +++++++++++++++++++++++----- 2 files changed, 26 insertions(+), 5 deletions(-) diff --git a/hooks/SIMPLIFY-IGNORE.md b/hooks/SIMPLIFY-IGNORE.md index ead8ab69..f8fd18c6 100644 --- a/hooks/SIMPLIFY-IGNORE.md +++ b/hooks/SIMPLIFY-IGNORE.md @@ -81,6 +81,9 @@ Backups are in `.claude/.simplify-ignore-cache/` within your project directory. - **Suffix detection covers `*/` and `-->` only.** Non-standard template comment closers (ERB, Blade) may not work; use `#` or `//` instead. - **Fuzzy fallback on altered placeholders.** If the model changes a placeholder's reason text, the hook tries a hash-only match which may leave cosmetic debris. - **File renaming leaves placeholders.** Renamed files retain placeholders; originals are saved as `.recovered` on stop. 
+- **Annotation must be in a comment.** Only lines with `//`, `/*`, `#`, or `<!--` comment prefixes are recognized as annotations, preventing false triggers from `simplify-ignore` text inside strings or documentation.