From 070378ef40ca1dbe35d98a7affd71fae63d8b74b Mon Sep 17 00:00:00 2001 From: Dean Sharon Date: Sat, 21 Mar 2026 01:52:24 +0200 Subject: [PATCH] feat: improve agent quality gates (#120, #121, #124) Simplifier: replace vague clarity bullets with 8 structured slop detection categories (language-behavior tests, redundant type checks, over-defensive handling, debug remnants, commented-out code, unused imports, verbose names, unnecessary intermediates). Scrutinizer: add stub detection responsibility between P0 and P1 evaluation, backed by new stub-detection.md reference covering component stubs, API/service stubs, hook/effect stubs, and wiring gaps. Shepherd: add goal-backward verification (trace from user goals to implementation), artifact depth table (Exists/Substantive/Wired), stub misalignment type, and re-verification section for retry runs. --- CHANGELOG.md | 5 + shared/agents/scrutinizer.md | 10 +- shared/agents/shepherd.md | 38 +++-- shared/agents/simplifier.md | 39 +++-- .../self-review/references/stub-detection.md | 135 ++++++++++++++++++ 5 files changed, 196 insertions(+), 31 deletions(-) create mode 100644 shared/skills/self-review/references/stub-detection.md diff --git a/CHANGELOG.md b/CHANGELOG.md index 7807c732..11f59ec4 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +### Added +- **Simplifier agent** — 8 structured slop detection categories (#120) +- **Scrutinizer agent** — stub detection patterns with reference file (#121) +- **Shepherd agent** — goal-backward verification, artifact depth checking, stub type, re-verification (#124) + --- ## [1.7.0] - 2026-03-20 diff --git a/shared/agents/scrutinizer.md b/shared/agents/scrutinizer.md index dcdaeb2a..1e26154a 100644 --- a/shared/agents/scrutinizer.md +++ b/shared/agents/scrutinizer.md @@ -21,13 +21,15 @@ You receive from orchestrator: 2. **Evaluate P0 pillars** (Design, Functionality, Security): These MUST pass. Fix all issues found. -3. **Evaluate P1 pillars** (Complexity, Error Handling, Tests): These SHOULD pass. Fix all issues found. +3. **Detect stubs and wiring gaps**: Check for placeholder implementations that compile but don't deliver real functionality. See `references/stub-detection.md` for patterns. Flag as P0-Functionality issues. -4. **Evaluate P2 pillars** (Naming, Consistency, Documentation): Report as suggestions. Fix if straightforward. +4. **Evaluate P1 pillars** (Complexity, Error Handling, Tests): These SHOULD pass. Fix all issues found. -5. **Commit fixes**: If any changes were made, create a commit with message "fix: address self-review issues". +5. **Evaluate P2 pillars** (Naming, Consistency, Documentation): Report as suggestions. Fix if straightforward. -6. **Report status**: Return structured report with pillar evaluations and changes made. +6. **Commit fixes**: If any changes were made, create a commit with message "fix: address self-review issues". + +7. **Report status**: Return structured report with pillar evaluations and changes made. ## Principles diff --git a/shared/agents/shepherd.md b/shared/agents/shepherd.md index e79ea280..936daa04 100644 --- a/shared/agents/shepherd.md +++ b/shared/agents/shepherd.md @@ -21,9 +21,19 @@ You receive from orchestrator: 1. **Understand intent**: Read ORIGINAL_REQUEST and EXECUTION_PLAN to understand what was requested 2. **Review implementation**: Read FILES_CHANGED to understand what was built -3. **Check completeness**: Verify all plan steps implemented, all acceptance criteria met -4. **Check scope**: Identify out-of-scope additions not justified by design improvements -5. **Report misalignments**: Document issues with sufficient detail for Coder to fix +3. **Goal-backward verification**: Start from the user's observable goals. For each goal: trace backward through the implementation — is it wired into the running app → does it contain substantive logic → does the file/function exist? Report any goal failing at any depth. +4. **Check artifact depth**: Classify each deliverable using this scale: + + | Depth | Meaning | Example | + |-------|---------|---------| + | Exists | File/function created | Route file exists | + | Substantive | Contains real logic | Route has validation + DB call | + | Wired | Connected to running app | Route registered, imported, reachable | + + Flag anything at "Exists" without reaching "Wired" as `incomplete`. +5. **Check completeness**: Verify all plan steps implemented, all acceptance criteria met +6. **Check scope**: Identify out-of-scope additions not justified by design improvements +7. **Report misalignments**: Document issues with sufficient detail for Coder to fix ## Principles @@ -51,6 +61,11 @@ Return structured alignment status: - Implementation solves: {1-sentence summary} - Alignment: aligned | drifted +### Artifact Depth +| Deliverable | Exists | Substantive | Wired | Status | +|-------------|--------|-------------|-------|--------| +| {feature} | Y/N | Y/N | Y/N | complete/incomplete/stub | + ### Misalignments Found (if MISALIGNED) | Type | Description | Files | Suggested Fix | @@ -59,20 +74,17 @@ Return structured alignment status: | scope_creep | {what's out of scope} | {file paths} | {remove or justify} | | incomplete | {what's partially done} | {file paths} | {what remains} | | intent_drift | {how intent drifted} | {file paths} | {how to realign} | +| stub | {placeholder, not real logic} | {file paths} | {what real implementation needs} | ### Scope Check - Out-of-scope additions: {list or "None"} - Justification: {if additions found, are they justified design improvements?} -``` - -## Misalignment Types -| Type | Description | Example | -|------|-------------|---------| -| `missing` | Functionality in plan not implemented | "Login validation not implemented" | -| `scope_creep` | Added functionality not in plan | "Analytics tracking added but not requested" | -| `incomplete` | Partially implemented functionality | "Error handling added but no user-facing messages" | -| `intent_drift` | Implementation solves different problem | "Built password reset instead of login flow" | +### Re-verification (if applicable) +| Previously Failed | Status | Notes | +|-------------------|--------|-------| +| {item} | RESOLVED/STILL_FAILING | {details} | +``` ## Boundaries @@ -81,12 +93,14 @@ Return structured alignment status: - Out-of-scope additions not justified by design - Partial implementations - Intent drift +- Stubs or placeholders passing as real implementations **Report as ALIGNED:** - All plan steps implemented - All acceptance criteria met - No unjustified scope additions - Implementation matches original intent +- All deliverables reach "Wired" depth **Never:** - Modify code or create commits diff --git a/shared/agents/simplifier.md b/shared/agents/simplifier.md index 4d93fa43..b6a71e07 100644 --- a/shared/agents/simplifier.md +++ b/shared/agents/simplifier.md @@ -30,31 +30,39 @@ Analyze recently modified code and apply refinements that: - Use proper error handling patterns (avoid try/catch when possible) - Maintain consistent naming conventions -3. **Enhance Clarity**: Simplify code structure by: - - - Reducing unnecessary complexity and nesting - - Eliminating redundant code and abstractions - - Improving readability through clear variable and function names +3. **Remove Slop**: Detect and remove these categories: + + | Category | Pattern | + |----------|---------| + | Language-behavior tests | Tests verifying built-in language features work as documented | + | Redundant type checks | Runtime checks for types TypeScript already enforces | + | Over-defensive handling | try/catch around code that cannot throw | + | Debug remnants | console.log, debugger, alert() left behind | + | Commented-out code | Dead code preserved in comments | + | Unused imports | Imports not referenced anywhere in file | + | Verbose names | Unnecessarily long names (`currentUserDataObject` → `user`) | + | Unnecessary intermediates | Variables used once, immediately after assignment | + +4. **Enhance Clarity**: Simplify code structure by: + + - Reducing unnecessary nesting (early returns, guard clauses) - Consolidating related logic - - Removing unnecessary comments that describe obvious code - - IMPORTANT: Avoid nested ternary operators - prefer switch statements or if/else chains for multiple conditions - - Choose clarity over brevity - explicit code is often better than overly compact code + - Avoiding nested ternary operators — prefer switch or if/else + - Choosing clarity over brevity — explicit code beats compact code -4. **Maintain Balance**: Avoid over-simplification that could: +5. **Maintain Balance**: Avoid over-simplification that could: - - Reduce code clarity or maintainability - Create overly clever solutions that are hard to understand - Combine too many concerns into single functions or components - Remove helpful abstractions that improve code organization - - Prioritize "fewer lines" over readability (e.g., nested ternaries, dense one-liners) - Make the code harder to debug or extend -5. **Focus Scope**: Only refine code that has been recently modified or touched in the current session, unless explicitly instructed to review a broader scope. +6. **Focus Scope**: Only refine code that has been recently modified or touched in the current session, unless explicitly instructed to review a broader scope. Your refinement process: 1. Identify the recently modified code sections -2. Analyze for opportunities to improve elegance and consistency +2. Analyze for slop categories and clarity improvements 3. Apply project-specific best practices and coding standards 4. Ensure all functionality remains unchanged 5. Verify the refined code is simpler and more maintainable @@ -87,7 +95,8 @@ Return structured completion status: - Files outside the recently modified scope (unless instructed) **Handle autonomously:** -- Naming improvements, dead code removal, nesting reduction +- Slop removal (all 8 categories) +- Naming improvements, nesting reduction - Import sorting and organization - Redundant abstraction elimination -- Comment cleanup (remove obvious, keep non-obvious) \ No newline at end of file +- Comment cleanup (remove obvious, keep non-obvious) diff --git a/shared/skills/self-review/references/stub-detection.md b/shared/skills/self-review/references/stub-detection.md new file mode 100644 index 00000000..47e21508 --- /dev/null +++ b/shared/skills/self-review/references/stub-detection.md @@ -0,0 +1,135 @@ +# Stub Detection Patterns + +Placeholder implementations that compile but don't deliver real functionality. Flag as **P0-Functionality** issues. + +Cross-reference: `core-patterns/references/code-smell-violations.md` covers hardcoded data and fake functionality labeling. This file focuses on structural stub patterns. + +--- + +## 1. Component Stubs + +Render nothing meaningful or return placeholder markup. + +```tsx +// STUB — returns static text, no real rendering +function UserProfile({ userId }: Props) { + return
Name
; +} + +// REAL — fetches and renders actual data +function UserProfile({ userId }: Props) { + const user = useUser(userId); + if (!user) return ; + return

{user.name}

{user.email}

; +} +``` + +**Patterns to flag:** +- `return null` / `return <>` in components that should render content +- Empty function bodies (`{}`) for handlers or lifecycle methods +- Components returning only hardcoded strings with no data binding + +--- + +## 2. API / Service Stubs + +Functions that exist in signature but throw, return hardcoded values, or do nothing. + +```typescript +// STUB — throws instead of implementing +async function createOrder(items: CartItem[]): Promise { + throw new Error("TODO: implement"); +} + +// STUB — hardcoded return, no real logic +async function getUser(id: string): Promise { + return { id, name: "Test User", email: "test@test.com" }; +} + +// REAL — actual implementation +async function createOrder(items: CartItem[]): Promise> { + const validated = validateItems(items); + if (!validated.ok) return validated; + const order = await db.orders.create({ items: validated.value }); + await queue.publish("order.created", order); + return { ok: true, value: order }; +} +``` + +**Patterns to flag:** +- `throw new Error("TODO")` / `throw new Error("Not implemented")` +- Functions returning hardcoded objects (no DB/API/computation) +- `"Not implemented"` strings in response bodies +- Empty async functions (`async function foo() {}`) + +--- + +## 3. Hook / Effect Stubs + +State and effects declared but not wired to behavior. + +```typescript +// STUB — effect does nothing +useEffect(() => {}, [userId]); + +// STUB — state declared, setter never called +const [items, setItems] = useState([]); +// ... setItems never appears in the component + +// STUB — custom hook returns static value +function usePermissions(): Permissions { + return { canEdit: true, canDelete: false }; +} + +// REAL — custom hook with actual logic +function usePermissions(): Permissions { + const { user } = useAuth(); + const { data } = useQuery(["permissions", user.role], fetchPermissions); + return data ?? DEFAULT_PERMISSIONS; +} +``` + +**Patterns to flag:** +- `useEffect(() => {}, [...])` — empty effect body +- `useState` where the setter is never called in the component +- Custom hooks returning static/hardcoded values +- `useMemo`/`useCallback` wrapping static values + +--- + +## 4. Wiring Gaps + +Individual pieces exist but aren't connected to the running application. **Highest-value detection** — these pass compilation and individual tests but the feature doesn't work end-to-end. + +```typescript +// GAP — fetch without await (result discarded) +function loadDashboard() { + fetchMetrics(); // Promise floats, never awaited + return ; +} + +// GAP — state declared but never rendered +const [error, setError] = useState(null); +// ... error never appears in JSX + +// GAP — handler defined but not bound +function handleSubmit(data: FormData) { /* real logic */ } +// ...
never appears + +// GAP — route defined with no-op handler +app.post("/api/orders", (_req, res) => { + res.status(200).json({ ok: true }); +}); + +// GAP — env var read but unused +const API_KEY = process.env.STRIPE_API_KEY; +// ... API_KEY never passed to any client or fetch call +``` + +**Patterns to flag:** +- `fetch`/`axios`/API call without `await` or `.then` (result discarded) +- State variable (`useState`, `useRef`) never rendered or read in output +- Event handler defined but not bound to any element/listener +- Route/endpoint registered with empty or no-op handler +- Environment variable or config read but never used downstream +- Import used only in type position but imported as value (in non-type-only import)