Skip to content
This repository was archived by the owner on Apr 27, 2026. It is now read-only.

aptratcn/systematic-debugging

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

Systematic Debugging 🔍

Stop guessing. Stop shotgun debugging. 4-phase root cause process that actually works.

License: MIT Zero Dependencies Skill

The Problem

How do you debug now? Be honest:

Bad Habit What Happens Time Wasted
🎲 Guess-and-check Change random things hoping one works Hours
🩹 Symptom fixing Patch the error, ignore the cause Recurs forever
🔁 Loop debugging Same bug, different day Days/weeks
🤖 AI spray-and-pray "Try this... try that..." 20+ attempts

The data: 70% of debugging time is spent on approaches that don't work. Most bugs are fixed in minutes once the root cause is identified.

The Solution: 4-Phase Process

Bug Report
    │
    ▼
┌──────────────┐
│ 1. OBSERVE   │  What IS happening? (Not what you THINK)
│   Reproduce  │  Gather facts, not assumptions
└──────┬───────┘
       │
       ▼
┌──────────────┐
│ 2. HYPOTHESIZE│  List ALL possible causes
│   Rank them  │  What evidence proves/disproves each?
└──────┬───────┘
       │
       ▼
┌──────────────┐
│ 3. VERIFY    │  Test #1 hypothesis first
│   One at time│  One variable. Document negative results.
└──────┬───────┘
       │
       ▼
┌──────────────┐
│ 4. FIX       │  Fix the ROOT CAUSE
│   + Prevent  │  Add regression test. Document the lesson.
└──────────────┘

Phase 1: OBSERVE — Gather Facts

Before you touch anything:

# Reproduce the issue
# What's the EXACT error message?
# What were you doing when it happened?
# Does it happen every time?

# System state
env | grep -i <relevant_var>
which <command>
<command> --version

Output: A clear, reproducible bug description. No "it doesn't work" — exact steps, exact error.

Phase 2: HYPOTHESIZE — Brainstorm Causes

List every possible cause, rank by likelihood:

Hypothesis                  | Likelihood | Test
────────────────────────────|────────────|──────
Config typo                 | High       | Check config file
Database connection dropped | Medium     | Test DB connection
Race condition              | Low        | Add logging
Memory corruption           | Very Low   | Run under valgrind

Rule: Never test the fun/interesting hypothesis first. Test the most LIKELY one first.

Phase 3: VERIFY — Test One at a Time

For each hypothesis:
1. State what you'll test
2. Change ONE thing only
3. Record the result (positive OR negative)

❌ "I changed 3 things and it works now" → Which one fixed it? 🤷
✅ "Changed timeout from 5→30s. Still fails. Hypothesis eliminated."

Negative results are valuable: Every eliminated hypothesis narrows the search.

Phase 4: FIX & PREVENT

# Fix the ROOT CAUSE, not the symptom
# Symptom fix: catch the exception and retry
# Root fix: fix why the exception happens

# Add regression test
# If it broke once, it can break again

# Document the lesson
echo "Bug: [description]. Cause: [root cause]. Fix: [what changed]. Lesson: [takeaway]" >> debug-log.md

Anti-Patterns This Prevents

Anti-Pattern What People Do What This Skill Forces
Shotgun debugging Change 5 things at once One variable at a time
Assumption cascade "I think it's X" → waste hours Observe first, hypothesize second
Sunk cost debugging "I've spent 3 hours on this hypothesis" Eliminated? Move on.
AI suggestion spam "Try this... try that..." Hypothesis → Verify → Next
Symptom patching Add try-catch around the error Fix why the error happens

Quick Start

# Claude Code
cp SKILL.md ~/.claude/skills/systematic-debugging/

# OpenClaw
cp SKILL.md ~/.openclaw/workspace/skills/systematic-debugging/

# Any agent: Copy SKILL.md content into your agent's skill directory

Example Session

User: "The API returns 500 errors intermittently"

Agent (without this skill):
  "Try restarting the server. Try adding retries. Try checking the logs."

Agent (with this skill):
  "Let's apply systematic debugging.
   
   OBSERVE: How often? Which endpoints? Any patterns in timing?
   
   HYPOTHESIZE:
   1. Database connection pool exhaustion (likely - intermittent)
   2. Memory leak in worker process (possible - gets worse over time)
   3. External API timeout (less likely - would show different error)
   
   Let's test #1 first. Can you check the DB connection pool metrics?"

Works With

  • OpenClaw
  • Claude Code, Cursor, Codex
  • Any agent framework
  • Human developers (seriously, try this)

Related Skills

License

MIT — Debug smarter, not harder.

About

Systematic Debugging - 4-phase root cause process. No guessing, no random fixes.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors