Fresh-context iterative loops for Claude Code — autonomous task completion with independent verification.
RLP Desk brings Geoffrey Huntley's Ralph Loop philosophy to Claude Code. Inspired by OpenAI Codex's long-horizon tasks and design-desk, it orchestrates fresh-context workers and verifiers through Claude Code's Agent() tool.
Key insight: Each iteration starts fresh. No accumulated context drift. The filesystem is the only memory.
[Your Session = LEADER]
  │
  ├── Agent() ──▶ [Worker (fresh context)]
  │                └── reads PRD + memory → implements → updates memory
  │
  └── Agent() ──▶ [Verifier (fresh context)]
                   └── reads done-claim → runs checks → writes verdict
npm install -g @ai-dev-methodologies/rlp-desk
Or without npm:
curl -sSL https://raw.githubusercontent.com/ai-dev-methodologies/rlp-desk/main/install.sh | bash
Always start with brainstorm. It interactively walks you through the project contract:
/rlp-desk brainstorm "implement a Python calculator with tests"
You'll be asked to confirm each item:
- Slug — project identifier
- User Stories — discrete, testable units with Given/When/Then acceptance criteria
- Task Type & Risk Level — code/visual/content/integration/infra × LOW/MEDIUM/HIGH/CRITICAL
- Iteration Unit — one story per iteration (incremental) or all at once (fast)
- Verification Commands — how to check the work
- Ambiguity Gate — AC quality scoring (IL-2, 0-12 scale, blocks init if < 6)
- Models — which Claude model for Worker/Verifier
# Recommended (cross-engine + final consensus):
/rlp-desk run <slug> --mode tmux --worker-model spark:high --consensus final-only --debug
# Claude-only:
/rlp-desk run <slug> --debug
The leader loop runs autonomously, spawning workers, verifying results, and tracking progress until completion or until a circuit breaker triggers.
LLM conversations accumulate context. Long sessions drift, hallucinate, and forget earlier decisions. The Ralph Loop solves this by treating context as a disposable resource:
- Each worker gets a fresh context — no prior conversation, no accumulated confusion
- Filesystem = memory — PRDs, campaign memory, and context files are the only state
- Independent verification — a separate fresh-context verifier checks the worker's claims against real evidence
| Concept | Source |
|---|---|
| Fresh context per iteration | Ralph Loop (guide, tips) |
| Long-horizon autonomous tasks | OpenAI Codex |
| Desk-based orchestration | design-desk |
| Agent() subprocess model | Claude Code native |
| Role | Runs In | Responsibility |
|---|---|---|
| Leader | Your current session | Orchestrates the loop, reads memory, selects models, writes sentinels |
| Worker | Fresh Agent() context | Executes one bounded action per iteration, updates memory |
| Verifier | Fresh Agent() context | Independently verifies worker claims with fresh evidence |
for iteration in 1..max_iter:
1. Check sentinels (complete? blocked?)
2. Read campaign memory → get next iteration contract
3. Select model (haiku/sonnet/opus based on complexity)
4. Build worker prompt → dispatch via Agent()
5. Worker executes one bounded action, updates memory
6. If worker claims done → dispatch Verifier via Agent()
7. Verifier runs fresh checks → pass/fail/blocked
8. Update status, report to user, continue or stop
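The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not RLP Desk's actual Leader: `run_loop`, `LoopResult`, and the three callbacks are hypothetical names, and the sketch simplifies to a single story where one passing verdict ends the campaign.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LoopResult:
    status: str      # "COMPLETE", "BLOCKED", or "TIMEOUT"
    iterations: int  # iterations actually executed


def run_loop(
    max_iter: int,
    check_sentinel: Callable[[], Optional[str]],
    run_worker: Callable[[int], bool],
    run_verifier: Callable[[int], str],
) -> LoopResult:
    """Hypothetical leader loop: one worker per iteration, verifier only on a done-claim."""
    for iteration in range(1, max_iter + 1):
        # Step 1: stop early if a sentinel was already written.
        sentinel = check_sentinel()
        if sentinel in ("COMPLETE", "BLOCKED"):
            return LoopResult(sentinel, iteration - 1)
        # Steps 2-5: the callback stands in for "read memory, build prompt, dispatch worker".
        claims_done = run_worker(iteration)
        # Steps 6-7: verify only when the worker claims completion.
        if claims_done:
            verdict = run_verifier(iteration)
            if verdict == "pass":
                return LoopResult("COMPLETE", iteration)
            if verdict == "blocked":
                return LoopResult("BLOCKED", iteration)
        # Step 8: a "fail" verdict or a missing done-claim falls through to the next iteration.
    return LoopResult("TIMEOUT", max_iter)


# Example: the worker claims done on iteration 3 and the verifier passes it.
result = run_loop(5, lambda: None, lambda i: i >= 3, lambda i: "pass")
print(result)
```

Because the callbacks carry no state between calls, the only thing that persists across iterations in this sketch is whatever they read from and write to disk, which mirrors the "filesystem is the only memory" rule.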
The Leader computes an MD5 hash of prd-<slug>.md at startup and again at each iteration.
When the hash changes, it:
- Logs prd_changed=true with prd_hash, previous/new US counts, and new_us
- Splits the PRD into per-US files (prd-<slug>-US-<id>.md)
- Splits the test-spec into per-US files (test-spec-<slug>-US-<id>.md)
- Updates the in-memory PRD US list used for per-US dispatch
- Adds "NOTE: PRD was updated since last iteration. New/changed US may exist." to the Worker prompt
If the PRD hash is unchanged, prd_changed=false is logged and no re-split is triggered.
If the PRD file is missing, the process degrades gracefully and continues without failing the campaign loop.
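A rough sketch of the change detection described above, assuming the hash is taken over the raw file bytes. The function names `prd_hash` and `prd_changed` are hypothetical and only illustrate the idea; the real Leader may compute and compare hashes differently.

```python
import hashlib
from pathlib import Path
from typing import Optional


def prd_hash(path: Path) -> Optional[str]:
    """Return the MD5 hex digest of the PRD file, or None if the file is missing."""
    try:
        return hashlib.md5(path.read_bytes()).hexdigest()
    except FileNotFoundError:
        # Degrade gracefully: a missing PRD must not fail the campaign loop.
        return None


def prd_changed(previous: Optional[str], current: Optional[str]) -> bool:
    """True only when both hashes exist and differ, so a missing PRD never triggers a re-split."""
    return previous is not None and current is not None and previous != current
```

A caller would keep the hash from the previous iteration, recompute it at the start of the next one, and re-split the PRD only when `prd_changed` returns True.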
RLP Desk enforces a comprehensive verification policy defined in governance.md:
Iron Laws (§1a) — 4 absolute rules that cannot be violated:
- IL-1: No completion claims without fresh verification evidence
- IL-2: No init without AC quality score ≥ 6 (Ambiguity Gate)
- IL-3: No pass with TODO in any required verification layer
- IL-4: No pass without test count ≥ AC count × 3
Evidence Gate (§1b) — 5-step protocol: IDENTIFY → RUN → READ → VERIFY → ONLY THEN claim
Risk Classification (§1c) — Proportional verification layers per risk level:
| Risk | Required Layers |
|---|---|
| LOW | L1 (Unit) + L3 (E2E) |
| MEDIUM | L1 + L2 (Integration) + L3 |
| HIGH | L1 + L2 + L3 + L4 (Deploy) |
| CRITICAL | L1 + L2 + L3 + L4 + mutation testing |
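The risk table and IL-4 lend themselves to a small lookup. The sketch below is illustrative only: `REQUIRED_LAYERS`, `missing_layers`, and `meets_il4` are hypothetical names, and the "mutation" label is just a stand-in for the mutation-testing requirement.

```python
# Required verification layers per risk level, copied from the table above.
REQUIRED_LAYERS = {
    "LOW": ["L1", "L3"],
    "MEDIUM": ["L1", "L2", "L3"],
    "HIGH": ["L1", "L2", "L3", "L4"],
    "CRITICAL": ["L1", "L2", "L3", "L4", "mutation"],
}


def missing_layers(risk: str, completed: set) -> list:
    """Return the required layers that have no evidence yet, in table order."""
    return [layer for layer in REQUIRED_LAYERS[risk] if layer not in completed]


def meets_il4(test_count: int, ac_count: int) -> bool:
    """IL-4: no pass unless test count >= AC count x 3."""
    return test_count >= ac_count * 3


# Example: a HIGH-risk story with only unit and E2E evidence is still missing L2 and L4.
print(missing_layers("HIGH", {"L1", "L3"}))
print(meets_il4(test_count=9, ac_count=3))
```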
Execution Traceability (§1f) — Always-on, not flag-gated:
- Worker records execution_steps in done-claim.json (what was done, in what order, with evidence)
- Verifier records reasoning in verify-verdict.json (why each judgment was made)
| Condition | Action |
|---|---|
| Context unchanged for 3 iterations | BLOCKED |
| Same error repeated twice | Upgrade model, retry once, then BLOCKED |
| 3 consecutive failures | Architecture Escalation (§7¾) → report to user |
| Max iterations reached | TIMEOUT |
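One way to read the circuit-breaker table is as a single decision function. The sketch below is an assumption-heavy illustration: the function name, the action labels, and especially the order in which the conditions are checked are not taken from the protocol.

```python
def breaker_action(
    *,
    iterations_done: int,
    max_iter: int,
    unchanged_iterations: int,
    consecutive_failures: int,
    same_error_repeats: int,
) -> str:
    """Map loop counters to an action label; the check order is an assumption."""
    if unchanged_iterations >= 3:
        return "BLOCKED"            # context unchanged for 3 iterations
    if consecutive_failures >= 3:
        return "ESCALATE"           # architecture escalation, report to user
    if same_error_repeats >= 2:
        return "UPGRADE_AND_RETRY"  # upgrade model, retry once, then BLOCKED
    if iterations_done >= max_iter:
        return "TIMEOUT"            # max iterations reached
    return "CONTINUE"


# Example: two identical errors in a row trigger a model upgrade and one retry.
print(breaker_action(iterations_done=4, max_iter=100, unchanged_iterations=0,
                     consecutive_failures=2, same_error_repeats=2))
```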
Core principle: Worker and Verifier use different AI engines whenever possible.
- Per-US: lightweight verification after each user story (catches issues early)
- Final: top-tier consensus gate before COMPLETE (quality guarantee)
- Progressive upgrade: auto-upgrade models on consecutive failure (2-attempt windows)
- Verifier minimum: claude sonnet (haiku cannot verify)
Verifier is always +1 tier above Worker. Same-engine shares blind spots — install codex for improved detection.
| Risk | Worker | Per-US Verifier | Worker upgrade path | Verifier upgrade path |
|---|---|---|---|---|
| LOW | haiku | sonnet | sonnet → opus | sonnet → opus |
| MEDIUM | sonnet | sonnet | opus | sonnet → opus |
| HIGH | sonnet | opus | opus | opus (ceiling) |
| CRITICAL | opus | opus ⚠ | (ceiling) | (ceiling) |
Final: opus solo ⚠ same-engine warning displayed
Spark is speed-optimized for coding. Use as Worker for LOW-HIGH; 5.4 for CRITICAL.
| Risk | Worker (codex) | Per-US Verifier (claude) | Worker upgrade path | Verifier upgrade path |
|---|---|---|---|---|
| LOW | spark medium | sonnet | spark high → xhigh | sonnet → opus |
| MEDIUM | spark high | sonnet | spark xhigh → 5.4 medium | sonnet → opus |
| HIGH | spark xhigh | opus | 5.4 high → 5.4 xhigh | opus (ceiling) |
| CRITICAL | 5.4 high | opus | 5.4 xhigh | opus (ceiling) |
Final: opus + 5.4 high (both must PASS)
| Risk | Worker (codex) | Per-US Verifier (claude) | Worker upgrade path | Verifier upgrade path |
|---|---|---|---|---|
| LOW | 5.4 low | sonnet | 5.4 medium → high | sonnet → opus |
| MEDIUM | 5.4 medium | sonnet | 5.4 high → xhigh | sonnet → opus |
| HIGH | 5.4 high | opus | 5.4 xhigh | opus (ceiling) |
| CRITICAL | 5.4 xhigh | opus | (ceiling) | opus (ceiling) |
Final: opus + 5.4 high (both must PASS)
| Environment | Engine 1 | Engine 2 | Rule |
|---|---|---|---|
| Claude-only | opus | — | Solo ⚠ |
| Cross-engine | opus | 5.4 high | Both must PASS → COMPLETE |
The Worker auto-upgrades after consecutive failures on the same US. The Verifier is fixed at campaign start. The circuit breaker (CB) threshold defaults to 6.
fail 1-2: keep current model (2-attempt window)
fail 3-4: upgrade 1 step (e.g., haiku → sonnet)
fail 5-6: upgrade 2 steps (e.g., haiku → opus)
fail 7+: ceiling reached → BLOCKED
See src/model-upgrade-table.md for full upgrade paths per engine and complexity level.
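The 2-attempt windows above reduce to simple integer arithmetic. The following sketch assumes a Claude-only chain of haiku → sonnet → opus and clamps at the top of the chain; `model_for_attempt` and `CLAUDE_CHAIN` are hypothetical names, and the real upgrade paths live in src/model-upgrade-table.md.

```python
from typing import List, Optional

# Assumed Claude-only upgrade chain, lowest tier first.
CLAUDE_CHAIN = ["haiku", "sonnet", "opus"]


def model_for_attempt(
    start: str, failures: int, chain: List[str] = CLAUDE_CHAIN
) -> Optional[str]:
    """Return the model to use after `failures` consecutive failures, or None for BLOCKED."""
    if failures >= 7:
        return None  # fail 7+: ceiling reached, campaign is BLOCKED
    # Failures 1-2 -> 0 steps, 3-4 -> 1 step, 5-6 -> 2 steps.
    steps = max(0, (failures - 1) // 2)
    # Clamp at the top of the chain (an assumption for starts above the lowest tier).
    index = min(chain.index(start) + steps, len(chain) - 1)
    return chain[index]


for failures in range(0, 8):
    print(failures, model_for_attempt("haiku", failures))
```

Passing `--lock-worker-model` would correspond to skipping this lookup entirely and always returning `start`.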
When all US pass individually, the final ALL verify runs sequentially per-US instead of one big check. This prevents verifier timeout on large PRDs. After all per-US checks pass, the project's test suite runs once as a cross-US integration check.
/rlp-desk brainstorm <description> Plan before init (interactive)
/rlp-desk init <slug> [objective] Create project scaffold
/rlp-desk run <slug> [--opts] Run the loop (this session = leader)
/rlp-desk status <slug> Show loop status
/rlp-desk logs <slug> [N] Show iteration logs
/rlp-desk clean <slug> [--kill-session] Reset for re-run
| Option | Default | Description |
|---|---|---|
| --mode agent\|tmux | agent | tmux=zsh Leader (stable, production), agent=Node Leader (alpha) |
| --worker-model MODEL | haiku | Worker model. name=claude, name:reasoning=codex |
| --lock-worker-model | off | Disable auto model upgrade on failure |
| --verifier-model MODEL | sonnet | per-US verification model (lighter) |
| --final-verifier-model MODEL | opus | final ALL verification model (stricter) |
| --consensus off\|all\|final-only | off | Cross-engine consensus scope |
| --consensus-model MODEL | gpt-5.5:medium | per-US cross-verifier (lighter) |
| --final-consensus-model MODEL | gpt-5.5:high | final cross-verifier (stricter) |
| --verify-mode per-us\|batch | per-us | per-us: verify each US → final ALL |
| --cb-threshold N | 6 | Consecutive failures → BLOCKED |
| --max-iter N | 100 | Max iterations → TIMEOUT |
| --iter-timeout N | 600 | Per-iteration timeout seconds (tmux only) |
| --debug | off | Debug logging |
| --with-self-verification | off | Post-campaign SV report |
RLP Desk runs two distinct verification passes:
- Per-US (--verifier-model, default: sonnet): runs after each user story completes. Lightweight and fast, it catches issues early, before later stories build on broken foundations.
- Final ALL (--final-verifier-model, default: opus): runs once after all user stories pass individually. Stricter and more thorough, it catches cross-US integration issues and anything the per-US pass missed.
When --consensus is enabled, a second cross-engine verifier runs alongside each pass: --consensus-model for per-US and --final-consensus-model for the final ALL gate. Both engines must pass.
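The consensus rule can be expressed as two small predicates. This is a sketch under assumptions: the function names are hypothetical, and the mapping of `all` to both passes and `final-only` to the final gate is inferred from the option names rather than taken from the protocol.

```python
from typing import Optional


def consensus_applies(scope: str, stage: str) -> bool:
    """Decide whether a second engine verifies this stage ("per-us" or "final")."""
    if scope == "all":
        return stage in ("per-us", "final")
    if scope == "final-only":
        return stage == "final"
    return False  # scope == "off"


def gate_passes(primary: str, consensus: Optional[str] = None) -> bool:
    """Both engines must pass; with no consensus verdict, the primary decides alone."""
    verdicts = [primary] if consensus is None else [primary, consensus]
    return all(verdict == "pass" for verdict in verdicts)


# Example: final-only consensus where the second engine disagrees.
print(consensus_applies("final-only", "per-us"))
print(gate_passes("pass", "fail"))
```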
After brainstorm, init detects your environment and presents run command presets:
- Codex detected (GPT Pro / spark) → recommends cross-engine mode (--worker-model spark:high --consensus final-only)
- Codex detected (large PRD, AC > 15) → offers gpt-5.5 preset (--worker-model gpt-5.5:high --consensus final-only)
- Claude-only → defaults to --debug with haiku worker and opus final verifier
- Basic → minimal flags for quick iteration
The brainstorm phase evaluates complexity (US count, file scope, logic, dependencies, code impact) and recommends a starting model. You can override any recommendation.
RLP Desk supports two execution modes. Both honor the same governance protocol.
v0.14.0 status: --mode tmux (zsh-backed) is the stable, production path with the full safety net (heartbeat, copy-mode guard, prompt-stall timeout, no-progress detection, claude model upgrade chain). --mode agent is alpha and ships without those features; the runner emits a stderr warning when agent mode is invoked. For long campaigns and BOS-style autonomous loops, use --mode tmux.
| Environment | Agent Mode (alpha) | Tmux Mode (stable) |
|---|---|---|
| Claude Code (any terminal) | Works | Requires tmux |
| Inside tmux session | Works | Works — panes split in current window |
| Outside tmux session | Works | Rejected — "start tmux first" |
| Need | Use |
|---|---|
| Production / autonomous campaigns | --mode tmux (stable) |
| Long campaigns, CI, overnight runs | --mode tmux (stable) |
| Quick interactive exploration inside Claude Code | --mode agent (alpha — Node-native) |
/rlp-desk run calculator
The current Claude Code session acts as the Leader, dispatching Workers and Verifiers via Agent(). The Leader is an LLM that dynamically routes models and reasons about context.
- Works anywhere — no tmux required
- Dynamic model routing — Leader upgrades models on failure
Known limitation: Agent mode runs inside Claude Code's turn-based request-response model. If the LLM outputs text without a tool call, the turn terminates and the loop pauses until the user sends "continue." This is a platform constraint — the protocol mitigates it but cannot guarantee 100% uninterrupted execution. For guaranteed autonomous loops, use tmux mode.
- Fix Loop — extracts verifier issues and feeds them back to the next worker
- Best for interactive development
/rlp-desk run calculator --mode tmux
Requires running inside a tmux session. A shell script takes over as Leader, splitting your current window into three panes. Workers run interactive claude sessions — you can watch them work in real-time.
+---------------------+---------------------+
| Your pane (Leader) | Worker pane |
| shell loop running | claude TUI running |
| polls signal files | you see it working |
| +---------------------+
| | Verifier pane |
| | claude TUI running |
| | (only when needed) |
+---------------------+---------------------+
- Real-time visibility — watch Worker/Verifier execute live
- Zero-token orchestration — shell loop, not LLM
- Automatic cleanup — panes removed on completion
- Best for long campaigns and observability
Prerequisites: tmux and jq must be installed.
To clean up tmux artifacts:
/rlp-desk clean calculator --kill-session
RLP Desk supports two execution engines for Worker and Verifier. Claude is the default. Codex is opt-in.
/rlp-desk run calculator
Uses Claude Code's Agent() tool (agent mode) or claude -p CLI (tmux mode). Supports dynamic model routing (haiku/sonnet/opus).
# Install codex CLI first
npm install -g @openai/codex
# Run with codex worker (spark requires GPT Pro)
/rlp-desk run calculator --worker-model spark:high
# Customize model and reasoning effort
/rlp-desk run calculator --worker-model gpt-5.5:high
# Cross-engine: codex worker, claude verifier (recommended)
/rlp-desk run calculator --worker-model spark:high --consensus final-only --debug
The engine is inferred automatically from the --worker-model value: a plain model name (e.g. haiku) routes to Claude, while name:reasoning format (e.g. spark:high) routes to Codex. The codex binary is only required when a codex model is specified.
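The inference rule for `--worker-model` is easy to sketch. `parse_worker_model` below is a hypothetical helper that only illustrates the "colon means Codex" convention described here; it does not validate model names or reasoning levels.

```python
from typing import Optional, Tuple


def parse_worker_model(value: str) -> Tuple[str, str, Optional[str]]:
    """Split a --worker-model value into (engine, model, reasoning)."""
    name, separator, reasoning = value.partition(":")
    if separator:
        # name:reasoning format routes to Codex.
        return ("codex", name, reasoning)
    # A plain model name routes to Claude and carries no reasoning level.
    return ("claude", name, None)


for value in ("haiku", "spark:high", "gpt-5.5:medium"):
    print(value, "->", parse_worker_model(value))
```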
| Engine | Agent Mode | Tmux Mode | Dynamic Routing |
|---|---|---|---|
| claude | Agent() tool | claude -p TUI | Yes (haiku/sonnet/opus) |
| codex | Bash("codex ...") | codex TUI | No (static model) |
Each user story is verified independently, then a final full verification runs:
Worker: US-001 → Verifier(per-US): US-001 only → pass
Worker: US-002 → Verifier(per-US): US-002 only → pass
...
Final Verify: opus + 5.4 high → both pass → COMPLETE
Per-US catches issues early before later stories build on broken foundations.
/rlp-desk run calculator --verify-mode batch
Worker completes all stories, then a single verification checks all AC at once. Final verify still applies.
By default, Worker and Verifier stop and ask for human input when they encounter document conflicts (e.g., PRD says one thing, test-spec says another) or ambiguous instructions. This breaks unattended execution.
--autonomous enables fully unattended campaigns:
/rlp-desk run my-feature --mode tmux --worker-model gpt-5.5:medium --autonomous --debug
When --autonomous is active:
- PRD is the single source of truth. Resolution priority: PRD > test-spec > context > memory
- No stopping for questions. Worker and Verifier make autonomous decisions based on the priority chain
- All conflicts are logged. Every decision is recorded in conflict-log.jsonl for post-campaign review
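The priority chain amounts to picking the higher-ranked of two conflicting sources. In this sketch, `PRIORITY`, `rank`, and `resolve` are hypothetical names, and treating any source outside the chain (for example a worker prompt) as lowest priority is an assumption.

```python
# Resolution priority from highest to lowest: PRD > test-spec > context > memory.
PRIORITY = ["prd", "test-spec", "context", "memory"]


def rank(source: str) -> int:
    """Lower rank wins; sources outside the chain rank below all known ones (assumption)."""
    return PRIORITY.index(source) if source in PRIORITY else len(PRIORITY)


def resolve(source_a: str, source_b: str) -> str:
    """Return the source to follow when two documents disagree."""
    return min(source_a, source_b, key=rank)


print(resolve("test-spec", "prd"))
print(resolve("memory", "context"))
print(resolve("worker-prompt", "prd"))
```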
Each conflict is logged as a JSONL entry in logs/<slug>/conflict-log.jsonl:
{
"iteration": 1,
"us_id": "US-001",
"source_a": "worker-prompt",
"source_b": "prd",
"conflict": "US-00 is required by the iteration prompt but is not defined as a PRD user story.",
"resolution": "Followed PRD as source of truth."
}
Use --autonomous for:
- Long-running campaigns that run overnight or while you're away
- High-iteration tasks where stopping for every ambiguity wastes hours
- Well-defined PRDs where the PRD is comprehensive and authoritative
Avoid --autonomous for:
- Exploratory work where you want to review each decision
- Ambiguous PRDs where conflicts indicate real design gaps that need human judgment
- First run of a new project: run without --autonomous first to catch PRD issues interactively
After the campaign, review the conflict log to identify systemic issues:
cat .claude/ralph-desk/logs/<slug>/conflict-log.jsonl | jq .
Common patterns:
- Repeated PRD vs test-spec conflicts — test-spec needs updating to match PRD
- Scope lock vs fix contract conflicts — governance rules may need tuning
- Missing PRD definitions — Worker created stories not in the PRD (add them or tighten the brainstorm)
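To spot these patterns without reading every entry, the log can be tallied by source pair. The sketch below assumes one JSON object per line (standard JSONL) with the `source_a` and `source_b` fields shown earlier; `summarize_conflicts` is a hypothetical helper, not part of rlp-desk.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_conflicts(lines: Iterable[str]) -> Counter:
    """Count conflicts per (source_a, source_b) pair, skipping blank lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        counts[(entry["source_a"], entry["source_b"])] += 1
    return counts


# Example with inline sample data; in practice, pass an open file handle instead.
sample = [
    '{"iteration": 1, "us_id": "US-001", "source_a": "test-spec", "source_b": "prd"}',
    '{"iteration": 2, "us_id": "US-001", "source_a": "test-spec", "source_b": "prd"}',
    '{"iteration": 3, "us_id": "US-002", "source_a": "worker-prompt", "source_b": "prd"}',
]
for pair, count in summarize_conflicts(sample).most_common():
    print(pair, count)
```

A pair that dominates the tally, such as test-spec vs prd, points at the document that needs updating.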
After init, your project gets this scaffold:
your-project/
├── .claude/
│ ├── settings.local.json # rlp-desk permissions (auto-added by init)
│ └── ralph-desk/
│ ├── prompts/
│ │ ├── <slug>.worker.prompt.md
│ │ └── <slug>.verifier.prompt.md
│ ├── context/
│ │ └── <slug>-latest.md
│ ├── memos/
│ │ └── <slug>-memory.md
│ ├── plans/
│ │ ├── prd-<slug>.md
│ │ └── test-spec-<slug>.md
│ └── logs/<slug>/
│ └── status.json
init automatically adds the following permissions to .claude/settings.local.json:
{
"permissions": {
"allow": [
"Read(.claude/ralph-desk/**)",
"Edit(.claude/ralph-desk/**)",
"Write(.claude/ralph-desk/**)"
]
}
}
Why: Claude Code treats .claude/ files as sensitive and prompts for confirmation on each access, even with --dangerously-skip-permissions. Without these permissions, Worker and Verifier agents are blocked by interactive prompts during automated loop execution.
Note: settings.local.json is local to your machine and is not committed to git. If the file already exists, permissions are merged without overwriting your existing settings.
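The "merged without overwriting" behavior can be pictured as an append-if-missing on the allow list. This is a sketch of that idea only: `merge_permissions` and `RLP_RULES` are hypothetical names, and the actual init logic may differ.

```python
import copy

# The three rules init adds, as listed above.
RLP_RULES = [
    "Read(.claude/ralph-desk/**)",
    "Edit(.claude/ralph-desk/**)",
    "Write(.claude/ralph-desk/**)",
]


def merge_permissions(existing: dict, rules: list) -> dict:
    """Return a copy of the settings with any missing rules appended to permissions.allow."""
    merged = copy.deepcopy(existing)  # never mutate the caller's settings
    allow = merged.setdefault("permissions", {}).setdefault("allow", [])
    for rule in rules:
        if rule not in allow:
            allow.append(rule)
    return merged


# Example: existing allow/deny entries and unrelated keys are preserved.
settings = {"permissions": {"allow": ["Bash(ls)"], "deny": ["Read(.env)"]}}
print(merge_permissions(settings, RLP_RULES))
```

Running the merge twice yields the same result, so re-running init on an existing project would not duplicate entries under this approach.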
See examples/calculator/ for a complete example that implements a Python calculator module with tests using the RLP Desk loop.
The example demonstrates:
- A PRD with two user stories (calculator functions + pytest tests)
- Test specification with verification commands
- Worker and verifier prompts configured for the task
To try it yourself:
mkdir my-calc && cd my-calc
/rlp-desk brainstorm "Python calculator with add, subtract, multiply, divide + pytest tests"
/rlp-desk run loop-test
- Architecture — Design philosophy, Agent() and tmux execution modes
- Getting Started — Step-by-step tutorial with the calculator example
- Protocol Reference — Full protocol specification
- Future Plans — P3 items and upcoming features
See CONTRIBUTING.md.