RLP Desk

Fresh-context iterative loops for Claude Code — autonomous task completion with independent verification.

RLP Desk brings Geoffrey Huntley's Ralph Loop philosophy to Claude Code. Inspired by OpenAI Codex's long-horizon tasks and design-desk, it orchestrates fresh-context workers and verifiers through Claude Code's Agent() tool.

Key insight: Each iteration starts fresh. No accumulated context drift. The filesystem is the only memory.

[Your Session = LEADER]
        │
  Agent()├──▶ [Worker (fresh context)]
        │     └── reads PRD + memory → implements → updates memory
        │
  Agent()└──▶ [Verifier (fresh context)]
              └── reads done-claim → runs checks → writes verdict

Quick Start

1. Install

npm install -g @ai-dev-methodologies/rlp-desk

Or without npm:

curl -sSL https://raw.githubusercontent.com/ai-dev-methodologies/rlp-desk/main/install.sh | bash

2. Brainstorm (recommended)

Always start with brainstorm. It interactively walks you through the project contract:

/rlp-desk brainstorm "implement a Python calculator with tests"

You'll be asked to confirm each item:

Slug — project identifier
User Stories — discrete, testable units with Given/When/Then acceptance criteria
Task Type & Risk Level — code/visual/content/integration/infra × LOW/MEDIUM/HIGH/CRITICAL
Iteration Unit — one story per iteration (incremental) or all at once (fast)
Verification Commands — how to check the work
Ambiguity Gate — AC quality scoring (IL-2, 0-12 scale, blocks init if < 6)
Models — which Claude model for Worker/Verifier

3. Run

# Recommended (cross-engine + final consensus):
/rlp-desk run <slug> --mode tmux --worker-model spark:high --consensus final-only --debug

# Claude-only:
/rlp-desk run <slug> --debug

The leader loop runs autonomously — spawning workers, verifying results, and tracking progress until completion or a circuit breaker triggers.

Why?

The Context Problem

LLM conversations accumulate context. Long sessions drift, hallucinate, and forget earlier decisions. The Ralph Loop solves this by treating context as a disposable resource:

Each worker gets a fresh context — no prior conversation, no accumulated confusion
Filesystem = memory — PRDs, campaign memory, and context files are the only state
Independent verification — a separate fresh-context verifier checks the worker's claims against real evidence

Lineage

Concept	Source
Fresh context per iteration	Ralph Loop (guide, tips)
Long-horizon autonomous tasks	OpenAI Codex
Desk-based orchestration	design-desk
Agent() subprocess model	Claude Code native

How It Works

Three Roles

Role	Runs In	Responsibility
Leader	Your current session	Orchestrates the loop, reads memory, selects models, writes sentinels
Worker	Fresh `Agent()` context	Executes one bounded action per iteration, updates memory
Verifier	Fresh `Agent()` context	Independently verifies worker claims with fresh evidence

The Loop

for iteration in 1..max_iter:

  1. Check sentinels (complete? blocked?)
  2. Read campaign memory → get next iteration contract
  3. Select model (haiku/sonnet/opus based on complexity)
  4. Build worker prompt → dispatch via Agent()
  5. Worker executes one bounded action, updates memory
  6. If worker claims done → dispatch Verifier via Agent()
  7. Verifier runs fresh checks → pass/fail/blocked
  8. Update status, report to user, continue or stop
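The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not RLP Desk's actual Leader: `run_worker`, `run_verifier`, and the shape of their return values are hypothetical stand-ins for the Agent() dispatches, and sentinel checks, memory reads, and model selection are omitted.

```python
from typing import Callable


def leader_loop(
    run_worker: Callable[[int], dict],
    run_verifier: Callable[[int], str],
    max_iter: int = 100,
) -> str:
    """Drive worker/verifier iterations until a terminal state is reached.

    run_worker and run_verifier are hypothetical callables standing in for
    fresh-context Agent() dispatches; they are not part of RLP Desk's API.
    """
    for iteration in range(1, max_iter + 1):
        claim = run_worker(iteration)        # one bounded action per iteration
        if not claim.get("done"):
            continue                         # no done-claim, nothing to verify
        verdict = run_verifier(iteration)    # "pass", "fail", or "blocked"
        if verdict == "pass":
            return "COMPLETE"
        if verdict == "blocked":
            return "BLOCKED"
        # "fail": iterate again so the next worker can address the issues
    return "TIMEOUT"
```

Because each call to the worker and verifier receives only the iteration number, any state they need has to come from files on disk, which mirrors the "filesystem is the only memory" rule.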

Live PRD Update

The Leader computes an MD5 hash of prd-<slug>.md at startup and again at each iteration.

When the hash changes, it:

Logs prd_changed=true with prd_hash, previous/new US counts, and new_us
Splits the PRD into per-US files (prd-<slug>-US-<id>.md)
Splits the test-spec into per-US files (test-spec-<slug>-US-<id>.md)
Updates the in-memory PRD US list used for per-US dispatch
Adds the line "NOTE: PRD was updated since last iteration. New/changed US may exist." to the Worker prompt

If the PRD hash is unchanged, prd_changed=false is logged and no re-split is triggered.

If the PRD file is missing, the Leader degrades gracefully: the campaign loop continues instead of failing.
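The change detection described above could look roughly like this in Python. It is a sketch only: the function names are hypothetical, and treating a missing file as "no change" is an assumption about what graceful degradation means here.

```python
import hashlib
from pathlib import Path
from typing import Optional


def prd_hash(path: Path) -> Optional[str]:
    """Return the MD5 hex digest of the PRD file, or None if it is missing."""
    try:
        return hashlib.md5(path.read_bytes()).hexdigest()
    except FileNotFoundError:
        return None  # assumption: a missing PRD must not fail the loop


def prd_changed(previous: Optional[str], current: Optional[str]) -> bool:
    """Report a change only when both hashes exist and differ."""
    return previous is not None and current is not None and previous != current
```

MD5 is adequate in this role because the hash only detects edits between iterations; it is not a security boundary.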

Verification Policy (v0.3.0)

RLP Desk enforces a comprehensive verification policy defined in governance.md:

Iron Laws (§1a) — 4 absolute rules that cannot be violated:

IL-1: No completion claims without fresh verification evidence
IL-2: No init without AC quality score ≥ 6 (Ambiguity Gate)
IL-3: No pass with TODO in any required verification layer
IL-4: No pass without test count ≥ AC count × 3
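As a rough illustration, the numeric Iron Laws can be expressed as simple checks. The function and parameter names below are hypothetical; IL-1 is left out because it depends on evidence rather than counts, and IL-2 (an init-time gate) is combined with IL-3 and IL-4 (pass-time gates) purely for brevity.

```python
def iron_law_violations(
    ac_score: int,
    ac_count: int,
    test_count: int,
    layers_with_todo: list[str],
) -> list[str]:
    """Return the IDs of the Iron Laws violated by the given measurements."""
    violations = []
    if ac_score < 6:                 # IL-2: Ambiguity Gate needs a score of 6 or more
        violations.append("IL-2")
    if layers_with_todo:             # IL-3: no TODO in a required verification layer
        violations.append("IL-3")
    if test_count < ac_count * 3:    # IL-4: at least three tests per acceptance criterion
        violations.append("IL-4")
    return violations
```

For example, a story with 4 acceptance criteria needs at least 12 tests before a pass is allowed under IL-4.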

Evidence Gate (§1b) — 5-step protocol: IDENTIFY → RUN → READ → VERIFY → ONLY THEN claim

Risk Classification (§1c) — Proportional verification layers per risk level:

Risk	Required Layers
LOW	L1 (Unit) + L3 (E2E)
MEDIUM	L1 + L2 (Integration) + L3
HIGH	L1 + L2 + L3 + L4 (Deploy)
CRITICAL	L1 + L2 + L3 + L4 + mutation testing
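The table above maps naturally onto a lookup. The sketch below is illustrative only; `REQUIRED_LAYERS`, `missing_layers`, and the `"mutation"` label are hypothetical names, not identifiers taken from governance.md.

```python
REQUIRED_LAYERS = {
    "LOW": ["L1", "L3"],
    "MEDIUM": ["L1", "L2", "L3"],
    "HIGH": ["L1", "L2", "L3", "L4"],
    "CRITICAL": ["L1", "L2", "L3", "L4", "mutation"],
}


def missing_layers(risk: str, completed: set[str]) -> list[str]:
    """List the required verification layers not yet completed for a risk level."""
    return [layer for layer in REQUIRED_LAYERS[risk] if layer not in completed]
```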

Execution Traceability (§1f) — Always-on, not flag-gated:

Worker records execution_steps in done-claim.json (what was done, in what order, with evidence)
Verifier records reasoning in verify-verdict.json (why each judgment was made)

Circuit Breakers

Condition	Action
Context unchanged for 3 iterations	BLOCKED
Same error repeated twice	Upgrade model, retry once, then BLOCKED
3 consecutive failures	Architecture Escalation (§7¾) → report to user
Max iterations reached	TIMEOUT
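A minimal sketch of how three of these conditions might be evaluated is shown below. The function name, the use of context hashes to detect "unchanged" context, the returned labels, and the order of precedence are all assumptions; the "same error repeated twice" rule is omitted because it involves a model upgrade and retry rather than a single check.

```python
from typing import Optional


def circuit_breaker(
    context_hashes: list[str],
    consecutive_failures: int,
    iteration: int,
    max_iter: int = 100,
) -> Optional[str]:
    """Return a stop reason, or None if the loop may continue."""
    if iteration >= max_iter:
        return "TIMEOUT"
    # Context unchanged for 3 iterations: the last three hashes are identical.
    if len(context_hashes) >= 3 and len(set(context_hashes[-3:])) == 1:
        return "BLOCKED"
    if consecutive_failures >= 3:
        return "ESCALATE"  # hypothetical label for Architecture Escalation
    return None
```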

Verification Strategy (v0.5)

Core principle: Worker and Verifier use different AI engines whenever possible.

Per-US: lightweight verification after each user story (catches issues early)
Final: top-tier consensus gate before COMPLETE (quality guarantee)
Progressive upgrade: auto-upgrade models on consecutive failure (2-attempt windows)
Verifier minimum: claude sonnet (haiku cannot verify)

1. Claude-only (codex not installed)

The Verifier runs at the Worker's tier or one tier above it, as listed in the table below. Same-engine verification shares blind spots; install codex for improved detection.

Risk	Worker	Per-US Verifier	Worker upgrade path	Verifier upgrade path
LOW	haiku	sonnet	sonnet → opus	sonnet → opus
MEDIUM	sonnet	sonnet	opus	sonnet → opus
HIGH	sonnet	opus	opus	opus (ceiling)
CRITICAL	opus	opus ⚠	(ceiling)	(ceiling)

Final: opus solo ⚠ same-engine warning displayed

2. Cross-engine: GPT Pro (spark + 5.4)

Spark is speed-optimized for coding. Use as Worker for LOW-HIGH; 5.4 for CRITICAL.

Risk	Worker (codex)	Per-US Verifier (claude)	Worker upgrade path	Verifier upgrade path
LOW	spark medium	sonnet	spark high → xhigh	sonnet → opus
MEDIUM	spark high	sonnet	spark xhigh → 5.4 medium	sonnet → opus
HIGH	spark xhigh	opus	5.4 high → 5.4 xhigh	opus (ceiling)
CRITICAL	5.4 high	opus	5.4 xhigh	opus (ceiling)

Final: opus + 5.4 high (both must PASS)

3. Cross-engine: Non-Pro (5.4 only)

Risk	Worker (codex)	Per-US Verifier (claude)	Worker upgrade path	Verifier upgrade path
LOW	5.4 low	sonnet	5.4 medium → high	sonnet → opus
MEDIUM	5.4 medium	sonnet	5.4 high → xhigh	sonnet → opus
HIGH	5.4 high	opus	5.4 xhigh	opus (ceiling)
CRITICAL	5.4 xhigh	opus	(ceiling)	opus (ceiling)

Final: opus + 5.4 high (both must PASS)

Final Verify

Environment	Engine 1	Engine 2	Rule
Claude-only	opus	—	Solo ⚠
Cross-engine	opus	5.4 high	Both must PASS → COMPLETE

Progressive Upgrade (Worker Only)

Worker auto-upgrades on consecutive failures of the same US. The Verifier model is fixed at campaign start. The circuit-breaker threshold (--cb-threshold) defaults to 6.

fail 1-2: keep current model (2-attempt window)
fail 3-4: upgrade 1 step (e.g., haiku → sonnet)
fail 5-6: upgrade 2 steps (e.g., haiku → opus)
fail 7+:  ceiling reached → BLOCKED

See src/model-upgrade-table.md for full upgrade paths per engine and complexity level.
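The two-attempt windows above can be sketched for the Claude ladder as follows. This is an illustration, not the contents of src/model-upgrade-table.md: `LADDER` and `worker_model_for` are hypothetical names, and clamping at the top of the ladder when an upgrade would overshoot is an assumption.

```python
LADDER = ["haiku", "sonnet", "opus"]


def worker_model_for(start: str, failures: int) -> str:
    """Map consecutive same-US failures to a worker model or BLOCKED."""
    if failures >= 7:
        return "BLOCKED"
    if failures <= 2:
        steps = 0          # fail 1-2: keep the current model
    elif failures <= 4:
        steps = 1          # fail 3-4: upgrade one step
    else:
        steps = 2          # fail 5-6: upgrade two steps
    index = min(LADDER.index(start) + steps, len(LADDER) - 1)
    return LADDER[index]
```

Starting from haiku, the worker stays on haiku through the second failure, moves to sonnet on the third, and to opus on the fifth.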

Sequential Final Verify

When all US pass individually, the final ALL verification runs sequentially, one US at a time, instead of as a single combined check. This prevents verifier timeouts on large PRDs. After all per-US checks pass, the project's test suite runs once as a cross-US integration check.

Commands

/rlp-desk brainstorm <description>     Plan before init (interactive)
/rlp-desk init  <slug> [objective]     Create project scaffold
/rlp-desk run   <slug> [--opts]        Run the loop (this session = leader)
/rlp-desk status <slug>                Show loop status
/rlp-desk logs  <slug> [N]             Show iteration logs
/rlp-desk clean <slug> [--kill-session]  Reset for re-run

Run Options

Option	Default	Description
`--mode agent|tmux`	agent	tmux=zsh Leader (stable, production), agent=Node Leader (alpha)
`--worker-model MODEL`	haiku	Worker model. `name`=claude, `name:reasoning`=codex
`--lock-worker-model`	off	Disable auto model upgrade on failure
`--verifier-model MODEL`	sonnet	per-US verification model (lighter)
`--final-verifier-model MODEL`	opus	final ALL verification model (stricter)
`--consensus off|all|final-only`	off	Cross-engine consensus scope
`--consensus-model MODEL`	gpt-5.5:medium	per-US cross-verifier (lighter)
`--final-consensus-model MODEL`	gpt-5.5:high	final cross-verifier (stricter)
`--verify-mode per-us|batch`	per-us	per-us: verify each US → final ALL
`--cb-threshold N`	6	Consecutive failures → BLOCKED
`--max-iter N`	100	Max iterations → TIMEOUT
`--iter-timeout N`	600	Per-iteration timeout seconds (tmux only)
`--debug`	off	Debug logging
`--with-self-verification`	off	Post-campaign SV report

Per-US vs Final Verification

RLP Desk runs two distinct verification passes:

Per-US (--verifier-model, default: sonnet) — runs after each user story completes. Lightweight and fast, catches issues early before later stories build on broken foundations.
Final ALL (--final-verifier-model, default: opus) — runs once after all user stories pass individually. Stricter and more thorough, catches cross-US integration issues and anything per-US missed.

When --consensus is enabled, a second cross-engine verifier runs alongside each pass: --consensus-model for per-US and --final-consensus-model for the final ALL gate. Both engines must pass.
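The "both engines must pass" rule reduces to a small predicate. The sketch below assumes verdicts are plain strings and uses the hypothetical name `gate_passes`; it applies equally to the per-US pass and the final ALL gate.

```python
from typing import Optional


def gate_passes(primary: str, consensus: Optional[str] = None) -> bool:
    """Decide whether a verification pass succeeds.

    consensus is None when --consensus is off for this pass; otherwise
    both the primary and the cross-engine verdict must be "pass".
    """
    verdicts = [primary] if consensus is None else [primary, consensus]
    return all(verdict == "pass" for verdict in verdicts)
```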

Init Presets

After brainstorm, init detects your environment and presents run command presets:

Codex detected (GPT Pro / spark) → recommends cross-engine mode (--worker-model spark:high --consensus final-only)
Codex detected (large PRD, AC > 15) → offers gpt-5.5 preset (--worker-model gpt-5.5:high --consensus final-only)
Claude-only → defaults to --debug with haiku worker and opus final verifier
Basic → minimal flags for quick iteration

The brainstorm phase evaluates complexity (US count, file scope, logic, dependencies, code impact) and recommends a starting model. You can override any recommendation.

Execution Modes

RLP Desk supports two execution modes. Both honor the same governance protocol.

v0.14.0 status: --mode tmux (zsh-backed) is the stable, production path with the full safety net (heartbeat, copy-mode guard, prompt-stall timeout, no-progress detection, claude model upgrade chain). --mode agent is alpha and ships without those features — the runner emits a stderr warning when agent mode is invoked. For long campaigns and BOS-style autonomous loops, use --mode tmux.

Environment Compatibility

Environment	Agent Mode (alpha)	Tmux Mode (stable)
Claude Code (any terminal)	Works	Requires tmux
Inside tmux session	Works	Works — panes split in current window
Outside tmux session	Works	Rejected — "start tmux first"

Choosing Your Mode

Need	Use
Production / autonomous campaigns	`--mode tmux` (stable)
Long campaigns, CI, overnight runs	`--mode tmux` (stable)
Quick interactive exploration inside Claude Code	`--mode agent` (alpha — Node-native)

Agent Mode (default) — "Smart Mode"

/rlp-desk run calculator

The current Claude Code session acts as the Leader, dispatching Workers and Verifiers via Agent(). The Leader is an LLM that dynamically routes models and reasons about context.

Works anywhere — no tmux required
Dynamic model routing — Leader upgrades models on failure

Known limitation: Agent mode runs inside Claude Code's turn-based request-response model. If the LLM outputs text without a tool call, the turn terminates and the loop pauses until the user sends "continue." This is a platform constraint — the protocol mitigates it but cannot guarantee 100% uninterrupted execution. For guaranteed autonomous loops, use tmux mode.

Fix Loop — extracts verifier issues and feeds them back to the next worker
Best for interactive development

Tmux Mode — "Lean Mode"

/rlp-desk run calculator --mode tmux

Requires running inside a tmux session. A shell script takes over as Leader, splitting your current window into three panes. Workers run interactive claude sessions — you can watch them work in real-time.

+---------------------+---------------------+
| Your pane (Leader)  | Worker pane         |
| shell loop running  | claude TUI running  |
| polls signal files  | you see it working  |
|                     +---------------------+
|                     | Verifier pane       |
|                     | claude TUI running  |
|                     | (only when needed)  |
+---------------------+---------------------+

Real-time visibility — watch Worker/Verifier execute live
Zero-token orchestration — shell loop, not LLM
Automatic cleanup — panes removed on completion
Best for long campaigns and observability

Prerequisites: tmux and jq must be installed.
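A preflight check for these requirements might look like the sketch below. It relies on the fact that tmux sets the `TMUX` environment variable inside a session; the function name and messages are hypothetical, and the `env` and `which` parameters exist only to make the check easy to exercise.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def tmux_preflight(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return a list of problems that would prevent tmux mode from starting."""
    problems = []
    for tool in ("tmux", "jq"):
        if which(tool) is None:
            problems.append(f"{tool} is not installed")
    if not env.get("TMUX"):  # tmux exports TMUX inside a session
        problems.append("not inside a tmux session: start tmux first")
    return problems
```

An empty list means tmux mode can proceed; anything else explains why the run would be rejected.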

To clean up tmux artifacts:

/rlp-desk clean calculator --kill-session

Engine Support

RLP Desk supports two execution engines for Worker and Verifier. Claude is the default. Codex is opt-in.

Claude (default)

/rlp-desk run calculator

Uses Claude Code's Agent() tool (agent mode) or claude -p CLI (tmux mode). Supports dynamic model routing (haiku/sonnet/opus).

Codex (opt-in)

# Install codex CLI first
npm install -g @openai/codex

# Run with codex worker (spark requires GPT Pro)
/rlp-desk run calculator --worker-model spark:high

# Customize model and reasoning effort
/rlp-desk run calculator --worker-model gpt-5.5:high

# Cross-engine: codex worker, claude verifier (recommended)
/rlp-desk run calculator --worker-model spark:high --consensus final-only --debug

The engine is inferred automatically from the --worker-model value: a plain model name (e.g. haiku) routes to Claude, while name:reasoning format (e.g. spark:high) routes to Codex. The codex binary is only required when a codex model is specified.

Engine	Agent Mode	Tmux Mode	Dynamic Routing
claude	`Agent()` tool	`claude -p` TUI	Yes (haiku/sonnet/opus)
codex	`Bash("codex ...")`	`codex` TUI	No (static model)
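The engine inference rule for --worker-model can be sketched as a small parser. `ModelSpec` and `parse_worker_model` are hypothetical names used for illustration; the real parser may validate model names and reasoning levels, which this sketch does not.

```python
from typing import NamedTuple, Optional


class ModelSpec(NamedTuple):
    engine: str
    model: str
    reasoning: Optional[str]


def parse_worker_model(value: str) -> ModelSpec:
    """Infer the engine from a --worker-model value.

    A plain name routes to claude; name:reasoning routes to codex.
    """
    name, separator, reasoning = value.partition(":")
    if separator:
        return ModelSpec("codex", name, reasoning)
    return ModelSpec("claude", name, None)
```

With this rule, `haiku` resolves to the claude engine and `spark:high` resolves to the codex engine with reasoning effort `high`.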

Verification Modes

Per-US Verification (default)

Each user story is verified independently, then a final full verification runs:

Worker: US-001 → Verifier(per-US): US-001 only → pass
Worker: US-002 → Verifier(per-US): US-002 only → pass
...
Final Verify: opus + 5.4 high → both pass → COMPLETE

Per-US catches issues early before later stories build on broken foundations.

Batch Verification

/rlp-desk run calculator --verify-mode batch

Worker completes all stories, then a single verification checks all AC at once. Final verify still applies.

Autonomous Mode

By default, Worker and Verifier stop and ask for human input when they encounter document conflicts (e.g., PRD says one thing, test-spec says another) or ambiguous instructions. This breaks unattended execution.

--autonomous enables fully unattended campaigns:

/rlp-desk run my-feature --mode tmux --worker-model gpt-5.5:medium --autonomous --debug

How it works

When --autonomous is active:

PRD is the single source of truth. Resolution priority: PRD > test-spec > context > memory
No stopping for questions. Worker and Verifier make autonomous decisions based on the priority chain
All conflicts are logged. Every decision is recorded in conflict-log.jsonl for post-campaign review
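The priority chain amounts to picking the higher-ranked of two conflicting sources. The sketch below is illustrative: `PRIORITY` and `resolve_conflict` are hypothetical names, and sources outside the chain (such as a worker prompt) are not handled and would raise `ValueError`.

```python
PRIORITY = ["prd", "test-spec", "context", "memory"]


def resolve_conflict(source_a: str, source_b: str) -> str:
    """Return the source that wins under PRD > test-spec > context > memory."""
    # A lower index in PRIORITY means a higher rank.
    return min(source_a, source_b, key=PRIORITY.index)
```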

Conflict log

Each conflict is logged as a single-line JSON entry in logs/<slug>/conflict-log.jsonl. An example entry, pretty-printed here for readability:

{
  "iteration": 1,
  "us_id": "US-001",
  "source_a": "worker-prompt",
  "source_b": "prd",
  "conflict": "US-00 is required by the iteration prompt but is not defined as a PRD user story.",
  "resolution": "Followed PRD as source of truth."
}

When to use

Long-running campaigns that run overnight or while you're away
High-iteration tasks where stopping for every ambiguity wastes hours
Well-defined PRDs where the PRD is comprehensive and authoritative

When NOT to use

Exploratory work where you want to review each decision
Ambiguous PRDs where conflicts indicate real design gaps that need human judgment
First run of a new project — run without --autonomous first to catch PRD issues interactively

Post-campaign review

After the campaign, review the conflict log to identify systemic issues:

jq . .claude/ralph-desk/logs/<slug>/conflict-log.jsonl

Common patterns:

Repeated PRD vs test-spec conflicts — test-spec needs updating to match PRD
Scope lock vs fix contract conflicts — governance rules may need tuning
Missing PRD definitions — Worker created stories not in the PRD (add them or tighten the brainstorm)
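To spot these patterns without reading every entry, the log can be tallied by the pair of conflicting sources. This is a sketch that assumes every entry carries the `source_a` and `source_b` fields shown in the example entry; `conflict_patterns` is a hypothetical helper, not part of RLP Desk.

```python
import json
from collections import Counter
from typing import Iterable


def conflict_patterns(lines: Iterable[str]) -> Counter:
    """Count conflicts per (source_a, source_b) pair from JSONL lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        entry = json.loads(line)
        counts[(entry["source_a"], entry["source_b"])] += 1
    return counts
```

Passing an open file handle for conflict-log.jsonl works directly, since iterating a text file yields one line at a time; calling `most_common()` on the result lists the most frequent pairs first.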

Project Structure

After init, your project gets this scaffold:

your-project/
├── .claude/
│   ├── settings.local.json          # rlp-desk permissions (auto-added by init)
│   └── ralph-desk/
│       ├── prompts/
│       │   ├── <slug>.worker.prompt.md
│       │   └── <slug>.verifier.prompt.md
│       ├── context/
│       │   └── <slug>-latest.md
│       ├── memos/
│       │   └── <slug>-memory.md
│       ├── plans/
│       │   ├── prd-<slug>.md
│       │   └── test-spec-<slug>.md
│       └── logs/<slug>/
│           └── status.json

Local Settings

init automatically adds the following permissions to .claude/settings.local.json:

{
  "permissions": {
    "allow": [
      "Read(.claude/ralph-desk/**)",
      "Edit(.claude/ralph-desk/**)",
      "Write(.claude/ralph-desk/**)"
    ]
  }
}

Why: Claude Code treats .claude/ files as sensitive and prompts for confirmation on each access, even with --dangerously-skip-permissions. Without these permissions, Worker and Verifier agents are blocked by interactive prompts during automated loop execution.

Note: settings.local.json is local to your machine and is not committed to git. If the file already exists, permissions are merged without overwriting your existing settings.
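The merge behavior described in the note can be sketched as follows. This is an illustration of merging without overwriting, not the code init actually runs; `RLP_PERMISSIONS` and `merge_permissions` are hypothetical names.

```python
import json

RLP_PERMISSIONS = [
    "Read(.claude/ralph-desk/**)",
    "Edit(.claude/ralph-desk/**)",
    "Write(.claude/ralph-desk/**)",
]


def merge_permissions(settings: dict) -> dict:
    """Return a copy of settings with the rlp-desk rules added if absent."""
    merged = json.loads(json.dumps(settings))  # deep copy for JSON-shaped data
    allow = merged.setdefault("permissions", {}).setdefault("allow", [])
    for rule in RLP_PERMISSIONS:
        if rule not in allow:                  # keep the merge idempotent
            allow.append(rule)
    return merged
```

Existing keys and rules are left untouched, and running the merge twice yields the same result as running it once.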

Example: Calculator

See examples/calculator/ for a complete example that implements a Python calculator module with tests using the RLP Desk loop.

The example demonstrates:

A PRD with two user stories (calculator functions + pytest tests)
Test specification with verification commands
Worker and verifier prompts configured for the task

To try it yourself:

mkdir my-calc && cd my-calc
/rlp-desk brainstorm "Python calculator with add, subtract, multiply, divide + pytest tests"
/rlp-desk run loop-test

Documentation

Architecture — Design philosophy, Agent() and tmux execution modes
Getting Started — Step-by-step tutorial with the calculator example
Protocol Reference — Full protocol specification
Future Plans — P3 items and upcoming features

Contributing

See CONTRIBUTING.md.

License

MIT
