cpplain/claude-agent-harness

Agent Harness

A generic, configurable harness for long-running autonomous coding agents. Built on the Claude Agent SDK, it implements Anthropic's guide for effective agent harnesses, featuring phase-driven execution, configurable MCP tools, and SDK-native sandbox isolation.

This project originated from the autonomous-coding example in the claude-quickstarts repository and extends it into a fully configurable, project-agnostic harness.

What This Is

This is a project-agnostic harness that can drive any kind of autonomous coding task:

  • Web applications (Next.js, React, Vue, etc.)
  • Backend services (Node, Python, Ruby, Go, etc.)
  • Test refactoring (RSpec, Jest, Pytest, etc.)
  • Data pipelines and ETL workflows
  • CLI tools and automation scripts
  • Any other coding task requiring multiple iterations

The harness is completely generic — all project-specific configuration (tech stack, tools, prompts) is declared in a .agent-harness/config.toml file. No hardcoded assumptions about your stack.

Overview

Agent Harness provides:

  • Phase-based workflows — declarative phase definitions with conditions and run-once semantics
  • TOML-based configuration — no code changes needed to customize behavior
  • SDK-native security — OS sandbox with network isolation, declarative permission rules (allow/deny), secure defaults
  • Progress tracking — JSON checklist (with automatic completion detection), notes file, or none
  • Error recovery — exponential backoff and circuit breaker to prevent runaway costs
  • MCP server support — browser automation, databases, etc.
  • Session persistence — auto-continue across sessions with state tracking
  • Setup verification — check auth, tools, config before running

Design Philosophy

This project follows Anthropic's recommendations for building agents. Before designing or implementing any feature, read their guidance. It already addresses most common agent design problems and explains both WHAT to do and WHY.

Required Reading

SDK & Implementation

Architecture & Design

Example: Security

This project originally had ~500 lines of custom shell command parsing for security. The sandboxing article explains why Anthropic chose OS-level isolation instead—reading it first would have avoided this unnecessary complexity.

Getting Started

1. Clone and install

git clone <repo-url>
cd claude-agent-harness
uv sync
Alternative: using pip
python3 -m venv .venv && source .venv/bin/activate
pip install -e .

If using pip, drop the uv run prefix from the commands below (for example, uv run python -m agent_harness run becomes python -m agent_harness run).

2. Set up authentication

Export one of these environment variables:

  • ANTHROPIC_API_KEY — get one from console.anthropic.com
  • CLAUDE_CODE_OAUTH_TOKEN — via claude setup-token

See .env.example for all options.

Using 1Password CLI

If you manage secrets with 1Password CLI, create a .env file with an op:// reference:

ANTHROPIC_API_KEY="op://Vault/Item/api_key"

Then wrap any command with op run:

op run --env-file "./.env" -- uv run python -m agent_harness run --project-dir ./my-project

3. Run

# Scaffold a new project configuration
uv run python -m agent_harness init --project-dir ./my-project

# Edit the configuration
#    -> ./my-project/.agent-harness/config.toml

# Verify setup
uv run python -m agent_harness verify --project-dir ./my-project

# Run the agent
uv run python -m agent_harness run --project-dir ./my-project

CLI Reference

# Run the agent
python -m agent_harness run --project-dir <path> [options]

# Verify setup (auth, dependencies, config)
python -m agent_harness verify [--project-dir <path>]

# Scaffold new project configuration
python -m agent_harness init --project-dir <path>

# Global flags (all commands)
--project-dir PATH      # Agent's working directory (default: .)
--harness-dir PATH      # Path to .agent-harness/ (default: project-dir/.agent-harness/)

# Run command options
--max-iterations N      # Override max iterations (default: from config)
--model MODEL           # Override model (default: from config)
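The flags above map naturally onto argparse subcommands. The sketch below is a hypothetical reconstruction, not the contents of agent_harness/cli.py: build_parser and resolve_harness_dir are illustrative names. It assumes, as the reference states, that an omitted --harness-dir falls back to <project-dir>/.agent-harness, and that an omitted run override is left as None so the config value applies.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Flags shared by every subcommand ("Global flags" above).
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--project-dir", default=".",
                        help="Agent's working directory")
    common.add_argument("--harness-dir", default=None,
                        help="Path to .agent-harness/ (default: <project-dir>/.agent-harness/)")

    parser = argparse.ArgumentParser(prog="agent_harness")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run", parents=[common], help="Run the agent")
    # None means "not given on the command line", so the config value applies.
    run.add_argument("--max-iterations", type=int, default=None)
    run.add_argument("--model", default=None)

    sub.add_parser("verify", parents=[common], help="Verify setup")
    sub.add_parser("init", parents=[common], help="Scaffold project configuration")
    return parser


def resolve_harness_dir(args: argparse.Namespace) -> Path:
    # An explicit --harness-dir wins; otherwise use <project-dir>/.agent-harness.
    if args.harness_dir:
        return Path(args.harness_dir)
    return Path(args.project_dir) / ".agent-harness"
```

Using a parent parser keeps the global flags defined once while still letting them appear after any subcommand, as in run --project-dir ./my-project.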

How It Works

Phase-Driven Execution

The harness executes agents in configurable phases with conditions and run-once semantics:

  1. Initializer Phase (run_once: true):

    • Reads the specification
    • Creates a feature list with test cases
    • Sets up project structure
    • Initializes git repository
  2. Coding Phase (repeating):

    • Picks up where previous session left off
    • Implements features one by one
    • Marks features as complete in progress file
    • Creates git commits for changes
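The following is a minimal sketch of how a runner might choose the phase for each session. It assumes phases are checked in declaration order and the first eligible one runs. Phase, condition_holds, and select_phase are hypothetical names. Only the not_exists: condition is documented in this README, so the sketch rejects any other condition kind. It also resolves condition paths against the project directory, which matches the .agent-harness/ prefix used in the example config.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Optional, Set


@dataclass
class Phase:
    name: str
    prompt: str
    run_once: bool = False
    condition: str = ""


def condition_holds(condition: str, project_dir: Path) -> bool:
    # An empty condition always holds.
    if not condition:
        return True
    kind, _, rel_path = condition.partition(":")
    if kind == "not_exists":
        # Paths are resolved against the project directory in this sketch.
        return not (project_dir / rel_path).exists()
    raise ValueError(f"Unsupported phase condition: {condition!r}")


def select_phase(phases: Iterable[Phase], completed: Set[str],
                 project_dir: Path) -> Optional[Phase]:
    # Return the first phase, in declaration order, that is still eligible.
    for phase in phases:
        if phase.run_once and phase.name in completed:
            continue  # run-once phase already finished in an earlier session
        if condition_holds(phase.condition, project_dir):
            return phase
    return None
```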

Session Management

  • Fresh context per session: Each session creates a new context window to prevent context pollution
  • Progress persistence: State preserved between sessions via:
    • Tracking file (e.g., feature_list.json) tracking feature completion
    • Session state file (session.json) tracking completed phases
    • Git commits preserving code changes
  • Auto-continue: Sessions auto-resume after a configurable delay (auto_continue_delay, default 3s)
  • Completion detection: Harness stops automatically when tracker.is_complete() returns true (only json_checklist supports this; notes_file and none trackers require manual stop via Ctrl+C)
  • Press Ctrl+C to pause; run same command to resume
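Session state could be persisted roughly as follows. This sketch assumes session.json uses hypothetical session_number and completed_phases keys; the real file's schema is not documented here. The function names are illustrative. The sketch writes to a temporary file and then renames it, so an interrupted run (for example, Ctrl+C mid-write) cannot leave a truncated file. The rename is atomic on POSIX filesystems.

```python
import json
from pathlib import Path


def default_state() -> dict:
    # Hypothetical schema; the real session.json may store different keys.
    return {"session_number": 0, "completed_phases": []}


def load_session(harness_dir: Path) -> dict:
    path = harness_dir / "session.json"
    if not path.exists():
        return default_state()  # first run
    return json.loads(path.read_text(encoding="utf-8"))


def save_session(harness_dir: Path, state: dict) -> None:
    path = harness_dir / "session.json"
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    # Path.replace uses os.replace: atomic on POSIX, so no partial file is visible.
    tmp.replace(path)


def record_session(harness_dir: Path, phase_name: str, run_once: bool) -> dict:
    # Bump the session counter and remember finished run-once phases.
    state = load_session(harness_dir)
    state["session_number"] += 1
    if run_once and phase_name not in state["completed_phases"]:
        state["completed_phases"].append(phase_name)
    save_session(harness_dir, state)
    return state
```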

Error Recovery & Circuit Breaker

Prevents runaway API costs when sessions fail repeatedly:

  • Tracks consecutive errors across sessions
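  • Exponential backoff: 5s → 10s → 20s → 40s → 80s with the default settings, capped at max_backoff_seconds (120s); with the default limit of 5 errors, the circuit breaker trips before the cap is reached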
  • Circuit breaker: Trips after 5 consecutive errors (configurable)
  • Successful session resets error counter
  • Error context forwarded to next session to help recovery
[error_recovery]
max_consecutive_errors = 5
initial_backoff_seconds = 5.0
max_backoff_seconds = 120.0
backoff_multiplier = 2.0
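With these settings, the delay after the nth consecutive error is min(initial_backoff_seconds * backoff_multiplier^(n - 1), max_backoff_seconds). The sketch below models that formula together with the breaker. ErrorRecovery and CircuitBreaker are hypothetical names, not the harness's classes. The sketch assumes the breaker trips as soon as the error count reaches max_consecutive_errors, without waiting first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorRecovery:
    # Field names and defaults mirror the [error_recovery] table.
    max_consecutive_errors: int = 5
    initial_backoff_seconds: float = 5.0
    max_backoff_seconds: float = 120.0
    backoff_multiplier: float = 2.0

    def backoff_delay(self, consecutive_errors: int) -> float:
        # Delay after the nth consecutive error (n >= 1), capped at the maximum.
        raw = self.initial_backoff_seconds * self.backoff_multiplier ** (consecutive_errors - 1)
        return min(raw, self.max_backoff_seconds)


class CircuitBreaker:
    def __init__(self, config: ErrorRecovery) -> None:
        self.config = config
        self.consecutive_errors = 0

    def record_success(self) -> None:
        # A successful session resets the counter.
        self.consecutive_errors = 0

    def record_error(self) -> Optional[float]:
        # Returns seconds to wait before retrying, or None once the breaker trips.
        self.consecutive_errors += 1
        if self.consecutive_errors >= self.config.max_consecutive_errors:
            return None
        return self.config.backoff_delay(self.consecutive_errors)
```

With the defaults, four failures in a row yield waits of 5, 10, 20, and 40 seconds, and the fifth stops the run.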

Security Model

This harness follows Anthropic's secure deployment recommendations by relying on the SDK's built-in sandbox and permission system as the primary defense, rather than custom application-layer validation.

SDK-Native Sandbox

The Claude SDK provides process-level isolation with:

  • Process isolation — Bash commands run in a sandboxed subprocess
  • Network restrictions — Configurable domain allowlist and Unix socket access
  • Filesystem boundaries — Commands are restricted to the project directory
[security.sandbox]
enabled = true
auto_allow_bash_if_sandboxed = true
allow_unsandboxed_commands = false  # secure default

[security.sandbox.network]
allowed_domains = ["registry.npmjs.org", "github.com"]
allow_local_binding = false
allow_unix_sockets = []

Declarative Permission Rules

Security is enforced through SDK permission rules, not runtime command parsing:

[security.permissions]
allow = [
    "Bash(npm *)", "Bash(node *)", "Bash(git *)",
    "Bash(ls *)", "Bash(cat *)", "Bash(grep *)",
    "Read(./**)", "Write(./**)", "Edit(./**)",
]
deny = [
    "Bash(curl *)", "Bash(wget *)",
    "Read(./.env)", "Read(./.env.*)",
]

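Permission rules are evaluated by the SDK before a tool runs, not by the model, so the agent cannot switch them off through prompt injection. Rules match the tool call as written, though. An allowed command such as node can still make network requests that a Bash(curl *) deny rule does not cover, which is why the OS-level sandbox remains the primary boundary.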
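The following toy evaluator makes deny-over-allow precedence concrete for rules of the form Tool(pattern). It is an illustration only. It uses Python's fnmatch globbing, whereas the SDK's real matcher has its own semantics for Bash prefixes and file paths. It assumes deny rules win over allow rules, as in Claude Code's permission system. Calls that match neither list fall through to permission_mode, shown here as "ask". rule_matches and evaluate are hypothetical names.

```python
import re
from fnmatch import fnmatchcase
from typing import Sequence

# "Tool(pattern)" -> ("Tool", "pattern")
_RULE = re.compile(r"^(\w+)\((.*)\)$")


def rule_matches(rule: str, tool: str, argument: str) -> bool:
    m = _RULE.match(rule)
    if m is None:
        # In this sketch a bare tool name (e.g. "Read") matches every use of that tool.
        return rule == tool
    rule_tool, pattern = m.groups()
    # fnmatch globbing: "*" also matches "/" here, unlike many path-glob dialects.
    return rule_tool == tool and fnmatchcase(argument, pattern)


def evaluate(tool: str, argument: str,
             allow: Sequence[str], deny: Sequence[str]) -> str:
    # Deny is checked first, so a deny rule overrides any overlapping allow rule.
    if any(rule_matches(r, tool, argument) for r in deny):
        return "deny"
    if any(rule_matches(r, tool, argument) for r in allow):
        return "allow"
    # No rule matched: defer to permission_mode (modeled as "ask").
    return "ask"
```

Read(./**) on its own would allow ./.env. Only the deny list blocks it.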

Secure Defaults

  • allow_unsandboxed_commands defaults to false
  • When sandbox is enabled, auto_allow_bash_if_sandboxed=true auto-allows Bash commands
  • When sandbox is disabled, explicit permissions.allow rules are required
  • Network access is denied by default

Git Protection Recommendations

For production deployments, protect critical branches using server-side git hooks or branch protection rules on your git hosting platform (GitHub, GitLab, Bitbucket), not client-side validation. This prevents destructive operations like git push --force at the source.

Configuration

Configuration lives in .agent-harness/config.toml. See the example config for a complete reference.

Directory Layout

project_dir/
├── .agent-harness/
│   ├── logs/                  # Session logs (auto-created, gitignored)
│   ├── prompts/               # Prompt files (referenced by config)
│   │   ├── app_spec.txt
│   │   ├── coding.md
│   │   └── initializer.md
│   ├── .claude_settings.json  # Generated security settings (auto-created, gitignored)
│   ├── config.toml            # Main configuration (required)
│   └── session.json           # Session number, completed phases (auto-created)
└── (generated code lives here)

Full Configuration Reference

Complete annotated example showing all available configuration options:

# --- Agent Settings ---
# Model to use for agent execution
model = "claude-sonnet-4-5-20250929"

# System prompt (can use "file:prompts/system.md" to load from file)
system_prompt = "You are an expert full-stack developer..."

# --- Session Settings ---
# Maximum API turns per session before auto-continuing
max_turns = 1000

# Maximum total sessions before stopping (default: unlimited)
max_iterations = 10

# Delay in seconds before auto-continuing to next session
auto_continue_delay = 3

# --- Tools ---
[tools]
# Built-in Claude SDK tools to enable
builtin = ["Read", "Write", "Edit", "Glob", "Grep", "Bash"]

# MCP servers to connect
[tools.mcp_servers.puppeteer]
command = "npx"
args = ["puppeteer-mcp-server"]
env = { NODE_ENV = "production" }

# --- Security ---
[security]
# Permission mode: "default", "acceptEdits", "bypassPermissions", "plan"
permission_mode = "acceptEdits"

# OS-level sandbox configuration
[security.sandbox]
enabled = true
auto_allow_bash_if_sandboxed = true
allow_unsandboxed_commands = false

# Network restrictions for sandboxed commands
[security.sandbox.network]
allowed_domains = ["registry.npmjs.org", "github.com"]
allow_local_binding = false
allow_unix_sockets = []

# Declarative permission rules (evaluated by SDK before tool execution)
[security.permissions]
allow = [
    "Bash(npm *)", "Bash(node *)", "Bash(git *)",
    "Bash(ls *)", "Bash(cat *)", "Bash(grep *)",
]
deny = [
    "Bash(curl *)", "Bash(wget *)",
    "Read(./.env)", "Read(./.env.*)",
]

# --- Progress Tracking ---
[tracking]
# Tracker type: "json_checklist", "notes_file", "none"
type = "json_checklist"

# Tracking file path (relative to harness_dir)
file = "feature_list.json"

# Field name indicating completion (for json_checklist)
passing_field = "passes"

# --- Error Recovery ---
[error_recovery]
# Circuit breaker: max consecutive session errors before stopping
max_consecutive_errors = 5

# Initial backoff delay after first error
initial_backoff_seconds = 5.0

# Maximum backoff delay (capped exponential backoff)
max_backoff_seconds = 120.0

# Multiplier for exponential backoff
backoff_multiplier = 2.0

# --- Phases ---
# Multi-phase workflow definitions
[[phases]]
name = "initializer"
prompt = "file:prompts/initializer.md"
run_once = true
condition = "not_exists:.agent-harness/feature_list.json"

[[phases]]
name = "coding"
prompt = "file:prompts/coding.md"

# --- Init Files ---
# Files to copy on first run
[[init_files]]
source = "prompts/app_spec.txt"
dest = "app_spec.txt"

# --- Post-Run Instructions ---
# Commands to display after agent completes
post_run_instructions = [
    "npm install",
    "npm run dev",
    "Open http://localhost:3000",
]

Configuration Fields

Top-Level Settings

| Field | Type | Default | Description |
|---|---|---|---|
| model | string | "claude-sonnet-4-5-20250929" | Claude model to use for agent execution |
| system_prompt | string | "You are a helpful coding assistant." | System prompt (supports file: references) |
| max_turns | int | 1000 | Maximum API turns per session before auto-continuing |
| max_iterations | int? | null | Maximum total sessions before stopping (unlimited if not set) |
| auto_continue_delay | int | 3 | Delay in seconds before auto-continuing to the next session |
| post_run_instructions | string[] | [] | Commands to display in the final summary banner |

[tools] Section

| Field | Type | Default | Description |
|---|---|---|---|
| builtin | string[] | ["Read", "Write", "Edit", "Glob", "Grep", "Bash"] | Built-in Claude SDK tools to enable |

MCP Servers ([tools.mcp_servers.<name>]):

| Field | Type | Default | Description |
|---|---|---|---|
| command | string | "" | Command to execute the MCP server |
| args | string[] | [] | Command-line arguments |
| env | map | {} | Environment variables (supports ${VAR} expansion) |
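The ${VAR} expansion in env values can be approximated with string.Template from the standard library. This is a sketch: expand_env is a hypothetical helper. It also expands bare $VAR references and leaves unset variables untouched, which may not match how the harness treats a missing variable.

```python
import os
from string import Template
from typing import Dict, Mapping, Optional


def expand_env(env: Mapping[str, str],
               environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    # Default to the harness process's own environment.
    source = os.environ if environ is None else environ
    # safe_substitute replaces ${VAR} (and bare $VAR) but leaves unknown names as-is.
    return {key: Template(value).safe_substitute(source) for key, value in env.items()}
```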

[security] Section

| Field | Type | Default | Description |
|---|---|---|---|
| permission_mode | string | "acceptEdits" | Permission mode: "default", "acceptEdits", "bypassPermissions", or "plan" |

Sandbox ([security.sandbox]):

| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | true | Enable OS-level sandbox for Bash commands |
| auto_allow_bash_if_sandboxed | bool | true | Auto-allow all Bash commands when the sandbox is enabled |
| allow_unsandboxed_commands | bool | false | Allow commands outside the sandbox (secure default: false) |
| excluded_commands | string[] | [] | Commands excluded from sandboxing |

Sandbox Network ([security.sandbox.network]):

| Field | Type | Default | Description |
|---|---|---|---|
| allowed_domains | string[] | [] | Domains the agent can access via the network |
| allow_local_binding | bool | false | Allow binding to localhost addresses |
| allow_unix_sockets | string[] | [] | Unix socket paths the agent can access |

Permission Rules ([security.permissions]):

| Field | Type | Default | Description |
|---|---|---|---|
| allow | string[] | [] | Tool patterns to allow (e.g., "Bash(npm *)", "Read(./**)") |
| deny | string[] | [] | Tool patterns to deny (e.g., "Bash(curl *)", "Read(./.env)") |

[tracking] Section

| Field | Type | Default | Description |
|---|---|---|---|
| type | string | "none" | Tracker type: "json_checklist", "notes_file", or "none" |
| file | string | "" | Tracking file path (relative to harness_dir; required for json_checklist and notes_file) |
| passing_field | string | "passes" | JSON field indicating completion (for json_checklist) |
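For the json_checklist tracker, completion detection amounts to checking that every entry's passing_field is true. The following sketch assumes the tracking file is a JSON array of objects, such as feature_list.json with "passes": false fields. checklist_progress and is_complete are illustrative functions here, not the tracker.is_complete() method in tracking.py.

```python
import json
from pathlib import Path
from typing import Tuple


def checklist_progress(path: Path, passing_field: str = "passes") -> Tuple[int, int]:
    # Assumes a JSON array of objects, e.g. [{"passes": false}, ...] (other keys ignored).
    items = json.loads(path.read_text(encoding="utf-8"))
    passed = sum(1 for item in items if item.get(passing_field) is True)
    return passed, len(items)


def is_complete(path: Path, passing_field: str = "passes") -> bool:
    # A missing or empty checklist is treated as "not complete yet".
    if not path.exists():
        return False
    passed, total = checklist_progress(path, passing_field)
    return total > 0 and passed == total
```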

[error_recovery] Section

| Field | Type | Default | Description |
|---|---|---|---|
| max_consecutive_errors | int | 5 | Circuit breaker: max consecutive session errors before stopping |
| initial_backoff_seconds | float | 5.0 | Initial backoff delay after the first error |
| max_backoff_seconds | float | 120.0 | Maximum backoff delay (capped exponential backoff) |
| backoff_multiplier | float | 2.0 | Multiplier for exponential backoff |

[[phases]] Section

Multiple phases can be defined using [[phases]] array syntax:

| Field | Type | Default | Description |
|---|---|---|---|
| name | string | required | Phase name (for logging/debugging) |
| prompt | string | required | Phase prompt (supports file: references) |
| run_once | bool | false | Only execute this phase once across all sessions |
| condition | string | "" | Condition for running the phase (e.g., "not_exists:file.txt") |

[[init_files]] Section

Multiple init files can be defined using [[init_files]] array syntax:

| Field | Type | Default | Description |
|---|---|---|---|
| source | string | required | Source file path (relative to harness_dir) |
| dest | string | required | Destination file path (relative to harness_dir) |

Config Loading Precedence

CLI flags > config.toml values > defaults
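This precedence rule can be expressed as a layered dictionary merge, where a CLI value of None means "flag not given". resolve_settings and DEFAULTS are hypothetical names, and DEFAULTS covers only a few fields for illustration. In practice, toml_values would be the parsed config.toml, for example via tomllib.load on Python 3.11+.

```python
from typing import Any, Dict, Optional

# Defaults from the "Top-Level Settings" table (a subset, for illustration).
DEFAULTS: Dict[str, Any] = {
    "model": "claude-sonnet-4-5-20250929",
    "max_turns": 1000,
    "max_iterations": None,
    "auto_continue_delay": 3,
}


def resolve_settings(toml_values: Dict[str, Any],
                     cli_overrides: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    merged = dict(DEFAULTS)          # lowest precedence
    merged.update(toml_values)       # config.toml overrides defaults
    # CLI flags override both, but only when actually given (not None).
    merged.update({k: v for k, v in cli_overrides.items() if v is not None})
    return merged
```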

Project Structure

claude-agent-harness/
├── agent_harness/          # Python package
│   ├── __init__.py
│   ├── __main__.py         # Entry point
│   ├── cli.py              # Argument parsing, subcommands
│   ├── client_factory.py   # Builds ClaudeSDKClient from config
│   ├── config.py           # Config loading, validation, HarnessConfig
│   ├── runner.py           # Generic agent loop
│   ├── tracking.py         # Progress tracking implementations
│   └── verify.py           # Setup verification checks
├── examples/
│   └── claude-ai-clone/
│       ├── .agent-harness/
│       │   ├── prompts/
│       │   │   ├── app_spec.txt
│       │   │   ├── coding.md
│       │   │   └── initializer.md
│       │   └── config.toml
│       └── README.md
├── tests/
│   ├── test_cli.py
│   ├── test_client_factory.py
│   ├── test_config.py
│   ├── test_prompts.py
│   ├── test_runner.py
│   ├── test_tracking.py
│   └── test_verify.py
├── .env.example
└── pyproject.toml

Examples

Claude.ai Clone (Next.js)

See examples/claude-ai-clone/ for a complete example that:

  • Uses Next.js/React stack (npm, node commands)
  • Integrates Puppeteer MCP server for browser testing
  • Generates a production-quality chat interface
  • Tracks progress via feature_list.json
# Run the Claude.ai clone example
python -m agent_harness run \
    --project-dir ./my-clone-output \
    --harness-dir examples/claude-ai-clone/.agent-harness

Troubleshooting

"Configuration file not found"

The harness expects a .agent-harness/config.toml file in your project directory. If you see this error:

  1. Check that you're running from the correct directory
  2. Use python -m agent_harness init --project-dir ./my-project to scaffold a new configuration
  3. If using --harness-dir, verify the path points to a directory containing config.toml

"Prompt file not found"

Check that all file: references in your config.toml point to files relative to the .agent-harness/ directory. For example:

[[phases]]
prompt = "file:prompts/coding_prompt.md"  # Must exist at .agent-harness/prompts/coding_prompt.md
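The following minimal sketch resolves file: references against the harness directory and raises the error above when the target is missing. resolve_prompt is a hypothetical helper, and reading prompt files as UTF-8 is an assumption.

```python
from pathlib import Path

FILE_PREFIX = "file:"


def resolve_prompt(value: str, harness_dir: Path) -> str:
    # Inline prompts are returned unchanged.
    if not value.startswith(FILE_PREFIX):
        return value
    # "file:prompts/coding.md" -> <harness_dir>/prompts/coding.md
    path = harness_dir / value[len(FILE_PREFIX):]
    if not path.is_file():
        raise FileNotFoundError(f"Prompt file not found: {path}")
    return path.read_text(encoding="utf-8")
```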

"Neither ANTHROPIC_API_KEY nor CLAUDE_CODE_OAUTH_TOKEN is set"

You need authentication credentials to use the Claude API:

  • API Key: Get one from console.anthropic.com and set export ANTHROPIC_API_KEY="your-key"
  • OAuth Token: Run claude setup-token and the harness will use CLAUDE_CODE_OAUTH_TOKEN automatically
  • See .env.example for setting these via environment file

Agent is hanging on the first session

The first session (initializer phase) can take 10-20+ minutes for complex projects because it:

  • Reads the entire spec
  • Plans the feature breakdown
  • Creates initial project structure
  • Sets up git repository
  • May run initial installs (npm, pip, etc.)

This is expected behavior. Subsequent sessions (coding phase) are typically faster as they focus on individual features.

If a session truly hangs:

  1. Check the session logs in .agent-harness/logs/ and the .agent-harness/session.json file for error messages
  2. Look for permission prompts or security blocks in the output
  3. Verify your progress file format matches the configuration (e.g., feature_list.json with "passes": false fields)

Design Principles

  1. Zero assumptions about tech stack: The harness has no hardcoded knowledge of npm, Ruby, Python, or any other stack. Projects declare exactly what they need.

  2. Zero wasted context: Only tools and servers declared in the config are available to the agent. No unused MCP servers polluting the context.

  3. Defense in depth: Multiple security layers (sandbox, permission rules, secure defaults) protect against unintended actions.

  4. Session persistence: Progress is saved between sessions via the progress file and git commits, enabling long-running tasks that span hours or days.

  5. Fresh context per session: Each session starts with a clean context window, preventing context pollution and allowing unlimited total work.

Running Tests

# Run all tests
python -m unittest discover tests -v

# Run specific test modules
python -m unittest tests.test_config -v         # Configuration loading
python -m unittest tests.test_tracking -v       # Progress tracking
python -m unittest tests.test_runner -v         # Session loop logic
python -m unittest tests.test_client_factory -v # Client creation

Test coverage includes:

  • Security configuration: Sandbox settings, permission rules, network isolation
  • Configuration loading: TOML parsing, defaults, validation, error cases
  • Progress tracking: Completion detection, JSON parsing, print formatting
  • Prompt loading: File reading, file: resolution, error handling

License

MIT License. See LICENSE.

This harness is based on the patterns described in Anthropic's guide for long-running agent harnesses. The project originated from the autonomous-coding example in the claude-quickstarts repository and is built using the Claude Agent SDK.
