Agent Harness

A generic, configurable harness for long-running autonomous coding agents. Built on the Claude Agent SDK, it implements the patterns from Anthropic's guide to effective harnesses for long-running agents, featuring phase-driven execution, configurable MCP tools, and SDK-native sandbox isolation.

This project originated from the autonomous-coding example in the claude-quickstarts repository and extends it into a fully configurable, project-agnostic harness.

What This Is

This is a project-agnostic harness that can drive any kind of autonomous coding task:

Web applications (Next.js, React, Vue, etc.)
Backend services (Node, Python, Ruby, Go, etc.)
Test refactoring (RSpec, Jest, Pytest, etc.)
Data pipelines and ETL workflows
CLI tools and automation scripts
Any other coding task requiring multiple iterations

The harness is completely generic — all project-specific configuration (tech stack, tools, prompts) is declared in a .agent-harness/config.toml file. No hardcoded assumptions about your stack.

Overview

Agent Harness provides:

Phase-based workflows — declarative phase definitions with conditions and run-once semantics
TOML-based configuration — no code changes needed to customize behavior
SDK-native security — OS sandbox with network isolation, declarative permission rules (allow/deny), secure defaults
Progress tracking — JSON checklist (with automatic completion detection), notes file, or none
Error recovery — exponential backoff and circuit breaker to prevent runaway costs
MCP server support — browser automation, databases, etc.
Session persistence — auto-continue across sessions with state tracking
Setup verification — check auth, tools, config before running

Design Philosophy

This project follows Anthropic's recommendations for building agents. Before designing or implementing any feature, read their guidance: it already addresses most agent problems and documents both WHAT to do and WHY.

Required Reading

SDK & Implementation

Agent SDK Overview — Built-in capabilities
Claude Code Sandboxing — Security model
Effective Harnesses — Session patterns

Architecture & Design

Building Effective Agents — Core principles
Effective Context Engineering — Prompt design
Writing Tools for Agents — Tool design

Example: Security

This project originally had ~500 lines of custom shell command parsing for security. The sandboxing article explains why Anthropic chose OS-level isolation instead—reading it first would have avoided this unnecessary complexity.

Getting Started

1. Clone and install

git clone <repo-url>
cd claude-agent-harness
uv sync

Alternative: using pip

python3 -m venv .venv && source .venv/bin/activate
pip install -e .

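If using pip, activate the virtual environment and drop the uv run prefix from the commands below (for example, uv run python -m agent_harness verify becomes python -m agent_harness verify).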

2. Set up authentication

Export one of these environment variables:

ANTHROPIC_API_KEY — get one from console.anthropic.com
CLAUDE_CODE_OAUTH_TOKEN — via claude setup-token

See .env.example for all options.

Using 1Password CLI

If you manage secrets with 1Password CLI, create a .env file with an op:// reference:

ANTHROPIC_API_KEY="op://Vault/Item/api_key"

Then wrap any command with op run:

op run --env-file "./.env" -- uv run python -m agent_harness run --project-dir ./my-project

3. Run

# Scaffold a new project configuration
uv run python -m agent_harness init --project-dir ./my-project

# Edit the configuration
#    -> ./my-project/.agent-harness/config.toml

# Verify setup
uv run python -m agent_harness verify --project-dir ./my-project

# Run the agent
uv run python -m agent_harness run --project-dir ./my-project

CLI Reference

# Run the agent
python -m agent_harness run --project-dir <path> [options]

# Verify setup (auth, dependencies, config)
python -m agent_harness verify [--project-dir <path>]

# Scaffold new project configuration
python -m agent_harness init --project-dir <path>

# Global flags (all commands)
--project-dir PATH      # Agent's working directory (default: .)
--harness-dir PATH      # Path to .agent-harness/ (default: project-dir/.agent-harness/)

# Run command options
--max-iterations N      # Override max iterations (default: from config)
--model MODEL           # Override model (default: from config)
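The subcommands and flags above can be modeled with the standard library's argparse. This is a minimal sketch, not the project's actual cli.py; the function name build_parser is hypothetical. Run-only overrides default to None so that "flag not passed" can be told apart from an explicit value when layering over config.toml.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the documented CLI surface."""
    parser = argparse.ArgumentParser(prog="agent_harness")
    sub = parser.add_subparsers(dest="command", required=True)

    for name in ("run", "verify", "init"):
        cmd = sub.add_parser(name)
        # Global flags documented for all commands.
        cmd.add_argument("--project-dir", default=".",
                         help="Agent's working directory")
        cmd.add_argument("--harness-dir", default=None,
                         help="Path to .agent-harness/ (default: <project-dir>/.agent-harness/)")
        if name == "run":
            # None means "not given on the command line": fall back to config.toml.
            cmd.add_argument("--max-iterations", type=int, default=None)
            cmd.add_argument("--model", default=None)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

For example, parse_args(["run", "--max-iterations", "3"]) yields max_iterations=3 and model=None, leaving the model to come from config.toml.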

How It Works

Phase-Driven Execution

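The harness executes agents in configurable phases, each of which can carry a condition and run-once semantics. The example configuration defines two phases:

Initializer Phase (run_once = true, condition not_exists:.agent-harness/feature_list.json):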
- Reads the specification
- Creates a feature list with test cases
- Sets up project structure
- Initializes git repository
Coding Phase (repeating):
- Picks up where previous session left off
- Implements features one by one
- Marks features as complete in progress file
- Creates git commits for changes
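One plausible way to choose which phase runs in a given session is sketched below. It assumes the harness picks the first phase whose run_once constraint and condition both allow it, and that condition paths resolve against the project directory (as the example's not_exists:.agent-harness/feature_list.json suggests). Only the not_exists: condition is modeled; Phase, condition_holds, and select_phase are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Phase:
    name: str
    prompt: str
    run_once: bool = False
    condition: str = ""


def condition_holds(condition: str, project_dir: Path) -> bool:
    """Evaluate a phase condition (only "not_exists:<path>" is modeled)."""
    if not condition:
        return True
    kind, _, arg = condition.partition(":")
    if kind == "not_exists":
        return not (project_dir / arg).exists()
    # Assumption: the real harness may support other condition kinds.
    raise ValueError(f"unsupported condition: {condition!r}")


def select_phase(phases: list[Phase], completed: set[str],
                 project_dir: Path) -> Optional[Phase]:
    """Return the first phase eligible to run this session, or None."""
    for phase in phases:
        if phase.run_once and phase.name in completed:
            continue  # run_once phases never repeat
        if condition_holds(phase.condition, project_dir):
            return phase
    return None
```

With the example config, the initializer is chosen on the first session; once it is recorded as completed (or feature_list.json exists), every later session falls through to the coding phase.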

Session Management

Fresh context per session: Each session creates a new context window to prevent context pollution
Progress persistence: State preserved between sessions via:
- Tracking file (e.g., feature_list.json) tracking feature completion
- Session state file (session.json) tracking completed phases
- Git commits preserving code changes
Auto-continue: Sessions auto-resume after configured delay (default 3s)
Completion detection: The harness stops automatically when tracker.is_complete() returns true. Only json_checklist supports this; with notes_file or none, the harness runs until max_iterations is reached or you stop it with Ctrl+C
Press Ctrl+C to pause; run the same command to resume
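Persisting session state between runs could look like the sketch below. The README only says session.json holds the session number and completed phases, so the field names session_number and completed_phases, and the helpers load_state and save_state, are assumptions. Writing to a temporary file and renaming it keeps an interrupted write (for example, a Ctrl+C at the wrong moment) from leaving a half-written session.json.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class SessionState:
    # Field names are assumptions; the real session.json may differ.
    session_number: int = 0
    completed_phases: list[str] = field(default_factory=list)


def load_state(path: Path) -> SessionState:
    if not path.exists():
        return SessionState()  # first run: nothing recorded yet
    data = json.loads(path.read_text())
    return SessionState(
        session_number=int(data.get("session_number", 0)),
        completed_phases=list(data.get("completed_phases", [])),
    )


def save_state(path: Path, state: SessionState) -> None:
    # Write a sibling temp file, then rename it over the original.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(asdict(state), indent=2))
    tmp.replace(path)
```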

Error Recovery & Circuit Breaker

Prevents runaway API costs when sessions fail repeatedly:

Tracks consecutive errors across sessions
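Exponential backoff: with the defaults, delays grow 5s → 10s → 20s → 40s → 80s, and the circuit breaker trips at the fifth consecutive error. Delays are capped at max_backoff_seconds (120s), which only comes into play if max_consecutive_errors is raised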
Circuit breaker: Trips after 5 consecutive errors (configurable)
Successful session resets error counter
Error context forwarded to next session to help recovery

[error_recovery]
max_consecutive_errors = 5
initial_backoff_seconds = 5.0
max_backoff_seconds = 120.0
backoff_multiplier = 2.0
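The backoff and circuit-breaker rules above amount to a small piece of state. The sketch below is an illustration under the documented defaults, not the harness's runner.py; the class and method names are hypothetical. The delay after the n-th consecutive error is initial_backoff_seconds × backoff_multiplier^(n−1), clamped to max_backoff_seconds.

```python
from dataclasses import dataclass


@dataclass
class ErrorRecovery:
    # Defaults mirror the [error_recovery] values above.
    max_consecutive_errors: int = 5
    initial_backoff_seconds: float = 5.0
    max_backoff_seconds: float = 120.0
    backoff_multiplier: float = 2.0
    consecutive_errors: int = 0

    def record_success(self) -> None:
        # A successful session resets the error counter.
        self.consecutive_errors = 0

    def record_error(self) -> None:
        self.consecutive_errors += 1

    @property
    def tripped(self) -> bool:
        """True once the circuit breaker should stop the run."""
        return self.consecutive_errors >= self.max_consecutive_errors

    def backoff_seconds(self) -> float:
        """Delay before the next session, given the current error streak."""
        if self.consecutive_errors == 0:
            return 0.0
        exponent = self.consecutive_errors - 1
        delay = self.initial_backoff_seconds * self.backoff_multiplier ** exponent
        return min(delay, self.max_backoff_seconds)
```

A session loop would call record_error() on failure, stop if tripped, and otherwise sleep for backoff_seconds() before the next attempt.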

Security Model

This harness follows Anthropic's secure deployment recommendations by relying on the SDK's built-in sandbox and permission system as the primary defense, rather than custom application-layer validation.

SDK-Native Sandbox

The Claude SDK provides process-level isolation with:

Process isolation — Bash commands run in a sandboxed subprocess
Network restrictions — Configurable domain allowlist and Unix socket access
Filesystem boundaries — Commands are restricted to the project directory

[security.sandbox]
enabled = true
auto_allow_bash_if_sandboxed = true
allow_unsandboxed_commands = false  # secure default

[security.sandbox.network]
allowed_domains = ["registry.npmjs.org", "github.com"]
allow_local_binding = false
allow_unix_sockets = []

Declarative Permission Rules

Security is enforced through SDK permission rules, not runtime command parsing:

[security.permissions]
allow = [
    "Bash(npm *)", "Bash(node *)", "Bash(git *)",
    "Bash(ls *)", "Bash(cat *)", "Bash(grep *)",
    "Read(./**)", "Write(./**)", "Edit(./**)",
]
deny = [
    "Bash(curl *)", "Bash(wget *)",
    "Read(./.env)", "Read(./.env.*)",
]

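Permission rules are evaluated by the SDK before tool execution, outside the model, so the agent cannot talk its way past them through prompt injection. Rules match the command as invoked, however: an allowed command that launches others (for example, an npm script that calls curl) is contained by the sandbox's network and filesystem restrictions, not by the deny list.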
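To make the allow/deny semantics concrete, here is a simplified model of rule matching using fnmatch globs with deny taking precedence over allow. It is not the SDK's matcher: the real pattern syntax and glob semantics are likely stricter (fnmatch's * also crosses /), and the "ask" fallback simply stands in for whatever the configured permission_mode does. The names decide and _matches are hypothetical.

```python
import fnmatch
import re

# "Tool(pattern)", e.g. "Bash(npm *)" or "Read(./.env)".
_RULE = re.compile(r"^(?P<tool>\w+)\((?P<pattern>.*)\)$")


def _matches(rule: str, tool: str, argument: str) -> bool:
    m = _RULE.match(rule)
    if m is None or m["tool"] != tool:
        return False
    return fnmatch.fnmatchcase(argument, m["pattern"])


def decide(tool: str, argument: str, allow: list[str], deny: list[str]) -> str:
    """Return "deny", "allow", or "ask" for a tool call (deny wins)."""
    if any(_matches(rule, tool, argument) for rule in deny):
        return "deny"
    if any(_matches(rule, tool, argument) for rule in allow):
        return "allow"
    return "ask"  # no rule matched: defer to the permission mode
```

The model also shows the limit of pattern rules: decide("Bash", "npm run fetch-data", allow, deny) returns "allow" with the rules above, whatever the script itself executes, which is why the sandbox remains the primary boundary.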

Secure Defaults

allow_unsandboxed_commands defaults to false
When sandbox is enabled, auto_allow_bash_if_sandboxed=true auto-allows Bash commands
When sandbox is disabled, explicit permissions.allow rules are required
Network access is denied by default
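A pre-flight check for drift from these defaults could be written against the parsed [security] table. This is a hypothetical helper (security_warnings is not part of the harness's verify command as documented); it assumes the table arrives as nested dicts, as tomllib produces, and applies the defaults listed above when keys are absent.

```python
def security_warnings(security: dict) -> list[str]:
    """Flag settings that weaken the secure defaults (hypothetical helper)."""
    sandbox = security.get("sandbox", {})
    network = sandbox.get("network", {})
    permissions = security.get("permissions", {})
    warnings = []
    if sandbox.get("allow_unsandboxed_commands", False):
        warnings.append("allow_unsandboxed_commands is true")
    # Without the sandbox, nothing auto-allows Bash, so explicit rules are needed.
    if not sandbox.get("enabled", True) and not permissions.get("allow"):
        warnings.append("sandbox disabled without explicit permissions.allow rules")
    if network.get("allow_local_binding", False):
        warnings.append("allow_local_binding is true")
    return warnings
```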

Git Protection Recommendations

For production deployments, protect critical branches using server-side git hooks or branch protection rules on your git hosting platform (GitHub, GitLab, Bitbucket), not client-side validation. This prevents destructive operations like git push --force at the source.

Configuration

Configuration lives in .agent-harness/config.toml. See the example config for a complete reference.

Directory Layout

project_dir/
├── .agent-harness/
│   ├── logs/                  # Session logs (auto-created, gitignored)
│   ├── prompts/               # Prompt files (referenced by config)
│   │   ├── app_spec.txt
│   │   ├── coding.md
│   │   └── initializer.md
│   ├── .claude_settings.json  # Generated security settings (auto-created, gitignored)
│   ├── config.toml            # Main configuration (required)
│   └── session.json           # Session number, completed phases (auto-created)
└── (generated code lives here)

Full Configuration Reference

Complete annotated example showing all available configuration options:

# --- Agent Settings ---
# Model to use for agent execution
model = "claude-sonnet-4-5-20250929"

# System prompt (can use "file:prompts/system.md" to load from file)
system_prompt = "You are an expert full-stack developer..."

# --- Session Settings ---
# Maximum API turns per session before auto-continuing
max_turns = 1000

# Maximum total sessions before stopping (default: unlimited)
max_iterations = 10

# Delay in seconds before auto-continuing to next session
auto_continue_delay = 3

# --- Post-Run Instructions ---
# Instructions shown in the final summary after the agent completes.
# Top-level keys like this must come before the first [table] header;
# otherwise TOML assigns them to the preceding table.
post_run_instructions = [
    "npm install",
    "npm run dev",
    "Open http://localhost:3000",
]

# --- Tools ---
[tools]
# Built-in Claude SDK tools to enable
builtin = ["Read", "Write", "Edit", "Glob", "Grep", "Bash"]

# MCP servers to connect
[tools.mcp_servers.puppeteer]
command = "npx"
args = ["puppeteer-mcp-server"]
env = { NODE_ENV = "production" }

# --- Security ---
[security]
# Permission mode: "default", "acceptEdits", "bypassPermissions", "plan"
permission_mode = "acceptEdits"

# OS-level sandbox configuration
[security.sandbox]
enabled = true
auto_allow_bash_if_sandboxed = true
allow_unsandboxed_commands = false

# Network restrictions for sandboxed commands
[security.sandbox.network]
allowed_domains = ["registry.npmjs.org", "github.com"]
allow_local_binding = false
allow_unix_sockets = []

# Declarative permission rules (evaluated by SDK before tool execution)
[security.permissions]
allow = [
    "Bash(npm *)", "Bash(node *)", "Bash(git *)",
    "Bash(ls *)", "Bash(cat *)", "Bash(grep *)",
]
deny = [
    "Bash(curl *)", "Bash(wget *)",
    "Read(./.env)", "Read(./.env.*)",
]

# --- Progress Tracking ---
[tracking]
# Tracker type: "json_checklist", "notes_file", "none"
type = "json_checklist"

# Tracking file path (relative to harness_dir)
file = "feature_list.json"

# Field name indicating completion (for json_checklist)
passing_field = "passes"

# --- Error Recovery ---
[error_recovery]
# Circuit breaker: max consecutive session errors before stopping
max_consecutive_errors = 5

# Initial backoff delay after first error
initial_backoff_seconds = 5.0

# Maximum backoff delay (capped exponential backoff)
max_backoff_seconds = 120.0

# Multiplier for exponential backoff
backoff_multiplier = 2.0

# --- Phases ---
# Multi-phase workflow definitions
[[phases]]
name = "initializer"
prompt = "file:prompts/initializer.md"
run_once = true
condition = "not_exists:.agent-harness/feature_list.json"

[[phases]]
name = "coding"
prompt = "file:prompts/coding.md"

# --- Init Files ---
# Files to copy on first run
[[init_files]]
source = "prompts/app_spec.txt"
dest = "app_spec.txt"

Configuration Fields

Top-Level Settings

Field	Type	Default	Description
`model`	string	`"claude-sonnet-4-5-20250929"`	Claude model to use for agent execution
`system_prompt`	string	`"You are a helpful coding assistant."`	System prompt (supports `file:` references)
`max_turns`	int	`1000`	Maximum API turns per session before auto-continuing
`max_iterations`	int?	`null`	Maximum total sessions before stopping (unlimited if not set)
`auto_continue_delay`	int	`3`	Delay in seconds before auto-continuing to next session
`post_run_instructions`	string[]	`[]`	Commands to display in final summary banner

`[tools]` Section

Field	Type	Default	Description
`builtin`	string[]	`["Read", "Write", "Edit", "Glob", "Grep", "Bash"]`	Built-in Claude SDK tools to enable

MCP Servers ([tools.mcp_servers.<name>]):

Field	Type	Default	Description
`command`	string	`""`	Command to execute MCP server
`args`	string[]	`[]`	Command-line arguments
`env`	map	`{}`	Environment variables (supports `${VAR}` expansion)
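${VAR} expansion in MCP server env values can be done with a small regular expression. This sketch assumes only the braced ${VAR} form is expanded and chooses to raise KeyError for unset variables so a missing secret fails fast; the real harness may instead substitute an empty string. expand_env is a hypothetical name.

```python
import os
import re
from collections.abc import Mapping

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(env: Mapping[str, str],
               environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Expand ${VAR} references; an unset VAR raises KeyError (assumed policy)."""
    def substitute(match: re.Match) -> str:
        return environ[match.group(1)]

    return {key: _VAR.sub(substitute, value) for key, value in env.items()}
```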

`[security]` Section

Field	Type	Default	Description
`permission_mode`	string	`"acceptEdits"`	Permission mode: `"default"`, `"acceptEdits"`, `"bypassPermissions"`, or `"plan"`

Sandbox ([security.sandbox]):

Field	Type	Default	Description
`enabled`	bool	`true`	Enable OS-level sandbox for Bash commands
`auto_allow_bash_if_sandboxed`	bool	`true`	Auto-allow all Bash commands when sandbox is enabled
`allow_unsandboxed_commands`	bool	`false`	Allow commands outside sandbox (secure default: false)
`excluded_commands`	string[]	`[]`	Commands excluded from sandboxing

Sandbox Network ([security.sandbox.network]):

Field	Type	Default	Description
`allowed_domains`	string[]	`[]`	Domains the agent can access via network
`allow_local_binding`	bool	`false`	Allow binding to localhost addresses
`allow_unix_sockets`	string[]	`[]`	Unix socket paths the agent can access

Permission Rules ([security.permissions]):

Field	Type	Default	Description
`allow`	string[]	`[]`	Tool patterns to allow (e.g., `"Bash(npm *)"`, `"Read(./**)"`)
`deny`	string[]	`[]`	Tool patterns to deny (e.g., `"Bash(curl *)"`, `"Read(./.env)"`)

`[tracking]` Section

Field	Type	Default	Description
`type`	string	`"none"`	Tracker type: `"json_checklist"`, `"notes_file"`, or `"none"`
`file`	string	`""`	Tracking file path (relative to harness_dir, required for json_checklist/notes_file)
`passing_field`	string	`"passes"`	JSON field indicating completion (for json_checklist)
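For json_checklist, completion detection reduces to checking the passing_field of every entry. This sketch assumes the tracking file is a JSON array of objects such as [{"description": "...", "passes": false}]; the description key and the helper names is_complete and progress are illustrative. It treats a missing or empty checklist as incomplete, so the harness keeps running until the initializer has produced one.

```python
import json
from pathlib import Path


def is_complete(tracking_file: Path, passing_field: str = "passes") -> bool:
    """True when every checklist entry has passing_field set to true."""
    if not tracking_file.exists():
        return False  # the initializer has not created the checklist yet
    items = json.loads(tracking_file.read_text())
    return bool(items) and all(item.get(passing_field) is True for item in items)


def progress(tracking_file: Path, passing_field: str = "passes") -> tuple[int, int]:
    """Return (passing, total) for a progress line."""
    items = json.loads(tracking_file.read_text())
    done = sum(1 for item in items if item.get(passing_field) is True)
    return done, len(items)
```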

`[error_recovery]` Section

Field	Type	Default	Description
`max_consecutive_errors`	int	`5`	Circuit breaker: max consecutive session errors before stopping
`initial_backoff_seconds`	float	`5.0`	Initial backoff delay after first error
`max_backoff_seconds`	float	`120.0`	Maximum backoff delay (capped exponential backoff)
`backoff_multiplier`	float	`2.0`	Multiplier for exponential backoff

`[[phases]]` Section

Multiple phases can be defined using [[phases]] array syntax:

Field	Type	Default	Description
`name`	string	required	Phase name (for logging/debugging)
`prompt`	string	required	Phase prompt (supports `file:` references)
`run_once`	bool	`false`	Only execute this phase once across all sessions
`condition`	string	`""`	Condition for running phase (e.g., `"not_exists:file.txt"`)

`[[init_files]]` Section

Multiple init files can be defined using [[init_files]] array syntax:

Field	Type	Default	Description
`source`	string	required	Source file path (relative to harness_dir)
`dest`	string	required	Destination file path (relative to harness_dir)

Config Loading Precedence

CLI flags > config.toml values > defaults
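The precedence rule is a three-layer merge. A minimal sketch, assuming CLI values of None mean "flag not passed" (as with the run-command overrides); merge_settings is a hypothetical name.

```python
from typing import Any


def merge_settings(defaults: dict[str, Any], config: dict[str, Any],
                   cli: dict[str, Any]) -> dict[str, Any]:
    """CLI flags override config.toml, which overrides built-in defaults."""
    merged = dict(defaults)
    merged.update(config)
    # Only flags that were actually passed take effect.
    merged.update({key: value for key, value in cli.items() if value is not None})
    return merged
```

For instance, with max_iterations = 10 in config.toml, running with --max-iterations 3 yields 3, while omitting the flag keeps 10.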

Project Structure

claude-agent-harness/
├── agent_harness/          # Python package
│   ├── __init__.py
│   ├── __main__.py         # Entry point
│   ├── cli.py              # Argument parsing, subcommands
│   ├── client_factory.py   # Builds ClaudeSDKClient from config
│   ├── config.py           # Config loading, validation, HarnessConfig
│   ├── runner.py           # Generic agent loop
│   ├── tracking.py         # Progress tracking implementations
│   └── verify.py           # Setup verification checks
├── examples/
│   └── claude-ai-clone/
│       ├── .agent-harness/
│       │   ├── prompts/
│       │   │   ├── app_spec.txt
│       │   │   ├── coding.md
│       │   │   └── initializer.md
│       │   └── config.toml
│       └── README.md
├── tests/
│   ├── test_cli.py
│   ├── test_client_factory.py
│   ├── test_config.py
│   ├── test_prompts.py
│   ├── test_runner.py
│   ├── test_tracking.py
│   └── test_verify.py
├── .env.example
└── pyproject.toml

Examples

Claude.ai Clone (Next.js)

See examples/claude-ai-clone/ for a complete example that:

Uses Next.js/React stack (npm, node commands)
Integrates Puppeteer MCP server for browser testing
Generates a production-quality chat interface
Tracks progress via feature_list.json

# Run the Claude.ai clone example
python -m agent_harness run \
    --project-dir ./my-clone-output \
    --harness-dir examples/claude-ai-clone/.agent-harness

Troubleshooting

"Configuration file not found"

The harness expects a .agent-harness/config.toml file in your project directory. If you see this error:

Check that you're running from the correct directory
Use python -m agent_harness init --project-dir ./my-project to scaffold a new configuration
If using --harness-dir, verify the path points to a directory containing config.toml

"Prompt file not found"

Check that all file: references in your config.toml point to files relative to the .agent-harness/ directory. For example:

[[phases]]
prompt = "file:prompts/coding_prompt.md"  # Must exist at .agent-harness/prompts/coding_prompt.md
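The file: convention can be resolved as below: values with the prefix are read relative to the harness directory, anything else is used as inline text. This is a sketch of the documented behavior, not the harness's own loader; resolve_prompt is a hypothetical name and requires Python 3.9+ for str.removeprefix.

```python
from pathlib import Path


def resolve_prompt(value: str, harness_dir: Path) -> str:
    """Return prompt text; "file:<path>" is read relative to harness_dir."""
    if not value.startswith("file:"):
        return value  # inline prompt text
    path = harness_dir / value.removeprefix("file:")
    if not path.is_file():
        raise FileNotFoundError(f"Prompt file not found: {path}")
    return path.read_text(encoding="utf-8")
```

Printing the resolved path in the error, as above, makes it obvious when a reference was written relative to the project directory instead of .agent-harness/.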

"Neither ANTHROPIC_API_KEY nor CLAUDE_CODE_OAUTH_TOKEN is set"

You need authentication credentials to use the Claude API:

API Key: Get one from console.anthropic.com and set export ANTHROPIC_API_KEY="your-key"
OAuth Token: Run claude setup-token and the harness will use CLAUDE_CODE_OAUTH_TOKEN automatically
See .env.example for setting these via environment file
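The check behind this error message can be sketched as follows. Treating an empty value as unset and preferring ANTHROPIC_API_KEY when both are present are assumptions about the harness's behavior; find_auth is a hypothetical name.

```python
import os
from collections.abc import Mapping

AUTH_VARS = ("ANTHROPIC_API_KEY", "CLAUDE_CODE_OAUTH_TOKEN")


def find_auth(environ: Mapping[str, str] = os.environ) -> str:
    """Return the name of the first auth variable that is set and non-empty."""
    for name in AUTH_VARS:
        if environ.get(name):
            return name
    raise RuntimeError("Neither ANTHROPIC_API_KEY nor CLAUDE_CODE_OAUTH_TOKEN is set")
```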

Agent is hanging on the first session

The first session (initializer phase) can take 10-20+ minutes for complex projects because it:

Reads the entire spec
Plans the feature breakdown
Creates initial project structure
Sets up git repository
May run initial installs (npm, pip, etc.)

This is expected behavior. Subsequent sessions (coding phase) are typically faster as they focus on individual features.

If a session truly hangs:

Check .agent-harness/session.json and the session logs in .agent-harness/logs/ for error messages
Look for permission prompts or security blocks in the output
Verify your progress file format matches the configuration (e.g., feature_list.json with "passes": false fields)

Design Principles

Zero assumptions about tech stack: The harness has no hardcoded knowledge of npm, Ruby, Python, or any other stack. Projects declare exactly what they need.
Zero wasted context: Only tools and servers declared in the config are available to the agent. No unused MCP servers polluting the context.
Defense in depth: Multiple security layers (sandbox, permission rules, secure defaults) protect against unintended actions.
Session persistence: Progress is saved between sessions via the progress file and git commits, enabling long-running tasks that span hours or days.
Fresh context per session: Each session starts with a clean context window, preventing context pollution and letting the total work span far more than a single context window can hold.

Running Tests

# Run all tests
python -m unittest discover tests -v

# Run specific test modules
python -m unittest tests.test_config -v         # Configuration loading
python -m unittest tests.test_tracking -v       # Progress tracking
python -m unittest tests.test_runner -v         # Session loop logic
python -m unittest tests.test_client_factory -v # Client creation

Test coverage includes:

Security configuration: Sandbox settings, permission rules, network isolation
Configuration loading: TOML parsing, defaults, validation, error cases
Progress tracking: Completion detection, JSON parsing, print formatting
Prompt loading: File reading, file: resolution, error handling

License

MIT License. See LICENSE.

This harness is based on the patterns described in Anthropic's guide for long-running agent harnesses. The project originated from the autonomous-coding example in the claude-quickstarts repository and is built using the Claude Agent SDK.

cpplain/claude-agent-harness
