clawforge

Reusable experiment harness for autonomous Codex / AgenTeam agent pipelines.

A drop-in directory you add to any project to run Codex CLI (solo) or the AgenTeam plugin (6-role pipeline, non-interactive) against a seed prompt, with built-in:

Background git auto-committer (commits land as Codex (agent))
Telegram status push + ad-hoc status queries
Steering inbox (post mid-run ideas via chat, applied at natural break points)
AgenTeam bug-filing helper (auto-files to yimwoo/codex-agenteam)
Post-run metric rollup + human-readable markdown report

Status

Used in production by yimwoo/agento — an experiment where AI agents build an agent orchestration platform.

Directory layout

clawforge/
├── harness/
│   ├── _lib.sh                 # Shared helpers (HARNESS_DIR, remote URL)
│   ├── run_experiment.sh       # Main entry point
│   ├── solo_baseline.sh        # Solo-Codex runner
│   ├── agenteam_run.sh         # AgenTeam wrapper
│   ├── agenteam_runner.py      # AgenTeam non-interactive driver
│   ├── autocommit.sh           # Background snapshot committer
│   ├── status.sh               # One-shot status printer
│   ├── steer.sh                # Append a directive to the run's inbox
│   ├── read_steering.sh        # Read / archive the inbox
│   ├── apply_steering.sh       # Force-apply inbox via `codex exec resume`
│   ├── file_agenteam_bug.sh    # Post an issue to codex-agenteam
│   └── telegram_notifier.py    # Push events to Telegram
├── metrics/
│   ├── collector.py            # Trace event rollup
│   ├── evaluator.py            # Lint / test / security scan runner
│   ├── comparator.py           # Cross-run comparison tables
│   └── generate_report.py      # Markdown + JSON report (auto-run on exit)
└── playbook.md                 # Tool-awareness prelude prepended to every seed

Using clawforge in a project

1. Add clawforge as a submodule in your project:

cd your-project
git submodule add https://github.com/yimwoo/clawforge.git clawforge

2. Create experiment/ at your project root for project-specific content and runtime state:

your-project/
├── clawforge/                  # ← submodule (this repo)
├── experiment/                 # ← project-local
│   ├── config.yaml             # project metadata + knobs
│   ├── seed_prompts/
│   │   ├── minimal.md
│   │   └── detailed.md         # ← the product spec
│   ├── checkpoints/            # runtime state (gitignored)
│   ├── traces/                 # runtime JSONL logs (gitignored)
│   ├── reports/                # per-run markdown + metrics.json (tracked)
│   ├── steering/               # steering inbox + consumed/ archive
│   └── learnings/              # accumulated findings across runs
└── src/                        # ← what codex builds

Minimal experiment/config.yaml:

experiment:
  name: "my-experiment"

project:
  name: "MyProject"
  seed_prompt: "experiment/seed_prompts/detailed.md"
  src_dir: "src"

git:
  remote_url: ""   # blank → reads from `git remote get-url origin`

model:
  id: "gpt-5.4"

harness:
  type: "solo"      # or "agenteam"
  max_time_hours: 8
  max_cost_usd: 300
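
A quick sanity check of the parsed config can catch typos before a long run starts. The sketch below is illustrative only and is not part of clawforge: the function name check_config is hypothetical, and treating every field of the minimal example as required is an assumption. It takes the dictionary you would get from yaml.safe_load (pyyaml is already a prerequisite).

```python
VALID_HARNESS_TYPES = {"solo", "agenteam"}  # the two values shown in the example config


def check_config(cfg: dict) -> list[str]:
    """Return a list of problems found in a parsed experiment/config.yaml.

    Assumption: every field used in the minimal example is required.
    The real harness may be more lenient.
    """
    problems = []
    project = cfg.get("project") or {}
    harness = cfg.get("harness") or {}

    if not project.get("seed_prompt"):
        problems.append("project.seed_prompt is missing")
    if harness.get("type") not in VALID_HARNESS_TYPES:
        problems.append(
            f"harness.type must be one of {sorted(VALID_HARNESS_TYPES)}"
        )
    for key in ("max_time_hours", "max_cost_usd"):
        value = harness.get(key)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or value <= 0:
            problems.append(f"harness.{key} must be a positive number")
    return problems


if __name__ == "__main__":
    example = {
        "project": {"seed_prompt": "experiment/seed_prompts/detailed.md"},
        "harness": {"type": "solo", "max_time_hours": 8, "max_cost_usd": 300},
    }
    print(check_config(example) or "config looks fine")
```

An empty list means the config passed every check; otherwise each entry names the offending key.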

3. Provide env vars (in a .env at project root, or the container env):

GH_TOKEN                     # for pushing to GitHub
TELEGRAM_DEVTEAM_BOT_TOKEN   # optional, for Telegram push
TELEGRAM_DEVTEAM_CHAT_ID     # optional

4. Run:

./clawforge/harness/run_experiment.sh --harness solo     --run-id run_001
./clawforge/harness/run_experiment.sh --harness agenteam --run-id run_002

That's it. No source edits to clawforge.

Self-locating scripts

All inter-script references resolve through HARNESS_DIR and HARNESS_ROOT, which _lib.sh computes from its own location via $(dirname "${BASH_SOURCE[0]}"). The harness therefore works whether you mount it at clawforge/, harness/, vendor/clawforge/, or somewhere else. Runtime artifacts (experiment/traces/, experiment/checkpoints/, etc.) are always read and written relative to $PROJECT_ROOT, which is computed as the parent of the harness install directory.
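
The path derivation can be sketched in Python as follows. This is an illustration, not the harness code: the real logic lives in _lib.sh in Bash, the function name locate is hypothetical, and the mapping of HARNESS_DIR to the harness/ folder and HARNESS_ROOT to the install directory is an assumption based on the directory layout above.

```python
from pathlib import Path


def locate(script_path: str) -> dict[str, Path]:
    """Derive harness and project paths from the location of one harness script.

    Assumed layout: <project>/<mount>/harness/<script>
    """
    script = Path(script_path)
    harness_dir = script.parent          # .../<mount>/harness
    harness_root = harness_dir.parent    # .../<mount>, the install directory
    project_root = harness_root.parent   # parent of the install directory
    return {
        "HARNESS_DIR": harness_dir,
        "HARNESS_ROOT": harness_root,
        "PROJECT_ROOT": project_root,
    }


if __name__ == "__main__":
    # The mount name does not matter; only the relative position does.
    for mount in ("clawforge", "tools"):
        paths = locate(f"/work/my-project/{mount}/harness/run_experiment.sh")
        print(mount, "->", paths["PROJECT_ROOT"])
```

Because every path is derived from the script's own location, renaming the mount directory changes none of the results.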

Remote URL resolution

Commit + push go to a remote URL resolved in this order:

1. GH_REMOTE_URL environment variable (explicit override)
2. git.remote_url in experiment/config.yaml
3. git remote get-url origin

Option 3 is the default and works for any cloned project.
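
The precedence above can be expressed as a small function. This is a sketch under stated assumptions rather than the harness's own code: the names resolve_remote_url and git_origin_url are hypothetical, and the config is assumed to be the already-parsed experiment/config.yaml.

```python
import os
import subprocess
from typing import Callable, Mapping, Optional


def git_origin_url() -> Optional[str]:
    """Return the URL of the `origin` remote, or None if it cannot be read."""
    try:
        out = subprocess.run(
            ["git", "remote", "get-url", "origin"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.stdout.strip() or None


def resolve_remote_url(
    config: Mapping,
    env: Mapping[str, str] = os.environ,
    origin_lookup: Callable[[], Optional[str]] = git_origin_url,
) -> Optional[str]:
    """Apply the precedence: environment override, then config, then origin."""
    override = env.get("GH_REMOTE_URL", "").strip()
    if override:
        return override
    configured = str((config.get("git") or {}).get("remote_url") or "").strip()
    if configured:
        return configured  # a blank remote_url falls through to origin
    return origin_lookup()


if __name__ == "__main__":
    print(resolve_remote_url({"git": {"remote_url": ""}}))
```

Passing env and origin_lookup as parameters keeps the function easy to exercise without touching a real repository.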

Autonomous daily cycle (self-improving loop)

Clawforge includes a daily cycle mode that generates a context-aware seed prompt each day and runs a full pipeline autonomously:

./clawforge/harness/daily_cycle.sh           # auto: run_YYYYMMDD
./clawforge/harness/daily_cycle.sh --dry-run # inspect the seed, don't run
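
The automatic run id follows the run_YYYYMMDD pattern. A minimal sketch of that naming, assuming the id is derived from the local calendar date (the function name default_run_id is hypothetical, and the script may choose the date differently):

```python
from datetime import date
from typing import Optional


def default_run_id(today: Optional[date] = None) -> str:
    """Build a run id of the form run_YYYYMMDD.

    Assumption: the local date is used when no date is supplied.
    """
    today = today or date.today()
    return f"run_{today:%Y%m%d}"


if __name__ == "__main__":
    print(default_run_id())
```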

The dynamic seed generator (harness/dynamic_seed.py) reads:

docs/ROADMAP.md — what's planned
experiment/reports/*/report.md — what was done last cycle
Open GitHub issues (via GH_TOKEN) — bugs and feature requests
experiment/steering/ inbox — human ideas posted via Telegram
src/ file stats + recent git log — current codebase state
experiment/seed_prompts/detailed.md — the product spec (drift anchor)

Each role receives the full context and its own focus: the researcher audits, the PM updates the roadmap, the architect designs, the dev implements and fixes bugs, QA tests and files issues, and the reviewer checks quality. Bugs filed by QA land as GitHub issues, which the next cycle picks up automatically, so the loop is self-healing.

Scheduling via OpenClaw cron

{
  "agentId": "devteam",
  "name": "MyProject Daily Cycle",
  "schedule": { "kind": "cron", "expr": "0 9 * * *", "tz": "America/Los_Angeles" },
  "payload": {
    "kind": "agentTurn",
    "message": "Run today's daily cycle for my-project.",
    "timeoutSeconds": 120
  }
}

The devteam agent reads the message and invokes daily_cycle.sh via the project's Telegram skill; the pipeline then runs in the background.

Multi-project usage

Each project that uses clawforge is fully isolated. Run IDs, traces, reports, git remotes, and GitHub issues are all scoped to $PROJECT_ROOT (the parent of wherever the clawforge submodule is mounted). Two projects can share the same container and even run the same run_YYYYMMDD id without conflict.

Setting up a second project

# 1. Mount into the container (docker-compose.yml bind mount)
# 2. Add clawforge submodule
cd project-b
git submodule add https://github.com/yimwoo/clawforge.git clawforge

# 3. Create experiment dir + seed prompt
mkdir -p experiment/{seed_prompts,checkpoints,traces,reports,steering,learnings}
$EDITOR experiment/seed_prompts/detailed.md
$EDITOR experiment/config.yaml  # project.name, seed_prompt path

# 4. Add a Telegram skill (copy + adapt the project-a skill, change paths)
# 5. Add a cron job in ~/.openclaw/cron/jobs.json for the daily cycle
# 6. Run:
./clawforge/harness/daily_cycle.sh

What's shared vs isolated

| Resource | Shared? | Notes |
|---|---|---|
| Codex sessions | Isolated | Per-run `session.id` capture |
| Telegram chat | Shared chat, tagged | `project=<name>` in every message |
| GitHub issues | Isolated | Each project has its own origin |
| Traces / reports | Isolated | Under each project's `experiment/` |
| API spend | Shared | Stagger cron times to avoid rate limits |

Prerequisites

Codex CLI (@openai/codex, tested with 0.120.0)
Python 3.11+ with pyyaml and toml
git, curl, rsync, bash
Optionally: AgenTeam plugin (github.com/yimwoo/codex-agenteam) for the --harness agenteam runner

All of these are available in the OpenClaw container image once a few extra apt packages are installed (see agento's Dockerfile for reference).

License

MIT.
