Reusable experiment harness for autonomous Codex / AgenTeam agent pipelines.
A drop-in directory you add to any project to run Codex CLI (solo) or the AgenTeam plugin (6-role pipeline, non-interactive) against a seed prompt, with built-in:
- Background git auto-committer (commits land as Codex (agent))
- Telegram status push + ad-hoc status queries
- Steering inbox (post mid-run ideas via chat, applied at natural break points)
- AgenTeam bug-filing helper (auto-files to yimwoo/codex-agenteam)
- Post-run metric rollup + human-readable markdown report
Used in production by yimwoo/agento —
an experiment where AI agents build an agent orchestration platform.
clawforge/
├── harness/
│ ├── _lib.sh # Shared helpers (HARNESS_DIR, remote URL)
│ ├── run_experiment.sh # Main entry point
│ ├── solo_baseline.sh # Solo-Codex runner
│ ├── agenteam_run.sh # AgenTeam wrapper
│ ├── agenteam_runner.py # AgenTeam non-interactive driver
│ ├── autocommit.sh # Background snapshot committer
│ ├── status.sh # One-shot status printer
│ ├── steer.sh # Append a directive to the run's inbox
│ ├── read_steering.sh # Read / archive the inbox
│ ├── apply_steering.sh # Force-apply inbox via `codex exec resume`
│ ├── file_agenteam_bug.sh # Post an issue to codex-agenteam
│ └── telegram_notifier.py # Push events to Telegram
├── metrics/
│ ├── collector.py # Trace event rollup
│ ├── evaluator.py # Lint / test / security scan runner
│ ├── comparator.py # Cross-run comparison tables
│ └── generate_report.py # Markdown + JSON report (auto-run on exit)
└── playbook.md # Tool-awareness prelude prepended to every seed
1. Add clawforge as a submodule in your project:
cd your-project
git submodule add https://github.com/yimwoo/clawforge.git clawforge
2. Create experiment/ at your project root for project-specific content
and runtime state:
your-project/
├── clawforge/ # ← submodule (this repo)
├── experiment/ # ← project-local
│ ├── config.yaml # project metadata + knobs
│ ├── seed_prompts/
│ │ ├── minimal.md
│ │ └── detailed.md # ← the product spec
│ ├── checkpoints/ # runtime state (gitignored)
│ ├── traces/ # runtime JSONL logs (gitignored)
│ ├── reports/ # per-run markdown + metrics.json (tracked)
│ ├── steering/ # steering inbox + consumed/ archive
│ └── learnings/ # accumulated findings across runs
└── src/ # ← what codex builds
Minimal experiment/config.yaml:
experiment:
  name: "my-experiment"
project:
  name: "MyProject"
  seed_prompt: "experiment/seed_prompts/detailed.md"
  src_dir: "src"
git:
  remote_url: "" # blank → reads from `git remote get-url origin`
model:
  id: "gpt-5.4"
harness:
  type: "solo" # or "agenteam"
  max_time_hours: 8
  max_cost_usd: 300
3. Provide env vars (in a .env at project root, or the container env):
GH_TOKEN # for pushing to GitHub
TELEGRAM_DEVTEAM_BOT_TOKEN # optional, for Telegram push
TELEGRAM_DEVTEAM_CHAT_ID # optional
4. Run:
./clawforge/harness/run_experiment.sh --harness solo --run-id run_001
./clawforge/harness/run_experiment.sh --harness agenteam --run-id run_002
That's it. No source edits to clawforge.
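Before launching a run, it can help to confirm that the variables from step 3 are present. The following is a minimal sketch, not part of clawforge: the function name `check_env` is hypothetical, and the rule that the two Telegram variables must be set together is an assumption based on both being needed to push a message.

```python
import os

REQUIRED = ("GH_TOKEN",)
OPTIONAL_PAIR = ("TELEGRAM_DEVTEAM_BOT_TOKEN", "TELEGRAM_DEVTEAM_CHAT_ID")


def check_env(env):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [
        f"missing required variable: {name}"
        for name in REQUIRED
        if not env.get(name)
    ]
    present = [name for name in OPTIONAL_PAIR if env.get(name)]
    # Assumption: Telegram push needs both values, so one without the other is flagged.
    if len(present) == 1:
        missing = next(name for name in OPTIONAL_PAIR if name not in present)
        problems.append(f"{present[0]} is set but {missing} is not")
    return problems


if __name__ == "__main__":
    for problem in check_env(os.environ):
        print(problem)
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise against a parsed .env file.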
All inter-script references resolve through HARNESS_DIR and HARNESS_ROOT
computed by _lib.sh via $(dirname BASH_SOURCE). The harness works
regardless of whether you mount it at clawforge/, harness/,
vendor/clawforge/, or somewhere else. Runtime artifacts
(experiment/traces/, experiment/checkpoints/, etc.) are always
read/written relative to $PROJECT_ROOT, which is computed as the parent
of the harness install directory.
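The same location-relative resolution can be mirrored in Python, for example by a helper script living next to the harness. This is a sketch under assumptions: it treats HARNESS_DIR as the directory holding the scripts and HARNESS_ROOT as its parent (the install directory), which the text above implies but does not spell out. `resolve_roots` is a hypothetical name, not a clawforge function.

```python
from pathlib import Path


def resolve_roots(script_path):
    """Derive harness and project directories from a script's own location."""
    # Assumed layout: <project>/<install dir>/harness/<script>
    harness_dir = Path(script_path).resolve().parent  # directory holding the scripts
    harness_root = harness_dir.parent                  # the harness install directory
    project_root = harness_root.parent                 # parent of the install directory
    return harness_dir, harness_root, project_root
```

A script would typically call `resolve_roots(__file__)`, so nothing depends on the current working directory or on the name of the mount point.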
Commit + push go to a remote URL resolved in this order:
1. GH_REMOTE_URL environment variable (explicit override)
2. git.remote_url in experiment/config.yaml
3. git remote get-url origin
Option 3 is the default and works for any cloned project.
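The precedence above can be sketched as a small function. This illustrates the documented order only and is not the code in _lib.sh. `resolve_remote_url` and `_git_origin` are hypothetical names, and `config` is assumed to be the parsed experiment/config.yaml as a dict with a top-level `git` key.

```python
import subprocess


def resolve_remote_url(env, config, git_origin=None):
    """Pick the push remote: env override, then config, then git origin."""
    override = env.get("GH_REMOTE_URL", "").strip()
    if override:
        return override
    configured = ((config.get("git") or {}).get("remote_url") or "").strip()
    if configured:
        return configured
    # Fall back to the repository's own origin (option 3, the default).
    if git_origin is None:
        git_origin = _git_origin
    return git_origin()


def _git_origin():
    # Raises CalledProcessError when the repository has no origin remote.
    result = subprocess.run(
        ["git", "remote", "get-url", "origin"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

The `git_origin` parameter exists only so the fallback can be replaced in tests; a blank `remote_url` in the config falls through to it.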
Clawforge includes a daily cycle mode that generates a context-aware seed prompt each day and runs a full pipeline autonomously:
./clawforge/harness/daily_cycle.sh # auto: run_YYYYMMDD
./clawforge/harness/daily_cycle.sh --dry-run # inspect the seed, don't run
The dynamic seed generator (harness/dynamic_seed.py) reads:
- docs/ROADMAP.md: what's planned
- experiment/reports/*/report.md: what was done last cycle
- Open GitHub issues (via GH_TOKEN): bugs and feature requests
- experiment/steering/ inbox: human ideas posted via Telegram
- src/ file stats + recent git log: current codebase state
- experiment/seed_prompts/detailed.md: the product spec (drift anchor)
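How dynamic_seed.py combines these inputs is not documented here. As a rough sketch only, assuming a simple labelled concatenation with the playbook.md prelude placed first (as the directory listing describes), the assembly might look like this. `compose_seed` and the `##` section headings are hypothetical.

```python
def compose_seed(playbook, sources):
    """Join the playbook prelude and each non-empty source into one prompt."""
    parts = [playbook.strip()]
    for label, text in sources:
        text = text.strip()
        if not text:
            continue  # skip sources that are missing or empty this cycle
        parts.append(f"## {label}\n{text}")
    return "\n\n".join(parts) + "\n"
```

Called with pairs such as `("Roadmap", roadmap_text)` and `("Product spec", spec_text)`, it returns a single string that a dry run could print for inspection.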
Each role receives the full context and knows what to focus on: researcher audits, PM updates the roadmap, architect designs, dev implements + fixes bugs, QA tests and files issues, reviewer checks quality. QA-filed bugs land as GitHub issues, and the next cycle picks them up automatically, which makes the loop self-healing.
{
"agentId": "devteam",
"name": "MyProject Daily Cycle",
"schedule": { "kind": "cron", "expr": "0 9 * * *", "tz": "America/Los_Angeles" },
"payload": {
"kind": "agentTurn",
"message": "Run today's daily cycle for my-project.",
"timeoutSeconds": 120
}
}
The devteam agent reads the message, invokes daily_cycle.sh via
the project's Telegram skill, and the pipeline runs in the background.
Each project that uses clawforge is fully isolated. Run IDs, traces,
reports, git remotes, and GitHub issues are all scoped to
$PROJECT_ROOT (the parent of wherever the clawforge submodule is
mounted). Two projects can share the same container and even run the
same run_YYYYMMDD id without conflict.
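The reason identical run IDs do not collide is that every path is rooted at a different $PROJECT_ROOT. A minimal sketch of that idea follows; the helper names are hypothetical, and the per-run subdirectory layout under traces/ and checkpoints/ is an assumption (only experiment/reports/*/report.md is stated in this document).

```python
from datetime import date
from pathlib import Path


def daily_run_id(day):
    """Format the run_YYYYMMDD id used by the daily cycle."""
    return f"run_{day:%Y%m%d}"


def run_paths(project_root, run_id):
    """Map a run id to directories under one project's experiment/ tree."""
    experiment = Path(project_root) / "experiment"
    return {
        "traces": experiment / "traces" / run_id,            # assumed per-run layout
        "reports": experiment / "reports" / run_id,          # holds report.md
        "checkpoints": experiment / "checkpoints" / run_id,  # assumed per-run layout
    }
```

Two projects calling `run_paths` with the same run id get disjoint directories because their roots differ.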
# 1. Mount into the container (docker-compose.yml bind mount)
# 2. Add clawforge submodule
cd project-b
git submodule add https://github.com/yimwoo/clawforge.git clawforge
# 3. Create experiment dir + seed prompt
mkdir -p experiment/{seed_prompts,checkpoints,traces,reports,steering,learnings}
$EDITOR experiment/seed_prompts/detailed.md
$EDITOR experiment/config.yaml # project.name, seed_prompt path
# 4. Add a Telegram skill (copy + adapt the project-a skill, change paths)
# 5. Add a cron job in ~/.openclaw/cron/jobs.json for the daily cycle
# 6. Run:
./clawforge/harness/daily_cycle.sh

| Resource | Shared? | Notes |
|---|---|---|
| Codex sessions | Isolated | Per-run session.id capture |
| Telegram chat | Shared chat, tagged | project=<name> in every message |
| GitHub issues | Isolated | Each project has its own origin |
| Traces / reports | Isolated | Under each project's experiment/ |
| API spend | Shared | Stagger cron times to avoid rate limits |
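For the shared API spend row, staggering can be as simple as offsetting each project's daily cron expression. The sketch below generates five-field cron expressions; the 09:00 start matches the example job in this document, while the 20-minute gap and the function name `staggered_cron` are arbitrary assumptions.

```python
def staggered_cron(projects, start_hour=9, gap_minutes=20):
    """Assign each project a daily cron expression offset by gap_minutes."""
    schedule = {}
    for index, name in enumerate(projects):
        total = start_hour * 60 + index * gap_minutes
        # Wrap past midnight so the hour field stays within 0-23.
        hour, minute = divmod(total % (24 * 60), 60)
        schedule[name] = f"{minute} {hour} * * *"
    return schedule
```

Each resulting value can be used as the `expr` field of a project's cron job entry.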
- Codex CLI (@openai/codex, tested with 0.120.0)
- Python 3.11+ with pyyaml and toml
- git, curl, rsync, bash
- Optionally: AgenTeam plugin (github.com/yimwoo/codex-agenteam) for the --harness agenteam runner
All of these are available in the OpenClaw container image with a small apt package addition (see agento's Dockerfile for reference).
MIT.
- yimwoo/agento: reference consumer
- yimwoo/codex-agenteam: AgenTeam plugin
- openclaw/openclaw: runtime container