clawforge

Reusable experiment harness for autonomous Codex / AgenTeam agent pipelines.

A drop-in directory you add to any project to run Codex CLI (solo) or the AgenTeam plugin (6-role pipeline, non-interactive) against a seed prompt, with built-in:

  • Background git auto-committer (commits land as Codex (agent))
  • Telegram status push + ad-hoc status queries
  • Steering inbox (post mid-run ideas via chat, applied at natural break points)
  • AgenTeam bug-filing helper (auto-files to yimwoo/codex-agenteam)
  • Post-run metric rollup + human-readable markdown report

Status

Used in production by yimwoo/agento — an experiment where AI agents build an agent orchestration platform.

Directory layout

clawforge/
├── harness/
│   ├── _lib.sh                 # Shared helpers (HARNESS_DIR, remote URL)
│   ├── run_experiment.sh       # Main entry point
│   ├── solo_baseline.sh        # Solo-Codex runner
│   ├── agenteam_run.sh         # AgenTeam wrapper
│   ├── agenteam_runner.py      # AgenTeam non-interactive driver
│   ├── autocommit.sh           # Background snapshot committer
│   ├── status.sh               # One-shot status printer
│   ├── steer.sh                # Append a directive to the run's inbox
│   ├── read_steering.sh        # Read / archive the inbox
│   ├── apply_steering.sh       # Force-apply inbox via `codex exec resume`
│   ├── file_agenteam_bug.sh    # Post an issue to codex-agenteam
│   └── telegram_notifier.py    # Push events to Telegram
├── metrics/
│   ├── collector.py            # Trace event rollup
│   ├── evaluator.py            # Lint / test / security scan runner
│   ├── comparator.py           # Cross-run comparison tables
│   └── generate_report.py      # Markdown + JSON report (auto-run on exit)
└── playbook.md                 # Tool-awareness prelude prepended to every seed

Using clawforge in a project

1. Add clawforge as a submodule in your project:

cd your-project
git submodule add https://github.com/yimwoo/clawforge.git clawforge

2. Create experiment/ at your project root for project-specific content and runtime state:

your-project/
├── clawforge/                  # ← submodule (this repo)
├── experiment/                 # ← project-local
│   ├── config.yaml             # project metadata + knobs
│   ├── seed_prompts/
│   │   ├── minimal.md
│   │   └── detailed.md         # ← the product spec
│   ├── checkpoints/            # runtime state (gitignored)
│   ├── traces/                 # runtime JSONL logs (gitignored)
│   ├── reports/                # per-run markdown + metrics.json (tracked)
│   ├── steering/               # steering inbox + consumed/ archive
│   └── learnings/              # accumulated findings across runs
└── src/                        # ← what codex builds

Minimal experiment/config.yaml:

experiment:
  name: "my-experiment"

project:
  name: "MyProject"
  seed_prompt: "experiment/seed_prompts/detailed.md"
  src_dir: "src"

git:
  remote_url: ""   # blank → reads from `git remote get-url origin`

model:
  id: "gpt-5.4"

harness:
  type: "solo"      # or "agenteam"
  max_time_hours: 8
  max_cost_usd: 300

3. Provide env vars (in a .env at project root, or the container env):

GH_TOKEN                     # for pushing to GitHub
TELEGRAM_DEVTEAM_BOT_TOKEN   # optional, for Telegram push
TELEGRAM_DEVTEAM_CHAT_ID     # optional

4. Run:

./clawforge/harness/run_experiment.sh --harness solo     --run-id run_001
./clawforge/harness/run_experiment.sh --harness agenteam --run-id run_002

That's it. No source edits to clawforge.
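Before the first run it can help to sanity-check the config and environment from steps 2 and 3. The following is only a sketch: `preflight` is a hypothetical helper, not part of clawforge. It assumes the config has already been parsed into a dict (for example with pyyaml's `yaml.safe_load`), that every key in the minimal config is needed, that `GH_TOKEN` is required, and that the two Telegram variables are only useful as a pair.

```python
from typing import Mapping

# Names taken from the README; treating GH_TOKEN as required is an assumption.
REQUIRED_ENV = ("GH_TOKEN",)
OPTIONAL_ENV = ("TELEGRAM_DEVTEAM_BOT_TOKEN", "TELEGRAM_DEVTEAM_CHAT_ID")
VALID_HARNESS_TYPES = ("solo", "agenteam")

# Keys that appear in the minimal experiment/config.yaml.
EXPECTED_KEYS = (
    ("experiment", "name"),
    ("project", "name"),
    ("project", "seed_prompt"),
    ("project", "src_dir"),
    ("model", "id"),
    ("harness", "type"),
)


def preflight(config: Mapping, env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means nothing was found."""
    problems = []

    for section, key in EXPECTED_KEYS:
        if not (config.get(section) or {}).get(key):
            problems.append(f"config: missing {section}.{key}")

    harness_type = (config.get("harness") or {}).get("type")
    if harness_type and harness_type not in VALID_HARNESS_TYPES:
        problems.append(
            f"config: harness.type must be one of {VALID_HARNESS_TYPES}"
        )

    for name in REQUIRED_ENV:
        if not env.get(name):
            problems.append(f"env: {name} is not set")

    # Assumption: a bot token without a chat id (or vice versa) is a mistake.
    set_optional = [name for name in OPTIONAL_ENV if env.get(name)]
    if len(set_optional) == 1:
        problems.append(f"env: {set_optional[0]} is set without its counterpart")

    return problems
```

Calling `preflight(parsed_config, os.environ)` and printing each returned line gives a quick checklist before launching `run_experiment.sh`.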

Self-locating scripts

All inter-script references resolve through HARNESS_DIR and HARNESS_ROOT, which _lib.sh computes from $(dirname "${BASH_SOURCE[0]}"). The harness therefore works whether you mount it at clawforge/, harness/, vendor/clawforge/, or somewhere else. Runtime artifacts (experiment/traces/, experiment/checkpoints/, etc.) are always read and written relative to $PROJECT_ROOT, which is computed as the parent of the harness install directory.
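A Python-side helper could mirror the same self-locating idea. This is a sketch, not the code in _lib.sh: the function name `locate` is hypothetical, and it assumes HARNESS_DIR is the directory containing the script, HARNESS_ROOT is its parent (the install directory), and PROJECT_ROOT is one level above that.

```python
from pathlib import PurePosixPath


def locate(script_path: str) -> dict[str, PurePosixPath]:
    """Derive harness and project directories from a script's own path."""
    script = PurePosixPath(script_path)
    harness_dir = script.parent          # e.g. .../clawforge/harness
    harness_root = harness_dir.parent    # e.g. .../clawforge (install directory)
    project_root = harness_root.parent   # parent of the install directory
    return {
        "HARNESS_DIR": harness_dir,
        "HARNESS_ROOT": harness_root,
        "PROJECT_ROOT": project_root,
    }
```

In a real script the argument would be something like `str(Path(__file__).resolve())`, so the result does not depend on the current working directory.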

Remote URL resolution

Commit + push go to a remote URL resolved in this order:

  1. GH_REMOTE_URL environment variable (explicit override)
  2. git.remote_url in experiment/config.yaml
  3. git remote get-url origin

Option 3 is the fallback when neither override is set, so any cloned project works without extra configuration.
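The lookup order above can be sketched in a few lines. This illustrates the precedence only and is not the harness's actual implementation: `resolve_remote_url` and `git_origin_url` are hypothetical names, and the config is assumed to be an already-parsed dict.

```python
import subprocess
from typing import Callable, Mapping, Optional


def git_origin_url() -> Optional[str]:
    """Ask git for origin's URL; return None if git or the remote is missing."""
    try:
        result = subprocess.run(
            ["git", "remote", "get-url", "origin"],
            capture_output=True, text=True, check=False,
        )
    except OSError:
        return None
    url = result.stdout.strip()
    return url if result.returncode == 0 and url else None


def resolve_remote_url(
    env: Mapping[str, str],
    config: Mapping,
    origin_lookup: Callable[[], Optional[str]] = git_origin_url,
) -> Optional[str]:
    """Apply the three-step order: env override, config value, git origin."""
    # 1. Explicit override from the environment.
    override = env.get("GH_REMOTE_URL", "").strip()
    if override:
        return override
    # 2. git.remote_url from experiment/config.yaml (blank means "not set").
    configured = ((config.get("git") or {}).get("remote_url") or "").strip()
    if configured:
        return configured
    # 3. Fall back to whatever origin points at.
    return origin_lookup()
```

Passing the origin lookup as a parameter keeps the precedence logic testable without a git checkout.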

Autonomous daily cycle (self-improving loop)

Clawforge includes a daily cycle mode that generates a context-aware seed prompt each day and runs a full pipeline autonomously:

./clawforge/harness/daily_cycle.sh           # auto: run_YYYYMMDD
./clawforge/harness/daily_cycle.sh --dry-run # inspect the seed, don't run

The dynamic seed generator (harness/dynamic_seed.py) reads:

  • docs/ROADMAP.md — what's planned
  • experiment/reports/*/report.md — what was done last cycle
  • Open GitHub issues (via GH_TOKEN) — bugs and feature requests
  • experiment/steering/ inbox — human ideas posted via Telegram
  • src/ file stats + recent git log — current codebase state
  • experiment/seed_prompts/detailed.md — the product spec (drift anchor)

Each role receives the full context and knows what to focus on: researcher audits, PM updates the roadmap, architect designs, dev implements and fixes bugs, QA tests and files issues, reviewer checks quality. QA-filed bugs land as GitHub issues, and the next cycle picks them up automatically, which makes the loop self-healing.
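To make the inputs listed above concrete, here is a rough sketch of how such context could be stitched into one seed prompt. It is not the code in harness/dynamic_seed.py. `build_seed` is a hypothetical function; issue titles and the git log are passed in rather than fetched; the "last cycle" report is assumed to be the one whose directory name sorts last; and steering items are assumed to be .md files directly under experiment/steering/.

```python
from pathlib import Path


def build_seed(project_root: Path, open_issues: list[str], git_log: str) -> str:
    """Concatenate the available context sources into one markdown prompt."""
    sections = []

    def add(title: str, body: str) -> None:
        # Skip sources that are missing or empty.
        if body.strip():
            sections.append(f"## {title}\n\n{body.strip()}")

    def read(relative: str) -> str:
        path = project_root / relative
        return path.read_text(encoding="utf-8") if path.is_file() else ""

    add("Product spec (drift anchor)", read("experiment/seed_prompts/detailed.md"))
    add("Roadmap", read("docs/ROADMAP.md"))

    # Assumption: run ids such as run_YYYYMMDD sort chronologically by name.
    reports = sorted((project_root / "experiment" / "reports").glob("*/report.md"))
    add("Last cycle report", reports[-1].read_text(encoding="utf-8") if reports else "")

    add("Open GitHub issues", "\n".join(f"- {title}" for title in open_issues))

    # Assumption: inbox items are .md files; the non-recursive glob skips consumed/.
    inbox = sorted((project_root / "experiment" / "steering").glob("*.md"))
    add("Steering inbox", "\n\n".join(p.read_text(encoding="utf-8") for p in inbox))

    src_files = [p for p in (project_root / "src").rglob("*") if p.is_file()]
    add("Codebase state", f"{len(src_files)} files under src/" if src_files else "")
    add("Recent git log", git_log)

    return "\n\n".join(sections) + "\n"
```

Sources that do not exist yet are left out, so the same function works on a brand-new project and on one with many cycles behind it.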

Scheduling via OpenClaw cron

{
  "agentId": "devteam",
  "name": "MyProject Daily Cycle",
  "schedule": { "kind": "cron", "expr": "0 9 * * *", "tz": "America/Los_Angeles" },
  "payload": {
    "kind": "agentTurn",
    "message": "Run today's daily cycle for my-project.",
    "timeoutSeconds": 120
  }
}

The devteam agent reads the message and invokes daily_cycle.sh via the project's Telegram skill; the pipeline then runs in the background.

Multi-project usage

Each project that uses clawforge is fully isolated. Run IDs, traces, reports, git remotes, and GitHub issues are all scoped to $PROJECT_ROOT (the parent of wherever the clawforge submodule is mounted). Two projects can share the same container and even run the same run_YYYYMMDD id without conflict.

Setting up a second project

# 1. Mount into the container (docker-compose.yml bind mount)
# 2. Add clawforge submodule
cd project-b
git submodule add https://github.com/yimwoo/clawforge.git clawforge

# 3. Create experiment dir + seed prompt
mkdir -p experiment/{seed_prompts,checkpoints,traces,reports,steering,learnings}
$EDITOR experiment/seed_prompts/detailed.md
$EDITOR experiment/config.yaml  # project.name, seed_prompt path

# 4. Add a Telegram skill (copy + adapt the project-a skill, change paths)
# 5. Add a cron job in ~/.openclaw/cron/jobs.json for the daily cycle
# 6. Run:
./clawforge/harness/daily_cycle.sh

What's shared vs isolated

| Resource | Shared? | Notes |
|---|---|---|
| Codex sessions | Isolated | Per-run session.id capture |
| Telegram | Shared chat | Tagged project=<name> in every message |
| GitHub issues | Isolated | Each project has its own origin |
| Traces / reports | Isolated | Under each project's experiment/ |
| API spend | Shared | Stagger cron times to avoid rate limits |
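Because API spend is shared, one simple way to stagger the daily cycles is to offset each project's cron expression by a fixed gap. The helper below is a hypothetical sketch; the 09:00 start and the 30-minute gap are arbitrary assumptions, not clawforge defaults.

```python
def staggered_cron(
    projects: list[str], start_hour: int = 9, gap_minutes: int = 30
) -> dict[str, str]:
    """Map each project to a daily cron expression, spaced gap_minutes apart."""
    schedule = {}
    for index, name in enumerate(projects):
        total = start_hour * 60 + index * gap_minutes
        if total >= 24 * 60:
            raise ValueError("schedule runs past midnight; reduce the gap")
        hour, minute = divmod(total, 60)
        # Standard five-field cron: minute hour day-of-month month day-of-week.
        schedule[name] = f"{minute} {hour} * * *"
    return schedule
```

Each resulting string can go into the "expr" field of that project's cron job.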

Prerequisites

  • Codex CLI (@openai/codex, tested with 0.120.0)
  • Python 3.11+ with pyyaml and toml
  • git, curl, rsync, bash
  • Optionally: AgenTeam plugin (github.com/yimwoo/codex-agenteam) for the --harness agenteam runner

All of these are available in the OpenClaw container image with a small apt package addition (see agento's Dockerfile for reference).

License

MIT.
