Autoloop: don't accept an iteration until CI is green — fix-and-retry until it passes

## Problem

Today an autoloop iteration is marked ✅ Accepted as soon as the agent's in-sandbox self-evaluation returns a non-zero metric. CI runs **after** the push and is not in the acceptance path. When CI fails (as in #174 — 24 TS2339 errors because the agent wrote tests against `DataFrame.fromArrays(...)`, which doesn't exist), the autoloop just keeps going:

- The broken commit stays on the long-running branch.
- The next iteration stacks new work on top of a red branch.
- Nothing reverts, nothing retries, nothing notifies.
- The draft PR (#174) accumulates failures indefinitely.

Compounding factor: the agent sandbox can't reliably run `bun`/`tsc` because `releaseassets.githubusercontent.com` is firewall-blocked, so the sandbox self-evaluation is unreliable by construction. Real validation must come from CI.

## Proposed behavior — fix-and-retry loop until green

Change the iteration acceptance criterion from "agent self-evaluated OK" to **"CI is green on the pushed commit"**. If CI is red, the agent keeps fixing forward on the same branch until CI is green or the fix budget (see Bounds) is exhausted. No revert.

### Per-iteration flow

```
1. Agent proposes change, commits, and pushes to autoloop/<program>.
2. Wait for CI on the new HEAD commit:
     gh pr checks <pr> --watch --interval 30
   (or poll `/repos/.../commits/:sha/check-runs` and `/check-suites`).
3. If CI is green:
     → Mark iteration accepted. Comment on the program issue. Update state. Done.
4. If CI is red:
     → Fetch the failing check-run logs and a structured failure summary
       (compile errors / failing test names / first N lines of stack).
     → Feed that back to the agent as the next task:
         "CI failed on <sha>. Here are the failures: <...>. Fix them and push again."
     → Agent commits the fix and pushes again.
     → Go to step 2.
5. Keep looping until CI is green OR the fix-retry budget is exhausted.
```
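The control flow above can be sketched as a small driver. This is a minimal illustration, not the actual autoloop code: `push`, `wait_for_ci`, and `fix` are hypothetical callables standing in for the git push, the CI wait, and the agent's fix step, and `CiResult` is an invented container. Wall-clock and token budgets are left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CiResult:
    """Outcome of one CI run (hypothetical shape, for illustration only)."""
    green: bool
    summary: str = ""  # structured failure summary when CI is red


def run_iteration(
    push: Callable[[], str],                 # pushes HEAD, returns the new commit sha
    wait_for_ci: Callable[[str], CiResult],  # blocks until CI finishes on that sha
    fix: Callable[[str, str], None],         # agent fixes (sha, summary) and commits
    max_fix_attempts: int = 5,
) -> str:
    """Return "accepted" once CI is green, or "paused" when the fix budget runs out."""
    sha = push()
    for attempt in range(max_fix_attempts + 1):
        result = wait_for_ci(sha)
        if result.green:
            return "accepted"
        if attempt == max_fix_attempts:
            break  # budget exhausted: never accept, never revert
        fix(sha, result.summary)
        sha = push()
    return "paused"
```

With a budget of 5, the initial push plus five fix pushes gives at most six CI runs per iteration. The only path to `"accepted"` goes through a green result.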

### Bounds (to avoid runaway cost / infinite loops)

- **Per-iteration fix budget**: up to 5 fix attempts. Each fix attempt has its own bounded wall-clock and token budget.
- **Per-iteration wall clock**: hard cap at 60 min total (including CI waits). If the cap is hit mid-fix, record the current state, leave the PR in a failing state, and surface it loudly — don't silently abandon.
- **No-progress guard**: if two consecutive fix attempts produce the same failing check signature, stop looping — the agent is clearly stuck. Record `paused` with `pause_reason: "stuck in CI fix loop: <signature>"` and surface for human review.
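
One way the no-progress guard could be implemented is to hash a normalized view of the failing lines and compare consecutive attempts. The sketch below is an assumption about what a "failing check signature" might look like. The position-stripping regex and the 12-character digest are illustrative choices, not part of the proposal.

```python
import hashlib
import re
from typing import Iterable, List

# Strips positions such as "(12,5)" or ":12:5" so that a fix which only shifts
# line numbers does not look like progress.
_POSITION = re.compile(r"[(:]\d+[,:]\d+\)?")


def failure_signature(failures: Iterable[str]) -> str:
    """Order-independent short hash of the failing lines, ignoring positions."""
    normalized = sorted({_POSITION.sub("", f.strip()) for f in failures if f.strip()})
    return hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()[:12]


def is_stuck(signatures: List[str]) -> bool:
    """True when the two most recent fix attempts failed in the same way."""
    return len(signatures) >= 2 and signatures[-1] == signatures[-2]
```

The guard would be called after each red CI run with the list of signatures seen so far. A `True` result maps to the `paused` state described above.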

### When the budget is exhausted

Prefer *loud failure over silent corruption*:

- Set `paused: true` on the program state file with the pause reason.
- Open or update a comment on the program issue (#1, etc.) with the failing check summary and the last three fix attempts.
- Do **not** revert. Leaving the broken commit in place lets a human (or Evergreen — see below) see what's wrong and continue from there. Reverting would lose the partial progress the agent made.
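
A sketch of the pause bookkeeping, under the assumption that the program state can be treated as a flat mapping with `paused` and `pause_reason` keys. The real state file format is not specified here, and both function names are hypothetical.

```python
from typing import Any, Dict, List


def pause_program(state: Dict[str, Any], signature: str) -> Dict[str, Any]:
    """Return a copy of the state marked as paused; the input is not mutated."""
    updated = dict(state)
    updated["paused"] = True
    updated["pause_reason"] = f"ci-fix-exhausted: {signature}"
    return updated


def pause_comment(summary: str, attempts: List[str], keep: int = 3) -> str:
    """Build the issue comment: the failing summary plus the last `keep` fix attempts."""
    lines = [
        "Autoloop paused: CI is still red after the fix budget was exhausted.",
        "",
        "Failing checks:",
        summary,
        "",
        "Last fix attempts:",
    ]
    lines += [f"- {attempt}" for attempt in attempts[-keep:]]
    return "\n".join(lines)
```

Neither function touches git. The broken commit stays on the branch and only the state and the issue comment change.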

## Evergreen extension

Currently Evergreen ("PR Health Keeper") is scoped to merge-conflict and CI-failure fixing on human-authored PRs. Extend it to also pick up **autoloop PRs** whose long-running branch is red. Two cases where Evergreen would help:

1. **Autoloop hit its fix-retry budget and paused.** Evergreen picks up the PR, attempts its own fix with fresh context. If green → un-pause the program.
2. **The fix loop is in-flight but stalled** (wall-clock cap hit). Evergreen resumes on the next tick.

Keep Evergreen's 5-attempts-per-SHA rule so it also gives up eventually rather than burning cycles.
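
The eligibility rule could look roughly like the following. `PullRequest` and `attempts_by_sha` are hypothetical stand-ins for whatever Evergreen already tracks. Only the `autoloop/` branch prefix and the limit of 5 come from this proposal.

```python
from dataclasses import dataclass
from typing import Dict

MAX_ATTEMPTS_PER_SHA = 5  # Evergreen's existing per-SHA limit


@dataclass(frozen=True)
class PullRequest:
    """Minimal view of a PR (hypothetical shape, for illustration only)."""
    branch: str
    head_sha: str
    ci_red: bool


def is_autoloop_pr(pr: PullRequest) -> bool:
    """Autoloop PRs live on long-running autoloop/<program> branches."""
    return pr.branch.startswith("autoloop/")


def evergreen_should_attempt(pr: PullRequest, attempts_by_sha: Dict[str, int]) -> bool:
    """Same rule for human-authored and autoloop PRs: red CI and budget left on this SHA."""
    if not pr.ci_red:
        return False
    return attempts_by_sha.get(pr.head_sha, 0) < MAX_ATTEMPTS_PER_SHA
```

Because the counter is keyed by SHA, a new push from the autoloop fix loop resets Evergreen's budget for that PR.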

## Implementation sketch in `.github/workflows/autoloop.md`

The agent prompt already has a Step 5 (Accept/Reject). Replace the current accept logic with:

````markdown
### Step 5a: Push and wait for CI

After committing, push to `autoloop/{program-name}`. Then wait for CI on the new HEAD
commit:

```bash
gh pr checks "$EXISTING_PR" --watch --interval 30 --fail-fast || true
# `bucket` is one of: pass, fail, pending, skipping, cancel.
status=$(gh pr checks "$EXISTING_PR" --json bucket -q '.[].bucket' \
  | awk 'BEGIN{r="success"} /^(fail|cancel)$/{r="failure"} END{print r}')
```

### Step 5b: Fix loop (up to 5 attempts)

If `$status == "failure"`:
- Fetch failing check-run logs: `gh run view <run-id> --log-failed` for each failing run.
- Extract the first 50 lines of error output and the set of failing test names.
- Treat this as the new task description and implement a fix.
- Commit, push, go back to Step 5a.
- If you are on the 5th attempt and still failing, stop. Set `paused: true`
  on the state file with `pause_reason: "ci-fix-exhausted: <signature>"`,
  comment on the program issue, and end the iteration.

### Step 5c: Accept

Only when `$status == "success"` do you mark the iteration accepted: update the state file,
post the accepted comment on the program issue, and finish.
````

The `missing_tool` safe-output is irrelevant here — this runs in the agent's bash tool, not via an MCP tool.
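
The status aggregation in Step 5a can also be written out explicitly, which makes the edge cases visible. This sketch assumes input shaped like the output of `gh pr checks --json bucket`, where each entry's `bucket` is one of `pass`, `fail`, `pending`, `skipping`, or `cancel`. That shape should be verified against the installed `gh` version.

```python
import json


def overall_status(checks_json: str) -> str:
    """Collapse per-check buckets into "success", "failure", or "pending"."""
    buckets = [check.get("bucket", "") for check in json.loads(checks_json)]
    if any(b in ("fail", "cancel") for b in buckets):
        return "failure"
    # No checks reported yet, or some still running: not safe to accept.
    if not buckets or any(b == "pending" for b in buckets):
        return "pending"
    return "success"
```

Unlike a one-line filter that defaults to success, this treats an empty check list as `pending`, so a commit whose CI has not started cannot be accepted.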

## Why not just revert?

The original proposal was "revert if red, let the next iteration try again". The user explicitly prefers fix-and-retry because:

- Reverting throws away real work. The agent's proposed change is usually 80% right; it's better to fix the 20% than re-derive it.
- Reverting creates churn in the commit history — harder to audit what actually got tried.
- Fix-and-retry keeps the history forward-only (the original commit plus its fix commits) rather than a commit + revert + new commit.

Edge case: if the agent's change is fundamentally wrong (e.g., wrong architectural direction), the fix loop will exhaust its budget, the program will pause, and a human can manually reset the branch. That's a deliberate loud-failure path, not silent corruption.

## Acceptance

- An iteration cannot be accepted while CI is red.
- When CI is red, the agent fetches the failure details and pushes a fix (up to 5 attempts).
- If all attempts fail, the program auto-pauses with a clear `pause_reason` and a comment on the program issue — not a revert.
- No autoloop PR sits with persistent red CI unnoticed.
- Evergreen picks up paused autoloop PRs (same rules as human-authored PRs: max 5 attempts per SHA).

## Context

- Failing PR with current broken acceptance: #174 (24 TS2339 errors in `tests/stats/eval_query.test.ts`).
- Related firewall issue making sandbox self-evaluation unreliable: the agent can't install bun because `releaseassets.githubusercontent.com` is blocked (see comment on #1, iteration 233). That's why CI-based validation is the right anchor — the sandbox can't be trusted to catch these errors on its own.
- Related bug that also needs fixing for this to work end-to-end: #162 (state files aren't read → program scheduling broken).
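
The TS2339 failures in #174 show the kind of log the fix loop has to condense into a structured summary. A minimal sketch follows, assuming non-pretty `tsc` output of the form `file(line,col): error TSxxxx: message`. The function names are hypothetical, and the 50-line limit mirrors the one in Step 5b.

```python
import re
from collections import Counter
from typing import List, NamedTuple


class TscError(NamedTuple):
    path: str
    line: int
    col: int
    code: str
    message: str


_TSC_LINE = re.compile(
    r"^(?P<path>.+?)\((?P<line>\d+),(?P<col>\d+)\): error (?P<code>TS\d+): (?P<msg>.*)$"
)


def parse_tsc_errors(log: str) -> List[TscError]:
    """Pick out compiler errors from a CI log, skipping unrelated lines."""
    errors = []
    for raw in log.splitlines():
        match = _TSC_LINE.match(raw.strip())
        if match:
            errors.append(TscError(match["path"], int(match["line"]),
                                   int(match["col"]), match["code"], match["msg"]))
    return errors


def summarize(errors: List[TscError], limit: int = 50) -> str:
    """First line: counts per error code. Then up to `limit` individual errors."""
    counts = Counter(e.code for e in errors)
    header = ", ".join(f"{n}x {code}" for code, n in sorted(counts.items()))
    body = [f"{e.path}:{e.line} {e.code} {e.message}" for e in errors[:limit]]
    return "\n".join([header, *body])
```

The summary string is what would be handed back to the agent as the next task description. Test-runner failures would need their own parser.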
