Gate iteration acceptance on CI green, with a bounded fix-and-retry loop

## Summary

Today an iteration is marked ✅ Accepted as soon as the sandbox-computed metric improves. CI runs **after** the push and is not on the acceptance path — which means a broken commit can land on the long-running branch and stack under subsequent iterations, with no revert or retry.

This matters because the agent sandbox cannot reliably install common toolchains — `bun`, `tsc`, `cargo`, `go`, `pytest`, etc. — due to firewall restrictions on asset hosts like `releaseassets.githubusercontent.com`. Sandbox self-evaluation is therefore structurally unreliable. Real validation has to come from CI on the pushed HEAD commit.

## Concrete failure this prevents

A long-running branch has an open draft PR with dozens of TypeScript compile errors because the agent wrote tests against a method that doesn't exist on the target class. The sandbox couldn't run `tsc` (toolchain install blocked), so the iteration's self-evaluation rubber-stamped the change as Accepted. Nothing reverts or retries; the red branch stays red and subsequent iterations pile on top of it. This is the default failure mode whenever the sandbox can't run the project's type-check or test suite, which in practice is most of the time.

## Proposed flow (replaces current Step 5)

Split the current "Step 5: Accept or Reject" into three sub-steps with an explicit CI gate between push and accept:

### Step 5a: Push and wait for CI

After committing, push to `autoloop/{program-name}` and wait for the CI on the new HEAD:

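```bash
# Resolve the PR for the long-running branch unless the caller already knows it.
PR=${EXISTING_PR:-$(gh pr list --head autoloop/{program-name} --json number -q '.[0].number')}
# Block until checks finish; a non-zero exit (failed checks) is classified below.
gh pr checks "$PR" --watch --interval 30 || true
# Collapse per-check states into one of: success, failure, pending.
# `state` is the field `gh pr checks --json` exposes; `conclusion` is not among its JSON fields.
status=$(gh pr checks "$PR" --json state -q '.[].state' \
  | awk '
      BEGIN { r = "success" }
      /^(FAILURE|CANCELLED|TIMED_OUT|ACTION_REQUIRED|STARTUP_FAILURE|STALE)$/ { r = "failure" }
      /^(PENDING|QUEUED|IN_PROGRESS|WAITING|REQUESTED)$/ { if (r == "success") r = "pending" }
      END { print r }')
```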

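Three outcomes: `success`, `failure`, `pending`. `pending` should be rare because `--watch` blocks until the checks finish; the awk classification handles it defensively. Any failed check wins over a pending one, so a single red check makes the whole result `failure`.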
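The same classification can be written in Python if the scheduler prefers to consume the JSON directly. This is a minimal sketch that mirrors the awk rules above; the function name `classify_checks` and the two set names are hypothetical. Like the awk version, it returns `success` for an empty list, so a caller may want to handle "no checks reported yet" separately.

```python
# States copied from the awk patterns above; anything else counts as passing.
FAILED = {"FAILURE", "CANCELLED", "TIMED_OUT", "ACTION_REQUIRED", "STARTUP_FAILURE", "STALE"}
UNFINISHED = {"PENDING", "QUEUED", "IN_PROGRESS", "WAITING", "REQUESTED"}


def classify_checks(states):
    """Collapse per-check states into 'success', 'failure', or 'pending'."""
    result = "success"
    for state in states:
        if state in FAILED:
            # Any failed check makes the whole result a failure.
            return "failure"
        if state in UNFINISHED:
            # Remember that something is unfinished, but keep scanning for failures.
            result = "pending"
    return result
```

For example, `classify_checks(["SUCCESS", "PENDING", "FAILURE"])` yields `"failure"`, matching the precedence in the shell snippet.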

### Step 5b: Fix loop (up to 5 attempts per iteration)

If `status == "failure"`, **fix and retry — do not revert, do not accept**:

1. **Fetch the failing check-run logs** for the pushed SHA via `gh run view --log` or the Checks API.
2. **Extract a structured failure summary**:
   - Failing job names and their first error lines.
   - **A failure signature** — a stable, normalized fingerprint of the failures (e.g., sorted failing-test names + the top error code, like `TS2339:fromArrays:tests/stats/eval_query.test.ts`). Used by the no-progress guard.
3. **No-progress guard**: if this attempt's failure signature matches the previous attempt's signature, **stop**. The agent is stuck in a repeat-loop. Set `paused: true` on the state file with `pause_reason: "stuck in CI fix loop: <signature>"`, comment on the program issue with the signature and up to the three most recent attempts, and end the iteration.
4. **Attempt the fix**: feed the structured failure summary back to the agent as the next task ("CI failed on <sha>. Here are the failures: <…>. Fix them and push again"). Agent commits the fix and pushes.
5. **Loop back to Step 5a** with the new HEAD.
6. **Budget: 5 fix attempts per iteration.** If the 5th attempt still leaves CI red, set `paused: true` with `pause_reason: "ci-fix-exhausted: <signature>"` and end the iteration.
7. **Wall-clock cap: 60 min per iteration** including CI waits. If exceeded mid-fix, set `paused: true` with `pause_reason: "ci-timeout"` and leave the current state in place.
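The steps above can be sketched as one small driver. This is an illustration under stated assumptions, not the helper from the sibling issue: `failure_signature`, `run_fix_loop`, and the three callbacks (`wait_for_ci`, `collect_failures`, `attempt_fix`) are hypothetical names, failures are assumed to arrive as `(error_code, symbol, path)` triples, and a `pending` result is assumed to mean "keep waiting", bounded by the wall-clock cap.

```python
import time
from typing import Callable, Iterable, Optional, Tuple

MAX_FIX_ATTEMPTS = 5          # budget from step 6
WALL_CLOCK_CAP_S = 60 * 60    # 60-minute cap from step 7


def failure_signature(failures: Iterable[Tuple[str, str, str]]) -> str:
    """Normalize (error_code, symbol, path) triples into a stable fingerprint."""
    # De-duplicate and sort so log ordering does not change the signature.
    return "|".join(sorted({":".join(f) for f in failures}))


def run_fix_loop(
    wait_for_ci: Callable[[], str],
    collect_failures: Callable[[], Iterable[Tuple[str, str, str]]],
    attempt_fix: Callable[[str], None],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, Optional[str]]:
    """Return ('accepted', None) or ('paused', pause_reason)."""
    start = clock()
    previous = None
    attempts = 0
    while True:
        status = wait_for_ci()                      # Step 5a
        if status == "success":
            return "accepted", None                 # Step 5c
        if clock() - start > WALL_CLOCK_CAP_S:
            return "paused", "ci-timeout"
        if status == "pending":
            continue                                # assumption: wait again
        signature = failure_signature(collect_failures())
        if signature == previous:                   # no-progress guard
            return "paused", f"stuck in CI fix loop: {signature}"
        if attempts >= MAX_FIX_ATTEMPTS:            # budget exhausted
            return "paused", f"ci-fix-exhausted: {signature}"
        attempt_fix(signature)                      # agent commits and pushes
        attempts += 1
        previous = signature
```

Checking the guard before the budget means a repeated failure is reported as "stuck" even on the last attempt, which gives the more specific reason. Injecting `clock` keeps the cap testable without real waiting.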

### Step 5c: Accept

Only when `status == "success"`:
- Mark the iteration accepted. Update the state file's Machine State, push/update the PR, and comment on the program issue with the metric delta and, if greater than zero, the fix-attempt count.

## Why no revert?

The naive alternative is "revert on red, retry next iteration." Fix-and-retry is strictly better:

- **Reverting throws away real work.** The agent's change is usually most of the way there; re-deriving it from scratch next iteration is wasteful.
- **Reverting creates commit-history churn** — the branch ends up with commit-revert-commit triples that are hard to audit.
- **Fix-and-retry produces a single clean commit on accept.** Multiple fix attempts within an iteration are local to that iteration; if it succeeds, only the final commit is on the branch.
- **Edge case of fundamentally wrong direction**: caught by the 5-attempt budget plus the no-progress guard. Program auto-pauses with a loud, structured `pause_reason`. Humans (or a PR-health-keeper workflow) can reset.

## New Machine State values to document

Add to the pause-reason vocabulary:
- `ci-fix-exhausted: <signature>` — 5 fix attempts didn't fix CI.
- `stuck in CI fix loop: <signature>` — no-progress guard tripped (same failure twice in a row).
- `ci-timeout` — 60-min wall-clock cap hit.

Add to the `recent_statuses` vocabulary:
- `ci-fix-exhausted` — alongside `accepted`, `rejected`, `error`.

## Coordination with PR-health-keeper workflows

If a repo ships a companion PR-health-keeper workflow (e.g., an "Evergreen" workflow that fixes failing CI on open PRs), it should be able to pick up paused Autoloop PRs using the same rules as human-authored PRs. The handoff is via the `pause_reason` field. Absent such a workflow, the loud pause + structured reason gives a human enough signal to intervene.
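A consumer of that handoff could split `pause_reason` into its kind and optional signature. The sketch below assumes the three reason formats listed in this issue and nothing else; `PauseReason`, `KNOWN_KINDS`, and `parse_pause_reason` are hypothetical names, not part of any existing workflow.

```python
from typing import NamedTuple, Optional


class PauseReason(NamedTuple):
    kind: str
    signature: Optional[str]


# The three pause reasons proposed in this issue.
KNOWN_KINDS = ("ci-fix-exhausted", "stuck in CI fix loop", "ci-timeout")


def parse_pause_reason(reason: str) -> Optional[PauseReason]:
    """Split '<kind>: <signature>' or a bare '<kind>'; return None if unknown."""
    # Split on the first ': ' only, because signatures themselves contain ':'.
    kind, sep, rest = reason.partition(": ")
    if kind not in KNOWN_KINDS:
        return None
    return PauseReason(kind, rest if sep else None)
```

Returning `None` for unrecognized reasons lets a health-keeper skip pauses it does not understand instead of guessing.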

## Related

- Depends on sibling issue #34 (scheduler extraction) for the fix-loop helper code (failure-signature extraction is ~30 lines of Python).
- Related root cause of sandbox unreliability: gh-aw's firewall blocks `releaseassets.githubusercontent.com`, preventing tools like `bun` from installing. Even if that's fixed upstream, a CI gate is the correct acceptance criterion regardless.

