
OpenEvolve: gate iteration comments on CI and measure fitness in CI #199

Merged
mrjf merged 2 commits into main from copilot/fix-iteration-comments-pending-ci on Apr 23, 2026

Contributor

Copilot AI commented Apr 23, 2026

OpenEvolve iteration comments on tsb-perf-evolve were posted with ⏳ Pending CI / Metric: pending CI (sandbox has no bun) and never reconciled. The strategy playbook posts before CI runs, and the sandbox can't install bun to measure fitness itself. CI didn't run the benchmark either, so even a green build left fitness: null on every population entry.

Fix 2 — measure fitness in CI

  • Extracted ## Evaluation from .autoloop/programs/tsb-perf-evolve/program.md into an executable evaluate.sh next to it; program.md now just calls bash evaluate.sh. Single source of truth shared between agent and CI.
  • New benchmark job in .github/workflows/ci.yml, gated on autoloop/*-evolve branches: Bun + Python + pandas, runs evaluate.sh, uploads bench-result.json, and creates an OpenEvolve benchmark check-run titled fitness=<num> (or fitness=null + neutral if the benchmark itself errored). Fitness/result are passed to actions/github-script via env, not expression interpolation. Workflow gets checks: write.
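The title and conclusion rule for the OpenEvolve benchmark check-run can be sketched in shell as follows. This is a minimal illustration, not the workflow's actual code: the helper names `is_number` and `benchmark_check_fields` are hypothetical, and reporting `success` for a numeric fitness is an assumption (the description above only states the `fitness=<num>` title and the `fitness=null` + neutral case).

```shell
# Hypothetical helper: succeed only if $1 is a plain decimal number.
is_number() {
  case "$1" in
    ''|*[!0-9.-]*) return 1 ;;   # empty, or contains a character outside 0-9 . -
  esac
  printf '%s\n' "$1" | grep -Eq '^-?[0-9]+(\.[0-9]+)?$'
}

# Hypothetical helper: print the check-run title on the first line
# and the check-run conclusion on the second line.
benchmark_check_fields() {
  if is_number "$1"; then
    printf 'fitness=%s\n' "$1"
    printf 'success\n'           # assumption: a numeric result reports success
  else
    printf 'fitness=null\n'
    printf 'neutral\n'           # per the description: benchmark error -> neutral
  fi
}
```

Calling `benchmark_check_fields 12.5` prints `fitness=12.5` and `success`; calling it with an empty or non-numeric argument prints `fitness=null` and `neutral`.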

Fix 1 — make iteration comments reconcile with CI

  • Added ### Step 6.5. Wait for CI between Steps 6 and 7 in .autoloop/strategies/openevolve/strategy.md. It resolves the PR (existing_pr, with a gh pr list --head fallback), blocks on gh pr checks --watch --interval 30 --fail-fast, classifies the result with the same awk classifier as the generic Step 5a, then reads fitness from the check-run:

    fitness=$(gh api "repos/${GITHUB_REPOSITORY}/commits/${SHA}/check-runs" \
      --jq '.check_runs[] | select(.name == "OpenEvolve benchmark") | .output.title' \
      | sed -n 's/^fitness=//p' | head -n1)
  • Branching: success → record numeric fitness, post ✅ Accepted; failure → defer to the existing fix-retry loop from #176 (Gate autoloop iteration acceptance on CI green with a bounded fix-retry loop), and on exhaustion post ❌ Rejected / ⚠️ Error with pause_reason: ci-fix-exhausted; pending (60-min wall-clock cap fired) → record status: pending-ci and leave a single reconciliation-pending comment that the next iteration is allowed to overwrite.

  • Step 6 reframed as a cheap pre-filter only — null sandbox metric is no longer grounds for rejection.

  • Step 8 requires the iteration comment use the terminal Step 6.5 status and the Iteration History fitness come from the check-run, never from the sandbox. ⏳ Pending CI is explicitly forbidden as a permanent terminal state.
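The Step 6.5 branching can be illustrated with a small shell sketch. The function names `parse_fitness` and `step65_outcome`, the three state strings, and the printed labels are hypothetical stand-ins for whatever the strategy playbook actually records. Printing `fitness=null` when no title is available is also an assumption. Only the `sed` extraction mirrors the snippet in the description.

```shell
# Extract the value after "fitness=" from a check-run title,
# using the same sed expression as the Step 6.5 snippet.
parse_fitness() {
  printf '%s\n' "$1" | sed -n 's/^fitness=//p' | head -n1
}

# Hypothetical: map a classified CI state plus the check-run title
# to the outcome Step 6.5 would act on.
#   $1 = success | failure | pending
#   $2 = title of the "OpenEvolve benchmark" check-run (may be empty)
step65_outcome() {
  case "$1" in
    success)
      fitness=$(parse_fitness "$2")
      # Assumption: fall back to "null" when no fitness title was found.
      printf 'accepted fitness=%s\n' "${fitness:-null}"
      ;;
    failure)
      printf 'fix-retry\n'       # hand off to the bounded fix-retry loop
      ;;
    pending)
      printf 'pending-ci\n'      # wall-clock cap fired; reconcile next iteration
      ;;
    *)
      printf 'unknown\n'
      return 1
      ;;
  esac
}
```

For example, `step65_outcome success 'fitness=3.14'` prints `accepted fitness=3.14`, while `step65_outcome pending ''` prints `pending-ci`.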

Notes

Copilot AI changed the title from "[WIP] Fix iteration comments stuck on 'Pending CI' in OpenEvolve" to "OpenEvolve: gate iteration comments on CI and measure fitness in CI" Apr 23, 2026
Copilot AI requested a review from mrjf April 23, 2026 06:17
Copilot finished work on behalf of mrjf April 23, 2026 06:17
mrjf marked this pull request as ready for review April 23, 2026 06:22
mrjf merged commit 5792af4 into main Apr 23, 2026
18 checks passed
mrjf deleted the copilot/fix-iteration-comments-pending-ci branch April 23, 2026 06:25

Development

Successfully merging this pull request may close these issues.

OpenEvolve iteration comments stuck on 'Pending CI'; fitness never populated — wait for CI + run benchmark in CI
