OpenEvolve: gate iteration comments on CI and measure fitness in CI by Copilot · Pull Request #199 · githubnext/tsessebe

Copilot · 2026-04-23T06:11:25Z

OpenEvolve iteration comments on tsb-perf-evolve were posted with `⏳ Pending CI` / `Metric: pending CI (sandbox has no bun)` and were never reconciled: the strategy playbook posts before CI runs, and the sandbox can't install Bun to measure fitness itself. CI also didn't run the benchmark, so even a green build left `fitness: null` on every population entry.

Fix 2 — measure fitness in CI

- Extracted the `## Evaluation` section of `.autoloop/programs/tsb-perf-evolve/program.md` into an executable `evaluate.sh` next to it; `program.md` now just calls `bash evaluate.sh`. This gives the agent and CI a single source of truth for the benchmark.
- New benchmark job in `.github/workflows/ci.yml`, gated on `autoloop/*-evolve` branches: it sets up Bun, Python, and pandas, runs `evaluate.sh`, uploads `bench-result.json`, and creates an `OpenEvolve benchmark` check-run titled `fitness=<num>` (or `fitness=null` with a `neutral` conclusion if the benchmark itself errored). Fitness and result are passed to `actions/github-script` via `env`, not expression interpolation. The workflow gets the `checks: write` permission.
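As a rough illustration of the check-run logic above, the helper below maps a fitness value to the check-run's title and conclusion. It is a hypothetical shell sketch, not the PR's code: the PR builds the check-run inside `actions/github-script`, the name `check_run_fields` is invented here, and it assumes the fitness has already been read out of `bench-result.json` (whose schema isn't shown) and that a numeric result gets a `success` conclusion.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): maps a benchmark fitness value to the
# title and conclusion of the "OpenEvolve benchmark" check-run.
# Assumption: the value is a plain number on success, and empty or "null"
# when the benchmark itself errored.
check_run_fields() {
  local fitness="$1"
  # Integers or decimals, optionally negative (e.g. 42, -0.5, 1.25).
  local re='^-?[0-9]+([.][0-9]+)?$'
  if [[ "$fitness" =~ $re ]]; then
    # Assumed: a numeric fitness gets a success conclusion.
    printf 'title=fitness=%s\nconclusion=success\n' "$fitness"
  else
    # Benchmark errored: report fitness=null with a neutral (non-failing) conclusion.
    printf 'title=fitness=null\nconclusion=neutral\n'
  fi
}
```

For example, `check_run_fields 1.25` prints `title=fitness=1.25` and `conclusion=success`, while an empty or `null` argument yields `title=fitness=null` and `conclusion=neutral`. Keeping the value in a variable rather than splicing it into script text is the same reason the PR passes fitness to `actions/github-script` via `env`.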

Fix 1 — make iteration comments reconcile with CI

Added `### Step 6.5. Wait for CI` between Steps 6 and 7 in `.autoloop/strategies/openevolve/strategy.md`. The step resolves the PR (`existing_pr`, falling back to `gh pr list --head`), blocks on `gh pr checks --watch --interval 30 --fail-fast`, classifies the outcome with the same `awk` classifier as the generic Step 5a, and then reads fitness from the check-run:
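```bash
# Find the "OpenEvolve benchmark" check-run on commit $SHA and strip the
# "fitness=" prefix from its title ("fitness=1.25" -> "1.25",
# "fitness=null" -> "null"). Empty output means no such check-run was found.
fitness=$(gh api "repos/${GITHUB_REPOSITORY}/commits/${SHA}/check-runs" \
  --jq '.check_runs[] | select(.name == "OpenEvolve benchmark") | .output.title' \
  | sed -n 's/^fitness=//p' | head -n1)
```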
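The resolve-and-wait part of Step 6.5 might look roughly like the sketch below. It is not the strategy's actual text: the variables `EXISTING_PR` and `BRANCH` and all helper names are hypothetical, the 60-minute cap is enforced here with coreutils `timeout`, and the outcome is classified from the exit status of `gh pr checks` instead of the `awk` classifier the PR reuses from Step 5a.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of Step 6.5 ("Wait for CI"); names are illustrative.

# Resolve the PR number: prefer a known PR, else look one up by head branch.
resolve_pr() {
  if [[ -n "${EXISTING_PR:-}" ]]; then
    printf '%s\n' "$EXISTING_PR"
  else
    gh pr list --head "$BRANCH" --json number --jq '.[0].number'
  fi
}

# Map the exit status of `timeout ... gh pr checks --watch` to a CI outcome.
#   0      -> all checks passed
#   124    -> timeout killed the watch (the wall-clock cap fired)
#   8      -> gh's exit code for checks still pending
#   other  -> failed checks (or a gh error), treated as failure
classify_ci() {
  case "$1" in
    0) echo success ;;
    124|8) echo pending ;;
    *) echo failure ;;
  esac
}

# Block on CI for at most 60 minutes, polling every 30 seconds.
wait_for_ci() {
  local pr status=0
  pr="$(resolve_pr)"
  timeout 3600 gh pr checks "$pr" --watch --interval 30 --fail-fast || status=$?
  classify_ci "$status"
}
```

Classifying by exit status keeps the sketch short; the PR's choice to reuse the Step 5a `awk` classifier instead keeps the generic and OpenEvolve paths consistent.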
- Branching: on success, record the numeric fitness and post `✅ Accepted`; on failure, defer to the existing fix-retry loop from #176 (Gate autoloop iteration acceptance on CI green with a bounded fix-retry loop) and, on exhaustion, post `❌ Rejected` / `⚠️ Error` with `pause_reason: ci-fix-exhausted`; on pending (the 60-minute wall-clock cap fired), record `status: pending-ci` and leave a single reconciliation-pending comment that the next iteration is allowed to overwrite.
- Step 6 is reframed as a cheap pre-filter only; a null sandbox metric is no longer grounds for rejection.
- Step 8 requires the iteration comment to use the terminal Step 6.5 status, and the Iteration History fitness to come from the check-run, never from the sandbox. `⏳ Pending CI` is explicitly forbidden as a permanent terminal state.
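The branching rules can be condensed into a small decision function. This is a hypothetical sketch: the function name, its arguments, and the retry budget are illustrative (the real bound comes from the #176 fix-retry loop), and it only reports what to do next rather than posting comments or recording fitness.

```shell
#!/usr/bin/env bash
# Hypothetical decision table for the end of Step 6.5.
# Args: CI outcome (success|failure|pending), fix attempts used, max attempts.
next_action() {
  local outcome="$1" attempts="$2" max_attempts="$3"
  case "$outcome" in
    success)
      # Record the check-run fitness and post "✅ Accepted".
      echo accept ;;
    failure)
      if (( attempts < max_attempts )); then
        # Still inside the bounded fix-retry loop: attempt another fix.
        echo retry
      else
        # Retries exhausted: post "❌ Rejected" / "⚠️ Error" and pause.
        echo "reject pause_reason=ci-fix-exhausted"
      fi ;;
    pending)
      # 60-minute cap fired: record status pending-ci; the next iteration
      # may overwrite the single reconciliation-pending comment.
      echo pending-ci ;;
    *)
      echo "unknown outcome: $outcome" >&2
      return 2 ;;
  esac
}
```

Note that `pending-ci` is a resumable state, not a terminal one, which is why Step 8 forbids leaving `⏳ Pending CI` as the final word on an iteration.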

Notes

- Stale comments on #189 ([Autoloop: tsb-perf-evolve]) and #190 ([Autoloop] [Autoloop: tsb-perf-evolve]) are left as a historical record, per the issue.
- The firewall allowlist entry for `releaseassets.githubusercontent.com` remains a separate concern; CI runners are unrestricted, so this PR works without it.

Agent-Logs-Url: https://github.com/githubnext/tsessebe/sessions/4ffc84f5-3ff8-4a4a-a946-14eeae0ee263 Co-authored-by: mrjf <180956+mrjf@users.noreply.github.com>

Initial plan

9d3524a

Copilot AI assigned Copilot and mrjf Apr 23, 2026

Copilot started work on behalf of mrjf April 23, 2026 06:11

Copilot AI linked an issue Apr 23, 2026 that may be closed by this pull request

OpenEvolve iteration comments stuck on 'Pending CI'; fitness never populated — wait for CI + run benchmark in CI #196

Closed

openevolve: wait for CI + run benchmark in CI; extract evaluate.sh

f3657f5

Copilot AI changed the title ~~[WIP] Fix iteration comments stuck on 'Pending CI' in OpenEvolve~~ OpenEvolve: gate iteration comments on CI and measure fitness in CI Apr 23, 2026

Copilot AI requested a review from mrjf April 23, 2026 06:17

Copilot finished work on behalf of mrjf April 23, 2026 06:17

mrjf marked this pull request as ready for review April 23, 2026 06:22

mrjf merged commit 5792af4 into main Apr 23, 2026
18 checks passed

mrjf deleted the copilot/fix-iteration-comments-pending-ci branch April 23, 2026 06:25
