OpenEvolve: gate iteration comments on CI and measure fitness in CI #199
Merged
Agent-Logs-Url: https://github.com/githubnext/tsessebe/sessions/4ffc84f5-3ff8-4a4a-a946-14eeae0ee263
Co-authored-by: mrjf <180956+mrjf@users.noreply.github.com>
Copilot (AI) changed the title from "[WIP] Fix iteration comments stuck on 'Pending CI' in OpenEvolve" to "OpenEvolve: gate iteration comments on CI and measure fitness in CI" on Apr 23, 2026.
OpenEvolve iteration comments on `tsb-perf-evolve` were posted with `⏳ Pending CI` / `Metric: pending CI (sandbox has no bun)` and never reconciled, because the strategy playbook posts before CI runs and the sandbox can't install `bun` to measure fitness itself. CI also didn't run the benchmark, so even a green build left `fitness: null` on every population entry.

Fix 2: measure fitness in CI
- Extracted `## Evaluation` from `.autoloop/programs/tsb-perf-evolve/program.md` into an executable `evaluate.sh` next to it; `program.md` now just calls `bash evaluate.sh`. The agent and CI share this script as a single source of truth.
- Added a `benchmark` job in `.github/workflows/ci.yml`, gated on `autoloop/*-evolve` branches. It sets up Bun + Python + pandas, runs `evaluate.sh`, uploads `bench-result.json`, and creates an `OpenEvolve benchmark` check-run titled `fitness=<num>` (or `fitness=null` + `neutral` if the benchmark itself errored). Fitness/result are passed to `actions/github-script` via env, not expression interpolation. The workflow gets the `checks: write` permission.

Fix 1: make iteration comments reconcile with CI
Added `### Step 6.5. Wait for CI` between Steps 6 and 7 in `.autoloop/strategies/openevolve/strategy.md`. It resolves the PR (`existing_pr` → `gh pr list --head` fallback), blocks on `gh pr checks --watch --interval 30 --fail-fast`, classifies the result with the same awk classifier as the generic Step 5a, then reads fitness from the check-run.

Branching:
- `success` → record numeric fitness, post `✅ Accepted`;
- `failure` → defer to the existing #176 fix-retry loop ("Gate autoloop iteration acceptance on CI green with a bounded fix-retry loop"); on exhaustion post `❌ Rejected` / `⚠️ Error` with `pause_reason: ci-fix-exhausted`;
- `pending` (60-min wall-clock cap fired) → record `status: pending-ci` and leave a single reconciliation-pending comment the next iteration is allowed to overwrite.

Step 6 is reframed as a cheap pre-filter only: a null sandbox metric is no longer grounds for rejection.
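As a rough illustration of the Step 6.5 branching above, here is a minimal Python sketch. It is not the playbook's actual implementation, which per the description relies on `gh` and an awk classifier. Only the `fitness=<num>` / `fitness=null` title format, the three CI outcomes, `pending-ci`, and `ci-fix-exhausted` come from this PR; the function names, the `Outcome` type, the `accepted` / `fix-retry` / `rejected` status strings, and the regex are hypothetical. The description does not say what happens when CI is green but the title is `fitness=null`, so the sketch raises instead of guessing.

```python
import re
from dataclasses import dataclass
from typing import Optional

# The "fitness=<num>" / "fitness=null" title format is from the PR text;
# the regex itself and every name below are illustrative assumptions.
_TITLE_RE = re.compile(r"^fitness=(null|[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)$")


def parse_fitness(title: str) -> Optional[float]:
    """Return the numeric fitness from a check-run title, or None for
    'fitness=null' and for titles that do not match the format."""
    match = _TITLE_RE.match(title.strip())
    if match is None or match.group(1) == "null":
        return None
    return float(match.group(1))


@dataclass(frozen=True)
class Outcome:
    status: str                       # hypothetical labels, except "pending-ci"
    fitness: Optional[float] = None
    comment: Optional[str] = None
    pause_reason: Optional[str] = None


def step_6_5_outcome(ci_state: str, check_run_title: str,
                     retries_exhausted: bool = False) -> Outcome:
    """Map a classified CI state to the iteration outcome described above."""
    if ci_state == "success":
        fitness = parse_fitness(check_run_title)
        if fitness is None:
            # Not covered by the PR description; surfaced rather than guessed.
            raise ValueError("green CI without a numeric fitness title")
        return Outcome("accepted", fitness=fitness, comment="✅ Accepted")
    if ci_state == "failure":
        if not retries_exhausted:
            # Defer to the existing fix-retry loop (#176).
            return Outcome("fix-retry")
        return Outcome("rejected", comment="❌ Rejected / ⚠️ Error",
                       pause_reason="ci-fix-exhausted")
    if ci_state == "pending":
        # The 60-minute wall-clock cap fired; the next iteration may
        # overwrite this comment.
        return Outcome("pending-ci", comment="reconciliation pending")
    raise ValueError(f"unknown CI state: {ci_state!r}")
```

In this sketch the sandbox metric never appears as an input: fitness is taken only from the check-run title, which mirrors the rule that a null sandbox metric is not grounds for rejection.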
Step 8 requires that the iteration comment use the terminal Step 6.5 status and that the Iteration History `fitness` come from the check-run, never from the sandbox. `⏳ Pending CI` is explicitly forbidden as a permanent terminal state.

Notes
`releaseassets.githubusercontent.com` remains a separate concern; CI runners are unrestricted, so this PR works without it.