Add strategy system to autoloop — ship AlphaEvolve as the first specialized iteration playbook

## Summary

Add a **strategy system** to our autoloop so programs can opt into specialized per-iteration playbooks that supersede the generic "analyze / propose / accept / reject" loop. Ship one concrete strategy — **AlphaEvolve** — as the first implementation: a MAP-Elites / islands / four-operators evolutionary loop suitable for problems where we want to *evolve a self-contained artifact toward a scalar fitness* (e.g., making individual tsb functions faster than pandas).

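Programs opt in by adding a `strategy/` directory to their program folder and pointing the `## Evolution Strategy` section of `program.md` at the playbook inside it. The running LLM checks for that pointer on every iteration and follows the referenced strategy file literally.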

## Motivation

Today, every autoloop program uses the same per-iteration flow (read state → propose change → evaluate → accept/reject). That works for coverage tasks (add another benchmark, port another pandas feature), but it doesn't match how you'd want to attack a **performance optimization** problem. For perf work we want:

- Memory of candidates tried, with their fitness — so the agent doesn't re-discover the same 2× speedup six times.
- Diversity pressure — so the agent doesn't converge to one local optimum and stop trying new algorithmic families.
- A "port the good idea from function A to function B" move (migration) — because once one hot function gets a typed-array gather that beats pandas, we want that technique to spread.
- A rut-breaking rule — when five iterations in a row reject, force a jump to a new approach family.

AlphaEvolve gives us all of those. Concrete first use case: a new `tsb-perf-evolve` program that evolves each slow tsb function's implementation to beat pandas, with islands = algorithmic families (column scan, iterator pipeline, gather/scatter, WASM, SoA batched, etc.).

## Design — how programs opt into a strategy

**Repo layout:**

```
.autoloop/
├── strategies/                        # NEW — reusable strategy templates
│   └── alphaevolve/
│       ├── strategy.md                # the runtime playbook (template with <CUSTOMIZE> markers)
│       ├── CUSTOMIZE.md               # creator-time guide; not copied into programs
│       └── prompts/
│           ├── mutation.md
│           └── crossover.md
└── programs/
    └── <program-name>/
        ├── program.md                 # Evolution Strategy section points to strategy/
        ├── code/                      # the target artifact + evaluator
        └── strategy/                  # NEW — program's customized strategy files
            ├── alphaevolve.md         # customized playbook
            └── prompts/
                ├── mutation.md
                └── crossover.md
```

**Opt-in mechanism (agent-mediated):** `program.md`'s `## Evolution Strategy` section contains a pointer block. On every iteration, the agent:

1. Reads `program.md` and looks for `## Evolution Strategy`.
2. If the section names a strategy file (e.g., `strategy/alphaevolve.md`), reads it and follows it literally. The strategy file supersedes the generic per-iteration steps.
3. If the section is absent, contains only prose, or names a file that does not exist, falls back to the default autoloop iteration flow.

This mechanism lives entirely in the agent's prompt. The only change it needs is the prompt addition described in the next section; the scheduler code is untouched.

## Changes to `.github/workflows/autoloop.md`

Add a **strategy discovery** step to the agent's per-iteration prompt. Insert near the top of the iteration flow (before "Step 1: Load state"):

```markdown
## Strategy discovery

Before executing the generic iteration loop below, check whether this program has opted into a specialized strategy:

1. Read `<program-dir>/program.md` and look for a `## Evolution Strategy` section.
2. If that section points to a strategy file — e.g., "This program uses the **AlphaEvolve** strategy. Read `strategy/alphaevolve.md` at the start of every iteration and follow it literally." — read the referenced file and follow it.
3. The strategy playbook **supersedes** the generic "Step 2: Analyze" through "Step 5: Accept or Reject" steps below. The other steps (state read, branch management, state file updates, CI gating) still apply.
4. If `## Evolution Strategy` is absent, contains only prose, or points to a file that does not exist, fall back to the default iteration flow below.

Strategy files live under `<program-dir>/strategy/`. Program-specific prompts (e.g., `strategy/prompts/mutation.md`) are read by the strategy playbook at the appropriate step. Do not read them pre-emptively; the playbook will tell you when.
```

No changes to `scripts/autoloop_scheduler.py` are required — strategies are an agent-prompt concern, not a scheduler concern.
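
The discovery rule in the prompt above is carried out by the agent reading prose, so no code for it ships with this change. Purely as an illustration, the decision can be sketched in Python. The function name `find_strategy_file` and the convention that the pointer is a backticked `strategy/<name>.md` path are assumptions made for this sketch, not part of the workflow.

```python
import re
from pathlib import Path
from typing import Optional

# Body of the "## Evolution Strategy" section, up to the next H2 or end of file.
SECTION_RE = re.compile(
    r"^## Evolution Strategy[ \t]*$(.*?)(?=^## |\Z)",
    re.MULTILINE | re.DOTALL,
)
# Assumption: the pointer is written as a backticked path under strategy/.
POINTER_RE = re.compile(r"`(strategy/[^`\s]+\.md)`")


def find_strategy_file(program_dir: Path) -> Optional[Path]:
    """Return the strategy playbook to follow, or None for the default flow."""
    program_md = program_dir / "program.md"
    if not program_md.is_file():
        return None
    section = SECTION_RE.search(program_md.read_text(encoding="utf-8"))
    if section is None:
        return None  # section absent: default flow
    pointer = POINTER_RE.search(section.group(1))
    if pointer is None:
        return None  # prose only: default flow
    playbook = program_dir / pointer.group(1)
    # A pointer to a missing file also falls back to the default flow.
    return playbook if playbook.is_file() else None
```

A return value of `None` corresponds to every fallback case in step 4 of the prompt; any other value is the file the agent would read and follow.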

## Content to ship — `.autoloop/strategies/alphaevolve/`

These files go into this repo verbatim. The `<CUSTOMIZE: …>` markers are intentional — they are placeholders resolved when a program adopts the strategy (see "Authoring a strategy-based program" below).

### `strategies/alphaevolve/strategy.md`

````markdown
# AlphaEvolve Strategy — <CUSTOMIZE: program-name>

This file is the **runtime playbook** for this program. The autoloop agent reads it at the start of every iteration and follows it literally. It supersedes the generic "Analyze and Propose" / "Accept or Reject" steps in the default autoloop iteration loop — all other steps (state read, branch management, state file updates) still apply.

## Problem framing

<CUSTOMIZE: 2–4 sentences describing the target artifact, the fitness function, and what makes a candidate valid. This is the orientation the agent reads first every iteration.>

## Per-iteration loop

### Step 1. Load state

1. Read `program.md` — Goal, Target, Evaluation.
2. Read the program's state file from the repo-memory folder (`{program-name}.md`). Locate the `## 🧬 Population` subsection. If it does not exist, create it using the schema in [Population schema](#population-schema).
3. Read any config the program exposes (e.g. `code/config.yaml`) for tunables like `exploitation_ratio`, `num_islands`. Do not hard-code values you can read from config — the maintainer may have tuned them.
4. Read both prompt templates in `strategy/prompts/`. These frame how you reason about mutations and crossovers for this specific problem.

### Step 2. Pick operator

Sample one operator using these weights (<CUSTOMIZE: tune defaults if the problem has known bias>):

| Operator | Default weight | When it fires |
|---|---|---|
| Exploitation | 0.50 | Refine one of the elites — the current best or a near-best. |
| Exploration | 0.30 | Generate a candidate from an **under-represented island** or a novel family. |
| Crossover | 0.15 | Combine ideas from two parents on different islands. |
| Migration | 0.05 | Take a technique that works on island A and port it into a solution on island B. |

Deterministic overrides (apply *before* sampling):

- If the population is empty or has one member → **Exploration** (seed diversity).
- If the last 3 statuses in `recent_statuses` are all `rejected` → force **Exploration** with a previously-unused island.
- If the last 5 statuses are all `rejected` → force **Migration** or a radically new island, and revisit any domain knowledge in `strategy/prompts/mutation.md` that has not yet been applied. This rule takes precedence over the 3-rejection rule above.

Record your chosen operator in the iteration's reasoning — the state file's Iteration History entry must include it.

### Step 3. Pick parent(s)

**Islands** for this program (<CUSTOMIZE: replace with real islands — 3–6, each a distinct family of approaches>):

- **Island 0 — <CUSTOMIZE: family name>**: <CUSTOMIZE: one-line description>
- **Island 1 — <CUSTOMIZE>**: <CUSTOMIZE>
- **Island 2 — <CUSTOMIZE>**: <CUSTOMIZE>
- **Island 3 — <CUSTOMIZE>**: <CUSTOMIZE>

Parent selection by operator:

- **Exploitation** — pick the best scorer; break ties by picking the most recent.
- **Exploration** — pick the island with the fewest members (or a brand-new island number if all are full), then either start from its best member or from scratch.
- **Crossover** — pick two parents on **different islands**. Bias toward one elite (top quartile) and one diverse (any island with a distinct feature-cell — see [Feature dimensions](#feature-dimensions)).
- **Migration** — pick one donor island (the source of the technique) and one recipient island (where the technique will be grafted in). The parent you actually edit is on the recipient island.

### Step 4. Apply the operator

Frame your reasoning using the matching prompt template:

- Exploitation or Exploration → `strategy/prompts/mutation.md`
- Crossover or Migration → `strategy/prompts/crossover.md`

Before writing any code, state (in your visible reasoning):

1. Chosen operator + why.
2. Parent(s) picked — their IDs, island, score, and a one-line summary of each parent's approach.
3. What specifically you're changing, and your hypothesis for *why* it should improve the fitness.
4. Validity pre-check (<CUSTOMIZE: list the cheap invariants>): walk through why the proposed candidate will satisfy each invariant.
5. Novelty check: confirm this is not a near-duplicate of an existing population member or of anything in the state file's 🚧 Foreclosed Avenues.

### Step 5. Implement

Edit only the files listed in `program.md`'s Target section. The diff style for this program is: <CUSTOMIZE: "full rewrite of the evolve block" or "minimal diff" — depends on problem>.

### Step 6. Evaluate

Run the evaluation command from `program.md`. Parse the metric.

### Step 7. Update the population

Regardless of whether the iteration is accepted or rejected at the branch level, the candidate has been tried and should be recorded in the population — the population is a memory of what's been explored, not just what's been kept.

Append a new entry to the `## 🧬 Population` subsection in the state file using the schema below. Then enforce these caps:

- **Population cap**: <CUSTOMIZE: population_size, default 40>. If exceeded, evict the *worst* member in the most-crowded feature cell (MAP-Elites style — never evict the best of any cell).
- **Elite archive**: the top <CUSTOMIZE: archive_size, default 10> by fitness are always preserved regardless of cell crowding.

### Step 8. Fold through to the default loop

Continue with the normal autoloop Step 5 (Accept or Reject → commit / discard, update state file's Machine State, Iteration History, Lessons Learned, etc.) as defined in the workflow. The only additional requirements from AlphaEvolve are:

- The Iteration History entry must include `operator`, `parent_id(s)`, `island`, and `fitness` fields (in addition to the normal status/change/metric/notes).
- Lessons Learned additions should be phrased as *transferable heuristics* about the problem space, not as reports of what this iteration did. (E.g. "Hex layouts dominate grid layouts above n=20" — not "Iteration 17 tried a hex layout.")

## Population schema

The population lives in the state file `{program-name}.md` on the `memory/autoloop` branch as a subsection. Use this exact layout so maintainers can read and edit it:

```markdown
## 🧬 Population

> 🤖 *Managed by the AlphaEvolve strategy. One entry per candidate that has been evaluated (accepted or rejected). Newest first.*

### Candidate <id>  ·  island <n>  ·  fitness <value>  ·  gen <N>

- **Operator**: exploitation / exploration / crossover / migration
- **Parent(s)**: <id>[, <id>]
- **Feature cell**: <feature_dim_1_bucket> · <feature_dim_2_bucket>
- **Approach**: <one-line summary of the idea — not the code>
- **Status**: ✅ accepted / ❌ rejected / error (evaluation failed to run; fitness is `null`)
- **Notes**: <one sentence on what was learned>

<details><summary>Code</summary>

```<lang>
…the full source of the evolved block…
```

</details>

---
```

- **id** is monotonic across the whole run (start at 1, increment every candidate, regardless of island).
- **gen** is the iteration number from the Machine State table at the time the candidate was produced.
- **fitness** is the primary metric from Evaluation, using the same sign convention as `best_metric`.
- The `<details>` block keeps the file readable — only expand when picking parents.

## Feature dimensions

MAP-Elites niching along two dimensions keeps the population diverse even when many candidates score similarly.

- **Dimension 1 — <CUSTOMIZE: e.g. `approach_family`>**: buckets are the islands above (or a finer split, if useful).
- **Dimension 2 — <CUSTOMIZE: e.g. `code_complexity`>**: bucket into `small` / `medium` / `large` using a concrete threshold (<CUSTOMIZE: e.g. under 40 / 40–100 / over 100 lines of the evolved block>).

A feature cell is the pair (dim1, dim2). Eviction rule: when the population is full, remove the worst member of the *most crowded* cell. This keeps rare cells alive even when their occupants score lower.

## Invariants the agent must not violate

- Never edit files outside `program.md`'s Target list.
- Never modify the evaluator or the config (if present) — those are part of the fitness definition. Tune hyperparameters *inside* the evolve block when relevant, not by rewriting config.
- Never mutate a population entry's recorded fitness. The population is append-only with eviction; historical scores do not change.
- If the evaluation fails to run (import error, timeout, etc.), record the candidate in the population with `fitness: null` and `status: error`, and follow the default loop's error path — do **not** silently drop it.
````
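
Two parts of the playbook above are mechanical enough to sketch in code: the operator choice in Step 2 and the eviction rule in Step 7. The Python below is an illustration under stated assumptions, not something the agent is expected to run. It assumes `recent_statuses` is ordered oldest first, that higher fitness is better, that the 5-rejection override is checked before the 3-rejection one and resolved to Migration (the playbook also allows a radically new island), and that when the most crowded cell has nothing evictable the next most crowded cell is tried. `Candidate`, `pick_operator`, and `choose_eviction` are hypothetical names.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

DEFAULT_WEIGHTS: Dict[str, float] = {
    "exploitation": 0.50,
    "exploration": 0.30,
    "crossover": 0.15,
    "migration": 0.05,
}


def pick_operator(
    population_size: int,
    recent_statuses: Sequence[str],  # assumption: oldest first, newest last
    weights: Optional[Dict[str, float]] = None,
    rng: Optional[random.Random] = None,
) -> str:
    """Apply the deterministic overrides, then sample by weight."""
    if population_size <= 1:
        return "exploration"  # seed diversity
    tail = list(recent_statuses)
    if len(tail) >= 5 and all(s == "rejected" for s in tail[-5:]):
        return "migration"  # assumption: checked before the 3-streak rule
    if len(tail) >= 3 and all(s == "rejected" for s in tail[-3:]):
        return "exploration"
    weights = weights or DEFAULT_WEIGHTS
    names = list(weights)
    rng = rng or random.Random()
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


@dataclass(frozen=True)
class Candidate:
    id: int
    fitness: Optional[float]  # None when the evaluation failed to run
    cell: Tuple[str, str]     # (dim1 bucket, dim2 bucket)


def _score(c: Candidate) -> float:
    # Assumption: higher is better; errored candidates rank last.
    return float("-inf") if c.fitness is None else c.fitness


def choose_eviction(
    population: List[Candidate], population_cap: int = 40, archive_size: int = 10
) -> Optional[Candidate]:
    """Pick the member to evict, or None if the cap is not exceeded."""
    if len(population) <= population_cap:
        return None
    ranked = sorted(population, key=_score, reverse=True)
    protected = {c.id for c in ranked[:archive_size]}  # elite archive
    cells: Dict[Tuple[str, str], List[Candidate]] = defaultdict(list)
    for c in population:
        cells[c.cell].append(c)
    # Most crowded cell first; never evict a cell's best or an archived elite.
    for members in sorted(cells.values(), key=len, reverse=True):
        best = max(members, key=_score)
        evictable = [c for c in members if c.id != best.id and c.id not in protected]
        if evictable:
            return min(evictable, key=_score)
    return None
```

With a cap of 3, an archive of 1, three candidates at fitness 1.0, 2.0, and 0.5 in one cell, and a fourth at 0.1 alone in another cell, `choose_eviction` selects the 0.5 candidate. The 0.1 candidate survives because it is the best of its cell, which is the rare-cell protection described under Feature dimensions.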

### `strategies/alphaevolve/prompts/mutation.md`

````markdown
# Mutation prompt — <CUSTOMIZE: program-name>

The running LLM uses this file as framing when applying an **exploitation** or **exploration** operator. Read it before proposing a mutation.

---

You are mutating a candidate solution for this program. The goal is to produce a *new* candidate that, in expectation, improves the fitness — not to mechanically perturb the parent.

## Domain knowledge

<CUSTOMIZE: paste the high-leverage facts about this problem space here. 5–15 bullets. Examples of what belongs here:

- Known analytical bounds or benchmarks ("optimal sum of radii for n=26 is ~2.635")
- Structural regularities the search should exploit ("hex packings dominate square packings in the interior")
- Common failure modes to avoid ("concentric rings are a local optimum — escape requires breaking radial symmetry")
- Useful libraries or primitives ("scipy.optimize.minimize with SLSQP handles the non-convex constraint set well")

These should be facts an expert would put on a whiteboard, not boilerplate.>

## How to reason about this mutation

1. **Read the parent candidate's code** (from the state file's Population subsection).
2. Identify **which specific mechanism** in the parent is limiting its fitness. Name it concretely — "the radial symmetry forces outer-ring circles to shrink to fit corners" is useful; "the algorithm isn't good enough" is not.
3. Propose a **single, targeted change** to that mechanism. The change should be large enough to plausibly move the fitness, but small enough that you can explain why it works before running the evaluation.
4. If the chosen operator is **exploration**, your change should move the candidate into a different island — it should be recognizably a different *family* of approach, not a tuning tweak.
5. If the chosen operator is **exploitation**, your change should preserve the parent's core idea and improve the execution.

## Diff style

<CUSTOMIZE: pick one>

- **Full rewrite of the evolve block** — preferred for problems where small perturbations rarely help (geometry, architecture search, prompt programs).
- **Minimal diff** — preferred for problems where the parent is mostly correct and only small adjustments are under test.

## What the reasoning output must contain

Before writing any code:

- Parent ID, island, fitness, one-line summary.
- The specific mechanism you are changing.
- The hypothesis: *why* this change should improve the fitness. If you can't state the hypothesis in one sentence, the change is not targeted enough.
- The invariants you are about to check (validity pre-check from `strategy/alphaevolve.md`).

After writing the code:

- A "Mutation summary" line: 10–20 words, suitable for the Iteration History and Population entry.
````

### `strategies/alphaevolve/prompts/crossover.md`

````markdown
# Crossover prompt — <CUSTOMIZE: program-name>

The running LLM uses this file as framing when applying a **crossover** or **migration** operator. Read it before combining parents.

---

You are combining two parent candidates into a new one. The goal is a *synthesis* — a candidate that takes the strongest mechanism from each parent and fuses them coherently, not a syntactic merge or an average.

## Domain knowledge

<CUSTOMIZE: same set of domain facts as in mutation.md — duplicated here so the crossover prompt stands alone. Keep them in sync when one is updated.>

## How to reason about this crossover

1. **Read both parents' code** (from the state file's Population subsection).
2. For each parent, identify **the single strongest mechanism** it contributes — the part that earns its fitness. Name it concretely.
3. Identify the **incompatibility** between the two mechanisms. Almost always there is one: two optimizers with different state representations, two layouts with different coordinate conventions, two architectures with different tensor shapes. Name it.
4. Propose a **bridging design** — the smallest change that lets both mechanisms coexist. This is the creative step; it is the thing crossover does that mutation cannot.
5. If the operator is **migration** specifically: the donor parent contributes one technique (e.g. SLSQP refinement, cosine LR schedule) that gets grafted into the recipient parent's architecture. The result should read as "recipient, with donor's X applied." Do not wholesale replace the recipient.

## Diff style

Crossover almost always implies a **full rewrite** of the evolve block — the bridging design rarely fits as a local diff. Use full rewrite unless the parents are extremely similar.

## What the reasoning output must contain

Before writing any code:

- Parent A: ID, island, fitness, strongest mechanism (one line).
- Parent B: ID, island, fitness, strongest mechanism (one line).
- Incompatibility between A and B (one line).
- Bridging design (2–3 sentences).
- The hypothesis: *why* the synthesis should out-score both parents.
- Validity pre-check walk-through.

After writing the code:

- A "Crossover summary" line naming both parents and the grafted mechanism.
````

### `strategies/alphaevolve/CUSTOMIZE.md`

````markdown
# AlphaEvolve — Customization Guide (read by the program-creator agent)

This file is **not** copied into programs. It tells the creator agent how to turn the generic AlphaEvolve template into a problem-specific strategy in a new program directory.

## When to pick this strategy

Pick AlphaEvolve when **all** of the following hold:

- The Goal is to discover or improve a **self-contained artifact** (an algorithm, a constructor, a layout, a circuit, a prompt, a heuristic).
- The Evaluation produces a **scalar fitness** that can be compared across candidates.
- Validity is usually cheap to check; progress comes from exploring a large design space.
- **Full rewrites** of the target are acceptable and often preferable — the search is over *ideas*, not local edits.

Do **not** pick AlphaEvolve for: growing test coverage, refactoring code for clarity, reducing bundle size, fixing bugs, or any task where "better" is defined by a codebase property rather than a runnable artifact. Those fit a plain prose Evolution Strategy.

## What to copy

Copy these files from this template into the new program directory at `.autoloop/programs/<program-name>/strategy/`:

- `strategy.md` → `strategy/alphaevolve.md` (rename — the program-specific copy is named after the strategy so multiple strategies could co-exist)
- `prompts/mutation.md` → `strategy/prompts/mutation.md`
- `prompts/crossover.md` → `strategy/prompts/crossover.md`

There is no `population/` directory on disk: population state lives in the program's state file on the `memory/autoloop` branch under a `## 🧬 Population` subsection. The playbook explains the schema.

## What to customize before opening the PR

Every `<CUSTOMIZE: …>` marker in the copied files must be resolved. Do not leave any marker unresolved — the running LLM treats these files as literal instructions.

In `strategy/alphaevolve.md`:

- **Problem framing** — replace the marker with 2–4 sentences specific to this program (what the target artifact is, what the fitness is, what makes candidates valid).
- **Island definitions** — define 3–6 islands, each a distinct family of approaches for *this* problem.
- **Feature dimensions** — pick 2 feature dimensions the MAP-Elites niching uses. For an algorithm search: often `approach_family` × `code_complexity`. For a model/architecture search: often `architecture_family` × `parameter_count_bucket`.
- **Validity pre-check** — list the cheap invariants the agent should mentally check *before* running evaluation.
- **Operator weights** — set the exploitation / exploration / crossover / migration probabilities. Defaults (0.5 / 0.3 / 0.15 / 0.05) are a starting point; tune if the problem has a known bias.

In `strategy/prompts/mutation.md` and `strategy/prompts/crossover.md`:

- **Domain knowledge block** — paste the domain insights that belong at the top of every mutation/crossover prompt.
- **Diff style** — specify whether mutations should produce diffs or full rewrites.

## What goes in program.md

Replace the program's `## Evolution Strategy` section with a short pointer block that the running LLM can act on. The pointer block must:

1. Name the strategy (`AlphaEvolve`).
2. Tell the running LLM to **read `strategy/alphaevolve.md` at the start of every iteration and follow it literally**.
3. List the support files and what each is for.
4. Not duplicate content from the playbook.

Example:

```markdown
## Evolution Strategy

This program uses the **AlphaEvolve** strategy. On every iteration, read `strategy/alphaevolve.md` and follow it literally — it supersedes the generic analyze/accept/reject steps in the default autoloop loop.

Support files:
- `strategy/alphaevolve.md` — the runtime playbook (operators, parent selection, population rules).
- `strategy/prompts/mutation.md` — framing for exploitation and exploration operators.
- `strategy/prompts/crossover.md` — framing for crossover and migration operators.

Population state lives in the state file on the `memory/autoloop` branch under the `## 🧬 Population` subsection (see the playbook for the schema).
```

## What NOT to put in the program directory

- Do not commit the population into a file in the program dir. It lives in the state file on `memory/autoloop` — that way it co-evolves with the program's other state.
- Do not copy `CUSTOMIZE.md`. It is a tool for creation, not runtime.
- Do not create a filesystem `population/` directory.
````

## Authoring a strategy-based program

When a maintainer creates a new program that should use AlphaEvolve, they:

1. Create `.autoloop/programs/<program-name>/` with the usual `program.md` + `code/` layout.
2. Copy `.autoloop/strategies/alphaevolve/strategy.md`, `prompts/mutation.md`, `prompts/crossover.md` into `.autoloop/programs/<program-name>/strategy/` (renaming `strategy.md` → `alphaevolve.md`).
3. Resolve every `<CUSTOMIZE: …>` marker following `CUSTOMIZE.md`'s guidance.
4. Replace `program.md`'s `## Evolution Strategy` section with the pointer block from `CUSTOMIZE.md`.
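
Steps 2 and 3 lend themselves to a small helper: copy the three template files (with the one rename), then list any markers still unresolved. The Python below is a sketch under assumptions, not an existing script in this repo. `copy_template` and `unresolved_markers` are hypothetical names, and a marker is assumed to be any occurrence of `<CUSTOMIZE` in a copied `.md` file.

```python
import re
import shutil
from pathlib import Path
from typing import List, Tuple

# Template path -> path under <program-dir>/strategy/.
# CUSTOMIZE.md is deliberately absent: it is a creation-time guide only.
COPY_MAP = {
    "strategy.md": "alphaevolve.md",
    "prompts/mutation.md": "prompts/mutation.md",
    "prompts/crossover.md": "prompts/crossover.md",
}
MARKER_RE = re.compile(r"<CUSTOMIZE\b")


def copy_template(template_dir: Path, program_dir: Path) -> Path:
    """Copy the playbook and prompts into the program's strategy/ directory."""
    strategy_dir = program_dir / "strategy"
    for src_rel, dst_rel in COPY_MAP.items():
        dst = strategy_dir / dst_rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(template_dir / src_rel, dst)
    return strategy_dir


def unresolved_markers(strategy_dir: Path) -> List[Tuple[str, int]]:
    """Return (relative path, line number) for every remaining marker."""
    hits: List[Tuple[str, int]] = []
    for path in sorted(strategy_dir.rglob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if MARKER_RE.search(line):
                hits.append((path.relative_to(strategy_dir).as_posix(), lineno))
    return hits
```

An empty list from `unresolved_markers` is the condition step 3 asks for. Immediately after `copy_template` the list is expected to be non-empty, since the templates ship with markers in place.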

If a "create program" prompt already exists, update it to offer a strategy choice: for now just "AlphaEvolve" or "plain prose", extensible to more strategies later. If no such prompt exists, file a follow-up issue.

## Acceptance

- `.autoloop/strategies/alphaevolve/` exists with `strategy.md`, `CUSTOMIZE.md`, `prompts/mutation.md`, `prompts/crossover.md`.
- `.github/workflows/autoloop.md` has a "Strategy discovery" prompt section instructing the agent to check for and read strategy files.
- A proof-of-concept AlphaEvolve program is included (e.g., `tsb-perf-evolve` targeting one slow tsb function with islands = algorithmic families). It does not need to have run successfully; its presence with well-formed customization is enough to validate the path.
- The autoloop iteration on a program *without* a strategy file behaves identically to today (no regression for the coverage-style programs).

## Out of scope

- A UI / CLI for picking strategies. The creation flow is prose-in-program.md for now.
- Additional strategies beyond AlphaEvolve. Leave `test-driven` and other names as future work; one concrete strategy is enough to validate the system.
- Scheduler-level strategy awareness. Strategies are agent-prompt concerns; the scheduler picks *which program* runs, the agent picks *how* it runs within that program.

