Autoloop pre-step can't read state files — build-tsb starved since 2026-04-12 #162

@mrjf

Symptom

Issue #1 (the build-tsb-pandas-typescript-migration autoloop program) has received no new comments since 2026-04-12, over 8 days ago. The state file on the memory/autoloop branch records Last Run: 2026-04-12T11:15:07Z, Iteration Count: 230, Paused: false, Completed: false. The program is healthy by every recorded criterion, but it is never selected.

Meanwhile perf-comparison runs every 30 minutes, far more often than its own every-6h schedule.

Root cause

The autoloop pre-step ("Check which programs are due") reads state files from /tmp/gh-aw/repo-memory/autoloop/, but the memory-clone step runs after the pre-step and clones to /tmp/gh-aw/repo-memory/default/. Wrong directory and wrong order — the pre-step never sees state for any program.

Confirmed in the agent-job log for run 24642957273:

Found issue-based program: 'build-tsb-pandas-typescript-migration' (issue #1)
perf-comparison: no state file found (first run)
perf-comparison: no state found (first run)
build-tsb-pandas-typescript-migration: no state file found (first run)
build-tsb-pandas-typescript-migration: no state found (first run)
=== Autoloop Program Check ===
Selected program:      perf-comparison (.autoloop/programs/perf-comparison/program.md)
Deferred (next run):   ['build-tsb-pandas-typescript-migration']
Programs skipped:      (none)

Relevant code in .github/workflows/autoloop.md:

  • L107: repo_memory_dir = "/tmp/gh-aw/repo-memory/autoloop" — where the Python pre-step looks for state files.
  • The memory-clone step (downstream of the pre-step) sets MEMORY_DIR=/tmp/gh-aw/repo-memory/default and clones memory/autoloop there after the Python script has already run.
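
The mismatch can be reproduced in isolation. The sketch below is not the workflow's actual code: the load_state helper, the <program>.md file name, and the "Key: value" file layout are assumptions made for illustration. Only the two directory names (autoloop and default) come from the workflow.

```python
import tempfile
from pathlib import Path


def load_state(memory_dir: Path, program: str):
    """Return the program's state as a dict, or None when no state file exists.

    Hypothetical helper: assumes one '<program>.md' file of 'Key: value' lines.
    """
    state_file = memory_dir / f"{program}.md"
    if not state_file.is_file():
        # The pre-step logs this case as "no state file found (first run)".
        return None
    state = {}
    for line in state_file.read_text().splitlines():
        key, sep, value = line.partition(":")
        if sep:
            state[key.strip()] = value.strip()
    return state


PROGRAM = "build-tsb-pandas-typescript-migration"

with tempfile.TemporaryDirectory() as root:
    clone_dir = Path(root) / "repo-memory" / "default"    # where the clone lands
    lookup_dir = Path(root) / "repo-memory" / "autoloop"  # where the pre-step looks
    clone_dir.mkdir(parents=True)
    (clone_dir / f"{PROGRAM}.md").write_text(
        "Last Run: 2026-04-12T11:15:07Z\nIteration Count: 230\n"
    )
    seen_by_prestep = load_state(lookup_dir, PROGRAM)
    seen_in_clone = load_state(clone_dir, PROGRAM)

print(seen_by_prestep)                   # None
print(seen_in_clone["Iteration Count"])  # 230
```

The state exists the whole time; the lookup simply points at a directory that nothing has populated.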

Why build-tsb gets starved

Without state, every program looks like "first run" (no last_run, nothing to order by). The selection tiebreaker picks programs in program_files order, which appends file-based programs first (perf-comparison) and issue-based programs last (build-tsb-pandas-typescript-migration). So every run:

  1. Both programs discovered.
  2. Both "no state found → treat as first run".
  3. perf-comparison wins the tiebreaker.
  4. build-tsb-pandas-typescript-migration is deferred to "next run".
  5. Next run: go to step 1.

Nothing breaks the cycle. build-tsb has been in "deferred" purgatory for 8 days.
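
The cycle above can be simulated in a few lines. This is a sketch under assumptions, not the scheduler's real selection code: select_program and the shape of the states dict are hypothetical, and the 2026-04-20 timestamp is made up for the comparison case. It relies only on the behaviour described above, that programs without state tie and the tie resolves in program_files order.

```python
def select_program(program_files, states):
    """Pick the program with the oldest last_run; never-run programs sort first.

    Illustrative only. Ties keep program_files order because sorted() is stable.
    """
    def key(name):
        last_run = (states.get(name) or {}).get("last_run")
        return (last_run is not None, last_run or "")

    ordered = sorted(program_files, key=key)
    return ordered[0], ordered[1:]


# File-based programs are appended first, issue-based programs last.
program_files = ["perf-comparison", "build-tsb-pandas-typescript-migration"]

history = []
for _ in range(5):
    states = {}  # the pre-step reads the wrong directory, so it never sees state
    selected, deferred = select_program(program_files, states)
    history.append(selected)

# With state visible, the program that ran longest ago is chosen instead.
visible_states = {
    "perf-comparison": {"last_run": "2026-04-20T10:00:00Z"},  # made-up timestamp
    "build-tsb-pandas-typescript-migration": {"last_run": "2026-04-12T11:15:07Z"},
}
fixed_selection, _ = select_program(program_files, visible_states)

print(history)
print(fixed_selection)
```

Every iteration of the loop selects perf-comparison, while the same function picks build-tsb-pandas-typescript-migration as soon as both last_run values are visible.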

Fix

Clone the memory/autoloop branch into /tmp/gh-aw/repo-memory/autoloop/ before the "Check which programs are due" step runs. Options:

Option A (minimal): Add a shell step at the top of the steps: list that does the clone:

steps:
  - name: Clone repo-memory for scheduler
    env:
      GITHUB_TOKEN: ${{ github.token }}
      GITHUB_REPOSITORY: ${{ github.repository }}
    run: |
      mkdir -p /tmp/gh-aw/repo-memory
      git clone --depth=1 --branch memory/autoloop \
        "https://x-access-token:${GITHUB_TOKEN}@github.com/${GITHUB_REPOSITORY}.git" \
        /tmp/gh-aw/repo-memory/autoloop \
        || mkdir -p /tmp/gh-aw/repo-memory/autoloop  # branch may not exist on first run

  - name: Check which programs are due
    # (existing step)

Option B: Fetch the memory/autoloop state inline within the existing Python script using the GitHub Contents API (no separate shell step; single source of truth for where state lives). A heavier rewrite, but fewer moving parts.
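
A rough sketch of what Option B could look like, using only the standard library. The function names and the state-file path passed in by the caller are hypothetical (the layout of the memory/autoloop branch is not specified here). The endpoint shape and the base64-encoded content field follow the GitHub REST "get repository content" API.

```python
import base64
import json
import urllib.parse
import urllib.request


def contents_url(repository: str, path: str, ref: str) -> str:
    """Build the Contents API URL for a file at `path` on branch `ref`."""
    quoted_path = urllib.parse.quote(path, safe="/")
    quoted_ref = urllib.parse.quote(ref, safe="")
    return (
        f"https://api.github.com/repos/{repository}/contents/{quoted_path}"
        f"?ref={quoted_ref}"
    )


def decode_contents(payload: dict) -> str:
    """Decode the file body from a Contents API JSON response."""
    if payload.get("encoding") != "base64":
        raise ValueError(f"unexpected encoding: {payload.get('encoding')!r}")
    # The API wraps the base64 text across lines; b64decode ignores the newlines.
    return base64.b64decode(payload["content"]).decode("utf-8")


def fetch_state(repository: str, path: str, token: str,
                ref: str = "memory/autoloop") -> str:
    """Fetch one state file. Raises urllib.error.HTTPError (404) if it is missing."""
    request = urllib.request.Request(
        contents_url(repository, path, ref),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return decode_contents(json.load(response))
```

A caller would treat a 404 as "no state yet" for a program that has genuinely never run, and let any other error fail the step rather than silently falling back to "first run".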

Option C: Reorder so gh-aw's built-in repo-memory clone runs before the pre-step, and change the pre-step to read from wherever the built-in clone lands. Requires coordination with gh-aw plumbing; brittle.

Prefer Option A — it's additive, doesn't touch gh-aw internals, and it's obvious from reading the workflow why the clone is there.

Secondary fix — deterministic tie-breaking

Even after the state is read, when a program genuinely has never run, the tiebreaker should avoid permanent starvation. Prefer:

  • Among programs with no last_run, pick the one whose schedule is shortest (so "every 30m" beats "every 6h"), then fall back to alphabetical by name.
  • This way build-tsb (every 30m) would beat perf-comparison (every 6h) on the first run after the fix, then state catches up and ordinary last_run ordering takes over.
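
One way to express this tiebreaker is as a sort key. The sketch below assumes schedules are written as "every <N><unit>" with units m, h, or d, and that each program is a dict with name, schedule, and last_run keys. Both are assumptions for illustration, not the workflow's actual data model.

```python
import re

_UNIT_MINUTES = {"m": 1, "h": 60, "d": 1440}


def schedule_minutes(schedule: str) -> int:
    """Convert a schedule such as 'every 30m' or 'every 6h' to minutes."""
    match = re.fullmatch(r"every (\d+)([mhd])", schedule.strip())
    if match is None:
        raise ValueError(f"unrecognised schedule: {schedule!r}")
    return int(match.group(1)) * _UNIT_MINUTES[match.group(2)]


def pick_never_run(programs):
    """Among programs with no last_run, prefer the shortest schedule, then name.

    Returns the chosen program name, or None if every program has run before.
    """
    never_run = [p for p in programs if p["last_run"] is None]
    if not never_run:
        return None
    best = min(never_run, key=lambda p: (schedule_minutes(p["schedule"]), p["name"]))
    return best["name"]


programs = [
    {"name": "perf-comparison", "schedule": "every 6h", "last_run": None},
    {"name": "build-tsb-pandas-typescript-migration", "schedule": "every 30m",
     "last_run": None},
]
print(pick_never_run(programs))  # build-tsb-pandas-typescript-migration
```

Because the key no longer depends on discovery order, appending issue-based programs last cannot decide the outcome.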

Acceptance

  • After merge, the next autoloop run logs build-tsb-pandas-typescript-migration: last_run=2026-04-12T11:15:07Z, iteration_count=230 (state successfully read).
  • Issue #1 ("Build tsb: pandas → TypeScript migration") receives a new Autoloop comment within one scheduled window.
  • perf-comparison and build-tsb alternate naturally based on last_run, with neither starved.
