Build an end-to-end integration test that exercises the install.md Quick Start flow against a real GitHub repository, verifies everything is wired up correctly, and cleans up after itself. The test uses Copilot CLI as the agent that reads and follows the install instructions, so regressions in install.md or in the compile/init flow show up as test failures rather than silent rot.
Two modes:
Local mode — a shell script a maintainer runs on their workstation. Primary use.
Actions mode — a workflow_dispatch trigger in this repo that runs the same flow on a GitHub-hosted runner and reports pass/fail. Not part of CI — manually invoked only.
Target repo
Runs against mrjf/autoloop-test, a private, long-lived repo that is already provisioned with a base state on main. The base state contains just enough content to give autoloop a plausible program target:
README.md — explains the repo's purpose as the install-integration target.
src/minimize.py — a naive minimizer for a well-known optimization function (Rastrigin); the thing autoloop iterates on.
src/evaluate.py — runs the minimizer, emits {"metric": <float>} on stdout.
tests/test_minimize.py — correctness pin for the minimizer.
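For reference, the Rastrigin function named above is usually written as follows for an n-dimensional input. The constant A = 10 is the conventional choice and is an assumption here, since the fixture's exact parameters are not specified.

```latex
f(\mathbf{x}) = A\,n + \sum_{i=1}^{n} \left( x_i^2 - A \cos(2\pi x_i) \right), \qquad A = 10
```

Its global minimum is f(0) = 0, which is why a lower metric from src/evaluate.py indicates a better minimizer.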
Every test run starts by hard-resetting mrjf/autoloop-test to origin/main (the base state), then runs the install, verifies, then hard-resets again so the next run starts from a known-clean state. No repo creation/deletion per run — the repo is long-lived; the state is transient.
What the test verifies
Phase 1 — install (required)
The agent follows install.md and produces:
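gh aw extension installed on the runner.
gh aw init completed (.gitattributes, dispatcher, copilot-setup-steps).
Workflow sources present: .github/workflows/autoloop.md, .github/workflows/sync-branches.md (or, once #52 lands, only autoloop.md), and .github/workflows/shared/.
Compiled lock files present: .github/workflows/autoloop.lock.yml (and sync-branches.lock.yml if still present).
Issue template present: .github/ISSUE_TEMPLATE/autoloop-program.md.
Programs directory present: .autoloop/programs/.
An install PR opened against main.
Lock file is idempotent: re-running gh aw compile autoloop does not change the file (sha256 before == sha256 after).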
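The idempotency assertion could be sketched roughly as below. The function names sha256_of and assert_idempotent are hypothetical helpers, not part of any existing script; the sketch hashes the lock file, runs whatever compile command it is given, and compares hashes.

```shell
# Sketch: check that re-running a command leaves a lock file byte-identical.
# sha256_of and assert_idempotent are illustrative names (assumptions).

# Print the SHA-256 of a file, using whichever tool is available.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1
  else
    shasum -a 256 "$1" | cut -d' ' -f1
  fi
}

# Usage: assert_idempotent <lock-file> <command> [args...]
# Returns 0 only if the file exists and its hash is unchanged by the command.
assert_idempotent() {
  local lock_file=$1
  shift
  local before after
  [ -f "$lock_file" ] || { echo "FAIL: $lock_file does not exist" >&2; return 1; }
  before=$(sha256_of "$lock_file")
  "$@" >/dev/null 2>&1 || { echo "FAIL: command failed: $*" >&2; return 1; }
  after=$(sha256_of "$lock_file")
  if [ "$before" != "$after" ]; then
    echo "FAIL: $lock_file changed after: $*" >&2
    return 1
  fi
  echo "OK: $lock_file is idempotent"
}
```

In the driver this would be called as `assert_idempotent .github/workflows/autoloop.lock.yml gh aw compile autoloop` from the root of the cloned target repo.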
Phase 2 — program creation (required)
After the install PR is merged:
The agent creates a program: either a file-based program at .autoloop/programs/<name>/program.md, or an issue-based program opened from the issue template with the autoloop-program label.
The program targets src/minimize.py with src/evaluate.py as the evaluation script.
The program is discovered by the scheduler on a manually-triggered workflow run (gh workflow run autoloop.lock.yml -f program=<name>).
The first iteration runs to completion (accepted OR rejected OR errored — all three are valid "it ran" outcomes).
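Triggering the workflow and waiting for it could look something like the sketch below. The function name run_first_iteration and the SETTLE_SECONDS variable are hypothetical, and picking "the newest run" assumes nothing else triggers the workflow at the same time. How the accepted/rejected/errored outcome maps onto the workflow's conclusion is not specified here, so the sketch only prints the conclusion and leaves interpretation to the caller.

```shell
# Sketch: trigger the autoloop workflow, wait for the newest run, print its conclusion.
# run_first_iteration and SETTLE_SECONDS are illustrative names (assumptions).

run_first_iteration() {
  local repo=$1 program=$2 run_id conclusion
  gh workflow run autoloop.lock.yml -R "$repo" -f program="$program" || return 1
  # Give GitHub a moment to register the run before listing it.
  sleep "${SETTLE_SECONDS:-10}"
  run_id=$(gh run list -R "$repo" --workflow autoloop.lock.yml --limit 1 \
    --json databaseId --jq '.[0].databaseId') || return 1
  [ -n "$run_id" ] || { echo "FAIL: no workflow run found" >&2; return 1; }
  # gh run watch blocks until the run reaches a terminal state.
  gh run watch "$run_id" -R "$repo" >/dev/null || true
  conclusion=$(gh run view "$run_id" -R "$repo" --json conclusion --jq '.conclusion')
  [ -n "$conclusion" ] || { echo "FAIL: run $run_id did not complete" >&2; return 1; }
  printf '%s\n' "$conclusion"
}
```

A caller would capture the output, for example `conclusion=$(run_first_iteration "$INSTALL_TEST_REPO" "$program_name")`, and then check the state file on memory/autoloop to decide whether the iteration really ran.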
Phase 3 — teardown (required)
Hard-reset main on the test repo to a saved base-state SHA.
Force-push.
Delete any autoloop/* branches created during the test.
Close any issues and PRs created during the test.
Delete the memory/autoloop branch if created.
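The teardown steps above might be sketched as follows, using the GitHub REST API through gh so that no local clone is needed. The function name teardown_repo is hypothetical, and the sketch assumes that every open issue and PR on the target repo is test debris, which holds only if the base state has none. It also assumes main is not protected against force updates.

```shell
# Sketch of teardown against the long-lived target repo.
# teardown_repo is an illustrative name; "all open issues/PRs are debris" is an assumption.

teardown_repo() {
  local repo=$1 base_sha=$2 branch number
  # Force-move main back to the saved base-state SHA via the Git refs API.
  gh api -X PATCH "repos/$repo/git/refs/heads/main" \
    -f sha="$base_sha" -F force=true >/dev/null || return 1
  # Delete autoloop/* branches and the memory/autoloop branch, if present.
  gh api "repos/$repo/branches" --paginate --jq '.[].name' |
    while IFS= read -r branch; do
      case "$branch" in
        autoloop/*|memory/autoloop)
          gh api -X DELETE "repos/$repo/git/refs/heads/$branch" </dev/null >/dev/null ;;
      esac
    done
  # Close whatever PRs and issues are still open.
  gh pr list -R "$repo" --state open --json number --jq '.[].number' |
    while IFS= read -r number; do
      gh pr close "$number" -R "$repo" </dev/null >/dev/null
    done
  gh issue list -R "$repo" --state open --json number --jq '.[].number' |
    while IFS= read -r number; do
      gh issue close "$number" -R "$repo" </dev/null >/dev/null
    done
}
```

Called as `teardown_repo "$INSTALL_TEST_REPO" "$BASE_SHA"`, where BASE_SHA is the value captured at startup.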
Local mode
Script lives at tests/install-integration/run.sh. Usage:
# from the autoloop repo root:
./tests/install-integration/run.sh
Requirements:
gh authenticated as a user with write access to mrjf/autoloop-test.
copilot CLI on PATH.
python3 available.
INSTALL_TEST_REPO env var (default mrjf/autoloop-test) in case someone wants to point at a different target.
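A pre-flight check over these requirements could be as small as the sketch below. The function name preflight is hypothetical; the check relies on `gh auth status` exiting non-zero when no account is logged in, and it does not verify write access to the target repo.

```shell
# Sketch of the pre-flight check: fail fast if a required tool is missing.
# preflight is an illustrative name (assumption).

preflight() {
  local tool missing=0
  for tool in gh copilot python3; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "FAIL: $tool not found on PATH" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] || return 1
  gh auth status >/dev/null 2>&1 || { echo "FAIL: gh is not authenticated" >&2; return 1; }
  # Default target repo, overridable through the environment.
  : "${INSTALL_TEST_REPO:=mrjf/autoloop-test}"
  echo "OK: pre-flight passed for $INSTALL_TEST_REPO"
}
```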
Behavior:
1. Pre-flight: verify gh auth, copilot on PATH, python3 on PATH.
2. Capture the current base-state SHA of the target repo's main branch.
3. Reset target repo to base state (no-op on first run; discards prior test debris on subsequent runs).
4. Clone target locally to a temp dir.
5. Feed install.md to copilot CLI with a tight prompt ("follow these steps exactly, do not improvise, report the install PR URL").
6. Phase 1 verification (file presence, lock idempotency, PR exists).
7. Merge the install PR via `gh pr merge --auto --squash`, wait for merge to land.
8. Create a program via the issue template (or file-based — prefer file-based for determinism).
9. Trigger the autoloop workflow: `gh workflow run autoloop.lock.yml -f program=<name>`. Poll until completion.
10. Phase 2 verification (program discovered, iteration ran to completion, state file written to memory/autoloop).
11. Teardown: reset to base state, close test issues/PRs, delete autoloop branches.
12. Report PASS/FAIL with a summary.
Exit non-zero on any failed assertion. trap 'teardown' EXIT ensures cleanup even on abort.
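One way to wire the trap so that it also honors a keep-on-failure request is sketched below. The names teardown and arm_teardown are illustrative, the teardown body is a placeholder, and KEEP_STATE_ON_FAILURE is the same variable name the Actions workflow passes to the script.

```shell
# Sketch: run teardown on every exit path, unless the run failed and the
# maintainer asked to keep the failure state for inspection.
# The echo in the else branch stands in for the real cleanup (assumption).

teardown() {
  local status=$1
  if [ "$status" -ne 0 ] && [ "${KEEP_STATE_ON_FAILURE:-false}" = "true" ]; then
    echo "run failed (exit $status); leaving target repo as-is" >&2
  else
    echo "tearing down"
  fi
  # Preserve the original exit status of the script.
  exit "$status"
}

# Call once, early in the driver, before anything touches the target repo.
arm_teardown() {
  trap 'teardown "$?"' EXIT
}
```

Passing `$?` into the handler explicitly avoids relying on how a given shell exposes the exit status inside a trap.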
Actions mode
A workflow file at .github/workflows/install-integration-test.yml with workflow_dispatch:
name: Install Integration Test
on:
  workflow_dispatch:
    inputs:
      keep_state_on_failure:
        description: "Leave test repo in failure state for inspection"
        type: boolean
        default: false
jobs:
  install-integration:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4
      - name: Install gh aw extension
        run: gh extension install github/gh-aw
        env:
          GH_TOKEN: ${{ secrets.INSTALL_TEST_TOKEN }}
      - name: Install Copilot CLI
        run: |
          # whatever the current install path is for copilot CLI
      - name: Run integration test
        run: ./tests/install-integration/run.sh
        env:
          GH_TOKEN: ${{ secrets.INSTALL_TEST_TOKEN }}
          INSTALL_TEST_REPO: mrjf/autoloop-test
          KEEP_STATE_ON_FAILURE: ${{ inputs.keep_state_on_failure }}
Requires a repo secret INSTALL_TEST_TOKEN: a PAT with repo scope that can write to mrjf/autoloop-test. This token is how Actions mode authenticates to the target repo, because the default GITHUB_TOKEN is scoped to the repository the workflow runs in and cannot access other repos.
Actions mode is not on any schedule and not on PRs. It exists so maintainers can click "Run workflow" from the web UI and get a full end-to-end pass/fail in ~15–25 minutes without having to set up copilot locally.
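For maintainers who prefer the terminal, the same dispatch can be sketched with gh from a checkout of this repo. The wrapper name dispatch_install_test is hypothetical; the workflow file name and input name are the ones proposed in this section.

```shell
# Sketch: dispatch the integration workflow from the CLI instead of the web UI.
# dispatch_install_test is an illustrative name (assumption).

dispatch_install_test() {
  local keep=${1:-false}
  gh workflow run install-integration-test.yml -f keep_state_on_failure="$keep"
}
```

Running `dispatch_install_test true` would request that the test repo be left in its failure state if the run fails.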
Files to ship in this repo
tests/install-integration/
├── run.sh # the driver (local + Actions)
├── prompt.md # the Copilot CLI prompt (external for easy editing)
├── verify-phase1.sh # file-presence + idempotency assertions
├── verify-phase2.sh # program discovery + first iteration assertions
└── teardown.sh # reset-to-base + close issues + delete branches
.github/workflows/
└── install-integration-test.yml # workflow_dispatch wrapper around run.sh
Keep the script small and readable; the value is in the flow, not the test framework.
Copilot CLI prompt sketch
Separate file (tests/install-integration/prompt.md) so it's editable without touching the driver. Something like:
You are installing autoloop into a freshly-reset GitHub repository.
Your working directory is the root of that repository, cloned locally. The
repository is empty except for the base fixtures in `src/` and `tests/`.
Follow the install instructions at the URL below, EXACTLY AS WRITTEN. Execute
each step using shell commands. Do not skip steps. Do not improvise. Do not
optimize or "improve" the instructions.
When you finish: print a single line `INSTALL_PR=<url>` with the URL of the
PR you opened in step 5. Then stop.
Install instructions: https://github.com/githubnext/autoloop/blob/main/install.md
The driver captures stdout and greps for INSTALL_PR= to get the PR URL it needs for Phase 2.
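The extraction step might look like the sketch below. The function name extract_install_pr is hypothetical. The sketch accepts only lines that carry a GitHub pull-request URL, so an echoed copy of the prompt's literal `INSTALL_PR=<url>` placeholder is ignored, and it takes the last match in case the line appears more than once.

```shell
# Sketch: pull the install PR URL out of captured agent output on stdin.
# extract_install_pr is an illustrative name (assumption).

extract_install_pr() {
  local url
  url=$(grep -E '^INSTALL_PR=https://github\.com/[^/]+/[^/]+/pull/[0-9]+[[:space:]]*$' |
    tail -n 1 | sed -e 's/^INSTALL_PR=//' -e 's/[[:space:]]*$//')
  [ -n "$url" ] || { echo "FAIL: no INSTALL_PR= line in agent output" >&2; return 1; }
  printf '%s\n' "$url"
}
```

The driver would feed it the saved agent output, for example `extract_install_pr < "$agent_log"`, where agent_log is whatever file the driver wrote stdout to.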
Base state on mrjf/autoloop-test
Initial content pushed to main (by a maintainer, one-time setup — not part of the test):
README.md — documents the repo's purpose and warns that main is force-pushed by the integration test.
src/minimize.py — naive Rastrigin minimizer; the optimization target.
src/evaluate.py — runs minimize.py, emits {"metric": <value>} (lower is better).
tests/test_minimize.py — pins correctness of the minimizer's signature and a smoke case.
.gitignore — standard Python ignores.
The test script captures origin/main's SHA at startup and resets to exactly that SHA at teardown. No assumption about what's in it beyond "it's a valid baseline." Future expansion (more fixtures, different language) only requires updating the base state on the test repo — the driver doesn't need changes.
Failure modes the test should catch
install.md instructions silently rot (a step references a file or command that no longer exists).
gh aw compile becomes non-idempotent (re-running changes the lock file).
A new file is added to autoloop's workflows/ that install.md doesn't mention copying.
Copilot CLI changes in a way that breaks "follow a numbered list of shell commands."
The issue template's front-matter drifts from what the scheduler parses.
The first iteration of a freshly-created program cannot complete (missing permissions, missing secrets, workflow compile error).
None of these are caught by any other existing test.
Out of scope
CI scheduling. Integration tests with real repos + LLM calls don't belong on every PR. Manual-dispatch only.
Testing Copilot CLI itself. We treat Copilot as a black box — if it can't follow clear instructions, the test fails and we investigate separately.
Repo creation per run. Using a long-lived target repo with reset-to-base semantics is cheaper and avoids accumulating abandoned repos across runs.
Acceptance
tests/install-integration/run.sh exists, passes locally when run against mrjf/autoloop-test.
.github/workflows/install-integration-test.yml runs the same script in Actions and reports PASS when triggered.
mrjf/autoloop-test is in the expected base state after a passing run.
If the test fails, a maintainer can pass --keep (local) or set the input keep_state_on_failure=true (Actions) to inspect the failure state before teardown runs.
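The local --keep flag and the Actions input can share one code path if both end up in the same variable. A minimal sketch, assuming the driver has no other flags; parse_args is a hypothetical name:

```shell
# Sketch: map the local --keep flag onto the variable the workflow already sets.
# parse_args is an illustrative name (assumption).

parse_args() {
  # Respect a value already provided through the environment (Actions mode).
  KEEP_STATE_ON_FAILURE=${KEEP_STATE_ON_FAILURE:-false}
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --keep) KEEP_STATE_ON_FAILURE=true ;;
      *) echo "unknown argument: $1" >&2; return 2 ;;
    esac
    shift
  done
  export KEEP_STATE_ON_FAILURE
}
```

The driver would call `parse_args "$@"` before arming its EXIT trap, so the teardown handler can consult the variable.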
Related
install.md — the document under test. Every regression in install caught here is a regression caught before it hits consumers.
#46 ("Basic versioning: VERSION file + self-check that opens an update-available issue on drift"): when install.md starts copying a VERSION file, the test adds a Phase 1 assertion that .github/autoloop-version exists in the install PR's diff.
#52 ("Remove sync-branches workflow", made redundant by per-iteration Step 3 ahead/behind logic): tied to the sync-branches.lock.yml assertions, which apply only while that workflow is still present.