Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ Use this routing before editing so the right package and tests get updated:
| Agent runtime (clone, tools, prompts, container) | `agent/src/` (`pipeline.py`, `runner.py`, `config.py`, `hooks.py`, `policy.py`, `prompts/`, Dockerfile, etc.) | `agent/tests/`, `agent/README.md` for env/PAT |
| Agent progress events (written to `TaskEventsTable` from the MicroVM; read by `bgagent watch`) | `agent/src/progress_writer.py`, `agent/src/pipeline.py` and `agent/src/runner.py` (integration points) | `agent/tests/test_progress_writer.py`; `cli/src/commands/watch.ts` for the consumer side |
| User-facing or design prose | `docs/guides/`, `docs/design/` | Run **`mise //docs:sync`** or **`mise //docs:build`** (do not edit `docs/src/content/docs/` by hand) |
| Architecture decisions (ADRs) | `docs/decisions/` | Run **`mise //docs:sync`** after adding or editing an ADR |
| Monorepo tasks, CI glue | Root `mise.toml`, `scripts/`, `.github/workflows/` | — |

### CDK handler tests (quick map)
Expand Down
1 change: 1 addition & 0 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ Guidelines:
- If you change API types in `cdk/src/handlers/shared/types.ts`, update `cli/src/types.ts` to match.
- If you change docs sources (`docs/guides/`, `docs/design/`), run `mise //docs:sync` so generated content stays in sync.
- For significant features, add a design document to `docs/design/`.
- For cross-cutting or hard-to-reverse decisions, add an ADR to `docs/decisions/` (see [ADR README](./docs/decisions/README.md)).

### 4. Commit

Expand Down
5 changes: 5 additions & 0 deletions docs/astro.config.mjs

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

120 changes: 120 additions & 0 deletions docs/decisions/001-stacked-pull-requests.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,120 @@
# ADR-001: Stacked pull requests for multi-PR features

**Status:** accepted
**Date:** 2026-05-19

## Context

Complex features in ABCA often span multiple packages, resource types, and concerns. Delivering these as a single large PR creates several problems:

- **Review fatigue:** PRs exceeding ~500 lines suffer from diminished reviewer attention — critical issues get missed in the noise of mechanical changes.
- **Context loss:** Without a framework, sequential PRs leave reviewers without knowledge of where they are in the overall delivery, what came before, or what remains.
- **Agent discoverability:** AI coding agents picking up a sub-task cannot determine the broader goal, prior decisions, or remaining work without reconstructing context from scattered commits and issues.
- **Blocked progress:** A single large PR blocks all progress until the entire feature is reviewed. Stalling on one concern (e.g., IAM review) blocks unrelated work (e.g., documentation).

The [Pragmatic Engineer analysis of stacked diffs](https://newsletter.pragmaticengineer.com/p/stacked-diffs) documents how organizations (Meta, Google, Graphite users) use this pattern to maintain velocity on complex changes while keeping review quality high.
Comment thread
scottschreckengaust marked this conversation as resolved.

## Decision

Use **stacked pull requests** for features spanning multiple concerns or where review time and blast radius justify decomposition. The numeric thresholds below are guidelines — the primary signal is whether a single PR would exceed a reasonable review session, not file count alone. Each PR in the stack follows these rules:

### 1. Position statement

Every PR description states its position:

```markdown
## Stack position

PR {N} for #{parent-issue} — {overall goal one-liner}

### Prior: {what the previous PR delivered}
### This PR: {what this adds}
### Next (optional): {what comes next, if scope is known}
```

This gives reviewers and agents immediate orientation. The "Next" section is optional — include it when the remaining scope is fixed and known; omit it when scope is still evolving. The parent issue is the source of truth for overall progress.

### 2. Branch targeting

- PR 1 targets `main`
- PR N targets PR N-1's branch
- Final PR merges the full stack to `main`

```
main
└── feat/first-concern (PR 1)
└── feat/second-concern (PR 2)
└── feat/third-concern (PR 3 → merge to main)
```

### 3. Self-contained reviewability

Each PR:
- Compiles and passes tests independently
- Can be deployed without breaking the system (see exception below)
- Has a single clear responsibility (one concern per PR)
- Does not leave dead code, TODOs, or broken intermediate states

**Infrastructure stack exception:** For multi-PR CDK/IAM changes where intermediate slices cannot deploy independently (e.g., a policy referencing a resource added in a later PR), the validation gate is **synth + tests passing** — not a successful deploy. In this case, designate a **deploy-gate PR** in the stack position block: the specific PR where the stack becomes end-to-end deployable. Acceptable intermediate states include feature-flagged resources, no-op stubs, and constructs gated behind context variables.

### 4. Size guidelines

| Metric | Target | Maximum |
|--------|--------|---------|
| Lines changed | 200–400 | 600 |
| Review time | 20–30 min | 45 min |
| Files touched | 3–8 | 12 |

If a PR exceeds these, decompose further.

### 5. Rebase discipline

When a lower PR changes after review feedback:
- All PRs above it in the stack must be rebased
- CI must pass on each PR independently after rebase
- Reviewers are notified of the rebase (GitHub does this automatically)

### 6. Sub-issue linking

- Parent issue lists all sub-issues with a stack visualization diagram
- Each sub-issue references the parent and its position in the stack
- GitHub's task list in the parent tracks completion
- Estimated review time is listed per sub-issue to help reviewers plan
Comment thread
scottschreckengaust marked this conversation as resolved.
- Sub-issues use `blocked by #NNN` / `blocking #NNN` relationships to express dependency order — agents and reviewers can identify which issues are unblocked and ready for pickup

### 7. When NOT to use stacked PRs

- Changes under ~200 lines that fit naturally in one PR
- Hotfixes that need immediate merge
- Dependency bumps (use Dependabot grouping instead)
- Documentation-only changes that are self-contained

### 8. Merge semantics

The default topology is a **classic stack** — each PR targets its predecessor's branch. When an early PR merges to `main` before later PRs are reviewed:

1. **Retarget** all PRs that pointed at the merged branch to `main` (or to the next unmerged predecessor). Use `gh pr edit <N> --base main` or GitHub's "Retarget" button.
2. **Rebase** each retargeted PR onto its new base so the diff is clean.
3. **CI must pass** on each retargeted PR independently after rebase.

After retargeting, the remaining PRs form a shorter stack rooted on `main`. This is the expected, normal path — not an exception.

**When the stack diverges:** If review feedback on PR 2 invalidates assumptions in PRs 3+, prefer closing and re-opening the affected PRs over accumulating fixup commits that obscure intent. The parent issue remains the source of truth for what shipped and what remains.

## Consequences

- (+) Each PR stays in the "reviewable without fatigue" window (~15–40 min)
- (+) Agents can pick up any sub-issue independently — the position statement provides full context
- (+) Partial delivery is meaningful — each merged PR adds value independently
- (+) Reviewers approve incrementally without needing full-stack mental context
- (+) Early PRs can merge and ship while later ones are still in review
- (-) Rebase cascades when early PRs receive feedback
- (-) More overhead in PR descriptions and branch management
- (-) Requires discipline to keep each PR independently valid (no "this will be fixed in PR N+1")
- (!) If the stack grows beyond ~8 PRs, consider decomposing into independent sub-stacks

## References

- [Stacked Diffs — Pragmatic Engineer](https://newsletter.pragmaticengineer.com/p/stacked-diffs)
- RFC #120 — first formal use of this pattern in ABCA
- Issue #129 — implementation of this ADR
69 changes: 69 additions & 0 deletions docs/decisions/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
# Architecture Decision Records (ADRs)

This directory captures significant design decisions for the ABCA project. Each ADR explains **why** a decision was made — not just what was decided — so that future contributors (human and AI) can understand the reasoning without excavating git history or PR discussions.

## When to write an ADR

Write an ADR when a decision:

- Affects multiple packages or the overall architecture
- Establishes a pattern other code will follow
- Is non-obvious — a reasonable person might choose differently
- Is hard to reverse once implemented

Do **not** write an ADR for routine implementation choices that are self-evident from the code.

## Template

```markdown
# ADR-NNN: Title

**Status:** proposed | accepted | superseded | deprecated
**Date:** YYYY-MM-DD
**Supersedes:** ADR-NNN (if applicable)
**Superseded by:** ADR-NNN (if applicable)

## Context

What is the problem or situation that requires a decision? Include constraints, requirements, and forces at play.

## Decision

What was decided and why. Be specific — name the approach chosen.

## Consequences

What follows from this decision:
- (+) Positive outcomes
- (-) Negative outcomes or trade-offs
- (!) Risks or things to watch

## References

- Links to RFCs, issues, PRs, or external resources that informed the decision
```

## Numbering

ADRs are numbered sequentially with zero-padded three-digit prefixes: `001-slug.md`, `002-slug.md`, etc. Numbers are never reused.

## Lifecycle

| Status | Meaning |
|--------|---------|
| `proposed` | Under discussion, not yet binding |
| `accepted` | Active and authoritative |
| `superseded` | Replaced by a newer ADR (link to successor) |
| `deprecated` | No longer applicable (context changed) |

A decision starts as `proposed` during RFC discussion and moves to `accepted` when the implementing PR merges. To change an accepted decision, write a new ADR that supersedes it — do not edit the original.

## Relationship to `docs/design/`

Design documents describe system shape, interfaces, and implementation detail. ADRs capture cross-cutting choices that constrain multiple designs. When a design decision is significant enough to be "hard to reverse" or "non-obvious," extract it as an ADR and reference it from the design doc. An ADR may supersede another ADR; a design doc is simply updated in place.

## Discovery

- **Agents:** `AGENTS.md` routes to this directory for understanding past design rationale.
- **Humans:** Browse this directory or the docs site under the "Decisions" section.
- **Search:** Each ADR title and context section are written to be grep-friendly.
3 changes: 3 additions & 0 deletions docs/scripts/sync-starlight.mjs
Original file line number Diff line number Diff line change
Expand Up @@ -262,5 +262,8 @@ mirrorMarkdownFile(
// AGENTS.md reference for contributors. The rename happens only at the site level.
mirrorDirectory(path.join(docsRoot, 'design'), path.join('src', 'content', 'docs', 'architecture'));

// --- Decision records (ADRs): mirror to decisions/ ---
mirrorDirectory(path.join(docsRoot, 'decisions'), path.join('src', 'content', 'docs', 'decisions'));

// Guardrail: ensure target tree exists when running in a clean checkout.
fs.mkdirSync(targetRoot, { recursive: true });
124 changes: 124 additions & 0 deletions docs/src/content/docs/decisions/001-stacked-pull-requests.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,124 @@
---
title: 001 stacked pull requests
---

# ADR-001: Stacked pull requests for multi-PR features

**Status:** accepted
**Date:** 2026-05-19

## Context

Complex features in ABCA often span multiple packages, resource types, and concerns. Delivering these as a single large PR creates several problems:

- **Review fatigue:** PRs exceeding ~500 lines suffer from diminished reviewer attention — critical issues get missed in the noise of mechanical changes.
- **Context loss:** Without a framework, sequential PRs leave reviewers without knowledge of where they are in the overall delivery, what came before, or what remains.
- **Agent discoverability:** AI coding agents picking up a sub-task cannot determine the broader goal, prior decisions, or remaining work without reconstructing context from scattered commits and issues.
- **Blocked progress:** A single large PR blocks all progress until the entire feature is reviewed. Stalling on one concern (e.g., IAM review) blocks unrelated work (e.g., documentation).

The [Pragmatic Engineer analysis of stacked diffs](https://newsletter.pragmaticengineer.com/p/stacked-diffs) documents how organizations (Meta, Google, Graphite users) use this pattern to maintain velocity on complex changes while keeping review quality high.

## Decision

Use **stacked pull requests** for features spanning multiple concerns or where review time and blast radius justify decomposition. The numeric thresholds below are guidelines — the primary signal is whether a single PR would exceed a reasonable review session, not file count alone. Each PR in the stack follows these rules:

### 1. Position statement

Every PR description states its position:

```markdown
## Stack position

PR {N} for #{parent-issue} — {overall goal one-liner}

### Prior: {what the previous PR delivered}
### This PR: {what this adds}
### Next (optional): {what comes next, if scope is known}
```

This gives reviewers and agents immediate orientation. The "Next" section is optional — include it when the remaining scope is fixed and known; omit it when scope is still evolving. The parent issue is the source of truth for overall progress.

### 2. Branch targeting

- PR 1 targets `main`
- PR N targets PR N-1's branch
- Final PR merges the full stack to `main`

```
main
└── feat/first-concern (PR 1)
└── feat/second-concern (PR 2)
└── feat/third-concern (PR 3 → merge to main)
```

### 3. Self-contained reviewability

Each PR:
- Compiles and passes tests independently
- Can be deployed without breaking the system (see exception below)
- Has a single clear responsibility (one concern per PR)
- Does not leave dead code, TODOs, or broken intermediate states

**Infrastructure stack exception:** For multi-PR CDK/IAM changes where intermediate slices cannot deploy independently (e.g., a policy referencing a resource added in a later PR), the validation gate is **synth + tests passing** — not a successful deploy. In this case, designate a **deploy-gate PR** in the stack position block: the specific PR where the stack becomes end-to-end deployable. Acceptable intermediate states include feature-flagged resources, no-op stubs, and constructs gated behind context variables.

### 4. Size guidelines

| Metric | Target | Maximum |
|--------|--------|---------|
| Lines changed | 200–400 | 600 |
| Review time | 20–30 min | 45 min |
| Files touched | 3–8 | 12 |

If a PR exceeds these, decompose further.

### 5. Rebase discipline

When a lower PR changes after review feedback:
- All PRs above it in the stack must be rebased
- CI must pass on each PR independently after rebase
- Reviewers are notified of the rebase (GitHub does this automatically)

### 6. Sub-issue linking

- Parent issue lists all sub-issues with a stack visualization diagram
- Each sub-issue references the parent and its position in the stack
- GitHub's task list in the parent tracks completion
- Estimated review time is listed per sub-issue to help reviewers plan
- Sub-issues use `blocked by #NNN` / `blocking #NNN` relationships to express dependency order — agents and reviewers can identify which issues are unblocked and ready for pickup

### 7. When NOT to use stacked PRs

- Changes under ~200 lines that fit naturally in one PR
- Hotfixes that need immediate merge
- Dependency bumps (use Dependabot grouping instead)
- Documentation-only changes that are self-contained

### 8. Merge semantics

The default topology is a **classic stack** — each PR targets its predecessor's branch. When an early PR merges to `main` before later PRs are reviewed:

1. **Retarget** all PRs that pointed at the merged branch to `main` (or to the next unmerged predecessor). Use `gh pr edit <N> --base main` or GitHub's "Retarget" button.
2. **Rebase** each retargeted PR onto its new base so the diff is clean.
3. **CI must pass** on each retargeted PR independently after rebase.

After retargeting, the remaining PRs form a shorter stack rooted on `main`. This is the expected, normal path — not an exception.

**When the stack diverges:** If review feedback on PR 2 invalidates assumptions in PRs 3+, prefer closing and re-opening the affected PRs over accumulating fixup commits that obscure intent. The parent issue remains the source of truth for what shipped and what remains.

## Consequences

- (+) Each PR stays in the "reviewable without fatigue" window (~15–40 min)
- (+) Agents can pick up any sub-issue independently — the position statement provides full context
- (+) Partial delivery is meaningful — each merged PR adds value independently
- (+) Reviewers approve incrementally without needing full-stack mental context
- (+) Early PRs can merge and ship while later ones are still in review
- (-) Rebase cascades when early PRs receive feedback
- (-) More overhead in PR descriptions and branch management
- (-) Requires discipline to keep each PR independently valid (no "this will be fixed in PR N+1")
- (!) If the stack grows beyond ~8 PRs, consider decomposing into independent sub-stacks

## References

- [Stacked Diffs — Pragmatic Engineer](https://newsletter.pragmaticengineer.com/p/stacked-diffs)
- RFC #120 — first formal use of this pattern in ABCA
- Issue #129 — implementation of this ADR
Loading