aws-samples · scottschreckengaust · May 19, 2026 · May 19, 2026 · May 19, 2026 · May 19, 2026
@@ -22,6 +22,7 @@ Use this routing before editing so the right package and tests get updated:
 | Agent runtime (clone, tools, prompts, container) | `agent/src/` (`pipeline.py`, `runner.py`, `config.py`, `hooks.py`, `policy.py`, `prompts/`, Dockerfile, etc.) | `agent/tests/`, `agent/README.md` for env/PAT |
 | Agent progress events (written to `TaskEventsTable` from the MicroVM; read by `bgagent watch`) | `agent/src/progress_writer.py`, `agent/src/pipeline.py` and `agent/src/runner.py` (integration points) | `agent/tests/test_progress_writer.py`; `cli/src/commands/watch.ts` for the consumer side |
 | User-facing or design prose | `docs/guides/`, `docs/design/` | Run **`mise //docs:sync`** or **`mise //docs:build`** (do not edit `docs/src/content/docs/` by hand) |
+| Architecture decisions (ADRs) | `docs/decisions/` | Run **`mise //docs:sync`** after adding or editing an ADR |
 | Monorepo tasks, CI glue | Root `mise.toml`, `scripts/`, `.github/workflows/` | — |
 
 ### CDK handler tests (quick map)

@@ -28,6 +28,7 @@ Guidelines:
 - If you change API types in `cdk/src/handlers/shared/types.ts`, update `cli/src/types.ts` to match.
 - If you change docs sources (`docs/guides/`, `docs/design/`), run `mise //docs:sync` so generated content stays in sync.
 - For significant features, add a design document to `docs/design/`.
+- For cross-cutting or hard-to-reverse decisions, add an ADR to `docs/decisions/` (see [ADR README](./docs/decisions/README.md)).
 
 ### 4. Commit
 

@@ -0,0 +1,120 @@
+# ADR-001: Stacked pull requests for multi-PR features
+
+**Status:** accepted
+**Date:** 2026-05-19
+
+## Context
+
+Complex features in ABCA often span multiple packages, resource types, and concerns. Delivering these as a single large PR creates several problems:
+
+- **Review fatigue:** PRs exceeding ~500 lines suffer from diminished reviewer attention — critical issues get missed in the noise of mechanical changes.
+- **Context loss:** Without a framework, sequential PRs leave reviewers without knowledge of where they are in the overall delivery, what came before, or what remains.
+- **Agent discoverability:** AI coding agents picking up a sub-task cannot determine the broader goal, prior decisions, or remaining work without reconstructing context from scattered commits and issues.
+- **Blocked progress:** A single large PR blocks all progress until the entire feature is reviewed. Stalling on one concern (e.g., IAM review) blocks unrelated work (e.g., documentation).
+
+The [Pragmatic Engineer analysis of stacked diffs](https://newsletter.pragmaticengineer.com/p/stacked-diffs) documents how organizations (Meta, Google, Graphite users) use this pattern to maintain velocity on complex changes while keeping review quality high.
+
+## Decision
+
+Use **stacked pull requests** for features spanning multiple concerns or where review time and blast radius justify decomposition. The numeric thresholds below are guidelines — the primary signal is whether a single PR would exceed a reasonable review session, not file count alone. Each PR in the stack follows these rules:
+
+### 1. Position statement
+
+Every PR description states its position:
+
+```markdown
+## Stack position
+
+PR {N} for #{parent-issue} — {overall goal one-liner}
+
+### Prior: {what the previous PR delivered}
+### This PR: {what this adds}
+### Next (optional): {what comes next, if scope is known}
+```
+
+This gives reviewers and agents immediate orientation. The "Next" section is optional — include it when the remaining scope is fixed and known; omit it when scope is still evolving. The parent issue is the source of truth for overall progress.
+
+### 2. Branch targeting
+
+- PR 1 targets `main`
+- PR N targets PR N-1's branch
+- Final PR merges the full stack to `main`
+
+```
+main
+ └── feat/first-concern       (PR 1)
+      └── feat/second-concern  (PR 2)
+           └── feat/third-concern   (PR 3 → merge to main)
+```
+
+### 3. Self-contained reviewability
+
+Each PR:
+- Compiles and passes tests independently
+- Can be deployed without breaking the system (see exception below)
+- Has a single clear responsibility (one concern per PR)
+- Does not leave dead code, TODOs, or broken intermediate states
+
+**Infrastructure stack exception:** For multi-PR CDK/IAM changes where intermediate slices cannot deploy independently (e.g., a policy referencing a resource added in a later PR), the validation gate is **synth + tests passing** — not a successful deploy. In this case, designate a **deploy-gate PR** in the stack position block: the specific PR where the stack becomes end-to-end deployable. Acceptable intermediate states include feature-flagged resources, no-op stubs, and constructs gated behind context variables.
+
+### 4. Size guidelines
+
+| Metric | Target | Maximum |
+|--------|--------|---------|
+| Lines changed | 200–400 | 600 |
+| Review time | 20–30 min | 45 min |
+| Files touched | 3–8 | 12 |
+
+If a PR exceeds these, decompose further.
+
+### 5. Rebase discipline
+
+When a lower PR changes after review feedback:
+- All PRs above it in the stack must be rebased
+- CI must pass on each PR independently after rebase
+- Reviewers are notified of the rebase (GitHub does this automatically)
+
+### 6. Sub-issue linking
+
+- Parent issue lists all sub-issues with a stack visualization diagram
+- Each sub-issue references the parent and its position in the stack
+- GitHub's task list in the parent tracks completion
+- Estimated review time is listed per sub-issue to help reviewers plan
+- Sub-issues use `blocked by #NNN` / `blocking #NNN` relationships to express dependency order — agents and reviewers can identify which issues are unblocked and ready for pickup
+
+### 7. When NOT to use stacked PRs
+
+- Changes under ~200 lines that fit naturally in one PR
+- Hotfixes that need immediate merge
+- Dependency bumps (use Dependabot grouping instead)
+- Documentation-only changes that are self-contained
+
+### 8. Merge semantics
+
+The default topology is a **classic stack** — each PR targets its predecessor's branch. When an early PR merges to `main` before later PRs are reviewed:
+
+1. **Retarget** all PRs that pointed at the merged branch to `main` (or to the next unmerged predecessor). Use `gh pr edit <N> --base main` or GitHub's "Retarget" button.
+2. **Rebase** each retargeted PR onto its new base so the diff is clean.
+3. **CI must pass** on each retargeted PR independently after rebase.
+
+After retargeting, the remaining PRs form a shorter stack rooted on `main`. This is the expected, normal path — not an exception.
+
+**When the stack diverges:** If review feedback on PR 2 invalidates assumptions in PRs 3+, prefer closing and re-opening the affected PRs over accumulating fixup commits that obscure intent. The parent issue remains the source of truth for what shipped and what remains.
+
+## Consequences
+
+- (+) Each PR stays in the "reviewable without fatigue" window (~15–40 min)
+- (+) Agents can pick up any sub-issue independently — the position statement provides full context
+- (+) Partial delivery is meaningful — each merged PR adds value independently
+- (+) Reviewers approve incrementally without needing full-stack mental context
+- (+) Early PRs can merge and ship while later ones are still in review
+- (-) Rebase cascades when early PRs receive feedback
+- (-) More overhead in PR descriptions and branch management
+- (-) Requires discipline to keep each PR independently valid (no "this will be fixed in PR N+1")
+- (!) If the stack grows beyond ~8 PRs, consider decomposing into independent sub-stacks
+
+## References
+
+- [Stacked Diffs — Pragmatic Engineer](https://newsletter.pragmaticengineer.com/p/stacked-diffs)
+- RFC #120 — first formal use of this pattern in ABCA
+- Issue #129 — implementation of this ADR
@@ -0,0 +1,69 @@
+# Architecture Decision Records (ADRs)
+
+This directory captures significant design decisions for the ABCA project. Each ADR explains **why** a decision was made — not just what was decided — so that future contributors (human and AI) can understand the reasoning without excavating git history or PR discussions.
+
+## When to write an ADR
+
+Write an ADR when a decision:
+
+- Affects multiple packages or the overall architecture
+- Establishes a pattern other code will follow
+- Is non-obvious — a reasonable person might choose differently
+- Is hard to reverse once implemented
+
+Do **not** write an ADR for routine implementation choices that are self-evident from the code.
+
+## Template
+
+```markdown
+# ADR-NNN: Title
+
+**Status:** proposed | accepted | superseded | deprecated
+**Date:** YYYY-MM-DD
+**Supersedes:** ADR-NNN (if applicable)
+**Superseded by:** ADR-NNN (if applicable)
+
+## Context
+
+What is the problem or situation that requires a decision? Include constraints, requirements, and forces at play.
+
+## Decision
+
+What was decided and why. Be specific — name the approach chosen.
+
+## Consequences
+
+What follows from this decision:
+- (+) Positive outcomes
+- (-) Negative outcomes or trade-offs
+- (!) Risks or things to watch
+
+## References
+
+- Links to RFCs, issues, PRs, or external resources that informed the decision
+```
+
+## Numbering
+
+ADRs are numbered sequentially with zero-padded three-digit prefixes: `001-slug.md`, `002-slug.md`, etc. Numbers are never reused.
+
+## Lifecycle
+
+| Status | Meaning |
+|--------|---------|
+| `proposed` | Under discussion, not yet binding |
+| `accepted` | Active and authoritative |
+| `superseded` | Replaced by a newer ADR (link to successor) |
+| `deprecated` | No longer applicable (context changed) |
+
+A decision starts as `proposed` during RFC discussion and moves to `accepted` when the implementing PR merges. To change an accepted decision, write a new ADR that supersedes it — do not edit the original.
+
+## Relationship to `docs/design/`
+
+Design documents describe system shape, interfaces, and implementation detail. ADRs capture cross-cutting choices that constrain multiple designs. When a design decision is significant enough to be "hard to reverse" or "non-obvious," extract it as an ADR and reference it from the design doc. An ADR may supersede another ADR; a design doc is simply updated in place.
+
+## Discovery
+
+- **Agents:** `AGENTS.md` routes to this directory for understanding past design rationale.
+- **Humans:** Browse this directory or the docs site under the "Decisions" section.
+- **Search:** Each ADR title and context section are written to be grep-friendly.
@@ -262,5 +262,8 @@ mirrorMarkdownFile(
 // AGENTS.md reference for contributors. The rename happens only at the site level.
 mirrorDirectory(path.join(docsRoot, 'design'), path.join('src', 'content', 'docs', 'architecture'));
 
+// --- Decision records (ADRs): mirror to decisions/ ---
+mirrorDirectory(path.join(docsRoot, 'decisions'), path.join('src', 'content', 'docs', 'decisions'));
+
 // Guardrail: ensure target tree exists when running in a clean checkout.
 fs.mkdirSync(targetRoot, { recursive: true });
@@ -0,0 +1,124 @@
+---
+title: 001 stacked pull requests
+---
+
+# ADR-001: Stacked pull requests for multi-PR features
+
+**Status:** accepted
+**Date:** 2026-05-19
+
+## Context
+
+Complex features in ABCA often span multiple packages, resource types, and concerns. Delivering these as a single large PR creates several problems:
+
+- **Review fatigue:** PRs exceeding ~500 lines suffer from diminished reviewer attention — critical issues get missed in the noise of mechanical changes.
+- **Context loss:** Without a framework, sequential PRs leave reviewers without knowledge of where they are in the overall delivery, what came before, or what remains.
+- **Agent discoverability:** AI coding agents picking up a sub-task cannot determine the broader goal, prior decisions, or remaining work without reconstructing context from scattered commits and issues.
+- **Blocked progress:** A single large PR blocks all progress until the entire feature is reviewed. Stalling on one concern (e.g., IAM review) blocks unrelated work (e.g., documentation).
+
+The [Pragmatic Engineer analysis of stacked diffs](https://newsletter.pragmaticengineer.com/p/stacked-diffs) documents how organizations (Meta, Google, Graphite users) use this pattern to maintain velocity on complex changes while keeping review quality high.
+
+## Decision
+
+Use **stacked pull requests** for features spanning multiple concerns or where review time and blast radius justify decomposition. The numeric thresholds below are guidelines — the primary signal is whether a single PR would exceed a reasonable review session, not file count alone. Each PR in the stack follows these rules:
+
+### 1. Position statement
+
+Every PR description states its position:
+
+```markdown
+## Stack position
+
+PR {N} for #{parent-issue} — {overall goal one-liner}
+
+### Prior: {what the previous PR delivered}
+### This PR: {what this adds}
+### Next (optional): {what comes next, if scope is known}
+```
+
+This gives reviewers and agents immediate orientation. The "Next" section is optional — include it when the remaining scope is fixed and known; omit it when scope is still evolving. The parent issue is the source of truth for overall progress.
+
+### 2. Branch targeting
+
+- PR 1 targets `main`
+- PR N targets PR N-1's branch
+- Final PR merges the full stack to `main`
+
+```
+main
+ └── feat/first-concern       (PR 1)
+      └── feat/second-concern  (PR 2)
+           └── feat/third-concern   (PR 3 → merge to main)
+```
+
+### 3. Self-contained reviewability
+
+Each PR:
+- Compiles and passes tests independently
+- Can be deployed without breaking the system (see exception below)
+- Has a single clear responsibility (one concern per PR)
+- Does not leave dead code, TODOs, or broken intermediate states
+
+**Infrastructure stack exception:** For multi-PR CDK/IAM changes where intermediate slices cannot deploy independently (e.g., a policy referencing a resource added in a later PR), the validation gate is **synth + tests passing** — not a successful deploy. In this case, designate a **deploy-gate PR** in the stack position block: the specific PR where the stack becomes end-to-end deployable. Acceptable intermediate states include feature-flagged resources, no-op stubs, and constructs gated behind context variables.
+
+### 4. Size guidelines
+
+| Metric | Target | Maximum |
+|--------|--------|---------|
+| Lines changed | 200–400 | 600 |
+| Review time | 20–30 min | 45 min |
+| Files touched | 3–8 | 12 |
+
+If a PR exceeds these, decompose further.
+
+### 5. Rebase discipline
+
+When a lower PR changes after review feedback:
+- All PRs above it in the stack must be rebased
+- CI must pass on each PR independently after rebase
+- Reviewers are notified of the rebase (GitHub does this automatically)
+
+### 6. Sub-issue linking
+
+- Parent issue lists all sub-issues with a stack visualization diagram
+- Each sub-issue references the parent and its position in the stack
+- GitHub's task list in the parent tracks completion
+- Estimated review time is listed per sub-issue to help reviewers plan
+- Sub-issues use `blocked by #NNN` / `blocking #NNN` relationships to express dependency order — agents and reviewers can identify which issues are unblocked and ready for pickup
+
+### 7. When NOT to use stacked PRs
+
+- Changes under ~200 lines that fit naturally in one PR
+- Hotfixes that need immediate merge
+- Dependency bumps (use Dependabot grouping instead)
+- Documentation-only changes that are self-contained
+
+### 8. Merge semantics
+
+The default topology is a **classic stack** — each PR targets its predecessor's branch. When an early PR merges to `main` before later PRs are reviewed:
+
+1. **Retarget** all PRs that pointed at the merged branch to `main` (or to the next unmerged predecessor). Use `gh pr edit <N> --base main` or GitHub's "Retarget" button.
+2. **Rebase** each retargeted PR onto its new base so the diff is clean.
+3. **CI must pass** on each retargeted PR independently after rebase.
+
+After retargeting, the remaining PRs form a shorter stack rooted on `main`. This is the expected, normal path — not an exception.
+
+**When the stack diverges:** If review feedback on PR 2 invalidates assumptions in PRs 3+, prefer closing and re-opening the affected PRs over accumulating fixup commits that obscure intent. The parent issue remains the source of truth for what shipped and what remains.
+
+## Consequences
+
+- (+) Each PR stays in the "reviewable without fatigue" window (~15–40 min)
+- (+) Agents can pick up any sub-issue independently — the position statement provides full context
+- (+) Partial delivery is meaningful — each merged PR adds value independently
+- (+) Reviewers approve incrementally without needing full-stack mental context
+- (+) Early PRs can merge and ship while later ones are still in review
+- (-) Rebase cascades when early PRs receive feedback
+- (-) More overhead in PR descriptions and branch management
+- (-) Requires discipline to keep each PR independently valid (no "this will be fixed in PR N+1")
+- (!) If the stack grows beyond ~8 PRs, consider decomposing into independent sub-stacks
+
+## References
+
+- [Stacked Diffs — Pragmatic Engineer](https://newsletter.pragmaticengineer.com/p/stacked-diffs)
+- RFC #120 — first formal use of this pattern in ABCA
+- Issue #129 — implementation of this ADR