diff --git a/AGENTS.md b/AGENTS.md
index 0346ac3..02296af 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -39,6 +39,8 @@ Handler entry tests: `cdk/test/handlers/orchestrate-task.test.ts`, `create-task.
 
 ### Common mistakes
 
+- **Starting implementation without an approved GitHub issue** — Conversational approval ("yes, do it", "go ahead", "start with X") is NOT governance approval. The correct sequence is: create a GitHub issue with acceptance criteria → get the `approved` label from an admin → self-assign → comment "Starting implementation" → then begin work. Even if the user explicitly directs the work in conversation, create the durable artifact (issue) first. See [ADR-003](./docs/decisions/003-contribution-governance.md).
+- **Creating branches without an issue reference** — Branch names must follow the pattern `(feat|fix|chore|docs)/<issue-number>-short-description`. A branch without an issue number is unauthorized work. Example: `feat/148-operational-knowledge-stack`.
 - Editing **`docs/src/content/docs/`** instead of **`docs/guides/`** or **`docs/design/`** — content is generated; sync from sources.
 - Adding or editing files in **`docs/design/`** or **`docs/guides/`** without running **`cd docs && node scripts/sync-starlight.mjs`** — CI will reject ("Fail build on mutation") because the Starlight mirror files in `docs/src/content/docs/` are stale. Always commit the regenerated mirrors alongside source changes.
 - Changing **`cdk/.../types.ts`** without updating **`cli/src/types.ts`** — CLI and API drift.
diff --git a/docs/decisions/003-contribution-governance.md b/docs/decisions/003-contribution-governance.md
index 537b502..91761e7 100644
--- a/docs/decisions/003-contribution-governance.md
+++ b/docs/decisions/003-contribution-governance.md
@@ -11,6 +11,10 @@ The rules below define how any contributor — human or AI — picks up, owns, a
 
 ## Decision
 
+### No branches without an Issue
+
+Every feature branch references an issue in its name (e.g., `feat/123-short-description` or `fix/456-bug-name`). A branch without an issue reference is unauthorized work. This prevents the failure mode where work is started "just to explore" and then snowballs into a PR without governance.
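+
+A minimal sketch of how a hook or CI step could extract the issue number from a branch name. It assumes the `(feat|fix|chore|docs)/<issue-number>-*` convention from the enforcement table below, plus lowercase kebab-case descriptions (the kebab-case rule is an assumption, not part of this ADR). `issue_number_from_branch` is a hypothetical helper:
+
+```python
+import re
+from typing import Optional
+
+# (feat|fix|chore|docs)/<issue-number>-<description>; the lowercase
+# kebab-case description is an assumed refinement, not part of the ADR.
+BRANCH_RE = re.compile(r"^(feat|fix|chore|docs)/(\d+)-[a-z0-9][a-z0-9-]*$")
+
+
+def issue_number_from_branch(branch: str) -> Optional[int]:
+    """Return the issue number encoded in a branch name, or None if it has none."""
+    match = BRANCH_RE.match(branch)
+    return int(match.group(2)) if match else None
+```
+
+A `None` result marks the branch as unauthorized work. A separate check would still need to confirm that the issue exists and carries the `approved` label.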
+
 ### No PRs without an Issue
 
 Every PR references an issue. The issue provides rationale, sufficient context for the solution to be obvious, and verifiable acceptance criteria.
@@ -27,9 +31,9 @@ Issues align to the [product roadmap](https://github.com/aws-samples/sample-auto
 
 Only permitted users can mark an issue `approved` — a GitHub Actions workflow validates that the label applicant is authorized. An issue is not workable until it is both approved and assigned. After approval, the issue is considered scope-frozen: further revisions that change deliverables require re-approval.
 
-### Self-assignment on start
+### Assignments
 
-Unassigned means available. On starting work, self-assign. Multiple assignees (>1) require intentionality verification.
+Unassigned means available. An issue can be claimed by self-assignment, by directed assignment from another agent or human, or by priority-based pickup (among open issues, take the highest priority first and, within a priority, the one earliest in its dependency chain). Multiple assignees (>1) require intentionality verification.
 
 ### Issue body as primary directive
 
@@ -47,10 +51,16 @@ Before implementation, the assigned contributor must:
 
 **Priority evaluation:** Identify priority (`p0`/`p1`/`p2`). If asked to work a lower-priority item while higher-priority items are unassigned, challenge: "Should I work on #X (p0) instead?"
 
-**Predecessor validation:** If predecessors are incomplete, unassigned, and not in a stacked PR — challenge: "Steps 1-3 are incomplete. Starting step 4 may cause rework."
+**Predecessor validation (GraphQL dependency graph is authoritative):**
+- Query the issue's `blockedBy` field via GraphQL — if any blocking issue is open, this issue is **not ready** (hard gate)
+- Check `parent`/`subIssues` ordering — verify prior siblings are complete or in-flight
+- Reconcile graph vs. prose — graph is authoritative for enforcement; prose explains rationale
+- If predecessors are incomplete, unassigned, and not in a stacked PR — challenge: "Steps 1-3 are incomplete. Starting step 4 may cause rework."
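+
+The predecessor checks can be sketched as pure functions over data already fetched from the `blockedBy`, `parent`, and `subIssues` fields. The `Issue` dataclass and helper names are hypothetical, and fetching (GraphQL queries, pagination, auth) is deliberately left out:
+
+```python
+from dataclasses import dataclass, field
+from typing import List
+
+
+@dataclass
+class Issue:
+    number: int
+    state: str                   # "OPEN" or "CLOSED", as in GitHub's IssueState
+    assigned: bool = False
+    in_stacked_pr: bool = False  # hypothetical: derived by the caller from open PRs
+    blocked_by: List["Issue"] = field(default_factory=list)
+
+
+def open_blockers(issue: Issue) -> List[int]:
+    """Hard gate: any open blockedBy issue means this issue is not ready."""
+    return [b.number for b in issue.blocked_by if b.state == "OPEN"]
+
+
+def siblings_to_challenge(siblings: List[Issue], target: Issue) -> List[int]:
+    """Prior sub-issues (in parent order) that are open, unassigned, and not stacked."""
+    numbers = [s.number for s in siblings]
+    prior = siblings[: numbers.index(target.number)]
+    return [
+        s.number
+        for s in prior
+        if s.state == "OPEN" and not s.assigned and not s.in_stacked_pr
+    ]
+```
+
+A non-empty `open_blockers` result stops pickup outright; a non-empty `siblings_to_challenge` result triggers the "Steps 1-3 are incomplete" challenge instead.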
 
 **Cross-reference audit:** Search open issues for duplicates. Search open PRs (including drafts) for conflicts. Flag overlaps. Check the full dependency graph. Forward-look into downstream actions to ensure alignment.
 
+**Dependency graph maintenance:** When creating/modifying issues with dependencies, use GraphQL mutations (`addBlockedBy`, `addSubIssue`) to maintain the machine-enforceable graph. Update prose to explain rationale. If they diverge, fix the wrong one (usually prose — graph is set programmatically).
+
 **Final gate:** If all checks pass, comment "Starting implementation."
 
 ### Identity and attribution
@@ -65,6 +75,36 @@ Provide progress signals at checkpoints. If blocked or abandoning, comment and u
 
 CI passes before requesting review. After merge, verify acceptance criteria and close. Create follow-up issues for discovered work before closing.
 
+### Conversational approval is NOT issue approval
+
+A user saying "yes, do it" or "go ahead" in a conversation does NOT satisfy the governance gate. The correct response to conversational approval is:
+
+1. Create an issue with acceptance criteria
+2. Request the `approved` label from an admin
+3. Self-assign once approved
+4. Then begin implementation
+
+**Known failure mode:** Agents interpret conversational momentum ("Yes start with X") as authorization to skip issue creation. This is the most common governance bypass — it feels like permission because the user explicitly directed the work, but the governance requires a *durable, reviewable artifact* (the issue), not a transient conversation.
+
+**Why this matters:** Conversations are ephemeral. Issues are auditable. If an agent creates work based on a conversation and that conversation is lost (context compaction, session end), no record exists of what was authorized, what the acceptance criteria were, or why the work was started.
+
+### Enforcement mechanisms (planned)
+
+Prose governance is necessary but insufficient. The enforcement points below close bypass routes progressively and are deployed incrementally; see #186 for implementation tracking.
+
+| Mechanism | Layer | What it catches | Status |
+|-----------|-------|-----------------|--------|
+| AGENTS.md directive | Agent prompt | Explicit instruction: "Do NOT begin implementation without an approved issue, even if the user says 'go ahead' in conversation" | Implemented |
+| Branch name convention | Git workflow | Branch must match `(feat\|fix\|chore\|docs)/<issue-number>-*`; rejects branches without an issue reference | Planned |
+| Commit-msg hook (Tier 0) | Pre-commit | Rejects commits without `Refs #N` or `Fixes #N` | Planned |
+| Pre-push hook (Tier 1) | Pre-push | Validates referenced issue exists and has `approved` label via `gh` API | Planned |
+| Claude Code hook (`PreToolUse: Write`) | Agent runtime | Blocks file creation in governed paths without declared issue context | Planned |
+| Skill gate: `pickup-issue` | Agent workflow | Agent must invoke before implementation — hard-fails without valid issue | Planned |
+
+**Transition:** Branch naming and commit-msg rules apply to branches created after the corresponding hooks are deployed. Existing branches (including this PR's) pre-date enforcement.
+
+**Progressive enforcement:** Start with the commit-msg hook (cheapest, catches all contributors). Add pre-push validation next. Skill gates enforce at the agent-workflow level (see ADR-012, proposed, for the skill model).
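+
+A minimal sketch of the Tier 0 commit-msg hook. It assumes a Python script installed as `.git/hooks/commit-msg` (or wired in through a hook manager) and accepts only the literal `Refs #N` and `Fixes #N` forms. Git passes the path of the commit message file as the first argument, and a non-zero exit aborts the commit:
+
+```python
+#!/usr/bin/env python3
+import re
+import sys
+from typing import List
+
+# Accept only the literal "Refs #N" / "Fixes #N" forms named in the table.
+ISSUE_REF_RE = re.compile(r"\b(?:Refs|Fixes) #(\d+)\b")
+
+
+def referenced_issues(message: str) -> List[int]:
+    # Skip "#" comment lines; git strips them from the final message.
+    lines = [ln for ln in message.splitlines() if not ln.startswith("#")]
+    return [int(n) for n in ISSUE_REF_RE.findall("\n".join(lines))]
+
+
+def main(argv: List[str]) -> int:
+    # argv[1] is the path to the commit message file supplied by git.
+    with open(argv[1], encoding="utf-8") as fh:
+        message = fh.read()
+    if not referenced_issues(message):
+        print("commit-msg: reference an issue with 'Refs #N' or 'Fixes #N'", file=sys.stderr)
+        return 1
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main(sys.argv))
+```
+
+The message file the hook receives can still contain the `#` comment lines that git strips afterwards, which is why the sketch ignores them. Checking that the issue exists and is approved is left to the Tier 1 pre-push hook.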
+
 ## Consequences
 
 - (+) Prevents duplicate effort — assignment signals ownership
@@ -72,13 +112,18 @@ CI passes before requesting review. After merge, verify acceptance criteria and
 - (+) Prevents rework — predecessor validation catches out-of-order work
 - (+) Issue body stays current — threads are folded back
 - (+) Cross-reference audit catches duplicates early
+- (+) Enforcement mechanisms catch bypass at multiple points
 - (-) Pre-start overhead for small tasks
 - (-) Requires discipline to fold threads into body
+- (-) Commit-msg hook adds friction for rapid iteration on approved work
 - (!) Assumes priority labels exist and are maintained
 - (!) Self-assignment is not atomic — concurrent agents may race; mitigate by verifying assignment after claiming via refresh
+- (!) Conversational approval bypass is the most common failure — enforcement must be structural, not behavioral
 
 ## References
 
 - Issue #134 — full RFC with open questions and automation requirements
 - Roadmap: Scale and collaboration (Agent swarm, Multi-user and teams)
 - ADR-001 — delivery methodology referenced by completion rules
+- ADR-012 (proposed) — operational knowledge stack; planned enforcement via skill gates
+- ADR-013 (proposed) — tiered validation; planned enforcement hooks at Tier 0 and Tier 1
diff --git a/docs/decisions/005-feedback-loop.md b/docs/decisions/005-feedback-loop.md
new file mode 100644
index 0000000..540b40b
--- /dev/null
+++ b/docs/decisions/005-feedback-loop.md
@@ -0,0 +1,68 @@
+# ADR-005: Feedback loop — PR reviews propagate to issues and ADRs
+
+**Status:** proposed
+**Date:** 2026-05-19
+
+## Context
+
+PR review comments are addressed locally (fix the code) but systemic issues they reveal are not propagated upstream. A reviewer says "this approach is wrong" but the issue still says "use this approach." ADRs are treated as immutable when they should be living decisions that evolve with implementation experience.
+
+Without a feedback protocol, review insights are lost, issue bodies rot, and architectural mistakes persist across stacked PR chains.
+
+## Decision
+
+### Review comment classification
+
+| Type | Action | Propagates to |
+|------|--------|---------------|
+| Nit (style, naming) | Fix in PR | Nothing |
+| Bug (logic error) | Fix in PR | Nothing (unless systemic) |
+| Design concern | Pause PR; evaluate | Issue body |
+| Architecture challenge | Pause PR; escalate | ADR (supersede? amend?) |
+| Scope question | Clarify | Issue body |
+| Blocker (won't approve as-is) | Pause PR | Issue body |
+
+### Upstream propagation
+
+When a review surfaces a design concern or architecture challenge:
+
+1. **Pause** — Do not force-merge. Do not continue stacked PRs above this one.
+2. **Assess** — Does this invalidate the issue's approach? The ADR's decision?
+3. **Propagate** — Update the relevant upstream document (issue body, ADR, stacked PR dependents).
+4. **Resolve** — Revise the approach, defend with evidence, or cancel the work.
+5. **Resume** — Once resolved, unblock the PR and dependents.
+
+### ADR evolution
+
+| Trigger | Response |
+|---------|----------|
+| Implementation reveals the decision doesn't work | New RFC proposing a successor ADR |
+| Reviewer challenges the architectural premise | `**UNRESOLVED**` on the issue; pause |
+| New information makes the decision obsolete | Successor ADR with `Supersedes: ADR-NNN` |
+| Decision works but needs refinement | Amend via PR (minor, no new ADR) |
+
+Never silently ignore a challenged decision.
+
+### Stacked PR chain revision
+
+When feedback on PR N invalidates PRs N+1 through N+M:
+1. Comment on all affected PRs
+2. Do not rebase dependent PRs until the base is stable
+3. If architectural: re-evaluate whether the remaining stack is valid
+4. If redesign needed: close dependent PRs, revise issue, re-plan
+
+## Consequences
+
+- (+) Review insights propagate to architectural decisions
+- (+) Issue bodies stay current with implementation learnings
+- (+) ADRs evolve rather than silently becoming outdated
+- (+) Stacked PR chains have a defined recovery protocol
+- (-) Adds process overhead to reviews (classification step)
+- (-) Pausing stacked chains delays delivery
+- (!) Requires discipline to actually propagate feedback upstream
+
+## References
+
+- Issue #136 — full RFC with open questions
+- ADR-003 — governance (issue body as source of truth)
+- ADR-001 — stacked PRs (chain revision protocol)
diff --git a/docs/decisions/006-feature-flags.md b/docs/decisions/006-feature-flags.md
new file mode 100644
index 0000000..979187c
--- /dev/null
+++ b/docs/decisions/006-feature-flags.md
@@ -0,0 +1,82 @@
+# ADR-006: Feature flags for concurrent development
+
+**Status:** proposed
+**Date:** 2026-05-19
+
+## Context
+
+Multiple agents working on related features in the same area must serialize — one waits for the other to merge. Incomplete features either block the main branch or require long-lived branches that diverge. SRE needs kill switches without reverting commits.
+
+Feature flags enable trunk-based development where incomplete work merges safely behind toggles, and concurrent contributors avoid blocking each other.
+
+## Decision
+
+### When to use flags
+
+| Situation | Use a flag? |
+|-----------|-------------|
+| Feature spans multiple PRs, incomplete state is unsafe | Yes |
+| Two contributors touch the same module for different purposes | Yes |
+| SRE needs a kill switch for a new capability | Yes |
+| Simple refactor with no behavioral change | No |
+| Bug fix | No |
+| One-PR feature, complete on merge | No |
+
+### Flag ownership
+
+- Every flag has an owner (the issue that introduced it)
+- Every flag has an expiration (the issue/PR that removes it)
+- Flags without a removal plan are rejected in review
+
+### Separation of concerns
+
+- **Planners** decide which features get flags (issue/RFC level)
+- **Implementors** add/use flags in code (PR level)
+- **SRE/operators** toggle flags in production (runtime level)
+- **No self-approval** — the person who introduces a flag cannot approve its removal
+
+### Flag lifecycle
+
+1. **Proposed** — issue identifies the need for a flag
+2. **Introduced** — PR adds the flag (default: off)
+3. **Active** — feature behind flag is in development
+4. **Verified** — feature complete, flag toggled on in testing
+5. **Permanent** — flag removed, feature is always-on (or removed entirely)
+
+### Lifecycle metadata
+
+Each flag must track:
+
+| Field | Required | Source |
+|-------|----------|--------|
+| Flag name | Yes | Code constant |
+| Purpose / linked issue | Yes | Issue reference |
+| First merge date | Yes | Auto from git log |
+| Max lifetime | Yes | Declared at creation (default: 4 weeks) |
+| Expected removal date | Yes | first_merge + max_lifetime |
+| Actual removal date | — | Auto when flag deleted |
+| Days active | — | Computed |
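+
+A sketch of the derived fields, assuming dates are tracked per calendar day. `FlagRecord` and its methods are hypothetical names; `expected_removal` follows the table's `first_merge + max_lifetime` formula, and a flag still present after that date counts as stale:
+
+```python
+from dataclasses import dataclass
+from datetime import date, timedelta
+from typing import Optional
+
+DEFAULT_MAX_LIFETIME = timedelta(weeks=4)
+
+
+@dataclass
+class FlagRecord:
+    name: str
+    linked_issue: int
+    first_merge: date
+    max_lifetime: timedelta = DEFAULT_MAX_LIFETIME
+    actual_removal: Optional[date] = None  # set when the flag is deleted
+
+    @property
+    def expected_removal(self) -> date:
+        return self.first_merge + self.max_lifetime
+
+    def days_active(self, today: date) -> int:
+        # Count up to the removal date if removed, otherwise up to today.
+        end = self.actual_removal or today
+        return (end - self.first_merge).days
+
+    def is_stale(self, today: date) -> bool:
+        return self.actual_removal is None and today > self.expected_removal
+```
+
+A periodic review could list every record for which `is_stale(date.today())` is true.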
+
+### Maximum lifetime
+
+Flags must be removed by their expected removal date: first merge plus the declared max lifetime (default: 4 weeks), as defined in the lifecycle metadata. The max lifetime can be overridden per flag with justification in the issue. Stale flags are treated as technical debt and surfaced in periodic reviews.
+
+### Mechanism constraint
+
+Flags MUST be resolvable at synth time for infrastructure flags and at runtime for behavior flags. The specific storage mechanism (CDK context, DynamoDB, SSM Parameter Store, env vars) is context-dependent and follows from this split — it is not prescribed by this ADR.
+
+## Consequences
+
+- (+) Concurrent work proceeds without blocking
+- (+) Trunk-based development: main stays deployable
+- (+) SRE can disable features without code changes
+- (+) Partial features merge safely
+- (-) Flag management overhead
+- (-) Combinatorial testing complexity if many flags exist simultaneously
+- (!) Maximum lifetime must be enforced or flags accumulate indefinitely
+
+## References
+
+- Issue #137 — full RFC with open questions on mechanism (CDK context vs. DynamoDB vs. env vars)
+- ADR-003 — governance (flag introduction requires approval)
+- ADR-005 — feedback loop (reviewer may flag-gate a feature during review)
diff --git a/docs/decisions/007-knowledge-acquisition.md b/docs/decisions/007-knowledge-acquisition.md
new file mode 100644
index 0000000..8f9f7fd
--- /dev/null
+++ b/docs/decisions/007-knowledge-acquisition.md
@@ -0,0 +1,79 @@
+# ADR-007: Knowledge acquisition through progressive failure
+
+**Status:** proposed
+**Date:** 2026-05-19
+
+## Context
+
+Agents with fresh context (tabula rasa) attempt to follow documentation and hit gaps they cannot resolve. These gaps are silently worked around (agent asks a human) rather than systematically fixed. The system cannot self-improve its onboarding because failures are not captured.
+
+Knowledge acquisition starts from zero. Each iteration discovers gaps through real failures, and those gaps become the roadmap to better knowledge.
+
+## Decision
+
+### Zero-context execution attempts
+
+Periodically, an agent with no project memory attempts to follow guides end-to-end. The agent follows ONLY what is written: no inference, no prior knowledge from training data, no asking colleagues.
+
+### Failure capture protocol
+
+At each failure point, the agent:
+1. **Stops** — does not attempt to work around or guess
+2. **Documents** — creates an issue: which document, which step, what was missing
+3. **Continues** — attempts the next step (if possible) to find additional gaps
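+
+A minimal sketch of the stop, document, continue loop. It assumes each guide step can be wrapped in a callable that returns `None` on success or a description of what was missing. `Step`, `GapReport`, and `blocks_later_steps` are hypothetical, and turning each `GapReport` into a GitHub issue is left out:
+
+```python
+from dataclasses import dataclass
+from typing import Callable, List, Optional
+
+
+@dataclass
+class Step:
+    document: str
+    name: str
+    run: Callable[[], Optional[str]]  # None on success, else what was missing
+    blocks_later_steps: bool = False  # True if later steps cannot be attempted
+
+
+@dataclass
+class GapReport:
+    document: str
+    step: str
+    missing: str
+
+
+def attempt_guide(steps: List[Step]) -> List[GapReport]:
+    gaps: List[GapReport] = []
+    for step in steps:
+        missing = step.run()
+        if missing is None:
+            continue
+        # Stop: no workaround or guess; record the gap instead.
+        gaps.append(GapReport(step.document, step.name, missing))
+        # Continue to the next step only if this failure does not block it.
+        if step.blocks_later_steps:
+            break
+    return gaps
+```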
+
+### Retrospectives
+
+After completing a task, project milestone, or sprint, agents produce a retrospective artifact:
+- What worked well (patterns to repeat)
+- What failed or caused friction (patterns to avoid)
+- Actionable experiments for future workflows
+
+Retrospectives are a first-class knowledge artifact — they feed into documentation improvements, inform ADR amendments, and surface systemic issues that individual task failures cannot.
+
+### Knowledge artifacts (interim)
+
+Until documentation meets the ADR-004 quality target (the tabula rasa test), agents may create ephemeral artifacts:
+- Semantic indices of the codebase (call graphs, dependency maps)
+- Annotated walkthroughs of successful executions
+- "What I learned" summaries after completing a task
+- Retrospectives (see above)
+
+These are scaffolding that informs documentation improvements, not documentation themselves.
+
+### Maturity model
+
+| Level | State | Agent capability |
+|-------|-------|-----------------|
+| 0 | No docs | Cannot start; files issue for missing docs |
+| 1 | Partial docs | Follows docs, stops at gaps, files issues |
+| 2 | Complete docs (ADR-004) | Completes end-to-end without help |
+| 3 | Self-improving | Detects drift between docs and code, auto-files issues |
+
+### The self-improvement loop
+
+```
+Agent starts fresh → follows docs → hits failure →
+  files issue → issue gets fixed → next agent goes further →
+    hits next failure → files issue → ...
+      until end-to-end works from zero context
+```
+
+This runs continuously because code changes outpace documentation and different agent implementations fail at different points.
+
+## Consequences
+
+- (+) Documentation gaps become bugs with reproduction steps
+- (+) Priority ordering emerges naturally (most common failures surface first)
+- (+) The system self-improves without human identification of gaps
+- (+) Creates a natural definition of "docs are done" (Level 2 achieved)
+- (-) Generates issue volume that needs triage
+- (-) Requires periodic investment in zero-context test runs
+- (!) The gap between Level 1 and Level 2 may be large — patience required
+
+## References
+
+- Issue #138 — full RFC with open questions
+- ADR-004 — defines the quality target (tabula rasa test)
+- ADR-003 — governance for issues filed by failing agents
+- ADR-008 — Level 4 Definition of Done depends on this protocol
diff --git a/docs/decisions/008-definition-of-done.md b/docs/decisions/008-definition-of-done.md
new file mode 100644
index 0000000..e552ec8
--- /dev/null
+++ b/docs/decisions/008-definition-of-done.md
@@ -0,0 +1,82 @@
+# ADR-008: Definition of Done (progressive maturity)
+
+**Status:** proposed
+**Date:** 2026-05-19
+
+## Context
+
+"Done" is implicit and varies by contributor. Some consider a passing build sufficient; others expect documentation, tests, and deployment verification. Agents have no unambiguous checklist to know they have completed work. Over-engineering "done" early blocks velocity; under-defining it ships incomplete work.
+
+The definition must be progressive — rising as the project matures — so it does not block early momentum but ensures quality at scale.
+
+## Decision
+
+### Progressive levels
+
+**Level 1 — Basic (minimum viable):**
+- Code compiles without errors
+- Existing tests pass (no regressions)
+- New code has tests (unit level minimum)
+- Linting passes
+- PR description explains what and why
+- Linked issue exists
+
+**Level 2 — Standard (current project default):**
+- All of Level 1
+- Pre-commit hooks pass
+- CDK synth succeeds (if infrastructure changes)
+- Security scans pass (no new HIGH/CRITICAL findings)
+- Documentation updated if behavior changes
+- Starlight mirrors synced (if docs changed)
+
+**Level 3 — Rigorous (critical paths):**
+- All of Level 2
+- Integration or E2E test covers the happy path
+- Error paths tested
+- Reviewer approved (human or qualified agent)
+- Deployed to ephemeral stack and smoke-tested (if infrastructure)
+- ADR written (if architectural decision made)
+
+**Level 4 — Self-verifying (future target):**
+- All of Level 3
+- Tabula rasa agent can replicate the outcome using only docs
+- CI includes behavioral verification
+- Documentation drift detection passes
+
+### Default level by issue type
+
+| Issue type | Default level |
+|-----------|---------------|
+| Bug fix | Level 2 |
+| New feature | Level 2-3 (based on blast radius) |
+| Infrastructure/IAM change | Level 3 |
+| Documentation only | Level 1 + Starlight mirrors synced |
+| Security fix | Level 3 |
+| RFC/ADR implementation | Level 2 + ADR written |
+
+Issues may override by specifying `Done: Level N` in the body.
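+
+A sketch of resolving an issue's level, assuming the issue type is already known as one of the table's labels. The function name and the `high_blast_radius` input (used to pick between Level 2 and 3 for new features) are hypothetical; extra requirements such as "ADR written" are not modelled:
+
+```python
+import re
+
+# Defaults from the "Default level by issue type" table.
+DEFAULT_LEVELS = {
+    "bug fix": 2,
+    "infrastructure/iam change": 3,
+    "documentation only": 1,
+    "security fix": 3,
+    "rfc/adr implementation": 2,
+}
+
+# Explicit override written in the issue body, e.g. "Done: Level 3".
+OVERRIDE_RE = re.compile(r"\bDone:\s*Level\s+([1-4])\b")
+
+
+def resolve_done_level(issue_type: str, body: str, high_blast_radius: bool = False) -> int:
+    """Return the Definition of Done level: body override first, then the default."""
+    match = OVERRIDE_RE.search(body)
+    if match:
+        return int(match.group(1))
+    key = issue_type.strip().lower()
+    if key == "new feature":
+        # The table gives a 2-3 range; the blast-radius judgment is the caller's.
+        return 3 if high_blast_radius else 2
+    if key not in DEFAULT_LEVELS:
+        raise ValueError(f"no default level for issue type {issue_type!r}")
+    return DEFAULT_LEVELS[key]
+```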
+
+### Verification responsibility
+
+| Level | Who verifies |
+|-------|-------------|
+| 1 | CI (automated) |
+| 2 | CI + self-check by implementor |
+| 3 | CI + reviewer + implementor |
+| 4 | CI + reviewer + independent agent |
+
+## Consequences
+
+- (+) Agents have an unambiguous completion checklist
+- (+) Quality bar rises as the project matures
+- (+) Over-engineering is prevented (Level 1 for simple docs changes)
+- (+) Critical paths get rigorous verification (Level 3)
+- (-) Requires labeling or explicit level assignment per issue
+- (-) Level 4 is aspirational and depends on ADR-007 (knowledge acquisition)
+- (!) The project must eventually graduate from Level 2 to Level 3 default
+
+## References
+
+- Issue #139 — full RFC with open questions
+- ADR-003 — governance (defines when to start; this defines when to stop)
+- ADR-007 — knowledge acquisition (Level 4 depends on tabula rasa verification)
diff --git a/docs/decisions/009-security-posture-dev-agents.md b/docs/decisions/009-security-posture-dev-agents.md
new file mode 100644
index 0000000..6a67fd7
--- /dev/null
+++ b/docs/decisions/009-security-posture-dev-agents.md
@@ -0,0 +1,73 @@
+# ADR-009: Security posture and blast radius for development-time agents
+
+**Status:** proposed
+**Date:** 2026-05-19
+
+## Context
+
+The existing `SECURITY.md` covers runtime agent execution (inside MicroVMs). It does not cover **development-time agents** — those writing code, creating PRs, and modifying infrastructure in this repository. A development-time agent operates with the credentials of whoever invoked it, creating a risk of self-approval, policy modification, and unbounded blast radius.
+
+The core principle: **planners and implementors must be separated by context and ideally by identity. No self-approval.**
+
+## Decision
+
+### Role separation
+
+| Role | Can do | Cannot do |
+|------|--------|-----------|
+| **Planner** | Create/edit issues, write RFCs/ADRs, define roadmap and revisit vision | Write code, push branches, approve PRs |
+| **Implementor** | Write code, create PRs, push branches, run tests | Approve own PRs, merge own PRs, modify CI/security config |
+| **Reviewer** | Approve PRs, request changes, merge, suggest code (no commits) | Write code on the same PR being reviewed |
+| **Admin** | All of the above + modify policies, approve issues | Still requires 2P for policy changes |
+
+### Blast radius classification
+
+| Action | Risk | Gate |
+|--------|------|------|
+| Edit code in existing patterns | Low | CI + peer review |
+| Add new dependency | Medium | Security scan + review |
+| Modify IAM policy / security config | High | 2P review + admin approval |
+| Modify CI/CD workflow | High | 2P review + admin approval |
+| Modify branch protection / approval rules | Critical | Admin-only + audit trail |
+| Modify governance ADRs | Critical | Admin-only + 2P review |
+| Delete or force-push protected branches | Critical | Never automated; human-only |
+
+### 2P (two-person) review
+
+For High and Critical actions:
+- The author cannot be one of the two approvers
+- At least one approver must be a human
+- Approvals reference the specific risk being accepted
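+
+These rules can be checked mechanically. A sketch, assuming each approval record carries the approver's identity, whether that identity is human, and the risk it accepts; `Approval` and `satisfies_two_person_review` are hypothetical names, not a GitHub API:
+
+```python
+from dataclasses import dataclass
+from typing import List
+
+
+@dataclass(frozen=True)
+class Approval:
+    approver: str
+    is_human: bool
+    risk_accepted: str  # the specific High/Critical risk this approval covers
+
+
+def satisfies_two_person_review(author: str, approvals: List[Approval]) -> bool:
+    # The author's own approval never counts, and each approval must name its risk.
+    eligible = [
+        a for a in approvals if a.approver != author and a.risk_accepted.strip()
+    ]
+    # Two distinct approvers, at least one of them human.
+    if len({a.approver for a in eligible}) < 2:
+        return False
+    return any(a.is_human for a in eligible)
+```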
+
+### No self-approval (structural)
+
+- Branch protection requires review from someone other than the pusher
+- Approval cannot come from the last committer on the branch
+- If an agent plans AND implements, review must come from an identity that did neither
+- The identity that writes code cannot approve or merge it
+
+### Credential scoping
+
+| Agent context | Minimum credentials |
+|---------------|-------------------|
+| Planning (issues, RFCs) | GitHub Issues write, read-only repo |
+| Implementation (code, PRs) | Repo write, PR create, no merge capability |
+| Review | PR review write, no push capability |
+| Deployment | Separate deploy key, environment approval gate |
+
+## Consequences
+
+- (+) Prevents self-approval of dangerous changes
+- (+) Blast radius is explicit and enforceable
+- (+) Role separation enables audit trail
+- (+) 2P review catches compromised or confused agents
+- (-) Credential management complexity increases
+- (-) Small tasks require multi-identity orchestration
+- (!) Classic personal access tokens are coarse-grained (a single `repo` scope covers every repository the user can access); structural enforcement requires GitHub Apps or fine-grained tokens
+
+## References
+
+- Issue #140 — full RFC with open questions
+- `docs/design/SECURITY.md` — runtime agent security (complementary)
+- Cedar HITL gates (PR #88) — runtime tool-call governance
+- ADR-003 — governance (approval gates enforced here technically)
diff --git a/docs/decisions/010-error-recovery.md b/docs/decisions/010-error-recovery.md
new file mode 100644
index 0000000..dd3883e
--- /dev/null
+++ b/docs/decisions/010-error-recovery.md
@@ -0,0 +1,69 @@
+# ADR-010: Error recovery and rollback protocol
+
+**Status:** proposed
+**Date:** 2026-05-19
+
+## Context
+
+When merged code breaks something, the response is ad-hoc. Agents operating autonomously may merge code that passes CI but breaks integration. No protocol defines when to revert vs. fix forward, who decides, or how stacked PR chains recover.
+
+## Decision
+
+### Decision tree
+
+```
+Broken thing detected
+├─ Production affected (users impacted NOW)?
+│  └─ Yes → REVERT immediately, investigate after
+├─ Fix obvious and < 30 minutes?
+│  └─ Yes → Fix forward (new PR, not amend)
+├─ Stacked PR chain?
+│  └─ Yes → Pause dependent PRs, fix the base
+└─ Scope of damage unclear?
+   └─ Yes → REVERT (safe default), then investigate
+```
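+
+The tree can be encoded as an ordered series of checks. A sketch with hypothetical names; it assumes the tree's safe default (revert) also applies when none of the earlier branches match, a case the tree does not spell out:
+
+```python
+from enum import Enum
+
+
+class Response(Enum):
+    REVERT_NOW = "revert immediately, investigate after"
+    FIX_FORWARD = "fix forward in a new PR"
+    FIX_BASE = "pause dependent PRs, fix the base"
+    REVERT_SAFE_DEFAULT = "revert, then investigate"
+
+
+def choose_response(production_affected: bool,
+                    fix_obvious_under_30_min: bool,
+                    stacked_chain: bool) -> Response:
+    # Branches are checked in the same order as the tree.
+    if production_affected:
+        return Response.REVERT_NOW
+    if fix_obvious_under_30_min:
+        return Response.FIX_FORWARD
+    if stacked_chain:
+        return Response.FIX_BASE
+    # Scope unclear, or no earlier branch applied: revert as the safe default.
+    return Response.REVERT_SAFE_DEFAULT
+```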
+
+### Revert protocol
+
+1. Create a revert commit (not force-push) — preserves history
+2. Open an issue: what broke, why CI did not catch it, what the fix needs
+3. The fix goes through normal review (no rushing, no skipping gates)
+
+### Fix-forward protocol
+
+1. Only if the fix is obvious, small, and low-risk
+2. Must still go through PR + review
+3. If the fix introduces new complexity — revert instead
+
+### Stacked PR chain recovery
+
+1. Identify which PR introduced the breakage
+2. Pause/close all PRs above it
+3. Fix the base PR
+4. Rebase and re-evaluate dependent PRs
+5. Re-run CI on each before re-opening
+
+### Agents must NEVER do during recovery
+
+- Force-push to shared branches
+- Delete branches with others' work
+- Amend published commits
+- Skip review "because it's urgent"
+- Self-approve a revert
+
+## Consequences
+
+- (+) Clear decision tree prevents analysis paralysis during incidents
+- (+) Revert-first default limits blast radius
+- (+) Stacked chain recovery is defined (not improvised)
+- (+) History is preserved (revert commits, not force-push)
+- (-) Reverts create noise in git history
+- (-) Fix-forward temptation may lead to rushed fixes
+- (!) "Production affected" must be defined per deployment (self-hosted deployments vary)
+
+## References
+
+- Issue #141 — full RFC with open questions
+- ADR-003 — governance (no bypasses during recovery)
+- ADR-001 — stacked PRs (chain recovery protocol)
+- ADR-009 — security (revert authority tied to role)
diff --git a/docs/decisions/011-conflict-resolution.md b/docs/decisions/011-conflict-resolution.md
new file mode 100644
index 0000000..067d05e
--- /dev/null
+++ b/docs/decisions/011-conflict-resolution.md
@@ -0,0 +1,64 @@
+# ADR-011: Conflict resolution protocol
+
+**Status:** proposed
+**Date:** 2026-05-19
+
+## Context
+
+Multiple concurrent contributors — human or AI — will propose incompatible approaches, create merge conflicts, and disagree on design. Without a defined escalation path, work stalls or the loudest voice wins.
+
+## Decision
+
+### Escalation ladder
+
+```
+Level 1: Contributor discussion (PR comments, issue thread)
+   ↓ (no resolution within 2 interactions)
+Level 2: Request additional reviewer (fresh perspective)
+   ↓ (still no resolution)
+Level 3: Competing proposals in the issue body (explicit trade-off comparison)
+   ↓ (still no resolution)
+Level 4: Admin decision (binding, documented in issue body)
+```
+
+### Decision criteria
+
+When comparing approaches, evaluate on:
+1. **Correctness** — does it solve the stated problem?
+2. **Simplicity** — fewer moving parts wins when correctness is equal
+3. **Consistency** — follows existing codebase patterns?
+4. **Reversibility** — can we change our mind later?
+5. **Blast radius** — what breaks if this is wrong?
+
+### Merge conflict ownership
+
+| Situation | Who resolves |
+|-----------|-------------|
+| Two PRs modify same file, one merged first | Second PR's author rebases |
+| Stacked PR conflict from lower change | Lower PR author notifies; upper PRs rebase after stable |
+| Concurrent agents modified same module | First to merge wins; second adapts |
+| Architectural conflict (both valid) | Escalate to Level 3 |
+
+### Human vs. agent disagreement
+
+- Agents present evidence (code, tests, measurements) not authority
+- Humans can override but must document why
+- Agents do not repeatedly argue a rejected point
+- If an agent believes a human decision causes harm (security, data loss), it escalates to admin
+
+## Consequences
+
+- (+) Disagreements have a defined path to resolution
+- (+) Merge conflicts have clear ownership
+- (+) Competing approaches are compared on criteria, not authority
+- (+) Admin decision is the final backstop (no infinite loops)
+- (-) Escalation takes time; may slow delivery
+- (-) Level 3 (written trade-off) requires effort
+- (!) Must not become a veto mechanism for slow contributors
+
+## References
+
+- Issue #142 — full RFC with open questions
+- ADR-003 — governance (issue body as resolution record)
+- ADR-005 — feedback loop (reviewer disagreements feed into this)
+- ADR-009 — security (authority levels for decisions)
diff --git a/docs/decisions/012-operational-knowledge-stack.md b/docs/decisions/012-operational-knowledge-stack.md
new file mode 100644
index 0000000..273b6e3
--- /dev/null
+++ b/docs/decisions/012-operational-knowledge-stack.md
@@ -0,0 +1,265 @@
+# ADR-012: Operational knowledge as a three-layer stack (Decision → Guide → Skill)
+
+**Status:** proposed
+**Date:** 2026-05-19
+
+## Context
+
+Several ADRs in this repository contain operational runbook material embedded directly in the decision record. ADR-003 (contribution governance) prescribes a full pre-start review checklist. ADR-010 (error recovery) defines a decision tree and step-by-step protocols. ADR-008 (definition of done) provides per-issue-type checklists.
+
+This creates three problems:
+
+1. **Stale procedures** — Teams hesitate to update ADRs for minor procedural tweaks (timing thresholds, label names), so runbooks drift from practice.
+2. **Agent execution gap** — Agents must parse prose ADRs, extract the operational steps, and interpret judgment calls. The ADR format is optimized for decision rationale, not execution.
+3. **Persona mismatch** — A planner reading ADR-003 for the governance philosophy gets bogged down in GraphQL query syntax. An implementor executing the pre-start checklist must skip rationale paragraphs to find the steps.
+
+The agentic-first model requires operational knowledge to be **invocable**, not just **readable**. An agent should execute a governance workflow the same way it invokes a tool — with defined inputs, gates, and outputs.
+
+## Decision
+
+### Three-layer operational knowledge stack
+
+Every operational procedure identified in an ADR is decomposed into three layers:
+
+```
+┌─────────────────────────────────────────┐
+│  Layer 1: ADR (Decision Record)         │  Immutable-ish
+│  WHY we do it this way                  │  Changes: decision is superseded
+│  Consumer: architects, future deciders  │
+└─────────────────────────┬───────────────┘
+                          │ references
+┌─────────────────────────▼───────────────┐
+│  Layer 2: Guide (Reference Document)    │  Living document
+│  WHAT to do, organized by persona       │  Changes: process is refined
+│  Consumer: humans + agents needing      │
+│  context                                │
+└─────────────────────────┬───────────────┘
+                          │ operationalized by
+┌─────────────────────────▼───────────────┐
+│  Layer 3: Skill (Executable Runbook)    │  Versioned, invocable
+│  HOW to execute, with gates and outputs │  Changes: implementation shifts
+│  Consumer: agents during execution      │
+└─────────────────────────────────────────┘
+```
+
+### Layer definitions
+
+**Layer 1 — ADR (Decision Record)**
+
+- Records the architectural or process decision and its rationale
+- States WHAT was decided and WHY
+- Does NOT contain step-by-step procedures (those belong in Layer 2/3)
+- References the guide(s) that operationalize the decision
+- Changes only when the decision itself is superseded or amended
+
+**Layer 2 — Guide (Reference Document)**
+
+- Lives in `docs/guides/`
+- Organized by persona (planner, implementor, reviewer, admin)
+- Contains the WHAT and WHEN — what to do in which situations
+- Includes context that helps humans (and agents needing background) understand the workflow
+- References the ADR for justification
+- Links to the skill(s) that mechanize the workflow
+- Changes when the process is refined
+
+**Layer 3 — Skill (Executable Runbook)**
+
+- Lives as a Claude Code skill (or plugin skill) — invocable by name
+- Encodes the HOW — the mechanical execution with explicit gates, inputs, outputs
+- Structured as bounded, invocable units with clear entry/exit criteria
+- An agent invokes the skill rather than parsing the guide/ADR
+- References the guide for context when judgment is needed
+- Changes when implementation details shift
+
+### Reference direction
+
+References for justification and context point upward; the ADR adds one downward link so readers can find the workflow:
+
+- Skill → references Guide (for context)
+- Guide → references ADR (for justification)
+- ADR → references Guide (for operationalization, "see Guide X for the workflow")
+
+Because each layer depends on the one above it, a change at any layer triggers review of the layers below:
+
+- ADR amended → review Guide → review Skill
+- Guide refined → review Skill
+- Skill updated → no upstream change needed (unless the procedure itself changed)
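+
+The review cascade can be sketched as a function over the ordered stack. This is illustrative only: `Layer` and `layersToReview` are hypothetical names, not existing repository tooling.
+
+```typescript
+// Hypothetical model: layers ordered from top (ADR) to bottom (Skill).
+type Layer = "adr" | "guide" | "skill";
+
+const STACK: readonly Layer[] = ["adr", "guide", "skill"];
+
+// A change at one layer triggers review of every layer below it.
+// A skill change triggers nothing further because it is the bottom layer.
+function layersToReview(changed: Layer): Layer[] {
+  return STACK.slice(STACK.indexOf(changed) + 1);
+}
+
+console.log(layersToReview("adr"));   // [ 'guide', 'skill' ]
+console.log(layersToReview("skill")); // []
+```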
+
+### When a layer is NOT needed
+
+| Situation | Layers needed |
+|-----------|---------------|
+| Pure policy decision (no steps to follow) | ADR only |
+| Decision with human-executed steps (rare, non-repeatable) | ADR + Guide |
+| Decision with agent-executable procedure | ADR + Guide + Skill |
+| Lightweight procedure (< 3 steps, no gates) | ADR + Guide (skill is overhead) |
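+
+The table can be read as a decision procedure. The sketch below is hypothetical (the `Procedure` fields are an assumed way of describing a procedure) and checks the cheapest option first.
+
+```typescript
+// Hypothetical description of a procedure found in an ADR.
+interface Procedure {
+  steps: number;            // 0 means a pure policy decision
+  hasGates: boolean;        // does any step block progress on a condition?
+  agentExecutable: boolean; // repeatable and run by agents, not a rare human task
+}
+
+type Layer = "adr" | "guide" | "skill";
+
+// Encodes the "When a layer is NOT needed" table, cheapest option first.
+function requiredLayers(p: Procedure): Layer[] {
+  if (p.steps === 0) return ["adr"];
+  const lightweight = p.steps < 3 && !p.hasGates; // a skill would be overhead
+  if (lightweight || !p.agentExecutable) return ["adr", "guide"];
+  return ["adr", "guide", "skill"];
+}
+
+// ADR-003's pre-start review: eight steps, hard gates, run by agents.
+console.log(requiredLayers({ steps: 8, hasGates: true, agentExecutable: true })); // [ 'adr', 'guide', 'skill' ]
+```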
+
+### ADR content rules (post-adoption)
+
+After adoption, ADRs:
+- **MUST** contain: Context, Decision (the choice made), Consequences, References
+- **MUST NOT** contain: Step-by-step procedures, checklists with >3 items, decision trees with branches, protocol sequences
+- **SHOULD** contain: A one-paragraph summary of the operational approach (enough to understand without reading the guide)
+- **SHOULD** reference: The guide that operationalizes the decision
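+
+Parts of these rules are mechanically checkable. The following hypothetical linter assumes ADRs use level-2 headings such as `## Context` and treats a run of consecutive task-list or numbered-list lines as one checklist; judging whether prose amounts to a protocol sequence still needs a reviewer.
+
+```typescript
+// Hypothetical ADR linter for the post-adoption content rules.
+const REQUIRED_SECTIONS = ["Context", "Decision", "Consequences", "References"];
+
+interface AdrLintResult {
+  missingSections: string[];
+  longChecklists: number; // runs of more than 3 list items
+}
+
+function lintAdr(markdown: string): AdrLintResult {
+  const lines = markdown.split("\n");
+  // Collect level-2 headings only ("### ..." does not match "## ").
+  const headings = new Set(
+    lines.filter((l) => l.startsWith("## ")).map((l) => l.slice(3).trim()),
+  );
+  const missingSections = REQUIRED_SECTIONS.filter((s) => !headings.has(s));
+
+  let longChecklists = 0;
+  let run = 0;
+  for (const line of [...lines, ""]) { // trailing "" flushes the last run
+    if (/^\s*(- \[[ xX]\]|\d+\.)\s/.test(line)) {
+      run += 1;
+    } else {
+      if (run > 3) longChecklists += 1;
+      run = 0;
+    }
+  }
+  return { missingSections, longChecklists };
+}
+```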
+
+Existing ADRs are updated incrementally (not rewritten) — operational content is extracted, and a reference to the new guide/skill is added.
+
+### Skill structure requirements
+
+Skills that operationalize ADRs must:
+- State which ADR/guide they implement (in frontmatter or header)
+- Define explicit gates (conditions that MUST be true to proceed)
+- Define explicit outputs (what the skill produces on completion)
+- Be independently invocable (no implicit state from prior skills)
+- Fail loudly at gates (not silently skip)
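+
+One way to express these requirements as a type contract is sketched below. The field names are assumptions for illustration and do not describe the Claude Code skill file format.
+
+```typescript
+// Hypothetical skill contract mirroring the requirements above.
+interface Gate<Ctx> {
+  name: string;
+  check: (ctx: Ctx) => boolean;
+}
+
+interface Skill<Ctx, Out> {
+  implements: string;     // ADR/guide this skill operationalizes
+  gates: Gate<Ctx>[];
+  run: (ctx: Ctx) => Out; // reached only when every gate passes
+}
+
+class GateFailure extends Error {
+  constructor(readonly gate: string, skill: string) {
+    super(`[${skill}] gate "${gate}" failed: STOP`);
+  }
+}
+
+// All state arrives through ctx (no implicit state from earlier skills),
+// and a failing gate throws instead of being skipped.
+function invoke<Ctx, Out>(skill: Skill<Ctx, Out>, ctx: Ctx): Out {
+  for (const gate of skill.gates) {
+    if (!gate.check(ctx)) throw new GateFailure(gate.name, skill.implements);
+  }
+  return skill.run(ctx);
+}
+```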
+
+## Example: ADR-003 decomposition
+
+ADR-003 (Contribution Governance) is the first ADR to be decomposed under this pattern because it is the most frequently executed procedure and the dependency root for other governance ADRs.
+
+### Current state (ADR-003 contains everything)
+
+ADR-003 currently holds:
+- The decision to govern contributions (rationale) ✓ belongs in ADR
+- Pre-start review checklist (8 mechanical steps) ✗ belongs in Guide + Skill
+- Priority evaluation procedure ✗ belongs in Guide + Skill
+- Predecessor validation with GraphQL queries ✗ belongs in Skill
+- Cross-reference audit steps ✗ belongs in Guide + Skill
+- Work-in-progress discipline rules ✗ belongs in Guide
+- Completion and handoff procedure ✗ belongs in Guide + Skill
+
+### Target state (three layers)
+
+**Layer 1 — ADR-003 (slimmed)**
+
+Retains:
+- Context (why governance is needed for async agents)
+- Decision summary: "Every contribution follows: issue → approval → assignment → pre-start validation → implementation → completion"
+- The principles: no PRs without issues, issue quality bar, admin approval gate, no self-approval, GraphQL as authoritative dependency source
+- Consequences
+- Reference: "See `docs/guides/CONTRIBUTOR_WORKFLOW.md` for the full workflow"
+
+Removes (extracted to Guide/Skill):
+- The detailed pre-start review checklist
+- GraphQL query specifics
+- Step-by-step completion protocol
+
+**Layer 2 — `docs/guides/CONTRIBUTOR_WORKFLOW.md`**
+
+Organized by persona:
+
+```markdown
+# Contributor Workflow
+
+> Operationalizes [ADR-003](../decisions/003-contribution-governance.md)
+
+## For Planners
+- Issue quality bar (what makes an issue "ready")
+- Approval process
+- Priority labeling
+- Dependency graph maintenance
+
+## For Implementors
+- How to pick up an issue
+- Pre-start review (summary — invoke skill for execution)
+- Work-in-progress signals
+- Completion criteria (references ADR-008 guide)
+
+## For Reviewers
+- Review comment classification (references ADR-005 guide)
+- When to block vs. approve
+- Propagation responsibilities
+```
+
+**Layer 3 — Skills (invocable by agents)**
+
+| Skill | Inputs | Gates | Outputs |
+|-------|--------|-------|---------|
+| `pickup-issue` | Issue number | Issue approved, unassigned, no unresolved conflicts, predecessors complete (GraphQL check) | Assignment confirmed, "Starting implementation" comment |
+| `validate-dependencies` | Issue number | GraphQL `blockedBy` returns no open blockers | Dependency report (clear / blocked with reason) |
+| `complete-work` | Issue number, PR number | CI passes, DoD level met (ADR-008), no stale assignments | Completion comment, follow-up issues created |
+| `cross-reference-audit` | Issue number | No duplicate issues, no conflicting open PRs | Audit report (clear / conflicts listed) |
+
+Each skill is a bounded unit. An agent picking up work invokes `pickup-issue` — it doesn't read ADR-003 and improvise.
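+
+A minimal sketch of the `pickup-issue` gates follows. It assumes the issue fields have already been fetched (labels and assignees from GitHub, blockers from `validate-dependencies`, conflicts from `cross-reference-audit`); `IssueSnapshot` and `pickupIssue` are hypothetical names.
+
+```typescript
+// Hypothetical snapshot of the fields pickup-issue gates on.
+interface IssueSnapshot {
+  number: number;
+  labels: string[];
+  assignees: string[];
+  openBlockers: number[];   // open issues from the blockedBy relation
+  conflictingPrs: number[]; // open PRs flagged by cross-reference-audit
+}
+
+type Pickup =
+  | { ok: true; assignee: string; comment: string }
+  | { ok: false; reason: string };
+
+const list = (ns: number[]) => ns.map((n) => `#${n}`).join(", ");
+
+function pickupIssue(issue: IssueSnapshot | undefined, agent: string): Pickup {
+  if (!issue) return { ok: false, reason: "Issue number required; none provided. STOP." };
+  const id = `#${issue.number}`;
+  if (!issue.labels.includes("approved")) return { ok: false, reason: `${id} lacks the approved label.` };
+  if (issue.assignees.length > 0) return { ok: false, reason: `${id} is already assigned to ${issue.assignees.join(", ")}.` };
+  if (issue.openBlockers.length > 0) return { ok: false, reason: `${id} is blocked by ${list(issue.openBlockers)}.` };
+  if (issue.conflictingPrs.length > 0) return { ok: false, reason: `${id} conflicts with open PRs ${list(issue.conflictingPrs)}.` };
+  // All gates passed: the skill would now self-assign and post the comment.
+  return { ok: true, assignee: agent, comment: "Starting implementation" };
+}
+```
+
+Because the first failing gate returns only a reason, an agent cannot reach the "Starting implementation" comment without passing every gate.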
+
+## Why prose alone fails: observed failure mode
+
+This ADR was itself initially created in violation of ADR-003. The agent (author) had ADR-003 loaded in context, analyzed it, called it "ready for contributing" — then immediately began implementation without creating an issue, requesting approval, or self-assigning.
+
+**The rationalization chain:**
+1. "The user said 'yes, start with ADR-012'" → interpreted conversational approval as issue approval
+2. "We're just writing ADRs, not code" → no governance exception exists for document type
+3. "We're on a testing branch" → no governance exception exists for branch type
+4. "Momentum — we're exploring" → governance exists precisely to interrupt unstructured momentum
+
+**What this proves:** An agent with full knowledge of the governance rules will still bypass them when the rules are prose-only. The agent *understood* ADR-003 intellectually but had no structural enforcement preventing violation. Reading a rule is not the same as being gated by it.
+
+**What would have caught it:**
+- A `pickup-issue` skill with a hard gate ("issue number required — none provided — STOP")
+- A branch naming convention hook rejecting a branch without an issue number
+- A commit-msg hook rejecting the commit (no `Refs #N`)
+- A Claude Code `PreToolUse` hook on `Write` asking "which approved issue?"
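+
+The branch-naming check is the simplest of these to mechanize. The sketch below is a hypothetical hook helper; it uses the `(feat|fix|chore|docs)/<issue-number>-short-description` convention from ADR-003, and restricting the description to lowercase kebab-case is an added assumption.
+
+```typescript
+// Hypothetical branch-name check; lowercase kebab-case descriptions are assumed.
+const BRANCH_PATTERN = /^(feat|fix|chore|docs)\/(\d+)-[a-z0-9]+(-[a-z0-9]+)*$/;
+
+// Returns the referenced issue number, or null when the branch has none.
+function issueFromBranch(branch: string): number | null {
+  const match = BRANCH_PATTERN.exec(branch);
+  return match ? Number(match[2]) : null;
+}
+
+for (const name of ["feat/148-operational-knowledge-stack", "testing", "docs/adr-012"]) {
+  const issue = issueFromBranch(name);
+  console.log(issue === null ? `REJECT ${name}: no issue reference` : `ok ${name} -> #${issue}`);
+}
+```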
+
+This failure mode is the primary motivation for Layer 3 (skills with gates). Prose governance (Layer 1) establishes the rule. Guides (Layer 2) explain how to follow it. But only executable skills with hard gates (Layer 3) *enforce* it at the point of action.
+
+## Migration plan
+
+### Phase 1: Establish pattern (this ADR)
+
+- Adopt this ADR
+- No existing ADRs are modified yet (operational content stays in place until guides/skills exist)
+
+### Phase 2: Decompose ADR-003 (proof of concept)
+
+- Create `docs/guides/CONTRIBUTOR_WORKFLOW.md`
+- Create skills: `pickup-issue`, `validate-dependencies`, `complete-work`, `cross-reference-audit`
+- Slim ADR-003 to decision + rationale + reference to guide
+- Validate: an agent can invoke the skills and complete the governance workflow
+
+### Phase 3: Decompose remaining ADRs (incremental)
+
+Priority order (by execution frequency and mechanical content):
+
+| ADR | Guide | Skills |
+|-----|-------|--------|
+| 010 (Error Recovery) | `ERROR_RECOVERY.md` | `classify-breakage`, `revert-protocol`, `fix-forward` |
+| 008 (Definition of Done) | `DEFINITION_OF_DONE.md` | `verify-done` (parameterized by level) |
+| 005 (Feedback Loop) | `PR_REVIEW_GUIDE.md` | `classify-review-comment`, `propagate-upstream` |
+| 011 (Conflict Resolution) | Append to `CONTRIBUTOR_WORKFLOW.md` | `resolve-conflict` (escalation ladder) |
+
+ADRs without operational content (001, 002, 004, 006, 007, 009) remain unchanged.
+
+### Phase 4: Plugin marketplace (future)
+
+Skills become shareable across projects:
+- Fork governance skills for team-specific thresholds
+- Compose skills from multiple ADRs into project-specific workflows
+- Version skills independently from the ADRs that justify them
+
+## Consequences
+
+- (+) ADRs stay stable as decision records — not burdened with procedure maintenance
+- (+) Guides serve the human reader organized by what they need to do
+- (+) Skills make agents execute consistently — no prose interpretation, no drift
+- (+) Change cadence is appropriate per layer — procedures evolve without "amending an ADR"
+- (+) The three layers serve different consumers without redundancy
+- (+) Skills are testable — you can verify an agent follows the procedure correctly
+- (+) Hard gates in skills prevent the "understood but violated" failure mode
+- (-) Three artifacts per procedure increases maintenance surface
+- (-) Migration of existing ADRs requires effort
+- (-) Skill development requires understanding the skill format and tooling
+- (!) Reference chain integrity must be maintained — a broken link between layers means drift goes undetected
+- (!) Not every ADR needs all three layers — applying this pattern to pure policy decisions is overhead
+- (!) Without Layer 3 enforcement, Layers 1 and 2 are advisory-only — agents WILL rationalize bypasses
+
+## References
+
+- Issue #148 — implementation tracking for this ADR
+- ADR-003 — first decomposition target (contribution governance); enforcement mechanisms added
+- ADR-004 — documentation quality standard (guides must meet tabula rasa test)
+- ADR-007 — knowledge acquisition (skills enable Level 3 self-improving)
+- ADR-008 — definition of done (skill `verify-done` is a natural fit)
+- ADR-010 — error recovery (decision tree is a natural skill)
+- ADR-013 (proposed) — tiered validation pyramid; depends on this ADR for skill-based agent interaction with validation tiers
+- [agentskills.io](https://agentskills.io/) — skill marketplace concept for shareable operational knowledge
+- Claude Code plugin/skill format — the implementation vehicle for Layer 3
diff --git a/docs/decisions/013-tiered-validation-pyramid.md b/docs/decisions/013-tiered-validation-pyramid.md
new file mode 100644
index 0000000..0c81750
--- /dev/null
+++ b/docs/decisions/013-tiered-validation-pyramid.md
@@ -0,0 +1,213 @@
+# ADR-013: Tiered validation pyramid for agentic-first development
+
+**Status:** proposed
+**Date:** 2026-05-19
+
+## Context
+
+The current validation architecture has two operational tiers:
+
+- **Pre-commit hooks** (< 5s) — formatting, secrets scan, file-level linting
+- **Remote CI** (5–20 min) — full build, test, synth, security scans, deploy verification
+
+The gap between these tiers is significant. When an agent (or human) makes a change that passes pre-commit but fails in CI, the feedback loop is:
+
+```
+Write code → commit → push → wait 5-20 min → CI fails →
+  read failure → fix → commit → push → wait 5-20 min → ...
+```
+
+For a human, this is annoying. For an autonomous agent, this is catastrophic:
+
+- **Compute waste** — the agent idles or context-switches while waiting for remote results
+- **Context loss** — by the time CI reports back, the agent may have compacted context or moved on
+- **Cascade failures** — in a stacked PR chain (ADR-001), a CI failure on PR 1 blocks PRs 2–N, multiplying the wait
+- **Cost amplification** — each round-trip costs inference tokens for the agent to re-read the failure, re-analyze, and re-attempt
+
+The root cause: there is no **Tier 2** — a local, fast, high-fidelity validation layer that catches integration-level issues *before* pushing to remote.
+
+### What exists today
+
+| Tier | Time | What it catches | Gap |
+|------|------|-----------------|-----|
+| Pre-commit (Tier 0) | < 5s | Formatting, secrets, trailing whitespace | None — works well |
+| `mise run build` (Tier 1) | 30–90s | Compile, unit tests, CDK synth, docs sync, linting | Partial: available but not gated on push |
+| Remote CI (Tier 3) | 5–20 min | Full matrix, security, E2E, deploy | Authoritative but slow |
+| **Local integration (Tier 2)** | — | **Does not exist** | Integration-level validation without remote round-trip |
+
+### Agentic-first motivation
+
+In a repo where agents run autonomously (ABCA's own design goal), validation speed directly determines:
+
+- **Throughput** — an agent with 30s feedback loops delivers 10–20x more iterations per hour than one with 15-minute loops
+- **Quality** — fast feedback enables test-driven approaches; slow feedback encourages "push and pray"
+- **Cost** — fewer remote CI runs, fewer wasted inference tokens on retry cycles
+- **Autonomy** — an agent that can self-validate locally needs fewer human interventions
+
+## Decision
+
+### The validation pyramid
+
+```
+                    ┌─────────┐
+                    │ Tier 3  │  Remote CI (authoritative)
+                    │ 5-20min │  Full matrix, deploy, E2E
+                   ─┴─────────┴─
+                  ┌─────────────┐
+                  │   Tier 2    │  Local sandbox (high-fidelity)
+                  │  1-5 min    │  Integration, ephemeral stack
+                 ─┴─────────────┴─
+                ┌─────────────────┐
+                │     Tier 1      │  Local build (fast check)
+                │    30-90s       │  Compile, unit test, synth
+               ─┴─────────────────┴─
+              ┌─────────────────────┐
+              │       Tier 0        │  Pre-commit (gate)
+              │       < 5s          │  Format, lint, secrets
+              └─────────────────────┘
+```
+
+Each tier is **necessary but not sufficient** — passing a lower tier is required before attempting the next. Higher tiers never repeat work done by lower tiers.
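+
+The ordering rule can be sketched as a runner that climbs the tiers and stops at the first failure. Each tier's `run` is a hypothetical stand-in for the real commands (hooks, `mise` tasks, CI), reduced to a pass/fail result.
+
+```typescript
+// Hypothetical tier description; level 0 = pre-commit ... 3 = remote CI.
+interface Tier {
+  level: number;
+  name: string;
+  run: () => boolean;
+}
+
+interface PyramidReport {
+  passed: string[];
+  failedAt?: string;
+  skipped: string[]; // higher tiers never attempted
+}
+
+// Lower tiers are preconditions for higher ones, so the first failure stops the climb.
+function climb(tiers: Tier[]): PyramidReport {
+  const ordered = tiers.slice().sort((a, b) => a.level - b.level);
+  const passed: string[] = [];
+  for (let i = 0; i < ordered.length; i++) {
+    if (!ordered[i].run()) {
+      return { passed, failedAt: ordered[i].name, skipped: ordered.slice(i + 1).map((t) => t.name) };
+    }
+    passed.push(ordered[i].name);
+  }
+  return { passed, skipped: [] };
+}
+```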
+
+### Tier definitions
+
+**Tier 0 — Pre-commit (< 5s, gates every commit)**
+
+- Trailing whitespace, end-of-file fix
+- Merge conflict markers
+- Secrets scan (gitleaks)
+- ESLint (file-level, staged files only)
+- Docs sync check (no stale mirrors)
+- YAML/JSON syntax validation
+
+Status: **Implemented** (prek hooks)
+
+**Tier 1 — Local build (30–90s, gates push)**
+
+- TypeScript compilation (all packages)
+- Unit test suite (Jest)
+- CDK synth (CloudFormation template generation)
+- Agent quality checks (Python linting, type checking)
+- Docs site build (astro check)
+- Type sync drift (CDK ↔ CLI types in sync)
+- Constants drift (cross-language contract check)
+
+Status: **Partially implemented** — available as `mise run build` but not enforced as a push gate. Agents can invoke this but often skip it.
+
+Requirement: Make `mise run build` (or a subset) the pre-push gate. Consider splitting into `mise run check:fast` (compile + lint, 30s) and `mise run check:full` (compile + test + synth, 90s).
+
+**Tier 2 — Local sandbox (1–5 min, on-demand before PR)**
+
+This tier does not exist today. It should provide:
+
+- Container-based integration tests against mocked AWS services (LocalStack or moto)
+- CDK deploy to a local/ephemeral sandbox (validate IAM, resource creation without real cloud)
+- Agent runtime smoke test (run the agent pipeline against a test repo in a local container)
+- Cross-package integration (API → handler → agent contract verification)
+- Policy validation (Cedar policy evaluation against test fixtures)
+
+Status: **Gap — does not exist.** This is the primary investment needed.
+
+Progressive build-out:
+
+| Phase | Capability | Mechanism | Catches |
+|-------|-----------|-----------|---------|
+| 2a | Container integration tests | `mise run test:integration` → Docker Compose + LocalStack | AWS API call failures, DynamoDB schema issues, SQS message format |
+| 2b | Agent pipeline smoke | `mise run test:agent-smoke` → build agent container, run against fixture repo | Agent crashes, tool failures, prompt regressions |
+| 2c | Ephemeral stack deploy | `mise run deploy:ephemeral` → CDK deploy to a disposable environment with auto-destroy | IAM permission gaps (ADR-002 preflight), resource wiring, real API behavior |
+| 2d | Full local sandbox | `mise run sandbox` → MicroVM matching prod topology | End-to-end flow in production-equivalent isolation |
+
+**Tier 3 — Remote CI (5–20 min, authoritative, gates merge)**
+
+- Full test matrix (multiple Node versions if applicable)
+- Security scans (Semgrep SAST, OSV deps, Grype container, Retire.js, zizmor)
+- CDK diff against deployed stack
+- Multi-account deployment verification
+- E2E tests against real AWS services
+- Performance/cost regression checks
+- Documentation mutation check (fail if Starlight mirrors are stale)
+
+Status: **Implemented** (GitHub Actions). This remains the authoritative gate for merge.
+
+### Enforcement model
+
+| Event | Required tier | Enforcement |
+|-------|--------------|-------------|
+| `git commit` | Tier 0 | Pre-commit hook (prek) |
+| `git push` | Tier 1 | Pre-push hook |
+| PR created/updated | Tier 3 | GitHub Actions required status checks |
+| Agent self-validation (before PR) | Tier 1 + Tier 2 (when available) | Skill-driven (agent invokes `validate-locally`) |
+| Merge | Tier 3 passed + reviewer approved | Branch protection |
+
+### Agent interaction model
+
+Agents interact with validation tiers through skills (depends on ADR-012 for the skill model):
+
+```
+Agent completes implementation
+  → invokes `validate-locally` skill
+    → skill runs Tier 1 (`mise run check:full`)
+    → if Tier 2 available: runs Tier 2 (`mise run test:integration`)
+    → reports: PASS (safe to push) / FAIL (fix before push, here's why)
+  → agent fixes failures locally (fast loop)
+  → pushes only when local validation passes
+  → Tier 3 runs remotely (confirmatory, not exploratory)
+```
+
+The critical shift: **Tier 3 becomes confirmatory, not exploratory.** Agents should not discover failures in remote CI — they should confirm that locally-validated work passes the authoritative gate.
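+
+A hypothetical `validate-locally` skill might look like the sketch below. The task names `check:full` and `test:integration` are the ones this ADR proposes, and the command runner is injected so the sketch does not assume how commands are executed. It also applies the design constraint that Tier 2 warns rather than blocks until it is stable.
+
+```typescript
+// Hypothetical runner: executes a mise task and reports its exit code and output.
+type RunTask = (task: string) => { exitCode: number; output: string };
+
+interface Tier2Status {
+  available: boolean; // tooling (Docker, LocalStack) present
+  stable: boolean;    // graduated from warn-only to blocking
+}
+
+type Verdict =
+  | { result: "PASS"; ran: string[]; warnings: string[] }
+  | { result: "FAIL"; task: string; output: string };
+
+function validateLocally(run: RunTask, tier2: Tier2Status): Verdict {
+  const ran: string[] = [];
+  const warnings: string[] = [];
+
+  const tier1 = run("check:full");
+  ran.push("check:full");
+  if (tier1.exitCode !== 0) return { result: "FAIL", task: "check:full", output: tier1.output };
+
+  if (!tier2.available) {
+    warnings.push("Tier 2 skipped: tooling unavailable (note this in the PR)");
+  } else {
+    const tier2Result = run("test:integration");
+    ran.push("test:integration");
+    if (tier2Result.exitCode !== 0) {
+      if (tier2.stable) return { result: "FAIL", task: "test:integration", output: tier2Result.output };
+      warnings.push(`Tier 2 failed but is not yet a gate: ${tier2Result.output}`);
+    }
+  }
+  return { result: "PASS", ran, warnings };
+}
+```
+
+A `FAIL` verdict carries the failing task's output so the agent can fix it in the local loop; a `PASS` with warnings still lets it push, with the skipped or soft-failed tier recorded for the PR description.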
+
+### Investment priority
+
+The gap analysis dictates priority:
+
+| Priority | Investment | Impact |
+|----------|-----------|--------|
+| P0 | Enforce Tier 1 as pre-push gate | Eliminates "pushed without building" class of CI failures |
+| P1 | `mise run test:integration` (Tier 2a, LocalStack) | Estimated to eliminate 60%+ of CI-only failures (AWS API contract mismatches) |
+| P2 | Agent smoke test (Tier 2b) | Catches agent runtime regressions before PR |
+| P3 | Ephemeral stack deploy (Tier 2c) | Catches IAM/wiring issues that only surface in real deployment |
+| P4 | Full local sandbox (Tier 2d) | Production-equivalent local validation (long-term target) |
+
+### Design constraints
+
+- **Tier 2 must not require cloud credentials for basic operation** — agents running in isolation (MicroVM, CI runner) need to validate without AWS access. LocalStack/moto fills this.
+- **Tier 2 must be optional until stable** — a failing Tier 2 should warn, not block, during build-out. Once stable, it becomes a gate.
+- **Tier 2 must be cacheable** — container images, LocalStack state, and fixture repos should be cached between runs. An agent shouldn't rebuild the world every time.
+- **No tier should duplicate work from a lower tier** — if Tier 0 checks formatting, Tier 1 does not re-check it. If Tier 1 runs unit tests, Tier 3 does not re-run them (it may run *additional* tests but not the same ones).
+
+### Escape hatches
+
+| Situation | Allowed bypass |
+|-----------|---------------|
+| Hotfix with production down | Skip Tier 2, expedite Tier 3 review |
+| Documentation-only change | Tier 0 + Tier 1 (synth not needed) |
+| Dependency bump (Dependabot) | Tier 0 + Tier 3 (CI validates compatibility) |
+| Agent cannot run Tier 2 (tooling unavailable) | Push with Tier 1 only, note in PR that Tier 2 was skipped |
+
+Escape hatches must be explicit (noted in PR description, not silent).
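+
+The escape hatches can be kept explicit by producing the PR note alongside the tier plan. In this hypothetical sketch, the plan lists only the tiers run locally; Tier 3 is omitted because the enforcement model requires it on every PR before merge. The `ChangeKind` names are illustrative.
+
+```typescript
+// Hypothetical change categories matching the escape-hatch table.
+type ChangeKind = "standard" | "hotfix" | "docs-only" | "dependency-bump" | "tier2-unavailable";
+
+interface LocalPlan {
+  tiers: number[]; // tiers run locally before pushing
+  prNote?: string; // any bypass must be stated in the PR description
+}
+
+const PLANS: Record<ChangeKind, LocalPlan> = {
+  standard: { tiers: [0, 1, 2] }, // once Tier 2 exists
+  hotfix: { tiers: [0, 1], prNote: "Hotfix: Tier 2 skipped; Tier 3 review expedited." },
+  "docs-only": { tiers: [0, 1], prNote: "Docs-only change: CDK synth not needed." },
+  "dependency-bump": { tiers: [0], prNote: "Dependency bump: compatibility validated by Tier 3." },
+  "tier2-unavailable": { tiers: [0, 1], prNote: "Tier 2 skipped: tooling unavailable." },
+};
+
+function planFor(kind: ChangeKind): LocalPlan {
+  return PLANS[kind];
+}
+```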
+
+## Consequences
+
+- (+) Agent feedback loops drop from 15 minutes to 30–90 seconds for most issues
+- (+) Remote CI failure rate drops — issues caught locally before push
+- (+) Agents can self-validate autonomously without waiting for external systems
+- (+) Investment is progressive — each tier delivers value independently
+- (+) Clear ownership: Tier 0–2 are developer/agent responsibility; Tier 3 is platform responsibility
+- (+) Cost reduction — fewer CI minutes wasted on obviously-broken pushes
+- (-) Tier 2 infrastructure requires maintenance (LocalStack config, container images, fixtures)
+- (-) Local machine requirements increase (Docker, disk space for containers)
+- (-) Tier 2 may diverge from real AWS behavior — LocalStack is not 100% faithful
+- (-) Pre-push gate adds 30–90s to every push (mitigation: `mise run check:fast` for safe paths)
+- (!) LocalStack fidelity gaps must be documented — when Tier 2 passes but Tier 3 fails, document the divergence and add it to Tier 2's scope
+- (!) The Tier 2 "optional until stable" phase must have defined graduation criteria, or it stays optional forever
+
+## References
+
+- Issue #149 — implementation tracking for this ADR
+- ADR-002 — bootstrap policies (Tier 2c validates IAM preflight locally)
+- ADR-008 — definition of done (tier requirements per DoD level)
+- ADR-012 (prerequisite) — operational knowledge stack; this ADR depends on 012's skill model for agent interaction with validation tiers
+- Current hooks: `.pre-commit-config.yaml` (Tier 0 implementation)
+- Current build: `mise.toml` root + package-level configs (Tier 1 implementation)
+- LocalStack: https://localstack.cloud (candidate for Tier 2a)
+- Firecracker MicroVMs: https://firecracker-microvm.github.io (candidate for Tier 2d)
diff --git a/docs/src/content/docs/decisions/003-contribution-governance.md b/docs/src/content/docs/decisions/003-contribution-governance.md
index 722c3db..5453488 100644
--- a/docs/src/content/docs/decisions/003-contribution-governance.md
+++ b/docs/src/content/docs/decisions/003-contribution-governance.md
@@ -15,6 +15,10 @@ The rules below define how any contributor — human or AI — picks up, owns, a
 
 ## Decision
 
+### No branches without an Issue
+
+Every feature branch references an issue in its name (e.g., `feat/123-short-description` or `fix/456-bug-name`). A branch without an issue reference is unauthorized work. This prevents the failure mode where work is started "just to explore" and then snowballs into a PR without governance.
+
 ### No PRs without an Issue
 
 Every PR references an issue. The issue provides rationale, sufficient context for the solution to be obvious, and verifiable acceptance criteria.
@@ -31,9 +35,9 @@ Issues align to the [product roadmap](https://github.com/aws-samples/sample-auto
 
 Only permitted users can mark an issue `approved` — a GitHub Actions workflow validates that the label applicant is authorized. An issue is not workable until it is both approved and assigned. After approval, the issue is considered scope-frozen: further revisions that change deliverables require re-approval.
 
-### Self-assignment on start
+### Assignments
 
-Unassigned means available. On starting work, self-assign. Multiple assignees (>1) require intentionality verification.
+Unassigned means available. Assignment may happen via self-assignment, directed assignment by another agent or human, or priority-based pickup (among open issues, take the highest-priority one, preferring the issue earliest in its dependency chain). Multiple assignees (>1) require intentionality verification.
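+
+Priority-based pickup can be sketched as a filter and sort. The `OpenIssue` shape is hypothetical, and `chainPosition` (1 for an issue with no predecessors) is an assumed encoding of "earliest in the dependency chain".
+
+```typescript
+// Hypothetical open-issue record used for priority-based pickup.
+interface OpenIssue {
+  number: number;
+  priority: "p0" | "p1" | "p2";
+  approved: boolean;
+  assignees: string[];
+  chainPosition: number; // 1 = no predecessors
+}
+
+const PRIORITY_RANK = { p0: 0, p1: 1, p2: 2 } as const;
+
+// Highest priority first; ties go to the issue earliest in its chain,
+// then to the lowest issue number for a deterministic choice.
+function nextIssue(issues: OpenIssue[]): OpenIssue | undefined {
+  return issues
+    .filter((i) => i.approved && i.assignees.length === 0)
+    .sort((a, b) =>
+      PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority] ||
+      a.chainPosition - b.chainPosition ||
+      a.number - b.number)[0];
+}
+```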
 
 ### Issue body as primary directive
 
@@ -51,10 +55,16 @@ Before implementation, the assigned contributor must:
 
 **Priority evaluation:** Identify priority (`p0`/`p1`/`p2`). If asked to work a lower-priority item while higher-priority items are unassigned, challenge: "Should I work on #X (p0) instead?"
 
-**Predecessor validation:** If predecessors are incomplete, unassigned, and not in a stacked PR — challenge: "Steps 1-3 are incomplete. Starting step 4 may cause rework."
+**Predecessor validation (GraphQL dependency graph is authoritative):**
+- Query the issue's `blockedBy` field via GraphQL — if any blocking issue is open, this issue is **not ready** (hard gate)
+- Check `parent`/`subIssues` ordering — verify prior siblings are complete or in-flight
+- Reconcile graph vs. prose — graph is authoritative for enforcement; prose explains rationale
+- If predecessors are incomplete, unassigned, and not in a stacked PR — challenge: "Steps 1-3 are incomplete. Starting step 4 may cause rework."
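+
+The predecessor checks can be sketched as follows. The GraphQL query is an assumed shape built from the field names above (`blockedBy`, `parent`, `subIssues`) and should be verified against GitHub's schema; the evaluation function is hypothetical and works on the already-fetched result.
+
+```typescript
+// Assumed query shape; verify field names against GitHub's GraphQL schema.
+const DEPENDENCY_QUERY = `
+query ($owner: String!, $repo: String!, $number: Int!) {
+  repository(owner: $owner, name: $repo) {
+    issue(number: $number) {
+      blockedBy(first: 50) { nodes { number state } }
+      parent { subIssues(first: 50) { nodes { number state } } }
+    }
+  }
+}`;
+
+interface IssueRef { number: number; state: "OPEN" | "CLOSED" }
+
+interface DependencyData {
+  blockedBy: IssueRef[];
+  siblings: IssueRef[]; // the parent's sub-issues, in order
+}
+
+type Readiness =
+  | { ready: true }
+  | { ready: false; hardGate: boolean; reason: string };
+
+const refs = (xs: IssueRef[]) => xs.map((x) => `#${x.number}`).join(", ");
+
+// An open blockedBy issue is a hard gate. Incomplete earlier siblings that are
+// not in flight (for example, in a stacked PR) produce a challenge instead.
+function evaluatePredecessors(issue: number, deps: DependencyData, inFlight: Set<number>): Readiness {
+  const blockers = deps.blockedBy.filter((b) => b.state === "OPEN");
+  if (blockers.length > 0) {
+    return { ready: false, hardGate: true, reason: `Not ready: blocked by ${refs(blockers)}.` };
+  }
+  const position = deps.siblings.findIndex((s) => s.number === issue);
+  const pending = deps.siblings
+    .slice(0, Math.max(position, 0))
+    .filter((s) => s.state === "OPEN" && !inFlight.has(s.number));
+  if (pending.length > 0) {
+    return { ready: false, hardGate: false, reason: `${refs(pending)} incomplete. Starting #${issue} may cause rework.` };
+  }
+  return { ready: true };
+}
+```
+
+The query could be sent with `gh api graphql`; mapping the response onto `DependencyData` is left out of the sketch.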
 
 **Cross-reference audit:** Search open issues for duplicates. Search open PRs (including drafts) for conflicts. Flag overlaps. Check the full dependency graph. Forward-look into downstream actions to ensure alignment.
 
+**Dependency graph maintenance:** When creating/modifying issues with dependencies, use GraphQL mutations (`addBlockedBy`, `addSubIssue`) to maintain the machine-enforceable graph. Update prose to explain rationale. If they diverge, fix the wrong one (usually prose — graph is set programmatically).
+
 **Final gate:** If all checks pass, comment "Starting implementation."
 
 ### Identity and attribution
@@ -69,6 +79,36 @@ Provide progress signals at checkpoints. If blocked or abandoning, comment and u
 
 CI passes before requesting review. After merge, verify acceptance criteria and close. Create follow-up issues for discovered work before closing.
 
+### Conversational approval is NOT issue approval
+
+A user saying "yes, do it" or "go ahead" in a conversation does NOT satisfy the governance gate. The correct response to conversational approval is:
+
+1. Create an issue with acceptance criteria
+2. Request the `approved` label from an admin
+3. Self-assign once approved
+4. Then begin implementation
+
+**Known failure mode:** Agents interpret conversational momentum ("Yes start with X") as authorization to skip issue creation. This is the most common governance bypass — it feels like permission because the user explicitly directed the work, but the governance requires a *durable, reviewable artifact* (the issue), not a transient conversation.
+
+**Why this matters:** Conversations are ephemeral. Issues are auditable. If an agent creates work based on a conversation and that conversation is lost (context compaction, session end), no record exists of what was authorized, what the acceptance criteria were, or why the work was started.
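+
+The four-step sequence can be modeled as a small state machine in which a conversational message is simply not a transition. The states and event names below are hypothetical.
+
+```typescript
+// Hypothetical lifecycle of a unit of work under this ADR.
+type WorkState = "idea" | "issue-created" | "approved" | "assigned" | "in-progress";
+
+type WorkEvent =
+  | { kind: "issue-created"; issueNumber: number }
+  | { kind: "approved-label-added"; byAdmin: boolean }
+  | { kind: "self-assigned" }
+  | { kind: "starting-comment-posted" }
+  | { kind: "conversational-approval"; text: string };
+
+// Each state has exactly one durable artifact that moves it forward.
+const NEXT: Partial<Record<WorkState, { on: WorkEvent["kind"]; to: WorkState }>> = {
+  idea: { on: "issue-created", to: "issue-created" },
+  "issue-created": { on: "approved-label-added", to: "approved" },
+  approved: { on: "self-assigned", to: "assigned" },
+  assigned: { on: "starting-comment-posted", to: "in-progress" },
+};
+
+function advance(state: WorkState, event: WorkEvent): WorkState {
+  const step = NEXT[state];
+  if (!step || step.on !== event.kind) return state; // conversational approval never matches
+  if (event.kind === "approved-label-added" && !event.byAdmin) return state;
+  return step.to;
+}
+```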
+
+### Enforcement mechanisms (planned)
+
+Prose governance is necessary but insufficient. The following enforcement points are planned and will be deployed incrementally, so that bypasses become progressively harder; see #186 for implementation tracking.
+
+| Mechanism | Layer | What it catches | Status |
+|-----------|-------|-----------------|--------|
+| AGENTS.md directive | Agent prompt | Explicit instruction: "Do NOT begin implementation without an approved issue, even if the user says 'go ahead' in conversation" | Implemented |
+| Branch name convention | Git workflow | Branch must match `(feat\|fix\|chore\|docs)/<issue-number>-*`; rejects branches without an issue reference | Planned |
+| Commit-msg hook (Tier 0) | Pre-commit | Rejects commits without `Refs #N` or `Fixes #N` | Planned |
+| Pre-push hook (Tier 1) | Pre-push | Validates referenced issue exists and has `approved` label via `gh` API | Planned |
+| Claude Code hook (`PreToolUse: Write`) | Agent runtime | Blocks file creation in governed paths without declared issue context | Planned |
+| Skill gate: `pickup-issue` | Agent workflow | Agent must invoke before implementation — hard-fails without valid issue | Planned |
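+
+A commit-msg hook for the third row could be as small as the sketch below. It is hypothetical; matching `Refs #N` and `Fixes #N` case-insensitively is an assumption.
+
+```typescript
+// Hypothetical commit-msg check for "Refs #N" / "Fixes #N".
+function referencedIssues(message: string): number[] {
+  const pattern = /\b(?:Refs|Fixes) #(\d+)\b/gi; // fresh regex: exec() keeps state in lastIndex
+  const issues: number[] = [];
+  let match: RegExpExecArray | null;
+  while ((match = pattern.exec(message)) !== null) {
+    issues.push(Number(match[1]));
+  }
+  return issues;
+}
+
+function commitMsgGate(message: string): { ok: boolean; reason?: string } {
+  return referencedIssues(message).length > 0
+    ? { ok: true }
+    : { ok: false, reason: "Commit message must reference an issue: add 'Refs #N' or 'Fixes #N'." };
+}
+```
+
+Git passes the path of the proposed message file as the hook's first argument, so a hook script would read that file, call `commitMsgGate`, and exit non-zero when `ok` is false.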
+
+**Transition:** Branch naming and commit-msg rules apply to branches created after the corresponding hooks are deployed. Existing branches (including this PR's) pre-date enforcement.
+
+**Progressive enforcement:** Start with the commit-msg hook (cheapest, catches all contributors). Add pre-push validation next. Skill gates enforce at the agent-workflow level (see ADR-012, proposed, for the skill model).
+
 ## Consequences
 
 - (+) Prevents duplicate effort — assignment signals ownership
@@ -76,13 +116,18 @@ CI passes before requesting review. After merge, verify acceptance criteria and
 - (+) Prevents rework — predecessor validation catches out-of-order work
 - (+) Issue body stays current — threads are folded back
 - (+) Cross-reference audit catches duplicates early
+- (+) Enforcement mechanisms catch bypass at multiple points
 - (-) Pre-start overhead for small tasks
 - (-) Requires discipline to fold threads into body
+- (-) Commit-msg hook adds friction for rapid iteration on approved work
 - (!) Assumes priority labels exist and are maintained
 - (!) Self-assignment is not atomic — concurrent agents may race; mitigate by verifying assignment after claiming via refresh
+- (!) Conversational approval bypass is the most common failure — enforcement must be structural, not behavioral
 
 ## References
 
 - Issue #134 — full RFC with open questions and automation requirements
 - Roadmap: Scale and collaboration (Agent swarm, Multi-user and teams)
 - ADR-001 — delivery methodology referenced by completion rules
+- ADR-012 (proposed) — operational knowledge stack; planned enforcement via skill gates
+- ADR-013 (proposed) — tiered validation; planned enforcement hooks at Tier 0 and Tier 1
diff --git a/docs/src/content/docs/decisions/005-feedback-loop.md b/docs/src/content/docs/decisions/005-feedback-loop.md
new file mode 100644
index 0000000..174713f
--- /dev/null
+++ b/docs/src/content/docs/decisions/005-feedback-loop.md
@@ -0,0 +1,72 @@
+---
+title: 005 feedback loop
+---
+
+# ADR-005: Feedback loop — PR reviews propagate to issues and ADRs
+
+**Status:** proposed
+**Date:** 2026-05-19
+
+## Context
+
+PR review comments are addressed locally (fix the code) but systemic issues they reveal are not propagated upstream. A reviewer says "this approach is wrong" but the issue still says "use this approach." ADRs are treated as immutable when they should be living decisions that evolve with implementation experience.
+
+Without a feedback protocol, review insights are lost, issue bodies rot, and architectural mistakes persist across stacked PR chains.
+
+## Decision
+
+### Review comment classification
+
+| Type | Action | Propagates to |
+|------|--------|---------------|
+| Nit (style, naming) | Fix in PR | Nothing |
+| Bug (logic error) | Fix in PR | Nothing (unless systemic) |
+| Design concern | Pause PR; evaluate | Issue body |
+| Architecture challenge | Pause PR; escalate | ADR (supersede? amend?) |
+| Scope question | Clarify | Issue body |
+| Blocker (won't approve as-is) | Pause PR | Issue body |
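The classification table can be encoded as data, so that a review tool routes each comment consistently. This is a sketch. The type names are hypothetical. Routing a systemic bug to the issue body is an assumption, because the table says only "unless systemic".

```typescript
type CommentType = "nit" | "bug" | "design" | "architecture" | "scope" | "blocker";
type Propagation = "none" | "issue-body" | "adr";

interface Handling {
  action: string;
  propagatesTo: Propagation;
  pausesPr: boolean; // a paused PR also holds every stacked PR above it
}

const HANDLING: Record<CommentType, Handling> = {
  nit: { action: "Fix in PR", propagatesTo: "none", pausesPr: false },
  bug: { action: "Fix in PR", propagatesTo: "none", pausesPr: false },
  design: { action: "Pause PR; evaluate", propagatesTo: "issue-body", pausesPr: true },
  architecture: { action: "Pause PR; escalate", propagatesTo: "adr", pausesPr: true },
  scope: { action: "Clarify", propagatesTo: "issue-body", pausesPr: false },
  blocker: { action: "Pause PR", propagatesTo: "issue-body", pausesPr: true },
};

/** Where a comment's insight must be written back; systemic bugs go to the issue body (assumption). */
function propagationFor(type: CommentType, systemic = false): Propagation {
  if (type === "bug" && systemic) return "issue-body";
  return HANDLING[type].propagatesTo;
}

/** True if any comment on the PR pauses it (and therefore the stack above it). */
function blocksStack(types: CommentType[]): boolean {
  return types.some((t) => HANDLING[t].pausesPr);
}
```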
+
+### Upstream propagation
+
+When a review surfaces a design concern or architecture challenge:
+
+1. **Pause** — Do not force-merge. Do not continue stacked PRs above this one.
+2. **Assess** — Does this invalidate the issue's approach? The ADR's decision?
+3. **Propagate** — Update the relevant upstream document (issue body, ADR, stacked PR dependents).
+4. **Resolve** — Revise the approach, defend with evidence, or cancel the work.
+5. **Resume** — Once resolved, unblock the PR and dependents.
+
+### ADR evolution
+
+| Trigger | Response |
+|---------|----------|
+| Implementation reveals the decision doesn't work | New RFC proposing a successor ADR |
+| Reviewer challenges the architectural premise | `**UNRESOLVED**` on the issue; pause |
+| New information makes the decision obsolete | Successor ADR with `Supersedes: ADR-NNN` |
+| Decision works but needs refinement | Amend via PR (minor, no new ADR) |
+
+Never silently ignore a challenged decision.
+
+### Stacked PR chain revision
+
+When feedback on PR N invalidates PRs N+1 through N+M:
+1. Comment on all affected PRs
+2. Do not rebase dependent PRs until the base is stable
+3. If architectural: re-evaluate whether the remaining stack is valid
+4. If redesign needed: close dependent PRs, revise issue, re-plan
+
+## Consequences
+
+- (+) Review insights propagate to architectural decisions
+- (+) Issue bodies stay current with implementation learnings
+- (+) ADRs evolve rather than silently becoming outdated
+- (+) Stacked PR chains have a defined recovery protocol
+- (-) Adds process overhead to reviews (classification step)
+- (-) Pausing stacked chains delays delivery
+- (!) Requires discipline to actually propagate feedback upstream
+
+## References
+
+- Issue #136 — full RFC with open questions
+- ADR-003 — governance (issue body as source of truth)
+- ADR-001 — stacked PRs (chain revision protocol)
diff --git a/docs/src/content/docs/decisions/006-feature-flags.md b/docs/src/content/docs/decisions/006-feature-flags.md
new file mode 100644
index 0000000..da778eb
--- /dev/null
+++ b/docs/src/content/docs/decisions/006-feature-flags.md
@@ -0,0 +1,86 @@
+---
+title: 006 feature flags
+---
+
+# ADR-006: Feature flags for concurrent development
+
+**Status:** proposed
+**Date:** 2026-05-19
+
+## Context
+
+Multiple agents working on related features in the same area must serialize — one waits for the other to merge. Incomplete features either block the main branch or require long-lived branches that diverge. SRE needs kill switches without reverting commits.
+
+Feature flags enable trunk-based development where incomplete work merges safely behind toggles, and concurrent contributors avoid blocking each other.
+
+## Decision
+
+### When to use flags
+
+| Situation | Use a flag? |
+|-----------|-------------|
+| Feature spans multiple PRs, incomplete state is unsafe | Yes |
+| Two contributors touch the same module for different purposes | Yes |
+| SRE needs a kill switch for a new capability | Yes |
+| Simple refactor with no behavioral change | No |
+| Bug fix | No |
+| One-PR feature, complete on merge | No |
+
+### Flag ownership
+
+- Every flag has an owner (the issue that introduced it)
+- Every flag has an expiration (the issue/PR that removes it)
+- Flags without a removal plan are rejected in review
+
+### Separation of concerns
+
+- **Planners** decide which features get flags (issue/RFC level)
+- **Implementors** add/use flags in code (PR level)
+- **SRE/operators** toggle flags in production (runtime level)
+- **No self-approval** — the person who introduces a flag cannot approve its removal
+
+### Flag lifecycle
+
+1. **Proposed** — issue identifies the need for a flag
+2. **Introduced** — PR adds the flag (default: off)
+3. **Active** — feature behind flag is in development
+4. **Verified** — feature complete, flag toggled on in testing
+5. **Permanent** — flag removed, feature is always-on (or removed entirely)
+
+### Lifecycle metadata
+
+Each flag must track:
+
+| Field | Required | Source |
+|-------|----------|--------|
+| Flag name | Yes | Code constant |
+| Purpose / linked issue | Yes | Issue reference |
+| First merge date | Yes | Auto from git log |
+| Max lifetime | Yes | Declared at creation (default: 4 weeks) |
+| Expected removal date | Yes | first_merge + max_lifetime |
+| Actual removal date | — | Auto when flag deleted |
+| Days active | — | Computed |
+
+### Maximum lifetime
+
+Flags must be removed within the declared max lifetime (default: 4 weeks) of the feature being verified. The max lifetime can be overridden per-flag with justification in the issue. Stale flags are treated as technical debt and surfaced in periodic reviews.
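Below is a sketch of the lifecycle metadata as a record whose derived fields are computed rather than stored. It follows the table's formula (expected removal = first merge + max lifetime) and assumes that 4 weeks means 28 days. `FlagRecord` and its fields are hypothetical.

```typescript
interface FlagRecord {
  flagName: string;         // code constant
  issue: number;            // purpose / linked issue
  firstMerge: Date;         // derived from git log in practice
  maxLifetimeDays?: number; // declared at creation; defaults to 4 weeks
  removedAt?: Date;         // set when the flag is deleted
}

const DEFAULT_MAX_LIFETIME_DAYS = 28;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

/** Expected removal date = first merge + max lifetime. */
function expectedRemoval(flag: FlagRecord): Date {
  const days = flag.maxLifetimeDays ?? DEFAULT_MAX_LIFETIME_DAYS;
  return new Date(flag.firstMerge.getTime() + days * MS_PER_DAY);
}

/** Whole days between first merge and removal (or `now` while the flag is live). */
function daysActive(flag: FlagRecord, now: Date): number {
  const end = flag.removedAt ?? now;
  return Math.floor((end.getTime() - flag.firstMerge.getTime()) / MS_PER_DAY);
}

/** A live flag past its expected removal date is technical debt to surface in review. */
function isStale(flag: FlagRecord, now: Date): boolean {
  return flag.removedAt === undefined && now.getTime() > expectedRemoval(flag).getTime();
}
```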
+
+### Mechanism constraint
+
+Flags MUST be resolvable at synth time for infrastructure flags and at runtime for behavior flags. The specific storage mechanism (CDK context, DynamoDB, SSM Parameter Store, env vars) is context-dependent and follows from this split — it is not prescribed by this ADR.
+
+## Consequences
+
+- (+) Concurrent work proceeds without blocking
+- (+) Trunk-based development: main stays deployable
+- (+) SRE can disable features without code changes
+- (+) Partial features merge safely
+- (-) Flag management overhead
+- (-) Combinatorial testing complexity if many flags exist simultaneously
+- (!) Maximum lifetime must be enforced or flags accumulate indefinitely
+
+## References
+
+- Issue #137 — full RFC with open questions on mechanism (CDK context vs. DynamoDB vs. env vars)
+- ADR-003 — governance (flag introduction requires approval)
+- ADR-005 — feedback loop (reviewer may flag-gate a feature during review)
diff --git a/docs/src/content/docs/decisions/007-knowledge-acquisition.md b/docs/src/content/docs/decisions/007-knowledge-acquisition.md
new file mode 100644
index 0000000..b137b2c
--- /dev/null
+++ b/docs/src/content/docs/decisions/007-knowledge-acquisition.md
@@ -0,0 +1,83 @@
+---
+title: 007 knowledge acquisition
+---
+
+# ADR-007: Knowledge acquisition through progressive failure
+
+**Status:** proposed
+**Date:** 2026-05-19
+
+## Context
+
+Agents with fresh context (tabula rasa) attempt to follow documentation and hit gaps they cannot resolve. These gaps are silently worked around (agent asks a human) rather than systematically fixed. The system cannot self-improve its onboarding because failures are not captured.
+
+Knowledge acquisition starts from zero. Each iteration creates the roadmap to better knowledge by discovering gaps through actual failures.
+
+## Decision
+
+### Zero-context execution attempts
+
+Periodically, an agent with no project memory attempts to follow guides end-to-end. The agent follows ONLY what is written: no inference, no knowledge drawn from training data, no asking colleagues.
+
+### Failure capture protocol
+
+At each failure point, the agent:
+1. **Stops** — does not attempt to work around or guess
+2. **Documents** — creates an issue: which document, which step, what was missing
+3. **Continues** — attempts the next step (if possible) to find additional gaps
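The stop / document / continue protocol can be modelled as a run over documented steps. The run records one gap per failing step and never attempts a workaround. This is a sketch. `DocStep`, `GapIssue`, and the `dependsOnPrevious` field, which decides what "if possible" means, are assumptions.

```typescript
interface DocStep {
  document: string;             // which guide the step comes from
  step: string;                 // human-readable step name
  run: () => string | null;     // null on success, otherwise what was missing from the docs
  dependsOnPrevious?: boolean;  // cannot be attempted after an earlier failure
}

interface GapIssue {
  document: string;
  step: string;
  missing: string;
}

/** Follows only what is written; each failure becomes one issue-ready gap record. */
function zeroContextRun(steps: DocStep[]): GapIssue[] {
  const gaps: GapIssue[] = [];
  let previousFailed = false;
  for (const s of steps) {
    if (previousFailed && s.dependsOnPrevious) {
      // Continue only where possible: a skipped step leaves previousFailed set.
      continue;
    }
    const missing = s.run();
    if (missing !== null) {
      // Stop (no guessing) and document: which document, which step, what was missing.
      gaps.push({ document: s.document, step: s.step, missing });
      previousFailed = true;
    } else {
      previousFailed = false;
    }
  }
  return gaps;
}
```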
+
+### Retrospectives
+
+After completing a task, project milestone, or sprint, agents produce a retrospective artifact:
+- What worked well (patterns to repeat)
+- What failed or caused friction (patterns to avoid)
+- Actionable experiments for future workflows
+
+Retrospectives are a first-class knowledge artifact — they feed into documentation improvements, inform ADR amendments, and surface systemic issues that individual task failures cannot.
+
+### Knowledge artifacts (interim)
+
+Until documentation meets ADR-004, agents may create ephemeral artifacts:
+- Semantic indices of the codebase (call graphs, dependency maps)
+- Annotated walkthroughs of successful executions
+- "What I learned" summaries after completing a task
+- Retrospectives (see above)
+
+These are scaffolding that informs documentation improvements, not documentation themselves.
+
+### Maturity model
+
+| Level | State | Agent capability |
+|-------|-------|-----------------|
+| 0 | No docs | Cannot start; files issue for missing docs |
+| 1 | Partial docs | Follows docs, stops at gaps, files issues |
+| 2 | Complete docs (ADR-004) | Completes end-to-end without help |
+| 3 | Self-improving | Detects drift between docs and code, auto-files issues |
+
+### The self-improvement loop
+
+```
+Agent starts fresh → follows docs → hits failure →
+  files issue → issue gets fixed → next agent goes further →
+    hits next failure → files issue → ...
+      until end-to-end works from zero context
+```
+
+This runs continuously because code changes outpace documentation and different agent implementations fail at different points.
+
+## Consequences
+
+- (+) Documentation gaps become bugs with reproduction steps
+- (+) Priority ordering emerges naturally (most common failures surface first)
+- (+) The system self-improves without human identification of gaps
+- (+) Creates a natural definition of "docs are done" (Level 2 achieved)
+- (-) Generates issue volume that needs triage
+- (-) Requires periodic investment in zero-context test runs
+- (!) The gap between Level 1 and Level 2 may be large — patience required
+
+## References
+
+- Issue #138 — full RFC with open questions
+- ADR-004 — defines the quality target (tabula rasa test)
+- ADR-003 — governance for issues filed by failing agents
+- ADR-008 — Level 4 Definition of Done depends on this protocol
diff --git a/docs/src/content/docs/decisions/008-definition-of-done.md b/docs/src/content/docs/decisions/008-definition-of-done.md
new file mode 100644
index 0000000..caeda51
--- /dev/null
+++ b/docs/src/content/docs/decisions/008-definition-of-done.md
@@ -0,0 +1,86 @@
+---
+title: 008 definition of done
+---
+
+# ADR-008: Definition of Done (progressive maturity)
+
+**Status:** proposed
+**Date:** 2026-05-19
+
+## Context
+
+"Done" is implicit and varies by contributor. Some consider a passing build sufficient; others expect documentation, tests, and deployment verification. Agents have no unambiguous checklist to know they have completed work. Over-engineering "done" early blocks velocity; under-defining it ships incomplete work.
+
+The definition must be progressive — rising as the project matures — so it does not block early momentum but ensures quality at scale.
+
+## Decision
+
+### Progressive levels
+
+**Level 1 — Basic (minimum viable):**
+- Code compiles without errors
+- Existing tests pass (no regressions)
+- New code has tests (unit level minimum)
+- Linting passes
+- PR description explains what and why
+- Linked issue exists
+
+**Level 2 — Standard (current project default):**
+- All of Level 1
+- Pre-commit hooks pass
+- CDK synth succeeds (if infrastructure changes)
+- Security scans pass (no new HIGH/CRITICAL findings)
+- Documentation updated if behavior changes
+- Starlight mirrors synced (if docs changed)
+
+**Level 3 — Rigorous (critical paths):**
+- All of Level 2
+- Integration or E2E test covers the happy path
+- Error paths tested
+- Reviewer approved (human or qualified agent)
+- Deployed to ephemeral stack and smoke-tested (if infrastructure)
+- ADR written (if architectural decision made)
+
+**Level 4 — Self-verifying (future target):**
+- All of Level 3
+- Tabula rasa agent can replicate the outcome using only docs
+- CI includes behavioral verification
+- Documentation drift detection passes
+
+### Default level by issue type
+
+| Issue type | Default level |
+|-----------|---------------|
+| Bug fix | Level 2 |
+| New feature | Level 2-3 (based on blast radius) |
+| Infrastructure/IAM change | Level 3 |
+| Documentation only | Level 1 |
+| Security fix | Level 3 |
+| RFC/ADR implementation | Level 2 + ADR written |
+
+Issues may override by specifying `Done: Level N` in the body.
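Resolving the effective level for an issue is a table lookup plus the `Done: Level N` override. This is a sketch. It assumes that the override sits on its own line in the issue body and that the caller supplies "blast radius" for features as a boolean. The names are hypothetical.

```typescript
type IssueType = "bug" | "feature" | "infra" | "docs" | "security" | "adr-impl";
type DoneLevel = 1 | 2 | 3 | 4;

const DEFAULT_LEVEL: Record<IssueType, DoneLevel> = {
  bug: 2,
  feature: 2,    // Level 2-3 by blast radius; raised below when the caller says it is high
  infra: 3,      // infrastructure / IAM change
  docs: 1,
  security: 3,
  "adr-impl": 2, // plus "ADR written"
};

// Assumption: the override is written on its own line, e.g. "Done: Level 3".
const DONE_OVERRIDE = /^\s*Done:\s*Level\s*([1-4])\s*$/m;

function doneLevel(type: IssueType, issueBody: string, highBlastRadius = false): DoneLevel {
  const override = DONE_OVERRIDE.exec(issueBody);
  if (override) return Number(override[1]) as DoneLevel;
  if (type === "feature" && highBlastRadius) return 3;
  return DEFAULT_LEVEL[type];
}
```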
+
+### Verification responsibility
+
+| Level | Who verifies |
+|-------|-------------|
+| 1 | CI (automated) |
+| 2 | CI + self-check by implementor |
+| 3 | CI + reviewer + implementor |
+| 4 | CI + reviewer + independent agent |
+
+## Consequences
+
+- (+) Agents have an unambiguous completion checklist
+- (+) Quality bar rises as the project matures
+- (+) Over-engineering is prevented (Level 1 for simple docs changes)
+- (+) Critical paths get rigorous verification (Level 3)
+- (-) Requires labeling or explicit level assignment per issue
+- (-) Level 4 is aspirational and depends on ADR-007 (knowledge acquisition)
+- (!) The project must eventually graduate from Level 2 to Level 3 default
+
+## References
+
+- Issue #139 — full RFC with open questions
+- ADR-003 — governance (defines when to start; this defines when to stop)
+- ADR-007 — knowledge acquisition (Level 4 depends on tabula rasa verification)
diff --git a/docs/src/content/docs/decisions/009-security-posture-dev-agents.md b/docs/src/content/docs/decisions/009-security-posture-dev-agents.md
new file mode 100644
index 0000000..7fa62f8
--- /dev/null
+++ b/docs/src/content/docs/decisions/009-security-posture-dev-agents.md
@@ -0,0 +1,77 @@
+---
+title: 009 security posture dev agents
+---
+
+# ADR-009: Security posture and blast radius for development-time agents
+
+**Status:** proposed
+**Date:** 2026-05-19
+
+## Context
+
+The existing `SECURITY.md` covers runtime agent execution (inside MicroVMs). It does not cover **development-time agents** — those writing code, creating PRs, and modifying infrastructure in this repository. A development-time agent operates with the credentials of whoever invoked it, creating a risk of self-approval, policy modification, and unbounded blast radius.
+
+The core principle: **planners and implementors must be separated by context and ideally by identity. No self-approval.**
+
+## Decision
+
+### Role separation
+
+| Role | Can do | Cannot do |
+|------|--------|-----------|
+| **Planner** | Create/edit issues, write RFCs/ADRs, define roadmap and revisit vision | Write code, push branches, approve PRs |
+| **Implementor** | Write code, create PRs, push branches, run tests | Approve own PRs, merge own PRs, modify CI/security config |
+| **Reviewer** | Approve PRs, request changes, merge, suggest code (no commits) | Write code on the same PR being reviewed |
+| **Admin** | All of the above + modify policies, approve issues | Still requires 2P for policy changes |
+
+### Blast radius classification
+
+| Action | Risk | Gate |
+|--------|------|------|
+| Edit code in existing patterns | Low | CI + peer review |
+| Add new dependency | Medium | Security scan + review |
+| Modify IAM policy / security config | High | 2P review + admin approval |
+| Modify CI/CD workflow | High | 2P review + admin approval |
+| Modify branch protection / approval rules | Critical | Admin-only + audit trail |
+| Modify governance ADRs | Critical | Admin-only + 2P review |
+| Delete or force-push protected branches | Critical | Never automated; human-only |
+
+### 2P (two-person) review
+
+For High and Critical actions:
+- The author cannot be one of the two approvers
+- At least one approver must be a human
+- Approvals reference the specific risk being accepted
+
+### No self-approval (structural)
+
+- Branch protection requires review from someone other than the pusher
+- Approval cannot come from the last committer on the branch
+- If an agent plans AND implements, review must come from an identity that did neither
+- The identity that writes code cannot approve or merge it
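The 2P and no-self-approval rules can be checked against facts about a PR before merge. This sketch rests on three assumptions. Low and Medium actions need one independent approval. The `planner` field identifies an agent that planned the work. Admin-only gates for Critical actions are out of scope. All names are hypothetical.

```typescript
type Risk = "low" | "medium" | "high" | "critical";

interface Approval {
  login: string;
  isHuman: boolean;
}

interface PullRequestFacts {
  author: string;
  lastCommitter: string;
  planner?: string; // set when an agent planned the work
  risk: Risk;       // from the blast radius table
  approvals: Approval[];
}

/** Returns the reasons a PR may not merge yet; an empty list means the approval rules hold. */
function approvalProblems(pr: PullRequestFacts): string[] {
  // No self-approval: author, last committer, and planner never count as reviewers.
  const excluded = [pr.author, pr.lastCommitter];
  if (pr.planner !== undefined) excluded.push(pr.planner);
  const independent = pr.approvals.filter((a) => excluded.indexOf(a.login) === -1);
  const approvers = independent
    .map((a) => a.login)
    .filter((login, i, all) => all.indexOf(login) === i); // distinct identities

  const twoPerson = pr.risk === "high" || pr.risk === "critical";
  const required = twoPerson ? 2 : 1;
  const problems: string[] = [];
  if (approvers.length < required) {
    problems.push(`needs ${required} independent approval(s), has ${approvers.length}`);
  }
  if (twoPerson && !independent.some((a) => a.isHuman)) {
    problems.push("2P review needs at least one human approver");
  }
  return problems;
}
```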
+
+### Credential scoping
+
+| Agent context | Minimum credentials |
+|---------------|-------------------|
+| Planning (issues, RFCs) | GitHub Issues write, read-only repo |
+| Implementation (code, PRs) | Repo write, PR create, no merge capability |
+| Review | PR review write, no push capability |
+| Deployment | Separate deploy key, environment approval gate |
+
+## Consequences
+
+- (+) Prevents self-approval of dangerous changes
+- (+) Blast radius is explicit and enforceable
+- (+) Role separation enables audit trail
+- (+) 2P review catches compromised or confused agents
+- (-) Credential management complexity increases
+- (-) Small tasks require multi-identity orchestration
+- (!) Classic personal access tokens (PATs) carry the invoking user's full permissions within their scopes, so structural enforcement requires GitHub Apps or fine-grained tokens
+
+## References
+
+- Issue #140 — full RFC with open questions
+- `docs/design/SECURITY.md` — runtime agent security (complementary)
+- Cedar HITL gates (PR #88) — runtime tool-call governance
+- ADR-003 — governance (approval gates enforced here technically)
diff --git a/docs/src/content/docs/decisions/010-error-recovery.md b/docs/src/content/docs/decisions/010-error-recovery.md
new file mode 100644
index 0000000..d16c44b
--- /dev/null
+++ b/docs/src/content/docs/decisions/010-error-recovery.md
@@ -0,0 +1,73 @@
+---
+title: 010 error recovery
+---
+
+# ADR-010: Error recovery and rollback protocol
+
+**Status:** proposed
+**Date:** 2026-05-19
+
+## Context
+
+When merged code breaks something, the response is ad-hoc. Agents operating autonomously may merge code that passes CI but breaks integration. No protocol defines when to revert vs. fix forward, who decides, or how stacked PR chains recover.
+
+## Decision
+
+### Decision tree
+
+```
+Broken thing detected
+├─ Production affected (users impacted NOW)?
+│  └─ Yes → REVERT immediately, investigate after
+├─ Fix obvious and < 30 minutes?
+│  └─ Yes → Fix forward (new PR, not amend)
+├─ Stacked PR chain?
+│  └─ Yes → Pause dependent PRs, fix the base
+└─ Scope of damage unclear?
+   └─ Yes → REVERT (safe default), then investigate
+```
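The decision tree translates directly into ordered guards, where the first matching branch wins. This is a sketch. The `Breakage` fields are hypothetical inputs that someone must assess. Cases the tree does not cover fall through to the safe default (revert).

```typescript
interface Breakage {
  productionAffected: boolean; // users impacted now
  fixObvious: boolean;
  estimatedFixMinutes: number;
  inStackedChain: boolean;
}

type RecoveryAction = "revert" | "fix-forward" | "pause-dependents-fix-base";

function recoveryAction(b: Breakage): RecoveryAction {
  if (b.productionAffected) return "revert"; // revert immediately, investigate after
  if (b.fixObvious && b.estimatedFixMinutes < 30) return "fix-forward"; // new PR, not amend
  if (b.inStackedChain) return "pause-dependents-fix-base";
  // Unclear scope (and anything the tree does not cover) takes the safe default.
  return "revert";
}
```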
+
+### Revert protocol
+
+1. Create a revert commit (not force-push) — preserves history
+2. Open an issue: what broke, why CI did not catch it, what the fix needs
+3. The fix goes through normal review (no rushing, no skipping gates)
+
+### Fix-forward protocol
+
+1. Only if the fix is obvious, small, and low-risk
+2. Must still go through PR + review
+3. If the fix introduces new complexity — revert instead
+
+### Stacked PR chain recovery
+
+1. Identify which PR introduced the breakage
+2. Pause/close all PRs above it
+3. Fix the base PR
+4. Rebase and re-evaluate dependent PRs
+5. Re-run CI on each before re-opening
+
+### Agents must NEVER do during recovery
+
+- Force-push to shared branches
+- Delete branches with others' work
+- Amend published commits
+- Skip review "because it's urgent"
+- Self-approve a revert
+
+## Consequences
+
+- (+) Clear decision tree prevents analysis paralysis during incidents
+- (+) Revert-first default limits blast radius
+- (+) Stacked chain recovery is defined (not improvised)
+- (+) History is preserved (revert commits, not force-push)
+- (-) Reverts create noise in git history
+- (-) Fix-forward temptation may lead to rushed fixes
+- (!) "Production affected" requires definition per deployment (self-hosted varies)
+
+## References
+
+- Issue #141 — full RFC with open questions
+- ADR-003 — governance (no bypasses during recovery)
+- ADR-001 — stacked PRs (chain recovery protocol)
+- ADR-009 — security (revert authority tied to role)
diff --git a/docs/src/content/docs/decisions/011-conflict-resolution.md b/docs/src/content/docs/decisions/011-conflict-resolution.md
new file mode 100644
index 0000000..b9068b6
--- /dev/null
+++ b/docs/src/content/docs/decisions/011-conflict-resolution.md
@@ -0,0 +1,68 @@
+---
+title: 011 conflict resolution
+---
+
+# ADR-011: Conflict resolution protocol
+
+**Status:** proposed
+**Date:** 2026-05-19
+
+## Context
+
+Multiple concurrent contributors — human or AI — will propose incompatible approaches, create merge conflicts, and disagree on design. Without a defined escalation path, work stalls or the loudest voice wins.
+
+## Decision
+
+### Escalation ladder
+
+```
+Level 1: Contributor discussion (PR comments, issue thread)
+   ↓ (no resolution within 2 interactions)
+Level 2: Request additional reviewer (fresh perspective)
+   ↓ (still no resolution)
+Level 3: Competing proposals in the issue body (explicit trade-off comparison)
+   ↓ (still no resolution)
+Level 4: Admin decision (binding, documented in issue body)
+```
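The ladder is a small state machine. Only Level 1 has a stated threshold (2 interactions). For Levels 2 and 3, this sketch assumes that a participant judges when the level is exhausted. The names are hypothetical.

```typescript
type EscalationLevel = 1 | 2 | 3 | 4;

interface Dispute {
  level: EscalationLevel;
  interactionsAtLevel: number; // exchanges since the dispute reached this level
  resolved: boolean;
}

// Stated only for Level 1: "no resolution within 2 interactions".
const LEVEL_1_INTERACTION_LIMIT = 2;

/** Returns the level the dispute should be at; `levelExhausted` is a judgment call for Levels 2 and 3. */
function nextLevel(d: Dispute, levelExhausted: boolean): EscalationLevel {
  if (d.resolved || d.level === 4) return d.level; // Level 4 is binding: no further escalation
  if (d.level === 1) {
    return d.interactionsAtLevel >= LEVEL_1_INTERACTION_LIMIT ? 2 : 1;
  }
  return levelExhausted ? ((d.level + 1) as EscalationLevel) : d.level;
}
```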
+
+### Decision criteria
+
+When comparing approaches, evaluate on:
+1. **Correctness** — does it solve the stated problem?
+2. **Simplicity** — fewer moving parts wins when correctness is equal
+3. **Consistency** — follows existing codebase patterns?
+4. **Reversibility** — can we change our mind later?
+5. **Blast radius** — what breaks if this is wrong?
+
+### Merge conflict ownership
+
+| Situation | Who resolves |
+|-----------|-------------|
+| Two PRs modify same file, one merged first | Second PR's author rebases |
+| Stacked PR conflict from lower change | Lower PR author notifies; upper PRs rebase after stable |
+| Concurrent agents modified same module | First to merge wins; second adapts |
+| Architectural conflict (both valid) | Escalate to Level 3 |
+
+### Human vs. agent disagreement
+
+- Agents present evidence (code, tests, measurements) not authority
+- Humans can override but must document why
+- Agents do not repeatedly argue a rejected point
+- If an agent believes a human decision causes harm (security, data loss), it escalates to admin
+
+## Consequences
+
+- (+) Disagreements have a defined path to resolution
+- (+) Merge conflicts have clear ownership
+- (+) Competing approaches are compared on criteria, not authority
+- (+) Admin decision is the final backstop (no infinite loops)
+- (-) Escalation takes time; may slow delivery
+- (-) Level 3 (written trade-off) requires effort
+- (!) Must not become a veto mechanism for slow contributors
+
+## References
+
+- Issue #142 — full RFC with open questions
+- ADR-003 — governance (issue body as resolution record)
+- ADR-005 — feedback loop (reviewer disagreements feed into this)
+- ADR-009 — security (authority levels for decisions)
diff --git a/docs/src/content/docs/decisions/012-operational-knowledge-stack.md b/docs/src/content/docs/decisions/012-operational-knowledge-stack.md
new file mode 100644
index 0000000..5c6ab7a
--- /dev/null
+++ b/docs/src/content/docs/decisions/012-operational-knowledge-stack.md
@@ -0,0 +1,269 @@
+---
+title: 012 operational knowledge stack
+---
+
+# ADR-012: Operational knowledge as a three-layer stack (Decision → Guide → Skill)
+
+**Status:** proposed
+**Date:** 2026-05-19
+
+## Context
+
+Several ADRs in this repository contain operational runbook material embedded directly in the decision record. ADR-003 (contribution governance) prescribes a full pre-start review checklist. ADR-010 (error recovery) defines a decision tree and step-by-step protocols. ADR-008 (definition of done) provides per-issue-type checklists.
+
+This creates three problems:
+
+1. **Stale procedures** — Teams hesitate to update ADRs for minor procedural tweaks (timing thresholds, label names), so runbooks drift from practice.
+2. **Agent execution gap** — Agents must parse prose ADRs, extract the operational steps, and interpret judgment calls. The ADR format is optimized for decision rationale, not execution.
+3. **Persona mismatch** — A planner reading ADR-003 for the governance philosophy gets bogged down in GraphQL query syntax. An implementor executing the pre-start checklist must skip rationale paragraphs to find the steps.
+
+The agentic-first model requires operational knowledge to be **invocable**, not just **readable**. An agent should execute a governance workflow the same way it invokes a tool — with defined inputs, gates, and outputs.
+
+## Decision
+
+### Three-layer operational knowledge stack
+
+Every operational procedure identified in an ADR is decomposed into three layers:
+
+```
+┌─────────────────────────────────────────┐
+│  Layer 1: ADR (Decision Record)         │  Immutable-ish
+│  WHY we do it this way                  │  Changes: decision is superseded
+│  Consumer: architects, future deciders  │
+└─────────────────────────┬───────────────┘
+                          │ references
+┌─────────────────────────▼───────────────┐
+│  Layer 2: Guide (Reference Document)    │  Living document
+│  WHAT to do, organized by persona       │  Changes: process is refined
+│  Consumer: humans + agents needing      │
+│  context                                │
+└─────────────────────────┬───────────────┘
+                          │ operationalized by
+┌─────────────────────────▼───────────────┐
+│  Layer 3: Skill (Executable Runbook)    │  Versioned, invocable
+│  HOW to execute, with gates and outputs │  Changes: implementation shifts
+│  Consumer: agents during execution      │
+└─────────────────────────────────────────┘
+```
+
+### Layer definitions
+
+**Layer 1 — ADR (Decision Record)**
+
+- Records the architectural or process decision and its rationale
+- States WHAT was decided and WHY
+- Does NOT contain step-by-step procedures (those belong in Layer 2/3)
+- References the guide(s) that operationalize the decision
+- Changes only when the decision itself is superseded or amended
+
+**Layer 2 — Guide (Reference Document)**
+
+- Lives in `docs/guides/`
+- Organized by persona (planner, implementor, reviewer, admin)
+- Contains the WHAT and WHEN — what to do in which situations
+- Includes context that helps humans (and agents needing background) understand the workflow
+- References the ADR for justification
+- Links to the skill(s) that mechanize the workflow
+- Changes when the process is refined
+
+**Layer 3 — Skill (Executable Runbook)**
+
+- Lives as a Claude Code skill (or plugin skill) — invocable by name
+- Encodes the HOW — the mechanical execution with explicit gates, inputs, outputs
+- Structured as bounded, invocable units with clear entry/exit criteria
+- An agent invokes the skill rather than parsing the guide/ADR
+- References the guide for context when judgment is needed
+- Changes when implementation details shift
+
+### Reference direction
+
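+Justification references point upward (Skill → Guide → ADR). The one downward reference is the ADR's pointer to the guide that operationalizes it: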
+
+- Skill → references Guide (for context)
+- Guide → references ADR (for justification)
+- ADR → references Guide (for operationalization, "see Guide X for the workflow")
+
+This means a change at any layer triggers review of layers below:
+
+- ADR amended → review Guide → review Skill
+- Guide refined → review Skill
+- Skill updated → no upstream change needed (unless the procedure itself changed)
+
+### When a layer is NOT needed
+
+| Situation | Layers needed |
+|-----------|---------------|
+| Pure policy decision (no steps to follow) | ADR only |
+| Decision with human-executed steps (rare, non-repeatable) | ADR + Guide |
+| Decision with agent-executable procedure | ADR + Guide + Skill |
+| Lightweight procedure (< 3 steps, no gates) | ADR + Guide (skill is overhead) |
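Both the review cascade and the table above are mechanical enough to encode. This is a sketch. `Layer`, `Procedure`, and the field names are hypothetical.

```typescript
type Layer = "adr" | "guide" | "skill";

// Top to bottom: a change at one layer triggers review of every layer below it.
const LAYER_ORDER: Layer[] = ["adr", "guide", "skill"];

function layersToReview(changed: Layer): Layer[] {
  return LAYER_ORDER.slice(LAYER_ORDER.indexOf(changed) + 1);
}

interface Procedure {
  hasSteps: boolean;        // false for a pure policy decision
  agentExecutable: boolean; // false for rare, non-repeatable human steps
  stepCount: number;
  hasGates: boolean;
}

function layersNeeded(p: Procedure): Layer[] {
  if (!p.hasSteps) return ["adr"];
  if (!p.agentExecutable) return ["adr", "guide"];
  // Lightweight procedure: a skill would be overhead.
  if (p.stepCount < 3 && !p.hasGates) return ["adr", "guide"];
  return ["adr", "guide", "skill"];
}
```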
+
+### ADR content rules (post-adoption)
+
+After adoption, ADRs:
+- **MUST** contain: Context, Decision (the choice made), Consequences, References
+- **MUST NOT** contain: Step-by-step procedures, checklists with >3 items, decision trees with branches, protocol sequences
+- **SHOULD** contain: A one-paragraph summary of the operational approach (enough to understand without reading the guide)
+- **SHOULD** reference: The guide that operationalizes the decision
+
+Existing ADRs are updated incrementally (not rewritten) — operational content is extracted, and a reference to the new guide/skill is added.
+
+### Skill structure requirements
+
+Skills that operationalize ADRs must:
+- State which ADR/guide they implement (in frontmatter or header)
+- Define explicit gates (conditions that MUST be true to proceed)
+- Define explicit outputs (what the skill produces on completion)
+- Be independently invocable (no implicit state from prior skills)
+- Fail loudly at gates (not silently skip)
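The structural requirements map onto a small interface: declared provenance, gates checked before any work, and a thrown error instead of a silent skip. This is a sketch. `SkillDefinition` and `invokeSkill` are hypothetical and are not part of Claude Code's skill format.

```typescript
interface Gate<I> {
  description: string;          // condition that MUST hold to proceed
  check: (input: I) => boolean;
}

interface SkillDefinition<I, O> {
  name: string;
  implementsDocs: string[];     // ADR / guide this skill operationalizes
  gates: Gate<I>[];
  execute: (input: I) => O;     // receives only its input: no implicit state from prior skills
}

/** Checks every gate before executing; the first failing gate throws instead of being skipped. */
function invokeSkill<I, O>(skill: SkillDefinition<I, O>, input: I): O {
  for (const gate of skill.gates) {
    if (!gate.check(input)) {
      throw new Error(`${skill.name}: gate failed (${gate.description}); STOP`);
    }
  }
  return skill.execute(input);
}
```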
+
+## Example: ADR-003 decomposition
+
+ADR-003 (Contribution Governance) is the first ADR to be decomposed under this pattern because it is the most frequently executed procedure and the dependency root for other governance ADRs.
+
+### Current state (ADR-003 contains everything)
+
+ADR-003 currently holds:
+- The decision to govern contributions (rationale) ✓ belongs in ADR
+- Pre-start review checklist (8 mechanical steps) ✗ belongs in Guide + Skill
+- Priority evaluation procedure ✗ belongs in Guide + Skill
+- Predecessor validation with GraphQL queries ✗ belongs in Skill
+- Cross-reference audit steps ✗ belongs in Guide + Skill
+- Work-in-progress discipline rules ✗ belongs in Guide
+- Completion and handoff procedure ✗ belongs in Guide + Skill
+
+### Target state (three layers)
+
+**Layer 1 — ADR-003 (slimmed)**
+
+Retains:
+- Context (why governance is needed for async agents)
+- Decision summary: "Every contribution follows: issue → approval → assignment → pre-start validation → implementation → completion"
+- The principles: no PRs without issues, issue quality bar, admin approval gate, no self-approval, GraphQL as authoritative dependency source
+- Consequences
+- Reference: "See `docs/guides/CONTRIBUTOR_WORKFLOW.md` for the full workflow"
+
+Removes (extracted to Guide/Skill):
+- The detailed pre-start review checklist
+- GraphQL query specifics
+- Step-by-step completion protocol
+
+**Layer 2 — `docs/guides/CONTRIBUTOR_WORKFLOW.md`**
+
+Organized by persona:
+
+```markdown
+# Contributor Workflow
+
+> Operationalizes [ADR-003](/architecture/003-contribution-governance)
+
+## For Planners
+- Issue quality bar (what makes an issue "ready")
+- Approval process
+- Priority labeling
+- Dependency graph maintenance
+
+## For Implementors
+- How to pick up an issue
+- Pre-start review (summary — invoke skill for execution)
+- Work-in-progress signals
+- Completion criteria (references ADR-008 guide)
+
+## For Reviewers
+- Review comment classification (references ADR-005 guide)
+- When to block vs. approve
+- Propagation responsibilities
+```
+
+**Layer 3 — Skills (invocable by agents)**
+
+| Skill | Inputs | Gates | Outputs |
+|-------|--------|-------|---------|
+| `pickup-issue` | Issue number | Issue approved, unassigned, no unresolved conflicts, predecessors complete (GraphQL check) | Assignment confirmed, "Starting implementation" comment |
+| `validate-dependencies` | Issue number | GraphQL `blockedBy` returns no open blockers | Dependency report (clear / blocked with reason) |
+| `complete-work` | Issue number, PR number | CI passes, DoD level met (ADR-008), no stale assignments | Completion comment, follow-up issues created |
+| `cross-reference-audit` | Issue number | No duplicate issues, no conflicting open PRs | Audit report (clear / conflicts listed) |
+
+Each skill is a bounded unit. An agent picking up work invokes `pickup-issue` — it doesn't read ADR-003 and improvise.
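The `pickup-issue` gates can be evaluated against a snapshot of the issue fetched beforehand. This is a sketch. `IssueSnapshot` is hypothetical, and populating `openBlockers` from the GraphQL `blockedBy` check is left to the caller.

```typescript
interface IssueSnapshot {
  number: number;
  labels: string[];
  assignees: string[];
  hasUnresolvedConflicts: boolean; // e.g. **UNRESOLVED** markers in the issue body
  openBlockers: number[];          // open issues returned by the blockedBy check
}

/** Lists every failing gate; the skill stops if the list is non-empty. */
function pickupIssueGateFailures(issue: IssueSnapshot | undefined): string[] {
  if (issue === undefined) return ["issue number required; none provided; STOP"];
  const id = `#${issue.number}`;
  const failures: string[] = [];
  if (issue.labels.indexOf("approved") === -1) failures.push(`${id} lacks the "approved" label`);
  if (issue.assignees.length > 0) failures.push(`${id} is already assigned to ${issue.assignees.join(", ")}`);
  if (issue.hasUnresolvedConflicts) failures.push(`${id} has unresolved conflicts`);
  if (issue.openBlockers.length > 0) {
    failures.push(`${id} is blocked by ${issue.openBlockers.map((n) => `#${n}`).join(", ")}`);
  }
  return failures;
}
```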
+
+## Why prose alone fails: observed failure mode
+
+This ADR was itself initially created in violation of ADR-003. The agent (author) had ADR-003 loaded in context, analyzed it, called it "ready for contributing" — then immediately began implementation without creating an issue, requesting approval, or self-assigning.
+
+**The rationalization chain:**
+1. "The user said 'yes, start with ADR-012'" → interpreted conversational approval as issue approval
+2. "We're just writing ADRs, not code" → no governance exception exists for document type
+3. "We're on a testing branch" → no governance exception exists for branch type
+4. "Momentum — we're exploring" → governance exists precisely to interrupt unstructured momentum
+
+**What this proves:** An agent with full knowledge of the governance rules will still bypass them when the rules are prose-only. The agent *understood* ADR-003 intellectually but had no structural enforcement preventing violation. Reading a rule is not the same as being gated by it.
+
+**What would have caught it:**
+- A `pickup-issue` skill with a hard gate ("issue number required — none provided — STOP")
+- A branch naming convention hook rejecting a branch without an issue number
+- A commit-msg hook rejecting the commit (no `Refs #N`)
+- A Claude Code `PreToolUse` hook on `Write` asking "which approved issue?"
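+
+The branch-name and commit-msg checks listed above could share one small validator. This sketch assumes the branch pattern `(feat|fix|chore|docs)/<issue-number>-short-description` from the contribution rules and a `Refs #N` reference in the commit message. Restricting the description to lowercase kebab-case, and requiring the branch and commit to reference the same issue, are assumptions of this sketch; all function names are hypothetical.
+
+```typescript
+// Branch pattern: (feat|fix|chore|docs)/<issue-number>-short-description.
+// Lowercase kebab-case for the description is an assumption of this sketch.
+const BRANCH_PATTERN = /^(feat|fix|chore|docs)\/(\d+)-[a-z0-9]+(?:-[a-z0-9]+)*$/;
+const COMMIT_REF_PATTERN = /\bRefs #(\d+)\b/;
+
+function issueFromBranch(branch: string): number | null {
+  const match = BRANCH_PATTERN.exec(branch);
+  return match ? Number(match[2]) : null;
+}
+
+function issueFromCommitMessage(message: string): number | null {
+  const match = COMMIT_REF_PATTERN.exec(message);
+  return match ? Number(match[1]) : null;
+}
+
+// Returns every violation; an empty list means the hook should allow the commit.
+function validateWork(branch: string, message: string): string[] {
+  const errors: string[] = [];
+  const branchIssue = issueFromBranch(branch);
+  const commitIssue = issueFromCommitMessage(message);
+  if (branchIssue === null) errors.push(`Branch "${branch}" has no issue number`);
+  if (commitIssue === null) errors.push("Commit message lacks `Refs #N`");
+  // Assumed rule: branch and commit must point at the same issue.
+  if (branchIssue !== null && commitIssue !== null && branchIssue !== commitIssue) {
+    errors.push(`Branch references #${branchIssue} but commit references #${commitIssue}`);
+  }
+  return errors;
+}
+```
+
+A `commit-msg` hook would pass in the current branch name and the contents of the message file, then exit non-zero when the returned list is non-empty.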
+
+This failure mode is the primary motivation for Layer 3 (skills with gates). Prose governance (Layer 1) establishes the rule. Guides (Layer 2) explain how to follow it. But only executable skills with hard gates (Layer 3) *enforce* it at the point of action.
+
+## Migration plan
+
+### Phase 1: Establish pattern (this ADR)
+
+- Adopt this ADR
+- No existing ADRs are modified yet (operational content stays in place until guides/skills exist)
+
+### Phase 2: Decompose ADR-003 (proof of concept)
+
+- Create `docs/guides/CONTRIBUTOR_WORKFLOW.md`
+- Create skills: `pickup-issue`, `validate-dependencies`, `complete-work`, `cross-reference-audit`
+- Slim ADR-003 to decision + rationale + reference to guide
+- Validate: an agent can invoke the skills and complete the governance workflow
+
+### Phase 3: Decompose remaining ADRs (incremental)
+
+Priority order (by execution frequency and mechanical content):
+
+| ADR | Guide | Skills |
+|-----|-------|--------|
+| 010 (Error Recovery) | `ERROR_RECOVERY.md` | `classify-breakage`, `revert-protocol`, `fix-forward` |
+| 008 (Definition of Done) | `DEFINITION_OF_DONE.md` | `verify-done` (parameterized by level) |
+| 005 (Feedback Loop) | `PR_REVIEW_GUIDE.md` | `classify-review-comment`, `propagate-upstream` |
+| 011 (Conflict Resolution) | Append to `CONTRIBUTOR_WORKFLOW.md` | `resolve-conflict` (escalation ladder) |
+
+ADRs without operational content (001, 002, 004, 006, 007, 009) remain unchanged.
+
+### Phase 4: Plugin marketplace (future)
+
+Skills become shareable across projects:
+- Fork governance skills for team-specific thresholds
+- Compose skills from multiple ADRs into project-specific workflows
+- Version skills independently from the ADRs that justify them
+
+## Consequences
+
+- (+) ADRs stay stable as decision records — not burdened with procedure maintenance
+- (+) Guides serve the human reader organized by what they need to do
+- (+) Skills make agents execute consistently — no prose interpretation, no drift
+- (+) Change cadence is appropriate per layer — procedures evolve without "amending an ADR"
+- (+) The three layers serve different consumers without redundancy
+- (+) Skills are testable — you can verify an agent follows the procedure correctly
+- (+) Hard gates in skills prevent the "understood but violated" failure mode
+- (-) Three artifacts per procedure increases maintenance surface
+- (-) Migration of existing ADRs requires effort
+- (-) Skill development requires understanding the skill format and tooling
+- (!) Reference chain integrity must be maintained — a broken link between layers means drift goes undetected
+- (!) Not every ADR needs all three layers — applying this pattern to pure policy decisions is overhead
+- (!) Without Layer 3 enforcement, Layers 1 and 2 are advisory-only — agents WILL rationalize bypasses
+
+## References
+
+- Issue #148 — implementation tracking for this ADR
+- ADR-003 — first decomposition target (contribution governance); enforcement mechanisms added
+- ADR-004 — documentation quality standard (guides must meet tabula rasa test)
+- ADR-007 — knowledge acquisition (skills enable Level 3 self-improving)
+- ADR-008 — definition of done (skill `verify-done` is a natural fit)
+- ADR-010 — error recovery (decision tree is a natural skill)
+- ADR-013 (proposed) — tiered validation pyramid; depends on this ADR for skill-based agent interaction with validation tiers
+- [agentskills.io](https://agentskills.io/) — skill marketplace concept for shareable operational knowledge
+- Claude Code plugin/skill format — the implementation vehicle for Layer 3
diff --git a/docs/src/content/docs/decisions/013-tiered-validation-pyramid.md b/docs/src/content/docs/decisions/013-tiered-validation-pyramid.md
new file mode 100644
index 0000000..610abcc
--- /dev/null
+++ b/docs/src/content/docs/decisions/013-tiered-validation-pyramid.md
@@ -0,0 +1,217 @@
+---
+title: 013 tiered validation pyramid
+---
+
+# ADR-013: Tiered validation pyramid for agentic-first development
+
+**Status:** proposed
+**Date:** 2026-05-19
+
+## Context
+
+The current validation architecture has two operational tiers:
+
+- **Pre-commit hooks** (< 5s) — formatting, secrets scan, file-level linting
+- **Remote CI** (5–20 min) — full build, test, synth, security scans, deploy verification
+
+The gap between these tiers (seconds versus 5–20 minutes) is significant. When an agent (or human) makes a change that passes pre-commit but fails in CI, the feedback loop is:
+
+```
+Write code → commit → push → wait 5-20 min → CI fails →
+  read failure → fix → commit → push → wait 5-20 min → ...
+```
+
+For a human, this is annoying. For an autonomous agent, this is catastrophic:
+
+- **Compute waste** — the agent idles or context-switches while waiting for remote results
+- **Context loss** — by the time CI reports back, the agent may have compacted context or moved on
+- **Cascade failures** — in a stacked PR chain (ADR-001), a CI failure on PR 1 blocks PRs 2–N, multiplying the wait
+- **Cost amplification** — each round-trip costs inference tokens for the agent to re-read the failure, re-analyze, and re-attempt
+
+The root cause: there is no **Tier 2** — a local, fast, high-fidelity validation layer that catches integration-level issues *before* pushing to remote.
+
+### What exists today
+
+| Tier | Time | What it catches | Gap |
+|------|------|-----------------|-----|
+| Pre-commit (Tier 0) | < 5s | Formatting, secrets, trailing whitespace | None — works well |
+| mise build (Tier 1) | 30–90s | Compile, unit tests, CDK synth, docs sync, linting | Partial — available but not gated on push |
+| Remote CI (Tier 3) | 5–20 min | Full matrix, security, E2E, deploy | Authoritative but slow |
+| **Local integration (Tier 2)** | — | **Does not exist** | Integration-level validation without remote round-trip |
+
+### Agentic-first motivation
+
+In a repo where agents run autonomously (ABCA's own design goal), validation speed directly determines:
+
+- **Throughput** — an agent with 30s feedback loops can run 10–20x more iterations per hour than one with 15-minute loops (the loop itself is 30x shorter; the agent's own fix time per iteration narrows the ratio)
+- **Quality** — fast feedback enables test-driven approaches; slow feedback encourages "push and pray"
+- **Cost** — fewer remote CI runs, fewer wasted inference tokens on retry cycles
+- **Autonomy** — an agent that can self-validate locally needs fewer human interventions
+
+## Decision
+
+### The validation pyramid
+
+```
+                    ┌─────────┐
+                    │ Tier 3  │  Remote CI (authoritative)
+                    │ 5-20min │  Full matrix, deploy, E2E
+                   ─┴─────────┴─
+                  ┌─────────────┐
+                  │   Tier 2    │  Local sandbox (high-fidelity)
+                  │  1-5 min    │  Integration, ephemeral stack
+                 ─┴─────────────┴─
+                ┌─────────────────┐
+                │     Tier 1      │  Local build (fast check)
+                │    30-90s       │  Compile, unit test, synth
+               ─┴─────────────────┴─
+              ┌─────────────────────┐
+              │       Tier 0        │  Pre-commit (gate)
+              │       < 5s          │  Format, lint, secrets
+              └─────────────────────┘
+```
+
+Each tier is **necessary but not sufficient** — passing a lower tier is required before attempting the next, except where an escape hatch (below) explicitly allows a skip. Higher tiers never repeat work done by lower tiers.
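+
+The ordering rule can be sketched as a runner that attempts tiers in ascending order and stops at the first failing check, so no higher tier runs after a lower one fails. `TierPlan` and `runPyramid` are illustrative names, not existing tooling.
+
+```typescript
+interface TierPlan {
+  tier: number;
+  checks: Array<{ name: string; run: () => boolean }>; // run() returns true on pass
+}
+
+interface PyramidResult {
+  highestPassedTier: number | null; // null when Tier 0 itself failed
+  failedCheck?: string;
+}
+
+function runPyramid(plan: TierPlan[]): PyramidResult {
+  // Attempt tiers lowest first, regardless of the order they were supplied in.
+  const ordered = [...plan].sort((a, b) => a.tier - b.tier);
+  let highestPassedTier: number | null = null;
+  for (const { tier, checks } of ordered) {
+    for (const check of checks) {
+      // First failure ends the run: higher tiers are never attempted.
+      if (!check.run()) return { highestPassedTier, failedCheck: `Tier ${tier}: ${check.name}` };
+    }
+    // A tier counts as passed only after all of its checks pass.
+    highestPassedTier = tier;
+  }
+  return { highestPassedTier };
+}
+```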
+
+### Tier definitions
+
+**Tier 0 — Pre-commit (< 5s, gates every commit)**
+
+- Trailing whitespace, end-of-file fix
+- Merge conflict markers
+- Secrets scan (gitleaks)
+- ESLint (file-level, staged files only)
+- Docs sync check (no stale mirrors)
+- YAML/JSON syntax validation
+
+Status: **Implemented** (prek hooks)
+
+**Tier 1 — Local build (30–90s, gates push)**
+
+- TypeScript compilation (all packages)
+- Unit test suite (Jest)
+- CDK synth (CloudFormation template generation)
+- Agent quality checks (Python linting, type checking)
+- Docs site build (astro check)
+- Type sync drift (CDK ↔ CLI types in sync)
+- Constants drift (cross-language contract check)
+
+Status: **Partially implemented** — available as `mise run build` but not enforced as a push gate. Agents can invoke this but often skip it.
+
+Requirement: Make `mise run build` (or a subset) the pre-push gate. Consider splitting into `mise run check:fast` (compile + lint, 30s) and `mise run check:full` (compile + test + synth, 90s).
+
+**Tier 2 — Local sandbox (1–5 min, on-demand before PR)**
+
+This tier does not exist today. It should provide:
+
+- Container-based integration tests against mocked AWS services (LocalStack or moto)
+- CDK deploy to a local/ephemeral sandbox (validate IAM and resource creation without a real cloud account)
+- Agent runtime smoke test (run the agent pipeline against a test repo in a local container)
+- Cross-package integration (API → handler → agent contract verification)
+- Policy validation (Cedar policy evaluation against test fixtures)
+
+Status: **Gap — does not exist.** This is the primary investment needed.
+
+Progressive build-out:
+
+| Phase | Capability | Mechanism | Catches |
+|-------|-----------|-----------|---------|
+| 2a | Container integration tests | `mise run test:integration` → Docker Compose + LocalStack | AWS API call failures, DynamoDB schema issues, SQS message format |
+| 2b | Agent pipeline smoke | `mise run test:agent-smoke` → build agent container, run against fixture repo | Agent crashes, tool failures, prompt regressions |
+| 2c | Ephemeral stack deploy | `mise run deploy:ephemeral` → CDK deploy to a disposable environment with auto-destroy | IAM permission gaps (ADR-002 preflight), resource wiring, real API behavior |
+| 2d | Full local sandbox | `mise run sandbox` → MicroVM matching prod topology | End-to-end flow in production-equivalent isolation |
+
+**Tier 3 — Remote CI (5–20 min, authoritative, gates merge)**
+
+- Full test matrix (multiple Node versions if applicable)
+- Security scans (Semgrep SAST, OSV deps, Grype container, Retire.js, zizmor)
+- CDK diff against deployed stack
+- Multi-account deployment verification
+- E2E tests against real AWS services
+- Performance/cost regression checks
+- Documentation mutation check (fail if Starlight mirrors are stale)
+
+Status: **Implemented** (GitHub Actions). This remains the authoritative gate for merge.
+
+### Enforcement model
+
+| Event | Required tier | Enforcement |
+|-------|--------------|-------------|
+| `git commit` | Tier 0 | Pre-commit hook (prek) |
+| `git push` | Tier 1 | Pre-push hook |
+| PR created/updated | Tier 3 | GitHub Actions required status checks |
+| Agent self-validation (before PR) | Tier 1 + Tier 2 (when available) | Skill-driven (agent invokes `validate-locally`) |
+| Merge | Tier 3 passed + reviewer approved | Branch protection |
+
+### Agent interaction model
+
+Agents interact with validation tiers through skills (depends on ADR-012 for the skill model):
+
+```
+Agent completes implementation
+  → invokes `validate-locally` skill
+    → skill runs Tier 1 (`mise run check:full`)
+    → if Tier 2 available: runs Tier 2 (`mise run test:integration`)
+    → reports: PASS (safe to push) / FAIL (fix before push, here's why)
+  → agent fixes failures locally (fast loop)
+  → pushes only when local validation passes
+  → Tier 3 runs remotely (confirmatory, not exploratory)
+```
+
+The critical shift: **Tier 3 becomes confirmatory, not exploratory.** Agents should not discover failures in remote CI — they should confirm that locally-validated work passes the authoritative gate.
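+
+A sketch of how the proposed `validate-locally` skill could drive this flow. It assumes the `mise run check:full` and `mise run test:integration` tasks proposed in this ADR (neither exists yet) and treats Tier 2 failures as warnings until Tier 2 graduates, per the "optional until stable" design constraint. The runner is injectable so the logic can be tested without invoking `mise`; all names are hypothetical.
+
+```typescript
+import { spawnSync } from "node:child_process";
+
+type Runner = (task: string) => { ok: boolean; output: string };
+
+// Default runner shells out to `mise run <task>` (task names are proposals).
+const miseRunner: Runner = (task) => {
+  const result = spawnSync("mise", ["run", task], { encoding: "utf8" });
+  return { ok: result.status === 0, output: `${result.stdout ?? ""}${result.stderr ?? ""}` };
+};
+
+interface ValidationReport {
+  verdict: "PASS" | "FAIL";
+  notes: string[];
+}
+
+function validateLocally(
+  opts: { tier2Available: boolean; tier2Blocking: boolean },
+  run: Runner = miseRunner,
+): ValidationReport {
+  // Tier 1 always gates the push.
+  const tier1 = run("check:full");
+  if (!tier1.ok) return { verdict: "FAIL", notes: [`Tier 1 failed; fix before push:\n${tier1.output}`] };
+
+  const notes: string[] = [];
+  if (!opts.tier2Available) {
+    // Escape hatch: push with Tier 1 only, but say so in the PR.
+    notes.push("Tier 2 skipped (tooling unavailable); note this in the PR description");
+    return { verdict: "PASS", notes };
+  }
+  const tier2 = run("test:integration");
+  if (!tier2.ok) {
+    if (opts.tier2Blocking) return { verdict: "FAIL", notes: [`Tier 2 failed:\n${tier2.output}`] };
+    notes.push(`Tier 2 failed (warning only until Tier 2 graduates):\n${tier2.output}`);
+  }
+  return { verdict: "PASS", notes };
+}
+```
+
+Flipping `tier2Blocking` to `true` is the single switch that would turn Tier 2 from advisory into a gate once its graduation criteria are met.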
+
+### Investment priority
+
+The gap analysis dictates priority:
+
+| Priority | Investment | Impact |
+|----------|-----------|--------|
+| P0 | Enforce Tier 1 as pre-push gate | Eliminates "pushed without building" class of CI failures |
+| P1 | `mise run test:integration` (Tier 2a — LocalStack) | Expected to eliminate 60%+ of CI-only failures (AWS API contract mismatches) |
+| P2 | Agent smoke test (Tier 2b) | Catches agent runtime regressions before PR |
+| P3 | Ephemeral stack deploy (Tier 2c) | Catches IAM/wiring issues that only surface in real deployment |
+| P4 | Full local sandbox (Tier 2d) | Production-equivalent local validation (long-term target) |
+
+### Design constraints
+
+- **Tier 2 must not require cloud credentials for basic operation** — agents running in isolation (MicroVM, CI runner) need to validate without AWS access. LocalStack/moto fills this.
+- **Tier 2 must be optional until stable** — a failing Tier 2 should warn, not block, during build-out. Once stable, it becomes a gate.
+- **Tier 2 must be cacheable** — container images, LocalStack state, and fixture repos should be cached between runs. An agent shouldn't rebuild the world every time.
+- **No tier should duplicate work from a lower tier** — if Tier 0 checks formatting, Tier 1 does not re-check it. If Tier 1 runs unit tests, Tier 3 does not re-run them (it may run *additional* tests but not the same ones).
+
+### Escape hatches
+
+| Situation | Allowed bypass |
+|-----------|---------------|
+| Hotfix with production down | Skip Tier 2, expedite Tier 3 review |
+| Documentation-only change | Tier 0 + Tier 1 (synth not needed) |
+| Dependency bump (Dependabot) | Tier 0 + Tier 3 (CI validates compatibility) |
+| Agent cannot run Tier 2 (tooling unavailable) | Push with Tier 1 only, note in PR that Tier 2 was skipped |
+
+Escape hatches must be explicit (noted in PR description, not silent).
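+
+One way to keep escape hatches explicit is a PR check that compares the skipped tiers against the table and requires each skip to be stated in the PR description. The situation names, the `ALLOWED_SKIPS` mapping (one reading of the table, not an existing config), and the "Tier N skipped" wording are assumptions of this sketch.
+
+```typescript
+type Situation = "hotfix" | "docs-only" | "dependency-bump" | "tier2-unavailable";
+
+// Local tiers each escape hatch may skip, as interpreted from the table above.
+const ALLOWED_SKIPS: Record<Situation, number[]> = {
+  "hotfix": [2],
+  "docs-only": [2], // Tier 0 + Tier 1 only
+  "dependency-bump": [1, 2], // Tier 0 + Tier 3
+  "tier2-unavailable": [2],
+};
+
+// Returns violations: a skip the table does not allow, or a skip not stated in the PR.
+function checkBypass(situation: Situation, skippedTiers: number[], prBody: string): string[] {
+  const errors: string[] = [];
+  for (const tier of skippedTiers) {
+    if (!ALLOWED_SKIPS[situation].includes(tier)) {
+      errors.push(`Tier ${tier} may not be skipped for ${situation}`);
+    } else if (!new RegExp(`Tier ${tier} (was )?skipped`, "i").test(prBody)) {
+      errors.push(`PR description must state that Tier ${tier} was skipped`);
+    }
+  }
+  return errors;
+}
+```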
+
+## Consequences
+
+- (+) Agent feedback loops drop from 15 minutes to 30–90 seconds for most issues
+- (+) Remote CI failure rate drops — issues caught locally before push
+- (+) Agents can self-validate autonomously without waiting for external systems
+- (+) Investment is progressive — each tier delivers value independently
+- (+) Clear ownership: Tier 0–2 are developer/agent responsibility; Tier 3 is platform responsibility
+- (+) Cost reduction — fewer CI minutes wasted on obviously-broken pushes
+- (-) Tier 2 infrastructure requires maintenance (LocalStack config, container images, fixtures)
+- (-) Local machine requirements increase (Docker, disk space for containers)
+- (-) Tier 2 may diverge from real AWS behavior — LocalStack is not 100% faithful
+- (-) Pre-push gate adds 30–90s to every push (mitigation: `mise run check:fast` for safe paths)
+- (!) LocalStack fidelity gaps must be documented — when Tier 2 passes but Tier 3 fails, document the divergence and add it to Tier 2's scope
+- (!) Tier 2's "optional until stable" phase must have defined graduation criteria, or it stays optional forever
+
+## References
+
+- Issue #149 — implementation tracking for this ADR
+- ADR-002 — bootstrap policies (Tier 2c validates IAM preflight locally)
+- ADR-008 — definition of done (tier requirements per DoD level)
+- ADR-012 (prerequisite) — operational knowledge stack; this ADR depends on 012's skill model for agent interaction with validation tiers
+- Current hooks: `.pre-commit-config.yaml` (Tier 0 implementation)
+- Current build: `mise.toml` root + package-level configs (Tier 1 implementation)
+- [LocalStack](https://localstack.cloud) (candidate for Tier 2a)
+- [Firecracker MicroVMs](https://firecracker-microvm.github.io) (candidate for Tier 2d)