aws-samples · scottschreckengaust · May 26, 2026 · May 19, 2026
@@ -39,6 +39,8 @@ Handler entry tests: `cdk/test/handlers/orchestrate-task.test.ts`, `create-task.
 
 ### Common mistakes
 
+- **Starting implementation without an approved GitHub issue** — Conversational approval ("yes, do it", "go ahead", "start with X") is NOT governance approval. The correct sequence is: create a GitHub issue with acceptance criteria → get the `approved` label from an admin → self-assign → comment "Starting implementation" → then begin work. Even if the user explicitly directs the work in conversation, create the durable artifact (issue) first. See [ADR-003](./docs/decisions/003-contribution-governance.md).
+- **Creating branches without an issue reference** — Branch names must follow the pattern `(feat|fix|chore|docs)/<issue-number>-short-description`. A branch without an issue number is unauthorized work. Example: `feat/148-operational-knowledge-stack`.
 - Editing **`docs/src/content/docs/`** instead of **`docs/guides/`** or **`docs/design/`** — content is generated; sync from sources.
 - Adding or editing files in **`docs/design/`** or **`docs/guides/`** without running **`cd docs && node scripts/sync-starlight.mjs`** — CI will reject ("Fail build on mutation") because the Starlight mirror files in `docs/src/content/docs/` are stale. Always commit the regenerated mirrors alongside source changes.
 - Changing **`cdk/.../types.ts`** without updating **`cli/src/types.ts`** — CLI and API drift.

@@ -11,6 +11,10 @@ The rules below define how any contributor — human or AI — picks up, owns, a
 
 ## Decision
 
+### No branches without an Issue
+
+Every feature branch references an issue in its name (e.g., `feat/123-short-description` or `fix/456-bug-name`). A branch without an issue reference is unauthorized work. This prevents the failure mode where work is started "just to explore" and then snowballs into a PR without governance.
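
The naming rule can be checked mechanically. The following is a minimal sketch, not the project's actual hook: the helper name `issueNumberFromBranch` is hypothetical, the prefix set `(feat|fix|chore|docs)` comes from the AGENTS.md convention, and the lowercase kebab-case restriction on the description is an assumption.

```typescript
// Hypothetical helper: checks a branch name against the convention
// (feat|fix|chore|docs)/<issue-number>-short-description.
// Assumption: the description is lowercase kebab-case.
const BRANCH_PATTERN = /^(feat|fix|chore|docs)\/(\d+)-[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Returns the referenced issue number, or null when the branch is not compliant.
function issueNumberFromBranch(branch: string): number | null {
  const match = BRANCH_PATTERN.exec(branch);
  return match ? Number(match[2]) : null;
}
```

A caller would treat a `null` result as unauthorized work and could pass the returned number to a later check that the issue exists and is approved.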
+
 ### No PRs without an Issue
 
 Every PR references an issue. The issue provides rationale, sufficient context for the solution to be obvious, and verifiable acceptance criteria.
@@ -27,9 +31,9 @@ Issues align to the [product roadmap](https://github.com/aws-samples/sample-auto
 
 Only permitted users can mark an issue `approved` — a GitHub Actions workflow validates that the label applicant is authorized. An issue is not workable until it is both approved and assigned. After approval, the issue is considered scope-frozen: further revisions that change deliverables require re-approval.
 
-### Self-assignment on start
+### Assignments
 
-Unassigned means available. On starting work, self-assign. Multiple assignees (>1) require intentionality verification.
+Unassigned means available. Assignment may happen via self-assignment, directed assignment by another agent or human, or priority-based pickup (inspect open issues and pick the highest-priority one, breaking ties by the earliest position in the predecessor chain). An issue with more than one assignee requires confirmation that the shared assignment is intentional.
 
 ### Issue body as primary directive
 
@@ -47,10 +51,16 @@ Before implementation, the assigned contributor must:
 
 **Priority evaluation:** Identify priority (`p0`/`p1`/`p2`). If asked to work a lower-priority item while higher-priority items are unassigned, challenge: "Should I work on #X (p0) instead?"
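
As an illustration of the priority challenge, the sketch below assumes the open issues have already been fetched into memory. The `OpenIssue` shape and the function name are hypothetical. Only the `p0`/`p1`/`p2` labels and the wording of the challenge come from the rule above.

```typescript
type Priority = "p0" | "p1" | "p2";

// Hypothetical in-memory view of an open issue.
interface OpenIssue {
  number: number;
  priority: Priority;
  assignees: string[];
}

// Lower rank means more urgent.
const RANK: Record<Priority, number> = { p0: 0, p1: 1, p2: 2 };

// Returns a challenge message when a higher-priority unassigned issue exists,
// or null when the requested issue is a reasonable pick.
function priorityChallenge(requested: OpenIssue, open: OpenIssue[]): string | null {
  const better = open
    .filter((i) => i.assignees.length === 0 && RANK[i.priority] < RANK[requested.priority])
    .sort((a, b) => RANK[a.priority] - RANK[b.priority] || a.number - b.number)[0];
  return better ? `Should I work on #${better.number} (${better.priority}) instead?` : null;
}
```

Ties between equally urgent candidates are broken by the lower issue number, which is an arbitrary choice for this sketch.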
 
-**Predecessor validation:** If predecessors are incomplete, unassigned, and not in a stacked PR — challenge: "Steps 1-3 are incomplete. Starting step 4 may cause rework."
+**Predecessor validation (GraphQL dependency graph is authoritative):**
+- Query the issue's `blockedBy` field via GraphQL — if any blocking issue is open, this issue is **not ready** (hard gate)
+- Check `parent`/`subIssues` ordering — verify prior siblings are complete or in-flight
+- Reconcile graph vs. prose — graph is authoritative for enforcement; prose explains rationale
+- If predecessors are incomplete, unassigned, and not in a stacked PR — challenge: "Steps 1-3 are incomplete. Starting step 4 may cause rework."
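
The hard gate and the soft challenge above can be separated in code. This sketch deliberately makes no API calls: it assumes the `blockedBy` issues and prior siblings were already fetched, and the `IssueNode` shape, the `inStackedPr` flag, and the function name are all hypothetical.

```typescript
// Hypothetical in-memory shape for data already fetched from the dependency graph.
interface IssueNode {
  number: number;
  state: "OPEN" | "CLOSED";
  assignees: string[];
  inStackedPr: boolean; // assumption: computed elsewhere from open PRs
}

interface Readiness {
  ready: boolean; // false when the hard gate fails (an open blocker exists)
  warnings: string[]; // soft challenges for the contributor to raise
}

function checkPredecessors(blockedBy: IssueNode[], priorSiblings: IssueNode[]): Readiness {
  // Hard gate: any open blocking issue means the issue is not ready.
  const openBlockers = blockedBy.filter((i) => i.state === "OPEN");
  // Soft challenge: prior siblings that are incomplete, unassigned, and not in a stacked PR.
  const warnings = priorSiblings
    .filter((s) => s.state === "OPEN" && s.assignees.length === 0 && !s.inStackedPr)
    .map((s) => `#${s.number} is incomplete, unassigned, and not in a stacked PR. Starting now may cause rework.`);
  return { ready: openBlockers.length === 0, warnings };
}
```

Keeping `ready` and `warnings` separate mirrors the distinction in the list: the graph decides enforcement, while the warnings prompt a conversation.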
 
 **Cross-reference audit:** Search open issues for duplicates. Search open PRs (including drafts) for conflicts. Flag overlaps. Check the full dependency graph. Forward-look into downstream actions to ensure alignment.
 
+**Dependency graph maintenance:** When creating/modifying issues with dependencies, use GraphQL mutations (`addBlockedBy`, `addSubIssue`) to maintain the machine-enforceable graph. Update prose to explain rationale. If they diverge, fix the wrong one (usually prose — graph is set programmatically).
+
 **Final gate:** If all checks pass, comment "Starting implementation."
 
 ### Identity and attribution
@@ -65,20 +75,55 @@ Provide progress signals at checkpoints. If blocked or abandoning, comment and u
 
 CI passes before requesting review. After merge, verify acceptance criteria and close. Create follow-up issues for discovered work before closing.
 
+### Conversational approval is NOT issue approval
+
+A user saying "yes, do it" or "go ahead" in a conversation does NOT satisfy the governance gate. The correct response to conversational approval is:
+
+1. Create an issue with acceptance criteria
+2. Request the `approved` label from an admin
+3. Self-assign once approved
+4. Then begin implementation
+
+**Known failure mode:** Agents interpret conversational momentum ("Yes start with X") as authorization to skip issue creation. This is the most common governance bypass — it feels like permission because the user explicitly directed the work, but the governance requires a *durable, reviewable artifact* (the issue), not a transient conversation.
+
+**Why this matters:** Conversations are ephemeral. Issues are auditable. If an agent creates work based on a conversation and that conversation is lost (context compaction, session end), no record exists of what was authorized, what the acceptance criteria were, or why the work was started.
+
+### Enforcement mechanisms (planned)
+
+Prose governance is necessary but insufficient. The enforcement points below, most of them still planned, are meant to close bypass routes progressively. Mechanisms are deployed incrementally; see #186 for implementation tracking.
+
+| Mechanism | Layer | What it catches | Status |
+|-----------|-------|-----------------|--------|
+| AGENTS.md directive | Agent prompt | Explicit instruction: "Do NOT begin implementation without an approved issue, even if the user says 'go ahead' in conversation" | Implemented |
+| Branch name convention | Git workflow | Branch must match `(feat\|fix\|chore\|docs)/<issue-number>-*`; rejects branches without an issue reference | Planned |
+| Commit-msg hook (Tier 0) | Pre-commit | Rejects commits without `Refs #N` or `Fixes #N` | Planned |
+| Pre-push hook (Tier 1) | Pre-push | Validates referenced issue exists and has `approved` label via `gh` API | Planned |
+| Claude Code hook (`PreToolUse: Write`) | Agent runtime | Blocks file creation in governed paths without declared issue context | Planned |
+| Skill gate: `pickup-issue` | Agent workflow | Agent must invoke before implementation — hard-fails without valid issue | Planned |
+
+**Transition:** Branch naming and commit-msg rules apply to branches created after the corresponding hooks are deployed. Existing branches (including this PR's) pre-date enforcement.
+
+**Progressive enforcement:** Start with the commit-msg hook (cheapest, catches all contributors). Add pre-push validation next. Skill gates enforce at the agent-workflow level (see ADR-012, proposed, for the skill model).
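
To make the Tier 0 check concrete, here is a minimal sketch of the message test a commit-msg hook might run. It is not the planned hook itself: the function name is hypothetical, and treating the keywords as case-sensitive is an assumption. Git passes the path of the commit message file to a `commit-msg` hook, so a real hook would read that file first and then apply a check like this one.

```typescript
// Matches "Refs #123" or "Fixes #123" anywhere in the commit message.
// Assumption: the keywords are case-sensitive.
const ISSUE_REF = /\b(?:Refs|Fixes) #(\d+)\b/;

// Returns the first referenced issue number, or null when the message has none.
function referencedIssue(message: string): number | null {
  const match = ISSUE_REF.exec(message);
  return match ? Number(match[1]) : null;
}
```

A `null` result would cause the hook to reject the commit. The returned number is what a Tier 1 pre-push check could then validate against the `approved` label.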
+
 ## Consequences
 
 - (+) Prevents duplicate effort — assignment signals ownership
 - (+) Prevents priority inversion — agents challenge low-priority requests
 - (+) Prevents rework — predecessor validation catches out-of-order work
 - (+) Issue body stays current — threads are folded back
 - (+) Cross-reference audit catches duplicates early
+- (+) Enforcement mechanisms catch bypass at multiple points
 - (-) Pre-start overhead for small tasks
 - (-) Requires discipline to fold threads into body
+- (-) Commit-msg hook adds friction for rapid iteration on approved work
 - (!) Assumes priority labels exist and are maintained
 - (!) Self-assignment is not atomic, so concurrent agents may race; mitigate by re-fetching the issue after claiming it and confirming the assignee list
+- (!) Conversational approval bypass is the most common failure — enforcement must be structural, not behavioral
 
 ## References
 
 - Issue #134 — full RFC with open questions and automation requirements
 - Roadmap: Scale and collaboration (Agent swarm, Multi-user and teams)
 - ADR-001 — delivery methodology referenced by completion rules
+- ADR-012 (proposed) — operational knowledge stack; planned enforcement via skill gates
+- ADR-013 (proposed) — tiered validation; planned enforcement hooks at Tier 0 and Tier 1
@@ -0,0 +1,68 @@
+# ADR-005: Feedback loop — PR reviews propagate to issues and ADRs
+
+**Status:** proposed
+**Date:** 2026-05-19
+
+## Context
+
+PR review comments are addressed locally (fix the code) but systemic issues they reveal are not propagated upstream. A reviewer says "this approach is wrong" but the issue still says "use this approach." ADRs are treated as immutable when they should be living decisions that evolve with implementation experience.
+
+Without a feedback protocol, review insights are lost, issue bodies rot, and architectural mistakes persist across stacked PR chains.
+
+## Decision
+
+### Review comment classification
+
+| Type | Action | Propagates to |
+|------|--------|---------------|
+| Nit (style, naming) | Fix in PR | Nothing |
+| Bug (logic error) | Fix in PR | Nothing (unless systemic) |
+| Design concern | Pause PR; evaluate | Issue body |
+| Architecture challenge | Pause PR; escalate | ADR (supersede? amend?) |
+| Scope question | Clarify | Issue body |
+| Blocker (won't approve as-is) | Pause PR | Issue body |
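
The classification table translates directly into a lookup. The sketch below is one possible encoding with hypothetical type and field names. The actions and propagation targets are copied from the table, and the "unless systemic" exception for bugs is left to the reviewer's judgment rather than encoded.

```typescript
type CommentType =
  | "nit"
  | "bug"
  | "design-concern"
  | "architecture-challenge"
  | "scope-question"
  | "blocker";

interface Handling {
  action: string;
  pausesPr: boolean;
  propagatesTo: "none" | "issue-body" | "adr";
}

const HANDLING: Record<CommentType, Handling> = {
  nit: { action: "Fix in PR", pausesPr: false, propagatesTo: "none" },
  // A systemic bug may still need propagation; that judgment is not modeled here.
  bug: { action: "Fix in PR", pausesPr: false, propagatesTo: "none" },
  "design-concern": { action: "Pause PR; evaluate", pausesPr: true, propagatesTo: "issue-body" },
  "architecture-challenge": { action: "Pause PR; escalate", pausesPr: true, propagatesTo: "adr" },
  "scope-question": { action: "Clarify", pausesPr: false, propagatesTo: "issue-body" },
  blocker: { action: "Pause PR", pausesPr: true, propagatesTo: "issue-body" },
};

// A PR pauses as soon as any comment on it is of a pausing type.
function mustPause(types: CommentType[]): boolean {
  return types.some((t) => HANDLING[t].pausesPr);
}
```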
+
+### Upstream propagation
+
+When a review surfaces a design concern or architecture challenge:
+
+1. **Pause** — Do not force-merge. Do not continue stacked PRs above this one.
+2. **Assess** — Does this invalidate the issue's approach? The ADR's decision?
+3. **Propagate** — Update the relevant upstream document (issue body, ADR, stacked PR dependents).
+4. **Resolve** — Revise the approach, defend with evidence, or cancel the work.
+5. **Resume** — Once resolved, unblock the PR and dependents.
+
+### ADR evolution
+
+| Trigger | Response |
+|---------|----------|
+| Implementation reveals the decision doesn't work | New RFC proposing a successor ADR |
+| Reviewer challenges the architectural premise | Mark the issue `**UNRESOLVED**`; pause work |
+| New information makes the decision obsolete | Successor ADR with `Supersedes: ADR-NNN` |
+| Decision works but needs refinement | Amend via PR (minor, no new ADR) |
+
+Never silently ignore a challenged decision.
+
+### Stacked PR chain revision
+
+When feedback on PR N invalidates PRs N+1 through N+M:
+1. Comment on all affected PRs
+2. Do not rebase dependent PRs until the base is stable
+3. If architectural: re-evaluate whether the remaining stack is valid
+4. If redesign needed: close dependent PRs, revise issue, re-plan
+
+## Consequences
+
+- (+) Review insights propagate to architectural decisions
+- (+) Issue bodies stay current with implementation learnings
+- (+) ADRs evolve rather than silently becoming outdated
+- (+) Stacked PR chains have a defined recovery protocol
+- (-) Adds process overhead to reviews (classification step)
+- (-) Pausing stacked chains delays delivery
+- (!) Requires discipline to actually propagate feedback upstream
+
+## References
+
+- Issue #136 — full RFC with open questions
+- ADR-003 — governance (issue body as source of truth)
+- ADR-001 — stacked PRs (chain revision protocol)
@@ -0,0 +1,82 @@
+# ADR-006: Feature flags for concurrent development
+
+**Status:** proposed
+**Date:** 2026-05-19
+
+## Context
+
+Multiple agents working on related features in the same area must serialize — one waits for the other to merge. Incomplete features either block the main branch or require long-lived branches that diverge. SRE needs kill switches without reverting commits.
+
+Feature flags enable trunk-based development where incomplete work merges safely behind toggles, and concurrent contributors avoid blocking each other.
+
+## Decision
+
+### When to use flags
+
+| Situation | Use a flag? |
+|-----------|-------------|
+| Feature spans multiple PRs, incomplete state is unsafe | Yes |
+| Two contributors touch the same module for different purposes | Yes |
+| SRE needs a kill switch for a new capability | Yes |
+| Simple refactor with no behavioral change | No |
+| Bug fix | No |
+| One-PR feature, complete on merge | No |
+
+### Flag ownership
+
+- Every flag has an owner (the issue that introduced it)
+- Every flag has an expiration (the issue/PR that removes it)
+- Flags without a removal plan are rejected in review
+
+### Separation of concerns
+
+- **Planners** decide which features get flags (issue/RFC level)
+- **Implementors** add/use flags in code (PR level)
+- **SRE/operators** toggle flags in production (runtime level)
+- **No self-approval** — the person who introduces a flag cannot approve its removal
+
+### Flag lifecycle
+
+1. **Proposed** — issue identifies the need for a flag
+2. **Introduced** — PR adds the flag (default: off)
+3. **Active** — feature behind flag is in development
+4. **Verified** — feature complete, flag toggled on in testing
+5. **Permanent** — flag removed, feature is always-on (or removed entirely)
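
The lifecycle reads as a linear state machine. The sketch below assumes transitions are strictly forward and one step at a time, which the list implies but does not state. All names are hypothetical.

```typescript
// Lifecycle states in the order listed above.
const LIFECYCLE = ["proposed", "introduced", "active", "verified", "permanent"] as const;
type FlagState = (typeof LIFECYCLE)[number];

// Returns the next state, or null once the flag has reached its final state.
function nextState(state: FlagState): FlagState | null {
  return LIFECYCLE[LIFECYCLE.indexOf(state) + 1] ?? null;
}

// Assumption: only single forward steps are valid transitions.
function canTransition(from: FlagState, to: FlagState): boolean {
  return nextState(from) === to;
}
```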
+
+### Lifecycle metadata
+
+Each flag must track:
+
+| Field | Required | Source |
+|-------|----------|--------|
+| Flag name | Yes | Code constant |
+| Purpose / linked issue | Yes | Issue reference |
+| First merge date | Yes | Auto from git log |
+| Max lifetime | Yes | Declared at creation (default: 4 weeks) |
+| Expected removal date | Yes | first_merge + max_lifetime |
+| Actual removal date | — | Auto when flag deleted |
+| Days active | — | Computed |
+
+### Maximum lifetime
+
+Flags must be removed within the declared max lifetime (default: 4 weeks) of their first merge, matching the expected removal date tracked in the lifecycle metadata. The max lifetime can be overridden per flag with justification in the issue. Stale flags are treated as technical debt and surfaced in periodic reviews.
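
The date arithmetic behind the lifecycle metadata is small enough to sketch. The example follows the metadata table's formula (`first_merge + max_lifetime`) and the 4-week default. The `FlagRecord` shape and function names are hypothetical, and where this data would be stored is left open, as the ADR does not prescribe a mechanism.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;
const DEFAULT_MAX_LIFETIME_DAYS = 28; // 4 weeks

// Hypothetical record mirroring the lifecycle metadata fields.
interface FlagRecord {
  name: string;
  linkedIssue: number;
  firstMerge: Date;
  maxLifetimeDays?: number; // per-flag override, justified in the issue
  removedOn?: Date;
}

// Expected removal date = first merge + max lifetime.
function expectedRemoval(flag: FlagRecord): Date {
  const days = flag.maxLifetimeDays ?? DEFAULT_MAX_LIFETIME_DAYS;
  return new Date(flag.firstMerge.getTime() + days * MS_PER_DAY);
}

// Whole days between first merge and removal (or now, if still present).
function daysActive(flag: FlagRecord, now: Date): number {
  const end = flag.removedOn ?? now;
  return Math.floor((end.getTime() - flag.firstMerge.getTime()) / MS_PER_DAY);
}

// A flag is stale when it still exists past its expected removal date.
function isStale(flag: FlagRecord, now: Date): boolean {
  return flag.removedOn === undefined && now.getTime() > expectedRemoval(flag).getTime();
}
```

A periodic review could list every record for which `isStale` returns true and file the removals as technical debt.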
+
+### Mechanism constraint
+
+Infrastructure flags MUST be resolvable at synth time, and behavior flags MUST be resolvable at runtime. The specific storage mechanism (CDK context, DynamoDB, SSM Parameter Store, env vars) is context-dependent and follows from this split; this ADR does not prescribe it.
+
+## Consequences
+
+- (+) Concurrent work proceeds without blocking
+- (+) Trunk-based development: main stays deployable
+- (+) SRE can disable features without code changes
+- (+) Partial features merge safely
+- (-) Flag management overhead
+- (-) Combinatorial testing complexity if many flags exist simultaneously
+- (!) Maximum lifetime must be enforced or flags accumulate indefinitely
+
+## References
+
+- Issue #137 — full RFC with open questions on mechanism (CDK context vs. DynamoDB vs. env vars)
+- ADR-003 — governance (flag introduction requires approval)
+- ADR-005 — feedback loop (reviewer may flag-gate a feature during review)
@@ -0,0 +1,79 @@
+# ADR-007: Knowledge acquisition through progressive failure
+
+**Status:** proposed
+**Date:** 2026-05-19
+
+## Context
+
+Agents with fresh context (tabula rasa) attempt to follow documentation and hit gaps they cannot resolve. These gaps are silently worked around (agent asks a human) rather than systematically fixed. The system cannot self-improve its onboarding because failures are not captured.
+
+Knowledge acquisition starts from zero. Each iteration creates the roadmap to better knowledge by discovering gaps through actual failures.
+
+## Decision
+
+### Zero-context execution attempts
+
+Periodically, an agent with no project memory attempts to follow guides end-to-end. The agent follows ONLY what is written — no inference, no training data knowledge, no asking colleagues.
+
+### Failure capture protocol
+
+At each failure point, the agent:
+1. **Stops** — does not attempt to work around or guess
+2. **Documents** — creates an issue: which document, which step, what was missing
+3. **Continues** — attempts the next step (if possible) to find additional gaps
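
The stop, document, continue protocol can be pictured as a loop that records gaps instead of working around them. This is a sketch under stated assumptions: `GuideStep`, `GapReport`, and `runZeroContext` are hypothetical names, each step is modeled as a function that returns a description of what was missing, and filing the actual issue is left out.

```typescript
interface GuideStep {
  document: string;
  step: string;
  // Returns null on success, or a description of what was missing.
  attempt: () => string | null;
}

// One report per failure point: which document, which step, what was missing.
interface GapReport {
  document: string;
  step: string;
  missing: string;
}

function runZeroContext(steps: GuideStep[]): GapReport[] {
  const gaps: GapReport[] = [];
  for (const s of steps) {
    const missing = s.attempt();
    if (missing !== null) {
      // Stop on this step (no workaround), document the gap, then move on to the next step.
      gaps.push({ document: s.document, step: s.step, missing });
    }
  }
  return gaps;
}
```

Each returned report carries the three fields the protocol asks for, so it could serve as the body of the issue that gets filed.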
+
+### Retrospectives
+
+After completing a task, project milestone, or sprint, agents produce a retrospective artifact:
+- What worked well (patterns to repeat)
+- What failed or caused friction (patterns to avoid)
+- Actionable experiments for future workflows
+
+Retrospectives are a first-class knowledge artifact — they feed into documentation improvements, inform ADR amendments, and surface systemic issues that individual task failures cannot.
+
+### Knowledge artifacts (interim)
+
+Until documentation meets ADR-004, agents may create ephemeral artifacts:
+- Semantic indices of the codebase (call graphs, dependency maps)
+- Annotated walkthroughs of successful executions
+- "What I learned" summaries after completing a task
+- Retrospectives (see above)
+
+These artifacts are scaffolding that informs documentation improvements; they are not documentation themselves.
+
+### Maturity model
+
+| Level | State | Agent capability |
+|-------|-------|-----------------|
+| 0 | No docs | Cannot start; files issue for missing docs |
+| 1 | Partial docs | Follows docs, stops at gaps, files issues |
+| 2 | Complete docs (ADR-004) | Completes end-to-end without help |
+| 3 | Self-improving | Detects drift between docs and code, auto-files issues |
+
+### The self-improvement loop
+
+```
+Agent starts fresh → follows docs → hits failure →
+  files issue → issue gets fixed → next agent goes further →
+    hits next failure → files issue → ...
+      until end-to-end works from zero context
+```
+
+This runs continuously because code changes outpace documentation and different agent implementations fail at different points.
+
+## Consequences
+
+- (+) Documentation gaps become bugs with reproduction steps
+- (+) Priority ordering emerges naturally (most common failures surface first)
+- (+) The system self-improves without human identification of gaps
+- (+) Creates a natural definition of "docs are done" (Level 2 achieved)
+- (-) Generates issue volume that needs triage
+- (-) Requires periodic investment in zero-context test runs
+- (!) The gap between Level 1 and Level 2 may be large — patience required
+
+## References
+
+- Issue #138 — full RFC with open questions
+- ADR-004 — defines the quality target (tabula rasa test)
+- ADR-003 — governance for issues filed by failing agents
+- ADR-008 — Level 4 Definition of Done depends on this protocol