2 changes: 2 additions & 0 deletions AGENTS.md
@@ -39,6 +39,8 @@ Handler entry tests: `cdk/test/handlers/orchestrate-task.test.ts`, `create-task.

### Common mistakes

- **Starting implementation without an approved GitHub issue** — Conversational approval ("yes, do it", "go ahead", "start with X") is NOT governance approval. The correct sequence is: create a GitHub issue with acceptance criteria → get the `approved` label from an admin → self-assign → comment "Starting implementation" → then begin work. Even if the user explicitly directs the work in conversation, create the durable artifact (issue) first. See [ADR-003](./docs/decisions/003-contribution-governance.md).
- **Creating branches without an issue reference** — Branch names must follow the pattern `(feat|fix|chore|docs)/<issue-number>-short-description`. A branch without an issue number is unauthorized work. Example: `feat/148-operational-knowledge-stack`.
- Editing **`docs/src/content/docs/`** instead of **`docs/guides/`** or **`docs/design/`** — content is generated; sync from sources.
- Adding or editing files in **`docs/design/`** or **`docs/guides/`** without running **`cd docs && node scripts/sync-starlight.mjs`** — CI will reject ("Fail build on mutation") because the Starlight mirror files in `docs/src/content/docs/` are stale. Always commit the regenerated mirrors alongside source changes.
- Changing **`cdk/.../types.ts`** without updating **`cli/src/types.ts`** — CLI and API drift.
51 changes: 48 additions & 3 deletions docs/decisions/003-contribution-governance.md
@@ -11,6 +11,10 @@ The rules below define how any contributor — human or AI — picks up, owns, a

## Decision

### No branches without an Issue

Every feature branch references an issue in its name (e.g., `feat/123-short-description` or `fix/456-bug-name`). A branch without an issue reference is unauthorized work. This prevents the failure mode where work is started "just to explore" and then snowballs into a PR without governance.
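
A minimal sketch of how the naming rule could be checked, assuming the description part is lowercase kebab-case (the ADR only says "short-description"). The names `BRANCH_PATTERN` and `checkBranchName` are hypothetical and not part of the repository.

```typescript
// Hypothetical helper: illustrative only, not an existing repository API.
// Assumption: the description after the issue number is lowercase kebab-case.
const BRANCH_PATTERN = /^(feat|fix|chore|docs)\/(\d+)-[a-z0-9]+(?:-[a-z0-9]+)*$/;

interface BranchCheck {
  valid: boolean;
  issueNumber?: number;
}

// Returns whether the branch name carries an issue reference, and which one.
function checkBranchName(branch: string): BranchCheck {
  const match = BRANCH_PATTERN.exec(branch);
  if (match === null) {
    return { valid: false };
  }
  return { valid: true, issueNumber: Number(match[2]) };
}
```

Returning the issue number alongside the verdict would let a later check (for example, the `approved` label lookup) reuse the parsed value instead of re-parsing the branch name.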

### No PRs without an Issue

Every PR references an issue. The issue provides rationale, sufficient context for the solution to be obvious, and verifiable acceptance criteria.
@@ -27,9 +31,9 @@ Issues align to the [product roadmap](https://github.com/aws-samples/sample-auto

Only permitted users can mark an issue `approved`; a GitHub Actions workflow validates that the user applying the label is authorized. An issue is not workable until it is both approved and assigned. After approval, the issue is scope-frozen: further revisions that change deliverables require re-approval.
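
The "approved and assigned" gate can be expressed as a small predicate. This is a sketch only: the `IssueState` shape and the `isWorkable` name are assumptions for illustration, and it does not verify who applied the label (the ADR delegates that to the GitHub Actions workflow).

```typescript
// Hypothetical, simplified view of an issue; not the GitHub API response shape.
interface IssueState {
  labels: string[];
  assignees: string[];
}

// An issue is workable only when it carries the `approved` label AND has an assignee.
function isWorkable(issue: IssueState): boolean {
  const approved = issue.labels.indexOf("approved") !== -1;
  const assigned = issue.assignees.length > 0;
  return approved && assigned;
}
```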

### Self-assignment on start
### Assignments

Unassigned means available. On starting work, self-assign. Multiple assignees (>1) require intentionality verification.
Unassigned means available. Assignment may happen via self-assignment, directed assignment by another agent/human, or priority-based pickup (inspect open tasks for highest priority + earliest predecessor). More than one assignee requires confirming that the shared assignment is intentional.

### Issue body as primary directive

@@ -47,10 +51,16 @@ Before implementation, the assigned contributor must:

**Priority evaluation:** Identify priority (`p0`/`p1`/`p2`). If asked to work a lower-priority item while higher-priority items are unassigned, challenge: "Should I work on #X (p0) instead?"
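
As a sketch of the priority check, the function below returns the challenge text when a higher-priority issue is still unassigned. The `OpenIssue` shape and the tie-break by lowest issue number are assumptions; the ADR does not specify how ties are resolved.

```typescript
type Priority = "p0" | "p1" | "p2";

// Hypothetical, simplified issue record used only for this illustration.
interface OpenIssue {
  number: number;
  priority: Priority;
  assignees: string[];
}

const RANK: Record<Priority, number> = { p0: 0, p1: 1, p2: 2 };

// Returns a challenge message if an unassigned issue outranks the requested one, else null.
function priorityChallenge(requested: OpenIssue, open: OpenIssue[]): string | null {
  const better = open
    .filter((i) => i.assignees.length === 0 && RANK[i.priority] < RANK[requested.priority])
    // Assumption: ties within a priority are broken by the lowest issue number.
    .sort((a, b) => RANK[a.priority] - RANK[b.priority] || a.number - b.number);
  if (better.length === 0) {
    return null;
  }
  const top = better[0];
  return `Should I work on #${top.number} (${top.priority}) instead?`;
}
```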

**Predecessor validation:** If predecessors are incomplete, unassigned, and not in a stacked PR — challenge: "Steps 1-3 are incomplete. Starting step 4 may cause rework."
**Predecessor validation (GraphQL dependency graph is authoritative):**
- Query the issue's `blockedBy` field via GraphQL — if any blocking issue is open, this issue is **not ready** (hard gate)
- Check `parent`/`subIssues` ordering — verify prior siblings are complete or in-flight
- Reconcile graph vs. prose — graph is authoritative for enforcement; prose explains rationale
- If predecessors are incomplete, unassigned, and not in a stacked PR — challenge: "Steps 1-3 are incomplete. Starting step 4 may cause rework."
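
The hard gate in the first bullet can be sketched as a pure function over already-fetched data. The `IssueNode` and `BlockingIssue` shapes are assumptions made for illustration; they are flattened and are not the response shape of the GitHub GraphQL API, and fetching is deliberately left out.

```typescript
// Hypothetical, flattened view of dependency data after it has been fetched.
interface BlockingIssue {
  number: number;
  state: "OPEN" | "CLOSED";
}

interface IssueNode {
  number: number;
  blockedBy: BlockingIssue[];
}

// Lists the blockers that are still open.
function openBlockers(issue: IssueNode): number[] {
  return issue.blockedBy.filter((b) => b.state === "OPEN").map((b) => b.number);
}

// Hard gate: any open blocker means the issue is not ready.
function isReady(issue: IssueNode): boolean {
  return openBlockers(issue).length === 0;
}
```

Keeping `openBlockers` separate from `isReady` means the challenge comment can name the specific blocking issues rather than only reporting "not ready".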

**Cross-reference audit:** Search open issues for duplicates. Search open PRs (including drafts) for conflicts. Flag overlaps. Check the full dependency graph. Look ahead to downstream work to confirm it stays aligned.

**Dependency graph maintenance:** When creating/modifying issues with dependencies, use GraphQL mutations (`addBlockedBy`, `addSubIssue`) to maintain the machine-enforceable graph. Update prose to explain rationale. If they diverge, fix the wrong one (usually prose — graph is set programmatically).

**Final gate:** If all checks pass, comment "Starting implementation."

### Identity and attribution
@@ -65,20 +75,55 @@ Provide progress signals at checkpoints. If blocked or abandoning, comment and u

CI passes before requesting review. After merge, verify acceptance criteria and close. Create follow-up issues for discovered work before closing.

### Conversational approval is NOT issue approval

A user saying "yes, do it" or "go ahead" in a conversation does NOT satisfy the governance gate. The correct response to conversational approval is:

1. Create an issue with acceptance criteria
2. Request the `approved` label from an admin
3. Self-assign once approved
4. Then begin implementation

**Known failure mode:** Agents interpret conversational momentum ("Yes start with X") as authorization to skip issue creation. This is the most common governance bypass — it feels like permission because the user explicitly directed the work, but the governance requires a *durable, reviewable artifact* (the issue), not a transient conversation.

**Why this matters:** Conversations are ephemeral. Issues are auditable. If an agent creates work based on a conversation and that conversation is lost (context compaction, session end), no record exists of what was authorized, what the acceptance criteria were, or why the work was started.

### Enforcement mechanisms (planned)

Prose governance is necessary but insufficient. The following enforcement points are planned and will be deployed incrementally to close off bypasses; see #186 for implementation tracking.

| Mechanism | Layer | What it catches | Status |
|-----------|-------|-----------------|--------|
| AGENTS.md directive | Agent prompt | Explicit instruction: "Do NOT begin implementation without an approved issue, even if the user says 'go ahead' in conversation" | Implemented |
| Branch name convention | Git workflow | Branch must match `(feat\|fix\|chore\|docs)/<issue-number>-*`; rejects branches without an issue reference | Planned |
| Commit-msg hook (Tier 0) | Pre-commit | Rejects commits without `Refs #N` or `Fixes #N` | Planned |
| Pre-push hook (Tier 1) | Pre-push | Validates referenced issue exists and has `approved` label via `gh` API | Planned |
| Claude Code hook (`PreToolUse: Write`) | Agent runtime | Blocks file creation in governed paths without declared issue context | Planned |
| Skill gate: `pickup-issue` | Agent workflow | Agent must invoke before implementation — hard-fails without valid issue | Planned |

**Transition:** Branch naming and commit-msg rules apply to branches created after the corresponding hooks are deployed. Existing branches (including this PR's) pre-date enforcement.

**Progressive enforcement:** Start with the commit-msg hook (cheapest, catches all contributors). Add pre-push validation next. Skill gates enforce at the agent-workflow level (see ADR-012, proposed, for the skill model).
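
The core check a Tier 0 commit-msg hook would need might look like the sketch below. It covers only the message test, not hook installation or file I/O. The name `referencedIssue` is hypothetical, and treating the keywords as case-sensitive is an assumption, since the ADR lists only `Refs #N` and `Fixes #N`.

```typescript
// Matches "Refs #123" or "Fixes #123" anywhere in the commit message.
// Assumption: keywords are case-sensitive, exactly as written in the ADR table.
const ISSUE_REF = /\b(?:Refs|Fixes) #(\d+)\b/;

// Returns the referenced issue number, or null if the message has no reference.
function referencedIssue(message: string): number | null {
  const match = ISSUE_REF.exec(message);
  return match === null ? null : Number(match[1]);
}
```

A hook built on this would reject the commit when the result is `null`; the Tier 1 pre-push check could then take the returned number and look up the `approved` label.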

## Consequences

- (+) Prevents duplicate effort — assignment signals ownership
- (+) Prevents priority inversion — agents challenge low-priority requests
- (+) Prevents rework — predecessor validation catches out-of-order work
- (+) Issue body stays current — threads are folded back
- (+) Cross-reference audit catches duplicates early
- (+) Enforcement mechanisms catch bypass at multiple points
- (-) Pre-start overhead for small tasks
- (-) Requires discipline to fold threads into body
- (-) Commit-msg hook adds friction for rapid iteration on approved work
- (!) Assumes priority labels exist and are maintained
- (!) Self-assignment is not atomic: concurrent agents may race. Mitigate by re-fetching the issue after claiming it and confirming the assignment held.
- (!) Conversational approval bypass is the most common failure — enforcement must be structural, not behavioral

## References

- Issue #134 — full RFC with open questions and automation requirements
- Roadmap: Scale and collaboration (Agent swarm, Multi-user and teams)
- ADR-001 — delivery methodology referenced by completion rules
- ADR-012 (proposed) — operational knowledge stack; planned enforcement via skill gates
- ADR-013 (proposed) — tiered validation; planned enforcement hooks at Tier 0 and Tier 1
68 changes: 68 additions & 0 deletions docs/decisions/005-feedback-loop.md
@@ -0,0 +1,68 @@
# ADR-005: Feedback loop — PR reviews propagate to issues and ADRs

**Status:** proposed
**Date:** 2026-05-19

## Context

PR review comments are addressed locally (fix the code) but systemic issues they reveal are not propagated upstream. A reviewer says "this approach is wrong" but the issue still says "use this approach." ADRs are treated as immutable when they should be living decisions that evolve with implementation experience.

Without a feedback protocol, review insights are lost, issue bodies rot, and architectural mistakes persist across stacked PR chains.

## Decision

### Review comment classification

| Type | Action | Propagates to |
|------|--------|---------------|
| Nit (style, naming) | Fix in PR | Nothing |
| Bug (logic error) | Fix in PR | Nothing (unless systemic) |
| Design concern | Pause PR; evaluate | Issue body |
| Architecture challenge | Pause PR; escalate | ADR (supersede? amend?) |
| Scope question | Clarify | Issue body |
| Blocker (won't approve as-is) | Pause PR | Issue body |
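
The classification table maps naturally onto a lookup. The sketch below is one possible encoding, with hypothetical type and constant names; the "unless systemic" exception for bugs is a judgment call that is left to the caller rather than encoded.

```typescript
type CommentType = "nit" | "bug" | "design" | "architecture" | "scope" | "blocker";
type Target = "nothing" | "issue body" | "ADR";

interface Handling {
  action: string;
  pausesPr: boolean;
  propagatesTo: Target;
}

// One row per line of the classification table.
const HANDLING: Record<CommentType, Handling> = {
  nit: { action: "Fix in PR", pausesPr: false, propagatesTo: "nothing" },
  // A bug propagates nowhere unless it is systemic; that decision is not modeled here.
  bug: { action: "Fix in PR", pausesPr: false, propagatesTo: "nothing" },
  design: { action: "Pause PR; evaluate", pausesPr: true, propagatesTo: "issue body" },
  architecture: { action: "Pause PR; escalate", pausesPr: true, propagatesTo: "ADR" },
  scope: { action: "Clarify", pausesPr: false, propagatesTo: "issue body" },
  blocker: { action: "Pause PR", pausesPr: true, propagatesTo: "issue body" },
};

// A PR must pause if any comment in the review is of a pausing type.
function mustPause(types: CommentType[]): boolean {
  return types.some((t) => HANDLING[t].pausesPr);
}
```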

### Upstream propagation

When a review surfaces a design concern or architecture challenge:

1. **Pause** — Do not force-merge. Do not continue stacked PRs above this one.
2. **Assess** — Does this invalidate the issue's approach? The ADR's decision?
3. **Propagate** — Update the relevant upstream document (issue body, ADR, stacked PR dependents).
4. **Resolve** — Revise the approach, defend with evidence, or cancel the work.
5. **Resume** — Once resolved, unblock the PR and dependents.

### ADR evolution

| Trigger | Response |
|---------|----------|
| Implementation reveals the decision doesn't work | New RFC proposing a successor ADR |
| Reviewer challenges the architectural premise | `**UNRESOLVED**` on the issue; pause |
| New information makes the decision obsolete | Successor ADR with `Supersedes: ADR-NNN` |
| Decision works but needs refinement | Amend via PR (minor, no new ADR) |

Never silently ignore a challenged decision.

### Stacked PR chain revision

When feedback on PR N invalidates PRs N+1 through N+M:
1. Comment on all affected PRs
2. Do not rebase dependent PRs until the base is stable
3. If architectural: re-evaluate whether the remaining stack is valid
4. If redesign needed: close dependent PRs, revise issue, re-plan

## Consequences

- (+) Review insights propagate to architectural decisions
- (+) Issue bodies stay current with implementation learnings
- (+) ADRs evolve rather than silently becoming outdated
- (+) Stacked PR chains have a defined recovery protocol
- (-) Adds process overhead to reviews (classification step)
- (-) Pausing stacked chains delays delivery
- (!) Requires discipline to actually propagate feedback upstream

## References

- Issue #136 — full RFC with open questions
- ADR-003 — governance (issue body as source of truth)
- ADR-001 — stacked PRs (chain revision protocol)
82 changes: 82 additions & 0 deletions docs/decisions/006-feature-flags.md
@@ -0,0 +1,82 @@
# ADR-006: Feature flags for concurrent development

**Status:** proposed
**Date:** 2026-05-19

## Context

Multiple agents working on related features in the same area must serialize — one waits for the other to merge. Incomplete features either block the main branch or require long-lived branches that diverge. SRE needs kill switches without reverting commits.

Feature flags enable trunk-based development where incomplete work merges safely behind toggles, and concurrent contributors avoid blocking each other.

## Decision

### When to use flags

| Situation | Use a flag? |
|-----------|-------------|
| Feature spans multiple PRs, incomplete state is unsafe | Yes |
| Two contributors touch the same module for different purposes | Yes |
| SRE needs a kill switch for a new capability | Yes |
| Simple refactor with no behavioral change | No |
| Bug fix | No |
| One-PR feature, complete on merge | No |

### Flag ownership

- Every flag has an owner (the issue that introduced it)
- Every flag has an expiration (the issue/PR that removes it)
- Flags without a removal plan are rejected in review

### Separation of concerns

- **Planners** decide which features get flags (issue/RFC level)
- **Implementors** add/use flags in code (PR level)
- **SRE/operators** toggle flags in production (runtime level)
- **No self-approval** — the person who introduces a flag cannot approve its removal

### Flag lifecycle

1. **Proposed** — issue identifies the need for a flag
2. **Introduced** — PR adds the flag (default: off)
3. **Active** — feature behind flag is in development
4. **Verified** — feature complete, flag toggled on in testing
5. **Permanent** — flag removed, feature is always-on (or removed entirely)

### Lifecycle metadata

Each flag must track:

| Field | Required | Source |
|-------|----------|--------|
| Flag name | Yes | Code constant |
| Purpose / linked issue | Yes | Issue reference |
| First merge date | Yes | Auto from git log |
| Max lifetime | Yes | Declared at creation (default: 4 weeks) |
| Expected removal date | Yes | first_merge + max_lifetime |
| Actual removal date | — | Auto when flag deleted |
| Days active | — | Computed |

### Maximum lifetime

Flags must be removed within the declared max lifetime (default: 4 weeks) of the feature being verified. The max lifetime can be overridden per-flag with justification in the issue. Stale flags are treated as technical debt and surfaced in periodic reviews.
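
A sketch of the computed metadata fields, following the table's formula (`first_merge + max_lifetime`). The `FlagMetadata` shape and function names are hypothetical, and expressing the lifetime in whole days is an assumption.

```typescript
// Hypothetical record mirroring the lifecycle metadata table.
interface FlagMetadata {
  name: string;
  linkedIssue: number;
  firstMerge: Date;
  maxLifetimeDays: number; // Assumption: lifetime is tracked in days; default is 4 weeks.
  removedOn?: Date;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;
const DEFAULT_MAX_LIFETIME_DAYS = 28;

// Expected removal date = first merge + max lifetime.
function expectedRemoval(flag: FlagMetadata): Date {
  return new Date(flag.firstMerge.getTime() + flag.maxLifetimeDays * MS_PER_DAY);
}

// Whole days from first merge until removal, or until `now` if the flag still exists.
function daysActive(flag: FlagMetadata, now: Date): number {
  const end = flag.removedOn !== undefined ? flag.removedOn : now;
  return Math.floor((end.getTime() - flag.firstMerge.getTime()) / MS_PER_DAY);
}

// A flag is stale when it still exists after its expected removal date.
function isStale(flag: FlagMetadata, now: Date): boolean {
  return flag.removedOn === undefined && now.getTime() > expectedRemoval(flag).getTime();
}
```

Passing `now` explicitly rather than calling `new Date()` inside keeps the functions deterministic, which makes a periodic stale-flag report easy to test.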

### Mechanism constraint

Flags MUST be resolvable at synth time for infrastructure flags and at runtime for behavior flags. The specific storage mechanism (CDK context, DynamoDB, SSM Parameter Store, env vars) is context-dependent and follows from this split — it is not prescribed by this ADR.
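
One way to picture the split is a pair of resolvers, one per flag kind. This is an illustrative sketch and not a prescribed mechanism: all names are hypothetical, the storage backend is abstracted into a plain object or a lookup callback, and "unset means off" is taken from the lifecycle's "default: off".

```typescript
type FlagKind = "infrastructure" | "behavior";

interface FlagDefinition {
  name: string;
  kind: FlagKind;
}

// Infrastructure flags: the value must already be known when the stack is synthesized.
function resolveAtSynth(flag: FlagDefinition, synthValues: Record<string, boolean>): boolean {
  if (flag.kind !== "infrastructure") {
    throw new Error(`${flag.name} is a behavior flag; resolve it at runtime`);
  }
  return synthValues[flag.name] === true; // unset means off
}

// Behavior flags: the value is looked up on every evaluation, so operators can toggle it live.
function resolveAtRuntime(
  flag: FlagDefinition,
  lookup: (name: string) => boolean | undefined,
): boolean {
  if (flag.kind !== "behavior") {
    throw new Error(`${flag.name} is an infrastructure flag; resolve it at synth time`);
  }
  return lookup(flag.name) === true; // unset means off
}
```

Throwing on a kind mismatch makes it hard to accidentally read an infrastructure flag at runtime, where a toggle would have no effect until the next deployment.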

## Consequences

- (+) Concurrent work proceeds without blocking
- (+) Trunk-based development: main stays deployable
- (+) SRE can disable features without code changes
- (+) Partial features merge safely
- (-) Flag management overhead
- (-) Combinatorial testing complexity if many flags exist simultaneously
- (!) Maximum lifetime must be enforced or flags accumulate indefinitely

## References

- Issue #137 — full RFC with open questions on mechanism (CDK context vs. DynamoDB vs. env vars)
- ADR-003 — governance (flag introduction requires approval)
- ADR-005 — feedback loop (reviewer may flag-gate a feature during review)
79 changes: 79 additions & 0 deletions docs/decisions/007-knowledge-acquisition.md
@@ -0,0 +1,79 @@
# ADR-007: Knowledge acquisition through progressive failure

**Status:** proposed
**Date:** 2026-05-19

## Context

Agents with fresh context (tabula rasa) attempt to follow documentation and hit gaps they cannot resolve. These gaps are silently worked around (agent asks a human) rather than systematically fixed. The system cannot self-improve its onboarding because failures are not captured.

Knowledge acquisition starts from zero. Each iteration creates the roadmap to better knowledge by discovering gaps through actual failures.

## Decision

### Zero-context execution attempts

Periodically, an agent with no project memory attempts to follow guides end-to-end. The agent follows ONLY what is written — no inference, no training data knowledge, no asking colleagues.

### Failure capture protocol

At each failure point, the agent:
1. **Stops** — does not attempt to work around or guess
2. **Documents** — creates an issue: which document, which step, what was missing
3. **Continues** — attempts the next step (if possible) to find additional gaps
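
The stop, document, continue sequence can be sketched as a loop that records a gap for each failing step and then moves on. The `GuideStep` and `GapReport` shapes are assumptions; in practice each report would become an issue, which is out of scope here.

```typescript
// Hypothetical shapes for illustration only.
interface GuideStep {
  id: string;
  run: () => void; // throws when the step cannot be completed as written
}

interface GapReport {
  document: string;
  step: string;
  missing: string;
}

// Attempts every step; a failure is recorded (not worked around) and the next step is tried.
function attemptGuide(document: string, steps: GuideStep[]): GapReport[] {
  const gaps: GapReport[] = [];
  for (const step of steps) {
    try {
      step.run();
    } catch (err) {
      const missing = err instanceof Error ? err.message : String(err);
      gaps.push({ document, step: step.id, missing });
    }
  }
  return gaps;
}
```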

### Retrospectives

After completing a task, project milestone, or sprint, agents produce a retrospective artifact:
- What worked well (patterns to repeat)
- What failed or caused friction (patterns to avoid)
- Actionable experiments for future workflows

Retrospectives are a first-class knowledge artifact — they feed into documentation improvements, inform ADR amendments, and surface systemic issues that individual task failures cannot.

### Knowledge artifacts (interim)

Until documentation meets ADR-004, agents may create ephemeral artifacts:
- Semantic indices of the codebase (call graphs, dependency maps)
- Annotated walkthroughs of successful executions
- "What I learned" summaries after completing a task
- Retrospectives (see above)

These artifacts are scaffolding that informs documentation improvements; they are not documentation in their own right.

### Maturity model

| Level | State | Agent capability |
|-------|-------|-----------------|
| 0 | No docs | Cannot start; files issue for missing docs |
| 1 | Partial docs | Follows docs, stops at gaps, files issues |
| 2 | Complete docs (ADR-004) | Completes end-to-end without help |
| 3 | Self-improving | Detects drift between docs and code, auto-files issues |

### The self-improvement loop

```
Agent starts fresh → follows docs → hits failure →
files issue → issue gets fixed → next agent goes further →
hits next failure → files issue → ...
until end-to-end works from zero context
```

This runs continuously because code changes outpace documentation and different agent implementations fail at different points.

## Consequences

- (+) Documentation gaps become bugs with reproduction steps
- (+) Priority ordering emerges naturally (most common failures surface first)
- (+) The system self-improves without human identification of gaps
- (+) Creates a natural definition of "docs are done" (Level 2 achieved)
- (-) Generates issue volume that needs triage
- (-) Requires periodic investment in zero-context test runs
- (!) The gap between Level 1 and Level 2 may be large — patience required

## References

- Issue #138 — full RFC with open questions
- ADR-004 — defines the quality target (tabula rasa test)
- ADR-003 — governance for issues filed by failing agents
- ADR-008 — Level 4 Definition of Done depends on this protocol