
RFC: Least-privilege CDK bootstrap policies as code (with preflight validation) #120

@scottschreckengaust


Summary

Convert the three hand-documented IaCRole-ABCA policies (DEPLOYMENT_ROLES.md) into versioned, testable TypeScript code under cdk/src/bootstrap/. Provide a custom CDK bootstrap template that embeds these policies with per-compute-variant selection. Add a two-layer preflight validation system: a CDK Aspect that warns at synth time when the template exceeds the declared policy envelope, and a mise //cdk:preflight task that validates the live account's bootstrap state before deploy. Use stacked PRs for incremental review.

Use case and motivation

Who it's for: Operators deploying ABCA, CI/CD pipelines (deploy.yml), and agents modifying CDK stacks.

Problem today:

  1. By default, cdk bootstrap attaches AdministratorAccess to the CloudFormation execution role. This violates least privilege and may breach organizational compliance gates (SCPs, permission boundaries).
  2. The three IaCRole-ABCA policies exist only as JSON blobs in a design doc. An operator must create them in IAM by hand and then re-bootstrap, which is error-prone and unversioned.
  3. When a new release adds resources (e.g., an SQS queue or a Step Functions state machine), operators who pull and deploy hit a failure partway through the deployment, followed by a rollback, because their bootstrap policy predates the new permissions. There is no warning until CloudFormation fails 15 minutes into the update.

After this RFC:

  • Policies are code: versioned, tested, diffable in PRs.
  • mise //cdk:synth warns immediately when a CDK construct requires permissions not covered by the declared policy.
  • mise //cdk:preflight (and CI) verifies the deployed bootstrap version is compatible before cdk deploy runs — failing fast with an actionable message ("re-bootstrap required: v1.2 → v2.0, adds SQS and StepFunctions permissions").
  • Operators run a single mise //cdk:bootstrap to provision least-privilege roles.

Proposal

Location

  • Policy source code: cdk/src/bootstrap/policies/, co-located with the CDK app for testability, agent routing, Aspect import at synth time, and self-containment.
  • Template generator: cdk/scripts/generate-bootstrap-template.ts, a build-time script rather than part of CDK synthesis (bootstrap is an operational prerequisite, not an app construct).
  • Generated artifacts: cdk/bootstrap/, committed and operator-consumable.

Policy architecture: core + compute variants

| Policy | Always required? | Scope |
|---|---|---|
| infrastructure | Yes | VPC, IAM, CloudFormation |
| application | Yes | DynamoDB, Lambda, API GW, Cognito, WAF, EventBridge, Secrets |
| observability | Yes | CloudWatch, ECR, S3, KMS, SSM, STS, Bedrock Guardrails |
| compute-agentcore | Yes (default runtime) | Bedrock AgentCore-specific permissions |
| compute-ecs | Optional | ECS Fargate cluster, task definitions |
| compute-eks | Future | EKS permissions |
| compute-ec2 | Future | EC2 permissions |

Operator-controlled ceiling: The bootstrap template accepts a ComputeTypes parameter (default: agentcore). Operators choose which compute variants to include based on their security posture. The CDKToolkit stack's attached policies represent the maximum permissions authorized for any CloudFormation deployment in that account.
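Here is a minimal sketch of the core + compute-variant split. It assumes each module under cdk/src/bootstrap/policies/ exports a plain IAM policy document and that ComputeTypes arrives as a comma-separated string such as "agentcore,ecs". The names CORE_POLICIES, COMPUTE_POLICIES, and selectPolicies are hypothetical. The actions are placeholders, not the real policy contents. In the generated CloudFormation template the same choice would more likely be expressed with Conditions on ComputeTypes; this shows only the TypeScript side.

```typescript
// Hypothetical shapes for the policy modules under cdk/src/bootstrap/policies/.
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string | string[];
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

type ComputeType = "agentcore" | "ecs";

// Placeholder documents: each real policy carries many more actions than shown here.
const allow = (actions: string[]): PolicyDocument => ({
  Version: "2012-10-17",
  Statement: [{ Effect: "Allow", Action: actions, Resource: "*" }],
});

const CORE_POLICIES: Record<string, PolicyDocument> = {
  infrastructure: allow(["ec2:CreateVpc", "iam:CreateRole"]),
  application: allow(["dynamodb:CreateTable", "lambda:CreateFunction"]),
  observability: allow(["logs:CreateLogGroup", "kms:CreateKey"]),
};

const COMPUTE_POLICIES: Record<ComputeType, PolicyDocument> = {
  agentcore: allow(["bedrock-agentcore:CreateAgentRuntime"]),
  ecs: allow(["ecs:CreateCluster", "ecs:RegisterTaskDefinition"]),
};

// Core policies are always attached; ComputeTypes (default "agentcore") adds variants.
function selectPolicies(computeTypes = "agentcore"): Map<string, PolicyDocument> {
  const selected = new Map<string, PolicyDocument>(Object.entries(CORE_POLICIES));
  for (const raw of computeTypes.split(",")) {
    const computeType = raw.trim();
    if (computeType === "") continue;
    if (!Object.prototype.hasOwnProperty.call(COMPUTE_POLICIES, computeType)) {
      // Future variants (eks, ec2) are rejected until their policies exist.
      throw new Error(
        `Unknown compute type "${computeType}"; expected one of: ${Object.keys(COMPUTE_POLICIES).join(", ")}`,
      );
    }
    selected.set(`compute-${computeType}`, COMPUTE_POLICIES[computeType as ComputeType]);
  }
  return selected;
}

// Usage: an operator who also runs ECS Fargate opts in explicitly.
const attached = [...selectPolicies("agentcore,ecs").keys()];
// ["infrastructure", "application", "observability", "compute-agentcore", "compute-ecs"]
```

Rejecting unknown variants, instead of ignoring them, keeps a typo in ComputeTypes from silently producing a narrower ceiling than the operator intended.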

Sufficiency model: Preflight checks that the deployed PolicySet is a superset of the app's required set (deployed ⊇ required), not that the two are equal. Multiple stacks with different compute types share one bootstrap, so the bootstrap must be the union of all needed policies.
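The sufficiency model reduces to a set comparison. This sketch checks each stack against the deployed PolicySet and computes the union an operator would need to bootstrap so that every stack is covered. The function names and the two stacks are hypothetical.

```typescript
// Sufficiency model: deployed PolicySet ⊇ required set. Equality is deliberately not
// required, because one bootstrap may serve several stacks with different compute types.

/** Union of the policies required by every stack that shares one bootstrap. */
function requiredUnion(stacks: Record<string, readonly string[]>): Set<string> {
  const union = new Set<string>();
  for (const policies of Object.values(stacks)) {
    for (const policy of policies) union.add(policy);
  }
  return union;
}

/** Required policies absent from the deployed set, sorted; an empty array means sufficient. */
function missingPolicies(deployed: Iterable<string>, required: Iterable<string>): string[] {
  const have = new Set(deployed);
  return [...required].filter((policy) => !have.has(policy)).sort();
}

// Hypothetical account: one bootstrap, two stacks with different compute variants.
const deployedSet = ["infrastructure", "application", "observability", "compute-agentcore"];
const stacks = {
  AgentStack: ["infrastructure", "application", "observability", "compute-agentcore"],
  WorkerStack: ["infrastructure", "application", "observability", "compute-ecs"],
};

const agentGap = missingPolicies(deployedSet, stacks.AgentStack);   // [] (sufficient)
const workerGap = missingPolicies(deployedSet, stacks.WorkerStack); // ["compute-ecs"]
const needed = requiredUnion(stacks);                               // 5 policies to bootstrap
```

Because the check runs per stack, AgentStack can deploy against this bootstrap even though WorkerStack cannot. Re-bootstrapping with the union fixes WorkerStack without touching AgentStack.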

Versioning (four layers)

| Layer | What it answers | How it works |
|---|---|---|
| Semver (BOOTSTRAP_VERSION) | "Is my bootstrap broadly compatible?" | Bumped when permissions change. Major = breaking (re-bootstrap required). Emitted as a CloudFormation output on the CDKToolkit stack. |
| Hash (BOOTSTRAP_HASH) | "Has my deployed bootstrap drifted from code?" | SHA-256 of the policies actually attached (per configuration, not global). Detects console drift. |
| PolicySet (BootstrapPolicySet) | "Which policies are attached?" | CloudFormation output listing the included policies. Used for the sufficiency check. |
| Action-set | "Precisely which actions are missing?" | Resource-type → IAM-actions map for the ~30 resource types in this app. Used by the Aspect and preflight. |
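The action-set layer can be sketched as a plain map from CloudFormation resource type to IAM actions, plus a pure function that the synth-time Aspect would call. Everything here is hypothetical. The map excerpt is illustrative (the real one covers ~30 types), and wildcard matching is simplified. In an actual Aspect, visit() would presumably read cfnResourceType from each CfnResource and report findings through Annotations.of(node). That aws-cdk-lib wiring is left out so the logic stays self-contained.

```typescript
// Hypothetical excerpt of cdk/src/bootstrap/preflight/resource-action-map.ts.
// Keys are CloudFormation resource types; values are IAM actions the execution role
// needs to manage them. The action lists are illustrative, not complete.
const RESOURCE_ACTION_MAP = new Map<string, readonly string[]>([
  ["AWS::SQS::Queue", ["sqs:CreateQueue", "sqs:DeleteQueue", "sqs:GetQueueAttributes", "sqs:SetQueueAttributes"]],
  ["AWS::StepFunctions::StateMachine", ["states:CreateStateMachine", "states:DeleteStateMachine", "states:UpdateStateMachine"]],
  ["AWS::DynamoDB::Table", ["dynamodb:CreateTable", "dynamodb:DeleteTable", "dynamodb:DescribeTable", "dynamodb:UpdateTable"]],
]);

interface EnvelopeFinding {
  resourceType: string;
  kind: "missing-actions" | "unknown-resource-type";
  actions: string[];
}

// IAM action names are case-insensitive. Only exact names and trailing-"*" patterns
// (e.g. "sqs:*") are handled here; mid-string wildcards are not.
function isAllowed(action: string, allowed: readonly string[]): boolean {
  const a = action.toLowerCase();
  return allowed.some((pattern) => {
    const p = pattern.toLowerCase();
    return p.endsWith("*") ? a.startsWith(p.slice(0, -1)) : a === p;
  });
}

// Core of the synth-time check: which synthesized resource types fall outside the envelope?
function checkEnvelope(resourceTypes: Iterable<string>, allowed: readonly string[]): EnvelopeFinding[] {
  const findings: EnvelopeFinding[] = [];
  for (const resourceType of new Set(resourceTypes)) {
    const needed = RESOURCE_ACTION_MAP.get(resourceType);
    if (needed === undefined) {
      // Unmapped type: report it (as a warning) rather than guessing its actions.
      findings.push({ resourceType, kind: "unknown-resource-type", actions: [] });
      continue;
    }
    const missing = needed.filter((action) => !isAllowed(action, allowed));
    if (missing.length > 0) findings.push({ resourceType, kind: "missing-actions", actions: missing });
  }
  return findings;
}

// Example: "sqs:*" covers the queue, the table is only partly covered, and Pipes is unmapped.
const findings = checkEnvelope(
  ["AWS::SQS::Queue", "AWS::DynamoDB::Table", "AWS::Pipes::Pipe", "AWS::SQS::Queue"],
  ["sqs:*", "dynamodb:CreateTable", "dynamodb:DeleteTable"],
);
```

The function returns the missing action names, not just a pass/fail verdict. That is what lets the Aspect and preflight answer "precisely which actions are missing?" instead of only flagging a mismatch.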

Preflight (two layers)

  • CDK Aspect — runs at synth time (mise //cdk:synth), warns/errors when template resources exceed the policy envelope
  • Live validator — runs before deploy (mise //cdk:preflight), checks deployed bootstrap PolicySet/version/hash against requirements. Distinguishes "forgot to add" from "intentionally excluded."
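Below is a sketch of the live validator's decision logic. It assumes the deployed version, the PolicySet (from the BootstrapPolicySet output), and the hash have already been read from the CDKToolkit stack, for example via CloudFormation DescribeStacks. The field names and message wording are assumptions, and so is the rule for "forgot to add" versus "intentionally excluded": here a missing compute-* variant counts as deselected via ComputeTypes, and a missing core policy counts as forgotten. None of this is the RFC's specified behavior.

```typescript
// Hypothetical snapshot of what the validator reads from the CDKToolkit stack outputs.
interface DeployedBootstrap {
  version: string;     // semver of the attached policies, e.g. "1.2.0"
  policySet: string[]; // parsed from the BootstrapPolicySet output
  hash: string;        // hash of the policies actually attached
}

// What the code in this checkout needs.
interface Requirements {
  version: string;      // BOOTSTRAP_VERSION
  policies: string[];   // union of policies required by the stacks being deployed
  expectedHash: string; // hash the code produces for the deployed configuration
}

interface Finding {
  severity: "error" | "warning";
  message: string;
}

function parseSemver(v: string): [number, number, number] {
  const m = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(v.trim());
  if (m === null) throw new Error(`Not a semver string: "${v}"`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// "1.2.0" -> "v1.2", matching the message format proposed in the RFC.
function shortVersion(v: string): string {
  const [major, minor] = parseSemver(v);
  return `v${major}.${minor}`;
}

function validateBootstrap(deployed: DeployedBootstrap, req: Requirements): Finding[] {
  const findings: Finding[] = [];
  const missing = req.policies.filter((p) => !deployed.policySet.includes(p));

  // A major bump means the deployed policies are no longer compatible.
  if (parseSemver(deployed.version)[0] < parseSemver(req.version)[0]) {
    const adds = missing.length > 0 ? `, adds ${missing.join(", ")}` : "";
    findings.push({
      severity: "error",
      message: `re-bootstrap required: ${shortVersion(deployed.version)} → ${shortVersion(req.version)}${adds}`,
    });
  }

  // Assumption: missing compute-* variants were deselected via ComputeTypes,
  // while a missing core policy was simply forgotten.
  for (const policy of missing) {
    findings.push({
      severity: "error",
      message: policy.startsWith("compute-")
        ? `${policy} not attached (excluded via ComputeTypes?): include it and re-bootstrap, or skip stacks that need it`
        : `${policy} is a core policy but is not attached: re-run mise //cdk:bootstrap`,
    });
  }

  // Hash mismatch: attached policies differ from what code would produce (e.g. console edits).
  if (deployed.hash !== req.expectedHash) {
    findings.push({ severity: "warning", message: "attached policies differ from code (possible console drift)" });
  }
  return findings;
}

// Example: a v1.2 bootstrap without ECS, checked by code at v2.0 that needs ECS.
const report = validateBootstrap(
  { version: "1.2.0", policySet: ["infrastructure", "application", "observability", "compute-agentcore"], hash: "h1" },
  { version: "2.0.0", policies: ["infrastructure", "application", "observability", "compute-agentcore", "compute-ecs"], expectedHash: "h1" },
);
// report[0].message === "re-bootstrap required: v1.2 → v2.0, adds compute-ecs"
```

Errors would block cdk deploy. Drift is only a warning here, because a drifted but still sufficient bootstrap can deploy successfully.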

File structure

cdk/
  src/
    bootstrap/
      index.ts
      policies/
        infrastructure.ts
        application.ts
        observability.ts
        compute-agentcore.ts
        compute-ecs.ts
        index.ts
      preflight/
        aspect.ts
        validator.ts
        resource-action-map.ts
        index.ts
      version.ts
  scripts/
    generate-bootstrap-artifacts.ts
    generate-bootstrap-template.ts
  test/
    bootstrap/
      policies.test.ts
      artifact-sync.test.ts
      golden-baseline.test.ts
      aspect.test.ts
      validator.test.ts
      version.test.ts
      bootstrap-template.test.ts
      resource-action-map.test.ts
  bootstrap/                          # generated outputs (committed)
    bootstrap-template.yaml
    policies/
      infrastructure.json
      application.json
      observability.json
      compute-agentcore.json
      compute-ecs.json
    BOOTSTRAP_VERSION
    BOOTSTRAP_HASH
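
The committed BOOTSTRAP_HASH needs a serialization that does not depend on key order or policy order. This minimal sketch assumes policies are JSON-compatible objects. canonicalJson, bootstrapHash, and renderArtifacts are hypothetical names, and the hash covers whichever policy set is passed in (one configuration). Paths are relative to cdk/, and only node:crypto is used.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys sorted at every level, so reordering keys in the
// TypeScript source does not change the hash.
function canonicalJson(value: Json): string {
  if (Array.isArray(value)) return `[${value.map((item) => canonicalJson(item)).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj: { [key: string]: Json } = value;
    const body = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalJson(obj[key])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

// Hypothetical BOOTSTRAP_HASH: SHA-256 over the policies attached in one configuration,
// visited in name order, so the same policy set always yields the same digest.
function bootstrapHash(attached: Record<string, Json>): string {
  const hash = createHash("sha256");
  for (const policyName of Object.keys(attached).sort()) {
    hash.update(`${policyName}\n${canonicalJson(attached[policyName])}\n`);
  }
  return hash.digest("hex");
}

// Render the committed artifacts in memory. A generator script could write each entry
// with fs.writeFileSync, and an artifact-sync test could diff them against cdk/bootstrap/.
function renderArtifacts(attached: Record<string, Json>, version: string): Map<string, string> {
  const files = new Map<string, string>();
  for (const [policyName, doc] of Object.entries(attached)) {
    files.set(`bootstrap/policies/${policyName}.json`, `${JSON.stringify(doc, null, 2)}\n`);
  }
  files.set("bootstrap/BOOTSTRAP_VERSION", `${version}\n`);
  files.set("bootstrap/BOOTSTRAP_HASH", `${bootstrapHash(attached)}\n`);
  return files;
}
```

Rendering to an in-memory map keeps the generator and the sync test on the same code path. The test can then fail with a precise file name whenever a committed artifact is stale.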

Stacked PR strategy

Each sub-issue is a reviewable PR that targets main directly. Per ADR-001 §8, predecessors merge first, and successors then retarget and rebase.

Estimated review time per PR

| PR | Est. review | Status |
|---|---|---|
| 0. ADR framework + stacked PRs | ~20 min | ✅ Merged (#130) |
| 1. ADR-002 decision record | ~15 min | ✅ Merged (#133) |
| 2. Policies as TypeScript | ~40 min | ✅ Merged (#158) |
| 3. Bootstrap template + compute variants | ~30 min | ✅ Merged (#162) |
| 4. Resource-action-map + ECS gate | ~25 min | 🔄 In progress (PR #165) |
| 5. CDK Aspect | ~30 min | Queued |
| 6. Live preflight validator | ~30 min | Queued |
| 7. CI integration | ~20 min | Queued |
| 8. Documentation | ~15 min | Queued |
| Total | ~3.8 hours | 3/8 complete |

Out of scope

  • SCPs or Organization-level controls (single-account only)
  • Permission boundaries on runtime roles (existing CDK grants handle those)
  • Automated policy generation from CloudTrail
  • Multi-account deployment (hub-and-spoke bootstrap)
  • Modifying CDK Deploy, File Publishing, Image Publishing, or Lookup roles
  • E2E deployment verification (future: deploy → run task → verify → intentional-omit test)

Potential challenges

| Risk | Mitigation |
|---|---|
| Policy size limit (6,144 chars) | Tests assert each rendered policy JSON is under the limit |
| Resource-action-map maintenance | Covers only the ~30 resource types this app uses; unknown types → warning |
| CDK bootstrap template drift | Pin to a known CDK version; document the rebase on upgrade |
| Chicken-and-egg on first deploy | Custom template is a superset of the default; a fresh bootstrap works |
| Mid-deploy failure if preflight is skipped | deploy.yml gates on preflight; task deps enforce it locally |
| Multi-stack accounts with different compute needs | Bootstrap = union of all required policies; sufficiency check per stack |
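On the policy-size risk: IAM caps a managed policy document at 6,144 characters and does not count whitespace. This minimal check assumes policies are plain objects. It measures compact JSON.stringify output, which emits no whitespace outside string values and so serves as a conservative proxy for the size IAM counts. The helper names are hypothetical; policies.test.ts could, for instance, assert that oversized() returns an empty list.

```typescript
// IAM limit for a managed policy document; IAM does not count whitespace.
const MANAGED_POLICY_MAX_CHARS = 6_144;

interface SizeReport {
  policyName: string;
  chars: number;
  headroom: number; // negative means the policy would be rejected
}

// Compact JSON.stringify emits no whitespace outside string values, so its length is a
// conservative stand-in for the size IAM counts.
function policySizes(policies: Record<string, unknown>): SizeReport[] {
  return Object.entries(policies).map(([policyName, doc]) => {
    const chars = JSON.stringify(doc).length;
    return { policyName, chars, headroom: MANAGED_POLICY_MAX_CHARS - chars };
  });
}

function oversized(policies: Record<string, unknown>): string[] {
  return policySizes(policies)
    .filter((report) => report.headroom < 0)
    .map((report) => report.policyName);
}

// Example: 400 synthetic actions overflow the limit; a one-action policy does not.
const manyActions = Array.from({ length: 400 }, (_, i) => `service:Action${i}`);
const big = { Version: "2012-10-17", Statement: [{ Effect: "Allow", Action: manyActions, Resource: "*" }] };
const small = { Version: "2012-10-17", Statement: [{ Effect: "Allow", Action: ["sqs:CreateQueue"], Resource: "*" }] };
const tooLarge = oversized({ big, small }); // ["big"]
```

Reporting headroom, not just pass/fail, shows how close each policy is to the limit before a future release tips it over.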

Dependencies and integrations

Alternative solutions

| Approach | Why not preferred |
|---|---|
| Keep policies as documentation only | No testability, no preflight, no versioning |
| CloudFormation StackSet for policies | Over-engineering for a single account |
| CDK Pipelines self-mutating bootstrap | Circular dependency |
| cfn-guard rules instead of an Aspect | Can't compare against a dynamic policy envelope |
| Action-set only (no semver) | Loses the quick operator answer; using both together is best |
| Single monolithic "Compute" policy | Operators can't restrict to their actual compute; violates least privilege |

Sub-issues (stacked PR order)

References

Labels: RFC-proposal (Request for Comments: design proposal), enhancement (New feature or request)