Support cross aws account iam role authentication #979

Open
munishchouhan wants to merge 25 commits into master from 001-cross-account-iam-role-auth
Conversation

@munishchouhan
Member

@munishchouhan munishchouhan commented Feb 11, 2026

Summary

  • Adds AWS STS AssumeRole support so Wave can authenticate to customer ECR registries using IAM role chaining instead of static credentials
  • Implements a two-hop role chaining model: Wave assumes a "jump" role, then assumes the customer's target role with an external ID for secure cross-account access
  • Temporary credentials are cached with automatic refresh 5 minutes before expiration, with retry logic for transient STS errors
  • Fully backward compatible — existing static credential flows are unchanged

Key Changes

  • AwsEcrService: Major rework to support role-based auth via AssumeRole, credential caching with expiry-aware refresh, and jump role chaining
    (WAVE_AWS_JUMP_ROLE_ARN/WAVE_AWS_JUMP_EXTERNAL_ID)
  • StsRetryPredicate: New retry predicate for transient STS errors (throttling, service unavailable)
  • ContainerRegistryKeys: Updated to propagate session tokens for temporary credentials
  • build.gradle: Promoted software.amazon.awssdk:sts from runtimeOnly to implementation
  • docs/configuration.md: Added documentation for cross-account role chaining configuration
  • Tests: Comprehensive Spock specs for AwsEcrService and ContainerRegistryKeys covering role detection, credential caching, refresh, error handling, and backward compatibility

Test Plan

  • Verify static credential flow still works unchanged
  • Test role-based auth with a configured jump role ARN and customer role ARN
  • Verify credential caching and automatic refresh before expiry
  • Verify retry behavior on transient STS errors
  • Run full test suite: ./gradlew test

Signed-off-by: munishchouhan <hrma017@gmail.com>
@munishchouhan munishchouhan self-assigned this Feb 11, 2026
@munishchouhan munishchouhan marked this pull request as draft February 11, 2026 12:04
@munishchouhan munishchouhan marked this pull request as ready for review February 13, 2026 16:29
@munishchouhan
Member Author

@claude review

@claude

claude bot commented Feb 13, 2026

Claude finished @munishchouhan's task


Code Review for Cross-Account IAM Role Authentication

I've completed a comprehensive review of this PR that adds AWS IAM cross-account role authentication support. This is a solid implementation with excellent security practices and thorough testing.

Summary

This PR successfully implements AWS IAM cross-account role assumption for ECR authentication, allowing Wave to securely access ECR repositories across AWS accounts without sharing long-term credentials. The implementation is backward compatible and follows security best practices.

Code Quality Assessment: ✅ Excellent

Core Implementation Quality

ContainerRegistryKeys.groovy - src/main/groovy/io/seqera/wave/service/ContainerRegistryKeys.groovy:57-60

  • ✅ Clean mapping of assumeRoleArn → userName and externalId → password
  • ✅ Proper priority: IAM role credentials take precedence over access keys
  • ✅ Good security logging with sensitive data redaction
  • ✅ toString() method properly redacts sensitive fields
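
A minimal sketch of the mapping and priority rule described above, assuming a parsed credentials map. The class `RegistryKeysSketch`, the method `fromFields`, and the map keys `accessKey` and `secretKey` are hypothetical names for illustration; only `assumeRoleArn` and `externalId` come from the review, and the real ContainerRegistryKeys may differ.

```java
import java.util.Map;

// Hypothetical stand-in for ContainerRegistryKeys; not the project's actual class.
final class RegistryKeysSketch {
    final String userName;
    final String password;

    RegistryKeysSketch(String userName, String password) {
        this.userName = userName;
        this.password = password;
    }

    // Builds keys from a parsed credentials map.
    // "accessKey" and "secretKey" are assumed key names for the static flow.
    static RegistryKeysSketch fromFields(Map<String, String> fields) {
        String roleArn = fields.get("assumeRoleArn");
        if (roleArn != null && !roleArn.isEmpty()) {
            // The IAM role takes precedence over static keys.
            // externalId is optional, so the password may be null here.
            return new RegistryKeysSketch(roleArn, fields.get("externalId"));
        }
        return new RegistryKeysSketch(fields.get("accessKey"), fields.get("secretKey"));
    }

    // Never prints the secret itself, only a fixed mask.
    @Override
    public String toString() {
        String masked = password == null ? "(null)" : "****";
        return "RegistryKeysSketch[userName=" + userName + ", password=" + masked + "]";
    }
}
```

With both a role ARN and static keys present, the sketch returns the role ARN as `userName` and the external ID as `password`, and `toString()` does not reveal the external ID.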

AwsEcrService.groovy - Excellent implementation with comprehensive features:

  • Pattern matching: Clean detection of role ARN vs access key using AWS_ROLE_ARN regex pattern (line 64)
  • Security: Proper session token handling in all AWS SDK calls (lines 196-213)
  • Error handling: Excellent STS exception mapping with actionable error messages (lines 162-193)
  • Caching strategy: Smart cache key using stable roleArn instead of temporary credentials (lines 288-295)
  • Session naming: Good practice using timestamp in session names (line 139)
  • Duration: Appropriate 1-hour session duration (line 140)

CredentialServiceImpl.groovy - Well-integrated with existing flow:

  • ✅ Maintains existing AWS credential lookup logic
  • ✅ Proper integration with Tower's credential system

Test Coverage Quality: ✅ Outstanding

ContainerRegistryKeysTest.groovy - Comprehensive unit tests (304 lines):

  • ✅ Tests all authentication scenarios (container-reg, AWS access keys, IAM roles)
  • ✅ Tests role prioritization over access keys
  • ✅ Tests with/without external ID
  • ✅ Tests toString() redaction for various password lengths
  • ✅ Tests various registry hostnames (Docker Hub, Quay, ECR, GCR, etc.)
  • ✅ Tests error conditions and edge cases
  • ✅ Tests JSON parsing with extra fields

AwsEcrServiceTest.groovy - Thorough unit tests:

  • ✅ Tests role ARN pattern detection with valid/invalid patterns
  • ✅ Tests cache key generation for static vs session credentials
  • ✅ Tests parameter validation
  • ✅ Tests integration with existing ECR functionality

Security Assessment: ✅ Excellent

  • External ID support: Prevents confused deputy attacks (line 144)
  • Temporary credentials: 1-hour expiration reduces exposure (line 140)
  • Session tokens: Properly included in all API calls
  • Credential redaction: Sensitive data properly masked in logs and toString()
  • Default credentials: Uses Wave's default AWS credentials to assume customer roles
  • Exception mapping: Comprehensive error handling without exposing sensitive details

Dependency Changes: ✅ Appropriate

build.gradle - Line 96:

  • ✅ Changed software.amazon.awssdk:sts from runtimeOnly to implementation
  • ✅ Required for direct STS AssumeRole API usage - this change is correct and necessary

Documentation: ✅ Comprehensive

The specs/ directory contains extensive documentation covering:

  • Service interface contracts
  • STS integration contracts
  • Data model documentation
  • Implementation plans and research

Minor Suggestions for Future Enhancement

  1. Metrics: Consider adding metrics for role assumption success/failure rates
  2. Logging: Could add trace-level logging for cache hit/miss scenarios
  3. Error context: STS error messages could include the role ARN for easier debugging

Backward Compatibility: ✅ Maintained

  • ✅ Existing static AWS credentials continue to work unchanged
  • ✅ Automatic detection via pattern matching requires no API changes
  • ✅ All existing functionality preserved

Final Assessment

This is a high-quality implementation that:

  • ✅ Follows AWS security best practices
  • ✅ Has comprehensive test coverage
  • ✅ Maintains backward compatibility
  • ✅ Includes proper error handling and logging
  • ✅ Uses appropriate caching strategies

The code is production-ready and demonstrates excellent understanding of AWS IAM, STS, and ECR authentication patterns.

Recommendation: ✅ APPROVED - This PR is ready to merge.

@munishchouhan
Member Author

Tested locally with an assumed role and an external ID.

@munishchouhan
Member Author

@claude review

@claude

claude bot commented Feb 23, 2026

Claude finished @munishchouhan's task


Code Review: AWS Cross-Account IAM Role Authentication

I've completed a comprehensive review of this PR that implements AWS cross-account IAM role authentication for ECR. The implementation has evolved since the previous review and continues to demonstrate excellent engineering practices.

Todo List

  • Read core implementation files
  • Analyze security implementation
  • Review test coverage
  • Check for potential issues
  • Provide comprehensive review feedback

Summary

This PR successfully adds AWS IAM cross-account role assumption capability to Wave, enabling secure ECR access across AWS accounts without sharing long-term credentials. The implementation is production-ready with excellent security practices and comprehensive testing.

Key Strengths

Security Implementation: ✅ Excellent

  • External ID support: Prevents confused deputy attacks (AwsEcrService.groovy:156)
  • Temporary credentials: 1-hour session duration minimizes exposure (line 153)
  • Session tokens: Properly included in all AWS SDK calls (lines 209-210, 219-220)
  • Credential redaction: Sensitive data masked in logs and toString() methods (ContainerRegistryKeys.groovy:68)
  • Smart cache keys: Uses stable roleArn instead of temporary credentials for caching (AwsEcrService.groovy:302-308)
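
The cache-key point can be illustrated with a small sketch. The class name, the method signature, and the inclusion of the region in the key are assumptions for illustration, not the layout used in AwsEcrService. The idea is that temporary access keys change on every AssumeRole call, so a key derived from them would never produce a cache hit, while the role ARN stays stable.

```java
// Hypothetical sketch of a stable cache key; names and key layout are assumed.
final class EcrCacheKeySketch {
    // roleArn is null for the static-credential flow.
    // accessKeyId is the static key, or the temporary key issued by STS.
    static String cacheKey(String region, String roleArn, String accessKeyId) {
        // Role-based auth: key on the role ARN, which survives credential refreshes.
        // Static auth: key on the long-lived access key id.
        String identity = roleArn != null ? "role:" + roleArn : "key:" + accessKeyId;
        return region + "|" + identity;
    }
}
```

Two calls with the same role ARN but different temporary key ids map to the same entry, which is what lets the cached login token be reused across refreshes.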

Robust Error Handling: ✅ Outstanding

The STS exception mapping (AwsEcrService.groovy:175-206) provides excellent user guidance:

  • AccessDenied: Clear trust policy guidance
  • InvalidParameterValue: Specific ARN format requirements
  • RegionDisabledException: STS enablement instructions
  • ExpiredTokenException: Clear expiration messaging
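
A sketch of what such a mapping might look like. The four error codes are the ones listed above; the class name, method name, and message wording are hypothetical and do not reproduce the text in mapStsException. Including the role ARN in each message follows the debugging suggestion made in this review.

```java
// Hypothetical sketch: maps an STS error code to an actionable message.
final class StsErrorMessagesSketch {
    static String describe(String errorCode, String roleArn) {
        switch (errorCode) {
            case "AccessDenied":
                return "Access denied assuming " + roleArn
                        + ": check that the role's trust policy allows the caller and that the external ID matches";
            case "InvalidParameterValue":
                return "Invalid parameter for " + roleArn
                        + ": expected an ARN like arn:aws:iam::<12-digit-account>:role/<name>";
            case "RegionDisabledException":
                return "STS is not enabled in the requested region for " + roleArn
                        + ": activate the regional STS endpoint in the account settings";
            case "ExpiredTokenException":
                return "The credentials used to assume " + roleArn + " have expired";
            default:
                // Unknown codes are reported as-is rather than hidden.
                return "Unexpected STS error '" + errorCode + "' assuming " + roleArn;
        }
    }
}
```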

Intelligent Caching Strategy: ✅ Smart Design

  • 5-minute refresh buffer: Proactive credential refresh (line 72)
  • Minimum TTL protection: 1-minute minimum prevents thrashing (line 77)
  • Dynamic TTL calculation: Based on STS credential expiration (lines 343-357)
  • Proper cache keys: Role ARN-based keys for stability vs temporary credential keys
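
As a worked example of the numbers above: credentials that expire in 60 minutes would be cached for 60 - 5 = 55 minutes, while credentials with only 3 minutes left would give 3 - 5 = -2 minutes and be clamped to the 1-minute minimum. The sketch below assumes this subtract-then-clamp order; the class and method names are hypothetical and the actual calculation in AwsEcrService may differ.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the dynamic TTL calculation.
final class CredentialTtlSketch {
    static final Duration REFRESH_BUFFER = Duration.ofMinutes(5); // refresh this long before expiry
    static final Duration MIN_TTL = Duration.ofMinutes(1);        // floor that prevents cache thrashing

    // Time to keep an entry cached, given the expiration reported by STS.
    static Duration cacheTtl(Instant now, Instant expiration) {
        Duration ttl = Duration.between(now, expiration).minus(REFRESH_BUFFER);
        return ttl.compareTo(MIN_TTL) < 0 ? MIN_TTL : ttl;
    }
}
```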

Test Coverage: ✅ Comprehensive

ContainerRegistryKeysTest.groovy (304 lines):

  • ✅ All authentication scenarios (container-reg, AWS keys, IAM roles)
  • ✅ Role prioritization over static keys
  • ✅ External ID handling (with/without)
  • ✅ toString() redaction for various password lengths
  • ✅ Edge cases and error conditions

AwsEcrServiceTest.groovy:

  • ✅ Role ARN pattern detection with comprehensive test cases
  • ✅ Cache key generation for different credential types
  • ✅ TTL calculation scenarios with edge cases
  • ✅ Parameter validation

Recent Improvements

The latest commits show continued refinement:

  • 5-minute refresh buffer (commit e66fb1f): Smart proactive credential refresh
  • Merged with master: Staying current with codebase changes

Implementation Quality

Clean Architecture

  • Pattern-based detection: Role ARN vs access key via regex (AwsEcrService.groovy:67, 121)
  • Backward compatibility: Static credentials continue working unchanged
  • Separation of concerns: Clear method separation for role vs static auth

Production Readiness

  • Dependency update: STS moved from runtimeOnly to implementation (build.gradle:96) - correct and necessary
  • Session naming: Timestamped session names for traceability (AwsEcrService.groovy:152)
  • Comprehensive logging: Appropriate levels with security-conscious redaction

Security Assessment

Cross-Account Security: ✅ Best Practices

  • Uses Wave's default credentials to assume customer roles
  • External ID prevents confused deputy attacks
  • Temporary credentials with limited lifetime
  • Proper session token handling throughout

Credential Management: ✅ Secure

  • No long-term credential storage
  • Sensitive data redaction in all logging
  • Smart prioritization (IAM role > static keys)

Minor Enhancement Opportunities

  1. Metrics: Consider adding metrics for role assumption success/failure rates
  2. Debugging: Role ARN could be included in STS error contexts for easier troubleshooting
  3. Monitoring: Cache hit/miss logging at trace level

Backward Compatibility: ✅ Maintained

  • Existing static AWS credentials work unchanged
  • Automatic detection requires no API changes
  • All existing functionality preserved

Final Assessment

This implementation demonstrates:

  • Security best practices with external ID support and temporary credentials
  • Production-grade error handling with actionable user messages
  • Intelligent caching with dynamic TTL and refresh buffering
  • Comprehensive testing covering all scenarios and edge cases
  • Clean architecture with clear separation and backward compatibility

The code is well-architected, thoroughly tested, and follows AWS security best practices. The recent addition of the 5-minute refresh buffer shows continued attention to operational excellence.

Recommendation: ✅ APPROVED - This PR is ready to merge.


@munishchouhan
Member Author

munishchouhan commented Feb 23, 2026

This PR has been tested locally; please review.

Signed-off-by: munishchouhan <hrma017@gmail.com>
munishchouhan and others added 4 commits February 23, 2026 12:16
Signed-off-by: munishchouhan <hrma017@gmail.com>
@munishchouhan munishchouhan force-pushed the 001-cross-account-iam-role-auth branch from 2b70189 to 43cf07e Compare March 3, 2026 11:48
@munishchouhan
Member Author

@pditommaso Since AWS role and external ID support is now in Platform production, we can move forward with this PR.

@pditommaso
Collaborator

Review

Critical

  1. Replace @Retryable annotation with Wave's Retryable utility in AwsEcrService — Micronaut @Retryable does not work on self-invocations (calls within the same bean bypass the AOP proxy). Use io.seqera.util.retry.Retryable instead (same pattern as ProxyClient). This also allows removing StsRetryPredicate and the wave.aws.sts.retry.* config properties.

  2. Close STS and ECR clients after use: stsClient(), ecrClient(), and ecrPublicClient() create new SDK clients per call but never close them. AWS SDK clients hold HTTP connection pools and thread pools. Wrap with .withCloseable { } in all methods that create them.
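
To illustrate the second point, here is a minimal Java sketch of the scoped-client pattern, the try-with-resources analogue of Groovy's `.withCloseable { }`. `CloseableClient`, `FakeClient`, and `withClient` are hypothetical stand-ins so the example runs without the AWS SDK; a real client would take the place of `FakeClient`.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch: create a client per call and always close it.
final class ClientScopeSketch {
    // Narrowed AutoCloseable whose close() declares no checked exception.
    interface CloseableClient extends AutoCloseable {
        @Override
        void close();
    }

    // Stand-in for an SDK client that owns connection and thread pools.
    static final class FakeClient implements CloseableClient {
        boolean closed = false;

        String fetchToken() {
            if (closed) throw new IllegalStateException("client already closed");
            return "token";
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Runs the action with a fresh client and closes it even if the action throws.
    static <C extends CloseableClient, R> R withClient(Supplier<C> factory, Function<C, R> action) {
        try (C client = factory.get()) {
            return action.apply(client);
        }
    }
}
```

The result is computed while the client is open and the client is closed before `withClient` returns, so pools are released on both the success and the failure path.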

Important

  1. Fix exception re-wrapping in getLoginToken — The catch block uses e.cause ?: e, which discards the user-friendly message from mapStsException. Add catch (AwsEcrAuthException e) { throw e } before the generic catch.

  2. Tighten role ARN regex — The trailing .+ matches any character. Change to [\w+=,.@\/-]+$ to restrict to valid IAM role name characters.

  3. Remove unnecessary as casts — e.g. client.assumeRole(request as AssumeRoleRequest) where request is already the correct type.

Suggestions

  1. Cache jump role credentials to avoid redundant STS calls when jump role is configured.
  2. Include identity info in STS session name for better CloudTrail tracing.
  3. Align log levels — use debug consistently for credential operations.

Signed-off-by: munishchouhan <hrma017@gmail.com>
@munishchouhan
Member Author

@claude review again

@claude

claude bot commented Mar 5, 2026

Claude finished @munishchouhan's task


Code Review: AWS Cross-Account IAM Role Authentication

I've completed a comprehensive review of this PR following up on @pditommaso's critical feedback. Excellent work addressing all the critical issues!

Todo List

  • Read and analyze core implementation files
  • Check if @pditommaso's critical issues have been addressed
  • Review security implementation and resource management
  • Provide comprehensive feedback with specific recommendations
  • Post final review assessment

Critical Issues: ✅ ALL RESOLVED

1. ✅ @Retryable Annotation Fixed

Original Issue: Micronaut @Retryable doesn't work on self-invocations
Resolution: Successfully migrated to Wave's Retryable utility (AwsEcrService.groovy:224-241, 257-272)

  • Uses io.seqera.util.retry.Retryable with proper configuration
  • Removed StsRetryPredicate class entirely
  • No more wave.aws.sts.retry.* config dependencies
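
For context on why an explicit retry works where the annotation did not: annotation-driven retries are applied by a proxy around the bean, so a call from one method to another in the same bean skips them, whereas a retry invoked directly in the method body always runs. The sketch below is a generic helper written for illustration; `RetrySketch` and its parameters are hypothetical and are not the API of io.seqera.util.retry.Retryable.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch of an explicit retry loop with linear backoff.
final class RetrySketch {
    static <T> T retry(int maxAttempts, long baseDelayMillis,
                       Predicate<RuntimeException> isTransient, Supplier<T> action) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                // Give up on the last attempt or on errors that are not transient.
                if (attempt >= maxAttempts || !isTransient.test(e)) throw e;
                pause(baseDelayMillis * attempt);
            }
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            // Restore the interrupt flag and stop retrying.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", ie);
        }
    }
}
```

A caller would pass a predicate that recognises throttling or service-unavailable errors, which is the role the removed StsRetryPredicate played.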

2. ✅ Resource Management Fixed

Original Issue: AWS SDK clients not being closed
Resolution: All clients now properly closed with withCloseable { }:

  • Line 223: assumeRole() - STS client properly closed
  • Line 256: assumeJumpRole() - STS client properly closed
  • Line 342: getLoginToken0() - ECR client properly closed
  • Line 351: getLoginToken1() - ECR public client properly closed

3. ✅ Exception Handling Fixed

Original Issue: Exception re-wrapping discarding user-friendly messages
Resolution: Proper exception handling in getLoginToken() (AwsEcrService.groovy:381-382)

  • AwsEcrAuthException properly caught and re-thrown
  • User-friendly error messages from mapStsException() preserved
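
The catch ordering can be sketched as follows. `AuthException` stands in for AwsEcrAuthException, and `getLoginToken` here is a hypothetical simplification rather than the method in AwsEcrService. The specific catch comes first so an already user-friendly exception passes through untouched; only other failures are wrapped.

```java
import java.util.function.Supplier;

// Hypothetical sketch of catching the domain exception before the generic one.
final class LoginTokenSketch {
    // Stand-in for AwsEcrAuthException.
    static final class AuthException extends RuntimeException {
        AuthException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static String getLoginToken(Supplier<String> fetch) {
        try {
            return fetch.get();
        } catch (AuthException e) {
            // Already carries an actionable message: rethrow unchanged.
            throw e;
        } catch (RuntimeException e) {
            // Anything else is wrapped, using the root cause when available.
            Throwable root = e.getCause() != null ? e.getCause() : e;
            throw new AuthException("Unable to get ECR login token: " + root.getMessage(), root);
        }
    }
}
```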

4. ✅ Role ARN Regex Improved

Original Issue: Loose trailing .+ pattern
Resolution: Tightened to valid IAM characters (AwsEcrService.groovy:70)

  • Pattern: ~/^arn:aws(-cn|-us-gov)?:iam::\d{12}:role\/[\w+=,.@\/-]+$/
  • Properly restricts to valid IAM role name characters
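
The same pattern, transcribed to Java for a quick check of what it accepts and rejects. The class and method names are hypothetical; the regex is the one quoted above, without the Groovy slashy-string escaping of `/`.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: role ARN detection using the tightened pattern.
final class RoleArnSketch {
    static final Pattern AWS_ROLE_ARN =
            Pattern.compile("^arn:aws(-cn|-us-gov)?:iam::\\d{12}:role/[\\w+=,.@/-]+$");

    // True when the value looks like an IAM role ARN rather than an access key.
    static boolean isRoleArn(String value) {
        return value != null && AWS_ROLE_ARN.matcher(value).matches();
    }
}
```

Under this pattern `arn:aws:iam::123456789012:role/WaveEcrAccess` and a role with a path such as `arn:aws-cn:iam::123456789012:role/team/ecr-reader` match, while an access key id, a role name containing a space, or an account id shorter than 12 digits do not.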

5. ✅ Unnecessary Casts Removed

Original Issue: Redundant as casts
Resolution: No unnecessary as casts found in current code


Additional Excellent Improvements

Jump Role Credential Caching

@pditommaso's Suggestion Implemented: AwsJumpRoleCache.groovy (lines 44-84)

  • Dedicated cache for jump role credentials
  • Configurable duration (45m default) and size (100 entries)
  • Avoids redundant STS calls for jump role chaining
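
A rough sketch of a bounded, time-limited cache of this kind, assuming a fixed TTL per entry and least-recently-used eviction. `JumpRoleCacheSketch`, its constructor parameters, and the injected clock are hypothetical; AwsJumpRoleCache itself may be built on a different cache implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch of a size-bounded cache with a fixed TTL per entry.
final class JumpRoleCacheSketch<V> {
    private static final class Slot<V> {
        final V value;
        final long expiresAtMillis;

        Slot(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final long ttlMillis;
    private final LongSupplier clock;
    private final Map<String, Slot<V>> slots;

    JumpRoleCacheSketch(long ttlMillis, int maxSize, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        // Access-ordered map that evicts the least recently used entry when full.
        this.slots = new LinkedHashMap<String, Slot<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Slot<V>> eldest) {
                return size() > maxSize;
            }
        };
    }

    // Returns the cached value, or calls the loader when missing or expired.
    synchronized V get(String key, Function<String, V> loader) {
        long now = clock.getAsLong();
        Slot<V> cached = slots.get(key);
        if (cached != null && cached.expiresAtMillis > now) {
            return cached.value;
        }
        V fresh = loader.apply(key);
        slots.put(key, new Slot<>(fresh, now + ttlMillis));
        return fresh;
    }
}
```

With the defaults mentioned above, the cache would be created with a TTL of 45 minutes and a maximum of 100 entries, and the loader would be the STS call that assumes the jump role.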

Enhanced Session Naming

@pditommaso's Suggestion Implemented: Identity info in session names

  • Lines 230, 263: wave-ecr-${extractAccountId(roleArn)}-${timestamp}
  • Line 263: wave-jump-${extractAccountId(jumpRoleArn)}-${timestamp}
  • Excellent CloudTrail traceability
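
A sketch of how such a session name could be assembled. The body of `extractAccountId` is an assumption (the review names the helper but does not show it), and `sessionName` and the truncation step are hypothetical. STS limits RoleSessionName to 64 characters drawn from word characters and `+=,.@-`, so the sketch truncates defensively.

```java
// Hypothetical sketch of building an STS session name from a role ARN.
final class SessionNameSketch {
    // An IAM role ARN has the form arn:<partition>:iam::<account-id>:role/<name>,
    // so splitting on ':' puts the account id at index 4.
    static String extractAccountId(String roleArn) {
        String[] parts = roleArn.split(":");
        if (parts.length < 6 || !parts[4].matches("\\d{12}")) {
            throw new IllegalArgumentException("Not an IAM role ARN: " + roleArn);
        }
        return parts[4];
    }

    // Builds <prefix>-<account-id>-<timestamp>, capped at the 64-character STS limit.
    static String sessionName(String prefix, String roleArn, long epochMillis) {
        String name = prefix + "-" + extractAccountId(roleArn) + "-" + epochMillis;
        return name.length() <= 64 ? name : name.substring(0, 64);
    }
}
```

For example, the prefix `wave-ecr`, the role `arn:aws:iam::123456789012:role/demo`, and the timestamp 1772700000000 yield `wave-ecr-123456789012-1772700000000`, which lets a CloudTrail entry be traced back to the target account.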

Consistent Logging

@pditommaso's Suggestion Implemented: Debug level consistently used

  • All credential operations use log.debug
  • Proper sensitive data redaction throughout

Security Assessment: 🔒 EXCELLENT

Cross-Account Security Best Practices

  • External ID support: Prevents confused deputy attacks (lines 234-237, 266-269)
  • Temporary credentials: 1-hour sessions minimize exposure window
  • Session tokens: Properly included in all AWS SDK calls
  • Jump role chaining: Secure two-hop authentication model
  • Credential redaction: Sensitive data masked in logs and toString()

Resource Management

  • Proper cleanup: All AWS clients wrapped with withCloseable
  • Connection pooling: SDK clients properly managed
  • Cache efficiency: Smart TTL calculation with 5-minute refresh buffer

Error Handling Excellence

The STS exception mapping (AwsEcrService.groovy:287-318) provides outstanding user guidance:

  • AccessDenied: Clear trust policy instructions
  • InvalidParameterValue: Specific ARN format requirements
  • RegionDisabledException: STS enablement guidance
  • ExpiredTokenException: Clear expiration messaging

Implementation Quality: 🏆 OUTSTANDING

Architecture

  • Clean separation: Role vs static credential flows
  • Backward compatibility: Existing static credentials unchanged
  • Pattern detection: Smart ARN vs access key detection

Caching Strategy

  • Intelligent keys: Role ARN-based (stable) vs credential-based (temporary)
  • Dynamic TTL: Based on STS credential expiration with safety buffers
  • Proactive refresh: 5-minute buffer prevents expiration race conditions

Test Coverage

  • Comprehensive: 794 lines of test coverage in AwsEcrServiceTest.groovy
  • Edge cases: Role ARN patterns, cache scenarios, error conditions
  • Security scenarios: External ID handling, credential validation

Dependency Management: ✅ CORRECT

build.gradle Line 96: software.amazon.awssdk:sts moved from runtimeOnly to implementation

  • Required change: Direct STS AssumeRole API usage necessitates implementation scope
  • Minimal impact: Only adds necessary dependency for new functionality

Final Assessment

This implementation demonstrates exceptional engineering quality:

Production Readiness

  • All critical issues from @pditommaso's review addressed
  • Comprehensive error handling with actionable user guidance
  • Proper resource management preventing memory/connection leaks
  • Intelligent caching with dynamic TTL management

Security Excellence

  • Follows AWS security best practices with external ID support
  • Implements secure jump role chaining for cross-account access
  • Proper credential lifecycle management with automatic refresh

Code Quality

  • Clean architecture with clear separation of concerns
  • Excellent test coverage including edge cases and error scenarios
  • Consistent logging and debugging capabilities

Recommendation: ✅ APPROVED FOR MERGE

This PR successfully implements AWS cross-account IAM role authentication with production-grade security, reliability, and maintainability. All critical feedback has been addressed, and the implementation follows industry best practices.


Signed-off-by: munishchouhan <hrma017@gmail.com>

3 participants