feat(vault-secrets): add @adobe/spacecat-shared-vault-secrets package#1364
feat(vault-secrets): add @adobe/spacecat-shared-vault-secrets package#1364solaris007 merged 18 commits intomainfrom
Conversation
Adds loadBootstrapConfig() that reads Vault AppRole credentials from AWS Secrets Manager using aws4-signed requests. Includes tests for happy path, missing secret, network error, and missing AWS credentials with 100% coverage.
Implement vaultSecrets middleware wrapper and loadSecrets function with:
- Bootstrap from AWS Secrets Manager, AppRole auth, secret loading
- Two-tier caching: time-based expiration + metadata-based change detection
- Convention-based path resolution ({env}/{func.name}) with custom name support
- Merges secrets into context.env and process.env
- Returns 502 Response on error with x-error header
- Skips Vault on simulate runtime or missing ctx.func
- Token renewal when expiring soon, re-auth when expired
- Remove public token getter from VaultClient (security)
- Cache bootstrap config for re-auth instead of re-fetching (efficiency)
- Add ensureClient promise lock to prevent concurrent bootstrap race
- Remove redundant metadata call on initial load, establish baseline
on first metadata check instead
- Fix x-error header to match helix-shared-secrets ('error fetching secrets.')
- Stop leaking AWS response body in bootstrap error messages
- Fix README option name (bootstrapPath) and name function signature
|
This PR will trigger a minor release when merged. |
- ensureClient lock now checks client state after awaiting, so concurrent waiters get a clear error when bootstrap fails instead of proceeding with a null client - Reuse lastChanged value from metadata check instead of calling getLastChangedDate a second time after re-fetch (eliminates one HTTP call per secret rotation and fixes TOCTOU issue)
…apper # Conflicts: # package-lock.json
Comprehensive README covering VPC requirements, IAM permissions, bootstrap secret format, secret-id rotation, and error reference. CLAUDE.md with dev instructions and e2e validation procedure.
Design for SITES-40735 - per-service credential isolation with auto-resolved bootstrap paths, 5-phase migration plan, and validation gates between each phase for self-correction.
SM secret creation, three IAM policy updates, data-service ARN variable change, and expanded validation gate checks for the infra repo.
coralogix-feeder is excluded from the SM-to-Vault migration. Updates service count from 13 to 12, removes from all validation gate bash loops, and adjusts rollout order accordingly.
Replace hardcoded DEFAULT_BOOTSTRAP_PATH with dynamic resolution:
- No bootstrapPath option: resolves to /mysticat/bootstrap/{ctx.func.name}
- Explicit bootstrapPath option: used as-is (override)
This enables zero-config per-service bootstrap secrets. Each service
automatically reads from its own SM bootstrap secret based on its
function name (e.g. api-service -> /mysticat/bootstrap/api-service).
SITES-40735
…strap
- Update bootstrap path convention from /mysticat/vault-bootstrap
to /mysticat/bootstrap/{service-name} (auto-resolved)
- Replace data-service-only AppRole table with per-service convention
- Update IAM section to reflect policies already include /mysticat/bootstrap/*
- Update production integration blocklist with completed items
- Update test count from 51 to 54
ekremney
left a comment
There was a problem hiding this comment.
PR #1364 Review
@solaris007 — Nice work. Clean architecture, solid test coverage, and the migration plan is thorough. One item I'd like addressed before merge, plus a few smaller things.
Findings
1. No recovery when cached bootstrap credentials become stale — Medium-High
File: vault-secrets-wrapper.js, ensureClient() re-auth branch
When a Vault token expires, ensureClient re-authenticates using the cached bootstrapConfig. If secret_id was rotated while the Lambda container is warm, every subsequent request fails with 502 until the container is recycled — there is no fallback to re-fetch from SM.
// Current: always uses cached config, no recovery
} else if (!vaultClient.isAuthenticated()) {
await vaultClient.authenticate(bootstrapConfig.role_id, bootstrapConfig.secret_id);
}Suggested fix: catch auth failure and retry with fresh bootstrap:
} else if (!vaultClient.isAuthenticated()) {
try {
await vaultClient.authenticate(bootstrapConfig.role_id, bootstrapConfig.secret_id);
} catch {
// secret_id may have been rotated — re-fetch from SM
const bootstrapPath = resolveBootstrapPath(ctx, opts);
bootstrapConfig = await loadBootstrapConfig({ bootstrapPath });
await vaultClient.authenticate(bootstrapConfig.role_id, bootstrapConfig.secret_id);
}
}This adds one extra SM call only on auth failure (not on the happy path), and makes secret-id rotation seamless without requiring container recycle. The migration plan calls out secret-id rotation as a tracked concern (SITES-40736) — this code path is exactly where it would bite. Recommend fixing before merge.
2. Singleton state assumes one service identity per runtime — Medium
The module-level cache, vaultClient, bootstrapConfig, and bootstrapEnvironment are all singletons. secretPath is resolved per call, but cache reads/writes are not keyed by it. If name resolves to different paths across invocations in the same runtime, one path receives another's cached secrets.
For the intended deployment model (one Lambda = one service = one secret path) this is fine and matches helix-shared-secrets. But the public API (name as function, bootstrapPath as option) implies broader flexibility than the singleton state supports.
Recommendation: No code change needed — just add a contract note to the README and the TypeScript declarations:
This module assumes one service identity per process lifetime. All invocations within a runtime share a single Vault client, bootstrap config, and secret cache.
3. Stale TypeScript declaration comment — Low
File: index.d.ts:14
Comment says Default: '/mysticat/vault-bootstrap' but the code now auto-resolves to /mysticat/bootstrap/${ctx.func.name}. Should be:
/** AWS Secrets Manager path for Vault bootstrap config. Default: auto-resolved to '/mysticat/bootstrap/{ctx.func.name}' */
bootstrapPath?: string;4. cache.lastChanged not reset after hard-expiration re-fetch — Low
File: vault-secrets-wrapper.js, re-fetch block
When cache hard-expires (1h), secrets are re-fetched but cache.lastChanged retains its old value. The next metadata check compares against stale lastChanged and may trigger one redundant re-fetch. Harmless (at most 1 extra call per hour), but easy to clean up:
if (!cache.data || isExpired || metadataChanged) {
cache.data = await vaultClient.readSecret(secretPath);
cache.loaded = now;
cache.checked = now;
if (metadataChanged) {
cache.lastChanged = newLastChanged;
} else if (isExpired) {
cache.lastChanged = 0;
}
}Nits
ensureClient(ctx, opts, log)—logis alwaysctx.log; could simplify to(ctx, opts).- The concurrency lock (
clientLock = new Promise(...)) works correctly but a one-line comment noting it's a one-shot lock (cleared infinally) would help future readers.
What's done well
- Zero-config design — bootstrap path and Vault path both auto-resolve from
ctx.func.name - Lightweight — aws4 signing without the full AWS SDK for a single SM call
- Two-tier caching — 0 HTTP calls for warm invocations within 60s, metadata-based invalidation within the 1h window
- Concurrency lock — prevents duplicate bootstrap/auth during cold-start fan-out
- Drop-in contract — 502 +
x-error: error fetching secrets.matches helix-shared-secrets exactly - Test coverage — 54 tests covering concurrency, token lifecycle, metadata change detection, custom name fallback, bootstrap failure propagation
- Migration plan — 5-phase rollout with validation gates, rollback at any phase, old SM secrets preserved until Phase 5
Verdict
| Finding | Severity | Action |
|---|---|---|
| Bootstrap credential staleness | Medium-High | Fix before merge |
| Singleton contract documentation | Medium | Document |
| Stale d.ts comment | Low | Fix |
| cache.lastChanged after expiration | Low | Nice-to-have |
Package is well-implemented and ready to go once #1 is addressed.
- Add bootstrap re-fetch on auth failure for seamless secret_id rotation - Reset cache.lastChanged after hard expiration to avoid redundant re-fetch - Fix stale d.ts comment for bootstrapPath default - Add singleton contract note to README and d.ts - Simplify ensureClient signature (remove redundant log param) - Add concurrency lock comment for future readers - Add 2 tests for bootstrap re-fetch behavior (56 total, 100% coverage)
|
Thanks for the utterly thorough review, @ekremney! Really appreciate you catching the bootstrap staleness issue - that's exactly the kind of thing that would bite us at 2am on a Friday when someone rotates a secret_id. All four findings addressed in 1. Bootstrap credential staleness (Medium-High) - Done. 2. Singleton contract documentation (Medium) - Added note to both the README ("How It Works" section) and the TypeScript declarations above 3. Stale d.ts comment (Low) - Fixed, now says 4. cache.lastChanged after hard expiration (Low) - Reset to 0 on hard expiration so the next metadata check establishes a fresh baseline. Also took the nits: dropped the redundant 56 tests, 100% coverage, lint clean. |
ekremney
left a comment
There was a problem hiding this comment.
Remarkably thoroughly addressed, @solaris007. All four findings resolved, nits included, coverage up. Ship it.
|
🎉 This PR is included in version @adobe/spacecat-shared-vault-secrets-v1.0.0 🎉 The release is available on: Your semantic-release bot 📦🚀 |
Why
SpaceCat Lambda functions currently use
@adobe/helix-shared-secretsto load secrets from AWS Secrets Manager. This couples secret management to AWS administrative access - CAMP's 3 immutable roles (admin, power user, read only) don't allow creating a restricted Secrets Manager-only role, so anyone with AWS admin access can read all secrets. Following the S3 credentials leak incident (MSI0001036), moving secrets to HashiCorp Vault was identified as remediation track #6: Vault provides a separate auth boundary with granular ACLs, audit trail, and short-lived credentials, fully de-coupled from AWS access.This package enables the migration - services switch from
helixSecretstovaultSecretsin their middleware chain with no other code changes.What
Adds
@adobe/spacecat-shared-vault-secrets- a drop-in replacement for@adobe/helix-shared-secretsthat loads secrets from Vault instead of AWS Secrets Manager. Same.with()middleware interface, same caching strategy, different backend./mysticat/bootstrap/{service-name}(auto-resolved fromctx.func.name) to get per-service Vault AppRole credentialsdx_mysticat/{environment}/{service-name}(KV V2)context.envandprocess.envPer-service credential isolation: Each service gets its own AppRole, its own policy, and its own bootstrap secret. A compromised service credential can only read that service's secrets.
HTTP call budget: Cold start = 3 calls (AWS SM + AppRole login + secret read). Warm invocation within 60s = 0 calls. Past 60s = 1 call (metadata). Secret rotation = 2 calls (metadata + read).
Files
src/vault-client.jssrc/bootstrap.jssrc/vault-secrets-wrapper.jssrc/index.js/src/index.d.tsUsage
Prerequisites
Each service in each AWS account (dev/stage/prod) needs:
Per-service AppRole in Vault - provisioned via
cst-vault/vault_policiesPRs (done)Per-service bootstrap secret in AWS Secrets Manager at
/mysticat/bootstrap/{service-name}:{ "role_id": "<service-specific-approle-role-id>", "secret_id": "<service-specific-secret-id>", "vault_addr": "https://vault-amer.adobe.net", "mount_point": "dx_mysticat", "environment": "dev" }VPC configuration - Lambda must run in the SpaceCat VPC (private subnets with NAT gateway). Vault rejects AppRole login from non-Adobe IPs with HTTP 403.
IAM permissions - Lambda role needs
secretsmanager:GetSecretValueon/mysticat/bootstrap/*. Three IAM policies updated in spacecat-infrastructure PR fix: add try-catch block to authentication manager #336.See the package README for full setup details, secret-id rotation procedure, and error reference.
Test plan