Skip to content

feat(vault-secrets): add @adobe/spacecat-shared-vault-secrets package#1364

Merged
solaris007 merged 18 commits intomainfrom
feat/vault-secrets-wrapper
Feb 19, 2026
Merged

feat(vault-secrets): add @adobe/spacecat-shared-vault-secrets package#1364
solaris007 merged 18 commits intomainfrom
feat/vault-secrets-wrapper

Conversation

@solaris007
Copy link
Copy Markdown
Member

@solaris007 solaris007 commented Feb 18, 2026

Why

SpaceCat Lambda functions currently use @adobe/helix-shared-secrets to load secrets from AWS Secrets Manager. This couples secret management to AWS administrative access - CAMP's 3 immutable roles (admin, power user, read only) don't allow creating a restricted Secrets Manager-only role, so anyone with AWS admin access can read all secrets. Following the S3 credentials leak incident (MSI0001036), moving secrets to HashiCorp Vault was identified as remediation track #6: Vault provides a separate auth boundary with granular ACLs, audit trail, and short-lived credentials, fully de-coupled from AWS access.

This package enables the migration - services switch from helixSecrets to vaultSecrets in their middleware chain with no other code changes.

What

Adds @adobe/spacecat-shared-vault-secrets - a drop-in replacement for @adobe/helix-shared-secrets that loads secrets from Vault instead of AWS Secrets Manager. Same .with() middleware interface, same caching strategy, different backend.

  1. On cold start, reads bootstrap config from AWS Secrets Manager at /mysticat/bootstrap/{service-name} (auto-resolved from ctx.func.name) to get per-service Vault AppRole credentials
  2. Authenticates to Vault via AppRole (role_id + secret_id)
  3. Reads secrets from dx_mysticat/{environment}/{service-name} (KV V2)
  4. Merges into context.env and process.env
  5. Two-tier caching: 60s metadata check, 1h hard expiration

Per-service credential isolation: Each service gets its own AppRole, its own policy, and its own bootstrap secret. A compromised service credential can only read that service's secrets.

HTTP call budget: Cold start = 3 calls (AWS SM + AppRole login + secret read). Warm invocation within 60s = 0 calls. Past 60s = 1 call (metadata). Secret rotation = 2 calls (metadata + read).

Files

File Purpose
src/vault-client.js Vault HTTP client with private class fields (AppRole login, KV V2 read, metadata, token renewal)
src/bootstrap.js AWS Secrets Manager bootstrap loader (aws4-signed, no SDK dependency)
src/vault-secrets-wrapper.js Middleware wrapper (two-tier caching, path resolution, concurrency lock, env merge)
src/index.js / src/index.d.ts Exports and TypeScript declarations

Usage

import wrap from '@adobe/helix-shared-wrap';
import vaultSecrets from '@adobe/spacecat-shared-vault-secrets';

// Zero-config: bootstrapPath and Vault path both auto-resolve from ctx.func.name
export const main = wrap(run)
  .with(vaultSecrets)
  .with(helixStatus);

Prerequisites

Each service in each AWS account (dev/stage/prod) needs:

  1. Per-service AppRole in Vault - provisioned via cst-vault/vault_policies PRs (done)

  2. Per-service bootstrap secret in AWS Secrets Manager at /mysticat/bootstrap/{service-name}:

{
  "role_id": "<service-specific-approle-role-id>",
  "secret_id": "<service-specific-secret-id>",
  "vault_addr": "https://vault-amer.adobe.net",
  "mount_point": "dx_mysticat",
  "environment": "dev"
}
  1. VPC configuration - Lambda must run in the SpaceCat VPC (private subnets with NAT gateway). Vault rejects AppRole login from non-Adobe IPs with HTTP 403.

  2. IAM permissions - Lambda role needs secretsmanager:GetSecretValue on /mysticat/bootstrap/*. Three IAM policies updated in spacecat-infrastructure PR fix: add try-catch block to authentication manager #336.

See the package README for full setup details, secret-id rotation procedure, and error reference.

Test plan

  • 54 unit tests passing (including 3 new bootstrapPath auto-resolution tests)
  • 100% statement/function/line coverage
  • Two rounds of crew review (security, architecture, code quality) - all must-fix items addressed
  • Lint clean
  • E2E validated on real Lambda (SM bootstrap -> AppRole auth -> Vault KV read)
  • CI passes

Adds loadBootstrapConfig() that reads Vault AppRole credentials
from AWS Secrets Manager using aws4-signed requests. Includes
tests for happy path, missing secret, network error, and missing
AWS credentials with 100% coverage.
Implement vaultSecrets middleware wrapper and loadSecrets function with:
- Bootstrap from AWS Secrets Manager, AppRole auth, secret loading
- Two-tier caching: time-based expiration + metadata-based change detection
- Convention-based path resolution ({env}/{func.name}) with custom name support
- Merges secrets into context.env and process.env
- Returns 502 Response on error with x-error header
- Skips Vault on simulate runtime or missing ctx.func
- Token renewal when expiring soon, re-auth when expired
@solaris007 solaris007 self-assigned this Feb 18, 2026
@solaris007 solaris007 added the enhancement New feature or request label Feb 18, 2026
- Remove public token getter from VaultClient (security)
- Cache bootstrap config for re-auth instead of re-fetching (efficiency)
- Add ensureClient promise lock to prevent concurrent bootstrap race
- Remove redundant metadata call on initial load, establish baseline
  on first metadata check instead
- Fix x-error header to match helix-shared-secrets ('error fetching secrets.')
- Stop leaking AWS response body in bootstrap error messages
- Fix README option name (bootstrapPath) and name function signature
@github-actions
Copy link
Copy Markdown

This PR will trigger a minor release when merged.

- ensureClient lock now checks client state after awaiting, so
  concurrent waiters get a clear error when bootstrap fails instead
  of proceeding with a null client
- Reuse lastChanged value from metadata check instead of calling
  getLastChangedDate a second time after re-fetch (eliminates one
  HTTP call per secret rotation and fixes TOCTOU issue)
Comprehensive README covering VPC requirements, IAM permissions,
bootstrap secret format, secret-id rotation, and error reference.
CLAUDE.md with dev instructions and e2e validation procedure.
Design for SITES-40735 - per-service credential isolation with
auto-resolved bootstrap paths, 5-phase migration plan, and
validation gates between each phase for self-correction.
SM secret creation, three IAM policy updates, data-service ARN
variable change, and expanded validation gate checks for the
infra repo.
coralogix-feeder is excluded from the SM-to-Vault migration. Updates
service count from 13 to 12, removes from all validation gate bash
loops, and adjusts rollout order accordingly.
Replace hardcoded DEFAULT_BOOTSTRAP_PATH with dynamic resolution:
- No bootstrapPath option: resolves to /mysticat/bootstrap/{ctx.func.name}
- Explicit bootstrapPath option: used as-is (override)

This enables zero-config per-service bootstrap secrets. Each service
automatically reads from its own SM bootstrap secret based on its
function name (e.g. api-service -> /mysticat/bootstrap/api-service).

SITES-40735
…strap

- Update bootstrap path convention from /mysticat/vault-bootstrap
  to /mysticat/bootstrap/{service-name} (auto-resolved)
- Replace data-service-only AppRole table with per-service convention
- Update IAM section to reflect policies already include /mysticat/bootstrap/*
- Update production integration blocklist with completed items
- Update test count from 51 to 54
Copy link
Copy Markdown
Member

@ekremney ekremney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #1364 Review

@solaris007 — Nice work. Clean architecture, solid test coverage, and the migration plan is thorough. One item I'd like addressed before merge, plus a few smaller things.


Findings

1. No recovery when cached bootstrap credentials become stale — Medium-High

File: vault-secrets-wrapper.js, ensureClient() re-auth branch

When a Vault token expires, ensureClient re-authenticates using the cached bootstrapConfig. If secret_id was rotated while the Lambda container is warm, every subsequent request fails with 502 until the container is recycled — there is no fallback to re-fetch from SM.

// Current: always uses cached config, no recovery
} else if (!vaultClient.isAuthenticated()) {
  await vaultClient.authenticate(bootstrapConfig.role_id, bootstrapConfig.secret_id);
}

Suggested fix: catch auth failure and retry with fresh bootstrap:

} else if (!vaultClient.isAuthenticated()) {
  try {
    await vaultClient.authenticate(bootstrapConfig.role_id, bootstrapConfig.secret_id);
  } catch {
    // secret_id may have been rotated — re-fetch from SM
    const bootstrapPath = resolveBootstrapPath(ctx, opts);
    bootstrapConfig = await loadBootstrapConfig({ bootstrapPath });
    await vaultClient.authenticate(bootstrapConfig.role_id, bootstrapConfig.secret_id);
  }
}

This adds one extra SM call only on auth failure (not on the happy path), and makes secret-id rotation seamless without requiring container recycle. The migration plan calls out secret-id rotation as a tracked concern (SITES-40736) — this code path is exactly where it would bite. Recommend fixing before merge.


2. Singleton state assumes one service identity per runtime — Medium

The module-level cache, vaultClient, bootstrapConfig, and bootstrapEnvironment are all singletons. secretPath is resolved per call, but cache reads/writes are not keyed by it. If name resolves to different paths across invocations in the same runtime, one path receives another's cached secrets.

For the intended deployment model (one Lambda = one service = one secret path) this is fine and matches helix-shared-secrets. But the public API (name as function, bootstrapPath as option) implies broader flexibility than the singleton state supports.

Recommendation: No code change needed — just add a contract note to the README and the TypeScript declarations:

This module assumes one service identity per process lifetime. All invocations within a runtime share a single Vault client, bootstrap config, and secret cache.


3. Stale TypeScript declaration comment — Low

File: index.d.ts:14

Comment says Default: '/mysticat/vault-bootstrap' but the code now auto-resolves to /mysticat/bootstrap/${ctx.func.name}. Should be:

/** AWS Secrets Manager path for Vault bootstrap config. Default: auto-resolved to '/mysticat/bootstrap/{ctx.func.name}' */
bootstrapPath?: string;

4. cache.lastChanged not reset after hard-expiration re-fetch — Low

File: vault-secrets-wrapper.js, re-fetch block

When cache hard-expires (1h), secrets are re-fetched but cache.lastChanged retains its old value. The next metadata check compares against stale lastChanged and may trigger one redundant re-fetch. Harmless (at most 1 extra call per hour), but easy to clean up:

if (!cache.data || isExpired || metadataChanged) {
  cache.data = await vaultClient.readSecret(secretPath);
  cache.loaded = now;
  cache.checked = now;
  if (metadataChanged) {
    cache.lastChanged = newLastChanged;
  } else if (isExpired) {
    cache.lastChanged = 0;
  }
}

Nits

  • ensureClient(ctx, opts, log)log is always ctx.log; could simplify to (ctx, opts).
  • The concurrency lock (clientLock = new Promise(...)) works correctly but a one-line comment noting it's a one-shot lock (cleared in finally) would help future readers.

What's done well

  • Zero-config design — bootstrap path and Vault path both auto-resolve from ctx.func.name
  • Lightweight — aws4 signing without the full AWS SDK for a single SM call
  • Two-tier caching — 0 HTTP calls for warm invocations within 60s, metadata-based invalidation within the 1h window
  • Concurrency lock — prevents duplicate bootstrap/auth during cold-start fan-out
  • Drop-in contract — 502 + x-error: error fetching secrets. matches helix-shared-secrets exactly
  • Test coverage — 54 tests covering concurrency, token lifecycle, metadata change detection, custom name fallback, bootstrap failure propagation
  • Migration plan — 5-phase rollout with validation gates, rollback at any phase, old SM secrets preserved until Phase 5

Verdict

Finding Severity Action
Bootstrap credential staleness Medium-High Fix before merge
Singleton contract documentation Medium Document
Stale d.ts comment Low Fix
cache.lastChanged after expiration Low Nice-to-have

Package is well-implemented and ready to go once #1 is addressed.

- Add bootstrap re-fetch on auth failure for seamless secret_id rotation
- Reset cache.lastChanged after hard expiration to avoid redundant re-fetch
- Fix stale d.ts comment for bootstrapPath default
- Add singleton contract note to README and d.ts
- Simplify ensureClient signature (remove redundant log param)
- Add concurrency lock comment for future readers
- Add 2 tests for bootstrap re-fetch behavior (56 total, 100% coverage)
@solaris007
Copy link
Copy Markdown
Member Author

solaris007 commented Feb 19, 2026

Thanks for the utterly thorough review, @ekremney! Really appreciate you catching the bootstrap staleness issue - that's exactly the kind of thing that would bite us at 2am on a Friday when someone rotates a secret_id.

All four findings addressed in 86374a41:

1. Bootstrap credential staleness (Medium-High) - Done. ensureClient now catches auth failure and re-fetches bootstrap config from SM before retrying. Added two tests: one for successful recovery after secret_id rotation, one for propagating the error when both re-auth and SM re-fetch fail.

2. Singleton contract documentation (Medium) - Added note to both the README ("How It Works" section) and the TypeScript declarations above VaultSecretsOptions.

3. Stale d.ts comment (Low) - Fixed, now says auto-resolved to '/mysticat/bootstrap/{ctx.func.name}'.

4. cache.lastChanged after hard expiration (Low) - Reset to 0 on hard expiration so the next metadata check establishes a fresh baseline.

Also took the nits: dropped the redundant log param from ensureClient (just uses ctx.log directly) and added a comment explaining the one-shot concurrency lock.

56 tests, 100% coverage, lint clean.

@solaris007 solaris007 requested a review from ekremney February 19, 2026 16:29
Copy link
Copy Markdown
Member

@ekremney ekremney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remarkably thoroughly addressed, @solaris007. All four findings resolved, nits included, coverage up. Ship it.

@solaris007 solaris007 merged commit 7eae4e2 into main Feb 19, 2026
7 checks passed
@solaris007 solaris007 deleted the feat/vault-secrets-wrapper branch February 19, 2026 19:06
solaris007 pushed a commit that referenced this pull request Feb 19, 2026
## @adobe/spacecat-shared-vault-secrets-v1.0.0 (2026-02-19)

### Features

* **vault-secrets:** add @adobe/spacecat-shared-vault-secrets package ([#1364](#1364)) ([7eae4e2](7eae4e2))
@solaris007
Copy link
Copy Markdown
Member Author

🎉 This PR is included in version @adobe/spacecat-shared-vault-secrets-v1.0.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request released

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants