fix(utils): include subpath in resolveCustomerSecretsName to prevent credential collisions#1577
fix(utils): include subpath in resolveCustomerSecretsName to prevent credential collisions#1577alinarublea wants to merge 5 commits intomainfrom
Conversation
…credential collisions Subpath sites on the same domain (e.g. nba.com/kings, nba.com/lakers) were sharing a single Secrets Manager key because only the hostname was used. Now the full path is included so each subpath site gets its own secret key. Backward compatible: root-domain sites produce the same key as before. Fixes LLMO-4186 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
alinarublea
left a comment
There was a problem hiding this comment.
Code review
The fix is correct and backward compatible. Two small observations:
url.host vs url.hostname
The implementation uses url.host (which includes the port, e.g. nba.com:8080) rather than url.hostname. This matches the pre-existing behavior so it is not a regression, but generateDataFolder in spacecat-api-service (the companion fix for LLMO-4186) uses url.hostname. Worth aligning the two for consistency, or at minimum documenting the difference.
Missing test case for nested paths
https://nba.com/us/kings → nba_com_us_kings is the expected behavior (correct), but there is no test covering nested subpaths. A single extra assertion in the new test would document this and guard against future regressions.
Neither is a blocker — the core fix is solid.
- Switch from url.host to url.hostname for consistency with generateDataFolder - Add test for nested subpaths (e.g. /us/kings) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
This PR will trigger a patch release when merged. |
dzehnder
left a comment
There was a problem hiding this comment.
Hey @alinarublea,
Thanks for the focused fix and the upfront self-review. The core change is correct and the tests cover the headline scenarios. The cluster of concerns below is mostly about what happens to credentials that already exist under the old key shape, plus a few normalization edges that the new path-based key does not fully close.
Strengths
- The fix correctly closes a real cross-tenant credential collision: distinct subpath customers on the same hostname were sharing one Secrets Manager key (
packages/spacecat-shared-utils/src/helpers.js:54). - Backward compatibility for root-domain sites is preserved by the
pathSuffix ? ... : hostternary; thenew URL('https://nba.com/').pathname === '/'case is correctly handled by the^\/|\/$strip. - Switching from
url.hosttourl.hostnamewas the right correctness move - a port should not be part of an identity key. - The new tests in
packages/spacecat-shared-utils/test/helpers.test.js:93-117cover the primary collision scenario, root trailing-slash idempotence, and nested subpaths. The pre-existing'resolves the customer secrets name correctly with valid inputs'test (line 65) implicitly anchors the "no path - same as before" claim. - Both points raised in the self-review (host vs hostname, nested-path test) were addressed before review was requested. Clean change discipline.
Issues
Important (Should Fix)
-
Migration impact for existing subpath customers is unaddressed (
packages/spacecat-shared-utils/src/helpers.js:54)The PR description says "Backward compatible," but this is only true for sites with no path. Any customer whose
baseURLalready carries a path had its credentials written undercustomer-secrets/<host>/<ver>. After this PR ships, the samebaseURLresolves tocustomer-secrets/<host>_<path>/<ver>, a key that does not exist in Secrets Manager. The downstream consumers (spacecat-shared-google-client,spacecat-shared-ims-client,spacecat-shared-content-client, plusspacecat-auth-serviceGoogle/SharePoint handlers) will start failing those lookups.content-clientswallows the lookup error atlog.debugand silently degrades; IMS auth raises and breaks page authentication.Two paths forward, pick one:
- Confirm in the PR description (and ideally in LLMO-4186) that no production Site has a non-empty path in
baseURLtoday, so the "backward compatible" claim is in practice "backward compatible for all current sites." - If subpath sites do exist, coordinate a one-time secret rename in Secrets Manager before deploy, or add a transitional dual-read in callers (try new key, fall back to legacy hostname-only key, log a warning) for one release cycle.
- Confirm in the PR description (and ideally in LLMO-4186) that no production Site has a non-empty path in
-
Path-segment separator collision narrows but does not close the original bug class (
packages/spacecat-shared-utils/src/helpers.js:53)Both
/and any non-alphanumeric character collapse to_. As a result:nba.com/us/kingsandnba.com/us-kingsboth resolve tonba_com_us_kings.nba.com/us/kingsandnba.com/us_kingsalso collide.nba.com/foo barandnba.com/foo%20barcollide on_-runs.
Two legitimately distinct customers landing on these forms reproduces the exact bug class this PR set out to fix. Not currently exploitable as cross-customer auth bypass because
baseURLis server-stored fromsite.getBaseURL(), but it leaves the door open for the next onboarding incident. Either pick a separator that cannot appear in the sanitized output (e.g. join host and path-segments with a delimiter that never collapses), or hash thepathname(e.g. first 12 hex of sha256) and append. Both are small and bounded to this function. -
JSDoc does not document the new behavior (
packages/spacecat-shared-utils/src/helpers.js:41-46)The doc was thin before and is now actively misleading - readers will not know that the path component participates in the key. At minimum: "The hostname and (if present) URL path are normalized (non-alphanumeric replaced with
_, lowercased) and combined with_. Sites with the same hostname but different paths resolve to distinct secrets." A worked example of the format (/helix-deploy/spacecat-services/customer-secrets/<host>[_<path>]/<version>) would also help the next person debugging "why is this customer's secret missing." Consider adding a@sincenote recording the format change so the breadcrumb is discoverable.
Minor (Nice to Have)
-
Repeated-slash path normalization (
packages/spacecat-shared-utils/src/helpers.js:53)The strip-slash regex
/^\/|\/$/gis a single-character alternation - it strips at most one leading and one trailing slash.nba.com//kings(sometimes produced by URL builders) yields_kings, different fromnba.com/kings->kings. Same risk class as the original bug, less likely to be hit in practice. Either collapse with^\/+|\/+$, or normalize the path uniformly withpathname.split('/').filter(Boolean).join('_').replace(/[^a-zA-Z0-9_]/g, '_').toLowerCase()(which also handles dot-segments). -
Missing subpath trailing-slash idempotence test (
packages/spacecat-shared-utils/test/helpers.test.js)The trailing-slash idempotence test only covers root URLs. The same property should hold for subpaths and is not asserted:
expect(resolveCustomerSecretsName('https://nba.com/kings', ctx)) .to.equal(resolveCustomerSecretsName('https://nba.com/kings/', ctx));
A future refactor of the path regex could silently break this.
-
Path case-folding is undocumented (
packages/spacecat-shared-utils/src/helpers.js:53)nba.com/Kingsandnba.com/kingscollapse to the same key because of.toLowerCase()on the path. RFC 3986 says path segments are case-sensitive, so this is a deliberate normalization choice. A one-line comment on the.toLowerCase()(or a test) makes the intent visible to a future contributor who might "fix" it.
Recommendations
- Consider extracting the host+path-to-key transformation into a small named helper (
urlToCustomerSlugor similar) with focused unit tests covering the normalization edges (encoded chars, repeated slashes, query/fragment, mixed case). This is load-bearing logic; isolating it pays for itself the next time someone needs to extend it. - Add a length guard. AWS Secrets Manager caps secret names at 512 chars; the basePath consumes ~46. Deep subpaths could plausibly approach that ceiling. A length check would surface this earlier than a Secrets Manager API rejection at runtime.
- Forward-looking: the canonical place to enforce the customer-identity contract is the Site model. A future
site.getCustomerSecretId()would shrink the surface area where the convention can drift across the four downstream consumers.
Out of scope, worth tracking
- The companion
generateDataFolderfix inspacecat-api-serviceis correctly excluded here, but worth tracking that the two functions encode the same identity concept using compatible conventions (this PR uses_,generateDataFolderreportedly uses-). A drift between the two reproduces the original bug class on the data-folder side. - Downstream packages (
spacecat-shared-google-client,spacecat-shared-ims-client,spacecat-shared-content-client, plusspacecat-auth-servicehandlers) will need a coordinated bump of the@adobe/spacecat-shared-utilsdependency so a subpath site's auth flow does not write to the new key while a different service still reads the old one.
Assessment
Ready to merge? With fixes.
Reasoning: the bug fix is correct, minimal, and well-tested for the targeted regression. The main blocker is the migration story - the PR description should explicitly confirm whether any in-production Site already has a subpath baseURL, since the answer determines whether this is a clean ship or a coordinated cutover. The JSDoc update and the separator-collision narrowing are small and worth landing in this PR.
Next Steps
- Resolve the migration question first - either confirm "no subpath sites in prod today" in the PR description / Jira, or put a coordinated cutover plan in place.
- Tighten the separator collision (Important item 2) and update the JSDoc (Important item 3).
- The three Minor items can land here or in a small follow-up.
| const host = url.hostname.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase(); | ||
| const pathSuffix = url.pathname.replace(/^\/|\/$/g, '').replace(/[^a-zA-Z0-9]/g, '_').toLowerCase(); | ||
| customer = pathSuffix ? `${host}_${pathSuffix}` : host; | ||
| } catch { |
There was a problem hiding this comment.
Important - Migration impact for existing subpath customers
The PR description says "Backward compatible," but that holds only for sites with no path. Any customer whose baseURL already carries a path had its credentials written under customer-secrets/<host>/<ver>. After this ships, the same URL resolves to customer-secrets/<host>_<path>/<ver> - a key that does not exist in Secrets Manager. Downstream consumers (spacecat-shared-google-client, spacecat-shared-ims-client, spacecat-shared-content-client, plus spacecat-auth-service Google/SharePoint handlers) will start failing those lookups; content-client swallows the error at log.debug and silently degrades, while IMS auth raises and breaks page authentication.
Pick one:
- Confirm in the PR description (and LLMO-4186) that no production Site has a non-empty path in
baseURLtoday, so the "backward compatible" claim is in practice "backward compatible for all current sites." - If subpath sites already exist, coordinate a one-time secret rename in Secrets Manager before deploy, or add a transitional dual-read in callers (try new key, fall back to legacy hostname-only key, log a warning) for one release cycle.
| const url = new URL(baseURL); | ||
| const host = url.hostname.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase(); | ||
| const pathSuffix = url.pathname.replace(/^\/|\/$/g, '').replace(/[^a-zA-Z0-9]/g, '_').toLowerCase(); | ||
| customer = pathSuffix ? `${host}_${pathSuffix}` : host; |
There was a problem hiding this comment.
Important - Separator collision narrows but does not close the original bug class
Both / and any non-alphanumeric character collapse to _. As a result:
nba.com/us/kingsandnba.com/us-kingsboth resolve tonba_com_us_kingsnba.com/us/kingsandnba.com/us_kingsalso collidenba.com/foo barandnba.com/foo%20barcollide on_-runs
Two legitimately distinct customers landing on these forms reproduces the exact bug class this PR set out to fix. Not currently exploitable as cross-customer auth bypass because baseURL is server-stored from site.getBaseURL(), but it leaves the door open for the next onboarding incident. Either pick a separator that cannot appear in the sanitized output, or hash the pathname (e.g. first 12 hex of sha256) and append. Both are small and bounded to this function.
Minor (same line) - Repeated-slash normalization
The strip-slash regex /^\/|\/$/g strips at most one leading and one trailing slash. nba.com//kings yields _kings, different from nba.com/kings -> kings. Either collapse with ^\/+|\/+$, or normalize uniformly with pathname.split('/').filter(Boolean).join('_') (which also handles dot-segments).
Minor (same line) - Path case-folding is undocumented
nba.com/Kings and nba.com/kings collapse to the same key because of .toLowerCase() on the path. RFC 3986 says path segments are case-sensitive, so this is a deliberate choice. A one-line comment on the .toLowerCase() (or a test) makes the intent visible to a future contributor who might "fix" it.
| @@ -47,7 +47,10 @@ export function resolveCustomerSecretsName(baseURL, ctx) { | |||
| const basePath = '/helix-deploy/spacecat-services/customer-secrets'; | |||
There was a problem hiding this comment.
Important - JSDoc does not document the new behavior
The existing JSDoc (just above this line) was thin before and is now actively misleading - readers will not know that the path component participates in the key. At minimum: "The hostname and (if present) URL path are normalized (non-alphanumeric replaced with _, lowercased) and combined with _. Sites with the same hostname but different paths resolve to distinct secrets."
A worked example of the format (/helix-deploy/spacecat-services/customer-secrets/<host>[_<path>]/<version>) would also help the next person debugging "why is this customer's secret missing." Consider adding a @since note recording the format change so the breadcrumb is discoverable.
- Split pathname into segments and join with '__' so 'nba.com/us/kings' and
'nba.com/us-kings' produce distinct keys
- Use split('/').filter(Boolean) to handle repeated slashes correctly
- Update JSDoc with key format and examples
- Add trailing-slash idempotence test for subpaths
- Add separator-collision test
Addresses review feedback on LLMO-4186
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Thanks for the thorough review @dzehnder — all points addressed in the latest commits: Migration impact: No production sites currently have a non-empty path in Separator collision (important #2): Fixed. Switched to Repeated-slash normalization (minor #1): Fixed as a side-effect of the segment split — JSDoc (important #3): Updated with format description, delimiter rationale, and worked examples. Subpath trailing-slash idempotence test (minor #2): Added. The companion PR (spacecat-api-service#2315 |
…olding intent - decodeURIComponent each path segment before sanitizing so %20 and equivalent decoded forms map to the same key - Collapse consecutive _ runs after sanitizing (e.g. %20%20 -> single _) - Document deliberate case-folding of path segments in JSDoc - Add tests for percent-encoded paths, uppercase paths, and double slashes Addresses remaining review feedback on LLMO-4186 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Addressing the remaining items: Minor #3 (case-folding undocumented): Added explicit note to JSDoc — "paths are case-folded deliberately so /Kings and /kings map to the same key" — and a test asserting this. decodeURIComponent (parallel with companion PR): Added percent-decode for each segment before sanitizing, with a safe fallback on malformed encoding. Added test for Consecutive Double-slash test: Added — All 1043 tests pass. |
dzehnder
left a comment
There was a problem hiding this comment.
Hey @alinarublea,
Thanks for the quick turnaround on the prior round - the new commit is well-targeted and clearly tries to close the collision class properly. One residual issue: the new JSDoc makes a categorical injectivity claim that the sanitizer does not actually deliver, and the new collision test is too narrow to catch it. Both are fixable with a one-line regex tweak plus a couple of extra assertions.
Strengths
Previously flagged issues now addressed:
- Path-segment separator collision (prior Important): the new per-segment sanitization joined by
__eliminates the originalnba.com/us/kingsvsnba.com/us-kingscollision (packages/spacecat-shared-utils/src/helpers.js:62-67). - JSDoc (prior Important): the docstring now describes the format, the per-segment normalization, the
__delimiter, and gives three worked examples for root, single-segment, and nested-segment URLs (packages/spacecat-shared-utils/src/helpers.js:40-52). - Repeated-slash normalization (prior Minor):
split('/').filter(Boolean)cleanly handlesnba.com//kingsand similar, as a side effect. - Subpath trailing-slash idempotence test (prior Minor): dedicated test added at
packages/spacecat-shared-utils/test/helpers.test.js:113-117. - Path case-folding (prior Minor): now documented in the expanded JSDoc.
- Migration impact for existing subpath customers (prior Important): prior concern resolved by author response - subpath support is new functionality from LLMO-4186, no existing sites carry a non-empty path. The new format has no migration footprint inside spacecat-shared today.
Issues
Important (Should Fix)
-
Injectivity claim is false; the new approach still admits collisions on consecutive non-alphanumeric inputs (
packages/spacecat-shared-utils/src/helpers.js:62-67, JSDoc at:43-46)The JSDoc states: "The double-underscore delimiter cannot appear in a sanitized segment, so distinct paths cannot collide." This is not actually true. The sanitizer
replace(/[^a-zA-Z0-9]/g, '_')is per-character and does not collapse runs, and_itself is non-alphanumeric so it is preserved 1:1. Two consecutive non-alphanumeric characters in a single path segment therefore produce__inside the segment, indistinguishable from a segment boundary.Verified collisions (all produce the same final secret name):
https://nba.com/us..kings -> nba_com__us__kings https://nba.com/us/kings -> nba_com__us__kings COLLIDE https://nba.com/us-_kings -> nba_com__us__kings COLLIDE https://nba.com/us__kings -> nba_com__us__kings COLLIDE (literal __ in path) https://nba.com//us//kings -> nba_com__us__kings COLLIDE (filter(Boolean) collapses runs of /)Plus intra-segment, all single-segment paths whose two halves are separated by any non-alphanumeric character collide on
us_kings:nba.com/us-kings nba.com/us_kings nba.com/us.kings nba.com/us kings -> nba_com__us_kingsWhy it matters: this is the same security class the PR is meant to close - two distinct
baseURLs mapping to one Secrets Manager key. Severity is bounded today by the admin-onlycreateSitegate and the absence of production subpath sites, so I am calling this Important rather than Critical. But: the PR's whole premise is "close the collision class" and the JSDoc explicitly promises injectivity, so shipping a docstring guarantee that the code does not honor sets up a foot-gun for the next contributor and for any caller that builds on the contract.The new test at
packages/spacecat-shared-utils/test/helpers.test.js:127-132only assertsnba.com/us/kings != nba.com/us-kings. None of the residual collisions above are covered, so the test does not actually pin the property the JSDoc claims.Pick one fix:
-
Tighten the sanitizer (smallest change, recommended): collapse runs of non-alphanumeric to a single
_per segment. One-character regex tweak:const segments = url.pathname.split('/').filter(Boolean) .map((seg) => seg.replace(/[^a-zA-Z0-9]+/g, '_').toLowerCase());
With
+,us..kingsbecomesus_kings, and joined-with-__paths can no longer match a single-segment input. The JSDoc claim becomes architecturally true. -
Or weaken the JSDoc to "best-effort distinctness for typical baseURL shapes; pathological inputs may collide." Worst option for a credentials-routing function.
-
Or hash the canonical form (
host,segments) and use the hash as the secret-name suffix. Eliminates the injectivity question entirely. Heavier change.
Whichever path, also extend the separator-collision test to pin the property:
expect(resolveCustomerSecretsName('https://nba.com/us..kings', ctx)) .to.not.equal(resolveCustomerSecretsName('https://nba.com/us/kings', ctx)); expect(resolveCustomerSecretsName('https://nba.com/us__kings', ctx)) .to.not.equal(resolveCustomerSecretsName('https://nba.com/us/kings', ctx));
-
Recommendations
- Adopt the
+regex change above. It is a one-character edit, makes the docstring accurate, preserves all currently-passing tests, and locks the collision class shut. - After this PR merges, raise the same fix on the companion
generateDataFolder(inspacecat-api-service) - it has the identical residual collision with--instead of__. Out of scope here; worth a follow-up ticket so the two functions stay aligned on the injectivity property. - The
__delimiter doubles per-segment overhead and brings the AWS Secrets Manager 512-char name limit a bit closer. Realistic baseURLs are nowhere near that ceiling, but if the team ever wants the limit to be a non-issue, the hash-based key suggestion above is the cleanest answer. Not for this PR.
Out of scope, worth tracking
- The companion fix in
spacecat-api-servicecarries the same residual collision class (--separator with the same regex). One mention here, no separate finding.
Assessment
Ready to merge? With one fix.
Reasoning: every prior finding is genuinely addressed and the design is the right shape. The remaining issue is that the JSDoc advertises injectivity that the sanitizer does not provide, and the new test does not catch the residual collisions. The fix is a one-character regex change plus two additional assertions - small enough to land here rather than defer.
Next Steps
- Apply the
[^a-zA-Z0-9]+(collapse-runs) regex change in the per-segment sanitizer. - Add the two adversarial assertions to the separator-collision test.
- The recommendations are optional.
| try { | ||
| customer = new URL(baseURL).host.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase(); | ||
| const url = new URL(baseURL); | ||
| const host = url.hostname.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase(); |
There was a problem hiding this comment.
Important - Injectivity claim is false; collision class still open
The new JSDoc (line 43-46) states "The double-underscore delimiter cannot appear in a sanitized segment, so distinct paths cannot collide." This isn't true. The sanitizer is per-character and _ is non-alphanumeric so two adjacent non-alphanumerics in one segment produce __ inside the segment, indistinguishable from a segment boundary.
Verified collisions (all produce nba_com__us__kings):
https://nba.com/us/kings
https://nba.com/us..kings (consecutive `..`)
https://nba.com/us-_kings (mixed `-_`)
https://nba.com/us__kings (literal `__` in path)
https://nba.com//us//kings (filter(Boolean) collapses runs of `/`)
Plus intra-segment: us-kings, us_kings, us.kings, us kings all collapse to us_kings.
Severity is bounded today by the admin-only createSite gate and the absence of production subpath sites, so this stays Important rather than Critical. But the PR's whole purpose is to close this collision class on a credentials function, and shipping a JSDoc guarantee that the code does not honor sets up a foot-gun.
The new test at helpers.test.js:127-132 only asserts us/kings != us-kings; none of the collisions above are covered.
One-line fix (recommended): collapse runs of non-alphanumeric to a single _ per segment:
const segments = url.pathname.split('/').filter(Boolean)
.map((seg) => seg.replace(/[^a-zA-Z0-9]+/g, '_').toLowerCase());The + makes us..kings -> us_kings (single underscore), so joined-with-__ paths can no longer match a single-segment input. JSDoc claim becomes architecturally true; all current tests still pass.
Add adversarial assertions to lock it in:
expect(resolveCustomerSecretsName('https://nba.com/us..kings', ctx))
.to.not.equal(resolveCustomerSecretsName('https://nba.com/us/kings', ctx));
expect(resolveCustomerSecretsName('https://nba.com/us__kings', ctx))
.to.not.equal(resolveCustomerSecretsName('https://nba.com/us/kings', ctx));The companion generateDataFolder in spacecat-api-service has the identical residual collision with --; worth a follow-up there so the two functions stay aligned on the injectivity property.
The per-segment sanitizer now uses /[^a-zA-Z0-9]+/g (with +) to collapse consecutive non-alphanumeric characters into a single underscore in one pass, removing the separate replace(/_+/g, '_') step. This makes the JSDoc injectivity claim architecturally accurate: a sanitized segment can never contain __, so the __ segment delimiter is unambiguous. Added three new tests: - 'us..kings' (dots) does not collide with 'us/kings' (two segments) - 'us__kings' (literal double underscore) does not collide with 'us/kings' - Malformed percent-encoding (%ZZ) is handled without throwing (covers the decodeURIComponent catch branch). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
nba.com/kingsandnba.com/lakers) were colliding on a single Secrets Manager key becauseresolveCustomerSecretsNameused only the hostnameFixes LLMO-4186
Test plan
🤖 Generated with Claude Code