feat: unblock SP calls for UC in production #310
Draft
atilafassina wants to merge 15 commits into main
Phase 1 of 2 — remove pre-policy 401 on missing x-forwarded-user; let the
volume policy decide via { id: <sp-id>, isServicePrincipal: true }.
asUser(req) keeps strict throw semantics; single logger.debug on fallback
replaces the dev-mode logger.warn. Startup no-explicit-policy warning
broadened to mention header-less HTTP. Tests rewritten to codify new
contract. Two pre-existing auto-generated files reformatted to match biome.
Co-authored-by: Isaac
Signed-off-by: Atila Fassina <atila@fassina.eu>
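As a rough illustration of how a volume policy might act on the `{ id: <sp-id>, isServicePrincipal: true }` identity that header-less requests now carry, consider the minimal sketch below. The two-field user shape comes from the commit message; the policy signature `(action, user) => boolean` and the action names are assumptions made for this example, not the plugin's confirmed API.

```typescript
// Shape taken from the commit message; any further fields are unknown.
interface FilePolicyUserSketch {
  id: string;
  isServicePrincipal?: boolean;
}

// Hypothetical action names and policy signature, for illustration only.
type FileActionSketch = "read" | "write";
type FilePolicySketch = (
  action: FileActionSketch,
  user: FilePolicyUserSketch,
) => boolean;

// Example policy: service-principal callers may only read,
// identified end users may read and write.
const readOnlyForServicePrincipal: FilePolicySketch = (action, user) =>
  user.isServicePrincipal ? action === "read" : true;

// A header-less HTTP request reaches the policy as the service principal.
const headerlessCaller: FilePolicyUserSketch = {
  id: "sp-app-id",
  isServicePrincipal: true,
};
const endUser: FilePolicyUserSketch = { id: "alice@example.com" };
```

The point of the Phase 1 change is that the decision moves into a function like this one instead of being pre-empted by a 401 before the policy runs.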
Phase 2 of 2: non-behavioral polish.

- JSDoc on FilePolicyUser.id / isServicePrincipal broadened to describe
  header-less HTTP as a valid SP call origin.
- VolumeHandle JSDoc notes asUser(req) throws
  AuthenticationError.missingToken regardless of NODE_ENV.
- Files-plugin docs paragraphs that implied x-forwarded-user was
  mandatory have been reworded.
- Auto-regenerated typedoc for FilePolicyUser included.

Co-authored-by: Isaac
Signed-off-by: Atila Fassina <atila@fassina.eu>
Adds optional `auth: "service-principal" | "on-behalf-of-user"` to both
VolumeConfig and IFilesConfig. Resolution order:
volume.auth ?? plugin.auth ?? "service-principal".

Removes the undocumented `bypassPolicy` parameter from createVolumeAPI
(zero callsites in packages/ or apps/, so no consumers to migrate).

Phase 1 of files-per-volume-auth-mode: config surface only, no routing
or SDK identity change yet. _resolveAuth is unused at this point and
will be wired in Phase 2.

Co-authored-by: Isaac
Signed-off-by: Atila Fassina <atila@fassina.eu>
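The resolution order `volume.auth ?? plugin.auth ?? "service-principal"` can be sketched as below. The config interfaces are simplified stand-ins (in particular, the `volumes` map layout is an assumption), and `resolveAuthSketch` is a hypothetical free function rather than the plugin's private `_resolveAuth`.

```typescript
type AuthModeSketch = "service-principal" | "on-behalf-of-user";

// Simplified stand-ins for VolumeConfig / IFilesConfig; only `auth` matters here.
interface VolumeConfigSketch {
  auth?: AuthModeSketch;
}

interface FilesConfigSketch {
  auth?: AuthModeSketch;
  // Assumed layout: volumes keyed by name.
  volumes: Record<string, VolumeConfigSketch>;
}

// Volume-level setting wins, then the plugin-level setting, then the default.
function resolveAuthSketch(
  config: FilesConfigSketch,
  volumeKey: string,
): AuthModeSketch {
  return config.volumes[volumeKey]?.auth ?? config.auth ?? "service-principal";
}

const exampleConfig: FilesConfigSketch = {
  auth: "on-behalf-of-user",
  volumes: {
    reports: {}, // inherits the plugin-level mode
    uploads: { auth: "service-principal" }, // overrides it
  },
};
```

With this config, `reports` resolves to `"on-behalf-of-user"` and `uploads` to `"service-principal"`; a config with no `auth` anywhere falls through to the service-principal default.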
Adds `_extractObiUser` to the files plugin and wires identity selection
into `_enforcePolicy` so it branches on `_resolveAuth(volumeKey)`:

- service-principal volumes: existing inline extraction (unchanged)
- on-behalf-of-user volumes: require x-forwarded-access-token + user
  headers; 401 in production when missing, dev-fallback to SP with a
  single warn in development

Only the policy-user identity changes here. UC SDK calls still execute
as the service principal until Phase 3 wires `runInUserContext`.

Phase 2 of files-per-volume-auth-mode.

Co-authored-by: Isaac
Signed-off-by: Atila Fassina <atila@fassina.eu>
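A minimal sketch of the on-behalf-of-user branch described above, assuming headers arrive as a plain lowercase-keyed record. The function name, the error class, and passing `nodeEnv` and `warn` as parameters are all choices made for this example; the real helper's signature is not shown in the commit.

```typescript
interface OboPolicyUserSketch {
  id: string;
  isServicePrincipal?: boolean;
}

type HeaderBag = Record<string, string | undefined>;

// Hypothetical error type carrying the HTTP status the route would send.
class UnauthorizedSketchError extends Error {
  readonly status = 401;
}

function extractOboUserSketch(
  headers: HeaderBag,
  nodeEnv: string | undefined,
  servicePrincipalId: string,
  warn: (message: string) => void,
): OboPolicyUserSketch {
  const user = headers["x-forwarded-user"];
  const token = headers["x-forwarded-access-token"];

  // Both headers present: the policy sees the end user.
  if (user && token) {
    return { id: user };
  }

  // Development only: fall back to the service principal and warn.
  if (nodeEnv === "development") {
    warn("OBO headers missing; falling back to the service principal (development only)");
    return { id: servicePrincipalId, isServicePrincipal: true };
  }

  // Anything else (including production): reject with 401.
  throw new UnauthorizedSketchError("Unauthorized");
}
```

Keeping the environment and logger as parameters makes the three outcomes easy to exercise in a unit test without mutating `process.env`.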
Wires the seven read handlers (list, read, download, raw, exists,
metadata, preview) through a new `_runWithAuth(req, volumeKey, fn)`
helper that:

- runs `fn` directly on SP volumes, byte-for-byte identical to today
- wraps `fn` in `runInUserContext` on OBO volumes when both x-forwarded
  headers are present, so the SDK call and `getCurrentUserId()` resolve
  to the end user's identity.

Cache keys use `getCurrentUserId()`, so per-user cache isolation falls
out for free.

Policy still gates first; `_enforcePolicy` already 401s in production
when OBO headers are missing, so the dev-fallback path inside
`_runWithAuth` is only reachable under `NODE_ENV === "development"`.

Also rewrites the cache invalidation comment and the `VolumeAPI` /
`VolumeHandle` / `exports()` JSDoc to describe both modes (previously
all three claimed "operations execute as the service principal").

Phase 3 of files-per-volume-auth-mode: first end-to-end demoable slice.
Write routes (Phase 4) and `VolumeHandle.asUser` SDK identity (Phase 5)
are not yet wired.

Co-authored-by: Isaac
Signed-off-by: Atila Fassina <atila@fassina.eu>
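To make the "per-user cache isolation falls out for free" claim concrete, here is a small sketch. It assumes the user context is carried by Node's `AsyncLocalStorage`; the commit does not say how `runInUserContext` is implemented, so treat that as an assumption. All names ending in `Sketch`, the `sp-app` id, and the cache key format are hypothetical.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

type AuthModeSketch = "service-principal" | "on-behalf-of-user";
type HeaderBag = Record<string, string | undefined>;

interface UserContextSketch {
  userId: string;
  token: string;
}

// Assumption: the ambient user identity lives in AsyncLocalStorage.
const userContextStore = new AsyncLocalStorage<UserContextSketch>();
const SERVICE_PRINCIPAL_ID = "sp-app";

// Resolves to the end user inside a user context, else to the SP.
function getCurrentUserIdSketch(): string {
  return userContextStore.getStore()?.userId ?? SERVICE_PRINCIPAL_ID;
}

// SP volumes run `fn` directly; OBO volumes with both headers run it
// inside the user context. Missing headers fall through to the SP path,
// which in the real plugin is reachable only in development because the
// policy step rejects such requests first in production.
function runWithAuthSketch<T>(
  mode: AuthModeSketch,
  headers: HeaderBag,
  fn: () => T,
): T {
  const userId = headers["x-forwarded-user"];
  const token = headers["x-forwarded-access-token"];
  if (mode === "on-behalf-of-user" && userId && token) {
    return userContextStore.run({ userId, token }, fn);
  }
  return fn();
}

// Because the key embeds the current identity, two users never share an entry.
function listCacheKeySketch(volumeKey: string, path: string): string {
  return `${getCurrentUserIdSketch()}:${volumeKey}:${path}`;
}
```

The handler body never mentions the user: it calls `listCacheKeySketch` as before, and the wrapper decides which identity that call observes.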
Wires `_handleUpload`, `_handleMkdir`, and `_handleDelete` through
`_runWithAuth` so write traffic on OBO volumes executes as the end
user. SP volumes are byte-for-byte identical (no-op wrap fall-through).

The upload handler does a hand-rolled `fetch PUT` and calls
`client.config.authenticate(headers)` to populate auth headers. The
whole body now runs inside the user-context wrap, so the user-token
client is what authenticates the outgoing request.

Adds the non-negotiable upload-headers contract test: triggers an OBO
upload and asserts the outgoing fetch carries
`Authorization: Bearer USER-TOKEN-FOO` (not the SP's marker), proving
the chain runInUserContext → getWorkspaceClient →
client.config.authenticate → fetch headers is intact end-to-end.
Verified the test fails meaningfully if the wrap is removed.

Phase 4 of files-per-volume-auth-mode. Phase 5 (`VolumeHandle.asUser`
SDK identity fix) is next.

Co-authored-by: Isaac
Signed-off-by: Atila Fassina <atila@fassina.eu>
`appKit.files("vol").asUser(req).<op>()` previously only swapped the
user identity passed to the volume policy — the underlying SDK call
still ran as the service principal. After this change, every method on
the returned handle is wrapped in `runInUserContext` with the request's
user identity, so the UC SDK call genuinely executes as the end user.
This is a hard override at the SDK level: it applies even on
`auth: "service-principal"` volumes, where it was previously a no-op
for SDK identity.
Implementation: a new `_wrapVolumeAPIInUserContext` helper wraps each
method in the returned `VolumeAPI`. Reuses `_buildUserContextOrNull` so
the dev-mode fallback (`NODE_ENV === "development"` + missing token)
skips the wrap and continues to execute as the SP — matching pre-OBO
behavior locally without a reverse proxy.
Strict `_extractUser` is unchanged — `asUser` still throws
`AuthenticationError.missingToken` in production when the user header
is missing.
Programmatic OBO note: `appKit.files("obo-vol").<op>()` (no req, no
asUser) cannot synthesize a user identity and continues to execute
against whatever client `getWorkspaceClient()` resolves to at the call
site (typically the SP at the top level). The OBO volume default
applies to HTTP route traffic via `_runWithAuth`. For programmatic
per-user execution, `asUser(req)` is the supported path.
Phase 5 of files-per-volume-auth-mode.
BREAKING CHANGE: programmatic callers of
appKit.files("vol").asUser(req).<op>() that expected the SDK call to
execute as the service principal (with only the policy seeing the
user) now see the SDK call execute as the user. Audit programmatic
asUser callers and remove the asUser wrap if SP credentials were the
desired behavior.
Co-authored-by: Isaac
Signed-off-by: Atila Fassina <atila@fassina.eu>
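The "wrap each method in the returned `VolumeAPI`" idea from Phase 5 can be sketched generically as follows. This is not the plugin's `_wrapVolumeAPIInUserContext`; `wrapApiSketch`, `runAsSketch`, and the toy `volumeApi` are hypothetical. The identity switch uses a plain variable, which only works for synchronous calls and stands in for whatever `runInUserContext` really does.

```typescript
type AnyMethod = (...args: any[]) => unknown;

// Returns a copy of `api` in which every method runs inside `runInContext`.
// Assumes the methods are closures that do not rely on `this`.
function wrapApiSketch<T extends Record<string, AnyMethod>>(
  api: T,
  runInContext: <R>(fn: () => R) => R,
): T {
  const wrapped: Record<string, AnyMethod> = {};
  for (const [name, method] of Object.entries(api)) {
    wrapped[name] = (...args: unknown[]) => runInContext(() => method(...args));
  }
  return wrapped as unknown as T;
}

// Toy stand-in for a user-context runner (synchronous calls only).
let currentIdentity = "service-principal";
function runAsSketch<R>(identity: string, fn: () => R): R {
  const previous = currentIdentity;
  currentIdentity = identity;
  try {
    return fn();
  } finally {
    currentIdentity = previous;
  }
}

// Toy API whose methods report which identity they executed as.
const volumeApi = {
  whoAmI: () => currentIdentity,
  list: (path: string) => `${currentIdentity} listed ${path}`,
};

// Analogous to asUser(req): same surface, different execution identity.
const asAlice = wrapApiSketch(volumeApi, (fn) => runAsSketch("alice", fn));
```

Calling `asAlice.list("/x")` runs as `alice`, while `volumeApi.list("/x")` still runs as the service principal. That difference is the behavior change the BREAKING CHANGE note warns about.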
…phase 6)
Tags every Files plugin trace span with `files.auth_mode` set to either
`"service-principal"` or `"on-behalf-of-user"`, reflecting what
operationally happened (whether the SDK call ran inside
`runInUserContext`). Two complementary plumbing paths land on the same
attribute key:
- HTTP routes: route handlers compute the effective mode via a new
`_effectiveAuthMode(req, volumeKey)` helper and thread it into the
existing `PluginExecutionSettings.telemetryInterceptor.attributes`
shape — `TelemetryInterceptor` already passes those attributes to
`tracer.startActiveSpan`, so the attribute lands on the existing
`plugin.execute` span without new infrastructure.
- Programmatic API (`exports()` SP path + `asUser` OBO path): wraps each
VolumeAPI method in a new `_withAuthModeSpan(operation, mode, fn)`
helper that calls `this.telemetry.startActiveSpan("files.<op>", ...)`.
Programmatic calls previously created NO span, so this introduces new
`files.<op>` spans on programmatic surface — a small observability
enrichment beyond pure attribute plumbing. Spans respect the plugin's
`traces` config; the noop tracer takes over when traces are disabled.
Updates `getResourceRequirements()` JSDoc to document the per-volume
permission split: SP volumes need the SP to hold `WRITE_VOLUME`; OBO
volumes need the end user (not the SP) to hold it. Adds a
`// TODO: extend plugin-manifest.schema.json` marker for the eventual
schema-level fix.
Phase 6 of files-per-volume-auth-mode. Phase 7 (docs, playground,
changelog) is next.
Co-authored-by: Isaac
Signed-off-by: Atila Fassina <atila@fassina.eu>
Final polish phase for files-per-volume-auth-mode:

JSDoc:
- Expand `VolumeConfig.auth` and `IFilesConfig.auth` with SP and OBO
  example blocks plus resolution-order notes.
- Extend `FilePolicyUser.isServicePrincipal` JSDoc with the full
  six-row matrix covering SP/OBO x HTTP/programmatic x header presence,
  including the `asUser(req)` rows.

Docs (`docs/docs/plugins/files.md`):
- New "Auth modes" section: two modes, resolution order, per-mode
  permission requirements, prod/dev OBO behavior, manifest-scope
  limitation, side-by-side config examples.
- New `usersOnly` policy example using `isServicePrincipal: false` and
  a "Policy user matrix" subsection.
- Rewrites every "all operations execute as the service principal"
  paragraph to describe both modes.
- Dedicated `asUser(req)` subsection with the honest limitation that
  programmatic OBO defaults don't apply without `asUser`.

Dev-playground:
- Server: new `obo_demo` volume with `auth: "on-behalf-of-user"` and a
  `usersOnly` policy; new smoke route `GET /policy/obo-volume` using
  `asUser(req)`. `app.yaml` gets the matching
  `DATABRICKS_VOLUME_OBO_DEMO` resource binding.
- Client: `policy-matrix.route.tsx` gets a "Per-volume OBO mode"
  section with two probes (HTTP route + programmatic smoke) so the
  end-to-end OBO path is exercisable from the UI.

Changelog:
- New `## Unreleased` section in `CHANGELOG.md` (lives above the
  release-it generated `[0.24.0]` block) covering the per-volume auth
  feature, the `files.auth_mode` span, the `asUser` SDK identity
  behavior change, the `bypassPolicy` removal, and the honest
  limitation about programmatic OBO without `asUser`.

Generated:
- `types.generated.ts` and `plugin-manifest.generated.ts` are
  regenerated formatting (line-collapsing) from `pnpm build`'s
  generator step. Content unchanged.

Backpressure: pnpm build, pnpm docs:build, pnpm check:fix,
pnpm -r typecheck, pnpm test (1666 passing) all clean.

Phase 7 of files-per-volume-auth-mode: feature is shippable.

Co-authored-by: Isaac
Signed-off-by: Atila Fassina <atila@fassina.eu>
…che, spans
Addresses four findings from the multi-model review of the files OBO
feature.
1. asUser privilege confusion in production (CRITICAL, security)
`_extractUser` now requires BOTH `x-forwarded-user` AND
`x-forwarded-access-token` in production — throws
`AuthenticationError.missingToken` when either is missing. Previously
a request with the user header but no token returned a SP-wrapped
API where the policy saw the request as a real end user
(isServicePrincipal: undefined) but the SDK ran with SP credentials,
so policies like `usersOnly: !user.isServicePrincipal` were
satisfiable while wielding the SP's broader UC grants. CWE-639.
Dev fallback now marks the returned policy user as
`isServicePrincipal: true` and warns (was debug) so policies can't
be tricked even in dev.
2. Double WorkspaceClient allocation per OBO request (HIGH, perf)
New `_resolveAuthForRequest(req, volumeKey)` returns
{ mode, userCtx } and builds the UserContext at most once per
request. `_runWithAuth(userCtx, fn)` now takes the pre-built ctx.
Removes `_effectiveAuthMode` (no longer needed). Every OBO HTTP
request previously constructed two WorkspaceClients; cache hits
paid that cost too. Now: one allocation, even on cache hits.
3. Write-cache invalidation on OBO + wrong path arg (HIGH, correctness)
Two related bugs at `_invalidateListCache`: (a) cache keys included
`getCurrentUserId()`, so user A's write only busted user A's list
cache — user B continued to see stale listings; (b) the `path` arg
was the file path instead of the parent directory.
Fix: list-cache is disabled on OBO volumes (no cross-user staleness
possible). On SP volumes, invalidation now derives the parent
directory via `parentDirectory()`, falling back to the `"__root__"`
sentinel for root-level writes — matches `_handleList`'s key shape.
Cache delete + path resolution are wrapped in best-effort try/catch
so an invalidation failure cannot convert a successful write into
HTTP 500.
4. Duplicate `files.<op>` spans on programmatic calls (HIGH, perf)
`FilesConnector.<op>` already opens a `files.<op>` span via its
`traced()` decorator. The Phase 6 `_withAuthModeSpan` opened an
identical span on top, doubling span allocation/export volume on
every programmatic call.
Fix: new `runWithFilesSpanAttributes(attrs, fn)` exported from
`connectors/files/client.ts` uses AsyncLocalStorage to make ambient
attributes available to the connector's `traced()` decorator.
`_withAuthModeSpan` renamed to `_withAuthModeAttributes` — no longer
creates a span; just sets `files.auth_mode` on the connector's
existing span via the ALS channel. Programmatic calls now produce
exactly one `files.<op>` span. HTTP route attribute path
(`PluginExecutionSettings.telemetryInterceptor.attributes`) is
unchanged.
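Finding 4's fix, passing attributes to an existing span through AsyncLocalStorage instead of opening a second span, might look roughly like the sketch below. The real `runWithFilesSpanAttributes` and `traced()` are not shown in the commit, so every name here is a hypothetical stand-in, and spans are recorded in a plain array instead of going through a tracer.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

type SpanAttributesSketch = Record<string, string>;

// Ambient channel: callers set attributes, the span creator reads them.
const ambientAttributes = new AsyncLocalStorage<SpanAttributesSketch>();

// Stand-in for runWithFilesSpanAttributes: no span is created here.
function runWithSpanAttributesSketch<T>(
  attrs: SpanAttributesSketch,
  fn: () => T,
): T {
  const merged = { ...(ambientAttributes.getStore() ?? {}), ...attrs };
  return ambientAttributes.run(merged, fn);
}

interface RecordedSpanSketch {
  name: string;
  attributes: SpanAttributesSketch;
}
const recordedSpans: RecordedSpanSketch[] = [];

// Stand-in for the connector's traced() wrapper: the only place a span
// is opened. It copies whatever ambient attributes are in scope.
function tracedSketch<T>(name: string, fn: () => T): T {
  recordedSpans.push({
    name,
    attributes: { ...(ambientAttributes.getStore() ?? {}) },
  });
  return fn();
}

// Plugin layer sets the attribute; connector layer opens the single span.
runWithSpanAttributesSketch({ "files.auth_mode": "on-behalf-of-user" }, () =>
  tracedSketch("files.list", () => ["a.txt"]),
);
```

After the call, `recordedSpans` holds exactly one `files.list` entry carrying `files.auth_mode`, which is the one-span-per-call outcome the finding describes.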
Tests: 8 new regression guards covering each fix's failure mode (1674
total, was 1666). The pre-existing `DownloadResponse` unused-import
warning is also resolved as cleanup fallout.
User-facing notes (for changelog follow-up):
- asUser(req) throws on missing token in production (was silent SP
fallback — the previous behavior was a privilege-confusion bug).
- OBO volume reads are no longer cached. Trade-off: every OBO read
hits the SDK; in exchange, no cross-user staleness. SP volume reads
still cache.
- Programmatic appKit.files(vol).<op>() emits one `files.<op>` span
instead of two. The `files.auth_mode` attribute now lands on the
connector's existing span.
Co-authored-by: Isaac
Signed-off-by: Atila Fassina <atila@fassina.eu>
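The parent-directory derivation behind finding 3 (b) can be sketched as below. The commit names a `parentDirectory()` helper and a `"__root__"` sentinel but does not show either, so the path handling here (forward slashes, trailing slashes trimmed) is an assumption and the function names are hypothetical.

```typescript
const ROOT_SENTINEL = "__root__";

// Returns the directory containing `filePath`, or "" for root-level paths.
function parentDirectorySketch(filePath: string): string {
  const trimmed = filePath.replace(/\/+$/, "");
  const lastSlash = trimmed.lastIndexOf("/");
  return lastSlash <= 0 ? "" : trimmed.slice(0, lastSlash);
}

// The list cache is keyed by directory, so a write must invalidate the
// entry for the file's parent, not for the file path itself.
function listCachePathForWrite(filePath: string): string {
  return parentDirectorySketch(filePath) || ROOT_SENTINEL;
}
```

Under these assumptions a write to `reports/2024/q1.csv` invalidates the listing for `reports/2024`, while a write to `q1.csv` at the top level invalidates the `__root__` entry.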
Two follow-up review fixes on top of 3f98628.

A: _invalidateListCache not awaited (medium, correctness)

Write handlers (_handleUpload, _handleMkdir, _handleDelete) called
_invalidateListCache without `await`, so res.json() shipped before
cache invalidation completed. A client that issued a follow-up
GET /list in the same tick could hit the stale cache.

Fix: _invalidateListCache is now genuinely `async` and all three call
sites `await` it before sending the response. Best-effort try/catch
around cache.delete and connector.resolvePath remains, so an
invalidation failure cannot convert a successful write into 500.

Regression test installs a deferred-promise cache.delete and asserts
res.json is NOT called until the delete settles. Verified the test
fails when the `await` is removed and passes when it is added back.

B: hardcoded ports in integration tests (medium, CI hygiene)

plugin.integration.test.ts bound to fixed offsets of TEST_PORT (9880,
9881, 9882). Concurrent CI runs or stale test processes risked
EADDRINUSE flakiness.

Fix: bind `port: 0` so the OS assigns ephemeral ports. New
getListeningPort() helper waits for the server's `listening` event and
returns the assigned port for building localBase URLs. Chose port-0
over supertest because the tests build their own AppKit instance, hold
the Server for afterAll cleanup, and use real fetch calls; moving to
supertest would have required restructuring the fixture lifecycle.

Tests: 1675 (was 1674 + 1 new). typecheck + biome clean.

Co-authored-by: Isaac
Signed-off-by: Atila Fassina <atila@fassina.eu>
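For fix B, a helper along the lines of `getListeningPort()` could be written as follows using only Node's standard `http` and `events` modules. The commit does not show the real helper, so this is a sketch of the port-0 technique and the name `getListeningPortSketch` is hypothetical.

```typescript
import { once } from "node:events";
import { createServer, type Server } from "node:http";

// Waits until the server is bound, then returns the OS-assigned port.
async function getListeningPortSketch(server: Server): Promise<number> {
  if (!server.listening) {
    // Resolves on "listening"; rejects if the server emits "error" first.
    await once(server, "listening");
  }
  const address = server.address();
  if (address === null || typeof address === "string") {
    throw new Error("server is not bound to a TCP port");
  }
  return address.port;
}

// Binding to port 0 asks the OS for any free ephemeral port.
function startEphemeralServerSketch(): Server {
  const server = createServer((_req, res) => {
    res.end("ok");
  });
  server.listen(0);
  return server;
}
```

A test would call `startEphemeralServerSketch()`, await `getListeningPortSketch(server)`, build its base URL from the returned port, and close the server in its teardown hook.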
Signed-off-by: Atila Fassina <atila@fassina.eu>
# Conflicts:
#	CHANGELOG.md
#	apps/dev-playground/server/index.ts
- _readPathQuery helper rejects array/object query params with 400
  across all 7 path-bearing handlers (was: req.query.path raw-cast)
- /read streams via connector.download + size-enforcing TransformStream
  capped at maxReadSize (new VolumeConfig field, default 10 MB); /read
  no longer participates in the read-tier cache
- Upload TransformStream enforces bytesReceived <= contentLength when
  declared, closing a per-user policy bypass where small Content-Length
  + larger body would exceed an approved upload size
- _enforcePolicy 401 and _handleApiError 4xx now return generic public
  bodies (Unauthorized / standard HTTP STATUS_CODES); raw error.message
  goes to server-side logs only (CWE-209)

Co-authored-by: Isaac
Co-authored-by: Orca <help@stably.ai>
Signed-off-by: Atila Fassina <atila@fassina.eu>
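Two of the hardening items above reduce to small checks, sketched here in isolation. The function names are hypothetical. Treating a missing `path` as acceptable is an assumption (the commit only says array/object values are rejected), and the byte guard isolates the counting logic that the commit places inside a `TransformStream`.

```typescript
type PathQueryResultSketch =
  | { ok: true; path: string | undefined }
  | { ok: false; status: 400 };

// Query parsers can yield arrays (?path=a&path=b) or objects (?path[x]=1);
// only a plain string (or, by assumption, an absent value) is accepted.
function readPathQuerySketch(raw: unknown): PathQueryResultSketch {
  if (raw === undefined || typeof raw === "string") {
    return { ok: true, path: raw };
  }
  return { ok: false, status: 400 };
}

// Returns a per-upload checker that throws once the body exceeds the
// declared Content-Length. With no declared length, nothing is enforced here.
function createUploadByteGuardSketch(
  declaredLength: number | undefined,
): (chunk: Uint8Array) => void {
  let bytesReceived = 0;
  return (chunk) => {
    bytesReceived += chunk.byteLength;
    if (declaredLength !== undefined && bytesReceived > declaredLength) {
      throw new Error("upload body exceeds declared Content-Length");
    }
  };
}
```

In a streaming pipeline the guard would be called once per chunk from the transform step, so an oversized body aborts the upload mid-stream instead of after it has been fully forwarded.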
No description provided.