Skip to content

fix: KEEP-240 add Origin header check on cookie-authed mutating routes#1048

Merged
suisuss merged 9 commits intostagingfrom
fix/KEEP-240-csrf-origin-check
May 3, 2026
Merged

fix: KEEP-240 add Origin header check on cookie-authed mutating routes#1048
suisuss merged 9 commits intostagingfrom
fix/KEEP-240-csrf-origin-check

Conversation

@suisuss
Copy link
Copy Markdown

@suisuss suisuss commented Apr 29, 2026

Summary

  • Closes the sibling-subdomain CSRF gap on cookie-authed /api/** routes by enforcing Origin against trustedOrigins on state-changing methods.
  • Layers a root middleware.ts (primary enforcement) with an in-handler check inside getDualAuthContext / resolveOrganizationId / resolveCreatorContext (defence-in-depth).
  • Better-auth defaults to SameSite=Lax (verified in source), but its originCheckMiddleware only guards /api/auth/**; the other 124 API routes had no origin check.

Behaviour

Middleware runs on /api/:path* and:

  • Skips GET/HEAD/OPTIONS.
  • Skips paths that legitimately accept cross-origin POSTs and don't depend on session cookies: /api/auth/, /api/billing/webhooks/, /api/cron/, /api/oauth/, /api/workflows/[id]/webhook, /api/mcp/workflows/[slug]/call.
  • Skips when no cookie header is present (Bearer/API-key callers unaffected).
  • Reads Origin, falls back to Referer. Missing or untrusted -> 403 { error: "Invalid origin" }.

getDualAuthContext (and the org/creator variants) repeat the check after session auth resolves, so a misconfigured matcher cannot silently bypass the protection. OAuth and API-key callers are not gated.

trustedOrigins is extracted to lib/trusted-origins.ts so the edge middleware (cannot import lib/auth.ts due to Node-only deps) shares one source of truth with better-auth.

Blocked attempts log info-level for the first weeks of monitoring.

Test plan

  • pnpm vitest run tests/unit/trusted-origins.test.ts tests/unit/dual-auth-context.test.ts tests/unit/middleware.test.ts -- 48 passing
  • pnpm check -- new code clean (pre-existing lib/auth.ts:334,366 useBlockStatements left untouched per surgical-changes rule)
  • pnpm type-check -- clean
  • Manual browser smoke test: log in, confirm same-origin POSTs work, simulate cross-origin POST from another origin and confirm 403
    • Verified on app-pr-1048 deploy 6065c85 (2026-05-03): 13/13 scenarios pass. Full matrix in PR comments.

Notes for reviewer

  • Pattern matcher in lib/trusted-origins.ts is a small standalone implementation (~30 lines) rather than reusing better-auth's internal wildcardMatch, to avoid coupling to a non-public export.
  • Behaviour for https://*.keeperhub.com matches better-auth's existing matcher: * matches arbitrary depth subdomains. No tightening of the trusted-origin set in this PR.
  • The matcher rejects https://app.keeperhub.com/../evil because URL normalises out the .. before comparison.

better-auth's default SameSite=Lax cookies are the primary CSRF defence,
but its built-in originCheckMiddleware only guards /api/auth/**. The 124
application API routes that authenticate via session cookies have no
explicit origin check, leaving a sibling-subdomain CSRF gap because the
*.keeperhub.com wildcard in trustedOrigins is treated as same-site by
browsers.

Adds two layers:

- Root middleware.ts enforces Origin (falling back to Referer) against
  trustedOrigins on POST/PATCH/PUT/DELETE under /api/** when the request
  carries a cookie. Exempts /api/auth/** (better-auth handles its own
  origin check), webhooks (Stripe + workflow), MCP workflow calls, cron,
  and OAuth client endpoints, all of which legitimately accept
  cross-origin POSTs and don't depend on session cookies.

- getDualAuthContext, resolveOrganizationId, and resolveCreatorContext
  re-run the same check after session auth resolves, as defence-in-depth
  for any route the middleware matcher might miss.

The trusted-origin list is extracted to lib/trusted-origins.ts so the
edge middleware (which can't import lib/auth.ts due to Node-only deps)
shares a single source of truth with better-auth. OAuth Bearer and
API-key callers are not gated; cookieless cross-origin calls remain
unaffected. Blocked attempts log info-level for monitoring.

48 new and updated unit tests cover the matcher, exempt paths, method
and origin/referer fallback logic.
suisuss added 5 commits April 29, 2026 13:35
Reordering checkSessionOrigin above auth.api.getSession in all three
helpers avoids a wasted DB hit when the request will be rejected anyway.
Bearer/API-key callers still short-circuit before the check, so they
remain unaffected.
…eCreatorContext

The defence-in-depth check is wired identically into all three helpers
but only getDualAuthContext was tested. Adds happy-path and rejection
tests for the other two so a regression in one helper is caught
independently.
The previous "path traversal" name implied a security guarantee the
function isn't actually making — it just rejects anything that isn't
a bare origin, because the regex anchors and the wildcard doesn't span
slashes. Renamed and added a non-traversal path case to make the
intent explicit. Real callers go through normaliseOrigin first.
headers.has("cookie") returns true for an empty value, which would
trigger a spurious 403 if a proxy strips the cookie value. Switch to
a falsy check on get("cookie") so empty and missing are equivalent.
Next.js 16 deprecated the `middleware` file convention in favour of
`proxy`; the build emits a deprecation warning otherwise. Renames the
file and the exported function, and updates the doc comments and the
test import path. Behaviour and matcher are unchanged — Next recognises
both filenames identically until removal.
@suisuss
Copy link
Copy Markdown
Author

suisuss commented Apr 29, 2026

Review feedback addressed

Six follow-up commits on top of the original:

  1. refactor: KEEP-240 check origin before session DB lookup — addresses review item 2; avoids a wasted DB hit when the request will be rejected anyway. Bearer/API-key callers still short-circuit before the check.
  2. test: KEEP-240 cover origin check on resolveOrganizationId and resolveCreatorContext — addresses review item 3; adds explicit happy-path and rejection tests for the other two helpers so a regression in any single helper is caught independently.
  3. test: KEEP-240 clarify the path-rejection test name and intent — addresses review item 4; the previous "path traversal" name implied a security guarantee the function isn't making.
  4. fix: KEEP-240 treat empty Cookie header as no cookies — addresses review item 5; switches from headers.has(\"cookie\") to a falsy check on get(\"cookie\") so proxies that strip cookie values don't trigger spurious 403s. Adds a test.
  5. refactor: KEEP-240 rename middleware.ts to proxy.ts (Next.js 16) — Next.js 16 deprecated the middleware file convention in favour of proxy; the build emits a deprecation warning otherwise. Renames file + exported function + updates doc comments + test import.

54/54 unit tests pass; lint and type-check clean on new code.

Notes (review items 1 and 6)

Server Actions: app/oauth/authorize/page.tsx defines handleApprove/handleDeny Server Actions that authenticate via session cookies and are not under /api/**, so neither the proxy nor getDualAuthContext gates them. Next.js 14+ ships a default Origin-vs-Host check on Server Actions, and next.config.ts does not override serverActions.allowedOrigins, so they are implicitly safe. Worth a follow-up ticket to make this explicit (set allowedOrigins to the trusted-origin list) and document the assumption.

Live smoke test: I attempted pnpm dev to curl-test the proxy on /api/workflows; the dev server crashed with an esbuild deadlock on first-request compilation. Reproducible without my changes (the same crash hit /api/health), so it is a pre-existing Turbopack/instrumentation issue in this repo. Falling back to pnpm build made progress to the optimised-build stage but didn't complete in a reasonable window. The 54 unit tests cover all proxy logic paths (matcher, exempt paths, method enforcement, origin/referer fallback, missing/empty cookie handling) — but a real cookie-bearing browser POST against the deployed PR environment is the right final check. Hence requesting deploy-pr-environment.

@github-actions
Copy link
Copy Markdown

PR Environment Deployed

Your PR environment has been deployed!

Environment Details:

Components:

  • Keeperhub Application
  • PostgreSQL Database (isolated instance)
  • LocalStack (SQS emulation)
  • Redis (isolated instance)
  • Schedule Dispatcher (staging image)
  • Block Dispatcher (staging image)
  • Event Tracker (staging image)

The environment will be automatically cleaned up when this PR is closed or merged.

The Dockerfile lists every root-level source file explicitly (granular
COPYs for BuildKit layer caching, see line 28). The new proxy.ts was
not added when the file was introduced, so the deployed image had no
CSRF middleware and only Layer B (in getDualAuthContext) was active in
production. Smoke testing the PR deploy showed cross-origin POSTs to
auth.api.getSession callers (e.g. /api/api-keys) reaching the handler.

Adding the COPY makes Layer A active in builds.
@github-actions
Copy link
Copy Markdown

PR Environment Deployed

Your PR environment has been deployed!

Environment Details:

Components:

  • Keeperhub Application
  • PostgreSQL Database (isolated instance)
  • LocalStack (SQS emulation)
  • Redis (isolated instance)
  • Schedule Dispatcher (staging image)
  • Block Dispatcher (staging image)
  • Event Tracker (staging image)

The environment will be automatically cleaned up when this PR is closed or merged.

@suisuss
Copy link
Copy Markdown
Author

suisuss commented Apr 29, 2026

Smoke test on deploy-pr-1048: PASS after Dockerfile fix

Scenario Expected Actual
GET /api/auth/get-session same-origin 200 200
POST /api/api-keys + EVIL origin 403 403
POST /api/user/wallet + EVIL origin 403 403
PATCH /api/user + EVIL origin 403 403
POST /api/integrations + EVIL origin 403 403
POST /api/api-keys + same-origin 200 (handler) 200
POST /api/api-keys + missing Origin/Referer 403 403
POST /api/api-keys + only Referer trusted 200 (handler) 200
GET /api/workflows + EVIL origin 200 (GET allowed) 200
POST /api/auth/sign-out + EVIL origin bypass (exempt) 500 (better-auth side, not CSRF related)

The bolded rows are the ones that would have been vulnerable without Layer A — routes that authenticate with auth.api.getSession directly and bypass getDualAuthContext (Layer B). All now blocked.

The /api/auth/sign-out 500 is unrelated to CSRF; the proxy correctly bypassed (exempt prefix), and better-auth's own handler returned 500. Worth a separate ticket if reproducible, but not a blocker for this PR.

What we found

Pre-fix, on the deploy:

  • POST /api/api-keys + cross-origin + session cookie200, real API key created. I created (and immediately deleted) two real keys during smoke testing.
  • Only routes going through getDualAuthContext were protected, which is ~33 of ~124 mutating routes.

Root cause: the Dockerfile uses granular per-directory COPYs and never picked up the new root-level proxy.ts. Commit 9ecf813d adds the missing COPY proxy.ts ./ line. The Dockerfile COPY Validation workflow only checks the inverse (every COPY source exists), so it can't catch "you added a root file but forgot to COPY it" — worth a follow-up to add a complementary check.

Note on cookie semantics

POST /api/api-keys with only Cloudflare Access cookies (no session cookie) and EVIL origin returned 403. That's because the proxy treats any non-empty Cookie header as "potentially cookie-authenticated" and enforces origin — it can't distinguish session cookies from CF Access cookies without parsing. This is conservative-by-design but means cross-origin Bearer/API-key callers that happen to traverse Cloudflare Access will also be gated. In practice, those callers shouldn't need cross-origin behaviour, and the /api/oauth/, /api/cron/, /api/billing/webhooks/, and webhook routes are all exempt — so the practical impact should be nil. Flagging in case it surprises anyone.

Cleanup

All test API keys deleted (verified empty list). Local cookie file wiped.

…present

The previous "any non-empty Cookie header" check caused false 403s for
Bearer/API-key callers behind Cloudflare Access — CF_AppSession and
CF_Authorization cookies are present on every request through CF Access,
which the proxy treated as session-bearing.

Tighten the check to look specifically for the better-auth session token
cookie name in the Cookie header:

  better-auth.session_token=          (dev)
  __Secure-better-auth.session_token= (prod, with secure prefix)

Cookieless callers and callers carrying only unrelated cookies
(CF Access tokens, analytics, sidebar-collapsed, etc.) bypass the
origin check. Real CSRF still requires the victim's session cookie to
be sent, which triggers the substring match and the origin check.

Adds a pinning test that imports getCookies from better-auth/cookies
and asserts the substring still matches what better-auth generates,
so a future better-auth upgrade that renames the cookie fails CI
rather than silently disabling the gate.
@github-actions
Copy link
Copy Markdown

PR Environment Deployed

Your PR environment has been deployed!

Environment Details:

Components:

  • Keeperhub Application
  • PostgreSQL Database (isolated instance)
  • LocalStack (SQS emulation)
  • Redis (isolated instance)
  • Schedule Dispatcher (staging image)
  • Block Dispatcher (staging image)
  • Event Tracker (staging image)

The environment will be automatically cleaned up when this PR is closed or merged.

@suisuss
Copy link
Copy Markdown
Author

suisuss commented Apr 29, 2026

Cookie tightening fix verified on PR deploy

Commit a32aa8da scopes the cookie check to the better-auth session token name (better-auth.session_token= / __Secure-better-auth.session_token=) instead of any non-empty Cookie header. Closes the false-positive that gated Bearer/API-key callers behind Cloudflare Access. Includes a pinning test that imports getCookies from better-auth/cookies and asserts the substring matches what better-auth generates.

Smoke matrix on the redeployed PR environment:

Scenario Expected Result
A. CSRF protection (original threat)
POST /api/api-keys + session + evil origin 403 403
POST /api/integrations + session + evil origin 403 403
PATCH /api/user + session + evil origin 403 403
B. Bearer behind CF Access (was false-positive)
POST /api/api-keys + Bearer + CF cookies + evil not 403 401 from handler
POST /api/integrations + Bearer + CF cookies + evil not 403 400 from handler
PATCH /api/user + Bearer + CF cookies + evil not 403 401 from handler
C. Browser flow still works
POST /api/api-keys + session + same-origin 200 200, key created
D. Edge cases
Session POST with no Origin/Referer 403 403
Session POST with only trusted Referer 200 200
GET cross-origin 200 (not enforced) 200

66/66 unit tests pass (12 new)

@joelorzet
Copy link
Copy Markdown

Hi @suisuss, LGTM. Layered design is sound and the better-auth cookie-name pinning test is a great touch.

A few notes:

  1. Tighten the cookie match (lib/trusted-origins.ts): cookie.includes("better-auth.session_token=") would also match cookie values or prefixed names. Not a security issue, but a boundary-anchored regex would be cleaner:

    const SESSION_COOKIE_RE = /(?:^|;\s*)(?:__Secure-)?better-auth\.session_token=/;
    
    export function hasSessionCookie(headers: Headers): boolean {
      const cookie = headers.get("cookie");
      return cookie ? SESSION_COOKIE_RE.test(cookie) : false;
    }

    Update the pinning test to assert SESSION_COOKIE_RE.test(...).

  2. Redundant entry: https://app-staging.keeperhub.com is already covered by https://*.keeperhub.com. Drop it.

  3. Coverage gap (follow-up): ~28 routes call auth.api.getSession directly and bypass the defence-in-depth layer. They're protected solely by proxy.ts. Worth a follow-up to converge on a shared helper.

  4. Browser smoke test in the PR body is still unchecked, worth completing before merge.

…ndant trusted origin

Addresses review notes 1 and 2:
- Replace SESSION_COOKIE_NAME_SUBSTRING (substring match) with SESSION_COOKIE_RE,
  a boundary-anchored regex that rejects the substring appearing inside a cookie
  value or as a suffix of an unrelated cookie name. Cleanup, not a security fix.
- Drop https://app-staging.keeperhub.com from TRUSTED_ORIGINS; it is already
  covered by the https://*.keeperhub.com pattern.
- Update pinning test to assert the regex matches the cookie name better-auth
  generates, and add new tests for the boundary cases the substring check missed.
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 3, 2026

PR Environment Deployed

Your PR environment has been deployed!

Environment Details:

Components:

  • Keeperhub Application
  • PostgreSQL Database (isolated instance)
  • LocalStack (SQS emulation)
  • Redis (isolated instance)
  • Schedule Dispatcher (staging image)
  • Block Dispatcher (staging image)
  • Event Tracker (staging image)

The environment will be automatically cleaned up when this PR is closed or merged.

@suisuss
Copy link
Copy Markdown
Author

suisuss commented May 3, 2026

Review feedback addressed (notes 1, 2, 4) + follow-up filed (note 3)

Pushed 6065c852:

  1. Cookie regex tightening — replaced cookie.includes("better-auth.session_token=") with the boundary-anchored regex /(?:^|;\s*)(?:__Secure-)?better-auth\.session_token=/. Updated the pinning test to assert SESSION_COOKIE_RE.test(...) and added 3 new tests for the boundary cases a substring check misses (substring inside a cookie value, suffix of an unrelated name, whitespace-flexible separator).
  2. Dropped redundant trusted origin — removed https://app-staging.keeperhub.com (covered by https://*.keeperhub.com). Tests adjusted accordingly.
  3. Coverage-gap follow-up — filed KEEP-402.

69/69 unit tests pass (+3 new boundary cases). Lint and type-check clean on changed files.

Re-smoke on app-pr-1048 (image 6065c85)

Scenario Expected Result
A. CSRF blocks
POST /api/api-keys evil Origin 403 403
POST /api/api-keys evil Referer only 403 403
POST /api/api-keys no Origin/Referer 403 403
PATCH /api/user evil Origin 403 403
DELETE /api/api-keys/{id} evil Origin 403 403
B. Trusted origin pass-through
POST /api/api-keys trusted Origin not-403 200 (key created and deleted)
POST /api/api-keys trusted Referer only not-403 200 (key created and deleted)
C. Cookie tightening (regex now anchored)
POST evil + only CF cookies (no session) not-403 401 (handler-level)
POST evil + no cookies not-403 302 (handler-level)
POST evil + decoy=better-auth.session_token=trapvalue not-403 (anchor rejects substring inside a cookie value) 302 (handler-level) -- the OLD substring check would have returned 403 here
D. Exempt paths
POST /api/auth/sign-out evil bypass proxy 403 from better-auth
POST /api/billing/webhooks/stripe evil bypass proxy 404 (handler-level)
E. GET (unguarded)
GET /api/user/wallet evil origin 200 200

13/13. Cookies wiped, test keys deleted (verified via list).

@suisuss suisuss merged commit 7f7e9fd into staging May 3, 2026
35 checks passed
@suisuss suisuss deleted the fix/KEEP-240-csrf-origin-check branch May 3, 2026 02:56
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 3, 2026

🧹 PR Environment Cleaned Up

The PR environment has been successfully deleted.

Deleted Resources:

  • Namespace: pr-1048
  • All Helm releases (Keeperhub, Scheduler, Event services)
  • PostgreSQL Database (including data)
  • LocalStack, Redis
  • All associated secrets and configs

All resources have been cleaned up and will no longer incur costs.

@suisuss
Copy link
Copy Markdown
Author

suisuss commented May 3, 2026

Note on row E (GET unguarded) and audit of GET side effects

A reviewer might reasonably ask why GET /api/user/wallet returns 200 to an evil-origin caller in the smoke matrix. Adding context here so the result isn't read as a gap.

The proxy intentionally skips GET/HEAD/OPTIONS. The 200 is by design. The CSRF threat (forge a state-changing request via the victim's browser) is mitigated for GETs by three layers below the proxy:

  1. SameSite=Lax stops the cookie attaching. Better-auth's session cookie is SameSite=Lax. A real cross-origin fetch from evil.com does NOT include the session cookie, so a browser-driven version of this attack arrives unauthenticated and the handler 401s.
  2. CORS stops the response read. Even if the cookie did attach, the browser blocks evil.com from reading the body (no permissive CORS headers on /api/user/wallet).
  3. GET endpoints are pure reads -- the classic CSRF threat is forging writes.

The smoke test bypasses (1) by manually attaching the session cookie via curl -b. That simulates a non-browser threat (server-to-server with stolen cookies, MITM), in which the attacker already has the session token and a GET is the least of the worries.

That argument breaks if any GET handler actually mutates state. I audited all GET handlers under app/api/** for side effects.

Route Side effect Auth CSRF risk
app/api/cron/agentic-wallet-sweeper/route.ts Updates wallet_approval_requests, deletes expired rate-limit + spend rows Authorization: Bearer $CRON_SECRET None -- Bearer auth, browser can't attach. Path also under exempt /api/cron/ prefix.
app/api/internal/reaper/route.ts Updates workflow_executions, workflow_execution_logs, releases wallet_locks authenticateInternalService(request) None -- internal-service auth, browser can't attach.

58+ GET handlers scanned. The two with side effects are both batch/cleanup endpoints gated by Bearer / internal-service auth (NOT session cookies). All user-facing GET routes are reads only, so the SameSite + CORS argument holds for them.

Conclusion: skip-GET in the proxy is the correct design and matches better-auth's own behaviour. No further changes needed for this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants