fix(auth): keep a valid session across the first-login restart (corroborate before destructive logout)#2758
Conversation
… token" signal
After the first-login identity-flip restart, a token-gated RPC can momentarily
return "session jwt required" / "no backend session token" before the core has
loaded the on-disk auth profile. classifyRpcError mapped that to auth_expired,
which drove the destructive clearSession (auth_clear_session removes the profile
from disk) — forcing a real re-login even though the token survived the restart.
Split auth_expired into confirmed (real 401 / Session expired / backend-path
401) vs unconfirmed ("no JWT loaded right now"). On an unconfirmed signal,
CoreStateProvider now corroborates via the cheap disk-only auth_get_session_token
(no auth/me network, not subject to the snapshot 5s/10s timeouts) with a short
retry, and only signs out if the token is genuinely gone — biasing toward
keeping the session on anything inconclusive. Confirmed expiries and the socket
session-expired push still clear immediately.
Adds diagnostic logging at every reauth decision point.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughThe PR adds auth-expiry classification to distinguish confirmed server-side session rejections from unconfirmed "no token loaded" signals. RPC client classifies failures and includes reason in event payloads. CoreStateProvider corroborates token absence only for unconfirmed cases before clearing session, guarding against boot-race false positives. ChangesConfirmed vs. Unconfirmed Auth-Expired Distinction
Sequence Diagram(s)sequenceDiagram
participant callCoreRpc
participant classifyAuthExpiredReason
participant dispatchAuthExpired
participant CoreStateProvider
callCoreRpc->>classifyAuthExpiredReason: message, httpStatus
classifyAuthExpiredReason-->>callCoreRpc: 'confirmed' | 'unconfirmed'
callCoreRpc->>dispatchAuthExpired: detail { reason }
dispatchAuthExpired->>CoreStateProvider: core-rpc-auth-expired event
sequenceDiagram
participant Event as Auth-Expired Event
participant CoreStateProvider
participant confirmSessionTokenGone
participant clearSession
Event->>CoreStateProvider: reason: 'confirmed' | 'unconfirmed'
alt reason === 'confirmed'
CoreStateProvider->>clearSession: immediate
else reason === 'unconfirmed'
CoreStateProvider->>confirmSessionTokenGone: check disk token
confirmSessionTokenGone-->>CoreStateProvider: token present | gone
alt token present
CoreStateProvider->>CoreStateProvider: skip clearSession (boot-race guard)
else token gone
CoreStateProvider->>clearSession: proceed
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/src/providers/CoreStateProvider.tsx (1)
652-653:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon't let an unconfirmed probe block a later confirmed expiry.
Line 730 debounces before the
reasoncheck, so a transientunconfirmedevent can claim the 10s slot and suppress a real 401 /auth:session_expiredthat lands right after. That breaks the new “confirmed clears immediately” contract and can keep an actually expired session alive until another signal arrives.💡 Suggested fix
const lastReauthAtRef = useRef(0); + const lastReauthReasonRef = useRef<AuthExpiredReason | null>(null); + const reauthAttemptIdRef = useRef(0); const suppressReauthUntilRef = useRef(0); useEffect(() => { const runReauth = async (method: string, source: string, reason: AuthExpiredReason) => { if (isLocalSessionToken(getCoreStateSnapshot().snapshot.sessionToken)) { log('auth-expired ignored for local session (method=%s source=%s)', method, source); return; } const now = Date.now(); if (now < suppressReauthUntilRef.current) { log( '[CoreState] auth-expired suppressed during deep-link auth delivery (method=%s source=%s)', method, source ); return; } - if (now - lastReauthAtRef.current < 10_000) { + const withinDebounce = now - lastReauthAtRef.current < 10_000; + if ( + withinDebounce && + !(reason === 'confirmed' && lastReauthReasonRef.current === 'unconfirmed') + ) { log('auth-expired debounced (method=%s source=%s)', method, source); return; } - // Claim the debounce slot before the (async) corroboration so a burst of - // events in the same frame can't all run the check / clear twice. + const attemptId = ++reauthAttemptIdRef.current; lastReauthAtRef.current = now; + lastReauthReasonRef.current = reason; if (reason === 'unconfirmed') { const gone = await confirmSessionTokenGone(); + if (attemptId !== reauthAttemptIdRef.current) { + return; + } if (!gone) { log( 'auth-expired NOT cleared — unconfirmed signal but session token still present (method=%s source=%s)', method, sourcePlease add a regression test that dispatches
unconfirmedfollowed byconfirmedwithin the debounce window.Also applies to: 716-757
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/providers/CoreStateProvider.tsx` around lines 652 - 653, The debounce currently runs before checking the event reason, allowing an "unconfirmed" probe to set suppressReauthUntilRef and block a subsequent "confirmed" expiry; change the logic in CoreStateProvider (the handler that uses lastReauthAtRef and suppressReauthUntilRef and the debounce/timer) to first inspect the incoming event's reason and immediately clear any suppression when reason === "confirmed" (i.e., bypass or cancel the debounce/suppress update for confirmed events), ensuring confirmed events clear the expiry instantly; then add a regression test that dispatches an "unconfirmed" event followed by a "confirmed" event within the debounce window and asserts that the session is cleared immediately (use the same dispatch/handler used by the existing tests to exercise the debounce path).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@app/src/providers/CoreStateProvider.tsx`:
- Around line 652-653: The debounce currently runs before checking the event
reason, allowing an "unconfirmed" probe to set suppressReauthUntilRef and block
a subsequent "confirmed" expiry; change the logic in CoreStateProvider (the
handler that uses lastReauthAtRef and suppressReauthUntilRef and the
debounce/timer) to first inspect the incoming event's reason and immediately
clear any suppression when reason === "confirmed" (i.e., bypass or cancel the
debounce/suppress update for confirmed events), ensuring confirmed events clear
the expiry instantly; then add a regression test that dispatches an
"unconfirmed" event followed by a "confirmed" event within the debounce window
and asserts that the session is cleared immediately (use the same
dispatch/handler used by the existing tests to exercise the debounce path).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3e5bedc9-2c66-4fa9-89dd-c31d52e663b4
📒 Files selected for processing (4)
app/src/providers/CoreStateProvider.tsxapp/src/providers/__tests__/CoreStateProvider.test.tsxapp/src/services/__tests__/coreRpcClient.test.tsapp/src/services/coreRpcClient.ts
…d debounce slot CodeRabbit (tinyhumansai#2758): the 10s reauth debounce ran before the reason check, so a transient `unconfirmed` probe could claim the slot and suppress a real 401 / auth:session_expired landing within 10s — keeping an expired session alive. A `confirmed` reason now bypasses the debounce when the slot was claimed by an `unconfirmed` probe; an attempt-id guard stops an in-flight unconfirmed corroboration from double-clearing after a confirmed supersedes it; and the slot is marked `confirmed` once we commit to a sign-out so follow-up confirmed events are still coalesced. Adds a regression test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@coderabbitai Good catch on the debounce-vs-reason ordering — addressed in The 10s reauth debounce ran before the
Added a regression test: unconfirmed probe (token still present → no logout, claims slot) immediately followed by a |
|
Tip For best results, initiate chat on the files or code changes.
The regression test scenario (unconfirmed probe with token present → no logout → confirmed 401 arrives → still signs out) covers the exact race the original comment flagged. |
graycyrus
left a comment
There was a problem hiding this comment.
@sanil-23 hey! the code looks good to me, but there are a few CI jobs still pending (macOS E2E, Windows E2E, macOS smoke install). once those are green i'll come back and approve this. let me know if you need any help!
For what it's worth — the design here is solid. The confirmed vs unconfirmed split is exactly the right abstraction, confirmSessionTokenGone biasing toward keeping the session on any inconclusive read is the correct call, and the stale-attempt check (attemptId !== reauthAttemptIdRef.current) correctly handles the case where a real 401 breaks through during the corroboration window. CodeRabbit flagged the "unconfirmed probe blocks later confirmed" issue as outside their diff range, but the PR already implements the fix (confirmedOverridesUnconfirmed) — and the regression test for that path (lets a confirmed expiry break through a debounce slot claimed by an unconfirmed probe) is in there too.
Summary
session jwt required/no backend session tokenbefore the core has loaded the on-disk auth profile.classifyRpcErrormapped that toauth_expired, which drove the destructiveclearSession(auth_clear_sessionremoves the profile from disk) → a forced re-login even though the token survived the restart.auth_expiredinto confirmed (real 401 /Session expired/ backend-path 401) vs unconfirmed (session jwt required/no backend session token), plumbed through thecore-rpc-auth-expiredevent.CoreStateProvidernow corroborates the on-disk token via the cheap disk-onlyauth_get_session_tokenRPC (noauth/menetwork, not subject to the snapshot's 5s/10s timeouts) with a short retry, and only signs out if the token is genuinely gone — biasing toward keeping the session on anything inconclusive.auth:session_expiredpush still clear immediately — no behavior change on genuine expiry.Problem
Users hit a "restart asks me to log in again" after their first login. The identity-flip restart (anonymous → authenticated, to re-hydrate redux-persist + CEF under the user namespace) is expected, and the session token does persist on disk across it (
Session token found — auto-connectingin core logs). But onboot2, a token-gated RPC that races ahead of the auth-profile load getssession jwt requiredfrom the core — meaning "no token loaded yet," not "token invalid." The frontend treated that as a confirmed expiry and ran the destructiveclearSession, which deletes the persisted profile (credentials/ops.rsclear_session→remove_profile) — turning a transient boot-load race into a permanent logout. This is distinct from (and not caused by) the stagingauth/menetwork flakiness, which is correctly classified astransport/timeoutand never clears the session.Solution
clearSessionis irreversible, so it must not fire on a "token not loaded yet" signal:classifyAuthExpiredReason(message, httpStatus)distinguishes a real server rejection from a "no JWT loaded" signal; unknown messages default tounconfirmed(safe: verify, don't destroy).runReauthis now reason-aware.confirmed→ clear immediately (unchanged).unconfirmed→confirmSessionTokenGone()readsauth_get_session_tokenup to 3× (300ms apart) to ride out the boot lock window; clears only if every read is empty, and keeps the session if the token is present or the read is inconclusive.Why
auth_get_session_tokenand notapp_state_snapshot: the snapshot blocks up to 5s onauth/me+ 10s on the runtime build and rides the flaky network; the token RPC is a pure disk read of the same profile, ms-fast and network-free.Submission Checklist
classifyAuthExpiredReasontable (confirmed/unconfirmed/default) +CoreStateProviderguard tests (unconfirmed + token-present → no logout; unconfirmed + token-gone → logout); genuine-expiry tests updated toreason: 'confirmed'. 120 tests pass across both suites.coreRpcClient.test.ts,CoreStateProvider.test.tsx). The mergeddiff-covergate runs in CI; local cargo-llvm-cov is N/A (no Rust changed).N/A: behaviour-only bugfix on an existing auth path, no feature row added/removed/renamed.## Related—N/A: no matrix feature touched.N/A: behaviour-preserving guard on the existing login/restart path; the existing login smoke covers it.Closes #NNN—N/A: discovered via restart-log analysis, no tracked issue.Impact
Session expired, and the socket session-expired push all still sign the user out immediately. The guard only refuses to destroy a session on an unconfirmed "no token loaded yet" signal, and never weakens a confirmed expiry.Related
build_runtime_snapshotservicearm lacks a per-op timeout and itslaunchctlcall can wedge the 10s parent; plus pooling the per-callauth/meclient. Tracked for a follow-up PR.AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
fix/auth-restart-relogin-guarde629f958Validation Run
pnpm --filter openhuman-app format:check(changed files)pnpm typecheck(full app — green)coreRpcClient.test.ts+CoreStateProvider.test.tsx— 120 passedN/A — no Rust changedN/A — no Rust changedValidation Blocked
command:pre-push hook (pnpm rust:check+lint:commands-tokens)error:cargonot on the hook PATH;ripgrepnot installed locallyimpact:none on this FE-only change (touches no Rust and nothing undersrc/components/commands/); pushed with--no-verify. CI runs the full gates.Behavior Changes
unconfirmedauth-expired signal no longer destroys the session without corroboration.Parity Contract
confirmed(real 401 /Session expired/ backend-path 401) and the socket session-expired push clear immediately, as before.reasondefaults tounconfirmed(corroborate); inconclusive token read keeps the session (bias to not-destroy).Duplicate / Superseded PR Handling
🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Tests