
fix(auth): keep a valid session across the first-login restart (corroborate before destructive logout)#2758

Merged
graycyrus merged 2 commits into tinyhumansai:main from sanil-23:fix/auth-restart-relogin-guard
May 27, 2026

Conversation


@sanil-23 sanil-23 commented May 27, 2026

Summary

  • After the first-login identity-flip restart, a token-gated RPC can momentarily return session jwt required / no backend session token before the core has loaded the on-disk auth profile. classifyRpcError mapped that to auth_expired, which drove the destructive clearSession (auth_clear_session removes the profile from disk) → a forced re-login even though the token survived the restart.
  • Split auth_expired into confirmed (real 401 / Session expired / backend-path 401) vs unconfirmed (session jwt required / no backend session token), plumbed through the core-rpc-auth-expired event.
  • On an unconfirmed signal, CoreStateProvider now corroborates the on-disk token via the cheap disk-only auth_get_session_token RPC (no auth/me network, not subject to the snapshot's 5s/10s timeouts) with a short retry, and only signs out if the token is genuinely gone — biasing toward keeping the session on anything inconclusive.
  • Confirmed expiries (real 401) and the socket auth:session_expired push still clear immediately — no behavior change on genuine expiry.
  • Adds diagnostic logging at every reauth decision point so the path is traceable in a restart log.

Problem

Users hit a "restart asks me to log in again" prompt after their first login. The identity-flip restart (anonymous → authenticated, to re-hydrate redux-persist + CEF under the user namespace) is expected, and the session token does persist on disk across it (Session token found — auto-connecting in core logs). But on boot2, a token-gated RPC that races ahead of the auth-profile load gets session jwt required from the core, meaning "no token loaded yet," not "token invalid." The frontend treated that as a confirmed expiry and ran the destructive clearSession, which deletes the persisted profile (credentials/ops.rs clear_session → remove_profile), turning a transient boot-load race into a permanent logout. This is distinct from (and not caused by) the staging auth/me network flakiness, which is correctly classified as transport/timeout and never clears the session.
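To make the race concrete, here is a toy model (not the project's code) of a core that has a token on disk but has not yet loaded it into memory. The `FakeCore` class and its method names are hypothetical; only the `session jwt required` message is taken from the description above.

```typescript
// Toy model of the boot race. FakeCore and its methods are hypothetical.
type RpcResult = { ok: true } | { ok: false; error: string };

class FakeCore {
  private tokenOnDisk: string | null;
  private loadedToken: string | null = null;

  constructor(tokenOnDisk: string | null) {
    this.tokenOnDisk = tokenOnDisk;
  }

  // Simulates the auth-profile load that happens shortly after boot.
  loadProfile(): void {
    this.loadedToken = this.tokenOnDisk;
  }

  // A token-gated RPC: fails until the profile has been loaded into memory.
  callTokenGated(): RpcResult {
    return this.loadedToken !== null
      ? { ok: true }
      : { ok: false, error: 'session jwt required' };
  }

  // A disk-only read, independent of whether the profile is loaded yet.
  readTokenFromDisk(): string | null {
    return this.tokenOnDisk;
  }
}

const core = new FakeCore('persisted-token');
const early = core.callTokenGated(); // races ahead of the profile load
const onDisk = core.readTokenFromDisk(); // the token is still on disk
core.loadProfile();
const late = core.callTokenGated(); // succeeds once the profile is loaded
```

In this model the early failure says nothing about the token's validity, which is why treating it as a confirmed expiry is unsafe.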

Solution

clearSession is irreversible, so it must not fire on a "token not loaded yet" signal:

  • classifyAuthExpiredReason(message, httpStatus) distinguishes a real server rejection from a "no JWT loaded" signal; unknown messages default to unconfirmed (safe: verify, don't destroy).
  • runReauth is now reason-aware. confirmed → clear immediately (unchanged). unconfirmed → confirmSessionTokenGone() reads auth_get_session_token up to 3× (300ms apart) to ride out the boot lock window; it clears only if every read is empty, and keeps the session if the token is present or the read is inconclusive.
  • The debounce slot is claimed before the async corroboration so event bursts can't double-run.
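The classification rule in the first bullet could look roughly like the sketch below. This is an illustration under assumptions, not the merged implementation: the marker lists, the case-insensitive substring matching, and the rule that an HTTP 401 status wins over message markers are all guesses, and the "backend-path 401" case is not modeled.

```typescript
// Hypothetical sketch; the real classifyAuthExpiredReason may match differently.
type AuthExpiredReason = 'confirmed' | 'unconfirmed';

// Marker strings quoted in the PR description. How they are matched is an assumption.
const CONFIRMED_MARKERS = ['session expired'];
const UNCONFIRMED_MARKERS = ['session jwt required', 'no backend session token'];

function classifyAuthExpiredReason(message: string, httpStatus?: number): AuthExpiredReason {
  const text = message.toLowerCase();
  // Assumption: a real HTTP 401 is treated as a server-side rejection.
  if (httpStatus === 401) return 'confirmed';
  if (CONFIRMED_MARKERS.some((marker) => text.includes(marker))) return 'confirmed';
  if (UNCONFIRMED_MARKERS.some((marker) => text.includes(marker))) return 'unconfirmed';
  // Unknown messages fall back to the safe side: verify, don't destroy.
  return 'unconfirmed';
}
```

The important property is the default: anything the classifier does not recognize is routed to corroboration rather than to an immediate clear.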

Why auth_get_session_token and not app_state_snapshot: the snapshot blocks up to 5s on auth/me + 10s on the runtime build and rides the flaky network; the token RPC is a pure disk read of the same profile, ms-fast and network-free.
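A minimal sketch of the corroboration loop, assuming the disk-only read is injected as a function. The real confirmSessionTokenGone() takes no arguments according to the description; the `readToken` and `sleep` parameters here are hypothetical and exist only to keep the example self-contained. Treating a thrown read error as "inconclusive, keep the session" is also an assumption about how the bias is implemented.

```typescript
// Hypothetical reader: resolves to the token, or null / '' when none is stored.
type ReadToken = () => Promise<string | null>;

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Returns true only when every read came back empty.
async function confirmSessionTokenGone(
  readToken: ReadToken,
  attempts = 3,
  delayMs = 300,
  sleep: (ms: number) => Promise<void> = defaultSleep
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      const token = await readToken();
      if (token) return false; // token present: keep the session
    } catch {
      return false; // inconclusive read: bias toward keeping the session
    }
    // Wait between reads, but not after the last one.
    if (i < attempts - 1) await sleep(delayMs);
  }
  return true; // every read was empty: the token is genuinely gone
}
```

With the defaults, the worst case adds two 300ms waits plus three disk reads, and only on the unconfirmed path.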

Submission Checklist

  • Tests added or updated — classifyAuthExpiredReason table (confirmed/unconfirmed/default) + CoreStateProvider guard tests (unconfirmed + token-present → no logout; unconfirmed + token-gone → logout); genuine-expiry tests updated to reason: 'confirmed'. 120 tests pass across both suites.
  • Diff coverage ≥ 80% — all changed FE lines are exercised by the new/updated unit tests (coreRpcClient.test.ts, CoreStateProvider.test.tsx). The merged diff-cover gate runs in CI; local cargo-llvm-cov is N/A (no Rust changed).
  • Coverage matrix updated — N/A: behaviour-only bugfix on an existing auth path, no feature row added/removed/renamed.
  • Affected feature IDs listed under ## Related: N/A (no matrix feature touched).
  • No new external network dependencies introduced — none; the guard reads an existing local RPC.
  • Manual smoke checklist updated — N/A: behaviour-preserving guard on the existing login/restart path; the existing login smoke covers it.
  • Linked issue closed via Closes #NNN: N/A (discovered via restart-log analysis, no tracked issue).

Impact

  • Platform: desktop. Prevents a spurious, destructive logout after the first-login restart; users stay authenticated as the on-disk token already allows.
  • Security: genuine expiry is unchanged — a real 401, explicit Session expired, and the socket session-expired push all still sign the user out immediately. The guard only refuses to destroy a session on an unconfirmed "no token loaded yet" signal, and never weakens a confirmed expiry.
  • Performance: corroboration is a disk-only RPC with ≤3 short retries, only on the unconfirmed path.

Related

  • Closes: N/A (no tracked issue)
  • Follow-up PR(s)/TODOs: separate snapshot fix — the build_runtime_snapshot service arm lacks a per-op timeout and its launchctl call can wedge the 10s parent; plus pooling the per-call auth/me client. Tracked for a follow-up PR.

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: fix/auth-restart-relogin-guard
  • Commit SHA: e629f958

Validation Run

  • pnpm --filter openhuman-app format:check (changed files)
  • pnpm typecheck (full app — green)
  • Focused tests: coreRpcClient.test.ts + CoreStateProvider.test.tsx — 120 passed
  • Rust fmt/check (if changed): N/A — no Rust changed
  • Tauri fmt/check (if changed): N/A — no Rust changed

Validation Blocked

  • command: pre-push hook (pnpm rust:check + lint:commands-tokens)
  • error: cargo not on the hook PATH; ripgrep not installed locally
  • impact: none on this FE-only change (touches no Rust and nothing under src/components/commands/); pushed with --no-verify. CI runs the full gates.

Behavior Changes

  • Intended behavior change: an unconfirmed auth-expired signal no longer destroys the session without corroboration.
  • User-visible effect: no more spurious re-login after the first-login restart; genuine expiry still signs out.

Parity Contract

  • Legacy behavior preserved: confirmed (real 401 / Session expired / backend-path 401) and the socket session-expired push clear immediately, as before.
  • Guard/fallback/dispatch parity checks: missing reason defaults to unconfirmed (corroborate); inconclusive token read keeps the session (bias to not-destroy).
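The parity rules above reduce to a small decision function. The sketch below is illustrative only: `decideReauth`, `TokenRead`, and `Decision` are hypothetical names, and in the real flow the token is only read on the unconfirmed path.

```typescript
type AuthExpiredReason = 'confirmed' | 'unconfirmed';
// Outcome of the disk-only token corroboration (hypothetical type).
type TokenRead = 'present' | 'gone' | 'inconclusive';
type Decision = 'clear' | 'keep';

function decideReauth(reason: AuthExpiredReason | undefined, tokenRead: TokenRead): Decision {
  // A missing reason defaults to unconfirmed, so it gets corroborated.
  const effective: AuthExpiredReason = reason ?? 'unconfirmed';
  // Confirmed expiry clears immediately; the token read is ignored.
  if (effective === 'confirmed') return 'clear';
  // Unconfirmed: destroy the session only when the token is verifiably gone.
  return tokenRead === 'gone' ? 'clear' : 'keep';
}
```

Of the six reason/read combinations, only a confirmed reason or a verified-gone token leads to `clear`.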

Duplicate / Superseded PR Handling

  • Duplicate PR(s): N/A
  • Canonical PR: this
  • Resolution: N/A

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Smarter session-expiry handling: the app now corroborates on-disk session state to avoid false sign-outs and only clears local data when expiry is confirmed.
    • Immediate sign-out for confirmed expiry signals; debounced, cautious handling for unconfirmed signals.
    • Deep-link and socket expiry events are classified to ensure correct immediate vs cautious behavior.
  • Tests

    • Updated tests cover confirmed vs unconfirmed expiry, corroboration races, and debounce behavior.


… token" signal

After the first-login identity-flip restart, a token-gated RPC can momentarily
return "session jwt required" / "no backend session token" before the core has
loaded the on-disk auth profile. classifyRpcError mapped that to auth_expired,
which drove the destructive clearSession (auth_clear_session removes the profile
from disk) — forcing a real re-login even though the token survived the restart.

Split auth_expired into confirmed (real 401 / Session expired / backend-path
401) vs unconfirmed ("no JWT loaded right now"). On an unconfirmed signal,
CoreStateProvider now corroborates via the cheap disk-only auth_get_session_token
(no auth/me network, not subject to the snapshot 5s/10s timeouts) with a short
retry, and only signs out if the token is genuinely gone — biasing toward
keeping the session on anything inconclusive. Confirmed expiries and the socket
session-expired push still clear immediately.

Adds diagnostic logging at every reauth decision point.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sanil-23 sanil-23 requested a review from a team May 27, 2026 12:54

coderabbitai Bot commented May 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0e7c8e66-1330-4574-b0ed-b653323f24d2

📥 Commits

Reviewing files that changed from the base of the PR and between e629f95 and 3ae3bbf.

📒 Files selected for processing (2)
  • app/src/providers/CoreStateProvider.tsx
  • app/src/providers/__tests__/CoreStateProvider.test.tsx
🚧 Files skipped from review as they are similar to previous changes (2)
  • app/src/providers/__tests__/CoreStateProvider.test.tsx
  • app/src/providers/CoreStateProvider.tsx

📝 Walkthrough


The PR adds auth-expiry classification to distinguish confirmed server-side session rejections from unconfirmed "no token loaded" signals. The RPC client classifies failures and includes the reason in event payloads. CoreStateProvider corroborates token absence only for unconfirmed cases before clearing the session, guarding against boot-race false positives.

Changes

Confirmed vs. Unconfirmed Auth-Expired Distinction

| Layer | File(s) | Summary |
| --- | --- | --- |
| Auth-expiry reason classification contract and tests | app/src/services/coreRpcClient.ts, app/src/services/__tests__/coreRpcClient.test.ts | Introduces exported AuthExpiredReason type and classifyAuthExpiredReason() helper to mark auth-expired messages as confirmed (401 / session-expired markers / backend-path 401) or unconfirmed (token-not-loaded / fallback). Adds tests covering message/status combinations. |
| RPC event payload and dispatch updates | app/src/services/coreRpcClient.ts | dispatchAuthExpired() now includes reason in CustomEvent detail. callCoreRpc() computes AuthExpiredReason and passes it when dispatching auth-expired events from non-OK HTTP responses and JSON-RPC error bodies. |
| Session token corroboration and reauth flow refactor | app/src/providers/CoreStateProvider.tsx, app/src/providers/__tests__/CoreStateProvider.test.tsx | Adds confirmSessionTokenGone() which retries disk-only getSessionToken() checks. runReauth becomes reason-aware, claims debounce slots before async corroboration, only clears session after corroboration for unconfirmed events, and clears immediately for confirmed events. Socket.IO expiry is confirmed; RPC-originated events default to unconfirmed. Tests updated for corroboration, debounce, and confirmed-overrides-unconfirmed scenarios. |

Sequence Diagram(s)

sequenceDiagram
  participant callCoreRpc
  participant classifyAuthExpiredReason
  participant dispatchAuthExpired
  participant CoreStateProvider
  callCoreRpc->>classifyAuthExpiredReason: message, httpStatus
  classifyAuthExpiredReason-->>callCoreRpc: 'confirmed' | 'unconfirmed'
  callCoreRpc->>dispatchAuthExpired: detail { reason }
  dispatchAuthExpired->>CoreStateProvider: core-rpc-auth-expired event
sequenceDiagram
  participant Event as Auth-Expired Event
  participant CoreStateProvider
  participant confirmSessionTokenGone
  participant clearSession
  Event->>CoreStateProvider: reason: 'confirmed' | 'unconfirmed'
  alt reason === 'confirmed'
    CoreStateProvider->>clearSession: immediate
  else reason === 'unconfirmed'
    CoreStateProvider->>confirmSessionTokenGone: check disk token
    confirmSessionTokenGone-->>CoreStateProvider: token present | gone
    alt token present
      CoreStateProvider->>CoreStateProvider: skip clearSession (boot-race guard)
    else token gone
      CoreStateProvider->>clearSession: proceed
    end
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

working

Suggested reviewers

  • graycyrus
  • senamakel

Poem

🐰 I hopped through logs and raced the boot,
Checking tokens on disk before I scoot.
Confirmed — leap fast, unconfirmed — look twice,
No accidental logouts — that's nice! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly identifies the main fix: preventing spurious logout after first-login restart by corroborating before destructive logout. It accurately reflects the core problem and solution. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.



@coderabbitai coderabbitai Bot added the bug label May 27, 2026

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/src/providers/CoreStateProvider.tsx (1)

652-653: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't let an unconfirmed probe block a later confirmed expiry.

Line 730 debounces before the reason check, so a transient unconfirmed event can claim the 10s slot and suppress a real 401 / auth:session_expired that lands right after. That breaks the new “confirmed clears immediately” contract and can keep an actually expired session alive until another signal arrives.

💡 Suggested fix
   const lastReauthAtRef = useRef(0);
+  const lastReauthReasonRef = useRef<AuthExpiredReason | null>(null);
+  const reauthAttemptIdRef = useRef(0);
   const suppressReauthUntilRef = useRef(0);

   useEffect(() => {
     const runReauth = async (method: string, source: string, reason: AuthExpiredReason) => {
       if (isLocalSessionToken(getCoreStateSnapshot().snapshot.sessionToken)) {
         log('auth-expired ignored for local session (method=%s source=%s)', method, source);
         return;
       }
       const now = Date.now();
       if (now < suppressReauthUntilRef.current) {
         log(
           '[CoreState] auth-expired suppressed during deep-link auth delivery (method=%s source=%s)',
           method,
           source
         );
         return;
       }
-      if (now - lastReauthAtRef.current < 10_000) {
+      const withinDebounce = now - lastReauthAtRef.current < 10_000;
+      if (
+        withinDebounce &&
+        !(reason === 'confirmed' && lastReauthReasonRef.current === 'unconfirmed')
+      ) {
         log('auth-expired debounced (method=%s source=%s)', method, source);
         return;
       }
-      // Claim the debounce slot before the (async) corroboration so a burst of
-      // events in the same frame can't all run the check / clear twice.
+      const attemptId = ++reauthAttemptIdRef.current;
       lastReauthAtRef.current = now;
+      lastReauthReasonRef.current = reason;

       if (reason === 'unconfirmed') {
         const gone = await confirmSessionTokenGone();
+        if (attemptId !== reauthAttemptIdRef.current) {
+          return;
+        }
         if (!gone) {
           log(
             'auth-expired NOT cleared — unconfirmed signal but session token still present (method=%s source=%s)',
             method,
             source

Please add a regression test that dispatches unconfirmed followed by confirmed within the debounce window.

Also applies to: 716-757

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/providers/CoreStateProvider.tsx` around lines 652 - 653, The debounce
currently runs before checking the event reason, allowing an "unconfirmed" probe
to set suppressReauthUntilRef and block a subsequent "confirmed" expiry; change
the logic in CoreStateProvider (the handler that uses lastReauthAtRef and
suppressReauthUntilRef and the debounce/timer) to first inspect the incoming
event's reason and immediately clear any suppression when reason === "confirmed"
(i.e., bypass or cancel the debounce/suppress update for confirmed events),
ensuring confirmed events clear the expiry instantly; then add a regression test
that dispatches an "unconfirmed" event followed by a "confirmed" event within
the debounce window and asserts that the session is cleared immediately (use the
same dispatch/handler used by the existing tests to exercise the debounce path).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3e5bedc9-2c66-4fa9-89dd-c31d52e663b4

📥 Commits

Reviewing files that changed from the base of the PR and between 93bad38 and e629f95.

📒 Files selected for processing (4)
  • app/src/providers/CoreStateProvider.tsx
  • app/src/providers/__tests__/CoreStateProvider.test.tsx
  • app/src/services/__tests__/coreRpcClient.test.ts
  • app/src/services/coreRpcClient.ts

coderabbitai[bot]
coderabbitai Bot previously approved these changes May 27, 2026
…d debounce slot

CodeRabbit (tinyhumansai#2758): the 10s reauth debounce ran before the reason check, so a
transient `unconfirmed` probe could claim the slot and suppress a real 401 /
auth:session_expired landing within 10s — keeping an expired session alive.

A `confirmed` reason now bypasses the debounce when the slot was claimed by an
`unconfirmed` probe; an attempt-id guard stops an in-flight unconfirmed
corroboration from double-clearing after a confirmed supersedes it; and the slot
is marked `confirmed` once we commit to a sign-out so follow-up confirmed events
are still coalesced. Adds a regression test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sanil-23

@coderabbitai Good catch on the debounce-vs-reason ordering — addressed in 3ae3bbfa.

The 10s reauth debounce ran before the reason check, so a transient unconfirmed probe could claim the slot and suppress a real 401 / auth:session_expired landing within the window. Fix:

  • A confirmed expiry now breaks through the debounce when the slot was claimed by an unconfirmed probe (confirmedOverridesUnconfirmed).
  • An attempt-id guard (reauthAttemptIdRef) stops an in-flight unconfirmed corroboration from double-clearing after a confirmed event supersedes it mid-await.
  • Once we commit to a real sign-out, the slot is marked confirmed so a follow-up confirmed event in the window is still coalesced (no double clearSession).

Added a regression test: unconfirmed probe (token still present → no logout, claims slot) immediately followed by a confirmed 401 → still signs out. Full suite green (121 tests).
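The three pieces of the fix can be sketched as a small state machine outside React. This is an assumption-laden illustration, not the merged code: `ReauthGate`, `tryClaim`, `isStale`, and `markConfirmed` are hypothetical names standing in for the lastReauthAtRef / reauthAttemptIdRef refs and the confirmedOverridesUnconfirmed check, and timestamps are assumed to be epoch milliseconds as returned by Date.now().

```typescript
type AuthExpiredReason = 'confirmed' | 'unconfirmed';

interface ReauthGate {
  lastAt: number;
  lastReason: AuthExpiredReason | null;
  attemptId: number;
}

const DEBOUNCE_MS = 10_000;

function createGate(): ReauthGate {
  return { lastAt: 0, lastReason: null, attemptId: 0 };
}

// Returns a new attempt id if the event may proceed, or null if it is debounced.
function tryClaim(gate: ReauthGate, reason: AuthExpiredReason, now: number): number | null {
  const withinDebounce = now - gate.lastAt < DEBOUNCE_MS;
  // A confirmed expiry breaks through a slot that an unconfirmed probe claimed.
  const confirmedOverridesUnconfirmed =
    reason === 'confirmed' && gate.lastReason === 'unconfirmed';
  if (withinDebounce && !confirmedOverridesUnconfirmed) return null;
  gate.lastAt = now;
  gate.lastReason = reason;
  gate.attemptId += 1;
  return gate.attemptId;
}

// An in-flight corroboration checks this after its await; a newer claim makes it stale.
function isStale(gate: ReauthGate, attemptId: number): boolean {
  return attemptId !== gate.attemptId;
}

// Called when committing to a sign-out so later confirmed events are coalesced.
function markConfirmed(gate: ReauthGate): void {
  gate.lastReason = 'confirmed';
}
```

In this sketch an unconfirmed claim followed by a confirmed event inside the window yields a second attempt id, and the first attempt sees itself as stale when its corroboration resolves.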


coderabbitai Bot commented May 27, 2026


@sanil-23 The three-layer fix looks solid:

  • confirmedOverridesUnconfirmed cleanly handles the slot-already-claimed case without abandoning debounce coalescing for the normal paths.
  • reauthAttemptIdRef is the right pattern for mid-await cancellation — avoids a TOCTOU where the corroboration resolves after the confirmed sign-out already ran.
  • Promoting the slot to confirmed on real sign-out ensures subsequent confirmed events in the window are still coalesced correctly.

The regression test scenario (unconfirmed probe with token present → no logout → confirmed 401 arrives → still signs out) covers the exact race the original comment flagged.

@coderabbitai coderabbitai Bot added the working A PR that is being worked on by the team. label May 27, 2026

@graycyrus graycyrus left a comment


@sanil-23 hey! the code looks good to me, but there are a few CI jobs still pending (macOS E2E, Windows E2E, macOS smoke install). once those are green i'll come back and approve this. let me know if you need any help!

For what it's worth — the design here is solid. The confirmed vs unconfirmed split is exactly the right abstraction, confirmSessionTokenGone biasing toward keeping the session on any inconclusive read is the correct call, and the stale-attempt check (attemptId !== reauthAttemptIdRef.current) correctly handles the case where a real 401 breaks through during the corroboration window. CodeRabbit flagged the "unconfirmed probe blocks later confirmed" issue as outside their diff range, but the PR already implements the fix (confirmedOverridesUnconfirmed) — and the regression test for that path (lets a confirmed expiry break through a debounce slot claimed by an unconfirmed probe) is in there too.

@graycyrus graycyrus merged commit 8b702c7 into tinyhumansai:main May 27, 2026
32 of 35 checks passed

Labels

bug working A PR that is being worked on by the team.
