fix(socket): refresh session token on Invalid token rejection (TAURI-RUST-9C) by graycyrus · Pull Request #2896 · tinyhumansai/openhuman

graycyrus · 2026-05-29T04:24:16Z

Summary

Fixes Sentry issue TAURI-RUST-9C — 237 events across 0.54.0 → 0.56.0.

Root cause: ws_loop captured the session token at SocketManager::connect time. On reconnect after a disconnect/token rotation, the server returned 44{"message":"Invalid token"}. The loop treated this as a regular ConnectionOutcome::Failed and retried with the same stale credential 5 times before declaring "sustained outage" and firing a Sentry event.

Fix:

- Add ConnectionOutcome::InvalidToken to types.rs, distinct from transport failures.
- In run_connection, detect "Invalid token" in the SIO connect-error message via is_invalid_token_error() (case-insensitive) and return InvalidToken instead of Failed.
- In ws_loop, on InvalidToken: call refresh_session_token(), which loads config and reads the current stored JWT (same path as handle_connect_with_session). On success (a different token), reset consecutive_failures/backoff and continue immediately with no backoff sleep. On failure (user logged out, store unavailable, or the same stale token returned), fall back to Failed accounting so the Sentry escalation still fires.
- Reset the consecutive_failures counter and backoff only after a genuinely different token is returned. This prevents counter suppression if the store keeps returning the same stale JWT.
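The new variant and detector from the list above can be sketched in a few lines. This is a minimal illustration, not the PR's code: `Failed` is shown as a unit variant even though the real one in `types.rs` may carry data, and `classify_connect_error` is a hypothetical helper standing in for the branch inside `run_connection`.

```rust
/// Outcome of one connection attempt, as seen by the reconnect loop.
/// Sketch only: the real enum in types.rs has more variants.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ConnectionOutcome {
    /// Transport-level failure (DNS, TLS, socket reset, ...).
    Failed,
    /// Socket.IO CONNECT was rejected because the auth token is stale.
    InvalidToken,
}

/// Case-insensitive match on the backend's auth-rejection text.
pub fn is_invalid_token_error(message: &str) -> bool {
    message.to_ascii_lowercase().contains("invalid token")
}

/// Hypothetical helper: map a Socket.IO connect-error message to an outcome.
pub fn classify_connect_error(message: &str) -> ConnectionOutcome {
    if is_invalid_token_error(message) {
        ConnectionOutcome::InvalidToken
    } else {
        ConnectionOutcome::Failed
    }
}
```

Keeping `InvalidToken` separate from `Failed` is what lets the loop react to an auth rejection (refresh the credential) differently from a transport error (back off and retry).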

Tests added:

- is_invalid_token_error_matches_backend_message: exact and case-variant wire shapes
- is_invalid_token_error_does_not_match_other_errors: no false positives
- run_connection_returns_invalid_token_on_auth_reject: end-to-end through a mock EIO server sending 44{"message":"Invalid token"}
- connection_outcome_variants_can_be_constructed: updated to cover InvalidToken
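For reference, the production payload used by the end-to-end test is an Engine.IO `4` (message) packet wrapping a Socket.IO `4` (CONNECT_ERROR) packet on the default namespace, followed by the JSON error data. The hypothetical `connect_error_message` below pulls out the `message` field with plain string scanning; it ignores JSON escapes and non-default namespaces, so it is a sketch of the frame layout, not a parser the PR uses.

```rust
/// Engine.IO packet type 4 = "message".
const EIO_MESSAGE: char = '4';
/// Socket.IO packet type 4 = CONNECT_ERROR.
const SIO_CONNECT_ERROR: char = '4';

/// Hypothetical helper: extract `message` from a default-namespace
/// CONNECT_ERROR frame such as `44{"message":"Invalid token"}`.
/// Naive scanning; a real client would use a JSON parser.
pub fn connect_error_message(frame: &str) -> Option<&str> {
    let mut chars = frame.chars();
    // Both type prefixes must be present, otherwise this is not a connect error.
    if chars.next()? != EIO_MESSAGE || chars.next()? != SIO_CONNECT_ERROR {
        return None;
    }
    let payload = chars.as_str();
    let key = "\"message\":\"";
    let start = payload.find(key)? + key.len();
    let rest = &payload[start..];
    let end = rest.find('"')?;
    Some(&rest[..end])
}
```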

Test plan

- Unit tests pass: pnpm debug rust (or cargo test -p openhuman socket)
- (N/A) Manual: rotate/expire a session token while the socket is connected, then verify the reconnect succeeds with the refreshed token and no "sustained outage" Sentry event fires. Covered by the run_connection_returns_invalid_token_on_auth_reject e2e mock test; live rotation requires a production session.
- Verify consecutive_failures resets to 0 on InvalidToken when the refresh returns a different token (no false outage Sentry event after a single auth rejection)

Closes #2892

…RUST-9C)

After a disconnect, the ws_loop reconnected using the same token captured at `SocketManager::connect` time. When the token had been rotated or had expired, the server returned `44{"message":"Invalid token"}`, but the loop treated this as a regular `Failed` outcome and retried with the same stale credential 5 times before declaring "sustained outage" and firing a Sentry event.

Fix:

- Add `ConnectionOutcome::InvalidToken` to distinguish auth rejection from transient transport failures.
- In `run_connection`, detect "Invalid token" in the SIO connect-error message via the new `is_invalid_token_error` helper (case-insensitive) and return `InvalidToken` instead of `Failed`.
- In `ws_loop`, on `InvalidToken`: call `refresh_session_token()`, which loads config and reads the current stored JWT (same path as `handle_connect_with_session`). On success, update the in-loop token and `continue` immediately (no backoff sleep). On failure (no stored token, user logged out), fall back to `Failed` accounting so the Sentry escalation still fires if the auth problem persists.
- Add unit tests for `is_invalid_token_error` and an end-to-end `run_connection` test against a mock EIO server that sends the exact production wire payload `44{"message":"Invalid token"}`.

Closes tinyhumansai#2892

coderabbitai · 2026-05-29T04:24:31Z


Walkthrough

Detects Socket.IO "Invalid token" connect ACKs as ConnectionOutcome::InvalidToken, refreshes the session token from the credential store, updates the in-loop token, and immediately retries the connection on a successful refresh; otherwise falls back to the existing failure/backoff path.

Changes

Socket Token Refresh & Recovery

| Layer / File(s) | Summary |
|---|---|
| Token Invalidation Outcome Contract: `src/openhuman/socket/types.rs` | `ConnectionOutcome` enum adds `InvalidToken` variant; unit tests updated to construct and match the new variant. |
| Invalid Token Error Detection and Testing: `src/openhuman/socket/ws_loop.rs`, `src/openhuman/socket/ws_loop_tests.rs` | Adds `is_invalid_token_error()` predicate, classifies Socket.IO CONNECT ACK errors containing "Invalid token" (case-insensitive) as `ConnectionOutcome::InvalidToken`, and adds unit and end-to-end tests that exercise detection and the `run_connection` regression path. Also removes a prior TLS-EOF regression check from the sustained-outage suite. |
| Token Refresh and Reconnection Recovery: `src/openhuman/socket/ws_loop.rs` | `ws_loop` token parameter becomes mutable. Adds `refresh_session_token()` helper that loads config and returns a non-empty stored session token. On `InvalidToken`, the reconnection loop resets failure/backoff, attempts token refresh, updates `token` and retries immediately on success; on refresh failure it records a failed attempt and proceeds with normal backoff. |
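The refresh helper described in the table returns a non-empty stored token, or `None` when the token is missing, blank, or the read fails. The sketch below is synchronous and takes a hypothetical `load` closure standing in for "load config + read the stored JWT"; the real `refresh_session_token()` is async and talks to the credential store directly.

```rust
/// Sketch of the refresh step. `load` is a hypothetical stand-in for
/// loading config and reading the currently stored JWT.
pub fn refresh_session_token<F>(load: F) -> Option<String>
where
    F: FnOnce() -> Result<Option<String>, String>,
{
    match load() {
        // Only a non-blank token counts as a successful refresh.
        Ok(Some(token)) if !token.trim().is_empty() => Some(token),
        // Nothing stored, or whitespace only: treat as logged out.
        Ok(_) => None,
        // Store unavailable: log it and let the caller fall back to Failed.
        Err(err) => {
            eprintln!("refresh_session_token: credential store unavailable: {err}");
            None
        }
    }
}
```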

Sequence Diagram(s)

sequenceDiagram
  participant WsLoop as ws_loop
  participant RunConn as run_connection
  participant SocketIO as Socket.IO Server
  participant Detector as is_invalid_token_error()
  participant TokenRefresh as refresh_session_token()
  participant CredStore as Credential Store
  
  WsLoop->>RunConn: attempt connection with current token
  RunConn->>SocketIO: CONNECT with auth token
  SocketIO-->>RunConn: CONNECT ACK error: "Invalid token"
  RunConn->>Detector: classify error message
  Detector-->>RunConn: true (case-insensitive match)
  RunConn-->>WsLoop: ConnectionOutcome::InvalidToken
  WsLoop->>WsLoop: reset failure/backoff state
  WsLoop->>TokenRefresh: request fresh token
  TokenRefresh->>CredStore: load config and session token
  alt token refresh succeeds
    CredStore-->>TokenRefresh: non-empty token returned
    TokenRefresh-->>WsLoop: Some(new_token)
    WsLoop->>WsLoop: update token variable
    WsLoop->>RunConn: retry immediately with refreshed token
  else token refresh fails
    CredStore-->>TokenRefresh: None (missing/blank/error)
    TokenRefresh-->>WsLoop: None
    WsLoop->>WsLoop: apply normal failure backoff
    WsLoop->>RunConn: retry on next backoff interval
  end

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly Related PRs

tinyhumansai/openhuman#2810: Overlapping adjustments to error-classification tests and network-unreachable classifier in the same test suite.

Suggested Reviewers

oxoxDev
senamakel

Poem

🐰 I nibble bugs and hunt the clue,

"Invalid token" whisper true.
I fetch a fresh one, hop, and then—
reconnect, reconnect, we mend. ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: adding token refresh logic for Invalid token rejection cases, with the issue reference for traceability. |
| Linked Issues check | ✅ Passed | All objectives from issue `#2892` are met: InvalidToken variant distinguishes auth from transport failures, token refresh logic implemented in ws_loop and run_connection, consecutive_failures reset behavior included, and comprehensive unit/integration tests added. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to fixing token refresh on invalid token rejection. The removal of a prior TLS EOF regression test is a minor cleanup related to test maintenance and does not introduce unrelated functionality. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/socket/ws_loop.rs`:
- Around line 130-180: The code resets consecutive_failures and backoff before
actually obtaining a fresh token, which can suppress escalation or create tight
retry loops; instead, remove the early resets and only reset
consecutive_failures = 0 and backoff = Duration::from_millis(1000) after
refresh_session_token().await returns Some(fresh) and fresh != token (i.e., a
non-empty, different token), then assign token = fresh and continue immediately;
if refresh_session_token() returns None or returns the same token, treat it as a
regular failure path: increment consecutive_failures, call
log_connection_failure(...) as currently done, leave backoff unchanged, and do
not short-circuit the normal backoff sleep. Ensure you reference the
variables/functions consecutive_failures, backoff, token,
refresh_session_token(), and log_connection_failure when making this change.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 487b3d2f-7ba6-4145-859e-f2cc3c550ab1

📥 Commits

Reviewing files that changed from the base of the PR and between a41d913 and fab8b47.

📒 Files selected for processing (3)

src/openhuman/socket/types.rs
src/openhuman/socket/ws_loop.rs
src/openhuman/socket/ws_loop_tests.rs

…ained

Move consecutive_failures/backoff resets inside the `Some(fresh) if fresh != token` arm so they only fire when refresh_session_token() returns a *different* token. Previously the counters were cleared before the refresh completed, which meant:

- A failed refresh (None) would restart the failure streak from 1 on every InvalidToken loop, suppressing the sustained-outage Sentry escalation.
- A refresh that returned the same stale token would spin with no backoff delay, tight-looping against the server.

Now `Some(_) | None` (same token or no token) falls through the normal Failed accounting, so FAIL_ESCALATE_THRESHOLD still fires correctly.
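The post-fix handling of a single `InvalidToken` outcome can be modeled as a pure function over the loop's counters. This is a sketch, not the code in `ws_loop.rs`: `LoopState`, `NextStep`, and `on_invalid_token` are hypothetical names, `FAIL_ESCALATE_THRESHOLD = 5` is inferred from the "5 times" in the description, and any backoff growth on the failure path is left to the surrounding loop.

```rust
use std::time::Duration;

/// Assumed value, inferred from "retried ... 5 times" in the PR text.
const FAIL_ESCALATE_THRESHOLD: u32 = 5;
/// Initial backoff, matching the 1000 ms mentioned in the review comment.
const INITIAL_BACKOFF: Duration = Duration::from_millis(1000);

/// Hypothetical slice of the reconnect loop's mutable state.
pub struct LoopState {
    pub token: String,
    pub consecutive_failures: u32,
    pub backoff: Duration,
}

#[derive(Debug, PartialEq, Eq)]
pub enum NextStep {
    /// Reconnect right away with the refreshed token.
    RetryNow,
    /// Sleep for the backoff, then retry.
    Backoff(Duration),
    /// Threshold reached: report the sustained outage, then back off.
    Escalate(Duration),
}

/// Applies the post-fix rules to one InvalidToken outcome.
pub fn on_invalid_token(state: &mut LoopState, refreshed: Option<String>) -> NextStep {
    match refreshed {
        // Only a genuinely different token clears the failure streak.
        Some(fresh) if fresh != state.token => {
            state.token = fresh;
            state.consecutive_failures = 0;
            state.backoff = INITIAL_BACKOFF;
            NextStep::RetryNow
        }
        // Same stale token or nothing stored: ordinary Failed accounting.
        Some(_) | None => {
            state.consecutive_failures += 1;
            if state.consecutive_failures >= FAIL_ESCALATE_THRESHOLD {
                NextStep::Escalate(state.backoff)
            } else {
                NextStep::Backoff(state.backoff)
            }
        }
    }
}
```

Because the stale-token and no-token cases share one arm, they keep counting toward the threshold instead of restarting the streak, which is the bug the commit above describes.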

…token-refresh-tauri-rust-9c

oxoxDev · 2026-05-29T11:29:06Z

Thanks for this, @graycyrus — and credit where due: this PR was filed first (this one at 04:24, #2905 at 05:47). Both target the same crash, TAURI-RUST-9C / #2892 (the stale-token "Invalid token" reconnect storm), with the same set of files.

We're going to consolidate on #2905 and close this one. To be transparent about why: it's not a comment on effort, and the two diffs are genuinely close, but #2905 edges ahead on a few correctness points:

- Token-error detection: #2905 anchors on both "socket.io connect error" and "invalid token"; this PR matches a bare contains("invalid token"), which can fire on unrelated upstream errors that happen to contain that phrase.
- Bounded retry: on a refreshed-but-different token, this PR resets consecutive_failures = 0 and continues with no backoff. If the token store keeps returning new tokens (rotation or a non-deterministic source), that's an unbounded no-sleep retry loop. #2905 caps the immediate retry to one attempt before falling through to backoff + escalation.
- Genuine-expiry UX: #2905 fast-fails a definitively expired session, surfaces "session expired — please sign in again" to the UI, and emits no Sentry event; this PR runs to the 5-retry sustained-outage path, which still fires the very Sentry event we're trying to silence.
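The first two safeguards attributed to #2905 can be illustrated as below. This is not #2905's code, which does not appear on this page: `is_anchored_invalid_token_error` and `ImmediateRetryBudget` are hypothetical, and the assumption is that `run_connection` formats its errors with a "socket.io connect error" prefix.

```rust
/// Hypothetical anchored detector: both phrases must be present, so an
/// unrelated upstream error that merely mentions "invalid token" does not match.
pub fn is_anchored_invalid_token_error(message: &str) -> bool {
    let lower = message.to_ascii_lowercase();
    lower.contains("socket.io connect error") && lower.contains("invalid token")
}

/// Allows at most one no-sleep retry per streak of auth rejections.
#[derive(Default)]
pub struct ImmediateRetryBudget {
    used: bool,
}

impl ImmediateRetryBudget {
    /// Returns true (and consumes the budget) if an immediate retry is allowed.
    pub fn try_take(&mut self) -> bool {
        if self.used {
            false
        } else {
            self.used = true;
            true
        }
    }

    /// Re-arm after a successful connection.
    pub fn reset(&mut self) {
        self.used = false;
    }
}
```

In a reconnect loop, an `InvalidToken` followed by a fresh token would call `try_take()`; once it returns false, the attempt falls through to normal backoff and escalation, which bounds no-sleep retries even if the store keeps handing out new tokens.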
This diff also drops the unrelated sustained_outage_for_tls_handshake_eof_classifies_as_expected regression test (TAURI-RUST-4ZD) — looks like a stale base.

Your is_invalid_token_error case-insensitivity rationale and the run_connection → InvalidToken e2e mock test are both nicely done — if any of that is missing from #2905 it's worth porting over. Closing this in favor of #2905; thanks again for the fast turnaround on the crash.

graycyrus added 2 commits May 29, 2026 09:51

chore: apply cargo fmt formatting

fab8b47

graycyrus requested a review from a team May 29, 2026 04:24

coderabbitai Bot added the rust-core (Core Rust runtime in src/: CLI, core_server, shared infrastructure), sentry-traced-bug (Bug identified via Sentry triage), and bug labels May 29, 2026

coderabbitai Bot requested changes May 29, 2026


Comment thread src/openhuman/socket/ws_loop.rs

graycyrus added 2 commits May 29, 2026 12:00

Merge remote-tracking branch 'upstream/main' into fix/socket-invalid-…

298a054

…token-refresh-tauri-rust-9c

coderabbitai Bot approved these changes May 29, 2026


graycyrus requested a review from sanil-23 May 29, 2026 09:11

oxoxDev closed this May 29, 2026

graycyrus wants to merge 4 commits into tinyhumansai:main from graycyrus:fix/socket-invalid-token-refresh-tauri-rust-9c

2 participants
