fix(socket): refresh session token on Invalid token rejection (TAURI-RUST-9C)#2896

Closed
graycyrus wants to merge 4 commits into tinyhumansai:main from graycyrus:fix/socket-invalid-token-refresh-tauri-rust-9c

Conversation

@graycyrus
Contributor

@graycyrus graycyrus commented May 29, 2026

Summary

Fixes Sentry issue TAURI-RUST-9C — 237 events across 0.54.0 → 0.56.0.

Root cause: ws_loop captured the session token at SocketManager::connect time. On reconnect after a disconnect/token rotation, the server returned 44{"message":"Invalid token"}. The loop treated this as a regular ConnectionOutcome::Failed and retried with the same stale credential 5 times before declaring "sustained outage" and firing a Sentry event.

Fix:

  • Add ConnectionOutcome::InvalidToken to types.rs — distinct from transport failures.
  • In run_connection, detect "Invalid token" in the SIO connect-error message via is_invalid_token_error() (case-insensitive) and return InvalidToken instead of Failed.
  • In ws_loop, on InvalidToken: call refresh_session_token() which loads config + reads the current stored JWT (same path as handle_connect_with_session). On success (a different token), reset consecutive_failures/backoff and continue immediately with no backoff sleep. On failure (user logged out / store unavailable / same stale token returned), fall back to Failed accounting so the Sentry escalation still fires.
  • Only reset consecutive_failures counter and backoff after a genuinely different token is returned — prevents counter suppression if the store keeps returning the same stale JWT.
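
The classification step in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `ConnectionOutcome::InvalidToken`, `ConnectionOutcome::Failed`, and `is_invalid_token_error` are named in the description, but the `Connected` variant, the `String` payload on `Failed`, and the `classify_connect_error` helper are assumptions made for the example.

```rust
/// Outcome of one connection attempt.
/// `Failed` and `InvalidToken` are named in the PR; the `Connected` variant
/// and the `String` payload are assumptions for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum ConnectionOutcome {
    Connected,
    Failed(String),
    InvalidToken,
}

/// Case-insensitive check for the backend's "Invalid token" rejection text.
pub fn is_invalid_token_error(message: &str) -> bool {
    message.to_ascii_lowercase().contains("invalid token")
}

/// Hypothetical helper: map a connect-error message to an outcome so that
/// auth rejections are kept apart from transport failures.
pub fn classify_connect_error(message: &str) -> ConnectionOutcome {
    if is_invalid_token_error(message) {
        ConnectionOutcome::InvalidToken
    } else {
        ConnectionOutcome::Failed(message.to_string())
    }
}
```

Keeping the predicate separate from the classifier mirrors the two unit tests listed below, which exercise the match and no-match cases independently.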

Tests added:

  • is_invalid_token_error_matches_backend_message — exact/case-variant wire shapes
  • is_invalid_token_error_does_not_match_other_errors — no false positives
  • run_connection_returns_invalid_token_on_auth_reject — end-to-end through a mock EIO server sending 44{"message":"Invalid token"}
  • connection_outcome_variants_can_be_constructed — updated to cover InvalidToken

Test plan

  • Unit tests pass: pnpm debug rust (or cargo test -p openhuman socket)
  • (N/A) Manual: rotate/expire a session token while the socket is connected → verify reconnect succeeds with refreshed token and no "sustained outage" Sentry event fires — covered by run_connection_returns_invalid_token_on_auth_reject e2e mock test; live rotation requires a production session
  • Verify consecutive_failures resets to 0 on InvalidToken (no false outage Sentry event after a single auth rejection)

Closes #2892

graycyrus added 2 commits May 29, 2026 09:51
…RUST-9C)

After a disconnect, the ws_loop reconnected using the same token captured at
`SocketManager::connect` time. When the token had been rotated/expired, the
server returned `44{"message":"Invalid token"}` — but the loop treated this
as a regular `Failed` outcome and retried with the same stale credential 5
times before declaring "sustained outage" and firing a Sentry event.

Fix:
- Add `ConnectionOutcome::InvalidToken` to distinguish auth rejection from
  transient transport failures.
- In `run_connection`, detect "Invalid token" in the SIO connect-error message
  via the new `is_invalid_token_error` helper (case-insensitive) and return
  `InvalidToken` instead of `Failed`.
- In `ws_loop`, on `InvalidToken`: call `refresh_session_token()` which loads
  config + reads the current stored JWT (same path as
  `handle_connect_with_session`). On success, update the in-loop token and
  `continue` immediately (no backoff sleep). On failure (no stored token, user
  logged out), fall back to `Failed` accounting so the Sentry escalation still
  fires if the auth problem persists.
- Add unit tests for `is_invalid_token_error` and an end-to-end
  `run_connection` test against a mock EIO server that sends the exact
  production wire payload `44{"message":"Invalid token"}`.

Closes tinyhumansai#2892
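
For readers unfamiliar with the wire payload quoted above: in `44{"message":"Invalid token"}` the first `4` is the Engine.IO "message" packet type and the second `4` is the Socket.IO CONNECT_ERROR packet type. The sketch below shows one way to pull the error text out of such a frame using only the standard library. The function name `connect_error_message` is hypothetical, and the naive string search assumes the value contains no escaped quotes; real code would more likely use a JSON parser.

```rust
/// Returns the `message` field of a Socket.IO CONNECT_ERROR frame, if present.
/// Hypothetical helper; assumes the message value has no escaped quotes.
pub fn connect_error_message(frame: &str) -> Option<&str> {
    // "4" = Engine.IO message packet, followed by "4" = Socket.IO CONNECT_ERROR.
    let body = frame.strip_prefix("44")?;
    // Locate the start of the string value for the "message" key.
    let key = "\"message\":\"";
    let start = body.find(key)? + key.len();
    // The value ends at the next double quote.
    let end = start + body[start..].find('"')?;
    Some(&body[start..end])
}
```

Calling `connect_error_message` on the production payload yields the text that a predicate such as `is_invalid_token_error` would then inspect; frames with any other prefix return `None`.
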
@graycyrus graycyrus requested a review from a team May 29, 2026 04:24
@coderabbitai
Contributor

coderabbitai Bot commented May 29, 2026

📝 Walkthrough

Detects Socket.IO "Invalid token" connect ACKs as ConnectionOutcome::InvalidToken, refreshes the session token from the credential store, updates the in-loop token, and immediately retries the connection on a successful refresh; otherwise falls back to the existing failure/backoff path.

Changes

Socket Token Refresh & Recovery

| Layer / File(s) | Summary |
| --- | --- |
| Token Invalidation Outcome Contract (src/openhuman/socket/types.rs) | ConnectionOutcome enum adds InvalidToken variant; unit tests updated to construct and match the new variant. |
| Invalid Token Error Detection and Testing (src/openhuman/socket/ws_loop.rs, src/openhuman/socket/ws_loop_tests.rs) | Adds is_invalid_token_error() predicate, classifies Socket.IO CONNECT ACK errors containing "Invalid token" (case-insensitive) as ConnectionOutcome::InvalidToken, and adds unit and end-to-end tests that exercise detection and the run_connection regression path. Also removes a prior TLS-EOF regression check from the sustained-outage suite. |
| Token Refresh and Reconnection Recovery (src/openhuman/socket/ws_loop.rs) | ws_loop token parameter becomes mutable. Adds refresh_session_token() helper that loads config and returns a non-empty stored session token. On InvalidToken, the reconnection loop resets failure/backoff, attempts token refresh, updates token and retries immediately on success; on refresh failure it records a failed attempt and proceeds with normal backoff. |
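
The `refresh_session_token()` behavior summarized above (return a stored token only when it is present and non-empty) could look something like the sketch below. The `TokenStore` trait, its `load_session_token` method, and the `FixedStore` stand-in are all hypothetical; the real helper loads config and reads the credential store, and is awaited in `ws_loop`, whereas this version is synchronous to stay self-contained.

```rust
/// Hypothetical abstraction over the credential store (not the project's API).
pub trait TokenStore {
    fn load_session_token(&self) -> Result<Option<String>, String>;
}

/// Returns the stored session token only if it exists and is not blank.
/// Store errors, missing tokens, and blank tokens all collapse to `None`.
pub fn refresh_session_token(store: &dyn TokenStore) -> Option<String> {
    match store.load_session_token() {
        Ok(Some(token)) if !token.trim().is_empty() => Some(token),
        _ => None,
    }
}

/// Test double that always returns a fixed result.
pub struct FixedStore(pub Result<Option<String>, String>);

impl TokenStore for FixedStore {
    fn load_session_token(&self) -> Result<Option<String>, String> {
        self.0.clone()
    }
}
```

Collapsing every failure mode to `None` keeps the caller's decision simple: either there is a candidate token to compare against the current one, or the attempt counts as a failure.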

Sequence Diagram(s)

sequenceDiagram
  participant WsLoop as ws_loop
  participant RunConn as run_connection
  participant SocketIO as Socket.IO Server
  participant Detector as is_invalid_token_error()
  participant TokenRefresh as refresh_session_token()
  participant CredStore as Credential Store
  
  WsLoop->>RunConn: attempt connection with current token
  RunConn->>SocketIO: CONNECT with auth token
  SocketIO-->>RunConn: CONNECT ACK error: "Invalid token"
  RunConn->>Detector: classify error message
  Detector-->>RunConn: true (case-insensitive match)
  RunConn-->>WsLoop: ConnectionOutcome::InvalidToken
  WsLoop->>WsLoop: reset failure/backoff state
  WsLoop->>TokenRefresh: request fresh token
  TokenRefresh->>CredStore: load config and session token
  alt token refresh succeeds
    CredStore-->>TokenRefresh: non-empty token returned
    TokenRefresh-->>WsLoop: Some(new_token)
    WsLoop->>WsLoop: update token variable
    WsLoop->>RunConn: retry immediately with refreshed token
  else token refresh fails
    CredStore-->>TokenRefresh: None (missing/blank/error)
    TokenRefresh-->>WsLoop: None
    WsLoop->>WsLoop: apply normal failure backoff
    WsLoop->>RunConn: retry on next backoff interval
  end

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly Related PRs

  • tinyhumansai/openhuman#2810: Overlapping adjustments to error-classification tests and network-unreachable classifier in the same test suite.

Suggested Reviewers

  • oxoxDev
  • senamakel

Poem

🐰 I nibble bugs and hunt the clue,

"Invalid token" whisper true.
I fetch a fresh one, hop, and then—
reconnect, reconnect, we mend. ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: adding token refresh logic for Invalid token rejection cases, with the issue reference for traceability. |
| Linked Issues check | ✅ Passed | All objectives from issue #2892 are met: InvalidToken variant distinguishes auth from transport failures, token refresh logic implemented in ws_loop and run_connection, consecutive_failures reset behavior included, and comprehensive unit/integration tests added. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to fixing token refresh on invalid token rejection. The removal of a prior TLS EOF regression test is a minor cleanup related to test maintenance and does not introduce unrelated functionality. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |


@coderabbitai coderabbitai Bot added the bug, rust-core, and sentry-traced-bug labels May 29, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/socket/ws_loop.rs`:
- Around line 130-180: The code resets consecutive_failures and backoff before
actually obtaining a fresh token, which can suppress escalation or create tight
retry loops; instead, remove the early resets and only reset
consecutive_failures = 0 and backoff = Duration::from_millis(1000) after
refresh_session_token().await returns Some(fresh) and fresh != token (i.e., a
non-empty, different token), then assign token = fresh and continue immediately;
if refresh_session_token() returns None or returns the same token, treat it as a
regular failure path: increment consecutive_failures, call
log_connection_failure(...) as currently done, leave backoff unchanged, and do
not short-circuit the normal backoff sleep. Ensure you reference the
variables/functions consecutive_failures, backoff, token,
refresh_session_token(), and log_connection_failure when making this change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 487b3d2f-7ba6-4145-859e-f2cc3c550ab1

📥 Commits

Reviewing files that changed from the base of the PR and between a41d913 and fab8b47.

📒 Files selected for processing (3)
  • src/openhuman/socket/types.rs
  • src/openhuman/socket/ws_loop.rs
  • src/openhuman/socket/ws_loop_tests.rs

Comment thread src/openhuman/socket/ws_loop.rs
graycyrus added 2 commits May 29, 2026 12:00
…ained

Move consecutive_failures/backoff resets inside the `Some(fresh) if
fresh != token` arm so they only fire when refresh_session_token()
returns a *different* token.  Previously the counters were cleared
before the refresh completed, which meant:

- A failed refresh (None) would restart the failure streak from 1 on
  every InvalidToken loop, suppressing the sustained-outage Sentry
  escalation.
- A refresh that returned the same stale token would spin with no
  backoff delay, tight-looping against the server.

Now `Some(_) | None` (same token or no token) falls through the normal
Failed accounting so FAIL_ESCALATE_THRESHOLD still fires correctly.
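
The accounting described in this commit message can be sketched as a small state transition. Everything here is illustrative rather than the PR's code: `RetryState`, `NextStep`, and `on_invalid_token` are hypothetical names, the threshold value of 5 is inferred from the "5 times" in the PR description, and the 1000 ms initial backoff is taken from the review comment.

```rust
use std::time::Duration;

/// Assumed value, inferred from "retried ... 5 times" in the PR description.
pub const FAIL_ESCALATE_THRESHOLD: u32 = 5;
/// Initial backoff, as quoted in the review comment.
pub const INITIAL_BACKOFF: Duration = Duration::from_millis(1000);

/// Hypothetical bundle of the loop-local variables named in the PR.
pub struct RetryState {
    pub token: String,
    pub consecutive_failures: u32,
    pub backoff: Duration,
}

/// Hypothetical result telling the loop what to do next.
#[derive(Debug, PartialEq, Eq)]
pub enum NextStep {
    RetryNow,
    SleepThenRetry { escalate: bool },
}

/// Handle an `InvalidToken` outcome given the result of a refresh attempt.
pub fn on_invalid_token(state: &mut RetryState, refreshed: Option<String>) -> NextStep {
    match refreshed {
        // Only a genuinely different token clears the failure streak.
        Some(fresh) if fresh != state.token => {
            state.token = fresh;
            state.consecutive_failures = 0;
            state.backoff = INITIAL_BACKOFF;
            NextStep::RetryNow
        }
        // Same stale token or no token: count it as a failure, keep the
        // backoff unchanged, and let the normal sleep and escalation apply.
        Some(_) | None => {
            state.consecutive_failures += 1;
            NextStep::SleepThenRetry {
                escalate: state.consecutive_failures >= FAIL_ESCALATE_THRESHOLD,
            }
        }
    }
}
```

Because the reset lives only in the first arm, a store that keeps returning the stale token can neither suppress escalation nor skip the backoff sleep, which are the two problems the commit message calls out.
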
@graycyrus graycyrus requested a review from sanil-23 May 29, 2026 09:11
@oxoxDev
Contributor

oxoxDev commented May 29, 2026

Thanks for this, @graycyrus — and credit where due: this PR was filed first (this one at 04:24, #2905 at 05:47). Both target the same crash, TAURI-RUST-9C / #2892 (the stale-token "Invalid token" reconnect storm), with the same set of files.

We're going to consolidate on #2905 and close this one. To be transparent about why — it's not a comment on effort, the two diffs are genuinely close — but #2905 edges ahead on a few correctness points:

Your is_invalid_token_error case-insensitivity rationale and the run_connection InvalidToken e2e mock test are both nicely done; if any of that is missing from #2905, it's worth porting over. Closing this in favor of #2905. Thanks again for the fast turnaround on the crash.

@oxoxDev oxoxDev closed this May 29, 2026

Labels

bug, rust-core (Core Rust runtime in src/: CLI, core_server, shared infrastructure), sentry-traced-bug (Bug identified via Sentry triage)

Development

Successfully merging this pull request may close these issues.

Socket reconnection fails with expired token after disconnect (237 events)

2 participants