
feat: optional session store (resumability support) #775

Merged
alexhancock merged 9 commits into modelcontextprotocol:main from binahm:feat/session-store on Apr 21, 2026


@glicht (Contributor) commented Mar 25, 2026

Motivation and Context

In horizontally-scaled deployments or after restarting a server, MCP clients use Mcp-Session-Id to resume sessions. When routed to a different server instance or after a restart, the session is unknown and clients get a 404, forcing re-initialization.

This PR adds an optional SessionStore trait that persists InitializeRequestParams after a successful handshake. When a request arrives at an instance without a matching in-memory session, the store is consulted and the session is transparently restored by replaying initialize.

The SessionManager trait gains a restore_session method with a default no-op implementation. Custom session manager implementations can override it to integrate with their own logic if needed. The built-in LocalSessionManager has an implementation that re-creates the in-memory session worker.
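The restore-on-miss flow and the default no-op `restore_session` hook described above can be sketched as follows. This is a minimal, synchronous, std-only illustration under stated assumptions, not rmcp's actual API: the `SessionStore` and `SessionManager` traits here are simplified stand-ins whose real signatures likely differ (for example, by being async), and `InitParams`, `MemoryStore`, `LocalManager`, `NoRestoreManager`, and `handle_request` are hypothetical names. "Replaying initialize" is reduced to re-inserting the stored parameters into the live session map.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the persisted `InitializeRequestParams`.
#[derive(Clone, Debug, PartialEq)]
pub struct InitParams {
    pub client_name: String,
    pub protocol_version: String,
}

/// Simplified, synchronous stand-in for a session store (not rmcp's real trait).
pub trait SessionStore {
    fn save(&mut self, id: &str, params: InitParams);
    fn load(&self, id: &str) -> Option<InitParams>;
}

/// In-memory store for illustration; a real deployment would use shared storage.
#[derive(Default)]
pub struct MemoryStore {
    map: HashMap<String, InitParams>,
}

impl SessionStore for MemoryStore {
    fn save(&mut self, id: &str, params: InitParams) {
        self.map.insert(id.to_string(), params);
    }
    fn load(&self, id: &str) -> Option<InitParams> {
        self.map.get(id).cloned()
    }
}

/// Hypothetical session-manager trait with a default no-op restore hook.
pub trait SessionManager {
    fn has_session(&self, id: &str) -> bool;
    /// Default: do nothing and report that no session was restored.
    fn restore_session(&mut self, _id: &str, _params: &InitParams) -> bool {
        false
    }
}

/// Keeps the default (no-op) hook, like a custom manager that doesn't opt in.
pub struct NoRestoreManager;

impl SessionManager for NoRestoreManager {
    fn has_session(&self, _id: &str) -> bool {
        false
    }
}

/// Per-process manager that overrides the hook to re-create the in-memory session
/// (recording the params stands in for replaying `initialize`).
#[derive(Default)]
pub struct LocalManager {
    live: HashMap<String, InitParams>,
}

impl SessionManager for LocalManager {
    fn has_session(&self, id: &str) -> bool {
        self.live.contains_key(id)
    }
    fn restore_session(&mut self, id: &str, params: &InitParams) -> bool {
        self.live.insert(id.to_string(), params.clone());
        true
    }
}

/// Returns 200 if the session is live or was restored from the store, else 404.
pub fn handle_request<M: SessionManager, S: SessionStore>(
    manager: &mut M,
    store: Option<&S>,
    session_id: &str,
) -> u16 {
    if manager.has_session(session_id) {
        return 200;
    }
    // No store configured (None): behaviour is unchanged and the miss is a 404.
    match store.and_then(|s| s.load(session_id)) {
        Some(params) => {
            if manager.restore_session(session_id, &params) {
                200
            } else {
                404
            }
        }
        None => 404,
    }
}
```

Keeping the store optional (`Option<&S>`) mirrors the opt-in design: with no store configured, an unknown session still yields 404 exactly as before.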

Additionally, to let ServerHandler implementations know that a call to initialize and on_initialized is the result of a restore, a SessionRestoreMarker is added to context.extensions, so implementors can act appropriately when a session is restored.
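To illustrate how a handler might tell a replayed handshake from a fresh one, the sketch below uses a tiny type-keyed map as a stand-in for `context.extensions`. The `Extensions` type and the `on_initialized` function here are hypothetical simplifications, and `SessionRestoreMarker` is defined locally for the example only; in rmcp the marker is inserted by the transport, not by user code.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// Minimal type-keyed map standing in for `context.extensions` (hypothetical).
#[derive(Default)]
pub struct Extensions {
    map: HashMap<TypeId, Box<dyn Any>>,
}

impl Extensions {
    pub fn insert<T: Any>(&mut self, value: T) {
        self.map.insert(TypeId::of::<T>(), Box::new(value));
    }
    pub fn get<T: Any>(&self) -> Option<&T> {
        // Look up by the value's type, then downcast the boxed trait object.
        self.map
            .get(&TypeId::of::<T>())
            .and_then(|b| (**b).downcast_ref::<T>())
    }
}

/// Local stand-in for rmcp's `SessionRestoreMarker` (illustration only).
pub struct SessionRestoreMarker;

/// Hypothetical `on_initialized` logic: skip one-time side effects
/// (e.g. a first-connect notification) when the handshake is only a replay.
pub fn on_initialized(ext: &Extensions) -> &'static str {
    if ext.get::<SessionRestoreMarker>().is_some() {
        "restored: skip first-connect side effects"
    } else {
        "fresh: run first-connect side effects"
    }
}
```

Checking for the marker's presence, rather than passing a flag through method signatures, keeps the `ServerHandler` interface unchanged for implementors that don't care about restores.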

The feature is opt-in. Configurations without a session_store configured are unaffected.

Note: full resumability and multi-server (behind a load balancer) support also requires an event store, so that events aren't lost. This is being discussed in #330.

How Has This Been Tested?

Added a new integration test suite, test_streamable_http_session_store.rs, with three test scenarios.

Breaking Changes

None. Existing logic remains if session_store is not configured (defaults to None).

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • [x] I have added or updated documentation as needed

Additional context

Related to #330

@glicht requested a review from a team as a code owner March 25, 2026 15:43
@github-actions (bot) added the labels T-dependencies, T-test, T-config, T-core, and T-transport on Mar 25, 2026
Comment thread crates/rmcp/src/transport/streamable_http_server/tower.rs
Comment thread crates/rmcp/src/transport/streamable_http_server/tower.rs (outdated)
@glicht force-pushed the feat/session-store branch from e1f4999 to 192faa2 March 30, 2026 18:14
@glicht force-pushed the feat/session-store branch from 192faa2 to 2361eca March 30, 2026 18:27
@glicht (Contributor, Author) commented Apr 6, 2026

@alexhancock any feedback on this PR? Would really like to progress this forward.

@alexhancock (Contributor) left a comment

In theory this will only be needed until June/July, when modelcontextprotocol/modelcontextprotocol#2575 lands in the official spec, but the implementation looks good and will likely need to live on for some time after SEP-2575 is officially included and adopted!

@alexhancock self-requested a review April 21, 2026 18:10
@alexhancock merged commit 8f696e6 into modelcontextprotocol:main Apr 21, 2026
17 checks passed
@github-actions (bot) mentioned this pull request Apr 21, 2026
dax added a commit to universal-inbox/universal-inbox that referenced this pull request May 13, 2026
## Summary

Production runs two replicas of the API behind a load balancer. With
rmcp's per-process `LocalSessionManager`, a request landing on a
different pod than the one that handled `initialize` was rejected with
401, which Claude Code interpreted as token expiry — causing a refresh
loop that eventually surfaced as *"the universal-inbox-alan MCP server
token has expired and requires re-authorization"*.

This PR bumps `rmcp` to 1.6 and wires its new `SessionStore` trait
through a vendored patch of `rmcp-actix-web` 0.12.3 (upstream
re-implements its own handlers instead of delegating to rmcp's tower
service, so the trait alone is not enough). A new `RedisSessionStore`
persists each session's `initialize` parameters; when a follow-up
request hits a pod that doesn't know the session, the patched transport
loads from Redis and replays the handshake transparently. The
missing-session response also flips from 401 to 404 per the MCP spec, so
clients re-initialize cleanly instead of looping on token refresh.
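Why the status code matters can be sketched from the client side. This is a hypothetical, simplified decision function (`ClientAction`, `on_error_status`, and `simulate` are illustrative names, not Claude Code's or rmcp's actual logic). A 401 reads as an auth problem, so the client refreshes its token and retries with the same session id, which keeps failing while the session is missing on that pod; a 404 tells the client the session is gone, and the MCP Streamable HTTP transport expects it to start a new session with a fresh `initialize`.

```rust
/// What a client does after an error status on a request carrying `Mcp-Session-Id`.
#[derive(Debug, PartialEq)]
pub enum ClientAction {
    RefreshTokenAndRetry,
    Reinitialize,
    Fail,
}

pub fn on_error_status(status: u16) -> ClientAction {
    match status {
        // Looks like an expired/invalid token: refresh and retry the same session.
        401 => ClientAction::RefreshTokenAndRetry,
        // Session unknown: start over with a new `initialize` (no session id).
        404 => ClientAction::Reinitialize,
        _ => ClientAction::Fail,
    }
}

/// Simulates a client whose session is missing on the server, with a retry cap.
/// Returns (token refreshes performed, whether the client recovered).
pub fn simulate(missing_session_status: u16, max_attempts: u32) -> (u32, bool) {
    let mut refreshes = 0;
    for _ in 0..max_attempts {
        match on_error_status(missing_session_status) {
            // Same session id is resent, so the same error comes back.
            ClientAction::RefreshTokenAndRetry => refreshes += 1,
            // A new session sidesteps the missing one.
            ClientAction::Reinitialize => return (refreshes, true),
            ClientAction::Fail => return (refreshes, false),
        }
    }
    (refreshes, false)
}
```

With 401 the simulated client burns every attempt on token refreshes without recovering; with 404 it recovers on the first attempt.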

## Changes

- **`rmcp` 1.2 → 1.6** in `api/Cargo.toml` to pick up the `SessionStore`
trait (`PR modelcontextprotocol/rust-sdk#775`).
- **Vendored patch of `rmcp-actix-web` 0.12.3** under
`api/vendor/rmcp-actix-web/`, redirected via `[patch.crates-io]` in the
workspace `Cargo.toml`. Patch wires `SessionStore` into
`handle_get`/`handle_post`/`handle_delete`: replays the `initialize`
handshake on any pod that doesn't know the session, persists state on a
fresh `initialize`, deletes on session close, and returns 404 (not 401)
for unknown/expired sessions.
- **`RedisSessionStore`** (`api/src/mcp/session_store.rs`) —
`SessionStore` impl using the existing `Cache::connection_manager`,
JSON-serialized `SessionState`, namespace
`universal-inbox:mcp:session:`, configurable TTL via `set_ex`.
- **Wiring** in `api/src/mcp/mod.rs` (`build_http_service` now takes
`Arc<dyn SessionStore>`) and `api/src/lib.rs` (constructs
`RedisSessionStore` from the existing cache + config).
- **Config** `McpSessionStoreSettings { ttl_seconds }` (always enabled;
default 86400s).
- **Unit tests** in `api/tests/api/test_mcp_session_store.rs` covering
round-trip, missing-key returns `None`, delete, and TTL.
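The store's behaviour (namespaced keys, serialized state, TTL via `set_ex`) can be approximated without Redis. The sketch below is a std-only, synchronous stand-in: `TtlSessionStore` is a hypothetical name, the `SessionState` payload is kept as an opaque JSON `String`, and expiry is checked lazily on read to mimic Redis `SET key value EX seconds`. It mirrors the four behaviours the unit tests cover: round-trip, missing key returns `None`, delete, and TTL expiry.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Key prefix described in the commit message.
const NAMESPACE: &str = "universal-inbox:mcp:session:";

/// Hypothetical in-memory stand-in for `RedisSessionStore`.
pub struct TtlSessionStore {
    ttl: Duration,
    // namespaced key -> (serialized session state, expiry instant)
    entries: HashMap<String, (String, Instant)>,
}

impl TtlSessionStore {
    pub fn new(ttl_seconds: u64) -> Self {
        Self { ttl: Duration::from_secs(ttl_seconds), entries: HashMap::new() }
    }

    pub fn key(session_id: &str) -> String {
        format!("{}{}", NAMESPACE, session_id)
    }

    /// Like Redis `SET key value EX ttl`: overwrites and resets the expiry.
    pub fn save(&mut self, session_id: &str, state_json: &str) {
        self.save_at(session_id, state_json, Instant::now());
    }

    /// Same as `save`, with an explicit clock so expiry can be tested.
    pub fn save_at(&mut self, session_id: &str, state_json: &str, now: Instant) {
        let expires = now + self.ttl;
        self.entries.insert(Self::key(session_id), (state_json.to_string(), expires));
    }

    pub fn load(&self, session_id: &str) -> Option<String> {
        self.load_at(session_id, Instant::now())
    }

    /// Returns `None` for missing or expired entries.
    pub fn load_at(&self, session_id: &str, now: Instant) -> Option<String> {
        match self.entries.get(&Self::key(session_id)) {
            Some((state, expires)) if now < *expires => Some(state.clone()),
            _ => None,
        }
    }

    /// Called on session close so the state doesn't linger until the TTL.
    pub fn delete(&mut self, session_id: &str) {
        self.entries.remove(&Self::key(session_id));
    }
}
```

Because every write resets the expiry, the TTL bounds how long an idle session stays restorable, while `delete` on session close avoids restoring sessions the client has already ended.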

## Test plan

- [x] `just check` — clippy + compile clean against the patched crate.
- [x] `just test session_store` — 4/4 new unit tests pass.
- [x] `just test test_mcp` — 18/18 existing MCP/OAuth2 tests pass (no
regressions on the patched paths).
- [x] Manual two-process smoke test (Caddy round-robining `/api/*`
between two API instances sharing the same Postgres + Redis):
`initialize` on pod A, `tools/list` with the same `Mcp-Session-Id` lands
on pod B, returns 200 — the patched transport restores from Redis.
- [ ] Prod canary on Alan: set
`UNIVERSAL_INBOX__APPLICATION__MCP_SESSION_STORE__TTL_SECONDS`
(defaulted to 86400) and confirm `qovery log` shows `MCP auth accepted`
for every request, `session restored from external store` exactly once
per pod-bounce of the same session id, and zero `Unauthorized: Session
not found` lines.

## Notes

- The vendored fork is a near-verbatim copy of upstream rmcp 1.6's
`tower::try_restore_from_store` adapted for actix-web. The actix version
uses the existing `on_request` hook to populate the synthesized
`initialize`/`initialized` extensions, since actix middleware already
populates the request's extensions (this is how our auth claim reaches
the MCP service today).
- `SessionRestoreMarker` is `#[non_exhaustive]` with no public
constructor, so it can't be inserted from outside the rmcp crate. Our
`UniversalInboxMcpServer` doesn't observe the marker, so omitting it is
harmless.
- Out of scope: cross-instance SSE event replay (upstream rust-sdk#330).
Universal Inbox tools are pure RPC — request/response only — so we don't
need an event store.


## Summary by CodeRabbit

## Release Notes

* **New Features**
  * Added Redis-backed MCP session persistence with a configurable TTL, enabling sessions to persist across restarts and server instances.
  * Implemented cross-instance session restoration: a session can now be recovered on a different pod through automatic handshake replay.
  * Updated session management infrastructure with improved HTTP routing for API and OAuth discovery endpoints.

* **Dependencies**
  * Updated `rmcp` to version 1.6 for session store integration.
