fix(conversations): release store lock during cold index rebuild (#2849)#2886
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughRefactors ConversationStore::search_cross_thread_messages to add a warm-cache fast path that locks only the per-workspace index cache, and a cold-path that snapshots threads under CONVERSATION_STORE_LOCK, releases it for JSONL reads to rebuild the index, then finalizes the index under the cache lock. Adds a concurrency test validating no serialization on append. ChangesLock Contention Fix for Cross-Thread Search
Sequence DiagramsequenceDiagram
participant Client
participant ConversationStore
participant CONVERSATION_INDEX_CACHE
participant JSONLFiles
Client->>ConversationStore: search_cross_thread_messages(workspace)
ConversationStore->>CONVERSATION_INDEX_CACHE: try read cached index
alt cache hit (warm)
CONVERSATION_INDEX_CACHE-->>ConversationStore: return index
ConversationStore-->>Client: return search results
else cache miss (cold)
ConversationStore->>ConversationStore: acquire CONVERSATION_STORE_LOCK
ConversationStore->>ConversationStore: snapshot thread list
ConversationStore-->>ConversationStore: release CONVERSATION_STORE_LOCK
ConversationStore->>JSONLFiles: read per-thread JSONL to build index
ConversationStore->>CONVERSATION_INDEX_CACHE: lock and entry().or_insert(rebuilt_index)
CONVERSATION_INDEX_CACHE-->>ConversationStore: cached index (maybe reused)
ConversationStore-->>Client: return search results
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…yhumansai#2849) search_cross_thread_messages previously held CONVERSATION_STORE_LOCK for the entire index rebuild (walking all JSONL files), blocking append_message, get_messages, and list_threads for the duration. Split into fast path (warm cache: cache lock only) and cold path (brief store lock for thread-list snapshot, then lock-free JSONL I/O, then cache lock to insert). JSONL files are append-only so a concurrent write during rebuild just means one missed message, picked up on the next warm-cache insert. Removes now-unused with_index and populate_index_unlocked helpers.
32dff1c to
89b5780
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/openhuman/memory_conversations/store.rs (1)
209-220: ⚡ Quick winAdd branch-decision debug logs for the warm/cold split (do not log the query).
This is a new, concurrency-sensitive flow whose whole purpose is diagnosing stalls, yet the warm-cache early return and the cold-rebuild entry emit no logs — only the post-population
debug!at line 247 fires. A grep-friendly log on each branch makes it possible to confirm in the field whether the fast path is actually being taken. Keep the userqueryout of the log (it is user PII, like the redacted titles elsewhere).🪵 Suggested logs
{ let mut cache = CONVERSATION_INDEX_CACHE.lock(); if let Some(idx) = cache.get_mut(&self.root_dir()) { + debug!( + "{LOG_PREFIX} cross-thread search warm-cache hit workspace={}", + self.root_dir().display() + ); return Ok(idx.search(query, limit, exclude_thread_id)); } } + debug!( + "{LOG_PREFIX} cross-thread search cold rebuild start workspace={}", + self.root_dir().display() + ); // Cold path: build the index. Snapshot the thread list under the // store lock (brief), then release it so concurrent writes aren't // blocked during the potentially-long JSONL file reads.As per coding guidelines: "add substantial, development-oriented logs on new/changed flows at entry/exit points, branch decisions ... never log secrets, API keys, JWTs, or sensitive PII".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/memory_conversations/store.rs` around lines 209 - 220, Add branch-decision debug logs around the fast-path cache check and the cold-path index build: when checking CONVERSATION_INDEX_CACHE.lock() and finding a cached index for self.root_dir(), emit a debug log indicating the warm-cache fast path was taken (include the root_dir() or a short identifier but never the user query); likewise emit a debug log when falling through to the cold path just before snapshotting the thread list to indicate a rebuild will occur. Keep the existing post-population debug! intact; the new logs should only state branch decisions and the root identifier and must not include the query or any other PII.src/openhuman/memory_conversations/store_tests.rs (1)
849-915: ⚡ Quick winThis test can't fail on the unfixed code — it's a smoke test, not a regression guard.
There's no synchronization forcing the spawned
appendto run while the search is mid-rebuild; the writer may run entirely before or after the search. More importantly, the pre-fix code never deadlocked here —append_messageand the oldsearchboth tookCONVERSATION_STORE_LOCKfirst, so the writer would just serialize behind the rebuild and still complete. Since the test only asserts "both complete without deadlock," it passes on the old code too and so does not actually validate the lock release this PR is about. The inline comment at lines 886-889 ("would deadlock or serialize") overstates this — it would serialize, and the test cannot observe serialization.A truly deterministic concurrency assertion is hard and tends to be flaky, so at minimum correct the comment to reflect that this is a non-deterministic smoke test (it exercises the path under TSan/loom or repeated runs but does not by itself prove non-serialization).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/memory_conversations/store_tests.rs` around lines 849 - 915, The test cold_search_does_not_serialize_on_outer_lock is a non-deterministic smoke test and does not prove the pre-fix behavior (it could only have serialized, not deadlocked); update the inline comment in that test to remove the claim that the old code "would deadlock or serialize" and instead state that the test only exercises the cold-cache path and verifies both append_message and search_cross_thread_messages complete without deadlock but cannot assert serialization ordering deterministically; mention that deterministic validation requires TSan/loom or a targeted synchronization harness and keep references to CONVERSATION_INDEX_CACHE and the spawned writer to preserve context.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/memory_conversations/store.rs`:
- Around line 189-202: Update the comment that claims a concurrently-written
message "gets picked up via the warm-cache `insert` path on the next
`append_message` call": clarify that `append_message` only inserts the message
it is appending and does not re-read JSONL, so a message missed during the
cold-path rebuild can remain absent from the inverted index until the cache is
evicted/rebuilt; reference the cold-path snapshot under
`CONVERSATION_STORE_LOCK`, the warm-path `CONVERSATION_INDEX_CACHE`, and the
`append_message` behavior to make the recovery tradeoff explicit and avoid
implying automatic self-healing.
---
Nitpick comments:
In `@src/openhuman/memory_conversations/store_tests.rs`:
- Around line 849-915: The test cold_search_does_not_serialize_on_outer_lock is
a non-deterministic smoke test and does not prove the pre-fix behavior (it could
only have serialized, not deadlocked); update the inline comment in that test to
remove the claim that the old code "would deadlock or serialize" and instead
state that the test only exercises the cold-cache path and verifies both
append_message and search_cross_thread_messages complete without deadlock but
cannot assert serialization ordering deterministically; mention that
deterministic validation requires TSan/loom or a targeted synchronization
harness and keep references to CONVERSATION_INDEX_CACHE and the spawned writer
to preserve context.
In `@src/openhuman/memory_conversations/store.rs`:
- Around line 209-220: Add branch-decision debug logs around the fast-path cache
check and the cold-path index build: when checking
CONVERSATION_INDEX_CACHE.lock() and finding a cached index for self.root_dir(),
emit a debug log indicating the warm-cache fast path was taken (include the
root_dir() or a short identifier but never the user query); likewise emit a
debug log when falling through to the cold path just before snapshotting the
thread list to indicate a rebuild will occur. Keep the existing post-population
debug! intact; the new logs should only state branch decisions and the root
identifier and must not include the query or any other PII.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 24750504-f9b9-487f-b679-930d25b1231c
📒 Files selected for processing (2)
src/openhuman/memory_conversations/store.rssrc/openhuman/memory_conversations/store_tests.rs
graycyrus
left a comment
There was a problem hiding this comment.
@YOMXXX the split-lock strategy looks correct and the logic is sound. CI is still pending so I'll hold off on approving until those are green — address CodeRabbit's actionable comment on the misleading doc at line 202 while you wait.
One thing I noticed that CodeRabbit didn't flag:
The new test (cold_search_does_not_serialize_on_outer_lock) doesn't actually guarantee temporal interleaving between the cold rebuild and the concurrent write. There's no barrier or channel forcing the writer thread to execute during the JSONL read phase — depending on scheduling the writer can complete entirely before the search starts, meaning the test only proves "no deadlock" rather than "no serialization under true concurrency." The concurrency correctness is in the logic itself (which is solid), but the test name oversells what it demonstrates.
To actually stress the concurrent path, you could add a small thread::sleep or use a Barrier to synchronize the writer's start with the cold-rebuild window. Not a blocker, but the test name sets an expectation that isn't backed by the timing today.
Everything else — the or_insert race guard, the drop of _guard before JSONL reads, the warm-path early return, and the lock-ordering doc rewrite — looks right. Good fix.
…comment Address CodeRabbit review: append_message only indexes its own new message, not previously missed ones. A message missed during the cold rebuild stays absent until the cache is evicted/rebuilt.
graycyrus
left a comment
There was a problem hiding this comment.
@YOMXXX the doc comment fix in 96b8c1a is exactly right — the clarified wording accurately describes the missed-message tradeoff without implying false self-healing. Prior CodeRabbit finding addressed.
On the test synchronization note from my last review — the test can't guarantee true temporal interleaving since the writer may finish before the cold rebuild starts reading JSONL, but the structural correctness guarantee from the implementation is solid and the test does prove no deadlock. Resolving that thread as a non-blocker.
Code is clean. Waiting on CI — a few jobs are still pending (Rust Core Coverage, Build Tauri App, e2e Appium). Once those are green I'll approve.
Summary
search_cross_thread_messagescall after process start blocks all conversation-store operations (append_message,get_messages,list_threads, etc.) for the full JSONL rebuild durationProblem
with_indexheld bothCONVERSATION_STORE_LOCKandCONVERSATION_INDEX_CACHE.lock()while walking every per-thread JSONL file to build the inverted index. For workspaces with ~100k messages this is a multi-second stall on the hot chat path (agent/memory_loader.rs:311, called every turn).Solution
Split-lock strategy per approach (1) from the issue:
JSONL files are append-only, so a concurrent
append_messageduring rebuild is safe — the new message gets indexed by the warm-cacheinsertpath on the next call.Changes
store.rs: Rewrotesearch_cross_thread_messageswith split-lock, removedwith_indexandpopulate_index_unlocked(inlined), updated lock-ordering docsstore_tests.rs: Addedcold_search_does_not_serialize_on_outer_lock— concurrent write during cold rebuildTest plan
cold_search_does_not_serialize_on_outer_lock— verifies no deadlock on concurrent write during cold rebuildCloses #2849
Summary by CodeRabbit
Bug Fixes
Tests