Skip to content

fix(conversations): release store lock during cold index rebuild (#2849)#2886

Merged
graycyrus merged 2 commits into
tinyhumansai:mainfrom
YOMXXX:fix/2849-conversation-store-lock-contention
May 29, 2026
Merged

fix(conversations): release store lock during cold index rebuild (#2849)#2886
graycyrus merged 2 commits into
tinyhumansai:mainfrom
YOMXXX:fix/2849-conversation-store-lock-contention

Conversation

@YOMXXX
Copy link
Copy Markdown
Contributor

@YOMXXX YOMXXX commented May 29, 2026

Summary

  • Fix lock contention where the first search_cross_thread_messages call after process start blocks all conversation-store operations (append_message, get_messages, list_threads, etc.) for the full JSONL rebuild duration

Problem

with_index held both CONVERSATION_STORE_LOCK and CONVERSATION_INDEX_CACHE.lock() while walking every per-thread JSONL file to build the inverted index. For workspaces with ~100k messages this is a multi-second stall on the hot chat path (agent/memory_loader.rs:311, called every turn).

Solution

Split-lock strategy per approach (1) from the issue:

  • Fast path (warm cache): acquires only the cache lock — no outer store lock needed
  • Cold path (first access): acquires store lock briefly to snapshot thread list, releases it, reads JSONL files lock-free, then acquires cache lock to insert the index

JSONL files are append-only, so a concurrent append_message during rebuild is safe — the new message gets indexed by the warm-cache insert path on the next call.

Changes

  • store.rs: Rewrote search_cross_thread_messages with split-lock, removed with_index and populate_index_unlocked (inlined), updated lock-ordering docs
  • store_tests.rs: Added cold_search_does_not_serialize_on_outer_lock — concurrent write during cold rebuild

Test plan

  • cold_search_does_not_serialize_on_outer_lock — verifies no deadlock on concurrent write during cold rebuild
  • All existing conversation store tests unaffected

Closes #2849

Summary by CodeRabbit

  • Bug Fixes

    • Prevents potential deadlocks during cross-thread message searches by changing locking so searches use a cached index when available and avoid holding the global store lock for the full operation.
  • Tests

    • Added a concurrency test ensuring searches and concurrent message appends can run without deadlock and return expected results.

Review Change Stack

@YOMXXX YOMXXX requested a review from a team May 29, 2026 02:09
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dbd2cd7b-db57-41a0-ab0d-188ca14f50de

📥 Commits

Reviewing files that changed from the base of the PR and between 89b5780 and 96b8c1a.

📒 Files selected for processing (1)
  • src/openhuman/memory_conversations/store.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/openhuman/memory_conversations/store.rs

📝 Walkthrough

Walkthrough

Refactors ConversationStore::search_cross_thread_messages to add a warm-cache fast path that locks only the per-workspace index cache, and a cold-path that snapshots threads under CONVERSATION_STORE_LOCK, releases it for JSONL reads to rebuild the index, then finalizes the index under the cache lock. Adds a concurrency test validating no serialization on append.

Changes

Lock Contention Fix for Cross-Thread Search

Layer / File(s) Summary
Lock ordering docs update
src/openhuman/memory_conversations/store.rs
Replace the old INVARIANT comment with explicit lock ordering documentation and the rule forbidding holding the per-workspace inverted-index cache lock while calling ConversationStore APIs that require CONVERSATION_STORE_LOCK.
search_cross_thread_messages warm/cold split
src/openhuman/memory_conversations/store.rs
Implement warm-path: read and search from CONVERSATION_INDEX_CACHE under its lock only. Implement cold-path: briefly acquire CONVERSATION_STORE_LOCK to snapshot thread list, release it before reading per-thread JSONL to rebuild the inverted index, then acquire cache lock to entry(...).or_insert(...) the rebuilt index and run the search.
Cross-thread search concurrency test
src/openhuman/memory_conversations/store_tests.rs
Add cold_search_does_not_serialize_on_outer_lock that evicts the index cache, forces a cold rebuild while a concurrent thread calls append_message, and asserts search returns results and the concurrent append completes (no deadlock).

Sequence Diagram

sequenceDiagram
  participant Client
  participant ConversationStore
  participant CONVERSATION_INDEX_CACHE
  participant JSONLFiles

  Client->>ConversationStore: search_cross_thread_messages(workspace)
  ConversationStore->>CONVERSATION_INDEX_CACHE: try read cached index
  alt cache hit (warm)
    CONVERSATION_INDEX_CACHE-->>ConversationStore: return index
    ConversationStore-->>Client: return search results
  else cache miss (cold)
    ConversationStore->>ConversationStore: acquire CONVERSATION_STORE_LOCK
    ConversationStore->>ConversationStore: snapshot thread list
    ConversationStore-->>ConversationStore: release CONVERSATION_STORE_LOCK
    ConversationStore->>JSONLFiles: read per-thread JSONL to build index
    ConversationStore->>CONVERSATION_INDEX_CACHE: lock and entry().or_insert(rebuilt_index)
    CONVERSATION_INDEX_CACHE-->>ConversationStore: cached index (maybe reused)
    ConversationStore-->>Client: return search results
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • tinyhumansai/openhuman#2756: Introduced the per-workspace inverted index and with_index mechanism that this PR refactors to avoid holding both locks during JSONL rebuild.

Poem

🐰 I hop between locks with ears held high,

I snapshot threads, then let the outer fly,
I read the JSONL while fields are free,
I stash the index, and the writers flee not me,
A quiet hop keeps race and deadlock shy.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: releasing the store lock during cold index rebuild. It is specific, concise, and directly relates to the primary objective of fixing lock contention.
Linked Issues check ✅ Passed The pull request fully addresses all coding requirements from issue #2849: implements split-lock strategy, releases outer lock during JSONL rebuild, adds concurrent write test, and ensures first search does not block operations beyond snapshot duration.
Out of Scope Changes check ✅ Passed All changes are directly scoped to issue #2849. Documentation updates clarify lock-ordering, synchronization refactoring removes contention via split-lock design, and the new test verifies concurrent safety during rebuild.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

…yhumansai#2849)

search_cross_thread_messages previously held CONVERSATION_STORE_LOCK for
the entire index rebuild (walking all JSONL files), blocking append_message,
get_messages, and list_threads for the duration.

Split into fast path (warm cache: cache lock only) and cold path (brief
store lock for thread-list snapshot, then lock-free JSONL I/O, then cache
lock to insert). JSONL files are append-only so a concurrent write during
rebuild just means one missed message, picked up on the next warm-cache
insert.

Removes now-unused with_index and populate_index_unlocked helpers.
@YOMXXX YOMXXX force-pushed the fix/2849-conversation-store-lock-contention branch from 32dff1c to 89b5780 Compare May 29, 2026 02:10
@coderabbitai coderabbitai Bot added rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. memory Memory store, memory tree, recall, summarization, and embeddings in src/openhuman/memory/. working A PR that is being worked on by the team. labels May 29, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/openhuman/memory_conversations/store.rs (1)

209-220: ⚡ Quick win

Add branch-decision debug logs for the warm/cold split (do not log the query).

This is a new, concurrency-sensitive flow whose whole purpose is diagnosing stalls, yet the warm-cache early return and the cold-rebuild entry emit no logs — only the post-population debug! at line 247 fires. A grep-friendly log on each branch makes it possible to confirm in the field whether the fast path is actually being taken. Keep the user query out of the log (it is user PII, like the redacted titles elsewhere).

🪵 Suggested logs
         {
             let mut cache = CONVERSATION_INDEX_CACHE.lock();
             if let Some(idx) = cache.get_mut(&self.root_dir()) {
+                debug!(
+                    "{LOG_PREFIX} cross-thread search warm-cache hit workspace={}",
+                    self.root_dir().display()
+                );
                 return Ok(idx.search(query, limit, exclude_thread_id));
             }
         }
 
+        debug!(
+            "{LOG_PREFIX} cross-thread search cold rebuild start workspace={}",
+            self.root_dir().display()
+        );
         // Cold path: build the index. Snapshot the thread list under the
         // store lock (brief), then release it so concurrent writes aren't
         // blocked during the potentially-long JSONL file reads.

As per coding guidelines: "add substantial, development-oriented logs on new/changed flows at entry/exit points, branch decisions ... never log secrets, API keys, JWTs, or sensitive PII".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_conversations/store.rs` around lines 209 - 220, Add
branch-decision debug logs around the fast-path cache check and the cold-path
index build: when checking CONVERSATION_INDEX_CACHE.lock() and finding a cached
index for self.root_dir(), emit a debug log indicating the warm-cache fast path
was taken (include the root_dir() or a short identifier but never the user
query); likewise emit a debug log when falling through to the cold path just
before snapshotting the thread list to indicate a rebuild will occur. Keep the
existing post-population debug! intact; the new logs should only state branch
decisions and the root identifier and must not include the query or any other
PII.
src/openhuman/memory_conversations/store_tests.rs (1)

849-915: ⚡ Quick win

This test can't fail on the unfixed code — it's a smoke test, not a regression guard.

There's no synchronization forcing the spawned append to run while the search is mid-rebuild; the writer may run entirely before or after the search. More importantly, the pre-fix code never deadlocked here — append_message and the old search both took CONVERSATION_STORE_LOCK first, so the writer would just serialize behind the rebuild and still complete. Since the test only asserts "both complete without deadlock," it passes on the old code too and so does not actually validate the lock release this PR is about. The inline comment at lines 886-889 ("would deadlock or serialize") overstates this — it would serialize, and the test cannot observe serialization.

A truly deterministic concurrency assertion is hard and tends to be flaky, so at minimum correct the comment to reflect that this is a non-deterministic smoke test (it exercises the path under TSan/loom or repeated runs but does not by itself prove non-serialization).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_conversations/store_tests.rs` around lines 849 - 915,
The test cold_search_does_not_serialize_on_outer_lock is a non-deterministic
smoke test and does not prove the pre-fix behavior (it could only have
serialized, not deadlocked); update the inline comment in that test to remove
the claim that the old code "would deadlock or serialize" and instead state that
the test only exercises the cold-cache path and verifies both append_message and
search_cross_thread_messages complete without deadlock but cannot assert
serialization ordering deterministically; mention that deterministic validation
requires TSan/loom or a targeted synchronization harness and keep references to
CONVERSATION_INDEX_CACHE and the spawned writer to preserve context.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/memory_conversations/store.rs`:
- Around line 189-202: Update the comment that claims a concurrently-written
message "gets picked up via the warm-cache `insert` path on the next
`append_message` call": clarify that `append_message` only inserts the message
it is appending and does not re-read JSONL, so a message missed during the
cold-path rebuild can remain absent from the inverted index until the cache is
evicted/rebuilt; reference the cold-path snapshot under
`CONVERSATION_STORE_LOCK`, the warm-path `CONVERSATION_INDEX_CACHE`, and the
`append_message` behavior to make the recovery tradeoff explicit and avoid
implying automatic self-healing.

---

Nitpick comments:
In `@src/openhuman/memory_conversations/store_tests.rs`:
- Around line 849-915: The test cold_search_does_not_serialize_on_outer_lock is
a non-deterministic smoke test and does not prove the pre-fix behavior (it could
only have serialized, not deadlocked); update the inline comment in that test to
remove the claim that the old code "would deadlock or serialize" and instead
state that the test only exercises the cold-cache path and verifies both
append_message and search_cross_thread_messages complete without deadlock but
cannot assert serialization ordering deterministically; mention that
deterministic validation requires TSan/loom or a targeted synchronization
harness and keep references to CONVERSATION_INDEX_CACHE and the spawned writer
to preserve context.

In `@src/openhuman/memory_conversations/store.rs`:
- Around line 209-220: Add branch-decision debug logs around the fast-path cache
check and the cold-path index build: when checking
CONVERSATION_INDEX_CACHE.lock() and finding a cached index for self.root_dir(),
emit a debug log indicating the warm-cache fast path was taken (include the
root_dir() or a short identifier but never the user query); likewise emit a
debug log when falling through to the cold path just before snapshotting the
thread list to indicate a rebuild will occur. Keep the existing post-population
debug! intact; the new logs should only state branch decisions and the root
identifier and must not include the query or any other PII.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 24750504-f9b9-487f-b679-930d25b1231c

📥 Commits

Reviewing files that changed from the base of the PR and between b1b919a and 89b5780.

📒 Files selected for processing (2)
  • src/openhuman/memory_conversations/store.rs
  • src/openhuman/memory_conversations/store_tests.rs

Comment thread src/openhuman/memory_conversations/store.rs Outdated
Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@YOMXXX the split-lock strategy looks correct and the logic is sound. CI is still pending so I'll hold off on approving until those are green — address CodeRabbit's actionable comment on the misleading doc at line 202 while you wait.

One thing I noticed that CodeRabbit didn't flag:

The new test (cold_search_does_not_serialize_on_outer_lock) doesn't actually guarantee temporal interleaving between the cold rebuild and the concurrent write. There's no barrier or channel forcing the writer thread to execute during the JSONL read phase — depending on scheduling the writer can complete entirely before the search starts, meaning the test only proves "no deadlock" rather than "no serialization under true concurrency." The concurrency correctness is in the logic itself (which is solid), but the test name oversells what it demonstrates.

To actually stress the concurrent path, you could add a small thread::sleep or use a Barrier to synchronize the writer's start with the cold-rebuild window. Not a blocker, but the test name sets an expectation that isn't backed by the timing today.

Everything else — the or_insert race guard, the drop of _guard before JSONL reads, the warm-path early return, and the lock-ordering doc rewrite — looks right. Good fix.

Comment thread src/openhuman/memory_conversations/store_tests.rs
…comment

Address CodeRabbit review: append_message only indexes its own new
message, not previously missed ones. A message missed during the cold
rebuild stays absent until the cache is evicted/rebuilt.
Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@YOMXXX the doc comment fix in 96b8c1a is exactly right — the clarified wording accurately describes the missed-message tradeoff without implying false self-healing. Prior CodeRabbit finding addressed.

On the test synchronization note from my last review — the test can't guarantee true temporal interleaving since the writer may finish before the cold rebuild starts reading JSONL, but the structural correctness guarantee from the implementation is solid and the test does prove no deadlock. Resolving that thread as a non-blocker.

Code is clean. Waiting on CI — a few jobs are still pending (Rust Core Coverage, Build Tauri App, e2e Appium). Once those are green I'll approve.

@graycyrus graycyrus merged commit 3278f2c into tinyhumansai:main May 29, 2026
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

memory Memory store, memory tree, recall, summarization, and embeddings in src/openhuman/memory/. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. working A PR that is being worked on by the team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

memory_conversations: first cross-thread search holds both locks during JSONL rebuild

2 participants