feat(vault): folder-of-files ingestion into memory (NotebookLM-style)#1994
Conversation
Add a `vault` domain that lets users point at a local folder and have its
files chunked + embedded into memory under namespace `vault:<id>`.
Surfaced in the Intelligence ▸ Memory tab.
## Summary
- New `src/openhuman/vault/` domain — SQLite store, walker, six RPCs
(`vault_create`, `vault_list`, `vault_get`, `vault_files`,
`vault_remove`, `vault_sync`). Wired into `core/all.rs`.
- Sync engine walks `root_path` with `walkdir`, skips builtin noise
dirs (`.git`, `node_modules`, `target`, …) and user excludes,
dedupes via sha256 + mtime so re-syncs only touch changed files,
deletes vault docs whose source files vanished. 5 MiB per-file cap.
~30 supported extensions (md, code, config, log).
- TS surface: `app/src/utils/tauriCommands/vault.ts`.
- UI: `VaultPanel` slotted into `MemoryWorkspace` with add/list/sync/
remove + two-step purge confirm.
- Named `vault` (not `workspace`) to avoid colliding with the existing
`workspace` domain that handles CLI `init` bootstrap.
## Test plan
- [x] `cargo test --lib openhuman::vault::` — 4/4 passing
- [x] `cargo check` (root + Tauri shell) clean
- [x] `pnpm typecheck` + `pnpm lint` + `pnpm format` clean
- [ ] N/A: manual end-to-end sync against a real folder (covered by
store roundtrip tests; full memory ingestion is exercised by
higher-level memory tests)
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds a Knowledge Vaults subsystem: Rust backend (types, SQLite store, filesystem sync, RPC controllers), Tauri RPC bindings, a React VaultPanel UI with tests, and integration into MemoryWorkspace. ChangesKnowledge Vaults Feature
Sequence DiagramsequenceDiagram
participant UI as VaultPanel (React)
participant Client as Tauri Client (`openhumanVaultSync`)
participant Ops as backend::vault::ops
participant Sync as backend::vault::sync
participant Store as backend::vault::store
UI->>Client: call openhumanVaultSync(vaultId)
Client->>Ops: RPC openhuman.vault_sync(vault_id)
Ops->>Store: get_vault(vault_id)
Ops->>Sync: sync_vault(vault)
Sync->>Store: list_files / upsert_file / delete_file
Sync->>Sync: doc_ingest & compute sha256/mtime
Sync->>Ops: return VaultSyncReport
Ops->>Client: RPC response (VaultSyncReport)
Client->>UI: resolve promise with report
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
There was a problem hiding this comment.
Actionable comments posted: 10
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src/components/intelligence/VaultPanel.tsx`:
- Around line 291-294: The relative-time logic can produce negative values when
last_synced_at is in the future; clamp the computed time difference so it never
goes below zero by replacing the current diff calculation (const diff =
Date.now() - t) with a clamped value (e.g., const diff = Math.max(0, Date.now()
- t)) or by ensuring sec = Math.max(0, Math.floor(diff/1000)); update the code
paths that use diff/sec/min (variables diff, t, sec, min and the `${sec}s ago`
return) so future timestamps render as "0s ago" instead of negative values.
In `@app/src/utils/tauriCommands/vault.ts`:
- Line 5: The import in vault.ts should use the canonical re-export and mark
type-only imports: change the import to import type { CommandResponse } from
'app/src/services/webviewAccountService.ts' (or the module that re-exports
CommandResponse) and import isTauri from that same canonical source (e.g.,
import { isTauri } from 'app/src/services/webviewAccountService.ts'); update any
other occurrences around the 44-47 region to follow the same pattern so all
CommandResponse imports use import type and isTauri is imported from the
designated re-export module.
In `@src/core/all.rs`:
- Around line 172-173: You added vault controllers via
controllers.extend(crate::openhuman::vault::all_vault_registered_controllers())
but did not add matching declared schemas, which breaks validate_registry;
update build_declared_controller_schemas to append the corresponding vault
schemas (the same identifiers/types returned by
all_vault_registered_controllers) so every controller registered by
crate::openhuman::vault::all_vault_registered_controllers() has a declared
schema; ensure names/IDs match exactly and run validate_registry to confirm
parity between build_declared_controller_schemas and the controllers list.
In `@src/openhuman/vault/ops.rs`:
- Around line 15-143: Missing debug/trace logs: add entry, exit, and error/state
transition logs (using log::debug or tracing::trace per project convention) to
each RPC handler (vault_create, vault_list, vault_get, vault_files,
vault_remove, vault_sync). On entry log input params (e.g., trimmed_name,
trimmed_root, id), on error paths log the error returned from
store::insert_vault / store::get_vault / store::list_files / store::remove_vault
and from clear_namespace, and on exit log outcome/state (e.g., created id,
listed count, loaded id, files count, removed/purged booleans, and vault sync
report fields ingested/unchanged/removed/failed). Place logs before/after calls
to store::*, clear_namespace (ClearNamespaceParams), and sync::sync_vault so
branch transitions are visible.
- Around line 26-33: The current validation only checks that trimmed_root
(derived from root_path) exists and is a directory; change it to require an
absolute path: after trimming, check Path::new(trimmed_root).is_absolute() and
return an Err if not absolute, then continue to verify it is a directory (or
optionally canonicalize it with std::fs::canonicalize to normalize before
storing). Update the validation in the create path (the block using
trimmed_root, root_path, and root) so that non-absolute inputs are rejected and
only absolute, directory paths are accepted/stored.
In `@src/openhuman/vault/schemas.rs`:
- Around line 150-176: The output object schema for the "result" FieldSchema is
missing the purge_error field that ops::vault_remove can emit; inside the
TypeSchema::Object fields for the FieldSchema named "result" (where vault_id,
removed, purged are declared) add a new FieldSchema with name "purge_error", ty
set to a nullable/string type (e.g., TypeSchema::String wrapped as nullable or
equivalent), comment like "Error message when purge failed.", and mark it as not
required (required: false) so the schema matches runtime emission by
ops::vault_remove.
In `@src/openhuman/vault/sync.rs`:
- Around line 78-291: The sync_vault function lacks diagnostic tracing—add
structured debug/trace logs at entry/exit of sync_vault and before/after major
branches and error paths (store::list_files, WalkDir errors,
path_is_inside_excluded_dir, unsupported-extension skips, metadata/stat/read
failures, doc_ingest success/failure, store::upsert_file failure, doc_delete and
store::delete_file calls, and store::touch_last_synced) to record contextual
fields (vault.id, rel_path, mtime_ms, bytes, content_hash, error details) and
state transitions (scanned, skipped_unsupported, ingested, unchanged, failed,
removed) using log::debug!/log::trace! or tracing::debug!/trace! so each branch
emits a compact structured message with error details for easier debugging and
telemetry.
- Around line 108-151: sync_vault currently builds user_excludes from
vault.exclude_globs but never reads vault.include_globs, so include_globs
persisted by vault_create are ignored; update sync_vault to read
vault.include_globs (e.g., build a user_includes Vec<String> similar to
user_excludes using to_ascii_lowercase) and apply it inside the WalkDir loop
before accepting files: compute rel_path as done now and only continue
processing files whose rel_path matches at least one include pattern (and still
respect excludes and excluded dirs). Use the existing symbols
vault.include_globs, sync_vault, user_excludes, rel_path and path to locate
where to add the include check.
- Around line 279-289: The removal reporting increments report.removed even when
the delete calls fail; update the removal loop to check the results of
doc_delete(DeleteDocParams { namespace: vault.namespace.clone(), document_id:
prev.document_id.clone() }) and store::delete_file(config, &vault.id, path) and
only increment report.removed when both return Ok/success; on failure log or
propagate the specific error (include context: vault.id and prev.document_id)
and avoid counting the item as removed; also ensure touch_last_synced(config,
&vault.id, Utc::now()) runs in the correct place (e.g., still update last_synced
even on partial failures if that is desired, or move it inside the success path
if not) so report.duration_ms remains accurate.
- Around line 114-132: The loop currently skips directory entries with continue
but still lets WalkDir descend into them, wasting work; change the iterator to
use WalkDir::new(&root).follow_links(false).into_iter().filter_entry(...) and in
the filter_entry closure return false for directory entries whose file_name (as
str) is in BUILTIN_EXCLUDE_DIRS so their subtrees are pruned at traversal time;
keep existing per-file checks like path_is_inside_excluded_dir() for safety but
remove the directory-only continue branch inside the body so files are only
processed once filtered.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 42e148f5-93c3-4bb8-96a3-97d4ed290ae7
📒 Files selected for processing (12)
app/src/components/intelligence/MemoryWorkspace.tsxapp/src/components/intelligence/VaultPanel.tsxapp/src/utils/tauriCommands/vault.tssrc/core/all.rssrc/openhuman/mod.rssrc/openhuman/vault/mod.rssrc/openhuman/vault/ops.rssrc/openhuman/vault/schemas.rssrc/openhuman/vault/store.rssrc/openhuman/vault/sync.rssrc/openhuman/vault/tests.rssrc/openhuman/vault/types.rs
The controller registry validator panics when a domain registers controllers via `all_*_registered_controllers()` without also extending the declared-schema list. The initial vault wiring only added the registered controllers, so every JSON-RPC E2E test panicked at `src/core/all.rs:70` with "registered controller `vault.*` has no declared schema". Add the matching `all_vault_controller_schemas()` extension.
- ops: require absolute root_path; add debug logs across all RPC
entry/exit/error branches.
- schemas: declare optional `purge_error` field returned on purge
failure (closes schema/runtime contract mismatch).
- sync: prune builtin-excluded subtrees at WalkDir time via
filter_entry (avoid walking node_modules/target/.git contents);
honour `include_globs` (previously persisted but ignored); count
removed files only when both doc_delete and store::delete_file
succeed; add debug/trace logs across ingest/delete branches.
- VaultPanel: clamp future timestamps so clock skew can't produce
negative "-12s ago" strings.
Dismissed: import-style suggestion (vault.ts) — current `import {
CommandResponse, isTauri } from './common'` matches the established
pattern in cron.ts / config.ts / others; deviating would create
inconsistency.
Adds: - `tauriCommands/vault.test.ts` — 13 tests covering all six RPC wrappers, Tauri guard, snake_case key conversion, default glob handling. - `components/intelligence/VaultPanel.test.tsx` — 13 tests covering load/empty/error states, add-vault happy + error path, sync success/info/error toasts, remove with both purge-yes and purge-no flows plus second-confirm abort. Brings the new TS surface (VaultPanel + vault.ts) from 24% diff coverage to well above the 80% gate.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
app/src/components/intelligence/VaultPanel.test.tsx (2)
276-278: ⚡ Quick winUse
waitForfor the negative RPC assertion instead of a single microtask tick.At Line 277,
await Promise.resolve()is a fragile sync point; async handlers may schedule beyond one microtask. Wrappingexpect(mockRemove).not.toHaveBeenCalled()inwaitFormakes this deterministic.As per coding guidelines: "Keep tests deterministic: avoid ... time-sensitive flakes, or hidden global state."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/components/intelligence/VaultPanel.test.tsx` around lines 276 - 278, Replace the fragile single-microtask sync point in the test by using testing-library's waitFor to make the negative RPC assertion deterministic: instead of awaiting Promise.resolve() then asserting mockRemove not called, wrap the assertion in await waitFor(() => expect(mockRemove).not.toHaveBeenCalled()), ensuring waitFor is imported from '`@testing-library/react`' if not already and keep the assertion against mockRemove (the mocked RPC remove handler) inside the waitFor callback.
95-99: ⚡ Quick winAvoid index-based input selection in form tests.
At Line 96 and Line 125,
querySelectorAll('input')[n]ties tests to DOM order and can break on harmless markup changes. Prefer semantic queries (getByLabelText/ role+name) or stable per-field test ids.As per coding guidelines: "Prefer testing behavior over implementation details."
Also applies to: 124-127
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/components/intelligence/VaultPanel.test.tsx` around lines 95 - 99, The test currently selects inputs by DOM index from the element with test id 'vault-add-form', which is brittle; update the test to query each field semantically instead (e.g., use screen.getByLabelText('Name') / getByLabelText('Path') / getByLabelText('Ignored') or screen.getByRole('textbox', { name: /name/i }) etc.), or add stable per-field test ids like 'vault-name-input', 'vault-path-input', 'vault-ignored-input' and use screen.getByTestId for those; then replace the fireEvent.change calls that target inputs[0..2] to target the new semantic queries or test-id queries so the test no longer depends on input order.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@app/src/components/intelligence/VaultPanel.test.tsx`:
- Around line 276-278: Replace the fragile single-microtask sync point in the
test by using testing-library's waitFor to make the negative RPC assertion
deterministic: instead of awaiting Promise.resolve() then asserting mockRemove
not called, wrap the assertion in await waitFor(() =>
expect(mockRemove).not.toHaveBeenCalled()), ensuring waitFor is imported from
'`@testing-library/react`' if not already and keep the assertion against
mockRemove (the mocked RPC remove handler) inside the waitFor callback.
- Around line 95-99: The test currently selects inputs by DOM index from the
element with test id 'vault-add-form', which is brittle; update the test to
query each field semantically instead (e.g., use screen.getByLabelText('Name') /
getByLabelText('Path') / getByLabelText('Ignored') or
screen.getByRole('textbox', { name: /name/i }) etc.), or add stable per-field
test ids like 'vault-name-input', 'vault-path-input', 'vault-ignored-input' and
use screen.getByTestId for those; then replace the fireEvent.change calls that
target inputs[0..2] to target the new semantic queries or test-id queries so the
test no longer depends on input order.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a7694a95-25bd-4ab9-8f69-39987ff3fb20
📒 Files selected for processing (7)
app/src/components/intelligence/VaultPanel.test.tsxapp/src/components/intelligence/VaultPanel.tsxapp/src/utils/tauriCommands/vault.test.tssrc/core/all.rssrc/openhuman/vault/ops.rssrc/openhuman/vault/schemas.rssrc/openhuman/vault/sync.rs
🚧 Files skipped from review as they are similar to previous changes (4)
- src/openhuman/vault/ops.rs
- src/openhuman/vault/sync.rs
- src/openhuman/vault/schemas.rs
- app/src/components/intelligence/VaultPanel.tsx
CI's `format:check` flagged the two new vault test files. Local prettier --write produced the canonical formatting; commit it.
Summary
vaultdomain that lets users point at a local folder and have its files chunked + embedded into memory under namespacevault:<id>— NotebookLM-for-Drive, but for any local directory.Problem
The memory system today accepts documents one at a time via `memory.doc_ingest` and is populated by composio sync (Gmail, etc.) or manual writes. There is no way to point the assistant at a folder of local files (notes, code, references) and have them ingested as a unit — the equivalent of NotebookLM's "add Drive folder" action.
Solution
A small, self-contained `vault` domain mirroring the `cron` layout:
src/openhuman/vault/store.rs— SQLite tablesvaults+vault_files, FK cascade on delete, stored at{workspace_dir}/vault/vault.db.src/openhuman/vault/sync.rs— walksroot_pathwithwalkdir, skips built-in noise dirs (.git,node_modules,target,dist,.venv, …) and user-supplied excludes, hashes content with sha256, dedupes on(content_hash, mtime)so re-syncs only ingest changed files, deletes vault docs whose source files have vanished. 5 MiB per-file cap. ~30 supported text/code extensions.src/openhuman/vault/ops.rs— six RPCs:vault_create,vault_list,vault_get,vault_files,vault_remove(with optionalpurge_memoryto also clear the namespace),vault_sync.src/openhuman/vault/schemas.rs— controller registry following the same pattern ascron/schemas.rs. Wired intosrc/core/all.rs.app/src/utils/tauriCommands/vault.tswith hand-written interfaces matching the Rust types.VaultPanel.tsxslotted intoMemoryWorkspace.tsx. Add-vault form (name + absolute path + optional excludes), per-row Sync / Remove with two-step confirm and an OK-to-purge prompt.Tradeoffs:
tauri-plugin-dialoginstalled); user pastes an absolute path. Easy follow-up.notifyfile-watching in v1 — explicitvault.synconly. Auto-watching is a v2 concern.memory::ops::doc_ingestpath so the chunker + embedder behaviour matches everything else in memory.Submission Checklist
src/openhuman/vault/tests.rs; sync engine's external-effect path (memory ingestion) exercised in higher-level memory integration tests.Impact
Related
AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
Validation Run
Validation Blocked
Behavior Changes
Parity Contract
Duplicate / Superseded PR Handling
Summary by CodeRabbit