Skip to content

feat(memory): add tool-scoped memory with durable rule capture, RPC APIs, and prompt pinning#1487

Merged
senamakel merged 7 commits into
tinyhumansai:mainfrom
YellowSnnowmann:feat/tool-scoped-memory
May 11, 2026
Merged

feat(memory): add tool-scoped memory with durable rule capture, RPC APIs, and prompt pinning#1487
senamakel merged 7 commits into
tinyhumansai:mainfrom
YellowSnnowmann:feat/tool-scoped-memory

Conversation

@YellowSnnowmann
Copy link
Copy Markdown
Contributor

@YellowSnnowmann YellowSnnowmann commented May 11, 2026

Summary

  • Added a new tool_memory domain under:

    • src/openhuman/memory/
    • Persists per-tool rules in dedicated tool-{name} namespaces
  • Introduced tool-memory RPC methods:

    • tool_rule_put
    • tool_rule_get
    • tool_rule_list
    • tool_rule_delete
    • tool_rules_for_prompt
    • tool_rules_json
  • Registered schemas/controllers in memory registry

  • Updated session prompt construction:

    • Prefetches Critical/High tool rules
    • Pins them into system prompt
    • Ensures safety-important constraints survive context compression
  • Added ToolMemoryCaptureHook:

    • Automatically captures durable tool guidance from learning signals
    • Controlled via config
  • Added config support:

    • learning.tool_memory_capture_enabled
    • OPENHUMAN_LEARNING_TOOL_MEMORY_CAPTURE_ENABLED
  • Updated capability/docs surfaces:

    • about_app/catalog.rs
    • GitBook feature docs
    • Summary/docs pages
  • Added focused tests:

    • Store behavior
    • Schema integration

Problem

  • Tool guidance was fragmented across generic memory systems

  • Safety-critical tool constraints could be lost during:

    • context compression
    • prompt trimming
  • No dedicated API for:

    • storing tool-specific rules
    • retrieving durable operational constraints
  • No centralized architecture for:

    • capturing durable learnings
    • injecting them reliably into prompts

Solution

Tool Memory Storage

Implemented dedicated storage layer:

  • ToolMemoryStore
  • Stores structured ToolMemoryRule entries
  • Uses:
    • tool-{tool} namespaces
    • priority metadata
    • source metadata

RPC + Controller Integration

Added tool-memory RPC handlers and schema registrations:

  • Exposed through standard controller routing
  • Avoids adapter-specific implementations

Prompt Integration

Added deterministic prompt injection:

  • Prefetch Critical/High rules
  • Inject into system prompt at session start
  • Improves durability across compression boundaries

Learning Hook

Added ToolMemoryCaptureHook:

  • Captures durable guidance post-turn
  • Integrated into learning flow
  • Rollout controlled via config

Tradeoffs

  • Prefetch is:

    • best-effort
    • intentionally non-fatal
  • Unsupported runtimes/errors:

    • return empty results
    • preserve stability in sync/single-thread harnesses

Submission Checklist

  • Tests

    • Added/updated per Testing Strategy
    • Includes:
      • happy paths
      • edge/failure cases
  • Coverage

    • Diff coverage ≥ 80%
    • Enforced via:
      • .github/workflows/coverage.yml
  • Commands

    • pnpm test:coverage
    • pnpm test:rust
  • Coverage Matrix

    • Updated in:
      • docs/TEST-COVERAGE-MATRIX.md
    • Or marked N/A where appropriate
  • Feature IDs

    • Listed under ## Related
  • Network Dependencies

    • No new external dependencies
    • Uses mock backend per Testing Strategy
  • Manual Smoke Checklist

    • Updated if release-cut surfaces affected:
      • docs/RELEASE-MANUAL-SMOKE.md
  • Linked Issues

    • Added via:
      • Closes #NNN

Impact

Runtime / Platform

  • Affects:

    • Rust core memory system
    • agent session prompt pipeline
  • No platform-specific shell behavior added


Performance

  • Small startup overhead from best-effort prefetch
  • Bounded by:
    • prompt cap
    • scoped retrieval

Security / Safety

  • Improves durability of:

    • high-priority tool constraints
    • safety-critical rules
  • Reduces unsafe behavior caused by:

    • prompt compression loss

Compatibility / Migration

  • Additive API/config changes
  • Default behavior remains enabled
  • Supports opt-out via:
    • config
    • environment variable

Related

Summary by CodeRabbit

  • New Features

    • Tool-Scoped Memory to manage durable, tool-specific rules with priority handling
    • Rules can be injected into sessions (critical/high pinned; normal retrieved on demand)
    • RPC endpoints for rule CRUD and prompt-resolution
    • Optional automatic capture of user edicts and repeated tool failures
  • Configuration

    • New toggle to enable/disable automatic tool memory capture
  • Documentation

    • Added docs and TOC entry for Tool-Scoped Memory

Review Change Stack

…C controllers

Introduce a new tool-scoped memory domain that stores durable per-tool
rules in  namespaces, exposes CRUD + prompt-prep RPC
methods, and auto-captures actionable tool guidance during learning.

- add  module (types, store, capture hook, prompt section)
- add memory ops + schemas for:
  -
  -
  -
- prefetch Critical/High tool rules in session builder and pin into system prompt
- register  behind learning config
- add  config + env override
- wire exports through memory mod/ops/schemas surfaces
- include focused tests for rule types, rendering stability, and store behavior
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 11, 2026

📝 Walkthrough

Walkthrough

Adds a tool-scoped memory subsystem: domain types, storage with tests, RPC handlers and schema registration, prompt rendering and prefetch, post-turn capture hook, config/env toggle, capability registration, and GitBook documentation.

Changes

Tool-Scoped Memory Layering

Layer / File(s) Summary
Domain Types
src/openhuman/memory/tool_memory/types.rs
Defines ToolMemoryPriority (Normal/High/Critical), ToolMemorySource, and ToolMemoryRule with stable id/timestamps and tool_memory_namespace helper.
Storage & CRUD
src/openhuman/memory/tool_memory/store.rs
ToolMemoryStore wraps a Memory backend and implements put_rule, get_rule, list_rules, delete_rule, rules_for_prompt (eager filtering + cap), record, and JSON helpers.
Store Tests & Helpers
src/openhuman/memory/tool_memory/store_tests.rs, src/openhuman/memory/tool_memory/test_helpers.rs
MockMemory-based tests and test helper validate input checks, round-trip/upsert semantics, ordering, deletion, scanning, cap enforcement, malformed JSON skipping, JSON payload serialization, and id auto-assignment.
Prompt Rendering
src/openhuman/memory/tool_memory/prompt.rs
ToolMemoryRulesSection snapshots pre-rendered rules; render_tool_memory_rules deterministically sorts and groups rules with a shared heading and priority markers.
Post-Turn Capture
src/openhuman/memory/tool_memory/capture.rs
ToolMemoryCaptureHook implements PostTurnHook extracting user edicts (Critical) and repeated tool failures (Normal), mapping to tools and persisting via ToolMemoryStore::record.
RPC Handlers
src/openhuman/memory/ops/tool_memory.rs
Implements six RPC handlers and param/result types: tool_rule_put, tool_rule_get, tool_rule_list, tool_rule_delete, tool_rules_for_prompt (rendered block + rules), and tool_rules_json.
RPC Schema & Dispatch
src/openhuman/memory/schemas/tool_memory.rs, src/openhuman/memory/schemas/mod.rs, src/openhuman/memory/schemas_tests.rs
Register FUNCTIONS, provide controllers() and per-RPC ControllerSchema entries, and wire them into aggregated schema/controller lists and tests.
Module Structure & Re-exports
src/openhuman/memory/mod.rs, src/openhuman/memory/tool_memory/mod.rs, src/openhuman/memory/ops/mod.rs
Declare tool_memory submodule and re-export public APIs, types, and RPC ops at the appropriate module surfaces.
Session Integration
src/openhuman/agent/prompts/mod.rs, src/openhuman/agent/harness/session/builder.rs
Add SystemPromptBuilder::with_tool_memory_rules(), synchronously prefetch eager rules at session build (prefetch_tool_memory_rules_blocking), and register the capture hook when enabled.
Config & Env
src/openhuman/config/schema/learning.rs, src/openhuman/config/schema/load.rs
Add tool_memory_capture_enabled (serde-default true) to LearningConfig and an env override OPENHUMAN_LEARNING_TOOL_MEMORY_CAPTURE_ENABLED.
Capability Catalog
src/openhuman/about_app/catalog.rs
Register intelligence.tool_scoped_memory capability (Beta, LOCAL_RAW).
Memory Client Handle
src/openhuman/memory/store/client.rs
Add MemoryClient::memory_handle() to expose Arc<dyn Memory> for other subsystems.
Documentation / TOC
gitbooks/SUMMARY.md, gitbooks/features/native-tools/tool-memory.md
Add GitBook TOC entry and a new page documenting design, capture pipeline, RPC surface, retrieval timing, and the "never email Sarah" safety example.

Sequence Diagram(s)

(omitted — changes are a cohesive feature but not presented as a multi-actor sequential flow requiring an additional diagram)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • senamakel

Poem

🐰 A rabbit's note on rules that stay,

Tool-scoped wisdom tucked away.
"Never email Sarah" snug and clear,
Pinned in prompts so agents hear.
Hops away — safe paths near.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title directly and clearly describes the main change: adding tool-scoped memory with durable rule capture, RPC APIs, and prompt pinning.
Linked Issues check ✅ Passed The PR comprehensively implements all acceptance criteria from issue #1400: tool-scoped storage (tool-{name} namespaces), priority/criticality support, learning capture hooks, retrieval in session prompt construction, safety case examples, stats separation, and GitBook documentation.
Out of Scope Changes check ✅ Passed All changes directly support tool-scoped memory implementation: new memory module with store/capture/prompt, RPC APIs, config entries, session builder integration, schema registration, and documentation. No unrelated changes detected.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@YellowSnnowmann YellowSnnowmann marked this pull request as ready for review May 11, 2026 09:51
@YellowSnnowmann YellowSnnowmann requested a review from a team May 11, 2026 09:51
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (3)
src/openhuman/config/schema/load.rs (1)

882-889: ⚡ Quick win

Prefer parse_env_bool here to avoid silent invalid-value fallback.

Line 882 currently duplicates bool parsing and ignores invalid values silently, while the helper logs bad tokens consistently.

Suggested patch
-        if let Some(flag) = env.get("OPENHUMAN_LEARNING_TOOL_MEMORY_CAPTURE_ENABLED") {
-            let normalized = flag.trim().to_ascii_lowercase();
-            match normalized.as_str() {
-                "1" | "true" | "yes" | "on" => self.learning.tool_memory_capture_enabled = true,
-                "0" | "false" | "no" | "off" => self.learning.tool_memory_capture_enabled = false,
-                _ => {}
-            }
-        }
+        if let Some(flag) = env.get("OPENHUMAN_LEARNING_TOOL_MEMORY_CAPTURE_ENABLED") {
+            if let Some(enabled) =
+                parse_env_bool("OPENHUMAN_LEARNING_TOOL_MEMORY_CAPTURE_ENABLED", &flag)
+            {
+                self.learning.tool_memory_capture_enabled = enabled;
+            }
+        }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/config/schema/load.rs` around lines 882 - 889, Replace the
manual parsing block for "OPENHUMAN_LEARNING_TOOL_MEMORY_CAPTURE_ENABLED" with a
call to the existing parse_env_bool helper so invalid tokens get logged instead
of being silently ignored; call parse_env_bool with the environment map and the
key "OPENHUMAN_LEARNING_TOOL_MEMORY_CAPTURE_ENABLED", then if it returns
Some(bool) assign it to self.learning.tool_memory_capture_enabled, otherwise
leave the current value unchanged (the helper will emit warnings for invalid
values).
src/openhuman/memory/tool_memory/prompt.rs (1)

96-101: ⚡ Quick win

Add a final deterministic tie-breaker in sort order.

To fully satisfy the “byte-stable output” goal, include a final key (e.g., id) when priority/tool/rule are equal.

Suggested patch
     sorted.sort_by(|a, b| {
         b.priority
             .cmp(&a.priority)
             .then_with(|| a.tool_name.cmp(&b.tool_name))
             .then_with(|| a.rule.cmp(&b.rule))
+            .then_with(|| a.id.cmp(&b.id))
     });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory/tool_memory/prompt.rs` around lines 96 - 101, The sort
comparator used in the sorted.sort_by closure currently compares priority,
tool_name, and rule but lacks a final deterministic tie-breaker; update the
closure (the comparator passed to sorted.sort_by) to append a final comparison
on a stable unique key such as id (e.g., add .then_with(|| a.id.cmp(&b.id))) so
that when priority, tool_name, and rule are equal the order is byte-stable and
deterministic.
src/openhuman/agent/harness/session/builder.rs (1)

800-803: ⚡ Quick win

Use a stable domain prefix for the new tool-memory logs.

[#1400] is hard to grep once the issue number stops mattering. Please switch these to the normal domain-style prefixes ([agent::builder], [memory::tool_memory], etc.) and keep the count/error as structured fields.

As per coding guidelines "Use log / tracing at debug / trace level; include stable grep-friendly prefixes ([domain], [rpc]) and correlation fields; never log secrets or full PII".

Also applies to: 1222-1223

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/harness/session/builder.rs` around lines 800 - 803,
Replace the ad-hoc "[`#1400`]" prefix in the log::info call that reports pinning
tool-scoped rules with a stable domain-style prefix and structured fields:
change the message to use a grep-friendly prefix like "[agent::builder]" or
"[memory::tool_memory]" and move the count (pinned.len()) into a structured
field or explicit count parameter in the log, keeping the log level and avoiding
any secrets; apply the same change to the similar log statements later in the
file that use the numeric prefix so all tool-memory logs share the same stable
domain prefix and structured count field.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/about_app/catalog.rs`:
- Around line 218-220: The how_to string mixes qualified and unqualified RPC
names; update the how_to field so all RPC calls are fully-qualified and
consistent (e.g., replace unqualified tool_rule_list and tool_rule_delete with
the same service-qualified prefix used for memory.tool_rule_put), editing the
how_to literal in catalog.rs to list the fully-qualified RPCs (referencing the
how_to field and the RPC names memory.tool_rule_put, memory.tool_rule_list,
memory.tool_rule_delete) so users can copy/paste correct calls.

In `@src/openhuman/agent/harness/session/builder.rs`:
- Around line 797-799: The prefetch currently calls
prefetch_tool_memory_rules_blocking(memory.clone()) with no tool list which
causes rules_for_prompt to scan all tool-* namespaces; move this prefetch so it
runs after the code that computes the agent's final visible tool set (the
resolved visible tools variable used later in the session build) and change the
call to prefetch_tool_memory_rules_blocking(memory.clone(),
&resolved_visible_tools) (or the equivalent parameter the helper expects) so
only rules for tools the agent can actually use are pinned; apply the same
change to the other identical block mentioned in the comment.

In `@src/openhuman/agent/prompts/mod.rs`:
- Around line 173-196: The code currently appends the ToolMemoryRulesSection to
the tail via self.sections.push, which lets tail-biased trimming evict these
high-priority rules; instead find the index of the tool catalogue section (the
session's "tools" section, e.g. the ToolCatalogueSection / tools marker) and
insert the new ToolMemoryRulesSection before it using
self.sections.insert(index, Box::new(...)); if no tools section exists fall back
to push so behavior is unchanged. Update with_tool_memory_rules to locate the
first tool-related section (by matching its concrete type or a tools marker) and
insert the rules there rather than pushing to the end.

In `@src/openhuman/memory/schemas/tool_memory.rs`:
- Around line 257-261: The handler handle_tool_rules_for_prompt currently
swallows deserialization errors by using
parse_params::<ToolRulesForPromptParams>(...).unwrap_or_default(), which can
turn malformed input into an empty tools list; change it to fail fast by
returning the parse error instead of defaulting: call
parse_params::<ToolRulesForPromptParams>(params) and propagate the Err (e.g.,
using ? or matching) so invalid params return an error upstream before calling
rpc::tool_rules_for_prompt(payload). Ensure the code still awaits
rpc::tool_rules_for_prompt(payload).await? on the successfully parsed payload.

In `@src/openhuman/memory/tool_memory/capture.rs`:
- Around line 177-180: The current log::info! call that prints
truncate_for_log(&body) exposes user/tool text; change both occurrences (the
log::info! lines that reference truncate_for_log(&body)) to a debug (or trace)
level and stop including the body text itself — instead log only safe metadata
(e.g., tool name/ID and a redacted placeholder, body length, or a non-reversible
hash). Update the log invocation(s) to use log::debug! (or trace!) and remove
body interpolation, e.g., "tool={tool} body_redacted=true length={len}" or
similar, leaving truncate_for_log unused for these sensitive captures.

In `@src/openhuman/memory/tool_memory/store_tests.rs`:
- Around line 4-13: The tests fail to compile because Arc is used but not
imported; add an import for Arc (std::sync::Arc) at the top of the test module
so fresh_store() and list_tool_names_returns_only_tool_prefixed_namespaces() can
call Arc::new(...) successfully; locate the module imports near use
async_trait::async_trait and add use std::sync::Arc to resolve the missing
symbol.

In `@src/openhuman/memory/tool_memory/types.rs`:
- Around line 50-54: The is_pinned helper currently returns true only for
Self::Critical but the prompt/rendering semantics treat both High and Critical
as system-prompt (pinned) content — update the is_pinned(self) -> bool
implementation to return true for both Self::High and Self::Critical (e.g., use
matches!(self, Self::High | Self::Critical)) and mirror the same change in the
duplicate helper at the other location (lines ~175-179) and update any related
tests to expect High to be pinned.
- Around line 141-150: The function tool_memory_namespace currently only trims
the tool_name and returns format!("tool-{}", tool_name.trim()), which
contradicts the docstring that says it should be lower-cased; update
tool_memory_namespace to normalize casing by applying a lowercase operation to
the trimmed name (e.g., trim then to_lowercase()) before formatting so names
like "Send_Email" and "send_email" map to the same namespace.

---

Nitpick comments:
In `@src/openhuman/agent/harness/session/builder.rs`:
- Around line 800-803: Replace the ad-hoc "[`#1400`]" prefix in the log::info call
that reports pinning tool-scoped rules with a stable domain-style prefix and
structured fields: change the message to use a grep-friendly prefix like
"[agent::builder]" or "[memory::tool_memory]" and move the count (pinned.len())
into a structured field or explicit count parameter in the log, keeping the log
level and avoiding any secrets; apply the same change to the similar log
statements later in the file that use the numeric prefix so all tool-memory logs
share the same stable domain prefix and structured count field.

In `@src/openhuman/config/schema/load.rs`:
- Around line 882-889: Replace the manual parsing block for
"OPENHUMAN_LEARNING_TOOL_MEMORY_CAPTURE_ENABLED" with a call to the existing
parse_env_bool helper so invalid tokens get logged instead of being silently
ignored; call parse_env_bool with the environment map and the key
"OPENHUMAN_LEARNING_TOOL_MEMORY_CAPTURE_ENABLED", then if it returns Some(bool)
assign it to self.learning.tool_memory_capture_enabled, otherwise leave the
current value unchanged (the helper will emit warnings for invalid values).

In `@src/openhuman/memory/tool_memory/prompt.rs`:
- Around line 96-101: The sort comparator used in the sorted.sort_by closure
currently compares priority, tool_name, and rule but lacks a final deterministic
tie-breaker; update the closure (the comparator passed to sorted.sort_by) to
append a final comparison on a stable unique key such as id (e.g., add
.then_with(|| a.id.cmp(&b.id))) so that when priority, tool_name, and rule are
equal the order is byte-stable and deterministic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 88f3a11e-5773-4726-8acf-d67cb55c555e

📥 Commits

Reviewing files that changed from the base of the PR and between 09c5560 and f91dc2d.

📒 Files selected for processing (20)
  • gitbooks/SUMMARY.md
  • gitbooks/features/native-tools/tool-memory.md
  • src/openhuman/about_app/catalog.rs
  • src/openhuman/agent/harness/session/builder.rs
  • src/openhuman/agent/prompts/mod.rs
  • src/openhuman/config/schema/learning.rs
  • src/openhuman/config/schema/load.rs
  • src/openhuman/memory/mod.rs
  • src/openhuman/memory/ops/mod.rs
  • src/openhuman/memory/ops/tool_memory.rs
  • src/openhuman/memory/schemas/mod.rs
  • src/openhuman/memory/schemas/tool_memory.rs
  • src/openhuman/memory/schemas_tests.rs
  • src/openhuman/memory/store/client.rs
  • src/openhuman/memory/tool_memory/capture.rs
  • src/openhuman/memory/tool_memory/mod.rs
  • src/openhuman/memory/tool_memory/prompt.rs
  • src/openhuman/memory/tool_memory/store.rs
  • src/openhuman/memory/tool_memory/store_tests.rs
  • src/openhuman/memory/tool_memory/types.rs

Comment thread src/openhuman/about_app/catalog.rs
Comment thread src/openhuman/agent/harness/session/builder.rs Outdated
Comment thread src/openhuman/agent/prompts/mod.rs
Comment thread src/openhuman/memory/schemas/tool_memory.rs
Comment thread src/openhuman/memory/tool_memory/capture.rs Outdated
Comment thread src/openhuman/memory/tool_memory/store_tests.rs Outdated
Comment thread src/openhuman/memory/tool_memory/types.rs Outdated
Comment thread src/openhuman/memory/tool_memory/types.rs
@senamakel
Copy link
Copy Markdown
Member

wow what a fucking goated PR

- Add missing `std::sync::Arc` import in store_tests.rs
- Expand `is_pinned` to include `High` priority (matches prompt semantics)
- Update test name and assertions to reflect High+Critical being pinned
- Normalize tool namespace to lowercase in `tool_memory_namespace`
- Add lowercase test cases for namespace helper
- Propagate parse error in `handle_tool_rules_for_prompt` (remove unwrap_or_default)
- Downgrade capture.rs log to `debug` and remove body text to avoid PII logging
- Fix catalog.rs how_to field to use fully-qualified RPC names
- Replace `[tinyhumansai#1400]` log prefixes with stable `[memory::tool_memory]` domain prefix
- Insert ToolMemoryRulesSection before tool-catalogue section in system prompt
- Use `parse_env_bool` helper for OPENHUMAN_LEARNING_TOOL_MEMORY_CAPTURE_ENABLED
- Add `.then_with(|| a.id.cmp(&b.id))` tie-breaker to render sort for byte-stability

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@YellowSnnowmann
Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 11, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent review (non-CodeRabbit)

Overall: well-structured PR that follows project conventions closely. The tool_memory/ domain is cleanly separated, store tests are thorough (22 async tests), and the CodeRabbit feedback was addressed. A few things stand out — see inline comments.

Summary

  • 1 blocker — config gate asymmetry (prefetch fires when learning.enabled = false)
  • 2 major — broad "stop" edict trigger false positives; prefetch scans all namespaces for restricted agents
  • 2 suggestions — duplicate is_pinned/is_eager; test doesn't exercise build() trait method
  • 4 nitpicks — duplicate MockMemory, missing ops logging, __unscoped__ sentinel leak, stale issue ref in log

Looks good

  • Module layout (tool_memory/ subdirectory with light mod.rs)
  • Controller registration wired correctly through schemas/mod.rs
  • No bare .unwrap() in production code
  • No PII in logs (capture hook truncates rule bodies)
  • block_in_place runtime check is correct
  • put_rule preserves created_at on upsert
  • Config env override follows parse_env_bool pattern
  • Capability catalog entry present with LOCAL_RAW privacy tag

// a single-threaded test harness — so this stays safe for every
// call site of `from_config_*`. The capture hook still runs in
// every session via [`ToolMemoryCaptureHook`] above.
if config.learning.tool_memory_capture_enabled {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[blocker] Prefetch fires even when learning.enabled = false

This is gated on tool_memory_capture_enabled but not on learning.enabled. The capture hook (line 862) is correctly nested inside if config.learning.enabled { ... }, so no new rules are written when learning is off. But previously-stored rules still get pinned into the system prompt every session.

Users who disable learning expect deterministic, stored-rule-free agent behavior.

Suggested fix:

// before
if config.learning.tool_memory_capture_enabled {

// after
if config.learning.enabled && config.learning.tool_memory_capture_enabled {

}
let lower = trimmed.to_lowercase();
if !(lower.contains("never ") || lower.contains("don't ") || lower.contains("do not "))
&& !lower.contains("stop ")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[major] "stop " edict trigger is too broad

lower.contains("stop ") matches routine utterances: "I want to stop working on this", "stop the server for maintenance", "you should stop after three retries". All of these fall through to the per-line scan, which also matches contains(" stop ") (line 109), creating false-positive edicts.

The doc comment says "stop <verb>ing" but the code doesn't enforce that form.

Suggested fix: Narrow to sentence-boundary forms only:

// before (line 80-83)
if !(lower.contains("never ") || lower.contains("don't ") || lower.contains("do not "))
    && !lower.contains("stop ")
{

// after
if !lower.contains("never ")
    && !lower.contains("don't ")
    && !lower.contains("do not ")
    && !lower.starts_with("stop ")
    && !lower.contains("\nstop ")
    && !lower.contains(". stop ")
{

Also add a negative test:

#[test]
fn extract_user_edicts_ignores_incidental_stop_usage() {
    let edicts = ToolMemoryCaptureHook::extract_user_edicts(
        "I want to stop the server for maintenance.",
        &[call("shell", true)],
    );
    assert!(edicts.is_empty());
}

tokio::task::block_in_place(|| {
handle.block_on(async move {
let store = crate::openhuman::memory::ToolMemoryStore::new(memory);
match store.rules_for_prompt(&[]).await {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[major] Prefetch scans ALL tool namespaces regardless of agent's tool set

rules_for_prompt(&[]) with an empty slice falls back to list_tool_names, scanning every tool-* namespace. An agent with a restricted tool set (e.g. a read-only sub-agent that can't use send_email) will still have those tools' critical rules injected into its system prompt.

Beyond prompt-budget waste, this is a correctness issue for restricted agents — rules about tools the agent shouldn't know about get injected.

The session builder already resolves visible_tools later in the same function. The minimal fix is to thread tool names through, or defer the inject until after tool resolution.

At minimum: add a // TODO: pass resolved tool names to scope prefetch comment so the debt is visible, and document the scope gap in the function's rustdoc.


impl ToolMemoryPriority {
/// True for priorities that must be eagerly surfaced to the agent.
pub fn is_eager(self) -> bool {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] is_pinned and is_eager are identical — remove one

Both return matches!(self, Self::Critical | Self::High). Two names for the same predicate creates confusion about whether they might diverge. The PR uses "eager" for prefetch and "pinned" for compression resistance, but the code makes no distinction.

Either remove is_pinned (and use is_eager everywhere), or give is_pinned a genuinely different definition (e.g. only Critical is truly compression-pinned).

assert_eq!(first, again);
}

#[test]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] Test named section_renders_via_prompt_section_trait doesn't call build()

This test only calls is_empty(), which reads the rendered field directly. The build(&ctx) method — the actual PromptSection trait contract — is never exercised. If build were accidentally changed to return an error, no test would catch it.

Suggested fix: Actually invoke the trait method:

let ctx = PromptContext::default();
let output = section.build(&ctx).expect("build should succeed");
assert!(output.contains("never email Sarah"));

@@ -0,0 +1,154 @@
//! RPC handlers for the tool-scoped memory layer (see
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] No debug logging in any RPC handler

Every other ops file logs [rpc] or [memory] prefixed lines at handler entry. These 6 handlers have no log::debug! at all. Even a single line per handler (log::debug!("[tool-memory] {fn_name} tool={tool_name}")) would satisfy the project convention and make traces grep-able.

let default_tool = tool_calls
.first()
.map(|tc| tc.name.clone())
.unwrap_or_else(|| "__unscoped__".to_string());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] __unscoped__ sentinel leaks into namespace scanning

When no tool calls ran, edicts land under tool-__unscoped__. This namespace will then be scanned by list_tool_names / rules_for_prompt, potentially surfacing unscoped rules into future prompts unexpectedly.

Consider filtering this sentinel from list_tool_names, or using a reserved prefix that the scanner explicitly skips.

post_turn_hooks.push(Arc::new(
crate::openhuman::memory::ToolMemoryCaptureHook::new(memory.clone(), true),
));
log::info!(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Raw issue number #1400 in runtime log message

Log messages with issue references become stale. Prefer a stable description:

// before
"[learning] tool_memory_capture hook registered (#1400 — durable tool rules)"

// after
"[learning] tool_memory_capture hook registered"

}

#[cfg(test)]
mod tests {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Duplicate MockMemory impl

This MockMemory is defined identically in store_tests.rs. Extract it into a shared #[cfg(test)] helper in tool_memory/mod.rs or a test_helpers module to avoid silent drift between the two copies.

YellowSnnowmann and others added 2 commits May 11, 2026 22:17
…mory

- Gate prefetch behind `learning.enabled` so users who opt out of learning
  don't have stored rules pinned into the system prompt
- Move prefetch after tool-set resolution and pass actual agent tool names
  to `rules_for_prompt` so unrelated tool namespaces are never scanned
- Tighten "stop" edict detection: only treat "stop " as an imperative when
  it appears at a sentence boundary; remove `contains(" stop ")` from the
  per-line check to prevent false-positive captures from phrases like
  "I want to stop working on this"
- Remove body content from repeated-failure debug log to avoid PII leaking
  into log files (log body_len only, matching the edict capture path)
- Remove duplicate `is_pinned` predicate — `is_eager` already covers both
  Critical and High; update the doc-comment to explain compression semantics
- Filter `__unscoped__` sentinel from `list_tool_names` so unscoped edicts
  captured before any tool call are not injected into future prompt filters
- Add `log::debug!` entry to all six tool-memory RPC handlers per project
  convention (stable grep-able `[tool-memory]` prefix)
- Remove raw issue reference from `tool_memory_capture` registration log
- Exercise `PromptSection::build()` in `section_renders_via_prompt_section_trait`
  test — the previous version only called `is_empty()`, leaving the trait
  contract uncovered
- Extract shared `MockMemory` to `tool_memory/test_helpers.rs`; use it in
  both `store_tests.rs` and `capture.rs` to eliminate silent drift risk

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@YellowSnnowmann
Copy link
Copy Markdown
Contributor Author

Addressed all blockers and majors from the review. Changes pushed to the branch (cd86784a + 492d8e3d):

Blockers / Majors fixed:

  • learning.enabled gate — prefetch now checks config.learning.enabled && config.learning.tool_memory_capture_enabled; users who disable learning no longer get stored rules pinned into the system prompt
  • Prefetch scoped to agent's tool set — moved the prefetch after the tool list is finalised and passes actual tool names to rules_for_prompt; unrelated namespaces are never scanned
  • "stop " false positives — outer check now requires "stop " at a sentence boundary (starts_with / . stop / \nstop ); removed contains(" stop ") from the per-line scanner
  • PII in logs — repeated-failure log now emits only body_len, matching the edict capture path
  • tool_rules_for_prompt error masking — the schema handler already uses parse_params (which returns an error on bad input); the unwrap_or_default() calls are in optional fields (priority, source) that default correctly per spec — no change needed there

Majors / Minors cleaned up:

  • Removed duplicate is_pinned predicate (is_eager already covers Critical+High); updated doc-comment
  • Filtered __unscoped__ sentinel from list_tool_names so it never surfaces in prompt-injection scans
  • Added log::debug! entry to all six RPC handlers ([tool-memory] prefix)
  • Removed raw issue reference from the tool_memory_capture registration log line
  • section_renders_via_prompt_section_trait now actually calls build(&ctx) and asserts on the output
  • Extracted shared MockMemory to tool_memory/test_helpers.rs; both store_tests.rs and capture.rs use it — eliminates silent drift

All 41 tool-memory tests pass; cargo check clean.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/openhuman/memory/tool_memory/prompt.rs (1)

126-130: ⚡ Quick win

Normalize rendered rule body whitespace to keep bullet formatting stable.

Line 129 currently injects raw trimmed text; embedded newlines/tabs can break list structure and make prompt shape less predictable.

♻️ Proposed patch
-        out.push_str(rule.rule.trim());
+        let normalized_rule = rule
+            .rule
+            .split_whitespace()
+            .collect::<Vec<_>>()
+            .join(" ");
+        out.push_str(&normalized_rule);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory/tool_memory/prompt.rs` around lines 126 - 130, The
bullet rendering currently pushes rule.rule.trim() directly so embedded
newlines/tabs break list formatting; before appending the rule body (the code
around out.push_str(rule.rule.trim()) in the block that also calls
priority_marker(rule.priority)), normalize internal whitespace by collapsing all
runs of whitespace (newlines, tabs, multiple spaces) into a single space (e.g.,
via split_whitespace().join(" ")), then trim and append that normalized string
so each rule stays a single stable bullet line.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/openhuman/memory/tool_memory/prompt.rs`:
- Around line 126-130: The bullet rendering currently pushes rule.rule.trim()
directly so embedded newlines/tabs break list formatting; before appending the
rule body (the code around out.push_str(rule.rule.trim()) in the block that also
calls priority_marker(rule.priority)), normalize internal whitespace by
collapsing all runs of whitespace (newlines, tabs, multiple spaces) into a
single space (e.g., via split_whitespace().join(" ")), then trim and append that
normalized string so each rule stays a single stable bullet line.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d818c431-cb3d-4822-9eba-e698bf6c0e2b

📥 Commits

Reviewing files that changed from the base of the PR and between 6444899 and 492d8e3.

📒 Files selected for processing (9)
  • src/openhuman/agent/harness/session/builder.rs
  • src/openhuman/memory/ops/tool_memory.rs
  • src/openhuman/memory/tool_memory/capture.rs
  • src/openhuman/memory/tool_memory/mod.rs
  • src/openhuman/memory/tool_memory/prompt.rs
  • src/openhuman/memory/tool_memory/store.rs
  • src/openhuman/memory/tool_memory/store_tests.rs
  • src/openhuman/memory/tool_memory/test_helpers.rs
  • src/openhuman/memory/tool_memory/types.rs
🚧 Files skipped from review as they are similar to previous changes (5)
  • src/openhuman/memory/tool_memory/mod.rs
  • src/openhuman/agent/harness/session/builder.rs
  • src/openhuman/memory/tool_memory/capture.rs
  • src/openhuman/memory/tool_memory/store.rs
  • src/openhuman/memory/ops/tool_memory.rs

@senamakel senamakel merged commit a2ced40 into tinyhumansai:main May 11, 2026
20 of 21 checks passed
AusAgentSmith pushed a commit to AusAgentSmith/openhuman that referenced this pull request May 23, 2026
…PIs, and prompt pinning (tinyhumansai#1487)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add tool-scoped memory for durable learnings and high-priority rules

3 participants