
feat: refine MCP tool descriptions per R79 (<1024 chars, Use when/Don't use when) #223

Open
EtanHey wants to merge 1 commit into main from feat/r79-tool-descriptions

Conversation


EtanHey (Owner) commented Apr 7, 2026

Summary

  • Refined all active MCP tool descriptions following R79 research pattern
  • All descriptions under 1,024 characters (Azure OpenAI compat)
  • Added explicit "Use when:" and "Don't use when:" patterns
  • Added parameter hints and sibling tool cross-references
  • Updated server_info connect instructions for top 3 tools
  • Deprecated tools kept to short 1-2 sentence descriptions
  • 3 new quality tests (char limit, Use when, Don't use when)
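To make the pattern concrete, a description in this style might look like the sketch below. The wording and parameter names are illustrative only, not the merged strings from src/brainlayer/mcp/__init__.py:

```python
# Hypothetical R79-style tool description; the real strings live in
# src/brainlayer/mcp/__init__.py and differ in wording.
BRAIN_SEARCH_DESCRIPTION = (
    "Search the knowledge base with hybrid semantic + keyword matching. "
    "Use when: answering questions about project history, architecture, "
    "or past decisions. "
    "Don't use when: you need a structured entity lookup (brain_entity) "
    "or person-specific context (brain_get_person). "
    "Params: query (required), limit (default 10)."
)

# The Azure OpenAI compatibility constraint from the summary above.
assert len(BRAIN_SEARCH_DESCRIPTION) < 1024
assert "Use when:" in BRAIN_SEARCH_DESCRIPTION
assert "Don't use when:" in BRAIN_SEARCH_DESCRIPTION
```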

Context

R79 research found: "Tool descriptions are the highest-leverage fix. Anthropic's team states: description refinements alone achieved SWE-bench SOTA." PR #212 did a first pass; this refines per R79's specific guidance.

Test plan

  • pytest tests/test_tool_description_quality.py -v — 3 tests
  • All descriptions < 1,024 chars verified by test
  • ruff check src/ tests/ clean

🤖 Generated with Claude Code

Note

Refine MCP tool descriptions to include 'Use when'/'Don't use when' guidance under 1024 chars

  • Updates description strings for all active MCP tools (brain_search, brain_store, brain_recall, brain_digest, brain_entity, etc.) in init.py to follow a structured format with explicit Use when: and Don't use when: sections.
  • Rewrites the MCP server instructions text to emphasize primary routing guidance for the three core tools.
  • Adds test_tool_description_quality.py to enforce that all tool descriptions stay under 1024 characters and that active tools include the required guidance sections.

Macroscope summarized a89e3e0.

Summary by CodeRabbit

  • Documentation

    • Updated MCP tool descriptions to clarify usage boundaries, provide better guidance on tool selection, and refine parameter expectations across the platform for improved integration.
  • Tests

    • Added comprehensive test suite to validate tool description quality and ensure consistent documentation standards, including character limits and usage guidance sections.



coderabbitai bot commented Apr 7, 2026

📝 Walkthrough


The PR updates the MCP server's top-level instructions and revises tool descriptions for nine tools to clarify usage boundaries and parameter expectations. It also introduces a new test module that validates tool descriptions against quality standards (character limits and required text patterns). No logic or control flow changes were made.

Changes

  • MCP Tool Descriptions — src/brainlayer/mcp/__init__.py: Updated server instructions and rewrote descriptions for brain_search, brain_store, brain_get_person, brain_recall, brain_digest, brain_entity, brain_supersede, brain_archive, and brain_enrich to reframe routing boundaries, clarify parameter usage, and adjust guidance on when to avoid each tool. Net change: +19/−94 lines of descriptive text.
  • Tool Description Validation — tests/test_tool_description_quality.py: New test module that validates tool descriptions via runtime registry inspection. Asserts all descriptions are under 1024 characters, active tools exist in the registry, and descriptions contain required text patterns ("Use when:" and "Don't use when:" or "Does NOT").

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Descriptions bloom where clarity grows,
Nine tools now shine with purpose clear,
Tests keep our words in proper rows,
Each boundary bright for all to hear!

🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 20.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2)
  • Description Check ✅ — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check ✅ — The PR title accurately summarizes the primary change: refining MCP tool descriptions to meet R79 requirements (character limit and specific formatting standards).




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_tool_description_quality.py`:
- Around line 15-17: The helper _tool_descriptions repeatedly calls
asyncio.run(list_tools()); replace this with a cached result to avoid multiple
event loop runs—either compute a module-level constant (e.g., TOOL_DESCRIPTIONS)
by running asyncio.run(list_tools()) once and building the {tool.name:
tool.description} dict at import time, or make _tool_descriptions a cached/lazy
function (e.g., functools.lru_cache or a module-level memo) that calls
list_tools only on first invocation; update references to use the cached dict
and keep the function name _tool_descriptions (or return the cached dict) so
tests continue to work.
- Around line 5-12: The ACTIVE_TOOL_NAMES set in
tests/test_tool_description_quality.py is missing recently added active tools;
update the set literal ACTIVE_TOOL_NAMES to include "brain_supersede",
"brain_archive", and "brain_enrich" so the test matches the non-deprecated tools
defined in src/brainlayer/mcp/__init__.py (verify names exactly match the
function/class identifiers used there).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1fd08e05-31c2-43c3-a0ab-8d5524bb5bd7

📥 Commits

Reviewing files that changed from the base of the PR and between a870b96 and a89e3e0.

📒 Files selected for processing (2)
  • src/brainlayer/mcp/__init__.py
  • tests/test_tool_description_quality.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.13)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests

**/*.py: Use paths.py:get_db_path() for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths
When performing bulk database operations: stop enrichment workers first, checkpoint WAL before and after, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks, and checkpoint every 3 batches

Files:

  • tests/test_tool_description_quality.py
  • src/brainlayer/mcp/__init__.py
src/brainlayer/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/brainlayer/**/*.py: Use retry logic on SQLITE_BUSY errors; each worker must use its own database connection to handle concurrency safely
Classification must preserve ai_code, stack_trace, and user_message verbatim; skip noise entries entirely and summarize build_log and dir_listing entries (structure only)
Use AST-aware chunking via tree-sitter; never split stack traces; mask large tool output
For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via enrichment_controller.py, and Ollama as offline last-resort; allow override via BRAINLAYER_ENRICH_BACKEND env var
Configure enrichment rate via BRAINLAYER_ENRICH_RATE environment variable (default 0.2 = 12 RPM)
Implement chunk lifecycle columns: superseded_by, aggregated_into, archived_at on chunks table; exclude lifecycle-managed chunks from default search; allow include_archived=True to show history
Implement brain_supersede with safety gate for personal data (journals, notes, health/finance); use soft-delete for brain_archive with timestamp
Add supersedes parameter to brain_store for atomic store-and-replace operations
Run linting and formatting with: ruff check src/ && ruff format src/
Run tests with pytest
Use PRAGMA wal_checkpoint(FULL) before and after bulk database operations to prevent WAL bloat

Files:

  • src/brainlayer/mcp/__init__.py
🧠 Learnings (13)
📓 Common learnings
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-03-17T01:04:22.497Z
Learning: Applies to src/brainlayer/mcp/**/*.py and brain-bar/Sources/BrainBar/MCPRouter.swift: The 8 required MCP tools are `brain_search`, `brain_store`, `brain_recall`, `brain_entity`, `brain_expand`, `brain_update`, `brain_digest`, `brain_tags`. `brain_tags` is the 8th tool, replacing `brain_get_person`, as defined in the Phase B spec merged in PR `#72`. The Python MCP server already implements `brain_tags`. Legacy `brainlayer_*` aliases must be maintained for backward compatibility.
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T01:24:44.281Z
Learning: Applies to src/brainlayer/mcp/*.py : MCP tools include: brain_search, brain_store, brain_recall, brain_entity, brain_expand, brain_update, brain_digest, brain_get_person, brain_tags, brain_supersede, brain_archive (legacy brainlayer_* aliases still supported)
📚 Learning: 2026-04-02T10:56:45.869Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T10:56:45.869Z
Learning: BrainLayer serves as a local, private knowledge base for Claude Code sessions with fast search and context retrieval

Applied to files:

  • src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Memory tools: always use `brain_search` before answering questions about project history, architecture, or past decisions; use `brain_store` after making decisions, hitting bugs, or receiving corrections; don't rely solely on hook-injected context

Applied to files:

  • src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-04T15:21:39.570Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 198
File: hooks/brainlayer-prompt-search.py:169-169
Timestamp: 2026-04-04T15:21:39.570Z
Learning: In EtanHey/brainlayer, `hooks/brainlayer-prompt-search.py` reads `entity_type` directly from existing rows in `kg_entities` (read-only). `contracts/entity-types.yaml` defines the write-side schema only and is not authoritative for what `entity_type` values exist in the DB. The DB already stores `technology` (72 entities), `project` (24), and `tool` (1) as valid `entity_type` values, so `INJECT_TYPES` in the hook should match these DB values, not the contract file.

Applied to files:

  • src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/**/*.py : Add `supersedes` parameter to `brain_store` for atomic store-and-replace operations

Applied to files:

  • src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-04T23:24:03.159Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T23:24:03.159Z
Learning: Applies to src/brainlayer/{vector_store,search}*.py : Chunk lifecycle: implement columns `superseded_by`, `aggregated_into`, `archived_at` on chunks table; exclude lifecycle-managed chunks from default search

Applied to files:

  • src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/**/*.py : Implement `brain_supersede` with safety gate for personal data (journals, notes, health/finance); use soft-delete for `brain_archive` with timestamp

Applied to files:

  • src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/**/*.py : Implement chunk lifecycle columns: `superseded_by`, `aggregated_into`, `archived_at` on chunks table; exclude lifecycle-managed chunks from default search; allow `include_archived=True` to show history

Applied to files:

  • src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-01T22:38:15.693Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T22:38:15.693Z
Learning: Applies to src/brainlayer/mcp/brain_supersede.py : `brain_supersede` tool must include safety gate for personal data (journals, notes, health/finance)

Applied to files:

  • src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-01T01:24:44.281Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T01:24:44.281Z
Learning: Applies to src/brainlayer/*enrichment*.py : Enrichment backend priority: Groq (primary/cloud) → Gemini (fallback) → Ollama (offline last-resort), configurable via `BRAINLAYER_ENRICH_BACKEND` environment variable

Applied to files:

  • src/brainlayer/mcp/__init__.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/**/*.py : For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via `enrichment_controller.py`, and Ollama as offline last-resort; allow override via `BRAINLAYER_ENRICH_BACKEND` env var

Applied to files:

  • src/brainlayer/mcp/__init__.py
🔇 Additional comments (13)
src/brainlayer/mcp/__init__.py (10)

88-101: Server instructions are well-structured with clear routing guidance.

The updated instructions string provides good primary routing guidance with "Use when:" and "Don't use when:" patterns for the top three tools. The cross-references between sibling tools are helpful for LLM routing decisions.


294-294: brain_search description follows R79 pattern correctly.

The description includes "Use when:", "Don't use when:", parameter hints, and sibling tool cross-references. The guidance is clear and actionable.


434-434: brain_store description is comprehensive with proper lifecycle guidance.

The description correctly mentions the supersedes parameter for atomic store-and-replace operations, which aligns with the coding guidelines. Cross-references to brain_archive and brain_supersede are accurate.


536-536: brain_get_person description is well-structured.

Clear guidance for person-specific context retrieval with appropriate cross-references to sibling tools.


563-563: brain_recall description effectively explains the unified routing entry point.

The description clearly explains mode detection and provides appropriate guidance for when to use direct tools (brain_search) vs. the routing entry point.


699-699: brain_digest description clarifies mode differences well.

The distinction between mode='digest', mode='connect', and mode='enrich' is clear, with appropriate guidance on when to use brain_store for quick notes instead.


740-740: brain_entity description provides clear entity lookup guidance.

Good distinction between fuzzy topical search (brain_search) and structured entity graph lookup, with appropriate cross-reference to brain_get_person for person-specific needs.


902-902: brain_supersede description correctly documents the safety gate.

The description appropriately mentions safety_check='confirm' for personal data, aligning with the coding guideline requirement for safety gates on personal data (journals, notes, health/finance).


933-933: brain_archive description correctly explains soft-delete semantics.

The description aligns with the coding guideline to use soft-delete with timestamp for brain_archive.


953-953: brain_enrich description is comprehensive with mode and scope guidance.

Clear explanation of enrichment modes (realtime, batch, local) and scope-narrowing parameters (limit, since_hours, chunk_ids). The cross-references to sibling tools are accurate.

tests/test_tool_description_quality.py (3)

20-24: Character limit test covers all tools correctly.

Good approach to check all tool descriptions (including deprecated ones) against the 1024 character limit for Azure OpenAI compatibility.


27-32: Test structure is good with clear assertions.

The test correctly verifies both tool existence and the "Use when:" pattern presence.


35-41: Flexible pattern matching for negative guidance is appropriate.

The dual check for "Don't use when:" or "Does NOT" allows reasonable flexibility in how tools document their non-applicable scenarios.

Comment on lines +5 to +12

```python
ACTIVE_TOOL_NAMES = {
    "brain_search",
    "brain_store",
    "brain_recall",
    "brain_entity",
    "brain_digest",
    "brain_get_person",
}
```
⚠️ Potential issue | 🟡 Minor

ACTIVE_TOOL_NAMES is missing recently added active tools.

Based on the tool descriptions in src/brainlayer/mcp/__init__.py, the following tools also have "Use when:" and "Don't use when:" guidance and are not deprecated: brain_supersede, brain_archive, and brain_enrich. These should be included in ACTIVE_TOOL_NAMES for comprehensive test coverage.

🔧 Proposed fix to include all active tools

```diff
 ACTIVE_TOOL_NAMES = {
     "brain_search",
     "brain_store",
     "brain_recall",
     "brain_entity",
     "brain_digest",
     "brain_get_person",
+    "brain_supersede",
+    "brain_archive",
+    "brain_enrich",
 }
```

Comment on lines +15 to +17

```python
def _tool_descriptions() -> dict[str, str]:
    tools = asyncio.run(list_tools())
    return {tool.name: tool.description for tool in tools}
```

🧹 Nitpick | 🔵 Trivial

Consider caching tool descriptions to avoid repeated asyncio.run() calls.

The _tool_descriptions() helper is called once per test, resulting in three asyncio.run() calls. While this works, caching the result at module level would be more efficient.

♻️ Optional: Cache tool descriptions

```diff
+import functools
 import asyncio

 from brainlayer.mcp import list_tools

 ACTIVE_TOOL_NAMES = {
     "brain_search",
     "brain_store",
     "brain_recall",
     "brain_entity",
     "brain_digest",
     "brain_get_person",
 }


+@functools.lru_cache(maxsize=1)
 def _tool_descriptions() -> dict[str, str]:
     tools = asyncio.run(list_tools())
     return {tool.name: tool.description for tool in tools}
```
