Skip to content

feat: collection MCP tools and instinct import#2

Merged
thebtf merged 1 commit into
mainfrom
feat/collection-mcp-tools
Mar 7, 2026
Merged

feat: collection MCP tools and instinct import#2
thebtf merged 1 commit into
mainfrom
feat/collection-mcp-tools

Conversation

@thebtf
Copy link
Copy Markdown
Owner

@thebtf thebtf commented Mar 7, 2026

Summary

  • Add 7 MCP tools: list_collections, list_documents, get_document, ingest_document, search_collection, remove_document, import_instincts
  • Add internal/instincts/ package: YAML frontmatter parser, observation converter, vector similarity dedup, import orchestrator
  • Add migration 029: text column on content_chunks table
  • Update document_store to use raw SQL for pgvector embedding insertion
  • Wire DocumentStore, EmbedSvc, ChunkManager into MCP server constructor
  • Add POST /api/instincts/import HTTP endpoint

Context

Implements roadmap items #9 (Collection MCP Tools) and #7 (ECC Instinct Import).
Spec: .agent/specs/collection-mcp-tools.md, .agent/specs/instinct-import.md

Test plan

  • go build ./... compiles clean
  • 7 MCP tools appear in tools/list
  • ingest_document creates document with chunks and embeddings
  • search_collection returns ranked results with chunk text
  • list_collections returns collections with doc counts
  • import_instincts creates guidance observations from instinct files
  • Re-run import_instincts is idempotent (0 new)

Summary by CodeRabbit

Новые функции

  • Управление документами: загрузка, список, просмотр и удаление документов в коллекциях
  • Семантический поиск по содержимому коллекций
  • Импорт инстинктов из определений с автоматической дедупликацией и оценкой важности

- Add 7 MCP tools: list_collections, list_documents, get_document,
  ingest_document, search_collection, remove_document, import_instincts
- Add instinct parser package (internal/instincts/) with YAML frontmatter
  parsing, observation conversion, vector similarity dedup
- Add migration 029: text column on content_chunks table
- Update document_store to use raw SQL for pgvector embedding insertion
- Wire DocumentStore, EmbedSvc, ChunkManager into MCP server
- Add POST /api/instincts/import HTTP endpoint
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 7, 2026

Walkthrough

PR добавляет функционал парсинга и импорта инстинктов с новыми моделями данных (Instinct, ImportResult) и утилитами парсера. Улучшены операции с документами: оптимизирован SQL для работы с чанками, добавлена поддержка текстового поля, проверка дубликатов и семантический поиск. MCP сервер расширен инструментами управления документами и инстинктами.

Changes

Cohort / File(s) Summary
Инициализация MCP сервера
cmd/mcp-sse/main.go, cmd/mcp/main.go
Добавлены три nil-аргумента (documentStore, embedSvc, chunkManager) в конструктор MCP сервера для обозначения недоступности этих компонентов в standalone режиме.
Система парсинга и импорта инстинктов
internal/instincts/types.go, internal/instincts/parser.go, internal/instincts/converter.go, internal/instincts/dedup.go, internal/instincts/importer.go
Новый модуль для парсинга инстинктов из markdown файлов с YAML frontmatter, конвертации в observations, проверки дубликатов через vector client и импорта с обновлением важности.
Хранилище документов и база данных
internal/db/gorm/models.go, internal/db/gorm/migrations.go, internal/db/gorm/document_store.go
Добавлены поля Embedding и Text в ContentChunk. Миграция для новой колонки text. UpsertChunks переработана на raw SQL транзакции. SearchChunks поддерживает фильтрацию по collection и возвращает текст чанков.
MCP инструменты управления документами
internal/mcp/server.go, internal/mcp/tools_documents.go
Добавлены поля documentStore, embedSvc, chunkManager в Server. Реализованы обработчики: list_collections, list_documents, get_document, ingest_document, search_collection, remove_document с полной валидацией и обработкой ошибок.
MCP обработчик импорта инстинктов
internal/mcp/tools_instincts.go
Обработчик handleImportInstincts для вызова инстинкт-импорта через MCP с поддержкой пути по умолчанию ~/.claude/homunculus/instincts.
HTTP API и интеграция сервиса
internal/worker/handlers_instincts.go, internal/worker/service.go
Добавлен HTTP обработчик POST /api/instincts/import. В service.go зафиксирована работа chunk manager, document store и расширена инициализация MCP сервера.

Sequence Diagram

sequenceDiagram
    participant Client
    participant ImportHandler as Import Handler
    participant Parser as Instinct Parser
    participant DedupChecker as Dedup Checker
    participant Converter as Converter
    participant ObsStore as Observation Store
    participant VectorClient as Vector Client

    Client->>ImportHandler: POST /api/instincts/import {path}
    ImportHandler->>Parser: ParseDir(instincts_path)
    Parser-->>ImportHandler: []*Instinct, []error
    
    loop For each parsed Instinct
        ImportHandler->>DedupChecker: IsDuplicate(title, threshold)
        DedupChecker->>VectorClient: Query similar observations
        VectorClient-->>DedupChecker: similarity result
        DedupChecker-->>ImportHandler: isDuplicate bool
        
        alt Not Duplicate
            ImportHandler->>Converter: ConvertToObservation(instinct)
            Converter-->>ImportHandler: ParsedObservation
            ImportHandler->>ObsStore: StoreObservation(observation)
            ObsStore-->>ImportHandler: stored observation
            ImportHandler->>ObsStore: UpdateImportanceScore(importance)
            ObsStore-->>ImportHandler: success
            ImportHandler->>ImportHandler: Track: imported++
        else Is Duplicate
            ImportHandler->>ImportHandler: Track: skipped++
        end
    end
    
    ImportHandler-->>Client: ImportResult {Total, Imported, Skipped, Errors}
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

ai-review:passed

Poem

🐰 Хопь-хопь, текст и чанки, векторы в ночи,
Инстинкты парсим в ночь, помощник мудрец!
Документы, коллекции, поиск волшебный—
MCP инструменты сверкают яркой звездой! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed Название точно отражает основное изменение: добавление набора инструментов MCP для коллекций и импорт инстинктов.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/collection-mcp-tools

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the system's capabilities by introducing robust document and collection management features, alongside a dedicated system for importing external 'instincts' as guidance observations. These changes enhance the platform's ability to ingest, organize, and semantically search diverse content, thereby improving the overall knowledge base and enabling more sophisticated AI agent interactions. The integration of new database operations and service components lays the groundwork for a more intelligent and data-rich environment.

Highlights

  • New MCP Tools for Document and Collection Management: Seven new MCP tools were added to facilitate document and collection management: list_collections, list_documents, get_document, ingest_document, search_collection, remove_document, and import_instincts. These tools enable listing collections and documents, retrieving document content, ingesting new documents with chunking and embedding, performing semantic searches, and deactivating documents.
  • Instinct Import System: A new internal/instincts package was introduced to handle the parsing, deduplication, and import of ECC instinct files. This system includes a YAML frontmatter parser, an observation converter, vector similarity-based deduplication, and an import orchestrator, allowing instincts to be stored as guidance observations.
  • Database Enhancements for Document Storage: The document_store was updated to use raw SQL for pgvector embedding insertion, improving efficiency. A new database migration (029) was added to include a text column in the content_chunks table, which is crucial for returning readable search results.
  • Service Integration: The DocumentStore, EmbedSvc, and ChunkManager components were wired into the MCP server constructor, enabling the new document and instinct-related functionalities. Additionally, a new POST /api/instincts/import HTTP endpoint was added to trigger instinct imports via the worker service.
Changelog
  • cmd/mcp-sse/main.go
    • Updated the mcp.NewServer constructor call to include nil placeholders for documentStore, embedSvc, and chunkManager when running in standalone SSE mode.
  • cmd/mcp/main.go
    • Updated the mcp.NewServer constructor call to include nil placeholders for documentStore, embedSvc, and chunkManager when running in standalone MCP mode.
  • internal/db/gorm/document_store.go
    • Refactored UpsertChunks to use raw SQL for inserting pgvector embeddings, bypassing GORM's default mapping for performance and direct control.
    • Modified SearchChunks to optionally search across all active documents if no specific collection is provided.
    • Updated SearchChunks to include the text column in the returned ContentChunk for more informative search results.
  • internal/db/gorm/migrations.go
    • Added migration 029_content_chunks_text to introduce a text column to the content_chunks table, with a default empty string and NOT NULL constraint.
  • internal/db/gorm/models.go
    • Imported the pgvector-go package.
    • Added Embedding pgvec.Vector and Text string fields to the ContentChunk struct, with appropriate GORM and JSON tags.
  • internal/instincts/converter.go
    • Added ConvertToObservation function to transform an Instinct into a models.ParsedObservation suitable for storage.
    • Implemented InstinctImportanceScore to scale instinct confidence to an importance score (1-10).
  • internal/instincts/dedup.go
    • Added IsDuplicate function to check for existing observations with similar content using vector similarity search.
  • internal/instincts/importer.go
    • Implemented the Import function to orchestrate the parsing, deduplication, conversion, and storage of instinct files as guidance observations.
  • internal/instincts/parser.go
    • Added ParseFile to parse individual markdown files with YAML frontmatter into Instinct structs.
    • Implemented ParseDir to recursively parse all markdown files within a given directory.
  • internal/instincts/types.go
    • Defined the Instinct struct to represent parsed instinct data, including metadata and body content.
    • Defined the ImportResult struct to summarize the outcome of an instinct import operation.
  • internal/mcp/server.go
    • Imported chunking and embedding packages.
    • Added documentStore, embedSvc, and chunkManager fields to the Server struct.
    • Updated the NewServer constructor to accept documentStore, embedSvc, and chunkManager.
    • Added definitions for seven new MCP tools: list_collections, list_documents, get_document, ingest_document, search_collection, remove_document, and import_instincts.
    • Wired the new MCP tools to their respective handler functions.
  • internal/mcp/tools_documents.go
    • Added handleListCollections to list all configured document collections with their document counts.
    • Implemented handleListDocuments to list documents within a specified collection.
    • Created handleGetDocument to retrieve the full content of a document by collection and path.
    • Added handleRemoveDocument to deactivate a document from a collection.
    • Implemented handleIngestDocument to ingest new documents, handling content chunking, embedding, and storage.
    • Created handleSearchCollection to perform semantic searches across document chunks in a collection.
  • internal/mcp/tools_instincts.go
    • Implemented handleImportInstincts to import ECC instinct files from a specified directory into the system as guidance observations.
  • internal/worker/handlers_instincts.go
    • Added handleInstinctsImport HTTP handler for the POST /api/instincts/import endpoint, facilitating the import of instinct files.
  • internal/worker/service.go
    • Imported chunking, golang chunking, and markdown chunking packages.
    • Added createChunkManager function to initialize a chunking manager with available language chunkers.
    • Initialized documentStore and chunkManager within the initializeAsync method.
    • Passed documentStore, embedSvc, and chunkManager to the mcp.NewServer constructor.
    • Registered the POST /api/instincts/import route in setupRoutes.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@thebtf
Copy link
Copy Markdown
Owner Author

thebtf commented Mar 7, 2026

@coderabbitai review

@thebtf
Copy link
Copy Markdown
Owner Author

thebtf commented Mar 7, 2026

@gemini-code-assist review

@thebtf
Copy link
Copy Markdown
Owner Author

thebtf commented Mar 7, 2026

@codex review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 7, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces collection management tools and an instinct import feature. The changes include new MCP tools, database migrations, and API endpoints. My review focuses on correctness, efficiency, and maintainability. I've identified a bug in the document ingestion logic, an inefficiency in file handling, an unhandled error in an HTTP handler, and some code duplication.

Note: Security Review did not run due to the size of the PR.

Comment thread internal/mcp/tools_documents.go
Comment thread internal/worker/handlers_instincts.go
Comment thread internal/db/gorm/document_store.go
Comment thread internal/instincts/converter.go
Comment thread internal/mcp/tools_documents.go
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces significant new functionality by adding MCP tools for document collections and a mechanism for importing "instincts". However, a high-severity path traversal vulnerability was identified in the import_instincts tool and its corresponding HTTP endpoint, which allows reading arbitrary directories and files on the host system. Additionally, the review highlighted potential inconsistencies in scoring logic, areas for improved robustness in error handling for the new API endpoint, and opportunities to enhance code clarity and resource management.

Comment thread internal/mcp/tools_instincts.go
Comment thread internal/worker/handlers_instincts.go
Comment thread internal/instincts/converter.go
Comment thread internal/worker/handlers_instincts.go
Comment thread internal/instincts/converter.go
Comment thread internal/mcp/tools_documents.go
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (3)
internal/instincts/importer.go (2)

15-57: Рассмотрите возможность возврата ошибки при критических сбоях.

Функция Import всегда возвращает nil как error, даже если все instincts не удалось импортировать. Рассмотрите возможность возврата ошибки, если result.Imported == 0 && len(result.Errors) > 0, чтобы вызывающий код мог обработать полный сбой импорта.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/instincts/importer.go` around lines 15 - 57, The Import function
currently always returns nil error; modify it so after processing (before the
final return) it checks the ImportResult (result.Imported and result.Errors) and
returns a non-nil error when no instincts were imported and there are stored
errors (i.e., result.Imported == 0 && len(result.Errors) > 0). Construct a clear
error message (e.g., "no instincts imported: N errors") that includes summary
info from result (counts or first error) to aid callers, and keep returning the
result along with that error; leave successful paths unchanged. Ensure this
change is made in the Import function and uses the existing ImportResult
variable for the condition.

40-50: Непоследовательная обработка ошибок между StoreObservation и UpdateImportanceScore.

Ошибки StoreObservation добавляются в result.Errors (строка 42), но ошибки UpdateImportanceScore только логируются (строка 49). Если обновление importance score критично, ошибки должны обрабатываться единообразно. Если это осознанный выбор (score некритичен), рекомендуется добавить комментарий для ясности.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/instincts/importer.go` around lines 40 - 50, The error handling is
inconsistent: obsStore.StoreObservation errors are appended to result.Errors
while obsStore.UpdateImportanceScore errors are only logged; decide and apply a
consistent strategy. Either treat UpdateImportanceScore as critical by appending
its error to result.Errors (e.g., same fmt.Sprintf pattern with inst.ID) and
continue, or if the score update is intentionally non-critical, add a clear
comment above the UpdateImportanceScore call explaining that failures are safe
to ignore and therefore only logged; update the code around
obsStore.StoreObservation, obsStore.UpdateImportanceScore,
InstinctImportanceScore and result.Errors accordingly.
internal/instincts/converter.go (1)

13-20: Дублирование логики расчёта importance score.

Логика масштабирования confidence в importance score (округление, clamping к диапазону 1-10) дублируется в ConvertToObservation (строки 14-20) и InstinctImportanceScore (строки 49-57). Рекомендуется использовать helper-функцию.

♻️ Предлагаемый рефакторинг
 func ConvertToObservation(inst *Instinct) *models.ParsedObservation {
-	// Scale confidence (0.0-1.0) to importance score (1-10)
-	importance := math.Round(inst.Confidence * 10)
-	if importance < 1 {
-		importance = 1
-	}
-	if importance > 10 {
-		importance = 10
-	}
-
 	// Build concepts from domain
 	var concepts []string
 	if inst.Domain != "" {
 		concepts = append(concepts, inst.Domain)
 	}

 	// Build tags preserving original metadata
 	tags := []string{"instinct", fmt.Sprintf("source:%s", inst.Source)}
 	if inst.ID != "" {
 		tags = append(tags, fmt.Sprintf("instinct-id:%s", inst.ID))
 	}

-	_ = importance // stored separately; ParsedObservation has no importance field
-
 	return &models.ParsedObservation{

Also applies to: 49-57

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/instincts/converter.go` around lines 13 - 20, The scaling/cla mping
of inst.Confidence into an importance score is duplicated in
ConvertToObservation and InstinctImportanceScore; extract that logic into a
single helper function (e.g., computeImportance or scaleConfidenceToImportance)
that takes a float64 confidence and returns the rounded, clamped float64/int in
the 1-10 range, then replace the duplicated blocks in ConvertToObservation and
InstinctImportanceScore with calls to that helper to ensure a single source of
truth.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/instincts/converter.go`:
- Line 39: The SourceType field in converter.go is set to an invalid string
literal "instinct-import"; update it to use one of the existing constants from
pkg/models/observation.go (e.g. SourceToolVerified, SourceToolRead,
SourceWebFetch, SourceTodoWrite, SourceLLMDerived, or SourceUnknown) or add a
new exported constant (e.g. SourceInstinctImport) to pkg/models/observation.go
and use that constant in converter.go (reference the SourceType assignment in
the converter code and the constants in pkg/models/observation.go to locate and
change the value).

In `@internal/instincts/importer.go`:
- Line 40: При импорте в importer.go не передавать пустые строки в
obsStore.StoreObservation (sdkSessionID и project) — это нарушает NOT NULL в
таблице observations; вместо этого установите явные значения по умолчанию
(например константы типа "imported" или специальный импортный
SDK_SESSION_ID/PROJECT) или используйте nullable/optional типы/обёртки согласно
сигнатуре StoreObservation; в коде упомяните инициализацию этих значений рядом с
вызовом obsStore.StoreObservation. Также исправьте логику очистки в
observation_store.go (функция очистки старых наблюдений) чтобы она не пропускала
записи с пустым project — либо включите пустой проект в критерии удаления (т.е.
удалять записи с project = '' как импортные), либо явно фильтруйте/помечайте
импортные записи и учитывайте эту метку в CleanupOldObservations/соответствующей
функции очистки.

In `@internal/mcp/server.go`:
- Around line 879-940: The tool list currently unconditionally advertises
"list_documents", "get_document", "ingest_document", "search_collection" and
"remove_document" even when dependencies are nil; update the code that
builds/returns the tools list (the code emitting the tool entries for the names
above) to conditionally include tools: only add "list_documents", "get_document"
and "remove_document" when documentStore != nil, and only add "ingest_document"
and "search_collection" when both documentStore != nil and embedSvc != nil
(optionally also require chunkManager for ingest if your ingestion path needs
it). Locate the block that constructs the tool descriptors (the map entries with
Name: "list_documents" etc.) and wrap each descriptor creation in the
appropriate nil-checks so only supported capabilities are advertised.

In `@internal/mcp/tools_documents.go`:
- Around line 213-221: The fast-path wrongly uses s.documentStore.SearchChunks
with a zero-vector to detect existing chunks, which returns nearest embeddings
(via embedSvc.Dimensions()) not chunks for a specific hash; replace this with an
exact hash lookup: query the document store for chunks where Hash == newHash
(e.g., implement/use a GetChunksByHash / FindChunks(filter{hash:newHash}) or add
a hash filter parameter to SearchChunks) and only return the "already
up-to-date" message when that exact-hash query returns results; update the block
that references existing, existing[0].Hash, doc.Hash and params.Content to rely
on the exact-hash query.

In `@internal/mcp/tools_instincts.go`:
- Around line 41-44: Импорт сейчас продолжает создавать записи когда вызов
дедупликации в internal/instincts/importer.go (вызов IsDuplicate) возвращает
ошибку — нужно изменить поведение: при ошибке дедупликации (т.е. когда
IsDuplicate возвращает ошибку, а не просто true/false) прерывать импорт и
вернуть ошибку из instincts.Import, либо заранее отказаться от запуска
import_instincts если отсутствует или недоступен s.vectorClient; обновите
функцию instincts.Import (и место вызова в tools_instincts.go) чтобы при ошибке
от dedup-check сразу возвращать ошибку (не логировать и продолжать) и/или
добавить проверку наличия рабочей s.vectorClient перед началом импорта.

In `@internal/worker/handlers_instincts.go`:
- Around line 20-42: The handler currently accepts params.Path and uses it
directly as dir (falling back to filepath.Join(home, ".claude", "homunculus",
"instincts")), which allows clients to point at arbitrary filesystem locations;
change this to always resolve incoming params.Path against a fixed base
directory and deny any path that escapes it (e.g., compute base :=
filepath.Join(os.UserHomeDir(), ".claude", "homunculus", "instincts") and then
join+clean the requested path and validate it has base as a prefix), move that
validation logic into a shared helper (e.g.,
ValidateAndResolveInstinctsPath(requested string) used by this handler and the
MCP tool), and remove direct filesystem access from using raw params.Path so
importer only ever sees paths inside the allowed base.
- Around line 23-26: The JSON decode currently swallows errors which lets an
empty params trigger a default import; change the decoding to capture the
returned error from json.NewDecoder(r.Body).Decode(&params) inside the handler
in internal/worker/handlers_instincts.go (where r.Body and params are used), and
if err != nil respond with HTTP 400 Bad Request (include a short error message)
and return instead of proceeding; keep the existing defer r.Body.Close() but do
not ignore the decode error.

---

Nitpick comments:
In `@internal/instincts/converter.go`:
- Around line 13-20: The scaling/cla mping of inst.Confidence into an importance
score is duplicated in ConvertToObservation and InstinctImportanceScore; extract
that logic into a single helper function (e.g., computeImportance or
scaleConfidenceToImportance) that takes a float64 confidence and returns the
rounded, clamped float64/int in the 1-10 range, then replace the duplicated
blocks in ConvertToObservation and InstinctImportanceScore with calls to that
helper to ensure a single source of truth.

In `@internal/instincts/importer.go`:
- Around line 15-57: The Import function currently always returns nil error;
modify it so after processing (before the final return) it checks the
ImportResult (result.Imported and result.Errors) and returns a non-nil error
when no instincts were imported and there are stored errors (i.e.,
result.Imported == 0 && len(result.Errors) > 0). Construct a clear error message
(e.g., "no instincts imported: N errors") that includes summary info from result
(counts or first error) to aid callers, and keep returning the result along with
that error; leave successful paths unchanged. Ensure this change is made in the
Import function and uses the existing ImportResult variable for the condition.
- Around line 40-50: The error handling is inconsistent:
obsStore.StoreObservation errors are appended to result.Errors while
obsStore.UpdateImportanceScore errors are only logged; decide and apply a
consistent strategy. Either treat UpdateImportanceScore as critical by appending
its error to result.Errors (e.g., same fmt.Sprintf pattern with inst.ID) and
continue, or if the score update is intentionally non-critical, add a clear
comment above the UpdateImportanceScore call explaining that failures are safe
to ignore and therefore only logged; update the code around
obsStore.StoreObservation, obsStore.UpdateImportanceScore,
InstinctImportanceScore and result.Errors accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c9ac492a-26c5-46d1-b846-7c9096507201

📥 Commits

Reviewing files that changed from the base of the PR and between fa18b2f and 60e1b1a.

📒 Files selected for processing (15)
  • cmd/mcp-sse/main.go
  • cmd/mcp/main.go
  • internal/db/gorm/document_store.go
  • internal/db/gorm/migrations.go
  • internal/db/gorm/models.go
  • internal/instincts/converter.go
  • internal/instincts/dedup.go
  • internal/instincts/importer.go
  • internal/instincts/parser.go
  • internal/instincts/types.go
  • internal/mcp/server.go
  • internal/mcp/tools_documents.go
  • internal/mcp/tools_instincts.go
  • internal/worker/handlers_instincts.go
  • internal/worker/service.go

Comment thread internal/instincts/converter.go
Comment thread internal/instincts/importer.go
Comment thread internal/mcp/server.go
Comment thread internal/mcp/tools_documents.go
Comment thread internal/mcp/tools_instincts.go
Comment thread internal/worker/handlers_instincts.go
Comment thread internal/worker/handlers_instincts.go
@thebtf
Copy link
Copy Markdown
Owner Author

thebtf commented Mar 7, 2026

🤖 PR Review MCP State (auto-managed, do not edit)
{
  "version": 2,
  "parentChildren": {},
  "resolvedNitpicks": {
    "coderabbit-nitpick-3c56d046-15": {
      "resolvedAt": "2026-03-07T11:05:51.763Z",
      "resolvedBy": "agent"
    },
    "coderabbit-nitpick-80196fe0-40": {
      "resolvedAt": "2026-03-07T11:05:54.318Z",
      "resolvedBy": "agent"
    },
    "coderabbit-nitpick-76171a03-13": {
      "resolvedAt": "2026-03-07T11:07:32.285Z",
      "resolvedBy": "agent"
    },
    "coderabbit-nitpick-09b7baee-12": {
      "resolvedAt": "2026-03-07T11:16:22.386Z",
      "resolvedBy": "agent"
    },
    "coderabbit-nitpick-0dc3626b-224": {
      "resolvedAt": "2026-03-07T11:17:06.159Z",
      "resolvedBy": "agent"
    },
    "coderabbit-nitpick-cbf08913-36": {
      "resolvedAt": "2026-03-07T11:18:17.845Z",
      "resolvedBy": "agent"
    }
  },
  "updatedAt": "2026-03-07T11:18:18.296Z"
}

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (3)
internal/worker/handlers_instincts.go (2)

23-25: ⚠️ Potential issue | 🟠 Major

Битый JSON не должен запускать импорт по умолчанию.

Сейчас ошибка Decode теряется, params остаётся пустым, и handler может реально запустить импорт из ~/.claude/homunculus/instincts. Для такого bulk-endpoint здесь нужен 400 Bad Request, а не молчаливый fallback.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/worker/handlers_instincts.go` around lines 23 - 25, When decoding
the request body into params in the instincts handler, don't ignore json.Decode
errors: check the error returned by json.NewDecoder(r.Body).Decode(&params) and
if it's a non-nil error (except a deliberate empty-body/EOF case if you want to
allow it) respond with 400 Bad Request and abort the import; keep the defer
r.Body.Close() but replace the current ignored decode with proper error handling
so malformed JSON cannot fall back to using the default
~/.claude/homunculus/instincts data.

28-42: ⚠️ Potential issue | 🟠 Major

Не принимайте произвольный filesystem path от клиента.

params.Path уходит напрямую в os.Stat и дальше в importer, так что аутентифицированный клиент может направить импорт на любой читаемый каталог хоста. Разрешайте только подкаталоги фиксированного base dir и вынесите resolve/validate в общий helper, чтобы HTTP и MCP не расходились.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/worker/handlers_instincts.go` around lines 28 - 42, Do not accept
arbitrary filesystem paths from the client: replace direct use of params.Path in
handlers_instincts.go (the os.Stat and downstream importer calls) with a
validation/resolution helper that restricts paths to subdirectories of a fixed
base directory (e.g., ~/.claude/homunculus/instincts). Implement a shared helper
(e.g., ResolveAndValidateInstinctPath(baseDir, clientPath) used by both the HTTP
handler and MCP) that joins the baseDir and clientPath, cleans the result
(filepath.Clean), verifies the resulting path has the baseDir as its prefix
(preventing path traversal), and returns an error for invalid or non-existent
targets; then replace the current os.Stat(dir) check and any direct passing of
params.Path to the importer with the helper's validated path.
internal/mcp/server.go (1)

879-940: ⚠️ Potential issue | 🟠 Major

Не рекламируйте document tools, когда зависимости не подняты.

В standalone wiring cmd/mcp/main.go и cmd/mcp-sse/main.go передают nil для documentStore, embedSvc и chunkManager, но tools/list всё равно объявляет list_documents, get_document, ingest_document, search_collection и remove_document. Клиент увидит capability, которая в этом режиме не может корректно отработать. Собирайте этот блок условно: list/get/remove при documentStore != nil, ingest/search — только когда готовы все нужные зависимости.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/mcp/server.go` around lines 879 - 940, The tool capability entries
for document operations are always registered even when dependencies are nil;
change the tool registration in internal/mcp/server.go so the "list_documents",
"get_document", and "remove_document" entries are added only if documentStore !=
nil, and add "ingest_document" and "search_collection" only when documentStore
!= nil && embedSvc != nil && chunkManager != nil; update the block that builds
the tools slice (where the entries for
list_documents/get_document/ingest_document/search_collection/remove_document
are defined) to conditionally append each tool based on those dependency checks
so clients only see capabilities that can actually run.
🧹 Nitpick comments (3)
internal/instincts/converter.go (1)

12-20: Дублирование логики расчёта importance score.

Логика масштабирования confidence (0.0-1.0) в importance (1-10) с clamping дублируется в ConvertToObservation (строки 14-20) и InstinctImportanceScore (строки 50-56). При этом в ConvertToObservation результат вычисляется, но не используется (строка 34).

♻️ Предлагаемый рефакторинг
 func ConvertToObservation(inst *Instinct) *models.ParsedObservation {
-	// Scale confidence (0.0-1.0) to importance score (1-10)
-	importance := math.Round(inst.Confidence * 10)
-	if importance < 1 {
-		importance = 1
-	}
-	if importance > 10 {
-		importance = 10
-	}
-
 	// Build concepts from domain
 	var concepts []string
 	if inst.Domain != "" {
@@ -31,8 +23,6 @@
 		tags = append(tags, fmt.Sprintf("instinct-id:%s", inst.ID))
 	}

-	_ = importance // stored separately; ParsedObservation has no importance field
-
 	return &models.ParsedObservation{
 		Type:       models.ObsTypeGuidance,
 		MemoryType: models.MemTypeGuidance,

Also applies to: 46-58

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/instincts/converter.go` around lines 12 - 20, ConvertToObservation
currently duplicates the confidence→importance scaling/clamping (lines where
importance is computed) and then doesn't use that computed value; instead, reuse
the existing InstinctImportanceScore(inst *Instinct) function to compute the
importance and remove the duplicated computation and unused variable in
ConvertToObservation so ConvertToObservation calls InstinctImportanceScore(inst)
(or otherwise references that function's result) and preserves single-source
logic for the 0.0-1.0 → 1-10 mapping and clamping.
internal/mcp/tools_documents.go (1)

224-264: Использование временного файла для chunker.

Создание временного файла для передачи в chunker работает, но добавляет накладные расходы ввода-вывода. Если chunker поддерживает работу с io.Reader или строками, рассмотрите рефакторинг для избежания записи на диск.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/mcp/tools_documents.go` around lines 224 - 264, Вместо всегда
создания временного файла, измените блок, который использует s.chunkManager:
если s.chunkManager реализует in-memory API (например методы вроде
SupportsReader/SupportsString и ChunkFromReader/ChunkFromString или
аналогичные), вызывайте эти методы с params.Content (или с bytes.NewReader),
иначе падайте обратно на существующую проверку SupportsFile и использование
ChunkFile; уберите постоянное создание tmpFile и os.CreateTemp path branch when
in-memory path is available, keeping the temp-file fallback only when no
in-memory API exists.
internal/mcp/tools_instincts.go (1)

36-39: Возврат пустого результата вместо ошибки при отсутствии директории.

Если директория не существует, функция возвращает информационное сообщение без ошибки. Это соответствует описанному поведению в PR, но может скрыть проблемы конфигурации от вызывающего кода.

Рекомендую добавить в результат явный признак, что операция не выполнена, либо возвращать ошибку для случаев, когда путь был явно указан пользователем.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/mcp/tools_instincts.go` around lines 36 - 39, When os.Stat(dir)
reports the directory missing, don't return a non-error string; instead return a
clear error. Replace the current return fmt.Sprintf("Instincts directory not
found: %s", dir), nil with a failing result such as "", fmt.Errorf("instincts
directory not found: %s", dir) (or wrap the underlying error) so callers receive
a non-nil error; update any callers expecting the old behavior if needed. Ensure
you reference the os.Stat(dir) check and the existing message string when making
the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/db/gorm/document_store.go`:
- Around line 133-141: The current loop in the UpsertChunks path inserts NULL
embeddings when c.Embedding.Slice() is empty, which breaks vector search; update
the code that contains the loop (the UpsertChunks / the function invoking
stmt.ExecContext over chunks) to validate that c.Embedding is non-empty before
attempting the ExecContext call (for example, if c.Embedding.Slice() == 0 return
a clear error like "empty embedding for chunk Seq X") so no NULL is written to
the embedding column (or alternatively enforce NOT NULL at the DB level and fail
fast). Ensure you reference the chunk fields (c.Embedding, c.Seq) in the error
message so callers can diagnose which chunk failed.

In `@internal/db/gorm/migrations.go`:
- Around line 1027-1035: Migration "029_content_chunks_text" currently adds a
non-null text column but leaves existing rows empty; update the Migrate function
(alongside the tx.Exec that adds the column) to perform a backfill that
populates text for existing content_chunks (e.g., run an UPDATE via tx.Exec to
set text = <derived chunk text> based on existing fields or call the project’s
rechunk/backfill routine), and ensure the operation is safe for large tables
(batching or scheduling a background job if needed); keep Rollback unchanged.

In `@internal/instincts/dedup.go`:
- Around line 13-15: The current nil-check for vectorClient that returns "false,
nil" silently disables deduplication; change this in internal/instincts/dedup.go
by either making it fail-fast (return an error when vectorClient == nil) or
implementing a non-vector deterministic fallback: derive a stable dedup key from
the import metadata (e.g., import ID, timestamp, and stable hashing of the
import payload) and use that key for the existing duplicate check logic instead
of short-circuiting; update the branch around vectorClient and the function that
returns (false, nil) so callers receive an error or a consistent fallback
result, and ensure the unique symbols vectorClient and the dedup function
handling the return are adjusted accordingly.
- Around line 17-22: The dedup currently queries all vectors via
vectorClient.Query(ctx, title, 1, nil) which can match non-observation vectors;
update the Query call in dedup (the block using vectorClient.Query and checking
results[0].Similarity >= threshold) to include a where/filter limiting to
doc_type = "observation" (replace the nil filter arg with an appropriate filter
object or map that enforces doc_type == "observation") so only observation
vectors are considered for dedup.

---

Duplicate comments:
In `@internal/mcp/server.go`:
- Around line 879-940: The tool capability entries for document operations are
always registered even when dependencies are nil; change the tool registration
in internal/mcp/server.go so the "list_documents", "get_document", and
"remove_document" entries are added only if documentStore != nil, and add
"ingest_document" and "search_collection" only when documentStore != nil &&
embedSvc != nil && chunkManager != nil; update the block that builds the tools
slice (where the entries for
list_documents/get_document/ingest_document/search_collection/remove_document
are defined) to conditionally append each tool based on those dependency checks
so clients only see capabilities that can actually run.

In `@internal/worker/handlers_instincts.go`:
- Around line 23-25: When decoding the request body into params in the instincts
handler, don't ignore json.Decode errors: check the error returned by
json.NewDecoder(r.Body).Decode(&params) and if it's a non-nil error (except a
deliberate empty-body/EOF case if you want to allow it) respond with 400 Bad
Request and abort the import; keep the defer r.Body.Close() but replace the
current ignored decode with proper error handling so malformed JSON cannot fall
back to using the default ~/.claude/homunculus/instincts data.
- Around line 28-42: Do not accept arbitrary filesystem paths from the client:
replace direct use of params.Path in handlers_instincts.go (the os.Stat and
downstream importer calls) with a validation/resolution helper that restricts
paths to subdirectories of a fixed base directory (e.g.,
~/.claude/homunculus/instincts). Implement a shared helper (e.g.,
ResolveAndValidateInstinctPath(baseDir, clientPath) used by both the HTTP
handler and MCP) that joins the baseDir and clientPath, cleans the result
(filepath.Clean), verifies the resulting path has the baseDir as its prefix
(preventing path traversal), and returns an error for invalid or non-existent
targets; then replace the current os.Stat(dir) check and any direct passing of
params.Path to the importer with the helper's validated path.

---

Nitpick comments:
In `@internal/instincts/converter.go`:
- Around line 12-20: ConvertToObservation currently duplicates the
confidence→importance scaling/clamping (lines where importance is computed) and
then doesn't use that computed value; instead, reuse the existing
InstinctImportanceScore(inst *Instinct) function to compute the importance and
remove the duplicated computation and unused variable in ConvertToObservation so
ConvertToObservation calls InstinctImportanceScore(inst) (or otherwise
references that function's result) and preserves single-source logic for the
0.0-1.0 → 1-10 mapping and clamping.

In `@internal/mcp/tools_documents.go`:
- Around line 224-264: Вместо всегда создания временного файла, измените блок,
который использует s.chunkManager: если s.chunkManager реализует in-memory API
(например методы вроде SupportsReader/SupportsString и
ChunkFromReader/ChunkFromString или аналогичные), вызывайте эти методы с
params.Content (или с bytes.NewReader), иначе падайте обратно на существующую
проверку SupportsFile и использование ChunkFile; уберите постоянное создание
tmpFile и os.CreateTemp path branch when in-memory path is available, keeping
the temp-file fallback only when no in-memory API exists.

In `@internal/mcp/tools_instincts.go`:
- Around line 36-39: When os.Stat(dir) reports the directory missing, don't
return a non-error string; instead return a clear error. Replace the current
return fmt.Sprintf("Instincts directory not found: %s", dir), nil with a failing
result such as "", fmt.Errorf("instincts directory not found: %s", dir) (or wrap
the underlying error) so callers receive a non-nil error; update any callers
expecting the old behavior if needed. Ensure you reference the os.Stat(dir)
check and the existing message string when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b0ab25a8-3747-4bfa-b9af-5f38e6987b6a

📥 Commits

Reviewing files that changed from the base of the PR and between fa18b2f and 60e1b1a.

📒 Files selected for processing (15)
  • cmd/mcp-sse/main.go
  • cmd/mcp/main.go
  • internal/db/gorm/document_store.go
  • internal/db/gorm/migrations.go
  • internal/db/gorm/models.go
  • internal/instincts/converter.go
  • internal/instincts/dedup.go
  • internal/instincts/importer.go
  • internal/instincts/parser.go
  • internal/instincts/types.go
  • internal/mcp/server.go
  • internal/mcp/tools_documents.go
  • internal/mcp/tools_instincts.go
  • internal/worker/handlers_instincts.go
  • internal/worker/service.go

Comment thread internal/db/gorm/document_store.go
Comment thread internal/db/gorm/migrations.go
Comment thread internal/instincts/dedup.go
Comment thread internal/instincts/dedup.go
@thebtf thebtf merged commit 60e1b1a into main Mar 7, 2026
2 checks passed
thebtf added a commit that referenced this pull request Apr 16, 2026
…REAKING)

Consolidates the BREAKING CHANGE from PR #176 (legacy MCP/CLI removal) with
two user-visible plugin tactical fixes into a single coherent v4.5.0 release.

## BREAKING CHANGES (carried over from PR #176)

The 'engram-mcp' and 'engram-mcp-stdio-proxy' release artifacts no longer
ship. Bare-metal users who installed via tarball or scripts/install.sh and
relied on the engram-mcp binary must migrate to the plugin marketplace
install path:

  /plugin marketplace add thebtf/engram-marketplace
  /plugin install engram

Plugin marketplace users are unaffected — ensure-binary.js downloads only
the 'engram' stdio-proxy binary, not engram-mcp. cmd/mcp/, cmd/engram-cli/,
the Dockerfile client image stage, the engram-mcp Makefile + goreleaser +
install.sh entries, and the bin/mcp-server / bin/mcp-stdio-proxy
COMPONENTS.md sections are all gone.

## Fix #1 — MCP startup reliability after CC plugin update

Smoking gun (.agent/reports/plugin-tactical-fix-triage.md): when CC updates
the plugin from v4.3.0 to v4.4.x it does NOT migrate plugin-level
user_config (server_url + api_token). .mcp.json then expands
${user_config.server_url} to an empty string, the binary spawns silently,
and every MCP tool call fails with an opaque gRPC dial error against an
empty target.

Fix:
  - plugin/engram/.mcp.json: env block now also passes
    "ENGRAM_URL_LEGACY": "${ENGRAM_URL}" so users who had ENGRAM_URL in
    their shell from the v4.3.0 setup still get a working MCP server.
  - plugin/engram/scripts/run-engram.js: before spawning, fall back to
    ENGRAM_URL_LEGACY if ENGRAM_URL is empty, AND emit a visible WARN to
    stderr if both are empty so the failure mode is no longer silent.

## Fix #2 — disable noisy context injection

Per .agent/reports/phase-2-synthesis.md fix #16 + the entity audits, the
session-start hook currently injects 100 raw observations unioned with 10
semantic hits, plus project briefing, plus learned guidance. Reported as
noise rather than relevant context.

plugin/engram/hooks/session-start.js now keeps the inject GET (still
needed for result.always_inject) but renders ONLY:
  - result.always_inject     -> <user-behavior-rules>
  - GET /api/issues          -> <engram-issues>
  - GET /api/issues resolved -> <engram-resolved-issues>

Skipped: result.observations / result.full_count / result.project_briefing
/ result.guidance. mark-injected scoped to always_inject IDs only so
citation tracking does not log false positives. Misleading observation-
count log line removed.

Save / recall MCP tools (recall_memory, store_memory, find_by_file,
observations, store, feedback, vault, issues) are completely independent
of this hook — separate gRPC path, unaffected.

This is tactical, NOT a redesign. Phase 2 strategic work continues:
citation session_id="" smoking gun, 100-obs cap policy, hook
curated-render redesign — all in .agent/reports/phase-2-synthesis.md.

## Version bump

Per Constitution §15 the daemon version and plugin version move together:
  - cmd/engram/main.go daemonVersion "v4.4.0" -> "v4.5.0"
  - plugin/engram/.claude-plugin/plugin.json "4.4.0" -> "4.5.0"

## Verification

  - JSON parse on .mcp.json + plugin.json: clean
  - node --check on run-engram.js + session-start.js: clean
  - go build ./... clean
  - 3-OS matrix CI green on the prior 4.4.1 commit (re-runs on amend)

## Files

  4 plugin/daemon files + 1 daemon version constant.
  Net: -76 LOC (mostly removed render blocks in session-start.js).
thebtf added a commit that referenced this pull request Apr 16, 2026
…REAKING) (#177)

Consolidates BREAKING from PR #176 + 2 plugin tactical fixes into v4.5.0.

BREAKING: 'engram-mcp' and 'engram-mcp-stdio-proxy' release artifacts no
longer ship. Bare-metal users migrate to plugin marketplace path.
Plugin marketplace users unaffected (ensure-binary.js downloads only the
'engram' stdio-proxy binary).

Fix #1 — MCP startup reliability after CC plugin update:
  - .mcp.json adds ENGRAM_URL_LEGACY = ${ENGRAM_URL} fallback
  - run-engram.js falls back if user_config-derived ENGRAM_URL is empty,
    + visible WARN to stderr if both empty (no more silent failure)

Fix #2 — Disable noisy context injection:
  - session-start.js keeps inject GET (still needed for always_inject)
    but renders ONLY result.always_inject + 2 issues fetches
  - Skipped: result.observations / result.full_count /
    result.project_briefing / result.guidance
  - mark-injected scoped to always_inject IDs only
  - All MCP save/recall tools unchanged (separate gRPC path)

Version bump per Constitution §15:
  - cmd/engram/main.go daemonVersion v4.4.0 -> v4.5.0
  - plugin/engram/.claude-plugin/plugin.json 4.4.0 -> 4.5.0

Phase 2 strategic redesign continues separately — see
.agent/reports/phase-2-synthesis.md for the 21-item fix priority matrix.
@thebtf thebtf deleted the feat/collection-mcp-tools branch May 7, 2026 06:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant