feat: collection MCP tools and instinct import#2
Conversation
- Add 7 MCP tools: list_collections, list_documents, get_document, ingest_document, search_collection, remove_document, import_instincts - Add instinct parser package (internal/instincts/) with YAML frontmatter parsing, observation conversion, vector similarity dedup - Add migration 029: text column on content_chunks table - Update document_store to use raw SQL for pgvector embedding insertion - Wire DocumentStore, EmbedSvc, ChunkManager into MCP server - Add POST /api/instincts/import HTTP endpoint
WalkthroughPR добавляет функционал парсинга и импорта инстинктов с новыми моделями данных (Instinct, ImportResult) и утилитами парсера. Улучшены операции с документами: оптимизирован SQL для работы с чанками, добавлена поддержка текстового поля, проверка дубликатов и семантический поиск. MCP сервер расширен инструментами управления документами и инстинктами. Changes
Sequence DiagramsequenceDiagram
participant Client
participant ImportHandler as Import Handler
participant Parser as Instinct Parser
participant DedupChecker as Dedup Checker
participant Converter as Converter
participant ObsStore as Observation Store
participant VectorClient as Vector Client
Client->>ImportHandler: POST /api/instincts/import {path}
ImportHandler->>Parser: ParseDir(instincts_path)
Parser-->>ImportHandler: []*Instinct, []error
loop For each parsed Instinct
ImportHandler->>DedupChecker: IsDuplicate(title, threshold)
DedupChecker->>VectorClient: Query similar observations
VectorClient-->>DedupChecker: similarity result
DedupChecker-->>ImportHandler: isDuplicate bool
alt Not Duplicate
ImportHandler->>Converter: ConvertToObservation(instinct)
Converter-->>ImportHandler: ParsedObservation
ImportHandler->>ObsStore: StoreObservation(observation)
ObsStore-->>ImportHandler: stored observation
ImportHandler->>ObsStore: UpdateImportanceScore(importance)
ObsStore-->>ImportHandler: success
ImportHandler->>ImportHandler: Track: imported++
else Is Duplicate
ImportHandler->>ImportHandler: Track: skipped++
end
end
ImportHandler-->>Client: ImportResult {Total, Imported, Skipped, Errors}
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested labels
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.5.0)Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly expands the system's capabilities by introducing robust document and collection management features, alongside a dedicated system for importing external 'instincts' as guidance observations. These changes enhance the platform's ability to ingest, organize, and semantically search diverse content, thereby improving the overall knowledge base and enabling more sophisticated AI agent interactions. The integration of new database operations and service components lays the groundwork for a more intelligent and data-rich environment. Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
@coderabbitai review |
|
@gemini-code-assist review |
|
@codex review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Code Review
This pull request introduces collection management tools and an instinct import feature. The changes include new MCP tools, database migrations, and API endpoints. My review focuses on correctness, efficiency, and maintainability. I've identified a bug in the document ingestion logic, an inefficiency in file handling, an unhandled error in an HTTP handler, and some code duplication.
Note: Security Review did not run due to the size of the PR.
There was a problem hiding this comment.
Code Review
This pull request introduces significant new functionality by adding MCP tools for document collections and a mechanism for importing "instincts". However, a high-severity path traversal vulnerability was identified in the import_instincts tool and its corresponding HTTP endpoint, which allows reading arbitrary directories and files on the host system. Additionally, the review highlighted potential inconsistencies in scoring logic, areas for improved robustness in error handling for the new API endpoint, and opportunities to enhance code clarity and resource management.
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (3)
internal/instincts/importer.go (2)
15-57: Рассмотрите возможность возврата ошибки при критических сбоях.Функция
Importвсегда возвращаетnilкак error, даже если все instincts не удалось импортировать. Рассмотрите возможность возврата ошибки, еслиresult.Imported == 0 && len(result.Errors) > 0, чтобы вызывающий код мог обработать полный сбой импорта.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/instincts/importer.go` around lines 15 - 57, The Import function currently always returns nil error; modify it so after processing (before the final return) it checks the ImportResult (result.Imported and result.Errors) and returns a non-nil error when no instincts were imported and there are stored errors (i.e., result.Imported == 0 && len(result.Errors) > 0). Construct a clear error message (e.g., "no instincts imported: N errors") that includes summary info from result (counts or first error) to aid callers, and keep returning the result along with that error; leave successful paths unchanged. Ensure this change is made in the Import function and uses the existing ImportResult variable for the condition.
40-50: Непоследовательная обработка ошибок между StoreObservation и UpdateImportanceScore.Ошибки
StoreObservationдобавляются вresult.Errors(строка 42), но ошибкиUpdateImportanceScoreтолько логируются (строка 49). Если обновление importance score критично, ошибки должны обрабатываться единообразно. Если это осознанный выбор (score некритичен), рекомендуется добавить комментарий для ясности.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/instincts/importer.go` around lines 40 - 50, The error handling is inconsistent: obsStore.StoreObservation errors are appended to result.Errors while obsStore.UpdateImportanceScore errors are only logged; decide and apply a consistent strategy. Either treat UpdateImportanceScore as critical by appending its error to result.Errors (e.g., same fmt.Sprintf pattern with inst.ID) and continue, or if the score update is intentionally non-critical, add a clear comment above the UpdateImportanceScore call explaining that failures are safe to ignore and therefore only logged; update the code around obsStore.StoreObservation, obsStore.UpdateImportanceScore, InstinctImportanceScore and result.Errors accordingly.internal/instincts/converter.go (1)
13-20: Дублирование логики расчёта importance score.Логика масштабирования confidence в importance score (округление, clamping к диапазону 1-10) дублируется в
ConvertToObservation(строки 14-20) иInstinctImportanceScore(строки 49-57). Рекомендуется использовать helper-функцию.♻️ Предлагаемый рефакторинг
func ConvertToObservation(inst *Instinct) *models.ParsedObservation { - // Scale confidence (0.0-1.0) to importance score (1-10) - importance := math.Round(inst.Confidence * 10) - if importance < 1 { - importance = 1 - } - if importance > 10 { - importance = 10 - } - // Build concepts from domain var concepts []string if inst.Domain != "" { concepts = append(concepts, inst.Domain) } // Build tags preserving original metadata tags := []string{"instinct", fmt.Sprintf("source:%s", inst.Source)} if inst.ID != "" { tags = append(tags, fmt.Sprintf("instinct-id:%s", inst.ID)) } - _ = importance // stored separately; ParsedObservation has no importance field - return &models.ParsedObservation{Also applies to: 49-57
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/instincts/converter.go` around lines 13 - 20, The scaling/cla mping of inst.Confidence into an importance score is duplicated in ConvertToObservation and InstinctImportanceScore; extract that logic into a single helper function (e.g., computeImportance or scaleConfidenceToImportance) that takes a float64 confidence and returns the rounded, clamped float64/int in the 1-10 range, then replace the duplicated blocks in ConvertToObservation and InstinctImportanceScore with calls to that helper to ensure a single source of truth.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/instincts/converter.go`:
- Line 39: The SourceType field in converter.go is set to an invalid string
literal "instinct-import"; update it to use one of the existing constants from
pkg/models/observation.go (e.g. SourceToolVerified, SourceToolRead,
SourceWebFetch, SourceTodoWrite, SourceLLMDerived, or SourceUnknown) or add a
new exported constant (e.g. SourceInstinctImport) to pkg/models/observation.go
and use that constant in converter.go (reference the SourceType assignment in
the converter code and the constants in pkg/models/observation.go to locate and
change the value).
In `@internal/instincts/importer.go`:
- Line 40: При импорте в importer.go не передавать пустые строки в
obsStore.StoreObservation (sdkSessionID и project) — это нарушает NOT NULL в
таблице observations; вместо этого установите явные значения по умолчанию
(например константы типа "imported" или специальный импортный
SDK_SESSION_ID/PROJECT) или используйте nullable/optional типы/обёртки согласно
сигнатуре StoreObservation; в коде упомяните инициализацию этих значений рядом с
вызовом obsStore.StoreObservation. Также исправьте логику очистки в
observation_store.go (функция очистки старых наблюдений) чтобы она не пропускала
записи с пустым project — либо включите пустой проект в критерии удаления (т.е.
удалять записи с project = '' как импортные), либо явно фильтруйте/помечайте
импортные записи и учитывайте эту метку в CleanupOldObservations/соответствующей
функции очистки.
In `@internal/mcp/server.go`:
- Around line 879-940: The tool list currently unconditionally advertises
"list_documents", "get_document", "ingest_document", "search_collection" and
"remove_document" even when dependencies are nil; update the code that
builds/returns the tools list (the code emitting the tool entries for the names
above) to conditionally include tools: only add "list_documents", "get_document"
and "remove_document" when documentStore != nil, and only add "ingest_document"
and "search_collection" when both documentStore != nil and embedSvc != nil
(optionally also require chunkManager for ingest if your ingestion path needs
it). Locate the block that constructs the tool descriptors (the map entries with
Name: "list_documents" etc.) and wrap each descriptor creation in the
appropriate nil-checks so only supported capabilities are advertised.
In `@internal/mcp/tools_documents.go`:
- Around line 213-221: The fast-path wrongly uses s.documentStore.SearchChunks
with a zero-vector to detect existing chunks, which returns nearest embeddings
(via embedSvc.Dimensions()) not chunks for a specific hash; replace this with an
exact hash lookup: query the document store for chunks where Hash == newHash
(e.g., implement/use a GetChunksByHash / FindChunks(filter{hash:newHash}) or add
a hash filter parameter to SearchChunks) and only return the "already
up-to-date" message when that exact-hash query returns results; update the block
that references existing, existing[0].Hash, doc.Hash and params.Content to rely
on the exact-hash query.
In `@internal/mcp/tools_instincts.go`:
- Around line 41-44: Импорт сейчас продолжает создавать записи когда вызов
дедупликации в internal/instincts/importer.go (вызов IsDuplicate) возвращает
ошибку — нужно изменить поведение: при ошибке дедупликации (т.е. когда
IsDuplicate возвращает ошибку, а не просто true/false) прерывать импорт и
вернуть ошибку из instincts.Import, либо заранее отказаться от запуска
import_instincts если отсутствует или недоступен s.vectorClient; обновите
функцию instincts.Import (и место вызова в tools_instincts.go) чтобы при ошибке
от dedup-check сразу возвращать ошибку (не логировать и продолжать) и/или
добавить проверку наличия рабочей s.vectorClient перед началом импорта.
In `@internal/worker/handlers_instincts.go`:
- Around line 20-42: The handler currently accepts params.Path and uses it
directly as dir (falling back to filepath.Join(home, ".claude", "homunculus",
"instincts")), which allows clients to point at arbitrary filesystem locations;
change this to always resolve incoming params.Path against a fixed base
directory and deny any path that escapes it (e.g., compute base :=
filepath.Join(os.UserHomeDir(), ".claude", "homunculus", "instincts") and then
join+clean the requested path and validate it has base as a prefix), move that
validation logic into a shared helper (e.g.,
ValidateAndResolveInstinctsPath(requested string) used by this handler and the
MCP tool), and remove direct filesystem access from using raw params.Path so
importer only ever sees paths inside the allowed base.
- Around line 23-26: The JSON decode currently swallows errors which lets an
empty params trigger a default import; change the decoding to capture the
returned error from json.NewDecoder(r.Body).Decode(¶ms) inside the handler
in internal/worker/handlers_instincts.go (where r.Body and params are used), and
if err != nil respond with HTTP 400 Bad Request (include a short error message)
and return instead of proceeding; keep the existing defer r.Body.Close() but do
not ignore the decode error.
---
Nitpick comments:
In `@internal/instincts/converter.go`:
- Around line 13-20: The scaling/cla mping of inst.Confidence into an importance
score is duplicated in ConvertToObservation and InstinctImportanceScore; extract
that logic into a single helper function (e.g., computeImportance or
scaleConfidenceToImportance) that takes a float64 confidence and returns the
rounded, clamped float64/int in the 1-10 range, then replace the duplicated
blocks in ConvertToObservation and InstinctImportanceScore with calls to that
helper to ensure a single source of truth.
In `@internal/instincts/importer.go`:
- Around line 15-57: The Import function currently always returns nil error;
modify it so after processing (before the final return) it checks the
ImportResult (result.Imported and result.Errors) and returns a non-nil error
when no instincts were imported and there are stored errors (i.e.,
result.Imported == 0 && len(result.Errors) > 0). Construct a clear error message
(e.g., "no instincts imported: N errors") that includes summary info from result
(counts or first error) to aid callers, and keep returning the result along with
that error; leave successful paths unchanged. Ensure this change is made in the
Import function and uses the existing ImportResult variable for the condition.
- Around line 40-50: The error handling is inconsistent:
obsStore.StoreObservation errors are appended to result.Errors while
obsStore.UpdateImportanceScore errors are only logged; decide and apply a
consistent strategy. Either treat UpdateImportanceScore as critical by appending
its error to result.Errors (e.g., same fmt.Sprintf pattern with inst.ID) and
continue, or if the score update is intentionally non-critical, add a clear
comment above the UpdateImportanceScore call explaining that failures are safe
to ignore and therefore only logged; update the code around
obsStore.StoreObservation, obsStore.UpdateImportanceScore,
InstinctImportanceScore and result.Errors accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c9ac492a-26c5-46d1-b846-7c9096507201
📒 Files selected for processing (15)
cmd/mcp-sse/main.gocmd/mcp/main.gointernal/db/gorm/document_store.gointernal/db/gorm/migrations.gointernal/db/gorm/models.gointernal/instincts/converter.gointernal/instincts/dedup.gointernal/instincts/importer.gointernal/instincts/parser.gointernal/instincts/types.gointernal/mcp/server.gointernal/mcp/tools_documents.gointernal/mcp/tools_instincts.gointernal/worker/handlers_instincts.gointernal/worker/service.go
🤖 PR Review MCP State (auto-managed, do not edit){
"version": 2,
"parentChildren": {},
"resolvedNitpicks": {
"coderabbit-nitpick-3c56d046-15": {
"resolvedAt": "2026-03-07T11:05:51.763Z",
"resolvedBy": "agent"
},
"coderabbit-nitpick-80196fe0-40": {
"resolvedAt": "2026-03-07T11:05:54.318Z",
"resolvedBy": "agent"
},
"coderabbit-nitpick-76171a03-13": {
"resolvedAt": "2026-03-07T11:07:32.285Z",
"resolvedBy": "agent"
},
"coderabbit-nitpick-09b7baee-12": {
"resolvedAt": "2026-03-07T11:16:22.386Z",
"resolvedBy": "agent"
},
"coderabbit-nitpick-0dc3626b-224": {
"resolvedAt": "2026-03-07T11:17:06.159Z",
"resolvedBy": "agent"
},
"coderabbit-nitpick-cbf08913-36": {
"resolvedAt": "2026-03-07T11:18:17.845Z",
"resolvedBy": "agent"
}
},
"updatedAt": "2026-03-07T11:18:18.296Z"
} |
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (3)
internal/worker/handlers_instincts.go (2)
23-25:⚠️ Potential issue | 🟠 MajorБитый JSON не должен запускать импорт по умолчанию.
Сейчас ошибка
Decodeтеряется,paramsостаётся пустым, и handler может реально запустить импорт из~/.claude/homunculus/instincts. Для такого bulk-endpoint здесь нужен400 Bad Request, а не молчаливый fallback.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/worker/handlers_instincts.go` around lines 23 - 25, When decoding the request body into params in the instincts handler, don't ignore json.Decode errors: check the error returned by json.NewDecoder(r.Body).Decode(¶ms) and if it's a non-nil error (except a deliberate empty-body/EOF case if you want to allow it) respond with 400 Bad Request and abort the import; keep the defer r.Body.Close() but replace the current ignored decode with proper error handling so malformed JSON cannot fall back to using the default ~/.claude/homunculus/instincts data.
28-42:⚠️ Potential issue | 🟠 MajorНе принимайте произвольный filesystem path от клиента.
params.Pathуходит напрямую вos.Statи дальше в importer, так что аутентифицированный клиент может направить импорт на любой читаемый каталог хоста. Разрешайте только подкаталоги фиксированного base dir и вынесите resolve/validate в общий helper, чтобы HTTP и MCP не расходились.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/worker/handlers_instincts.go` around lines 28 - 42, Do not accept arbitrary filesystem paths from the client: replace direct use of params.Path in handlers_instincts.go (the os.Stat and downstream importer calls) with a validation/resolution helper that restricts paths to subdirectories of a fixed base directory (e.g., ~/.claude/homunculus/instincts). Implement a shared helper (e.g., ResolveAndValidateInstinctPath(baseDir, clientPath) used by both the HTTP handler and MCP) that joins the baseDir and clientPath, cleans the result (filepath.Clean), verifies the resulting path has the baseDir as its prefix (preventing path traversal), and returns an error for invalid or non-existent targets; then replace the current os.Stat(dir) check and any direct passing of params.Path to the importer with the helper's validated path.internal/mcp/server.go (1)
879-940:⚠️ Potential issue | 🟠 MajorНе рекламируйте document tools, когда зависимости не подняты.
В standalone wiring
cmd/mcp/main.goиcmd/mcp-sse/main.goпередаютnilдляdocumentStore,embedSvcиchunkManager, ноtools/listвсё равно объявляетlist_documents,get_document,ingest_document,search_collectionиremove_document. Клиент увидит capability, которая в этом режиме не может корректно отработать. Собирайте этот блок условно: list/get/remove приdocumentStore != nil, ingest/search — только когда готовы все нужные зависимости.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/mcp/server.go` around lines 879 - 940, The tool capability entries for document operations are always registered even when dependencies are nil; change the tool registration in internal/mcp/server.go so the "list_documents", "get_document", and "remove_document" entries are added only if documentStore != nil, and add "ingest_document" and "search_collection" only when documentStore != nil && embedSvc != nil && chunkManager != nil; update the block that builds the tools slice (where the entries for list_documents/get_document/ingest_document/search_collection/remove_document are defined) to conditionally append each tool based on those dependency checks so clients only see capabilities that can actually run.
🧹 Nitpick comments (3)
internal/instincts/converter.go (1)
12-20: Дублирование логики расчёта importance score.Логика масштабирования confidence (0.0-1.0) в importance (1-10) с clamping дублируется в
ConvertToObservation(строки 14-20) иInstinctImportanceScore(строки 50-56). При этом вConvertToObservationрезультат вычисляется, но не используется (строка 34).♻️ Предлагаемый рефакторинг
func ConvertToObservation(inst *Instinct) *models.ParsedObservation { - // Scale confidence (0.0-1.0) to importance score (1-10) - importance := math.Round(inst.Confidence * 10) - if importance < 1 { - importance = 1 - } - if importance > 10 { - importance = 10 - } - // Build concepts from domain var concepts []string if inst.Domain != "" { @@ -31,8 +23,6 @@ tags = append(tags, fmt.Sprintf("instinct-id:%s", inst.ID)) } - _ = importance // stored separately; ParsedObservation has no importance field - return &models.ParsedObservation{ Type: models.ObsTypeGuidance, MemoryType: models.MemTypeGuidance,Also applies to: 46-58
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/instincts/converter.go` around lines 12 - 20, ConvertToObservation currently duplicates the confidence→importance scaling/clamping (lines where importance is computed) and then doesn't use that computed value; instead, reuse the existing InstinctImportanceScore(inst *Instinct) function to compute the importance and remove the duplicated computation and unused variable in ConvertToObservation so ConvertToObservation calls InstinctImportanceScore(inst) (or otherwise references that function's result) and preserves single-source logic for the 0.0-1.0 → 1-10 mapping and clamping.internal/mcp/tools_documents.go (1)
224-264: Использование временного файла для chunker.Создание временного файла для передачи в chunker работает, но добавляет накладные расходы ввода-вывода. Если chunker поддерживает работу с
io.Readerили строками, рассмотрите рефакторинг для избежания записи на диск.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/mcp/tools_documents.go` around lines 224 - 264, Вместо всегда создания временного файла, измените блок, который использует s.chunkManager: если s.chunkManager реализует in-memory API (например методы вроде SupportsReader/SupportsString и ChunkFromReader/ChunkFromString или аналогичные), вызывайте эти методы с params.Content (или с bytes.NewReader), иначе падайте обратно на существующую проверку SupportsFile и использование ChunkFile; уберите постоянное создание tmpFile и os.CreateTemp path branch when in-memory path is available, keeping the temp-file fallback only when no in-memory API exists.internal/mcp/tools_instincts.go (1)
36-39: Возврат пустого результата вместо ошибки при отсутствии директории.Если директория не существует, функция возвращает информационное сообщение без ошибки. Это соответствует описанному поведению в PR, но может скрыть проблемы конфигурации от вызывающего кода.
Рекомендую добавить в результат явный признак, что операция не выполнена, либо возвращать ошибку для случаев, когда путь был явно указан пользователем.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/mcp/tools_instincts.go` around lines 36 - 39, When os.Stat(dir) reports the directory missing, don't return a non-error string; instead return a clear error. Replace the current return fmt.Sprintf("Instincts directory not found: %s", dir), nil with a failing result such as "", fmt.Errorf("instincts directory not found: %s", dir) (or wrap the underlying error) so callers receive a non-nil error; update any callers expecting the old behavior if needed. Ensure you reference the os.Stat(dir) check and the existing message string when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/db/gorm/document_store.go`:
- Around line 133-141: The current loop in the UpsertChunks path inserts NULL
embeddings when c.Embedding.Slice() is empty, which breaks vector search; update
the code that contains the loop (the UpsertChunks / the function invoking
stmt.ExecContext over chunks) to validate that c.Embedding is non-empty before
attempting the ExecContext call (for example, if c.Embedding.Slice() == 0 return
a clear error like "empty embedding for chunk Seq X") so no NULL is written to
the embedding column (or alternatively enforce NOT NULL at the DB level and fail
fast). Ensure you reference the chunk fields (c.Embedding, c.Seq) in the error
message so callers can diagnose which chunk failed.
In `@internal/db/gorm/migrations.go`:
- Around line 1027-1035: Migration "029_content_chunks_text" currently adds a
non-null text column but leaves existing rows empty; update the Migrate function
(alongside the tx.Exec that adds the column) to perform a backfill that
populates text for existing content_chunks (e.g., run an UPDATE via tx.Exec to
set text = <derived chunk text> based on existing fields or call the project’s
rechunk/backfill routine), and ensure the operation is safe for large tables
(batching or scheduling a background job if needed); keep Rollback unchanged.
In `@internal/instincts/dedup.go`:
- Around line 13-15: The current nil-check for vectorClient that returns "false,
nil" silently disables deduplication; change this in internal/instincts/dedup.go
by either making it fail-fast (return an error when vectorClient == nil) or
implementing a non-vector deterministic fallback: derive a stable dedup key from
the import metadata (e.g., import ID, timestamp, and stable hashing of the
import payload) and use that key for the existing duplicate check logic instead
of short-circuiting; update the branch around vectorClient and the function that
returns (false, nil) so callers receive an error or a consistent fallback
result, and ensure the unique symbols vectorClient and the dedup function
handling the return are adjusted accordingly.
- Around line 17-22: The dedup currently queries all vectors via
vectorClient.Query(ctx, title, 1, nil) which can match non-observation vectors;
update the Query call in dedup (the block using vectorClient.Query and checking
results[0].Similarity >= threshold) to include a where/filter limiting to
doc_type = "observation" (replace the nil filter arg with an appropriate filter
object or map that enforces doc_type == "observation") so only observation
vectors are considered for dedup.
---
Duplicate comments:
In `@internal/mcp/server.go`:
- Around line 879-940: The tool capability entries for document operations are
always registered even when dependencies are nil; change the tool registration
in internal/mcp/server.go so the "list_documents", "get_document", and
"remove_document" entries are added only if documentStore != nil, and add
"ingest_document" and "search_collection" only when documentStore != nil &&
embedSvc != nil && chunkManager != nil; update the block that builds the tools
slice (where the entries for
list_documents/get_document/ingest_document/search_collection/remove_document
are defined) to conditionally append each tool based on those dependency checks
so clients only see capabilities that can actually run.
In `@internal/worker/handlers_instincts.go`:
- Around line 23-25: When decoding the request body into params in the instincts
handler, don't ignore json.Decode errors: check the error returned by
json.NewDecoder(r.Body).Decode(¶ms) and if it's a non-nil error (except a
deliberate empty-body/EOF case if you want to allow it) respond with 400 Bad
Request and abort the import; keep the defer r.Body.Close() but replace the
current ignored decode with proper error handling so malformed JSON cannot fall
back to using the default ~/.claude/homunculus/instincts data.
- Around line 28-42: Do not accept arbitrary filesystem paths from the client:
replace direct use of params.Path in handlers_instincts.go (the os.Stat and
downstream importer calls) with a validation/resolution helper that restricts
paths to subdirectories of a fixed base directory (e.g.,
~/.claude/homunculus/instincts). Implement a shared helper (e.g.,
ResolveAndValidateInstinctPath(baseDir, clientPath) used by both the HTTP
handler and MCP) that joins the baseDir and clientPath, cleans the result
(filepath.Clean), verifies the resulting path has the baseDir as its prefix
(preventing path traversal), and returns an error for invalid or non-existent
targets; then replace the current os.Stat(dir) check and any direct passing of
params.Path to the importer with the helper's validated path.
---
Nitpick comments:
In `@internal/instincts/converter.go`:
- Around line 12-20: ConvertToObservation currently duplicates the
confidence→importance scaling/clamping (lines where importance is computed) and
then doesn't use that computed value; instead, reuse the existing
InstinctImportanceScore(inst *Instinct) function to compute the importance and
remove the duplicated computation and unused variable in ConvertToObservation so
ConvertToObservation calls InstinctImportanceScore(inst) (or otherwise
references that function's result) and preserves single-source logic for the
0.0-1.0 → 1-10 mapping and clamping.
In `@internal/mcp/tools_documents.go`:
- Around line 224-264: Вместо всегда создания временного файла, измените блок,
который использует s.chunkManager: если s.chunkManager реализует in-memory API
(например методы вроде SupportsReader/SupportsString и
ChunkFromReader/ChunkFromString или аналогичные), вызывайте эти методы с
params.Content (или с bytes.NewReader), иначе падайте обратно на существующую
проверку SupportsFile и использование ChunkFile; уберите постоянное создание
tmpFile и os.CreateTemp path branch when in-memory path is available, keeping
the temp-file fallback only when no in-memory API exists.
In `@internal/mcp/tools_instincts.go`:
- Around line 36-39: When os.Stat(dir) reports the directory missing, don't
return a non-error string; instead return a clear error. Replace the current
return fmt.Sprintf("Instincts directory not found: %s", dir), nil with a failing
result such as "", fmt.Errorf("instincts directory not found: %s", dir) (or wrap
the underlying error) so callers receive a non-nil error; update any callers
expecting the old behavior if needed. Ensure you reference the os.Stat(dir)
check and the existing message string when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b0ab25a8-3747-4bfa-b9af-5f38e6987b6a
📒 Files selected for processing (15)
cmd/mcp-sse/main.gocmd/mcp/main.gointernal/db/gorm/document_store.gointernal/db/gorm/migrations.gointernal/db/gorm/models.gointernal/instincts/converter.gointernal/instincts/dedup.gointernal/instincts/importer.gointernal/instincts/parser.gointernal/instincts/types.gointernal/mcp/server.gointernal/mcp/tools_documents.gointernal/mcp/tools_instincts.gointernal/worker/handlers_instincts.gointernal/worker/service.go
…REAKING) Consolidates the BREAKING CHANGE from PR #176 (legacy MCP/CLI removal) with two user-visible plugin tactical fixes into a single coherent v4.5.0 release. ## BREAKING CHANGES (carried over from PR #176) The 'engram-mcp' and 'engram-mcp-stdio-proxy' release artifacts no longer ship. Bare-metal users who installed via tarball or scripts/install.sh and relied on the engram-mcp binary must migrate to the plugin marketplace install path: /plugin marketplace add thebtf/engram-marketplace /plugin install engram Plugin marketplace users are unaffected — ensure-binary.js downloads only the 'engram' stdio-proxy binary, not engram-mcp. cmd/mcp/, cmd/engram-cli/, the Dockerfile client image stage, the engram-mcp Makefile + goreleaser + install.sh entries, and the bin/mcp-server / bin/mcp-stdio-proxy COMPONENTS.md sections are all gone. ## Fix #1 — MCP startup reliability after CC plugin update Smoking gun (.agent/reports/plugin-tactical-fix-triage.md): when CC updates the plugin from v4.3.0 to v4.4.x it does NOT migrate plugin-level user_config (server_url + api_token). .mcp.json then expands ${user_config.server_url} to an empty string, the binary spawns silently, and every MCP tool call fails with an opaque gRPC dial error against an empty target. Fix: - plugin/engram/.mcp.json: env block now also passes "ENGRAM_URL_LEGACY": "${ENGRAM_URL}" so users who had ENGRAM_URL in their shell from the v4.3.0 setup still get a working MCP server. - plugin/engram/scripts/run-engram.js: before spawning, fall back to ENGRAM_URL_LEGACY if ENGRAM_URL is empty, AND emit a visible WARN to stderr if both are empty so the failure mode is no longer silent. ## Fix #2 — disable noisy context injection Per .agent/reports/phase-2-synthesis.md fix #16 + the entity audits, the session-start hook currently injects 100 raw observations unioned with 10 semantic hits, plus project briefing, plus learned guidance. Reported as noise rather than relevant context. plugin/engram/hooks/session-start.js now keeps the inject GET (still needed for result.always_inject) but renders ONLY: - result.always_inject -> <user-behavior-rules> - GET /api/issues -> <engram-issues> - GET /api/issues resolved -> <engram-resolved-issues> Skipped: result.observations / result.full_count / result.project_briefing / result.guidance. mark-injected scoped to always_inject IDs only so citation tracking does not log false positives. Misleading observation- count log line removed. Save / recall MCP tools (recall_memory, store_memory, find_by_file, observations, store, feedback, vault, issues) are completely independent of this hook — separate gRPC path, unaffected. This is tactical, NOT a redesign. Phase 2 strategic work continues: citation session_id="" smoking gun, 100-obs cap policy, hook curated-render redesign — all in .agent/reports/phase-2-synthesis.md. ## Version bump Per Constitution §15 the daemon version and plugin version move together: - cmd/engram/main.go daemonVersion "v4.4.0" -> "v4.5.0" - plugin/engram/.claude-plugin/plugin.json "4.4.0" -> "4.5.0" ## Verification - JSON parse on .mcp.json + plugin.json: clean - node --check on run-engram.js + session-start.js: clean - go build ./... clean - 3-OS matrix CI green on the prior 4.4.1 commit (re-runs on amend) ## Files 4 plugin/daemon files + 1 daemon version constant. Net: -76 LOC (mostly removed render blocks in session-start.js).
…REAKING) (#177) Consolidates BREAKING from PR #176 + 2 plugin tactical fixes into v4.5.0. BREAKING: 'engram-mcp' and 'engram-mcp-stdio-proxy' release artifacts no longer ship. Bare-metal users migrate to plugin marketplace path. Plugin marketplace users unaffected (ensure-binary.js downloads only the 'engram' stdio-proxy binary). Fix #1 — MCP startup reliability after CC plugin update: - .mcp.json adds ENGRAM_URL_LEGACY = ${ENGRAM_URL} fallback - run-engram.js falls back if user_config-derived ENGRAM_URL is empty, + visible WARN to stderr if both empty (no more silent failure) Fix #2 — Disable noisy context injection: - session-start.js keeps inject GET (still needed for always_inject) but renders ONLY result.always_inject + 2 issues fetches - Skipped: result.observations / result.full_count / result.project_briefing / result.guidance - mark-injected scoped to always_inject IDs only - All MCP save/recall tools unchanged (separate gRPC path) Version bump per Constitution §15: - cmd/engram/main.go daemonVersion v4.4.0 -> v4.5.0 - plugin/engram/.claude-plugin/plugin.json 4.4.0 -> 4.5.0 Phase 2 strategic redesign continues separately — see .agent/reports/phase-2-synthesis.md for the 21-item fix priority matrix.
Summary
list_collections,list_documents,get_document,ingest_document,search_collection,remove_document,import_instinctsinternal/instincts/package: YAML frontmatter parser, observation converter, vector similarity dedup, import orchestratortextcolumn oncontent_chunkstabledocument_storeto use raw SQL for pgvector embedding insertionDocumentStore,EmbedSvc,ChunkManagerinto MCP server constructorPOST /api/instincts/importHTTP endpointContext
Implements roadmap items #9 (Collection MCP Tools) and #7 (ECC Instinct Import).
Spec:
.agent/specs/collection-mcp-tools.md,.agent/specs/instinct-import.mdTest plan
go build ./...compiles cleantools/listingest_documentcreates document with chunks and embeddingssearch_collectionreturns ranked results with chunk textlist_collectionsreturns collections with doc countsimport_instinctscreates guidance observations from instinct filesimport_instinctsis idempotent (0 new)Summary by CodeRabbit
Новые функции