feat(search-engine): add Elasticsearch backend and canonical import/sync #44

Closed
Intrinsical-AI wants to merge 13 commits into master from feat/elastic-backend

Conversation

@Intrinsical-AI
Owner

Summary

  • Elasticsearch persistence backend: full document storage, history,
    system state, diagnostics, and index mapping management; wired through
    composition and runtime alongside the existing SQL backend.
  • scope / snapshot_id fields: propagated across the domain model,
    mutation contracts, both persistence adapters (SQL + ES), HTTP schemas,
    and CLI — enabling documents to be tagged with an origin scope and version.
  • Canonical import endpoint (POST /api/docs/import-canonical) and
    CLI command (rag-import-canonical): declarative snapshot-based sync
    that upserts a scoped document collection and optionally hard-deletes stale
    docs not present in the current snapshot (replace_scope mode).

Test plan

  • Unit tests pass for SQL and Elasticsearch document storage (scope persistence, list_external_ids_by_scope)
  • Unit tests pass for canonical import endpoint (upsert-only and replace_scope paths, duplicate external_id rejection)
  • Unit tests pass for rag-import-canonical CLI (sync + cross-snapshot deletion)
  • CLI registry test confirms import-canonical command and entry point are wired
  • Elasticsearch backend runtime tests pass (client, mappings, diagnostics)
  • Spin up stack with docker-compose up, index docs via rag-import-canonical, verify scope-aware deletion

Add nullable scope and snapshot_id fields to the SQL Document table and
Elasticsearch index mappings, enabling external producers to tag documents
with an origin scope and version snapshot.
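
A minimal sketch of what nullable scope and snapshot_id fields could look like on a domain object. The `Document` class and its `external_id` and `content` fields are hypothetical stand-ins, not the PR's actual model; only the two new field names come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Document:
    """Hypothetical domain model, used here for illustration only."""

    external_id: str
    content: str
    # Both new fields default to None, so documents written before the
    # change (or by producers that do not tag them) remain valid.
    scope: Optional[str] = None
    snapshot_id: Optional[str] = None


# An untagged document and one tagged by an external producer.
legacy = Document(external_id="doc-1", content="hello")
tagged = Document(
    external_id="doc-2",
    content="world",
    scope="wiki",
    snapshot_id="2024-01-01",
)
```

Keeping both fields optional is what would let existing rows and index entries stay readable without a backfill.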
…ields

Extend UpsertDocBuilderPort, MutationUpsertInput, and normalization/serialization
to carry scope and snapshot_id. Add CanonicalImportSummary result type for
reporting insert/update/delete statistics after a canonical import.
…xecutors

Wire the new fields from MutationUpsertInput into the upsert builder calls
in both the atomic and saga mutation execution paths.
…nal_ids_by_scope()

Update SQL and Elasticsearch repositories to read/write scope and snapshot_id
in upsert, change detection, and domain mapping. Add list_external_ids_by_scope()
to both stores to support stale-doc deletion in canonical imports.
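
As a rough illustration of the contract described above, here is an in-memory stand-in for a store exposing list_external_ids_by_scope(). The `InMemoryDocumentStore` class, its `upsert` signature, and the sorted return order are assumptions made for this sketch; the real SQL and Elasticsearch repositories may differ.

```python
from typing import Dict, List, Optional


class InMemoryDocumentStore:
    """Hypothetical store that maps external_id to an optional scope."""

    def __init__(self) -> None:
        self._scope_by_external_id: Dict[str, Optional[str]] = {}

    def upsert(self, external_id: str, scope: Optional[str] = None) -> None:
        # Insert or overwrite the scope recorded for this external_id.
        self._scope_by_external_id[external_id] = scope

    def list_external_ids_by_scope(self, scope: str) -> List[str]:
        # Return every external_id currently tagged with the given scope.
        # Sorting is an assumption that keeps the output deterministic.
        return sorted(
            external_id
            for external_id, doc_scope in self._scope_by_external_id.items()
            if doc_scope == scope
        )


store = InMemoryDocumentStore()
store.upsert("a", scope="wiki")
store.upsert("b", scope="wiki")
store.upsert("c", scope="blog")
store.upsert("d")  # untagged document
wiki_ids = store.list_external_ids_by_scope("wiki")
```

A canonical import can compare this list with the incoming snapshot to find stale documents.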

Introduce CanonicalImportRequest/Response schemas with scope, snapshot_id,
replace_scope flag, and duplicate external_id validation. Add the endpoint
behind the multi-store write lock. Extend existing mutate endpoint schemas
to accept scope/snapshot_id on individual upsert items.
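
The duplicate external_id check could be sketched as follows. The function names and the choice of `ValueError` are assumptions for illustration; the PR's schemas may perform this validation differently (for example inside a schema validator).

```python
from collections import Counter
from typing import Iterable, List


def find_duplicate_external_ids(external_ids: Iterable[str]) -> List[str]:
    """Return the external_id values that occur more than once, sorted."""
    counts = Counter(external_ids)
    return sorted(eid for eid, count in counts.items() if count > 1)


def validate_unique_external_ids(external_ids: Iterable[str]) -> None:
    """Raise ValueError if any external_id is repeated in the request."""
    duplicates = find_duplicate_external_ids(external_ids)
    if duplicates:
        raise ValueError(f"duplicate external_id values: {duplicates}")


# A request listing "a" twice is rejected before any write happens.
try:
    validate_unique_external_ids(["a", "b", "a"])
    rejected = False
except ValueError:
    rejected = True
```

Rejecting duplicates up front avoids ambiguity about which copy of a document wins within one snapshot.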

Implement execute_import_canonical_sync() use case: batched upsert (256 docs)
with optional replace_scope hard-deletion of stale documents not present in
the current snapshot. Wire as rag-import-canonical CLI entry point and register
in the CLI group.
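
The sync flow described above (batched upsert followed by optional replace_scope deletion) might look roughly like this. It is a sketch against a plain dict, not the actual execute_import_canonical_sync() implementation: `import_canonical`, `ImportSummary`, and `chunked` are hypothetical names, and every existing document is counted as updated because no change detection is performed here.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Sequence

BATCH_SIZE = 256  # batch size mentioned in the commit message


@dataclass
class ImportSummary:
    """Hypothetical result type with insert/update/delete counts."""

    inserted: int = 0
    updated: int = 0
    deleted: int = 0


def chunked(items: Sequence[dict], size: int) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def import_canonical(
    store: Dict[str, dict],
    docs: Sequence[dict],
    scope: str,
    snapshot_id: str,
    replace_scope: bool = False,
) -> ImportSummary:
    summary = ImportSummary()
    # Step 1: upsert the snapshot in batches, tagging each document.
    for batch in chunked(docs, BATCH_SIZE):
        for doc in batch:
            external_id = doc["external_id"]
            if external_id in store:
                summary.updated += 1  # simplification: no change detection
            else:
                summary.inserted += 1
            store[external_id] = {**doc, "scope": scope, "snapshot_id": snapshot_id}
    # Step 2: optionally hard-delete documents in this scope that are
    # absent from the current snapshot. Other scopes are left untouched.
    if replace_scope:
        current_ids = {doc["external_id"] for doc in docs}
        stale: List[str] = [
            eid
            for eid, record in store.items()
            if record.get("scope") == scope and eid not in current_ids
        ]
        for eid in stale:
            del store[eid]
            summary.deleted += 1
    return summary


store = {
    "a": {"external_id": "a", "scope": "wiki", "snapshot_id": "v1"},
    "b": {"external_id": "b", "scope": "wiki", "snapshot_id": "v1"},
    "c": {"external_id": "c", "scope": "blog", "snapshot_id": "v1"},
}
snapshot = [{"external_id": "a"}, {"external_id": "d"}]
summary = import_canonical(store, snapshot, "wiki", "v2", replace_scope=True)
```

In this example "b" is removed because it belongs to the "wiki" scope but is missing from snapshot v2, while "c" survives because it lives in a different scope.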

Document the new import-canonical HTTP endpoint and CLI command with curl
and CLI invocation examples. Note scope/snapshot synchronization semantics
and replace_scope behavior.

@Intrinsical-AI
Owner Author

Unstable; merging to dev first.
