feat(filter): metadata retrieval filter#53
Merged
Intrinsical-AI merged 6 commits into develop on Mar 15, 2026
…gacy metadata fields

Promotes normalize_filter_values, document_field_values, and document_matches_filters into the domain module (retrieval.py). Adds snapshot_id to TOP_LEVEL_FILTER_FIELDS. Removes LEGACY_METADATA_FILTER_FIELDS (path, language, unit_type). Drops the legacy (str, k) overload from RetrieverPort and EvalRetrieverPort; the protocol is now retrieve(request: RetrievalRequest) -> RetrievalResult only. Renames list_docs_page → query_docs, with a filters param, in DocsReadPort.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
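To make the centralised filter helpers concrete, here is a minimal sketch of how the three functions could fit together. Only the function names, TOP_LEVEL_FILTER_FIELDS, and the presence of snapshot_id come from the commit message; the signatures, the dict-shaped document, the other member of the field set, and the matching semantics (AND across fields, OR across values) are assumptions made for illustration.

```python
from typing import Any, Iterable, Mapping, Optional

# Assumed membership: the commit only confirms that snapshot_id is included.
TOP_LEVEL_FILTER_FIELDS = frozenset({"source_id", "snapshot_id"})


def normalize_filter_values(value: Any) -> set[str]:
    """Coerce a scalar or an iterable of scalars into a set of strings."""
    if value is None:
        return set()
    if isinstance(value, str) or not isinstance(value, Iterable):
        return {str(value)}
    return {str(item) for item in value if item is not None}


def document_field_values(doc: Mapping[str, Any], field: str) -> set[str]:
    """Read a field from the document itself, or else from its metadata."""
    if field in TOP_LEVEL_FILTER_FIELDS:
        return normalize_filter_values(doc.get(field))
    metadata = doc.get("metadata") or {}
    return normalize_filter_values(metadata.get(field))


def document_matches_filters(
    doc: Mapping[str, Any], filters: Optional[Mapping[str, Any]]
) -> bool:
    """Assumed semantics: every field must match at least one accepted value."""
    for field, wanted in (filters or {}).items():
        accepted = normalize_filter_values(wanted)
        if accepted and not (document_field_values(doc, field) & accepted):
            return False
    return True


# Hypothetical document used only to exercise the helpers.
example_doc = {
    "source_id": "repo-a",
    "snapshot_id": "s1",
    "metadata": {"team": ["search", "infra"]},
}
print(document_matches_filters(example_doc, {"snapshot_id": "s1", "team": "infra"}))
```

Keeping these helpers in the domain module means in-memory backends can share one definition of "matches", while search-engine backends translate the same filter mapping into their own query syntax.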
…ends

All retrievers (dense, sparse BM25, hybrid, elastic-like, local-split, Solr, reranking, elastic-lexical) now implement retrieve(request: RetrievalRequest) only. Removes all @overload stubs and legacy dispatch branches. local_split.py switches to the domain-level document_matches_filters. Elastic/Solr filter fields updated (snapshot_id in; path, language, and unit_type out). _ElasticLexicalRetriever and _RepoDocsReadPort.query_docs gain filter support.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
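The single-signature contract described above might look roughly like the following. The names RetrieverPort, RetrievalRequest, RetrievalResult, and retrieve come from the commit messages; the field names on both dataclasses and the toy KeywordRetriever backend are hypothetical and exist only to show a backend satisfying the protocol without any overloads.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping, Protocol


@dataclass(frozen=True)
class RetrievalRequest:
    # Field names are assumptions; the real model may differ.
    query: str
    top_k: int = 5
    filters: Mapping[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class RetrievalResult:
    # Assumed shape: parallel lists of ids and scores.
    doc_ids: list[str]
    scores: list[float]


class RetrieverPort(Protocol):
    """One entry point only: no (str, k) overload, no dispatch on argument type."""

    def retrieve(self, request: RetrievalRequest) -> RetrievalResult: ...


class KeywordRetriever:
    """Hypothetical toy backend that scores documents by query-term overlap."""

    def __init__(self, docs: Mapping[str, str]) -> None:
        self._docs = dict(docs)

    def retrieve(self, request: RetrievalRequest) -> RetrievalResult:
        terms = set(request.query.lower().split())
        scored = []
        for doc_id, text in self._docs.items():
            overlap = len(terms & set(text.lower().split()))
            if overlap:
                scored.append((doc_id, float(overlap)))
        # Highest score first; ties broken by document id for determinism.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        top = scored[: request.top_k]
        return RetrievalResult([d for d, _ in top], [s for _, s in top])


def run(retriever: RetrieverPort, request: RetrievalRequest) -> RetrievalResult:
    """Callers depend on the protocol, not on a concrete backend."""
    return retriever.retrieve(request)
```

With one request object, adding a new option such as filters changes the request type rather than every backend's signature.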
New DocsQueryRequest schema with limit, offset, and filters fields. DocumentInDB gains external_id, source_id, and metadata. Routers wired to the new query_docs port method.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
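As a rough illustration of the docs query contract, the sketch below models DocsQueryRequest as a plain dataclass and applies it to an in-memory list. The three field names come from the commit message; the default values, the use of a dataclass rather than the project's actual schema library, the free-function form of query_docs, and the exact-match filter semantics are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping, Sequence


@dataclass(frozen=True)
class DocsQueryRequest:
    # Defaults are assumed; only the field names appear in the commit.
    limit: int = 20
    offset: int = 0
    filters: Mapping[str, Any] = field(default_factory=dict)


def query_docs(
    docs: Sequence[Mapping[str, Any]], request: DocsQueryRequest
) -> list[Mapping[str, Any]]:
    """Filter first, then paginate, so offset counts matching documents only."""

    def matches(doc: Mapping[str, Any]) -> bool:
        for key, wanted in request.filters.items():
            # Look at the document itself, then fall back to its metadata.
            value = doc.get(key, (doc.get("metadata") or {}).get(key))
            if value != wanted:
                return False
        return True

    matching = [doc for doc in docs if matches(doc)]
    return matching[request.offset : request.offset + request.limit]
```

Applying filters before the offset/limit slice keeps pages stable: a client paging through filtered results never receives a short page merely because non-matching rows consumed part of the window.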
Adapts all tests to the unified retrieve(RetrievalRequest) signatures and the new docs query/filter contract.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Updates test_dense_edgecases, test_hybrid_weighting, test_retrievers, test_sparse_empty, test_sparse_in_memory_cache, test_sparse_tokenization, test_composition, and the dense/hybrid e2e to use RetrievalRequest instead of the legacy (str, k) overload. Edge-case tests for blank query and top_k=0 now assert ValueError at RetrievalRequest construction.
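The edge-case behaviour described here, where invalid input is rejected when the request is built rather than inside each retriever, could be sketched as follows. The class name and the ValueError on blank query and top_k=0 come from the commit message; the field names, the default, the error messages, and the raises_value_error helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalRequest:
    # Assumed fields; the real model may carry more (for example, filters).
    query: str
    top_k: int = 5

    def __post_init__(self) -> None:
        # Validation runs at construction, so no backend ever sees bad input.
        if not self.query or not self.query.strip():
            raise ValueError("query must not be blank")
        if self.top_k < 1:
            raise ValueError("top_k must be a positive integer")


def raises_value_error(**kwargs) -> bool:
    """Hypothetical helper: True if building the request raises ValueError."""
    try:
        RetrievalRequest(**kwargs)
    except ValueError:
        return True
    return False


# The two edge cases named in the commit message, plus a valid request.
assert raises_value_error(query="   ", top_k=3)
assert raises_value_error(query="filters", top_k=0)
assert not raises_value_error(query="filters", top_k=1)
```

In a pytest suite the same checks would more commonly be written with pytest.raises(ValueError) around the constructor call.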
ef745ff docs: update README and USAGE for metadata filter fields
cd82ff0 test: update unit/integration/e2e for new retriever API and docs filter
11f11e0 feat(docs-api): add filter support to docs listing endpoint
46306c1 refactor(retrievers): drop legacy str-query overloads across all backends
5ec14fd refactor(domain): centralise filter helpers, add snapshot_id, drop legacy metadata fields