Summary
Generate typed PostgreSQL tables from Schema FieldDefinition[] (mirroring the existing feature-table DDL pattern) and extend the existing /discovery/* REST endpoints to support compound filters (nested AND/OR), pgvector similarity, and full-text search. This gives records, metadata, and features a unified typed query surface using the DDD/CQRS machinery that already exists.
Context
- Records have metadata (currently JSONB in records.metadata) and features (typed PG tables in features.*)
- Feature tables are already generated dynamically via build_feature_table() in server/osa/infrastructure/persistence/feature_table.py
- Current discovery surface: POST /discovery/records + POST /discovery/features/{hook} — flat AND filters with EQ / CONTAINS / GTE / LTE, keyset pagination
- DiscoveryReadStore protocol in server/osa/domain/discovery/port/read_store.py already abstracts the read path — the service (service/discovery.py) does validation + operator whitelisting
- Gap: metadata is raw JSONB (no typed filtering), no OR/compound logic, no vector or full-text operators, no cross-table joins
The expressive query surface (GraphQL vs MCP vs typed REST DSL) for LLM and agent consumers is tracked separately in #125 — that decision builds on top of this one.
The metadata gap
records.metadata is JSONB. You can filter via PG JSONB containment but you cannot express typed range queries, cannot index effectively, and cannot expose typed filters through any auto-generated API layer.
Fix: generate typed metadata tables from Schema FieldDefinition[], same pattern as the existing feature tables. A convention with a schema defining species: term, resolution: number, title: text gets a metadata.<convention_name> table with real typed columns.
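As an illustrative sketch only, the example convention above might compile to DDL like this. The emit_metadata_ddl helper, the record_id primary key, and the term-to-TEXT mapping are all assumptions; the real implementation would go through the same column mapper as build_feature_table():

```python
# Hypothetical sketch: compile a convention's schema fields into a typed
# metadata.<convention_name> table. The type map is an assumption; the
# real code would reuse the column mapper behind build_feature_table().
FIELD_TYPE_TO_PG = {
    "term": "TEXT",               # assumption: ontology terms stored as text
    "number": "DOUBLE PRECISION",
    "text": "TEXT",
}

def emit_metadata_ddl(convention_name: str, fields: dict[str, str]) -> str:
    cols = ",\n  ".join(
        f"{name} {FIELD_TYPE_TO_PG[ftype]}" for name, ftype in fields.items()
    )
    return (
        f"CREATE TABLE metadata.{convention_name} (\n"
        f"  record_id UUID PRIMARY KEY REFERENCES records(id),\n"  # assumed key
        f"  {cols}\n)"
    )

ddl = emit_metadata_ddl(
    "imaging_v1",  # hypothetical convention name
    {"species": "term", "resolution": "number", "title": "text"},
)
print(ddl)
```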
Dynamic tables today
Feature tables use build_feature_table() with column type mapping in column_mapper.py:
| JSON type | Format | PostgreSQL type |
|-----------|-----------|-----------------|
| string | — | Text |
| string | date-time | DateTime(tz) |
| string | date | Date |
| string | uuid | Uuid |
| number | — | Float(53) |
| integer | — | BigInteger |
| boolean | — | Boolean |
| array | — | JSONB |
| object | — | JSONB |
Feature tables live in the features PG schema, tracked in a feature_tables catalog table. Metadata tables follow the identical pattern in a metadata PG schema, tracked in a metadata_tables catalog.
Query DSL extensions
Extend Filter and DiscoveryReadStore to support:
- Compound groups — nested {and: [...]} / {or: [...]} / {not: ...} trees instead of a flat AND list
- Cross-table join — filter records by feature column values (e.g. records where features.cell_classifier.confidence > 0.9)
- Vector similarity — pgvector operator as a first-class filter + orderBy (requires a vector column on the relevant table)
- Full-text — tsvector @@ operator with ts_rank ordering
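A compound request body might look like the dict below, and the recursive walk sketches the operator whitelisting DiscoveryService already does for flat filters. The exact payload keys (filter, field, op, value) and any operators beyond the existing EQ / CONTAINS / GTE / LTE are assumptions:

```python
# Hypothetical POST /discovery/records body with a nested AND/OR tree.
payload = {
    "filter": {
        "and": [
            {"field": "species", "op": "EQ", "value": "mus musculus"},
            {"or": [
                {"field": "resolution", "op": "GTE", "value": 0.5},
                {"field": "title", "op": "CONTAINS", "value": "cortex"},
            ]},
        ]
    }
}

ALLOWED_OPS = {"EQ", "CONTAINS", "GTE", "LTE"}  # existing whitelist

def validate(node: dict) -> None:
    """Recursive operator whitelisting over a compound filter tree
    (sketch of what DiscoveryService would do)."""
    if "and" in node or "or" in node:
        for child in node.get("and", []) + node.get("or", []):
            validate(child)
    elif "not" in node:
        validate(node["not"])
    elif node["op"] not in ALLOWED_OPS:
        raise ValueError(f"operator not allowed: {node['op']}")

validate(payload["filter"])  # the payload above passes
```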
Validation lives in DiscoveryService (already the pattern). SQL compilation lives in the PostgresDiscoveryReadStore adapter (already the pattern). No new service, no new process.
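A minimal sketch of the adapter-side compilation, using raw parameterized strings for illustration only (the real PostgresDiscoveryReadStore would emit SQLAlchemy expressions, field names would already be whitelisted by the service, and the CONTAINS-to-ILIKE spelling is an assumption):

```python
# Sketch: compile a compound filter tree to a parameterized WHERE clause.
# Field names are assumed pre-validated; the real adapter builds
# SQLAlchemy expressions rather than strings.
OP_SQL = {"EQ": "=", "GTE": ">=", "LTE": "<=", "CONTAINS": "ILIKE"}

def compile_filter(node: dict, params: list) -> str:
    if "and" in node or "or" in node:
        joiner = " AND " if "and" in node else " OR "
        children = node.get("and", node.get("or", []))
        return "(" + joiner.join(compile_filter(c, params) for c in children) + ")"
    if "not" in node:
        return "NOT " + compile_filter(node["not"], params)
    params.append(node["value"])
    return f"{node['field']} {OP_SQL[node['op']]} %s"

params: list = []
where = compile_filter(
    {"and": [
        {"field": "species", "op": "EQ", "value": "mus musculus"},
        {"or": [
            {"field": "resolution", "op": "GTE", "value": 0.5},
            {"field": "title", "op": "CONTAINS", "value": "%cortex%"},
        ]},
    ]},
    params,
)
print(where)
# -> (species = %s AND (resolution >= %s OR title ILIKE %s))
```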
Implementation steps
Phase 1: Typed metadata tables
- Generate typed metadata tables from Schema FieldDefinition[] (replicate the feature-table DDL pattern in a metadata PG schema)
- Track them in a metadata_tables catalog (mirrors feature_tables)
- Backfill from the existing records.metadata JSONB
Phase 2: Compound filter DSL
- Extend the Filter value object in domain/discovery/model/value.py
- Update DiscoveryService to walk compound trees
- Update PostgresDiscoveryReadStore to compile compound filters to SQLAlchemy
- Update the POST /discovery/records and POST /discovery/features/{hook} request schemas
Phase 3: Cross-table joins
- Filter records by feature column values (features.<hook>.<column>)
Phase 4: Vector + full-text operators
Phase 5: Parquet export
- GET /export/conventions/{srn}/records.parquet — query metadata + feature tables, stream Parquet via PyArrow
Depends on
Related