
feat: typed metadata tables + expressive REST query DSL #76

@rorybyrne


Summary

Generate typed PostgreSQL tables from Schema FieldDefinition[] (mirroring the existing feature-table DDL pattern) and extend the existing /discovery/* REST endpoints to support compound filters (nested AND/OR), pgvector similarity, and full-text search. This gives records, metadata, and features a unified typed query surface using the DDD/CQRS machinery that already exists.

Context

  • Records have metadata (currently JSONB in records.metadata) and features (typed PG tables in features.*)
  • Feature tables are already generated dynamically via build_feature_table() in server/osa/infrastructure/persistence/feature_table.py
  • Current discovery surface: POST /discovery/records + POST /discovery/features/{hook} — flat AND filters with EQ / CONTAINS / GTE / LTE, keyset pagination
  • DiscoveryReadStore protocol in server/osa/domain/discovery/port/read_store.py already abstracts the read path — the service (service/discovery.py) does validation + operator whitelisting
  • Gap: metadata is raw JSONB (no typed filtering), no OR/compound logic, no vector or full-text operators, no cross-table joins

The expressive query surface (GraphQL vs MCP vs typed REST DSL) for LLM and agent consumers is tracked separately in #125 — that decision builds on top of this one.

The metadata gap

records.metadata is JSONB. You can filter via PG JSONB containment but you cannot express typed range queries, cannot index effectively, and cannot expose typed filters through any auto-generated API layer.

Fix: generate typed metadata tables from Schema FieldDefinition[], same pattern as the existing feature tables. A convention with a schema defining species: term, resolution: number, title: text gets a metadata.<convention_name> table with real typed columns.
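A minimal sketch of the DDL this could generate, assuming the example schema above. `PG_TYPES` and `build_metadata_ddl` are hypothetical names; the real implementation would reuse the `build_feature_table()` machinery rather than emit raw strings:

```python
# Hypothetical sketch only: derive CREATE TABLE DDL for a metadata table
# from a convention's field definitions. Type names are illustrative.
PG_TYPES = {"term": "TEXT", "number": "DOUBLE PRECISION", "text": "TEXT"}

def build_metadata_ddl(convention_name: str, fields: dict) -> str:
    # One typed column per FieldDefinition, plus an FK back to records.
    cols = ",\n  ".join(
        f'"{name}" {PG_TYPES[ftype]}' for name, ftype in fields.items()
    )
    return (
        f'CREATE TABLE metadata."{convention_name}" (\n'
        f"  record_id UUID PRIMARY KEY REFERENCES records (id),\n"
        f"  {cols}\n"
        f")"
    )

ddl = build_metadata_ddl(
    "bird_survey",  # illustrative convention name
    {"species": "term", "resolution": "number", "title": "text"},
)
```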

Dynamic tables today

Feature tables use build_feature_table() with column type mapping in column_mapper.py:

| JSON type | Format    | PostgreSQL type  |
|-----------|-----------|------------------|
| string    |           | Text             |
| string    | date-time | DateTime(tz)     |
| string    | date      | Date             |
| string    | uuid      | Uuid             |
| number    |           | Float(53)        |
| integer   |           | BigInteger       |
| boolean   |           | Boolean          |
| array     |           | JSONB            |
| object    |           | JSONB            |
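The dispatch above can be sketched as a `(JSON type, format)` lookup. This is an illustrative reconstruction, not the actual `column_mapper.py` code, and SQLAlchemy types are shown as strings to keep it self-contained:

```python
# Illustrative (json_type, format) -> PostgreSQL type lookup mirroring
# the table above. The real mapper returns SQLAlchemy type objects.
COLUMN_TYPE_MAP = {
    ("string", None): "Text",
    ("string", "date-time"): "DateTime(timezone=True)",
    ("string", "date"): "Date",
    ("string", "uuid"): "Uuid",
    ("number", None): "Float(53)",
    ("integer", None): "BigInteger",
    ("boolean", None): "Boolean",
    ("array", None): "JSONB",
    ("object", None): "JSONB",
}

def pg_type(json_type: str, fmt=None) -> str:
    # Unrecognized formats fall back to the bare JSON type's mapping.
    return COLUMN_TYPE_MAP.get((json_type, fmt)) or COLUMN_TYPE_MAP[(json_type, None)]
```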

Feature tables live in the features PG schema, tracked in a feature_tables catalog table. Metadata tables will follow the same pattern in a metadata PG schema, tracked in a metadata_tables catalog.

Query DSL extensions

Extend Filter and DiscoveryReadStore to support:

  1. Compound groups — nested {and: [...]} / {or: [...]} / {not: ...} trees instead of a flat AND list
  2. Cross-table join — filter records by feature column values (e.g. records where features.cell_classifier.confidence > 0.9)
  3. Vector similarity — pgvector operator as a first-class filter + orderBy (requires a vector column on the relevant table)
  4. Full-text — tsvector @@ operator with ts_rank ordering
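Taken together, a request body might look like the following. Field names, operator spellings, and the orderBy shape are all assumptions for illustration, not a committed API:

```python
# Hypothetical POST /discovery/records body once Phases 2-4 land.
query = {
    "filter": {
        "and": [
            {"field": "metadata.species", "op": "EQ", "value": "Larus argentatus"},
            {
                "or": [
                    {"field": "metadata.resolution", "op": "GTE", "value": 10},
                    # Cross-table reference into a feature table (Phase 3).
                    {"field": "features.cell_classifier.confidence", "op": "GTE", "value": 0.9},
                ]
            },
            {"not": {"field": "metadata.title", "op": "CONTAINS", "value": "draft"}},
        ]
    },
    # Vector-similarity ordering (Phase 4); shape is illustrative.
    "orderBy": {"similarity": {"field": "embedding", "vector": [0.12, -0.04, 0.91]}},
    "limit": 50,
}
```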

Validation lives in DiscoveryService (already the pattern). SQL compilation lives in the PostgresDiscoveryReadStore adapter (already the pattern). No new service, no new process.

Implementation steps

Phase 1: Typed metadata tables

  • Generate metadata PG tables from Schema FieldDefinition[] (replicate feature-table DDL pattern in a metadata PG schema)
  • Add metadata_tables catalog (mirrors feature_tables)
  • FK constraint from feature tables to records (refactor: add foreign key constraint from feature tables to records #75)
  • FK constraint from metadata tables to records
  • Populate metadata tables during record publication (ingest + deposition paths)
  • Migrate existing records' JSONB metadata → typed tables (one-time)
  • Deprecate direct reads from records.metadata JSONB
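One possible shape for the one-time JSONB backfill, continuing the hypothetical convention from earlier. Table and column names are assumptions, and in practice the statement would be generated per convention and scoped to that convention's records:

```python
# Hypothetical one-time migration: copy typed values out of the
# records.metadata JSONB column into the new typed table. In practice
# this would be generated per convention and filtered to its records.
BACKFILL_SQL = """
INSERT INTO metadata."bird_survey" (record_id, species, resolution, title)
SELECT id,
       metadata->>'species',
       (metadata->>'resolution')::double precision,
       metadata->>'title'
FROM records
ON CONFLICT (record_id) DO NOTHING
"""
```

The `ON CONFLICT ... DO NOTHING` guard makes the backfill idempotent, so it can be rerun safely if the migration is interrupted.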

Phase 2: Compound filter DSL

  • Design the filter AST (nested AND/OR/NOT groups over existing operators)
  • Extend Filter value object in domain/discovery/model/value.py
  • Extend validation in DiscoveryService to walk compound trees
  • Extend PostgresDiscoveryReadStore to compile compound filters to SQLAlchemy
  • Update POST /discovery/records and POST /discovery/features/{hook} request schemas
  • Maintain backward compat with the current flat-AND shape for one release
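The AST and the validation walk might look like this. Class names are assumptions, not the actual value objects in domain/discovery/model/value.py, and the operator whitelist is the current flat-filter set:

```python
from dataclasses import dataclass

# Sketch of a possible Phase 2 filter AST; names are illustrative.
@dataclass(frozen=True)
class Condition:
    field: str
    op: str
    value: object

@dataclass(frozen=True)
class Group:
    kind: str        # "and" | "or" | "not"
    children: tuple  # Condition or Group nodes

ALLOWED_OPS = {"EQ", "CONTAINS", "GTE", "LTE"}

def validate(node) -> None:
    # Walk the compound tree, whitelisting operators the same way
    # DiscoveryService validates flat filters today.
    if isinstance(node, Condition):
        if node.op not in ALLOWED_OPS:
            raise ValueError(f"operator not allowed: {node.op}")
    else:
        if node.kind == "not" and len(node.children) != 1:
            raise ValueError("'not' takes exactly one child")
        for child in node.children:
            validate(child)
```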

Phase 3: Cross-table joins

  • Allow record filters to reference feature-table columns (e.g. features.<hook>.<column>)
  • Compile to SQL JOINs with proper indexing
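Illustrative only: how a `features.<hook>.<column>` reference might be split and compiled. Real compilation would go through SQLAlchemy inside PostgresDiscoveryReadStore, not raw strings, and would need join aliasing once multiple hooks appear in one query:

```python
# Hypothetical sketch: turn a features.<hook>.<column> path into
# JOIN + WHERE fragments (single-hook case, no aliasing).
def compile_feature_filter(path: str, op: str, param: str) -> tuple:
    schema, hook, column = path.split(".", 2)
    if schema != "features":
        raise ValueError(f"not a feature reference: {path}")
    join = f'JOIN features."{hook}" f ON f.record_id = records.id'
    where = f'f."{column}" {op} {param}'
    return join, where
```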

Phase 4: Vector + full-text operators

  • Add pgvector similarity filter + orderBy
  • Add tsvector full-text filter with ts_rank ordering
  • Document indexing requirements for each
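A sketch of the SQL fragments each Phase 4 operator would compile to, with the index each one needs noted alongside. Placeholder names are illustrative:

```python
# Hypothetical SQL fragments for the Phase 4 operators.

def vector_order_by(column: str) -> str:
    # pgvector cosine-distance ordering; wants an HNSW or IVFFlat
    # index on the vector column to avoid a sequential scan.
    return f"ORDER BY {column} <=> %(query_vec)s::vector"

def fulltext_where(column: str) -> str:
    # tsvector match; wants a GIN index on to_tsvector('english', column),
    # or a stored/generated tsvector column, plus ts_rank for ordering.
    return (
        f"WHERE to_tsvector('english', {column}) "
        f"@@ plainto_tsquery('english', %(q)s)"
    )
```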

Phase 5: Parquet export

  • GET /export/conventions/{srn}/records.parquet — query metadata + feature tables, stream Parquet via PyArrow
  • Optional filter query parameters (reuse the compound DSL)

Depends on

Related

Metadata

Assignees

No one assigned

Labels

  • design-needed: Needs architectural discussion before implementation
  • feature: New functionality
  • infrastructure: CI, Docker, deployment, migrations
