feat: typed metadata tables + compound discovery DSL (#76) #128
Merged
Commits (9):
- `8167cea` feat: add compound filter expressions and typed metadata tables (rorybyrne)
- `a356770` refactor: replace schema SRN with short-form schema ID throughout system (rorybyrne)
- `7041682` refactor: replace async metadata projection with synchronous dual-write (rorybyrne)
- `7de6cea` refactor: remove data backfill from database migrations for greenfiel… (rorybyrne)
- `044fb9d` fix: add SQL injection protection for PostgreSQL identifiers (rorybyrne)
- `bb27aa9` fix: handle NULL values in SQL filter operations to prevent silent re… (rorybyrne)
- `3cabcfa` fix: replace assert statements with explicit TypeError exceptions (rorybyrne)
- `f065491` feat: improve error handling for date/datetime coercion in metadata s… (rorybyrne)
- `2ad7d8b` feat: add HookName type with 40-char limit for PostgreSQL compatibility (rorybyrne)
server/migrations/versions/076_add_feature_tables_record_srn_fks.py (95 additions, 0 deletions)

```python
"""076_add_feature_tables_record_srn_fks

For each row currently registered in the ``public.feature_tables`` catalog,
add a foreign-key constraint on ``features.<hook>.record_srn`` referencing
``records.srn`` with ``ON DELETE CASCADE``. Bundles GitHub #75.

Idempotent: skips any hook whose FK is already present (detected by naming
convention). No-op on greenfield deployments where the catalog is empty.

Revision ID: 076_feature_fks
Revises: 076_records_schema_srn
Create Date: 2026-04-19

"""

import re
from typing import Sequence, Union

from alembic import op

# revision identifiers, used by Alembic.
revision: str = "076_feature_fks"
down_revision: Union[str, Sequence[str], None] = "076_records_schema_srn"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


FK_NAME_TEMPLATE = "fk_features_{hook}_record_srn"

# Defense-in-depth: hook names read from ``feature_tables`` are interpolated
# into raw DDL below. Application code constrains hooks to this shape at write
# time, but the migration should not trust that invariant — a stray ``"`` in a
# stored name would break out of quoting. Mirrors the ``_safe_ident`` check in
# ``osa.infrastructure.persistence.metadata_store``.
_PG_IDENT_RE = re.compile(r"^[a-z][a-z0-9_]{0,62}$")


def _safe_ident(name: str) -> str:
    if not _PG_IDENT_RE.match(name):
        raise ValueError(f"Refusing to interpolate unsafe PG identifier {name!r} into DDL")
    return name


def upgrade() -> None:
    conn = op.get_bind()
    rows = conn.execute(_select_hooks()).fetchall()

    for row in rows:
        hook = _safe_ident(row[0])
        fk_name = _safe_ident(FK_NAME_TEMPLATE.format(hook=hook))
        exists = conn.execute(_check_constraint(fk_name)).scalar()
        if exists:
            continue

        conn.execute(_add_fk_sql(hook, fk_name))


def downgrade() -> None:
    conn = op.get_bind()
    rows = conn.execute(_select_hooks()).fetchall()
    for row in rows:
        hook = _safe_ident(row[0])
        fk_name = _safe_ident(FK_NAME_TEMPLATE.format(hook=hook))
        exists = conn.execute(_check_constraint(fk_name)).scalar()
        if not exists:
            continue
        conn.execute(_drop_fk_sql(hook, fk_name))


def _select_hooks():
    from sqlalchemy import text

    return text("SELECT hook_name FROM feature_tables")


def _check_constraint(fk_name: str):
    from sqlalchemy import text

    return text("SELECT 1 FROM pg_constraint WHERE conname = :fk_name").bindparams(fk_name=fk_name)


def _add_fk_sql(hook: str, fk_name: str):
    from sqlalchemy import text

    return text(
        f'ALTER TABLE features."{hook}" '
        f'ADD CONSTRAINT "{fk_name}" '
        f"FOREIGN KEY (record_srn) REFERENCES records(srn) ON DELETE CASCADE"
    )


def _drop_fk_sql(hook: str, fk_name: str):
    from sqlalchemy import text

    return text(f'ALTER TABLE features."{hook}" DROP CONSTRAINT "{fk_name}"')
```
server/migrations/versions/076_add_metadata_schema_and_catalog.py (47 additions, 0 deletions)

```python
"""076_add_metadata_schema_and_catalog

Create the ``metadata`` PostgreSQL schema and the ``public.metadata_tables``
catalog table. Dynamic per-schema metadata tables will live inside the
``metadata`` schema; the catalog indexes them by short schema id + major.

Revision ID: 076_metadata_catalog
Revises: add_deliver_after
Create Date: 2026-04-19

"""

from typing import Sequence, Union

import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects.postgresql import JSONB

# revision identifiers, used by Alembic.
revision: str = "076_metadata_catalog"
down_revision: Union[str, Sequence[str], None] = "add_deliver_after"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:
    op.execute('CREATE SCHEMA IF NOT EXISTS "metadata"')

    op.create_table(
        "metadata_tables",
        sa.Column("id", sa.Integer(), primary_key=True, autoincrement=True),
        sa.Column("schema_id", sa.Text(), nullable=False),
        sa.Column("schema_slug", sa.Text(), nullable=False),
        sa.Column("schema_major", sa.Integer(), nullable=False),
        sa.Column("schema_versions", JSONB(), nullable=False),
        sa.Column("pg_table", sa.Text(), nullable=False),
        sa.Column("metadata_schema", JSONB(), nullable=False),
        sa.Column("created_at", sa.DateTime(timezone=True), nullable=False),
        sa.Column("updated_at", sa.DateTime(timezone=True), nullable=False),
        sa.UniqueConstraint("schema_id", "schema_major", name="uq_metadata_tables_id_major"),
        sa.UniqueConstraint("pg_table", name="uq_metadata_tables_pg_table"),
    )


def downgrade() -> None:
    op.drop_table("metadata_tables")
    op.execute('DROP SCHEMA IF EXISTS "metadata" CASCADE')
```
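For illustration, a catalog registration against this table might look like the sketch below. The schema id, slug, and target table name are invented values, and the real write path presumably goes through the application's metadata store rather than a hand-built statement; this only shows the shape of a row keyed by `(schema_id, schema_major)`:

```python
import sqlalchemy as sa

# Lightweight table handle, enough to render an INSERT; column types are
# irrelevant here because the statement is compiled, not executed.
metadata_tables = sa.table(
    "metadata_tables",
    sa.column("schema_id"),
    sa.column("schema_slug"),
    sa.column("schema_major"),
    sa.column("pg_table"),
)

# Hypothetical entry: major version 1 of a "sensor_reading" schema whose
# typed rows would live in the dynamic table metadata."sensor_reading_v1".
stmt = metadata_tables.insert().values(
    schema_id="sensor_reading",
    schema_slug="sensor-reading",
    schema_major=1,
    pg_table="sensor_reading_v1",
)

print(stmt)  # INSERT INTO metadata_tables (...) VALUES (...)
```

The two unique constraints then guarantee that each `(schema_id, schema_major)` pair maps to exactly one physical table, and that no physical table is claimed twice.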
```python
"""076_add_records_schema_id

Add ``records.schema_id`` + ``records.schema_version`` so a Record's typed
linkage is first-class (FR-008).

Greenfield only: no backfill from the linked convention. If this runs
against a populated ``records`` table it fails at ``SET NOT NULL`` with a
clear constraint error, which is the correct signal that the data predates
this schema.

Revision ID: 076_records_schema_srn
Revises: 076_schemas_to_id
Create Date: 2026-04-19

"""

from typing import Sequence, Union

import sqlalchemy as sa
from alembic import op

# revision identifiers, used by Alembic.
revision: str = "076_records_schema_srn"
down_revision: Union[str, Sequence[str], None] = "076_schemas_to_id"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:
    op.add_column("records", sa.Column("schema_id", sa.Text(), nullable=True))
    op.add_column("records", sa.Column("schema_version", sa.Text(), nullable=True))
    op.alter_column("records", "schema_id", nullable=False)
    op.alter_column("records", "schema_version", nullable=False)
    op.create_index("idx_records_schema_id", "records", ["schema_id"])


def downgrade() -> None:
    op.drop_index("idx_records_schema_id", table_name="records")
    op.drop_column("records", "schema_version")
    op.drop_column("records", "schema_id")
```
```python
"""076_schemas_to_id

Replace URN-keyed ``schemas`` and ``conventions`` columns with short-form
``(id, version)`` pairs. After this migration, internal code works entirely
in ``SchemaId``; full URNs are reserved for federation edges.

Changes:
- ``schemas.srn`` → ``schemas.id`` + ``schemas.version``. Composite PK.
- ``conventions.schema_srn`` → ``conventions.schema_id`` + ``conventions.schema_version``.

Greenfield only: no backfill from the old URN columns. If this runs against
a populated DB it fails at ``SET NOT NULL`` with a clear constraint error,
which is the correct signal that the data predates this schema.

Revision ID: 076_schemas_to_id
Revises: 076_metadata_catalog
Create Date: 2026-04-20

"""

from typing import Sequence, Union

import sqlalchemy as sa
from alembic import op

# revision identifiers, used by Alembic.
revision: str = "076_schemas_to_id"
down_revision: Union[str, Sequence[str], None] = "076_metadata_catalog"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:
    # schemas: drop old SRN PK, add id + version, recompose PK.
    op.add_column("schemas", sa.Column("id", sa.String(), nullable=True))
    op.add_column("schemas", sa.Column("version", sa.String(), nullable=True))
    op.alter_column("schemas", "id", nullable=False)
    op.alter_column("schemas", "version", nullable=False)
    op.drop_constraint("schemas_pkey", "schemas", type_="primary")
    op.drop_column("schemas", "srn")
    op.create_primary_key("schemas_pkey", "schemas", ["id", "version"])
    op.create_index("idx_schemas_id", "schemas", ["id"])

    # conventions: split schema_srn into schema_id + schema_version.
    op.add_column("conventions", sa.Column("schema_id", sa.String(), nullable=True))
    op.add_column("conventions", sa.Column("schema_version", sa.String(), nullable=True))
    op.alter_column("conventions", "schema_id", nullable=False)
    op.alter_column("conventions", "schema_version", nullable=False)
    op.drop_column("conventions", "schema_srn")


def downgrade() -> None:
    # conventions back to schema_srn
    op.add_column("conventions", sa.Column("schema_srn", sa.String(), nullable=True))
    op.alter_column("conventions", "schema_srn", nullable=False)
    op.drop_column("conventions", "schema_version")
    op.drop_column("conventions", "schema_id")

    # schemas back to srn
    op.drop_index("idx_schemas_id", table_name="schemas")
    op.drop_constraint("schemas_pkey", "schemas", type_="primary")
    op.add_column("schemas", sa.Column("srn", sa.String(), nullable=True))
    op.alter_column("schemas", "srn", nullable=False)
    op.create_primary_key("schemas_pkey", "schemas", ["srn"])
    op.drop_column("schemas", "version")
    op.drop_column("schemas", "id")
```
Review comment: `FK_NAME_TEMPLATE = "fk_features_{hook}_record_srn"` has 23 chars of fixed overhead (`"fk_features_"` = 12, `"_record_srn"` = 11). The `_PG_IDENT_RE` regex allows hook names up to 63 chars, so the rendered FK name can reach 86 chars. `_safe_ident(fk_name)` will reject any name over 63 chars with a `ValueError`, aborting the migration for any hook whose name exceeds 40 characters. Consider truncating or hashing the hook name when the full FK name would overflow:
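One way that could look (a sketch, not code from the PR; the helper name `fk_name_for` and the 8-hex-char digest length are my own choices) is to fall back to a truncated-hook-plus-digest form whenever the rendered name would exceed the limit:

```python
import hashlib

FK_NAME_TEMPLATE = "fk_features_{hook}_record_srn"
PG_MAX_IDENT = 63  # PostgreSQL silently truncates identifiers past 63 bytes


def fk_name_for(hook: str) -> str:
    """Render the FK name, falling back to a hashed form when the
    template plus the full hook would overflow the identifier limit."""
    name = FK_NAME_TEMPLATE.format(hook=hook)
    if len(name) <= PG_MAX_IDENT:
        return name
    # Keep a recognizable prefix of the hook, then disambiguate with a
    # short digest of the full hook so distinct long names never collide.
    digest = hashlib.sha256(hook.encode()).hexdigest()[:8]
    fixed = len(FK_NAME_TEMPLATE.format(hook=""))  # 23 chars of template overhead
    prefix = hook[: PG_MAX_IDENT - fixed - len(digest) - 1]
    return FK_NAME_TEMPLATE.format(hook=f"{prefix}_{digest}")
```

The PR itself appears to take the other route: the final commit (`2ad7d8b`, the 40-char `HookName` limit) caps hook names so the worst-case rendered name is exactly 12 + 40 + 11 = 63 chars and never overflows.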