Skip to content

refactor: unify PG identifier safety behind a typed value object #130

@rorybyrne

Description

@rorybyrne

Context

PR #128 introduces dynamic PG DDL keyed on user-chosen strings (schema ids, hook names, field names, feature column names). Every composition site re-invents its own safety check, and review rounds keep finding gaps at the boundaries where user strings compose into PG identifiers.

Current state after the #128 review rounds:

  • SchemaIdentifier (3-64 chars, [a-z][a-z0-9-]) enforces charset only.
  • schema_slug() converts + enforces its own length regex; check_pg_table_name() guards the {slug}_v{major} composition.
  • HookName (40 chars) caps hooks to leave room for fk_features_{name}_record_srn.
  • PgIdentifier (63 chars) on ColumnDef.name — columns don't compose into longer names.
  • _safe_ident defense-in-depth in two migrations + the metadata store.

That's five separate enforcement sites with subtly different rules. When a new derivation site appears (e.g. a future index name, constraint name, or cross-hook composition), the invariant is not enforced by the type system — only by whatever guard that site happens to add.

Proposal

Introduce a single typed abstraction for PG-safe identifiers that encodes:

  1. The charset / leading-character rule.
  2. The max-length budget for this category, which accounts for the worst-case derived identifier the value will be composed into.
  3. A composition helper (pg_compose(prefix, ident, suffix)) that re-checks the final length against PG's 63-char limit.

Rough shape:

```python
class PgIdent(Protocol):
MAX_LEN: ClassVar[int]
def as_pg_str(self) -> str: ...

class SchemaIdent(PgIdent): # 58 chars — leaves room for _v<major>
class HookIdent(PgIdent): # 40 chars — leaves room for fk_features_{}_record_srn
class ColumnIdent(PgIdent): # 63 chars — standalone, no composition

def pg_compose(*parts: str) -> str:
joined = "_".join(parts)
if len(joined) > 63:
raise ValidationError(...)
return joined
```

Then collapse _safe_ident, schema_slug, check_pg_table_name, and the ad-hoc regexes on each type into this one abstraction.

Why this matters

The review churn on #128 isn't a quality problem with that PR — it's a signal that the abstraction is missing. Each new derivation point is a new silent-failure risk until someone notices and adds another guard.

Out of scope

  • Changing how schemas / hooks / columns are named externally (wire contract stays).
  • Runtime migrations — existing names already satisfy the planned caps.
  • Changing the FK naming convention (fk_features_{hook}_record_srn).

Acceptance

  • One PgIdent protocol or ABC, and a pg_compose helper.
  • SchemaIdentifier, HookName (currently str), and ColumnDef.name replaced by the corresponding typed value objects.
  • _safe_ident, schema_slug, and check_pg_table_name either deleted or delegated to the new abstraction.
  • Migrations that derive PG identifiers from catalog rows use pg_compose rather than their own regex guard.
  • A single place to answer "what's the budget for this name category?".

Metadata

Metadata

Assignees

No one assigned

    Labels

    refactorInternal restructuring, no behavior change

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions