
refactor: make Record source-agnostic with RecordSource discriminated union #103

@rorybyrne

Summary

Make the Record aggregate source-agnostic so it can be created from any pathway (deposition, harvest, fork, import) without coupling to DepositionSRN.

Motivation

The harvest domain needs to create records directly from bulk-ingested data without going through the deposition lifecycle. Currently, Record has deposition_srn: DepositionSRN as a required field, and the entire downstream chain (RecordPublished event, InsertRecordFeatures handler, FeatureService) is coupled to depositions.

Changes

1. RecordSource discriminated union (new shared model)

from typing import Literal

class RecordSource(ValueObject):
    type: str   # discriminator: "deposition" | "harvest" | "fork" | "import"
    id: str     # idempotency key, unique per (convention_srn, type)

class DepositionSource(RecordSource):
    type: Literal["deposition"] = "deposition"

class HarvestSource(RecordSource):
    type: Literal["harvest"] = "harvest"
    harvest_run_srn: str
    upstream_source: str  # "pdb", "geo"

class ForkSource(RecordSource):
    type: Literal["fork"] = "fork"
    source_node: str

class ImportSource(RecordSource):
    type: Literal["import"] = "import"
    source_node: str

2. Record aggregate (simplified)

from datetime import datetime
from typing import Any

class Record(Aggregate):
    srn: RecordSRN
    convention_srn: ConventionSRN
    source: RecordSource  # one of the RecordSource subtypes; stored as JSONB
    metadata: dict[str, Any]
    published_at: datetime

Removed fields: deposition_srn, indexes.

3. Records table schema change

Alembic migration:

  • Drop the deposition_srn column and its index
  • Drop the indexes column
  • Add a convention_srn TEXT NOT NULL column
  • Add a source JSONB NOT NULL column
  • Add a unique expression index on (convention_srn, (source->>'type'), (source->>'id'))

No backfill or backwards compatibility is needed because no production data exists.
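The unique expression index is what makes re-running a harvest or retrying a publish idempotent: a second row with the same convention and the same source (type, id) is rejected. The sketch below demonstrates this with SQLite from the standard library as a stand-in for Postgres, so it runs without a database server. SQLite stores the JSON as TEXT and uses json_extract() where Postgres uses source->>'type', and it assumes an SQLite build with JSON functions (built in since SQLite 3.38). The table layout mirrors the Record fields above; the index name, insert_record helper, and SRN strings are hypothetical. In Postgres the equivalent insert would use INSERT ... ON CONFLICT DO NOTHING.

```python
import json
import sqlite3

# SQLite stand-in for the Postgres records table after the migration.
DDL = """
CREATE TABLE records (
    srn            TEXT PRIMARY KEY,
    convention_srn TEXT NOT NULL,
    source         TEXT NOT NULL,  -- JSONB in Postgres
    metadata       TEXT NOT NULL,
    published_at   TEXT NOT NULL
);
-- Postgres: ON records (convention_srn, (source->>'type'), (source->>'id'))
CREATE UNIQUE INDEX uq_records_convention_source
    ON records (convention_srn,
                json_extract(source, '$.type'),
                json_extract(source, '$.id'));
"""


def insert_record(conn, srn, convention_srn, source, metadata, published_at):
    """Insert one record; return False if (convention, source type, source id) already exists."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO records VALUES (?, ?, ?, ?, ?)",
        (srn, convention_srn, json.dumps(source), json.dumps(metadata), published_at),
    )
    return cur.rowcount == 1  # 0 when the unique index caused the row to be skipped
```

Only type and id take part in the key, so a later harvest run that sees the same upstream entry (different harvest_run_srn, same id) is skipped rather than duplicated.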

4. RecordPublished event

Replace deposition_srn: DepositionSRN with source: RecordSource. Keep features_dir for downstream feature insertion.

5. RecordService

Update publish_record() to accept source: RecordSource instead of deposition_srn, and add a bulk_publish() method for batch record creation.
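One plausible shape for bulk_publish() is sketched below, assuming it should be idempotent on the same (convention_srn, source.type, source.id) key as the unique index: it skips sources already published and duplicates within the batch, and returns only the records it created. Everything here is hypothetical: Source is a stripped-down stand-in for RecordSource, InMemoryRecordRepository replaces the real persistence layer (which would likely issue one INSERT ... ON CONFLICT DO NOTHING), the SRN format is made up, and emitting RecordPublished is omitted.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class Source:
    """Stand-in for RecordSource: only the discriminator and idempotency key."""
    type: str
    id: str


@dataclass
class Record:
    srn: str
    convention_srn: str
    source: Source
    metadata: dict[str, Any]
    published_at: datetime


Key = tuple[str, str, str]  # (convention_srn, source.type, source.id), as in the unique index


class InMemoryRecordRepository:
    """Hypothetical repository used only to make the sketch runnable."""

    def __init__(self) -> None:
        self._records: dict[Key, Record] = {}

    def exists(self, key: Key) -> bool:
        return key in self._records

    def add_many(self, records: list[Record]) -> None:
        for r in records:
            self._records[(r.convention_srn, r.source.type, r.source.id)] = r


class RecordService:
    def __init__(self, repo: InMemoryRecordRepository) -> None:
        self._repo = repo
        self._counter = 0

    def _new_srn(self) -> str:
        self._counter += 1
        return f"record-{self._counter}"  # hypothetical SRN format

    def bulk_publish(self, convention_srn: str, items: list[tuple[Source, dict[str, Any]]]) -> list[Record]:
        """Create records for unseen sources; return only the newly created ones."""
        now = datetime.now(timezone.utc)
        seen: set[Key] = set()
        created: list[Record] = []
        for source, metadata in items:
            key = (convention_srn, source.type, source.id)
            if key in seen or self._repo.exists(key):
                continue  # duplicate within this batch, or published earlier
            seen.add(key)
            created.append(Record(self._new_srn(), convention_srn, source, metadata, now))
        self._repo.add_many(created)  # a single write for the whole batch
        return created

    def publish_record(self, convention_srn: str, source: Source, metadata: dict[str, Any]) -> Optional[Record]:
        """Single-record path expressed as a batch of one; None if already published."""
        created = self.bulk_publish(convention_srn, [(source, metadata)])
        return created[0] if created else None
```

Routing publish_record() through bulk_publish() keeps a single idempotency check for both the deposition and harvest pathways.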

6. HookInputs (validation port)

Replace deposition_srn: DepositionSRN with a generic work_dir: str; each caller sets the path.

7. FeatureService / InsertRecordFeatures handler

  • The handler uses convention_srn to look up hooks from the convention instead of receiving them in the event
  • FeatureService reads features from the features_dir passed through the event
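The new data flow for the handler can be sketched as follows. The event carries only where the features are (features_dir) and which convention applies; the hooks come from the convention, so the handler behaves the same for deposited and harvested records. This is a sketch under assumptions: the issue specifies source and features_dir on RecordPublished, while record_srn and convention_srn on the event, the Convention.hooks field, and the repository and service method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Protocol


@dataclass(frozen=True)
class RecordPublished:
    record_srn: str          # assumed field
    convention_srn: str      # assumed field
    source: dict[str, Any]   # serialized RecordSource
    features_dir: str


@dataclass(frozen=True)
class Convention:
    srn: str
    hooks: tuple[str, ...]   # hook identifiers; the real hook type is not shown in the issue


class ConventionRepository(Protocol):
    def get(self, convention_srn: str) -> Convention: ...


class FeatureService(Protocol):
    def insert_record_features(self, record_srn: str, features_dir: str, hooks: tuple[str, ...]) -> None: ...


class InsertRecordFeatures:
    def __init__(self, conventions: ConventionRepository, features: FeatureService) -> None:
        self._conventions = conventions
        self._features = features

    def handle(self, event: RecordPublished) -> None:
        # Look up hooks from the convention rather than reading them off the event;
        # event.source is not needed here, which is what makes the handler source-agnostic.
        convention = self._conventions.get(event.convention_srn)
        self._features.insert_record_features(event.record_srn, event.features_dir, convention.hooks)
```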

Files touched

  • domain/record/model/aggregate.py — Record aggregate
  • domain/record/service/record.py — RecordService
  • domain/record/event/record_published.py — RecordPublished event
  • domain/shared/model/source.py — new RecordSource models
  • domain/validation/port/hook_runner.py — HookInputs
  • domain/feature/service/feature.py — FeatureService
  • domain/feature/handler/insert_record_features.py — InsertRecordFeatures
  • infrastructure/persistence/tables.py — records table
  • infrastructure/persistence/record_mapper.py — mapper
  • infrastructure/k8s/runner.py — K8sHookRunner
  • Alembic migration

Blocked by

Nothing — this is a standalone refactor.

Blocks

#104 (Harvest domain implementation)

Labels: refactor (internal restructuring, no behavior change)