Summary
Make the Record aggregate source-agnostic so it can be created from any pathway (deposition, harvest, fork, import) without coupling to DepositionSRN.
Motivation
The harvest domain needs to create records directly from bulk-ingested data without going through the deposition lifecycle. Currently, Record has deposition_srn: DepositionSRN as a required field, and the entire downstream chain (RecordPublished event, InsertRecordFeatures handler, FeatureService) is coupled to depositions.
Changes
1. RecordSource discriminated union (new shared model)
class RecordSource(ValueObject):
    type: str  # "deposition" | "harvest" | "fork" | "import"
    id: str  # idempotency key

class DepositionSource(RecordSource):
    type: Literal["deposition"] = "deposition"

class HarvestSource(RecordSource):
    type: Literal["harvest"] = "harvest"
    harvest_run_srn: str
    upstream_source: str  # "pdb", "geo"

class ForkSource(RecordSource):
    type: Literal["fork"] = "fork"
    source_node: str

class ImportSource(RecordSource):
    type: Literal["import"] = "import"
    source_node: str
2. Record aggregate (simplified)
class Record(Aggregate):
    srn: RecordSRN
    convention_srn: ConventionSRN
    source: RecordSource  # JSONB in DB
    metadata: dict[str, Any]
    published_at: datetime
Remove: deposition_srn, indexes.
3. Records table schema change
Alembic migration:
- Drop deposition_srn column + its index
- Drop indexes column
- Add convention_srn TEXT NOT NULL column
- Add source JSONB NOT NULL column
- Add functional unique index: UNIQUE(convention_srn, source->>'type', source->>'id')
No backfill or backwards compatibility needed — no production data exists.
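To illustrate what the functional unique index buys, here is a small sketch using Python's built-in sqlite3 as a stand-in for PostgreSQL: json_extract(source, '$.type') plays the role of source->>'type'. It assumes a SQLite build with JSON functions available. The index name and the insert_record helper are made up for the example, and this is not the Alembic migration itself.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# source is stored as JSON text here; the proposal uses JSONB in PostgreSQL.
conn.execute(
    """
    CREATE TABLE records (
        srn TEXT PRIMARY KEY,
        convention_srn TEXT NOT NULL,
        source TEXT NOT NULL
    )
    """
)

# Expression index over (convention_srn, source.type, source.id).
conn.execute(
    """
    CREATE UNIQUE INDEX uq_records_convention_source ON records (
        convention_srn,
        json_extract(source, '$.type'),
        json_extract(source, '$.id')
    )
    """
)


def insert_record(srn: str, convention_srn: str, source: dict) -> bool:
    """Return True if the row was inserted, False if it hit a uniqueness constraint."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO records (srn, convention_srn, source) VALUES (?, ?, ?)",
                (srn, convention_srn, json.dumps(source)),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Under these assumptions, inserting the same source type and id twice for one convention is rejected, while the same id under a different convention or a different source type is accepted.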
4. RecordPublished event
Replace deposition_srn: DepositionSRN with source: RecordSource. Keep features_dir for downstream feature insertion.
5. RecordService
Update publish_record() to accept source: RecordSource instead of deposition_srn. Add bulk_publish() method for batch record creation.
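One possible shape for the two methods, as an in-memory sketch. It assumes publish_record() is idempotent on (convention_srn, source.type, source.id) and returns the existing record on a repeat, which the issue does not specify. SourceRef, InMemoryRecordService, and the placeholder SRN format are hypothetical, and event emission and persistence are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Iterable


@dataclass(frozen=True)
class SourceRef:
    """Reduced stand-in for RecordSource: only the fields used as the key."""
    type: str
    id: str


@dataclass
class Record:
    srn: str
    convention_srn: str
    source: SourceRef
    metadata: dict[str, Any]
    published_at: datetime


class InMemoryRecordService:
    def __init__(self) -> None:
        # Keyed the same way as the proposed unique index.
        self._by_key: dict[tuple[str, str, str], Record] = {}

    def publish_record(
        self, convention_srn: str, source: SourceRef, metadata: dict[str, Any]
    ) -> Record:
        key = (convention_srn, source.type, source.id)
        existing = self._by_key.get(key)
        if existing is not None:
            return existing  # assumption: repeats are no-ops
        record = Record(
            srn=f"record-{len(self._by_key) + 1}",  # placeholder, not a real SRN
            convention_srn=convention_srn,
            source=source,
            metadata=metadata,
            published_at=datetime.now(timezone.utc),
        )
        self._by_key[key] = record
        return record

    def bulk_publish(
        self,
        convention_srn: str,
        items: Iterable[tuple[SourceRef, dict[str, Any]]],
    ) -> list[Record]:
        # A real implementation would likely batch the database writes.
        return [self.publish_record(convention_srn, s, m) for s, m in items]
```

Making the batch path reuse the single-record path keeps the idempotency rule in one place, so a re-run of a harvest would not create duplicates.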
6. HookInputs (validation port)
Replace deposition_srn: DepositionSRN with generic work_dir: str. Callers set the path.
7. FeatureService / InsertRecordFeatures handler
- Handler uses convention_srn to look up hooks from convention (instead of receiving hooks from event)
- FeatureService reads features from features_dir passed through the event
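The handler flow described above could look roughly like this. The sketch assumes the event carries convention_srn alongside features_dir, and that a convention exposes its hooks as a list of names. Convention, InMemoryConventions, insert_features, and the field names on the event are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RecordPublished:
    # Reduced event: only the fields the handler needs in this sketch.
    record_srn: str
    convention_srn: str
    features_dir: str


@dataclass(frozen=True)
class Convention:
    srn: str
    hooks: tuple[str, ...]  # hook names; the real hook model is not shown


class InMemoryConventions:
    """Hypothetical lookup; the real code would use a repository."""

    def __init__(self, conventions: dict[str, Convention]) -> None:
        self._conventions = conventions

    def get(self, srn: str) -> Convention:
        return self._conventions[srn]


@dataclass
class FeatureService:
    inserted: list[tuple[str, str, str]] = field(default_factory=list)

    def insert_features(self, record_srn: str, hook: str, features_dir: str) -> None:
        # A real service would read the hook's output from features_dir.
        self.inserted.append((record_srn, hook, features_dir))


class InsertRecordFeatures:
    def __init__(self, conventions: InMemoryConventions, features: FeatureService) -> None:
        self._conventions = conventions
        self._features = features

    def handle(self, event: RecordPublished) -> None:
        # Hooks come from the convention, not from the event payload.
        convention = self._conventions.get(event.convention_srn)
        for hook in convention.hooks:
            self._features.insert_features(event.record_srn, hook, event.features_dir)
```

Nothing in this flow refers to a deposition, so the same handler can serve records published from harvest, fork, or import.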
Files touched
domain/record/model/aggregate.py — Record aggregate
domain/record/service/record.py — RecordService
domain/record/event/record_published.py — RecordPublished event
domain/shared/model/source.py — new RecordSource models
domain/validation/port/hook_runner.py — HookInputs
domain/feature/service/feature.py — FeatureService
domain/feature/handler/insert_record_features.py — InsertRecordFeatures
infrastructure/persistence/tables.py — records table
infrastructure/persistence/record_mapper.py — mapper
infrastructure/k8s/runner.py — K8sHookRunner
- Alembic migration
Blocked by
Nothing — this is a standalone refactor.
Blocks
#104 (Harvest domain implementation)