Hooks can reject records that are too large for the current memory limit (e.g. a 38 MB CIF file that needs >2 GB to process). Currently these records are permanently rejected.
We should support a structured rejection code that signals "this record is valid but needs more resources":
{"id": "10PX", "reason": "Structure too large (37.8 MB CIF)", "code": "resource_limit", "hint": {"memory": "4g"}}
The ingest pipeline could then:
- Collect all resource_limit rejections from a batch
- Re-run them in a separate container with higher limits (single-record batches for isolation)
- Fresh container = no accumulated memory from prior records
This also addresses the minor memory leak pattern where baseline memory drifts upward over a batch — large structures get a fresh container with full headroom.
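The collect-and-retry steps above could be sketched roughly as follows. This is only an illustration under assumptions: `split_rejections`, `plan_retries`, the shape of the retry plan, and the `"4g"` fallback are hypothetical names and values, not part of the existing pipeline. Records without a `code` field are treated as permanent rejections, matching current behaviour.

```python
import json
from typing import Iterable

RESOURCE_LIMIT = "resource_limit"


def split_rejections(lines: Iterable[str]) -> tuple[list[dict], list[dict]]:
    """Partition rejections.jsonl lines into (retryable, permanent)."""
    retryable, permanent = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL file
        record = json.loads(line)
        # No "code" field, or any other code: keep today's permanent rejection.
        if record.get("code") == RESOURCE_LIMIT:
            retryable.append(record)
        else:
            permanent.append(record)
    return retryable, permanent


def plan_retries(retryable: list[dict], default_memory: str = "4g") -> list[dict]:
    """Build one single-record batch per rejection (hypothetical plan format)."""
    return [
        {
            "ids": [record["id"]],
            # Use the hook's hint when present, else an assumed default.
            "memory": (record.get("hint") or {}).get("memory", default_memory),
        }
        for record in retryable
    ]
```

Each entry in the plan maps to one fresh container run, so a large structure never shares memory headroom with earlier records.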
Implementation notes
- Add code and hint fields to the rejections.jsonl contract
- PublishBatch handler collects resource_limit rejections and emits a retry event
- New RetryLargeRecords handler processes them one-by-one with increased limits
- Hook authors opt in by raising Reject("...", code="resource_limit", hint={"memory": "4g"}) instead of just Reject("...")
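For the hook-side opt-in, one possible shape for an extended `Reject` is sketched below. The real `Reject` class is not shown in this issue, so the constructor, the `to_record` helper, `example_hook`, and the 30 MB threshold are all assumptions made for illustration. The point is that `code` and `hint` stay optional, so existing `Reject("...")` calls would keep producing the same records as before.

```python
from typing import Any, Optional


class Reject(Exception):
    """Hypothetical sketch of a rejection with optional structured fields."""

    def __init__(
        self,
        reason: str,
        code: Optional[str] = None,
        hint: Optional[dict[str, Any]] = None,
    ) -> None:
        super().__init__(reason)
        self.reason = reason
        self.code = code
        self.hint = hint

    def to_record(self, record_id: str) -> dict[str, Any]:
        """Build the rejections.jsonl entry; omit fields that were not set."""
        record: dict[str, Any] = {"id": record_id, "reason": self.reason}
        if self.code is not None:
            record["code"] = self.code
        if self.hint is not None:
            record["hint"] = self.hint
        return record


def example_hook(size_mb: float, max_mb: float = 30.0) -> None:
    """Illustrative hook: the 30 MB threshold is an assumed value."""
    if size_mb > max_mb:
        raise Reject(
            f"Structure too large ({size_mb:.1f} MB CIF)",
            code="resource_limit",
            hint={"memory": "4g"},
        )
```

Serializing `to_record(...)` with `json.dumps` would yield a line in the format proposed above, while a plain `Reject("...")` yields only `id` and `reason`.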