feat: retry rejected-for-resources records with higher limits #117

@rorybyrne

Description

Hooks can reject records that are too large for the current memory limit (e.g. a 38 MB CIF file that needs >2 GB to process). Currently these records are permanently rejected.

We should support a structured rejection code that signals "this record is valid but needs more resources":

{"id": "10PX", "reason": "Structure too large (37.8 MB CIF)", "code": "resource_limit", "hint": {"memory": "4g"}}

The ingest pipeline could then:

  1. Collect all resource_limit rejections from a batch
  2. Re-run them in a separate container with higher limits (single-record batches for isolation)
  3. Fresh container = no accumulated memory from prior records
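The steps above can be sketched as a collect-and-plan pass; the functions and the `4g` default are illustrative assumptions, not the real pipeline API:

```python
import json

def collect_resource_limit_rejections(lines):
    """Keep only rejections flagged as retryable with more resources."""
    return [rec for rec in map(json.loads, lines)
            if rec.get("code") == "resource_limit"]

def retry_plan(rejections, default_memory="4g"):
    """Each record becomes its own single-record batch, run in a fresh
    container so no memory accumulated from earlier records carries over."""
    return [
        {"ids": [rec["id"]],
         "memory": rec.get("hint", {}).get("memory", default_memory)}
        for rec in rejections
    ]

batch = [
    '{"id": "10PX", "reason": "Structure too large (37.8 MB CIF)", '
    '"code": "resource_limit", "hint": {"memory": "4g"}}',
    '{"id": "1ABC", "reason": "Malformed header"}',   # not retryable
]
plan = retry_plan(collect_resource_limit_rejections(batch))
```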

This also mitigates the minor memory-leak pattern where baseline memory drifts upward over a batch: large structures get a fresh container with full headroom.

Implementation notes

  • Add code and hint fields to the rejections.jsonl contract
  • PublishBatch handler collects resource_limit rejections and emits a retry event
  • New RetryLargeRecords handler processes them one-by-one with increased limits
  • Hook authors opt in by raising Reject("...", code="resource_limit", hint={"memory": "4g"}) instead of just Reject("...")
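On the hook side, the opt-in could look like the sketch below. The `Reject(..., code=..., hint=...)` signature is the proposal above; the `Reject` class body and the size threshold are illustrative assumptions:

```python
class Reject(Exception):
    """Hook-raised rejection; code/hint are the proposed structured fields."""
    def __init__(self, reason, code=None, hint=None):
        super().__init__(reason)
        self.reason = reason
        self.code = code
        self.hint = hint or {}

# Illustrative threshold, not from the issue.
MAX_CIF_BYTES = 16 * 1024 * 1024

def check_structure_size(record_id: str, cif_bytes: int) -> None:
    """Hypothetical hook check: reject oversized CIFs as retryable."""
    if cif_bytes > MAX_CIF_BYTES:
        size_mb = cif_bytes / (1024 * 1024)
        raise Reject(
            f"Structure too large ({size_mb:.1f} MB CIF)",
            code="resource_limit",
            hint={"memory": "4g"},
        )
```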

Metadata

Labels: feature (New functionality)