The ingest pipeline has static backpressure constants (MAX_PENDING_BATCHES = 4, BACKPRESSURE_DELAY = 60s) that control how fast the ingester feeds batches into the hook pipeline. These work but aren't optimal — the delay doesn't account for actual hook execution times, and the pipeline depth doesn't adapt to cluster capacity.
Proposed approach
Adaptive delay: track hook execution durations per convention. Use a rolling average to set the backpressure delay to ~80% of average hook duration, so the ingester checks back just before a slot should free up.
Adaptive pipeline depth: observe scheduling success rate. If hook Jobs schedule immediately, increase MAX_PENDING_BATCHES. If scheduling timeouts occur, decrease it. Similar to TCP congestion window (slow start, back off on failure).
Data collection
- Record hook duration on each batch completion (time from
IngesterBatchReady to HookBatchCompleted)
- Rolling average over last N batches, per convention
- Persist to survive server restarts (on the aggregate or a metrics table)
Considerations
- Cold start: need sensible defaults for first run before data is available
- Variance: large proteins take longer than small ones — rolling average smooths but outliers can cause over/under-provisioning
- Per-convention: different conventions have different hooks with different performance profiles
- Multi-tenant: backpressure should account for other tenants' Jobs competing for cluster capacity
Current static values (set in #120)
MAX_PENDING_BATCHES = 4 in server/osa/domain/ingest/handler/run_ingester.py
BACKPRESSURE_DELAY = timedelta(seconds=60) in same file
SCHEDULING_TIMEOUT = 120 in K8s runners
The ingest pipeline has static backpressure constants (
MAX_PENDING_BATCHES = 4,BACKPRESSURE_DELAY = 60s) that control how fast the ingester feeds batches into the hook pipeline. These work but aren't optimal — the delay doesn't account for actual hook execution times, and the pipeline depth doesn't adapt to cluster capacity.Proposed approach
Adaptive delay: track hook execution durations per convention. Use a rolling average to set the backpressure delay to ~80% of average hook duration, so the ingester checks back just before a slot should free up.
Adaptive pipeline depth: observe scheduling success rate. If hook Jobs schedule immediately, increase
MAX_PENDING_BATCHES. If scheduling timeouts occur, decrease it. Similar to TCP congestion window (slow start, back off on failure).Data collection
IngesterBatchReadytoHookBatchCompleted)Considerations
Current static values (set in #120)
MAX_PENDING_BATCHES = 4inserver/osa/domain/ingest/handler/run_ingester.pyBACKPRESSURE_DELAY = timedelta(seconds=60)in same fileSCHEDULING_TIMEOUT = 120in K8s runners