Memory ingestion queue is unbounded — buggy producer can OOM the core

> *Original issue #2442 by @obchain on 2026-05-21T11:51:15Z*

---

## Summary

`memory::ingestion::queue::start_worker_with_state` builds the ingestion job channel with `mpsc::unbounded_channel`, so producers (`put_doc`, `store_skill_sync`) can enqueue indefinitely while the worker drains one job at a time under a singleton run-lock. A misbehaving or buggy agent calling `put_doc` in a tight loop can grow the queue without bound and exhaust process memory.

## Problem

`src/openhuman/memory/ingestion/queue.rs:106` (current `main`, 6137b678):

```rust
let (tx, rx) = mpsc::unbounded_channel::<IngestionJob>();
```

The worker (`ingestion_worker`, same file ~L123) serialises every job behind `IngestionState::acquire()`, so it processes only one extraction at a time; concurrent extractions would otherwise contend for the local extraction LLM. Each job takes seconds to minutes, depending on document size and model.

Two producer sites push into the channel directly without backpressure:

- `src/openhuman/memory/store/client.rs:152` — `put_doc`
- `src/openhuman/memory/store/client.rs:266` — `store_skill_sync`

Both call `IngestionState::enqueue()` to increment the queue-depth counter and then call `IngestionQueue::submit(job)`. `submit` already handles the "worker gone" path (`SendError`), but the channel itself has no capacity bound.

Concrete repro: a skill that calls `put_doc` in a loop (or any code path that ends up at one of the two producers above) ~100k times, faster than the worker can drain, results in:

- `queue_depth` atomic climbing to 100k.
- ~100k `IngestionJob` values resident in the channel buffer; each holds an owned `NamespaceDocumentInput` (full document content + metadata), so peak memory scales linearly with doc size × queue depth.
- No backpressure signal to the producer: `submit` returns `true` regardless of pressure.
- OOM kill or paging stall before the worker catches up.
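
The growth described above can be reproduced in miniature. The sketch below is not the project's code: it uses the standard library's unbounded `std::sync::mpsc::channel` as a stand-in for tokio's `mpsc::unbounded_channel`, and `Job` and `flood_unbounded` are hypothetical names chosen for illustration. It shows that every send is accepted and that resident payload bytes grow as doc size × queue depth when nothing drains the channel.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc;

/// Hypothetical stand-in for `IngestionJob`: owns the full document body.
struct Job {
    content: String,
}

/// Enqueue `n` jobs of `doc_bytes` each on an unbounded channel that nobody
/// drains. Returns (accepted sends, depth counter, buffered payload bytes).
fn flood_unbounded(n: usize, doc_bytes: usize) -> (usize, usize, usize) {
    // Unbounded: `send` never reports pressure while a receiver exists.
    let (tx, rx) = mpsc::channel::<Job>();
    let depth = AtomicUsize::new(0); // plays the role of `queue_depth`
    let mut accepted = 0;

    for _ in 0..n {
        depth.fetch_add(1, Ordering::SeqCst);
        let job = Job { content: "x".repeat(doc_bytes) };
        if tx.send(job).is_ok() {
            accepted += 1;
        }
    }

    // Close the sending side, then count the bytes that were sitting in
    // the buffer the whole time.
    drop(tx);
    let resident: usize = rx.iter().map(|job| job.content.len()).sum();
    (accepted, depth.load(Ordering::SeqCst), resident)
}
```

Calling `flood_unbounded(100_000, 10 * 1024)` would hold roughly 1GB of document content in the buffer before anything is drained, which is the failure mode this issue describes.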

Impact tier: not exploitable across a trust boundary (producers are inside the user's own core), but it is a robustness bug: a buggy skill, a misconfigured Composio sync, or an agent re-ingesting the same source on every tick can DoS the local core without any user-visible warning.

## Solution (optional)

Three-step fix in `src/openhuman/memory/ingestion/queue.rs`:

1. Replace `mpsc::unbounded_channel` with `mpsc::channel(DEFAULT_QUEUE_CAPACITY)`. Suggest `DEFAULT_QUEUE_CAPACITY = 512`: at typical doc sizes (1KB–100KB) a full buffer holds at most roughly 50MB of document content (512 × 100KB), well under 100MB, while still absorbing reasonable user-driven bursts (bulk import of a Notion workspace, large Slack backfill).
2. Change `submit` from `tx.send(job)` to `tx.try_send(job)` (non-blocking) and distinguish `TrySendError::Full` from `TrySendError::Closed`:
   - `Full` → log at warn with namespace + title, roll back the `IngestionState::enqueue()` increment, return `false` so producers can surface "queue full, retry later" to the caller.
   - `Closed` → keep existing "worker gone" behaviour.
3. Expose a `start_worker_with_capacity` variant for tests so the capacity-bound path can be exercised deterministically without faking a slow worker.
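
The three steps above could look roughly like the following. This is a minimal sketch under stated assumptions, not the actual `queue.rs`: it uses `std::sync::mpsc::sync_channel` as a dependency-free stand-in for tokio's bounded `mpsc::channel` (tokio names the second error variant `Closed` rather than `Disconnected`), it folds the depth counter into `submit` instead of a separate `IngestionState::enqueue()` call, and `Job`, `Queue`, `start`, and `start_with_capacity` are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};
use std::sync::Arc;

/// Capacity suggested in this issue.
pub const DEFAULT_QUEUE_CAPACITY: usize = 512;

/// Hypothetical stand-in for `IngestionJob`.
pub struct Job {
    pub namespace: String,
    pub title: String,
}

/// Hypothetical stand-in for `IngestionQueue`.
pub struct Queue {
    tx: SyncSender<Job>,
    depth: Arc<AtomicUsize>,
}

/// Test-friendly constructor (step 3). `capacity` should be at least 1;
/// with `sync_channel`, 0 would create a rendezvous channel.
pub fn start_with_capacity(capacity: usize) -> (Queue, Receiver<Job>) {
    let (tx, rx) = sync_channel(capacity);
    (Queue { tx, depth: Arc::new(AtomicUsize::new(0)) }, rx)
}

/// Production constructor using the default bound (step 1).
pub fn start() -> (Queue, Receiver<Job>) {
    start_with_capacity(DEFAULT_QUEUE_CAPACITY)
}

impl Queue {
    /// Non-blocking submit (step 2): `true` if enqueued, `false` otherwise.
    pub fn submit(&self, job: Job) -> bool {
        self.depth.fetch_add(1, Ordering::SeqCst);
        match self.tx.try_send(job) {
            Ok(()) => true,
            Err(TrySendError::Full(job)) => {
                // Stand-in for a warn-level log with namespace + title.
                eprintln!("ingestion queue full, dropping {}/{}", job.namespace, job.title);
                self.depth.fetch_sub(1, Ordering::SeqCst); // roll back the increment
                false
            }
            Err(TrySendError::Disconnected(_)) => {
                // "Worker gone" path. Rolling back here is an assumption;
                // the existing behaviour may differ.
                self.depth.fetch_sub(1, Ordering::SeqCst);
                false
            }
        }
    }

    /// Current value of the depth counter.
    pub fn depth(&self) -> usize {
        self.depth.load(Ordering::SeqCst)
    }
}
```

Because `try_send` hands the rejected job back inside the error, the `Full` arm can log the namespace and title without cloning the document. A real worker would also decrement the counter as it finishes each job, which this sketch leaves out.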

Optional: emit a `DomainEvent::MemoryIngestionEnqueueDropped { … }` so the existing event-bus subscribers (status RPC, observability) can surface drops to the operator.

## Acceptance criteria

- [ ] **Repro gone** — A test that fills the queue to capacity gets a `false` from `submit` and a logged warning, while `queue_depth` stays at the cap (not the runaway value).
- [ ] **Regression safety** — Unit tests cover (a) submit-when-full path, (b) submit-after-drain succeeds again, (c) submit-when-worker-gone still returns false.
- [ ] **No producer behaviour change for the common case** — `put_doc` and `store_skill_sync` continue to enqueue successfully under normal load.
- [ ] **Diff coverage ≥ 80%** — fix PR meets the changed-lines coverage gate.
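
Cases (a) to (c) of the regression-safety criterion could be exercised along these lines. This is a sketch only: it again substitutes `std::sync::mpsc::sync_channel` for tokio's bounded channel, uses `u32` in place of `IngestionJob`, and `submit` plus the `check_*` functions are hypothetical helpers rather than the project's test suite. Asserting on the logged warning is not covered here.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

/// Hypothetical minimal `submit`: `true` on enqueue, `false` when the
/// queue is full or the worker (receiver) is gone.
fn submit(tx: &SyncSender<u32>, job: u32) -> bool {
    match tx.try_send(job) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => false,
    }
}

/// (a) Submitting into a full queue returns `false`, and the buffer
/// holds exactly `capacity` jobs rather than a runaway count.
fn check_submit_when_full() {
    let (tx, rx) = sync_channel(2);
    assert!(submit(&tx, 1));
    assert!(submit(&tx, 2));
    assert!(!submit(&tx, 3));
    assert_eq!(rx.try_iter().count(), 2);
}

/// (b) Once the worker drains a slot, submit succeeds again.
fn check_submit_after_drain() {
    let (tx, rx) = sync_channel(1);
    assert!(submit(&tx, 1));
    assert!(!submit(&tx, 2));
    assert_eq!(rx.recv().unwrap(), 1); // worker takes one job
    assert!(submit(&tx, 3));
}

/// (c) With the worker gone, submit still returns `false`.
fn check_submit_when_worker_gone() {
    let (tx, rx) = sync_channel(1);
    drop(rx);
    assert!(!submit(&tx, 1));
}
```

Using a tiny capacity keeps each case deterministic: the queue fills after one or two sends, so no slow worker has to be faked.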

## Related

- Surface introduced in #325 (`feat: background ingestion queue for memory graph extraction`).
- Related hardening pass (file/path side): #2111.

