Fix memory leak #550

Merged
samwillis merged 5 commits into main from fix-memory-leak
Sep 15, 2025

@samwillis
Collaborator

Thanks to @sorenbs for finding and fixing this.

sorenbs and others added 5 commits September 12, 2025 13:44
src/valueIndex.ts:28–40 — fast path for primitive equality (number/string) avoids hashing when possible.
Each operator factory added a fresh “public” reader via graph.addStream(output.connectReader()). These readers were never consumed anywhere in the graph engine (D2.run() only checks operator inputs).
As a result, every message at every stage was duplicated into at least one extra queue that was never drained, leading to unbounded growth.
In src/d2.ts, newInput() also added such a reader, further duplicating messages at the graph boundary.
This is an architectural memory leak: dangling readers retained all messages indefinitely.
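
To make the mechanism concrete, here is a minimal sketch of how a reader that is connected but never drained grows without bound. The `Reader` and `Writer` classes below are hypothetical stand-ins written for this illustration, not the library's actual implementation; only the names `connectReader`, `sendData`, and `drain` are borrowed from the description above.

```typescript
// Toy model of the leak. Class shapes are assumptions for illustration only.
class Reader<T> {
  queue: T[] = [];

  // Return everything buffered so far and start a fresh, empty queue.
  drain(): T[] {
    const out = this.queue;
    this.queue = [];
    return out;
  }
}

class Writer<T> {
  private readers: Reader<T>[] = [];

  connectReader(): Reader<T> {
    const reader = new Reader<T>();
    this.readers.push(reader);
    return reader;
  }

  // Every connected reader receives its own reference to the message.
  sendData(msg: T): void {
    for (const reader of this.readers) reader.queue.push(msg);
  }
}

const writer = new Writer<number>();
const consumer = writer.connectReader(); // drained by the next operator
const dangling = writer.connectReader(); // registered with the graph, never drained

for (let step = 0; step < 1000; step++) {
  writer.sendData(step);
  consumer.drain(); // the real consumer empties its queue on every step
}

const consumerBacklog = consumer.queue.length; // 0
const danglingBacklog = dangling.queue.length; // 1000, and it keeps growing
```

In this model the dangling queue holds one entry per message ever sent, which mirrors the unbounded growth described above.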

Changes Implemented

Stop creating unconsumed “public” readers:
src/d2.ts:28–34 — newInput no longer creates and stores a reader.
src/d2.ts:41–45 — addStream kept as a no-op for API compatibility (prevents accidental accumulation).
Remove graph.addStream(output.connectReader()) across all operators so they don’t create dangling readers:
src/operators/map.ts:44
src/operators/reduce.ts:120
src/operators/filter.ts:… (operator creation site)
src/operators/concat.ts:… (operator creation site)
src/operators/output.ts:… (operator creation site)
src/operators/debug.ts:… (operator creation site)
src/operators/tap.ts:… (operator creation site)
src/operators/join.ts:… (operator creation site)
src/operators/count.ts:… (operator creation site)
src/operators/distinct.ts:… (operator creation site)
src/operators/consolidate.ts:… (operator creation site)
src/operators/topKWithFractionalIndex.ts:… (operator creation site)
src/operators/topKWithFractionalIndexBTree.ts:… (operator creation site)

Validation

Memory after fix (gtime):
1k: Max memory: 54112 KB
100k: Max memory: 108112 KB
1M: Max memory: 176160 KB

Compared to the pre-fix 1M run at ~1.26 GB, this is roughly a sevenfold reduction in peak memory. The remaining increase is modest and consistent with runtime/GC behavior and transient allocations (e.g., output multisets per step), not retained state.

Why This Works

Operators still create one reader per actual consumer (the next operator in the pipeline). Those are drained synchronously in each step().
No extra dangling readers are created; hence no permanent queues retain messages.
If a stream has no consumer (like the final reduce in the memtest), its writer has zero readers, so sendData does nothing — no queue, no retention.
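
The following sketch illustrates the three points above under simplified assumptions: one reader per real consumer, drained on every step, and a final stream with zero readers. `StreamWriter`, `step`, and `retained` are hypothetical names invented for this example and are not taken from the library source.

```typescript
// Hypothetical sketch; a reader is modelled as a plain array queue.
type Queue<T> = T[];

class StreamWriter<T> {
  readonly queues: Queue<T>[] = [];

  connectReader(): Queue<T> {
    const queue: Queue<T> = [];
    this.queues.push(queue);
    return queue;
  }

  sendData(msg: T): void {
    // With zero readers this loop body never runs, so nothing is stored.
    for (const queue of this.queues) queue.push(msg);
  }

  // Total number of messages currently buffered across all readers.
  retained(): number {
    return this.queues.reduce((total, queue) => total + queue.length, 0);
  }
}

// A map-like operator step: drain the input queue, forward to the output writer.
function step<A, B>(input: Queue<A>, output: StreamWriter<B>, f: (a: A) => B): void {
  for (const msg of input.splice(0)) output.sendData(f(msg));
}

const source = new StreamWriter<number>();
const mapInput = source.connectReader(); // the only reader: the map operator
const mapped = new StreamWriter<number>(); // final stream, no consumer attached

for (let i = 0; i < 10000; i++) {
  source.sendData(i);
  step(mapInput, mapped, (x) => x * 2);
}

const retainedAfterRun = source.retained() + mapped.retained(); // 0
```

Because the only queue in the pipeline is emptied inside each step and the final writer has no readers, nothing accumulates between steps in this model.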

Please explain to me:

Why did this library implement these “public” readers? Are they needed for any part of the public API? Can we safely remove them without impacting the public API and functioning of the library?

Here’s the straight answer.

Why they existed

Likely intent: Materialize every operator’s output “somewhere” by default — either as a vestige from the original D2TS design (where graphs tracked streams/frontiers) or as a convenience to make outputs observable later.
In this simplified, version-free engine, that mechanism was never actually used. The graph scheduler does not read those queues, nor does the public API expose them.

Are they needed for the public API?

No. Public consumption is already explicit:
IStreamBuilder.connectReader() lets you attach readers where you need results. See implementation in src/d2.ts:81.
Side-effect sinks (e.g. debug, output) do observation without needing a reader.
The graph’s private storage of readers is not exposed and is not used by scheduling:
D2.run() drives work by operators only; it doesn’t look at readers. See src/d2.ts:61 and src/d2.ts:65.
Readers themselves only buffer for the parties that explicitly connect. See src/graph.ts:46 (writer creates a queue per reader) and src/graph.ts:19 (reader drains and clears).
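
As a rough illustration of why graph-level readers play no part in scheduling, the sketch below shows a run loop that only asks operators whether they have pending input. The `Operator` interface, `QueueOperator`, and `Graph` class are assumptions made for this example and do not reproduce the code at the referenced lines.

```typescript
// Hypothetical scheduler sketch; names are illustrative, not the library's API.
interface Operator {
  hasPendingWork(): boolean;
  run(): void;
}

class QueueOperator implements Operator {
  processed = 0;
  private inbox: number[];

  constructor(inbox: number[]) {
    this.inbox = inbox;
  }

  hasPendingWork(): boolean {
    return this.inbox.length > 0;
  }

  run(): void {
    this.processed += this.inbox.length;
    this.inbox.length = 0; // drain the operator's own input queue
  }
}

class Graph {
  private operators: Operator[];

  constructor(operators: Operator[]) {
    this.operators = operators;
  }

  // Run until no operator reports pending input. Queues that are not an
  // operator input are never inspected by this loop.
  run(): void {
    while (this.operators.some((op) => op.hasPendingWork())) {
      for (const op of this.operators) op.run();
    }
  }
}

const inbox = [1, 2, 3];
const orphanQueue = [1, 2, 3]; // stands in for a graph-level "public" reader
const operator = new QueueOperator(inbox);
new Graph([operator]).run();

const processedCount = operator.processed; // 3
const orphanLength = orphanQueue.length; // 3: the scheduler never touched it
```

In this model the scheduler terminates as soon as operator inputs are empty, regardless of what sits in any other queue.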

Can we remove them safely?

Yes. We removed the automatic “public” readers and everything still works:
No change to surface API or types.
All tests still pass (247/247).
Memory usage improves dramatically because we no longer create undrained queues holding every message.

Behavioral semantics after removal

Streams without a consumer: If you don’t connect a reader (or end in a sink), outputs are dropped — which matches normal dataflow semantics. Previously, they were buffered in invisible, undrained queues (the source of the memory growth).
How to consume results explicitly:
Example: const reader = output.connectReader(); graph.run(); const messages = reader.drain();
This is consistent with the library’s public contract — users already needed to attach readers for data they wanted to materialize.
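
A small sketch of the explicit-consumption pattern, assuming a simplified stream model: `MiniStream` and `MiniReader` are hypothetical classes written for this example, and only the `connectReader()` and `drain()` names come from the text above. It shows that a reader sees only what is sent after it connects, and that draining clears its buffer.

```typescript
// Toy stand-ins for the pattern; not the library's real classes.
class MiniReader<T> {
  private queue: T[] = [];

  push(msg: T): void {
    this.queue.push(msg);
  }

  // Hand back buffered messages and reset the buffer.
  drain(): T[] {
    const out = this.queue;
    this.queue = [];
    return out;
  }
}

class MiniStream<T> {
  private readers: MiniReader<T>[] = [];

  connectReader(): MiniReader<T> {
    const reader = new MiniReader<T>();
    this.readers.push(reader);
    return reader;
  }

  emit(msg: T): void {
    for (const reader of this.readers) reader.push(msg);
  }
}

const output = new MiniStream<string>();
output.emit("early"); // no reader attached yet, so this message is dropped

const reader = output.connectReader();
output.emit("a");
output.emit("b");

const messages = reader.drain(); // ["a", "b"]
const afterDrain = reader.drain(); // [] because the first drain cleared the buffer
```

Attaching the reader before running the graph is what makes results observable in this model; anything emitted earlier is simply not buffered.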

Key references

src/graph.ts:36 — Writer enqueues to each reader’s queue.
src/graph.ts:19 — Reader drain() clears its queue.
src/d2.ts:61 and src/d2.ts:65 — Scheduler only asks operators if they have pending input and runs them; it never reads “graph-level” readers.
src/d2.ts:81 — StreamBuilder.connectReader() is the supported way to consume a stream.

Bottom line

The “public” readers were not part of the usable public API and weren’t used by the engine. They created undrained queues that retained all messages.
Removing them is safe and correct. If you want materialized outputs, attach a reader explicitly or use a sink operator. If you want a convenience “materialize()” operator that returns a handle, we can add one as a follow-up.
@changeset-bot

changeset-bot bot commented Sep 13, 2025

🦋 Changeset detected

Latest commit: 0c85527

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 11 packages
Name Type
@tanstack/db-ivm Patch
@tanstack/db Patch
@tanstack/electric-db-collection Patch
@tanstack/query-db-collection Patch
@tanstack/react-db Patch
@tanstack/rxdb-db-collection Patch
@tanstack/solid-db Patch
@tanstack/svelte-db Patch
@tanstack/trailbase-db-collection Patch
@tanstack/vue-db Patch
@tanstack/db-example-react-todo Patch

@samwillis samwillis requested a review from kevin-dp September 13, 2025 17:17
@pkg-pr-new

pkg-pr-new bot commented Sep 13, 2025

More templates

@tanstack/angular-db

npm i https://pkg.pr.new/@tanstack/angular-db@550

@tanstack/db

npm i https://pkg.pr.new/@tanstack/db@550

@tanstack/db-ivm

npm i https://pkg.pr.new/@tanstack/db-ivm@550

@tanstack/electric-db-collection

npm i https://pkg.pr.new/@tanstack/electric-db-collection@550

@tanstack/query-db-collection

npm i https://pkg.pr.new/@tanstack/query-db-collection@550

@tanstack/react-db

npm i https://pkg.pr.new/@tanstack/react-db@550

@tanstack/rxdb-db-collection

npm i https://pkg.pr.new/@tanstack/rxdb-db-collection@550

@tanstack/solid-db

npm i https://pkg.pr.new/@tanstack/solid-db@550

@tanstack/svelte-db

npm i https://pkg.pr.new/@tanstack/svelte-db@550

@tanstack/trailbase-db-collection

npm i https://pkg.pr.new/@tanstack/trailbase-db-collection@550

@tanstack/vue-db

npm i https://pkg.pr.new/@tanstack/vue-db@550

commit: 0c85527

@github-actions
Contributor

Size Change: 0 B

Total Size: 66.6 kB

ℹ️ View Unchanged
Filename Size
./packages/db/dist/esm/change-events.js 1.13 kB
./packages/db/dist/esm/collection.js 10.5 kB
./packages/db/dist/esm/deferred.js 230 B
./packages/db/dist/esm/errors.js 3.1 kB
./packages/db/dist/esm/index.js 1.55 kB
./packages/db/dist/esm/indexes/auto-index.js 745 B
./packages/db/dist/esm/indexes/base-index.js 605 B
./packages/db/dist/esm/indexes/btree-index.js 1.74 kB
./packages/db/dist/esm/indexes/lazy-index.js 1.25 kB
./packages/db/dist/esm/local-only.js 827 B
./packages/db/dist/esm/local-storage.js 2.03 kB
./packages/db/dist/esm/optimistic-action.js 294 B
./packages/db/dist/esm/proxy.js 3.87 kB
./packages/db/dist/esm/query/builder/functions.js 615 B
./packages/db/dist/esm/query/builder/index.js 3.93 kB
./packages/db/dist/esm/query/builder/ref-proxy.js 938 B
./packages/db/dist/esm/query/compiler/evaluators.js 1.52 kB
./packages/db/dist/esm/query/compiler/expressions.js 631 B
./packages/db/dist/esm/query/compiler/group-by.js 2.08 kB
./packages/db/dist/esm/query/compiler/index.js 2.27 kB
./packages/db/dist/esm/query/compiler/joins.js 2.52 kB
./packages/db/dist/esm/query/compiler/order-by.js 1.23 kB
./packages/db/dist/esm/query/compiler/select.js 1.28 kB
./packages/db/dist/esm/query/ir.js 508 B
./packages/db/dist/esm/query/live-query-collection.js 333 B
./packages/db/dist/esm/query/live/collection-config-builder.js 2.59 kB
./packages/db/dist/esm/query/live/collection-subscriber.js 2.4 kB
./packages/db/dist/esm/query/optimizer.js 3.05 kB
./packages/db/dist/esm/SortedMap.js 1.24 kB
./packages/db/dist/esm/transactions.js 2.29 kB
./packages/db/dist/esm/utils.js 943 B
./packages/db/dist/esm/utils/btree.js 6.02 kB
./packages/db/dist/esm/utils/comparison.js 718 B
./packages/db/dist/esm/utils/index-optimization.js 1.62 kB

@github-actions
Contributor

Size Change: 0 B

Total Size: 1.18 kB

ℹ️ View Unchanged
Filename Size
./packages/react-db/dist/esm/index.js 152 B
./packages/react-db/dist/esm/useLiveQuery.js 1.02 kB

@sorenbs
Contributor

sorenbs commented Sep 13, 2025

So stoked for this! Reduced memory from ~6GB to ~200MB in one of my tests.

@kevin-dp
Contributor

@samwillis could you explain what was causing the memory leak? I see that you removed the #streams property and the addStream method. Why was it causing a memory leak and why do we no longer need it?

@samwillis
Collaborator Author

@kevin-dp we were adding an output from each operator, which would accumulate messages into a list in the graph; these outputs were never drained. It was a remnant of how things worked very early in D2TS that was overlooked.
It meant we had these unused, growing lists holding hard references to messages forever.

@samwillis samwillis merged commit c58cec9 into main Sep 15, 2025
6 checks passed
@samwillis samwillis deleted the fix-memory-leak branch September 15, 2025 09:43
@github-actions github-actions bot mentioned this pull request Sep 15, 2025
Uziniii pushed a commit to Uziniii/db that referenced this pull request Sep 19, 2025
Co-authored-by: Søren Bramer Schmidt <sorenbs@gmail.com>