Optimize joins to use index when possible by kevin-dp · Pull Request #335 · TanStack/db

kevin-dp · 2025-07-30T09:37:47Z

This PR optimizes joins based on available indexes.

For a left join we always have to iterate over the left collection, but we don't need to iterate over the entire right collection. Based on the rows in the left collection, we can look up the matching rows in the right collection (using the key we're joining on). That lookup is efficient if there's an index on the join key. We can do the same for right joins. For inner joins, we can loop over the smaller collection and look up matching rows in the bigger collection, so that we don't have to loop over the bigger collection.

Here's a concrete example. Imagine we're left joining a Comments collection with a Users collection on Comments.user_id = Users.id, and imagine we have an index on Users.id. Now we can loop over all Comments and, for each comment, look up its user_id in the index for Users.id. This gives us the corresponding user that we need to join with the comment, without having to loop over the entire Users collection.
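The example above can be sketched in plain TypeScript. This is only an illustration of the index-lookup idea, not the code from this PR: the row shapes, `buildIndex`, and `leftJoinWithIndex` are hypothetical names, and a `Map` stands in for the index on Users.id.

```typescript
// Hypothetical row shapes for the Comments/Users example.
type Comment = { id: number; user_id: number; text: string }
type User = { id: number; name: string }

// Stand-in for an index on Users.id: join key -> rows with that key.
function buildIndex(users: User[]): Map<number, User[]> {
  const index = new Map<number, User[]>()
  for (const user of users) {
    const bucket = index.get(user.id)
    if (bucket) bucket.push(user)
    else index.set(user.id, [user])
  }
  return index
}

// Left join: iterate over every comment and probe the index for its user,
// instead of scanning the whole Users collection.
function leftJoinWithIndex(
  comments: Comment[],
  usersById: Map<number, User[]>
): Array<[Comment, User | undefined]> {
  const result: Array<[Comment, User | undefined]> = []
  for (const comment of comments) {
    const matches = usersById.get(comment.user_id)
    if (!matches || matches.length === 0) {
      // A left join keeps the left row even when nothing matches.
      result.push([comment, undefined])
    } else {
      for (const user of matches) result.push([comment, user])
    }
  }
  return result
}

const users: User[] = [
  { id: 1, name: "Ada" },
  { id: 2, name: "Linus" },
]
const comments: Comment[] = [
  { id: 10, user_id: 2, text: "LGTM" },
  { id: 11, user_id: 3, text: "orphaned" },
]
const joined = leftJoinWithIndex(comments, buildIndex(users))
```

Each comment costs one index probe, so the work scales with the number of comments rather than with the size of the Users collection.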

Implementation Overview

We don't actually have to modify the existing D2 join operator to do this. The join operator takes two streams, a stream for the left collection and a stream for the right collection. For left/right/inner joins we only need to loop over one of the streams and we don't need to loop over the entire other stream. Therefore, the idea is to modify the streams such that there is an active stream and a lazy stream. We process the entire active stream and use it to dynamically populate the lazy stream. This is depicted in the following diagram:

The diagram above depicts a left join of comments with users (the example from before). The comments are filtered (e.g. to only get the comments for a certain issue). Then, we want to left-join them with users. To do this, we add a tap operator after the filter and before the join. This operator doesn't modify the stream, but for every row it looks up the join key in the index of the Users collection and dynamically loads the matching user into the lazy users stream. In other words, we're populating the lazy stream with the users that match the comments as we process them. Note that the lazy users stream can apply additional operators before being joined in. In the diagram, we're doing an additional filter over the lazy stream before joining it in.
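The active/lazy arrangement described above can be approximated with arrays in place of D2 streams. This is a minimal sketch under that assumption: `tap`, `LazyStream`, and the sample data are hypothetical and do not reflect the real operator signatures in `@tanstack/db-ivm`.

```typescript
type Comment = { id: number; user_id: number; issue_id: number }
type User = { id: number; name: string; active: boolean }

// Hypothetical tap: runs a side effect per row and returns the rows unchanged.
function tap<T>(rows: T[], f: (row: T) => void): T[] {
  for (const row of rows) f(row)
  return rows
}

// Hypothetical lazy stream: only receives rows that were explicitly requested.
class LazyStream<T> {
  readonly rows: T[] = []
  private seen = new Set<unknown>()

  // Load the rows for a join key at most once.
  load(key: unknown, matches: T[]): void {
    if (this.seen.has(key)) return
    this.seen.add(key)
    this.rows.push(...matches)
  }
}

// Stand-in for the index on Users.id.
const usersIndex = new Map<number, User[]>([
  [1, [{ id: 1, name: "Ada", active: true }]],
  [2, [{ id: 2, name: "Linus", active: false }]],
  [3, [{ id: 3, name: "Grace", active: true }]],
])
const allComments: Comment[] = [
  { id: 10, user_id: 1, issue_id: 7 },
  { id: 11, user_id: 2, issue_id: 7 },
  { id: 12, user_id: 3, issue_id: 8 },
  { id: 13, user_id: 1, issue_id: 7 },
]

const lazyUsers = new LazyStream<User>()

// Active side: filter, then tap to populate the lazy side.
const filtered = allComments.filter((c) => c.issue_id === 7)
const active = tap(filtered, (c) =>
  lazyUsers.load(c.user_id, usersIndex.get(c.user_id) ?? [])
)

// Lazy side: additional operators can still run before the join.
const lazyFiltered = lazyUsers.rows.filter((u) => u.active)

// Left join of the active side with the (filtered) lazy side.
const joined = active.map(
  (c) => [c, lazyFiltered.find((u) => u.id === c.user_id)] as const
)
```

In this sketch only the users referenced by the filtered comments are ever loaded; the user with id 3 never enters the lazy side.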

Implementation Challenges

The D2 pipeline from the diagram above is created at compile time, but the indexes are created at runtime. Hence, when creating this special tap operator, we don't know whether the collection that we want to load lazily will actually have the required index on the join key. We will only know this at runtime, when the tap operator runs for the first time. At that point, if we notice that the index does not exist, we cannot apply the optimization, so we turn the Users collection back into a regular collection (instead of a lazy collection).
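One way to picture this runtime decision is a source that stays undecided until the first join key arrives. The sketch below is an assumption-laden illustration: `Collection`, `LazySource`, and the `mode` field are hypothetical and are not the types used in the PR.

```typescript
type User = { id: number; name: string }

// Hypothetical collection: rows plus indexes that may be registered at runtime.
class Collection<T> {
  readonly indexes = new Map<string, Map<unknown, T[]>>()
  constructor(readonly rows: T[]) {}
}

type LoadMode = "undecided" | "lazy" | "full"

class LazySource<T> {
  mode: LoadMode = "undecided"
  readonly loaded: T[] = []
  private requested = new Set<unknown>()

  constructor(
    private collection: Collection<T>,
    private field: string
  ) {}

  // Called for each join key seen on the active side.
  request(key: unknown): void {
    if (this.mode === "undecided") {
      // Only on the first call do we learn whether the index exists.
      if (this.collection.indexes.has(this.field)) {
        this.mode = "lazy"
      } else {
        // No index: fall back to loading the collection as a regular one.
        this.mode = "full"
        this.loaded.push(...this.collection.rows)
      }
    }
    if (this.mode !== "lazy" || this.requested.has(key)) return
    this.requested.add(key)
    const matches = this.collection.indexes.get(this.field)?.get(key) ?? []
    this.loaded.push(...matches)
  }
}

const ada: User = { id: 1, name: "Ada" }
const linus: User = { id: 2, name: "Linus" }

// Without an index the source degrades to a full load.
const unindexed = new LazySource(new Collection([ada, linus]), "id")
unindexed.request(2)

// With an index only the requested key is loaded.
const indexedCollection = new Collection([ada, linus])
indexedCollection.indexes.set(
  "id",
  new Map<unknown, User[]>([
    [1, [ada]],
    [2, [linus]],
  ])
)
const indexed = new LazySource(indexedCollection, "id")
indexed.request(2)
```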

Currently, inner joins loop over the smaller collection and look up matching rows in the index on the bigger collection. But that index may not exist, in which case we need to loop over the entire bigger collection. In that case, it may be more efficient to loop over the bigger collection and try to find matching rows in the smaller collection (because that one might have an index on the join key). However, flipping these collections around would complicate the code quite a lot (because, again, this would need to happen at runtime), so we decided not to do it yet.
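The "loop over the smaller side, probe the other side's index" rule for inner joins can be sketched as follows. This assumes both sides already have an index on their join key (the case where the index is missing is exactly the open question above); `Side`, `indexBy`, and `innerJoin` are hypothetical helpers.

```typescript
type Comment = { id: number; user_id: number }
type User = { id: number; name: string }

// Hypothetical pairing of a collection's rows with an index on its join key.
type Side<T> = { rows: T[]; index: Map<number, T[]> }

function indexBy<T>(rows: T[], key: (row: T) => number): Map<number, T[]> {
  const index = new Map<number, T[]>()
  for (const row of rows) {
    const k = key(row)
    const bucket = index.get(k)
    if (bucket) bucket.push(row)
    else index.set(k, [row])
  }
  return index
}

// Inner join: iterate over the smaller side and probe the index of the other.
function innerJoin<L, R>(
  left: Side<L>,
  right: Side<R>,
  leftKey: (row: L) => number,
  rightKey: (row: R) => number
): Array<[L, R]> {
  const out: Array<[L, R]> = []
  if (left.rows.length <= right.rows.length) {
    for (const l of left.rows) {
      for (const r of right.index.get(leftKey(l)) ?? []) out.push([l, r])
    }
  } else {
    for (const r of right.rows) {
      for (const l of left.index.get(rightKey(r)) ?? []) out.push([l, r])
    }
  }
  return out
}

const commentRows: Comment[] = [
  { id: 10, user_id: 2 },
  { id: 11, user_id: 2 },
  { id: 12, user_id: 3 },
]
const userRows: User[] = [
  { id: 1, name: "Ada" },
  { id: 2, name: "Linus" },
]

// Users is the smaller side here, so it is iterated and Comments is probed.
const pairs = innerJoin(
  { rows: commentRows, index: indexBy(commentRows, (c) => c.user_id) },
  { rows: userRows, index: indexBy(userRows, (u) => u.id) },
  (c) => c.user_id,
  (u) => u.id
)
```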

TODOs

Add unit tests to check that indexes are correctly used for left/right/inner joins
Automatically create indexes for join keys (in eager mode)
~~Always create an index on the PK of a collection (in eager and pk mode)~~ Better as a follow-up PR.

changeset-bot · 2025-07-30T09:37:51Z

🦋 Changeset detected

Latest commit: 7140573

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 9 packages

Name	Type
@tanstack/db-ivm	Patch
@tanstack/db	Patch
@tanstack/electric-db-collection	Patch
@tanstack/query-db-collection	Patch
@tanstack/react-db	Patch
@tanstack/solid-db	Patch
@tanstack/svelte-db	Patch
@tanstack/trailbase-db-collection	Patch
@tanstack/vue-db	Patch


pkg-pr-new · 2025-07-30T11:36:09Z

@tanstack/db

npm i https://pkg.pr.new/@tanstack/db@335

@tanstack/db-ivm

npm i https://pkg.pr.new/@tanstack/db-ivm@335

@tanstack/electric-db-collection

npm i https://pkg.pr.new/@tanstack/electric-db-collection@335

@tanstack/query-db-collection

npm i https://pkg.pr.new/@tanstack/query-db-collection@335

@tanstack/react-db

npm i https://pkg.pr.new/@tanstack/react-db@335

@tanstack/solid-db

npm i https://pkg.pr.new/@tanstack/solid-db@335

@tanstack/svelte-db

npm i https://pkg.pr.new/@tanstack/svelte-db@335

@tanstack/trailbase-db-collection

npm i https://pkg.pr.new/@tanstack/trailbase-db-collection@335

@tanstack/vue-db

npm i https://pkg.pr.new/@tanstack/vue-db@335

commit: 7140573

github-actions · 2025-07-30T11:37:15Z

Size Change: +1.66 kB (+2.84%)

Total Size: 60.1 kB

Filename	Size	Change
`./packages/db/dist/esm/collection.js`	9.86 kB	+13 B (+0.13%)
`./packages/db/dist/esm/errors.js`	3 kB	+27 B (+0.91%)
`./packages/db/dist/esm/index.js`	1.52 kB	+11 B (+0.73%)
`./packages/db/dist/esm/indexes/auto-index.js`	718 B	+29 B (+4.21%)
`./packages/db/dist/esm/query/compiler/index.js`	2.1 kB	+366 B (+21.1%)	🚨
`./packages/db/dist/esm/query/compiler/joins.js`	2.31 kB	+754 B (+48.33%)	🚨
`./packages/db/dist/esm/query/live-query-collection.js`	2.91 kB	+463 B (+18.93%)	⚠️

ℹ️ View Unchanged

Filename	Size
`./packages/db/dist/esm/change-events.js`	1.13 kB
`./packages/db/dist/esm/deferred.js`	230 B
`./packages/db/dist/esm/indexes/base-index.js`	605 B
`./packages/db/dist/esm/indexes/btree-index.js`	1.47 kB
`./packages/db/dist/esm/indexes/lazy-index.js`	1.25 kB
`./packages/db/dist/esm/local-only.js`	827 B
`./packages/db/dist/esm/local-storage.js`	2.03 kB
`./packages/db/dist/esm/optimistic-action.js`	294 B
`./packages/db/dist/esm/proxy.js`	4.19 kB
`./packages/db/dist/esm/query/builder/functions.js`	575 B
`./packages/db/dist/esm/query/builder/index.js`	3.79 kB
`./packages/db/dist/esm/query/builder/ref-proxy.js`	890 B
`./packages/db/dist/esm/query/compiler/evaluators.js`	1.48 kB
`./packages/db/dist/esm/query/compiler/expressions.js`	631 B
`./packages/db/dist/esm/query/compiler/group-by.js`	2.03 kB
`./packages/db/dist/esm/query/compiler/order-by.js`	677 B
`./packages/db/dist/esm/query/compiler/select.js`	655 B
`./packages/db/dist/esm/query/ir.js`	318 B
`./packages/db/dist/esm/query/optimizer.js`	2.44 kB
`./packages/db/dist/esm/SortedMap.js`	1.24 kB
`./packages/db/dist/esm/transactions.js`	2.29 kB
`./packages/db/dist/esm/utils.js`	419 B
`./packages/db/dist/esm/utils/btree.js`	5.93 kB
`./packages/db/dist/esm/utils/comparison.js`	718 B
`./packages/db/dist/esm/utils/index-optimization.js`	1.62 kB

github-actions · 2025-07-30T11:38:21Z

Size Change: 0 B

Total Size: 1.05 kB

ℹ️ View Unchanged

Filename	Size
`./packages/react-db/dist/esm/index.js`	152 B
`./packages/react-db/dist/esm/useLiveQuery.js`	902 B

samwillis

~~We have central error classes in src/errors.ts; we should use one from there or create a new one.~~

Ignore, clicked wrong button.

samwillis

This is all absolutely awesome!
Approving it, but may be worth just swapping out the errors for the ~~mental~~ central errors and using the ~~drug~~ debug package for logging.

packages/db/src/query/compiler/joins.ts

samwillis · 2025-08-05T17:06:31Z

packages/db-ivm/src/operators/tap.ts

+
+  inner(collection: MultiSet<T>): MultiSet<T> {
+    return collection.map((data) => {
+      this.#f(data)


It could be useful in future to pass the multiplicity to the callback so that it's aware if it's an insert/delete. Not important for now.

kevin-dp · 2025-08-13

We can add this later when we need it :-)
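For reference, a tap callback that also receives the multiplicity might look like the sketch below. It assumes a multiset represented as `[data, multiplicity]` pairs with positive values for inserts and negative values for deletes; `MultiSetEntries` and `tapWithMultiplicity` are hypothetical and not part of the current `tap` operator.

```typescript
// Hypothetical minimal multiset: [data, multiplicity] pairs.
// Positive multiplicity = insert, negative = delete.
type MultiSetEntries<T> = Array<[T, number]>

// Tap variant that hands the multiplicity to the callback and
// returns the entries unchanged.
function tapWithMultiplicity<T>(
  entries: MultiSetEntries<T>,
  f: (data: T, multiplicity: number) => void
): MultiSetEntries<T> {
  for (const [data, multiplicity] of entries) f(data, multiplicity)
  return entries
}

const requestedKeys: number[] = []
const batch: MultiSetEntries<{ user_id: number }> = [
  [{ user_id: 1 }, 1], // insert
  [{ user_id: 2 }, -1], // delete
]

// Only inserts need their join key loaded; deletes can be skipped.
tapWithMultiplicity(batch, (row, multiplicity) => {
  if (multiplicity > 0) requestedKeys.push(row.user_id)
})
```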

samwillis · 2025-08-05T17:07:57Z

packages/db/src/indexes/auto-index.ts

+      indexType: BTreeIndex,
+    })
+  } catch (error) {
+    console.warn(`Failed to create auto-index for field "${fieldName}":`, error)


We've used the debug package elsewhere for logging. We should maybe use it here.

kevin-dp · 2025-08-13

I didn't actually change this. I took this piece of code from ensureIndexForExpression and moved it here. We could use debugLog, but that would only log in debug mode. I think this warning is also useful in non-debug mode, to warn you that for some reason the index could not be created and thus the queries might be less efficient.

samwillis

This is great, let's get it merged!

The note I've added is for later.

samwillis · 2025-08-18T07:08:03Z

packages/db/src/query/compiler/joins.ts

+    const activePipelineWithLoading: IStreamBuilder<
+      [key: unknown, [originalKey: string, namespacedRow: NamespacedRow]]
+    > = activePipeline.pipe(
+      tap(([joinKey, _]) => {


Not something to change right now, but with the current version of tap we process and ask for each joined key one at a time. If in future we want to batch load via sync, we will need to try to reassemble a batch to ask for.

The tap operator iterates over the items in the multiset, calling this function here, and for each row it then asks for the joined items to be injected. The alternative would be to do so at the multiset level: we have a batch of items from the left, so we ask for a batch from the right to be injected all at once. These batches can then naturally be pushed back down to the sync layer and asked for from there.

But for later!
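A rough sketch of the batch-level alternative: collect the distinct join keys of one multiset and request them in a single call. Everything here is hypothetical (`MultiSetEntries`, `collectJoinKeys`, and `loadUsersByIds` are illustrative names), and the loader only records what it was asked for.

```typescript
// Hypothetical minimal multiset: [data, multiplicity] pairs.
type MultiSetEntries<T> = Array<[T, number]>

// Collect the distinct join keys of a batch of [joinKey, row] entries,
// preserving first-seen order.
function collectJoinKeys<K, V>(entries: MultiSetEntries<[K, V]>): K[] {
  const seen = new Set<K>()
  const keys: K[] = []
  for (const [[key]] of entries) {
    if (!seen.has(key)) {
      seen.add(key)
      keys.push(key)
    }
  }
  return keys
}

// Stand-in for a loader that could forward one batched request to sync.
const loadCalls: number[][] = []
function loadUsersByIds(ids: number[]): void {
  loadCalls.push(ids)
}

const leftBatch: MultiSetEntries<[number, string]> = [
  [[1, "comment-10"], 1],
  [[2, "comment-11"], 1],
  [[1, "comment-12"], 1],
]

// One request per batch instead of one request per row.
loadUsersByIds(collectJoinKeys(leftBatch))
```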

kevin-dp force-pushed the kevindp/join-with-index branch from ec65765 to 492f9fa Compare July 30, 2025 11:34

kevin-dp requested a review from samwillis August 5, 2025 07:11

kevin-dp force-pushed the kevindp/join-with-index branch from 2e81220 to e706f15 Compare August 5, 2025 07:32

samwillis reviewed Aug 5, 2025

samwillis approved these changes Aug 5, 2025

kevin-dp added 9 commits August 14, 2025 09:57

Optimize joins to use index when possible.

6e3dccf

Index tests for left, right, and inner joins.

69ff884

Use the right join expression to determine the join key and the index…

9b62327

… to use

Modify join tests to use different column names for the columns that …

98ba8b8

…are joined on

Automatically create index for join key in eager mode

2a2fa9a

Optimize initial load of join queries by using a regular subscription…

3ac1208

… and loading matching keys dynamically.

Move followRef helper function to the bottom of the file

02e04a3

Changeset

69bb8ba

Use central error class

7140573

kevin-dp force-pushed the kevindp/join-with-index branch from 6065ac1 to 7140573 Compare August 14, 2025 07:57

samwillis approved these changes Aug 18, 2025

kevin-dp merged commit 68538b4 into main Aug 18, 2025
6 checks passed

kevin-dp deleted the kevindp/join-with-index branch August 18, 2025 07:38

github-actions bot mentioned this pull request Aug 18, 2025

ci: Version Packages #415

Merged
