Optimized order by that lazily loads more data when needed #410

Merged
kevin-dp merged 7 commits into main from kevindp/lazy-orderBy on Aug 18, 2025

@kevin-dp
Contributor

This PR optimizes the orderBy operator by loading only the first limit + offset rows into the query pipeline. If more rows are needed later (e.g. because some rows are updated or deleted), it lazily loads more rows until the query results contain limit rows again (or no more rows are available).

This optimization is only possible when a range index over the ordered property is available.
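
To illustrate the idea, here is a minimal sketch, not the PR's actual code. It assumes a sorted array can stand in for the range index; `createLazyTopK`, `loadMore`, and `remove` are hypothetical names invented for this example. Only `offset + limit` rows are loaded up front, and one more row is pulled from the index when a loaded row is deleted.

```typescript
type Row = { id: number; salary: number }

// Hypothetical helper: a lazily loaded "ORDER BY salary DESC" with offset/limit.
function createLazyTopK(rows: Row[], offset: number, limit: number) {
  // Stand-in for a range index: every row sorted by the ordered property.
  const index = [...rows].sort((a, b) => b.salary - a.salary)
  const deleted = new Set<number>()
  const loaded: Row[] = []
  let cursor = 0

  // Pull up to `n` further live rows from the index, in index order.
  const loadMore = (n: number): void => {
    while (n > 0 && cursor < index.length) {
      const row = index[cursor++]
      if (row !== undefined && !deleted.has(row.id)) {
        loaded.push(row)
        n--
      }
    }
  }

  // Initial load: only the rows the query window can possibly need.
  loadMore(offset + limit)

  return {
    results: (): Row[] => loaded.slice(offset, offset + limit),
    loadedCount: (): number => loaded.length,
    remove(id: number): void {
      deleted.add(id)
      const i = loaded.findIndex((r) => r.id === id)
      if (i !== -1) {
        loaded.splice(i, 1)
        // The window lost a row, so lazily fetch one replacement.
        loadMore(1)
      }
    },
  }
}
```

With five rows and `limit = 3`, only three rows are loaded at first; removing one of them pulls in the next row from the index instead of re-sorting the whole collection.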

@changeset-bot

changeset-bot bot commented Aug 14, 2025

🦋 Changeset detected

Latest commit: 4f958a6

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 9 packages
Name Type
@tanstack/db-ivm Patch
@tanstack/db Patch
@tanstack/electric-db-collection Patch
@tanstack/query-db-collection Patch
@tanstack/react-db Patch
@tanstack/solid-db Patch
@tanstack/svelte-db Patch
@tanstack/trailbase-db-collection Patch
@tanstack/vue-db Patch

@kevin-dp kevin-dp requested a review from samwillis August 14, 2025 08:02
@pkg-pr-new

pkg-pr-new bot commented Aug 14, 2025

More templates

@tanstack/db

npm i https://pkg.pr.new/@tanstack/db@410

@tanstack/db-ivm

npm i https://pkg.pr.new/@tanstack/db-ivm@410

@tanstack/electric-db-collection

npm i https://pkg.pr.new/@tanstack/electric-db-collection@410

@tanstack/query-db-collection

npm i https://pkg.pr.new/@tanstack/query-db-collection@410

@tanstack/react-db

npm i https://pkg.pr.new/@tanstack/react-db@410

@tanstack/solid-db

npm i https://pkg.pr.new/@tanstack/solid-db@410

@tanstack/svelte-db

npm i https://pkg.pr.new/@tanstack/svelte-db@410

@tanstack/trailbase-db-collection

npm i https://pkg.pr.new/@tanstack/trailbase-db-collection@410

@tanstack/vue-db

npm i https://pkg.pr.new/@tanstack/vue-db@410

commit: 4f958a6

@github-actions
Contributor

github-actions bot commented Aug 14, 2025

Size Change: 0 B

Total Size: 62.5 kB

ℹ️ View Unchanged
Filename Size
./packages/db/dist/esm/change-events.js 1.13 kB
./packages/db/dist/esm/collection.js 10.5 kB
./packages/db/dist/esm/deferred.js 230 B
./packages/db/dist/esm/errors.js 3 kB
./packages/db/dist/esm/index.js 1.52 kB
./packages/db/dist/esm/indexes/auto-index.js 745 B
./packages/db/dist/esm/indexes/base-index.js 605 B
./packages/db/dist/esm/indexes/btree-index.js 1.74 kB
./packages/db/dist/esm/indexes/lazy-index.js 1.25 kB
./packages/db/dist/esm/local-only.js 827 B
./packages/db/dist/esm/local-storage.js 2.03 kB
./packages/db/dist/esm/optimistic-action.js 294 B
./packages/db/dist/esm/proxy.js 4.19 kB
./packages/db/dist/esm/query/builder/functions.js 575 B
./packages/db/dist/esm/query/builder/index.js 3.79 kB
./packages/db/dist/esm/query/builder/ref-proxy.js 890 B
./packages/db/dist/esm/query/compiler/evaluators.js 1.48 kB
./packages/db/dist/esm/query/compiler/expressions.js 631 B
./packages/db/dist/esm/query/compiler/group-by.js 2.04 kB
./packages/db/dist/esm/query/compiler/index.js 2.14 kB
./packages/db/dist/esm/query/compiler/joins.js 2.36 kB
./packages/db/dist/esm/query/compiler/order-by.js 1.17 kB
./packages/db/dist/esm/query/compiler/select.js 655 B
./packages/db/dist/esm/query/ir.js 318 B
./packages/db/dist/esm/query/live-query-collection.js 3.63 kB
./packages/db/dist/esm/query/optimizer.js 2.44 kB
./packages/db/dist/esm/SortedMap.js 1.24 kB
./packages/db/dist/esm/transactions.js 2.29 kB
./packages/db/dist/esm/utils.js 419 B
./packages/db/dist/esm/utils/btree.js 6.02 kB
./packages/db/dist/esm/utils/comparison.js 718 B
./packages/db/dist/esm/utils/index-optimization.js 1.62 kB

compressed-size-action::db-package-size

@github-actions
Contributor

github-actions bot commented Aug 14, 2025

Size Change: 0 B

Total Size: 1.05 kB

ℹ️ View Unchanged
Filename Size
./packages/react-db/dist/esm/index.js 152 B
./packages/react-db/dist/esm/useLiveQuery.js 902 B

compressed-size-action::react-db-package-size

Base automatically changed from kevindp/join-with-index to main August 18, 2025 07:38
Collaborator

@samwillis samwillis left a comment

This all looks great, only a few nits.

I think you need to rebase/merge on main?


// Optimize the orderBy operator to lazily load elements
// by using the range index of the collection.
// Only for orderBy clause on a single column for now (no composite ordering)
Collaborator

Worth making a tracking issue for this - we will forget.

I wonder if a first step is to use the index for the first of the columns in a composite orderBy.

Contributor Author

That's smart! That would already be able to reduce the rows that are loaded initially. And the rest of the filtering would then happen inside the pipeline.

Contributor Author

Thinking a bit more about this, I don't think this works as is. Say we want the first 20 rows ordered by two columns: [A, B]. Now assume that all rows have the same value for column A. If we only take column A into account when lazily loading rows, we will load 20 arbitrary rows, because all rows are considered equal (they have the same value for A). So we cannot ignore the remaining columns in the ordering: we need them to tie-break equal rows. We could, however, lazily load the first 20 rows based on column A and make sure to also load all rows that are equal to the 20th row on A. That ensures we have all the tied rows, and we can then let the pipeline tie-break them.
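
A rough sketch of that last idea, assuming an index that is sorted on column A only. `loadWithTies` and `topN` are hypothetical names used for illustration and are not part of this PR:

```typescript
type Row = { a: number; b: number }

// Load the first `n` rows in index order, plus every further row that
// ties with the n-th row on column `a`.
function loadWithTies(indexByA: ReadonlyArray<Row>, n: number): Row[] {
  if (n <= 0) return []
  const boundaryRow = indexByA[n - 1]
  // Fewer than `n` rows available: load everything.
  if (boundaryRow === undefined) return [...indexByA]
  let end = n
  while (end < indexByA.length && indexByA[end]?.a === boundaryRow.a) end++
  return indexByA.slice(0, end)
}

// Stand-in for the pipeline: tie-break the loaded candidates on [a, b].
function topN(rows: Row[], n: number): Row[] {
  // The index only knows about column `a`.
  const indexByA = [...rows].sort((x, y) => x.a - y.a)
  const candidates = loadWithTies(indexByA, n)
  return candidates.sort((x, y) => x.a - y.a || x.b - y.b).slice(0, n)
}
```

In the worst case described above, where every row has the same value for A, this loads the whole collection, so the saving depends on how selective the first column is.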

let subscribedToAllCollections = false

- const maybeRunGraph = () => {
+ const maybeRunGraph = (callback?: () => boolean) => {
Collaborator

Could we add a comment on what the callback is for?

I think it's for when loading more items into the pipeline, we can return false to indicate that the query is still being run - this is returned from loadMoreIfNeeded

Contributor Author

I've added a comment explaining this callback. It's called after the pipeline run, which we need so that we can check whether the query has enough results or not (i.e. when using orderBy with a limit). We need to do it like this because we cannot rely on the orderBy operator pulling data in when needed: the pipeline execution might never reach the orderBy operator if the data gets filtered out.

For example, if we have a limit of 20 results, then we only load the first 20 rows. But there could be a filter in the pipeline that filters out the first 40 rows, in which case we would never reach the orderBy operator. With this solution, after the initial load of 20 rows and the pipeline run, the callback notices that the result set is still empty, so it loads 20 more rows. The pipeline runs again, the callback notices the result set is still empty, and it loads 20 more rows. This time they don't get filtered out, so the result set contains 20 rows. The callback checks the result set, notices it has 20 rows, and so doesn't load more data and marks the collection as ready.
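
The loop described above can be sketched roughly as follows. This is a simplified model, not the real implementation: `runLazyQuery` and `runPipeline` are hypothetical names, the pipeline is reduced to a single filter, and the callback's return value is assumed to mean "done" when `true` and "still loading" when `false`.

```typescript
type Row = { id: number; active: boolean }

function runLazyQuery(
  index: ReadonlyArray<Row>, // rows already in orderBy order
  limit: number,
  keep: (row: Row) => boolean // stand-in for a filter earlier in the pipeline
) {
  let loaded = 0
  let results: Row[] = []
  let pipelineRuns = 0

  // Make the next `n` rows of the index visible to the pipeline.
  const load = (n: number): void => {
    loaded = Math.min(loaded + n, index.length)
  }

  // Stand-in for one pipeline run over everything loaded so far.
  const runPipeline = (): void => {
    pipelineRuns++
    results = index.slice(0, loaded).filter(keep).slice(0, limit)
  }

  // Called after each run: loads more rows and returns false while the
  // result set is short and the index still has rows left.
  const loadMoreIfNeeded = (): boolean => {
    const missing = limit - results.length
    if (missing <= 0 || loaded >= index.length) return true
    load(missing)
    return false
  }

  load(limit)
  do {
    runPipeline()
  } while (!loadMoreIfNeeded())

  return { results, loaded, pipelineRuns }
}
```

With 60 rows of which the first 40 are filtered out and a limit of 20, this model runs the pipeline three times and ends with 60 rows loaded, matching the walkthrough above.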

expect(results.map((r) => r.salary)).toEqual([65000, 60000, 55000])
})

// TODO: also test live updates with lazy loaded orderBy
Collaborator

Yep, we need this test

Contributor Author

This todo is obsolete as it is implemented by these tests:

  • applies incremental insert of a new row before the topK correctly
  • applies incremental insert of a new row inside the topK correctly
  • applies incremental insert of a new row after the topK correctly
  • applies incremental update of a row inside the topK correctly
  • applies incremental delete of a row in the topK correctly

@kevin-dp kevin-dp force-pushed the kevindp/lazy-orderBy branch from 45b9c08 to 7622884 Compare August 18, 2025 14:14
Collaborator

@samwillis samwillis left a comment

:shipit: 🥳

@kevin-dp kevin-dp merged commit 6c1c19c into main Aug 18, 2025
5 checks passed
@kevin-dp kevin-dp deleted the kevindp/lazy-orderBy branch August 18, 2025 14:56
@github-actions github-actions bot mentioned this pull request Aug 18, 2025
@bradleybernard

I upgraded and ran into an issue that I think might be related to this, but not too sure....
#435
