Optimized order by that lazily loads more data when needed#410
Conversation
🦋 Changeset detected. Latest commit: 4f958a6. The changes in this PR will be included in the next version bump. This PR includes changesets to release 9 packages:
@tanstack/db
@tanstack/db-ivm
@tanstack/electric-db-collection
@tanstack/query-db-collection
@tanstack/react-db
@tanstack/solid-db
@tanstack/svelte-db
@tanstack/trailbase-db-collection
@tanstack/vue-db
Size Change: 0 B, Total Size: 62.5 kB

Size Change: 0 B, Total Size: 1.05 kB
samwillis left a comment:
This all looks great, only a few nits.
I think you need to rebase/merge on main?
```ts
// Optimize the orderBy operator to lazily load elements
// by using the range index of the collection.
// Only for orderBy clause on a single column for now (no composite ordering)
```
Worth making a tracking issue for this - we will forget.
I wonder if a first step is to use the index for the first of the columns in a composite orderBy.
That's smart! That would already be able to reduce the rows that are loaded initially. And the rest of the filtering would then happen inside the pipeline.
Thinking a bit more about this, I don't think this works. Say that we want the first 20 rows based on two columns: [A, B]. Now, assume that all rows have the same value for column A. If we only take column A into account to lazily load rows, we will load 20 arbitrary rows, because all rows are considered equal (since they have the same value for A). So we cannot ignore the remaining columns in the order; we need them to tie-break equal rows. We could, however, lazily load the first 20 rows based on column A and make sure to also load all rows that are equal to the 20th row. That ensures we have all the tied rows, and we can then let the pipeline tie-break them.
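The tie-break idea above can be sketched roughly as follows. This is an illustrative sketch only, not the actual TanStack DB internals: `loadTopKWithTies` and the `Row` shape are hypothetical names, and the "pipeline" tie-break is simulated with a plain sort.

```typescript
type Row = { a: number; b: number }

// Load the first k rows ordered by column A, plus every row that is tied
// with the k-th row on A, so a later full comparator can tie-break on B.
function loadTopKWithTies(rowsSortedByA: Array<Row>, k: number): Array<Row> {
  if (rowsSortedByA.length <= k) return [...rowsSortedByA]
  const boundary = rowsSortedByA[k - 1].a
  // Extend past k while rows still equal the k-th row on column A.
  let end = k
  while (end < rowsSortedByA.length && rowsSortedByA[end].a === boundary) {
    end++
  }
  return rowsSortedByA.slice(0, end)
}

// The pipeline would then tie-break the loaded rows with the full comparator:
const loaded = loadTopKWithTies(
  [
    { a: 1, b: 9 },
    { a: 1, b: 2 },
    { a: 1, b: 5 },
    { a: 2, b: 0 },
  ],
  2
)
const topK = loaded.sort((x, y) => x.a - y.a || x.b - y.b).slice(0, 2)
// loaded contains 3 rows (the 2 requested plus 1 tie); topK is the correct
// first 2 rows under the composite [A, B] order.
```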
```ts
let subscribedToAllCollections = false
```

```diff
- const maybeRunGraph = () => {
+ const maybeRunGraph = (callback?: () => boolean) => {
```
Could we add a comment on what the callback is for?
I think it's for loading more items into the pipeline: the callback can return false to indicate that the query is still being run; this value is returned from `loadMoreIfNeeded`.
I've added a comment explaining this callback. It's called after the pipeline run, which we need so that we can check whether the query has enough results (i.e. when using orderBy with a limit). We need to do it this way because we cannot rely on the orderBy operator pulling in data when needed: the pipeline execution might never reach the orderBy operator if the data gets filtered out.
For example, if we have a limit of 20 results, we only load the first 20 rows. But a filter in the pipeline could filter out the first 40 rows, in which case we would never reach the orderBy operator. With this solution, after the initial load of 20 rows and the pipeline run, the callback notices that the result set is still empty, so it loads 20 more rows. It runs the pipeline again, notices the result set is still empty, and loads 20 more. This time they don't get filtered out, so the result set contains 20 rows. The callback checks the result set, sees it has 20 rows, so it doesn't load more data and marks the collection as ready.
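The load-more loop described above can be sketched like this. All names here (`runUntilSatisfied`, the `Pipeline` shape, the inlined `loadMoreIfNeeded`) are hypothetical stand-ins for the real TanStack DB internals; the point is only the control flow: run the pipeline, let the callback pull in more rows, and stop when the result set reaches the limit or the source is exhausted.

```typescript
type Pipeline = {
  run: () => void
  resultCount: () => number
}

function runUntilSatisfied(
  pipeline: Pipeline,
  limit: number,
  loadMore: (n: number) => number // returns how many rows were actually loaded
): void {
  // Returns true when the query is done (enough results, or no rows left);
  // false means more rows were requested and the pipeline must run again.
  const loadMoreIfNeeded = (): boolean => {
    if (pipeline.resultCount() >= limit) return true
    const loaded = loadMore(limit)
    return loaded === 0
  }

  const maybeRunGraph = (callback?: () => boolean) => {
    let done = false
    while (!done) {
      pipeline.run()
      // The callback runs after each pipeline pass and reports whether
      // the query is still being run.
      done = callback ? callback() : true
    }
  }

  loadMore(limit) // initial load of the first `limit` rows
  maybeRunGraph(loadMoreIfNeeded)
}

// Usage sketch mirroring the example above: a filter drops the first 40
// rows, so two extra load rounds are needed to reach the limit of 20.
const source = Array.from({ length: 60 }, (_, i) => i)
let cursor = 0
const results: Array<number> = []
const pipeline: Pipeline = {
  run: () => {}, // simplified: `loadMore` pushes matching rows directly
  resultCount: () => results.length,
}
const loadMore = (n: number): number => {
  const batch = source.slice(cursor, cursor + n)
  cursor += batch.length
  for (const row of batch) if (row >= 40) results.push(row) // the filter
  return batch.length
}
runUntilSatisfied(pipeline, 20, loadMore)
```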
```ts
expect(results.map((r) => r.salary)).toEqual([65000, 60000, 55000])
})
```

```ts
// TODO: also test live updates with lazy loaded orderBy
```
This todo is obsolete as it is implemented by these tests:
- applies incremental insert of a new row before the topK correctly
- applies incremental insert of a new row inside the topK correctly
- applies incremental insert of a new row after the topK correctly
- applies incremental update of a row inside the topK correctly
- applies incremental delete of a row in the topK correctly
Force-pushed from 45b9c08 to 7622884: "…t the right places and using a comparator for rows instead of namespaced rows."
I upgraded and ran into an issue that I think might be related to this, but not too sure...
This PR optimizes the `orderBy` operator by only loading the first `limit + offset` rows into the query pipeline. If later more rows are needed (e.g. because some rows are updated or deleted), then it lazily loads more rows until the query results contain `limit` rows again (or there are no more rows available).

This optimization is only possible when a range index over the ordered property is available.
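The core idea of the optimization can be illustrated with a minimal sketch. The `RangeIndex` class below is a stand-in, not the actual TanStack DB range index API: it just keeps rows sorted by the ordered column so the query can take the first `limit + offset` rows and fetch subsequent batches from where it left off.

```typescript
class RangeIndex<T> {
  private sorted: Array<T>
  constructor(rows: Array<T>, compare: (a: T, b: T) => number) {
    // A real range index maintains this order incrementally; here we sort once.
    this.sorted = [...rows].sort(compare)
  }
  // Load `count` rows starting at position `from` in the sorted order.
  take(from: number, count: number): Array<T> {
    return this.sorted.slice(from, from + count)
  }
}

const index = new RangeIndex([5, 3, 8, 1, 9, 2], (a, b) => a - b)
const limit = 2
const offset = 1

// Initial load: only limit + offset = 3 rows enter the pipeline.
let loaded = index.take(0, limit + offset)

// If rows later get deleted or filtered out, lazily pull the next batch
// starting right after the rows already loaded.
loaded = loaded.concat(index.take(loaded.length, limit))
```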