
perf: add a lightweight scheduler implementation#5773

Merged
westonpace merged 5 commits intolance-format:mainfrom
westonpace:perf/lite-scheduler
Feb 4, 2026

Conversation

@westonpace (Member) commented Jan 21, 2026

The current scheduler introduces too much synchronization overhead in high-IOPS situations. The new scheduler reduces the number of asynchronous context switches. On my desktop it doesn't actually have much impact on performance; however, on a system with more cores and higher RAM bandwidth the new scheduler more than doubles IOPS. Combined with the io_uring reader (coming in a future PR), performance is actually 4x.

This scheduler makes tradeoffs which are not ideal for cloud readers, but which are important for the uring reader:

  • There is no dedicated background I/O loop thread, and tasks are not launched with tokio spawn. This is fine for the local filesystem, since each task is its own spawn blocking call. For cloud stores it is not great: because tasks are not spawned, they need to be polled occasionally, so I/O tasks that are not at the front of the line might get paused if their queues fill up (for example, if the HTTP request queues fill up). With io_uring this isn't a problem, because polling one task actually progresses all I/O requests, not just the one associated with that task.
  • The lite scheduler does not enforce any kind of concurrency throttling. It is mainly meant for local storage use cases, where there is not much penalty for having too many concurrent requests. This would be a problem if this scheduler were used for cloud storage, however.

@westonpace (Member Author)

Drafting until I merge #5755

@westonpace westonpace marked this pull request as draft January 21, 2026 13:24
@github-actions (Contributor)

Code Review Summary

This PR introduces a lightweight scheduler implementation to reduce synchronization overhead in high-IOPS scenarios. The design is sound and the claimed 2-4x performance improvements are significant.

P0/P1 Issues

1. Concurrency throttle is ineffective (P0 - Bug)

In lite.rs:1662-1673, SimpleConcurrencyThrottle::try_acquire always returns true when concurrency_available > 0 but never decrements the counter:

```rust
fn try_acquire(&mut self) -> bool {
    if self.concurrency_available > 0 {
        // ...decrement commented out...
        true // returns true but never decrements the counter
    } else {
        false
    }
}
```

Combined with release() being a no-op, this means the concurrency throttle provides no actual limiting. This appears intentional based on the TODO comment about deadlocks, but the current implementation allows unbounded concurrent I/O, which could cause resource exhaustion under load.

2. PrioritiesInFlight insertion is O(n) (P1 - Performance)

In lite.rs:1718-1724, every push performs a binary search followed by an insert, making it O(n). With up to 256 concurrent tasks as mentioned in comments, this could add measurable overhead in the hot path. Consider using a BTreeMap<u128, usize> to track priority counts instead.

3. Task cancellation leaks backpressure reservations (P1 - Bug)

In lite.rs:1527-1542, IoTask::cancel() creates a dummy BackpressureReservation with num_bytes: 0 regardless of whether the task had a real reservation. The comment in close() at line 2027-2028 acknowledges this is currently safe because the queue is local, but this is fragile if the design changes.

Minor Observations

  • The Reader trait change from async fn to a BoxFuture return type is a reasonable way to make the futures 'static for the lite scheduler
  • The serialized_scheduling option controlled via an env var is a good debugging/testing escape hatch
  • Tests should be added specifically for the lite scheduler path (currently tests only run with use_lite_scheduler: false)

Overall, this is a well-structured performance improvement. Addressing the concurrency throttle behavior (either fix it or document why unbounded is acceptable) would be the main blocker.

@codecov bot commented Jan 21, 2026

Codecov Report

❌ Patch coverage is 12.26667% with 329 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance-io/src/scheduler/lite.rs | 0.00% | 285 Missing ⚠️ |
| rust/lance-io/src/scheduler.rs | 51.16% | 42 Missing ⚠️ |
| rust/lance-tools/src/meta.rs | 0.00% | 2 Missing ⚠️ |


@westonpace westonpace marked this pull request as ready for review January 30, 2026 00:33
@westonpace (Member Author)

Once the io_uring stuff merges I'll probably move away from triggering the lite scheduler with an environment variable and instead let the reader instance decide which scheduler it wants.

@wjones127 (Contributor) left a comment

Seems good, although I'm surprised by the lack of unit tests for the lite scheduler implementation. It might, for example, be helpful to double-check that the ordering for TaskEntry is working as expected.

Also, is there some generic test suite for the scheduler where you can run with use_lite_scheduler set to both true and false?

@westonpace westonpace merged commit 70636f6 into lance-format:main Feb 4, 2026
26 of 27 checks passed
@westonpace (Member Author)

I added a test for ordering and one more to make sure get_range is called at the right time (a nuanced thing that took a long time to debug).

The on/off tests can come when the uring reader is added.

westonpace added a commit that referenced this pull request Mar 31, 2026
This is still a draft while waiting on
#5755 and
#5773

This PR adds a new URI scheme `file+uring`. The scheme uses the same
local file reader as `file` but has two custom `Reader` implementations
that are based on the io_uring API. One of these creates a configurable
number of process-wide ring threads and the reader communicates with
this thread using a queue. The second assumes that the scheduler and
decoder run on the same thread and uses a thread local uring instance.

Both are able to saturate up to 1.5M IOPS when combined with the
scheduler rework. I've tested the thread-local variant up to 2M IOPS.
These numbers assume the data is not in the kernel page cache;
I've seen results as high as 4M IOPS when the data is cached.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
