Summary
PR #5981 (e25f16909) introduced `FullZipReadSource` and `create_page_load_task` as a unified abstraction for scheduling FullZip reads. While the PR itself brought significant performance improvements (especially the full-page scan shortcut and the always-cached rep index), it inadvertently changed the I/O submission timing in two code paths: `schedule_ranges_simple` and the cached branch of `schedule_ranges_rep`. In both cases, `submit_request` was moved inside an `async move { ... }` block, which means the I/O is no longer submitted during the schedule phase; it is deferred until the decode phase polls the future.
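The timing difference can be illustrated with a minimal sketch. This is not Lance's actual code: the `IoQueue` struct and `submit_request` method below are simplified stand-ins, and a plain closure stands in for the `async move { ... }` block, but the scheduling-vs-decode timing is the same.

```rust
use std::collections::VecDeque;

// Simplified stand-in for Lance's IoQueue: just records submitted requests.
struct IoQueue {
    pending: VecDeque<String>,
}

impl IoQueue {
    fn submit_request(&mut self, desc: &str) {
        self.pending.push_back(desc.to_string());
    }
}

fn main() {
    let mut queue = IoQueue { pending: VecDeque::new() };

    // Eager (pre-#5981 schedule_ranges_simple): the request is enqueued
    // during the schedule phase itself.
    queue.submit_request("page-0 bytes 0..4096");
    assert_eq!(queue.pending.len(), 1);

    // Deferred (post-#5981): moving submit_request into a closure (standing
    // in for the `async move` block) means nothing is enqueued yet.
    let deferred = |q: &mut IoQueue| q.submit_request("page-1 bytes 4096..8192");
    assert_eq!(queue.pending.len(), 1); // still only the eager request

    // Only when the decode phase "polls" the task does the I/O get submitted.
    deferred(&mut queue);
    assert_eq!(queue.pending.len(), 2);
}
```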
Affected Code Paths
| Path | Before #5981 | After #5981 | Expected |
| --- | --- | --- | --- |
| `schedule_ranges_simple` (fixed-width, no rep index) | `submit_request` called eagerly during scheduling | `submit_request` inside `create_page_load_task` → deferred to decode | Eager (byte ranges are known at schedule time via simple arithmetic) |
| `schedule_ranges_rep` cached branch (rep index in memory) | All logic inside one `async` block (deferred) | `submit_request` inside `create_page_load_task` → still deferred | Eager (byte ranges computable from the in-memory cached rep index; opportunity for optimization) |
| `schedule_ranges_rep` full-page scan | N/A (new path) | `submit_single` called eagerly ✅ | Correct |
| `schedule_ranges_rep` uncached (no rep index) | Deferred (two-stage I/O dependency) | Deferred ✅ | Correct (data byte ranges depend on the first I/O result) |
Impact
The scheduling architecture is designed as a two-thread pipeline: the scheduler thread issues I/O as fast as possible, and the decode stream consumes loaded pages. As described in `decoder.rs`:

> Note that the scheduler thread does not need to wait for I/O to happen at any point. As soon as it starts it will start scheduling one page of I/O after another until it has scheduled the entire file's worth of I/O.
When `submit_request` is deferred into the future, the I/O request is not enqueued into the `IoQueue` until the decode stream actually polls it. This eliminates the overlap between I/O and scheduling/decoding of other pages, effectively serializing I/O with decode and adding one full network RTT of latency per page for cloud storage (S3/GCS/Azure).
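This behavior follows from Rust's lazy futures: an `async` block does nothing until polled. The sketch below demonstrates this with a hand-rolled no-op waker (so no executor crate is needed); the `SUBMITTED` flag is a hypothetical stand-in for the `submit_request` side effect.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Stand-in for "the request was enqueued into the IoQueue".
static SUBMITTED: AtomicBool = AtomicBool::new(false);

// Minimal no-op waker so we can poll a future without an executor.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    // Stand-in for the page-load task: the "submit_request" side effect
    // lives inside the async block, as after PR #5981.
    let task = async {
        SUBMITTED.store(true, Ordering::SeqCst);
    };

    // Merely creating the future runs nothing: no I/O has been submitted.
    assert!(!SUBMITTED.load(Ordering::SeqCst));

    // Only when the consumer polls (what the decode stream does) does the
    // body execute and the request get "submitted".
    let mut task = pin!(task);
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    assert!(matches!(task.as_mut().poll(&mut cx), Poll::Ready(())));
    assert!(SUBMITTED.load(Ordering::SeqCst));
}
```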
Note: this does not cause unbounded I/O pressure because `IoQueue` already enforces IOPS limits (`io_parallelism`: 64 for cloud, 8 for local) and byte-level backpressure (`io_buffer_size_bytes`). Early submission simply enqueues requests into the priority queue sooner, giving the I/O scheduler better visibility for prioritization.
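Why earlier enqueueing helps prioritization can be seen with a toy priority queue. This is not Lance's `IoQueue` (whose priority scheme is not described here); it is a generic sketch assuming a lower number means higher priority: when both requests are visible at schedule time, the scheduler can serve them in priority order rather than arrival order.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

fn main() {
    // Toy priority queue: Reverse turns the max-heap into a min-heap,
    // so the request with the smallest priority number is served first.
    let mut queue: BinaryHeap<Reverse<(u32, &str)>> = BinaryHeap::new();

    // Eager submission makes both requests visible before any is executed,
    // even though "page-2" arrived first.
    queue.push(Reverse((2, "page-2")));
    queue.push(Reverse((1, "page-1")));

    // The scheduler serves them in priority order, not arrival order.
    assert_eq!(queue.pop().unwrap().0 .1, "page-1");
    assert_eq!(queue.pop().unwrap().0 .1, "page-2");
}
```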