-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
Is your feature request related to a problem or challenge?
We rely on MemoryExec for many of our tests, but MemoryExec doesn't support pushing limits down (aka fetch) This means that some bugs such as the following are harder to observe due to the fact in memory tables don't have pushdown testing:
Describe the solution you'd like
I would like MemoryExec to support "fetch" pushdown too so it mirrors the other sources and adds additional test coverage
Describe alternatives you've considered
One way would be:
- Add a
MemoryExec::fetchfield similar todatafusion/datafusion/physical-plan/src/limit.rs
Lines 213 to 214 in f3b1141
/// Maximum number of rows to return fetch: usize, - Implement the limit logic like this:
datafusion/datafusion/physical-plan/src/limit.rs
Lines 390 to 397 in f3b1141
if batch.num_rows() <= self.skip { self.skip -= batch.num_rows(); RecordBatch::new_empty(input.schema()) } else { let new_batch = batch.slice(self.skip, batch.num_rows() - self.skip); self.skip = 0; new_batch } - Add the limit to the explain plan display:
datafusion/datafusion/physical-plan/src/memory.rs
Lines 71 to 78 in f775791
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { f.debug_struct("MemoryExec") .field("partitions", &"[...]") .field("schema", &self.schema) .field("projection", &self.projection) .field("sort_information", &self.sort_information) .finish() } - Add tests here
use datafusion_common::stats::{ColumnStatistics, Precision};
Additional context
I think this is a relatively self contained project for a newcomer who has done rust before but wants to get experience with DataFusion and the engine part more
The only potential challenge is if this exposes more bugs such as #14335 and will likely require updating a bunch of tests.
It might be good to do this in a few PRs:
- Add the limit to the MemoryExec and display in one PR (but don't actually limit the output)
- Implement the actual output limits