Conversation

@mertak-synnada
Which issue does this PR close?

Closes #14446.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Feb 5, 2025
03)----Aggregate: groupBy=[[aggregate_test_100.c3]], aggr=[[min(aggregate_test_100.c1)]]
04)------TableScan: aggregate_test_100 projection=[c1, c3]
physical_plan
01)GlobalLimitExec: skip=0, fetch=5
Member
It seems that SKIP is missing.

Contributor

@zhuqi-lucas Feb 5, 2025

It looks like the SKIP setting is not supported by every operator that has with_fetch. For example:

SortPreservingMergeExec also supports fetch but does not support setting skip; it defaults skip to 0.

Contributor

We default to skip=0 for SortPreservingMergeExec:

1 => match self.fetch {
    Some(fetch) => {
        let stream = self.input.execute(0, context)?;
        debug!("Done getting stream for SortPreservingMergeExec::execute with 1 input with {fetch}");
        Ok(Box::pin(LimitStream::new(
            stream,
            0,
            Some(fetch),
            BaselineMetrics::new(&self.metrics, partition),
        )))
    }
impl LimitStream {
    pub fn new(
        input: SendableRecordBatchStream,
        skip: usize,
        fetch: Option<usize>,
        baseline_metrics: BaselineMetrics,
    ) -> Self {
        let schema = input.schema();
        Self {
            skip,
            fetch: fetch.unwrap_or(usize::MAX),
            input: Some(input),
            schema,
            baseline_metrics,
        }
    }
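The constructor above reduces to simple semantics: drop the first `skip` rows, then emit at most `fetch` rows, with a missing fetch meaning unbounded. A minimal standalone sketch of that behavior (not DataFusion's actual stream machinery, just the row arithmetic) shows why skip=0 makes the stream a pure fetch:

```rust
// Sketch of LimitStream's skip/fetch row semantics over a plain
// iterator: skip `skip` items, then take at most `fetch` items.
fn apply_limit<T>(
    input: impl IntoIterator<Item = T>,
    skip: usize,
    fetch: Option<usize>,
) -> Vec<T> {
    input
        .into_iter()
        .skip(skip)
        // `fetch: fetch.unwrap_or(usize::MAX)` in the excerpt above
        .take(fetch.unwrap_or(usize::MAX))
        .collect()
}

fn main() {
    // skip=0 (the SortPreservingMergeExec default) is a pure fetch
    assert_eq!(apply_limit(1..=10, 0, Some(5)), vec![1, 2, 3, 4, 5]);
    // skip=3, fetch=5 matches `LIMIT 5 OFFSET 3`
    assert_eq!(apply_limit(1..=10, 3, Some(5)), vec![4, 5, 6, 7, 8]);
}
```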

Author

Yes, as @zhuqi-lucas mentioned, only the Limit operators support skip, and limit_pushdown adds a Limit operator when skip exists, so this change only affects plans without skip.

Here's the query result with skip:

query TT
EXPLAIN SELECT DISTINCT c3, min(c1) FROM aggregate_test_100 group by c3 limit 5 offset 3;
----
logical_plan
01)Limit: skip=3, fetch=5
02)--Aggregate: groupBy=[[aggregate_test_100.c3, min(aggregate_test_100.c1)]], aggr=[[]]
03)----Aggregate: groupBy=[[aggregate_test_100.c3]], aggr=[[min(aggregate_test_100.c1)]]
04)------TableScan: aggregate_test_100 projection=[c1, c3]
physical_plan
01)GlobalLimitExec: skip=3, fetch=5
02)--CoalescePartitionsExec: fetch=8
03)----AggregateExec: mode=FinalPartitioned, gby=[c3@0 as c3, min(aggregate_test_100.c1)@1 as min(aggregate_test_100.c1)], aggr=[], lim=[8]
04)------CoalesceBatchesExec: target_batch_size=8192
05)--------RepartitionExec: partitioning=Hash([c3@0, min(aggregate_test_100.c1)@1], 4), input_partitions=4
06)----------AggregateExec: mode=Partial, gby=[c3@0 as c3, min(aggregate_test_100.c1)@1 as min(aggregate_test_100.c1)], aggr=[], lim=[8]
07)------------AggregateExec: mode=FinalPartitioned, gby=[c3@0 as c3], aggr=[min(aggregate_test_100.c1)]
08)--------------CoalesceBatchesExec: target_batch_size=8192
09)----------------RepartitionExec: partitioning=Hash([c3@0], 4), input_partitions=4
10)------------------AggregateExec: mode=Partial, gby=[c3@1 as c3], aggr=[min(aggregate_test_100.c1)]
11)--------------------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
12)----------------------CsvExec: file_groups={1 group: [[WORKSPACE_ROOT/testing/data/csv/aggregate_test_100.csv]]}, projection=[c1, c3], has_header=true

cx: &mut Context<'_>,
) -> Poll<Option<Self::Item>> {
    let poll = self.inner.poll_next_unpin(cx);
    let poll = self.limit_reached(poll);
Contributor

Nit:

let Some(fetch) = self.fetch else { return poll };

Maybe we can move the check above and only call limit_reached when fetch is assigned:

if self.fetch.is_some() {
    poll = self.limit_reached(poll);
}
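A self-contained sketch of the guarded call this nit suggests. The names `fetch` and `limit_reached` mirror the excerpt above, but the surrounding type is an invented stand-in, not DataFusion's stream:

```rust
// Stand-in for a stream with an optional fetch limit; `Option<i32>`
// items play the role of poll results for illustration.
struct Limited {
    fetch: Option<usize>,
    seen: usize,
}

impl Limited {
    // Ends the stream once `fetch` items have been emitted.
    fn limit_reached(&mut self, item: Option<i32>) -> Option<i32> {
        let fetch = self.fetch.unwrap_or(usize::MAX);
        match item {
            Some(v) if self.seen < fetch => {
                self.seen += 1;
                Some(v)
            }
            _ => None,
        }
    }

    // The suggestion: consult the limit only when a fetch is
    // assigned, leaving unfetched streams entirely untouched.
    fn next(&mut self, item: Option<i32>) -> Option<i32> {
        if self.fetch.is_some() {
            self.limit_reached(item)
        } else {
            item
        }
    }
}

fn main() {
    let mut limited = Limited { fetch: Some(2), seen: 0 };
    let out: Vec<_> = (1..=5).map(Some).filter_map(|i| limited.next(i)).collect();
    assert_eq!(out, vec![1, 2]); // stops after fetch=2 items

    let mut unlimited = Limited { fetch: None, seen: 0 };
    let out: Vec<_> = (1..=5).map(Some).filter_map(|i| unlimited.next(i)).collect();
    assert_eq!(out, vec![1, 2, 3, 4, 5]); // no fetch: pass-through
}
```

The pass-through branch is the point of the nit: with no fetch assigned, the limit bookkeeping is skipped entirely rather than run with a sentinel value.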

Contributor

@zhuqi-lucas left a comment

LGTM

Contributor

@ozankabak left a comment

LGTM, thank you

@berkaysynnada berkaysynnada merged commit 55730dc into apache:main Feb 6, 2025
25 checks passed
joroKr21 pushed a commit to coralogix/arrow-datafusion that referenced this pull request Jun 19, 2025
* add fetch info to CoalescePartitionsExec

* use Statistics with_fetch API on CoalescePartitionsExec

* check limit_reached only if fetch is assigned
joroKr21 added a commit to coralogix/arrow-datafusion that referenced this pull request Jun 19, 2025
* add fetch info to CoalescePartitionsExec

* use Statistics with_fetch API on CoalescePartitionsExec

* check limit_reached only if fetch is assigned

Co-authored-by: mertak-synnada <mertak67+synaada@gmail.com>
avantgardnerio pushed a commit to coralogix/arrow-datafusion that referenced this pull request Sep 23, 2025
* add fetch info to CoalescePartitionsExec

* use Statistics with_fetch API on CoalescePartitionsExec

* check limit_reached only if fetch is assigned

Co-authored-by: mertak-synnada <mertak67+synaada@gmail.com>
avantgardnerio pushed a commit to coralogix/arrow-datafusion that referenced this pull request Oct 1, 2025
* add fetch info to CoalescePartitionsExec

* use Statistics with_fetch API on CoalescePartitionsExec

* check limit_reached only if fetch is assigned

Co-authored-by: mertak-synnada <mertak67+synaada@gmail.com>
avantgardnerio pushed a commit to coralogix/arrow-datafusion that referenced this pull request Oct 15, 2025
* add fetch info to CoalescePartitionsExec

* use Statistics with_fetch API on CoalescePartitionsExec

* check limit_reached only if fetch is assigned

Co-authored-by: mertak-synnada <mertak67+synaada@gmail.com>
Dandandan added a commit to coralogix/arrow-datafusion that referenced this pull request Nov 17, 2025
* Get working build

* Add pool_size method to MemoryPool (#218) (#230)

* Add pool_size method to MemoryPool

* Fix

* Fmt

Co-authored-by: Daniël Heres <danielheres@gmail.com>

* Respect `IGNORE NULLS` flag in `ARRAY_AGG` (#260) (apache#15544) v48

* Hook for doing distributed `CollectLeft` joins (#269)

* Ignore writer shutdown error (#271)

* ignore writer shutdown error

* cargo check

* Fix bug in `swap_hash_join` (#278)

* Try and fix swap_hash_join

* Only swap projections when join does not have projections

* just backport upstream fix

* remove println

* Support Duration in min/max agg functions (#283) (apache#15310) v47

* Support Duration in min/max agg functions

* Attempt to fix build

* Attempt to fix build - Fix chrono version

* Revert "Attempt to fix build - Fix chrono version"

This reverts commit fd76fe6.

* Revert "Attempt to fix build"

This reverts commit 9114b86.

---------

Co-authored-by: svranesevic <svranesevic@users.noreply.github.com>

* Fix panics in array_union (#287) (apache#15149) v48

* Drop rust-toolchain

* Fix panics in array_union

* Fix the chrono

* Backport `GroupsAccumulator` for Duration min/max agg (#288) (apache#15322) v47

* Fix array_sort for empty record batch (#290) (apache#15149) v48

* fix: rewrite fetch, skip of the Limit node in correct order (apache#14496) v46

* fix: rewrite fetch, skip of the Limit node in correct order

* style: fix clippy

* Support aliases in ConstEvaluator (apache#14734) (#281) v46

* Support aliases in ConstEvaluator (apache#14734)

Not sure why they are not supported. It seems that if we're not careful,
some transformations can introduce aliases nested inside other expressions.

* Format Cargo.toml

* Preserve the name of grouping sets in SimplifyExpressions (#282) (apache#14888) v46

Whenever we use `recompute_schema` or `with_exprs_and_inputs`,
this ensures that we obtain the same schema.

* Support Duration in min/max agg functions (#284) (apache#15310) v47

Co-authored-by: svranesevic <svranesevic@users.noreply.github.com>

* fix case_column_or_null with nullable when conditions (apache#13886) v45

* fix case_column_or_null with nullable when conditions

* improve sqllogictests for case_column_or_null

---------

Co-authored-by: zhangli20 <zhangli20@kuaishou.com>

* Fix casewhen (apache#14156) v45

* Cherry-pick topk limit pushdown fix (apache#14192) v45

* fix: FULL OUTER JOIN and LIMIT produces wrong results (apache#14338) v45

* fix: FULL OUTER JOIN and LIMIT produces wrong results

* Fix minor slt testing

* fix test

(cherry picked from commit ecc5694)

* Cherry-pick global limit fix (apache#14245) v45

* fix: Limits are not applied correctly (apache#14418) v46

* fix: Limits are not applied correctly

* Add easy fix

* Add fix

* Add slt testing

* Address comments

* Disable grouping set in CSE

* Fix spm + limit (apache#14569) v46

* prost 0.13 / fix parquet dep

* Delete unreliable checks

* Segfault in ByteGroupValueBuilder (#294) (apache#15968) v50

* test to demonstrate segfault in ByteGroupValueBuilder

* check for offset overflow

* clippy

(cherry picked from commit 5bdaeaf)

* Update arrow dependency to include rowid (#295)

* Update arrow version

* Feat: Add fetch to CoalescePartitionsExec (apache#14499) (#298) v46

* add fetch info to CoalescePartitionsExec

* use Statistics with_fetch API on CoalescePartitionsExec

* check limit_reached only if fetch is assigned

Co-authored-by: mertak-synnada <mertak67+synaada@gmail.com>

* Fix `CoalescePartitionsExec` proto serialization (apache#15824) (#299) v48

* add fetch to CoalescePartitionsExecNode

* gen proto code

* Add test

* fix

* fix build

* Fix test build

* remove comments

Co-authored-by: 张林伟 <lewiszlw520@gmail.com>

* Add JoinContext with JoinLeftData to TaskContext in HashJoinExec (#300)

* Add JoinContext with JoinLeftData to TaskContext in HashJoinExec

* Expose random state as const

* re-export ahash::RandomState

* JoinContext default impl

* Add debug log when setting join left data

* Update arrow version for not preserving dict_id (#303)

* Use partial aggregation schema for spilling to avoid column mismatch in GroupedHashAggregateStream (apache#13995) (#302) v45

* Refactor spill handling in GroupedHashAggregateStream to use partial aggregate schema

* Implement aggregate functions with spill handling in tests

* Add tests for aggregate functions with and without spill handling

* Move test related imports into mod test

* Rename spill pool test functions for clarity and consistency

* Refactor aggregate function imports to use fully qualified paths

* Remove outdated comments regarding input batch schema for spilling in GroupedHashAggregateStream

* Update aggregate test to use AVG instead of MAX

* assert spill count

* Refactor partial aggregate schema creation to use create_schema function

* Refactor partial aggregation schema creation and remove redundant function

* Remove unused import of Schema from arrow::datatypes in row_hash.rs

* move spill pool testing for aggregate functions to physical-plan/src/aggregates

* Use Arc::clone for schema references in aggregate functions

(cherry picked from commit 81b50c4)

Co-authored-by: kosiew <kosiew@gmail.com>

* Update tag

* Push limits past windows (#337) (apache#17347) v50

* Restore old method for DQE

* feat(optimizer): Enable filter pushdown on window functions (apache#14026) v45

* Avoid Aliased Window Expr Enter Unreachable Code (apache#14109) v45

(cherry picked from commit fda500a)

* Use `Expr::qualified_name()` and `Column::new()` to extract partition keys from window and aggregate operators (#355) (apache#17757) v51

* Update PR template to be relevant to our fork

* Make limit pushdown work for SortPreservingMergeExec (apache#17893) (#361)

* re-publicise functions DQE relies on

* Handle columns in with_new_exprs with a Join (apache#15055) (#384)

apache#15055

* handle columns in with_new_exprs with Join

* test doesn't return result

* take join from result

* clippy

* make test fallible

* accept any pair of expression for new_on in with_new_exprs for Join

* use with_capacity

Co-authored-by: delamarch3 <68732277+delamarch3@users.noreply.github.com>

---------

Co-authored-by: Georgi Krastev <georgi.krastev@coralogix.com>
Co-authored-by: Daniël Heres <danielheres@gmail.com>
Co-authored-by: Dan Harris <1327726+thinkharderdev@users.noreply.github.com>
Co-authored-by: Faiaz Sanaulla <105630300+fsdvh@users.noreply.github.com>
Co-authored-by: Sava Vranešević <20240220+svranesevic@users.noreply.github.com>
Co-authored-by: svranesevic <svranesevic@users.noreply.github.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>
Co-authored-by: Zhang Li <richselian@gmail.com>
Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
Co-authored-by: Aleksey Kirilishin <54231417+avkirilishin@users.noreply.github.com>
Co-authored-by: xudong.w <wxd963996380@gmail.com>
Co-authored-by: Qi Zhu <821684824@qq.com>
Co-authored-by: Martins Purins <martins.purins@coralogix.com>
Co-authored-by: mertak-synnada <mertak67+synaada@gmail.com>
Co-authored-by: 张林伟 <lewiszlw520@gmail.com>
Co-authored-by: kosiew <kosiew@gmail.com>
Co-authored-by: nuno-faria <nunofpfaria@gmail.com>
Co-authored-by: Berkay Şahin <124376117+berkaysynnada@users.noreply.github.com>
Co-authored-by: Mason Hall <mason.hall@coralogix.com>
Co-authored-by: delamarch3 <68732277+delamarch3@users.noreply.github.com>
avantgardnerio pushed a commit to coralogix/arrow-datafusion that referenced this pull request Dec 18, 2025
* add fetch info to CoalescePartitionsExec

* use Statistics with_fetch API on CoalescePartitionsExec

* check limit_reached only if fetch is assigned

(cherry picked from commit 55730dc)
Labels

core Core DataFusion crate optimizer Optimizer rules physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)

Development

Successfully merging this pull request may close these issues.

Add CoalescePartitionsExec fetch (limit) support

5 participants