Skip to content

Conversation

@alamb
Copy link

@alamb alamb commented Nov 8, 2023

This is a proposed way to simplify the logic in https://github.com/apache/arrow-datafusion/pull/8038/files

Note it currently targets apache#8038 but we can also apply it as a PR directly to DataFusion after apache#8038 is merged

I will comment inline on the rationale and changes

cc @msirek

let elapsed_compute = self.baseline_metrics.elapsed_compute().clone();

// common function for signalling end of processing of the input stream
fn set_input_done_and_produce_output(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function only modifies fields on GroupedHashAggregateStream so we can reduce a level of indirection and follow the pattern in other functionality in this module by moving it to the impl GroupedHashAggregateStream { ... }

// make sure the exec_state just set is not overwritten below
break 'reading_input;
}
if self.hit_soft_group_limit() {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose moving this check into its own function to try and keep the core group aggregate logic as simple as possible

}
if self.hit_soft_group_limit() {
timer.done();
extract_ok!(self.set_input_done_and_produce_output());
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think changing the signature of set_input_done_and_produce_output to return just a Result<()> simplifies handling and makes it clearer what can be returned

}

// common function for signalling end of processing of the input stream
fn set_input_done_and_produce_output(&mut self) -> Result<()> {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original signature implies this can return early as well as may return a RecordBatch, neither of which is true

@alamb alamb changed the title Alamb/simplify agg Simplify code for LIMIT in aggregate stream Nov 8, 2023
@github-actions
Copy link

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale label Aug 16, 2025
@github-actions github-actions bot closed this Aug 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant