Simplify update_skip_aggregation_probe method #12332
/// Note: currently spilling is not supported for Partial aggregation
fn update_skip_aggregation_probe(&mut self, input_rows: usize) {
    if let Some(probe) = self.skip_aggregation_probe.as_mut() {
        if !self.spill_state.spills.is_empty() {
Indeed, partial aggregation currently only does early emit, but I guess this check is trying to be defensive: if someone decides to add spilling in the partial stage as well, the early emit logic won't break 🤔 Maybe we can leave an assertion here?
In #7400, the author tried spilling in the partial stage, but was asked to remove it. See #7400 (comment).
I guess this check is trying to be defensive
Yes, that was the idea: with the current implementation of spilling, these features are mutually exclusive, but if both are triggered (which never happens for now; the note should probably be more informative), it shouldn't break query execution.
I think the change is reasonable since the current code is indeed redundant, but I'm not sure about an assertion. I suppose it's better to return a not_implemented / internal error instead of panicking.
And it's probably worth retaining the note on why skipping partial aggregation is incompatible with spilling.
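To make the "return an error instead of panicking" option concrete, here is a minimal, self-contained sketch. All types below (ProbeError, SkipProbe, Stream) are hypothetical simplified stand-ins rather than the real DataFusion definitions, and the Result return type is an assumption made for illustration only:

```rust
// Hypothetical stand-in for an internal / not_implemented error.
#[derive(Debug, PartialEq)]
enum ProbeError {
    Internal(String),
}

// Hypothetical, simplified probe: it only records what it was told.
struct SkipProbe {
    input_rows: usize,
    num_groups: usize,
}

impl SkipProbe {
    fn update_state(&mut self, input_rows: usize, num_groups: usize) {
        self.input_rows += input_rows;
        self.num_groups = num_groups;
    }
}

// Hypothetical, simplified stream state.
struct Stream {
    skip_aggregation_probe: Option<SkipProbe>,
    spills: Vec<u64>, // placeholder for spill files
    num_groups: usize,
}

impl Stream {
    /// Updates the probe, or reports an error if spilling has already
    /// happened, instead of panicking or silently ignoring the batch.
    fn update_skip_aggregation_probe(&mut self, input_rows: usize) -> Result<(), ProbeError> {
        if let Some(probe) = self.skip_aggregation_probe.as_mut() {
            if !self.spills.is_empty() {
                return Err(ProbeError::Internal(
                    "skipping partial aggregation is not supported together with spilling"
                        .to_string(),
                ));
            }
            probe.update_state(input_rows, self.num_groups);
        }
        Ok(())
    }
}
```

With this shape, a violated invariant surfaces as a query error that the caller can propagate, whereas an assertion would abort the task.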
Should we add another assert to ensure and emphasize that update_state is only called in Partial mode?
IMO, when we are sure that some branches are actually unreachable, an assertion is nice because it makes such bugs easier to find through tests.
...
assert!(self.spill_state.spills.is_empty() && self.mode == AggregateMode::Partial);
probe.update_state(input_rows, self.group_values.len());
...
I agree -- adding an assertion would be great. Can you perhaps make a PR to do so?
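For illustration, the assertion variant discussed above could look roughly like the following self-contained sketch. The types are hypothetical simplified stand-ins that reuse the names from the thread, not the real DataFusion definitions; in particular, the constructor creating the probe only in Partial mode is a simplifying assumption taken from the PR description:

```rust
// Hypothetical, simplified stand-ins; not the real DataFusion definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AggregateMode {
    Partial,
    Final,
}

#[derive(Debug, Default)]
struct SkipAggregationProbe {
    input_rows: usize,
    num_groups: usize,
}

impl SkipAggregationProbe {
    fn update_state(&mut self, input_rows: usize, num_groups: usize) {
        self.input_rows += input_rows;
        self.num_groups = num_groups;
    }
}

struct GroupedHashAggregateStream {
    mode: AggregateMode,
    skip_aggregation_probe: Option<SkipAggregationProbe>,
    spills: Vec<usize>, // placeholder for spill files
    num_groups: usize,
}

impl GroupedHashAggregateStream {
    fn new(mode: AggregateMode) -> Self {
        // Assumption: the probe is created only for Partial aggregation.
        let skip_aggregation_probe = match mode {
            AggregateMode::Partial => Some(SkipAggregationProbe::default()),
            AggregateMode::Final => None,
        };
        Self { mode, skip_aggregation_probe, spills: Vec::new(), num_groups: 0 }
    }

    fn update_skip_aggregation_probe(&mut self, input_rows: usize) {
        if let Some(probe) = self.skip_aggregation_probe.as_mut() {
            // Both conditions are expected to always hold; the assertion
            // turns a silent invariant violation into a test failure.
            assert!(self.spills.is_empty() && self.mode == AggregateMode::Partial);
            probe.update_state(input_rows, self.num_groups);
        }
    }
}
```

In this sketch a stream in Final mode never has a probe, so the assertion is never reached there even if spill files exist.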
alamb left a comment
Looks good to me. Let's get this merged 🚀
cc @Rachelint
…aggregation_probe
I took the liberty of merging up from main to resolve a conflict
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
Spilling only happens when mode is not AggregateMode::Partial (datafusion/datafusion/physical-plan/src/aggregates/row_hash.rs, line 890 in 91b1d2b), but skipping aggregation via the probe only happens when mode is AggregateMode::Partial (datafusion/datafusion/physical-plan/src/aggregates/row_hash.rs, line 530 in 91b1d2b).
So there is no spilling when skip_aggregation_probe is Option::Some.
Are these changes tested?
Are there any user-facing changes?