Count distinct returned incorrect results without useApproximateCountDistinct #14748
clintropolis merged 101 commits into apache:master
| for (int i = 0; i < aggSpec.size(); i++) { | ||
| values[i] = aggSpec.get(i).factorize(new AllNullColumnSelectorFactory()).get(); | ||
| } | ||
| return Collections.singleton(ResultRow.of(values)).iterator(); |
There's a Collections.singletonIterator that you can use instead. It's a nit, but will save on an object allocation.
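A minimal sketch of the difference the nit points at. As far as I can tell, `java.util.Collections.singletonIterator` is not a public JDK method, so the suggestion most likely refers to something like Guava's `Iterators.singletonIterator`. To stay dependency-free, the `singletonIterator` helper below is a hypothetical stand-in written for this example, and `Object[]` stands in for `ResultRow`.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;

class SingletonIteratorSketch {
    // Hypothetical helper (not a JDK or Druid API): iterates over exactly
    // one value without first allocating a wrapping Set.
    static <T> Iterator<T> singletonIterator(final T value) {
        return new Iterator<T>() {
            private boolean consumed = false;

            @Override
            public boolean hasNext() {
                return !consumed;
            }

            @Override
            public T next() {
                if (consumed) {
                    throw new NoSuchElementException();
                }
                consumed = true;
                return value;
            }
        };
    }

    public static void main(String[] args) {
        Object[] row = new Object[]{0L};

        // Form used in the diff: allocates a singleton Set, then its iterator.
        Iterator<Object[]> viaSet = Collections.singleton(row).iterator();

        // Suggested form: allocates only the iterator.
        Iterator<Object[]> direct = singletonIterator(row);

        // Both yield the same single element and are then exhausted.
        System.out.println(viaSet.next() == direct.next());
        System.out.println(direct.hasNext());
    }
}
```

Both forms behave the same from the caller's side; the only difference is the intermediate collection object.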
| Sequence<ResultRow> process; | ||
| if (isNestedQueryPushDown(query)) { | ||
| return mergeResultsWithNestedQueryPushDown(query, resource, runner, context); | ||
| process = mergeResultsWithNestedQueryPushDown(query, resource, runner, context); | ||
| } else { | ||
| process = mergeGroupByResultsWithoutPushDown(query, resource, runner, context); | ||
| } | ||
| return mergeGroupByResultsWithoutPushDown(query, resource, runner, context); | ||
| return GroupByQueryRunnerFactory.wrapSummaryRowIfNeeded(query, process); |
I'm surprised that this was required. Which test caused you to need this change? I ask because the only way you should be able to get a completely empty sequence here is if the "leaf nodes" are producing completely empty sequences. The change in the other place should ensure that no leaf node ever produces a completely empty sequence, meaning that this change shouldn't be necessary...
Thank you for taking a look!
Unfortunately it's needed; I've linked the test(s) checking this.
The leaf nodes are not necessarily aggregating (in the case of distinct), so an empty sequence may be produced. The merger is supposed to aggregate them, which is why this is needed.
For nested queries the merge runner becomes this lambda (note: I don't know why I didn't place this call there originally; I've just moved it).
Example tests:
- testCountDistinctNonApproximateEmptySet is a SQL-level one
- testSummaryrowForEmptySubqueryInput is a runner test
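The behaviour under discussion can be sketched roughly as follows. This is a simplified model, not Druid's actual implementation: a result sequence is represented as a plain `Iterator<Object[]>`, each aggregator is reduced to a `Supplier` of its value over empty input, and the `hasDimensions` flag is an assumption about when a summary row applies. The method name mirrors `wrapSummaryRowIfNeeded` from the diff, but the signature here is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

class SummaryRowSketch {
    // Hypothetical stand-in for the wrapping step: if the merged results are
    // empty and the query has no grouping dimensions (assumed condition),
    // emit one row holding each aggregator's value over empty input.
    static Iterator<Object[]> wrapSummaryRowIfNeeded(
            boolean hasDimensions,
            List<Supplier<Object>> emptyAggregateValues,
            Iterator<Object[]> merged) {
        if (hasDimensions || merged.hasNext()) {
            // Grouped queries, or queries that produced rows, pass through untouched.
            return merged;
        }
        Object[] values = new Object[emptyAggregateValues.size()];
        for (int i = 0; i < values.length; i++) {
            values[i] = emptyAggregateValues.get(i).get();
        }
        return Collections.singletonList(values).iterator();
    }

    public static void main(String[] args) {
        List<Supplier<Object>> aggs = new ArrayList<>();
        aggs.add(() -> 0L); // e.g. a count over no rows

        // Non-aggregating leaves produced nothing, so the merged sequence is empty.
        Iterator<Object[]> out =
                wrapSummaryRowIfNeeded(false, aggs, Collections.emptyIterator());
        System.out.println(out.next()[0]);
    }
}
```

Placing the check at the merge step rather than only at the leaves covers the case described above, where leaf nodes do not aggregate and can therefore legitimately return nothing.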
The last test results have uncovered that [...]. To avoid that issue, I've moved the insertion of the optional summary row to right before postprocessing is applied.
With useApproximateCountDistinct=false, queries like:
may have returned incorrect results.
This PR has: