Affected Version
0.18.x
Description
As reported in #9792, a nested groupBy query can return an incorrect result when both of these conditions are met:
- The nested groupBy is on top of a join of subqueries.
- The inner and outer groupBys have different filters.
In this case, the join execution engine uses the filter of the outer groupBy query when it processes the inner groupBy query. For example, consider the following query:
WITH abc AS (
  SELECT dim1, m2
  FROM druid.foo
  WHERE "__time" >= '2001-01-02'
),
def AS (
  SELECT t1.dim1, SUM(t2.m2) AS "metricSum"
  FROM abc AS t1 INNER JOIN abc AS t2 ON t1.dim1 = t2.dim1
  WHERE t1.dim1 = 'def'
  GROUP BY 1
)
SELECT count(*) FROM def
Druid generates the following query plan for this query:
  groupBy (outer)
        |
  groupBy (inner)
        |
       join
      /    \
   scan    scan
    |        |
   foo      foo
For this query plan, the broker executes the two scan queries at the leaves, materializes their results in memory, and then executes the join and the groupBys. The join is converted into a joinSegment and executed together with the inner groupBy. Due to this bug, the broker ignores the filter t1.dim1 = 'def' on the inner groupBy query because the outer groupBy has no filter.
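To make the effect concrete, the following is a minimal sketch that runs an equivalent query in SQLite rather than in Druid. The rows in `ROWS` are hypothetical (the report does not show the contents of `foo`), the helper `count_groups` is illustrative only, and the bug is modelled simply by dropping the inner WHERE clause, which is how the description above says the broker behaves.

```python
import sqlite3

# Hypothetical rows standing in for druid.foo: (__time, dim1, m2).
ROWS = [
    ("2001-01-01", "abc", 1.0),
    ("2001-01-02", "def", 2.0),
    ("2001-01-02", "ghi", 3.0),
    ("2001-01-03", "def", 4.0),
]

# Same shape as the query in the report; GROUP BY t1.dim1 is used in place
# of the ordinal form GROUP BY 1, which refers to the same column.
QUERY = """
WITH abc AS (
  SELECT dim1, m2 FROM foo WHERE "__time" >= '2001-01-02'
),
def AS (
  SELECT t1.dim1, SUM(t2.m2) AS metricSum
  FROM abc AS t1 INNER JOIN abc AS t2 ON t1.dim1 = t2.dim1
  {inner_filter}
  GROUP BY t1.dim1
)
SELECT count(*) FROM def
"""


def count_groups(inner_filter: str) -> int:
    """Run the nested query with the given inner filter clause."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE foo ("__time" TEXT, dim1 TEXT, m2 REAL)')
    conn.executemany("INSERT INTO foo VALUES (?, ?, ?)", ROWS)
    (count,) = conn.execute(QUERY.format(inner_filter=inner_filter)).fetchone()
    conn.close()
    return count


# Correct behaviour: the inner filter keeps only the 'def' group.
expected = count_groups("WHERE t1.dim1 = 'def'")
# Modelled bug: the inner filter is dropped, so every joined group is counted.
buggy = count_groups("")

print(expected, buggy)  # prints "1 2"
```

With this sample data the correct answer is 1, while the query without the inner filter counts both the `def` and `ghi` groups. The actual numbers on a real datasource depend on its contents.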