Incorrect result of nested groupBy query on Join of subqueries #9866

@jihoonson

Affected Version

0.18.x

Description

As reported in #9792, a nested groupBy query can return an incorrect result when both of these conditions are met:

  • The nested groupBy runs on top of a join of subqueries.
  • The inner and outer groupBys have different filters.

In this case, the join execution engine uses the filter of the outer groupBy query when it processes the inner groupBy query. For example, consider the following query:

WITH abc AS (
  SELECT dim1, m2
  FROM druid.foo 
  WHERE "__time" >= '2001-01-02'
),
def AS (
  SELECT t1.dim1, SUM(t2.m2) AS "metricSum" 
  FROM abc AS t1 INNER JOIN abc AS t2 ON t1.dim1 = t2.dim1
  WHERE t1.dim1='def'
  GROUP BY 1
)
SELECT count(*) FROM def

Druid produces the following query plan for this query:

 groupBy (outer)
    |
 groupBy (inner)
    |
   join
  /    \
scan  scan
 |      |
foo    foo

For this query plan, the broker executes the two scan queries at the leaves, materializes their results in memory, and then executes the join and the groupBys. The join is converted into a joinSegment and executed together with the inner groupBy. Because of this bug, the broker applies the outer groupBy's filter to the inner groupBy. Since the outer groupBy has no filter, the filter t1.dim1 = 'def' on the inner groupBy is ignored.
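
To make the difference in results concrete, here is a minimal sketch that runs the query above in SQLite (not Druid) as a stand-in for the expected SQL semantics. The rows in ROWS are hypothetical and are not Druid's test data, and the run() helper is made up for illustration. Dropping the inner WHERE clause only approximates the effect of the bug described above; it does not reproduce Druid's join engine.

```python
import sqlite3

# Hypothetical rows for druid.foo: (__time, dim1, m2). Not Druid's real test data.
ROWS = [
    ("2000-01-01", "abc", 1.0),  # excluded by the __time filter
    ("2001-01-02", "def", 5.0),
    ("2001-01-02", "xyz", 2.0),
    ("2001-01-03", "abc", 6.0),
]

# The query from this issue; {inner_filter} marks the inner groupBy's filter.
QUERY = """
WITH abc AS (
  SELECT dim1, m2
  FROM druid.foo
  WHERE "__time" >= '2001-01-02'
),
def AS (
  SELECT t1.dim1, SUM(t2.m2) AS "metricSum"
  FROM abc AS t1 INNER JOIN abc AS t2 ON t1.dim1 = t2.dim1
  {inner_filter}
  GROUP BY 1
)
SELECT count(*) FROM def
"""


def run(inner_filter):
    """Run the query in an in-memory SQLite database and return count(*)."""
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database named "druid" so druid.foo resolves.
    conn.execute("ATTACH DATABASE ':memory:' AS druid")
    conn.execute('CREATE TABLE druid.foo ("__time" TEXT, dim1 TEXT, m2 REAL)')
    conn.executemany("INSERT INTO druid.foo VALUES (?, ?, ?)", ROWS)
    (count,) = conn.execute(QUERY.format(inner_filter=inner_filter)).fetchone()
    conn.close()
    return count


if __name__ == "__main__":
    # Expected semantics: the inner filter keeps only the 'def' group.
    print("with inner filter:   ", run("WHERE t1.dim1='def'"))
    # Approximation of the bug: the inner filter is ignored.
    print("inner filter ignored:", run(""))
```

With these hypothetical rows, the filtered query counts a single group, while the run without the inner filter counts one group per distinct dim1 value that passes the time filter. That gap is the kind of discrepancy to look for when comparing Druid's output against the expected result.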
