
Unnecessary cartesian explosion if multi-value column is reused in expression #8947

@vogievetsky

Affected Version

Druid 0.16, 0.15

Description

With this dataset (ingested as the datasource fs):

{"time":"2019-08-14T00:00:00.000Z","srcGroups":["x","y","z"],"dstGroups":["a","b","c","d"]}
{"time":"2019-08-14T00:00:00.000Z","srcGroups":["x","y","z"],"dstGroups":["a","c","d"]}
{"time":"2019-08-14T00:00:00.000Z","srcGroups":["x","y","z"],"dstGroups":["a","g"]}

Running this query:

SELECT
  CASE "dstGroups"
    WHEN 'b' THEN 'b'
    WHEN 'g' THEN 'g'
    ELSE 'Other'
  END AS "dst",
  COUNT(*) AS "Count"
FROM "fs"
GROUP BY 1
ORDER BY "Count" DESC

The query yields an unexpected cartesian explosion in the results:

[screenshot of the query results]

This is caused by the following expression in the underlying query plan:

"case_searched((\"dstGroups\" == 'b'),'b',(\"dstGroups\" == 'g'),'g','Other')"

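Because dstGroups appears twice in this expression, evaluation takes a cartesian product of the column with itself, pairing every value with every other value. It should not do this: both references point to the same column, so for each value of dstGroups in a row they should be bound to that same value.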
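To make the difference concrete, here is a small Python toy model (not Druid code) of the plan expression above, applied to the dstGroups values from the example rows. The helper names case_expr, eval_cartesian and eval_bound are hypothetical. The model only shows how many times, and with which inputs, the CASE is evaluated per row under the two binding strategies; it does not reproduce how Druid then groups and counts the resulting values.

```python
from itertools import product

# Hypothetical model of case_searched(x == 'b', 'b', y == 'g', 'g', 'Other'),
# where x and y stand for the two references to "dstGroups".
def case_expr(x, y):
    if x == "b":
        return "b"
    if y == "g":
        return "g"
    return "Other"

def eval_cartesian(values):
    # Each reference is treated as an independent input, so every (x, y)
    # pair is evaluated: len(values) ** 2 outputs per row.
    return [case_expr(x, y) for x, y in product(values, repeat=2)]

def eval_bound(values):
    # Both references are bound to the same value at each step:
    # len(values) outputs per row.
    return [case_expr(v, v) for v in values]

# dstGroups of the three example rows.
rows = [
    ["a", "b", "c", "d"],
    ["a", "c", "d"],
    ["a", "g"],
]

for row in rows:
    # Number of outputs per row under each strategy.
    print(len(eval_cartesian(row)), len(eval_bound(row)))
```

For the third row, ["a", "g"], the cartesian strategy evaluates four pairs and emits "g" twice, while the bound strategy emits exactly one "Other" and one "g", which matches what a reader of the SQL would expect.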
