Support projection after sorting in SQL by jihoonson · Pull Request #5788 · apache/druid

jihoonson · 2018-05-20T01:38:34Z

In SQL, a query can apply an additional projection after sorting, so the columns projected after the sort can differ from those projected before it.

An example query:

SELECT dim1
FROM (
  SELECT dim1, dim2, count(*) cnt 
  FROM druid.foo 
  GROUP BY dim1, dim2 
  ORDER BY cnt
) t
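
To make the shape of that query concrete, here is a minimal in-memory sketch of its three steps: group and count, sort by the count, then project only dim1. It is not Druid or Calcite code. The class name, method name, and sample rows are made up for illustration, and it assumes the outer query keeps the inner ORDER BY ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only; none of these names exist in Druid.
class ProjectAfterSortSketch
{
  // Each input row is {dim1, dim2}.
  public static List<String> dim1OrderedByCount(List<String[]> rows)
  {
    // Inner query: GROUP BY dim1, dim2 with count(*) AS cnt.
    final Map<List<String>, Long> counts = new LinkedHashMap<>();
    for (String[] row : rows) {
      counts.merge(Arrays.asList(row[0], row[1]), 1L, Long::sum);
    }

    // Inner query: ORDER BY cnt (ascending; List.sort is stable).
    final List<Map.Entry<List<String>, Long>> sorted = new ArrayList<>(counts.entrySet());
    sorted.sort((a, b) -> Long.compare(a.getValue(), b.getValue()));

    // Outer query: the projection after sorting keeps only dim1,
    // dropping dim2 and cnt, which existed before the sort.
    return sorted.stream().map(e -> e.getKey().get(0)).collect(Collectors.toList());
  }

  public static void main(String[] args)
  {
    final List<String[]> rows = Arrays.asList(
        new String[]{"a", "x"},
        new String[]{"a", "x"},
        new String[]{"a", "x"},
        new String[]{"b", "y"},
        new String[]{"a", "z"},
        new String[]{"a", "z"}
    );
    System.out.println(dim1OrderedByCount(rows));
  }
}
```

The sort key (cnt) is not part of the final output, which is why the projection has to be applied after the sort rather than before it.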

gianm · 2018-05-30T05:59:56Z

+        final Sort sort = call.rel(1);
+        final Aggregate aggregate = call.rel(2);
+
+        return aggregate != null && sort != null && project != null;


I don't think these can be null. So it should be safe to remove the entire matches method, in which case the rule will fire for any project -> sort -> aggregate -> druidrel.

gianm · 2018-05-30T06:02:54Z

      }
    };

+    public static RelOptRule AGGREGATE_SORT_PROJECT = new DruidOuterQueryRule(


Can you create a test that hits this rule? It should be a nested groupby where the outer query has the sort + project combo.

Added a test.

gianm · 2018-05-30T06:04:50Z

+    }
+  }
+
+  private static RowOrderAndPostAggregations computePostAggregations(


How about calling this method simply create?

Would you tell me your thoughts in more detail? create sounds too broad and less intuitive.

I was thinking that RowOrderAndPostAggregations.create is still pretty obvious as to what it does.

gianm · 2018-05-30T06:15:26Z

+    }
+  }
+
+  private static class RowOrderAndPostAggregations


Better name would be ProjectRowOrderAndPostAggregations. It's longer but it has the word "Project" in there, and this is something that really is meant to represent a projection.

gianm · 2018-05-30T06:27:43Z

+
+  SortProject(
+      RowSignature inputRowSignature,
+      List<Aggregation> postAggregators,


If these are meant to only be post-aggregators, I think it'd be better to pass in a List<PostAggregator>. The idea of an Aggregation is that it can bundle together aggregators and post-aggregators. But here, we never want regular aggregators.

Good point. Changed.

gianm · 2018-05-30T06:28:19Z

+  private static class RowOrderAndPostAggregations
+  {
+    private final List<String> rowOrder;
+    private final List<Aggregation> postAggregations;


Since this is meant to only be PostAggregators, why not have this be a List<PostAggregator>?

gianm · 2018-05-30T06:29:32Z

+    final Set<String> seen = new HashSet<>();
+    inputRowSignature.getRowOrder().forEach(field -> {
+      if (!seen.add(field)) {
+        throw new ISE("Duplicate field name: %s", field);


I don't think this anti-collision verification is necessary. It may even be a bug.

It checks that the input row signature has no duplicate output field names, but it might have them: if the input is select a, a from tbl group by a, a, then the input row order will be something like ["d0","d0"]. And that would be okay.

I believe Calcite can remove duplicate columns automatically. Please check the testProjectAfterSort3().

Hmm, okay, let's leave it in then and if it's too aggressive we can remove it later.
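
For reference, the check under discussion can be sketched in isolation as below. This is a standalone approximation, not the PR's code: it takes a plain List<String> instead of a RowSignature, and it throws IllegalStateException where the PR uses Druid's ISE. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; stands in for the check in the diff above.
class DuplicateFieldCheckSketch
{
  // Returns the first field name seen twice, or null if all names are unique.
  public static String firstDuplicate(List<String> rowOrder)
  {
    final Set<String> seen = new HashSet<>();
    for (String field : rowOrder) {
      // Set.add returns false when the element was already present.
      if (!seen.add(field)) {
        return field;
      }
    }
    return null;
  }

  // Mirrors the fail-fast behavior of the check: reject any duplicate name.
  public static void validate(List<String> rowOrder)
  {
    final String duplicate = firstDuplicate(rowOrder);
    if (duplicate != null) {
      // Assumption: IllegalStateException replaces Druid's ISE here.
      throw new IllegalStateException(String.format("Duplicate field name: %s", duplicate));
    }
  }

  public static void main(String[] args)
  {
    validate(Arrays.asList("d0", "d1", "a0"));
    try {
      // The row order the reviewer worries about.
      validate(Arrays.asList("d0", "d0"));
    }
    catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

If a row order like ["d0","d0"] ever reaches the check, it fails the whole query, which is the "too aggressive" case mentioned above.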

gianm · 2018-05-30T06:31:19Z

+    } else {
+      if (sortProject != null) {
+        for (Aggregation aggregation : sortProject.getPostAggregators()) {
+          retVal.addAll(aggregation.getVirtualColumns());


There will never be any virtual columns added by post-aggregators (virtual columns can only be added by aggregators that read the input data).

This code would be removed naturally if getPostAggregators was changed to return List<PostAggregator> rather than List<Aggregation>, so that's a point in favor of changing those types.
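
A rough sketch of that typing argument, using simplified stand-in types rather than Druid's real classes: once the holder exposes only post-aggregators, there is nothing on the element type from which to collect virtual columns, so the loop above has no reason to exist. PostAggregator and SortProject here are hypothetical miniatures that borrow the names from the diff, and outputNames is an invented helper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; these are not Druid's real types.
class SortProjectTypingSketch
{
  // Stand-in post-aggregator: it has a name but no virtual columns.
  interface PostAggregator
  {
    String getName();
  }

  // Stand-in holder that stores post-aggregators directly.
  static final class SortProject
  {
    private final List<PostAggregator> postAggregators;

    SortProject(List<PostAggregator> postAggregators)
    {
      this.postAggregators = new ArrayList<>(postAggregators);
    }

    List<PostAggregator> getPostAggregators()
    {
      return postAggregators;
    }
  }

  // Callers can read names, but the element type offers no
  // getVirtualColumns(), so no virtual-column handling is needed.
  static List<String> outputNames(SortProject sortProject)
  {
    final List<String> names = new ArrayList<>();
    for (PostAggregator postAggregator : sortProject.getPostAggregators()) {
      names.add(postAggregator.getName());
    }
    return names;
  }

  public static void main(String[] args)
  {
    final PostAggregator p0 = () -> "p0";
    final PostAggregator p1 = () -> "p1";
    System.out.println(outputNames(new SortProject(Arrays.asList(p0, p1))));
  }
}
```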

gianm

LGTM, thanks @jihoonson!

* Add sort project * add more test * address comments

Add sort project

7f2c4d0

jihoonson added the Area - SQL label May 20, 2018

add more test

413109a

gianm reviewed May 30, 2018

address comments

6f47841

gianm approved these changes Jun 11, 2018

gianm merged commit fe4d678 into apache:master Jun 11, 2018

gianm pushed a commit to implydata/druid-public that referenced this pull request Jun 18, 2018

Support projection after sorting in SQL (apache#5788)

66c71bd

* Add sort project * add more test * address comments

jihoonson mentioned this pull request Aug 22, 2018

SQL planning error for nested queries #6211

Closed

jihoonson added this to the 0.12.3 milestone Aug 22, 2018

gianm mentioned this pull request Aug 25, 2018

[Backport] Support projection after sorting in SQL #6228

Merged

gianm pushed a commit to gianm/druid that referenced this pull request Aug 25, 2018

Support projection after sorting in SQL (apache#5788)

0ef2e41

* Add sort project * add more test * address comments

gianm mentioned this pull request Aug 25, 2018

[SQL] Fix missing postAggregations for Timeseries and TopN #5912

Merged

fjy pushed a commit that referenced this pull request Aug 26, 2018

Support projection after sorting in SQL (#5788) (#6228)

bc07320

* Add sort project * add more test * address comments

jihoonson mentioned this pull request Aug 27, 2018

Nested SQL query throws error on latest version 0.11.0 #5353

Closed

gianm mentioned this pull request Aug 28, 2018

SQL: Fix post-aggregator naming logic for sort-project. #6250

Merged

jon-wei mentioned this pull request Sep 1, 2018

[DRAFT] Druid 0.12.3 release notes #6288

Closed
