Support projection after sorting in SQL #5788
final Sort sort = call.rel(1);
final Aggregate aggregate = call.rel(2);

return aggregate != null && sort != null && project != null;
I don't think these can be null. So it should be safe to remove the entire matches method, in which case the rule will fire for any project -> sort -> aggregate -> druidrel.
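A minimal sketch of the point above, using hypothetical stand-in types (`RuleCall`, `Rule`) rather than Calcite's real classes, and assuming the base rule's default `matches` accepts every call: once the operands have been bound, a `matches` override that only null-checks them gives the same answer as having no override at all.

```java
final class MatchesSketch
{
  // Hypothetical stand-in for a planner rule call; rel(i) returns the i-th bound operand.
  static final class RuleCall
  {
    private final Object[] rels;

    RuleCall(Object... rels)
    {
      this.rels = rels;
    }

    Object rel(int ordinal)
    {
      return rels[ordinal];
    }
  }

  // Hypothetical base rule: by assumption, the default accepts every call.
  abstract static class Rule
  {
    boolean matches(RuleCall call)
    {
      return true;
    }
  }

  // Shape of the override under review: it only null-checks the bound operands.
  static final class WithNullChecks extends Rule
  {
    @Override
    boolean matches(RuleCall call)
    {
      return call.rel(0) != null && call.rel(1) != null && call.rel(2) != null;
    }
  }

  // Shape after removing the override: the inherited default is used.
  static final class WithoutOverride extends Rule
  {
  }

  public static void main(String[] args)
  {
    // Operands are non-null once the pattern project -> sort -> aggregate has matched.
    final RuleCall call = new RuleCall("project", "sort", "aggregate");
    System.out.println(new WithNullChecks().matches(call) == new WithoutOverride().matches(call));
  }
}
```

Under these assumptions both variants accept the call, which is why dropping the override does not change when the rule fires.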
}
};

public static RelOptRule AGGREGATE_SORT_PROJECT = new DruidOuterQueryRule(
Can you create a test that hits this rule? It should be a nested groupby where the outer query has the sort + project combo.
}
}

private static RowOrderAndPostAggregations computePostAggregations(
How about calling this method simply create?
Would you tell me your thoughts in more detail? create sounds too broad and less intuitive.
I was thinking that RowOrderAndPostAggregations.create is still pretty obvious as to what it does.
}
}

private static class RowOrderAndPostAggregations
A better name would be ProjectRowOrderAndPostAggregations. It's longer, but it has the word "Project" in it, and this class really is meant to represent a projection.

SortProject(
    RowSignature inputRowSignature,
    List<Aggregation> postAggregators,
If these are meant to only be post-aggregators, I think it'd be better to pass in a List<PostAggregator>. The idea of an Aggregation is that it can bundle together aggregators and post-aggregators. But here, we never want regular aggregators.
Good point. Changed.
private static class RowOrderAndPostAggregations
{
  private final List<String> rowOrder;
  private final List<Aggregation> postAggregations;
Since this is meant to only be PostAggregators, why not have this be a List<PostAggregator>?
final Set<String> seen = new HashSet<>();
inputRowSignature.getRowOrder().forEach(field -> {
  if (!seen.add(field)) {
    throw new ISE("Duplicate field name: %s", field);
I don't think this anti-collision verification is necessary. It may even be a bug.
It checks that the input row signature has no duplicate output field names, but it might have them: if the input is select a, a from tbl group by a, a, then the input row order will be something like ["d0","d0"], and that would be okay.
I believe Calcite can remove duplicate columns automatically. Please check testProjectAfterSort3().
Hmm, okay, let's leave it in then and if it's too aggressive we can remove it later.
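For reference, a self-contained sketch of the check discussed in this thread. It uses `IllegalStateException` as a stand-in for Druid's `ISE`, and the method name `checkNoDuplicates` is hypothetical. It shows that a row order such as ["d0","d0"] is rejected, which is the case that would make the check too aggressive if Calcite ever passed duplicates through.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DuplicateFieldCheckSketch
{
  // Mirrors the check under review: Set.add returns false when the element is already present.
  static void checkNoDuplicates(List<String> rowOrder)
  {
    final Set<String> seen = new HashSet<>();
    for (String field : rowOrder) {
      if (!seen.add(field)) {
        // The PR throws Druid's ISE here; IllegalStateException is a stand-in.
        throw new IllegalStateException(String.format("Duplicate field name: %s", field));
      }
    }
  }

  public static void main(String[] args)
  {
    // Distinct names pass silently.
    checkNoDuplicates(Arrays.asList("d0", "a0"));

    // A repeated name is rejected.
    try {
      checkNoDuplicates(Arrays.asList("d0", "d0"));
    }
    catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // prints "Duplicate field name: d0"
    }
  }
}
```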
} else {
  if (sortProject != null) {
    for (Aggregation aggregation : sortProject.getPostAggregators()) {
      retVal.addAll(aggregation.getVirtualColumns());
There will never be any virtual columns added by post-aggregators (virtual columns can only be added by aggregators that read the input data).
This code would be removed naturally if getPostAggregators was changed to return List<PostAggregator> rather than List<Aggregation>, so that's a point in favor of changing those types.
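A rough sketch of the suggested typing, with hypothetical stand-in classes (these are not Druid's real `Aggregation`, `PostAggregator`, or `SortProject`, and fields are simplified to strings). Once the sort-project holds a `List<PostAggregator>`, there is no virtual-column accessor to call on it, so the only loop that can collect virtual columns is the one over real aggregations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class PostAggregatorTypingSketch
{
  // Hypothetical stand-in: a post-aggregator is computed from aggregated values only.
  static final class PostAggregator
  {
    final String name;

    PostAggregator(String name)
    {
      this.name = name;
    }
  }

  // Hypothetical stand-in: an Aggregation bundles virtual columns, aggregators,
  // and an optional post-aggregator.
  static final class Aggregation
  {
    final List<String> virtualColumns;
    final List<String> aggregators;
    final PostAggregator postAggregator; // null when absent

    Aggregation(List<String> virtualColumns, List<String> aggregators, PostAggregator postAggregator)
    {
      this.virtualColumns = virtualColumns;
      this.aggregators = aggregators;
      this.postAggregator = postAggregator;
    }
  }

  // With the narrower type, the sort-project cannot carry aggregators or virtual columns.
  static final class SortProject
  {
    final List<PostAggregator> postAggregators;

    SortProject(List<PostAggregator> postAggregators)
    {
      this.postAggregators = postAggregators;
    }
  }

  // Virtual columns come only from the aggregations; the sort-project contributes none.
  static List<String> collectVirtualColumns(List<Aggregation> aggregations, SortProject sortProject)
  {
    final List<String> retVal = new ArrayList<>();
    for (Aggregation aggregation : aggregations) {
      retVal.addAll(aggregation.virtualColumns);
    }
    return retVal;
  }

  public static void main(String[] args)
  {
    final Aggregation sum = new Aggregation(
        Collections.singletonList("v0"),
        Collections.singletonList("a0"),
        null
    );
    final SortProject sortProject = new SortProject(
        Arrays.asList(new PostAggregator("p0"), new PostAggregator("p1"))
    );
    System.out.println(collectVirtualColumns(Collections.singletonList(sum), sortProject));
  }
}
```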
* Add sort project * add more test * address comments
In SQL, an additional projection can be applied after sorting, which means the projections after sorting can differ from the projections before sorting.
An example SQL is