Rewrite EARLIEST/LATEST query operators to EARLIEST_BY/LATEST_BY#15095
Rewrite EARLIEST/LATEST query operators to EARLIEST_BY/LATEST_BY#15095abhishekagarwal87 merged 13 commits intoapache:masterfrom
Conversation
|
|
||
| SqlAggFunction aggFunction; | ||
|
|
||
| switch (getName()) { |
There was a problem hiding this comment.
I wonder if this switch is really needed:
AggregatorTypeis an enum; you could probably save it in the constructor - so that no string based switch is necessary- and probably the enum could have a method or something which returns the other
- or you may also accept a second
AggregatorTypein the constructor ( what should it be rewritten to) so there will be no question what the "other" should be....
There was a problem hiding this comment.
Modified to take the latter approach of passing the replacement SqlAggFunction in the constructor itself.
|
|
||
| switch (operands.size()) { | ||
| case 1: | ||
| return aggFunction.createCall(pos, operands.get(0), new SqlIdentifier( |
There was a problem hiding this comment.
note:
you could probably use the list constructor of the call; so there is less duplication
List<SqlNode> newOperands = new ArrayList<SqlNode>();
newOperands.add(operands.get(0));
newOperands.add(new SqlIdentifier("__time", pos));
if (operands.size() == 2)
newOperands.add(operands.get(1));
return aggFunction.createCall(pos, newOperands);
There was a problem hiding this comment.
Thanks - updated the code likewise.
| } | ||
|
|
||
| @Override | ||
| public SqlNode rewriteCall( |
There was a problem hiding this comment.
I wonder if after this patch constructors of classes like LongLastAggregatorFactory should still accept timeColumn as null or not
There was a problem hiding this comment.
Good point! I think it shouldn't. Maybe can take up those modifications in a separate PR.
| return aggFunction.createCall(pos, operands.get(0), new SqlIdentifier( | ||
| "__time", pos)); | ||
| case 2: | ||
| return aggFunction.createCall(pos, operands.get(0), new SqlIdentifier( |
There was a problem hiding this comment.
I wonder if we should validate the new call created as well - since we're hard coding the __time now in the function call. It may or may not be present in the aggregation scope - which is actually a bad thing that the LATEST/EARLIEST invocations have.
There was a problem hiding this comment.
The new call will be validated post the rewrite. But there was still the issue of validator treating rewritten query as originating from the user and flagging unrelatable col __time not found in any table (row x, col y) error. I've now addressed it by introducing a custom SQLIdentifier class.
…ad creating a custom SQLIdentifier class for the Time column
| } | ||
| catch (CalciteContextException e) { | ||
| if (e.getCause() instanceof SqlValidatorException) { | ||
| throw DruidException.forPersona(DruidException.Persona.ADMIN) |
There was a problem hiding this comment.
| throw DruidException.forPersona(DruidException.Persona.ADMIN) | |
| throw DruidException.forPersona(DruidException.Persona.USER) |
There was a problem hiding this comment.
Chose ADMIN Persona here since the exceptions in existing tests such as testTimeColumnAggregationsOnLookups had had the same -- not sure why though. If that's incorrect, shall update the tests as well.
There was a problem hiding this comment.
This is the reason you are looking for: https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/planner/QueryHandler.java#L692
However, in this case, it seems like we do know the exact reason why the query failed - It's due to the absence of a time column. So perhaps change the wording to be more assertive. Also, we should change it to user, because of the reason highlighted in the link above.
There was a problem hiding this comment.
I'm not sure if the reason will always be absence of time column - what if there's a case of a join where __time column is coming from both the tables, in that case I think it could also be an ambiguous column.
There was a problem hiding this comment.
I don't know what will admin do with these planning errors even if they are hints. The hints are meant for the end-user. I don't want to hold this PR so would let it merge as it is.
…che#15095) EARLIEST and LATEST operators implicitly reference the __time column for calculation of the aggregate value. Since the reference isn't explicit, Calcite sometimes fails to update the __time column name when there's column renaming --such as in the case of nested queries -- resulting in column not found errors. This change rewrites these operators to EARLIEST_BY and LATEST_BY during query processing to make the reference explicit to Calcite. (cherry picked from commit c6ca990)
) EARLIEST and LATEST operators implicitly reference the __time column for calculation of the aggregate value. Since the reference isn't explicit, Calcite sometimes fails to update the __time column name when there's column renaming --such as in the case of nested queries -- resulting in column not found errors. This change rewrites these operators to EARLIEST_BY and LATEST_BY during query processing to make the reference explicit to Calcite. (cherry picked from commit c6ca990)
…che#15095) EARLIEST and LATEST operators implicitly reference the __time column for calculation of the aggregate value. Since the reference isn't explicit, Calcite sometimes fails to update the __time column name when there's column renaming --such as in the case of nested queries -- resulting in column not found errors. This change rewrites these operators to EARLIEST_BY and LATEST_BY during query processing to make the reference explicit to Calcite.
…che#15095) EARLIEST and LATEST operators implicitly reference the __time column for calculation of the aggregate value. Since the reference isn't explicit, Calcite sometimes fails to update the __time column name when there's column renaming --such as in the case of nested queries -- resulting in column not found errors. This change rewrites these operators to EARLIEST_BY and LATEST_BY during query processing to make the reference explicit to Calcite.
Description
EARLIESTandLATESToperators implicitly reference the__timecolumn for calculation of the aggregate value. Since the reference isn't explicit, Calcite sometimes fails to update the__timecolumn name when there's column renaming --such as in the case of nested queries -- resulting in column not found errors.This change rewrites these operators to
EARLIEST_BYandLATEST_BYduring query processing to make the reference explicit to Calcite.This PR has: