Conversation
clintropolis
left a comment
Overall the changes LGTM. I think we definitely need to call out in the release notes the changes in SQL behavior under Druid's default non-standard null handling (and also consider switching to SQL-compatible null handling as the default), but it still seems worth it.
Hmm, this seems strange and sort of lame in terms of the produced expression, but I guess it's more correct in terms of the types involved for the SQL?
Something should be collapsing these casts. Could you add a comment to the test that we're aware the behavior here is weird? Maybe a future developer will see that and it will inspire them.
why decimal instead of like bigint or... something?
Could `possiblyWrapRootWithOuterLimitFromContext` just return `root.rel` instead of null, to collapse these near-identical planner transform blocks?
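A generic sketch of the refactor being suggested (hypothetical names and simplified types, not the actual Druid planner code): when a "maybe transform" helper returns its input unchanged instead of null, callers no longer need a null-check branch and the two near-identical code paths collapse into one.

```java
import java.util.Optional;

public class WrapSketch
{
  // Before: returns null when no wrapping is needed, forcing callers to branch.
  static String wrapOrNull(String rel, Optional<Integer> outerLimit)
  {
    return outerLimit.isPresent() ? "Sort(limit=" + outerLimit.get() + ", " + rel + ")" : null;
  }

  // After: always returns a usable node, so callers have a single code path.
  static String wrap(String rel, Optional<Integer> outerLimit)
  {
    return outerLimit.isPresent() ? "Sort(limit=" + outerLimit.get() + ", " + rel + ")" : rel;
  }

  public static void main(String[] args)
  {
    System.out.println(wrap("Scan(foo)", Optional.empty()));  // unchanged input
    System.out.println(wrap("Scan(foo)", Optional.of(10)));   // wrapped with a limit-only sort
  }
}
```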
This test was testing the bloom_filter_test expression macro function rather than being a generic bloom filter SQL test, so I think the query needs to change to continue doing that, or this test should be removed.
Should this be documented somewhere, or just left hidden, since I guess it's mostly there to be friendly to the web console?
I think leaving it undocumented makes sense, since it's meant to be internal. End users should add a LIMIT to their queries. If we discover use cases where it makes sense to expose it then we could do it then.
I think it would be better to document internal parameters somewhere rather than depending on human memory. But I think we can do in a follow-up pr.
I think this should be in the docs, it is a super useful parameter IMO even outside the web console
@jon-wei thanks. I'm reviewing this PR.
```diff
  .dataSource(CalciteTests.DATASOURCE1)
  .intervals(querySegmentSpec(Filtration.eternity()))
- .filters(selector("dim2", "0", null))
+ .filters(bound("dim2", "0", "0", false, false, null, StringComparators.NUMERIC))
```
This means that instead of casting 0 to "0" the planner is now casting dim2 to numeric. Performance will be worse but it is technically more correct. (What if dim2 was "0.0"?) So, it sounds good to me.
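To illustrate the point in plain Java (a hypothetical sketch, not Druid code): casting the literal 0 to the string "0" only matches the exact string, while casting the column to numeric also matches numerically-equal values such as "0.0".

```java
public class CastSketch
{
  public static void main(String[] args)
  {
    String dim2 = "0.0";
    // Old plan: literal cast to the string "0", compared as strings.
    boolean stringMatch = dim2.equals("0");                // misses "0.0"
    // New plan: column cast to numeric, compared as numbers.
    boolean numericMatch = Double.parseDouble(dim2) == 0;  // matches "0.0"
    System.out.println(stringMatch + " " + numericMatch);  // false true
  }
}
```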
```diff
  .columns(ImmutableList.of("dim1"))
  .limit(2)
- .order(ScanQuery.Order.DESCENDING)
+ .order(ScanQuery.Order.NONE)
```
Technically correct (the standard allows it) but now this test misses the point of what it was originally trying to validate (that the wrapping technique used by the web console works).
I suggest keeping this test, but adding a comment like the one you have in testSelectProjectionFromSelectSingleColumnDescending. And then adding another test that uses the inner query here, plus the new sqlOuterLimit functionality, to verify that the Druid query generated in that case is a good one.
Added a comment, and an additional query within this test that uses the outer context limit instead. One difference there is that the new query includes __time within the results (otherwise the scan query rejects the query, since it is ordering on __time)
```java
{
  // Regression test for https://github.com/apache/incubator-druid/issues/7768.

  // After upgrading to Calcite 1.21, the results return in the wrong order, the ORDER BY __time DESC
```
Similar comment as above: the behavior is technically correct, and I think we should roll with it. For this test, I'd suggest:
- keep the query the same
- remove this comment
- add a comment that says we're verifying that the inner order by is stripped, as the standard allows
I revised the comment here
```java
      )
    )
  )
  //.filters(expressionFilter("case_searched((\"dim2\" == 'a'),1,isnull(\"dim2\"))"))
```
Please don't include commented-out code.
This has been removed
```java
  ImmutableList.of(new Object[]{0L})
);
/*
The SQL query in this test planned to the following Druid query in Calcite 1.17.
```
What does the test case look like if NULLIF(dim2, 'a') = NULL is replaced with NULLIF(dim2, 'a') IS NULL? The new behavior seems like a valid optimization, since <anything> = NULL is never true. We could update the SQL docs to point out that you should avoid = NULL and use IS NULL no matter what your null handling mode is.
Btw, please don't include commented-out code.
There is a version that uses IS NULL instead in testNullEmptyStringEquality
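A small self-contained sketch of the SQL semantics under discussion (illustrative only, not Druid or Calcite code): NULLIF(a, b) yields NULL when a equals b, and any = NULL comparison yields UNKNOWN (modeled here as a null Boolean), which a WHERE clause treats as false. That is why only IS NULL can match the NULLIF result.

```java
public class ThreeValuedLogicSketch
{
  // NULLIF(a, b): NULL if a equals b, otherwise a.
  static String nullIf(String a, String b)
  {
    return (a != null && a.equals(b)) ? null : a;
  }

  // SQL "=": UNKNOWN (modeled as null) if either side is NULL.
  static Boolean sqlEquals(String a, String b)
  {
    return (a == null || b == null) ? null : a.equals(b);
  }

  public static void main(String[] args)
  {
    // NULLIF(dim2, 'a') = NULL is never TRUE, so the WHERE clause matches no rows.
    Boolean eq = sqlEquals(nullIf("a", "a"), null);
    System.out.println(eq);       // prints "null": UNKNOWN, treated as false by WHERE

    // NULLIF(dim2, 'a') IS NULL is the predicate that actually matches.
    boolean isNull = nullIf("a", "a") == null;
    System.out.println(isNull);   // prints "true"
  }
}
```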
```java
      not(selector("dim2", "a", null))
    )
  )
)
```
The new filter is dim2 = 'a' OR (dim2 IS NULL AND dim2 != 'a'). It seems a bit… weird. Ideally it should be simplified to dim2 = 'a' OR dim2 IS NULL. It's fine for now, but could you add a comment about this?
Added a comment on the unnecessary dim2 != 'a'
```java
}

/**
 * In Calcite 1.17, this test worked, but after upgrading to Calcite 1.21, this query fails with:
```
Please do two things in response to this:
- Keep this test as-is, but add a new, second test that is the same basic query shape but doesn't trip the `dim1` is ambiguous thing. Maybe rename one of them, or group by ordinals. Testing project after sort is still important, so there's value in this.
- If you think this is a Calcite bug, raise a Calcite issue at: https://issues.apache.org/jira/projects/CALCITE/
I added a new test:
```java
@Test
public void testProjectAfterSort3WithoutAmbiguity() throws Exception
{
  // This query is equivalent to the one in testProjectAfterSort3 but renames the second grouping column
  // to avoid the ambiguous name exception. The inner sort is also optimized out in Calcite 1.21.
  testQuery(
      "select copydim1 from (select dim1, dim1 AS copydim1, count(*) cnt from druid.foo group by dim1, dim1 order by cnt)",
      ImmutableList.of(
          GroupByQuery.builder()
                      .setDataSource(CalciteTests.DATASOURCE1)
                      .setInterval(querySegmentSpec(Filtration.eternity()))
                      .setGranularity(Granularities.ALL)
                      .setDimensions(
                          dimensions(
                              new DefaultDimensionSpec("dim1", "d0")
                          )
                      )
                      .setContext(QUERY_CONTEXT_DEFAULT)
                      .build()
      ),
      ImmutableList.of(
          new Object[]{""},
          new Object[]{"1"},
          new Object[]{"10.1"},
          new Object[]{"2"},
          new Object[]{"abc"},
          new Object[]{"def"}
      )
  );
}
```
As with the other project-after-sort tests, the inner ordering is optimized out.
I'll need to do some more investigation to see if this is a bug or not.
```java
.aggregators(aggregators(
    new CountAggregatorFactory("a0")
))
// after upgrading to Calcite 1.21, expressions like sin(pi/6) that only reference
```
```java
public class DruidOperatorTable implements SqlOperatorTable
{
  private static final EmittingLogger log = new EmittingLogger(DruidOperatorTable.class);
```
Removed unused logger
```java
import org.apache.druid.segment.DimensionHandlerUtils;
import org.apache.druid.sql.calcite.rel.DruidConvention;
import org.apache.druid.sql.calcite.rel.DruidRel;
import org.checkerframework.checker.nullness.qual.Nullable;
```
Hmm I think we usually use javax.annotation.Nullable.
Ah, changed to the right Nullable
```java
"SELECT COUNT(*)\n"
+ "FROM druid.foo\n"
+ "WHERE NULLIF(dim2, 'a') = null",
ImmutableList.of(),
```
Missing query plan verification?
No native Druid query is actually generated for this; the filter would never be true, so Calcite just returns a count of 0.
```java
+ "WHERE NULLIF(dim2, 'a') = null",
ImmutableList.of(),
NullHandling.replaceWithDefault() ?
// Matches everything but "abc"
```
Looks like the comment is wrong?
Fixed this area; the ternary was also unnecessary.
* Upgrade Calcite to 1.21
* Checkstyle, test fix
* Exclude calcite yaml deps, update license.yaml
* Add method for exception chain handling
* Checkstyle
* PR comments, Add outer limit context flag
* Revert project settings change
* Update subquery test comment
* Checkstyle fix
* Fix test in sql compat mode
* Fix test
* Fix dependency analysis
* Address PR comments
* Checkstyle
* Adjust testSelectStarFromSelectSingleColumnWithLimitDescending
This PR updates Calcite to 1.21.
Upgrading also fixes #8266
This patch introduces incompatible behavior in two areas:
https://github.com/apache/incubator-druid/pull/8566/files#diff-190f8eaf9dd09e8d7ae0564804286afcR1830
https://github.com/apache/incubator-druid/pull/8566/files#diff-190f8eaf9dd09e8d7ae0564804286afcR1888
The web console relies on the subquery ORDER BY being respected (it can wrap queries made in the query view with an outer query that has a limit), so this patch adds an internal query context flag that wraps a query with a limit-only LogicalSort if specified.
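For context, a sketch of how a client such as the web console might use the flag. This is an illustrative payload for Druid's SQL HTTP endpoint; the parameter name `sqlOuterLimit` comes from the review discussion above, and the query and limit value are made up for the example.

```json
{
  "query": "SELECT dim1, __time FROM druid.foo ORDER BY __time DESC",
  "context": {
    "sqlOuterLimit": 100
  }
}
```

With the flag set, the planner wraps the query in a limit-only LogicalSort, so the inner ORDER BY survives rather than being stripped as the standard allows for subqueries.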
Before including this version upgrade in a release, I think we should expose the SQL compatible null handling mode in our docs (#4349) and update the web console to use the new context flag for applying limits.