INSERT/REPLACE can omit clustering when catalog has default by zachjsh · Pull Request #16260 · apache/druid

zachjsh · 2024-04-10T20:12:35Z

Description

This PR contains a portion of the changes from the inactive draft PR #13686 by @paul-rogers, which integrates the catalog with the Calcite planner. It allows tables defined in the catalog to have their catalog-defined clustering columns used in DML INSERT/REPLACE operations without specifying them at query time. If the user specifies clustering columns at query time, those columns take precedence over the catalog-defined clustering columns.
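The precedence rule described above can be sketched in isolation. This is a minimal illustration, not the code in this PR: the class `ClusteringPrecedence` and its `resolve` method are hypothetical names, and plain column-name strings stand in for the planner's actual clustering representation.

```java
import java.util.List;

// Hypothetical helper, not part of Druid: models only the precedence rule
// between query-time clustering and catalog-defined clustering.
final class ClusteringPrecedence
{
  private ClusteringPrecedence()
  {
  }

  /**
   * @param queryKeys   clustering columns given in the query; null or empty if omitted
   * @param catalogKeys clustering columns defined for the table in the catalog; null or empty if none
   * @return the clustering columns to use, or an empty list if neither source defines any
   */
  static List<String> resolve(List<String> queryKeys, List<String> catalogKeys)
  {
    // Clustering specified at query time is preferred.
    if (queryKeys != null && !queryKeys.isEmpty()) {
      return queryKeys;
    }
    // Otherwise fall back to the catalog-defined default, if there is one.
    if (catalogKeys != null && !catalogKeys.isEmpty()) {
      return catalogKeys;
    }
    return List.of();
  }
}
```

For example, `ClusteringPrecedence.resolve(null, List.of("dim1"))` falls back to the catalog columns, while passing a non-empty first argument returns that argument unchanged.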


kgyrtkirk

+1; just left some questions :)

kgyrtkirk · 2024-04-24T07:55:18Z

+      final SqlIdentifier colIdent = new SqlIdentifier(
+          Collections.singletonList(keyCol.expr()),
+          null, SqlParserPos.ZERO,
+          Collections.singletonList(SqlParserPos.ZERO)
+      );


I was wondering what will happen in the following case:

say column c is a cluster key

we are selecting from a join which has column c on both sides

but it seems like the column in the select list will take precedence.

One more thing I was wondering about: do we have a check that all keyCols are present in the selected column list?

About whether there is a check that all keyCols are present in the selected column list, see the following tests:

testInsertTableWithClusteringWithClusteringOnNewColumnFromQuery
testInsertTableWithClusteringWithClusteringOnBadColumn

Do these cover the cases you are talking about?

About the join issue, do you have a concrete example query, just to clarify?
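To make the second question concrete, the kind of check being asked about could look like the sketch below. It is only an illustration of the idea: `ClusterKeyCheck` and `missingKeys` are hypothetical names, column names are compared as plain case-sensitive strings, and nothing here is claimed to match how the validator or the tests named above actually behave.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Druid: reports which clustering key columns
// do not appear among the output column names of the SELECT.
final class ClusterKeyCheck
{
  private ClusterKeyCheck()
  {
  }

  /**
   * @param keyCols      clustering key column names
   * @param selectedCols output column names of the query's select list
   * @return key columns absent from the select list, in their original order
   */
  static List<String> missingKeys(List<String> keyCols, List<String> selectedCols)
  {
    final Set<String> selected = new HashSet<>(selectedCols);
    final List<String> missing = new ArrayList<>();
    for (String key : keyCols) {
      if (!selected.contains(key)) {
        missing.add(key);
      }
    }
    return missing;
  }
}
```

Because the comparison is against the select list's output names, a column `c` that exists on both sides of a join only matters once one of them is projected as `c`, which is consistent with the observation that the select-list column takes precedence.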

kgyrtkirk · 2024-04-24T08:12:40Z

+      final IdentifierNamespace insertNs = (IdentifierNamespace) targetNamespace;
+      SqlIdentifier identifier = insertNs.getId();
+      SqlValidatorTable catalogTable = getCatalogReader().getTable(identifier.names);
+      if (catalogTable != null) {


Wouldn't the fall-through from this conditional mean that the CLUSTER BY on the ingestNode is not applied (line 399 right now), even if it's there? Is that okay?

If the ingestNode already has the clustering columns, they will be used. There are existing tests which verify that the clustering columns are used in the plan returned from the DML query when clustering is defined at query time, whether or not the table is in the catalog. Let me know if this covers the issue that you think could occur.


zachjsh added 10 commits March 27, 2024 15:48

* fix

fafcc76

* fix

fe2c407

Merge remote-tracking branch 'apache/master' into fix-complex-types-sql-type-inference

80151fc

* address review comments

357e6a7

* fix

fd6cb24

* simplify tests

3012773

* fix complex type nullability issue

853ea76

Merge remote-tracking branch 'apache/master' into validate-catalog-complex-columns

7b20b83

Merge remote-tracking branch 'apache/master' into validate-catalog-complex-columns

9890d91

* implement and add tests

758a414

zachjsh requested a review from kgyrtkirk April 10, 2024 20:12

github-actions Bot added the Area - Querying label Apr 10, 2024

zachjsh requested review from abhishekrb19, clintropolis and jon-wei April 10, 2024 20:12

github-advanced-security AI found potential problems Apr 10, 2024


Comment thread sql/src/test/java/org/apache/druid/sql/calcite/CalciteCatalogIngestionDmlTest.java Fixed

Comment thread sql/src/test/java/org/apache/druid/sql/calcite/CalciteCatalogIngestionDmlTest.java Fixed

Merge remote-tracking branch 'apache/master' into use-catalog-clustering-columns

0401766

kgyrtkirk reviewed Apr 12, 2024


Comment thread sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java Outdated

kgyrtkirk reviewed Apr 12, 2024


Comment thread sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java

zachjsh added 7 commits April 16, 2024 11:22

Merge remote-tracking branch 'apache/master' into validate-catalog-complex-columns

b042bb6

* address review comments

fdf2140

* address test review comments

736a7c8

Merge remote-tracking branch 'origin/validate-catalog-complex-columns' into use-catalog-clustering-columns

000e015

* fix checkstyle

7ad8289

Merge remote-tracking branch 'origin/validate-catalog-complex-columns' into use-catalog-clustering-columns

c4bb77d

* fix dependencies

6181eef

github-actions Bot added the Area - Dependencies label Apr 16, 2024

zachjsh added 3 commits April 17, 2024 14:36

* all tests passing

87b4dd2

* cleanup

f9f6b7b

Merge remote-tracking branch 'apache/master' into use-catalog-clustering-columns

009b684

* remove unneeded code

7cc749a

zachjsh requested a review from kgyrtkirk April 17, 2024 19:10

* remove unused dependency

03838bf

kgyrtkirk approved these changes Apr 24, 2024


zachjsh added 3 commits April 25, 2024 13:06

Merge remote-tracking branch 'apache/master' into use-catalog-clustering-columns

3b9d78d

* fix checkstyle

659cac0

Merge remote-tracking branch 'apache/master' into use-catalog-clustering-columns

a0880bc

zachjsh merged commit 365cd7e into apache:master Apr 26, 2024

zachjsh deleted the use-catalog-clustering-columns branch April 26, 2024 14:19

kfaraz added this to the 31.0.0 milestone Oct 4, 2024

zachjsh merged 26 commits into apache:master from zachjsh:use-catalog-clustering-columns


4 participants
