INSERT/REPLACE can omit clustering when catalog has default#16260
INSERT/REPLACE can omit clustering when catalog has default#16260zachjsh merged 26 commits intoapache:masterfrom
Conversation
…ql-type-inference
…' into use-catalog-clustering-columns
…' into use-catalog-clustering-columns
kgyrtkirk
left a comment
There was a problem hiding this comment.
+1 ; just left some questions :)
| final SqlIdentifier colIdent = new SqlIdentifier( | ||
| Collections.singletonList(keyCol.expr()), | ||
| null, SqlParserPos.ZERO, | ||
| Collections.singletonList(SqlParserPos.ZERO) | ||
| ); |
There was a problem hiding this comment.
I was wondering what will happen in the following case:
- say colunmn
cis aclusterKey - we are selecting from a join which has column
con both sides
but it seems like the column in the select list will take precedence.
one more thing I was wondering about: do we have a check that all keyCols are present in the selected column list?
There was a problem hiding this comment.
About whether there is a check that al keyCols are present in the selected column list, see the following tests:
testInsertTableWithClusteringWithClusteringOnNewColumnFromQuery
testInsertTableWithClusteringWithClusteringOnBadColumn
Do these cover the cases you are talking about?
About the join issue, do you have a concrete query in example, just to clarify?
| final IdentifierNamespace insertNs = (IdentifierNamespace) targetNamespace; | ||
| SqlIdentifier identifier = insertNs.getId(); | ||
| SqlValidatorTable catalogTable = getCatalogReader().getTable(identifier.names); | ||
| if (catalogTable != null) { |
There was a problem hiding this comment.
wouldn't the fall-thru from this condtional will cause that the CLUSTER BY on the ingestNode will not be applied (line399 right now); even if its there - is that okay?
There was a problem hiding this comment.
if the ingestNode already has the clustering columns, they will be used. There are existing tests which test that the clustering columns are used in the plan returned from dml query, when clustering is defined at query time, and the table is / it not in catalog. Let me know if this covers the issue that think could occur.
Description
This PR contains a portion of the changes from the inactive draft PR for integrating the catalog with the Calcite planner #13686 from @paul-rogers, allowing for tables that are defined in the catalog to have any defined clustering columns used in DML INSERT/REPLACE operations without needing to be specified at query time. If the user specified a clustering columns at query time, these columns are preferred to the catalog defined clustering columns.
This PR has: