Move INSERT & REPLACE validation to the Calcite validator #15908
zachjsh merged 14 commits into apache:master from
Conversation
…ct source node. Implemented the validateInsert method in DruidSqlValidator to validate the respective node, and moved much of the validation previously done in the ingest handlers into this overridden method. In the next commit I will try to pull some of this validation back out, to make this code change smaller and easier to review.
kgyrtkirk
left a comment
left some comments; I'm not sure if I understand every bit of it...
is this code covered by tests?
there are quite a few new incorrect-state checks; it would be nice to cover at least some of them with directed tests
        (SqlNodeList) operands[5],
        // Must match DruidSqlReplace.getOperandList()
        operands[6],
        null // fix this
what needs to be fixed? I would recommend either using FIXME and addressing it before merging the patch, or removing this comment and making sure the bad case can't happen by providing some reasonable exception
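If the second option is taken, a minimal sketch (the operand name and message here are assumptions, not the patch's code):

// Illustrative only: fail fast with a clear message instead of passing
// a null literal that downstream code may not expect.
throw InvalidSqlInput.exception(
    "Cannot copy REPLACE node: operand [%s] is not available", "exportFileFormat");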
I think that in order for this to work properly, the exportFileFormat needs to be changed into an SqlNode and added as an operand. Without this, I don't think that parameterized queries using export capabilities will work. cc @adarshsanjeev
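A rough sketch of that change, with assumed field names and order; Calcite's SqlShuttle-based rewrites (including parameter substitution) only visit nodes reachable through getOperandList(), so the format would have to be an operand:

// Hypothetical operand list for DruidSqlReplace; field order and names are
// assumed. Exposing exportFileFormat here lets parameter substitution
// visit and rewrite it like any other operand.
@Override
public List<SqlNode> getOperandList()
{
  return ImmutableNullableList.of(
      getTargetTable(),
      getSource(),
      getTargetColumnList(),
      exportFileFormat
  );
}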
  {
    return new DruidSqlReplace(
        pos,
        // Must match SqlInsert.getOperandList()
I don't really understand these comments - they don't help me understand... do we need them?
you could create a bunch of local variables with the casted types and name them accordingly - that might help, or even provide a place to add comments...
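For instance (the operand meanings here are assumed from the quoted indices):

// Named locals make the constructor call self-documenting.
final SqlNodeList clusteredBy = (SqlNodeList) operands[5];
final SqlNode replaceTimeQuery = operands[6];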
Removed the comments; let me know if it's OK now.
if (!query.isA(SqlKind.QUERY)) {
  throw InvalidSqlInput.exception("Unexpected SQL statement type [%s], expected it to be a QUERY", query.getKind());
}
return DruidSqlInsert.create(new SqlInsert(
I think the contents of this method could just be pushed into DruidSqlInsert; it seems like all of this is supposed to belong there...
...or there is something I've missed...
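A sketch of that move; the constructor signature is an assumption:

// Hypothetical factory on DruidSqlInsert owning the QUERY-kind check,
// so callers no longer repeat it.
public static DruidSqlInsert create(SqlInsert insert, @Nullable Granularity granularity)
{
  final SqlNode query = insert.getSource();
  if (!query.isA(SqlKind.QUERY)) {
    throw InvalidSqlInput.exception(
        "Unexpected SQL statement type [%s], expected it to be a QUERY",
        query.getKind());
  }
  return new DruidSqlInsert(insert, granularity);
}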
  }
}

private void validateSegmentGranularity(
seeing a method that is void and named validateSegmentGranularity, I have a feeling the method has lost its context;
meaning: ideally this should be named something like getEffectiveGranularity and return a Granularity; but in the process of getting the actual granularity it must also identify invalid cases...
do you see any possibly better place for it?
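A sketch of that shape, with assumed parameter names (the query's PARTITIONED BY grain versus the catalog's grain) and an illustrative error message:

// Hypothetical signature: resolve the effective granularity, rejecting
// invalid combinations along the way instead of validating as a side effect.
private Granularity getEffectiveGranularity(
    @Nullable final Granularity queryGranularity,   // from PARTITIONED BY
    @Nullable final Granularity catalogGranularity  // from table metadata
)
{
  final Granularity effective =
      queryGranularity != null ? queryGranularity : catalogGranularity;
  if (effective == null) {
    throw InvalidSqlInput.exception(
        "Operation requires a PARTITIONED BY clause or a catalog-defined segment granularity");
  }
  return effective;
}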
Good point! Fixed; let me know if it's better now.
final List<RelDataTypeField> sourceFields = sourceType.getFieldList();
for (final RelDataTypeField sourceField : sourceFields) {
  // Check that there are no unnamed columns in the insert.
  if (UNNAMED_COLUMN_PATTERN.matcher(sourceField.getName()).matches()) {
are there any tests for this exception? what if the pattern doesn't match anymore?
Yes! There are tests in CalciteInsertDmlTest, for example org.apache.druid.sql.calcite.CalciteInsertDmlTest#testInsertWithUnnamedColumnInSelectStatement
if (query instanceof SqlOrderBy) {
  SqlOrderBy sqlOrderBy = (SqlOrderBy) query;
  SqlNodeList orderByList = sqlOrderBy.orderList;
  if (!(orderByList == null || orderByList.equals(SqlNodeList.EMPTY))) {
why have all this here? isn't DruidSqlIngest subclassed to have DruidSqlInsert?
shouldn't this happen in DruidSqlInsert when it gets created?
Doing it in the DruidSqlInsert constructor changes the output of unparse. Not sure if this is OK. There are a few tests in DruidSqlUnparseTest.java that check the expected output of unparse. I can update these tests, but I'm just not sure whether doing this has other ramifications?
* allow granularity mismatch
* remove duplicate validation around unnamed columns
// Copied here from MSQE since that extension is not visible here.
public static final String CTX_ROWS_PER_SEGMENT = "msqRowsPerSegment";

public interface ValidatorContext
}

// The target namespace is both the target table ID and the row type for that table.
final SqlValidatorNamespace targetNamespace = getNamespace(insert);
could we have a copy of getNamespaceOrThrow over from SqlValidatorImpl, or use requireNonNull (just to avoid a possible issue if it ends up being null)?
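The requireNonNull option is a one-line guard; a minimal sketch using java.util.Objects:

// Fail fast with a descriptive message if no namespace was registered.
final SqlValidatorNamespace targetNamespace = Objects.requireNonNull(
    getNamespace(insert),
    () -> "no namespace registered for " + insert
);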
// know names and we match by name.) Thus, we'd have to validate (to know names and types)
// to get the target types, but we need the target types to validate. Catch-22. So, we punt.
final SqlValidatorScope scope;
if (source instanceof SqlSelect) {
note: it seems like most of this was copied over from SqlValidatorImpl#validateInsert, then extended / refactored / had methods extracted / commented on...
I think it might be challenging to maintain this in the long run; however, validateInsert doesn't seem to change very often.
I think it would be useful to leave some comments about the origins of this method in the javadoc of validateInsert.
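For example, a provenance note of roughly this shape (the wording is only a suggestion):

/**
 * Validates an INSERT/REPLACE statement.
 *
 * <p>Largely adapted from Calcite's SqlValidatorImpl#validateInsert, with
 * Druid-specific checks added and several methods extracted. Compare against
 * upstream when upgrading the Calcite version.
 */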
for (final RelDataTypeField sourceField : sourceFields) {
  // Check that there are no unnamed columns in the insert.
  if (UNNAMED_COLUMN_PATTERN.matcher(sourceField.getName()).matches()) {
    throw InvalidSqlInput.exception(
could you replace these exceptions with ones communicating the SqlNode, if that's interesting/valuable...
if it's a general error, that doesn't matter... but this error is specific to a selected column:

throw buildCalciteContextException(
    "Insertion requires columns to be named....",
    getSqlNodeFor(insert, sourceFields.indexOf(sourceField)));

rough getSqlNodeFor method:

SqlNode getSqlNodeFor(SqlInsert insert, int idx)
{
  SqlNode src = insert.getSource();
  if (src instanceof SqlSelect) {
    SqlSelect sqlSelect = (SqlSelect) src;
    SqlNodeList selectList = sqlSelect.getSelectList();
    if (idx < selectList.size()) {
      return selectList.get(idx);
    }
  }
  return src;
}
@@ -1765,7 +1762,6 @@ public void testErrorWhenInputSourceInvalid()
    + "partitioned by DAY\n"
    + "clustered by channel";
HashMap<String, Object> context = new HashMap<>(DEFAULT_CONTEXT);
I can't comment on the testInsertWithInvalidColumnNameInIngest test case; but the intention of the check is to ensure that something like:
INSERT INTO t SELECT __time, 1+1 FROM foo PARTITIONED BY ALL
is caught; could you change or add something like this as a test case?
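A rough sketch of such a directed test, reusing the testIngestionQuery() helper quoted elsewhere in this PR; the expectValidationError/invalidSqlContains helpers and the exact error text are assumptions:

@Test
public void testInsertWithUnnamedExpressionIsRejected()
{
  // 1+1 yields an unnamed column, which source validation should reject;
  // the matcher text below is illustrative, not the patch's actual message.
  testIngestionQuery()
      .sql("INSERT INTO t SELECT __time, 1+1 FROM foo PARTITIONED BY ALL")
      .expectValidationError(invalidSqlContains("columns to be named"))
      .verify();
}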
@zachjsh - This was merged without a committer approval. Can you revert it and open another PR so that you can get a committer approval?
public void testInsertHourGrain()
{
  testIngestionQuery()
      .sql("INSERT INTO hourDs\n" +
In a follow-up, could you also please add a test for a REPLACE query where the PARTITIONED BY clause is omitted?
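Something along these lines, perhaps (the table name, helpers, and expected message are assumptions):

@Test
public void testReplaceWithoutPartitionedBy()
{
  // With no catalog-supplied grain, omitting PARTITIONED BY should fail.
  testIngestionQuery()
      .sql("REPLACE INTO dst OVERWRITE ALL SELECT * FROM foo")
      .expectValidationError(invalidSqlContains("PARTITIONED BY"))
      .verify();
}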
}

/**
 * If the segment grain is given in the catalog then use this value is used.
Suggested change:
- * If the segment grain is given in the catalog then use this value is used.
+ * If the segment grain is given in the catalog and absent in the PARTITIONED BY clause in the query, then use the value from the catalog.
@Nullable SqlIdentifier exportFileFormat,
@Nullable SqlNode replaceTimeQuery
curious, is the order of args swapped for a reason?
| + "partitioned by DAY\n" | ||
| + "clustered by channel"; | ||
| HashMap<String, Object> context = new HashMap<>(DEFAULT_CONTEXT); | ||
| context.put(PlannerContext.CTX_SQL_OUTER_LIMIT, 100); |
Curious, are these context parameters not required by these tests anymore?
…urce input expressions (#15962)
* address remaining comments from #15836
* address remaining comments from #15908
* add test that exposes relational algebra issue
* simplify test exposing issue
* fix
* add tests for sealed / non-sealed
* update test descriptions
* fix test failure when -Ddruid.generic.useDefaultValueForNull=true
* check type assignment based on native Druid types
* add tests that cover missing jacoco coverage
* add replace tests
* add more tests and comments about column ordering
* simplify tests
* review comments
* remove commented line
* STRING family types should be validated as non-null
Description
This PR contains a portion of the changes from the inactive draft PR for integrating the catalog with the Calcite planner (#13686) from @paul-rogers. It refactors the IngestHandler and subclasses to produce a validated SqlInsert instance node instead of the previous Insert source node; the SqlInsert node is then validated in the Calcite validator. The validation implemented as part of this PR covers only the source node, plus some of the validation previously done in the ingest handlers. As part of this change, the PARTITIONED BY clause can be supplied by the table catalog metadata if it exists, and can then be omitted from the ingest-time query.
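As a hypothetical illustration of the new behavior (the table and its grain are assumed to be declared in the catalog):

// Illustrative only: hourDs is assumed to be defined in the catalog with an
// HOUR segment grain, so the ingest query may omit PARTITIONED BY entirely
// and inherit the grain from the catalog.
final String sql = "INSERT INTO hourDs\n"
    + "SELECT * FROM foo";  // no PARTITIONED BY: grain comes from the catalog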
This PR has: