Add window-focused tests from Drill#13773
This commit borrows some test definitions from Drill's test suite and uses them to flesh out full validation of window function capabilities. In order to run these tests, we also add the ability to run a Scan operation against segments, which also meant an implementation of RowsAndColumns for frames.
```java
final ColumnType type = columnAccessor.getType();
if (type.getType() == ValueType.COMPLEX) {
  final ComplexMetricSerde serdeForType = ComplexMetrics.getSerdeForType(type.getComplexTypeName());
  if (serdeForType != null && serdeForType.getObjectStrategy() != null) {
    return serdeForType.getObjectStrategy().getClazz();
```

Code scanning / CodeQL notice: Deprecated method or constructor invocation.
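The flagged snippet chains two lookups, either of which can yield null. A minimal, self-contained sketch of that null-safe lookup pattern (toy `Serde`/`Strategy` types standing in for Druid's `ComplexMetricSerde`/`ObjectStrategy`, which are not reproduced here):

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-ins for ComplexMetrics / ComplexMetricSerde / ObjectStrategy.
public class SerdeLookup {
  static class Strategy {
    private final Class<?> clazz;
    Strategy(Class<?> clazz) { this.clazz = clazz; }
    Class<?> getClazz() { return clazz; }
  }

  static class Serde {
    private final Strategy strategy;
    Serde(Strategy strategy) { this.strategy = strategy; }
    Strategy getObjectStrategy() { return strategy; }
  }

  private static final Map<String, Serde> REGISTRY = new HashMap<>();

  static void register(String typeName, Serde serde) {
    REGISTRY.put(typeName, serde);
  }

  // Mirrors the null-safe chain in the diff: a registry miss, or a serde
  // without an object strategy, falls through to null instead of throwing.
  static Class<?> classFor(String complexTypeName) {
    final Serde serde = REGISTRY.get(complexTypeName);
    if (serde != null && serde.getObjectStrategy() != null) {
      return serde.getObjectStrategy().getClazz();
    }
    return null;
  }

  public static void main(String[] args) {
    register("hyperUnique", new Serde(new Strategy(byte[].class)));
    System.out.println(classFor("hyperUnique"));
    System.out.println(classFor("unknown")); // null
  }
}
```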
gianm
left a comment
LGTM, the adjusted changes are pretty targeted. I have one question on the SQL planning side that doesn't affect my approval; I'm just curious.
```java
final DataSource myDataSource;
if (dataSource instanceof TableDataSource) {
```
What if the underlying datasource is a join or unnest? Do we need special handling or is that taken care of by the WindowOperatorQuery itself?
This code is insufficient for those cases; I expect the Drill tests will end up covering them, i.e. that's still WIP. In the "simplest" solution, however, this check should perhaps be inverted: check whether the datasource is already a QueryDataSource and, if it's not, make it one. That would cover all of the cases and handle everything as a scan.
That said, the whole planning-a-scan-query thing is a short-term hack/workaround anyway. I expect that in the end this will just plan the "operator query" all the way down to the segment (which is what happens with this workaround as well, it just happens in a really... odd... way).
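The inverted check described above can be sketched in isolation. This is a toy model, not Druid's actual code: the stand-in `QueryDataSource` here just remembers its input, whereas the real one wraps a Query (e.g. a scan query built for the base datasource). The point is that joins, unnests, and plain tables all flow through the same wrapping path.

```java
public class WrapSketch {
  interface DataSource {}

  static class TableDataSource implements DataSource {
    final String name;
    TableDataSource(String name) { this.name = name; }
  }

  static class JoinDataSource implements DataSource {}

  // Toy stand-in: the real QueryDataSource wraps a Query, omitted here.
  static class QueryDataSource implements DataSource {
    final DataSource base;
    QueryDataSource(DataSource base) { this.base = base; }
  }

  // Inverted check: instead of special-casing TableDataSource, wrap
  // anything that is not already a QueryDataSource, so every input
  // (table, join, unnest, ...) is handled uniformly as a scan.
  static DataSource wrapInScan(DataSource ds) {
    if (ds instanceof QueryDataSource) {
      return ds;
    }
    return new QueryDataSource(ds);
  }

  public static void main(String[] args) {
    System.out.println(wrapInScan(new TableDataSource("wiki")) instanceof QueryDataSource); // true
    DataSource alreadyWrapped = new QueryDataSource(new JoinDataSource());
    System.out.println(wrapInScan(alreadyWrapped) == alreadyWrapped); // true: left alone
  }
}
```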
This commit borrows some test definitions from Drill's test suite and uses them to flesh out full validation of window function capabilities.
In order to run these tests, we also add the ability to run a Scan operation against segments, which also meant an implementation of RowsAndColumns for frames.
Initially, in trying to add these tests, I also started fixing the problems that arose. One of them was being able to scan data from segments for use in queries. This is necessary here because the Drill tests generally are not grouping on anything first and instead essentially resolve to scan operators.

After resolving that issue, I ran into another set of bugs, specifically in Calcite query planning, where Calcite did not give me a logical plan that mapped correctly to the semantics of the query. As I dove into that, I realized it was a bigger ball of yarn, and this commit was already starting to sprawl in scope. So I changed strategy: this commit introduces the test framework and the fixes made so far, focusing on the code changes required to get everything in place, and a subsequent commit will introduce the full set of 2000 files for the tests.
Table of Contents (or, what to expect when reviewing this):
- parquet-extension: code changes to add a main (ParquetToJson) that can be used to convert parquet files to newline-delimited JSON. This is just a utility for developers to use if we ever need to add a new dataset that is defined by parquet; it is not a Main intended for a general audience.
- FrameColumnReader: extended to be able to read out a RowsAndColumns column. This hopefully also provides a relatively straightforward path for using Frame columns in cases where direct reads from locations make more sense than the ColumnSelector/DimensionSelector routes that have been previously employed.
- DecoratableRowsAndColumns: a semantic interface added that takes on "decorations" of a RAC and tries to lazily execute them. This is leveraged in making the ability to read the segment work. Note that the capabilities for reading segments have only the minimum implemented to make these tests run; they are not wired up to actually execute in a distributed environment.
- resources: the test definitions land in a resources directory without any actual code changes. That should make it easy to merge in the 2600 extra files for tests.

This PR has:
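The lazy "decorations" idea behind DecoratableRowsAndColumns can be sketched in isolation. This is a toy model using hypothetical names (LazyRac, decorate, materialize), not Druid's actual interface: decorations queue up as the RAC is built and only execute when the caller finally materializes the rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Toy lazy-decoration pattern: operations accumulate and run only
// when the data is actually needed, so decorations that are never
// materialized cost nothing.
public class LazyRac {
  private final List<String> rows;  // stand-in for RowsAndColumns data
  private final List<UnaryOperator<List<String>>> decorations = new ArrayList<>();
  private int applied = 0;  // counts decorations that have actually run

  public LazyRac(List<String> rows) { this.rows = rows; }

  // Queue a decoration (e.g. a filter or sort); nothing executes yet.
  public LazyRac decorate(UnaryOperator<List<String>> decoration) {
    decorations.add(decoration);
    return this;
  }

  public int appliedCount() { return applied; }

  // Only here do the queued decorations run, in order.
  public List<String> materialize() {
    List<String> result = new ArrayList<>(rows);
    for (UnaryOperator<List<String>> d : decorations) {
      result = d.apply(result);
      applied++;
    }
    return result;
  }

  public static void main(String[] args) {
    LazyRac rac = new LazyRac(List.of("b", "a", "c"))
        .decorate(r -> { r.sort(String::compareTo); return r; })
        .decorate(r -> r.subList(0, 2));
    System.out.println(rac.appliedCount()); // 0: nothing has run yet
    System.out.println(rac.materialize());  // [a, b]
  }
}
```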