Numeric array support for columnar frames #15917
for (int i = 0; i < rowLength; ++i) {
  final Number element = numericArray.get(i);
  final long memoryOffset = rowDataCursor.start() + ((long) elementSizeBytes() * i);
Check warning (Code scanning / CodeQL): Dereferenced variable may be null
This is a false positive: rowDataCursor is null if and only if rowLength == 0, and in that case the loop body never executes, so the variable is never dereferenced.
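To make the invariant in this reply concrete, here is a minimal, hypothetical sketch of the element-writing loop using plain java.nio.ByteBuffer instead of Druid's memory cursors. The class name, the marker values, and the zero placeholder written for null elements are assumptions for illustration only. The buffers are allocated only when the row is non-empty, so they are null exactly when rowLength == 0, and the loop body (the only place they are dereferenced) never runs in that case.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch of writing one numeric array row into a nullity buffer and a
 * data buffer. Marker values and the null placeholder are assumptions, not Druid's.
 */
final class ArrayRowWriterSketch {
  // Assumed marker values; the real constants may differ.
  static final byte NULL_ELEMENT_MARKER = 1;
  static final byte NON_NULL_ELEMENT_MARKER = 0;
  static final int ELEMENT_SIZE_BYTES = Long.BYTES;

  /** Returns {nullityBuffer, dataBuffer}; both are null when the row is empty. */
  static ByteBuffer[] writeRow(Long[] row) {
    final int rowLength = row.length;
    ByteBuffer nullity = null;
    ByteBuffer data = null;
    if (rowLength > 0) {
      // Buffers exist only for non-empty rows: null iff rowLength == 0.
      nullity = ByteBuffer.allocate(Byte.BYTES * rowLength);
      data = ByteBuffer.allocate(ELEMENT_SIZE_BYTES * rowLength);
    }
    // For rowLength == 0 this loop never runs, so the null buffers are never touched.
    for (int i = 0; i < rowLength; ++i) {
      final Long element = row[i];
      final int offset = ELEMENT_SIZE_BYTES * i;
      if (element == null) {
        nullity.put(i, NULL_ELEMENT_MARKER);
        data.putLong(offset, 0L); // placeholder value stored for a null element
      } else {
        nullity.put(i, NON_NULL_ELEMENT_MARKER);
        data.putLong(offset, element);
      }
    }
    return new ByteBuffer[] {nullity, data};
  }
}
```

Writing the row {5, null, 7} sets the nullity byte at index 1 to the null marker and stores 7 at byte offset 16 of the data buffer; writing an empty row returns two null buffers.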
  final Number element = numericArray.get(i);
  final long memoryOffset = rowDataCursor.start() + ((long) elementSizeBytes() * i);
  if (element == null) {
    rowNullityDataCursor.memory()
Check warning (Code scanning / CodeQL): Dereferenced variable may be null
        .putByte(rowNullityDataCursor.start() + (long) Byte.BYTES * i, NULL_ELEMENT_MARKER);
    putNull(rowDataCursor.memory(), memoryOffset);
  } else {
    rowNullityDataCursor.memory()
Check warning (Code scanning / CodeQL): Dereferenced variable may be null
(long) Byte.BYTES * FrameColumnReaderUtils.getAdjustedCumulativeRowLength(
    memory,
    getStartOfCumulativeLengthSection(),
    numRows - 1
Check failure (Code scanning / CodeQL): User-controlled data in arithmetic expression
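CodeQL flags these expressions because row counts and lengths come from frame data. Independently of whether that finding applies, the snippets already widen to long before multiplying, for example (long) elementSizeBytes() * i. A small illustrative sketch, with hypothetical names, of what that cast protects against, plus a variant built on Math.multiplyExact and Math.addExact that fails loudly instead of wrapping around:

```java
/**
 * Illustrative sketch (not Druid code) of int versus long offset arithmetic.
 * elementSize and index stand in for values read from frame data.
 */
final class OffsetArithmeticSketch {
  /** Multiplies in int arithmetic: the product wraps before it is widened to long. */
  static long intOffset(int elementSize, int index) {
    return elementSize * index;
  }

  /** Widens the first operand, so the whole multiplication happens in long arithmetic. */
  static long longOffset(int elementSize, int index) {
    return (long) elementSize * index;
  }

  /** Like longOffset plus a base address, but throws ArithmeticException on long overflow. */
  static long checkedOffset(long start, int elementSize, int index) {
    return Math.addExact(start, Math.multiplyExact((long) elementSize, (long) index));
  }
}
```

With an element size of 8 and an index of 300,000,000, the int version wraps to a negative offset, while the long version yields 2,400,000,000.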
adarshsanjeev left a comment
Looks good to me overall!
  if (multiValue) {
-   totalNumValues = adjustCumulativeRowLength(getCumulativeRowLength(memory, numRows - 1));
+   totalNumValues = FrameColumnReaderUtils.adjustCumulativeRowLength(
Should this be changed to getAdjustedCumulativeRowLength instead?
    rowLength = cumulativeRowLength;
  } else {
-   rowLength = cumulativeRowLength - adjustCumulativeRowLength(getCumulativeRowLength(memory, physicalRow - 1));
+   rowLength = cumulativeRowLength - FrameColumnReaderUtils.adjustCumulativeRowLength(
Should this also be getAdjustedCumulativeRowLength, since that function exists?
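Both review questions concern reading a cumulative length and undoing the null-array marker in a single step. A minimal sketch of that logic, assuming (not confirmed by this PR text) that a null array is stored as -(cumulativeLength + 1), so that even a null first row whose cumulative length is 0 is still negative. The class and method names are hypothetical.

```java
/**
 * Hypothetical sketch of decoding a cumulative-length section in which a negative
 * value marks a null array. The -(x + 1) encoding is an assumption.
 */
final class CumulativeLengthSketch {
  /** Encodes the cumulative length after a row, marking null arrays with a negative value. */
  static int encode(int cumulativeLength, boolean arrayIsNull) {
    return arrayIsNull ? -(cumulativeLength + 1) : cumulativeLength;
  }

  /** Recovers the real cumulative length, whether or not the null marker is present. */
  static int adjust(int stored) {
    return stored < 0 ? -(stored + 1) : stored;
  }

  /** True if the array at the given row is itself null (not merely containing nulls). */
  static boolean isNullArray(int[] stored, int row) {
    return stored[row] < 0;
  }

  /** Number of elements in a row; row 0 has no predecessor to subtract. */
  static int rowLength(int[] stored, int row) {
    final int end = adjust(stored[row]);
    return row == 0 ? end : end - adjust(stored[row - 1]);
  }

  /** Total element count k, read from the last row's adjusted cumulative length. */
  static int totalNumValues(int[] stored) {
    return stored.length == 0 ? 0 : adjust(stored[stored.length - 1]);
  }
}
```

For rows [1, 2], null, [3, null], the stored values are {2, -3, 4}: row 1 has length 0 and is flagged as a null array, row 2 has length 2, and the total element count is 4.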
import org.apache.druid.segment.column.ColumnType;
/**
 * Readers for columns written by {@link org.apache.druid.frame.write.columnar.LongArrayFrameColumnWriter}
 *
 * Note on cumulative lengths stored in section 1: cumulative lengths are stored so that it is fast to offset into the
 * elements of the array. We also use a negative cumulative length to denote that the array itself is null (as opposed
 * to individual elements being null, which we store in section 2).
Thanks for the detailed doc! This made it a lot easier to follow
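The Javadoc names section 1 (cumulative lengths) and section 2 (element nullity); the PR description gives their sizes as n * Integer.BYTES and k * Byte.BYTES, followed by k * ELEMENT_SIZE bytes of element values. A minimal sketch of turning those sizes into byte offsets, assuming the three sections are contiguous and ignoring any header the real format may write before them. All names are hypothetical. The flat element index passed to the offset methods is the adjusted cumulative length of the previous row plus the element's position within its own row.

```java
/**
 * Hypothetical sketch of the three-section layout for n rows holding k elements.
 * Sections are assumed contiguous, starting at byte 0.
 */
final class ArrayColumnLayoutSketch {
  final int numRows;     // n
  final int numElements; // k, summed over all rows
  final int elementSize; // ELEMENT_SIZE, e.g. Long.BYTES for long arrays

  ArrayColumnLayoutSketch(int numRows, int numElements, int elementSize) {
    this.numRows = numRows;
    this.numElements = numElements;
    this.elementSize = elementSize;
  }

  long cumulativeLengthSectionSize() { return (long) numRows * Integer.BYTES; }
  long nullitySectionSize() { return (long) numElements * Byte.BYTES; }
  long valuesSectionSize() { return (long) numElements * elementSize; }

  long nullitySectionStart() { return cumulativeLengthSectionSize(); }
  long valuesSectionStart() { return nullitySectionStart() + nullitySectionSize(); }
  long totalSize() { return valuesSectionStart() + valuesSectionSize(); }

  /** Byte offset of the nullity flag for the element at a flat (column-wide) index. */
  long nullityOffset(int elementIndex) {
    return nullitySectionStart() + (long) Byte.BYTES * elementIndex;
  }

  /** Byte offset of the value for the element at a flat (column-wide) index. */
  long valueOffset(int elementIndex) {
    return valuesSectionStart() + (long) elementSize * elementIndex;
  }
}
```

For n = 3 rows holding k = 4 long elements, the sections take 12, 4, and 32 bytes (48 in total); the fourth element's nullity flag is at byte 15 and its value starts at byte 40.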
totalNumValues = FrameColumnReaderUtils.getAdjustedCumulativeRowLength(
    memory,
    getStartOfCumulativeLengthSection(),
    numRows - 1
Check failure (Code scanning / CodeQL): User-controlled data in arithmetic expression
Thanks for the review, @adarshsanjeev. I have confirmed that the CodeQL warnings are false positives and explained why in the threads above.
This patch adds "TypeCastSelectors", which are used when writing frames to perform two coercions:
- When a numeric type is desired and the underlying type is non-numeric or unknown, the underlying selector is wrapped, "getObject" is called, and the result is coerced using "ExprEval.ofType". This differs from the prior behavior, where primitive methods like "getLong" and "getDouble" were called directly. It fixes an issue where a column would be read as all zeroes when its SQL type is numeric and its physical type is string, which can happen when a column's type evolves from string to number.
- When an array type is desired, the underlying selector is wrapped, "getObject" is called, and the result is coerced to Object[]. This coercion replaces some earlier logic from #15917.
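A rough plain-Java sketch of the two coercions described in that patch note, for illustration only. It does not use Druid's ExprEval, and the specific rules below (integer parse first, then a truncating double parse, null for unparseable strings, scalars wrapped in a one-element array) are assumptions rather than Druid's actual behavior. The class and methods are hypothetical.

```java
import java.util.List;

/**
 * Hypothetical sketch of coercing the result of getObject() to a desired type.
 * Parsing and wrapping rules are assumptions chosen for the example.
 */
final class TypeCoercionSketch {
  /** Coerces a value to Long: numbers directly, strings by parsing, anything else to null. */
  static Long toLong(Object value) {
    if (value == null) {
      return null;
    }
    if (value instanceof Number) {
      return ((Number) value).longValue();
    }
    if (value instanceof String) {
      final String s = (String) value;
      try {
        return Long.parseLong(s);
      } catch (NumberFormatException notAnInteger) {
        try {
          return (long) Double.parseDouble(s); // truncates toward zero, e.g. "12.5" -> 12
        } catch (NumberFormatException notANumber) {
          return null; // unparseable strings become null rather than 0
        }
      }
    }
    return null;
  }

  /** Coerces list-like or scalar values to Object[]; scalars become one-element arrays. */
  static Object[] toObjectArray(Object value) {
    if (value == null) {
      return null;
    }
    if (value instanceof Object[]) {
      return (Object[]) value;
    }
    if (value instanceof List) {
      return ((List<?>) value).toArray();
    }
    return new Object[] {value};
  }
}
```

The point of reading through the object value is that a string such as "123" stored in a column now typed as numeric is parsed into 123 instead of being read as zero.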
Description
This PR adds support for numeric arrays to columnar frame types. Columnar frames are used in window function processing and in materializing subquery results, so this closes a major gap in the capabilities of both.
Layout of the column
To store a column of n rows, the data is laid out as follows, where k is the total number of elements (individual array values) across all rows combined:
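- Section 1: n * Integer.BYTES (cumulative row lengths; a negative value marks a null array)
- Section 2: k * Byte.BYTES (nullity of each individual element)
- Section 3: k * ELEMENT_SIZE (element values)
Release note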
Columnar frames used in subquery materialization and window functions now support numeric arrays.