Add fieldReader for row based frames#16707
imply-cheddar
left a comment
Overall LGTM. Fix up the coverage for tests and it should be okay to merge.
LakshSingla
left a comment
Regarding the failing coverage, we have FrameWriterTestData for all the different types of columns supported. It would be good to add the tests there that check the getVal, getComparator and isNull of all the rows fetched from the ReaderColumns, and validate them against the source data.
Why can't we delegate the function calls made in the added Column to the Selector that's already present? Given that the layout is row-based, can we have a selector + a cursor to mimic the functionality of the Column, without duplicating the logic inside? Also, why is the comparator of the array classes not implemented?
    public int compareRows(int lhsRowNum, int rhsRowNum)
    {
      long lhsPosition = coach.computeFieldPosition(lhsRowNum);
      long rhsPosition = coach.computeFieldPosition(rhsRowNum);

      final byte nullIndicatorByte = getNullIndicatorByte();
      if (dataRegion.getByte(lhsPosition) == nullIndicatorByte) {
        if (dataRegion.getByte(rhsPosition) == nullIndicatorByte) {
          return 0;
        } else {
          return -1;
        }
      } else {
        if (dataRegion.getByte(rhsPosition) == nullIndicatorByte) {
          return 1;
        } else {
          return Long.compare(getLongAtPosition(lhsPosition), getLongAtPosition(rhsPosition));
        }
      }
    }
This logic should use byte-based comparison. That's the benefit of transforming the stored value: we can directly compare the bytes (including the nullity check). Checking for nullity, detransforming, and comparing as long is redundant and a lot more expensive than the first approach.
Which code should that comparison be done with? Or does it need to be written? When I wrote this code, I know that I heard that the frames had this nice comparability property, but I couldn't figure out what code could be used to do it correctly, so the implementation fell to something that would definitely be correct. Along with the statement of what should be done, a pointer to what code to use would make it much faster and simpler to actually update the PR and get it merged.
If two fields of the same type are written in the frame, a represented as [a0...aM] and b represented as [b0...bN], then a > b implies that the bytewise representation of a is lexicographically greater than the bytewise representation of b. In other words, if a > b, there is an index i such that a0 == b0, a1 == b1, ..., ai == bi, and a(i+1) > b(i+1).
Note, this only applies to primitive types and arrays of primitives.
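To make the property concrete, here is a minimal sketch for a single long value. It assumes an encoding that flips the sign bit and writes the result big-endian; that encoding and the class name OrderPreservingLongSketch are illustrative assumptions, not the frame writer's actual code, and the null-indicator byte is left out.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class OrderPreservingLongSketch {
    // Hypothetical encoding (an assumption for illustration): flip the sign bit,
    // then write the value big-endian. ByteBuffer is big-endian by default.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    // Lexicographic comparison of the encoded bytes, treating each byte as unsigned.
    static int compareEncoded(long a, long b) {
        return Arrays.compareUnsigned(encode(a), encode(b));
    }

    public static void main(String[] args) {
        long[] samples = {Long.MIN_VALUE, -2L, -1L, 0L, 1L, 255L, 256L, Long.MAX_VALUE};
        for (long a : samples) {
            for (long b : samples) {
                int expected = Integer.signum(Long.compare(a, b));
                int actual = Integer.signum(compareEncoded(a, b));
                if (expected != actual) {
                    throw new AssertionError("order mismatch for " + a + " vs " + b);
                }
            }
        }
        System.out.println("byte order matches numeric order for all sample pairs");
    }
}
```

With an encoding like this, the comparator never needs to decode the value; it only walks the bytes.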
We use this property in multiple places, check out one of the implementations here - https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/frame/read/FrameReaderUtils.java#L173-L173.
In the given code, you can compare using something like:

    for (int i = 0; i < Math.min(fieldLength1, fieldLength2); ++i) {
      int cmp = compareBytesUnsigned(dataRegion.getByte(fieldPosition1 + i), dataRegion.getByte(fieldPosition2 + i));
      if (cmp != 0) return cmp;
    }

fieldLength1 and fieldLength2 can be computed by subtracting the starting position of the current field from the starting position of the next field (or the row end). There are pre-existing utility methods that can achieve this; check out the classes ReadableFieldPointer and RowMemoryFieldPointer. I think some of the work done by the FieldPositionHelper is duplicated in those classes, albeit in a non-roundabout way.
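The loop suggested in this comment can be fleshed out as a self-contained sketch. A plain byte[] stands in for dataRegion, the class and method names are hypothetical, and the length tie-break after the common prefix is an assumption added here so that the function returns a total order; it is not taken from the frame code.

```java
public class FieldBytesComparatorSketch {
    // Compares two fields stored in the same region, byte by byte, as unsigned values.
    // pos1/len1 and pos2/len2 are the start offsets and lengths of the two fields.
    static int compareFieldBytes(byte[] region, int pos1, int len1, int pos2, int len2) {
        int common = Math.min(len1, len2);
        for (int i = 0; i < common; i++) {
            // Mask with 0xFF so that 0x80..0xFF sort after 0x00..0x7F.
            int cmp = Integer.compare(region[pos1 + i] & 0xFF, region[pos2 + i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        // Assumption: when one field is a prefix of the other, the shorter one sorts first.
        return Integer.compare(len1, len2);
    }

    public static void main(String[] args) {
        // Field A occupies bytes [0, 2), field B occupies bytes [2, 4).
        byte[] region = {0x01, (byte) 0x80, 0x01, 0x7F};
        int cmp = compareFieldBytes(region, 0, 2, 2, 2);
        // A signed byte comparison would order these the other way around.
        System.out.println(cmp > 0 ? "A > B" : "A <= B");
    }
}
```

The masking step is the important part: Java bytes are signed, so comparing them directly would put 0x80 before 0x7F and break the ordering.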
LMK if there's more I can expand on.
Approving since this can be picked up in future iterations. Feel free to implement this in this PR or as a follow-up. PR should be GTG once the tests pass.
Because avoiding all of that is the exact point of doing direct access? If you want to avoid duplication, go in and re-implement the cursor-based stuff to be able to do direct reads using these methods rather than doing some round-about hoop-jumping to align a direct read interface with an indirect cursor.
Which array class are you particularly worried about? All of the array-based readers throw Those can absolutely be implemented as needed, they aren't implemented yet because they just weren't needed, as is indicated by the javadoc on the |
The test failure is weird. The way that the current test is parameterized makes it incredibly hard to figure out which case is broken; in fact, that whole pattern of parameterization is probably counterproductive (you can just have each thing be its own test and share the test-execution code). I cannot push directly to this branch, so I instead did a PR against the branch at adarshsanjeev#2. The PR updates the test to hopefully be something that continues to test what the test here was trying to test and avoids the challenges. That said, I couldn't reproduce the test failure myself, so I'm not really sure why it's sad...
Add a new fieldReaders#makeRAC for RowBasedFrameRowsAndColumns.
Currently, RowBasedFrameRowsAndColumns uses FrameColumnReaders to read values. FrameColumnReaders do not support reading from row based frames, and this throws an exception. This PR adds FieldReaders#makeRACColumn implementations to be used in this case. This will allow row based frames to be converted into RAC correctly.
The PR does not implement the functions for array readers. This will be done in the future.
This PR has: