Skip to content

Add fieldReader for row based frames#16707

Merged
adarshsanjeev merged 7 commits intoapache:masterfrom
adarshsanjeev:row-based-frame-rac
Aug 13, 2024
Merged

Add fieldReader for row based frames#16707
adarshsanjeev merged 7 commits intoapache:masterfrom
adarshsanjeev:row-based-frame-rac

Conversation

@adarshsanjeev
Copy link
Copy Markdown
Contributor

@adarshsanjeev adarshsanjeev commented Jul 9, 2024

Add a new fieldReaders#makeRAC for RowBasedFrameRowsAndColumns. Currently, RowBasedFrameRowsAndColumns uses FrameColumnReaders to read values. FrameColumnReaders do not support reading from row based frames, and this throws an exception. This PR adds FieldReaders#makeRACColumn implementations to be used in this case. This will allow row based frames to be converted into RAC correctly.

The PR does not implement the functions for array readers. This will be done in the future.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Copy link
Copy Markdown
Contributor

@imply-cheddar imply-cheddar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. Fix up the coverage for tests and it should be okay to merge.

Copy link
Copy Markdown
Contributor

@LakshSingla LakshSingla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the failing coverage, we have FrameWriterTestData for all the different types of columns supported. It would be good to add the tests there that check the getVal, getComparator and isNull of all the rows fetched from the ReaderColumns, and validate them against the source data.

@LakshSingla
Copy link
Copy Markdown
Contributor

Why can't we delegate the function calls made in the added Column to the Selector that's already present? Given that the layout is row-based, can we have a selector + a cursor to mimic the functionality of the Column, without duplicating the logic inside?

Also, why is the comparator of the array classes not implemented?

Comment on lines +206 to +225
public int compareRows(int lhsRowNum, int rhsRowNum)
{
long lhsPosition = coach.computeFieldPosition(lhsRowNum);
long rhsPosition = coach.computeFieldPosition(rhsRowNum);

final byte nullIndicatorByte = getNullIndicatorByte();
if (dataRegion.getByte(lhsPosition) == nullIndicatorByte) {
if (dataRegion.getByte(rhsPosition) == nullIndicatorByte) {
return 0;
} else {
return -1;
}
} else {
if (dataRegion.getByte(rhsPosition) == nullIndicatorByte) {
return 1;
} else {
return Long.compare(getLongAtPosition(lhsPosition), getLongAtPosition(rhsPosition));
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic should use byte-based comparison. That's the benefit of transforming the value stored - we can directly compare the bytes (including the nullity check). Checking for nullity, Detransforming, and comparing as long is redundant and a lot more expensive than the first approach.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which is the code to do that comparison with? Or does it need to be written. When I wrote this code, I know that I heard that the frames had this nice comparability property, but I couldn't figure out what code could be used to do it correctly, so the implementation fell to something that would definitely be correct. Along with the statement of what should be done, a pointer to what code to use would make it much faster and simpler to actually update the PR and get it merged.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If two fields of the same type are written in the frame - a represented as [a0...aM] and b represented as [b0..bN] then if (a > b), then the bytewise comparison of a's representation will be lexicographically greater than the bytewise representation of b. Therefore, if a > b, a0 = b0, a1 == b1, ... ai == bi, a(i+1) > b(i+1)
Note, this only applies to primitive types and arrays of primitives.

We use this property in multiple places, check out one of the implementations here - https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/frame/read/FrameReaderUtils.java#L173-L173.

in the given code, you can compare using something like:

 for (int i = 0; i < Math.min(fieldLength1, fieldLength2); ++i) {
   int cmp = compareBytesUnsigned(dataRegion.getByte(fieldPosition1 + i), dataRegion.getByte(fieldPosition2 + i);
   if (cmp != 0) return cmp;
 }

fieldLength1 & fieldLength2 is something you can compute by subtracting the starting position of current field from the starting position of the next field (or the row end). There are pre-existing utility methods that can achieve this - checkout the classes ReadableFieldPointer and RowMemoryFieldPointer. You can check those as well, I think some of the work done by the FieldPositionHelper is duplicated in those classes, albeit in a non-roundabout way.

LMK if there's more I can explain up on.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving since this can be picked up in future iterations. Feel free to implement this in this PR or as a follow-up. PR should be GTG once the tests pass.

@imply-cheddar
Copy link
Copy Markdown
Contributor

Why can't we delegate the function calls made in the added Column to the Selector that's already present? Given that the layout is row-based, can we have a selector + a cursor to mimic the functionality of the Column, without duplicating the logic inside?

Because avoiding all of that is the exact point of doing direct access? If you want to avoid duplication, go in and re-implement the cursor-based stuff to be able to do direct reads using these methods rather than doing some round-about hoop-jumping to align a direct read interface with an indirect cursor.

@imply-cheddar
Copy link
Copy Markdown
Contributor

Also, why is the comparator of the array classes not implemented?

Which array class are you particularly worried about? All of the array-based readers throw NotYetImplemented DruidExceptions?

Those can absolutely be implemented as needed, they aren't implemented yet because they just weren't needed, as is indicated by the javadoc on the NotYetImplemented exception.

@imply-cheddar
Copy link
Copy Markdown
Contributor

The test failure is weird. The way that the current test is parameterized makes it incredibly hard to figure out which is broken, in fact that whole pattern of parameterization is probably counter productive (you can just have each thing be its own test and share the test-execution code). I cannot push directly to this branch, so I instead did a PR against the branch at adarshsanjeev#2

The PR updates the test to hopefully be something that continues to test what the test here was trying to test and avoids the challenges. That said, I couldn't reproduce the test failure myself, so I'm not really sure why it's sad...

@adarshsanjeev adarshsanjeev merged commit c6da2f3 into apache:master Aug 13, 2024
@kfaraz kfaraz added this to the 31.0.0 milestone Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants