Write null byte when indexing numeric dimensions with Hadoop #7020
gianm merged 4 commits into apache:master
Conversation
clintropolis
left a comment
Thanks for the contribution! (and apologies it has taken so long for a review)
I think this is reasonable, it's the same approach being used for the metrics columns.
I think it would be nice to add a test in InputRowSerdeTest to cover this. All of the tests in travis are run with and without sql null compatibility, so you can probably just write one test that can assert that null valued input columns are either the null byte or zero depending on the NullHandling.replaceWithDefault().
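As a standalone illustration of what such a test would assert, here is a simplified sketch. This is not Druid's actual InputRowSerde: the marker bytes, the serializeLong helper, and the boolean flag standing in for NullHandling.replaceWithDefault() are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class NullNumericSerdeSketch
{
  static final byte IS_NULL_BYTE = 0x01;   // hypothetical marker values
  static final byte NOT_NULL_BYTE = 0x00;

  static byte[] serializeLong(Long value, boolean replaceWithDefault) throws IOException
  {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    if (value == null && replaceWithDefault) {
      value = 0L;  // default-value mode: null columns become zero
    }
    if (value == null) {
      out.writeByte(IS_NULL_BYTE);         // sql-null mode: only the null byte is written
    } else {
      out.writeByte(NOT_NULL_BYTE);
      out.writeLong(value);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException
  {
    // sql-null-compatibility mode: a null column serializes to just the null byte
    byte[] nullMode = serializeLong(null, false);
    System.out.println(nullMode.length == 1 && nullMode[0] == IS_NULL_BYTE);  // true

    // default-value mode: the same null column serializes as the long zero
    byte[] defaultMode = serializeLong(null, true);
    System.out.println(defaultMode.length == 9 && defaultMode[0] == NOT_NULL_BYTE);  // true
  }
}
```

A real test would make the same two assertions, switching on NullHandling.replaceWithDefault() instead of a flag.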
Force-pushed from 392f3ae to 7ad1d91
Thanks for taking a look at this! I added a test to InputRowSerdeTest.
clintropolis
left a comment
LGTM, thanks for adding a test 👍
```java
// Write the null byte only if the default numeric value is still null.
if (ret == null) {
  out.writeByte(NullHandling.IS_NULL_BYTE);


```

Please remove the extra blank line.
```diff
  public Long deserialize(ByteArrayDataInput in)
  {
-   return in.readLong();
+   return isNullByteSet(in) ? null : in.readLong();
```
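The null-byte round trip in this diff can be sketched in isolation. Names like IS_NULL_BYTE and isNullByteSet mirror the PR, but this sketch uses plain java.io streams rather than Druid's ByteArrayDataInput, so treat the details as illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class NullByteRoundTrip
{
  static final byte IS_NULL_BYTE = 0x01;   // hypothetical marker values
  static final byte NOT_NULL_BYTE = 0x00;

  static byte[] serialize(Long value) throws IOException
  {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    if (value == null) {
      out.writeByte(IS_NULL_BYTE);         // null: marker byte only, no payload
    } else {
      out.writeByte(NOT_NULL_BYTE);
      out.writeLong(value);
    }
    return bytes.toByteArray();
  }

  // Mirrors the ternary in the diff: check the marker byte, then either
  // return null or read the payload.
  static Long deserialize(DataInputStream in) throws IOException
  {
    return in.readByte() == IS_NULL_BYTE ? null : in.readLong();
  }

  public static void main(String[] args) throws IOException
  {
    Long value = deserialize(new DataInputStream(new ByteArrayInputStream(serialize(42L))));
    System.out.println(value);       // 42

    Long nullValue = deserialize(new DataInputStream(new ByteArrayInputStream(serialize(null))));
    System.out.println(nullValue);   // null
  }
}
```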
Perhaps it would be better to use a functional programming style here.

```java
return Optional.ofNullable(in)
    .filter(InputRowSerde::isNotNullByteSet)
    .map(ByteArrayDataInput::readLong)
    .get();
```

Eh, I sort of prefer it the way it currently is; it seems clearer to me. Is there any reason it would be better, other than preference?
Alright, I prefer the functional style because it makes the code more readable. If we don't use Optional, then we need to add the @Nullable annotation to this method. It's up to you. 😅
I think I prefer the non-functional style. Also, maybe I'm misunderstanding, but wouldn't the get() cause the code to throw if the null byte is set?
I'll add @Nullable annotations to these deserialize methods.
> but wouldn't the get() cause the code to throw if the null byte is set?

@ferristseng If the null byte is set, then get will return a null value. What you describe should be the orElseThrow function. Thanks for your contribution.
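For reference, the semantics of the methods under discussion can be checked in a standalone demo, unrelated to the Druid classes: on an empty Optional, get() throws NoSuchElementException, while orElse(null) is the variant that yields null.

```java
import java.util.NoSuchElementException;
import java.util.Optional;

public class OptionalGetDemo
{
  public static void main(String[] args)
  {
    // Filter an Optional down to empty, standing in for the "null byte set" case.
    Optional<Long> empty = Optional.ofNullable(42L).filter(v -> false);

    try {
      empty.get();
      System.out.println("returned");
    } catch (NoSuchElementException e) {
      System.out.println("threw NoSuchElementException");  // this branch runs
    }

    System.out.println(empty.orElse(null));  // prints "null"
  }
}
```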
```diff
  public Float deserialize(ByteArrayDataInput in)
  {
-   return in.readFloat();
+   return isNullByteSet(in) ? null : in.readFloat();
```
Same.

```java
return Optional.ofNullable(in)
    .filter(InputRowSerde::isNotNullByteSet)
    .map(ByteArrayDataInput::readFloat)
    .get();
```

```diff
  public Double deserialize(ByteArrayDataInput in)
  {
-   return in.readDouble();
+   return isNullByteSet(in) ? null : in.readDouble();
```
Same.

```java
return Optional.ofNullable(in)
    .filter(InputRowSerde::isNotNullByteSet)
    .map(ByteArrayDataInput::readDouble)
    .get();
```
asdf2014
left a comment
There was a problem hiding this comment.
Overall LGTM 👍 Also, I left a few suggestions.
This has two approvals -- merging it.
* write null byte in hadoop indexing for numeric dimensions
* Add test case to check output serializing null numeric dimensions
* Remove extra line
* Add @nullable annotations
I noticed a couple of comments that hadn't been addressed in the Hadoop Indexing project regarding serializing and deserializing null numeric values, so I figured I would try to tackle it. I'm not super familiar with the internals of Druid, so let me know if I need to change code elsewhere.
Also, I ran the existing tests in the Hadoop Indexing project with -Ddruid.generic.useDefaultValueForNull=false, and they still passed. Let me know if I need to add additional ones!