Fix NPE when indexing unparseable/missing numeric values when sortFacts=false #4509
jon-wei wants to merge 2 commits into apache:master
| return ((Number) valObj).longValue(); | ||
| } else if (valObj instanceof String) { | ||
| return DimensionHandlerUtils.getExactLongFromDecimalString((String) valObj); | ||
| Long parsedVal = DimensionHandlerUtils.getExactLongFromDecimalString((String) valObj); |
Why are null checks needed in checkUnsortedEncodedKeyComponentsEqual() and getUnsortedEncodedKeyComponentHashCode(), given that you added one here?
Nulls could still appear in TimeAndDims for numeric fields when the input row is missing those columns completely. You can see an example in testNullDimensionTransform() in incremental/IncrementalIndexTest.
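As a rough illustration of the null checks discussed here, the sketch below shows null-tolerant equality and hashing for a boxed key component. It is not the PR's code: the class and method names are hypothetical, and it uses `java.util.Objects` helpers in place of the explicit `if (lhs == null)` branches in the diff.

```java
import java.util.Objects;

// Hypothetical stand-in for an indexer's key-component helpers; not Druid's actual class.
final class NullSafeKeyComponents {
    private NullSafeKeyComponents() {}

    // Two null components compare equal; a null never equals a non-null value.
    static boolean componentsEqual(Long lhs, Long rhs) {
        return Objects.equals(lhs, rhs);
    }

    // Objects.hashCode returns 0 for null instead of throwing an NPE,
    // and delegates to key.hashCode() otherwise.
    static int componentHashCode(Long key) {
        return Objects.hashCode(key);
    }
}
```

With these helpers, a key whose numeric component is null (for example, because the column was missing from the row) can be hashed and compared without dereferencing null.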
| } else if (valObj instanceof String) { | ||
| return DimensionHandlerUtils.getExactLongFromDecimalString((String) valObj); | ||
| Long parsedVal = DimensionHandlerUtils.getExactLongFromDecimalString((String) valObj); | ||
| return parsedVal == null ? 0L : parsedVal; |
Could you cache 0L in a constant of type Long? The JLS doesn't guarantee that it will be boxed to a single object.
Added a LONG_ZERO constant
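A minimal sketch of the cached-constant idea follows, assuming a nullable parser that returns null for unparseable input. The class, `tryParseExactLong`, and the `parse...OrZero` methods are hypothetical helpers written for illustration; they are not the Druid or Guava methods shown in the diff.

```java
import java.math.BigDecimal;

// Hypothetical helper class; names are illustrative only.
final class NumericDimParsing {
    // Boxed once, so every fallback returns the same Long/Float instance.
    static final Long LONG_ZERO = 0L;
    static final Float FLOAT_ZERO = 0.0f;

    private NumericDimParsing() {}

    // Returns null when the string is not a decimal that fits exactly in a long.
    static Long tryParseExactLong(String s) {
        try {
            return new BigDecimal(s.trim()).longValueExact();
        } catch (NumberFormatException | ArithmeticException e) {
            return null;
        }
    }

    // Both branches of the conditional are of type Long, so no unboxing
    // or re-boxing happens and the constant is returned as-is.
    static Long parseLongOrZero(String s) {
        Long parsed = (s == null) ? null : tryParseExactLong(s);
        return parsed == null ? LONG_ZERO : parsed;
    }

    static Float parseFloatOrZero(String s) {
        if (s == null) {
            return FLOAT_ZERO;
        }
        try {
            return Float.valueOf(s.trim());
        } catch (NumberFormatException e) {
            return FLOAT_ZERO;
        }
    }
}
```

Returning the constant directly, rather than a primitive literal that gets boxed on return, is what makes the "single object" property hold regardless of how the runtime caches boxed values.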
| } else if (valObj instanceof String) { | ||
| return Floats.tryParse((String) valObj); | ||
| Float parsedVal = Floats.tryParse((String) valObj); | ||
| return parsedVal == null ? 0.0f : parsedVal; |
Added a FLOAT_ZERO constant
leventov left a comment
Also, maybe instead of converting unparseable values to 0, reject the row? Converting to 0 hides problems and gives the false impression that everything is OK.
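The alternative suggested here could look roughly like the sketch below, which throws instead of substituting 0. Everything in it is an assumption made for illustration: the exception type, the method name, and the choice to treat a null value as unparseable are hypothetical and are not taken from this PR or its follow-up.

```java
import java.math.BigDecimal;

// Hypothetical strict parser: rejects bad values instead of mapping them to 0.
final class StrictNumericParsing {
    // Illustrative exception type; a real ingestion path would use its own.
    static final class UnparseableValueException extends RuntimeException {
        UnparseableValueException(String message) {
            super(message);
        }
    }

    private StrictNumericParsing() {}

    static long parseLongStrict(String dimension, Object value) {
        if (value instanceof Number) {
            return ((Number) value).longValue();
        }
        if (value instanceof String) {
            try {
                return new BigDecimal(((String) value).trim()).longValueExact();
            } catch (NumberFormatException | ArithmeticException e) {
                throw new UnparseableValueException(
                    "Cannot parse [" + value + "] as long for dimension [" + dimension + "]");
            }
        }
        // Assumption in this sketch: null and other types are rejected as well.
        throw new UnparseableValueException(
            "Unsupported value [" + value + "] for dimension [" + dimension + "]");
    }
}
```

A caller would catch the exception per row and count or log the rejected row, so bad input becomes visible instead of being silently indexed as 0.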
| @Override | ||
| public boolean checkUnsortedEncodedKeyComponentsEqual(Float lhs, Float rhs) | ||
| { | ||
| if (lhs == null) { |
| public int getUnsortedEncodedKeyComponentHashCode(Float key) | ||
| { | ||
| return key.hashCode(); | ||
| return key == null ? 0 : key.hashCode(); |
| @Override | ||
| public boolean checkUnsortedEncodedKeyComponentsEqual(Long lhs, Long rhs) | ||
| { | ||
| if (lhs == null) { |
leventov left a comment
A non-null dim value is also expected in FloatDimensionIndexer and LongDimensionIndexer. Maybe it's better to fill missing dimensions with nulls, to ensure TimeAndDims.getDims() array doesn't have null elements?
I'm not sure this really needs to be a 0.10.1 blocker, based on this comment.
sortFacts is always true during ingestion. It's only false in groupBy v1 processing, which doesn't support numeric dimensions anyway, so this code path should never actually be hit in production.
I think there is still a problem and this patch should be merged.
Moving this back to 0.12.0 as we discovered that, unlike previously believed, the bug is in fact triggerable in production.
LGTM overall
Thanks for the review so far. After revisiting this, I agree with @leventov's comment here: #4509 (review). I'm closing this PR in favor of #5312, which rejects rows with unparseable numeric dimensions.
Related to the discussion in #4503.
Ingesting a row that has a non-null unparseable value for a numeric dimension, or that is missing a numeric dimension entirely, causes an NPE when sortFacts=false.