Skip to content

Fix NPE when indexing unparseable/missing numeric values when sortFacts=false#4509

Closed
jon-wei wants to merge 2 commits intoapache:masterfrom
jon-wei:numeric_indexing_fix
Closed

Fix NPE when indexing unparseable/missing numeric values when sortFacts=false#4509
jon-wei wants to merge 2 commits intoapache:masterfrom
jon-wei:numeric_indexing_fix

Conversation

@jon-wei
Copy link
Copy Markdown
Contributor

@jon-wei jon-wei commented Jul 6, 2017

Related to discussion in #4503

If a row with non-null unparseable values for a numeric dimension or a row with a numeric dimension missing is ingested, this would cause an NPE when sortFacts=false.

@jon-wei jon-wei added the Bug label Jul 6, 2017
@jon-wei jon-wei added this to the 0.10.1 milestone Jul 6, 2017
return ((Number) valObj).longValue();
} else if (valObj instanceof String) {
return DimensionHandlerUtils.getExactLongFromDecimalString((String) valObj);
Long parsedVal = DimensionHandlerUtils.getExactLongFromDecimalString((String) valObj);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why null checks are needed in checkUnsortedEncodedKeyComponentsEqual() and checkUnsortedEncodedKeyComponentHashCode(), since you added it here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nulls could still appear in TimeAndDims for numeric fields when the input row is missing those columns completely, you can see an example in testNullDimensionTransform() in incremental/IncrementalIndexTest

} else if (valObj instanceof String) {
return DimensionHandlerUtils.getExactLongFromDecimalString((String) valObj);
Long parsedVal = DimensionHandlerUtils.getExactLongFromDecimalString((String) valObj);
return parsedVal == null ? 0L : parsedVal;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you cache 0L in a constant of type Long? JLS doesn't guarantee it will be boxed to a single object

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a LONG_ZERO constant

} else if (valObj instanceof String) {
return Floats.tryParse((String) valObj);
Float parsedVal = Floats.tryParse((String) valObj);
return parsedVal == null ? 0.0f : parsedVal;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a FLOAT_ZERO constant

Copy link
Copy Markdown
Member

@leventov leventov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, maybe instead of converting unparseable values to 0, reject the row? Conversion to 0 allows to hide problems and gives false impression that everything is ok.

@Override
public boolean checkUnsortedEncodedKeyComponentsEqual(Float lhs, Float rhs)
{
if (lhs == null) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use Objects.equals()

public int getUnsortedEncodedKeyComponentHashCode(Float key)
{
return key.hashCode();
return key == null ? 0 : key.hashCode();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use Objects.hashCode()

@Override
public boolean checkUnsortedEncodedKeyComponentsEqual(Long lhs, Long rhs)
{
if (lhs == null) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

Copy link
Copy Markdown
Member

@leventov leventov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-null is expected as dim value also in FloatDimensionIndexer and LongDimensionIndexer. Maybe it's better to fill missing dimensions with nulls, to ensure TimeAndDims.getDims() array doesn't have null elements?

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Jul 6, 2017

I'm not sure this really needs to be a 0.10.1 blocker, based on this comment.

If a row with non-null unparseable values for a numeric dimension or a row with a numeric dimension missing is ingested, this would cause an NPE when sortFacts=false.

Because sortFacts is always true during ingestion. It's only false in groupBy v1 processing, which doesn't support numeric dimensions anyway. So the code path should never actually get hit in production.

@jon-wei jon-wei modified the milestones: 0.11.0, 0.10.1 Jul 6, 2017
@jon-wei jon-wei modified the milestones: 0.11.0, 0.11.1 Sep 20, 2017
@gianm
Copy link
Copy Markdown
Contributor

gianm commented Oct 2, 2017

@jon-wei @leventov is this PR still relevant / useful? Should we pick it back up or have other patches in this area superseded it?

@leventov
Copy link
Copy Markdown
Member

leventov commented Oct 2, 2017

I think there is still a problem and this patch should be merged

@jon-wei jon-wei closed this Oct 6, 2017
@jon-wei jon-wei deleted the numeric_indexing_fix branch October 6, 2017 21:41
@jon-wei jon-wei restored the numeric_indexing_fix branch October 6, 2017 21:42
@jon-wei jon-wei reopened this Oct 6, 2017
@jon-wei jon-wei modified the milestones: 0.12.0, 0.13.0 Jan 9, 2018
@gianm gianm modified the milestones: 0.13.0, 0.12.0 Jan 29, 2018
@gianm
Copy link
Copy Markdown
Contributor

gianm commented Jan 29, 2018

Moving this back to 0.12.0 as we discovered that, unlike previously believed, the bug is in fact triggerable in production.

@himanshug
Copy link
Copy Markdown
Contributor

LGTM overall

@jon-wei
Copy link
Copy Markdown
Contributor Author

jon-wei commented Jan 31, 2018

@leventov @himanshug @gianm

Thanks for the review so far, after revisting this, I agree with @leventov's comment here: #4509 (review)

I'm closing this PR in favor of #5312 which rejects rows with unparseable numeric dimensions

@jon-wei jon-wei closed this Jan 31, 2018
@gianm gianm removed this from the 0.12.0 milestone Feb 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants