fix nested column handling of null and "null" #13714
Merged
clintropolis merged 7 commits into apache:master on Feb 1, 2023
imply-cheddar approved these changes on Feb 1, 2023
Comment on lines +139 to +145
```java
if (value == null) {
  // null always has global id 0
  localId = localDictionary.add(0);
} else {
  // lookupGlobalId returns -1 when the value is missing from the global dictionary
  final int globalId = lookupGlobalId(value);
  Preconditions.checkArgument(globalId >= 0, "Value [%s] is not present in global dictionary", value);
  localId = localDictionary.add(globalId);
}
```
Contributor
Why is it not okay to do this check inside of `lookupGlobalId`?
Member (Author)
This is the shared function; `lookupGlobalId` has different implementations depending on the type of writer, so the check would need to be duplicated there.
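The design choice in this exchange (one shared method that owns the null handling and the range check, while each writer type supplies only its own `lookupGlobalId`) can be sketched as a template method. This is a minimal, hypothetical illustration: `FieldWriterSketch`, `StringFieldWriterSketch`, `addValue`, and the `List`-backed local dictionary are assumptions, not Druid's actual classes, and a plain `IllegalArgumentException` stands in for `Preconditions.checkArgument` so the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these names are illustrative, not Druid's actual classes.
abstract class FieldWriterSketch<T> {
  // Stands in for the per-column local dictionary: global ids in first-seen order.
  private final List<Integer> localDictionary = new ArrayList<>();

  // Each writer type resolves values against its own typed global dictionary;
  // by assumption here, a missing value returns -1.
  protected abstract int lookupGlobalId(T value);

  // Shared entry point: null handling and the range check live here once,
  // instead of being duplicated in every lookupGlobalId implementation.
  public final int addValue(T value) {
    final int globalId;
    if (value == null) {
      globalId = 0; // null is always global id 0
    } else {
      globalId = lookupGlobalId(value);
      if (globalId < 0) {
        throw new IllegalArgumentException(
            String.format("Value [%s] is not present in global dictionary", value));
      }
    }
    // Return the existing local id, or assign the next one.
    int localId = localDictionary.indexOf(globalId);
    if (localId < 0) {
      localDictionary.add(globalId);
      localId = localDictionary.size() - 1;
    }
    return localId;
  }
}

// Hypothetical string writer: the only type-specific part is the lookup.
final class StringFieldWriterSketch extends FieldWriterSketch<String> {
  private final Map<String, Integer> globalIds;

  StringFieldWriterSketch(Map<String, Integer> globalIds) {
    this.globalIds = globalIds;
  }

  @Override
  protected int lookupGlobalId(String value) {
    return globalIds.getOrDefault(value, -1);
  }
}
```

Adding a new writer type (long, double, and so on) only means overriding `lookupGlobalId`; the validation cannot be forgotten because subclasses never bypass `addValue`.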
abhagraw pushed a commit to abhagraw/druid that referenced this pull request on Feb 8, 2023
* fix nested column handling of null and "null"
* fix issue merging nested column value dictionaries that could incorrectly lose dictionary values
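The second commit bullet concerns merging per-segment value dictionaries. As a rough illustration of the invariant the fix restores (a value present in several segments' dictionaries must appear exactly once in the merged dictionary), here is a hypothetical k-way merge over sorted, duplicate-free string dictionaries. It is not Druid's merging iterator: it assumes `null` is handled separately (as global id 0), orders values with `String.compareTo`, and leaves out the per-segment id remapping that a real merge would also have to produce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not Druid's dictionary merging iterator.
final class DictionaryMergeSketch {
  private DictionaryMergeSketch() {}

  // One read cursor per input dictionary.
  private static final class Cursor {
    final List<String> values;
    int position;

    Cursor(List<String> values) {
      this.values = values;
    }

    String current() {
      return values.get(position);
    }
  }

  // Merges sorted, duplicate-free dictionaries into one sorted, duplicate-free
  // dictionary. A value present in several inputs is emitted exactly once.
  static List<String> merge(List<List<String>> dictionaries) {
    PriorityQueue<Cursor> heap =
        new PriorityQueue<>((a, b) -> a.current().compareTo(b.current()));
    for (List<String> dictionary : dictionaries) {
      if (!dictionary.isEmpty()) {
        heap.add(new Cursor(dictionary));
      }
    }
    List<String> merged = new ArrayList<>();
    while (!heap.isEmpty()) {
      Cursor cursor = heap.poll();
      String value = cursor.current();
      // Equal values from different inputs are polled consecutively,
      // so comparing with the last emitted value is enough to drop duplicates.
      if (merged.isEmpty() || !merged.get(merged.size() - 1).equals(value)) {
        merged.add(value);
      }
      // Advance the cursor outside the heap, then re-insert it if not exhausted.
      cursor.position++;
      if (cursor.position < cursor.values.size()) {
        heap.add(cursor);
      }
    }
    return merged;
  }
}
```

For example, merging `["a", "c", "null"]`, `["b", "c"]`, and `["c", "d"]` yields `["a", "b", "c", "d", "null"]`, with the shared `"c"` kept once.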
Description
Fixes an issue with nested columns that can occur when both an actual `null` and the string `"null"` are present in any nested path, which results in the `null` values incorrectly becoming associated with the `"null"` values.

The bug was caused by a usage of `String.valueOf` in `StringFieldColumnWriter` that was not checking for null values when writing out the column. This still worked by dumb luck because the fastutil `*2Int` maps backing the global id lookup when writing out segments had a default return value of 0, which happens to be the global id of `null`; so even though `"null"` wasn't present in the global dictionary, it ended up with the correct id. However, if `"null"` was present, `null` would incorrectly be written out with the `"null"` global id and associated with that value instead.

As a safety measure, I've changed the `*2Int` maps to have a default return value of -1, and the writer now checks that the global id is in range before writing it to the column, to ensure mistakes like this don't happen in the future. This check caught a pretty serious regression introduced by #13653: the simplified dictionary merging iterator would result in segment dictionaries getting completely mangled if the same value was in more than one segment's dictionary. Luckily this didn't make it into any releases.
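The failure mode can be reproduced in isolation. The sketch below is hypothetical: `java.util.HashMap.getOrDefault` stands in for a fastutil `*2Int` map's default return value, and `buggyGlobalId`/`fixedGlobalId` are illustrative names, not methods of `StringFieldColumnWriter`. It shows why a default of 0 masked the bug while `"null"` was absent from the dictionary, and why it surfaced once `"null"` was present.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reproduction; HashMap.getOrDefault stands in for a
// fastutil *2Int map's default return value.
final class NullVsStringNullSketch {
  private NullVsStringNullSketch() {}

  // Buggy: String.valueOf((Object) null) returns the four-character string "null",
  // so a real null is looked up as if it were the string "null". If "null" is
  // absent, the default id is returned instead (0 happens to be null's id).
  static int buggyGlobalId(Map<String, Integer> globalIds, int defaultId, Object value) {
    return globalIds.getOrDefault(String.valueOf(value), defaultId);
  }

  // Fixed: null is mapped to global id 0 explicitly, and a missing value
  // yields -1, which the caller can reject with a range check.
  static int fixedGlobalId(Map<String, Integer> globalIds, Object value) {
    if (value == null) {
      return 0;
    }
    return globalIds.getOrDefault(value.toString(), -1);
  }
}
```

With a dictionary of `{"a": 1, "null": 2}`, `buggyGlobalId(map, 0, null)` returns 2 (the id of the string `"null"`), while `fixedGlobalId(map, null)` returns 0.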
The added test data in `NestedDataColumnSupplierTest` would fail prior to this PR.

Unrelated, this PR also moves some column selector tests that had nothing to do with scan queries out of `NestedDataScanQueryTest` into their own file.

This PR has: