Skip to content

More specific null/empty str handling in IndexMerger#2306

Merged
fjy merged 1 commit intoapache:masterfrom
jon-wei:inherit2
Jan 21, 2016
Merged

More specific null/empty str handling in IndexMerger#2306
fjy merged 1 commit intoapache:masterfrom
jon-wei:inherit2

Conversation

@jon-wei
Copy link
Copy Markdown
Contributor

@jon-wei jon-wei commented Jan 20, 2016

Addresses a concern raised in PR #2006 about dimension cardinalities.

Changes the conditions for when an entry for the null/empty str is added to the value dictionary during index merging.

Old behavior:

  • If dim has values in at least one index, ensure the null/empty str is in the dictionary and convert missing columns to empty strings for merging

New behavior:

  • If dim is entirely missing from an index, but has values in another index, ensure the null/empty str is in the dictionary and convert missing columns to empty strings for merging

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have a more descriptive name?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fjy changed this to "dimAbsentFromIndex"

@fjy
Copy link
Copy Markdown
Contributor

fjy commented Jan 20, 2016

@navis does this address your concerns from #2006 ?

@fjy
Copy link
Copy Markdown
Contributor

fjy commented Jan 21, 2016

👍

@fjy fjy added this to the 0.9.0 milestone Jan 21, 2016
@binlijin
Copy link
Copy Markdown
Contributor

What will happen if a dim has values in all index, but all index do not have null/empty value?

@jon-wei
Copy link
Copy Markdown
Contributor Author

jon-wei commented Jan 21, 2016

@binlijin The null/empty str will not be added to the dictionary in that case (i.e., dimAbsentFromIndex will be false)

Off the top of my head, testNonLexicographicDimOrderMerge() in IndexMergerTest is an example where the dims don't have null/empty values

@binlijin
Copy link
Copy Markdown
Contributor

@jon-wei good

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be &&

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gianm changed, thanks for catching that

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Jan 21, 2016

@jon-wei It seems like the ideal condition would be:

  • If dim is entirely missing from one index, but has mixed nulls and values in another index, ensure the null/empty str is in the dictionary and convert missing columns to empty strings for merging

Checking for mixed nulls/values shouldn't add additional expense as isNullColumn already iterates over all the dim values. So I think it's feasible to do this check the ideal way.

@jon-wei
Copy link
Copy Markdown
Contributor Author

jon-wei commented Jan 21, 2016

@gianm It's still necessary to add the null/empty str to the dictionary for the case where a dim is missing from one index, and has no null values in another (missing and non-mixed) when they merge.

In the legacy merger this was handled by bumping the dictionary in IndexIO:

          if (onlyOneValue) {
            log.info("Dimension[%s] is single value, converting...", dimension);
            final boolean bumpedDictionary;
            if (nullsSet != null) {
              log.info("Dimension[%s] has null rows.", dimension);
              final ImmutableBitmap theNullSet = bitmapFactory.makeImmutableBitmap(nullsSet);

              if (dictionary.get(0) != null) {
                log.info("Dimension[%s] has no null value in the dictionary, expanding...", dimension);
                bumpedDictionary = true;
                final List<String> nullList = Lists.newArrayList();
                nullList.add(null);

                dictionary = GenericIndexed.fromIterable(
                    Iterables.concat(nullList, dictionary),
                    GenericIndexed.STRING_STRATEGY
                );

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Jan 21, 2016

👍

fjy added a commit that referenced this pull request Jan 21, 2016
More specific null/empty str handling in IndexMerger
@fjy fjy merged commit 3f99811 into apache:master Jan 21, 2016
@fjy fjy mentioned this pull request Feb 5, 2016
@fjy fjy added the Improvement label Feb 5, 2016
@jon-wei jon-wei deleted the inherit2 branch October 6, 2017 22:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants