Can we move the close into a finally block?
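A minimal sketch of what the nitpick asks for, not the PR's actual code. The names `CloseInFinallySketch`, `TrackingStream`, and `writeAndClose` are hypothetical and exist only for illustration; the point is that `close()` still runs when the write throws.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical stream that records whether close() was called.
class TrackingStream extends OutputStream {
    boolean closed = false;
    private final boolean failOnWrite;

    TrackingStream(boolean failOnWrite) {
        this.failOnWrite = failOnWrite;
    }

    @Override
    public void write(int b) throws IOException {
        if (failOnWrite) {
            throw new IOException("simulated write failure");
        }
    }

    @Override
    public void close() {
        closed = true;
    }
}

class CloseInFinallySketch {
    // Writes the data and closes the stream even if the write fails.
    static void writeAndClose(OutputStream out, byte[] data) throws IOException {
        try {
            out.write(data);
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) {
        TrackingStream failing = new TrackingStream(true);
        try {
            writeAndClose(failing, new byte[]{1});
        } catch (IOException e) {
            System.out.println("write failed, closed = " + failing.closed);
        }
    }
}
```

Where the resource is created in the same method, a try-with-resources statement gives the same guarantee with less code.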
|
@KurtYoung This is an awesome PR and one that I know several folks want to code review. Everything is going to be slow during the Christmas holidays, but this will get attention after New Year's. |
|
👍, seems good to go after a minor nitpick for closing in finally block. |
|
Added IndexMergerV9 and changed some low-level interfaces, but everything is fully compatible with the old way. Some explanation and thoughts here:
And here are some points I think should be discussed with you guys while writing the code: |
54725a7 to
8daf783
Compare
Allocate two IntBuffers to reuse, and swap them after each write to make sure GenericIndexedWriter can sort correctly.
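The reason for swapping can be sketched as follows. This assumes the writer keeps a reference to the previously written object to check ordering; `OrderCheckingWriter` is a hypothetical stand-in, not the real GenericIndexedWriter. With a single reused buffer the "previous" and "current" objects are the same instance, so the order check is meaningless.

```java
import java.nio.IntBuffer;

class BufferSwapSketch {
    // Hypothetical writer that remembers the previous object to verify ascending order.
    static class OrderCheckingWriter {
        private IntBuffer prev;
        private boolean sorted = true;

        void write(IntBuffer current) {
            if (prev != null && prev.compareTo(current) >= 0) {
                sorted = false;
            }
            prev = current; // keeps a reference, not a copy
        }

        boolean isSorted() {
            return sorted;
        }
    }

    static void fill(IntBuffer buf, int[] row) {
        buf.clear();
        buf.put(row);
        buf.flip();
    }

    static int maxLength(int[][] rows) {
        int max = 1;
        for (int[] row : rows) {
            max = Math.max(max, row.length);
        }
        return max;
    }

    // Reusing one buffer overwrites the data the writer still refers to as "previous".
    static boolean writeWithOneBuffer(int[][] rows) {
        OrderCheckingWriter writer = new OrderCheckingWriter();
        IntBuffer buf = IntBuffer.allocate(maxLength(rows));
        for (int[] row : rows) {
            fill(buf, row);
            writer.write(buf);
        }
        return writer.isSorted();
    }

    // Alternating two buffers leaves the previous row intact during the comparison.
    static boolean writeWithTwoBuffers(int[][] rows) {
        OrderCheckingWriter writer = new OrderCheckingWriter();
        int capacity = maxLength(rows);
        IntBuffer current = IntBuffer.allocate(capacity);
        IntBuffer spare = IntBuffer.allocate(capacity);
        for (int[] row : rows) {
            fill(current, row);
            writer.write(current);
            IntBuffer tmp = current; // swap so the next fill does not touch "previous"
            current = spare;
            spare = tmp;
        }
        return writer.isSorted();
    }

    public static void main(String[] args) {
        int[][] rows = {{1, 2}, {1, 3}, {2, 0}};
        System.out.println("one buffer:  " + writeWithOneBuffer(rows));
        System.out.println("two buffers: " + writeWithTwoBuffers(rows));
    }
}
```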
|
Hmm... I see why the null dictionary value and null row set are both handled in IndexMaker and in IndexIO's conversion. My previous decision about skippedDimension and nullSet was wrong, just ignore it. Working on this now... |
|
Found a bug in the merger and maker related to dimension order: |
|
Is it OK to insert null into every dimension's dictionary even if the dimension does not contain any null values? Update: I found a way to deal with null values now, but it's a little tricky (there are comments in IncrementalIndexAdapter) and makes it easy to create inconsistencies (IncrementalIndexStorageAdapter also relies on IncrementalIndex but does not have this logic right now, or it just does not need it right now). I think the proposal above is a possible and easy solution. What do you guys think? |
|
Ideally, -1 could be used for null and the number of distinct values would be specified for each dimension in the metadata. Would that be possible? |
|
@KurtYoung: are there any corresponding changes in the filters/query path for null handling in case we add null to every dimension dictionary? |
|
@nishantmonu51 I'm also aware of this. The current implementation does not add null to each dimension, but handles null values in both IncrementalIndexAdapter and IndexMergerV9. |
|
@KurtYoung there have been a lot of optimizations in the old index merger over the last few months. Are those optimizations incorporated in building the v9 segment directly? |
since we are changing the behavior of this method, can we please add a comment on the interface about how the method is supposed to be used?
|
@KurtYoung I can't find the logic where you actually use index merger v9 instead of index merger |
|
@KurtYoung I think the best way to think about how to handle nulls/empty strings in Druid is described in this PR: #995 |
|
I did a first pass over this PR but didn't go into detail for IndexMergerV9. Will look into it more once we know that it works reasonably well. At a high level I'm on board with the changes. |
|
@fjy Actually, I have not changed any logic to use IndexMergerV9 yet, but I switched the current IndexMerger's logic to IndexMergerV9's and made all the test cases pass. |
|
@KurtYoung I can't seem to make any comments on IndexMergerV9.
I am not able to make any comments for IndexMergerV9 below this line.
Is IndexMergerV9 just a rename of IndexMerger.java?
If that's the case, did you remove the v8 to v9 conversion step in IndexMerger?
IndexMergerV9 makes v9 index files directly. The main steps are very similar to IndexMerger's, except that the v8 to v9 conversion step is no longer necessary.
|
@himanshug Any more comments? @KurtYoung Can we add some way to switch between IndexMerger and IndexMergerV9 in the configuration? IndexMerger should be the default. |
|
@fjy I am 👍 once the configuration to switch to IndexMergerV9 is in place. |
|
I believe @cheddar should have a look at this one. He had a lot of opinions about format when I made changes to introduce dimension compression, and this one introduces even more changes. I also agree with @gianm we should be able to switch between implementations until it has been verified to be production ready. |
5d4423f to
75fdd58
Compare
|
Added a "buildV9Directly" option to TuningConfig and updated the docs. |
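A rough sketch of how such a flag could select the implementation. Every type here (`SegmentMerger`, `ConvertingMerger`, `DirectV9Merger`, `selectMerger`) is a hypothetical placeholder rather than one of Druid's classes; only the flag name comes from this PR.

```java
class MergerSelectionSketch {
    // Hypothetical common interface for the two merge strategies.
    interface SegmentMerger {
        String describe();
    }

    // Stand-in for the existing path: build v8, then convert to v9.
    static class ConvertingMerger implements SegmentMerger {
        @Override
        public String describe() {
            return "build v8, then convert to v9";
        }
    }

    // Stand-in for the new path: write v9 files directly.
    static class DirectV9Merger implements SegmentMerger {
        @Override
        public String describe() {
            return "build v9 directly";
        }
    }

    // The existing merger stays the default; the new one is opt-in via the flag.
    static SegmentMerger selectMerger(boolean buildV9Directly) {
        return buildV9Directly ? new DirectV9Merger() : new ConvertingMerger();
    }

    public static void main(String[] args) {
        System.out.println(selectMerger(false).describe());
        System.out.println(selectMerger(true).describe());
    }
}
```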
|
@himanshug @xvrl we good to move forward? |
|
👍 for me |
|
👍 for me too. Will leave this open until tomorrow to see if anyone else has comments. @KurtYoung have you filled out the CLA? http://druid.io/community/cla.html You guys might consider a corporate CLA. |
|
@fjy @KurtYoung pls squash the commits / cleanup the history, very useful contribution. |
|
@fjy @KurtYoung I might be wrong, but I still see some comments outstanding. Can we respond or address them? |
|
@fjy I have filled out the individual CLA. I don't know if I have the right to fill out a corporate CLA. |
|
@KurtYoung thanks @xvrl any more comments? |
|
@xvrl All these comments have been addressed and resolved.
add unit tests for IndexMergerV9 and fix some bugs
add more unit tests and fix bugs
handle null values and add more tests
minor changes & use LoggingProgressIndicator in IndexGeneratorReducer
make some static class public from IndexMerger
minor changes and add some comments
changes for comments
75fdd58 to
82ff98c
Compare
|
squashed into 3 commits. |
since buildV9Directly is never null, this should probably be boolean instead of Boolean
and also renamed to isBuildV9Directly
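One way the two suggestions above could look, as a sketch only. `TuningConfigSketch` is a hypothetical class, and the default of `false` is an assumption based on the earlier request that IndexMerger remain the default. The constructor accepts a nullable `Boolean` because the option may be absent from the config, but the field and getter use the primitive.

```java
class TuningConfigSketch {
    // Assumed default: keep using the existing IndexMerger unless asked otherwise.
    private static final boolean DEFAULT_BUILD_V9_DIRECTLY = false;

    // Primitive field: after construction the value can never be null.
    private final boolean buildV9Directly;

    // The argument may be null when the option is missing from the config.
    TuningConfigSketch(Boolean buildV9Directly) {
        this.buildV9Directly = buildV9Directly == null
            ? DEFAULT_BUILD_V9_DIRECTLY
            : buildV9Directly;
    }

    // Boolean getter named with the "is" prefix, as suggested in the review.
    boolean isBuildV9Directly() {
        return buildV9Directly;
    }

    public static void main(String[] args) {
        System.out.println(new TuningConfigSketch(null).isBuildV9Directly());
        System.out.println(new TuningConfigSketch(true).isBuildV9Directly());
    }
}
```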
This PR tracks the feature of building v9 directly, which was discussed in https://groups.google.com/forum/#!topic/druid-development/0CxhljSGeeo
We can divide this PR into 3 main parts:
Here are the classes that do the real work:
VSizeIndexedIntsWriter (single value, vsize encoded, not compressed)
CompressedIntsIndexedWriter (single value, not vsize encoded, compressed)
CompressedVSizeIntsIndexedWriter (single value, vsize encoded, compressed)
VSizeIndexedWriter (multi value, both offset and values are vsized, not compressed)
CompressedVSizeIndexedV3Writer (multi value, only values are vsized, compressed)
More details can be found here: https://groups.google.com/forum/#!topic/druid-development/0CxhljSGeeo
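To illustrate the "vsize encoded" variants in the list above: this sketch assumes "vsize" means that every value is stored in the fewest whole bytes that fit the largest value in the column. The class `VSizeSketch`, its methods, and the big-endian layout are assumptions for illustration and are not Druid's actual on-disk format.

```java
class VSizeSketch {
    // Smallest number of bytes (1 to 4) that can hold any value up to maxValue.
    static int numBytesForMax(int maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("maxValue must be non-negative");
        }
        if (maxValue <= 0xFF) {
            return 1;
        }
        if (maxValue <= 0xFFFF) {
            return 2;
        }
        if (maxValue <= 0xFFFFFF) {
            return 3;
        }
        return 4;
    }

    // Packs each value into numBytes bytes, most significant byte first.
    static byte[] encode(int[] values, int numBytes) {
        byte[] out = new byte[values.length * numBytes];
        int pos = 0;
        for (int value : values) {
            for (int shift = numBytes - 1; shift >= 0; shift--) {
                out[pos++] = (byte) (value >>> (8 * shift));
            }
        }
        return out;
    }

    // Reads back the value at the given index from the packed array.
    static int decode(byte[] encoded, int numBytes, int index) {
        int value = 0;
        int start = index * numBytes;
        for (int i = 0; i < numBytes; i++) {
            value = (value << 8) | (encoded[start + i] & 0xFF);
        }
        return value;
    }

    public static void main(String[] args) {
        int[] values = {0, 1, 255, 256, 65535};
        int numBytes = numBytesForMax(65535);
        byte[] packed = encode(values, numBytes);
        System.out.println(numBytes + " bytes per value, " + packed.length + " bytes total");
    }
}
```

Under this assumption a column whose dictionary ids never exceed 65535 needs two bytes per value instead of four, before any block compression is applied.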
LongColumnSerializer (write long metrics)
FloatColumnSerializer (write float metrics)
ComplexColumnSerializer (write complex metrics)