ability to build and load custom segment in realtime node#4448
ability to build and load custom segment in realtime node#4448kaijianding wants to merge 3 commits intoapache:masterfrom
Conversation
618ab1a to
7080bbb
Compare
|
@kaijianding you are doing a lot of PRs, could you also review other people's PRs? And then become a committer? |
7080bbb to
7b78f82
Compare
|
Sure, will review other's PRs and glad to hear that I'm close to being a committer @leventov |
|
@kaijianding I'm not sure about closeness and it's not for me to decide because I'm not a PMC member, but making reviews is essential because it's a painful aspect in the project that not enough reviews are done. When you are making a lot of PRs you demand reviews but not "contribute them back". |
7b78f82 to
cfd979c
Compare
|
I will try to review this. Sorry for not having much time to do the review. |
|
I can give it a try next week |
b27a368 to
d31abb7
Compare
There was a problem hiding this comment.
What is the differences between loadSegment().asQueryableIndex() and just loadIndex()?
I noticed there still exists plenty usage of loadIndex(), should they also be changed?
There was a problem hiding this comment.
every place want to load index from file should call loadSegment instead of loadIndex. inside loadSegment, it will use the customized segmentizerFactory to map the file to index.
Most usages of loadIndex() are in tests and benchmarks, so I don't modify them.
I will double check the rest usages to make sure every place calling loadIndex is replaced by loadSegment()
There was a problem hiding this comment.
If loadSegment is the only correct way to load a segment, then i would suggest you to delete the `loadIndex' method.
There was a problem hiding this comment.
I rewrite the indexIO.loadIndex() to use segmentizerFactory to load index from file, then there is no need to modify all the place calling indexIO.loadIndex() now
There was a problem hiding this comment.
yes, otherwise the test will fail.
public AlertBuilder makeAlert(Throwable t, String message, Object... objects)
{
if (emitter == null) {
final String errorMessage = String.format(
"Emitter not initialized! Cannot alert. Please make sure to call %s.registerEmitter()", this.getClass()
);
the emitter is registered in EmittingLogger.registerEmitter
There was a problem hiding this comment.
Can you double check this? I commented this line and the test passed. BTW, why this test will not fail before you added this, and why the test case would enter the logic about making any alerts?
There was a problem hiding this comment.
after rewrite indexIO.loadIndex(), there is no need to modify this anymore
There was a problem hiding this comment.
yes, the customIndexMerger may need to use Metadata, like in my customIndexMerger implement.
There was a problem hiding this comment.
Can we change this until we see a real customized index merger?
There was a problem hiding this comment.
my implement want to sub class SmooshedFileMapper, so need it can be extended.
There was a problem hiding this comment.
@kaijianding SmooshedFileMapper is not a part of public api, and could be changed anytime so that your subclass will break. Worse, the behaviour may break, however without compilation errors, so you won't notice it. So if you want something similar to SmooshedFileMapper in your custom SegmentizerFactory, I strongly suggest to copy SmooshedFileMapper entirely into your codebase and tailor it for your needs.
There was a problem hiding this comment.
I want to use V9IndexLoader.load(SmooshedFileMapper smooshedFiles, File inDir, ObjectMapper mapper), this method requires SmooshedFileMapper.
It's fine SmooshedFileMapper can change behaviors at any time, I extend SmooshedFileMapper in my extension and will have unit test to ensure everything is working properly.
There was a problem hiding this comment.
Does this related to your proposed change?
There was a problem hiding this comment.
yes, this is used to load persisted segment in realtime node
There was a problem hiding this comment.
change it to loadIndex()
There was a problem hiding this comment.
Looks like this is some kind of performance improvement?
There was a problem hiding this comment.
this is some cleanup to StringDimensionMergerV9.java, in this part, only dimValuesList is needed, but more info is required(the adapters) than needed.
There was a problem hiding this comment.
So this is unrelated to this PR, right? Can we separate this to another PR?
KurtYoung
left a comment
There was a problem hiding this comment.
Looks like this PR contains 3 different changes:
- make SegmentizerFactory works in some cases
- introduce sinkFactory and customIndexMeger
- some performance improvement
I think these 3 things are completely independent, can you split this PR into three?
|
The purpose of this PR is to provide the ability to build and load customized segment. SinkFactory and customizeIndexMerger are used to build customized segment, and the indexIo.loadSegment() is to load the customized segment So I think the 3 modifications are better in one PR. @KurtYoung |
|
Make sense except the third point. Overall it looks good to me. @leventov Do you have any design concerns? |
7053cf0 to
4f82eb8
Compare
4f82eb8 to
e3256fc
Compare
| @Deprecated | ||
| private final long handoffConditionTimeout; | ||
| private final boolean resetOffsetAutomatically; | ||
| private final SinkFactory sinkFactory; |
There was a problem hiding this comment.
Please extract an umbrella abstraction like "SegmentStrategy" with createSinkFactory() and createIndexMerger() methods, and inject/serialize/deserialize only it, here and in other places.
| } | ||
| return indexSpec != null ? indexSpec.equals(that.indexSpec) : that.indexSpec == null; | ||
|
|
||
| if (indexSpec != null ? !indexSpec.equals(that.indexSpec) : that.indexSpec != null) { |
There was a problem hiding this comment.
Refactor to use Objects.equals()
There was a problem hiding this comment.
it's generated by Intellij Idea, but sure, will modify it.
There was a problem hiding this comment.
You can generate the forms that I suggested with intelliJ as well, need to choose another option in generation dialog. Once you choose it, it will be the default later.
| result = 31 * result + (reportParseExceptions ? 1 : 0); | ||
| result = 31 * result + (int) (handoffConditionTimeout ^ (handoffConditionTimeout >>> 32)); | ||
| result = 31 * result + (resetOffsetAutomatically ? 1 : 0); | ||
| result = 31 * result + (sinkFactory != null ? sinkFactory.hashCode() : 0); |
There was a problem hiding this comment.
Refactor to use Objects.hash()
| // Set of spilled segments. Will be merged at the end. | ||
| final Set<File> spilled = Sets.newHashSet(); | ||
|
|
||
| // IndexMerger implementation. |
| final Set<File> spilled = Sets.newHashSet(); | ||
|
|
||
| // IndexMerger implementation. | ||
| final IndexMerger theIndexMerger = config.getCustomIndexMerger() != null |
There was a problem hiding this comment.
Regarding null handling - doesn't Jackson handle it up front, when you specify defaultImpl=... on the IndexMerger interface (or SegmentStrategy for that matter, if you apply suggestion above)?
There was a problem hiding this comment.
usually the default implement is provided in the class context, like indexMergerV9 here. Also the TaskToolbox has the default implement, like toolbox.getIndexMergerV9().
So I think it's better to regard the old code here to get the default implement.
| } | ||
|
|
||
| static class V9IndexLoader implements IndexLoader | ||
| public static class V9IndexLoader implements IndexLoader |
There was a problem hiding this comment.
see comments below
| { | ||
| return getSegmentizerFactory(inDir).loadIndex(inDir); | ||
| } | ||
|
|
There was a problem hiding this comment.
Readers and users will never know the difference between loadIndex() and loadIndexDirectly(). Maybe move loadIndexDirectly() to MMappedQueryableSegmentizerFactory. Along with some other methods.
There was a problem hiding this comment.
ok, but many related code should be moved to.
| @JsonSubTypes(value = { | ||
| @JsonSubTypes.Type(name = "v9", value = IndexMergerV9.class) | ||
| }) | ||
| @ImplementedBy(IndexMergerV9.class) |
There was a problem hiding this comment.
@ImplementedBy is likely not needed since Json defaultImpl is added.
There was a problem hiding this comment.
@ImplementedBy is guice thing, I'm not sure it can work properly after removing it if it is not a json case
| setupEncodedValueWriter(); | ||
| } | ||
|
|
||
| protected List<Indexed<String>> toDimValuesList(List<IndexableAdapter> adapters) throws IOException |
There was a problem hiding this comment.
As far as I can understand you changed this class in order to be able to extend it in your code. See comment to SmooshedFileMapper, don't do this. Copy class entirely into your codebase.
| @@ -174,7 +174,7 @@ public static final AggregationTestHelper createSelectQueryAggregationTestHelper | |||
| new InjectableValues.Std().addValue( | |||
There was a problem hiding this comment.
Formatting
new InjectableValues.Std()
.addValue(SelectQueryConfig.class, new SelectQueryConfig(true))
.addValue(IndexIO.class, TestHelper.getTestIndexIO())|
Just a drive by comment on the public api stuff. I believe none of the segment-type extension points are public apis at this point (see http://druid.io/docs/latest/development/modules.html) and that's because we're not yet entirely sure what custom segment extensions should look like and what support they might need from core druid. IMO, if @kaijianding is ok with this then it's fine for his custom extensions to use internal apis. It's part of the learning process of what should and shouldn't be public for these kinds of extensions. At some point in the future, we might have a better idea of what should be public, and then we can make those apis public. |
|
I don't build custom segment from the very beginning but do some wrap on top of the old codes, like MetaData, Smoosh, V9 indexMerger and V9 loader, the sink and onHeapIncreamentalIndex. I modified the structure of the final smooshed file, but I didn't modify the v9 format itself. Then I need to access the metadata.drd and use v9 loader to load index from the smooshed file from position P1 to Pn. That's the reason I want MetaData and IndexLoader to be public, thus I can use it in my extension to load the custom segment from file. @gianm @leventov |
|
@kaijianding I'm ok with just making (or keeping) classes and constructors public, but I'm not OK with adding properties and fields which are unused in the core codebase and protected methods to classes that have no subclasses in the core codebase. It increases complexity of the core for nothing. And it's not a step towards modular design, the ultimate goal of this PR as well. |
|
Are there a lot of properties and fields in the patch that don't need to be protected or even exist at all? (Sorry, I haven't had a chance to read it yet). I'm hoping we can find a balance between keeping the core clean, and making the extension possible. |
|
Fine, will restore some changes and modify them when I provide an actually custom implement @leventov |
|
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@druid.apache.org list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
It's a follow up of #2965 #3901