Deprecate NoneShardSpec and drop support for automatic segment merge#6883
Deprecate NoneShardSpec and drop support for automatic segment merge#6883jihoonson merged 25 commits intoapache:masterfrom
Conversation
| this.dimensions = prepareDimensionsOrMetrics(dimensions, DIMENSIONS_INTERNER); | ||
| this.metrics = prepareDimensionsOrMetrics(metrics, METRICS_INTERNER); | ||
| this.shardSpec = (shardSpec == null) ? NoneShardSpec.instance() : shardSpec; | ||
| this.shardSpec = (shardSpec == null) ? new NumberedShardSpec(0, 1) : shardSpec; |
There was a problem hiding this comment.
Any reason not to have a static shared instance for NumberedShardSpec(0, 1) the same way we did for NoneShardSpec?
There was a problem hiding this comment.
Well, we can. But I think it would be not common to create a dataSegment with null shardSpec. I guess this code is just for backward compatibility to support a really old version of dataSegment. In recent versions, every dataSegment must have a proper shardSpec.
There was a problem hiding this comment.
I guess I was more thinking of the impact through DataSegment.Builder which with this change will now create a new object in the constructor to set the default before it is likely later replaced when the actual shard spec is set before completing the build, instead of the previous behavior of re-using the same object.
There was a problem hiding this comment.
Oh, DataSegment.Builder is mostly for benchmark or unit tests. It's not supposed to be used in production codes.
I don't think the builder pattern is appropriate for DataSegment because it requires for every fields to be filled except loadSpec in its constructor. But, it's quite prevalent in unit tests, so I left it.
There was a problem hiding this comment.
Ah, just saw a bunch of usages, didn't look too closely at what they were 👍
| this.dimensions = ImmutableList.of(); | ||
| this.metrics = ImmutableList.of(); | ||
| this.shardSpec = NoneShardSpec.instance(); | ||
| this.shardSpec = new NumberedShardSpec(0, 1); |
There was a problem hiding this comment.
continuing other comment chain, alternatively maybe this line isn't necessary if everything should be setting it
clintropolis
left a comment
There was a problem hiding this comment.
Code changes look good to me I think, but maybe we should mark the MergeTask as deprecated since NoneShardSpec will no longer be produced after this patch, making it unusable for all new segments.
|
@clintropolis sounds good. I deprecated mergeTask and |
|
I don't understand why teamcity complains about |
|
@clintropolis thank you for the review. I added the |
|
I've removed mergeTask because it wouldn't work anymore. The coordinator throws an exception if |
gianm
left a comment
There was a problem hiding this comment.
I like this change, because I think it is simplifying in two ways:
- It promotes usage of 'extendable' specs.
- It removes the 'auto merge' functionality in the Coordinator (which does horizontal merging, across time chunks, of NoneShardSpecced data) in favor of the newer 'auto compaction' functionality (which can do both horizontal and vertical merging, and can work on time chunks with multiple segments).
I think both of these things will make data management easier and simpler. Please retitle the PR to reflect the second point, though. Right now it only mentions the first one.
| new NamedType(HashBasedNumberedShardSpec.class, "hashed") | ||
| ) | ||
| ); | ||
| return Collections.emptyList(); |
There was a problem hiding this comment.
If you don't have Jackson modules, you could just extend com.google.inject.Module instead. (A DruidModule is a Guice Module that also adds Jackson modules.)
NoneShardSpecrepresents a single partition in a timeChunk. It's basically same with other shardSpecs with total number of partitions = 1 without details of the partition scheme. I think we're using it because mergeTask is able to merge segments only if they haveNoneShardSpec(#3241). However, now, we have a compactionTask which can merge any segments.This PR is to deprecate
NoneShardSpec. Also, I moved allShardSpecimplementations todruid-core. Other changes are:NoneShardSpec. Realtime tasks useNumberedShardSpec. Batch tasks can useNumberedShardSpec,HashBasedNumberedShardSpecorSingleDimensionShardSpec.forceExtendableShardSpecsanymore. The way it worked is, whenforceGuaranteedRollup= true andforceExtendableShardSpecs= false,NoneShardSpecwas used for a single partition whileHashBasedNumberedShardSpecwas used for others. This is inconsistent becauseHashBasedNumberedShardSpecis extendable butNoneShardSpecis not. IfforceExtendableShardSpecsis set,NumberedShardSpecis used when publishing segments. Now, IndexTask only uses eitherNumberedShardSpecorHashBasedNumberedShardSpec. Both of them are extendable, so there's no point forforceExtendableShardSpecs.forceExtendableShardSpecs. It can be set to useNumberedShardSpecfor range partitioned segments.This is an incompatible change since mergeTask cannot be used for new segments.