Single dimension hash-based partitioning #2570
himanshug merged 1 commit into apache:master from binlijin:single_dimension_partitioning
@binlijin can you please
Thanks!
@b-slim, OK, I will add these.
@binlijin I'm wondering if, instead of adding a new type of partitioning, we could extend the existing "hashed" type to support an arbitrary number of dimensions. I think it would also help reduce the confusion around the different partitioning schemes.
@xvrl, OK, I will change it later.
Why is this ignored? Isn't it part of the state?
Done, I have changed this to JsonProperty.
Can you add @Nullable to the method signature?
Minor nit: would it make sense, and make the code cleaner, to make partitionDimensions an empty list when the user does not specify any dimensions?
@nishantmonu51 OK, I will also remove @Nullable, because it is not necessary.
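The suggestion above (treat an unspecified dimension list as an empty list, so the getter never returns null) could be sketched roughly as follows. This is only an illustration: the class name `PartitionDimensionsHolder` is hypothetical, and the Jackson annotations used in the actual spec class are omitted to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a partitions spec; not the class from this PR.
final class PartitionDimensionsHolder {
    private final List<String> partitionDimensions;

    PartitionDimensionsHolder(List<String> partitionDimensions) {
        // Normalize "not specified" (null) to an empty list, and take a
        // defensive read-only copy otherwise, so callers never see null.
        this.partitionDimensions = partitionDimensions == null
            ? Collections.<String>emptyList()
            : Collections.unmodifiableList(new ArrayList<>(partitionDimensions));
    }

    // No @Nullable needed: the result is always a (possibly empty) list.
    List<String> getPartitionDimensions() {
        return partitionDimensions;
    }
}
```

With this shape, callers can check `getPartitionDimensions().isEmpty()` instead of guarding against null.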
LGTM after taking care of minor comments.
👍 after my comment on docs is answered.
How would you feel about passing this as a JSON property in HashedPartitionsSpec and adding it to the docs, so that people can use it?
So now, what is the difference between the following configurations?
{
"type": "hashed",
"numShards": n,
"partitionDimensions": <whatever>
}
and
{
"type": "dimension",
"numShards": n,
"partitionDimensions": <whatever>
}
I think they will produce exactly the same results. In other words, I don't think the "dimension" partition spec needs to support "numShards", as that spec is only used when you want to partition data by dimension value range. The fixed-shards case is already supported by the "hashed" partition spec.
Yes, they will produce exactly the same results. I will remove "numShards" from the "dimension" partition spec.
@JsonProperty
public List<String> getPartitionDimensions()
{
  return ImmutableList.of();
Why doesn't this return "partitionDimension"?
With this change, should the single partition hash spec be deprecated?
@drcrallen, if you want to partition data by one dimension or dimensions with "numShards", use If you want to partition data by one dimension with "targetPartitionSize", use |
Dimension hash-based partitioning.
Rows are partitioned across segments according to the hash of the partition dimension in each row, so all rows with a particular value for that dimension go into the same segment.
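As a rough illustration of the idea, the following sketch assigns a row to a shard using only the values of its partition dimensions. It is not Druid's implementation: the class and method names are hypothetical, rows are modeled as plain string maps, and `List.hashCode()` stands in for whatever hash function the real code uses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and the hash function are assumptions.
final class DimensionHashPartitioner {
    // Chooses a shard in [0, numShards) from the partition dimension values only.
    static int shardFor(Map<String, String> row, List<String> partitionDimensions, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        List<String> key = new ArrayList<>();
        for (String dimension : partitionDimensions) {
            key.add(row.get(dimension)); // a missing dimension contributes null
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), numShards);
    }

    public static void main(String[] args) {
        Map<String, String> first = new HashMap<>();
        first.put("country", "US");
        first.put("page", "home");
        Map<String, String> second = new HashMap<>();
        second.put("country", "US");
        second.put("page", "checkout");

        List<String> dims = new ArrayList<>();
        dims.add("country");

        // Both rows share the same "country" value, so both calls return the same shard.
        System.out.println(shardFor(first, dims, 4));
        System.out.println(shardFor(second, dims, 4));
    }
}
```

Because other columns never enter the key, two rows that agree on every partition dimension always receive the same shard number, which is the property described above.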
See the requirement https://groups.google.com/forum/#!topic/druid-user/yfAzwStIZGo
and https://groups.google.com/forum/#!topic/druid-development/GXRpXfBzfJs