Skip to content

Single dimension hash-based partitioning#2570

Merged
himanshug merged 1 commit intoapache:masterfrom
binlijin:single_dimension_partitioning
Mar 22, 2016
Merged

Single dimension hash-based partitioning#2570
himanshug merged 1 commit intoapache:masterfrom
binlijin:single_dimension_partitioning

Conversation

@binlijin
Copy link
Copy Markdown
Contributor

@binlijin binlijin commented Mar 1, 2016

Dimension hash-based partitioning.
Partitioning rows across those segments according to the hash of the partition dimension in each row. So all rows with a particular value for that dimension will go into the same segment.

See the requirement https://groups.google.com/forum/#!topic/druid-user/yfAzwStIZGo
and https://groups.google.com/forum/#!topic/druid-development/GXRpXfBzfJs

@b-slim
Copy link
Copy Markdown
Contributor

b-slim commented Mar 1, 2016

@binlijin can you please

  1. add documentation
  2. add UTs at least ser deser
  3. explain how this sort of solve the two issues you have mentioned above ?

Thanks !

@binlijin binlijin added this to the 0.9.2 milestone Mar 2, 2016
@binlijin
Copy link
Copy Markdown
Contributor Author

binlijin commented Mar 2, 2016

@b-slim , ok, i will add these.

@fjy fjy closed this Mar 2, 2016
@fjy fjy reopened this Mar 2, 2016
@binlijin binlijin changed the title Single dimension partitioning Single dimension hash-based partitioning Mar 2, 2016
@binlijin binlijin closed this Mar 7, 2016
@binlijin binlijin reopened this Mar 7, 2016
@xvrl
Copy link
Copy Markdown
Member

xvrl commented Mar 7, 2016

@binlijin I'm wondering if instead of adding a new type of partitioning we could extend the existing "hashed" type to support an arbitrary number of dimensions. I think it would also help reduce the confusion around the different partitioning schemes.

@binlijin
Copy link
Copy Markdown
Contributor Author

binlijin commented Mar 8, 2016

@xvrl , ok , i will change it lately.

@binlijin binlijin modified the milestones: 0.9.1, 0.9.2 Mar 8, 2016
@binlijin
Copy link
Copy Markdown
Contributor Author

binlijin commented Mar 8, 2016

@b-slim @xvrl @fjy what about now?
We need this to rebuild our product data, and it it important for us, because the ShardSpec information store with segment meta in mysql.

@binlijin binlijin closed this Mar 8, 2016
@binlijin binlijin reopened this Mar 8, 2016
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this is ignored isn't part of the state ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, have change this to JsonProperty

@binlijin binlijin closed this Mar 11, 2016
@binlijin binlijin reopened this Mar 11, 2016
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add @Nullable as method signature

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit: would it make sense and make code more clean to make partitonDimensions an empty list when user does not specify any dimensions ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nishantmonu51 ok, will remove @nullable also, because there is no necessary.

@b-slim
Copy link
Copy Markdown
Contributor

b-slim commented Mar 11, 2016

LGTM after taking care of minor comments.

@fjy
Copy link
Copy Markdown
Contributor

fjy commented Mar 14, 2016

👍 after my comment on docs is answered

@binlijin binlijin closed this Mar 16, 2016
@binlijin binlijin reopened this Mar 16, 2016
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you feel about making this passed as a json property in HashedPartitionsSpec and add it to the docs, so that people can use it ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@binlijin binlijin closed this Mar 17, 2016
@binlijin binlijin reopened this Mar 17, 2016
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, now, what is the difference between following configurations?

{
  "type": "hashed",
  "numShards": n,
  "partitionDimensions": <whatever>
}

and

{
  "type": "dimension",
  "numShards": n,
  "partitionDimensions": <whatever>
}

I think they will produce exactly same results. In other words, I don't think "dimension" partition spec needs to support "numShards" as it is only used in the case when you want to partition data by dimension value range. Fixed shards case is already supported by "hashed" partition spec.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they will produce exactly same results, i will remove "dimension" partition spec's "numShards".

@binlijin binlijin closed this Mar 22, 2016
@binlijin binlijin reopened this Mar 22, 2016
himanshug added a commit that referenced this pull request Mar 22, 2016
@himanshug himanshug merged commit 3220b10 into apache:master Mar 22, 2016
@JsonProperty
public List<String> getPartitionDimensions()
{
return ImmutableList.of();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why doesn't this return "partitionDimension"?

@drcrallen
Copy link
Copy Markdown
Contributor

With this change should single partition hash spec be deprecated?

@binlijin
Copy link
Copy Markdown
Contributor Author

@drcrallen, if you want to partition data by one dimension or dimensions with "numShards", use

{
  "type": "hashed",
  "numShards": n,
  "partitionDimensions": <whatever>
}

If you want to partition data by one dimension with "targetPartitionSize", use

  "partitionsSpec": {
     "type": "dimension",
     "targetPartitionSize": 5000000
   }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants