insert-segment tool by guobingkun · Pull Request #1861 · apache/druid

guobingkun · 2015-10-26T18:17:50Z

This tool loads segments into Druid by inserting each segment's payload into metadata storage. It can be used to migrate segments to a different deep storage, or to recover segments, as long as the segment files are still present in deep storage.

Usage example:
java -Ddruid.extensions.loadList=[\"mysql-metadata-storage\",\"druid-hdfs-storage\"] -cp $CLASSPATH io.druid.cli.Main tools insert-segment --workingDir hdfs://tmp/druid/localStorage/wikipedia/

Suppose the directory layout under wikipedia looks like this:

├── 2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z
│   └── 2015-10-21T22:07:57.074Z
│       └── 0
│           ├── descriptor.json
│           └── index.zip
├── 2013-09-01T00:00:00.000Z_2013-09-02T00:00:00.000Z
│   └── 2015-10-21T22:07:57.074Z
│       └── 0
│           ├── descriptor.json
│           └── index.zip
├── 2013-09-02T00:00:00.000Z_2013-09-03T00:00:00.000Z
│   └── 2015-10-21T22:07:57.074Z
│       └── 0
│           ├── descriptor.json
│           └── index.zip
└── 2013-09-03T00:00:00.000Z_2013-09-04T00:00:00.000Z
    └── 2015-10-21T22:07:57.074Z
        └── 0
            ├── descriptor.json
            └── index.zip

These 4 segments will then be found and inserted into metadata storage, and each segment's loadSpec will be updated with the location where it was found.
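The discovery step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it walks a local directory with java.nio instead of HDFS, and `SegmentDirScanner` and `findSegmentDirs` are hypothetical names, not the DataSegmentFinder API from druid-api. It treats a directory as a segment only when both descriptor.json and index.zip are present.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Druid: it only illustrates the directory walk.
class SegmentDirScanner {

    // Returns every directory under workingDir that holds both
    // descriptor.json and index.zip, sorted for stable output.
    static List<Path> findSegmentDirs(Path workingDir) throws IOException {
        try (Stream<Path> paths = Files.walk(workingDir)) {
            return paths
                .filter(p -> p.getFileName() != null
                        && p.getFileName().toString().equals("descriptor.json"))
                .map(Path::getParent)
                .filter(dir -> Files.isRegularFile(dir.resolve("index.zip")))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Print the index.zip location that a loadSpec would point at.
        for (Path dir : findSegmentDirs(Paths.get(args[0]))) {
            System.out.println(dir.resolve("index.zip").toAbsolutePath());
        }
    }
}
```

Run against the tree shown above, a scan like this would report four segment directories, one per interval.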

This PR depends on druid-io/druid-api#62
DataSegmentFinder is used so that this tool can work on different deep storages.
An HDFS version is implemented in this PR.

fjy · 2015-10-26T21:02:56Z

Can we add some documentation in the Druid docs about using this tool?

guobingkun · 2015-10-26T21:04:54Z

@fjy yeah, I am working on it.

himanshug · 2015-10-26T21:23:42Z

Do we need to do this if updateDescriptor is set to false?

Never mind, it seems that is needed for the db update later.

Wouldn't indexZip.toString() include hdfs://host:port as well? I would just put the absolute path.

I made a change so that it only puts the relative path (without the storage scheme prefix) in the loadSpec.
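To make the point about the scheme prefix concrete, here is a minimal sketch of dropping `hdfs://host:port` from a location string. It uses java.net.URI as a stand-in; the PR itself presumably works with Hadoop path objects, and `LoadSpecPaths`, `withoutScheme`, and the `namenode:8020` address are hypothetical names chosen for illustration.

```java
import java.net.URI;

// Hypothetical helper, not taken from the PR.
class LoadSpecPaths {

    // Drops the scheme and authority (for example "hdfs://namenode:8020")
    // and keeps only the path component of the location.
    static String withoutScheme(String location) {
        return URI.create(location).getPath();
    }

    public static void main(String[] args) {
        String full = "hdfs://namenode:8020/tmp/druid/localStorage/wikipedia/0/index.zip";
        System.out.println(withoutScheme(full));
    }
}
```

Storing only the path means the same loadSpec stays valid when the files are copied to a different HDFS cluster at the same path.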

drcrallen · 2015-10-26T21:30:58Z

This tool does not interact with interval locking, right? That means you may end up with nasty race conditions if it is used on an active cluster.

himanshug · 2015-10-26T21:33:45Z

Why compare Strings rather than the DataSegment objects themselves, which would be more reliable?

I was comparing DataSegment objects, but then found that the implementation of equals() in DataSegment only compares the identifier, so it returns true as long as two segments have the same identifier.
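The pitfall described in this reply can be reproduced with a small stand-in class. `SegmentStub` below is hypothetical and is not Druid's DataSegment; it only assumes, as the reply states, that equals() looks at the identifier alone, and it uses a simple string form in place of the real serialized descriptor.

```java
// Hypothetical stand-in for a segment class; not Druid's DataSegment.
final class SegmentStub {
    final String identifier;
    final String loadSpecPath;

    SegmentStub(String identifier, String loadSpecPath) {
        this.identifier = identifier;
        this.loadSpecPath = loadSpecPath;
    }

    // Mirrors the behavior described in the thread: only the identifier counts.
    @Override
    public boolean equals(Object o) {
        return o instanceof SegmentStub
                && identifier.equals(((SegmentStub) o).identifier);
    }

    @Override
    public int hashCode() {
        return identifier.hashCode();
    }

    // A string form that includes every field, so a changed loadSpec is visible.
    String serialized() {
        return identifier + "|" + loadSpecPath;
    }

    public static void main(String[] args) {
        SegmentStub before = new SegmentStub("wikipedia_2013-08-31", "/old/index.zip");
        SegmentStub after = new SegmentStub("wikipedia_2013-08-31", "/new/index.zip");
        System.out.println(before.equals(after));
        System.out.println(before.serialized().equals(after.serialized()));
    }
}
```

With identifier-only equality, the two objects compare as equal even though the loadSpec moved, which is why comparing the string forms is the safer check here.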

himanshug · 2015-10-26T21:57:58Z

@drcrallen I think this tool is intended for situations where someone wants to manually migrate data from one place to another or rebuild the segments table in the metadata store. In this use case, we would expect the user to put the Druid cluster in a safe state, either by making sure no active tasks can interfere or by simply bringing down the cluster.

Adding locking to this code would increase complexity and would also require an overlord to be running for the tool to work.

himanshug · 2015-10-26T21:59:36Z

Should the description say something about the possible update of descriptor.json on the filesystem too?
Can we change the name to insert-segments-to-db?

Done changing the name to insert-segments-to-db

drcrallen · 2015-10-26T22:12:14Z

@himanshug OK, we just need to make sure to clarify that in the docs.

nishantmonu51 · 2015-10-29T09:33:08Z

Can we rename mysql to metadata storage credentials?

Done renaming.

guobingkun · 2015-11-03T17:35:54Z

@fjy @drcrallen @nishantmonu51 Added docs and emphasized the correct use case in them.
I also made some changes so that it only puts the relative HDFS path in loadSpec. This way there is no need to update the database if segments are migrated from one HDFS to another (assuming the relative path doesn't change).

Tested this tool with 5436 segments; it completed in 5 minutes.

guobingkun · 2015-11-18T23:02:20Z

Will update this version once the new druid-api is released.

himanshug · 2015-12-17T04:02:51Z

This doc is not linked anywhere, so how are users expected to find it? Can you add it to the "operations" section in the toc file?

Added it to the "operations" section.

himanshug · 2015-12-17T04:03:26Z

👍 after #1861 (comment) is resolved.

fjy · 2015-12-19T01:01:52Z

I don't like this TOC heading or this tool getting its own section.

I would prefer that we rework the libraries section for Druid and have that page link to the doc here.

For now, can we add a link to this tool in the libraries section?

Done adding a link to this tool in the libraries section.

guobingkun force-pushed the insert_segment_tool branch from 779e4b4 to 293e6e0 Compare October 26, 2015 18:35

himanshug reviewed Oct 26, 2015

guobingkun force-pushed the insert_segment_tool branch from 293e6e0 to 4fd8b34 Compare October 28, 2015 15:57

nishantmonu51 reviewed Oct 29, 2015

guobingkun force-pushed the insert_segment_tool branch from 31ce9c4 to 4f3bc3f Compare November 3, 2015 17:25

guobingkun mentioned this pull request Nov 3, 2015

Add DataSegmentFinder for finding segments under a directory druid-io/druid-api#62

Merged

guobingkun reviewed Nov 18, 2015

Comment thread pom.xml Outdated


drcrallen added this to the 0.9.0 milestone Dec 1, 2015

guobingkun closed this Dec 14, 2015

guobingkun reopened this Dec 14, 2015

guobingkun force-pushed the insert_segment_tool branch 2 times, most recently from 1421fc3 to 1621866 Compare December 15, 2015 21:50

guobingkun closed this Dec 16, 2015

guobingkun reopened this Dec 16, 2015

guobingkun force-pushed the insert_segment_tool branch from 1621866 to caa2d80 Compare December 16, 2015 22:31

himanshug reviewed Dec 17, 2015

guobingkun force-pushed the insert_segment_tool branch from caa2d80 to 6f4a0e5 Compare December 18, 2015 17:45

fjy reviewed Dec 19, 2015

5 participants