Conversation
Force-pushed from 779e4b4 to 293e6e0.
Can we add some documentation in the Druid docs about using this tool?

@fjy yeah, I am working on it.
do we need to do this if updateDescriptor is set to false?
Never mind, it seems that is needed for the DB update later.
indexZip.toString() would include hdfs://host:port as well, wouldn't it? I would just put the absolute path.
I made a change so that it only puts the relative path (without the storage scheme prefix) in the loadSpec.
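For illustration, this is roughly what such a loadSpec might look like after the change; the key names and path below are hypothetical, not taken from this PR:

```json
{
  "loadSpec": {
    "type": "hdfs",
    "path": "druid/localStorage/wikipedia/2015-01-01T00:00:00.000Z_2015-01-02T00:00:00.000Z/0/index.zip"
  }
}
```

The point is that no hdfs://host:port prefix is baked into the stored path, so the same metadata entry stays valid if the cluster's NameNode address changes.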
This tool does not interact with interval locking, right? That means you may end up with nasty race conditions if it is used on an active cluster.
Why compare Strings and not the DataSegment object itself, which would be more reliable?
I was comparing DataSegment, but then found that the implementation of equals() in DataSegment only compares the identifier, so it returns true as long as two segments have the same identifier.
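A minimal sketch of the pitfall being discussed. The Segment class below is a simplified stand-in for Druid's DataSegment, not the real class; it only mimics the identifier-based equals() behavior described above:

```java
import java.util.Objects;

public class SegmentEqualsDemo {
    // Simplified stand-in for Druid's DataSegment. Its equals() compares only
    // the identifier, so two segments with the same id but different loadSpecs
    // still compare "equal" — which is why string comparison was used instead.
    static final class Segment {
        final String identifier;
        final String loadSpec;

        Segment(String identifier, String loadSpec) {
            this.identifier = identifier;
            this.loadSpec = loadSpec;
        }

        @Override
        public boolean equals(Object o) {
            // Mirrors the identifier-only equality described above.
            return o instanceof Segment && ((Segment) o).identifier.equals(identifier);
        }

        @Override
        public int hashCode() {
            return Objects.hash(identifier);
        }

        // Serialized form includes the loadSpec, so differences show up here.
        String serialize() {
            return identifier + "|" + loadSpec;
        }
    }

    public static void main(String[] args) {
        Segment a = new Segment("wikipedia_2015-01-01", "hdfs:/old/path/index.zip");
        Segment b = new Segment("wikipedia_2015-01-01", "hdfs:/new/path/index.zip");
        System.out.println(a.equals(b));                         // true: same identifier
        System.out.println(a.serialize().equals(b.serialize())); // false: loadSpec differs
    }
}
```

Comparing the serialized strings catches a changed loadSpec that identifier-only equality would silently ignore.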
@drcrallen I think this tool is intended to be used in a situation where someone wants to manually migrate data from one place to another, or to rebuild the metadata store segments table. In this use case, we would expect the user to have the Druid cluster in a safe mode (with no active tasks to interfere, or with the cluster brought down entirely). Adding locking to this code would increase complexity and would also require an Overlord to exist for this to work.
The description should also say something about the possible update of descriptor.json on the filesystem.
Can we make the name insert-segments-to-db?
Done changing the name to insert-segments-to-db
@himanshug Ok, just need to make sure to clarify that in the docs.
Force-pushed from 293e6e0 to 4fd8b34.
Can we rename mysql to metadata storage credentials?
Force-pushed from 31ce9c4 to 4f3bc3f.
@fjy @drcrallen @nishantmonu51 Added the doc and emphasized the correct use case in it. Tested this tool with 5436 segments; it completed in 5 minutes.
Will update this version once the new druid-api is released.
Force-pushed from 1421fc3 to 1621866.
Force-pushed from 1621866 to caa2d80.
This doc is not linked anywhere; how are users expected to find it? Can you add it to the "operations" section in the TOC file?
Added it to the "operations" section.
👍 after #1861 (comment) is resolved.
Force-pushed from caa2d80 to 6f4a0e5.
I don't like this TOC heading, or this getting its own section. I would prefer we rework the libraries section for Druid and have that page link to the doc here.
For now, can we add a link to this tool in the libraries section?
Done adding a link to this tool in the libraries section.
This tool loads segments into Druid by inserting each segment's payload into the metadata storage. It can be used to migrate segments to a different deep storage, or even to recover segments, as long as the segments themselves still exist in deep storage.
Usage example:
java -Ddruid.extensions.loadList=[\"mysql-metadata-storage\",\"druid-hdfs-storage\"] -cp $CLASSPATH io.druid.cli.Main tools insert-segment --workingDir hdfs://tmp/druid/localStorage/wikipedia/

Suppose the layout under wikipedia looks like this:

Then these 4 segments will be found and inserted into the metadata storage, and each segment's loadSpec will be updated with the location from where it was found.
This PR depends on druid-io/druid-api#62
DataSegmentFinder is used so that this tool can work on different deep storages.
An HDFS version is implemented in this PR.
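To sketch how such a pluggable finder abstraction works, here is a hypothetical, heavily simplified version. The interface and the toy in-memory implementation below are illustrative only and do not reproduce the actual druid-api DataSegmentFinder signatures:

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class FinderSketch {
    // Hypothetical, simplified finder abstraction: each deep storage
    // (HDFS, local disk, ...) provides its own way of scanning a working
    // directory for segment descriptors.
    interface SegmentFinder {
        Set<String> findSegments(String workingDir, boolean updateDescriptor);
    }

    // Toy in-memory "deep storage" standing in for HDFS: a map from
    // descriptor path to segment identifier.
    static final class MapFinder implements SegmentFinder {
        private final Map<String, String> storage;

        MapFinder(Map<String, String> storage) {
            this.storage = storage;
        }

        @Override
        public Set<String> findSegments(String workingDir, boolean updateDescriptor) {
            // Collect every segment whose descriptor lives under workingDir.
            // A real implementation would also rewrite descriptor.json when
            // updateDescriptor is true; this sketch ignores that flag.
            Set<String> found = new LinkedHashSet<>();
            for (Map.Entry<String, String> e : storage.entrySet()) {
                if (e.getKey().startsWith(workingDir)) {
                    found.add(e.getValue());
                }
            }
            return found;
        }
    }

    public static void main(String[] args) {
        SegmentFinder finder = new MapFinder(Map.of(
            "/druid/wikipedia/2015-01-01/descriptor.json", "wikipedia_2015-01-01",
            "/druid/wikipedia/2015-01-02/descriptor.json", "wikipedia_2015-01-02",
            "/druid/other/2015-01-01/descriptor.json", "other_2015-01-01"
        ));
        // Only the two wikipedia segments are under the working directory.
        System.out.println(finder.findSegments("/druid/wikipedia", false).size());
    }
}
```

The design point is the same as in the PR: the tool itself stays storage-agnostic, and each deep storage module supplies its own finder implementation.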