Add tool for migrating from local deep storage/Derby metadata#7598
jon-wei merged 8 commits into apache:master from
Conversation
> This helps users migrate segments stored in local deep storage to HDFS.
> `--hadoopStorageDirectory`, `-h`: The HDFS path that will hold the migrated segments
Thanks @jon-wei. Please consider my comments below. Also, I tested the command in the doc and it emitted the error below. Would you check it, please?
```
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
May 06, 2019 11:55:17 AM org.hibernate.validator.internal.util.Version <clinit>
INFO: HV000001: Hibernate Validator 5.1.3.Final
Exception in thread "main" java.lang.RuntimeException: com.google.inject.CreationException: Unable to create injector, see the following errors:

1) A binding to com.google.common.base.Supplier<org.apache.druid.server.audit.SQLAuditManagerConfig> was already configured at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:151) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.mysql.MySQLMetadataStorageModule).
  at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:151) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.postgresql.PostgreSQLMetadataStorageModule)

2) A binding to org.apache.druid.server.audit.SQLAuditManagerConfig was already configured at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:152) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.mysql.MySQLMetadataStorageModule).
  at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:152) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.postgresql.PostgreSQLMetadataStorageModule)

2 errors
  at org.apache.druid.cli.GuiceRunnable.makeInjector(GuiceRunnable.java:71)
  at org.apache.druid.cli.ExportMetadata.run(ExportMetadata.java:159)
  at org.apache.druid.cli.Main.main(Main.java:118)
Caused by: com.google.inject.CreationException: Unable to create injector, see the following errors:

1) A binding to com.google.common.base.Supplier<org.apache.druid.server.audit.SQLAuditManagerConfig> was already configured at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:151) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.mysql.MySQLMetadataStorageModule).
  at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:151) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.postgresql.PostgreSQLMetadataStorageModule)

2) A binding to org.apache.druid.server.audit.SQLAuditManagerConfig was already configured at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:152) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.mysql.MySQLMetadataStorageModule).
  at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:152) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.metadata.storage.postgresql.PostgreSQLMetadataStorageModule)

2 errors
  at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:470)
  at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:155)
  at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:107)
  at com.google.inject.Guice.createInjector(Guice.java:99)
  at com.google.inject.Guice.createInjector(Guice.java:73)
  at com.google.inject.Guice.createInjector(Guice.java:62)
  at org.apache.druid.initialization.Initialization.makeInjectorWithModules(Initialization.java:419)
  at org.apache.druid.cli.GuiceRunnable.makeInjector(GuiceRunnable.java:68)
  ... 2 more
```
> To use the tool, you can run the following command:
>
> ```bash
> java -classpath "lib/*:conf/druid/single-server/micro-quickstart/_common" org.apache.druid.cli.Main tools export-metadata -o /tmp/csv
> ```
Maybe adding the below would make it clear where the current directory is:

```bash
$ cd ${DRUID_ROOT}
```

Added `cd ${DRUID_ROOT}`
> Example import commands for MySQL and PostgreSQL are shown below.
> These example import commands expect `/tmp/csv` and its contents to be accessible from the server. For other options, such as importing from the client filesystem, please refer to the MySQL or PostgreSQL documentation.
The usage here is fine
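The import commands themselves are not quoted in this hunk; as a hedged sketch (the table and file names are assumptions following the `/tmp/csv` layout discussed above, not the PR's exact text), server-side imports take roughly this shape:

```bash
# Hypothetical import commands, one per database. Both COPY and
# LOAD DATA INFILE read the file on the *server* filesystem, which is
# why /tmp/csv must be accessible from the server.
PG_IMPORT="COPY druid_segments FROM '/tmp/csv/druid_segments.csv' DELIMITER ',' CSV;"
MYSQL_IMPORT="LOAD DATA INFILE '/tmp/csv/druid_segments.csv' INTO TABLE druid_segments FIELDS TERMINATED BY ',' ENCLOSED BY '\"';"
echo "$PG_IMPORT"
echo "$MYSQL_IMPORT"
```

These would be run via `psql` or the `mysql` client against the migration-target database; the other exported tables would be imported the same way.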
> - rules
> - config
> - datasource
> - supervisors
It looks like migrating only these tables is enough for now, but if we extend this tool to support other types of deep storage in the future, it would probably be worth including audit, tasks, and tasklogs.
Agree, it would be useful to support those in the future.
> @@ -0,0 +1,169 @@
> ---
> layout: doc_page
> title: "Migrating Derby Metadata and Local Deep Storage"
You can have a non-HA cluster with non-local deep storage and a Derby metadata store. I think the metadata and deep storage migration should have separate docs.
Some food for thought:
- when a user goes from single server to cluster, they should first read the deep storage migration doc
- when a user goes from a non-HA cluster to an HA cluster, they should read the metadata store migration doc
Hm, there may be some redundancy in content, but I'll look into splitting them.
Redundancy is fine as long as it is clear to the user what they should do.
I made a separate page for the export-metadata tool and made separate pages for metadata/deep storage migration which reference the tool doc page
I fixed this, I had simplified the example command too much and left out
> ```bash
> cd ${DRUID_ROOT}
> java -classpath "lib/*" -Dlog4j.configurationFile=conf/druid/cluster/_common/log4j2.xml -Ddruid.extensions.directory="extensions" -Ddruid.extensions.loadList=[] org.apache.druid.cli.Main tools export-metadata --connectURI "jdbc:derby://localhost:1527/var/druid/metadata.db;" -o /tmp/csv
> ```
Would you please add `mkdir -p /tmp/csv` too? It looks like the output directory must exist before running this command.
Added that and a note about making sure the directory exists
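The requested step is a one-liner, using the output path from this PR's examples:

```bash
# Create the export output directory up front; per the review comment
# above, export-metadata does not create it for you.
mkdir -p /tmp/csv
```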
This PR adds a new tool under `services` meant to assist with the following use case: users sometimes begin evaluating Druid with a simple deployment that uses local deep storage and Derby. After using the evaluation deployment for some time and ingesting some segments, they wish to move to MySQL or PostgreSQL and/or use a different deep storage, while keeping their old segments and ingestion setup.
This tool exports the contents of the following Druid tables:
- segments
- rules
- config
- datasource
- supervisors
These tables were chosen because they contain neither transient entities (like task locks) nor historical entities (like task logs), as the tool is intended for migrations where the user shuts the entire cluster down.
The tool also allows users to specify a new S3 bucket/key, HDFS path, or new local filesystem path, and the entries from the segments table will be rewritten with new load specs. This is to assist with deep storage migration.
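The load-spec rewrite described above can be illustrated with a minimal sketch. The JSON shapes follow Druid's `local` and `hdfs` load specs, but the specific paths and the sed-based rewrite are hypothetical, for illustration only (the tool itself rewrites the segments table entries):

```bash
# Illustrative only: the kind of loadSpec rewrite performed when migrating
# local deep storage to HDFS. Paths and the HDFS URI are made-up examples.
local_spec='{"type":"local","path":"/var/druid/segments/wiki/index.zip"}'
hdfs_spec=$(echo "$local_spec" \
  | sed -e 's/"type":"local"/"type":"hdfs"/' \
        -e 's#/var/druid/segments#hdfs://namenode:9000/druid/segments#')
echo "$hdfs_spec"
```

The same idea applies to the S3 and new-local-path cases: the segment payloads keep their identity, and only the load spec pointing at deep storage changes.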
Currently, only migration from local deep storage combined with Derby metadata is supported (the use case described above); the tool could later be expanded to handle other use cases.
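A hypothetical post-export sanity check, assuming Druid's default `druid_` table prefix and the `/tmp/csv` output directory used in this PR's examples (the file names are assumptions, not confirmed by the diff):

```bash
# Confirm each expected CSV exists and is non-empty before importing.
for t in segments rules config datasource supervisors; do
  f="/tmp/csv/druid_${t}.csv"
  if [ -s "$f" ]; then echo "OK: $f"; else echo "MISSING OR EMPTY: $f"; fi
done
```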