Add tool for migrating from local deep storage/Derby metadata #7598
Merged

8 commits:
- 4a390d2 Add tool for migrating from local deep storage/Derby metadata (jon-wei)
- 536bf18 Split deep storage and metadata migration docs (jon-wei)
- a6f5f60 Support import into Derby (jon-wei)
- 337ee20 Fix create tables cmd (jon-wei)
- 41bfcc2 Fix create tables cmd (jon-wei)
- d6106cb Fix commands (jon-wei)
- ab203d4 PR comment (jon-wei)
- 9cd1ded Add -p (jon-wei)
---
layout: doc_page
title: "Deep Storage Migration"
---

<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->

# Deep Storage Migration

If you have been running an evaluation Druid cluster using local deep storage and wish to migrate to a
more production-capable deep storage system such as S3 or HDFS, this document describes the necessary steps.

At a high level, migrating deep storage involves the following steps:

- Copying segments from local deep storage to the new deep storage
- Exporting Druid's segments table from the metadata store
- Rewriting the load specs in the exported segment data to reflect the new deep storage location
- Reimporting the edited segments into the metadata store
## Shut down cluster services

To ensure a clean migration, shut down the non-coordinator services so that metadata state does not
change while you perform the migration.

When migrating from Derby, the coordinator processes will still need to be up initially, as they host the Derby database.

## Copy segments from old deep storage to new deep storage

Before migrating, you will need to copy your old segments to the new deep storage.

For information on what path structure to use in the new deep storage, please see [deep storage migration options](../operations/export-metadata.html#deep-storage-migration).
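As a sketch, copying a local segment tree to S3 could look like the following. The bucket name, base key, and local path here are assumptions for illustration, not values from this PR:

```bash
BUCKET=migration         # assumed target S3 bucket
BASE_KEY=example         # assumed base key
SRC=/druid/segments      # assumed local deep storage directory

# Preview what would be copied; drop --dryrun to actually transfer the files.
aws s3 sync "$SRC" "s3://$BUCKET/$BASE_KEY" --dryrun
```

The directory structure under the source path must be preserved exactly; see the deep storage migration options linked above for the required layout.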
## Export segments with rewritten load specs

Druid provides an [Export Metadata Tool](../operations/export-metadata.html) for exporting metadata from Derby into CSV files
which can then be reimported.

By setting [deep storage migration options](../operations/export-metadata.html#deep-storage-migration), the `export-metadata` tool will export CSV files in which the segment load specs have been rewritten to load from your new deep storage location.

Run the `export-metadata` tool on your existing cluster, using the migration options appropriate for your new deep storage location, and save the CSV files it generates. After a successful export, you can shut down the coordinator.

### Import metadata

After generating the CSV exports with the modified segment data, you can reimport the contents of the Druid segments table from the generated CSVs.

Please refer to [import commands](../operations/export-metadata.html#importing-metadata) for examples. Only the `druid_segments` table needs to be imported.

### Restart cluster

After successfully importing the segment table, you can restart your cluster.
---
layout: doc_page
title: "Export Metadata Tool"
---

# Export Metadata Tool

Druid includes an `export-metadata` tool for assisting with migration of cluster metadata and deep storage.

This tool exports the contents of the following Druid metadata tables:

- segments
- rules
- config
- datasource
- supervisors

Additionally, the tool can rewrite the local deep storage location descriptors in the rows of the segments table
to point to new deep storage locations (S3, HDFS, and local rewrite paths are supported).

The tool has the following limitations:

- Only exporting from Derby metadata is currently supported.
- If rewriting load specs for deep storage migration, only migrating from local deep storage is currently supported.
## `export-metadata` Options

The `export-metadata` tool provides the following options:

### Connection Properties

- `--connectURI`: The URI of the Derby database, e.g. `jdbc:derby://localhost:1527/var/druid/metadata.db;create=true`
- `--user`: Username
- `--password`: Password
- `--base`: Corresponds to the value of `druid.metadata.storage.tables.base` in the configuration, `druid` by default.

### Output Path

- `--output-path`, `-o`: The output directory of the tool. CSV files for the Druid segments, rules, config, datasource, and supervisors tables will be written to this directory.

### Export Format Options

- `--use-hex-blobs`, `-x`: If set, export BLOB payload columns as hexadecimal strings. This needs to be set if importing back into Derby. Default is false.
- `--booleans-as-strings`, `-t`: If set, write boolean values as "true" or "false" instead of "1" and "0". This needs to be set if importing back into Derby. Default is false.
### Deep Storage Migration

#### Migration to S3 Deep Storage

By setting the options below, the tool will rewrite the segment load specs to point to a new S3 deep storage location.

This helps users migrate segments stored in local deep storage to S3.

- `--s3bucket`, `-b`: The S3 bucket that will hold the migrated segments
- `--s3baseKey`, `-k`: The base S3 key where the migrated segments will be stored

When copying the local deep storage segments to S3, the rewrite performed by this tool requires that the directory structure of the segments be unchanged.

For example, suppose the cluster had the following local deep storage configuration:

```
druid.storage.type=local
druid.storage.storageDirectory=/druid/segments
```

If the target S3 bucket was `migration`, with a base key of `example`, the contents of `s3://migration/example/` must be identical to the contents of `/druid/segments` on the old local filesystem.
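Putting these options together with the invocation shown later in this document, an export that rewrites load specs for this S3 layout might look like the following sketch (the connect URI and output directory follow the running example; adjust them for your cluster):

```bash
cd ${DRUID_ROOT}
# The output directory must exist before running the tool.
mkdir -p /tmp/csv
java -classpath "lib/*" \
  -Dlog4j.configurationFile=conf/druid/cluster/_common/log4j2.xml \
  -Ddruid.extensions.directory="extensions" \
  -Ddruid.extensions.loadList=[] \
  org.apache.druid.cli.Main tools export-metadata \
  --connectURI "jdbc:derby://localhost:1527/var/druid/metadata.db;" \
  -o /tmp/csv \
  -b migration -k example
```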
#### Migration to HDFS Deep Storage

By setting the options below, the tool will rewrite the segment load specs to point to a new HDFS deep storage location.

This helps users migrate segments stored in local deep storage to HDFS.

- `--hadoopStorageDirectory`, `-h`: The HDFS path that will hold the migrated segments

When copying the local deep storage segments to HDFS, the rewrite performed by this tool requires that the directory structure of the segments be unchanged, with the exception of directory names containing colons (`:`).

For example, suppose the cluster had the following local deep storage configuration:

```
druid.storage.type=local
druid.storage.storageDirectory=/druid/segments
```

If the target `hadoopStorageDirectory` was `/migration/example`, the contents of `hdfs:///migration/example/` must be identical to the contents of `/druid/segments` on the old local filesystem.

Additionally, the segment paths in local deep storage contain colons (`:`) in their names, e.g.:

`wikipedia/2016-06-27T02:00:00.000Z_2016-06-27T03:00:00.000Z/2019-05-03T21:57:15.950Z/1/index.zip`

HDFS cannot store files whose paths contain colons, and this tool expects the colons to be replaced with underscores (`_`) in HDFS.

In this example, the `wikipedia` segment above under `/druid/segments` in local deep storage would need to be migrated to HDFS under `hdfs:///migration/example/` with the following path:

`wikipedia/2016-06-27T02_00_00.000Z_2016-06-27T03_00_00.000Z/2019-05-03T21_57_15.950Z/1/index.zip`
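One way to sketch this copy-and-rename step in the shell, using the example paths above (availability of the `hdfs dfs` CLI on the machine doing the copy is an assumption about your environment):

```bash
SRC=/druid/segments            # example local deep storage directory
DEST=/migration/example        # example hadoopStorageDirectory

cd "$SRC"
find . -type f | while read -r f; do
  # Build the HDFS target path, replacing colons with underscores,
  # since HDFS cannot store paths containing colons.
  target=$(printf '%s/%s' "$DEST" "${f#./}" | tr ':' '_')
  hdfs dfs -mkdir -p "$(dirname "$target")"
  hdfs dfs -put "$f" "$target"
done
```

Note that this renames colons in every path component, which matches what the rewritten load specs expect.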
#### Migration to New Local Deep Storage Path

By setting the options below, the tool will rewrite the segment load specs to point to a new local deep storage location.

This helps users migrate segments stored in local deep storage to a new path (e.g., a new NFS mount).

- `--newLocalPath`, `-n`: The new path on the local filesystem that will hold the migrated segments

When copying the local deep storage segments to a new path, the rewrite performed by this tool requires that the directory structure of the segments be unchanged.

For example, suppose the cluster had the following local deep storage configuration:

```
druid.storage.type=local
druid.storage.storageDirectory=/druid/segments
```

If the new path was `/migration/example`, the contents of `/migration/example/` must be identical to the contents of `/druid/segments` on the local filesystem.
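A minimal sketch of such a copy, using the example paths above (any archiving copy that preserves the directory structure, such as `rsync -a`, works equally well):

```bash
SRC=/druid/segments        # example old local deep storage directory
DEST=/migration/example    # example new path (e.g. an NFS mount)

mkdir -p "$DEST"
# -a preserves the directory structure, timestamps, and permissions.
cp -a "$SRC/." "$DEST/"
```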
## Running the tool

To use the tool, you can run the following from the root of the Druid package:

```bash
cd ${DRUID_ROOT}
mkdir -p /tmp/csv
java -classpath "lib/*" -Dlog4j.configurationFile=conf/druid/cluster/_common/log4j2.xml -Ddruid.extensions.directory="extensions" -Ddruid.extensions.loadList=[] org.apache.druid.cli.Main tools export-metadata --connectURI "jdbc:derby://localhost:1527/var/druid/metadata.db;" -o /tmp/csv
```

In the example command above:

- `lib` is the Druid lib directory
- `extensions` is the Druid extensions directory
- `/tmp/csv` is the output directory. Please make sure that this directory exists before running the tool.
## Importing Metadata

After running the tool, the output directory will contain `<table-name>_raw.csv` and `<table-name>.csv` files.

The `<table-name>_raw.csv` files are intermediate files used by the tool, containing the table data as exported by Derby without modification.

The `<table-name>.csv` files are the files to use for import into another database such as MySQL or PostgreSQL; they have any configured deep storage location rewrites applied.

Example import commands for Derby, MySQL, and PostgreSQL are shown below.

These example import commands expect `/tmp/csv` and its contents to be accessible from the server. For other options, such as importing from the client filesystem, please refer to the database's documentation.
### Derby

```sql
CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE (null,'DRUID_SEGMENTS','/tmp/csv/druid_segments.csv',',','"',null,0);

CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE (null,'DRUID_RULES','/tmp/csv/druid_rules.csv',',','"',null,0);

CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE (null,'DRUID_CONFIG','/tmp/csv/druid_config.csv',',','"',null,0);

CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE (null,'DRUID_DATASOURCE','/tmp/csv/druid_dataSource.csv',',','"',null,0);

CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE (null,'DRUID_SUPERVISORS','/tmp/csv/druid_supervisors.csv',',','"',null,0);
```
### MySQL

```sql
LOAD DATA INFILE '/tmp/csv/druid_segments.csv' INTO TABLE druid_segments FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' (id,dataSource,created_date,start,end,partitioned,version,used,payload); SHOW WARNINGS;

LOAD DATA INFILE '/tmp/csv/druid_rules.csv' INTO TABLE druid_rules FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' (id,dataSource,version,payload); SHOW WARNINGS;

LOAD DATA INFILE '/tmp/csv/druid_config.csv' INTO TABLE druid_config FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' (name,payload); SHOW WARNINGS;

LOAD DATA INFILE '/tmp/csv/druid_dataSource.csv' INTO TABLE druid_dataSource FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' (dataSource,created_date,commit_metadata_payload,commit_metadata_sha1); SHOW WARNINGS;

LOAD DATA INFILE '/tmp/csv/druid_supervisors.csv' INTO TABLE druid_supervisors FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' (id,spec_id,created_date,payload); SHOW WARNINGS;
```
### PostgreSQL

```sql
COPY druid_segments(id,dataSource,created_date,start,"end",partitioned,version,used,payload) FROM '/tmp/csv/druid_segments.csv' DELIMITER ',' CSV;

COPY druid_rules(id,dataSource,version,payload) FROM '/tmp/csv/druid_rules.csv' DELIMITER ',' CSV;

COPY druid_config(name,payload) FROM '/tmp/csv/druid_config.csv' DELIMITER ',' CSV;

COPY druid_dataSource(dataSource,created_date,commit_metadata_payload,commit_metadata_sha1) FROM '/tmp/csv/druid_dataSource.csv' DELIMITER ',' CSV;

COPY druid_supervisors(id,spec_id,created_date,payload) FROM '/tmp/csv/druid_supervisors.csv' DELIMITER ',' CSV;
```
---
layout: doc_page
title: "Metadata Migration"
---

# Metadata Migration

If you have been running an evaluation Druid cluster using the built-in Derby metadata storage and wish to migrate to a
more production-capable metadata store such as MySQL or PostgreSQL, this document describes the necessary steps.

## Shut down cluster services

To ensure a clean migration, shut down the non-coordinator services so that metadata state does not
change while you perform the migration.

When migrating from Derby, the coordinator processes will still need to be up initially, as they host the Derby database.

## Exporting metadata

Druid provides an [Export Metadata Tool](../operations/export-metadata.html) for exporting metadata from Derby into CSV files
which can then be imported into your new metadata store.

The tool also provides options for rewriting the deep storage locations of segments; this is useful
for [deep storage migration](../operations/deep-storage-migration.html).

Run the `export-metadata` tool on your existing cluster, and save the CSV files it generates. After a successful export, you can shut down the coordinator.
## Initializing the new metadata store

### Create database

Before importing the existing cluster metadata, you will need to set up the new metadata store.

The [MySQL extension](../development/extensions-core/mysql.html) and [PostgreSQL extension](../development/extensions-core/postgresql.html) docs have instructions for initial database setup.

### Update configuration

Update your Druid runtime properties with the new metadata configuration.
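For example, for a MySQL metadata store the relevant runtime properties might look like the following sketch (the host, database name, and credentials are placeholder assumptions for your environment):

```
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://metadata-host:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=diurd
```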
### Create Druid tables

Druid provides a `metadata-init` tool for creating Druid's metadata tables. After initializing the Druid database, you can run the commands shown below from the root of the Druid package to initialize the tables.

In the example commands below:

- `lib` is the Druid lib directory
- `extensions` is the Druid extensions directory
- `base` corresponds to the value of `druid.metadata.storage.tables.base` in the configuration, `druid` by default.
- The `--connectURI` parameter corresponds to the value of `druid.metadata.storage.connector.connectURI`.
- The `--user` parameter corresponds to the value of `druid.metadata.storage.connector.user`.
- The `--password` parameter corresponds to the value of `druid.metadata.storage.connector.password`.
#### MySQL

```bash
cd ${DRUID_ROOT}
java -classpath "lib/*" -Dlog4j.configurationFile=conf/druid/cluster/_common/log4j2.xml -Ddruid.extensions.directory="extensions" -Ddruid.extensions.loadList=[\"mysql-metadata-storage\"] -Ddruid.metadata.storage.type=mysql org.apache.druid.cli.Main tools metadata-init --connectURI="<mysql-uri>" --user <user> --password <pass> --base druid
```

#### PostgreSQL

```bash
cd ${DRUID_ROOT}
java -classpath "lib/*" -Dlog4j.configurationFile=conf/druid/cluster/_common/log4j2.xml -Ddruid.extensions.directory="extensions" -Ddruid.extensions.loadList=[\"postgresql-metadata-storage\"] -Ddruid.metadata.storage.type=postgresql org.apache.druid.cli.Main tools metadata-init --connectURI="<postgresql-uri>" --user <user> --password <pass> --base druid
```
### Import metadata

After initializing the tables, please refer to the [import commands](../operations/export-metadata.html#importing-metadata) for your target database.

### Restart cluster

After successfully importing the metadata, you can restart your cluster.
Review comment:

> Would you please add `mkdir -p /tmp/csv` too? Looks like the output directory must exist before running this command.

Reply: Added that and a note about making sure the directory exists.