make rocksdb format version configurable #2824
eolivelli left a comment
Lgtm
merlimat left a comment
LGTM. Since RocksDB takes a config file, we should also let users configure that, so that we don't need new code changes to support new options.
```
# dbStorage_rocksDB_numFilesInLevel0=4
# dbStorage_rocksDB_maxSizeInLevel1MB=256
# dbStorage_rocksDB_logPath=
# dbStorage_rocksDB_format_version=2
```
Should we add a comment or description here? Not every user will know that the format version can affect performance.
Fix apache#2823

RocksDB supports several format versions, which use different data structures to implement key-value indexes and have very different performance characteristics.
https://rocksdb.org/blog/2019/03/08/format-version-4.html
https://github.com/facebook/rocksdb/blob/d52b520d5168de6be5f1494b2035b61ff0958c11/include/rocksdb/table.h#L368-L394

```C++
// We currently have five versions:
// 0 -- This version is currently written out by all RocksDB's versions by
// default. Can be read by really old RocksDB's. Doesn't support changing
// checksum (default is CRC32).
// 1 -- Can be read by RocksDB's versions since 3.0. Supports non-default
// checksum, like xxHash. It is written by RocksDB when
// BlockBasedTableOptions::checksum is something other than kCRC32c. (version
// 0 is silently upconverted)
// 2 -- Can be read by RocksDB's versions since 3.10. Changes the way we
// encode compressed blocks with LZ4, BZip2 and Zlib compression. If you
// don't plan to run RocksDB before version 3.10, you should probably use
// this.
// 3 -- Can be read by RocksDB's versions since 5.15. Changes the way we
// encode the keys in index blocks. If you don't plan to run RocksDB before
// version 5.15, you should probably use this.
// This option only affects newly written tables. When reading existing
// tables, the information about version is read from the footer.
// 4 -- Can be read by RocksDB's versions since 5.16. Changes the way we
// encode the values in index blocks. If you don't plan to run RocksDB before
// version 5.16 and you are using index_block_restart_interval > 1, you should
// probably use this as it would reduce the index size.
// This option only affects newly written tables. When reading existing
// tables, the information about version is read from the footer.
// 5 -- Can be read by RocksDB's versions since 6.6.0. Full and partitioned
// filters use a generally faster and more accurate Bloom filter
// implementation, with a different schema.
uint32_t format_version = 5;
```

Different format versions require different minimum RocksDB versions, and once data has been written with a newer format version, RocksDB cannot be rolled back to a release that is unable to read it.

Our current RocksDB storage code hard-codes format_version to 2, which makes it hard to upgrade format_version and take advantage of the performance improvements in newer RocksDB releases.

1. Make the format_version configurable.

Reviewers: Matteo Merli <mmerli@apache.org>, Enrico Olivelli <eolivelli@gmail.com>

This closes apache#2824 from hangc0276/chenhang/make_rocksdb_format_version_configurable
(cherry picked from commit 50f5287)
Motivation
Fix #2823
RocksDB supports several format versions, which use different data structures to implement key-value indexes and have very different performance characteristics. https://rocksdb.org/blog/2019/03/08/format-version-4.html
https://github.com/facebook/rocksdb/blob/d52b520d5168de6be5f1494b2035b61ff0958c11/include/rocksdb/table.h#L368-L394
Different format versions require different minimum RocksDB versions, and once data has been written with a newer format version, RocksDB cannot be rolled back to a release that is unable to read it.
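As a rough illustration of the rollback constraint, the compatibility notes from rocksdb/table.h can be turned into a lookup from format_version to the oldest RocksDB release that can read it. This is a minimal sketch, not BookKeeper or RocksDB code: `RocksDbVersion`, `kMinReader`, `MinReaderVersion`, and `IsSafeToUse` are hypothetical names. It assumes version 0 is readable by any release and that you know the oldest RocksDB release you might roll back to.

```C++
#include <array>
#include <cstdint>
#include <optional>
#include <tuple>

// Hypothetical helper type: a RocksDB release number such as 5.16.0.
struct RocksDbVersion {
  int major_version;
  int minor_version;
  int patch_version;
};

// a < b when a is an older release than b (lexicographic comparison).
bool operator<(const RocksDbVersion& a, const RocksDbVersion& b) {
  return std::tie(a.major_version, a.minor_version, a.patch_version) <
         std::tie(b.major_version, b.minor_version, b.patch_version);
}

// Oldest release able to read tables written with each format_version,
// transcribed from the table.h comments. Version 0 is readable by
// "really old" releases, modeled here as 0.0.0 (an assumption).
constexpr std::array<RocksDbVersion, 6> kMinReader = {{
    {0, 0, 0},   // format_version 0
    {3, 0, 0},   // format_version 1
    {3, 10, 0},  // format_version 2
    {5, 15, 0},  // format_version 3
    {5, 16, 0},  // format_version 4
    {6, 6, 0},   // format_version 5
}};

// Returns the oldest compatible reader, or nullopt for versions not in the table.
std::optional<RocksDbVersion> MinReaderVersion(uint32_t format_version) {
  if (format_version >= kMinReader.size()) return std::nullopt;
  return kMinReader[format_version];
}

// True if tables written with format_version stay readable after rolling
// back to rollback_target. Versions missing from the table are rejected.
bool IsSafeToUse(uint32_t format_version, const RocksDbVersion& rollback_target) {
  std::optional<RocksDbVersion> min_reader = MinReaderVersion(format_version);
  return min_reader.has_value() && !(rollback_target < *min_reader);
}
```

For example, with a rollback target of 5.18.3, format versions 0 through 4 are considered safe and 5 is not, because version 5 needs RocksDB 6.6.0 or later.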
Our current RocksDB storage code hard-codes format_version to 2, which makes it hard to upgrade format_version and take advantage of the performance improvements in newer RocksDB releases.
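One way to remove the hard-coded value is to read it from configuration and fall back to 2, so existing deployments keep writing the same format unless they opt in. The sketch below is hypothetical: `ResolveFormatVersion` is an illustrative name, a plain `std::map` stands in for the real configuration class, the key is the `dbStorage_rocksDB_format_version` entry from the conf file, and values above 5 are rejected only because 5 is the highest version described in the quoted table.h excerpt (newer RocksDB releases may accept more).

```C++
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Config key taken from the commented-out entry in the conf file.
const std::string kFormatVersionKey = "dbStorage_rocksDB_format_version";

// The previously hard-coded value; used when the key is absent or empty.
constexpr uint32_t kDefaultFormatVersion = 2;

// Highest version documented in the table.h excerpt (assumed upper bound).
constexpr uint32_t kMaxKnownFormatVersion = 5;

// Resolves format_version from a key/value config map. Throws
// std::invalid_argument for non-numeric, partially numeric, or
// out-of-range values instead of silently picking a format.
uint32_t ResolveFormatVersion(const std::map<std::string, std::string>& config) {
  auto it = config.find(kFormatVersionKey);
  if (it == config.end() || it->second.empty()) {
    return kDefaultFormatVersion;
  }
  std::size_t consumed = 0;
  unsigned long value = 0;
  try {
    value = std::stoul(it->second, &consumed);
  } catch (const std::exception&) {
    // std::stoul throws invalid_argument or out_of_range on bad input.
    throw std::invalid_argument("not a number: " + it->second);
  }
  // Reject trailing garbage ("3x") and versions outside the known range.
  // A negative input such as "-1" wraps to a huge value and is rejected here.
  if (consumed != it->second.size() || value > kMaxKnownFormatVersion) {
    throw std::invalid_argument("unsupported format_version: " + it->second);
  }
  return static_cast<uint32_t>(value);
}
```

The resolved value would then be assigned to `BlockBasedTableOptions::format_version`. Failing fast on bad input is a deliberate choice in this sketch: because a format upgrade cannot be undone, silently accepting a typo is riskier than refusing the configuration.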
Changes