@hangc0276

Motivation

Fix #2823
RocksDB supports several format versions, which use different data structures to implement key-value indexes and differ significantly in performance. See https://rocksdb.org/blog/2019/03/08/format-version-4.html

https://github.com/facebook/rocksdb/blob/d52b520d5168de6be5f1494b2035b61ff0958c11/include/rocksdb/table.h#L368-L394

  // We currently have five versions:
  // 0 -- This version is currently written out by all RocksDB's versions by
  // default.  Can be read by really old RocksDB's. Doesn't support changing
  // checksum (default is CRC32).
  // 1 -- Can be read by RocksDB's versions since 3.0. Supports non-default
  // checksum, like xxHash. It is written by RocksDB when
  // BlockBasedTableOptions::checksum is something other than kCRC32c. (version
  // 0 is silently upconverted)
  // 2 -- Can be read by RocksDB's versions since 3.10. Changes the way we
  // encode compressed blocks with LZ4, BZip2 and Zlib compression. If you
  // don't plan to run RocksDB before version 3.10, you should probably use
  // this.
  // 3 -- Can be read by RocksDB's versions since 5.15. Changes the way we
  // encode the keys in index blocks. If you don't plan to run RocksDB before
  // version 5.15, you should probably use this.
  // This option only affects newly written tables. When reading existing
  // tables, the information about version is read from the footer.
  // 4 -- Can be read by RocksDB's versions since 5.16. Changes the way we
  // encode the values in index blocks. If you don't plan to run RocksDB before
  // version 5.16 and you are using index_block_restart_interval > 1, you should
  // probably use this as it would reduce the index size.
  // This option only affects newly written tables. When reading existing
  // tables, the information about version is read from the footer.
  // 5 -- Can be read by RocksDB's versions since 6.6.0. Full and partitioned
  // filters use a generally faster and more accurate Bloom filter
  // implementation, with a different schema.
  uint32_t format_version = 5;

Each format version requires a minimum RocksDB version, and it is not possible to roll back once you upgrade to a newer format version.
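To make the rollback constraint concrete, here is a minimal sketch of a compatibility check. The minimum versions are transcribed from the `table.h` comment quoted above; the class and method names (`FormatVersionCompat`, `canRead`, `highestReadable`) are hypothetical and not part of RocksDB or BookKeeper.

```java
// Hypothetical helper, not part of RocksDB or BookKeeper.
class FormatVersionCompat {
    // Index = format_version; value = {major, minor} of the oldest RocksDB
    // release that can read it, per the table.h comment quoted above.
    private static final int[][] MIN_READER = {
        {0, 0},   // 0: readable by really old releases
        {3, 0},   // 1
        {3, 10},  // 2
        {5, 15},  // 3
        {5, 16},  // 4
        {6, 6},   // 5
    };

    // True if a RocksDB release major.minor can read tables written
    // with the given format_version.
    static boolean canRead(int major, int minor, int formatVersion) {
        if (formatVersion < 0 || formatVersion >= MIN_READER.length) {
            throw new IllegalArgumentException(
                "unknown format_version: " + formatVersion);
        }
        int[] min = MIN_READER[formatVersion];
        return major > min[0] || (major == min[0] && minor >= min[1]);
    }

    // Highest format_version that a RocksDB release major.minor can read.
    static int highestReadable(int major, int minor) {
        int best = 0;
        for (int v = 0; v < MIN_READER.length; v++) {
            if (canRead(major, minor, v)) {
                best = v;
            }
        }
        return best;
    }
}
```

For example, `canRead(5, 15, 4)` is false: tables written with format version 4 cannot be opened after rolling back to RocksDB 5.15.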

Our current RocksDB storage code hard-codes format_version to 2, which makes it hard to upgrade format_version and benefit from the performance improvements in newer RocksDB releases.

Changes

  1. Make the format_version configurable.
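As a rough illustration of what "configurable" could look like, the sketch below reads the format version from a `Properties` object and falls back to the previously hard-coded value of 2. The key name follows the `dbStorage_rocksDB_format_version` entry in the config file under review; the class and method names are hypothetical, and this is not the code from the PR. In a real integration the resolved value would be passed to RocksJava's `BlockBasedTableConfig.setFormatVersion(int)`.

```java
import java.util.Properties;

// Hypothetical helper illustrating the idea; not the PR's actual code.
class RocksDbFormatVersionConfig {
    static final String KEY = "dbStorage_rocksDB_format_version";
    static final int DEFAULT_FORMAT_VERSION = 2; // the old hard-coded value
    static final int MAX_KNOWN_FORMAT_VERSION = 5; // highest in the quoted table.h

    // Returns the configured format version, or the default when unset.
    // Rejects values that are not integers in [0, MAX_KNOWN_FORMAT_VERSION].
    static int resolve(Properties conf) {
        String raw = conf.getProperty(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_FORMAT_VERSION;
        }
        int version;
        try {
            version = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                KEY + " must be an integer, got: " + raw, e);
        }
        if (version < 0 || version > MAX_KNOWN_FORMAT_VERSION) {
            throw new IllegalArgumentException(
                KEY + " out of range [0, " + MAX_KNOWN_FORMAT_VERSION + "]: " + version);
        }
        return version;
    }
}
```

Keeping 2 as the default means existing deployments behave as before unless an operator opts in to a newer format version.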

@eolivelli eolivelli left a comment

Lgtm

@merlimat merlimat left a comment

LGTM. Since RocksDB takes a config file, we should also let users configure that, so that we don't need new code changes to support new options.

# dbStorage_rocksDB_numFilesInLevel0=4
# dbStorage_rocksDB_maxSizeInLevel1MB=256
# dbStorage_rocksDB_logPath=
# dbStorage_rocksDB_format_version=2

Should we add a comment or description here? Not every user will know that the format version can affect performance.

@dlg99 dlg99 added this to the 4.15.0 milestone Feb 14, 2022
@dlg99 dlg99 merged commit 50f5287 into apache:master Feb 14, 2022
StevenLuMT pushed a commit to StevenLuMT/bookkeeper that referenced this pull request Feb 16, 2022
Reviewers: Matteo Merli <mmerli@apache.org>, Enrico Olivelli <eolivelli@gmail.com>

This closes apache#2824 from hangc0276/chenhang/make_rocksdb_format_version_configurable
dlg99 pushed a commit to datastax/bookkeeper that referenced this pull request Jun 28, 2024
(cherry picked from commit 50f5287)
dlg99 pushed a commit to datastax/bookkeeper that referenced this pull request Jul 2, 2024
(cherry picked from commit 50f5287)
Ghatage pushed a commit to sijie/bookkeeper that referenced this pull request Jul 12, 2024
Development

Successfully merging this pull request may close these issues.

Why KeyValueStorageRocksDB set the table format version as 2 explicitly?
