fix: forward compatibility of FTS index#4906
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #4906 +/- ##
=======================================
Coverage 81.60% 81.61%
=======================================
Files 333 333
Lines 131384 131388 +4
Branches 131384 131388 +4
=======================================
+ Hits 107221 107236 +15
+ Misses 20565 20555 -10
+ Partials 3598 3597 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| reason="FTS token set format was introduced in 0.36.0", | ||
| ) | ||
| def test_list_indices_ignores_new_fts_index_version(): | ||
| session = lance.Session(index_cache_size_bytes=0, metadata_cache_size_bytes=0) |
There was a problem hiding this comment.
wait, are we supposed to set index_cache_size_bytes=0 and metadata_cache_size_bytes=0?
|
|
||
| pub const INVERTED_INDEX_VERSION: u32 = 0; | ||
| // Version 0: Arrow TokenSetFormat (legacy) | ||
| // Version 1: Fst TokenSetFormat (new default, incompatible with old clients) |
There was a problem hiding this comment.
mention lance version where version 1 was added?
westonpace
left a comment
There was a problem hiding this comment.
The change itself (bumping version when token set format is new) seems fine.
I hadn't expected index versioning to evolve quite like this. I had assumed newer versions of lance would just write newer index versions and we wouldn't have the ability to write multiple index versions. However, this seems better for gradual migration.
…c RowIdTreeMap.serialize… (#5105) …_into stable Closes: #5093 This PR made the following changes: 1. doc RowIdTreeMap.serialize_info as stable, which was used in bitmap index 2. make sure that the forward compatibility tests datasets have multiple fragments to guarantee older versions Lance can correctly load bitmap index. 3. make sure backward compatibility test `test_old_btree_bitmap_indices` use bitmap index 4. fix fails introduced by #4906 --------- Co-authored-by: Will Jones <willjones127@gmail.com>
When creating using 0.38 and reading using 0.36 for `list_indices` the operation fails with ``` thread '<unnamed>' panicked at /home/runner/work/lance/lance/rust/lance-index/src/scalar/inverted/index.rs:846:30: called `Option::unwrap()` on a `None` value ``` This PR bumps the version of the FTS index when the token set format is FTS, so that the incompatible index should be ignored by old clients. However, there is a bug in the old client that if the indices are cached when opening manifest, then the index is not ignored and would still cause error when doing list_indices or other operations against the index.
…c RowIdTreeMap.serialize… (lance-format#5105) …_into stable Closes: lance-format#5093 This PR made the following changes: 1. doc RowIdTreeMap.serialize_info as stable, which was used in bitmap index 2. make sure that the forward compatibility tests datasets have multiple fragments to guarantee older versions Lance can correctly load bitmap index. 3. make sure backward compatibility test `test_old_btree_bitmap_indices` use bitmap index 4. fix fails introduced by lance-format#4906 --------- Co-authored-by: Will Jones <willjones127@gmail.com>
When creating using 0.38 and reading using 0.36 for
list_indicesthe operation fails with