Revisit List Row Encoding (#5807)#5811
Merged
tustvold merged 3 commits intoapache:masterfrom May 28, 2024
Merged
Conversation
Contributor
Author
|
I have also tested this against #5792 which integrates lists and structs into the row format and the tests pass 🎉 |
alamb
approved these changes
May 28, 2024
Contributor
alamb
left a comment
There was a problem hiding this comment.
Overall the change looks good to me but I am not sure I would catch any subtle bugs as I am not an expert in this code
I suggest you also extend the fuzz test to include ListArrays to make sure we cover all corner cases.
| dictionary_eq(&back[0], &array); | ||
| } | ||
|
|
||
| #[test] |
Contributor
There was a problem hiding this comment.
I verified that this test covers the issue by running this test on main and it failed like this:
thread 'tests::test_list_prefix' panicked at arrow-row/src/lib.rs:2284:9:
assertion `left == right` failed
left: Greater
right: Less
stack backtrace:
0: rust_begin_unwind
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:645:5
1: core::panicking::panic_fmt
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:72:14
2: core::panicking::assert_failed_inner
3: core::panicking::assert_failed
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:298:5
4: arrow_row::tests::test_list_prefix
at ./src/lib.rs:2284:9
5: arrow_row::tests::test_list_prefix::{{closure}}
at ./src/lib.rs:2276:26
6: core::ops::function::FnOnce::call_once
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
7: core::ops::function::FnOnce::call_once
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
error: test failed, to rerun pass `--lib`
error: 1 target failed:
`--lib`
Contributor
Author
I have this ready as part of #5792 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #5807
Rationale for this change
Historically the variable length encoding used a fixed block size of 32 bytes, this made it very expensive for encoding small byte arrays. This meant we couldn't use it directly for encoding list elements, and instead came up with an alternative encoding. Unfortunately this encoding is incorrect #5807
Fortunately following #4818 the space overheads for small values using the variable length encoding is significantly reduced, and so we can just use it directly. The result is not only correct, but also typically smaller.
What changes are included in this PR?
Are there any user-facing changes?