ARROW-10766: [Rust] [Parquet] Nested List IO [WIP] #8927
Closed
save progress (11/11/2020)
save progress
Integrating level calculations in writer
Some tests are failing, still have a long way to go
fix lints
save progress
I'm nearly able to roundtrip a `struct<struct<primitive>>`.
I'm writing one level too high for nulls, so my null counts differ.
Fixing this should result in nested struct roundtrip for the fully
nullable case.
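To illustrate the off-by-one described above, here is a minimal sketch of definition levels for a fully nullable `struct<struct<primitive>>`. It assumes each of the three nesting levels is nullable (so the maximum definition level is 3) and models a row as nested `Option`s. The helper names `def_level`, `def_levels` and `null_count` are hypothetical and are not taken from this PR.

```rust
/// Definition level for one row of a fully nullable `struct<struct<primitive>>`.
/// Every nullable ancestor that is present adds 1, so the maximum level is 3.
fn def_level(row: &Option<Option<Option<i32>>>) -> i16 {
    match row {
        None => 0,                // outer struct is null
        Some(None) => 1,          // inner struct is null
        Some(Some(None)) => 2,    // primitive value is null
        Some(Some(Some(_))) => 3, // value is fully defined
    }
}

/// Definition levels for a whole column of rows.
fn def_levels(rows: &[Option<Option<Option<i32>>>]) -> Vec<i16> {
    rows.iter().map(def_level).collect()
}

/// Leaf slots whose level is below the maximum hold no value.
fn null_count(levels: &[i16], max_def_level: i16) -> usize {
    levels.iter().filter(|&&level| level < max_def_level).count()
}
```

Under these assumptions, a null written one level too high (for example a null primitive written at 3 instead of 2) would be counted as a defined value, which is one way the null counts could end up differing after a roundtrip.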
Currently failing tests:
```text
failures:
arrow::arrow_writer::tests::arrow_writer_2_level_struct
arrow::arrow_writer::tests::arrow_writer_complex
arrow::levels::tests::test_calculate_array_levels_2
arrow::levels::tests::test_calculate_array_levels_nested_list
arrow::levels::tests::test_calculate_one_level_2
```
They are mainly failing because we don't roundtrip lists correctly yet.
save progress 19/20-11-2020
Structs that have nulls are working (need to revert non-null logic)
TODOs that need addressing later on
save progress
- Focused more on nested structs.
- Confident that writes are now fine
- Found issue with struct logical comparison, blocks this work
add failing arrow struct array test
a bit of cleanup for failing tests
Also document why dictionary test is failing
strip out list support, to be worked on separately
(1) all but 1 test failing at this point
(2) trying to solve OOB panics
List definition algo still has some quirks. Masks and OOB panics. Ported list write code
integrated list writer, now need to get the levels consistently correct
- fixed most tests, worked them out on paper again
- made max_def_level almost completely consistent
- added a few tests

I'm sadly spending a lot of time dealing with Arrow edge-cases, but they are important to avoid data loss and incorrect indexing of arrays.
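As a rough illustration of the levels the list writer has to get right, the sketch below computes definition and repetition levels for a nullable `list<int32>` with nullable items. It assumes the standard three-level Parquet list encoding, giving a maximum definition level of 3 and a maximum repetition level of 1. `list_levels` is a hypothetical helper written for this example, not the algorithm used in this PR.

```rust
/// Returns (definition levels, repetition levels) for a nullable list of
/// nullable i32 items. Assumed maxima: definition 3, repetition 1.
fn list_levels(rows: &[Option<Vec<Option<i32>>>]) -> (Vec<i16>, Vec<i16>) {
    let mut def = Vec::new();
    let mut rep = Vec::new();
    for row in rows {
        match row {
            // Null list: a single slot at definition level 0.
            None => {
                def.push(0);
                rep.push(0);
            }
            // Empty list: the list is defined, but it has no items.
            Some(items) if items.is_empty() => {
                def.push(1);
                rep.push(0);
            }
            Some(items) => {
                for (i, item) in items.iter().enumerate() {
                    // A null item stops one level short of the maximum.
                    def.push(if item.is_some() { 3 } else { 2 });
                    // The first item starts a new row; later items repeat at level 1.
                    rep.push(if i == 0 { 0 } else { 1 });
                }
            }
        }
    }
    (def, rep)
}
```

Null and empty lists each occupy one level slot without contributing a value, so the level vectors can be longer than the values buffer. That mismatch is a plausible source of the indexing problems mentioned above.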
I'd like to complete this before the Christmas break, so I've unfortunately solely been working on this on the weekend.
I got stalled with #9093. I think it's the last blocker before I can complete this :(
revert logical equality changes
Closing this, will open a fresh one that's got fewer commits.
Putting this out for those who are interested in poking through the implementation. I'm nearly done with this, but I'm now dealing with integrating the level calculations into arrays. Some tests pass, others fail.