ARROW-10766: [Rust] [Parquet] Nested List IO [WIP] #8927

nevi-me · 2020-12-15T14:03:54Z

Putting this out for those who are interested in poking through the implementation. I'm nearly done with this, but I'm now dealing with integrating the level calculations into arrays. Some tests pass, others fail.

save progress (11/11/2020)

save progress

Integrating level calculations in writer

Some tests are failing, still have a long way to go

fix lints

save progress

I'm nearly able to reproduce a `<struct<struct<primitive>>`

I'm writing one level too high for nulls, so my null counts differ. Fixing this should result in nested struct roundtrip for the fully nullable case.

Currently failing tests:

```rust
failures:
    arrow::arrow_writer::tests::arrow_writer_2_level_struct
    arrow::arrow_writer::tests::arrow_writer_complex
    arrow::levels::tests::test_calculate_array_levels_2
    arrow::levels::tests::test_calculate_array_levels_nested_list
    arrow::levels::tests::test_calculate_one_level_2
```

They are mainly failing because we don't roundtrip lists correctly

save progress 19/20-11-2020

Structs that have nulls are working (need to revert non-null logic)

TODOs that need addressing later on

save progress

- Focused more on nested structs.
- Confident that writes are now fine
- Found issue with struct logical comparison, blocks this work

add failing arrow struct array test

a bit of cleanup for failing tests

Also document why dictionary test is failing
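For readers following the "one level too high for nulls" problem, here is a minimal sketch of how definition levels can be derived for a fully nullable nested struct. It is not the code in this PR: the function name `definition_levels` and the plain `Vec<bool>` validity masks are assumptions made for illustration, standing in for Arrow's null bitmaps. The idea is that a row's definition level is the number of consecutive valid ancestors counted from the outermost field, so a valid child slot underneath a null parent must not raise the level.

```rust
/// Hypothetical helper (not part of the arrow or parquet crates).
/// `validity` holds one mask per nesting level, outermost first,
/// e.g. [outer struct, inner struct, primitive leaf].
/// All masks are assumed to have the same length.
fn definition_levels(validity: &[Vec<bool>]) -> Vec<i16> {
    let num_rows = validity.first().map_or(0, |mask| mask.len());
    (0..num_rows)
        .map(|row| {
            // Stop counting at the first null ancestor, even if deeper
            // levels report the slot as valid.
            validity.iter().take_while(|mask| mask[row]).count() as i16
        })
        .collect()
}
```

With three nullable levels the maximum definition level is 3. A row whose outer struct is null gets level 0 regardless of what the child masks say, which is the case that is easy to write one level too high.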

strip out list support, to be worked on separately

(1) all but 1 test failing at this point

(2) trying to solve OOB panics

List definition algo still has some quirks. Masks and OOB panics. Ported list write code

integrated list writer, now need to get the levels consistently correct
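As a reference point for what "consistently correct" levels look like for lists, below is a small sketch that computes definition and repetition levels for a single nullable list of nullable `i32` items, following Parquet's standard three-level list encoding. It is an illustration under assumptions, not the algorithm in this PR: `list_levels` is a hypothetical name, and nested `Option`/`Vec` values stand in for an Arrow `ListArray` with offsets and null bitmaps.

```rust
/// Hypothetical helper (not part of the arrow or parquet crates).
/// Returns (definition levels, repetition levels, non-null values).
/// Definition levels: 0 = null list, 1 = empty list,
/// 2 = null item, 3 = present item. Max repetition level is 1.
fn list_levels(rows: &[Option<Vec<Option<i32>>>]) -> (Vec<i16>, Vec<i16>, Vec<i32>) {
    let mut def = Vec::new();
    let mut rep = Vec::new();
    let mut values = Vec::new();
    for row in rows {
        match row {
            // A null list still occupies one slot in both level buffers.
            None => {
                def.push(0);
                rep.push(0);
            }
            // So does an empty list, one definition level higher.
            Some(items) if items.is_empty() => {
                def.push(1);
                rep.push(0);
            }
            Some(items) => {
                for (i, item) in items.iter().enumerate() {
                    // The first item starts a new row, the rest repeat at level 1.
                    rep.push(if i == 0 { 0 } else { 1 });
                    match item {
                        Some(v) => {
                            def.push(3);
                            values.push(*v);
                        }
                        None => def.push(2),
                    }
                }
            }
        }
    }
    (def, rep, values)
}
```

Null and empty lists produce a level entry but no value, so the level buffers can be longer than the values buffer. Indexing one buffer with positions from another is a plausible source of out-of-bounds panics of the kind mentioned above.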

github-actions · 2020-12-15T14:30:41Z

https://issues.apache.org/jira/browse/ARROW-10766

- fixed most tests, worked them out on paper again
- made max_def_level almost completely consistent
- added a few tests

I'm sadly spending a lot of time dealing with Arrow edge-cases, but they are important to avoid data loss and incorrect indexing of array.
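To make the `max_def_level` bookkeeping concrete, here is a sketch of how maximum definition and repetition levels follow from a field's nesting and nullability under Parquet's rules: each nullable field adds one definition level, and each list adds one definition level and one repetition level for its repeated group. The `Field` enum and `max_levels` function are hypothetical and simplified (a struct has a single child here); they are not the types used in this PR.

```rust
/// Hypothetical, simplified schema type for illustration only.
enum Field {
    Primitive { nullable: bool },
    Struct { nullable: bool, child: Box<Field> },
    List { nullable: bool, item: Box<Field> },
}

/// Returns (max definition level, max repetition level) for the leaf
/// reached by walking down `field`.
fn max_levels(field: &Field) -> (i16, i16) {
    match field {
        Field::Primitive { nullable } => (i16::from(*nullable), 0),
        Field::Struct { nullable, child } => {
            let (def, rep) = max_levels(child);
            (def + i16::from(*nullable), rep)
        }
        Field::List { nullable, item } => {
            let (def, rep) = max_levels(item);
            // The repeated group always contributes one level to both.
            (def + 1 + i16::from(*nullable), rep + 1)
        }
    }
}
```

Under these rules a fully nullable `struct<struct<primitive>>` has a maximum definition level of 3 and no repetition, while a nullable list of nullable primitives has a maximum definition level of 3 and a maximum repetition level of 1.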

nevi-me · 2020-12-21T02:42:49Z

I'd like to complete this before the Christmas break, so I've unfortunately solely been working on this on the weekend.
I'm dealing with edge-cases in the tests, and massaging everything to work.

nevi-me · 2021-01-04T20:51:47Z

I got stalled with #9093. I think it's the last blocker before I can complete this :(

nevi-me · 2021-01-18T00:47:03Z

Closing this, will open a fresh one that's got fewer commits

nevi-me added 9 commits December 13, 2020 02:06

simplify dictionary writes

8f5301c

move things around

a3114e3

strip out list support, to be worked on separately

add list level calculations again

1ab6048

save progress on work done on lists

08bce27

save changes (1)

689b510

(1) all but 1 test failing at this point

save progress (2)

15dee34

(2) trying to solve OOB panics

Save progress

c84a166

List definition algo still has some quirks. Masks and OOB panics. Ported list write code

save progress

99336d7

integrated list writer, now need to get the levels consistently correct

github-actions bot added Component: Rust Component: Parquet labels Dec 15, 2020

save progress (20-12-2020)

4581ec8

- fixed most tests, worked them out on paper again
- made max_def_level almost completely consistent
- added a few tests

I'm sadly spending a lot of time dealing with Arrow edge-cases, but they are important to avoid data loss and incorrect indexing of array.

github-actions bot added the needs-rebase A PR that needs to be rebased by the author label Dec 21, 2020

save changes

462f410

nevi-me force-pushed the ARROW-10766 branch from 462f410 to 920dafa Compare January 4, 2021 20:23

nevi-me mentioned this pull request Jan 5, 2021

ARROW-11125: [Rust] Logical equality for list arrays #9093

Closed

nevi-me added 11 commits January 5, 2021 21:46

simplify dictionary writes

2431f95

move things around

5634333

strip out list support, to be worked on separately

add list level calculations again

661e8dc

save progress on work done on lists

7a56cb0

save changes (1)

93fcf41

(1) all but 1 test failing at this point

save progress (2)

0bc574f

(2) trying to solve OOB panics

Save progress

102bea0

List definition algo still has some quirks. Masks and OOB panics. Ported list write code

save progress

a5557fd

integrated list writer, now need to get the levels consistently correct

save progress (20-12-2020)

654a244

- fixed most tests, worked them out on paper again
- made max_def_level almost completely consistent
- added a few tests

I'm sadly spending a lot of time dealing with Arrow edge-cases, but they are important to avoid data loss and incorrect indexing of array.

save changes

be944d3

save progress

36a252d

revert logical equality changes

nevi-me force-pushed the ARROW-10766 branch from 920dafa to 36a252d Compare January 5, 2021 19:51

fix rebase

20a010e

nevi-me changed the title ~~ARROW-10766: [Rust] [Parquet] Nested List IO~~ ARROW-10766: [Rust] [Parquet] Nested List IO [WIP] Jan 5, 2021

nevi-me removed the needs-rebase A PR that needs to be rebased by the author label Jan 9, 2021

nevi-me added 2 commits January 13, 2021 02:19

Merge branch 'ARROW-10766' of https://github.com/nevi-me/arrow into A…

5807b17

…RROW-10766

bank changes

cc192c0

nevi-me force-pushed the ARROW-10766 branch from cc192c0 to 8786f7f Compare January 17, 2021 20:58

nevi-me added 14 commits January 17, 2021 22:58

simplify dictionary writes

24b03b2

move things around

6343e14

strip out list support, to be worked on separately

add list level calculations again

0336b79

save progress on work done on lists

7cd9c55

save changes (1)

bf80f70

(1) all but 1 test failing at this point

save progress (2)

f62e62f

(2) trying to solve OOB panics

Save progress

bbb2fe3

List definition algo still has some quirks. Masks and OOB panics. Ported list write code

save progress

b38a796

integrated list writer, now need to get the levels consistently correct

save progress (20-12-2020)

fb3b385

- fixed most tests, worked them out on paper again
- made max_def_level almost completely consistent
- added a few tests

I'm sadly spending a lot of time dealing with Arrow edge-cases, but they are important to avoid data loss and incorrect indexing of array.

save changes

bd4166a

save progress

ad154c0

revert logical equality changes

fix rebase

7c62bd3

Verified that levels are working, improved logic

bb52465

nevi-me force-pushed the ARROW-10766 branch from 8786f7f to bb52465 Compare January 17, 2021 20:58

nevi-me added 3 commits January 17, 2021 23:00

fix lints

4f14ea3

Merge branch 'ARROW-10766' of https://github.com/nevi-me/arrow into A…

6b4302a

…RROW-10766

writer working

3ea8cce

nevi-me closed this Jan 18, 2021

asfimport mentioned this pull request Jan 22, 2021

[Rust] Compute nested definition and repetition for list arrays #18397

Closed
