Skip to content

Conversation

@nevi-me
Copy link
Contributor

@nevi-me nevi-me commented Dec 15, 2020

Putting this out for those who are interested in poking through the implementation. I'm nearly done with this, but I'm now dealing with integrating the level calculations into arrays. Some tests pass, others fail

save progress (11/11/2020)

save progress

Integrating level calculations in writer

Some tests are failing, still have a long way to go

fix lints

save progress

I'm nearly able to reproduce a `<struct<struct<primitive>>`
I'm writing one level too high for nulls, so my null counts differ.
Fixing this should result in nested struct roundtrip for the fully
nullable case.

Currently failing tests:

```rust
failures:
    arrow::arrow_writer::tests::arrow_writer_2_level_struct
    arrow::arrow_writer::tests::arrow_writer_complex
    arrow::levels::tests::test_calculate_array_levels_2
    arrow::levels::tests::test_calculate_array_levels_nested_list
    arrow::levels::tests::test_calculate_one_level_2
```

They are mainly failing because we don't roundtrip lists correctly

save progress 19/20-11-2020

Structs that have nulls are working (need to revert non-null logic)

TODOs that need addressing later on

save progress

- Focused more on nested structs.
- Confident that writes are now fine
- Found issue with struct logical comparison, blocks this work

add failing arrow struct array test

a bit of cleanup for failing tests

Also document why dictionary test is failing
strip out list support, to be worked on separately
(1) all but 1 test failing at this point
(2) trying to solve OOB panics
List definition algo still has some quirks.
Masks and OOB panics.
Ported list write code
integrated list writer, now need to get the levels consistently correct
@github-actions
Copy link

- fixed most tests, worked them out on paper again
- made max_def_level almost completely consistent
- added a few tests

I'm sadly spending a lot of time dealing with Arrow edge-cases,
but they are important to avoid data loss and incorrect
indexing of array.
@nevi-me
Copy link
Contributor Author

nevi-me commented Dec 21, 2020

I'd like to complete this before the Christmas break, so I've unfortunately solely been working on this on the weekend.
I'm dealing with edge-cases in the tests, and massaging everything to work.

@github-actions github-actions bot added the needs-rebase A PR that needs to be rebased by the author label Dec 21, 2020
@nevi-me
Copy link
Contributor Author

nevi-me commented Jan 4, 2021

I got stalled with #9093. I think it's the last blocker before I can complete this :(

nevi-me added 11 commits January 5, 2021 21:46
save progress (11/11/2020)

save progress

Integrating level calculations in writer

Some tests are failing, still have a long way to go

fix lints

save progress

I'm nearly able to reproduce a `<struct<struct<primitive>>`
I'm writing one level too high for nulls, so my null counts differ.
Fixing this should result in nested struct roundtrip for the fully
nullable case.

Currently failing tests:

```rust
failures:
    arrow::arrow_writer::tests::arrow_writer_2_level_struct
    arrow::arrow_writer::tests::arrow_writer_complex
    arrow::levels::tests::test_calculate_array_levels_2
    arrow::levels::tests::test_calculate_array_levels_nested_list
    arrow::levels::tests::test_calculate_one_level_2
```

They are mainly failing because we don't roundtrip lists correctly

save progress 19/20-11-2020

Structs that have nulls are working (need to revert non-null logic)

TODOs that need addressing later on

save progress

- Focused more on nested structs.
- Confident that writes are now fine
- Found issue with struct logical comparison, blocks this work

add failing arrow struct array test

a bit of cleanup for failing tests

Also document why dictionary test is failing
strip out list support, to be worked on separately
(1) all but 1 test failing at this point
(2) trying to solve OOB panics
List definition algo still has some quirks.
Masks and OOB panics.
Ported list write code
integrated list writer, now need to get the levels consistently correct
- fixed most tests, worked them out on paper again
- made max_def_level almost completely consistent
- added a few tests

I'm sadly spending a lot of time dealing with Arrow edge-cases,
but they are important to avoid data loss and incorrect
indexing of array.
revert logical equality changes
@nevi-me nevi-me changed the title ARROW-10766: [Rust] [Parquet] Nested List IO ARROW-10766: [Rust] [Parquet] Nested List IO [WIP] Jan 5, 2021
@nevi-me nevi-me removed the needs-rebase A PR that needs to be rebased by the author label Jan 9, 2021
save progress (11/11/2020)

save progress

Integrating level calculations in writer

Some tests are failing, still have a long way to go

fix lints

save progress

I'm nearly able to reproduce a `<struct<struct<primitive>>`
I'm writing one level too high for nulls, so my null counts differ.
Fixing this should result in nested struct roundtrip for the fully
nullable case.

Currently failing tests:

```rust
failures:
    arrow::arrow_writer::tests::arrow_writer_2_level_struct
    arrow::arrow_writer::tests::arrow_writer_complex
    arrow::levels::tests::test_calculate_array_levels_2
    arrow::levels::tests::test_calculate_array_levels_nested_list
    arrow::levels::tests::test_calculate_one_level_2
```

They are mainly failing because we don't roundtrip lists correctly

save progress 19/20-11-2020

Structs that have nulls are working (need to revert non-null logic)

TODOs that need addressing later on

save progress

- Focused more on nested structs.
- Confident that writes are now fine
- Found issue with struct logical comparison, blocks this work

add failing arrow struct array test

a bit of cleanup for failing tests

Also document why dictionary test is failing
strip out list support, to be worked on separately
(1) all but 1 test failing at this point
(2) trying to solve OOB panics
List definition algo still has some quirks.
Masks and OOB panics.
Ported list write code
integrated list writer, now need to get the levels consistently correct
- fixed most tests, worked them out on paper again
- made max_def_level almost completely consistent
- added a few tests

I'm sadly spending a lot of time dealing with Arrow edge-cases,
but they are important to avoid data loss and incorrect
indexing of array.
revert logical equality changes
@nevi-me
Copy link
Contributor Author

nevi-me commented Jan 18, 2021

Closing this, will open a fresh one that's got fewer commits

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant