Description
This is the parent story. See subtasks for more information.
Notes from @wesm:

A couple of initial things to keep in mind:

- Writes of both nullable (OPTIONAL) and non-nullable (REQUIRED) fields
- You can optimize the special case where a nullable field's data contains no nulls
- A good amount of code is required to handle converting from the Arrow physical form of various logical types to the Parquet equivalent; see https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc for details
- It would be worth thinking up front about how dictionary-encoded data is handled on both the Arrow write and Arrow read paths. In parquet-cpp we initially discarded Arrow DictionaryArrays on write (casting e.g. Dictionary to dense String), and through real-world need I was forced to revisit this (quite painfully) to enable Arrow dictionaries to survive roundtrips to Parquet format, and also to achieve better performance and memory use in both reads and writes. You can certainly do a dictionary-to-dense conversion like we did, but you may someday find yourselves doing the same painful refactor that I did to make dictionary write and read not only more efficient but also dictionary order preserving.
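The first two points above (REQUIRED vs. OPTIONAL writes, and fast-pathing a nullable column that happens to contain no nulls) could be sketched roughly as follows. This is a hypothetical illustration with stand-in types, not the actual arrow or parquet crate API:

```rust
// Minimal stand-in for an Arrow primitive array, for illustration only.
struct Int32Array {
    values: Vec<i32>,
    // None when the field is REQUIRED (non-nullable); Some(validity) otherwise.
    validity: Option<Vec<bool>>,
}

impl Int32Array {
    fn null_count(&self) -> usize {
        self.validity
            .as_ref()
            .map(|v| v.iter().filter(|&&ok| !ok).count())
            .unwrap_or(0)
    }
}

// Returns the values to write plus optional definition levels.
// Definition levels mark which OPTIONAL slots are present (1) or null (0).
fn write_column(array: &Int32Array, nullable: bool) -> (Vec<i32>, Option<Vec<i16>>) {
    if !nullable || array.null_count() == 0 {
        // Fast path: REQUIRED field, or OPTIONAL field with no nulls.
        // Values are written as-is; for OPTIONAL-with-no-nulls the
        // definition levels are all 1 and can be emitted cheaply.
        let def_levels = nullable.then(|| vec![1i16; array.values.len()]);
        (array.values.clone(), def_levels)
    } else {
        // Slow path: gather only the non-null values, with a 0/1
        // definition level per slot.
        let validity = array.validity.as_ref().unwrap();
        let mut values = Vec::new();
        let mut def_levels = Vec::with_capacity(array.values.len());
        for (v, &ok) in array.values.iter().zip(validity) {
            def_levels.push(ok as i16);
            if ok {
                values.push(*v);
            }
        }
        (values, Some(def_levels))
    }
}
```

The point of the branch is that the no-null case avoids scanning the validity bitmap per value; real implementations make the same distinction per data page.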
Notes from @sunchao:

I roughly skimmed through the C++ implementation and think that, at a high level, we need to do the following:

- Implement a method similar to `WriteArrow` in column_writer.cc. We can further break this up into smaller pieces such as dictionary/non-dictionary, primitive types, booleans, timestamps, dates, and so on.
- Implement an Arrow writer in the parquet crate here. This needs to offer similar APIs to writer.h.
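The shape of such a writer, loosely mirroring the C++ `parquet::arrow::FileWriter` surface from writer.h (open with a schema, write record batches, close to finalize the footer), might look like this. All names here are assumptions for illustration, with stand-in types rather than the real arrow/parquet crate ones:

```rust
use std::io::Write;

// Stand-ins; in the real crates these would be arrow's Schema and
// RecordBatch, and the result type would be a parquet error type.
struct Schema;
struct RecordBatch;

// Hypothetical Arrow-aware Parquet writer: not the shipped API.
struct ArrowWriter<W: Write> {
    sink: W,
    schema: Schema,
    closed: bool,
}

impl<W: Write> ArrowWriter<W> {
    // Open a writer over any byte sink with the Arrow schema to write.
    fn try_new(sink: W, schema: Schema) -> Result<Self, String> {
        Ok(Self { sink, schema, closed: false })
    }

    // Write one record batch as one or more row groups.
    fn write(&mut self, _batch: &RecordBatch) -> Result<(), String> {
        if self.closed {
            return Err("writer already closed".to_string());
        }
        // Per-column dispatch (primitive, boolean, temporal, dictionary, ...)
        // would live here, analogous to WriteArrow in C++ column_writer.cc.
        Ok(())
    }

    // Finalize row-group metadata and the Parquet footer.
    fn close(&mut self) -> Result<(), String> {
        self.closed = true;
        Ok(())
    }
}
```

The open/write/close lifecycle matters because Parquet's metadata lives in the file footer, so nothing is readable until `close` succeeds.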
Reporter: Andy Grove / @andygrove
Assignee: Neville Dipale / @nevi-me
Subtasks:
- [Rust] [Parquet] Implement minimal Arrow Parquet writer as starting point for full writer
- [Rust] [Parquet] Implement function to convert Arrow schema to Parquet schema
- [Rust] [Parquet] Serialize arrow schema into metadata when writing parquet
- [Rust] [Parquet] Add support for writing sliced arrays
- [Rust] [Parquet] Add support for writing temporal types
- [Rust] [Parquet] Add support for writing dictionary types
- [Rust] [Parquet] Compute nested definition and repetition for structs
- [Rust] [Parquet] Update for IPC changes
- [Rust] [Parquet] Extend arrow schema conversion to projected fields
- [Rust] [Parquet] Add roundtrip tests for single column batches
- [Rust] [Parquet] Fix null bitmap comparisons in roundtrip tests
- [Rust] [Parquet] Support reading and writing Arrow NullArray
- [Rust] [Parquet] Write nested types (struct, list)
- [Rust] [Parquet] Add support for writing boolean type
- [Rust] Compute nested definition and repetition for list arrays
- [Rust] [Parquet] Write fixed size binary arrays
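The "compute nested definition and repetition" subtasks above concern Dremel-style level encoding. As a toy illustration (my own sketch, not code from any subtask), here is how levels could be computed for a nullable list of nullable i32s under Parquet's three-level list encoding, where the optional list group contributes definition level 1, the repeated group level 2, and the optional element level 3, with a maximum repetition level of 1:

```rust
// Compute (definition_levels, repetition_levels) for a column of
// nullable lists of nullable i32s. One level pair is emitted per
// leaf slot; null and empty lists each emit a single placeholder pair.
fn levels(lists: &[Option<Vec<Option<i32>>>]) -> (Vec<i16>, Vec<i16>) {
    let mut def = Vec::new();
    let mut rep = Vec::new();
    for list in lists {
        match list {
            // Null list: nothing defined below the optional group.
            None => {
                def.push(0);
                rep.push(0);
            }
            // Empty list: the list exists but the repeated group is empty.
            Some(items) if items.is_empty() => {
                def.push(1);
                rep.push(0);
            }
            Some(items) => {
                for (i, item) in items.iter().enumerate() {
                    // Repetition 0 starts a new list; 1 continues it.
                    rep.push(if i == 0 { 0 } else { 1 });
                    // 3 = present element, 2 = null element inside the list.
                    def.push(if item.is_some() { 3 } else { 2 });
                }
            }
        }
    }
    (def, rep)
}
```

For example, the column `[[1, null], null, []]` would produce definition levels `[3, 2, 0, 1]` and repetition levels `[0, 1, 0, 0]` under this scheme.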
PRs and other links:
Note: This issue was originally created as ARROW-8421. Please see the migration documentation for further details.