ARROW-4219: [Rust] [Parquet] Initial support for arrow reader. #5523
liurenjie1024 wants to merge 3 commits into apache:master.
@sunchao @andygrove @nevi-me @paddyhoran Please take a look when you are available.
andygrove left a comment
LGTM so far, but I have a couple of questions in this review.
We have a very similar trait in DataFusion, only it inherits Sync + Send too so that it can be passed between threads. Maybe we could do that here too? See https://github.com/apache/arrow/blob/master/rust/datafusion/src/execution/physical_plan/mod.rs#L44-L50
I don't think it's safe to make this trait Send + Sync, because the reader uses some unsafe data (e.g. MutableBuffer).
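To make the trade-off in this exchange concrete, here is a minimal, dependency-free sketch of what declaring `Send + Sync` as supertraits implies. It is not the PR's or DataFusion's code: `Batch`, `BatchReader`, `VecReader` and `count_rows_on_other_thread` are hypothetical names, and `Batch` stands in for an Arrow record batch. With the supertraits in place, a boxed reader can be moved onto another thread, but any implementor that holds non-thread-safe state (for example a raw pointer, as a mutable buffer may wrap) is rejected by the compiler unless its author writes an `unsafe impl` and justifies it.

```rust
use std::thread;

/// Stand-in for an Arrow record batch (assumption made to keep the
/// sketch dependency-free).
#[derive(Debug)]
struct Batch {
    values: Vec<i64>,
}

/// Hypothetical reader trait. `Send + Sync` are supertraits, so only
/// thread-safe types may implement it and `Box<dyn BatchReader>` is `Send`.
trait BatchReader: Send + Sync {
    fn next_batch(&mut self) -> Option<Batch>;
}

/// Hypothetical reader over owned `Vec`s, which are automatically
/// `Send + Sync`, so this impl compiles.
struct VecReader {
    batches: Vec<Batch>,
}

impl BatchReader for VecReader {
    fn next_batch(&mut self) -> Option<Batch> {
        if self.batches.is_empty() {
            None
        } else {
            Some(self.batches.remove(0))
        }
    }
}

// By contrast, a struct holding a raw pointer, e.g.
//     struct RawReader { data: *mut u8 }
// is neither `Send` nor `Sync`, so `impl BatchReader for RawReader` would be
// rejected by the compiler unless someone writes `unsafe impl Send/Sync`
// and takes responsibility for the thread-safety argument.

/// Drains the reader on a separate thread and returns the total row count.
/// Moving the boxed reader into `thread::spawn` is allowed only because
/// `Send` is a supertrait of `BatchReader`.
fn count_rows_on_other_thread(mut reader: Box<dyn BatchReader>) -> usize {
    thread::spawn(move || {
        let mut rows = 0;
        while let Some(batch) = reader.next_batch() {
            rows += batch.values.len();
        }
        rows
    })
    .join()
    .expect("reader thread panicked")
}

fn main() {
    let reader = VecReader {
        batches: vec![
            Batch { values: vec![1, 2, 3] },
            Batch { values: vec![4, 5] },
        ],
    };
    println!("rows = {}", count_rows_on_other_thread(Box::new(reader)));
}
```

Running `main` prints `rows = 5`. The design point is that the supertrait bound is a promise made by every implementor, so it should only be added once all intended readers can honestly satisfy it.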
What is returned if there are no more batches? Shouldn't this return Result<Option<RecordBatch>>?
Currently, if we reach EOF, it returns a zero-length array. But I think it's better to change the return type to Result<Option<RecordBatch>>. Will fix it.
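The `Result<Option<_>>` convention agreed on here can be sketched as follows. This is an illustrative, dependency-free example, not the PR's implementation: `Batch`, `ReadError`, `ChunkedReader`, `next_batch` and `read_all` are hypothetical names, and `Batch` stands in for `RecordBatch`. `Ok(None)` marks end of input, `Err` is reserved for real failures, and an empty batch is never overloaded as an EOF marker.

```rust
/// Stand-in for an Arrow `RecordBatch` (assumption: a single i32 column).
#[derive(Debug, PartialEq)]
struct Batch {
    values: Vec<i32>,
}

/// Hypothetical error type; a real reader would use its crate's error enum.
#[derive(Debug, PartialEq)]
struct ReadError(String);

/// Hypothetical reader that slices in-memory rows into fixed-size batches.
struct ChunkedReader {
    rows: Vec<i32>,
    pos: usize,
    batch_size: usize,
}

impl ChunkedReader {
    fn new(rows: Vec<i32>, batch_size: usize) -> Self {
        ChunkedReader { rows, pos: 0, batch_size }
    }

    /// `Ok(Some(batch))` while rows remain, `Ok(None)` at end of input,
    /// `Err(_)` for real failures. EOF is never signalled by an empty batch,
    /// so callers cannot confuse "no more data" with "an empty batch".
    fn next_batch(&mut self) -> Result<Option<Batch>, ReadError> {
        if self.batch_size == 0 {
            return Err(ReadError("batch_size must be greater than zero".to_string()));
        }
        if self.pos >= self.rows.len() {
            return Ok(None);
        }
        let end = (self.pos + self.batch_size).min(self.rows.len());
        let batch = Batch { values: self.rows[self.pos..end].to_vec() };
        self.pos = end;
        Ok(Some(batch))
    }
}

/// Reads every batch; `?` propagates errors and the loop stops on `Ok(None)`.
fn read_all(reader: &mut ChunkedReader) -> Result<Vec<Batch>, ReadError> {
    let mut batches = Vec::new();
    while let Some(batch) = reader.next_batch()? {
        batches.push(batch);
    }
    Ok(batches)
}

fn main() {
    let mut reader = ChunkedReader::new(vec![1, 2, 3, 4, 5], 2);
    let batches = read_all(&mut reader).expect("read failed");
    println!("{:?}", batches);
}
```

With five rows and a batch size of 2, `read_all` returns three batches holding `[1, 2]`, `[3, 4]` and `[5]`. A common alternative with the same semantics is to implement `Iterator<Item = Result<Batch, ReadError>>`, where the iterator returning `None` plays the role of `Ok(None)`.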
@andygrove I need to upload some generated files to the arrow-testing repo for this integration test. How am I supposed to do that? Should I submit a separate PR to the arrow-testing project?
@liurenjie1024 If you are adding Parquet files, you can open a PR against https://github.com/apache/parquet-testing, but please note the comments on the last PR merged there, which suggest we look into alternative approaches: apache/parquet-testing#9
@andygrove I need to upload not only Parquet files but also JSON and proto files. Here is my approach to the integration test: I use protobuf to define the schema, then generate both JSON and Parquet files with that schema using Java code. Then I use Rust code to load the Parquet and JSON files and compare them. I guess it may be better to upload them to the arrow-testing repo?
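The comparison step of this approach (load the same data from JSON and from Parquet in Rust and check that they agree) could look roughly like the sketch below. It is a hypothetical helper, not code from the PR: rows are modelled as plain `(i64, Option<String>)` tuples instead of Arrow arrays, and `Row`, `flatten`, `first_mismatch` and `row` are illustrative names. Flattening the batches first means the test does not depend on both readers choosing the same batch boundaries.

```rust
/// Stand-in for one decoded row: an id column and a nullable name column.
type Row = (i64, Option<String>);

/// Concatenates batches into one row sequence, so that readers that split
/// the data into different batch sizes can still be compared.
fn flatten(batches: &[Vec<Row>]) -> Vec<Row> {
    batches.iter().flat_map(|batch| batch.iter().cloned()).collect()
}

/// Returns the index of the first row where the two sources differ
/// (including one source running out early), or `None` if they match.
fn first_mismatch(expected: &[Vec<Row>], actual: &[Vec<Row>]) -> Option<usize> {
    let expected = flatten(expected);
    let actual = flatten(actual);
    let common = expected.len().min(actual.len());
    (0..common)
        .find(|&i| expected[i] != actual[i])
        .or(if expected.len() != actual.len() { Some(common) } else { None })
}

/// Convenience constructor for test rows.
fn row(id: i64, name: Option<&str>) -> Row {
    (id, name.map(|s| s.to_string()))
}

fn main() {
    // Same three rows, split into batches differently by each source.
    let from_json = vec![vec![row(1, Some("a")), row(2, None)], vec![row(3, Some("c"))]];
    let from_parquet = vec![vec![row(1, Some("a"))], vec![row(2, None), row(3, Some("c"))]];
    match first_mismatch(&from_json, &from_parquet) {
        None => println!("sources agree"),
        Some(i) => println!("first mismatch at row {}", i),
    }
}
```

Reporting the first mismatching row index, rather than just a boolean, makes a failing integration test easier to debug.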
@liurenjie1024 I think it would be good to start a discussion about this on the mailing list. I'm not sure what advice to give on this.
The new
@andygrove Sorry for the late reply. I've submitted a PR to the arrow-testing repo: apache/arrow-testing#11
The branch was updated from 8cba3aa to 6a93e50, then from 6a93e50 to cecdd97.
@andygrove This is ready for review now.
@liurenjie1024 Is this an intentional limitation for now? (See arrow/rust/parquet/src/arrow/array_reader.rs, lines 219 to 222 at 8b915b3.) Also, my apologies for the late review.
@nevi-me No, it just takes time to implement. I will implement it in the future.
andygrove left a comment
LGTM. Thanks @liurenjie1024
Initial support for an Arrow reader, which reads Parquet data into Arrow record batches.