ARROW-6687: Add .parquet file with single np.nan value by alippai · Pull Request #9 · apache/parquet-testing

alippai · 2019-09-25T20:54:26Z

Testing the new DataFusion .parquet support uncovered an error. This adds a simple regression test resource to reproduce it.

andygrove

Could you just add the nan file in this PR so we can add that regression test?

For the partition issue, we can have unit tests create the partition directories and just copy the all_types parquet files

alippai · 2019-09-26T06:33:26Z

@andygrove I removed the partition directory; the commit now adds only the file with the single NaN value.

andygrove

LGTM

andygrove · 2019-09-26T12:59:41Z

@wesm could you merge this please

wesm · 2019-09-26T13:06:07Z

I'm worried about going down this route as a testing approach. Do you plan to keep adding more files as you develop the Parquet Rust project?

andygrove · 2019-09-26T13:10:41Z

Yes, we definitely need more parquet files to test against. The current testing is very limited.

Is your concern about checking in static files versus generating them using scripts?

wesm · 2019-09-26T13:11:33Z

Yes, I don't think that having a static corpus is a scalable testing strategy.

andygrove · 2019-09-26T13:15:01Z

I hear you. On the other hand, if the Rust developers now need a C++ and/or Python env set up as well to be able to run tests, that's not ideal either. I suppose this could be Dockerized though?

wesm · 2019-09-26T13:19:44Z

The ideal scenario is to generate files endogenously using the Rust library and not to rely on a different project. That's what we do in C++ (and what the Java library does also). I think checking in "problem" files that exhibit issues that you cannot easily generate from a particular library is okay.

wesm · 2019-09-26T13:23:29Z

Once Rust has a fully capable IPC implementation, I'd be supportive of developing some Dockerized automated fuzz/integration testing between the C++/Python/R and Rust libraries. We can have the libraries cross-validate Parquet versus the Arrow protocol "point of truth".
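One detail any such cross-validation has to handle is the value this PR adds: NaN never compares equal to itself, so a plain equality check between two readers' outputs would report a mismatch even when both read the file correctly. Below is a minimal sketch of a NaN-aware comparison, assuming each reader's column has already been converted to a Python list of floats. The function name `columns_match` is hypothetical and not part of any Arrow library.

```python
import math


def columns_match(expected, actual):
    """Compare two float columns, treating NaN == NaN and None == None."""
    if len(expected) != len(actual):
        return False
    for a, b in zip(expected, actual):
        if a is None or b is None:
            # A null only matches another null, never a NaN or a number.
            if a is not b:
                return False
        elif math.isnan(a) and math.isnan(b):
            # Both readers produced NaN: count this as a match.
            continue
        elif a != b:
            return False
    return True
```

With this helper, `columns_match([float("nan")], [float("nan")])` is true, whereas `[float("nan")] == [float("nan")]` built from two separate NaN objects is false.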

andygrove · 2019-09-26T13:28:40Z

> The ideal scenario is to generate files endogenously using the Rust library

Unfortunately the Rust implementation doesn't yet have support for writing Parquet files.

wesm · 2019-09-26T13:32:09Z

Okay. I think it's very important for the Rust developers to prioritize this; otherwise it will be very difficult for the project to mature into something that people can depend on in production.

wesm · 2019-09-26T19:10:04Z

Merging this.

Short of writing Parquet files from Rust, if this becomes a pattern I would recommend writing a data generation script in Python and providing a Dockerfile to run it as part of the testing process.

andygrove approved these changes Sep 26, 2019

andygrove suggested changes Sep 26, 2019

ARROW-6687: Add .parquet file with single np.nan value

4bb8d95

alippai force-pushed the patch-1 branch from 9e70d62 to 4bb8d95 Compare September 26, 2019 06:31

alippai changed the title ~~ARROW-6687: Add simple partitioned arrow file based on alltypes_plain…~~ ARROW-6687: Add .parquet file with single np.nan value Sep 26, 2019

andygrove approved these changes Sep 26, 2019

wesm merged commit 46c9e97 into apache:master Sep 26, 2019

andygrove mentioned this pull request Sep 30, 2019

ARROW-4219: [Rust] [Parquet] Initial support for arrow reader. apache/arrow#5523

Closed

asfimport mentioned this pull request Sep 30, 2019

[Rust] [DataFusion] Query returns incorrect row count apache/arrow#23035

Closed
