-
Notifications
You must be signed in to change notification settings - Fork 70
Add file with NaN in statistics #35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f14becb to
94c8d18
Compare
data/README.md
Outdated
| | datapage_v1-snappy-compressed-checksum.parquet | compressed INT32 columns in v1 data pages with a matching CRC | | ||
| | datapage_v1-corrupt-checksum.parquet | uncompressed INT32 columns in v1 data pages with a mismatching CRC | | ||
| | overflow_i16_page_cnt.parquet | row group with more than INT16_MAX pages | | ||
| <<<<<<< HEAD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, can you remove the merge markers here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦
data/README.md
Outdated
|
|
||
| ## NaN in stats | ||
|
|
||
| Previous versions of the C++ Parquet writer would write NaN values in min and max |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Previous versions" is quite unspecific, can you be more explciit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noted the Parquet version that changed in.
| > * If the max is a NaN, it should be ignored. | ||
| > * If the min is +0, the row group may contain -0 values as well. | ||
| > * If the max is -0, the row group may contain +0 values as well. | ||
| > * When looking for NaN values, min and max should be ignored. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be useful to add the "File was generated with: [...]" snippet that is part of the PR description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking it would always be in the PR history, but your are right that in the text would be more convenient.
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @wjones127 !
File was generated with: