ARROW-18413: [C++][Parquet] Expose page index info from ColumnChunkMetaData #14742
@pitrou @emkornfield Can you please take a look? As I mentioned in another thread, I will work on the page index and break it down into small commits to make it review-friendly.
cpp/src/parquet/metadata.h
std::unique_ptr<ColumnCryptoMetaData> crypto_metadata() const;

bool has_column_index() const;
int64_t column_index_offset() const;
I think it might be a clearer contract to have something like:
struct IndexLocation {
  int64_t index_file_offset_bytes;
  int32_t offset_index_length;
};
optional<IndexLocation> GetColumnIndexLocation();
optional<IndexLocation> GetOffsetIndexLocation();
Thoughts?
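To illustrate why an optional-returning accessor can be a clearer contract than a separate has/offset pair, here is a minimal sketch. It assumes C++17 `std::optional`; `RawColumnChunk`, the simplified `offset`/`length` field names, the free-function form of the accessor, and `IndexEndOffset` are all hypothetical stand-ins for illustration, not the API in this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical sketch only: these names are illustrative and are not the
// actual Arrow/Parquet API.
struct IndexLocation {
  int64_t offset;  // byte offset of the index within the file
  int32_t length;  // serialized length of the index in bytes
};

// Hypothetical stand-in for raw column chunk metadata in which the offset
// and the length are each optional fields.
struct RawColumnChunk {
  bool has_column_index_offset = false;
  int64_t column_index_offset = 0;
  bool has_column_index_length = false;
  int32_t column_index_length = 0;
};

// Returns the location only when both fields are present, so a caller
// cannot read an offset without first checking that the index exists.
std::optional<IndexLocation> GetColumnIndexLocation(const RawColumnChunk& chunk) {
  if (!chunk.has_column_index_offset || !chunk.has_column_index_length) {
    return std::nullopt;
  }
  return IndexLocation{chunk.column_index_offset, chunk.column_index_length};
}

// Hypothetical helper: first byte past the end of the index.
int64_t IndexEndOffset(const IndexLocation& loc) {
  assert(loc.length >= 0);
  return loc.offset + loc.length;
}
```

With this shape, presence and value travel together: the caller tests the returned optional once and then reads both the offset and the length, instead of pairing a `has_column_index()` check with separate getters.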
SGTM, I will change it shortly.
I have addressed your comment, and the unsuccessful CI checks are unrelated to my change. Can you please take a look again? @emkornfield
pitrou
left a comment
Thank you. Is it possible to add basic tests for this?
Thank you for the review. @pitrou I have added a simple test and addressed your comment. Please take a look again when you get the chance.
pitrou
left a comment
Thanks for the update @wgtmac !
Test failures are unrelated, will merge.
Benchmark runs are scheduled for baseline = 6f4a539 and contender = 958fbfa. 958fbfa is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
This is the first step toward supporting the Parquet page index.