This repository was archived by the owner on May 10, 2024. It is now read-only.

wesm commented Apr 5, 2017

There's a lot of room for improvement / further refactoring here -- the assumption that an entire column in a file is being read runs very deep in the Arrow reader, so I tried to do the minimum work to decouple the row group iteration. There's some code duplication in ReadRowGroup, but we should maybe save further cleanup for a future patch.
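
To make the idea of decoupling row group iteration concrete, here is a minimal toy sketch in plain Python. It is not the parquet-cpp or Arrow reader code: `ToyParquetFile` and its in-memory layout are hypothetical, and only the names `read_row_group` and `num_row_groups` are borrowed from this patch. Unlike the patch, which keeps some duplicated code in `ReadRowGroup`, the sketch assumes the whole-file read is built directly on the per-row-group read.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for a Parquet file: a list of row groups,
# each mapping a column name to the values stored in that row group.
RowGroup = Dict[str, List[int]]


class ToyParquetFile:
    """Toy model only; not the real Arrow/Parquet reader."""

    def __init__(self, row_groups: List[RowGroup]) -> None:
        self._row_groups = row_groups

    @property
    def num_row_groups(self) -> int:
        return len(self._row_groups)

    def read_row_group(self, i: int, columns: List[str]) -> RowGroup:
        """Read only row group i, restricted to the requested columns."""
        group = self._row_groups[i]
        return {name: list(group[name]) for name in columns}

    def read(self, columns: List[str]) -> RowGroup:
        """Whole-file read expressed as iteration over row groups."""
        out: RowGroup = {name: [] for name in columns}
        for i in range(self.num_row_groups):
            chunk = self.read_row_group(i, columns)
            for name in columns:
                out[name].extend(chunk[name])
        return out


toy = ToyParquetFile([
    {"a": [1, 2], "b": [10, 20]},
    {"a": [3], "b": [30]},
])
print(toy.num_row_groups)
print(toy.read_row_group(1, ["a"]))
print(toy.read(["a", "b"]))
```

In this model a caller can process one row group at a time without ever materializing a full column, which is the access pattern the patch is meant to enable.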

xhochy left a comment

+1, LGTM

asfgit closed this in d064665 Apr 6, 2017

xhochy commented Apr 6, 2017

The amount of code duplication is far less than I would have expected ;)

asfgit pushed a commit to apache/arrow that referenced this pull request Apr 6, 2017
requires PARQUET-946 apache/parquet-cpp#291

cc @cpcloud @jreback @mrocklin

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #494 from wesm/ARROW-771 and squashes the following commits:

126789a [Wes McKinney] Fix docstring
1009423 [Wes McKinney] Add read_row_group / num_row_groups to ParquetFile