Description
From dask/dask#6534 (comment), comment by @rjzamora:
it would be great to have access to the total row-group count for the fragment from a num_row_groups attribute (which pyarrow should be able to get without parsing all row-group metadata/statistics - I think?).
One question is: does this attribute correspond to the row groups in the parquet file, or the (subset of) row groups represented by the fragment?
I expect the second (so if you do SplitByRowGroup, you would get a fragment with num_row_groups==1), but this might be a potentially confusing aspect of the attribute.
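To make the two interpretations concrete, here is a minimal pure-Python sketch of the second one (the count reflects only the row groups the fragment represents). `FragmentSketch`, `file_row_group_count`, and `row_group_ids` are hypothetical names invented for illustration. This is not the pyarrow API, and `split_by_row_group` here only mimics what SplitByRowGroup is described as doing above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FragmentSketch:
    """Hypothetical stand-in for a Parquet file fragment (not a pyarrow class)."""

    path: str
    # Number of row groups physically present in the underlying file.
    file_row_group_count: int
    # Subset of row groups this fragment represents; None means "all of them".
    row_group_ids: Optional[Tuple[int, ...]] = None

    @property
    def num_row_groups(self) -> int:
        # Second interpretation: count the fragment's own subset,
        # falling back to the file-level count when no subset is set.
        if self.row_group_ids is None:
            return self.file_row_group_count
        return len(self.row_group_ids)

    def split_by_row_group(self) -> List["FragmentSketch"]:
        # Produce one single-row-group fragment per row group represented.
        ids = (
            self.row_group_ids
            if self.row_group_ids is not None
            else tuple(range(self.file_row_group_count))
        )
        return [
            FragmentSketch(self.path, self.file_row_group_count, (i,))
            for i in ids
        ]


whole = FragmentSketch("data.parquet", file_row_group_count=4)
pieces = whole.split_by_row_group()

print(whole.num_row_groups)                 # count for the unsplit fragment
print([p.num_row_groups for p in pieces])   # count for each split fragment
```

Under this interpretation the file-level count stays available separately (here as `file_row_group_count`), which is one way to avoid the ambiguity raised above.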
Reporter: Joris Van den Bossche / @jorisvandenbossche
Assignee: Ben Kietzman / @bkietz
PRs and other links:
Note: This issue was originally created as ARROW-10134. Please see the migration documentation for further details.