
[C++][Dataset] Add ParquetFileFragment::num_row_groups property #26145

@asfimport

Description


From dask/dask#6534 (comment), comment by @rjzamora:

it would be great to have access to the total row-group count for the fragment from a num_row_groups attribute (which pyarrow should be able to get without parsing all row-group metadata/statistics - I think?).

One question is: does this attribute correspond to the row groups in the Parquet file, or to the (subset of) row groups represented by the fragment?
I expect the second (so if you do SplitByRowGroup, you would get fragments with num_row_groups==1), but this might be a potentially confusing aspect of the attribute.
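To make the two interpretations concrete, here is a minimal pure-Python model of the second one (the count covers only the row groups the fragment represents). Everything in it is hypothetical and for illustration only: `FragmentModel`, `file_row_groups`, `row_groups`, and `split_by_row_group` are made-up names standing in for the C++ `ParquetFileFragment` and its `SplitByRowGroup`, not the actual Arrow or pyarrow API.

```python
# Hypothetical model of the proposed semantics; NOT the Arrow/pyarrow API.
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FragmentModel:
    path: str
    file_row_groups: int  # total row groups recorded in the Parquet file
    row_groups: Optional[List[int]] = None  # None means "the whole file"

    @property
    def num_row_groups(self) -> int:
        # Interpretation 2: count only the row groups this fragment covers.
        if self.row_groups is None:
            return self.file_row_groups
        return len(self.row_groups)

    def split_by_row_group(self) -> List["FragmentModel"]:
        # One new fragment per selected row group, each pointing at the same file.
        ids = range(self.file_row_groups) if self.row_groups is None else self.row_groups
        return [FragmentModel(self.path, self.file_row_groups, [i]) for i in ids]


whole = FragmentModel("part-0.parquet", file_row_groups=4)
pieces = whole.split_by_row_group()
print(whole.num_row_groups)                 # 4
print([p.num_row_groups for p in pieces])   # [1, 1, 1, 1]
```

Under this model the file-level total stays available separately (`file_row_groups`), which is one way to avoid the ambiguity raised above: after splitting, each piece reports 1 while still knowing how many row groups the whole file has.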

Reporter: Joris Van den Bossche / @jorisvandenbossche
Assignee: Ben Kietzman / @bkietz

PRs and other links:

Note: This issue was originally created as ARROW-10134. Please see the migration documentation for further details.
