Closed
Description
Assemble a minimal ParquetDataset shim backed by pyarrow.dataset.*. Replace the existing ParquetDataset with the shim by default, and allow an opt-out for users who need the current ParquetDataset.
This is mostly exploratory, to see which of the Python tests fail.
Reporter: Ben Kietzman / @bkietz
Assignee: Joris Van den Bossche / @jorisvandenbossche
Related issues:
- [Python] More graceful reading of empty String columns in ParquetDataset (is depended upon by)
- [Python][Parquet][C++] Null values in a single partition of Parquet dataset, results in invalid schema on read (is depended upon by)
- [Python] Datatypes not preserved for partition fields in roundtrip to partitioned parquet dataset (is depended upon by)
- [Python] ParquetDataset().read columns argument always returns partition column (is depended upon by)
- [Python] Underscores in partition (string) values are dropped when reading dataset (is depended upon by)
- [Python] better error message on creating ParquetDataset from empty directory (is depended upon by)
- [Python] raise error message when passing invalid filter in parquet reading (is depended upon by)
- [C++][Python] Support AWS Firehose partition_scheme implementation for Parquet datasets (is depended upon by)
- [C++][Dataset] Automatically detect boolean partition columns (is depended upon by)
- [Python][C++][Parquet] Support reading Parquet files having a permutation of column order (is depended upon by)
- [Python] Improved workflow for loading an arbitrary collection of Parquet files (is depended upon by)
- [Python] RowGroup filtering on file level (is depended upon by)
PRs and other links:
Note: This issue was originally created as ARROW-8039. Please see the migration documentation for further details.