
[Python] ArrowInvalid error on reading partitioned parquet files from S3 (arrow-2.0.0) #26864

@asfimport

Description


Hello

It looks like pyarrow-2.0.0 cannot read partitioned datasets from S3 buckets:

import numpy as np
import pandas as pd
import s3fs
import pyarrow as pa
import pyarrow.parquet as pq

filesystem = s3fs.S3FileSystem()

# Build a sample frame with a daily DatetimeIndex and a Year column to partition on.
d = pd.date_range('1990-01-01', freq='D', periods=10000)
vals = np.random.randn(len(d), 4)
x = pd.DataFrame(vals, index=d, columns=['A', 'B', 'C', 'D'])
x['Year'] = x.index.year

# Write to S3 as a partitioned dataset (one directory per Year value).
table = pa.Table.from_pandas(x, preserve_index=True)
pq.write_to_dataset(table, root_path='s3://bucket/test_pyarrow.parquet',
                    partition_cols=['Year'], filesystem=filesystem)
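For reference, write_to_dataset with partition_cols produces a Hive-style directory layout, one subdirectory per Year value, with auto-generated file names (the layout below is illustrative; the Year=2017 file name is taken from the error further down):

bucket/test_pyarrow.parquet/
    Year=1990/<generated-name>.parquet
    Year=1991/<generated-name>.parquet
    ...
    Year=2017/ffcc136787cf46a18e8cc8f72958453f.parquet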

 

Now, reading it back via pq.read_table:

pq.read_table('s3://bucket/test_pyarrow.parquet', filesystem=filesystem, use_pandas_metadata=True)

This raises an exception:

ArrowInvalid: GetFileInfo() yielded path 'bucket/test_pyarrow.parquet/Year=2017/ffcc136787cf46a18e8cc8f72958453f.parquet', which is outside base dir 's3://bucket/test_pyarrow.parquet'
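The message suggests the reader is comparing the scheme-less paths returned by GetFileInfo() against a base dir that still carries the s3:// prefix. A possible workaround, as an untested sketch under that assumption, is to pass the bucket-relative path without the scheme, since the explicit filesystem argument already identifies S3:

import s3fs
import pyarrow.parquet as pq

filesystem = s3fs.S3FileSystem()

# Assumption: dropping the 's3://' scheme keeps the base dir consistent
# with the scheme-less paths yielded by GetFileInfo().
table = pq.read_table('bucket/test_pyarrow.parquet',
                      filesystem=filesystem,
                      use_pandas_metadata=True)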

 

Direct read in pandas:

pd.read_parquet('s3://bucket/test_pyarrow.parquet')

returns an empty DataFrame.

The issue does not exist in pyarrow-1.0.1.
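Since pyarrow-1.0.1 handles the same dataset, another avenue to try (a hedged sketch, not a confirmed fix) is forcing the legacy reader, which bypasses the newer Datasets-based path resolution; read_table still accepted the use_legacy_dataset keyword in 2.0.0:

import s3fs
import pyarrow.parquet as pq

filesystem = s3fs.S3FileSystem()

# Fall back to the pre-Datasets reader code path.
table = pq.read_table('s3://bucket/test_pyarrow.parquet',
                      filesystem=filesystem,
                      use_pandas_metadata=True,
                      use_legacy_dataset=True)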

Reporter: Vladimir


Note: This issue was originally created as ARROW-10937. Please see the migration documentation for further details.
