[Python] better error message on creating ParquetDataset from empty directory

Currently, when `path` is an existing but empty directory, you get:

```python
>>> dataset = pq.ParquetDataset(path)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-16-346f72ae525e> in <module>
----> 1 dataset = pq.ParquetDataset(path)

~/scipy/repos/arrow/python/pyarrow/parquet.py in __init__(self, path_or_paths, filesystem, schema, metadata, split_row_groups, validate_schema, filters, metadata_nthreads, memory_map)
    989 
    990         if validate_schema:
--> 991             self.validate_schemas()
    992 
    993         if filters is not None:

~/scipy/repos/arrow/python/pyarrow/parquet.py in validate_schemas(self)
   1025                 self.schema = self.common_metadata.schema
   1026             else:
-> 1027                 self.schema = self.pieces[0].get_metadata().schema
   1028         elif self.schema is None:
   1029             self.schema = self.metadata.schema

IndexError: list index out of range
```

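The error message could be much clearer. A bare `IndexError` raised from inside `validate_schemas` does not tell the user that the real problem is an empty directory.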
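A minimal sketch of the kind of guard that would turn this into a readable error. It loosely mirrors the schema-inference branch shown in the traceback. It is not the actual pyarrow code. `infer_schema` is a hypothetical helper, and `piece_schemas` is a simplified stand-in for `[p.get_metadata().schema for p in self.pieces]`.

```python
def infer_schema(path, piece_schemas, common_metadata_schema=None):
    """Hypothetical sketch: pick the dataset schema, failing clearly if empty.

    `piece_schemas` stands in for the schemas of the dataset's pieces;
    `common_metadata_schema` stands in for a `_common_metadata` schema.
    """
    if common_metadata_schema is not None:
        # A common metadata file, if present, provides the schema directly
        return common_metadata_schema
    if not piece_schemas:
        # Without this check, piece_schemas[0] raises a bare IndexError
        raise ValueError(
            "Cannot create a ParquetDataset from {!r}: no Parquet files "
            "were found, so no schema can be inferred".format(path)
        )
    # Otherwise use the first piece's schema, as in the traceback above
    return piece_schemas[0]


try:
    infer_schema("/data/empty_dir", [])
except ValueError as exc:
    print(exc)
```

The user then sees which path was empty and why that is a problem, instead of a list index error.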

Or do we actually want to allow this? I am not sure there are good use cases for supporting empty directories, because an empty directory gives us no schema or metadata information.
It only fails when validating the schemas. With `validate_schema=False`, it actually returns a `ParquetDataset` object, just with an empty list for `pieces` and no schema. So it would also be easy to skip the error during schema validation for this empty-directory case.
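If we chose to allow empty directories instead, schema validation could simply leave the schema unset when there are no pieces. That would match what `validate_schema=False` already produces. The sketch below illustrates that alternative. `validate_schemas_lenient` is a hypothetical helper, not pyarrow's implementation, and plain values stand in for Parquet schemas.

```python
def validate_schemas_lenient(piece_schemas, common_metadata_schema=None):
    """Hypothetical sketch: tolerate an empty dataset instead of erroring.

    Returns the dataset schema, or None when there is nothing to infer from.
    """
    if common_metadata_schema is not None:
        schema = common_metadata_schema
    elif piece_schemas:
        schema = piece_schemas[0]
    else:
        # Empty directory: no schema can be inferred and nothing to validate
        return None
    # Check that every piece agrees with the dataset schema
    for i, piece_schema in enumerate(piece_schemas):
        if piece_schema != schema:
            raise ValueError(
                "Schema of piece {} differs from the dataset schema".format(i)
            )
    return schema
```

The trade-off is that an empty dataset with `schema = None` would probably fail later, for example when reading, so the error would still need to surface somewhere.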

**Reporter**: [Joris Van den Bossche](https://issues.apache.org/jira/browse/ARROW-5310) / @jorisvandenbossche
**Assignee**: [Joris Van den Bossche](https://issues.apache.org/jira/browse/ARROW-5310) / @jorisvandenbossche
#### Related issues:
- [[Python][Dataset] Support using dataset API in pyarrow.parquet with a minimal ParquetDataset shim](https://github.com/apache/arrow/issues/17077) (depends upon)

<sub>**Note**: *This issue was originally created as [ARROW-5310](https://issues.apache.org/jira/browse/ARROW-5310). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further details.*</sub>

[Python] better error message on creating ParquetDataset from empty directory #16728

Related issues:

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Python] better error message on creating ParquetDataset from empty directory #16728

Description

Related issues:

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions