[Python] raise error message when passing invalid filter in parquet reading #22015

@asfimport

Description

From https://stackoverflow.com/questions/56522977/using-predicates-to-filter-rows-from-pyarrow-parquet-parquetdataset

When a filter refers to a regular column rather than a partition key in the partitioned folder hierarchy, the filter is silently ignored and all rows are returned. It would be better to raise an error in that case.
Reproducible example:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({'a': [0, 0, 1, 1], 'b': [0, 1, 0, 1], 'c': [1, 2, 3, 4]})
table = pa.Table.from_pandas(df)
pq.write_to_dataset(table, 'test_parquet_row_filters', partition_cols=['a'])
# filter on 'a' (partition column) -> works
pq.read_table('test_parquet_row_filters', filters=[('a', '=', 1)]).to_pandas()
# filter on normal column (in future could do row group filtering) -> silently does nothing
pq.read_table('test_parquet_row_filters', filters=[('b', '=', 1)]).to_pandas()
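
To make the requested behaviour concrete, here is a minimal sketch of the kind of check that could raise instead of silently ignoring the filter. The helper name `validate_partition_filters` and its signature are hypothetical and not part of pyarrow. It assumes the caller already knows the partition column names, and that filters are either a flat list of `(column, op, value)` tuples or a list of such lists.

```python
def validate_partition_filters(filters, partition_names):
    """Raise ValueError if a filter references a non-partition column.

    Hypothetical helper for illustration only; not a pyarrow API.
    """
    # Normalise to a list of conjunctions: accept both a flat list of
    # (column, op, value) tuples and a nested list of such lists.
    if filters and isinstance(filters[0], tuple):
        conjunctions = [filters]
    else:
        conjunctions = filters or []

    known = set(partition_names)
    for conjunction in conjunctions:
        for column, _op, _value in conjunction:
            if column not in known:
                raise ValueError(
                    f"Cannot filter on {column!r}: it is not a partition "
                    f"column (partition columns: {sorted(known)})"
                )


# With the example above, 'a' is the only partition column.
validate_partition_filters([('a', '=', 1)], ['a'])  # passes silently

try:
    validate_partition_filters([('b', '=', 1)], ['a'])
except ValueError as exc:
    print(exc)
```

Calling something like this before `pq.read_table` would turn the second call in the reproducible example into an explicit error rather than an unfiltered result.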

Reporter: Joris Van den Bossche / @jorisvandenbossche
Assignee: Joris Van den Bossche / @jorisvandenbossche

Note: This issue was originally created as ARROW-5572. Please see the migration documentation for further details.
