Version=7.0.0 introduces bug when filtering by empty set during load #31464

This issue is present in pyarrow v7.0.0, but not in v6.0.1.

PyArrow raises an error when reading a Parquet file with an empty `in` filter on a categorical or string column (columns "E" and "F" in the example below). Interestingly, the issue does not occur in v7.0.0 when filtering on a float, timestamp, or integer column ("A" through "D").

The following Python code presents a minimal example which reproduces the issue:
```python

import pandas as pd
import numpy as np
path = './example_df.parquet'
df = pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("20130102"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)
df.to_parquet(path)

# Works!
df_read = pd.read_parquet(
    path,
    filters=[
        [
            ("A", "in", set())
        ]
    ]
)

# Pyarrow v6.0.1 and v7.0.0
#
# Empty DataFrame
# Columns: [A, B, C, D, E, F]
# Index: []
print(df_read)

# Fails!
df_read = pd.read_parquet(
    path,
    filters=[
        [
            ("F", "in", set())
        ]
    ]
)

# Pyarrow v6.0.1
#
# Empty DataFrame
# Columns: [A, B, C, D, E, F]
# Index: []

# Pyarrow v7.0.0
#
# pyarrow.lib.ArrowInvalid: Array type didn't match type of values set: string vs null
print(df_read) 
```

**Environment**:
- pandas 1.3.5
- pyarrow 7.0.0
- python 3.10.4

**Reporter**: [Damian Barabonkov](https://issues.apache.org/jira/browse/ARROW-16045) / @DamianBarabonkovQC

<sub>**Note**: *This issue was originally created as [ARROW-16045](https://issues.apache.org/jira/browse/ARROW-16045). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further details.*</sub>
