When saving a pandas DataFrame to Parquet, a categorical column whose categories are boolean is saved as a plain boolean column.
This is a problem because, when the Parquet file is read back, I expect the column to still be categorical, but it no longer is.
Reproducible example:
import pandas as pd
import pyarrow
import pyarrow.parquet

# Create a dataframe with a boolean column, then convert it to categorical
df = pd.DataFrame({'a': [True, True, False, True, False]})
df['a'] = df['a'].astype('category')

# Convert to an Arrow Table and save it to disk
table = pyarrow.Table.from_pandas(df)
pyarrow.parquet.write_table(table, 'test.parquet')

# Reload the data and convert back to pandas
table_rel = pyarrow.parquet.read_table('test.parquet')
df_rel = table_rel.to_pandas()

The Arrow Table variable correctly converts the column to an Arrow DICTIONARY type:
>>> df['a']
0 True
1 True
2 False
3 True
4 False
Name: a, dtype: category
Categories (2, object): [False, True]
>>>
>>> table
pyarrow.Table
a: dictionary<values=bool, indices=int8, ordered=0>
However, the reloaded column is now a regular boolean:
>>> table_rel
pyarrow.Table
a: bool
>>>
>>> df_rel['a']
0 True
1 True
2 False
3 True
4 False
Name: a, dtype: bool
I would have expected the column to be read back as categorical.
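For now I can work around this by re-casting the column after reading. This is only a minimal sketch, and it assumes I know up front which columns were categorical and what their categories were (here the column 'a' and the file 'test.parquet' from the example above):

import pandas as pd
import pyarrow.parquet

# Read the file back; the categorical dtype is lost at this point
df_rel = pyarrow.parquet.read_table('test.parquet').to_pandas()

# Manually restore the categorical dtype, assuming the original categories are known
df_rel['a'] = df_rel['a'].astype(pd.CategoricalDtype(categories=[False, True]))
print(df_rel['a'].dtype)  # category

This obviously does not scale to DataFrames where the categorical columns are not known in advance, which is why I would expect the round trip to preserve the dtype on its own.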
Reporter: Joao Moreira
Related issues:
- [C++][Parquet] Support direct dictionary decoding of types other than BYTE_ARRAY (is blocked by)
- [Python] Consistent handling of categoricals (is related to)
Note: This issue was originally created as ARROW-13342. Please see the migration documentation for further details.