I just noticed that no matter which columns are specified when loading a dataset, the partition column is always returned. This can lead to surprising behaviour, as the resulting dataframe has more columns than expected:
import dask as da
import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd
import os
import numpy as np
import shutil
PATH_PYARROW_MANUAL = '/tmp/pyarrow_manual.pa/'
if os.path.exists(PATH_PYARROW_MANUAL):
    shutil.rmtree(PATH_PYARROW_MANUAL)
os.mkdir(PATH_PYARROW_MANUAL)
arrays = np.array([np.array([0, 1, 2]), np.array([3, 4]), np.nan, np.nan], dtype=object)
strings = np.array([np.nan, np.nan, 'a', 'b'], dtype=object)
df = pd.DataFrame([0, 0, 1, 1], columns=['partition_column'])
df.index.name = 'DPRD_ID'
df['arrays'] = pd.Series(arrays)
df['strings'] = pd.Series(strings)
my_schema = pa.schema([('DPRD_ID', pa.int64()),
                       ('partition_column', pa.int32()),
                       ('arrays', pa.list_(pa.int32())),
                       ('strings', pa.string()),
                       ('new_column', pa.string())])
table = pa.Table.from_pandas(df, schema=my_schema)
pq.write_to_dataset(table, root_path=PATH_PYARROW_MANUAL, partition_cols=['partition_column'])
df_pq = pq.ParquetDataset(PATH_PYARROW_MANUAL).read(columns=['DPRD_ID', 'strings']).to_pandas()
# pd.read_parquet(PATH_PYARROW_MANUAL, columns=['DPRD_ID', 'strings'], engine='pyarrow')
df_pq
df_pq has the column partition_column, even though only DPRD_ID and strings were requested.
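A possible workaround (a sketch, not part of the original report): drop the extra column after reading, or use the newer pyarrow.dataset API, which only returns the columns that were asked for. This assumes a pyarrow version that ships pyarrow.dataset (roughly 1.0 and later); the names requested, dataset and df_ds are illustrative, and the path and column names are taken from the reproduction above.

import pyarrow.dataset as ds
import pyarrow.parquet as pq

requested = ['DPRD_ID', 'strings']

# Workaround 1: read as before, then drop the unwanted partition column.
df_pq = (pq.ParquetDataset(PATH_PYARROW_MANUAL)
           .read(columns=requested)
           .to_pandas()
           .drop(columns=['partition_column'], errors='ignore'))

# Workaround 2: with the dataset API, column projection is honoured and the
# partition column only appears when it is listed explicitly.
dataset = ds.dataset(PATH_PYARROW_MANUAL, format='parquet', partitioning='hive')
df_ds = dataset.to_table(columns=requested).to_pandas()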
Reporter: Christian Thiel / @c-thiel
Assignee: Joris Van den Bossche / @jorisvandenbossche
Related issues:
- [Python][Dataset] Support using dataset API in pyarrow.parquet with a minimal ParquetDataset shim (depends upon)
PRs and other links:
Note: This issue was originally created as ARROW-3861. Please see the migration documentation for further details.