Open
Description
When a table is written as a partitioned dataset with the legacy datasets implementation, an Int64 column that contains nulls may not round-trip exactly: large values can come back changed.
Simple reproduction:
```python
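import tempfile

import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.parquet as pq

# 'x' is an int64 column holding a null and a value larger than 2**53;
# 'y' is the partition column.
table = pa.table({
    'x': pa.array([None, 7753285016841556620]),
    'y': pa.array(['a', 'b'])
})

# Write a dataset partitioned on 'y', then read it back.
ds_dir = tempfile.mkdtemp()
pq.write_to_dataset(table, ds_dir, partition_cols=['y'])
table_after = ds.dataset(ds_dir).to_table()

print(table['x'])
print(table_after['x'])
# Fails: the non-null value differs after the round trip.
assert table['x'] == table_after['x']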
```

```
[
  [
    null,
    7753285016841556620
  ]
]
[
  [
    null
  ],
  [
    7753285016841556992
  ]
]
```
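
The changed digits are what one would see if the value passed through a 64-bit float somewhere along the way. The report does not state the cause, so treat this as an assumption; the sketch below only checks the arithmetic in plain Python, without pyarrow. The variable names are illustrative.

```python
# Value written in the reproduction, and value read back from the dataset.
original = 7753285016841556620
read_back = 7753285016841556992

# A Python float is an IEEE 754 double, the same representation as float64.
via_float = int(float(original))

# Doubles in [2**62, 2**63) are spaced 2**10 = 1024 apart, so most int64
# values of this magnitude cannot be represented exactly.
assert 2**62 <= original < 2**63
assert via_float % 1024 == 0

# The float round trip reproduces the value read back from the dataset.
assert via_float == read_back
print(via_float - original)  # 372
```

Any int64 above 2**53 is at risk of this kind of change once it is converted to float64, which is why the small values typically used in tests would not reveal the problem.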
Reporter: Will Jones / @wjones127
Related issues:
- [Python] Long-term fate of pyarrow.parquet.ParquetDataset (is related to)
Note: This issue was originally created as ARROW-15725. Please see the migration documentation for further details.