Closed
Description
Small reproducer:
import pyarrow as pa
import pyarrow.parquet as pq
table = pa.table({'part': [3760212050]*10, 'col': range(10)})
pq.write_to_dataset(table, "test_int64_partition", partition_cols=['part'])
In [35]: pq.read_table("test_int64_partition/")
...
ArrowInvalid: error parsing '3760212050' as scalar of type int32
In ../src/arrow/scalar.cc, line 333, code: VisitTypeInline(*type_, this)
In ../src/arrow/dataset/partition.cc, line 218, code: (_error_or_value26).status()
In ../src/arrow/dataset/partition.cc, line 229, code: (_error_or_value27).status()
In ../src/arrow/dataset/discovery.cc, line 256, code: (_error_or_value17).status()
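The error message shows that dataset discovery parsed the partition key as int32, and 3760212050 does not fit in that type: the int32 maximum is 2**31 - 1 = 2147483647. The value does fit comfortably in int64, which is what the legacy reader uses below. The following minimal pure-Python sketch shows that range check. `fits_int32` and `narrowest_int_type` are hypothetical helpers for illustration only, not Arrow's actual inference code.

```python
# Signed 32-bit range (two's complement): [-2**31, 2**31 - 1]
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def fits_int32(text: str) -> bool:
    """Return True if a partition string parses to an int inside the int32 range."""
    value = int(text)
    return INT32_MIN <= value <= INT32_MAX


def narrowest_int_type(values) -> str:
    """Hypothetical inference rule: choose "int32" only if every value fits, else "int64"."""
    return "int32" if all(fits_int32(v) for v in values) else "int64"


print(INT32_MAX)                            # 2147483647
print(fits_int32("3760212050"))             # False
print(narrowest_int_type(["3760212050"]))   # int64
```

Inferring int32 without such a fallback is what turns a valid directory name into a parse error.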
In [36]: pq.read_table("test_int64_partition/", use_legacy_dataset=True)
Out[36]:
pyarrow.Table
col: int64
part: dictionary<values=int64, indices=int32, ordered=0>

Reporter: Joris Van den Bossche / @jorisvandenbossche
Assignee: Ben Kietzman / @bkietz
Note: This issue was originally created as ARROW-10145. Please see the migration documentation for further details.