Closed
Description
Version: pyarrow 9.0.0
Users can add a column with the same name as an existing column to a table via pyarrow.Table.add_column().
Additionally, that table can be written to a parquet file with pyarrow.parquet.write_table().
However, the written file cannot be read with pyarrow.parquet.read_table() due to having multiple columns with the same name.
Flagging this as a bug because I believe anything that is successfully written by write_table() should be readable by read_table().
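Until the writer and reader agree on this, a caller could guard against it on the write side. The following is only a minimal sketch: `duplicate_column_names` is a hypothetical helper name, not part of pyarrow, and it works on a plain list of names such as the one returned by `Table.column_names`.

```python
from collections import Counter


def duplicate_column_names(names):
    """Return the names that occur more than once, in first-seen order.

    Hypothetical helper for illustration; not a pyarrow API.
    """
    counts = Counter(names)
    # dict.fromkeys keeps first-seen order and drops repeats from the result
    return [name for name in dict.fromkeys(names) if counts[name] > 1]


# Assumed usage before writing a table (shown as a comment, pyarrow not imported here):
#     dupes = duplicate_column_names(table.column_names)
#     if dupes:
#         raise ValueError(f"duplicate column names: {dupes}")
print(duplicate_column_names(["a", "a", "b"]))  # ['a']
```

Failing early like this keeps an unreadable file from being produced in the first place, at the cost of rejecting tables that write_table() itself would accept.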
Minimum reproducible example
>>> import pyarrow.parquet as pq
>>> import pyarrow as pa
>>> t = pa.Table.from_pydict({'a': [1,2,3]})
>>> pq.write_table(t.add_column(0, 'a', pa.array([1.1,2.2,3.3])), 'test.parquet')
>>> pq.read_table('test.parquet')
pyarrow.lib.ArrowInvalid: Multiple matches for FieldRef.Name(a) in a: double
a: int64
__fragment_index: int32
__batch_index: int32
__last_in_fragment: bool
__filename: string
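One possible workaround on the caller's side is to make the column names distinct before writing. This is a sketch under assumptions, not the fix adopted for this issue: `make_unique` is a hypothetical helper, and the `_1`, `_2` suffix scheme is an arbitrary choice.

```python
def make_unique(names):
    """Suffix repeated names with _1, _2, ... so that every name is distinct.

    Hypothetical helper for illustration; not a pyarrow API.
    """
    taken = set(names)  # original names, so a generated suffix never collides with one
    seen = set()        # names already emitted in the result
    result = []
    for name in names:
        candidate = name
        n = 0
        # Keep trying suffixes until the candidate is unused and does not
        # shadow a different original column name.
        while candidate in seen or (candidate != name and candidate in taken):
            n += 1
            candidate = f"{name}_{n}"
        seen.add(candidate)
        result.append(candidate)
    return result


print(make_unique(["a", "a"]))  # ['a', 'a_1']
```

With the table from the example above, the result could be passed to `Table.rename_columns()`, for instance `t2.rename_columns(make_unique(t2.column_names))` where `t2` is the table returned by `add_column()`, before calling `pq.write_table()`.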
Environment: MacOS, Python 3.10.3
Reporter: Grayden Shand
Assignee: Miles Granger / @milesgranger
Related issues:
PRs and other links:
Note: This issue was originally created as ARROW-17388. Please see the migration documentation for further details.