[Python] Prevent corrupting files with Multiple matches for FieldRef.Name #32660

@asfimport

Version: pyarrow 9.0.0

Description

Users can add a column with the same name as an existing column to a table via pyarrow.Table.add_column().

The resulting table can then be written to a Parquet file with pyarrow.parquet.write_table().

However, the written file cannot be read back with pyarrow.parquet.read_table(), because it contains multiple columns with the same name.

I am flagging this as a bug because I believe anything that write_table() writes successfully should also be readable by read_table().
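One way to avoid producing such a file in the meantime is to check for repeated column names before writing. The following is only a minimal sketch, not pyarrow's behavior or a proposed fix: `find_duplicate_names` and `check_unique_names` are hypothetical helpers that operate on a plain list of names, such as the one returned by `table.column_names`.

```python
from collections import Counter


def find_duplicate_names(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)
    # dict.fromkeys keeps the first occurrence of each name and preserves order.
    return [name for name in dict.fromkeys(names) if counts[name] > 1]


def check_unique_names(names):
    """Raise ValueError if any name is repeated (hypothetical pre-write guard)."""
    duplicates = find_duplicate_names(names)
    if duplicates:
        raise ValueError(f"duplicate column names: {duplicates}")


# A list with no repeats passes silently.
check_unique_names(["a", "b"])
```

With pyarrow, the guard could be called as `check_unique_names(table.column_names)` just before `pq.write_table(table, path)`, so the failure surfaces at write time rather than at read time.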

 

Minimum reproducible example


>>> import pyarrow.parquet as pq
>>> import pyarrow as pa
>>> t = pa.Table.from_pydict({'a': [1,2,3]})
>>> pq.write_table(t.add_column(0, 'a', pa.array([1.1,2.2,3.3])), 'test.parquet')
>>> pq.read_table('test.parquet')
pyarrow.lib.ArrowInvalid: Multiple matches for FieldRef.Name(a) in a: double
a: int64
__fragment_index: int32
__batch_index: int32
__last_in_fragment: bool
__filename: string
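As a possible workaround on the writing side, the duplicated names can be made unique before the table is written. This is a sketch under assumptions: `dedupe_names` and its `_1`, `_2` suffix scheme are hypothetical and not part of pyarrow.

```python
def dedupe_names(names):
    """Return a list of unique names, suffixing repeats with _1, _2, ...

    The suffix scheme is an illustrative assumption; any scheme that
    yields unique names would serve the same purpose.
    """
    used = set()
    result = []
    for name in names:
        candidate = name
        suffix = 0
        # Keep bumping the suffix until the candidate is not already taken.
        while candidate in used:
            suffix += 1
            candidate = f"{name}_{suffix}"
        used.add(candidate)
        result.append(candidate)
    return result


# Column names of the table in the example above: two columns named 'a'.
print(dedupe_names(["a", "a"]))
```

The result could then be applied with `table.rename_columns(dedupe_names(table.column_names))` before calling `pq.write_table()`. This only helps before writing; it does not recover a file that already contains duplicate names.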

Environment: MacOS, Python 3.10.3
Reporter: Grayden Shand
Assignee: Miles Granger / @milesgranger

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-17388. Please see the migration documentation for further details.
