Description
The following shouldn't throw (reading a utf8 Parquet column through a dataset whose schema declares it as large_utf8):
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> import pyarrow.dataset as ds
>>> pa.__version__
'2.0.0'
>>> schema = pa.schema([pa.field("utf8", pa.utf8())])
>>> table = pa.Table.from_pydict({"utf8": ["foo", "bar"]}, schema)
>>> pq.write_table(table, "/tmp/example.parquet")
>>> large_schema = pa.schema([pa.field("utf8", pa.large_utf8())])
>>> ds.dataset("/tmp/example.parquet", schema=large_schema,
...            format="parquet").to_table()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/_dataset.pyx", line 405, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2262, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 107, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: fields had matching names but differing types. From: utf8: string To: utf8: large_string
Reporter: Micah Kornfield / @emkornfield
Related issues:
- [C++][Dataset] Schema evolution in Dataset scanning (is related to)
Note: This issue was originally created as ARROW-11353. Please see the migration documentation for further details.