Closed
Description
Reading a dataset with a dictionary column, where some of the files contain no data for that column (so the column is typed as null in those files), broke with #9532. It worked in the 3.0 release, so I consider this a regression.
This can be reproduced using the following Python snippet:
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds
table = pa.table({"a": [None, None]})
pq.write_table(table, "test.parquet")
schema = pa.schema([pa.field("a", pa.dictionary(pa.int32(), pa.string()))])
fsds = ds.FileSystemDataset.from_paths(
    paths=["test.parquet"],
    schema=schema,
    format=pa.dataset.ParquetFileFormat(),
    filesystem=pa.fs.LocalFileSystem(),
)
fsds.to_table()

The exception on master is currently:
---------------------------------------------------------------------------
ArrowNotImplementedError Traceback (most recent call last)
<ipython-input-14-5f0bc602f16b> in <module>
6 filesystem=pa.fs.LocalFileSystem(),
7 )
----> 8 fsds.to_table()
~/Development/arrow/python/pyarrow/_dataset.pyx in pyarrow._dataset.Dataset.to_table()
456 table : Table instance
457 """
--> 458 return self._scanner(**kwargs).to_table()
459
460 def head(self, int num_rows, **kwargs):
~/Development/arrow/python/pyarrow/_dataset.pyx in pyarrow._dataset.Scanner.to_table()
2887 result = self.scanner.ToTable()
2888
-> 2889 return pyarrow_wrap_table(GetResultValue(result))
2890
2891 def take(self, object indices):
~/Development/arrow/python/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()
139 cdef api int pyarrow_internal_check_status(const CStatus& status) \
140 nogil except -1:
--> 141 return check_status(status)
142
143
~/Development/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
116 raise ArrowKeyError(message)
117 elif status.IsNotImplemented():
--> 118 raise ArrowNotImplementedError(message)
119 elif status.IsTypeError():
120 raise ArrowTypeError(message)
ArrowNotImplementedError: Unsupported cast from null to dictionary<values=string, indices=int32, ordered=0> (no available cast function for target type)

Reporter: Uwe Korn / @xhochy
Assignee: Krisztian Szucs / @kszucs
PRs and other links:
Note: This issue was originally created as ARROW-12420. Please see the migration documentation for further details.