Description
Hey, I'm trying to concatenate two Parquet files and, to avoid reading everything into memory at once, I wanted to use read_row_group for my solution, but it fails.
I think it's due to fields like this one:
pyarrow.Field<to: list<item: string>>
But I'm not sure. Is this a duplicate? The issue linked in the code below is already resolved:
arrow/cpp/src/parquet/arrow/reader.cc, line 915 at commit fd0b90a:
// ARROW-3762(wesm): If inout_array is a chunked array, we reject as this is
The stack trace is:
File "/data/teftel/teftel-data/teftel_data/parquet_stream.py", line 163, in read_batches
table = pf.read_row_group(ix, columns=self._columns)
File "/home/kuba/.local/share/virtualenvs/teftel-o6G5iH_l/lib/python3.6/site-packages/pyarrow/parquet.py", line 186, in read_row_group
use_threads=use_threads)
File "pyarrow/_parquet.pyx", line 695, in pyarrow._parquet.ParquetReader.read_row_group
File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Nested data conversions not implemented for chunked array outputs
Reporter: Jakub Okoński
Related issues:
- [C++] Support nested data conversions for chunked array (is duplicated by)
- [C++][Parquet] 16MB limit on (nested) column chunk prevents tuning row_group_size (relates to)
- [C++] Parquet arrow::Table reads error when overflowing capacity of BinaryArray (relates to)
Note: This issue was originally created as ARROW-5030. Please see the migration documentation for further details.