Description
Hey, I'm trying to concatenate two Parquet files and, to avoid reading everything into memory at once, I wanted to use read_row_group for my solution, but it fails.
I think it's due to fields like this one:
pyarrow.Field<to: list<item: string>>
But I'm not sure. Is this a duplicate? The issue linked in the code below is already resolved:
arrow/cpp/src/parquet/arrow/reader.cc, line 915 at commit fd0b90a:
// ARROW-3762(wesm): If inout_array is a chunked array, we reject as this is
The stack trace is:
File "/data/teftel/teftel-data/teftel_data/parquet_stream.py", line 163, in read_batches
table = pf.read_row_group(ix, columns=self._columns)
File "/home/kuba/.local/share/virtualenvs/teftel-o6G5iH_l/lib/python3.6/site-packages/pyarrow/parquet.py", line 186, in read_row_group
use_threads=use_threads)
File "pyarrow/_parquet.pyx", line 695, in pyarrow._parquet.ParquetReader.read_row_group
File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Nested data conversions not implemented for chunked array outputs
Reporter: Jakub Okoński
Related issues:
- [C++] Support nested data conversions for chunked array (is duplicated by)
- [C++][Parquet] 16MB limit on (nested) column chunk prevents tuning row_group_size (relates to)
- [C++] Parquet arrow::Table reads error when overflowing capacity of BinaryArray (relates to)
Note: This issue was originally created as ARROW-5030. Please see the migration documentation for further details.