[Python] read_row_group fails with Nested data conversions not implemented for chunked array outputs #21526

@asfimport

Description

Hey, I'm trying to concatenate two Parquet files. To avoid reading everything into memory at once, I wanted to use read_row_group, but it fails.

I think it's due to fields like these:

pyarrow.Field<to: list<item: string>>

But I'm not sure. Is this a duplicate? The issue linked in the code (ARROW-3762) is already resolved:

// ARROW-3762(wesm): If inout_array is a chunked array, we reject as this is

The stack trace is:

  File "/data/teftel/teftel-data/teftel_data/parquet_stream.py", line 163, in read_batches
    table = pf.read_row_group(ix, columns=self._columns)
  File "/home/kuba/.local/share/virtualenvs/teftel-o6G5iH_l/lib/python3.6/site-packages/pyarrow/parquet.py", line 186, in read_row_group
    use_threads=use_threads)
  File "pyarrow/_parquet.pyx", line 695, in pyarrow._parquet.ParquetReader.read_row_group
  File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Nested data conversions not implemented for chunked array outputs

Reporter: Jakub Okoński

Note: This issue was originally created as ARROW-5030. Please see the migration documentation for further details.
