I'm using pyarrow to read a 40MB parquet file.
When reading all columns except the "body" column, the process peaks at 170MB.
Reading only the "body" column results in a peak of over 6GB of memory used.
I made the file publicly accessible: s3://dhavivresearch/pyarrow/demofile.parquet
Reporter: Daniel Haviv
Assignee: Wes McKinney / @wesm
Note: This issue was originally created as ARROW-5993. Please see the migration documentation for further details.