
[Python] too large memory cost using pyarrow.parquet.read_table with use_threads=True #22462

@asfimport

Description

I tried to load a Parquet file of about 1.8 GB using the following code. It crashed with an out-of-memory error.

import pyarrow.parquet as pq

# use_threads defaults to True, which triggers the out-of-memory failure
pq.read_table('/tmp/test.parquet')

However, it worked well with use_threads=False, as follows:

pq.read_table('/tmp/test.parquet', use_threads=False)

If pyarrow is downgraded to 0.12.1, the problem does not occur.
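
A lower-memory alternative, when the data can be processed incrementally, is to stream the file as record batches instead of materializing the whole table at once. This is a minimal sketch assuming a pyarrow release recent enough to provide ParquetFile.iter_batches (which did not exist in the versions discussed in this report):

import pyarrow.parquet as pq

# Open the file lazily; nothing is read into memory yet.
pf = pq.ParquetFile('/tmp/test.parquet')

# Stream the data so only one batch is resident in memory at a time.
for batch in pf.iter_batches(batch_size=64_000):
    # batch is a pyarrow.RecordBatch; process it here and let it be freed.
    print(batch.num_rows)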

Reporter: Kun Liu
Assignee: Ben Kietzman / @bkietz

Note: This issue was originally created as ARROW-6060. Please see the migration documentation for further details.
