
[Python] too large memory cost using pyarrow.parquet.read_table with use_threads=True #22462

@asfimport

Description

I tried to load a Parquet file of about 1.8 GB using the following code. It crashed with an out-of-memory error.

import pyarrow.parquet as pq

# use_threads defaults to True, which triggers the out-of-memory failure
pq.read_table('/tmp/test.parquet')

However, it worked well with use_threads=False, as follows:

pq.read_table('/tmp/test.parquet', use_threads=False)

If pyarrow is downgraded to 0.12.1, the problem does not occur.
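
A lower-memory alternative, when the data can be processed incrementally, is to stream the file as record batches instead of materializing the whole table at once. This is a minimal sketch assuming a pyarrow release recent enough to provide ParquetFile.iter_batches (which did not exist in the versions discussed in this report):

import pyarrow.parquet as pq

# Open the file lazily; nothing is read into memory yet.
pf = pq.ParquetFile('/tmp/test.parquet')

# Stream the data so only one batch is resident in memory at a time.
for batch in pf.iter_batches(batch_size=64_000):
    # batch is a pyarrow.RecordBatch; process it here and let it be freed.
    print(batch.num_rows)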

Reporter: Kun Liu
Assignee: Ben Kietzman / @bkietz

Note: This issue was originally created as ARROW-6060. Please see the migration documentation for further details.
