Description
Apologies if this is a duplicate; I haven't found anything related.
When creating an Arrow table via the Python API, the following code runs out of memory after using all the available resources on a box with 512 GB of RAM. This happens with pyarrow 4.0.0 and 4.0.1. However, when running the same code with pyarrow 3.0.0, memory usage only reaches about 5 GB (which seems like the appropriate ballpark for the table size).
The code generates a table with a single string column of 1M rows, each string being 3000 characters long.
I am not sure whether the issue is Python-specific; I haven't tried replicating it from the C++ API.
import string

import numpy as np
import pyarrow as pa

print(pa.__version__)

np.random.seed(42)
alphabet = list(string.ascii_uppercase)

# Build 1M strings of 3000 characters each
# (1000 distinct random strings, each repeated 1000 times).
_col = []
for _n in range(1000):
    k = ''.join(np.random.choice(alphabet, 3000))
    _col += [k] * 1000

table = pa.Table.from_pydict({'col': _col})

Reporter: Laurent Mazare
Assignee: David Li / @lidavidm
Related issues:
- [Python] Pandas to_feather no longer works - runs out of memory (is duplicated by)
- [Python] Processes killed and semaphore objects leaked when reading pandas data (is duplicated by)
- [Python] pyarrow.array memory leak on large string arrays (is duplicated by)
PRs and other links:
Note: This issue was originally created as ARROW-12983. Please see the migration documentation for further details.