Closed
Description
For large inputs (roughly 500 MB and up), the pyarrow.array constructor hangs and keeps consuming memory until the process is killed (by hand or by the OOM killer).
import pyarrow as pa
my_string = 'a' * 40
strings = [my_string for _ in range(100_000_000)]
pyarrow_array = pa.array(strings[:50_000_000])  # this completes in a couple of seconds
pyarrow_array = pa.array(strings[:60_000_000])  # this hangs and consumes all free memory
In pyarrow==3.0.0 it works seamlessly.
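The related issue listed below attributes the hang to values that do not fit in a single chunk. As a rough sketch of why the two slices above behave differently, assuming the default Arrow string type with signed 32-bit offsets (so one chunk holds at most 2**31 - 1 bytes of character data), the arithmetic can be checked in plain Python. The helper names payload_bytes and slice_bounds are hypothetical and exist only for this illustration.

```python
# Assumption: a single Arrow string chunk uses signed 32-bit offsets,
# so its character data cannot exceed 2**31 - 1 bytes.
INT32_MAX = 2**31 - 1


def payload_bytes(n_strings, string_len=40):
    """Total character bytes for n_strings ASCII strings of equal length."""
    return n_strings * string_len


def slice_bounds(n_strings, string_len=40, limit=INT32_MAX):
    """Yield (start, stop) index pairs whose payload stays within limit."""
    per_chunk = limit // string_len  # strings that fit in one chunk
    for start in range(0, n_strings, per_chunk):
        yield start, min(start + per_chunk, n_strings)


# 50M strings of 40 bytes fit in one chunk; 60M do not.
print(payload_bytes(50_000_000) <= INT32_MAX)  # True
print(payload_bytes(60_000_000) <= INT32_MAX)  # False

# Slice boundaries that keep every piece under the limit.
print(list(slice_bounds(60_000_000)))
```

A possible workaround on affected versions, not confirmed in this report, would be to convert each slice separately with pa.array(strings[start:stop]) and combine the results with pa.chunked_array.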
Environment:
- Linux 4.19.0-13-amd64 #1 SMP Debian 4.19.160-2 (2020-11-28) x86_64 GNU/Linux, Python 3.7.6
- Darwin 19.6.0 Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64 x86_64, Python 3.8.6
Reporter: Mikhail
Related issues:
- [C++][Python] Converter::Extend gets stuck in infinite loop causing OOM if values don't fit in single chunk (duplicates)
Note: This issue was originally created as ARROW-13406. Please see the migration documentation for further details.