Description
Since upgrading to 4.0.1, writing Feather files with the pandas `to_feather` method uses far, far more memory.
For reference, I have a dataframe that is around 10 GB in size, with 25 million rows. Writing a Feather file took around 3-4 GB of memory in pyarrow versions up to 3.0.0. As of 4.0.1, I don't know how much memory a successful write would take - I tried running on an AWS machine with 120 GB of RAM, and that wasn't sufficient.
I can't provide the dataframe, but I can give an outline of the types and sizes of the columns (a synthetic reproduction sketch follows the table):
| size (bytes) | type    |
| ------------ | ------- |
| 206663144    | int64   |
| 206663144    | int64   |
| 206663144    | float64 |
| 206663144    | float64 |
| 2882448709   | object  |
| 5813798687   | object  |
| 206663144    | float64 |
| 206663144    | int64   |
| 206663144    | int64   |
| 206663144    | int64   |
| 206663144    | int64   |
| 206663144    | float64 |
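A rough reproduction sketch, under assumptions: the column names and generated values below are placeholders rather than the real data, and the row count is scaled down from the real 25 million so it runs quickly. The repeated strings only approximate the two large object columns.

```python
import numpy as np
import pandas as pd

# Scaled-down row count; the real dataframe has ~25 million rows.
n_rows = 1_000_000

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "int_a": rng.integers(0, 1_000_000, n_rows),
    "int_b": rng.integers(0, 1_000_000, n_rows),
    "float_a": rng.random(n_rows),
    "float_b": rng.random(n_rows),
    # Object columns: repeated digit strings standing in for the
    # ~2.9 GB and ~5.8 GB string columns in the real dataframe.
    "str_a": pd.Series(rng.integers(0, 10**12, n_rows)).astype(str) * 4,
    "str_b": pd.Series(rng.integers(0, 10**12, n_rows)).astype(str) * 8,
    "float_c": rng.random(n_rows),
    "int_c": rng.integers(0, 1_000_000, n_rows),
    "int_d": rng.integers(0, 1_000_000, n_rows),
    "int_e": rng.integers(0, 1_000_000, n_rows),
    "int_f": rng.integers(0, 1_000_000, n_rows),
    "float_d": rng.random(n_rows),
})

# With pyarrow <= 3.0.0 this completed within a few GB of memory;
# with 4.0.1 memory usage grows until the process is killed.
df.to_feather("test.feather")
```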
Environment: Linux
Reporter: Roland Swingler
Related issues:
- [C++][Python] Converter::Extend gets stuck in infinite loop causing OOM if values don't fit in single chunk (duplicates)
Note: This issue was originally created as ARROW-13014. Please see the migration documentation for further details.