[Python] Pandas to_feather no longer works - runs out of memory #28729

@asfimport

Description

Since upgrading to pyarrow 4.0.1, writing feather files with the pandas to_feather method uses far more memory than before.

For reference, I have a dataframe that is around 10 GB in size, with 25 million rows. Writing a feather file took around 3-4 GB of memory in pyarrow versions up to 3.0.0. As of 4.0.1, I don't know how much memory it takes to write successfully - I tried running it on an AWS machine with 120 GB of RAM, and that wasn't sufficient.

I can't provide the dataframe, but I can give an outline of the types / sizes of the columns:

size (bytes)   type
206663144      int64
206663144      int64
206663144      float64
206663144      float64
2882448709     object
5813798687     object
206663144      float64
206663144      int64
206663144      int64
206663144      int64
206663144      int64
206663144      float64

Environment: Linux
Reporter: Roland Swingler

Note: This issue was originally created as ARROW-13014. Please see the migration documentation for further details.
