ArrowIOError: Arrow error: Capacity error #2485

@MarkiesFredje

Hi,

I'm using pyarrow 0.10

I have a dataframe of about 90 GB in memory, with one object column containing strings of at most 27 characters.

basket_plateau.to_parquet("basket_plateau.parquet", compression=None) writes the file to disk just fine.
basket_plateau = pd.read_parquet("basket_plateau.parquet") fails, however:

ArrowIOError: Arrow error: Capacity error: BinaryArray cannot contain more than 2147483646 bytes, have 2147483655
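
For scale, the limit quoted in the error is 2^31 - 2 bytes, which suggests a single BinaryArray is addressed with 32-bit offsets. The sketch below is plain arithmetic, assuming every string takes the full 27 bytes (one byte per character); the function name is hypothetical and only there for illustration. It estimates how many such rows fit into one array before the limit is hit.

```python
# Limit quoted in the error message above.
LIMIT_BYTES = 2_147_483_646

# Longest string reported in this issue; assumes 1 byte per character.
MAX_STRING_BYTES = 27


def max_rows_per_array(limit_bytes: int, bytes_per_value: int) -> int:
    """Return how many fixed-size values fit under the byte limit."""
    return limit_bytes // bytes_per_value


# The limit is two bytes short of 2**31.
print(LIMIT_BYTES == 2**31 - 2)  # True

# Worst-case number of 27-byte strings that fit in one array.
print(max_rows_per_array(LIMIT_BYTES, MAX_STRING_BYTES))  # 79536431
```

So with strings this short, a column of roughly 80 million rows or more could already exceed the limit if it is read into one array.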

I can reproduce this exact same error when I use pyarrow directly:
pq.write_table(pa.Table.from_pandas(basket_plateau), "basket_plateau.parquet")
basket_plateau = pq.read_table("basket_plateau.parquet")

Kr.
Fred
