
Arrow freezes on write if chunk_size=0 #16342

@asfimport

Description

PyArrow hangs when writing a Parquet file with chunk_size=0. This is easy to hit by accident when the chunk size is computed from the table length and the data turns out to be short (see example).

Expected behaviour: either handle this gracefully (e.g. fall back to the chunk_size=None behaviour) or raise an error.

import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
cols = 'A', 'B', 'C', 'D'
row = np.arange(4)
data = pd.DataFrame([row], columns=cols)
table = pa.Table.from_pandas(data.reset_index(), timestamps_to_ms=True)
pq.write_table(table, 'test.pq', chunk_size=int(len(data) / 4))  # len(data) == 1, so chunk_size == 0
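
Until this is handled in the library, a caller-side guard can avoid passing 0. The following is only a sketch: `safe_chunk_size` is a hypothetical helper, not part of pyarrow, and it assumes that chunk_size=None selects the default behaviour, as described above.

```python
from typing import Optional


def safe_chunk_size(n_rows: int, n_chunks: int) -> Optional[int]:
    """Hypothetical helper: rows per chunk, or None when the table is too short."""
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    size = n_rows // n_chunks
    # A computed size of 0 is the value that triggers the hang reported here,
    # so fall back to None (assumed to mean "use the default") instead.
    return size if size > 0 else None
```

With this helper, the last line of the example would become `pq.write_table(table, 'test.pq', chunk_size=safe_chunk_size(len(data), 4))`, which passes None for the one-row table.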

Environment: Linux, macOS
Reporter: Jonathan Chambers
Assignee: Wes McKinney / @wesm

Note: This issue was originally created as ARROW-723. Please see the migration documentation for further details.
