[C++][Python] RecordBatch.filter() segfaults if passed a ChunkedArray #38770

@nph

Describe the bug, including details regarding any error messages, version, and platform.

Filtering a RecordBatch with a boolean mask passed as a ChunkedArray crashes the Python process: the call aborts with an uncaught std::bad_variant_access instead of raising a Python exception.

Reproduction on PyArrow 14.0.1:

import pandas as pd
import pyarrow as pa

# Create a simple record batch
df = pd.DataFrame(range(1, 7), columns=['x'])
record_batch = pa.RecordBatch.from_pandas(df)

# Generate a boolean mask as a chunked array
mask = pa.chunked_array([[True, False], [True, False], [True, False]])

# Flatten the chunked array mask and use this to filter the RecordBatch - all good
print(record_batch.filter(mask.combine_chunks()))
# pyarrow.RecordBatch
# x: int64 
# ----
# x: [1,3,5]

# Try filtering with the original chunked array - crashes the process
print(record_batch.filter(mask))
# libc++abi: terminating with uncaught exception of type std::bad_variant_access: bad_variant_access
# Abort trap: 6

Note: this problem doesn't occur when filtering a Table.

Component(s)

Python
