Closed
Labels: Component: Python, Critical Fix (bugfixes for security vulnerabilities, crashes, or invalid data), Type: bug, good-second-issue
Description
Describe the bug, including details regarding any error messages, version, and platform.
Filtering a RecordBatch with a boolean mask passed as a ChunkedArray crashes the Python process: an uncaught C++ `std::bad_variant_access` exception aborts the interpreter.
Reproduction on PyArrow 14.0.1:
import pandas as pd
import pyarrow as pa
# Create a simple record batch
df = pd.DataFrame(range(1, 7), columns=['x'])
record_batch = pa.RecordBatch.from_pandas(df)
# Generate a boolean mask as a chunked array
mask = pa.chunked_array([[True, False], [True, False], [True, False]])
# Flatten the chunked array mask and use this to filter the RecordBatch - all good
print(record_batch.filter(mask.combine_chunks()))
# pyarrow.RecordBatch
# x: int64
# ----
# x: [1,3,5]
# Try filtering with the original chunked array - crashes the process
print(record_batch.filter(mask))
# libc++abi: terminating with uncaught exception of type std::bad_variant_access: bad_variant_access
# Abort trap: 6
Note: this problem does not occur when filtering a Table.
Component(s)
Python