[Python][C++] UnionArray with invalid data passes validation / leads to segfaults #22550

@asfimport

From the Python side, you can create an "invalid" UnionArray:

import pyarrow as pa

binary = pa.array([b'a', b'b', b'c', b'd'], type='binary')
int64 = pa.array([1, 2, 3], type='int64')
types = pa.array([0, 1, 0, 0, 2, 1, 0], type='int8')   # <- type id 2 is out of bounds: there are only 2 children
value_offsets = pa.array([0, 0, 2, 1, 1, 2, 3], type='int32')

a = pa.UnionArray.from_dense(types, value_offsets, [binary, int64])

For example, converting it to Python objects leads to a segfault:

In [7]: a.to_pylist()
Segmentation fault (core dumped)

On the other hand, doing an explicit validation does not give an error:

In [8]: a.validate()

Should validation raise an error in this case? Currently the C++ ValidateVisitor does nothing for UnionArray.

If it did, the Python API could call it to avoid creating invalid arrays and the segfaults that follow.
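
To make the question concrete, here is a rough sketch of the kind of check such a validation could perform on a dense union. It is written in plain Python over lists and is not the pyarrow or Arrow C++ implementation. The helper name `check_dense_union` and its parameters are hypothetical, and it assumes type ids map positionally onto the children (no custom type codes).

```python
def check_dense_union(type_ids, value_offsets, child_lengths):
    """Return a list of problems found in dense-union buffers (empty if none).

    Hypothetical helper for illustration only; not part of pyarrow.
    Assumes type id i refers to the i-th child.
    """
    problems = []
    if len(type_ids) != len(value_offsets):
        problems.append("type_ids and value_offsets differ in length")
        return problems
    for i, (t, off) in enumerate(zip(type_ids, value_offsets)):
        # Each type id must select an existing child.
        if not 0 <= t < len(child_lengths):
            problems.append(f"slot {i}: type id {t} has no matching child")
            continue
        # Each offset must point inside the selected child.
        if not 0 <= off < child_lengths[t]:
            problems.append(f"slot {i}: offset {off} out of bounds for child {t}")
    return problems


# The buffers from the report: two children (binary of length 4, int64 of length 3).
types = [0, 1, 0, 0, 2, 1, 0]
value_offsets = [0, 0, 2, 1, 1, 2, 3]
print(check_dense_union(types, value_offsets, [4, 3]))
```

For the arrays in this report, the sketch flags only slot 4, whose type id 2 does not correspond to either child. All offsets are within bounds.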

Reporter: Joris Van den Bossche / @jorisvandenbossche
Assignee: Antoine Pitrou / @pitrou

Note: This issue was originally created as ARROW-6157. Please see the migration documentation for further details.
