
BUG: avoid conversion to object dtype in to_numpy for Arrow dtype and with dtype-compatible fill value #54808

@jorisvandenbossche

Description

Using the current main branch (2.1.0rc0+115.gca429943c7), converting an Arrow-backed boolean array to numpy gives object dtype when there are missing values:

>>> import pandas as pd
>>> import pyarrow as pa

# no missing values -> bool dtype
>>> ser = pd.Series([True, False, True], dtype=pd.ArrowDtype(pa.bool_()))
>>> ser.to_numpy()
array([ True, False,  True])

# missing values -> object dtype
>>> ser = pd.Series([True, False, None], dtype=pd.ArrowDtype(pa.bool_()))
>>> ser.to_numpy()
array([True, False, <NA>], dtype=object)

This follows pyarrow's own behaviour when converting to a numpy array (and I think it also makes sense, unless we would rather raise an error instead, but that's a discussion for another issue).
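For reference, a minimal illustration of that pyarrow behaviour (assuming a recent pyarrow; boolean arrays are bit-packed in Arrow, so zero_copy_only=False is needed for the conversion):

# pyarrow's own conversion also falls back to object dtype for nulls
>>> pa.array([True, False, None], type=pa.bool_()).to_numpy(zero_copy_only=False)
array([True, False, None], dtype=object)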

But when a user specifies an na_value precisely to avoid this missing-value problem, we still return an object dtype:

>>> ser.to_numpy(na_value=False)
array([True, False, False], dtype=object)

However, if the specified na_value is valid for the default target dtype (bool in the example above), we could perfectly well return a proper bool dtype array automatically. Currently you have to be explicit: to_numpy(np.bool_, na_value=False).
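For comparison, a sketch of the explicit workaround that works today next to the behaviour proposed here (the second output is the expected result, not what main currently returns):

>>> import numpy as np

# explicit dtype + na_value already gives a proper bool array
>>> ser.to_numpy(dtype=np.bool_, na_value=False)
array([ True, False, False])

# proposed: infer the bool dtype automatically when na_value fits it
>>> ser.to_numpy(na_value=False)
array([ True, False, False])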

    Labels

    Arrow, pyarrow functionality
