Closed
Labels: Arrow (pyarrow functionality)
Description
On the current main branch (2.1.0rc0+115.gca429943c7), converting an Arrow-backed boolean array to NumPy gives object dtype when there are missing values:
```python
>>> import pandas as pd
>>> import pyarrow as pa

# no missing values -> bool dtype
>>> ser = pd.Series([True, False, True], dtype=pd.ArrowDtype(pa.bool_()))
>>> ser.to_numpy()
array([ True, False,  True])

# missing values -> object dtype
>>> ser = pd.Series([True, False, None], dtype=pd.ArrowDtype(pa.bool_()))
>>> ser.to_numpy()
array([True, False, <NA>], dtype=object)
```

This follows pyarrow's own behaviour when converting to a NumPy array, and I think it also makes sense (unless we want to raise an error instead, but that is a separate discussion).
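Object dtype is the natural fallback because NumPy's bool dtype has no missing-value sentinel. The NumPy-only sketch below (no pandas involved) shows the two lossless options: an object array holding a missing marker, or a plain bool array paired with a separate validity mask. The names `values` and `valid` are illustrative only.

```python
import numpy as np

# NumPy's bool dtype has no slot for "missing", so mixing in None
# forces the resulting array to object dtype.
with_missing = np.array([True, False, None])
print(with_missing.dtype)  # object

# A lossless alternative: keep a plain bool array plus a separate
# validity mask (True = value present). The entry at the missing
# position is just a placeholder and carries no meaning.
values = np.array([True, False, False])
valid = np.array([True, True, False])
print(values.dtype, valid.dtype)  # bool bool
```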
But when a user specifies an `na_value` precisely to avoid the problem of missing values in NumPy, we still return object dtype:

```python
>>> ser.to_numpy(na_value=False)
array([True, False, False], dtype=object)
```

If the specified `na_value` is valid for the default target dtype (bool, as in the example above), we could return a proper bool array automatically. Currently you have to be explicit: `to_numpy(np.bool_, na_value=False)`.
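A rough sketch of the proposed rule, written against plain NumPy so it does not depend on pandas internals: if `na_value` is itself a boolean, the filled result can keep bool dtype; otherwise it falls back to object. `to_numpy_with_na` is a hypothetical helper, not a pandas API, and a real implementation would presumably check compatibility with the target dtype more generally than the `isinstance` test used here.

```python
import numpy as np


def to_numpy_with_na(data, valid, na_value):
    """Hypothetical helper: fill missing booleans, then pick the output dtype.

    data  -- bool ndarray with arbitrary placeholders at missing positions
    valid -- bool ndarray, True where the value is present
    """
    if isinstance(na_value, (bool, np.bool_)):
        # The fill value fits bool dtype, so the result can stay bool.
        out = data.copy()
        out[~valid] = na_value
        return out
    # Otherwise (e.g. na_value=None) fall back to object dtype.
    out = data.astype(object)
    out[~valid] = na_value
    return out


data = np.array([True, False, False])
valid = np.array([True, True, False])
print(to_numpy_with_na(data, valid, False).dtype)  # bool
print(to_numpy_with_na(data, valid, None).dtype)   # object
```

The copy keeps the caller's `data` untouched; only the returned array has the fill value written into it.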