Closed
Description
In NumPy one can index an array with an array of integer indices:
In [2]: import numpy as np
In [3]: a = np.array(['a', 'bb', 'ccc', 'dddd'], dtype="O")
In [4]: indices = np.array([0, -1, 2, 2, 0, 3])
In [5]: a[indices]
Out[5]: array(['a', 'dddd', 'ccc', 'ccc', 'a', 'dddd'], dtype=object)
It would be nice to have a similar feature in pyarrow.
Currently, pa.Array.__getitem__ accepts only a slice or a single integer index.
There are workarounds, of course, such as the two below:
In [6]: import pyarrow as pa
In [7]: a = pa.array(['a', 'bb', 'ccc', 'dddd'])
In [8]: pa.array(a.to_pandas()[indices]) # if len(indices) is high
Out[8]:
<pyarrow.lib.StringArray object at 0x91bd845e8>
[
"a",
"dddd",
"ccc",
"ccc",
"a",
"dddd"
]
In [9]: pa.array([a[i].as_py() for i in indices]) # if len(indices) is low
Out[9]:
<pyarrow.lib.StringArray object at 0x91bc14868>
[
"a",
"dddd",
"ccc",
"ccc",
"a",
"dddd"
]
Neither workaround is memory- or CPU-efficient.
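To pin down the behavior being requested, here is a minimal plain-Python sketch of the gather semantics shown in the NumPy example: each index selects one element, and negative indices count from the end. The `take` helper is a hypothetical name used only for illustration. It is not a pyarrow API, and it says nothing about how an efficient implementation inside Arrow would look.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def take(values: Sequence[T], indices: Sequence[int]) -> List[T]:
    """Hypothetical helper: gather values[i] for every i in indices.

    Negative indices wrap around NumPy-style (-1 is the last element);
    anything still out of range raises IndexError.
    """
    n = len(values)
    out = []
    for i in indices:
        j = i + n if i < 0 else i  # wrap a negative index once
        if not 0 <= j < n:
            raise IndexError(f"index {i} is out of bounds for length {n}")
        out.append(values[j])
    return out


result = take(['a', 'bb', 'ccc', 'dddd'], [0, -1, 2, 2, 0, 3])
print(result)  # ['a', 'dddd', 'ccc', 'ccc', 'a', 'dddd']
```

A native version would do the same gather directly on the Arrow buffers, avoiding both the round trip through pandas and the per-element Python objects that the two workarounds create.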
Reporter: Artem KOZHEVNIKOV / @artemru
Related issues:
- [C++/Python] Add pandas-like take method to Array (is duplicated by)
Note: This issue was originally created as ARROW-5713. Please see the migration documentation for further details.