Description
Describe the bug, including details regarding any error messages, version, and platform.
Several slightly different bugs have been reported about using "utf8_slice_codeunits" with an optional start or stop. The stop argument is already optional: when omitted, it is translated internally into the largest int64_t value to mean "always slice until the end". That internal workaround itself causes bugs in the current implementation because of integer overflows.
We could instead use a different mechanism to signal a default start/stop, such as std::optional<int64_t> rather than the sentinel std::numeric_limits<int64_t>::max().
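To make the proposal concrete, here is a minimal sketch of how an unset start/stop could be resolved without a sentinel value. It is not Arrow's implementation: `SliceBounds`, `ResolveIndex` and `SliceCodepoints` are hypothetical names, the input is assumed to be already decoded into code points (`std::u32string`), and the index rules are assumed to follow Python slicing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-in for the slice options; not Arrow's actual class.
struct SliceBounds {
  std::optional<int64_t> start;  // nullopt: begin at the natural first element
  std::optional<int64_t> stop;   // nullopt: run to the natural end
  int64_t step = 1;
};

// Resolve an explicit index: negative values count from the end, and the
// result is clamped into [lower, upper]. No arithmetic here can overflow,
// because a negative index is only ever added to a non-negative length.
int64_t ResolveIndex(int64_t index, int64_t length, int64_t lower, int64_t upper) {
  if (index < 0) {
    index += length;
    return index < lower ? lower : index;
  }
  return index > upper ? upper : index;
}

// Slice a sequence of code points (assumed semantics: Python-style slicing).
std::u32string SliceCodepoints(const std::u32string& s, const SliceBounds& b) {
  assert(b.step != 0);
  const int64_t n = static_cast<int64_t>(s.size());
  const bool forward = b.step > 0;
  const int64_t lower = forward ? 0 : -1;
  const int64_t upper = forward ? n : n - 1;
  // Defaults depend on the direction, which a single sentinel cannot express.
  const int64_t start =
      b.start ? ResolveIndex(*b.start, n, lower, upper) : (forward ? 0 : n - 1);
  const int64_t stop =
      b.stop ? ResolveIndex(*b.stop, n, lower, upper) : (forward ? n : -1);

  std::u32string out;
  for (int64_t i = start; forward ? i < stop : i > stop;) {
    out.push_back(s[static_cast<std::size_t>(i)]);
    // Compare the step against the remaining distance instead of computing
    // i + step first, so a very large |step| cannot overflow.
    const int64_t remaining = stop - i;
    if (forward ? b.step >= remaining : b.step <= remaining) break;
    i += b.step;
  }
  return out;
}
```

With this shape, an omitted bound and an explicit extreme value such as INT64_MAX are handled on separate paths, so the direction-dependent default for a negative step does not have to be inferred from a magic number.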
Related issues:
- [C++] Enable slicing to end of string using "utf8_slice_codeunits" when string length unknown or different lengths #28940
- [Python] pyarrow.compute.utf8_slice_codeunits fails when stop=None #14991
- [C++][Python] Allow utf8_slice_codeunits to support default start value of None to support strings of different length #34917
- [C++] utf8_slice_codeunits crashes with max start and negative step #34928
The options class is also used by the "binary_slice" kernel.
Component(s)
C++