[C++][Python] utf8_slice_codeunits produces invalid unicode sequence #36311

@wirable23

Describe the bug, including details regarding any error messages, version, and platform.

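>>> import pyarrow as pa
>>> import pyarrow.compute  # make sure the pa.compute submodule is loaded
>>> pa.compute.utf8_slice_codeunits(f"AB{chr(127917)}C{chr(127917)}ㇱD", start=2, stop=None, step=4).as_py()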

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow\scalar.pxi", line 632, in pyarrow.lib.StringScalar.as_py
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 4: invalid start byte
>>>

utf8_slice_codeunits returned a string scalar whose bytes are not a valid UTF-8 sequence, so the call to as_py() fails when it tries to decode them.
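
For comparison, here is a minimal pure-Python sketch of the result I would expect, assuming the kernel is meant to mirror Python's own str slicing over code points (that is my assumption, not something confirmed here). The helper names expected_slice and is_valid_utf8 are hypothetical and are not part of pyarrow.

```python
def expected_slice(s, start, stop=None, step=1):
    """Reference result: slice by Unicode code points using Python's str slicing."""
    return s[start:stop:step]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Same input as in the report: chr(127917) is U+1F3AD, a 4-byte code point in UTF-8.
text = f"AB{chr(127917)}C{chr(127917)}ㇱD"

# Code points at indices 2 and 6 are selected (start=2, step=4).
result = expected_slice(text, start=2, stop=None, step=4)
encoded = result.encode("utf-8")

print(repr(result), is_valid_utf8(encoded))
```

Under that assumption the slice keeps only whole code points, so its UTF-8 encoding always decodes, which is not the case for the bytes pyarrow returned above.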

Component(s)

Python
