
[BUG]: string_caster should not use temporary on Python >= 3.3 #3252

@jbms

Problem description

Currently, string_caster always creates a temporary PyBytes object by calling PyUnicode_AsEncodedString. However, in the common case of UTF_N == 8, on Python >= 3.3 we can instead use PyUnicode_AsUTF8AndSize, which manages and caches the UTF-8 encoding internally within the PyUnicode object. If the UTF-8 representation is cached, then the encoding does not have to be done at all, avoiding an extra copy of the string.

This is particularly advantageous in the IsView == true case: users likely expect casting from PyUnicode to std::string_view to be low cost, but currently it always involves a copy of the string. In addition, the IsView == true case relies on loader_life_support::add_patient, which adds further overhead and memory allocations (e.g. with the change in #3237, an allocation of the unordered_set bucket array on first use of loader_life_support, plus an allocation for the node).
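The difference between the two C API calls can be observed from Python itself. The following is a minimal sketch, assuming CPython >= 3.3 and using ctypes.pythonapi to reach the C API; it is not pybind11's code, and the helper name utf8_view and the sample string are made up for illustration. It shows that PyUnicode_AsUTF8AndSize hands back a pointer to a buffer owned by the str object (the same pointer on repeated calls), while PyUnicode_AsEncodedString produces a fresh bytes object each time.

```python
import ctypes

api = ctypes.pythonapi  # CPython only; exposes the C API of the running interpreter

# const char *PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)
api.PyUnicode_AsUTF8AndSize.argtypes = [ctypes.py_object,
                                        ctypes.POINTER(ctypes.c_ssize_t)]
api.PyUnicode_AsUTF8AndSize.restype = ctypes.c_void_p  # keep the raw address

# PyObject *PyUnicode_AsEncodedString(PyObject *unicode,
#                                     const char *encoding, const char *errors)
api.PyUnicode_AsEncodedString.argtypes = [ctypes.py_object,
                                          ctypes.c_char_p, ctypes.c_char_p]
api.PyUnicode_AsEncodedString.restype = ctypes.py_object


def utf8_view(s):
    """Hypothetical helper: return (address, length) of the UTF-8 buffer of s."""
    size = ctypes.c_ssize_t()
    ptr = api.PyUnicode_AsUTF8AndSize(s, ctypes.byref(size))
    return ptr, size.value


text = "h\u00e9llo w\u00f6rld"  # non-ASCII, so a separate UTF-8 buffer is needed

# Buffer is owned by `text`; the second call reuses the cached encoding.
ptr1, n1 = utf8_view(text)
ptr2, n2 = utf8_view(text)
data = ctypes.string_at(ptr1, n1)  # copy out only to inspect the contents

# Each call allocates a new temporary bytes object holding a copy.
tmp1 = api.PyUnicode_AsEncodedString(text, b"utf-8", None)
tmp2 = api.PyUnicode_AsEncodedString(text, b"utf-8", None)

print("same cached buffer:", ptr1 == ptr2)
print("distinct temporaries:", tmp1 is not tmp2)
```

Because the buffer returned by PyUnicode_AsUTF8AndSize lives as long as the str object, a std::string_view could in principle point at it directly, without a temporary PyBytes object that has to be kept alive separately.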

Reproducible example code

No response
