From daf8e02c708849d160b6a71f5dcd266f21c587c6 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Thu, 2 Oct 2025 21:08:13 -0700 Subject: [PATCH 01/43] gh-139871: Update bytearray to contain PyBytesObject This sets up so the bytes can be "taken" as a bytes object without requiring a copy. I ran pyperformance (results below) and don't see any major speedups or slowdowns with this; all seems to be in the noise of my machine. ------ pyperformance compare main.json bytearray_bytes.json -O table main.json ========= Performance version: 1.11.0 Report on Linux-6.17.1-arch1-1-x86_64-with-glibc2.42 Number of logical CPUs: 32 Start date: 2025-10-14 00:55:52.519236 End date: 2025-10-14 02:23:01.308400 bytearray_bytes.json ==================== Performance version: 1.11.0 Report on Linux-6.17.1-arch1-1-x86_64-with-glibc2.42 Number of logical CPUs: 32 Start date: 2025-10-13 23:22:29.928152 End date: 2025-10-14 00:49:34.467284 +----------------------------------+-----------+----------------------+--------------+------------------------+ | Benchmark | main.json | bytearray_bytes.json | Change | Significance | +==================================+===========+======================+==============+========================+ | 2to3 | 137 ms | 136 ms | 1.00x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_generators | 193 ms | 195 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_cpu_io_mixed | 285 ms | 286 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_cpu_io_mixed_tg | 289 ms | 290 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_eager | 50.4 ms | 51.5 ms 
| 1.02x slower | Significant (t=-10.40) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_eager_cpu_io_mixed | 223 ms | 225 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_eager_cpu_io_mixed_tg | 263 ms | 264 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_eager_io | 370 ms | 372 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_eager_io_tg | 380 ms | 384 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_eager_memoization | 125 ms | 126 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_eager_memoization_tg | 161 ms | 162 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_eager_tg | 125 ms | 125 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_io | 366 ms | 360 ms | 1.02x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_io_tg | 359 ms | 361 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_memoization | 177 ms | 181 ms | 1.02x slower | Significant (t=-9.20) | 
+----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_memoization_tg | 188 ms | 189 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_none | 151 ms | 151 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | async_tree_none_tg | 150 ms | 151 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | asyncio_tcp | 182 ms | 161 ms | 1.13x faster | Significant (t=32.85) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | asyncio_tcp_ssl | 548 ms | 553 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | asyncio_websockets | 342 ms | 339 ms | 1.01x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | bench_mp_pool | 7.12 ms | 7.08 ms | 1.01x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | bench_thread_pool | 818 us | 819 us | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | bpe_tokeniser | 2.10 sec | 2.09 sec | 1.00x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | chaos | 27.9 ms | 28.0 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | comprehensions | 7.45 us | 7.24 us | 
1.03x faster | Significant (t=3.27) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | connected_components | 308 ms | 309 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | coroutines | 11.1 ms | 11.2 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | coverage | 33.6 ms | 34.1 ms | 1.02x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | create_gc_cycles | 1.16 ms | 1.16 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | crypto_pyaes | 37.1 ms | 35.6 ms | 1.04x faster | Significant (t=10.63) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | dask | 347 ms | 351 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | deepcopy | 118 us | 117 us | 1.00x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | deepcopy_memo | 12.8 us | 12.7 us | 1.00x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | deepcopy_reduce | 1.32 us | 1.34 us | 1.02x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | deltablue | 1.65 ms | 1.64 ms | 1.01x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | django_template | 17.9 ms | 17.8 
ms | 1.00x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | docutils | 1.19 sec | 1.20 sec | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | dulwich_log | 19.5 ms | 19.7 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | fannkuch | 184 ms | 181 ms | 1.02x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | float | 37.1 ms | 36.7 ms | 1.01x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | gc_traversal | 3.04 ms | 2.84 ms | 1.07x faster | Significant (t=19.48) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | generators | 15.9 ms | 15.3 ms | 1.04x faster | Significant (t=7.03) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | genshi_text | 11.3 ms | 11.2 ms | 1.01x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | genshi_xml | 25.5 ms | 25.5 ms | 1.00x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | go | 57.6 ms | 56.7 ms | 1.02x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | hexiom | 2.92 ms | 2.88 ms | 1.02x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | html5lib | 26.0 ms | 26.5 ms | 1.02x slower | 
Significant (t=-9.20) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | json_dumps | 4.48 ms | 4.44 ms | 1.01x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | json_loads | 11.7 us | 11.7 us | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | k_core | 1.41 sec | 1.42 sec | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | logging_format | 3.27 us | 3.30 us | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | logging_silent | 45.5 ns | 45.8 ns | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | logging_simple | 3.02 us | 3.01 us | 1.00x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | mako | 6.02 ms | 6.03 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | many_optionals | 473 us | 478 us | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | mdp | 587 ms | 578 ms | 1.02x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | meteor_contest | 50.2 ms | 50.5 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | nbody | 54.6 ms | 52.4 ms | 1.04x faster | Significant 
(t=10.72) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | nqueens | 41.7 ms | 40.4 ms | 1.03x faster | Significant (t=6.79) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | pathlib | 9.77 ms | 9.73 ms | 1.00x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | pickle | 5.99 us | 6.01 us | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | pickle_dict | 12.5 us | 12.8 us | 1.02x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | pickle_list | 1.98 us | 1.96 us | 1.01x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | pickle_pure_python | 149 us | 150 us | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | pidigits | 111 ms | 115 ms | 1.03x slower | Significant (t=-18.53) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | pprint_pformat | 737 ms | 748 ms | 1.02x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | pprint_safe_repr | 362 ms | 369 ms | 1.02x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | pyflate | 211 ms | 205 ms | 1.03x faster | Significant (t=7.43) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | python_startup | 7.88 ms | 7.88 ms | 1.00x faster | Not 
significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | python_startup_no_site | 4.72 ms | 4.76 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | raytrace | 130 ms | 128 ms | 1.02x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | regex_compile | 50.0 ms | 50.2 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | regex_dna | 101 ms | 103 ms | 1.02x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | regex_effbot | 1.72 ms | 1.77 ms | 1.03x slower | Significant (t=-26.42) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | regex_v8 | 12.5 ms | 12.3 ms | 1.02x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | richards | 20.4 ms | 20.0 ms | 1.02x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | richards_super | 23.4 ms | 22.8 ms | 1.03x faster | Significant (t=11.36) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | scimark_fft | 154 ms | 153 ms | 1.00x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | scimark_lu | 55.4 ms | 57.0 ms | 1.03x slower | Significant (t=-5.67) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | scimark_monte_carlo | 32.8 ms | 32.8 ms | 
1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | scimark_sor | 57.8 ms | 56.9 ms | 1.02x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | scimark_sparse_mat_mult | 2.75 ms | 2.76 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | shortest_path | 316 ms | 318 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | spectral_norm | 47.7 ms | 51.6 ms | 1.08x slower | Significant (t=-2.01) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | sphinx | 465 ms | 467 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | sqlglot_v2_normalize | 50.3 ms | 50.2 ms | 1.00x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | sqlglot_v2_optimize | 24.2 ms | 24.4 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | sqlglot_v2_parse | 576 us | 572 us | 1.01x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | sqlglot_v2_transpile | 724 us | 722 us | 1.00x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | sqlite_synth | 1.14 us | 1.15 us | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | 
subparsers | 20.6 ms | 20.7 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | sympy_expand | 181 ms | 184 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | sympy_integrate | 8.54 ms | 8.55 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | sympy_str | 103 ms | 105 ms | 1.02x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | sympy_sum | 55.9 ms | 56.0 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | telco | 3.39 ms | 3.34 ms | 1.01x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | tomli_loads | 971 ms | 982 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | typing_runtime_protocols | 73.2 us | 73.6 us | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | unpack_sequence | 25.2 ns | 23.0 ns | 1.10x faster | Significant (t=7.03) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | unpickle | 6.99 us | 7.05 us | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | unpickle_list | 2.07 us | 2.10 us | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | 
unpickle_pure_python | 105 us | 104 us | 1.01x faster | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | xml_etree_generate | 40.5 ms | 40.7 ms | 1.00x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | xml_etree_iterparse | 49.7 ms | 50.4 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ | xml_etree_parse | 77.2 ms | 79.1 ms | 1.02x slower | Significant (t=-16.14) | +----------------------------------+-----------+----------------------+--------------+------------------------+ | xml_etree_process | 29.5 ms | 29.8 ms | 1.01x slower | Not significant | +----------------------------------+-----------+----------------------+--------------+------------------------+ --- Include/cpython/bytearrayobject.h | 1 + Lib/test/test_sys.py | 2 +- Objects/bytearrayobject.c | 38 +++++++++++++++++-------------- 3 files changed, 23 insertions(+), 18 deletions(-) diff --git a/Include/cpython/bytearrayobject.h b/Include/cpython/bytearrayobject.h index 4dddef713ce097..f116271ad655b5 100644 --- a/Include/cpython/bytearrayobject.h +++ b/Include/cpython/bytearrayobject.h @@ -9,6 +9,7 @@ typedef struct { char *ob_bytes; /* Physical backing buffer */ char *ob_start; /* Logical start inside ob_bytes */ Py_ssize_t ob_exports; /* How many buffer exports */ + PyObject *ob_bytes_object; /* PyBytes for zero-copy bytes conversion */ } PyByteArrayObject; PyAPI_DATA(char) _PyByteArray_empty_string[]; diff --git a/Lib/test/test_sys.py b/Lib/test/test_sys.py index 1198c6d35113c8..a65840d437ec62 100644 --- a/Lib/test/test_sys.py +++ b/Lib/test/test_sys.py @@ -1583,7 +1583,7 @@ def test_objecttypes(self): samples = [b'', b'u'*100000] for sample in samples: x = bytearray(sample) - check(x, vsize('n2Pi') + x.__alloc__()) + check(x, 
vsize('n2PiP') + x.__alloc__()) # bytearray_iterator check(iter(bytearray()), size('nP')) # bytes diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index c519485c1cc74c..1445853f2d36d0 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -141,22 +141,26 @@ PyByteArray_FromStringAndSize(const char *bytes, Py_ssize_t size) } new = PyObject_New(PyByteArrayObject, &PyByteArray_Type); - if (new == NULL) + if (new == NULL) { return NULL; + } if (size == 0) { + new->ob_bytes_object = NULL; new->ob_bytes = NULL; alloc = 0; } else { alloc = size + 1; - new->ob_bytes = PyMem_Malloc(alloc); + new->ob_bytes_object = PyBytes_FromStringAndSize(NULL, alloc); + new->ob_bytes = PyBytes_AsString(new->ob_bytes_object); if (new->ob_bytes == NULL) { Py_DECREF(new); return PyErr_NoMemory(); } - if (bytes != NULL && size > 0) + if (bytes != NULL && size > 0) { memcpy(new->ob_bytes, bytes, size); + } new->ob_bytes[size] = '\0'; /* Trailing null byte */ } Py_SET_SIZE(new, size); @@ -189,7 +193,6 @@ static int bytearray_resize_lock_held(PyObject *self, Py_ssize_t requested_size) { _Py_CRITICAL_SECTION_ASSERT_OBJECT_LOCKED(self); - void *sval; PyByteArrayObject *obj = ((PyByteArrayObject *)self); /* All computations are done unsigned to avoid integer overflows (see issue #22335). */ @@ -244,25 +247,28 @@ bytearray_resize_lock_held(PyObject *self, Py_ssize_t requested_size) return -1; } + /* re-align data to the start of the allocation. 
*/ if (logical_offset > 0) { - sval = PyMem_Malloc(alloc); - if (sval == NULL) { - PyErr_NoMemory(); + memmove(obj->ob_bytes, obj->ob_start, + Py_MIN(requested_size, Py_SIZE(self))); + } + + if (obj->ob_bytes_object == NULL) { + obj->ob_bytes_object = PyBytes_FromStringAndSize(NULL, alloc); + if (obj->ob_bytes_object == NULL) { return -1; } - memcpy(sval, PyByteArray_AS_STRING(self), - Py_MIN((size_t)requested_size, (size_t)Py_SIZE(self))); - PyMem_Free(obj->ob_bytes); } else { - sval = PyMem_Realloc(obj->ob_bytes, alloc); - if (sval == NULL) { - PyErr_NoMemory(); + if (_PyBytes_Resize(&obj->ob_bytes_object, alloc) == -1) { + Py_SET_SIZE(self, 0); + obj->ob_bytes = obj->ob_start = NULL; + FT_ATOMIC_STORE_SSIZE_RELAXED(obj->ob_alloc, 0); return -1; } } - obj->ob_bytes = obj->ob_start = sval; + obj->ob_bytes = obj->ob_start = PyBytes_AS_STRING(obj->ob_bytes_object); Py_SET_SIZE(self, size); FT_ATOMIC_STORE_SSIZE_RELAXED(obj->ob_alloc, alloc); obj->ob_bytes[size] = '\0'; /* Trailing null byte */ @@ -1169,9 +1175,7 @@ bytearray_dealloc(PyObject *op) "deallocated bytearray object has exported buffers"); PyErr_Print(); } - if (self->ob_bytes != 0) { - PyMem_Free(self->ob_bytes); - } + Py_CLEAR(self->ob_bytes_object); Py_TYPE(self)->tp_free((PyObject *)self); } From 39b2d15afab3a742992b09dbbbd29a5c098c43ee Mon Sep 17 00:00:00 2001 From: "blurb-it[bot]" <43283697+blurb-it[bot]@users.noreply.github.com> Date: Tue, 14 Oct 2025 18:24:20 +0000 Subject: [PATCH 02/43] =?UTF-8?q?=F0=9F=93=9C=F0=9F=A4=96=20Added=20by=20b?= =?UTF-8?q?lurb=5Fit.?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- .../2025-10-14-18-24-16.gh-issue-139871.SWtuUz.rst | 1 + 1 file changed, 1 insertion(+) create mode 100644 Misc/NEWS.d/next/Core_and_Builtins/2025-10-14-18-24-16.gh-issue-139871.SWtuUz.rst diff --git a/Misc/NEWS.d/next/Core_and_Builtins/2025-10-14-18-24-16.gh-issue-139871.SWtuUz.rst 
b/Misc/NEWS.d/next/Core_and_Builtins/2025-10-14-18-24-16.gh-issue-139871.SWtuUz.rst new file mode 100644 index 00000000000000..0af1a596136bbe --- /dev/null +++ b/Misc/NEWS.d/next/Core_and_Builtins/2025-10-14-18-24-16.gh-issue-139871.SWtuUz.rst @@ -0,0 +1 @@ +Update :class:`bytearray` to use a :class:`bytes` under the hood as its buffer, which enables optimizations. From 86faf1da867520f6afe2d7017cbbf30a911d51d1 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Thu, 2 Oct 2025 21:27:23 -0700 Subject: [PATCH 03/43] Add bytearray.take_bytes --- Doc/library/stdtypes.rst | 91 ++++++++++++++++++++++++++++++ Lib/test/test_bytes.py | 52 +++++++++++++++++ Objects/bytearrayobject.c | 90 +++++++++++++++++++++++++++++ Objects/clinic/bytearrayobject.c.h | 39 ++++++++++++- 4 files changed, 271 insertions(+), 1 deletion(-) diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst index 21fe35edc1bef8..df2b5ab802b81e 100644 --- a/Doc/library/stdtypes.rst +++ b/Doc/library/stdtypes.rst @@ -3158,6 +3158,97 @@ objects. .. versionadded:: 3.14 + .. method:: take_bytes(n=None, /) + + Take the first *n* bytes as an immutable :class:`bytes`. Defaults to all + bytes. + + If *n* is negative, it indexes from the end and takes the first :func:`len` + plus *n* bytes. If *n* is out of bounds, :exc:`IndexError` is raised. + + Taking fewer than the full length leaves the remaining bytes in the + :class:`bytearray`, which requires a copy. If the remaining bytes should be + discarded, use :meth:`~bytearray.resize` or :keyword:`del` to truncate, + then call :meth:`~bytearray.take_bytes` without a size. + + .. impl-detail:: + + CPython implements this as a zero-copy operation, making it a very + efficient way to make a :class:`bytes` from a :class:`bytearray`. + + .. list-table:: Suggested Replacements + :header-rows: 1 + + * - Description + - Old + - New + + * - Return :class:`bytes` after working with :class:`bytearray` + - .. code:: python + + + def read() -> bytes: + buffer = bytearray(1024) + ... 
+ return bytes(buffer) + - .. code:: python + + def read() -> bytes: + buffer = bytearray(1024) + ... + return buffer.take_bytes() + + * - Empty a buffer getting the bytes + - .. code:: python + + buffer = bytearray(1024) + ... + data = bytes(buffer) + buffer.clear() + - .. code:: python + + buffer = bytearray(1024) + ... + data = buffer.take_bytes() + assert len(buffer) == 0 + + * - Split a buffer at a specific separator + - .. code:: python + + buffer = bytearray(b'abc\ndef') + n = buffer.find(b'\n') + data = bytes(buffer[:n + 1]) + del buffer[:n + 1] + assert buffer == bytearray(b'def') + + - .. code:: python + + buffer = bytearray(b'abc\ndef') + n = buffer.find(b'\n') + data = buffer.take_bytes(n + 1) + assert buffer == bytearray(b'def') + + * - Split a buffer at a specific separator; discard after the separator + - .. code:: python + + buffer = bytearray(b'abc\ndef') + n = buffer.find(b'\n') + data = bytes(buffer[:n]) + buffer.clear() + assert data == b'abc' + assert len(buffer) == 0 + + - .. code:: python + + buffer = bytearray(b'abc\ndef') + n = buffer.find(b'\n') + buffer.resize(n) + data = buffer.take_bytes() + assert data == b'abc' + assert len(buffer) == 0 + + .. versionadded:: next + Since bytearray objects are sequences of integers (akin to a list), for a bytearray object *b*, ``b[0]`` will be an integer, while ``b[0:1]`` will be a bytearray object of length 1. (This contrasts with text strings, where diff --git a/Lib/test/test_bytes.py b/Lib/test/test_bytes.py index f10e4041937f4f..42d1bec9010e52 100644 --- a/Lib/test/test_bytes.py +++ b/Lib/test/test_bytes.py @@ -1451,6 +1451,58 @@ def test_resize(self): self.assertRaises(MemoryError, bytearray().resize, sys.maxsize) self.assertRaises(MemoryError, bytearray(1000).resize, sys.maxsize) + def test_take_bytes(self): + ba = bytearray(b'ab') + self.assertEqual(ba.take_bytes(), b'ab') + self.assertEqual(len(ba), 0) + self.assertEqual(ba, bytearray(b'')) + + # Positive and negative slicing. 
+ ba = bytearray(b'abcdef') + self.assertEqual(ba.take_bytes(1), b'a') + self.assertEqual(ba, bytearray(b'bcdef')) + self.assertEqual(len(ba), 5) + self.assertEqual(ba.take_bytes(-5), b'') + self.assertEqual(ba, bytearray(b'bcdef')) + self.assertEqual(len(ba), 5) + self.assertEqual(ba.take_bytes(-3), b'bc') + self.assertEqual(ba, bytearray(b'def')) + self.assertEqual(len(ba), 3) + self.assertEqual(ba.take_bytes(3), b'def') + self.assertEqual(ba, bytearray(b'')) + self.assertEqual(len(ba), 0) + + # Take nothing from emptiness. + self.assertEqual(ba.take_bytes(0), b'') + self.assertEqual(ba.take_bytes(), b'') + self.assertEqual(ba.take_bytes(None), b'') + + # Out of bounds, bad take value. + self.assertRaises(IndexError, ba.take_bytes, -1) + self.assertRaises(TypeError, ba.take_bytes, 3.14) + ba = bytearray(b'abcdef') + self.assertRaises(IndexError, ba.take_bytes, 7) + + # Offset between physical and logical start (ob_bytes != ob_start). + ba = bytearray(b'abcde') + del ba[:2] + self.assertEqual(ba, bytearray(b'cde')) + self.assertEqual(ba.take_bytes(), b'cde') + + # Overallocation at end. + ba = bytearray(b'abcde') + del ba[-2:] + self.assertEqual(ba, bytearray(b'abc')) + self.assertEqual(ba.take_bytes(), b'abc') + ba = bytearray(b'abcde') + ba.resize(4) + self.assertEqual(ba.take_bytes(), b'abcd') + + # Take of a bytearray with references should fail. + ba = bytearray(b'abc') + with memoryview(ba) as mv: + self.assertRaises(BufferError, ba.take_bytes) + self.assertEqual(ba.take_bytes(), b'abc') def test_setitem(self): def setitem_as_mapping(b, i, val): diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index 1445853f2d36d0..1430401e43bdf4 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -1495,6 +1495,95 @@ bytearray_resize_impl(PyByteArrayObject *self, Py_ssize_t size) } +/*[clinic input] +@critical_section +bytearray.take_bytes + n: object = None + Bytes to take, negative indexes from end. None indicates all bytes. 
+ / +Take *n* bytes from the bytearray and return them as a bytes object. +[clinic start generated code]*/ + +static PyObject * +bytearray_take_bytes_impl(PyByteArrayObject *self, PyObject *n) +/*[clinic end generated code: output=3147fbc0bbbe8d94 input=b15b5172cdc6deda]*/ +{ + Py_ssize_t to_take, original; + Py_ssize_t size = Py_SIZE(self); + if (Py_IsNone(n)) { + to_take = original = size; + } + // Integer index, from start (zero, positive) or end (negative). + else if (_PyIndex_Check(n)) { + to_take = original = PyNumber_AsSsize_t(n, PyExc_IndexError); + if (to_take == -1 && PyErr_Occurred()) { + return NULL; + } + if (to_take < 0) { + to_take += size; + } + } else { + PyErr_SetString(PyExc_TypeError, "n must be an integer or None"); + return NULL; + } + + if (to_take < 0 || to_take > size) { + PyErr_Format(PyExc_IndexError, + "can't take %zd(%zd) outside size %zd", + original, to_take, size); + return NULL; + } + + // Exports may change the contents. No mutable bytes allowed. + if (!_canresize(self)) { + return NULL; + } + + if (to_take == 0 || size == 0) { + return Py_GetConstant(Py_CONSTANT_EMPTY_BYTES); + } + + // Copy remaining bytes to a new bytes. + PyObject *remaining = NULL; + Py_ssize_t remaining_length = size - to_take; + if (remaining_length > 0) { + // +1 to copy across the null which always ends a bytearray. + remaining = PyBytes_FromStringAndSize(self->ob_start + to_take, + remaining_length + 1); + if (remaining == NULL) { + return NULL; + } + } + + // If the bytes are offset inside the buffer, they must first be aligned. + if (self->ob_start != self->ob_bytes) { + memmove(self->ob_bytes, self->ob_start, to_take); + self->ob_start = self->ob_bytes; + } + + if (_PyBytes_Resize(&self->ob_bytes_object, to_take) == -1) { + Py_CLEAR(remaining); + return NULL; + } + + // Point the bytearray towards the buffer with the remaining data. 
+ PyObject *result = self->ob_bytes_object; + self->ob_bytes_object = remaining; + if (remaining) { + self->ob_bytes = self->ob_start = PyBytes_AS_STRING(self->ob_bytes_object); + Py_SET_SIZE(self, size - to_take); + FT_ATOMIC_STORE_SSIZE_RELAXED(self->ob_alloc, size - to_take + 1); + } + else { + self->ob_bytes = self->ob_start = NULL; + Py_SET_SIZE(self, 0); + FT_ATOMIC_STORE_SSIZE_RELAXED(self->ob_alloc, 0); + } + + return result; +} + + /*[clinic input] @critical_section bytearray.translate @@ -2690,6 +2779,7 @@ static PyMethodDef bytearray_methods[] = { BYTEARRAY_STARTSWITH_METHODDEF BYTEARRAY_STRIP_METHODDEF {"swapcase", bytearray_swapcase, METH_NOARGS, _Py_swapcase__doc__}, + BYTEARRAY_TAKE_BYTES_METHODDEF {"title", bytearray_title, METH_NOARGS, _Py_title__doc__}, BYTEARRAY_TRANSLATE_METHODDEF {"upper", bytearray_upper, METH_NOARGS, _Py_upper__doc__}, diff --git a/Objects/clinic/bytearrayobject.c.h b/Objects/clinic/bytearrayobject.c.h index ffb45ade11f6dc..f466c5ba9013d4 100644 --- a/Objects/clinic/bytearrayobject.c.h +++ b/Objects/clinic/bytearrayobject.c.h @@ -631,6 +631,43 @@ bytearray_resize(PyObject *self, PyObject *arg) return return_value; } +PyDoc_STRVAR(bytearray_take_bytes__doc__, +"take_bytes($self, n=None, /)\n" +"--\n" +"\n" +"Take *n* bytes from the bytearray and return them as a bytes object.\n" +"\n" +" n\n" +" Bytes to take, negative indexes from end. 
None indicates all bytes."); + +#define BYTEARRAY_TAKE_BYTES_METHODDEF \ + {"take_bytes", _PyCFunction_CAST(bytearray_take_bytes), METH_FASTCALL, bytearray_take_bytes__doc__}, + +static PyObject * +bytearray_take_bytes_impl(PyByteArrayObject *self, PyObject *n); + +static PyObject * +bytearray_take_bytes(PyObject *self, PyObject *const *args, Py_ssize_t nargs) +{ + PyObject *return_value = NULL; + PyObject *n = Py_None; + + if (!_PyArg_CheckPositional("take_bytes", nargs, 0, 1)) { + goto exit; + } + if (nargs < 1) { + goto skip_optional; + } + n = args[0]; +skip_optional: + Py_BEGIN_CRITICAL_SECTION(self); + return_value = bytearray_take_bytes_impl((PyByteArrayObject *)self, n); + Py_END_CRITICAL_SECTION(); + +exit: + return return_value; +} + PyDoc_STRVAR(bytearray_translate__doc__, "translate($self, table, /, delete=b\'\')\n" "--\n" @@ -1796,4 +1833,4 @@ bytearray_sizeof(PyObject *self, PyObject *Py_UNUSED(ignored)) { return bytearray_sizeof_impl((PyByteArrayObject *)self); } -/*[clinic end generated code: output=be6d28193bc96a2c input=a9049054013a1b77]*/ +/*[clinic end generated code: output=72c5e773da5e2bf3 input=a9049054013a1b77]*/ From e9f5ca9f7b38294a2e08ece086567ba1a2c7817b Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Wed, 15 Oct 2025 10:42:26 -0700 Subject: [PATCH 04/43] Update Objects/bytearrayobject.c Co-authored-by: Victor Stinner --- Objects/bytearrayobject.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index 908ec7d0b204c1..ac6352a8d47ea0 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -1549,7 +1549,7 @@ bytearray_take_bytes_impl(PyByteArrayObject *self, PyObject *n) if (remaining_length > 0) { // +1 to copy across the null which always ends a bytearray. 
remaining = PyBytes_FromStringAndSize(self->ob_start + to_take, - remaining_length + 1); + remaining_length + 1); if (remaining == NULL) { return NULL; } From 47849575592cb0ca3b51cef108c3e249bc7dd55c Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Wed, 15 Oct 2025 11:40:05 -0700 Subject: [PATCH 05/43] Review fixes --- Doc/library/stdtypes.rst | 9 ++------- .../2025-10-14-18-24-16.gh-issue-139871.SWtuUz.rst | 3 ++- 2 files changed, 4 insertions(+), 8 deletions(-) diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst index 3db51d1f33134f..3f7ba196cf7019 100644 --- a/Doc/library/stdtypes.rst +++ b/Doc/library/stdtypes.rst @@ -3179,7 +3179,7 @@ objects. bytes. If *n* is negative indexes from the end and takes the first :func:`len` - minus *n* bytes. If *n* is out of bounds raises :exc:`IndexError`. + plus *n* bytes. If *n* is out of bounds raises :exc:`IndexError`. Taking less than the full length will leave remaining bytes in the :class:`bytearray` which requires a copy. If the remaining bytes should be @@ -3188,8 +3188,7 @@ objects. .. impl-detail:: - CPython implements this as a zero-copy operation making it a very - efficient way to make a :class:`bytes` from a :class:`bytearray`. + Taking all bytes is a zero-copy operation. .. list-table:: Suggested Replacements :header-rows: 1 @@ -3225,7 +3224,6 @@ objects. buffer = bytearray(1024) ... data = buffer.take_bytes() - assert len(buffer) == 0 * - Split a buffer at a specific separator - .. code:: python @@ -3241,7 +3239,6 @@ objects. buffer = bytearray(b'abc\ndef') n = buffer.find(b'\n') data = buffer.take_bytes(n + 1) - assert buffer == bytearray(b'def') * - Split a buffer at a specific separator; discard after the separator - .. code:: python @@ -3259,8 +3256,6 @@ objects. n = buffer.find(b'\n') buffer.resize(n) data = buffer.take_bytes() - assert data == b'abc' - assert len(buffer) == 0 .. 
versionadded:: next diff --git a/Misc/NEWS.d/next/Core_and_Builtins/2025-10-14-18-24-16.gh-issue-139871.SWtuUz.rst b/Misc/NEWS.d/next/Core_and_Builtins/2025-10-14-18-24-16.gh-issue-139871.SWtuUz.rst index 0af1a596136bbe..d4b8578afe3afc 100644 --- a/Misc/NEWS.d/next/Core_and_Builtins/2025-10-14-18-24-16.gh-issue-139871.SWtuUz.rst +++ b/Misc/NEWS.d/next/Core_and_Builtins/2025-10-14-18-24-16.gh-issue-139871.SWtuUz.rst @@ -1 +1,2 @@ -Update :class:`bytearray` to use a :class:`bytes` under the hood as its buffer which enables optimizations. +Update :class:`bytearray` to use a :class:`bytes` under the hood as its buffer +and add :func:`bytearray.take_bytes` to take it out. From 451c302eb593d5cc657839e5ecacd7c6e3d31c93 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Fri, 17 Oct 2025 11:01:21 -0700 Subject: [PATCH 06/43] Update Objects/bytearrayobject.c MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Maurycy Pawłowski-Wieroński <5383+maurycy@users.noreply.github.com> --- Objects/bytearrayobject.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index ac6352a8d47ea0..5913fab776b001 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -1529,7 +1529,7 @@ bytearray_take_bytes_impl(PyByteArrayObject *self, PyObject *n) if (to_take < 0 || to_take > size) { PyErr_Format(PyExc_IndexError, - "can't take %d(%d) outside size %d", + "can't take %zd(%zd) outside size %zd", original, to_take, size); return NULL; } From bab71515945e710a76856ac5147e6af4ca4b8221 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Wed, 15 Oct 2025 14:57:18 -0700 Subject: [PATCH 07/43] Add tests around alloc and getsizeof that show clearing isn't working which is resulting in threading failure --- Lib/test/test_bytes.py | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/Lib/test/test_bytes.py b/Lib/test/test_bytes.py index 
42d1bec9010e52..cfe33c3f83a453 100644 --- a/Lib/test/test_bytes.py +++ b/Lib/test/test_bytes.py @@ -1390,6 +1390,15 @@ def test_clear(self): b.append(ord('p')) self.assertEqual(b, b'p') + # Cleared object should be empty. + b = bytearray(b'abc') + b.clear() + self.assertEqual(b.__alloc__(), 0) + self.assertEqual(sys.getsizeof(b), 0) + c = b.copy() + self.assertEqual(c.__alloc__(), 0) + self.assertEqual(sys.getsizeof(c), 0) + def test_copy(self): b = bytearray(b'abc') bb = b.copy() @@ -1456,6 +1465,8 @@ def test_take_bytes(self): self.assertEqual(ba.take_bytes(), b'ab') self.assertEqual(len(ba), 0) self.assertEqual(ba, bytearray(b'')) + self.assertEqual(ba.__alloc__(), 0) + self.assertEqual(sys.getsizeof(ba), 0) # Positive and negative slicing. ba = bytearray(b'abcdef') From cb2377c45c8ea5f7eebaf023e7213200b11a11c0 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Wed, 15 Oct 2025 14:57:18 -0700 Subject: [PATCH 08/43] Fix resizing to 0 length / clearing leaving one byte alloc --- Lib/test/test_bytes.py | 8 +++++--- Objects/bytearrayobject.c | 7 ++++++- 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/Lib/test/test_bytes.py b/Lib/test/test_bytes.py index cfe33c3f83a453..5ca1b966d9c46a 100644 --- a/Lib/test/test_bytes.py +++ b/Lib/test/test_bytes.py @@ -1394,10 +1394,11 @@ def test_clear(self): b = bytearray(b'abc') b.clear() self.assertEqual(b.__alloc__(), 0) - self.assertEqual(sys.getsizeof(b), 0) + base_size = sys.getsizeof(bytearray()) + self.assertEqual(sys.getsizeof(b), base_size) c = b.copy() self.assertEqual(c.__alloc__(), 0) - self.assertEqual(sys.getsizeof(c), 0) + self.assertEqual(sys.getsizeof(c), base_size) def test_copy(self): b = bytearray(b'abc') @@ -1466,7 +1467,8 @@ def test_take_bytes(self): self.assertEqual(len(ba), 0) self.assertEqual(ba, bytearray(b'')) self.assertEqual(ba.__alloc__(), 0) - self.assertEqual(sys.getsizeof(ba), 0) + base_size = sys.getsizeof(bytearray()) + self.assertEqual(sys.getsizeof(ba), base_size) # Positive and 
negative slicing. ba = bytearray(b'abcdef') diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index 5913fab776b001..27cac368a3c143 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -223,6 +223,11 @@ bytearray_resize_lock_held(PyObject *self, Py_ssize_t requested_size) if (size < alloc / 2) { /* Major downsize; resize down to exact size */ alloc = size + 1; + + /* If new size is 0; don't need to allocate one byte for null. */ + if (size == 0) { + alloc = 0; + } } else { /* Minor downsize; quick exit */ @@ -238,7 +243,7 @@ bytearray_resize_lock_held(PyObject *self, Py_ssize_t requested_size) alloc = size + (size >> 3) + (size < 9 ? 3 : 6); } else { - /* Major upsize; resize up to exact size */ + /* Major upsize; resize up to exact size. Upsize always means size > 0 */ alloc = size + 1; } } From 20175f86cacabd5f1eec8b29f02a155cde01d682 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Fri, 17 Oct 2025 18:37:50 -0700 Subject: [PATCH 09/43] review fix: handle NULL return from PyBytes_FromStringAndSize MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Maurycy Pawłowski-Wieroński <5383+maurycy@users.noreply.github.com> --- Objects/bytearrayobject.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index 27cac368a3c143..cfb6b3bf24d021 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -153,6 +153,10 @@ PyByteArray_FromStringAndSize(const char *bytes, Py_ssize_t size) else { alloc = size + 1; new->ob_bytes_object = PyBytes_FromStringAndSize(NULL, alloc); + if (new->ob_bytes_object == NULL) { + Py_DECREF(new); + return NULL; + } new->ob_bytes = PyBytes_AsString(new->ob_bytes_object); if (new->ob_bytes == NULL) { Py_DECREF(new); From e48559553290adbbc3875e2e23b2d031d7872783 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Fri, 17 Oct 2025 18:43:18 -0700 Subject: [PATCH 10/43] Add take_bytes to
test_free_threading --- Lib/test/test_bytes.py | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/Lib/test/test_bytes.py b/Lib/test/test_bytes.py index 5ca1b966d9c46a..694b8996d14b8b 100644 --- a/Lib/test/test_bytes.py +++ b/Lib/test/test_bytes.py @@ -2622,6 +2622,11 @@ def zfill(b, a): c = a.zfill(0x400000) assert not c or c[-1] not in (0xdd, 0xcd) + def take_bytes(b, a): + b.wait() + c = b.take_bytes() + assert not c or c[0] == 48 # '0' + def check(funcs, a=None, *args): if a is None: a = bytearray(b'0' * 0x400000) @@ -2682,6 +2687,7 @@ def check(funcs, a=None, *args): check([clear] + [splitlines] * 10, bytearray(b'\n' * 0x400)) check([clear] + [startswith] * 10) check([clear] + [strip] * 10) + check([clear] + [take_bytes] * 10) check([clear] + [contains] * 10) check([clear] + [subscript] * 10) From b5535d0207208a2ad5617c760fbc840f935c8705 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Fri, 17 Oct 2025 18:47:55 -0700 Subject: [PATCH 11/43] Missed line... --- Lib/test/test_bytes.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Lib/test/test_bytes.py b/Lib/test/test_bytes.py index 694b8996d14b8b..23a95dc0d40b62 100644 --- a/Lib/test/test_bytes.py +++ b/Lib/test/test_bytes.py @@ -2624,7 +2624,7 @@ def zfill(b, a): def take_bytes(b, a): b.wait() - c = b.take_bytes() + c = a.take_bytes() assert not c or c[0] == 48 # '0' def check(funcs, a=None, *args): From 7c6e8a84d6659f1cd591055f4c570f7cf9cf273f Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Fri, 17 Oct 2025 20:00:48 -0700 Subject: [PATCH 12/43] Simplify getting out ob_bytes --- Objects/bytearrayobject.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index cfb6b3bf24d021..1c4fdae5458410 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -157,11 +157,8 @@ PyByteArray_FromStringAndSize(const char *bytes, Py_ssize_t size) Py_DECREF(new); return NULL; } - new->ob_bytes = 
PyBytes_AsString(new->ob_bytes_object); - if (new->ob_bytes == NULL) { - Py_DECREF(new); - return PyErr_NoMemory(); - } + new->ob_bytes = PyBytes_AS_STRING(new->ob_bytes_object); + assert(new->ob_bytes); if (bytes != NULL && size > 0) { memcpy(new->ob_bytes, bytes, size); } From 4e27d13b05df279fd386dfad5eae789a03c371b2 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Fri, 17 Oct 2025 20:03:47 -0700 Subject: [PATCH 13/43] Include PyBytesObject in __alloc__ of bytearray. --- Objects/bytearrayobject.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index 1c4fdae5458410..80ec5708b5bac0 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -2504,7 +2504,11 @@ static PyObject * bytearray_alloc(PyObject *op, PyObject *Py_UNUSED(ignored)) { PyByteArrayObject *self = _PyByteArray_CAST(op); - return PyLong_FromSsize_t(FT_ATOMIC_LOAD_SSIZE_RELAXED(self->ob_alloc)); + Py_ssize_t alloc = FT_ATOMIC_LOAD_SSIZE_RELAXED(self->ob_alloc); + if (alloc > 0) { + alloc += sizeof(PyBytesObject); + } + return PyLong_FromSsize_t(alloc); } /*[clinic input] @@ -2700,8 +2704,13 @@ static PyObject * bytearray_sizeof_impl(PyByteArrayObject *self) /*[clinic end generated code: output=738abdd17951c427 input=e27320fd98a4bc5a]*/ { - size_t res = _PyObject_SIZE(Py_TYPE(self)); - res += (size_t)FT_ATOMIC_LOAD_SSIZE_RELAXED(self->ob_alloc) * sizeof(char); + Py_ssize_t res = _PyObject_SIZE(Py_TYPE(self)); + Py_ssize_t alloc = FT_ATOMIC_LOAD_SSIZE_RELAXED(self->ob_alloc) * sizeof(char); + if (alloc > 0) { + res += sizeof(PyBytesObject); + res += alloc * sizeof(char); + } + return PyLong_FromSize_t(res); } From 9887dad8fd7d8fd1fe6582f8f8a4014b22ba4c65 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Sun, 26 Oct 2025 23:21:32 -0700 Subject: [PATCH 14/43] Apply suggestion from @vstinner Co-authored-by: Victor Stinner --- Objects/bytearrayobject.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) 
diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index 80ec5708b5bac0..9975a044f470ac 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -1181,7 +1181,7 @@ bytearray_dealloc(PyObject *op) "deallocated bytearray object has exported buffers"); PyErr_Print(); } - Py_CLEAR(self->ob_bytes_object); + Py_XDECREF(self->ob_bytes_object); Py_TYPE(self)->tp_free((PyObject *)self); } From 6e4b910c2e4c4fd3d7ebf782f294f7b3943470c8 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Sun, 26 Oct 2025 23:27:47 -0700 Subject: [PATCH 15/43] Don't multiply by sizeof(char) as it's always 1 see: https://stackoverflow.com/questions/2215445/are-there-machines-where-sizeofchar-1-or-at-least-char-bit-8 --- Objects/bytearrayobject.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index 9975a044f470ac..da10ddef412a85 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -2705,10 +2705,10 @@ bytearray_sizeof_impl(PyByteArrayObject *self) /*[clinic end generated code: output=738abdd17951c427 input=e27320fd98a4bc5a]*/ { Py_ssize_t res = _PyObject_SIZE(Py_TYPE(self)); - Py_ssize_t alloc = FT_ATOMIC_LOAD_SSIZE_RELAXED(self->ob_alloc) * sizeof(char); + Py_ssize_t alloc = FT_ATOMIC_LOAD_SSIZE_RELAXED(self->ob_alloc); if (alloc > 0) { res += sizeof(PyBytesObject); - res += alloc * sizeof(char); + res += alloc; } return PyLong_FromSize_t(res); From b6f84036a911753138da09f32d94ab6d21d8d45d Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Sun, 26 Oct 2025 23:50:06 -0700 Subject: [PATCH 16/43] Rely on bytes for end of buffer NULL 1. After __init__ or C construction guarantee ob_bytes_object is set by using empty bytes object. 2. In resize place a null terminator mid-buffer only if required 3. Remove now unneeded branches - n == PY_SSIZE_T_MAX checks are redundant with resize checks.
- size = 0 is handled by PyBytes_FromStringAndSize - No more alloc + 1; exact resize is exact and bytes does +1 for null - No downsize to 0 special case since alloc == size there. --- Objects/bytearrayobject.c | 86 ++++++++++++--------------------------- 1 file changed, 27 insertions(+), 59 deletions(-) diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index da10ddef412a85..c089cf755782d0 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -127,7 +127,6 @@ PyObject * PyByteArray_FromStringAndSize(const char *bytes, Py_ssize_t size) { PyByteArrayObject *new; - Py_ssize_t alloc; if (size < 0) { PyErr_SetString(PyExc_SystemError, @@ -135,7 +134,7 @@ PyByteArray_FromStringAndSize(const char *bytes, Py_ssize_t size) return NULL; } - /* Prevent buffer overflow when setting alloc to size+1. */ + /* Prevent buffer overflow when setting alloc to size. */ if (size == PY_SSIZE_T_MAX) { return PyErr_NoMemory(); } @@ -145,27 +144,18 @@ PyByteArray_FromStringAndSize(const char *bytes, Py_ssize_t size) return NULL; } - if (size == 0) { - new->ob_bytes_object = NULL; - new->ob_bytes = NULL; - alloc = 0; + new->ob_bytes_object = PyBytes_FromStringAndSize(NULL, size); + if (new->ob_bytes_object == NULL) { + Py_DECREF(new); + return NULL; } - else { - alloc = size + 1; - new->ob_bytes_object = PyBytes_FromStringAndSize(NULL, alloc); - if (new->ob_bytes_object == NULL) { - Py_DECREF(new); - return NULL; - } - new->ob_bytes = PyBytes_AS_STRING(new->ob_bytes_object); - assert(new->ob_bytes); - if (bytes != NULL && size > 0) { - memcpy(new->ob_bytes, bytes, size); - } - new->ob_bytes[size] = '\0'; /* Trailing null byte */ + new->ob_bytes = PyBytes_AS_STRING(new->ob_bytes_object); + assert(new->ob_bytes); + if (bytes != NULL && size > 0) { + memcpy(new->ob_bytes, bytes, size); } Py_SET_SIZE(new, size); - new->ob_alloc = alloc; + new->ob_alloc = size; new->ob_start = new->ob_bytes; new->ob_exports = 0; @@ -218,21 +208,17 @@ 
bytearray_resize_lock_held(PyObject *self, Py_ssize_t requested_size) return -1; } - if (size + logical_offset + 1 <= alloc) { + if (size + logical_offset <= alloc) { /* Current buffer is large enough to host the requested size, decide on a strategy. */ if (size < alloc / 2) { /* Major downsize; resize down to exact size */ - alloc = size + 1; - - /* If new size is 0; don't need to allocate one byte for null. */ - if (size == 0) { - alloc = 0; - } + alloc = size; } else { /* Minor downsize; quick exit */ Py_SET_SIZE(self, size); + /* Add mid-buffer null; end provided by bytes. */ PyByteArray_AS_STRING(self)[size] = '\0'; /* Trailing null */ return 0; } @@ -245,10 +231,11 @@ bytearray_resize_lock_held(PyObject *self, Py_ssize_t requested_size) } else { /* Major upsize; resize up to exact size. Upsize always means size > 0 */ - alloc = size + 1; + alloc = size; } } - if (alloc > PY_SSIZE_T_MAX) { + // NOTE: offsetof() logic copied from PyBytesObject_SIZE in bytesobject.c + if (alloc > PY_SSIZE_T_MAX - (offsetof(PyBytesObject, ob_sval) + 1)) { PyErr_NoMemory(); return -1; } @@ -277,7 +264,10 @@ bytearray_resize_lock_held(PyObject *self, Py_ssize_t requested_size) obj->ob_bytes = obj->ob_start = PyBytes_AS_STRING(obj->ob_bytes_object); Py_SET_SIZE(self, size); FT_ATOMIC_STORE_SSIZE_RELAXED(obj->ob_alloc, alloc); - obj->ob_bytes[size] = '\0'; /* Trailing null byte */ + if (alloc != size) { + /* Add mid-buffer null; end provided by bytes. */ + obj->ob_bytes[size] = '\0'; + } return 0; } @@ -1550,15 +1540,11 @@ bytearray_take_bytes_impl(PyByteArrayObject *self, PyObject *n) } // Copy remaining bytes to a new bytes. - PyObject *remaining = NULL; Py_ssize_t remaining_length = size - to_take; - if (remaining_length > 0) { - // +1 to copy across the null which always ends a bytearray. 
- remaining = PyBytes_FromStringAndSize(self->ob_start + to_take, - remaining_length + 1); - if (remaining == NULL) { - return NULL; - } + PyObject *remaining = PyBytes_FromStringAndSize(self->ob_start + to_take, + remaining_length); + if (remaining == NULL) { + return NULL; } // If the bytes are offset inside the buffer must first align. @@ -1575,17 +1561,9 @@ bytearray_take_bytes_impl(PyByteArrayObject *self, PyObject *n) // Point the bytearray towards the buffer with the remaining data. PyObject *result = self->ob_bytes_object; self->ob_bytes_object = remaining; - if (remaining) { - self->ob_bytes = self->ob_start = PyBytes_AS_STRING(self->ob_bytes_object); - Py_SET_SIZE(self, size - to_take); - FT_ATOMIC_STORE_SSIZE_RELAXED(self->ob_alloc, size - to_take + 1); - } - else { - self->ob_bytes = self->ob_start = NULL; - Py_SET_SIZE(self, 0); - FT_ATOMIC_STORE_SSIZE_RELAXED(self->ob_alloc, 0); - } - + self->ob_bytes = self->ob_start = PyBytes_AS_STRING(self->ob_bytes_object); + Py_SET_SIZE(self, remaining_length); + FT_ATOMIC_STORE_SSIZE_RELAXED(self->ob_alloc, remaining_length); return result; } @@ -1967,11 +1945,6 @@ bytearray_insert_impl(PyByteArrayObject *self, Py_ssize_t index, int item) Py_ssize_t n = Py_SIZE(self); char *buf; - if (n == PY_SSIZE_T_MAX) { - PyErr_SetString(PyExc_OverflowError, - "cannot add more objects to bytearray"); - return NULL; - } if (bytearray_resize_lock_held((PyObject *)self, n + 1) < 0) return NULL; buf = PyByteArray_AS_STRING(self); @@ -2086,11 +2059,6 @@ bytearray_append_impl(PyByteArrayObject *self, int item) { Py_ssize_t n = Py_SIZE(self); - if (n == PY_SSIZE_T_MAX) { - PyErr_SetString(PyExc_OverflowError, - "cannot add more objects to bytearray"); - return NULL; - } if (bytearray_resize_lock_held((PyObject *)self, n + 1) < 0) return NULL; From 28cb8c57ebd2f8604d64110aa33236737ba04dd5 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Mon, 27 Oct 2025 01:04:17 -0700 Subject: [PATCH 17/43] Personal review fixes --- 
Objects/bytearrayobject.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index c089cf755782d0..547f16aeae7a64 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -230,7 +230,7 @@ bytearray_resize_lock_held(PyObject *self, Py_ssize_t requested_size) alloc = size + (size >> 3) + (size < 9 ? 3 : 6); } else { - /* Major upsize; resize up to exact size. Upsize always means size > 0 */ + /* Major upsize; resize up to exact size */ alloc = size; } } @@ -246,18 +246,24 @@ bytearray_resize_lock_held(PyObject *self, Py_ssize_t requested_size) Py_MIN(requested_size, Py_SIZE(self))); } + int ret; if (obj->ob_bytes_object == NULL) { obj->ob_bytes_object = PyBytes_FromStringAndSize(NULL, alloc); if (obj->ob_bytes_object == NULL) { + /* Object state valid and resize failed. */ return -1; } + ret = 0; } else { if (_PyBytes_Resize(&obj->ob_bytes_object, alloc) == -1) { - Py_SET_SIZE(self, 0); - obj->ob_bytes = obj->ob_start = NULL; - FT_ATOMIC_STORE_SSIZE_RELAXED(obj->ob_alloc, 0); - return -1; + /* storage gone and resize failed. 
*/ + obj->ob_bytes_object = Py_GetConstant(Py_CONSTANT_EMPTY_BYTES); + size = alloc = 0; + ret = -1; + } + else { + ret = 0; } } @@ -269,7 +275,7 @@ bytearray_resize_lock_held(PyObject *self, Py_ssize_t requested_size) obj->ob_bytes[size] = '\0'; } - return 0; + return ret; } int From f03b895019d05e14c09022ebe74aae00ba0d6773 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Mon, 27 Oct 2025 01:21:25 -0700 Subject: [PATCH 18/43] Simplify resize error handling --- Objects/bytearrayobject.c | 26 +++++++++++--------------- 1 file changed, 11 insertions(+), 15 deletions(-) diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index 547f16aeae7a64..7dddc10f36c556 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -246,25 +246,21 @@ bytearray_resize_lock_held(PyObject *self, Py_ssize_t requested_size) Py_MIN(requested_size, Py_SIZE(self))); } - int ret; if (obj->ob_bytes_object == NULL) { obj->ob_bytes_object = PyBytes_FromStringAndSize(NULL, alloc); - if (obj->ob_bytes_object == NULL) { - /* Object state valid and resize failed. */ - return -1; - } - ret = 0; } else { - if (_PyBytes_Resize(&obj->ob_bytes_object, alloc) == -1) { - /* storage gone and resize failed. 
*/ - obj->ob_bytes_object = Py_GetConstant(Py_CONSTANT_EMPTY_BYTES); - size = alloc = 0; - ret = -1; - } - else { - ret = 0; - } + _PyBytes_Resize(&obj->ob_bytes_object, alloc); + } + + int ret; + if (obj->ob_bytes_object == NULL) { + obj->ob_bytes_object = Py_GetConstant(Py_CONSTANT_EMPTY_BYTES); + size = alloc = 0; + ret = -1; + } + else { + ret = 0; } obj->ob_bytes = obj->ob_start = PyBytes_AS_STRING(obj->ob_bytes_object); From a45f3c23632199a3b685e16d3f70c256b87b6679 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Mon, 27 Oct 2025 14:14:57 -0700 Subject: [PATCH 19/43] Use right PyLong constructor --- Objects/bytearrayobject.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index 7dddc10f36c556..3ca8655cc03c42 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -2681,7 +2681,7 @@ bytearray_sizeof_impl(PyByteArrayObject *self) res += alloc; } - return PyLong_FromSize_t(res); + return PyLong_FromSsize_t(res); } static PySequenceMethods bytearray_as_sequence = { From c8943e3b2c4bf06b2048083ac46c9f5835b724d3 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Tue, 28 Oct 2025 22:56:08 -0700 Subject: [PATCH 20/43] Add a define for max bytearray size, comment size=0 --- Include/cpython/bytesobject.h | 7 +++++++ Objects/bytearrayobject.c | 32 +++++++++++++++++--------------- Objects/bytesobject.c | 8 +------- 3 files changed, 25 insertions(+), 22 deletions(-) diff --git a/Include/cpython/bytesobject.h b/Include/cpython/bytesobject.h index 85bc2b827df8fb..ab35462396bc9a 100644 --- a/Include/cpython/bytesobject.h +++ b/Include/cpython/bytesobject.h @@ -41,6 +41,13 @@ _PyBytes_Join(PyObject *sep, PyObject *iterable) return PyBytes_Join(sep, iterable); } +/* _PyBytesObject_SIZE gives the basic size of a bytes object; any memory allocation + for a bytes object of length n should request PyBytesObject_SIZE + n bytes. 
+ + Using _PyBytesObject_SIZE instead of sizeof(PyBytesObject) saves + 3 or 7 bytes per bytes object allocation on a typical system. +*/ +#define _PyBytesObject_SIZE (offsetof(PyBytesObject, ob_sval) + 1) // --- PyBytesWriter API ----------------------------------------------------- diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index 3ca8655cc03c42..c2e91b7b47b7c8 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -20,6 +20,9 @@ class bytearray "PyByteArrayObject *" "&PyByteArray_Type" /* For PyByteArray_AS_STRING(). */ char _PyByteArray_empty_string[] = ""; +/* Max number of bytes a bytearray can contain */ +#define PyByteArray_SIZE_MAX ((Py_ssize_t)(PY_SSIZE_T_MAX - _PyBytesObject_SIZE)) + /* Helpers */ static int @@ -134,16 +137,15 @@ PyByteArray_FromStringAndSize(const char *bytes, Py_ssize_t size) return NULL; } - /* Prevent buffer overflow when setting alloc to size. */ - if (size == PY_SSIZE_T_MAX) { - return PyErr_NoMemory(); - } - new = PyObject_New(PyByteArrayObject, &PyByteArray_Type); if (new == NULL) { return NULL; } + /* optimization: size=0 bytearray should not allocate space + + PyBytes_FromStringAndSize returns the empty bytes global when size=0 so + no allocation occurs. 
*/ new->ob_bytes_object = PyBytes_FromStringAndSize(NULL, size); if (new->ob_bytes_object == NULL) { Py_DECREF(new); @@ -235,7 +237,7 @@ bytearray_resize_lock_held(PyObject *self, Py_ssize_t requested_size) } } // NOTE: offsetof() logic copied from PyBytesObject_SIZE in bytesobject.c - if (alloc > PY_SSIZE_T_MAX - (offsetof(PyBytesObject, ob_sval) + 1)) { + if (alloc > PyByteArray_SIZE_MAX) { PyErr_NoMemory(); return -1; } @@ -299,7 +301,7 @@ PyByteArray_Concat(PyObject *a, PyObject *b) goto done; } - if (va.len > PY_SSIZE_T_MAX - vb.len) { + if (va.len > PyByteArray_SIZE_MAX - vb.len) { PyErr_NoMemory(); goto done; } @@ -343,7 +345,7 @@ bytearray_iconcat_lock_held(PyObject *op, PyObject *other) } Py_ssize_t size = Py_SIZE(self); - if (size > PY_SSIZE_T_MAX - vo.len) { + if (size > PyByteArray_SIZE_MAX - vo.len) { PyBuffer_Release(&vo); return PyErr_NoMemory(); } @@ -377,7 +379,7 @@ bytearray_repeat_lock_held(PyObject *op, Py_ssize_t count) count = 0; } const Py_ssize_t mysize = Py_SIZE(self); - if (count > 0 && mysize > PY_SSIZE_T_MAX / count) { + if (count > 0 && mysize > PyByteArray_SIZE_MAX / count) { return PyErr_NoMemory(); } Py_ssize_t size = mysize * count; @@ -413,7 +415,7 @@ bytearray_irepeat_lock_held(PyObject *op, Py_ssize_t count) } const Py_ssize_t mysize = Py_SIZE(self); - if (count > 0 && mysize > PY_SSIZE_T_MAX / count) { + if (count > 0 && mysize > PyByteArray_SIZE_MAX / count) { return PyErr_NoMemory(); } const Py_ssize_t size = mysize * count; @@ -589,7 +591,7 @@ bytearray_setslice_linear(PyByteArrayObject *self, buf = PyByteArray_AS_STRING(self); } else if (growth > 0) { - if (Py_SIZE(self) > (Py_ssize_t)PY_SSIZE_T_MAX - growth) { + if (Py_SIZE(self) > PyByteArray_SIZE_MAX - growth) { PyErr_NoMemory(); return -1; } @@ -2168,16 +2170,16 @@ bytearray_extend_impl(PyByteArrayObject *self, PyObject *iterable_of_ints) if (len >= buf_size) { Py_ssize_t addition; - if (len == PY_SSIZE_T_MAX) { + if (len == PyByteArray_SIZE_MAX) { Py_DECREF(it); 
Py_DECREF(bytearray_obj); return PyErr_NoMemory(); } addition = len >> 1; - if (addition > PY_SSIZE_T_MAX - len - 1) - buf_size = PY_SSIZE_T_MAX; + if (addition > PyByteArray_SIZE_MAX - len) + buf_size = PyByteArray_SIZE_MAX; else - buf_size = len + addition + 1; + buf_size = len + addition; if (bytearray_resize_lock_held((PyObject *)bytearray_obj, buf_size) < 0) { Py_DECREF(it); Py_DECREF(bytearray_obj); diff --git a/Objects/bytesobject.c b/Objects/bytesobject.c index 9c807b3dd166ee..ecc53fb12e234c 100644 --- a/Objects/bytesobject.c +++ b/Objects/bytesobject.c @@ -25,13 +25,7 @@ class bytes "PyBytesObject *" "&PyBytes_Type" #include "clinic/bytesobject.c.h" -/* PyBytesObject_SIZE gives the basic size of a bytes object; any memory allocation - for a bytes object of length n should request PyBytesObject_SIZE + n bytes. - - Using PyBytesObject_SIZE instead of sizeof(PyBytesObject) saves - 3 or 7 bytes per bytes object allocation on a typical system. -*/ -#define PyBytesObject_SIZE (offsetof(PyBytesObject, ob_sval) + 1) +#define PyBytesObject_SIZE _PyBytesObject_SIZE /* Forward declaration */ static void* _PyBytesWriter_ResizeAndUpdatePointer(PyBytesWriter *writer, From 5bffb7e68fd0059f2874a2bff3499db35a39c659 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Tue, 28 Oct 2025 23:03:48 -0700 Subject: [PATCH 21/43] Remove old comment --- Objects/bytearrayobject.c | 1 - 1 file changed, 1 deletion(-) diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index c2e91b7b47b7c8..ebe93caf44927b 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -236,7 +236,6 @@ bytearray_resize_lock_held(PyObject *self, Py_ssize_t requested_size) alloc = size; } } - // NOTE: offsetof() logic copied from PyBytesObject_SIZE in bytesobject.c if (alloc > PyByteArray_SIZE_MAX) { PyErr_NoMemory(); return -1; From 583ea4b7288b040395571c0862e334c94801d06a Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Tue, 28 Oct 2025 23:23:43 -0700 Subject: [PATCH 22/43] Update
test_capi.test_bytearray for MemoryError vs OverflowError --- Lib/test/test_capi/test_bytearray.py | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/Lib/test/test_capi/test_bytearray.py b/Lib/test/test_capi/test_bytearray.py index 52565ea34c61b8..106a141ddfef5a 100644 --- a/Lib/test/test_capi/test_bytearray.py +++ b/Lib/test/test_capi/test_bytearray.py @@ -1,3 +1,4 @@ +import sys import unittest from test.support import import_helper @@ -55,7 +56,9 @@ def test_fromstringandsize(self): self.assertEqual(fromstringandsize(b'', 0), bytearray()) self.assertEqual(fromstringandsize(NULL, 0), bytearray()) self.assertEqual(len(fromstringandsize(NULL, 3)), 3) - self.assertRaises(MemoryError, fromstringandsize, NULL, PY_SSIZE_T_MAX) + self.assertRaises(OverflowError, fromstringandsize, NULL, PY_SSIZE_T_MAX) + self.assertRaises(MemoryError, fromstringandsize, NULL, + PY_SSIZE_T_MAX-sys.getsizeof(b'')) self.assertRaises(SystemError, fromstringandsize, b'abc', -1) self.assertRaises(SystemError, fromstringandsize, b'abc', PY_SSIZE_T_MIN) From 8ee14e6331fa5ccdf88886137006c01b1bbda3a8 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Wed, 29 Oct 2025 00:17:08 -0700 Subject: [PATCH 23/43] More accurate size and alloc calculation --- Objects/bytearrayobject.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index ebe93caf44927b..82ac0ca8f4c380 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -2477,7 +2477,7 @@ bytearray_alloc(PyObject *op, PyObject *Py_UNUSED(ignored)) PyByteArrayObject *self = _PyByteArray_CAST(op); Py_ssize_t alloc = FT_ATOMIC_LOAD_SSIZE_RELAXED(self->ob_alloc); if (alloc > 0) { - alloc += sizeof(PyBytesObject); + alloc += _PyBytesObject_SIZE; } return PyLong_FromSsize_t(alloc); } @@ -2678,8 +2678,7 @@ bytearray_sizeof_impl(PyByteArrayObject *self) Py_ssize_t res = _PyObject_SIZE(Py_TYPE(self)); Py_ssize_t alloc = 
FT_ATOMIC_LOAD_SSIZE_RELAXED(self->ob_alloc); if (alloc > 0) { - res += sizeof(PyBytesObject); - res += alloc; + res += _PyBytesObject_SIZE + alloc; } return PyLong_FromSsize_t(res); From d70e36908e62fba00184952607e5e95f3795ce4d Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Wed, 29 Oct 2025 00:24:17 -0700 Subject: [PATCH 24/43] Comment and minor doc tweaks --- Doc/library/stdtypes.rst | 4 +++- Objects/bytearrayobject.c | 6 +++--- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst index 3f7ba196cf7019..31c6b9bd91aff4 100644 --- a/Doc/library/stdtypes.rst +++ b/Doc/library/stdtypes.rst @@ -3200,11 +3200,11 @@ objects. * - Return :class:`bytes` after working with :class:`bytearray` - .. code:: python - def read() -> bytes: buffer = bytearray(1024) ... return bytes(buffer) + - .. code:: python def read() -> bytes: @@ -3219,6 +3219,7 @@ objects. ... data = bytes(buffer) buffer.clear() + - .. code:: python buffer = bytearray(1024) @@ -3232,6 +3233,7 @@ objects. n = buffer.find(b'\n') data = bytes(buffer[:n + 1]) del buffer[:n + 1] + assert data == b'abc' assert buffer == bytearray(b'def') - .. code:: python diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index 82ac0ca8f4c380..11cae2cb1c8b97 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -142,7 +142,7 @@ PyByteArray_FromStringAndSize(const char *bytes, Py_ssize_t size) return NULL; } - /* optimization: size=0 bytearray should not allocate space + /* Optimization: size=0 bytearray should not allocate space PyBytes_FromStringAndSize returns the empty bytes global when size=0 so no allocation occurs. */ @@ -241,7 +241,7 @@ bytearray_resize_lock_held(PyObject *self, Py_ssize_t requested_size) return -1; } - /* re-align data to the start of the allocation. */ + /* Re-align data to the start of the allocation. 
*/ if (logical_offset > 0) { memmove(obj->ob_bytes, obj->ob_start, Py_MIN(requested_size, Py_SIZE(self))); @@ -1533,7 +1533,7 @@ bytearray_take_bytes_impl(PyByteArrayObject *self, PyObject *n) return NULL; } - // Exports may change the contents, No mutable bytes allowed. + // Exports may change the contents. No mutable bytes allowed. if (!_canresize(self)) { return NULL; } From 97be8186d4d3144c5f59c47386b78c1fed36767e Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Wed, 29 Oct 2025 10:27:40 -0700 Subject: [PATCH 25/43] Apply suggestion from @vstinner Co-authored-by: Victor Stinner --- Objects/bytearrayobject.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index 11cae2cb1c8b97..ffa4603975a8f4 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -1557,7 +1557,7 @@ bytearray_take_bytes_impl(PyByteArrayObject *self, PyObject *n) } if (_PyBytes_Resize(&self->ob_bytes_object, to_take) == -1) { - Py_CLEAR(remaining); + Py_DECREF(remaining); return NULL; } From 99e49efa7dc3a57168766ea51c6adca754868576 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Wed, 29 Oct 2025 11:20:21 -0700 Subject: [PATCH 26/43] Remove _PyByteArray_empty_string, add bytearray_reinit_from_bytes De-duplicate the code to set `ob_bytes`, `ob_start`, `ob_alloc` and `ob_size`; in resize, rely on `ob_start` always being set. 
--- Include/cpython/bytearrayobject.h | 8 +---- Objects/bytearrayobject.c | 48 ++++++++++++---------------- Tools/c-analyzer/cpython/ignored.tsv | 1 - 3 files changed, 21 insertions(+), 36 deletions(-) diff --git a/Include/cpython/bytearrayobject.h b/Include/cpython/bytearrayobject.h index f116271ad655b5..904e6ac658cc37 100644 --- a/Include/cpython/bytearrayobject.h +++ b/Include/cpython/bytearrayobject.h @@ -12,19 +12,13 @@ typedef struct { PyObject *ob_bytes_object; /* PyBytes for zero-copy bytes conversion */ } PyByteArrayObject; -PyAPI_DATA(char) _PyByteArray_empty_string[]; - /* Macros and static inline functions, trading safety for speed */ #define _PyByteArray_CAST(op) \ (assert(PyByteArray_Check(op)), _Py_CAST(PyByteArrayObject*, op)) static inline char* PyByteArray_AS_STRING(PyObject *op) { - PyByteArrayObject *self = _PyByteArray_CAST(op); - if (Py_SIZE(self)) { - return self->ob_start; - } - return _PyByteArray_empty_string; + return _PyByteArray_CAST(op)->ob_start; } #define PyByteArray_AS_STRING(self) PyByteArray_AS_STRING(_PyObject_CAST(self)) diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index ffa4603975a8f4..f5c57161100c6d 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -17,9 +17,6 @@ class bytearray "PyByteArrayObject *" "&PyByteArray_Type" [clinic start generated code]*/ /*[clinic end generated code: output=da39a3ee5e6b4b0d input=5535b77c37a119e0]*/ -/* For PyByteArray_AS_STRING(). 
*/ -char _PyByteArray_empty_string[] = ""; - /* Max number of bytes a bytearray can contain */ #define PyByteArray_SIZE_MAX ((Py_ssize_t)(PY_SSIZE_T_MAX - _PyBytesObject_SIZE)) @@ -46,6 +43,14 @@ _getbytevalue(PyObject* arg, int *value) return 1; } +static void +bytearray_reinit_from_bytes(PyByteArrayObject *self, Py_ssize_t size, + Py_ssize_t alloc) { + self->ob_bytes = self->ob_start = PyBytes_AS_STRING(self->ob_bytes_object); + Py_SET_SIZE(self, size); + FT_ATOMIC_STORE_SSIZE_RELAXED(self->ob_alloc, alloc); +} + static int bytearray_getbuffer_lock_held(PyObject *self, Py_buffer *view, int flags) { @@ -151,14 +156,10 @@ PyByteArray_FromStringAndSize(const char *bytes, Py_ssize_t size) Py_DECREF(new); return NULL; } - new->ob_bytes = PyBytes_AS_STRING(new->ob_bytes_object); - assert(new->ob_bytes); + bytearray_reinit_from_bytes(new, size, size); if (bytes != NULL && size > 0) { memcpy(new->ob_bytes, bytes, size); } - Py_SET_SIZE(new, size); - new->ob_alloc = size; - new->ob_start = new->ob_bytes; new->ob_exports = 0; return (PyObject *)new; @@ -247,26 +248,12 @@ bytearray_resize_lock_held(PyObject *self, Py_ssize_t requested_size) Py_MIN(requested_size, Py_SIZE(self))); } - if (obj->ob_bytes_object == NULL) { - obj->ob_bytes_object = PyBytes_FromStringAndSize(NULL, alloc); - } - else { - _PyBytes_Resize(&obj->ob_bytes_object, alloc); - } - - int ret; - if (obj->ob_bytes_object == NULL) { + int ret = _PyBytes_Resize(&obj->ob_bytes_object, alloc); + if (ret == -1) { obj->ob_bytes_object = Py_GetConstant(Py_CONSTANT_EMPTY_BYTES); size = alloc = 0; - ret = -1; } - else { - ret = 0; - } - - obj->ob_bytes = obj->ob_start = PyBytes_AS_STRING(obj->ob_bytes_object); - Py_SET_SIZE(self, size); - FT_ATOMIC_STORE_SSIZE_RELAXED(obj->ob_alloc, alloc); + bytearray_reinit_from_bytes(obj, size, alloc); if (alloc != size) { /* Add mid-buffer null; end provided by bytes. 
*/ obj->ob_bytes[size] = '\0'; @@ -904,6 +891,13 @@ bytearray___init___impl(PyByteArrayObject *self, PyObject *arg, PyObject *it; PyObject *(*iternext)(PyObject *); + /* First __init__; set ob_bytes_object so ob_bytes is always non-null. */ + if (self->ob_bytes_object == NULL) { + self->ob_bytes_object = Py_GetConstant(Py_CONSTANT_EMPTY_BYTES); + bytearray_reinit_from_bytes(self, 0, 0); + self->ob_exports = 0; + } + if (Py_SIZE(self) != 0) { /* Empty previous contents (yes, do this first of all!) */ if (PyByteArray_Resize((PyObject *)self, 0) < 0) @@ -1564,9 +1558,7 @@ bytearray_take_bytes_impl(PyByteArrayObject *self, PyObject *n) // Point the bytearray towards the buffer with the remaining data. PyObject *result = self->ob_bytes_object; self->ob_bytes_object = remaining; - self->ob_bytes = self->ob_start = PyBytes_AS_STRING(self->ob_bytes_object); - Py_SET_SIZE(self, remaining_length); - FT_ATOMIC_STORE_SSIZE_RELAXED(self->ob_alloc, remaining_length); + bytearray_reinit_from_bytes(self, remaining_length, remaining_length); return result; } diff --git a/Tools/c-analyzer/cpython/ignored.tsv b/Tools/c-analyzer/cpython/ignored.tsv index c3b13d69f0de8e..1e4d87d93d9273 100644 --- a/Tools/c-analyzer/cpython/ignored.tsv +++ b/Tools/c-analyzer/cpython/ignored.tsv @@ -324,7 +324,6 @@ Modules/pyexpat.c - error_info_of - Modules/pyexpat.c - handler_info - Modules/termios.c - termios_constants - Modules/timemodule.c init_timezone YEAR - -Objects/bytearrayobject.c - _PyByteArray_empty_string - Objects/complexobject.c - c_1 - Objects/exceptions.c - static_exceptions - Objects/genobject.c - ASYNC_GEN_IGNORED_EXIT_MSG - From f4b62d96ef66fb17cf9e31d863b7b55a331d3391 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Wed, 29 Oct 2025 11:51:19 -0700 Subject: [PATCH 27/43] Update Stable API concerns: restore _empty_string and move _PyBytesObject_SIZE --- Include/cpython/bytearrayobject.h | 4 ++++ Include/cpython/bytesobject.h | 8 -------- Include/internal/pycore_bytesobject.h | 8 
++++++++ 3 files changed, 12 insertions(+), 8 deletions(-) diff --git a/Include/cpython/bytearrayobject.h b/Include/cpython/bytearrayobject.h index 904e6ac658cc37..90c556cc878e90 100644 --- a/Include/cpython/bytearrayobject.h +++ b/Include/cpython/bytearrayobject.h @@ -12,6 +12,10 @@ typedef struct { PyObject *ob_bytes_object; /* PyBytes for zero-copy bytes conversion */ } PyByteArrayObject; +/* Part of the stable ABI; used to be used as a fallback in + PyByteArray_AS_STRING. */ +PyAPI_DATA(char) _PyByteArray_empty_string[]; + /* Macros and static inline functions, trading safety for speed */ #define _PyByteArray_CAST(op) \ (assert(PyByteArray_Check(op)), _Py_CAST(PyByteArrayObject*, op)) diff --git a/Include/cpython/bytesobject.h b/Include/cpython/bytesobject.h index ab35462396bc9a..395510c8d04ac6 100644 --- a/Include/cpython/bytesobject.h +++ b/Include/cpython/bytesobject.h @@ -41,14 +41,6 @@ _PyBytes_Join(PyObject *sep, PyObject *iterable) return PyBytes_Join(sep, iterable); } -/* _PyBytesObject_SIZE gives the basic size of a bytes object; any memory allocation - for a bytes object of length n should request PyBytesObject_SIZE + n bytes. - - Using _PyBytesObject_SIZE instead of sizeof(PyBytesObject) saves - 3 or 7 bytes per bytes object allocation on a typical system. -*/ -#define _PyBytesObject_SIZE (offsetof(PyBytesObject, ob_sval) + 1) - // --- PyBytesWriter API ----------------------------------------------------- typedef struct PyBytesWriter PyBytesWriter; diff --git a/Include/internal/pycore_bytesobject.h b/Include/internal/pycore_bytesobject.h index c7bc53b6073770..8e8fa696ee0350 100644 --- a/Include/internal/pycore_bytesobject.h +++ b/Include/internal/pycore_bytesobject.h @@ -60,6 +60,14 @@ PyAPI_FUNC(void) _PyBytes_Repeat(char* dest, Py_ssize_t len_dest, const char* src, Py_ssize_t len_src); +/* _PyBytesObject_SIZE gives the basic size of a bytes object; any memory allocation + for a bytes object of length n should request PyBytesObject_SIZE + n bytes. 
+ + Using _PyBytesObject_SIZE instead of sizeof(PyBytesObject) saves + 3 or 7 bytes per bytes object allocation on a typical system. +*/ +#define _PyBytesObject_SIZE (offsetof(PyBytesObject, ob_sval) + 1) + /* --- PyBytesWriter ------------------------------------------------------ */ struct PyBytesWriter { From 48afb623f839d6156d8d7acc9ed67d03457301a3 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Wed, 29 Oct 2025 12:23:24 -0700 Subject: [PATCH 28/43] Restore _PyByteArray_empty_string in .c file --- Include/cpython/bytearrayobject.h | 4 ---- Objects/bytearrayobject.c | 5 +++++ Tools/c-analyzer/cpython/ignored.tsv | 2 ++ 3 files changed, 7 insertions(+), 4 deletions(-) diff --git a/Include/cpython/bytearrayobject.h b/Include/cpython/bytearrayobject.h index 90c556cc878e90..904e6ac658cc37 100644 --- a/Include/cpython/bytearrayobject.h +++ b/Include/cpython/bytearrayobject.h @@ -12,10 +12,6 @@ typedef struct { PyObject *ob_bytes_object; /* PyBytes for zero-copy bytes conversion */ } PyByteArrayObject; -/* Part of the stable ABI; used to be used as a fallback in - PyByteArray_AS_STRING. */ -PyAPI_DATA(char) _PyByteArray_empty_string[]; - /* Macros and static inline functions, trading safety for speed */ #define _PyByteArray_CAST(op) \ (assert(PyByteArray_Check(op)), _Py_CAST(PyByteArrayObject*, op)) diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index f5c57161100c6d..164e1413b295e7 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -17,6 +17,11 @@ class bytearray "PyByteArrayObject *" "&PyByteArray_Type" [clinic start generated code]*/ /*[clinic end generated code: output=da39a3ee5e6b4b0d input=5535b77c37a119e0]*/ +/* Part of the stable ABI; used to be used as a fallback in + PyByteArray_AS_STRING. 
*/ +PyAPI_DATA(char) _PyByteArray_empty_string[]; +char _PyByteArray_empty_string[] = ""; + /* Max number of bytes a bytearray can contain */ #define PyByteArray_SIZE_MAX ((Py_ssize_t)(PY_SSIZE_T_MAX - _PyBytesObject_SIZE)) diff --git a/Tools/c-analyzer/cpython/ignored.tsv b/Tools/c-analyzer/cpython/ignored.tsv index 1e4d87d93d9273..8b73189fb07dc5 100644 --- a/Tools/c-analyzer/cpython/ignored.tsv +++ b/Tools/c-analyzer/cpython/ignored.tsv @@ -194,6 +194,7 @@ Python/pyfpe.c - PyFPE_counter - Python/import.c - pkgcontext - Python/pystate.c - _Py_tss_tstate - Python/pystate.c - _Py_tss_gilstate - +Python/pystate.c - _Py_tss_interp - ##----------------------- ## should be const @@ -324,6 +325,7 @@ Modules/pyexpat.c - error_info_of - Modules/pyexpat.c - handler_info - Modules/termios.c - termios_constants - Modules/timemodule.c init_timezone YEAR - +Objects/bytearrayobject.c - _PyByteArray_empty_string - Objects/complexobject.c - c_1 - Objects/exceptions.c - static_exceptions - Objects/genobject.c - ASYNC_GEN_IGNORED_EXIT_MSG - From 8c81e0326ac9f4d37f5b2a7b4eb0fa3d13960989 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Wed, 29 Oct 2025 12:29:57 -0700 Subject: [PATCH 29/43] remove line that shouldn't have been added --- Tools/c-analyzer/cpython/ignored.tsv | 1 - 1 file changed, 1 deletion(-) diff --git a/Tools/c-analyzer/cpython/ignored.tsv b/Tools/c-analyzer/cpython/ignored.tsv index 8b73189fb07dc5..c3b13d69f0de8e 100644 --- a/Tools/c-analyzer/cpython/ignored.tsv +++ b/Tools/c-analyzer/cpython/ignored.tsv @@ -194,7 +194,6 @@ Python/pyfpe.c - PyFPE_counter - Python/import.c - pkgcontext - Python/pystate.c - _Py_tss_tstate - Python/pystate.c - _Py_tss_gilstate - -Python/pystate.c - _Py_tss_interp - ##----------------------- ## should be const From 313e78cbd246d786e00010c8c384f822c09ce7b3 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Thu, 30 Oct 2025 13:21:03 -0700 Subject: [PATCH 30/43] Apply suggestion from @encukou Co-authored-by: Petr Viktorin --- 
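Reviewer note (not part of the change): the bounds check whose error message is reworded in this patch can be modelled roughly in Python. This is only a sketch. The helper name `take_bytes_model` is hypothetical and exists purely for illustration; it copies the data, whereas the C implementation hands over the underlying bytes object, and the `TypeError` text it produces for a non-integer *n* differs from the C code's.

```python
import operator


def take_bytes_model(buf: bytearray, n=None) -> bytes:
    """Hypothetical pure-Python model of the bounds logic in take_bytes."""
    size = len(buf)
    if n is None:
        # Default: take everything.
        to_take = size
    else:
        # Accept anything usable as an index; others raise TypeError.
        to_take = operator.index(n)
        if to_take < 0:
            # Negative n counts from the end: take the first len + n bytes.
            to_take += size
    if to_take < 0 or to_take > size:
        # The adjusted count is what gets reported, as in the reworded message.
        raise IndexError(f"can't take {to_take} bytes outside size {size}")
    taken = bytes(buf[:to_take])  # copy here; the C code avoids this copy
    del buf[:to_take]             # the remaining bytes stay in the bytearray
    return taken
```

Under this model, asking for 8 bytes from a 7-byte buffer raises `IndexError`, and so does `n=-8`, where the count reported in the message is the adjusted value -1.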
Objects/bytearrayobject.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index 164e1413b295e7..82e5b1821ae58a 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -1527,8 +1527,8 @@ bytearray_take_bytes_impl(PyByteArrayObject *self, PyObject *n) if (to_take < 0 || to_take > size) { PyErr_Format(PyExc_IndexError, - "can't take %zd(%zd) outside size %zd", - original, to_take, size); + "can't take %zd bytes outside size %zd", + to_take, size); return NULL; } From c028e2b9b2c56bd995dbfc94b785a53e16e75c47 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Thu, 30 Oct 2025 13:43:49 -0700 Subject: [PATCH 31/43] Remove original variable, no longer used --- Objects/bytearrayobject.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index 82e5b1821ae58a..052fbe4ed0c1e2 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -1506,14 +1506,14 @@ static PyObject * bytearray_take_bytes_impl(PyByteArrayObject *self, PyObject *n) /*[clinic end generated code: output=3147fbc0bbbe8d94 input=b15b5172cdc6deda]*/ { - Py_ssize_t to_take, original; + Py_ssize_t to_take; Py_ssize_t size = Py_SIZE(self); if (Py_IsNone(n)) { - to_take = original = size; + to_take = size; } // Integer index, from start (zero, positive) or end (negative). 
else if (_PyIndex_Check(n)) { - to_take = original = PyNumber_AsSsize_t(n, PyExc_IndexError); + to_take = PyNumber_AsSsize_t(n, PyExc_IndexError); if (to_take == -1 && PyErr_Occurred()) { return NULL; } From a69b3383d7b88c3414557181d46d0b6d6e2eb52e Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Thu, 30 Oct 2025 13:45:07 -0700 Subject: [PATCH 32/43] remove _PyByteArray_empty_string --- Objects/bytearrayobject.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index 052fbe4ed0c1e2..535f7b10858ea8 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -17,11 +17,6 @@ class bytearray "PyByteArrayObject *" "&PyByteArray_Type" [clinic start generated code]*/ /*[clinic end generated code: output=da39a3ee5e6b4b0d input=5535b77c37a119e0]*/ -/* Part of the stable ABI; used to be used as a fallback in - PyByteArray_AS_STRING. */ -PyAPI_DATA(char) _PyByteArray_empty_string[]; -char _PyByteArray_empty_string[] = ""; - /* Max number of bytes a bytearray can contain */ #define PyByteArray_SIZE_MAX ((Py_ssize_t)(PY_SSIZE_T_MAX - _PyBytesObject_SIZE)) From 2a95118eb14cae004364348107732d4bfaf558ef Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Thu, 30 Oct 2025 13:51:41 -0700 Subject: [PATCH 33/43] Add take_bytes_n free-threading test Validates that take_bytes(10) either raises IndexError because fewer than 10 bytes remain, or returns exactly one run of 10 bytes starting at a 10-byte offset. --- Lib/test/test_bytes.py | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/Lib/test/test_bytes.py b/Lib/test/test_bytes.py index 23a95dc0d40b62..e306c3e6e944a1 100644 --- a/Lib/test/test_bytes.py +++ b/Lib/test/test_bytes.py @@ -2622,11 +2622,18 @@ def zfill(b, a): c = a.zfill(0x400000) assert not c or c[-1] not in (0xdd, 0xcd) - def take_bytes(b, a): + def take_bytes(b, a): # MODIFIES! b.wait() c = a.take_bytes() assert not c or c[0] == 48 # '0' + def take_bytes_n(b, a): # MODIFIES! 
+ b.wait() + try: + c = a.take_bytes(10) + assert c == b'0123456789' + except IndexError: pass + def check(funcs, a=None, *args): if a is None: a = bytearray(b'0' * 0x400000) @@ -2687,7 +2694,10 @@ def check(funcs, a=None, *args): check([clear] + [splitlines] * 10, bytearray(b'\n' * 0x400)) check([clear] + [startswith] * 10) check([clear] + [strip] * 10) + check([clear] + [take_bytes] * 10) + check([take_bytes_n] * 10, bytearray(b'0123456789' * 0x400)) + check([take_bytes_n] * 10, bytearray(b'0123456789' * 5)) check([clear] + [contains] * 10) check([clear] + [subscript] * 10) From 9680e8aa97fbe7a948792736c30bfb89336f8853 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Fri, 31 Oct 2025 14:11:09 -0700 Subject: [PATCH 34/43] Expand comment for ob_alloc --- Include/cpython/bytearrayobject.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/Include/cpython/bytearrayobject.h b/Include/cpython/bytearrayobject.h index 904e6ac658cc37..1edd082074206c 100644 --- a/Include/cpython/bytearrayobject.h +++ b/Include/cpython/bytearrayobject.h @@ -5,7 +5,12 @@ /* Object layout */ typedef struct { PyObject_VAR_HEAD - Py_ssize_t ob_alloc; /* How many bytes allocated in ob_bytes */ + /* How many bytes allocated in ob_bytes + + In the current implementation this is equivalent to Py_SIZE(ob_bytes_object). + The value is always loaded and stored atomically for thread safety. + There are API compatibility concerns with removing it, so it is kept for now. 
*/ + Py_ssize_t ob_alloc; char *ob_bytes; /* Physical backing buffer */ char *ob_start; /* Logical start inside ob_bytes */ Py_ssize_t ob_exports; /* How many buffer exports */ From 02882af2338e530ca9e8f7fbb909804ab7af9f52 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Fri, 31 Oct 2025 14:25:43 -0700 Subject: [PATCH 35/43] Add note on memmove tradeoff --- Objects/bytearrayobject.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index 535f7b10858ea8..9c6ab98df48991 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -244,6 +244,10 @@ bytearray_resize_lock_held(PyObject *self, Py_ssize_t requested_size) /* Re-align data to the start of the allocation. */ if (logical_offset > 0) { + /* optimization tradeoff: This is faster than a new allocation when + the number of bytes being removed in a resize is small; for large + size changes it may be better to just make a new bytes object as + _PyBytes_Resize will do a malloc + memcpy internally. */ memmove(obj->ob_bytes, obj->ob_start, Py_MIN(requested_size, Py_SIZE(self))); } From b67d10cf407b275e2921e443d0640c21890be2c0 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Fri, 31 Oct 2025 16:02:32 -0700 Subject: [PATCH 36/43] Move suggested optimizing refactors to whatsnew --- Doc/library/stdtypes.rst | 73 ++---------------------------------- Doc/whatsnew/3.15.rst | 80 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 84 insertions(+), 69 deletions(-) diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst index 31c6b9bd91aff4..cfcd4b97a5d00c 100644 --- a/Doc/library/stdtypes.rst +++ b/Doc/library/stdtypes.rst @@ -3190,77 +3190,12 @@ objects. Taking all bytes is a zero-copy operation. - .. list-table:: Suggested Replacements - :header-rows: 1 - - * - Description - - Old - - New - - * - Return :class:`bytes` after working with :class:`bytearray` - - .. code:: python - - def read() -> bytes: - buffer = bytearray(1024) - ... 
- return bytes(buffer) - - - .. code:: python - - def read() -> bytes: - buffer = bytearray(1024) - ... - return buffer.take_bytes() - - * - Empty a buffer getting the bytes - - .. code:: python - - buffer = bytearray(1024) - ... - data = bytes(buffer) - buffer.clear() - - - .. code:: python - - buffer = bytearray(1024) - ... - data = buffer.take_bytes() - - * - Split a buffer at a specific separator - - .. code:: python - - buffer = bytearray(b'abc\ndef') - n = buffer.find(b'\n') - data = bytes(buffer[:n + 1]) - del buffer[:n + 1] - assert data == b'abc' - assert buffer == bytearray(b'def') - - - .. code:: python - - buffer = bytearray(b'abc\ndef') - n = buffer.find(b'\n') - data = buffer.take_bytes(n + 1) - - * - Split a buffer at a specific separator; discard after the separator - - .. code:: python - - buffer = bytearray(b'abc\ndef') - n = buffer.find(b'\n') - data = bytes(buffer[:n]) - buffer.clear() - assert data == b'abc' - assert len(buffer) == 0 - - - .. code:: python - - buffer = bytearray(b'abc\ndef') - n = buffer.find(b'\n') - buffer.resize(n) - data = buffer.take_bytes() - .. versionadded:: next + See the :ref:`What's New <whatsnew315-bytearray-take-bytes>` entry for + common code patterns which can be optimized with + :func:`bytearray.take_bytes`. + Since bytearray objects are sequences of integers (akin to a list), for a bytearray object *b*, ``b[0]`` will be an integer, while ``b[0:1]`` will be a bytearray object of length 1. (This contrasts with text strings, where diff --git a/Doc/whatsnew/3.15.rst b/Doc/whatsnew/3.15.rst index 56028a92aa2e29..056d6f18446a1a 100644 --- a/Doc/whatsnew/3.15.rst +++ b/Doc/whatsnew/3.15.rst @@ -307,6 +307,86 @@ Other language changes not only integers or floats, although this does not improve precision. (Contributed by Serhiy Storchaka in :gh:`67795`.) +.. _whatsnew315-bytearray-take-bytes: + +* Added :meth:`bytearray.take_bytes(n=None, /) ` to take + bytes out of a :class:`bytearray` without copying. 
This enables optimizing code + which must return :class:`bytes` after working with a mutable buffer of bytes + such as buffering data, network protocol implementation, encoding, decoding, + and compression. Common code patterns which can be optimized with + :func:`~bytearray.take_bytes` are listed below. + + (Contributed by Cody Maloney in :gh:`139871`.) + + .. list-table:: Suggested Optimizing Refactors + :header-rows: 1 + + * - Description + - Old + - New + + * - Return :class:`bytes` after working with :class:`bytearray` + - .. code:: python + + def read() -> bytes: + buffer = bytearray(1024) + ... + return bytes(buffer) + + - .. code:: python + + def read() -> bytes: + buffer = bytearray(1024) + ... + return buffer.take_bytes() + + * - Empty a buffer getting the bytes + - .. code:: python + + buffer = bytearray(1024) + ... + data = bytes(buffer) + buffer.clear() + + - .. code:: python + + buffer = bytearray(1024) + ... + data = buffer.take_bytes() + + * - Split a buffer at a specific separator + - .. code:: python + + buffer = bytearray(b'abc\ndef') + n = buffer.find(b'\n') + data = bytes(buffer[:n + 1]) + del buffer[:n + 1] + assert data == b'abc\n' + assert buffer == bytearray(b'def') + + - .. code:: python + + buffer = bytearray(b'abc\ndef') + n = buffer.find(b'\n') + data = buffer.take_bytes(n + 1) + + * - Split a buffer at a specific separator; discard after the separator + - .. code:: python + + buffer = bytearray(b'abc\ndef') + n = buffer.find(b'\n') + data = bytes(buffer[:n]) + buffer.clear() + assert data == b'abc' + assert len(buffer) == 0 + + - .. 
code:: python + + buffer = bytearray(b'abc\ndef') + n = buffer.find(b'\n') + buffer.resize(n) + data = buffer.take_bytes() + New modules =========== From 6db88226f557b77a29cd2a6c6904a28e57f6c3eb Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Fri, 31 Oct 2025 16:08:04 -0700 Subject: [PATCH 37/43] Remove unintended change --- Include/cpython/bytesobject.h | 1 + 1 file changed, 1 insertion(+) diff --git a/Include/cpython/bytesobject.h b/Include/cpython/bytesobject.h index 395510c8d04ac6..85bc2b827df8fb 100644 --- a/Include/cpython/bytesobject.h +++ b/Include/cpython/bytesobject.h @@ -41,6 +41,7 @@ _PyBytes_Join(PyObject *sep, PyObject *iterable) return PyBytes_Join(sep, iterable); } + // --- PyBytesWriter API ----------------------------------------------------- typedef struct PyBytesWriter PyBytesWriter; From fb84c141fae18ea46edfa71694c082e33d78d55a Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Tue, 4 Nov 2025 11:55:54 -0800 Subject: [PATCH 38/43] Fix intermittent failure on deallocation from uninitialized memory --- Objects/bytearrayobject.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index 9c6ab98df48991..cec482fe192c5d 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -147,6 +147,13 @@ PyByteArray_FromStringAndSize(const char *bytes, Py_ssize_t size) return NULL; } + /* Fill values used in bytearray_dealloc. + + In an optimized build the memory isn't zeroed and ob_exports would read + uninitialized data when PyBytes_FromStringAndSize errored leading to + intermittent test failures. 
*/ + new->ob_exports = 0; + /* Optimization: size=0 bytearray should not allocate space PyBytes_FromStringAndSize returns the empty bytes global when size=0 so @@ -160,7 +167,6 @@ PyByteArray_FromStringAndSize(const char *bytes, Py_ssize_t size) if (bytes != NULL && size > 0) { memcpy(new->ob_bytes, bytes, size); } - new->ob_exports = 0; return (PyObject *)new; } From 0258891b70a6773c55afb96870e16423779132e2 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Tue, 4 Nov 2025 12:04:09 -0800 Subject: [PATCH 39/43] PEP 7 --- Objects/bytearrayobject.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index cec482fe192c5d..0cee66abf657c5 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -1525,7 +1525,8 @@ bytearray_take_bytes_impl(PyByteArrayObject *self, PyObject *n) if (to_take < 0) { to_take += size; } - } else { + } + else { PyErr_SetString(PyExc_TypeError, "n must be an integer or None"); return NULL; } From c4701782cb7dd0db2665f99fb2a2b1e441de86c6 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Tue, 4 Nov 2025 12:06:34 -0800 Subject: [PATCH 40/43] Tweak comment --- Objects/bytearrayobject.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Objects/bytearrayobject.c b/Objects/bytearrayobject.c index 0cee66abf657c5..99bfdec89f6c3a 100644 --- a/Objects/bytearrayobject.c +++ b/Objects/bytearrayobject.c @@ -149,8 +149,8 @@ PyByteArray_FromStringAndSize(const char *bytes, Py_ssize_t size) /* Fill values used in bytearray_dealloc. - In an optimized build the memory isn't zeroed and ob_exports would read - uninitialized data when PyBytes_FromStringAndSize errored leading to + In an optimized build the memory isn't zeroed and ob_exports would be + uninitialized when PyBytes_FromStringAndSize errored leading to intermittent test failures. 
*/ new->ob_exports = 0; From 968113564e6bbb29d86f430950d0a1915a277e7e Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Thu, 6 Nov 2025 21:02:10 -0800 Subject: [PATCH 41/43] Add +1 so allocation is over max byte length On 32-bit systems with 4 GB of RAM there may be PY_SSIZE_T_MAX allocatable and the test would pass as the allocation would succeed. Add a + 1 so that it's over the max bytes length and an OverflowError will be raised by bytes. --- Lib/test/test_capi/test_bytearray.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Lib/test/test_capi/test_bytearray.py b/Lib/test/test_capi/test_bytearray.py index 106a141ddfef5a..cb7ad8b22252d9 100644 --- a/Lib/test/test_capi/test_bytearray.py +++ b/Lib/test/test_capi/test_bytearray.py @@ -57,8 +57,8 @@ def test_fromstringandsize(self): self.assertEqual(fromstringandsize(NULL, 0), bytearray()) self.assertEqual(len(fromstringandsize(NULL, 3)), 3) self.assertRaises(OverflowError, fromstringandsize, NULL, PY_SSIZE_T_MAX) - self.assertRaises(MemoryError, fromstringandsize, NULL, - PY_SSIZE_T_MAX-sys.getsizeof(b'')) + self.assertRaises(OverflowError, fromstringandsize, NULL, + PY_SSIZE_T_MAX-sys.getsizeof(b'') + 1) self.assertRaises(SystemError, fromstringandsize, b'abc', -1) self.assertRaises(SystemError, fromstringandsize, b'abc', PY_SSIZE_T_MIN) From 442692a21817e9d6eda45f07d6ee4ccc53df6ed3 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Thu, 6 Nov 2025 21:45:15 -0800 Subject: [PATCH 42/43] Minor tweak for flow in whatsnew entry --- Doc/whatsnew/3.15.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Doc/whatsnew/3.15.rst b/Doc/whatsnew/3.15.rst index 056d6f18446a1a..b44feb601778cf 100644 --- a/Doc/whatsnew/3.15.rst +++ b/Doc/whatsnew/3.15.rst @@ -312,7 +312,7 @@ Other language changes * Added :meth:`bytearray.take_bytes(n=None, /) ` to take bytes out of a :class:`bytearray` without copying. 
This enables optimizing code which must return :class:`bytes` after working with a mutable buffer of bytes - such as buffering data, network protocol implementation, encoding, decoding, + such as data buffering, network protocol parsing, encoding, decoding, and compression. Common code patterns which can be optimized with :func:`~bytearray.take_bytes` are listed below. From ee0d6d69c278476a25a9793a9e3546963e254a62 Mon Sep 17 00:00:00 2001 From: Cody Maloney Date: Fri, 7 Nov 2025 09:21:37 -0800 Subject: [PATCH 43/43] Apply suggestions from code review Co-authored-by: Petr Viktorin --- Doc/library/stdtypes.rst | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst index cfcd4b97a5d00c..95f72ae88c0b38 100644 --- a/Doc/library/stdtypes.rst +++ b/Doc/library/stdtypes.rst @@ -3175,15 +3175,16 @@ objects. .. method:: take_bytes(n=None, /) - Take the first *n* bytes as an immutable :class:`bytes`. Defaults to all - bytes. + Remove the first *n* bytes from the bytearray and return them as an immutable + :class:`bytes`. + By default (if *n* is ``None``), return all bytes and clear the bytearray. - If *n* is negative indexes from the end and takes the first :func:`len` - plus *n* bytes. If *n* is out of bounds raises :exc:`IndexError`. + If *n* is negative, index from the end and take the first :func:`len` + plus *n* bytes. If *n* is out of bounds, raise :exc:`IndexError`. Taking less than the full length will leave remaining bytes in the - :class:`bytearray` which requires a copy. If the remaining bytes should be - discarded use :func:`~bytearray.resize` or :keyword:`del` to truncate + :class:`bytearray`, which requires a copy. If the remaining bytes should be + discarded, use :func:`~bytearray.resize` or :keyword:`del` to truncate then :func:`~bytearray.take_bytes` without a size. .. impl-detail::