diff --git a/cuda_core/cuda/core/_version.py b/cuda_core/cuda/core/_version.py
index dc33772c41..44683fadb3 100644
--- a/cuda_core/cuda/core/_version.py
+++ b/cuda_core/cuda/core/_version.py
@@ -2,4 +2,4 @@
 #
 # SPDX-License-Identifier: Apache-2.0
 
-__version__ = "0.4.2"
+__version__ = "0.5.0"
diff --git a/cuda_core/docs/source/api.rst b/cuda_core/docs/source/api.rst
index 51e505b59d..5a212c1922 100644
--- a/cuda_core/docs/source/api.rst
+++ b/cuda_core/docs/source/api.rst
@@ -26,6 +26,7 @@ CUDA runtime
    Event
    MemoryResource
    DeviceMemoryResource
+   GraphMemoryResource
    PinnedMemoryResource
    ManagedMemoryResource
    LegacyPinnedMemoryResource
diff --git a/cuda_core/docs/source/release/0.5.0-notes.rst b/cuda_core/docs/source/release/0.5.0-notes.rst
new file mode 100644
index 0000000000..8164ceb4c1
--- /dev/null
+++ b/cuda_core/docs/source/release/0.5.0-notes.rst
@@ -0,0 +1,72 @@
+.. SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+.. SPDX-License-Identifier: Apache-2.0
+
+.. currentmodule:: cuda.core.experimental
+
+``cuda.core`` 0.5.0 Release Notes
+=================================
+
+
+Highlights
+----------
+
+- Added memory management support (allocation, deallocation, copy, and fill) for CUDA graphs.
+- Added :class:`PinnedMemoryResource` and :class:`ManagedMemoryResource` for advanced memory management.
+- Added peer access control to :class:`DeviceMemoryResource`.
+- Reduced Python overhead and improved performance for calling :func:`launch`, constructing :class:`LaunchConfig`, and accessing :class:`DeviceMemoryResource` attributes.
+
+
+Breaking Changes
+----------------
+
+The support for setting :attr:`VirtualMemoryResourceOptions.handle_type` to ``"win32"`` is removed. Please reach out to us on GitHub if you have a use case.
+
+The following APIs have been deprecated and will be removed in 0.6.0:
+
+- ``cuda.core.experimental.system.driver_version`` has been replaced with
+  ``cuda.core.experimental.system.get_driver_version()``.
+- ``cuda.core.experimental.system.num_devices`` has been replaced with
+  ``cuda.core.experimental.system.get_num_devices()``.
+- ``cuda.core.experimental.system.devices`` has been replaced with
+  ``cuda.core.experimental.Device.get_all_devices()``.
+
+Other changes:
+
+- The :meth:`utils.StridedMemoryView.__init__` constructor is deprecated in favor of the new ``from_*`` classmethods, see below.
+- Support for Python 3.9 and 3.13t is dropped.
+
+
+New features
+------------
+
+- Added :class:`GraphMemoryResource` for allocating and deallocating memory when building a CUDA graph.
+- Added :class:`PinnedMemoryResource` and :class:`PinnedMemoryResourceOptions` for managing host-pinned memory pools with optional IPC support.
+- Added :class:`ManagedMemoryResource` and :class:`ManagedMemoryResourceOptions` for managing unified memory pools accessible from both host and device.
+- Added :meth:`Buffer.fill` method for efficient memory initialization, supporting ``int``, ``bytes``, and general buffer protocol objects.
+- :class:`Buffer` can now wrap external memory allocations with an owner object.
+- Added alternative constructors :meth:`~utils.StridedMemoryView.from_buffer`, :meth:`~utils.StridedMemoryView.from_dlpack`, and :meth:`~utils.StridedMemoryView.from_cuda_array_interface`
+  and a new property :attr:`~utils.StridedMemoryView.size` for :class:`~utils.StridedMemoryView`.
+- Added :meth:`ProgramOptions.as_bytes` and :meth:`LinkerOptions.as_bytes` public APIs for converting options to backend-specific byte representations.
+- Updated :class:`Device` constructor to accept either a :class:`Device` instance or a device ordinal (``int``).
+- Added :meth:`Device.get_all_devices` classmethod.
+- IPC-imported buffers can now be re-exported to other processes.
+
+
+New examples
+------------
+
+None.
+
+
+Fixes and enhancements
+----------------------
+
+- Most CUDA resources can be hashed now.
+- Python ``bool`` objects are now converted to C++ ``bool`` type when passed as kernel arguments (previously converted to ``int``).
+- Restored v0.3.x :class:`MemoryResource` behaviors and missing MR attributes for backward compatibility.
+- Added warning when multiprocessing start method is set to ``'fork'``.
+- Fixed potential memory leaks when DLPack capsule creation is interrupted.
+- Fixed :class:`VirtualMemoryResource` on Windows platforms.
+- Fixed NVRTC program name handling on Windows to avoid filesystem issues.
+- Improved test determinism by replacing OS sleep with GPU nanosleep kernel in event timing tests.
+- Fixed CUDA graph issues with ``cuda-python==12.6.*``.
diff --git a/cuda_core/docs/source/release/0.5.x-notes.rst b/cuda_core/docs/source/release/0.5.x-notes.rst
deleted file mode 100644
index 3164ca34d9..0000000000
--- a/cuda_core/docs/source/release/0.5.x-notes.rst
+++ /dev/null
@@ -1,48 +0,0 @@
-.. SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-.. SPDX-License-Identifier: Apache-2.0
-
-.. currentmodule:: cuda.core.experimental
-
-``cuda.core`` 0.5.x Release Notes
-=================================
-
-
-Highlights
-----------
-
-None.
-
-
-Breaking Changes
-----------------
-
-The following APIs have been deprecated and will be removed in 0.6.0:
-
-- ``cuda.core.experimental.system.driver_version`` has been replaced with
-  ``cuda.core.experimental.system.get_driver_version()``.
-- ``cuda.core.experimental.system.num_devices`` has been replaced with
-  ``cuda.core.experimental.system.get_num_devices()``.
-- ``cuda.core.experimental.system.devices`` has been replaced with
-  ``cuda.core.experimental.Device.get_all_devices()``.
-
-New features
-------------
-
-- Added :class:`PinnedMemoryResource` and :class:`PinnedMemoryResourceOptions` for managing
-  host-pinned memory pools with optional IPC support.
-- Added :class:`ManagedMemoryResource` and :class:`ManagedMemoryResourceOptions` for managing
-  unified memory pools accessible from both host and device.
-
-
-New examples
-------------
-
-None.
-
-
-Fixes and enhancements
-----------------------
-
-- Python ``bool`` objects are now converted to C++ ``bool`` type when passed as kernel
-  arguments. Previously, they were converted to ``int``. This brings them inline
-  with ``ctypes.c_bool`` and ``numpy.bool_``.
diff --git a/cuda_core/pixi.toml b/cuda_core/pixi.toml
index 6be4e1dc3f..8683992cad 100644
--- a/cuda_core/pixi.toml
+++ b/cuda_core/pixi.toml
@@ -68,7 +68,7 @@ cu12 = { features = ["cu12", "test", "cython-tests"], solve-group = "cu12" }
 # TODO: check if these can be extracted from pyproject.toml
 [package]
 name = "cuda-core"
-version = "0.4.2"
+version = "0.5.0"
 
 [package.build]
 backend = { name = "pixi-build-python", version = "*" }