diff --git a/cuda_core/docs/source/developer-guide.rst b/cuda_core/docs/source/developer-guide.rst
new file mode 100644
index 0000000000..e3e110519c
--- /dev/null
+++ b/cuda_core/docs/source/developer-guide.rst
@@ -0,0 +1,1259 @@
+.. SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+.. SPDX-License-Identifier: Apache-2.0
+
+CUDA Core Developer Guide
+=========================
+
+This guide defines conventions for Python and Cython code in
+``cuda/core``.
+
+**This project follows** `PEP 8 <https://peps.python.org/pep-0008/>`__
+**as the base style guide and** `PEP
+257 <https://peps.python.org/pep-0257/>`__ **for docstring
+conventions.** The guidance in this document extends these with
+project-specific patterns, particularly for Cython code and the
+structure of this codebase. Standard conventions are not repeated here.
+
+Table of Contents
+-----------------
+
+1.  `File Structure <#file-structure>`__
+2.  `Package Layout <#package-layout>`__
+3.  `Import Statements <#import-statements>`__
+4.  `Class and Function Definitions <#class-and-function-definitions>`__
+5.  `Naming Conventions <#naming-conventions>`__
+6.  `Type Annotations and
+    Declarations <#type-annotations-and-declarations>`__
+7.  `Docstrings <#docstrings>`__
+8.  `Errors and Warnings <#errors-and-warnings>`__
+9.  `CUDA-Specific Patterns <#cuda-specific-patterns>`__
+10. `Development Lifecycle <#development-lifecycle>`__
+
+--------------
+
+File Structure
+--------------
+
+The goal is **readability and maintainability**. A well-organized file
+lets readers quickly find what they're looking for and understand how
+the pieces fit together.
+
+To support this, we suggest organizing content from most important to
+least important: principal classes first, then supporting classes, then
+implementation details. This way, readers can start at the top and
+immediately see what matters most. Unlike C/C++ where definitions must
+precede uses, Python imposes no such constraint, so we're free to optimize
+for the reader.
+
+These are guidelines, not rules. Place helper functions near their call
+sites if that's clearer. Group related code together if it aids
+understanding. When in doubt, choose whatever makes the code easiest to
+read and maintain.
+
+The following is a suggested file organization:
+
+.. _1-spdx-copyright-header:
+
+1. SPDX Copyright Header
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Every file begins with an SPDX copyright header. The pre-commit hook
+adds or updates these automatically.
+
+.. _2-module-docstring-optional:
+
+2. Module Docstring (Optional)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If present, the module docstring comes immediately after the copyright
+header, before any imports. Per PEP 257, this is the standard location
+for module-level documentation.
+
+.. _3-import-statements:
+
+3. Import Statements
+~~~~~~~~~~~~~~~~~~~~
+
+Imports come next. See `Import Statements <#import-statements>`__ for
+ordering conventions.
+
+.. _4-__all__-declaration-optional:
+
+4. ``__all__`` Declaration (Optional)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If present, ``__all__`` specifies symbols included in star imports.
+
+.. code:: python
+
+   __all__ = ['DeviceMemoryResource', 'DeviceMemoryResourceOptions']
+
+.. _5-type-aliases-and-constants-optional:
+
+5. Type Aliases and Constants (Optional)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Type aliases and module-level constants, if any, come next.
+
+.. code:: python
+
+   DevicePointerT = driver.CUdeviceptr | int | None
+   """Type union for device pointer representations."""
+
+   LEGACY_DEFAULT_STREAM = C_LEGACY_DEFAULT_STREAM
+
+.. _6-principal-class-or-function:
+
+6. Principal Class or Function
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If the file centers on a single class or function (e.g., ``_buffer.pyx``
+defines ``Buffer``, ``_device.pyx`` defines ``Device``), that principal
+element comes first among the definitions.
+
+.. _7-other-public-classes-and-functions:
+
+7. Other Public Classes and Functions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Other public classes and functions follow. These might include auxiliary
+classes (e.g., ``DeviceMemoryResourceOptions``), abstract base classes,
+or additional exports. Organize them logically, such as by related functionality
+or typical usage.
+
+.. _8-public-module-functions:
+
+8. Public Module Functions
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Public module-level functions come after classes.
+
+.. _9-private-and-implementation-details:
+
+9. Private and Implementation Details
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Finally, private functions and implementation details: functions
+prefixed with ``_``, ``cdef inline`` helpers, and any specialized code
+that would distract from the principal content.
+
+Example Structure
+~~~~~~~~~~~~~~~~~
+
+.. code:: python
+
+   # <SPDX copyright header>
+   """Module for buffer and memory resource management."""
+
+   from libc.stdint cimport uintptr_t
+   from cuda.core._memory._device_memory_resource cimport DeviceMemoryResource
+   import abc
+
+   __all__ = ['Buffer', 'MemoryResource', 'some_public_function']
+
+   DevicePointerT = driver.CUdeviceptr | int | None
+   """Type union for device pointer representations."""
+
+   cdef class Buffer:
+       """Principal class for this module."""
+       # ...
+
+   cdef class MemoryResource:
+       """Abstract base class."""
+       # ...
+
+   def some_public_function():
+       """Public API function."""
+       # ...
+
+   cdef inline void Buffer_close(Buffer self, stream):
+       """Private implementation helper."""
+       # ...
+
+Notes
+~~~~~
+
+- Not every file will have all sections. For example, a utility module
+  may not have a principal class.
+- The distinction between "principal" and "other" classes is based on
+  the file's primary purpose. If a file exists primarily to define one
+  class, that class is the principal class.
+- Private implementation functions should be placed at the end of the
+  file to keep the public API visible at the top.
+- **Within each section**, prefer logical ordering (e.g., by
+  functionality or typical usage). Alphabetical ordering is a reasonable
+  fallback when no clear logical structure exists.
+
+Package Layout
+--------------
+
+File Types
+~~~~~~~~~~
+
+The ``cuda/core`` package uses three types of files:
+
+1. **.pyx files**: Cython implementation files containing the actual
+   code
+2. **.pxd files**: Cython declaration files containing type
+   definitions and function signatures for C-level access
+3. **.py files**: Pure Python files for utilities and high-level
+   interfaces
+
+File Naming Conventions
+~~~~~~~~~~~~~~~~~~~~~~~
+
+- **Implementation files**: Use ``.pyx`` for Cython code, ``.py`` for
+  pure Python code
+- **Declaration files**: Use ``.pxd`` for Cython type declarations
+- **Private modules**: Prefix with underscore (e.g., ``_buffer.pyx``,
+  ``_device.pyx``)
+- **Public modules**: No underscore prefix (e.g., ``utils.py``)
+
+.. _relationship-between-pxd-and-pyx-files:
+
+Relationship Between ``.pxd`` and ``.pyx`` Files
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For each ``.pyx`` file that defines classes or functions used by other
+Cython modules, create a corresponding ``.pxd`` file:
+
+- **.pxd file**: Contains ``cdef`` class declarations,
+  ``cdef``/``cpdef`` function signatures, and ``cdef`` attribute
+  declarations
+- **.pyx file**: Contains the full implementation including Python
+  methods, docstrings, and implementation details
+
+**Example:**
+
+``_buffer.pxd``:
+
+.. code:: python
+
+   cdef class Buffer:
+       cdef:
+           uintptr_t      _ptr
+           size_t         _size
+           MemoryResource _memory_resource
+           object         _ipc_data
+
+``_buffer.pyx``:
+
+.. code:: python
+
+   cdef class Buffer:
+       """Full implementation with methods and docstrings."""
+
+       def close(self, stream=None):
+           """Implementation here."""
+           # ...
+
+Module Organization
+~~~~~~~~~~~~~~~~~~~
+
+Simple Top-Level Modules
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+For simple modules at the ``cuda/core`` level, define classes and
+functions directly in the module file with an ``__all__`` list:
+
+.. code:: python
+
+   # _device.pyx
+   __all__ = ['Device', 'DeviceProperties']
+
+   cdef class Device:
+       # ...
+
+   cdef class DeviceProperties:
+       # ...
+
+Complex Subpackages
+^^^^^^^^^^^^^^^^^^^
+
+For complex subpackages that require extra structure (like
+``_memory/``), use the following pattern:
+
+1. **Private submodules**: Each component is implemented in a private
+   submodule (e.g., ``_buffer.pyx``, ``_device_memory_resource.pyx``)
+2. **Submodule __all__**: Each submodule defines its own ``__all__``
+   list
+3. **Subpackage __init__.py**: The subpackage ``__init__.py`` uses
+   ``from ._module import *`` to assemble the package
+
+**Example structure for _memory/ subpackage:**
+
+``_memory/_buffer.pyx``:
+
+.. code:: python
+
+   __all__ = ['Buffer', 'MemoryResource']
+
+   cdef class Buffer:
+       # ...
+
+   cdef class MemoryResource:
+       # ...
+
+``_memory/_device_memory_resource.pyx``:
+
+.. code:: python
+
+   __all__ = ['DeviceMemoryResource', 'DeviceMemoryResourceOptions']
+
+   cdef class DeviceMemoryResourceOptions:
+       # ...
+
+   cdef class DeviceMemoryResource:
+       # ...
+
+``_memory/__init__.py``:
+
+.. code:: python
+
+   from ._buffer import *  # noqa: F403
+   from ._device_memory_resource import *  # noqa: F403
+   from ._graph_memory_resource import *  # noqa: F403
+   from ._ipc import *  # noqa: F403
+   from ._legacy import *  # noqa: F403
+   from ._virtual_memory_resource import *  # noqa: F403
+
+This pattern allows:
+
+- **Modular organization**: Each component lives in its own file
+- **Clear star-import behavior**: Each submodule explicitly defines what
+  it exports via ``__all__``
+- **Clean package interface**: The subpackage ``__init__.py`` assembles
+  all exports into a single namespace
+- **Easier refactoring**: Components can be moved or reorganized without
+  changing the public API
+
+**Migration guidance**: Simple top-level modules can be migrated to this
+subpackage structure when they become sufficiently complex (e.g., when a
+module grows to multiple related classes or when logical grouping would
+improve maintainability).
+
+Guidelines
+~~~~~~~~~~
+
+1. **Always create .pxd files for shared Cython types**: If a class
+   or function is ``cimport``\ ed by other modules, provide a ``.pxd``
+   declaration file.
+
+2. **Keep .pxd files minimal**: Only include declarations needed for
+   Cython compilation. Omit implementation details, docstrings, and
+   Python-only code.
+
+3. **Use __all__ when helpful**: Define ``__all__`` to control
+   exported symbols when it simplifies or clarifies the module
+   structure.
+
+4. **Use from ._module import * in subpackage __init__.py**:
+   This pattern assembles the subpackage API from its submodules. Use
+   ``# noqa: F403`` to suppress linting warnings about wildcard imports.
+
+5. **Migrate to subpackage structure when complex**: When a top-level
+   module becomes complex (multiple related classes, logical grouping
+   needed), consider refactoring to the subpackage pattern.
+
+6. **Separate concerns**: Use ``.py`` files for pure Python utilities,
+   ``.pyx`` files for Cython implementations that need C-level
+   performance.
+
+Import Statements
+-----------------
+
+Import statements must be organized into five groups, in the following
+order.
+
+**Note**: Within each group, imports must be sorted alphabetically. This
+is enforced by pre-commit linters (``ruff``).
+
+.. _1-__future__-imports:
+
+1. ``__future__`` Imports
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``__future__`` imports must come first, before all other imports.
+
+.. code:: python
+
+   from __future__ import annotations
+
+.. _2-external-cimport-statements:
+
+2. External ``cimport`` Statements
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+External Cython imports from standard libraries and third-party
+packages. This includes:
+
+- ``libc.*`` (e.g., ``libc.stdint``, ``libc.stdlib``, ``libc.string``)
+- ``cpython``
+- ``cython``
+- ``cuda.bindings`` (CUDA bindings package)
+
+.. code:: python
+
+   cimport cpython
+   from libc.stdint cimport uintptr_t
+   from libc.stdlib cimport malloc, free
+   from cuda.bindings cimport cydriver
+
+.. _3-cuda-core-cimport-statements:
+
+3. cuda-core ``cimport`` Statements
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Cython imports from within the ``cuda.core`` package.
+
+.. code:: python
+
+   from cuda.core._memory._buffer cimport Buffer, MemoryResource
+   from cuda.core._stream cimport Stream_accept, Stream
+   from cuda.core._utils.cuda_utils cimport (
+       HANDLE_RETURN,
+       check_or_create_options,
+   )
+
+.. _4-external-import-statements:
+
+4. External ``import`` Statements
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Regular Python imports from standard libraries and third-party packages.
+This includes:
+
+- Standard library modules (e.g., ``abc``, ``typing``, ``threading``,
+  ``dataclasses``)
+- Third-party packages
+
+.. code:: python
+
+   import abc
+   import threading
+   from dataclasses import dataclass
+
+.. _5-cuda-core-import-statements:
+
+5. cuda-core ``import`` Statements
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Regular Python imports from within the ``cuda.core`` package.
+
+.. code:: python
+
+   from cuda.core._context import Context, ContextOptions
+   from cuda.core._dlpack import DLDeviceType, make_py_capsule
+   from cuda.core._utils.cuda_utils import (
+       CUDAError,
+       driver,
+       handle_return,
+   )
+
+Additional Rules
+~~~~~~~~~~~~~~~~
+
+1. **Alphabetical Ordering**: Within each group, imports must be sorted
+   alphabetically by module name. This is enforced by pre-commit
+   linters.
+
+2. **Multi-line Imports**: When importing multiple items from a single
+   module, use parentheses for multi-line formatting:
+
+   .. code:: python
+
+      from cuda.core._utils.cuda_utils cimport (
+          HANDLE_RETURN,
+          check_or_create_options,
+      )
+
+3. **Type-only imports**: With ``from __future__ import annotations``,
+   types can be imported normally even if only used in annotations.
+   Avoid ``TYPE_CHECKING`` blocks (see `Type Annotations and
+   Declarations <#type-annotations-and-declarations>`__ for details).
+
+4. **Blank Lines**: Use blank lines to separate the five import groups.
+   Do not use blank lines within a group unless using multi-line import
+   formatting.
+
+5. **try/except blocks**: Import fallbacks (e.g., for optional
+   dependencies) should be placed in the appropriate group (external or
+   cuda-core) using ``try/except`` blocks.
+
+Example
+~~~~~~~
+
+.. code:: python
+
+   # <SPDX copyright header>
+
+   from __future__ import annotations
+
+   cimport cpython
+   from libc.stdint cimport uintptr_t
+   from libc.stdlib cimport malloc, free
+   from cuda.bindings cimport cydriver
+
+   from cuda.core._memory._buffer cimport Buffer, MemoryResource
+   from cuda.core._utils.cuda_utils cimport HANDLE_RETURN
+
+   import abc
+   from dataclasses import dataclass
+
+   from cuda.core._context import Context
+   from cuda.core._device import Device
+   from cuda.core._utils.cuda_utils import driver
+
+Class and Function Definitions
+------------------------------
+
+Class Definition Order
+~~~~~~~~~~~~~~~~~~~~~~
+
+Within a class definition, the suggested organization is:
+
+1. **Special (dunder) methods**: Methods with names starting and ending
+   with double underscores. By convention, ``__init__`` (or
+   ``__cinit__`` in Cython) should be first among dunder methods, as it
+   defines the class interface.
+
+2. **Methods**: Regular instance methods, class methods
+   (``@classmethod``), and static methods (``@staticmethod``)
+
+3. **Properties**: Properties defined with ``@property`` decorator
+
+**Note**: Within each section, prefer logical ordering (e.g., grouping
+related methods). Alphabetical ordering is acceptable when no clear
+logical structure exists. Developers should use their judgment.
+
+.. _example-1:
+
+Example
+~~~~~~~
+
+.. code:: python
+
+   cdef class Buffer:
+       """Example class demonstrating the ordering."""
+
+       # 1. Special (dunder) methods (__cinit__/__init__ first by convention)
+       def __cinit__(self):
+           """Cython initialization."""
+           # ...
+
+       def __init__(self, *args, **kwargs):
+           """Python initialization."""
+           # ...
+
+       def __buffer__(self, flags: int, /) -> memoryview:
+           """Buffer protocol support."""
+           # ...
+
+       def __dealloc__(self):
+           """Cleanup."""
+           # ...
+
+       def __dlpack__(self, *, stream=None):
+           """DLPack protocol support."""
+           # ...
+
+       def __reduce__(self):
+           """Pickle support."""
+           # ...
+
+       # 2. Methods
+       def close(self, stream=None):
+           """Close the buffer."""
+           # ...
+
+       def copy_from(self, src, *, stream):
+           """Copy data from source buffer."""
+           # ...
+
+       def copy_to(self, dst=None, *, stream):
+           """Copy data to destination buffer."""
+           # ...
+
+       @classmethod
+       def from_handle(cls, ptr, size, mr=None):
+           """Create buffer from handle."""
+           # ...
+
+       def get_ipc_descriptor(self):
+           """Get IPC descriptor."""
+           # ...
+
+       # 3. Properties
+       @property
+       def device_id(self) -> int:
+           """Device ID property."""
+           # ...
+
+       @property
+       def handle(self):
+           """Handle property."""
+           # ...
+
+       @property
+       def size(self) -> int:
+           """Size property."""
+           # ...
+
+Helper Functions
+~~~~~~~~~~~~~~~~
+
+When a class grows long or a method becomes deeply nested, consider
+extracting implementation details into helper functions. The goal is to
+keep class definitions easy to navigate. Readers shouldn't have to scroll
+through hundreds of lines to understand a class's interface.
+
+In Cython files, helpers are typically ``cdef`` or ``cdef inline``
+functions named with the pattern ``ClassName_methodname`` (e.g.,
+``DMR_close``, ``Buffer_close``). Place them at the end of the file or
+near their call sites, whichever aids readability.
+
+**Example:**
+
+.. code:: python
+
+   cdef class DeviceMemoryResource:
+       def close(self):
+           """Close the memory resource."""
+           DMR_close(self)
+
+   # Helper function (at end of file or nearby)
+   cdef inline DMR_close(DeviceMemoryResource self):
+       if self._handle == NULL:
+           return
+       # ... implementation ...
+
+Function Definitions
+~~~~~~~~~~~~~~~~~~~~
+
+For module-level functions (outside of classes), follow the ordering
+specified in `File Structure <#file-structure>`__: principal functions
+first (if applicable), then other public functions, then private
+functions. Within each group, prefer logical ordering; alphabetical
+ordering is a reasonable fallback.
+
+Naming Conventions
+------------------
+
+Follow PEP 8 naming conventions (CamelCase for classes, snake_case for
+functions/variables, UPPER_SNAKE_CASE for constants, leading underscore
+for private names).
+
+Cython ``cdef`` Variables
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Consider prefixing ``cdef`` variables with ``c_`` to distinguish them
+from Python variables. This improves code readability by making it clear
+which variables are C-level types.
+
+**Preferred:**
+
+.. code:: python
+
+   def copy_to(self, dst: Buffer = None, *, stream: Stream | GraphBuilder) -> Buffer:
+       stream = Stream_accept(stream)
+       cdef size_t c_src_size = self._size
+
+       if dst is None:
+           dst = self._memory_resource.allocate(c_src_size, stream)
+
+       cdef size_t c_dst_size = dst._size
+       if c_dst_size != c_src_size:
+           raise ValueError(f"buffer sizes mismatch: src={c_src_size}, dst={c_dst_size}")
+       # ...
+
+**Also acceptable (if context is clear):**
+
+.. code:: python
+
+   cdef cydriver.CUdevice get_device_from_ctx(
+           cydriver.CUcontext target_ctx, cydriver.CUcontext curr_ctx) except?cydriver.CU_DEVICE_INVALID nogil:
+       cdef bint switch_context = (curr_ctx != target_ctx)
+       cdef cydriver.CUcontext ctx
+       cdef cydriver.CUdevice target_dev
+       # ...
+
+The ``c_`` prefix is particularly helpful when mixing Python and Cython
+variables in the same scope, or when the variable name would otherwise
+be ambiguous.
+
+Type Annotations and Declarations
+---------------------------------
+
+Python Type Annotations
+~~~~~~~~~~~~~~~~~~~~~~~
+
+PEP 604 Union Syntax
+^^^^^^^^^^^^^^^^^^^^
+
+Use the modern `PEP 604 <https://peps.python.org/pep-0604/>`__ union
+syntax (``X | Y``) instead of ``typing.Union`` or ``typing.Optional``.
+
+**Preferred:**
+
+.. code:: python
+
+   def allocate(self, size_t size, stream: Stream | GraphBuilder | None = None) -> Buffer:
+       # ...
+
+   def close(self, stream: Stream | None = None):
+       # ...
+
+**Avoid:**
+
+.. code:: python
+
+   from typing import Optional, Union
+
+   def allocate(self, size_t size, stream: Optional[Union[Stream, GraphBuilder]] = None) -> Buffer:
+       # ...
+
+   def close(self, stream: Optional[Stream] = None):
+       # ...
+
+Forward References and ``from __future__ import annotations``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Where needed, files should include
+``from __future__ import annotations`` at the top (after the SPDX
+header). This enables:
+
+1. **Forward references**: Type annotations can reference types that are
+   defined later in the file or in other modules without requiring
+   ``TYPE_CHECKING`` blocks.
+
+2. **Cleaner syntax**: Annotations are evaluated as strings, avoiding
+   circular import issues.
+
+**Preferred:**
+
+.. code:: python
+
+   from __future__ import annotations
+
+   # Can reference Stream even if it's defined later or in another module
+   def allocate(self, size_t size, stream: Stream | None = None) -> Buffer:
+       # ...
+
+**Avoid:**
+
+.. code:: python
+
+   from typing import TYPE_CHECKING
+
+   if TYPE_CHECKING:
+       from cuda.core._stream import Stream
+
+   def allocate(self, size_t size, stream: Stream | None = None) -> Buffer:
+       # ...
+
+.. _guidelines-1:
+
+Guidelines
+^^^^^^^^^^
+
+1. **Use from __future__ import annotations**: This should be
+   present in all ``.py`` and ``.pyx`` files with type annotations.
+
+2. **Use | for unions**: Prefer ``X | Y | None`` over
+   ``Union[X, Y]`` or ``Optional[X]``.
+
+3. **Avoid TYPE_CHECKING blocks**: With
+   ``from __future__ import annotations``, forward references work
+   without ``TYPE_CHECKING`` guards.
+
+4. **Import types normally**: Even if a type is only used in
+   annotations, import it normally (not in a ``TYPE_CHECKING`` block).
+
+Cython Type Declarations
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Cython uses ``cdef`` declarations for C-level types. These follow
+different rules:
+
+.. code:: python
+
+   cdef class Buffer:
+       cdef:
+           uintptr_t _ptr
+           size_t _size
+           MemoryResource _memory_resource
+
+For Cython-specific type declarations, see `Cython-Specific
+Features <#cython-specific-features>`__.
+
+Docstrings
+----------
+
+This project uses the **NumPy docstring style** for all documentation.
+This format is well-suited for scientific and technical libraries and
+integrates well with Sphinx documentation generation.
+
+Format Overview
+~~~~~~~~~~~~~~~
+
+Docstrings use triple double-quotes (``"""``) and follow this general
+structure:
+
+.. code:: python
+
+   """Summary line.
+
+   Extended description (optional).
+
+   Parameters
+   ----------
+   param1 : type
+       Description of param1.
+   param2 : type, optional
+       Description of param2. Default is value.
+
+   Returns
+   -------
+   return_type
+       Description of return value.
+
+   Raises
+   ------
+   ExceptionType
+       Description of when this exception is raised.
+
+   Notes
+   -----
+   Additional notes and implementation details.
+
+   Examples
+   --------
+   >>> example_code()
+   result
+   """
+
+Module Docstrings
+~~~~~~~~~~~~~~~~~
+
+Per PEP 257, module docstrings appear at the top of the file,
+immediately after the copyright header and before any imports. They
+provide a brief overview of the module's purpose.
+
+.. code:: python
+
+   # <SPDX copyright header>
+   """Module for managing CUDA device memory resources.
+
+   This module provides classes and functions for allocating and managing
+   device memory using CUDA's stream-ordered memory pool API.
+   """
+
+   from __future__ import annotations
+   # ... imports ...
+
+For simple utility modules, a single-line docstring may suffice:
+
+.. code:: python
+
+   """Utility functions for CUDA error handling."""
+
+Class Docstrings
+~~~~~~~~~~~~~~~~
+
+Class docstrings should include:
+
+1. **Summary line**: A one-line description of the class
+2. **Extended description** (optional): Additional context about the
+   class
+3. **Parameters section**: If the class is callable (has ``__init__``),
+   document constructor parameters
+4. **Attributes section**: Document public attributes (if any)
+5. **Notes section**: Important usage notes, implementation details, or
+   examples
+6. **Examples section**: Usage examples (if helpful)
+
+**Example:**
+
+.. code:: python
+
+   cdef class DeviceMemoryResource(MemoryResource):
+       """
+       A device memory resource managing a stream-ordered memory pool.
+
+       Parameters
+       ----------
+       device_id : :class:`Device` | int
+           Device or device ordinal for which a memory resource is constructed.
+       options : :class:`DeviceMemoryResourceOptions`, optional
+           Memory resource creation options. If None, uses the driver's current
+           or default memory pool for the specified device.
+
+       Attributes
+       ----------
+       device_id : int
+           The device ID associated with this memory resource.
+       is_ipc_enabled : bool
+           Whether this memory resource supports IPC.
+
+       Notes
+       -----
+       To create an IPC-enabled memory resource, specify ``ipc_enabled=True``
+       in the options. IPC-enabled resources can share allocations between
+       processes.
+
+       Examples
+       --------
+       >>> dmr = DeviceMemoryResource(0)
+       >>> buffer = dmr.allocate(1024)
+       """
+
+For simple classes, a brief docstring may be sufficient:
+
+.. code:: python
+
+   @dataclass
+   cdef class DeviceMemoryResourceOptions:
+       """Customizable DeviceMemoryResource options.
+
+       Attributes
+       ----------
+       ipc_enabled : bool, optional
+           Whether to create an IPC-enabled memory pool. Default is False.
+       max_size : int, optional
+           Maximum pool size. Default is 0 (system-dependent).
+       """
+
+Method and Function Docstrings
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Method and function docstrings should include:
+
+1. **Summary line**: A one-line description starting with a verb (e.g.,
+   "Allocate", "Return", "Create")
+2. **Extended description** (optional): Additional details about
+   behavior
+3. **Parameters section**: All parameters with types and descriptions
+4. **Returns section**: Return type and description
+5. **Raises section**: Exceptions that may be raised (if any)
+6. **Notes section**: Important implementation details or usage notes
+   (if needed)
+7. **Examples section**: Usage examples (if helpful)
+
+**Example:**
+
+.. code:: python
+
+   def allocate(self, size_t size, stream: Stream | GraphBuilder | None = None) -> Buffer:
+       """Allocate a buffer of the requested size.
+
+       Parameters
+       ----------
+       size : int
+           The size of the buffer to allocate, in bytes.
+       stream : :class:`Stream` | :class:`GraphBuilder`, optional
+           The stream on which to perform the allocation asynchronously.
+           If None, an internal stream is used.
+
+       Returns
+       -------
+       :class:`Buffer`
+           The allocated buffer object, which is accessible on the device
+           that this memory resource was created for.
+
+       Raises
+       ------
+       TypeError
+           If called on a mapped IPC-enabled memory resource.
+       RuntimeError
+           If allocation fails.
+
+       Notes
+       -----
+       The allocated buffer is associated with this memory resource and will
+       be deallocated when the buffer is closed or when this resource is closed.
+       """
+
+For simple functions, a brief docstring may suffice:
+
+.. code:: python
+
+   def get_ipc_descriptor(self) -> IPCBufferDescriptor:
+       """Export a :class:`Buffer` for sharing between processes."""
+
+Property Docstrings
+~~~~~~~~~~~~~~~~~~~
+
+Property docstrings should be concise and focus on what the property
+represents. For read-write properties, document both getter and setter
+behavior.
+
+**Read-only property:**
+
+.. code:: python
+
+   @property
+   def device_id(self) -> int:
+       """Return the device ordinal of this buffer."""
+
+**Read-write property:**
+
+.. code:: python
+
+   @property
+   def peer_accessible_by(self):
+       """
+       Get or set the devices that can access allocations from this memory pool.
+
+       Returns
+       -------
+       tuple of int
+           A tuple of sorted device IDs that currently have peer access to
+           allocations from this memory pool.
+
+       Notes
+       -----
+       When setting, accepts a sequence of :class:`Device` objects or device IDs.
+       Setting to an empty sequence revokes all peer access.
+
+       Examples
+       --------
+       >>> dmr.peer_accessible_by = [1]  # Grant access to device 1
+       >>> assert dmr.peer_accessible_by == (1,)
+       """
+
+Type References in Docstrings
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Use Sphinx cross-reference roles to link to other documented objects.
+Use the most specific role for each type:
+
++-------------+---------------------------+-------------------------------------------+
+| Role        | Use for                   | Example                                   |
++=============+===========================+===========================================+
+| ``:class:`` | Classes                   | :literal:`:class:\`Buffer\``              |
++-------------+---------------------------+-------------------------------------------+
+| ``:func:``  | Functions                 | :literal:`:func:\`launch\``               |
++-------------+---------------------------+-------------------------------------------+
+| ``:meth:``  | Methods                   | :literal:`:meth:\`Device.create_stream\`` |
++-------------+---------------------------+-------------------------------------------+
+| ``:attr:``  | Attributes                | :literal:`:attr:\`device_id\``            |
++-------------+---------------------------+-------------------------------------------+
+| ``:mod:``   | Modules                   | :literal:`:mod:\`multiprocessing\``       |
++-------------+---------------------------+-------------------------------------------+
+| ``:obj:``   | Type aliases, other       | :literal:`:obj:\`DevicePointerT\``        |
+|             | objects                   |                                           |
++-------------+---------------------------+-------------------------------------------+
+
+The ``~`` prefix displays only the final component:
+:literal:`:class:\`~cuda.core.Buffer\`` renders as "Buffer" while still
+linking to the full path.
+
+For more details, see the `Sphinx Python domain
+documentation <https://www.sphinx-doc.org/en/master/usage/domains/python.html#cross-referencing-python-objects>`__.
+
+**Example:**
+
+.. code:: python
+
+   def from_handle(
+       ptr: DevicePointerT, size_t size, mr: MemoryResource | None = None
+   ) -> Buffer:
+       """Create a new :class:`Buffer` from a pointer.
+
+       Parameters
+       ----------
+       ptr : :obj:`DevicePointerT`
+           Allocated buffer handle object.
+       size : int
+           Memory size of the buffer.
+       mr : :class:`MemoryResource`, optional
+           Memory resource associated with the buffer.
+       """
+
+.. _guidelines-2:
+
+Guidelines
+~~~~~~~~~~
+
+1. **Always include docstrings**: All public classes, methods,
+   functions, and properties should have docstrings.
+
+2. **Start with a verb**: Summary lines for methods and functions should
+   start with a verb in imperative mood (e.g., "Allocate", "Return",
+   "Create", not "Allocates", "Returns", "Creates").
+
+3. **Be concise but complete**: Provide enough information for users to
+   understand and use the API, but avoid unnecessary verbosity.
+
+4. **Use proper sections**: Include Parameters, Returns, Raises sections
+   when applicable. Use Notes and Examples sections when they add value.
+
+5. **Document optional parameters**: Clearly indicate optional
+   parameters and their default values.
+
+6. **Use type hints**: Type information in docstrings should complement
+   (not duplicate) type annotations. Use docstrings to provide
+   additional context about types.
+
+7. **Cross-reference related APIs**: Use Sphinx cross-references to link
+   to related classes, methods, and attributes.
+
+8. **Keep private methods brief**: Private methods (starting with ``_``)
+   may have minimal docstrings, but should still document non-obvious
+   behavior.
+
+9. **Update docstrings with code changes**: Keep docstrings synchronized
+   with implementation changes.
+
+Errors and Warnings
+-------------------
+
+CUDA Exceptions
+~~~~~~~~~~~~~~~
+
+The project defines custom exceptions for CUDA-specific errors:
+
+- **CUDAError**: Base exception for CUDA driver errors
+- **NVRTCError**: Exception for NVRTC compiler errors (inherits from
+  ``CUDAError``)
+
+Use these instead of generic exceptions when reporting CUDA failures.
+
+CUDA API Error Handling
+~~~~~~~~~~~~~~~~~~~~~~~
+
+In ``nogil`` contexts, use the ``HANDLE_RETURN`` macro:
+
+.. code:: python
+
+   with nogil:
+       HANDLE_RETURN(cydriver.cuMemAlloc(ptr, size))
+
+At the Python level, use ``handle_return()`` or
+``raise_if_driver_error()``:
+
+.. code:: python
+
+   err, = driver.cuMemcpyAsync(dst._ptr, self._ptr, src_size, stream.handle)
+   handle_return((err,))
+
+Warnings
+~~~~~~~~
+
+When emitting warnings, always specify ``stacklevel`` so the warning
+points to the caller:
+
+.. code:: python
+
+   warnings.warn(message, UserWarning, stacklevel=3)
+
+The value depends on call depth: typically ``stacklevel=2`` for direct
+calls, ``stacklevel=3`` when called through a helper.
+
+CUDA-Specific Patterns
+----------------------
+
+GIL Management for CUDA Driver API Calls
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For optimized Cython code, release the GIL when calling CUDA driver
+APIs. This improves performance and allows other Python threads to run
+during CUDA operations.
+
+During initial development, it's fine to use the Python ``driver``
+module without releasing the GIL (see `Development
+Lifecycle <#development-lifecycle>`__). GIL release is a performance
+optimization that can be applied once the implementation is correct.
+
+Using ``with nogil:`` Blocks
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Wrap ``cydriver`` calls in ``with nogil:`` blocks (or declare entire
+functions as ``nogil``):
+
+.. code:: python
+
+   cdef int value
+   with nogil:
+       HANDLE_RETURN(cydriver.cuDeviceGetAttribute(&value, attr, device_id))
+
+Group multiple driver calls in a single block:
+
+.. code:: python
+
+   cdef int low, high, value
+   with nogil:
+       HANDLE_RETURN(cydriver.cuCtxGetStreamPriorityRange(&low, &high))
+       HANDLE_RETURN(cydriver.cuDeviceGetAttribute(&value, attr, device_id))
+
+Raising Exceptions from ``nogil`` Context
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To raise exceptions from a ``nogil`` context, acquire the GIL first:
+
+.. code:: python
+
+   with gil:
+       raise CUDAError(f"CUDA operation failed: {error}")
+
+Development Lifecycle
+---------------------
+
+Two-Phase Development
+~~~~~~~~~~~~~~~~~~~~~
+
+A common pattern when implementing CUDA functionality is to develop in
+two phases:
+
+1. **Start with Python**: Use the ``driver`` module for a
+   straightforward implementation. Write tests to verify correctness.
+   This allows faster iteration and easier debugging.
+
+2. **Optimize with Cython**: Once the implementation is correct, switch
+   to ``cydriver`` with ``nogil`` blocks and ``HANDLE_RETURN`` for
+   better performance.
+
+This approach separates correctness from optimization. Getting the logic
+right first, with Python's better error messages and stack traces, often
+saves time overall.
+
+Python Implementation
+~~~~~~~~~~~~~~~~~~~~~
+
+Use the ``driver`` module from ``cuda.core._utils.cuda_utils``:
+
+.. code:: python
+
+   from cuda.core._utils.cuda_utils import driver
+   from cuda.core._utils.cuda_utils cimport (
+       _check_driver_error as raise_if_driver_error,
+   )
+
+   def get_attribute(self, attr: int) -> int:
+       err, value = driver.cuDeviceGetAttribute(attr, self._id)
+       raise_if_driver_error(err)
+       return value
+
+Cython Optimization
+~~~~~~~~~~~~~~~~~~~
+
+When ready to optimize, switch to ``cydriver``:
+
+.. code:: python
+
+   from cuda.bindings cimport cydriver
+   from cuda.core._utils.cuda_utils cimport HANDLE_RETURN
+
+   def get_attribute(self, attr: int) -> int:
+       cdef int value
+       with nogil:
+           HANDLE_RETURN(cydriver.cuDeviceGetAttribute(&value, attr, self._id))
+       return value
+
+Key changes:
+
+- Replace ``driver`` with ``cydriver``
+- Wrap calls in ``with nogil:``
+- Use ``HANDLE_RETURN`` instead of ``raise_if_driver_error``
+
+Run tests after optimization to verify behavior is unchanged.
diff --git a/cuda_core/docs/source/index.rst b/cuda_core/docs/source/index.rst
index b6907de160..cb93225f2c 100644
--- a/cuda_core/docs/source/index.rst
+++ b/cuda_core/docs/source/index.rst
@@ -19,6 +19,7 @@ Welcome to the documentation for ``cuda.core``.
 .. toctree::
    :maxdepth: 1
 
+   developer-guide
    conduct
    license