
[Perf] Enable zero-copy in to_torch() and to_numpy() #450

Merged
hughperkins merged 40 commits into main from hp/zerocopy-interop
Apr 30, 2026

Conversation

@hughperkins
Collaborator

Issue: #

Brief Summary

copilot:summary

Walkthrough

copilot:walkthrough

@hughperkins hughperkins force-pushed the hp/zerocopy-interop branch 2 times, most recently from a7cd921 to c0b3bdc Compare April 5, 2026 17:15
Comment thread python/quadrants/lang/_interop.py Outdated

Provides cached, zero-copy conversion of Fields and Ndarrays to PyTorch tensors using the DLPack protocol.
The returned tensors directly alias the Quadrants device memory -- modifications to either side are visible
in the other (after ``qd.sync()`` on Apple Metal).
Contributor

"(after `qd.sync()` on Apple Metal)" is inaccurate: `torch.mps.synchronize()` is also required in case of a copy.

Comment thread python/quadrants/lang/_interop.py Outdated
)


@functools.lru_cache(maxsize=1)
Contributor

LRU cache is slow. It should be profiled, but I would not be surprised if systematic recomputation is faster. I would recommend defining a global constant instead of a function.

Comment thread python/quadrants/lang/_interop.py Outdated

Args:
is_field: ``True`` for SNode-backed Fields, ``False`` for Ndarrays.
dtype: The Quadrants dtype (e.g. ``qd.f32``). Types not in the C++ DLPack whitelist (f16, u8, …) return False.
Contributor

Avoid non-ascii char '…'.

Comment thread python/quadrants/lang/_interop.py Outdated
Comment on lines +34 to +35
parts = torch.__version__.replace("+", ".").split(".")[:3]
return tuple(map(int, parts)) > (2, 9, 1)
Contributor

If you are already relying on the `packaging` package somewhere else, using it here would be more robust. If not, just keep this.

Collaborator Author

@hughperkins hughperkins Apr 27, 2026

Opus here.

packaging isn't currently a runtime dependency of Quadrants — it only appears in three CI workflow YAMLs (win.yml, manylinux_wheel.yml, macosx.yml), not in pyproject.toml, setup.py, or requirements*.txt. Per the second branch of your suggestion ("if not, just keep this"), I'll leave the ad-hoc parser in place. It handles +cpu/+cu121 local-version suffixes via replace("+", "."), and falls through to False via except ValueError on dev/rc/nightly tags (e.g. 2.10.0a0) — which is the safe default since it just gates off the Metal-DLPack-bytes-offset optimisation rather than crashing.

Comment thread python/quadrants/lang/_interop.py Outdated
Comment on lines +66 to +91
def dlpack_to_torch(obj):
"""Return a cached zero-copy ``torch.Tensor`` view of a Field or Ndarray via DLPack.

On the first call the DLPack capsule is created and wrapped into a ``torch.Tensor`` which is then stored on the
source object as ``_qd_dlpack_tc``. Subsequent calls return the cached tensor directly (O(1)).

On Apple Metal an explicit ``sync()`` is performed after the initial export to ensure data visibility.
"""
try:
return obj._qd_dlpack_tc
except AttributeError:
import torch # pylint: disable=C0415

tc = torch.from_dlpack(obj.to_dlpack())
obj._qd_dlpack_tc = tc
if impl.current_cfg().arch == _ARCH_METAL:
impl.get_runtime().sync()
return tc


def invalidate_zerocopy_cache(obj) -> None:
"""Remove the cached DLPack torch tensor, if any."""
try:
del obj._qd_dlpack_tc
except AttributeError:
pass
Contributor

This caching mechanism is missing a transpose variant. This is important for efficiency.

Contributor

@duburcqa duburcqa Apr 6, 2026

This caching mechanism is missing a NumPy view. This is important for efficiency.
NumPy >= 1.22 (Genesis requires >= 1.26) supports np.from_dlpack() natively. Using it would skip importing torch and the torch-tensor intermediary for numpy-only workloads. Rather than caching a .numpy() derived from a torch tensor, an np.from_dlpack() view could be cached independently. It also avoids the torch import when the user never uses torch at all.

Comment thread python/quadrants/lang/_ndarray.py Outdated
Comment on lines +78 to +81
from quadrants.lang._interop import ( # pylint: disable=C0415
can_zerocopy,
dlpack_to_torch,
)
Contributor

I don't see any good reason to keep this import local. No obvious circular deps or direct dependency on torch. Maybe I'm missing something.

Comment thread python/quadrants/lang/_ndarray.py Outdated
Uses zero-copy when possible (all backends except Vulkan, and when the dtype is supported by DLPack).

Args:
copy: ``None``/``False`` return a zero-copy view, ``True`` returns an independent copy.
Contributor

That is not what None does.

Comment thread python/quadrants/lang/_ndarray.py Outdated
if copy is False:
raise ValueError("Zero-copy not available for this backend/dtype combination")

import torch # pylint: disable=C0415
Contributor

Personally, I prefer to move local imports to the top of their local scope.

Collaborator Author

Complicated change. Won't do.

Collaborator Author

(added FIXME)

Comment thread python/quadrants/lang/_ndarray.py Outdated
try:
tc = dlpack_to_torch(self)
if tc.device.type == "cpu":
return tc.numpy()
Contributor

This is very inefficient. Not acceptable.

Comment thread python/quadrants/lang/_ndarray.py Outdated
if tc.device.type == "cpu":
return tc.numpy()
except (ImportError, RuntimeError):
pass
Contributor

I'm not in favour of using try-except if the condition can be checked. What about storing a flag directly at instance level to keep track of whether zero-copy is supported by this tensor?

Comment thread python/quadrants/lang/field.py Outdated
if copy is False:
raise ValueError("Zero-copy to numpy requires a CPU backend")
elif copy is False:
raise ValueError("Zero-copy not available for this backend/type combination")
Contributor

It would be nice to be more specific.

Comment thread python/quadrants/lang/matrix.py Outdated
Comment on lines +1376 to +1378
as_vector = self.m == 1 and not keep_dims
shape_ext = (self.n,) if as_vector else (self.n, self.m)
expected = self.shape + shape_ext
Contributor

Duplicated logic.

Comment thread python/quadrants/lang/matrix.py Outdated
Comment on lines +1328 to +1330
as_vector = self.m == 1 and not keep_dims
shape_ext = (self.n,) if as_vector else (self.n, self.m)
expected = self.shape + shape_ext
Contributor

Duplicated logic.

Comment thread python/quadrants/lang/matrix.py Outdated
dlpack_to_torch,
)

if copy is not True and can_zerocopy(is_field=True, dtype=self.dtype, shape=self.shape):
Contributor

@duburcqa duburcqa Apr 6, 2026

This condition is wrong. You should prefer zero-copy even if a copy is requested. This is orthogonal. Zero-copy is always faster even if you need to copy in the end.

Contributor

You should use zero-copy systematically, and call torch.Tensor.numpy on the CPU backend.
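The "zero-copy first, clone last" rule the reviewer is asking for can be sketched with a NumPy stand-in (the `view()` / `copy()` calls stand in for the DLPack export and the kernel-copy fallback; all names hypothetical):

```python
import numpy as np


def to_numpy(src: np.ndarray, can_zerocopy: bool, copy=None):
    # take the zero-copy export first regardless of `copy`;
    # clone at the end only if an independent buffer was requested
    if can_zerocopy:
        view = src.view()                       # stand-in for the DLPack view
        return view.copy() if copy is True else view
    return src.copy()                           # stand-in for kernel-copy fallback


buf = np.arange(4, dtype=np.float32)
```

Note the orthogonality: `copy=True` still goes through the cheap export and only pays one clone, instead of routing through the slower fallback path.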

Comment thread python/quadrants/lang/_ndarray.py Outdated

if copy is not True and can_zerocopy(is_field=False, dtype=self.dtype):
tc = dlpack_to_torch(self)
return tc.clone() if copy is True else tc
Contributor

`copy is True` is a dead branch here, but it should not be. You should just not guard with `copy is not True`.

Comment thread python/quadrants/lang/_ndarray.py Outdated
dlpack_to_torch,
)

if copy is not True and can_zerocopy(is_field=False, dtype=self.dtype):
Contributor

can_zerocopy(is_field=False, dtype=self.dtype) should be stored as a constant attribute of the instance for efficiency.

Comment thread python/quadrants/lang/_interop.py Outdated
return tc


def invalidate_zerocopy_cache(obj) -> None:
Contributor

What about cache invalidation? ScalarField and MatrixField have no corresponding _reset hook.
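One way to close this gap is a runtime-level registry that invalidates every cached view before native teardown. A minimal sketch, with entirely hypothetical names (the real code would hang this off `qd.reset()`):

```python
import weakref


class _Runtime:
    """Hypothetical runtime holding a registry of cache owners."""

    def __init__(self):
        # weak references: registering must not extend field lifetimes
        self.cache_holders = weakref.WeakSet()

    def reset(self):
        # drop every cached DLPack view BEFORE native memory is torn down,
        # so no stale view can alias freed device memory
        for holder in list(self.cache_holders):
            holder._invalidate_zerocopy_cache()


class _Field:
    """Hypothetical field that registers itself on construction."""

    def __init__(self, runtime):
        self._cached_view = object()  # stands in for the DLPack-backed tensor
        runtime.cache_holders.add(self)

    def _invalidate_zerocopy_cache(self):
        self._cached_view = None


rt = _Runtime()
f = _Field(rt)
rt.reset()
```

With this shape, every Field/Ndarray subclass gets invalidation for free by registering at construction, instead of each class needing its own _reset hook.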

Comment thread python/quadrants/lang/struct.py Outdated
Dict[str, Union[numpy.ndarray, Dict]]: The result NumPy array.
"""
return {k: v.to_numpy() for k, v in self._items}
# Struct children have interleaved SNode memory; DLPack cannot represent this, so always use kernel copy.
Contributor

Really?

Contributor

Struct members are separate SNode leaves, each with contiguous storage; individual members could be zero-copied via DLPack, from what I understand.

Comment thread python/quadrants/lang/_ndarray.py Outdated
Comment on lines +84 to +85
tc = dlpack_to_torch(self)
return tc.clone() if copy is True else tc
Contributor

You are missing call to torch.mps.synchronize() before clone on metal backend.

Comment thread python/quadrants/lang/field.py Outdated
raise ValueError(
f"copy=False is incompatible with device transfer (data on {tc.device}, requested {device})"
)
return tc.to(device)
Contributor

Missing call to torch.mps.synchronize() on metal backend.

@hughperkins
Collaborator Author

@duburcqa here is a conversation with Opus about this. Thoughts?

User: this code was migrated from Genesis, by an AI. Can you compare this code with what is currently in Genesis, what is currently in quadrants main, and figure out why the code isn't simply automatically correct, from being ported from Genesis?

Assistant:

Why it isn't automatically correct

Three structural reasons, in order of importance:

1. The original lives in the consumer, not the library. qd_to_python is in Genesis. Genesis owns the entire call path: it controls use_zerocopy, gs.backend, and it owns when qd.reset() is called relative to cache lifetime. When you push the same logic down into Quadrants' own classes, you inherit a different threat model: arbitrary user code can call qd.reset() between the cache fill and the next read, so cached views become use-after-free. The PR's commit history shows the AI hitting exactly this — segfaults on AMD GPU and Metal — and patching around it by disabling the numpy cache and changing the default of to_numpy mid-PR. The reviewer's comment #18 (no _reset hook on ScalarField/MatrixField) is the still-unfixed version of the same problem.

2. The AI re-decided design choices instead of transcribing them. Genesis's design has clear invariants that are easy to miss when refactoring: "always try zerocopy first, then clone if requested" (not "guard zerocopy behind copy preference"); "sync MPS twice — once before reading, once after cloning"; "cache transpose and numpy variants too". The AI re-derived the algorithm from intent and got the boundaries wrong in if copy is not True and can_zerocopy(...), dropped the second MPS sync, and shrunk the cache to a single attribute. These map directly to comments #15/#16/#21, #1/#22/#23, and #5/#6.

3. The Quadrants API is a different shape than the Genesis wrapper. Genesis exposes one entry point, so it gets one cache, one set of conditions, one sync sequence. Quadrants has four to six entry points (Ndarray.to_torch, Ndarray._ndarray_to_numpy, Ndarray._ndarray_matrix_to_numpy, ScalarField.to_torch/to_numpy, MatrixField.to_torch/to_numpy, StructField.to_torch/to_numpy). Each got a copy-pasted version of the same boilerplate, drifting slightly between sites — comments #13 and #14 are just "you have the same logic twice in matrix.py", and the situation is the same across files. The Genesis design can't be ported one-method-at-a-time without first factoring a single helper that all the methods call.

A faithful port would have:

  • Put the whole Genesis algorithm into _interop.py as one helper (qd_to_python-equivalent), kept the four cache attributes (_tc/_T_tc/_np/_T_np), replaced gs.use_zerocopy/gs.backend with can_zerocopy(...) evaluated inside that helper, and made every to_torch/to_numpy a one-liner that calls it.
  • Added _reset hooks to every Field/Ndarray subclass before exposing the cache, since Quadrants exposes qd.reset() to users that Genesis never had to model.
  • Carried over both qd.sync() and torch.mps.synchronize() verbatim.
  • Kept the Genesis decision rule literally: "if zerocopy supported, take that path; clone iff copy=True."

The current PR is roughly an interpretation of Genesis rather than a port, and most of duburcqa's review is asking for the original Genesis behaviour back.
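A minimal sketch of the central-helper shape the analysis describes: one cache object with the four slots, one entry point that every `to_torch`/`to_numpy` would delegate to. All names are hypothetical stand-ins (`_ViewCache`, `qd_to_python_sketch`, the factory callback), not the real Genesis or Quadrants API.

```python
class _ViewCache:
    # the four cache slots mentioned above: torch, transposed torch,
    # numpy, transposed numpy
    __slots__ = ("_tc", "_T_tc", "_np", "_T_np")

    def __init__(self):
        self._tc = self._T_tc = self._np = self._T_np = None


def qd_to_python_sketch(cache, make_view, *, kind="torch",
                        transpose=False, copy=False):
    """Single entry point every to_torch/to_numpy would be a one-liner over."""
    slot = {"torch": "_tc", "numpy": "_np"}[kind]
    if transpose:
        slot = "_T" + slot          # "_tc" -> "_T_tc", "_np" -> "_T_np"
    view = getattr(cache, slot)
    if view is None:                # fill the slot lazily, once
        view = make_view(kind, transpose)
        setattr(cache, slot, view)
    # zerocopy-then-clone: the view is always taken; copy only at the end
    return ("clone", view) if copy else ("view", view)


cache = _ViewCache()
calls = []


def make_view(kind, transpose):
    calls.append((kind, transpose))  # count factory invocations
    return (kind, transpose)


qd_to_python_sketch(cache, make_view)
qd_to_python_sketch(cache, make_view)  # second call hits the cached slot
```

The sync and invalidation logic would also live inside this one helper, which is what keeps the per-class entry points from drifting apart.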

User: it sounds like you are recommending we do the actions under 'a faithful port', is that a fair impression?

Assistant:

Yes, that's a fair impression — but with one caveat I should make explicit.

I was describing what a faithful port would have looked like, and that aligns closely with what duburcqa is asking for in the review. So if you accept the reviewer's framing, then yes, "a faithful port" is roughly the recommended path.

The caveat: there's one place where Genesis itself may not be the right target. The use-after-free risk on qd.reset() is real and not present in Genesis (because Genesis owns the reset call). Inside Quadrants, you can't just copy Genesis verbatim — you also need invalidation hooks on every cache holder. So it's "faithful port + reset hooks", not pure faithful port.

Two other things worth flagging before you commit to that direction:

  1. Genesis's design isn't universally praised in the review either. The reviewer's comment #11 ("don't use try/except, use a flag") is actually a critique of a pattern Genesis uses too. So a fully faithful port would still draw that comment. Minor, but worth knowing.

  2. The to_numpy lifetime question is a genuine design call, not just a port mistake. Your fix commit on 2026-04-20 ("Make to_numpy() always copy unless copy=False is explicitly requested") was reacting to a real segfault. Genesis defaults to zero-copy because Genesis users know about the model. Quadrants has a wider audience and numpy arrays are conventionally expected to outlive their source. Reverting to "default zero-copy for to_numpy" to match Genesis would re-introduce that hazard unless you also fix invalidation. It's worth deciding deliberately rather than just matching Genesis.

So my honest recommendation is: yes, do the faithful-port refactor as the structural backbone (one central helper, full cache shape, both syncs, correct copy=True ⇒ zerocopy-then-clone), AND add reset hooks on all field types, AND have an explicit conversation with duburcqa about the to_numpy default before changing it back.

@github-actions

Coverage Report (87d7efdfe)

| Metric | Value |
| --- | --- |
| Diff coverage (changed lines only) | 78% |
| Overall project coverage | 20% |

Total: 572 lines, 128 missing, 78% covered

🔴 python/quadrants/lang/_interop.py (47%)
      1  """Zero-copy tensor interop between Quadrants and PyTorch / NumPy via DLPack.
      2  
      3  Provides cached, zero-copy conversion of Fields and Ndarrays to ``torch.Tensor`` and ``numpy.ndarray`` using DLPack.
      4  Returned tensors / arrays alias Quadrants device memory directly -- modifications on either side are visible on the
      5  other (after the appropriate sync; see :class:`_ZerocopyCache`).
      6  
      7  Public API used by :mod:`quadrants.lang._ndarray`, :mod:`quadrants.lang.field`, :mod:`quadrants.lang.matrix`, and
      8  :mod:`quadrants.lang.struct`:
      9  
     10  * :func:`can_zerocopy` -- predicate (call once, cache result on the source instance).
     11  * :class:`_ZerocopyCache` -- per-instance container for the torch + numpy DLPack views.
     12  * :func:`make_zerocopy_cache_if_supported` -- constructor that registers the owner with the runtime so the cache is
     13    invalidated on ``qd.reset()`` / ``qd.init()`` BEFORE C++ teardown.
     14  * :func:`get_zerocopy_torch`, :func:`get_zerocopy_numpy` -- thin entry points used by every per-class ``to_torch`` /
     15    ``to_numpy``. Implement the always-zerocopy-then-clone semantic and the Apple Metal double-sync (``qd.sync()`` on
     16    read, ``torch.mps.synchronize()`` after ``.clone()`` / ``.to()``).
     17  """
     18  
🔴   19  from __future__ import annotations
     20  
🔴   21  from typing import TYPE_CHECKING
     22  
🔴   23  import numpy as np
     24  
🔴   25  from quadrants._lib import core as _qd_core
🔴   26  from quadrants.lang import impl
🔴   27  from quadrants.types import primitive_types
     28  
     29  if TYPE_CHECKING:
     30      import torch as _torch_mod  # for type hints only
     31  
     32  # Optional torch import: numpy zero-copy works without torch (np.from_dlpack), so importing this module must not fail
     33  # on torch-less environments (e.g. the Vulkan CI runner).
🔴   34  try:
🔴   35      import torch as _torch
🔴   36      from torch.utils.dlpack import from_dlpack as _torch_from_dlpack
     37  
🔴   38      _HAS_TORCH = True
🔴   39  except ImportError:
🔴   40      _torch = None  # type: ignore[assignment]
🔴   41      _torch_from_dlpack = None  # type: ignore[assignment]
🔴   42      _HAS_TORCH = False
     43  
     44  
🔴   45  _ARCH_METAL = _qd_core.Arch.metal
🔴   46  _ARCH_VULKAN = _qd_core.Arch.vulkan
🔴   47  _ARCH_CPU = frozenset({_qd_core.Arch.x64, _qd_core.Arch.arm64})
     48  
🔴   49  _DLPACK_SUPPORTED_DTYPES = frozenset(
     50      {primitive_types.i32, primitive_types.i64, primitive_types.f32, primitive_types.f64, primitive_types.u1}
     51  )
     52  
     53  
🔴   54  def _compute_torch_mps_supports_dlpack_bytes_offset() -> bool:
     55      """Required for zero-copy export of Fields on Apple Metal.
     56  
     57      Available since torch > 2.9.1 (see pytorch/pytorch#168193).
     58      """
🔴   59      if not _HAS_TORCH:
🔴   60          return False
🔴   61      parts = _torch.__version__.replace("+", ".").split(".")[:3]
🔴   62      try:
🔴   63          return tuple(map(int, parts)) > (2, 9, 1)
🔴   64      except ValueError:
🔴   65          return False
     66  
     67  
     68  # Evaluate once at import. Reviewer feedback on PR #450: lru_cache(maxsize=1) on a zero-arg helper adds per-call
     69  # overhead with no benefit; a module constant is cheaper and clearer.
🔴   70  _TORCH_MPS_SUPPORTS_DLPACK_BYTES_OFFSET = _compute_torch_mps_supports_dlpack_bytes_offset()
     71  
     72  
🔴   73  def current_arch_is_cpu() -> bool:
     74      """``True`` when the active Quadrants backend is a CPU backend (x64 or arm64)."""
🟢   75      return impl.current_cfg().arch in _ARCH_CPU
     76  
     77  
🔴   78  def can_zerocopy(
     79      is_field: bool,
     80      dtype=None,
     81      is_scalar_field: bool = False,
     82      shape: tuple[int, ...] = (),
     83      is_aos_struct_member: bool = False,
     84  ) -> bool:
     85      """Check whether zero-copy DLPack export is available for the current backend and data type.
     86  
     87      This is intended to be called **once** at the source instance's construction (or first access) and cached on the
     88      instance, not re-evaluated on every ``to_torch`` / ``to_numpy`` call.
     89  
     90      Args:
     91          is_field: ``True`` for SNode-backed Fields, ``False`` for Ndarrays.
     92          dtype: The Quadrants dtype (e.g. ``qd.f32``). Types not in the C++ DLPack whitelist (f16, u8, ...) return False.
     93          is_scalar_field: ``True`` when the source is a ``ScalarField`` (0-dim DLPack edge-case).
     94          shape: Batch shape of the field/ndarray.
     95          is_aos_struct_member: ``True`` when the field is a member of a multi-member ``StructField``. Quadrants' C++
     96              ``field_to_dlpack`` does not currently emit cell-stride-aware DLPack views for AOS struct members (it
     97              computes contiguous strides at the member dtype size, but the actual stride between consecutive elements of
     98              the same member is ``sizeof(cell)``), so the zerocopy view would interleave neighboring members' bytes.
     99              Force kernel-copy until the C++ export is fixed.
    100  
    101      Returns:
    102          ``True`` if zero-copy via DLPack is supported.
    103      """
🟢  104      if dtype is not None and dtype not in _DLPACK_SUPPORTED_DTYPES:
🟢  105          return False
🟢  106      arch = impl.current_cfg().arch
🟢  107      if arch == _ARCH_VULKAN:
🔴  108          return False
🟢  109      if is_field:
🟢  110          if arch == _ARCH_METAL and not _TORCH_MPS_SUPPORTS_DLPACK_BYTES_OFFSET:
🔴  111              return False
    112          # 0-dim ScalarFields lack DLPack bytes_offset support in current PyTorch.
🟢  113          if is_scalar_field and not shape:
🔴  114              return False
🟢  115          if is_aos_struct_member:
🟢  116              return False
🟢  117      return True
    118  
    119  
🔴  120  class _DLPackV1Adapter:
    121      """Wraps a DLPack v0 PyCapsule into a v1-compatible object.
    122  
    123      Quadrants' C++ ``field_to_dlpack`` / ``ndarray_to_dlpack`` return raw PyCapsules (the v0 DLPack protocol). Modern
    124      NumPy (>= 1.23 with strict checks; mandatory in NumPy 2.x) requires the v1 protocol -- an object exposing
    125      ``__dlpack__`` and ``__dlpack_device__``. This adapter is a thin Python-side bridge so we don't have to touch the
    126      C++ DLPack export.
    127  
    128      CPU-only by construction; numpy zero-copy is gated on ``current_arch_is_cpu()`` upstream.
    129      """
    130  
🔴  131      __slots__ = ("_capsule",)
🔴  132      _KDLCPU = 1  # DLDeviceType::kDLCPU
    133  
🔴  134      def __init__(self, capsule):
🟢  135          self._capsule = capsule
    136  
🔴  137      def __dlpack__(self, stream=None):
🟢  138          return self._capsule
    139  
🔴  140      def __dlpack_device__(self):
🔴  141          return (self._KDLCPU, 0)
    142  
    143  
🔴  144  class _ZerocopyCache:
    145      """Per-instance cache of DLPack-backed views into a Quadrants Field / Ndarray.
    146  
    147      Holds two independent slots:
    148  
    149      * ``_tc``: ``torch.Tensor`` filled via ``torch.utils.dlpack.from_dlpack`` on first access.
    150      * ``_np``: ``numpy.ndarray`` filled via ``numpy.from_dlpack`` on first access (CPU only). Independent of torch:
    151        numpy-only workloads never trigger a torch import.
    152  
    153      Each slot is filled lazily and may be ``None`` if not yet requested.
    154  
    155      Lifetime: the cache is invalidated by :func:`PyQuadrants.reset` (via ``cache_holders``) BEFORE the C++ program is
    156      torn down, so the DLPack deleters run while the underlying memory is still valid.
    157      """
    158  
🔴  159      __slots__ = ("_tc", "_np")
    160  
🔴  161      def __init__(self):
🟢  162          self._tc = None
🟢  163          self._np = None
    164  
🔴  165      def _ensure_torch(self, owner) -> "_torch_mod.Tensor":
🟢  166          if not _HAS_TORCH:
🔴  167              raise RuntimeError("torch is not installed; zero-copy to_torch is unavailable")
🟢  168          if self._tc is None:
🟢  169              self._tc = _torch_from_dlpack(owner.to_dlpack())
🟢  170          return self._tc
    171  
🔴  172      def _ensure_numpy(self, owner) -> np.ndarray:
🟢  173          if self._np is None:
🟢  174              self._np = np.from_dlpack(_DLPackV1Adapter(owner.to_dlpack()))
🟢  175          return self._np
    176  
🔴  177      def invalidate(self) -> None:
    178          """Drop both cached views. Idempotent."""
🟢  179          self._tc = None
🟢  180          self._np = None
    181  
    182  
🔴  183  def make_zerocopy_cache_if_supported(
    184      owner,
    185      *,
    186      is_field: bool,
    187      dtype,
    188      is_scalar_field: bool = False,
    189      shape: tuple[int, ...] = (),
    190      is_aos_struct_member: bool = False,
    191  ) -> _ZerocopyCache | None:
    192      """Construct a ``_ZerocopyCache`` for ``owner`` if zero-copy is supported, else return ``None``.
    193  
    194      Also registers ``owner`` with ``pyquadrants.cache_holders`` so its cache is invalidated on ``qd.reset()`` /
    195      ``qd.init()`` BEFORE C++ teardown. The owner must define a method ``_invalidate_zerocopy_cache(self)`` that calls
    196      :meth:`_ZerocopyCache.invalidate`.
    197  
    198      Intended to be called at instance construction (or once via ``cached_property``).
    199      """
🟢  200      if not can_zerocopy(
    201          is_field=is_field,
    202          dtype=dtype,
    203          is_scalar_field=is_scalar_field,
    204          shape=shape,
    205          is_aos_struct_member=is_aos_struct_member,
    206      ):
🟢  207          return None
🟢  208      cache = _ZerocopyCache()
🟢  209      impl.get_runtime().cache_holders.add(owner)
🟢  210      return cache
    211  
    212  
🔴  213  def _metal_sync_runtime() -> None:
    214      """Quadrants -> MPS sync. Required so the DLPack-backed MPS tensor sees pending kernel writes."""
🟢  215      if impl.current_cfg().arch == _ARCH_METAL:
🔴  216          impl.get_runtime().sync()
    217  
    218  
🔴  219  def _metal_sync_torch() -> None:
    220      """MPS -> next operation sync. Required after ``.clone()`` / ``.to()`` so the resulting torch tensor
    221      has actually finished copying before the next user op (or the next Quadrants kernel) sees it."""
🟢  222      if _HAS_TORCH and impl.current_cfg().arch == _ARCH_METAL:
🔴  223          _torch.mps.synchronize()
    224  
    225  
🔴  226  def get_zerocopy_torch(
    227      owner,
    228      *,
    229      copy: bool | None,
    230      device=None,
    231  ) -> "_torch_mod.Tensor | None":
    232      """Zero-copy entry point for ``to_torch``.
    233  
    234      Returns the cached zero-copy view, optionally cloned and / or moved to ``device``. The ``copy`` argument selects
    235      between view (``False`` / ``None``) and independent buffer (``True``); crucially, the zerocopy export is taken
    236      **even when ``copy=True``** and the result is then cloned -- DLPack export is cheaper than the kernel-copy fallback
    237      path even when followed by a full clone.
    238  
    239      Args:
    240          owner: A Field or Ndarray with a ``_zerocopy_cache: _ZerocopyCache | None`` attribute and a ``to_dlpack()``
    241              method.
    242          copy: ``None`` -> view; ``False`` -> view (raises if zerocopy unsupported); ``True`` -> clone.
    243          device: Optional torch device. If different from the view's device, performs a device transfer (incompatible
    244              with ``copy=False``).
    245  
    246      Returns:
    247          The torch tensor, or ``None`` when ``owner._zerocopy_cache is None`` (i.e. zero-copy is not supported for this
    248          instance) and ``copy is not False`` -- the caller should fall back to its kernel-copy path.
    249  
    250      Raises:
    251          ValueError: when ``copy=False`` but zerocopy is unsupported for this instance, or when a device transfer is
    252              required but ``copy=False``.
    253      """
🟢  254      cache: _ZerocopyCache | None = owner._zerocopy_cache
🟢  255      if cache is None:
🟢  256          if copy is False:
🔴  257              raise ValueError("Zero-copy not available for this backend / dtype combination")
🟢  258          return None
    259  
🟢  260      _metal_sync_runtime()
🟢  261      tc = cache._ensure_torch(owner)
    262  
🟢  263      needs_device_transfer = device is not None and tc.device != _torch.device(device)
🟢  264      if needs_device_transfer:
🔴  265          if copy is False:
🔴  266              raise ValueError(
    267                  f"copy=False is incompatible with device transfer (data on {tc.device}, requested {device})"
    268              )
🔴  269          out = tc.to(device)
🔴  270          _metal_sync_torch()
🔴  271          return out
    272  
🟢  273      if copy is True:
🟢  274          out = tc.clone()
🟢  275          _metal_sync_torch()
🟢  276          return out
🟢  277      return tc
    278  
    279  
🔴  280  def get_zerocopy_numpy(
    281      owner,
    282      *,
    283      copy: bool | None,
    284      dtype_target=None,
    285  ) -> np.ndarray | None:
    286      """Zero-copy entry point for ``to_numpy``.
    287  
    288      Numpy zero-copy is available only on CPU backends (numpy arrays cannot reference GPU memory).
    289      Uses ``numpy.from_dlpack`` directly -- no torch import required.
    290  
    291      Args:
    292          owner: A Field or Ndarray with a ``_zerocopy_cache: _ZerocopyCache | None`` attribute and a ``to_dlpack()``
    293              method.
    294          copy: ``None`` -> view; ``False`` -> view (raises if zerocopy unsupported); ``True`` -> copy.
    295          dtype_target: Optional numpy dtype. If different from the view's dtype, performs ``.astype()`` (incompatible
    296              with ``copy=False``).
    297  
    298      Returns:
    299          The numpy array, or ``None`` when zerocopy is unsupported for this instance and ``copy is not False`` -- the
    300          caller should fall back to its kernel-copy path.
    301  
    302      Raises:
    303          ValueError: when ``copy=False`` but zerocopy is unsupported, or when a dtype conversion is required but
    304              ``copy=False``.
    305      """
🟢  306      cache: _ZerocopyCache | None = owner._zerocopy_cache
🟢  307      if cache is None or not current_arch_is_cpu():
🟢  308          if copy is False:
🔴  309              raise ValueError("Zero-copy numpy unavailable (requires a CPU backend and a supported dtype)")
🟢  310          return None
    311  
🟢  312      arr = cache._ensure_numpy(owner)
    313  
🟢  314      if dtype_target is not None and arr.dtype != dtype_target:
🟢  315          if copy is False:
🟢  316              raise ValueError("copy=False is incompatible with dtype conversion")
🔴  317          return arr.astype(dtype_target)
    318  
🟢  319      if copy is True:
🟢  320          return arr.copy()
🔴  321      return arr
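The CPU-only restriction comes from numpy itself: ``numpy.from_dlpack`` can only wrap host memory, and the result aliases the producer's buffer. A small self-contained demonstration using a plain numpy-to-numpy round trip (quadrants is not involved):

```python
import numpy as np

# numpy.from_dlpack produces a zero-copy view over CPU memory -- the same
# mechanism get_zerocopy_numpy relies on for its copy=None/False paths.
src = np.arange(6, dtype=np.float32)
view = np.from_dlpack(src)        # shares src's buffer, no copy
src[0] = 42.0
assert view[0] == 42.0            # writes are visible through the view

snapshot = view.copy()            # copy=True semantics: independent buffer
src[0] = 7.0
assert snapshot[0] == 42.0        # the snapshot is unaffected
```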
🔴 python/quadrants/lang/_ndarray.py (47%)
🔴    3  from functools import cached_property
🔴    9  from quadrants.lang import _interop, _ndarray_pickle, impl
     20      to_pytorch_type,
🔴   70      @cached_property
🔴   71      def _zerocopy_cache(self) -> _interop._ZerocopyCache | None:
     72          """Lazily-constructed DLPack cache. ``None`` when zero-copy is unsupported for this instance.
     73  
     74          Computed once per instance (review feedback: avoid re-checking ``can_zerocopy`` per call). Registers ``self``
     75          with ``pyquadrants.cache_holders`` so the cache is invalidated on ``qd.reset()`` / ``qd.init()`` BEFORE C++
     76          teardown.
     77          """
🟢   78          return _interop.make_zerocopy_cache_if_supported(self, is_field=False, dtype=self.dtype)
     79  
🔴   80      def _invalidate_zerocopy_cache(self) -> None:
     81          """Hook called by ``impl.reset()`` (via ``pyquadrants.cache_holders``) before C++ teardown."""
🟢   82          cache = self.__dict__.get("_zerocopy_cache")
🟢   83          if cache is not None:
🟢   84              cache.invalidate()
     85  
🔴   86      @python_scope
🔴   87      def to_torch(self, *, copy=None):
     88          """Converts this ndarray to a ``torch.Tensor``.
     89  
     90          Zero-copy via DLPack when supported for this backend/dtype, otherwise an independent kernel-copied tensor.
     91  
     92          Args:
     93              copy: ``None`` (default) prefers zero-copy, ``True`` forces an independent copy, ``False`` requires
     94                  zero-copy or raises.
     95          """
🟢   96          tc = _interop.get_zerocopy_torch(self, copy=copy)
🟢   97          if tc is not None:
🟢   98              return tc
     99  
🔴  100          import torch  # pylint: disable=C0415
    101  
🔴  102          arr = torch.zeros(size=self.arr.total_shape(), dtype=to_pytorch_type(self.dtype))
🔴  103          from quadrants._kernels import ndarray_to_ext_arr  # pylint: disable=C0415
    104  
🔴  105          ndarray_to_ext_arr(self, arr)
🔴  106          impl.get_runtime().sync()
🔴  107          return arr
    108  
    112  
    113          Note: cache invalidation is handled separately by ``pyquadrants.cache_holders`` BEFORE the C++ program is torn
    114          down; this hook only nulls out the Python-side state.
🔴  179      def _ndarray_to_numpy(self, *, copy=None):
    182          Args:
    183              copy: ``None`` (default) and ``True`` return an independent copy (numpy arrays are conventionally expected
     184                  to outlive their source). ``False`` returns a zero-copy DLPack view (requires a CPU backend and a
    185                  supported dtype) or raises ``ValueError``.
    186  
🟢  190          if copy is False:
🔴  191              return _interop.get_zerocopy_numpy(self, copy=False)
    192          # copy is None or True: try fast zerocopy+clone path, else kernel fallback.
🟢  193          arr = _interop.get_zerocopy_numpy(self, copy=True)
🟢  194          if arr is not None:
🟢  195              return arr
    196  
🔴  205      def _ndarray_matrix_to_numpy(self, as_vector, *, copy=None):
    208          Args:
    209              as_vector: Whether to treat as a vector ndarray.
    210              copy: see :meth:`_ndarray_to_numpy`.
    211  
🟢  215          if copy is False:
🔴  216              return _interop.get_zerocopy_numpy(self, copy=False)
🟢  217          arr = _interop.get_zerocopy_numpy(self, copy=True)
🟢  218          if arr is not None:
🟢  219              return arr
    220  
🔴  391      def to_numpy(self, *, copy=None):
🟢  392          return self._ndarray_to_numpy(copy=copy)
🔴 python/quadrants/lang/field.py (65%)
🔴    4  from functools import cached_property
🔴   11  from quadrants.lang import _interop, impl
🔴  107      def _invalidate_zerocopy_cache(self) -> None:
    108          """Hook called by ``impl.reset()`` (via ``pyquadrants.cache_holders``) before C++ teardown.
    109  
    110          Subclasses that support zero-copy define ``_zerocopy_cache`` as a ``cached_property``; we read it via
    111          ``__dict__.get`` so invalidation never triggers the lazy init.
    112          """
🟢  113          cache = self.__dict__.get("_zerocopy_cache")
🟢  114          if cache is not None:
🟢  115              cache.invalidate()
    116  
🔴  122      def to_numpy(self, dtype: DataTypeCxx | None = None, *, copy: bool | None = None):
    125          Args:
    126              copy: Controls copying behaviour:
    127  
    128                  - ``None`` (default) -- zero-copy when possible, copy otherwise.
    129                  - ``True`` -- always return an independent copy.
    130                  - ``False`` -- require zero-copy; raises if not possible.
    131  
🔴  138      def to_torch(self, device=None, *, copy: bool | None = None):
    143              copy: Controls copying behaviour:
    144  
    145                  - ``None`` (default) -- zero-copy when possible, copy otherwise.
    146                  - ``True`` -- always return an independent copy.
    147                  - ``False`` -- require zero-copy; raises if not possible.
🔴  251      @cached_property
🔴  252      def _zerocopy_cache(self) -> _interop._ZerocopyCache | None:
    253          """Lazily-constructed DLPack cache. ``None`` when zero-copy is unsupported.
    254  
    255          Computed once per instance (see review feedback #17 on PR #450). Registers ``self`` in
    256          ``pyquadrants.cache_holders`` so the cache is invalidated on ``qd.reset()`` / ``qd.init()`` BEFORE C++ teardown.
    257          """
    258          # A ScalarField that is a member of a multi-member StructField has AOS layout: its parent SNode (the struct
    259          # cell) holds multiple `place` children, and consecutive elements of the same member are sizeof(cell) bytes
    260          # apart. The C++ field_to_dlpack does not emit those strides yet, so zerocopy would produce an interleaved
    261          # (broken) view. Skip it.
🟢  262          parent_snode = self.parent()._snode.ptr
🟢  263          is_aos_struct_member = parent_snode.get_num_ch() > 1
🟢  264          return _interop.make_zerocopy_cache_if_supported(
    265              self,
    266              is_field=True,
    267              dtype=self.dtype,
    268              is_scalar_field=True,
    269              shape=self.shape,
    270              is_aos_struct_member=is_aos_struct_member,
    271          )
    272  
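The AOS rationale in the comment above can be reproduced with numpy strides alone (illustrative flat buffer, not the actual SNode layout): a correct per-member view needs the cell stride, while contiguous strides at the member dtype size interleave neighbouring members' values.

```python
import numpy as np

# Two interleaved i32 struct members laid out AOS: a0 b0 a1 b1 a2 b2.
# A correct view of member "a" must step by the 8-byte cell size; a
# contiguous-stride export at the 4-byte member size mixes a and b.
raw = np.array([1, 2, 3, 4, 5, 6], dtype=np.int32)   # a0 b0 a1 b1 a2 b2
member_a = np.lib.stride_tricks.as_strided(raw, shape=(3,), strides=(8,))
assert member_a.tolist() == [1, 3, 5]    # cell-stride view: correct
assert raw[:3].tolist() == [1, 2, 3]     # contiguous strides: interleaved
```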
🔴  287      def to_numpy(self, dtype=None, *, copy=None):
    288          """Converts this field to a `numpy.ndarray`.
    289  
    290          Args:
    291              dtype: Optional target numpy dtype. Incompatible with ``copy=False`` if it differs from the field's native
    292                  dtype.
    293              copy: ``None`` (default) and ``True`` return an independent copy. ``False`` returns a zero-copy DLPack view
     294                  (requires a CPU backend and a supported dtype) or raises ``ValueError``. Note: zero-copy numpy arrays
    295                  alias the field's underlying C++ runtime memory; callers opting into ``copy=False`` are responsible for
    296                  the buffer lifetime.
    297          """
🟢  302          np_dtype_target = None
🟢  303          if dtype is not None:
🟢  304              np_dtype_target = to_numpy_type(dtype) if isinstance(dtype, _qd_core.DataTypeCxx) else dtype
    305  
🟢  306          if copy is False:
🟢  307              return _interop.get_zerocopy_numpy(self, copy=False, dtype_target=np_dtype_target)
    308          # copy is None or True: try fast zerocopy+clone path, else kernel fallback.
🟢  309          arr = _interop.get_zerocopy_numpy(self, copy=True, dtype_target=np_dtype_target)
🟢  310          if arr is not None:
🟢  311              return arr
    312  
🔴  326      def to_torch(self, device=None, *, copy=None):
    327          """Converts this field to a `torch.tensor`.
    328  
    329          Args:
    330              device: Optional torch device. Incompatible with ``copy=False`` if it differs from the field's native
    331                  device.
    332              copy: ``None`` (default) prefers zero-copy, ``True`` forces an independent copy, ``False`` requires
    333                  zero-copy or raises.
    334          """
🟢  335          tc = _interop.get_zerocopy_torch(self, copy=copy, device=device)
🟢  336          if tc is not None:
🟢  337              return tc
    338  
🟢 python/quadrants/lang/impl.py (100%)
    377          # Holders of zero-copy DLPack caches (Ndarray, ScalarField, MatrixField, ...). On reset, each holder's
    378          # `_invalidate_zerocopy_cache()` is called BEFORE the C++ program is torn down, so that DLPack deleters run
    379          # while the underlying memory is still valid.
🟢  380          self.cache_holders: weakref.WeakSet = weakref.WeakSet()
🟢  571      old_cache_holders = pyquadrants.cache_holders
    572      # Drop zero-copy caches BEFORE the C++ program is torn down. The cached torch tensors / numpy arrays alias C++
    573      # memory via DLPack; their deleters must run while the program is still alive, otherwise we get use-after-free at
    574      # GC time (segfault on AMD GPU, Metal, etc.).
🟢  575      for h in old_cache_holders:
🟢  576          h._invalidate_zerocopy_cache()
🔴 python/quadrants/lang/matrix.py (46%)
🔴    6  from functools import cached_property
🔴   13  from quadrants.lang import _interop, expr, impl, runtime_ops
🔴 1214      @cached_property
🔴 1215      def _zerocopy_cache(self) -> _interop._ZerocopyCache | None:
   1216          """Lazily-constructed DLPack cache. ``None`` when zero-copy is unsupported.
   1217  
   1218          Constructed once per instance (closes review #17); registers ``self`` with ``pyquadrants.cache_holders``
   1219          (closes review #18) so ``qd.reset()`` / ``qd.init()`` invalidate the cache BEFORE C++ teardown.
   1220          """
   1221          # Defensive: if this MatrixField is a member of a multi-member StructField, its representative SNode shares a
   1222          # parent with sibling members and the C++ DLPack export produces broken AOS strides. See
   1223          # ScalarField._zerocopy_cache for the full rationale.
🟢 1224          parent_snode = self.parent()._snode.ptr
🟢 1225          is_aos_struct_member = parent_snode.get_num_ch() > 1
🟢 1226          return _interop.make_zerocopy_cache_if_supported(
   1227              self,
   1228              is_field=True,
   1229              dtype=self.dtype,
   1230              shape=self.shape,
   1231              is_aos_struct_member=is_aos_struct_member,
   1232          )
   1233  
🔴 1234      def _matrix_view_shape(self, keep_dims: bool) -> tuple[tuple[int, ...], bool]:
   1235          """Returns ``(expected_shape, as_vector)`` for ``to_torch`` / ``to_numpy``.
   1236  
   1237          Keeps the n/m/keep_dims handling in one place rather than duplicating it in each conversion method (closes
   1238          review #13, #14).
   1239          """
🟢 1240          as_vector = self.m == 1 and not keep_dims
🟢 1241          shape_ext = (self.n,) if as_vector else (self.n, self.m)
🟢 1242          return self.shape + shape_ext, as_vector
   1243  
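``_matrix_view_shape`` is pure shape arithmetic, so it can be restated as a standalone free function mirroring the method (illustrative; ``n``/``m`` are the matrix dimensions, ``shape`` the field shape), matching the shape expectations in test_zerocopy_interop.py:

```python
# Standalone restatement of _matrix_view_shape: a column vector (m == 1)
# drops its trailing unit dimension unless keep_dims is requested.
def matrix_view_shape(shape, n, m, keep_dims):
    as_vector = m == 1 and not keep_dims
    shape_ext = (n,) if as_vector else (n, m)
    return shape + shape_ext, as_vector

# vec3 field with field shape (4,)
assert matrix_view_shape((4,), 3, 1, keep_dims=False) == ((4, 3), True)
assert matrix_view_shape((4,), 3, 1, keep_dims=True) == ((4, 3, 1), False)
# 2x3 matrix field with field shape (5,)
assert matrix_view_shape((5,), 2, 3, keep_dims=False) == ((5, 2, 3), False)
```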
🔴 1337      def to_numpy(self, keep_dims=False, dtype=None, *, copy=None):
   1346              copy: ``None`` (default) and ``True`` return an independent copy. ``False`` returns a zero-copy DLPack view
    1347                  (requires a CPU backend and a supported dtype) or raises ``ValueError``. Note: zero-copy numpy arrays
   1348                  alias the field's underlying C++ runtime memory; callers opting into ``copy=False`` are responsible for
   1349                  the buffer lifetime.
🟢 1354          expected, as_vector = self._matrix_view_shape(keep_dims)
🟢 1355          np_dtype_target = None
🟢 1356          if dtype is not None:
🔴 1357              np_dtype_target = to_numpy_type(dtype) if isinstance(dtype, qd_python_core.DataTypeCxx) else dtype
   1358  
🟢 1359          if copy is False:
🔴 1360              arr = _interop.get_zerocopy_numpy(self, copy=False, dtype_target=np_dtype_target)
🔴 1361              if arr.shape != expected:
🔴 1362                  arr = arr.reshape(expected)
🔴 1363              return arr
   1364          # copy is None or True: try fast zerocopy+clone path, else kernel fallback.
🟢 1365          arr = _interop.get_zerocopy_numpy(self, copy=True, dtype_target=np_dtype_target)
🟢 1366          if arr is not None:
🔴 1367              if arr.shape != expected:
🔴 1368                  arr = arr.reshape(expected)
🔴 1369              return arr
   1370  
🟢 1373          arr = np.zeros(expected, dtype=dtype)
🔴 1380      def to_torch(self, device=None, keep_dims=False, *, copy=None):
   1387              copy: ``None`` (default) prefers zero-copy, ``True`` forces an independent copy, ``False`` requires
   1388                  zero-copy or raises.
🟢 1393          expected, as_vector = self._matrix_view_shape(keep_dims)
🟢 1394          tc = _interop.get_zerocopy_torch(self, copy=copy, device=device)
🟢 1395          if tc is not None:
🔴 1396              if tc.shape != expected:
🔴 1397                  tc = tc.reshape(expected)
🔴 1398              return tc
   1399  
🟢 1403          arr = torch.empty(expected, dtype=to_pytorch_type(self.dtype), device=device)
🔴 1777      def to_numpy(self, *, copy=None):
🔴 1790      return self._ndarray_matrix_to_numpy(as_vector=False, copy=copy)
🔴 1888      def to_numpy(self, *, copy=None):
🟢 1901      return self._ndarray_matrix_to_numpy(as_vector=True, copy=copy)
🔴 python/quadrants/lang/struct.py (62%)
🔴  552      def to_numpy(self, *, copy=None):
    555          Struct fields use AOS cell layout, but Quadrants' C++ ``field_to_dlpack`` does not currently emit
    556          cell-stride-aware DLPack views for individual members (it computes contiguous strides at the member dtype
    557          size, which would interleave neighboring members' bytes). Until the C++ export is taught about AOS strides,
    558          ``StructField`` always returns independent copies; the ``copy`` argument is accepted for API symmetry but
    559          ``copy=False`` is rejected.
    560  
    563          Args:
    564              copy: must be ``None`` or ``True``; ``copy=False`` raises ``ValueError``.
    565  
🟢  569          if copy is False:
🟢  570              raise ValueError(
    571                  "StructField.to_numpy(copy=False) is not supported: AOS member views require "
    572                  "cell-stride-aware DLPack export which the C++ runtime does not emit yet."
    573              )
🔴  574          return {k: v.to_numpy(copy=True) for k, v in self._items}
🔴  577      def to_torch(self, device=None, *, copy=None):
    580          See :meth:`to_numpy` for why members are always returned as independent copies. ``copy=False`` is rejected.
    581  
    585              device (torch.device, optional): The desired device of returned tensors.
    586              copy: must be ``None`` or ``True``; ``copy=False`` raises ``ValueError``.
    589              Dict[str, Union[torch.Tensor, Dict]]: The result PyTorch tensor.
🟢  591          if copy is False:
🟢  592              raise ValueError(
    593                  "StructField.to_torch(copy=False) is not supported: AOS member views require "
    594                  "cell-stride-aware DLPack export which the C++ runtime does not emit yet."
    595              )
🟢  596          return {k: v.to_torch(device=device, copy=True) for k, v in self._items}
🔴 python/quadrants/lang/util.py (0%)
🔴   95  @functools.lru_cache(maxsize=None)
🔴  133  @functools.lru_cache(maxsize=None)
🟢 tests/python/quadrants/lang/test_zerocopy_interop.py (96%)
      1  """Tests for zero-copy to_torch / to_numpy via DLPack (the ``copy`` parameter)."""
      2  
🟢    3  import os
      4  
🟢    5  import numpy as np
🟢    6  import pytest
      7  
🟢    8  torch = pytest.importorskip("torch")
      9  
🟢   10  import quadrants as qd
     11  
🟢   12  from tests import test_utils
     13  
🟢   14  pytestmark = pytest.mark.needs_torch
     15  
🟢   16  dlpack_arch = [qd.cpu, qd.cuda, qd.metal, qd.amdgpu]
     17  
     18  
🟢   19  def is_v520_amdgpu():
🟢   20      return os.environ.get("QD_AMDGPU_V520", None) == "1" and qd.cfg.arch == qd.amdgpu
     21  
     22  
🟢   23  def _to_cpu(t):
     24      """Move tensor to CPU for value comparison on arches where torch accessors may not work."""
🟢   25      if is_v520_amdgpu() or t.device.type != "cpu":
🔴   26          return t.cpu()
🟢   27      return t
     28  
     29  
     30  # ---------------------------------------------------------------------------
     31  # ScalarField.to_torch  --  zero-copy
     32  # ---------------------------------------------------------------------------
     33  
     34  
🟢   35  @test_utils.test(arch=dlpack_arch)
🟢   36  @pytest.mark.parametrize("dtype", [qd.i32, qd.f32])
🟢   37  def test_scalar_field_to_torch_zerocopy(dtype):
🟢   38      f = qd.field(dtype, shape=(4,))
🟢   39      f[0] = 10
🟢   40      f[1] = 20
🟢   41      qd.sync()
     42  
🟢   43      tc = f.to_torch()
🟢   44      tc = _to_cpu(tc)
🟢   45      assert tc[0] == 10
🟢   46      assert tc[1] == 20
🟢   47      assert tuple(tc.shape) == (4,)
     48  
     49  
🟢   50  @test_utils.test(arch=dlpack_arch)
🟢   51  def test_scalar_field_to_torch_caching():
     52      """Calling to_torch() twice returns the same cached object."""
🟢   53      f = qd.field(qd.f32, shape=(3,))
🟢   54      qd.sync()
🟢   55      t1 = f.to_torch()
🟢   56      t2 = f.to_torch()
🟢   57      assert t1.data_ptr() == t2.data_ptr()
     58  
     59  
🟢   60  @test_utils.test(arch=dlpack_arch)
🟢   61  def test_scalar_field_to_torch_copy_true():
     62      """copy=True must return an independent tensor."""
🟢   63      f = qd.field(qd.f32, shape=(3,))
🟢   64      f[0] = 42
🟢   65      qd.sync()
🟢   66      tc_view = f.to_torch()
🟢   67      tc_copy = f.to_torch(copy=True)
🟢   68      assert _to_cpu(tc_copy)[0] == 42
🟢   69      assert tc_view.data_ptr() != tc_copy.data_ptr()
     70  
     71  
🟢   72  @test_utils.test(arch=dlpack_arch)
🟢   73  def test_scalar_field_to_torch_copy_false():
     74      """copy=False should succeed on DLPack-capable arches with non-empty shape."""
🟢   75      f = qd.field(qd.f32, shape=(3,))
🟢   76      f[0] = 7
🟢   77      qd.sync()
🟢   78      tc = f.to_torch(copy=False)
🟢   79      assert _to_cpu(tc)[0] == 7
     80  
     81  
🟢   82  @test_utils.test(arch=dlpack_arch)
🟢   83  def test_scalar_field_to_torch_aliases_memory():
     84      """Zero-copy tensor must alias the field's device memory."""
🟢   85      if is_v520_amdgpu():
🔴   86          pytest.skip("can't run torch accessor kernels on v520")
🟢   87      f = qd.field(qd.i32, shape=(4,))
🟢   88      f[0] = 1
🟢   89      qd.sync()
🟢   90      tc = f.to_torch()
     91  
🟢   92      @qd.kernel
🟢   93      def write(f: qd.template()):
🔴   94          f[0] = 99
     95  
🟢   96      write(f)
🟢   97      qd.sync()
🟢   98      assert tc[0] == 99
     99  
    100  
    101  # ---------------------------------------------------------------------------
    102  # ScalarField.to_numpy  --  zero-copy (CPU only)
    103  # ---------------------------------------------------------------------------
    104  
    105  
🟢  106  @test_utils.test(arch=[qd.cpu])
🟢  107  def test_scalar_field_to_numpy_zerocopy_cpu():
🟢  108      f = qd.field(qd.f32, shape=(5,))
🟢  109      f[0] = 3.5
🟢  110      qd.sync()
🟢  111      arr = f.to_numpy()
🟢  112      assert isinstance(arr, np.ndarray)
🟢  113      np.testing.assert_allclose(arr[0], 3.5)
    114  
    115  
🟢  116  @test_utils.test(arch=[qd.cpu])
🟢  117  def test_scalar_field_to_numpy_copy_true_cpu():
🟢  118      f = qd.field(qd.f32, shape=(3,))
🟢  119      f[0] = 1.0
🟢  120      qd.sync()
🟢  121      arr = f.to_numpy(copy=True)
🟢  122      np.testing.assert_allclose(arr[0], 1.0)
    123  
    124  
🟢  125  @test_utils.test(arch=dlpack_arch)
🟢  126  def test_scalar_field_to_numpy_matches_copy_path():
    127      """Zero-copy and copy path produce the same values."""
🟢  128      f = qd.field(qd.f32, shape=(6,))
🟢  129      for i in range(6):
🟢  130          f[i] = float(i * 11)
🟢  131      qd.sync()
🟢  132      arr_default = f.to_numpy()
🟢  133      arr_copy = f.to_numpy(copy=True)
🟢  134      np.testing.assert_allclose(arr_default, arr_copy)
    135  
    136  
    137  # ---------------------------------------------------------------------------
    138  # MatrixField.to_torch  --  zero-copy with keep_dims handling
    139  # ---------------------------------------------------------------------------
    140  
    141  
🟢  142  @test_utils.test(arch=dlpack_arch)
🟢  143  def test_vector_field_to_torch_zerocopy():
🟢  144      vec3 = qd.types.vector(3, qd.f32)
🟢  145      f = qd.field(vec3, shape=(4,))
🟢  146      f[0] = (1, 2, 3)
🟢  147      f[1] = (4, 5, 6)
🟢  148      qd.sync()
🟢  149      tc = _to_cpu(f.to_torch())
🟢  150      assert tuple(tc.shape) == (4, 3)
🟢  151      assert tc[0, 0] == 1
🟢  152      assert tc[1, 2] == 6
    153  
    154  
🟢  155  @test_utils.test(arch=dlpack_arch)
🟢  156  def test_vector_field_to_torch_keep_dims():
🟢  157      vec3 = qd.types.vector(3, qd.f32)
🟢  158      f = qd.field(vec3, shape=(4,))
🟢  159      f[0] = (10, 20, 30)
🟢  160      qd.sync()
🟢  161      tc_no_keep = _to_cpu(f.to_torch(keep_dims=False))
🟢  162      tc_keep = _to_cpu(f.to_torch(keep_dims=True))
🟢  163      assert tuple(tc_no_keep.shape) == (4, 3)
🟢  164      assert tuple(tc_keep.shape) == (4, 3, 1)
🟢  165      assert tc_no_keep[0, 0] == tc_keep[0, 0, 0]
    166  
    167  
🟢  168  @test_utils.test(arch=dlpack_arch)
🟢  169  def test_matrix_field_to_torch_zerocopy():
🟢  170      mat = qd.types.matrix(2, 3, qd.f32)
🟢  171      f = qd.field(mat, shape=(5,))
🟢  172      f[0] = ((1, 2, 3), (4, 5, 6))
🟢  173      qd.sync()
🟢  174      tc = _to_cpu(f.to_torch())
🟢  175      assert tuple(tc.shape) == (5, 2, 3)
🟢  176      assert tc[0, 0, 0] == 1
🟢  177      assert tc[0, 1, 2] == 6
    178  
    179  
🟢  180  @test_utils.test(arch=dlpack_arch)
🟢  181  def test_matrix_field_to_torch_matches_copy():
🟢  182      mat = qd.types.matrix(2, 3, qd.f32)
🟢  183      f = qd.field(mat, shape=(4,))
🟢  184      f[0] = ((1, 2, 3), (4, 5, 6))
🟢  185      f[1] = ((7, 8, 9), (10, 11, 12))
🟢  186      qd.sync()
🟢  187      tc_zc = _to_cpu(f.to_torch())
🟢  188      tc_cp = _to_cpu(f.to_torch(copy=True))
🟢  189      assert torch.allclose(tc_zc, tc_cp)
    190  
    191  
    192  # ---------------------------------------------------------------------------
    193  # MatrixField.to_numpy  --  zero-copy (CPU only)
    194  # ---------------------------------------------------------------------------
    195  
    196  
🟢  197  @test_utils.test(arch=[qd.cpu])
🟢  198  def test_vector_field_to_numpy_zerocopy_cpu():
🟢  199      vec3 = qd.types.vector(3, qd.f32)
🟢  200      f = qd.field(vec3, shape=(3,))
🟢  201      f[0] = (10, 20, 30)
🟢  202      qd.sync()
🟢  203      arr = f.to_numpy()
🟢  204      assert arr.shape == (3, 3)
🟢  205      np.testing.assert_allclose(arr[0], [10, 20, 30])
    206  
    207  
🟢  208  @test_utils.test(arch=[qd.cpu])
🟢  209  def test_matrix_field_to_numpy_zerocopy_cpu():
🟢  210      mat = qd.types.matrix(2, 2, qd.f32)
🟢  211      f = qd.field(mat, shape=(2,))
🟢  212      f[0] = ((1, 2), (3, 4))
🟢  213      qd.sync()
🟢  214      arr = f.to_numpy()
🟢  215      assert arr.shape == (2, 2, 2)
🟢  216      np.testing.assert_allclose(arr[0], [[1, 2], [3, 4]])
    217  
    218  
    219  # ---------------------------------------------------------------------------
    220  # Ndarray.to_torch  --  new method, always uses DLPack
    221  # ---------------------------------------------------------------------------
    222  
    223  
🟢  224  @test_utils.test(arch=dlpack_arch)
🟢  225  def test_scalar_ndarray_to_torch():
🟢  226      nd = qd.ndarray(qd.f32, shape=(5,))
🟢  227      nd[0] = 42.0
🟢  228      qd.sync()
🟢  229      tc = _to_cpu(nd.to_torch())
🟢  230      assert tuple(tc.shape) == (5,)
🟢  231      assert tc[0] == 42.0
    232  
    233  
🟢  234  @test_utils.test(arch=dlpack_arch)
🟢  235  def test_vector_ndarray_to_torch():
🟢  236      nd = qd.Vector.ndarray(3, qd.f32, shape=(4,))
🟢  237      nd[0] = (1, 2, 3)
🟢  238      qd.sync()
🟢  239      tc = _to_cpu(nd.to_torch())
🟢  240      assert tuple(tc.shape) == (4, 3)
🟢  241      assert tc[0, 0] == 1
🟢  242      assert tc[0, 2] == 3
    243  
    244  
🟢  245  @test_utils.test(arch=dlpack_arch)
🟢  246  def test_matrix_ndarray_to_torch():
🟢  247      nd = qd.Matrix.ndarray(2, 3, qd.f32, shape=(4,))
🟢  248      nd[0] = ((10, 20, 30), (40, 50, 60))
🟢  249      qd.sync()
🟢  250      tc = _to_cpu(nd.to_torch())
🟢  251      assert tuple(tc.shape) == (4, 2, 3)
🟢  252      assert tc[0, 0, 0] == 10
🟢  253      assert tc[0, 1, 2] == 60
    254  
    255  
🟢  256  @test_utils.test(arch=dlpack_arch)
🟢  257  def test_ndarray_to_torch_caching():
🟢  258      nd = qd.ndarray(qd.f32, shape=(3,))
🟢  259      qd.sync()
🟢  260      t1 = nd.to_torch()
🟢  261      t2 = nd.to_torch()
🟢  262      assert t1.data_ptr() == t2.data_ptr()
    263  
    264  
🟢  265  @test_utils.test(arch=dlpack_arch)
🟢  266  def test_ndarray_to_torch_copy_true():
🟢  267      nd = qd.ndarray(qd.f32, shape=(3,))
🟢  268      nd[0] = 7.0
🟢  269      qd.sync()
🟢  270      tc = _to_cpu(nd.to_torch(copy=True))
🟢  271      assert tc[0] == 7.0
🟢  272      view = nd.to_torch()
🟢  273      assert view.data_ptr() != tc.data_ptr()
    274  
    275  
    276  # ---------------------------------------------------------------------------
    277  # Ndarray.to_numpy  --  zero-copy on CPU
    278  # ---------------------------------------------------------------------------
    279  
    280  
🟢  281  @test_utils.test(arch=[qd.cpu])
🟢  282  def test_scalar_ndarray_to_numpy_zerocopy():
🟢  283      nd = qd.ndarray(qd.f32, shape=(4,))
🟢  284      nd[0] = 3.0
🟢  285      nd[1] = 5.0
🟢  286      qd.sync()
🟢  287      arr = nd.to_numpy()
🟢  288      np.testing.assert_allclose(arr[:2], [3.0, 5.0])
    289  
    290  
🟢  291  @test_utils.test(arch=[qd.cpu])
🟢  292  def test_vector_ndarray_to_numpy_zerocopy():
🟢  293      nd = qd.Vector.ndarray(3, qd.f32, shape=(2,))
🟢  294      nd[0] = (1, 2, 3)
🟢  295      qd.sync()
🟢  296      arr = nd.to_numpy()
🟢  297      np.testing.assert_allclose(arr[0], [1, 2, 3])
    298  
    299  
🟢  300  @test_utils.test(arch=dlpack_arch)
🟢  301  def test_ndarray_to_numpy_matches_copy():
🟢  302      nd = qd.ndarray(qd.f32, shape=(6,))
🟢  303      for i in range(6):
🟢  304          nd[i] = float(i * 7)
🟢  305      qd.sync()
🟢  306      arr_default = nd.to_numpy()
🟢  307      arr_copy = nd.to_numpy(copy=True)
🟢  308      np.testing.assert_allclose(arr_default, arr_copy)
    309  
    310  
    311  # ---------------------------------------------------------------------------
    312  # Two fields in the same SNode tree (non-zero offset)
    313  # ---------------------------------------------------------------------------
    314  
    315  
🟢  316  @test_utils.test(arch=dlpack_arch)
🟢  317  def test_to_torch_two_fields_same_tree():
🟢  318      a = qd.field(qd.i32, (100,))
🟢  319      b = qd.field(qd.i32, (100,))
🟢  320      a[0] = 111
🟢  321      b[0] = 222
🟢  322      qd.sync()
🟢  323      at = _to_cpu(a.to_torch())
🟢  324      bt = _to_cpu(b.to_torch())
🟢  325      assert at[0] == 111
🟢  326      assert bt[0] == 222
    327  
    328  
    329  # ---------------------------------------------------------------------------
    330  # copy=False  --  error paths
    331  # ---------------------------------------------------------------------------
    332  
    333  
🟢  334  @test_utils.test(arch=dlpack_arch)
🟢  335  def test_copy_false_with_device_transfer_raises():
    336      """copy=False + device that differs from data device should raise."""
🟢  337      if qd.cfg.arch == qd.cpu:
🟢  338          pytest.skip("need GPU arch to test device mismatch")
🔴  339      f = qd.field(qd.f32, shape=(3,))
🔴  340      qd.sync()
🔴  341      with pytest.raises(ValueError, match="copy=False"):
🔴  342          f.to_torch(device="cpu", copy=False)
    343  
    344  
🟢  345  @test_utils.test(arch=[qd.cpu])
🟢  346  def test_copy_false_numpy_dtype_conversion_raises():
    347      """copy=False + dtype that requires conversion should raise."""
🟢  348      f = qd.field(qd.f32, shape=(3,))
🟢  349      qd.sync()
🟢  350      with pytest.raises(ValueError, match="copy=False"):
🟢  351          f.to_numpy(dtype=np.float64, copy=False)
    352  
    353  
    354  # ---------------------------------------------------------------------------
    355  # StructField pass-through
    356  # ---------------------------------------------------------------------------
    357  
    358  
🟢  359  @test_utils.test(arch=dlpack_arch)
🟢  360  def test_struct_field_to_torch():
🟢  361      s = qd.Struct.field({"a": qd.f32, "b": qd.f32}, shape=(4,))
🟢  362      s[0] = {"a": 1.0, "b": 2.0}
🟢  363      qd.sync()
🟢  364      d = s.to_torch()
🟢  365      assert isinstance(d, dict)
🟢  366      assert _to_cpu(d["a"])[0] == 1.0
🟢  367      assert _to_cpu(d["b"])[0] == 2.0
    368  
    369  
🟢  370  @test_utils.test(arch=dlpack_arch)
🟢  371  def test_struct_field_to_torch_copy_true():
🟢  372      s = qd.Struct.field({"x": qd.f32}, shape=(3,))
🟢  373      s[0] = {"x": 5.0}
🟢  374      qd.sync()
🟢  375      d1 = s.to_torch()
🟢  376      d2 = s.to_torch(copy=True)
    377      # StructField always copies (AOS member views are not zero-copyable yet -- see struct.py docstring),
    378      # so both calls allocate independent storage.
🟢  379      assert d1["x"].data_ptr() != d2["x"].data_ptr()
🟢  380      assert _to_cpu(d2["x"])[0] == 5.0
    381  
    382  
🟢  383  @test_utils.test(arch=dlpack_arch)
🟢  384  def test_struct_field_to_torch_does_not_alias_memory():
    385      """StructField always returns independent copies of each AOS member.
    386  
    387      Quadrants' C++ ``field_to_dlpack`` does not currently emit cell-stride-aware DLPack views for individual members of
    388      an AOS struct -- it computes contiguous strides at the member dtype size, which would interleave neighboring
    389      members' bytes. Until that is fixed, ``StructField.to_torch`` forces ``copy=True`` per-member, so a kernel write
    390      into the field must NOT be reflected in a previously-obtained dict.
    391      """
🟢  392      if is_v520_amdgpu():
🔴  393          pytest.skip("can't run torch accessor kernels on v520")
🟢  394      s = qd.Struct.field({"a": qd.i32, "b": qd.i32}, shape=(4,))
🟢  395      s[0] = {"a": 1, "b": 2}
🟢  396      qd.sync()
🟢  397      d = s.to_torch()
🟢  398      assert _to_cpu(d["a"])[0] == 1
🟢  399      assert _to_cpu(d["b"])[0] == 2
    400  
🟢  401      @qd.kernel
🟢  402      def write(s: qd.template()):
🔴  403          s[0].a = 99
🔴  404          s[0].b = 77
    405  
🟢  406      write(s)
🟢  407      qd.sync()
    408      # Snapshot must be unchanged because StructField.to_torch always copies.
🟢  409      assert _to_cpu(d["a"])[0] == 1
🟢  410      assert _to_cpu(d["b"])[0] == 2
    411  
    412  
🟢  413  @test_utils.test(arch=[qd.cpu])
🟢  414  def test_struct_field_copy_false_raises():
    415      """``copy=False`` is rejected for StructField (AOS member views not supported yet)."""
🟢  416      s = qd.Struct.field({"a": qd.f32, "b": qd.i32}, shape=(3,))
🟢  417      s[0] = {"a": 1.5, "b": 7}
🟢  418      qd.sync()
🟢  419      with pytest.raises(ValueError, match="StructField.to_numpy.*copy=False"):
🟢  420          s.to_numpy(copy=False)
🟢  421      with pytest.raises(ValueError, match="StructField.to_torch.*copy=False"):
🟢  422          s.to_torch(copy=False)
    423  
    424  
    425  # ---------------------------------------------------------------------------
    426  # Cache invalidation across qd.reset() / qd.init()
    427  # ---------------------------------------------------------------------------
    428  
    429  
🟢  430  @test_utils.test(arch=dlpack_arch)
🟢  431  def test_zerocopy_cache_survives_reset():
    432      """Holding a zero-copy view across qd.reset() must not segfault.
    433  
    434      Regression for the use-after-free that motivated the Genesis to_numpy()-always-copies workaround: cache invalidation
    435      now runs BEFORE the C++ program is torn down (see pyquadrants.cache_holders in impl.py), so the DLPack deleter on
    436      the cached tensor operates on still-valid memory.
    437      """
🟢  438      arch = qd.cfg.arch
🟢  439      f = qd.field(qd.f32, shape=(4,))
🟢  440      f[0] = 1.0
🟢  441      qd.sync()
🟢  442      held = f.to_torch()  # cached zero-copy view; we hold a Python reference past reset()
🟢  443      assert held is not None
    444  
🟢  445      qd.reset()
🟢  446      qd.init(arch=arch)
    447  
    448      # Re-create and verify the new field works -- the previous-cycle 'held' tensor is allowed to remain alive (its
    449      # data may now be released, but we don't dereference it). What we MUST not have is a crash inside the
    450      # previous-cycle deleter when 'held' is GC'd.
🟢  451      f2 = qd.field(qd.f32, shape=(4,))
🟢  452      f2[0] = 42.0
🟢  453      qd.sync()
🟢  454      fresh = f2.to_torch()
🟢  455      assert _to_cpu(fresh)[0] == 42.0
    456      # Touch 'held' last so it's only dropped at function exit; the DLPack deleter must not crash.
    457  
    458  
🟢  459  @test_utils.test(arch=dlpack_arch)
🟢  460  def test_zerocopy_cache_fresh_after_reset():
    461      """After qd.reset() / qd.init(), a freshly-constructed Field gets a fresh cache.
    462  
    463      Specifically, the new field's data_ptr is not stale-equal to a previous cycle's, AND the new view sees current data.
    464      """
🟢  465      arch = qd.cfg.arch
🟢  466      f1 = qd.field(qd.f32, shape=(2,))
🟢  467      f1[0] = 1.0
🟢  468      qd.sync()
🟢  469      f1.to_torch()  # populate cache
🟢  470      f1_dtype = f1.dtype  # noqa: F841 -- keep f1 alive past reset for the GC ordering test
    471  
🟢  472      qd.reset()
🟢  473      qd.init(arch=arch)
    474  
🟢  475      f2 = qd.field(qd.f32, shape=(2,))
🟢  476      f2[0] = 99.0
🟢  477      qd.sync()
🟢  478      tc2 = _to_cpu(f2.to_torch())
🟢  479      assert tc2[0] == 99.0
    480  
    481  
    482  # ---------------------------------------------------------------------------
    483  # Apple Metal double-sync
    484  # ---------------------------------------------------------------------------
    485  
    486  
🟢  487  @test_utils.test(arch=dlpack_arch)
🟢  488  def test_clone_after_kernel_write_returns_up_to_date_data():
    489      """After a kernel writes to a field, an immediate clone must see the post-write values.
    490  
    491      Specifically targets the Apple Metal path where qd.sync() flushes pending Quadrants kernels and
    492      torch.mps.synchronize() flushes pending MPS clone copies; both must run for the cloned tensor to be observably
    493      equal to the field's current values.
    494      """
🟢  495      if is_v520_amdgpu():
🔴  496          pytest.skip("can't run torch accessor kernels on v520")
🟢  497      f = qd.field(qd.i32, shape=(8,))
🟢  498      f[0] = 0
🟢  499      qd.sync()
    500  
🟢  501      @qd.kernel
🟢  502      def write(f: qd.template()):
🔴  503          for i in range(8):
🔴  504              f[i] = 100 + i
    505  
🟢  506      write(f)
    507      # No explicit qd.sync() here -- we rely on to_torch(copy=True)'s internal sync.
🟢  508      tc = f.to_torch(copy=True)
🟢  509      tc_cpu = _to_cpu(tc)
🟢  510      for i in range(8):
🟢  511          assert tc_cpu[i] == 100 + i
🟢 tests/python/test_api.py (100%)
    370      "to_torch",
🟢  372  user_api[qd.Ndarray] = ["copy_from", "element_shape", "fill", "get_type", "to_dlpack", "to_torch"]
    407      "to_torch",
    432      "to_torch",
🟢 tests/python/test_torch_ad.py (100%)
🟢   51              outp = y.to_torch().to(inp.device)
🟢   59              inp_grad = x.grad.to_torch().to(outp_grad.device)
🟢 tests/python/test_torch_io.py (100%)
🟢  164      t3 = x2.to_torch().cpu()
🟢  178      zeros = mat.to_torch().cpu()
🟢  209      s_after_to = impl.get_runtime().get_num_compiled_functions()
    210      # to_torch may use zero-copy (0 kernels) or the fused kernel path (1 kernel)
🟢  211      assert s_after_to - s <= 1
    213      # from_torch always compiles exactly one fused kernel
🟢  214      assert impl.get_runtime().get_num_compiled_functions() == s_after_to + 1
🟢  233      X = x.to_torch().cpu()
🟢  241      X1 = x.to_torch().cpu()
🟢  243      X1 = x.to_torch().cpu()
🟢  253      X = x.to_torch().cpu()
🟢  260      X1 = x.to_torch().cpu()
🟢  262      X1 = x.to_torch().cpu()

Comment thread docs/source/user_guide/interop.md Outdated
auto = f.to_torch() # default: copy=None (see below)
```

Modifications via the view are visible to subsequent Quadrants kernel reads, and vice-versa. The view stays valid until the underlying storage is reallocated -- typically on `qd.init()` or `qd.reset()`, after which a fresh call to `to_torch()` / `to_numpy()` returns a new view.
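The aliasing contract described above is the standard DLPack one. It can be illustrated with NumPy alone (a minimal sketch, no Quadrants involved; `np.from_dlpack` plays the consumer's role here):

```python
import numpy as np

a = np.arange(4, dtype=np.float32)

# np.from_dlpack imports through the DLPack protocol without copying,
# so the result aliases a's buffer.
view = np.from_dlpack(a)

a[0] = 42.0  # a write on the producer side...
assert view[0] == 42.0  # ...is visible through the consumer's view

# Both arrays report the same base data pointer.
assert a.__array_interface__["data"][0] == view.__array_interface__["data"][0]
```

The same lifetime caveat applies: the view is only valid while the producer's storage is.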
Contributor

You should mention somewhere in the doc that one should be careful: execution is asynchronous on Apple Metal, because torch and quadrants do not share the same compute "stream" (not sure if that is the correct terminology). Which means that doing:

a_tc = a.to_torch(copy=False)
a_tc.zero_()
my_kernel(a)

does not guarantee that my_kernel will run after the zeroing on Apple Metal, unless torch.mps.synchronize() is called explicitly after a_tc.zero_(). Currently this is not done, on purpose. It is managed manually in Genesis, only after performing all the desired operations on all fields involved. This is more efficient because it keeps filling the compute queue while emptying it, partially masking Python overhead.

Contributor

Ok, I just found the section where you mention the behaviour on Metal. I'm not convinced by the behaviour that was chosen, though.

Comment thread docs/source/user_guide/interop.md Outdated

Zero-copy uses [DLPack](https://github.com/dmlc/dlpack) and requires:

- a backend with DLPack support: `cpu` (`x64`/`arm64`), `cuda`, `amdgpu`, or `metal` (Vulkan is not supported);
Contributor

It is not supported because it is mostly irrelevant for now. Vulkan-backed DLPack tensors cannot be processed by any existing well-known scientific computing library such as torch, numpy, etc.

- a backend with DLPack support: `cpu` (`x64`/`arm64`), `cuda`, `amdgpu`, or `metal` (Vulkan is not supported);
- a DLPack-supported dtype: `i32`, `i64`, `f32`, `f64`, `u1` (other dtypes such as `f16`, `u8`, `u16` fall back to the kernel-copy path);
- on Apple Metal, `torch >= 2.9.2` for fields (required for DLPack `bytes_offset` on MPS; see [pytorch/pytorch#168193](https://github.com/pytorch/pytorch/pull/168193));
- 0-dim `ScalarField` instances are not zero-copyable on any backend (PyTorch DLPack `bytes_offset` limitation);
Contributor

It should be clarified that Quadrants' implementation has been designed with torch as the only first-class user interface in mind; otherwise, citing a torch limitation to justify a Quadrants limitation is odd.

Comment thread docs/source/user_guide/interop.md Outdated
- a DLPack-supported dtype: `i32`, `i64`, `f32`, `f64`, `u1` (other dtypes such as `f16`, `u8`, `u16` fall back to the kernel-copy path);
- on Apple Metal, `torch >= 2.9.2` for fields (required for DLPack `bytes_offset` on MPS; see [pytorch/pytorch#168193](https://github.com/pytorch/pytorch/pull/168193));
- 0-dim `ScalarField` instances are not zero-copyable on any backend (PyTorch DLPack `bytes_offset` limitation);
- `StructField` members are not zero-copyable yet (see [Struct fields](#struct-fields) below); `copy=False` raises on `StructField`.
Contributor

Only StructField with an AOS layout should not be zero-copyable. If the layout is SOA, it should be fine as far as I know.
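The AOS constraint can be seen with a plain NumPy structured array (an illustration only, not the Quadrants internals): a member view must stride by the whole cell size, while contiguous strides at the member size would pick up the neighbouring member's bytes.

```python
import numpy as np

# AOS layout in memory: [a0, b0, a1, b1] (each element 4 bytes).
aos = np.array([(1.0, 2.0), (3.0, 4.0)], dtype=[("a", "f4"), ("b", "f4")])

# A correct member view strides by the 8-byte cell, not the 4-byte member.
a_view = aos["a"]
assert a_view.strides == (8,)
assert a_view.tolist() == [1.0, 3.0]

# Contiguous strides at the member size interleave the other member's bytes.
flat = aos.view("f4").reshape(-1)  # [a0, b0, a1, b1]
wrong = np.lib.stride_tricks.as_strided(flat, shape=(2,), strides=(4,))
assert wrong.tolist() == [1.0, 2.0]  # picked up b0 instead of a1
```

An SOA layout stores each member contiguously, so a plain contiguous view suffices.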

Comment thread docs/source/user_guide/interop.md Outdated

| Value | Behaviour |
|---|---|
| `None` (default) | Zero-copy view if supported, otherwise an independent copy. Never raises for support reasons. |
Contributor

I think it is a bad idea to use None by default. It should rather be False.

In Genesis we started with copy=None by default, before realising it was causing trouble for users who expected a copy: they would do in-place modification on the output, or forget to copy it before storing it in a buffer.

If people care about efficiency, they should know about this copy option. If they don't, then they probably care more about reliability and user-friendliness. And I don't think it is very user-friendly that something converting a tensor to a completely different format from another library returns a shared view instead of a copy. Moreover, I think that having a consistent, simple behaviour by default is just easier to support.
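The failure mode described here is the usual shared-view surprise; in NumPy terms (an analogy, not the Quadrants API):

```python
import numpy as np

buf = np.arange(3)

snapshot = buf            # looks like a snapshot, but shares storage
buf[0] = 99
assert snapshot[0] == 99  # the "snapshot" silently changed

safe = buf.copy()         # explicit copy: independent storage
buf[0] = 7
assert safe[0] == 99      # unaffected by later writes
```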

Collaborator Author

do you mean, it should default to copy=True?

Comment thread docs/source/user_guide/interop.md Outdated

### Apple Metal: synchronisation

On Apple Metal, Quadrants and PyTorch MPS use separate Metal command queues. Quadrants kernel writes are made visible to the MPS-backed view via an automatic `qd.sync()` on every `to_torch()` / `to_numpy()` call. Cloning a view also synchronises MPS so that the clone sees the latest writes:
Contributor

via an automatic `qd.sync()` on every `to_torch()` / `to_numpy()` call

I don't think this is a good idea (see my earlier comment), at least it should be optional.

Collaborator Author

so what do you think should happen?

Comment thread tests/python/test_to_torch_copy.py
Comment thread python/quadrants/lang/field.py Outdated
@github-actions

Coverage Report (98f58f01d)

| File | Coverage | Missing |
|---|---|---|
| 🟢 python/quadrants/lang/_ndarray.py | 81% | 453,468,481 |
| 🔴 python/quadrants/lang/field.py | 44% | 19,115-117,119,122-131,134,137,160,164,167,169,171,177,191,194-196,199-200,207,216,218,221,228,230,233,236-237,240,247-249,251-253,259,384,396,493-494,549,560,579 |
| 🔴 python/quadrants/lang/matrix.py | 66% | 22,1416,1438,1452,1849,1866-1870,1889,1990,2010,2033 |
| 🔴 python/quadrants/lang/struct.py | 50% | 552,566 |
| 🟢 tests/python/test_to_torch_copy.py | 83% | 28,52-55,57-59,147,259-261,280-282,300-302,323-325,344-346,356,372-374,396-398,400-401,412,414-417,419-421,427,429-432,434-437,518-521,523-524 |

Diff coverage: 73% · Overall: 52% · 482 lines, 128 missing

Full annotated report

…DA GPUs

Restrict to arch=[qd.cuda] since qd.cfg.arch.name gives 'amdgpu'/'metal'
which are not valid torch device strings (torch expects 'cuda'/'mps').
@hughperkins hughperkins force-pushed the hp/zerocopy-interop branch from 04f6f20 to 1603ed9 Compare April 30, 2026 11:07
@hughperkins
Collaborator Author

@claude review

Comment thread python/quadrants/lang/field.py
The C++ field_to_dlpack eagerly imports torch to check byte_offset
support. When torch is absent, this raises ModuleNotFoundError on
a numpy-only API. Catch the error gracefully: copy=False raises a
clear ValueError, copy=True falls back to the kernel-copy path.
Document the torch prerequisite for field zero-copy numpy in
interop.md.
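The contract this commit describes can be sketched as a small dispatch helper (hypothetical names and structure; the real logic lives in field.py and the C++ `field_to_dlpack`):

```python
def zerocopy_or_fallback(have_torch: bool, copy):
    """Hypothetical sketch of the error-handling contract above.

    copy=False without torch -> clear ValueError.
    copy=True without torch  -> None, signalling the kernel-copy fallback.
    """
    if not have_torch:
        if copy is False:
            raise ValueError(
                "copy=False zero-copy for fields requires torch; "
                "install torch or use copy=True"
            )
        return None  # caller falls back to the kernel-copy path
    return "zero-copy view"  # stand-in for the real DLPack export


assert zerocopy_or_fallback(True, False) == "zero-copy view"
assert zerocopy_or_fallback(False, True) is None
try:
    zerocopy_or_fallback(False, False)
except ValueError as e:
    assert "copy=False" in str(e)
```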
@github-actions

Coverage Report (28825e2a7)

| File | Coverage | Missing |
|---|---|---|
| 🟢 python/quadrants/lang/_ndarray.py | 81% | 453,468,481 |
| 🔴 python/quadrants/lang/field.py | 44% | 19,115-117,119,122-131,134,137,160,164,167,169,171,177,192,195-197,200-201,208,217,219,222,229,231,234,237-238,241,248-250,252-254,260,385,397,494-495,550,561,580 |
| 🔴 python/quadrants/lang/matrix.py | 66% | 22,1416,1438,1452,1849,1866-1870,1889,1990,2010,2033 |
| 🔴 python/quadrants/lang/struct.py | 50% | 552,566 |
| 🟢 tests/python/test_to_torch_copy.py | 82% | 28,55-61,63-64,152,264-266,285-287,305-307,328-330,349-351,361,377-379,401-403,405-406,417,419-422,424-426,432,434-437,439-442,523-526,528-529 |

Diff coverage: 73% · Overall: 73% · 486 lines, 130 missing

Full annotated report

Wrap single-line docstrings in MatrixNdarray.to_torch (132c) and
VectorNdarray.to_torch (154c) to stay within 120c. Also fix a
122c line in _is_aos_struct_member docstring from previous reflow.
Comment thread tests/python/test_to_torch_copy.py Outdated
Comment thread tests/python/test_to_torch_copy.py
- 0-dim ScalarField copy=False raises (to_torch and to_numpy)
- Unsupported dtype (f16) copy=False raises
- Cross-device transfer with copy=False raises
- MatrixField (2x3, not just vectors) copy=False and copy=True
- MatrixNdarray to_numpy(copy=False)
- AoS struct member to_numpy(copy=False) raises
- copy=True independence checks for vector/matrix fields and ndarrays
Comment thread python/quadrants/lang/_ndarray.py
Comment thread tests/python/test_api.py
Comment thread python/quadrants/lang/field.py Outdated
- Add is_ndarray parameter to _can_zerocopy_field so ndarrays skip
  the Metal torch-version gate (ndarray_to_dlpack uses byte_offset=0)
  and the 0-dim ScalarField carve-out (only applies to fields-in-tree).
- Catch RuntimeError from C++ field_to_dlpack for non-dense SNodes and
  convert to ValueError for consistent error types.
- Tighten pytest.raises(Exception) to ValueError in Vulkan test.
- Skip Metal in tests when torch < 2.9.2 (old DLPack bytes_offset).
- Drop to_dlpack from StructField API listing (base class raises
  NotImplementedError).
@github-actions

Coverage Report (4e2283d4b)

| File | Coverage | Missing |
|---|---|---|
| 🟢 python/quadrants/lang/_ndarray.py | 81% | 453,468,481 |
| 🔴 python/quadrants/lang/field.py | 47% | 19,115-117,119,122-131,134,137,160,167,169,177,192,195-197,200-201,208,217,219,222,229,231,234,237-238,241,248-250,254,260-262,266,268,393,405,502-503,558,569,588 |
| 🔴 python/quadrants/lang/matrix.py | 76% | 22,1416,1438,1452,1849,1869,1889,1992,2012,2035 |
| 🔴 python/quadrants/lang/struct.py | 50% | 552,566 |
| 🟢 tests/python/test_to_torch_copy.py | 86% | 28,55-61,63-64,152,264-266,285-287,305-307,328-330,349-351,361,377-379,401-403,405-406,417,419-422,424-426,432,434-437,439-442,523-526,528-529,605-608,610-611,685-687 |

Diff coverage: 79% · Overall: 74% · 634 lines, 135 missing

Full annotated report

Comment thread python/quadrants/lang/field.py
@github-actions

Coverage Report (1955eeada)

| File | Coverage | Missing |
|---|---|---|
| 🟢 python/quadrants/lang/_ndarray.py | 81% | 453,468,481 |
| 🔴 python/quadrants/lang/field.py | 45% | 19,115-117,119,122-131,134,137,160,167,170,178,193-194,196,199-201,204-205,212,221,223,226,233,235,238,241-242,245,252-254,258,264-266,270-274,276,401,413,510-511,566,577,596 |
| 🔴 python/quadrants/lang/matrix.py | 76% | 22,1416,1438,1452,1849,1869,1889,1992,2012,2035 |
| 🔴 python/quadrants/lang/struct.py | 50% | 552,566 |
| 🟢 tests/python/test_to_torch_copy.py | 85% | 28,30,32-33,60-66,68-69,157,269-271,290-292,310-312,333-335,354-356,366,382-384,406-408,410-411,422,424-427,429-431,437,439-442,444-447,528-531,533-534,610-613,615-616,690-692 |

Diff coverage: 78% · Overall: 64% · 646 lines, 144 missing

Full annotated report

- Fix ruff I001 import sorting in test_to_torch_copy.py
- Restore to_dlpack in StructField API listing (inherited from Field
  base class, so it appears in the actual API even though it raises
  NotImplementedError)
- Restore deleted TODO comments on runtime_ops.sync() calls
NumPy 2.x flags arrays imported from DLPack v0 PyCapsules as
non-writable. Update interop.md to clarify that to_numpy(copy=False)
returns a read-only view and direct users to to_torch(copy=False)
for writable zero-copy views. Replace the numpy mutation example
with a read-only usage example.
NumPy >= 2.0 marks arrays from v0 capsules as read-only. A future
PR should upgrade the C++ to emit DLManagedTensorVersioned (v1)
with flags=0 for the numpy path.
@hughperkins
Collaborator Author

@claude review


@claude claude Bot left a comment


LGTM — duburcqa approved and prior review feedback addressed; only minor nits remain.

Extended reasoning...

Overview

This PR adds zero-copy DLPack interop to to_torch()/to_numpy() across ScalarField, MatrixField, all Ndarray variants, and StructField (per-member). Adds a copy= keyword: copy=True (default) keeps the kernel-copy path; copy=False returns a DLPack-aliased view or raises. New interop.md docs explain the support matrix, lifetime caveats, and Metal sync semantics.

Security risks

None. This is an in-process Python/C++ data-conversion API on already-trusted runtime memory. No auth, crypto, or untrusted-input boundaries.

Level of scrutiny

Public API change touching widely-used types — normally warrants close human review. However: duburcqa (active maintainer) has explicitly approved with 'ok to merge'; my prior review history on this PR has flagged 12+ issues that the author addressed with code or doc fixes; the two bugs found in this run are explicitly nit-level (the dtype=qd.f32 ValueError affects no real callers given the documented contract, and the test-skip helper only affects local Metal+stable-torch dev which isn't a CI configuration).

Other factors

The PR has had multiple coverage iterations (final ~73-78% on changed lines), the test file for the new feature is comprehensive (covers AOS-vs-SOA struct layouts, 0-dim scalars, unsupported dtypes, cross-device transfer rejection, Metal sync, MatrixField shape preservation under keep_dims, copy=True independence). The remaining nits are noted inline.

Comment on lines +576 to +581
if copy is False:
    arr = _try_zerocopy_numpy(self, copy=False, is_scalar=True)
    if arr is not None:
        if dtype is not None and arr.dtype != dtype:
            raise ValueError(f"copy=False is incompatible with dtype conversion ({arr.dtype} -> {dtype})")
        return arr

🟡 Pre-existing nit / API inconsistency. Field.to_numpy(copy=False, dtype=qd.f32) (and the same in MatrixField.to_numpy, Ndarray.to_numpy, MatrixNdarray.to_numpy, VectorNdarray.to_numpy) raises ValueError("copy=False is incompatible with dtype conversion (float32 -> qd.f32)") even though the underlying dtype IS f32. The comparison arr.dtype != dtype (field.py:579) compares a numpy.dtype to a quadrants DataTypeCxx, which evaluates True due to type mismatch. The base class type hint advertises dtype: DataTypeCxx | None = None (field.py:404) while the override docstring says "Optional target numpy dtype" — the type hint is the misleading half. Fix: normalize via to_numpy_type() if dtype is a DataTypeCxx before the comparison. Severity: nit — no real callers pass quadrants dtypes, the kernel-copy path also fails on dtype=qd.f32 (with TypeError from numpy refusing the unknown dtype), and the docstring is the more authoritative half of the contract.

Extended reasoning...

What the bug is

In Field.to_numpy(copy=False, dtype=...) at field.py:577-581 (and four other identical sites: matrix.py:1437-1438, matrix.py:1868-1869, matrix.py:2011-2012, _ndarray.py:467-468):

arr = _try_zerocopy_numpy(self, copy=False, is_scalar=True)
if arr is not None:
    if dtype is not None and arr.dtype != dtype:
        raise ValueError(f"copy=False is incompatible with dtype conversion ({arr.dtype} -> {dtype})")
    return arr

The arr.dtype != dtype comparison compares arr.dtype (a numpy.dtype like np.float32) to whatever the caller passed for dtype. If the caller follows the base class type hint at field.py:404 — dtype: DataTypeCxx | None = None — and passes dtype=qd.f32 (which is quadrants_python.DataType_f32, a pybind11 wrapper of DataTypeCxx), the inequality evaluates True because numpy can't coerce the pybind11 enum and falls back to identity-style comparison. Result: spurious ValueError("copy=False is incompatible with dtype conversion (float32 -> qd.f32)") even though the dtypes ARE compatible.

Step-by-step proof

  1. User reads the base class signature Field.to_numpy(self, dtype: DataTypeCxx | None = None, *, copy=True) and writes f.to_numpy(copy=False, dtype=qd.f32) for a qd.f32 field.
  2. Field.to_numpy (field.py:566) takes the copy is False branch.
  3. _try_zerocopy_numpy(self, copy=False, is_scalar=True) returns a working np.ndarray with arr.dtype == np.dtype('float32').
  4. Line 579: dtype is not None is True, and arr.dtype != dtype evaluates np.dtype('float32') != quadrants_python.DataType_f32. Numpy's __ne__ returns NotImplemented for the pybind11 enum, Python falls back to identity comparison, which yields True.
  5. Line 580 raises ValueError("copy=False is incompatible with dtype conversion (float32 -> qd.f32)").
  6. The user sees a confusing message claiming a dtype conversion is needed when the underlying dtype already matches.

Why existing code does not prevent it

The override docstring at field.py:569 says "Optional target numpy dtype", which would steer most users toward np.float32. But the abstract base class signature (field.py:404) explicitly types it as DataTypeCxx | None, and IDEs / help() / Sphinx surfaces show this hint to anyone introspecting the API. There is no normalization step on the input — neither isinstance(dtype, DataTypeCxx) check nor to_numpy_type() conversion before the comparison.

Why this is nit severity (and not blocking)

Four mitigations push this down from "real bug" to "API inconsistency":

  1. Docstring overrides type hint. The override at ScalarField.to_numpy (and the other 4 sites) explicitly documents dtype as a numpy dtype. Users reading the docs pass np.float32 and never trip the check.
  2. Zero existing callers pass a DataTypeCxx. Grep across tests/python/ confirms every to_numpy(dtype=...) callsite passes a numpy dtype (np.float32, np.float64). No production code in the repo follows the type hint over the docstring.
  3. The kernel-copy path is no better. Looking at the copy=True (default) path at field.py:587-595:
    if dtype is None:
        dtype = to_numpy_type(self.dtype)
    arr = np.zeros(self.shape, dtype=dtype)
    If a user passes dtype=qd.f32 here, np.zeros(shape, dtype=qd.f32) raises TypeError: data type 'qd.f32' not understood because numpy can't convert the pybind11 enum. So the kernel-copy path also fails on dtype=qd.f32, just with a different error type. This isn't a regression introduced by this PR; the API has always rejected DataTypeCxx in practice.
  4. Failure mode is loud, not silent. Clean ValueError with a (slightly misleading) message — not silent data corruption. Workaround is trivial: pass np.float32 instead.

Addressing the refutation

The refuting verifier's points all stand:

  • The docstring is the actual contract — agreed.
  • All existing tests pass numpy dtypes — confirmed.
  • The kernel-copy path also rejects DataTypeCxx — confirmed (different error, same outcome).
  • The submitter labels this nit — agreed.

Where the refuter and I differ: they say "not a bug worth flagging", I (and 4 other verifiers) say it's worth a nit-level note because the type hint genuinely misleads, and the fix is small (~2 lines: add if isinstance(dtype, DataTypeCxx): dtype = to_numpy_type(dtype) before the comparison). This is "API hygiene" rather than "blocker", which is exactly what nit severity is for.

Recommended fix

Normalize the user-provided dtype before comparing, in all 5 sites:

if dtype is not None:
    if isinstance(dtype, _qd_core.DataTypeCxx):
        dtype = to_numpy_type(dtype)
    if arr.dtype != dtype:
        raise ValueError(...)

This makes dtype=qd.f32 and dtype=np.float32 both behave correctly when they match the field's native dtype, and either one raises cleanly when they don't. The docstring stays "numpy dtype" but the type-hint contract is also honored.
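The comparison pitfall is easy to reproduce with any object NumPy cannot interpret as a dtype (`FakeDataType` below is a labelled stand-in for the pybind11 enum, not the real class):

```python
import numpy as np


class FakeDataType:
    """Stand-in for a pybind11 dtype enum that numpy cannot coerce."""


arr = np.zeros(3, dtype=np.float32)
qd_f32 = FakeDataType()

# numpy's dtype equality returns False for objects it cannot interpret,
# so `!=` evaluates True even though the dtypes "match" conceptually.
assert arr.dtype != qd_f32
assert arr.dtype == np.float32
```

Normalizing the incoming dtype first, as the fix above does, removes this asymmetry.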

Collaborator Author

self-classified by ai as nit

Comment on lines +22 to +35
_NO_ZEROCOPY_ARCHS = {qd.vulkan}


def _skip_if_no_zerocopy():
    """Skip the current test if the active backend doesn't support DLPack zero-copy."""
    if qd.cfg.arch in _NO_ZEROCOPY_ARCHS:
        pytest.skip(f"DLPack zero-copy not supported on {qd.cfg.arch.name}")
    if qd.cfg.arch == qd.metal:
        from quadrants.lang.field import (
            _TORCH_MPS_SUPPORTS_DLPACK_BYTES_OFFSET,  # pylint: disable=C0415
        )

        if not _TORCH_MPS_SUPPORTS_DLPACK_BYTES_OFFSET:
            pytest.skip("Metal DLPack zero-copy requires torch >= 2.9.2")

🟡 The test helper _skip_if_no_zerocopy() (test_to_torch_copy.py:22-35) skips all tests when running on Apple Metal with torch ≤ 2.9.1, but production _can_zerocopy_field (field.py:168-170) gates the Metal version check on if not is_ndarray: — the C++ ndarray_to_dlpack hardcodes byte_offset = 0 and never calls torch_supports_byte_offset(). As a result, ndarray-only tests like test_scalar_ndarray_copy_false, test_matrix_ndarray_copy_false, test_vector_ndarray_copy_false (and the to_numpy variants) get spuriously skipped on Metal+old-torch even though they would pass. CI is unaffected (uses torch nightly per macosx/4_test.sh) — only local Metal dev with stable torch loses this coverage. Fix: thread an is_ndarray=False flag through the helper and only enforce the Metal version gate when it's a field.

Extended reasoning...

What the bug is

_skip_if_no_zerocopy() in tests/python/test_to_torch_copy.py:22-35 is the gate every copy=False test calls before exercising the new zero-copy API:

def _skip_if_no_zerocopy():
    if qd.cfg.arch in _NO_ZEROCOPY_ARCHS:
        pytest.skip(...)
    if qd.cfg.arch == qd.metal:
        from quadrants.lang.field import _TORCH_MPS_SUPPORTS_DLPACK_BYTES_OFFSET
        if not _TORCH_MPS_SUPPORTS_DLPACK_BYTES_OFFSET:
            pytest.skip("Metal DLPack zero-copy requires torch >= 2.9.2")

The Metal-version branch is uniform across all tests, so the helper fires for ndarray tests and field tests alike. Production code at field.py:160-176 (_can_zerocopy_field) is explicitly not uniform:

if not is_ndarray:
    if arch == _ARCH_METAL and not _TORCH_MPS_SUPPORTS_DLPACK_BYTES_OFFSET:
        return False
    ...

The Metal version gate is field-only — is_ndarray=True callsites (Ndarray.to_torch, MatrixNdarray.to_torch, VectorNdarray.to_torch, and the to_numpy counterparts) skip the version check entirely.

Why the production carve-out is correct

The C++ has two separate DLPack export paths:

  • field_to_dlpack (dlpack_funcs.cpp:226): calls torch_supports_byte_offset() because field-in-tree offsets need non-zero byte_offset support, which torch added in 2.9.2 (Honor dlpack byte_offset pytorch/pytorch#168193).
  • ndarray_to_dlpack (dlpack_funcs.cpp:424): hardcodes dl_tensor.byte_offset = 0. Always works regardless of torch version on MPS — the v0 capsule with zero offset has been supported by every torch DLPack importer that supports MPS at all.

So ndarray zero-copy on Metal works on torch 2.9.1, 2.9.0, etc. The production code knows this; the test helper does not.

Step-by-step proof

  1. Local dev box: macOS Apple Silicon, torch == 2.9.1 (stable), qd.init(arch=qd.metal).
  2. _TORCH_MPS_SUPPORTS_DLPACK_BYTES_OFFSET is computed at import time as tuple(map(int, ["2","9","1"])) > (2, 9, 1) → False.
  3. User runs pytest tests/python/test_to_torch_copy.py::test_scalar_ndarray_copy_false.
  4. The test calls _skip_if_no_zerocopy() at line ~155.
  5. The helper hits the metal-version branch at lines 30-35 and calls pytest.skip("Metal DLPack zero-copy requires torch >= 2.9.2").
  6. The test never runs, even though nd.to_torch(copy=False) would have succeeded:
    • Ndarray.to_torch (_ndarray.py:481) takes the copy is False branch and calls _try_zerocopy_torch(self, copy=False, device=None, is_ndarray=True).
    • _try_zerocopy_torch calls _can_zerocopy_field(self, is_scalar=False, is_ndarray=True).
    • _can_zerocopy_field at field.py:168 hits if not is_ndarray: → False → skips the Metal version check entirely.
    • Returns True; DLPack export proceeds normally and the test would have passed.

The same skip-when-should-run trace fires for test_matrix_ndarray_copy_false, test_vector_ndarray_copy_false, test_scalar_ndarray_to_numpy_copy_false, test_vector_ndarray_to_numpy_copy_false, and test_matrix_ndarray_to_numpy_copy_false — every ndarray-flavored copy=False test the PR introduces.
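The import-time gate from step 2 amounts to a three-component version comparison; a hypothetical standalone version (function name assumed, simple version strings only):

```python
def mps_supports_dlpack_bytes_offset(torch_version: str) -> bool:
    # Hypothetical re-implementation of the gate described above:
    # drop any local suffix ("2.9.1+cu121"), then compare the numeric triple.
    base = torch_version.split("+")[0]
    triple = tuple(int(p) for p in base.split(".")[:3])
    return triple > (2, 9, 1)


assert mps_supports_dlpack_bytes_offset("2.9.2") is True
assert mps_supports_dlpack_bytes_offset("2.9.1") is False
assert mps_supports_dlpack_bytes_offset("2.10.0") is True
assert mps_supports_dlpack_bytes_offset("2.9.1+cu121") is False
```

Tuple comparison is lexicographic, so "2.10.0" correctly beats "2.9.1" where naive string comparison would not.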

Why CI does not catch it

.github/workflows/scripts_new/macosx/4_test.sh has an explicit comment about installing pre-release torch nightly ≥ 2.9.2 specifically because of this PR; on CI _TORCH_MPS_SUPPORTS_DLPACK_BYTES_OFFSET evaluates True and the helper does not skip. So the bug only fires for local Metal dev on stable torch — a niche but real configuration.

Impact

  • Production: zero. The production code is correct.
  • Test coverage: lost on local Metal dev with stable torch. Five ndarray tests skip when they should run, hiding any regression in the is_ndarray=True zero-copy path on that configuration.
  • Failure mode: skip rather than fail. Not silent corruption, not test failure — just absent coverage.

This is a regression of the original review's intent: the is_ndarray flag was added to production specifically so ndarrays would not be subject to the field-only Metal version gate, and the test helper missed the same distinction.
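The production dispatch in steps 6a-6c reduces to a small decision function. A minimal sketch (function and parameter names are hypothetical; the branch structure mirrors the `_can_zerocopy_field` trace above, where the Metal torch-version gate applies only when `is_ndarray` is False):

```python
def can_zerocopy(arch: str, is_ndarray: bool, torch_supports_bytes_offset: bool) -> bool:
    """Simplified model of the production zero-copy eligibility check."""
    if arch == "metal" and not is_ndarray:
        # Field exports on Metal need the DLPack bytes_offset fix (torch >= 2.9.2).
        return torch_supports_bytes_offset
    # Ndarrays (and non-Metal archs) are not subject to the version gate.
    return True

# Ndarray on Metal with stable torch 2.9.1: production allows zero-copy,
# which is exactly the case the test helper wrongly skips.
assert can_zerocopy("metal", is_ndarray=True, torch_supports_bytes_offset=False)
assert not can_zerocopy("metal", is_ndarray=False, torch_supports_bytes_offset=False)
```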

Suggested fix

Parametrize the helper with is_ndarray=False and skip the Metal version gate when called with is_ndarray=True:

def _skip_if_no_zerocopy(*, is_ndarray=False):
    if qd.cfg.arch in _NO_ZEROCOPY_ARCHS:
        pytest.skip(f"DLPack zero-copy not supported on {qd.cfg.arch.name}")
    # Only fields are gated on the torch version; ndarrays zero-copy on any torch.
    if qd.cfg.arch == qd.metal and not is_ndarray:
        from quadrants.lang.field import _TORCH_MPS_SUPPORTS_DLPACK_BYTES_OFFSET
        if not _TORCH_MPS_SUPPORTS_DLPACK_BYTES_OFFSET:
            pytest.skip("Metal DLPack zero-copy for fields requires torch >= 2.9.2")

Then the ndarray tests pass is_ndarray=True, mirroring the production helper's contract.

Collaborator Author


nit

@github-actions

Coverage Report (37ebf7ff2)

| File | Coverage | Missing |
| --- | --- | --- |
| 🟢 python/quadrants/lang/_ndarray.py | 81% | 453,468,481 |
| 🔴 python/quadrants/lang/field.py | 53% | 19,115-117,119,122-131,134,137,160,167,170,178,193-194,196,212,221,223,226,236,238,241,244-245,248,257,261,267-269,273-277,279,404,416,513-514,569,580,599 |
| 🔴 python/quadrants/lang/matrix.py | 76% | 22,1416,1438,1452,1849,1869,1889,1992,2012,2035 |
| 🔴 python/quadrants/lang/struct.py | 50% | 552,566 |
| 🟢 tests/python/test_to_torch_copy.py | 90% | 28,30,34-35,65,271-273,292-294,312-314,335-337,356-358,368,384-386,408-410,412-413,424,426-429,431-433,439,441-444,446-449,692-694 |

Diff coverage: 82% · Overall: 74% · 648 lines, 116 missing

Full annotated report

@hughperkins hughperkins merged commit 7f4632b into main Apr 30, 2026
54 checks passed
@hughperkins hughperkins deleted the hp/zerocopy-interop branch April 30, 2026 17:08
