[Perf] Enable zero-copy in to_torch() and to_numpy()#450
Conversation
Force-pushed from a7cd921 to c0b3bdc
> Provides cached, zero-copy conversion of Fields and Ndarrays to PyTorch tensors using the DLPack protocol.
> The returned tensors directly alias the Quadrants device memory -- modifications to either side are visible
> in the other (after ``qd.sync()`` on Apple Metal).
"(after ``qd.sync()`` on Apple Metal)" is inaccurate: ``torch.mps.synchronize()`` is also required in the copy case.
> @functools.lru_cache(maxsize=1)
LRU cache is slow. It should be profiled, but I would not be surprised if systematic recomputation is faster. I would recommend defining a global constant instead of a function.
> Args:
>     is_field: ``True`` for SNode-backed Fields, ``False`` for Ndarrays.
>     dtype: The Quadrants dtype (e.g. ``qd.f32``). Types not in the C++ DLPack whitelist (f16, u8, ...) return False.
> parts = torch.__version__.replace("+", ".").split(".")[:3]
> return tuple(map(int, parts)) > (2, 9, 1)
If you are already relying on the ``packaging`` package somewhere else, using it here would be more robust. If not, just keep this.
Opus here.
packaging isn't currently a runtime dependency of Quadrants — it only appears in three CI workflow YAMLs (win.yml, manylinux_wheel.yml, macosx.yml), not in pyproject.toml, setup.py, or requirements*.txt. Per the second branch of your suggestion ("if not, just keep this"), I'll leave the ad-hoc parser in place. It handles +cpu/+cu121 local-version suffixes via replace("+", "."), and falls through to False via except ValueError on dev/rc/nightly tags (e.g. 2.10.0a0) — which is the safe default since it just gates off the Metal-DLPack-bytes-offset optimisation rather than crashing.
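For reference, the ad-hoc parser under discussion behaves like this (a standalone sketch of the same logic, not the PR's exact function):

```python
def torch_version_gt_2_9_1(version: str) -> bool:
    """Mirror of the PR's ad-hoc check: local-version suffixes
    (+cpu, +cu121) are split off via replace("+", "."), and
    pre-release/dev tags fall through to the safe default False."""
    parts = version.replace("+", ".").split(".")[:3]
    try:
        return tuple(map(int, parts)) > (2, 9, 1)
    except ValueError:
        return False

assert torch_version_gt_2_9_1("2.10.0+cu121")
assert not torch_version_gt_2_9_1("2.9.1+cpu")
assert not torch_version_gt_2_9_1("2.10.0a0")  # dev tag: gated off, no crash
```

False here only disables the Metal bytes-offset optimisation, so erring on the conservative side is harmless.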
> def dlpack_to_torch(obj):
>     """Return a cached zero-copy ``torch.Tensor`` view of a Field or Ndarray via DLPack.
>
>     On the first call the DLPack capsule is created and wrapped into a ``torch.Tensor`` which is then stored on the
>     source object as ``_qd_dlpack_tc``. Subsequent calls return the cached tensor directly (O(1)).
>
>     On Apple Metal an explicit ``sync()`` is performed after the initial export to ensure data visibility.
>     """
>     try:
>         return obj._qd_dlpack_tc
>     except AttributeError:
>         import torch  # pylint: disable=C0415
>
>         tc = torch.from_dlpack(obj.to_dlpack())
>         obj._qd_dlpack_tc = tc
>         if impl.current_cfg().arch == _ARCH_METAL:
>             impl.get_runtime().sync()
>         return tc
> def invalidate_zerocopy_cache(obj) -> None:
>     """Remove the cached DLPack torch tensor, if any."""
>     try:
>         del obj._qd_dlpack_tc
>     except AttributeError:
>         pass
This caching mechanism is missing transpose. This is important for efficiency.
This caching mechanism is missing numpy view. This is important for efficiency.
NumPy>=1.22 (genesis requires >=1.26) supports np.from_dlpack() natively. Using it would skip importing torch and the torch tensor intermediary for numpy-only workloads. Rather than caching a .numpy() derived from a torch tensor, a np.from_dlpack() view could be cached independently. It also avoids the torch import when the user never uses torch at all.
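A minimal illustration of ``np.from_dlpack`` producing a zero-copy view with no torch involvement; a plain ``numpy`` array stands in here for the Quadrants source object:

```python
import numpy as np

src = np.arange(6, dtype=np.float32)
# Zero-copy: `view` shares src's buffer via the DLPack protocol
# (np.ndarray implements __dlpack__ / __dlpack_device__ natively).
view = np.from_dlpack(src)

src[0] = 42.0
assert view[0] == 42.0  # writes through src are visible in the view
```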
> from quadrants.lang._interop import (  # pylint: disable=C0415
>     can_zerocopy,
>     dlpack_to_torch,
> )
I don't see any good reason to keep this import local. No obvious circular deps or direct dependency to torch. Maybe I'm missing something.
> Uses zero-copy when possible (all backends except Vulkan, and when the dtype is supported by DLPack).
>
> Args:
>     copy: ``None``/``False`` return a zero-copy view, ``True`` returns an independent copy.
That is not what None does.
> if copy is False:
>     raise ValueError("Zero-copy not available for this backend/dtype combination")
>
> import torch  # pylint: disable=C0415
Personally I prefer to move local imports to the top of their local scope.
Complicated change. Won't do.
> try:
>     tc = dlpack_to_torch(self)
>     if tc.device.type == "cpu":
>         return tc.numpy()
This is very inefficient. Not acceptable.
>     if tc.device.type == "cpu":
>         return tc.numpy()
> except (ImportError, RuntimeError):
>     pass
I'm not in favour of using try-except if the condition can be checked. What about storing a flag directly at instance level to keep track of whether zero-copy is supported by this tensor?
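The flag-on-the-instance idea might look like this (a hypothetical minimal class; in the real code ``_supports_zerocopy`` would be computed once from ``can_zerocopy(...)`` at construction):

```python
class FakeNdarray:
    """Hypothetical sketch: decide zero-copy support once, up front,
    instead of probing with try/except on every to_numpy() call."""

    def __init__(self, supports_zerocopy):
        self._supports_zerocopy = supports_zerocopy
        self._buf = [1.0, 2.0, 3.0]

    def to_numpy(self):
        if self._supports_zerocopy:
            return self._buf        # stand-in for the zero-copy DLPack view
        return list(self._buf)      # stand-in for the kernel-copy fallback

a = FakeNdarray(True)
assert a.to_numpy() is a._buf       # aliasing view, no exception machinery
b = FakeNdarray(False)
assert b.to_numpy() == b._buf and b.to_numpy() is not b._buf
```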
>     if copy is False:
>         raise ValueError("Zero-copy to numpy requires a CPU backend")
>     elif copy is False:
>         raise ValueError("Zero-copy not available for this backend/type combination")
It would be nice to be more specific.
> as_vector = self.m == 1 and not keep_dims
> shape_ext = (self.n,) if as_vector else (self.n, self.m)
> expected = self.shape + shape_ext

> as_vector = self.m == 1 and not keep_dims
> shape_ext = (self.n,) if as_vector else (self.n, self.m)
> expected = self.shape + shape_ext
>     dlpack_to_torch,
> )
>
> if copy is not True and can_zerocopy(is_field=True, dtype=self.dtype, shape=self.shape):
This condition is wrong. You should prefer zero-copy even if a copy is requested. This is orthogonal. Zero-copy is always faster even if you need to copy in the end.
You should use zerocopy systematically, and call torch.Tensor.numpy on cpu backend.
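The control flow being asked for, sketched with hypothetical stand-ins (``dlpack_view`` for the DLPack export, ``kernel_copy`` for the kernel fallback):

```python
def to_torch(field, copy=None):
    # Zero-copy is taken whenever available -- even for copy=True --
    # because DLPack export + clone is cheaper than the kernel-copy path.
    if field.can_zerocopy:
        view = field.dlpack_view()
        return view.copy() if copy else view
    if copy is False:
        raise ValueError("Zero-copy not available for this backend/dtype")
    return field.kernel_copy()

class FakeField:
    """Hypothetical stand-in for a Field/Ndarray."""
    can_zerocopy = True
    def __init__(self):
        self._buf = [1, 2, 3]
    def dlpack_view(self):
        return self._buf            # aliases the field's storage
    def kernel_copy(self):
        return list(self._buf)      # slow fallback, never hit here

f = FakeField()
assert to_torch(f) is f._buf                 # copy=None -> aliasing view
assert to_torch(f, copy=True) is not f._buf  # clone of the view, not kernel path
```

Note how ``copy`` only decides whether a clone follows; it never gates the zero-copy export itself.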
> if copy is not True and can_zerocopy(is_field=False, dtype=self.dtype):
>     tc = dlpack_to_torch(self)
>     return tc.clone() if copy is True else tc
``if copy is True`` is a dead branch. But it should not be: you should just not guard with ``copy is not True``.
>     dlpack_to_torch,
> )
>
> if copy is not True and can_zerocopy(is_field=False, dtype=self.dtype):
can_zerocopy(is_field=False, dtype=self.dtype) should be stored as a constant attribute of the instance for efficiency.
> return tc
>
> def invalidate_zerocopy_cache(obj) -> None:
What about cache invalidation for ``ScalarField`` and ``MatrixField``? They have no corresponding ``_reset`` hook.
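One way such a hook could be wired up, sketched as a register-with-the-runtime pattern (names like ``cache_holders`` and ``_invalidate_zerocopy_cache`` are assumptions here):

```python
import weakref

class Runtime:
    """Hypothetical sketch: holders register in a WeakSet; reset()
    invalidates every cache BEFORE native teardown so DLPack deleters
    run while the underlying memory is still valid."""
    def __init__(self):
        self.cache_holders = weakref.WeakSet()
    def reset(self):
        for holder in self.cache_holders:
            holder._invalidate_zerocopy_cache()

class FieldLike:
    def __init__(self, runtime):
        self._cache = object()   # stand-in for the cached DLPack tensor
        runtime.cache_holders.add(self)
    def _invalidate_zerocopy_cache(self):
        self._cache = None

rt = Runtime()
f = FieldLike(rt)
rt.reset()
assert f._cache is None
```

The WeakSet keeps the runtime from pinning dead fields alive; garbage-collected holders simply drop out of the set.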
>     Dict[str, Union[numpy.ndarray, Dict]]: The result NumPy array.
>     """
>     return {k: v.to_numpy() for k, v in self._items}
> # Struct children have interleaved SNode memory; DLPack cannot represent this, so always use kernel copy.
Struct members are separate SNode leaves, each with contiguous storage; individual members could be zero-copied via DLPack, from what I understand.
> tc = dlpack_to_torch(self)
> return tc.clone() if copy is True else tc
You are missing a call to ``torch.mps.synchronize()`` before the ``clone`` on the Metal backend.
>     raise ValueError(
>         f"copy=False is incompatible with device transfer (data on {tc.device}, requested {device})"
>     )
> return tc.to(device)
Missing a call to ``torch.mps.synchronize()`` on the Metal backend.
@duburcqa conversation with Opus about this. Thoughts?

User: this code was migrated from Genesis, by an AI. Can you compare this code with what is currently in Genesis, what is currently in Quadrants main, and figure out why the code isn't simply automatically correct, from being ported from Genesis?

Assistant: Why it isn't automatically correct: three structural reasons, in order of importance.

1. The original lives in the consumer, not the library.
2. The AI re-decided design choices instead of transcribing them. Genesis's design has clear invariants that are easy to miss when refactoring: "always try zerocopy first, then clone if requested" (not "guard zerocopy behind copy preference"); "sync MPS twice -- once before reading, once after cloning"; "cache transpose and numpy variants too". The AI re-derived the algorithm from intent and got the boundaries wrong.
3. The Quadrants API is a different shape than the Genesis wrapper. Genesis exposes one entry point, so it gets one cache, one set of conditions, one sync sequence. Quadrants has four to six entry points.

A faithful port would have:
The current PR is roughly an interpretation of Genesis rather than a port, and most of duburcqa's review is asking for the original Genesis behaviour back.

User: it sounds like you are recommending we do the actions under 'a faithful port', is that a fair impression?

Assistant: Yes, that's a fair impression -- but with one caveat I should make explicit. I was describing what a faithful port would have looked like, and that aligns closely with what duburcqa is asking for in the review. So if you accept the reviewer's framing, then yes, "a faithful port" is roughly the recommended path.

The caveat: there's one place where Genesis itself may not be the right target. The use-after-free risk on

Two other things worth flagging before you commit to that direction:
So my honest recommendation is: yes, do the faithful-port refactor as the structural backbone (one central helper, full cache shape, both syncs, correct
Coverage Report
| Metric | Value |
|---|---|
| Diff coverage (changed lines only) | 78% |
| Overall project coverage | 20% |
Total: 572 lines, 128 missing, 78% covered
🔴 python/quadrants/lang/_interop.py (47%)
1 """Zero-copy tensor interop between Quadrants and PyTorch / NumPy via DLPack.
2
3 Provides cached, zero-copy conversion of Fields and Ndarrays to ``torch.Tensor`` and ``numpy.ndarray`` using DLPack.
4 Returned tensors / arrays alias Quadrants device memory directly -- modifications on either side are visible on the
5 other (after the appropriate sync; see :class:`_ZerocopyCache`).
6
7 Public API used by :mod:`quadrants.lang._ndarray`, :mod:`quadrants.lang.field`, :mod:`quadrants.lang.matrix`, and
8 :mod:`quadrants.lang.struct`:
9
10 * :func:`can_zerocopy` -- predicate (call once, cache result on the source instance).
11 * :class:`_ZerocopyCache` -- per-instance container for the torch + numpy DLPack views.
12 * :func:`make_zerocopy_cache_if_supported` -- constructor that registers the owner with the runtime so the cache is
13 invalidated on ``qd.reset()`` / ``qd.init()`` BEFORE C++ teardown.
14 * :func:`get_zerocopy_torch`, :func:`get_zerocopy_numpy` -- thin entry points used by every per-class ``to_torch`` /
15 ``to_numpy``. Implement the always-zerocopy-then-clone semantic and the Apple Metal double-sync (``qd.sync()`` on
16 read, ``torch.mps.synchronize()`` after ``.clone()`` / ``.to()``).
17 """
18
🔴 19 from __future__ import annotations
20
🔴 21 from typing import TYPE_CHECKING
22
🔴 23 import numpy as np
24
🔴 25 from quadrants._lib import core as _qd_core
🔴 26 from quadrants.lang import impl
🔴 27 from quadrants.types import primitive_types
28
29 if TYPE_CHECKING:
30 import torch as _torch_mod # for type hints only
31
32 # Optional torch import: numpy zero-copy works without torch (np.from_dlpack), so importing this module must not fail
33 # on torch-less environments (e.g. the Vulkan CI runner).
🔴 34 try:
🔴 35 import torch as _torch
🔴 36 from torch.utils.dlpack import from_dlpack as _torch_from_dlpack
37
🔴 38 _HAS_TORCH = True
🔴 39 except ImportError:
🔴 40 _torch = None # type: ignore[assignment]
🔴 41 _torch_from_dlpack = None # type: ignore[assignment]
🔴 42 _HAS_TORCH = False
43
44
🔴 45 _ARCH_METAL = _qd_core.Arch.metal
🔴 46 _ARCH_VULKAN = _qd_core.Arch.vulkan
🔴 47 _ARCH_CPU = frozenset({_qd_core.Arch.x64, _qd_core.Arch.arm64})
48
🔴 49 _DLPACK_SUPPORTED_DTYPES = frozenset(
50 {primitive_types.i32, primitive_types.i64, primitive_types.f32, primitive_types.f64, primitive_types.u1}
51 )
52
53
🔴 54 def _compute_torch_mps_supports_dlpack_bytes_offset() -> bool:
55 """Required for zero-copy export of Fields on Apple Metal.
56
57 Available since torch > 2.9.1 (see pytorch/pytorch#168193).
58 """
🔴 59 if not _HAS_TORCH:
🔴 60 return False
🔴 61 parts = _torch.__version__.replace("+", ".").split(".")[:3]
🔴 62 try:
🔴 63 return tuple(map(int, parts)) > (2, 9, 1)
🔴 64 except ValueError:
🔴 65 return False
66
67
68 # Evaluate once at import. Reviewer feedback on PR #450: lru_cache(maxsize=1) on a zero-arg helper adds per-call
69 # overhead with no benefit; a module constant is cheaper and clearer.
🔴 70 _TORCH_MPS_SUPPORTS_DLPACK_BYTES_OFFSET = _compute_torch_mps_supports_dlpack_bytes_offset()
71
72
🔴 73 def current_arch_is_cpu() -> bool:
74 """``True`` when the active Quadrants backend is a CPU backend (x64 or arm64)."""
🟢 75 return impl.current_cfg().arch in _ARCH_CPU
76
77
🔴 78 def can_zerocopy(
79 is_field: bool,
80 dtype=None,
81 is_scalar_field: bool = False,
82 shape: tuple[int, ...] = (),
83 is_aos_struct_member: bool = False,
84 ) -> bool:
85 """Check whether zero-copy DLPack export is available for the current backend and data type.
86
87 This is intended to be called **once** at the source instance's construction (or first access) and cached on the
88 instance, not re-evaluated on every ``to_torch`` / ``to_numpy`` call.
89
90 Args:
91 is_field: ``True`` for SNode-backed Fields, ``False`` for Ndarrays.
92 dtype: The Quadrants dtype (e.g. ``qd.f32``). Types not in the C++ DLPack whitelist (f16, u8, ...) return False.
93 is_scalar_field: ``True`` when the source is a ``ScalarField`` (0-dim DLPack edge-case).
94 shape: Batch shape of the field/ndarray.
95 is_aos_struct_member: ``True`` when the field is a member of a multi-member ``StructField``. Quadrants' C++
96 ``field_to_dlpack`` does not currently emit cell-stride-aware DLPack views for AOS struct members (it
97 computes contiguous strides at the member dtype size, but the actual stride between consecutive elements of
98 the same member is ``sizeof(cell)``), so the zerocopy view would interleave neighboring members' bytes.
99 Force kernel-copy until the C++ export is fixed.
100
101 Returns:
102 ``True`` if zero-copy via DLPack is supported.
103 """
🟢 104 if dtype is not None and dtype not in _DLPACK_SUPPORTED_DTYPES:
🟢 105 return False
🟢 106 arch = impl.current_cfg().arch
🟢 107 if arch == _ARCH_VULKAN:
🔴 108 return False
🟢 109 if is_field:
🟢 110 if arch == _ARCH_METAL and not _TORCH_MPS_SUPPORTS_DLPACK_BYTES_OFFSET:
🔴 111 return False
112 # 0-dim ScalarFields lack DLPack bytes_offset support in current PyTorch.
🟢 113 if is_scalar_field and not shape:
🔴 114 return False
🟢 115 if is_aos_struct_member:
🟢 116 return False
🟢 117 return True
118
119
🔴 120 class _DLPackV1Adapter:
121 """Wraps a DLPack v0 PyCapsule into a v1-compatible object.
122
123 Quadrants' C++ ``field_to_dlpack`` / ``ndarray_to_dlpack`` return raw PyCapsules (the v0 DLPack protocol). Modern
124 NumPy (>= 1.23 with strict checks; mandatory in NumPy 2.x) requires the v1 protocol -- an object exposing
125 ``__dlpack__`` and ``__dlpack_device__``. This adapter is a thin Python-side bridge so we don't have to touch the
126 C++ DLPack export.
127
128 CPU-only by construction; numpy zero-copy is gated on ``current_arch_is_cpu()`` upstream.
129 """
130
🔴 131 __slots__ = ("_capsule",)
🔴 132 _KDLCPU = 1 # DLDeviceType::kDLCPU
133
🔴 134 def __init__(self, capsule):
🟢 135 self._capsule = capsule
136
🔴 137 def __dlpack__(self, stream=None):
🟢 138 return self._capsule
139
🔴 140 def __dlpack_device__(self):
🔴 141 return (self._KDLCPU, 0)
142
143
🔴 144 class _ZerocopyCache:
145 """Per-instance cache of DLPack-backed views into a Quadrants Field / Ndarray.
146
147 Holds two independent slots:
148
149 * ``_tc``: ``torch.Tensor`` filled via ``torch.utils.dlpack.from_dlpack`` on first access.
150 * ``_np``: ``numpy.ndarray`` filled via ``numpy.from_dlpack`` on first access (CPU only). Independent of torch:
151 numpy-only workloads never trigger a torch import.
152
153 Each slot is filled lazily and may be ``None`` if not yet requested.
154
155 Lifetime: the cache is invalidated by :func:`PyQuadrants.reset` (via ``cache_holders``) BEFORE the C++ program is
156 torn down, so the DLPack deleters run while the underlying memory is still valid.
157 """
158
🔴 159 __slots__ = ("_tc", "_np")
160
🔴 161 def __init__(self):
🟢 162 self._tc = None
🟢 163 self._np = None
164
🔴 165 def _ensure_torch(self, owner) -> "_torch_mod.Tensor":
🟢 166 if not _HAS_TORCH:
🔴 167 raise RuntimeError("torch is not installed; zero-copy to_torch is unavailable")
🟢 168 if self._tc is None:
🟢 169 self._tc = _torch_from_dlpack(owner.to_dlpack())
🟢 170 return self._tc
171
🔴 172 def _ensure_numpy(self, owner) -> np.ndarray:
🟢 173 if self._np is None:
🟢 174 self._np = np.from_dlpack(_DLPackV1Adapter(owner.to_dlpack()))
🟢 175 return self._np
176
🔴 177 def invalidate(self) -> None:
178 """Drop both cached views. Idempotent."""
🟢 179 self._tc = None
🟢 180 self._np = None
181
182
🔴 183 def make_zerocopy_cache_if_supported(
184 owner,
185 *,
186 is_field: bool,
187 dtype,
188 is_scalar_field: bool = False,
189 shape: tuple[int, ...] = (),
190 is_aos_struct_member: bool = False,
191 ) -> _ZerocopyCache | None:
192 """Construct a ``_ZerocopyCache`` for ``owner`` if zero-copy is supported, else return ``None``.
193
194 Also registers ``owner`` with ``pyquadrants.cache_holders`` so its cache is invalidated on ``qd.reset()`` /
195 ``qd.init()`` BEFORE C++ teardown. The owner must define a method ``_invalidate_zerocopy_cache(self)`` that calls
196 :meth:`_ZerocopyCache.invalidate`.
197
198 Intended to be called at instance construction (or once via ``cached_property``).
199 """
🟢 200 if not can_zerocopy(
201 is_field=is_field,
202 dtype=dtype,
203 is_scalar_field=is_scalar_field,
204 shape=shape,
205 is_aos_struct_member=is_aos_struct_member,
206 ):
🟢 207 return None
🟢 208 cache = _ZerocopyCache()
🟢 209 impl.get_runtime().cache_holders.add(owner)
🟢 210 return cache
211
212
🔴 213 def _metal_sync_runtime() -> None:
214 """Quadrants -> MPS sync. Required so the DLPack-backed MPS tensor sees pending kernel writes."""
🟢 215 if impl.current_cfg().arch == _ARCH_METAL:
🔴 216 impl.get_runtime().sync()
217
218
🔴 219 def _metal_sync_torch() -> None:
220 """MPS -> next operation sync. Required after ``.clone()`` / ``.to()`` so the resulting torch tensor
221 has actually finished copying before the next user op (or the next Quadrants kernel) sees it."""
🟢 222 if _HAS_TORCH and impl.current_cfg().arch == _ARCH_METAL:
🔴 223 _torch.mps.synchronize()
224
225
🔴 226 def get_zerocopy_torch(
227 owner,
228 *,
229 copy: bool | None,
230 device=None,
231 ) -> "_torch_mod.Tensor | None":
232 """Zero-copy entry point for ``to_torch``.
233
234 Returns the cached zero-copy view, optionally cloned and / or moved to ``device``. The ``copy`` argument selects
235 between view (``False`` / ``None``) and independent buffer (``True``); crucially, the zerocopy export is taken
236 **even when ``copy=True``** and the result is then cloned -- DLPack export is cheaper than the kernel-copy fallback
237 path even when followed by a full clone.
238
239 Args:
240 owner: A Field or Ndarray with a ``_zerocopy_cache: _ZerocopyCache | None`` attribute and a ``to_dlpack()``
241 method.
242 copy: ``None`` -> view; ``False`` -> view (raises if zerocopy unsupported); ``True`` -> clone.
243 device: Optional torch device. If different from the view's device, performs a device transfer (incompatible
244 with ``copy=False``).
245
246 Returns:
247 The torch tensor, or ``None`` when ``owner._zerocopy_cache is None`` (i.e. zero-copy is not supported for this
248 instance) and ``copy is not False`` -- the caller should fall back to its kernel-copy path.
249
250 Raises:
251 ValueError: when ``copy=False`` but zerocopy is unsupported for this instance, or when a device transfer is
252 required but ``copy=False``.
253 """
🟢 254 cache: _ZerocopyCache | None = owner._zerocopy_cache
🟢 255 if cache is None:
🟢 256 if copy is False:
🔴 257 raise ValueError("Zero-copy not available for this backend / dtype combination")
🟢 258 return None
259
🟢 260 _metal_sync_runtime()
🟢 261 tc = cache._ensure_torch(owner)
262
🟢 263 needs_device_transfer = device is not None and tc.device != _torch.device(device)
🟢 264 if needs_device_transfer:
🔴 265 if copy is False:
🔴 266 raise ValueError(
267 f"copy=False is incompatible with device transfer (data on {tc.device}, requested {device})"
268 )
🔴 269 out = tc.to(device)
🔴 270 _metal_sync_torch()
🔴 271 return out
272
🟢 273 if copy is True:
🟢 274 out = tc.clone()
🟢 275 _metal_sync_torch()
🟢 276 return out
🟢 277 return tc
278
279
🔴 280 def get_zerocopy_numpy(
281 owner,
282 *,
283 copy: bool | None,
284 dtype_target=None,
285 ) -> np.ndarray | None:
286 """Zero-copy entry point for ``to_numpy``.
287
288 Numpy zero-copy is available only on CPU backends (numpy arrays cannot reference GPU memory).
289 Uses ``numpy.from_dlpack`` directly -- no torch import required.
290
291 Args:
292 owner: A Field or Ndarray with a ``_zerocopy_cache: _ZerocopyCache | None`` attribute and a ``to_dlpack()``
293 method.
294 copy: ``None`` -> view; ``False`` -> view (raises if zerocopy unsupported); ``True`` -> copy.
295 dtype_target: Optional numpy dtype. If different from the view's dtype, performs ``.astype()`` (incompatible
296 with ``copy=False``).
297
298 Returns:
299 The numpy array, or ``None`` when zerocopy is unsupported for this instance and ``copy is not False`` -- the
300 caller should fall back to its kernel-copy path.
301
302 Raises:
303 ValueError: when ``copy=False`` but zerocopy is unsupported, or when a dtype conversion is required but
304 ``copy=False``.
305 """
🟢 306 cache: _ZerocopyCache | None = owner._zerocopy_cache
🟢 307 if cache is None or not current_arch_is_cpu():
🟢 308 if copy is False:
🔴 309 raise ValueError("Zero-copy numpy unavailable (requires a CPU backend and a supported dtype)")
🟢 310 return None
311
🟢 312 arr = cache._ensure_numpy(owner)
313
🟢 314 if dtype_target is not None and arr.dtype != dtype_target:
🟢 315 if copy is False:
🟢 316 raise ValueError("copy=False is incompatible with dtype conversion")
🔴 317 return arr.astype(dtype_target)
318
🟢 319 if copy is True:
🟢 320 return arr.copy()
🔴 321 return arr
🔴 python/quadrants/lang/_ndarray.py (47%)
🔴 3 from functools import cached_property
🔴 9 from quadrants.lang import _interop, _ndarray_pickle, impl
20 to_pytorch_type,
🔴 70 @cached_property
🔴 71 def _zerocopy_cache(self) -> _interop._ZerocopyCache | None:
72 """Lazily-constructed DLPack cache. ``None`` when zero-copy is unsupported for this instance.
73
74 Computed once per instance (review feedback: avoid re-checking ``can_zerocopy`` per call). Registers ``self``
75 with ``pyquadrants.cache_holders`` so the cache is invalidated on ``qd.reset()`` / ``qd.init()`` BEFORE C++
76 teardown.
77 """
🟢 78 return _interop.make_zerocopy_cache_if_supported(self, is_field=False, dtype=self.dtype)
79
🔴 80 def _invalidate_zerocopy_cache(self) -> None:
81 """Hook called by ``impl.reset()`` (via ``pyquadrants.cache_holders``) before C++ teardown."""
🟢 82 cache = self.__dict__.get("_zerocopy_cache")
🟢 83 if cache is not None:
🟢 84 cache.invalidate()
85
🔴 86 @python_scope
🔴 87 def to_torch(self, *, copy=None):
88 """Converts this ndarray to a ``torch.Tensor``.
89
90 Zero-copy via DLPack when supported for this backend/dtype, otherwise an independent kernel-copied tensor.
91
92 Args:
93 copy: ``None`` (default) prefers zero-copy, ``True`` forces an independent copy, ``False`` requires
94 zero-copy or raises.
95 """
🟢 96 tc = _interop.get_zerocopy_torch(self, copy=copy)
🟢 97 if tc is not None:
🟢 98 return tc
99
🔴 100 import torch # pylint: disable=C0415
101
🔴 102 arr = torch.zeros(size=self.arr.total_shape(), dtype=to_pytorch_type(self.dtype))
🔴 103 from quadrants._kernels import ndarray_to_ext_arr # pylint: disable=C0415
104
🔴 105 ndarray_to_ext_arr(self, arr)
🔴 106 impl.get_runtime().sync()
🔴 107 return arr
108
112
113 Note: cache invalidation is handled separately by ``pyquadrants.cache_holders`` BEFORE the C++ program is torn
114 down; this hook only nulls out the Python-side state.
🔴 179 def _ndarray_to_numpy(self, *, copy=None):
182 Args:
183 copy: ``None`` (default) and ``True`` return an independent copy (numpy arrays are conventionally expected
184 to outlive their source). ``False`` returns a zero-copy DLPack view (requires CPU backend and a
185 supported dtype) or raises ``ValueError``.
186
🟢 190 if copy is False:
🔴 191 return _interop.get_zerocopy_numpy(self, copy=False)
192 # copy is None or True: try fast zerocopy+clone path, else kernel fallback.
🟢 193 arr = _interop.get_zerocopy_numpy(self, copy=True)
🟢 194 if arr is not None:
🟢 195 return arr
196
🔴 205 def _ndarray_matrix_to_numpy(self, as_vector, *, copy=None):
208 Args:
209 as_vector: Whether to treat as a vector ndarray.
210 copy: see :meth:`_ndarray_to_numpy`.
211
🟢 215 if copy is False:
🔴 216 return _interop.get_zerocopy_numpy(self, copy=False)
🟢 217 arr = _interop.get_zerocopy_numpy(self, copy=True)
🟢 218 if arr is not None:
🟢 219 return arr
220
🔴 391 def to_numpy(self, *, copy=None):
🟢 392 return self._ndarray_to_numpy(copy=copy)
🔴 python/quadrants/lang/field.py (65%)
🔴 4 from functools import cached_property
🔴 11 from quadrants.lang import _interop, impl
🔴 107 def _invalidate_zerocopy_cache(self) -> None:
108 """Hook called by ``impl.reset()`` (via ``pyquadrants.cache_holders``) before C++ teardown.
109
110 Subclasses that support zero-copy define ``_zerocopy_cache`` as a ``cached_property``; we read it via
111 ``__dict__.get`` so invalidation never triggers the lazy init.
112 """
🟢 113 cache = self.__dict__.get("_zerocopy_cache")
🟢 114 if cache is not None:
🟢 115 cache.invalidate()
116
🔴 122 def to_numpy(self, dtype: DataTypeCxx | None = None, *, copy: bool | None = None):
125 Args:
126 copy: Controls copying behaviour:
127
128 - ``None`` (default) -- zero-copy when possible, copy otherwise.
129 - ``True`` -- always return an independent copy.
130 - ``False`` -- require zero-copy; raises if not possible.
131
🔴 138 def to_torch(self, device=None, *, copy: bool | None = None):
143 copy: Controls copying behaviour:
144
145 - ``None`` (default) -- zero-copy when possible, copy otherwise.
146 - ``True`` -- always return an independent copy.
147 - ``False`` -- require zero-copy; raises if not possible.
🔴 251 @cached_property
🔴 252 def _zerocopy_cache(self) -> _interop._ZerocopyCache | None:
253 """Lazily-constructed DLPack cache. ``None`` when zero-copy is unsupported.
254
255 Computed once per instance (see review feedback #17 on PR #450). Registers ``self`` in
256 ``pyquadrants.cache_holders`` so the cache is invalidated on ``qd.reset()`` / ``qd.init()`` BEFORE C++ teardown.
257 """
258 # A ScalarField that is a member of a multi-member StructField has AOS layout: its parent SNode (the struct
259 # cell) holds multiple `place` children, and consecutive elements of the same member are sizeof(cell) bytes
260 # apart. The C++ field_to_dlpack does not emit those strides yet, so zerocopy would produce an interleaved
261 # (broken) view. Skip it.
🟢 262 parent_snode = self.parent()._snode.ptr
🟢 263 is_aos_struct_member = parent_snode.get_num_ch() > 1
🟢 264 return _interop.make_zerocopy_cache_if_supported(
265 self,
266 is_field=True,
267 dtype=self.dtype,
268 is_scalar_field=True,
269 shape=self.shape,
270 is_aos_struct_member=is_aos_struct_member,
271 )
272
🔴 287 def to_numpy(self, dtype=None, *, copy=None):
288 """Converts this field to a `numpy.ndarray`.
289
290 Args:
291 dtype: Optional target numpy dtype. Incompatible with ``copy=False`` if it differs from the field's native
292 dtype.
293 copy: ``None`` (default) and ``True`` return an independent copy. ``False`` returns a zero-copy DLPack view
294 (requires CPU backend and a supported dtype) or raises ``ValueError``. Note: zero-copy numpy arrays
295 alias the field's underlying C++ runtime memory; callers opting into ``copy=False`` are responsible for
296 the buffer lifetime.
297 """
🟢 302 np_dtype_target = None
🟢 303 if dtype is not None:
🟢 304 np_dtype_target = to_numpy_type(dtype) if isinstance(dtype, _qd_core.DataTypeCxx) else dtype
305
🟢 306 if copy is False:
🟢 307 return _interop.get_zerocopy_numpy(self, copy=False, dtype_target=np_dtype_target)
308 # copy is None or True: try fast zerocopy+clone path, else kernel fallback.
🟢 309 arr = _interop.get_zerocopy_numpy(self, copy=True, dtype_target=np_dtype_target)
🟢 310 if arr is not None:
🟢 311 return arr
312
🔴 326 def to_torch(self, device=None, *, copy=None):
327 """Converts this field to a `torch.tensor`.
328
329 Args:
330 device: Optional torch device. Incompatible with ``copy=False`` if it differs from the field's native
331 device.
332 copy: ``None`` (default) prefers zero-copy, ``True`` forces an independent copy, ``False`` requires
333 zero-copy or raises.
334 """
🟢 335 tc = _interop.get_zerocopy_torch(self, copy=copy, device=device)
🟢 336 if tc is not None:
🟢 337 return tc
338
🟢 python/quadrants/lang/impl.py (100%)
377 # Holders of zero-copy DLPack caches (Ndarray, ScalarField, MatrixField, ...). On reset, each holder's
378 # `_invalidate_zerocopy_cache()` is called BEFORE the C++ program is torn down, so that DLPack deleters run
379 # while the underlying memory is still valid.
🟢 380 self.cache_holders: weakref.WeakSet = weakref.WeakSet()
🟢 571 old_cache_holders = pyquadrants.cache_holders
572 # Drop zero-copy caches BEFORE the C++ program is torn down. The cached torch tensors / numpy arrays alias C++
573 # memory via DLPack; their deleters must run while the program is still alive, otherwise we get use-after-free at
574 # GC time (segfault on AMD GPU, Metal, etc.).
🟢 575 for h in old_cache_holders:
🟢 576 h._invalidate_zerocopy_cache()
🔴 python/quadrants/lang/matrix.py (46%)
🔴 6 from functools import cached_property
🔴 13 from quadrants.lang import _interop, expr, impl, runtime_ops
🔴 1214 @cached_property
🔴 1215 def _zerocopy_cache(self) -> _interop._ZerocopyCache | None:
1216 """Lazily-constructed DLPack cache. ``None`` when zero-copy is unsupported.
1217
1218 Constructed once per instance (closes review #17); registers ``self`` with ``pyquadrants.cache_holders``
1219 (closes review #18) so ``qd.reset()`` / ``qd.init()`` invalidate the cache BEFORE C++ teardown.
1220 """
1221 # Defensive: if this MatrixField is a member of a multi-member StructField, its representative SNode shares a
1222 # parent with sibling members and the C++ DLPack export produces broken AOS strides. See
1223 # ScalarField._zerocopy_cache for the full rationale.
🟢 1224 parent_snode = self.parent()._snode.ptr
🟢 1225 is_aos_struct_member = parent_snode.get_num_ch() > 1
🟢 1226 return _interop.make_zerocopy_cache_if_supported(
1227 self,
1228 is_field=True,
1229 dtype=self.dtype,
1230 shape=self.shape,
1231 is_aos_struct_member=is_aos_struct_member,
1232 )
1233
🔴 1234 def _matrix_view_shape(self, keep_dims: bool) -> tuple[tuple[int, ...], bool]:
1235 """Returns ``(expected_shape, as_vector)`` for ``to_torch`` / ``to_numpy``.
1236
1237 Keeps the n/m/keep_dims handling in one place rather than duplicating it in each conversion method (closes
1238 review #13, #14).
1239 """
🟢 1240 as_vector = self.m == 1 and not keep_dims
🟢 1241 shape_ext = (self.n,) if as_vector else (self.n, self.m)
🟢 1242 return self.shape + shape_ext, as_vector
1243
🔴 1337 def to_numpy(self, keep_dims=False, dtype=None, *, copy=None):
1346 copy: ``None`` (default) and ``True`` return an independent copy. ``False`` returns a zero-copy DLPack view
1347 (requires CPU backend and a supported dtype) or raises ``ValueError``. Note: zero-copy numpy arrays
1348 alias the field's underlying C++ runtime memory; callers opting into ``copy=False`` are responsible for
1349 the buffer lifetime.
🟢 1354 expected, as_vector = self._matrix_view_shape(keep_dims)
🟢 1355 np_dtype_target = None
🟢 1356 if dtype is not None:
🔴 1357 np_dtype_target = to_numpy_type(dtype) if isinstance(dtype, qd_python_core.DataTypeCxx) else dtype
1358
🟢 1359 if copy is False:
🔴 1360 arr = _interop.get_zerocopy_numpy(self, copy=False, dtype_target=np_dtype_target)
🔴 1361 if arr.shape != expected:
🔴 1362 arr = arr.reshape(expected)
🔴 1363 return arr
1364 # copy is None or True: try fast zerocopy+clone path, else kernel fallback.
🟢 1365 arr = _interop.get_zerocopy_numpy(self, copy=True, dtype_target=np_dtype_target)
🟢 1366 if arr is not None:
🔴 1367 if arr.shape != expected:
🔴 1368 arr = arr.reshape(expected)
🔴 1369 return arr
1370
🟢 1373 arr = np.zeros(expected, dtype=dtype)
🔴 1380 def to_torch(self, device=None, keep_dims=False, *, copy=None):
1387 copy: ``None`` (default) prefers zero-copy, ``True`` forces an independent copy, ``False`` requires
1388 zero-copy or raises.
🟢 1393 expected, as_vector = self._matrix_view_shape(keep_dims)
🟢 1394 tc = _interop.get_zerocopy_torch(self, copy=copy, device=device)
🟢 1395 if tc is not None:
🔴 1396 if tc.shape != expected:
🔴 1397 tc = tc.reshape(expected)
🔴 1398 return tc
1399
🟢 1403 arr = torch.empty(expected, dtype=to_pytorch_type(self.dtype), device=device)
🔴 1777 def to_numpy(self, *, copy=None):
🔴 1790 return self._ndarray_matrix_to_numpy(as_vector=0, copy=copy)
🔴 1888 def to_numpy(self, *, copy=None):
🟢 1901 return self._ndarray_matrix_to_numpy(as_vector=1, copy=copy)
🔴 python/quadrants/lang/struct.py (62%)
🔴 552 def to_numpy(self, *, copy=None):
555 Struct fields use AOS cell layout, but Quadrants' C++ ``field_to_dlpack`` does not currently emit
556 cell-stride-aware DLPack views for individual members (it computes contiguous strides at the member dtype
557 size, which would interleave neighboring members' bytes). Until the C++ export is taught about AOS strides,
558 ``StructField`` always returns independent copies; the ``copy`` argument is accepted for API symmetry but
559 ``copy=False`` is rejected.
560
563 Args:
564 copy: must be ``None`` or ``True``; ``copy=False`` raises ``ValueError``.
565
🟢 569 if copy is False:
🟢 570 raise ValueError(
571 "StructField.to_numpy(copy=False) is not supported: AOS member views require "
572 "cell-stride-aware DLPack export which the C++ runtime does not emit yet."
573 )
🔴 574 return {k: v.to_numpy(copy=True) for k, v in self._items}
🔴 577 def to_torch(self, device=None, *, copy=None):
580 See :meth:`to_numpy` for why members are always returned as independent copies. ``copy=False`` is rejected.
581
585 device (torch.device, optional): The desired device of returned tensors.
586 copy: must be ``None`` or ``True``; ``copy=False`` raises ``ValueError``.
589 Dict[str, Union[torch.Tensor, Dict]]: The result PyTorch tensor.
🟢 591 if copy is False:
🟢 592 raise ValueError(
593 "StructField.to_torch(copy=False) is not supported: AOS member views require "
594 "cell-stride-aware DLPack export which the C++ runtime does not emit yet."
595 )
🟢 596 return {k: v.to_torch(device=device, copy=True) for k, v in self._items}
🔴 python/quadrants/lang/util.py (0%)
🔴 95 @functools.lru_cache(maxsize=None)
🔴 133 @functools.lru_cache(maxsize=None)
🟢 tests/python/quadrants/lang/test_zerocopy_interop.py (96%)
1 """Tests for zero-copy to_torch / to_numpy via DLPack (the ``copy`` parameter)."""
2
🟢 3 import os
4
🟢 5 import numpy as np
🟢 6 import pytest
7
🟢 8 torch = pytest.importorskip("torch")
9
🟢 10 import quadrants as qd
11
🟢 12 from tests import test_utils
13
🟢 14 pytestmark = pytest.mark.needs_torch
15
🟢 16 dlpack_arch = [qd.cpu, qd.cuda, qd.metal, qd.amdgpu]
17
18
🟢 19 def is_v520_amdgpu():
🟢 20 return os.environ.get("QD_AMDGPU_V520", None) == "1" and qd.cfg.arch == qd.amdgpu
21
22
🟢 23 def _to_cpu(t):
24 """Move tensor to CPU for value comparison on arches where torch accessors may not work."""
🟢 25 if is_v520_amdgpu() or t.device.type != "cpu":
🔴 26 return t.cpu()
🟢 27 return t
28
29
30 # ---------------------------------------------------------------------------
31 # ScalarField.to_torch -- zero-copy
32 # ---------------------------------------------------------------------------
33
34
🟢 35 @test_utils.test(arch=dlpack_arch)
🟢 36 @pytest.mark.parametrize("dtype", [qd.i32, qd.f32])
🟢 37 def test_scalar_field_to_torch_zerocopy(dtype):
🟢 38 f = qd.field(dtype, shape=(4,))
🟢 39 f[0] = 10
🟢 40 f[1] = 20
🟢 41 qd.sync()
42
🟢 43 tc = f.to_torch()
🟢 44 tc = _to_cpu(tc)
🟢 45 assert tc[0] == 10
🟢 46 assert tc[1] == 20
🟢 47 assert tuple(tc.shape) == (4,)
48
49
🟢 50 @test_utils.test(arch=dlpack_arch)
🟢 51 def test_scalar_field_to_torch_caching():
52 """Calling to_torch() twice returns the same cached object."""
🟢 53 f = qd.field(qd.f32, shape=(3,))
🟢 54 qd.sync()
🟢 55 t1 = f.to_torch()
🟢 56 t2 = f.to_torch()
🟢 57 assert t1.data_ptr() == t2.data_ptr()
58
59
🟢 60 @test_utils.test(arch=dlpack_arch)
🟢 61 def test_scalar_field_to_torch_copy_true():
62 """copy=True must return an independent tensor."""
🟢 63 f = qd.field(qd.f32, shape=(3,))
🟢 64 f[0] = 42
🟢 65 qd.sync()
🟢 66 tc_view = f.to_torch()
🟢 67 tc_copy = f.to_torch(copy=True)
🟢 68 assert _to_cpu(tc_copy)[0] == 42
🟢 69 assert tc_view.data_ptr() != tc_copy.data_ptr()
70
71
🟢 72 @test_utils.test(arch=dlpack_arch)
🟢 73 def test_scalar_field_to_torch_copy_false():
74 """copy=False should succeed on DLPack-capable arches with non-empty shape."""
🟢 75 f = qd.field(qd.f32, shape=(3,))
🟢 76 f[0] = 7
🟢 77 qd.sync()
🟢 78 tc = f.to_torch(copy=False)
🟢 79 assert _to_cpu(tc)[0] == 7
80
81
🟢 82 @test_utils.test(arch=dlpack_arch)
🟢 83 def test_scalar_field_to_torch_aliases_memory():
84 """Zero-copy tensor must alias the field's device memory."""
🟢 85 if is_v520_amdgpu():
🔴 86 pytest.skip("can't run torch accessor kernels on v520")
🟢 87 f = qd.field(qd.i32, shape=(4,))
🟢 88 f[0] = 1
🟢 89 qd.sync()
🟢 90 tc = f.to_torch()
91
🟢 92 @qd.kernel
🟢 93 def write(f: qd.template()):
🔴 94 f[0] = 99
95
🟢 96 write(f)
🟢 97 qd.sync()
🟢 98 assert tc[0] == 99
99
100
101 # ---------------------------------------------------------------------------
102 # ScalarField.to_numpy -- zero-copy (CPU only)
103 # ---------------------------------------------------------------------------
104
105
🟢 106 @test_utils.test(arch=[qd.cpu])
🟢 107 def test_scalar_field_to_numpy_zerocopy_cpu():
🟢 108 f = qd.field(qd.f32, shape=(5,))
🟢 109 f[0] = 3.5
🟢 110 qd.sync()
🟢 111 arr = f.to_numpy()
🟢 112 assert isinstance(arr, np.ndarray)
🟢 113 np.testing.assert_allclose(arr[0], 3.5)
114
115
🟢 116 @test_utils.test(arch=[qd.cpu])
🟢 117 def test_scalar_field_to_numpy_copy_true_cpu():
🟢 118 f = qd.field(qd.f32, shape=(3,))
🟢 119 f[0] = 1.0
🟢 120 qd.sync()
🟢 121 arr = f.to_numpy(copy=True)
🟢 122 np.testing.assert_allclose(arr[0], 1.0)
123
124
🟢 125 @test_utils.test(arch=dlpack_arch)
🟢 126 def test_scalar_field_to_numpy_matches_copy_path():
127 """Zero-copy and copy path produce the same values."""
🟢 128 f = qd.field(qd.f32, shape=(6,))
🟢 129 for i in range(6):
🟢 130 f[i] = float(i * 11)
🟢 131 qd.sync()
🟢 132 arr_default = f.to_numpy()
🟢 133 arr_copy = f.to_numpy(copy=True)
🟢 134 np.testing.assert_allclose(arr_default, arr_copy)
135
136
137 # ---------------------------------------------------------------------------
138 # MatrixField.to_torch -- zero-copy with keep_dims handling
139 # ---------------------------------------------------------------------------
140
141
🟢 142 @test_utils.test(arch=dlpack_arch)
🟢 143 def test_vector_field_to_torch_zerocopy():
🟢 144 vec3 = qd.types.vector(3, qd.f32)
🟢 145 f = qd.field(vec3, shape=(4,))
🟢 146 f[0] = (1, 2, 3)
🟢 147 f[1] = (4, 5, 6)
🟢 148 qd.sync()
🟢 149 tc = _to_cpu(f.to_torch())
🟢 150 assert tuple(tc.shape) == (4, 3)
🟢 151 assert tc[0, 0] == 1
🟢 152 assert tc[1, 2] == 6
153
154
🟢 155 @test_utils.test(arch=dlpack_arch)
🟢 156 def test_vector_field_to_torch_keep_dims():
🟢 157 vec3 = qd.types.vector(3, qd.f32)
🟢 158 f = qd.field(vec3, shape=(4,))
🟢 159 f[0] = (10, 20, 30)
🟢 160 qd.sync()
🟢 161 tc_no_keep = _to_cpu(f.to_torch(keep_dims=False))
🟢 162 tc_keep = _to_cpu(f.to_torch(keep_dims=True))
🟢 163 assert tuple(tc_no_keep.shape) == (4, 3)
🟢 164 assert tuple(tc_keep.shape) == (4, 3, 1)
🟢 165 assert tc_no_keep[0, 0] == tc_keep[0, 0, 0]
166
167
🟢 168 @test_utils.test(arch=dlpack_arch)
🟢 169 def test_matrix_field_to_torch_zerocopy():
🟢 170 mat = qd.types.matrix(2, 3, qd.f32)
🟢 171 f = qd.field(mat, shape=(5,))
🟢 172 f[0] = ((1, 2, 3), (4, 5, 6))
🟢 173 qd.sync()
🟢 174 tc = _to_cpu(f.to_torch())
🟢 175 assert tuple(tc.shape) == (5, 2, 3)
🟢 176 assert tc[0, 0, 0] == 1
🟢 177 assert tc[0, 1, 2] == 6
178
179
🟢 180 @test_utils.test(arch=dlpack_arch)
🟢 181 def test_matrix_field_to_torch_matches_copy():
🟢 182 mat = qd.types.matrix(2, 3, qd.f32)
🟢 183 f = qd.field(mat, shape=(4,))
🟢 184 f[0] = ((1, 2, 3), (4, 5, 6))
🟢 185 f[1] = ((7, 8, 9), (10, 11, 12))
🟢 186 qd.sync()
🟢 187 tc_zc = _to_cpu(f.to_torch())
🟢 188 tc_cp = _to_cpu(f.to_torch(copy=True))
🟢 189 assert torch.allclose(tc_zc, tc_cp)
190
191
192 # ---------------------------------------------------------------------------
193 # MatrixField.to_numpy -- zero-copy (CPU only)
194 # ---------------------------------------------------------------------------
195
196
🟢 197 @test_utils.test(arch=[qd.cpu])
🟢 198 def test_vector_field_to_numpy_zerocopy_cpu():
🟢 199 vec3 = qd.types.vector(3, qd.f32)
🟢 200 f = qd.field(vec3, shape=(3,))
🟢 201 f[0] = (10, 20, 30)
🟢 202 qd.sync()
🟢 203 arr = f.to_numpy()
🟢 204 assert arr.shape == (3, 3)
🟢 205 np.testing.assert_allclose(arr[0], [10, 20, 30])
206
207
🟢 208 @test_utils.test(arch=[qd.cpu])
🟢 209 def test_matrix_field_to_numpy_zerocopy_cpu():
🟢 210 mat = qd.types.matrix(2, 2, qd.f32)
🟢 211 f = qd.field(mat, shape=(2,))
🟢 212 f[0] = ((1, 2), (3, 4))
🟢 213 qd.sync()
🟢 214 arr = f.to_numpy()
🟢 215 assert arr.shape == (2, 2, 2)
🟢 216 np.testing.assert_allclose(arr[0], [[1, 2], [3, 4]])
217
218
219 # ---------------------------------------------------------------------------
220 # Ndarray.to_torch -- new method, always uses DLPack
221 # ---------------------------------------------------------------------------
222
223
🟢 224 @test_utils.test(arch=dlpack_arch)
🟢 225 def test_scalar_ndarray_to_torch():
🟢 226 nd = qd.ndarray(qd.f32, shape=(5,))
🟢 227 nd[0] = 42.0
🟢 228 qd.sync()
🟢 229 tc = _to_cpu(nd.to_torch())
🟢 230 assert tuple(tc.shape) == (5,)
🟢 231 assert tc[0] == 42.0
232
233
🟢 234 @test_utils.test(arch=dlpack_arch)
🟢 235 def test_vector_ndarray_to_torch():
🟢 236 nd = qd.Vector.ndarray(3, qd.f32, shape=(4,))
🟢 237 nd[0] = (1, 2, 3)
🟢 238 qd.sync()
🟢 239 tc = _to_cpu(nd.to_torch())
🟢 240 assert tuple(tc.shape) == (4, 3)
🟢 241 assert tc[0, 0] == 1
🟢 242 assert tc[0, 2] == 3
243
244
🟢 245 @test_utils.test(arch=dlpack_arch)
🟢 246 def test_matrix_ndarray_to_torch():
🟢 247 nd = qd.Matrix.ndarray(2, 3, qd.f32, shape=(4,))
🟢 248 nd[0] = ((10, 20, 30), (40, 50, 60))
🟢 249 qd.sync()
🟢 250 tc = _to_cpu(nd.to_torch())
🟢 251 assert tuple(tc.shape) == (4, 2, 3)
🟢 252 assert tc[0, 0, 0] == 10
🟢 253 assert tc[0, 1, 2] == 60
254
255
🟢 256 @test_utils.test(arch=dlpack_arch)
🟢 257 def test_ndarray_to_torch_caching():
🟢 258 nd = qd.ndarray(qd.f32, shape=(3,))
🟢 259 qd.sync()
🟢 260 t1 = nd.to_torch()
🟢 261 t2 = nd.to_torch()
🟢 262 assert t1.data_ptr() == t2.data_ptr()
263
264
🟢 265 @test_utils.test(arch=dlpack_arch)
🟢 266 def test_ndarray_to_torch_copy_true():
🟢 267 nd = qd.ndarray(qd.f32, shape=(3,))
🟢 268 nd[0] = 7.0
🟢 269 qd.sync()
🟢 270 tc = _to_cpu(nd.to_torch(copy=True))
🟢 271 assert tc[0] == 7.0
🟢 272 view = nd.to_torch()
🟢 273 assert view.data_ptr() != tc.data_ptr()
274
275
276 # ---------------------------------------------------------------------------
277 # Ndarray.to_numpy -- zero-copy on CPU
278 # ---------------------------------------------------------------------------
279
280
🟢 281 @test_utils.test(arch=[qd.cpu])
🟢 282 def test_scalar_ndarray_to_numpy_zerocopy():
🟢 283 nd = qd.ndarray(qd.f32, shape=(4,))
🟢 284 nd[0] = 3.0
🟢 285 nd[1] = 5.0
🟢 286 qd.sync()
🟢 287 arr = nd.to_numpy()
🟢 288 np.testing.assert_allclose(arr[:2], [3.0, 5.0])
289
290
🟢 291 @test_utils.test(arch=[qd.cpu])
🟢 292 def test_vector_ndarray_to_numpy_zerocopy():
🟢 293 nd = qd.Vector.ndarray(3, qd.f32, shape=(2,))
🟢 294 nd[0] = (1, 2, 3)
🟢 295 qd.sync()
🟢 296 arr = nd.to_numpy()
🟢 297 np.testing.assert_allclose(arr[0], [1, 2, 3])
298
299
🟢 300 @test_utils.test(arch=dlpack_arch)
🟢 301 def test_ndarray_to_numpy_matches_copy():
🟢 302 nd = qd.ndarray(qd.f32, shape=(6,))
🟢 303 for i in range(6):
🟢 304 nd[i] = float(i * 7)
🟢 305 qd.sync()
🟢 306 arr_default = nd.to_numpy()
🟢 307 arr_copy = nd.to_numpy(copy=True)
🟢 308 np.testing.assert_allclose(arr_default, arr_copy)
309
310
311 # ---------------------------------------------------------------------------
312 # Two fields in the same SNode tree (non-zero offset)
313 # ---------------------------------------------------------------------------
314
315
🟢 316 @test_utils.test(arch=dlpack_arch)
🟢 317 def test_to_torch_two_fields_same_tree():
🟢 318 a = qd.field(qd.i32, (100,))
🟢 319 b = qd.field(qd.i32, (100,))
🟢 320 a[0] = 111
🟢 321 b[0] = 222
🟢 322 qd.sync()
🟢 323 at = _to_cpu(a.to_torch())
🟢 324 bt = _to_cpu(b.to_torch())
🟢 325 assert at[0] == 111
🟢 326 assert bt[0] == 222
327
328
329 # ---------------------------------------------------------------------------
330 # copy=False -- error paths
331 # ---------------------------------------------------------------------------
332
333
🟢 334 @test_utils.test(arch=dlpack_arch)
🟢 335 def test_copy_false_with_device_transfer_raises():
336 """copy=False + device that differs from data device should raise."""
🟢 337 if qd.cfg.arch == qd.cpu:
🟢 338 pytest.skip("need GPU arch to test device mismatch")
🔴 339 f = qd.field(qd.f32, shape=(3,))
🔴 340 qd.sync()
🔴 341 with pytest.raises(ValueError, match="copy=False"):
🔴 342 f.to_torch(device="cpu", copy=False)
343
344
🟢 345 @test_utils.test(arch=[qd.cpu])
🟢 346 def test_copy_false_numpy_dtype_conversion_raises():
347 """copy=False + dtype that requires conversion should raise."""
🟢 348 f = qd.field(qd.f32, shape=(3,))
🟢 349 qd.sync()
🟢 350 with pytest.raises(ValueError, match="copy=False"):
🟢 351 f.to_numpy(dtype=np.float64, copy=False)
352
353
354 # ---------------------------------------------------------------------------
355 # StructField pass-through
356 # ---------------------------------------------------------------------------
357
358
🟢 359 @test_utils.test(arch=dlpack_arch)
🟢 360 def test_struct_field_to_torch():
🟢 361 s = qd.Struct.field({"a": qd.f32, "b": qd.f32}, shape=(4,))
🟢 362 s[0] = {"a": 1.0, "b": 2.0}
🟢 363 qd.sync()
🟢 364 d = s.to_torch()
🟢 365 assert isinstance(d, dict)
🟢 366 assert _to_cpu(d["a"])[0] == 1.0
🟢 367 assert _to_cpu(d["b"])[0] == 2.0
368
369
🟢 370 @test_utils.test(arch=dlpack_arch)
🟢 371 def test_struct_field_to_torch_copy_true():
🟢 372 s = qd.Struct.field({"x": qd.f32}, shape=(3,))
🟢 373 s[0] = {"x": 5.0}
🟢 374 qd.sync()
🟢 375 d1 = s.to_torch()
🟢 376 d2 = s.to_torch(copy=True)
377 # StructField always copies (AOS member views are not zero-copyable yet -- see struct.py docstring),
378 # so both calls allocate independent storage.
🟢 379 assert d1["x"].data_ptr() != d2["x"].data_ptr()
🟢 380 assert _to_cpu(d2["x"])[0] == 5.0
381
382
🟢 383 @test_utils.test(arch=dlpack_arch)
🟢 384 def test_struct_field_to_torch_does_not_alias_memory():
385 """StructField always returns independent copies of each AOS member.
386
387 Quadrants' C++ ``field_to_dlpack`` does not currently emit cell-stride-aware DLPack views for individual members of
388 an AOS struct -- it computes contiguous strides at the member dtype size, which would interleave neighboring
389 members' bytes. Until that is fixed, ``StructField.to_torch`` forces ``copy=True`` per-member, so a kernel write
390 into the field must NOT be reflected in a previously-obtained dict.
391 """
🟢 392 if is_v520_amdgpu():
🔴 393 pytest.skip("can't run torch accessor kernels on v520")
🟢 394 s = qd.Struct.field({"a": qd.i32, "b": qd.i32}, shape=(4,))
🟢 395 s[0] = {"a": 1, "b": 2}
🟢 396 qd.sync()
🟢 397 d = s.to_torch()
🟢 398 assert _to_cpu(d["a"])[0] == 1
🟢 399 assert _to_cpu(d["b"])[0] == 2
400
🟢 401 @qd.kernel
🟢 402 def write(s: qd.template()):
🔴 403 s[0].a = 99
🔴 404 s[0].b = 77
405
🟢 406 write(s)
🟢 407 qd.sync()
408 # Snapshot must be unchanged because StructField.to_torch always copies.
🟢 409 assert _to_cpu(d["a"])[0] == 1
🟢 410 assert _to_cpu(d["b"])[0] == 2
411
412
🟢 413 @test_utils.test(arch=[qd.cpu])
🟢 414 def test_struct_field_copy_false_raises():
415 """``copy=False`` is rejected for StructField (AOS member views not supported yet)."""
🟢 416 s = qd.Struct.field({"a": qd.f32, "b": qd.i32}, shape=(3,))
🟢 417 s[0] = {"a": 1.5, "b": 7}
🟢 418 qd.sync()
🟢 419 with pytest.raises(ValueError, match="StructField.to_numpy.*copy=False"):
🟢 420 s.to_numpy(copy=False)
🟢 421 with pytest.raises(ValueError, match="StructField.to_torch.*copy=False"):
🟢 422 s.to_torch(copy=False)
423
424
425 # ---------------------------------------------------------------------------
426 # Cache invalidation across qd.reset() / qd.init()
427 # ---------------------------------------------------------------------------
428
429
🟢 430 @test_utils.test(arch=dlpack_arch)
🟢 431 def test_zerocopy_cache_survives_reset():
432 """Holding a zero-copy view across qd.reset() must not segfault.
433
434 Regression for the use-after-free that motivated the Genesis to_numpy()-always-copies workaround: cache invalidation
435 now runs BEFORE the C++ program is torn down (see pyquadrants.cache_holders in impl.py), so the DLPack deleter on
436 the cached tensor operates on still-valid memory.
437 """
🟢 438 arch = qd.cfg.arch
🟢 439 f = qd.field(qd.f32, shape=(4,))
🟢 440 f[0] = 1.0
🟢 441 qd.sync()
🟢 442 held = f.to_torch() # cached zero-copy view; we hold a Python reference past reset()
🟢 443 assert held is not None
444
🟢 445 qd.reset()
🟢 446 qd.init(arch=arch)
447
448 # Re-create and verify the new field works -- the previous-cycle 'held' tensor is allowed to remain alive (its
449 # data may now be released, but we don't dereference it). What we MUST not have is a crash inside the
450 # previous-cycle deleter when 'held' is GC'd.
🟢 451 f2 = qd.field(qd.f32, shape=(4,))
🟢 452 f2[0] = 42.0
🟢 453 qd.sync()
🟢 454 fresh = f2.to_torch()
🟢 455 assert _to_cpu(fresh)[0] == 42.0
456 # Touch 'held' last so it's only dropped at function exit; the DLPack deleter must not crash.
457
458
🟢 459 @test_utils.test(arch=dlpack_arch)
🟢 460 def test_zerocopy_cache_fresh_after_reset():
461 """After qd.reset() / qd.init(), a freshly-constructed Field gets a fresh cache.
462
463 Specifically, the new field's data_ptr is not stale-equal to a previous cycle's, AND the new view sees current data.
464 """
🟢 465 arch = qd.cfg.arch
🟢 466 f1 = qd.field(qd.f32, shape=(2,))
🟢 467 f1[0] = 1.0
🟢 468 qd.sync()
🟢 469 f1.to_torch() # populate cache
🟢 470 f1_dtype = f1.dtype # noqa: F841 -- keep f1 alive past reset for the GC ordering test
471
🟢 472 qd.reset()
🟢 473 qd.init(arch=arch)
474
🟢 475 f2 = qd.field(qd.f32, shape=(2,))
🟢 476 f2[0] = 99.0
🟢 477 qd.sync()
🟢 478 tc2 = _to_cpu(f2.to_torch())
🟢 479 assert tc2[0] == 99.0
480
481
482 # ---------------------------------------------------------------------------
483 # Apple Metal double-sync
484 # ---------------------------------------------------------------------------
485
486
🟢 487 @test_utils.test(arch=dlpack_arch)
🟢 488 def test_clone_after_kernel_write_returns_up_to_date_data():
489 """After a kernel writes to a field, an immediate clone must see the post-write values.
490
491 Specifically targets the Apple Metal path where qd.sync() flushes pending Quadrants kernels and
492 torch.mps.synchronize() flushes pending MPS clone copies; both must run for the cloned tensor to be observably
493 equal to the field's current values.
494 """
🟢 495 if is_v520_amdgpu():
🔴 496 pytest.skip("can't run torch accessor kernels on v520")
🟢 497 f = qd.field(qd.i32, shape=(8,))
🟢 498 f[0] = 0
🟢 499 qd.sync()
500
🟢 501 @qd.kernel
🟢 502 def write(f: qd.template()):
🔴 503 for i in range(8):
🔴 504 f[i] = 100 + i
505
🟢 506 write(f)
507 # No explicit qd.sync() here -- we rely on to_torch(copy=True)'s internal sync.
🟢 508 tc = f.to_torch(copy=True)
🟢 509 tc_cpu = _to_cpu(tc)
🟢 510 for i in range(8):
🟢 511 assert tc_cpu[i] == 100 + i
🟢 tests/python/test_api.py (100%)
370 "to_torch",
🟢 372 user_api[qd.Ndarray] = ["copy_from", "element_shape", "fill", "get_type", "to_dlpack", "to_torch"]
407 "to_torch",
432 "to_torch",
🟢 tests/python/test_torch_ad.py (100%)
🟢 51 outp = y.to_torch().to(inp.device)
🟢 59 inp_grad = x.grad.to_torch().to(outp_grad.device)
🟢 tests/python/test_torch_io.py (100%)
🟢 164 t3 = x2.to_torch().cpu()
🟢 178 zeros = mat.to_torch().cpu()
🟢 209 s_after_to = impl.get_runtime().get_num_compiled_functions()
210 # to_torch may use zero-copy (0 kernels) or the fused kernel path (1 kernel)
🟢 211 assert s_after_to - s <= 1
213 # from_torch always compiles exactly one fused kernel
🟢 214 assert impl.get_runtime().get_num_compiled_functions() == s_after_to + 1
🟢 233 X = x.to_torch().cpu()
🟢 241 X1 = x.to_torch().cpu()
🟢 243 X1 = x.to_torch().cpu()
🟢 253 X = x.to_torch().cpu()
🟢 260 X1 = x.to_torch().cpu()
🟢 262 X1 = x.to_torch().cpu()
auto = f.to_torch()  # default: copy=None (see below)
```

Modifications via the view are visible to subsequent Quadrants kernel reads, and vice-versa. The view stays valid until the underlying storage is reallocated -- typically on `qd.init()` or `qd.reset()`, after which a fresh call to `to_torch()` / `to_numpy()` returns a new view.
You should mention somewhere in the doc that one should be careful: this is asynchronous on Apple Metal, because torch and quadrants do not share the same compute "stream" (not sure if that is the correct terminology). Which means that doing:

a_tc = a.to_torch(copy=False)
a_tc.zero_()
my_kernel(a)

does not guarantee that my_kernel will run after the reset to zero on Apple Metal, until torch.mps.synchronize() is called explicitly after a_tc.zero_(). Currently this is not done, on purpose. This is managed manually in Genesis, only after performing all the desired operations on all fields involved. This is more efficient because it keeps filling the compute queue while emptying it, partially masking Python overhead.
Ok, I just found the section where you mention the behaviour on Metal. I'm not convinced by the behaviour that was chosen, though.
Zero-copy uses [DLPack](https://github.com/dmlc/dlpack) and requires:

- a backend with DLPack support: `cpu` (`x64`/`arm64`), `cuda`, `amdgpu`, or `metal` (Vulkan is not supported);
It is not supported because it is mostly irrelevant for now: Vulkan-backed DLPack tensors cannot be processed by any well-known scientific computing library such as torch or numpy.
- a backend with DLPack support: `cpu` (`x64`/`arm64`), `cuda`, `amdgpu`, or `metal` (Vulkan is not supported);
- a DLPack-supported dtype: `i32`, `i64`, `f32`, `f64`, `u1` (other dtypes such as `f16`, `u8`, `u16` fall back to the kernel-copy path);
- on Apple Metal, `torch >= 2.9.2` for fields (required for DLPack `bytes_offset` on MPS; see [pytorch/pytorch#168193](https://github.com/pytorch/pytorch/pull/168193));
- 0-dim `ScalarField` instances are not zero-copyable on any backend (PyTorch DLPack `bytes_offset` limitation);
It should be clarified that Quadrants' implementation has been designed with torch as the only first-class user interface in mind; otherwise, citing a torch limitation to justify a Quadrants limitation is odd.
- a DLPack-supported dtype: `i32`, `i64`, `f32`, `f64`, `u1` (other dtypes such as `f16`, `u8`, `u16` fall back to the kernel-copy path);
- on Apple Metal, `torch >= 2.9.2` for fields (required for DLPack `bytes_offset` on MPS; see [pytorch/pytorch#168193](https://github.com/pytorch/pytorch/pull/168193));
- 0-dim `ScalarField` instances are not zero-copyable on any backend (PyTorch DLPack `bytes_offset` limitation);
- `StructField` members are not zero-copyable yet (see [Struct fields](#struct-fields) below); `copy=False` raises on `StructField`.
Only StructField with AOS layout should not be zero-copyable. If the layout is SOA, it should be fine as far as I know.
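The AOS-vs-SOA distinction can be seen with a plain numpy structured array: a zero-copy member view of an AOS buffer needs the *cell* stride (8 bytes here), not the member itemsize (4 bytes), which is exactly the stride information the C++ export would have to emit. This is an illustrative sketch, not Quadrants code:

```python
import numpy as np

# AOS layout: cells of {"a": f32, "b": f32} packed contiguously,
# analogous to a Quadrants StructField in AOS layout.
cells = np.zeros(4, dtype=[("a", np.float32), ("b", np.float32)])
cells["a"] = [1, 2, 3, 4]
cells["b"] = [10, 20, 30, 40]

a_view = cells["a"]              # zero-copy member view into the AOS buffer
assert a_view.strides == (8,)    # cell stride (8 bytes), not itemsize (4)
assert not a_view.flags["C_CONTIGUOUS"]

# SOA layout: each member owns a contiguous buffer, so plain contiguous
# strides suffice and zero-copy export is straightforward.
soa_a = np.array([1, 2, 3, 4], dtype=np.float32)
assert soa_a.strides == (4,)
```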
| Value | Behaviour |
|---|---|
| `None` (default) | Zero-copy view if supported, otherwise an independent copy. Never raises for support reasons. |
I think it is a bad idea to use None by default. It should rather be False.
In Genesis we started with copy=None by default, before realising it was causing trouble for users who expected a copy and did in-place modifications on the output, or forgot to copy it before storing it in a buffer.
If people care about efficiency, they should know about this copy option. If they don't, then they probably care more about reliability and user-friendliness. And I don't think it is very user-friendly that something converting a tensor to a completely different format from another library returns a shared view instead of a copy. Moreover, I think that having a consistent, simple behaviour by default is just easier to support.
There was a problem hiding this comment.
do you mean, it should default to copy=True?
### Apple Metal: synchronisation

On Apple Metal, Quadrants and PyTorch MPS use separate Metal command queues. Quadrants kernel writes are made visible to the MPS-backed view via an automatic `qd.sync()` on every `to_torch()` / `to_numpy()` call. Cloning a view also synchronises MPS so that the clone sees the latest writes:
via an automatic `qd.sync()` on every `to_torch()` / `to_numpy()` call
I don't think this is a good idea (see my earlier comment), at least it should be optional.
There was a problem hiding this comment.
so what do you think should happen?
Coverage Report

| File | Coverage | Missing |
|---|---|---|
| 🟢 python/quadrants/lang/_ndarray.py | 81% | 453,468,481 |
| 🔴 python/quadrants/lang/field.py | 44% | 19,115-117,119,122-131,134,137,160,164,167,169,171,177,191,194-196,199-200,207,216,218,221,228,230,233,236-237,240,247-249,251-253,259,384,396,493-494,549,560,579 |
| 🔴 python/quadrants/lang/matrix.py | 66% | 22,1416,1438,1452,1849,1866-1870,1889,1990,2010,2033 |
| 🔴 python/quadrants/lang/struct.py | 50% | 552,566 |
| 🟢 tests/python/test_to_torch_copy.py | 83% | 28,52-55,57-59,147,259-261,280-282,300-302,323-325,344-346,356,372-374,396-398,400-401,412,414-417,419-421,427,429-432,434-437,518-521,523-524 |

Diff coverage: 73% · Overall: 52% · 482 lines, 128 missing
…DA GPUs
Restrict to arch=[qd.cuda] since qd.cfg.arch.name gives 'amdgpu'/'metal', which are not valid torch device strings (torch expects 'cuda'/'mps').
04f6f20 to
1603ed9
Compare
|
@claude review |
The C++ field_to_dlpack eagerly imports torch to check byte_offset support. When torch is absent, this raises ModuleNotFoundError on a numpy-only API. Catch the error gracefully: copy=False raises a clear ValueError, copy=True falls back to the kernel-copy path. Document the torch prerequisite for field zero-copy numpy in interop.md.
Coverage Report

| File | Coverage | Missing |
|---|---|---|
| 🟢 python/quadrants/lang/_ndarray.py | 81% | 453,468,481 |
| 🔴 python/quadrants/lang/field.py | 44% | 19,115-117,119,122-131,134,137,160,164,167,169,171,177,192,195-197,200-201,208,217,219,222,229,231,234,237-238,241,248-250,252-254,260,385,397,494-495,550,561,580 |
| 🔴 python/quadrants/lang/matrix.py | 66% | 22,1416,1438,1452,1849,1866-1870,1889,1990,2010,2033 |
| 🔴 python/quadrants/lang/struct.py | 50% | 552,566 |
| 🟢 tests/python/test_to_torch_copy.py | 82% | 28,55-61,63-64,152,264-266,285-287,305-307,328-330,349-351,361,377-379,401-403,405-406,417,419-422,424-426,432,434-437,439-442,523-526,528-529 |

Diff coverage: 73% · Overall: 73% · 486 lines, 130 missing
Wrap single-line docstrings in MatrixNdarray.to_torch (132c) and VectorNdarray.to_torch (154c) to stay within 120c. Also fix a 122c line in _is_aos_struct_member docstring from previous reflow.
- 0-dim ScalarField copy=False raises (to_torch and to_numpy) - Unsupported dtype (f16) copy=False raises - Cross-device transfer with copy=False raises - MatrixField (2x3, not just vectors) copy=False and copy=True - MatrixNdarray to_numpy(copy=False) - AoS struct member to_numpy(copy=False) raises - copy=True independence checks for vector/matrix fields and ndarrays
- Add is_ndarray parameter to _can_zerocopy_field so ndarrays skip the Metal torch-version gate (ndarray_to_dlpack uses byte_offset=0) and the 0-dim ScalarField carve-out (only applies to fields-in-tree). - Catch RuntimeError from C++ field_to_dlpack for non-dense SNodes and convert to ValueError for consistent error types. - Tighten pytest.raises(Exception) to ValueError in Vulkan test. - Skip Metal in tests when torch < 2.9.2 (old DLPack bytes_offset). - Drop to_dlpack from StructField API listing (base class raises NotImplementedError).
Coverage Report

| File | Coverage | Missing |
|---|---|---|
| 🟢 python/quadrants/lang/_ndarray.py | 81% | 453,468,481 |
| 🔴 python/quadrants/lang/field.py | 47% | 19,115-117,119,122-131,134,137,160,167,169,177,192,195-197,200-201,208,217,219,222,229,231,234,237-238,241,248-250,254,260-262,266,268,393,405,502-503,558,569,588 |
| 🔴 python/quadrants/lang/matrix.py | 76% | 22,1416,1438,1452,1849,1869,1889,1992,2012,2035 |
| 🔴 python/quadrants/lang/struct.py | 50% | 552,566 |
| 🟢 tests/python/test_to_torch_copy.py | 86% | 28,55-61,63-64,152,264-266,285-287,305-307,328-330,349-351,361,377-379,401-403,405-406,417,419-422,424-426,432,434-437,439-442,523-526,528-529,605-608,610-611,685-687 |

Diff coverage: 79% · Overall: 74% · 634 lines, 135 missing
Coverage Report

| File | Coverage | Missing |
|---|---|---|
| 🟢 python/quadrants/lang/_ndarray.py | 81% | 453,468,481 |
| 🔴 python/quadrants/lang/field.py | 45% | 19,115-117,119,122-131,134,137,160,167,170,178,193-194,196,199-201,204-205,212,221,223,226,233,235,238,241-242,245,252-254,258,264-266,270-274,276,401,413,510-511,566,577,596 |
| 🔴 python/quadrants/lang/matrix.py | 76% | 22,1416,1438,1452,1849,1869,1889,1992,2012,2035 |
| 🔴 python/quadrants/lang/struct.py | 50% | 552,566 |
| 🟢 tests/python/test_to_torch_copy.py | 85% | 28,30,32-33,60-66,68-69,157,269-271,290-292,310-312,333-335,354-356,366,382-384,406-408,410-411,422,424-427,429-431,437,439-442,444-447,528-531,533-534,610-613,615-616,690-692 |

Diff coverage: 78% · Overall: 64% · 646 lines, 144 missing
- Fix ruff I001 import sorting in test_to_torch_copy.py - Restore to_dlpack in StructField API listing (inherited from Field base class, so it appears in the actual API even though it raises NotImplementedError) - Restore deleted TODO comments on runtime_ops.sync() calls
NumPy 2.x flags arrays imported from DLPack v0 PyCapsules as non-writable. Update interop.md to clarify that to_numpy(copy=False) returns a read-only view and direct users to to_torch(copy=False) for writable zero-copy views. Replace the numpy mutation example with a read-only usage example.
NumPy >= 2.0 marks arrays from v0 capsules as read-only. A future PR should upgrade the C++ to emit DLManagedTensorVersioned (v1) with flags=0 for the numpy path.
|
@claude review |
LGTM — duburcqa approved and prior review feedback addressed; only minor nits remain.
Overview
This PR adds zero-copy DLPack interop to to_torch()/to_numpy() across ScalarField, MatrixField, all Ndarray variants, and StructField (per-member). Adds a copy= keyword: copy=True (default) keeps the kernel-copy path; copy=False returns a DLPack-aliased view or raises. New interop.md docs explain the support matrix, lifetime caveats, and Metal sync semantics.
Security risks
None. This is an in-process Python/C++ data-conversion API on already-trusted runtime memory. No auth, crypto, or untrusted-input boundaries.
Level of scrutiny
Public API change touching widely-used types — normally warrants close human review. However: duburcqa (active maintainer) has explicitly approved with 'ok to merge'; my prior review history on this PR has flagged 12+ issues that the author addressed with code or doc fixes; the two bugs found in this run are explicitly nit-level (the dtype=qd.f32 ValueError affects no real callers given the documented contract, and the test-skip helper only affects local Metal+stable-torch dev which isn't a CI configuration).
Other factors
The PR has had multiple coverage iterations (final ~73-78% on changed lines), the test file for the new feature is comprehensive (covers AOS-vs-SOA struct layouts, 0-dim scalars, unsupported dtypes, cross-device transfer rejection, Metal sync, MatrixField shape preservation under keep_dims, copy=True independence). The remaining nits are noted inline.
| if copy is False: | ||
| arr = _try_zerocopy_numpy(self, copy=False, is_scalar=True) | ||
| if arr is not None: | ||
| if dtype is not None and arr.dtype != dtype: | ||
| raise ValueError(f"copy=False is incompatible with dtype conversion ({arr.dtype} -> {dtype})") | ||
| return arr |
🟡 Pre-existing nit / API inconsistency. Field.to_numpy(copy=False, dtype=qd.f32) (and the same in MatrixField.to_numpy, Ndarray.to_numpy, MatrixNdarray.to_numpy, VectorNdarray.to_numpy) raises ValueError("copy=False is incompatible with dtype conversion (float32 -> qd.f32)") even though the underlying dtype IS f32. The comparison arr.dtype != dtype (field.py:579) compares a numpy.dtype to a quadrants DataTypeCxx, which evaluates True due to type mismatch. The base class type hint advertises dtype: DataTypeCxx | None = None (field.py:404) while the override docstring says "Optional target numpy dtype" — the type hint is the misleading half. Fix: normalize via to_numpy_type() if dtype is a DataTypeCxx before the comparison. Severity: nit — no real callers pass quadrants dtypes, the kernel-copy path also fails on dtype=qd.f32 (with TypeError from numpy refusing the unknown dtype), and the docstring is the more authoritative half of the contract.
Extended reasoning...
What the bug is
In Field.to_numpy(copy=False, dtype=...) at field.py:577-581 (and four other identical sites: matrix.py:1437-1438, matrix.py:1868-1869, matrix.py:2011-2012, _ndarray.py:467-468):
```python
arr = _try_zerocopy_numpy(self, copy=False, is_scalar=True)
if arr is not None:
    if dtype is not None and arr.dtype != dtype:
        raise ValueError(f"copy=False is incompatible with dtype conversion ({arr.dtype} -> {dtype})")
    return arr
```

The `arr.dtype != dtype` comparison compares `arr.dtype` (a `numpy.dtype` like `np.float32`) to whatever the caller passed for `dtype`. If the caller follows the base class type hint at field.py:404 — `dtype: DataTypeCxx | None = None` — and passes `dtype=qd.f32` (which is `quadrants_python.DataType_f32`, a pybind11 wrapper of `DataTypeCxx`), the inequality evaluates True because numpy can't coerce the pybind11 enum and falls back to identity-style comparison. Result: spurious `ValueError("copy=False is incompatible with dtype conversion (float32 -> qd.f32)")` even though the dtypes ARE compatible.
Step-by-step proof
- User reads the base class signature `Field.to_numpy(self, dtype: DataTypeCxx | None = None, *, copy=True)` and writes `f.to_numpy(copy=False, dtype=qd.f32)` for a `qd.f32` field.
- `Field.to_numpy` (field.py:566) takes the `copy is False` branch.
- `_try_zerocopy_numpy(self, copy=False, is_scalar=True)` returns a working `np.ndarray` with `arr.dtype == np.dtype('float32')`.
- Line 579: `dtype is not None` is True, and `arr.dtype != dtype` evaluates `np.dtype('float32') != quadrants_python.DataType_f32`. Numpy's `__ne__` returns `NotImplemented` for the pybind11 enum, Python falls back to identity comparison, which yields True.
- Line 580 raises `ValueError("copy=False is incompatible with dtype conversion (float32 -> qd.f32)")`.
- The user sees a confusing message claiming a dtype conversion is needed when the underlying dtype already matches.
Why existing code does not prevent it
The override docstring at field.py:569 says "Optional target numpy dtype", which would steer most users toward np.float32. But the abstract base class signature (field.py:404) explicitly types it as DataTypeCxx | None, and IDEs / help() / Sphinx surfaces show this hint to anyone introspecting the API. There is no normalization step on the input — neither isinstance(dtype, DataTypeCxx) check nor to_numpy_type() conversion before the comparison.
Why this is nit severity (and not blocking)
Four mitigations push this down from "real bug" to "API inconsistency":
- Docstring overrides type hint. The override at `ScalarField.to_numpy` (and the other 4 sites) explicitly documents `dtype` as a numpy dtype. Users reading the docs pass `np.float32` and never trip the check.
- Zero existing callers pass a DataTypeCxx. Grep across `tests/python/` confirms every `to_numpy(dtype=...)` callsite passes a numpy dtype (`np.float32`, `np.float64`). No production code in the repo follows the type hint over the docstring.
- The kernel-copy path is no better. Looking at the `copy=True` (default) path at field.py:587-595:

  ```python
  if dtype is None:
      dtype = to_numpy_type(self.dtype)
  arr = np.zeros(self.shape, dtype=dtype)
  ```

  If a user passes `dtype=qd.f32` here, `np.zeros(shape, dtype=qd.f32)` raises `TypeError: data type 'qd.f32' not understood` because numpy can't convert the pybind11 enum. So the kernel-copy path also fails on `dtype=qd.f32`, just with a different error type. This isn't a regression introduced by this PR; the API has always rejected DataTypeCxx in practice.
- Failure mode is loud, not silent. Clean `ValueError` with a (slightly misleading) message — not silent data corruption. Workaround is trivial: pass `np.float32` instead.
Addressing the refutation
The refuting verifier's points all stand:
- The docstring is the actual contract — agreed.
- All existing tests pass numpy dtypes — confirmed.
- The kernel-copy path also rejects DataTypeCxx — confirmed (different error, same outcome).
- The submitter labels this nit — agreed.
Where the refuter and I differ: they say "not a bug worth flagging", I (and 4 other verifiers) say it's worth a nit-level note because the type hint genuinely misleads, and the fix is small (~2 lines: add `if isinstance(dtype, DataTypeCxx): dtype = to_numpy_type(dtype)` before the comparison). This is "API hygiene" rather than "blocker", which is exactly what nit severity is for.
Recommended fix
Normalize the user-provided dtype before comparing, in all 5 sites:
```python
if dtype is not None:
    if isinstance(dtype, _qd_core.DataTypeCxx):
        dtype = to_numpy_type(dtype)
    if arr.dtype != dtype:
        raise ValueError(...)
```

This makes `dtype=qd.f32` and `dtype=np.float32` both behave correctly when they match the field's native dtype, and either one raises cleanly when they don't. The docstring stays "numpy dtype" but the type-hint contract is also honored.
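For reference, the spurious-mismatch mechanics reproduce with plain numpy and a stand-in for the pybind11 enum. `FakeDataTypeCxx` and `normalize` are illustrative names, not Quadrants APIs:

```python
import numpy as np

class FakeDataTypeCxx:
    """Illustrative stand-in for the pybind11 DataTypeCxx wrapper."""

qd_f32 = FakeDataTypeCxx()
arr = np.zeros(4, dtype=np.float32)

# numpy cannot interpret the foreign object as a dtype, so the
# inequality reports a mismatch even though the data is float32.
print(arr.dtype != qd_f32)  # True -- the spurious ValueError trigger

def normalize(dtype):
    # Illustrative to_numpy_type(): map the wrapper to its numpy dtype.
    return np.float32 if isinstance(dtype, FakeDataTypeCxx) else dtype

print(arr.dtype != normalize(qd_f32))  # False -- comparison now succeeds
```

With normalization in front of the comparison, both spellings of the dtype argument take the same path.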
self-classified by ai as nit
| _NO_ZEROCOPY_ARCHS = {qd.vulkan} | ||
|
|
||
|
|
||
| def _skip_if_no_zerocopy(): | ||
| """Skip the current test if the active backend doesn't support DLPack zero-copy.""" | ||
| if qd.cfg.arch in _NO_ZEROCOPY_ARCHS: | ||
| pytest.skip(f"DLPack zero-copy not supported on {qd.cfg.arch.name}") | ||
| if qd.cfg.arch == qd.metal: | ||
| from quadrants.lang.field import ( | ||
| _TORCH_MPS_SUPPORTS_DLPACK_BYTES_OFFSET, # pylint: disable=C0415 | ||
| ) | ||
|
|
||
| if not _TORCH_MPS_SUPPORTS_DLPACK_BYTES_OFFSET: | ||
| pytest.skip("Metal DLPack zero-copy requires torch >= 2.9.2") |
🟡 The test helper _skip_if_no_zerocopy() (test_to_torch_copy.py:22-35) skips all tests when running on Apple Metal with torch ≤ 2.9.1, but production _can_zerocopy_field (field.py:168-170) gates the Metal version check on if not is_ndarray: — the C++ ndarray_to_dlpack hardcodes byte_offset = 0 and never calls torch_supports_byte_offset(). As a result, ndarray-only tests like test_scalar_ndarray_copy_false, test_matrix_ndarray_copy_false, test_vector_ndarray_copy_false (and the to_numpy variants) get spuriously skipped on Metal+old-torch even though they would pass. CI is unaffected (uses torch nightly per macosx/4_test.sh) — only local Metal dev with stable torch loses this coverage. Fix: thread an is_ndarray=False flag through the helper and only enforce the Metal version gate when it's a field.
Extended reasoning...
What the bug is
_skip_if_no_zerocopy() in tests/python/test_to_torch_copy.py:22-35 is the gate every copy=False test calls before exercising the new zero-copy API:
```python
def _skip_if_no_zerocopy():
    if qd.cfg.arch in _NO_ZEROCOPY_ARCHS:
        pytest.skip(...)
    if qd.cfg.arch == qd.metal:
        from quadrants.lang.field import _TORCH_MPS_SUPPORTS_DLPACK_BYTES_OFFSET
        if not _TORCH_MPS_SUPPORTS_DLPACK_BYTES_OFFSET:
            pytest.skip("Metal DLPack zero-copy requires torch >= 2.9.2")
```

The Metal-version branch is uniform across all tests, so the helper fires for ndarray tests and field tests alike. Production code at field.py:160-176 (`_can_zerocopy_field`) is explicitly not uniform:
```python
if not is_ndarray:
    if arch == _ARCH_METAL and not _TORCH_MPS_SUPPORTS_DLPACK_BYTES_OFFSET:
        return False
...
```
Why the production carve-out is correct
The C++ has two separate DLPack export paths:
- `field_to_dlpack` (dlpack_funcs.cpp:226): calls `torch_supports_byte_offset()` because field-in-tree offsets need non-zero `byte_offset` support, which torch added in 2.9.2 (Honor dlpack byte_offset, pytorch/pytorch#168193).
- `ndarray_to_dlpack` (dlpack_funcs.cpp:424): hardcodes `dl_tensor.byte_offset = 0`. Always works regardless of torch version on MPS — the v0 capsule with zero offset has been supported by every torch DLPack importer that supports MPS at all.
Step-by-step proof
- Local dev box: macOS Apple Silicon, torch == 2.9.1 (stable), `qd.init(arch=qd.metal)`.
- `_TORCH_MPS_SUPPORTS_DLPACK_BYTES_OFFSET` is computed at import time as `tuple(map(int, ["2","9","1"])) > (2, 9, 1)` → False.
- User runs `pytest tests/python/test_to_torch_copy.py::test_scalar_ndarray_copy_false`.
- The test calls `_skip_if_no_zerocopy()` at line ~155.
- The helper hits the metal-version branch at lines 30-35 and calls `pytest.skip("Metal DLPack zero-copy requires torch >= 2.9.2")`.
- The test never runs, even though `nd.to_torch(copy=False)` would have succeeded:
  - `Ndarray.to_torch` (_ndarray.py:481) takes the `copy is False` branch and calls `_try_zerocopy_torch(self, copy=False, device=None, is_ndarray=True)`.
  - `_try_zerocopy_torch` calls `_can_zerocopy_field(self, is_scalar=False, is_ndarray=True)`.
  - `_can_zerocopy_field` at field.py:168 hits `if not is_ndarray:` → False → skips the Metal version check entirely.
  - Returns True; DLPack export proceeds normally and the test would have passed.
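The import-time version gate reduces to a small pure function; a sketch of the parsing behavior under the assumption that it mirrors the field.py logic (the standalone function name here is illustrative — the real module computes the constant inline at import):

```python
def mps_supports_dlpack_byte_offset(version: str) -> bool:
    try:
        # Local-version suffixes like "+cu121" become extra dot-separated
        # components and are truncated away by the [:3] slice.
        parts = version.replace("+", ".").split(".")[:3]
        return tuple(map(int, parts)) > (2, 9, 1)
    except ValueError:
        # dev/rc tags like "2.10.0a0" fail int() and gate off safely.
        return False

print(mps_supports_dlpack_byte_offset("2.9.1"))        # False
print(mps_supports_dlpack_byte_offset("2.9.2+cu121"))  # True
print(mps_supports_dlpack_byte_offset("2.10.0a0"))     # False (safe default)
```

The tuple comparison handles multi-digit components correctly (2.10 > 2.9), which a plain string comparison would not.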
The same skip-when-should-run trace fires for test_matrix_ndarray_copy_false, test_vector_ndarray_copy_false, test_scalar_ndarray_to_numpy_copy_false, test_vector_ndarray_to_numpy_copy_false, and test_matrix_ndarray_to_numpy_copy_false — every ndarray-flavored copy=False test the PR introduces.
Why CI does not catch it
.github/workflows/scripts_new/macosx/4_test.sh has an explicit comment about installing pre-release torch nightly ≥ 2.9.2 specifically because of this PR; on CI _TORCH_MPS_SUPPORTS_DLPACK_BYTES_OFFSET evaluates True and the helper does not skip. So the bug only fires for local Metal dev on stable torch — a niche but real configuration.
Impact
- Production: zero. The production code is correct.
- Test coverage: lost on local Metal dev with stable torch. Five ndarray tests skip when they should run, hiding any regression in the `is_ndarray=True` zero-copy path on that configuration.
- Failure mode: skip rather than fail. Not silent corruption, not test failure — just absent coverage.
This is a regression of the original review's intent: the `is_ndarray` flag was added to production specifically so ndarrays would not be subject to the field-only Metal version gate, and the test helper missed the same distinction.
Suggested fix
Parametrize the helper with is_ndarray=False and skip the Metal version gate when called with is_ndarray=True:
```python
def _skip_if_no_zerocopy(*, is_ndarray=False):
    if qd.cfg.arch in _NO_ZEROCOPY_ARCHS:
        pytest.skip(f"DLPack zero-copy not supported on {qd.cfg.arch.name}")
    if qd.cfg.arch == qd.metal and not is_ndarray:
        from quadrants.lang.field import _TORCH_MPS_SUPPORTS_DLPACK_BYTES_OFFSET
        if not _TORCH_MPS_SUPPORTS_DLPACK_BYTES_OFFSET:
            pytest.skip("Metal DLPack zero-copy for fields requires torch >= 2.9.2")
```

Then the ndarray tests pass `is_ndarray=True`, mirroring the production helper's contract.
Coverage Report

| File | Coverage | Missing |
|---|---|---|
| 🟢 python/quadrants/lang/_ndarray.py | 81% | 453,468,481 |
| 🔴 python/quadrants/lang/field.py | 53% | 19,115-117,119,122-131,134,137,160,167,170,178,193-194,196,212,221,223,226,236,238,241,244-245,248,257,261,267-269,273-277,279,404,416,513-514,569,580,599 |
| 🔴 python/quadrants/lang/matrix.py | 76% | 22,1416,1438,1452,1849,1869,1889,1992,2012,2035 |
| 🔴 python/quadrants/lang/struct.py | 50% | 552,566 |
| 🟢 tests/python/test_to_torch_copy.py | 90% | 28,30,34-35,65,271-273,292-294,312-314,335-337,356-358,368,384-386,408-410,412-413,424,426-429,431-433,439,441-444,446-449,692-694 |

Diff coverage: 82% · Overall: 74% · 648 lines, 116 missing
Issue: #
Brief Summary
copilot:summary
Walkthrough
copilot:walkthrough