[Cuda] Move implementations from jit_cuda.h into jit_cuda.cpp, so easier to … #23
Conversation
…modify without triggering major recompilation
Pull Request Overview
This PR refactors jit_cuda.h by moving inline implementations into jit_cuda.cpp, reducing header recompilation when implementations change.
- Declarations for `JITModuleCUDA` and `JITSessionCUDA` methods are moved to the header.
- Method definitions are relocated to the `.cpp` file.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| taichi/runtime/cuda/jit_cuda.h | Stripped out inline implementations, leaving only declarations |
| taichi/runtime/cuda/jit_cuda.cpp | Added moved implementations for JITModuleCUDA and JITSessionCUDA |
bo-rc
left a comment
LGTM (moving implementations to .cpp files only).
Would be great to have PR titles that fit in one line. I think this is good practice that makes it more convenient to get a quick overview of the git tree.
duburcqa
left a comment
I don't get why it was implemented in the header before. It doesn't make any sense to me.
Fair. Feel free to refuse to Approve PRs until I modify the title to your satisfaction, so I can get a feel for what works :)
I know why, because I always do the same thing: it's easier to write a small class in a single file than to spend a few minutes splitting it across two and adding ...
Ok, I'm probably just so used to splitting declaration and definition that it feels completely natural to me.
I used to rather like generating the header file contents automatically: https://github.com/hughperkins/DeepCL/blob/46a7bba5eb9f0012d9d705e20c64848bad01285a/src/clmath/CLMathWrapper.h#L37-L55
Introduce the helper machinery that the per-class to_torch / to_numpy methods will migrate to in subsequent commits. Existing public symbols (`can_zerocopy`, `dlpack_to_torch`, `invalidate_zerocopy_cache`, `current_arch_is_cpu`) are preserved as deprecated shims so the in-tree pre-rework callers continue to work; they will be removed once every call site is migrated.

New surface:

- `_ZerocopyCache`: per-instance container with two independent slots (torch tensor + numpy ndarray), each filled lazily on first access via `torch.utils.dlpack.from_dlpack` and `numpy.from_dlpack` respectively. Numpy zero-copy now bypasses torch entirely (closes review #6).
- `make_zerocopy_cache_if_supported(owner, ...)`: constructs a cache when zero-copy is supported and registers `owner` with `pyquadrants.cache_holders` so invalidation is wired automatically (closes review #18).
- `get_zerocopy_torch` / `get_zerocopy_numpy`: thin entry points that implement the always-zerocopy-then-clone semantic (closes review #15, #16, #21) and the Apple Metal double-sync (`qd.sync()` on read AND `torch.mps.synchronize()` after `.clone()`/`.to()`; closes review #1, #22, #23).

Also applies the small lints from the review:

- Module-level constant for the torch>2.9.1 MPS bytes_offset probe; drops the pointless `lru_cache` wrapper around a zero-arg helper (closes review #2).
- ASCII '...' instead of Unicode horizontal ellipsis '\u2026' in the docstring (closes review #3).
- Top-level imports for numpy and torch (try/except for the no-torch CI case); no per-call lazy imports in the new code path (closes review #7, #9).

The deprecated shim still does what the existing per-class methods expect; the new helpers are torch-clean. `cache_holders` is still empty until the next commits register Ndarray / ScalarField / MatrixField; this commit alone is behaviourally a no-op.
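The two-slot lazy cache described in this commit message can be sketched as below. This is a minimal illustration of the pattern, not the actual pyquadrants code: the class name, slot layout, and callable-based export are assumptions, and the torch slot is left as a stub since only the numpy path is exercised here.

```python
import numpy as np


class ZerocopyCache:
    """Sketch: per-instance container with two independently lazy slots.

    Each slot is filled on first access via a zero-copy DLPack import.
    The numpy slot never goes through torch (hypothetical shape of the
    _ZerocopyCache described above, not the real implementation).
    """

    def __init__(self, export_dlpack):
        # export_dlpack: callable returning an object implementing __dlpack__
        self._export_dlpack = export_dlpack
        self._numpy = None
        self._torch = None  # would be filled via torch.utils.dlpack.from_dlpack

    def get_numpy(self):
        if self._numpy is None:
            # numpy.from_dlpack gives a zero-copy view of the exporter
            self._numpy = np.from_dlpack(self._export_dlpack())
        return self._numpy

    def invalidate(self):
        # drop both slots; next access rebuilds them lazily
        self._numpy = None
        self._torch = None


# Demo: a numpy array is itself a DLPack exporter, so it can stand in
# for the device buffer.
src = np.arange(6, dtype=np.float32)
cache = ZerocopyCache(lambda: src)
view = cache.get_numpy()
src[0] = 42.0  # mutating the source is visible through the zero-copy view
```

Keeping the two slots independent means a caller that only ever asks for numpy never pays for (or imports) torch, which is the point of review #6.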
Convert `Ndarray.to_torch` / `_ndarray_to_numpy` / `_ndarray_matrix_to_numpy` to thin calls into `_interop.get_zerocopy_torch` / `get_zerocopy_numpy`.

Behaviour changes vs. the previous PR state:

- `to_torch(copy=True)` now ALSO uses the zerocopy export and clones the result, instead of skipping zerocopy entirely. Zerocopy DLPack is cheaper than the kernel-copy fallback even when followed by a clone (closes review #15, #16, #21).
- `to_torch` on Apple Metal now does both syncs: `qd.sync()` before the read AND `torch.mps.synchronize()` after the `.clone()` so the cloned buffer is actually populated by the time the call returns (closes review #1, #22, #23).
- `to_numpy(copy=False)` now goes through `numpy.from_dlpack` directly -- no torch round-trip, no torch import required (closes review #6).
- `to_numpy` default (`copy=None`) keeps the current "independent copy" semantics: numpy arrays conventionally outlive their source, so the default is the safer one. Zero-copy is opt-in via `copy=False`.

Cache invalidation switches from the legacy `_qd_dlpack_tc` attribute to the per-class `_zerocopy_cache` (cached_property) + `_invalidate_zerocopy_cache` method. Registration in `pyquadrants.cache_holders` happens automatically on first cache access (closes review #18).

The existing `_reset` hook drops its zerocopy invalidation call: that work is now handled by the `cache_holders` loop in `impl.reset()` BEFORE C++ teardown, eliminating the use-after-free that motivated the Genesis `to_numpy()` always-copy workaround.
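The always-zerocopy-then-copy semantic this commit describes can be sketched with a numpy-only stand-in (hypothetical helper and argument names; the real code goes through DLPack and torch as well):

```python
import numpy as np


def to_numpy(export_dlpack, copy=None):
    """Sketch of the semantic above, not the real Ndarray method.

    copy=None / copy=True -> independent array: zero-copy export, then
                             materialise with .copy() (cheaper than a
                             kernel-copy fallback, per the commit message)
    copy=False            -> zero-copy view, no torch round-trip
    """
    view = np.from_dlpack(export_dlpack())
    if copy is False:
        return view
    return view.copy()


src = np.ones(4, dtype=np.float32)
aliased = to_numpy(lambda: src, copy=False)   # shares memory with src
independent = to_numpy(lambda: src)           # default: safe independent copy
src[0] = 7.0  # visible through `aliased`, not through `independent`
```

The default staying "independent copy" matches the lifetime argument above: numpy arrays conventionally outlive their source, so aliasing is opt-in.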
Convert `ScalarField.to_torch` / `to_numpy` to thin calls into `_interop.get_zerocopy_torch` / `get_zerocopy_numpy`, mirroring the Ndarray migration in the previous commit.

- Adds `Field._invalidate_zerocopy_cache` method (used by all subclasses) so the `impl.reset()` `cache_holders` loop has a single contract to call.
- Adds `ScalarField._zerocopy_cache` as a cached_property; constructed once per instance instead of re-checking `can_zerocopy` on every call (closes review #17).
- Registers ScalarField in `pyquadrants.cache_holders` on first cache access (closes review #18).
- `to_torch(copy=True)` now zerocopies and clones rather than skipping zerocopy entirely (closes review #15, #16, #21); `to_torch` on Metal does both syncs (closes review #1, #22, #23).
- `to_numpy(copy=False)` goes through `numpy.from_dlpack` directly, no torch round-trip and no torch import required (closes review #6).

Default `to_numpy` semantic (`copy=None`) remains "independent copy" for lifetime safety; zero-copy is opt-in via `copy=False`.
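The cached_property-plus-registration pattern this commit describes can be sketched as follows. Class and attribute names are taken from the commit message; the registry and reset loop are simplified stand-ins for `pyquadrants.cache_holders` and `impl.reset()`:

```python
import weakref
from functools import cached_property

# stand-in for pyquadrants.cache_holders; WeakSet so holders can be GC'd
cache_holders = weakref.WeakSet()


class Field:
    """Single invalidation contract shared by all field subclasses."""

    def _invalidate_zerocopy_cache(self):
        # cached_property stores its value in the instance dict;
        # popping it forces a rebuild on next access
        self.__dict__.pop("_zerocopy_cache", None)


class ScalarField(Field):
    @cached_property
    def _zerocopy_cache(self):
        # constructed once per instance; registration happens automatically
        # on first cache access, so callers never wire invalidation manually
        cache_holders.add(self)
        return {}  # placeholder for the real per-instance cache object


def reset():
    # mimics the impl.reset() loop: invalidate every registered holder
    # BEFORE runtime teardown, so no cached view outlives its buffer
    for holder in list(cache_holders):
        holder._invalidate_zerocopy_cache()


f = ScalarField()
_ = f._zerocopy_cache  # first access builds the cache and registers f
reset()                # cache dropped; next access would rebuild and re-register
```

Running invalidation before teardown (rather than in a per-object `_reset` hook after it) is what removes the use-after-free window mentioned in the previous commit.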
Convert `MatrixField.to_torch` / `to_numpy` to thin calls into `_interop.get_zerocopy_torch` / `get_zerocopy_numpy`.

Adds a single `_matrix_view_shape()` helper that returns `(expected_shape, as_vector)`; both `to_torch` and `to_numpy` now go through it instead of duplicating the n / m / keep_dims branching (closes review #13, #14).

Same wins as the ScalarField migration:

- `_zerocopy_cache` as cached_property (closes review #17)
- automatic registration with `pyquadrants.cache_holders` (closes #18)
- `copy=True` now zerocopies + clones (closes #15, #16, #21)
- `to_numpy(copy=False)` goes through `numpy.from_dlpack` directly, no torch round-trip (closes #6)
- Apple Metal double-sync via the centralised helpers (closes #1, #22, #23)

The redundant reshape after zerocopy stays (DLPack returns the field's flat n-d shape; numpy/torch users expect the matrix dims appended) but it's now in one place per method instead of two.
Issue: #
Brief Summary
No code changes. Just moves implementations from .h to .cpp, so they are easier to modify without triggering major recompilation.
copilot:summary
Walkthrough
copilot:walkthrough