@Andy-Jost
Contributor

Summary

Replace the _CUDA_DRIVER_API_V1 capsule with direct extraction of function pointers from cuda.bindings.cydriver.__pyx_capi__ at module import time.

Changes

  • Remove CudaDriverApiV1 struct and _get_cuda_driver_api_v1_capsule() from _resource_handles.pyx
  • Remove load_driver_api(), ensure_driver_loaded(), and related machinery from resource_handles.cpp
  • Add extern declarations for driver function pointers in resource_handles.hpp
  • Populate the function pointers at module import by calling PyCapsule_GetPointer() on each __pyx_capi__ entry, using PyCapsule_GetName() to retrieve the capsule name, which Cython sets to the function signature
  • Update DESIGN.md to reflect the new __pyx_capi__ approach

This simplifies the architecture by eliminating the custom capsule struct and its loading machinery. The driver function pointers are now populated directly from Cython's built-in cross-module API mechanism.
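
For readers unfamiliar with the capsule mechanism, here is a minimal sketch of the lookup pattern in pure Python via ctypes. It does not touch cuda.bindings: `fake_capi`, `fakeDriverFn`, `FAKE_PROTO`, and `resolve` are hypothetical stand-ins invented for illustration, and the dict only imitates the shape of a Cython `__pyx_capi__` table (name mapped to a capsule whose name is the signature string). The actual PR does this in C++/Cython, not in Python.

```python
import ctypes

# Expose the CPython capsule C API through ctypes (pythonapi holds the GIL).
_api = ctypes.pythonapi
_api.PyCapsule_New.restype = ctypes.py_object
_api.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
_api.PyCapsule_GetName.restype = ctypes.c_char_p
_api.PyCapsule_GetName.argtypes = [ctypes.py_object]
_api.PyCapsule_GetPointer.restype = ctypes.c_void_p
_api.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

# Hypothetical stand-in for a driver entry point (not a real CUDA function).
FAKE_PROTO = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)


@FAKE_PROTO
def fake_driver_fn(x):
    return x + 1


# PyCapsule_New stores the name pointer without copying it, so the bytes
# object must stay alive for as long as the capsule does.
SIGNATURE = b"int (int)"

# Imitation of a __pyx_capi__ table: function name -> capsule.
fake_capi = {
    "fakeDriverFn": _api.PyCapsule_New(
        ctypes.cast(fake_driver_fn, ctypes.c_void_p), SIGNATURE, None
    )
}


def resolve(capi, name):
    """Return (signature, address) for one entry of a capsule table."""
    capsule = capi[name]
    # The capsule name doubles as the signature string.
    signature = _api.PyCapsule_GetName(capsule)
    # GetPointer requires the matching name; a mismatch raises ValueError.
    address = _api.PyCapsule_GetPointer(capsule, signature)
    return signature.decode(), address


signature, address = resolve(fake_capi, "fakeDriverFn")
restored = FAKE_PROTO(address)  # rebuild a callable from the raw address
print(signature, restored(41))
```

Because PyCapsule_GetPointer() checks the name it is given against the capsule's own name, passing an expected signature instead of the one returned by PyCapsule_GetName() would turn the lookup into a signature check as well.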

Stats: 4 files changed, 228 insertions(+), 392 deletions(-) (net -164 lines)

Test Plan

  • Build succeeds
  • Import works on CPU-only machines (no GPU/driver required)
  • test_stream.py, test_event.py, test_memory.py, test_device.py pass
  • CI tests pass

Closes #1450

Replace the _CUDA_DRIVER_API_V1 capsule with direct extraction of function
pointers from cuda.bindings.cydriver.__pyx_capi__ at module import time.

This simplifies the architecture by eliminating the custom capsule struct
and its associated loading machinery (load_driver_api, ensure_driver_loaded,
cuGetProcAddress resolution). The driver function pointers are now populated
directly from Cython's built-in cross-module API mechanism.

Closes NVIDIA#1450
@Andy-Jost added this to the cuda.core beta 11 milestone on Jan 12, 2026
@Andy-Jost added the enhancement (Any code-related improvements), P0 (High priority - Must do!), and cuda.core (Everything related to the cuda.core module) labels on Jan 12, 2026
@Andy-Jost self-assigned this on Jan 12, 2026
@copy-pr-bot
Contributor

copy-pr-bot bot commented Jan 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Andy-Jost
Contributor Author

/ok to test 3921ca6

Collaborator

@rwgk rwgk left a comment

Awesome! I didn't find anything visually, or with Cursor.

If we had more CI resources we'd be flying! :-)

@kkraus14
Collaborator

Instead of reaching for __pyx_capi__, which is an implementation detail of Cython, did we try using cdef api and cdef public? https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#c-api-declarations

These declarations make Cython generate header file(s) that we can include directly, instead of reaching into internal structs generated by Cython.

Development

Successfully merging this pull request may close these issues.

Investigate using __pyx_capi__ to simplify resource handle architecture