Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
122 changes: 122 additions & 0 deletions cpp/src/arrow/c/abi.h
Original file line number Diff line number Diff line change
Expand Up @@ -15,10 +15,27 @@
// specific language governing permissions and limitations
// under the License.

/// \file abi.h Arrow C Data Interface
///
/// The Arrow C Data interface defines a very small, stable set
/// of C definitions which can be easily copied into any project's
/// source code and vendored to be used for columnar data interchange
/// in the Arrow format. For non-C/C++ languages and runtimes,
/// it should be almost as easy to translate the C definitions into
/// the corresponding C FFI declarations.
///
/// Applications and libraries can therefore work with Arrow memory
/// without necessarily using the Arrow libraries or reinventing
/// the wheel. Developers can choose between tight integration
/// with the Arrow software project or minimal integration with
/// the Arrow format only.

#pragma once

#include <stdint.h>

// Spec and documentation: https://arrow.apache.org/docs/format/CDataInterface.html

#ifdef __cplusplus
extern "C" {
#endif
Expand Down Expand Up @@ -65,6 +82,61 @@ struct ArrowArray {

#endif // ARROW_C_DATA_INTERFACE

#ifndef ARROW_C_DEVICE_DATA_INTERFACE
#define ARROW_C_DEVICE_DATA_INTERFACE

// Spec and Documentation: https://arrow.apache.org/docs/format/CDeviceDataInterface.html

// DeviceType for the allocated memory
typedef int32_t ArrowDeviceType;

// CPU device, same as using ArrowArray directly
#define ARROW_DEVICE_CPU 1
// CUDA GPU Device
#define ARROW_DEVICE_CUDA 2
// Pinned CUDA CPU memory by cudaMallocHost
#define ARROW_DEVICE_CUDA_HOST 3
// OpenCL Device
#define ARROW_DEVICE_OPENCL 4
// Vulkan buffer for next-gen graphics
#define ARROW_DEVICE_VULKAN 7
// Metal for Apple GPU
#define ARROW_DEVICE_METAL 8
// Verilog simulator buffer
#define ARROW_DEVICE_VPI 9
// ROCm GPUs for AMD GPUs
#define ARROW_DEVICE_ROCM 10
// Pinned ROCm CPU memory allocated by hipMallocHost
#define ARROW_DEVICE_ROCM_HOST 11
// Reserved for extension
#define ARROW_DEVICE_EXT_DEV 12
// CUDA managed/unified memory allocated by cudaMallocManaged
#define ARROW_DEVICE_CUDA_MANAGED 13
// unified shared memory allocated on a oneAPI non-partitioned device.
#define ARROW_DEVICE_ONEAPI 14
// GPU support for next-gen WebGPU standard
#define ARROW_DEVICE_WEBGPU 15
// Qualcomm Hexagon DSP
#define ARROW_DEVICE_HEXAGON 16

struct ArrowDeviceArray {
// the Allocated Array
//
// the buffers in the array (along with the buffers of any
// children) are what is allocated on the device.
struct ArrowArray array;
// The device id to identify a specific device
int64_t device_id;
// The type of device which can access this memory.
ArrowDeviceType device_type;
// An event-like object to synchronize on if needed.
void* sync_event;
Copy link

@GregoryKimball GregoryKimball May 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you please help me understand the void* sync_event pointer a bit better in the context of CUDA/C++? Is this a function pointer that is expected to be called with a cudaStream_t parameter provided by the application? Would there be a benefit from storing a cudaStream_t in ArrowDeviceArray (in the reserved bytes or elsewhere)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of CUDA, this would be a pointer to a cudaEvent_t. It would be the producer's responsibility to create the event and call cudaEventRecord to ensure the relevant work in the stream(s) are captured in the event. The consumer can then call cudaStreamWaitEvent which is typically a device-side more efficient stream synchronization mechanism than cudaStreamSynchronize (if they need to wait for host code they can still use cudaEventSynchronize instead).

If both sides are on the same stream, then the cudaStreamWaitEvent call should have negligible overhead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, in the context of CUDA/C++ the void* sync_event would be expected to be a cudaEvent_t* that the producer created and will trigger based on when the data is available.

A consumer would then do something like:

auto event = reinterpret_cast<cudaEvent_t*>(device_array->sync_event);
cudaStreamWaitEvent(stream, event);
// add processing for device_array data to stream to process *after* waiting on the event.

As per the previous discussions, most frameworks for cuda don't actually make their internal streams easily externally available so we aren't expecting a cudaStream_t to get passed. In the future if usage deems it necessary, we could absolutely leverage the reserved bytes to add a stream/queue pointer or something. But the initial pass here is intended to pass an event via the void* sync_event that a queue can wait on from the producer and then just operate on the data from there.

// Reserved bytes for future expansion.
int64_t reserved[3];
};

#endif // ARROW_C_DEVICE_DATA_INTERFACE

#ifndef ARROW_C_STREAM_INTERFACE
#define ARROW_C_STREAM_INTERFACE

Expand Down Expand Up @@ -106,6 +178,56 @@ struct ArrowArrayStream {

#endif // ARROW_C_STREAM_INTERFACE

#ifndef ARROW_C_DEVICE_STREAM_INTERFACE
#define ARROW_C_DEVICE_STREAM_INTERFACE

// Equivalent to ArrowArrayStream, but for ArrowDeviceArrays.
//
// This stream is intended to provide a stream of data on a single
// device, if a producer wants data to be produced on multiple devices
// then multiple streams should be provided. One per device.
struct ArrowDeviceArrayStream {
// The device that this stream produces data on.
ArrowDeviceType device_type;

// Callback to get the stream schema
// (will be the same for all arrays in the stream).
//
// Return value 0 if successful, an `errno`-compatible error code otherwise.
//
// If successful, the ArrowSchema must be released independently from the stream.
// The schema should be accessible via CPU memory.
int (*get_schema)(struct ArrowDeviceArrayStream* self, struct ArrowSchema* out);

// Callback to get the next array
// (if no error and the array is released, the stream has ended)
//
// Return value: 0 if successful, an `errno`-compatible error code otherwise.
//
// If successful, the ArrowDeviceArray must be released independently from the stream.
int (*get_next)(struct ArrowDeviceArrayStream* self, struct ArrowDeviceArray* out);

// Callback to get optional detailed error information.
// This must only be called if the last stream operation failed
// with a non-0 return code.
//
// Return value: pointer to a null-terminated character array describing
// the last error, or NULL if no description is available.
//
// The returned pointer is only valid until the next operation on this stream
// (including release).
const char* (*get_last_error)(struct ArrowDeviceArrayStream* self);

// Release callback: release the stream's own resources.
// Note that arrays returned by `get_next` must be individually released.
void (*release)(struct ArrowDeviceArrayStream* self);

// Opaque producer-specific data
void* private_data;
};

#endif // ARROW_C_DEVICE_STREAM_INTERFACE

#ifdef __cplusplus
}
#endif
4 changes: 4 additions & 0 deletions docs/source/format/CDataInterface.rst
Original file line number Diff line number Diff line change
Expand Up @@ -246,6 +246,7 @@ Examples
has format string ``+us:4,5``; its two children have names ``ints`` and
``floats``, and format strings ``i`` and ``f`` respectively.

.. _c-data-interface-struct-defs:

Structure definitions
=====================
Expand Down Expand Up @@ -531,6 +532,7 @@ parameterized extension types).
The ``ArrowArray`` structure exported from an extension array simply points
to the storage data of the extension array.

.. _c-data-interface-semantics:

Semantics
=========
Expand Down Expand Up @@ -703,6 +705,8 @@ C producer examples
Exporting a simple ``int32`` array
----------------------------------

.. _c-data-interface-export-int32-schema:

Export a non-nullable ``int32`` type with empty metadata. In this case,
all ``ArrowSchema`` members point to statically-allocated data, so the
release callback is trivial.
Expand Down
Loading