diff --git a/rfcs/20200612-stream-executor-c-api.md b/rfcs/20200612-stream-executor-c-api.md
index fe0f93397..88f95b8af 100644
--- a/rfcs/20200612-stream-executor-c-api.md
+++ b/rfcs/20200612-stream-executor-c-api.md
@@ -3,9 +3,9 @@
 | Status | Proposed |
 | :------------ | :------------------------------------------------------ |
 | **RFC #** | [257](https://github.com/tensorflow/community/pull/257) |
-| **Author(s)** | Anna Revinskaya (annarev@google.com), Penporn Koanantakool (penporn@google.com), Yi Situ (yisitu@google.com), Russell Power (power@google.com) |
+| **Authors** | Anna Revinskaya (annarev@google.com), Penporn Koanantakool (penporn@google.com), Yi Situ (yisitu@google.com), Russell Power (power@google.com) |
 | **Sponsor** | Gunhan Gulsoy (gunan@google.com) |
-| **Updated** | 2020-07-15 |
+| **Updated** | 2020-09-08 |
 
 # Objective
 
@@ -21,7 +21,7 @@ to the current TensorFlow runtime.
 
 * Compatibility with the
   [new TensorFlow runtime stack](https://blog.tensorflow.org/2020/04/tfrt-new-tensorflow-runtime.html).
-* APIs that will expose all device-specific capabilities.
+* APIs that will expose all device-specific capabilities.
 
 # Motivation
 
@@ -40,7 +40,7 @@ The new TensorFlow stack, based on
 [TFRT](https://blog.tensorflow.org/2020/04/tfrt-new-tensorflow-runtime.html) and
 [MLIR](https://www.tensorflow.org/mlir), is designed with this in mind. However,
 it is still in an active development phase and is not ready for third-party
-device integration until later this year. (For device support expecting to land
+device integration yet. (For device support expecting to land
 in 2021 or later, we highly recommend waiting to integrate with the new stack,
 since it is fundamentally different from the current stack and cannot guarantee
 code reuse.)
@@ -48,15 +48,15 @@ code reuse.)
 
 In the meantime, we plan to provide limited device integration support for the current TensorFlow stack through
 [Modular TensorFlow](https://github.com/tensorflow/community/blob/master/rfcs/20190305-modular-tensorflow.md).
-We anticipate three basic functionalities within a device plugin module:
+We anticipate three basic functionalities within a device plug-in module:
 
 * Device registration: Addressed in a different RFC, [Adding Pluggable Device for TensorFlow](https://github.com/tensorflow/community/pull/262).
 * Device management: The focus of this RFC.
 * Kernel and op registration and implementation:
   [RFC Accepted](https://github.com/tensorflow/community/blob/master/rfcs/20190814-kernel-and-op-registration.md).
   [C API implemented](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/c/).
 
-[StreamExecutor](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/stream_executor/stream_executor_pimpl.h;l=73) is TensorFlow's main device manager, responsible for work execution and memory management. It provides a set of methods
-(such as
+[StreamExecutor](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/stream_executor/stream_executor_pimpl.h;l=73) is TensorFlow's main device manager, responsible for work execution and memory
+management. It provides a set of methods (such as
 [Memcpy](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/stream_executor/stream_executor_internal.h;l=240))
 that can be customized for a particular device.
@@ -72,65 +72,123 @@ A decoupled way to add a new device to TensorFlow.
 * Faster time-to-solution: Does not need code review from the TensorFlow team.
 * Lower maintenance efforts: Only C-API-related changes could break the
   integration. Unrelated TensorFlow changes would not break the code.
-    * The C APIs may be changed during the initial experimental phase based
-      on developer experience and feedback. When the APIs become more mature,
-      we will try to keep them stable (in a best-effort manner) until the new
+    * The C APIs may be changed during the initial experimental phase based
+      on developer experience and feedback. When the APIs become more mature,
+      we will try to keep them stable (in a best-effort manner) until the new
       TensorFlow stack is available.
 
 # Design Proposal
 
-## StreamExecutorInterface
-
 [StreamExecutorInterface](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/stream_executor/stream_executor_internal.h;l=166?q=StreamExecutorinterface)
-is quite large and some of its methods are only sporadically used. Therefore, we
-plan to wrap only a subset of key StreamExecutorInterface functionality. We decided on this subset based on the PluggableDevice usecase as well as potential future devices such as TPUs.
-
-Implementation conventions:
-
-* Structs include `struct_size` parameter. This parameter should be filled in both by TensorFlow and the plugin and can be checked to determine which struct fields are available for current version of TensorFlow.
-* Struct name prefixes indicates which side of the API is responsible for populating the struct:
-  * `SE_` prefix: filled by TensorFlow.
-  * `SP_` prefix: filled by plugins, except `struct_size` which is also filled by TensorFlow when TensorFlow passes it to a callback.
-
-See proposed C API below:
-
-```cpp
-#include
-#include
-
+has a large number of methods, some of which are only sporadically used.
+Therefore, we plan to wrap only a subset of key `StreamExecutorInterface`
+functionality. We decided on this subset based on the [PluggableDevice](https://github.com/tensorflow/community/pull/262)
+use case as well as potential future devices such as TPUs.
+
+## Versioning Strategy and Stability
+StreamExecutor C API follows Semantic Versioning 2.0.0 ([semver](http://semver.org/)).
+Each release version has a format `MAJOR.MINOR.PATCH`, as outlined in
+[TensorFlow version compatibility](https://www.tensorflow.org/guide/versions#semantic_versioning_20).
+We also use struct sizes to track compatibility. More details on functionality
+extension and deprecation can be found in [StreamExecutor C API Versioning Strategy](20200612-stream-executor-c-api/C_API_versioning_strategy.md).
+
+The C API will have an initial bake-in period, where we won’t have any
+compatibility guarantees. However, we will make the best effort to perform any
+updates in a backwards compatible way. For example, we plan to keep track of
+struct sizes. During this period, the API will be kept at `MAJOR` version 0.
+
+The C API will be placed in [tensorflow/c/experimental](https://cs.opensource.google/tensorflow/tensorflow/+/refs/tags/v2.3.0:tensorflow/c/experimental/).
+We will consider moving the API out of the experimental directory once it is
+more stable.
+
+## Implementation Conventions
+
+* Struct prefix indicates whether struct fields should be filled by the plug-in or core TensorFlow implementation:
+    * `SE_`: Set/filled by core, unless marked otherwise.
+    * `SP_`: Set/filled by plug-in, unless marked otherwise.
+    * This prefix rule only applies to structures. Enumerations and methods are all prefixed with `SE_`.
+* Structs begin with two fields:
+    * `size_t struct_size`: Stores the unpadded size of the struct.
+    * `void* ext`: A reserved field that may be populated by a plug-in in `SP_*` structs or potential future extension points in `SE_` structs. Must be set to zero by default if it is unused.
+* We use `struct_size` for version checking by both core and plug-in.
+    * It is exempt from the `SE/SP` rule above and must be set both by core and plug-in.
+    * It can be checked programmatically to determine which struct fields are available in the structure.
+    * For example, the `create_device` function receives `SP_Device*` as input with `struct_size` populated by core. The plug-in is responsible for setting `struct_size` as well, along with all other fields.
+* When a member is added to a struct, the struct size definition must be updated to use the new last member of the struct.
+
+## Usage Overview
+
+The table below summarizes all structures defined and the functionality they involve.
+
+| Action | Function call(s) | Populated by Core TensorFlow | Populated by plug-in |
+| :----- | :-------------- | :--------------------------- | :------------------- |
+| Register platform | `SE_InitPlugin` | `SE_PlatformRegistrationParams` | `SP_Platform`, `SP_PlatformFns` |
+| Create device | `SP_PlatformFns::create_device` | `SE_CreateDeviceParams` | `SP_Device` |
+| Create stream executor | `SP_PlatformFns::create_stream_executor` | `SE_CreateStreamExecutorParams` | `SP_StreamExecutor` |
+| Create timer functions | `SP_PlatformFns::create_timer_fns` | None | `SP_TimerFns` |
+| Get allocator stats | `SP_StreamExecutor::get_allocator_stats` | None | `SP_AllocatorStats` |
+| Memory management | `SP_StreamExecutor::*allocate*`, `SP_StreamExecutor::*memcpy*` | None | `SP_DeviceMemoryBase` |
+
+### Registration
+Core TensorFlow will register a new StreamExecutor platform as well as a new TensorFlow device with [DeviceFactory](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/core/common_runtime/device_factory.h;l=30?q=DeviceFactory).
+1. Core TensorFlow links to the plug-in's dynamic library and loads the function `SE_InitPlugin`.
+2. Core TensorFlow populates `SE_PlatformRegistrationParams` and passes it in a call to `SE_InitPlugin`.
+    * In `SE_InitPlugin`, the plug-in populates `SE_PlatformRegistrationParams::SP_Platform` and `SE_PlatformRegistrationParams::SP_PlatformFns`.
+3. Core TensorFlow can now create a device, a stream executor, and a timer through functions in `SP_PlatformFns`.
+    * Core TensorFlow populates `SE_CreateDeviceParams` and passes it as a parameter to `SP_PlatformFns::create_device()`.
+    * Plug-in populates `SE_CreateDeviceParams::SP_Device`.
+    * Core TensorFlow populates `SE_CreateStreamExecutorParams` and passes it to `SP_PlatformFns::create_stream_executor()`.
+    * Plug-in populates `SE_CreateStreamExecutorParams::SP_StreamExecutor`.
+    * Core TensorFlow sets `struct_size` in `SP_TimerFns` and passes it in a call to `SP_PlatformFns::create_timer_fns`.
+    * Plug-in populates `SP_TimerFns`.
+4. Core TensorFlow registers a new `PluggableDeviceFactory`.
+
+`PluggableDevice` is covered in a separate RFC: [Adding Pluggable Device For TensorFlow](https://github.com/tensorflow/community/pull/262).
+
+
+### Definitions from Plug-in
+A plug-in needs to provide:
+* Methods: `SE_InitPlugin` and other methods declared in `SP_*` structs.
+* Structures: `SP_Stream_st`, `SP_Event_st`, and `SP_Timer_st`.
+
+## Detailed API
+```c++
 #define SE_MAJOR 0
 #define SE_MINOR 0
 #define SE_PATCH 1
 
+// TF_Bool is the C API typedef for unsigned char, while TF_BOOL is
+// the datatype for boolean tensors.
+#ifndef TF_Bool
+#define TF_Bool unsigned char
+#endif // TF_Bool
+
+// Macro used to calculate struct size for maintaining ABI stability across
+// different struct implementations.
+#ifndef TF_OFFSET_OF_END
+#define TF_OFFSET_OF_END(TYPE, MEMBER) \
+  (offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))
+#endif // TF_OFFSET_OF_END
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-typedef SP_Stream_st* SP_Stream;
-typedef SP_Event_st* SP_Event;
-typedef SP_Timer_st* SP_Timer;
-typedef TF_Status* (*TF_StatusCallbackFn)(void*);
-
-#ifndef TF_BOOL_DEFINED
-#define TF_BOOL unsigned char
-#endif // TF_BOOL_DEFINED
-
-#ifndef TF_OFFSET_OF_END
-#define TF_OFFSET_OF_END(TYPE, MEMBER) (offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))
-#endif // TF_OFFSET_OF_END
+typedef struct SP_Stream_st* SP_Stream;
+typedef struct SP_Event_st* SP_Event;
+typedef struct SP_Timer_st* SP_Timer;
+// Takes `callback_arg` passed to `host_callback` as the first argument.
+typedef void (*SE_StatusCallbackFn)(void* const, TF_Status* const);
 
 typedef struct SP_TimerFns {
   size_t struct_size;
-  void* ext;
-  uint64_t (*nanoseconds)(SE_Timer timer);
-  uint64_t (*microseconds)(SE_Timer timer);
+  void* ext;  // reserved for future use
+  uint64_t (*nanoseconds)(SP_Timer timer);
 } SP_TimerFns;
 
-#define SP_TIMER_FNS_STRUCT_SIZE TF_OFFSET_OF_END(SP_TimerFns, microseconds)
+#define SP_TIMER_FNS_STRUCT_SIZE TF_OFFSET_OF_END(SP_TimerFns, nanoseconds)
 
 typedef struct SP_AllocatorStats {
   size_t struct_size;
-  void* ext;
   int64_t num_allocs;
   int64_t bytes_in_use;
   int64_t peak_bytes_in_use;
@@ -148,8 +206,11 @@ typedef struct SP_AllocatorStats {
   int64_t largest_free_block_bytes;
 } SP_AllocatorStats;
 
-#define SP_ALLOCATORSTATS_STRUCT_SIZE TF_OFFSET_OF_END(SP_AllocatorStats, largest_free_block_bytes)
+#define SP_ALLOCATORSTATS_STRUCT_SIZE \
+  TF_OFFSET_OF_END(SP_AllocatorStats, largest_free_block_bytes)
 
+// Potential states for an SP_Event. If `get_event_status` returns anything
+// aside from SE_EVENT_PENDING or SE_EVENT_COMPLETE, an error has occurred;
+// SE_EVENT_UNKNOWN is a bad state.
 typedef enum SE_EventStatus {
   SE_EVENT_UNKNOWN,
   SE_EVENT_ERROR,
@@ -157,29 +218,25 @@ typedef enum SE_EventStatus {
   SE_EVENT_COMPLETE,
 } SE_EventStatus;
 
-typedef struct SE_Options {
-  size_t struct_size;
-  void* ext;
-  int32_t ordinal;
-} SE_Options;
-
-#define SE_OPTIONS_STRUCT_SIZE TF_OFFSET_OF_END(SE_Options, ordinal)
-
-typedef struct SE_DeviceMemoryBase {
+// Memory allocation information.
+// This matches DeviceMemoryBase defined here:
+// https://cs.opensource.google/tensorflow/tensorflow/+/refs/tags/v2.3.0:tensorflow/stream_executor/device_memory.h;l=57
+typedef struct SP_DeviceMemoryBase {
   size_t struct_size;
-  void* ext;
+  void* ext;  // free-form data set by plugin
+  // Platform-dependent value representing allocated memory.
   void* opaque;
-  uint64_t size;
-  uint64_t payload;
-} SE_DeviceMemoryBase;
+  uint64_t size;     // Size in bytes of this allocation.
+  uint64_t payload;  // Value for plugin's use
+} SP_DeviceMemoryBase;
 
-#define SE_DEVICE_MEMORY_BASE_STRUCT_SIZE TF_OFFSET_OF_END(SE_DeviceMemoryBase, payload)
+#define SP_DEVICE_MEMORY_BASE_STRUCT_SIZE \
+  TF_OFFSET_OF_END(SP_DeviceMemoryBase, payload)
 
 typedef struct SP_Device {
   size_t struct_size;
-  void* ext;  // free-form field filled by plugin
-  const char* name;
-  size_t name_len;
+  void* ext;        // free-form data set by plugin
+  int32_t ordinal;  // device index
 
   // Device vendor can store handle to their device representation
   // here.
@@ -188,146 +245,154 @@ typedef struct SP_Device {
 
 #define SP_DEVICE_STRUCT_SIZE TF_OFFSET_OF_END(SP_Device, device_handle)
 
+typedef struct SE_CreateDeviceParams {
+  size_t struct_size;
+  void* ext;        // reserved for future use
+  int32_t ordinal;  // device index
+
+  SP_Device* device;  // Input/output, struct_size set by TF for plugin to read.
+                      // Subsequently plugin fills the entire struct.
+} SE_CreateDeviceParams;
+
+#define SE_CREATE_DEVICE_PARAMS_STRUCT_SIZE \
+  TF_OFFSET_OF_END(SE_CreateDeviceParams, device)
+
 typedef struct SP_StreamExecutor {
   size_t struct_size;
-  void* ext;
+  void* ext;  // reserved for future use
 
   /*** ALLOCATION CALLBACKS ***/
-  // Synchronously allocates size bytes on the underlying platform and returns
-  // a DeviceMemoryBase representing that allocation. In the case of failure,
-  // nullptr is returned.
-  // memory_space is reserved for a potential future usage and should be set
+  // Synchronously allocates `size` bytes on the underlying platform and returns
+  // `SP_DeviceMemoryBase` representing that allocation. In the case of failure,
+  // NULL is returned.
+  // `memory_space` is reserved for a potential future usage and should be set
   // to 0.
-  TF_DeviceMemoryBase* (*allocate)(
-      SP_Device* se, uint64_t size, int64_t memory_space);
-
+  void (*allocate)(const SP_Device* device, uint64_t size, int64_t memory_space,
+                   SP_DeviceMemoryBase* mem);
   // Deallocate the device memory previously allocated via this interface.
-  // Deallocation of a nullptr-representative value is permitted.
-  void (*deallocate)(
-      SP_Device* se, SE_DeviceMemoryBase* memory);
+  // Deallocation of a NULL-representative value is permitted.
+  void (*deallocate)(const SP_Device* device, SP_DeviceMemoryBase* memory);
+  // Allocates a region of host memory and registers it with the platform API.
+  // Memory allocated in this manner is required for use in asynchronous memcpy
+  // operations, such as `memcpy_dtoh`.
+  void* (*host_memory_allocate)(const SP_Device* device, uint64_t size);
+
+  // Deallocates a region of host memory allocated by `host_memory_allocate`.
+  void (*host_memory_deallocate)(const SP_Device* device, void* mem);
+
+  // Allocates unified memory space of the given size, if supported. Unified
+  // memory support should be added by setting `supports_unified_memory` field
+  // in `SP_Platform`.
+  void* (*unified_memory_allocate)(const SP_Device* device, uint64_t size);
 
-  // Fill SP_AllocatorStats with allocator statistics.
-  TF_BOOL (*get_allocator_stats)(SP_Device* executor,
-                                 SP_AllocatorStats* stats);
-  // Returns the underlying device memory usage information, if it is available.
-  // If it is not available (false is returned), free/total may not be
-  // initialized.
-  TF_BOOL (*device_memory_usage)(
-      SP_Device* executor, int64_t* free, int64_t* total);
-
-  // Allocate host memory.
-  void* (*host_memory_allocate)(TF_Device* device, uint64_t size);
-
-  // Deallocate host memory.
-  void (*host_memory_deallocate)(TF_Device* device, void *mem);
-
-  // Allocates unified memory space of the given size, if supported. Support
-  // should be added by setting `supports_unified_memory` field in
-  // `DeviceDescription`.
-  void* (*unified_memory_allocate)(TF_Device* device, uint64_t bytes);
-
   // Deallocates unified memory space previously allocated with
-  // `unified_memory_allocate`.
-  void (*unified_memory_deallocate)(TF_Device* device, void* location);
+  // `unified_memory_allocate`. Unified
+  // memory support should be added by setting `supports_unified_memory` field
+  // in `SP_Platform`.
+  void (*unified_memory_deallocate)(const SP_Device* device, void* location);
+
+  // Fills SP_AllocatorStats with allocator statistics, if they are available.
+  // If they are not available, return false.
+  TF_Bool (*get_allocator_stats)(const SP_Device* device,
+                                 SP_AllocatorStats* stats);
+  // Fills the underlying device memory usage information, if it is
+  // available. If it is not available (false is returned), free/total need not
+  // be initialized.
+  TF_Bool (*device_memory_usage)(const SP_Device* device, int64_t* free,
+                                 int64_t* total);
 
   /*** STREAM CALLBACKS ***/
-  // Creates SE_Stream. This call should also Allocate stream
+  // Creates SP_Stream. This call should also allocate stream
   // resources on the underlying platform and initializes its
   // internals.
-  void (*create_stream)(SP_Device* executor, SP_Stream*, TF_Status*);
+  void (*create_stream)(const SP_Device* device, SP_Stream* stream,
+                        TF_Status* status);
 
-  // Destroys SE_Stream and deallocates any underlying resources.
-  void (*destroy_stream)(SP_Device* executor, SP_Stream stream);
+  // Destroys SP_Stream and deallocates any underlying resources.
+  void (*destroy_stream)(const SP_Device* device, SP_Stream stream);
 
-  // Causes dependent to not begin execution until other has finished its
+  // Causes `dependent` to not begin execution until `other` has finished its
   // last-enqueued work.
-  TF_BOOL (*create_stream_dependency)(
-      SP_Device* executor, SP_Stream dependent,
-      SP_Stream other);
+  void (*create_stream_dependency)(const SP_Device* device, SP_Stream dependent,
+                                   SP_Stream other, TF_Status* status);
 
   // Without blocking the device, retrieve the current stream status.
-  void (*get_status)(SP_Device* executor, SP_Stream stream,
-                     TF_Status* status);
+  void (*get_stream_status)(const SP_Device* device, SP_Stream stream,
+                            TF_Status* status);
 
   /*** EVENT CALLBACKS ***/
-  // Create SP_Event. Performs platform-specific allocation and initialization of an event.
-  void (*create_event)(
-      SP_Device* executor, SP_Event* event, TF_Status* status);
+  // Create SP_Event. Performs platform-specific allocation and initialization
+  // of an event.
+  void (*create_event)(const SP_Device* device, SP_Event* event,
+                       TF_Status* status);
 
-  // Destroy SE_Event and perform any platform-specific deallocation and cleanup of an event.
-  void (*destroy_event)(
-      SP_Device* executor, SP_Event event, TF_Status* status);
+  // Destroy SP_Event and perform any platform-specific deallocation and
+  // cleanup of an event.
+  void (*destroy_event)(const SP_Device* device, SP_Event event);
 
   // Requests the current status of the event from the underlying platform.
-  SE_EventStatus (*poll_for_event_status)(
-      SP_Device* executor, SP_Event event);
+  SE_EventStatus (*get_event_status)(const SP_Device* device, SP_Event event);
 
   // Inserts the specified event at the end of the specified stream.
-  void (*record_event)(
-      SP_Device* executor, SP_Stream stream,
-      SP_Event event, TF_Status* status);
+  void (*record_event)(const SP_Device* device, SP_Stream stream,
+                       SP_Event event, TF_Status* status);
 
   // Wait for the specified event at the end of the specified stream.
-  void (*wait_for_event)(
-      SP_Device* executor, SP_Stream stream,
-      SP_Event event, TF_Status* status);
+  void (*wait_for_event)(const SP_Device* const device, SP_Stream stream,
+                         SP_Event event, TF_Status* const status);
 
   /*** TIMER CALLBACKS ***/
-  // Creates TF_Timer. Allocates timer resources on the underlying platform and initializes its
-  // internals, setting `timer` output variable. Sets values in `timer_fns` struct.
-  void (*create_timer)(SP_Device* executor, SP_Timer* timer, SP_TimerFns* timer_fns, TF_Status* status);
+  // Creates SP_Timer. Allocates timer resources on the underlying platform
+  // and initializes its internals, setting the `timer` output variable.
+  void (*create_timer)(const SP_Device* device, SP_Timer* timer,
+                       TF_Status* status);
 
   // Destroy timer and deallocates timer resources on the underlying platform.
-  void (*destroy_timer)(SP_Device* executor, SP_Timer timer, SP_TimerFns* timer_fns);
+  void (*destroy_timer)(const SP_Device* device, SP_Timer timer);
 
   // Records a start event for an interval timer.
-  TF_BOOL (*start_timer)(
-      SP_Device* executor, SP_Stream stream, SP_Timer timer);
-
+  void (*start_timer)(const SP_Device* device, SP_Stream stream, SP_Timer timer,
+                      TF_Status* status);
   // Records a stop event for an interval timer.
-  TF_BOOL (*stop_timer)(
-      SP_Device* executor, SP_Stream stream, SP_Timer timer);
+  void (*stop_timer)(const SP_Device* device, SP_Stream stream, SP_Timer timer,
+                     TF_Status* status);
 
   /*** MEMCPY CALLBACKS ***/
   // Enqueues a memcpy operation onto stream, with a host destination location
-  // host_dst and a device memory source, with target size size.
-  TF_BOOL (*memcpy_dtoh)(
-      SP_Device* executor, SP_Stream stream,
-      void* host_dst,
-      const SE_DeviceMemoryBase* device_src,
-      uint64_t size);
-
-  // Enqueues a memcpy operation onto stream, with a device destination location
-  // and a host memory source, with target size size
-  TF_BOOL (*memcpy_htod)(
-      SP_Device* executor, SP_Stream stream,
-      SE_DeviceMemoryBase* device_dst,
-      const void* host_src, uint64_t size);
-
+  // `host_dst` and a device memory source, with target size `size`.
+  void (*memcpy_dtoh)(const SP_Device* device, SP_Stream stream, void* host_dst,
+                      const SP_DeviceMemoryBase* device_src, uint64_t size,
+                      TF_Status* status);
+
+  // Enqueues a memcpy operation onto stream, with a device destination
+  // location and a host memory source, with target size `size`.
+  void (*memcpy_htod)(const SP_Device* device, SP_Stream stream,
+                      SP_DeviceMemoryBase* device_dst, const void* host_src,
+                      uint64_t size, TF_Status* status);
+
   // Enqueues a memcpy operation onto stream, with a device destination
   // location and a device memory source, with target size `size`.
   void (*memcpy_dtod)(const SP_Device* device, SP_Stream stream,
                       SP_DeviceMemoryBase* device_dst,
                       const SP_DeviceMemoryBase* device_src, uint64_t size,
                       TF_Status* status);
-
+
   // Blocks the caller while a data segment of the given size is
   // copied from the device source to the host destination.
-  TF_BOOL (*sync_memcpy_dtoh)(
-      SP_Device* executor,
-      void* host_dst,
-      const SE_DeviceMemoryBase* device_src,
-      uint64_t size);
+  void (*sync_memcpy_dtoh)(const SP_Device* device, void* host_dst,
+                           const SP_DeviceMemoryBase* device_src, uint64_t size,
+                           TF_Status* status);
 
   // Blocks the caller while a data segment of the given size is
   // copied from the host source to the device destination.
-  TF_BOOL (*sync_memcpy_htod)(
-      SP_Device* executor,
-      SE_DeviceMemoryBase* device_dst,
-      const void* host_src, uint64_t size);
-
+  void (*sync_memcpy_htod)(const SP_Device* device,
+                           SP_DeviceMemoryBase* device_dst,
+                           const void* host_src, uint64_t size,
+                           TF_Status* status);
+
   // Blocks the caller while a data segment of the given size is copied from the
   // device source to the device destination.
   void (*sync_memcpy_dtod)(const SP_Device* device,
@@ -336,99 +401,152 @@ typedef struct SP_StreamExecutor {
                            TF_Status* status);
 
   // Causes the host code to synchronously wait for the event to complete.
-  void (*block_host_for_event)(
-      SP_Device* executor, SP_Event event, TF_Status* status);
+  void (*block_host_for_event)(const SP_Device* device, SP_Event event,
+                               TF_Status* status);
+
+  // [Optional]
+  // Causes the host code to synchronously wait for operations entrained onto
+  // stream to complete. Effectively a join on the asynchronous device
+  // operations enqueued on the stream before this program point.
+  // If not set, then corresponding functionality will be implemented
+  // by registering an event on the `stream` and waiting for it using
+  // `block_host_for_event`.
+  void (*block_host_until_done)(const SP_Device* device, SP_Stream stream,
+                                TF_Status* status);
 
   // Synchronizes all activity occurring in the StreamExecutor's context (most
   // likely a whole device).
-  TF_BOOL (*synchronize_all_activity)(SP_Device* executor);
-
-  // Obtains metadata about the underlying device.
-  void (*fill_device_description)(SP_Device* executor,
-                                  SP_DeviceDescription* description,
-                                  TF_Status* status);
+  void (*synchronize_all_activity)(const SP_Device* device, TF_Status* status);
 
   // Enqueues on a stream a user-specified function to be run on the host.
-  TF_BOOL (*host_callback)(SP_Device* executor, SP_Stream* stream,
-                           TF_StatusCallbackFn callback_fn, void* ctx);
+  // `callback_arg` must be passed as the first argument to `callback_fn`.
+  TF_Bool (*host_callback)(SP_Device* device, SP_Stream stream,
+                           SE_StatusCallbackFn callback_fn, void* callback_arg);
 } SP_StreamExecutor;
 
-#define SP_STREAMEXECUTOR_STRUCT_SIZE TF_OFFSET_OF_END(SP_StreamExecutor, host_callback)
+#define SP_STREAMEXECUTOR_STRUCT_SIZE \
+  TF_OFFSET_OF_END(SP_StreamExecutor, host_callback)
+
+typedef struct SE_CreateStreamExecutorParams {
+  size_t struct_size;
+  void* ext;  // reserved for future use
+
+  SP_StreamExecutor* stream_executor;  // output, to be filled by plugin
+} SE_CreateStreamExecutorParams;
+
+#define SE_CREATE_STREAM_EXECUTOR_PARAMS_STRUCT_SIZE \
+  TF_OFFSET_OF_END(SE_CreateStreamExecutorParams, stream_executor)
 
 typedef struct SP_Platform {
   size_t struct_size;
-
-  // Free form data set by plugin.
-  void* ext;
-
-  // Platform name
+
+  void* ext;  // free-form data set by plugin
+
+  // Platform name. Must be null-terminated.
   const char* name;
-  size_t name_len;
-
-  // Device type name, for example GPU.
-  char* type;
-  size_t type_len;
-
-  // Callbacks for creating/destroying.
-  void (*create_device)(
-      SP_Device* device,  \\ out
-      SE_Options* options, \\ in
-      TF_Status* status);  \\ out
-  void (*destroy_device)(SP_Device* device);
-
-  // Callbacks for creating/destroying SE_StreamExecutor.
-  void (*create_stream_executor)(
-      SP_StreamExecutor*,  \\ out
-      TF_Status* status);  \\ out
-  void (*destroy_stream_executor)(SP_StreamExecutor* stream_executor);
+
+  // Device type name, for example GPU. Must be null-terminated.
+  const char* type;
+
+  // Number of visible devices.
+  size_t visible_device_count;
+
+  // Whether this platform supports unified memory.
+  // Unified memory is a single memory address space that virtualizes device and
+  // host memory addresses. It is accessible to both the device and host.
+  TF_Bool supports_unified_memory;
 } SP_Platform;
 
-#define SP_PLATFORM_SIZE TF_OFFSET_OF_END(SP_Platform, destroy_stream_executor)
+#define SP_PLATFORM_STRUCT_SIZE \
+  TF_OFFSET_OF_END(SP_Platform, supports_unified_memory)
+
+typedef struct SP_PlatformFns {
+  size_t struct_size;
+
+  void* ext;  // reserved for future use
+
+  // Callbacks for creating/destroying SP_Device.
+  void (*create_device)(const SP_Platform* platform,
+                        SE_CreateDeviceParams* params, TF_Status* status);
+
+  // Clean up fields inside SP_Device that were allocated
+  // by the plugin. `device` itself should not be deleted here.
+  void (*destroy_device)(const SP_Platform* platform, SP_Device* device);
+
+  // Callbacks for creating/destroying SP_StreamExecutor.
+  void (*create_stream_executor)(const SP_Platform* platform,
+                                 SE_CreateStreamExecutorParams* params,
+                                 TF_Status* status);
+  // Clean up fields inside SP_StreamExecutor that were allocated
+  // by the plugin. `stream_executor` itself should not be deleted here.
+  void (*destroy_stream_executor)(const SP_Platform* platform,
+                                  SP_StreamExecutor* stream_executor);
+
+  // Callbacks for creating/destroying SP_TimerFns.
+  void (*create_timer_fns)(const SP_Platform* platform, SP_TimerFns* timer,
+                           TF_Status* status);
+
+  void (*destroy_timer_fns)(const SP_Platform* platform,
+                            SP_TimerFns* timer_fns);
+} SP_PlatformFns;
+
+#define SP_PLATFORM_FNS_STRUCT_SIZE \
+  TF_OFFSET_OF_END(SP_PlatformFns, destroy_timer_fns)
 
 typedef struct SE_PlatformRegistrationParams {
   size_t struct_size;
-  void* ext;
-
+  void* ext;  // reserved for future use
+
   // StreamExecutor C API version.
   int32_t major_version;
   int32_t minor_version;
   int32_t patch_version;
-
-  // Must be filled by the plugin.
-  SP_Platform platform;  // out
+
+  SP_Platform* platform;         // output, set by plugin
+  SP_PlatformFns* platform_fns;  // output, set by plugin
+  // Clean up fields inside SP_Platform that were allocated
+  // by the plugin. `platform` itself should not be deleted here.
+  void (*destroy_platform)(SP_Platform* platform);  // out, set by plugin
+  void (*destroy_platform_fns)(
+      SP_PlatformFns* platform_fns);  // out, set by plugin
 } SE_PlatformRegistrationParams;
 
-#define SE_PLATFORM_REGISTRATION_PARAMS_SIZE TF_OFFSET_OF_END(SE_PlatformRegistrationParams, platform)
+#define SE_PLATFORM_REGISTRATION_PARAMS_STRUCT_SIZE \
+  TF_OFFSET_OF_END(SE_PlatformRegistrationParams, destroy_platform_fns)
 
-void SE_InitializePlugin(SE_PlatformRegistrationParams* params, TF_Status* status);
+void SE_InitPlugin(SE_PlatformRegistrationParams* params, TF_Status* status);
 
 #ifdef __cplusplus
-} // extern "C"
+}  // extern "C"
 #endif
 ```
 
-## Registration implementation
+### PlatformId
 
-Registration will be implemented by registering a new StreamExecutor platform as well as a new TensorFlow device with [DeviceFactory](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/core/common_runtime/device_factory.h;l=30?q=DeviceFactory).
+StreamExecutor [Platform](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/stream_executor/platform.h;l=114) has an id parameter. This parameter will be hidden from the C API and set
+internally by TensorFlow instead.
+
+## Usage Example
+The following code example illustrates [PluggableDevice](https://github.com/tensorflow/community/pull/262)
+registration as outlined in the [Usage Overview](#usage-overview) section.
+
+### Core TensorFlow
 
 ```cpp
-typedef (*SEPluginInitFn)(SE_PlatformRegistrationParams*, TF_Status*);
+typedef void (*SEInitPluginFn)(SE_PlatformRegistrationParams*, TF_Status*);
 ...
-void* plugin = dlopen("myplugin.so", ...);
-if (!plugin) {
-  ... output error and skip this plugin ...
-}
-void* initialize_sym = dlsym(plugin, "SE_InitializePlugin");
+// On Windows, use `GetProcAddress` instead of `dlsym`.
+void* initialize_sym = dlsym(plugin_dso_handle, "SE_InitPlugin");
 if (!initialize_sym) {
-  ... output error and skip this plugin ...
+  // Output error and skip this plug-in.
 }
-SEPluginInitFn initialize_fn = reinterpret_cast<SEPluginInitFn>(initialize_sym);
+SEInitPluginFn initialize_fn = reinterpret_cast<SEInitPluginFn>(initialize_sym);
 SE_PlatformRegistrationParams params;
-TF_Status* status = TF_NewStatus();
+TF_Status status;
 
-initialize_fn(&params, status);
+initialize_fn(&params, &status);
 
 // Register new platform
 std::unique_ptr platform(
@@ -438,74 +556,80 @@ SE_CHECK_OK(
     std::move(platform)));
 
 // Register PluggableDevice
-std::string platform_name_str(params.params.name, params.params.name_len);
-std::string type_str(params.params.type, params.params.type_len);
-DeviceFactory::Register(type_str, new PluggableDeviceFactory(platform_name_str), priority);
+std::string platform_name_str(params.platform->name);
+std::string type_str(params.platform->type);
+DeviceFactory::Register(type_str, new PluggableDeviceFactory(platform_name_str),
+                        priority);
+...
 ```
 
-`PluggableDevice` is covered in a separate RFC: [RFC: Adding Pluggable Device For TensorFlow](https://github.com/tensorflow/community/pull/262).
-
-## PlatformId
-
-StreamExecutor [Platform](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/stream_executor/platform.h;l=114) has an id parameter. This parameter will be hidden from the C API and set internally by TensorFlow instead.
-
-## Usage example
-
-Define functions that create and destroy `SE_Device` and `SE_StreamExecutor`:
+### Plug-in
+Define functions that create and destroy `SP_Device`, `SP_StreamExecutor`, and
+`SP_TimerFns`:
 
 ```cpp
-void create_device(SP_Device* device, SE_Options* options, TF_Status* status) {
-  device->device_handle = get_my_device_handle();
+void create_device(const SP_Platform* platform, SE_CreateDeviceParams* params,
+                   TF_Status* status) {
+  params->device->device_handle = get_my_device_handle();
   ...
} -void create_stream_executor(SP_StreamExecutor* se, TF_Status* status) { - se->memcpy_from_host = my_device_memcpy_from_host_function; +void create_stream_executor(const SP_Platform* platform, + SE_CreateStreamExecutorParams* params, + TF_Status* status) { + params->stream_executor->memcpy_htod = my_device_memcpy_from_host_function; ... } -void destroy_device(SP_Device* device) { - -- destroy device handle here -- +void create_timer_fns(const SP_Platform* platform, SP_TimerFns* timer_fns, + TF_Status* status) { + timer_fns->nanoseconds = nanoseconds; ... } -void destroy_stream_executor(SP_StreamExecutor* stream_executor) { - -- perform any clean up needed for stream executor -- +void destroy_device(const SP_Platform* platform, SP_Device* device) { + // Destroy device handle here. +} +void destroy_stream_executor(const SP_Platform* platform, + SP_StreamExecutor* se) { + // Perform any cleanup needed for the stream executor. +} +void destroy_timer_fns(const SP_Platform* platform, SP_TimerFns* timer_fns) { + // Destroy timer functions here.
} ``` -Define `SE_InitializePlugin` that TensorFlow will call when registering the device plugin: +Define `SE_InitPlugin`, which TensorFlow will call when registering the device +plug-in: ```cpp -void SE_InitializePlugin(SE_PlatformRegistrationParams* params, TF_Status* status) { +void SE_InitPlugin(SE_PlatformRegistrationParams* params, TF_Status* status) { int32_t visible_device_count = 2; - std::string name = "MyDevice"; std::string type = "GPU"; - params.params.id = id; - params.params.visible_device_count = visible_device_count; - params.params.create_device = create_device; - params.params.destroy_device = destroy_device; - params.params.create_stream_executor = create_stream_executor; - params.params.destroy_stream_executor = destroy_stream_executor; - params.params.name = name.c_str(); - params.params.name_len = name.size(); - params.params.type = type.c_str(); - params.params.type_len = type.size(); + // Sets struct_size of the TensorFlow-allocated structs to valid values and + // zero-initializes their other attributes. + *params->platform = {SP_PLATFORM_STRUCT_SIZE}; + *params->platform_fns = {SP_PLATFORM_FNS_STRUCT_SIZE}; + params->platform->name = name.c_str(); + params->platform->type = type.c_str(); + params->platform->visible_device_count = visible_device_count; + params->platform_fns->create_device = create_device; + params->platform_fns->destroy_device = destroy_device; + params->platform_fns->create_stream_executor = create_stream_executor; + params->platform_fns->destroy_stream_executor = destroy_stream_executor; + params->platform_fns->create_timer_fns = create_timer_fns; + params->platform_fns->destroy_timer_fns = destroy_timer_fns; } ``` -TensorFlow will call `InitializeSEPlugin` when registering the plugin. +## Stream / Timer / Event Representation - -## Stream/Timer/Event representation - -API extension would require defining SP\_Stream\_st, SP\_Event\_st and -SP\_Timer\_st structs. 
+API extension would require defining `SP_Stream_st`, `SP_Event_st`, and +`SP_Timer_st` structs.
From the point of view of TensorFlow, we will treat their +API extension would require defining `SP_Stream_st`, `SP_Event_st`, and +`SP_Timer_st` structs. From the point of view of TensorFlow, we will treat their pointers as opaque. Underneath, StreamExecutor will rely on customized implementations of -[StreamInterface](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/stream_executor/stream_executor_internal.h;l=114?q=TimerInterface&ss=tensorflow%2Ftensorflow), -[TimerInterface](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/stream_executor/stream_executor_internal.h;l=145?q=TimerInterface&ss=tensorflow%2Ftensorflow) +[StreamInterface](https://cs.opensource.google/tensorflow/tensorflow/+/refs/tags/v2.3.0:tensorflow/stream_executor/stream_executor_internal.h;l=114), +[TimerInterface](https://cs.opensource.google/tensorflow/tensorflow/+/refs/tags/v2.3.0:tensorflow/stream_executor/stream_executor_internal.h;l=145) and -[EventInterface](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/stream_executor/stream_executor_internal.h;l=76?q=TimerInterface&ss=tensorflow%2Ftensorflow). +[EventInterface](https://cs.opensource.google/tensorflow/tensorflow/+/refs/tags/v2.3.0:tensorflow/stream_executor/stream_executor_internal.h;l=76). For example, Stream customization might look as follows: ```cpp @@ -542,15 +666,6 @@ class CStream : public StreamInterface { }; ``` -## Stability / User Impact - -The C API will be placed under _tensorflow/c/experimental/_ directory. -Initially, we won’t have any compatibility guarantees. At the same time we will -make the best effort to perform any updates in a backwards compatible way. For -e.g. we plan to keep track of struct sizes. - -We will have an initial bake-in period before we consider moving the API out of experimental directory. 
- ## Alternatives Considered * **Forking:** Contributors could always fork the TensorFlow repository, diff --git a/rfcs/20200612-stream-executor-c-api/C_API_versioning_strategy.md b/rfcs/20200612-stream-executor-c-api/C_API_versioning_strategy.md index 1e394b718..de922bdca 100644 --- a/rfcs/20200612-stream-executor-c-api/C_API_versioning_strategy.md +++ b/rfcs/20200612-stream-executor-c-api/C_API_versioning_strategy.md @@ -1,226 +1,384 @@ -**Authors**: yisitu@, penporn@, annarev@ - -**Date**: 7/9/20 +# StreamExecutor C API Versioning Strategy +| Status | Proposed | +| :------------ | :------------------------------------------------------ | +| **RFC #** | Extension of #[257](https://github.com/tensorflow/community/pull/257) | +| **Authors** | Yi Situ (yisitu@google.com), Penporn Koanantakool (penporn@google.com), Anna Revinskaya (annarev@google.com) | +| **Sponsor** | Gunhan Gulsoy (gunan@google.com) | +| **Updated** | 2020-09-08 | In reply to a question on [PR #262](https://github.com/tensorflow/community/pull/262#issuecomment-653690654). -# TensorFlow Versioning Strategy - -TensorFlow StreamExecutorInterface (SEI) uses struct_size for version checking. Struct members are not allowed to be removed or reordered. Following are concrete examples of how TensorFlow remains compatible with plug-ins when functionality is added to or removed from StreamExecutorInterface. We will be using a simplified SE_Device as an example. - -## When TensorFlow extends functionality -### Backwards compatibility -TensorFlow is compiled against a newer SEI header (v2), which has SE_Device extended with device_handle. - -**Future TensorFlow compiled against StreamExecutorInterface v2** -```cpp -// StreamExecutorInterface header version 2 -typedef struct SE_Device { - size_t struct_size; - void* next; // Always set to zero, reserved by TF for future use. +StreamExecutor C API (SE C API) follows Semantic Versioning 2.0.0 +([semver](http://semver.org/)). 
Each release version has a format +`MAJOR.MINOR.PATCH`, as outlined in [TensorFlow version compatibility](https://www.tensorflow.org/guide/versions#semantic_versioning_20). +We also use `struct_size` to track compatibility. + +## Updating Guidelines +This section outlines when to update the version numbers specific to the SE C API +(`SE_MAJOR`, `SE_MINOR`, and `SE_PATCH`). + +### SE_MAJOR +* Potentially backwards incompatible changes. +* If a change is backwards incompatible, it requires an RFC because it will + break all current plug-ins. This should be rare. +* An `SE_MAJOR` update should be planned in a way that bundles as many pending + backwards incompatible changes together as possible to avoid breaking plug-ins + multiple times. +* There will be an announcement giving a grace period before the update happens. + +### SE_MINOR +* Backwards compatible changes. + * Adding a new variable, struct, method, enumeration, etc. + * Trivial deprecation of a variable, etc. by setting it to a no-op value, + e.g., 0 or `NULL`. + +### SE_PATCH +* Backwards compatible bug fixes. + +## Conventions +* Once a member is added to a struct, it cannot be removed, reordered, renamed, + or repurposed (i.e., assigned a different functionality). +* "Renaming" a member is equivalent to adding a new member with a new name and + eventually deprecating the member with the old name. +* Fields that cannot be 0 or `NULL` can be deprecated in a backwards compatible + manner by zero-initialization. + * If the field is set by core TensorFlow, plug-ins must perform input validation + on these fields for 0 and `NULL` before accessing them. + * Plug-ins know the fields are deprecated when they find 0 or `NULL` in + these fields. + * If the field is set by the plug-in, TF can check if the field is non-zero (or not + `NULL`) and print a warning if so.
+ * Such fields must be explicitly marked by comments, to ensure all plug-ins + have consistent behavior (e.g., none of the plug-ins is using 0 or `NULL` as + a special case). See `// 0 is no-op` and `// NULL is no-op` in the + [By value inspection](#by-value-inspection) section for an example. + + +## Detecting Incompatibility + +### By Comparing SE_MAJOR at Registration Time +At load time, both the plug-in and core TensorFlow should check for version +compatibility. If the versions are not compatible, the plug-in should output an +error and core TensorFlow should unload the plug-in. See the code example below. + +Core TensorFlow passes its SE C API version number when calling the plug-in's +initialization routine (`SE_InitPlugin`): +```c++ +typedef void (*SEInitPluginFn)(SE_PlatformRegistrationParams*, TF_Status*); +SE_PlatformRegistrationParams params{SE_PLATFORM_REGISTRATION_PARAMS_STRUCT_SIZE}; +params.major_version = SE_MAJOR; +params.minor_version = SE_MINOR; +params.patch_version = SE_PATCH; +TF_Status status; + +// Core TensorFlow sends its view of version numbers to the plug-in. +void* initialize_sym = dlsym(plugin, "SE_InitPlugin"); +if (!initialize_sym) { + // Output error and skip this plug-in. +} +SEInitPluginFn initialize_plugin_fn = reinterpret_cast<SEInitPluginFn>(initialize_sym); +initialize_plugin_fn(&params, &status); +if (!tensorflow::StatusFromTF_Status(&status).ok()) { + // Output error and skip this plug-in. +} +``` - const char* name; - size_t name_len; - void* device_handle; -} SE_Device; +The plug-in checks the `SE_MAJOR` version numbers and outputs an error if they +don't match: +```c++ +void SE_InitPlugin(SE_PlatformRegistrationParams* params, + TF_Status* status) { + if (params->struct_size == 0) { + // *status = ... + LOG(ERROR) << "Invalid argument."; + return; + } + if (SE_MAJOR != params->major_version) { + // *status = ... + LOG(ERROR) << "Unsupported major version. 
Given: " << params->major_version + << " Expected: " << SE_MAJOR; + return; + } + ...
+} +``` -// Evaluates to 40 -#define SE_DEVICE_STRUCT_SIZE TF_OFFSET_OF_END(SE_Device, device_handle) +### By Value Inspection +Deprecation of an attribute can sometimes be done in a backwards compatible +manner by leaving the attribute zero-initialized. + +* The plug-in performs input validation on each field for a `NULL` or 0 value + before consuming it, which prevents it from entering a bad state. +* If deprecation by zero-initialization is not possible (e.g., because the default + value of zero may be a valid input), then the change is API incompatible; + TensorFlow has to bump the major version when the attribute is deprecated. + +For example, + +```c++ +struct Example { + int32_t cannot_be_zero; // 0 is no-op. + void* cannot_be_null; // NULL is no-op. + int32_t can_be_zero; + void* can_be_null; + int32_t optional_zero_default; // Optional. 0 by default. + void* optional_null_default; // Optional. NULL by default. +}; ``` - -The plugin was compiled against an older version of SEI header without device_handle. - -**Older Plugin compiled against StreamExecutorInterface v1** -```cpp -// StreamExecutorInterface header version 1 -typedef struct SE_Device { +* `cannot_be_zero` and `cannot_be_null` here can be deprecated by + zero-initializing. +* `can_be_zero` and `can_be_null` need an `SE_MAJOR` version bump for deprecation, + since 0 and `NULL` are valid values for them. +* `optional_zero_default` and `optional_null_default` are optional fields that + use 0 / `NULL` to indicate that the field is not provided. This needs an + `SE_MAJOR` version bump for deprecation as well, since 0 and `NULL` are valid + here. + +For other unintentional changes caused by bugs (e.g., a field that was +mistakenly left uninitialized), file a GitHub issue. + +### By Checking Struct Size +Backwards compatible changes within the same `SE_MAJOR` version can only add new +members to a struct and cannot modify any existing member.
Because of this, we +can check the byte offset of the variable we want to consume against the struct +size to see if the struct has the variable or not. + +# Usage Example + +The following are concrete examples of how TensorFlow remains compatible with +plug-ins when functionality is added to or removed from StreamExecutorInterface. + +## Extending Functionality +The following snippet shows `void* new_field1` and `int new_field2` being added +to a `Toy` struct. + +```diff +#define SE_MAJOR 1 +- #define SE_MINOR 0 ++ #define SE_MINOR 1 // Increment minor version. +#define SE_PATCH 0 + +typedef struct Toy { size_t struct_size; - void* next; - - const char* name; - size_t name_len; -} SE_Device; - + void* ext; // Free-form data set by plug-in. + int32_t old_field; // Device index. ++ void* new_field1; // NULL is no-op. ++ int new_field2; // 0 is no-op. +} Toy; + +- // Evaluates to 20 +- #define TOY_STRUCT_SIZE TF_OFFSET_OF_END(Toy, old_field) ++ // Evaluates to 36 ++ #define TOY_STRUCT_SIZE TF_OFFSET_OF_END(Toy, new_field2) +``` -// Evaluates to 32 -#define SE_DEVICE_STRUCT_SIZE TF_OFFSET_OF_END(SE_Device, name_len) +To concisely cover compatibility of cases where structs are created by core +TensorFlow and by plug-ins, we will call the side that creates the struct +`producer`, and the side that takes the struct `consumer`. -// Plugin Implementation +### Producer Has Older Header Files -SE_Device* Plugin_CreateDevice() { - SE_Device* se = new SE_Device{ SE_DEVICE_STRUCT_SIZE }; - // Based on header v1, se->struct_size will be 32 +```cpp +// Producer implementation has v1.0.0 headers. +Toy* create_toy() { + Toy* toy = new Toy{TOY_STRUCT_SIZE}; + // Based on header v1.0.0, toy->struct_size is 20. ... - return se; + toy->old_field = set_old_field(); + return toy; } -``` - -TensorFlow checks that struct_size must be greater than the offset of device_handle before accessing it.
- -```cpp -// TF Implementation - -void TF_Foo(const SE_Device* device) { - // TF checks for struct_size greater than 32. - if (device->struct_size > offsetof(SE_Device, device_handle)) { - // TF knows that device_handle can be safely read from. - DoSomething(device->device_handle); +// Consumer implementation has v1.1.0 headers. +void take_toy(const Toy* toy) { + // Consumer checks for `struct_size` greater than 24 (offset of `new_field1`). + // In this case, `toy->struct_size` = 20 so this `if` is not entered. + if (toy->struct_size > offsetof(Toy, new_field1) && toy->new_field1 != NULL) { + // Safe to access `new_field1`. + } + // Consumer checks for `struct_size` greater than 32 (offset of `new_field2`). + // In this case, `toy->struct_size` = 20 so this `if` is not entered. + if (toy->struct_size > offsetof(Toy, new_field2) && toy->new_field2 != 0) { + // Safe to access `new_field2`. } } ``` -### Forwards compatibility - -In the event that a plugin is up to date or newer, se->struct_size would have been initialized to 48. This would then pass the TF_Foo() check above and device_handle can be safely accessed. - -**Future Plugin compiled against StreamExecutorInterface v3** +### Producer Has Newer Header Files ```cpp -// StreamExecutorInterface header version 3 -typedef struct SE_Device { - size_t struct_size; - void* next; - - const char* name; - size_t name_len; - void* device_handle; - void* data; // Added in v3 -} SE_Device; - -// Evaluates to 48 -#define SE_DEVICE_STRUCT_SIZE TF_OFFSET_OF_END(SE_Device, data) - - -// Plugin Implementation - -SE_Device* Plugin_CreateDevice() { - SE_Device* se = new SE_Device{ SE_DEVICE_STRUCT_SIZE }; - // se->struct_size will be 48 +// Producer implementation has v1.1.0 headers. +Toy* create_toy() { + Toy* toy = new Toy{TOY_STRUCT_SIZE}; + // Based on header v1.1.0, toy->struct_size is 36. ...
- return se; + toy->old_field = set_old_field(); + toy->new_field1 = set_new_field1(); + toy->new_field2 = set_new_field2(); + return toy; +} +// Consumer implementation has v1.0.0 headers. +void take_toy(const Toy* toy) { + // `new_field1` and `new_field2` are safely ignored + // because the consumer doesn't know about them. } ``` + +If the `producer` depends on the `consumer` knowing about `new_field1` and `new_field2`, +adding `new_field1` and `new_field2` would be a backwards incompatible change +and `SE_MAJOR` should be bumped instead. -Using the same TF_Foo() above, TF_Foo() was implemented before SE_Device::data was added after SE_Device::device_handle. Since TensorFlow only knows about the members that come before SE_Device::data, the newly added device->data will not be accessed. - -## When TensorFlow deprecates functionality -When functionality is being deprecated, there will be comments next to the member indicating so. The member is left in place to preserve the alignment and offset of the existing structure members. - -Since members are not allowed to be removed or reordered, refactors (e.g. renaming device_handle to dev_handle) or changing of member types (e.g. from int to float) are considered as deprecation. - -### Backwards compatibility -SE_Device::data has been deprecated in version 4, and a comment in the header indicated as such. +## Deprecating Functionality + +When functionality is being deprecated, there will be comments next to the +member indicating so. The member is left in place to preserve the alignment and +offset of the existing structure members. General guidelines: +* Add comments saying which field will be deprecated. +* The minor update will still support `deprecating_feature` to allow time for + transition. This would be a good time to raise concerns on GitHub. +* After the transition time has passed, `deprecating_feature` can be removed in + a major update.
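The zero-initialization rule for deprecated fields can be exercised in plain C. The sketch below is illustrative only and is not TensorFlow code: `Toy` mirrors this document's example struct at v1.2.0, `OFFSET_OF_END` is a local stand-in assumed to behave like `TF_OFFSET_OF_END`, and `make_toy_v1_2` and `take_toy` are hypothetical names. The newer producer never sets the deprecated `new_field1`, and an older consumer skips it because it reads as `NULL`.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for TF_OFFSET_OF_END (assumed equivalent): the byte offset
 * just past the end of MEMBER inside TYPE. */
#define OFFSET_OF_END(TYPE, MEMBER) \
  (offsetof(TYPE, MEMBER) + sizeof(((TYPE*)0)->MEMBER))

/* Hypothetical Toy struct at v1.2.0: new_field1 is deprecated but kept in
 * place so that the offsets of later members do not change. */
typedef struct Toy {
  size_t struct_size;
  void* ext;          /* Free-form data set by plug-in. */
  int32_t old_field;  /* Device index. */
  void* new_field1;   /* Deprecated. NULL is no-op. */
  int new_field2;     /* 0 is no-op. */
} Toy;

#define TOY_STRUCT_SIZE OFFSET_OF_END(Toy, new_field2)

/* Producer built against v1.2.0 headers. Members not named in the
 * initializer are zero-initialized, so the deprecated new_field1 is NULL
 * without the producer ever touching it. */
Toy make_toy_v1_2(int32_t old_field, int new_field2) {
  Toy toy = {.struct_size = TOY_STRUCT_SIZE};
  toy.old_field = old_field;
  toy.new_field2 = new_field2;
  return toy;
}

/* Consumer built against v1.1.0 headers, which still uses new_field1.
 * A field is read only if struct_size covers it AND it does not hold its
 * documented no-op value. Returns a bitmask of the fields it consumed:
 * bit 0 for new_field1, bit 1 for new_field2. */
unsigned take_toy(const Toy* toy) {
  unsigned consumed = 0u;
  if (toy->struct_size > offsetof(Toy, new_field1) &&
      toy->new_field1 != NULL) {
    consumed |= 1u; /* Would use toy->new_field1 here. */
  }
  if (toy->struct_size > offsetof(Toy, new_field2) &&
      toy->new_field2 != 0) {
    consumed |= 2u; /* Would use toy->new_field2 here. */
  }
  return consumed;
}
```

With a struct from `make_toy_v1_2(3, 7)`, `take_toy` consumes only `new_field2`; a struct from an older producer that still sets `new_field1` to a non-`NULL` value would have both bits set.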
-**Future TensorFlow compiled against StreamExecutorInterface v4** - -```cpp -// StreamExecutorInterface header version 4 -typedef struct SE_Device { +Since members are not allowed to be removed or reordered, refactors (e.g., +renaming `device_handle` to `dev_handle`) or changes to member types (e.g., from +`int` to `float`) are treated as +[deprecation with extension](#deprecation-with-extension). + +The following code snippet shows the deprecation of `new_field1`. +```diff +#define SE_MAJOR 1 +- #define SE_MINOR 1 ++ #define SE_MINOR 2 // Increment minor version. +#define SE_PATCH 0 + +typedef struct Toy { size_t struct_size; - void* next; - - const char* name; - size_t name_len; - void* device_handle; - void* data; // Deprecated -} SE_Device; - -// Evaluates to 48 -#define SE_DEVICE_STRUCT_SIZE TF_OFFSET_OF_END(SE_Device, data) + void* ext; // Free-form data set by plug-in. + int32_t old_field; // Device index. +- void* new_field1; // NULL is no-op. ++ void* new_field1; // Deprecated. NULL is no-op. + int new_field2; // 0 is no-op. +} Toy; + +// Evaluates to 36 +#define TOY_STRUCT_SIZE TF_OFFSET_OF_END(Toy, new_field2) +``` -// TF Implementation -void TF_Foo(const SE_Device* device) { - // TF checks for struct_size greater than 32. - if (device->struct_size > offsetof(SE_Device, device_handle)) { - // TF knows that device_handle can be safely accessed. - DoSomething(device->device_handle); - } +### Producer Has Older Header Files - // TensorFlow removes implementation to stop using deprecated functionality. - /* - if (device->struct_size > offsetof(SE_Device, data)) { - // TF knows that device->data can be safely accessed. - DoSomething(device->data); +```diff +// Producer implementation has v1.1.0 headers.
+Toy* create_toy() { + Toy* toy = new Toy{TOY_STRUCT_SIZE}; + ... + toy->old_field = set_old_field(); + toy->new_field1 = set_new_field1(); + toy->new_field2 = set_new_field2(); + return toy; +} +// Consumer implementation has v1.2.0 headers. +void take_toy(const Toy* toy) { +- // Consumer removes the code using `new_field1`. +- if (toy->struct_size > offsetof(Toy, new_field1) && toy->new_field1 != NULL) { +- // Safe to access `new_field1`. +- } + if (toy->struct_size > offsetof(Toy, new_field2) && toy->new_field2 != 0) { + // Safe to access `new_field2`. } - */ - } ``` -The plugin, being older, was initializing the recently deprecated SE_Device::data. Since TF_Foo() does not access it anymore, SE_Device::data will be safely ignored (even though it was initialized). - -### Forwards compatibility -Plugins may choose to support older TensorFlow releases that have deprecated functionality. - -In a simple case, TensorFlow is already performing input validation and capable of providing best effort forward compatibility with newer plugins. - -**Older TensorFlow compiled against StreamExecutorInterface v4 with data validation** +The producer, being older, initializes the recently deprecated `new_field1`. +Since the consumer's `take_toy` does not access it anymore, `new_field1` will be +safely ignored (even though it was initialized). -```cpp -void TF_Foo(const SE_Device* device) { - ... - // TF checks for struct_size greater than offset of data, and also validates device->data. - if (device->struct_size > offsetof(SE_Device, data) && device->data != nullptr) { - // TF knows that data can be safely accessed. - DoSomething(device->data); +### Producer Has Newer Header Files + +```diff +// Producer implementation has v1.2.0 headers. +Toy* create_toy() { + Toy* toy = new Toy{TOY_STRUCT_SIZE}; ++ // `new_field1` is zero-initialized with the line above. + ... + toy->old_field = set_old_field(); +- toy->new_field1 = set_new_field1(); // Stops setting the deprecated `new_field1`.
+ toy->new_field2 = set_new_field2(); + return toy; +} +// Consumer implementation has v1.1.0 headers. +void take_toy(const Toy* toy) { ++ // `new_field1` is `NULL` so it is safely ignored. ++ // Can also add code to raise an error here when `NULL` is detected. + if (toy->struct_size > offsetof(Toy, new_field1) && toy->new_field1 != NULL) { + // Safe to access `new_field1`. + } + if (toy->struct_size > offsetof(Toy, new_field2) && toy->new_field2 != 0) { + // Safe to access `new_field2`. } } ``` - -This way, plugins can safely remove implementation of deprecated functionality. - -**Future Plugin compiled against StreamExecutorInterface v5** -```cpp -// StreamExecutorInterface header version 5 -typedef struct SE_Device { - size_t struct_size; - void* next; - - const char* name; - size_t name_len; - void* device_handle; - void* data; // Deprecated in v4 - void* data2; -} SE_Device; +This way, plug-ins can safely remove their implementation of deprecated functionality. -// Evaluates to 56 -#define SE_DEVICE_STRUCT_SIZE TF_OFFSET_OF_END(SE_Device, data2) +## Deprecation with Extension +This is the more common form of deprecation, in which the struct is extended with a +new attribute that replaces an existing one. The analysis is the same as +[Extending functionality](#extending-functionality) and +[Deprecating functionality](#deprecating-functionality) combined. +General guidelines: +* Add comments saying which field will be deprecated and which one will replace + it. +* Increment the minor version. +* The minor update will support both `name` and `better_name` to allow time for + transition. This would be a good time to raise concerns on GitHub. +* After the transition time has passed, `name` can be removed in a major update. -// Plugin Implementation +Below are some examples.
-SE_Device* Plugin_CreateDevice() { - SE_Device* se = new SE_Device{ SE_DEVICE_STRUCT_SIZE }; - // se->struct_size will be 56 - se->device_handle = GetHandle(); +```diff +#define SE_MAJOR 5 +- #define SE_MINOR 0 ++ #define SE_MINOR 1 // Increment minor version +#define SE_PATCH 0 - // se->data was deprecated so ignore it. It was already zero initialized - // at “SE_Device{ SE_DEVICE_STRUCT_SIZE }” above. +// Case 1 - Renaming an attribute +typedef struct Device { + size_t struct_size; + void* ext; + int32_t ordinal; - se->data2 = GetData2(); - return se; -} +- const char* name; ++ const char* name; // Deprecating soon. Use `better_name`. + void* device_handle; ++ const char* better_name; // Replaces `name`. +} Device; + + +// Case 2 - Deprecation of an entire struct can be done without a replacement... ++ // `Device` struct will be deprecated soon. +typedef struct Device { +... +} Device; + +// ...or with a replacement ++ // Replaces `Device`. ++ typedef struct BetterDevice { ++ ... ++ } BetterDevice; + +// Case 3 - Renaming a function. +typedef struct ExportFunctions { +... ++ // create_device will be deprecated soon. + void (*create_device)(Device* device); + ++ // Replaces `create_device`. ++ void (*create_better_device)(BetterDevice* device); +} ExportFunctions; ``` - -In a more complex scenario, an older TensorFlow release might consume deprecated functionality for granted. - -**Older TensorFlow compiled against StreamExecutorInterface v4 without data validation** -```cpp -void TF_Foo(const SE_Device* device) { - ... - // TF checks for struct_size greater than 32. - // No input validation. - if (device->struct_size > offsetof(SE_Device, data)) { - // Will crash on null pointer dereference. - Dereference(device->data); - } -} -``` - -In this case, it is recommended for plugins to continue to keep the deprecated implementation around.
Once the plugin stops supporting the latest version of TensorFlow that uses the deprecated functionality, the implementation can be safely removed. This comes at the cost of maintenance of legacy deprecated code on the plugin side. - -## Limitations +# Limitations * Maximum supported alignment is 8 bytes.
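The byte counts used in the `Toy` examples (20, 24, 32, and 36) follow from natural alignment on a typical LP64 target, where `size_t` and pointers are 8 bytes and `int32_t` and `int` are 4 bytes. That is an assumption; other ABIs give different numbers. As a hypothetical, self-contained C sketch (not TensorFlow code, with a local `OFFSET_OF_END` assumed to behave like `TF_OFFSET_OF_END`), the block below reproduces those offsets and shows how a newer consumer can tell that an older producer's struct does not contain `new_field1` or `new_field2`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for TF_OFFSET_OF_END (assumed equivalent). */
#define OFFSET_OF_END(TYPE, MEMBER) \
  (offsetof(TYPE, MEMBER) + sizeof(((TYPE*)0)->MEMBER))

/* Hypothetical Toy struct at v1.1.0. Every member has an alignment of at
 * most 8 bytes, within the limitation stated above. */
typedef struct Toy {
  size_t struct_size;
  void* ext;
  int32_t old_field;
  void* new_field1;  /* NULL is no-op. */
  int new_field2;    /* 0 is no-op. */
} Toy;

/* struct_size reported by a v1.0.0 producer, which only knew old_field. */
#define TOY_V1_0_STRUCT_SIZE OFFSET_OF_END(Toy, old_field)
/* struct_size reported by a v1.1.0 producer. */
#define TOY_V1_1_STRUCT_SIZE OFFSET_OF_END(Toy, new_field2)

/* A member starting at `offset` exists in the producer's struct only if
 * struct_size extends past that offset. This works because members are
 * only ever appended, never removed or reordered. */
int toy_has_member(const Toy* toy, size_t offset) {
  return toy->struct_size > offset;
}

/* Simulates a v1.0.0 producer. A real v1.0.0 struct is physically smaller;
 * here the full layout is kept so the example stays well-defined, and the
 * trailing members are filled with junk bytes the consumer must not read. */
Toy make_toy_v1_0(int32_t old_field) {
  Toy toy;
  memset(&toy, 0xAB, sizeof toy);
  toy.struct_size = TOY_V1_0_STRUCT_SIZE;
  toy.ext = NULL;
  toy.old_field = old_field;
  return toy;
}
```

On an ILP32 target the pointer-sized members shrink to 4 bytes, so the concrete numbers change, but the `struct_size > offsetof(...)` comparison stays valid.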