diff --git a/docs/source/format/ADBC.rst b/docs/source/format/ADBC.rst new file mode 100644 index 00000000000..b71c8fe19fb --- /dev/null +++ b/docs/source/format/ADBC.rst @@ -0,0 +1,299 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +================================= +ADBC: Arrow Database Connectivity +================================= + +Rationale +========= + +The Arrow ecosystem lacks standard database interfaces built around +Arrow data, especially for efficiently fetching large datasets +(i.e. with minimal or no serialization and copying). Without a common +API, the end result is a mix of custom protocols (e.g. BigQuery, +Snowflake) and adapters (e.g. Turbodbc_) scattered across languages. +Consumers must laboriously wrap individual systems (as `DBI is +contemplating`_ and `Trino does with connectors`_). + +ADBC aims to provide a minimal database client API standard, based on +Arrow, for C, Go, and Java (with bindings for other languages). +Applications code to this API standard (in much the same way as they +would with JDBC or ODBC), but fetch result sets in Arrow format +(e.g. via the :doc:`C Data Interface <./CDataInterface>`). 
They then +link to an implementation of the standard: either directly to a +vendor-supplied driver for a particular database, or to a driver +manager that abstracts across multiple drivers. Drivers implement the +standard using a database-specific API, such as Flight SQL. + +Goals +----- + +- Provide a cross-language, Arrow-based API to standardize how clients + submit queries to and fetch Arrow data from databases. +- Support both SQL dialects and the emergent `Substrait`_ standard. +- Support explicitly partitioned/distributed result sets to work + better with contemporary distributed systems. +- Allow for a variety of implementations to maximize reach. + +Non-goals +--------- + +- Replacing JDBC/ODBC in all use cases, particularly `OLTP`_ use + cases. +- Requiring or enshrining a particular database protocol for the Arrow + ecosystem. + +Example use cases +----------------- + +A C or C++ application wishes to retrieve bulk data from a Postgres +database for further analysis. The application is compiled against +the ADBC header, and executes queries via the ADBC APIs. The +application is linked against the ADBC libpq driver. At runtime, the +driver submits queries to the database via the Postgres client +libraries, and retrieves row-oriented results, which it then converts +to Arrow format before returning them to the application. + +If the application wishes to retrieve data from a database supporting +Flight SQL instead, it would link against the ADBC Flight SQL driver. +At runtime, the driver would submit queries via Flight SQL and get +back Arrow data, which is then passed unchanged and uncopied to the +application. (The application may have to edit the SQL queries, as +ADBC does not translate between SQL dialects.) + +If the application wishes to work with multiple databases, it would +link against the ADBC driver manager, and specify the desired driver +at runtime. The driver manager would pass on API calls to the correct +driver, which handles the request. 
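The driver manager arrangement described above can be sketched in C as a lookup table of function pointers. This is only an illustration under stated assumptions, not the ADBC driver manager itself: `SketchDriver`, `SketchFindDriver`, `SketchManagerExecute`, and the two stand-in drivers are hypothetical names invented for this example, and a static table stands in for loading drivers at runtime. Only the two status values (0 and 3) are taken from `ADBC_STATUS_OK` and `ADBC_STATUS_NOT_FOUND` in `adbc.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical status type for this sketch; the two values mirror
 * ADBC_STATUS_OK and ADBC_STATUS_NOT_FOUND from adbc.h. */
typedef uint8_t SketchStatus;
#define SKETCH_STATUS_OK 0
#define SKETCH_STATUS_NOT_FOUND 3

/* Hypothetical driver: a name plus a table of function pointers. */
struct SketchDriver {
  const char* name;
  SketchStatus (*execute)(const char* query, int64_t* rows_affected);
};

/* Stand-in drivers. A real driver would talk to its database here. */
SketchStatus SketchPostgresExecute(const char* query, int64_t* rows_affected) {
  (void)query;
  *rows_affected = 1;
  return SKETCH_STATUS_OK;
}

SketchStatus SketchFlightSqlExecute(const char* query, int64_t* rows_affected) {
  (void)query;
  *rows_affected = 2;
  return SKETCH_STATUS_OK;
}

/* Assumption: drivers are registered in a static table. */
static const struct SketchDriver kSketchDrivers[] = {
    {"postgresql", SketchPostgresExecute},
    {"flightsql", SketchFlightSqlExecute},
};

/* Look up a driver by the name chosen at runtime; NULL if unknown. */
const struct SketchDriver* SketchFindDriver(const char* name) {
  size_t count = sizeof(kSketchDrivers) / sizeof(kSketchDrivers[0]);
  for (size_t i = 0; i < count; i++) {
    if (strcmp(kSketchDrivers[i].name, name) == 0) return &kSketchDrivers[i];
  }
  return NULL;
}

/* The manager exposes the same call as a driver and forwards it. */
SketchStatus SketchManagerExecute(const char* driver_name, const char* query,
                                  int64_t* rows_affected) {
  assert(driver_name != NULL && rows_affected != NULL);
  const struct SketchDriver* driver = SketchFindDriver(driver_name);
  if (driver == NULL) return SKETCH_STATUS_NOT_FOUND;
  return driver->execute(query, rows_affected);
}
```

In this sketch an application would call `SketchManagerExecute("flightsql", query, &rows)` and switch databases by changing only the driver name, which mirrors the runtime driver selection described above. The query text is passed through untouched, in keeping with the note that ADBC does not translate between SQL dialects.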
+ +ADBC API Standard 1.0.0 +======================= + +ADBC is a language-specific set of interface definitions that can be +implemented directly by a vendor-specific "driver" or a vendor-neutral +"driver manager". + +Version 1.0.0 of the standard corresponds to tag adbc-1.0.0 of the +repository ``apache/arrow-adbc``, which is commit +f044edf5256abfb4c091b0ad2acc73afea2c93c0_. Note that this is separate +from releases of the actual implementations. + +See the language-specific pages for details: + +.. toctree:: + :maxdepth: 1 + + ADBC/C + ADBC/Go + ADBC/Java + +Updating this specification +=========================== + +ADBC is versioned separately from the core Arrow project. The API +standard and components (driver manager, drivers) are also versioned +separately, but both follow semantic versioning. + +For example: components may make backwards-compatible releases as +1.0.0, 1.0.1, 1.1.0, 1.2.0, etc. They may also release +backwards-incompatible versions such as 2.0.0 that still +implement the API standard version 1.0.0. + +Similarly, this documentation describes the ADBC API standard version +1.0.0. If/when an ABI-compatible revision is made +(e.g. new standard options are defined), the next version would be +1.1.0. If incompatible changes are made (e.g. new API functions), the +next version would be 2.0.0. + +Related work +============ + +The initial proposal included a survey of existing solutions and systems, +which is reproduced below for context. Note that the +descriptions are kept up to date only on a best-effort basis. + +Comparison with Arrow Flight SQL +-------------------------------- + +Flight SQL is a **client-server protocol** oriented at database +developers. By implementing Flight SQL, a database can support +clients that use ADBC, JDBC, and ODBC. + +ADBC is an **API specification** oriented at database clients. 
By +coding to ADBC, an application can get Arrow data from a variety of +databases that use different client technologies underneath. + +Hence, the two projects complement each other. While Flight SQL +provides a client that can be used directly, we expect applications +would prefer to use ADBC instead of tying themselves to a particular +database. + +Comparison with JDBC/ODBC +------------------------- + +JDBC is a row-based API, so bridging JDBC to Arrow is hard to do +efficiently. + +ODBC provides support for bulk data with `block cursors`_, and +Turbodbc_ demonstrates that a performant Arrow-based API can be built +on top. However, it is still an awkward fit for Arrow: + +- Nulls (‘indicator’ values) are `represented as integers`_, requiring + conversion. +- `Result buffers are caller-allocated`_. This can force unnecessary + copies of data. ADBC uses the C Data Interface instead, eliminating + copies when possible (e.g. if the driver uses Flight SQL). +- Some data types are represented differently, and require + conversion. `SQL_C_BINARY`_ can sidestep this for drivers and + applications that cooperate, but then applications would have to + treat Arrow-based and non-Arrow-based data sources differently. + + - `Strings must be null-terminated`_, which would require a copy + into an Arrow array, or require that the application handle + null-terminated strings in an array. + - It is implementation-defined whether strings may have embedded + nulls, but Arrow specifies UTF-8 strings for which 0x00 is a valid + byte. + - Because buffers are caller-allocated, the driver and application + must cooperate to handle large strings; `the driver must truncate + the value`_, and the application can try to fetch the value again. + - ODBC uses length buffers rather than offsets, requiring another + conversion to/from Arrow string arrays. + - `Time intervals use different representations`_. + +Hence, we think just extending ODBC is insufficient to meet the goals +of ADBC. 
ODBC will always be valuable for wider database support, and +providing an Arrow-based API on top of ODBC is useful. ADBC would +allow implementing/optimizing this conversion in a common library, +provide a simpler interface for consumers, and provide an API +that Arrow-native or otherwise columnar systems can implement to +bypass this wrapper. + +.. figure:: ./ADBCQuadrants.svg + + ADBC, JDBC, and ODBC are database-agnostic. They define the + API that the application uses, but not how that API is implemented, + instead deferring to drivers to fulfill requests using the protocol + of their choice. JDBC and (generally) ODBC offer results in a + row-oriented format, while ADBC offers columnar Arrow data. + + Protocols/libraries like libpq (Postgres) and TDS (SQL Server) are + database-specific and row-oriented. Multiple databases may + implement the same protocol to try to reuse each other's work, + e.g. several databases implement the Postgres wire protocol to + benefit from its driver implementations. But these protocols + were not designed with multiple databases in mind, nor are they + generally meant to be used directly by applications. + + Some database-specific protocols are Arrow-native, like those of + BigQuery and ClickHouse. Flight SQL additionally is meant to be + database-agnostic, but it defines both the client-facing API and + the underlying protocol, so it's hard for applications to use it as + the API for databases that don't already implement Flight SQL. + +Existing database client APIs +----------------------------- + +:doc:`Arrow Flight SQL <./FlightSql>` + A standard building on top of Arrow Flight, defining how to use + Flight to talk to databases, retrieve metadata, execute queries, and + so on. Provides a single client, in C++ and in Java, that talks + to any database server implementing the protocol. Models its API + surface (though not API design) after JDBC and ODBC. 
+ +`DBI for R `_ + An R package/ecosystem of packages for database access. Provides a + single interface with "backends" for specific databases. While + row-oriented, `integration with Arrow is under consideration`_, + including a sketch of effectively the same idea as ADBC. + +`JDBC `_ + A Java library for database access, providing row-based + APIs. Provides a single interface with drivers for specific + databases. + +`ODBC `_ + A language-agnostic standard from the ISO/IEC for database access, + associated with Microsoft. Feature-wise, it is similar to JDBC (and + indeed JDBC can wrap ODBC drivers), but it offers columnar data + support through fetching buffers of column values. (See above for + caveats.) Provides a single C interface with drivers for specific + databases. + +`PEP 249 `_ (DBAPI 2.0) + A Python standard for database access providing row-based APIs. Not + a singular package, but rather a set of interfaces that packages + implement. + +Existing libraries +------------------ + +These are libraries which either 1) implement columnar data access for +a particular system; or 2) could be used to implement such access. + +:doc:`Arrow Flight <./Flight>` + An RPC framework optimized for transferring Arrow record batches, + with application-specific extension points but without any higher + level semantics. + +:doc:`Arrow JDBC <../java/jdbc>` + A Java submodule, part of Arrow/Java, that uses the JDBC API to + produce Arrow data. Internally, it can read data only row-at-a-time. + +`arrow-odbc `_ + A Rust community project that uses the ODBC API to produce Arrow + data, using ODBC’s buffer-based API to perform bulk copies. (See + also: Turbodbc.) + +`Arrowdantic `_ + Python bindings for an implementation of ODBC<>Arrow in Rust. + +`pgeon `_ + A client that manually parses the Postgres wire format and produces + Arrow data, bypassing JDBC/ODBC. While it attempts to optimize this + case, the Postgres wire protocol is still row-oriented. 
+ +`Turbodbc `_ + A set of Python ODBC bindings, implementing PEP 249, that also + provides APIs to fetch data as Arrow batches, optimizing the + conversion internally. + +Papers +------ + +Raasveldt, Mark, and Hannes Mühleisen. `“Don't Hold My Data Hostage - +A Case for Client Protocol Redesign”`_. In *Proceedings of the VLDB +Endowment*, 1022–1033, 2017. + +.. External link definitions follow + +.. _f044edf5256abfb4c091b0ad2acc73afea2c93c0: https://github.com/apache/arrow-adbc/commit/f044edf5256abfb4c091b0ad2acc73afea2c93c0 +.. _arrow-adbc: https://github.com/apache/arrow-adbc +.. _block cursors: https://docs.microsoft.com/en-us/sql/odbc/reference/develop-app/block-cursors?view=sql-server-ver15 +.. _DBI is contemplating: https://r-dbi.github.io/dbi3/articles/dbi3.html +.. _“Don't Hold My Data Hostage - A Case for Client Protocol Redesign”: https://ir.cwi.nl/pub/26415 +.. _integration with Arrow is under consideration: https://r-dbi.github.io/dbi3/articles/dbi3.html#using-arrowparquet-as-an-exchange-format +.. _OLTP: https://en.wikipedia.org/wiki/Online_transaction_processing +.. _represented as integers: https://docs.microsoft.com/en-us/sql/odbc/reference/develop-app/using-length-and-indicator-values?view=sql-server-ver15 +.. _Result buffers are caller-allocated: https://docs.microsoft.com/en-us/sql/odbc/reference/develop-app/allocating-and-freeing-buffers?view=sql-server-ver15 +.. _SQL_C_BINARY: https://docs.microsoft.com/en-us/sql/odbc/reference/appendixes/transferring-data-in-its-binary-form?view=sql-server-ver15 +.. _Strings must be null-terminated: https://docs.microsoft.com/en-us/sql/odbc/reference/develop-app/character-data-and-c-strings?view=sql-server-ver15 +.. _Substrait: https://substrait.io +.. _the driver must truncate the value: https://docs.microsoft.com/en-us/sql/odbc/reference/develop-app/data-length-buffer-length-and-truncation?view=sql-server-ver15 +.. 
_Time intervals use different representations: https://docs.microsoft.com/en-us/sql/odbc/reference/appendixes/c-interval-structure?view=sql-server-ver15 +.. _Trino does with connectors: https://trino.io/docs/current/connector.html diff --git a/docs/source/format/ADBC/C.rst b/docs/source/format/ADBC/C.rst new file mode 100644 index 00000000000..ee0490df368 --- /dev/null +++ b/docs/source/format/ADBC/C.rst @@ -0,0 +1,33 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +======================== +ADBC C API Specification +======================== + +In C, ADBC consists of a self-contained header. The header is +reproduced in full here, and is intended to be self-documenting. + +From apache/arrow-adbc commit f044edf5256abfb4c091b0ad2acc73afea2c93c0_: + +.. literalinclude:: ../../../../format/adbc.h + :language: c + :linenos: + :lineno-match: + :lines: 166-1123 + +.. _f044edf5256abfb4c091b0ad2acc73afea2c93c0: https://github.com/apache/arrow-adbc/commit/f044edf5256abfb4c091b0ad2acc73afea2c93c0 diff --git a/docs/source/format/ADBC/Go.rst b/docs/source/format/ADBC/Go.rst new file mode 100644 index 00000000000..b94c291c625 --- /dev/null +++ b/docs/source/format/ADBC/Go.rst @@ -0,0 +1,31 @@ +.. 
Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +========================= +ADBC Go API Specification +========================= + +In Go, ADBC consists of a set of interface definitions in the package +``github.com/apache/arrow-adbc/go/adbc``. + +Broadly, the interfaces are organized similarly to the C API +specification, and bindings to the C API can be created easily. + +See apache/arrow-adbc commit f044edf5256abfb4c091b0ad2acc73afea2c93c0_ +for the definitions. + +.. _f044edf5256abfb4c091b0ad2acc73afea2c93c0: https://github.com/apache/arrow-adbc/commit/f044edf5256abfb4c091b0ad2acc73afea2c93c0 diff --git a/docs/source/format/ADBC/Java.rst b/docs/source/format/ADBC/Java.rst new file mode 100644 index 00000000000..a799fe07451 --- /dev/null +++ b/docs/source/format/ADBC/Java.rst @@ -0,0 +1,33 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. 
http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +=========================== +ADBC Java API Specification +=========================== + +In Java, ADBC consists of a set of interface definitions in the +package ``org.apache.arrow.adbc:adbc-core``. + +Broadly, the interfaces are organized similarly to the C API +specification, but with conveniences for Java (actual enum +definitions, constants for common Arrow schemas, etc.) and make use +of the Arrow Java libraries directly instead of the C Data Interface. + +See apache/arrow-adbc commit f044edf5256abfb4c091b0ad2acc73afea2c93c0_ +for the definitions. + +.. _f044edf5256abfb4c091b0ad2acc73afea2c93c0: https://github.com/apache/arrow-adbc/commit/f044edf5256abfb4c091b0ad2acc73afea2c93c0 diff --git a/docs/source/format/ADBCQuadrants.svg b/docs/source/format/ADBCQuadrants.svg new file mode 100644 index 00000000000..6d79cf79afe --- /dev/null +++ b/docs/source/format/ADBCQuadrants.svg @@ -0,0 +1,64 @@ + + + + + + + + + + + + + + + + Database-specific + Database-agnostic + + + + + Arrow-native + Row-oriented + + + + ADBC + + JDBC + ODBC + + Flight SQL + BigQuery wire protocol + + libpq/Postgres wire protocol + TDS/SQL Server wire protocol + diff --git a/docs/source/index.rst b/docs/source/index.rst index 60879993e45..4be72554cc8 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -80,6 +80,7 @@ target environment.** format/Integration format/CDataInterface format/CStreamInterface + format/ADBC format/Other format/Glossary diff --git a/format/adbc.h b/format/adbc.h new file mode 100644 index 00000000000..a1ff53441db --- /dev/null +++ b/format/adbc.h @@ -0,0 +1,1207 @@ +// Licensed to the 
Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +/// \file adbc.h ADBC: Arrow Database Connectivity +/// +/// An Arrow-based interface between applications and database +/// drivers. ADBC aims to provide a vendor-independent API for SQL +/// and Substrait-based database access that is targeted at +/// analytics/OLAP use cases. +/// +/// This API is intended to be implemented directly by drivers and +/// used directly by client applications. To assist portability +/// between different vendors, a "driver manager" library is also +/// provided, which implements this same API, but dynamically loads +/// drivers internally and forwards calls appropriately. +/// +/// ADBC uses structs with free functions that operate on those +/// structs to model objects. +/// +/// In general, objects allow serialized access from multiple threads, +/// but not concurrent access. Specific implementations may permit +/// multiple threads. +/// +/// \version 1.0.0 + +#pragma once + +#include <stddef.h> +#include <stdint.h> + +/// \defgroup Arrow C Data Interface +/// Definitions for the C Data Interface/C Stream Interface. +/// +/// See https://arrow.apache.org/docs/format/CDataInterface.html +/// +/// @{ + +//! 
@cond Doxygen_Suppress + +#ifdef __cplusplus +extern "C" { +#endif + +// Extra guard for versions of Arrow without the canonical guard +#ifndef ARROW_FLAG_DICTIONARY_ORDERED + +#ifndef ARROW_C_DATA_INTERFACE +#define ARROW_C_DATA_INTERFACE + +#define ARROW_FLAG_DICTIONARY_ORDERED 1 +#define ARROW_FLAG_NULLABLE 2 +#define ARROW_FLAG_MAP_KEYS_SORTED 4 + +struct ArrowSchema { + // Array type description + const char* format; + const char* name; + const char* metadata; + int64_t flags; + int64_t n_children; + struct ArrowSchema** children; + struct ArrowSchema* dictionary; + + // Release callback + void (*release)(struct ArrowSchema*); + // Opaque producer-specific data + void* private_data; +}; + +struct ArrowArray { + // Array data description + int64_t length; + int64_t null_count; + int64_t offset; + int64_t n_buffers; + int64_t n_children; + const void** buffers; + struct ArrowArray** children; + struct ArrowArray* dictionary; + + // Release callback + void (*release)(struct ArrowArray*); + // Opaque producer-specific data + void* private_data; +}; + +#endif // ARROW_C_DATA_INTERFACE + +#ifndef ARROW_C_STREAM_INTERFACE +#define ARROW_C_STREAM_INTERFACE + +struct ArrowArrayStream { + // Callback to get the stream type + // (will be the same for all arrays in the stream). + // + // Return value: 0 if successful, an `errno`-compatible error code otherwise. + // + // If successful, the ArrowSchema must be released independently from the stream. + int (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out); + + // Callback to get the next array + // (if no error and the array is released, the stream has ended) + // + // Return value: 0 if successful, an `errno`-compatible error code otherwise. + // + // If successful, the ArrowArray must be released independently from the stream. + int (*get_next)(struct ArrowArrayStream*, struct ArrowArray* out); + + // Callback to get optional detailed error information. 
+ // This must only be called if the last stream operation failed + // with a non-0 return code. + // + // Return value: pointer to a null-terminated character array describing + // the last error, or NULL if no description is available. + // + // The returned pointer is only valid until the next operation on this stream + // (including release). + const char* (*get_last_error)(struct ArrowArrayStream*); + + // Release callback: release the stream's own resources. + // Note that arrays returned by `get_next` must be individually released. + void (*release)(struct ArrowArrayStream*); + + // Opaque producer-specific data + void* private_data; +}; + +#endif // ARROW_C_STREAM_INTERFACE +#endif // ARROW_FLAG_DICTIONARY_ORDERED + +//! @endcond + +/// @} + +#ifndef ADBC +#define ADBC + +// Storage class macros for Windows +// Allow overriding/aliasing with application-defined macros +#if !defined(ADBC_EXPORT) +#if defined(_WIN32) +#if defined(ADBC_EXPORTING) +#define ADBC_EXPORT __declspec(dllexport) +#else +#define ADBC_EXPORT __declspec(dllimport) +#endif // defined(ADBC_EXPORTING) +#else +#define ADBC_EXPORT +#endif // defined(_WIN32) +#endif // !defined(ADBC_EXPORT) + +/// \defgroup adbc-error-handling Error Handling +/// ADBC uses integer error codes to signal errors. To provide more +/// detail about errors, functions may also return an AdbcError via an +/// optional out parameter, which can be inspected. If provided, it is +/// the responsibility of the caller to zero-initialize the AdbcError +/// value. +/// +/// @{ + +/// \brief Error codes for operations that may fail. +typedef uint8_t AdbcStatusCode; + +/// \brief No error. +#define ADBC_STATUS_OK 0 +/// \brief An unknown error occurred. +/// +/// May indicate a driver-side or database-side error. +#define ADBC_STATUS_UNKNOWN 1 +/// \brief The operation is not implemented or supported. +/// +/// May indicate a driver-side or database-side error. 
+#define ADBC_STATUS_NOT_IMPLEMENTED 2 +/// \brief A requested resource was not found. +/// +/// May indicate a driver-side or database-side error. +#define ADBC_STATUS_NOT_FOUND 3 +/// \brief A requested resource already exists. +/// +/// May indicate a driver-side or database-side error. +#define ADBC_STATUS_ALREADY_EXISTS 4 +/// \brief The arguments are invalid, likely a programming error. +/// +/// For instance, they may be of the wrong format, or out of range. +/// +/// May indicate a driver-side or database-side error. +#define ADBC_STATUS_INVALID_ARGUMENT 5 +/// \brief The preconditions for the operation are not met, likely a +/// programming error. +/// +/// For instance, the object may be uninitialized, or may have not +/// been fully configured. +/// +/// May indicate a driver-side or database-side error. +#define ADBC_STATUS_INVALID_STATE 6 +/// \brief Invalid data was processed (not a programming error). +/// +/// For instance, a division by zero may have occurred during query +/// execution. +/// +/// May indicate a database-side error only. +#define ADBC_STATUS_INVALID_DATA 7 +/// \brief The database's integrity was affected. +/// +/// For instance, a foreign key check may have failed, or a uniqueness +/// constraint may have been violated. +/// +/// May indicate a database-side error only. +#define ADBC_STATUS_INTEGRITY 8 +/// \brief An error internal to the driver or database occurred. +/// +/// May indicate a driver-side or database-side error. +#define ADBC_STATUS_INTERNAL 9 +/// \brief An I/O error occurred. +/// +/// For instance, a remote service may be unavailable. +/// +/// May indicate a driver-side or database-side error. +#define ADBC_STATUS_IO 10 +/// \brief The operation was cancelled, not due to a timeout. +/// +/// May indicate a driver-side or database-side error. +#define ADBC_STATUS_CANCELLED 11 +/// \brief The operation was cancelled due to a timeout. +/// +/// May indicate a driver-side or database-side error. 
+#define ADBC_STATUS_TIMEOUT 12 +/// \brief Authentication failed. +/// +/// May indicate a database-side error only. +#define ADBC_STATUS_UNAUTHENTICATED 13 +/// \brief The client is not authorized to perform the given operation. +/// +/// May indicate a database-side error only. +#define ADBC_STATUS_UNAUTHORIZED 14 + +/// \brief A detailed error message for an operation. +struct ADBC_EXPORT AdbcError { + /// \brief The error message. + char* message; + + /// \brief A vendor-specific error code, if applicable. + int32_t vendor_code; + + /// \brief A SQLSTATE error code, if provided, as defined by the + /// SQL:2003 standard. If not set, it should be set to + /// "\0\0\0\0\0". + char sqlstate[5]; + + /// \brief Release the contained error. + /// + /// Unlike other structures, this is an embedded callback to make it + /// easier for the driver manager and driver to cooperate. + void (*release)(struct AdbcError* error); +}; + +/// @} + +/// \defgroup adbc-constants Constants +/// @{ + +/// \brief ADBC revision 1.0.0. +/// +/// When passed to an AdbcDriverInitFunc(), the driver parameter must +/// point to an AdbcDriver. +#define ADBC_VERSION_1_0_0 1000000 + +/// \brief Canonical option value for enabling an option. +/// +/// For use as the value in SetOption calls. +#define ADBC_OPTION_VALUE_ENABLED "true" +/// \brief Canonical option value for disabling an option. +/// +/// For use as the value in SetOption calls. +#define ADBC_OPTION_VALUE_DISABLED "false" + +/// \brief The database vendor/product name (e.g. the server name). +/// (type: utf8). +/// +/// \see AdbcConnectionGetInfo +#define ADBC_INFO_VENDOR_NAME 0 +/// \brief The database vendor/product version (type: utf8). +/// +/// \see AdbcConnectionGetInfo +#define ADBC_INFO_VENDOR_VERSION 1 +/// \brief The database vendor/product Arrow library version (type: +/// utf8). +/// +/// \see AdbcConnectionGetInfo +#define ADBC_INFO_VENDOR_ARROW_VERSION 2 + +/// \brief The driver name (type: utf8). 
+/// +/// \see AdbcConnectionGetInfo +#define ADBC_INFO_DRIVER_NAME 100 +/// \brief The driver version (type: utf8). +/// +/// \see AdbcConnectionGetInfo +#define ADBC_INFO_DRIVER_VERSION 101 +/// \brief The driver Arrow library version (type: utf8). +/// +/// \see AdbcConnectionGetInfo +#define ADBC_INFO_DRIVER_ARROW_VERSION 102 + +/// \brief Return metadata on catalogs, schemas, tables, and columns. +/// +/// \see AdbcConnectionGetObjects +#define ADBC_OBJECT_DEPTH_ALL 0 +/// \brief Return metadata on catalogs only. +/// +/// \see AdbcConnectionGetObjects +#define ADBC_OBJECT_DEPTH_CATALOGS 1 +/// \brief Return metadata on catalogs and schemas. +/// +/// \see AdbcConnectionGetObjects +#define ADBC_OBJECT_DEPTH_DB_SCHEMAS 2 +/// \brief Return metadata on catalogs, schemas, and tables. +/// +/// \see AdbcConnectionGetObjects +#define ADBC_OBJECT_DEPTH_TABLES 3 +/// \brief Return metadata on catalogs, schemas, tables, and columns. +/// +/// \see AdbcConnectionGetObjects +#define ADBC_OBJECT_DEPTH_COLUMNS ADBC_OBJECT_DEPTH_ALL + +/// \brief The name of the canonical option for whether autocommit is +/// enabled. +/// +/// \see AdbcConnectionSetOption +#define ADBC_CONNECTION_OPTION_AUTOCOMMIT "adbc.connection.autocommit" + +/// \brief The name of the canonical option for whether the current +/// connection should be restricted to being read-only. +/// +/// \see AdbcConnectionSetOption +#define ADBC_CONNECTION_OPTION_READ_ONLY "adbc.connection.readonly" + +/// \brief The name of the canonical option for setting the isolation +/// level of a transaction. +/// +/// Should only be used in conjunction with autocommit disabled and +/// AdbcConnectionCommit / AdbcConnectionRollback. If the desired +/// isolation level is not supported by a driver, it should return an +/// appropriate error. 
+/// +/// \see AdbcConnectionSetOption +#define ADBC_CONNECTION_OPTION_ISOLATION_LEVEL \ + "adbc.connection.transaction.isolation_level" + +/// \brief Use database or driver default isolation level +/// +/// \see AdbcConnectionSetOption +#define ADBC_OPTION_ISOLATION_LEVEL_DEFAULT \ + "adbc.connection.transaction.isolation.default" + +/// \brief The lowest isolation level. Dirty reads are allowed, so one +/// transaction may see not-yet-committed changes made by others. +/// +/// \see AdbcConnectionSetOption +#define ADBC_OPTION_ISOLATION_LEVEL_READ_UNCOMMITTED \ + "adbc.connection.transaction.isolation.read_uncommitted" + +/// \brief Lock-based concurrency control keeps write locks until the +/// end of the transaction, but read locks are released as soon as a +/// SELECT is performed. Non-repeatable reads can occur in this +/// isolation level. +/// +/// More simply put, Read Committed is an isolation level that guarantees +/// that any data read is committed at the moment it is read. It simply +/// restricts the reader from seeing any intermediate, uncommitted, +/// 'dirty' reads. It makes no promise whatsoever that if the transaction +/// re-issues the read, it will find the same data; data is free to change +/// after it is read. +/// +/// \see AdbcConnectionSetOption +#define ADBC_OPTION_ISOLATION_LEVEL_READ_COMMITTED \ + "adbc.connection.transaction.isolation.read_committed" + +/// \brief Lock-based concurrency control keeps read AND write locks +/// (acquired on selection data) until the end of the transaction. +/// +/// However, range-locks are not managed, so phantom reads can occur. +/// Write skew is possible at this isolation level in some systems. 
+/// +/// \see AdbcConnectionSetOption +#define ADBC_OPTION_ISOLATION_LEVEL_REPEATABLE_READ \ + "adbc.connection.transaction.isolation.repeatable_read" + +/// \brief This isolation guarantees that all reads in the transaction +/// will see a consistent snapshot of the database and the transaction +/// should only successfully commit if no updates conflict with any +/// concurrent updates made since that snapshot. +/// +/// \see AdbcConnectionSetOption +#define ADBC_OPTION_ISOLATION_LEVEL_SNAPSHOT \ + "adbc.connection.transaction.isolation.snapshot" + +/// \brief Serializability requires read and write locks to be released +/// only at the end of the transaction. This includes acquiring range- +/// locks when a select query uses a ranged WHERE clause to avoid +/// phantom reads. +/// +/// \see AdbcConnectionSetOption +#define ADBC_OPTION_ISOLATION_LEVEL_SERIALIZABLE \ + "adbc.connection.transaction.isolation.serializable" + +/// \brief The central distinction between serializability and linearizability +/// is that serializability is a global property; a property of an entire +/// history of operations and transactions. Linearizability is a local +/// property; a property of a single operation/transaction. +/// +/// Linearizability can be viewed as a special case of strict serializability +/// where transactions are restricted to consist of a single operation applied +/// to a single object. +/// +/// \see AdbcConnectionSetOption +#define ADBC_OPTION_ISOLATION_LEVEL_LINEARIZABLE \ + "adbc.connection.transaction.isolation.linearizable" + +/// \defgroup adbc-statement-ingestion Bulk Data Ingestion +/// While it is possible to insert data via prepared statements, it can +/// be more efficient to explicitly perform a bulk insert. For +/// compatible drivers, this can be accomplished by setting up and +/// executing a statement. 
Instead of setting a SQL query or Substrait +/// plan, bind the source data via AdbcStatementBind, and set the name +/// of the table to be created via AdbcStatementSetOption and the +/// options below. Then, call AdbcStatementExecuteQuery, passing NULL +/// for the result stream since no result set is expected. +/// +/// @{ + +/// \brief The name of the target table for a bulk insert. +/// +/// The driver should attempt to create the table if it does not +/// exist. If the table exists but has a different schema, +/// ADBC_STATUS_ALREADY_EXISTS should be raised. Else, data should be +/// appended to the target table. +#define ADBC_INGEST_OPTION_TARGET_TABLE "adbc.ingest.target_table" +/// \brief Whether to create (the default) or append. +#define ADBC_INGEST_OPTION_MODE "adbc.ingest.mode" +/// \brief Create the table and insert data; error if the table exists. +#define ADBC_INGEST_OPTION_MODE_CREATE "adbc.ingest.mode.create" +/// \brief Do not create the table, and insert data; error if the +/// table does not exist (ADBC_STATUS_NOT_FOUND) or does not match +/// the schema of the data to append (ADBC_STATUS_ALREADY_EXISTS). +#define ADBC_INGEST_OPTION_MODE_APPEND "adbc.ingest.mode.append" + +/// @} + +/// @} + +/// \defgroup adbc-database Database Initialization +/// Clients first initialize a database, then create a connection +/// (below). This gives the implementation a place to initialize and +/// own any common connection state. For example, in-memory databases +/// can place ownership of the actual database in this object. +/// @{ + +/// \brief An instance of a database. +/// +/// Must be kept alive as long as any connections exist. +struct ADBC_EXPORT AdbcDatabase { + /// \brief Opaque implementation-defined state. + /// This field is NULLPTR iff the database is uninitialized/freed. + void* private_data; + /// \brief The associated driver (used by the driver manager to help + /// track state).
+ struct AdbcDriver* private_driver; +}; + +/// @} + +/// \defgroup adbc-connection Connection Establishment +/// Functions for creating, using, and releasing database connections. +/// @{ + +/// \brief An active database connection. +/// +/// Provides methods for query execution, managing prepared +/// statements, using transactions, and so on. +/// +/// Connections are not required to be thread-safe, but they can be +/// used from multiple threads so long as clients take care to +/// serialize accesses to a connection. +struct ADBC_EXPORT AdbcConnection { + /// \brief Opaque implementation-defined state. + /// This field is NULLPTR iff the connection is uninitialized/freed. + void* private_data; + /// \brief The associated driver (used by the driver manager to help + /// track state). + struct AdbcDriver* private_driver; +}; + +/// @} + +/// \defgroup adbc-statement Managing Statements +/// Applications should first initialize a statement with +/// AdbcStatementNew. Then, the statement should be configured with +/// functions like AdbcStatementSetSqlQuery and +/// AdbcStatementSetOption. Finally, the statement can be executed +/// with AdbcStatementExecuteQuery (or call AdbcStatementPrepare first +/// to turn it into a prepared statement instead). +/// @{ + +/// \brief A container for all state needed to execute a database +/// query, such as the query itself, parameters for prepared +/// statements, driver parameters, etc. +/// +/// Statements may represent queries or prepared statements. +/// +/// Statements may be used multiple times and can be reconfigured +/// (e.g. they can be reused to execute multiple different queries). +/// However, executing a statement (and changing certain other state) +/// will invalidate result sets obtained prior to that execution. +/// +/// Multiple statements may be created from a single connection. +/// However, the driver may block or error if they are used +/// concurrently (whether from a single thread or multiple threads).
+/// +/// Statements are not required to be thread-safe, but they can be +/// used from multiple threads so long as clients take care to +/// serialize accesses to a statement. +struct ADBC_EXPORT AdbcStatement { + /// \brief Opaque implementation-defined state. + /// This field is NULLPTR iff the statement is uninitialized/freed. + void* private_data; + + /// \brief The associated driver (used by the driver manager to help + /// track state). + struct AdbcDriver* private_driver; +}; + +/// \defgroup adbc-statement-partition Partitioned Results +/// Some backends may internally partition the results. These +/// partitions are exposed to clients who may wish to integrate them +/// with a threaded or distributed execution model, where partitions +/// can be divided among threads or machines and fetched in parallel. +/// +/// To use partitioning, execute the statement with +/// AdbcStatementExecutePartitions to get the partition descriptors. +/// Call AdbcConnectionReadPartition to turn the individual +/// descriptors into ArrowArrayStream instances. This may be done on +/// a different connection than the one the partition was created +/// with, or even in a different process on another machine. +/// +/// Drivers are not required to support partitioning. +/// +/// @{ + +/// \brief The partitions of a distributed/partitioned result set. +struct AdbcPartitions { + /// \brief The number of partitions. + size_t num_partitions; + + /// \brief The partitions of the result set, where each entry (up to + /// num_partitions entries) is an opaque identifier that can be + /// passed to AdbcConnectionReadPartition. + const uint8_t** partitions; + + /// \brief The length of each corresponding entry in partitions. + const size_t* partition_lengths; + + /// \brief Opaque implementation-defined state. + /// This field is NULLPTR iff the struct is uninitialized/freed. + void* private_data; + + /// \brief Release the contained partitions.
+ /// + /// Unlike other structures, this is an embedded callback to make it + /// easier for the driver manager and driver to cooperate. + void (*release)(struct AdbcPartitions* partitions); +}; + +/// @} + +/// @} + +/// \defgroup adbc-driver Driver Initialization +/// +/// These functions are intended to help support integration between a +/// driver and the driver manager. +/// @{ + +/// \brief An instance of an initialized database driver. +/// +/// This provides a common interface for vendor-specific driver +/// initialization routines. Drivers should populate this struct, and +/// applications can call ADBC functions through this struct, without +/// worrying about multiple definitions of the same symbol. +struct ADBC_EXPORT AdbcDriver { + /// \brief Opaque driver-defined state. + /// This field is NULL if the driver is uninitialized/freed (but + /// it need not have a value even if the driver is initialized). + void* private_data; + /// \brief Opaque driver manager-defined state. + /// This field is NULL if the driver is uninitialized/freed (but + /// it need not have a value even if the driver is initialized). + void* private_manager; + + /// \brief Release the driver and perform any cleanup. + /// + /// This is an embedded callback to make it easier for the driver + /// manager and driver to cooperate.
+ AdbcStatusCode (*release)(struct AdbcDriver* driver, struct AdbcError* error); + + AdbcStatusCode (*DatabaseInit)(struct AdbcDatabase*, struct AdbcError*); + AdbcStatusCode (*DatabaseNew)(struct AdbcDatabase*, struct AdbcError*); + AdbcStatusCode (*DatabaseSetOption)(struct AdbcDatabase*, const char*, const char*, + struct AdbcError*); + AdbcStatusCode (*DatabaseRelease)(struct AdbcDatabase*, struct AdbcError*); + + AdbcStatusCode (*ConnectionCommit)(struct AdbcConnection*, struct AdbcError*); + AdbcStatusCode (*ConnectionGetInfo)(struct AdbcConnection*, uint32_t*, size_t, + struct ArrowArrayStream*, struct AdbcError*); + AdbcStatusCode (*ConnectionGetObjects)(struct AdbcConnection*, int, const char*, + const char*, const char*, const char**, + const char*, struct ArrowArrayStream*, + struct AdbcError*); + AdbcStatusCode (*ConnectionGetTableSchema)(struct AdbcConnection*, const char*, + const char*, const char*, + struct ArrowSchema*, struct AdbcError*); + AdbcStatusCode (*ConnectionGetTableTypes)(struct AdbcConnection*, + struct ArrowArrayStream*, struct AdbcError*); + AdbcStatusCode (*ConnectionInit)(struct AdbcConnection*, struct AdbcDatabase*, + struct AdbcError*); + AdbcStatusCode (*ConnectionNew)(struct AdbcConnection*, struct AdbcError*); + AdbcStatusCode (*ConnectionSetOption)(struct AdbcConnection*, const char*, const char*, + struct AdbcError*); + AdbcStatusCode (*ConnectionReadPartition)(struct AdbcConnection*, const uint8_t*, + size_t, struct ArrowArrayStream*, + struct AdbcError*); + AdbcStatusCode (*ConnectionRelease)(struct AdbcConnection*, struct AdbcError*); + AdbcStatusCode (*ConnectionRollback)(struct AdbcConnection*, struct AdbcError*); + + AdbcStatusCode (*StatementBind)(struct AdbcStatement*, struct ArrowArray*, + struct ArrowSchema*, struct AdbcError*); + AdbcStatusCode (*StatementBindStream)(struct AdbcStatement*, struct ArrowArrayStream*, + struct AdbcError*); + AdbcStatusCode (*StatementExecuteQuery)(struct AdbcStatement*, struct 
ArrowArrayStream*, + int64_t*, struct AdbcError*); + AdbcStatusCode (*StatementExecutePartitions)(struct AdbcStatement*, struct ArrowSchema*, + struct AdbcPartitions*, int64_t*, + struct AdbcError*); + AdbcStatusCode (*StatementGetParameterSchema)(struct AdbcStatement*, + struct ArrowSchema*, struct AdbcError*); + AdbcStatusCode (*StatementNew)(struct AdbcConnection*, struct AdbcStatement*, + struct AdbcError*); + AdbcStatusCode (*StatementPrepare)(struct AdbcStatement*, struct AdbcError*); + AdbcStatusCode (*StatementRelease)(struct AdbcStatement*, struct AdbcError*); + AdbcStatusCode (*StatementSetOption)(struct AdbcStatement*, const char*, const char*, + struct AdbcError*); + AdbcStatusCode (*StatementSetSqlQuery)(struct AdbcStatement*, const char*, + struct AdbcError*); + AdbcStatusCode (*StatementSetSubstraitPlan)(struct AdbcStatement*, const uint8_t*, + size_t, struct AdbcError*); +}; + +/// @} + +/// \addtogroup adbc-database +/// @{ + +/// \brief Allocate a new (but uninitialized) database. +ADBC_EXPORT +AdbcStatusCode AdbcDatabaseNew(struct AdbcDatabase* database, struct AdbcError* error); + +/// \brief Set a char* option. +/// +/// Options may be set before AdbcDatabaseInit. Some drivers may +/// support setting options after initialization as well. +/// +/// \return ADBC_STATUS_NOT_IMPLEMENTED if the option is not recognized +ADBC_EXPORT +AdbcStatusCode AdbcDatabaseSetOption(struct AdbcDatabase* database, const char* key, + const char* value, struct AdbcError* error); + +/// \brief Finish setting options and initialize the database. +/// +/// Some drivers may support setting options after initialization +/// as well. +ADBC_EXPORT +AdbcStatusCode AdbcDatabaseInit(struct AdbcDatabase* database, struct AdbcError* error); + +/// \brief Destroy this database. No connections may exist. +/// \param[in] database The database to release. +/// \param[out] error An optional location to return an error +/// message if necessary. 
+ADBC_EXPORT +AdbcStatusCode AdbcDatabaseRelease(struct AdbcDatabase* database, + struct AdbcError* error); + +/// @} + +/// \addtogroup adbc-connection +/// @{ + +/// \brief Allocate a new (but uninitialized) connection. +ADBC_EXPORT +AdbcStatusCode AdbcConnectionNew(struct AdbcConnection* connection, + struct AdbcError* error); + +/// \brief Set a char* option. +/// +/// Options may be set before AdbcConnectionInit. Some drivers may +/// support setting options after initialization as well. +/// +/// \return ADBC_STATUS_NOT_IMPLEMENTED if the option is not recognized +ADBC_EXPORT +AdbcStatusCode AdbcConnectionSetOption(struct AdbcConnection* connection, const char* key, + const char* value, struct AdbcError* error); + +/// \brief Finish setting options and initialize the connection. +/// +/// Some drivers may support setting options after initialization +/// as well. +ADBC_EXPORT +AdbcStatusCode AdbcConnectionInit(struct AdbcConnection* connection, + struct AdbcDatabase* database, struct AdbcError* error); + +/// \brief Destroy this connection. +/// +/// \param[in] connection The connection to release. +/// \param[out] error An optional location to return an error +/// message if necessary. +ADBC_EXPORT +AdbcStatusCode AdbcConnectionRelease(struct AdbcConnection* connection, + struct AdbcError* error); + +/// \defgroup adbc-connection-metadata Metadata +/// Functions for retrieving metadata about the database. +/// +/// Generally, these functions return an ArrowArrayStream that can be +/// consumed to get the metadata as Arrow data. The returned metadata +/// has an expected schema given in the function docstring. Schema +/// fields are nullable unless otherwise marked. While no +/// AdbcStatement is used in these functions, the result set may count +/// as an active statement to the driver for the purposes of +/// concurrency management (e.g. 
if the driver has a limit on +/// concurrent active statements and it must execute a SQL query +/// internally in order to implement the metadata function). +/// +/// Some functions accept "search pattern" arguments, which are +/// strings that can contain the special character "%" to match zero +/// or more characters, or "_" to match exactly one character. (See +/// the documentation of DatabaseMetaData in JDBC or "Pattern Value +/// Arguments" in the ODBC documentation.) Escaping is not currently +/// supported. +/// +/// @{ + +/// \brief Get metadata about the database/driver. +/// +/// The result is an Arrow dataset with the following schema: +/// +/// Field Name | Field Type +/// ----------------------------|------------------------ +/// info_name | uint32 not null +/// info_value | INFO_SCHEMA +/// +/// INFO_SCHEMA is a dense union with members: +/// +/// Field Name (Type Code) | Field Type +/// ----------------------------|------------------------ +/// string_value (0) | utf8 +/// bool_value (1) | bool +/// int64_value (2) | int64 +/// int32_bitmask (3) | int32 +/// string_list (4) | list<utf8> +/// int32_to_int32_list_map (5) | map<int32, list<int32>> +/// +/// Each metadatum is identified by an integer code. The recognized +/// codes are defined as constants. Codes [0, 10_000) are reserved +/// for ADBC usage. Drivers/vendors will ignore requests for +/// unrecognized codes (the row will be omitted from the result). +/// +/// \param[in] connection The connection to query. +/// \param[in] info_codes A list of metadata codes to fetch, or NULL +/// to fetch all. +/// \param[in] info_codes_length The length of the info_codes +/// parameter. Ignored if info_codes is NULL. +/// \param[out] out The result set. +/// \param[out] error Error details, if an error occurs.
+ADBC_EXPORT +AdbcStatusCode AdbcConnectionGetInfo(struct AdbcConnection* connection, + uint32_t* info_codes, size_t info_codes_length, + struct ArrowArrayStream* out, + struct AdbcError* error); + +/// \brief Get a hierarchical view of all catalogs, database schemas, +/// tables, and columns. +/// +/// The result is an Arrow dataset with the following schema: +/// +/// | Field Name | Field Type | +/// |--------------------------|-------------------------| +/// | catalog_name | utf8 | +/// | catalog_db_schemas | list<DB_SCHEMA_SCHEMA> | +/// +/// DB_SCHEMA_SCHEMA is a Struct with fields: +/// +/// | Field Name | Field Type | +/// |--------------------------|-------------------------| +/// | db_schema_name | utf8 | +/// | db_schema_tables | list<TABLE_SCHEMA> | +/// +/// TABLE_SCHEMA is a Struct with fields: +/// +/// | Field Name | Field Type | +/// |--------------------------|-------------------------| +/// | table_name | utf8 not null | +/// | table_type | utf8 not null | +/// | table_columns | list<COLUMN_SCHEMA> | +/// | table_constraints | list<CONSTRAINT_SCHEMA> | +/// +/// COLUMN_SCHEMA is a Struct with fields: +/// +/// | Field Name | Field Type | Comments | +/// |--------------------------|-------------------------|----------| +/// | column_name | utf8 not null | | +/// | ordinal_position | int32 | (1) | +/// | remarks | utf8 | (2) | +/// | xdbc_data_type | int16 | (3) | +/// | xdbc_type_name | utf8 | (3) | +/// | xdbc_column_size | int32 | (3) | +/// | xdbc_decimal_digits | int16 | (3) | +/// | xdbc_num_prec_radix | int16 | (3) | +/// | xdbc_nullable | int16 | (3) | +/// | xdbc_column_def | utf8 | (3) | +/// | xdbc_sql_data_type | int16 | (3) | +/// | xdbc_datetime_sub | int16 | (3) | +/// | xdbc_char_octet_length | int32 | (3) | +/// | xdbc_is_nullable | utf8 | (3) | +/// | xdbc_scope_catalog | utf8 | (3) | +/// | xdbc_scope_schema | utf8 | (3) | +/// | xdbc_scope_table | utf8 | (3) | +/// | xdbc_is_autoincrement | bool | (3) | +/// | xdbc_is_generatedcolumn | bool | (3) | +/// +/// 1.
The column's ordinal position in the table (starting from 1). +/// 2. Database-specific description of the column. +/// 3. Optional value. Should be null if not supported by the driver. +/// xdbc_ values are meant to provide JDBC/ODBC-compatible metadata +/// in an agnostic manner. +/// +/// CONSTRAINT_SCHEMA is a Struct with fields: +/// +/// | Field Name | Field Type | Comments | +/// |--------------------------|-------------------------|----------| +/// | constraint_name | utf8 | | +/// | constraint_type | utf8 not null | (1) | +/// | constraint_column_names | list<utf8> not null | (2) | +/// | constraint_column_usage | list<USAGE_SCHEMA> | (3) | +/// +/// 1. One of 'CHECK', 'FOREIGN KEY', 'PRIMARY KEY', or 'UNIQUE'. +/// 2. The columns on the current table that are constrained, in +/// order. +/// 3. For FOREIGN KEY only, the referenced table and columns. +/// +/// USAGE_SCHEMA is a Struct with fields: +/// +/// | Field Name | Field Type | +/// |--------------------------|-------------------------| +/// | fk_catalog | utf8 | +/// | fk_db_schema | utf8 | +/// | fk_table | utf8 not null | +/// | fk_column_name | utf8 not null | +/// +/// \param[in] connection The database connection. +/// \param[in] depth The level of nesting to display. If 0, display +/// all levels. If 1, display only catalogs (i.e. catalog_db_schemas +/// will be null). If 2, display only catalogs and schemas +/// (i.e. db_schema_tables will be null), and so on. +/// \param[in] catalog Only show tables in the given catalog. If NULL, +/// do not filter by catalog. If an empty string, only show tables +/// without a catalog. May be a search pattern (see section +/// documentation). +/// \param[in] db_schema Only show tables in the given database schema. If +/// NULL, do not filter by database schema. If an empty string, only show +/// tables without a database schema. May be a search pattern (see section +/// documentation). +/// \param[in] table_name Only show tables with the given name.
If NULL, do not +/// filter by name. May be a search pattern (see section documentation). +/// \param[in] table_type Only show tables matching one of the given table +/// types. If NULL, show tables of any type. Valid table types can be fetched +/// from GetTableTypes. Terminate the list with a NULL entry. +/// \param[in] column_name Only show columns with the given name. If +/// NULL, do not filter by name. May be a search pattern (see +/// section documentation). +/// \param[out] out The result set. +/// \param[out] error Error details, if an error occurs. +ADBC_EXPORT +AdbcStatusCode AdbcConnectionGetObjects(struct AdbcConnection* connection, int depth, + const char* catalog, const char* db_schema, + const char* table_name, const char** table_type, + const char* column_name, + struct ArrowArrayStream* out, + struct AdbcError* error); + +/// \brief Get the Arrow schema of a table. +/// +/// \param[in] connection The database connection. +/// \param[in] catalog The catalog (or nullptr if not applicable). +/// \param[in] db_schema The database schema (or nullptr if not applicable). +/// \param[in] table_name The table name. +/// \param[out] schema The table schema. +/// \param[out] error Error details, if an error occurs. +ADBC_EXPORT +AdbcStatusCode AdbcConnectionGetTableSchema(struct AdbcConnection* connection, + const char* catalog, const char* db_schema, + const char* table_name, + struct ArrowSchema* schema, + struct AdbcError* error); + +/// \brief Get a list of table types in the database. +/// +/// The result is an Arrow dataset with the following schema: +/// +/// Field Name | Field Type +/// ---------------|-------------- +/// table_type | utf8 not null +/// +/// \param[in] connection The database connection. +/// \param[out] out The result set. +/// \param[out] error Error details, if an error occurs. 
+ADBC_EXPORT +AdbcStatusCode AdbcConnectionGetTableTypes(struct AdbcConnection* connection, + struct ArrowArrayStream* out, + struct AdbcError* error); + +/// @} + +/// \defgroup adbc-connection-partition Partitioned Results +/// Some databases may internally partition the results. These +/// partitions are exposed to clients who may wish to integrate them +/// with a threaded or distributed execution model, where partitions +/// can be divided among threads or machines for processing. +/// +/// Drivers are not required to support partitioning. +/// +/// Partitions are not ordered. If the result set is sorted, +/// implementations should return a single partition. +/// +/// @{ + +/// \brief Construct a statement for a partition of a query. The +/// results can then be read independently. +/// +/// A partition can be retrieved from AdbcPartitions. +/// +/// \param[in] connection The connection to use. This does not have +/// to be the same connection that the partition was created on. +/// \param[in] serialized_partition The partition descriptor. +/// \param[in] serialized_length The partition descriptor length. +/// \param[out] out The result set. +/// \param[out] error Error details, if an error occurs. +ADBC_EXPORT +AdbcStatusCode AdbcConnectionReadPartition(struct AdbcConnection* connection, + const uint8_t* serialized_partition, + size_t serialized_length, + struct ArrowArrayStream* out, + struct AdbcError* error); + +/// @} + +/// \defgroup adbc-connection-transaction Transaction Semantics +/// +/// Connections start out in auto-commit mode by default (if +/// applicable for the given vendor). Use AdbcConnectionSetOption and +/// ADBC_CONNECTION_OPTION_AUTOCOMMIT to change this. +/// +/// @{ + +/// \brief Commit any pending transactions. Only used if autocommit is +/// disabled. +/// +/// Behavior is undefined if this is mixed with SQL transaction +/// statements.
+ADBC_EXPORT +AdbcStatusCode AdbcConnectionCommit(struct AdbcConnection* connection, + struct AdbcError* error); + +/// \brief Roll back any pending transactions. Only used if autocommit +/// is disabled. +/// +/// Behavior is undefined if this is mixed with SQL transaction +/// statements. +ADBC_EXPORT +AdbcStatusCode AdbcConnectionRollback(struct AdbcConnection* connection, + struct AdbcError* error); + +/// @} + +/// @} + +/// \addtogroup adbc-statement +/// @{ + +/// \brief Create a new statement for a given connection. +/// +/// Set options on the statement, then call AdbcStatementExecuteQuery +/// or AdbcStatementPrepare. +ADBC_EXPORT +AdbcStatusCode AdbcStatementNew(struct AdbcConnection* connection, + struct AdbcStatement* statement, struct AdbcError* error); + +/// \brief Destroy a statement. +/// \param[in] statement The statement to release. +/// \param[out] error An optional location to return an error +/// message if necessary. +ADBC_EXPORT +AdbcStatusCode AdbcStatementRelease(struct AdbcStatement* statement, + struct AdbcError* error); + +/// \brief Execute a statement and get the results. +/// +/// This invalidates any prior result sets. +/// +/// \param[in] statement The statement to execute. +/// \param[out] out The results. Pass NULL if the client does not +/// expect a result set. +/// \param[out] rows_affected The number of rows affected if known, +/// else -1. Pass NULL if the client does not want this information. +/// \param[out] error An optional location to return an error +/// message if necessary. +ADBC_EXPORT +AdbcStatusCode AdbcStatementExecuteQuery(struct AdbcStatement* statement, + struct ArrowArrayStream* out, + int64_t* rows_affected, struct AdbcError* error); + +/// \brief Turn this statement into a prepared statement to be +/// executed multiple times. +/// +/// This invalidates any prior result sets. 
+ADBC_EXPORT +AdbcStatusCode AdbcStatementPrepare(struct AdbcStatement* statement, + struct AdbcError* error); + +/// \defgroup adbc-statement-sql SQL Semantics +/// Functions for executing SQL queries, or querying SQL-related +/// metadata. Drivers are not required to support both SQL and +/// Substrait semantics. If they do, it may be via converting +/// between representations internally. +/// @{ + +/// \brief Set the SQL query to execute. +/// +/// The query can then be executed with AdbcStatementExecuteQuery. For +/// queries expected to be executed repeatedly, call +/// AdbcStatementPrepare first. +/// +/// \param[in] statement The statement. +/// \param[in] query The query to execute. +/// \param[out] error Error details, if an error occurs. +ADBC_EXPORT +AdbcStatusCode AdbcStatementSetSqlQuery(struct AdbcStatement* statement, + const char* query, struct AdbcError* error); + +/// @} + +/// \defgroup adbc-statement-substrait Substrait Semantics +/// Functions for executing Substrait plans, or querying +/// Substrait-related metadata. Drivers are not required to support +/// both SQL and Substrait semantics. If they do, it may be via +/// converting between representations internally. +/// @{ + +/// \brief Set the Substrait plan to execute. +/// +/// The plan can then be executed with AdbcStatementExecuteQuery. For +/// plans expected to be executed repeatedly, call +/// AdbcStatementPrepare first. +/// +/// \param[in] statement The statement. +/// \param[in] plan The serialized substrait.Plan to execute. +/// \param[in] length The length of the serialized plan. +/// \param[out] error Error details, if an error occurs. +ADBC_EXPORT +AdbcStatusCode AdbcStatementSetSubstraitPlan(struct AdbcStatement* statement, + const uint8_t* plan, size_t length, + struct AdbcError* error); + +/// @} + +/// \brief Bind Arrow data. This can be used for bulk inserts or +/// prepared statements. +/// +/// \param[in] statement The statement to bind to.
+/// \param[in] values The values to bind. The driver will call the +/// release callback itself, although it may not do this until the +/// statement is released. +/// \param[in] schema The schema of the values to bind. +/// \param[out] error An optional location to return an error message +/// if necessary. +ADBC_EXPORT +AdbcStatusCode AdbcStatementBind(struct AdbcStatement* statement, + struct ArrowArray* values, struct ArrowSchema* schema, + struct AdbcError* error); + +/// \brief Bind Arrow data. This can be used for bulk inserts or +/// prepared statements. +/// \param[in] statement The statement to bind to. +/// \param[in] stream The values to bind. The driver will call the +/// release callback itself, although it may not do this until the +/// statement is released. +/// \param[out] error An optional location to return an error message +/// if necessary. +ADBC_EXPORT +AdbcStatusCode AdbcStatementBindStream(struct AdbcStatement* statement, + struct ArrowArrayStream* stream, + struct AdbcError* error); + +/// \brief Get the schema for bound parameters. +/// +/// This retrieves an Arrow schema describing the number, names, and +/// types of the parameters in a parameterized statement. The fields +/// of the schema should be in order of the ordinal position of the +/// parameters; named parameters should appear only once. +/// +/// If the parameter does not have a name, or the name cannot be +/// determined, the name of the corresponding field in the schema will +/// be an empty string. If the type cannot be determined, the type of +/// the corresponding field will be NA (NullType). +/// +/// This should be called after AdbcStatementPrepare. +/// +/// \return ADBC_STATUS_NOT_IMPLEMENTED if the schema cannot be determined. +ADBC_EXPORT +AdbcStatusCode AdbcStatementGetParameterSchema(struct AdbcStatement* statement, + struct ArrowSchema* schema, + struct AdbcError* error); + +/// \brief Set a string option on a statement. 
+ADBC_EXPORT +AdbcStatusCode AdbcStatementSetOption(struct AdbcStatement* statement, const char* key, + const char* value, struct AdbcError* error); + +/// \addtogroup adbc-statement-partition +/// @{ + +/// \brief Execute a statement and get the results as a partitioned +/// result set. +/// +/// \param[in] statement The statement to execute. +/// \param[out] schema The schema of the result set. +/// \param[out] partitions The result partitions. +/// \param[out] rows_affected The number of rows affected if known, +/// else -1. Pass NULL if the client does not want this information. +/// \param[out] error An optional location to return an error +/// message if necessary. +/// \return ADBC_STATUS_NOT_IMPLEMENTED if the driver does not support +/// partitioned results +ADBC_EXPORT +AdbcStatusCode AdbcStatementExecutePartitions(struct AdbcStatement* statement, + struct ArrowSchema* schema, + struct AdbcPartitions* partitions, + int64_t* rows_affected, + struct AdbcError* error); + +/// @} + +/// @} + +/// \addtogroup adbc-driver +/// @{ + +/// \brief Common entry point for drivers via the driver manager +/// (which uses dlopen(3)/LoadLibrary). The driver manager is told +/// to load a library and call a function of this type to load the +/// driver. +/// +/// Although drivers may choose any name for this function, the +/// recommended name is "AdbcDriverInit". +/// +/// \param[in] version The ADBC revision to attempt to initialize (see +/// ADBC_VERSION_1_0_0). +/// \param[out] driver The table of function pointers to +/// initialize. Should be a pointer to the appropriate struct for +/// the given version (see the documentation for the version). +/// \param[out] error An optional location to return an error message +/// if necessary. +/// \return ADBC_STATUS_OK if the driver was initialized, or +/// ADBC_STATUS_NOT_IMPLEMENTED if the version is not supported. In +/// that case, clients may retry with a different version. 
+typedef AdbcStatusCode (*AdbcDriverInitFunc)(int version, void* driver, + struct AdbcError* error); + +/// @} + +#endif // ADBC + +#ifdef __cplusplus +} +#endif
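The metadata functions in this header accept "search pattern" arguments in which "%" matches zero or more characters and "_" matches exactly one, with no escaping. The following is a minimal sketch of those matching rules only. The helper names `AdbcExampleMatchPattern` and `AdbcExampleFilterAccepts` are hypothetical and are not part of the ADBC header; a real driver may well push the pattern down to the database instead of matching client-side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of ADBC): returns true if `text` matches
 * `pattern`, where '%' matches zero or more characters and '_' matches
 * exactly one character. No escape character is recognized. */
bool AdbcExampleMatchPattern(const char* pattern, const char* text) {
  assert(pattern != NULL && text != NULL);
  const char* after_percent = NULL; /* pattern position after the last '%' */
  const char* retry = NULL;         /* text position to resume from */
  while (*text != '\0') {
    if (*pattern == '%') {
      /* Remember where to backtrack to; first try matching zero chars. */
      after_percent = ++pattern;
      retry = text;
    } else if (*pattern == '_' || *pattern == *text) {
      ++pattern;
      ++text;
    } else if (after_percent != NULL) {
      /* Mismatch: let the last '%' absorb one more character. */
      pattern = after_percent;
      text = ++retry;
    } else {
      return false;
    }
  }
  /* Only trailing '%' may remain unmatched once the text is consumed. */
  while (*pattern == '%') ++pattern;
  return *pattern == '\0';
}

/* Hypothetical helper: a NULL filter means "do not filter", mirroring the
 * parameter documentation of AdbcConnectionGetObjects. */
bool AdbcExampleFilterAccepts(const char* filter, const char* name) {
  if (filter == NULL) return true;
  return AdbcExampleMatchPattern(filter, name);
}
```

Under this sketch, the pattern `tbl_%` accepts `tbl_orders`, while `t_` rejects `tab` because "_" consumes exactly one character. An empty filter accepts only an empty name, which assumes that a missing catalog or schema is represented as an empty string.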
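AdbcPartitions carries an embedded release callback so that whoever filled the struct also frees it. The sketch below illustrates that ownership pattern in isolation. It redeclares the struct locally so it compiles without the header, and everything prefixed `Example` is hypothetical. Clearing `release` after freeing is an assumption borrowed from the Arrow C Data Interface convention; the header itself only states that private_data is null once freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local copy of the struct from the header, so the sketch stands alone. */
struct AdbcPartitions {
  size_t num_partitions;
  const uint8_t** partitions;
  const size_t* partition_lengths;
  void* private_data;
  void (*release)(struct AdbcPartitions* partitions);
};

/* Hypothetical driver-private state that owns every allocation. */
struct ExamplePartitionState {
  uint8_t** buffers;
  size_t* lengths;
  size_t count;
};

/* Frees everything and marks the struct as released. */
static void ExamplePartitionsRelease(struct AdbcPartitions* partitions) {
  struct ExamplePartitionState* state = partitions->private_data;
  if (state == NULL) return;
  for (size_t i = 0; i < state->count; i++) free(state->buffers[i]);
  free(state->buffers);
  free(state->lengths);
  free(state);
  partitions->num_partitions = 0;
  partitions->partitions = NULL;
  partitions->partition_lengths = NULL;
  partitions->private_data = NULL;
  partitions->release = NULL; /* assumption: Arrow C Data Interface style */
}

/* Hypothetical driver-side fill: copies each descriptor string into an
 * opaque byte buffer. Returns 0 on success, -1 on allocation failure. */
int ExampleFillPartitions(struct AdbcPartitions* out,
                          const char* const* descriptors, size_t count) {
  assert(out != NULL);
  struct ExamplePartitionState* state = calloc(1, sizeof(*state));
  if (state == NULL) return -1;
  state->buffers = calloc(count ? count : 1, sizeof(*state->buffers));
  state->lengths = calloc(count ? count : 1, sizeof(*state->lengths));
  out->num_partitions = 0;
  out->partitions = NULL;
  out->partition_lengths = NULL;
  out->private_data = state;
  out->release = ExamplePartitionsRelease;
  if (state->buffers == NULL || state->lengths == NULL) {
    out->release(out);
    return -1;
  }
  for (size_t i = 0; i < count; i++) {
    size_t len = strlen(descriptors[i]);
    state->buffers[i] = malloc(len ? len : 1);
    if (state->buffers[i] == NULL) {
      out->release(out);
      return -1;
    }
    memcpy(state->buffers[i], descriptors[i], len);
    state->lengths[i] = len;
    state->count = i + 1;
  }
  out->num_partitions = count;
  out->partitions = (const uint8_t**)state->buffers;
  out->partition_lengths = state->lengths;
  return 0;
}

/* Client-side walk over the descriptors, e.g. to size a message that ships
 * them to workers which later call AdbcConnectionReadPartition. */
size_t ExampleTotalDescriptorBytes(const struct AdbcPartitions* partitions) {
  size_t total = 0;
  for (size_t i = 0; i < partitions->num_partitions; i++) {
    total += partitions->partition_lengths[i];
  }
  return total;
}
```

A client would read `partitions[i]` together with `partition_lengths[i]` for each entry, hand each pair to a worker, and call `release` exactly once when done with the struct.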