diff --git a/docs/source/developers/contributing.rst b/docs/source/developers/contributing.rst
index 38e3f484da9..20f33f08ef0 100644
--- a/docs/source/developers/contributing.rst
+++ b/docs/source/developers/contributing.rst
@@ -304,3 +304,51 @@ to your branch, which they sometimes do to help move a pull request along.
 In addition, the GitHub PR "suggestion" feature can also add commits to your
 branch, so it is possible that your local copy of your branch is missing some
 additions.
+
+Guidance for specific features
+==============================
+
+From time to time the community has discussions on specific types of features
+and improvements that they expect to support. This section outlines decisions
+that have been made in this regard.
+
+Endianness
+++++++++++
+
+The Arrow format allows setting endianness. Due to the popularity of
+little-endian architectures, most implementations assume little endian by
+default. There has been some effort to support big-endian platforms as well.
+Based on a `mailing-list discussion
+`__,
+the requirements for a new platform are:
+
+1. A robust (non-flaky, returning results in a reasonable time) Continuous
+   Integration setup.
+2. Benchmarks for performance-critical parts of the code to demonstrate
+   no regression.
+
+Furthermore, for big-endian support, there are two levels that an
+implementation can support:
+
+1. Native endianness (all Arrow communication happens between processes of
+   the same endianness). This includes ancillary functionality such as
+   reading and writing various file formats, such as Parquet.
+2. Cross-endian support (implementations will do byte reordering when
+   appropriate for :ref:`IPC <format-ipc>` and :ref:`Flight `
+   messages).
+
+The decision on what level to support is based on maintainers' preferences for
+complexity and technical risk. In general, all implementations should be open
+to native endianness support (provided the CI and performance requirements
+are met). Cross-endian support is a question for individual maintainers.
+
+The current implementations aiming for cross-endian support are:
+
+1. C++
+
+Implementations that do not intend to implement cross-endian support:
+
+1. Java
+
+For other libraries, a discussion on the mailing-list to gather consensus
+should be had before submitting PRs.
diff --git a/docs/source/format/Columnar.rst b/docs/source/format/Columnar.rst
index f51c6aaf633..84e3013adde 100644
--- a/docs/source/format/Columnar.rst
+++ b/docs/source/format/Columnar.rst
@@ -787,6 +787,8 @@ layouts depending on the particular realization of the type.
 We do not go into detail about the logical types definitions in this document
 as we consider `Schema.fbs`_ to be authoritative.
 
+.. _format-ipc:
+
 Serialization and Interprocess Communication (IPC)
 ==================================================
 
diff --git a/docs/source/python/ipc.rst b/docs/source/python/ipc.rst
index 5eeedbdae89..1be8ff62ce5 100644
--- a/docs/source/python/ipc.rst
+++ b/docs/source/python/ipc.rst
@@ -330,21 +330,3 @@ An object can be reconstructed from its component-based representation using
 
 ``deserialize_components`` is also available as a method on
 ``SerializationContext`` objects.
-
-Serializing pandas Objects
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The default serialization context has optimized handling of pandas
-objects like ``DataFrame`` and ``Series``. Combined with component-based
-serialization above, this enables zero-copy transport of pandas DataFrame
-objects not containing any Python objects:
-
-.. ipython:: python
-
-    import pandas as pd
-    df = pd.DataFrame({'a': [1, 2, 3, 4, 5]})
-    context = pa.default_serialization_context()
-    serialized_df = context.serialize(df)
-    df_components = serialized_df.to_components()
-    original_df = context.deserialize_components(df_components)
-    original_df
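
As background for the cross-endian level described in the contributing.rst hunk above, the following minimal Python sketch (not part of the patch; the value and ``struct`` format codes are illustrative only) shows why a reader must byte-swap buffers produced on a platform of the opposite endianness:

.. code-block:: python

    import struct
    import sys

    # Byte order of the running platform: "little" on x86_64, "big" on s390x.
    print(sys.byteorder)

    value = 0x01020304

    # The same 32-bit integer encoded with each byte order.
    le_bytes = struct.pack("<I", value)   # b'\x04\x03\x02\x01'
    be_bytes = struct.pack(">I", value)   # b'\x01\x02\x03\x04'

    # The two encodings are byte-reversed relative to each other, so a consumer
    # on a platform of the opposite endianness has to swap bytes to recover the
    # original value; this is the "byte reordering" cross-endian support refers to.
    assert le_bytes == be_bytes[::-1]
    assert struct.unpack("<I", be_bytes[::-1])[0] == value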
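
The ipc.rst hunk removes the documentation for pandas round trips via ``pa.default_serialization_context()``. As a hedged illustration only (not part of the patch, and not necessarily the project's recommended replacement), a similar in-memory round trip can be written with the Arrow IPC stream format; variable names are arbitrary and pyarrow plus pandas are assumed to be installed:

.. code-block:: python

    import pandas as pd
    import pyarrow as pa

    df = pd.DataFrame({'a': [1, 2, 3, 4, 5]})

    # Convert to an Arrow table and write it as an IPC stream into an in-memory buffer.
    table = pa.Table.from_pandas(df)
    sink = pa.BufferOutputStream()
    with pa.ipc.new_stream(sink, table.schema) as writer:
        writer.write_table(table)
    buf = sink.getvalue()

    # Read the stream back and convert to pandas.
    restored = pa.ipc.open_stream(buf).read_all().to_pandas()
    assert restored.equals(df)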