From 5ebd931aa4e9e392a7fdfff660216036dbe0de3c Mon Sep 17 00:00:00 2001
From: Micah Kornfield
Date: Tue, 6 Oct 2020 22:09:42 -0700
Subject: [PATCH 1/3] ARROW-10203: Give guidance on big-endian support in the contributors guide

---
 docs/source/developers/contributing.rst | 37 +++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/docs/source/developers/contributing.rst b/docs/source/developers/contributing.rst
index 38e3f484da9..5b89c0a23f5 100644
--- a/docs/source/developers/contributing.rst
+++ b/docs/source/developers/contributing.rst
@@ -304,3 +304,40 @@ to your branch, which they sometimes do to help move a pull request along.
 In addition, the GitHub PR "suggestion" feature can also add commits to
 your branch, so it is possible that your local copy of your branch is missing
 some additions.
+
+Guidance for specific features
+==============================
+
+From time to time the community has discussions on specific types of features
+and improvements that they expect to support. This section outlines decisions
+that have been made in this regard.
+
+Endianess
++++++++++
+Arrow is primarily a little endian format there has been some effort to
+support big endian platforms as well. Based on a mailing list discussion,
+The requirements for a new platform are:
+
+1. A robut (non-flaky, returns results in a reasonable time) Continuous integration setup.
+2. Performance benchmarks in performance critical parts of the code to demonstrate no
+   regression.
+
+Furthermore for big-endianess support there are two levels that an implementation can support
+1. Native endianness (all arrow communication happens with processes of the same endianness.
+2. Cross platform support (implementations will do byte reordering when appropriate for IPC
+   and flight messages).
+
+The decision on what level to support is based on maintainers preferences for complexity and
+technical risk. In general all implementations should be open to native endianness support
+(provided the CI and performance requirements are met). Cross endianness support is a question
+for individual maintainers. The current implementations aiming for cross platform support are:
+
+1. C++
+
+Implementations that do not intend to implement cross
+
+1. Java
+
+For other libraries a discussion to gather consensus on the mailing should be had before submitting
+PRs.
+
From 27d6273b8eb174c7c98a0795f23873b6c0333575 Mon Sep 17 00:00:00 2001
From: emkornfield
Date: Thu, 15 Oct 2020 22:28:35 -0700
Subject: [PATCH 2/3] Update contributing.rst adress comments.

---
 docs/source/developers/contributing.rst | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/docs/source/developers/contributing.rst b/docs/source/developers/contributing.rst
index 5b89c0a23f5..f00e4049232 100644
--- a/docs/source/developers/contributing.rst
+++ b/docs/source/developers/contributing.rst
@@ -314,27 +314,32 @@ that have been made in this regard.
 
 Endianess
 +++++++++
-Arrow is primarily a little endian format there has been some effort to
-support big endian platforms as well. Based on a mailing list discussion,
+The Arrow format allows setting endianness. Due to the popularity of little endian
+architectures most of implementation assume little endian by default. There has been some
+effort to support big endian platforms as well. Based on a mailing list discussion,
 The requirements for a new platform are:
 
-1. A robut (non-flaky, returns results in a reasonable time) Continuous integration setup.
+1. A robust (non-flaky, returns results in a reasonable time) Continuous integration setup.
 2. Performance benchmarks in performance critical parts of the code to demonstrate no
    regression.
 
-Furthermore for big-endianess support there are two levels that an implementation can support
-1. Native endianness (all arrow communication happens with processes of the same endianness.
+Furthermore for big-endianess support there are two levels that an implementation can support:
+
+1. Native endianness (all arrow communication happens with processes of the same endianness).
+   This includes ancillary libraries like file import/export.
 2. Cross platform support (implementations will do byte reordering when appropriate for IPC
    and flight messages).
 
 The decision on what level to support is based on maintainers preferences for complexity and
 technical risk. In general all implementations should be open to native endianness support
 (provided the CI and performance requirements are met). Cross endianness support is a question
-for individual maintainers. The current implementations aiming for cross platform support are:
+for individual maintainers.
+
+The current implementations aiming for cross platform support are:
 
 1. C++
 
-Implementations that do not intend to implement cross
+Implementations that do not intend to implement cross platform support:
 
 1. Java
 
From 63bd9c8676139e4b327c4541b5cf287fed7a8768 Mon Sep 17 00:00:00 2001
From: Antoine Pitrou
Date: Mon, 19 Oct 2020 14:32:41 +0200
Subject: [PATCH 3/3] Nits

---
 docs/source/developers/contributing.rst | 48 ++++++++++++++-----------
 docs/source/format/Columnar.rst         |  2 ++
 docs/source/python/ipc.rst              | 18 ----------
 3 files changed, 29 insertions(+), 39 deletions(-)

diff --git a/docs/source/developers/contributing.rst b/docs/source/developers/contributing.rst
index f00e4049232..20f33f08ef0 100644
--- a/docs/source/developers/contributing.rst
+++ b/docs/source/developers/contributing.rst
@@ -314,35 +314,41 @@ that have been made in this regard.
 
 Endianess
 +++++++++
-The Arrow format allows setting endianness. Due to the popularity of little endian
-architectures most of implementation assume little endian by default. There has been some
-effort to support big endian platforms as well. Based on a mailing list discussion,
-The requirements for a new platform are:
 
-1. A robust (non-flaky, returns results in a reasonable time) Continuous integration setup.
-2. Performance benchmarks in performance critical parts of the code to demonstrate no
-   regression.
+The Arrow format allows setting endianness. Due to the popularity of
+little endian architectures most of implementation assume little endian by
+default. There has been some effort to support big endian platforms as well.
+Based on a `mailing-list discussion
+`__,
+the requirements for a new platform are:
 
-Furthermore for big-endianess support there are two levels that an implementation can support:
+1. A robust (non-flaky, returning results in a reasonable time) Continuous
+   Integration setup.
+2. Benchmarks for performance critical parts of the code to demonstrate
+   no regression.
 
-1. Native endianness (all arrow communication happens with processes of the same endianness).
-   This includes ancillary libraries like file import/export.
-2. Cross platform support (implementations will do byte reordering when appropriate for IPC
-   and flight messages).
+Furthermore, for big-endian support, there are two levels that an
+implementation can support:
 
-The decision on what level to support is based on maintainers preferences for complexity and
-technical risk. In general all implementations should be open to native endianness support
-(provided the CI and performance requirements are met). Cross endianness support is a question
-for individual maintainers.
+1. Native endianness (all Arrow communication happens with processes of the
+   same endianness). This includes ancillary functionality such as reading
+   and writing various file formats, such as Parquet.
+2. Cross endian support (implementations will do byte reordering when
+   appropriate for :ref:`IPC ` and :ref:`Flight `
+   messages).
 
-The current implementations aiming for cross platform support are:
+The decision on what level to support is based on maintainers' preferences for
+complexity and technical risk. In general all implementations should be open
+to native endianness support (provided the CI and performance requirements
+are met). Cross endianness support is a question for individual maintainers.
+
+The current implementations aiming for cross endian support are:
 
 1. C++
 
-Implementations that do not intend to implement cross platform support:
+Implementations that do not intend to implement cross endian support:
 
 1. Java
 
-For other libraries a discussion to gather consensus on the mailing should be had before submitting
-PRs.
-
+For other libraries, a discussion to gather consensus on the mailing-list
+should be had before submitting PRs.
diff --git a/docs/source/format/Columnar.rst b/docs/source/format/Columnar.rst
index f51c6aaf633..84e3013adde 100644
--- a/docs/source/format/Columnar.rst
+++ b/docs/source/format/Columnar.rst
@@ -787,6 +787,8 @@ layouts depending on the particular realization of the type.
 We do not go into detail about the logical types definitions in this document
 as we consider `Schema.fbs`_ to be authoritative.
 
+.. _format-ipc:
+
 Serialization and Interprocess Communication (IPC)
 ==================================================
 
diff --git a/docs/source/python/ipc.rst b/docs/source/python/ipc.rst
index 5eeedbdae89..1be8ff62ce5 100644
--- a/docs/source/python/ipc.rst
+++ b/docs/source/python/ipc.rst
@@ -330,21 +330,3 @@ An object can be reconstructed from its component-based representation using
 
 ``deserialize_components`` is also available as a method on
 ``SerializationContext`` objects.
-
-Serializing pandas Objects
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The default serialization context has optimized handling of pandas
-objects like ``DataFrame`` and ``Series``. Combined with component-based
-serialization above, this enables zero-copy transport of pandas DataFrame
-objects not containing any Python objects:
-
-.. ipython:: python
-
-   import pandas as pd
-   df = pd.DataFrame({'a': [1, 2, 3, 4, 5]})
-   context = pa.default_serialization_context()
-   serialized_df = context.serialize(df)
-   df_components = serialized_df.to_components()
-   original_df = context.deserialize_components(df_components)
-   original_df