48 changes: 48 additions & 0 deletions docs/source/developers/contributing.rst
@@ -304,3 +304,51 @@ to your branch, which they sometimes do to help move a pull request along.
In addition, the GitHub PR "suggestion" feature can also add commits to
your branch, so it is possible that your local copy of your branch is missing
some additions.

Guidance for specific features
==============================

From time to time the community discusses specific types of features and
improvements that it expects to support. This section records the decisions
that have been made in this regard.

Endianness
++++++++++

The Arrow format allows setting the endianness. Due to the popularity of
little-endian architectures, most implementations assume little-endian by
default. There has been some effort to support big-endian platforms as well.
Based on a `mailing-list discussion
<https://mail-archives.apache.org/mod_mbox/arrow-dev/202009.mbox/%3cCAK7Z5T--HHhr9Dy43PYhD6m-XoU4qoGwQVLwZsG-kOxXjPTyZA@mail.gmail.com%3e>`__,
the requirements for a new platform are:

1. A robust (non-flaky, returning results in a reasonable time) Continuous
Integration setup.
2. Benchmarks for performance critical parts of the code to demonstrate
no regression.

Furthermore, for big-endian support, there are two levels that an
implementation can support:

1. Native endianness (all Arrow communication happens with processes of the
   same endianness). This includes ancillary functionality such as reading
   and writing file formats like Parquet.
2. Cross endian support (implementations will do byte reordering when
appropriate for :ref:`IPC <format-ipc>` and :ref:`Flight <flight-rpc>`
messages).

The decision on which level to support is based on the maintainers' preferences
for complexity and technical risk. In general, all implementations should be
open to native endianness support (provided the CI and performance requirements
are met); cross-endian support is left to the discretion of individual
maintainers.

The current implementations aiming for cross endian support are:

1. C++

Implementations that do not intend to implement cross endian support:

1. Java
[Comment from a Member]
Just FYI: while there is no PR to implement cross-platform support for Java yet, I will submit the PR later, after level 1 is supported.

[Comment from the Contributor Author]
I think @jacques-n was against cross-platform support in Java?

[Comment from a Member]
I see. Before asking for review of the PR, I will gather consensus on the ML.


For other libraries, a consensus-gathering discussion should take place on the
mailing list before PRs are submitted.
2 changes: 2 additions & 0 deletions docs/source/format/Columnar.rst
@@ -787,6 +787,8 @@ layouts depending on the particular realization of the type.
We do not go into detail about the logical types definitions in this
document as we consider `Schema.fbs`_ to be authoritative.

.. _format-ipc:

Serialization and Interprocess Communication (IPC)
==================================================

Expand Down
18 changes: 0 additions & 18 deletions docs/source/python/ipc.rst
@@ -330,21 +330,3 @@ An object can be reconstructed from its component-based representation using

``deserialize_components`` is also available as a method on
``SerializationContext`` objects.

Serializing pandas Objects
~~~~~~~~~~~~~~~~~~~~~~~~~~

[Comment from a Member] Had to remove this to build the docs; this is deprecated functionality.

The default serialization context has optimized handling of pandas
objects like ``DataFrame`` and ``Series``. Combined with component-based
serialization above, this enables zero-copy transport of pandas DataFrame
objects not containing any Python objects:

.. ipython:: python

import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3, 4, 5]})
context = pa.default_serialization_context()
serialized_df = context.serialize(df)
df_components = serialized_df.to_components()
original_df = context.deserialize_components(df_components)
original_df