ARROW-10203: [Doc] Give guidance on big-endian support in the contributors docs #8374
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -304,3 +304,51 @@ to your branch, which they sometimes do to help move a pull request along. | |
| In addition, the GitHub PR "suggestion" feature can also add commits to | ||
| your branch, so it is possible that your local copy of your branch is missing | ||
| some additions. | ||
|
|
||
| Guidance for specific features | ||
| ============================== | ||
|
|
||
| From time to time the community has discussions on specific types of features | ||
| and improvements that they expect to support. This section outlines decisions | ||
| that have been made in this regard. | ||
|
|
||
| Endianness | ||
| ++++++++++ | ||
|
|
||
| The Arrow format allows setting endianness. Due to the popularity of | ||
| little-endian architectures, most implementations assume little endian by | ||
| default. There has been some effort to support big-endian platforms as well. | ||
| Based on a `mailing-list discussion | ||
| <https://mail-archives.apache.org/mod_mbox/arrow-dev/202009.mbox/%3cCAK7Z5T--HHhr9Dy43PYhD6m-XoU4qoGwQVLwZsG-kOxXjPTyZA@mail.gmail.com%3e>`__, | ||
| the requirements for a new platform are: | ||
|
|
||
| 1. A robust (non-flaky, returning results in a reasonable time) Continuous | ||
| Integration setup. | ||
| 2. Benchmarks for performance-critical parts of the code to demonstrate | ||
| no regression. | ||
|
|
||
| Furthermore, for big-endian support, there are two levels that an | ||
| implementation can support: | ||
|
|
||
| 1. Native endianness (all Arrow communication happens with processes of the | ||
| same endianness). This includes ancillary functionality such as reading | ||
| and writing various file formats, such as Parquet. | ||
| 2. Cross endian support (implementations will do byte reordering when | ||
| appropriate for :ref:`IPC <format-ipc>` and :ref:`Flight <flight-rpc>` | ||
| messages). | ||
|
|
||
| The decision on what level to support is based on maintainers' preferences for | ||
| complexity and technical risk. In general all implementations should be open | ||
| to native endianness support (provided the CI and performance requirements | ||
| are met). Cross endianness support is a question for individual maintainers. | ||
|
|
||
| The current implementations aiming for cross endian support are: | ||
|
|
||
| 1. C++ | ||
|
|
||
| Implementations that do not intend to implement cross endian support: | ||
|
|
||
| 1. Java | ||
|
||
|
|
||
| For other libraries, consensus should be gathered on the mailing list | ||
| before submitting PRs. | ||
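
For readers of this diff, the difference between the two support levels can be illustrated with a small sketch. It is not taken from any Arrow implementation: the helper names (`native_is_little_endian`, `swap_int32_buffer`, `read_int32_values`) are hypothetical, and it assumes a plain buffer of 32-bit signed integers rather than real IPC or Flight messages. It only uses the Python standard library.

```python
import struct
import sys
from typing import List


def native_is_little_endian() -> bool:
    """Report the byte order of the machine running this code."""
    return sys.byteorder == "little"


def swap_int32_buffer(buf: bytes) -> bytes:
    """Reverse the byte order of every 4-byte value in the buffer.

    Hypothetical helper: this is the kind of byte reordering a
    cross-endian reader would need to apply to fixed-width data.
    """
    if len(buf) % 4 != 0:
        raise ValueError("buffer length must be a multiple of 4")
    count = len(buf) // 4
    # Interpret as big endian, re-emit as little endian. Because the
    # operation is a pure byte reversal per value, it works in both
    # directions.
    values = struct.unpack(f">{count}i", buf)
    return struct.pack(f"<{count}i", *values)


def read_int32_values(buf: bytes, producer_is_little_endian: bool) -> List[int]:
    """Decode int32 values using the byte order declared by the producer."""
    count = len(buf) // 4
    prefix = "<" if producer_is_little_endian else ">"
    return list(struct.unpack(f"{prefix}{count}i", buf))


# A buffer as a big-endian producer would have written it.
big_endian_buf = struct.pack(">3i", 1, 2, 3)
decoded = read_int32_values(big_endian_buf, producer_is_little_endian=False)
```

An implementation with native endianness support only would skip the reordering and assume the producer's byte order matches `sys.byteorder`. A cross-endian implementation would compare the declared byte order with its own and reorder when they differ.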
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -330,21 +330,3 @@ An object can be reconstructed from its component-based representation using | |
|
|
||
| ``deserialize_components`` is also available as a method on | ||
| ``SerializationContext`` objects. | ||
|
|
||
| Serializing pandas Objects | ||
|
Member
Had to remove this to build the docs. This is deprecated functionality. |
||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| The default serialization context has optimized handling of pandas | ||
| objects like ``DataFrame`` and ``Series``. Combined with component-based | ||
| serialization above, this enables zero-copy transport of pandas DataFrame | ||
| objects not containing any Python objects: | ||
|
|
||
| .. ipython:: python | ||
|
|
||
| import pandas as pd | ||
| df = pd.DataFrame({'a': [1, 2, 3, 4, 5]}) | ||
| context = pa.default_serialization_context() | ||
| serialized_df = context.serialize(df) | ||
| df_components = serialized_df.to_components() | ||
| original_df = context.deserialize_components(df_components) | ||
| original_df | ||