23 changes: 14 additions & 9 deletions docs/source/format/IPC.rst
@@ -24,13 +24,17 @@ Encapsulated message format
Data components in the stream and file formats are represented as encapsulated
*messages* consisting of:

* A 32-bit continuation indicator. The value ``0xFFFFFFFF`` indicates a valid
  message. This component was introduced in version 0.15.0 in part to address
  the 8-byte alignment requirement of Flatbuffers.
* A 32-bit little-endian length prefix indicating the metadata size
* The message metadata as a `Flatbuffer`_
* Padding bytes to an 8-byte boundary
* The message body, which must be a multiple of 8 bytes

Schematically, we have: ::

    <continuation: 0xFFFFFFFF>
    <metadata_size: int32>
    <metadata_flatbuffer: bytes>
    <padding>
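
The framing above can be sketched in Python with the standard ``struct`` module. This is a minimal illustration, not Arrow's implementation: ``encode_message``, ``decode_message`` and ``padded_length`` are hypothetical names, the metadata is treated as opaque, already-serialized bytes rather than a parsed Flatbuffer, and the sketch assumes that the length prefix counts the metadata plus its padding. Because the continuation indicator and the length prefix together occupy 8 bytes, padding the metadata to a multiple of 8 keeps the body 8-byte aligned relative to the start of the message.

```python
import struct

CONTINUATION = 0xFFFFFFFF  # value marking a valid message
ALIGNMENT = 8


def padded_length(n: int) -> int:
    """Round n up to the next multiple of 8."""
    return (n + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT


def encode_message(metadata: bytes, body: bytes) -> bytes:
    """Frame serialized metadata and a body as one encapsulated message.

    Assumption: the length prefix counts the metadata plus its padding.
    """
    if len(body) % ALIGNMENT != 0:
        raise ValueError("message body must be a multiple of 8 bytes")
    metadata_size = padded_length(len(metadata))
    padding = b"\x00" * (metadata_size - len(metadata))
    # 32-bit continuation indicator, then 32-bit little-endian length prefix
    prefix = struct.pack("<Ii", CONTINUATION, metadata_size)
    return prefix + metadata + padding + body


def decode_message(buf: bytes, body_length: int, offset: int = 0):
    """Split one encapsulated message that starts at ``offset``.

    ``body_length`` is a hypothetical parameter: a real reader would take it
    from the parsed metadata Flatbuffer, which this sketch does not parse.
    Returns (padded metadata, body, offset of the next message).
    """
    continuation, metadata_size = struct.unpack_from("<Ii", buf, offset)
    if continuation != CONTINUATION:
        raise ValueError("missing continuation indicator")
    meta_start = offset + 8
    body_start = meta_start + metadata_size
    metadata = buf[meta_start:body_start]
    body = buf[body_start:body_start + body_length]
    return metadata, body, body_start + body_length
```

For example, ``encode_message(b"abc", bytes(8))`` yields 24 bytes: the 8-byte prefix, 3 metadata bytes followed by 5 padding bytes, and the 8-byte body.
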
@@ -79,14 +83,15 @@ in a ``RecordBatch`` it should be defined in a ``DictionaryBatch``. ::
    <DICTIONARY y DELTA>
    ...
    <RECORD BATCH n - 1>
    <EOS [optional]: 0x0000000000000000>

When a stream reader implementation is reading a stream, after each
message it may read the next 8 bytes to determine both whether the stream
continues and the size of the message metadata that follows. Once the
message Flatbuffer has been read, the reader can then read the message body.
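
The 8-byte check described above might look like the following minimal sketch. It is not Arrow's reader: ``read_metadata_size`` and ``EOS_MARKER`` are hypothetical names, and it assumes the end-of-stream marker is 8 zero bytes as this change describes. A reader would call it before each message, then read the metadata and the body whose length the metadata specifies.

```python
import io
import struct
from typing import BinaryIO, Optional

CONTINUATION = 0xFFFFFFFF
EOS_MARKER = b"\x00" * 8  # assumed end-of-stream marker: 8 zero bytes


def read_metadata_size(stream: BinaryIO) -> Optional[int]:
    """Read the 8-byte prefix that precedes each message.

    Returns the metadata size, or None when the stream has ended, either
    through an explicit EOS marker or because the writer closed the stream.
    """
    prefix = stream.read(8)
    if len(prefix) == 0:
        return None  # stream closed without an explicit EOS marker
    if len(prefix) < 8:
        raise EOFError("stream ended in the middle of a message prefix")
    if prefix == EOS_MARKER:
        return None  # explicit end-of-stream
    continuation, metadata_size = struct.unpack("<Ii", prefix)
    if continuation != CONTINUATION:
        raise ValueError("expected continuation indicator 0xFFFFFFFF")
    return metadata_size


if __name__ == "__main__":
    # One message with 16 bytes of dummy metadata and an empty body,
    # followed by an explicit EOS marker.
    data = struct.pack("<Ii", CONTINUATION, 16) + bytes(16) + EOS_MARKER
    stream = io.BytesIO(data)
    size = read_metadata_size(stream)
    print(size)  # 16
    stream.read(size)  # skip the metadata; a real reader would parse it
    print(read_metadata_size(stream))  # None
```
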

The stream writer can signal end-of-stream (EOS) either by writing 8
zero (``0x00``) bytes or by closing the stream interface.
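
On the writer side, the two EOS options could be sketched as follows. ``write_stream`` is a hypothetical helper, not an Arrow API, and each item in ``messages`` is assumed to be an already-framed encapsulated message.

```python
import io
from typing import BinaryIO, Iterable

EOS_MARKER = b"\x00" * 8  # 8 zero bytes signal end-of-stream


def write_stream(stream: BinaryIO, messages: Iterable[bytes],
                 explicit_eos: bool = True) -> None:
    """Write already-framed messages, then optionally an EOS marker.

    With explicit_eos=False the caller signals EOS by closing the stream.
    """
    for message in messages:
        stream.write(message)
    if explicit_eos:
        stream.write(EOS_MARKER)


if __name__ == "__main__":
    # Dummy message: continuation, metadata size 8, then 8 metadata bytes.
    placeholder = b"\xff\xff\xff\xff" + (8).to_bytes(4, "little") + bytes(8)
    out = io.BytesIO()
    write_stream(out, [placeholder])
    print(len(out.getvalue()))  # 24
```

Writing the marker explicitly lets a reader distinguish a deliberate end of stream from a connection that was simply dropped.
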

File format
-----------
@@ -219,8 +224,8 @@ take the form: ::
Tensor (Multi-dimensional Array) Message Format
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``Tensor`` message type provides a way to write a multidimensional array
of fixed-size values (such as a NumPy ndarray) using Arrow's shared memory
tools. Arrow implementations in general are not required to implement this data
format, though we provide a reference implementation in C++.
