diff --git a/docs/source/format/IPC.rst b/docs/source/format/IPC.rst index 16567a6744e..88cab8a288c 100644 --- a/docs/source/format/IPC.rst +++ b/docs/source/format/IPC.rst @@ -24,13 +24,17 @@ Encapsulated message format Data components in the stream and file formats are represented as encapsulated *messages* consisting of: -* A length prefix indicating the metadata size +* A 32-bit continuation indicator. The value `0xFFFFFFFF` indicates a valid + message. This component was introduced in version 0.15.0 in part to address + the 8-byte alignment requirement of Flatbuffers +* A 32-bit little-endian length prefix indicating the metadata size * The message metadata as a `Flatbuffer`_ * Padding bytes to an 8-byte boundary * The message body, which must be a multiple of 8 bytes Schematically, we have: :: + @@ -79,14 +83,15 @@ in a ``RecordBatch`` it should be defined in a ``DictionaryBatch``. :: ... - + -When a stream reader implementation is reading a stream, after each message, it -may read the next 4 bytes to know how large the message metadata that follows -is. Once the message flatbuffer is read, you can then read the message body. +When a stream reader implementation is reading a stream, after each +message, it may read the next 8 bytes to determine both if the stream +continues and the size of the message metadata that follows. Once the +message flatbuffer is read, you can then read the message body. -The stream writer can signal end-of-stream (EOS) either by writing a 0 length -as an ``int32`` or simply closing the stream interface. +The stream writer can signal end-of-stream (EOS) either by writing 8 +zero (`0x00`) bytes or closing the stream interface. File format ----------- @@ -219,8 +224,8 @@ take the form: :: Tensor (Multi-dimensional Array) Message Format ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -The ``Tensor`` message types provides a way to write a multidimensional array of -fixed-size values (such as a NumPy ndarray) using Arrow's shared memory +The ``Tensor`` message types provides a way to write a multidimensional array +of fixed-size values (such as a NumPy ndarray) using Arrow's shared memory tools. Arrow implementations in general are not required to implement this data format, though we provide a reference implementation in C++.