25 changes: 14 additions & 11 deletions docs/source/format/Columnar.rst
@@ -1006,19 +1006,21 @@ message flatbuffer is read, you can then read the message body.

The stream writer can signal end-of-stream (EOS) either by writing 8 bytes
containing the 4-byte continuation indicator (``0xFFFFFFFF``) followed by 0
-metadata length (``0x00000000``) or closing the stream interface.
+metadata length (``0x00000000``) or closing the stream interface. We
+recommend the ".arrows" file extension for the streaming format although
+in many cases these streams will not ever be stored as files.
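
For reviewers, the two end-of-stream signals described in this hunk can be sketched with the Python standard library. This is a minimal illustration, not Arrow's implementation: the names `EOS_MARKER`, `write_eos`, and `at_end_of_stream` are hypothetical, and the reader simply treats an empty read as "the stream interface was closed".

```python
import io
import struct

CONTINUATION = 0xFFFFFFFF  # 4-byte continuation indicator, per the text above
# Continuation indicator followed by a metadata length of 0: 8 bytes in total.
EOS_MARKER = struct.pack("<II", CONTINUATION, 0)


def write_eos(sink):
    """Append the 8-byte end-of-stream marker to a binary file-like object."""
    sink.write(EOS_MARKER)


def at_end_of_stream(source):
    """Read the next 8 bytes and report whether the stream has ended.

    Hypothetical helper: an empty read stands in for a closed stream, and
    the explicit marker covers the other case. Note that it consumes bytes.
    """
    prefix = source.read(8)
    return prefix == b"" or prefix == EOS_MARKER


buf = io.BytesIO()
write_eos(buf)
buf.seek(0)
print(at_end_of_stream(buf))
```

Because the marker is all `0xFF` bytes followed by all `0x00` bytes, the byte order passed to `struct.pack` does not affect the result.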

IPC File Format
---------------

-We define a "file format" supporting random access that is build with
-the stream format. The file starts and ends with a magic string
-``ARROW1`` (plus padding). What follows in the file is identical to
-the stream format. At the end of the file, we write a *footer*
-containing a redundant copy of the schema (which is a part of the
-streaming format) plus memory offsets and sizes for each of the data
-blocks in the file. This enables random access any record batch in the
-file. See `File.fbs`_ for the precise details of the file footer.
+We define a "file format" supporting random access that is an extension of
+the stream format. The file starts and ends with a magic string ``ARROW1``
+(plus padding). What follows in the file is identical to the stream format.
+At the end of the file, we write a *footer* containing a redundant copy of
+the schema (which is a part of the streaming format) plus memory offsets and
+sizes for each of the data blocks in the file. This enables random access to
+any record batch in the file. See `File.fbs`_ for the precise details of the
+file footer.
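
As a rough illustration of the framing described in this paragraph, the sketch below checks that a byte string starts and ends with the ``ARROW1`` magic string. It is a hedged example only: the function name `looks_like_arrow_file` is hypothetical, and it assumes the padding follows the leading magic string while the trailing magic string occupies the final six bytes. It does not parse the footer or validate the stream contents.

```python
MAGIC = b"ARROW1"


def looks_like_arrow_file(data: bytes) -> bool:
    """Cheap framing check: magic string at both ends of the buffer.

    Assumption (not verified against File.fbs here): any padding sits after
    the leading magic string, so the last six bytes are exactly the magic.
    """
    return (
        len(data) >= 2 * len(MAGIC)
        and data.startswith(MAGIC)
        and data.endswith(MAGIC)
    )


# A fabricated buffer used only to exercise the check; it is not a valid file.
sample = MAGIC + b"\x00\x00" + b"<stream and footer bytes>" + MAGIC
print(looks_like_arrow_file(sample))
```

A check like this could help a tool distinguish the file format from a bare stream, which carries no leading magic string.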

Schematically we have: ::

@@ -1034,8 +1036,9 @@ should be defined in a ``DictionaryBatch`` before they are used in a
``RecordBatch``, as long as the keys are defined somewhere in the
file. Further more, it is invalid to have more than one **non-delta**
dictionary batch per dictionary ID (i.e. dictionary replacement is not
-supported). Delta dictionaries are applied in the order they appear in
-the file footer.
+supported). Delta dictionaries are applied in the order they appear in
+the file footer. We recommend the ".arrow" extension for files created with
+this format.

Dictionary Messages
-------------------