From 57b8bf5c2a8d233dec63a69f402c3b120d946890 Mon Sep 17 00:00:00 2001
From: Wes McKinney
Date: Fri, 13 Jan 2017 14:56:05 -0500
Subject: [PATCH 1/4] Revise README to include more detail about software components

Change-Id: I0be8c79a038d76197700dd9424cad6e2e385e5c8
---
 README.md | 34 +++++++++++++++++++++++++++-------
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 89114ee39b4..ec1514a5bda 100644
--- a/README.md
+++ b/README.md
@@ -32,17 +32,37 @@ Arrow is a set of technologies that enable big-data systems to process and move
 Initial implementations include:

  - [The Arrow Format](https://github.com/apache/arrow/tree/master/format)
- - [Arrow Structures and APIs in C++](https://github.com/apache/arrow/tree/master/cpp)
- - [Arrow Structures and APIs in Java](https://github.com/apache/arrow/tree/master/java)
+ - [Java implementation](https://github.com/apache/arrow/tree/master/java)
+ - [C++ implementation](https://github.com/apache/arrow/tree/master/cpp)
+ - [Python interface to C++ libraries](https://github.com/apache/arrow/tree/master/python)

-Arrow is an [Apache Software Foundation](www.apache.org) project. More info can be found at [arrow.apache.org](http://arrow.apache.org).
+Arrow is an [Apache Software Foundation](www.apache.org) project. Learn more at
+[arrow.apache.org](http://arrow.apache.org).
+
+#### What's in the Arrow libraries?
+
+The reference Arrow implementations contain a number of distinct software
+components:
+
+- Columnar vector/array and table row batch containers supporting nested data
+- Fast, language agnostic metadata messaging layer (using Google's Flatbuffers
+  library)
+- Reference counted off-heap buffer memory management, for zero-copy memory
+  sharing and handling memory-mapped file
+- Low-overhead IO interfaces to file system, HDFS (C++ only)
+- A self-contained binary "file format" for remote procedure calls (RPC) and
+  interprocess communication (IPC)
+- Integration tests for verifying binary compatibility between the
+  implementations (e.g. sending data from Java to C++)
+- Conversions to and from other in-memory data structures (e.g. Python's pandas
+  library)

 #### Getting involved

-Right now the primary audience for Apache Arrow are the designers and
-developers of data systems; most people will use Apache Arrow indirectly
-through systems that use it for internal data handling and interoperating with
-other Arrow-enabled systems.
+Right now the primary audience for Apache Arrow are the developers of data
+systems; most people will use Apache Arrow indirectly through systems that use
+it for internal data handling and interoperating with other Arrow-enabled
+systems.

 Even if you do not plan to contribute to Apache Arrow itself or Arrow
 integrations in other projects, we'd be happy to have you involved:

From 3d3164453b56a1fa7a0eb986ccb6889b674471be Mon Sep 17 00:00:00 2001
From: Wes McKinney
Date: Fri, 13 Jan 2017 14:57:22 -0500
Subject: [PATCH 2/4] Typos

Change-Id: I4d3804fd120946d209653a0a87754ca0f3a22d0d
---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index ec1514a5bda..9850d00103f 100644
--- a/README.md
+++ b/README.md
@@ -47,9 +47,9 @@ components:
 - Columnar vector/array and table row batch containers supporting nested data
 - Fast, language agnostic metadata messaging layer (using Google's Flatbuffers
   library)
-- Reference counted off-heap buffer memory management, for zero-copy memory
-  sharing and handling memory-mapped file
-- Low-overhead IO interfaces to file system, HDFS (C++ only)
+- Reference-counted off-heap buffer memory management, for zero-copy memory
+  sharing and handling memory-mapped files
+- Low-overhead IO interfaces to files on disk, HDFS (C++ only)
 - A self-contained binary "file format" for remote procedure calls (RPC) and
   interprocess communication (IPC)
 - Integration tests for verifying binary compatibility between the

From ec4b95ebe4a90d23ed1e06cf1085ad5c976b589a Mon Sep 17 00:00:00 2001
From: Wes McKinney
Date: Sun, 15 Jan 2017 14:44:59 -0500
Subject: [PATCH 3/4] Generalize note about binary formats

Change-Id: I452a2f79cac0816812a4cb8bd90a10488d3923ed
---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 9850d00103f..f2037672034 100644
--- a/README.md
+++ b/README.md
@@ -50,7 +50,8 @@ components:
 - Reference-counted off-heap buffer memory management, for zero-copy memory
   sharing and handling memory-mapped files
 - Low-overhead IO interfaces to files on disk, HDFS (C++ only)
-- A self-contained binary "file format" for remote procedure calls (RPC) and
+- Self-describing binary wire formats (streaming and batch/file-like) for
+  remote procedure calls (RPC) and
   interprocess communication (IPC)
 - Integration tests for verifying binary compatibility between the
   implementations (e.g. sending data from Java to C++)

From 8c6acf67310b9a3091269175bf0b8f277d494093 Mon Sep 17 00:00:00 2001
From: Wes McKinney
Date: Mon, 16 Jan 2017 13:46:08 -0500
Subject: [PATCH 4/4] Tweak description of data containers

Change-Id: I9eeca781036a43728cc7b8c85c75a96b856dbaf5
---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index f2037672034..1eb3f86f986 100644
--- a/README.md
+++ b/README.md
@@ -44,7 +44,8 @@ Arrow is an [Apache Software Foundation](www.apache.org) project. Learn more at
 The reference Arrow implementations contain a number of distinct software
 components:

-- Columnar vector/array and table row batch containers supporting nested data
+- Columnar vector and table-like containers (similar to data frames) supporting
+  flat or nested types
 - Fast, language agnostic metadata messaging layer (using Google's Flatbuffers
   library)
 - Reference-counted off-heap buffer memory management, for zero-copy memory