From bc98e4ebd09ac7f3533eaa95eca336dd8e66ba2c Mon Sep 17 00:00:00 2001 From: Alessandro Molina Date: Wed, 25 Aug 2021 11:50:47 +0200 Subject: [PATCH 1/8] Add links to the cookbook --- docs/source/index.rst | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/docs/source/index.rst b/docs/source/index.rst index 65aeb47ea9f..5579e8cd781 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -55,6 +55,16 @@ target environment.** Rust status +.. _toc.cookbook: + +.. toctree:: + :maxdepth: 1 + :caption: Cookbooks + + C++ + Python + R + .. _toc.columnar: .. toctree:: From 7a5f0fae39890243f236cf136a1b5add80b17862 Mon Sep 17 00:00:00 2001 From: Alessandro Molina Date: Wed, 25 Aug 2021 16:32:15 +0200 Subject: [PATCH 2/8] Improve doc for new users --- docs/source/python/getstarted.rst | 149 ++++++++++++++++++++++++++++++ docs/source/python/index.rst | 16 +++- 2 files changed, 160 insertions(+), 5 deletions(-) create mode 100644 docs/source/python/getstarted.rst diff --git a/docs/source/python/getstarted.rst b/docs/source/python/getstarted.rst new file mode 100644 index 00000000000..4af82b367e1 --- /dev/null +++ b/docs/source/python/getstarted.rst @@ -0,0 +1,149 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +.. _getstarted: + +Getting Started +=============== + +Arrow manages data in Arrays (:class:`pyarrow.Array`), which can be +grouped in tables (:class:`pyarrow.Table`) to represent columns of data +in tabular data. + +Arrow also exposes supports for various formats to get those tabular +data in and out of disk and networks. Most commonly used formats are +Parquet (:ref:`parquet`) and the IPC format (:ref:`ipc`). + +Creating Arrays and Tables +-------------------------- + +Arrays in Arrow are collections of data of uniform type. That allows +arrow to use the best performing implementation to store the data and +perform computation of it. So each array is meant to have data and +a type + +.. ipython:: python + + import pyarrow as pa + + days = pa.array([1, 12, 17, 23, 28], type=pa.int8()) + +multiple arrays can be combined in tables to form the columns +in tabular data according to a provided schema + +.. ipython:: python + + months = pa.array([1, 3, 5, 7, 1], type=pa.int8()) + years = pa.array([1990, 2000, 1995, 2000, 1995], type=pa.int16()) + + birthdays_table = pa.table([days, months, years], + schema=pa.schema([ + ('days', days.type), + ('months', months.type), + ('years', years.type) + ])) + + birthdays_table + +See :ref:`data` for more details. + +Saving and Loading Tables +------------------------- + +Once you have a tabular data, Arrow provides out of the box +the features to save and restore that data for common formats +like parquet + +.. 
ipython:: python
+
+    import pyarrow.parquet as pq
+
+    pq.write_table(birthdays_table, 'birthdays.parquet')
+
+Once you have your data on disk, loading it back is as easy,
+and Arrow is heavily optimized for memory and speed so loading
+data will be as quick as possible
+
+.. ipython:: python
+
+    reloaded_birthdays = pq.read_table('birthdays.parquet')
+
+    reloaded_birthdays
+
+Saving and loading back data in arrow is usually done through
+:ref:`parquet`, :ref:`ipc`, :ref:`csv` or :ref:`json` formats.
+
+Performing Computations
+-----------------------
+
+Arrow ships with a bunch of compute functions that can be applied
+to its arrays, so through the compute functions it's possible to apply
+transformations to the data
+
+.. ipython:: python
+
+    import pyarrow.compute as pc
+
+    pc.value_counts(birthdays_table["years"])
+
+See :ref:`compute` for a list of available compute functions and
+how to use them.
+
+Working with big data
+---------------------
+
+Arrow also provides the :class:`pyarrow.dataset` api to work with
+big data, which will handle for you partitioning of your data in
+smaller chunks
+
+.. ipython:: python
+
+    import pyarrow.dataset as ds
+
+    ds.write_dataset(birthdays_table, "savedir", format="parquet",
+                     partitioning=ds.partitioning(
+                        pa.schema([birthdays_table.schema.field("years")])
+                     ))
+
+Loading back the partitioned dataset will detect the chunks
+
+.. ipython:: python
+
+    birthdays_dataset = ds.dataset("savedir", schema=birthdays_table.schema,
+                                   partitioning=ds.partitioning(field_names=["years"]))
+
+    birthdays_dataset.files
+
+and will lazily load chunks of data only when iterating over them
+
+.. ipython:: python
+
+    import datetime
+
+    current_year = datetime.datetime.utcnow().year
+    for table_chunk in birthdays_dataset.to_batches():
+        print("AGES", pc.abs(pc.subtract(table_chunk["years"], current_year)))
+
+For further details on how to work with big datasets, how to filter them,
+how to project them etc... refer to :ref:`dataset` documentation.
+
+Continuing from here
+--------------------
+
+For digging further into Arrow, you might want to read the
+:doc:`PyArrow Documentation <./index>` itself or the
+`Arrow Python Cookbook `_
diff --git a/docs/source/python/index.rst b/docs/source/python/index.rst
index cc7383044e0..14fe21b9bfa 100644
--- a/docs/source/python/index.rst
+++ b/docs/source/python/index.rst
@@ -15,12 +15,17 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-Python bindings
-===============
+PyArrow - Apache Arrow Python bindings
+======================================
 
 This is the documentation of the Python API of Apache Arrow. For more details
-on the Arrow format and other language bindings see the
-:doc:`parent documentation <../index>`.
+on the Arrow format and other language bindings
+
+Apache Arrow is a development platform for in-memory analytics.
+It contains a set of technologies that enable big data systems to store, process and move data fast.
+
+See the :doc:`parent documentation <../index>` for additional details on
+the Arrow Project itself.
 
 The Arrow Python bindings (also named "PyArrow") have first-class integration
 with NumPy, pandas, and built-in Python objects. They are based on the C++
@@ -34,9 +39,10 @@ files into Arrow structures.
:maxdepth: 2 install - memory + getstarted data compute + memory ipc filesystems filesystems_deprecated From a534f37fc283db7d5e251f05f654a5c26b4b14f1 Mon Sep 17 00:00:00 2001 From: Alessandro Molina Date: Mon, 30 Aug 2021 12:01:37 +0200 Subject: [PATCH 3/8] Apply suggestions from code review Co-authored-by: Joris Van den Bossche --- docs/source/python/getstarted.rst | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/docs/source/python/getstarted.rst b/docs/source/python/getstarted.rst index 4af82b367e1..79ab43977ff 100644 --- a/docs/source/python/getstarted.rst +++ b/docs/source/python/getstarted.rst @@ -24,7 +24,7 @@ Arrow manages data in Arrays (:class:`pyarrow.Array`), which can be grouped in tables (:class:`pyarrow.Table`) to represent columns of data in tabular data. -Arrow also exposes supports for various formats to get those tabular +Arrow also provides support for various formats to get those tabular data in and out of disk and networks. Most commonly used formats are Parquet (:ref:`parquet`) and the IPC format (:ref:`ipc`). @@ -32,8 +32,8 @@ Creating Arrays and Tables -------------------------- Arrays in Arrow are collections of data of uniform type. That allows -arrow to use the best performing implementation to store the data and -perform computation of it. So each array is meant to have data and +Arrow to use the best performing implementation to store the data and +perform computations on it. So each array is meant to have data and a type .. ipython:: python @@ -42,7 +42,7 @@ a type days = pa.array([1, 12, 17, 23, 28], type=pa.int8()) -multiple arrays can be combined in tables to form the columns +Multiple arrays can be combined in tables to form the columns in tabular data according to a provided schema .. ipython:: python @@ -64,9 +64,9 @@ See :ref:`data` for more details. Saving and Loading Tables ------------------------- -Once you have a tabular data, Arrow provides out of the box +Once you have tabular data, Arrow provides out of the box the features to save and restore that data for common formats -like parquet +like Parquet: .. ipython:: python @@ -85,7 +85,7 @@ data will be as quick as possible reloaded_birthdays Saving and loading back data in arrow is usually done through -:ref:`parquet`, :ref:`ipc`, :ref:`csv` or :ref:`json` formats. +:ref:`parquet`, :ref:`ipc` (:ref:`feather`), :ref:`csv` or :ref:`json` formats. Performing Computations ----------------------- @@ -123,8 +123,7 @@ Loading back the partitioned dataset will detect the chunks .. ipython:: python - birthdays_dataset = ds.dataset("savedir", schema=birthdays_table.schema, - partitioning=ds.partitioning(field_names=["years"])) + birthdays_dataset = ds.dataset("savedir", format="parquet", partitioning=["years"]) birthdays_dataset.files @@ -136,10 +135,10 @@ and will lazily load chunks of data only when iterating over them current_year = datetime.datetime.utcnow().year for table_chunk in birthdays_dataset.to_batches(): - print("AGES", pc.abs(pc.subtract(table_chunk["years"], current_year))) + print("AGES", pc.subtract(current_year, table_chunk["years"])) For further details on how to work with big datasets, how to filter them, -how to project them etc... refer to :ref:`dataset` documentation. +how to project them, etc., refer to :ref:`dataset` documentation. 
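+
+As a minimal sketch of that filtering (``ds.field`` and the ``filter=``
+argument of ``to_table`` are the :mod:`pyarrow.dataset` pieces assumed
+here), a single partition can be read back without scanning the rest:
+
+.. ipython:: python
+
+    birthdays_dataset.to_table(filter=ds.field("years") == 2000)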
 Continuing from here
 --------------------

From c59d08c8a6bdc0ed82bdba02d61ec9f02757227d Mon Sep 17 00:00:00 2001
From: Alessandro Molina
Date: Mon, 30 Aug 2021 15:39:26 +0200
Subject: [PATCH 4/8] Address feedback

---
 docs/source/python/getstarted.rst | 12 ++++--------
 docs/source/python/index.rst      |  5 ++---
 2 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/docs/source/python/getstarted.rst b/docs/source/python/getstarted.rst
index 79ab43977ff..756e6157efe 100644
--- a/docs/source/python/getstarted.rst
+++ b/docs/source/python/getstarted.rst
@@ -43,19 +43,15 @@ a type
    days = pa.array([1, 12, 17, 23, 28], type=pa.int8())
 
 Multiple arrays can be combined in tables to form the columns
-in tabular data according to a provided schema
+in tabular data when attached to a column name
 
 .. ipython:: python
 
     months = pa.array([1, 3, 5, 7, 1], type=pa.int8())
     years = pa.array([1990, 2000, 1995, 2000, 1995], type=pa.int16())
 
-    birthdays_table = pa.table([days, months, years],
-                               schema=pa.schema([
-                                   ('days', days.type),
-                                   ('months', months.type),
-                                   ('years', years.type)
-                               ]))
+    birthdays_table = pa.table([days, months, years],
+                               names=["days", "months", "years"])
 
     birthdays_table
 
@@ -74,7 +70,7 @@ like Parquet:
 
    pq.write_table(birthdays_table, 'birthdays.parquet')
 
-Once you have your data on disk, loading it back is as easy,
+Once you have your data on disk, loading it back is a single function call,
 and Arrow is heavily optimized for memory and speed so loading
 data will be as quick as possible
 
diff --git a/docs/source/python/index.rst b/docs/source/python/index.rst
index 14fe21b9bfa..0ffa40545d9 100644
--- a/docs/source/python/index.rst
+++ b/docs/source/python/index.rst
@@ -18,14 +18,13 @@
 PyArrow - Apache Arrow Python bindings
 ======================================
 
-This is the documentation of the Python API of Apache Arrow. For more details
-on the Arrow format and other language bindings
+This is the documentation of the Python API of Apache Arrow.
 
 Apache Arrow is a development platform for in-memory analytics.
 It contains a set of technologies that enable big data systems to store, process and move data fast.
 
 See the :doc:`parent documentation <../index>` for additional details on
-the Arrow Project itself.
+the Arrow Project itself, on the Arrow format and the other language bindings.
 
 The Arrow Python bindings (also named "PyArrow") have first-class integration
 with NumPy, pandas, and built-in Python objects. They are based on the C++

From a6ca9448353dafd2b141e8b8b736045b3f475bbe Mon Sep 17 00:00:00 2001
From: Alessandro Molina
Date: Mon, 30 Aug 2021 15:45:04 +0200
Subject: [PATCH 5/8] provide link names

---
 docs/source/python/getstarted.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/source/python/getstarted.rst b/docs/source/python/getstarted.rst
index 756e6157efe..90de8bf2d08 100644
--- a/docs/source/python/getstarted.rst
+++ b/docs/source/python/getstarted.rst
@@ -81,7 +81,8 @@ data will be as quick as possible
 
     reloaded_birthdays
 
 Saving and loading back data in arrow is usually done through
-:ref:`parquet`, :ref:`ipc` (:ref:`feather`), :ref:`csv` or :ref:`json` formats.
+:ref:`Parquet `, :ref:`IPC format ` (:ref:`feather`), :ref:`CSV ` or
+:ref:`Line-Delimited JSON ` formats.
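+
+As a minimal sketch of one alternative (the :mod:`pyarrow.feather` module
+used here ships with pyarrow itself), the same table can be round-tripped
+through Feather as well:
+
+.. ipython:: python
+
+    import pyarrow.feather as feather
+
+    feather.write_feather(birthdays_table, 'birthdays.feather')
+    feather.read_table('birthdays.feather')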
Performing Computations ----------------------- From 5cc90ea8aaa6e6a280d2da3730db7d290b696f3a Mon Sep 17 00:00:00 2001 From: Alessandro Molina Date: Tue, 31 Aug 2021 15:55:28 +0200 Subject: [PATCH 6/8] replace big data with large data --- docs/source/python/getstarted.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/source/python/getstarted.rst b/docs/source/python/getstarted.rst index 90de8bf2d08..95b647bc167 100644 --- a/docs/source/python/getstarted.rst +++ b/docs/source/python/getstarted.rst @@ -100,11 +100,11 @@ transformations to the data See :ref:`compute` for a list of available compute functions and how to use them. -Working with big data ---------------------- +Working with large data +----------------------- Arrow also provides the :class:`pyarrow.dataset` api to work with -big data, which will handle for you partitioning of your data in +large data, which will handle for you partitioning of your data in smaller chunks .. ipython:: python From eea3795e103ab3c6c1a391ded545a98eeb711324 Mon Sep 17 00:00:00 2001 From: Alessandro Molina Date: Tue, 31 Aug 2021 15:56:21 +0200 Subject: [PATCH 7/8] tables too --- docs/source/python/getstarted.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/python/getstarted.rst b/docs/source/python/getstarted.rst index 95b647bc167..45c6e7958d3 100644 --- a/docs/source/python/getstarted.rst +++ b/docs/source/python/getstarted.rst @@ -88,8 +88,8 @@ Performing Computations ----------------------- Arrow ships with a bunch of compute functions that can be applied -to its arrays, so through the compute functions it's possible to apply -transformations to the data +to its arrays and tables, so through the compute functions +it's possible to apply transformations to the data .. ipython:: python From c131048773e5c98711df569854fa024be10ff705 Mon Sep 17 00:00:00 2001 From: Alessandro Molina Date: Wed, 1 Sep 2021 11:15:35 +0200 Subject: [PATCH 8/8] Tweak --- docs/source/python/getstarted.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/source/python/getstarted.rst b/docs/source/python/getstarted.rst index 45c6e7958d3..36e4707ad61 100644 --- a/docs/source/python/getstarted.rst +++ b/docs/source/python/getstarted.rst @@ -20,7 +20,7 @@ Getting Started =============== -Arrow manages data in Arrays (:class:`pyarrow.Array`), which can be +Arrow manages data in arrays (:class:`pyarrow.Array`), which can be grouped in tables (:class:`pyarrow.Table`) to represent columns of data in tabular data. @@ -81,8 +81,8 @@ data will be as quick as possible reloaded_birthdays Saving and loading back data in arrow is usually done through -:ref:`Parquet `, :ref:`IPC format ` (:ref:`feather`), :ref:`CSV ` or -:ref:`Line-Delimited JSON ` formats. +:ref:`Parquet `, :ref:`IPC format ` (:ref:`feather`), +:ref:`CSV ` or :ref:`Line-Delimited JSON ` formats. Performing Computations ----------------------- @@ -103,7 +103,7 @@ how to use them. Working with large data ----------------------- -Arrow also provides the :class:`pyarrow.dataset` api to work with +Arrow also provides the :class:`pyarrow.dataset` API to work with large data, which will handle for you partitioning of your data in smaller chunks
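
A minimal standalone sketch of the point PATCH 7 makes (compute functions
accept a table's chunked columns, not just plain arrays; the function names
below come from the public :mod:`pyarrow.compute` API, and the table is
invented for illustration)::

    import pyarrow as pa
    import pyarrow.compute as pc

    # table["years"] is a ChunkedArray rather than a plain Array
    table = pa.table({"years": [1990, 2000, 1995, 2000, 1995]})

    # compute kernels accept the chunked column directly
    print(pc.min_max(table["years"]))       # min/max across all chunks
    print(pc.value_counts(table["years"]))  # same call as on a plain Array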