From 66022a6a0e8cb611b78817f0437f564118812fb5 Mon Sep 17 00:00:00 2001
From: Fokko Driesprong
Date: Fri, 10 Nov 2023 21:00:31 +0100
Subject: [PATCH 1/2] Docs: Add section on pandas

---
 mkdocs/docs/api.md | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/mkdocs/docs/api.md b/mkdocs/docs/api.md
index d716a138a2..613c33e51b 100644
--- a/mkdocs/docs/api.md
+++ b/mkdocs/docs/api.md
@@ -318,7 +318,7 @@ In this case it is up to the engine itself to filter the file itself. Below, `to
 
 !!! note "Requirements"
 
-    This requires [PyArrow to be installed](index.md).
+    This requires [`pyarrow` to be installed](index.md).
 
 
 
@@ -346,6 +346,45 @@ tpep_dropoff_datetime: [[2021-04-01 00:47:59.000000,...,2021-05-01 00:14:47.0000
 
 This will only pull in the files that might contain matching rows.
 
+### Pandas
+
+
+
+!!! note "Requirements"
+    This requires [`pandas` to be installed](index.md).
+
+
+
+PyIceberg makes it easy to filter out data from a huge table and pull it into a Pandas dataframe locally. This will only fetch Parquet files that might contain matching data. This will reduce IO and therefore improve performance and reduce cost.
+
+```python
+table.scan(
+    row_filter="trip_distance >= 10.0",
+    selected_fields=("VendorID", "tpep_pickup_datetime", "tpep_dropoff_datetime"),
+).to_pandas()
+```
+
+This will return a Pandas dataframe:
+
+```
+        VendorID      tpep_pickup_datetime     tpep_dropoff_datetime
+0              2 2021-04-01 00:28:05+00:00 2021-04-01 00:47:59+00:00
+1              1 2021-04-01 00:39:01+00:00 2021-04-01 00:57:39+00:00
+2              2 2021-04-01 00:14:42+00:00 2021-04-01 00:42:59+00:00
+3              1 2021-04-01 00:17:17+00:00 2021-04-01 00:43:38+00:00
+4              1 2021-04-01 00:24:04+00:00 2021-04-01 00:56:20+00:00
+...          ...                       ...                       ...
+116976         2 2021-04-30 23:56:18+00:00 2021-05-01 00:29:13+00:00
+116977         2 2021-04-30 23:07:41+00:00 2021-04-30 23:37:18+00:00
+116978         2 2021-04-30 23:38:28+00:00 2021-05-01 00:12:04+00:00
+116979         2 2021-04-30 23:33:00+00:00 2021-04-30 23:59:00+00:00
+116980         2 2021-04-30 23:44:25+00:00 2021-05-01 00:14:47+00:00
+
+[116981 rows x 3 columns]
+```
+
+It is recommended to use Pandas 2 or later, because it stores the data in an [Apache Arrow backend](https://datapythonista.me/blog/pandas-20-and-the-arrow-revolution-part-i), which avoids copying the data.
+
 ### DuckDB
 
 

From aa8941316aaf25051c0e98e29ae6a6966ceda9cd Mon Sep 17 00:00:00 2001
From: Fokko Driesprong
Date: Tue, 14 Nov 2023 17:55:00 +0100
Subject: [PATCH 2/2] Update mkdocs/docs/api.md

---
 mkdocs/docs/api.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mkdocs/docs/api.md b/mkdocs/docs/api.md
index 613c33e51b..e2f726afe8 100644
--- a/mkdocs/docs/api.md
+++ b/mkdocs/docs/api.md
@@ -355,7 +355,7 @@
 
 
 
-PyIceberg makes it easy to filter out data from a huge table and pull it into a Pandas dataframe locally. This will only fetch Parquet files that might contain matching data. This will reduce IO and therefore improve performance and reduce cost.
+PyIceberg makes it easy to filter out data from a huge table and pull it into a Pandas dataframe locally. This will only fetch the relevant Parquet files for the query and apply the filter. This will reduce IO and therefore improve performance and reduce cost.
 
 ```python
 table.scan(
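A note on the section the patches above add: the object returned by `.to_pandas()` is a regular Pandas dataframe, so any further work happens locally with plain Pandas. Below is a minimal sketch of that; the dataframe here merely mimics the documented output schema with made-up values (it is not produced by a real Iceberg scan), and the `duration_min` column is a hypothetical derived field, not part of the table.

```python
import pandas as pd

# Illustrative stand-in for the result of table.scan(...).to_pandas();
# the schema mirrors the documented output, the values are made up.
df = pd.DataFrame(
    {
        "VendorID": [2, 1, 2],
        "tpep_pickup_datetime": pd.to_datetime(
            ["2021-04-01 00:28:05", "2021-04-01 00:39:01", "2021-04-01 00:14:42"],
            utc=True,
        ),
        "tpep_dropoff_datetime": pd.to_datetime(
            ["2021-04-01 00:47:59", "2021-04-01 00:57:39", "2021-04-01 00:42:59"],
            utc=True,
        ),
    }
)

# Filtering and column pruning already happened at scan time; anything
# further, like deriving a trip duration in minutes, is plain Pandas.
df["duration_min"] = (
    df["tpep_dropoff_datetime"] - df["tpep_pickup_datetime"]
).dt.total_seconds() / 60
```

Because `row_filter` and `selected_fields` were applied during the scan, the dataframe is already small when it reaches memory; only post-scan transformations like the one above run locally.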