Merged
Binary file modified data/bake_sale.xlsx
Binary file not shown.
36 changes: 20 additions & 16 deletions databases.ipynb
@@ -17,7 +17,7 @@
"\n",
"### Prerequisites\n",
"\n",
"You will need the **pandas**, **SQLModel**, and **ibis** packages for this chapter. You probably already have **pandas** installed; to install **SQLModel** and **ibis** respectively run `uv add sqlmodel` and `uv add ibis-framework` on your computer's command line. First, let's bring in some general packages and turn off verbose warnings."
"You will need the **polars**, **SQLModel**, and **ibis** packages for this chapter. You probably already have **polars** installed; to install **SQLModel** and **ibis** respectively run `uv add sqlmodel` and `uv add ibis-framework` on your computer's command line. First, let's bring in some general packages and turn off verbose warnings."
]
},
{
@@ -39,10 +39,9 @@
"metadata": {},
"source": [
"## Database Basics\n",
"\n",
"At the simplest level, you can think about a database as a collection of data frames, called **tables** in database terminology.\n",
"Like a **pandas** data frame, a database table is a collection of named columns, where every value in the column is the same type.\n",
"There are three high level differences between data frames and database tables:\n",
"At the simplest level, you can think about a database as a collection of data frames, called **tables** in database terminology. \n",
"Like a **Polars** DataFrame, a database table is a collection of named columns, where every value in a column shares the same data type. \n",
"There are three high-level differences between data frames and database tables:\n",
"\n",
"- Database tables are stored on disk (i.e., in a file) and can be arbitrarily large.\n",
"  Data frames are stored in memory, and are fundamentally limited (although that limit is still big enough for many problems). You can think about the difference between on disk and in memory as being like the difference between long-term and short-term memory (and you have much more limited capacity in the latter).\n",
@@ -68,7 +67,7 @@
"\n",
"- You'll always use a database interface that provides a connection to the database, for example Python's built-in **sqlite3** package\n",
"\n",
"- You'll also use a package that pushes and/or pulls data to/from the database, for example **pandas**\n",
"- You'll also use a package that pushes and/or pulls data to/from the database, for example **polars**\n",
"\n",
"The precise details of the connection vary a lot from DBMS to DBMS, so unfortunately we can't cover them all here. The initial setup will often take a little fiddling (and maybe some research) to get right, but you'll generally only need to do it once. We'll do the best we can to cover some basics here.\n",
"\n",
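The connection step described above can be sketched with Python's built-in **sqlite3** package. This is a minimal, self-contained illustration: the in-memory database and the tiny `track` table are stand-ins for whatever your real database file contains.

```python
import sqlite3

# ":memory:" is a stand-in for a real file path, e.g. "data/database.db"
con = sqlite3.connect(":memory:")
cur = con.cursor()

# Create and fill a tiny table so the query below returns something
cur.execute("CREATE TABLE track (trackid INTEGER, name TEXT)")
cur.executemany("INSERT INTO track VALUES (?, ?)", [(1, "Song A"), (2, "Song B")])

# Queries return rows as a list of tuples
rows = cur.execute("SELECT * FROM track").fetchall()
print(rows)
```

Swapping `":memory:"` for the path to a `.db` or `.sqlite` file is all that changes when the database lives on disk.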
@@ -112,7 +111,7 @@
"id": "2992b718",
"metadata": {},
"source": [
"Note that the output here is in the form a Python object called a tuple. If we wanted to put this into a **pandas** data frame, we can just pass it straight in:"
"Note that the output here is a list of Python tuples. If we want to convert this into a **Polars** DataFrame, we can pass it to `pl.DataFrame()`. When working with tuples, you may need to provide column names using the **schema** argument or specify **orient=\"row\"** so Polars correctly interprets the structure."
]
},
{
@@ -122,9 +121,11 @@
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import polars as pl\n",
"\n",
"df = pl.DataFrame(rows, orient=\"row\")  # pass schema=[...] to name the columns\n",
"\n",
"pd.DataFrame(rows)"
"df"
]
},
{
@@ -316,9 +317,9 @@
"source": [
"### Joins\n",
"\n",
"If you're familiar with joins in **pandas**, SQL joins are very similar. Let's see if we can join the 'album' and 'track' tables to find the *name* of the albums in the above query.\n",
"If you're familiar with joins in **polars**, SQL joins are very similar. Let's see if we can join the 'album' and 'track' tables to find the *name* of the albums in the above query.\n",
"\n",
"Note that as soon as we have the *same* column names in more than one table, we need to specify the table we are referring to when we use that column name. There are different options for joins (eg `INNER`, `LEFT`) that you can find out more about [here](https://en.wikipedia.org/wiki/Join_(SQL)).\n"
"In **polars**, you use the `df.join()` method, which defaults to an inner join. Note that if you have the same column names in both tables, Polars will often append a suffix (like `_right`) to the duplicate names to keep them distinct, unless you specify otherwise. There are different options for SQL joins (e.g. `INNER`, `LEFT`) that you can find out more about [here](https://en.wikipedia.org/wiki/Join_(SQL)).\n"
]
},
{
@@ -403,9 +404,9 @@
"id": "495f97e5",
"metadata": {},
"source": [
"## SQL with **pandas**\n",
"## SQL with **polars**\n",
"\n",
"**pandas** is well-equipped for working with SQL. We can simply push the query we just created straight through using its `read_sql()` function—but bear in mind we need to pass in the connection we created to the database too:"
"**polars** is well-equipped for working with SQL. We can simply push the query we just created straight through using its `read_database()` function—but bear in mind we need to pass in the connection we created to the database too:"
]
},
{
@@ -415,7 +416,10 @@
"metadata": {},
"outputs": [],
"source": [
"pd.read_sql(sql_join, con)"
"df = pl.read_database(\n",
" query=sql_join, # your SQL query (string)\n",
" connection=con, # your connection object (SQLAlchemy, psycopg2 cursor, etc.)\n",
")"
]
},
{
@@ -435,7 +439,7 @@
"source": [
"## SQL with **ibis**\n",
"\n",
"It's not exactly satisfactory to have to write out your SQL queries in text. What if we could create commands directly from **pandas** commands? You can't *quite* do that, but there's a package that gets you pretty close and it's called [**ibis**](https://ibis-project.org/). **ibis** is particularly useful when you are reading from a database and want to query it just like you would a **pandas** data frame.\n",
"It's not exactly satisfactory to have to write out your SQL queries in text. What if we could create commands directly from **polars** commands? You can't *quite* do that, but there's a package that gets you pretty close and it's called [**ibis**](https://ibis-project.org/). **ibis** is particularly useful when you are reading from a database and want to query it just like you would a **polars** data frame.\n",
"\n",
"**Ibis** can connect to local databases (e.g. a SQLite database), server-based databases (e.g. Postgres), or cloud-based databases (e.g. Google's BigQuery). The syntax to make a connection is, for example, `ibis.bigquery.connect`.\n",
"\n",
@@ -462,7 +466,7 @@
"id": "6dcd7d71",
"metadata": {},
"source": [
"Okay, now let's reproduce the following query: \"SELECT albumid, AVG(milliseconds)/1e3/60 FROM track GROUP BY albumid ORDER BY AVG(milliseconds) ASC LIMIT 5;\". We'll use a groupby, a mutate (which you can think of like **pandas**' assign statement), a sort, and then `limit()` to only show the first five entries."
"Okay, now let's reproduce the following query: \"SELECT albumid, AVG(milliseconds)/1e3/60 FROM track GROUP BY albumid ORDER BY AVG(milliseconds) ASC LIMIT 5;\". We'll use a `group_by`, a `mutate` (which you can think of as **polars**' `with_columns`), a sort, and then `limit()` to only show the first five entries."
]
},
{
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -6,6 +6,7 @@ readme = "README.md"
requires-python = ">=3.12.0,<3.13"
dependencies = [
"beautifulsoup4>=4.12.3",
"fastexcel>=0.19.0",
"graphviz>=0.20.3",
"ibis-framework[sqlite]>=9.5.0",
"ipykernel>=6.29.5",
@@ -36,6 +37,7 @@ dependencies = [
"toml>=0.10.2",
"watermark>=2.5.0",
"wbgapi>=1.0.14",
"xlsxwriter>=3.2.0",
"yfinance>=1.2.1",
]
