Merged
2 changes: 1 addition & 1 deletion boolean-data.ipynb
@@ -714,7 +714,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
"version": "3.12.13"
},
"toc-showtags": true
},
2 changes: 1 addition & 1 deletion categorical-data.ipynb
@@ -378,7 +378,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
"version": "3.12.13"
},
"toc-showtags": true
},
88 changes: 44 additions & 44 deletions command-line.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion communicate-plots.ipynb
@@ -1346,7 +1346,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
"version": "3.12.13"
},
"toc-showtags": true
},
2 changes: 1 addition & 1 deletion data-import.ipynb
@@ -414,7 +414,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
"version": "3.12.13"
},
"toc-showtags": true
},
2 changes: 1 addition & 1 deletion data-tidy.ipynb
@@ -485,7 +485,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
"version": "3.12.13"
},
"toc-showtags": true
},
2 changes: 1 addition & 1 deletion data-transform.ipynb
@@ -1104,7 +1104,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
"version": "3.12.13"
},
"toc-showtags": true
},
2 changes: 1 addition & 1 deletion data-visualise.ipynb
@@ -1119,7 +1119,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
"version": "3.12.13"
},
"toc-showtags": true
},
Binary file modified data/bake_sale.xlsx
Binary file not shown.
2 changes: 1 addition & 1 deletion databases.ipynb
@@ -797,7 +797,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
"version": "3.12.13"
},
"toc-showtags": true
},
2 changes: 1 addition & 1 deletion dates-and-times.ipynb
@@ -1103,7 +1103,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
"version": "3.12.13"
},
"toc-showtags": true
},
2 changes: 1 addition & 1 deletion exploratory-data-analysis.ipynb
@@ -1119,7 +1119,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
"version": "3.12.13"
},
"toc-showtags": true
},
2 changes: 1 addition & 1 deletion functions.ipynb
@@ -506,7 +506,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
"version": "3.12.13"
},
"toc-showtags": true
},
2 changes: 1 addition & 1 deletion introduction.ipynb
@@ -187,7 +187,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
"version": "3.12.13"
},
"toc-showtags": true
},
87 changes: 46 additions & 41 deletions iteration.ipynb
@@ -21,7 +21,7 @@
"\n",
"One tool for reducing duplication is functions, which reduce duplication by identifying repeated patterns of code and extracting them into independent pieces that can be easily reused and updated. Another tool for reducing duplication is *iteration*, which helps you when you need to do the same thing to multiple inputs: repeating the same operation on different columns, or on different datasets.\n",
"\n",
"In this chapter you'll learn about iteration in three ways: explicit iteration, using for loops and while loops; iteration via comprehensions (eg list comprehensions); and iteration for **pandas** data frames."
"In this chapter you'll learn about iteration in three ways: explicit iteration, using for loops and while loops; iteration via comprehensions (e.g. list comprehensions); and iteration for **polars** data frames."
]
},
{
@@ -51,7 +51,7 @@
"source": [
"### Prerequisites\n",
"\n",
"This chapter will use the **pandas** data analysis package."
"This chapter will use the **polars** data analysis package."
]
},
{
@@ -452,11 +452,13 @@
"id": "5ec0643e",
"metadata": {},
"source": [
"## Iteration with **pandas** Data Frames\n",
"## Iteration with **polars** Data Frames\n",
"\n",
"For loops, while loops, and comprehensions all work on **pandas** data frames, but they are generally a bad way to get things done because they are slow and not memory efficient. To aid cases where iteration is needed, **pandas** has built-in methods for iteration depending on what you need to do.\n",
"For loops, while loops, and comprehensions can be used with data frames, but in **polars** they are even more strongly discouraged than in **pandas**. **polars** is built on a columnar, vectorized, expression-based engine, so row-by-row iteration is slow and prevents the engine from optimizing the computation.\n",
"\n",
"The tools covered here overlap with what we've seen in @sec-data-transform, but we'll dig a little deeper into `with_columns()` and expressions, the **polars** counterparts of `assign()`, `apply()`, and `eval()` in **pandas**.\n",
"\n",
"Instead of iterating, **polars** encourages you to use expressions and lazy evaluation, which are much faster and more memory efficient.\n",
"\n"
]
},
@@ -480,9 +482,9 @@
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import polars as pl\n",
"\n",
"df = pd.DataFrame(np.random.normal(size=(6, 4)), columns=[\"a\", \"b\", \"c\", \"d\"])\n",
"df = pl.DataFrame(np.random.normal(size=(6, 4)), schema=[\"a\", \"b\", \"c\", \"d\"])\n",
"df"
]
},
@@ -491,7 +493,7 @@
"id": "0e826ad5",
"metadata": {},
"source": [
"**pandas** has many built-in functions that are already built to iterate over rows and columns; for example, to compute the median of rows or columns respectively:"
"**polars** has built-in expressions designed to operate over columns and rows. For example, to compute the median of each column and then of each row:"
]
},
{
@@ -501,7 +503,7 @@
"metadata": {},
"outputs": [],
"source": [
"df.median(axis=\"rows\") # can also use axis=1"
"df.select(pl.all().median())"
]
},
{
@@ -511,7 +513,7 @@
"metadata": {},
"outputs": [],
"source": [
"df.median(axis=\"columns\") # can also use axis=0"
"df.select(pl.concat_list(pl.all()).list.median().alias(\"row_median\"))"
]
},
{
@@ -535,7 +537,7 @@
"def add_five_slow(df):\n",
" for i in range(len(df)):\n",
" for j in range(len(df.columns)):\n",
" df.iloc[i, j] = df.iloc[i, j] + 5\n",
" df[i, j] = df[i, j] + 5\n",
"\n",
"\n",
"%timeit add_five_slow(df)"
@@ -546,7 +548,7 @@
"id": "8246132e",
"metadata": {},
"source": [
"But to do this, every individual cell must be accessed and operated on—so it is very slow, taking milliseconds. **pandas** has far faster ways of performing the same operation. For simple operations on data frames with consistent type, you can simply add five to the whole data frame:"
"But to do this, every individual cell must be accessed and operated on—so it is very slow, taking milliseconds. **polars** has far faster ways of performing the same operation. For simple operations on data frames with consistent type, you can simply add five to the whole data frame:"
]
},
{
@@ -572,9 +574,9 @@
"id": "7313616e",
"metadata": {},
"source": [
"This also works on a per column basis, so you can do `df[\"a\"] = df[\"a\"] + 5` and so on.\n",
"This also works on a per column basis, so you can do `df.with_columns(pl.col(\"a\") + 5)` and so on.\n",
"\n",
"These operations have equivalents using the `assign()` operator, which allows for *method chaining*; stringing multiple operations together. The `assign()` operator version of `df[\"new_a\"] = df[\"a\"] + 5` would be"
"Because `with_columns()` returns a new data frame, these operations support *method chaining*: stringing multiple operations together. To add a new column `new_a` holding `a` plus five and keep the result, assign the output back to `df`:"
]
},
{
@@ -584,17 +586,19 @@
"metadata": {},
"outputs": [],
"source": [
"df = df.assign(new_a=lambda x: x[\"a\"] + 5)"
"df = df.with_columns(new_a=pl.col(\"a\") + 5)"
]
},
{
"cell_type": "markdown",
"id": "76aec162",
"metadata": {},
"source": [
"### Apply\n",
"### Expressions (Polars' Alternative to apply)\n",
"\n",
"What happens if you have a more complicated operation you want to perform? In pandas, you might reach for `apply()`. In **polars**, you almost never need an equivalent because its expression API is incredibly expressive.\n",
"\n",
"What happens if you have a more complicated function you want to iterate over? This is where **pandas**' `apply()` comes in, and can be used with assignment. `apply()` can also be used across rows or columns. Like `assign()`, it can be combined with a lambda function and used with either the whole data frame or just a column (in which case no need to specify `axis=`)."
"Most \"complicated\" operations can be expressed directly using **polars'** built-in expressions:"
]
},
{
@@ -604,42 +608,32 @@
"metadata": {},
"outputs": [],
"source": [
"df.apply(lambda x: x[\"a\"] - x[\"new_a\"].mean() * x[\"c\"] / x[\"b\"], axis=1)"
"# Don't do this (slow, row-wise)\n",
"mean_new_a = df.select(pl.col(\"new_a\").mean()).item()\n",
"df.with_columns(\n",
" result=pl.struct([\"a\", \"b\", \"c\"]).map_elements(\n",
" lambda x: x[\"a\"] - mean_new_a * x[\"c\"] / x[\"b\"], return_dtype=pl.Float64\n",
" )\n",
")\n",
"\n",
"# Do this instead (fast, vectorized)\n",
"df.with_columns(result=pl.col(\"a\") - pl.col(\"new_a\").mean() * pl.col(\"c\") / pl.col(\"b\"))"
]
},
{
"cell_type": "markdown",
"id": "78b558f4",
"metadata": {},
"source": [
"Note that this is just an example: you could still do this entire operation without using apply! But you will sometimes find yourself with cases where you do need to use it.\n",
"\n",
"Apply also works with functions, including user-defined functions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "136d435d",
"metadata": {},
"outputs": [],
"source": [
"def complicated_function(x):\n",
" return x - x.mean()\n",
"\n",

"df = df.apply(complicated_function, axis=1)\n",
"df"
"The first approach works, but it evaluates the computation row by row using a Python lambda, which is slow and prevents **polars** from optimizing the query. The second approach uses native expressions, allowing **polars** to execute the computation in a fully vectorized and optimized manner."
]
},
{
"cell_type": "markdown",
"id": "171be2c9",
"metadata": {},
"source": [
"### Eval(uate)\n",
"\n",
"`eval()` evaluates a string describing operations on DataFrame columns to create new columns. It operates on columns only, not rows or elements. Here's an example:"
"In **polars**, there's no `eval()` — you use expressions directly instead:\n"
]
},
{
@@ -649,7 +643,7 @@
"metadata": {},
"outputs": [],
"source": [
"df[\"ratio\"] = df.eval(\"a / new_a\")\n",
"df = df.with_columns((pl.col(\"a\") / pl.col(\"new_a\")).alias(\"ratio\"))\n",
"df"
]
},
@@ -658,7 +652,18 @@
"id": "8b275b5b",
"metadata": {},
"source": [
"Evaluate can also be used to create new boolean columns using, for example, a string `\"a > 0.5\"` in the above example."
"You can also create boolean columns the same way:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f301c8cb",
"metadata": {},
"outputs": [],
"source": [
"df = df.with_columns((pl.col(\"a\") > 0.5).alias(\"a_gt_0.5\"))\n",
"df"
]
}
],
@@ -687,7 +692,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
"version": "3.12.13"
},
"toc-showtags": true
},
2 changes: 1 addition & 1 deletion joins.ipynb
@@ -255,7 +255,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
"version": "3.12.13"
},
"toc-showtags": true
},
2 changes: 1 addition & 1 deletion missing-values.ipynb
@@ -596,7 +596,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
"version": "3.12.13"
},
"toc-showtags": true
},
2 changes: 1 addition & 1 deletion numbers.ipynb
@@ -792,7 +792,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
"version": "3.12.13"
},
"toc-showtags": true
},
2 changes: 1 addition & 1 deletion prerequisites.ipynb
@@ -305,7 +305,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.0"
"version": "3.12.13"
},
"toc-showtags": true
},
2 changes: 1 addition & 1 deletion rectangling.ipynb
@@ -628,7 +628,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
"version": "3.12.13"
},
"toc-showtags": true
},
2 changes: 1 addition & 1 deletion regex.ipynb
@@ -233,7 +233,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
"version": "3.12.13"
}
},
"nbformat": 4,
2 changes: 1 addition & 1 deletion spreadsheets.ipynb
@@ -449,7 +449,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
"version": "3.12.13"
},
"toc-showtags": true
},
2 changes: 1 addition & 1 deletion strings.ipynb
@@ -1089,7 +1089,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
"version": "3.12.13"
},
"toc-showtags": true
},