diff --git a/notebooks/1_table_oriented.ipynb b/notebooks/1_table_oriented.ipynb
index 432f6c6..64a8dfe 100644
--- a/notebooks/1_table_oriented.ipynb
+++ b/notebooks/1_table_oriented.ipynb
@@ -134,7 +134,7 @@
"source": [
"A `DataFrame` is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns. It is similar to a spreadsheet, a SQL table or the `data.frame` in R. \n",
"\n",
- "- The table has 3 columns, each of them with a column label. The column labels are respectively `Name`, `Age` and `Sex`.\n",
+ "- The table above has 3 columns, each of them with a column label. The column labels are `Name`, `Age` and `Sex`, respectively.\n",
"- The column `Name` consists of textual data with each value a string, the column `Age` are numbers and the column `Sex` is textual data.\n",
"\n",
"In spreadsheet software, the table representation of our data would look very similar:\n",
@@ -142,6 +142,17 @@
""
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "
\n",
+ " \n",
+ "__Note__: You probably do not want to manually input the data of a DataFrame! In most situations, data stored in a file format are the starting point of an analysis. We will get to that later!\n",
+ "\n",
+ "
"
+ ]
+ },
{
"cell_type": "markdown",
"metadata": {},
@@ -199,7 +210,7 @@
"source": [
"\n",
" \n",
- "If you are familiar to Python :ref:`dictionaries
`, the selection of a single column is very similar to selection of dictionary values based on the key.\n",
+ "If you are familiar with Python :ref:`dictionaries `, the selection of a single column is very similar to the selection of dictionary values based on the key.\n",
"\n",
" "
]
@@ -287,7 +298,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Or to the `Series`:"
+ "Or on the `Series`:"
]
},
{
@@ -314,7 +325,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "As illustrated by the `max()` method, you can _do_ things with a `DataFrame` or `Series`. Pandas provides a lot of functionalities each of them a _method_ you can apply to a `DataFrame` or `Series`. As methods are functions, do not forget to use parentheses `()`."
+ "As illustrated by the `max()` method, you can _do_ things with a `DataFrame` or `Series`. Pandas provides a lot of functionality for working with `DataFrame` or `Series`, often defined as methods on those objects. As methods are functions, do not forget to use parentheses `()`."
]
},
{
@@ -415,7 +426,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "The `describe` method provides quick overview of the numerical data in a `DataFrame`. As the `Name` and `Sex` columns are textual data, these are by default not taken into account by the `describe` method. Many pandas operations return a `DataFrame` or a `Series`. The `describe` method is an example of a pandas operation returning a pandas `Series`.\n",
+ "The `describe` method provides a quick overview of the numerical data in a `DataFrame`. As the `Name` and `Sex` columns are textual data, these are by default not taken into account by the `describe` method. Many pandas operations return a `DataFrame` or a `Series`. The `describe` method is an example of a pandas operation returning a pandas `DataFrame`.\n",
"\n",
"\n",
"__To user guide:__ check more options on `describe` :ref:`basics.describe`"
@@ -438,10 +449,10 @@
"source": [
"## REMEMBER\n",
"\n",
- "- Import the package, aka `import Pandas as pd`\n",
+ "- Import the package, aka `import pandas as pd`\n",
"- A table of data is stored as a pandas `DataFrame`\n",
"- Each column in a `DataFrame` is a `Series`\n",
- "- You can do things by applying a method to a `DataFrame` or `Series`"
+ "- You can do things by calling a method on a `DataFrame` or `Series`"
]
},
{
@@ -472,5 +483,5 @@
}
},
"nbformat": 4,
- "nbformat_minor": 2
+ "nbformat_minor": 4
}
diff --git a/notebooks/2_read_write.ipynb b/notebooks/2_read_write.ipynb
index e21c689..e693e7b 100644
--- a/notebooks/2_read_write.ipynb
+++ b/notebooks/2_read_write.ipynb
@@ -17,14 +17,14 @@
" \n",
"This tutorial uses the titanic data set, stored as CSV. The data consists of the following data columns:\n",
"\n",
- "- PassengerId: Id of every passenger.\n",
- "- Survived: This feature have value 0 and 1. 0 for not survived and 1 for survived.\n",
+ "- PassengerId: ID of every passenger.\n",
+ "- Survived: This feature has the values 0 and 1: 0 for not survived and 1 for survived.\n",
"- Pclass: There are 3 classes: Class 1, Class 2 and Class 3.\n",
"- Name: Name of passenger.\n",
"- Sex: Gender of passenger.\n",
"- Age: Age of passenger.\n",
"- SibSp: Indication that passenger have siblings and spouse.\n",
- "- Parch: Whether a passenger is alone or have family.\n",
+ "- Parch: Whether a passenger is alone or has family.\n",
"- Ticket: Ticket number of passenger.\n",
"- Fare: Indicating the fare.\n",
"- Cabin: The cabin of passenger.\n",
@@ -561,7 +561,7 @@
"source": [
"\n",
" \n",
- "__Note__: Interested in the last N rows instead? Pandas also provides a `tail` method. For example, `titanic.tail(10)` will return the last 10 rows of the DataFrame.\n",
+ "__Note__: Interested in the last N rows instead? Pandas also provides a `tail()` method. For example, `titanic.tail(10)` will return the last 10 rows of the DataFrame.\n",
"\n",
"
"
]
@@ -570,7 +570,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "A check on how Pandas interpreted each of the column data types can be done by requesting the Pandas `dtypes` attribute:"
+ "A check on how Pandas interpreted each of the column data types can be done by requesting the `dtypes` attribute:"
]
},
{
@@ -643,7 +643,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Whereas `read_*` fucntions are used to read data to Pandas, the `to_*` methods are used to store data. The `to_excel` method stores the data as an excel file. In the example here, the `sheet_name` is named _passengers_ instead of the default _Sheet1_. By setting `index=False` the row index labels are not saved in the spreadsheet."
+ "Whereas `read_*` functions are used to read data into Pandas, the `to_*` methods are used to store data. The `to_excel` method stores the data as an Excel file. In the example here, the `sheet_name` is set to _passengers_ instead of the default _Sheet1_. By setting `index=False`, the row index labels are not saved in the spreadsheet."
]
},
{
@@ -908,5 +908,5 @@
}
},
"nbformat": 4,
- "nbformat_minor": 2
+ "nbformat_minor": 4
}
diff --git a/notebooks/3_subset_data.ipynb b/notebooks/3_subset_data.ipynb
index 03d1524..6c514bc 100644
--- a/notebooks/3_subset_data.ipynb
+++ b/notebooks/3_subset_data.ipynb
@@ -292,7 +292,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "`shape` is an attribute (remember [previous tutorial](./2_read_write.ipynb), no parantheses for attributes) of a pandas `Series` and `DataFrame` containing the number of rows and columns: _(nrows, ncolumns)_. A pandas Series is 1-dimensional and only the number of rows is returned."
+ "`shape` is an attribute (remember [previous tutorial](./2_read_write.ipynb), no parentheses for attributes) of a pandas `Series` and `DataFrame` containing the number of rows and columns: _(nrows, ncolumns)_. A pandas Series is 1-dimensional and only the number of rows is returned."
]
},
{
@@ -389,7 +389,12 @@
"\n",
"\n",
" \n",
- "__Note:__ The inner square brackets define a :ref:`Python list
` with column names, whereas the outer brackets are used to select the data from a pandas `DataFrame` as seen in the previous example.\n",
+ "__Note:__ The inner square brackets define a :ref:`Python list ` with column names, whereas the outer brackets are used to select the data from a pandas `DataFrame`. The previous example can therefore also be written as:\n",
+ "\n",
+ "```python\n",
+ "columns_to_select = [\"Age\", \"Sex\"]\n",
+ "titanic[columns_to_select]\n",
+ "```\n",
"\n",
" "
]
@@ -1020,7 +1025,7 @@
"source": [
"\n",
" \n",
- "__Note:__ When combining multiple conditional statements, each condition must be surrounded by parentheses `()`. Moreover, you can not use `or`/`and` but need to use the `or` operator `|` and the `and` operator `&`.\n",
+ "__Note:__ When combining multiple conditional statements, each condition must be surrounded by parentheses `()`. Moreover, you cannot use `or`/`and` but need to use the \"or\" operator `|` and the \"and\" operator `&`.\n",
"\n",
"
"
]
@@ -1674,5 +1679,5 @@
}
},
"nbformat": 4,
- "nbformat_minor": 2
+ "nbformat_minor": 4
}
diff --git a/notebooks/4_plotting.ipynb b/notebooks/4_plotting.ipynb
index 741cdb5..5b235a3 100644
--- a/notebooks/4_plotting.ipynb
+++ b/notebooks/4_plotting.ipynb
@@ -493,5 +493,5 @@
}
},
"nbformat": 4,
- "nbformat_minor": 2
+ "nbformat_minor": 4
}