diff --git a/source/week-1/print/escape-sequences.py b/source/week-1/print/escape-sequences.py
new file mode 100644
index 0000000..6a0935f
--- /dev/null
+++ b/source/week-1/print/escape-sequences.py
@@ -0,0 +1,11 @@
+# \\: backslash
+print("A backslash looks like this \\ ")
+
+# \b: backspace
+print("Hide the s in this\b ")
+
+# \t: tab
+print("Name:\tMark")
+
+# \n: newline
+print("Line 1\nLine 2")
diff --git a/source/week-4/csv-files-jupyter/README.md b/source/week-4/csv-files-jupyter/README.md
deleted file mode 100644
index 7ec63e7..0000000
--- a/source/week-4/csv-files-jupyter/README.md
+++ /dev/null
@@ -1,9 +0,0 @@
-# CSV Files and Jupyter Notebooks
-
-CSV files are comma separated variable file. CSV files are frequently used to store data. In order to access the data in a CSV file from a Jupyter Notebook you must upload the file.
-
-## Microsoft Learn Resources
-
-Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).
-
-- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner)
diff --git a/source/week-4/intro-to-pandas/03 - Pandas Series and DataFrame.ipynb b/source/week-4/intro-to-pandas/03 - Pandas Series and DataFrame.ipynb
index e1802f1..0c84909 100644
--- a/source/week-4/intro-to-pandas/03 - Pandas Series and DataFrame.ipynb
+++ b/source/week-4/intro-to-pandas/03 - Pandas Series and DataFrame.ipynb
@@ -1,35 +1,53 @@
{
"cells": [
{
- "cell_type": "markdown",
- "metadata": {},
"source": [
- "# pandas Series and DataFrame"
- ]
+ "# Pandas Series and DataFrame"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "## pandas\n",
- "**pandas** is an open source library providing data structures and data analysis tools for Python programmers"
+ "**Pandas** is an open source library providing data structures and data analysis tools for Python programmers. \n",
+ "The pandas **Series** is a one dimensional array, similar to a Python list"
]
},
{
"cell_type": "code",
- "execution_count": 2,
+ "execution_count": 1,
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Defaulting to user installation because normal site-packages is not writeable\n",
+ "Requirement already satisfied: pandas in /home/tim/.local/lib/python3.8/site-packages (1.1.4)\n",
+ "Requirement already satisfied: pytz>=2017.2 in /usr/lib/python3/dist-packages (from pandas) (2019.3)\n",
+ "Requirement already satisfied: python-dateutil>=2.7.3 in /home/tim/.local/lib/python3.8/site-packages (from pandas) (2.8.1)\n",
+ "Requirement already satisfied: numpy>=1.15.4 in /home/tim/.local/lib/python3.8/site-packages (from pandas) (1.19.4)\n",
+ "Requirement already satisfied: six>=1.5 in /home/tim/.local/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0)\n",
+ "\u001b[33mWARNING: You are using pip version 20.3.2; however, version 20.3.3 is available.\n",
+ "You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.\u001b[0m\n"
+ ]
+ }
+ ],
"source": [
- "import pandas as pd"
+ "# install pandas\n",
+ "! pip install pandas"
]
},
{
- "cell_type": "markdown",
+ "cell_type": "code",
+ "execution_count": 2,
"metadata": {},
+ "outputs": [],
"source": [
- "## Series\n",
- "The pandas **Series** is a one dimensional array, similar to a Python list"
+ "# load pandas into notebook\n",
+ "import pandas as pd"
]
},
{
@@ -38,6 +56,7 @@
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
"text/plain": [
"0 Seattle-Tacoma\n",
@@ -50,9 +69,8 @@
"dtype: object"
]
},
- "execution_count": 3,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 3
}
],
"source": [
@@ -72,60 +90,42 @@
"airports"
]
},
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "You can reference an individual value in a Series using it's index"
- ]
- },
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
"text/plain": [
"'London Heathrow'"
]
},
- "execution_count": 4,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 4
}
],
"source": [
+ "# You can reference an individual value in a Series using it's index\n",
"airports[2]"
]
},
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "You can use a loop to iterate through all the values in a Series"
- ]
- },
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
- "name": "stdout",
"output_type": "stream",
+ "name": "stdout",
"text": [
- "Seattle-Tacoma\n",
- "Dulles\n",
- "London Heathrow\n",
- "Schiphol\n",
- "Changi\n",
- "Pearson\n",
- "Narita\n"
+ "Seattle-Tacoma\nDulles\nLondon Heathrow\nSchiphol\nChangi\nPearson\nNarita\n"
]
}
],
"source": [
+ "# You can use a loop to iterate through all the values in a Series\n",
"for value in airports:\n",
" print(value) "
]
@@ -135,9 +135,9 @@
"metadata": {},
"source": [
"## DataFrame\n",
- "Most of the time when we are working with pandas we are dealing with two-dimensional arrays\n",
"\n",
- "The pandas **DataFrame** can store two dimensional arrays"
+ "Most of the time when we are working with pandas we are dealing with two-dimensional arrays. \n",
+ "The pandas **DataFrame** can store two dimensional arrays. "
]
},
{
@@ -146,78 +146,8 @@
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
- "text/html": [
- "
"
- ],
"text/plain": [
" Name City Country\n",
"0 Seatte-Tacoma Seattle USA\n",
@@ -342,14 +195,15 @@
"4 Changi Singapore Singapore\n",
"5 Pearson Toronto Canada\n",
"6 Narita Tokyo Japan"
- ]
+ ],
+ "text/html": "
\n\n
\n \n
\n
\n
Name
\n
City
\n
Country
\n
\n \n \n
\n
0
\n
Seatte-Tacoma
\n
Seattle
\n
USA
\n
\n
\n
1
\n
Dulles
\n
Washington
\n
USA
\n
\n
\n
2
\n
London Heathrow
\n
London
\n
United Kingdom
\n
\n
\n
3
\n
Schiphol
\n
Amsterdam
\n
Netherlands
\n
\n
\n
4
\n
Changi
\n
Singapore
\n
Singapore
\n
\n
\n
5
\n
Pearson
\n
Toronto
\n
Canada
\n
\n
\n
6
\n
Narita
\n
Tokyo
\n
Japan
\n
\n \n
\n
"
},
- "execution_count": 7,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 7
}
],
"source": [
+ "# Use the **columns** parameter to specify names for the columns when you create the DataFrame\n",
"airports = pd.DataFrame([\n",
" ['Seatte-Tacoma', 'Seattle', 'USA'],\n",
" ['Dulles', 'Washington', 'USA'],\n",
@@ -382,9 +236,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.6.9"
+ "version": "3.8.5-final"
}
},
"nbformat": 4,
"nbformat_minor": 2
-}
+}
\ No newline at end of file
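For readers skimming the notebook diff above, the reordered cells boil down to the following flow. This is a minimal standalone sketch of the same Series/DataFrame ideas; the airport values mirror the notebook, the variable names are illustrative:

```python
import pandas as pd

# A Series is a one-dimensional array, similar to a Python list
airports = pd.Series(['Seattle-Tacoma', 'Dulles', 'London Heathrow'])

print(airports[2])        # reference an individual value by its index -> 'London Heathrow'
for value in airports:    # iterate through all the values in the Series
    print(value)

# A DataFrame stores two-dimensional data; columns= names the columns
airports_df = pd.DataFrame(
    [['Seattle-Tacoma', 'Seattle', 'USA'],
     ['Dulles', 'Washington', 'USA']],
    columns=['Name', 'City', 'Country'])
print(airports_df)
```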
diff --git a/source/week-4/intro-to-pandas/README.md b/source/week-4/intro-to-pandas/README.md
deleted file mode 100644
index cee037f..0000000
--- a/source/week-4/intro-to-pandas/README.md
+++ /dev/null
@@ -1,14 +0,0 @@
-# pandas
-
-[pandas](https://pandas/pydata.org​) is an open source Python library contains a number of high performance data structures and tools for data analysis.
-
-## Documentation
-
-- [Series](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html) stores one dimensional arrays
-- [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html) stores two dimensional arrays and can contain different datatypes
-
-## Microsoft Learn Resources
-
-Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).
-
-- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner)
diff --git a/source/week-4/jupyter-notebooks/README.md b/source/week-4/jupyter-notebooks/README.md
deleted file mode 100644
index 363489c..0000000
--- a/source/week-4/jupyter-notebooks/README.md
+++ /dev/null
@@ -1,18 +0,0 @@
-# Jupyter Notebooks
-
-Jupyter Notebooks are an open source web application that allows you to create and share Python code. They are frequently used for data science. The code samples in this course are completed using Jupyter Notebooks which have a .ipynb file extension.
-
-## Documentation
-
-- [Jupyter](https://jupyter.org/) to install Jupyter so you can run Jupyter Notebooks locally on your computer
-- [Jupyter Notebook viewer](https://nbviewer.jupyter.org/) to view Jupyter Notebooks in this GitHub repository without installing Jupyter
-- [Azure Notebooks](https://notebooks.azure.com/) to create a free Azure Notebooks account to run Notebooks in the cloud
-- [Create and run a notebook](https://docs.microsoft.com/azure/notebooks/tutorial-create-run-jupyter-notebook?WT.mc_id=python-c9-niner) is a tutorial that walks you through the process of using Azure Notebooks to create a complete Jupyter Notebook that demonstrates linear regression
-- [How to create and clone projects](https://docs.microsoft.com/azure/notebooks/create-clone-jupyter-notebooks?WT.mc_id=python-c9-niner) to create a project
-- [Manage and configure projects in Azure Notebooks](https://docs.microsoft.com/azure/notebooks/configure-manage-azure-notebooks-projects?WT.mc_id=python-c9-niner) to upload Notebooks to your project
-
-## Microsoft Learn Resources
-
-Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).
-
-- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner)
diff --git a/source/week-4/panda-dataframe-content/04 - Exploring pandas DataFrame contents.ipynb b/source/week-4/panda-dataframe-content/04 - Exploring pandas DataFrame contents.ipynb
index 0f50838..7900e35 100644
--- a/source/week-4/panda-dataframe-content/04 - Exploring pandas DataFrame contents.ipynb
+++ b/source/week-4/panda-dataframe-content/04 - Exploring pandas DataFrame contents.ipynb
@@ -5,8 +5,8 @@
"metadata": {},
"source": [
"# Examining pandas DataFrame contents\n",
- "It's useful to be able to quickly examine the contents of a DataFrame. \n",
"\n",
+ "It's useful to be able to quickly examine the contents of a DataFrame. \n",
"Let's start by importing the pandas library and creating a DataFrame populated with information about airports"
]
},
@@ -25,78 +25,8 @@
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
- "text/html": [
- "
\n",
- "\n",
- "
\n",
- " \n",
- "
\n",
- "
\n",
- "
Name
\n",
- "
City
\n",
- "
Country
\n",
- "
\n",
- " \n",
- " \n",
- "
\n",
- "
0
\n",
- "
Seatte-Tacoma
\n",
- "
Seattle
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
1
\n",
- "
Dulles
\n",
- "
Washington
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
2
\n",
- "
Heathrow
\n",
- "
London
\n",
- "
United Kingdom
\n",
- "
\n",
- "
\n",
- "
3
\n",
- "
Schiphol
\n",
- "
Amsterdam
\n",
- "
Netherlands
\n",
- "
\n",
- "
\n",
- "
4
\n",
- "
Changi
\n",
- "
Singapore
\n",
- "
Singapore
\n",
- "
\n",
- "
\n",
- "
5
\n",
- "
Pearson
\n",
- "
Toronto
\n",
- "
Canada
\n",
- "
\n",
- "
\n",
- "
6
\n",
- "
Narita
\n",
- "
Tokyo
\n",
- "
Japan
\n",
- "
\n",
- " \n",
- "
\n",
- "
"
- ],
"text/plain": [
" Name City Country\n",
"0 Seatte-Tacoma Seattle USA\n",
@@ -106,11 +36,11 @@
"4 Changi Singapore Singapore\n",
"5 Pearson Toronto Canada\n",
"6 Narita Tokyo Japan"
- ]
+ ],
+ "text/html": "
\n\n
\n \n
\n
\n
Name
\n
City
\n
Country
\n
\n \n \n
\n
0
\n
Seatte-Tacoma
\n
Seattle
\n
USA
\n
\n
\n
1
\n
Dulles
\n
Washington
\n
USA
\n
\n
\n
2
\n
Heathrow
\n
London
\n
United Kingdom
\n
\n
\n
3
\n
Schiphol
\n
Amsterdam
\n
Netherlands
\n
\n
\n
4
\n
Changi
\n
Singapore
\n
Singapore
\n
\n
\n
5
\n
Pearson
\n
Toronto
\n
Canada
\n
\n
\n
6
\n
Narita
\n
Tokyo
\n
Japan
\n
\n \n
\n
"
},
- "execution_count": 2,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 2
}
],
"source": [
@@ -134,9 +64,10 @@
"metadata": {},
"source": [
"## Returning first *n* rows\n",
- "If you have thousands of rows, you might just want to look at the first few rows\n",
"\n",
- "* **head**(*n*) returns the top *n* rows "
+ "If you have thousands of rows, you might just want to look at the first few rows\n",
+ "- **head**(*n*) returns the top *n* rows\n",
+ "- by default *i* is 5"
]
},
{
@@ -145,68 +76,24 @@
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
- "text/html": [
- "
\n",
- "\n",
- "
\n",
- " \n",
- "
\n",
- "
\n",
- "
Name
\n",
- "
City
\n",
- "
Country
\n",
- "
\n",
- " \n",
- " \n",
- "
\n",
- "
0
\n",
- "
Seatte-Tacoma
\n",
- "
Seattle
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
1
\n",
- "
Dulles
\n",
- "
Washington
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
2
\n",
- "
Heathrow
\n",
- "
London
\n",
- "
United Kingdom
\n",
- "
\n",
- " \n",
- "
\n",
- "
"
- ],
"text/plain": [
" Name City Country\n",
"0 Seatte-Tacoma Seattle USA\n",
"1 Dulles Washington USA\n",
- "2 Heathrow London United Kingdom"
- ]
+ "2 Heathrow London United Kingdom\n",
+ "3 Schiphol Amsterdam Netherlands\n",
+ "4 Changi Singapore Singapore"
+ ],
+ "text/html": "
\n\n
\n \n
\n
\n
Name
\n
City
\n
Country
\n
\n \n \n
\n
0
\n
Seatte-Tacoma
\n
Seattle
\n
USA
\n
\n
\n
1
\n
Dulles
\n
Washington
\n
USA
\n
\n
\n
2
\n
Heathrow
\n
London
\n
United Kingdom
\n
\n
\n
3
\n
Schiphol
\n
Amsterdam
\n
Netherlands
\n
\n
\n
4
\n
Changi
\n
Singapore
\n
Singapore
\n
\n \n
\n
"
},
- "execution_count": 3,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 3
}
],
"source": [
- "airports.head(3)"
+ "airports.head()"
]
},
{
@@ -214,8 +101,10 @@
"metadata": {},
"source": [
"## Returning last *n* rows\n",
+ "\n",
"Looking at the last rows in a DataFrame can be a good way to check that all your data loaded correctly\n",
- "* **tail**(*n*) returns the last *n* rows"
+ "- **tail**(*n*) returns the last *n* rows\n",
+ "- by default i is 5"
]
},
{
@@ -224,68 +113,24 @@
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
- "text/html": [
- "
\n",
- "\n",
- "
\n",
- " \n",
- "
\n",
- "
\n",
- "
Name
\n",
- "
City
\n",
- "
Country
\n",
- "
\n",
- " \n",
- " \n",
- "
\n",
- "
4
\n",
- "
Changi
\n",
- "
Singapore
\n",
- "
Singapore
\n",
- "
\n",
- "
\n",
- "
5
\n",
- "
Pearson
\n",
- "
Toronto
\n",
- "
Canada
\n",
- "
\n",
- "
\n",
- "
6
\n",
- "
Narita
\n",
- "
Tokyo
\n",
- "
Japan
\n",
- "
\n",
- " \n",
- "
\n",
- "
"
- ],
"text/plain": [
- " Name City Country\n",
- "4 Changi Singapore Singapore\n",
- "5 Pearson Toronto Canada\n",
- "6 Narita Tokyo Japan"
- ]
+ " Name City Country\n",
+ "2 Heathrow London United Kingdom\n",
+ "3 Schiphol Amsterdam Netherlands\n",
+ "4 Changi Singapore Singapore\n",
+ "5 Pearson Toronto Canada\n",
+ "6 Narita Tokyo Japan"
+ ],
+ "text/html": "
\n\n
\n \n
\n
\n
Name
\n
City
\n
Country
\n
\n \n \n
\n
2
\n
Heathrow
\n
London
\n
United Kingdom
\n
\n
\n
3
\n
Schiphol
\n
Amsterdam
\n
Netherlands
\n
\n
\n
4
\n
Changi
\n
Singapore
\n
Singapore
\n
\n
\n
5
\n
Pearson
\n
Toronto
\n
Canada
\n
\n
\n
6
\n
Narita
\n
Tokyo
\n
Japan
\n
\n \n
\n
"
},
- "execution_count": 4,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 4
}
],
"source": [
- "airports.tail(3)"
+ "airports.tail()"
]
},
{
@@ -293,9 +138,9 @@
"metadata": {},
"source": [
"## Checkign number of rows and columns in DataFrame\n",
- "Sometimes you just need to know how much data you have in the DataFrame\n",
"\n",
- "* **shape** returns the number of rows and columns"
+ "Sometimes you just need to know how much data you have in the DataFrame\n",
+ "- **shape** returns the number of rows and columns"
]
},
{
@@ -304,14 +149,14 @@
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
"text/plain": [
"(7, 3)"
]
},
- "execution_count": 5,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 5
}
],
"source": [
@@ -322,14 +167,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Getting mroe detailed information about DataFrame contents\n",
- "\n",
- "* **info**() returns more detailed information about the DataFrame\n",
+ "## Getting detailed information about DataFrame contents\n",
"\n",
+ "**DataFrame.info**() returns more detailed information about the DataFrame \n",
"Information returned includes:\n",
- "* The number of rows, and the range of index values\n",
- "* The number of columns\n",
- "* For each column: column name, number of non-null values, the datatype\n"
+ "- The number of rows, and the range of index values\n",
+ "- The number of columns\n",
+ "- For each column: column name, number of non-null values, the datatype\n"
]
},
{
@@ -338,23 +182,55 @@
"metadata": {},
"outputs": [
{
- "name": "stdout",
"output_type": "stream",
+ "name": "stdout",
"text": [
- "\n",
- "RangeIndex: 7 entries, 0 to 6\n",
- "Data columns (total 3 columns):\n",
- "Name 7 non-null object\n",
- "City 7 non-null object\n",
- "Country 7 non-null object\n",
- "dtypes: object(3)\n",
- "memory usage: 148.0+ bytes\n"
+ "\nRangeIndex: 7 entries, 0 to 6\nData columns (total 3 columns):\n # Column Non-Null Count Dtype \n--- ------ -------------- ----- \n 0 Name 7 non-null object\n 1 City 7 non-null object\n 2 Country 7 non-null object\ndtypes: object(3)\nmemory usage: 296.0+ bytes\n"
]
}
],
"source": [
"airports.info()"
]
+ },
+ {
+ "source": [
+ "**DataFrame.describe()** returns statistical analyses about the DataFrame \n",
+ "Information returned might include:\n",
+ "- Count number of non-NA/null observations.\n",
+ "- Mean and Standard Deviation\n",
+ "- Minimum and Maximum values buy column\n",
+ "- Percentiles (25%, 50%, 75%)\n",
+ "\n",
+ "and many other values according to the DataFrame.\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " Name City Country\n",
+ "count 7 7 7\n",
+ "unique 7 7 6\n",
+ "top Changi Amsterdam USA\n",
+ "freq 1 1 2"
+ ],
+ "text/html": "
\n\n
\n \n
\n
\n
Name
\n
City
\n
Country
\n
\n \n \n
\n
count
\n
7
\n
7
\n
7
\n
\n
\n
unique
\n
7
\n
7
\n
6
\n
\n
\n
top
\n
Changi
\n
Amsterdam
\n
USA
\n
\n
\n
freq
\n
1
\n
1
\n
2
\n
\n \n
\n
"
+ },
+ "metadata": {},
+ "execution_count": 7
+ }
+ ],
+ "source": [
+ "airports.describe()"
+ ]
}
],
"metadata": {
@@ -373,9 +249,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.6.9"
+ "version": "3.8.5-final"
}
},
"nbformat": 4,
"nbformat_minor": 2
-}
+}
\ No newline at end of file
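The notebook above exercises head, tail, shape, info, and the newly added describe cell. A minimal sketch of the same calls as a plain script, assuming the same small airports DataFrame:

```python
import pandas as pd

airports = pd.DataFrame(
    [['Seatte-Tacoma', 'Seattle', 'USA'],
     ['Dulles', 'Washington', 'USA'],
     ['Heathrow', 'London', 'United Kingdom'],
     ['Schiphol', 'Amsterdam', 'Netherlands'],
     ['Changi', 'Singapore', 'Singapore'],
     ['Pearson', 'Toronto', 'Canada'],
     ['Narita', 'Tokyo', 'Japan']],
    columns=['Name', 'City', 'Country'])

print(airports.head())      # first n rows; n defaults to 5
print(airports.tail(3))     # last 3 rows
print(airports.shape)       # (rows, columns) -> (7, 3)
airports.info()             # index range, column names, non-null counts, dtypes
print(airports.describe())  # count/unique/top/freq for these object (string) columns
```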
diff --git a/source/week-4/panda-dataframe-content/README.md b/source/week-4/panda-dataframe-content/README.md
deleted file mode 100644
index aa35bb3..0000000
--- a/source/week-4/panda-dataframe-content/README.md
+++ /dev/null
@@ -1,10 +0,0 @@
-# Examining pandas DataFrame contents
-
-The pandas [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) is a structure for storing two-dimensional tabular data.
-
-## Common functions
-
-- [head](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html) returns the first *n* rows from the DataFrame
-- [tail](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html) returns the last *n* rows from the DataFrame
-- [shape](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shape.html) returns the dimensions of the DataFrame (e.g. number of rows and columns)
-- [info](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html) provides a summary of the DataFrame content including column names, their datatypes, and number of rows containing non-null values
diff --git a/source/week-4/panda-dataframe-querry/05 - Querying DataFrames.ipynb b/source/week-4/panda-dataframe-querry/05 - Querying DataFrames.ipynb
index 95e8021..0124d51 100644
--- a/source/week-4/panda-dataframe-querry/05 - Querying DataFrames.ipynb
+++ b/source/week-4/panda-dataframe-querry/05 - Querying DataFrames.ipynb
@@ -1,15 +1,14 @@
{
"cells": [
{
- "cell_type": "markdown",
- "metadata": {},
"source": [
"# Query a pandas DataFrame \n",
"\n",
- "Returning a portion of the data in a DataFrame is called slicing or dicing the data\n",
- "\n",
+ "Returning a portion of the data in a DataFrame is called *slicing* or *dicing* the data. \n",
"There are many different ways to query a pandas DataFrame, here are a few to get you started"
- ]
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
},
{
"cell_type": "code",
@@ -22,82 +21,12 @@
},
{
"cell_type": "code",
- "execution_count": 3,
+ "execution_count": 2,
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
- "text/html": [
- "
\n",
- "\n",
- "
\n",
- " \n",
- "
\n",
- "
\n",
- "
Name
\n",
- "
City
\n",
- "
Country
\n",
- "
\n",
- " \n",
- " \n",
- "
\n",
- "
0
\n",
- "
Seatte-Tacoma
\n",
- "
Seattle
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
1
\n",
- "
Dulles
\n",
- "
Washington
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
2
\n",
- "
London Heathrow
\n",
- "
London
\n",
- "
United Kingdom
\n",
- "
\n",
- "
\n",
- "
3
\n",
- "
Schiphol
\n",
- "
Amsterdam
\n",
- "
Netherlands
\n",
- "
\n",
- "
\n",
- "
4
\n",
- "
Changi
\n",
- "
Singapore
\n",
- "
Singapore
\n",
- "
\n",
- "
\n",
- "
5
\n",
- "
Pearson
\n",
- "
Toronto
\n",
- "
Canada
\n",
- "
\n",
- "
\n",
- "
6
\n",
- "
Narita
\n",
- "
Tokyo
\n",
- "
Japan
\n",
- "
\n",
- " \n",
- "
\n",
- "
"
- ],
"text/plain": [
" Name City Country\n",
"0 Seatte-Tacoma Seattle USA\n",
@@ -107,11 +36,11 @@
"4 Changi Singapore Singapore\n",
"5 Pearson Toronto Canada\n",
"6 Narita Tokyo Japan"
- ]
+ ],
+ "text/html": "
"
},
- "execution_count": 5,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 4
}
],
"source": [
@@ -268,33 +137,28 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Using *iloc* to specify rows and columns to return\n",
- "**iloc**[*rows*,*columns*] allows you to access a group of rows or columns by row and column index positions."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "You specify the specific row and column you want returned\n",
- "* First row is row 0\n",
- "* First column is column 0"
+ "## Using *iloc*\n",
+ "\n",
+ "`iloc[row, column]` allows you to access a group of rows or columns by row and column index positions. \n",
+ "You specify the specific row and column you want returned:\n",
+ "- First row is row 0\n",
+ "- First column is column 0"
]
},
{
"cell_type": "code",
- "execution_count": 7,
+ "execution_count": 5,
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
"text/plain": [
"'Seatte-Tacoma'"
]
},
- "execution_count": 7,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 5
}
],
"source": [
@@ -304,18 +168,18 @@
},
{
"cell_type": "code",
- "execution_count": 8,
+ "execution_count": 6,
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
"text/plain": [
"'United Kingdom'"
]
},
- "execution_count": 8,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 6
}
],
"source": [
@@ -323,91 +187,14 @@
"airports.iloc[2,2]"
]
},
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "A value of *:* returns all rows or all columns"
- ]
- },
{
"cell_type": "code",
- "execution_count": 9,
+ "execution_count": 7,
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
- "text/html": [
- "
\n",
- "\n",
- "
\n",
- " \n",
- "
\n",
- "
\n",
- "
Name
\n",
- "
City
\n",
- "
Country
\n",
- "
\n",
- " \n",
- " \n",
- "
\n",
- "
0
\n",
- "
Seatte-Tacoma
\n",
- "
Seattle
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
1
\n",
- "
Dulles
\n",
- "
Washington
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
2
\n",
- "
London Heathrow
\n",
- "
London
\n",
- "
United Kingdom
\n",
- "
\n",
- "
\n",
- "
3
\n",
- "
Schiphol
\n",
- "
Amsterdam
\n",
- "
Netherlands
\n",
- "
\n",
- "
\n",
- "
4
\n",
- "
Changi
\n",
- "
Singapore
\n",
- "
Singapore
\n",
- "
\n",
- "
\n",
- "
5
\n",
- "
Pearson
\n",
- "
Toronto
\n",
- "
Canada
\n",
- "
\n",
- "
\n",
- "
6
\n",
- "
Narita
\n",
- "
Tokyo
\n",
- "
Japan
\n",
- "
\n",
- " \n",
- "
\n",
- "
"
- ],
"text/plain": [
" Name City Country\n",
"0 Seatte-Tacoma Seattle USA\n",
@@ -417,14 +204,15 @@
"4 Changi Singapore Singapore\n",
"5 Pearson Toronto Canada\n",
"6 Narita Tokyo Japan"
- ]
+ ],
+ "text/html": "
\n\n
\n \n
\n
\n
Name
\n
City
\n
Country
\n
\n \n \n
\n
0
\n
Seatte-Tacoma
\n
Seattle
\n
USA
\n
\n
\n
1
\n
Dulles
\n
Washington
\n
USA
\n
\n
\n
2
\n
London Heathrow
\n
London
\n
United Kingdom
\n
\n
\n
3
\n
Schiphol
\n
Amsterdam
\n
Netherlands
\n
\n
\n
4
\n
Changi
\n
Singapore
\n
Singapore
\n
\n
\n
5
\n
Pearson
\n
Toronto
\n
Canada
\n
\n
\n
6
\n
Narita
\n
Tokyo
\n
Japan
\n
\n \n
\n
"
},
- "execution_count": 9,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 7
}
],
"source": [
+ "# Using : returns all rows or all columns\n",
"airports.iloc[:,:]"
]
},
@@ -433,66 +221,26 @@
"metadata": {},
"source": [
"You can request a range of rows or a range of columns\n",
- "* [x:y] will return rows or columns x through y"
+ "- `[x:y]` will return rows or columns x through y"
]
},
{
"cell_type": "code",
- "execution_count": 10,
+ "execution_count": 8,
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
- "text/html": [
- "
\n",
- "\n",
- "
\n",
- " \n",
- "
\n",
- "
\n",
- "
Name
\n",
- "
City
\n",
- "
Country
\n",
- "
\n",
- " \n",
- " \n",
- "
\n",
- "
0
\n",
- "
Seatte-Tacoma
\n",
- "
Seattle
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
1
\n",
- "
Dulles
\n",
- "
Washington
\n",
- "
USA
\n",
- "
\n",
- " \n",
- "
\n",
- "
"
- ],
"text/plain": [
" Name City Country\n",
"0 Seatte-Tacoma Seattle USA\n",
"1 Dulles Washington USA"
- ]
+ ],
+ "text/html": "
"
},
- "execution_count": 13,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 11
}
],
"source": [
@@ -804,9 +367,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.6.9"
+ "version": "3.8.5-final"
}
},
"nbformat": 4,
"nbformat_minor": 2
-}
+}
\ No newline at end of file
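As a quick reference for the iloc cells above, a minimal sketch of positional indexing on the same kind of DataFrame; loc (label-based selection, mentioned in the deleted README that follows) is included for contrast:

```python
import pandas as pd

airports = pd.DataFrame(
    [['Seatte-Tacoma', 'Seattle', 'USA'],
     ['Dulles', 'Washington', 'USA'],
     ['London Heathrow', 'London', 'United Kingdom']],
    columns=['Name', 'City', 'Country'])

print(airports.iloc[0, 0])    # row 0, column 0 -> 'Seatte-Tacoma'
print(airports.iloc[2, 2])    # row 2, column 2 -> 'United Kingdom'
print(airports.iloc[:, :])    # ':' selects all rows / all columns
print(airports.iloc[0:2, :])  # rows 0 and 1; the end of the slice is excluded
print(airports.loc[:, ['Name', 'Country']])  # loc selects by label instead of position
```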
diff --git a/source/week-4/panda-dataframe-querry/README.md b/source/week-4/panda-dataframe-querry/README.md
deleted file mode 100644
index ea22708..0000000
--- a/source/week-4/panda-dataframe-querry/README.md
+++ /dev/null
@@ -1,14 +0,0 @@
-# Query a pandas DataFrame
-
-The pandas [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) is a structure for storing two-dimensional tabular data.
-
-## Common properties
-
-- [loc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html) returns specific rows and columns by specifying column names
-- [iloc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html) returns specific rows and columns by specifying column positions
-
-## Microsoft Learn Resources
-
-Explore related tutorials on [Microsoft Learn](https://learn.microsoft.com/?WT.mc_id=python-c9-niner).
-
-- [Intro to machine learning with Python and Azure Notebooks](https://docs.microsoft.com/learn/paths/intro-to-ml-with-python/?WT.mc_id=python-c9-niner)
diff --git a/source/week-4/read-write-csv-pandas/07 - Read write CSV files.ipynb b/source/week-4/read-write-csv-pandas/07 - Read write CSV files.ipynb
index 9e164e3..99c81c6 100644
--- a/source/week-4/read-write-csv-pandas/07 - Read write CSV files.ipynb
+++ b/source/week-4/read-write-csv-pandas/07 - Read write CSV files.ipynb
@@ -23,18 +23,17 @@
"metadata": {},
"source": [
"## Reading a CSV file into a pandas DataFrame\n",
- "**read_csv** allows you to read the contents of a csv file into a DataFrame\n",
"\n",
- "airports.csv contains the following: \n",
+ "`read_csv` allows you to read the contents of a csv file into a DataFrame.\n",
"\n",
- "Name,City,Country \n",
- "Seattle-Tacoma,Seattle,USA \n",
- "Dulles,Washington,USA \n",
- "Heathrow,London,United Kingdom \n",
- "Schiphol,Amsterdam,Netherlands \n",
- "Changi,Singapore,Singapore \n",
- "Pearson,Toronto,Canada \n",
- "Narita,Tokyo,Japan"
+ "*airports.csv* contains the following:\n",
+ "\n",
+ ">Washington,USA \n",
+ ">Heathrow,London,United Kingdom \n",
+ ">Schiphol,Amsterdam,Netherlands \n",
+ ">Changi,Singapore,Singapore \n",
+ ">Pearson,Toronto,Canada \n",
+ ">Narita,Tokyo,Japan"
]
},
{
@@ -43,97 +42,25 @@
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
- "text/html": [
- "
\n",
- "\n",
- "
\n",
- " \n",
- "
\n",
- "
\n",
- "
Name
\n",
- "
City
\n",
- "
Country
\n",
- "
\n",
- " \n",
- " \n",
- "
\n",
- "
0
\n",
- "
Seattle-Tacoma
\n",
- "
Seattle
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
1
\n",
- "
Dulles
\n",
- "
Washington
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
2
\n",
- "
Heathrow
\n",
- "
London
\n",
- "
United Kingdom
\n",
- "
\n",
- "
\n",
- "
3
\n",
- "
Schiphol
\n",
- "
Amsterdam
\n",
- "
Netherlands
\n",
- "
\n",
- "
\n",
- "
4
\n",
- "
Changi
\n",
- "
Singapore
\n",
- "
Singapore
\n",
- "
\n",
- "
\n",
- "
5
\n",
- "
Pearson
\n",
- "
Toronto
\n",
- "
Canada
\n",
- "
\n",
- "
\n",
- "
6
\n",
- "
Narita
\n",
- "
Tokyo
\n",
- "
Japan
\n",
- "
\n",
- " \n",
- "
\n",
- "
"
- ],
"text/plain": [
" Name City Country\n",
"0 Seattle-Tacoma Seattle USA\n",
"1 Dulles Washington USA\n",
"2 Heathrow London United Kingdom\n",
"3 Schiphol Amsterdam Netherlands\n",
- "4 Changi Singapore Singapore\n",
- "5 Pearson Toronto Canada\n",
- "6 Narita Tokyo Japan"
- ]
+ "4 Changi Singapore Singapore"
+ ],
+ "text/html": "
\n\n
\n \n
\n
\n
Name
\n
City
\n
Country
\n
\n \n \n
\n
0
\n
Seattle-Tacoma
\n
Seattle
\n
USA
\n
\n
\n
1
\n
Dulles
\n
Washington
\n
USA
\n
\n
\n
2
\n
Heathrow
\n
London
\n
United Kingdom
\n
\n
\n
3
\n
Schiphol
\n
Amsterdam
\n
Netherlands
\n
\n
\n
4
\n
Changi
\n
Singapore
\n
Singapore
\n
\n \n
\n
"
},
- "execution_count": 2,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 2
}
],
"source": [
- "airports_df = pd.read_csv('Data/airports.csv')\n",
- "airports_df"
+ "airports_df = pd.read_csv('./airports.csv')\n",
+ "airports_df.head()"
]
},
{
@@ -141,18 +68,17 @@
"metadata": {},
"source": [
"## Handling rows with errors\n",
- "By default rows with an extra , or other issues cause an error\n",
- "\n",
- "Note the extra , in the row for Heathrow London in airportsInvalidRows.csv: \n",
"\n",
- "Name,City,Country \n",
- "Seattle-Tacoma,Seattle,USA \n",
- "Dulles,Washington,USA \n",
- "Heathrow,London,,United Kingdom \n",
- "Schiphol,Amsterdam,Netherlands \n",
- "Changi,Singapore,Singapore \n",
- "Pearson,Toronto,Canada \n",
- "Narita,Tokyo,Japan "
+ "By default rows with an extra , or other issues cause an error. \n",
+ "Note the extra , in the row for Heathrow London in `airportsInvalidRows.csv`: \n",
+ ">Name,City,Country \n",
+ ">Seattle-Tacoma,Seattle,USA \n",
+ ">Dulles,Washington,USA \n",
+ ">Heathrow,London,,United Kingdom \n",
+ ">Schiphol,Amsterdam,Netherlands \n",
+ ">Changi,Singapore,Singapore \n",
+ ">Pearson,Toronto,Canada \n",
+ ">Narita,Tokyo,Japan "
]
},
{
@@ -161,23 +87,21 @@
"metadata": {},
"outputs": [
{
- "ename": "ParserError",
- "evalue": "Error tokenizing data. C error: Expected 3 fields in line 4, saw 4\n",
"output_type": "error",
+ "ename": "FileNotFoundError",
+ "evalue": "[Errno 2] No such file or directory: 'Data/airportsInvalidRows.csv'",
"traceback": [
- "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
- "\u001b[1;31mParserError\u001b[0m Traceback (most recent call last)",
- "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mairports_df\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mread_csv\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'Data/airportsInvalidRows.csv'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 2\u001b[0m \u001b[0mairports_df\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
- "\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\pandas\\io\\parsers.py\u001b[0m in \u001b[0;36mparser_f\u001b[1;34m(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)\u001b[0m\n\u001b[0;32m 683\u001b[0m )\n\u001b[0;32m 684\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 685\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0m_read\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mfilepath_or_buffer\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mkwds\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 686\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 687\u001b[0m \u001b[0mparser_f\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m__name__\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mname\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
- "\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\pandas\\io\\parsers.py\u001b[0m in \u001b[0;36m_read\u001b[1;34m(filepath_or_buffer, kwds)\u001b[0m\n\u001b[0;32m 461\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 462\u001b[0m \u001b[1;32mtry\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 463\u001b[1;33m \u001b[0mdata\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mparser\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 464\u001b[0m \u001b[1;32mfinally\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 465\u001b[0m \u001b[0mparser\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mclose\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
- "\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\pandas\\io\\parsers.py\u001b[0m in \u001b[0;36mread\u001b[1;34m(self, nrows)\u001b[0m\n\u001b[0;32m 1152\u001b[0m \u001b[1;32mdef\u001b[0m \u001b[0mread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mnrows\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mNone\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1153\u001b[0m \u001b[0mnrows\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0m_validate_integer\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"nrows\"\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mnrows\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1154\u001b[1;33m \u001b[0mret\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 1155\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1156\u001b[0m \u001b[1;31m# May alter columns / col_dict\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
- "\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\pandas\\io\\parsers.py\u001b[0m in \u001b[0;36mread\u001b[1;34m(self, nrows)\u001b[0m\n\u001b[0;32m 2046\u001b[0m \u001b[1;32mdef\u001b[0m \u001b[0mread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mnrows\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mNone\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2047\u001b[0m \u001b[1;32mtry\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 2048\u001b[1;33m \u001b[0mdata\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_reader\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 2049\u001b[0m \u001b[1;32mexcept\u001b[0m \u001b[0mStopIteration\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2050\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_first_chunk\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
- "\u001b[1;32mpandas\\_libs\\parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader.read\u001b[1;34m()\u001b[0m\n",
- "\u001b[1;32mpandas\\_libs\\parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._read_low_memory\u001b[1;34m()\u001b[0m\n",
- "\u001b[1;32mpandas\\_libs\\parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._read_rows\u001b[1;34m()\u001b[0m\n",
- "\u001b[1;32mpandas\\_libs\\parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._tokenize_rows\u001b[1;34m()\u001b[0m\n",
- "\u001b[1;32mpandas\\_libs\\parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.raise_parser_error\u001b[1;34m()\u001b[0m\n",
- "\u001b[1;31mParserError\u001b[0m: Error tokenizing data. C error: Expected 3 fields in line 4, saw 4\n"
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)",
+ "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mairports_df\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread_csv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Data/airportsInvalidRows.csv'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mairports_df\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m~/.local/lib/python3.8/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36mread_csv\u001b[0;34m(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)\u001b[0m\n\u001b[1;32m 686\u001b[0m )\n\u001b[1;32m 687\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 688\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0m_read\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilepath_or_buffer\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 689\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 690\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m~/.local/lib/python3.8/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36m_read\u001b[0;34m(filepath_or_buffer, kwds)\u001b[0m\n\u001b[1;32m 452\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 453\u001b[0m \u001b[0;31m# Create the parser.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 454\u001b[0;31m \u001b[0mparser\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mTextFileReader\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfp_or_buf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 455\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 456\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mchunksize\u001b[0m \u001b[0;32mor\u001b[0m \u001b[0miterator\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m~/.local/lib/python3.8/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, f, engine, **kwds)\u001b[0m\n\u001b[1;32m 946\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moptions\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"has_index_names\"\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mkwds\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"has_index_names\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 947\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 948\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_make_engine\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mengine\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 949\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 950\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mclose\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m~/.local/lib/python3.8/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36m_make_engine\u001b[0;34m(self, engine)\u001b[0m\n\u001b[1;32m 1178\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_make_engine\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mengine\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m\"c\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1179\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mengine\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m\"c\"\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1180\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mCParserWrapper\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moptions\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1181\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1182\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mengine\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m\"python\"\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m~/.local/lib/python3.8/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, src, **kwds)\u001b[0m\n\u001b[1;32m 2008\u001b[0m \u001b[0mkwds\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"usecols\"\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0musecols\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2009\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2010\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_reader\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mparsers\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mTextReader\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msrc\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2011\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0munnamed_cols\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_reader\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0munnamed_cols\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2012\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader.__cinit__\u001b[0;34m()\u001b[0m\n",
+ "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._setup_parser_source\u001b[0;34m()\u001b[0m\n",
+ "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: 'Data/airportsInvalidRows.csv'"
]
}
],
@@ -190,7 +114,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Specify **error_bad_lines=False** to skip any rows with errors"
+ "Specify `error_bad_lines=False` to skip any rows with errors"
]
},
{
@@ -199,79 +123,15 @@
"metadata": {},
"outputs": [
{
- "name": "stderr",
"output_type": "stream",
+ "name": "stderr",
"text": [
"b'Skipping line 4: expected 3 fields, saw 4\\n'\n"
]
},
{
+ "output_type": "execute_result",
"data": {
- "text/html": [
- "
\n",
- "\n",
- "
\n",
- " \n",
- "
\n",
- "
\n",
- "
Name
\n",
- "
City
\n",
- "
Country
\n",
- "
\n",
- " \n",
- " \n",
- "
\n",
- "
0
\n",
- "
Seattle-Tacoma
\n",
- "
Seattle
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
1
\n",
- "
Dulles
\n",
- "
Washington
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
2
\n",
- "
Schiphol
\n",
- "
Amsterdam
\n",
- "
Netherlands
\n",
- "
\n",
- "
\n",
- "
3
\n",
- "
Changi
\n",
- "
Singapore
\n",
- "
Singapore
\n",
- "
\n",
- "
\n",
- "
4
\n",
- "
Pearson
\n",
- "
Toronto
\n",
- "
Canada
\n",
- "
\n",
- "
\n",
- "
5
\n",
- "
Narita
\n",
- "
Tokyo
\n",
- "
Japan
\n",
- "
\n",
- " \n",
- "
\n",
- "
"
- ],
"text/plain": [
" Name City Country\n",
"0 Seattle-Tacoma Seattle USA\n",
@@ -280,38 +140,34 @@
"3 Changi Singapore Singapore\n",
"4 Pearson Toronto Canada\n",
"5 Narita Tokyo Japan"
- ]
+ ],
+ "text/html": "
\n\n
\n \n
\n
\n
Name
\n
City
\n
Country
\n
\n \n \n
\n
0
\n
Seattle-Tacoma
\n
Seattle
\n
USA
\n
\n
\n
1
\n
Dulles
\n
Washington
\n
USA
\n
\n
\n
2
\n
Schiphol
\n
Amsterdam
\n
Netherlands
\n
\n
\n
3
\n
Changi
\n
Singapore
\n
Singapore
\n
\n
\n
4
\n
Pearson
\n
Toronto
\n
Canada
\n
\n
\n
5
\n
Narita
\n
Tokyo
\n
Japan
\n
\n \n
\n
"
},
- "execution_count": 4,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 4
}
],
"source": [
- "airports_df = pd.read_csv(\n",
- " 'Data/airportsInvalidRows.csv', \n",
- " error_bad_lines=False\n",
- " )\n",
+ "airports_df = pd.read_csv('./airportsInvalidRows.csv', error_bad_lines=False)\n",
"airports_df"
]
},
{
- "cell_type": "markdown",
- "metadata": {},
"source": [
"## Handling files which do not contain column headers\n",
- "If your file does not have the column headers in the first row by default, the first row of data is treated as headers\n",
"\n",
- "airportsNoHeaderRows.csv contains airport data but does not have a row specifying the column headers:\n",
- "\n",
- "Seattle-Tacoma,Seattle,USA \n",
- "Dulles,Washington,USA \n",
- "Heathrow,London,United Kingdom \n",
- "Schiphol,Amsterdam,Netherlands \n",
- "Changi,Singapore,Singapore \n",
- "Pearson,Toronto,Canada \n",
- "Narita,Tokyo,Japan "
- ]
+ "If your file does not have the column headers in the first row by default, the first row of data is treated as headers. \n",
+ "`airportsNoHeaderRows.csv` contains airport data but does not have a row specifying the column headers:\n",
+ ">Seattle-Tacoma,Seattle,USA \n",
+ ">Dulles,Washington,USA \n",
+ ">Heathrow,London,United Kingdom \n",
+ ">Schiphol,Amsterdam,Netherlands \n",
+ ">Changi,Singapore,Singapore \n",
+ ">Pearson,Toronto,Canada \n",
+ ">Narita,Tokyo,Japan "
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
},
{
"cell_type": "code",
@@ -319,72 +175,8 @@
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
- "text/html": [
- "
"
},
- "execution_count": 5,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 5
}
],
"source": [
- "airports_df = pd.read_csv('Data/airportsNoHeaderRows.csv')\n",
+ "airports_df = pd.read_csv('./airportsNoHeaderRows.csv')\n",
"airports_df"
]
},
@@ -409,7 +201,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Specify **header=None** if you do not have a Header row to avoid having the first row of data treated as a header row"
+ "Specify `header=None` if you do not have a Header row to avoid having the first row of data treated as a header row"
]
},
{
@@ -418,78 +210,8 @@
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
- "text/html": [
- "
"
- ],
"text/plain": [
" Name City Country\n",
"0 Seattle-Tacoma Seattle USA\n",
@@ -730,15 +304,15 @@
"4 Changi Singapore Singapore\n",
"5 Pearson Toronto Canada\n",
"6 Narita Tokyo Japan"
- ]
+ ],
+ "text/html": "
\n\n
\n \n
\n
\n
Name
\n
City
\n
Country
\n
\n \n \n
\n
0
\n
Seattle-Tacoma
\n
Seattle
\n
USA
\n
\n
\n
1
\n
Dulles
\n
Washington
\n
USA
\n
\n
\n
2
\n
Heathrow
\n
London
\n
United Kingdom
\n
\n
\n
3
\n
Schiphol
\n
NaN
\n
Netherlands
\n
\n
\n
4
\n
Changi
\n
Singapore
\n
Singapore
\n
\n
\n
5
\n
Pearson
\n
Toronto
\n
Canada
\n
\n
\n
6
\n
Narita
\n
Tokyo
\n
Japan
\n
\n \n
\n
"
},
- "execution_count": 8,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 8
}
],
"source": [
- "airports_df = pd.read_csv('Data/airportsBlankValues.csv')\n",
+ "airports_df = pd.read_csv('./airportsBlankValues.csv')\n",
"airports_df"
]
},
@@ -747,7 +321,8 @@
"metadata": {},
"source": [
"## Writing DataFrame contents to a CSV file\n",
- "**to_csv** will write the contents of a pandas DataFrame to a CSV file"
+ "\n",
+ "`to_csv` will write the contents of a pandas DataFrame to a CSV file."
]
},
{
@@ -756,78 +331,8 @@
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
- "text/html": [
- "
\n",
- "\n",
- "
\n",
- " \n",
- "
\n",
- "
\n",
- "
Name
\n",
- "
City
\n",
- "
Country
\n",
- "
\n",
- " \n",
- " \n",
- "
\n",
- "
0
\n",
- "
Seattle-Tacoma
\n",
- "
Seattle
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
1
\n",
- "
Dulles
\n",
- "
Washington
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
2
\n",
- "
Heathrow
\n",
- "
London
\n",
- "
United Kingdom
\n",
- "
\n",
- "
\n",
- "
3
\n",
- "
Schiphol
\n",
- "
NaN
\n",
- "
Netherlands
\n",
- "
\n",
- "
\n",
- "
4
\n",
- "
Changi
\n",
- "
Singapore
\n",
- "
Singapore
\n",
- "
\n",
- "
\n",
- "
5
\n",
- "
Pearson
\n",
- "
Toronto
\n",
- "
Canada
\n",
- "
\n",
- "
\n",
- "
6
\n",
- "
Narita
\n",
- "
Tokyo
\n",
- "
Japan
\n",
- "
\n",
- " \n",
- "
\n",
- "
"
- ],
"text/plain": [
" Name City Country\n",
"0 Seattle-Tacoma Seattle USA\n",
@@ -837,11 +342,11 @@
"4 Changi Singapore Singapore\n",
"5 Pearson Toronto Canada\n",
"6 Narita Tokyo Japan"
- ]
+ ],
+ "text/html": "
\n\n
\n \n
\n
\n
Name
\n
City
\n
Country
\n
\n \n \n
\n
0
\n
Seattle-Tacoma
\n
Seattle
\n
USA
\n
\n
\n
1
\n
Dulles
\n
Washington
\n
USA
\n
\n
\n
2
\n
Heathrow
\n
London
\n
United Kingdom
\n
\n
\n
3
\n
Schiphol
\n
NaN
\n
Netherlands
\n
\n
\n
4
\n
Changi
\n
Singapore
\n
Singapore
\n
\n
\n
5
\n
Pearson
\n
Toronto
\n
Canada
\n
\n
\n
6
\n
Narita
\n
Tokyo
\n
Japan
\n
\n \n
\n
"
},
- "execution_count": 9,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 9
}
],
"source": [
@@ -854,16 +359,15 @@
"metadata": {},
"outputs": [],
"source": [
- "airports_df.to_csv('Data/MyNewCSVFile.csv')"
+ "airports_df.to_csv('./MyNewCSVFile.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "The index column is written to the csv file\n",
- "\n",
- "Specify **index=False** if you do not want the index column to be included in the csv file"
+ "The index column is written to the csv file. \n",
+ "Specify `index=False` if you do not want the index column to be included in the csv file."
]
},
{
@@ -872,10 +376,7 @@
"metadata": {},
"outputs": [],
"source": [
- "airports_df.to_csv(\n",
- " 'Data/MyNewCSVFileNoIndex.csv', \n",
- " index=False\n",
- " )"
+ "airports_df.to_csv('./MyNewCSVFileNoIndex.csv', index=False)"
]
}
],
@@ -895,9 +396,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.6.9"
+ "version": "3.8.5-final"
}
},
"nbformat": 4,
"nbformat_minor": 2
-}
+}
\ No newline at end of file
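The CSV notebook above switches paths from Data/ to the notebook directory and demonstrates the main read_csv/to_csv options. A minimal sketch of those options, assuming the same CSV files sit next to the script; note that error_bad_lines is the pandas 1.x spelling used in the notebook (newer releases use on_bad_lines='skip'):

```python
import pandas as pd

# Read a CSV file into a DataFrame
airports_df = pd.read_csv('./airports.csv')

# Skip rows with an extra comma (or other parse problems) instead of raising ParserError
bad_rows_df = pd.read_csv('./airportsInvalidRows.csv', error_bad_lines=False)

# Treat the first row as data rather than as column headers
no_header_df = pd.read_csv('./airportsNoHeaderRows.csv', header=None)

# Write the DataFrame back out; index=False leaves out the index column
airports_df.to_csv('./MyNewCSVFileNoIndex.csv', index=False)
```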
diff --git a/source/week-4/read-write-csv-pandas/MyNewCSVFile.csv b/source/week-4/read-write-csv-pandas/MyNewCSVFile.csv
new file mode 100644
index 0000000..fb9d05b
--- /dev/null
+++ b/source/week-4/read-write-csv-pandas/MyNewCSVFile.csv
@@ -0,0 +1,8 @@
+,Name,City,Country
+0,Seattle-Tacoma,Seattle,USA
+1,Dulles,Washington,USA
+2,Heathrow,London,United Kingdom
+3,Schiphol,,Netherlands
+4,Changi,Singapore,Singapore
+5,Pearson,Toronto,Canada
+6,Narita,Tokyo,Japan
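The new MyNewCSVFile.csv above keeps the DataFrame index as an unnamed first column because it was written without index=False. A small sketch of reading it back, assuming the file sits in the working directory; index_col=0 restores that column as the index:

```python
import pandas as pd

# The unnamed first column in MyNewCSVFile.csv is the saved index;
# index_col=0 reads it back as the DataFrame index instead of a data column
airports_df = pd.read_csv('./MyNewCSVFile.csv', index_col=0)
print(airports_df.head())
```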
diff --git a/source/week-4/read-write-csv-pandas/MyNewCSVFileNoIndex.csv b/source/week-4/read-write-csv-pandas/MyNewCSVFileNoIndex.csv
new file mode 100644
index 0000000..19ff4c4
--- /dev/null
+++ b/source/week-4/read-write-csv-pandas/MyNewCSVFileNoIndex.csv
@@ -0,0 +1,8 @@
+Name,City,Country
+Seattle-Tacoma,Seattle,USA
+Dulles,Washington,USA
+Heathrow,London,United Kingdom
+Schiphol,,Netherlands
+Changi,Singapore,Singapore
+Pearson,Toronto,Canada
+Narita,Tokyo,Japan
diff --git a/source/week-5/data-visualization-matplotlib/15 - Visualizing correlations.ipynb b/source/week-5/data-visualization-matplotlib/15 - Visualizing correlations.ipynb
index b5e35ca..a9b1818 100644
--- a/source/week-5/data-visualization-matplotlib/15 - Visualizing correlations.ipynb
+++ b/source/week-5/data-visualization-matplotlib/15 - Visualizing correlations.ipynb
@@ -2,29 +2,17 @@
"cells": [
{
"cell_type": "markdown",
- "execution_count": null,
"metadata": {},
- "outputs": [],
"source": [
- "# Visualizing data with matplotlib"
- ]
- },
- {
- "cell_type": "markdown",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "Somtimes graphs provide the best way to visualize data\n",
- "\n",
- "The **matplotlib** library allows you to draw graphs to help with visualization\n",
+ "# Visualizing data with matplotlib\n",
"\n",
- "If we want to visualize data, we will need to load some data into a DataFrame"
+ "Somtimes graphs provide the best way to visualize data. The **matplotlib** library allows you to draw graphs to help with visualization. \n",
+ "If we want to visualize data, we will need to load some data into a DataFrame first."
]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
@@ -33,26 +21,24 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Load our data from the csv file\n",
- "delays_df = pd.read_csv('Data/Lots_of_flight_data.csv') "
+ "delays_df = pd.read_csv('./Lots_of_flight_data.csv') "
]
},
{
"cell_type": "markdown",
- "execution_count": null,
"metadata": {},
- "outputs": [],
"source": [
"In order to display plots we need to import the **matplotlib** library"
]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
@@ -61,30 +47,50 @@
},
{
"cell_type": "markdown",
- "execution_count": null,
"metadata": {},
- "outputs": [],
"source": [
- "A common plot used in data science is the scatter plot for checking the relationship between two columns\n",
- "If you see dots scattered everywhere, there is no correlation between the two columns\n",
- "If you see somethign resembling a line, there is a correlation between the two columns\n",
+ "A common plot used in data science is the scatter plot for checking the relationship between two columns. \n",
+ "If you see dots scattered everywhere, there is no correlation between the two columns. \n",
+ "If you see somethign resembling a line, there is a correlation between the two columns. \n",
"\n",
"You can use the plot method of the DataFrame to draw the scatter plot\n",
- "* kind - the type of graph to draw\n",
- "* x - value to plot as x\n",
- "* y - value to plot as y\n",
- "* color - color to use for the graph points\n",
- "* alpha - opacity - useful to show density of points in a scatter plot\n",
- "* title - title of the graph"
+ "- `kind` - the type of graph to draw\n",
+ "- `x` - value to plot as x\n",
+ "- `y` - value to plot as y\n",
+ "- `color` - color to use for the graph points\n",
+ "- `alpha` - opacity - useful to show density of points in a scatter plot\n",
+ "- `title` - title of the graph"
]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Slice the DataFrame to reduce the visual processing time and reduce the data points for visibility\n",
+ "delays_df = delays_df.iloc[0:1000, :]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
"metadata": {
"scrolled": true
},
- "outputs": [],
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/svg+xml": "\n\n\n