diff --git a/challenges/week_1/Readme.md b/challenges/week_1/Readme.md
new file mode 100644
index 0000000..430ab42
--- /dev/null
+++ b/challenges/week_1/Readme.md
@@ -0,0 +1,31 @@
+# Bus Fare Challenge - Week 1
+
+## Challenge
+
+Write a program that does the following:
+
+1. gets today's date and stores it in a variable `'date'`
+2. uses today's date to get the name of the day of the week in short form with the first letter capitalized, e.g. `'Fri'` if today were Friday, and assigns it to a variable `'day'`
+3. uses if statements to determine today's fare following this bus fare schedule:
+
+    - Monday - Friday --> 100
+    - Saturday --> 60
+    - Sunday --> 80
+4. prints the results in this format:
+    > Date: 2021-01-05  
+    > Day: Tue  
+    > Fare: 100
+
+## Evaluation
+
+Run the [Checker](checker.py) file to evaluate your code. If all tests pass, your solution is correct. If any fail, read the error messages, update your code, and run the checker again.
+
+## Note
+
+1. Your solution must be written in the [bus_fare_challenge](bus_fare_challenge.py) file, and its name should never be changed.
+2. Your program must make use of the following variable names:
+    - `'date'`
+    - `'day'`
+    - `'fare'`
+
+    *Failure to do so will cause all your tests to fail.*
+3. The **Checker** file should **never** be **altered** at any cost.
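Steps 1 and 2 above can be sketched with the standard `datetime` module (a sketch only; the fare logic in step 3 is left to you):

```python
import datetime

# Step 1: today's date, e.g. datetime.date(2021, 1, 5)
date = datetime.datetime.now().date()

# Step 2: short weekday name with the first letter capitalized, e.g. 'Tue'
day = date.strftime("%a")

print(f"Date: {date}")
print(f"Day: {day}")
```

`strftime("%a")` is the piece that produces the abbreviated weekday name the checker expects.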
diff --git a/challenges/week_1/bus_fare_challenge.py b/challenges/week_1/bus_fare_challenge.py
new file mode 100644
index 0000000..0b90aa0
--- /dev/null
+++ b/challenges/week_1/bus_fare_challenge.py
@@ -0,0 +1 @@
+# WRITE YOUR CODE SOLUTION HERE
diff --git a/challenges/week_1/checker.py b/challenges/week_1/checker.py
new file mode 100644
index 0000000..50020fc
--- /dev/null
+++ b/challenges/week_1/checker.py
@@ -0,0 +1,55 @@
+import bus_fare_challenge
+import datetime
+import unittest
+
+
+class TestBusFareChallenge(unittest.TestCase):
+    def setUp(self) -> None:
+        self.date = datetime.datetime.now().date()
+        self.day = self.date.strftime("%a")
+        self.charts = {
+            "Mon": 100,
+            "Tue": 100,
+            "Wed": 100,
+            "Thu": 100,
+            "Fri": 100,
+            "Sat": 60,
+            "Sun": 80,
+        }
+
+    def test_date(self) -> None:
+        """
+        Tests whether the date returned by the program is correct.
+        """
+        actual = self.date
+        given = bus_fare_challenge.date
+        self.assertEqual(actual, given, f"Today's date is wrong by {given - actual}!")
+
+    def test_day(self) -> None:
+        """
+        Tests whether the day returned by the program is correct.
+        """
+        actual = self.day
+        given = bus_fare_challenge.day
+        self.assertEqual(
+            actual, given, f"Today is wrong, expected {actual} but got {given}!"
+        )
+
+    def test_fare(self) -> None:
+        """
+        Tests whether the fare returned by the program is correct.
+        """
+        actual = self.charts[self.day]
+        given = bus_fare_challenge.fare
+        self.assertEqual(
+            actual, given, f"Fare is wrong, expected {actual} but got {given}!"
+        )
+
+
+if __name__ == "__main__":
+    print("=========================================================================")
+    print("=========================================================================")
+    print("===== Start: Checking Return Values For Today's Date, Day and Fare =====")
+    unittest.main(exit=False)
+    print("===== End: Checking Return Values For Today's Date, Day and Fare =======")
+    print("=========================================================================")
diff --git a/source/week-4/csv-files-jupyter/README.md b/source/week-4/csv-files-jupyter/README.md
index 7ec63e7..56c22d1 100644
--- a/source/week-4/csv-files-jupyter/README.md
+++ b/source/week-4/csv-files-jupyter/README.md
@@ -1,6 +1,27 @@
# CSV Files and Jupyter Notebooks
-CSV files are comma separated variable file. CSV files are frequently used to store data. In order to access the data in a CSV file from a Jupyter Notebook you must upload the file.
+The so-called **CSV** (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. The CSV format was used for many years prior to attempts to describe the format in a standardized way.
+
+Python has a built-in csv module that implements classes to read and write tabular data in CSV format.
+
+```python
+# format example
+>>> import csv
+>>> with open('./airports.csv') as file:
+...     data = csv.reader(file)
+...     for row in data:
+...         print(*row) # * is used to unpack lists
+Name City Country
+Seattle-Tacoma Seattle USA
+Dulles Washington USA
+Heathrow London United Kingdom
+Schiphol Amsterdam Netherlands
+Changi Singapore Singapore
+Pearson Toronto Canada
+Narita Tokyo Japan
+```
+
+This module has a lot more features; see [more details](https://docs.python.org/3/library/csv.html).
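The module also writes CSV data. A minimal sketch (the output file name `airports_out.csv` is just for illustration):

```python
import csv

# Rows to write; header row first
rows = [
    ["Name", "City", "Country"],
    ["Changi", "Singapore", "Singapore"],
    ["Narita", "Tokyo", "Japan"],
]

# newline='' is recommended by the csv docs to avoid extra blank lines on Windows
with open("airports_out.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerows(rows)
```

Reading the file back with `csv.reader` yields the same rows as lists of strings.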
## Microsoft Learn Resources
diff --git a/source/week-4/intro-to-pandas/03 - Pandas Series and DataFrame.ipynb b/source/week-4/intro-to-pandas/03 - Pandas Series and DataFrame.ipynb
index e1802f1..0c84909 100644
--- a/source/week-4/intro-to-pandas/03 - Pandas Series and DataFrame.ipynb
+++ b/source/week-4/intro-to-pandas/03 - Pandas Series and DataFrame.ipynb
@@ -1,35 +1,53 @@
{
"cells": [
{
- "cell_type": "markdown",
- "metadata": {},
"source": [
- "# pandas Series and DataFrame"
- ]
+ "# Pandas Series and DataFrame"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "## pandas\n",
- "**pandas** is an open source library providing data structures and data analysis tools for Python programmers"
+ "**Pandas** is an open source library providing data structures and data analysis tools for Python programmers. \n",
+ "The pandas **Series** is a one dimensional array, similar to a Python list"
]
},
{
"cell_type": "code",
- "execution_count": 2,
+ "execution_count": 1,
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Defaulting to user installation because normal site-packages is not writeable\n",
+ "Requirement already satisfied: pandas in /home/tim/.local/lib/python3.8/site-packages (1.1.4)\n",
+ "Requirement already satisfied: pytz>=2017.2 in /usr/lib/python3/dist-packages (from pandas) (2019.3)\n",
+ "Requirement already satisfied: python-dateutil>=2.7.3 in /home/tim/.local/lib/python3.8/site-packages (from pandas) (2.8.1)\n",
+ "Requirement already satisfied: numpy>=1.15.4 in /home/tim/.local/lib/python3.8/site-packages (from pandas) (1.19.4)\n",
+ "Requirement already satisfied: six>=1.5 in /home/tim/.local/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0)\n",
+ "\u001b[33mWARNING: You are using pip version 20.3.2; however, version 20.3.3 is available.\n",
+ "You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.\u001b[0m\n"
+ ]
+ }
+ ],
"source": [
- "import pandas as pd"
+ "# install pandas\n",
+ "! pip install pandas"
]
},
{
- "cell_type": "markdown",
+ "cell_type": "code",
+ "execution_count": 2,
"metadata": {},
+ "outputs": [],
"source": [
- "## Series\n",
- "The pandas **Series** is a one dimensional array, similar to a Python list"
+ "# load pandas into notebook\n",
+ "import pandas as pd"
]
},
{
@@ -38,6 +56,7 @@
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
"text/plain": [
"0 Seattle-Tacoma\n",
@@ -50,9 +69,8 @@
"dtype: object"
]
},
- "execution_count": 3,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 3
}
],
"source": [
@@ -72,60 +90,42 @@
"airports"
]
},
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "You can reference an individual value in a Series using it's index"
- ]
- },
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
"text/plain": [
"'London Heathrow'"
]
},
- "execution_count": 4,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 4
}
],
"source": [
+ "# You can reference an individual value in a Series using its index\n",
"airports[2]"
]
},
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "You can use a loop to iterate through all the values in a Series"
- ]
- },
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
- "name": "stdout",
"output_type": "stream",
+ "name": "stdout",
"text": [
- "Seattle-Tacoma\n",
- "Dulles\n",
- "London Heathrow\n",
- "Schiphol\n",
- "Changi\n",
- "Pearson\n",
- "Narita\n"
+ "Seattle-Tacoma\nDulles\nLondon Heathrow\nSchiphol\nChangi\nPearson\nNarita\n"
]
}
],
"source": [
+ "# You can use a loop to iterate through all the values in a Series\n",
"for value in airports:\n",
" print(value) "
]
@@ -135,9 +135,9 @@
"metadata": {},
"source": [
"## DataFrame\n",
- "Most of the time when we are working with pandas we are dealing with two-dimensional arrays\n",
"\n",
- "The pandas **DataFrame** can store two dimensional arrays"
+ "Most of the time when we are working with pandas we are dealing with two-dimensional arrays. \n",
+ "The pandas **DataFrame** can store two dimensional arrays. "
]
},
{
@@ -146,78 +146,8 @@
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
- "text/html": [
- "
"
- ],
"text/plain": [
" Name City Country\n",
"0 Seatte-Tacoma Seattle USA\n",
@@ -342,14 +195,15 @@
"4 Changi Singapore Singapore\n",
"5 Pearson Toronto Canada\n",
"6 Narita Tokyo Japan"
- ]
+ ],
+ "text/html": "
\n\n
\n \n
\n
\n
Name
\n
City
\n
Country
\n
\n \n \n
\n
0
\n
Seatte-Tacoma
\n
Seattle
\n
USA
\n
\n
\n
1
\n
Dulles
\n
Washington
\n
USA
\n
\n
\n
2
\n
London Heathrow
\n
London
\n
United Kingdom
\n
\n
\n
3
\n
Schiphol
\n
Amsterdam
\n
Netherlands
\n
\n
\n
4
\n
Changi
\n
Singapore
\n
Singapore
\n
\n
\n
5
\n
Pearson
\n
Toronto
\n
Canada
\n
\n
\n
6
\n
Narita
\n
Tokyo
\n
Japan
\n
\n \n
\n
"
},
- "execution_count": 7,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 7
}
],
"source": [
+ "# Use the columns parameter to specify names for the columns when you create the DataFrame\n",
"airports = pd.DataFrame([\n",
" ['Seatte-Tacoma', 'Seattle', 'USA'],\n",
" ['Dulles', 'Washington', 'USA'],\n",
@@ -382,9 +236,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.6.9"
+ "version": "3.8.5-final"
}
},
"nbformat": 4,
"nbformat_minor": 2
-}
+}
\ No newline at end of file
diff --git a/source/week-4/intro-to-pandas/README.md b/source/week-4/intro-to-pandas/README.md
index cee037f..2184315 100644
--- a/source/week-4/intro-to-pandas/README.md
+++ b/source/week-4/intro-to-pandas/README.md
@@ -1,6 +1,6 @@
-# pandas
+# Pandas
-[pandas](https://pandas/pydata.org) is an open source Python library contains a number of high performance data structures and tools for data analysis.
+[Pandas](https://pandas.pydata.org) is an open source Python library that contains a number of high performance data structures and tools for data analysis.
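The two core data structures are the one-dimensional `Series` and the two-dimensional `DataFrame`; a minimal sketch (the airport names are just sample data):

```python
import pandas as pd

# A one-dimensional Series, similar to a Python list
airports = pd.Series(["Changi", "Narita", "Pearson"])

# A two-dimensional DataFrame with named columns
airports_df = pd.DataFrame(
    [["Changi", "Singapore"], ["Narita", "Tokyo"]],
    columns=["Name", "City"],
)

print(airports[0])        # first Series element
print(airports_df.shape)  # (rows, columns)
```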
## Documentation
diff --git a/source/week-4/jupyter-notebooks/README.md b/source/week-4/jupyter-notebooks/README.md
index 363489c..3581879 100644
--- a/source/week-4/jupyter-notebooks/README.md
+++ b/source/week-4/jupyter-notebooks/README.md
@@ -2,6 +2,8 @@
Jupyter Notebooks are an open source web application that allows you to create and share Python code. They are frequently used for data science. The code samples in this course are completed using Jupyter Notebooks which have a .ipynb file extension.
+
+
## Documentation
- [Jupyter](https://jupyter.org/) to install Jupyter so you can run Jupyter Notebooks locally on your computer
diff --git a/source/week-4/panda-dataframe-content/04 - Exploring pandas DataFrame contents.ipynb b/source/week-4/panda-dataframe-content/04 - Exploring pandas DataFrame contents.ipynb
index 0f50838..7900e35 100644
--- a/source/week-4/panda-dataframe-content/04 - Exploring pandas DataFrame contents.ipynb
+++ b/source/week-4/panda-dataframe-content/04 - Exploring pandas DataFrame contents.ipynb
@@ -5,8 +5,8 @@
"metadata": {},
"source": [
"# Examining pandas DataFrame contents\n",
- "It's useful to be able to quickly examine the contents of a DataFrame. \n",
"\n",
+ "It's useful to be able to quickly examine the contents of a DataFrame. \n",
"Let's start by importing the pandas library and creating a DataFrame populated with information about airports"
]
},
@@ -25,78 +25,8 @@
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
- "text/html": [
- "
\n",
- "\n",
- "
\n",
- " \n",
- "
\n",
- "
\n",
- "
Name
\n",
- "
City
\n",
- "
Country
\n",
- "
\n",
- " \n",
- " \n",
- "
\n",
- "
0
\n",
- "
Seatte-Tacoma
\n",
- "
Seattle
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
1
\n",
- "
Dulles
\n",
- "
Washington
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
2
\n",
- "
Heathrow
\n",
- "
London
\n",
- "
United Kingdom
\n",
- "
\n",
- "
\n",
- "
3
\n",
- "
Schiphol
\n",
- "
Amsterdam
\n",
- "
Netherlands
\n",
- "
\n",
- "
\n",
- "
4
\n",
- "
Changi
\n",
- "
Singapore
\n",
- "
Singapore
\n",
- "
\n",
- "
\n",
- "
5
\n",
- "
Pearson
\n",
- "
Toronto
\n",
- "
Canada
\n",
- "
\n",
- "
\n",
- "
6
\n",
- "
Narita
\n",
- "
Tokyo
\n",
- "
Japan
\n",
- "
\n",
- " \n",
- "
\n",
- "
"
- ],
"text/plain": [
" Name City Country\n",
"0 Seatte-Tacoma Seattle USA\n",
@@ -106,11 +36,11 @@
"4 Changi Singapore Singapore\n",
"5 Pearson Toronto Canada\n",
"6 Narita Tokyo Japan"
- ]
+ ],
+ "text/html": "
\n\n
\n \n
\n
\n
Name
\n
City
\n
Country
\n
\n \n \n
\n
0
\n
Seatte-Tacoma
\n
Seattle
\n
USA
\n
\n
\n
1
\n
Dulles
\n
Washington
\n
USA
\n
\n
\n
2
\n
Heathrow
\n
London
\n
United Kingdom
\n
\n
\n
3
\n
Schiphol
\n
Amsterdam
\n
Netherlands
\n
\n
\n
4
\n
Changi
\n
Singapore
\n
Singapore
\n
\n
\n
5
\n
Pearson
\n
Toronto
\n
Canada
\n
\n
\n
6
\n
Narita
\n
Tokyo
\n
Japan
\n
\n \n
\n
"
},
- "execution_count": 2,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 2
}
],
"source": [
@@ -134,9 +64,10 @@
"metadata": {},
"source": [
"## Returning first *n* rows\n",
- "If you have thousands of rows, you might just want to look at the first few rows\n",
"\n",
- "* **head**(*n*) returns the top *n* rows "
+ "If you have thousands of rows, you might just want to look at the first few rows\n",
+ "- **head**(*n*) returns the top *n* rows\n",
+ "- by default *n* is 5"
]
},
{
@@ -145,68 +76,24 @@
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
- "text/html": [
- "
\n",
- "\n",
- "
\n",
- " \n",
- "
\n",
- "
\n",
- "
Name
\n",
- "
City
\n",
- "
Country
\n",
- "
\n",
- " \n",
- " \n",
- "
\n",
- "
0
\n",
- "
Seatte-Tacoma
\n",
- "
Seattle
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
1
\n",
- "
Dulles
\n",
- "
Washington
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
2
\n",
- "
Heathrow
\n",
- "
London
\n",
- "
United Kingdom
\n",
- "
\n",
- " \n",
- "
\n",
- "
"
- ],
"text/plain": [
" Name City Country\n",
"0 Seatte-Tacoma Seattle USA\n",
"1 Dulles Washington USA\n",
- "2 Heathrow London United Kingdom"
- ]
+ "2 Heathrow London United Kingdom\n",
+ "3 Schiphol Amsterdam Netherlands\n",
+ "4 Changi Singapore Singapore"
+ ],
+ "text/html": "
\n\n
\n \n
\n
\n
Name
\n
City
\n
Country
\n
\n \n \n
\n
0
\n
Seatte-Tacoma
\n
Seattle
\n
USA
\n
\n
\n
1
\n
Dulles
\n
Washington
\n
USA
\n
\n
\n
2
\n
Heathrow
\n
London
\n
United Kingdom
\n
\n
\n
3
\n
Schiphol
\n
Amsterdam
\n
Netherlands
\n
\n
\n
4
\n
Changi
\n
Singapore
\n
Singapore
\n
\n \n
\n
"
},
- "execution_count": 3,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 3
}
],
"source": [
- "airports.head(3)"
+ "airports.head()"
]
},
{
@@ -214,8 +101,10 @@
"metadata": {},
"source": [
"## Returning last *n* rows\n",
+ "\n",
"Looking at the last rows in a DataFrame can be a good way to check that all your data loaded correctly\n",
- "* **tail**(*n*) returns the last *n* rows"
+ "- **tail**(*n*) returns the last *n* rows\n",
+ "- by default *n* is 5"
]
},
{
@@ -224,68 +113,24 @@
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
- "text/html": [
- "
\n",
- "\n",
- "
\n",
- " \n",
- "
\n",
- "
\n",
- "
Name
\n",
- "
City
\n",
- "
Country
\n",
- "
\n",
- " \n",
- " \n",
- "
\n",
- "
4
\n",
- "
Changi
\n",
- "
Singapore
\n",
- "
Singapore
\n",
- "
\n",
- "
\n",
- "
5
\n",
- "
Pearson
\n",
- "
Toronto
\n",
- "
Canada
\n",
- "
\n",
- "
\n",
- "
6
\n",
- "
Narita
\n",
- "
Tokyo
\n",
- "
Japan
\n",
- "
\n",
- " \n",
- "
\n",
- "
"
- ],
"text/plain": [
- " Name City Country\n",
- "4 Changi Singapore Singapore\n",
- "5 Pearson Toronto Canada\n",
- "6 Narita Tokyo Japan"
- ]
+ " Name City Country\n",
+ "2 Heathrow London United Kingdom\n",
+ "3 Schiphol Amsterdam Netherlands\n",
+ "4 Changi Singapore Singapore\n",
+ "5 Pearson Toronto Canada\n",
+ "6 Narita Tokyo Japan"
+ ],
+ "text/html": "
\n\n
\n \n
\n
\n
Name
\n
City
\n
Country
\n
\n \n \n
\n
2
\n
Heathrow
\n
London
\n
United Kingdom
\n
\n
\n
3
\n
Schiphol
\n
Amsterdam
\n
Netherlands
\n
\n
\n
4
\n
Changi
\n
Singapore
\n
Singapore
\n
\n
\n
5
\n
Pearson
\n
Toronto
\n
Canada
\n
\n
\n
6
\n
Narita
\n
Tokyo
\n
Japan
\n
\n \n
\n
"
},
- "execution_count": 4,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 4
}
],
"source": [
- "airports.tail(3)"
+ "airports.tail()"
]
},
{
@@ -293,9 +138,9 @@
"metadata": {},
"source": [
"## Checking number of rows and columns in DataFrame\n",
- "Sometimes you just need to know how much data you have in the DataFrame\n",
"\n",
- "* **shape** returns the number of rows and columns"
+ "Sometimes you just need to know how much data you have in the DataFrame\n",
+ "- **shape** returns the number of rows and columns"
]
},
{
@@ -304,14 +149,14 @@
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
"text/plain": [
"(7, 3)"
]
},
- "execution_count": 5,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 5
}
],
"source": [
@@ -322,14 +167,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Getting mroe detailed information about DataFrame contents\n",
- "\n",
- "* **info**() returns more detailed information about the DataFrame\n",
+ "## Getting detailed information about DataFrame contents\n",
"\n",
+ "**DataFrame.info**() returns more detailed information about the DataFrame \n",
"Information returned includes:\n",
- "* The number of rows, and the range of index values\n",
- "* The number of columns\n",
- "* For each column: column name, number of non-null values, the datatype\n"
+ "- The number of rows, and the range of index values\n",
+ "- The number of columns\n",
+ "- For each column: column name, number of non-null values, the datatype\n"
]
},
{
@@ -338,23 +182,55 @@
"metadata": {},
"outputs": [
{
- "name": "stdout",
"output_type": "stream",
+ "name": "stdout",
"text": [
- "\n",
- "RangeIndex: 7 entries, 0 to 6\n",
- "Data columns (total 3 columns):\n",
- "Name 7 non-null object\n",
- "City 7 non-null object\n",
- "Country 7 non-null object\n",
- "dtypes: object(3)\n",
- "memory usage: 148.0+ bytes\n"
+ "\nRangeIndex: 7 entries, 0 to 6\nData columns (total 3 columns):\n # Column Non-Null Count Dtype \n--- ------ -------------- ----- \n 0 Name 7 non-null object\n 1 City 7 non-null object\n 2 Country 7 non-null object\ndtypes: object(3)\nmemory usage: 296.0+ bytes\n"
]
}
],
"source": [
"airports.info()"
]
+ },
+ {
+ "source": [
+ "**DataFrame.describe()** returns a statistical summary of the DataFrame. \n",
+ "Information returned might include:\n",
+ "- Count of non-NA/null observations\n",
+ "- Mean and Standard Deviation\n",
+ "- Minimum and Maximum values by column\n",
+ "- Percentiles (25%, 50%, 75%)\n",
+ "\n",
+ "and other values, depending on the DataFrame's contents.\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " Name City Country\n",
+ "count 7 7 7\n",
+ "unique 7 7 6\n",
+ "top Changi Amsterdam USA\n",
+ "freq 1 1 2"
+ ],
+ "text/html": "
\n\n
\n \n
\n
\n
Name
\n
City
\n
Country
\n
\n \n \n
\n
count
\n
7
\n
7
\n
7
\n
\n
\n
unique
\n
7
\n
7
\n
6
\n
\n
\n
top
\n
Changi
\n
Amsterdam
\n
USA
\n
\n
\n
freq
\n
1
\n
1
\n
2
\n
\n \n
\n
"
+ },
+ "metadata": {},
+ "execution_count": 7
+ }
+ ],
+ "source": [
+ "airports.describe()"
+ ]
}
],
"metadata": {
@@ -373,9 +249,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.6.9"
+ "version": "3.8.5-final"
}
},
"nbformat": 4,
"nbformat_minor": 2
-}
+}
\ No newline at end of file
diff --git a/source/week-4/panda-dataframe-content/README.md b/source/week-4/panda-dataframe-content/README.md
index aa35bb3..2662b3c 100644
--- a/source/week-4/panda-dataframe-content/README.md
+++ b/source/week-4/panda-dataframe-content/README.md
@@ -5,6 +5,7 @@ The pandas [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/ap
## Common functions
- [head](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html) returns the first *n* rows from the DataFrame
+- [info](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html) provides a summary of the DataFrame content including column names, their datatypes, and number of rows containing non-null values
+- [describe](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html) generates descriptive statistics, including those that summarize the central tendency, dispersion and shape of a dataset's distribution, excluding *NaN* values
- [tail](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html) returns the last *n* rows from the DataFrame
- [shape](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shape.html) returns the dimensions of the DataFrame (e.g. number of rows and columns)
-- [info](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html) provides a summary of the DataFrame content including column names, their datatypes, and number of rows containing non-null values
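The functions listed above can be exercised together on a small DataFrame; a minimal sketch (the airport rows are just sample data):

```python
import pandas as pd

df = pd.DataFrame(
    [["Changi", "Singapore"], ["Narita", "Tokyo"], ["Pearson", "Toronto"]],
    columns=["Name", "City"],
)

print(df.head(2))     # first 2 rows
print(df.tail(1))     # last row
print(df.shape)       # (3, 2): 3 rows, 2 columns
df.info()             # column names, dtypes, non-null counts
print(df.describe())  # count/unique/top/freq for object columns
```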
diff --git a/source/week-4/panda-dataframe-querry/05 - Querying DataFrames.ipynb b/source/week-4/panda-dataframe-querry/05 - Querying DataFrames.ipynb
index 95e8021..0124d51 100644
--- a/source/week-4/panda-dataframe-querry/05 - Querying DataFrames.ipynb
+++ b/source/week-4/panda-dataframe-querry/05 - Querying DataFrames.ipynb
@@ -1,15 +1,14 @@
{
"cells": [
{
- "cell_type": "markdown",
- "metadata": {},
"source": [
"# Query a pandas DataFrame \n",
"\n",
- "Returning a portion of the data in a DataFrame is called slicing or dicing the data\n",
- "\n",
+ "Returning a portion of the data in a DataFrame is called *slicing* or *dicing* the data. \n",
"There are many different ways to query a pandas DataFrame; here are a few to get you started"
- ]
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
},
{
"cell_type": "code",
@@ -22,82 +21,12 @@
},
{
"cell_type": "code",
- "execution_count": 3,
+ "execution_count": 2,
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
- "text/html": [
- "
\n",
- "\n",
- "
\n",
- " \n",
- "
\n",
- "
\n",
- "
Name
\n",
- "
City
\n",
- "
Country
\n",
- "
\n",
- " \n",
- " \n",
- "
\n",
- "
0
\n",
- "
Seatte-Tacoma
\n",
- "
Seattle
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
1
\n",
- "
Dulles
\n",
- "
Washington
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
2
\n",
- "
London Heathrow
\n",
- "
London
\n",
- "
United Kingdom
\n",
- "
\n",
- "
\n",
- "
3
\n",
- "
Schiphol
\n",
- "
Amsterdam
\n",
- "
Netherlands
\n",
- "
\n",
- "
\n",
- "
4
\n",
- "
Changi
\n",
- "
Singapore
\n",
- "
Singapore
\n",
- "
\n",
- "
\n",
- "
5
\n",
- "
Pearson
\n",
- "
Toronto
\n",
- "
Canada
\n",
- "
\n",
- "
\n",
- "
6
\n",
- "
Narita
\n",
- "
Tokyo
\n",
- "
Japan
\n",
- "
\n",
- " \n",
- "
\n",
- "
"
- ],
"text/plain": [
" Name City Country\n",
"0 Seatte-Tacoma Seattle USA\n",
@@ -107,11 +36,11 @@
"4 Changi Singapore Singapore\n",
"5 Pearson Toronto Canada\n",
"6 Narita Tokyo Japan"
- ]
+ ],
+ "text/html": "
"
},
- "execution_count": 5,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 4
}
],
"source": [
@@ -268,33 +137,28 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Using *iloc* to specify rows and columns to return\n",
- "**iloc**[*rows*,*columns*] allows you to access a group of rows or columns by row and column index positions."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "You specify the specific row and column you want returned\n",
- "* First row is row 0\n",
- "* First column is column 0"
+ "## Using *iloc*\n",
+ "\n",
+ "`iloc[row, column]` allows you to access a group of rows or columns by row and column index positions. \n",
+ "You specify the specific row and column you want returned:\n",
+ "- First row is row 0\n",
+ "- First column is column 0"
]
},
{
"cell_type": "code",
- "execution_count": 7,
+ "execution_count": 5,
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
"text/plain": [
"'Seatte-Tacoma'"
]
},
- "execution_count": 7,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 5
}
],
"source": [
@@ -304,18 +168,18 @@
},
{
"cell_type": "code",
- "execution_count": 8,
+ "execution_count": 6,
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
"text/plain": [
"'United Kingdom'"
]
},
- "execution_count": 8,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 6
}
],
"source": [
@@ -323,91 +187,14 @@
"airports.iloc[2,2]"
]
},
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "A value of *:* returns all rows or all columns"
- ]
- },
{
"cell_type": "code",
- "execution_count": 9,
+ "execution_count": 7,
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
- "text/html": [
- "
\n",
- "\n",
- "
\n",
- " \n",
- "
\n",
- "
\n",
- "
Name
\n",
- "
City
\n",
- "
Country
\n",
- "
\n",
- " \n",
- " \n",
- "
\n",
- "
0
\n",
- "
Seatte-Tacoma
\n",
- "
Seattle
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
1
\n",
- "
Dulles
\n",
- "
Washington
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
2
\n",
- "
London Heathrow
\n",
- "
London
\n",
- "
United Kingdom
\n",
- "
\n",
- "
\n",
- "
3
\n",
- "
Schiphol
\n",
- "
Amsterdam
\n",
- "
Netherlands
\n",
- "
\n",
- "
\n",
- "
4
\n",
- "
Changi
\n",
- "
Singapore
\n",
- "
Singapore
\n",
- "
\n",
- "
\n",
- "
5
\n",
- "
Pearson
\n",
- "
Toronto
\n",
- "
Canada
\n",
- "
\n",
- "
\n",
- "
6
\n",
- "
Narita
\n",
- "
Tokyo
\n",
- "
Japan
\n",
- "
\n",
- " \n",
- "
\n",
- "
"
- ],
"text/plain": [
" Name City Country\n",
"0 Seatte-Tacoma Seattle USA\n",
@@ -417,14 +204,15 @@
"4 Changi Singapore Singapore\n",
"5 Pearson Toronto Canada\n",
"6 Narita Tokyo Japan"
- ]
+ ],
+ "text/html": "
\n\n
\n \n
\n
\n
Name
\n
City
\n
Country
\n
\n \n \n
\n
0
\n
Seatte-Tacoma
\n
Seattle
\n
USA
\n
\n
\n
1
\n
Dulles
\n
Washington
\n
USA
\n
\n
\n
2
\n
London Heathrow
\n
London
\n
United Kingdom
\n
\n
\n
3
\n
Schiphol
\n
Amsterdam
\n
Netherlands
\n
\n
\n
4
\n
Changi
\n
Singapore
\n
Singapore
\n
\n
\n
5
\n
Pearson
\n
Toronto
\n
Canada
\n
\n
\n
6
\n
Narita
\n
Tokyo
\n
Japan
\n
\n \n
\n
"
},
- "execution_count": 9,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 7
}
],
"source": [
+ "# Using : returns all rows or all columns\n",
"airports.iloc[:,:]"
]
},
@@ -433,66 +221,26 @@
"metadata": {},
"source": [
"You can request a range of rows or a range of columns\n",
- "* [x:y] will return rows or columns x through y"
+ "- `[x:y]` will return rows or columns x through y"
]
},
{
"cell_type": "code",
- "execution_count": 10,
+ "execution_count": 8,
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
- "text/html": [
- "
\n",
- "\n",
- "
\n",
- " \n",
- "
\n",
- "
\n",
- "
Name
\n",
- "
City
\n",
- "
Country
\n",
- "
\n",
- " \n",
- " \n",
- "
\n",
- "
0
\n",
- "
Seatte-Tacoma
\n",
- "
Seattle
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
1
\n",
- "
Dulles
\n",
- "
Washington
\n",
- "
USA
\n",
- "
\n",
- " \n",
- "
\n",
- "
"
- ],
"text/plain": [
" Name City Country\n",
"0 Seatte-Tacoma Seattle USA\n",
"1 Dulles Washington USA"
- ]
+ ],
+ "text/html": "
"
},
- "execution_count": 13,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 11
}
],
"source": [
@@ -804,9 +367,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.6.9"
+ "version": "3.8.5-final"
}
},
"nbformat": 4,
"nbformat_minor": 2
-}
+}
\ No newline at end of file
diff --git a/source/week-4/panda-dataframe-querry/README.md b/source/week-4/panda-dataframe-querry/README.md
index ea22708..2f12810 100644
--- a/source/week-4/panda-dataframe-querry/README.md
+++ b/source/week-4/panda-dataframe-querry/README.md
@@ -1,11 +1,11 @@
# Query a pandas DataFrame
-The pandas [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) is a structure for storing two-dimensional tabular data.
+The Pandas [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) is a structure for storing two-dimensional tabular data.
## Common properties
- [loc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html) returns specific rows and columns by specifying column names
-- [iloc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html) returns specific rows and columns by specifying column positions
+- [iloc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html) returns specific rows and columns by specifying their integer positions (index)
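The label-vs-position distinction can be sketched on a tiny DataFrame (the airport rows are just sample data):

```python
import pandas as pd

df = pd.DataFrame(
    [["Changi", "Singapore"], ["Narita", "Tokyo"]],
    columns=["Name", "City"],
)

# loc selects by label, iloc by integer position
print(df.loc[0, "Name"])  # row label 0, column 'Name'
print(df.iloc[1, 1])      # second row, second column
print(df.iloc[:, 0])      # all rows, first column
```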
## Microsoft Learn Resources
diff --git a/source/week-4/read-write-csv-pandas/07 - Read write CSV files.ipynb b/source/week-4/read-write-csv-pandas/07 - Read write CSV files.ipynb
index 9e164e3..99c81c6 100644
--- a/source/week-4/read-write-csv-pandas/07 - Read write CSV files.ipynb
+++ b/source/week-4/read-write-csv-pandas/07 - Read write CSV files.ipynb
@@ -23,18 +23,17 @@
"metadata": {},
"source": [
"## Reading a CSV file into a pandas DataFrame\n",
- "**read_csv** allows you to read the contents of a csv file into a DataFrame\n",
"\n",
- "airports.csv contains the following: \n",
+ "`read_csv` allows you to read the contents of a csv file into a DataFrame.\n",
"\n",
- "Name,City,Country \n",
- "Seattle-Tacoma,Seattle,USA \n",
- "Dulles,Washington,USA \n",
- "Heathrow,London,United Kingdom \n",
- "Schiphol,Amsterdam,Netherlands \n",
- "Changi,Singapore,Singapore \n",
- "Pearson,Toronto,Canada \n",
- "Narita,Tokyo,Japan"
+ "*airports.csv* contains the following:\n",
+ "\n",
+ ">Name,City,Country \n",
+ ">Seattle-Tacoma,Seattle,USA \n",
+ ">Dulles,Washington,USA \n",
+ ">Heathrow,London,United Kingdom \n",
+ ">Schiphol,Amsterdam,Netherlands \n",
+ ">Changi,Singapore,Singapore \n",
+ ">Pearson,Toronto,Canada \n",
+ ">Narita,Tokyo,Japan"
]
},
{
@@ -43,97 +42,25 @@
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
- "text/html": [
- "
\n",
- "\n",
- "
\n",
- " \n",
- "
\n",
- "
\n",
- "
Name
\n",
- "
City
\n",
- "
Country
\n",
- "
\n",
- " \n",
- " \n",
- "
\n",
- "
0
\n",
- "
Seattle-Tacoma
\n",
- "
Seattle
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
1
\n",
- "
Dulles
\n",
- "
Washington
\n",
- "
USA
\n",
- "
\n",
- "
\n",
- "
2
\n",
- "
Heathrow
\n",
- "
London
\n",
- "
United Kingdom
\n",
- "
\n",
- "
\n",
- "
3
\n",
- "
Schiphol
\n",
- "
Amsterdam
\n",
- "
Netherlands
\n",
- "
\n",
- "
\n",
- "
4
\n",
- "
Changi
\n",
- "
Singapore
\n",
- "
Singapore
\n",
- "
\n",
- "
\n",
- "
5
\n",
- "
Pearson
\n",
- "
Toronto
\n",
- "
Canada
\n",
- "
\n",
- "
\n",
- "
6
\n",
- "
Narita
\n",
- "
Tokyo
\n",
- "
Japan
\n",
- "
\n",
- " \n",
- "
\n",
- "
"
- ],
"text/plain": [
" Name City Country\n",
"0 Seattle-Tacoma Seattle USA\n",
"1 Dulles Washington USA\n",
"2 Heathrow London United Kingdom\n",
"3 Schiphol Amsterdam Netherlands\n",
- "4 Changi Singapore Singapore\n",
- "5 Pearson Toronto Canada\n",
- "6 Narita Tokyo Japan"
- ]
+ "4 Changi Singapore Singapore"
+ ],
+ "text/html": "… HTML table rendering of the DataFrame …"
},
- "execution_count": 2,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 2
}
],
"source": [
- "airports_df = pd.read_csv('Data/airports.csv')\n",
- "airports_df"
+ "airports_df = pd.read_csv('./airports.csv')\n",
+ "airports_df.head()"
]
},
{
@@ -141,18 +68,17 @@
"metadata": {},
"source": [
"## Handling rows with errors\n",
- "By default rows with an extra , or other issues cause an error\n",
- "\n",
- "Note the extra , in the row for Heathrow London in airportsInvalidRows.csv: \n",
"\n",
- "Name,City,Country \n",
- "Seattle-Tacoma,Seattle,USA \n",
- "Dulles,Washington,USA \n",
- "Heathrow,London,,United Kingdom \n",
- "Schiphol,Amsterdam,Netherlands \n",
- "Changi,Singapore,Singapore \n",
- "Pearson,Toronto,Canada \n",
- "Narita,Tokyo,Japan "
+ "By default, rows with an extra comma or other issues cause an error. \n",
+ "Note the extra comma in the Heathrow, London row in `airportsInvalidRows.csv`: \n",
+ ">Name,City,Country \n",
+ ">Seattle-Tacoma,Seattle,USA \n",
+ ">Dulles,Washington,USA \n",
+ ">Heathrow,London,,United Kingdom \n",
+ ">Schiphol,Amsterdam,Netherlands \n",
+ ">Changi,Singapore,Singapore \n",
+ ">Pearson,Toronto,Canada \n",
+ ">Narita,Tokyo,Japan "
]
},
{
@@ -161,23 +87,21 @@
"metadata": {},
"outputs": [
{
- "ename": "ParserError",
- "evalue": "Error tokenizing data. C error: Expected 3 fields in line 4, saw 4\n",
"output_type": "error",
+ "ename": "FileNotFoundError",
+ "evalue": "[Errno 2] No such file or directory: 'Data/airportsInvalidRows.csv'",
"traceback": [
- "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
- "\u001b[1;31mParserError\u001b[0m Traceback (most recent call last)",
- "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mairports_df\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mread_csv\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'Data/airportsInvalidRows.csv'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 2\u001b[0m \u001b[0mairports_df\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
- "\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\pandas\\io\\parsers.py\u001b[0m in \u001b[0;36mparser_f\u001b[1;34m(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)\u001b[0m\n\u001b[0;32m 683\u001b[0m )\n\u001b[0;32m 684\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 685\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0m_read\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mfilepath_or_buffer\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mkwds\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 686\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 687\u001b[0m \u001b[0mparser_f\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m__name__\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mname\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
- "\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\pandas\\io\\parsers.py\u001b[0m in \u001b[0;36m_read\u001b[1;34m(filepath_or_buffer, kwds)\u001b[0m\n\u001b[0;32m 461\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 462\u001b[0m \u001b[1;32mtry\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 463\u001b[1;33m \u001b[0mdata\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mparser\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 464\u001b[0m \u001b[1;32mfinally\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 465\u001b[0m \u001b[0mparser\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mclose\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
- "\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\pandas\\io\\parsers.py\u001b[0m in \u001b[0;36mread\u001b[1;34m(self, nrows)\u001b[0m\n\u001b[0;32m 1152\u001b[0m \u001b[1;32mdef\u001b[0m \u001b[0mread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mnrows\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mNone\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1153\u001b[0m \u001b[0mnrows\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0m_validate_integer\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"nrows\"\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mnrows\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1154\u001b[1;33m \u001b[0mret\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 1155\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1156\u001b[0m \u001b[1;31m# May alter columns / col_dict\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
- "\u001b[1;32m~\\Anaconda3\\lib\\site-packages\\pandas\\io\\parsers.py\u001b[0m in \u001b[0;36mread\u001b[1;34m(self, nrows)\u001b[0m\n\u001b[0;32m 2046\u001b[0m \u001b[1;32mdef\u001b[0m \u001b[0mread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mnrows\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mNone\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2047\u001b[0m \u001b[1;32mtry\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 2048\u001b[1;33m \u001b[0mdata\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_reader\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 2049\u001b[0m \u001b[1;32mexcept\u001b[0m \u001b[0mStopIteration\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2050\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_first_chunk\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
- "\u001b[1;32mpandas\\_libs\\parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader.read\u001b[1;34m()\u001b[0m\n",
- "\u001b[1;32mpandas\\_libs\\parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._read_low_memory\u001b[1;34m()\u001b[0m\n",
- "\u001b[1;32mpandas\\_libs\\parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._read_rows\u001b[1;34m()\u001b[0m\n",
- "\u001b[1;32mpandas\\_libs\\parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._tokenize_rows\u001b[1;34m()\u001b[0m\n",
- "\u001b[1;32mpandas\\_libs\\parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.raise_parser_error\u001b[1;34m()\u001b[0m\n",
- "\u001b[1;31mParserError\u001b[0m: Error tokenizing data. C error: Expected 3 fields in line 4, saw 4\n"
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)",
+ "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mairports_df\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread_csv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Data/airportsInvalidRows.csv'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mairports_df\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m~/.local/lib/python3.8/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36mread_csv\u001b[0;34m(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)\u001b[0m\n\u001b[1;32m 686\u001b[0m )\n\u001b[1;32m 687\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 688\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0m_read\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilepath_or_buffer\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 689\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 690\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m~/.local/lib/python3.8/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36m_read\u001b[0;34m(filepath_or_buffer, kwds)\u001b[0m\n\u001b[1;32m 452\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 453\u001b[0m \u001b[0;31m# Create the parser.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 454\u001b[0;31m \u001b[0mparser\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mTextFileReader\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfp_or_buf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 455\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 456\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mchunksize\u001b[0m \u001b[0;32mor\u001b[0m \u001b[0miterator\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m~/.local/lib/python3.8/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, f, engine, **kwds)\u001b[0m\n\u001b[1;32m 946\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moptions\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"has_index_names\"\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mkwds\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"has_index_names\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 947\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 948\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_make_engine\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mengine\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 949\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 950\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mclose\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m~/.local/lib/python3.8/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36m_make_engine\u001b[0;34m(self, engine)\u001b[0m\n\u001b[1;32m 1178\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_make_engine\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mengine\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m\"c\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1179\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mengine\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m\"c\"\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1180\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mCParserWrapper\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moptions\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1181\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1182\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mengine\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m\"python\"\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m~/.local/lib/python3.8/site-packages/pandas/io/parsers.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, src, **kwds)\u001b[0m\n\u001b[1;32m 2008\u001b[0m \u001b[0mkwds\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"usecols\"\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0musecols\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2009\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2010\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_reader\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mparsers\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mTextReader\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msrc\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2011\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0munnamed_cols\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_reader\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0munnamed_cols\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2012\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader.__cinit__\u001b[0;34m()\u001b[0m\n",
+ "\u001b[0;32mpandas/_libs/parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._setup_parser_source\u001b[0;34m()\u001b[0m\n",
+ "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: 'Data/airportsInvalidRows.csv'"
]
}
],
@@ -190,7 +114,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Specify **error_bad_lines=False** to skip any rows with errors"
+ "Specify `error_bad_lines=False` to skip any rows with errors"
]
},
{
@@ -199,79 +123,15 @@
"metadata": {},
"outputs": [
{
- "name": "stderr",
"output_type": "stream",
+ "name": "stderr",
"text": [
"b'Skipping line 4: expected 3 fields, saw 4\\n'\n"
]
},
{
+ "output_type": "execute_result",
"data": {
- "text/html": [
- "… HTML table rendering of the DataFrame …"
- ],
"text/plain": [
" Name City Country\n",
"0 Seattle-Tacoma Seattle USA\n",
@@ -280,38 +140,34 @@
"3 Changi Singapore Singapore\n",
"4 Pearson Toronto Canada\n",
"5 Narita Tokyo Japan"
- ]
+ ],
+ "text/html": "… HTML table rendering of the DataFrame …"
},
- "execution_count": 4,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 4
}
],
"source": [
- "airports_df = pd.read_csv(\n",
- " 'Data/airportsInvalidRows.csv', \n",
- " error_bad_lines=False\n",
- " )\n",
+ "airports_df = pd.read_csv('./airportsInvalidRows.csv', error_bad_lines=False)\n",
"airports_df"
]
},
{
- "cell_type": "markdown",
- "metadata": {},
"source": [
"## Handling files which do not contain column headers\n",
- "If your file does not have the column headers in the first row by default, the first row of data is treated as headers\n",
"\n",
- "airportsNoHeaderRows.csv contains airport data but does not have a row specifying the column headers:\n",
- "\n",
- "Seattle-Tacoma,Seattle,USA \n",
- "Dulles,Washington,USA \n",
- "Heathrow,London,United Kingdom \n",
- "Schiphol,Amsterdam,Netherlands \n",
- "Changi,Singapore,Singapore \n",
- "Pearson,Toronto,Canada \n",
- "Narita,Tokyo,Japan "
- ]
+ "If your file does not have column headers in the first row, by default the first row of data is treated as headers. \n",
+ "`airportsNoHeaderRows.csv` contains airport data but does not have a row specifying the column headers:\n",
+ ">Seattle-Tacoma,Seattle,USA \n",
+ ">Dulles,Washington,USA \n",
+ ">Heathrow,London,United Kingdom \n",
+ ">Schiphol,Amsterdam,Netherlands \n",
+ ">Changi,Singapore,Singapore \n",
+ ">Pearson,Toronto,Canada \n",
+ ">Narita,Tokyo,Japan "
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
},
{
"cell_type": "code",
@@ -319,72 +175,8 @@
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
- "text/html": [
- "… HTML table rendering of the DataFrame …"
- ],
+ "text/html": "… HTML table rendering of the DataFrame …"
},
- "execution_count": 5,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 5
}
],
"source": [
- "airports_df = pd.read_csv('Data/airportsNoHeaderRows.csv')\n",
+ "airports_df = pd.read_csv('./airportsNoHeaderRows.csv')\n",
"airports_df"
]
},
@@ -409,7 +201,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Specify **header=None** if you do not have a Header row to avoid having the first row of data treated as a header row"
+ "Specify `header=None` if you do not have a header row, to avoid having the first row of data treated as headers."
]
},
{
@@ -418,78 +210,8 @@
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
- "text/html": [
- "… HTML table rendering of the DataFrame …"
- ],
"text/plain": [
" Name City Country\n",
"0 Seattle-Tacoma Seattle USA\n",
@@ -730,15 +304,15 @@
"4 Changi Singapore Singapore\n",
"5 Pearson Toronto Canada\n",
"6 Narita Tokyo Japan"
- ]
+ ],
+ "text/html": "… HTML table rendering of the DataFrame …"
},
- "execution_count": 8,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 8
}
],
"source": [
- "airports_df = pd.read_csv('Data/airportsBlankValues.csv')\n",
+ "airports_df = pd.read_csv('./airportsBlankValues.csv')\n",
"airports_df"
]
},
@@ -747,7 +321,8 @@
"metadata": {},
"source": [
"## Writing DataFrame contents to a CSV file\n",
- "**to_csv** will write the contents of a pandas DataFrame to a CSV file"
+ "\n",
+ "`to_csv` will write the contents of a pandas DataFrame to a CSV file."
]
},
{
@@ -756,78 +331,8 @@
"metadata": {},
"outputs": [
{
+ "output_type": "execute_result",
"data": {
- "text/html": [
- "… HTML table rendering of the DataFrame …"
- ],
"text/plain": [
" Name City Country\n",
"0 Seattle-Tacoma Seattle USA\n",
@@ -837,11 +342,11 @@
"4 Changi Singapore Singapore\n",
"5 Pearson Toronto Canada\n",
"6 Narita Tokyo Japan"
- ]
+ ],
+ "text/html": "… HTML table rendering of the DataFrame …"
},
- "execution_count": 9,
"metadata": {},
- "output_type": "execute_result"
+ "execution_count": 9
}
],
"source": [
@@ -854,16 +359,15 @@
"metadata": {},
"outputs": [],
"source": [
- "airports_df.to_csv('Data/MyNewCSVFile.csv')"
+ "airports_df.to_csv('./MyNewCSVFile.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "The index column is written to the csv file\n",
- "\n",
- "Specify **index=False** if you do not want the index column to be included in the csv file"
+ "The index column is written to the csv file. \n",
+ "Specify `index=False` if you do not want the index column to be included in the csv file."
]
},
{
@@ -872,10 +376,7 @@
"metadata": {},
"outputs": [],
"source": [
- "airports_df.to_csv(\n",
- " 'Data/MyNewCSVFileNoIndex.csv', \n",
- " index=False\n",
- " )"
+ "airports_df.to_csv('./MyNewCSVFileNoIndex.csv', index=False)"
]
}
],
@@ -895,9 +396,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.6.9"
+ "version": "3.8.5-final"
}
},
"nbformat": 4,
"nbformat_minor": 2
-}
+}
\ No newline at end of file
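The `read_csv` options exercised in the notebook above can be sketched outside the diff. This is a minimal, hedged example using inline data in place of the notebook's CSV files (the rows mirror its sample data); note that `error_bad_lines=False` was deprecated and later removed, so current pandas spells the same behaviour `on_bad_lines="skip"`:

```python
import io
import pandas as pd

# Inline stand-in for airportsInvalidRows.csv; the extra comma in the
# Heathrow row gives that line four fields instead of the expected three.
data = """Name,City,Country
Seattle-Tacoma,Seattle,USA
Heathrow,London,,United Kingdom
Narita,Tokyo,Japan
"""

# on_bad_lines="skip" (pandas >= 1.3) replaces the older error_bad_lines=False;
# the malformed Heathrow row is silently dropped.
df = pd.read_csv(io.StringIO(data), on_bad_lines="skip")

# For a file with no header row, header=None stops the first data row being
# treated as headers, and names= supplies column labels explicitly.
no_header = "Changi,Singapore,Singapore\n"
df2 = pd.read_csv(io.StringIO(no_header), header=None,
                  names=["Name", "City", "Country"])
```

The skipped-row DataFrame ends up with two rows, matching the behaviour shown in the notebook's stderr message `Skipping line 4: expected 3 fields, saw 4`.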
diff --git a/source/week-4/read-write-csv-pandas/MyNewCSVFile.csv b/source/week-4/read-write-csv-pandas/MyNewCSVFile.csv
new file mode 100644
index 0000000..fb9d05b
--- /dev/null
+++ b/source/week-4/read-write-csv-pandas/MyNewCSVFile.csv
@@ -0,0 +1,8 @@
+,Name,City,Country
+0,Seattle-Tacoma,Seattle,USA
+1,Dulles,Washington,USA
+2,Heathrow,London,United Kingdom
+3,Schiphol,,Netherlands
+4,Changi,Singapore,Singapore
+5,Pearson,Toronto,Canada
+6,Narita,Tokyo,Japan
diff --git a/source/week-4/read-write-csv-pandas/MyNewCSVFileNoIndex.csv b/source/week-4/read-write-csv-pandas/MyNewCSVFileNoIndex.csv
new file mode 100644
index 0000000..19ff4c4
--- /dev/null
+++ b/source/week-4/read-write-csv-pandas/MyNewCSVFileNoIndex.csv
@@ -0,0 +1,8 @@
+Name,City,Country
+Seattle-Tacoma,Seattle,USA
+Dulles,Washington,USA
+Heathrow,London,United Kingdom
+Schiphol,,Netherlands
+Changi,Singapore,Singapore
+Pearson,Toronto,Canada
+Narita,Tokyo,Japan
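The difference between the two generated files above comes down to the `index` argument of `to_csv`. A minimal sketch, using a one-row DataFrame built from the notebook's sample data:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Changi"],
                   "City": ["Singapore"],
                   "Country": ["Singapore"]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_index = df.to_csv()                # leading unnamed index column, as in MyNewCSVFile.csv
without_index = df.to_csv(index=False)  # header starts at Name, as in MyNewCSVFileNoIndex.csv
```

The first form produces a header line beginning with an empty field (`,Name,City,Country`), which is why the re-read file would gain an `Unnamed: 0` column; `index=False` avoids that.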
diff --git a/source/week-5/data-visualization-matplotlib/15 - Visualizing correlations.ipynb b/source/week-5/data-visualization-matplotlib/15 - Visualizing correlations.ipynb
index b5e35ca..a9b1818 100644
--- a/source/week-5/data-visualization-matplotlib/15 - Visualizing correlations.ipynb
+++ b/source/week-5/data-visualization-matplotlib/15 - Visualizing correlations.ipynb
@@ -2,29 +2,17 @@
"cells": [
{
"cell_type": "markdown",
- "execution_count": null,
"metadata": {},
- "outputs": [],
"source": [
- "# Visualizing data with matplotlib"
- ]
- },
- {
- "cell_type": "markdown",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "Somtimes graphs provide the best way to visualize data\n",
- "\n",
- "The **matplotlib** library allows you to draw graphs to help with visualization\n",
+ "# Visualizing data with matplotlib\n",
"\n",
- "If we want to visualize data, we will need to load some data into a DataFrame"
+ "Sometimes graphs provide the best way to visualize data. The **matplotlib** library allows you to draw graphs to help with visualization. \n",
+ "If we want to visualize data, we will need to load some data into a DataFrame first."
]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
@@ -33,26 +21,24 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Load our data from the csv file\n",
- "delays_df = pd.read_csv('Data/Lots_of_flight_data.csv') "
+ "delays_df = pd.read_csv('./Lots_of_flight_data.csv') "
]
},
{
"cell_type": "markdown",
- "execution_count": null,
"metadata": {},
- "outputs": [],
"source": [
"In order to display plots we need to import the **matplotlib** library"
]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
@@ -61,30 +47,50 @@
},
{
"cell_type": "markdown",
- "execution_count": null,
"metadata": {},
- "outputs": [],
"source": [
- "A common plot used in data science is the scatter plot for checking the relationship between two columns\n",
- "If you see dots scattered everywhere, there is no correlation between the two columns\n",
- "If you see somethign resembling a line, there is a correlation between the two columns\n",
+ "A common plot used in data science is the scatter plot for checking the relationship between two columns. \n",
+ "If you see dots scattered everywhere, there is no correlation between the two columns. \n",
+ "If you see something resembling a line, there is a correlation between the two columns. \n",
"\n",
"You can use the plot method of the DataFrame to draw the scatter plot\n",
- "* kind - the type of graph to draw\n",
- "* x - value to plot as x\n",
- "* y - value to plot as y\n",
- "* color - color to use for the graph points\n",
- "* alpha - opacity - useful to show density of points in a scatter plot\n",
- "* title - title of the graph"
+ "- `kind` - the type of graph to draw\n",
+ "- `x` - value to plot as x\n",
+ "- `y` - value to plot as y\n",
+ "- `color` - color to use for the graph points\n",
+ "- `alpha` - opacity - useful to show density of points in a scatter plot\n",
+ "- `title` - title of the graph"
]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Slice the DataFrame to speed up plotting and reduce the number of points for visibility\n",
+ "delays_df = delays_df.iloc[0:1000, :]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
"metadata": {
"scrolled": true
},
- "outputs": [],
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/svg+xml": "\n\n\n