This GitHub repository contains a collection of tips and tricks for working with Jupyter. It demonstrates some of the basic techniques analysts use for capturing and storing data in various databases. Here's a brief description of the tutorials available in this repository.
- Basics of Jupyter - A notebook that demonstrates how to verify that Jupyter and Python have been installed correctly on your computer. Also shows how to install Python packages directly from a code block, which many Jupyter implementations allow.
- Basics of Loading Data - Demonstrates how to read files from disk and examine them with pandas. Includes examples of connecting to a MongoDB server, storing your connection string in a file outside of your GitHub repository, and some techniques for querying data in MongoDB.
- Basics of Transformation - A variety of ways to transform data within a pandas DataFrame.
- Basics of Plotting - Some basic ways to plot certain types of graphs with matplotlib and pandas. This is a very basic tutorial - there are lots of advanced graphing lessons you can find online to build on these ideas!
- Basics of Storing Data - How to store information in MongoDB using single inserts, multiple inserts, and batches.
- Basics of Web Scraping - An example of how to fetch a web page using urllib3, how to fetch multiple pages in a range, and how to search within a fetched page using regular expressions. Also includes some examples of how to format output and how to display thumbnails within a Jupyter notebook using pandas DataFrames.
- Web Scraping with Selenium - An example of how to fetch a web page and retrieve elements from it using Selenium, which can scrape data from web pages that require JavaScript to load properly.
- Basics of Google Colab - An example of how to use MongoDB within Google Colab by detecting your Colab server's IP address.
- IMDB Scraper - An example program that fetches data from IMDB and stores it in a MongoDB database.
- IMDB Mongo Scraper - The same program, storing its data in a MongoDB database, with extra logic to make the scraper restartable.
- Language Processing - An example tutorial demonstrating sentiment analysis.
- Topic Classification - A small example that demonstrates the use of a DistilRoBERTa-based model, a neural network that classifies text by emotion.
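
As a quick illustration of one idea from the Basics of Loading Data notebook, here is a minimal sketch of keeping a MongoDB connection string in a file outside the repository and reading it at run time. The function name `load_connection_string` and the file name `mongo_connection.txt` are hypothetical and chosen for this example; the notebook itself may do this differently.

```python
from pathlib import Path


def load_connection_string(path):
    """Return the first non-blank line of the file at `path`, stripped of whitespace.

    Hypothetical helper for illustration; not taken from the notebooks.
    """
    text = Path(path).expanduser().read_text(encoding="utf-8")
    for line in text.splitlines():
        line = line.strip()
        if line:
            return line
    raise ValueError(f"No connection string found in {path}")


# Assumed usage: the file lives in your home directory, not in the repository.
# conn_str = load_connection_string("~/mongo_connection.txt")
# client = pymongo.MongoClient(conn_str)  # requires the pymongo package
```

Because the file sits outside the repository folder, it cannot be committed by accident, and each collaborator can point the same notebook at their own server.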