tspence/data-analysis-class-resources

Data analytics - basics of Jupyter

This GitHub repository contains a collection of tips and tricks for working with Jupyter. It demonstrates some of the basic techniques analysts use for capturing and storing data in various databases. Here's a brief description of the tutorials available in this repository.

Tutorials

  • Basics of Jupyter - A notebook that demonstrates how to verify that Jupyter and Python have been installed correctly on your computer. Also demonstrates that many Jupyter implementations let you install Python packages directly from code blocks.
  • Basics of Loading Data - Demonstrates how to read files from disk and examine them with Pandas. Includes examples for connecting to a MongoDB server, storing your connection string in a file outside of your GitHub repository, and some techniques for querying data in MongoDB.
  • Basics of Transformation - A variety of ways to transform data within a Pandas DataFrame.
  • Basics of Plotting - Some basic ways to draw common chart types with matplotlib and Pandas. This is a very basic tutorial; there are many advanced graphing lessons online that build on these ideas!
  • Basics of Storing Data - How to store information into MongoDB using single inserts, multiple inserts, and batches.
  • Basics of Web Scraping - An example of how to fetch a web page using urllib3, how to fetch multiple pages in a range, and how to search within that page using regular expressions. Also includes some examples of how to format output and how to display thumbnails within a Jupyter notebook using Pandas dataframes.
  • Web Scraping with Selenium - An example of how to fetch a web page and retrieve elements in it using Selenium, which can be used to scrape data from webpages that require the use of JavaScript to load properly.
  • Basics of Google Colab - An example of how to use MongoDB within Google Colab, by detecting your Colab server's IP address.
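
As an illustration of the "connection string in a file outside of your repository" idea from the Loading Data tutorial, here is a minimal sketch. It is not taken from the notebooks: the helper name `load_connection_string` and the file layout (one connection string per file, with `#` comment lines allowed) are assumptions made for this example.

```python
from pathlib import Path


def load_connection_string(path: Path) -> str:
    """Return the first non-empty, non-comment line of a credentials file.

    Hypothetical helper: the file is assumed to live outside the Git
    repository so the secret is never committed.
    """
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines and lines used as comments
        if line and not line.startswith("#"):
            return line
    raise ValueError(f"No connection string found in {path}")
```

A notebook could then call something like `load_connection_string(Path.home() / "mongo_connection.txt")` (a hypothetical path) and pass the result to `pymongo.MongoClient`, assuming pymongo is installed.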

Example Applications

  • IMDB Scraper - An example program that fetches data from IMDB and stores it.
  • IMDB Mongo Scraper - The same program, but this time the data is stored in a MongoDB database. Includes extra logic to make the scraper restartable.
  • Language Processing - An example tutorial demonstrating sentiment analysis.
  • Topic Classification - A small example that demonstrates usage of a DistilRoBERTa-based model, a neural network that classifies text by emotion.
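
The "restartable" logic mentioned for the IMDB Mongo Scraper can be sketched roughly as follows. This is an assumption about the general technique, not the repository's actual code: a plain dictionary stands in for the MongoDB collection, and `scrape_restartable` and `fetch` are hypothetical names.

```python
from typing import Callable, Dict, Iterable


def scrape_restartable(
    ids: Iterable[int],
    fetch: Callable[[int], dict],
    store: Dict[int, dict],
) -> int:
    """Fetch each id that is not already in the store; return how many were fetched.

    `store` stands in for a database collection keyed by id. Because
    finished ids are skipped, rerunning after a crash resumes the work
    instead of starting over.
    """
    fetched = 0
    for item_id in ids:
        if item_id in store:
            continue  # already saved by an earlier run
        store[item_id] = fetch(item_id)
        fetched += 1
    return fetched
```

Against a real database, the membership check would become a lookup by key and the assignment an insert; the skip-if-present structure stays the same.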

About

Demonstration for my DA 320 class
