
Repo used as a base of operations for a machine learning study group


eugen/mlstudy


Welcome

Syllabus

1: Intro

Getting to know each other, discussing the plan, the guidelines for sharing the assignments, etc.

Going through a small demo/tutorial for Jupyter Notebooks. See [demo notebook here](1. Intro to Jupyter Notebooks - Python.ipynb).

2: Linear Regression

Create simple prediction models from small datasets. See [demo notebook here](2. Linear Regression - Python.ipynb)

Recommended datasets:

  • House prices: Predict the price of a house based on surface, lot size, #bathrooms, #bedrooms, etc.
  • Titanic survivability: Predict the likelihood of someone surviving the sinking of the Titanic based on their gender, age, passenger class and some other variables.
  • Video Game sales with ratings: Predict how well a game will sell based on the critic rating, user rating, publisher and genre.
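
As a rough illustration of what a "simple prediction model" means here, the sketch below fits a straight line to a handful of points with ordinary least squares. It is not taken from the demo notebook: the data is made up, the names `fit_line` and `predict` are hypothetical, and it deliberately uses only the standard library and a single feature (surface) so it runs anywhere.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalised,
    # since the 1/n factors cancel in the ratio).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Made-up toy data: surface in square metres, price in thousands.
surfaces = [50, 70, 90, 110]
prices = [150, 200, 250, 300]

model = fit_line(surfaces, prices)
print(predict(model, 100))  # estimated price for a 100 m^2 house
```

With real datasets such as the ones listed above there are several features, so in practice a library implementation of multiple linear regression is the more likely choice.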

3: Binary Classification

Go over binary classification problems and some algorithms for solving them, e.g. logistic regression. See [demo notebook here](3. Binary Classification - Python.ipynb)
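
To make the idea of logistic regression concrete, here is a minimal sketch that trains a one-feature classifier with batch gradient descent on the log loss. It is an assumption-laden toy, not the notebook's code: the data is invented, the names `fit_logistic` and `predict_proba` are hypothetical, and the learning rate and epoch count were picked by hand for this tiny dataset.

```python
import math


def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, labels, lr=0.1, epochs=2000):
    """Fit weight w and bias b for one feature by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            # Gradient of the log loss with respect to the linear score.
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    """Probability that x belongs to class 1 under the fitted model."""
    w, b = model
    return sigmoid(w * x + b)


# Made-up toy data: hours studied -> passed (1) or failed (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

model = fit_logistic(hours, passed)
print(predict_proba(model, 1.0))  # should be below 0.5
print(predict_proba(model, 4.0))  # should be above 0.5
```

A prediction is turned into a class by thresholding the probability, typically at 0.5.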

Recommended datasets:

4: Clustering

Solve some simple clustering problems with K-nearest neighbors/K-means. See [demo notebook here](4. Clustering - Python.ipynb)
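
The K-means loop can be sketched in a few lines: assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat. The version below is only a sketch under simplifying assumptions: the points are made up, `nearest` and `kmeans` are hypothetical helper names, and the initial centroids are simply the first `k` points (real implementations usually pick them randomly or with k-means++).

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))


def kmeans(points, k, iterations=10):
    """Plain K-means; returns (centroids, labels)."""
    # Simplistic initialisation: use the first k points as starting centroids.
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Made-up 2-D points forming two obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]

centroids, labels = kmeans(points, k=2)
print(labels)
```

Note that K-means is unsupervised, whereas K-nearest neighbors needs labelled examples and is normally used for classification; the distance computation in `nearest` is the part the two share.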

5: Recommendations

Create a model for product recommendations with collaborative filtering. See [demo notebook here](5. Collaborative Filtering - Python.ipynb)
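
One simple flavour of collaborative filtering is user-based: predict how a user would rate an item from the ratings of similar users. The sketch below assumes cosine similarity over co-rated items and a similarity-weighted average; the ratings are invented, the names `cosine_similarity` and `predict_rating` are hypothetical, and the demo notebook may well use a different approach.

```python
import math

# Made-up ratings on a 1-5 scale: user -> {item: rating}.
ratings = {
    "ana": {"book": 5, "lamp": 3, "mug": 4},
    "bogdan": {"book": 5, "lamp": 3, "mug": 4, "pen": 5},
    "carla": {"book": 1, "lamp": 5, "pen": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity between two users over the items both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict_rating(user, item, ratings):
    """Similarity-weighted average of other users' ratings for item."""
    weighted_sum = total_similarity = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += similarity * other_ratings[item]
        total_similarity += similarity
    if total_similarity == 0:
        return None  # nobody comparable has rated this item
    return weighted_sum / total_similarity


# ana has not rated "pen"; bogdan's tastes match hers more closely than carla's.
estimate = predict_rating("ana", "pen", ratings)
print(estimate)
```

Because bogdan's ratings agree with ana's on every shared item, his rating for the pen carries more weight than carla's, so the estimate lands closer to his score.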

Datasets

There's no machine learning without something to learn. This section contains a list of places where you can find datasets useful for an ML study group / course / training.

Theory

There are many sources that cover the theory of machine learning.

Full courses

Books

Cheatsheets

Diagrams that assist you in choosing the correct model to train:

Note: these only hint at the correct algorithm to use for a particular situation, and they remain useful regardless of the platform one uses.

Libraries

Integrated offline environments

  • Anaconda: A simple way to install Python, Jupyter Notebooks and all required libraries for data science & machine learning for offline use. It should also work for languages other than Python (R, Ruby, Scala, Java, JS), but this is untested. Feel free to add details here if you've tried it.
  • RStudio: Very nice IDE for R

Online Environments

  • Kaggle: Online hosting of Jupyter Notebooks. Supports Python (2?) and R.
  • Azure Notebooks: Online hosting of Jupyter Notebooks. Supports Python 2 & 3, R and F#.
  • Anaconda Cloud: Packages must be developed offline, but can then be uploaded to Anaconda Cloud and shared with everyone.

Python

Java/.NET/R/Lua/Others

To anyone interested in using any of these: Feel free to add dedicated sections.

Other Tools

  • Gist: Preferred way of sharing code snippets.

  • Jupyter Notebook viewer: Allows viewing of Jupyter notebooks from any URL, GitHub repo or gist.

Related Subjects

Statistics

Highly recommended course available for free on Coursera: Basic Statistics, by University of Amsterdam

Statistics cheatsheets:
