This workbook is based on the published book Python Machine Learning by Sebastian Raschka, which demonstrates and clearly explains the methods of machine learning (ML) and data analysis. The book does not just provide application code; it also explains the theory behind the ML models clearly. Beyond following the book, I provide many extra tutorials (exercises) on optimization, debugging and testing for each case, inspired by my experience in physics analysis. Where the theory in the book is not easy to follow, I add clear derivations and examples as needed. This workbook is therefore not simply a demonstration of the book's code; it offers advanced exercises and clear insight into the models, since I believe "Technology comes, technology goes, but insight is forever".
This workbook introduces several machine learning models, covering classification and regression cases with both supervised and unsupervised learners. Simple examples of data structures for analysis, using common tools available in Python, are also provided. Finally, natural language processing and neural networks are briefly introduced and demonstrated.
The programming language is Python, and the machine learning models use scikit-learn (v0.20), pandas and NumPy. Performance evaluation and visualization use matplotlib and Jupyter Notebook with IPython, which demonstrate the analysis process starting from the data. SciPy is used occasionally.
The main content and material refer to the book Python Machine Learning by Sebastian Raschka. Several detailed theories and mathematical methods are inspired by the book Pattern Recognition and Machine Learning by Christopher M. Bishop and by online courses; details are listed in resources.md.
(Pictures credited to link-1 and link-2)
⚠️ If the example code (*.ipynb) can't be loaded, please copy its GitHub URL and paste it into nbviewer ⚠️
Gives the major concepts and history of machine learning algorithms. It starts from supervised learning: the perceptron learning algorithm (PLA), the gradient descent algorithm and the stochastic gradient descent algorithm, each implemented by building our own class.
- Example 1 - Perceptron Learning Algorithm, PLA
- Example 2 - PLA with shuffled data
- Example 3 - Ensembling PLA hypotheses
- Example 4 - Adaptive Linear Neuron Gradient Descent
- Example 5 - Stochastic Gradient Descent
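To give a flavour of what "building our own class" means for the perceptron, here is a minimal sketch using only NumPy. It is not the code from the notebooks: the class layout, the parameter names (`eta`, `n_iter`, `seed`) and the toy two-blob data are assumptions made for illustration.

```python
import numpy as np


class Perceptron:
    """Rosenblatt perceptron for labels in {-1, +1} (illustrative sketch)."""

    def __init__(self, eta=0.1, n_iter=20, seed=1):
        self.eta = eta        # learning rate
        self.n_iter = n_iter  # maximum number of passes over the training set
        self.seed = seed      # seed for the per-epoch shuffle

    def fit(self, X, y):
        rng = np.random.RandomState(self.seed)
        self.w_ = np.zeros(X.shape[1])
        self.b_ = 0.0
        self.errors_ = []  # number of weight updates in each epoch
        for _ in range(self.n_iter):
            errors = 0
            # Visit the samples in a fresh random order every epoch.
            for i in rng.permutation(len(y)):
                update = self.eta * (y[i] - self.predict(X[i]))
                if update != 0.0:  # only misclassified samples change the weights
                    self.w_ += update * X[i]
                    self.b_ += update
                    errors += 1
            self.errors_.append(errors)
            if errors == 0:  # converged: every sample is classified correctly
                break
        return self

    def predict(self, X):
        return np.where(np.dot(X, self.w_) + self.b_ >= 0.0, 1, -1)


# Hypothetical toy data: two well-separated Gaussian blobs.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(-2.0, 0.5, size=(20, 2)),
               rng.normal(2.0, 0.5, size=(20, 2))])
y = np.array([-1] * 20 + [1] * 20)

model = Perceptron().fit(X, y)
accuracy = np.mean(model.predict(X) == y)
print("updates per epoch:", model.errors_)
print("training accuracy:", accuracy)
```

Because the toy data are linearly separable, the update count drops to zero and the loop stops early. On data that are not separable, the perceptron would keep updating until `n_iter` is reached, which is one motivation for the gradient descent variants in Examples 4 and 5.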
Introduces several further learning algorithms that are currently popular, focusing on classification with supervised learning using scikit-learn tools, e.g. PLA, logistic regression, support vector machine (SVM), decision tree, random forest and k-nearest neighbors (KNN).
- Example 1 - PLA by scikit-learn
- Example 2 - Logistic Regression
- Example 3 - Support Vector Machine, SVM
- Example 4 - Tree algorithms
- Example 5 - k-Nearest Neighbor, KNN
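The classifiers listed above share the same `fit` / `score` interface in scikit-learn, so they can be compared in a single loop. The sketch below is only an illustration: the Iris dataset, the 70/30 split and all hyperparameter values are my assumptions here, not necessarily the settings used in the notebooks.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# Fit the scaler on the training set only, then apply it to both sets.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Hyperparameters are illustrative choices, not tuned values.
models = {
    "PLA": Perceptron(max_iter=1000, tol=1e-3, random_state=1),
    "Logistic regression": LogisticRegression(C=100.0, random_state=1),
    "SVM (RBF)": SVC(kernel="rbf", C=1.0, gamma=0.1),
    "Decision tree": DecisionTreeClassifier(max_depth=3, random_state=1),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=1),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, clf in models.items():
    clf.fit(X_train_std, y_train)
    scores[name] = clf.score(X_test_std, y_test)  # accuracy on the held-out set
    print("%-20s test accuracy = %.3f" % (name, scores[name]))
```

Tree-based models do not need standardized features, but feeding every model the same scaled input keeps the comparison simple.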
Gives basic examples of data preprocessing and introduces regularization in machine learning, which deals with the overfitting problem. Two regularization methods, L1 and L2, are clearly compared in this chapter. Besides regularization, two feature selection algorithms are introduced: sequential backward selection (SBS) and random forest.
- Example 1 - Data Pre-processing
- Example 2 - Regularization methods
- Example 3 - Feature selection
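The practical difference between L1 and L2 is that L1 tends to drive some weights exactly to zero, while L2 only shrinks them. The sketch below shows this on a small least-squares problem using NumPy only. The regression setting, the synthetic data, the penalty strength `lam` and the use of proximal gradient descent (ISTA) for the L1 case are all assumptions chosen to keep the example short; the notebooks may compare the penalties differently.

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_features = 200, 5
X = rng.normal(size=(n_samples, n_features))
true_w = np.array([3.0, -2.0, 0.0, 0.0, 0.0])  # only two informative features
y = X @ true_w + rng.normal(scale=0.1, size=n_samples)

lam = 0.5  # penalty strength (hypothetical value)

# L2 (ridge): minimize (1/2n)||Xw - y||^2 + (lam/2)||w||^2, solved in closed form.
A = X.T @ X / n_samples
b = X.T @ y / n_samples
w_l2 = np.linalg.solve(A + lam * np.eye(n_features), b)


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink towards zero, clip at zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


# L1 (lasso): minimize (1/2n)||Xw - y||^2 + lam * ||w||_1 with ISTA.
step = 1.0 / np.linalg.eigvalsh(A).max()  # 1 / Lipschitz constant of the gradient
w_l1 = np.zeros(n_features)
for _ in range(1000):
    grad = X.T @ (X @ w_l1 - y) / n_samples
    w_l1 = soft_threshold(w_l1 - step * grad, step * lam)

print("true weights:", true_w)
print("L2 weights  :", np.round(w_l2, 3))
print("L1 weights  :", np.round(w_l1, 3))
```

The L2 solution keeps small non-zero weights on the three uninformative features, whereas the L1 solution sets them to exactly zero, which is why L1 can double as a feature selection method.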
Covers the important topic of feature extraction and dimensionality reduction. Several common methods are shown here, e.g. principal component analysis (PCA), Fisher's linear discriminant analysis (LDA) and kernel PCA. I also introduce the fundamental theory of kernel methods.
- Example 1 - Principal Component Analysis, PCA
- Example 2 - Fisher's Linear Discriminant Analysis, LDA
- Example 3 - Kernel PCA
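The core of PCA fits in a few lines: center the data, diagonalize the covariance matrix, and project onto the leading eigenvectors. Below is a minimal NumPy sketch of those steps. The function name `pca` and the synthetic three-feature data are assumptions for illustration only.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its leading principal components (illustrative sketch)."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # re-sort by descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained_ratio = eigvals / eigvals.sum()
    W = eigvecs[:, :n_components]  # projection matrix
    return X_centered @ W, explained_ratio


# Hypothetical data: three features driven by one latent variable plus noise.
rng = np.random.RandomState(0)
latent = rng.normal(size=(200, 1))
X = np.hstack([latent, 2.0 * latent, -latent]) + 0.1 * rng.normal(size=(200, 3))

X_pca, explained_ratio = pca(X, n_components=2)
print("explained variance ratio:", np.round(explained_ratio, 4))
print("projected shape:", X_pca.shape)
```

Since the three features are almost perfectly correlated, nearly all of the variance ends up in the first component. Kernel PCA follows the same idea but diagonalizes a centered kernel matrix instead of the covariance matrix.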
Model validation and parameter optimization strongly affect the results of learning. Validation can detect and help avoid underfitting or overfitting during fitting, before the model is applied to test data or incoming new data. Optimizing the hyperparameters fine-tunes the model to fit better, with the help of validation. The methods introduced here include examples of Pipeline, k-fold cross-validation, nested cross-validation, etc.
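A compact way to see how these pieces fit together is nested cross-validation: an inner grid search tunes the hyperparameters of a pipeline, and an outer k-fold loop estimates how well the tuned model generalizes. The sketch below assumes the breast cancer dataset bundled with scikit-learn, an RBF-kernel SVM and a small hand-picked parameter grid; none of these choices is taken from the notebooks.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# The pipeline refits the scaler inside every training fold, so no
# information from the validation fold leaks into preprocessing.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# make_pipeline names each step after its lowercased class, hence "svc__".
param_grid = {
    "svc__C": [0.1, 1.0, 10.0],
    "svc__gamma": [0.001, 0.01, 0.1],
}

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Inner loop: choose hyperparameters. Outer loop: score the whole procedure.
search = GridSearchCV(pipe, param_grid, scoring="accuracy", cv=inner_cv)
nested_scores = cross_val_score(search, X, y, scoring="accuracy", cv=outer_cv)

print("nested CV accuracy: %.3f +/- %.3f"
      % (nested_scores.mean(), nested_scores.std()))
```

The outer score is usually a little lower than the best inner score reported by the grid search, because the outer folds were never used to pick the hyperparameters.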
These methods combine several models into a meta-model, which can be used in both classification and regression cases. The concept aims to reduce the bias that comes from depending on a single model, and it can also be viewed as an alternative form of regularization. The aggregation models introduced in this chapter are majority voting aggregation, bootstrap aggregation and adaptive boosting.
- Example 1 - Majority Voting Aggregation
- Example 2 - Bootstrap Aggregation
- Example 3 - Adaptive boosting
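The voting step itself is simple enough to write by hand. The sketch below takes the class labels predicted by several classifiers and returns the (optionally weighted) majority label for each sample. The function name `majority_vote` and the hard-coded predictions are hypothetical; in practice the rows would come from fitted classifiers.

```python
import numpy as np


def majority_vote(predictions, weights=None, n_classes=None):
    """Combine integer class labels of shape (n_classifiers, n_samples).

    Each classifier casts one vote per sample, optionally scaled by its
    weight. Ties go to the smallest class label (np.argmax behaviour).
    """
    predictions = np.asarray(predictions)
    if n_classes is None:
        n_classes = int(predictions.max()) + 1
    return np.apply_along_axis(
        lambda votes: np.argmax(
            np.bincount(votes, weights=weights, minlength=n_classes)),
        axis=0,
        arr=predictions,
    )


# Hypothetical predictions of three classifiers on three samples.
predictions = [[0, 1, 1],
               [0, 0, 1],
               [1, 0, 1]]

unweighted = majority_vote(predictions)
weighted = majority_vote(predictions, weights=[0.2, 0.2, 0.6])
print("unweighted vote:", unweighted)  # -> [0 0 1]
print("weighted vote  :", weighted)    # -> [1 0 1]
```

In the weighted case the third classifier carries 60% of the vote, so it overrules the other two on the first sample. Bagging and AdaBoost differ mainly in how the individual classifiers are trained and how these weights are chosen.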
- Example 1 - Basic Techniques of Natural Language Processing
- Example 2 - Sentiment training with IMDb data, grid searching
- Example 3 - Sentiment training with IMDb data, out-of-core learning
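Out-of-core learning means the model is updated on small batches so that the full dataset never has to sit in memory. One common way to do this in scikit-learn is to pair a stateless `HashingVectorizer` with an estimator that supports `partial_fit`. The sketch below uses a handful of made-up sentences in place of the IMDb reviews; the documents, labels, batch size and helper `stream_batches` are all hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Made-up stand-in for a large review file (1 = positive, 0 = negative).
docs = [
    "a wonderful and moving film",
    "terrible plot and awful acting",
    "great cast, great story",
    "boring, awful and far too long",
    "wonderful soundtrack and great pacing",
    "a terrible, boring mess",
]
labels = [1, 0, 1, 0, 1, 0]


def stream_batches(docs, labels, batch_size):
    """Yield consecutive mini-batches, mimicking reading a file in chunks."""
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size], labels[start:start + batch_size]


# HashingVectorizer needs no fit step, so it can transform each batch on the fly.
vectorizer = HashingVectorizer(n_features=2 ** 12)
clf = SGDClassifier(random_state=1)
classes = np.array([0, 1])  # partial_fit must be told every class up front

for batch_docs, batch_labels in stream_batches(docs, labels, batch_size=2):
    X_batch = vectorizer.transform(batch_docs)
    clf.partial_fit(X_batch, batch_labels, classes=classes)

new_docs = ["a wonderful film", "an awful film"]
predictions = clf.predict(vectorizer.transform(new_docs))
print(predictions)
```

With only six training sentences the predictions are not meaningful; the point is the pattern of transforming and fitting one batch at a time.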
Focuses on the regression case in supervised machine learning and gives an example of exploratory data analysis (EDA) for analyzing features. Linear regression is covered with and without regularization, along with nonlinear regression using polynomial models.
- Example 1 - Exploratory Data Analysis, EDA
- Example 2 - Linear Gradient Descent Regression
- Example 3 - Linear Regression with scikit-learn
- Example 4 - Regularization in Regression models
- Example 5 - Nonlinear regression with Polynomial models
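To illustrate why a polynomial model helps when the relation is curved, the sketch below fits a straight line and a quadratic to the same synthetic data and compares their R² scores. It uses `np.polyfit` instead of scikit-learn to stay short, and the data-generating function is an assumption made up for this example.

```python
import numpy as np

# Hypothetical data: a quadratic trend plus Gaussian noise.
rng = np.random.RandomState(0)
x = np.linspace(-3.0, 3.0, 100)
y = 0.5 * x ** 2 - x + 2.0 + rng.normal(scale=0.3, size=x.size)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


r2 = {}
for degree in (1, 2):
    coeffs = np.polyfit(x, y, deg=degree)  # least-squares polynomial fit
    r2[degree] = r2_score(y, np.polyval(coeffs, x))
    print("degree %d: R^2 = %.3f" % (degree, r2[degree]))
```

These scores are computed on the training data, where a higher degree can never fit worse. Choosing the degree for real data should rely on a validation set or cross-validation to avoid overfitting.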
Scripts
py2ipy.py : converts a .py file to .ipynb.
python py2ipy.py --inpy file.py
reWord.csh : changes a word in a file's content.
./reWord.csh [path] [text1] [text2]

