This is the GitHub repo for the course Statistical Machine Learning offered at Columbia University. I set this up as my personal blog for future students rather than as just another site for homework answers. Please feel free to email me if you have questions.

ss6025/Statistical_Machine_Learning

Statistical Machine Learning

This is the GitHub repo for the course Statistical Machine Learning offered at Columbia University. I set this up for future students and hope it is helpful. Please feel free to email me if you have questions.

Documents

The repository contains the following documents:

  • Homework: a few files I collected from past homework. I would love for people not to focus on this part, because simply doing the homework is not enough.
  • Notes: lecture notes that I took and am glad to share. That said, I believe successful students tend not to rely on online documents and are capable of preparing their own notes. Mine are up only for sharing and inspiration. It is strongly recommended that you use your own notes.
  • Lecture: slides I collected on machine learning.
  • Exercise: a collection of packages of machine learning scripts. Each script has (1) the definition of a function, (2) toy data, and (3) a run of the function on that data. I wrote all scripts in the same format so that I can fit them into a larger picture later on. It is strongly recommended that you come up with your own format for executing these functions, but I hope mine can be a good inspiration.
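As a rough illustration of the three-part script format described for the Exercise folder, here is a minimal sketch in Python. It is not taken from the repository: the function name `fit_simple_linear_regression` and the toy data are hypothetical, chosen only to show the layout of definition, toy data, and run.

```python
# (1) Definition of function
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and variance, up to a shared 1/n factor that cancels.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# (2) Toy data: hypothetical points lying exactly on y = 1 + 2x
toy_x = [0.0, 1.0, 2.0, 3.0, 4.0]
toy_y = [1.0, 3.0, 5.0, 7.0, 9.0]

# (3) Running the function
intercept, slope = fit_simple_linear_regression(toy_x, toy_y)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

Keeping every script in the same three sections makes it easy to swap in a different function or data set without touching the rest of the file.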

Advice

I do not believe there is a single book or problem set that will magically make anyone an expert in machine learning. That said, there are a few directions that can put you on the right track. Beyond that, your diligence largely determines how far you can push yourself in this field.

Stage I

(1) Read as many books as you can and try to replicate the machine learning techniques in them. This is the early stage of getting familiar with machine learning tools, and you should feel comfortable getting your hands dirty.

Some great books are:

  • An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani
  • An Introduction to Support Vector Machines and Other Kernel-based Learning Methods by Nello Cristianini and John Shawe-Taylor
  • The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
  • Machine Learning: A Probabilistic Perspective by Kevin Murphy

Programming Languages:

  • You should be fluent in both Python and R.
  • You should attempt to replicate the same results in C++.

A good exercise is to open an online code compiler and do some matrix algebra in several languages side by side.
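For instance, a small matrix algebra exercise of this kind might look like the sketch below, written in plain Python so that it is easy to port line by line to R or C++. The helper names `transpose` and `matmul` and the matrices `A` and `B` are my own choices for illustration. The exercise checks the identity (AB)^T = B^T A^T on one small example.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    b_cols = transpose(b)
    # Entry (i, j) is the dot product of row i of a with column j of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

AB = matmul(A, B)
# Check the identity (AB)^T = B^T A^T on this example.
identity_holds = transpose(AB) == matmul(transpose(B), transpose(A))

print(AB)              # [[2, 1], [4, 3]]
print(identity_holds)  # True
```

Rewriting the same two helpers in another language is a quick way to see how each one handles indexing, loops, and array storage.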

Stage II

(2) At the intermediate level, you should already be fluent in Step (1). To move beyond it, go to GitHub or Kaggle, search for new data sets (ones you have never touched before), and try to replicate Step (1) on them.

Review this wiki site and search for a new data set. Once you find something interesting, you can go to GitHub or Kaggle.

A new data set is like a new person you may want to be friends with. You treat it well and learn from it, and you gain experience. Data sets are not limited to any one form: they can be (1) big or small, (2) supervised or unsupervised, (3) time series, (4) images, and so on. You need to be able to tell a great story with results from several different machine learning techniques, given any data set.
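One simple way to start such a story is to run two techniques on the same held-out data and compare their errors. The sketch below is only an illustration under my own assumptions: the toy data (points on y = x^2), the two methods (a mean baseline and a 1-nearest-neighbor predictor), and all function names are hypothetical and not from the course material.

```python
def mean_baseline(train_x, train_y, test_x):
    """Predict the training mean for every test point."""
    mean_y = sum(train_y) / len(train_y)
    return [mean_y for _ in test_x]


def one_nearest_neighbor(train_x, train_y, test_x):
    """Predict the target of the closest training point (1-D inputs)."""
    preds = []
    for x in test_x:
        nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
        preds.append(train_y[nearest])
    return preds


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: y = x^2, with a few held-out test points.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [x ** 2 for x in train_x]
test_x = [0.4, 2.6, 4.4]
test_y = [x ** 2 for x in test_x]

methods = {
    "mean baseline": mean_baseline,
    "1-nearest neighbor": one_nearest_neighbor,
}
results = {
    name: mean_squared_error(test_y, fit_predict(train_x, train_y, test_x))
    for name, fit_predict in methods.items()
}

for name, mse in results.items():
    print(f"{name}: test MSE = {mse:.3f}")
```

On real data, the interesting part is explaining why one method beats another, not just reporting which number is smaller.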

Stage III

(3) At an advanced or research level, you are fluent in Steps (1) and (2). In fact, you might be too fluent to find them interesting. Moreover, you have looked at so many data sets that there isn't a single form of data you have not seen before. You start to think about how you can contribute to society and what can be improved. You start asking questions such as "why do apples fall?" If you are here, you are an advanced machine learning practitioner. You can challenge any author or textbook. You can design and even invent profitable machine learning products, and perhaps go out and look for investors to finance your idea and start your own company.
