(CSCI 4830-004) Syllabus, Fall 2016
Alvin Grissom II
Machine learning has recently seen a surge in popularity. All of the largest technology companies have made machine learning a part of their business, but so have thousands of others, from financial institutions to retailers to educational services to government agencies. We have self-driving cars, movie and music recommendations, and phones that talk to us. Indeed, machine learning has become so commonplace that we simply expect our phones to understand our spoken language and we are mildly annoyed when our faces aren't automatically tagged in our photos. We can predict epidemics before they occur, find viruses by scanning binaries, and even predict mental illness years in advance. This all seems like magic to many people. By the end of this course, it will no longer seem like magic.
This course will provide an introduction to the theory and practice of machine learning. Students will be introduced to the fundamental mathematics and best practices in both supervised and unsupervised machine learning approaches, using real-world data. Students will learn to use state-of-the-art tools as well as create their own implementations to gain a deeper understanding of the algorithms.
This course has the following objectives:
- Students will have at their disposal a wide array of analytical tools and the knowledge of when and how to use them.
- Students will learn to design and conduct their own machine learning experiments using state-of-the-art software.
- Students will gain some cognizance of the ethical implications of work in machine learning.
- Students will develop skills necessary for communicating their ideas and findings clearly, truthfully, and convincingly.
- Students will grapple with some of the ethical issues that have arisen as machine learning has advanced.
Prerequisites:
Students should be comfortable with Python or Java.
Students should be willing and able to learn enough Python to complete the assignments that require it.
Machine learning is intensely mathematical, so this course assumes a reasonable level of mathematical maturity. We will make extensive use of probability and of concepts from linear algebra and calculus, and we will review the requisite material in the course. Basic knowledge of data structures is also assumed.
Most assignments may be completed in Python 3 or Java. Some assignments will require Python.
This course assumes access to a Unix (OS X, Linux, etc.) environment. It may be possible to complete the course using Cygwin, but it is not recommended.
We will not cover all topics at the same depth. Machine learning is a gargantuan field, and no course can come close to covering it all. I can be persuaded to add topics (and spend more or less time on others) if students vote to do so.
- Probability and Vector Spaces
- Bayesian Approaches
- Information Theory
- Graphical Models
- Gradient Descent
- Linear Models
- Feature Engineering
- Support Vector Machines
- Kernel Methods
- Boosting and Ensemble Methods
- Online Algorithms
- Expectation Maximization
- Regression
- Clustering
- Evolutionary Algorithms
- Decision Trees
- Learnability
- Multiclass Classification
- Reinforcement Learning
- Neural Networks
Classes vary between lectures, in-class exercises, and discussions. Assignments vary between short homework assignments, longer programming assignments, practical machine learning experiments, and, occasionally, essays. Students may be expected to watch a lecture prior to class.
Students must have regular access to a computer and an Internet connection throughout this course. A laptop is preferable. If you have one, please bring it to class, especially for the in-class exercises.
Every week, there will be homework assignments -- sometimes short programs, sometimes written assignments, and sometimes small scientific experiments -- designed to mentally solidify concepts from the course. In addition, every week there will be in-class surveys regarding the material. See "Class Participation" below.
There will also be larger programming assignments, which may require significantly more work, including a report written in LaTeX in the format of a short computer science paper. Planned assignments are:
- Feature Engineering
- Language Models
- Logistic Regression
- Multiclass Classification
- k-Nearest Neighbors
This course will have one in-class midterm that covers concepts from the course up to that point. One page (front-and-back) of written or typed notes may be used for the midterm.
Students will write short reviews of machine learning papers (approximately 1/2 page). In addition, each student will write a short essay that addresses an ethical issue in machine learning.
The final exam in this course consists of a written report and presentation of a course-long project, wherein students design, implement, and analyze a machine learning experiment from scratch, using publicly available data. Progress reports will occur throughout the semester to ensure that students are making adequate progress. The presentations will occur on the final exam date. The papers will be due immediately prior to the presentation.
- Students are expected to come to class having read the material assigned. Students are also encouraged to ask questions of the instructor and of other students on the class message board.
- If a class has an assigned video, students are expected to approach it as a class reading, watching it prior to class and coming prepared with questions.
- Students are encouraged to post interesting articles relevant to machine learning on the message board.
- Students are also expected to complete anonymous (to the instructor) weekly surveys regarding their progress in the class and relative interest in topics. Not completing these will negatively affect the student's participation grade.
- The report for the course project must be submitted in LaTeX (with a PDF), using the ICML template.
- Programming assignment reports must be submitted in LaTeX (with a PDF) using the ICML template.
- Regular homework assignments should be submitted as a single Markdown file or as a zipped LaTeX file (with a PDF).
Bayesian Reasoning and Machine Learning by David Barber.
Available for free online at: http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=Brml.HomePage
This book will be supplemented with online material throughout the course.
Components of the final grade are as follows:
- Weekly Homework: 20%
- Programming Assignments: 20%
- Midterm: 20%
- Final: 25%
- Writing Assignments: 5%
- Participation: 10%
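As a rough illustration of how these weights combine, the following Python sketch computes a weighted course grade. The function name `final_grade`, the 0-100 scoring scale, and the example scores are assumptions made for illustration; they are not part of the course's actual grading procedure, and extra credit is not modeled.

```python
# Weights copied from the grade components listed above (percent of final grade).
WEIGHTS = {
    "Weekly Homework": 20,
    "Programming Assignments": 20,
    "Midterm": 20,
    "Final": 25,
    "Writing Assignments": 5,
    "Participation": 10,
}


def final_grade(scores):
    """Return the weighted course grade on a 0-100 scale.

    `scores` maps each component name to a score between 0 and 100.
    (Hypothetical helper; the 0-100 scale is an assumption.)
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Each component contributes its score scaled by its percentage weight.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example_scores = {
    "Weekly Homework": 90,
    "Programming Assignments": 85,
    "Midterm": 80,
    "Final": 88,
    "Writing Assignments": 95,
    "Participation": 100,
}

if __name__ == "__main__":
    print(final_grade(example_scores))  # prints 87.75
```

Because the weights sum to 100, a student who scores 100 on every component receives 100 overall.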
It is possible to earn extra credit by going above and beyond the expectations of the assignment.
When asking the instructor for help debugging, students must provide the following:
- Information about the environment (OS version, compiler/interpreter version, etc.)
- Error output.
- Other conditions under which the error is or is not reproduced.
- Evidence that the student has attempted to solve the problem on their own.
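For Python assignments, one possible way to gather the environment information requested above is a short script such as the sketch below. The function name `environment_report` is hypothetical, and the script covers only the OS and interpreter details; error output, reproduction conditions, and a description of what you have already tried must still be supplied separately.

```python
import platform
import sys


def environment_report():
    """Return a short text summary of the OS and Python interpreter."""
    lines = [
        f"OS: {platform.system()} {platform.release()}",
        f"Platform: {platform.platform()}",
        f"Python: {platform.python_version()} ({sys.executable})",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Paste this output into your help request along with the full error output.
    print(environment_report())
```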
All students of the University of Colorado at Boulder are responsible for knowing and adhering to the academic integrity policy of this institution. Violations of this policy may include: cheating, plagiarism, aid of academic dishonesty, fabrication, lying, bribery, and threatening behavior. All incidents of academic misconduct shall be reported to the Honor Code Council (honor@colorado.edu; 303-735-2273). Students who are found to be in violation of the academic integrity policy will be subject to both academic sanctions from the faculty member and non-academic sanctions (including but not limited to university probation, suspension, or expulsion).
Other information on the Honor Code can be found at http://www.colorado.edu/policies/honor.html and at http://www.colorado.edu/academics/honorcode/.
If you qualify for accommodations because of a disability, please submit to your professor a letter from Disability Services in a timely manner (for exam accommodations provide your letter at least one week prior to the exam) so that your needs can be addressed. Disability Services determines accommodations based on documented disabilities. Contact Disability Services at 303-492-8671 or by e-mail at dsinfo@colorado.edu. If you have a temporary medical condition or injury, see Temporary Injuries guidelines under the Quick Links at the Disability Services website and discuss your needs with your professor. Please inform the professor of any accommodations needed relative to disabilities at the start of the semester.
Campus policy regarding religious observances requires that faculty make every effort to deal reasonably and fairly with all students who, because of religious obligations, have conflicts with scheduled exams, assignments or required attendance. In this class, inform the professor of conflicts at the start of the semester. See full details at http://www.colorado.edu/policies/fac_relig.html
Students and faculty each have responsibility for maintaining an appropriate learning environment. Those who fail to adhere to such behavioral standards may be subject to discipline. Professional courtesy and sensitivity are especially important with respect to individuals and topics dealing with differences of race, culture, religion, politics, sexual orientation, gender, gender variance, and nationalities. Class rosters are provided to the instructor with the student's legal name. I will gladly honor your request to address you by an alternate name or gender pronoun. Please advise me of this preference early in the semester so that I may make appropriate changes to my records. See policies at http://www.colorado.edu/policies/classbehavior.html and at http://www.colorado.edu/studentaffairs/judicialaffairs/code.html#student_code.
*The instructor reserves the right to alter this syllabus as the course progresses. Check the course webpage for the latest version and dates.