This course is designed to familiarize students with the most commonly used tools in data science, for a variety of applications in science and engineering. Although there are weekly lectures, students will learn primarily through weekly exercises that use Python, GitHub, and LaTeX, along with scientific Python libraries including NumPy, SciPy, and scikit-learn. We also give a brief overview of common data science terminology, such as optimization, sampling, and bootstrapping. By the end of the course, students should be able to skim online resources on data science and programming, as well as the documentation of the tools their specific application needs.
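To make the term bootstrapping concrete, here is a minimal sketch (not taken from the course materials) of a percentile bootstrap: the data are resampled with replacement many times, and the spread of the resampled means gives an approximate 95% confidence interval for the mean. It uses only the Python standard library; the helper name `bootstrap_ci`, the sample values, and the choice of 2,000 resamples are illustrative assumptions.

```python
import random
import statistics


def bootstrap_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Hypothetical helper: percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)  # seeded so repeated runs give the same interval
    n = len(data)
    # Draw n values with replacement, n_resamples times, keeping each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # The alpha/2 and 1 - alpha/2 empirical quantiles of the resampled means.
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Made-up measurements, for illustration only.
sample = [2.1, 2.5, 1.9, 3.2, 2.8, 2.4, 2.0, 3.0, 2.6, 2.2]
low, high = bootstrap_ci(sample)
print(f"mean = {statistics.fmean(sample):.2f}, 95% CI ~ ({low:.2f}, {high:.2f})")
```

With NumPy, the same idea is usually vectorized by drawing all resamples at once as a 2-D array and averaging along one axis.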
- Weekly lectures
  - 2 hours via Google Hangouts, scheduled at regular intervals
  - Interactive office hours included
  - Basic guided programming exercises included where necessary
  - Optional reading after each lecture
- Weekly problem sets/programming exercises
  - 4 hours
  - Submitted through GitHub
In-class and take-home exercises are posted as Jupyter notebooks or Python scripts. They are numbered by week and will all be available in this GitHub folder. Towards the end of the course, we will cover basic LaTeX typesetting and math mode; the choice of TeX editor is up to you.
- Python (2 weeks)
  - Week 1:
    - Setting up the coding environment and GitHub (Windows)
    - Interpreters vs. scripts
    - Review of variables, control flow, and variable scope
    - Functions
    - Using the Python documentation
    - Lists
    - Recursion
  - Week 2:
    - String manipulation
    - Advanced data structures: dictionaries, sets, tuples
    - Reading CSVs
    - Using NumPy for basic statistics
- NumPy, SciPy (1 week)
  - Week 1:
    - Arrays (including matrix multiplication and dot products) and broadcasting
    - Using the NumPy documentation
    - np.random
    - np.histogram
    - Image manipulation (as an application of arrays)
    - Using Matplotlib for scatter plots
- Matplotlib (2 weeks)
  - Week 1:
    - Basics (scatter, line, and box plots; plotting parameters)
  - Week 2:
    - Dual axes, subfigures, colorbars
- LaTeX (1 week)
  - Week 1:
    - Setting up the environment (Windows)
    - Review of calculus: derivatives
    - Gradients/multivariable calculus
    - Using LaTeX to write answers to problem sets
- Statistics (2 weeks)
  - Week 1:
    - Mean, variance, standard error
    - Hypotheses and the t-test
  - Week 2:
    - Confidence intervals
    - Bootstrapping
- Basics of machine learning/optimization (1+ weeks)
  - Week 1:
    - Linear regression
    - Loss functions and gradient descent
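The last module pairs linear regression with loss functions and gradient descent. As a minimal sketch under stated assumptions (not the course's own solution), the code below fits a line y ≈ w·x + b by repeatedly stepping against the gradient of the mean squared error. It is written in plain Python so it runs without extra packages; the helper name `fit_line_gd`, the learning rate of 0.01, the 5,000 steps, and the toy data are all illustrative choices.

```python
def fit_line_gd(xs, ys, lr=0.01, n_steps=5000):
    """Hypothetical helper: fit y ~ w * x + b by gradient descent on the MSE loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        # Residuals of the current model on every data point.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(err**2) with respect to w and b.
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Move both parameters a small step downhill.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data lying exactly on y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line_gd(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")  # prints: w = 2.000, b = 1.000
```

If the learning rate is too large, the updates overshoot and diverge instead of converging. For a least-squares line specifically, `np.polyfit(xs, ys, 1)` gives the closed-form answer and is a handy check on the gradient-descent result.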