Course Materials for Students
Welcome to Applied Statistics! This repository contains all the materials you need for this course.
The course is an introduction to applied statistics; materials are released progressively throughout the semester.
Currently available:
- Statistical modeling and exploratory data analysis
- Estimation methods (Maximum Likelihood, Method of Moments)
- Estimator properties (bias, variance, confidence intervals, bootstrap)
- Final project brief (Birds Biodiversity case study)
Coming soon:
- Hypothesis testing and applications
- Basic probability theory (random variables, distributions)
- Python programming fundamentals
- Linear algebra basics
Each lesson folder contains:
- material.md or PDF - Lesson content and theory
- exercises/ - Practice problems (when applicable)
- data/ - Datasets for the lesson
| Lesson | Topic | Materials |
|---|---|---|
| 00 | Welcome & Introduction | Introduction to the course |
| 01 | Statistical Modeling | Random variables, distributions, EDA |
| 02 | Statistical Learning | MLE, Method of Moments, Fisher information |
| 03 | Estimator Properties | Bias, variance, MSE, confidence intervals, bootstrap |
Note: Additional lessons (04-06) will be released throughout the semester.
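As a small taste of what Lessons 02 and 03 cover, the sketch below simulates Gaussian samples and compares the maximum-likelihood variance estimator (divide by n) with the unbiased one (divide by n - 1). It is an illustration only, not taken from the lesson material: the function name `compare_variance_estimators`, the sample size, the number of trials, and the seed are all assumptions chosen for the example.

```python
import numpy as np


def compare_variance_estimators(n=10, n_trials=20_000, sigma=2.0, seed=0):
    """Average the MLE and unbiased variance estimates over many simulated samples."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated sample of size n drawn from N(0, sigma^2).
    samples = rng.normal(loc=0.0, scale=sigma, size=(n_trials, n))
    mle = samples.var(axis=1, ddof=0)       # divides by n (Gaussian MLE)
    unbiased = samples.var(axis=1, ddof=1)  # divides by n - 1
    return mle.mean(), unbiased.mean()


if __name__ == "__main__":
    mle_mean, unbiased_mean = compare_variance_estimators()
    print(f"true variance:          {2.0 ** 2:.3f}")
    print(f"mean MLE estimate:      {mle_mean:.3f}")
    print(f"mean unbiased estimate: {unbiased_mean:.3f}")
```

With small samples the MLE average falls below the true variance by a factor of (n - 1)/n, which is the kind of bias Lesson 03 examines.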
Interactive Jupyter notebooks for hands-on practice. Completed labs include both the assignment and a worked solution; in-progress topics ship the assignment only.
- 01-random-variables/
  - assignment.ipynb
  - solution.ipynb
- 02-maximum-likelihood-estimation/
  - assignment.ipynb
  - solution.ipynb
- 03-inference-estimators/
  - assignment.ipynb
  - solution.ipynb
- 04-gaussian-confidence-intervals/
  - assignment.ipynb
  - dataset-recommendations.md
- 05-non-parametric-estimation/
  - assignment.ipynb
- 06-model-fitting/
  - assignment.ipynb
Labs 07+ (hypothesis testing, project integration) will be posted after they are covered in class.
- data/raw/Observations 2012-2025.xlsx - primary dataset for the capstone.
- final_project_assignment.pdf - detailed brief outlining objectives, deliverables, and submission rules.
- Use this folder as the starting point for your analysis; create your own notebooks/scripts to keep the workflow reproducible.
Sample datasets for exercises and projects:
- heights_weights_sample.csv - Anthropometric data
- ab_test_clicks.csv - A/B testing data
- manufacturing_defects.csv - Quality control data
- Additional datasets in lesson-specific folders
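A quick first look at any of these files might follow the sketch below. Because the column names of the course datasets are not documented here, the example runs on a small synthetic table: the helper `quick_summary` and the columns `height_cm`, `weight_kg`, and `group` are hypothetical stand-ins, not the actual schema of heights_weights_sample.csv.

```python
import numpy as np
import pandas as pd


def quick_summary(df):
    """Return per-column dtype, missing-value count, and mean (numeric columns only)."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
    })
    # Means are computed on numeric columns; other columns get NaN after index alignment.
    summary["mean"] = df.select_dtypes(include="number").mean()
    return summary


# Synthetic stand-in data (hypothetical columns, NOT the real course dataset).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "height_cm": rng.normal(170, 10, size=50),
    "weight_kg": rng.normal(70, 12, size=50),
    "group": ["a", "b"] * 25,
})
demo.loc[0, "weight_kg"] = np.nan  # inject one missing value for the demo

print(quick_summary(demo))
```

To try it on a course file, pass the result of `pd.read_csv(...)` for that file instead of `demo`.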
git clone https://github.com/stephane-rivaud/Applied-Statistics.git
cd Applied-Statistics
Create a virtual environment (recommended):
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Install required packages:
pip install -r requirements.txt
This installs:
- NumPy - Numerical computing
- Pandas - Data manipulation
- Matplotlib & Seaborn - Visualization
- SciPy - Statistical functions
- Scikit-learn - Machine learning
- Jupyter - Interactive notebooks
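To confirm the installation worked, a short check like the one below can be run inside the activated environment. It is a minimal sketch rather than part of the repository: the helper `check_packages` and the `COURSE_MODULES` list are assumptions made for this example, and the list uses import names, which differ from pip names for some packages.

```python
import importlib


def check_packages(names):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


# Import names, not pip names: scikit-learn is imported as sklearn.
COURSE_MODULES = ["numpy", "pandas", "matplotlib", "seaborn", "scipy", "sklearn"]

if __name__ == "__main__":
    for name, version in check_packages(COURSE_MODULES).items():
        print(f"{name:12s} {version if version is not None else 'MISSING'}")
```

Any package reported as MISSING suggests the virtual environment is not active or the `pip install` step did not complete.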
jupyter lab
Navigate to labs/ and open the notebook for the current lab.
You can also run notebooks in Google Colab:
- Upload the notebook to Google Drive
- Open with Google Colaboratory
- Install packages:
!pip install numpy pandas matplotlib scipy scikit-learn
- Before class: Read the lesson material in lessons/XX-topic/
- During class: Follow along with slides and examples
- After class: Complete the lab assignment in the corresponding labs/XX-topic/ folder
- Practice: Work through exercises in lessons/XX-topic/exercises/
- Open the notebook in the numbered labs/XX-topic/ directory for that session
- Read instructions carefully
- Write your code in the designated cells
- Test your code thoroughly
- Save your work regularly
- Submit according to instructor guidelines
# Example: Loading a dataset
import pandas as pd
# From shared data folder
df = pd.read_csv('shared/data/heights_weights_sample.csv')
# From lesson-specific folder (when available)
# df = pd.read_csv('lessons/03-estimator-properties/data/dataset.csv')
New materials are released throughout the semester. To get updates:
# Make sure you've committed or saved your work first!
git pull origin public-main
Important: If you've modified any files, save your changes first or they may be overwritten.
See syllabus.md for:
- Detailed lesson schedule
- Learning outcomes
- Grading criteria
- Course policies
- Important dates
- Lab Assignments: Hands-on exercises throughout the course
- Final Project: Data analysis project with report and presentation
See syllabus.md for detailed grading breakdown and deadlines.
- All of Statistics by Larry Wasserman - Comprehensive coverage of statistical theory
- A Modern Introduction to Probability and Statistics by Dekking et al. - Gentle introduction with applications
- Think Stats by Allen Downey - Python-based statistics
- Statistical Learning MOOC - Stanford University
- OpenIntro Statistics - Free online textbook
- Python 3.9 or later
- Jupyter Notebook/Lab (included in requirements.txt)
| Library | Purpose |
|---|---|
| NumPy | Numerical arrays and operations |
| Pandas | Data manipulation and analysis |
| Matplotlib | Basic plotting |
| Seaborn | Statistical visualizations |
| SciPy | Statistical functions and tests |
| Scikit-learn | Machine learning tools |
- Git - Version control (for getting updates)
- VS Code - Code editor with Jupyter support
- Google Colab - Cloud-based Jupyter environment
- Always start with exploratory data analysis (EDA)
- Check assumptions before applying methods
- Visualize your results
- Interpret p-values carefully (they're not everything!)
- Report confidence intervals, not just point estimates
- Consider practical significance, not just statistical significance
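The advice to report confidence intervals alongside point estimates can be illustrated with a percentile bootstrap, one of the methods listed for Lesson 03. The sketch below is an assumption-laden example, not the course's reference implementation: the function `bootstrap_ci`, its defaults, and the synthetic exponential sample are all made up for illustration.

```python
import numpy as np


def bootstrap_ci(data, stat=np.mean, n_resamples=5_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.size
    # Draw row-wise resamples with replacement, then recompute the statistic per row.
    idx = rng.integers(0, n, size=(n_resamples, n))
    replicates = np.apply_along_axis(stat, 1, data[idx])
    alpha = 1.0 - level
    lower, upper = np.quantile(replicates, [alpha / 2, 1 - alpha / 2])
    return stat(data), lower, upper


# Synthetic, skewed sample (hypothetical data, not a course dataset).
sample = np.random.default_rng(1).exponential(scale=2.0, size=200)
estimate, lower, upper = bootstrap_ci(sample)
print(f"mean = {estimate:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

Reporting the interval next to the estimate makes the uncertainty visible, which also helps when judging practical significance.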
Repository: https://github.com/stephane-rivaud/Applied-Statistics
Last Updated: October 11, 2025
Good luck and enjoy the course!