Sheila Kannappan edited this page May 25, 2018 · 83 revisions

Welcome to the 2017bootcamp-general wiki!

Information and materials will be organized here.

Textbook:

Statistics, Data Mining, and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data

free download from UNC Libraries: search on "Ivezic"
python library (astroML) and python codes to make the book figures here

Basics:

Wireless access for your laptop
Git prep

Linux:

Linux tutorial
After completing the linux tutorial, you should make a directory in /afs/cas.unc.edu/users/y/o/yourname/public to hold the rest of your boot camp work. If your initial disk space allocation runs out, we can arrange additional space, but it's likely to be sufficient. You can work directly on any linux machine in Hell's Kitchen (the astro computing lab), or if these machines are not powerful enough, feel free to ssh into stardust.astro.unc.edu or blarney.castle.unc.edu.
Linux bonus tracks

Plain-Text Editing:

vi tutorial
Even if you prefer emacs or another programming editor, you should learn the basics of vi, because you may sometimes find yourself inadvertently dumped into vi when using git or linux software. Note that vi is installed by default for Linux/Mac and comes with Git Bash for Windows (see "Git prep" above under Basics).
emacs installation
Emacs is the primary alternative to vi and there are long arguments about which is better. Optionally install emacs for Windows or Mac and run the built-in tutorial in the emacs help menu. (FWIW, your instructor uses emacs.)

Version Control:

Git and GitHub tutorial

Python:

  • Anaconda installation and basic data analysis tutorial
  • Programming tutorial
  • Browse Chapter 1 of the AstroML textbook (reading 1.6 more closely) and download/play with the code for Figs. 1.9-1.12
  • Read Appendix A of the AstroML textbook and try out the commands
  • Browse Chapter 2 and take some time to study the vectorization example on pp. 54-56 to reconstruct why it works
  • Debug and speed up this template code or, preferably, this protected version of the same code after consulting these Programming Best Practices; make sure to read the instructions at the bottom of the code. The pdb package isn't necessary for such a short code, but try it anyway to see how it works. When you think you've found everything, discuss with a partner and/or the instructor.
  • Optional: Use these jupyter notebook quickstart instructions to examine the example jupyter notebook called ExploreRESOLVEandECO.ipynb (found in the current repo; download by clicking "raw", then right-clicking the raw contents and choosing "save as"). You can run this notebook partway through if you also download the ECO_dr1_subset.csv input file provided in this repo. If you like the idea of working in notebooks like this, you can get comfortable with them by first finishing the example notebook (this effort will also give you a small taste of the SQL database query language), then creating your own jupyter notebook from scratch. For example, you could use ECO DR1 to plot stellar mass vs. environment distributions and compare them for early-type and late-type galaxies. Try raiding code from at least one of Figures 1.9-1.12 in the textbook. NOTE: If/when you launch your first jupyter notebook under linux, you'll be asked to choose a kernel; just click OK and you should be all set!
  • Optional: Check out the 10 Minutes to pandas guide to see if you'd like to learn more about this powerful data manipulation package. If you've learned about jupyter notebooks, you can play with some useful pandas commands in this pandas tutorial notebook (again, right-click on raw and "save as" to get the actual notebook file so you can run it yourself; you'll also need this input file).
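As a warm-up for the vectorization item in the list above, here is a minimal sketch of the general idea. It is not the textbook's example: the function names and the random test data are made up for illustration. It computes all pairwise squared distances between points twice, once with explicit Python loops and once with numpy broadcasting, and checks that the two agree.

```python
import numpy as np


def pairwise_sq_dists_loop(X):
    """Squared Euclidean distance between every pair of rows, with explicit loops."""
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            D[i, j] = np.sum((X[i] - X[j]) ** 2)
    return D


def pairwise_sq_dists_vectorized(X):
    """Same quantity via broadcasting: shapes (n, 1, d) - (1, n, d) -> (n, n, d)."""
    diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
    return np.sum(diff ** 2, axis=-1)


# Hypothetical test data: 50 random points in 3 dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

D_loop = pairwise_sq_dists_loop(X)
D_vec = pairwise_sq_dists_vectorized(X)

# Both approaches should give the same (50, 50) matrix
print(D_vec.shape, np.allclose(D_loop, D_vec))
```

Timing the two functions (for example with `%timeit` in IPython or a jupyter notebook) for increasing numbers of points is a quick way to see why vectorization matters.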

Basic Statistics:

Laws of Probability, Probability Distributions, Random Sampling, Uncertainties, and Confidence Intervals

Testing for Correlations

  • Review these additional slides on correlation tests, a special case of hypothesis tests
  • Download this code and this input file, then uncomment and run each code block sequentially to compare the Spearman Rank and Pearson Correlation Tests. Add the code necessary to include Kendall's tau in the comparison (solution here).
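To preview what the Spearman vs. Pearson comparison looks like, here is a minimal sketch on synthetic data. It is not the course code or input file: the data set, seed, and variable names are assumptions chosen for illustration. The relation is monotonic but strongly nonlinear, which is the kind of case where a rank-based test and a linear test can be expected to differ. Kendall's tau is deliberately left out, since adding it is the exercise.

```python
import numpy as np
from scipy import stats

# Hypothetical synthetic data: y rises monotonically but nonlinearly with x
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 5.0, size=200)
y = np.exp(x) + rng.normal(scale=1.0, size=x.size)

# Pearson tests for a linear relation between the values themselves
r_pearson, p_pearson = stats.pearsonr(x, y)

# Spearman tests for a monotonic relation by working on the ranks
rho_spearman, p_spearman = stats.spearmanr(x, y)

print("Pearson  r   = %.3f (p = %.2e)" % (r_pearson, p_pearson))
print("Spearman rho = %.3f (p = %.2e)" % (rho_spearman, p_spearman))
```

Each call returns the correlation coefficient together with a p-value for the null hypothesis of no correlation.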

Plotting and Comparing Distributions

Fitting Models

This is a complicated topic (!) and we'll take it one step at a time. Most people are vaguely familiar with frequentist methods for fitting functions to data, but haven't really thought deeply about them. We'll dig into frequentist methods first and come back to Bayesian methods later.
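As a reminder of what the familiar frequentist approach looks like in practice, here is a minimal sketch of an ordinary least-squares straight-line fit. The data are simulated, and the "true" slope, intercept, and noise level are made-up values for illustration only. It uses `np.polyfit` with `cov=True` to get parameter uncertainties from the covariance matrix.

```python
import numpy as np

# Hypothetical simulated data: a straight line plus Gaussian noise
rng = np.random.default_rng(1)
true_slope, true_intercept, sigma = 2.0, 1.0, 0.5
x = np.linspace(0.0, 10.0, 50)
y = true_slope * x + true_intercept + rng.normal(scale=sigma, size=x.size)

# Ordinary least squares: minimize the sum of squared residuals in y
coeffs, cov = np.polyfit(x, y, deg=1, cov=True)
slope, intercept = coeffs  # polyfit returns the highest power first

# 1-sigma parameter uncertainties from the diagonal of the covariance matrix
slope_err, intercept_err = np.sqrt(np.diag(cov))

print("slope     = %.3f +/- %.3f" % (slope, slope_err))
print("intercept = %.3f +/- %.3f" % (intercept, intercept_err))
```

This sketch assumes that all the scatter is in y and that every point has the same uncertainty.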

Bootstrapping

Model Selection