
Monte Carlo Methods

Sheila Kannappan

"Monte Carlo methods" is a term covering pretty much any use of pseudo-randomness to help solve any kind of problem. – Niall O'Higgins

Copy all ".py" files in https://github.com/galastrostats/MonteCarloTutorial to your own working space – these files include partial answers to the exercises below, left incomplete for you to finish. Select exercises have solutions provided with a ".solns" extension. You should perform this tutorial in Anaconda under a Windows/Mac/Linux OS.

I. Random Number Generators

There are two ways to generate random numbers:

  • physical measurements that are expected to be random (e.g., coin flips)
  • computational algorithms that produce long sequences of apparently random results, in fact completely determined by an initial "seed" value

The latter are often called pseudo-random number generators.

Please look over the description of the "random" package for python here: http://docs.python.org/2/library/random.html

(Check out the Mersenne Twister on the internet – it's not an amusement park ride!)

Exercise 1: Use random.random to generate a variable x consisting of 10 random numbers between 0 and 1. Repeat to create a second random variable y, and plot x and y against each other. Verify that there is no correlation.
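A minimal sketch of Exercise 1 using only the standard library (in your script you would scatter-plot x against y, e.g. with matplotlib; here we instead quantify the lack of correlation with Pearson's r, since two independent uniform variables should give r near zero):

```python
import random
from statistics import mean

x = [random.random() for _ in range(10)]
y = [random.random() for _ in range(10)]

# quantify the (lack of) correlation with Pearson's r
mx, my = mean(x), mean(y)
num = sum((a - mx) * (b - my) for a, b in zip(x, y))
den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
r = num / den
```

With only 10 points, r will scatter noticeably around zero from run to run; try larger samples to see it shrink.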

Exercise 2: Use random.seed to control the random numbers in x and y such that they are identical. Plot x and y against each other and verify that there is a perfect correlation.
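A sketch of Exercise 2: re-seeding the generator with the same value before each draw reproduces the identical pseudo-random sequence, so x and y come out equal (and a plot of x against y would fall on the line y = x):

```python
import random

random.seed(42)            # any fixed seed works; 42 is arbitrary
x = [random.random() for _ in range(10)]

random.seed(42)            # re-seed with the SAME value
y = [random.random() for _ in range(10)]

# the two sequences are identical, element by element
```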

Exercise 3: Use random.gauss to generate a variable u (for "uncertainty") consisting of 1000 random numbers with mean zero and standard deviation σ = 1. Create a histogram of the values to verify that they look like a Gaussian distribution.
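A sketch of Exercise 3 with the standard library (a plotting script would hand u to matplotlib's hist; here we bin the values by hand, which is the same bookkeeping a histogram routine does):

```python
import random
from statistics import mean, stdev

random.seed(0)  # seeded only for reproducibility
u = [random.gauss(0.0, 1.0) for _ in range(1000)]

# crude histogram: 12 bins of width 0.5 spanning -3 to +3
bins = [0] * 12
for v in u:
    if -3.0 <= v < 3.0:
        bins[int((v + 3.0) // 0.5)] += 1
# the central bins should dominate, tapering off toward the wings
```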

The Gaussian distribution is the most commonly used model for random uncertainties (non-systematic errors/noise) in data. In particular:

  1. The error bars on data values are typically set to equal the expected ±1σ variations due to random measurement errors/noise (caveat: some research fields use ±2σ error bars)

  2. The signal-to-noise (S/N) ratios for data values representing "detections" are typically given in terms of the background noise (i.e., S/N=3 means S=3σ); caveat: if the signal is extended in time/space/λ/etc., it is really a sum of several data points and you must use error propagation rules.

For a Gaussian, a S/N>3 detection has only ~0.1% probability of occurring by chance, so we say it is detected "at 99.9% confidence." For data values, the error bars are referred to as "confidence intervals"; ±1σ corresponds to the "68% confidence interval."
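These confidence levels follow directly from the Gaussian cumulative distribution, which can be written with the error function erf; a quick check using the standard library:

```python
import math

def gauss_cdf(z):
    # cumulative probability of a standard Gaussian up to z
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

within_1sigma = gauss_cdf(1.0) - gauss_cdf(-1.0)   # the 68% confidence interval
above_3sigma = 1.0 - gauss_cdf(3.0)                # one-sided chance of a >3-sigma fluke
```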

The python package "numpy" enables array math. The solutions to Exercises 3 & 4 illustrate some of what you can do with numpy.

Import numpy and use it to compute and overplot the expected Gaussian function shape on top of the histogram you made.
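A sketch of the overplot step (written with the standard library for self-containment; with numpy you would evaluate the same formula on an array of bin centers and pass it to matplotlib). The expected count in each histogram bin is N × pdf(bin center) × bin width:

```python
import math
import random

random.seed(1)  # seeded only for reproducibility
u = [random.gauss(0.0, 1.0) for _ in range(1000)]

edges = [-3.0 + 0.5 * i for i in range(13)]
counts = [sum(1 for v in u if lo <= v < hi) for lo, hi in zip(edges, edges[1:])]

def gauss_pdf(z):
    # standard Gaussian probability density
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

# expected count in each bin = N * pdf(bin midpoint) * bin width
expected = [1000 * gauss_pdf((lo + hi) / 2.0) * 0.5 for lo, hi in zip(edges, edges[1:])]
```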

Exercise 4: Use numpy to convert u into an array and use numpy's where function to determine what percentage of the time the variable u lies inside ±1σ. If you have an array of data with error bars equal to u, how often should the fit line go through the error bars?
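A sketch of the counting step in Exercise 4 (the numpy version would use np.where on the array and take the length of the result; a plain comprehension gives the same answer). About 68% of the u values should fall inside ±1σ, so a correct fit line should pass through roughly 68% of ±1σ error bars:

```python
import random

random.seed(2)  # seeded only for reproducibility
u = [random.gauss(0.0, 1.0) for _ in range(1000)]

inside = sum(1 for v in u if -1.0 < v < 1.0)
fraction = inside / len(u)   # should come out near 0.68
```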

II. Areas or Volumes of Enclosed Regions

A basic application of random number generation is in measuring the areas or volumes of enclosed regions, especially non-rectangular regions for which a direct measurement would be difficult. The method is to choose points randomly in a rectangular region enclosing the region of interest, then find the fraction of the points that land inside the region of interest; multiplying this fraction by the area/volume of the enclosing rectangle gives the desired subarea/subvolume.

Exercise 1: Use random.uniform to measure the area of a circle with radius 1 and thus to measure the value of π. How many darts do you need to get a good, consistent estimate?
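A sketch of the dart-throwing estimate: throw points uniformly into the enclosing 2×2 square, count those landing inside the unit circle, and scale the hit fraction by the square's area (4) to estimate π:

```python
import random

random.seed(3)  # seeded only for reproducibility
n = 100000
hits = 0
for _ in range(n):
    x = random.uniform(-1.0, 1.0)
    y = random.uniform(-1.0, 1.0)
    if x * x + y * y <= 1.0:
        hits += 1

pi_est = 4.0 * hits / n   # area of square (4) times hit fraction
```

The scatter in the estimate shrinks like 1/√n, so each extra decimal digit of precision costs ~100× more darts.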

Exercise 2: Use numpy's array version of random to measure the area under the Gaussian from -1σ to +1σ. Think about why this area is equal to the percentage of u values between ±1σ even though u was created with random.gauss.
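A sketch of Exercise 2 (stdlib version; the numpy solution draws the x and y arrays with np.random.uniform in one call). The enclosing box runs from -1 to +1 in x and from 0 to the Gaussian's peak in y, so the area under the curve is the box area times the fraction of darts landing below the curve:

```python
import math
import random

def gauss_pdf(z):
    # standard Gaussian probability density
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

random.seed(4)  # seeded only for reproducibility
n = 100000
ymax = gauss_pdf(0.0)   # the peak of the Gaussian, at z = 0

hits = sum(1 for _ in range(n)
           if random.uniform(0.0, ymax) < gauss_pdf(random.uniform(-1.0, 1.0)))

area = (2.0 * ymax) * hits / n   # box area times hit fraction, ~0.683
```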

III. Random Selection from a Non-Uniform Distribution

It may happen that you want to select random numbers from a distribution of your own. For example, suppose we want the distribution of radius values for a set of points drawn randomly from within a circle.

The probability of a point having a given radius r increases with the area of the annulus that radius lies in, so p(r) = 2r/R², where R is the radius of the circle. (Note that the integral ∫₀ᴿ p(r) dr = 1, as is required for a probability distribution.) The trick to computing the (non-uniform) probability distribution for r is to map values x from a uniform distribution [0, 1] onto the values of r in such a way that the correct frequency of values is produced. A one-to-one mapping in which the integrated probability out to r in [0, R] is equal to the integrated probability out to x in [0, 1] does the trick. In "inverse transform sampling," we first generate values using a uniform random number generator, then map them to values drawn from another probability distribution using this integral mapping.
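For the circle, the integral mapping works out in closed form: setting x = ∫₀ʳ (2r′/R²) dr′ = r²/R² and solving for r gives r = R√x. A sketch (a known statistical check: the mean radius of points uniform in a disc is 2R/3):

```python
import math
import random
from statistics import mean

R = 1.0
random.seed(5)  # seeded only for reproducibility

# inverse transform: x uniform in [0,1) maps to r = R*sqrt(x)
radii = [R * math.sqrt(random.random()) for _ in range(100000)]

# sanity check: the mean radius of uniform points in a disc is 2R/3
```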

Exercise 1: First, use numpy's version of random to select radii randomly in a circle by inverse transform sampling. Second, compare the distribution of radii selected by this method to the distribution of radii obtained by selecting "hits" in a circle as in Exercise 1 from Part II. (Note – the second task requires that you generate a new block of code, not just tweak the code provided. You should try to find bits of earlier code that you can copy/imitate/modify to make an array of radii, then plot the new radii in a histogram on top of a histogram of the original radii.) How do the histograms compare? Explain.
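A sketch of the comparison in Exercise 1 (stdlib version; the provided code uses numpy arrays). Instead of overlaid histograms, we compare one summary statistic — the fraction of radii below R/2, which should be (1/2)² = 0.25 for both methods, since both sample points uniformly over the disc:

```python
import math
import random

random.seed(6)  # seeded only for reproducibility
n = 50000

# method 1: inverse transform sampling, r = R*sqrt(x) with R = 1
r_inv = [math.sqrt(random.random()) for _ in range(n)]

# method 2: keep the "hits" from darts thrown in the enclosing square
r_hit = []
while len(r_hit) < n:
    x = random.uniform(-1.0, 1.0)
    y = random.uniform(-1.0, 1.0)
    if x * x + y * y <= 1.0:
        r_hit.append(math.sqrt(x * x + y * y))

# fraction of radii below 0.5: area scaling predicts 0.25 for both
f_inv = sum(1 for r in r_inv if r < 0.5) / n
f_hit = sum(1 for r in r_hit if r < 0.5) / n
```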

Exercise 2: Use numpy's version of random to select values from a Gaussian distribution using inverse transform sampling. In this exercise you are essentially recreating random.gauss. Hint: the integral of a Gaussian function centered on 0, from -∞ to z, is ½[1 + erf(z/(σ√2))].

You can import "erf" from scipy.special.
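A sketch of the idea using only the standard library (math.erf plays the role of scipy.special's erf; rather than an analytic inverse, we invert the cumulative distribution numerically by bisection — scipy.special also offers erfinv if you prefer a direct inverse):

```python
import math
import random
from statistics import mean, stdev

def gauss_cdf(z, sigma=1.0):
    # integral of a Gaussian centered on 0, from -inf to z
    return 0.5 * (1.0 + math.erf(z / (sigma * math.sqrt(2.0))))

def inv_gauss_cdf(p, sigma=1.0):
    # invert the CDF by bisection: find z with gauss_cdf(z) = p
    lo, hi = -10.0 * sigma, 10.0 * sigma
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if gauss_cdf(mid, sigma) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

random.seed(7)  # seeded only for reproducibility
# inverse transform sampling: uniform values mapped through the inverse CDF
samples = [inv_gauss_cdf(random.random()) for _ in range(2000)]
```

The resulting samples should have mean near 0 and standard deviation near 1, just like the output of random.gauss.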

IV. For Further Inquiry

Random number generation is useful in many contexts. For example, you may wish to generate mock data sets with realistic scatter to test algorithms. Simulated data play an important role in planning and testing experiments.

http://www.ligo.org/news/blind-injection.php

Another technique that relies on Monte Carlo methods is "bootstrapping" – a family of techniques, all of which use random resampling of a real data set to estimate the uncertainties on parameters or model fits characterizing that data set.

http://en.wikipedia.org/wiki/Bootstrapping_%28statistics%29