Most recent version by Sheila Kannappan June 2017, with major contributions from Kathleen Eckert, Rohan Isaac, and Amy Oldenberg.
Why do we fit models to data?
- To understand the underlying distribution of the data
- To test hypotheses or competing theoretical models
- To predict values for future observations
Any of these goals will typically involve parameter estimation. In this tutorial we will go through the basics of parameter estimation using frequentist "maximum likelihood" model fitting.
"Least squares" fitting is based on the assumption that all the measurement uncertainties σ in a data set are the same, i.e., follow the same Gaussian distribution. In the case of a linear model, the least squares fit gives the slope & y-intercept parameters that minimize the mean square residuals between the data and the model, where the residuals are measured in the y-direction in the case of "forward" fitting. Here the mean square residuals are given by: where xi is the independent variable, yi is the dependent variable, and α and β are the slope and y-intercept parameter values. Minimizing the mean square residuals is equivalent to minimizing the rms (root mean square) residuals, and also equivalent to minimizing the χ2, because of the fact that all σ's have been assumed to be identical.
Least squares fitting falls within the broader category of parameter estimation known as Maximum Likelihood Estimation (MLE). In this method, we measure the likelihood for a given model, typically using the χ2 statistic:

$$\chi^2 = \sum_{i=1}^{N}\frac{\left(y_i - (\alpha x_i + \beta)\right)^2}{\sigma_i^2}$$

The likelihood is given by:

$$L = \prod_{i=1}^{N}\frac{1}{\sigma_i\sqrt{2\pi}}\,\exp\left(-\frac{\left(y_i - (\alpha x_i + \beta)\right)^2}{2\sigma_i^2}\right) \propto e^{-\chi^2/2}$$

where $\sigma_i$ is the Gaussian measurement uncertainty on the $i$th data point and the product runs over all N data points.
To find the MLE solution to our model, we maximize the likelihood function by taking the derivative with respect to each parameter (the slope and y-intercept in the case of a linear fit) and solving for the parameter values where each derivative equals 0. To simplify the math, we first take the natural log of the likelihood function. For least squares fitting (wherein all σ values are taken to be equal), it is possible to obtain an analytical solution for the slope and y-intercept of a linear model, as shown below.
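To make the connection between χ2 and the likelihood concrete, here is a minimal sketch (with hypothetical data and uncertainties, not the tutorial's fake data) that evaluates χ2 on a grid of slope and y-intercept values and picks the pair that maximizes ln(L), i.e., minimizes χ2:

```python
import numpy as np

# hypothetical data with per-point Gaussian uncertainties (illustration only)
xvals = np.array([0., 1., 2., 3., 4.])
yvals = np.array([0.9, 3.2, 4.8, 7.1, 9.0])
sig = np.full(len(xvals), 0.5)

def chisq(alpha, beta):
    return np.sum((yvals - (alpha * xvals + beta))**2 / sig**2)

# brute-force grid search: maximizing ln(L) = const - chi^2/2
# is the same as minimizing chi^2
alphas = np.linspace(0., 4., 201)
betas = np.linspace(-2., 4., 201)
chi2grid = np.array([[chisq(a, b) for b in betas] for a in alphas])
ia, ib = np.unravel_index(np.argmin(chi2grid), chi2grid.shape)
print("grid MLE slope, intercept:", alphas[ia], betas[ib])
```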
Take the natural log of the likelihood function:

$$\ln L = -\sum_{i=1}^{N}\frac{\left(y_i - \alpha x_i - \beta\right)^2}{2\sigma_i^2} - \sum_{i=1}^{N}\ln\left(\sigma_i\sqrt{2\pi}\right)$$
Take the derivatives of ln(L) with respect to α and β and set those equations to 0:

$$\frac{\partial \ln L}{\partial \alpha} = \sum_{i=1}^{N}\frac{x_i\left(y_i - \alpha x_i - \beta\right)}{\sigma_i^2} = 0$$

and

$$\frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^{N}\frac{y_i - \alpha x_i - \beta}{\sigma_i^2} = 0$$
If we assume the σi's are all the same, then we have two equations for two unknowns to solve:

$$\sum_i x_i y_i = \alpha \sum_i x_i^2 + \beta \sum_i x_i \qquad \text{(eqn 1)}$$

$$\sum_i y_i = \alpha \sum_i x_i + \beta N \qquad \text{(eqn 2)}$$
Multiply eqn 1 by N and multiply eqn 2 by $\sum_i x_i$ to get:

$$N\sum_i x_i y_i = \alpha N \sum_i x_i^2 + \beta N \sum_i x_i$$

$$\sum_i x_i \sum_i y_i = \alpha \left(\sum_i x_i\right)^2 + \beta N \sum_i x_i$$

Both equations now contain the same β term, so we can set the remaining parts equal to each other:

$$N\sum_i x_i y_i - \alpha N \sum_i x_i^2 = \sum_i x_i \sum_i y_i - \alpha \left(\sum_i x_i\right)^2$$
Solving for α and dividing the top and bottom by N²:

$$\alpha = \frac{\frac{1}{N}\sum_i x_i y_i - \frac{1}{N}\sum_i x_i \cdot \frac{1}{N}\sum_i y_i}{\frac{1}{N}\sum_i x_i^2 - \left(\frac{1}{N}\sum_i x_i\right)^2} = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}$$
where the bar over the variable signifies the mean value of that quantity.
Now we can go back and solve for β:

$$\beta = \bar{y} - \alpha\,\bar{x}$$
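As a sanity check on the algebra, a minimal sketch of the closed-form slope and y-intercept in numpy might look like the following (the xvals and yvals arrays are hypothetical, not the tutorial's fake data):

```python
import numpy as np

# hypothetical data (illustration only)
xvals = np.array([0., 1., 2., 3., 4.])
yvals = np.array([0.9, 3.2, 4.8, 7.1, 9.0])

# closed-form least squares solution from the derivation above
alpha_est = (np.mean(xvals * yvals) - np.mean(xvals) * np.mean(yvals)) / \
            (np.mean(xvals**2) - np.mean(xvals)**2)    # slope
beta_est = np.mean(yvals) - alpha_est * np.mean(xvals)  # y-intercept
print("analytic slope and y-intercept:", alpha_est, beta_est)
```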
For more complicated functions or if the uncertainties are not uniform, setting the derivatives of the likelihood equal to zero may not lead to equations that are so easily solved analytically. Thus for more complicated MLE problems we typically use programs such as np.polyfit or mpfit to determine parameters numerically.
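For example, a numerical straight-line fit with np.polyfit looks like the minimal sketch below (hypothetical arrays again; in the tutorial you will apply this to the fake data in paramfit1.py):

```python
import numpy as np

# hypothetical data (illustration only)
xvals = np.array([0., 1., 2., 3., 4.])
yvals = np.array([0.9, 3.2, 4.8, 7.1, 9.0])

# degree-1 polynomial fit; coefficients are returned highest power first,
# i.e., fit[0] is the slope and fit[1] is the y-intercept
fit = np.polyfit(xvals, yvals, 1)
print("np.polyfit slope and y-intercept:", fit[0], fit[1])
```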
Download paramfit1.py. In this code we create fake data with random errors around a line of known slope and y-intercept. We then compute the maximum likelihood estimated slope and y-intercept for the fake data. Fill in lines ending with "?" and answer questions by putting your own comments in the code.
1. Run the program paramfit1.py and plot the data. What aspect of a real data set is npr.normal used to emulate? What assumption is made in the code that is key to the least squares approach?
2. Read over the derivation for the linear least squares analytical solution (above) and add code to compute the estimated slope and y-intercept based on the fake data set. Plot the maximum likelihood ("best fit") solution on top of the data.
3. For linear least squares fitting, we can obtain analytical formulae for the uncertainties on the slope and y-intercept estimates: $\sigma_\alpha = s/\sqrt{\sum_i (x_i - \bar{x})^2}$ and $\sigma_\beta = s\sqrt{\tfrac{1}{N} + \bar{x}^2/\sum_i (x_i - \bar{x})^2}$, where $s^2 = \tfrac{1}{N-2}\sum_i \left(y_i - (\alpha x_i + \beta)\right)^2$. (See http://mathworld.wolfram.com/LeastSquaresFitting.html for the full derivation.) Uncomment the code to compute the uncertainties for the slope and y-intercept analytically. Which parameter has larger fractional uncertainty?
4. Read up on np.polyfit. Use np.polyfit to compute the MLE slope and y-offset for the data set numerically rather than analytically. Do you get the same result as in the analytical case from #2 above? Note that in this example we have assumed that the σ on all data points is the same. This simplified assumption is often not the case. If the uncertainties are different, then we must include each data point's uncertainty within the MLE calculation. Although np.polyfit does not automatically take an array of uncertainties on the y values, we can input an optional weight vector: fit = np.polyfit(xvals, yvals, 1, w=1/sig), where sig is an array containing the uncertainty on each data point. Here the input is 1/sig rather than 1/sig**2 as you might expect from the equations above, because np.polyfit squares the weight values within its source code.
5. Now we'll compute the errors on the slope and intercept numerically, for comparison with the analytic calculation in #3. A quick method to determine uncertainties is to have np.polyfit compute the covariance matrix $[C] = [H]^{-1}$, where $H_{jk} = -\partial^2 \ln L/\partial\theta_j\,\partial\theta_k$ is the Hessian matrix of second derivatives of the negative log likelihood with respect to the model parameters. (When these matrices get tricky to compute, we can resort to a more approximate numerical technique such as the bootstrap.) Add cov=True to the np.polyfit function call so that np.polyfit will compute the covariance matrix numerically. Print out the uncertainties computed using the covariance matrix (a minimal sketch of this approach appears after this list). Are they the same as found in the analytical solution? What happens to the uncertainties as you increase/decrease the number of data points? What happens to the percentage difference between the analytical and numerical methods as you increase/decrease the number of data points?
Solutions to the tutorial are in paramfit1_soln.py.