Most recent version by Sheila Kannappan June 2017, with major contributions from Kathleen Eckert, Rohan Isaac, and Amy Oldenberg.
Why do we fit models to data?
- To understand the underlying distribution of the data
- To test hypotheses or competing theoretical models
- To predict values for future observations
Any of these goals will typically involve parameter estimation. In this tutorial we will go through the basics of parameter estimation using frequentist "maximum likelihood" model fitting.
"Least squares" fitting is based on the assumption that all the measurement uncertainties σ in a data set are the same, i.e., follow the same Gaussian distribution. In the case of a linear model, the least squares fit gives the slope & y-intercept parameters that minimize the mean square residuals between the data and the model, where the residuals are measured in the y-direction in the case of "forward" fitting. Here the mean square residuals are given by: where xi is the independent variable, yi is the dependent variable, and α and β are the slope and y-intercept parameter values. Minimizing the mean square residuals is equivalent to minimizing the rms (root mean square) residuals, and also equivalent to minimizing the χ2, because of the fact that all σ's have been assumed to be identical.
Least squares fitting falls within the broader category of parameter estimation known as Maximum Likelihood Estimation (MLE). In this method, we measure the likelihood for a given model, typically using the χ2 statistic:

$$\chi^2 = \sum_{i=1}^{N}\frac{\left(y_i - (\alpha x_i + \beta)\right)^2}{\sigma_i^2}$$

The likelihood is given by:

$$L = \prod_{i=1}^{N}\frac{1}{\sigma_i\sqrt{2\pi}}\,\exp\left(-\frac{\left(y_i - (\alpha x_i + \beta)\right)^2}{2\sigma_i^2}\right) \propto e^{-\chi^2/2}$$

where $\sigma_i$ is the Gaussian measurement uncertainty on the $i$th data point and the product runs over all N data points.
To find the MLE solution to our model, we maximize the likelihood function by taking the derivative with respect to each parameter (the slope and y-intercept in the case of a linear fit) and solving for the parameter values where each derivative equals 0. To simplify the math, we first take the natural log of the likelihood function. For least squares fitting (wherein all σ values are taken to be equal), it is possible to obtain an analytical solution for the slope and y-intercept of a linear model, as shown below.
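To make the connection between χ2 and the likelihood concrete, here is a minimal sketch (with hypothetical data and uncertainties, not the tutorial's fake data) that evaluates χ2 on a grid of slope and y-intercept values and picks the pair that maximizes ln(L), i.e., minimizes χ2:

```python
import numpy as np

# hypothetical data with per-point Gaussian uncertainties (illustration only)
xvals = np.array([0., 1., 2., 3., 4.])
yvals = np.array([0.9, 3.2, 4.8, 7.1, 9.0])
sig = np.full(len(xvals), 0.5)

def chisq(alpha, beta):
    return np.sum((yvals - (alpha * xvals + beta))**2 / sig**2)

# brute-force grid search: maximizing ln(L) = const - chi^2/2
# is the same as minimizing chi^2
alphas = np.linspace(0., 4., 201)
betas = np.linspace(-2., 4., 201)
chi2grid = np.array([[chisq(a, b) for b in betas] for a in alphas])
ia, ib = np.unravel_index(np.argmin(chi2grid), chi2grid.shape)
print("grid MLE slope, intercept:", alphas[ia], betas[ib])
```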
Take the natural log of the likelihood function:

$$\ln L = -\sum_{i=1}^{N}\frac{\left(y_i - \alpha x_i - \beta\right)^2}{2\sigma_i^2} - \sum_{i=1}^{N}\ln\left(\sigma_i\sqrt{2\pi}\right)$$
Take the derivatives of ln(L) with respect to α and β and set those equations to 0:

$$\frac{\partial \ln L}{\partial \alpha} = \sum_{i=1}^{N}\frac{x_i\left(y_i - \alpha x_i - \beta\right)}{\sigma_i^2} = 0$$

and

$$\frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^{N}\frac{y_i - \alpha x_i - \beta}{\sigma_i^2} = 0$$
If we assume the σi's are all the same, then we have two equations for two unknowns to solve:

$$\sum_i x_i y_i = \alpha \sum_i x_i^2 + \beta \sum_i x_i \qquad \text{(eqn 1)}$$

$$\sum_i y_i = \alpha \sum_i x_i + \beta N \qquad \text{(eqn 2)}$$
Multiply eqn 1 by N and multiply eqn 2 by $\sum_i x_i$ to get:

$$N\sum_i x_i y_i = \alpha N \sum_i x_i^2 + \beta N \sum_i x_i$$

$$\sum_i x_i \sum_i y_i = \alpha \left(\sum_i x_i\right)^2 + \beta N \sum_i x_i$$

Both equations now contain the same β term, so we can set the remaining parts equal to each other:

$$N\sum_i x_i y_i - \alpha N \sum_i x_i^2 = \sum_i x_i \sum_i y_i - \alpha \left(\sum_i x_i\right)^2$$
Solving for α and dividing the top and bottom by N²:

$$\alpha = \frac{\frac{1}{N}\sum_i x_i y_i - \frac{1}{N}\sum_i x_i \cdot \frac{1}{N}\sum_i y_i}{\frac{1}{N}\sum_i x_i^2 - \left(\frac{1}{N}\sum_i x_i\right)^2} = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}$$
where the bar over the variable signifies the mean value of that quantity.
Now we can go back and solve for β:

$$\beta = \bar{y} - \alpha\,\bar{x}$$
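As a sanity check on the algebra, a minimal sketch of the closed-form slope and y-intercept in numpy might look like the following (the xvals and yvals arrays are hypothetical, not the tutorial's fake data):

```python
import numpy as np

# hypothetical data (illustration only)
xvals = np.array([0., 1., 2., 3., 4.])
yvals = np.array([0.9, 3.2, 4.8, 7.1, 9.0])

# closed-form least squares solution from the derivation above
alpha_est = (np.mean(xvals * yvals) - np.mean(xvals) * np.mean(yvals)) / \
            (np.mean(xvals**2) - np.mean(xvals)**2)    # slope
beta_est = np.mean(yvals) - alpha_est * np.mean(xvals)  # y-intercept
print("analytic slope and y-intercept:", alpha_est, beta_est)
```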
For more complicated functions or if the uncertainties are not uniform, setting the derivatives of the likelihood equal to zero may not lead to equations that are so easily solved analytically. Thus for more complicated MLE problems we typically use programs such as np.polyfit or mpfit to determine parameters numerically.
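For example, a numerical straight-line fit with np.polyfit looks like the minimal sketch below (hypothetical arrays again; in the tutorial you will apply this to the fake data in paramfit1.py):

```python
import numpy as np

# hypothetical data (illustration only)
xvals = np.array([0., 1., 2., 3., 4.])
yvals = np.array([0.9, 3.2, 4.8, 7.1, 9.0])

# degree-1 polynomial fit; coefficients are returned highest power first,
# i.e., fit[0] is the slope and fit[1] is the y-intercept
fit = np.polyfit(xvals, yvals, 1)
print("np.polyfit slope and y-intercept:", fit[0], fit[1])
```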
Download paramfit1.py. In this code we create fake data with random errors around a line of known slope and y-intercept. We then compute the maximum likelihood estimated slope and y-intercept for the fake data. Fill in lines ending with "?" and answer questions by putting your own comments in the code.
1. Run the program paramfit1.py and plot the data. What aspect of a real data set is npr.normal used to emulate? What assumption is made in the code that is key to the least squares approach?
2. Read over the derivation for the linear least squares analytical solution (above) and add code to compute the estimated slope and y-intercept based on the fake data set. Plot the maximum likelihood ("best fit") solution on top of the data.
3. For linear least squares fitting, we can obtain analytical formulae for the uncertainties on the slope and y-intercept estimates: $\sigma_\alpha = s/\sqrt{\sum_i (x_i - \bar{x})^2}$ and $\sigma_\beta = s\sqrt{\tfrac{1}{N} + \bar{x}^2/\sum_i (x_i - \bar{x})^2}$, where $s^2 = \tfrac{1}{N-2}\sum_i \left(y_i - (\alpha x_i + \beta)\right)^2$. (See http://mathworld.wolfram.com/LeastSquaresFitting.html for the full derivation.) Uncomment the code to compute the uncertainties for the slope and y-intercept analytically. Which parameter has larger fractional uncertainty?
4. Read up on np.polyfit. Use np.polyfit to compute the MLE slope and y-offset for the data set numerically rather than analytically. Do you get the same result as in the analytical case from #2 above? Note that in this example we have assumed that the σ on all data points is the same. This simplified assumption is often not the case. If the uncertainties are different, then we must include each data point's uncertainty within the MLE calculation. Although np.polyfit does not automatically take an array of uncertainties on the y values, we can input an optional weight vector: fit = np.polyfit(xvals, yvals, 1, w=1/sig), where sig is an array containing the uncertainty on each data point. Here the input is 1/sig rather than 1/sig**2 as you might expect from the equations above, because np.polyfit squares the weight values within its source code.
5. Now we'll compute the errors on the slope and intercept numerically, for comparison with the analytic calculation in #3. A quick method to determine uncertainties is to have np.polyfit compute the covariance matrix $[C] = [H]^{-1}$, where $H_{jk} = -\partial^2 \ln L/\partial\theta_j\,\partial\theta_k$ is the Hessian matrix of second derivatives of the negative log likelihood with respect to the model parameters. (When these matrices get tricky to compute, we can resort to a more approximate numerical technique such as the bootstrap.) Add cov=True to the np.polyfit function call so that np.polyfit will compute the covariance matrix numerically. Print out the uncertainties computed using the covariance matrix (a minimal sketch of this approach appears after this list). Are they the same as found in the analytical solution? What happens to the uncertainties as you increase/decrease the number of data points? What happens to the percentage difference between the analytical and numerical methods as you increase/decrease the number of data points?
Solutions to the tutorial are in paramfit1_soln.py.