Fitting data to a model is a way to determine the best values for free parameters. In Part 1, you adjusted the values of $A$ and $B$ in your models until you had what looked to your eyes like the best match to the data. In this part, we will use a more rigorous method for determining the “best” values.
Suppose that you have $N$ data points, where the $i$th data point is $(x_i, y_i \pm \delta y_i)$. You want to see if your data are well-described by a particular function, $f(x)$. Let's suppose, just as an example, that you wanted to test a linear function $f(x) = mx$, where $m$ is the slope of the line. This slope is an example of a free parameter in the model, and we want the fit to tell us what the “best” value is for $m$.
First, let's define a function called the residual. The residual of the $i$th point is $\chi_i$ and is defined as
$\chi_i \equiv \dfrac{f(x_i) - y_i}{\delta y_i}$.
This form should immediately look familiar: it is the $t^{\prime}$ comparison between the measured value $y_i \pm \delta y_i$ and the value predicted by the function at the same point, $f(x_i)$.
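To make this concrete, here is a minimal sketch of how the residuals might be computed with NumPy. The function and array names are placeholders for illustration, not the notebook's actual variables, and the numbers are made up:

```python
import numpy as np

def residuals(f, x, y, dy):
    """Residuals chi_i = (f(x_i) - y_i) / delta_y_i for each data point."""
    return (f(x) - y) / dy

# Made-up example data: compare against a trial line f(x) = m*x with m = 2.0
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.1, 3.8, 6.3])
dy = np.array([0.2, 0.2, 0.3])
print(residuals(lambda x: 2.0 * x, x, y, dy))
```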
A “good” fit will be one where the values of the free parameters in the fit function make this residual small (meaning that $|t^{\prime}|$ at this point is small, and therefore the data point and the model are in agreement). But perhaps values which are good for this point aren't quite as good for the next point… or the point after that. So, rather than focus on a single residual, we want to pick values for the free parameters which minimize the sum of the squares of all the residuals,
$\chi^2 \equiv \sum_{i=1}^{N} \dfrac{(f(x_i) - y_i)^2}{\delta y_i^2}$.
(We look at the squares of the residuals because we want each contribution to the sum to be positive. If a data point is a little bit higher than what is predicted by the fit function, that should count the same as if the data point is a little low. Squaring accomplishes this.)
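Continuing the sketch above (same placeholder names), $\chi^2$ is then a one-line sum:

```python
import numpy as np

def chi_squared(f, x, y, dy):
    """chi^2 = sum over i of ((f(x_i) - y_i) / delta_y_i)^2."""
    return np.sum(((f(x) - y) / dy) ** 2)
```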
It is possible to minimize a function by hand using calculus, but for this class we will instead rely on a computer algorithm to find the value of each fit parameter in a model that minimizes $\chi^2$. Such a fit is called a least-squares fit because it finds the parameter values that give the least value of the sum of the squares of the residuals.
In such a fit, you supply the data points and the model function, and the fit returns the best values for each fit parameter and the minimum $\chi^2$ value that results when you use those values.
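One common routine of this kind is `scipy.optimize.curve_fit`; the Colab notebook may use a different function, but a least-squares fit of the example line $f(x) = mx$ could look roughly like this (placeholder data, purely for illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, m):
    """Model with one free parameter: f(x) = m*x."""
    return m * x

# Placeholder data, purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.8, 6.3, 7.9])
dy = np.array([0.2, 0.2, 0.3, 0.3])

# absolute_sigma=True treats dy as absolute uncertainties on y
popt, pcov = curve_fit(line, x, y, sigma=dy, absolute_sigma=True)
m_best = popt[0]
m_err = np.sqrt(pcov[0, 0])
chi2_min = np.sum(((line(x, m_best) - y) / dy) ** 2)
print(f"best m = {m_best:.3f} +/- {m_err:.3f}, chi^2 = {chi2_min:.2f}")
```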
You can now apply the same formalism from above to your question: what are the best values for $A$ and $B$ in the 1/3- and 1/4-power law models?
To do this, return to the Google Colab notebook above, and start working on Part 2. There you will need to enter your data – energy and corresponding average crater diameter with uncertainties – and the code will guide you through the least-squares fits.
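As a rough preview of what those fits amount to (the notebook's actual code and variable names will differ), the 1/3-power model $f(E) = A\,E^{1/3}$ could be fit the same way; the arrays below are placeholders standing in for your measured values:

```python
import numpy as np
from scipy.optimize import curve_fit

def one_third_law(E, A):
    """Model: crater diameter D = A * E**(1/3)."""
    return A * E ** (1.0 / 3.0)

# Placeholder arrays, purely for illustration -- replace with your measured
# energies, average crater diameters, and diameter uncertainties.
energies = np.array([0.05, 0.10, 0.20, 0.40])
diameters = np.array([2.6, 3.3, 4.1, 5.2])
diameter_errs = np.array([0.2, 0.2, 0.3, 0.3])

popt, pcov = curve_fit(one_third_law, energies, diameters,
                       sigma=diameter_errs, absolute_sigma=True)
A_best, A_err = popt[0], np.sqrt(pcov[0, 0])
print(f"A = {A_best:.2f} +/- {A_err:.2f}")
```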
Work through the notebook slowly! Try to understand what is happening at each step, and talk to your TA if you don't know what you are looking at.
When the fits are done, you will have values for the best-fit parameters and plots showing what the fits look like. The plots should look similar to the by-eye fits you made in Part 1, but now we have a more quantitative justification for calling them the “best” fits.
In addition to being the quantity that the fit minimizes, the final $\chi^2$ value can be used to determine whether our model, overall, is in agreement with the data or not. (It is still possible for the “best” fit to be a “bad” fit, for example.) For this reason, $\chi^2$ is sometimes referred to as the “goodness of fit” parameter.
First, note that $\chi^2$ can grow arbitrarily large: every data point added to the fit contributes another non-negative term to the sum, so $\chi^2$ tends to grow with the number of data points. Therefore, it helps to look not at $\chi^2$ itself, but at a quantity called the reduced chi-squared, $\chi^2_{red}$. If we have $N$ data points and $k$ free parameters in the fit, then the number of degrees of freedom is $\nu = N - k$, and the reduced chi-squared is
$\chi^2_{red} = \chi^2/\nu$.
The reduced chi-squared is, roughly speaking, the average contribution to $\chi^2$ per data point, i.e. the average squared residual.
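In code, this is a single extra division (a sketch, with `chi2`, `N`, and `k` as placeholder names):

```python
def reduced_chi_squared(chi2, N, k):
    """Reduced chi-squared: chi^2 divided by the degrees of freedom nu = N - k."""
    nu = N - k
    return chi2 / nu
```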
Conceptually, what does the reduced chi-squared represent, and how can we use it to decide whether our model agrees with the data? Suppose one point lies very close to the fit line, so that its distance from the line is smaller than its uncertainty; for such a point, $|\chi_i| = |f(x_i)-y_i|/\delta y_i < 1$. Now suppose another point lies far from the line, so that its distance is larger than its uncertainty; for that point, $|\chi_i| = |f(x_i)-y_i|/\delta y_i > 1$. If we have a “good” fit, then we'd expect some close points ($|\chi_i| < 1$), some far points ($|\chi_i| > 1$), and some in-between points ($|\chi_i| \approx 1$), so we would expect the average squared residual (i.e. the reduced chi-squared) to be about 1.
Let's look at a few scenarios:

- If $\chi^2_{red} \approx 1$, the data scatter about the model by roughly the size of their uncertainties, which is what we expect when the model and the data agree.
- If $\chi^2_{red} \gg 1$, the data are, on average, much farther from the model than their uncertainties allow, suggesting the model does not describe the data well (or that the uncertainties are underestimated).
- If $\chi^2_{red} \ll 1$, the data hug the model much more closely than their uncertainties suggest they should, which usually means the uncertainties are overestimated (or the model has more free parameters than the data can constrain).
Unlike the $t^{\prime}$ test, these are not hard rules about agreement or disagreement, but they can be a helpful part of the discussion about the quality of your fits.
Why do we divide by the number of degrees of freedom, $\nu = N - k$, instead of just the number of points, $N$?
Each time we add another free parameter to the model, we “constrain” the model more. Think, for example, about what happens when you have two data points and you try to fit them to a line $f(x) = mx + b$. We have two data points ($N = 2$) and two fit parameters ($k = 2$), so we have zero degrees of freedom ($\nu = N - k = 0$). The line will go exactly through both points and the chi-squared value will be zero, $\chi^2 = 0$. We have effectively “used up” two data points' worth of information to do the fit, so we have no “freedom” left to let the fit wiggle around the data points.
Now consider doing the same fit with three or more data points. The line is no longer guaranteed to pass through every point exactly, and so the $\chi^2$ value will no longer be zero.
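A quick numerical check of this idea, using `np.polyfit` for the straight-line fit (the numbers are arbitrary placeholders):

```python
import numpy as np

def chi2_line(x, y, dy):
    """Fit y = m*x + b by weighted least squares and return the resulting chi^2."""
    m, b = np.polyfit(x, y, deg=1, w=1.0 / dy)
    return np.sum(((m * x + b - y) / dy) ** 2)

# Two points, two parameters: nu = 0, the line passes through both points,
# so chi^2 = 0 (up to floating-point rounding)
print(chi2_line(np.array([1.0, 2.0]), np.array([1.2, 2.5]), np.array([0.1, 0.1])))

# Three points: nu = 1, the line generally cannot hit all three, so chi^2 > 0
print(chi2_line(np.array([1.0, 2.0, 3.0]), np.array([1.2, 2.5, 3.1]),
                np.array([0.1, 0.1, 0.1])))
```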
By dividing $\chi^2$ by the number of degrees of freedom instead of just by $N$, we better account for the information lost (used to constrain the model).