Friday, August 14, 2009

Goodness of fit.

From someone working in a bio-inorganic chemistry lab of fairly strong reputation, I heard of quite the outlandish abuse of statistics.

Apparently, they were using the reduced chi-square (the usual one, which assumes Gaussian noise, or Gaussian uncertainty, at each point) both to fit curves to the data and for model selection, comparing models with different numbers of parameters. The number of parameters in the system is supposed to be physically relevant, in a way I don't recall. I don't want to identify the lab, anyway.

Using reduced chi-square to do model selection is bad enough as it is: there's no reason to take a model with more parameters as being better just because it has a lower reduced chi-square. It is not clear how much lower it would have to be to justify the extra parameters.

But the real trouble is this: the reduced chi-square method comes, at least, with a sort of "warning light" for overfitting, that is, fitting models with too many free parameters. If the reduced chi-square is greater than one, one may or may not be overfitting. But if the reduced chi-square is clearly less than one, and the noise in the data is independently distributed with correctly estimated uncertainties, it is an almost sure sign that one is overfitting and fitting the noise.
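For reference, a common way to write the statistic, assuming N data points y_i with known Gaussian uncertainties sigma_i and a model f with p fitted parameters (the notation here is only illustrative, not the lab's):

```latex
\chi^2_\nu \;=\; \frac{1}{N - p} \sum_{i=1}^{N}
  \frac{\bigl(y_i - f(x_i; \hat{\theta})\bigr)^2}{\sigma_i^2}
```

The denominator N - p is the number of degrees of freedom, and each term in the sum is a squared residual measured in units of that point's own error bar.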

Have a look at the formula for the reduced chi-square statistic. For a perfect fit, meaning the correct model with correctly sized error bars, what value does it converge to in probability as the number of data points grows? The answer jumps right out, doesn't it?
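One way to check the answer numerically is a small simulation. The sketch below is mine, not anything from the lab: it assumes straight-line data with independent Gaussian noise of known standard deviation, fits the correct (straight-line) model, and prints the reduced chi-square for increasing numbers of points. The names `reduced_chi_square` and `simulate` are made up for the illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def reduced_chi_square(y, y_fit, sigma, n_params):
    """Chi-square per degree of freedom for independent noise of known sigma."""
    residuals = (np.asarray(y) - np.asarray(y_fit)) / sigma
    dof = residuals.size - n_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    return float(np.sum(residuals**2) / dof)


def simulate(n_points, rng, sigma=0.5):
    """Fit a straight line to noisy straight-line data (the correct model)."""
    x = np.linspace(0.0, 1.0, n_points)
    y = 1.0 + 2.0 * x + rng.normal(0.0, sigma, size=n_points)
    # With equal error bars, ordinary least squares is also the chi-square minimum.
    coeffs = P.polyfit(x, y, 1)
    y_fit = P.polyval(x, coeffs)
    return reduced_chi_square(y, y_fit, sigma, n_params=2)


rng = np.random.default_rng(0)
results = {n: simulate(n, rng) for n in (10, 100, 10_000, 100_000)}
for n, value in results.items():
    print(f"N = {n:>6}: reduced chi-square = {value:.3f}")
```

For Gaussian noise the scatter of the statistic around one shrinks roughly like sqrt(2 / (N - p)), so with only a handful of points a single value somewhat below one is not alarming by itself. A curve forced through every data point, on the other hand, drives the chi-square sum to exactly zero.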

Supposedly, at least one dissertation in this lab drew physical conclusions from fits for which the reduced chi-square was less than one. Several papers, as well, may have been published using this method.

The social scientists and the climatologists are very good about their statistics; I'd like to think we're at least getting better about statistics in biophysics. But most of the physical sciences have a long way to go. The curriculum is already somewhat overloaded, but perhaps a course in practical statistics should be required of all doctoral students in the natural sciences, just as in the social sciences; such silly mistakes should be inexcusable. The mistake was not over petty matters such as the size of the error bar (à la the never-ending trouble some people have understanding the difference between the standard deviation of the noise and the standard error of the mean). It directly affected categorical, conceptual conclusions.