From someone working in a bioinorganic chemistry lab of fairly strong reputation, I heard of quite the outlandish abuse of statistics.
Apparently, they were using the reduced chi-square statistic (the usual one, which assumes Gaussian noise of known standard deviation at each data point) both to fit curves to their data and to do model selection, comparing models with different numbers of parameters. The number of parameters in the system is supposed to be physically relevant, in a way I don't recall. I don't want to identify the lab, anyway.
Using reduced chi-square for model selection is bad enough as it is: there's no reason to prefer a model with more parameters just because it has a lower reduced chi-square, and how much lower it would need to be is not clear.
But the real trouble is this: the reduced chi-square method at least comes with a sort of "warning light" for overfitting, that is, for fitting models with too many free parameters. If the reduced chi-square is greater than one, one may or may not be overfitting. But if the reduced chi-square is less than one, and the noise in the data is independently distributed, that is an almost sure sign that one is overfitting and fitting the noise.
Have a look at the formula for the reduced chi-square statistic. For a perfect fit, what value does it converge to in probability? The answer jumps right out, doesn't it?
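Written out, the statistic is the chi-square divided by the degrees of freedom: chi²_red = [Σᵢ (yᵢ − f(xᵢ))² / σᵢ²] / (N − p), where N is the number of data points and p the number of fitted parameters. As a toy sketch of the "perfect fit" limit (my own illustration, nothing to do with that lab's actual data), here is what happens when polynomials of increasing degree are fit to data generated from a straight line:

```python
# Toy sketch: data come from a straight line with known Gaussian noise;
# we fit polynomials of increasing degree. The raw chi-square can only
# go down as parameters are added, and a model with as many parameters
# as data points fits "perfectly", with chi-square equal to zero.
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)
N = 8
sigma = 0.5
x = np.linspace(0.0, 1.0, N)
y = 2.0 + 3.0 * x + rng.normal(0.0, sigma, N)  # the true model is a line

def chi_square(degree):
    """Least-squares polynomial fit of the given degree; unreduced chi-square."""
    fit = Polynomial.fit(x, y, degree)
    resid = (y - fit(x)) / sigma
    return float(np.sum(resid ** 2))

for degree in (1, 3, 5, 7):
    p = degree + 1                                 # number of free parameters
    chi2 = chi_square(degree)
    red = "undefined (N - p = 0)" if p == N else f"{chi2 / (N - p):.3f}"
    print(f"degree {degree}: chi^2 = {chi2:.4f}, reduced chi^2 = {red}")
```

With p = N free parameters the curve passes through every point and the chi-square collapses to zero: the "fit" is reproducing the noise rather than the signal.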
Supposedly, at least one dissertation in this lab drew physical conclusions from fits whose reduced chi-square was less than one. Several papers may have been published using this method as well.
The social scientists and the climatologists are very good about their statistics; I'd like to think we're at least getting better about statistics in biophysics, but most of the physical sciences have a long way to go. The curriculum is already somewhat overloaded, but perhaps a course in practical statistics should be required of all doctoral students in the natural sciences, just as in the social sciences; such silly mistakes should be inexcusable. The mistake here was not over a petty matter such as the size of an error bar (à la the never-ending trouble some people have with the difference between the noise standard deviation and the standard error of the mean). It directly affected categorical, conceptual conclusions.
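As an aside on that parenthetical, the distinction is easy to demonstrate numerically (a sketch of my own; the helper name is made up): the sample standard deviation estimates the noise level and does not shrink as you collect more data, while the standard error of the mean falls off as 1/√n.

```python
# Sketch: noise standard deviation vs. standard error of the mean.
import numpy as np

def sd_and_sem(n, sigma=2.0, seed=1):
    """Sample standard deviation and standard error of the mean for n draws."""
    sample = np.random.default_rng(seed).normal(0.0, sigma, n)
    sd = sample.std(ddof=1)        # estimates sigma; roughly constant in n
    sem = sd / np.sqrt(n)          # uncertainty of the mean; shrinks as 1/sqrt(n)
    return sd, sem

for n in (10, 1000):
    sd, sem = sd_and_sem(n)
    print(f"n = {n:4d}: sd = {sd:.2f}, sem = {sem:.3f}")
```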
Friday, August 14, 2009
Goodness of fit.
Labels: science education, Statistics
