## Caveat Smoother

*September 6, 2012 at 9:19 am* *gabrielrossman*

| Gabriel |

All metrics and models are assumption-laden, but some are more assumption-laden than others. Among the worst offenders are smoothers, which, as the name implies, assume that the underlying reality is smooth. If the underlying reality has a discontinuity, then the smoother will obscure it in the course of trying to smooth out “noise.” This can have big theoretical implications. Most notably, there was always a lot of zig-zag in the fossil record, but traditionally people assumed it was just noise and so they smoothed it out. Then Gould came up with the theory of punctuated equilibrium and said that evolution substantively works through bursts, which at a data level is equivalent to saying that the zig-zags are signal, not noise.

Here’s an illustration. Let’s simulate a dataset that, by assumption, follows a step function. To keep it simple we’ll have no noise at all, just the underlying step function. Now, let’s apply a LOWESS smooth on the time-series. As you can see, the smoothed trend is basically an s-curve even though we know by assumption that the underlying structure of causation is a step.

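    * 50 time points: x is 0 through t=30 and 1 afterward (no noise)
    set obs 50
    gen t=_n
    gen x=0 in 1/30
    replace x=1 in 31/50
    * overlay the LOWESS smooth on the raw step function
    twoway (lowess x t) (scatter x t, msymbol(circle_hollow)), legend(off)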
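To see how much of the rounding depends on the bandwidth, here is a minimal sketch that saves the smoothed values instead of graphing them. The variable names `sm_wide` and `sm_narrow` and the narrower bandwidth of 0.2 are my own choices for illustration. The 0.8 is Stata's default for `lowess`.

```stata
* Sketch: the same noiseless step function, smoothed at two bandwidths.
clear
set obs 50
gen t = _n
gen x = (t > 30)

* bwidth(0.8) is the lowess default; 0.2 is an arbitrary narrower choice.
lowess x t, bwidth(0.8) generate(sm_wide) nograph
lowess x t, bwidth(0.2) generate(sm_narrow) nograph

* Compare the raw and smoothed values around the jump between t=30 and t=31.
list t x sm_wide sm_narrow in 26/35, clean noobs
```

A narrower bandwidth confines the blur to the points nearest the jump, but the smoothed value at the jump still falls strictly between 0 and 1, so tuning the bandwidth does not recover the step.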

Moral of the story: think carefully about whether the smoother is theoretically appropriate. If there are substantive reasons to expect discontinuities, then it probably ain’t. For a similar reason, rather than just assuming a linear effect or even a polynomial specification in a regression, you may want to compare various specifications (e.g., linear splines vs. quadratics) and see what fits best.
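As a rough sketch of that kind of comparison, the following fits a linear, a quadratic, and a linear-spline specification to the simulated step data and tabulates their information criteria. The knot at t=30 is assumed known here, which it usually would not be in real data, and the stored-estimate names are just labels I made up.

```stata
* Sketch: compare functional forms on the simulated step function.
clear
set obs 50
gen t = _n
gen x = (t > 30)

* Linear trend
regress x t
estimates store linear

* Quadratic in t
regress x c.t##c.t
estimates store quad

* Linear spline with a single knot at t=30 (knot location assumed)
mkspline t1 30 t2 = t
regress x t1 t2
estimates store spline

* AIC and BIC side by side; lower values indicate better fit
estimates stats linear quad spline
```

None of these can reproduce a true step, since all three are continuous in t, so if theory predicts a discontinuity it is worth adding a specification with a jump to the comparison.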


1. tc | September 6, 2012 at 3:37 pm

This is true, but it misses a pretty large literature on kernel smoothing with jump discontinuities, which is pretty mainstream in statistical theory, for example:

http://projecteuclid.org/euclid.lnms/1215463119

This would of course never happen if economists/sociologists were more receptive to nonparametric methods. 🙂