## Archive for October, 2012

### Within the Margin of Error

It’s election season and that means I have ample opportunities to be annoyed by people misunderstanding how sampling error works. Let’s put aside the popular canard that n/N is a meaningful ratio (a complaint found in innumerable letters to the editor furious that a sample of 1500 is being used to draw inferences about a population of 300 million). Let’s also put aside questions that are about validity rather than sampling error (Bradley effect, cell phones vs. landlines, likely voter screens, etc.). In principle these are valid issues, even if they are sometimes the objects of motivated reasoning and/or bizarre conspiracy theories, as with the whole “unskew” trope.

What I have in mind is the misunderstanding of “margin of error” that treats all points within the confidence interval as equally likely, as if the central limit theorem implied a bounded uniform distribution instead of a *t* distribution. The phrase is certainly common: a Google search for “within the margin of error” gives me 863,000 hits, 14,500 of which are from the last month. For instance, imagine a poll showed the president up by 2 points with a 4 point margin of error and the Romney campaign said “we’re not worried as that’s within the poll’s margin of error.” Well, sure, but the smart money would still be on the president. Indeed, we can quantify by exactly how much.

Anyway, I’m going too fast for those of you in the back of the class. Let’s back up to the beginning. For starters, the term “margin of error” is just a heuristic for explaining to people who don’t really understand statistics that you have to take the point estimate (i.e., the headline figure) with a grain of salt. In real statistics we generally speak of “standard error,” and margin of error is just double the standard error (strictly, 1.96 times, but two is close enough). It’s doubled because that gives you enough wiggle room that the correct answer will be in that range 95% of the time, and by convention statistics usually sets 5% as the acceptable rate of error from statistical inference (this is also what most scientific journals use). So if you want to interpret poll results like a pro, the first thing you do is cut the margin of error in half and that’s your standard error.
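As a minimal sketch of that first step, here is the conversion in Python. The function name and the default multiplier are my own choices for illustration; the default of 2 matches the rule of thumb, and passing 1.96 gives the stricter textbook value.

```python
def standard_error_from_margin(margin_of_error, z=2.0):
    """Convert a reported margin of error into a standard error.

    z is the multiplier assumed to have been used for the 95% interval:
    2.0 is the rule of thumb, 1.96 is the more exact figure.
    """
    return margin_of_error / z


# A poll reported with a 4 point margin of error.
rule_of_thumb_se = standard_error_from_margin(4.0)
stricter_se = standard_error_from_margin(4.0, z=1.96)
print(rule_of_thumb_se, round(stricter_se, 2))
```

Either way you land at about 2 points, which is why halving is good enough for reading a headline.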

The way you interpret standard error is by realizing that sampling error follows a *t* distribution, which with the exception of very small datasets is the same thing as a normal distribution (i.e., “the bell curve”). (Thanks to the Central Limit Theorem it doesn’t matter if the underlying thing you’re measuring follows a normal distribution or not, an infinite number of repeated estimates of its mean will still follow a normal distribution.) Standard error is the standard deviation (or “sigma”) of this bell curve of repeated estimates. Your point estimate is the center of the curve and you measure different alternate possibilities by their difference from the point estimate divided by the standard error. The thing that talk about “margin of error” misses is that possibilities that are close to the point estimate are much more likely than possibilities that are at the edge of the margin. In a normal distribution, 68% of the density is within one standard deviation of the mean, 95% within two sigmas, and 99.7% within three sigmas. As you may have noticed, 68% is a lot bigger than 27% (i.e., 95%-68%). So if a poll says 52% of people favor the president and the margin of error is 4 points, there’s a 68% chance that the actual number favoring the president is between 50% and 54% and a 95% chance that the number favoring the president is between 48% and 56%.
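If you would rather check those densities than memorize them, the following sketch computes the share of a normal distribution within one, two, and three sigmas using only Python’s standard library. The helper name `density_within` is made up for this example.

```python
import math


def density_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


# Print the share of the bell curve within 1, 2, and 3 sigmas.
for k in (1, 2, 3):
    print(f"within {k} sigma: {density_within(k):.1%}")

# The ring between one and two sigmas holds far less than the middle does.
middle = density_within(1)
ring = density_within(2) - density_within(1)
print(f"middle: {middle:.1%}, ring from 1 to 2 sigma: {ring:.1%}")
```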

You may have noticed that in a country where elections are decided by majorities, the change from 54% to 56% is much less interesting than the shift from 50% to 48%. Indeed, the most meaningful way to interpret a poll is probably to ask what are the chances that a candidate’s actual level of support is lower than 50%. This is a special case of a “one-tailed test,” which means we don’t care about plus and minus, but only plus or minus. In this case since our point estimate is *above* the interesting threshold, we only care about the *minus* or left tail. Take our point estimate of 52% and subtract 50% for majority and our estimate is 2 percentage points above a majority. This equals our standard error, so the one-tailed test is at one standard deviation out. If you remember that the normal distribution is symmetrical, you know how many tails you want, and you’ve memorized the 68/95/99.7 densities for a standard normal distribution, then you can calculate it in your head: 68% of the density is within one sigma, so the remaining 32% splits evenly into 16% per tail, and everything except the left tail is 84%. If not (or if you’re not dealing with integer sigmas) you can use the NORM.S.DIST() function in Excel or the normal() function in Stata. With our example of a point estimate of 52% and a margin of error of 4 points, you find there’s an 84% chance that the true answer is a bare majority or higher. This is technically within the “margin of error,” but it’s also roughly 5 to 1 odds, which would be great odds to have if you were playing blackjack. Bottom line, there’s nothing magical about being just inside versus just outside the margin of error. If you’re down, you’re down.
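For those without Excel or Stata handy, here is a rough Python equivalent of the one-tailed calculation. It is a sketch under the same assumptions as the worked example (normal sampling error, margin of error equal to two standard errors), and the function names `normal_cdf` and `chance_above` are hypothetical ones I picked for illustration.

```python
import math


def normal_cdf(z):
    """Cumulative standard normal: share of the density at or below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def chance_above(point_estimate, margin_of_error, threshold=50.0):
    """Chance that true support is at or above the threshold.

    Assumes the margin of error is two standard errors and that
    sampling error is normally distributed.
    """
    standard_error = margin_of_error / 2
    z = (point_estimate - threshold) / standard_error
    return normal_cdf(z)


# The worked example: 52% support with a 4 point margin of error.
p = chance_above(52.0, 4.0)
odds = p / (1 - p)
print(f"chance of a majority: {p:.0%}, odds: about {odds:.1f} to 1")
```

Plugging in a candidate who is *behind* (say 48% with the same margin) gives the mirror image, about a 16% chance of actually having a majority.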
