Margin of Error
| Gabriel |
A few months ago, the Gary Johnson campaign for the GOP primary issued a press release responding to CNN’s decision to exclude him from their debate on grounds of viability, specifically low poll numbers (h/t Conor). The Johnson campaign makes some reasonable points about the validity of early polls, for instance that they are mostly a function of name recognition (*cough* Trump bubble *cough*). However, they make a common mistake when they talk about sampling error:
While we have had no specific explanation from the debate sponsors, it appears that Gary Johnson’s exclusion was based on some mysterious polling arithmetic. Whatever that arithmetic was, the differences that excluded us while producing invitations for several other less-known candidates would certainly fall within the margin of error of any poll.
Without commenting on the merits of Governor Johnson’s candidacy, there are really two issues.
First, when you are making a decision on the basis of numbers you have to set some threshold, and the treatment of cases that fall near the threshold is necessarily arbitrary. If one moves the threshold to accommodate the boundary cases, this puts other cases near the boundary. Another way to think about this is that CNN’s true threshold for viability could be 4% and they’re just calling it 2% to allow in the boundary cases that are close to 4%. In some ways this is the opposite of the issue that the difference between significance and insignificance is not significant. The difference is that in science we have the luxury of saying we should postpone judgement on an issue barring further data collection (which is really what a p value around .06 or .07 usually means), whereas barring the invention of quantum televisions CNN doesn’t have the option of “maybe” giving Johnson a podium.
Second, “margin of error” is not really a statistical concept so much as a heuristic for making the concept of standard error easier to understand. The heuristic is valid for proportions near 50%, but breaks down as you get towards extremely high or extremely low proportions. Assuming a proportion of 50% gives an upper-bound estimate for standard error and while such epistemic humility is a better bias than the alternative, it can occasionally lead us astray. A simple way to understand this is that if a poll has a stated “margin of error” of +/- 3%, and the point estimate is 1%, this does not mean that the population proportion is anywhere from negative 2% to positive 4% as proportions are necessarily non-negative.
Specifically, the standard error of a proportion is [π(1-π)/(n-1)]^0.5, that is, the square root of π(1-π)/(n-1). Do the math plugging in a proportion of 50% and a sample size of about 1200 (both of which are typical for opinion polls) and you get a standard error of about 1.5 points. To get a 95% confidence interval you multiply the standard error by +/- 2, which is where we get the usual margin of error of plus or minus 3 points. Again, that’s around 50%. If instead you plug in 1%, you get a standard error of about 0.3 points, which you can double for a margin of error of about +/- 0.6 points. In that sense, two candidates who are polling at 1% and 3% are not “in a statistical tie” even though this would be true of two candidates who are polling at 49% and 51%.
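For anyone who wants to check the arithmetic, here is a minimal Python sketch of the square-root formula. The function names and the choice to print results in percentage points are mine, purely for illustration, and the multiplier of 2 is the usual rough stand-in for 1.96.

```python
import math


def standard_error(p, n):
    """Standard error of a sample proportion p from n respondents,
    using the n - 1 denominator: sqrt(p * (1 - p) / (n - 1))."""
    return math.sqrt(p * (1 - p) / (n - 1))


def margin_of_error(p, n, multiplier=2.0):
    """Half-width of an approximate 95% confidence interval.
    A multiplier of 2 is a rough stand-in for the more exact 1.96."""
    return multiplier * standard_error(p, n)


n = 1200  # a typical opinion-poll sample size
for p in (0.50, 0.03, 0.01):
    se = standard_error(p, n)
    moe = margin_of_error(p, n)
    print(f"p = {p:.0%}: SE = {se * 100:.2f} points, "
          f"margin = +/- {moe * 100:.2f} points")
```

With n = 1200 this gives a standard error of roughly 1.4 points at 50%, 0.5 points at 3%, and 0.3 points at 1%, so the intervals around 1% and 3% do not overlap, while those around 49% and 51% do.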