How Long Are Songs

| Gabriel |

Last night Scott Golder asked me how long pop songs are. I checked the Whitburn file’s “pop annual” tab to see this, at least for hit singles. The “time” variable appears consistently starting in the mid-50s. Just to be safe though I only looked at data starting with 1960 through 2008 (which is when my copy of the Whitburn file ends).

The short answer is that there’s a bimodal distribution, with one mode a bit shy of three minutes and another a bit shy of four minutes. There are very few hit songs under two minutes. (Sorry Black Flag, sucks to be you.) There are also relatively few hit songs over five minutes, though the right tail extends pretty far, with the following songs being over eight minutes: “Everybody Move” by Cathy Dennis, “The Astronauts (Parts 1 and 2)” by Jose Jimenez, “I Will Possess Your Heart” by Death Cab for Cutie, “American Pie (Parts 1 and 2)” by Don McLean, “November Rain” by Guns N’ Roses, and “A Better Place to Be (Live) (Parts 1 & 2)” by Harry Chapin. (Honorable mention to “2 Legit 2 Quit” by MC Hammer at 7:55.)

[Figure: histogram of song length in seconds, Billboard hits, 1960-2008]

Bimodal distributions always make me nervous, so let’s break this up by decade.

[Figures: histograms of song length in seconds by decade: 1960s, 1970s, 1980s, 1990s, 2000s]

Breaking it up by decade makes it clear that the “bit under three minutes” mode is disproportionately songs from the 1960s and the “bit under four minutes” mode from subsequent periods. This makes sense when you realize that the dominant technological format of the 1960s was the 7-inch 45rpm single, which had a limit of three minutes. In contrast, 45s were less commercially important in subsequent periods and such technological formats as 12 inch 33rpm LPs, cassettes, CDs, and MP3s have no such time limitation that would reasonably matter for a single song. Moreover, there were also changes in radio. Genre-based radio formats were given a big boost in the late 1960s with the commercialization of the FM band and 1970s era formats like “Album-Oriented Rock” allowed for airplay that was more, well, album-oriented in terms of drawing cuts from LPs and not just 7 inches.

Another thing that’s pretty clear from looking at the decade-specific histograms is that there are sharp discontinuities. The three minute mark discontinuity in the 1960s is obviously a reflection of the technology of 7 inches. The other eras also show discontinuities at 3 and a half minutes in the 1970s and 4 minutes in the 1980s and 1990s, with much weaker discontinuities at 4 minutes in the 1970s and 2000s. The recent 3:30 and 4:00 discontinuities are much harder to explain than the old 3:00 discontinuity because they don’t reflect a hard technological constraint. Rather, they seem to reflect a convention of radio airplay. Here’s a passage from Jacob Slichter’s one-hit wonder memoir, So You Want to Be a Rock N Roll Star (p 138-9):

In anticipation of “crossing over” the single to radio formats other than alternative rock, we did a pop mix (by Don Gehman with lighter portions of electric guitar) and an acoustic mix (by Puig, a soccer-mom version with no electric guitars and no drums until the second verse). Each mix had to be edited down to under four minutes, an important limit in the mind of radio programmers. (To submit a single with a track length of 4:01 is as foolish as pricing kitchen knives sold on television at $20.01). We pestered Bob Ludwig, the mastering engineer, with a slew of editing adjustments. “Okay, shorten the intro to what it was two verses ago, cut eight bars off the end of the bridge, and undo the cuts we asked you to make to the final chorus.”

(btw, the album version of “Closing Time” clocks in at 4:34.)

Nonetheless, the strength of this convention seems to have weakened since Slichter’s story takes place in 1998, with the 2000s showing a much weaker discontinuity and many more songs a few seconds over 4 minutes than did the 1990s. I don’t know why this is, but I think it’s worth noting that it doesn’t necessarily reflect weakening of the 4 minute radio rubicon but could also reflect changes to how the chart is calculated, such as the rise of a digital singles market (which has been weighted into the Billboard Hot 100 since 2005), or how the time variable is measured (perhaps it’s the iTunes or album time, not the time for the radio edit).

Here’s the code:

cd ~/Documents/codeandculture/whitburn
clear all
insheet using popannual.txt, clear
*pull minutes and seconds out of the "m:ss" string, then convert to total seconds
gen min=real(regexs(1)) if regexm(time,"([0-9]+)\:")
gen sec=real(regexs(1)) if regexm(time,"\:([0-9]+)")
gen time_sec=min*60+sec
sum min sec time_sec

gen decade=.
replace decade=1 if year>=1960 & year<1970
replace decade=2 if year>=1970 & year<1980
replace decade=3 if year>=1980 & year<1990
replace decade=4 if year>=1990 & year<2000
replace decade=5 if year>=2000 & year<2010
lab def decade 1 "1960s" 2 "1970s" 3 "1980s" 4 "1990s" 5 "2000s"
lab val decade decade
histogram time_sec if decade!=., discrete xlabel(0(60)600) title("Billboard Hits, 1960-2008")
graph export time.png, replace width(1600)
histogram time_sec if decade==1, discrete xlabel(0(60)600) title("Billboard Hits, 1960-1969")
graph export time_1960s.png, replace width(1600)
histogram time_sec if decade==2, discrete xlabel(0(60)600) title("Billboard Hits, 1970-1979")
graph export time_1970s.png, replace width(1600)
histogram time_sec if decade==3, discrete xlabel(0(60)600) title("Billboard Hits, 1980-1989")
graph export time_1980s.png, replace width(1600)
histogram time_sec if decade==4, discrete xlabel(0(60)600) title("Billboard Hits, 1990-1999")
graph export time_1990s.png, replace width(1600)
histogram time_sec if decade==5, discrete xlabel(0(60)600) title("Billboard Hits, 2000-2008")
graph export time_2000s.png, replace width(1600)

*have a nice day
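
For readers without Stata, the parsing and binning steps can be roughly sketched in Python. This is not the code that produced the graphs, just an approximate equivalent. The function names are made up for illustration, it assumes times are stored as "m:ss" strings, and the `cliff_ratio` helper is my own crude way of putting a number on the discontinuities discussed above, not something in the Stata script.

```python
import re

def to_seconds(time_str):
    """Convert an "m:ss" string to total seconds; return None if it does not parse."""
    match = re.fullmatch(r"\s*([0-9]+):([0-9]{1,2})\s*", time_str or "")
    if match is None:
        return None
    return int(match.group(1)) * 60 + int(match.group(2))

def decade_label(year):
    """Label years 1960-2009 by decade; anything else gets None (dropped)."""
    if 1960 <= year < 2010:
        return f"{year // 10 * 10}s"
    return None

def cliff_ratio(seconds, threshold, window=10):
    """Songs in the `window` seconds up to and including `threshold`,
    divided by songs in the `window` seconds just past it.
    Returns None when nothing falls just past the threshold."""
    under = sum(1 for s in seconds if threshold - window < s <= threshold)
    over = sum(1 for s in seconds if threshold < s <= threshold + window)
    return under / over if over else None

# Hypothetical values, only to show the calls
print(to_seconds("4:34"))      # album version of "Closing Time"
print(decade_label(1998))
print(cliff_ratio([235, 239, 240, 241], threshold=240))
```

A ratio well above 1 at the 240 second mark would correspond to the sharp cliff in the 1990s histogram, and a ratio near 1 to no discontinuity at all.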

May 21, 2013 at 1:32 pm

Blogs, Payola, and Gift Exchange

| Gabriel |

In a review of the MRU media economics MOOC course (to which I contributed a guest lecture, part 1 and part 2), Ashok Rao asks why there is not more focus on new media. It’s a fair question and one that could be extended to my own course on media sociology, which for the most part could be fairly described as “sociology of the media as it existed through the 1990s” (I do deal with a few recent issues like how piracy unraveled bundling). In particular, Rao wants to know about blogging payola. This is actually an interest of mine as I’ve done work on radio payola and I’ve been thinking a lot lately about gift exchange.

First of all, Rao’s model is about exchange among bloggers, whereas traditionally payola involves exchange between two different types of actors, such as record labels and radio stations. As I’ve previously discussed, we have seen examples of this with bloggers who seem to be a little too close to political campaigns. Likewise a few years ago the FTC announced that bloggers should disclose when they’d received incentives from companies whose products they were discussing. The business model of Klout is basically to institutionalize this, by quantifying how influential social media users are and then serving as a broker for companies who want to give freebies to relatively influential folks in the hope that they’ll blog or tweet about their experiences.

That said, let’s get back to Rao’s model of blogging, which is that we link to higher status bloggers in the hopes that they’ll reciprocate with a link back. (Did I mention that I saw Rao’s post, via MR?). I’m not sure if I’d exactly call this “payola” but it is an interesting phenomenon and is related insofar as it involves an exchange of fame. In fact it closely follows Roger Gould’s model of status. Gould’s model of status is that it’s a combination of preferential attachment and reciprocity. The preferential attachment dynamic means that we prefer to direct our attention towards high-status actors. However the reciprocity heuristic means that we also expect our attention and resources to be reciprocated. To the extent that the high status actors have finite attention with which to reciprocate, the two heuristics are in tension with each other and so in effect low status actors jointly optimize the two heuristics by accepting asymmetric relationships with high status actors, even as they would refuse similarly asymmetric relationships with low status actors. So I am willing to link to Tyler or Megan more than they link to me because they are higher status than I am and so this asymmetry in power makes me grateful for what attention they give me rather than resenting that I give them more attention than they do me.* And in a sense, I should be grateful since their attention is worth so much more than mine, as indicated by a look at the “referrers” section of my WordPress stats.

Nonetheless, as Podolny’s model of status argues, the Gould model tends to result in cumulative advantage since the preferential attachment heuristic means we are willing to forgo a certain amount of reciprocity when dealing with high status actors. (Note that JLM treats exploitation in patronage as contingent, see figure 6.6 in Social Structures). As such, only occasionally reciprocated links will tend to lead to cumulative advantage in blogging fame.

———–

* It’s hard to describe patronage without sounding like you’re complaining. All I can say is that I have no complaints at all about my relationships with various famous bloggers and I consider some of them to be among my closest friends.

May 17, 2013 at 10:47 am 4 comments

Choosing the null set

| Gabriel |

A few months ago I listened to an interview with a historian who had studied “bride shows” in Czarist Russia. If you’re familiar with the Book of Esther (or its holiday, Purim) you’re familiar with the idea — a monarch holds a beauty contest to find a wife. This seems like a fairly obvious thing to do, but if you’ve studied history (or watched Game of Thrones) you know that typically royalty marry in order to cement political alliances. So why would the czar (or the shah) choose a commoner to marry? The answer is not that the king is actually trying to find the biggest hottie in the kingdom, but a political logic, in that the monarch does not wish to form an alliance with any of the domestic or foreign noble houses. If you’re at the apex of a power structure, forming an edge mostly serves to bring the other party up to your level and this could undermine efforts to hoard power for yourself (or more likely, for your clan or faction). This seems to be a common practice where the polity is relatively isolated from neighboring polities (e.g., Russia, Egypt, Hawaii) and so marriage would in effect involve elevating a client rather than allying with a rival. In such situations the strategic choice is to choose “none of the above.”

It seems like there are really three ways to go about this:

1. Do not form a tie at all. That is, celibacy. This was the strategy exercised by Queen Elizabeth I.

2. Loops. That is, royal incest. This was the strategy practiced by most Egyptian dynasties right through the Ptolemies.

3. Form a tie with a socially irrelevant person. Here we have the bride show strategy. You form a tie, but do so with someone of low enough status that obviously they’re not a player.

Note that some apparent instances of strategies 1 and 2 might actually be strategy 3. On page 95 of Social Structures, JLM describes how strategy 1 was actually strategy 3 in Renaissance Florence:

But given that Florentine sons had to marry up, those of the most distinguished lineages were hard pressed to marry— there was no one good enough for the sons of the elite to marry. In this case, there was no elegant structural solution, but rather a cheat: the elite, argue Padgett and Ansell, snuck away to other neighborhoods to find women as opposed to effectively announcing to their neighbors that there was a family of higher status than themselves.

Likewise, powerful “celibate” clergy from Alexander VI to Marcial Maciel have formed ties to socially irrelevant people but framed it as celibacy by having children with mistresses. I’m not aware of explicit references to this, but I like to imagine that some royal incest marriages were sexless and the heir was actually produced by a concubine, which would be socially irrelevant marriage framed as a loop. You can even find cases where celibacy is framed as a socially irrelevant marriage, as with women who are married off to a god or inanimate object.

Also note that sometimes “strategies” could be imposed on people, as with celibacy imposed on rival succession claimants (eg, the mythological Greek princess Danae and her Roman doublet Rhea Silvia or the dozen or so very historical deposed Byzantine emperors forced into monastic orders).

You also see this sort of thing in non-marital contexts. Most famously, during the principate senators resented the emperors because the emperors relied heavily on freedmen and knights to staff the Roman imperial bureaucracy, such relatively lowly people being less likely than senators to use such positions to build rival power bases (or to extract usurious rents). We see a similar practice more recently with the kings of Ethiopia, who for centuries would request a bishop be sent down from Alexandria, the purpose of which was not so much to cement ties to Egypt as to refrain from investing ecclesiastical power in any of the local notables, a foreigner bishop being the next best thing to no bishop at all, politically speaking.

April 2, 2013 at 7:53 am

It’s Not TV, It’s Coasian Bargaining

| Gabriel |

In a guest post for Megan last year I argued that the biggest barrier to a la carte HBO Go is that it would provoke a backlash from the cable operators, upon whom HBO is still reliant for most of its sales. (FWIW, I wanted to title that post “There is no word for ‘cord-cutter’ in Dothraki,” but the editor made it less elliptical). Just a year later, we have HBO floating a proposal to let you buy HBO Go without getting basic cable. At first glance it looks like I was just wrong, but check out the fine print (actually the headline), which is that you wouldn’t buy the service directly from HBO, but through your ISP.

Now this seems crazy. I pay for all sorts of content on the internet (e.g., Netflix) but it’s not a check-off on my broadband bill, rather it’s something I pay directly to the provider. The idea of adding premium content as a check-off to your telecom bill seems really 80s or early 90s, harking back to when the information superhighway was going to be a sort of Minitel en anglais rather than the internet we’re used to where your connection is a pure infrastructure service, most content is ad-supported, and premium content is something you either pay for directly or through a handful of platforms charging the rightsholder a 30% sales commission (e.g., iTunes, Google Play, Amazon Instant/MP3/AndroidApps) who are not directly connected to your ISP. And yet HBO wants to go through the telecom check-off model rather than just sell you their content directly (or through a “store” platform like iTunes). The question is why, and, no, the answer is not because they are too stupid to think about it any other way or too lazy to set up their own billing system.

As I argued before, HBO has to navigate the Scylla of “piracy is a customer service issue” and the Charybdis of “don’t antagonize the still-powerful incumbents.” My reading is that this otherwise cockamamie proposal of ISP-centric billing is a pretty solid strategy for accomplishing just that. Let’s think about the advantages, from the point of view of maintaining HBO’s relationship with the telecoms:

  1. The ISPs get a cut. Traditionally, HBO retails for about a 100% markup. So if your cable company charges you $12 (on top of your basic cable) it’s paying HBO about $5 or $6. The proposed model would keep that going. Keep in mind that the ISP and the cable operators are usually the same companies. In this sense, making you buy HBO from Comcast or AT&T instead of directly from HBO is effectively a convoluted way for HBO to make a side payment to the telecoms to not retaliate in the core business model of selling HBO as part of a tv package. Note that if HBO were to settle the Coasian bargain by just writing a check to the MSOs, this would be a lot simpler, but simple exchanges are often perceived as more morally objectionable than Rube Goldberg exchanges.
  2. Each ISP gets control over pricing. About half the price of HBO through your tv is the cable operator’s markup (see above) and given that Amazon and Apple only charge 30% for billing and hosting, it’s conceivable that HBO Go a la carte could undercut cable HBO on price. The new proposal ensures this won’t happen.
  3. Each ISP gets veto rights for its own customers. Suppose that your ISP isn’t happy with HBO’s offer to let it keep half the money from IP-only HBO Go (which it would price at or above the price it charges tv customers) because it really wants to keep pushing you towards that “triple play” package its telemarketers keep harassing you with? Well, that ISP can just refuse to sell HBO Go to its broadband-only customers. And unlike Netflix, the ISP would actually be able to veto your purchase. It’s structurally very similar to car dealerships, where local brokers are terrified of (and can use their clout to prevent) translocal competition. This one is actually kind of scary. Imagine if you could only subscribe to the New York Times through your condo’s HOA, which would otherwise deny building access to the paperboy?
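
The arithmetic behind points 1 and 2 can be made explicit. Here is a sketch using the round numbers above ($12 retail, a 100% markup, a 30% store commission). The function names are mine and the figures are illustrative, not HBO’s actual terms.

```python
def wholesale_price(retail, markup):
    """Wholesale price implied by a retail price and a proportional markup."""
    return retail / (1 + markup)

def break_even_retail(wholesale, commission):
    """Retail price at which the seller still nets `wholesale`
    after a store keeps `commission` of the sale."""
    return wholesale / (1 - commission)

cable_retail = 12.00
hbo_net = wholesale_price(cable_retail, markup=1.00)        # 100% markup
store_price = break_even_retail(hbo_net, commission=0.30)   # iTunes-style cut

print(f"HBO nets ${hbo_net:.2f} from a ${cable_retail:.2f} cable add-on")
print(f"Via a 30% store it nets the same at ${store_price:.2f}")
```

On these assumptions a store-distributed HBO Go could be priced a few dollars under the cable add-on without HBO taking home any less, which is the undercutting that ISP control over pricing rules out.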

There are some ways in which this would still create problems for the cable operators, mostly in that it would undermine the two-part tariff aspect of their business model, but I think this is effectively obviated by the local veto aspect of the proposal. Moreover, cable operators are increasingly showing signs that they see the bundling aspect of their business model unraveling (mostly because carriage fees are out of control) and are willing to settle for a role of brokerage, without bundling. (Note that data caps, which don’t apply to content bought from your ISP, help enforce this brokerage role since they effectively let your ISP tax content bought on the open market).

So the good news is that you may be able to watch Girls without first having to also pay for a bunch of sports and reality shows about petulant alcoholics. The bad news is this represents yet another business model innovation against the open internet.

March 26, 2013 at 11:27 am 1 comment

Cutbacks or Hostile Media Effect?

| Gabriel |

Pew just came out with a “State of the Media” report. The main interpretation (which seems to originate with the authors) has been that the media are stuck in a death spiral as cost-cutting decreases coverage which in turn diminishes the audience (eg, see here and here). I have a lot of sympathy for the death spiral model and it’s certainly a relatively appealing model for journalists and j-school types (as it implies a switch to a subsidized and/or NPO model will solve all their problems) but as a reading of the survey results it is simply wrong.

The fundamental misunderstanding is to presume that consumers evaluate news coverage the same way the CJR does. They don’t. As argued by Gentzkow and Shapiro, consumers evaluate news with regards to their ideological priors. That is, almost nobody reads the newspaper and says “I am offended that this story seems to have allowed the journalist inadequate time to report the story exhaustively” but lots of people read the paper and say “I am offended that this story takes the point of view that I disagree with.”

So when consumers answer “yes” to the question “Have you stopped turning to a particular news outlet because you felt they were no longer providing you with the news and information you were accustomed to getting?,” they probably aren’t thinking “I miss the in-depth reporting and investigative work I used to see” but rather “I no longer trust the media as reflecting my values.”

There are three key pieces of evidence in the report itself for the Gentzkow and Shapiro model:

  • When asked to elaborate problems with content, far more respondents said “The stories are less complete” than “there are fewer stories.” I strongly suspect by “less complete” many respondents are choosing the closest available option from the forced choice set to map onto “bias” allegations.
  • Dissatisfaction and abandonment are concentrated among men and Republicans. Although there are “hostile media” allegations from the left (eg, Herman and Chomsky, Media Matters, etc), in recent years conservatives have been the most vociferous in alleging media bias and providing an alternative “fair and balanced” media ecosystem. As such, conservatives are exactly the group among whom you’d expect to see the Gentzkow and Shapiro effect concentrated. (I’m bracketing the issue of whether it is justified for conservatives to feel this way since for our purposes only their subjective views are relevant).
  • 57% of respondents who are aware of media financial problems think they’re immaterial to coverage about national and international issues. I’m not one to believe that survey responses have to be logically consistent, but this only makes sense if you think the issue is bias, not man-hours.

The upshot is that my reading of the survey in light of the Gentzkow and Shapiro model is that the way for media outlets to survive and thrive is to engage in what traditionally trained journalists would regard as lower quality, by forsaking the objectivity genre and pandering to their readership’s beliefs. To a large extent that’s what we’ve been seeing already over the last generation as a process of creative destruction.

March 18, 2013 at 10:16 am 2 comments

Payola work

| Gabriel |

Since Tyler recommended me as a payola expert in his MOOC course on media economics, I figured I should create a brief index for the issue.

My work:

Climbing the Charts (Amazon link). Chapter 3 is about payola. It’s by far the most holistic and accessible thing I’ve written on the subject.

Methods piece (with Chiu and Mol) applying diffusion analysis to payola data

Theory piece on the micro-interactions of disreputable exchange. This paper was inspired by reading subpoenaed payola evidence and briefly mentions the case

By other people:

Dannen’s Hit Men. Focuses on the 1980s (when the mob controlled radio) but also covers other periods

Coase. Payola in Radio and Television Broadcasting. Great theory view, though the efficiency argument doesn’t work as well if you assume “nobody knows” and/or imperfect capital markets. Even then, still really good. Also covers a lot of history (much of it from Sanjek and Sanjek)

Evidence from the 2005 settlements between NY state and EMI, Warner, Sony, Universal

Dozens of other things which I cite in my book and articles

March 2, 2013 at 7:24 am 2 comments

The Control Vector

| Gabriel |

(To be sung to the tune of “The Irish Rover”)

On the fourth of July two thousand and six
We plotted density, kernel
We had a parsimonious theory of cliques
To place in the grand flagship journal
In a flurry of chalk, we saw why the nodes so flock
It worked well as community detector
Then we worked out the specs, it had twenty-seven x
We’d specified the Control Vector

We had quadratic of time spent looking for work
We had dummy sets for SIC,
We had three million county-level fixed effects,
We’d a linear spline for distance from Rome
Homicides per hundred thousand!
We had eight million versions of former English colony
All dumped into the Control Vector

There was MLE (iteration four thousand three),
There was Poisson in lieu of a log
There were R libraries that never would work
And instruments nobody believed
There was the psych subject pool, they were drunk as a rule
And Huber-White to solve all problems
And the OECD, if that you can believe
Was the source for half the Control Vector

We were in review round seven when the funding ran out
And the department’s budget was cut
And all our FTE were reduced down to three
Just meself and some deadweight old nuts
Then the server crashed, what can you do with that?
The hard drives were turned right over
Hard crash on the ground, and no backup to be found
That was the last of the Control Vector

February 7, 2013 at 5:59 am

Fake Plutarch Quotes Are the Newest and Most Facile Ailment of All Arguments About Inequality

| Gabriel |

Apparently it’s a thing to quote Plutarch as having said “An imbalance between rich and poor is the oldest and most fatal ailment of all republics.” This phrasing does not appear anywhere in the Project Gutenberg edition of the canonical Clough version of Lives.

It is possible that “oldest and most fatal” is just an unusual translation from the original Greek and so doesn’t turn up in a ctrl-F search, but I am extremely skeptical. As somebody who has actually read Plutarch (and who quotes him accurately in my own syllabus), it doesn’t pass the smell test. Plutarch has a distinctly aristocratic perspective and is more likely to complain about demagogues pandering to the mob than to complain about the dispossession of the poor. For instance, in his lives of the Gracchi he describes the underlying grievances of the depopulation of small farms and the rise of the latifundia, but he also criticizes the Senate for going squishy by offering conciliatory redistributive measures (specifically, a grain dole and colonial land) to the mob, “by gratifying and obliging them with such unreasonable things as otherwise they would have felt it honorable for them to incur the greatest unpopularity in resisting.” Mind you, I think it is entirely fair to read Plutarch and come away with the opinion that the facts he describes provide evidence that inequality is indeed the oldest and most fatal ailment of republics, I just don’t think that’s Plutarch’s own opinion, let alone his language.

Here’s a Google Books search (91 hits), web search (36000 hits), and Google Scholar search (31 hits) for the exact phrase. I also found 8 hits in Lexis-Nexis, one in Proquest dissertations, and 3 hits in Proquest Newspapers but those are hard to link to.

The oldest version I could identify was from 1985, The Longman History of the United States by Hugh Brogan which appears to be a textbook. In the early 1990s it starts making appearances in The Economist and a few books, including Boiling Point by Kevin Phillips. It has a second wave in the last decade, perhaps because Robert Frank used it in a much quoted and recirculated op-ed for the Philadelphia Inquirer. In most of these cases the quote is used as an opening epigraph and in none of them is there any indication of where Plutarch is alleged to have written this, just general dates and descriptions for Plutarch himself (e.g., “1st c. AD historian”). Also note that in some versions the phrase is misattributed to Plato instead of being misattributed to Plutarch — I guess one somewhat recognizable but seldom read figure from antiquity starting with “P” is as good as another.

The best way to get a big picture is with Google ngrams. Unfortunately this only allows searches of 5 word strings so I chose “oldest and most fatal ailment” as the most distinctive part.

[Figure: Google Ngram for “oldest and most fatal ailment”]

As you can see, this phrase goes back to 1966 (although I don’t know in which book as the oldest hit in Google Books is Longman History from 1985) which is older than anything I found but much more recent than all the major English-language versions of Plutarch. It skitters along with the occasional usage and then begins to take off in the 1990s and with a second and larger wave occurring right now. (Btw, here’s a Google Books search just for the shorter phrase, for some reason it gives more and earlier hits, but none older than Hugh Brogan’s Longman History of the United States of America in 1985.)

In contrast if you do Ngram searches for authentic five word strings from Plutarch, some of them don’t turn up at all but others show references dating to the 19th century. For instance, here are Ngrams for some memorable authentic phrases.

Ultimately I can’t identify where this “oldest and most fatal” canard comes from, but I’m pretty sure it ain’t Plutarch and most likely it was just made up in the 1960s. All I have to say is to quote Thomas Jefferson, “he who would falsely ascribe a passage would desecrate a mind.” Or maybe he didn’t say that and actually I just made it up because I think it’s kind of cool to be able to draw on the authority of a memorable historical thinker using archaic sounding language.

[Update, here's America's favorite superhero mayor using this canard.]

December 12, 2012 at 9:24 am 1 comment

Paul’s Letter to the Unskewers

| Gabriel |

CLT never faileth: but whether there be speculations, they shall fail; whether there be talking heads, they shall cease; whether there be punditry, it shall vanish away. For we know in part, and we expound in part. But when the election actually happens, then that which is observed in sample shall generalize to the population. When I was a child, I spake as a child, I understood as a child, I thought as a child: but when I became a man, I put away childish ideas that polls were deliberately biased. For now we see as through a homophilous social network; but then directly observe the population: now I know in part; but then shall I know even as my secret ballot remains unknown. And now abideth parameter, error, CLT, these three; but the greatest of these is CLT.

November 7, 2012 at 5:27 am 5 comments

Within the Margin of Error

It’s election season and that means I have ample opportunities to be annoyed by people misunderstanding how sampling error works. Let’s put aside the popular canard that n/N is a meaningful ratio (a complaint found in innumerable letters to the editor furious that a sample of 1500 is being used to draw inferences about a population of 300 million). Let’s also put aside questions that are about validity rather than sampling error (Bradley effect, cell phones vs landlines, likely voter screens, etc) as in principle these are valid issues even if they are sometimes the objects of motivated reasoning and/or bizarre conspiracy theories as with the whole “unskew” trope.
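
The n/N point can be checked with the textbook formula for the margin of error of a proportion, with and without the finite population correction. This is a sketch that assumes simple random sampling and the worst case of a 50/50 split. The function name and defaults are mine.

```python
import math

def margin_of_error(n, population=None, p=0.5, z=1.96):
    """95% margin of error for a sample proportion.

    If `population` is given, apply the finite population correction,
    which is the only place N enters the calculation.
    """
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return z * se

# Same sample of 1500, wildly different population sizes
for population in (1_000_000, 300_000_000):
    moe = margin_of_error(1500, population)
    print(f"N = {population:>11,}: +/- {moe:.2%}")
```

Both lines come out at roughly two and a half points. What drives precision is n, not n/N, unless the sample is a sizable share of the population.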

What I have in mind is the misunderstanding of “margin of error” that treats all points within the confidence interval as equally likely, as if the central limit theorem implied a bounded uniform distribution instead of a t distribution. A Google search for the phrase “within the margin of error” gives me 863,000 hits, 14,500 of which are from the last month. For instance, imagine a poll showed the president up by 2 points with a 4 point margin of error and the Romney campaign said “we’re not worried as that’s within the poll’s margin of error.” Well, sure, but the smart money would still be on the president. Indeed, we can quantify by exactly how much.

Anyway, I’m going too fast for those of you in the back of the class. Let’s back up to the beginning. For starters, the term “margin of error” is just a heuristic for explaining to people who don’t really understand statistics that you have to take the point estimate (i.e., the headline figure) with a grain of salt. In real statistics we generally speak of “standard error” and margin of error is just double the standard error (strictly 1.96 times, but double is close enough). It’s doubled because that gives you enough wiggle room that the correct answer will be in that range 95% of the time and by convention statistics usually sets 5% as the acceptable rate of error from statistical inference. (This is also what we use in most scientific journals). So if you want to interpret poll results like a pro, the first thing you do is cut the margin of error in half and that’s your standard error.

The way you interpret standard error is by realizing that sampling error follows a t distribution, which with the exception of very small datasets is the same thing as a normal distribution (i.e., “the bell curve”). (Thanks to the Central Limit Theorem it doesn’t matter if the underlying thing you’re measuring follows a normal distribution or not, an infinite number of repeated estimates of its mean will still follow a normal distribution.) Standard error is the standard deviation (or “sigma”) of this bell curve of repeated estimates. Your point estimate is the center of the curve and you measure different alternate possibilities by their difference from the point estimate divided by the standard error. The thing that talk about “margin of error” misses is that possibilities that are close to the point estimate are much more likely than possibilities that are at the edge of the margin. In a normal distribution, 68% of the density is within one standard deviation of the mean, 95% within two sigmas, and 99.7% within three sigmas. As you may have noticed, 68% is a lot bigger than 27% (i.e., 95%-68%). So if a poll says 52% of people favor the president and the margin of error is 4 points, there’s a 68% chance that the actual number favoring the president is between 50% and 54% and a 95% chance that the number favoring the president is between 48% and 56%.
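
If you’d rather compute the sigma densities than memorize them, here is a small Python sketch. It uses the identity that the share of a normal distribution within k sigmas is erf(k/√2), and it takes the standard error to be half the margin of error as above. The function names are mine.

```python
import math

def coverage(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

def interval(point, margin_of_error, k):
    """Range k standard errors either side of the point estimate,
    taking the standard error to be half the margin of error."""
    se = margin_of_error / 2
    return (point - k * se, point + k * se)

for k in (1, 2, 3):
    print(f"within {k} sigma: {coverage(k):.1%}")   # 68.3%, 95.4%, 99.7%

print(interval(52, 4, 1))   # (50.0, 54.0)
print(interval(52, 4, 2))   # (48.0, 56.0)
```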

You may have noticed that in a country where elections are decided by majorities, the change from 54% to 56% is much less interesting than the shift from 50% to 48%. Indeed, the most meaningful way to interpret a poll is probably to ask what are the chances that a candidate’s actual level of support is lower than 50%. This is a special case of a “one-tailed test,” which means we don’t care about plus and minus, but only plus or minus. In this case since our point estimate is above the interesting threshold, we only care about the minus or left tail. Take our point estimate of 52% and subtract 50% for majority and our estimate is 2 percentage points above a majority. This equals our standard error, so the one-tailed test is at one standard deviation out. If you remember that normal is symmetrical, you know how many tails you want, and you’ve memorized the 68/95/99.7 densities for a standard normal distribution, then you can calculate it in your head. If not (or if you’re not dealing with integer sigmas) you can use the NORM.S.DIST() function in Excel or the normal() function in Stata. With our example of a point estimate of 52% and a margin of error of 4 points, you find there’s an 84% chance that the true answer is a bare majority or higher. This is technically within the “margin of error,” but it’s also about 5 to 1 odds, which would be great odds to have if you were playing blackjack. Bottom line, there’s nothing magical about being just inside versus just outside the margin of error. If you’re down, you’re down.
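
The same one-tailed calculation can be sketched in Python for anyone without Excel or Stata to hand. The normal CDF is built from `math.erf`, which is my substitution for the spreadsheet and Stata functions named above, and the function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def chance_above(threshold, point, margin_of_error):
    """Chance the true value exceeds `threshold`, treating the standard
    error as half the margin of error and sampling error as normal."""
    se = margin_of_error / 2
    z = (point - threshold) / se
    return normal_cdf(z)

p = chance_above(50, point=52, margin_of_error=4)
print(f"chance of a majority: {p:.0%}")    # 84%
print(f"odds: {p / (1 - p):.1f} to 1")     # 5.3 to 1
```

A poll sitting exactly at the edge of its margin (say 54% with a 4 point margin) gives `chance_above(50, 54, 4)` of about 98%. That is better, but it is a difference of degree, not of kind.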

October 1, 2012 at 6:05 am 3 comments
