Bootstrapping superstars

March 19, 2009 at 7:25 pm

| Gabriel |

Most cultural products and cultural workers follow a scale-free or power-law distribution for success, with a tiny handful of ludicrously successful products/workers, a fairly small number with a modicum of success, and a truly ginormous number that are absolute failures. Ever since Sherwin Rosen described this phenomenon in an influential American Economic Review theory piece, it has been nicknamed “the superstar effect.” For a review of the major theories as to why there is a superstar effect, check out this lecture from my undergrad course (mp3 of the lecture and pdf of the slides).

One methodological problem this creates is that if you are interested in describing the overall market share of abstract categories, the measure is highly sensitive to flukes. For instance, say you were interested in drawing trend lines for the sales volumes of different book genres and you noticed that in 2002 there was a decent-sized jump in sales of the genre “Christian.” One interpretation would be that this is a real trend; for instance, you could make up some post hoc explanation that after the 9/11 terrorist attacks people turned to God for comfort. Another interpretation would be that there was no trend and all this reflects is that one book, The Purpose Driven Life, was a surprise hit. Distinguishing statistically between these two interpretations is surprisingly hard, because it’s very hard (at least for me) to figure out how to model the standard error of a ratio based on an underlying count.

Fortunately you don’t have to, because when in doubt about error structure you can just bootstrap it. My solution is to bootstrap on titles and then calculate the ratio variable (e.g., genre market share) from each bootstrapped sample of titles. You can then use the standard deviation of the bootstrapped distribution of ratios as a standard error. To return to our example of book genres, we could bootstrap book titles in 2001 and 2002 and calculate a bootstrapped distribution of estimates of Christian market share for book sales in each year. You then do a t-test of means, using those bootstrapped standard errors, to see whether 2002 was statistically different from 2001 or whether any apparent difference is just the result of a few fluke hits. In other words, was 2002 a good year for Christian books as a genre, or just a good year for Rick Warren (whose book happened to be in that genre)?
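
To make the quantities in that paragraph concrete, here is one way to write them down. The notation is mine and is only a sketch: s_i is the sales of title i, d_i is the 0/1 trait dummy, w_i^{(b)} is the number of times title i is drawn in bootstrap replicate b (of B), and t indexes the period. The last line is one conventional way to compare two periods using the bootstrapped standard errors; it leans on a normal approximation that may be rough when sales are as skewed as superstar markets tend to be.

```latex
% Trait share in period t for bootstrap replicate b
\hat{p}_t^{(b)} = \frac{\sum_{i \in t} w_i^{(b)} d_i s_i}{\sum_{i \in t} w_i^{(b)} s_i}

% Bootstrapped standard error: the standard deviation across the B replicates
\widehat{SE}_t = \sqrt{\frac{1}{B-1} \sum_{b=1}^{B} \left( \hat{p}_t^{(b)} - \bar{p}_t \right)^2},
\qquad \bar{p}_t = \frac{1}{B} \sum_{b=1}^{B} \hat{p}_t^{(b)}

% One possible test statistic for 2002 versus 2001 (an assumption, not the only choice)
z = \frac{\hat{p}_{2002} - \hat{p}_{2001}}{\sqrt{\widehat{SE}_{2002}^2 + \widehat{SE}_{2001}^2}}
```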

Here’s some Stata code to create bootstrapped samples of titles, weight them by sales, and record the proportion of sales with the relevant trait:

*seed file that each run's results get appended to
clear
set obs 1
gen x=.
save results.dta, replace
*arbitrary seed so the runs are reproducible
set seed 12345
*var desc (for salesdatset)
*  title  -- a title
*  sales  -- some measure of success
*  trait  -- dummy for some trait of interest for title (eg genre, author gender, etc)
*  period -- period of the obs (eg, year)
forvalues i=1/1000 {
 *reload the full data each run, since collapse destroys it
 use salesdatset, clear
 *resample titles with replacement within each period
 bsample, strata(period)
 gen traitsales = trait*sales
 ren sales salestotal
 collapse (sum) traitsales salestotal, by(period)
 gen traitshare=traitsales/salestotal
 drop traitsales salestotal
 gen bs=`i' /*records the run of the bstrap */
 reshape wide traitshare, i(bs) j(period)
 append using results
 save results.dta, replace
}
use results, clear
*drop the empty seed row and its placeholder variable
drop if missing(bs)
drop x
*for each traitshare variable, the sd can be interpreted as bstrapped standard error
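
Once results.dta exists, comparing two periods might look something like the sketch below. It assumes period took the values 2001 and 2002, so that reshape produced variables named traitshare2001 and traitshare2002; those names are my assumption and will differ with your data. Rather than a formal t-test, it uses a swapped-in alternative: the percentile interval of the replicate-by-replicate difference, which is reasonable here because bsample draws each period's titles independently.

```stata
* Sketch only: traitshare2001 and traitshare2002 are assumed variable names
use results, clear
* the seed row that started results.dta has no run number, so drop it
drop if missing(bs)

* bootstrapped standard error of the share in each period
summarize traitshare2001
scalar se2001 = r(sd)
summarize traitshare2002
scalar se2002 = r(sd)
display "SE 2001: " se2001 "   SE 2002: " se2002

* difference in share within each bootstrap run
gen diff = traitshare2002 - traitshare2001
summarize diff
display "bootstrap SE of the difference: " r(sd)

* 95% percentile interval for the difference
_pctile diff, percentiles(2.5 97.5)
display "95% interval: " r(r1) " to " r(r2)
```

If the interval comfortably excludes zero, the jump looks like more than a single fluke hit; if it straddles zero, the Rick Warren explanation is hard to rule out.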
