Team Sorting
| Gabriel |
Tyler Cowen links to an NBER paper by Hoxby that shows that in recent decades, status sorting has gotten more intense for college. Cowen asks “is this a more general prediction in a superstars model?” The archetypal superstar system is Hollywood, and here’s my quick and dirty stab at answering Tyler’s question for that field. Faulkner and Anderson’s 1987 AJS showed that there is a lot of quality sorting in Hollywood, but they didn’t give a time trend. As shown in my forthcoming ASR with Esparza and Bonacich, there are big team spillovers so this is something we ought to care about.
I’m reusing the dataset from our paper, which is a subset of IMDB for Oscar eligible films (basically, theatrically-released non-porn) from 1936-2005. If I were doing it for publication I’d do it better (i.e., I’d allow the data to have more structure and I’d build confidence intervals from randomness), but for exploratory purposes the simplest way to measure sorting is to see if a given film had at least one prior Oscar nominee writer, director, and actor. From that I can calculate the odds-ratio of having an elite peer in the other occupation.
Overall, a movie with at least one prior nominee writer has 7.3 times the odds of other films of having a prior nominee director and 4.4 times the odds of having a prior nominee cast. A film whose cast includes a prior nominee has 6.5 times the odds of having a prior nominee director. Of course we already knew there was a lot of sorting from Faulkner and Anderson; the question suggested by Hoxby/Cowen is how the sorting has changed over time.
This little table shows odds-ratios for cast-director, writer-director, and writer-cast. Big numbers mean more intense sorting.
     +--------------------------------------------+
     |    decade         cd         wd         wc |
     |--------------------------------------------|
  1. | 1936-1945   6.545898   6.452388   4.306554 |
  2. | 1946-1955   9.407476   6.425553   5.368151 |
  3. | 1956-1965   12.09229   8.741302   6.720059 |
  4. | 1966-1975   4.697238   5.399081   4.781106 |
  5. | 1976-1985   4.113508   6.984528   4.450109 |
  6. | 1986-1995   4.923809   7.599852   3.301461 |
  7. | 1996-2005   4.826018   12.35915   3.641975 |
     +--------------------------------------------+
The trend is a little complicated. For collaborations between Oscar-nominated casts on the one hand and either writers or directors on the other, the sorting is most intense in the 1946-1955 decade and especially the 1956-1965 decade. My guess is that this is tied to the decline of the studio system and/or the peak power of MCA. The odds-ratio of a good director for nominated vs. non-nominated writers also jumps around the end of the studio system, but it seems there’s a second jump starting in the 80s. My guess is that this is an artifact of the increasing number of writer-directors (see Baker and Faulkner AJS 1991), but it’s an empirical question.
Putting aside the writer-director thing, it seems that sorting is not growing stronger in Hollywood. My guess is that ever more intense sorting is not a logical necessity of superstar markets, but has to do with contingencies, such as the rise of a national market for elite education in Hoxby’s case or the machinations of Lew Wasserman and Olivia de Havilland in my case.
The Stata code is below (sorry that WordPress won’t preserve the whitespace). The data are film-level, with dummies for having at least one prior nominee in each of the three occupations.
global parentpath "/Users/rossman/Documents/oscars"
capture program drop makedecade
program define makedecade
gen decade=year
recode decade 1900/1935=. 1936/1945=1 1946/1955=2 1956/1965=3 1966/1975=4 1976/1985=5 1986/1995=6 1996/2005=7
capture lab drop decade
lab def decade 1 "1936-1945" 2 "1946-1955" 3 "1956-1965" 4 "1966-1975" 5 "1976-1985" 6 "1986-1995" 7 "1996-2005"
lab val decade decade
end
cd $parentpath
capture log close
log using $parentpath/sorting_analysis.log, replace
use sorting, clear
makedecade
*do odds-ratio of working w oscar nom, by own status
capture program drop allstar
program define allstar
preserve
if "`1'"!="" {
keep if decade==`1'
}
tabulate cast director, matcell(CD)
local pooled_cd=(CD[2,2]*CD[1,1])/(CD[1,2]*CD[2,1])
tabulate writers director, matcell(WD)
local pooled_wd=(WD[2,2]*WD[1,1])/(WD[1,2]*WD[2,1])
tabulate writers cast, matcell(WC)
local pooled_wc=(WC[2,2]*WC[1,1])/(WC[1,2]*WC[2,1])
shell echo "`pooled_cd' `pooled_wd' `pooled_wc' `1'" >> sortingresults.txt
restore
end
shell echo "cd wd wc decade" > sortingresults.txt
quietly allstar
forvalues t=1/7 {
quietly allstar `t'
}
insheet using sortingresults.txt, delimiter(" ") names clear
lab val decade decade
*have a nice day
November 8, 2009
A Note on the Uses of Official Statistics
| Gabriel |
One of the points I like to stress to my grad students is that data is not an objective (or even unbiased) representation of reality but the result of a social process. The WSJ had a story recently on how we get the “jobs created or saved” figures around the stimulus bill and it makes me want to burn my Stata dvd, take a two-hour shower, and then switch to qualitative methods where at least I know that I would be responsible for any validity problems in my work.
The idea of “jobs created or saved” by a government policy is a meaningful concept in principle but in practice it’s essentially impossible to reckon with any certainty. It’s the kind of problem you might be able to approach empirically if it happened many times and there was some relatively exogenous instrument, but in a single instance you’re probably better off using an answer derived from theory than actually trying to measure it. Nonetheless the political process demands that it be answered empirically and the results are absurd.
The way the government has tried to measure “jobs created or saved” by the stimulus is by simply asking contractors or subcontractors how many jobs were created or saved in their firm by the contract. This involves both false positives of contractors exaggerating the number of jobs they created or saved and false negatives of firms that were not direct beneficiaries of contracts but increased or retained production in expectation of benefiting from the multiplier. In the case covered by the WSJ, a shoe store that sold nine pairs of boots for $100 each to the Army Corps of Engineers didn’t know what else to put and so said they saved nine jobs. When asked about this by the WSJ, the shoe store owner’s daughter/bookkeeper replied:
“The question, I would like to know is: How do you answer that? Did we create zero? Is it creating a job because they have boots and go out and work for the Corps? I would be really curious to hear how somebody does create a job. The formula is out there for anyone to create, and it’s just so difficult,” she said.
Who’d a thunk it, but apparently FA Hayek was reincarnated as a shoe store worker in Kentucky.
(h/t McArdle)
November 4, 2009
Astro-baptists
| Gabriel |
On NPR the other day I heard a story about how a lobbyist forged letters to Congress from the NAACP and AAUW opposing the Waxman-Markey cap-and-trade bill. I thought this was amusing on several levels, only the first of which is that apparently the bill wasn’t convoluted and toothless enough to buy off all of the incumbent stakeholders, as some of them hired this guy. The real interest though is that the blatant absurdity of this story heightens the basic dynamic of the bootlegger and Baptist coalition: in this case the bootlegger was so desperate for a Baptist that he imagined one, much as the too-good-to-be-true quotes conjured by fabulist reporters heighten the absurd genre conventions of journalism.
The bootlegger and Baptist model is a part of public choice theory that argues that policy making often involves a coalition between stakeholders motivated by rent-seeking and ideologues with principled positions. In the titular example, the policy is blue laws, which would be supported both by Baptists who don’t like booze violating the sabbath and by clandestine alcohol entrepreneurs delighted to see demand pushed from legitimate retailers to the black market. We had something close to a literal bootlegger-Baptist model with the Abramoff scandal, in which various gambling interests paid the Christian Coalition to kneecap the competition. Another recent prominent example is that, before being airbrushed out of history for having, ahem, unorthodox political affiliations, Van Jones was best known for “green jobs,” which can be uncharitably described as a bit of political entrepreneurship proposing a grand bargain in which his constituents would get patronage jobs in exchange for supporting green policies.
Although bootlegger-Baptist is an econ model, soc and OB folks independently arrived at this same model by noting that resource dependence on the state is not a pure Tullock lottery, but is contingent on facial legitimacy. If you read chapter 8 of External Control of Organizations you’ll see that it’s not only the bridge between resource dependence and neo-institutionalism, but also a bootlegger-Baptist model avant la lettre.
One of the interesting things is that lately civil rights groups seem to have been the (real or imagined) Baptists of choice, and not just in the anti-Waxman-Markey forgery. So for instance a few weeks ago 72 Democratic Congressmen sent a letter to the FCC opposing net neutrality. It’s not surprising that the blue dogs were among them as you’d expect fiscal conservatives to oppose a new regulation. The interesting thing is that the letter was also signed by most of the Congressional Black Caucus, as well as “the Hispanic Technology and Telecommunications Partnership, the National Association for the Advancement of Colored People (NAACP), the Asian American Justice Center.” Their (plausible) logic was essentially that preventing telecoms from charging content providers would delay the rollout of broadband and therefore maintain the digital divide. So here we have an issue combining rent-seeking telecoms hoping to soak content providers and prevent competition from VOIP forming a coalition with civil rights groups and their legislative allies who have a principled commitment to eliminating inequality in use of technology.
I got total deja vu when I read this as the exact same thing happened a few years ago when Nielsen was attacked by the Don’t Count Us Out Coalition. The backstory is that Nielsen and Arbitron traditionally rely on diaries to collect the audience data that is used to set advertising rates. Unfortunately respondents are too lazy/stupid to complete diaries accurately. In recognition of this problem both Arbitron and Nielsen have been trying to switch to more accurate passive monitoring techniques that aren’t dependent on the diligence and recall of the respondent, but they still use diaries for sweeps.
Nielsen had the bright idea of the Local People Meter project, which would eliminate sweeps diaries in the largest media markets and rely entirely on a large continuous rolling sample using passive monitoring. This implies a substantial improvement in data quality for a large part of the advertising market. This sounds like a good thing but Nielsen found itself attacked by the “Don’t Count Us Out Coalition” which argued that Nielsen was a racist monopoly, mostly on the basis that in one or two of the test markets for LPM they undersampled blacks. The “Coalition” got some serious support in Congress until Nielsen was able to demonstrate that it was just an astroturf* group set up by NewsCorp, which stood to see a ratings drop under the improved technology. (Or more technically, the new technology would reveal that the old technology had been exaggerating the ratings of NewsCorp properties. Peterson and Anand have a great article on a similar dynamic in recorded music sales).
—–
*Given the rather promiscuous way that people throw around the term “astroturf” it’s necessary to clarify the term. I reserve the term “astroturf” exclusively for fax machine and letterhead operations organized by a lobbyist, pr firm, or the like. It is not analytically useful to extend the term to cover things like the tea parties where elites mobilize ordinary people to come and protest. If you want to distinguish such things from the Platonic ideal of grassroots mobilization fine, call them “fertilizing the grassroots” or something, but astroturf they ain’t. Likewise, it is lazy and slanderous conspiracy-mongering to assume without further evidence that anyone who takes the same position on an issue as a stakeholder must of course be bought by the stakeholder. If you want to echo Orwell and call such people “objectively pro-X” then fine, but that don’t mean the Baptist lacks principled reasons for siding with the bootlegger on a particular issue.
November 4, 2009
Don said you were the market, and you were
| Gabriel |
AdAge has a report on who drinks different kinds of beer. For instance, it describes Heineken drinkers as “They love their brand badges—a role the distinctive green glass bottle may play—and in fact, this group is attracted to luxury products in general.” Ouch, better hope Betty Draper doesn’t read Don’s copy of AdAge.
Anyway, I mention it because this speaks to cultural capital, which for a long time was about musical taste but is increasingly focused on food. Likewise, there’s some very good niche partitioning literature on beer, which has been pretty salient lately given the advertising blitz for “BL Golden Wheat” (Anheuser-Busch’s brand of hefeweizen).
November 3, 2009
Stata 11 FV and margins
| Gabriel |
Yesterday I attended the ATS workshop on the new factor variables and margins syntax in Stata 11. Despite the usual statistical usage of the word “factor,” this has nothing to do with eigenvectors and multi-dimensional scaling but is really about dummy sets and interactions. I might still be missing something, but it seems like the factor variables syntax is only an incremental improvement over the old “xi” syntax, mostly because it’s more elegant.
However the margins command is really impressive and should go a long way to making nonlinear models (including logit) more intelligible. I think a big reason people have p-fetishism is because with a lot of models it’s difficult to understand effect sizes. For this reason I like to close my results section with predicted values for various vignettes. I had been doing this in Excel or Numbers but “margins” will make this much easier, especially if I continue to experiment with specifications. (In general, I find that if you’re doing something once, GUI is faster than scripting, but we never just do something once so scripting is better in the long run). Anyway, it’s a very promising command.
My only reservation about both “factor variables” and “margins” is the value labeling. First, (like “xi”) neither command carries through value labels so you have to remember what occupation 3 is instead of it saying “sales.” Second, the numbers aren’t even consistent between factor variables and margins. Factor variables show the value of the underlying variable whereas margins numbers the categories sequentially. So for instance, your basic dummy would be “0” for no and “1” for yes in factor variables because that’s how it’s stored in memory and “1” for no and “2” for yes in margins because “no” is the first category. What is this, SPSS? Anyway, margins is a very useful command, but it would be even more useful if the command itself or some kind of postestimation or wrapper ado file made the output more intuitive. Not that I’m volunteering to write it. Help us Ben Jann, you’re our only hope!
7 comments October 29, 2009
Why Jay Leno is like classical music
| Gabriel |
As the logical conclusion of a trend that began with reality tv, NBC has concluded that it’s just too expensive to make scripted television. And so they filled their 10pm slot every weeknight with a dirt cheap variety show. Not surprisingly, the show has much lower ratings and ad revenues than the traditional one hour scripted dramas that filled the slot until this season. On the one hand it’s embarrassing for the network that ruled tv in the 1990s to embrace a low-cost, low-revenue model or, as some industry people call it, the “winning by losing” model. On the other hand, it’s a much more profitable strategy because low as the revenues are, the costs are even lower.
People have mostly been focusing on the short-term revenues and in that sense NBC has indeed made the smart (but shameful) call. The long-run picture is more uncertain, even if you put aside Podolny-esque status issues and affiliate defections. Scripted television has always lost money on the first run and only turned a profit over the long-run. For decades this was mostly an issue of syndication (re-run) rights but for the last ten years it’s been dvd box sets. In contrast, non-scripted tv produces basically no long-term revenues — almost nobody watches ESPN Classic or buys dvds of game shows or “the Tonight Show”. There have been some troubling signs lately for the long-run revenue streams. Gilmore Girls is one of the best shows ever on television and you can buy the dvds for $20/season, a huge drop from the $50 or $60 a season the studios were asking a few years ago for tv. Likewise, it’s not clear that the studios will be able to effectively monetize streaming video, despite Rupert Murdoch’s attempt to get media companies to charge for content.
Thus you can read the Leno-ization of NBC as not just the idea that drama production budgets have gotten out of control, but also a bet that streaming video won’t produce long-term revenue streams at all comparable to those produced by syndication or dvd. Note that SAG and WGA seem to be making the opposite bet, as for the past few years Hollywood labor has been doing extremely painful strikes and soft strikes primarily over residuals on streaming. My hunch is that NBC is right and the unions are wrong on this, but it’s an empirical question.
The other interesting thing is that NBC is doing the exact opposite of what public radio did in the 80s and 90s. Originally, public radio mostly consisted of classical music and jazz djs, which was dirt cheap content to produce but brought in little revenue. Then CPB and NPR bought some Arbitron reports and noticed that “All Things Considered” brought in the lion’s share of listeners. They checked pledge drive data and found that it also brought in most listener contributions. On this basis NPR added “Morning Edition,” and a little later, “Weekend Edition,” and it’s gotten to the point that pretty much all public radio stations play news and talk either in their best time slots (e.g., KCRW Santa Monica) or pretty much 24/7 (e.g., KPCC in Pasadena, WHYY in Philadelphia). Of course news is much more expensive to produce than just hiring a dj to spin Bach, but it also brings in more numerous, young, and affluent listeners. So we’ve seen public radio experience a shift from low revenue & low cost to high revenue & high cost, pretty much exactly the opposite of what NBC is doing this season.
2 comments October 28, 2009
Copy mac files when booting from dvd
| Gabriel |
One of the frustrating things about the Mac is that there’s no such thing as a live cd (and live cds for Windows and Linux can’t read HFS disks). Of course you can boot from the installer dvd, but it doesn’t have the Finder. If you have problems booting from your internal disk and you don’t have a reasonably current backup this can induce alternating waves of panic and despair. (I’m speaking from experience. I’ve screwed up my partition table by playing with gparted. Actually, I’ve done this twice — as a dog returns to his vomit so a fool returns to his folly).
However you can still copy files because the installer dvd does have the Terminal, and the Terminal can invoke the command “cp”. Here’s how to do it.
- Put the dvd in and restart, tapping option so it lets you choose the dvd.
- Choose a language, then instead of installing the OS, go to the Utilities menu and choose Terminal
- Plug in a USB drive and type “ls /Volumes”. Figure out which one is your USB drive and which one is your internal drive, and write both names down. If it doesn’t recognize the USB drive you’ll need to mount it.
- Use “cd” to navigate to your internal disk and find your most important files, which are probably in “/Volumes/Macintosh HD/Users/yournamehere/Documents”
- Use the “cp source target” command to copy files from the internal disk to the USB disk. To copy a directory use the -R option. For example to copy the directory “bookmanuscript” you’d use something like
cp -R '/Volumes/Macintosh HD/Users/yournamehere/Documents/bookmanuscript' /Volumes/USBdisk
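The steps above can be sketched as a small shell function. This is only a sketch under stated assumptions: the function name rescue_copy is hypothetical, the volume and folder names in the usage comment are the placeholders used above, and the closing "diff -r" check is an extra step that assumes diff is available in the installer's Terminal, which may not be the case.

```shell
#!/bin/sh
# rescue_copy SRC DEST_DIR  (hypothetical helper, not part of the original steps)
# Copy the directory SRC into DEST_DIR, then confirm the two trees match.
rescue_copy() {
    # Strip one trailing slash so cp copies the directory itself, not just its contents.
    src=${1%/}
    dest_dir=${2%/}
    # Refuse to run unless both ends exist (for example, the USB drive is mounted).
    [ -d "$src" ] || { echo "source not found: $src" >&2; return 1; }
    [ -d "$dest_dir" ] || { echo "target not found: $dest_dir" >&2; return 1; }
    # -R copies the directory and everything under it.
    cp -R "$src" "$dest_dir/" || return 1
    # diff -r prints nothing and exits 0 when the two trees are identical.
    diff -r "$src" "$dest_dir/$(basename "$src")"
}

# Example with the placeholder names from the steps above:
# rescue_copy '/Volumes/Macintosh HD/Users/yournamehere/Documents/bookmanuscript' /Volumes/USBdisk
```

Quoting the paths matters because volume names like "Macintosh HD" contain a space.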
October 27, 2009
Shufflevar
| Gabriel |
Sometimes you face a situation where it’s really hard to see what the null is because the data structure is really complicated and there is all sorts of nonlinearity, etc. Analyses of non-sparse square network matrices can use the quadratic assignment procedure, but you can do something similar with other data structures, including bipartite networks.
A good null keeps everything constant, but shows what associations we would expect were association random. The simplest way to do this is to keep the actual variable vectors but randomly sort one of the vectors. So for instance, you could keep the actual income distribution and the actual values of people’s education, race, etc, but randomly assign actual incomes to people.
Fernandez, Castilla, and Moore used what was basically this approach to build a null distribution of the effects of employment referrals. Since then Ezra Zuckerman has used it in several papers on Hollywood to measure the strength of repeat collaboration. I myself am using it in some of my current radio work to understand how much corporate clustering we’d expect to see in the diffusion of pop songs under the null hypothesis that radio corporations don’t actually practice central coordination.
I wrote a little program that takes as its argument the variable you want shuffled. It has a similar application to bsample, and like bsample it’s best used as part of a loop.
capture program drop shufflevar
program define shufflevar
local shufflevar `1'
tempvar oldsortorder
gen `oldsortorder'=[_n]
tempvar newsortorder
gen `newsortorder'=uniform()
sort `newsortorder'
capture drop `shufflevar'_shuffled
gen `shufflevar'_shuffled=`shufflevar'[_n-1]
replace `shufflevar'_shuffled=`shufflevar'[_N] in 1/1
sort `oldsortorder'
drop `newsortorder' `oldsortorder'
end
Here’s an example to show how much clustering of “y” you’d expect to see by “clusterid” if we keep the observed distributions of “y” and “clusterid” but break any association between them:
shell echo "run rho" > _results_shuffled.txt
forvalues run=1/1000 {
disp "iteration # `run' of 1000"
quietly shufflevar clusterid
quietly xtreg y, re i(clusterid_shuffled)
shell echo "`run' `e(rho)'" >> _results_shuffled.txt
}
insheet using _results_shuffled.txt, names clear delimiter(" ")
histogram rho
sum rho
(Note that “shell echo” only works with Mac/Unix; Windows users should try postfile).
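For readers who haven't used the shell much, here is what the "shell echo" lines do when typed directly into a Terminal. It is a toy illustration only: the temporary file and the three rho values are made up, and nothing here comes from the actual analysis.

```shell
#!/bin/sh
# Toy illustration of the results-file pattern; file name and values are invented.
results=$(mktemp)

# ">" creates the file or truncates an existing one, so rows left over
# from an earlier run are discarded and the header is written exactly once.
echo "run rho" > "$results"

# ">>" appends, adding one space-delimited row per iteration.
for run in 1 2 3; do
    echo "$run 0.0$run" >> "$results"
done

# Count the data rows (every line after the header).
awk 'NR > 1 { n++ } END { print n }' "$results"
```

The practical upshot is that the header line must use ">" and the loop must use ">>". Swap them and you either keep stale rows from the last run or overwrite the file on every iteration.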
2 comments October 26, 2009
La vie en mort
| Gabriel |
Denis Colombi has contributed a few entries to the thriving sociology of zombies literature. His abstracts (including a description of zombie habitus) are all funny, but for my money the most sublime satire is his rational choice marginal analysis of zombie equilibrium:
Il faut comprendre les zombis en restituant les “bonnes raisons” de devenir zombis, afin de le faire apparaître comme un comportement rationnel. Ainsi, le choix de devenir ou non zombi dépend avant tout d’un calcul en fonction du rendement espéré de cette transformation. L’agrégation de ces comportements se traduit par un effet émergents, à savoir la réduction du nombre d’humains non-zombifiés ce qui réduit les gains de sa propre zombification. On peut ainsi parler d’une inflation zombifique, comme pour les diplômes.
It’s been a long time since lycée, but here’s my loose translation
We can understand zombies by restoring the “good reasons” to become zombies and thus make it apparent that it is a rational behavior. Thus the choice to become a zombie or not depends primarily on a calculation based on the expected value of this transformation. The aggregation of these behaviors results in an emergent phenomenon, that is increasing the number of zombies reduces the marginal value of zombification. We can thus speak of zombification inflation, as with credential inflation for diplomas.
1 comment October 23, 2009
Time Machine and rsync
| Gabriel |
I think Time Machine is one of the best features of Leopard / Snow Leopard, but I still have a few issues with it.
First, I’m really not interested in having a Spotlight index of my Time Machine drive, so I go to System Preferences / Spotlight / Privacy and add my Time Machine volume to the “do not index” list. This isn’t so much a privacy issue as a performance issue since the Spotlight indexer (“mdworker”) is a real hog so why have it index stuff you don’t plan to search?
Second, Time Machine doesn’t work well with more than one backup volume, especially if you want to update one of the backups infrequently or back up different directories to each drive. In my case I have a large drive that I keep at work and a small backup drive that I keep at home in case my office burns down and destroys both my mac and the big backup drive. To use Time Machine for both disks, I would not only need to “select disk” but also “exclude items” because the disk I keep at home isn’t big enough to hold everything. Furthermore if I skip a few weeks of backing up to the home disk, Time Machine refuses to do an incremental backup.
My solution to this is to use Time Machine for the main backup drive and rsync for the second one. Every day I use Time Machine with my big backup drive at the office. Once a week or so at home I take my redundant backup drive (“seagate”) out of the drawer, plug it in, and run this shell script.
#!/bin/bash
#backup_seagate.sh
rsync -aE --delete ~/Documents/ /Volumes/seagate/rossman/Documents
rsync -aE --delete ~/Library/ /Volumes/seagate/rossman/Library
rsync -aE --delete ~/scripts/ /Volumes/seagate/rossman/scripts
rsync -aE --delete ~/Pictures/ /Volumes/seagate/rossman/Pictures
rsync -aE --delete ~/Music/ /Volumes/seagate/rossman/Music
rsync -aE --delete ~/Applications/ /Volumes/seagate/rossman/Applications
Note that the version of rsync that ships with OS 10.5 or 10.6 is pretty old. If you install the current version, it will handle the resource fork more efficiently. There are instructions here but for my purposes it’s not worth the hassle.
[Update1: USB flash drives work well as your off-site backup because they are easier to transport than hard drives, being smaller and lacking moving parts. However you'll need to use Disk Utility to change the file system from FAT to HFS+].
[Update2: Be careful with rsync as the syntax is important. It needs to be "command options source target"; if you reverse source and target you're pretty much screwed].
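One way to hedge against the reversed-arguments problem, or against running the script when the backup drive isn't plugged in, is to wrap rsync in a small guard and preview with --dry-run first. This is a sketch, not the script I actually run: the function name sync_dir is hypothetical, and it leaves out -E because that flag means different things in different rsync versions (you can pass it as an extra option if your rsync wants it).

```shell
#!/bin/sh
# sync_dir SRC DEST [extra rsync options...]   (hypothetical wrapper)
# Mirror the contents of SRC into DEST, refusing to run if either end looks wrong.
sync_dir() {
    src=${1%/}
    dest=$2
    shift 2
    # The source must be an existing directory.
    [ -d "$src" ] || { echo "no such source: $src" >&2; return 1; }
    # The target's parent must already exist; it will not if the backup
    # drive is unplugged, so nothing is deleted or written in that case.
    [ -d "$(dirname "$dest")" ] || { echo "backup volume not mounted? $dest" >&2; return 1; }
    # Trailing slash on the source means "copy its contents into DEST".
    rsync -a --delete "$@" "$src/" "$dest"
}

# Preview first: --dry-run lists what would be copied or deleted without changing anything.
# sync_dir ~/Documents /Volumes/seagate/rossman/Documents --dry-run --verbose
# Then run it for real:
# sync_dir ~/Documents /Volumes/seagate/rossman/Documents
```

The guard can't tell whether you meant the arguments the other way around, so the dry run is still the step that catches a genuine source/target swap.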
October 23, 2009