Archive for November, 2010

Conditioning on a Collider Between a Dummy and a Continuous Variable

| Gabriel |

In a post last year, I described a logical fallacy of sample truncation that helpful commenters explained to me is known in the literature as conditioning on a collider. As is common, I illustrated the issue with two continuous variables, where censorship is a function of their sum. (Specifically, I used the example of physical attractiveness and acting ability for a latent population of aspiring actresses and an observed population of working actresses to explain the paradox that Megan Fox was voted both “sexiest” and “worst” actress in a reader poll.)

In revising my notes for grad stats this year, I generalized the problem to cases where at least one of the variables is categorical. For instance, college admissions is a censorship process (only especially attractive applicants become matriculants), and attractiveness to admissions officers is a function of both categorical distinctions (legacy, athlete, artist or musician, underrepresented ethnic group, in-state for public schools or out-of-state for private schools, etc.) and continuous ones (mostly SAT and grades).

For simplicity, we can restrict the issue to just SAT and legacy. (See the various empirical studies and counterfactual extrapolations by Espenshade and his collaborators for how it works with the other things that determine admissions.) Among college applicant pools, the children of alumni of prestigious schools tend to score about a hundred points higher on the SAT than other high school students do. Thus the applicant pool looks something like this.

However, many prestigious colleges have policies of preferring legacy applicants. In practice this means that the child of an alum can still be admitted with an SAT score about 150 points below that of a non-legacy student. Thus admission is a function of both SAT (a continuous variable) and legacy (a dummy variable). This implies the paradox that the SAT scores of legacies are about half a sigma above average in the applicant pool but about a full sigma below average in the freshman class, as seen in this graph.

Here’s the code.

```
clear
set seed 12345 /*make the simulation reproducible*/
set obs 1000
*half the applicants are legacies
gen legacy=0
replace legacy=1 in 1/500
lab def legacy 0 "Non-legacy" 1 "Legacy"
lab val legacy legacy
*legacies average about 100 points higher on the SAT
gen sat=.
replace sat=round(rnormal(1100,250)) if legacy==1
replace sat=round(rnormal(1000,250)) if legacy==0
lab var sat "SAT score"
recode sat min/0=0 1600/max=1600 /*top code and bottom code*/
*the applicant pool
graph box sat, over(legacy) ylabel(0(200)1600) title(Applicants)
graph export collider_collegeapplicants.png, replace
graph export collider_collegeapplicants.eps, replace
ttest sat, by(legacy)
*admission (the collider): non-legacies need >1400, legacies only >1250
keep if (sat>1400 & legacy==0) | (sat>1250 & legacy==1)
graph box sat, over(legacy) ylabel(0(200)1600) title(Admits)
graph export collider_collegeadmits.png, replace
graph export collider_collegeadmits.eps, replace
ttest sat, by(legacy)
*have a nice day
```
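For readers without Stata, here is a rough Python translation of the same simulation, offered as a sketch rather than as the code behind the graphs. It keeps the assumptions of the Stata script (half the pool are legacies, SAT means of 1100 vs. 1000 with SD 250, admission cutoffs of 1400 vs. 1250) but uses a larger, assumed sample of 100,000 applicants so the group means are less noisy. The function names are illustrative, not from the original.

```python
import random
import statistics

def simulate_applicants(n=100_000, seed=12345):
    """Return (is_legacy, sat) pairs; the first half of the pool are legacies."""
    rng = random.Random(seed)
    pool = []
    for i in range(n):
        is_legacy = i < n // 2
        mean = 1100 if is_legacy else 1000  # legacies average ~100 points higher
        sat = round(rng.gauss(mean, 250))
        sat = min(max(sat, 0), 1600)  # bottom- and top-code, as in the Stata recode
        pool.append((is_legacy, sat))
    return pool

def admit(pool, cutoff_nonlegacy=1400, cutoff_legacy=1250):
    """Keep applicants who clear their group's cutoff (selecting on the collider)."""
    return [(leg, sat) for leg, sat in pool
            if sat > (cutoff_legacy if leg else cutoff_nonlegacy)]

def mean_sat(rows, is_legacy):
    """Mean SAT for legacies (True) or non-legacies (False)."""
    return statistics.mean(sat for leg, sat in rows if leg == is_legacy)

applicants = simulate_applicants()
admits = admit(applicants)
for label, rows in (("Applicants", applicants), ("Admits", admits)):
    print(f"{label}: legacy {mean_sat(rows, True):.0f}, "
          f"non-legacy {mean_sat(rows, False):.0f}")
```

The sign of the legacy gap flips between the two lines of output: legacies score higher among applicants but lower among admits, purely because admission conditions on a function of both variables.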

Keep the best 5 (updated)

| Gabriel |

Last year I mentioned my policy of assigning about seven quizzes and then keeping the best 5. I then had a real Rube Goldberg-esque workflow that involved piping to Perl. Several people came up with simpler ideas in the comments, but the most “why didn’t I think of that” was definitely John-Paul Ferguson’s suggestion to just use reshape. Now that I’m teaching the class again, I’ve rewritten the script to work on that logic.

Also, I’ve made the script a bit more flexible by letting you specify in the header how many quizzes were offered and how many to keep. To make this work I originally wrote a loop that builds a local called sumstring.

[UPDATE 11/29/2010: applied Nick Cox’s suggestions. The old code remains but is commented out.]

```
*set these in the header
local numberofquizzes 6
local keepbest 5

*import grades, which look like this
*uid    name    mt  q1  q2  q3
*5001   Joe     40  5   4   6
*4228   Alex    20  6   3   5

*rescale the quizzes from raw points to proportion of the top score
forvalues qnum=1/`numberofquizzes' {
	quietly sum q`qnum'
	replace q`qnum'=q`qnum'/`r(max)'
}
/*
*build the sumstring local (original code)
local sumstring ""
forvalues i=1/`keepbest' {
	local sumstring "`sumstring' + q`i'"
}
*strip only the leading "+", once, after the loop
local sumstring=subinstr("`sumstring'","+","",1)
disp "`sumstring'"
*/
*reshape long, keep top few quizzes
reshape long q, i(notes uid name mt) j(qnum)
*a missed quiz counts as zero
recode q .=0
*order each student's quizzes from best to worst and keep the top ones
gsort uid -q
by uid: drop if _n>`keepbest'
*renumber the kept quizzes 1 through keepbest
by uid: replace qnum=_n
*reshape wide, calc average
reshape wide q, i(notes uid name mt) j(qnum)
*build the sumstring local (w/ Nick Cox's suggestions)
unab sumstring : q*
disp "`sumstring'"
local sumstring : subinstr local sumstring " " "+", all
disp "`sumstring'"
gen q_avg=(`sumstring')/`keepbest'
sort name
sum q_avg

*have a nice day
```
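The same keep-the-best-k logic is sketched below in Python, for comparison only; it is not part of the Stata workflow. It assumes grades are held in a dict mapping uid to a list of raw quiz scores, with `None` for a missed quiz, and it reuses the two example rows from the comment header (first three quizzes, keeping the best two). As in the script, each quiz is rescaled by the top score anyone earned on it, and a missed quiz counts as zero.

```python
def rescale(grades, n_quizzes):
    """Divide each quiz by the top score earned on it (mirrors the r(max) loop)."""
    for j in range(n_quizzes):
        col_max = max((g[j] for g in grades.values() if g[j] is not None), default=0)
        if col_max:  # skip a quiz nobody has a score for
            for g in grades.values():
                if g[j] is not None:
                    g[j] = g[j] / col_max
    return grades

def keep_best_average(scores, keep_best):
    """Average of the best `keep_best` quizzes; a missed quiz counts as zero."""
    filled = [0.0 if s is None else s for s in scores]
    best = sorted(filled, reverse=True)[:keep_best]
    return sum(best) / keep_best

grades = {"5001": [5, 4, 6], "4228": [6, 3, 5]}  # Joe and Alex from the header
rescale(grades, n_quizzes=3)
averages = {uid: keep_best_average(q, keep_best=2) for uid, q in grades.items()}
print(averages)
```

Sorting each student's scores in descending order and slicing plays the role of `gsort uid -q` followed by `by uid: drop if _n>keepbest`.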

PDF DRM and CUPS-PDF

| Gabriel |

A lot of PDFs are intentionally crippled by DRM (digital rights management). I’ve found this is common with PDFs that involve forms and with many PDFs you get from publishers for peer review, page proofs, and the like. Anything involving FDF or Locklizard is going to be DRM’d. These DRM restrictions can prevent you from saving annotations or from viewing the file outside of a certain date window, and they impose various other hassles that obstruct a paperless workflow.

Fortunately, the DRM usually retains printing privileges. This implies an incredibly simple solution for Mac/Linux users: “print” the document to a PDF file on disk using CUPS-PDF. This driver works at such a low level that the application sees CUPS-PDF as just another PostScript printer, which means it works even when the Preview/PDF button in the Mac print dialog is disabled. In an earlier post I gave instructions on how to install CUPS-PDF (in color) on a Mac.

Why Steve Jobs and Jeff Bezos May Succeed Where Kevin Martin Failed (and why we might regret it)

| Gabriel |

OK, the first thing those of you who are not telecom nerds should know is that Kevin Martin was the chairman of the FCC during George W. Bush’s second term. His agenda at the FCC was to force multi-system operators (i.e., cable and satellite companies) to provide a la carte pricing. We already have this for premium cable (HBO, Showtime, etc.) but not for “basic” cable. The way basic cable works is that each cable channel charges the carrier a fee for every subscriber, whether or not the subscriber requested the channel or ever watches it. For instance, before I canceled my cable subscription, about \$2 or \$3 of my exorbitant cable bill to Time Warner was passed on to Disney for the privilege of having ESPN, even though I never watched that channel. The reason is that many other people love ESPN and get far more than \$3 a month worth of pleasure from viewing it. Disney knows this and tells Time Warner that it’s all or nothing. Long story short, Kevin Martin failed in this agenda and nothing came of a la carte cable pricing.

A few months ago, Steve Jobs charged up the reality distortion field and unveiled a few products, including a new iPod Shuffle that did away with the abomination that was the buttonless model. The most important “one more thing” reveal, though, was the new version of the Apple TV. The old Apple TV was basically a stripped-down Mac Mini that used the “Front Row” 10-foot user interface to access your iTunes library. It was primarily oriented towards downloading purchased content and storing it on the device’s hard drive. The new Apple TV is basically the same as a Roku in that it emphasizes streaming over local storage.

Both the Apple TV and the Roku can stream Netflix, and each also has an a la carte video service: iTunes rentals and Amazon video-on-demand, respectively. While Netflix is somewhat similar to the traditional cable business model insofar as it charges a monthly subscription fee for “all you can eat,” the two a la carte services represent a decisive break from the cable television business model. Instead of all you can eat, you buy an episode (\$1 for a 24-hour rental from Apple, \$2 for perpetual access from Amazon).

This is a potentially disruptive innovation for the television industry because one of the main ways the industry has practiced price discrimination (and thereby increased both revenues and quantity) is bundling. A switch to a la carte will probably result in an increase in consumer surplus per unit demanded but a drastic decrease in quantity supplied. (“Consumer surplus” is econ jargon for the subjective experience of a “bargain”: the gap between what something is worth to you and what you actually pay for it.)

Suppose that my household values watching True Blood and Mad Men at \$5 an episode, Top Chef at \$2 an episode, and Mythbusters and Toddlers and Tiaras at 50 cents an episode. Now suppose that my next-door neighbors have the exact opposite set of preferences. In both cases there is a total of \$13 of demand for television per household per week. If the cable company charges \$12.99 per week, both my neighbor and I will write the check, but reluctantly, as we’re just barely on this side of the indifference curve.

Now suppose that someone (say, Apple or Amazon) starts selling shows a la carte. If the price point is 50 cents, both my neighbor and I will still watch all five shows. However, we’ll each be paying only \$2.50 a week and getting \$10.50 in consumer surplus. If the price point is \$2, each of us will get three shows and pay \$6, for \$6 in consumer surplus. If the price point is \$4.99, we’ll each buy two shows, pay almost \$10, and get two cents of consumer surplus. There is no price point where we each pay \$12.99 like we used to; the most a single price can extract is \$10 a week, at \$5 a show. At any one a la carte price point, both my neighbor and I will pay less than we used to, watch the same amount of TV or less, and get about the same or more consumer surplus.
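The arithmetic above can be checked with a short Python sketch. Values are in cents to avoid rounding noise, the neighbors’ “exact opposite” preferences are modeled as the same valuations with the rankings flipped, and ties are assumed to go to buying (so at \$2 we each still buy Top Chef). The function names are illustrative.

```python
# Per-episode willingness to pay, in cents, from the example above.
MINE = {"True Blood": 500, "Mad Men": 500, "Top Chef": 200,
        "Mythbusters": 50, "Toddlers and Tiaras": 50}
# The neighbors' "exact opposite" preferences: the same values, rankings flipped.
NEIGHBORS = {"True Blood": 50, "Mad Men": 50, "Top Chef": 200,
             "Mythbusters": 500, "Toddlers and Tiaras": 500}

def a_la_carte(values, price):
    """Buy every show worth at least the price; return (shows, paid, surplus)."""
    bought = [v for v in values.values() if v >= price]
    paid = price * len(bought)
    return len(bought), paid, sum(bought) - paid

def bundle_surplus(values, price):
    """Consumer surplus from buying the whole bundle (None if we'd refuse)."""
    total = sum(values.values())
    return total - price if total >= price else None

for price in (50, 200, 499):
    print(price, a_la_carte(MINE, price), a_la_carte(NEIGHBORS, price))
print("bundle surplus at 1299:", bundle_surplus(MINE, 1299))
# The most any single a la carte price can collect from one household per week.
print("max a la carte revenue:", max(a_la_carte(MINE, p)[1] for p in range(1, 1301)))
```

Scanning every price from 1 cent to \$13 confirms that the best a single a la carte price can do is \$10 a week, well short of the \$12.99 the bundle collects.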

Thus a switch to an a la carte model implies much lower costs to the consumer. Because revenues would fall, so would production, through some combination of fewer shows and lower production values. Basically, we’re looking at an end to the television renaissance we’ve enjoyed since the late 1990s, as people like me decide that we’d rather pay \$10 or \$20 a month for the few shows we love and do without the rest than pay \$50 a month for a bunch of stuff, most of which we don’t even really like.

However desirable the trade-off of less viewing for a much lower price may be, it may prove unsustainable in the long run. I may love Mad Men so much that \$2 an episode to stream it to my Roku feels like a bargain. However, these shows may be economically viable only because some people also have a marginal attachment to them, and cable bundling lets content producers effectively get several dollars per episode from the cult following and maybe fifty cents an episode from the casual viewers.
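To make the mechanism concrete, here is a toy Python sketch for a single show. The per-episode values (“several dollars,” here \$5, and fifty cents) follow the paragraph above, but the audience sizes are purely hypothetical, and bundling is idealized as letting the producer collect each viewer’s full value. Under those assumptions, one a la carte price cannot capture both groups at once.

```python
# Hypothetical audience for one show; the head counts are made up for illustration.
# Each segment is (per-episode value in cents, number of viewers).
SEGMENTS = [(500, 100),   # cult following: "several dollars" an episode
            (50, 1000)]   # casual viewers: "maybe fifty cents" an episode

def bundled_revenue(segments):
    """Idealized bundling: the producer effectively collects each viewer's value."""
    return sum(value * n for value, n in segments)

def best_single_price(segments):
    """Best (price, revenue) when everyone faces one a la carte price.
    With step-shaped demand like this, the optimum is at one of the segment values."""
    best = (0, 0)
    for price, _ in segments:
        buyers = sum(n for value, n in segments if value >= price)
        if price * buyers > best[1]:
            best = (price, price * buyers)
    return best

print("bundled:", bundled_revenue(SEGMENTS))
print("best single price:", best_single_price(SEGMENTS))
```

With these made-up numbers, bundling brings in \$1,000 an episode while the best single price brings in \$550, so any show costing somewhere in between would be viable only under bundling.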

For instance, Battlestar Galactica was only barely profitable, and that was achieved by combining high per-viewer revenues from a small number of viewers who really got the show (i.e., viewers who loved it for Laura “Airlock” Roslin and Gaius Baltar) with low per-viewer revenues from a larger number of viewers who just watched once in a while (i.e., viewers who tuned in to see the spaceships shoot at each other). However contemptible the latter viewers might be, the show wouldn’t have been renewed without them (which might not have been a bad thing given how the show eventually jumped the shark with the whole “final five” business).

That is to say, television may not be economically viable when priced on an a la carte basis, and this could lead to a decline in the volume and possibly the quality of original programming. This will probably be a slow decline but could be catastrophic. The most likely scenario for a catastrophic collapse is that the studios forecast that a la carte means declining revenue and try to pare back their cost structure in anticipation. This would probably lead to militant slates getting elected at both the WGA and SAG and an even worse strike or soft strike than we had in the last contract cycle.