Archive for September, 2009

Uncertainty, the CBO, and health coverage

| Gabriel |

[Update. #1. I’ve been thinking about these ideas for a while in the context of the original Orszag v. CBO thing, but was spurred to write and post it by these thoughts by McArdle. #2. MR has an interesting post on risk vs uncertainty in the context of securities markets.]

Over at OT, Katherine Chen mentions that IRB seems to be a means for universities to try to tame uncertainty. The risk/uncertainty dichotomy is generally a very interesting issue. It played a huge part in the financial crash in that most of the models and instruments based on them were much better at dealing with (routine) risk than with uncertainty (aka, “systemic risk”). Everyone was aware of the uncertainty but the really sophisticated technologies for risk provided enough comfort to help us ignore that so much was unknowable.

Currently one of the main ways we’re seeing uncertainty in action is with the CBO’s role in health finance reform. The CBO’s cost estimates are especially salient given the poor economy and Orszag/Obama’s framing of the issue as about cost. The CBO’s practice is to score bills based on a) the quantifiable parts of a bill and b) the assumption that the bill will be implemented as written. Of course qualitative parts of a bill and the possibility of time inconsistency are huge elements of uncertainty on the likely fiscal impact of any legislation. The fun thing is that this is a bipartisan frustration.

When the CBO scored an old version of the bill it said it would be a budget buster, which made Obama’s cost framing look ridiculous and scared the hell out of the blue dogs. This infuriated the pro-reform people who (correctly) noted that the CBO had not included in its estimates that IMAC would “bend the cost curve,” and thus decrease the long-term growth in health expenditures by some unknowable but presumably large amount. That is to say, the CBO balked at the uncertainty inherent in evaluating a qualitative change and so ignored the issue, thereby giving a cost estimate that was biased upwards.

More recently the CBO scored another version of the bill as being reasonably cheap, which goes a long way to repairing the political damage of its earlier estimate. This infuriates anti-reform people who note (correctly) that the bill includes automatic spending cuts and historically Congress has been loath to let automatic spending cuts in entitlements (or for that matter, scheduled tax hikes) go into effect. That is to say, the CBO balked at the uncertainty inherent in considering whether Congress suffers time inconsistency and so ignored the issue, thereby giving a cost estimate that was biased downwards.

That is to say, what looks like a straightforward accounting exercise is only partly knowable, and the really interesting questions are inherently qualitative ones like “do we trust IMAC to cut costs?” and “do we trust Congress to stick to a diet?” And that’s not even getting into real noodle-scratchers like pricing in the possibility that an initially cost-neutral plan chartered as a GSE would eventually get general fund subsidies, or what will happen to the tax base when you factor in that making coverage less tightly coupled to employment should yield improvements in labor productivity.

September 18, 2009 at 5:18 pm

Stata2Pajek (update)

I fixed two bugs in Stata2Pajek and improved the documentation. To get it, type (from within Stata)

ssc install stata2pajek

If you already have the old version, the above command will update it but even better is to update all of your ado files with:

adoupdate, update

September 18, 2009 at 2:07 pm

If at first you don’t succeed, try a different specification

| Gabriel |

Cristobal Young (with whom I overlapped at Princeton for a few years) has an article in the last ASR on model uncertainty, with an empirical application to religion and development. This is similar to the issue of publication bias but more complicated and harder to formally model. (You can simulate the model uncertainty problem with respect to control variables, but beyond that it gets intractable.)

In classic publication bias, the assumption is that the model is always the same and it is applied to multiple datasets. This is somewhat realistic in fields like psychology where many studies are analyses of original experimental data. However, in macro-economics and macro-sociology there is just one world, so to a first approximation there is just one big dataset that people keep analyzing over and over. To a lesser extent this is true of micro literatures that rely heavily on secondary analyses of a few standard datasets (e.g., GSS and NES for public opinion; PSID and Add Health for certain kinds of demography; SPPA for cultural consumption). What changes between these analyses is the models, most notably assumptions about the basic structure (distribution of the dependent variable, error term, etc.), the inclusion of control variables, and the inclusion of interaction terms.

Although Cristobal doesn’t put it like this, my interpretation is that if there were no measurement error, this wouldn’t be a bad thing as it would just involve people groping towards better specifications. However if there is error then these specifications may just be fitting the error rather than fitting the model. Cristobal shows this pretty convincingly by showing that the analysis is sensitive to the inclusion of data points suspected to be of low quality.
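The control-variable version of the problem is easy enough to simulate. What follows is only a rough sketch in Python (not Cristobal's method, and not Stata), under assumptions I'm making up for illustration: the outcome, the focal variable, and four candidate controls are all independent noise, so any "significant" effect is by construction a fit to error. It compares how often one pre-chosen specification clears |t| > 2 (roughly the 5% level) with how often the best of all possible control sets does. The function names and default parameters are hypothetical.

```python
import itertools
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residualize(v, basis):
    """Subtract from v its projection on each vector of an orthogonal basis."""
    for b in basis:
        coef = dot(v, b) / dot(b, b)
        v = [vi - coef * bi for vi, bi in zip(v, b)]
    return v


def t_stat(y, x, controls):
    """t-statistic on x in an OLS of y on x, the controls, and a constant.

    Uses the Frisch-Waugh-Lovell result: partial the constant and controls
    out of both y and x, then run a simple regression on the residuals.
    """
    n = len(y)
    basis = []
    for c in [[1.0] * n] + controls:  # Gram-Schmidt, constant first
        basis.append(residualize(c, basis))
    yr, xr = residualize(y, basis), residualize(x, basis)
    beta = dot(xr, yr) / dot(xr, xr)
    resid = [a - beta * b for a, b in zip(yr, xr)]
    df = n - len(basis) - 1  # parameters: constant + controls + x
    se = math.sqrt(dot(resid, resid) / df / dot(xr, xr))
    return beta / se


def false_positive_rates(reps=200, n=50, k=4, crit=2.0, seed=1):
    """Share of pure-noise datasets with a 'significant' x: fixed spec vs. best spec."""
    rng = random.Random(seed)

    def draw():
        return [rng.gauss(0, 1) for _ in range(n)]

    fixed = searched = 0
    for _ in range(reps):
        y, x = draw(), draw()
        controls = [draw() for _ in range(k)]
        ts = [abs(t_stat(y, x, list(subset)))
              for r in range(k + 1)
              for subset in itertools.combinations(controls, r)]
        fixed += ts[0] > crit        # ts[0] is the specification with no controls
        searched += max(ts) > crit   # the researcher reports the best specification
    return fixed / reps, searched / reps


fixed_rate, searched_rate = false_positive_rates()
print(f"one pre-chosen specification: {fixed_rate:.2f}")
print(f"best of all specifications:   {searched_rate:.2f}")
```

Because the no-controls specification is one of the candidates, the searched rate can never be lower than the fixed one. How much higher it gets depends on the number of candidate controls and how they correlate with the focal variable, which this toy setup deliberately keeps simple.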

I think it’s also worth honoring Robert Barro for being willing to cooperate with a young unknown researcher seeking to debunk one of his findings. A lot of established scientists are complete assholes about this kind of thing and not only won’t cooperate but will do all sorts of power plays to prevent publication.

Finally, see this poli sci paper which does a meta-analysis of their two flagship journals and finds a suspicious number of papers that are just barely significant. Although they describe the issue as “publication bias,” I think the issue is really model uncertainty.

September 17, 2009 at 3:30 pm

Bash/Perl tutorial

| Gabriel |

I’m a big fan of the idea of using Unix tools like Perl to script the cleaning of the massive text-based datasets that social scientists (especially sociologists of culture) often use. Unfortunately there’s something of a learning curve to this so even though I like the idea in principle and increasingly in practice, I still sometimes clean data interactively with TextWrangler and just try to keep good notes.

Fortunately two of my UC system colleagues have posted the course materials for a “Unix and Perl Primer for Biologists.” I’m about halfway through the materials and it’s great, in part because (unlike the llama) they assume no prior familiarity with programming or Unix. Although the examples involve genetics, it’s well-suited for social scientists as, like us, biologists are not computer scientists but are reasonably technically competent and they often deal with large text-based datasets. Basically, if you can write a Stata do-file, you should be able to follow their course guide, and if you use things like scalars and loops it should be pretty easy.

I highly recommend the course to any social scientist who deals with large dirty datasets, in other words, basically anyone who is a quant but doesn’t just download clean ICPSR or Census data. This is especially relevant for anyone who wants to scrape data off the web, use IMDB, do large-scale content analysis, etc.

Some notes:

  • They assume you will a) be running the materials off a stick and b) using Mac OS X. If you’re keeping the material on the hard drive, get used to typing “chmod u+x foo” to make the Perl script “foo” executable. (This step is unnecessary for files on a stick because, unlike HFS+ or ext3, the FAT filesystem doesn’t do permissions.) If you’re using a different version of Unix, most of it should work similarly with only a few minor differences, such as that you’ll want to use Kate instead of Smultron, and that a USB stick is in /Volumes/ on a Mac whereas in Linux it’s in /media/ and in BSD it’s in /mnt/. If you’re using Windows you’ll need to either a) install Cygwin, b) install a virtual machine, c) run off a live CD or bootable stick, or d) dual boot with Wubi.
  • If you’re really used to Stata, some of the nomenclature may seem backwards, mostly because Perl doesn’t keep a dataset in memory but processes it on disk, one command at a time. So, in Perl and Bash a “variable” is equivalent to what Stata calls a (global or local) “macro”. The closest Perl equivalent to what Stata calls a “variable” would be a “field” in a tab-delimited text file.
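To make the variable/field distinction concrete, here is a minimal sketch. It is written in Python rather than Perl purely for illustration (the course itself teaches Perl), and the two-row "file" is toy data standing in for a real tab-delimited file on disk.

```python
import io

# Toy stand-in for a tab-delimited text file on disk.
raw = io.StringIO("title\tyear\nOil!\t1927\nThe Jungle\t1906\n")

header = next(raw).rstrip("\n").split("\t")
year_col = header.index("year")   # "year" is a field: what Stata would call a variable

earliest = None                   # a script variable: what Stata would call a macro
for line in raw:                  # one record at a time; the dataset is never held in memory
    fields = line.rstrip("\n").split("\t")
    year = int(fields[year_col])
    if earliest is None or year < earliest:
        earliest = year

print(earliest)
```

The script only ever holds one line plus a couple of scalars, which is why this style scales to files much larger than memory.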

[Update: Although they suggest Smultron, I find TextMate works even better as it can execute scripts entirely within the editor, so you don’t have to constantly cmd-tab back and forth.]

    September 15, 2009 at 2:58 pm

    I drink your agency!

    | Gabriel |

    A few weeks ago I read (well, listened to an audiobook of) Upton Sinclair’s Oil! and was both impressed and surprised by it.

    First, although a few scenes are almost identical, overall the book is very different from There Will Be Blood (both are well worth reading/watching). The central character in the book is the son, not the father. The father (J. A. Ross in the book) is a basically sympathetic character completely lacking the seething misanthropy that characterized the father (Daniel Plainview) in the movie. In the book Paul is a central character whereas Eli is a footnote, the opposite of the movie where Eli is important and Paul is almost a phantasm.

    Second, all of these changes in character and plot are directly related to the change in theme. The film is all about the clash between Wirtschaft und Gemeinschaft as personified in a decades-long fight between a miserable bastard of an entrepreneur and a miserable bastard of a preacher. So Daniel Plainview flat out says “I hate most people” and he means it. In contrast, the book is about the clash between capital and labor, which is manifested despite the good will of all the major characters. J. A. Ross the entrepreneur and Paul Watkins the proletarian are both sympathetic characters and in fact are friends with each other. Their opposition is not personal, but political, and these political differences are driven pretty directly by opposing material interests. In the book, as in Marx, class cohesion is emergent from the structure of economic relations and not reliant on the personal ideologies of the various individuals involved. Consistent with this highly structural approach was making the protagonist the son, Bunny, instead of the father. Bunny is a fairly passive character who develops very gradually, and the only break with structural determinism in the book is that Bunny slowly rejects his father’s ambition that he continue in the oil business and instead founds a socialist newspaper and, later, a college.

    I thought this choice of making the action occur despite the characters’ inner motivations was both in keeping with the semi-Marxist tone of the book and surprisingly effective as a dramatic device. Of course when you recall that the book came first, the question is not just why did Sinclair write a materialist dialectic novel, but why, in the course of adapting it, did Anderson make it more conventional by emphasizing agency and personality? I have to think that the answer is that despite the careful wardrobe and set decoration to evoke pre-war California, the movie is thematically very much of our era. By era, I don’t mean “post-1989 when socialism was dead” (although that too) but even more importantly the post-1970 era. In politics this means the new social movements that changed the focus of politics from social class and redistribution to identity and self-expression, even while the Berlin Wall was still standing. In art in general and film in particular this means that to be deep a work must be dark, with The Godfather and Taxi Driver being the paradigmatic cases. To the contemporary ear, a film in which a basically decent capitalist is driven to corrupt the state and exploit workers by the impersonal dynamics of class struggle might as well be written in Sanskrit. Much better to have a nihilistic anti-hero opposed by a fundamentalist charlatan, both of them driven entirely by their internal character.

    A few tangential notes on the book:

    • All the “I’ve seen the future and it works” stuff about Russia is cringe-inducing in retrospect, as is the not-too-subtle, consistent description of the Bolsheviks as “working men,” which is meant to imply that they weren’t combatants at all, but civilians, and thus that all violence done to their faction during the Russian civil war was a war crime (even as the violence they did to other factions is regrettable, but understandable).
    • While the book is very deliberately and explicitly to the far left on economics, it’s interesting how in many ways the book is, by today’s standards, extremely culturally reactionary in a more taken-for-granted kind of way. There are many references to various liberties and vices, all of which are used as examples of upper class decadence. This is clearest in the case of the three (count ’em) rich women who not only sleep around, but justify their behavior with elaborate self-serving treatises on free love. (In contrast, the only romantic relationship between two leftists is only consummated within marriage.) Likewise, the description of an election night victory party (for the candidate bought by the oil men) opens with an essay on jazz that is, ahem, racially insensitive. I take this as an example of how hard it is to project today’s political alignments into the past.
    • You have to love a book that describes sociology as “an elaborate structure of classifications, wholly artificial, devised by learned gentlemen in search of something to be learned about.”
    • Between the socialist politics of the book, its lengthy satire of the lives and works of Hollywood (Ross’ business partner is a satire of W. R. Hearst and Bunny dates an actress), and the inclusion of the Eli character as a (tangential) satire of Aimee Semple McPherson, you can see why the Hollywood moguls and McPherson worked so strenuously to oppose Sinclair’s run for governor of California a few years after the book was published. This gubernatorial run had an important place in Hollywood history in that it set the groundwork for the radicalization of the WGA, and by further extension, the blacklist.

    September 14, 2009 at 5:25 am

    Rate vs raw number

    | Gabriel |

    Two things I read recently spoke to my sense of why it doesn’t make sense to talk about raw numbers but only rates.

    One is the observation that “from 2004 to 2007 more people left California for Texas and Oklahoma than came west from those states to escape the Dust Bowl in the 1930s.” I’m not going to claim that the article has it wrong and California is in fantastic shape, but this is a very misleading figure. (Even though it’s pretty funny to imagine a latter-day Tom Joad loading up the aromatherapy kit into his Prius and heading east in search of work and low taxes.) Let’s put aside that it’s using out-migration when net migration is a much better measure of the “voting with your feet” effect. This “worse than the Dust Bowl” factoid ignores that the present population of California is four times bigger than the combined population of Texas and Oklahoma in 1930, so as a rate the gross outflow is much less than that of the Dust Bowl Okies. (On the other hand, the Okies largely came to California whereas many native-born Californians are moving to Arizona, Nevada, Oregon, and Washington, not just OK+TX.) Anyway, the Dust Bowl comparison strikes me as a cute but meaningless figure, which shouldn’t be necessary given that the state’s problems are severe enough that you don’t need half-truths to describe them.
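The arithmetic is trivial but worth spelling out. In this Python sketch the population and migration counts are round placeholder numbers I invented, NOT census figures; the only thing taken from the argument above is the four-to-one ratio between the two base populations.

```python
def rate_per_1000(movers, population):
    """Out-migrants per 1,000 residents of the sending population."""
    return 1000 * movers / population


# Hypothetical round numbers, chosen only to make the ratio visible.
base_1930 = 8_000_000        # placeholder for Texas + Oklahoma in 1930
base_today = 4 * base_1930   # "four times bigger", per the argument above
movers = 400_000             # the same raw outflow in both eras, by construction

print(rate_per_1000(movers, base_1930))
print(rate_per_1000(movers, base_today))
```

With an identical raw count, the rate out of the larger population is a quarter of the rate out of the smaller one, so "more people" in raw terms can still mean a much weaker exodus.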

    The second thing I read was this fascinating article on the public health debate over salt. (Long story short, the case for salt intake restrictions was always weak and has been growing weaker. Nonetheless, the hyper risk-averse killjoys of the public health community favor maintaining the low recommended daily allowance just in case.) The most interesting thing to me was this passage:

    “The controversy itself remains potent because even a small benefit–one clinically meaningless to any single patient–might have a major public health impact. This is a principal tenet of public health: Small effects can have important consequences over entire populations. If by eating less salt, the world’s population reduced its average blood pressure by a single millimeter of mercury, says Oxford University epidemiologist Richard Peto, that would prevent several hundred thousand deaths a year.”

    That is, we should take even infinitesimal rates seriously if they are applied to a large population. If you take this logic to its natural conclusion it implies that in ginormous countries like China and India the speed limit should be 40 miles an hour and they should put statins in the drinking water, but in tiny countries like New Zealand or Belize people should drive like the devil is after them and open another bag of pork rinds. Or if you take a more cosmopolitan perspective, you could say that we should further restrict our salt intake until some time around 2075 (the UN’s best estimate for the peak world population) and after that we can start having the occasional french fry again.

    I guess I shouldn’t be too harsh on this kind of thing since as someone who sometimes deals with very large datasets, it is in my interests for peer reviewers to pay attention to my awesome p-values (look Ma, three stars!) and ignore that in some instances these p-values are attached to betas that aren’t substantively or theoretically significant, but benefit from practically infinite N driving standard error practically to zero.
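To see how N alone manufactures stars, here is a small Python sketch using a normal approximation. The effect size (0.01 standard deviations) and the sample sizes are arbitrary values I picked for illustration; the point is only that the standard error falls like 1/sqrt(N) while the substantively trivial effect stays fixed.

```python
import math


def two_sided_p(effect, sd, n):
    """Two-sided p-value for a mean of `effect` against zero (normal approximation)."""
    se = sd / math.sqrt(n)   # standard error shrinks like 1/sqrt(n)
    z = effect / se
    return math.erfc(abs(z) / math.sqrt(2))


effect, sd = 0.01, 1.0       # a substantively negligible effect, held constant
for n in (100, 10_000, 1_000_000):
    print(n, two_sided_p(effect, sd, n))
```

The same tiny effect is nowhere near significant at N = 100 and overwhelmingly significant at N = 1,000,000, even though nothing about its substantive importance has changed.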

    September 9, 2009 at 5:19 am

    Surface / contour graphs with tddens.ado

    | Gabriel |

    I have previously complained about the lack of surface / contour graphs in Stata, first providing native code for some thoroughly fug graphs, then code that pipes to gnuplot. The latter solution produces nice output but a) is a pain to install on Mac or Windows and b) doesn’t adhere to Stata’s graph conventions.

    I was actually considering cleaning up my gnuplot pipe and submitting it to SSC, but no longer, for there is a new ado file by Austin Nichols that does this natively and produces absolutely gorgeous graphs. The only advantage of piping things to gnuplot is that gnuplot is faster, but I think the advantages of doing everything natively within Stata make the extra couple seconds well worth it.

    Fire up your copy of Stata and type

    ssc install tddens

    Here’s some sample output:

    On behalf of Stata users everywhere, thank you Dr. Nichols.

    September 8, 2009 at 6:06 am
