Archive for May, 2009

Computer viruses, herd immunity, and public goods

| Gabriel |

Today Slashdot has a post arguing that anti-virus software has both a private character (your computer works) and positive externalities (you don’t spread the virus to others), and that this latter quality implies a public goods character that may make anti-virus software worthy of some kind of subsidy, including the indirect subsidy of a public service propaganda campaign. (I don’t see how much value a “the more you know” PSA would have, given that Windows is already obnoxiously in your face if you fail to install anti-virus software or let your subscription lapse.) Something the article doesn’t mention, but which is entirely consistent with its argument, is that there are subnational units that benefit from widespread immunity as a club good, and so almost all corporations and universities buy a site license for anti-virus software. This is perfectly rational behavior as they end up capturing most of the private and public (or rather, club) benefits of anti-virus in the form of less IT support as well as avoiding things like lowered productivity, bandwidth siphoned by botnets, and exposure to corporate espionage.

Similarly, Megan McArdle occasionally talks about how parents who refuse to vaccinate their children are not just endangering their own children but creating a public health problem. This is something I take seriously, not only as someone with a professional interest in diffusion but also as someone with a personal stake, given that I have a toddler and autism junk science is very popular in west LA. So basically my daughter has an elevated risk of measles because these crackpots are terrified of a vaccine additive that a) is harmless and b) hasn’t even been in use for over ten years.

Both of these cases rely on the logic of diffusion. The big picture is that in an endogenous growth process the hazard is a function of extant saturation so the more infected there are the more at risk any given uninfected person is and by implication anything that lowers the overall infection rate lowers the hazard. A more complex and more micro version comes from the mixed contagion / threshold diffusion model of the sort that was modeled in Abrahamson and Rosenkopf’s 1997 Organization Science paper. (I’ve mentioned it before, but I really do love that paper). In these models individuals have a frailty level drawn from a random distribution and are exposed to (network) contagion and (generalized) cascades from their environment. When the contagion effect exceeds the individual’s threshold, the individual becomes infected and starts spreading the infection, thereby increasing the cascade and contributing to the social network effect on alters. What immunization does is it raises the individual’s threshold appreciably, but not to infinity. This makes the individual less likely to be infected, and where the public good aspect comes in is that this is especially so if the individual is facing only low to medium contagion pressure from the environment.
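The threshold logic above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the Abrahamson and Rosenkopf model itself: it drops the network structure and keeps only a generalized pressure equal to a constant outside term plus the share of the population already infected. The function name, the uniform threshold distribution, and every parameter value (`boost`, `exogenous`, `n`, `seed`) are assumptions made up for the example.

```python
import random


def final_infected_share(coverage, boost=0.5, exogenous=0.05, n=1000, seed=1):
    """Run a simple threshold cascade and return the final infected share.

    coverage  -- probability that any given individual is immunized
    boost     -- finite amount that immunization adds to a threshold (assumed)
    exogenous -- constant outside pressure on everyone (assumed)
    """
    rng = random.Random(seed)
    # Frailty: each individual's baseline threshold, drawn at random.
    base = [rng.random() for _ in range(n)]
    # Immunization raises a threshold appreciably, but not to infinity.
    draws = [rng.random() for _ in range(n)]
    thresholds = [b + boost if u < coverage else b for b, u in zip(base, draws)]

    infected = [False] * n
    while True:
        # Pressure = outside term + share of the population already infected.
        pressure = exogenous + sum(infected) / n
        newly = [i for i in range(n)
                 if not infected[i] and pressure > thresholds[i]]
        if not newly:
            break
        for i in newly:
            infected[i] = True
    return sum(infected) / n


for coverage in (0.0, 0.5, 1.0):
    share = final_infected_share(coverage)
    print(f"coverage {coverage:.1f}: final infected share {share:.3f}")

# Same universal immunization, but under heavy outside pressure.
print("coverage 1.0, exogenous 0.9:", final_infected_share(1.0, exogenous=0.9))
```

Because the boost is finite, universal immunization protects everyone when outside pressure is low but can still be overwhelmed when outside pressure is high, which is the asymmetry the paragraph describes.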

Another way to think of it is as the flipside of public goods, the tragedy of the commons. If everyone else in the world but you installed Symantec and got a measles shot, it would actually be very safe for you as an individual to forgo these protections because everyone else is healthy and won’t infect you or your computer. On the other hand if nobody else in the world had this protection you would be constantly bombarded by both corporeal and computer viruses and so personal immunization would be more attractive, though because your threshold is finite even with immunization you would still be much more vulnerable than if immunization were widespread. The irony is that in the high-vaccination and low-vaccination contexts there are very different individual benefits on the margin vs on average.

May 6, 2009 at 12:20 pm 3 comments

Strip the spaces from a string

| Gabriel |

Because both Stata and the OS treat (non-quote-escaped) whitespace as a delimiter when parsing syntax, I try to keep spaces out of strings when possible and either just run the words together or put in underscores. I especially like to do this for anything having to do with file paths. On the other hand I sometimes want to keep the spaces. For instance, if I have a file with lots of pop songs (many of which have spaces in their titles) and I want to graph them, I like to have the regular spelling in the title (as displayed in the graph) but take the spaces out for the filename. I wrote a little program called “nospaces” to strip the spaces out of a string and return it as a global.

capture program drop nospaces
program define nospaces
	* usage: nospaces "some string" [replacement]
	* stores the lowercased string, spaces removed or replaced, in global x
	set more off
	local x=lower("`1'")
	local char "`2'"
	local nspaces=wordcount("`x'")-1
	forvalues ns=1/`nspaces' {
		* each pass replaces the last remaining space
		quietly disp regexm("`x'","(.+) (.+)")
		local x=regexs(1)+"`char'"+regexs(2)
	}
	global x = "`x'"
end

Note that I’m too clumsy of a programmer to figure out how to get it to return a local so there’s the somewhat clumsy workaround of having it return a global called “x.” There’s no reason to use this program interactively but it could be a good routine to work into a do-file. Here’s an example of how it would work. This little program (which would only be useful if you looped it) opens a dataset, keeps a subset, and saves that subset as a file named after the keep criteria.

local thisartist "Dance Hall Crashers"
local thissong "Fight All Night" 
use alldata, clear
keep if artist=="`thisartist'" & song=="`thissong'"
nospaces "`thisartist'"
local thisartist "$x"
nospaces "`thissong'"
save `thisartist'_$x.dta, replace

May 5, 2009 at 3:23 am 2 comments


| Gabriel |

In the face of the swine flu France has suggested suspending all EU flights to Mexico and you likewise occasionally hear calls for the US to temporarily close the border as a public health measure. Of course France has nothing close to the levels of social and economic integration with Mexico that we do so it’s a little easier for them to consider this than it would be for us. President Obama was asked about closing the border and said it would be “akin to closing the barn door after the horses are out, because we already have cases here in the United States.” Instead the US government has emphasized measures to curtail the domestic transmission of this disease through things like public transportation, schools, etc.

Having a completely one-track mind, I heard all this and thought it’s all about diffusion from without vs within a bounded population, I know this! Assume for the sake of argument that (absent action by American authorities) Mexico has a constant impact on the hazard rate of infection for Americans. This can either be because Mexico has a stable number of infections or, more realistically, because an increasing number of infections in Mexico are offset by a completely voluntary reduction in border traffic. We can thus treat Mexico as an exogenous influence. Of course Americans infecting each other is an endogenous influence. Now assume that there are two public health measures available, close the border and reduce the (domestic) transmission rate. The latter would involve things like face masks, encouraging sick people to stay home, closing schools that have an infection, etc. Further imagine that each measure would respectively cut the effect of exogenous and endogenous diffusion in half. What is the projected trajectory of the disease under various scenarios?

I’ve plotted some projections below, but first a few caveats:

  • For simplicity, I’ve assumed a linear baseline hazard rather than the more realistic Gompertz hazard. The projection is basically robust to this.
  • I’m assuming “no exit,” i.e. once infected, people remain contagious rather than getting better, being quarantined, or dying. This assumption is realistic over the short-run but absurd over the medium- to long-run.
  • I’m assuming that at this point 1% of the potential American risk pool is already infected. I also tried it with 5% and it works out the same.
  • Most importantly of all, I know nothing at all about the substantive issues of infectious disease, the efficacy of public health measures, and all of that sort of thing. Both the baseline numbers and the projected impacts of the public health measures are totally made up, and not even made up on an informed basis. I’m more interested in the math (which I know) than plausible assumptions to feed into it (which I don’t know).
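The setup and caveats above can be sketched as a discrete-time mixed-influence model. This is a minimal Python illustration, not the code behind the plot: each period the hazard is a constant outside term plus a term proportional to the share already infected, with no exit. The function name and the parameter values (`EXOGENOUS`, `ENDOGENOUS`, 30 periods) are made-up assumptions; only the 1% starting share and the 50% cuts come from the text.

```python
def project(exogenous, endogenous, start=0.01, periods=30):
    """Return the infected share at each period, starting from `start`."""
    share = start
    path = [share]
    for _ in range(periods):
        # Hazard for the uninfected: outside influence plus contagion
        # from those already infected inside the population.
        hazard = exogenous + endogenous * share
        # No exit: the infected stay infected; only the uninfected are at risk.
        share = share + hazard * (1.0 - share)
        path.append(share)
    return path


EXOGENOUS = 0.01   # assumed per-period effect of Mexico on the hazard
ENDOGENOUS = 0.20  # assumed per-period domestic transmission coefficient

scenarios = {
    "no action": project(EXOGENOUS, ENDOGENOUS),
    "close border": project(EXOGENOUS / 2, ENDOGENOUS),
    "cut domestic transmission": project(EXOGENOUS, ENDOGENOUS / 2),
    "both": project(EXOGENOUS / 2, ENDOGENOUS / 2),
}

for name, path in scenarios.items():
    print(f"{name:>26}: t=3 {path[3]:.3f}  t=30 {path[30]:.3f}")
```

With these made-up values the border closure does better in the first few periods and the domestic measure does better later, because the outside term dominates the hazard only while few people are infected.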

Anyway, here are the projected numbers of infections, which again assume that public health measures suppress the relevant disease vector by 50%.


As can be seen, in the very short run closing the border is more effective but in the medium-run, measures to reduce domestic transmission are more effective. This is just efficacy, not efficiency (i.e., cost-benefit).

The code is below the fold.


May 3, 2009 at 2:57 pm

A threshold model for gay marriage

| Gabriel |
In a post at Volokh Conspiracy, Dale Carpenter notes that many states have recently made a push towards gay marriage and this may reflect a “bandwagon” effect. Although most of my work is on pop culture, I’ve experimented with applying these models to state policy and it looks like there’s rather a lot of the kind of bandwagon thing that Carpenter is describing. In my secondary analysis of the Walker data, I found that the typical law spreads from state to state via a mixed-influence curve very similar to that which Bass found for consumer appliances.

I think Carpenter’s analysis is basically accurate as far as it goes, but some of the details are a bit fuzzier than they ought to be. First, “bandwagon” is a vague term which can mean any cumulative advantage process, be it cohesive contagion, structural equivalence contagion, information cascades, or network externalities. In quoting a post by Ryan Sager, Carpenter implies that it’s mostly cascades. Second, Carpenter talks a lot about public opinion, but this isn’t really the issue; what really matters are the opinions of policy makers. For a long time it has been apparent that courts are much more open to gay marriage than democratic policy institutions, but increasingly we are now seeing a gap open between small-d democratic plebiscites and small-r republican state legislatures. For instance in California, the gay marriage issue lines up as the courts and the state legislature (pro) versus plebiscites and the governor (con). It seems that part of the reason for the gap between public opinion and policy maker opinion is that educated people are more cosmopolitan, and part has to do with the coalition politics of the Democratic party (for instance, in California many Democratic legislators whose districts voted for Prop 8 voted for AB 849; likewise I would be very surprised if gay marriage is as popular with DC residents as with the DC city council).

When you combine these two vagaries of what is the exact cumulative advantage mechanism and cumulative advantage among whom, you come to a very interesting synthesis about how this may be working. I would suggest that a very large part of the issue is not an information cascade but a network externality among policy makers. These points are subtly different. In an information cascade we don’t know the value of things and so we figure that the consensus about it is informative. With network externalities the consensus itself implies value so the important thing is to be with the consensus.

A simple recent example of network externality dynamics is the format war between HD-DVD and Blu-Ray. Aside from Sony no movie studio really cared about the differences between the formats (and to the extent they did care, they preferred HD-DVD which was cheaper to manufacture) but they cared a lot about making sure they didn’t commit to the wrong format because nobody wants to own a bunch of equipment and a big disc inventory for a format that consumers have rejected. The studios dithered about making a big commitment to either format until Sony basically sent its Playstation brand on a suicide mission to build a critical mass of Blu-Ray players at which point the remaining studios abandoned HD-DVD almost immediately.

Likewise at a certain point gay marriage began to seem inevitable (a prediction shared even by many people who see this as unfortunate). Now many ordinary people would say that whether it is popular or not is irrelevant: I’d support [marriage equality / traditional marriage] even if everyone disagreed with me. However there is another way to think about it, as “being on the right side of history,” a concern made more salient by the frequent analogies drawn to Jim Crow and especially to miscegenation laws. The Sager piece alludes to some pro-segregation pieces published by National Review in the 1950s and this is interesting. At the time these were not considered crackpot ideas (they were probably more mainstream than NR’s pro drug legalization pieces in the 1990s) but in retrospect they are repulsive. I think this is a big part of what’s going on here: policy makers are not just judging themselves via public opinion today but against what they project public opinion to be in the future. Since they (probably accurately) perceive that gay marriage will become more popular over time they are calibrating their actions to this future metric rather than current opinion, which is basically divided (at present the median voter opposes gay marriage per se but favors the Solomonic “civil unions” compromise). In contrast, some voters care about “being on the right side of history” but many do not, in part because unlike legislators their votes are not recorded and thus if they change their minds in the future (or if they remain the same but their opinions become less popular than they are currently) they will suffer little from the inter-temporal contradiction.

(Note: I’m interested in this as a question of diffusion, not a substantive one of morality or policy, and will enforce this in the comments.)

May 1, 2009 at 5:00 pm 3 comments

Stata 64 for Mac

| Gabriel |

I’m way late to the party on this, but a couple months ago Stata released a free 64-bit upgrade for users who already have licenses of Stata 10 for Mac. Note that this upgrade does not download with the usual “update all” command, you need to follow instructions on the website.

The major advantage is that it lets you access more than 2 GB of memory. For reasons that they explain thoroughly it performs calculations faster on many variables but slower on others. If you’re worried about this, open your favorite large dataset and type “desc” to see how most of your variables are stored. If most of them are “double” you definitely want 64-bit; if most of them are “byte” maybe not. (They don’t explain how it handles strings.) Likewise think about how often you use things like “forvalues” loops, which involve the kind of little numbers that run faster in the old version of Stata.

May 1, 2009 at 3:42 am
