Posts tagged ‘diffusion’

Zeno’s Webmail Security Team Account Confirmation

| Gabriel |

Last year I described how a “reply to all” cascade follows an s-curve. Now (via Slashdot) I see that another pathology of email results in the other classic diffusion process. That is, the number of hits received by phishing scams follows the constant hazard function, otherwise known as an “external influence” diffusion curve or Zeno’s paradox of Achilles and the tortoise.

link to original story

This is of course entirely predictable from theory. Once you realize that people aren’t forwarding links to phishing scams, but only clicking on links spammed to them directly, it’s obvious that there will not be an endogenous hazard function. Furthermore, con artists know that the good guys will shut down their site ASAP, which means that it is in their interest to send out all their spam messages essentially simultaneously. Thus you have a force that people are exposed to simultaneously and that they react to individualistically. Under these scope conditions it is necessarily the case that you’d get this diffusion curve and you’d get a majority of fraud victims within the first hour.
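
To make the shape concrete, here is a minimal Python sketch of a constant-hazard (external influence) curve. The hourly hazard of 0.7 is an illustrative assumption, not an estimate from the phishing data, and the function names are hypothetical.

```python
import math

def cumulative_share(hours, hazard):
    """Share of eventual victims who have clicked by `hours`,
    assuming a constant hazard (exponential waiting times)."""
    return 1.0 - math.exp(-hazard * hours)

def half_life(hazard):
    """Time by which half of the eventual victims have clicked."""
    return math.log(2) / hazard

# Hypothetical hazard of 0.7 per hour, chosen only for illustration.
HAZARD = 0.7
for hour in range(1, 6):
    print(hour, round(cumulative_share(hour, HAZARD), 3))
```

Any hourly hazard above ln 2 (about 0.69) puts more than half of the victims in the first hour. Each later hour claims the same fraction of whoever is left, which is the Zeno-like quality of the curve: it keeps closing the gap without a takeoff phase.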

This comes as a surprise to people only because we’re so enamored of s-curves that we forget that sometimes people open their umbrellas because it’s raining. (Which is not to say that such behavior is asocial in a broader sense.)

December 3, 2010 at 12:14 am

Scraping for Event History

| Gabriel |

As I’ve previously mentioned, there’s a lot of great data out there but much of it is ephemeral, so if you’re interested in change (which, given our obsession with event history, many sociologists are) you’ve got to know how to grab it. I provided a script (using cron and curl) for grabbing specific pages and timestamping them, but this doesn’t scale up very well to getting entire sites, both because you need to specify each URL and because it saves a complete copy each time rather than the diff. I’ve recently developed another approach that relies on wget and rsync and is much better for scaling up to a more ambitious scraping project.

Note that because of subtle differences between dialects of Unix, I’m assuming Linux for the data collection but Mac for the data cleaning.* Using one or the other for everything requires some adjustments. Also note that because you’ll want to “cron” this, I don’t recommend running it on your regular desktop computer unless you leave it on all night. If you don’t have server space (or an old computer on which you can install Linux and then treat as a server), your cheapest option is probably to run it on a wall wart computer for about $100 (plus hard drive).

Wget is similar to curl in that it’s a tool for downloading internet content but it has several useful features, some of which aren’t available in curl. First, wget can do recursion, which means it will automatically follow links and thus can get an entire site as compared to just a page. Second, it reads links from a text file a bit better than curl. Third, it has a good time-stamping feature where you can tell it to only download new or modified files. Fourth, you can exclude files (e.g., video files) that are huge and that you’re unlikely to ever make use of. Put these all together and it means that wget is scalable: it’s very good at getting and updating several websites.

Unfortunately, wget is good at updating, but not at archiving. It assumes that you only want the current version, not the current version and several archival copies. Of course this is exactly what you do need for any kind of event history analysis. That’s where rsync comes in.

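Rsync is, as the name implies, a syncing utility. It’s commonly used as a backup tool (both remote and local). However the simplest use for it is just to sync several directories and we’ll be applying it to a directory structure like this (directory names as used in the script below):

project/
    links.txt
    logs/
    current/
    backup/
        baseline/      (the first scrape)
        YYYYMMDD/      (one per subsequent update)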


In this set up, wget only ever works on the “current” directory, which it freely updates. That is, whatever is in “current” is a pretty close reflection of the current state of the websites you’re monitoring. The timestamped stuff, which you’ll eventually be using for event history analysis, goes in the “backup” directories. Every time you run wget you then run rsync after it so that next week’s wget run doesn’t throw this week’s wget run down the memory hole.

The first time you do a scrape you basically just copy current/ to backup/t0. However if you were to do this for each scrape it would waste a lot of disk space since you’d have a lot of identical files. This is where incremental backup comes in, which Mac users will know as Time Machine. You can use hard links (similar to aliases or shortcuts) to get rsync to accomplish this.** The net result is that backup/t0 takes the same disk space as current/ but each subsequent “backup” directory takes only about 15% as much space. (A lot of web pages are generated dynamically and so they show up as “recently modified” every time, even if there’s no actual difference with the “old” file.) Note that your disk space requirements get big fast. If a complete scrape is X, then the amount of disk space you need is approximately 2 * X + .15 * X * number of updates. So if your baseline scrape is 100 gigabytes, this works out to a full terabyte after about a year of weekly updates.
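
As a quick check on that arithmetic, here is a small Python sketch of the disk space rule of thumb. The 15% figure is the rough estimate given above rather than a measured constant, and the function name is hypothetical.

```python
def disk_needed(baseline, n_updates, incremental_share=0.15):
    """Approximate disk space for current/ plus all backups.

    baseline          -- size of one complete scrape (any unit)
    n_updates         -- number of update runs after the first scrape
    incremental_share -- assumed size of each update relative to baseline
    """
    # current/ and the first backup each take one full copy (2 * X),
    # then every update adds roughly incremental_share * X.
    return 2 * baseline + incremental_share * baseline * n_updates

# A 100 GB baseline with a year of weekly updates comes to roughly 980 GB.
print(disk_needed(100, 52))
```

Because the second term grows linearly with the number of runs, the incremental share matters far more than the baseline once you have been scraping for a few months.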

Finally, when you’re ready to analyze it, just use mdfind (or grep) to search the backup/ directory (and its subdirectories) for the term whose diffusion you’re trying to track and pipe the results to a text file. Then use a regular expression to parse each line of this query into the timestamp and website components of the file path to see on which dates each website used your query term — exactly the kind of data you need for event history. Furthermore, you can actually read the underlying files to get the qualitative side of it.
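
The parsing step might look something like the following Python sketch. It assumes the search results are file paths of the form backup/SNAPSHOT/HOSTNAME/..., which is what you get when wget runs recursively and the snapshot directories are named by datestamp. The function names and the sample paths are hypothetical, and a directory named “baseline” would need to be mapped to its actual date before sorting.

```python
import re

# Assumed layout: .../backup/<snapshot>/<hostname>/<path within the site>
PATH_RE = re.compile(r"(?:^|/)backup/(?P<snapshot>[^/]+)/(?P<host>[^/]+)/")

def parse_hits(lines):
    """Return sorted, de-duplicated (snapshot, host) pairs from search output."""
    hits = set()
    for line in lines:
        match = PATH_RE.search(line.strip())
        if match:  # silently skip anything that is not a backup path
            hits.add((match.group("snapshot"), match.group("host")))
    return sorted(hits)

def first_use(hits):
    """Earliest snapshot in which each host matched the query term."""
    earliest = {}
    for snapshot, host in hits:  # hits are sorted, so the first one seen is earliest
        earliest.setdefault(host, snapshot)
    return earliest

# Made-up example paths, standing in for the saved search results.
sample = [
    "backup/20100928/www.example.org/index.html",
    "backup/20101005/www.example.org/news/a.html",
    "backup/20101005/blog.example.com/post.html",
]
print(first_use(parse_hits(sample)))
```

The first-use dictionary is one row per website with the date of adoption, which is the basic shape an event history analysis needs.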

So on to the code. The wget part of the script looks like this

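# timestamp for this run, e.g. 20100928
DATESTAMP=$(date '+%Y%m%d')
cd ~/Documents/project
# -p also creates logs/ itself on the first run
mkdir -p "logs/$DATESTAMP"
cd current
# -r --level=3 follows links three levels deep; -R skips large media files
wget -S --output-file=../logs/$DATESTAMP/wget.log --input-file=../links.txt -r --level=3 -R mpg,mpeg,mp4,au,mp3,flv,jpg,gif,swf,wmv,wma,avi,m4v,mov,zip --tries=10 --random-wait --user-agent=""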

That’s what it looks like the first time you run it. When you’re just trying to update “current/” you need to change “wget -S” to “wget -N” but aside from that this first part is exactly the same. Also note that if links.txt is long, I suggest you break it into several parts. This will make it easier to rerun only part of a large scrape, for instance if you’re debugging, or there’s a crash, or if you want to run the scrape only at night but it’s too big to completely run in a single night. Likewise it will also allow you to parallelize the scraping.

Now for the rsync part. After your first run of wget, run this code.

cd ..
rsync -a current/ backup/baseline/

After your update wget runs, you do this.

cd ..
cp -al backup/baseline/ backup/$DATESTAMP/
rsync -av --delete current/ backup/$DATESTAMP/

* The reason to use Linux for data collection is that OS X doesn’t include wget and has an older version of the cp command, though it’s possible to solve both issues by using Fink to install wget and by rewriting cp in Mac/BSD syntax. The reason to use Mac for data analysis is that mdfind is faster (at least once it builds an index) and can read a lot of important binary file formats (like “.doc”) out of the box, whereas grep only likes to read text-based formats (like “.htm”). There are apparently Linux programs (e.g., Beagle) that allow indexed search of many file formats, but I don’t have personal experience with using them as part of a script.

** I adapted this use of hard links and rsync from this tutorial, but note that there are some important differences. He’s interested in a rolling “two weeks ago,” “last week,” “this week” type of thing, whereas I’m interested in absolute dates and don’t want to overwrite them after a few weeks.

September 28, 2010 at 4:26 am 5 comments

Apple v Adobe and network externalities

| Gabriel |

Many people are aware that the iPad doesn’t display Flash and so many websites have big “plug-in not loaded” boxes in areas where streaming video and interactive features would normally be. This is just the tip of the iceberg of Apple’s attempt to kill Flash. Not only does the iPhone OS not directly process Flash (as an interpreted language), but the Apple app store has created a new policy that it won’t even allow compiled code that was originally written in Flash; it accepts only software written with Apple’s own development tools (Xcode). This sounds esoteric, but in plain English, this means that if you’re a software developer it’s much harder than it used to be to develop apps for both the iPhone and Android.

Apple claims that this is about preserving the quality of the user experience and I think it’s worth taking this seriously. Cross-platform development is efficient on the supply side but the results are ugly and slow as an end user experience. For instance, on my Mac OpenOffice (a cross-platform suite that relies partly on Java) is much slower and uglier than either MS Office or Apple iWork (both of which are true Mac native applications). Most of the speed hit comes from being interpreted, but note that Apple is forbidding even compiled Flash code, which in theory should be almost as fast as software written in C. Thus the most charitable interpretation of this policy is that Apple is run by such massive control freaks that they are willing to impose huge expenses on developers to avoid the relatively minor speed hit and aesthetic deviations implied by allowing compiled Flash code.

The more sinister interpretation is that from Apple’s perspective the difficulties the policy imposes on software developers aren’t a bug but a feature. Flash makes it almost as easy to make software for the Android and the iPhone as it does to create software for only one or the other. Thus under a regime that allowed Flash, the end user would have available the same software with either kind of phone and so the Android and iPhone would be in direct competition with each other. This would drive the smart phone market into commodity hell. So by killing Flash, Apple is raising the costs of cross-platform development and forcing developers to pick a side. Assuming that many developers pick Apple, this will increase the availability of software on the iPhone versus the Android and so many consumers will buy the iPhone for its better software availability. Apple is familiar with this dynamic as traditionally one of the main disadvantages of Mac (and Linux) is that many important applications and hardware drivers are only available for Windows, thereby locking users into that platform.

This has been the speculation among geeks for a few weeks now (here, here, and here) but according to the NY Post, the government is taking the sinister interpretation seriously and may well sue Apple.

May 3, 2010 at 2:37 pm 2 comments

Fiddler’s Green

| Gabriel |

Matt at Permutations links to a critique of the famous zombie epidemiology paper. The original paper was your basic diffusion model, which assumes spatial homogeneity, and as people like to do to these kinds of models, the critique relaxes that assumption. Specifically the new model assumes a gradient of areas that are relatively favorable to humans, perhaps because of resource stocks, terrain that’s easily defensible with a rifle, etc. The new model then finds that the equilibrium is not total apocalypse, but a mixed equilibrium: substantial depopulation, but with few zombies and a relatively large number of surviving humans.

Being both the world’s foremost expert on the sociology of zombies and a diffusion guy, I feel obliged to weigh in on this. The new model adds a parameter of “safe areas” but assumes that “safeness” is exogenous. However, if the Romero movies have taught us anything, it’s that the defensive resources are only effective if they aren’t sabotaged by the internal squabbles of humans. (If you’re not familiar with Romero’s movies, think of what Newman from Seinfeld did in “Jurassic Park”). Thus you’d have to add another parameter, which is the probability in any given period that some jackass sabotages the defensive perimeter, steals the battle bus, etc. If such sabotage eliminates or even appreciably reduces the “safe area” efficacy then human survival in the “safe areas” is contingent on the act of sabotage not occurring. If we assume that p(sabotage) is 1% in any given month, then the probability of sabotage occurring at least once over the course of two years is 1-.99^24, which works out to 21%. That’s not bad, but if we assume a p(sabotage) per month of at least 2.9% then there’s a better than even chance that we’re fucked. Having a dim view of human nature, I don’t like those odds.
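
The arithmetic is easy to check with a short Python sketch. It assumes, as the paragraph above does, that sabotage is independent from month to month with a fixed monthly probability; the function names are hypothetical.

```python
def p_at_least_once(p_per_period, periods):
    """Probability that an event with per-period probability p
    happens at least once over the given number of periods."""
    return 1 - (1 - p_per_period) ** periods

def breakeven_p(periods, target=0.5):
    """Per-period probability at which the chance of at least one
    event over `periods` reaches `target`."""
    return 1 - (1 - target) ** (1 / periods)

print(round(p_at_least_once(0.01, 24), 3))   # 1% per month over two years
print(round(breakeven_p(24), 4))             # monthly p that gives even odds
```

The break-even monthly probability comes out a little under 2.9%, so anything at or above that figure gives better than even odds of at least one act of sabotage in two years.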

So a more elaborated model would not only have to add in parameters for spatial heterogeneity, but also human sabotage. The two probably interact in that the higher the probability of sabotage, the more important it is to have many small resource islands rather than one big resource island. This may have policy implications in terms of how and where we stockpile resources in preparation for the inevitable zombie apocalypse.

March 12, 2010 at 2:06 pm 1 comment

Memetracker into Stata

| Gabriel |

A few months ago I mentioned the Memetracker project to scrape the internet and look for the diffusion of (various variants of) catchphrases. I wanted to play with the dataset but there were a few tricks. First, the dataset is really, really big. The summary file is 862 megabytes when stored as text and would no doubt be bigger in Stata (because of how Stata allocates memory to string variables). Second, the data is in a moderately complicated hierarchical format, with “C” specific occurrences, nested within “B” phrase variants, which are in turn nested within “A” phrase families. You can immediately identify whether a row is A, B, or C by the number of leading tabs (0, 1, and 2, respectively).

I figured that the best way to interpret this data in Stata would be to create two flat-files, one a record of all the “A” records that I call “key”, and the other a simplified version of all the “C” records but with the key variable to allow merging with the “A” records. Rather than do this all in Stata, I figured it would be good to pre-process it in perl, which reads text one line at a time and thus is well-suited for handling very large files. The easy part was to make a first pass through the file with grep to create the “key” file by copying all the “A” rows (i.e., those with no leading tabs).

Slightly harder was to cull the “C” rows. If I just wanted the “C” rows this would be easy, but I wanted to associate them with the cluster key variable from the “A” rows. This required looking for “A” rows, copying the key, and keeping it in memory until the next “A” row. Meanwhile, every time I hit a “C” row, I copy it but add in the key variable from the most recent “A” row. Both for debugging and because I get nervous when a program doesn’t give any output for several minutes, I have it print to screen every new “A” key. Finally, to keep the file size down, I set a floor to eliminate reasonably rare phrase clusters (anything with fewer than 500 occurrences total).

At that point I had two text files, “key” which associates the phrase cluster serial number with the actual phrase string and “data” which records occurrences of the phrases. The reason I didn’t merge them is that it would massively bloat the file size and it’s not necessary for analytic purposes. Anyway, at this point I could easily get both the key and data files into Stata and do whatever I want with them. As a first pass, I graphed the time-series for each catchphrase, with and without special attention drawn to mentions occurring in the top 10 news websites.

Here’s a sample graph.

Here’s the perl file:

#!/usr/bin/perl
#by ghr
#this script cleans the "phrase cluster" data
#script takes the (local and unzipped) location of this file as an argument
#throws out much of the data, saves as two tab flatfiles
#"key.txt" which associates cluster IDs with phrases
#"data.txt" which contains individual observations of the phrases
# input
# A:  <ClSz>  <TotFq>  <Root>  <ClId>
# B:          <QtFq>   <Urls>  <QtStr>  <QtId>
# C:                   <Tm>    <Fq>     <UrlTy>  <Url>
# output, key file
# A:  <ClSz>  <TotFq>  <Root>  <ClId>
# output, data file
# C:<ClID>	<Tm>	<UrlTy>	<URL>
# make two passes.

use warnings; use strict;
die "usage: <phrase cluster data>\n" unless @ARGV==1;

#define minimum number of occurences a phrase must have
my $minfreq = 500;

my $rawdata = shift(@ARGV);
# use bash grep to write out the "key file"
system("grep '^[0-9]' $rawdata > key.txt");

# read again, and write out the "data file"
# if line=A, redefine the "clid" variable
# optional, if second field of "A" is too small, (eg, below 100), break the loop?
# if line=B, skip
# if line=C, write out with "clid" in front
my $clid  ;
open(IN, "<$rawdata") or die "error opening $rawdata for reading\n";
open(OUT, ">data.txt") or die "error creating data.txt\n";
print OUT "clid\ttm\turlty\turl\n";
while (<IN>) {
	#match "A" lines by looking for numbers in field 0
	if($_=~ /^\d/) {
		my @fields = split("\t", $_); #parse as tab-delimited text
		if($fields[1] < $minfreq) { last;} #quit when you get to a rare phrase
		$clid = $fields[3]; #record the ClID
		$clid =~ s/\015?\012//; #manual chomp
		print "$clid ";
	}
	#match "C" lines, write out with clid
	if ($_ =~ m/^\t\t/) {
		my @fields = split("\t", $_);
		$fields[5] =~ s/\015?\012//; #manual chomp, else each record is followed by a blank line
		print OUT "$clid\t$fields[2]\t$fields[4]\t$fields[5]\n";
	}
}
close IN;
close OUT;
print "\ndone\n";

And here’s the Stata file:

set mem 500m
set more off
cd ~/Documents/Sjt/memetracker/
*import key, or "A" records
insheet using key.txt, clear
ren v1 clsz
ren v2 totfq
ren v3 root
ren v4 clid
sort clid
lab var clsz "cluster size, n phrases"
lab var totfq "total frequency"
lab var root "phrase"
lab var clid "cluster id"
save key, replace
*import data, or "C" records
insheet using data.txt, clear
drop if clid==.
gen double timestamp=clock(tm,"YMDhms")
format timestamp %tc
drop tm
gen hostname=regexs(1) if regexm(url, "http://([^/]+)") /*get the website, leaving out the filepath*/
drop url
gen blog=0
replace blog=1 if urlty=="B"
replace blog=1 if hostname==""
gen technoratitop10=0 /*note, as of 2/3/2010, some mismatch with late 2008 memetracker data*/
foreach site in /*list of top-10 hostnames goes here*/ {
	replace technoratitop10=1 if hostname=="`site'"
}
gen alexanews10=0 /*as w technorati, anachronistic*/
foreach site in /*list of top-10 hostnames goes here*/ {
	replace alexanews10=1 if hostname=="`site'"
}
drop urlty
sort clid timestamp
contract _all /*eliminate redundant "C" records (from different "B" branches)*/
drop _freq
save data, replace
*draw a graph of each meme's occurrences
levelsof clid, local(clidvalues)
foreach clid in `clidvalues' {
	disp "`clid'"
	quietly use key, clear
	quietly keep if clid==`clid'
	local title=root in 1
	quietly use data, clear
	histogram timestamp if clid==`clid', frequency xlabel(#5, labsize(small) angle(forty_five)) title(`title', size(medsmall))
	graph export graphs/`clid'.png, replace
	twoway (histogram timestamp if clid==`clid') (line alexanews10 timestamp if clid==`clid', yaxis(2)), legend(off) xlabel(#5, labsize(small) angle(forty_five)) title(`title', size(medsmall))
	graph export graphs_alexa/`clid'.png, replace
}
*have a nice day

February 8, 2010 at 4:31 am 7 comments

MDC Code (updated)

| Gabriel |

I continue to make baby steps towards a self-contained ado file for my multilevel diffusion curves (MDC) technique. This is tricky as the model has to be run through a quadratic. My co-author and his RA wrote a spreadsheet that does this, but it’s kind of a pain to copy the output from Stata into the spreadsheet. Anyway, the current version of the script still requires you to run it through the spreadsheet, but it makes it a lot more convenient by putting it in the shape the spreadsheet expects and it gives you the Bass model for the baseline (no independent variables) model. Eventually I’ll work myself up to figuring out how to do the matrix multiplication in Stata and then it will be totally plug and chug.
Here’s how you’d specify:

mdcrun newadoptions innovationlevelvariable timelevelvariable, i(serialno) nt(laggedcumulativeadoptions) saving(resultsfile)
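
For readers wondering what “run through a quadratic” means, here is a rough Python sketch of the point estimates only (not the standard errors). It assumes the baseline model has the form adds = (exo + endo*Nt)(nmax - Nt), which expands to a quadratic in Nt, so the three regression coefficients can be converted back into the three diffusion parameters. The function and argument names are hypothetical and the example numbers are made up.

```python
import math

def bass_from_quadratic(a_const, b_nt, c_nt2):
    """Recover (exo, endo, nmax) from a fitted quadratic
    adds = a_const + b_nt * Nt + c_nt2 * Nt**2.

    Expanding (exo + endo*Nt) * (nmax - Nt) gives
        a_const = exo * nmax
        b_nt    = endo * nmax - exo
        c_nt2   = -endo
    """
    disc = b_nt ** 2 - 4 * a_const * c_nt2
    if c_nt2 == 0 or disc < 0:
        raise ValueError("no real saturation point for these coefficients")
    # the root of the quadratic that corresponds to the saturation level
    nmax = (-b_nt - math.sqrt(disc)) / (2 * c_nt2)
    endo = -c_nt2
    exo = a_const / nmax
    return exo, endo, nmax

# Made-up coefficients implied by exo=0.01, endo=0.0005, nmax=1000
print(bass_from_quadratic(10.0, 0.49, -0.0005))
```

The other root of the quadratic is -exo/endo, which is negative whenever both influences are positive, so it is the root above that has a substantive interpretation.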

Here’s the program itself:

capture program drop quadraticprocess
program define quadraticprocess, rclass
	matrix betas=e(b)
	matrix varcovar=e(V)
	mata: st_matrix("se",sqrt(diagonal(st_matrix("varcovar")))')
	matrix results= betas\se
	matrix rowname results = beta se
	* A B C in the quadratic sense
	matrix colname results = B C A

	*several abs() fcns necessary for se to avoid imaginary numbers and keep se>=0
	local nmax	= (-results[1,1] - ((results[1,1]^2)-4*results[1,3]*results[1,2])^0.5) / (2*results[1,2])
	local nmax_se	= abs(`nmax'-(-(results[2,1]+results[1,1])-((results[2,1]+results[1,1])^2-4*(results[2,3]+results[1,3])*(results[2,2]+results[1,2]))^0.5)/2/(results[2,2]+results[1,2]))
	local endo	= - results[1,2]
	local endo_se	= abs(`endo'-(-results[2,2]))
	local exo	= results[1,3] / `nmax'
	local exo_se	= abs(`exo'- results[2,3] /`nmax_se')

	return local N		= `e(N)'
	return local nmax	= `nmax'
	return local nmax_se	= `nmax_se'
	return local endo	= `endo'
	return local endo_se	= `endo_se'
	return local exo	= `exo'
	return local exo_se	= `exo_se'
end

capture program drop mdcrun
program define mdcrun
	*dependency: mat2txt
	set more off
	syntax varlist , i(string asis) nt(string asis) [SAVing(string asis)]
	disp as text "{hline 80}"
	disp as text "MDCRUN"
	disp as text "This code gives information which must be interpreted"
	disp as text "with the spreadsheet at"
	disp as text "{hline 80}"
	gettoken first varlist : varlist

	gen cons=1
	foreach var in `varlist' cons {
		quietly drop if `var'==.
		quietly gen `var'_1=`nt'*`var'
		quietly gen `var'_2=`nt'*`nt'*`var'
	}

	* create `varlist_ext' as an alternate varlist macro that has the interactions
	foreach var in `varlist' {
		local varlist_ext="`varlist_ext' `var'_1 `var'_2 `var'"
	}
	local varlist_ext="`varlist_ext' cons_1 cons_2"

	quietly tabstat `varlist', save stat(mean sd)
	matrix varlistdescriptive=r(StatTotal)'
	matrix varlistdescriptive=varlistdescriptive\(1,0)
	disp as text "Baseline model"
	xtreg `first' cons_1 cons_2, re i(`i')
	* builds the "results" matrix and the r() values used below
	quadraticprocess
	matrix baselineresults = results[1...,"A"], results[1...,"B"], results[1...,"C"]
	matrix colnames baselineresults = raw nt nt2
	matrix baselineresults = baselineresults'
	disp as text "model: adds = ( `r(exo)' + `r(endo)' Nt ) (`r(nmax)' - Nt) "
	disp "error: adds = ( `r(exo_se)' + `r(endo_se)' Nt ) (`r(nmax_se)' - Nt) "
	disp "{hline 80}"
	disp "Covariates Model"
	*coefficients are vars + interactions with nt and nt^2"
	xtreg `first' `varlist_ext', re i(`i') 
	matrix betas=e(b)
	matrix varcovar=e(V)
	mata: st_matrix("se",sqrt(diagonal(st_matrix("varcovar")))')
	foreach figure in betas se {
		local counter=1
		foreach var in `varlist' cons {
			matrix `figure'_`var' = (`figure'[1,`counter'], `figure'[1,`counter'+1], `figure'[1,`counter'+2])
			local counter=`counter'+3
		}
		matrix `figure'grid = (., ., .)
		foreach var in `varlist' cons {
			matrix `figure'grid = `figure'grid \ `figure'_`var' 
		}
		* drop the placeholder first row
		matrix define `figure'grid = `figure'grid[2...,1...]
		matrix colnames `figure'grid = nt nt2 raw
		matrix rownames `figure'grid = `varlist' cons
	}
	matrix biggrid=varlistdescriptive, betasgrid[1...,"raw"], segrid[1...,"raw"], betasgrid[1...,"nt"], segrid[1...,"nt"], betasgrid[1...,"nt2"], segrid[1...,"nt2"]
	matrix colnames biggrid = mean sd b_raw se_raw b_nt se_nt b_nt2 se_nt2
	matrix rownames biggrid = `varlist' cons
	di as text "{hline 80}"
	di as text "The following should be fed into the spreadsheet"
	disp "The baseline model goes in L7:N9"
	*transpose? beta, se should be columns
	matlist baselineresults
	disp as text _newline "The covariate model should go in cells xx:xx"
	di as text "Excel Cols:  |         H & J        |       D & F         |         L & N       |      T & V" _continue
	matlist biggrid
	disp "Please see AC-AJ for interpretation"
	*write to disk or print to screen: varlistdescriptive col 1+2, beta_col3, se_col3, beta_col1, se_col1, beta_col2, se_col2 
	*tab or comma as "delimchar"
	disp as text "{hline 80}"
	disp as text "For citation and help with theory/interpretation, see"
	disp as text `"Rossman, Chiu, and Mol. 2008. "Modeling Diffusions of"'
	disp "Multiple Innovations Via Multilevel Diffusion Curves:"
	disp `"Payola in Pop Music Radio" Sociological Methodology"' 
	disp "38:201-230." _newline
	if "`saving'"!="" {
		mat2txt , matrix(baselineresults) saving(`saving') title(baseline model, L7:N9) replace
		mat2txt , matrix(biggrid) saving(`saving') title(covariates, D-V) append
	}
end

(Btw, here’s the older version of the MDC code.)

January 5, 2010 at 4:35 am 2 comments

Towards a sociology of living death

| Gabriel |

Daniel Drezner had a post a few months ago talking about how international relations scholars of the four major schools would react to a zombie epidemic. Aside from the sheer fun of talking about something as silly as zombies, it has much the same illuminating satiric purpose as “how many X does it take to screw in a lightbulb” jokes. If you have even a cursory familiarity with IR it is well worth reading.

Here’s my humble attempt to do the same for several schools within sociology. Note that I’m not even going to get into the Foucauldian “who’s to say that life is ‘normal’ and living death is ‘deviant’” stuff because, really, it would be too easy. Also, I wrote this post last week and originally planned to save it for Halloween, but I figured I’d move it up given that Zombieland is doing so well with critics and at the box office.

Public Opinion. Consider the statement that “Zombies are a growing problem in society.” Would you:

  1. Strongly disagree
  2. Somewhat disagree
  3. Neither agree nor disagree
  4. Somewhat agree
  5. Strongly agree
  6. Um, how do I know you’re really with NORC and not just here to eat my brain?

Criminology. In some areas (e.g., Pittsburgh, Raccoon City), zombification is now more common than attending college or serving in the military and must be understood as a modal life course event. Furthermore, as seen in audit studies, employers are unwilling to hire zombies and so the mark of zombification has persistent and reverberating effects throughout undeath (at least until complete decomposition and putrefaction). However race trumps humanity as most employers prefer to hire a white zombie over a black human.

Cultural toolkit. Being mindless, zombies have no cultural toolkit. Rather the great interest is understanding how the cultural toolkits of the living develop and are invoked during unsettled times of uncertainty, such as an onslaught of walking corpses. The human being besieged by zombies is not constrained by culture, but draws upon it. Actors can draw upon such culturally-informed tools as boarding up the windows of a farmhouse, shotgunning the undead, or simply falling into panicked blubbering.

Categorization. There’s a kind of categorical legitimacy problem to zombies. Initially zombies were supernaturally animated dead, they were sluggish but relentless, and they sought to eat human brains. In contrast, more recent zombies tend to be infected with a virus that leaves them still living in a biological sense but alters their behavior so as to be savage, oblivious to pain, and nimble. Furthermore even supernatural zombies are not a homogeneous set but encompass varying degrees of decomposition. Thus the first issue with zombies is defining what is a zombie and if it is commensurable with similar categories (like an inferius in Harry Potter). This categorical uncertainty has effects in that insurance underwriters systematically undervalue life insurance policies against monsters that are ambiguous to categorize (zombies) as compared to those that fall into a clearly delineated category (vampires).

Neo-institutionalism. Saving humanity from the hordes of the undead is a broad goal that is easily decoupled from the means used to achieve it. Especially given that human survivors need legitimacy in order to command access to scarce resources (e.g., shotgun shells, gasoline), it is more important to use strategies that are perceived as legitimate by trading partners (i.e., other terrified humans you’re trying to recruit into your improvised human survival cooperative) than to develop technically efficient means of dispatching the living dead. Although early on strategies for dealing with the undead (panic, “hole up here until help arrives,” “we have to get out of the city,” developing a vaccine, etc) are practiced where they are most technically efficient, once a strategy achieves legitimacy it spreads via isomorphism to technically inappropriate contexts.

Population ecology. Improvised human survival cooperatives (IHSC) demonstrate the liability of newness in that many are overwhelmed and devoured immediately after formation. Furthermore, IHSC demonstrate the essentially fixed nature of organizations as those IHSC that attempt to change core strategy (eg, from “let’s hole up here until help arrives” to “we have to get out of the city”) show a greatly increased hazard for being overwhelmed and devoured.

Diffusion. Viral zombieism (e.g. Resident Evil, 28 Days Later) tends to start with a single patient zero whereas supernatural zombieism (e.g. Night of the Living Dead, the “Thriller” video) tends to start with all recently deceased bodies rising from the grave. By seeing whether the diffusion curve for zombieism more closely approximates a Bass mixed-influence model or a classic s-curve we can estimate whether zombieism is supernatural or viral, and therefore whether policy-makers should direct grants towards biomedical labs to develop a zombie vaccine or the Catholic Church to give priests a crash course in the neglected art of exorcism. Furthermore marketers can plug plausible assumptions into the Bass model so as to make projections of the size of the zombie market over time, and thus how quickly to start manufacturing such products as brain-flavored Doritos.

Social movements. The dominant debate is the extent to which anti-zombie mobilization represents changes in the political opportunity structure brought on by complete societal collapse as compared to an essentially expressive act related to cultural dislocation and contested space. Supporting the latter interpretation is that zombie hunting militias are especially likely to form in counties that have seen recent increases in immigration. (The finding holds even when controlling for such variables as gun registrations, log distance to the nearest army administered “safe zone,” etc.).

Family. Zombieism doesn’t just affect individuals, but families. Having a zombie in the family involves an average of 25 hours of care work per week, including such tasks as going to the butcher to buy pig brains, repairing the boarding that keeps the zombie securely in the basement and away from the rest of the family, and washing a variety of stains out of the zombie’s tattered clothing. Almost all of this care work is performed by women and very little of it is done by paid care workers as no care worker in her right mind is willing to be in a house with a zombie.

Applied micro-economics. We combine two unique datasets, the first being military satellite imagery of zombie mobs and the second records salvaged from the wreckage of Exxon/Mobil headquarters showing which gas stations were due to be refueled just before the start of the zombie epidemic. Since humans can use salvaged gasoline either to set the undead on fire or to power vehicles, chainsaws, etc., we have a source of plausibly exogenous heterogeneity in showing which neighborhoods were more or less hospitable environments for zombies. We show that zombies tended to shuffle towards neighborhoods with low stocks of gasoline. Hence, we find that zombies respond to incentives (just like school teachers, and sumo wrestlers, and crack dealers, and realtors, and hookers, …).

Grounded theory. One cannot fully appreciate zombies by imposing a pre-existing theoretical framework on zombies. Only participant observation can allow one to provide a thick description of the mindless zombie perspective. Unfortunately scientistic institutions tend to be unsupportive of this kind of research. Major research funders reject as “too vague and insufficiently theory-driven” proposals that describe the intention to see what findings emerge from roaming about feasting on the living. Likewise IRB panels raise issues about whether a zombie can give informed consent and whether it is ethical to kill the living and eat their brains.

Ethnomethodology. Zombieism is not so much a state of being as a set of practices and cultural scripts. It is not that one is a zombie but that one does being a zombie such that zombieism is created and enacted through interaction. Even if one is “objectively” a mindless animated corpse, one cannot really be said to be fulfilling one’s cultural role as a zombie unless one shuffles across the landscape in search of brains.

Conversation Analysis.

1  HUMAN:    Hello, (0.5) Uh, I uh, (Ya know) is anyone in there?
2  ZOMBIE1:  Br:ai[ns], =
3  ZOMBIE2:       [Br]:ain[s]
4  ZOMBIE1:              =[B]r:ains
5  HUMAN:    Uh, I uh= li:ke, Hello? =
6  ZOMBIE1:  Br:ai:ns!
7  (0.5)
8  HUMAN:    Die >motherfuckers!<
9  SHOTGUN:  Bang! (0.1) =
10 ZOMBIE1:  Aa:ar:gg[gh!]
11 SHOTGUN:         =[Chk]-Chk, (0.1) Bang!

October 13, 2009 at 4:24 am 21 comments

Applied diffusion modeling

| Gabriel |

Via Slashdot, some mathematicians at the University of Ottawa have modeled zombie infestation. It’s basically your standard endogenous growth model with a cute application. Here’s the conclusion:

In summary, a zombie outbreak is likely to lead to the collapse of civilisation, unless it is dealt with quickly. While aggressive quarantine may contain the epidemic, or a cure may lead to coexistence of humans and zombies, the most effective way to contain the rise of the undead is to hit hard and hit often. As seen in the movies, it is imperative that zombies are dealt with quickly, or else we are all in a great deal of trouble.

Here’s an even more “sophisticated” simulation, which allows spatial heterogeneity.

August 14, 2009 at 11:04 pm
