Archive for August, 2009

User interface mimetic isomorphism

| Gabriel |

Slashdot reports that the next version of OpenOffice will include the “ribbon” interface, which replaces the familiar “file/edit/view/…” menus with a bunch of icons. This is pretty interesting as the ribbon was the single most despised aspect of Office 2007 (it’s one of the main reasons that when my Dell broke last year I replaced it with a Mac). And yet OOo is imitating this feature. I see this as pretty solid evidence for mimetic isomorphism, the premise that actors will blindly copy the market leader even when the particular behavior at issue kind of sucks.

In other sucky user interface news, Wednesday’s Mac 10.5.8 update automatically “upgrades” Safari to 4.0.2, which means that but for the grace of Time Machine I would have lost my beloved Safari 4 Beta, which had tabs on top as the default interface. Safari 4.0.x has tabs on the bottom (where they take up an extra centimeter of vertical space) and there’s no way to opt out, even from the command line. I really don’t understand why Apple thinks it’s a good idea to build computers with letterbox screens, then write software that takes up gratuitous vertical screen real estate with UI junk. Between Safari 4.0.x tabs and the Dock, reading the web on a 13″ MacBook is a pretty squinty experience. The whole horizontal aesthetic really plays into the “iLife” / “I’m-a-Mac-and-I’m-a-PC” brand image of the Mac as the world’s most expensive family photo album rather than a serious work tool.

August 7, 2009 at 5:29 am 2 comments

Regression to the mean

| Gabriel |

Imagine having all your undergrads write practice essays. You read them all and find the five worst essays, then send these kids to the writing center, or even (martyr that you are) tutor them personally. At the end of the term you see that they were no longer the bottom five but were still in the bottom half. Conversely, imagine noticing that most of the faculty brats you know are much smarter than average kids, but not as smart as their parents.

In these cases the issue is not necessarily the efficacy of the writing center or the stupefaction of growing up in a college town, but regression to the mean. In the first case it’s adverse selection, in the second it’s advantageous selection, but the issue is the same.

Regression to the mean occurs whenever you have three conditions:

  1. a pre-treatment and post-treatment measure of the key variable (or something similar like two indicators loading on the same latent variable)
  2. assignment to the treatment is non-random with respect to the pre-treatment measure
  3. the key variable has moderate to low reliability

The reason is that you operationalize the effect of the treatment as (Y_i1 + e_i1) - (Y_i0 + e_i0). Now it’s true that the actual treatment effect would be Y_i1 - Y_i0. But note that e_i0 is uncorrelated with e_i1. Therefore, to the extent that e_i0 was important to assigning cases to the treatment, a lot of what you think is an effect is really just that the latent values of the cases you selected for treatment weren’t as severe as you thought they were. I wrote a simulation that demonstrates all of this.
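Before the full Stata simulation, here is a toy sketch of the same mechanism in plain awk (hypothetical code, not part of the original simulation): the true scores never change and the treatment effect is exactly zero, yet cases adversely selected on their observed wave-0 scores still appear to improve.

```shell
# Toy regression-to-the-mean demo (illustrative, not the post's Stata code).
# True scores are fixed and the treatment effect is zero, yet cases selected
# for low observed wave-0 scores appear to "gain" at wave 1.
awk 'BEGIN {
  srand(1); n = 100000
  for (i = 1; i <= n; i++) {
    # sum of 12 uniforms minus 6 approximates a standard normal draw
    y0 = -6; for (j = 0; j < 12; j++) y0 += rand()   # true score
    e0 = -6; for (j = 0; j < 12; j++) e0 += rand()   # wave-0 noise
    e1 = -6; for (j = 0; j < 12; j++) e1 += rand()   # wave-1 noise
    obs0 = y0 + 0.5 * e0
    obs1 = y0 + 0.5 * e1                             # no treatment at all
    if (obs0 < -1) { sum += obs1 - obs0; cnt++ }     # adverse selection
  }
  printf "apparent gain for selected cases: %.2f\n", sum / cnt
}'
```

On a typical run the selected cases “gain” about 0.3 standard deviations despite a true effect of zero, purely because e_0 is uncorrelated with e_1.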

capture log close
log using reg2mean.log, replace

*the model assumes a true level of Y, which is susceptible to treatment, but
*  which can only be measured with error
*a population of agents is generated with y_0true distributed as a standard
*  normal
*y_0observed is defined as y_0true + random-normal*noisiness
*  where "noisiness" is a scaling factor.
*"Adverse selection" occurs when the treatment is applied to all agents
*  for whom y_0observed < -1 sigma
*In the second wave, y is measured again for all agents
*  y_1true = y_0true + beta_treatment*treatment
*  y_1observed = y_1true + random-normal/noisiness
*  delta_observed = y_1observed - y_0observed

*If there were random assignment, delta_observed should equal zero for the
* control group and beta_treatment for the treatment group
*However because noise_0 is uncorrelated with noise_1, with adverse selection
*  they can diverge.
*As such we can measure
*   bias=delta_observed(for treatment group) - beta_treatment

*simulation will vary "noisiness" and "beta_treatment" to show effects on
*  "bias"

global nagents=10000
*each condition gets $nagents

capture program drop reg2mean
program define reg2mean
 set more off
 capture macro drop noisiness beta_treatment
 global noisiness 			`1'
 * how bad is our measure of Y, should range 0 (good measure) to
 *  1 (1 signal: 1 noise), though theoretically it could be even higher
 global beta_treatment		`2'
 * how effective is the treatment. should range from -.5 (counter-productive)
 * to .5 (pretty good), where 0 means no effect
 disp "noise " float(`1') " -- efficacy " float(`2')
 clear
 quietly set obs $nagents
 gen y_0true=rnormal()
 gen y_0observed=y_0true + (rnormal()*$noisiness)
 gen treatment=0
 *this code defines recruitment
 * for adverse selection use "<-1"
 * for advantageous selection use ">1"
 quietly replace treatment=1 if y_0observed<-1
 gen y_1true=y_0true+ (treatment*$beta_treatment)
 gen y_1observed=y_1true+ (rnormal()*$noisiness)
 gen delta_observed=y_1observed-y_0observed
 gen bias=delta_observed - (treatment*$beta_treatment)
 collapse (mean) bias delta_observed, by (treatment)
 quietly keep if treatment==1
 drop treatment
 gen noisiness=$noisiness
 gen beta_treatment=$beta_treatment
 append using reg2mean
 quietly save reg2mean, replace
end

set obs 1
gen x=.
save reg2mean.dta, replace

forvalues noi=0(.1)1 {
	forvalues beta=-.5(.1).5 {
		reg2mean `noi' `beta'
	}
}

use reg2mean, clear
drop x
drop if noisiness==.
lab var delta_observed "apparent efficacy of treatment"
lab var bias "measurement error of delta_obs"
lab var noisiness "measurement error of Y"
lab var beta_treatment "true efficacy of treatment"
*round away floating-point fuzz from the forvalues counters
replace noisiness=round(noisiness,.1)
replace beta_treatment=round(beta_treatment,.1)
save reg2mean.dta, replace

table noisiness, c(m bias sd bias)
table beta_treatment, c(m bias sd bias)
*have a nice day

As you can see, bias is robust to the size of the true effect but is basically equal to noisiness. The practical implication is to be very skeptical of claims about effects where the measurement has low reliability and selectivity is built into the system.

If you like, you can use some of my other code to graph the simulation as a contour plot, either crudely but natively or more elegantly with gnuplot. Here’s the code with those two commands:

crudecontour noisiness beta_treatment bias
gnuplotpm3d noisiness beta_treatment bias, title(Regression to the Mean Simulation) xlabel(noisiness) ylabel(efficacy) using(reg2mean)

August 6, 2009 at 5:56 am 1 comment

Incentives vs institutions

| Gabriel |

As anyone who has ever written an empirical paper knows, one of the hardest things is coming up with what can charitably be called a “compelling null” and cynically a “good straw man.” Behold, a gift I bestow (via MR) unto macro economic sociologists of the world polity school. A new NBER theory piece argues that the global institutionalization of child labor bans will delay the actual diffusion of child labor bans in low income countries. Henceforth, anyone caring to do a world polity paper (or conversely, a public choice / RCT political economy paper) can have that most desirable of things, a “competing predictions” lit review.

You’re welcome.

August 5, 2009 at 1:24 pm


| Gabriel |

For a very long time I’ve been in the habit of keeping multiple versions of both manuscripts and scripts. Every day that I work on a file I save a new version as basenameYYMMDD.extension. The reason I do this is to facilitate buyer’s remorse on changes (and conversely, let myself be free to experiment). Although Time Machine does this for me to an extent, I still like to do manual snapshots in part because I’m a creature of habit and in part because Time Machine is more of a restore tool than a “how did I do this last month” kind of tool. Worse, Time Machine deletes old backups (which happens really often if you have any disk image files).

Anyway, in the course of working with Lyx I realized that a problem with my approach is that the current file always has a new name, which makes it hard to draw interconnections between files. For instance, Lyx lets you embed child documents, graphics, etc., within master documents, but this only works if the target files keep stable names.

What I realized is that the simple solution is, instead of having all versions called basenameYYMMDD.extension (with the current version just being the one with the most recent YYMMDD), to make the current version basename.extension and only the snapshots basenameYYMMDD.extension. That way another file can point to basename.extension and always get the current version, even as I keep access to the snapshots.

I wrote a shell script that does this. On a Mac you can use the “run shell script” workflow action to do it in Automator, which lets you treat it as a Finder plug-in (right-click) or as an app (drag-and-drop).

TIMESTAMP=`date '+%Y%m%d'`
for f in "$@"
do
	FILENAME=`basename "$f" | sed 's/\(.*\)\..*/\1/'`
	EXTENSION=`basename "$f" | sed 's/.*\.\(.*\)/\1/'`
	DIRPATH=`dirname "$f"`
	cp "$f" "$DIRPATH/$FILENAME$TIMESTAMP.$EXTENSION"
done

My version assumes that it’s taking arguments (in my case from Automator) but to make it self contained you can define a variable as a list of paths and have it read off of that variable instead of “$@”. Here’s an example of the syntax for the variable:

X="/Users/rossman/bigdeal.txt /Users/rossman/biggerdeal.lyx /Users/rossman/ "
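A sketch of such a self-contained variant (the paths are illustrative assumptions; note that splitting X on whitespace means this version can’t handle filenames containing spaces):

```shell
# Self-contained sketch: read the file list from a variable instead of "$@".
# The paths in X are hypothetical examples; substitute your own files.
X="/Users/rossman/bigdeal.txt /Users/rossman/biggerdeal.lyx"
TIMESTAMP=$(date '+%Y%m%d')
for f in $X
do
	FILENAME=$(basename "$f" | sed 's/\(.*\)\..*/\1/')
	EXTENSION=$(basename "$f" | sed 's/.*\.\(.*\)/\1/')
	DIRPATH=$(dirname "$f")
	cp "$f" "$DIRPATH/$FILENAME$TIMESTAMP.$EXTENSION"
done
```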

Two possible modifications you might wish to use are to a) only snapshot files that have been changed recently or b) put the snapshots in an “archive” or “oldversions” subdirectory. To do the latter, change the penultimate line to something like:

cp "$f" "$DIRPATH/oldversions/$FILENAME$TIMESTAMP.$EXTENSION"
(Thanks to Haynes and Ganbustein in the MacOSXHints forums for some help debugging the code).

August 5, 2009 at 5:42 am 5 comments

Time consistency and reputation

Two recent posts make an interesting contrast in firm reputation. In the first, Bryan Caplan notes that German insurance companies were eager to honor claims by Jews whose property was destroyed in Kristallnacht, despite the efforts of the Nazis to get them to repudiate these claims. The insurance companies were more concerned with their reputation for honoring claims and not finding a pretext to weasel out than they were with avoiding paying the substantial damages.

In the second, Ed Felten explains why Amazon’s sackcloth and ashes routine after the Kindle 1984 fiasco simply isn’t credible. They claim that they will never again delete downloaded works, but given the time inconsistency problem this is what game theorists call “cheap talk.” Given the power of the state and Amazon’s recent much-lamented cravenness, it’s just hard to believe that Amazon would refuse a court order for any of the myriad reasons that one might be issued. In some of these cases Amazon might figure out a way to reconcile its promise to its customers with its obligations to the plaintiff (for instance, it might pay the plaintiff royalties in an IP case), but there are other plausible circumstances where only outright censorship could satisfy the aggrieved party (these are especially issues in foreign jurisdictions, many of which have pro-plaintiff burdens of proof for libel and ban hate speech and blasphemy).

As such, Felten observes that the only way that Amazon could really earn the trust of its customers is to use Schelling’s power of constraint (or for you humanities-types, to lash itself to the mast like Odysseus) and make it technically impossible for the Kindle to remotely delete or disable content. However it is unlikely that they would constrain themselves in this way because the publishers who license Kindle content probably insist on a kill switch as a hedge against piracy. Indeed, this is exactly why (despite in many ways being an ideal customer for it) I will never buy a Kindle. However I still trust Amazon for print because it already is impossible for them to demand that I return a paper book that they have sold me, no matter what judges in countries where Amazon does substantial business but that lack the First Amendment have to say about it.

So in the long run, reputation is difficult for firms to maintain. One way is to count on such severitas as was shown by the German insurance men, but as Amazon has shown, this is not certain. The other is, as Felten argues, to use technical or legal instruments to make it impossible to renege.

August 4, 2009 at 1:38 pm 1 comment

generate m3u in a directory

| Gabriel |

This isn’t even remotely sociology, but it is code.

My daughter has a Playskool “Made for Me” child’s MP3 player that we bought for her back when I had a Windows machine. Now that I’m on a Mac I’m having trouble updating its contents, as the client software is Windows-only (and doesn’t work with Wine). Fortunately it’s pretty transparent how it works, and so I’m able to make it work with Mac or Linux (or for that matter, Windows if you lose the installer disc). I’d imagine that this advice works not only for this particular product but for many similar MP3 players, or any other device that needs a file index in a similar format.

Anyway, when you plug in the device it mounts as USB drive “Untitled.” Open it up and you see it has several folders: “Playtime,” “Soothing,” “Favorites,” and “Sounds.” Inside of each of these is a bunch of MP3s and an M3U.

The obvious thing is to just drag-and-drop MP3s into one of these folders, but the device won’t recognize them because it uses the M3U file as a script. Fortunately it’s easy to edit an M3U. Even though your operating system likes to think of it as a media format (specifically, a playlist), an M3U is just a text file listing filepaths. This device uses old-school MSDOS FAT syntax (i.e., backslash for directories, allcaps, filenames over 8 characters truncated to 6 plus “~1”, 3-letter extension, CRLF line breaks). For instance, the M3U for the “playtime” directory consists of lines like these (filenames illustrative):

\PLAYTIME\TWINKL~1.MP3
\PLAYTIME\OLDMAC~1.MP3
My solution was to drag and drop the files, then run this shell script to update the M3U. The playlist comes out in alphabetical order, but you can hand-edit it in a text editor; just move the lines around.

# make a backup of the old m3u file
cp Playtime.m3u Playtime.bak
# create a text file listing all the mp3 files
ls | grep -i mp3 > filelist.txt
# clean the text file by deleting whitespace and file extension, capitalizing everything
awk '{ gsub(/\.mp3$/, ""); print $0;}' "filelist.txt" > tmp.txt ; mv tmp.txt filelist.txt
awk '{ gsub(" ", ""); print $0;}' "filelist.txt" > tmp.txt ; mv tmp.txt filelist.txt
awk '{print toupper($0)}' "filelist.txt" > tmp.txt ; mv tmp.txt filelist.txt
# if a name is >8 chars, truncate to 6 chars and append "~1"
awk '{if (length($0)>8) {print substr($0, 1 , 6) "~1"} else {print $0}}' "filelist.txt" > tmp.txt ; mv tmp.txt filelist.txt
# add path and extension
awk '{print "\\PLAYTIME\\" $0}' "filelist.txt" > tmp.txt ; mv tmp.txt filelist.txt
awk '{print $0 ".MP3"}' "filelist.txt" > tmp.txt ; mv tmp.txt filelist.txt
# use windows txt (crlf) instead of unix (lf)
awk '{ sub(/$/,"\r"); print }' "filelist.txt" > tmp.txt ; mv tmp.txt filelist.txt
mv filelist.txt Playtime.m3u
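To preview what the script does to a single name (a made-up example) without touching the device, the same awk steps can be chained into one pipe:

```shell
# One-shot check of the 8.3 name mangling on a hypothetical filename:
# strip extension and spaces, uppercase, truncate, then add path and extension.
echo "twinkle twinkle.mp3" \
  | awk '{ gsub(/\.mp3$/, ""); gsub(" ", ""); print toupper($0) }' \
  | awk '{ if (length($0) > 8) print substr($0, 1, 6) "~1"; else print $0 }' \
  | awk '{ print "\\PLAYTIME\\" $0 ".MP3" }'
# prints \PLAYTIME\TWINKL~1.MP3
```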

If you don’t know how to execute a shell script, just wrap it in the Automator action “run shell script” and execute the Automator workflow inside the appropriate directory.

August 4, 2009 at 2:25 am 1 comment

If it’s genetic, why is it changing?

| Gabriel |

In the course of an extended discussion of obesity (here, here, here, and here), Megan McArdle mentions that weight is highly heritable. She doesn’t mention it, but it’s also true that the psychometric latent variable “g” (aka, IQ) is highly heritable. The puzzling thing is that both obesity and IQ have been increasing over the last few generations. This would make sense if there were selective mortality and/or fertility such that intelligent fat people were more likely to live and have children than stupid skinny people, but there is no such selective pressure to any appreciable degree. So here we have the apparent paradox of a rapidly changing phenotype of a highly heritable trait despite minimal change in the genotype.

The paradox comes from (implicitly) understanding heritability as meaning something like a regression coefficient when it’s really more like a correlation coefficient (technically it’s a structural equation modeling phi). Although we think of correlations as being even more basic than regression coefficients, the interpretation is actually weirder. GNXP provides a very good explanation of this that is well worth reading in its entirety, but here’s the take home:

To say that a trait is .95 heritable does not mean that it is caused 95% by genes, that’s not even wrong. Rather, it is to say that 95% of the variance within the population can be accounted for by the variance of genes within the population. But heritable traits are also usually affected by environment; if you starve someone they will be short, but retain five fingers. The number of fingers you have on your hand is not heritable, because there’s no real variance within the population of the trait. It’s genetically specified, but not heritable.
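The variance-accounting point can be made concrete with a toy simulation (hypothetical model and weights, again in awk): a phenotype built as genotype plus a small environmental term is about .95 heritable in exactly this variance-ratio sense.

```shell
# Toy variance partition (hypothetical model): phenotype = genotype + 0.23*environment.
# "Heritability" here is Var(G)/Var(P), a population variance ratio, not a
# claim that genes "cause 95%" of any individual's phenotype.
awk 'BEGIN {
  srand(2); n = 100000
  for (i = 1; i <= n; i++) {
    g = -6; for (j = 0; j < 12; j++) g += rand()   # genetic component, ~N(0,1)
    e = -6; for (j = 0; j < 12; j++) e += rand()   # environmental component
    p = g + 0.23 * e                               # phenotype
    sg += g; sgg += g*g; sp += p; spp += p*p
  }
  vg = sgg/n - (sg/n)^2; vp = spp/n - (sp/n)^2
  printf "Var(G)/Var(P) = %.2f\n", vg / vp
}'
```

With these weights the ratio comes out near .95. Adding a constant to every agent’s e would raise the mean phenotype while leaving both variances, and hence the heritability, unchanged, which is how a highly heritable trait can drift rapidly across generations.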

August 3, 2009 at 4:38 pm 2 comments
