Archive for August, 2009

The big gens

| Gabriel |

I heard that a handful of clans (or “gens” in Latin) dominated the higher offices of the Roman republic and I figured that this would be a good data question. To start, I copied the Fasti Consulares from Wikipedia and limited it to the Republican period, defined as the Rape of Lucretia through the Battle of Actium.

Roman names followed the convention of “personal gens family [honorifics].” So, for instance, “Publius Cornelius Scipio Africanus” means “the man Publius from the Scipio branch of the Cornelius clan, the conqueror of Africa.” From the perspective of seeing which clans dominated the Republic, the key bit is the second name, so I used the Stata string function “word” to pull the second word out of each of these names.
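Outside of Stata the same extraction is just the second whitespace-delimited field. Here is a minimal shell sketch of the idea with awk (the sample name is from the post; the Stata line itself would be something like “gen gens=word(name,2)”):

```shell
# extract the gens (the second word) from a full Roman name;
# in the shell the gens is just the second whitespace-delimited field
name="Publius Cornelius Scipio Africanus"
gens=$(echo "$name" | awk '{print $2}')
echo "$gens"   # prints: Cornelius
```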

As can be seen, the distribution of consulships per gens follows a power law. Since power laws are consistent with a cumulative advantage mechanism, we can interpret this as suggesting that in Rome a family’s power and prestige were endogenous.

[graph: distribution of consulships per gens]

The most dominant clans in the republic were the Furii (41 consulships), the Claudii (45 consulships), the Aemilii (53 consulships), the Fabii (62 consulships), the Valerii (71 consulships), and the Cornelii (106 consulships). This means that a Cornelius was consul about once every six years.

In contrast the Iulii (as in Gaius Iulius Caesar) held the consulship a relatively paltry 29 times, so small wonder that in order to establish the monarchy they had to form a marriage alliance with the Claudii. Likewise, the Pompeii were a politically obscure family but Pompey Magnus became powerful through his patron-client relationships with the Cornelii.

August 14, 2009 at 5:34 am 1 comment

Did I stutter?

| Gabriel |

Vain creature that I am, I was googling my Dixie Chicks stuff, and it was a somewhat frustrating experience. Here are two of the references to it I found, out there in the wilds of the internet:

  • Gabriel Rossman uses the Dixie Chicks incident of when they openly spoke out against Bush to show how synergy is creating corporate censorhip.
  • A study titled “Who Killed the Travelin’ Soldier: Elites, Masses, and Blacklisting of Critical Speakers” done by Gabriel Rossman Dept of Sociology Princeton University supports the fact that neither “Free Republic or other right wing groups” organized boycotts were responsible for the dcx demise. It was the dcx themselves.

Aaaargggghhh!

If you haven’t read or don’t remember the article, the gist of it is that the Dixie Chicks blacklist was probably instigated by right-wing social movements and was definitely not instigated by big companies like Clear Channel. That is to say that both of these references to the paper get it completely back asswards, apparently so as to make it conform to their ideological priors. I can understand people thinking (incorrectly in my opinion) that it was big media’s fault or that right-wing social movements had nothing to do with it, but I really don’t see how you can use findings to the opposite effect as evidence for these positions. Nonetheless I’m sure it happens constantly (sometimes even in journals rather than, as in this case, random websites).

August 13, 2009 at 4:49 am

But it’s got an 11

| Gabriel |

[image: Stata 11]

StataCorp cronies like Jeremy may have gotten it weeks ago, but mere end-users like me had to wait a bit longer. This looks to be a significant upgrade (especially for Windows users, who now have a good integrated do-file editor). In particular I wish I had access to the new “factor variables” syntax (as a replacement for “xi”) a few weeks ago. Likewise the new stcrreg model (think an st version of mlogit) looks very good.

A few words on a smooth upgrade. First, remember to reload your ado files.

Also, remember to update any scripts in your text editor so it knows where to push. In TextMate hit control-command-option-B to invoke the script editor. Scroll down to “Stata” and edit the scripts “Send File to Stata” and “Send Selection to Stata.” In each script the key line is

osascript -e "tell application \"StataMP\"

If this line doesn’t refer to your current version of Stata, change it.

August 12, 2009 at 5:39 pm 2 comments

links

| Gabriel |

NYT: Statistics jobs

Note that a lot of what they are calling statistics is really more about data mining, which is why several of the people they highlight are computer scientists not statisticians. This is consistent with my belief that our training ought to give more emphasis to workflow and data cleaning. Despite the usual standard-error-centric statistics training I’ve managed to develop a decent workflow, but (as can be seen by reading my shell scripts) I still really struggle with data cleaning languages like awk and perl.

Apple Rejecting All e-Book App Store Submissions?

Long story short, Apple is worried about confirming whether the application developers have clear title to the copyright. I see this as indirect fallout from the Kindle “1984” scandal, as well as a good illustration of transaction costs (and by extension, a good argument for limited copyright terms).

August 7, 2009 at 6:06 am

User interface mimetic isomorphism

| Gabriel |

Slashdot reports that the next version of OpenOffice will include the “ribbon” interface, which replaces the familiar “file/edit/view/…” menus with a bunch of icons. This is pretty interesting as the ribbon was the single most despised aspect of Office 2007 (it’s one of the main reasons that when my Dell broke last year I replaced it with a Mac). And yet OOo is imitating this feature. I see this as pretty solid evidence for mimetic isomorphism, the premise that actors will blindly copy the market leader even when the particular behavior at issue kind of sucks.

In other sucky user interface news, Wednesday’s Mac 10.5.8 update automatically “upgrades” Safari to 4.0.2, which means that but for the grace of Time Machine I would have lost my beloved Safari 4 Beta, which had tabs on the top as the default interface. Safari 4.0.x has tabs on the bottom (where they take up an extra centimeter of vertical space) and there’s no way to opt out, even with the command line. I really don’t understand why Apple thinks it’s a good idea to build computers with letterbox screens, then write software that takes up gratuitous vertical screen real estate with horizontal bands of UI junk. Between Safari 4.0.x tabs and the Dock, reading the web on a 13″ MacBook is a pretty squinty experience. The whole horizontal aesthetic really plays into the “iLife” / “I’m-a-Mac-and-I’m-a-PC” brand image of the Mac as the world’s most expensive family photo album rather than a serious work tool.

August 7, 2009 at 5:29 am 2 comments

Regression to the mean

| Gabriel |

Imagine having all your undergrads write practice essays. You read them all and find the five worst essays, then send these kids to the writing center, or even (martyr that you are) tutor them personally. At the end of the term you see that they were no longer the bottom five but were still in the bottom half. Conversely, imagine noticing that most of the faculty brats you know are much smarter than average kids, but not as smart as their parents.

In these cases the issue is not necessarily the efficacy of the writing center or the stupefaction of growing up in a college town, but regression to the mean. In the first case it’s adverse selection, in the second it’s advantageous selection, but the issue is the same.

Regression to the mean occurs whenever you have three conditions:

  1. a pre-treatment and post-treatment measure of the key variable (or something similar like two indicators loading on the same latent variable)
  2. assignment to the treatment is non-random with respect to the pre-treatment measure
  3. the key variable has moderate to low reliability

The reason is that you operationalize the effect of the treatment as (Y_i1 + e_i1) - (Y_i0 + e_i0), whereas the actual treatment effect is Y_i1 - Y_i0. Note that e_i0 is uncorrelated with e_i1. Therefore, to the extent that e_i0 was important in assigning cases to the treatment, a lot of what you think is an effect is really just that the latent values of the cases you selected for treatment weren’t as extreme as you thought they were. I wrote a simulation that demonstrates all of this.

capture log close
log using reg2mean.log, replace

*the model assumes a true level of Y, which is susceptible to treatment, but
*  which can only be measured with error
*a population of agents is generated with y_0true distributed as a standard
*  normal
*y_0observed is defined as y_0true + random-normal*noisiness
*  where "noisiness" is a scaling factor.
*"Adverse selection" occurs when the treatment is applied to all agents
*  for whom y_0observed < -1 sigma
*In the second wave, y is measured again for all agents
*  y_1true = y_0true + beta_treatment*treatment
*  y_1observed = y_1true + random-normal*noisiness
*  delta_observed = y_1observed - y_0observed

*If there were random assignment, delta_observed should equal zero for the
* control group and beta_treatment for the treatment group
*However because noise_0 is uncorrelated with noise_1, with adverse selection
*  they can diverge.
*As such we can measure
*   bias=delta_observed(for treatment group) - beta_treatment

*simulation will vary "noisiness" and "beta_treatment" to show effects on
*  "bias"

global nagents=10000
*each condition gets $nagents

capture program drop reg2mean
program define reg2mean
 set more off
 capture macro drop noisiness beta_treatment
 global noisiness 			`1'
 * how bad is our measure of Y, should range 0 (good measure) to
 *  1 (1 signal: 1 noise), though theoretically it could be even higher
 global beta_treatment		`2'
 * how effective is the treatment. should range from -.5 (counter-productive)
 * to .5 (pretty good), where 0 means no effect
 disp "noise " float(`1') " -- efficacy " float(`2')
 clear
 quietly set obs $nagents
 gen y_0true=rnormal()
 gen y_0observed=y_0true + (rnormal()*$noisiness)
 gen treatment=0
 *this code defines recruitment
 * for adverse selection use "<-1"
 * for advantageous selection use ">1"
 quietly replace treatment=1 if y_0observed<-1
 gen y_1true=y_0true+ (treatment*$beta_treatment)
 gen y_1observed=y_1true+ (rnormal()*$noisiness)
 gen delta_observed=y_1observed-y_0observed
 gen bias=delta_observed - (treatment*$beta_treatment)
 collapse (mean) bias delta_observed, by(treatment)
 quietly keep if treatment==1
 drop treatment
 gen noisiness=$noisiness
 gen beta_treatment=$beta_treatment
 append using reg2mean
 quietly save reg2mean, replace
end

clear
set obs 1
gen x=.
save reg2mean.dta, replace

forvalues noi=0(.1)1 {
	forvalues beta=-.5(.1).5 {
		reg2mean `noi' `beta'
	}
}
drop x
drop if noisiness==.
lab var delta_observed "apparent efficacy of treatment"
lab var bias "measurement error of delta_obs"
lab var noisiness "measurement error of Y"
lab var beta_treatment "true efficacy of treatment"
recode beta_treatment -1.001/-.999=-1 -.001/.001=0 .999/1.001=1 1.999/2.001=2
compress
save reg2mean.dta, replace

table noisiness, c(m bias sd bias)
table beta_treatment, c(m bias sd bias)
*have a nice day

As you can see, bias is robust to the size of the true effect but is basically equal to noisiness. The practical implication is to be very skeptical of claims about effects where the measurement has low reliability and selectivity is built into the system.

If you like, you can use some of my other code to graph the simulation as a contour plot, either crudely but natively or more elegantly with gnuplot. Here’s the code with those two commands:

crudecontour noisiness beta_treatment bias
gnuplotpm3d noisiness beta_treatment bias, title(Regression to the Mean Simulation) xlabel(noisiness) ylabel(efficacy) using(reg2mean)

August 6, 2009 at 5:56 am 1 comment

Incentives vs institutions

| Gabriel |

As anyone who has ever written an empirical paper knows, one of the hardest things is coming up with what can charitably be called a “compelling null” and cynically a “good straw man.” Behold, a gift I bestow (via MR) unto macro economic sociologists of the world polity school. A new NBER theory piece argues that the global institutionalization of child labor bans will delay the actual diffusion of child labor bans in low income countries. From henceforth, anyone caring to do a world polity paper (or conversely, a public choice / RCT political economy paper) can have that most desirable of things, a “competing predictions” lit review.

You’re welcome.

August 5, 2009 at 1:24 pm

Snapshot

| Gabriel |

For a very long time I’ve been in the habit of keeping multiple versions of both manuscripts and scripts. Every day that I work on a file I save a new version as basenameYYMMDD.extension. The reason I do this is to facilitate buyer’s remorse on changes (and conversely, let myself be free to experiment). Although Time Machine does this for me to an extent, I still like to do manual snapshots in part because I’m a creature of habit and in part because Time Machine is more of a restore tool than a “how did I do this last month” kind of tool. Worse, Time Machine deletes old backups (which happens really often if you have any disk image files).

Anyway, in the course of working with LyX I realized that a problem with my approach is that the file always has a new name, which makes it hard to draw interconnections between files. For instance, LyX allows you to have child documents, graphics, etc., embedded within master documents, but this only works if the target files keep stable names.

What I realized was that the simple solution was that instead of having all versions called basenameYYMMDD.extension, where the current version is just the one with the most recent YYMMDD, I could make the current version basename.extension and the snapshots basenameYYMMDD.extension. That way I can have another file point to the file and always have it point to the current version, even as I also have access to the snapshots.

I wrote a shell script that does this. On a Mac you can use the “run shell script” workflow action to do it in Automator, which lets you treat it as a Finder plug-in (right-click) or as an app (drag-and-drop).

#!/bin/bash
TIMESTAMP=`date '+%Y%m%d'`
for f in "$@"
do
	EXTENSION=${f##*.}
	FILENAME=`basename "$f" | sed 's/\(.*\)\..*/\1/'`
	DIRPATH=`dirname "$f"`
	cp "$f" "$DIRPATH/$FILENAME$TIMESTAMP.$EXTENSION"
done

My version assumes that it’s taking arguments (in my case from Automator) but to make it self contained you can define a variable as a list of paths and have it read off of that variable instead of “$@”. Here’s an example of the syntax for the variable:

X="/Users/rossman/bigdeal.txt /Users/rossman/biggerdeal.lyx /Users/rossman/biggestdeal.do "
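Putting that together, a self-contained variant might look like the following sketch (the file list and the /tmp/snapdemo staging directory are purely illustrative):

```shell
#!/bin/bash
# self-contained snapshot script: read paths from a variable instead of "$@"
# (the file list and /tmp/snapdemo directory are hypothetical examples)
X="/tmp/snapdemo/bigdeal.txt /tmp/snapdemo/biggerdeal.lyx"
mkdir -p /tmp/snapdemo
for f in $X; do touch "$f"; done          # create the demo files
TIMESTAMP=$(date '+%Y%m%d')
for f in $X                               # unquoted so the list splits on spaces
do
	EXTENSION=${f##*.}
	FILENAME=$(basename "$f" | sed 's/\(.*\)\..*/\1/')
	DIRPATH=$(dirname "$f")
	cp "$f" "$DIRPATH/$FILENAME$TIMESTAMP.$EXTENSION"
done
```

Note that because $X is expanded unquoted, this variant splits on spaces and so can’t handle paths that contain spaces; the "$@" version handles those fine.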

Two possible modifications you might wish to use are to a) only snapshot files that have been changed recently or b) put the snapshots in an “archive” or “oldversions” subdirectory. To do the latter change the penultimate line to:

  cp "$f" "$DIRPATH/archive/$FILENAME$TIMESTAMP.$EXTENSION"
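Modification (a) can be sketched with find’s -mtime test, which here skips any file not modified within the last day (the one-day window and the demo file are illustrative assumptions):

```shell
#!/bin/bash
# variant (a): only snapshot files modified within the last day
# (the demo file and the one-day window are illustrative assumptions)
TIMESTAMP=$(date '+%Y%m%d')
touch /tmp/snapfresh.txt                  # a freshly modified demo file
for f in /tmp/snapfresh.txt
do
	# find prints the path only if it was modified less than 1 day ago
	if [ -n "$(find "$f" -mtime -1)" ]; then
		EXTENSION=${f##*.}
		FILENAME=$(basename "$f" | sed 's/\(.*\)\..*/\1/')
		DIRPATH=$(dirname "$f")
		cp "$f" "$DIRPATH/$FILENAME$TIMESTAMP.$EXTENSION"
	fi
done
```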

(Thanks to Haynes and Ganbustein in the MacOSXHints forums for some help debugging the code).

August 5, 2009 at 5:42 am 5 comments

Time consistency and reputation

Two recent posts make an interesting contrast in firm reputation. In the first, Bryan Caplan notes that German insurance companies were eager to honor claims by Jews whose property was destroyed in Kristallnacht, despite the efforts of the Nazis to get them to repudiate these claims. The insurance companies were more concerned with their reputation for honoring claims and not finding a pretext to weasel out than they were with avoiding paying the substantial damages.

In the second, Ed Felten explains why Amazon’s sackcloth and ashes routine after the Kindle 1984 fiasco simply isn’t credible. They claim that they will never again delete downloaded works, but this is what game theorists call “cheap talk” given the time inconsistency problem. Given the power of the state and Amazon’s recent much-lamented cravenness, it’s just hard to believe that Amazon would refuse a court order for any of the myriad reasons that one might be issued. In some of these cases Amazon might figure out a way to reconcile its promise to its customers with its obligations to the plaintiff (for instance, it might pay the plaintiff royalties in an IP case), but there are other plausible circumstances where only outright censorship could satisfy the aggrieved party (these issues are especially acute in foreign jurisdictions, many of which have pro-plaintiff burdens of proof for libel and ban hate speech and blasphemy).

As such, Felten observes that the only way that Amazon could really earn the trust of its customers is to use Schelling’s power of constraint (or for you humanities-types, to lash itself to the mast like Odysseus) and make it technically impossible for the Kindle to remotely delete or disable content. However it is unlikely that they would constrain themselves in this way because the publishers who license Kindle content probably insist on a kill switch as a hedge against piracy. Indeed, this is exactly why (despite in many ways being an ideal customer for it) I will never buy a Kindle. However I still trust Amazon for print because it already is impossible for them to demand that I return a paper book that they have sold me, no matter what judges in countries where Amazon does substantial business but that lack the First Amendment have to say about it.

So in the long run reputation is a difficult issue for firms to maintain. One way is to count on such severitas as was shown by the German insurance men, but as Amazon has shown this is not certain. The other is, as Felten argues, to use technical or legal instruments to make it impossible to renege.

August 4, 2009 at 1:38 pm 1 comment

generate m3u in a directory

| Gabriel |

This isn’t even remotely sociology, but it is code.

My daughter has a Playskool “Made for Me” child’s MP3 player that we bought for her back when I had a Windows machine. Now that I’m on a Mac I’m having trouble updating the contents, as the client is Windows-only (and doesn’t work with Wine). Fortunately it’s pretty transparent how it works, so I’m able to make it work with Mac or Linux (or, for that matter, Windows if you lose the installer disc). I’d imagine that this advice works not only for this particular product but also for many similar MP3 players, or any other device that needs a file index in a similar format.

Anyway, when you plug in the device it mounts as USB drive “Untitled.” Open it up and you see it has several folders: “Playtime,” “Soothing,” “Favorites,” and “Sounds.” Inside of each of these is a bunch of MP3s and an M3U.

The obvious thing is to just drag and drop MP3s into one of these folders, but the device won’t recognize them because it uses the M3U file as a script. Fortunately it’s easy to edit an M3U. Even though your operating system likes to think of it as a media format (specifically, a playlist), an M3U is just a text file listing filepaths. This device uses old-school MS-DOS FAT syntax (i.e., backslashes for directories, all caps, filenames longer than eight characters truncated to six plus “~1”, three-letter extensions, CRLF line breaks). For instance, here are the first few lines of the M3U for the “playtime” directory:

\PLAYTIME\3MONS~1.MP3
\PLAYTIME\2IFID~1.MP3
\PLAYTIME\PETERPAN.MP3

My solution was to drag and drop the files, then run this shell script to update the M3U. It goes by alphabetical order, but you can hand-edit it with a text editor, just move the lines around.

# make a backup of the old m3u file
cp Playtime.m3u Playtime.bak
# create a text file listing all the mp3 files
ls | grep -i mp3 > filelist.txt
# clean the text file by deleting whitespace and file extension, capitalizing everything
awk '{ gsub(/\.[Mm][Pp]3$/, ""); print $0;}' "filelist.txt" > tmp.txt ; mv tmp.txt filelist.txt
awk '{ gsub(" ", ""); print $0;}' "filelist.txt" > tmp.txt ; mv tmp.txt filelist.txt
awk '{print toupper($0)}' "filelist.txt" > tmp.txt ; mv tmp.txt filelist.txt
# if code is >8 chars, truncate to 6 chars and append "~1"
awk '{if (length($0)>8) {print substr($0, 1 , 6) "~1"} else {print $0}}' "filelist.txt" > tmp.txt ; mv tmp.txt filelist.txt
# add path and extension
awk '{print "\\PLAYTIME\\" $0}' "filelist.txt" > tmp.txt ; mv tmp.txt filelist.txt
awk '{print $0 ".MP3"}' "filelist.txt" > tmp.txt ; mv tmp.txt filelist.txt
# use windows txt (crlf) instead of unix (lf)
awk '{ sub(/$/,"\r"); print }' "filelist.txt" > tmp.txt ; mv tmp.txt filelist.txt
mv filelist.txt Playtime.m3u

If you don’t know how to execute a shell script, just wrap it in the Automator action “run shell script” and execute the Automator workflow inside the appropriate directory.

August 4, 2009 at 2:25 am 1 comment



The Culture Geeks