Archive for November, 2009

Perl text library

| Gabriel |

I found this very useful library of Perl scripts for text cleaning. You can use them even if you can’t code Perl yourself. For instance, to transpose a dataset, just download the “” script to your ~/scripts directory and enter the shell command:
perl ~/scripts/ row_col.txt > col_row.txt

The transpose script is particularly useful to me as I’ve never gotten Excel’s transpose function to work and for some bizarre reason Stata’s “xpose” command only works with numeric variables. You can even use these scripts directly from within a do-file, like so:

tempfile foo1
tempfile foo2
outsheet using `foo1'.txt
shell perl ~/scripts/ `foo1'.txt > `foo2'.txt
insheet using `foo2'.txt, clear
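
If you don't have the Perl script at hand, the same kind of transposition can be sketched in plain shell with awk. This is not the script from the library above. The function name transpose_tsv is hypothetical, and the sketch assumes a tab-delimited file small enough to hold in memory, with the same number of fields on every row.

```shell
# transpose_tsv: hypothetical helper, not part of the Perl library above.
# Assumes tab-delimited input with the same number of fields on every row.
transpose_tsv() {
  awk -F'\t' '
    {
      # Store every cell, keyed by (row, column).
      for (i = 1; i <= NF; i++) cell[NR, i] = $i
      if (NF > ncol) ncol = NF
    }
    END {
      # Each input column becomes one output row.
      for (i = 1; i <= ncol; i++) {
        line = cell[1, i]
        for (j = 2; j <= NR; j++) line = line "\t" cell[j, i]
        print line
      }
    }
  ' "$1"
}
```

Once the function is defined in your shell, it is called the same way as the Perl script: transpose_tsv row_col.txt > col_row.txt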

November 30, 2009 at 4:49 am 1 comment

some R baby steps

| Gabriel |

I’ve tried to learn R a few times but the syntax has always been opaque to my Stata-centric mind. I think the trick is to realize that two of the key differences are that:

  • whereas handles only come up in a few Stata commands (e.g., file, postfile, log), they are very important in R, what with all the “<-” statements
  • in R there’s a much fuzzier line between commands and functions than in Stata. What I mean by this is both the superficial thing of all the parentheses and also the more substantive issue that you often don’t put them one to a line, where they just do something (like Stata commands); instead you usually put several on a line and feed them into something else (like Stata functions). Related to this is that the typical Stata line has the syntax “verb object, adverb” whereas the typical R line has the syntax “object <- verb(object2, adverb)”

The two combine in an obvious way with something as simple as opening a dataset, which is just use file in Stata but is filehandle <- read.table("file") in R. That is, there’s not a read.table() command but a read.table() function, and you assign what the function returns to a handle. (And people say R isn’t intuitive!)

At least I think that’s a good way to think about the basic syntax — I suck at R and I really could be totally wrong about this. (Pierre or Kieran please correct me).

Anyway, I wrote my first useful R file the other day. It reads my Pajek formatted network data on top 40 radio stations and does a graph.

# File-Name: testgraph.R
# Date: 2009-11-20
# Author: Gabriel Rossman
# Purpose: graph CHR station network
# Data Used:
# Packages Used: igraph
library(igraph)
chrnet <- read.graph("", c("pajek"))
plot.igraph(chrnet, layout=layout.fruchterman.reingold, vertex.size=2, vertex.label=NA, vertex.color="red", edge.color="gray20", edge.arrow.size=0.3, margin=0)

The weird thing is that it works fine in the R GUI but breaks when I try to run R from the Terminal, regardless of whether I try to do it all in one line or first invoke R and then feed it the script. [Update: the issue is a 32-bit library and 64-bit R; the simple solution is to invoke “R32” rather than just plain “R”. See the comments for details.] Here’s a session with both problems:

gabriel-rossmans-macbook-2:~ rossman$ Rscript ~/Documents/book/stata/testgraph.R
Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared library '/Library/Frameworks/R.framework/Resources/library/igraph/libs/x86_64/':
dlopen(/Library/Frameworks/R.framework/Resources/library/igraph/libs/x86_64/, 10): Symbol not found: ___gmpz_clear
Referenced from: /Library/Frameworks/R.framework/Resources/library/igraph/libs/x86_64/
Expected in: dynamic lookup

Error : .onLoad failed in 'loadNamespace' for 'igraph'
Error: package/namespace load failed for 'igraph'
Execution halted
gabriel-rossmans-macbook-2:~ rossman$ R 'source("~/Documents/book/stata/testgraph.R")'
ARGUMENT 'source("~/Documents/book/stata/testgraph.R")' __ignored__

R version 2.10.0 (2009-10-26)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> source("~/Documents/book/stata/testgraph.R")
Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared library '/Library/Frameworks/R.framework/Resources/library/igraph/libs/x86_64/':
dlopen(/Library/Frameworks/R.framework/Resources/library/igraph/libs/x86_64/, 10): Symbol not found: ___gmpz_clear
Referenced from: /Library/Frameworks/R.framework/Resources/library/igraph/libs/x86_64/
Expected in: dynamic lookup


Error : .onLoad failed in 'loadNamespace' for 'igraph'
Error: package/namespace load failed for 'igraph'

The problem seems to be that R (Terminal) can’t find the igraph library. This is weird because the R GUI has no trouble finding it. Furthermore, I get the same error even if I make sure igraph is installed directly from R (Terminal) in the same R session:

chooseCRANmirror(graphics = FALSE)

I guess that’s another difference with Stata: StataConsole knows where the ado library is. I’d like to be able to use the Terminal mode for R as this would let me reach my Nirvana-like goal of having a single script that does everything without any proximate human intervention. So I’ll just ask: how do I get R (Terminal) to run as reliably as the R GUI? Is this a naive question?

Or would it be better to try to feed a “source” script from the command line? Much like how I can do this for Stata to launch a do-file into the Stata GUI:
exec /Applications/Stata/ ~/Documents/book/stata/
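
On the “ARGUMENT ... __ignored__” line in the session above: R does not treat a bare quoted string on the command line as code. An expression has to be passed with -e and a script file with -f (or run through Rscript). A minimal sketch of a wrapper follows. The names run_r_script and R_BIN are hypothetical, and the wrapper only fixes how the script is handed to R. It does nothing about the 32-bit/64-bit library mismatch itself.

```shell
# Hypothetical wrapper: run an R script non-interactively from the shell.
# R_BIN is an assumed variable naming the R launcher to use; it defaults to plain "R".
R_BIN="${R_BIN:-R}"

run_r_script() {
  # --quiet   suppresses the startup banner
  # --no-save skips saving the workspace on exit
  # -f FILE   tells R to read its input from FILE
  "$R_BIN" --quiet --no-save -f "$1"
}
```

Usage would be run_r_script ~/Documents/book/stata/testgraph.R. To try the 32-bit launcher mentioned in the update, set R_BIN=R32 first. That assumes R32 accepts the same flags as R, which this sketch does not verify.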

November 24, 2009 at 4:42 am 11 comments

I am shocked–shocked–to find scientists abusing peer review

| Gabriel |

A major climate lab in Britain was hacked (leaked?) last week and a lot of the material was really embarrassing. Stuff along the lines of obstruction of freedom of information requests, smoothing messy data, and using peer review and shunning to freeze out contradictory perspectives. From the WaPo write-up:

“I can’t see either of these papers being in the next IPCC report,” Jones writes. “Kevin and I will keep them out somehow — even if we have to redefine what the peer-review literature is!”

In another, Jones and Mann discuss how they can pressure an academic journal not to accept the work of climate skeptics with whom they disagree. “Perhaps we should encourage our colleagues in the climate research community to no longer submit to, or cite papers in, this journal,” Mann writes.

“I will be emailing the journal to tell them I’m having nothing more to do with it until they rid themselves of this troublesome editor,” Jones replies.

All I can say is:

Most people have been looking at this in terms of the science or politics of climate change, but I’m completely with Robin Hanson in thinking that those are non sequiturs and what’s really interesting about this is the (office) politics of science. I mean, is anyone who has ever been through peer review at all surprised to hear that peer reviewers can be malicious assholes willing to use power plays to effect closure against minority perspectives?

On the other hand, while I think this is an affront to decency, it doesn’t really give me severe problems as a matter of scientific epistemology. Sure, I’d rather that scientists lived up to J.S. Mill’s ideal of the marketplace of ideas, with an attitude of “let me hear you out and then if I’m still unconvinced I’ll give you my good faith rebuttal.” Nonetheless, I’m enough of a Quinean/Kuhnian to think that science isn’t about isolated findings but the big picture and the dominant perspective is probably still right, even if its adherents aren’t themselves exactly Popperians actively seeking out (and failing to find) evidence against their perspective.

November 23, 2009 at 4:39 pm

12 weeks of culture

| Gabriel |

Jenn posted her draft syllabus for grad soc of culture / cultural sociology. It looks like about the best survey of the literature you could get in 12 weeks and about a thousand pages of material. Aside from simply choosing good readings, she’s managed to organize them into weeks in a way that imposes a good sense of order on an often messy and amorphous set of issues. She also does a much better job than I do of covering all parts of the field. (My syllabus is unabashedly production-centric, as it’s part of a two-quarter sequence with a sister course taught by a colleague on meaning-centric approaches.) I highly recommend checking it out for any grad students prepping for a field exam or faculty prepping a course.

November 23, 2009 at 2:57 pm

A few stats pedagogy notes

| Gabriel |

I’ve found the OS X zoom feature to be very effective when teaching stats. Most of the time I have the projector at full resolution (so any given thing on it looks small), but when I want to show a piece of code or output I just mouse over to it and zoom (hold “control” and scroll-wheel or two-finger swipe up). This lets me keep my mac at the regular resolution and use Stata and TextMate in class instead of setting it to a lo-res and/or putting zoomed screenshots in Powerpoint or Keynote. This both has a more improvisatory feel and cuts down on the purely technical aspects of course prep.

Speaking of Powerpoint/Keynote, one of the problems with teaching code is that you lose syntax highlighting. However, you can keep it by copying from a text editor as RTF.

Finally, via Kai Arzheimer, I see the new site Teaching With Data, which includes both sample datasets and pedagogical materials.

November 17, 2009 at 4:47 am

Programming notes

| Gabriel |

I gave a lecture to my grad stats class today on Stata programming. Regular readers of the blog will have heard most of this before but I thought it would help to have it consolidated and let more recent readers catch up.

Here are my lecture notes. [Updated link 10/14/2010]

November 12, 2009 at 3:33 pm 2 comments

Science (esp. econ) made fun

| Gabriel |

In a review essay, Vromen talks about the (whodathunkit) popular book/magazine-column/blog genre of economics-made-fun that’s become a huge hit with the mass audience in the last 5 to 10 years. Although Vromen doesn’t mention it, this can be seen as a special case of the science-can-be-fun genre (e.g., Stephen Jay Gould’s short essays that use things like Hershey bars and Mickey Mouse to explain reasonably complex principles of evolutionary biology.)

Vromen makes a careful distinction from the older genre of economists-can-be-funny (currently exemplified by the stand-up economist), which is really a special case of the general genre of scientists doing elaborate satires of their own disciplines for the benefit of their peers. There is an entire journal of this, but my all-time favorite example is a satire of mid-20th century psychology in the form of a review of the literature on when people are willing to pass the salt at the dinner table. Two excerpts from the “references” section should suffice to convince you to click the link and read the whole thing.

  • Festinger, R. “Let’s Give Some Subjects $20 and Some Subjects $1 and See What Happens.” Journal for Predictions Contrary to Common Sense 10, 1956, pp. 1-20.
  • Milgram, R. “An Electrician’s Wiring Guide to Social Science Experiments.” Popular Mechanics 23, 1969, pp. 74-87.

If you don’t remember what Festinger and Milgram actually did in the 50s and 60s this won’t be funny, but if you do it’s hilarious. Hence, the scientists-can-be-funny genre is a self-deprecating genre for an audience of insiders that simultaneously demonstrates the joker’s mastery of the field and the field’s foibles. In contrast, the science-can-be-fun genre is targeted to a mass audience and is about demonstrating the elegance and power of the field. The former inspires humility among practitioners, the latter awe among the yokels.

One of the interesting things about the econ-made-fun literary genre is that it is largely orthogonal to any theoretical distinction within scholarly economics. The most prominent “econ made fun” practitioners span such theoretical areas as applied micro (Levitt), behavioral (Ariely), and Austrian (Cowen). In part because the “econ made fun” genre exploded at about the same time as the Kahneman Nobel and in part because “econ made fun” tends to focus on unusual substantive issues (i.e., anything but financial markets), this has led a lot of people to conflate “econ made fun” and behavioral econ. I’ve heard Steve Levitt referred to as a “behavioral economist” several times. This drives me crazy as at a theoretical level, behavioral economics is the opposite of applied micro, and in fact Levitt has done important work suggesting that behavioral econ may not generalize very well from the lab to the real world. That people (including people who ought to know better) nonetheless refer to him as a “behavioral economist” suggests to me that in the popular imagination literary genre is vastly more salient than theoretical content.

I myself occasionally do the “sociologists can be funny” genre (see here, here, and here) but these are basically elaborate deadpan in-jokes and I am under no illusions that anyone without a PhD would find them at all funny. I have no idea how to go about writing “sociology can be fun” (this is probably the closest I’ve come) along the lines of Levitt/Dubner or Harford, nor to be honest do I see any other sociologist doing it particularly well. There are plenty of sociologists who try to speak to a mass audience, but the tone tends to be professorial exposition or political exhortation rather than amusement at the surprising intricacy of social life. Fortunately Malcolm Gladwell has an intense and fairly serious interest in sociology and is very talented at making our field look fun.

November 10, 2009 at 4:40 am 1 comment
