Archive for May, 2009

Contour graphs in Stata

| Gabriel |

Stata has a lot of graphing capabilities but it can’t do contour maps (and can only do monochrome surface maps through an ado file). Contour maps are kind of like a scatterplot except that they have three dimensions, where the color-coding stands in for the z-axis. They’re really useful for graphically illustrating complex nonlinear functions (which is why you see them a lot in hard science journals and really hardcore techy network stuff). Usually x and y are some parameter and z is some metric that’s calculated by feeding x and y into a nonlinear equation or simulation. Surface maps are similar but they show the z-axis literally instead of through color-coding. Grusky the Younger has used these to show ginormous occupational reproduction xtabs (x is dad’s job, y is son’s job, z is frequency of that particular combo).

Anyway, these graphs are really useful for certain things but they don’t work in Stata (or Numbers or OpenOffice). On the other hand Excel, R, and Gnuplot have all been able to do this forever. (Gnumeric can do contour but not surface.) This kind of sucks because I work in Stata and it’s a hassle to a) export a table to Gnumeric or Excel or b) script Stata to push the graphing to R or Gnuplot. Since I need to do these graphs a lot for exploratory purposes I want to be able to do a quick and dirty draft directly within Stata. (I don’t mind using the other software for publication quality stuff but as every quant knows you do the exploratory stuff a thousand times before you’re ready to set it as publication quality.) There’s a pretty good ado file for surface graphs (just type “findit surface”) but I find color-coding much easier to read than 3-D so I want contour graphs and I want them in Stata.

Anyway, I wrote this code to create draft contour graphs. The command syntax is just “crudecontour x y z” where x and y are the axes and z is the color-coding. The program automatically breaks z into quartiles and shows it color-coded from blue (low z) to red (high z).

Note that this little program revels in two distinct aspects of mediocrity. First, it expects the dataset to have exactly one cell for each combination of x and y. If there’s a missing cell it just plots it as white space (unlike the good packages which will impute). Second, it produces graphs that look like a video game from 1979. On the other hand, it’s native (have you tried to install Gnuplot on a Mac?) and it took five minutes to code. It’s good enough for exploratory work but you definitely want to use something else for the final version. I still haven’t given up on scripting Stata to push it to R or Gnuplot because I’d really rather batch this than do it in a GUI spreadsheet.
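The first limitation is easy to check before plotting. Here is a minimal sketch, assuming the two axis variables are literally named x and y (placeholder names, swap in your own). It uses isid to test that each x-y combination appears at most once and fillin to count the cells that would show up as white space.

```stata
* Placeholder variable names: x and y are the two axis variables

* isid exits with an error if (x, y) does not uniquely identify rows
capture isid x y
if _rc != 0 {
	display as error "some x-y combinations appear more than once"
}

* fillin adds a row for every x-y combination that is absent
* and flags the added rows with _fillin==1
preserve
fillin x y
quietly count if _fillin
display "empty cells in the grid: " r(N)
restore
```

The preserve/restore pair means the check leaves the data in memory untouched.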

capture program drop crudecontour
program define crudecontour
	* syntax: crudecontour x y z
	set more off
	local x `1'
	local y `2'
	local z `3' /*color-coding variable*/
	quietly sum `z', detail
	local z25=`r(p25)'
	local z50=`r(p50)'
	local z75=`r(p75)'
	* scatter takes the vertical-axis variable first, so `y' comes before `x'
	* missing z sorts above every number, so the top quartile excludes it explicitly
	twoway /*
	*/ (scatter `y' `x' if `z'<`z25', mcolor(blue) msize(huge) msymbol(square)) /*
	*/ (scatter `y' `x' if `z'>=`z25' & `z'<`z50', mcolor(green) msize(huge) msymbol(square)) /*
	*/ (scatter `y' `x' if `z'>=`z50' & `z'<`z75', mcolor(yellow) msize(huge) msymbol(square)) /*
	*/ (scatter `y' `x' if `z'>=`z75' & !missing(`z'), mcolor(red) msize(huge) msymbol(square)) /*
	*/ , legend(order(1 "1Q `z'" 2 "2Q `z'" 3 "3Q `z'" 4 "4Q `z'"))
end
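To try the command out, here is a usage sketch on made-up data. The 20-by-20 grid and the function used for z are invented purely for illustration; the only requirement is one row per combination of x and y.

```stata
* Illustrative data only: a 20 x 20 grid with an arbitrary nonlinear z
clear
set obs 400
gen x = mod(_n - 1, 20) + 1          // cycles 1..20
gen y = floor((_n - 1) / 20) + 1     // steps 1..20, once per block of 20 rows
gen z = sin(x / 3) * cos(y / 3)      // hypothetical function of x and y

* Run this after the program definition above has been executed
crudecontour x y z
```

With real output you would typically have the x and y parameters and the computed z already in memory, so only the last line is needed.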

May 29, 2009 at 3:49 am 3 comments

Network externalities and Facebook

| Gabriel |

GNXP had an interesting post a few days ago graphing per capita use of Facebook at the state level and showing that there’s a steep gradient the further you get from Harvard (patient zero). The post argues that this is all about network externalities. I pretty much agree but I have a few thoughts and caveats.

The method assumes that the (underlying) social network is basically a lattice. If there really is anything to the stereotype of coastal yuppies seeing everything between LA and the Boston-DC corridor as “flyover country” then the underlying social network has a lot of random graph elements and there’s no reason to assume that California should be more socially distant from Boston than Indiana. On the other hand, Facebook isn’t necessarily the kind of thing that could spread by a random graph element. If you adopt Facebook because a critical mass of your friends do then it will spread much more rapidly through cliques and strong ties than through random graph elements because even if you’re exposed to it through the random graph that’s only one exposure. Something like this probably takes the multiple exposures you only really get with triadic closure so random elements will be useless for this kind of diffusion.

Speaking of vapid “social media”:

Police Slog Through 40,000 Insipid Party Pics To Find Cause Of Dorm Fire

May 26, 2009 at 5:38 am

Data.gov

Perhaps on the model of foreign services like Statistics Canada, the government has launched a one-stop-shop data website called Data.gov. Although many social scientists were under the impression that such a thing already existed (and was called the Census Bureau) there’s a lot more data in it than that.

May 22, 2009 at 5:30 am

Directed graphs?

| Gabriel |

Last year I read Richard Saller’s book on patron-client networks in ancient Rome, Personal Patronage under the Early Empire. I found it fascinating because the book wasn’t explicitly informed by economic sociology, but every few pages I’d think, this is just like Podolny, or Gould, or Zelizer, or Granovetter! Anyway, it’s a very good book but the thing I’m thinking about right now is a methodological point, which I’ll get to in a minute.

We have a variety of sources that tell us about the institution of clientela. Most concretely, it was built into the very architecture, such that a villa would have benches by the front door where clients could wait to suck up to the boss. Not only do these benches survive at places like Pompeii, but we have poetry and satiric plays making fun of the people who sat on them.

The methodological point stems from Saller’s observation that in some sources the very idea of clientela seems to disappear. For instance one of the historians (Cassius Dio? I’m going from memory) rarely used terms implying a directed tie (clientela, patronus, or cliens). The interesting thing is that whenever he did use such words it was in a context involving a (shameful) status inversion of social class where a senator would become a client of a knight or freedman. (The implication was that by putting commoners in structural positions of power the principate had disrupted the natural order of things that was respected during the republic; the complaint is similar to Southern narratives that complain about black political power during Reconstruction.) But this is not to say that the historian seldom mentions networks. Rather the historian talks a lot about amicitia (friendship), but always to refer to networks that were either intra-class or with an appropriate hierarchy. Likewise, Pliny wrote many letters to Trajan asking for some favor for Pliny himself or one of his cronies, and Trajan’s reply always used language of friendship.

What seems to be going on is that even in a society as hierarchical and status-conscious as Rome, there was a level of discomfort with boldly asserting dominance and so the superior party euphemistically describes the relationship as egalitarian. Pliny sucks up to Trajan, but Trajan maintains the face-saving pretense that Pliny is his equal. So we have a system of directed ties but they can only be perceived as such when viewed from below. When viewed (credulously) from above they appear to be symmetric ties. This is particularly a problem if you’re relying on the superior party for evidence, as classicists do if they rely on the written sources (which are heavily dominated by the senatorial class) rather than, say, archeological discoveries of elaborate tombstones raised by freed slaves extolling the patronage of their former masters.

Similar issues can come up in modern contexts of interest to sociologists. For instance people tend to exaggerate the help they provide to others and minimize the help they receive so you get very different estimates of care-work and other domestic exchange if you ask about incoming versus outgoing transfers. Likewise when you’re doing social network research this isn’t so much a problem for whole network approaches because you can often get information on a dyad from both parties, but it’s potentially a big problem for ego-centric networks.

May 21, 2009 at 5:07 am 1 comment

Are institutions cyclical, counter-cyclical, or non-cyclical?

| Gabriel |

Here’s a question for all you institutionalists — what do we expect to happen to institutions in a sustained economic downturn? For the sake of argument, let’s just assume that we can fairly clearly distinguish between rational and ritualistic economic behavior. That is, let’s go way back to Meyer and Rowan and assume that most of the kinds of things that institutionalists study don’t directly benefit the firm in any proximate sense but are a burnt offering to maintain the pax deorum with the legitimacy spirits. Furthermore, let’s assume that we can find a latent variable for ritual rather than just measure particular rituals (whose popularity may be epiphenomenal). In other words, let’s imagine that we can measure something like the population of B-Ark or the Gross National Ritual (GNR). We can then ask, is GNR cyclical, counter-cyclical, or non-cyclical? I can think of an argument for each (and make a crude association with a flavor of institutionalism) but I’m sincerely curious what others (Pierre? Brayden? Kieran? anyone?) have to say.

Cyclical (Meyer and Rowan)

This argument goes that ritual is a superior good and follows pretty directly from Maslow’s hierarchy of needs. To transpose it to the corporate environment it would be that first you worry about at least breaking even and then you worry about whether you’re contributing to social value. Some arguments go that CSR is not about management leaving money on the table to do right, but is actually efficient in a “doing well by doing good” kind of way. OK, let’s unpack these arguments, most of which are about pleasing customers and pleasing employees. I think it’s fair to say that however true it may be that customers and employees value companies that have gone carbon neutral and scrupulously ensure that their suppliers use sustainable practices, they care about all this a lot less when they are terrified about the economy. Consider that right now Whole Foods is trading at 27% of its January 2006 price whereas the equivalent figure for WalMart is 109%.

Counter-cyclical (Pfeffer and Salancik)

The Chevy Volt’s total development budget is about a billion dollars and even when they start producing it in a few years each car will lose several thousand dollars. That is, it’s like the entrepreneur in Chelm who loses money on each transaction but figures he’ll make it up on volume. Nonetheless, if I were (God forbid) an executive at GM right now the last program I would cut would be the Chevy Volt. The reason is that right now GM is so far in the red that laying off a couple of guys who design batteries won’t get them close enough to solvency to matter. What it will do is piss off the state, which is currently shoveling money at GM in a desperate bid to keep it (and its suppliers) out of liquidation. The interesting thing is that GM knows that the Volt would never be profitable even if successful and basically commissioned it as a symbolic gesture.

Now the issue isn’t just GM. We’ve seen similar (and much more directly coercive) action with the TARP banks where the state has forced a lot of symbolically resonant moves of dubious efficacy like the AIG clawbacks. The fact is that the state has been very Keynesian lately, but a very symbolically attuned Keynesian state.

Note that my cyclical and counter-cyclical arguments are basically about industry. So one way to harmonize them is to expect that you might see a lot less contribution to GNR coming out of firms in retail and a lot more contribution to GNR coming out of firms that are about to become much more sensitive to the state in FIRE, heavy manufacturing, construction, and energy.

Non-cyclical (Dobbin)

I see the best argument for GNR being non-cyclical as basically being that companies are too paralyzed by internal stakeholders to respond to the cycle. So for instance a lot of Dobbin’s work has shown that affirmative action started in response to aggressive state civil rights enforcement after Griggs v. Duke Power but it continued even under Reagan because by that point firms had created entrenched internal stakeholders to argue for the policy even after the state stopped caring. Thus in this scenario you could say that firms just do institutions whether or not they really need to in order to please the state, their customers, or their employees (the ones outside of HR and legal). Likewise, you could say that firms are so boundedly rational that they don’t know which of their behaviors are ritual and which are technical. I believe a weak version of this argument but am skeptical of a strong version. Nonetheless, even if you accept it, this is where population ecology comes into play. However, if a downturn is short enough, and ritual is a small enough portion of expenses, then it’s likely that a selection model would let highly ritualistic firms ride out the increased selective pressure for efficiency.

May 20, 2009 at 1:50 pm

Sub-prime marriage

One of the things that’s been making the rounds is the story of the NY Times financial writer who is losing his house (and of course, writing a book about it which was just excerpted as an article in the NY Times magazine). The interesting thing is that what at first looks like a case study in the culture of debt (what idiot/shyster gave him a mortgage?) is really all about divorce.

Basically what happened is this guy and his new wife (until that point a housewife) divorced their respective spouses and married each other. His ex got roughly 2/3 of his (fairly lavish) income as alimony and child support. His new wife had been out of the labor market so long that she was only minimally employable, especially because she was mostly interested in taking a prestigious and intrinsically rewarding knowledge worker type job of the type that a) people tend to self-subsidize and b) she was no longer competitive for. Despite a massive income shock they were both accustomed to their old bourgeois lifestyles and ended up in the red by over $1,000 a month.

I find this story to be interesting in part because we’ve suspected since Moynihan (and known for an absolute fact since McLanahan and Sandefur) that many of the problems of the lower class are related to the breakdown of the bourgeois family (especially the increasing number of households with only one adult), but this story illustrates how it’s no picnic for upper middle class professional types either. I should also note that much of the commentary is flat wrong. For instance if you read the comment thread on this post, most of it is misogyny directed at both women in this story (i.e., the ex is bleeding this guy dry and the second wife had no right to have remained a housewife in her first marriage as soon as her kids were in kindergarten) to the effect that marriage is biased against men. In fact all the research shows that men benefit more from marriage than women, mostly because we start acting responsibly after we get married. After getting married men have more stable employment histories and are much less likely to use cocaine or abuse alcohol. As far as can be determined (through longitudinal studies) the association is mostly causal.

May 19, 2009 at 9:35 am 3 comments

Collaborative code

| Gabriel |

A friend recently told me about a collaborative text editor. I’m perfectly happy having an entirely local text editor because I tend to do most of my coding by myself (I’m a loner, a rebel). Although I co-author a lot, there tends to be a sufficiently clear division of labor that I can just send data files and output to my co-authors and vice versa. For instance on one project I did all the cleaning and my co-author did the analysis. Nonetheless, not everyone works like this so I figured I’d pass it along.

The program my friend told me about is Etherpad. This is a totally cloud solution and is very quick to set up. Unfortunately it’s really bare bones, for instance it highlights by author (which is good) but doesn’t highlight syntax for anything but Java.

There are also local clients with remote sync. A popular solution for collaborative coding on the Mac is SubEthaEdit. On the plus side there is Stata syntax highlighting. On the downside both authors need to have Macs and buy the software (30 euros).

A cross-platform, free, and open-source solution is Gobby. Although there is no Stata syntax file it uses a well documented highlighting standard so it should be feasible to write one. In principle Gobby works on the Mac but there’s no binary so good luck getting it to compile. If you’re a Mac person who can’t get Fink to work my suggestion is to use the Linux or Windows version through virtualization.

May 19, 2009 at 5:30 am
