Archive for May, 2009
Contour graphs in Stata
| Gabriel |
Stata has a lot of graphing capabilities but it can’t do contour maps (and can only do monochrome surface maps through an ado file). Contour maps are kind of like a scatterplot except that they have three dimensions, where the color-coding stands in for the z-axis. They’re really useful for graphically illustrating complex nonlinear functions (which is why you see them a lot in hard science journals and really hardcore techy network stuff). Usually x and y are some parameter and z is some metric that’s calculated by feeding x and y into a nonlinear equation or simulation. Surface maps are similar but they show the z-axis literally instead of through color-coding. Grusky the Younger has used these to show ginormous occupational reproduction xtabs (x is dad’s job, y is son’s job, z is frequency of that particular combo).
Anyway, these graphs are really useful for certain things but they don’t work in Stata (or Numbers or OpenOffice). On the other hand Excel, R, and Gnuplot have all been able to do this forever. (Gnumeric can do contour but not surface). This kind of sucks because I work in Stata and it’s a hassle to either a) export a table to Gnumeric or Excel or b) script Stata to push the graphing to R or Gnuplot. Since I need to do these graphs a lot for exploratory purposes I want to be able to do a quick and dirty draft directly within Stata. (I don’t mind using the other software for publication-quality stuff but as every quant knows you do the exploratory stuff a thousand times before you’re ready to set it as publication quality). There’s a pretty good ado file for surface graphs (just type “findit surface”) but I find color-coding much easier to read than 3-D, so I want contour graphs and I want them in Stata.
Anyway, I wrote this code to create draft contour graphs. The command syntax is just “crudecontour x y z” where x and y are the axes and z is the color-coding. The program automatically breaks z into quartiles and shows it color-coded from blue (low z) to red (high z).
Note that this little program revels in two distinct aspects of mediocrity. First, it expects the dataset to have exactly one cell for each combination of x and y. If there’s a missing cell it just plots it as white space (unlike the good packages, which will impute). Second, it produces graphs that look like a video game from 1979. On the other hand, it’s native (have you tried to install Gnuplot on a Mac?) and it took five minutes to code. It’s good enough for exploratory work but you definitely want to use something else for the final version. I still haven’t given up on scripting Stata to push it to R or Gnuplot (there’s a sketch of this after the code below) because I’d really rather batch this than do it in a GUI spreadsheet.
capture program drop crudecontour
program define crudecontour
	set more off
	local x `1'
	local y `2'
	local z `3' /*color-coding variable*/
	quietly sum `z', detail
	local z25=`r(p25)'
	local z50=`r(p50)'
	local z75=`r(p75)'
	twoway /*
	*/ (scatter `x' `y' if `z'<`z25', mcolor(blue) msize(huge) msymbol(square)) /*
	*/ (scatter `x' `y' if `z'>=`z25' & `z'<`z50', mcolor(green) msize(huge) msymbol(square)) /*
	*/ (scatter `x' `y' if `z'>=`z50' & `z'<`z75', mcolor(yellow) msize(huge) msymbol(square)) /*
	*/ (scatter `x' `y' if `z'>=`z75', mcolor(red) msize(huge) msymbol(square)) /*
	*/ , legend(order(1 "1Q `z'" 2 "2Q `z'" 3 "3Q `z'" 4 "4Q `z'"))
end
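To make the program’s expectations concrete, here’s a quick usage sketch. It builds a complete 21-by-21 grid (one cell for every x-y combination, as crudecontour expects), feeds x and y into a nonlinear function to get z, and draws the draft contour. The grid size, the function, and the file names are all made up for illustration, and the Rscript one-liner at the end (one way of scripting the push to R) assumes you have R on your path and a Unix-ish shell.

*build a complete grid: one observation for every x-y combination
clear
set obs 441
gen x = mod(_n-1, 21)/20 /*x runs 0, .05, ..., 1*/
gen y = floor((_n-1)/21)/20 /*y runs 0, .05, ..., 1*/
gen z = sin(3*x)*cos(3*y) /*some nonlinear function of x and y*/
crudecontour x y z

*one way to script the push to R for a less crude version
*the matrix() call relies on the column order (x,y,z) and the row order the grid was built in
outsheet x y z using grid.csv, comma replace
shell Rscript -e 'd <- read.csv("grid.csv"); png("contour.png"); filled.contour(matrix(d[,3], nrow=21)); dev.off()'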
Network externalities and Facebook
| Gabriel |
GNXP had an interesting post a few days ago that graphs per capita use of Facebook at the state level and shows that there’s a steep gradient the further you get from Harvard (patient zero). The post argues that this is all about network externalities. I pretty much agree but I have a few thoughts and caveats.
The method assumes that the (underlying) social network is basically a lattice. If there really is anything to the stereotype of coastal yuppies seeing everything between LA and the Boston-DC corridor as “flyover country,” then the underlying social network has a lot of random-graph elements and there’s no reason to assume that California should be more socially distant from Boston than Indiana is. On the other hand, Facebook isn’t necessarily the kind of thing that could spread by a random-graph element. If you adopt Facebook because a critical mass of your friends do, then it will spread much more rapidly through cliques and strong ties than through random-graph elements, because even if you’re exposed to it through the random graph that’s only one exposure. Something like this probably takes the multiple exposures you only really get with triadic closure, so random elements will be useless for this kind of diffusion (the toy simulation below illustrates the difference).
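Here’s the toy simulation, a Mata sketch where everything (the 100-node ring, the degree-4 lattice, the three-node seed clique, the ten shortcuts) is made up purely for illustration. It seeds a tight clique on a ring lattice, adds a few random shortcuts on top, and counts how many rounds it takes for adoption to stop spreading under an adopt-on-one-exposure rule versus an adopt-on-two-exposures rule. The shortcuts dramatically speed up the one-exposure contagion, but the two-exposure contagion can’t use them and has to creep along the closed-up neighborhoods of the lattice.

clear all
set seed 20090501
mata:
n = 100
A = J(n, n, 0)
/*ring lattice: tie each node to its two nearest neighbors on each side*/
for (i=1; i<=n; i++) {
	for (d=1; d<=2; d++) {
		j = mod(i-1+d, n) + 1
		A[i,j] = 1
		A[j,i] = 1
	}
}
/*the random-graph element: ten shortcuts added on top of the lattice*/
for (s=1; s<=10; s++) {
	i = ceil(runiform(1,1)*n)
	j = ceil(runiform(1,1)*n)
	if (i != j) {
		A[i,j] = 1
		A[j,i] = 1
	}
}
/*rounds until adoption stops spreading, given a threshold of t exposures*/
real scalar rounds(real matrix A, real scalar t)
{
	real colvector adopt, old
	real scalar it
	adopt = J(rows(A), 1, 0)
	adopt[1::3] = J(3, 1, 1) /*seed: three adjacent nodes*/
	it = 0
	do {
		old = adopt
		adopt = adopt :| (A*adopt :>= t)
		it++
	} while (adopt != old)
	return(it)
}
printf("1 exposure needed: stops spreading after %f rounds\n", rounds(A, 1))
printf("2 exposures needed: stops spreading after %f rounds\n", rounds(A, 2))
end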
Speaking of vapid “social media”:
Police Slog Through 40,000 Insipid Party Pics To Find Cause Of Dorm Fire
Data.gov
Perhaps on the model of foreign services like Statistics Canada, the government has launched a one-stop-shop data website called Data.gov. Although many social scientists were under the impression that such a thing already existed (and was called the Census Bureau), there’s a lot more data in it than that.
Directed graphs?
| Gabriel |
Last year I read Richard Saller’s book on patron-client networks in ancient Rome, Personal Patronage under the Early Empire. I found it fascinating because the book wasn’t explicitly informed by economic sociology, but every few pages I’d think, this is just like Podolny, or Gould, or Zelizer, or Granovetter! Anyway, it’s a very good book but the thing I’m thinking about right now is a methodological point, which I’ll get to in a minute.
We have a variety of sources that tell us about the institution of clientela. Most concretely, it was built into the very architecture, such that a villa would have benches by the front door where clients could wait to suck up to the boss. Not only do these benches survive at places like Pompeii, but we have poetry and satiric plays making fun of the people who sat on them.
The methodological point stems from Saller’s observation that in some sources the very idea of clientela seems to disappear. For instance one of the historians (Cassius Dio? I’m going from memory) rarely used terms implying a directed tie (clientela, patronus, or cliens). The interesting thing is that whenever he did use such words it was in a context involving a (shameful) status inversion of social class where a senator would become a client of a knight or freedman. (The implication was that by putting commoners in structural positions of power the principate had disrupted the natural order of things that was respected during the republic; the complaint is similar to Southern narratives complaining about black political power during Reconstruction.) But this is not to say that the historian seldom mentions networks. Rather the historian talks a lot about amicitia (friendship), but always to refer to networks that were either intra-class or had an appropriate hierarchy. Likewise, Pliny wrote many letters to Trajan asking for some favor for Pliny himself or one of his cronies, and Trajan’s reply always used the language of friendship.
What seems to be going on is that even in a society as hierarchical and status conscious as Rome, there was a level of discomfort with boldly asserting dominance, and so the superior party euphemistically describes the relationship as egalitarian. Pliny sucks up to Trajan, but Trajan maintains the face-saving pretense that Pliny is his equal. So we have a system of directed ties but they can only be perceived as such when viewed from below. When viewed (credulously) from above they appear to be symmetric ties. This is particularly a problem if you’re relying on the superior party for evidence, as classicists do if they rely on the written sources (which are heavily dominated by the senatorial class) rather than, say, archeological discoveries of elaborate tombstones raised by freed slaves extolling the patronage of their former masters.
Similar issues can come up in modern contexts of interest to sociologists. For instance, people tend to exaggerate the help they provide to others and minimize the help they receive, so you get very different estimates of care-work and other domestic exchange depending on whether you ask about incoming or outgoing transfers. In social network research this isn’t so much a problem for whole-network approaches, because you can often get information on a dyad from both parties, but it’s potentially a big problem for ego-centric networks.
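To make the whole-network point concrete, here’s a hedged Stata sketch. Everything in it is hypothetical for illustration: a file dyads.dta with one row per respondent-alter pair and two reports from each respondent, gave (hours of help ego says she gave alter) and received (hours ego says she received from alter). With both parties interviewed you can line up the giver’s and receiver’s accounts of the same directed tie and test whether outgoing help is systematically inflated.

*giver's side of the directed tie i -> j
use dyads, clear
keep ego alter gave
rename ego i
rename alter j
rename gave giver_report
sort i j
tempfile givers
save `givers'

*receiver's side of the same tie: j's report of what came in from i
use dyads, clear
keep ego alter received
rename ego j
rename alter i
rename received receiver_report
sort i j
merge i j using `givers'
drop _merge

*if people inflate outgoing help and minimize incoming help,
*giver_report will run systematically above receiver_report
gen discrepancy = giver_report - receiver_report
sum discrepancy, detail
ttest giver_report == receiver_report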
Are institutions cyclical, counter-cyclical, or non-cyclical?
| Gabriel |
Here’s a question for all you institutionalists — what do we expect to happen to institutions in a sustained economic downturn? For the sake of argument, let’s just assume that we can fairly clearly distinguish between rational and ritualistic economic behavior. That is, let’s go way back to Meyer and Rowan and assume that most of the kinds of things that institutionalists study don’t directly benefit the firm in any proximate sense but are a burnt offering to maintain the pax deorum with the legitimacy spirits. Furthermore, let’s assume that we can find a latent variable for ritual rather than just measure particular rituals (whose popularity may be epiphenomenal). In other words, let’s imagine that we can measure something like the population of the B-Ark or the Gross National Ritual (GNR). We can then ask, is GNR cyclical, counter-cyclical, or non-cyclical? I can think of an argument for each (and make a crude association with a flavor of institutionalism) but I’m sincerely curious what others (Pierre? Brayden? Kieran? anyone?) have to say.
Cyclical (Meyer and Rowan)
This argument goes that ritual is a superior good and follows pretty directly from Maslow’s hierarchy of needs. To transpose it to the corporate environment it would be that first you worry about at least breaking even and then you worry about whether you’re contributing to social value. Some arguments go that CSR is not about management leaving money on the table to do right, but is actually efficient in a “doing well by doing good” kind of way. OK, let’s unpack these arguments, most of which are about pleasing customers and pleasing employees. I think it’s fair to say that however true it may be that customers and employees value companies that have gone carbon neutral and scrupulously ensure that their suppliers use sustainable practices, they care about all this a lot less when they are terrified about the economy. Consider that right now Whole Foods is trading at 27% of its January 2006 price whereas the equivalent figure for WalMart is 109%.
Counter-cyclical (Pfeffer and Salancik)
The Chevy Volt’s total development budget is about a billion dollars, and even when production starts in a few years each car will lose several thousand dollars. That is, it’s like the entrepreneur in Chelm who loses money on each transaction but figures he’ll make it up on volume. Nonetheless, if I were (God forbid) an executive at GM right now, the last program I would cut would be the Chevy Volt. The reason is that right now GM is so far in the red that laying off a couple of guys who design batteries won’t get them close enough to solvency to matter. What it will do is piss off the state, which is currently shoveling money at GM in a desperate bid to keep it (and its suppliers) out of liquidation. The interesting thing is that GM knows the Volt would never be profitable even if successful and basically commissioned it as a symbolic gesture.
Now the issue isn’t just GM. We’ve seen similar (and much more directly coercive) action with the TARP banks, where the state has forced a lot of symbolically resonant moves of dubious efficacy like the AIG clawbacks. The fact is that the state has been in a very Keynesian mood lately, but it is a very symbolically attuned Keynesian state.
Note that my cyclical and counter-cyclical arguments are basically about industry. So one way to harmonize them is to expect that you might see a lot less contribution to GNR coming out of firms in retail and a lot more contribution to GNR coming out of firms that are about to become much more sensitive to the state in FIRE, heavy manufacturing, construction, and energy.
Non-cyclical (Dobbin)
I see the best argument for GNR being non-cyclical as basically being that companies are too paralyzed by internal stakeholders to respond to the cycle. So for instance a lot of Dobbin’s work has shown that affirmative action started in response to aggressive state civil rights enforcement after Griggs v. Duke Power, but it continued even under Reagan because by that point firms had created entrenched internal stakeholders to argue for the policy even after the state stopped caring. Thus in this scenario you could say that firms just do institutions whether or not they really need to in order to please the state, their customers, or their employees (the ones outside of HR and legal). Likewise, you could say that firms are so boundedly rational that they don’t know which of their behaviors are ritual and which are technical. I believe a weak version of this argument but am skeptical of a strong version. Nonetheless, even if you accept it, this is where population ecology comes into play: if firms can’t adapt to the cycle, selection can still do the adapting for them. However, if a downturn is short enough, and ritual is a small enough portion of expenses, then it’s likely that a selection model would let highly ritualistic firms ride out the increased selective pressure for efficiency.
Sub-prime marriage
One of the things that’s been making the rounds is the story of the NY Times financial writer who is losing his house (and of course, writing a book about it which was just excerpted as an article in the NY Times magazine). The interesting thing is that what at first looks like a case study in the culture of debt (what idiot/shyster gave him a mortgage?) is really all about divorce.
Collaborative code
| Gabriel |
A friend recently told me about a collaborative text editor. I’m perfectly happy having an entirely local text editor because I tend to do most of my coding by myself (I’m a loner, a rebel). Although I co-author a lot, there tends to be a sufficiently clear division of labor that I can just send data files and output to my co-authors and vice versa. For instance on one project I did all the cleaning and my co-author did the analysis. Nonetheless, not everyone works like this so I figured I’d pass it along.
The program my friend told me about is Etherpad. This is a totally cloud-based solution and is very quick to set up. Unfortunately it’s really bare bones; for instance it highlights by author (which is good) but doesn’t highlight syntax for anything but Java.
There are also local clients with remote sync. A popular solution for collaborative coding on the Mac is SubEthaEdit. On the plus side, it has Stata syntax highlighting. On the downside, both authors need to have Macs and buy the software (30 euros).
A cross-platform, free, and open-source solution is Gobby. Although there is no Stata syntax file, it uses a well-documented highlighting standard, so it should be feasible to write one. In principle Gobby works on the Mac, but there’s no binary, so good luck getting it to compile. If you’re a Mac person who can’t get Fink to work, my suggestion is to use the Linux or Windows version through virtualization.
Virtualization
| Gabriel |
Sometimes you want to use a tool that’s not available for your operating system. For instance, I use a Mac but I sometimes want to use Windows software (eg Pajek) or Unix software (eg Dia). Likewise, Windows users might envy the system tools provided by POSIX systems. Since POSIX has a lot of very powerful text-processing tools I think this should be particularly appealing to culture quants. My own basic solution is to use API emulation for Windows and a full-blown virtual machine for Unix applications.
A lot of people use dual-boot solutions for this, but I’m not so fond of them. The way this works is that when you turn on the computer you choose which OS you want to use. For instance OS X Leopard includes Boot Camp, and programs like GRUB and Wubi let you choose between Windows and Linux. Once you’re in the environment you can only use the software native to that environment, and you sometimes have only limited access to the file system of the other environment. The upside is that once you get them working they make minimal demands on system resources. There are really two problems with this approach. One is that you have to reboot to switch between, say, Windows apps and Mac apps. The other is that some of these solutions are a little dangerous, as most of them involve things like repartitioning the drive. This is particularly the case with Macs, which have a weird firmware and partition table, so dual-boot solutions other than Boot Camp don’t work very smoothly with them. I’ve tried to get my Mac running as dual boot with Linux twice and both times I ended up having to reinstall OS X and restore from Time Machine. (On the other hand I’ve had no problem installing dual boots on Wintel machines, including the old clunker I used to write my dissertation, which is now mostly running Xubuntu because it’s faster than XP.)
Since OS X already has the full panoply of POSIX tools and can run any UNIX software, at first glance it doesn’t make sense that I’d want to run Linux on my Mac. The problem is that while in theory OS X should be able to run any UNIX software, this usually only works if it’s pre-compiled, and most of it is not. I’ve had a lot of trouble getting Fink to work properly; it seems like it’s always missing some package or compiler and won’t compile the application. As such I find it easier just to keep a copy of Ubuntu so I can use the native package manager, which never ever gives me any hassle.

Basically, I’m so frustrated with Fink that I find it much easier to just use VirtualBox to run an Xubuntu virtual machine. VirtualBox is a free virtual machine manager that can run just about anything from just about anything. The main reason I use virtualization instead of dual boot is that it’s impossible to damage your main OS by installing a virtual machine. Even if you can’t get it to work, the worst-case scenario is that you wasted your time, unlike a dual boot gone wrong, where you may have to start thinking about how good your most recent backup is and whether you still have all your installation discs.

Of course the main downside to virtualization is that you split the system resources. The first way to handle this is to avoid a bloated guest OS. You can get really small with DSL or Puppy, but the best ratio of user-friendly to compact is probably Xubuntu. Likewise, if you’re installing a Windows guest OS you’d rather use XP than Vista. The second way to handle it is to buy more RAM. This is cheaper than you think because OEMs in general, but especially Apple, use configuration as a form of price discrimination. Apple charges $100 to upgrade a new computer from 2GB to 4GB, but you can get 4GB of RAM on Amazon for $50, and it takes about five minutes to install if you have a jeweler’s-size Phillips-head screwdriver. (Their hard drives are even more overpriced.)
For Windows software I don’t keep a virtual machine, in part because I don’t want to buy a Windows license and in part because I worry about the performance hit of running Windows as a VM. Instead I use Crossover, a proprietary build of Wine with better tech support. Crossover/Wine is a Windows API emulator, which is basically a minimalist virtual machine. It both runs much faster than a full-blown emulation and doesn’t require a license for the guest OS. On the other hand it can be slightly more buggy for some things, but in my experience Crossover works great with my old Microsoft Office 2003 for Windows license as well as Pajek.
Herd immunity, again
| Gabriel |
Recently I talked about herd immunity in computer viruses. Yesterday Slashdot linked to an article on a potential vaccine that kills mosquitoes after they’ve bitten you; that is, it has a herd-immunity effect but no individual benefit at all. Although the article doesn’t mention it, traditional residential DDT spraying works exactly the same way. (After the mosquito bites you she rests on your wall, takes in DDT, and dies.)
It’s interesting to think about whether people will adopt these sorts of vaccines, since the discrepancy between marginal and average benefit is even greater than with, say, the measles vaccine. In an article on DDT, Gladwell noted that dictatorships tended to be more effective at DDT campaigns than democracies, but I’d like to imagine that carrots would work as well as sticks to encourage people to contribute to the public good of herd immunity. Of course sociology has a lot to say about the best ways to get people to contribute to the health of strangers.
Remove the double-spacing from estout
| Gabriel |
As mentioned before, I love estout. However I dislike some of its features, such as that it leaves a blank line between rows. I wrote this code to go at the end of a do-file, where it cleans all my tables and removes the gratuitous blank lines. You could also use it to accomplish other cleaning tasks or to clean other text-based files, including log files.
*each of several banks of regression commands throughout the code end with some variant on:
esttab using table_k.txt
shell cp table_k.txt $tabledir
*note, i prefer "shell cp" over the Stata command "copy" because "cp" assumes the target filename

*at the end of the do-files clean the tables
*this part merely seeds the loop and is adapted from my "do it to everything in the directory" post
cd $parentpath
shell touch tmpfile
shell mv tmpfile filelist_text.txt
cd $tabledir
shell ls *.txt >"$parentpath/filelist_text.txt"
shell perl -pe 's/\n/ /g' "$parentpath/filelist_text.txt" > tmp
shell mv tmp "$parentpath/filelist_text.txt"
capture file close myfile
file open myfile using "$parentpath/filelist_text.txt", read
file read myfile line
global filelist `line'

*this is where the actual cleaning occurs
foreach file in $filelist {
	shell perl -ne 'print unless /^$/' `file' > tmp
	copy tmp `file', replace
	erase tmp
}
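For a single table you can also skip perl entirely and use Stata’s own filefilter command, which does find-and-replace on a text file. A minimal sketch, assuming the table was written with Unix line endings (use \r\n\r\n and \r\n instead for Windows-written files) and with a made-up name for the cleaned file:

*collapse the doubled newlines estout leaves into single ones
filefilter table_k.txt table_k_clean.txt, from("\n\n") to("\n") replace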