Archive for May, 2011
- Lisa sends along this set of instructions for doing a wide-long reshape in R. It’s useful and I’m passing it along for the benefit of R users, but the relative intuition and simplicity of “reshape wide stub, i(i) j(j)” is why I still do my mise en place in Stata whenever I use R. Ideally though, as my grad student Brooks likes to remind me, we really should be doing this kind of data mise en place in a dedicated database and using the Stata and R ODBC commands/functions to read it in.
- “The days change at night, change in an instant.”
- Anyone interested in replicating this paper should be paying close attention to this pending natural experiment. In particular I hope the administrators of this survey are smart enough to oversample California in the next wave. I’d consider doing the replication myself but I’m too busy installing a new set of deadbolts and adopting a dog from a pit bull rescue center.
- In Vermont, a state government push to get 100% broadband penetration is using horses to wire remote areas that are off the supply curve, er, beaten path. I see this as a nice illustration both of cluster economies and of the different logics used by markets (market clearing price) and states (fairness, which often cashes out as universal access) in the provision of resources. (h/t Slashdot)
- Yglesias discusses some poll results showing that voters in most of the states that recently elected Republican governors now would have elected the Democrats. There are no poll results for California, the only state that switched to the Democrats last November. Repeat after me: REGRESSION TO THE MEAN. I don’t doubt that some of this is substantive backlash to overreach on the part of politically ignorant swing voters who didn’t really understand the GOP platform, but really, you’ve still got to keep in mind REGRESSION TO THE MEAN.
- Speaking of Yglesias, the ThinkProgress redesign only allows commenting from Facebook users, which is both a pain for those of us who don’t wish to bear the awesome responsibility of adjudicating friend requests and a nice illustration of how network externalities can become coercive as you reach the right side of the s-curve.
| Gabriel |
Compare and contrast:
Positive comments, demonstrations of attention, or expressions of interest reflect approval, thereby influencing opinion, if everyone knows that they are not made lightly; and they will not be made lightly if those making them understand them as forms of deference. It is painful to pay attention to another person if the favor is not repaid.
The displeasure of offering unreciprocated gestures of approval keeps such gestures within limits, in turn limiting their impact on other people’s attributions, and so forth. Runaway status hierarchies are thus unlikely to the degree that people are reluctant to make gestures of approval without having the favor returned, at least in part. (Hence the pressure on media celebrities to feign affection for their fans).
There is the personal level. I used to call my dear brother [Obama] every two weeks. I said a prayer on the phone for him, especially before a debate. And I never got a call back. And when I ran into him in the state Capitol in South Carolina when I was down there campaigning for him he was very kind. The first thing he told me was, ‘Brother West, I feel so bad. I haven’t called you back. You been calling me so much. You been giving me so much love, so much support and what have you.’ And I said, ‘I know you’re busy.’ But then a month and half later I would run into other people on the campaign and he’s calling them all the time. I said, wow, this is kind of strange. He doesn’t have time, even two seconds, to say thank you or I’m glad you’re pulling for me and praying for me, but he’s calling these other people. I said, this is very interesting. And then as it turns out with the inauguration I couldn’t get a ticket with my mother and my brother. I said this is very strange. We drive into the hotel and the guy who picks up my bags from the hotel has a ticket to the inauguration. My mom says, ‘That’s something that this dear brother can get a ticket and you can’t get one, honey, all the work you did for him from Iowa.’ Beginning in Iowa to Ohio. We had to watch the thing in the hotel.
What it said to me on a personal level, was that brother Barack Obama had no sense of gratitude, no sense of loyalty, no sense of even courtesy, [no] sense of decency, just to say thank you. Is this the kind of manipulative, Machiavellian orientation we ought to get used to? That was on a personal level.
| Gabriel |
I recently noted that graph exporting to PDF in Stata for Mac is fixed. Turns out that this is only partially true. It works and creates beautiful output, but unlike the other “graph export” options it only works if you have “set graphics on” in the Stata GUI. If you’re running it as Stata console or have graphics set off in Stata GUI, it simply doesn’t work. (I do this when batching a lot of graphs as it is faster and less distracting).
My understanding is that this has something to do with how Stata relies on Mac’s Quartz driver to render PDF so it’s not really feasible to fix. So basically you have three options:
1) Do it in the GUI with “set graphics on” and accept the CPU performance hit and distraction of all the graphs rendering.
2) Use my graphexportpdf ado file or the “graph print” command with CUPS-PDF as the print driver.
3) Stick to using EPS
| Gabriel |
Lyx 2.0 is now in official release. I’ve been using it in beta for about six months and I find that it’s a big improvement. The thing that initially attracted me to it is the better spell checker integration in OS X. (In 1.6 it was so bad I’d run an Ubuntu VM just to get the spell checker to work). After a few months of regular usage, I can say that the biggest advantage to me is the document navigation sidebar (activated by the toolbar’s speedometer icon or “Navigate/ List of Figures/ Open Navigator”), which lets you jump by TOC heading, figure, equation, footnote, or citation. This is of great advantage in a long complex document, like a book.
I highly recommend Lyx 2.0 to people who already use Lyx 1.6 or who are interested in LaTeX but are put off by having to learn a new markup language. However Lyx/LaTeX has a lot of network externalities associated with it so think twice if you belong to a discipline (like sociology) where editors/collaborators expect MS Word files and it’s hard to find “.cls” and “.bst” files for your journals’ house style.
[Update: I forgot to mention that the new version has a new version of “diff” that in practice behaves like the Word “track changes” feature. It’s pretty elegant and should work well for collaborations.]
| Gabriel |
The current issue of Sunset has a full-page pictorial of a concept for a dining, er, foodie car in an LA-SF bullet train. Let’s put aside the fact that the route doesn’t make any sense. (Current plans are to build a short proof-of-concept track in a very rural part of the San Joaquin valley, later to be extended all the way from Fresno to Bakersfield. Getting the rest of the way from Bakersfield to LA requires going under or around the Tehachapi mountain range. That or transferring to a three-hour bus ride on the 5).
Instead, let’s think about the dining car itself. The pictorial shows a dwarf citrus tree in the car for passengers to pick fruit either to eat out of hand or for juicing. (As the owner of an orange tree, I can tell you that the pictured dwarf tree would make about two carafes of orange juice). Similarly, there is a “Self-Harvest Salad Bar. Snip and dress your own organic greens from a hydroponic vertical garden and choice of on-tap vinaigrettes.” Because, you know, there is no more efficient way to grow lettuce and oranges than to put a farm on a rail car and rocket it up and down the state at 150 mph. Similarly, it shows solar panels on the roof to, I kid you not, power the espresso machine and “grow lights” for the aforementioned dwarf citrus tree. I’m not an engineer, but I guarantee you that the extra weight and/or drag implied by the solar panels (and changes to the fuselage necessary to accommodate them) would imply fuel costs at least an order of magnitude greater than the power generated by the panels. I mean, who comes up with this shit?
I guess I should be grateful they didn’t describe plans for an aquaculture car or a composting car. It’s one thing to talk about putting hydroponics in a moon base, but that’s because the transportation costs for getting food to the moon are millions of dollars per pound. Transportation costs in getting food onto a train means sending a van from the train station to the supermarket. That or you could just not eat for the three hours you’re on the train and have dinner when you get to San Francisco, which I’ve heard has restaurants in it.
As I fumed about this, I realized that this isn’t just a really stupid idea for a train’s dining car, but a reductio ad absurdum of the whole idea of locavorism. Just as it is much cheaper and ecologically sound to grow food on a farm and have it loaded onto a train at the station, or to generate power at a power plant and transmit it to a train via overhead lines than to produce the food and power on the train itself, to a lesser extent it is more efficient to grow food in a rural farm and truck it into a city (accepting the trivial carbon emissions implied by a few “food-miles”) than it is to devote extraordinarily valuable urban or suburban land to agriculture, thereby increasing the commute times (and by extension, the carbon emissions) of people who might have made denser commercial or residential use of that land.
Gains from specialization and trade, people, gains from specialization and trade.
| Gabriel |
A couple years ago UCLA’s pop center migrated our statistical computing from our own server to the university’s Hoffman2 cluster. When this happened I tried out the cluster and hated the recommended “Grid” browser-based GUI, with the single biggest aggravation being that it requires you to transfer files one at a time through a clunky upload/download wizard. As such, I paid for my own Stata MP license (which even as part of a lab volume purchase wasn’t cheap) and since the migration I’ve just done all my statistics locally on my MacBook.
I’ve recently given Hoffman2 another try and realized that I can just ignore “Grid” and do my regular workflow when dealing with a server:
- write code with a good local text editor (preferably one that is SFTP compatible)
- sync scripts, data, and output between the local and remote file systems with an SFTP client
- batch jobs on the server through SSH
- (as a last resort) run GUI apps through X11
Pretty plain vanilla stuff but it’s actually much simpler in practice than a (broken) browser-based GUI.
Now that I’ve gotten this worked out I’m a big fan of Hoffman2 for big jobs because it’s extremely fast. For instance, a simulation that takes Stata MP about seven hours on my MacBook took just an hour and twenty minutes on Hoffman2. As such I’m writing up some notes on how I use it, in part so I remember and in part so I can recommend the cluster to colleagues and students.
File management. Use a dedicated FTP client like Filezilla or Cyberduck. (For some reason the Finder/Pathfinder “Connect to Server” command doesn’t work with Hoffman2). The connection type should be “SFTP”. The URL is “hoffman2.idre.ucla.edu”. Your name and password are the same logins you use for an SSH terminal session (or as the documentation calls it, a “node” session). Use your FTP client to upload data and scripts (which you will probably write locally on a text editor) and download output. Here’s what my configuration window looks like in Cyberduck.
Coding. Either do this locally and sync it through SFTP (see above) or use a text editor with integrated SFTP. On a Mac, TextWrangler/BBEdit has great SFTP support (in addition to other notable features such as really good regular expressions support and Stata syntax highlighting). I can also recommend the cross-platform program Komodo Edit. Or if you’re into that sort of thing you can use Vim or emacs through SSH.
Connecting to SSH. Open your “Terminal” (on Mac/Linux) or an SSH client (on Windows). Type “ssh hoffman2.idre.ucla.edu”. If it didn’t guess your username correctly you need to write “ssh hoffman2.idre.ucla.edu -l username”. You now have a bash session. You can do all the usual stuff, but mostly you’re just going to batch jobs.
Batching a Job. If you just want to put a job in the queue you simply type “program.q script”. For instance, to do the Stata script “foo.do” you’d make sure you’re in the right directory and type:
stata.q foo.do
The documentation makes it sound much more complicated than this, but 9 times out of 10 that’s all you need to do. The system will email you when your job starts and finishes and you can use SFTP to retrieve the output and log. However if you want to kill a job or something, you just type program.q without arguments and then follow the instructions.
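The upload, batch, retrieve round trip can be sketched as a small shell helper. This is only a sketch under assumptions that are not from the notes above: the login "username", the remote folder "project", and the log name "foo.log" are all hypothetical (Stata's own batch mode names the log after the do-file, but check what stata.q actually writes). It uses scp, which runs over the same SSH login as SFTP, and by default it only prints the commands so you can inspect them first.

```shell
#!/bin/sh
# Hypothetical values: replace the username and remote folder with your own.
REMOTE="username@hoffman2.idre.ucla.edu"
REMOTE_DIR="project"

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Upload one or more files (a do-file, data) to the remote folder.
push_files() {
    run scp "$@" "$REMOTE:$REMOTE_DIR/"
}

# Download one file (for instance a log) into the current directory.
pull_file() {
    run scp "$REMOTE:$REMOTE_DIR/$1" .
}

push_files foo.do
# Now log in with ssh, cd to the remote folder, and type: stata.q foo.do
# Once the "job finished" email arrives:
pull_file foo.log
```

Keeping the batch step manual is deliberate: the notes describe typing the queue command inside an SSH session, so the sketch does not assume it can be run non-interactively.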
Importing your Stata ado-files
Unlike R (where you have to put “library()” at the start of your source files), Stata’s use of libraries is so transparent that you can forget they’re not part of the stock Stata installation. (My first batch crashed twice because I forgot to install some of my commands). On your own computer, remind yourself what ado-files you have installed with these Stata commands.
disp "`c(sysdir_plus)'"
disp "`c(sysdir_personal)'"
On a Mac, both of these folders are in “~/Library/Application Support/Stata/ado”
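If you would rather get the inventory from the shell than by browsing the folder, something like the following should work. It is a minimal sketch: the helper name list_ados is made up for illustration, and the default path is simply the Mac location mentioned above. Note that the names it prints are command names, which do not always match the package name you would give to "ssc install" (one package can install several ado-files).

```shell
#!/bin/sh
# list_ados is a hypothetical helper: print the names of all .ado files
# found anywhere under the given directory, one per line, without duplicates.
list_ados() {
    find "$1" -type f -name '*.ado' | sed -e 's|.*/||' -e 's|\.ado$||' | sort -u
}

# Default to the Mac location; override by setting ADO_DIR.
ADO_DIR="${ADO_DIR:-$HOME/Library/Application Support/Stata/ado}"
if [ -d "$ADO_DIR" ]; then
    list_ados "$ADO_DIR"
fi
```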
Once you remember what ado-files you want, write yourself a do-file that will install them and batch it. For instance, I did:
ssc install fs
ssc install fsx
ssc install gllamm
ssc install estout
ssc install stata2pajek
ssc install shufflevar
The ado files go in “~/ado” which has the practical upshot that you don’t need admin permission to install them and they persist between sessions.
Interactive GUI Usage. Do it on your own computer. If that’s not possible (perhaps because you don’t have a personal license for a particular piece of software) use X11 rather than Grid. When I experimented with Grid’s browser-based VNC session it took forever to load the Java Virtual Machine, it refreshed at about 10 frames per second, and worst of all it wouldn’t capture keyboard input.
The results are much better if you use a real X11 client rather than Grid’s JVM. To do this you first connect through X11 (on a Mac this means using X11.app rather than Terminal.app) and add the flag “-X” to your ssh session (eg, “ssh hoffman2.idre.ucla.edu -l rossman -X”). As always you can test it with the “xeyes” command. You then type “xstata” and follow the instructions carefully. (It bounces you to an interactive node and makes you type back a fairly lengthy command to actually launch the session). It’s a fair amount of work to get an X11 session but unlike the browser version it is usable. (For more instructions on X11 sessions for Stata and other software see the links labeled “How to run on ATS-Hosted Clusters” in this table). Try to avoid this though as it’s faster and less work to just script and batch it with the “stata.q” command described above.
Finally, if you just want an interactive command-line session you can use ssh and issue the “qrsh” command. This actually works really well. Remember that you don’t need to see a graph to make a graph but can use the “graph export” command in Stata and the “pdf()” function in R to write graphs to disk and then retrieve them through your SFTP client.
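To make the "you don’t need to see a graph to make a graph" point concrete, here is a hedged sketch that writes a small do-file from the shell. The file names makegraph.do and scatter.eps are made up for illustration, and the example uses Stata's bundled auto dataset rather than anything from these notes. It exports EPS because that format does not depend on a graphics window.

```shell
#!/bin/sh
# Write a do-file that draws a graph and saves it to disk without displaying it.
# makegraph.do and scatter.eps are hypothetical names; rename to taste.
cat > makegraph.do <<'EOF'
* Draw and export a graph without opening a graph window.
set graphics off
sysuse auto, clear
scatter price mpg
graph export scatter.eps, replace
EOF
```

You would then upload makegraph.do, batch it with “stata.q makegraph.do”, and pull scatter.eps back through your SFTP client.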
| Gabriel |
- How to fix GrowlMail after an OS X system update
- Discussion of alpha-centrality (which my co-authors and I used to measure Hollywood stardom).
- Very interesting personal financial history of what it’s like to be an aspiring fashion model (h/t Yglesias). It’s basically a tournament model which means it ain’t fun. A lot of the contractual practices are very similar to the record industry, and to a lesser extent the Hollywood studio system. I may end up assigning this to my undergrads as it makes similar points to Slichter’s So You Wanna Be a Rock N Roll Star (which I already assign).
- How to say “America, fuck yeah!” in dog