some R baby steps

November 24, 2009 at 4:42 am 11 comments

| Gabriel |

I’ve tried to learn R a few times but the syntax has always been opaque to my Stata-centric mind. I think the trick is to realize that two of the key differences are that:

  • whereas handles only come up in a few Stata commands (e.g., file, postfile, log), they are very important in R, what with all the “<-” statements
  • in R there’s a much fuzzier line between commands and functions than in Stata. What I mean by this is both the superficial thing of all the parentheses and also the more substantive issue that you often don’t put them one to a line where they just do something (like Stata commands) but instead put many to a line and feed them into something else (like Stata functions). Related to this is that the typical Stata line has the syntax “verb object, adverb” whereas the typical R line has the syntax “object <- verb(object2, adverb)”

The two combine in an obvious way with something as simple as opening a dataset, which is just use file in Stata but is filehandle <- read.table("file") in R. That is, there’s not a read.table() command but a read.table() function, and you assign what the function returns to a handle. (And people say R isn’t intuitive!)

At least I think that’s a good way to think about the basic syntax — I suck at R and I really could be totally wrong about this. (Pierre or Kieran please correct me).
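
A minimal sketch of the “object <- verb(object2, adverb)” pattern, using only base R. The file contents and the names tmp and stations are made up for illustration; they are not from the radio data.

```r
# Hypothetical toy data written to a temporary file, so the example is self-contained
tmp <- tempfile(fileext = ".txt")
writeLines(c("station plays", "KIIS 12", "WHTZ 9"), tmp)

# "object <- verb(object2, adverb)": read.table() returns a data frame,
# and <- binds that return value to the name `stations`
stations <- read.table(tmp, header = TRUE)
print(stations)

# Functions can also be fed straight into other functions without naming the result
avg_plays <- mean(read.table(tmp, header = TRUE)$plays)
print(avg_plays)
```

Nothing is "open" in the Stata sense here: stations is just one object among possibly many, which is why the handle matters so much more in R.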

Anyway, I wrote my first useful R file the other day. It reads my Pajek formatted network data on top 40 radio stations and does a graph.

# File-Name: testgraph.R
# Date: 2009-11-20
# Author: Gabriel Rossman
# Purpose: graph CHR station network
# Data Used: ties_bounded.net
# Packages Used: igraph
library(igraph)
setwd("~/Documents/Sjt/radio/survey")
chrnet <- read.graph("ties.net", c("pajek"))
pdf("~/Documents/book/images/chrnetworkbounded.pdf")
plot.igraph(chrnet, layout=layout.fruchterman.reingold, vertex.size=2, vertex.label=NA, vertex.color="red", edge.color="gray20", edge.arrow.size=0.3, margin=0)
dev.off()

The weird thing is that it works fine in R.app but breaks when I try to run R from the Terminal, regardless of whether I try to do it all in one line or first invoke R and then feed it the script. [Update: the issue is a 32 bit library and 64 bit R; the simple solution is to invoke “R32” rather than just plain “R”. See the comments for details.] Here’s a session with both problems:

gabriel-rossmans-macbook-2:~ rossman$ Rscript ~/Documents/book/stata/testgraph.R
Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared library '/Library/Frameworks/R.framework/Resources/library/igraph/libs/x86_64/igraph.so':
dlopen(/Library/Frameworks/R.framework/Resources/library/igraph/libs/x86_64/igraph.so, 10): Symbol not found: ___gmpz_clear
Referenced from: /Library/Frameworks/R.framework/Resources/library/igraph/libs/x86_64/igraph.so
Expected in: dynamic lookup

Error : .onLoad failed in 'loadNamespace' for 'igraph'
Error: package/namespace load failed for 'igraph'
Execution halted
gabriel-rossmans-macbook-2:~ rossman$ R 'source("~/Documents/book/stata/testgraph.R")'
ARGUMENT 'source("~/Documents/book/stata/testgraph.R")' __ignored__

R version 2.10.0 (2009-10-26)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> source("~/Documents/book/stata/testgraph.R")
Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared library '/Library/Frameworks/R.framework/Resources/library/igraph/libs/x86_64/igraph.so':
dlopen(/Library/Frameworks/R.framework/Resources/library/igraph/libs/x86_64/igraph.so, 10): Symbol not found: ___gmpz_clear
Referenced from: /Library/Frameworks/R.framework/Resources/library/igraph/libs/x86_64/igraph.so
Expected in: dynamic lookup

 

Error : .onLoad failed in 'loadNamespace' for 'igraph'
Error: package/namespace load failed for 'igraph'
>

The problem seems to be that R (terminal) can’t find the igraph library. This is weird because R.app has no trouble finding it. Furthermore, I get the same error even if I make sure igraph is installed directly from R (Terminal) in the same R session:

chooseCRANmirror(graphics = FALSE)
install.packages("igraph")
source("/Users/rossman/Documents/book/stata/testgraph.R")
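
Since the update above traces the failure to a 32 bit library under 64 bit R, one quick check is to ask R which build it is. This is a minimal sketch using only base R; the igraph lookup at the end is just an illustration and prints nothing if the package isn't installed.

```r
# Which architecture is this R session running as?
arch <- R.version$arch                 # e.g. "x86_64" or "i386"
bits <- 8 * .Machine$sizeof.pointer    # pointer size in bytes times 8: 64 or 32
cat("R build:", arch, "/", bits, "bit\n")

# Compiled package code lives under libs/ inside the installed package.
# system.file() returns "" when the package is not installed.
pkg_dir <- system.file(package = "igraph")
if (nzchar(pkg_dir)) {
  print(list.files(file.path(pkg_dir, "libs"), recursive = TRUE))
}
```

Running this in both R.app and R (Terminal) should show whether the two are actually the same build.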

I guess that’s another difference with Stata: StataConsole knows where the ado library is. I’d like to be able to use the Terminal mode for R, as this would let me reach my Nirvana-like goal of having a single script that does everything without any proximate human intervention. So I’ll just ask: how do I get R (Terminal) to run as reliably as R.app? Is this a naive question?

Or would it be better to try to feed R.app a “source” script from the command line? Much like how I can do this for Stata to launch a do-file into the Stata GUI:
exec /Applications/Stata/StataMP.app/Contents/MacOS/stataMP ~/Documents/book/stata/import.do


11 Comments

  • 1. Jesse  |  November 24, 2009 at 11:28 am

    Congrats on your R journey. It pays off, but it’s a long hike. I can’t answer your last question since it might be a Mac-specific issue. There is a great community though.

    There is a mac OS R faq that appears to answer the question in 1.5.

  • 2. pkremp  |  November 24, 2009 at 12:23 pm

    I don’t know the Rscript command and I haven’t tested your code, but when I use R from the command line, I usually use the command:

    /usr/bin/R –vanilla <filename.R

    it finds all the libraries the same way R-gui does. And all the graphs get saved on distinct pages of a pdf file named Rplots.pdf (which can be quite convenient if you want to save lots of graphs).

    Now it looks like you are using the 64-bit version of R. Installing a library is a bit more difficult as R needs to download the package source and compile it to make it work — instead of just downloading the precompiled (binary) version of the package and telling R to use it. Usually this works, but I have seen problems with gcc or gfortran…

  • 3. pkremp  |  November 24, 2009 at 12:28 pm

    oops, I think wordpress reformatted the two minus signs before the “vanilla” argument as one em dash:

    I meant:

    /usr/bin/R [minus][minus]vanilla <filename.R

    • 4. gabrielrossman  |  November 24, 2009 at 7:12 pm

      pierre,
      1. i also hate the auto-formatting in wordpress. i had trouble with this when i tried to post a perl script

      2. your tip didn’t work exactly, but it put me on the right track. my /usr/bin/ includes R, R32, and R64. since you noted the problem seems to be with the 64 bit version, i forced it to work in 32 bit and it worked! here’s the command that worked for me
      /usr/bin/R32 --vanilla <~/Documents/book/stata/testgraph.R
      thanks

      • 5. gabrielrossman  |  November 24, 2009 at 7:19 pm

        ps,
        this guy has a similar config and similar problem as me. basically, 64 bit R doesn’t work with 32 bit libraries, and vice versa.

        this also explains why Stata doesn’t have similar problems, as ado files are scripts, not compiled code, and so they will run on any version of Stata: Mac, Windows, Linux, 32 bit, 64 bit, etc.

        i guess if i get serious enough about R and am dealing with slow algorithms / large datasets, i’ll figure out how to compile the libraries as 64 bit, but for now it’s easiest to stick with R32.

  • 6. Kieran  |  November 26, 2009 at 11:30 am

    in R there’s a much fuzzier line between commands and functions than in Stata

    R is an object-oriented and functional language. Think of every command as a function. And think of everything (including functions and file handles) as objects.

    Although it’s not quite true any more (because of the introduction of programming concepts such as namespaces, classes, and methods to the R language) in many cases you can just type the name of a command without parentheses or arguments, and the function it executes will output to the terminal, so you can ‘see inside’ the function in much the same way as you can ‘see inside’ a data frame by just typing its name without arguments. Try it with, e.g.,

    > read.table

    or

    > jitter

    As I say, this used to be a good way to get a sense of the language but these days it’s a bit harder to get to the code underlying functions. But you can still access them. For instance, try

    > methods(mean)

    to see the methods associated with the generic function mean(), and then

    > mean.default

    to see the default mean function.

    As mentioned above your library problem is a 32/64 bit issue, not a problem with your path.

    • 7. gabrielrossman  |  November 28, 2009 at 10:10 pm

      kieran,

      it’s funny because i knew in the abstract what an object oriented language was but never really understood what one looked like. i think i was on the right track in realizing that using R requires understanding that it’s a completely different paradigm than what I’m used to, but it’s nice to put a name on it.

      yet more proof that social scientists could stand to have a bit stronger computer science background

      thanks

  • 8. Tal  |  November 29, 2009 at 5:18 pm

    Any pointer to a great R intro for Stata users? I always wanted to know how to do

    reg y x , cluster(z)

    in R, and also

    reg y x , robust

    But I’ve never found the perfect R for Stata users manual.

    • 9. gabrielrossman  |  November 30, 2009 at 11:28 am

      tal,
      good question but as an R neophyte myself i’m not qualified to answer it.
      my hunch though is that unlike some other packages that are reasonably parallel to Stata (e.g., gretl, gnuplot), looking for direct parallels between Stata and R only increases the opacity of R. so it’s probably best to try to learn R from scratch (or not at all).
      as to your specific questions, i think you’ll find it easier to search for what the Stata commands represent rather than the Stata commands themselves. so search for “sandwich error” rather than “cluster(z)” or “Huber-White” instead of “robust”.
      I think you want the sandwich package.
      install.packages("sandwich")
      require(sandwich)

      Unfortunately, I don’t completely understand how it works. (It seems like it wants to report matrices rather than robust standard errors for each beta). But you should be able to figure it out with the documentation (which unlike many R packages includes example syntax).
      Good luck and please post to the comments if you figure out how to get it to report robust standard errors.
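
A hedged sketch of why a matrix comes back: a robust covariance estimator returns the whole coefficient covariance matrix, and the per-coefficient robust standard errors are the square roots of its diagonal. The example below builds the HC1 ("Huber-White" with a small-sample correction) matrix by hand in base R on simulated data, so it runs without any package installed. The data and variable names are made up, and it assumes HC1 is the flavor wanted, which is the usual description of what Stata's robust option reports. It does not cover the cluster(z) case.

```r
# Simulated data with heteroskedastic errors (hypothetical, for illustration only)
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = abs(x) + 0.5)
fit <- lm(y ~ x)

X <- model.matrix(fit)            # n x k design matrix
e <- residuals(fit)               # OLS residuals
k <- ncol(X)

bread <- solve(crossprod(X))      # (X'X)^-1
meat  <- crossprod(X * e)         # X' diag(e^2) X
vc_hc1 <- bread %*% meat %*% bread * n / (n - k)   # HC1 sandwich

# Standard errors are the square roots of the diagonal of the covariance matrix
robust_se    <- sqrt(diag(vc_hc1))
classical_se <- sqrt(diag(vcov(fit)))
print(cbind(estimate = coef(fit), classical_se, robust_se))

# With the sandwich package installed, vcovHC(fit, type = "HC1") should give
# the same matrix as vc_hc1 (not run here, to keep the example package-free).
```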

  • 10. Kieran  |  November 30, 2009 at 9:21 pm

    You can get various robust SE estimators via Frank Harrell’s Design library (see e.g., this search or search for “robcov” and “bootcov” in the Design library).

    One thing you’ll tend not to get with R, though, is much respect for techniques that, although perhaps standard in some disciplines or subfields, are not seen as useful or up-to-date by research statisticians. The Statistics people drive R’s development. They are applied in their orientation but of course they see something like R very much as a platform for research and for researching and implementing up-to-date methods rather than a repository of canned techniques to be applied mechanically. This is in general a very good thing, and one of the virtues of R is that it helps keep you honest.

    It does mean, though, that people like us will tend to run into situations where you want a library or function to “do x” but the people most likely to have written the package to do that in R think that method is no good. In those circumstances, x may nevertheless be available (lucky you), be available but not implemented in a way that yields convenient straight-to-table output (annoying but workable), or simply be unimplemented. In the last case you will likely be told you shouldn’t be doing x anyway, and will likely be directed to the new and improved (or at least approved) method. Or you may be told that you can always adapt the y library to write code yourself to implement the method you want (assuming you know what you are doing, and if you don’t then why are you looking for a canned solution in the first place?). These answers may be delivered in a more or less blunt way depending on your interlocutor on r-help. A search for “Huber-White” in the r-help archives will bring up one such (quite friendly) exchange.

    Let me be clear that I’m speaking mostly descriptively here — I’m not taking a position on sandwich estimators or the relative merits of R and Stata’s models of development, implementation, attitudes to users, etc.

  • 11. Network Graphs in Native Stata Code « Code and Culture  |  April 13, 2010 at 5:38 am

    […] them in Stata. Rather I take an approach inspired by the Unix philosophy and export the data, then call an R script to do what I need, and in some cases use perl to clean the output for importing back into Stata. Since R/igraph has […]

