More R headaches

February 28, 2010 at 1:23 pm 9 comments

| Gabriel |

I’ve continued to play with the R package igraph, and I think I’ve gotten the hang of igraph itself, but R is still pretty opaque to me, and that limits my ability to use igraph. Specifically, I’m trying to merge diffusion data onto a network graph and plot it as a kind of slideshow, where each successive image is a new period. I can actually do this, but I’m having trouble looping it, so my code is very repetitive. [Update: thanks to Brian Rubineau, I’ve gotten it to work.]

First, the two tricks that I have figured out.

1. Merging the diffusion data (a vertex-level trait) onto the network.

The preferred way to do this is to read the network file into R and then merge in the vertex-level trait (or, more technically, read the vertex data and associate it with the network data). This has proven really difficult for me, so instead I wrote an alternate version of stata2pajek that lets me do this within Stata. The upside is that I spend more time in Stata and less time in R; the downside is that I need a separate “.net” file for every version of the graph.
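For readers who want to try the in-R route, here is a minimal sketch of the alignment step, under stated assumptions: the station labels, the trait table, and the object names are all hypothetical, and the igraph part runs only if the package is installed. The key idea is that base R's match() lines up a trait table with the order in which the graph stores its vertices.

```r
# Hypothetical data: vertex labels in the order the graph stores them,
# and a trait table (e.g. exported from Stata) in a different order.
vertex_names <- c("KAAA", "KBBB", "KCCC", "KDDD")
traits <- data.frame(
  station = c("KCCC", "KAAA", "KDDD", "KBBB"),
  color   = c("red", "blue", "gray", "blue"),
  stringsAsFactors = FALSE
)

# match() returns, for each vertex label, the row of `traits` describing it.
rows <- match(vertex_names, traits$station)
vertex_color <- traits$color[rows]
print(vertex_color)

# If igraph is available, attach the aligned vector as a vertex attribute.
# make_ring() is only a stand-in for a graph read from a ".net" file.
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::make_ring(length(vertex_names))
  g <- igraph::set_vertex_attr(g, "name",  value = vertex_names)
  g <- igraph::set_vertex_attr(g, "color", value = vertex_color)
}
```

With a real graph, the vector of stored vertex labels would replace the hypothetical vertex_names; which attribute holds those labels after importing a Pajek file should be checked rather than assumed.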

2. Getting the different versions of the graph to appear comparable.

It does no good to have multiple versions of a graph if they don’t look at all similar, which is what you get if you generate the layout on the fly as part of each plot. The solution is to first generate a layout (the object “la”) and then apply it to each graph by setting the “layout” argument to “la” (that is, “layout=la”) within the “plot.igraph()” function.

So instead of this:

plot.igraph(chrnetbounded, layout=layout.fruchterman.reingold, vertex.size=4, vertex.label=NA, vertex.color="red", edge.color="gray20", edge.arrow.size=0.3, margin=0)

Do this:

la = layout.fruchterman.reingold(chrnetbounded)
plot.igraph(chrnetbounded, layout=la, vertex.size=4, vertex.label=NA, vertex.color="red", edge.color="gray20", edge.arrow.size=0.3, margin=0)

Now the problem I can’t solve, at least with anything approaching elegance.

3. Looping it.

Currently my code is really long and repetitive. I want one graph per week for a whole year. To make 52 versions of the graph, I need about two hundred lines of code, whereas I should be able to just loop my core four lines of code 52 times: pdf(); read.graph(); plot.igraph(); dev.off(). The long version works, but it is ridiculous.

The thing is that I can’t figure out how to write a loop in R where the loop index becomes part of a filename. This kind of thing is trivially easy in Stata, since Stata expands locals and then interprets the resulting command. For example, here’s how I would do this in Stata. (I’m using “twoway scatter” as a placeholder for “plot.igraph()”, which of course doesn’t exist in Stata.)

forvalues i=0/52 {
	use ties_bounded`i', clear
	twoway scatter x y
	graph export chrnet_hc`i'.png, replace
}

R not only doesn’t let you do this directly, but I can’t even figure out how to do it by adding an extra step where the loop index feeds into a new object that holds the filename. (The problem is that the “paste()” function inserts a space between its arguments by default.) So basically I’m at an impasse and I see three ways forward:

  • Wait for somebody to tell me in the comments how to do this kind of loop in R. (hint, hint).
  • Give up on writing this kind of loop and just resign myself to writing really repetitive R code.
  • Give up on using igraph from within R and learn to use it from within Python. I have basically zero experience using Python but it has a good reputation for usability. In fact, it only took me, a complete Python-noob, about ten minutes to figure out what’s been a real stumper in R. I haven’t yet worked igraph into this loop but I’m thinking it can’t be that hard.
for i in range(0, 53):  # 0 through 52, matching the Stata loop above
    datafile = 'ties_bounded%d' % i
    # igraph code here that treats the Python object "datafile" as a filename

Update
Thanks to Brian Rubineau’s suggestion, I now know how to use the “paste()” function properly, in a way that is functionally equivalent to the second line of the Python code above. The catch is that you have to add a sep="" argument to suppress the whitespace that had been annoying me. I thought I had tried this already, but apparently not. Anyway, the Python / R method of first defining a new object and then calling it is an extra step compared to Stata loops (where the looping local can expand directly), but it’s still reasonably easy. Here’s my complete R code, which I’m now very happy with.

# File-Name:       chrnetwork.R                 
# Date:            2010-02-26
# Created Date:    2009-11-24                               
# Author:          Gabriel Rossman                                       
# Purpose:         graph CHR station network
# Data Used:       ties_bounded.net
# Packages Used:   igraph    
library(igraph)
setwd("~/Documents/Sjt/radio/survey")
#ties bounded to only top 40, includes adoption time color-codes, but use is optional
chrnetbounded <- read.graph("ties_bounded_humpcolor.net", c("pajek"))
la = layout.fruchterman.reingold(chrnetbounded)  #create layout for use on several related graphs
#graph structure only
pdf("~/Documents/book/images/chrnetworkbounded.pdf")
 plot.igraph(chrnetbounded, layout=la, vertex.size=4, vertex.label=NA, vertex.color="red", edge.color="gray20", edge.arrow.size=0.3, margin=0)
dev.off()
#graph color coded diffusion
pdf("~/Documents/book/images/chrnetworkboundedcolor.pdf")
 plot.igraph(chrnetbounded, layout=la, vertex.size=4, vertex.label=NA, edge.color="gray60", edge.arrow.size=0.3, margin=0)
dev.off()
#do as flipbook
setwd("~/Documents/Sjt/radio/survey/flipbook")
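for(i in 0:52) {
	# build the input and output filenames from the loop index; sep="" suppresses paste()'s default space
	datafile<-paste('ties_bounded_hc',i,'.net', sep="")
	pngfile<-paste('~/Documents/book/images/chrnet_hc',i,'.png', sep="")
	# read this week's graph, then plot it to its own png using the shared layout "la"
	chrnetbounded <- read.graph(datafile, c("pajek"))
	png(pngfile)
	plot.igraph(chrnetbounded, layout=la, vertex.size=4, vertex.label=NA, edge.color="gray60", edge.arrow.size=0.3, margin=0)
	dev.off()
}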
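
As a side note on the filename step in the loop above, here is a minimal sketch of three base R idioms that give the same result. It is an illustration rather than part of the script: the object names are made up, and only the filename pattern is taken from the loop. All three functions are vectorized, so the whole set of names can be built in one call.

```r
i <- 0:52  # week indices, matching the loop above

# Three equivalent ways to build the input filenames without stray spaces.
with_paste   <- paste("ties_bounded_hc", i, ".net", sep = "")
with_paste0  <- paste0("ties_bounded_hc", i, ".net")
with_sprintf <- sprintf("ties_bounded_hc%d.net", i)

# For contrast: paste() with its default separator inserts single spaces.
spaced <- paste("ties_bounded_hc", 1, ".net")

print(head(with_sprintf, 3))
print(spaced)
```

Which form to use is mostly a matter of taste; sprintf() is the closest analogue to the Python % formatting shown earlier.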


9 Comments

  • 1. brubineau  |  February 28, 2010 at 2:03 pm

    Although I don’t know if this is your main obstacle, but
    the argument:
    sep=""

    to the paste command in R suppresses the return of the space.
    Good luck!

    • 2. gabrielrossman  |  February 28, 2010 at 2:35 pm

      you know i thought i tried that already but i must have had a typo or something. anyway, now it works perfectly, thank you. i’ll update the post with the working code

  • 3. Network slideshow « Code and Culture  |  March 1, 2010 at 1:35 pm

    […] that I’ve gotten R and igraph to make a set of 53 png files (see yesterday’s post), the next step is animating them. I did this using the command line tool ImageMagick, which I […]

  • 4. Kieran  |  March 4, 2010 at 5:05 pm

    How is the network data structured in ties_bounded.net? Is it a giant matrix, or 52 separate matrices, or what? To avoid the loops, you can make your 52 weekly network snapshots into a list object, put the equivalent of lines 23-30 in a function (to create a particular file), and then use lapply() to apply the function to each element in the list.

    • 5. gabrielrossman  |  March 4, 2010 at 7:34 pm

      kieran,

      thanks for the hints, though before we get too deep into solving this i should say that my current R script (a) gives me the end results i want and (b) takes about 20 seconds to execute. as such, your suggestions for making it run more efficiently would be elegant and emotionally satisfying but not necessary in any pragmatic sense, at least in the short run.

      all of the “.net” files are standard Pajek files which i generated using this Stata script. that is, the beginning consists of vertex serial numbers, vertex labels, and other vertex-level traits (ie, color) and it’s followed by an arc list. the only difference between ties_bounded.net, ties_bounded_hc1.net, ties_bounded_hc2.net, etc, is that they have different colors attached to the vertices. aside from that the files are identical (which is why it’s meaningful to reuse the layout object).

      anyway, that’s how the “.net” file is structured, but the R object “chrnetbounded” created by the read.graph() import filter is structured differently. to be honest, i don’t really understand the internal igraph data structure but i trust the read.graph() function to do the conversion properly.

      the standard way of attaching a vector of vertex-level traits in igraph is to use the function set.vertex.attribute(), but i haven’t gotten the function to work yet.

  • 6. Kieran  |  March 5, 2010 at 12:23 am

    my current R script (a) gives me the end results i want and (b) takes about 20 seconds to execute.

    This is the right attitude, yeah. I was motivated by your question about how to avoid for {} loops, which in R are generally to be avoided, but not always — if the thing runs in acceptable time, there’s not too much point in vectorizing it or using variants of apply().

    Still, I poked at this for a while this evening and here’s a post: http://www.kieranhealy.org/blog/archives/2010/03/04/lists-and-loops-in-r/

  • 7. Thanks for the Aspirin Guys « Code and Culture  |  March 16, 2010 at 4:46 am

    […] a recent post, I lamented that I couldn’t figure out how to do loops and a few other things in R. The […]

  • 8. anderson  |  April 8, 2010 at 4:57 am

    Thank you .

    Could you help me.

    I want to change the thickness of some arc?

    Thank

    • 9. gabrielrossman  |  April 8, 2010 at 10:57 am

      if you want to have all the edges be thicker, use the argument edge.width=k, where k is the width you want. if you want some edges to be thicker than others it’s a bit more complicated, see the “attributes” entry of the igraph manual or this tutorial.
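
For the "some edges thicker than others" case, here is a minimal sketch, assuming a hypothetical vector of tie strengths with one value per edge. The threshold, the widths, and the object names are illustrative only, and the igraph part runs only if the package is installed.

```r
# Hypothetical tie strengths, one per edge of a five-edge graph.
tie_strength <- c(1, 4, 2, 5, 1)

# Strong ties get a wider line; the cutoff of 4 is arbitrary.
edge_width <- ifelse(tie_strength >= 4, 3, 1)
print(edge_width)

if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::make_ring(5)  # stand-in graph with exactly five edges
  # Store the widths as the "width" edge attribute, which plotting picks up;
  # passing edge.width = edge_width to plot() is an alternative.
  g <- igraph::set_edge_attr(g, "width", value = edge_width)
}
```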

