Stata2Pajek with vertex colors
December 10, 2009 at 7:29 pm GR 9 comments
| Gabriel |
[updated 6/29/2010, added option for “missingcolor” instead of always defaulting to yellow. also introduced commented quotes so as to make TextMate’s syntax parser happy]
My Stata program stata2pajek lets you export an edge list or arc list from Stata to Pajek. However, I recently wanted to make color-coded vertices, and this required tweaking the syntax a bit. Because this is a relational database problem (and Stata likes flat files), I do it by merging in a vertex-level file on disk. Although I’ve written it to merge on colors, it should be fairly easy to rewrite to deal with other vertex-level variables. Unfortunately the syntax is pretty awkward, which is why I’m forking it and posting it to the blog instead of updating the main stata2pajek file at SSC.
Note that by default the script assumes that nodes which appear in the arc list but not in the vertex attributes file should be color-coded yellow. If you want yellow to have a substantive meaning, use the missingcolor() option to pick a different fallback color.
The syntax is the same as stata2pajek except there are a few more options. If you omit these options it behaves just like stata2pajek classic:
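- attributefile(): the file where the vertex-level variables are stored
- attributekey(): the merge key, defaults to the first variable in the varlist (“ego” in the example below)
- color(): the variable storing the color codes
- missingcolor(): the color for vertices that have no color in the attribute file, defaults to “Yellow”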
For instance, to color-code a graph of radio stations based on when they first played “My Humps” (yes, really), I use this command:
stata2pajekalt ego alter, filename(ties_bounded_humpcolor) attributefile(humps_color) attributekey(ego) color(color)
It says to treat the data in memory as an arc list, to merge on station-level data on adoption time where the station name is stored as the variable “ego” in the file “humps_color”, and to write out the whole mess as a Pajek-formatted text file called “ties_bounded_humpcolor.net”.
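To make the merge step concrete, here is a minimal sketch of how the attribute file might be prepared. The station names, the color values, and the output name ties_toy are made up for illustration. Two things follow from the program itself: it uses the old-style merge, so the attribute file has to be saved sorted on the key, and it tests the color for an empty string, so the color variable should probably be a string.

```stata
* hypothetical station-level file: one row per station with a Pajek color name
clear
input str8 ego str10 color
"KAAA" "Red"
"KBBB" "Blue"
end
* old-style merge needs the using file sorted on the merge key
sort ego
save humps_color, replace

* hypothetical arc list; KCCC does not appear in the attribute file
clear
input str8 ego str8 alter
"KAAA" "KBBB"
"KBBB" "KCCC"
end
* KCCC has no color, so it should get the missingcolor() value
stata2pajekalt ego alter, filename(ties_toy) attributefile(humps_color) attributekey(ego) color(color) missingcolor(Black)
```

If missingcolor() is left out, KCCC falls back to yellow.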
*1.4 GHR 6/29/2010
*forked from stata2pajek 1.2 (available at ssc)
capture program drop stata2pajekalt
program define stata2pajekalt
    version 10
    set more off
    syntax varlist(min=2 max=2) [ , tiestrength(string asis) filename(string asis) EDGEs attributefile(string asis) attributekey(string asis) color(string asis) missingcolor(string asis)]
    tempfile dataset
    quietly save `dataset'
    gettoken ego alter : varlist
    local starchar="*"
    if "`filename'"=="" {
        local pajeknetfile "mypajekfile"
    }
    else {
        local pajeknetfile "`filename'"
    }
    if "`attributekey'"=="" {
        local attributekey "`ego'"
    }
    if "`missingcolor'"=="" {
        local missingcolor "Yellow"
    }
    capture file close pajeknetfile
    file open pajeknetfile using `pajeknetfile'.net, write text replace
    use `dataset', clear
    drop `ego'
    ren `alter' `ego'
    append using `dataset'
    keep `ego'
    contract `ego'
    ren `ego' verticelabel
    drop _freq
    sort verticelabel
    gen number=[_n]
    order number verticelabel
    if "`attributefile'"!="" {
        ren verticelabel `attributekey'
        merge `attributekey' using `attributefile'
        drop if _merge==2
        ren `attributekey' verticelabel
        drop _merge
        keep number verticelabel `color'
        sort verticelabel
    }
    tempfile verticelabels
    quietly save `verticelabels', replace
    local nvertices=[_N]
    file write pajeknetfile "`starchar'Vertices `nvertices'" _n
    if "`attributefile'"=="" {
        forvalues x=1/`nvertices' {
            local c2=verticelabel in `x'
            file write pajeknetfile `"`x' "`c2'""' _n
            *"
        }
    }
    else {
        forvalues x=1/`nvertices' {
            local c2=verticelabel in `x'
            local colvalue=`color' in `x'
            if "`colvalue'"=="" {
                local colvalue "`missingcolor'"
            }
            file write pajeknetfile `"`x' "`c2'" ic `colvalue'"' _n
        }
        *"
    }
    use `dataset', clear
    ren `ego' verticelabel
    sort verticelabel
    quietly merge verticelabel using `verticelabels'
    quietly keep if _merge==3
    drop _merge verticelabel
    ren number `ego'
    ren `alter' verticelabel
    sort verticelabel
    quietly merge verticelabel using `verticelabels'
    quietly keep if _merge==3
    drop _merge verticelabel
    ren number `alter'
    order `ego' `alter' `tiestrength'
    keep `ego' `alter' `tiestrength'
    local narcs=[_N]
    if "`edges'"=="edges" {
        file write pajeknetfile `"`starchar'Edges"' _n
    }
    else {
        file write pajeknetfile `"`starchar'Arcs"' _n
    }
    *"
    sort `ego' `alter'
    if "`tiestrength'"~="" {
        forvalues x=1/`narcs' {
            local c1=`ego' in `x'
            local c2=`alter' in `x'
            local c3=`tiestrength' in `x'
            file write pajeknetfile "`c1' `c2' `c3'" _n
        }
    }
    else {
        forvalues x=1/`narcs' {
            local c1=`ego' in `x'
            local c2=`alter' in `x'
            file write pajeknetfile "`c1' `c2'" _n
        }
    }
    file close pajeknetfile
    *ensure that it's windows (CRLF) text format
    if "$S_OS"~="Windows" {
        filefilter `pajeknetfile'.net tmp, from(\M) to(\W) replace
        shell mv tmp `pajeknetfile'.net
        filefilter `pajeknetfile'.net tmp, from(\U) to(\W) replace
        shell mv tmp `pajeknetfile'.net
    }
    use `dataset', clear
    disp "Your output is saved as"
    disp "`c(pwd)'`c(dirsep)'`pajeknetfile'.net"
end
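To check what the program writes, a quick way is to run it on a toy arc list and print the result. The sketch below uses invented station names, a hypothetical tie-strength variable called plays, and a made-up output name toy_plain. Going by the file write statements, the output should start with a “*Vertices” line giving the vertex count, then one numbered line per vertex with its label in quotes (plus “ic” and a color when an attribute file is used), then “*Arcs” or “*Edges”, then one line per tie giving the two vertex numbers and, if requested, the tie strength.

```stata
* hypothetical weighted arc list
clear
input str8 ego str8 alter byte plays
"KAAA" "KBBB" 3
"KBBB" "KCCC" 1
end
* no attribute file, so this behaves like stata2pajek classic
stata2pajekalt ego alter, filename(toy_plain) tiestrength(plays)
* print the Pajek file to the results window
type toy_plain.net
```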
Entry filed under: Uncategorized. Tags: cleaning, graphs, networks, Stata.
1. Hannah | December 14, 2009 at 5:31 pm
Dear Gabriel,
I am trying to use your stata2pajek program currently but what is not entirely clear to me still is what format the stata data should be in… Perhaps this is a dumb question
Best wishes
Hannah
2. gabrielrossman | December 14, 2009 at 5:43 pm
hannah,
the data should already be an edge list — i.e., a dataset at the dyad level. so if you want to describe a clique of tom, dick, and harry, the data would look like this
i j
"tom" "dick"
"dick" "harry"
"tom" "harry"
you’d then use the syntax
stata2pajek i j, edges
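as a rough sketch, the whole thing could be typed in from scratch like this. it assumes stata2pajek classic is already installed from SSC, and the output name “clique” is just an example.

```stata
* enter the three dyads of the tom/dick/harry clique by hand
clear
input str5 i str5 j
"tom" "dick"
"dick" "harry"
"tom" "harry"
end
* write clique.net as an undirected edge list
stata2pajek i j, edges filename(clique)
```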
note that stata2pajek doesn’t handle bipartite networks (aka collaboration networks). if you want to use bipartite data you can convert it using the Stata command “joinby”. for instance to make a network of imdb you’d go
use imdb_actors
ren actor j
joinby film using imdb_actors
ren actor i
stata2pajek i j, edges filename(imdb)
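one caveat: joinby pairs every actor with every co-star in both directions and also with themselves. a hedged sketch of one way to clean that up is below. it assumes imdb_actors has just the variables film and actor, that stata2pajek classic accepts tiestrength() the way the fork does, and the variable name nfilms is made up.

```stata
use imdb_actors, clear
ren actor j
joinby film using imdb_actors
ren actor i
* drop self-ties and keep each undirected pair only once
drop if i >= j
* collapse repeat pairings into one row, counting shared films
contract i j, freq(nfilms)
stata2pajek i j, edges tiestrength(nfilms) filename(imdb)
```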
does that make sense?
3. Hannah | March 18, 2010 at 5:46 am
Hey! I didn’t check in with your answer before but it makes sense that it starts from an edgelist format. My data are in nodelist, however, but it shouldn’t be too hard to get them into an edgelist.
Thanks!
Best wishes
4. Víctor Aguiar | January 1, 2010 at 3:02 am
Dear Gabriel,
I have a question related to how you present your code in your blog not to this post, sorry for that. I’m starting my own blog on Stata and economics. Again sorry for the rudeness but i would love to know what kind of plugin, text editor or gimmick to put your code in a text editor way and even highlighted. Hope you can help me. Respectfully
Victor Aguiar
Ecuador South America
5. gabrielrossman | January 1, 2010 at 12:17 pm
hi victor,
that’s not rude at all.
i use the “sourcecode” wordpress plugin. the advantage of this over just the “pre” or “code” html tags is that it a) escapes the text and tabs so the rendering engine doesn’t eat them and b) provides some highlighting.
for details see
http://en.support.wordpress.com/code/posting-source-code/
there’s no Stata syntax file so I usually just tell it I’m posting “perl” which is how I get some syntax highlighting. this works reasonably well because the two languages are similar in regards to things like loops.
i’ve never done this, but another way to do it is to preserve the highlighting of your text editor by copying as RTF. here are instructions on how to do it with TextMate
http://github.com/drnic/copy-as-rtf-tmbundle
good luck with your econometrics blog.
6. Víctor Aguiar | January 1, 2010 at 5:30 pm
Thanks a lot! keep the good work!
7. More R headaches « Code and Culture | February 28, 2010 at 1:41 pm
[…] associate it with the network data). This has proven really difficult to me so instead I wrote an alternate version of stata2pajek that lets me do this within Stata. The upside is that I spend more time in […]
8. Affiliation network 2 edge list « Code and Culture | March 17, 2010 at 5:31 am
[…] My student wanted to project the affiliation network into an edge list at the “i” level. As before he only wanted each edge in once (so if we have “1 & 2” we don’t also want “2 & 1”). To accomplish this, I wrote him a program that takes as arguments the name of the “affiliation” variable and the name of the “members list” variable. To do this it first reshapes to a long file (mostly lines 11-18), then uses joinby against itself to create all permutations (mostly lines 22-24), and finally drops redundant cases by only keeping dyads where ego was listed before alter in the original list of affiliation members (mostly lines 25-32). With minor modifications, the script would also work with affiliation data that starts out as long, like IMDB. Also note that it should work well in combination with stata2pajek classic (“ssc install stata2pajek”) or the version that lets you save vertice traits like color. […]
9. Network Graphs in Native Stata Code « Code and Culture | April 13, 2010 at 5:38 am
[…] is to not handle them in Stata. Rather I take an approach inspired by the Unix philosophy and export the data, then call an R script to do what I need, and in some cases use perl to clean the output for […]