Stata2Pajek w vertice colors

December 10, 2009 at 7:29 pm

| Gabriel |

[Updated 6/29/2010: added a “missingcolor” option instead of always defaulting to yellow. Also introduced commented quotes to keep TextMate’s syntax parser happy.]
My Stata program stata2pajek lets you export an edge list or arc list from Stata to Pajek. However, I recently wanted to make color-coded vertices, and this required tweaking the syntax a bit. Because this is a relational database problem (and Stata likes flat files), I do it by merging in a vertex-level file on disk. Although I’ve written it to merge on colors, it should be fairly easy to rewrite to deal with other vertex-level variables. Unfortunately the syntax is pretty awkward, which is why I’m forking it and posting it to the blog instead of updating the main stata2pajek file at SSC.

Note that by default the script colors nodes that appear in the arc list but not in the vertex attributes file yellow. If you want yellow to have a substantive meaning, use the missingcolor() option to pick a different color for unmatched nodes.

The syntax is the same as stata2pajek except there are a few more options. If you omit these options it behaves just like stata2pajek classic:

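  • attributefile(): the file where the vertex-level variables are stored
  • attributekey(): the merge key, defaults to the first variable in the varlist (“ego” in the example below)
  • color(): the variable storing the color codes
  • missingcolor(): the color for vertices with no match in the attribute file, defaults to “Yellow”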
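
As a minimal sketch of what the attribute file might look like, the lines below build a small vertex-level file. The station names are made up for illustration, and I am assuming the color variable is a string holding Pajek color names, since the program treats an empty string as "no color". The file is sorted on the key before saving because the old-style merge the program uses expects sorted data.

```stata
* hypothetical vertex-level data: one row per station (names are invented)
clear
input str10 ego str10 color
"KAAA" "Red"
"KBBB" "Blue"
end
* the merge inside stata2pajekalt expects the attribute file sorted on the key
sort ego
save humps_color, replace
```

Any station that shows up in the arc list but not in this file would get the missingcolor() value.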

For instance, to color-code a graph of radio stations based on when they first played “My Humps” (yes, really), I use this command:

stata2pajekalt ego alter, filename(ties_bounded_humpcolor) attributefile(humps_color) attributekey(ego) color(color)

It says to treat the data in memory as an arc list, to merge in station-level data on adoption time from the file “humps_color” (where the station name is stored as the variable “ego”), and to write out the whole mess as a Pajek-formatted text file called “ties_bounded_humpcolor.net”.
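
To give a sense of the output, here is roughly what the .net file would contain for a hypothetical arc list KAAA → KBBB → KCCC, assuming KAAA and KBBB are coded Red and Blue in the attribute file and KCCC is absent from it. The station names and colors are invented for illustration; the layout follows the "file write" lines in the program below.

```text
*Vertices 3
1 "KAAA" ic Red
2 "KBBB" ic Blue
3 "KCCC" ic Yellow
*Arcs
1 2
2 3
```

With the edges option the header would read *Edges instead of *Arcs, and with tiestrength() each tie line would carry a third column.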

*1.4 GHR 6/29/2010
*forked from stata2pajek 1.2 (available at ssc)
capture program drop stata2pajekalt
program define stata2pajekalt
	version 10
	set more off
	syntax varlist(min=2 max=2) [ , tiestrength(string asis) filename(string asis) EDGEs attributefile(string asis) attributekey(string asis) color(string asis) missingcolor(string asis)]

	tempfile dataset
	quietly save `dataset'
	
	gettoken ego alter : varlist
	
	local starchar="*"
	if "`filename'"=="" {
		local pajeknetfile "mypajekfile"
	}
	else {
		local pajeknetfile "`filename'"
	}
	if "`attributekey'"=="" {
		local attributekey "`ego'"
	}
	if "`missingcolor'"=="" {
		local missingcolor "Yellow"
	}
	capture file close pajeknetfile
	file open pajeknetfile using `pajeknetfile'.net, write text replace
	*build the vertex list: stack egos and alters, keep one row per label, and number them
	use `dataset', clear
	drop `ego'
	ren `alter' `ego'
	append using `dataset'
	keep `ego'
	contract `ego'
	ren `ego' verticelabel
	drop _freq
	sort verticelabel
	gen number=[_n]
	order number verticelabel
	*merge in vertex colors; the attribute file must already be sorted on the merge key
	if "`attributefile'"!="" {
		ren verticelabel `attributekey'
		merge `attributekey' using `attributefile'
		drop if _merge==2
		ren `attributekey' verticelabel
		drop _merge
		keep number verticelabel `color'
		sort verticelabel
	}
	tempfile verticelabels
	quietly save `verticelabels', replace
	local nvertices=[_N]
	file write pajeknetfile "`starchar'Vertices `nvertices'" _n
	if "`attributefile'"=="" {
		forvalues x=1/`nvertices' {
			local c2=verticelabel in `x'
			file write pajeknetfile `"`x' "`c2'""' _n
			*"
		}
	}
	else {
		forvalues x=1/`nvertices' {
			local c2=verticelabel in `x'
			local colvalue=`color' in `x'
			if "`colvalue'"=="" {
				local colvalue "`missingcolor'"
			}
			file write pajeknetfile `"`x' "`c2'" ic `colvalue'"' _n
		}
		*"
		
	}
	*recode ego and alter from labels to vertex numbers, then write out the ties
	use `dataset', clear
	ren `ego' verticelabel
	sort verticelabel
	quietly merge verticelabel using `verticelabels'
	quietly keep if _merge==3
	drop _merge verticelabel
	ren number `ego'
	ren `alter' verticelabel
	sort verticelabel
	quietly merge verticelabel using `verticelabels'
	quietly keep if _merge==3
	drop _merge verticelabel
	ren number `alter'
	order `ego' `alter' `tiestrength'
	keep  `ego' `alter' `tiestrength'
	local narcs=[_N]
	if "`edges'"=="edges" {
		file write pajeknetfile `"`starchar'Edges"' _n
	}
	else {
		file write pajeknetfile `"`starchar'Arcs"' _n
	}
	*"
	sort `ego' `alter'
	if "`tiestrength'"~="" {
		forvalues x=1/`narcs' {
			local c1=`ego' in `x'
			local c2=`alter' in `x'
			local c3=`tiestrength' in `x'
			file write pajeknetfile "`c1' `c2' `c3'" _n
		}
	}
	else {
		forvalues x=1/`narcs' {
			local c1=`ego' in `x'
			local c2=`alter' in `x'
			file write pajeknetfile "`c1' `c2'" _n
		}
	}
	file close pajeknetfile
	*ensure that it's windows (CRLF) text format
	if "$S_OS"~="Windows" {
		filefilter `pajeknetfile'.net tmp, from(\M) to(\W) replace
		shell mv tmp `pajeknetfile'.net
		filefilter `pajeknetfile'.net tmp, from(\U) to(\W) replace
		shell mv tmp `pajeknetfile'.net
	}
	use `dataset', clear
	disp "Your output is saved as"
	disp "`c(pwd)'`c(dirsep)'`pajeknetfile'.net"
end


9 Comments

  • 1. Hannah  |  December 14, 2009 at 5:31 pm

    Dear Gabriel,

    I am trying to use your stata2pajek program currently but what is not entirely clear to me still is what format the stata data should be in… Perhaps this is a dumb question

    Best wishes
    Hannah

    • 2. gabrielrossman  |  December 14, 2009 at 5:43 pm

      hannah,
      the data should already be an edge list — i.e., a dataset at the dyad level. so if you want to describe a clique of tom, dick, and harry, the data would look like this

      i j
      "tom" "dick"
      "dick" "harry"
      "tom" "harry"

      you’d then use the syntax
      stata2pajek i j, edges

      note that stata2pajek doesn’t handle bipartite networks (aka collaboration networks). if you want to use bipartite data you can convert it using the Stata command “joinby”. for instance to make a network of imdb you’d go
      use imdb_actors
      ren actor j
      joinby film using imdb_actors
      ren actor i
      stata2pajek i j, edges filename(imdb)

      does that make sense?

      • 3. Hannah  |  March 18, 2010 at 5:46 am

        Hey! I didn’t check in with your answer before but it makes sense that it starts from an edgelist format. My data are in nodelist, however, but it shouldn’t be too hard to get them into an edgelist.

        Thanks!
        Best wishes

  • 4. Víctor Aguiar  |  January 1, 2010 at 3:02 am

    Dear Gabriel,
    I have a question related to how you present your code in your blog not to this post, sorry for that. I’m starting my own blog on Stata and economics. Again sorry for the rudeness but i would love to know what kind of plugin, text editor or gimmick to put your code in a text editor way and even highlighted. Hope you can help me. Respectfully
    Victor Aguair
    Ecuador South America

    • 5. gabrielrossman  |  January 1, 2010 at 12:17 pm

      hi victor,
      that’s not rude at all.

      i use the “sourcecode” wordpress plugin. the advantage of this over just the “pre” or “code” html tags is that it a) escapes the text and tabs so the rendering engine doesn’t eat them and b) provides some highlighting.
      for details see
      http://en.support.wordpress.com/code/posting-source-code/

      there’s no Stata syntax file so I usually just tell it I’m posting “perl” which is how I get some syntax highlighting. this works reasonably well because the two languages are similar in regards to things like loops.

      i’ve never done this, but another way to do it is to preserve the highlighting of your text editor by copying as RTF. here are instructions on how to do it with TextMate
      http://github.com/drnic/copy-as-rtf-tmbundle

      good luck with your econometrics blog.

  • 6. Víctor Aguiar  |  January 1, 2010 at 5:30 pm

    Thanks a lot! keep the good work!

  • 7. More R headaches « Code and Culture  |  February 28, 2010 at 1:41 pm

    […] associate it with the network data). This has proven really difficult to me so instead I wrote an alternate version of stata2pajek that let’s me do this within Stata. The upside is that I spend more time in […]

  • 8. Affiliation network 2 edge list « Code and Culture  |  March 17, 2010 at 5:31 am

    […] My student wanted to project the affiliation network into an edge list at the “i” level. As before he only wanted each edge in once (so if we have “1 & 2″ we don’t also want “2 & 1″). To accomplish this, I wrote him a program that takes as arguments the name of the “affiliation” variable and the name of the “members list” variable. To do this it first reshapes to a long file (mostly lines 11-18), then uses joinby against itself to create all permutations (mostly lines 22-24), and finally drops redundant cases by only keeping dyads where ego was listed before alter in the original list of affiliation members (mostly lines 25-32). With minor modifications, the script would also work with affiliation data that starts out as long, like IMDB. Also note that it should work well in combination with stata2pajek classic (“ssc install stata2pajek”) or the version that lets you save vertice traits like color. […]

  • 9. Network Graphs in Native Stata Code « Code and Culture  |  April 13, 2010 at 5:38 am

    […] is to not handle them in Stata. Rather I take an approach inspired by the Unix philosophy and export the data, then call an R script to do what I need, and in some cases use perl to clean the output for […]

