Stata2Pajek

September 1, 2009 at 4:56 am

| Gabriel |

[update #4 (9/16), I also fixed a problem with tie strength, improved the help file, and submitted it to SSC. The SSC version should be updated in a few days, but in the meantime the updated version is in this post]

[update #3, thanks to “T” in the comments, I fixed a bug in the code. The updated code is in this post but not yet uploaded to SSC]

[update #2, the ado file is now hosted at SSC. type “ssc install stata2pajek” to install it]

[update #1, I cleaned the code a bit]

I impatiently wait for somebody more talented than I am to write a native Stata social network package (which should be feasible with Mata). In the meantime, to do anything much more ambitious than estimate density, we Stata folks have to export the data to other packages, basically all of which can read the “.net” format. Currently the best way to do this is to use the command “outsheet” and then clean the output with the Windows program “Txt2Pajek,” but I decided to do it natively in Stata. My way is faster, scriptable, and works better cross-platform. (Running Txt2Pajek on Unix-like computers requires Crossover/Wine, although it does run smoothly.)

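The program works on the dataset currently in memory. It takes two variables, ego and alter, plus three options: “tiestrength()” for an optional tie strength variable, “filename()” to name the output file, and “edges” to write edges instead of arcs.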
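
To make the output concrete, here is a tiny hand-made arc list and the file it should produce. This is a sketch rather than tested output: the variable names sender and receiver and the file name tiny are made up for illustration, and the .net contents shown are traced by hand from the code in this post.

```stata
* three nodes, two directed ties (made-up data)
clear
input str1 sender str1 receiver strength
"a" "b" 2
"b" "c" 1
end
stata2pajek sender receiver, tiestrength(strength) filename(tiny)
```

Reading the code, tiny.net should contain the vertex list sorted by label and numbered from 1, followed by the ties recoded to those numbers:

```
*Vertices 3
1 "a"
2 "b"
3 "c"
*Arcs
1 2 2
2 3 1
```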

*1.2 GHR Sept 16, 2009
capture program drop stata2pajek
program define stata2pajek
	version 10
	set more off
	syntax varlist(min=2 max=2) [ , tiestrength(string asis) filename(string asis) EDGEs]

	tempfile dataset
	quietly save `dataset'

	gettoken ego alter : varlist

	*Stata ignores a literal asterisk in "file write", so pass it through a local
	local starchar="*"
	if "`filename'"=="" {
		local pajeknetfile "mypajekfile"
	}
	else {
		local pajeknetfile "`filename'"
	}
	capture file close pajeknetfile
	file open pajeknetfile using `pajeknetfile'.net, write text replace
	*build the vertex list: stack ego and alter in one column, keep the unique labels, and number them
	use `dataset', clear
	drop `ego'
	ren `alter' `ego'
	append using `dataset'
	keep `ego'
	contract `ego'
	ren `ego' verticelabel
	drop _freq
	sort verticelabel
	gen number=[_n]
	order number verticelabel
	tempfile verticelabels
	quietly save `verticelabels', replace
	local nvertices=[_N]
	file write pajeknetfile "`starchar'Vertices `nvertices'" _n
	forvalues x=1/`nvertices' {
		local c2=verticelabel in `x'
		file write pajeknetfile `"`x' "`c2'""' _n
	}
	*recode ego and alter from labels to vertex numbers by merging against the vertex list
	use `dataset', clear
	ren `ego' verticelabel
	sort verticelabel
	quietly merge verticelabel using `verticelabels'
	quietly keep if _merge==3
	drop _merge verticelabel
	ren number `ego'
	ren `alter' verticelabel
	sort verticelabel
	quietly merge verticelabel using `verticelabels'
	quietly keep if _merge==3
	drop _merge verticelabel
	ren number `alter'
	order `ego' `alter' `tiestrength'
	keep  `ego' `alter' `tiestrength'
	local narcs=[_N]
	if "`edges'"=="edges" {
		file write pajeknetfile `"`starchar'Edges"' _n
	}
	else {
		file write pajeknetfile `"`starchar'Arcs"' _n
	}
	*write the ties one line at a time, with tie strength as a third column if specified
	sort `ego' `alter'
	if "`tiestrength'"~="" {
		forvalues x=1/`narcs' {
			local c1=`ego' in `x'
			local c2=`alter' in `x'
			local c3=`tiestrength' in `x'
			file write pajeknetfile "`c1' `c2' `c3'" _n
		}
	}
	else {
		forvalues x=1/`narcs' {
			local c1=`ego' in `x'
			local c2=`alter' in `x'
			file write pajeknetfile "`c1' `c2'" _n
		}
	}
	file close pajeknetfile
	*ensure that it's windows (CRLF) text format
	if "$S_OS"~="Windows" {
		filefilter `pajeknetfile'.net tmp, from(\M) to(\W) replace
		shell mv tmp `pajeknetfile'.net
		filefilter `pajeknetfile'.net tmp, from(\U) to(\W) replace
		shell mv tmp `pajeknetfile'.net
	}
	use `dataset', clear
	disp "Your output is saved as"
	disp "`c(pwd)'`c(dirsep)'`pajeknetfile'.net"
end

I also wrote a help file:

{smcl}
{* 16sep2009}{...}
{hline}
help for {hi:stata2pajek}
{hline}

{title:Export data to Pajek .net format} 

{p 8 17 2}
{cmd:stata2pajek} {it:ego alter}[, {cmdab:edges tiestrength() filename()}]

{title:Description} 

{p 4 4 2}
{cmd:stata2pajek} exports data to the ".net" format read by Pajek, Network
Workbench, and many other social network analysis packages.

{title:Remarks} 

{p 4 4 2}
The program assumes that you already have used Stata to create an edge list
or arc list. You specify which (string or numeric) variable identifies ego and
which alter. Specifying a tie strength variable is optional. {cmd:stata2pajek}
converts this to .net format, which is a Windows-formatted text file beginning
with a list of vertices (aka nodes) and their labels, followed by a list of ties
(arcs or edges).

{p 4 4 2}
As an alternative to this program, you may wish to use {cmd:outsheet} then
process the saved output with the Windows program txt2pajek.

{p 4 4 2}
Note that {cmd:stata2pajek} treats the Stata versions of your id variables as
labels even if they are numeric. Thus if you have a node called #15 in Stata
it will not necessarily also be called #15 in Pajek. Please see the vertices
section of the .net file (which is human-readable text) to see the
correspondence.

{title:Options} 

{p 4 8 2}
{cmd:edges} specifies that ties should be treated as edges (symmetrical ties).
The default is to treat ties as arcs (directed ties).

{p 4 8 2}
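{cmd:filename()} allows you to name the output file. By default it is named
mypajekfile.net.

{p 4 8 2}
{cmd:tiestrength()} names an optional variable holding tie strength, which is
written as a third column after ego and alter.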

{title:Examples}

{p 4 8 2}{cmd:. *create a random network with 10 nodes and export it as pajek}{p_end}
{p 4 8 2}{cmd:. clear}{p_end}
{p 4 8 2}{cmd:. set obs 200}{p_end}
{p 4 8 2}{cmd:. gen i=int(uniform()*10)+1}{p_end}
{p 4 8 2}{cmd:. gen j=int(uniform()*10)+1}{p_end}
{p 4 8 2}{cmd:. contract i j, freq(strength)}{p_end}
{p 4 8 2}{cmd:. drop if i==j}{p_end}
{p 4 8 2}{cmd:. sort i j}{p_end}
{p 4 8 2}{cmd:. stata2pajek i j, tiestrength(strength) filename(samplerandomnetwork)}{p_end}

{title:Author}

{p 4 4 2}Gabriel Rossman, UCLA{break}
rossman@soc.ucla.edu

{title:Also see}

{p 4 13 2}On-line:
help for {help outsheet}

A few notes:

  1. My code assumes you want arcs but it would be pretty easy to modify so it gives edges, or even lets you mix and match. [update — edges are now an option]
  2. Escaped quotes choke the syntax highlighting of TextMate and TextWrangler. I wouldn’t be surprised if UltraEdit and/or TextPad had similar problems. Smultron, however, handles them fine, so I plan to use it in the future for any file that uses escaped quotes (though generally I like TextMate for the code folding).
  3. Stata ignores a literal “*” in the “file” command. My workaround is to define a local containing the asterisk, then use that local.
  4. I can’t figure out how to dump the entire dataset into “file.” My extremely ugly workaround is to write it to file one line at a time using locals to ferry the values between the dataset in memory and the text file on disk. This works OK for me but might be slow for large datasets.
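
On note 4, one possible way around the line-by-line loop is to let “outsheet” write the tie list in one go and then glue it onto the header from the shell. This is an untested sketch, not what stata2pajek does: it assumes a Unix-like shell for “cat”, that ego, alter, and strength already hold vertex numbers and tie strength, and that header.txt (a made-up name) already contains the vertices section and the arcs line.

```stata
* sketch only: header.txt, ties.txt, and the variable names are hypothetical
* write the ties as space-delimited text with no header row and no quotes
outsheet ego alter strength using ties.txt, delimiter(" ") nonames noquote replace
* concatenate header and ties into the .net file (Unix-like systems only)
shell cat header.txt ties.txt > mypajekfile.net
```

The result would still need the CRLF conversion that the program does with “filefilter”.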

Now I just need to figure out how to invoke Pajek from the command line and I’ll be able to completely script everything rather than having do-file lines that say things like “Note, after this runs remember to play around with the Pajek GUI for ten minutes.” I know this is technically possible because many Pajek commands are scriptable in Windows, but I’m using a Mac and would have to figure out how to pipe it from Stata to the shell to Crossover/Wine to Pajek, which I’ve never done. Another alternative is to use Network Workbench.

Also see this post on getting Pajek output back into Stata.


9 Comments

  • 1. T  |  September 9, 2009 at 3:33 pm

    Hey,

    I am not sure but I think lines 65 and 66 of stata2pajek.ado should read:
    local c1=`ego' in `x'
    local c2=`alter' in `x'
    My program chokes on this line. Got the program from SSC, maybe you have already corrected this.

    Best,

    T

    • 2. gabrielrossman  |  September 9, 2009 at 4:08 pm

      sorry about that, i’ll post an update within a few days

  • 3. Pajek_labelvector.pl « Code and Culture  |  September 29, 2009 at 6:02 am

    […] this perl script and stata2pajek.ado it should be fairly easy to integrate network data into […]

  • 4. Stata2Pajek w vertice colors « Code and Culture  |  December 10, 2009 at 7:29 pm

    […] Stata program stata2pajek let’s you export an edge list or arc list from Stata to Pajek. However I recently wanted to […]

  • […] Pajek offer great tools for network visualization,  but if you are working with Stata, having to export your data to Pajek for every simple picture is a […]

  • 6. Rense Corten , Archive » pajek2stata  |  May 2, 2010 at 4:26 pm

    […] case you’d like to export data from Stata to Pajek, stata2pajek (written by Gabriel Rossman) can do the job. At the moment, the two programs do not really smoothly […]

  • 7. Social network packages poll « Code and Culture  |  May 6, 2010 at 4:59 am

    […] early efforts see Rense Corten’s netplot program and .net import filter, as well as my own .net export filter). So this necessitates leaving the warm cocoon of Stata and learning something else and a […]

  • 8. mark bruner  |  August 19, 2010 at 4:53 pm

    Does anyone know of a help/inquiry email for Pajek.

    I am attempting to create a main path of citations for a topic and have been receiving an odd error message when trying to play a macro?

    The error message reads as follows “Unknown commang: E3 Layersnx3” “Abnormal termination of running macro file!”

    Thank you.
    Sincerely,
    Mark

