Pajek_labelvector.pl
September 29, 2009 at 5:25 am GR 2 comments
| Gabriel |
A few months ago I wrote some notes on using a text editor to get output out of Pajek or Network Workbench and into a rows-and-columns dataset. Now that I’ve learned Perl from the course notes my UC Davis colleagues posted, I wrote up a Perl script that automates this and creates a tab-delimited ASCII file (or files, if you give it multiple .vec files).
I’d like to put the code directly in the post but when I try, WordPress drops some of the characters (e.g., backslash-zero-one-five renders as just “15”) so I put the properly-formatted script here. [Update: the new “sourcecode” tag properly escapes all this stuff, so I’ve updated the post to include the script at the bottom. The external link still works but is now unnecessary.]
It takes the labels from a “.net” data file and merges them (by sort order) onto a “.vec” output file, which lets you merge the result back onto your main (non-network) dataset. Read my older post for an explanation of why this is necessary. Note that if the sort order is different for the .vec and .net files, the labels will end up attached to the wrong values, so be sure to spot check the values. The syntax is simply:
perl pajek_labelvector.pl myfile.net netmetric_1.vec netmetric_k.vec
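To make the merge concrete, here is a minimal sketch of the same idea on toy data held in memory. The vertex names and metric values are made up for illustration, and the file layout shown (a `*Vertices` header, then numbered quoted labels, then an arcs section) is an assumption about what a typical small Pajek file looks like, not output from a real network.

```perl
#!/usr/bin/perl
# Toy illustration of merging labels onto a vector by sort order.
# All data below are hypothetical; nothing is read from disk.
use strict;
use warnings;

# Assumed contents of a small .net file: vertex list, then arcs
my @net_lines = (
    '*Vertices 3',
    '1 "alice"',
    '2 "bob"',
    '3 "carol"',
    '*Arcs',
    '1 2 1',
    '2 3 1',
);

# Assumed contents of a matching .vec file: one value per vertex
my @vec_lines = (
    '*Vertices 3',
    '0.50',
    '1.00',
    '0.50',
);

# Keep only the quoted label from each vertex line
my @labels;
foreach my $line (@net_lines) {
    push @labels, $1 if $line =~ /^[0-9]+ "(.*)"/;
}

# Keep every .vec line that is not a "*" header
my @values = grep { $_ !~ /^\*/ } @vec_lines;

die "length mismatch\n" unless @labels == @values;

# Pair label i with value i; only valid if both lists share a sort order
my @merged = map { "$labels[$_]\t$values[$_]" } 0 .. $#labels;
print "$_\n" for @merged;
```

Run as is, this prints three tab-delimited rows (alice, bob, carol, each followed by its value). Nothing in the pairing step looks at the vertex numbers, which is why a changed sort order would silently put labels on the wrong rows.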
Between this Perl script and stata2pajek.ado it should be fairly easy to integrate network data into Stata.
#!/usr/bin/perl
# pajek_labelvector.pl
# Gabriel Rossman, UCLA, 2009-09-22
# this file extracts the vertex labels from a .net file and merges them (by sort order) with one or more .vec files
# take filenames as arguments
# file 1 is .net, files 2-k are .vec
# writes out foo.vec.txt for each foo.vec as tab delimited text
# note, this is dependent on an unchanged sort order
use strict;
use warnings;
die "usage: pajek_labelvector.pl ... \n" unless @ARGV > 1;

my $netfile = shift (@ARGV);
my @labels=();
#read the vertex labels from .net file
open(NETIN, "<$netfile") or die "error opening $netfile for reading";
while (<NETIN>) {
    if ($_ =~ m/"/) { #only use the vertex label lines, which include quote chars
        if ($_ =~ /^[0-9]+ "(.*)"/) { #search for quoted text; skip the line if it does not match
            push @labels, $1; #return match, push to array
        }
    }
}
close NETIN; #done reading netfile

foreach my $vecfile (@ARGV) {
    open(VECIN, "<$vecfile") or die "error reading $vecfile";
    open(VECOUT, ">$vecfile.txt") or die "error creating $vecfile.txt";
    my @vec=();
    while (<VECIN>) {
        $_ =~ s/\015?\012//; #manual chomp to allow windows or unix text
        if ($_ !~ m/^\*/) { #skip header lines, which start with an asterisk
            push @vec, $_;
        }
    }
    close VECIN;
    my $veclength = @vec - 1; #index of last element
    my $lablength = @labels - 1;
    die "error, $vecfile is different length than $netfile" unless $veclength==$lablength;
    for my $i (0..$veclength) {
        print VECOUT "$labels[$i]\t$vec[$i]\n";
    }
    close VECOUT;
    @vec=();
}

print "WARNING: this script assumes that the .vec and .net have the same sort order\nplease spot check the values to avoid error\n";
Entry filed under: Uncategorized. Tags: cleaning, networks, perl.