Posts Tagged shell
Stata shell “command not found” errors
| Gabriel |
I like to use the shell command to pipe commands from Stata to the OS and/or other programs. For instance, graphexportpdf pipes to the Ghostscript command ps2pdf. Unfortunately, I often get error messages like this:
/bin/bash: ps2pdf: command not found
Sometimes just restarting Stata works, but I’ve found that the only 100% reliable way to get shell to work properly is to execute the script in the Stata console instead of Stata.app. You can do this from the Terminal as:
exec /Applications/Stata/StataMP.app/Contents/MacOS/stata-mp foo.do
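A “command not found” message means bash could not find ps2pdf in any directory listed in PATH. The post doesn't establish why that happens under Stata.app, so what follows is only a diagnostic sketch: it asks the shell where it finds a command and prints PATH when it can't. The function name check_cmd is made up for this illustration.

```shell
# check_cmd is a hypothetical helper, not part of Stata or Ghostscript.
# It reports where the shell finds a command, or prints PATH if it cannot.
check_cmd() {
  if cmd_path=$(command -v "$1"); then
    echo "$1 found at $cmd_path"
  else
    echo "$1 not found; PATH is $PATH" >&2
    return 1
  fi
}

# Example: look for the Ghostscript converter mentioned above.
check_cmd ps2pdf || echo "ps2pdf is not on this shell's PATH"
```

If the Terminal finds the command but a script does not, writing out the full path to the binary in the script sidesteps the PATH lookup entirely.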
3 comments December 14, 2009
Get the path
| Gabriel |
When you’re scripting (whether in Stata or anything else) you need to tell the script where to look for things by giving it a directory path. As previously mentioned, I think it’s a good idea to treat the path as what Stata calls a “macro” and most other languages call a “variable.” That way you can define the path at the beginning of the script and if you later decide to change the target path you can change the one macro/variable rather than combing through the script looking for each instance.
Of course, this assumes that you know what the path is, which can be hard to remember if it’s a long path. There are a few ways to get it.
From within Stata, the c-class value `c(pwd)' holds the current path, and this info is also displayed in the interactive mode interface (in the toolbar on a Mac, at the bottom of the main window on Windows).
TextWrangler has a “copy path” feature in “get info”.
From the Mac Terminal you can get the path in the clipboard with pwd | pbcopy
In Snow Leopard, you can also do it as a Finder service. Follow these instructions, except substitute this shell script:
sed -e 's/:/\//g' -e 's/\ /%20/g' -e 's,[^/]*$,,' | pbcopy
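To see what the three sed expressions do, here is a small sketch that runs them on a made-up input and prints the result instead of sending it to pbcopy. The function name dir_of and the sample path are assumptions for illustration, and a plain space is used in place of the escaped one.

```shell
# dir_of is a hypothetical name; it applies the same three sed steps as above
# but prints the result instead of piping it to pbcopy.
dir_of() {
  # 1. turn colons into slashes
  # 2. encode spaces as %20
  # 3. drop everything after the last slash, leaving the enclosing directory
  sed -e 's/:/\//g' -e 's/ /%20/g' -e 's,[^/]*$,,'
}

# Sample input is made up for illustration.
printf '%s\n' 'Users:gabriel:my docs:foo.do' | dir_of
# prints Users/gabriel/my%20docs/
```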
5 comments December 7, 2009
Perl text library
| Gabriel |
I found this very useful library of Perl scripts for text cleaning. You can use them even if you can’t code Perl yourself. For instance, to transpose a dataset, just download the “transpose.pl” script to your ~/scripts directory and enter the shell command:
perl ~/scripts/transpose.pl row_col.txt > col_row.txt
The transpose script is particularly useful to me as I’ve never gotten Excel’s transpose function to work and for some bizarre reason Stata’s “xpose” command only works with numeric variables. You can even use these scripts directly from a do-file like so:
tempfile foo1
tempfile foo2
outsheet using `foo1'.txt
shell perl ~/scripts/transpose.pl `foo1'.txt > `foo2'.txt
insheet using `foo2'.txt, clear
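If you'd rather not download anything, the same idea can be sketched with awk. This is not the transpose.pl script from the library, just a rough stand-in that assumes a tab-delimited, rectangular file small enough to hold in memory. The function name transpose is made up.

```shell
# transpose is a hypothetical helper, not the linked Perl script.
# It assumes tab-delimited input with the same number of fields on each row.
transpose() {
  awk -F'\t' '
    {
      # store every cell, indexed by (row, column)
      for (i = 1; i <= NF; i++) cell[NR, i] = $i
      if (NF > ncol) ncol = NF
    }
    END {
      # emit one output line per input column
      for (i = 1; i <= ncol; i++) {
        line = cell[1, i]
        for (j = 2; j <= NR; j++) line = line "\t" cell[j, i]
        print line
      }
    }' "$@"
}

# Usage, mirroring the command above:
# transpose row_col.txt > col_row.txt
```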
1 comment November 30, 2009
Copy mac files when booting from dvd
| Gabriel |
One of the frustrating things about the Mac is that there’s no such thing as a live cd (and live cds for Windows and Linux can’t read HFS disks). Of course you can boot from the installer dvd, but it doesn’t have the Finder. If you have problems booting from your internal disk and you don’t have a reasonably current backup this can induce alternating waves of panic and despair. (I’m speaking from experience. I’ve screwed up my partition table by playing with gparted. Actually, I’ve done this twice — as a dog returns to his vomit so a fool returns to his folly).
However, you can still copy files because the installer dvd does have the Terminal, and the Terminal can invoke the command “cp”. Here’s how to do it.
- Put the dvd in and restart, tapping option so it lets you choose the dvd.
- Choose a language, then instead of installing the OS, go to the Utilities menu and choose Terminal.
- Plug in a USB drive and type “ls /Volumes”. Figure out which one is your USB drive and which one is your internal drive, and write the names down. If it doesn’t recognize the USB drive you’ll need to mount it.
- Use “cd” to navigate to your internal disk and find your most important files, which are probably in “/Volumes/Macintosh HD/Users/yournamehere/Documents”
- Use the “cp source target” command to copy files from the internal disk to the USB disk. To copy a directory use the -R option. For example to copy the directory “bookmanuscript” you’d use something like
cp -R '/Volumes/Macintosh HD/Users/yournamehere/Documents/bookmanuscript' /Volumes/USBdisk
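When you're copying in a panic it's reassuring to check the result. Here is a minimal sketch that copies a directory and then compares the copy against the original. It assumes diff is available in that Terminal, which the steps above don't confirm, and the function name copy_dir is made up.

```shell
# copy_dir is a hypothetical helper: copy a directory into an existing
# target directory, then compare the copy with the original.
copy_dir() {
  src=$1
  dest=$2
  # quote both paths so names with spaces (like "Macintosh HD") survive
  cp -R "$src" "$dest" || return 1
  # diff -r exits non-zero if any file differs or is missing
  diff -r "$src" "$dest/$(basename "$src")" && echo "copy verified"
}

# Example, using the placeholder names from the steps above:
# copy_dir '/Volumes/Macintosh HD/Users/yournamehere/Documents/bookmanuscript' /Volumes/USBdisk
```

Leave the trailing slash off the source directory, since some versions of cp treat a trailing slash as "copy the contents" rather than "copy the directory".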
Add comment October 27, 2009
Time Machine and rsync
| Gabriel |
I think Time Machine is one of the best features of Leopard / Snow Leopard, but I still have a few issues with it.
First, I’m really not interested in having a Spotlight index of my Time Machine drive, so I go to System Preferences / Spotlight / Privacy and add my Time Machine volume to the “do not index” list. This isn’t so much a privacy issue as a performance issue since the Spotlight indexer (“mdworker”) is a real hog so why have it index stuff you don’t plan to search?
Second, Time Machine doesn’t work well with more than one backup volume, especially if you want to update one of the backups infrequently or back up different directories to each drive. In my case I have a large drive that I keep at work and a small backup drive that I keep at home in case my office burns down and destroys both my Mac and the big backup drive. To use Time Machine for both disks, I would not only need to “select disk” but also “exclude items” because the disk I keep at home isn’t big enough to hold everything. Furthermore, if I skip a few weeks of backing up to the home disk, Time Machine refuses to do an incremental backup.
My solution to this is to use Time Machine for the main backup drive and rsync for the second one. Every day I use Time Machine with my big backup drive at the office. Once a week or so at home I take my redundant backup drive (“seagate”) out of the drawer, plug it in, and run this shell script.
#!/bin/bash
#backup_seagate.sh
rsync -aE --delete ~/Documents/ /Volumes/seagate/rossman/Documents
rsync -aE --delete ~/Library/ /Volumes/seagate/rossman/Library
rsync -aE --delete ~/scripts/ /Volumes/seagate/rossman/scripts
rsync -aE --delete ~/Pictures/ /Volumes/seagate/rossman/Pictures
rsync -aE --delete ~/Music/ /Volumes/seagate/rossman/Music
rsync -aE --delete ~/Applications/ /Volumes/seagate/rossman/Applications
Note that the version of rsync that ships with OS 10.5 or 10.6 is pretty old. If you install the current version, it will handle the resource fork more efficiently. There are instructions here but for my purposes it’s not worth the hassle.
[Update1: USB flash drives work well as your off-site backup because they are easier to transport than hard drives, being smaller and lacking moving parts. However you'll need to use Disk Utility to change the file system from FAT to HFS+].
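[Update2: Be careful with rsync as the syntax is important. It needs to be "command options source target". If you reverse source and target you're pretty much screwed, since with --delete the stale backup overwrites your working files and anything missing from the backup gets deleted].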
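Since the argument order matters so much, one option is to wrap the rsync calls in a function that fixes the order in one place and refuses to run if the drive isn't plugged in. This is a sketch rather than the script I actually use. The function name backup_to and the RSYNC override are made up, and setting RSYNC=echo just prints the arguments so you can preview them.

```shell
# backup_to is a hypothetical wrapper around the rsync calls above.
# RSYNC can be overridden (e.g. RSYNC=echo) to print the arguments
# instead of copying anything.
backup_to() {
  dest=$1
  shift
  if [ ! -d "$dest" ]; then
    echo "destination $dest not found; nothing copied" >&2
    return 1
  fi
  for dir in "$@"; do
    # source first (with trailing slash), target second
    "${RSYNC:-rsync}" -aE --delete "$HOME/$dir/" "$dest/$dir" || return 1
  done
}

# Preview the arguments without copying anything:
# RSYNC=echo backup_to /Volumes/seagate/rossman Documents Library scripts
```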
Add comment October 23, 2009
Pajek_labelvector.pl
| Gabriel |
A few months ago I wrote some notes on using a text editor to get output out of Pajek or Network Workbench and into a rows and columns dataset. Now that I’ve learned Perl from the course notes my UC Davis colleagues posted, I wrote up a Perl script that will automate this and create a tab-delimited ASCII file (or files if you give it multiple .vec files).
I’d like to put the code directly in the post but when I try, WordPress drops some of the characters (e.g., backslash-zero-one-five renders as just “15”) so I put the properly-formatted script here. [Update: the new "sourcecode" tag properly escapes all this stuff so I've updated the post to include the script at the bottom of the post. The external link still works but is now unnecessary.]
It takes the labels from a “.net” data file and merges them (by sort order) onto a “.vec” output file, which lets you merge it back onto your main (non-network) dataset. Read my older post for an explanation of why this is necessary. Note that if the sort order is different for the .vec and .net files it will get screwy so be sure to spot check the values. The syntax is simply:
perl pajek_labelvector.pl myfile.net netmetric_1.vec netmetric_k.vec
Between this perl script and stata2pajek.ado it should be fairly easy to integrate network data into Stata.
#!/usr/bin/perl
# pajek_labelvector.pl
# Gabriel Rossman, UCLA, 2009-09-22
# this file extracts the vertex labels from a .net file and merges them (by sort order) with one or more .vec files
# take filenames as arguments
# file 1 is .net, files 2-k are .vec
# writes out foo.txt as tab delimited text
# note, this is dependent on an unchanged sort order
use strict; use warnings;
die "usage: pajek_labelvector.pl ... \n" unless @ARGV > 1;
my $netfile = shift (@ARGV);
my @labels=();
#read the vertex labels from .net file
open(NETIN, "<$netfile") or die "error opening $netfile for reading";
while (<NETIN>) {
if ($_ =~ m/"/) { #only use the vertex label lines, which include quote chars
$_ =~ /^[0-9]+ "(.*)"/; #search for quoted text
push @labels, $1; #return match, push to array
}
}
close NETIN;
#read each .vec file
foreach my $vecfile (@ARGV) {
open(VECIN, "<$vecfile") or die "error reading $vecfile";
open(VECOUT, ">$vecfile.txt") or die "error creating $vecfile.txt";
my @vec=();
while (<VECIN>) {
$_ =~ s/\015?\012//; #manual chomp to allow windows or unix text
if ($_ !~ m/^\*/) {
push @vec, $_;
}
}
close VECIN;
my $veclength = @vec - 1;
my $lablength = @labels -1;
die "error, $vecfile is different length than $netfile" unless $veclength==$lablength;
for my $i (0..$veclength) {
print VECOUT "$labels[$i]\t$vec[$i]\n";
}
close VECOUT;
@vec=();
}
print "WARNING: this script assumes that the .vec and .net have the same sort order\nplease spot check the values to avoid error\n";
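For comparison, the same merge-by-sort-order logic can be roughed out with standard Unix tools. This is a sketch, not a replacement for the Perl script. It handles one .vec file at a time, prints to standard output rather than writing foo.txt, and tolerates leading spaces before the vertex number, which the Perl regex does not. The function name label_vector is made up.

```shell
# label_vector is a hypothetical shell stand-in for the Perl script above.
# Usage: label_vector myfile.net netmetric_1.vec > netmetric_1.vec.txt
label_vector() {
  net=$1
  vec=$2
  labels=$(mktemp)
  values=$(mktemp)
  # keep the quoted label from lines like: 1 "label"  (tr strips Windows CRs)
  tr -d '\r' < "$net" | sed -n 's/^ *[0-9][0-9]* *"\(.*\)".*$/\1/p' > "$labels"
  # drop header lines that start with an asterisk
  tr -d '\r' < "$vec" | grep -v '^\*' > "$values"
  n_lab=$(($(wc -l < "$labels")))
  n_val=$(($(wc -l < "$values")))
  if [ "$n_lab" -ne "$n_val" ]; then
    echo "error, $vec is different length than $net" >&2
    rm -f "$labels" "$values"
    return 1
  fi
  # join line by line with a tab, relying on an unchanged sort order
  paste "$labels" "$values"
  rm -f "$labels" "$values"
}
```

The same warning applies here: spot check the values, since nothing verifies that the two files are sorted the same way.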
Add comment September 29, 2009
Stata console mode
| Gabriel |
I just realized that Stata SE/MP includes the console version of Stata.
Since the Stata GUI adds only about 15 or 16 megs of RAM and a comparably light load to the CPU, console mode doesn’t really improve performance that much for most things, but I still thought it was pretty cool in a dorky ASCII art kind of way. The only place where I notice a substantial performance jump is with one do-file that generates hundreds of graphs (and saves them to disk). Not only is console mode much faster there, but it’s less distracting as graphs aren’t constantly popping up.
To invoke console mode on a mac, go to the terminal and write:
/Applications/Stata/StataMP.app/Contents/MacOS/stata-mp
To get the GUI you’d do the same thing but change the last bit to “stataMP” (note the case-sensitivity). Of course both versions can take a do-file as an argument and you can add the path as an alias to ~/.bashrc like this:
echo "alias stataconsole='exec /Applications/Stata/StataMP.app/Contents/MacOS/stata-mp'" >> ~/.bashrc
echo "alias statagui='exec /Applications/Stata/StataMP.app/Contents/MacOS/stataMP'" >> ~/.bashrc
You could similarly change text editor push scripts to use console, but I think it’s a good idea to use the GUI while you’re still debugging because it’s easier to spot error messages (the GUI has syntax highlighting) and experiment with alternate usage (the GUI menus can be useful for learning syntax).
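One caveat with the echo approach is that running it twice appends the alias twice. A small sketch of a guard against that follows. The helper name add_line_once is made up, and it simply skips the append when the exact line is already in the file.

```shell
# add_line_once is a hypothetical helper: append a line to a file only if
# that exact line is not already there.
add_line_once() {
  line=$1
  file=$2
  touch "$file"
  # -x: match whole lines, -F: treat the pattern as a fixed string
  grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example with the console alias from above:
# add_line_once "alias stataconsole='exec /Applications/Stata/StataMP.app/Contents/MacOS/stata-mp'" ~/.bashrc
```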
1 comment September 24, 2009
Shell vs “Shell”
| Gabriel |
Two thoughts on Stata’s “shell” command (which lets Stata access the OS command line).
First, I just discovered the “ashell” command (“ssc install ashell”), which pipes the shell’s “standard out” to Stata return macros. For short output this can be a lot more convenient than what I had been doing, which was to pipe stdout to a file, then use “file” or “insheet” to get that into Stata. For instance, my “do it to everything in a directory” script is a lot simpler if I rewrite it to use “ashell” instead of “shell”.
ashell ls *.dta
* note that ashell tokenizes stdout so to use it as one string you need to reconstruct it
forvalues stringno=1/`r(no)' {
local stdout "`stdout' `r(o`stringno')'"
}
*because i used "ls", stdout is now a list of files, suitable for looping
*as an example, i'll load each file and export it to excel 2007 format
foreach file in `stdout' {
use `file', clear
xmlsave `file'.xlsx, doctype(excel) replace
}
The second thing is a minor frustration I’ve had with the Stata “shell” command. Unlike a true shell in the terminal, it has no concept of a “session” but treats each shell command as coming ex nihilo. A related problem is it doesn’t read any of your preference files (e.g., ~/.bashrc). Since shell preference files are just shell scripts read automatically at the beginning of a session, the latter is a logical corollary of the first. Ignoring the preference files is arguably a feature, not a bug, as it forces you to write do-files that will travel between machines (at least if both are Windows or both are POSIX).
Anyway, here’s a simple example of what I mean. Compare running this (working) bash script in a terminal session:
alias helloworld="echo 'hello world'"
helloworld
with the (failing) equivalent through Stata’s shell command:
shell alias helloworld="echo 'hello world'"
shell helloworld
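You can reproduce the same no-session behavior without Stata. In this sketch each sh -c call is a separate process, which is an analogy for how Stata's shell calls behave rather than a description of Stata's internals. A shell variable stands in for the alias, since non-interactive bash does not expand aliases by default, and the function names are made up.

```shell
# Two separate sh -c calls share no state: the second one starts from scratch.
separate_calls() {
  sh -c 'greeting="hello world"'
  sh -c 'echo "second call sees: [$greeting]"'
}

# Putting both commands in one call keeps them in the same session.
same_call() {
  sh -c 'greeting="hello world"; echo "same call sees: [$greeting]"'
}

unset greeting   # make sure nothing leaks in from the current environment
separate_calls   # prints: second call sees: []
same_call        # prints: same call sees: [hello world]
```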
Anyway, I think the best work-around is to use multiple script files that either invoke each other as a daisy chain or are all invoked by a master shell script. So, say you needed Stata to do some stuff, then process the data with some Unix tools, then do some more stuff with Stata, then typeset the output as PDF. One way to do this would be to have a shell script that says to use the first do file, then the perl script, then the second do-file, then a latex command. Alternately you could make the last line of the first do-file a “shell” command to invoke the perl script, the last line of the perl script a “system” or “exec” command to invoke the second do-file, and the last line of the second do-file is a “shell” command to invoke ghostscript or lyx.
Also note that if you’re doing the master shell script approach you can do some interesting stuff with “make” so as to ensure that the dependencies in a complicated workflow execute in the right order. See here and here for more info.
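The master shell script approach can be sketched as a function that runs each step in order and stops at the first failure, so a later step never runs on stale output. The function name run_steps and the file names in the usage comment are hypothetical.

```shell
# run_steps is a hypothetical helper: run each argument as a command,
# in order, and stop as soon as one of them fails.
run_steps() {
  for step in "$@"; do
    echo "running: $step"
    sh -c "$step" || { echo "failed: $step" >&2; return 1; }
  done
}

# Example workflow (file names are made up):
# run_steps \
#   "/Applications/Stata/StataMP.app/Contents/MacOS/stata-mp first.do" \
#   "perl clean.pl" \
#   "/Applications/Stata/StataMP.app/Contents/MacOS/stata-mp second.do" \
#   "pdflatex report.tex"
```

Unlike make, this reruns every step every time, so it only covers the "right order" half of the problem.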
Finally, if you just want to read a long path that you usually “alias”, the simplest thing is to just copy the full path from ~/.bashrc. You can do this directly from Stata by typing “view ~/.bashrc”.
3 comments September 23, 2009
Texcount.pl
| Gabriel |
Somebody recently asked me for a projected word count of my manuscript (which is in Lyx) and to answer this question I found the amazingly useful script texcount.pl. If you just run “wc” (or the equivalent in a text editor) on a tex or lyx file you count all the plain text and the markup code. Not only does this script screen out the meta-text, but it can give you detailed breakdowns of words, figures, and captions — all broken out by section.
I like to keep scripts in “~/scripts/” so to make this script readily accessible from the command-line I entered the command:
echo "alias texcount='perl ~/scripts/TeXcount_2_2/texcount.pl'" >> ~/.bashrc
Now to run the command I just go to the terminal and type
texcount foo.tex
You should really check out the options if you have a long and complex document. My favorite option is “-sub”. This gives a detailed breakdown of word count, figure count, etc, by chapter, section, or whatever.
texcount -sub foo.tex
Remember that if you always use a certain option, you can write it into the alias command.
Lyx has a similar basic command built in (Tools/Statistics), but it doesn’t give as much information and doesn’t break out the data by section. To use texcount with lyx files, you first need to export Lyx to Latex which you can do from the GUI (File/Export/Latex), but if you’re using texcount anyway you should just use the command line.
lyx --export latex foo.lyx
That works for Linux but on a Mac this will work more consistently
exec '/Applications/Lyx.app/Contents/MacOS/lyx' --export latex foo.lyx
That’s a long command, so on my Mac I created an alias as “lyx2tex”
echo "alias lyx2tex='exec /Applications/Lyx.app/Contents/MacOS/lyx --export latex'" >> ~/.bashrc
Note that all this works on POSIX but may require some modification to work with Windows (unless it has CygWin).
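The export and count steps can be chained into one command. This is a rough sketch. The function name lyxcount and the LYX and TEXCOUNT overrides are made up, and it assumes texcount is reachable as a command on your PATH (aliases from ~/.bashrc aren't visible inside scripts). On a Mac you would point LYX at the full path given above.

```shell
# lyxcount is a hypothetical helper: export foo.lyx to foo.tex, then count it.
# LYX and TEXCOUNT default to plain command names and can be overridden.
lyxcount() {
  lyxfile=$1
  # the export writes a .tex file next to the .lyx file
  texfile="${lyxfile%.lyx}.tex"
  "${LYX:-lyx}" --export latex "$lyxfile" || return 1
  "${TEXCOUNT:-texcount}" -sub "$texfile"
}

# Example:
# LYX=/Applications/Lyx.app/Contents/MacOS/lyx lyxcount foo.lyx
```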
Add comment September 21, 2009
Bash/Perl tutorial
| Gabriel |
I’m a big fan of the idea of using Unix tools like Perl to script the cleaning of the massive text-based datasets that social scientists (especially sociologists of culture) often use. Unfortunately there’s something of a learning curve to this so even though I like the idea in principle and increasingly in practice, I still sometimes clean data interactively with TextWrangler and just try to keep good notes.
Fortunately two of my UC system colleagues have posted the course materials for a “Unix and Perl Primer for Biologists.” I’m about halfway through the materials and it’s great, in part because (unlike the llama) they assume no prior familiarity with programming or Unix. Although the examples involve genetics, it’s well-suited for social scientists as, like us, biologists are not computer scientists but are reasonably technically competent and they often deal with large text based data sets. Basically, if you can write a Stata do-file, you should be able to follow their course guide and if you use things like scalars and loops it should be pretty easy.
I highly recommend the course to any social scientist who deals with large dirty datasets, in other words, basically anyone who is a quant but doesn’t just download clean ICPSR or Census data. This is especially relevant for anyone who wants to scrape data off the web, use IMDB, do large-scale content analysis, etc.
Some notes:
- They assume you will a) be running the materials off a stick and b) using Mac OS X. If you’re keeping the material on the hard drive, get used to typing “chmod u+x foo.pl” to make the perl script “foo” executable. (This step is unnecessary for files on a stick because unlike HFS+ or ext3, the FAT filesystem doesn’t do permissions). If you’re using a different version of Unix, most of it should work similarly with only a few minor differences, such as that you’ll want to use Kate instead of Smultron and on a Mac a USB stick is in /Volumes/ whereas in Linux it’s in /media/ and in BSD it’s in /mnt/. If you’re using Windows you’ll either need to a) install CygWin b) install a virtual machine c) run off a live cd or bootable stick or d) dual boot with Wubi.
- If you’re really used to Stata, some of the nomenclature may seem backwards, mostly because Perl doesn’t keep a dataset in memory but processes it on disk, one command at a time. So, in Perl and Bash a “variable” is the equivalent to what Stata calls a (global or local) “macro”. The closest Perl equivalent to what Stata calls a “variable” would be a “field” in a tab-delimited text file.
[Update: Although they suggest Smultron, I find TextMate works even better as it can execute scripts entirely within the editor, so you don't have to constantly cmd-tab to Terminal.app and back.]
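To make the chmod step in the notes above concrete, here is a small sketch that creates a throwaway Perl script and reports whether it is executable before and after “chmod u+x”. The helper name report_exec and the temporary file are made up for the demo.

```shell
# report_exec is a hypothetical helper that says whether a file is executable.
report_exec() {
  if [ -x "$1" ]; then echo "executable"; else echo "not executable"; fi
}

demo_dir=$(mktemp -d)
# a throwaway one-line Perl script
printf '#!/usr/bin/perl\nprint "hello\\n";\n' > "$demo_dir/foo.pl"
chmod u-x "$demo_dir/foo.pl"   # make sure we start without the execute bit
report_exec "$demo_dir/foo.pl" # prints: not executable
chmod u+x "$demo_dir/foo.pl"
report_exec "$demo_dir/foo.pl" # prints: executable
rm -r "$demo_dir"
```

Once the execute bit is set you can run the script as ./foo.pl instead of typing perl foo.pl.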
Add comment September 15, 2009