Posts tagged ‘Stata’

Stata for Mac PDF fixed in Stata 11.2

| Gabriel |

About a year ago, I got frustrated with Stata’s “graph export foo.pdf” command, which at the time gave hideous output. Apparently the problem was that Stata used the same code to write to disk as to write to screen. As a work-around, I wrote graphexportpdf.ado, which is basically a wrapper to pipe Stata-generated eps files through Ghostscript.

I am happy to report that the revision notes for Stata 11.2 include this line, from the section about Stata for Mac:

33.  Graphs exported as PDF files are now exported with increased resolution.

That is to say, they fixed it. I tested it and it creates beautiful output and does so very quickly. Thanks StataCorp!

I highly recommend that Mac users of Stata 11.2 and higher use the native PDF capabilities through the standard “graph export foo.pdf” syntax. Graphexportpdf.ado may still be useful for Mac users of versions 10 and earlier and to Linux users (who don’t have Quartz but usually have Ghostscript as part of a LaTeX distro).

Finally, remember that “graph export foo.pdf” is a Mac-only option, so if you want your code to be portable you should wrap it in a check on the operating system, like this:

if "`c(os)'"=="MacOSX" { 
  graph export mygraph.pdf, replace
} 
else {
  graph export mygraph.eps, replace
}

April 11, 2011 at 4:29 pm 3 comments

Escaped quotes and syntax highlighting

| Gabriel |

Quotes usually delimit where a string with embedded spaces begins and ends, but sometimes you want a quote to be literal, and this requires escaping it. To recycle an example I’ve used before, suppose you wanted to display:

Beavis said "Fire! Fire!"

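To get Stata to display this, you would escape the quotes by wrapping the whole string in compound double quotes, which open with a left single quote plus a double quote and close with a double quote plus a right apostrophe (the same left and right single quotes you use when calling a local), so the command would be: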

disp `"Beavis said "Fire! Fire!""'

This is a trivial example, but a more realistic application is that you might want to put something that involves quotes inside a local, and since the content of the local is itself delimited by quotes you’ll need to escape them.
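A minimal sketch of the local-macro case, reusing the same string. The macro name quip is made up here for illustration and is not from the original post.

```stata
* store a string that itself contains double quotes in a local
* (the outer compound quotes are stripped when the local is defined)
local quip `"Beavis said "Fire! Fire!""'

* compound quotes are needed again when the local is expanded,
* because its contents still include the inner double quotes
disp `"`quip'"'
```

Wrapping the expansion in plain double quotes instead would end the string at the first inner quote, which is the usual reason to reach for the compound form.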

OK, easy enough, but the problem is that most external text editors don’t appreciate this nuance of Stata syntax and end up showing the rest of the document as quoted text, effectively making the syntax highlighting useless. (Stata’s internal editor doesn’t suffer this problem, but I’m in the habit of using TextMate since prior to Stata 11 the editor didn’t have highlighting and it still doesn’t do code-folding.) The solution is to let two syntax-parsing wrongs make a right by putting a lone double quote in a comment, which Stata will ignore but which the text editor will parse as closing the previous hanging quote. It works like this:

disp `"Beavis said "Fire! Fire!""'
* " this line exists only to let the text editor's parser know that everything is back to normal
disp "see, it works. this quoted text should show up as quoted whereas the word 'disp' appears as a command"

February 7, 2011 at 5:05 am 5 comments

Shufflevar update

| Gabriel |

Thanks to Elizabeth Blankenspoor (Michigan) I corrected a bug with the “cluster” option in shufflevar. I’ve submitted the update to SSC but it’s also here:

*1.1 GHR January 24, 2011

*changelog
*1.1 -- fixed bug that let one case per "cluster" be misallocated (thanks to Elizabeth Blankenspoor)

capture program drop shufflevar
program define shufflevar
	version 10
	syntax varlist(min=1) [ , Joint DROPold cluster(varname)]
	tempvar oldsortorder
	gen `oldsortorder'=[_n]
	if "`cluster'"!="" {
		local bystatement "by `cluster': "
	}
	else {
		local bystatement ""
	}
	if "`joint'"=="joint" {
		tempvar newsortorder
		gen `newsortorder'=uniform()
		sort `cluster' `newsortorder'
		foreach var in `varlist' {
			capture drop `var'_shuffled
			quietly {
				`bystatement' gen `var'_shuffled=`var'[_n-1]
				`bystatement' replace `var'_shuffled=`var'[_N] if _n==1
			}
			if "`dropold'"=="dropold" {
				drop `var'
			}
		}
		sort `oldsortorder'
		drop `newsortorder' `oldsortorder'
	}
	else {
		foreach var in `varlist' {
			tempvar newsortorder
			gen `newsortorder'=uniform()
			sort `cluster' `newsortorder'
			capture drop `var'_shuffled
			quietly {
				`bystatement' gen `var'_shuffled=`var'[_n-1]
				`bystatement' replace `var'_shuffled=`var'[_N] if _n==1
			}
			drop `newsortorder'
			if "`dropold'"=="dropold" {
				drop `var'
			}
		}
		sort `oldsortorder'
		drop `oldsortorder'
	}
end
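A minimal usage sketch, assuming shufflevar has been defined as above. The auto dataset that ships with Stata and the choice of variables are only for illustration and are not from the original post.

```stata
* load a small example dataset bundled with Stata
sysuse auto, clear

* shuffle price and mpg together, but only within levels of foreign
shufflevar price mpg, joint cluster(foreign)

* shuffling within clusters only reorders values inside each cluster,
* so the group means of the original and shuffled variables should match
tabstat price price_shuffled mpg mpg_shuffled, by(foreign) statistics(mean)
```

Adding the dropold option would drop price and mpg and leave only the _shuffled copies.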

January 24, 2011 at 7:30 pm 6 comments

Growl in R and Stata

| Gabriel |

Growl is a system notification tool for Mac that lets applications, the system itself, or hardware display brief notifications, usually in the top-right corner. The translucent floating look reminds me of the more recent versions of KDE.

Anyway, one of the things it’s good for is letting programs run in the background and then letting you know when something noteworthy has happened. Of course, a large statistics batch would qualify. I was running a 10-minute job in the R package igraph and got tired of checking to see when it was done, so I found this tip. In a nutshell, it says to download Growl, including the command-line tool GrowlNotify from the “Extras” folder, then create this R function.

growl <- function(m = 'Hello world')
system(paste('growlnotify -a R -m \'',m,'\' -t \'R is calling\'; echo \'\a\' ', sep=''))

Since the R function “system()” is equivalent to the Stata command “shell,” I realized this would work in Stata as well and so I wrote this ado file.* You call it just by typing “growl”. It takes as an (optional) argument whatever you’d like to see displayed in Growl, such as “Done with first analysis” or “All finished” but by default displays “Stata needs attention.” Note that while Stata already bounces in the dock when it completes a script or hits an error, you can also have Growl appear at various points during the run of a script.

I’ve only tested it with StataMP, but I’d appreciate it if people who use Growl and other versions of Stata would post their results in the comments. If it proves robust I’ll submit it to SSC.

Here’s the Stata code. The most important line is #32 and if you were doing it by hand you’d do it as a one-liner but the rest of this stuff allows argument passing and compatibility with the different versions (small, IC, SE, MP) of Stata:

*1.0 GHR Jan 19, 2011
capture program drop growl
program define growl
	version 10
	set more off
	syntax [anything]

	if "`anything'"=="" {
		local message "Stata needs attention"		
	}
	else {
		local message "`anything'"
	}
	
	local appversion "Stata"
	if "`c(flavor)'"=="Small" {
		local appversion "smStata"
	}
	else {
		if `c(SE)'==1 {
			if `c(MP)'==1 {
				local appversion "StataMP"
			}
			else {
				local appversion "StataSE"
			}
		}
		else {
			local appversion "Stata"
		}
	}
	shell growlnotify -a `appversion' -m \ "`message'" \ `appversion' \
end
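A short sketch of how the program might be called from a do-file, assuming it is saved as growl.ado somewhere on the adopath and that growlnotify is installed. The regressions on the bundled auto dataset are placeholders for a long-running job and are not from the original post.

```stata
* placeholder analysis standing in for a long batch job
sysuse auto, clear
regress price mpg weight

* custom message partway through the script
growl Done with first analysis

regress price mpg weight foreign

* no argument, so the default "Stata needs attention" is shown
growl
```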

*Most programming/scripting languages and a few GUI applications have a similar system call and you can likewise get Growl to work with them. For instance, Perl also has a function called system(). In shell scripting of course you can just use the “growlnotify” command directly.

January 20, 2011 at 5:10 am 10 comments

Stata tv

| Gabriel |

Stata Daily links to a bunch of instructional YouTube videos on Stata basics provided by U Minnesota.

Similarly, UCLA ATS has a Stata starter kit, which includes videos.

Stata Tidbit of the Week also has lots of screencasts. His videos tend to be on slightly more advanced issues than the other two sites so people who are already used to Stata should go here but absolute beginners should start with the Minnesota or UCLA material.

December 14, 2010 at 4:37 am

Conditioning on a Collider Between a Dummy and a Continuous Variable

| Gabriel |

In a post last year, I described a logical fallacy of sample truncation that helpful commenters explained to me is known in the literature as conditioning on a collider. As is common, I illustrated the issue with two continuous variables, where censorship is a function of the sum. (Specifically, I used the example of physical attractiveness and acting ability for a latent population of aspiring actresses and an observed population of working actresses to explain the paradox that Megan Fox was considered both “sexiest” and “worst” actress in a reader poll).

In revising my notes for grad stats this year, I generalized the problem to cases where at least one of the variables is categorical. For instance, college admissions is a censorship process (only especially attractive applicants become matriculants) and attractiveness to admissions officers is a function of both categorical (legacy, athlete, artist or musician, underrepresented ethnic group, in-state for public schools or out-of-state for private schools, etc) and continuous distinctions (mostly SAT and grades).

For simplicity, we can restrict the issue just to SAT and legacy. (See various empirical studies and counterfactual extrapolations by Espenshade and his collaborators for how it works with the various other things that determine admissions.) Among college applicant pools, the children of alumni to prestigious schools tend to score about a hundred points higher on the SAT than do other high school students. Thus the applicant pool looks something like this.

However, many prestigious colleges have policies of preferring legacy applicants. In practice this means that the child of an alum can still be admitted with an SAT score about 150 points below non-legacy students. Thus admission is a function of both SAT (a continuous variable) and legacy (a dummy variable). This implies the paradox that the SAT scores of legacies are about half a sigma above average for the applicant pool but about a full sigma below average in the freshman class, as seen in this graph.

Here’s the code.

clear
set obs 1000
gen legacy=0
replace legacy=1 in 1/500
lab def legacy 0 "Non-legacy" 1 "Legacy"
lab val legacy legacy
gen sat=0
replace sat=round(rnormal(1100,250)) if legacy==1
replace sat=round(rnormal(1000,250)) if legacy==0
lab var sat "SAT score"
recode sat -1000/0=0 1600/20000=1600 /*top code and bottom code*/
graph box sat, over(legacy) ylabel(0(200)1600) title(Applicants)
graph export collider_collegeapplicants.png, replace
graph export collider_collegeapplicants.eps, replace
ttest sat, by (legacy)
keep if (sat>1400 & legacy==0) | (sat>1250 & legacy==1)
graph box sat, over(legacy) ylabel(0(200)1600) title(Admits)
graph export collider_collegeadmits.png, replace
graph export collider_collegeadmits.eps, replace
ttest sat, by (legacy)
*have a nice day

November 30, 2010 at 4:40 am 5 comments

Keep the best 5 (updated)

| Gabriel |

Last year I mentioned my policy of assigning about seven quizzes and then keeping the best 5. I then had a real Rube Goldberg-esque workflow that involved piping to Perl. Several people came up with simpler ideas in the comments, but the most “why didn’t I think of that” was definitely John-Paul Ferguson’s suggestion to just use reshape. Now that I’m teaching the class again, I’ve rewritten the script to work on that logic.

Also, I’ve made the script a bit more flexible by letting you specify in the header how many quizzes were offered and how many to keep. To make this work I made a loop that builds a local called sumstring.

[UPDATE 11/29/2010, applied Nick Cox’s suggestions. Old code remains but is commented out]

local numberofquizzes 6
local keepbest 5

*import grades, which look like this
*uid    name    mt  q1  q2  q3
*5001   Joe     40  5   4   6
*4228   Alex    20  6   3   5
insheet using grades.txt, clear
*rescale the quizzes from raw points to proportion 
forvalues qnum=1/`numberofquizzes' {
	quietly sum q`qnum'
	replace q`qnum'=q`qnum'/`r(max)'
}
/*
*build the sumstring local (original code)
local sumstring ""
forvalues i=1/`keepbest' {
	local sumstring "`sumstring' + q`i'"
	disp "`sumstring'"
	local sumstring=subinstr("`sumstring'","+","",1)
	disp "`sumstring'"
}
*/
*reshape long, keep top few quizzes
reshape long q, i( notes uid name mt) j(qnum)
recode q .=0
gsort uid -q
by uid: drop if _n>`keepbest'
by uid: replace qnum=_n
*reshape wide, calc average
reshape wide q, i(notes uid name mt) j(qnum)
*build the sumstring local (w/ Nick Cox's suggestions)
unab sumstring : q* 
disp "`sumstring'"
local sumstring : subinstr local sumstring " " "+", all
disp "`sumstring'"
gen q_avg=(`sumstring')/`keepbest'
sort name
sum q_avg

*have a nice day
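The reshape logic can also be tried on a toy dataset without any input file. This is a sketch under stated assumptions: the two students and their three quiz scores are the illustrative rows from the comment in the script, typed in directly, and keepbest is set to 2 just to keep the example small.

```stata
clear
* two students, three quizzes each (raw points, not rescaled)
input uid q1 q2 q3
5001 5 4 6
4228 6 3 5
end

local keepbest 2

* one row per student-quiz, sorted so the best scores come first
reshape long q, i(uid) j(qnum)
gsort uid -q

* keep only the top scores and renumber them 1, 2, ...
by uid: drop if _n>`keepbest'
by uid: replace qnum=_n

* back to one row per student; q1 is now the best score, q2 the second best
reshape wide q, i(uid) j(qnum)
list
```

In this toy data both students end up with 6 and 5 as their two kept scores, since each one's lowest quiz (4 and 3 respectively) is dropped.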

November 24, 2010 at 4:24 am 4 comments
