Posts tagged ‘Stata’

Adding elements to graphs as a slideshow

| Gabriel |

One of the tricks to a successful presentation is to limit what your audience sees so they don’t get ahead of you and also to preserve a general sense of timing and flow. This helps keep the audience’s attention and also is good for focusing expectations in such a way that the next bit is counter-intuitive and therefore interesting. Nothing is so boring as sitting in a talk and seeing ten bullet points and realizing that the speaker is only on bullet number three.

Similarly if you’re using graphs in a talk (which you should as much as possible since they read better than tables), you may only want to reveal part of a graph as you talk about it, then reveal the next bit when you’re ready. The most obvious way to do this is to just crop the graph or cover it with boxes that match the background or something. Unfortunately that’s ugly and clunky and doesn’t work if the graph elements are tightly commingled. Another way to do it is to generate two graphs, one of which has the elements and the other of which doesn’t. The problem with this is that the graphs don’t match up properly. For instance, if you have a line graph and you keep adding lines to it, the legend will first appear and then grow larger, crowding out the graph itself.

Ideally, what you want is a set of graphs that are completely identical except some elements are missing in one version which are added in the other version. You can then line the graphs up, talk about the first set of elements, and then do a smooth transition to the version with the full set of elements. Here’s an example from the talk I gave yesterday. In order to explain crossover I first show the song’s native formats then dissolve to also show the crossover formats.

Here’s how I did it. The basic trick is that Stata can create transparent graph elements by setting the color to “none”. You draw the exact same graph multiple times and just set colors to “none” wherever you want to conceal elements. That is, the first twoway command below is identical to the second except that its Top 40 and Mainstream AC lines set lcolor(none) instead of visible colors from Stata’s standard s2color scheme.

use final_f, clear
keep if artist=="SARA BAREILLES"
drop if format=="All" | format=="Other"
sum date
local maxdate=`r(max)'
local mindate=`r(min)'
local interval=(`maxdate'-`mindate')/10
local interval=round(`interval',7)

twoway (line Nt_inc_p date if format=="AAA_Rock", lwidth(thick) lcolor(navy)) /*
  */ (line Nt_inc_p date if format=="Hot_AC", lwidth(thick) lcolor(maroon)) /*
  */ (line Nt_inc_p date if format=="Top_40", lwidth(thick) lcolor(none)) /*
  */ (line Nt_inc_p date if format=="Mainstream_AC", lwidth(thick) lcolor(none)) /*
  */ , xtitle("") xmtick(`mindate'(7)`maxdate') xlabel(`mindate'(`interval')`maxdate', labsize(vsmall) angle(forty_five) format(%tdMon_dd,_CCYY)) legend(order (1 "AAA Rock" 2 "Hot AC" 3 "Top 40" 4 "Mainstream AC"))  graphregion(fcolor(white))
graph export $images/sarabareilles_lovesong_1.pdf, replace

twoway (line Nt_inc_p date if format=="AAA_Rock", lwidth(thick) lcolor(navy)) /*
  */ (line Nt_inc_p date if format=="Hot_AC", lwidth(thick) lcolor(maroon)) /*
  */ (line Nt_inc_p date if format=="Top_40", lwidth(thick) lcolor(dkorange)) /*
  */ (line Nt_inc_p date if format=="Mainstream_AC", lwidth(thick) lcolor(forest_green)) /*
*/ , xtitle("") xmtick(`mindate'(7)`maxdate') xlabel(`mindate'(`interval')`maxdate', labsize(vsmall) angle(forty_five) format(%tdMon_dd,_CCYY)) legend(order (1 "AAA Rock" 2 "Hot AC" 3 "Top 40" 4 "Mainstream AC")) graphregion(fcolor(white))
graph export $images/sarabareilles_lovesong_2.pdf, replace 
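
The same trick can be wrapped in a loop so the versions cannot drift apart. What follows is only a minimal sketch, not the code from the talk: it assumes Stata's bundled uslifeexp example dataset, and the filenames reveal_1.pdf and reveal_2.pdf are made up for illustration.

```stata
* Sketch: draw the identical graph twice, hiding the second line in version 1.
* Assumes Stata's bundled example data; reveal_1/reveal_2 are hypothetical names.
sysuse uslifeexp, clear

forvalues v = 1/2 {
	* version 1 conceals the female series, version 2 reveals it
	if `v' == 1 local c2 none
	else local c2 maroon

	twoway (line le_male year, lwidth(thick) lcolor(navy)) ///
		(line le_female year, lwidth(thick) lcolor(`c2')) ///
		, legend(order(1 "Male" 2 "Female")) graphregion(fcolor(white))
	graph export reveal_`v'.pdf, replace
}
```

Because both series are plotted in both versions, the axes and the legend are the same each time; the only thing that changes is whether the second line is visible.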

October 4, 2011 at 4:42 am 2 comments

Executing do-files from text editors

| Gabriel |

Stata now defaults to opening a do-file in the integrated do-file editor rather than just running it. The integrated do-file editor is now pretty good, but I’m a creature of habit and I prefer to use an external text editor (usually TextMate) then pipe to Stata. The current default behavior makes this somewhat inconvenient.

Fortunately, you can change this pretty easily in the preferences. Open Stata’s preferences, go to the “Do-File” tab and then the “advanced” sub-tab. Now uncheck the box that says “Edit do-files opened from the Finder in Do-file Editor.” Even though it says “from the Finder” this also applies to do-files launched pretty much any way you can think of: after-market file managers, text editors, etc.

Alternatively, you could rewrite your text editor’s Stata support to use Stata console, but that’s probably overkill.

September 28, 2011 at 5:18 am 3 comments

Misc Links

| Gabriel |

  • Useful detailed overview of Lion. The user interface stuff doesn’t interest me nearly as much as the tight integration of version control and “resume.” Also, worth checking if your apps are compatible. (Stata and Lyx are supposed to work fine. TextMate is supposed to run OK with some minor bugs. No word on R. Fink doesn’t work yet). It sounds good but I’m once again sitting it out for a few months until the compatibility bugs get worked out. Also, as with Snow Leopard many of the features won’t really do anything until developers implement them in their applications.
  • I absolutely loved the NPR Planet Money story on the making of Rihanna’s “Man Down.” (Not so fond of the song itself, which reminds me of Bing Crosby and David Bowie singing “Little Drummer Boy” in matching cardigans). If you have any interest at all in production of culture read the blog post and listen to the long form podcast (the ATC version linked from the blog post is the short version).
  • Good explanation of e, which comes up surprisingly often in sociology (logit regression, diffusion models, etc.). I like this a lot as in my own pedagogy I really try to emphasize the intuitive meaning of mathematical concepts rather than just the plug and chug formulae on the one hand or the proofs on the other.
  • People are using “bimbots” to scrape Facebook. And to think that I have ethical misgivings about forging a user-agent string so wget looks like Firefox.

July 20, 2011 at 3:46 pm

Christmas in July

| Gabriel |

Has it been two years already? Holy moly, Stata 12 looks awesome.

The headline feature is structural equation modeling. It comes with a graphic model builder, which even an “only scripting is replicable” zealot like me can appreciate as it helps you learn complicated command syntax. (I feel the same way about graphs). I had actually been thinking of working SEM into my next paper and was thinking through the logistics of getting a copy of M+, learning the syntax (again), etc. Now I can do it within Stata. I look forward to reading more papers that use SEM without really understanding the assumptions.

Probably the most satisfying new feature to me though is contour plots. Ever since I got interested in writing simulations a few years ago, I have been wanting to make heat maps in Stata. I’ve spent many hours writing code that can pipe to gnuplot and, not being satisfied with that, I (with some help from Lisa) have spent yet more time working on another script that can pipe to the wireframe function in R’s lattice library. Now I’m very happy to say that I will not finish writing this ado-file and submitting it to SSC as Stata 12 contains what looks to be really good native heat plots.

The command I will probably feel most guilty about not using more often is marginsplot, which extends the margins command from Stata 11. In addition to the headline new features there’s a bunch of little stuff, including fixes to get more compatibility between the estimation commands and the ancillary commands (e.g., better “predict” support for count models and “svy” support for “xtmixed”). Also, Windows users should be pleased to hear that they can now do PDFs natively.

[Update: Also see Jeremy’s post on Stata 12. He closes with a pretty funny metaphor of stats packages to cell phone brands.]

June 27, 2011 at 12:46 pm 7 comments

Which of my cites is missing?

| Gabriel |

I was working on my book (in Lyx) and it drove me crazy that at the top of the bibliography was a missing citation. Finding the referent to this missing citation manually was easier said than done and ultimately I gave up and had the computer do it. These suggestions are provided rather inelegantly as a “log” spread across two languages. However you could pretty easily work them into an argument-passing script written in just one language. Likewise, it should be easy to modify them for use with plain vanilla LaTeX if need be.

First, I pulled all the citations from the book manuscript and all the keys from my Bibtex files.

grep '^key ' book.lyx | sort -u | perl -pe 's/^key "([^"]+)"/$1/' > cites.txt
grep '^\@' ~/Documents/latexfiles/ghrcites_manual.bib | perl -pe 's/\@.+\{(.+),/$1/' > bibclean.txt
grep '^\@' ~/Documents/latexfiles/ghrcites_zotero.bib | perl -pe 's/\@.+\{(.+),/$1/' >> bibclean.txt

Then in Stata I merged the two files and looked for Bibtex keys that appear in the manuscript but not the Bibtex files. [Update, see the comments for a better way to do this.] From that point it was easy to add the citations to the Bibtex files (or correct the spelling of the keys in the manuscript).

insheet using bibclean.txt, clear
tempfile x
save `x'
insheet using cites.txt, clear
merge 1:1 v1 using `x'
list if _merge==1
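
A slightly more defensive variant of the merge, offered as a sketch rather than a tested replacement. It assumes the same bibclean.txt and cites.txt files and that insheet names the single column v1, as above. Dropping duplicates first keeps the 1:1 merge from failing if a key is cited more than once or appears in both BibTeX files.

```stata
* Keys defined in the BibTeX files; drop duplicates so the 1:1 merge cannot fail
insheet using bibclean.txt, clear
duplicates drop
tempfile bibkeys
save `bibkeys'

* Keys cited in the manuscript
insheet using cites.txt, clear
duplicates drop

* Keep only keys that are cited but not defined in any BibTeX file
merge 1:1 v1 using `bibkeys', keep(master) nogenerate
list v1
```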

June 1, 2011 at 4:37 am 2 comments

Stata for Mac PDF and “set graphics on”

| Gabriel |

I recently noted that graph exporting to PDF in Stata for Mac is fixed. Turns out that this is only partially true. It works and creates beautiful output, but unlike the other “graph export” options it only works if you have “set graphics on” in the Stata GUI. If you’re running Stata console or have graphics set off in the Stata GUI, it simply doesn’t work. (I set graphics off when batching a lot of graphs as it is faster and less distracting).

My understanding is that this has something to do with how Stata relies on Mac’s Quartz driver to render PDF so it’s not really feasible to fix. So basically you have three options:

1) Do it in the GUI with “set graphics on” and accept the CPU performance hit and distraction of all the graphs rendering.

2) Use my graphexportpdf ado file or the “graph print” command with CUPS-PDF as the print driver.

3) Stick to using EPS
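
For the third option, a batch run might look something like the sketch below. It uses Stata's bundled auto data, and mpg_weight.eps is a made-up filename. The commented-out last line is an assumption about your system: it only works if the epstopdf utility from a TeX distribution is installed and on the PATH.

```stata
* Sketch: batch graphs without rendering them on screen, exporting EPS.
* Uses Stata's bundled auto data; mpg_weight.eps is a hypothetical filename.
set graphics off
sysuse auto, clear
twoway scatter mpg weight
graph export mpg_weight.eps, replace
set graphics on

* Optional, and an assumption about your system: if epstopdf is installed,
* a shell escape can convert the EPS to PDF afterwards.
* !epstopdf mpg_weight.eps
```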

May 16, 2011 at 4:14 am 4 comments

Simulations, numlist, and order of operations

| Gabriel |

I’ve been programming another simulation and as is typical am batching it through various combinations of parameter values, recording the results each time. In making such heavy (and nested) use of the forvalues loop I noticed some issues with numlist and order of operations in algorithms.

First, Stata’s numlist expression (as in the “forvalues” syntax) introduces weird rounding errors when the step is fractional, because values like .01 have no exact binary representation. Thus it is preferable to count by integers then scale down to the fractional value within the loop. This is also useful if you want to save each run of the simulation as a file as it lets you avoid fractional filenames.

So instead of this:

forvalues i=0(.01)1 {
	replace x=sin(`i')
	save sin`i'.dta, replace
}

Do this:

forvalues i=0/100 {
	local i_scaled=`i'/100
	replace x=sin(`i_scaled')
	save sin`i'.dta, replace
}

Another issue with numlist is that it can introduce infinitesimal errors so that a comparison that should amount to “1==1” comes back false. If you have a situation like this you need to make the comparison operator fuzzy. So instead of just writing the expression “if y==x” you would use the expression

if y>x-.0001 & y<x+.0001
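
The same idea can be written a bit more compactly with abs() or with Stata's built-in reldif() function. The sketch below uses literal numbers so it can be run on its own; the tolerance of 1e-8 is an arbitrary choice for illustration, not a recommendation.

```stata
* Sketch: tolerance-based equality, illustrated with literal numbers.
* The tolerance 1e-8 is an arbitrary, illustrative choice.
display 0.1 + 0.2 == 0.3                  // exact comparison
display abs((0.1 + 0.2) - 0.3) < 1e-8     // absolute tolerance
display reldif(0.1 + 0.2, 0.3) < 1e-8     // relative difference

* In a dataset the same test is just an if-qualifier, e.g.
* count if abs(y - x) < .0001
```

The exact comparison fails because 0.1 + 0.2 is not stored as exactly 0.3, while both tolerance-based comparisons succeed.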

Finally, I’ve noticed that when you are running nested loops the number of operations multiplies with each level of nesting, so it makes a big difference in what order you do things. In particular, you want to arrange operations so they are repeated the fewest times. For instance, suppose you have batched a simulation over three parameters (x, y, and z) and saved each combination in its own dataset with the convention “results_x_y_z” and you wish to append the results in such a way that the parameter values are variables in the new appended dataset. The simple (but slow) way to run the append is like this:

clear
gen x=.
gen y=.
gen z=.
forvalues x=1/100 {
	forvalues y=1/100 {
		forvalues z=1/100 {
			append using results_`x'_`y'_`z'
			recode x .=`x'
			recode y .=`y'
			recode z .=`z'
		}
	}
}

Unfortunately this is really slow. The following code has the same number of lines but gives the computer about half as many operations to do. In the first version there are four commands that each run 100^3 times, or 4,000,000 operations. The second version has two commands that run 100^3 times, one command that runs 100^2 times, and one command that runs 100 times, or 2,010,100 operations.

clear
gen x=.
gen y=.
gen z=.
forvalues x=1/100 {
	forvalues y=1/100 {
		forvalues z=1/100 {
			append using results_`x'_`y'_`z'
			recode z .=`z'
		}
		recode y .=`y'
	}
	recode x .=`x'
}

April 26, 2011 at 4:46 am 2 comments
