Posts tagged ‘typesetting’

| Gabriel |

Somebody recently asked me for a projected word count of my manuscript (which is in Lyx), and to answer this question I found the amazingly useful TeXcount script. If you just run “wc” (or the equivalent in a text editor) on a tex or lyx file, you count all the plain text and the markup code. Not only does this script screen out the meta-text, but it can give you detailed breakdowns of words, figures, and captions, all broken out by section.

I like to keep scripts in “~/scripts/”, so to make this script readily accessible from the command line I entered the command:

echo "alias texcount='perl ~/scripts/TeXcount_2_2/'" >> ~/.bashrc

Now to run the command I just go to the terminal and type:

texcount foo.tex

You should really check out the options if you have a long and complex document. My favorite option is “-sub”. This gives a detailed breakdown of word count, figure count, etc, by chapter, section, or whatever.

texcount -sub foo.tex

Remember that if you always use a certain option, you can write it into the alias command.
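For instance, here is a minimal sketch of an alias with “-sub” baked in. The path in TEXCOUNT_SCRIPT is a hypothetical placeholder, not the location used above, so point it at wherever your copy of the script actually lives.

```shell
# TEXCOUNT_SCRIPT is a hypothetical path; change it to your own copy of the script.
TEXCOUNT_SCRIPT="$HOME/scripts/texcount.pl"

# Bake -sub into the alias so every run gives the per-section breakdown.
alias texcount="perl $TEXCOUNT_SCRIPT -sub"
```

Put those two lines in ~/.bashrc and “texcount foo.tex” will behave like “texcount -sub foo.tex” in every new terminal.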

Lyx has a similar basic command built in (Tools/Statistics), but it doesn’t give as much information and doesn’t break out the data by section. To use texcount with lyx files, you first need to export Lyx to LaTeX. You can do this from the GUI (File/Export/Latex), but if you’re using texcount anyway you should just use the command line:

lyx --export latex foo.lyx

That works for Linux, but on a Mac this will work more consistently:

exec '/Applications/' --export latex foo.lyx

That’s a long command, so on my Mac I created an alias called “lyx2tex”:

echo "alias lyx2tex='exec /Applications/ --export latex'" >> ~/.bashrc

Note that all this works on POSIX systems but may require some modification to work on Windows (unless it has Cygwin).
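The export and the count can also be chained into one step. The following is only a sketch: the function name lyxcount and the variables LYX_CMD and TEXCOUNT_CMD are made up for illustration, and it assumes that texcount is available as a real executable (an alias will not work inside the function) and that Lyx writes foo.tex next to foo.lyx.

```shell
# LYX_CMD and TEXCOUNT_CMD are hypothetical variables; each must name a single
# executable. Override them if your installs live somewhere else.
LYX_CMD="${LYX_CMD:-lyx}"
TEXCOUNT_CMD="${TEXCOUNT_CMD:-texcount}"

lyxcount() {
    lyxfile="$1"
    texfile="${lyxfile%.lyx}.tex"   # foo.lyx -> foo.tex
    # Only count if the export succeeded, so a stale .tex file is never counted.
    "$LYX_CMD" --export latex "$lyxfile" && "$TEXCOUNT_CMD" -sub "$texfile"
}
```

With that in ~/.bashrc, “lyxcount foo.lyx” exports the file and prints the per-section counts in one go.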

September 21, 2009 at 5:31 am

Where was this published? Who cares? Viva Jeremy!

| Gabriel |

In honor of Jeremy’s election to the publications committee, I’m posting a BibTeX style file that incorporates his campaign promise to abolish the anachronistic “place of publication” field from ASA citation style. The file is hand-modified from the Dierkes and Louch style file of the soon-to-be-defunct ASA citation style.

Because it’s a particularly long bit of code it’s below the fold.

June 19, 2009 at 10:50 pm 1 comment

Anonymous pdf (updated)

| Gabriel |

One of the quaint traditions of academia is anonymous peer review. This is made somewhat difficult by the fact that all sorts of info is in the meta-text. Here’s a way to keep your name out of the meta-text while making PDFs on a Mac. (Steps two and three also work on Linux.)

1. Save the file as postscript. On the Mac you do this the same way you save as pdf (that is, you click the “pdf” button in the print dialog) except that you choose “save as postscript”.

2. Open the postscript file in a text editor, search for all occurrences of your name and other identifying information, and replace the strings with “anonymous”.

3. Open the terminal, “cd” to the right place, and type:


It takes a few seconds to execute, but that’s it. Yes, it would be possible to batch it, but unless you’re much more productive than me and send articles out for peer review every day, it’s not really worth it.
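Step 2 can also be done from the terminal instead of a text editor. This is a minimal sketch: the function name scrub_ps and the file names are hypothetical, and it assumes your name appears in the postscript file as plain text, which is not guaranteed for every program that writes postscript.

```shell
scrub_ps() {
    # $1 = string to hide, $2 = input postscript file, $3 = scrubbed output file
    # The string is used as a sed pattern, so avoid "/" and regex characters in it.
    sed "s/$1/anonymous/g" "$2" > "$3"
}
```

For example, scrub_ps "Jane Doe" paper.ps paper_anon.ps writes a scrubbed copy and leaves the original untouched. Run it once per identifying string (name, login, institution).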

If you want to double check that it worked you can use this procedure to search the meta-text of a pdf.
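A rough check from the terminal, offered as a sketch and not as the procedure mentioned above, is to search the raw pdf for your name. The function name find_in_pdf is hypothetical. A count of zero is encouraging but not proof, since metadata stored in compressed or differently encoded parts of the file will not show up in a plain text search.

```shell
find_in_pdf() {
    # $1 = string to look for, $2 = pdf file
    # -a reads the binary file as text, -i ignores case,
    # -F treats the string literally, -c prints the number of matching lines.
    grep -a -i -F -c "$1" "$2"
}
```

For example, find_in_pdf "Jane Doe" paper.pdf prints how many lines of the file contain that name.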


Based on this macosxhints thread I created this drag and drop Automator application that scrubs the “author” (your name) and “content creator” (e.g. MS Word) fields from an existing pdf. So just create the pdf normally, then use this application to clean it.

June 4, 2009 at 3:59 pm 2 comments

Workflow and literate programming

| Gabriel |

In a thread over on scatterplot, olderwoman asks for advice on automating the table-making in a report with oodles of cross-tabs. Kieran describes his hardcore geek workflow of using Sweave to integrate R code directly into LaTeX so that he regenerates things on the fly. Although it’s a much less mature project, he mentions that StatWeave is a more portable solution that should generalize this approach to Stata. This sounds very promising but it’s not exactly user-friendly, and let’s face it, the kind of people who like writing raw LaTeX code probably already use R.

The workflow solution I’ve been developing (especially for writing a book) is much simpler but works pretty well for me. Basically, instead of weaving the write-up and math together, I have two files. As long as I execute them in the proper order this is equivalent to using a literate programming solution like StatWeave but much simpler.

The first is a Stata file called* This do-file generates all the tables and figures in the book and saves them as png, pdf, or tex. The second is a series of Lyx files called chapterX_MMDDYY.lyx that contain both the actual writing (you know, of words, not code) and pointers to the figures and such. Text-based file formats like html, lyx, and tex don’t directly contain graphics but instead just point to them.** Unlike pasting a graphic into a Word document, when you update the thing they are pointing to, the update propagates automatically. (Of course this can be a disadvantage if you want to know how it used to be.) So if I have a lyx file for chapter one that points to a file called ~/book/graphics/graph1_1.png, and then I use Stata to update the graph, then the next time I open Lyx and generate a PDF it’s going to have the new version of the graph.

So my solution gives similar results to the StatWeave approach. My way has two advantages. First, it has a much gentler learning curve. Second, it makes it easier to use the same (or overlapping) sets of figures in two files. For instance, imagine you wanted to try dozens of specifications for exploratory.pdf but only a few specifications for article.pdf. On the other hand, there are two disadvantages to my way. One is that you have to remember to run the do-file before the lyx or tex file, but that’s no big deal. The other is that mine requires more effort at the margin to sync the two files. With StatWeave, you just type the code generating a figure directly into the write-up file. With my way, you first write the Stata code generating the figure into the do-file, and you then go into the lyx or tex file and create a placeholder targeting the file generated by the Stata code.
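If remembering the order bothers you, a small wrapper can enforce it. This is a sketch under assumptions: build_chapter, STATA_CMD, and LYX_CMD are hypothetical names, the Stata executable may be called something else on your machine (stata-se, stata-mp), and “pdf2” is assumed to be the Lyx export format you want (pdflatex).

```shell
# STATA_CMD and LYX_CMD are hypothetical variables; each must name a single executable.
STATA_CMD="${STATA_CMD:-stata}"
LYX_CMD="${LYX_CMD:-lyx}"

build_chapter() {
    dofile="$1"
    lyxfile="$2"
    # Run the do-file in batch mode first so the figures and tables are fresh...
    "$STATA_CMD" -b do "$dofile" || return 1
    # ...then rebuild the PDF, which picks up the updated graphics.
    "$LYX_CMD" --export pdf2 "$lyxfile"
}
```

Call it with the do-file and the chapter, in that order. Stata’s batch mode may not report do-file errors through its exit status, so it is still worth glancing at the log before trusting the new PDF.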

*Most of my filenames end with the date because I save a new version of important files every day to facilitate debugging and/or buyer’s remorse about edits. This habit predates Time Machine but I still think it’s a good practice since Time Machine deletes really old versions of your files unless you have a truly ginormous backup disk.

**In the last few years most word processors, including Word, have switched from the old binary file formats to an XML-based format. If you’ve ever created a webpage that has both html and jpg files this will be very familiar. However (for some very good reasons) they hide this from the user and make it feel like you’re still using a binary. I don’t know if you can do this on a PC, but on a Mac you can select “show package contents” to see inside one of these documents, and internally it works much the way I’m describing Lyx. However, because Word, Pages, and OpenOffice all keep their own copy of the png file in the package rather than linking to the source, they would not propagate changes from the original. I’m sure there are ways to get regular word processors to do this (back in System 7, Macs had a clumsy attempt at this called “publish and subscribe”) but I’m happy with the Lyx approach.

April 7, 2009 at 11:05 am


| Gabriel |

There have been a lot of updates lately to the completely indispensable estout package.

If you’re thinking, what is this “estout” of which he speaks? Don’t walk, but run to your copy of Stata and type:

ssc install estout

If you already have estout and are trying to install the update, try:

ssc install estout, replace

As every quant knows, getting Stata output into journal layout is really, really tedious, and you have to start over from scratch anytime you change anything about a model. When I was an undergrad I thought I was so cool when I realized I could read a log file into Excel as a fixed-width text file. This and some related tricks cut the time it takes to make a decent-sized regression table from about 40 minutes to about 20 minutes, but that’s still a pretty tedious 20 minutes.

So I was pretty happy when I learned about the various table-making commands that can do this for you. The first time somebody showed me how estout works I felt like one of the Munchkins after Dorothy killed the wicked witch of the East.

Estout cuts down table-making to between zero and five minutes, depending on how gung ho you are about tweaking the syntax. Really hardcore people have it output TeX that they embed directly in their write-up. The syntax is a little hard to learn but you generally only have to learn enough syntax to get it to work with one or two styles that you use often. Here’s my syntax to create an ASA-style table for a multi-level model with nested independent variables. I use it as fixed width because it makes it easier to import into a spreadsheet. (Excel really likes to think of parentheses as meaning “negative” rather than as literal strings).

eststo clear
eststo: xtreg y x1, re i(clusterid)
eststo: xtreg y x1 x2, re i(clusterid)
eststo: xtreg y x1 x2 x3, re i(clusterid)
esttab using table.txt , se b(3) se(3) scalars(ll rho) nodepvars nomtitles  label title(Table: REGRESSION MODELS OF SOMETHING) replace fixed

March 25, 2009 at 11:10 am 8 comments
