## Predicted vignettes

| Gabriel |

One of my favorite ways to interpret a complicated model is to make up hypothetical cases and see what predicted values they give you. For instance, at the end of my Oscars paper with Esparza and Bonacich, we compared predicted probabilities of Oscar nomination for the 2 x 2 of typical vs exceptional actors in typical vs exceptional films. Doing so helps make sense of a complicated model in a way that doesn’t boil down to p-value fetishism. Stata 11 has a very useful “margins” command for doing something comparable to this. My first impression was that “margins” only works with categorical variables, but its at() option takes specific values of continuous variables as well.
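For comparison, here is a minimal sketch of the “margins” route. It relies on the at() option, which accepts specific values of continuous covariates, and on atmeans to hold everything else at its mean. The specification (price on mpg, weight, and headroom) is my assumption for illustration, not a model from the paper.

```stata
sysuse auto, clear

* illustrative specification (an assumption, chosen only for this sketch)
regress price mpg weight headroom

* predicted price for each mpg-by-weight vignette, headroom held at its mean
margins, at(mpg=(20 40) weight=(2000(500)4000)) atmeans
```

This reports one prediction per mpg-by-weight combination (ten in all), each with a standard error, which the hand-built approach does not give you without extra work.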

The way I like to do this kind of thing is to create some hypothetical cases representing various scenarios and use the “predict” command to see what the model predicts for each scenario. (Note that you can do this in Excel, which is handy for trying to interpret published work, but it’s easier to use Stata for your own work.) Below is a simple illustration of how this works using the 1978 cars dataset (auto.dta, which ships with Stata). In real life it’s overkill to do this for a model with only a few variables where everything is linear and there are no interactions, but this is just an illustration.

Also, note that this approach can get you into trouble if you feed it nonsensical independent variable values. So the example below includes the rather silly scenario of a car that accomplishes the formidable engineering feat of getting good mileage even though it’s very heavy. A conceptually related problem is that you have to be careful if you’re using interaction terms created with the “gen” command, since the product variable does not update itself when you change its components (which is a good reason to use factor variables when possible instead of “gen” for interactions). Another way you can get in trouble is going beyond the observed range (e.g., trying to predict the price of a car that gets 100 mpg).
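To make the interaction caveat concrete, here is a small sketch comparing a hand-built interaction with factor-variable notation. The variable names (mpgXweight, yhat_gen, yhat_fv) and the single hypothetical case are my own inventions for illustration.

```stata
sysuse auto, clear

* hand-built interaction, created before any hypothetical cases exist
gen mpgXweight = mpg*weight

* append one empty case and give it values of interest
local newobs = _N + 1
set obs `newobs'
replace mpg = 30 in L
replace weight = 3000 in L

* with a hand-built term you have to remember this line; without it the
*  product stays missing for the new case and so does the prediction
replace mpgXweight = mpg*weight in L

regress price mpg weight mpgXweight
predict yhat_gen

* with factor-variable notation Stata forms the product from mpg and weight
regress price c.mpg##c.weight
predict yhat_fv

* the two predictions for the hypothetical case should agree
list mpg weight yhat_gen yhat_fv in L
```

The same trap applies if you set every independent variable to its mean: the mean of a product is generally not the product of the means, so a hand-built term filled in that way describes a case that cannot exist.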

```
sysuse auto, clear

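local n_hypothetical 10

*independent variables for the model
*(an illustrative choice; any list that includes mpg and weight will do)
local indyvars mpg weight headroom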

*create the hypothetical data

*first, append some (empty) cases
local biggerdata=_N+`n_hypothetical'
set obs `biggerdata'
gen hypothetical=0
replace hypothetical=1 in -`n_hypothetical'/L

*for each of these hypothetical cases, set all independent variables to the
* mean
*note that you can screw up here if some of the real cases are missing data
* on some variables but not others. in such an instance the means for the
* analytic subset will not match those of the hypothetical cases
foreach var in `indyvars' {
    quietly sum `var' if hypothetical==0
    replace `var'=r(mean) in -`n_hypothetical'/L
}

*change selected hypothetical values to be theoretically interesting values
*i'm using a nested loop with a tick counter to allow combinations of values
* on two dimensions
*if you only want to play with a single variable this can be a lot simpler
* alternately you don't need to loop at all, but can just "input" or "edit"
local i=-`n_hypothetical'
forvalues mpg=0/1 {
    forvalues weight=0/4 {
        quietly replace mpg=20 + 20*`mpg' in `i'
        quietly replace weight=2000 + 500*`weight' in `i'
        local i=`i'+1
    }
}

*make sure that hypothetical cases are missing on Y and therefore don't go
*into regression model
replace price=. in -`n_hypothetical'/L

*do the regression
preserve
*to be extra sure the hypotheticals don't bias the regression, drop them,
keep if hypothetical==0
reg price `indyvars'
*then bring them back
restore

*create predicted values (for all cases, real and hypothetical)
* this postestimation command does a lot of the work and you should explore
* the options, which are different for the various regressions commands
predict yhat

*create table of vignette predictions (all else held at the mean)
preserve
keep if hypothetical==1
*(in Stata 17 or later the equivalent is: table mpg weight, statistic(mean yhat))
table mpg weight, c(m yhat)
restore
```
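
The out-of-range caveat can be checked mechanically. The sketch below is self-contained rather than a continuation of the code above: it appends two made-up cases (one at 40 mpg, one at 100 mpg) and counts how many hypothetical values fall outside what the real cars cover. The variable and local names are my own choices for illustration.

```stata
sysuse auto, clear

* append two empty cases and flag them as hypothetical
local newobs = _N + 2
set obs `newobs'
gen hypothetical = 0
replace hypothetical = 1 in -2/L
replace mpg = 40 in -2
replace mpg = 100 in L

* compare each hypothetical value to the observed range of the real cases
foreach var in mpg {
    quietly summarize `var' if hypothetical==0
    local lo = r(min)
    local hi = r(max)
    count if hypothetical==1 & (`var' < `lo' | `var' > `hi')
}
```

A nonzero count means at least one vignette is extrapolating. Add more variables to the foreach list as needed, and note that a value left missing on a hypothetical row would also be counted, since Stata treats missing as larger than any number.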


• 1. Adam  |  September 13, 2010 at 4:33 am

Sounds like unit testing for models?
