Archive for March, 2012

Control for x

| Gabriel |

An extremely common estimation strategy, which Roland Fryer calls “name that residual,” is to throw controls at an effect and then say that whatever remains net of the controls is the effect. Typically, as you introduce controls the effect goes down, but not all the way to zero. Here’s an example using simulated data where we regress y (continuous) on x (a dummy) with and without a control (continuous and negatively associated with x).

```
--------------------------------------------
                      (1)             (2)
--------------------------------------------
x                  -0.474***       -0.257***
                  (0.073)         (0.065)

control                             0.492***
                                  (0.023)

_cons               0.577***        0.319***
                  (0.054)         (0.048)
--------------------------------------------
N                    1500            1500
--------------------------------------------
```

So, as is typical, we see that even if you allow that x=1 tends to be associated with low values of the control, you still see an x penalty. However, this is a spurious finding: by assumption of the simulation there is no actual net effect of x on y, only an effect mediated through the control.

This raises the question of what it means to have controlled for something. Typically we’re not controlling for something perfectly, but only for a single part of a bundle of related concepts (or, if you prefer, a noisy indicator of a latent variable). For instance, when we say we’ve controlled for “human capital,” the model specification might only really have self-reported highest degree attained. This leaves out both other aspects of formal education (e.g., GPA, major, institution quality) and other forms of human capital (e.g., g and time preference). These related concepts will be correlated with the observed form of the control, but not perfectly. Indeed, the same problem arises even if we don’t have “omitted variable bias” but just measurement error on a single variable, as is the assumption of this simulation.

To get back to the simulation, let’s appreciate that the “control” is really the control as observed. If we could perfectly specify the control variable, the main effect might go down all the way to zero. In fact in the simulation that’s exactly what happens.

```
------------------------------------------------------------
                      (1)             (2)             (3)
------------------------------------------------------------
x                  -0.474***       -0.257***       -0.005
                  (0.073)         (0.065)         (0.053)

control                             0.492***
                                  (0.023)

control_good                                        0.980***
                                                  (0.025)

_cons               0.577***        0.319***        0.538***
                  (0.054)         (0.048)         (0.038)
------------------------------------------------------------
N                    1500            1500            1500
------------------------------------------------------------
```

That is, when we specify the control with error, much of the x penalty persists. However, when we specify the control without error, the net effect of x disappears entirely. Unfortunately, in reality we don’t have the option of measuring something perfectly, so all we can do is be cautious about whether a better specification would further cram down the main effect we’re trying to measure.
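For what it’s worth, the size of the leftover x penalty in model 2 is about what you’d expect from the simulation’s setup. Here is a rough sketch of the large-sample algebra. The notation is mine, not part of the simulation: c* is control_good, c is the observed control, u is the standard normal part of control_good, v is the measurement error, and λ is the share of the within-x variance of the observed control that is real signal. Constants are absorbed into μ and the intercept.

```latex
\begin{aligned}
y &= c^{*} + \varepsilon, \qquad c^{*} = \mu - \tfrac{1}{2}x + u, \qquad c = c^{*} + v \\
\hat{\beta}_{c} &\to \frac{\operatorname{Cov}(u+\varepsilon,\; u+v)}{\operatorname{Var}(u+v)}
  = \frac{\sigma_u^2}{\sigma_u^2+\sigma_v^2} \equiv \lambda = \frac{1}{1+1} = 0.5 \\
\hat{\beta}_{x} &\to -\tfrac{1}{2} - \lambda\left(-\tfrac{1}{2}\right) = -\tfrac{1}{2}(1-\lambda) = -0.25
\end{aligned}
```

The second line is the usual attenuation of a mismeasured regressor after partialling out x. The third line says that whatever part of the x gap the noisy control fails to absorb gets loaded onto x. Those limits of 0.5 and -0.25 line up with the estimates of 0.492 and -0.257 in model 2. Setting the error variance to zero gives λ = 1 and an x coefficient of zero, which is model 3.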

Here’s the code:

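```
* eststo and esttab come from the estout package (ssc install estout)
clear
set obs 1500
* x is a coin-flip dummy
gen x=round(runiform())
* the true control is lower on average when x==1
gen control_good=rnormal(.05,1) - x/2
* y depends only on the true control, never directly on x
gen y=control_good+rnormal(0.5,1)
* the observed control is the true control plus measurement error
gen control=control_good+rnormal(0.5,1)
eststo clear
eststo: reg y x
eststo: reg y x control
* first table: models 1 and 2
esttab, se b(3) se(3) nodepvars nomtitles
eststo: reg y x control_good
* second table: adds model 3
esttab, se b(3) se(3) nodepvars nomtitles

*have a nice day
```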
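If you want to see how the spurious x penalty scales with the amount of measurement error, here is a minimal sketch that reruns the same data-generating process with several error standard deviations. The seed, the particular standard deviations, the variable name control_noisy, and the matrix B are my own choices for illustration. Under this setup the reliability of the observed control within levels of x is 1/(1 + sd^2), and the x coefficient should drift from roughly zero toward -0.5 as the reliability falls.

```stata
clear
* arbitrary seed, only so the sketch is reproducible
set seed 20120301
set obs 1500
gen x = round(runiform())
gen control_good = rnormal(.05,1) - x/2
gen y = control_good + rnormal(0.5,1)

capture matrix drop B
* loop over hypothetical amounts of measurement error in the control
foreach sd in 0 0.5 1 2 {
    * observed control = true control + error with standard deviation sd
    gen control_noisy = control_good + 0.5 + `sd'*rnormal()
    quietly reg y x control_noisy
    * implied reliability of the observed control within levels of x
    local lambda = 1/(1 + `sd'^2)
    display "error sd = `sd'" _col(18) "reliability = " %5.3f `lambda' _col(42) "b[x] = " %6.3f _b[x]
    * keep the results in a small matrix: one row per error sd
    matrix B = nullmat(B) \ (`sd', _b[x])
    drop control_noisy
}
matrix colnames B = error_sd b_x
matrix list B
```

The sd of 1 row corresponds to model 2 above and the sd of 0 row to model 3, though the exact numbers will differ from the tables since the draws are different.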

Recursively building graph commands

| Gabriel |

I really like multi-line graphs and scatterplots where the marker color/style reflects categories. Such graphs are both more compact than just having multiple graphs and they make it easier to compare different things. The way you do this is with “twoway,” a lot of parentheses, and the “if” condition. For example:

`twoway (kdensity x if year==1985) (kdensity x if year==1990) (kdensity x if year==1995) (kdensity x if year==2000) (kdensity x if year==2005), legend(order(1 "1985" 2 "1990" 3 "1995" 4 "2000" 5 "2005"))`

Unfortunately, such graphs can be difficult to script. This is especially so for the legends, which by default show the variable name rather than the selection criteria. I handle this by looping and recursively appending to a local macro, which in the case of the legend involves embedded quote marks.

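```
capture program drop multigraph
program define multigraph
    * usage: multigraph varname interval
    local var         `1'
    local interval    `2'

    * build up the twoway plot list and the legend labels one year at a time
    local command ""
    local legend ""
    local legendtick=1
    forvalues i=1985(`interval')2005 {
        local command "`command' (kdensity `var' if year==`i')"
        * compound double quotes keep each label quoted inside the macro
        local legend `legend' `legendtick' `" `i' "'
        *"
        local legendtick=`legendtick'+1
    }
    * echo the assembled command before running it
    disp `"twoway `command' , legend(order(`legend'))"'
    twoway `command' , legend(order(`legend'))
end
```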

To replicate the hard-coded command above, you’d call it like this:

`multigraph x 5`
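The program above hard-codes the 1985 to 2005 range and the variable name year. As a sketch of one way to loosen that, the variant below takes the grouping variable as a second argument and loops over whatever values it actually takes, using levelsof. The program name multigraph_levels and the toy data are made up for illustration, and it assumes the grouping variable is numeric.

```stata
capture program drop multigraph_levels
program define multigraph_levels
    * hypothetical variant of multigraph
    * usage: multigraph_levels varname groupvar
    args var group

    local command ""
    local legend ""
    local legendtick=1
    * collect the distinct values of the grouping variable
    quietly levelsof `group', local(levels)
    foreach i of local levels {
        local command "`command' (kdensity `var' if `group'==`i')"
        local legend `legend' `legendtick' `"`i'"'
        local legendtick=`legendtick'+1
    }
    twoway `command' , legend(order(`legend'))
end

* toy data purely for illustration
clear
set obs 1000
gen year = 1985 + 5*mod(_n,5)
gen x = rnormal((year-1985)/10, 1)
multigraph_levels x year
```

The appending logic is unchanged from the original. The only real difference is where the list of values comes from, so the legend stays in sync with the data even if the years are irregular.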

Bleg on Failure

| Gabriel |

A friend is writing a trade book on failure and was interested in what sociologists have to say about it.

The scope is pretty broad, including both people and organizations, why and to whom failure happens, and how things either prove robust to failure or are permanently trapped by it.

I’ve already suggested several things ranging from this to this, but was hoping my beloved readers could post some reading suggestions to the comments. I’m asking both because I like the author and think it will be a good book and also because it’s a good way to get our ideas out there. That is, if you don’t suggest things now, you don’t get to bitch later that nobody pays attention to sociological research.