## Posts tagged ‘causality’

### Control for x

| Gabriel |

An extremely common estimation strategy, which Roland Fryer calls “name that residual,” is to throw controls at an effect and then say that whatever remains net of the controls is the effect. Typically, as you introduce controls the effect shrinks, but not all the way to zero. Here’s an example using simulated data where we regress y (continuous) on x (a dummy) with and without a control (continuous and negatively associated with x).

```
--------------------------------------------
                      (1)             (2)
--------------------------------------------
x                  -0.474***       -0.257***
                  (0.073)         (0.065)

control                             0.492***
                                  (0.023)

_cons               0.577***        0.319***
                  (0.054)         (0.048)
--------------------------------------------
N                    1500            1500
--------------------------------------------
```

So as is typical, we see that even if you allow that x=1 tends to be associated with low values for control, you still see an x penalty. However this is a spurious finding since by assumption of the simulation there is no actual net effect of x on y, but only an effect mediated through the control.

This raises the question of what it means to have controlled for something. Typically we’re not really controlling for something perfectly, but only for a single part of a bundle of related concepts (or if you prefer, a noisy indicator of a latent variable). For instance when we say we’ve controlled for “human capital” the model specification might only really have self-reported highest degree attained. This leaves out both other aspects of formal education (e.g., GPA, major, institution quality) and other forms of HC (e.g., g and time preference). These related concepts will be correlated with the observed form of the control, but not perfectly. Indeed, the problem arises even if we don’t have “omitted variable bias” but just measurement error on a single variable, as is the assumption of this simulation.

To get back to the simulation, let’s appreciate that the “control” is really the control as observed. If we could perfectly specify the control variable, the main effect might go down all the way to zero. In fact in the simulation that’s exactly what happens.

```
------------------------------------------------------------
                      (1)             (2)             (3)
------------------------------------------------------------
x                  -0.474***       -0.257***       -0.005
                  (0.073)         (0.065)         (0.053)

control                             0.492***
                                  (0.023)

control_good                                        0.980***
                                                  (0.025)

_cons               0.577***        0.319***        0.538***
                  (0.054)         (0.048)         (0.038)
------------------------------------------------------------
N                    1500            1500            1500
------------------------------------------------------------
```

That is, when we specify the control with error much of the x penalty persists. However when we specify the control without error the net effect of x disappears entirely. Unfortunately in reality we don’t have the option of measuring something perfectly and so all we can do is be cautious about whether a better specification would further cram down the main effect we’re trying to measure.
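The size of the leftover x penalty is predictable from the setup. As a rough sketch using ad hoc notation (λ, τ, and β are labels introduced here, not Stata output): in this simulation x lowers the true control by half a unit, the true control passes one-for-one into y, and both the true control and the measurement error have unit variance within levels of x.

$$
\lambda = \frac{\operatorname{Var}(\text{control\_good} \mid x)}{\operatorname{Var}(\text{control} \mid x)} = \frac{1}{1 + 1} = 0.5
$$

$$
\beta_{\text{control}} = \lambda \cdot 1 = 0.5, \qquad \beta_x = \tau \, (1 - \lambda) = -0.5 \times 0.5 = -0.25
$$

Here τ = −0.5 is the total effect of x on y, all of it mediated. The expected values of 0.5 and −0.25 are close to the 0.492 and −0.257 in column (2), and a perfectly measured control (λ = 1) sends the coefficient on x to zero, as in column (3).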

Here’s the code.

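```
clear
set obs 1500
* x is a dummy, equal to 1 for about half the observations
gen x=round(runiform())
* the true control, which x lowers by half a unit
gen control_good=rnormal(.05,1) - x/2
* y depends on the true control only; x has no direct effect
gen y=control_good+rnormal(0.5,1)
* the observed control is the true control plus measurement error
gen control=control_good+rnormal(0.5,1)
eststo clear
eststo: reg y x
eststo: reg y x control
* table with models (1) and (2)
esttab, se b(3) se(3) nodepvars nomtitles
eststo: reg y x control_good
* table with models (1) through (3)
esttab, se b(3) se(3) nodepvars nomtitles

*have a nice day
```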

### Conditioning on a Collider, Human Popsicle Edition

| Gabriel |

Robin Hanson mentions a poll of whether people would like to be cryogenically frozen and notes that the rate is higher for people outside the US. He assumes these respondents are American expats and uses that as an argument about adventurous personality types. I’m skeptical that most of these people are American citizens as compared to foreigners who speak English, but let’s put that aside. The big problem is that this is a self-selected reader poll.

The short version of the problem is that reader polls are nearly worthless. The long version is that you’d expect to find a negative correlation between access to the poll and salience of its subject matter. Assume that there are two things driving participation in a reader poll:

1. Accessibility of the poll. In this case, the poll was hosted by ABC News, an American news organization that is presumably of greatest interest to people living in the US.
2. Salience of the poll’s subject matter. It’s pretty easy to imagine that cryonics fans may seek out material about cryonics. Some of them might have Google News alerts. Likewise it’s not exactly unheard of for fans of a band/politician/whatever to find out about a reader poll and direct other fans to it. Note that I’m assuming that for this issue salience is highly correlated with favorability.

If so, the magic of conditioning on a collider means that the subset of the population that responds to the reader poll will have an artifactual negative correlation between accessibility and salience. Any time censorship depends jointly on two variables, the observed data will show artifactual results about the relationship between those two variables (and their close correlates).
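The argument can be checked with a quick simulation. This is a minimal sketch, not an analysis of the actual poll: the variable names, the response rule, and the cutoff of 2 are all assumptions made up for illustration.

```stata
* Hypothetical sketch: every number and variable name here is an assumption
clear
set seed 20110401
set obs 10000
* accessibility and salience are independent in the full population
gen access = rnormal()
gen salience = rnormal()
corr access salience
* assume a reader responds only if access + salience + luck clears a cutoff
gen propensity = access + salience + rnormal()
gen responds = propensity > 2
* among respondents the two drivers of participation are negatively correlated
corr access salience if responds
```

The first `corr` should come out near zero and the second clearly negative, even though nothing in the setup links accessibility to salience.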

That’s it. We’re done. No need to speculate about Indiana Jones and the Freezer Burn of Doom.

• Lisa sends along this set of instructions for doing a wide-long reshape in R. Useful and I’m passing it along for the benefit of R users, but the relative intuition and simplicity of “reshape wide stub, i(i) j(j)” is why I still do my mise en place in Stata whenever I use R. Ideally though, as my grad student Brooks likes to remind me, we really should be doing this kind of data mise en place in a dedicated database and use the Stata and R ODBC commands/functions to read it in.
• “The days change at night, change in an instant.”
• Anyone interested in replicating this paper should be paying close attention to this pending natural experiment. In particular I hope the administrators of this survey are smart enough to oversample California in the next wave. I’d consider doing the replication myself but I’m too busy installing a new set of deadbolts and adopting a dog from a pit bull rescue center.
• In Vermont, a state government push to get 100% broadband penetration is using horses to wire remote areas that are off the ~~supply curve~~ beaten path. I see this as a nice illustration both of cluster economies and of the different logics used by markets (market clearing price) and states (fairness, which often cashes out as universal access) in the provision of resources. (h/t Slashdot)
• Yglesias discusses some poll results showing that voters in most of the states that recently elected Republican governors now would have elected the Democrats. There are no poll results for California, the only state that switched to the Democrats last November. Repeat after me: REGRESSION TO THE MEAN. I don’t doubt that some of this is substantive backlash to overreach on the part of politically ignorant swing voters who didn’t really understand the GOP platform, but really, you’ve still got to keep in mind REGRESSION TO THE MEAN.
• Speaking of Yglesias, the ThinkProgress redesign only allows commenting from Facebook users, which is both a pain for those of us who don’t wish to bear the awesome responsibility of adjudicating friend requests and a nice illustration of how network externalities can become coercive as you reach the right side of the s-curve.

### Apples to Apples

| Gabriel |

At Slate’s XX blog, Amanda Marcotte discusses a report by Brad Wilcox (a good friend of mine from grad school) and basically asks if we can be sure that marriage actually benefits people or is it just selection:

Sure, marriage chauvinists can point to things such as marriage’s impact on health and well-being, and to the fact that married men are less anti-social. I’m skeptical, though, because these kinds of studies lump all nonmarried people into one group. People who are in long term, committed relationships without that piece of paper are put in the same group as people who’ve never held a relationship together. I want to see apples to apples comparisons. How do unmarried people who’ve been together for five or 10 years hold up next to people who have been together that long but tied the knot in their first year or two together?

Let’s put aside the fact that cohabitating couples are indeed different from married couples even holding constant duration. There’s an interesting question here and it has a lot to do with what you mean by “apples to apples” and how such commensuration works under different logical premises. The traditional approach is to throw controls at something. So following this logic, Marcotte is exactly right to say that we ought to be controlling for things like duration of the union.

However traditional (or common in scholarly practice) such an implication may be, it just ain’t right. Suppose that a couple in their mid-to-late twenties came to you and said, “We are deeply in love and committed to each other. We’d probably like to have children at some point. Given that we have no religious reasons to get married, should we marry or just move in together?” It would be silly to base your advice to this young couple on the expected quality of their union conditional on it not dissolving. The reason is that one of the worst things that is likely to happen to this couple is breaking up and all the ugliness this entails, especially if they have kids. If dissolution is much more likely if they are cohabiting then this is worth factoring in rather than assuming away. The total treatment effect of marriage versus cohabitation needs to count from formation forward and include all end states rather than using a cross-sectional sample of survivors (which from the viewpoint of formation is systematically censored) and using tenure as a variable. The same logic applies in more extreme forms to tournament model labor markets. If you want to know whether being a rock star is a good career aspiration you shouldn’t look just at rock stars but also at the far more numerous people who seriously pursued careers in pop music but nonetheless failed at it.
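To make the difference between the two comparisons concrete, here is a minimal sketch with invented numbers. None of the dissolution rates or outcome values come from real demographic data, and the setup deliberately assumes that marriage has no effect at all on the quality of unions that stay intact.

```stata
* Hypothetical numbers throughout; none come from real demographic data
clear
set seed 12345
set obs 20000
gen married = runiform() < 0.5
* assumed dissolution risk: 20% if married, 50% if cohabiting
gen dissolved = runiform() < cond(married, 0.2, 0.5)
* assumed outcome: intact unions average 1, dissolved unions average -1,
* with no direct effect of marriage on the quality of intact unions
gen outcome = cond(dissolved, -1, 1) + rnormal()
* survivors only: marriage looks irrelevant
reg outcome married if !dissolved
* everyone, counted from formation: marriage matters through dissolution
reg outcome married
```

Under these assumptions the survivors-only regression should show a coefficient near zero, while the regression counted from formation should show a substantial marriage effect, which is the part that conditioning on survival assumes away.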

OK, let’s make a slightly more charitable argument and assume that Marcotte understands everything I just said but thinks that there is selection into cohabitation versus marriage such that most of these ephemeral cohabitations would have dissolved even if they were marriages. In particular she seems to have in mind some kind of selection on the unobservable of “commitment.” This is a fair point in part but unlikely to change anything much. First, if an unobservable is correlated with observables then omitted variable bias is mitigated. Second, something like unobservable “commitment” is likely to be a relatively ephemeral inclination and most of its impact should be early in the union whereas the hazard for dissolution of cohabitation remains persistently high rather than crashing after a few years like the hazard for divorce. We have support for this in qualitative data. Contrary to “that baby ain’t mine bitch” stereotypes, a very high proportion of unwed fathers express a commitment to a newborn child and (to a slightly lesser extent) the child’s mother. Nonetheless unless they marry the mothers, within a few years most of these fathers will have broken up with the mothers and an almost inevitable consequence of the dissolution is much less involvement with the child than the father initially expressed the intention to maintain. I’m generally inclined to think that accounts are as much cultural scripts as reliable articulations of preferences and logics of action, but the rich qualitative work in this area seems to have established that the men are sincere and so I’m willing to say that if you’re not convinced by the last 20 years of qualitative and quantitative demography that there really is a substantial treatment effect of marriage then the burden of proof is on you to prove that there’s not.

### Conditioning on a Collider Between a Dummy and a Continuous Variable

| Gabriel |

In a post last year, I described a logical fallacy of sample truncation that helpful commenters explained to me is known in the literature as conditioning on a collider. As is common, I illustrated the issue with two continuous variables, where censorship is a function of the sum. (Specifically, I used the example of physical attractiveness and acting ability for a latent population of aspiring actresses and an observed population of working actresses to explain the paradox that Megan Fox was considered both “sexiest” and “worst” actress in a reader poll).

In revising my notes for grad stats this year, I generalized the problem to cases where at least one of the variables is categorical. For instance, college admissions is a censorship process (only especially attractive applicants become matriculants) and attractiveness to admissions officers is a function of both categorical (legacy, athlete, artist or musician, underrepresented ethnic group, in-state for public schools or out-of-state for private schools, etc) and continuous distinctions (mostly SAT and grades).

For simplicity, we can restrict the issue just to SAT and legacy. (See various empirical studies and counterfactual extrapolations by Espenshade and his collaborators for how it works with the various other things that determine admissions.) Among college applicant pools, the children of alumni of prestigious schools tend to score about a hundred points higher on the SAT than do other high school students. Thus the applicant pool looks something like this.

However, many prestigious colleges have policies of preferring legacy applicants. In practice this means that the child of an alum can still be admitted with an SAT score about 150 points below non-legacy students. Thus admission is a function of both SAT (a continuous variable) and legacy (a dummy variable). This implies the paradox that the SAT scores of legacies are about half a sigma above average for the applicant pool but about a full sigma below average in the freshman class, as seen in this graph.

Here’s the code.

```
clear
set obs 1000
* half the applicants are legacies
gen legacy=0
replace legacy=1 in 1/500
lab def legacy 0 "Non-legacy" 1 "Legacy"
lab val legacy legacy
* legacies average 100 points higher in the applicant pool
gen sat=0
replace sat=round(rnormal(1100,250)) if legacy==1
replace sat=round(rnormal(1000,250)) if legacy==0
lab var sat "SAT score"
recode sat -1000/0=0 1600/20000=1600 /*top code and bottom code*/
graph box sat, over(legacy) ylabel(0(200)1600) title(Applicants)
graph export collider_collegeapplicants.png, replace
graph export collider_collegeapplicants.eps, replace
ttest sat, by(legacy)
* admission rule: legacies get in with scores 150 points lower
keep if (sat>1400 & legacy==0) | (sat>1250 & legacy==1)
graph box sat, over(legacy) ylabel(0(200)1600) title(Admits)
ttest sat, by(legacy)
*have a nice day
```

### Or you could just do regressions

| Gabriel |

Over at the “Office Hours” podcast (née Contexts podcast), Jeremy Freese gives an interview about sociology and genetics. The main theme of it is that when you have a model characterized by nonlinearity, positive feedback, and other sorts of complexity, you can get misleading results from models with essentially additive assumptions like the models we use to calculate heritability coefficients. (Heritability is closely analogous to a Pearson correlation coefficient. It is usually calculated from data about outcomes for fraternal vs identical twins and uses reasonable assumptions about how much genetics these twins share, respectively 0.5 vs 1.0).
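As a point of reference, the textbook way to turn twin correlations into a heritability estimate is Falconer’s formula. This assumes it is the kind of calculation at issue, since nothing above says which estimator is meant, and the symbols are the standard ones rather than Jeremy’s.

$$
h^2 = 2 \, (r_{MZ} - r_{DZ})
$$

Here r_MZ and r_DZ are the outcome correlations for identical and fraternal twin pairs. The factor of 2 comes from the assumed gap in genetic sharing (1.0 − 0.5), and the formula is additive by construction, which is exactly the assumption under discussion.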

Jeremy gives the example that if people have small differences in natural endowments, but they specialize in human capital formation in ways that play to their endowments, then this will show up as very high heritability. Jeremy suggests this is misleading since the actual genetic impact on initial endowment is relatively small. I agree in a sense, but in another way, it’s not misleading at all. That is, the heritability coefficient is accurately reflecting that a condition is a predictable consequence of genetics even if the causal mechanism is in some sense social rather than entirely about amino acids.

This is exactly the same issue as an argument I had with one of my co-authors a few years ago. We were studying how pop songs spread across radio and working out how much of this was endogenous (stations imitating each other) versus exogenous (stations all imitating something else). The argument was how to understand the effects of the pop charts published in Billboard and Radio & Records. One of my co-authors was arguing that these are not radio stations but periodicals and therefore should be considered exogenous to the system of radio stations. The other author and I held the position that appearing on the pop charts is an entirely predictable consequence of being played by a lot of radio stations and therefore it is endogenous, even if the effect is proximately channeled through something outside the system. I believe this is true in an ontological sense but it’s also a convenient belief since it’s necessary to make the math work.

Anyway, back to Jeremy’s case, you have a lot of things that are predictable outcomes of genetic endowment but for the sake of argument we can assume that we are really dealing with a small initial effect that is greatly magnified by a social mechanism. I would submit that in the current set of social circumstances the heritability coefficient as naively measured is very informative. This is sometimes contrasted with how informative it is in the abstract, but if you take gene-environment interdependence (or any complex system) seriously, then “in the abstract” is a meaningless concept. Rather you can only think about a counterfactual heritability coefficient in a counterfactual social system. This calls out for counterfactual causality logic to see how effects vary on different margins, etc, of the sort developed by Pearl and operationalized for social scientists by Morgan and Winship.

Currently, American social structure allows a lot of self-assignment to different trajectories, including an expensive (at both the personal and societal level) system of “second chances” for people to get back into the academic trajectory whether they show much aptitude for it or not and have sufficient remaining years in the labor market to amortize the human capital expense or not. As such there is sorting but it’s fairly subtle and to a substantial extent voluntary. This is the situation Jeremy describes in his stylized example of people voluntarily accruing human capital to complement natural endowments.

We can contrast this with two hypothetical scenarios. In counterfactual A, imagine that we had perfect sorting to match aptitude to development. Think of how the military uses the ASVAB to assign recruits to occupational specialties. Better yet, imagine some perfectly measured and perfectly interpreted genetic screen for aptitudes measured at birth, and on that basis we sent people from daycare onwards into a humanities track, a hard science track, or various blue collar vocational tracks with no opportunity for later transfers between tracks. That is, in this scenario we would see much stronger sorting to match aptitude and career than in the status quo. In counterfactual B, we can imagine that people are again permanently and coercively tracked, but tracking is assigned by a roulette wheel. That is, there would be no association between endowments and later experiences. In these two scenarios we could puzzle out a variety of consequences. Aside from the degradation of freedom taken as an assumption of the counterfactuals, the most obvious implications are that higher sorting would increase the dispersion of various outcome measures and the apparent heritability effect whereas random sorting would decrease outcome dispersion and measured heritability.
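The two counterfactuals are easy to mock up. This is a toy sketch under heavy assumptions: outcomes are just endowment plus a track effect of plus or minus one, and the squared correlation between outcome and endowment stands in crudely for heritability. None of these names or numbers come from the interview.

```stata
* Hypothetical sketch; the functional form and every number are assumptions
clear
set seed 2010
set obs 10000
gen endowment = rnormal()
* counterfactual A: track assigned strictly by endowment
gen track_sorted = cond(endowment > 0, 1, -1)
* counterfactual B: track assigned by roulette wheel
gen track_random = cond(runiform() < 0.5, 1, -1)
gen outcome_sorted = endowment + track_sorted
gen outcome_random = endowment + track_random
* dispersion of outcomes under each regime
summarize outcome_sorted outcome_random
* share of outcome variance predictable from endowment,
* a crude stand-in for measured heritability
corr outcome_sorted endowment
display "sorted R2 = " r(rho)^2
corr outcome_random endowment
display "random R2 = " r(rho)^2
```

In this setup perfect sorting should produce both a larger standard deviation of outcomes and a higher R-squared than roulette-wheel assignment, in line with the implications described above.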

When people talk about heritability coefficients being biased as high, they seem to have in mind something like the random sorting model. This model strikes me as only useful as a thought experiment to establish the lower bounds of heritability since in the real world a Harrison Bergeron dystopia isn’t terribly likely. Rather we can think of scenarios that are roughly similar to reality, but vary on some margin. For instance, we can imagine how various policies (e.g., merit scholarships vs. need-based scholarships) might increase or decrease the sorting of genetic endowment and complementary human capital development on the margin and by extension what impact this would have on the distribution and covariation of outcomes.

[Update 10/22/2010: On further reflection, I can think of a scenario where a naive reading of heritability coefficients would still strike me as grossly misleading, even if it were reliable, and I would prefer the “random assignment” counterfactual as “true” heritability. Imagine a society that is genetically homogeneous as to skin pigmentation genes, but where having detached earlobes was a social basis for assigning people to work indoors. In this scenario, there would be non-trivial heritability for skin color even though (by assumption) this society has no variance (and hence no heritability) for genes directly affecting pigmentation. Similarly, imagine a society where children without cheek dimples were exposed to ample lead and inadequate iodine, thereby making the undimpled into a hereditary caste of half-wits even though the genes that create dimples have no direct effect on g. I suppose what I’m getting at is that social mechanisms that select on and magnify genetic endowments are one thing, whereas social processes based on completely orthogonal stigma are another.]

### Sampling on the independent variables

| Gabriel |

At Scatterplot, Jeremy notes that in a reader poll, Megan Fox was voted both “worst” and “sexiest” actress. Personally, I’ve always found Megan Fox to be less sexy than a painfully deliberate simulacrum of sexy. The interesting question Jeremy asks is whether this negative association is correlation or causation. My answer is neither, it’s truncation.

What you have to understand is that the question is implicitly about famous actresses. It is quite likely that somewhere in Glendale there is some barista with a headshot by the register who is both fugly and reads lines like a robot. However this person is not famous (and probably not even Taft-Hartleyed). If there is any meritocracy at all in Hollywood, the famous are — on average — going to be desirable in at least one dimension. They may become famous because they are hot or because they are talented, but our friend at the Starbucks on Colorado is staying at the Starbucks on Colorado.

This means that when we ask about the association of acting talent and sexiness amongst the famous, we have censored data where people who are low on both dimensions are censored out. Within the truncated sample there may be a robust negative association, but the causal relationship is very indirect, and it’s not as if having perky breasts directly obstructs the ability to convincingly express emotions (a botoxed face on the other hand …).

You can see this clearly in simulation (code is at the end of the post). I’ve modeled a population of ten thousand aspiring actresses as having two dimensions, body and mind, each of which is drawn from a random normal. As built in by assumption, there is no correlation between body and mind.

Stars are a subsample of aspirants. Star power is defined as a Poisson centered on the sum of body and mind (and re-centered to avoid negative values). That is, star power is a combination of body, mind, and luck. Only the 10% of aspirants with the most star power become famous. If we now look at the correlation of body and mind among stars, it’s negative.

This is a silly example, but it reflects a serious methodological problem that I’ve seen in the literature and I propose to call “sampling on the independent variable.” You sometimes see this directly in the sample construction when a researcher takes several overlapping datasets and combines them. If the researcher then uses membership in one of the constituent datasets (or something closely associated with it) to predict membership in another of the constituent datasets (or something closely associated with it), the beta is inevitably negative. (I recently reviewed a paper that did this and treated the negative associations as substantive findings rather than methodological artifacts).

Likewise, it is very common for a researcher to rely on prepackaged composite data rather than explicitly creating original composite data. For instance, consider that favorite population of econ soc, the Fortune 500. Fortune defines this population as the top 500 firms ranked by sales. Now imagine decomposing sales by industry. Inevitably, sales in manufacturing will be negatively correlated with sales in retail. However this is an artifact of sample truncation. In the broader population the two types of sales will be positively correlated (at least among multi-dimensional firms).
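The Fortune 500 version can be sketched the same way. This is a toy illustration, not real firm data: sales in each line of business are treated as standardized scores, and the population correlation of 0.3 and the pool of 10,000 firms are assumptions picked for the example.

```stata
* Hypothetical sketch: standardized "sales" scores and a 0.3 population
* correlation are assumptions chosen for illustration
clear
set seed 500
set obs 10000
gen manuf = rnormal()
* retail correlates about +0.3 with manufacturing in the full population
gen retail = 0.3*manuf + sqrt(1 - 0.3^2)*rnormal()
corr manuf retail
scalar rho_all = r(rho)
* keep only the top 500 firms ranked by total sales
gen total_sales = manuf + retail
gsort -total_sales
keep in 1/500
corr manuf retail
scalar rho_top = r(rho)
display "population: " rho_all "   top 500: " rho_top
```

The population correlation should be positive by construction, while the correlation among the top 500 should flip to negative purely because of the ranking on the sum.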

```
clear
set obs 10000
gen body=rnormal()
gen mind=rnormal()
*corr in the population
corr body mind
scatter body mind
graph export bodymind_everybody.png, replace
*keep only the stars
gen talent=body+mind+3
recode talent -100/0=0
gen stardom=rpoisson(talent)
gsort -stardom
keep in 1/1000
*corr amongst stars
corr body mind
scatter body mind
graph export bodymind_stars.png, replace
```