Publication bias

March 17, 2009 at 3:23 pm

| Gabriel |

One of the things I try to stress to my grad students is all the ways that stats can go wrong. Case in point is publication bias (the tendency of scientists to abandon, and journals to reject, work that is not statistically significant). The really weird thing about publication bias is that it makes the p-value mean different things depending on where you read it. When you run numbers in Stata and it tells you “p<.05” it more or less means what you think it does (i.e., the probability of seeing a result at least this extreme if the null were true). However, when you publish that same result and I read it, I should interpret the p-value more conservatively.

The reason is that when you get a null finding you tend to give up, and if you are so tenacious as to submit it anyway, the peer reviewers tend to reject it. So if you think of the literature as a sample, the null findings tend to be censored out. This wouldn’t necessarily be a problem except that the literature does not also censor false positives. We would expect there to be rather a lot of false positives, since the conventional alpha of .05 means that about 1 in 20 analyses of pure noise would appear statistically significant.

Really what you want the p-value to mean is “what’s the probability that this result is a fluke?” When it first comes out of Stata it basically has that interpretation. But once it’s gone through the peer review process a much more relevant question is:

p(fluke | published) = p(published | fluke) × p(fluke) / p(published)

Yes, this makes my head hurt too, and it’s even worse because the quantity on the left-hand side of the equation doesn’t appear anywhere in the tables. But the take-home is that the censorship process of peer review implies that published p-values are too generous, even if you assume no problems with specification, measurement error, etc. (and there’s no reason to assume away those problems).
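The question above is just Bayes’ rule, and you can put rough numbers on it. Here is a back-of-the-envelope calculation (a Python sketch; the share of true nulls and the power figure are purely illustrative assumptions, not estimates from any literature):

```python
# Illustrative assumptions: suppose half of all tested hypotheses are
# truly null, alpha = .05, and the power of a typical study is .80.
def p_fluke_given_published(p_null=0.5, alpha=0.05, power=0.80):
    """P(null is true | result is significant), by Bayes' rule.
    Publication acts as the conditioning event: only significant
    results survive the censorship."""
    p_sig = alpha * p_null + power * (1 - p_null)  # P(published)
    return alpha * p_null / p_sig                  # P(fluke | published)

print(round(p_fluke_given_published(), 3))            # 0.059, already > .05
print(round(p_fluke_given_published(p_null=0.9), 3))  # 0.36 in a mostly-null field
```

Even under these friendly assumptions the published false-positive rate exceeds the nominal alpha, and in a field where most tested hypotheses are null it blows up.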

Anyway, that’s the logic of publication bias and the most famous study of it in the social sciences is by Card and Krueger. They were trying to explain why, in a previous study, they found that an increase in the minimum wage increases employment of unskilled labor. This finding is pretty surprising since the law of supply and demand predicts that an exogenous increase in the price (of labor) will lead to a decrease in the quantity (of labor) demanded. Likewise, a decent-sized literature had findings consistent with the orthodox prediction. Card and Krueger therefore had to explain why their (well-designed) PA/NJ natural experiment was so anomalous. Against the theoretical argument they basically argued that various kinds of friction lead to the “assume a can opener” version of the theory not bearing out. This sounds plausible enough, though I think it’s very likely that such friction is most relevant in the short run and for fairly small changes in price, so I would be very skeptical that their finding about a $1 increase in the minimum wage would generalize to, say, a $10 increase.

The more interesting thing they did was argue that the literature was censored and that there were in fact a large number of studies that either found no effect or (like their PA/NJ study) a small positive effect on employment, but that these studies were never published. This sounds like an eminently unprovable theory of the sort given by stubborn paranoids, but in fact it has testable empirical implications, which they demonstrated in a meta-analysis of minimum wage studies. Specifically, statistical significance is a function of effect size and the square root of sample size, so weaker effects reach “significance” only with larger samples. Therefore a spurious literature should show a negative correlation between n and beta but no correlation between n and t. On the other hand, a true literature should not be censored, and therefore n should have no correlation with beta and a positive correlation with t. As an illustration for my grad students I wrote the simulation below, which shows this to be the case.
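The n-versus-beta and n-versus-t predictions are easy to check directly. Here is a compact version of the same exercise as the Stata do-file below, written as a Python sketch (the sample sizes and the .2 effect mirror the Stata parameters; the function names and everything else are my own illustration):

```python
import random
import statistics

def run_studies(true_effect, ns, trials=500, seed=42):
    """Simulate `trials` two-group studies at each sample size in `ns`.
    Each study compares two groups of n/2 standard-normal draws, with the
    treated group's mean shifted by `true_effect` (0 => spurious literature).
    Returns (n, beta, t) for every study run."""
    rng = random.Random(seed)
    results = []
    for n in ns:
        half = n // 2
        for _ in range(trials):
            control = [rng.gauss(0, 1) for _ in range(half)]
            treated = [rng.gauss(true_effect, 1) for _ in range(half)]
            beta = statistics.mean(treated) - statistics.mean(control)
            se = (statistics.variance(control) / half +
                  statistics.variance(treated) / half) ** 0.5
            results.append((n, beta, beta / se))
    return results

def corr(xs, ys):
    """Pearson correlation."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) *
           sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

ns = [200, 400, 600, 800, 1000]

# Spurious literature: only |t| >= 1.96 results get "published".
pub = [(n, b, t) for n, b, t in run_studies(0.0, ns) if abs(t) >= 1.96]
print(corr([r[0] for r in pub], [abs(r[1]) for r in pub]))  # n vs |beta|: negative
print(corr([r[0] for r in pub], [abs(r[2]) for r in pub]))  # n vs |t|: near zero

# True literature (effect = .2), uncensored: every study is published.
true_lit = run_studies(0.2, ns)
print(corr([r[0] for r in true_lit], [r[1] for r in true_lit]))  # n vs beta: near zero
print(corr([r[0] for r in true_lit], [r[2] for r in true_lit]))  # n vs t: positive
```

In the censored spurious literature the published betas have to shrink as n grows (smaller standard errors let smaller flukes clear the t>=1.96 bar), while the published t statistics cluster just above the threshold at every n.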

In a typical run, the simulation produced this graph:

[box plot: “Simulation of Reliable Literature Vs Publication Bias” — the range of published t statistics by sample size, for the true-effect and spurious-findings literatures]

begin code
*this Stata do file illustrates publication bias by simulating two literatures
*   in each case a binary variable is used to predict a continuous variable
*   in literature "spurious" there is no true underlying effect
*   in literature "true" there is an underlying effect of an arbitrary size defined by the parameter "e"
*   multiple studies with varying n are simulated for both literatures and only statistically significant results are "published"
*   finally, it shows the distribution of "published" results for both literatures
*dependency: the estout package (to install: "ssc install estout, replace")
*since i'm not very good with scalars, etc, this program involves writing some temp files to disk. i suggest that before running the file you "cd" to a convenient temp directory for later elimination

global trials=2000
*each literature gets $trials initial studies (ie, potential publications) at each sample size (defined below)
*   this is an unrealistically large number but is meant to imply inference to an infinite number of studies

global e=.2
*the "true" literature's underlying model is Y=0 + $e * X
*   where X is a binary variable found half of the time
*the "spurious" literature's underlying model is Y=0
*in both cases Y is drawn from a normal

capture program drop pubbias
program define pubbias
 set more off
 clear /*start fresh so "set obs" below works even if data are in memory*/
 capture macro drop n effect effect00
 global effect `1' /* size of the true effect. should be between 0 and 1 */
 global n      `2' /*how large should the sample size be per trial*/
 global effect00=$effect*100
 set obs 1
 gen v2="t"
 outsheet using e$n.txt, replace
 set obs $n
 gen predictor=0
 local halfn=($n/2) +1
 replace predictor=1 in `halfn'/$n
 gen fakep=predictor
 gen effect=rnormal()
 gen fakeeffect=rnormal()
 forvalues t=1/$trials {
  replace effect=rnormal()
  replace effect=rnormal()+$effect if predictor==1 /*note, the effect is created here for the true model */
  replace fakee=rnormal()
  regress effect predictor
  estout using e$n.txt, cell(t) append
  regress fakee fakep
  estout using e$n.txt, cell(t) append
  disp "iteration `t' of $trials complete"
 }
 insheet using e$n.txt, clear
 keep if v1=="predictor" | v1=="fakep"
 gen t=real(v2)
 gen published=0
 replace published=1 if t>=1.96 /*note that this is where the censorship occurs, you can manipulate alpha to show type I vs II error trade-off. likewise you can make the criteria more complicated with a weighted average of t and n or by adding a noise element */
 gen pubbias=0
 replace pubbias=1 if v1=="fakep"
 keep published t pubbias
 gen n=$n
 save e$n.dta, replace
end

pubbias $e 200

pubbias $e 400

pubbias $e 600

pubbias $e 800

pubbias $e 1000

use e200, clear
append using e400
append using e600
append using e800
append using e1000
sort n pubbias

*traditionally, meta-analysis does a scatterplot, but i use a boxplot to avoid either having dots superimposed on each other or having to add jitter

lab def pubbias 0 "true effect" 1 "spurious findings"
lab val pubbias pubbias

graph box t if published==1, over(n, gap(5) label(angle(vertical))) over(pubbias) title("Simulation of Reliable Literature Vs Publication Bias") ytitle("Range of T Statistics Across Published Literature") note("Y is drawn from a standard normal. T is for beta, which is $e in the true lit and 0 in the false lit.")

*have a nice day
end code

