Posts tagged ‘philosophy of science’

You Broke Peer Review. Yes, I Mean You

| Gabriel |

I’m as excited as anybody about Sociological Science as it promises a clean break from the “developmental” model of peer review by moving towards an entirely evaluative model. That is, no more anonymous co-authors making your paper worse with a bunch of non sequiturs or footnotes with hedging disclaimers. (The journal will feature frequent comments and replies, which makes debate about the paper a public dialog rather than a secret hostage negotiation.) The thing is, though, that optimistic as I am about the new journal, I don’t think it will replace the incumbent journals overnight, and so we still need to fix review at the incumbent journals.

So how did peer review get broken at the incumbent journals?


You and I broke it.

Your average academic’s attitude towards changes demanded in R&R is like the Goofy cartoon “Motor Mania.”

In this cartoon Goofy is a meek pedestrian dodging aggressive drivers but as soon as he gets behind the wheel himself he drives like a total asshole. Similarly, as authors we all feel harassed by peer reviewers who try to turn our paper into the paper they would have written, but then as reviewers ourselves we develop an attitude of “yer doin it wrong!!!” and start demanding they cite all our favorite articles and with our favorite interpretations of those articles. (Note that in the linked post, Chen is absolutely convinced that she understands citations correctly and the author has gotten them wrong out of carelessness, without even considering the possibility that the interpretive flaw could be on her end or that there might be a reasonable difference of opinions).

So fixing peer review doesn’t begin with you, the author, yelling at your computer “FFS reviewer #10, maybe that’s how you would have done it, but it’s not your paper” (and then having a meeting with your co-authors that goes something like this):

And then spending the next few months doing revisions that feel like this:

And finally summarizing the changes in a response memo that sounds like this:

Nor, realistically, can fixing peer review happen from the editors telling you to go ahead and ignore comments 2, 5, and 6 of reviewer #6. First, it would be an absurd amount of work for the editors to adjudicate the quality of comments. Second, from the editor’s perspective the chief practical problem is recruiting reviewers and getting timely reviews from them and so they don’t want to alienate the reviewers by telling them that half their advice sucks in their cover letter any more than you want to do that in your response memo.

Rather, fixing peer review has to begin with you, the reviewer, telling yourself “maybe I would have done it another way myself, but it’s not my paper.” You need to adopt a mentality of “is it good how the author did it” rather than “how could this paper be made better” (read: how would I have done it). That is the whole of being a good reviewer, the rest is commentary. That said, here’s the commentary.

Do not brainstorm
Responding to a research question by brainstorming possibly relevant citations or methods is a wonderful and generous thing to do when a colleague or student mentions a new research project but it’s a thoroughly shitty thing to do as a peer reviewer. There are a few reasons why the same behavior is so different in two different contexts.

First, many brainstormed ideas are bad. When I give you advice in my office, you can just quietly ignore the ideas I give you that don’t work or are superfluous. When I give you advice as a peer reviewer there is a strong presumption that you take the advice even if it’s mediocre which is why almost every published paper has a couple of footnotes along the lines of “for purposes of this paper we assume that water is wet” or “although it has almost nothing to do with this paper, it’s worth noting that Author (YEAR) is pretty awesome.” Of course some suggestions are so terrible that the author can’t take them in good conscience but in such cases the author needs to spend hours or days per suggestion writing an apologetic and extremely deferential memo apologizing for not implementing the reviewer’s suggestions.

Second, many brainstormed ideas are confusing. When I give you advice in my office you can ask follow-up questions about how to interpret and implement it. When I give advice as a peer reviewer it’s up to you to hope that you read the entrails in a way that correctly augurs the will of the peer reviewers. As a related point, be as specific as possible. “This paper needs more Bourdieu” is a not terribly useful comment (indeed, “cite this” comments without further justification are usually less about any kind of intellectual content than they are about demanding shibboleths or the recitation of a creedal confession) whereas it might actually be pretty helpful to say “your argument about the role of critics on pages 4-5 should probably be described in terms of restricted field from Bourdieu’s Field of Cultural Production.” (Being specific has the ancillary benefit that it’s costly to the reviewer which should help you maintain the discipline to thin the mindfart herd stampeding into the authors’ revisions.)

Third, ideas are more valuable at the beginning of a project than at the end of it. When I give you advice about your new project you can use it to shape the way the project develops organically. When I give it to you as a reviewer you can only graft it on after the fact. My suggested specification may check the robustness of your finding or my suggested citation may help you frame your theory in a way that is more appealing, but they can’t help you develop your ideas because that ship has sailed.

That’s not to say that you shouldn’t give an author advice on how to fix problems with the paper. However, it is essential to keep in mind that, no matter how highly you think of your own expertise and opinions, the author doesn’t want to hear it. When you give advice, think in terms of “is it so important that these changes be made that I upset the author and possibly delay publication at a crucial career point.” Imagine burning a $20 bill for every demand you make of the author and ask yourself if you’d still make it. Trust me, the author would pay a lot more than $20 to avoid it, and not just because dealing with comments is annoying but because it’s time-consuming and time is money. It usually takes me an amount of time that is at least the equivalent of a course release to turn around an R&R, and at most schools a course release in turn is worth about $10,000 to $30,000 if you’re lucky enough to raise the grants to buy one. If you think about the productivity of science as a sector, ask yourself if your “I’m just thinking out loud” comment that takes the author a week to respond to is worth a couple thousand dollars to society. I mean, I’ve got tenure so in a sense I don’t care but I do feel a moral obligation to give society a good value in exchange for the upper middle class living it provides me and I don’t feel like I’m getting society its money’s worth when I spend four months of full-time work to turn around one round of R&R instead of getting to my next paper. This brings me to my next point…

Distinguish demands versus suggestions versus synapses that happened to fire as you were reading the paper
A lot of review comments ultimately boil down to some variation on “this reminds me of this citation” or “this research agenda could go in this direction.” OK, great. Now ask yourself, is it a problem that this paper does not yet do these things or are these just possibilities you want to share with the author? Often as not they’re really just things you want to share with the author but the paper is fine without them. If so, don’t demand that the author do them. Rather just keep it to yourself or clearly demarcate these as optional suggestions that the author may want to consider, possibly for the next paper rather than the revision of this paper.

As a related issue, demonstrate some rhetorical humility. Taking a commanding and indignant tone doesn’t mean you know what you’re talking about. On a recent review I observed, I noticed that one of the reviewers whose (fairly demanding) comments seemed to reflect a deep understanding of the paper nonetheless used a lot of phrases like “might,” “consider,” “could help,” etc whereas another reviewer who completely missed the point of the paper was prone to phrases like “needs to” and “is missing.”

There’s wrong and then there’s difference of opinion
On quite a few methodological and theoretical issues there is a reasonable range of opinion. Don’t force the author to weigh in on your side. It may very well be appropriate to suggest that the author acknowledge the existence of a debate on this subject (and perhaps briefly explore the implications of the alternative view) but that’s a different thing from expecting that the author completely switch allegiances because error has no rights. Often such demands are tacit rather than explicit, just taking for granted that somebody should use, I don’t know, Luhmann, without considering that the author might be among the many people who if told “you can cite Luhmann or you can take a beating” would ask you “tell me more about this beating? will there be clubs involved?”


For instance, consider Petev ASR 2013. The article relies heavily on McPherson et al ASR 2006, which is an extremely controversial article (see here, here, and here). One reaction to this would be to say the McPherson et al paper is refuted and ought not be cited. However Petev summarizes the controversy in footnote 10 and then in footnote 17 explains why his own data is a semi-independent (same dataset, different variables) corroboration of McPherson et al. These footnotes acknowledge a nontrivial debate about one of the article’s literature antecedents and then situate the paper within the debate. No matter what your opinion of McPherson et al 2006, you should be fine with Petev relying upon and supporting it while parenthetically acknowledging the debate about it.

There are also issues of an essentially theoretical nature. I sat on one of my R&Rs for years in large part because I’m using a theory in its original version while briefly discussing how it would be different if we were to use a schism of the theory while one of the reviewers insists that I rewrite it from the perspective of the schismatic view. Theoretical debates are rarely an issue of decisive refutation or strictly cumulative knowledge but rather at any given time there’s a reasonable range of opinions and you shouldn’t demand that the author go with your view but at most that they explore its implications if they were to. Most quants will suggest robustness checks to alternative plausible model specifications without demanding that these alternative models be used in the actual paper’s tables. We should have a similar attitude towards treating robustness or scope conditions to alternative conceptions of theory as something for the footnotes rather than a root and branch reconceptualization of the paper.

There are cases where you fall on one side of a theoretical or methodological gulf and the author on another to the extent that you feel that you can’t really be fair. For instance, I can sometimes read the bibliography of a paper, see certain cites, and know instantly that I’m going to hate the paper. Under such circumstances you as the reviewer have to decide if you’re going to engage in what philosophers of science call “the demarcation problem” and sociologists of science call “boundary work” or you’re going to recuse yourself from the review. If you don’t like something but it has an active research program of non-crackpots then you should probably refuse to do the review rather than agreeing and inevitably rejecting. Note that the managing editor will almost always try to convince you to do the review anyway and I’ve never been sure if this is them thinking I’m giving excuses for being lazy and not being willing to let me off the hook, them being lazy about finding a more appropriate reviewer, or an ill-conceived principle that a good paper should be pleasing to all members of the discipline and thus please even a self-disclaimed hostile reader. Notwithstanding the managing editor’s entreaties, be firm about telling him or her, “no, I don’t feel I could be fair to a paper of type X, but please send me manuscripts of type Y or Z in the future.”

Don’t try to turn the author’s theory section into a lit review
The author’s theory section should motivate the hypotheses. The theory section is not about demonstrating basic competence or reciting a creedal confession and so it does not need to discuss every book or article ever published on the subject or even just the things important enough to appear on your graduate syllabus or field exam reading list. If “AUTHOR (YEAR)” would not change the way we understand the submission’s hypotheses, then there’s no good reason the author needs to cite it. Yes, that is true even if the “omitted” citation is the most recent thing published on the subject or was written by your favorite grad student who you’re so so proud of and really it’s a shame that her important contribution isn’t cited more widely. If the submission reminds you of a citation that’s relevant to the author’s subject matter, think about whether it would materially affect the argument. If it would, explain how it would affect the argument. If it wouldn’t, then either don’t mention it at all or frame it as an optional suggestion rather than berating the author for being so semi-literate as to allow such a conspicuous literature lacuna.

By materially affect the argument I mostly have in mind the idea that in light of this citation the author would do the analysis or interpret the analysis differently. This is not the same thing as saying “you do three hypotheses, this suggests a fourth.” Rather it’s about this literature shows that doing it that way is ill-conceived and you’re better off doing it this way. It’s simplest if you think about it in terms of methods where we can imagine a previous cite demonstrates how important it is for this phenomenon that one models censoring, specifies a particular form for the dependent variable, or whatever. Be humble in this sort of thing though lest it turn into brainstorming.

Another form of materially affecting the argument would be if the paper is explicitly pitched as novel but it is in fact discussing a well understood problem. It is not necessarily a problem if the article discusses an issue in terms of literature X but does not also review literature Y that is potentially related. However it is a problem if the author says nobody has ever studied issue A in fashion B when there is in fact a large literature from subfield Y that closely parallels what the author is pitching. More broadly, you should call the authors on setting up a straw man lit review, where one special case of that would be “there is no literature.” (Note to authors: be very careful with “this is unprecedented” claims.) Again, be humble in how you apply this lest it turn into a pretext for demanding that every article not only motivate its positive contribution, but also be prefaced with an exhaustive review that would be suitable for publication in ARS.

There is one major exception to the rule that a paper should have a theory section and not a lit review, which is when the authors are importing a literature that is likely to be unfamiliar to their audience and so they need more information than usual to get up to speed. Note though that this is an issue best addressed by the reviewers who are unfamiliar with the literature and for whom it is entirely appropriate to say something like “I was previously unfamiliar with quantum toad neuropathology and I suspect other readers will be as well so I ask that rather than assuming a working knowledge of this literature that the author please add a bit more background information to situate the article and point to a good review piece or textbook for those who want even more background.” Of course that’s rarely how the “do more lit review” comments go. Rather such comments tend to be from people with a robust knowledge of theory X and they want to ensure that the authors share that knowledge and gavage it into the paper’s front end. I’m speaking from personal experience as on several occasions I have used theories that are exotic to sociologists and while several of the reviewers said they were glad to learn about this new-to-them theory and how it fits with more mainstream sociology like peanut butter and chocolate, nobody asked for more background on it. And I’m cool with that since it means my original drafts provided sufficient background info for them to get the gist of the exotic theory and how it was relevant. Of course, I did get lots of “you talk about Bourdieu, but only for ten pages when you could easily go for twenty.” That is, nobody wants to know more about something they didn’t know before and need a little more background knowledge to get up to speed, but everybody wants to yell “play Freebird!” This is exactly backwards of how it should be.


Don’t let flattery give you a big head
It is customary for authors to express their gratitude to the reviewers. You might take this to mean, “ahhh, Gabriel’s wrong about R&Rs being broken,” or more likely “that may be true of other reviewers, but I provide good advice since, after all, they thank me for it.” Taking at face value an author who gushes about what a wonderful backseat driver you are is like watching a prisoner of war saying “I would like to thank my captors for providing me with adequate food and humane treatment even as my country engages in unprovoked imperialist aggression against this oppressed people.” Meanwhile he’s blinking “G-E-T-M-E-O-U-T-O-F-H-E-R-E” in Morse code.

Appreciate the constraints imposed on the author by the journal
Many journals impose a tight word count. When you ask an author to also discuss this or that, you’re making it very difficult for them to keep their word count. One of the most frustrating things as an author is getting a cover letter from the editor saying “Revise the manuscript to include a half dozen elaborate digressions demanded by the reviewers, but don’t break your word count.”

Some journals demand that authors include certain material and you need to respect that. ASR is obsessed with articles speaking to multiple areas of the discipline. This necessarily means that an article that tries to meet this mandate won’t be exclusively oriented towards your own subfield and it may very well be that its primary focus is on another literature and its interest in your own literature being secondary. Don’t view this as an incompetent member of your own subfield but as a member of another subfield trying (under duress) to build bridges to your subfield. Similarly some journals demand implications for social policy or for managers. Even if you would prefer value neutrality (or value-laden but with a different set of values) or think it’s ridiculous to talk as if firms will change their business practices because somebody did a t-test, appreciate that this may be a house rule of the journal and the author is doing the best she can to meet it.

Stand up to the editors
You can be the good guy. Or if necessary, you can demand a coup de grace. But either way you can use your role as a reviewer to push the editors and your fellow reviewers towards giving the authors a more streamlined review process.

First, you can respond to the other parts of the reviews and response memo from the previous round. If you think the criticisms were unfair or that the author responded to them effectively, go ahead and say so. It makes a big difference to the author if she can make explicit that the other reviewers are with her.

Second, you can cajole the editors to make a decision already. In your second round R&R review tell the editors that there’s never going to be a complete consensus among the reviewers and they should stop dicking the authors around with R&Rs. You can refuse to be the dreaded “new reviewer.” You can refuse to review past the first round R&R. You can tell the editors that you’re willing to let them treat your issues as a conditional accept adjudicated by them rather than as another R&R that goes back to you for review.

Just as important as being nice, you can tell the editors to give a clean reject. Remember, an R&R does not mean “good but not great” or “honorable mention” but “this could be rewritten to get an accept.” Some flaws (often having to do with sampling or generalizability) are of a nature that they simply can’t be fixed so even if you like the other aspects of the paper you should just reject. Others may be fixable in principle (often having to do with the lit review or model specification) but in practice doing so would require you to rewrite the paper for the authors and it benefits nobody for you to appoint yourself anonymous co-author. Hence my last point…

Give decisive rejections

I’ve emphasized how to be nice to the authors by not burdening them with superfluous demands. However it’s equally important to be decisive about things that are just plain wrong. I have a lot of regrets about my actions as a peer reviewer and if I were to go through my old review reports right now I’d probably achieve Augustinian levels of self-reproach. Many of them are of the nature of “I shouldn’t have told that person to cite/try a bunch of things that didn’t really matter because by so doing I was being the kind of spend-the-next-year-on-the-revisions-to-make-the-paper-worse reviewer I myself hate to get.” However, I don’t at all regret, for instance, a recommendation to reject that I wrote in which I pointed out that the micro-mechanisms of the author’s theory were completely incompatible with the substantive processes in the empirical setting and that the quantitative model was badly misspecified. Nor do I regret recommending to reject a paper because it relied on really low quality data and its posited theoretical mechanism was a Rube Goldberg device grounded in a widely cited but definitively debunked paper. Rather my biggest regret as a reviewer is that I noticed a manuscript had a grievous methodological flaw that was almost certainly entirely driving the results but I raised the issue in a hesitant fashion and the editor published the paper anyway. As I’ve acquired more experience on both sides of the peer review process, I’ve realized that being a good peer reviewer isn’t about being nice, nor is it about providing lots of feedback. Rather being a good reviewer is about evaluating which papers are good and which papers are bad and clearly justifying those decisions. I’m honored to serve as a consulting editor for Sociological Science because that is what that journal asks of us but I also aspire to review like that regardless of what journal I’m reviewing for and I hope you will too. (Especially if you’re reviewing my papers.)


November 18, 2013 at 9:07 am

Is Facebook “Naturally Occurring”?

| Gabriel |

Lewis, Gonzalez, and Kaufman have a forthcoming paper in PNAS on “Social selection and peer influence in an online social network.” The project uses Facebook data from the entire college experience of a single cohort of undergrads at one school in order to pick at the perennial homophily/influence question. (Also see earlier papers from this project).

Overall it’s an excellent study. The data collection and modeling efforts are extremely impressive. Moreover I’m very sympathetic to (and plan to regularly cite) the conclusion that contagious diffusion is over-rated and we need to consider the micro-motives and mechanisms underlying contagion. I especially liked how they synthesize the Bourdieu tradition with diffusion to argue that diffusion is most likely for taste markers that are distinctive in both senses of the term. As is often the case with PNAS or Science, the really good stuff is in the appendix and in this case it gets downright comical as they apply some very heavy analytical firepower to trying to understand why hipsters are such pretentious assholes before giving up and delegating the issue to ethnography.

The thing that really got me thinking though was a claim they make in the methods section:

Because data on Facebook are naturally occurring, we avoided interviewer effects, recall limitations, and other sources of measurement error endemic to survey-based network research

That is, the authors are reifying Facebook as “natural.” If all they mean is that they’re taking a fly on the wall observational approach, without even the intervention of survey interviews, then yes, this is naturally occurring data. However I don’t think that observational necessarily means natural. If researchers themselves imposed reciprocity, used a triadic closure algorithm to prime recall, and discouraged the deletion of old ties, we’d recognize this as a measurement issue. It’s debatable whether it’s any more natural if Mark Zuckerberg is the one making these operational measurement decisions instead of Kevin Lewis.

Another way to put this is to ask where does social reality end and observation of it begin? In asking the question I’m not saying that there’s a clean answer. On one end of the spectrum we might have your basic random-digit dialing opinion survey that asks people to answer ambiguously-worded Likert-scale questions about issues they don’t otherwise think about. On the other end of the spectrum we might have well-executed ethnography. Sure, scraping Facebook isn’t as unnatural as the survey but neither is it as natural as the ethnography. Of course, as the information regimes literature suggests to us, you can’t really say that polls aren’t natural either insofar as their unnatural results leak out of the ivory tower and become a part of society themselves. (This is most obviously true for things like the unemployment rate and presidential approval ratings).

At a certain point something goes from figure to ground and it becomes practical, and perhaps even ontologically valid, to treat it as natural. You can make a very good argument that market exchange is a social construction that was either entirely unknown or only marginally important for most of human history. However at the present the market so thoroughly structures and saturates our lives that it’s practical to more or less take it for granted when understanding modern societies and only invoke the market’s contingent nature as a scope condition to avoid excessive generalization of economics beyond modern life and into the past, across cultures, and the deep grammar of human nature.

We are, God help us, rapidly approaching a situation where online social networks structure and constitute interaction. Once we do, the biases built into these systems are no longer measurement issues but will be constitutive of social structure. During the transitional period we find ourselves in though, let’s recognize that these networks are human artifices that are in the process of being incorporated into social life. We need a middle ground between “worthless” and “natural” for understanding social media data.

December 22, 2011 at 11:07 am

We have a protractor

| Gabriel |

Neal Stephenson’s Anathem opens with the worst instrument in the history of survey research. A monastery of cloistered science-monks is about to open its gates to the outside world for a brief decennial festival and they are interviewing one of the few ordinary people with whom they have regular contact about what they can expect outside. The questions are as vague and etic as imaginable and the respondent has a hard time interpreting them. The reason the questions are so bad is that the monks are almost completely cut off from society and the instrument has been in continuous use for millennia.

The monks call society the “secular world” which sounds strange given that these monks are atheists, but makes sense if you remember that “secular” means “in time” and in English we use this word to mean “non-religious” because St. Augustine argued that God exists outside of time and Pope Gelasius elaborated this argument to develop a theory of the Church’s magisterium. Anyway, the monks in Anathem are so separate from society that in a very real sense they too exist outside of time. To the extent that the outside world does impinge on their experience, it is mostly with the threat of a “sack,” an anti-intellectual pogrom that tends to happen every few centuries.

Thus the novel, especially the first third, is primarily a thought experiment about what it would look like if we were to take the ivory tower as a serious aspiration. I mean, imagine never struggling to figure out what the broader impacts are of your research for purposes of a grant proposal because you’re opposed in principle to the very idea of broader impacts and strive for such perfect lack of them that you asymptotically approach “causal domain shear,” meaning that nothing in the monastery affects the outside world and vice versa. Also, you never go through tenure review because you can stay in the monastery as long as you want (intellectual deadweight is gently shifted to full-time semi-skilled manual labor). OK, there are some pretty big downsides compared to academia here on Earth. You have to do all your calculations by hand as you are forbidden computers and most other modern technology. You spend half the day chanting and gardening. Your only possessions are a bathrobe and a magic beach ball. When you break the rules they punish you by sending you into solitary for a few months to study for a field exam on gibberish. And as previously mentioned, once or twice every thousand years the yokels storm your office and lynch you.

The most easily ridiculed and stereotypically science-fiction-y aspect of the book is the abundance of neologisms. When I started reading it, I found the whole alternative vocabulary very distracting and I did a lot of on-the-fly translation from Stephenson to English. I mean, I understand the need to coin terms like “centenarian” (a member of a monastery whose cloistered status is relaxed only once a century) when there is no good English equivalent, but it’s mostly* gratuitous to talk about a “Counter-Bazian ark” instead of a “Protestant church,” a “jeejah” instead of an “iPhone,” “Syntactic Faculty” instead of “nominalism,” “Semantic Faculty” instead of “naturalism,” or “Orth” instead of “Latin.” Likewise, I found myself constantly interpreting the dates by adding 2200.** Fortunately after a few hundred pages this didn’t bother me, not so much because I thought it was justified, but because I was sufficiently acclimated to it and enjoying the novel that I didn’t notice anymore. Still, I would have preferred it if he just set the book in the future of an alternate Earth which had science monks but didn’t have a bunch of silly vocabulary.

Also, for better or worse, several of the secondary characters were basically recycled from Cryptonomicon. So the outdoorsman Yul and his tomboy girlfriend Cord are basically the same people as the outdoorsman Doug Shaftoe and his daughter Amy. Likewise, the asshole master Procian Fraa Lodoghir is basically the same person as the asshole celebrity postmodernist GEB Kivistik.

Oh, and there’s kung fu.

*If you want to know what I mean by “mostly,” read the book’s spoiler-tastic Wikipedia page and figure it out.

**Aside from the whole science monasteries thing, the book’s backstory closely parallels actual political and intellectual history through the “Praxic” (read: modern) age. Their dating system is pegged to a horrific nuclear war in about the year 2200 AD rather than to the foundation of the Bazian church (read: the birth of Christ). The novel’s present is 3700 years after the nuclear apocalypse, or about the equivalent of the year 6000 AD.

September 20, 2010 at 2:52 pm 1 comment

Allegory of the quant

| Gabriel |

And now I will describe in a figure the enlightenment or unenlightenment of our nature: Imagine human beings living in a school; they have been there from childhood, having their necks and legs chained. In the school there is a computer, and between the computer and the prisoners an LCD display. Inside the computer are databases, which generate various tables and graphs. “A strange parable,” he said, “and strange captives.” They are ourselves, I replied; and they see only the shadows of the images which the computer throws on the LCD; these they give names like “variables” and “models.” Suppose now that you suddenly send them out to do field work and make them look with pain and grief to themselves at the human subjects; will they believe them to be real? Will not their eyes be dazzled, and will they not try to get away from the light to something which is already in machine-readable format with a well-documented codebook and a reasonably good sample design and response rate?

April 19, 2010 at 5:07 am 4 comments

Misc Links

| Gabriel |

  • There’s a very interesting discussion at AdAge comparing buzz metrics (basically, data mining blogs and Twitter) to traditional surveys. Although the context is market research, this is an issue that potentially has a lot of relevance for basic research and so I recommend it even to people who don’t particularly care about advertising. The epistemological issue is basically the old validity versus generalizability debate. Surveys are more representative of the general consumer but they suffer from extremely low salience and so answers are so riddled with question wording effects and that sort of thing as to be almost meaningless. On the other hand buzz metrics are meaningful but not representative (what kind of person tweets about laundry bleach?). The practical issue is that buzz metrics are cheaper and faster than surveys.
  • I listened to the bhtv between Fodor and Sober and I really don’t get Fodor’s argument about natural selection. He seems to think that the co-occurrence of traits is some kind of devastating problem for biology when in fact biologists have well-articulated theories (i.e., “hitchhiking,” “spandrels,” and the “selection for vs. selection of” distinction) for understanding exactly these issues and, as implied by the charge “hyper-adaptationist,” there’s already an understanding within the field that these make natural selection a little more complicated than it otherwise might be. However the internal critics who raise these issues (e.g., the late Stephen Jay Gould) wouldn’t come anywhere close to claiming that these issues are an anomaly that challenges the paradigm.
  • As a related philosophy of science issue, Phil @ Gelman’s blog has some thoughts on (purposeful or inadvertent) data massaging to fit the model. He takes it as a Bayesian math issue, but I think you can agree with him on Quinean/Kuhnian philosophical grounds.
  • The essay “Why is there no Jewish Narnia?” has been much discussed lately (e.g., Douthat). The essay basically argues that this is because modern Judaism simply is not a mythic religion. The interesting thing though is that it once was, as can be seen clearly in various Babylonian cognates (e.g., the parts of Genesis and Exodus from the J source and the 41st chapter of the book of Job). However, as the essay argues, the mythic aspects were driven out by the rabbinic tradition. Myself, I would go further than that and say that the disenchantment really began with P, though I agree that the rabbinate finished it off, as evidenced by the persistence of myth well through the composition of “Daniel” in the 2nd c. BCE. This reminds me of the conclusion to The Sacred Canopy, where Berger basically says disenchantment has been a long-term trend ever since animism gave way to distinct pagan gods and especially with monotheism.
  • Of course the animism -> paganism -> henotheism -> monotheism -> atheism thing isn’t cleanly monotonic as we sometimes see with pagan survivalism. The first episode of the new season of Breaking Bad cold opens with a couple of narcos praying at a shrine to La Santa Muerte. In a great NYer piece on narco culture, one of the worshippers says “Yes, it was true that the Catholic Church disapproved of her ‘Little Skinny One,’ she said. ‘But have you noticed how empty their churches are?’” Maybe Rodney Stark should write his next book on the market theory of religion using Mexican Satanism as a case study of a new market entrant that more effectively pandered to (or rather, met the needs of) worshippers than the incumbent Catholic church, what with its stodgy rules against murder. (This isn’t a critique of Stark. Since he’s fond of Chesterton’s aphorism that when people don’t believe in God they don’t believe in nothing, they believe in anything, I think he’d argue that the popularity of the Santa Muerte cult is the product of a lack of competition among decent religions).
  • The Red Letter feature length deconstructions of the Star Wars prequels are why we have the fair use doctrine. They make dense and creative use of irony, especially with the brilliant contrasts between the narrative and the visual collage. Probably the funniest two segments are the first segment of the Episode I critique when he talks about the poor character development and the fifth segment of the Episode II critique when he plays dating coach for Anakin.

April 8, 2010 at 5:14 am 2 comments

Uncertainty, the CBO, and health coverage

| Gabriel |

[update. #1. i’ve been thinking about these ideas for awhile in the context of the original Orszag v. CBO thing, but was spurred to write and post it by these thoughts by McArdle. #2. MR has an interesting post on risk vs uncertainty in the context of securities markets]

Over at OT, Katherine Chen mentions that IRB seems to be a means for universities to try to tame uncertainty. The risk/uncertainty dichotomy is generally a very interesting issue. It played a huge part in the financial crash in that most of the models and instruments based on them were much better at dealing with (routine) risk than with uncertainty (aka, “systemic risk”). Everyone was aware of the uncertainty but the really sophisticated technologies for risk provided enough comfort to help us ignore that so much was unknowable.

Currently one of the main ways we’re seeing uncertainty in action is with the CBO’s role in health finance reform. The CBO’s cost estimates are especially salient given the poor economy and Orszag/Obama’s framing of the issue as about cost. The CBO’s practice is to score bills based on a) the quantifiable parts of a bill and b) the assumption that the bill will be implemented as written. Of course qualitative parts of a bill and the possibility of time inconsistency are huge elements of uncertainty on the likely fiscal impact of any legislation. The fun thing is that this is a bipartisan frustration.

When the CBO scored an old version of the bill it said it would be a budget buster, which made Obama’s cost framing look ridiculous and scared the hell out of the blue dogs. This infuriated the pro-reform people who (correctly) noted that the CBO had not included in its estimates that IMAC would “bend the cost curve,” and thus decrease the long-term growth in health expenditures by some unknowable but presumably large amount. That is to say, the CBO balked at the uncertainty inherent in evaluating a qualitative change and so ignored the issue, thereby giving a cost estimate that was biased upwards.

More recently the CBO scored another version of the bill as being reasonably cheap, which goes a long way to repairing the political damage of its earlier estimate. This infuriates anti-reform people who note (correctly) that the bill includes automatic spending cuts and historically Congress has been loath to let automatic spending cuts in entitlements (or for that matter, scheduled tax hikes) go into effect. That is to say, the CBO balked at the uncertainty inherent in considering whether Congress suffers time inconsistency and so ignored the issue, thereby giving a cost estimate that was biased downwards.

That is to say, what looks like a straightforward accounting exercise is only partly knowable and the really interesting questions are inherently qualitative ones like do we trust IMAC to cut costs and do we trust Congress to stick to a diet. And that’s not even getting into real noodle-scratchers like pricing in the possibility that an initially cost-neutral plan chartered as a GSE would eventually get general fund subsidies or what will happen to the tax base when you factor in that making coverage less tightly coupled to employment should yield improvements in labor productivity.

September 18, 2009 at 5:18 pm

If at first you don’t succeed, try a different specification

| Gabriel |

Cristobal Young (with whom I overlapped at Princeton for a few years) has an article in the last ASR on model uncertainty, with an empirical application to religion and development. This is similar to the issue of publication bias but more complicated and harder to formally model. (You can simulate the model uncertainty problem as to control variables but beyond that it gets intractable).

In classic publication bias, the assumption is that the model is always the same and it is applied to multiple datasets. This is somewhat realistic in fields like psychology where many studies are analyses of original experimental data. However in macro-economics and macro-sociology there is just one world and so to a first approximation there is basically just one big dataset that people keep analyzing over and over. To a lesser extent this is true of micro literatures that rely heavily on secondary analyses of a few standard datasets (e.g., GSS and NES for public opinion; PSID and Add Health for certain kinds of demography; SPPA for cultural consumption). What changes between these analyses is the models, most notably assumptions about the basic structure (distribution of dependent variable, error term, etc), the inclusion of control variables, and the inclusion of interaction terms.
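
Since the control-variable version of the problem can be simulated, here is a minimal sketch in Python of what such a simulation might look like. Everything in it is an illustrative assumption rather than anything from Cristobal’s article: the data are pure noise (so the true effect of x on y is zero), the pool of candidate controls is also noise, the function names are made up, and |t| > 2 stands in as a rough 5% cutoff. It compares how often one pre-committed specification comes out “significant” with how often at least one specification does when every subset of controls is tried on the same dataset.

```python
import itertools
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def t_stat_on_x(y, x, controls):
    """OLS of y on [intercept, x, controls]; return the t-statistic for x."""
    n = len(y)
    cols = [[1.0] * n, x] + controls
    k = len(cols)
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    resid = [y[i] - sum(beta[j] * cols[j][i] for j in range(k)) for i in range(n)]
    s2 = sum(e * e for e in resid) / (n - k)
    # Column 1 of (X'X)^-1 gives the sampling variance of the coefficient on x.
    unit = [1.0 if j == 1 else 0.0 for j in range(k)]
    var_x = s2 * solve(xtx, unit)[1]
    return beta[1] / math.sqrt(var_x)


def simulate(n_datasets=100, n_obs=60, n_controls=3, cutoff=2.0, seed=0):
    """Return (share significant with no controls, share significant in ANY spec)."""
    rng = random.Random(seed)
    baseline_hits = 0
    searched_hits = 0
    for _ in range(n_datasets):
        # Hypothetical data: y, x, and every candidate control are independent noise.
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        pool = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_controls)]
        t_values = []
        for size in range(n_controls + 1):
            for subset in itertools.combinations(pool, size):
                t_values.append(abs(t_stat_on_x(y, x, list(subset))))
        baseline_hits += t_values[0] > cutoff   # the single no-controls model
        searched_hits += max(t_values) > cutoff  # best result after searching
    return baseline_hits / n_datasets, searched_hits / n_datasets


if __name__ == "__main__":
    baseline, searched = simulate()
    print(f"one specification: {baseline:.2f}; any specification: {searched:.2f}")
```

By construction the second share can never be lower than the first, since the no-controls model is one of the specifications searched. How much higher it gets depends on how many specifications are tried and how correlated their estimates are, which is part of why the general problem is hard to model formally.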

Although Cristobal doesn’t put it like this, my interpretation is that if there were no measurement error, this wouldn’t be a bad thing as it would just involve people groping towards better specifications. However if there is error then these specifications may just be fitting the error rather than fitting the model. Cristobal shows this pretty convincingly by showing that the analysis is sensitive to the inclusion of data points suspected to be of low quality.

I think it’s also worth honoring Robert Barro for being willing to cooperate with a young unknown researcher seeking to debunk one of his findings. A lot of established scientists are complete assholes about this kind of thing and not only won’t cooperate but will do all sorts of power plays to prevent publication.

Finally, see this poli sci paper which does a meta-analysis of their two flagship journals and finds a suspicious number of papers that are just barely significant. Although they describe the issue as “publication bias,” I think the issue is really model uncertainty.

September 17, 2009 at 3:30 pm

Scientific inference, part 4 of 4

Having gone over Popper’s readiness to reject theory, and Quine’s readiness to reject data (kind of), let’s get to the Goldilocks of scientific epistemology, Thomas Kuhn. Kuhn was not trained as a philosopher but as a physicist when he started teaching a course on the history of science at Harvard and as such his work is mostly descriptive whereas Popper and Quine are prescriptive. In teaching this course he developed the ideas that he eventually published as The Structure of Scientific Revolutions. While the term “paradigm shift” is what most people remember about this book — and which became fashionable among business types in the 1990s — the real interest is what happens within the paradigms.

A paradigm is a more or less cohesive agenda and set of guiding principles. The paradigm thus shows scientists what sort of questions to ask and what are reasonable ways to go about answering them. Kuhn refers to work within a paradigm as “normal science” which mostly consists of “puzzle-solving.” Contra Popper, normal science consists of working out minor puzzles about exactly how the paradigm works, but not if the paradigm works as this is taken for granted. This is holism in practice in both a positive and negative sense. In the positive sense, the paradigm provides coherence to the universe and presents manageable chunks of reality for the scientist to chew on. As anticipated by Duhem, one of the things the paradigm does is tell the scientist what a meaningful problem looks like. (Note that this implies that the “grounded theory” method is not very Kuhnian). In the negative sense, the paradigm can blind the scientist to evidence against it for when observations contradict the theory they are impossible to interpret and must be shunted off to some auxiliary hypothesis.

Whenever evidence contradicts predictions of the paradigm this does not cause the rejection of theory but merely presents “anomalies.” These anomalies can be temporarily accommodated by speculation about measurement error or epiphenomenal forces. Eventually though such perturbations in the web of belief make it so convoluted as to be less than useful. At this point a scientist or small circle of scientists creates a new paradigm which can accommodate the anomalies. Unlike the cruft-besotten old paradigm, the new paradigm is internally consistent, but it is also fairly vague and may lack details as to exactly how the paradigm works. Fleshing out these details thus becomes puzzle-solving for a new generation of normal science and we come full circle.

I choose the engineering metaphor “cruft” advisedly. Think of a newly created paradigm as like version 1.0 of a computer operating system. Over time the OS encounters problems and people create often clumsy solutions to work around these problems. Eventually these solutions clutter the original elegance of the system and create cruft. Then you have a choice: you can stick with it or you can create something from scratch. The new thing will be more elegant but it won’t have worked out lots of specific problems, say, device drivers. So the new OS is cleaner, but it is not fleshed out enough to deal with the messiness of reality. Switching from Windows to Linux is going to improve your memory allocation since it’s actually designed for a powerful computer, not jury-rigged out of something designed for a 386, but it’s also going to be harder to get your application software and your monitor to behave with the OS. The same is true of a new paradigm: it’s going to solve many of the old anomalies but it’s also not going to say much concretely about a lot of empirical problems.

Thus, Kuhn essentially took holism and ran with it, but added a somewhat Popperian element during unsettled periods. It is especially worth noting that Victorian scientists had identified numerous anomalies in the Newtonian paradigm which Einstein was able to solve with the general relativity paradigm. Thus Kuhn’s model of science is a very good description of the positivists’ favorite case of good science.

Combining holism and positivism is something of a Goldilocks problem. If Popper wants us to be ready to discount theory and Quine wants us to be ready to discount data, Kuhn finds a way that they can both be right. Most of the time his model resembles holism, but when the web becomes too knotted by accommodating anomalies it shatters and we get a paradigm shift somewhat like Popper’s falsification.

April 3, 2009 at 10:24 am

Scientific inference, part 3 of 4

Yesterday I talked about positivism, which a lot of empirically-minded sociologists and other scientists think is a nifty term. What we tend not to know is that W. V. Quine published “Two Dogmas of Empiricism” in the Philosophical Review in 1951 and basically destroyed positivism. The two dogmas in question are that 1) there is a meaningful boundary between synthetic and analytic and 2) that a discrete synthetic statement can be evaluated. Quine feels that really these two dogmas have the same conceptual flaw but he treats them as relatively distinct so as to make his critique isomorphic to positivism itself. On the synthetic/analytic dichotomy, Quine’s critique is basically that it gets very messy distinguishing between a definition and a finding as they take the same grammatical form. Even more radically, he claimed that the profound weirdness of quantum physics demonstrates that even abstract logic is an empirical question.

The second critique is more directly of interest to practicing scientists. This “underdetermination” problem shares a lot with an earlier argument by Pierre Duhem and so is sometimes known as the Duhem-Quine thesis. The positivists understood a version of this, called the “auxiliary hypotheses” problem, but they underestimated how problematic it was. When we state a hypothesis, it is implied that the hypothesis is expected to hold ceteris paribus. The assumption is that the evidence will test a hypothesis if (and only if) the auxiliary hypotheses are well behaved. This raises the problem that when we encounter evidence that is facially contrary to a hypothesis we cannot be sure whether this is really evidence against the hypothesis or only one of the ceteri making mischief by failing to remain parilis.

One of the best known problems of this sort was Marx’s various failed historical predictions, most notably that the revolution would occur in an industrial nation (but also other expectations such as the immiseration of the proletariat). Many people, including both Popper and Gramsci, took this to mean that Marx was simply wrong. However many Marxists argued that there was nothing wrong with the theory of dialectical materialism as such, it merely hadn’t explicitly anticipated the skill and charisma of Lenin or the agency that the bourgeois state showed in creating the welfare state to defuse class struggle in the industrial world. Thus in this imagining Marx’s hypothesis (the Germans or English will have a socialist revolution) was suppressed by the auxiliary hypothesis that unusually capable leaders would not show up in backwaters that had only recently abandoned serfdom or that the welfare state would save capitalism from itself. The positivists didn’t see this case as especially problematic because they thought that the Marxist apologia of auxiliary hypotheses were embarrassingly ad hoc.

The Duhem-Quine underdetermination thesis is that auxiliary hypotheses are literally infinite. Some of the examples of these infinite auxiliary hypotheses philosophers give are kind of silly, like “elves do not cause equipment to give inaccurate readings on Tuesdays” but it’s hard to say whether blaming elves is really any sillier than claiming that men make history but not in the circumstances of their choosing except for V. I. Lenin because he’s just so awesome. However positivists would have no problem saying that when a scientist makes such an interpretation it is clearly the fault of the scientist and not of either elves or Lenin. Unfortunately, Duhem and Quine argued that there are a very large number of very plausible auxiliary hypotheses. Prime among these are the innumerable ways in which data collection can suffer measurement error. Furthermore there is the fact that many of our measurement tools are themselves based in theories which conceivably could be wrong. For example, say your hypothesis is that if you heat a gas in an airtight and rigid chamber the pressure will rise and your barometer finds that the pressure does not rise. This could be interpreted as evidence against Gay-Lussac’s law (the pressure-temperature law; Boyle’s law relates pressure to volume) or it could be that Gay-Lussac was right and you have one of the following problems:

• your chamber is not airtight and/or rigid

• your heater and/or thermometer are broken

• your barometer is broken

• barometers don’t actually measure pressure

• an infinite number of other more or less plausible auxiliary hypotheses
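
For reference, the prediction at stake in this example can be written out. Assuming the gas behaves ideally and the chamber really is rigid and airtight (fixed volume V and fixed amount of gas n), the ideal gas law makes pressure proportional to absolute temperature. The starting pressure and the two temperatures below are made-up illustrative numbers:

```latex
PV = nRT \quad\Longrightarrow\quad \frac{P_1}{T_1} = \frac{P_2}{T_2} \qquad (V,\ n \text{ fixed})

P_2 = P_1 \,\frac{T_2}{T_1} = 101.3\ \text{kPa} \times \frac{353\ \text{K}}{293\ \text{K}} \approx 122\ \text{kPa}
```

So heating the chamber from about 20 °C to about 80 °C should raise the reading by roughly 20 percent. A flat reading contradicts either this relation or at least one of the auxiliary assumptions listed above.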

If we ignore the infinite number of unstated auxiliary hypotheses and focus on the specific ones, you can imagine testing each of them in turn. For instance, you could verify that your chamber is indeed airtight by putting a rat in it and seeing that it suffocates. But these verifications are themselves beset with problems such as that maybe the rat had a heart attack despite an abundance of oxygen, or perhaps it takes more ventilation to sustain a rat than to relieve a slowly expanding gas. The problem is recursive so that ultimately you can always spin a (progressively more convoluted) story that your original hypothesis was correct. In some cases this is actually a good thing to do since things like sloppy lab work are pretty common and if we never blamed anomalous results on auxiliary hypotheses we’d soon run out of theories. Here’s a true story: when I was in high school physics I once “measured” gravity to be almost 11 m/s^2. It never occurred to me that I had just disproven the standard figure of 9.8 m/s^2 but rather I thought it rather obvious that I had made a mistake.

While it’s easiest to illustrate with the hard sciences, the issue of theory dependent tools is also a problem for social science. For instance, before we can get to the real problems, social scientists have to implicitly or explicitly decide issues like whether income is an adequate proxy for total consumption, how to reduce millions of jobs into hundreds of occupations and ultimately into something as manageable as the EGP class schema or the ISEI synthetic prestige index, the magnitude of social desirability bias, and how long (if ever) it takes informants to relax and act normally in front of an ethnographer. In my own substantive interest of radio, everyone agrees what the key hypothesis is (broadcasting monopolies create diverse content) and what the evidence shows facially (yes they do) but there is a big debate over an essentially auxiliary hypothesis about the quality of the evidence (whether “format” is a meaningful proxy for content).

Despite his rejection of positivism, Quine was no nihilist or skeptic. Indeed, he was explicit about offering a post-positivist way to recover empiricism. Quine felt that ultimately we cannot test any discrete hypothesis but only the entire system of science. However even in limited and narrow cases we must accommodate the evidence so that if we wish to salvage a particular hypothesis against contradictory evidence we must displace the doubt to some more or less specific auxiliary hypotheses. Quine speaks of belief as a web, fabric, or force field and treats surprising observations as not necessarily discrediting any particular belief but prodding the field as a whole and deforming it. There is a loose coupling between observations and beliefs so the main hypothesis may withstand the contrary observation but the anomaly’s evidentiary weight has to go somewhere. Implicit in Quine’s positive agenda is that parsimony is a worthy goal (lest evidence be diffused through the web indefinitely) but it is debatable whether parsimony is a distinctly scientific value or merely an aesthetic principle. This post-positivist empiricist agenda (usually called “holism”) is a bit fuzzy and its ambiguities would not be resolved until Kuhn.

April 1, 2009 at 7:01 am

Scientific Inference, part 2 of 4

Sociologists tend to divide themselves into “positivists” and “social constructionists” (with “enlightened positivist” sometimes being a middle ground), but these terms don’t do justice to the philosophy of science and if taken seriously neither model is very appealing. Likewise, many scientists will tell you that they follow Popper’s falsifiable hypothesis logic but neither does this reflect the way science is actually done (or ought to be done). We’ll go over several approaches to science, all of which agree that science is possible and desirable, but differ as to exactly what this means. The central problem that all of them are directly or indirectly attempting to grapple with is that of rigorous induction: how do we translate observations about the universe into understandings of natural law without being blinded by our preconceptions?

For both historical reasons and because he is so often invoked by practicing scientists, it’s probably worth starting with Popper. Karl Popper was originally very excited by the social theories of Marx and the psychology of Freud and Adler, at one point working in Adler’s lab. He grew frustrated though when he saw how absolutely any evidence could be made to fit within their theories with even facially disconfirming evidence being interpreted as the result of a previously unstated contingency or of the system’s ability to sublimate contradiction. In contrast, when Einstein stated his theory of general relativity he extrapolated from it a very specific prediction about how during a solar eclipse it would be apparent that the gravity of the sun bends starlight. Sir Arthur Eddington observed an eclipse and found that Einstein’s predictions were correct, but the point is not that Einstein was right but that he gave a specific prediction that could have been wrong. Popper found this contrast fascinating and set out to build an intellectual career out of contrasting science (exemplified by Einstein) with pseudo-science (exemplified by the Marxists and Freudians). Note that Popper thought that there was nothing inherently pseudo-scientific about social or behavioral inquiries, he just wasn’t fond of these particular examples.


Popper’s essential insight from this contrast was that confirmation is cheap. He gave the example of a theory that all swans are white. It would be fairly easy to make a long list of white swans in much the same way that Freud made a long list of people whose neuroses derived from sublimated sexuality. Popper said a much better thing to do would be to search for black swans and fail to find any. In fact (as seen in the photo I took at the Franklin Park Zoo in Boston) there are black swans so we can reject the “all swans are white” hypothesis. Freud never looked for his black swans, which in his case would be neurotics without sublimated sexuality. Worse yet, Freud didn’t really have a non-tautological measure of sublimation so his theory is literally not falsifiable. In contrast, Einstein made a very specific prediction about how the stars would appear during a solar eclipse such that any astronomer could examine a photo of an eclipse and see whether it matched Einstein’s prediction.

Popper can be summed up as emphasizing the importance of “falsifiable hypotheses” as the definitive characteristic of science. Such a definition worked for him as he was far less interested in how science works than in defining what is not science. This is why one of the worst things you can say about a scientist is that his work is “not even wrong” as it implies that the scientist has lapsed into metaphysics. Philosophers call this agenda “demarcation” and sociologists call it “boundary work.” In our time the principal demarcation problem is creationism whereas for Popper himself it was mostly about Marxism and psychoanalysis.

[Creationism comes in two major forms, a hardcore “young Earth” version and a more squishy “intelligent design.” Young Earth creationism argues that Genesis is literally true and about 6000 years ago God created the heavens and the Earth in 144 hours and a few generations later created a massive flood that essentially rebooted the world. Intelligent design accepts the broad outlines of the conventional scientific view of the age of the Earth and the procession of natural history but argues that divine intervention routinely adjusts natural history, chiefly through being responsible for speciation. One bizarre consequence of this is that intelligent design is too vague to test, whereas young Earth creationism gives very concrete predictions (all of which are demonstrably false). Thus in strict Popperian terms intelligent design is more pseudo-scientific than young Earth creationism as the latter gives testable (albeit false) hypotheses whereas the former does not.]

Positivists also use obviously ridiculous things like astrology, pyramidology, and parapsychology for calibrating the gun sights, with the assumption being that any good demarcation criterion should be able to explain why astrology is bullshit. Popper went so far as to say that the theory of natural selection is not scientific because in practice “fitness” is defined tautologically as “that which is associated with survival.” However in a famous lecture he eventually recanted and argued that even if it is tautological to label any given common allele as promoting fitness, we can make a falsifiable hypothesis that selective advantage is more important for explaining complex organs than such alternative descent processes as genetic drift. (Note that this comes very close to saying that while a specific hypothesis is not testable the paradigm taken as a whole is and so here Popper was implicitly embracing holism).

While the idea of the “falsifiable hypothesis” was Popper’s key contribution, it’s worth also reviewing the logical positivist school with which he was loosely affiliated. The positivists drew a very strong distinction between synthetic (empirical data) and analytic (math and logic). Any statement that could not be described as either synthetic or analytic they derided as metaphysics (or more whimsically, “music” or “poetry”). Popper’s work fit within the positivist framework as it assumed a sort of deduction-induction cycle where the scientist would use logic to derive falsifiable hypotheses from theory, then collect data to test these hypotheses. That’s the technical meaning of “positivist” in philosophy, but sociologists usually use the term casually as the opposite of “deconstructionist” or “postmodernist” to mean someone who believes that science is possible without being hopelessly mired by subjectivity. Our usage completely loses any philosophic notions about strong distinctions between analytic, synthetic, and metaphysical and many sociologists who describe themselves as “positivists” probably really mean only that they are empiricists or scientific realists.

March 31, 2009 at 8:06 am
