Posts Tagged philosophy of science
Uncertainty, the CBO, and health coverage
| Gabriel |
[update. #1. i've been thinking about these ideas for a while in the context of the original Orszag v. CBO thing, but was spurred to write and post it by these thoughts by McArdle. #2. MR has an interesting post on risk vs uncertainty in the context of securities markets]
Over at OT, Katherine Chen mentions that IRB seems to be a means for universities to try to tame uncertainty. The risk/uncertainty dichotomy is generally a very interesting issue. It played a huge part in the financial crash in that most of the models and instruments based on them were much better at dealing with (routine) risk than with uncertainty (aka, “systemic risk”). Everyone was aware of the uncertainty but the really sophisticated technologies for risk provided enough comfort to help us ignore that so much was unknowable.
Currently one of the main ways we’re seeing uncertainty in action is with the CBO’s role in health finance reform. The CBO’s cost estimates are especially salient given the poor economy and Orszag/Obama’s framing of the issue as about cost. The CBO’s practice is to score bills based on a) the quantifiable parts of a bill and b) the assumption that the bill will be implemented as written. Of course qualitative parts of a bill and the possibility of time inconsistency are huge elements of uncertainty on the likely fiscal impact of any legislation. The fun thing is that this is a bipartisan frustration.
When the CBO scored an old version of the bill it said it would be a budget buster, which made Obama’s cost framing look ridiculous and scared the hell out of the blue dogs. This infuriated the pro-reform people who (correctly) noted that the CBO had not included in its estimates that IMAC would “bend the cost curve,” and thus decrease the long-term growth in health expenditures by some unknowable but presumably large amount. That is to say, the CBO balked at the uncertainty inherent in evaluating a qualitative change and so ignored the issue, thereby giving a cost estimate that was biased upwards.
More recently the CBO scored another version of the bill as being reasonably cheap, which goes a long way to repairing the political damage of its earlier estimate. This infuriates anti-reform people who note (correctly) that the bill includes automatic spending cuts and historically Congress has been loath to let automatic spending cuts in entitlements (or for that matter, scheduled tax hikes) go into effect. That is to say, the CBO balked at the uncertainty inherent in considering whether Congress suffers time inconsistency and so ignored the issue, thereby giving a cost estimate that was biased downwards.
That is to say, what looks like a straightforward accounting exercise is only partly knowable and the really interesting questions are inherently qualitative ones like do we trust IMAC to cut costs and do we trust Congress to stick to a diet. And that’s not even getting into real noodle-scratchers like pricing in the possibility that an initially cost-neutral plan chartered as a GSE would eventually get general fund subsidies or what will happen to the tax base when you factor in that making coverage less tightly coupled to employment should yield improvements in labor productivity.
September 18, 2009
If at first you don’t succeed, try a different specification
| Gabriel |
Cristobal Young (with whom I overlapped at Princeton for a few years) has an article in the last ASR on model uncertainty, with an empirical application to religion and development. This is similar to the issue of publication bias but more complicated and harder to formally model. (You can simulate the model uncertainty problem as to control variables but beyond that it gets intractable).
In classic publication bias, the assumption is that the model is always the same and it is applied to multiple datasets. This is somewhat realistic in fields like psychology where many studies are analyses of original experimental data. However in macro-economics and macro-sociology there is just one world and so to a first approximation what happens is that there is basically just one big dataset that people just keep analyzing over and over. To a lesser extent this is true of micro literatures that rely heavily on secondary analyses of a few standard datasets (e.g., GSS and NES for public opinion; PSID and Add Health for certain kinds of demography; SPPA for cultural consumption). What changes between these analyses is the models, most notably assumptions about the basic structure (distribution of dependent variable, error term, etc.), the inclusion of control variables, and the inclusion of interaction terms.
Although Cristobal doesn’t put it like this, my interpretation is that if there were no measurement error, this wouldn’t be a bad thing as it would just involve people groping towards better specifications. However if there is error then these specifications may just be fitting the error rather than fitting the model. Cristobal shows this pretty convincingly by showing that the analysis is sensitive to the inclusion of data points suspected to be of low quality.
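A minimal simulation sketch in Python (standard library only) makes the worry concrete. This is an illustration, not the procedure used in the article: the outcome, the focal variable, and four candidate controls are all independent noise, so any "significant" coefficient is pure error-fitting. The function names (`invert`, `ols_t_first`, `simulate`), the sample sizes, and the rough |t| > 2 cutoff are assumptions chosen for the example.

```python
import itertools
import random


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    k = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(m)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]


def ols_t_first(y, columns):
    """t-statistic on the first regressor in `columns` (an intercept is added)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[r][a] * X[r][b] for r in range(n)) for b in range(k)]
           for a in range(k)]
    xty = [sum(X[r][a] * y[r] for r in range(n)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y[r] - sum(X[r][a] * beta[a] for a in range(k)) for r in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    return beta[1] / (sigma2 * inv[1][1]) ** 0.5


def simulate(n_datasets=200, n_obs=40, n_controls=4, crit=2.0, seed=0):
    """Share of pure-noise datasets with a 'significant' focal coefficient,
    (a) in the no-controls baseline and (b) in the best of all specifications."""
    rng = random.Random(seed)
    baseline_hits = searched_hits = 0
    for _ in range(n_datasets):
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        controls = [[rng.gauss(0, 1) for _ in range(n_obs)]
                    for _ in range(n_controls)]
        t_values = []
        for size in range(n_controls + 1):
            for subset in itertools.combinations(controls, size):
                t_values.append(abs(ols_t_first(y, [x, *subset])))
        baseline_hits += t_values[0] > crit      # first spec has no controls
        searched_hits += max(t_values) > crit    # report whichever spec "works"
    return baseline_hits / n_datasets, searched_hits / n_datasets


if __name__ == "__main__":
    baseline, searched = simulate()
    print(f"baseline spec: {baseline:.3f}  best of 16 specs: {searched:.3f}")
```

Because the baseline specification is one of the sixteen tried, the searched rate can never be lower than the baseline rate; how much higher it comes out depends on the sample size and the number of candidate controls.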
I think it’s also worth honoring Robert Barro for being willing to cooperate with a young unknown researcher seeking to debunk one of his findings. A lot of established scientists are complete assholes about this kind of thing and not only won’t cooperate but will do all sorts of power plays to prevent publication.
Finally, see this poli sci paper which does a meta-analysis of their two flagship journals and finds a suspicious number of papers that are just barely significant. Although they describe the issue as “publication bias,” I think the issue is really model uncertainty.
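One simple way to check for this kind of bunching is to compare how many test statistics land just above the significance threshold with how many land just below it. The sketch below is a generic version of that idea and is not necessarily the paper's exact procedure; the function name `caliper_test`, the band width of 0.2, and the list of z-statistics are all made up for illustration.

```python
from math import comb


def caliper_test(z_stats, threshold=1.96, width=0.2):
    """Count |z| values just over vs. just under the threshold.

    If nothing is nudging results across the line, a value falling inside the
    narrow band should be about equally likely to land on either side, so the
    'over' count is compared against a Binomial(n, 0.5) reference.
    Returns (over, under, one-sided exact p-value for an excess of 'over')."""
    over = sum(1 for z in z_stats if threshold < abs(z) <= threshold + width)
    under = sum(1 for z in z_stats if threshold - width <= abs(z) <= threshold)
    n = over + under
    if n == 0:
        return over, under, 1.0
    p_value = sum(comb(n, k) for k in range(over, n + 1)) / 2 ** n
    return over, under, p_value


if __name__ == "__main__":
    # Hypothetical z-statistics, invented purely to show the mechanics.
    z = [1.80, 1.90, 1.97, 1.99, 2.01, 2.05, 2.10, 2.12, 2.15, 3.00, 0.50]
    print(caliper_test(z))
```

Note that the test cannot tell classic publication bias apart from specification searching: either one would pile results up just past the threshold.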
September 17, 2009
Scientific inference, part 4 of 4
Having gone over Popper’s readiness to reject theory, and Quine’s readiness to reject data (kind of), let’s get to the Goldilocks of scientific epistemology, Thomas Kuhn. Kuhn was trained as a physicist, not a philosopher, and he came to the subject when he started teaching a course on the history of science at Harvard; as such his work is mostly descriptive, whereas Popper and Quine are prescriptive. In teaching this course he developed the ideas that he eventually published as The Structure of Scientific Revolutions. While the term “paradigm shift” is what most people remember about this book (and it became fashionable among business types in the 1990s), the real interest is what happens within the paradigms.
A paradigm is a more or less cohesive agenda and set of guiding principles. The paradigm thus shows scientists what sort of questions to ask and what are reasonable ways to go about answering them. Kuhn refers to work within a paradigm as “normal science” which mostly consists of “puzzle-solving.” Contra Popper, normal science consists of working out minor puzzles about exactly how the paradigm works, but not whether the paradigm works, as this is taken for granted. This is holism in practice in both a positive and negative sense. In the positive sense, the paradigm provides coherence to the universe and presents manageable chunks of reality for the scientist to chew on. As anticipated by Duhem, one of the things the paradigm does is tell the scientist what a meaningful problem looks like. (Note that this implies that the “grounded theory” method is not very Kuhnian.) In the negative sense, the paradigm can blind the scientist to evidence against it, for when observations contradict the theory they are impossible to interpret and must be shunted off to some auxiliary hypothesis.
Whenever evidence contradicts predictions of the paradigm this does not cause the rejection of theory but merely presents “anomalies.” These anomalies can be temporarily accommodated by speculation about measurement error or epiphenomenal forces. Eventually though such perturbations in the web of belief make it so convoluted as to be less than useful. At this point a scientist or small circle of scientists creates a new paradigm which can accommodate the anomalies. Unlike the cruft-besotted old paradigm, the new paradigm is internally consistent, but it is also fairly vague and may lack details as to exactly how the paradigm works. Fleshing out these details thus becomes puzzle-solving for a new generation of normal science and we come full circle.
I choose the engineering metaphor “cruft” advisedly. Think of a newly created paradigm as like version 1.0 of a computer operating system. Over time the OS encounters problems and people create often clumsy solutions to work around these problems. Eventually these solutions clutter the original elegance of the system and create cruft. Then you have a choice: you can stick with it or you can create something from scratch. The new thing will be more elegant but it won’t have worked out lots of specific problems, say, device drivers. So the new OS is cleaner, but it is not fleshed out enough to deal with the messiness of reality. Switching from Windows to Linux is going to improve your memory allocation since it’s actually designed for a powerful computer, not jury-rigged out of something designed for a 386, but it’s also going to be harder to get your application software and your monitor to behave with the OS. The same is true of a new paradigm: it’s going to solve many of the old anomalies but it’s also not going to say much concretely about a lot of empirical problems.
Thus, Kuhn essentially took holism and ran with it, but added a somewhat Popperian element during unsettled periods. It is especially worth noting that Victorian scientists had identified numerous anomalies in the Newtonian paradigm which Einstein was able to solve with the general relativity paradigm. Thus Kuhn’s model of science is a very good description of the positivists’ favorite case of good science.
Combining holism and positivism is something of a Goldilocks problem. If Popper wants us to be ready to discount theory and Quine wants us to be ready to discount data, Kuhn finds a way that they can both be right. Most of the time his model resembles holism, but when the web becomes too knotted by accommodating anomalies it shatters and we get a paradigm shift somewhat like Popper’s falsification.
April 3, 2009
Scientific inference, part 3 of 4
Yesterday I talked about positivism, which a lot of empirically-minded sociologists and other scientists think is a nifty term. What we tend not to know is that W. V. Quine published “Two Dogmas of Empiricism” in the Philosophical Review in 1951 and basically destroyed positivism. The two dogmas in question are that 1) there is a meaningful boundary between synthetic and analytic and 2) that a discrete synthetic statement can be evaluated. Quine feels that really these two dogmas have the same conceptual flaw but he treats them as relatively distinct so as to make his critique isomorphic to positivism itself. On the synthetic/analytic dichotomy, Quine’s critique is basically that it gets very messy distinguishing between a definition and a finding as they take the same grammatical form. Even more radically, he claimed that the profound weirdness of quantum physics demonstrates that even abstract logic is an empirical question.
The second critique is more directly of interest to practicing scientists. This “underdetermination” problem shares a lot with an earlier argument by Pierre Duhem and so is sometimes known as the Duhem-Quine thesis. The positivists understood a version of this, called the “auxiliary hypotheses” problem, but they underestimated how problematic it was. When we state a hypothesis, it is implied that the hypothesis is expected to hold ceteris paribus. The assumption is that the evidence will test a hypothesis if (and only if) the auxiliary hypotheses are well behaved. This raises the problem that when we encounter evidence that is facially contrary to a hypothesis we cannot be sure if this is really evidence against the hypothesis or only one of the ceteri making mischief by failing to remain parilis.
One of the best known problems of this sort was Marx’s various failed historical predictions, most notably that the revolution would occur in an industrial nation (but also other expectations such as the immiseration of the proletariat). Many people, including both Popper and Gramsci, took this to mean that Marx was simply wrong. However many Marxists argued that there was nothing wrong with the theory of dialectical materialism as such, it merely hadn’t explicitly anticipated the skill and charisma of Lenin or the agency that the bourgeois state showed in creating the welfare state to defuse class struggle in the industrial world. Thus in this imagining Marx’s hypothesis (the Germans or English will have a socialist revolution) was thwarted by the failure of auxiliary hypotheses: that unusually capable leaders would not show up in backwaters that had only recently abandoned serfdom, and that the welfare state would not save capitalism from itself. The positivists didn’t see this case as especially problematic because they thought that the Marxist apologia of auxiliary hypotheses were embarrassingly ad hoc.
The Duhem-Quine underdetermination thesis is that auxiliary hypotheses are literally infinite. Some of the examples of these infinite auxiliary hypotheses philosophers give are kind of silly, like “elves do not cause equipment to give inaccurate readings on Tuesdays,” but it’s hard to say whether blaming elves is really any sillier than claiming that men make history but not in the circumstances of their choosing, except for V. I. Lenin because he’s just so awesome. However positivists would have no problem saying that when a scientist makes such an interpretation it is clearly the fault of the scientist and not of either elves or Lenin. Unfortunately, Duhem and Quine argued that there are also a very large number of very plausible auxiliary hypotheses. Prime among these are the innumerable ways in which data collection can suffer measurement error. Furthermore there is the fact that many of our measurement tools are themselves based in theories which conceivably could be wrong. For example, say your hypothesis is that if you heat a gas in an airtight and rigid chamber the pressure will rise, and your barometer finds that the pressure does not rise. This could be interpreted as evidence against the gas laws (strictly Gay-Lussac’s law, which says pressure rises with temperature at constant volume) or it could be that the law is right and you have one of the following problems:
• your chamber is not airtight and/or rigid
• your heater and/or thermometer are broken
• your barometer is broken
• barometers don’t actually measure pressure
• an infinite number of other more or less plausible auxiliary hypotheses
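Written out, the prediction under test looks like this, assuming an ideal gas with fixed volume and a fixed amount of gas, and temperatures measured on an absolute scale (these idealizations are assumptions added here for illustration):

```latex
\frac{P_1}{T_1} = \frac{P_2}{T_2}
\quad\Longrightarrow\quad
P_2 = P_1 \,\frac{T_2}{T_1} > P_1
\quad \text{whenever } T_2 > T_1 .
```

Every symbol in that line leans on an instrument or an assumption: the chamber fixes the volume and the amount of gas, the thermometer supplies T, and the barometer supplies P, which is why a flat pressure reading cannot by itself say which link failed.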
If we ignore the infinite number of unstated auxiliary hypotheses and focus on the specific ones, you can imagine testing each of them in turn. For instance, you could verify that your chamber is indeed airtight by putting a rat in it and seeing that it suffocates. But these verifications are themselves beset with problems such as that maybe the rat had a heart attack despite an abundance of oxygen, or perhaps it takes more ventilation to sustain a rat than to relieve a slowly expanding gas. The problem is recursive so that ultimately you can always spin a (progressively more convoluted) story that your original hypothesis was correct. In some cases this is actually a good thing to do since things like sloppy lab work are pretty common and if we never blamed anomalous results on auxiliary hypotheses we’d soon run out of theories. Here’s a true story: when I was in high school physics I once “measured” gravity to be almost 11 m/s^2. It never occurred to me that I had just disproven the standard figure of 9.8 m/s^2 but rather I thought it rather obvious that I had made a mistake.
While it’s easiest to illustrate with the hard sciences, the issue of theory dependent tools is also a problem for social science. For instance, before we can get to the real problems, social scientists have to implicitly or explicitly decide issues like whether income is an adequate proxy for total consumption, how to reduce millions of jobs into hundreds of occupations and ultimately into something as manageable as the EGP class schema or the ISEI synthetic prestige index, the magnitude of social desirability bias, and how long (if ever) it takes informants to relax and act normally in front of an ethnographer. In my own substantive interest of radio, everyone agrees what the key hypothesis is (broadcasting monopolies create diverse content) and what the evidence shows facially (yes they do) but there is a big debate over an essentially auxiliary hypothesis about the quality of the evidence (whether “format” is a meaningful proxy for content).
Despite his rejection of positivism, Quine was no nihilist or skeptic. Indeed, he was explicit about offering a post-positivist way to recover empiricism. Quine felt that ultimately we cannot test any discrete hypothesis but only the entire system of science. However even in limited and narrow cases we must accommodate the evidence so that if we wish to salvage a particular hypothesis against contradictory evidence we must displace the doubt to some more or less specific auxiliary hypotheses. Quine speaks of belief as a web, fabric, or force field and treats surprising observations as not necessarily discrediting any particular belief but prodding the field as a whole and deforming it. There is a loose coupling between observations and beliefs so the main hypothesis may withstand the contrary observation but the anomaly’s evidentiary weight has to go somewhere. Implicit in Quine’s positive agenda is that parsimony is a worthy goal (lest evidence be diffused through the web indefinitely) but it is debatable whether parsimony is a distinctly scientific value or merely an aesthetic principle. This post-positivist empiricist agenda (usually called “holism”) is a bit fuzzy and its ambiguities would not be resolved until Kuhn.
April 1, 2009
Scientific Inference, part 2 of 4
Sociologists tend to divide themselves into “positivists” and “social constructionists” (with “enlightened positivist” sometimes being a middle ground), but these terms don’t do justice to the philosophy of science and if taken seriously neither model is very appealing. Likewise, many scientists will tell you that they follow Popper’s falsifiable hypothesis logic but neither does this reflect the way science is actually done (or ought to be done). We’ll go over several approaches to science, all of which agree that science is possible and desirable, but differ as to exactly what this means. The central problem that all of them are directly or indirectly attempting to grapple with is that of rigorous induction: how do we translate observations about the universe into understandings of natural law without being blinded by our preconceptions?
For both historical reasons and because he is so often invoked by practicing scientists, it’s probably worth starting with Popper. Karl Popper was originally very excited by the social theories of Marx and the psychology of Freud and Adler, at one point working in Adler’s lab. He grew frustrated though when he saw how absolutely any evidence could be made to fit within their theories with even facially disconfirming evidence being interpreted as the result of a previously unstated contingency or of the system’s ability to sublimate contradiction. In contrast, when Einstein stated his theory of general relativity he extrapolated from it a very specific prediction about how during a solar eclipse it would be apparent that the gravity of the sun bends starlight. Sir Arthur Eddington observed an eclipse and found that Einstein’s predictions were correct, but the point is not that Einstein was right but that he gave a specific prediction that could have been wrong. Popper found this contrast fascinating and set out to build an intellectual career out of contrasting science (exemplified by Einstein) with pseudo-science (exemplified by the Marxists and Freudians). Note that Popper thought that there was nothing inherently pseudo-scientific about social or behavioral inquiries, he just wasn’t fond of these particular examples.

Popper’s essential insight from this contrast was that confirmation is cheap. He gave the example of a theory that all swans are white. It would be fairly easy to make a long list of white swans in much the same way that Freud made a long list of people whose neuroses derived from sublimated sexuality. Popper said a much better thing to do would be to search for black swans and fail to find any. In fact (as seen in the photo I took at the Franklin Park Zoo in Boston) there are black swans so we can reject the “all swans are white” hypothesis. Freud never looked for his black swans, which in his case would be neurotics without sublimated sexuality. Worse yet, Freud didn’t really have a non-tautological measure of sublimation so his theory is literally not falsifiable. In contrast, Einstein made a very specific prediction about how the stars would appear during a solar eclipse such that any astronomer could examine a photo of an eclipse and see whether it matched Einstein’s prediction.
Popper can be summed up as emphasizing the importance of “falsifiable hypotheses” as the definitive characteristic of science. Such a definition worked for him as he was far less interested in how science works than in defining what is not science. This is why one of the worst things you can say about a scientist is that his work is “not even wrong” as it implies that the scientist has lapsed into metaphysics. Philosophers call this agenda “demarcation” and sociologists call it “boundary work.” In our time the principal demarcation problem is creationism whereas for Popper himself it was mostly about Marxism and psychoanalysis.
[Creationism comes in two major forms, a hardcore “young Earth” version and a more squishy “intelligent design.” Young Earth creationism argues that Genesis is literally true and about 6000 years ago God created the heavens and the Earth in 144 hours and a few generations later created a massive flood that essentially rebooted the world. Intelligent design accepts the broad outlines of the conventional scientific view of the age of the Earth and the procession of natural history but argues that divine intervention routinely adjusts natural history, chiefly through being responsible for speciation. One bizarre consequence of this is that intelligent design is too vague to test, whereas young Earth creationism gives very concrete predictions (all of which are demonstrably false). Thus in strict Popperian terms intelligent design is more pseudo-scientific than young Earth creationism as the latter gives testable (albeit false) hypotheses whereas the former does not.]
Positivists also use obviously ridiculous things like astrology, pyramidology, and parapsychology for calibrating the gun sights, with the assumption being that any good demarcation criterion should be able to explain why astrology is bullshit. Popper went so far as to say that the theory of natural selection is not scientific because in practice “fitness” is defined tautologically as “that which is associated with survival.” However in a famous lecture he eventually recanted and argued that even if it is tautological to label any given common allele as promoting fitness, we can make a falsifiable hypothesis that selective advantage is more important for explaining complex organs than such alternative descent processes as genetic drift. (Note that this comes very close to saying that while a specific hypothesis is not testable the paradigm taken as a whole is, and so here Popper was implicitly embracing holism.)
While the idea of the “falsifiable hypothesis” was Popper’s key contribution, it’s worth also reviewing the logical positivist school with which he was loosely affiliated. The positivists drew a very strong distinction between synthetic (empirical data) and analytic (math and logic). Any statement that could not be described as either synthetic or analytic they derided as metaphysics (or more whimsically, “music” or “poetry”). Popper’s work fit within the positivist framework as it assumed a sort of deduction-induction cycle where the scientist would use logic to derive falsifiable hypotheses from theory, then collect data to test these hypotheses. That’s the technical meaning of “positivist” in philosophy, but sociologists usually use the term casually as the opposite of “deconstructionist” or “postmodernist” to mean someone who believes that science is possible without being hopelessly mired by subjectivity. Our usage completely loses any philosophic notions about strong distinctions between analytic, synthetic, and metaphysical and many sociologists who describe themselves as “positivists” probably really mean only that they are empiricists or scientific realists.
March 31, 2009
Scientific Inference, part 1 of 4
Last week I listened to this bloggingheads diavlog between Lipson and Stemwendel and I found it to be really interesting since a lot of philosophy of science and sociology of science is implicit in their conversation. The focus of their conversation is alternative medicine (which they agree is mostly hogwash) and particularly Senator Harkin’s advocacy of continued funding for the National Center on Complementary and Alternative Medicine (NCCAM).
Stemwendel seems to think there is some value in finding the null and disproving kooky ideas, but Lipson takes an even harder line and argues that ideas which are nonsense on the face of it don’t deserve to be dignified by an empirical test (in part because of the problem of false positives being exaggerated by publication bias). I entirely agree with them about the relative merits of traditional vs. alternative medicine and I actually get angry when I hear bullshit about thimerosal or magnets or vague “toxins” but I also had a series of reactions to this issue on a meta level.
Nonetheless, my initial reaction to this was, wow, talk about social closure. Here we have an instance of a contested field and the dominant faction in the field is arguing that the subordinate faction is ipso facto illegitimate and its ideas don’t even merit a hearing. Even though I support the dominant faction on the merits this still seems like the kind of thing that would make J.S. Mill and Karl Popper turn over in their respective graves.
Then I was thinking about how this social closure problem is particularly severe since there’s the structural bias that, unlike traditional medicine, alternative medicine can’t get funding from big pharmaceutical companies and so it’s government grants or nothing. This is a common argument you hear from alternative medicine advocates but when you think about it, it doesn’t really make sense. The argument has several implicit assumptions:
- Alternative medicine is not commercial
- Alternative medicine can’t be protected by intellectual property law
- Intellectual property law is the only way to capture rents from intellectual property
All of these assumptions are false. First, alternative medicine actually involves a lot of commerce (which unlike traditional medicine is on a cash basis as it’s usually not covered by insurance and thus, if anything, it’s even more commercial). Some of this could be covered by intellectual property law (most of the little gadgets and doodads they produce are patentable) but much of it is not. Nonetheless this doesn’t mean that the sponsors of research can’t make money off of it. You could imagine scenarios under which research could be a kind of club good allowing the sponsors of the research to capture most of the rents flowing from it. For instance, pomegranates have been around at least since Persephone ate those seeds and so it seems like they would be long since off-patent, and hence there would be no private sector incentive to sponsor research promoting pomegranates. Nonetheless a major pomegranate processing company did sponsor research into pomegranates which is why you now see all those ads talking about antioxidants. The processor owned enough of the market that they were able to capture the rents associated with research making pomegranates more attractive, even though at first glance they seem like a commodity. Likewise you might imagine that some kind of major alternative medicine facility (like the Victorian-era resort in “The Road to Wellville”) could profit by promoting techniques even if it couldn’t patent them.
My final thought was to unpack what it means to say that even if a hypothesis is testable, it is such utter nonsense that it doesn’t deserve to be tested. For instance, consider the claim that putting magnets in your shoes will align your energy field which will improve your health. It would be easy to design a very rigorous experiment that would test this hypothesis. In this sense Popper and the positivists would approve of it as a hypothesis. However the hypothesis is completely incommensurable with the body of scientific knowledge about human physiology. In other words, it falls outside the paradigm. A lot of people (especially those who haven’t actually read Kuhn) see paradigms as a bad thing, a mechanism for social closure and close-mindedness. However Kuhn makes a positive case for paradigms in that it is impossible to come to the world as a naive empiricist and accomplish anything useful. A paradigm is necessary to give order to the world sufficient to motivate hypotheses and make sense of specific findings. In this sense when Lipson argues for defunding NCCAM he’s really making a very Kuhnian argument — we have a paradigm, it seems to be a pretty good paradigm (i.e., it has few anomalies and continues to inspire productive research agendas), and let’s not get distracted with stuff that falls outside of it.
Since we’re now getting into the philosophy of science, over the next few days I’ll be posting my lecture notes on the subject in a series of three posts.
March 30, 2009