Nested logit vs multinomial logit

June 6, 2009 at 7:23 pm 1 comment

| Gabriel |

Megan McArdle has been on the warpath (most notably here and here) against the new study by TARP czar Elizabeth Warren showing that an increasing proportion of bankruptcies are medical. The crux of McArdle’s argument is that the Warren study is misleading because it fails to put the finding in the context of a dramatically reduced number of bankruptcies overall, such that the absolute number of medical bankruptcies has fallen. The political element is that McArdle believes that Warren and her co-authors are being intentionally coy on this point so as to make the private insurance status quo look worse than it is and thereby build political support for universal coverage.

Bracketing the merits of the political argument, McArdle is completely right about this as a methods issue. The Warren study is based on a sample composed entirely of bankruptcy filers, whom the authors classify as “medical bankruptcies” vs. “nonmedical bankruptcies,” and it predicts the distinction with a logit. McArdle’s point is that by limiting the sample to bankrupts, the study is implicitly modeling p(medicalbankruptcy | bankruptcy), and this is weird because p(bankruptcy) recently experienced a major policy shock whereas nothing about medical expenses has experienced comparable short-run changes. This is a valid critique but it’s only the tip of the iceberg. More generally, it makes no sense to model the causal process this way because it is not as if people first become bankrupt and only then determine whether they had health problems driving them to it. For the same reasons it makes no sense to do a study of violent death by taking a sample at the morgue and modeling p(homicide | death). Basically, they were doing a nested logit and omitting the first stage.
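To make the problem with the conditional probability concrete, here is a minimal sketch of the arithmetic in Python. The counts are invented purely for illustration (they are not from the Warren study or from McArdle’s posts); the only point is that p(medicalbankruptcy | bankruptcy) can rise while the number of medical bankruptcies falls.

```python
# Hypothetical filing counts, invented for illustration only.
# "before" and "after" stand for periods on either side of a policy shock to p(bankruptcy).
filings = {
    "before": {"medical": 675_000, "nonmedical": 825_000},
    "after": {"medical": 480_000, "nonmedical": 320_000},
}

def medical_share(counts):
    """Share of filers classified as medical, i.e. p(medical | bankruptcy)."""
    return counts["medical"] / (counts["medical"] + counts["nonmedical"])

for period, counts in filings.items():
    # Print the absolute number of medical filings next to the conditional share.
    print(period, counts["medical"], round(medical_share(counts), 2))
```

With these made-up numbers the medical share goes up even though fewer people file medical bankruptcies, which is exactly the pattern a filers-only sample cannot distinguish from a real increase in risk.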

When multiple causes (medical debts vs. other debts, homicide vs. natural causes) can lead to the same outcome (bankruptcy, death), it makes no sense to assume the outcome and then model the cause. Ideally what you want to do is to have a sample of the general population and treat the causes as competing risks in a multinomial logit framework. A multinomial logit where medical and other bankruptcies (and an omitted outcome category of solvency) are competing risks would be much better at answering the policy question of how much medical debt is leading to bankruptcy. Of course this would require a greater (and perhaps infeasible) data collection, but even absent that you could adjust for these issues by presenting a supplementary analysis comparing the filers to general population data on several variables, which the Warren team did not do.
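As a rough sketch of what the competing risks setup looks like, the Python below computes multinomial logit probabilities with solvency as the omitted category. The coefficients and the single medical_debt predictor are hypothetical, chosen only for illustration, and nothing here is estimated from data.

```python
import math

# Hypothetical coefficients, invented for illustration only.
# Solvency is the omitted (base) category, so its coefficients are fixed at zero.
COEFS = {
    "solvent": {"const": 0.0, "medical_debt": 0.0},
    "medical_bankruptcy": {"const": -5.0, "medical_debt": 2.0},
    "other_bankruptcy": {"const": -4.0, "medical_debt": 0.5},
}

def mnl_probs(medical_debt):
    """Multinomial logit probabilities for one household (medical_debt is 0 or 1)."""
    scores = {k: c["const"] + c["medical_debt"] * medical_debt for k, c in COEFS.items()}
    denom = sum(math.exp(s) for s in scores.values())
    return {k: math.exp(s) / denom for k, s in scores.items()}

def medical_share_among_filers(probs):
    """p(medical | bankruptcy), the only quantity a filers-only sample can speak to."""
    med = probs["medical_bankruptcy"]
    return med / (med + probs["other_bankruptcy"])

for debt in (0, 1):
    probs = mnl_probs(debt)
    print(debt, probs, medical_share_among_filers(probs))
```

Under this setup, a logit fit to filers alone would recover only the difference between the two bankruptcy coefficients (here 2.0 - 0.5 = 1.5 on the log-odds scale). It says nothing about how either type of bankruptcy compares to solvency, which is the policy question.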

It’s obviously the case that medical debt causes bankruptcy and not the other way around, but I think multinomial logit is to be preferred even when the causal ordering is merely ambiguous. For instance, last year the Center for American Progress and Free Press issued a report on talk radio. The finding of the report was that left-wing talk radio isn’t a major format because deregulation has led to more stations being owned by (presumably right-wing) big corporations and fewer owned by (presumably left-wing) women, minorities, and local owners. The analytical core of the report was a nested logit where first a station is political talk vs. other format and then a political talk station is right vs. left. Note that this assumes that the causal process is that first a station owner decides to make a station political talk and then decides which politics the station should support. Now (speaking as a radio specialist) I think this is plausible, but it’s probably more likely that management (who tend to think in terms of practical issues like where do we get content and which audiences might we attract) will see left-wing political talk and right-wing political talk as two distinct formats, and so it’s better to model them as competing risks.

This sounds nitpicky but it actually makes a big difference. If you do it as nested logit (which they did) you see first that corporate stations are much more likely to have political talk, and second that corporate stations are much more likely to have their political talk stations be right-wing. (To its credit the report documents both stages, but it gives much more emphasis to the second stage.) On the other hand, if you do it as competing risks you see that corporate and other stations are equally likely to have left-wing talk radio but corporate stations are especially likely to have right-wing talk. The upshot is that if you took the report’s policy recommendations and reinstated ownership caps you’d probably still have very few left-wing political talk stations but would find that most of the currently abundant right-wing political talk stations would switch to pop music or sports.
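A small Python sketch shows how the same table of stations can tell both stories. The station counts are hypothetical, invented only to reproduce the qualitative pattern described above; they are not the report’s data.

```python
# Hypothetical station counts by owner type, invented for illustration only.
stations = {
    "corporate": {"left_talk": 10, "right_talk": 150, "other": 840},
    "non_corporate": {"left_talk": 10, "right_talk": 30, "other": 960},
}

def nested_view(counts):
    """Two-stage reading: p(talk), then p(right | talk)."""
    talk = counts["left_talk"] + counts["right_talk"]
    total = talk + counts["other"]
    return {"p_talk": talk / total, "p_right_given_talk": counts["right_talk"] / talk}

def competing_risks_view(counts):
    """One-stage reading: each format's share of all stations of that owner type."""
    total = sum(counts.values())
    return {fmt: n / total for fmt, n in counts.items()}

for owner, counts in stations.items():
    print(owner, nested_view(counts), competing_risks_view(counts))
```

In the nested reading, corporate stations look more talk-heavy and more right-leaning within talk. In the competing risks reading, the two owner types carry left-wing talk at the same rate and differ only in right-wing talk.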

Of course this isn’t to say that nested logit is always a bad idea. There is a whole class of problems involving what we might call “pipeline” or “cursus honorum” issues for which nested logit is perfectly appropriate. For instance, Rob Mare (who is a friend and colleague) did his early work on education by treating it as a nested logit problem where one first starts (or doesn’t) a given degree, then completes it (or doesn’t), etc. So you can describe p(BA) = p(BA | startcollege) * p(startcollege | HS) * p(HS). Until Rob did this, people had treated education as a continuous “years of education” variable which they modeled with OLS. What Mare’s technique demonstrated was that family background has different effects at different stages.
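The decomposition can be sketched in a few lines of Python. The transition probabilities and the two family background groups are hypothetical, invented for illustration; they are not Mare’s estimates.

```python
import math

# Hypothetical transition probabilities by family background, for illustration only.
transitions = {
    "low_ses": {"HS": 0.70, "startcollege|HS": 0.40, "BA|startcollege": 0.50},
    "high_ses": {"HS": 0.95, "startcollege|HS": 0.80, "BA|startcollege": 0.60},
}

def p_ba(stage_probs):
    """p(BA) = p(BA | startcollege) * p(startcollege | HS) * p(HS)."""
    return math.prod(stage_probs.values())

def stage_gaps(low, high):
    """Difference between groups at each transition, stage by stage."""
    return {stage: high[stage] - low[stage] for stage in low}

for group, stage_probs in transitions.items():
    print(group, p_ba(stage_probs))
print(stage_gaps(transitions["low_ses"], transitions["high_ses"]))
```

Modeling each transition separately lets the gap between groups vary by stage, which a single OLS on years of education would average away.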

Anyway, the bottom line is that you should think carefully about what the causal process is before trying to model it. In some circumstances a nested logit is appropriate but just as often it is not. Following my general principle never to suspect malice when incompetence is plausible, I think many people do nested logit when they ought to be doing multinomial logit simply out of ignorance, as mlogit is often considered a more “advanced” technique than regular logit. For instance, Acock’s Stata book covers regular logit but not mlogit.




1 Comment

  • 1. greg swinand  |  February 5, 2011 at 5:00 am

    Gabriel, terrific discussion of nested vs. multinomial logit. The standard discussion (about common unobserved factors) is all very good when model selection is obvious (e.g., classic transport and mode selection problems), but in these grayer cases you’ve gotten down to some good intuition on which model to use. Thanks, Greg Swinand
