Author Archive

Fool’s Gold and Organizational Inertia

| Pierre |

Daniel Beunza at Socializing Finance links to Donald MacKenzie’s LRB review of Fool’s Gold, Gillian Tett’s new book on the financial crisis. I just finished reading the book, and I can only recommend it. Tett is an editor at the Financial Times; she also has a PhD in anthropology from Cambridge, which probably explains why the book reads more like an econ-soc analysis of the crisis than a journalistic account. In his review piece, MacKenzie gives a clear and detailed overview of the book’s main argument; as such, his review is one of the best and most accessible accounts of the recent developments in structured finance I’ve read so far. But there’s a point in the book that bothered me, and which MacKenzie doesn’t seem to touch on:

Tett tells us about the crisis mostly from the standpoint of bankers and credit derivative specialists at J.P. Morgan, a bank which, by and large, stayed out of the mortgage-backed securities mess and emerged relatively unscathed from the crisis. There’s nothing surprising about this: it is probably easier to find people willing to be interviewed on the crisis at J.P. Morgan nowadays than at, say, Citigroup or AIG. But this angle is precisely what makes the story so fascinating. The book starts from a simple but puzzling observation: the credit instruments at the center of the crisis (those pools of loans that were sliced into multiple tranches with distinct risk levels, which were in turn sold with overly optimistic ratings, aka collateralized debt obligations or CDOs) originated not from the home mortgage market but from the corporate bond market. Arguably, the crisis was largely caused by the belief that the same structured products could easily be used to transfer home mortgage risk away from banks’ balance sheets, although estimating the risk of CDO tranches turned out to be much more complex for mortgages than for corporate debt (in particular, there wasn’t any reliable data from which to estimate the correlation of default probabilities). But surprisingly, the pioneer and leader in corporate debt CDOs, J.P. Morgan, decided not to press its advantage in structured finance: instead of moving into the mortgage-backed securities market and applying the recipes it had just developed for corporate debt on a much wider scale, J.P. Morgan largely stayed out of the market. The incentives were there: J.P. Morgan had expertise in such structured products, investment banks could collect enormous fees for underwriting CDOs, and the market was booming. Yet J.P. Morgan executives, Tett observes, stayed on the sidelines and were even puzzled by the development of the market. So, what happened?

Tett’s account is essentially an organizational story. She argues that the prevailing culture at J.P. Morgan (a rather old-fashioned, boring and elitist institution) favored more prudent risk management strategies and more effective oversight than at other banks. This is an interesting hypothesis, but it may be, in part, the product of Tett’s methodology: the evidence supporting this argument comes mostly from interviews with J.P. Morgan executives, which are of course subject to response bias, recall bias and so forth. MacKenzie seems to generally agree with Tett’s thesis: what made J.P. Morgan different from other banks was the foresight of its management. Maybe reading too much organizational sociology has made me more cynical than I should be, but there’s an alternative explanation that Tett never fully explores or dismisses: organizational inertia. At the beginning of the housing bubble, J.P. Morgan was specialized in corporate debt and, unlike BoA or Citi, had no experience with home mortgages (J.P. Morgan was later absorbed by Chase Manhattan, but even then, only Chase got involved in mortgage CDOs). I do not doubt that organizational culture played an important role, but I did not see much evidence in the book that J.P. Morgan’s corporate culture (coupled with its expertise in structured finance) made its management more aware of the fragility of the mortgage CDOs than its competitors. If this were truly the case, one would have expected the bank to bet against the market by massively shorting CDO indexes, as a few hedge funds did. The alternative hypothesis, of course, is that organizational culture contributes to organizational inertia: while it did not necessarily make J.P. Morgan’s executives more prudent or more aware of the risks inherent in the mortgage market, it may have prevented the bank from taking positions (long or short) in a segment of the industry it did not belong to.

June 10, 2009 at 5:43 pm 1 comment

R, Stata and descriptive stats

| Pierre |

It’s amazing how R can make complicated things look simple and simple things look complicated.

I tried to explain in my previous post that R could have important advantages over Stata when it came to managing weird, “non-rectangular” datasets that need to be transformed/combined/reshaped in non-trivial ways: R makes it much easier to work on several datasets at the same time, and different types of objects can be used in consistent ways.

Still, I haven’t completely stopped using Stata: one of the things that bother me when I use R is the lack of nice and quick descriptive statistics functions like “tabulate” or “tabstat”. Of course, it is possible to use standard R functions to get about the same desired output, but they tend to be quite a bit more cumbersome. Here’s an example:

tabstat y, by(group) stats(N mean p10 median p90)

could be translated into R as:

sapply(levels(group), function(i)
c(N=length(y[group == i]),
mean=mean(y[group == i]),
quantile(y[group == i], c(.1,.5,.9))))

or, for a more concise version:

by(y, group, function(x)
c(N=length(x), mean=mean(x), quantile(x, c(.1,.5,.9))))

That’s quite ugly compared to the simple tabstat command, but I could deal with it… Now suppose I am working on survey data and observations have sampling weights: the syntax has to get even more complicated. I’d have to think about it for a few minutes, when all Stata would need is a quick [fw=weight] statement added before the comma.
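For what it's worth, here is a minimal sketch of one way to get frequency-weighted group statistics in base R. It assumes the weights are integer frequency weights, so that each row can simply be repeated `weight` times; the data and the names `y.exp`, `group.exp` and `wstats` are made up for illustration. Percentiles may differ slightly from Stata's, since the two programs use different default quantile definitions.

```r
# Toy data (made up for illustration)
y      <- c(2, 4, 6, 8, 10, 12)
group  <- factor(c("a", "a", "a", "b", "b", "b"))
weight <- c(1, 2, 1, 3, 1, 1)   # assumed to be integer frequency weights

# A frequency weight means "this row stands for `weight` identical rows",
# so expanding the data lets the unweighted functions do the work.
y.exp     <- rep(y, times = weight)
group.exp <- rep(group, times = weight)

# One column of statistics per group
wstats <- sapply(levels(group.exp), function(g) {
  x <- y.exp[group.exp == g]
  c(N = length(x), mean = mean(x), quantile(x, c(.1, .5, .9)))
})

# Transpose so that groups are rows, as in tabstat's output
t(wstats)
```

This only works for integer weights and can get memory-hungry with large weights, which is rather the point: it takes a few more lines than a bracketed weight expression.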

True, R can deal with survey weights, but it almost never matches the simplicity of Stata when all I am trying to do is get a few simple descriptive statistics on survey data:

One of my latest problems with R involved trying to make a two-way table of relative frequencies by column with weighted data… yes, a simple contingency table! The table() function cannot even compare with Stata’s tabulate twoway command, since:

  1. it does not handle weights;
  2. it does not report marginal distributions in the last row and column of the table (which I always find helpful);
  3. it calculates cell frequencies but not relative frequencies by row or column.

Luckily, writing an R function that can achieve this is not too hard:

col.table <- function(var1, var2, weights=rep(1,length(var1)), margins=TRUE){
# Creating table of (weighted) relative frequencies by column, and adding row variable margins as the last column
# (the margins argument is accepted but not used in this version)
crosstab <- prop.table(xtabs(weights ~ var1 + var2), margin=2)
t <- cbind(crosstab, Total=prop.table(xtabs(weights ~ var1)))
# Adding column sums in the last row
t <- rbind(t,Total = colSums(t))
# Naming rows and columns of the table after var1 and var2 used, and returning result
names(dimnames(t)) <- c(deparse(substitute(var1)), deparse(substitute(var2)))
t
}

col.table(x,y,w) gives the same output as Stata’s “tabulate x y [fw=w], col nofreq”. Note that the weight argument is optional, so col.table(x,y) is equivalent to “tabulate x y, col nofreq”.
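To make the two building blocks less mysterious, here is a small sketch of what xtabs() and prop.table() do on their own. The data and the names `counts` and `shares` are made up for illustration.

```r
# Toy data (made up for illustration)
x <- c("lo", "lo", "hi", "hi", "hi")
y <- c("m",  "f",  "m",  "f",  "f")
w <- c(2, 1, 1, 3, 1)

# xtabs() sums the left-hand side within each cell,
# which gives weighted cell counts
counts <- xtabs(w ~ x + y)

# prop.table() with margin = 2 divides each cell by its column total,
# so every column of the result sums to 1
shares <- prop.table(counts, margin = 2)

counts
shares
```

Using margin = 1 instead divides by row totals, which is the only real difference between the column and row versions of the table.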

Here’s the same function, but for relative distributions by row:

row.table <- function(var1, var2, weights=rep(1,length(var1)), margins=TRUE){
# Relative frequencies by row, with column variable margins as the last row
t <- rbind(prop.table(xtabs(weights ~ var1 + var2), margin=1),
Total=prop.table(xtabs(weights ~ var2)))
# Adding row sums in the last column
t <- cbind(t,Total = rowSums(t))
names(dimnames(t)) <- c(deparse(substitute(var1)), deparse(substitute(var2)))
t
}

May 6, 2009 at 7:41 pm 11 comments

R, Stata and “non-rectangular” data

| Pierre |

Thanks Gabriel for letting me join you on this blog. For those who don’t know me, my name is Pierre, I am a graduate student in sociology at Princeton and I’ve been doing work on organizations, culture and economic sociology (guess who’s on my committee). Recently, I’ve become interested in diffusion processes — in quite unrelated domains: the emergence of new composers and their adoption in orchestra repertoires, the evolution of attitudes towards financial risk, the diffusion of stock-ownership and the recent stock-market booms and busts.

When Gabriel asked me if I wanted to post on this Stata/soc-of-culture-oriented blog, I first told him I was actually slowly moving away from Stata and using R more and more frequently… which is going to be the topic of my first post. I am not trying to start the first flamewar of “Code and culture” — rather I’d like to argue that both languages have their own strengths and weaknesses; the important thing for me is not to come to a definitive conclusion (“Stata is better than R” or vice versa) and only use one package while discarding the other, but to identify conditions under which R or Stata are more or less painful to use for the type of data analysis I am working on.

People usually emphasize graphics functions and the number of high-quality user-contributed packages for cutting-edge models as being R’s greatest strengths over other statistical packages. I have to say I don’t run very often into R estimation functions for which I can’t find an equivalent Stata command. And while I agree that R-generated graphs can be amazingly cool, Stata has become much better in recent years. For me, R is particularly useful when I need to manipulate certain kinds of data and turn them into a “rectangular” dataset:

Stata is perfect for “rectangular” data, when the dataset fits nicely inside a rectangle of observations (rows) and variables (columns) and when the conceptual difference between rows and columns is clear: this is what a dataset will have to look like just before running a regression. But things can get complicated when the raw dataset I need to manipulate is not already “rectangular”. This may include network data and multilevel data, even when the ultimate goal is to turn these messy-looking data, sometimes originating from multiple sources, into a nice rectangular dataset that can be analyzed with a simple linear model… Sure, Stata has a few powerful built-in commands (although I’d be embarrassed to say how many times I had to recheck the proper syntax for “reshape” in the Stata help). But sometimes egen, merge, expand, collapse and reshape won’t do the trick… and I find myself sorting, looping, using, saving and merging until I realize (too late of course!) that Stata can be a horrible, horrible tool when it comes to manipulating datasets that are not already rectangular. R on the other hand has two features that make it a great tool for data management:

  1. R can have multiple objects loaded in memory at the same time. Stata on the other hand can only work on one dataset at a time, which is not just inefficient (you always need to write the data into temporary files and read a new file to switch from one dataset to another): it can also unnecessarily add lines to the code and create confusion.
  2. R can easily handle multiple types of objects: vectors, matrices, arrays, data frames (i.e. datasets), lists, functions… Stata on the other hand is mostly designed to work on datasets: most commands take variables or variable lists as input; and when Stata tries to handle other types of objects (matrices, scalars, macros, ado files…), Stata uses distinct commands each with a different syntax (e.g. “matrix define”, “scalar”, “local”, “global”, “program define” instead of “generate”…) and sometimes a completely different language (Mata for matrix operations — which I have never had the patience to learn). R on the other hand handles these objects in a simple and consistent manner (for example it uses the same assignment operator “<-” for a matrix, a vector, an array, a list or a function…) and can extract elements which are seamlessly “converted” into other object types (e.g. a column of a matrix, or coefficients/standard errors from a model are by definition vectors, which can be treated as such and added as variables in a data frame, without even using any special command à la “svmat”).
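Both points can be sketched in a few lines. The data frames `people` and `firms`, their columns, and the model are all made up for illustration; this is just one way the pieces fit together.

```r
# Two made-up data frames held in memory at the same time
people <- data.frame(id     = 1:6,
                     firm   = c("A", "A", "B", "B", "C", "C"),
                     wage   = c(10, 12, 20, 22, 15, 19),
                     tenure = c(1, 3, 2, 6, 1, 5))
firms  <- data.frame(firm = c("A", "B", "C"),
                     size = c(50, 500, 120))

# Combine them directly, with no temporary files in between
merged <- merge(people, firms, by = "firm")

# Fit a model; coef() returns the coefficients as a plain named vector
fit <- lm(wage ~ tenure, data = merged)
b   <- coef(fit)

# That vector can be used like any other, here to add a new column
merged$predicted <- b["(Intercept)"] + b["tenure"] * merged$tenure

head(merged)
```

The same assignment operator is used for the data frames, the model object and the coefficient vector, and no conversion step is needed to move from one to the other.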

In my next post, I’ll try to explain why I keep using Stata despite all this…

April 28, 2009 at 12:57 pm 9 comments
