Author Archive

They Meant Us No Harm, But Only Gave Us the Lotus

| Gabriel |

After hearing Sam Quinones on EconTalk, I finally stopped procrastinating and read Dreamland. It took only a few days, during which every other activity was a distraction from finishing the book. Dreamland tells a unified story of the opiate epidemic starting in the late 1990s, with both the overall social trend and close-ups on the lives of dealers, addicts, doctors, cops, epidemiologists, and mourners. I’ve watched every episode of Justified and read Case and Deaton PNAS 2015, so I was not surprised by the broad argument of the book: a shift in medicine towards prescribing opiates created ubiquitous chemical dependence that was eventually met by black tar heroin, all of which disproportionately affected rust belt white people. What made the book amazing to me, even knowing the broad contours of the social facts it describes, was how every detail of the book illustrated and illuminated another aspect of sociology. As I remarked on Twitter, my discipline could very well treat Dreamland the same way political scientists treat History of the Peloponnesian War.

In no particular order, here are a few of the themes I noticed.

The dealers who come up from Xalisco, Nayarit, to live for a few months in spartan conditions, working long hours driving around with balloons of dope in their mouths, are motivated by relative deprivation. As more and more dealer-migrants return to Xalisco flush with cash, they create a new standard of living in the village and transform being an impoverished sugar cane farmer from just how life goes into a status that can be rejected. But relative deprivation is too weak to explain Xalisco life, which is better characterized as competitive feasting straight out of Mauss’s The Gift. Xalisco-style potlatch can occur whenever a migrant returns with suitcases full of Levis 501s to disburse to a receiving line of supplicants, but it is especially centered on the corn festival, where migrants would compete by sponsoring banda performances (104). Interestingly, while dealers often planned to save enough of their wages to capitalize a small business, they tended to dissipate their wealth in gifts to family and “the rest on beer, strip clubs, and cocaine, and walked the streets of Xalisco for a week or two the object of other men’s envy” (261). Quinones emphasizes this envy repeatedly, along with the way it is formed by public feasting and sublimated into a need to reciprocate so as to restore honor. This in turn creates the labor supply for black tar heroin retailing, as men seek another bundle of cash with which to engage in such honorable public profligacy.

Social capital also plays a strong role in explaining how Xalisco drug crews operated, in ways distinct from most drug dealers. Notwithstanding a handful of murders in the book, Xalisco dealers generally eschewed violence and never carried guns. Competing heroin crews practiced friendly competition rather than violent turf wars. Quinones attributes this partly to their “pizza delivery” business model, as compared to traditional corner slinging, but mostly to the thick interconnected ties based in a small rancho back home where everybody knows everybody. Another distinctive aspect of the Xalisco boys’ business model is that dealers earn a salary, whereas drugs are typically sold on commission. This would normally present a principal-agent problem, but it was not an issue for Xalisco dealers. Crew bosses did engage in monitoring by calling junkies to confirm that their dealers were prompt and polite and that the heroin was of high quality, but these monitoring costs were manageable because of the high level of trust. Crew bosses basically trusted their dealers because they weren’t junkies (Xalisco boys consider heroin disgusting) and because they had thick communal ties from the rancho. This is the positive aspect of social capital, but there is also a negative sense of social capital in that men were pushed into drug dealing, and back into it, by the insatiable demands to support relatives. That’s all supply side, but social capital also characterizes Quinones’s understanding of the demand side, though in a sense closer to Putnam than Portes, in blaming the rise of opiates on the collapse of community. In this aspect of the story Quinones is a staunch communitarian moralist, which didn’t bother me as I’m a communitarian moralist too, but YMMV, and blaming opiates on the collapse of community was the only argument in the book that was more tell than show.

On the prescription opiates side, Quinones tells the story of how medicine lost its traditional reluctance to prescribe opiates in the pain revolution, and particularly of the key role played by Porter and Jick NEJM (1980). The article itself is a one-paragraph letter noting that in-patients treated with opiates rarely became addicted. The role of this brief letter in the pain revolution offers scientific epistemology a valuable cautionary tale about generalizing beyond the scope of the data. The finding showed that in-patients receiving very conservative doses of opiates rarely became addicted, but this was interpreted as meaning that it was completely safe to provide out-patients with liberal supplies of opiates. In Quinones’s telling, the article is something of a Sleeping Beauty citation, taking off after it was cited in a 1986 Pain article by Portenoy and Foley. However, a Google Scholar search shows that the article began getting cited almost immediately (the earliest citation is from 1982 in a nursing journal). Nonetheless, the story of how a brief publication summarizing a single database query was interpreted well beyond its original scope conditions to justify risky changes to medical practice can provide grist for the mill of historians and sociologists of science. A key part of why people cited this tiny publication is that they wanted to believe it, since it created a permission structure for prescribing effective but dangerous drugs. Pharmaceutical detailing exploited this by promoting Porter and Jick, or even just the black-boxed factoid of a “1% addiction rate,” to physicians.

A few other themes I noticed:

  • pharmaceutical detailing in opiates, as in all drugs, follows my model of obfuscated transactionalism and Quinones has a lot of material on the history of detailing
  • the submerged state gives Medicaid rather than cash transfers and a lot of diverted opiates came from pill mills paid for through Medicaid fraud
  • Xalisco boys engage in statistical discrimination by only selling to white customers, whom they see as less likely to rob them than black customers
  • chain migration characterizes some aspects of Xalisco boy migration, but they also are entrepreneurial in relying on junkies as scouts to explore new markets, including ones with no history of Nayarit migrants
  • doctors prescribed opiates in part to get patients out of their offices quickly, and prescribed 30-day packs of pills rather than 3-day packs to avoid return visits. Proper pain management is extremely labor intensive, but it is hard to get insurance reimbursement for it. This follows logically from Baumol’s disease: as high-skilled medical labor grows more expensive, insurance companies will substitute capital (drugs).
  • reactivity is everywhere. Pain is part of doctor and hospital ratings, but iatrogenic addiction is not, so doctors prescribe dope. Sentencing is based on large quantities of dope and on carrying a gun, so Xalisco boys carry only small quantities of dope and go unarmed.


And oh yeah, there’s also some stuff in the book about how this is an enormous social and public health epidemic. It is killing tens of thousands of Americans a year and stealing the souls of many more, debasing them into the kind of people who steal their children’s Christmas presents to trade for pills. But I’d rather focus on how it provides material for developing theory, because I prefer to be fascinated rather than livid. That attitude is how I made it all the way through the book while only breaking down in tears once.

January 27, 2017 at 9:55 am

Obfuscated Transactionalism at Cato Unbound

| Gabriel |

From my lead essay at Cato Unbound:

And so we modern people take for granted that we both produce and consume through markets. The idea that we might acquire groceries because the butcher, the baker, and the brewer owe us favors rather than because we hand them cash or a Visa card seems primitive. Nonetheless, there are circumstances where we modern westerners consider prestations more appropriate than purchases. This preference extends well beyond obvious matters of intimacy like sex and Christmas presents and even reaches into business interactions.

Responses from Mike Munger, Alan Fiske, and Alex Tabarrok to follow.

June 6, 2016 at 9:05 am

Ruby Slippers

| Gabriel |


Rod Dreher at The American Conservative has a post on people invoking the concept of “social construction” with his lead example being a speech and debate team that always changes the subject to a critical race theory rant about the conventions of debate itself, even if the pre-specified debate topic is about national service or green energy or whatever. The judge then awards the match to this non sequitur, invoking “social constructionism” to explain himself.

I can get angry about this on a whole other level than Dreher does, precisely because I think social construction is a valuable concept. And I really do take the concept seriously. My PhD training is as a neo-institutionalist (ie, how organizational practices are socially constructed), I have an ASR on market information regimes (ie, how socially constructed market data shapes market behavior), and my current project is on relational work (ie, how exchange is socially constructed as market or social). I also advise grad students on these sorts of topics. So it’s not like I’m some angry epistemological realist who goes around giving swirlies to phenomenologists.

Social construction is a really useful concept, but it has the misfortune of being popular with idiots who don’t really understand it. When this sort of person says “x is socially constructed,” the implication is “therefore we can ignore x.” When I lecture on social constructionism I ridicule this sort of thing as “ruby slippers” social constructionism, as if your sociology professor tells you “why Dorothy, you’ve had the power to solve inequality all along, just click your heels three times and say ‘race is a social construct,’ ‘race is a social construct,’ ‘race is a social construct.’” If you really grok social constructionism, the appropriate reaction to somebody invoking the concept in almost any practical context is to shrug and say “your point being?” If you actually read Berger and Luckmann rather than just getting the gist of it from some guy with whom you are smoking weed, you’ll see that the key aspects of social constructionism are intersubjectivity and institutions. That is, social construction is important because social interaction is premised on shared conventions. Those conventions become so deeply codified that for most purposes they might as well be objective.

Suppose you had two contractors bidding on remodeling your kitchen. One of them says that it will be done in X days, involving Y materials, and cost you $Z. The other gives you a fascinating (but at times dubious) lecture about whether time exists in the abstract or only relative to perception, the ugly history of exploitation in the formica industry, and the chartalist theory of money. You then go back to the first contractor, who is bewildered and has no rebuttal to the second contractor’s very, um, creative arguments. You would have to be an idiot to award the bid to the second contractor, even if you think they are right about everything they said. As it happens, I actually believe that time, kitchen materials, and money are all socially constructed. It is also true that kitchen remodeling is a social construct, and one of the conventions of that particular social construct is that you talk about things like time, material, and price rather than offer a critical perspective on them.

March 18, 2016 at 8:03 pm


| Gabriel |

This morning Governor Chris Christie endorsed Donald Trump for president. There was widespread speculation that this reflected Christie hoping for an appointment as Attorney General in the event of a Trump victory. This was met with disgust from mainstream conservative intellectuals, all of whom despise Trump. Immediately prior to the endorsement, they had been delighting in Rubio having learned to fight Trump at his own insult comic game. Over on Twitter, Josh Barro observed that it is precisely Trump’s outsider nature that makes endorsing him attractive for an ambitious Republican politician.

This struck me as very astute and reminded me of Gould’s 2002 AJS on The Origins of Status Hierarchies. The model starts with cumulative advantage in status. The trick with cumulative advantage models, though, is to avoid their natural tendency towards absolute inequality. So the models always have some kind of braking mechanism, which makes the histogram end up as a power law rather than a step function. For instance, Rosen 1981 uses heterogeneity of taste and diminishing marginal returns to avoid what would otherwise be the implication of his model: exactly one celebrity achieving universal acclaim. Anyway, the point is that cumulative advantage models need a brake, and Gould’s brake is reciprocity. Gould observes that attention and resources are finite, so when someone has many followers, they lose the ability to reciprocate with them. Followers may attend not only to the status of a patron but also to the attention and resources the patron reciprocates. To that extent, high numbers of followers will swamp the ability of high-status patrons to reciprocate, and so inhibit their ability to attract new followers. For instance, a grad student might rationally prefer to work with an associate professor who has only a few advisees and so can spend several hours a week with each of them, rather than with a Nobel Laureate who has so many advisees he doesn’t recognize some of them in the hallway.
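To make the braking logic concrete, here is a toy sketch in Python, the language used for the scraping scripts elsewhere on this page. It is not Gould's actual specification; every piece of it is an illustrative assumption. Each newcomer simply joins whichever patron currently has the highest appeal. Appeal rises with the patron's follower count k and a quality score q. It is divided by 1 + brake·k², a hypothetical stand-in for a patron's shrinking ability to reciprocate. The function name, the appeal formula, and the parameter values are all made up for illustration.

```python
def allocate_followers(n_patrons, newcomers, brake, quality=None):
    """Toy cumulative-advantage process with a reciprocity brake (illustrative only).

    Each newcomer joins the patron with the highest current appeal, where
    appeal = q * (1 + k) / (1 + brake * k**2) for a patron with k followers
    and quality q. The (1 + k) term is cumulative advantage; the divisor is a
    hypothetical stand-in for a patron's shrinking ability to reciprocate.
    Ties go to the lowest-numbered patron.
    """
    if quality is None:
        quality = [1.0] * n_patrons
    followers = [0] * n_patrons
    for _ in range(newcomers):
        appeal = [q * (1 + k) / (1 + brake * k * k) for q, k in zip(quality, followers)]
        best = max(range(n_patrons), key=lambda j: appeal[j])  # max() keeps the first of tied values
        followers[best] += 1
    return followers


print(allocate_followers(5, 300, brake=0.0))   # → [300, 0, 0, 0, 0]
print(allocate_followers(5, 300, brake=0.01))  # with a brake, newcomers spill over to other patrons
print(allocate_followers(3, 50, brake=0.01, quality=[1.0, 1.0, 3.0]))  # one patron with a quality edge
```

With brake=0 the leader's appeal only grows, so every newcomer piles onto one patron, which is the step-function outcome. With a small positive brake, newcomers move on to other patrons once the leader is too busy to reciprocate. Giving one patron a higher quality score is a crude way to mimic an exogenous boost, loosely analogous to the Q_j term discussed later in this post.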

In this sense, Rubio, as the clear favorite of the party establishment, has already recruited great masses of political talent. Should Rubio win in November, he will have an embarrassment of riches in terms of followers with whom to fill cabinet positions and other high-ranking political roles. That is to say, Rubio’s ability to reciprocate the support of his followers is swamped by the great number of followers he has acquired. (I’m talking about followers among the sorts of people likely to be appointed to administration positions; I’ll get to voters later.) This then leads some potential followers to affiliate with a patron who is not too busy for them, and hence Chris Christie is hoping to spend the next eight years building RICO cases against people who use the term “short-fingered vulgarian.”

But there’s a problem with this, which is that status itself provides resources, especially in a system where power is not continuous but winner-take-all. (The discontinuity is really important, as Schilke and I argued recently.) In this sense, it shouldn’t matter that a candidate with few endorsements has the fewest supporters competing for patronage, because that candidate would lose and so not have patronage to allocate. That would be true if the political science model nicknamed “the party decides” (which we can generalize as the endogeneity of status competition) were true. But if that model were true, we would be seeing Rubio (who recruited the most intellectuals) or Jeb! (who raised the most money) as the clear front-runner. That is anything but the case, since the GOP primary this cycle has been consistently dominated by outsiders (Trump, briefly Carson, and even Cruz, who is a senator but a notably un-collegial one).

This then suggests that we have to recognize that power, including the ability to allocate resources to followers, is not necessarily a function of how many followers one has. In ordinary times it might be, especially in the Republican party, which normally follows the party decides model. However, this year it is clear that popularity in opinion polls and primaries/caucuses has no (positive) correlation with establishment support. This may be because Trump, like Lenin, is a figure of such immense charisma that he can defy the models. Or it may be that the base is revolting over a substantive issue like immigration. Or maybe the support of neo-Nazis with a bizarre interest in anime and the Frankfurt school is the secret sauce. Whatever the exact reason the party decides model is breaking, the fact is that it is. The Republican primary reminds me of Bourdieu’s model of a field of mass cultural production and a restricted field of production. Rubio is clearly dominating in the restricted field of elite conservative opinion, but that does him very little good considering how effective Trump is in the mass field. Suppose we view the competition for endorsements not as an isolated system, but as one that is loosely coupled to an adjacent system of competition for voters. Then the status competition for endorsements is no longer entirely endogenous, and there is a source of exogenous power shaping it. (In the Gould model this would be subsumed as part of Q_j.) Hence Trump’s great popularity with voters, despite his great unpopularity with party elites, makes him more attractive to party elites than he would otherwise be. Some of them will break ranks and affiliate with the demagogue.

In Trump’s case, his fame, wit, and shamelessness have gained him the support of voters, and this has disrupted the otherwise endogenous system of endorsements. However, the model could generalize to any source of power outside of the endogenous process of consensus building within party elites. A very similar model would apply to those political actors who welcome a foreign invader as a supporter in domestic disputes they would otherwise lose. Americans take for granted that the opposition party will be a loyal opposition, and so we abide by the maxim that “politics ends at the water’s edge.” This is why periods like the Second Red Scare (or, from the other perspective, the Popular Front that preceded it) seem so anomalous. However, for centuries, most of politics consisted of machinations to depose the current batch of elites with the help of imperial powers and then set yourself up as a client state. In such a scenario, a political actor who lacks much power within the internal dynamics of oligarchy could still acquire followers if they seemed to be favored by the forces massing across the border. So we might expect a lot of ambitious mitteleuropean politicians to affiliate with heretofore minor fascist parties c. 1938, or with heretofore minor communist parties c. 1943.

February 26, 2016 at 5:21 pm

Who Said It? Gift Exchange Lit vs Article on LASD

| Gabriel |

For each quote, guess the source: a classic of gift exchange or a Los Angeles Times article about deposed Sheriff and soon-to-be plea bargainee Lee Baca. The answer follows each quote in parentheses, so cover it up if you want to score your quiz!

“Until he has given back, the receiver is ‘obliged,’ expected to show his gratitude towards his benefactor or at least to show regard for him, go easy on him, pull his punches…” (Bourdieu Logic of Practice)

“The etiquette of the feast, of the gift that one receives with dignity, but is not solicited, is extremely marked among these tribes.” (Mauss The Gift)

“I don’t solicit any gifts. I’ve never asked for a gift.… People just do it for me.” (Los Angeles Times)

“When you’re taking gifts from strangers, there’s only one reason. They only give gifts because they want something.” (Los Angeles Times)

“These, however, are but the outward signs of kindness, not the kindnesses themselves.” (Seneca Benefits)

“What they’re expressing is appreciation for the respectful way we do business.” (Los Angeles Times)

“No one is really unaware of the logic of exchange … but no one fails to comply with the rules of the game, which is to act as if one did not know the rule.” (Bourdieu Pascalian Meditations)

“Nobody is free to refuse the present that is offered.” (Mauss The Gift)

“My life would be much easier if people did not give me gifts.” (Los Angeles Times)


February 10, 2016 at 11:57 am

Scraping Twitter with Python

| Gabriel |

As long-time readers will remember, I have been collecting Twitter data with the R package twitteR. Unfortunately that workflow has proven to be buggy, mostly for reasons having to do with authentication. As such, I decided to learn Python and migrate my project to the Twython module. Overall, I’ve been very impressed by the language and the module. I haven’t had any dependency problems, and authentication works pretty smoothly. On the other hand, it requires a lot more manual coding to get around rate limits than twitteR does, and this is a big part of what my scripts are doing.

I’ll let you follow the standard instructions for installing Python 3 and the Twython module before showing you my workflow. Note that all of my code was run on Python 3.5.1 and OSX 10.9. You want to use Python 3, not Python 2, because tweets are UTF-8 and Python 3 handles Unicode text natively. If you’re a Mac person, OSX comes with 2.7, so you will need to install Python 3 yourself. For the same reason, use Stata 14 (the first version with Unicode support) for tweets.

One tip on installation: pip tends to default to Python 2.7, so use this syntax in bash:

python3 -m pip install twython

I use three py scripts: one to write Twython queries to disk, one to query information about a set of Twitter users, and one to query tweets from a particular user. Note that the query scripts can be slow to execute, which is deliberate, as otherwise you end up hitting rate limits. (Twitter’s API allows fifteen queries per fifteen minutes.) I call the two query scripts from bash with argument passing. The disk-writing script is called by the query scripts and doesn’t require user intervention, though you do need to be sure Python knows where to find it (usually by keeping it in the current working directory). Note that you will need to adjust things like file paths and authentication keys. (When accessing Twitter through scripts instead of your phone, you don’t use usernames and passwords but keys and secrets; you can generate the keys by registering an application.)

I am discussing the disk-writing script, tw2csv, first. The user never calls it directly, but it is the most natural place to discuss Twython’s somewhat complicated data structure. A Twython data object is a list of dictionaries. (I adapted this script for exporting lists of dictionaries.) You can get a pretty good feel for what these objects look like by using type() and the pprint module. In this sample code, I explore a data object, users, created by the user-lookup script below.

type(users) #shows that users is a list
type(users[0]) #shows that each element of users is a dictionary
#the objects are a bunch of brackets and commas; use pprint to make a dictionary (sub)object human-readable with whitespace
import pprint
pp = pprint.PrettyPrinter(indent=4) #pretty-printer object used below
pp.pprint(users[0]['status']) #you can also zoom in on daughter objects, in this case the user's most recent tweet object. Note that this tweet is a sub-object within the user object, but may itself have sub-objects

As you can see if you use the pprint command, some of the dictionary values are themselves dictionaries. It’s a real fleas upon fleas kind of deal. In the script I pull some of these objects out and delete others for the “clean” version of the data. Also note that tw2csv defaults to writing these second-level fields as one first-level field with escaped internal delimiters. So if you open a file in Excel, some of the cells will be really long and have a lot of commas in them. While Excel automatically parses the escaped commas correctly, Stata assumes you don’t want them escaped unless you use this command:

import delimited "foo.csv", delimiter(comma) bindquote(strict) varnames(1) asdouble encoding(UTF-8) clear

Another tricky thing about Twython data is that there can be a variable number of dictionary entries (ie, some fields are missing from some cases). For instance, if a tweet is not a retweet, it will be missing the “retweeted_status” dictionary within a dictionary. This was the biggest problem with reusing the Stack Overflow code, and it required adapting another piece of code for getting the union set of dictionary keys. Note that this will give you all the keys used in any entry from the current query, but not those found uniquely in past or future queries. Likewise, Python does not guarantee the order of keys in a set or (as of 3.5) a dictionary, so the field order can differ from one run to the next. For these two reasons, I hard-coded tw2csv to overwrite rather than append, and built a timestamp into the query scripts. If you tweak the code to append, you will run into problems with the fields not lining up.
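Here is a minimal sketch of how the union-of-keys step interacts with the csv module. It uses two made-up tweet-like dictionaries instead of real Twython output and writes to an in-memory buffer rather than a file. It applies the same functools.reduce idiom that tw2csv uses, but sorts the keys, which is my own addition and not part of tw2csv.

```python
import csv
import functools
import io

# Two made-up tweet-like records; only the first one is a retweet.
tweets = [
    {"id": 1, "text": "RT hello", "retweeted_status": {"id": 99, "retweet_count": 3}},
    {"id": 2, "text": "original tweet"},
]

# Union of keys across all records, the same reduce idiom tw2csv uses.
allkey = functools.reduce(lambda x, y: x.union(y.keys()), tweets, set())

buffer = io.StringIO()  # stands in for the output file
writer = csv.DictWriter(buffer, fieldnames=sorted(allkey))  # sorted() fixes the column order
writer.writeheader()
writer.writerows(tweets)  # the missing key becomes an empty cell; the nested dict becomes one quoted field
print(buffer.getvalue())
```

The non-retweet gets an empty retweeted_status cell. The nested dictionary is written as a single field wrapped in quotes because it contains commas. Sorting the union makes the column order reproducible across runs. It still cannot anticipate keys that only appear in a later query, so overwriting rather than appending remains the safer default.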

Anyway, here’s the actual tw2csv code.
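def tw2csv(twdata,csvfile_out):
    import csv
    import functools
    #union of keys across all dictionaries, since some fields are missing from some cases
    allkey = functools.reduce(lambda x, y: x.union(y.keys()), twdata, set())
    with open(csvfile_out,'wt',newline='',encoding='utf-8') as output_file: #the csv module expects newline=''
        dict_writer = csv.DictWriter(output_file, fieldnames=list(allkey)) #missing keys are written as empty cells
        dict_writer.writeheader()
        dict_writer.writerows(twdata) #nested dictionaries are written as one escaped field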

One of the queries I like to run is getting basic information like date created, description, and follower counts; basically, all the stuff that shows up on a user’s profile page. The Twitter API allows you to do this for 100 users simultaneously, and I do this with the script below. It assumes that your list of target users is stored in a text file, but there’s a commented-out line that lets you hard-code the users, which may be easier if you’re doing it interactively. Likewise, it’s designed to query 100 users at a time, but there’s a commented-out line that’s much simpler in interactive use if you’re only querying a few users.

You can call it from the command line and it takes as an argument the location of the input file. I hard-coded the location of the output. Note the “3” in the command-line call is important as operating systems like OSX default to calling Python 2.7.

python3 list.txt

And here’s the actual script. Note that I’ve taken out my key and secret. You’ll have to register as an “application” and generate these yourself.
from twython import Twython
import sys
import time
from math import ceil
import tw2csv #custom module

targetlist=sys.argv[1] #text file listing feeds to query, one per line. full path ok.
today = time.strftime("%Y%m%d")

APP_KEY='' #25 alphanumeric characters
APP_SECRET='' #50 alphanumeric characters
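twitter=Twython(APP_KEY,APP_SECRET,oauth_version=2) #app-only (OAuth 2) authentication object
ACCESS_TOKEN=twitter.obtain_access_token() #exchange the key and secret for a bearer token
twitter=Twython(APP_KEY,access_token=ACCESS_TOKEN) #authenticated object used for the queries below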

handles = [line.rstrip() for line in open(targetlist)] #read from text file given as cmd-line argument
#handles=("gabrielrossman,sociologicalsci,twitter") #alternately, hard-code the list of handles

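#API allows 100 users per query. Cycle through, 100 at a time
#users = twitter.lookup_user(screen_name=handles) #this one line is all you need if len(handles) <= 100
users=[] #initialize data object
#unlike a get_user_timeline query, there is no need to cap total cycles
cycles=ceil(len(handles)/100) #number of 100-user batches
for i in range(0, cycles): ## iterate through the handles 100 at a time
    h=handles[0:100] #current batch of up to 100 handles
    del handles[0:100] #drop the current batch from the queue
    incremental = twitter.lookup_user(screen_name=h)
    users.extend(incremental) #add this batch of user objects
    time.sleep(90) ## 90 second rest between api calls. The API allows 15 calls per 15 minutes so this is conservative

tw2csv.tw2csv(users,'users_'+today+'.csv') #output file name is a placeholder; adjust the path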
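The lookup loop above consumes the handles list with del as it goes. A slicing generator is one alternative that leaves the list intact. This is a minimal sketch: batches is a hypothetical helper (not part of Twython), and the handles are made up for illustration.

```python
def batches(items, size=100):
    """Yield consecutive slices of at most `size` items without modifying `items`."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


handles = ["user%d" % i for i in range(250)]  # hypothetical screen names
print([len(b) for b in batches(handles)])  # → [100, 100, 50]
```

In the script, each h in batches(handles) would then be passed to twitter.lookup_user(screen_name=h), with the results added via users.extend and the same 90-second sleep between calls.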


This last script collects tweets for a specified user. The tricky thing about this code is that the Twitter API allows you to query the last 3200 tweets per user, but only 200 at a time, so you have to cycle over them. Moreover, you have to build in a delay so you don’t get rate-limited. I adapted the script from this code but made some tweaks.

One change I made was to only scrape as deep as necessary for any given user. For instance, as of this writing, @SociologicalSci has 1192 tweets, so it cycles six times, but if you run it in a few weeks @SociologicalSci would have over 1200 and so it would run at least seven cycles. This change makes the script run faster, but ultimately gets you to the same place.
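The "only scrape as deep as necessary" rule reduces to a one-line calculation. This is a small sketch: timeline_cycles is a hypothetical helper, not part of the script. It mirrors the script's ceil-and-cap logic, with one call per 200 tweets and never more than 16 calls, since the API stops at 3200 tweets.

```python
from math import ceil


def timeline_cycles(statuses_count, per_call=200, max_depth=3200):
    """Number of 200-tweet calls needed to reach a user's oldest available tweet."""
    return min(ceil(statuses_count / per_call), max_depth // per_call)


print(timeline_cycles(1192))   # → 6, the @SociologicalSci example
print(timeline_cycles(1201))   # → 7 once the account passes 1200 tweets
print(timeline_cycles(10000))  # → 16, capped by the 3200-tweet limit
```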

The other change I made is that I save two versions of the file: one as is, and the other pulling out some objects from the subdictionaries and deleting the rest. If for some reason you don’t care about retweet count but are very interested in the retweeted user’s profile background color, go ahead and modify the code. See above for tips on exploring the data structure interactively so you can see what there is to choose from.

As above, you’ll need to register as an application and supply a key and secret.

You call it from bash with the target screenname as an argument.

python3 sociologicalsci
from twython import Twython
import sys
import time
from math import ceil
import tw2csv #custom module

handle=sys.argv[1] #takes target twitter screenname as command-line argument
today = time.strftime("%Y%m%d")

APP_KEY='' #25 alphanumeric characters
APP_SECRET='' #50 alphanumeric characters
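twitter=Twython(APP_KEY,APP_SECRET,oauth_version=2) #app-only (OAuth 2) authentication object
ACCESS_TOKEN=twitter.obtain_access_token() #exchange the key and secret for a bearer token
twitter=Twython(APP_KEY,access_token=ACCESS_TOKEN) #authenticated object used for the queries below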

#adapted from
#user_timeline=twitter.get_user_timeline(screen_name=handle,count=200) #if doing 200 or less, just do this one line
user_timeline=twitter.get_user_timeline(screen_name=handle,count=1) #get most recent tweet
lis=user_timeline[0]['id']-1 #tweet id # for most recent tweet
#only query as deep as necessary
tweetsum= user_timeline[0]['user']['statuses_count']
cycles=ceil(tweetsum / 200)
if cycles>16:
    cycles=16 #API only allows depth of 3200 so no point trying deeper than 200*16
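for i in range(0, cycles): ## iterate through all tweets up to max of 3200
    incremental = twitter.get_user_timeline(screen_name=handle,
    count=200, include_rts=True, max_id=lis) #include_rts is the API's name for the retweet switch
    if not incremental: #stop early if the API has nothing older to return
        break
    user_timeline.extend(incremental) #add this batch to the tweets collected so far
    lis = incremental[-1]['id'] - 1 #next call starts just below the oldest tweet collected
    time.sleep(90) ## 90 second rest between api calls. The API allows 15 calls per 15 minutes so this is conservative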


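tw2csv.tw2csv(user_timeline,handle+'_'+today+'.csv') #save the raw version first; file name is a placeholder, adjust the path

#clean the file and save it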
for i, val in enumerate(user_timeline):
    if 'retweeted_status' in user_timeline[i].keys():
        user_timeline[i]['rt_count'] = user_timeline[i]['retweeted_status']['retweet_count']
        user_timeline[i]['rt_id'] = user_timeline[i]['retweeted_status']['id']
        user_timeline[i]['rt_created'] = user_timeline[i]['retweeted_status']['created_at']
        user_timeline[i]['rt_user_screenname'] = user_timeline[i]['retweeted_status']['user']['screen_name']
        user_timeline[i]['rt_user_id'] = user_timeline[i]['retweeted_status']['user']['id']
        user_timeline[i]['rt_user_followers'] = user_timeline[i]['retweeted_status']['user']['followers_count']
        del user_timeline[i]['retweeted_status']
    if 'quoted_status' in user_timeline[i].keys():
        user_timeline[i]['qt_created'] = user_timeline[i]['quoted_status']['created_at']
        user_timeline[i]['qt_id'] = user_timeline[i]['quoted_status']['id']
        user_timeline[i]['qt_text'] = user_timeline[i]['quoted_status']['text']
        user_timeline[i]['qt_user_screenname'] = user_timeline[i]['quoted_status']['user']['screen_name']
        user_timeline[i]['qt_user_id'] = user_timeline[i]['quoted_status']['user']['id']
        user_timeline[i]['qt_user_followers'] = user_timeline[i]['quoted_status']['user']['followers_count']
        del user_timeline[i]['quoted_status']
    if user_timeline[i]['entities']['urls']: #list
        for j, val in enumerate(user_timeline[i]['entities']['urls']):
            user_timeline[i]['url'+str(j)] = user_timeline[i]['entities']['urls'][j]['expanded_url'] #numbered fields url0, url1, ... (expanded URL assumed)
    if user_timeline[i]['entities']['user_mentions']: #list
        for j, val in enumerate(user_timeline[i]['entities']['user_mentions']):
            user_timeline[i]['mention'+str(j)] = user_timeline[i]['entities']['user_mentions'][j]['screen_name']
    if user_timeline[i]['entities']['hashtags']: #list
        for j, val in enumerate(user_timeline[i]['entities']['hashtags']):
            user_timeline[i]['hashtag'+str(j)] = user_timeline[i]['entities']['hashtags'][j]['text']
    if user_timeline[i]['coordinates'] is not None:  #NoneType or Dict
        user_timeline[i]['coord_long'] = user_timeline[i]['coordinates']['coordinates'][0]
        user_timeline[i]['coord_lat'] = user_timeline[i]['coordinates']['coordinates'][1]
    del user_timeline[i]['coordinates']
    del user_timeline[i]['user']
    del user_timeline[i]['entities']
    if 'place' in user_timeline[i].keys():  #NoneType or Dict
        del user_timeline[i]['place']
    if 'extended_entities' in user_timeline[i].keys():
        del user_timeline[i]['extended_entities']
    if 'geo' in user_timeline[i].keys():
        del user_timeline[i]['geo']
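
The cleaning loop modifies each tweet in place but does not write anything to disk. One way to save the result is sketched here with Python's standard csv module; the function name save_timeline, the example tweets, and the filename timeline_clean.csv are hypothetical, and in the actual script you would pass user_timeline instead of the example list.

```python
import csv


def save_timeline(tweets, path):
    """Write a list of flattened tweet dicts to a CSV file.

    Tweets carry different numbers of url/mention/hashtag columns, so the
    header is the union of all keys (in first-seen order) and any column a
    given tweet lacks is left blank."""
    fieldnames = []
    for tweet in tweets:
        for key in tweet:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval='')
        writer.writeheader()
        writer.writerows(tweets)
    return fieldnames


# hypothetical cleaned tweets standing in for user_timeline
example_timeline = [
    {'id': 1, 'text': 'first tweet', 'mention0': 'someuser'},
    {'id': 2, 'text': 'second tweet', 'hashtag0': 'sociology', 'hashtag1': 'pre-K'},
]
columns = save_timeline(example_timeline, 'timeline_clean.csv')  # hypothetical filename
```

Building the header from every tweet matters because csv.DictWriter raises a ValueError by default when a row contains a key that is not in fieldnames.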


January 19, 2016 at 8:10 am

Everything I Needed to Know (About Publication Bias), I Learned In (Pre-) Kindergarten

| Gabriel |


There has been a tremendous amount of hype over the last few years about universal pre-K as a magic bullet for all social problems. We see a lot of talk of returns on investment at rates usually promised only by prosperity gospel preachers and Ponzi schemes. Unfortunately, two recent large-scale studies, one in Quebec and one in Tennessee, found small negative effects of pre-K. An article in New York writing up the Tennessee study advises us to fear not, for:

These are all good studies, and they raise important questions. But none of them is an indictment of preschool, exactly, so much as an indictment of particular approaches to it. How do we know that? Two landmark studies, first published in 1993 and 2008, demonstrate definitively that, if done right, state-sponsored pre-K can have profound, lasting, and positive effects — on individuals and on a community.

It then goes on to explain that the Perry and Abecedarian projects were studies of 123 and 100 people respectively, had marvelous outcomes, and were play-oriented rather than drill-oriented.

The phrase “demonstrate definitively” is the kind of phrase you have to be very careful with, and it just looks silly to say that this definitive knowledge comes from two studies with sample sizes of about a hundred each. Tiny studies with absurdly large effect sizes are exactly where you would expect to find publication bias. Indeed, this is almost inevitable when the studies are so underpowered (that is, their standard errors are so large) that the only way to get β/se > 1.96 is for β to be implausibly large. (As Jeremy Freese observed, this is among the dozen or so major problems with the PNAS himmicane study.)
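
The significance filter described above is easy to see in a quick simulation. This is a minimal sketch under textbook assumptions: each study's estimate is normally distributed around the true effect, and only estimates with β/se > 1.96 get published. The true effect of 0.1 and the two standard errors are hypothetical numbers, not taken from Perry, Abecedarian, or any other study.

```python
import random
import statistics


def published_estimates(true_beta, se, n_studies, seed=0):
    """Simulate n_studies estimates of true_beta, each with standard error se,
    and keep only those that clear the beta / se > 1.96 significance bar."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_studies):
        beta_hat = rng.gauss(true_beta, se)  # assumes normally distributed estimates
        if beta_hat / se > 1.96:
            kept.append(beta_hat)
    return kept


# Hypothetical numbers: the same small true effect studied with small samples
# (large se) and with large samples (small se).
small = published_estimates(true_beta=0.1, se=0.2, n_studies=10_000)
large = published_estimates(true_beta=0.1, se=0.02, n_studies=10_000)

print(f"small studies: {len(small)} of 10000 significant, mean beta = {statistics.mean(small):.2f}")
print(f"large studies: {len(large)} of 10000 significant, mean beta = {statistics.mean(large):.2f}")
```

With these settings only a small fraction of the small studies clear the bar, and the ones that do report an average β several times the true 0.1. The large studies recover roughly the true value, because nearly all of them are significant anyway.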

The standard way to detect publication bias is a meta-analysis showing that small studies have big effects and big studies have small effects. For instance, this is what Card and Krueger showed in their meta-analysis of the minimum wage literature, which demonstrated that their own earlier PA/NJ paper was an outlier only when you didn’t account for publication bias. Similarly, in a 2013 JEP article, Duncan and Magnuson do a meta-analysis of the pre-K literature. Their visualization in Figure 2 emphasizes the decline in effect sizes over time, but you can also see that the large studies (shown as large circles) generally have much smaller β than the small studies (shown as small circles). If we added the Tennessee and Quebec studies to this plot, they would be large circles on the right slightly below the x-axis. That is to say, they would fall right on the regression line and might even pull it down further.
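
One common way to formalize the small-studies-have-big-effects pattern is Egger's regression test, which regresses each study's z-statistic (β/se) on its precision (1/se) and checks whether the intercept is far from zero. This is a minimal sketch using NumPy, not necessarily the method used in either meta-analysis mentioned above; the simulated literature (a true effect of 0.1, the range of standard errors, and a filter under which imprecise studies appear only if significant) is entirely hypothetical.

```python
import numpy as np


def egger_test(betas, ses):
    """Egger's regression test for small-study effects (funnel-plot asymmetry).

    Regresses each study's z-statistic (beta / se) on its precision (1 / se) by
    ordinary least squares. With no small-study effects the intercept should be
    near zero and the slope estimates the common underlying effect."""
    betas = np.asarray(betas, dtype=float)
    ses = np.asarray(ses, dtype=float)
    z = betas / ses
    precision = 1.0 / ses
    X = np.column_stack([np.ones_like(precision), precision])
    coef, _, _, _ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ coef
    sigma2 = resid @ resid / (len(z) - 2)  # residual variance with 2 fitted parameters
    cov = sigma2 * np.linalg.inv(X.T @ X)
    intercept, slope = coef
    t_stat = intercept / np.sqrt(cov[0, 0])  # t-statistic for the intercept
    return intercept, slope, t_stat


# Hypothetical literature: 2,000 studies of a true effect of 0.1 with varying precision.
rng = np.random.default_rng(0)
ses = rng.uniform(0.02, 0.3, size=2000)
betas = rng.normal(0.1, ses)

# Hypothetical publication filter: precise studies (se <= 0.1) always appear,
# imprecise ones only if they clear beta / se > 1.96.
published = (ses <= 0.1) | (betas / ses > 1.96)

intercept, slope, t_stat = egger_test(betas[published], ses[published])
intercept_all, slope_all, _ = egger_test(betas, ses)

print(f"published studies: intercept = {intercept:.2f} (t = {t_stat:.1f}), slope = {slope:.3f}")
print(f"all studies:       intercept = {intercept_all:.2f}, slope = {slope_all:.3f}")
```

In the filtered literature the intercept is clearly positive and the slope, which estimates the underlying effect, falls below the true 0.1; in the unfiltered literature the intercept sits near zero and the slope near 0.1.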


This is what publication bias looks like: old small studies have big effects and new large studies have small effects.

I suppose it’s possible that Perry and Abecedarian showed big results because those programs were better implemented than the ones in the newer studies, but this is not “demonstrated definitively,” and given the strong evidence that it’s all publication bias, let’s tentatively assume that if something’s too good to be true (such as that a few hours a week can almost deterministically make kids stay in school, earn a solid living, and stay out of jail), then it ain’t.

November 6, 2015 at 1:21 pm
