Who Said It? Gift Exchange Lit vs Article on LASD

| Gabriel |

For each quote, guess the source: a classic of gift exchange or a Los Angeles Times article about deposed Sheriff and soon-to-be plea bargainee Lee Baca. Highlight the text to see the answers and score your quiz!

“Until he has given back, the receiver is ‘obliged,’ expected to show his gratitude towards his benefactor or at least to show regard for him, go easy on him, pull his punches…” (Bourdieu Logic of Practice)

“The etiquette of the feast, of the gift that one receives with dignity, but is not solicited, is extremely marked among these tribes.” (Mauss The Gift)

“I don’t solicit any gifts. I’ve never asked for a gift.… People just do it for me.” (Los Angeles Times)

“When you’re taking gifts from strangers, there’s only one reason. They only give gifts because they want something.” (Los Angeles Times)

“These, however, are but the outward signs of kindness, not the kindnesses themselves.” (Seneca Benefits)

“What they’re expressing is appreciation for the respectful way we do business.” (Los Angeles Times)

“No one is really unaware of the logic of exchange … but no one fails to comply with the rules of the game, which is to act as if one did not know the rule.” (Bourdieu Pascalian Meditations)

“Nobody is free to refuse the present that is offered.” (Mauss The Gift)

“My life would be much easier if people did not give me gifts.” (Los Angeles Times)

 


February 10, 2016 at 11:57 am

Scraping Twitter with Python

| Gabriel |

As long-time readers will remember, I have been collecting Twitter data with the R library twitteR. Unfortunately that workflow has proven to be buggy, mostly for reasons having to do with authentication. As such I decided to learn Python and migrate my project to the Twython module. Overall, I’ve been very impressed by the language and the module. I haven’t had any dependency problems and authentication works pretty smoothly. On the other hand, it requires a lot more manual coding to get around rate limits than twitteR does, and this is a big part of what my scripts are doing.

I’ll let you follow the standard instructions for installing Python 3 and the Twython module before showing you my workflow. Note that all of my code was run on Python 3.5.1 and OSX 10.9. You want to use Python 3, not Python 2, as tweets are UTF-8. If you’re a Mac person, OSX comes with 2.7, so you will need to install Python 3 yourself. For the same reason (UTF-8 support), use Stata 14 if you plan to bring the tweets into Stata.

One tip on installation: pip tends to default to Python 2.7, so use this syntax in bash.

python3 -m pip install twython

I use three py scripts: one to write Twython queries to disk, one to query information about a set of Twitter users, and one to query tweets from a particular user. Note that the query scripts can be slow to execute, which is deliberate, as otherwise you end up hitting rate limits (Twitter’s API allows fifteen queries per fifteen minutes). I call the two query scripts from bash with argument passing. The disk-writing script is called by the query scripts and doesn’t require user intervention, though you do need to be sure Python knows where to find it (usually by keeping it in the current working directory). Note that you will need to adjust things like file paths and authentication keys. (When accessing Twitter through scripts instead of your phone, you don’t use usernames and passwords but keys and secrets; you can generate the keys by registering an application.)

tw2csv.py

I am discussing this script first, even though it is not directly called by the user, because it is the most natural place to discuss Twython’s somewhat complicated data structure. A Twython data object is a list of dictionaries. (I adapted this script from Stack Overflow code for exporting lists of dictionaries.) You can get a pretty good feel for what these objects look like by using type() and the pprint module. In this sample code, I explore a data object created by infoquery.py.

type(users) #shows that users is a list
type(users[0]) #shows that each element of users is a dictionary
#the objects are a bunch of brackets and commas, use pprint to make a dictionary (sub)object human-readable with whitespace
import pprint
pp=pprint.PrettyPrinter(indent=4)
pp.pprint(users[0])
pp.pprint(users[0]['status']) #you can also zoom in on daughter objects, in this case the user's most recent tweet object. Note that this tweet is a sub-object within the user object, but may itself have sub-objects

As you can see if you use the pprint command, some of the dictionary values are themselves dictionaries. It’s a real fleas upon fleas kind of deal. In the datacollection.py script I pull some of these objects out and delete others for the “clean” version of the data. Also note that tw2csv defaults to writing these second-level fields as one first-level field with escaped internal delimiters. So if you open a file in Excel, some of the cells will be really long and have a lot of commas in them. Excel automatically parses the escaped commas correctly, but Stata will split on them unless you tell it to respect the quoting with this command:

import delimited "foo.csv", delimiter(comma) bindquote(strict) varnames(1) asdouble encoding(UTF-8) clear
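
To make the nesting concrete, here is a minimal, self-contained sketch (not part of the scripts in this post, and with invented field names) of how csv.DictWriter ends up writing a second-level dictionary as one long, quoted first-level cell:

#toy_nested_csv.py -- illustrative only
import csv

rows = [
    {'id': 1, 'text': 'hello, world', 'user': {'screen_name': 'example', 'followers_count': 42}},
    {'id': 2, 'text': 'another tweet'}, #no 'user' key, mimicking a missing field
]

allkey = set() #union of keys across rows, as in tw2csv.py
for r in rows:
    allkey = allkey.union(r.keys())

with open('toy.csv', 'wt') as f:
    writer = csv.DictWriter(f, allkey)
    writer.writeheader()
    writer.writerows(rows) #the nested 'user' dict is str()-ed into one quoted cell; missing keys are blank

If you open toy.csv, the user column holds the whole sub-dictionary as one string, commas and all, which is why Excel and Stata have to be told to respect the quoting.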

Another tricky thing about Twython data is that there can be a variable number of dictionary entries (i.e., some fields are missing from some cases). For instance, if a tweet is not a retweet it will be missing the “retweeted_status” dictionary within a dictionary. This was the biggest problem with reusing the Stack Overflow code and required adapting another piece of code for getting the union set of dictionary keys. Note this will give you all the keys used in any entry from the current query, but not those found uniquely in past or future queries. Likewise, because the keys are collected into a set, the field order is effectively arbitrary. For these two reasons, I hard-coded tw2csv to overwrite rather than append, and built a timestamp into the query scripts. If you tweak the code to append, you will run into problems with the fields not lining up.

Anyway, here’s the actual tw2csv code.

#tw2csv.py
def tw2csv(twdata,csvfile_out):
    import csv
    import functools
    allkey = functools.reduce(lambda x, y: x.union(y.keys()), twdata, set()) #union of keys across all dictionaries, so every field that appears anywhere gets a column
    with open(csvfile_out,'wt') as output_file:
        dict_writer=csv.DictWriter(output_file,allkey)
        dict_writer.writeheader()
        dict_writer.writerows(twdata) #rows missing a given key are written as blank cells

infoquery.py

One of the queries I like to run is getting basic information like date created, description, and follower counts. Basically, all the stuff that shows up on a user’s profile page. The Twitter API allows you to do this for 100 users simultaneously and I do this with the infoquery.py script. It assumes that your list of target users is stored in a text file, but there’s a commented-out line that lets you hard-code the users, which may be easier if you’re doing it interactively. Likewise, it’s designed to query only 100 users at a time, but there’s a commented-out line that’s much simpler in interactive use if you’re only querying a few users.

You can call it from the command line and it takes as an argument the location of the input file. I hard-coded the location of the output. Note that the “3” in the command-line call is important, as operating systems like OSX default to calling Python 2.7.

python3 infoquery.py list.txt

And here’s the actual script. Note that I’ve taken out my key and secret. You’ll have to register as an “application” and generate these yourself.

#infoquery.py
from twython import Twython
import sys
import time
from math import ceil
import tw2csv #custom module

parentpath='/Users/rossman/Documents/twittertrucks/infoquery_py'
targetlist=sys.argv[1] #text file listing feeds to query, one per line. full path ok.
today = time.strftime("%Y%m%d")
csvfilepath_info=parentpath+'/info_'+today+'.csv'

#authenticate
APP_KEY='' #25 alphanumeric characters
APP_SECRET='' #50 alphanumeric characters
twitter=Twython(APP_KEY,APP_SECRET,oauth_version=2) #simple authentication object
ACCESS_TOKEN=twitter.obtain_access_token()
twitter=Twython(APP_KEY,access_token=ACCESS_TOKEN)

handles = [line.rstrip() for line in open(targetlist)] #read from text file given as cmd-line argument
#handles=("gabrielrossman,sociologicalsci,twitter") #alternately, hard-code the list of handles

#API allows 100 users per query. Cycle through, 100 at a time
#users = twitter.lookup_user(screen_name=handles) #this one line is all you need if len(handles) < 100
users=[] #initialize data object
hl=len(handles)
cycles=ceil(hl/100)
#unlike a get_user_timeline query, there is no need to cap total cycles
for i in range(0, cycles): ## cycle through the handles, 100 at a time
    h=handles[0:100]
    del handles[0:100]
    incremental = twitter.lookup_user(screen_name=h)
    users.extend(incremental)
    time.sleep(90) ## 90 second rest between api calls. The API allows 15 calls per 15 minutes so this is conservative

tw2csv.tw2csv(users,csvfilepath_info)

datacollection.py

This last script collects tweets for a specified user. The tricky thing about this code is that the Twitter API allows you to query the last 3200 tweets per user, but only 200 at a time, so you have to cycle over them. Moreover, you have to build in a delay so you don’t get rate-limited. I adapted the script from Craig Addyman’s code (credited in the script below) but made some tweaks.

One change I made was to only scrape as deep as necessary for any given user. For instance, as of this writing, @SociologicalSci has 1192 tweets, so it cycles six times, but if you run it in a few weeks @SociologicalSci will have over 1200 and so it will run at least seven cycles. This change makes the script run faster, but ultimately gets you to the same place.

The other change I made is that I save two versions of the file: one as is, and another that pulls some objects out of the subdictionaries and deletes the rest. If for some reason you don’t care about retweet count but are very interested in the retweeting user’s profile background color, go ahead and modify the code. See above for tips on exploring the data structure interactively so you can see what there is to choose from.

As above, you’ll need to register as an application and supply a key and secret.

You call it from bash with the target screenname as an argument.

python3 datacollection.py sociologicalsci

#datacollection.py
from twython import Twython
import sys
import time
import simplejson
from math import ceil
import tw2csv #custom module

parentpath='/Users/rossman/Documents/twittertrucks/feeds_py'
handle=sys.argv[1] #takes target twitter screenname as command-line argument
today = time.strftime("%Y%m%d")
csvfilepath=parentpath+'/'+handle+'_'+today+'.csv'
csvfilepath_clean=parentpath+'/'+handle+'_'+today+'_clean.csv'

#authenticate
APP_KEY='' #25 alphanumeric characters
APP_SECRET='' #50 alphanumeric characters
twitter=Twython(APP_KEY,APP_SECRET,oauth_version=2) #simple authentication object
ACCESS_TOKEN=twitter.obtain_access_token()
twitter=Twython(APP_KEY,access_token=ACCESS_TOKEN)

#adapted from http://www.craigaddyman.com/mining-all-tweets-with-python/
#user_timeline=twitter.get_user_timeline(screen_name=handle,count=200) #if doing 200 or less, just do this one line
user_timeline=twitter.get_user_timeline(screen_name=handle,count=1) #get most recent tweet
lis=user_timeline[0]['id']-1 #tweet id # for most recent tweet
#only query as deep as necessary
tweetsum= user_timeline[0]['user']['statuses_count']
cycles=ceil(tweetsum / 200)
if cycles>16:
    cycles=16 #API only allows depth of 3200 so no point trying deeper than 200*16
time.sleep(60)
for i in range(0, cycles): ## iterate through all tweets up to max of 3200
    incremental = twitter.get_user_timeline(screen_name=handle,
    count=200, include_retweets=True, max_id=lis)
    user_timeline.extend(incremental)
    lis=user_timeline[-1]['id']-1
    time.sleep(90) ## 90 second rest between api calls. The API allows 15 calls per 15 minutes so this is conservative

tw2csv.tw2csv(user_timeline,csvfilepath)

#clean the file and save it
for i, val in enumerate(user_timeline):
    user_timeline[i]['user_screen_name']=user_timeline[i]['user']['screen_name']
    user_timeline[i]['user_followers_count']=user_timeline[i]['user']['followers_count']
    user_timeline[i]['user_id']=user_timeline[i]['user']['id']
    user_timeline[i]['user_created_at']=user_timeline[i]['user']['created_at']
    if 'retweeted_status' in user_timeline[i].keys():
        user_timeline[i]['rt_count'] = user_timeline[i]['retweeted_status']['retweet_count']
        user_timeline[i]['rt_id'] = user_timeline[i]['retweeted_status']['id'] #id of the retweeted tweet
        user_timeline[i]['rt_created'] = user_timeline[i]['retweeted_status']['created_at']
        user_timeline[i]['rt_user_screenname'] = user_timeline[i]['retweeted_status']['user']['name']
        user_timeline[i]['rt_user_id'] = user_timeline[i]['retweeted_status']['user']['id']
        user_timeline[i]['rt_user_followers'] = user_timeline[i]['retweeted_status']['user']['followers_count']
        del user_timeline[i]['retweeted_status']
    if 'quoted_status' in user_timeline[i].keys():
        user_timeline[i]['qt_created'] = user_timeline[i]['quoted_status']['created_at']
        user_timeline[i]['qt_id'] = user_timeline[i]['quoted_status']['id']
        user_timeline[i]['qt_text'] = user_timeline[i]['quoted_status']['text']
        user_timeline[i]['qt_user_screenname'] = user_timeline[i]['quoted_status']['user']['name']
        user_timeline[i]['qt_user_id'] = user_timeline[i]['quoted_status']['user']['id']
        user_timeline[i]['qt_user_followers'] = user_timeline[i]['quoted_status']['user']['followers_count']
        del user_timeline[i]['quoted_status']
    if user_timeline[i]['entities']['urls']: #list
        for j, val in enumerate(user_timeline[i]['entities']['urls']):
            urlj='url_'+str(j)
            user_timeline[i][urlj]=user_timeline[i]['entities']['urls'][j]['expanded_url']
    if user_timeline[i]['entities']['user_mentions']: #list
        for j, val in enumerate(user_timeline[i]['entities']['user_mentions']):
            mentionj='mention_'+str(j)
            user_timeline[i][mentionj] = user_timeline[i]['entities']['user_mentions'][j]['screen_name']
    if user_timeline[i]['entities']['hashtags']: #list
        for j, val in enumerate(user_timeline[i]['entities']['hashtags']):
            hashtagj='hashtag_'+str(j)
            user_timeline[i][hashtagj] = user_timeline[i]['entities']['hashtags'][j]['text']
    if user_timeline[i]['coordinates'] is not None:  #NoneType or Dict
        user_timeline[i]['coord_long'] = user_timeline[i]['coordinates']['coordinates'][0]
        user_timeline[i]['coord_lat'] = user_timeline[i]['coordinates']['coordinates'][1]
    del user_timeline[i]['coordinates']
    del user_timeline[i]['user']
    del user_timeline[i]['entities']
    if 'place' in user_timeline[i].keys():  #NoneType or Dict
        del user_timeline[i]['place']
    if 'extended_entities' in user_timeline[i].keys():
        del user_timeline[i]['extended_entities']
    if 'geo' in user_timeline[i].keys():
        del user_timeline[i]['geo']

tw2csv.tw2csv(user_timeline,csvfilepath_clean)

January 19, 2016 at 8:10 am

Everything I Needed to Know (About Publication Bias), I Learned In (Pre-) Kindergarten

| Gabriel |


There has been a tremendous amount of hype over the last few years about universal pre-K as a magic bullet to solve all social problems. We see a lot of talk of return on investment at rates usually only promised by prosperity gospel preachers and Ponzi schemes. Unfortunately, two recent large-scale studies, one in Quebec and one in Tennessee, showed small negative effects for pre-K. An article in New York magazine writing up the Tennessee study advises fear not, for:

These are all good studies, and they raise important questions. But none of them is an indictment of preschool, exactly, so much as an indictment of particular approaches to it. How do we know that? Two landmark studies, first published in 1993 and 2008, demonstrate definitively that, if done right, state-sponsored pre-K can have profound, lasting, and positive effects — on individuals and on a community.

It then goes on to explain that the Perry and Abecedarian projects were studies involving 123 and 100 people respectively, had marvelous outcomes, and were play- rather than drill-oriented.

“Demonstrate definitively” is the kind of phrase you have to be very careful with, and it just looks silly to say that this definitive knowledge comes from two studies with sample sizes of about a hundred. Tiny studies with absurdly large effect sizes are exactly where you would expect to find publication bias. Indeed, this is almost inevitable when the sample sizes are so underpowered that the only way to get β/se>1.96 is for β to be implausibly large. (As Jeremy Freese observed, this is among the dozen or so major problems with the PNAS himmicane study.)
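
To put rough numbers on that last point (mine, not the studies’, and purely illustrative): with about a hundred subjects split evenly into treatment and control and the outcome scaled in standard-deviation units, the standard error of the difference in means is around 0.2 sd, so nothing much smaller than 0.4 sd can clear β/se>1.96 at all, and conventional 80% power demands even more.

#minimum detectable effect, back-of-the-envelope; all numbers are illustrative assumptions
from math import sqrt

n_per_group = 50 #~100 subjects total, split evenly
sd = 1.0 #outcome scaled in standard-deviation units
se = sd * sqrt(1/n_per_group + 1/n_per_group) #se of the difference in means, about 0.2

min_significant = 1.96 * se #smallest effect that can reach p<.05 two-tailed, about 0.39 sd
mde_80_power = (1.96 + 0.84) * se #conventional 80%-power minimum detectable effect, about 0.56 sd

print(round(se, 2), round(min_significant, 2), round(mde_80_power, 2))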

The standard way to detect publication bias is through a meta-analysis showing that small studies have big effects and big studies have small effects. For instance, this is what Card and Krueger showed in a meta-analysis of the minimum wage literature, which demonstrated that their previous paper on PA/NJ was an outlier only when you didn’t account for publication bias. Similarly, in a 2013 JEP, Duncan and Magnuson do a meta-analysis of the pre-K literature. Their visualization in figure 2 emphasizes the declining effect sizes over time, but you can also see that the large studies (shown as large circles) generally have much smaller β than the small studies (shown as small circles). If we added the Tennessee and Quebec studies to this plot they would be large circles on the right, slightly below the x-axis. That is to say, they would fall right on the regression line and might even pull it down further.

[Figure 2 from Duncan and Magnuson (2013): pre-K effect sizes by year, with larger circles for larger studies.]

This is what publication bias looks like: old small studies have big effects and new large studies have small effects.
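
A quick simulation (my own toy setup, not the Duncan and Magnuson data) makes the mechanism concrete: give every study the same small true effect, let only the statistically significant ones get published, and the published literature reproduces exactly this pattern.

#publication-bias toy simulation; all parameters are invented for illustration
import random
from math import sqrt

random.seed(42)
true_effect = 0.1 #a small true effect in sd units

published = []
for study in range(20000):
    n_per_group = random.choice([25, 50, 100, 500, 2000])
    se = sqrt(2 / n_per_group)
    estimate = random.gauss(true_effect, se)
    if abs(estimate) / se > 1.96: #only "significant" studies get published
        published.append((n_per_group, estimate))

for n in [25, 50, 100, 500, 2000]: #average published effect by study size
    effects = [est for (size, est) in published if size == n]
    print(n, round(sum(effects) / len(effects), 2))
#small studies publish inflated effects; large studies hover near the true 0.1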

I suppose it’s possible that the reason Perry and Abecedarian showed big results is that the programs were better implemented than those in the newer studies, but this is not “demonstrated definitively,” and given the strong evidence that it’s all publication bias, let’s tentatively assume that if something’s too good to be true (such as that a few hours a week can almost deterministically make kids stay in school, earn a solid living, and stay out of jail), then it ain’t.

November 6, 2015 at 1:21 pm 2 comments

Picking sides

| Gabriel |

Today the Economist posted a graph showing the patrons of the factions in various civil wars in the Middle East. The point of the graph is that the alliances don’t neatly follow balance theory, since it is in fact sometimes the case that the friend of my enemy is my friend, which is a classic balance theory fail. As such, I thought it would be fun to run a spinglass community detection model on the graph. Note that I could only do edges, not arcs, so I included only positive ties, not hostility ties. One implication of this is that ISIS drops out, as it (currently) lacks state patronage.

Here’s the output. The second column is community and the third is betweenness.

> s
Graph community structure calculated with the spinglass algorithm
Number of communities: 4 
Modularity: 0.4936224 
Membership vector:
 [1] 4 4 3 2 2 2 4 3 4 3 1 4 1 3 3 4 2 4 2
> output
 b 
 [1,] "bahrain_etc" "4" "0" 
 [2,] "egypt_gov" "4" "9.16666666666667" 
 [3,] "egypt_mb" "3" "1.06666666666667" 
 [4,] "iran" "2" "47.5" 
 [5,] "iraq_gov" "2" "26" 
 [6,] "iraq_kurd" "2" "26" 
 [7,] "jordan" "4" "6.73333333333333" 
 [8,] "libya_dawn" "3" "1.06666666666667" 
 [9,] "libya_dignity" "4" "0.333333333333333"
[10,] "qatar" "3" "27.5333333333333" 
[11,] "russia" "1" "0" 
[12,] "saudi" "4" "4" 
[13,] "syria_gov" "1" "17" 
[14,] "syria_misc" "3" "31.0333333333333" 
[15,] "turkey" "3" "6.83333333333333" 
[16,] "uae" "4" "4" 
[17,] "usa" "2" "74.4" 
[18,] "yemen_gov" "4" "74.3333333333333" 
[19,] "yemen_houthi" "2" "0"   

So it looks like we’re in community 2, which is basically Iran and its clients, though in fairness we also have high betweenness, as we connect community 2 (Greater Iran), community 3 (the pro-Muslim Brotherhood Sunni states), and community 4 (the pro-Egyptian-government Sunni states). This is consistent with the “offshore balancing” model of Obama-era MENA policy.

Here’s the code:

library("igraph")
setwd('~/Documents/codeandculture')
mena <- read.graph('mena.net',format="pajek")
la = layout.fruchterman.reingold(mena)
V(mena)$label <- V(mena)$id #attaches labels
plot.igraph(mena, layout=la, vertex.size=1, vertex.label.cex=0.5, vertex.label.color="darkred", vertex.label.font=2, vertex.color="white", vertex.frame.color="NA", edge.color="gray70", edge.arrow.size=0.5, margin=0)
s <- spinglass.community(mena)
b <- betweenness(mena, directed=FALSE)
output <- cbind(V(mena)$id,s$membership,b)
s
output

And here’s the data:

*Vertices 19
1 "bahrain_etc"
2 "egypt_gov"
3 "egypt_mb"
4 "iran"
5 "iraq_gov"
6 "iraq_kurd"
7 "jordan"
8 "libya_dawn"
9 "libya_dignity"
10 "qatar"
11 "russia"
12 "saudi"
13 "syria_gov"
14 "syria_misc"
15 "turkey"
16 "uae"
17 "usa"
18 "yemen_gov"
19 "yemen_houthi"
*Arcs
1 18
2 9
2 18
4 5
4 6
4 13
4 19
7 2
7 14
7 18
10 3
10 8
10 14
10 18
11 13
12 2
12 9
12 18
15 3
15 8
15 14
16 2
16 9
16 18
17 5
17 6
17 14
17 18
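
For anyone working from Python rather than R, a rough equivalent using the python-igraph bindings might look like the sketch below. This is an untested translation that assumes the package is installed, the same mena.net file is in the working directory, and the Pajek vertex labels land in the id attribute as they do in R.

#rough python-igraph translation of the R code above; a sketch, not verified output
import igraph as ig

mena = ig.Graph.Read_Pajek('mena.net')
mena.to_undirected() #treat patron ties as undirected, since only positive ties are in the file

communities = mena.community_spinglass() #spinglass community detection
b = mena.betweenness() #betweenness on the undirected graph, as in the R version

for name, comm, btw in zip(mena.vs['id'], communities.membership, b):
    print(name, comm, round(btw, 2))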

April 3, 2015 at 9:16 am

Monday Night Anomalies

| Gabriel |

The transformations of the television industry are an endlessly fascinating subject that I spend a lot of time ruminating on but haven’t ever, you know, actually published on. We can start with a few basic technological shifts, specifically the DVR and broadband internet. Both technologies have the effect that people are watching fewer commercials. From this we can infer that advertisers will have a pronounced preference for “DVR-proof” advertising.* One form of this is product shots, which are indeed a big deal nowadays, especially in the reality competition genre. Of course product shots are inherently cumbersome and are pretty much the antithesis of the scatter advertising market insofar as they require commitments during pre-production, which is even more extreme than the up-fronts and which is why we long ago got past the age of Texaco Star Theatre. So basically, the 30-second spot you will always have with you. Or rather, the demand for the 30-second spot you will always have with you, and the question is whether we can find a type of programming where people actually watch the ads. (Note that the recent Laureate Jean Tirole did work on this issue, as explained by Alex Tabarrok at MR.)

In practice, getting people to watch spot advertising means programming that has to be watched live, and that in turn means sports.** Thus it is entirely predictable that advertisers will pay a premium for sports. It is also predictable that the cable industry will pay a premium for sports because must-watch ephemera is a good insurance policy against cord-cutting. Moreover, as a straightforward Ricardian rent issue, we would predict that this increased demand would accrue to the owners of factor inputs: athletes, team owners, and (in the short run) the owners of cable channels with contracts to carry sports content. Indeed this has basically all happened. You’ve got ESPN being the cash cow of Disney, ESPN and TNT in turn signing a $24 billion deal with the NBA, an NBA team selling for $2 billion, and Kobe Bryant making $30 million in salary. Basically, there’s a ton of money in DVR-proof sports, both from advertising and from the ever-rising carriage fees that get passed on in the form of ever-rising basic cable rates. (I imagine a Johnny Cash parody: “How high’s the carriage fees, mama? Six bucks per sub and rising.”)

Here’s something else that is entirely predictable from these premises: we should have declining viewership for sports. Think about it: you have widget A and widget B. Widget A has a user experience that’s the same as it’s always been (i.e., you have to watch it when it’s on and sit through the ads) but the price is rapidly increasing (it used to be you could get it over broadcast or from a basic cable package that was relatively cheap). In contrast you have widget B, which has a dramatically improved user experience (you can watch every episode ever, on demand, whenever you feel like it, without ads, on your tv, tablet, or whatever) and a rapidly declining price (if you’re willing to wait for the previous season, scripted content is practically free). If you’re the marginal viewer who ex ante finds sports and scripted equally compelling, then as sports get more expensive and you keep having to watch ads, while scripted gets dirt cheap, ad-free, and generally more convenient, you would give up sports, watch last season’s episodes of Breaking Bad on Netflix, be blissfully unaware of major advertising campaigns, and pocket the $50 difference between a basic cable package and a $10 Netflix subscription. Of course you wouldn’t predict that the kinds of guys who put body paint on their naked torsos would give up on sports just because Netflix has every season of Frasier, but you would predict that at the population level interest in sports would decline slightly to moderately.

The weird thing is that this latter prediction didn’t happen. During exactly the same period over which sports got more expensive in absolute terms and the direct cost and hassle of close substitutes declined, viewership for sports increased. From 2003 to 2013, sports viewership was up 27%. Or rather, baseball isn’t doing so great and basketball is holding its own, but holy moly, people love football. If you look at both the top events and top series on tv, it’s basically football, football, some other crap, and more football. (Also note that football doesn’t appear in the “time-shifted” lists, meaning that people do watch the ads.) And it’s not just that people have always liked football or that non-football content is weakening; football is growing in absolute popularity.

That this would happen in an era of DVRs and streaming is nuts, and kind of goes contrary to the whole notion of substitutes. I mean, I just can’t understand how, when one thing gets more expensive and something else that’s similar gets a lot cheaper and lower hassle, you see people flocking to the thing that is more money in absolute terms and more hassle in relative terms.*** Maybe we just need to keep heightening the contradictions and then eventually the system will unravel, but this doesn’t explain why we’ve seen a fairly substantial medium-run rise in sports viewership instead of just stability with a bit of noise.

I’m sure one of my commenters is smarter than me and can explain why either my premises or my logic is incorrect, but at least to me this looks like an anomaly. And even if we can ultimately find some auxiliary hypothesis that explains why of course we’d predict a rise in sports viewership if we only considered that [your brilliant ex post explanation goes here],**** let’s keep in mind that this is all ex post, and adjust down our confidence about making social scientific predictive inferences accordingly. A theory like “a decline in the total cost of widget B will lead to substitution of widget B for widget A” is a pretty good theory, and if its predictions don’t hold in the face of something like bigger linebackers or more exciting editing for instant replay, then you have to wonder how much any theory can get us.

*If we’re a bit more creative we could also infer that the market information regime for audience ratings will see a lot of contentious changes.

**It is interesting that the tv networks aggressively promote Twitter in order to promote live viewing of scripted content and news, but at this point the idea that networks will hashtag their way to higher “C3” ratings is pretty niche/speculative.

*** The closest parallel I can think of is that it’s the easy-going mainline Protestant churches that have seen especially steep declines in attendance/membership and the more personally demanding churches that are relatively strong. I may have to rethink this point though after I fully digest the new Hout & Fischer.

**** Your ex post explanation had better speak to the (extensive) marginal fan and not just the intensity of hardcore fans, since my understanding is that the total number of football viewers is up, so the explanation can’t be anything like “the growth in fantasy leagues leads hardcore fans to watch 20 hours a week instead of 3 hours a week.”

October 14, 2014 at 9:09 am 18 comments

What is the word for “log” in R?

| Gabriel |

Like most native speakers of Stata, I find the most natural thing in the world is to start every script or session with a log file. Typing “log using” is like brushing your teeth: it’s the first thing you do, and you just feel gross if you haven’t done it. Judging by what you get if you Google “logging in R,” this seems to be something of a cultural eccentricity peculiar to Stata users, as R users seem not to understand the question. In particular, most responses to the question say something like “use sink(),” which ignores that to a Stata user a log file is neither the command history nor the output, but the two interpolated together so that you can see what command elicited what output.

However, much as the frustrated tourist abroad will occasionally find someone who understands what they mean in asking for a Western toilet, one great StackOverflow user speaks sufficient Stata to direct us to what we were hoping to find. Specifically, the library “TeachingDemos” includes a “txtStart()” function that by default behaves almost exactly like a Stata log file, but that also has options such as suppressing commands or output or using Markdown format.

To install TeachingDemos:

install.packages('TeachingDemos')

Thereafter, invoke it to start a log file, do your work, and close it:

library(TeachingDemos)
txtStart('mylogfile.txt') # this is similar to "log using mylogfile.txt" in Stata
#insert code here to load your data, analyze it, etc
txtStop() # this is similar to "log close" in Stata

October 9, 2014 at 3:48 pm 4 comments

Obfuscation Form 700

| Gabriel |

[cross-posted from TAS]

The Supreme Court recently ruled in favor of Hobby Lobby, but among the many things that are not widely understood is that the decision did not actually result in the firm’s employees losing insurance coverage for IUDs. The actual result is that the employees will still have coverage for IUDs, but the insurance processor rather than Hobby Lobby will have to pay for it (at least in theory). That is, Hobby Lobby was seeking to take advantage of the Obama administration’s own proposal for faith-based nonprofits. As Julian Sanchez at Cato observed, the entire case turns on an entirely symbolic issue of whether the Greens explicitly have to pay for IUDs or are allowed to wink at an obfuscation in which their insurance company bears the cost (at least theoretically).

I found this interesting not only because it’s a much discussed case but also because it’s a close fit with my article published a few months ago in Sociological Theory. (Here’s an ungated version that lacks the benefit of some really good copy-editing.) In the article I talk about situations where a moral objection gets in the way of a transaction but the transaction nonetheless occurs through the expedient of obfuscating that a transaction is occurring at all. I describe three mechanisms for accomplishing this, and the nonprofit exemption, which now also applies to Hobby Lobby, is characterized by two of them: brokerage and bundling. That is, the employer does not buy the IUD for the employee but rather pays a broker (the insurance processor) who in turn provides the IUD. Moreover, the IUD is bundled together with other health coverage. The third model, which is not at issue in Hobby Lobby but which I describe in the paper, is gift exchange, where explicit quid pro quo is replaced with tacit reciprocity.

Of course, whether an exchange is morally objectionable, or whether it has been koshered, is entirely subjective. Most obviously, in Hobby Lobby there is a range of opinions about the moral acceptability of birth control and abortifacients and where to draw the line between the two. More interesting to me is that opinions vary on what counts as “buying” the contested commodity and whether to seize on obfuscation and denounce it. On this issue the irony is that while the Obama administration itself came up with this obfuscation for nonprofits, it opposed extending it to for-profit firms. At a general level, obfuscation doesn’t objectively exist; rather, it creates a permission structure that actors can choose to consent to.

This becomes clear when we contrast Hobby Lobby with Little Sisters of the Poor. Whereas the owners of Hobby Lobby sued to avail themselves of the obfuscatory accommodation, the Little Sisters of the Poor, who (as a nonprofit) already have this obfuscation available to them, are suing to denounce it as mere obfuscation and to remove themselves completely from even obfuscated provision of any birth control. Specifically, the Little Sisters are refusing to fill out EBSA Form 700 stating their objection to providing contraceptive coverage, since to do so would trigger provision through their insurer and they see this as involving themselves in something morally objectionable. That is, while Hobby Lobby would be delighted to wink and nod (and the Obama administration was reluctant to allow them to do so), the Little Sisters are adamantly opposed to a fig leaf (and the Obama administration would be delighted were they to play along with the face-saving obfuscation).

July 7, 2014 at 9:59 am


