Posts tagged ‘diffusion’
| Gabriel |
A few months ago Stanford’s sociology department was nice enough to invite me up to give a talk on chapter four of Climbing the Charts. This chapter argues that the opinion leadership hypothesis cannot be supported in radio, and in the talk I show a simulation of why we should be skeptical of this hypothesis in general. There’s no video, but here’s an enhanced audio file with slideshow. Also a separate PDF of the slides in case you have problems with the integrated version. (A caveat: I knew I was speaking to a technically sophisticated audience so I let the jargon flow freely; the chapter itself is much easier to follow for people without a networks background.)
Also in shameless plugging news, Fabio’s review at OrgTheory.
| Gabriel |
[Update 1: From skimming Al Jazeera's (conveniently date-stamped) blog on this issue for Saturday and Sunday it looks like the protests have slowed considerably, which would imply an s-curve.]
[Update 2: It looks like the prime mover political entrepreneur was the shock artist, who actively tried to get a reaction out of people. That is, this is more similar to Jones threatening to burn Korans than to the Danish imams going on tour with (forged versions of) the Danish cartoons. Of course as in purely domestic culture wars issues, there can be a strange symbiosis between partisans on both sides who disagree on the merits but mutually benefit from discord.]
I took the Atlantic Wire’s map (also see the KML file) of “Innocence of Muslims” protests, did my best to add dates, and graphed it as a (cumulative) diffusion curve. Pretty much as you’d expect, it shows exponential growth indicating a process of imitation. Note that the curve rises a bit above trend on Friday, but on the other hand it’s not entirely a Friday thing since you do see growth on Wednesday and Thursday too. I’m gonna split the difference and say it’s about half garden variety imitation and half the fact that Friday is the Islamic Sabbath.
Let’s hope the curve starts bumping up against the asymptote soon and goes from exponential to s-curve. On a more pessimistic note, even after this particular issue burns out, the tactic itself of drumming up outrage against an obscure blasphemy will be imitated at some point in the future by some political entrepreneur, just as this itself was almost certainly inspired by earlier similar efforts by other policy entrepreneurs. That is, there is a logic of imitation at both micro and macro, for protests within each scandal and for scandals imitating each other.
A caveat: I did my best to get the dates right but it often isn’t clear, even in the original news story linked in the KML file. Also, thanks to Neal Caren and Matt Frost for pointing to and showing me how to download the file.
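A minimal sketch of the tabulation behind a cumulative curve like this one, assuming the dated protests have been saved as a plain text file with one event per line and an ISO date (YYYY-MM-DD) as the first comma-separated field. That file layout and the function name cumulative_by_date are assumptions made for illustration; they are not the format of the KML file.

```shell
#!/bin/sh
# cumulative_by_date FILE
# FILE (hypothetical layout): one event per line, ISO date as the first
# comma-separated field, e.g. "2012-09-14,some location".
# Prints one line per date: date,new_events,cumulative_events
cumulative_by_date() {
    # Keep only the date column, sort it so identical dates are adjacent,
    # count events per date, then keep a running total.
    cut -d, -f1 "$1" | sort | uniq -c |
        awk '{ running += $1; printf "%s,%d,%d\n", $2, $1, running }'
}
```

Plotting the third column against the first gives the cumulative curve. If the second column (new events per day) keeps growing you are still in the exponential part; once it starts shrinking, the curve is bending toward the asymptote.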
| Gabriel |
There’s a lot of great research on names and I’ve been a big fan of it for years, although it’s hard to commensurate with my own favorite diffusion models since names are a flow whereas the stuff I’m interested in generally concerns diffusion across a finite population.
Anyway, I was inspired to play with this data by two things in conversation. The one I’ll discuss today is that somebody repeated a story about a girl named “Lah-d,” which is pronounced “La dash da” since “the dash is not silent.”
This appears to be a slight variation on an existing apocryphal story, but it reflects three real social facts that are well documented in the name literature. First, black girls have the most eclectic names of any demographic group, with a high premium put on creativity and about 30% having unique names. Second, even when their names are unique coinages they still follow systematic rules, as with the characteristic prefix “La” and consonant pair “sh.” Third, these distinctly black names are an object of bewildered mockery (and a basis for exclusion) by others, which is the appeal in retelling this and other urban legends on the same theme.*
To tell if there was any evidence for this story I checked the Social Security data, but the web searchable interface only includes the top 1000 names per year. Thus checking on very rare names requires downloading the raw text files. There’s one file per year, but you can efficiently search all of them from the command line by going to the directory where you unzipped the archive and grepping.
cd ~/Downloads/names
grep '^Lah-d' *.txt
grep '^Lahd' *.txt
As you can see, this name does not appear anywhere in the data. Case closed? Well, there’s a slight caveat in that for privacy reasons the data only include names that occur at least five times in a given birth year. So while it includes rare names, it misses extremely rare names. For instance, you also get a big fat nothing if you do this search:
grep '^Reihan' *.txt
This despite the fact that I personally know an American named Reihan. (Actually I’ve never asked him to show me a photo ID so I should remain open to the possibility that “Reihan Salam” is just a memorable nom de plume and his birth certificate really says “Jason Miller” or “Brian Davis”).
For names that do meet the minimal threshold though you can use grep as the basis for a quick and dirty time series. To automate this I wrote a little Stata script called grepnames. To call it, you give it two arguments, the (case-sensitive) name you’re looking for and the directory where you put the name files. It gives you back a time-series for how many births had that name.
capture program drop grepnames
program define grepnames
	local name "`1'"
	local directory "`2'"

	* grep every yearly file for lines that start with the name
	tempfile namequery
	shell grep -r '^`name'' "`directory'" > `namequery'

	* each hit looks like <directory>yobYYYY.txt:Name,Sex,Count
	insheet using `namequery', clear
	gen year=real(regexs(1)) if regexm(v1,"`directory'yob([0-9][0-9][0-9][0-9])\.txt")
	gen name=regexs(1) if regexm(v1,"`directory'yob[0-9][0-9][0-9][0-9]\.txt:(.+)")
	* drop longer names that merely start with the search string
	keep if name=="`name'"
	ren v3 frequency
	ren v2 sex
	* add zero-count rows for sex-year cells with no recorded births
	fillin sex year
	recode frequency .=0
	sort year sex
	twoway (line frequency year if sex=="M") (line frequency year if sex=="F"), legend(order(1 "Male" 2 "Female")) title(`"Time Series for "`name'" by Birth Cohort"')
end
grepnames Gabriel "/Users/rossman/Documents/codeandculture/names/"
Note that these numbers are not scaled for the size of the cohorts, either in reality or as observed by the Social Security administration. (Their data is noticeably worse for cohorts prior to about 1920). Still, it’s pretty obvious that my first name has grown more popular over time.
We can also replicate a classic example from Lieberson of a name that became less popular over time, for rather obvious reasons.
grepnames Adolph "/Users/rossman/Documents/codeandculture/names/"
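As noted above, the raw counts are not scaled for cohort size. A minimal sketch of one way to scale them from the command line, assuming the files are named yobYYYY.txt and each line is formatted Name,Sex,Count. The function name name_share is made up for illustration. The denominator here is only the births recorded in that year’s file, which leaves out names below the five-birth threshold, so treat the share as approximate.

```shell
#!/bin/sh
# name_share NAME DIR
# For each yobYYYY.txt file in DIR, prints: year,count,total,share
# where count is births with exactly NAME (both sexes pooled), total is
# all births recorded in that file, and share is count / total.
# Assumes each line is formatted as Name,Sex,Count.
name_share() {
    name="$1"
    dir="$2"
    for f in "$dir"/yob[0-9][0-9][0-9][0-9].txt; do
        # Skip the unexpanded pattern if no files match
        [ -e "$f" ] || continue
        # Pull the four-digit year out of the file name
        year=$(basename "$f" .txt)
        year=${year#yob}
        awk -F, -v name="$name" -v year="$year" '
            { total += $3 }               # every recorded birth that year
            $1 == name { count += $3 }    # exact match only, not prefixes
            END {
                if (total > 0)
                    printf "%s,%d,%d,%.6f\n", year, count, total, count / total
            }
        ' "$f"
    done
}
```

Calling name_share Gabriel "/Users/rossman/Documents/codeandculture/names" would print one row per birth year, and the last column can be graphed in place of the raw frequency.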
Next time, how diverse are names over time with thoughts on entropy indices.
(Also see Jay’s thoughts on names, as well as taking inspiration from my book to apply Bass models to film box office).
* Yes, I know that one of those stories is true but the interesting thing is that people like to retell it (and do so with mocking commentary), not that the underlying incident is true. It is also true that yesterday I had eggs and coffee for breakfast, but nobody is likely to forward an e-mail to their friends repeating that particular banal but accurate nugget.
| Gabriel |
Lewis, Gonzalez, and Kaufman have a forthcoming paper in PNAS on “Social selection and peer influence in an online social network.” The project uses Facebook data from the entire college experience of a single cohort of undergrads at one school in order to pick at the perennial homophily/influence question. (Also see earlier papers from this project).
Overall it’s an excellent study. The data collection and modeling efforts are extremely impressive. Moreover I’m very sympathetic to (and plan to regularly cite) the conclusion that contagious diffusion is over-rated and we need to consider the micro-motives and mechanisms underlying contagion. I especially liked how they synthesize the Bourdieu tradition with diffusion to argue that diffusion is most likely for taste markers that are distinctive in both senses of the term. As is often the case with PNAS or Science, the really good stuff is in the appendix and in this case it gets downright comical as they apply some very heavy analytical firepower to trying to understand why hipsters are such pretentious assholes before giving up and delegating the issue to ethnography.
The thing that really got me thinking though was a claim they make in the methods section:
Because data on Facebook are naturally occurring, we avoided interviewer effects, recall limitations, and other sources of measurement error endemic to survey-based network research
That is, the authors are reifying Facebook as “natural.” If all they mean is that they’re taking a fly on the wall observational approach, without even the intervention of survey interviews, then yes, this is naturally occurring data. However I don’t think that observational necessarily means natural. If researchers themselves imposed reciprocity, used a triadic closure algorithm to prime recall, and discouraged the deletion of old ties, we’d recognize this as a measurement issue. It’s debatable whether it’s any more natural if Mark Zuckerberg is the one making these operational measurement decisions instead of Kevin Lewis.
Another way to put this is to ask where does social reality end and observation of it begin? In asking the question I’m not saying that there’s a clean answer. On one end of the spectrum we might have your basic random-digit dialing opinion survey that asks people to answer ambiguously-worded Likert-scale questions about issues they don’t otherwise think about. On the other end of the spectrum we might have well-executed ethnography. Sure, scraping Facebook isn’t as unnatural as the survey but neither is it as natural as the ethnography. Of course, as the information regimes literature suggests to us, you can’t really say that polls aren’t natural either insofar as their unnatural results leak out of the ivory tower and become a part of society themselves. (This is most obviously true for things like the unemployment rate and presidential approval ratings).
At a certain point something goes from figure to ground and it becomes practical, and perhaps even ontologically valid, to treat it as natural. You can make a very good argument that market exchange is a social construction that was either entirely unknown or only marginally important for most of human history. However at the present the market so thoroughly structures and saturates our lives that it’s practical to more or less take it for granted when understanding modern societies and only invoke the market’s contingent nature as a scope condition to avoid excessive generalization of economics beyond modern life and into the past, across cultures, and the deep grammar of human nature.
We are, God help us, rapidly approaching a situation where online social networks structure and constitute interaction. Once we do, the biases built into these systems are no longer measurement issues but will be constitutive of social structure. During the transitional period we find ourselves in though, let’s recognize that these networks are human artifices that are in the process of being incorporated into social life. We need a middle ground between “worthless” and “natural” for understanding social media data.
| Gabriel |
USC Annenberg posted video of my talk in which I discuss the genre chapter of Climbing the Charts. In the chapter/talk I discuss how genre conventions structure diffusion, using as examples crossover between radio formats and the institutionalization of reggaetón with the growth of the “hurban” format.
| Gabriel |
Practical advice will follow, but first a rant.
I have previously complained about “social” features that automate how you share information, especially when such features are opt-out rather than opt-in. For instance, I was not enthusiastic about Skype “mood messages” giving your friends and colleagues a play-by-play of what music you listen to, nor was I enamored of a product that would share your browser history.
It’s not as if I’m an introverted recluse either. I have a blog and I correspond pretty actively by e-mail, but the difference is that in these media I actively and deliberately control the flow of information rather than having the prestigious, shameful, and indifferent aspects of my personality and behavior all indiscriminately broadcast to my alters.
I have a fantasy in which Mark Zuckerberg is weeping in his garden when he overhears some neighbor children saying “take and read.” He looks up and notices an old copy of The Presentation of Self in Everyday Life sitting on the table. Tolle lege Mr. Zuckerberg, tolle lege.
Barring such an epiphany, I wouldn’t be surprised if next year’s Facebook Developer’s Conference includes announcements that American Standard is going social to automatically let your friends know when you use the toilet. Or perhaps Vivid will automatically tell all your second cousins and old friends from high school what pornography you’ve purchased. Or Gap brands could let all your friends know what size pants you wear. Visa could post a status update giving the vendor, address, and dollar value every time you buy anything. Because, really, everything’s better when it’s social regardless of whether it’s humiliating or just pointless information overload. It’s a brave new world of web 2.0 social media integration!
Anyway, I was most recently aggravated by Spotify which (like most things nowadays) defaults to over-sharing. Spotify describes this to NPR as “Freeing people from the hassle of actively sharing songs they like [which] will help keep people engaged in their friends’ listening habits without effort.” Some of us prefer to have this “hassle” because the alternative is an uncensored view of our listening habits. As I wrote when Apple added its “Ping” social feature to iTunes:
As a cultural sociologist who has published research on music as cultural capital, I understand how my successful presentation of self depends on me making y’all believe that I only listen to George Gershwin, John Adams, Hank Williams, the Raveonettes, and Sleater-Kinney, as compared to what I actually listen to 90% of the time, which is none of your fucking business.
Anyway, the worst thing about Spotify freeing you from privacy, er, hassle is that it does so by default and it’s difficult to opt out. You can edit your profile to suppress playlists, but by default they are all revealed and even if you suppress them, new ones created thereafter are revealed. Worse, editing your profile provides no way to suppress “Top Tracks” and “Top Artists” (at least in the Mac client version 0.6.1). After a fair amount of searching (and coming very close to deleting my account entirely), I discovered that it’s fairly easy to totally suppress all of this through the client’s preferences. Just go to the “Spotify” menu and choose “Preferences . . .” then scroll down and uncheck these boxes:
You may now return to the dignity of crafting a public persona that is only loosely coupled to your backstage behavior. Enjoy.
| Gabriel |
Just a quick tip to check out the current episode of This American Life, which is based on the work of my CCPR colleague Susan Watkins on HIV-related gossip in Malawi. Even if you’re not interested in health or development, it’s very interesting for what it says about social networks, diffusion, statistical discrimination, and concealed stigma. The main issue is that people constantly talk about HIV in attempts to figure out who has HIV and thus makes an undesirable sex partner but I also had a few somewhat idiosyncratic interests:
1. Information does not just diffuse through social networks in the usual sense of things that would show up in your edge list or sociomatrix but also through space (I’m at the clinic next door to the HIV clinic when you pick up your meds) and through ad hoc collections of people temporarily bound together (a bunch of people on a bus all start speculating about the HIV status of a pedestrian). I consider this more evidence for my belief that network contagion as a mechanism for information flow is over-rated.
2. A lot of public health programs emphasize the coals to Newcastle policy of “encouraging discussion” and “raising awareness.” These policies were driven by cosmopolitan elites, international NGOs, etc. That is, it’s John Meyer “world society” kind of stuff run amuck.
3. About a year ago our mutual grad student, Tom Hannan, started a new project that synthesizes Susan’s concerns in #2 with some of my recent theoretical/methodological interests.
| Gabriel |
Shortly before ASA, I finished John Levi Martin’s Social Structures and I loved it, loved it, loved it. (Also see thoughts from Paul DiMaggio, Omar Lizardo, Neil Gross, Fabio Rojas, and Science). I find myself hoping I have to prep contemporary theory just so I can inflict it on unsuspecting undergrads. The book is all about emergence and how fairly minor changes in the nature of social mechanisms can create quite different macro social structures.* It’s just crying out for someone to write a companion suite in NetLogo, chapter by chapter. In addition, JLM knows an enormous amount of history, anthropology, and even animal behavior and uses it all very well to both illustrate his points and show how they work when the friction of reality enters. For instance, he notes that balance theory breaks down to the extent that people have some agency in defining the nature of ties and/or keeping some relations “neutral” rather than the ally versus enemy dichotomy.**
An interesting contrast is Francis Fukuyama’s Origins of Political Order, which I also liked. The two books are broadly similar in scope, giving a sweeping comparative overview of history that starts with animals and attempts to work up to the early modern era. (There are also some similarities in detail, such as their very similar understandings of the “big man” system and that domination is more likely in bounded populations). There is an obvious difference of style in that Fukuyama is easier to read and goes into more extended historical discussions but the more important differences are thematic and theoretical. One such difference is that Fukuyama follows Polybius in seeing the three major socio-political classes as the people, the aristocracy, and the monarch, with the people and the monarch often combining against the aristocracy (as seen in the Roman Revolution and in early modern absolute monarchies). In contrast, JLM’s model tends to see the monarch as just the top aristocrat, though his emphasis on the development of transitivity in command effectively accomplishes some of the same work as the Fukuyama/Polybius model.
The most important difference comes in that Fukuyama is inspired by Weber whereas JLM uses Simmel, a difference that becomes especially pronounced as they move from small tribal bands to early modern societies. Fukuyama’s book is fundamentally about the tension between kinship and law as the fundamental organizing principle of society. In Fukuyama’s account both have very old roots and modernity represents the triumph of law. In contrast, JLM sees kinship (and analogous structures like patronage) as the fundamental logics of society with modernity being similar in kind but grander in scale. In the last chapter and a half JLM discusses the early modern era and here he sounds a bit more like Fukuyama, but he’s clearly more interested in, for instance, the origins of political parties than in their transformation into modern ideological actors.
In part this is because, as Duncan Watts observed at the “author meets critics” at ASA, JLM is mostly interested in that which can be derived from micro-macro emergence and tends to downplay issues that do not fit into this framework.*** This is seen most clearly in the fact that the book winds down around the year 1800 after noting that (a) institutionalization can partially decouple mature structures from their micro origins and (b) ideology can in effect form a sort of bipartite network structure through which otherwise disconnected factions and patronage structures can be united (usually in order to provide a heuristic through which elites can practice balance theory), as with the formation of America’s original party system of Federalists and Democrats which JLM discusses in detail. Of course as I said in the “critics” Q&A, at the present most politically active Americans have a primarily ideological attachment to their party without things like ward bosses and perhaps more interestingly, a role for ideology as a bridge is not an issue restricted to the transition from early modern to modern. As is known to any reader of Gibbon, there was a similar pattern in late antiquity in how esoteric theological disputes over adoptionist Christology and reconciliation of sinners provided rallying points for core vs periphery political struggles in the late Roman empire. Since this is largely a dispute over emphasis, it’s not surprising that JLM was sympathetic to this but he noted that there are limits to what ideological affinity can accomplish and when it comes to costly action you really need micro structures. (He is of course entirely right about this as seen most clearly in the military importance of unit cohesion, but it’s still interesting that ideology has waxed and patronage waned in party systems of advanced democracies).
There are a few places in the book where JLM seemed to be arguing from end states back to micro-mechanisms and I couldn’t tell whether he meant that the micro-mechanisms necessarily exist (i.e., functionalism) or that such demanding specifications of micro-mechanisms implied that the end state was inherently unstable (i.e., emergence). For instance, in chapter three he discusses exchange of women between patrilineal lineages and notes that if there is not simple reciprocity (usually through cross-cousin marriage) then there must either be some form of generalized reciprocity or else the bottom-ranked male lineages will go extinct. On reading this I was reminded of this classic exchange:
That is, I think it is entirely possible that powerful male lineages could have asymmetric marital exchange with less powerful male lineages and if the latter are eventually driven into extinction then that sucks for them. (The reason this wouldn’t lead to just a single male lineage clan is because, as Fukuyama notes, large clans can fissure and tracing descent back past the 5th or 6th generation is usually more political than genealogical). This is the sort of thing that can actually be answered empirically by contrasting Y chromosomes with mitochondrial DNA. For instance, a recent much publicized study showed that pretty much all ethnically English men carry the Germanic “Frisian Y” chromosome. The authors’ interpretation of this is that a Saxon mass migration displaced the indigenous Gallo-Roman population but I don’t see how this is at all inconsistent with the older elite transfer model of the Saxon invasion if we assume that the transplanted foreign elite hoarded women, including indigenous women. A testable implication of the elite transfer model is that the English would have the same Y as the Danes and Germans but similar mitochondria to the Irish and Welsh. Similarly, a 2003 study showed that 8% of men in East and Central Asia show descent on the male line from Genghis Khan but nobody has suggested that this reflects a mass migration. Rather in the 12th and 13th centuries the Mongols used rape and polygamy to impregnate women of many Asian nations and they didn’t really give a damn if this meant extinction of the indigenous male lineages.
A very minor point but one that is important to me as a diffusion guy is that chapter five uses the technical jargon of diffusion in non-standard ways, or to be more neutral about it, he and I use terms differently. That said it’s a good chapter, it just needs to be read carefully to avoid semantic confusion.
This post may read like I’m critical of the book but that’s only because I prefer to react to and puzzle out the book rather than summarize it. What reservations I have are fairly minor and unconfident. My overall assessment is that this is a tremendously important book that should be read carefully by anyone interested in social networks, political sociology, social psychology, or economic sociology. For instance, I wish it had been published before my paper with Esparza and Bonacich as using the chapter on pecking orders would have allowed us to develop more depth to the finding about credit ranking networks. (That and it would have given us a pretext to compare Hollywood celebrities to poultry and small children). Despite the book’s foundation in graph theory, this interest should span qualitative/quantitative — at ASA Randy Collins praised the book enthusiastically and gave a very thoughtful reading and from personal conversation I know that Alice Goffman was also very impressed. I think this is because JLM’s relentless focus on interaction between people is a much thinner but nonetheless similar approach to the kinds of issues that qualitative researchers tend to engage with. Indeed, at a deep level Social Structures has more in common with ethnography than with anything that uses regression to try to describe society as a series of slope-intercept equations.
* Technically, it’s about weak emergence, not strong emergence. At “author meets critics” JLM was very clear that he rejects the idea of sui generis social facts with an independent ontological status rather than just a summary or aggregation of micro structure.
** One of the small delights in the early parts of the book is that he notes how our understanding of network structure is driven in part by the ways we measure and record it. So networks based on observation of proximity are necessarily symmetric whereas networks based on sociometric surveys highlight the contingent nature of reciprocity, networks based on balance theory tend to be positive/negative whereas matrices emphasize presence/absence and are often sparse, etc. I might add to his observations in this line that the extremely common practice of projecting bipartite networks into unipartite space (as with studies of Hollywood, Broadway, corporate boards, and technical consortia) has its own sets of biases, most obviously exaggerating the importance and scalability of cliques. Also, I’ve previously remarked on a similar issue in Saller’s Personal Patronage as to how we need to be careful about directed ties being euphemistically described as symmetric ties in some of our data.
*** Watts also observed that JLM’s approach is very much a sort of 1960s sociometry and doesn’t use the recent advances in social network analysis driven by the availability of big data about computer-mediated communication (such as Watts’ current work on Twitter). JLM responded with what was essentially a performativity critique of naive reliance on web 2.0 data, noting for instance that Facebook encourages triadic closure, enforces reciprocity, and discourages deletion of old ties.
| Gabriel |
- Useful detailed overview of Lion. The user interface stuff doesn’t interest me nearly as much as the tight integration of version control and “resume.” Also, worth checking if your apps are compatible. (Stata and Lyx are supposed to work fine. TextMate is supposed to run OK with some minor bugs. No word on R. Fink doesn’t work yet). It sounds good but I’m once again sitting it out for a few months until the compatibility bugs get worked out. Also, as with Snow Leopard many of the features won’t really do anything until developers implement them in their applications.
- I absolutely loved the NPR Planet Money story on the making of Rihanna’s “Man Down.” (Not so fond of the song itself, which reminds me of Bing Crosby and David Bowie singing “Little Drummer Boy” in matching cardigans). If you have any interest at all in production of culture read the blog post and listen to the long form podcast (the ATC version linked from the blog post is the short version).
- Good explanation of e, which comes up surprisingly often in sociology (logit regression, diffusion models, etc.). I like this a lot as in my own pedagogy I really try to emphasize the intuitive meaning of mathematical concepts rather than just the plug and chug formulae on the one hand or the proofs on the other.
- People are using “bimbots” to scrape Facebook. And to think that I have ethical misgivings about forging a user-agent string so wget looks like Firefox.
- Lisa sends along this set of instructions for doing a wide-long reshape in R. Useful and I’m passing it along for the benefit of R users, but the relative intuition and simplicity of “reshape wide stub, i(i) j(j)” is why I still do my mise en place in Stata whenever I use R. Ideally though, as my grad student Brooks likes to remind me, we really should be doing this kind of data mise en place in a dedicated database and use the Stata and R ODBC commands/functions to read it in.
- “The days change at night, change in an instant.”
- Anyone interested in replicating this paper should be paying close attention to this pending natural experiment. In particular I hope the administrators of this survey are smart enough to oversample California in the next wave. I’d consider doing the replication myself but I’m too busy installing a new set of deadbolts and adopting a dog from a pit bull rescue center.
- In Vermont, a state government push to get 100% broadband penetration is using horses to wire remote areas that are off the supply curve, er, beaten path. I see this as a nice illustration both of cluster economies and of the different logics used by markets (market clearing price) and states (fairness, which often cashes out as universal access) in the provision of resources. (h/t Slashdot)
- Yglesias discusses some poll results showing that voters in most of the states that recently elected Republican governors now would have elected the Democrats. There are no poll results for California, the only state that switched to the Democrats last November. Repeat after me: REGRESSION TO THE MEAN. I don’t doubt that some of this is substantive backlash to overreach on the part of politically ignorant swing voters who didn’t really understand the GOP platform, but really, you’ve still got to keep in mind REGRESSION TO THE MEAN.
- Speaking of Yglesias, the ThinkProgress redesign only allows commenting from Facebook users, which is both a pain for those of us who don’t wish to bear the awesome responsibility of adjudicating friend requests and a nice illustration of how network externalities can become coercive as you reach the right side of the s-curve.