Now These Are the Names, Pt 1
| Gabriel |
There’s a lot of great research on names, and I’ve been a big fan of it for years, although it’s hard to reconcile with my own favorite diffusion models, since names are a flow whereas the stuff I’m interested in generally concerns diffusion across a finite population.
Anyway, I was inspired to play with this data by two things that came up in conversation. The one I’ll discuss today is that somebody repeated a story about a girl named “Lah-d,” which is pronounced “La dash da” since “the dash is not silent.”
This appears to be a slight variation on an existing apocryphal story, but it reflects three real social facts that are well documented in the name literature. First, black girls have the most eclectic names of any demographic group, with a high premium put on creativity and about 30% having unique names. Second, even when their names are unique coinages they still follow systematic rules, as with the characteristic prefix “La” and consonant pair “sh.” Third, these distinctly black names are an object of bewildered mockery (and a basis for exclusion) by others, which is the appeal in retelling this and other urban legends on the same theme.*
To tell if there was any evidence for this story I checked the Social Security data, but the web searchable interface only includes the top 1000 names per year. Thus checking on very rare names requires downloading the raw text files. There’s one file per year, but you can efficiently search all of them from the command line by going to the directory where you unzipped the archive and grepping.
cd ~/Downloads/names
grep '^Lah-d' *.txt
grep '^Lahd' *.txt
As you can see, this name does not appear anywhere in the data. Case closed? Well, there’s a slight caveat in that for privacy reasons the data only include names that occur at least five times in a given birth year. So while it includes rare names, it misses extremely rare names. For instance, you also get a big fat nothing if you do this search:
grep '^Reihan' *.txt
This despite the fact that I personally know an American named Reihan. (Actually I’ve never asked him to show me a photo ID so I should remain open to the possibility that “Reihan Salam” is just a memorable nom de plume and his birth certificate really says “Jason Miller” or “Brian Davis”).
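The five-occurrence floor can be checked against the files themselves. Here is a minimal sketch, assuming each line of a year file looks like `Name,Sex,Count`; the function name `min_count` is my own invention for illustration and is not part of the Social Security download.

```shell
# Hypothetical helper: print the smallest count found in one year file.
# Assumes comma-separated lines of the form Name,Sex,Count.
min_count() {
    awk -F, '
        { n = $3 + 0 }                  # numeric value of the count field
        NR == 1 || n < min { min = n }  # track the running minimum
        END { if (NR > 0) print min }
    ' "$1"
}
```

Calling `min_count` on any one year file should print a number no smaller than five if the privacy rule works as described.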
For names that do meet the minimal threshold, though, you can use grep as the basis for a quick and dirty time series. To automate this I wrote a little Stata script called grepnames. To call it, you give it two arguments: the (case-sensitive) name you’re looking for and the directory where you put the name files (with a trailing slash). It gives you back a time series of how many births had that name.
capture program drop grepnames
program define grepnames
	local name "`1'"
	local directory "`2'"

	// grep every year file for lines starting with the name
	tempfile namequery
	shell grep -r '^`name'' "`directory'" > `namequery'

	// v1 is "path/yobYYYY.txt:Name", v2 is sex, v3 is the count
	insheet using `namequery', clear
	gen year=real(regexs(1)) if regexm(v1,"`directory'yob([0-9][0-9][0-9][0-9])\.txt")
	gen name=regexs(1) if regexm(v1,"`directory'yob[0-9][0-9][0-9][0-9]\.txt:(.+)")
	// drop longer names that merely begin with the query
	keep if name=="`name'"
	ren v3 frequency
	ren v2 sex

	// years with no record for a given sex count as zero
	fillin sex year
	recode frequency .=0
	sort year sex
	twoway (line frequency year if sex=="M") (line frequency year if sex=="F"), legend(order(1 "Male" 2 "Female")) title(`"Time Series for "`name'" by Birth Cohort"')
end
grepnames Gabriel "/Users/rossman/Documents/codeandculture/names/"
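If you don’t have Stata handy, the same grep-based series can be roughed out in the shell alone. This is only a sketch under a couple of assumptions: that the files are named `yobYYYY.txt` and that each line looks like `Name,Sex,Count`. The function name `name_series` is hypothetical, and unlike grepnames it neither fills in zeros for missing years nor draws a graph.

```shell
# Hypothetical helper: print "year,sex,count" rows for one exact name.
# Assumes files named yobYYYY.txt with lines of the form Name,Sex,Count.
name_series() {
    name=$1
    dir=$2
    # The trailing comma in the pattern keeps "Gabriel" from matching "Gabriela".
    # -H forces the filename prefix even when only one file matches the glob.
    grep -H "^${name}," "$dir"/yob*.txt |
        # Pull the year out of the filename; keep sex and count from the line.
        sed -E 's/^.*yob([0-9]{4})\.txt:[^,]*,([MF]),([0-9]+).*$/\1,\2,\3/' |
        sort -t, -k1,1n -k2,2
}
```

For example, `name_series Gabriel ~/Downloads/names` would list one row per year and sex in which the name clears the threshold.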
Note that these numbers are not scaled for the size of the cohorts, either in reality or as observed by the Social Security Administration. (Their data is noticeably worse for cohorts prior to about 1920.) Still, it’s pretty obvious that my first name has grown more popular over time.
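One rough way to scale by cohort size as observed in the files is to divide each count by the total recorded births of that sex in the same year file. The sketch below assumes `yobYYYY.txt` filenames and `Name,Sex,Count` lines; the function name `name_share` is hypothetical. Because names below the five-occurrence floor are missing, the denominator is recorded births rather than true cohort size, so treat the shares as approximate.

```shell
# Hypothetical helper: print "year,sex,count,total,share" for one exact name,
# where total is the sum of all counts for that sex in that year's file.
name_share() {
    name=$1
    dir=$2
    for f in "$dir"/yob*.txt; do
        [ -e "$f" ] || continue          # skip if the glob matched nothing
        year=$(basename "$f" .txt)
        year=${year#yob}                 # yob1990 -> 1990
        awk -F, -v name="$name" -v year="$year" '
            { sub(/\r$/, "", $3); total[$2] += $3 }  # tolerate CRLF line endings
            $1 == name { hits[$2] = $3 }
            END {
                for (s in hits)
                    printf "%s,%s,%d,%d,%.6f\n", year, s, hits[s], total[s], hits[s] / total[s]
            }
        ' "$f"
    done | sort -t, -k1,1n -k2,2
}
```

Plotting the last column instead of the raw count would show whether a name’s rise outpaces the growth in recorded births.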
We can also replicate a classic example from Lieberson of a name that became less popular over time, for rather obvious reasons.
grepnames Adolph "/Users/rossman/Documents/codeandculture/names/"
Next time: how diverse are names over time, with thoughts on entropy indices.
(Also see Jay’s thoughts on names, as well as taking inspiration from my book to apply Bass models to film box office).
* Yes, I know that one of those stories is true but the interesting thing is that people like to retell it (and do so with mocking commentary), not that the underlying incident is true. It is also true that yesterday I had eggs and coffee for breakfast, but nobody is likely to forward an e-mail to their friends repeating that particular banal but accurate nugget.