Baby name data

The website of the US Social Security Administration (SSA) provides baby name data, but it can sometimes be unstable or hard to reach. In the GitHub repository of this script you can find a file with the latest data (BabyData2019.csv.gz), which currently covers baby name statistics from 1880 to 2019. If you want to know more about how to access and process this data, or want the code to reproduce this analysis, check the Rmd version of this handout on GitHub. First, let’s look at how much data per year we get from the dataset:

There are several million babies per year. The jump around the late 1930s is due to a change in how the US SSA recorded data: before 1937, people registered themselves as adults, and only from the 1940s did it become widespread to register babies when newly born. So it is better not to read trends that predate that decade.
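The yearly totals can be computed with a few lines of standard-library Python. This is a minimal sketch, not the Rmd handout’s code: it assumes the gzipped CSV has a header row with columns named `year` and `n` (number of babies with a given name, sex and year). Those column names are an assumption, so check them against the actual file before use.

```python
import csv
import gzip
from collections import Counter


def babies_per_year(path):
    """Sum the number of registered babies per year.

    Assumes a gzipped CSV whose header contains (at least) the
    hypothetical columns 'year' and 'n'.
    """
    totals = Counter()
    # "rt" opens the gzip stream in text mode so csv can parse it
    with gzip.open(path, "rt", newline="") as f:
        for row in csv.DictReader(f):
            totals[int(row["year"])] += int(row["n"])
    # Return the totals ordered by year
    return dict(sorted(totals.items()))


# Usage (file name from the repository; column names assumed):
# totals = babies_per_year("BabyData2019.csv.gz")
# print(totals[1880], totals[2019])
```

Reading the file row by row keeps memory use low, which matters little for this dataset but avoids loading the whole table just to sum one column.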

Let’s look at the trend of two names as an example. Here we plot the number of babies called “Angelina” and “Leonardo” (regardless of gender):

You can see the peak of the fashion of naming your daughter after a Hollywood actress, after which the popularity of the name has fallen back to roughly where it was before it became fashionable. The case of Leonardo is a bit different: a slower increase, but one that has lasted longer so far. The Simmel effect predicts that nothing stays fashionable forever, which in this case would mean that sooner or later we will see a decline in popularity like the one you see for Angelina.
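Counting a name “regardless of gender” means summing its counts over both sexes for each year. The sketch below assumes a hypothetical in-memory layout of (year, name, sex, count) tuples; the toy rows are made up for illustration and are not real SSA counts.

```python
from collections import defaultdict


def name_trend(rows, name):
    """Yearly number of babies with a given name, summed over both sexes.

    `rows` is an iterable of (year, name, sex, count) tuples, a
    hypothetical in-memory layout of the SSA data.
    """
    by_year = defaultdict(int)
    for year, row_name, _sex, count in rows:
        if row_name == name:
            by_year[year] += count
    return dict(sorted(by_year.items()))


# Toy rows for illustration only, not real SSA counts
rows = [
    (2000, "Angelina", "F", 100),
    (2000, "Angelina", "M", 2),
    (2001, "Angelina", "F", 150),
    (2000, "Leonardo", "M", 50),
]
print(name_trend(rows, "Angelina"))  # {2000: 102, 2001: 150}
```

The peak year of a trend can then be read off with `max(trend, key=trend.get)`.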

The QWERTY effect in baby names

The QWERTY effect is a hypothesis in psychology which postulates that words typed with more right-hand letters of the keyboard are, on average, more positive than words typed with more left-hand letters. Kyle Jasmin and Daniel Casasanto first found this effect when comparing how words are typed with how they are rated on a scale from positive to negative. They obtained further results in English, Portuguese, and German. The effect appears in both left-handed and right-handed people, and even for pseudowords (words that look like they could mean something but are meaningless). I also found some evidence of the effect in the way people like online content and review products, but figuring out the mechanism behind the effect is still an open research question.

One of the most surprising manifestations of the QWERTY effect is in baby names. If we try to give “nice” names to our babies, in theory names with more right-hand letters should have become more common since keyboards became widespread, when computers penetrated society in the 1990s. Here, we reproduce previous evidence that looked at these trends since the 1960s, using a slightly different calculation: for each year, we average the number of right-hand letters minus the number of left-hand letters over all baby names and plot the resulting trend:
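The yearly measure described above can be sketched as follows. The split of letters into left-hand (QWERTASDFGZXCVB) and right-hand (YUIOPHJKLNM) keys is the usual touch-typing division of the QWERTY keyboard. Averaging over the unique names of a year, rather than weighting each name by the number of babies who received it, is an assumption about the exact calculation.

```python
# Touch-typing split of the QWERTY keyboard (B is typed with the left hand)
LEFT = set("qwertasdfgzxcvb")
RIGHT = set("yuiophjklnm")


def right_side_advantage(name):
    """Number of right-hand letters minus number of left-hand letters."""
    letters = name.lower()
    right = sum(ch in RIGHT for ch in letters)
    left = sum(ch in LEFT for ch in letters)
    return right - left


def yearly_mean_rsa(names_by_year):
    """Average right-side advantage of all names given in each year.

    `names_by_year` maps a year to a list of unique names; each name
    counts once, regardless of how many babies received it.
    """
    return {
        year: sum(right_side_advantage(n) for n in names) / len(names)
        for year, names in sorted(names_by_year.items())
        if names  # skip years without names to avoid dividing by zero
    }


print(right_side_advantage("Molly"))    # 5  (all letters typed with the right hand)
print(right_side_advantage("Barbara"))  # -7 (all letters typed with the left hand)
print(yearly_mean_rsa({1960: ["Barbara", "Molly"], 2000: ["Molly"]}))
# {1960: -1.0, 2000: 5.0}
```

A weighted variant would multiply each name’s score by its count and divide by the total number of babies in that year.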

The original result by Casasanto et al. seems to replicate, but there is a difference from their analysis, which only covered data up to 2012: since the early 2010s, the trend seems to have stopped. Perhaps the QWERTY effect is getting weaker as phones and tablets replace keyboards. While this result replicates, you cannot see the QWERTY effect when you correlate the popularity of baby names with the way they are typed over the decades, as this paper has shown. So there might be a trend, but it is not strong enough to say that names with more right-hand letters are more popular than names with more left-hand letters.
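A rough version of the popularity check mentioned above correlates each name’s right-side advantage with how many babies received it. This is a hypothetical sketch, not the method of the cited paper: the input dictionary is made up, and the log transform of counts is our own choice to dampen the heavy tail of very popular names.

```python
import math

LEFT = set("qwertasdfgzxcvb")
RIGHT = set("yuiophjklnm")


def right_side_advantage(name):
    """Number of right-hand letters minus number of left-hand letters."""
    letters = name.lower()
    return sum(ch in RIGHT for ch in letters) - sum(ch in LEFT for ch in letters)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (sx * sy)


def rsa_popularity_correlation(counts):
    """Correlate right-side advantage with log-popularity across names.

    `counts` maps a name to the number of babies given that name in
    some period (hypothetical input).
    """
    names = list(counts)
    rsa = [right_side_advantage(n) for n in names]
    pop = [math.log1p(counts[n]) for n in names]
    return pearson(rsa, pop)


# Made-up counts for illustration only
print(rsa_popularity_correlation({"Molly": 1000, "Barbara": 10, "Angelina": 100}))
```

On real data this would be computed per decade; a coefficient close to zero would be consistent with the finding that popularity does not track typing.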

Wacky baby name research

There are many papers using the SSA baby name database, some of them published in prestigious journals like PNAS and PRSB. There is a satirical journal called “Proceedings of the Natural Institute of Science” (PNIS) that made fun of this trend in a parody paper titled “We are entering an unprecedented age in baby name flux”. The cheekiest graph is Figure 2, where the authors show a scatter plot of the number of unique baby names for girls and for boys versus the yearly average US temperature, concluding that “baby name diversity also seems to have risen with the increasing annual temperature of the US (i.e., climate change)”. Here we reproduce that analysis using the average US temperature anomaly from the US Environmental Protection Agency:
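Name diversity here is the number of distinct names given to each sex in each year, which is then paired year by year with the temperature anomaly. The sketch below assumes the same hypothetical (year, name, sex, count) tuples as before and a dictionary mapping year to anomaly; the rows and anomaly values are made up, not EPA or SSA figures.

```python
from collections import defaultdict


def unique_names_per_year(rows):
    """Count distinct names for each (year, sex) pair.

    `rows` is an iterable of (year, name, sex, count) tuples, a
    hypothetical in-memory layout of the SSA data.
    """
    names = defaultdict(set)
    for year, name, sex, count in rows:
        if count > 0:
            names[(year, sex)].add(name)
    return {key: len(found) for key, found in sorted(names.items())}


def paired_series(unique_counts, anomalies, sex):
    """Align one sex's unique-name counts with yearly temperature anomalies.

    `anomalies` maps year -> anomaly; only years present in both inputs are kept.
    """
    years = sorted(y for (y, s) in unique_counts if s == sex and y in anomalies)
    return years, [unique_counts[(y, sex)] for y in years], [anomalies[y] for y in years]


# Toy data for illustration only
rows = [
    (2000, "Emma", "F", 500), (2000, "Ava", "F", 300), (2000, "Noah", "M", 400),
    (2001, "Emma", "F", 450), (2001, "Noah", "M", 420), (2001, "Liam", "M", 380),
]
counts = unique_names_per_year(rows)
print(counts)  # {(2000, 'F'): 2, (2000, 'M'): 1, (2001, 'F'): 1, (2001, 'M'): 2}
print(paired_series(counts, {2000: 0.5, 2001: 0.7}, "M"))
# ([2000, 2001], [1, 2], [0.5, 0.7])
```

The two aligned lists are what a scatter plot and a per-sex linear regression would take as input.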

The lines show the results of a linear regression for boys and girls separately; check our linear regression tutorial to learn more about it. We find the same result as the PNIS article: a positive correlation between the number of unique baby names in a year and the average US temperature, even though we measure temperature as an anomaly rather than in raw degrees Fahrenheit as in the original paper. In particular, we get a correlation coefficient of 0.591 for boys and 0.544 for girls. But do not be fooled: this does not mean that climate change is causing baby name diversity. Both quantities have an upward trend over time, and the correlation is a result of that shared trend. If you want to dig deeper into this topic, you can run a Granger causality test yourself, and you will see that we do not have evidence that rising temperatures cause larger numbers of names for either gender.
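Why a shared trend alone produces a strong correlation can be shown with synthetic data: two series that both grow over time but have independent noise. Correlating their year-to-year changes (first differences) removes the common trend, and the correlation collapses. This is an illustration with made-up series, not the SSA or EPA data; differencing is one common way to detrend, and a proper Granger test is available, for example, as `grangercausalitytests` in the Python package statsmodels (`statsmodels.tsa.stattools`).

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def diff(xs):
    """First differences: change from one step to the next."""
    return [b - a for a, b in zip(xs, xs[1:])]


# Two synthetic, causally unrelated series that both trend upward
rng = random.Random(42)
steps = range(100)
a = [t + rng.gauss(0, 5) for t in steps]
b = [t + rng.gauss(0, 5) for t in steps]

r_levels = pearson(a, b)                # high, driven by the shared trend
r_changes = pearson(diff(a), diff(b))   # much weaker once the trend is removed
print(round(r_levels, 2), round(r_changes, 2))
```

The same comparison on the real yearly series (unique names versus temperature anomaly) is a quick sanity check before running a formal Granger test.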

The limits of baby name predictability

Baby names are a popular example to illustrate scientific topics. The book Freakonomics describes the imitation part of the Simmel effect, explaining how people imitate their richer neighbors when naming their babies. The book goes as far as predicting the top US baby names in 2015, based on a data analysis exercise that is never explained in detail. Here is the prediction: