Among English speakers, the phrase “Oh my God” might be the most frequent way to express surprise with soft blasphemy. In this first paper of the Computational Social Science Lab Holiday Paper Series, we look into how “Oh my God” became popular with the TV series Friends by analyzing the scripts of the series and other datasets of spoken and written expression across media and languages. We also study the demographics of who says it through Twitter data and explore how AI language models have learned to say it.
The expression “Oh my God” is especially prevalent in US television as part of the valleyspeak sociolect (Wikipedia, 2020); see any episode of “Keeping up with the Kardashians” to get an idea. Beyond the OMG expression, valleyspeak is characterized by the frequent use of uptalk, a speech pattern in which sentences end in a question-like tone; vocal fry, another speech pattern in which utterances end in a vibrating sound similar to what a goat makes when squeezed; and the extreme use of the word “like” as a filler word. While valleyspeak originated in the San Fernando Valley in California, it can now be heard across the US, especially in mass media.
“Oh my God” is the most characteristic reaction of the characters of Friends, which is one of the most successful sitcoms in history, even more than 15 years after it ended (The Economist, 2019). The Honest Trailer of the series points out how frequently the characters react with this phrase. For example, Rachel says it several times in just one scene in reaction to the sudden appearance of a pigeon in her kitchen. The phrase is so typical of the series that it has even motivated a data science blog post about which Friends characters use it the most (Loscalzo, 2018). Beyond this phrase, Friends is becoming a common example for popular data science analyses, including social network analysis (Albright, 2015; Sahakyan, 2019), finding the most popular character (Sohoye, 2019), and of course, sentiment analysis (Bhattacharyya, 2019).
Our aim with this article is to understand the popularity and meaning of “Oh my God”, especially in relation to the TV series Friends. We start by analyzing the scripts of Friends, comparing the use of “Oh my God” in Friends with contemporary TV shows and movies. We then analyze the expression through Google Books, inspecting what the role of Friends in the use of the phrase could have been and how it compares to similar phrases in other languages. Next, we analyze the current use of the phrase in social media through a Twitter dataset, paying special attention to its use across genders and states in the US. Finally, we explore how AI language models like BERT and GPT-2 have learned the phrase and which meanings we can associate with it through these models. The code, data, and detailed results of all these analyses can be found online in our GitHub repository.
We downloaded the scripts of all Friends episodes from this GitHub repository and processed the text, converting it to lower case and matching the regular expression
"oh[:punct:]*\\s*my[:punct:]*\\s*[:punct:]*(god)". This regular expression counts the instances of “Oh my God” with a soft rule that allows various punctuation and spaces between words, but we do not count other variations such as “Oh my fricking God”. We denote the count of matches of the regular expression as OMG, which defines the unit of our analyses. In total we found 1039 OMG in the 229 episodes of the series (double episodes are merged into single files). After counting words with the tm package (Feinerer et al., 2008; Feinerer and Hornik, 2020), we found that Friends has 1476.8 instances of OMG per million trigrams (i.e. sequences of three words).
Figure 1.1 shows the OMG per episode in each season of the series. There is a tendency toward more OMG over the lifetime of the series, from fewer than 3 OMG per episode in the first season to more than 6 in the last one. To compare Friends to contemporary TV shows and movies, we applied the same analysis to the 2020 edition of the Corpus Of Contemporary American English (COCA) (Davies, 2010). In our GitHub repository we share the final yearly counts of our analyses, as the terms of access to the corpus do not allow us to share its raw text.
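The counting procedure described above (lower-casing, regular expression matching, and normalization per million trigrams) can be sketched as follows. This is a Python analog for illustration, not the R code used for the analysis. The `PUNCT` character class (ASCII punctuation only), the whitespace tokenization in `count_trigrams`, and all function names are assumptions, so the counts it produces may differ slightly from those obtained with the tm package.

```python
import re
import string

# Stand-in for [:punct:]: ASCII punctuation only (an assumption of this sketch).
PUNCT = "[" + re.escape(string.punctuation) + "]"

# Same shape as the expression in the text: optional punctuation and
# whitespace between "oh", "my" and "god", but no intervening words.
OMG_PATTERN = re.compile(rf"oh{PUNCT}*\s*my{PUNCT}*\s*{PUNCT}*god")


def count_omg(text: str) -> int:
    """Count matches of the OMG pattern in lower-cased text."""
    return len(OMG_PATTERN.findall(text.lower()))


def count_trigrams(text: str) -> int:
    """Number of three-word sequences, using whitespace tokenization (assumption)."""
    n_words = len(text.split())
    return max(n_words - 2, 0)


def per_million(count: int, total: int) -> float:
    """Rate of `count` occurrences per million units of `total`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 1e6 * count / total


# Made-up example text, not taken from the scripts.
sample = "Oh my God! Oh, my God. Oh my fricking God. OH. MY. GOD!"
print(count_omg(sample))  # 3: the "fricking" variant is not counted
print(per_million(count_omg(sample), count_trigrams(sample)))
```

Like the original expression, this pattern has no word boundaries, so it would also match inside longer strings such as "ohmygod".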
Figure 1.2 shows the yearly frequency of OMG per million trigrams in the TV and movie subtitles part of the COCA corpus. While it is clear that the phrase is very popular in US entertainment and that its popularity is increasing, Friends had many more OMG than contemporary TV shows and movies. Friends had approximately 1477 OMG per million trigrams, which is 4.26 times what you would find in typical TV shows and movies between 1994 and 2004 (300-400 OMG per million).
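As a quick consistency check on these figures, the baseline implied by the reported ratio can be backed out from the two numbers given. The text states only the 300-400 range, so `implied_baseline` is a derived value and an assumption about how the ratio was computed.

```python
friends_rate = 1477.0  # OMG per million trigrams in Friends, as reported in the text
reported_ratio = 4.26  # Friends relative to typical TV and movies, as reported

# Baseline implied by the ratio; this exact value is not stated in the text.
implied_baseline = friends_rate / reported_ratio
print(round(implied_baseline, 1))  # 346.7
```

The result falls inside the 300-400 OMG per million range quoted for TV shows and movies between 1994 and 2004.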
Figure 1.3 shows the yearly frequency of OMG per million trigrams in the transcripts of unscripted spoken TV shows of the COCA corpus. While the frequency is about 8.45 times higher in TV and movie subtitles than in these spoken transcripts, the increasing tendency is present too. Although the source of both corpora is TV, the spoken transcripts come from talk shows and other kinds of unscripted shows. Mass media scripts seem to use “Oh my God” as a way to emphasize and elicit surprise reactions in the audience, which does not happen so naturally in live unscripted television.
The frequency of OMG in both scripted and unscripted spoken language in TV shows and movies has been steadily increasing since the 1990s, but to test whether Friends might have affected the tendency to use the phrase, we need to look at a longer time period. Inspired by the trend previously observed by Loscalzo (2018), we study the frequency of OMG in Google Books, one of the most comprehensive records of human written communication over several centuries and languages (Michel et al., 2011). We use the ngramr R package (Carmody, 2020) to query the 2019 dataset of English fiction books to avoid known problems with non-fiction texts (Pechenick et al., 2015). We also analyze frequencies only from 1900 onward to avoid Optical Character Recognition errors, such as mistaking a long s for an f in texts from the 1600s and 1700s.
Figure 2.1 shows the frequency of OMG per million trigrams in Google Books with the number of OMG per episode of the ten seasons of Friends superimposed. The frequency of OMG in books has consistently increased over more than a century. This rate apparently accelerated after Friends came out. Could Friends be responsible for additional growth in the frequency of OMG in books?