Google Trends is a website hosted by Google that gives you a measure of the relative volume of Google searches for a term. You can try it yourself here: https://trends.google.com/trends/.
Example of Google Trends volume for “european union” from the United Kingdom around the Brexit vote.
You can learn more about Google Trends in the next topic Measuring temporal orientation with Google Trends and in the gtrendsR tutorial.
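For a taste of what the tutorial covers, a minimal query with the gtrendsR package might look like the sketch below; the region code and date range are illustrative assumptions, not values taken from the tutorial.

```r
# Install once with: install.packages("gtrendsR")
library(gtrendsR)

# UK search interest for "european union" around the June 2016 Brexit vote;
# the one-year time window is an illustrative assumption.
res <- gtrends(keyword = "european union",
               geo = "GB",
               time = "2016-01-01 2016-12-31")

# interest_over_time holds weekly volume, scaled 0-100 relative to the peak
head(res$interest_over_time)
plot(res)
```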
Nowcasting is predicting the present: it estimates the current value of a quantity from signals that are observed at the same time. Nowcasting is valuable when data reporting is delayed, as in the case of flu incidence.
Google Flu Trends aimed to nowcast influenza-related physician visits from Google search volumes. The reports from the Centers for Disease Control are delayed by two weeks, making this a good case for nowcasting. Google Flu Trends selected 45 search terms as predictors out of 50 million candidate terms.
As reported in the Nature paper, Google Flu Trends achieved high weekly accuracy between 2003 and 2008, and it worked for various regions of the US, not just the national total. The figure above shows the nowcast of Google Flu Trends two weeks ahead of the CDC report.
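The underlying model was deliberately simple: a linear fit of the log-odds of the influenza-like-illness (ILI) physician-visit rate on the log-odds of the aggregate flu-related query fraction. Here is a minimal sketch of that idea; the model form follows the Nature paper, but every number below is simulated for illustration.

```r
# Sketch of the Google Flu Trends model: logit(ILI visit rate) regressed
# on logit(flu-related query fraction). All data is simulated.
set.seed(1)
n <- 100
query_frac <- plogis(rnorm(n, mean = -4, sd = 0.5))  # share of flu-related searches
ili_rate   <- plogis(-1 + 0.8 * qlogis(query_frac) + rnorm(n, sd = 0.1))

fit <- lm(qlogis(ili_rate) ~ qlogis(query_frac))

# Nowcast: plug in this week's query fraction to estimate current ILI visits,
# available immediately instead of two weeks later from the CDC.
new_week <- data.frame(query_frac = 0.018)  # hypothetical current query share
plogis(predict(fit, newdata = new_week))
```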
In February 2013, Google Flu Trends started overestimating the incidence of flu cases.
A follow-up study in Science showed that a simple lagged prediction using two-week-old CDC data outperformed Google Flu Trends, without using any Google data. Since many details of how Google and Google Trends work are not public, we cannot know for sure what the source of the error was. Some candidates are news-related searches, seasonal effects, and changes in Google's algorithm and interface.
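That baseline is almost trivially simple: the nowcast for the current week is just the most recent CDC value available, which is two weeks old. A sketch on simulated data (the autoregressive series below is made up for illustration):

```r
# Lagged-CDC baseline from the Science follow-up: nowcast week t with the
# value the CDC published for week t-2. Data is simulated for illustration.
set.seed(2)
weeks <- 150
ili <- as.numeric(arima.sim(model = list(ar = 0.95), n = weeks)) * 0.1 + 2

lagged_nowcast <- c(NA, NA, head(ili, -2))     # two-week-old report
mean(abs(ili - lagged_nowcast), na.rm = TRUE)  # mean absolute error of baseline
```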
Google Flu Trends is an example of Big Data Hubris: “the often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis” (Lazer et al., Science, 2014).
Take-home message: All data is better than Big Data.
This problem is also known as kitchen-sink regression, garbage in, garbage out, and post hoc storytelling. With enough data, some patterns will always emerge; what matters is making sense of them. To sum up, paraphrasing Boyd et al., 2020: in statistics the essential challenge is to make sound inferences from too little data, while in data science the challenge is to make meaningful inferences from too much data.