This exercise reproduces the findings of the article “Quantifying the Advantage of Looking Forward” (http://www.nature.com/articles/srep00350). According to its results, a country's GDP per capita is positively correlated with how much its population searches Google for the next year, relative to how much it searches for the previous year. This ratio is called the Future Orientation Index (FOI). For example, for the year 2017 the FOI is calculated as: FOI = number of searches made during 2017 for the term “2018” / number of searches made during 2017 for the term “2016”.
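The same definition for a general year y, writing V_y(q) for the Google search volume for query q during year y (this notation is introduced here for illustration, not taken from the article). The second formula is the instance used in this exercise, which works with searches made during 2014:

```latex
\mathrm{FOI}_{y} = \frac{V_{y}(\text{``}y+1\text{''})}{V_{y}(\text{``}y-1\text{''})},
\qquad
\mathrm{FOI}_{2014} = \frac{V_{2014}(\text{``2015''})}{V_{2014}(\text{``2013''})}
```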
For this task you will need the WDI package, which gives you access to data from the World Bank’s World Development Indicators.
1.1 Install the WDI package
Run the following commands in your R console to install the WDI package
#Your code here
1.2 Load the WDI library
In the following chunk, load the WDI library
#Your code here
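A minimal sketch covering tasks 1.1 and 1.2, assuming R can reach a CRAN mirror: it installs WDI only when it is not already present and then attaches it. In practice you would usually run the install step once in the console rather than on every knit.

```r
# Install WDI from CRAN only if it is not already installed
if (!requireNamespace("WDI", quietly = TRUE)) {
  install.packages("WDI")
}

# Attach the package so that WDI() and WDIsearch() are available
library(WDI)
```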
1.3 Set working directory
Check that RStudio's working directory is the folder that contains this R Markdown file. You can set it automatically with:
# Works only when run from RStudio (uses rstudioapi); otherwise call setwd() with the folder path
setwd(dirname(rstudioapi::getSourceEditorContext()$path))
2.1 Download WDI data
From the WDI we need three indicators:
Gross Domestic Product (GDP) per capita, adjusted for purchasing power parity (PPP, in constant international $, “NY.GDP.PCAP.PP.KD”)
The number of Internet users (per 100 people, “IT.NET.USER.ZS”)
The total population (described as “Population, total”, “SP.POP.TOTL”)
In the following code chunk, download all three indicators, including the extra metadata columns (such as region), for all countries for the year 2014.
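# Download the three indicators for all countries (country = "all" is the default) for 2014;
# extra = TRUE adds metadata columns such as region, capital, and income level
WDIdf <- WDI(indicator = c("NY.GDP.PCAP.PP.KD", "SP.POP.TOTL", "IT.NET.USER.ZS"),
             start = 2014, end = 2014, extra = TRUE)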
2.2 Clean WDI data
Some entries are incomplete, and some describe regions or groups of countries rather than single countries. In the following code chunk, keep only complete rows (use the complete.cases function) and drop the aggregates by removing rows whose region column is “Aggregates”.
# Keep rows with no missing values that are actual countries (region is not "Aggregates")
newdf <- WDIdf[complete.cases(WDIdf) & WDIdf$region != "Aggregates", ]
2.3 Select countries with more than 5 million internet users
In the following code chunk, add a new column with the estimated number of internet users in each country (total population × internet users per 100 people / 100). Then filter out countries with fewer than 5 million internet users, as in the original article.
#Your code here
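A minimal sketch of task 2.3 on a made-up three-row stand-in for newdf (the country labels and values are invented, not WDI data). It assumes the WDI columns are named by their indicator codes, which is how WDI() names them by default; the column name internet_users is a hypothetical choice.

```r
# Made-up stand-in for newdf; column names follow WDI's indicator-code naming
newdf <- data.frame(
  country        = c("Country A", "Country B", "Country C"),
  SP.POP.TOTL    = c(1e7, 2e6, 5e7),
  IT.NET.USER.ZS = c(60, 90, 20),
  stringsAsFactors = FALSE
)

# Internet users per 100 people -> estimated absolute number of internet users
newdf$internet_users <- newdf$SP.POP.TOTL * newdf$IT.NET.USER.ZS / 100

# Drop countries with fewer than 5 million internet users
filteredDF <- newdf[newdf$internet_users >= 5e6, ]
print(filteredDF[, c("country", "internet_users")])
```

Here Country B (1.8 million users) is dropped even though its Internet penetration rate is the highest, which is the point of filtering on the absolute number rather than on the rate.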
3.1 Download data from Google Trends
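Download the data from Google Trends by comparing the search terms “2013” and “2015” over the year 2014 and exporting the breakdown by country as a .csv file (geoMap.csv).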
Load the .csv file and clean its format with the following code chunk
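# Skip the first two lines of the export, which come before the column header
GoogleDF <- read.csv("geoMap.csv", skip = 2)
# Rename the columns: country name and the 2014 search shares for "2013" and "2015"
names(GoogleDF) <- c("country", "G2013", "G2015")
# Strip the "%" sign and convert the search shares to numbers
GoogleDF$G2013 <- as.numeric(sub("%", "", GoogleDF$G2013))
GoogleDF$G2015 <- as.numeric(sub("%", "", GoogleDF$G2015))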
3.2 Calculate the Future Orientation Index
In the following code chunk, add a new column to the Google Trends data frame containing the Future Orientation Index: the ratio of the 2014 search volume for “2015” to the 2014 search volume for “2013” in each country.
#Your code here
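A minimal sketch of the FOI column on a made-up stand-in for the cleaned Google Trends table (invented values, same column names as the cleaning chunk above). The new column name FOI is a hypothetical choice.

```r
# Made-up stand-in for the cleaned GoogleDF: 2014 search shares for "2013" and "2015"
GoogleDF <- data.frame(
  country = c("Country A", "Country B", "Country C"),
  G2013   = c(40, 50, 64),
  G2015   = c(60, 50, 36)
)

# Future Orientation Index: searches for the next year ("2015")
# divided by searches for the previous year ("2013")
GoogleDF$FOI <- GoogleDF$G2015 / GoogleDF$G2013
print(GoogleDF)
```

An FOI above 1 means the population searched more for the coming year than for the past one; Country B, with equal shares, sits exactly at 1.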
3.3 Merge with World Bank data
Merge the WDI and Google Trends data frames by country name. (Hint: use merge, or inner_join from dplyr.)
allDF <- merge(GoogleDF, filteredDF, by = "country") # filteredDF is the data frame you produced in task 2.3
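Because the merge matches country names exactly, countries that the two sources spell differently are silently dropped. For example, the World Bank uses “Russian Federation” and “Korea, Rep.”, whereas Google Trends typically lists “Russia” and “South Korea”. The sketch below uses tiny made-up tables (placeholder numbers, not real data) to show the effect and one way to recode names before merging; allDF_fixed is a hypothetical name.

```r
# Tiny made-up tables; the numbers are placeholders, not real data
GoogleDF <- data.frame(
  country = c("Germany", "Russia", "South Korea"),
  FOI     = c(1, 2, 3),
  stringsAsFactors = FALSE
)
filteredDF <- data.frame(
  country           = c("Germany", "Russian Federation", "Korea, Rep."),
  NY.GDP.PCAP.PP.KD = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Inner join on "country": only exact name matches are kept (here, only Germany)
allDF <- merge(GoogleDF, filteredDF, by = "country")
print(allDF)

# Google Trends names with no exact match in the WDI table
print(setdiff(GoogleDF$country, filteredDF$country))

# Recode mismatched names to the World Bank spelling, then merge again
GoogleDF$country[GoogleDF$country == "Russia"]      <- "Russian Federation"
GoogleDF$country[GoogleDF$country == "South Korea"] <- "Korea, Rep."
allDF_fixed <- merge(GoogleDF, filteredDF, by = "country")
print(allDF_fixed)
```

On the real data, printing the setdiff() result before the final merge is a quick way to see which countries would otherwise be lost.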
4.1 Visualize FOI vs GDP
Now that you have the FOI and the GDP per capita (PPP) for each country, make a scatter plot of FOI vs GDP.
#Your code here
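A minimal base-graphics sketch of the scatter plot, using a made-up five-country stand-in for allDF (placeholder values, not real data). It assumes the FOI column is named FOI and the GDP column keeps its WDI indicator code; GDP goes on the x-axis and FOI on the y-axis, following the “FOI vs GDP” wording.

```r
# Made-up stand-in for allDF (placeholder values, not real data)
allDF <- data.frame(
  country           = paste("Country", LETTERS[1:5]),
  NY.GDP.PCAP.PP.KD = c(5000, 10000, 20000, 40000, 60000),
  FOI               = c(0.7, 0.9, 1.0, 1.3, 1.4),
  stringsAsFactors  = FALSE
)

# Scatter plot: GDP per capita (PPP) on the x-axis, FOI on the y-axis
plot(FOI ~ NY.GDP.PCAP.PP.KD, data = allDF,
     xlab = "GDP per capita, PPP (constant international $)",
     ylab = "Future Orientation Index (FOI)",
     pch = 19)

# Label each point with its country name, just above the point
text(allDF$NY.GDP.PCAP.PP.KD, allDF$FOI, labels = allDF$country,
     pos = 3, cex = 0.7)
```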
4.2 Measure Pearson’s correlation
In the following chunk, calculate Pearson’s correlation coefficient between GDP and FOI
#Your code here
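A minimal sketch using base R's cor() and cor.test(), on a made-up stand-in for allDF (placeholder values, not real data; the column names are the same assumptions as in the plot sketch). cor() returns only the coefficient, while cor.test() also reports a t-test p-value and a 95% confidence interval for it.

```r
# Made-up stand-in for allDF (placeholder values, not real data)
allDF <- data.frame(
  NY.GDP.PCAP.PP.KD = c(5000, 10000, 20000, 40000, 60000),
  FOI               = c(0.7, 0.9, 1.0, 1.3, 1.4)
)

# Pearson's correlation coefficient between GDP per capita and FOI
r <- cor(allDF$NY.GDP.PCAP.PP.KD, allDF$FOI, method = "pearson")
print(r)

# Same estimate, plus a p-value and confidence interval
ct <- cor.test(allDF$NY.GDP.PCAP.PP.KD, allDF$FOI, method = "pearson")
print(ct)
```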
4.3 Measure correlation after shuffling
What happens if we shuffle the data (for example, randomly permute the FOI values across countries) and repeat the analysis above? Do the two scatter plots and the two Pearson’s correlation coefficients differ?
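shufdata <- allDF
shufdata$FOI <- sample(shufdata$FOI) # permute the FOI column only (assumes it is named FOI); shuffling whole rows keeps each GDP-FOI pair, so r would not change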
#Your code here
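A minimal sketch contrasting the two kinds of shuffling on a made-up stand-in for allDF (placeholder values, not real data). Reordering whole rows keeps every country's (GDP, FOI) pair intact, so the correlation does not change; permuting only the FOI column breaks the pairing, which is what destroys any genuine association. set.seed() is used only to make the permutation reproducible.

```r
set.seed(42)  # make the random permutations reproducible

# Made-up stand-in for allDF (placeholder values, not real data)
allDF <- data.frame(
  NY.GDP.PCAP.PP.KD = c(5000, 10000, 20000, 40000, 60000),
  FOI               = c(0.7, 0.9, 1.0, 1.3, 1.4)
)
r_orig <- cor(allDF$NY.GDP.PCAP.PP.KD, allDF$FOI)

# Shuffling whole rows keeps each (GDP, FOI) pair together: r is unchanged
rows_shuf <- allDF[sample(nrow(allDF)), ]
r_rows <- cor(rows_shuf$NY.GDP.PCAP.PP.KD, rows_shuf$FOI)

# Permuting only the FOI column breaks the pairing between GDP and FOI
foi_shuf <- allDF
foi_shuf$FOI <- sample(foi_shuf$FOI)
r_foi <- cor(foi_shuf$NY.GDP.PCAP.PP.KD, foi_shuf$FOI)

print(c(original = r_orig, rows_shuffled = r_rows, foi_shuffled = r_foi))
```

With only five rows, the FOI-shuffled coefficient varies a lot from seed to seed; on the real data, repeating the FOI shuffle many times gives a sense of how large a correlation can arise by chance alone.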