You can find the markdown files and data for this exercise in the corresponding Github folder.

Tasks:

  1. Construct the timelines of Twitter users

  2. Build social network of retweets

  3. Calculate assortativity

  4. Permutation tests

  5. Community detection

1. Construct the timelines of Twitter users

Remember what we did in the exercise about retrieving Twitter network data. First connect to the Twitter API using your credentials as in previous exercises.

library(dplyr)
library(tidygraph)
library(rtweet)
library(ggraph)

Then download the file SwissPoliticians.csv and read it as a csv in R. Take into account that separators are tabs.

#Your code here

Look up the basic profile information of each user by screenname and delete the filtered users from your dataset. Then convert to lowercase all screen names in your Twitter profiles and in the data from the csv file and merge them in a single dataframe.

#Your code here

Now retrieve the last 200 tweets of each user (remember the check=FALSE parameter). It might take a bit to get data. Save the result in a file called “timelines.RData” so you don’t lose any data.

#Your code here

2. Build social network of retweets

If you had problems gathering data in previous steps, you can load the example datasets provided here (timelines.RData and users.Rdata). After that, filter the tweets in the timeline such that the user id that is being retweeted is one of the user ids in the user list (the command %in% is useful here or you can use one of the join functions of dplyr):

#Your code here

To build a graph, we have to prepare two data frames. First, construct a vertices data frame with the user ids, screen_names, and the political party label of the vertices. Then build a dataframe for the edges with an edge between any pair of two users that exchanged at least one retweet with each other (regardless of the direction). Think that this will be an undirected network connecting politicians that share the same information on Twitter:

#Your code here

Now do the corresponding call to tbl_graph to build your undirected graph, using the column id of nodes as identifier (node_key).

#Your code here

3. Calculate assortativity

Use the graph_assortativity function to calculate the assortativity with respect to party labels. How high is the value?

#Your code here

To see if the assortativity value fits your expectations, use ggraph to plot the network coloring each node according to the political party label of the politician. Does the pattern of colors fit the value of assortativity?

#Your code here

4. Permutation tests

The above result looks assortative, but how can we test if it could have happened at random and not because of party identity? Here were are going to test it with a permutation test.

First, let’s run a permutation. Perform the same assortativity calculation as above but permuting the party labels of nodes. You can do this very efficiently by using the sample() function when you call the graph_assortativity() function.

#Your code here

Is the value much closer to zero? Repeat the calculation with 1000 permutations and plot the histogram of the resulting values. Add a line with the value of the assortativity without permutation. Is it far or close to the permuted values?

#Your code here

To be sure, let’s calculate a p-value for the null hypothesis that the assortativity is zero and the alternative hypothesis that it is positive (what we expected):

#Your code here

After looking at the above results, do you think it is likely that the assortativity we found in the data was produced by chance?

5. Community detection

Let’s test if Twitter communities match political affiliations. Remove nodes with degree zero in the network and run the Louvain community detection algorithm. Visualize the result coloring nodes by community labels (cast the label to a character so you have distinct colors).

#Your code here

Run the graph_modularity function with the above community labels. Is it high enough to think that the network has a community structure?

#Your code here

Repeat but using the party labels (you might have to cast to factor) instead of the communities detected with Louvain. Is it higher or lower? How far is this modularity from the maximal one found with Louvain?

#Your code here

Finally, to understand which parties are represented in each community, build a data frame for nodes with two columns: one with the party label and another one with the community label. Use the table() function to print a contingency table. Can you guess which party or parties compose each community?

#Your code here

To learn more: