Twitter

A quick overview of Twitter:

Twitter has an Application Programming Interface (API) to access tweets and user data. You can access this API from packages in programming languages like R, Python, or Java. To access the API you need a Twitter account, and by creating one you agree to Twitter's terms for accessing their service and their data. There is an additional set of terms for developers who access the API, which also limits how data can be shared outside Twitter. All exercises in this course respect the Twitter terms of service and the developer policy, but you should keep these rules in mind in case you do further projects with the Twitter API.

The rtweet package

There are several R packages to access the Twitter API. The Twitter API changes very often, and package developers have to adapt their packages to keep them working as expected. As of 2021, the best-maintained package for R is rtweet. You can find many examples and functionalities of the package in its GitHub repository, and documentation about its functions at https://docs.ropensci.org/rtweet/reference/index.html.

You can install it with the usual install.packages() call. This tutorial uses some rtweet functions that also require the httpuv package to be installed:

install.packages("rtweet")
install.packages("httpuv")
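
If you rerun this tutorial often, you may prefer to install only the packages that are still missing. The following is a minimal sketch of that idea; the helper missing_packages() is a hypothetical name introduced here for illustration and is not part of rtweet.

```r
# Hypothetical helper (not part of rtweet): return the packages from `pkgs`
# that cannot be found in the current R library.
# Note: requireNamespace() loads the namespace of each package it finds.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Usage (commented out because it downloads packages from CRAN):
# to_install <- missing_packages(c("rtweet", "httpuv"))
# if (length(to_install) > 0) install.packages(to_install)
```

The install.packages() call is left commented out so that the sketch itself has no side effects beyond checking your library.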

To start using rtweet, you only need to load it with the library() function. The dependency on httpuv is handled inside rtweet, so you only need to load one package:

library(rtweet)

Connecting to the Twitter API

rtweet offers a very easy way to access the Twitter API from R thanks to the rstats2twitter application. Before you run this code, open a browser and log into your Twitter account. Then, in an interactive R session (for example, the Console of RStudio), call one of rtweet's functions, like this one:

result <- lookup_users("dgarcia_eu")

The first time you run this function, it will open a window in your browser asking you to grant permissions to the rstats2twitter app in your Twitter account. This app will allow you to use the API with your Twitter account. If you do not want to use rtweet this way in the future, you can always disconnect the app from your Twitter account at https://twitter.com/settings/connected_apps.

Once you have given permissions, rtweet will save your connection in your R configuration and you can access the Twitter API from R from now on. This works in interactive R sessions, for example when you run R chunks in RStudio. If you want to run automated scripts or knit R Markdown documents, you will need to connect with a token as a developer. Check the appendix at the end of this tutorial if you want to learn more.
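
Because the browser-based login only works in interactive sessions, it can help to guard API calls in code that might also be run from a script. Here is a minimal sketch using the base R function interactive(); the wrapper run_if_interactive() is a hypothetical helper, not an rtweet function.

```r
# Hypothetical helper: run `api_call` only when a browser-based login is possible,
# that is, when R is running as an interactive session.
run_if_interactive <- function(api_call) {
  if (interactive()) {
    api_call()
  } else {
    message("Non-interactive session: skipping a call that needs browser authorization.")
    NULL
  }
}

# Usage (commented out because it contacts the Twitter API):
# result <- run_if_interactive(function() rtweet::lookup_users("dgarcia_eu"))
```

In a knitted document or an automated script the wrapper returns NULL instead of trying to open a browser window.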

User profiles

We can get the basic profile information of a user with the lookup_users() function of rtweet. You can request information on many users at once and get a data frame with a row for each user. Here is an example that uses the dplyr function glimpse() to format such large output:

library(dplyr)
result <- lookup_users("dgarcia_eu")
glimpse(result)
## Rows: 1
## Columns: 90
## $ user_id                 <chr> "574364219"
## $ status_id               <chr> "1360611024605487105"
## $ created_at              <dttm> 2021-02-13 15:25:21
## $ screen_name             <chr> "dgarcia_eu"
## $ text                    <chr> "@SimonDeDeo @j_bertolotti We discussed the s…
## $ source                  <chr> "Twitter Web App"
## $ display_text_width      <int> NA
## $ reply_to_status_id      <chr> "1360595306015055876"
## $ reply_to_user_id        <chr> "2364445033"
## $ reply_to_screen_name    <chr> "SimonDeDeo"
## $ is_quote                <lgl> FALSE
## $ is_retweet              <lgl> FALSE
## $ favorite_count          <int> 4
## $ retweet_count           <int> 0
## $ quote_count             <int> NA
## $ reply_count             <int> NA
## $ hashtags                <list> [NA]
## $ symbols                 <list> [NA]
## $ urls_url                <list> ["twitter.com/i/web/status/1…"]
## $ urls_t.co               <list> ["https://t.co/skkInxBMzn"]
## $ urls_expanded_url       <list> ["https://twitter.com/i/web/status/136061102…
## $ media_url               <list> [NA]
## $ media_t.co              <list> [NA]
## $ media_expanded_url      <list> [NA]
## $ media_type              <list> [NA]
## $ ext_media_url           <list> [NA]
## $ ext_media_t.co          <list> [NA]
## $ ext_media_expanded_url  <list> [NA]
## $ ext_media_type          <chr> NA
## $ mentions_user_id        <list> [<"2364445033", "956539964795301889", "30216…
## $ mentions_screen_name    <list> [<"SimonDeDeo", "j_bertolotti", "CSHVienna">]
## $ lang                    <chr> "en"
## $ quoted_status_id        <chr> NA
## $ quoted_text             <chr> NA
## $ quoted_created_at       <dttm> NA
## $ quoted_source           <chr> NA
## $ quoted_favorite_count   <int> NA
## $ quoted_retweet_count    <int> NA
## $ quoted_user_id          <chr> NA
## $ quoted_screen_name      <chr> NA
## $ quoted_name             <chr> NA
## $ quoted_followers_count  <int> NA
## $ quoted_friends_count    <int> NA
## $ quoted_statuses_count   <int> NA
## $ quoted_location         <chr> NA
## $ quoted_description      <chr> NA
## $ quoted_verified         <lgl> NA
## $ retweet_status_id       <chr> NA
## $ retweet_text            <chr> NA
## $ retweet_created_at      <dttm> NA
## $ retweet_source          <chr> NA
## $ retweet_favorite_count  <int> NA
## $ retweet_retweet_count   <int> NA
## $ retweet_user_id         <chr> NA
## $ retweet_screen_name     <chr> NA
## $ retweet_name            <chr> NA
## $ retweet_followers_count <int> NA
## $ retweet_friends_count   <int> NA
## $ retweet_statuses_count  <int> NA
## $ retweet_location        <chr> NA
## $ retweet_description     <chr> NA
## $ retweet_verified        <lgl> NA
## $ place_url               <chr> NA
## $ place_name              <chr> NA
## $ place_full_name         <chr> NA
## $ place_type              <chr> NA
## $ country                 <chr> NA
## $ country_code            <chr> NA
## $ geo_coords              <list> [<NA, NA>]
## $ coords_coords           <list> [<NA, NA>]
## $ bbox_coords             <list> [<NA, NA, NA, NA, NA, NA, NA, NA>]
## $ status_url              <chr> "https://twitter.com/NA/status/13606110246054…
## $ name                    <chr> "David Garcia"
## $ location                <chr> "Graz and Vienna, Austria"
## $ description             <chr> "Professor for Computational Behavioral and S…
## $ url                     <chr> "http://t.co/6iP1nxpYeS"
## $ protected               <lgl> FALSE
## $ followers_count         <int> 1610
## $ friends_count           <int> 537
## $ listed_count            <int> 29
## $ statuses_count          <int> 1176
## $ favourites_count        <int> 1121
## $ account_created_at      <dttm> 2012-05-08 10:22:31
## $ verified                <lgl> FALSE
## $ profile_url             <chr> "http://t.co/6iP1nxpYeS"
## $ profile_expanded_url    <chr> "http://www.dgarcia.eu"
## $ account_lang            <lgl> NA
## $ profile_banner_url      <chr> "https://pbs.twimg.com/profile_banners/574364…
## $ profile_background_url  <chr> "http://abs.twimg.com/images/themes/theme1/bg…
## $ profile_image_url       <chr> "http://pbs.twimg.com/profile_images/11457361…

As you can see, not all the fields of a user object are filled. The result also contains information about the latest tweet of the user: some of the columns in this data frame, for example text and is_retweet, describe that tweet.
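
With 90 columns, it is often practical to keep only the ones you need. The sketch below separates a few profile columns from a few latest-tweet columns using base R. To run without contacting the API, it builds a mock one-row data frame from values printed above; the choice of columns is an assumption made for illustration, not a list from the tutorial.

```r
# Mock one-row result with a few of the columns printed above;
# a real result would come from lookup_users("dgarcia_eu").
result <- data.frame(
  screen_name = "dgarcia_eu",
  name = "David Garcia",
  location = "Graz and Vienna, Austria",
  followers_count = 1610L,
  friends_count = 537L,
  statuses_count = 1176L,
  is_retweet = FALSE,
  favorite_count = 4L,
  stringsAsFactors = FALSE
)

# Columns that describe the user profile (illustrative selection).
profile_cols <- c("screen_name", "name", "location",
                  "followers_count", "friends_count", "statuses_count")
# Columns that describe the user's latest tweet (illustrative selection).
tweet_cols <- c("is_retweet", "favorite_count")

profile <- result[, profile_cols]
latest_tweet <- result[, tweet_cols]
print(profile)
```

The same subsetting works when the data frame has one row per user, which is what you get when you request several users at once.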

User timelines

If you want more than just the last tweet of a user, you can use the get_timeline() function to get the latest tweets of a user:

glimpse(get_timeline(user="dgarcia_eu", n=10))
## Rows: 10
## Columns: 90
## $ user_id                 <chr> "574364219", "574364219", "574364219", "57436…
## $ status_id               <chr> "1360611024605487105", "1360609046018088961",…
## $ created_at              <dttm> 2021-02-13 15:25:21, 2021-02-13 15:17:30, 20…
## $ screen_name             <chr> "dgarcia_eu", "dgarcia_eu", "dgarcia_eu", "dg…
## $ text                    <chr> "@SimonDeDeo @j_bertolotti We discussed the s…
## $ source                  <chr> "Twitter Web App", "Twitter Web App", "Twitte…
## $ display_text_width      <dbl> 123, 140, 140, 21, 61, 81, 140, 116, 168, 140
## $ reply_to_status_id      <chr> "1360595306015055876", NA, NA, "1360170218316…
## $ reply_to_user_id        <chr> "2364445033", NA, NA, "95488935", "7180228587…
## $ reply_to_screen_name    <chr> "SimonDeDeo", NA, NA, "vgalaz", "FerreiramrR"…
## $ is_quote                <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALS…
## $ is_retweet              <lgl> FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE,…
## $ favorite_count          <int> 4, 0, 0, 0, 2, 21, 0, 0, 4, 0
## $ retweet_count           <int> 0, 2, 106, 0, 0, 5, 47, 0, 0, 62
## $ quote_count             <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
## $ reply_count             <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
## $ hashtags                <list> [NA, NA, NA, NA, NA, NA, NA, NA, NA, NA]
## $ symbols                 <list> [NA, NA, NA, NA, NA, NA, NA, NA, NA, NA]
## $ urls_url                <list> [NA, NA, NA, NA, NA, "twitter.com/STWorg/sta…
## $ urls_t.co               <list> [NA, NA, NA, NA, NA, "https://t.co/Sxrhnmz8r…
## $ urls_expanded_url       <list> [NA, NA, NA, NA, NA, "https://twitter.com/ST…
## $ media_url               <list> [NA, NA, NA, NA, NA, NA, NA, NA, NA, NA]
## $ media_t.co              <list> [NA, NA, NA, NA, NA, NA, NA, NA, NA, NA]
## $ media_expanded_url      <list> [NA, NA, NA, NA, NA, NA, NA, NA, NA, NA]
## $ media_type              <list> [NA, NA, NA, NA, NA, NA, NA, NA, NA, NA]
## $ ext_media_url           <list> [NA, NA, NA, NA, NA, NA, NA, NA, NA, NA]
## $ ext_media_t.co          <list> [NA, NA, NA, NA, NA, NA, NA, NA, NA, NA]
## $ ext_media_expanded_url  <list> [NA, NA, NA, NA, NA, NA, NA, NA, NA, NA]
## $ ext_media_type          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
## $ mentions_user_id        <list> [<"2364445033", "956539964795301889", "30216…
## $ mentions_screen_name    <list> [<"SimonDeDeo", "j_bertolotti", "CSHVienna">…
## $ lang                    <chr> "en", "en", "en", "en", "en", "en", "en", "en…
## $ quoted_status_id        <chr> NA, NA, NA, NA, NA, "1359893447495323649", NA…
## $ quoted_text             <chr> NA, NA, NA, NA, NA, "Exciting opportunity for…
## $ quoted_created_at       <dttm> NA, NA, NA, NA, NA, 2021-02-11 15:53:58, NA,…
## $ quoted_source           <chr> NA, NA, NA, NA, NA, "TweetDeck", NA, NA, NA, …
## $ quoted_favorite_count   <int> NA, NA, NA, NA, NA, 19, NA, NA, NA, NA
## $ quoted_retweet_count    <int> NA, NA, NA, NA, NA, 16, NA, NA, NA, NA
## $ quoted_user_id          <chr> NA, NA, NA, NA, NA, "277826494", NA, NA, NA, …
## $ quoted_screen_name      <chr> NA, NA, NA, NA, NA, "STWorg", NA, NA, NA, NA
## $ quoted_name             <chr> NA, NA, NA, NA, NA, "Stephan Lewandowsky", NA…
## $ quoted_followers_count  <int> NA, NA, NA, NA, NA, 7411, NA, NA, NA, NA
## $ quoted_friends_count    <int> NA, NA, NA, NA, NA, 1731, NA, NA, NA, NA
## $ quoted_statuses_count   <int> NA, NA, NA, NA, NA, 15122, NA, NA, NA, NA
## $ quoted_location         <chr> NA, NA, NA, NA, NA, "", NA, NA, NA, NA
## $ quoted_description      <chr> NA, NA, NA, NA, NA, "Prof Stephan Lewandowsky…
## $ quoted_verified         <lgl> NA, NA, NA, NA, NA, TRUE, NA, NA, NA, NA
## $ retweet_status_id       <chr> NA, "1360354968037916675", "13601678488370667…
## $ retweet_text            <chr> NA, "On a related note, but mindful of your i…
## $ retweet_created_at      <dttm> NA, 2021-02-12 22:27:53, 2021-02-12 10:04:20…
## $ retweet_source          <chr> NA, "TweetDeck", "Twitter Web App", NA, NA, N…
## $ retweet_favorite_count  <int> NA, 9, 182, NA, NA, NA, 164, NA, NA, 80
## $ retweet_retweet_count   <int> NA, 2, 106, NA, NA, NA, 47, NA, NA, 62
## $ retweet_user_id         <chr> NA, "285698560", "18018877", NA, NA, NA, "126…
## $ retweet_screen_name     <chr> NA, "jayvanbavel", "FredericJacobs", NA, NA, …
## $ retweet_name            <chr> NA, "Jay Van Bavel", "Frederic Jacobs", NA, N…
## $ retweet_followers_count <int> NA, 29227, 19503, NA, NA, NA, 643, NA, NA, 31…
## $ retweet_friends_count   <int> NA, 693, 365, NA, NA, NA, 291, NA, NA, 168
## $ retweet_statuses_count  <int> NA, 8475, 6744, NA, NA, NA, 74, NA, NA, 1011
## $ retweet_location        <chr> NA, "New York, NY", "Lausanne, Switzerland", …
## $ retweet_description     <chr> NA, "Social neuroscience professor at NYU wri…
## $ retweet_verified        <lgl> NA, FALSE, TRUE, NA, NA, NA, FALSE, NA, NA, F…
## $ place_url               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
## $ place_name              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
## $ place_full_name         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
## $ place_type              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
## $ country                 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
## $ country_code            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
## $ geo_coords              <list> [<NA, NA>, <NA, NA>, <NA, NA>, <NA, NA>, <NA…
## $ coords_coords           <list> [<NA, NA>, <NA, NA>, <NA, NA>, <NA, NA>, <NA…
## $ bbox_coords             <list> [<NA, NA, NA, NA, NA, NA, NA, NA>, <NA, NA, …
## $ status_url              <chr> "https://twitter.com/dgarcia_eu/status/136061…
## $ name                    <chr> "David Garcia", "David Garcia", "David Garcia…
## $ location                <chr> "Graz and Vienna, Austria", "Graz and Vienna,…
## $ description             <chr> "Professor for Computational Behavioral and S…
## $ url                     <chr> "http://t.co/6iP1nxpYeS", "http://t.co/6iP1nx…
## $ protected               <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
## $ followers_count         <int> 1610, 1610, 1610, 1610, 1610, 1610, 1610, 161…
## $ friends_count           <int> 537, 537, 537, 537, 537, 537, 537, 537, 537, …
## $ listed_count            <int> 29, 29, 29, 29, 29, 29, 29, 29, 29, 29
## $ statuses_count          <int> 1176, 1176, 1176, 1176, 1176, 1176, 1176, 117…
## $ favourites_count        <int> 1121, 1121, 1121, 1121, 1121, 1121, 1121, 112…
## $ account_created_at      <dttm> 2012-05-08 10:22:31, 2012-05-08 10:22:31, 20…
## $ verified                <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
## $ profile_url             <chr> "http://t.co/6iP1nxpYeS", "http://t.co/6iP1nx…
## $ profile_expanded_url    <chr> "http://www.dgarcia.eu", "http://www.dgarcia.…
## $ account_lang            <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
## $ profile_banner_url      <chr> "https://pbs.twimg.com/profile_banners/574364…
## $ profile_background_url  <chr> "http://abs.twimg.com/images/themes/theme1/bg…
## $ profile_image_url       <chr> "http://pbs.twimg.com/profile_images/11457361…

We use the dplyr glimpse() function again to make the output readable. We get a data frame with one row per tweet and one column for each of the fields of the tweets. Twitter restricts how this data can be shared, but you can always share the numeric ids of the tweets, and other developers can then request the content of the tweets themselves through the API.
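
As a minimal sketch of sharing only tweet ids, the following code writes ids to a plain text file and reads them back. It uses the two status ids that are fully visible in the output above and a temporary file; the file name and the one-id-per-line format are assumptions for illustration.

```r
# The two status ids fully visible in the output above, kept as character strings.
ids <- c("1360611024605487105", "1360609046018088961")

# Write one id per line to a plain text file (a temporary file in this sketch).
id_file <- tempfile(fileext = ".txt")
writeLines(ids, id_file)

# readLines() returns character strings, so the ids come back unchanged.
shared_ids <- readLines(id_file)
identical(ids, shared_ids)
```

Someone who receives such a file can read the ids the same way and request the tweet content through the API with their own account.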

The parameter n=10 tells rtweet that we just want the latest 10 tweets. You can get up to the latest 3200 tweets from a user. Twitter limits the number of requests you can make per 15-minute window. The get_timeline() function needs to make one request per 200 tweets, so plan accordingly in case you want to make many requests.
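
To plan ahead, you can estimate how many requests a timeline download will need. This sketch only does the arithmetic with the figures given above (200 tweets per request, at most 3200 tweets per user); timeline_requests() is a hypothetical helper, not an rtweet function.

```r
# Hypothetical helper: number of API requests needed for the latest `n` tweets
# of one user, using the figures from the text.
timeline_requests <- function(n, per_request = 200, max_tweets = 3200) {
  n <- pmin(n, max_tweets)   # the API returns at most the latest 3200 tweets
  ceiling(n / per_request)   # one request per (partial) batch of 200 tweets
}

timeline_requests(10)    # 1
timeline_requests(3200)  # 16
timeline_requests(5000)  # 16, because of the 3200-tweet cap
```

Multiplying the result by the number of users you want to download gives a rough total to compare against the rate limit of your account.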

Tweet content

You can look up the content of individual tweets if you know their numeric ids. This is very useful if you have a dataset with only tweet ids and no other data. Make sure that you treat these ids as strings: they are such long numbers that they can be truncated if you treat them as numbers. Here is an example of the result:

tweet <- lookup_tweets("1344740446069809152")
glimpse(tweet)
## Rows: 1
## Columns: 90
## $ user_id                 <chr> "574364219"
## $ status_id               <chr> "1344740446069809152"
## $ created_at              <dttm> 2020-12-31 20:21:21
## $ screen_name             <chr> "dgarcia_eu"
## $ text                    <chr> "The first is the best one 😂😂\nhttps://t.co/o…
## $ source                  <chr> "Twitter Web App"
## $ display_text_width      <dbl> 52
## $ reply_to_status_id      <lgl> NA
## $ reply_to_user_id        <lgl> NA
## $ reply_to_screen_name    <lgl> NA
## $ is_quote                <lgl> FALSE
## $ is_retweet              <lgl> FALSE
## $ favorite_count          <int> 9
## $ retweet_count           <int> 0
## $ quote_count             <int> NA
## $ reply_count             <int> NA
## $ hashtags                <list> [NA]
## $ symbols                 <list> [NA]
## $ urls_url                <list> ["reddit.com/r/MapPorn/comm…"]
## $ urls_t.co               <list> ["https://t.co/oRZycCvqNx"]
## $ urls_expanded_url       <list> ["https://www.reddit.com/r/MapPorn/comments/…
## $ media_url               <list> [NA]
## $ media_t.co              <list> [NA]
## $ media_expanded_url      <list> [NA]
## $ media_type              <list> [NA]
## $ ext_media_url           <list> [NA]
## $ ext_media_t.co          <list> [NA]
## $ ext_media_expanded_url  <list> [NA]
## $ ext_media_type          <chr> NA
## $ mentions_user_id        <list> [NA]
## $ mentions_screen_name    <list> [NA]
## $ lang                    <chr> "en"
## $ quoted_status_id        <chr> NA
## $ quoted_text             <chr> NA
## $ quoted_created_at       <dttm> NA
## $ quoted_source           <chr> NA
## $ quoted_favorite_count   <int> NA
## $ quoted_retweet_count    <int> NA
## $ quoted_user_id          <chr> NA
## $ quoted_screen_name      <chr> NA
## $ quoted_name             <chr> NA
## $ quoted_followers_count  <int> NA
## $ quoted_friends_count    <int> NA
## $ quoted_statuses_count   <int> NA
## $ quoted_location         <chr> NA
## $ quoted_description      <chr> NA
## $ quoted_verified         <lgl> NA
## $ retweet_status_id       <chr> NA
## $ retweet_text            <chr> NA
## $ retweet_created_at      <dttm> NA
## $ retweet_source          <chr> NA
## $ retweet_favorite_count  <int> NA
## $ retweet_retweet_count   <int> NA
## $ retweet_user_id         <chr> NA
## $ retweet_screen_name     <chr> NA
## $ retweet_name            <chr> NA
## $ retweet_followers_count <int> NA
## $ retweet_friends_count   <int> NA
## $ retweet_statuses_count  <int> NA
## $ retweet_location        <chr> NA
## $ retweet_description     <chr> NA
## $ retweet_verified        <lgl> NA
## $ place_url               <chr> NA
## $ place_name              <chr> NA
## $ place_full_name         <chr> NA
## $ place_type              <chr> NA
## $ country                 <chr> NA
## $ country_code            <chr> NA
## $ geo_coords              <list> [<NA, NA>]
## $ coords_coords           <list> [<NA, NA>]
## $ bbox_coords             <list> [<NA, NA, NA, NA, NA, NA, NA, NA>]
## $ status_url              <chr> "https://twitter.com/dgarcia_eu/status/134474…
## $ name                    <chr> "David Garcia"
## $ location                <chr> "Graz and Vienna, Austria"
## $ description             <chr> "Professor for Computational Behavioral and S…
## $ url                     <chr> "http://t.co/6iP1nxpYeS"
## $ protected               <lgl> FALSE
## $ followers_count         <int> 1610
## $ friends_count           <int> 537
## $ listed_count            <int> 29
## $ statuses_count          <int> 1176
## $ favourites_count        <int> 1121
## $ account_created_at      <dttm> 2012-05-08 10:22:31
## $ verified                <lgl> FALSE
## $ profile_url             <chr> "http://t.co/6iP1nxpYeS"
## $ profile_expanded_url    <chr> "http://www.dgarcia.eu"
## $ account_lang            <lgl> NA
## $ profile_banner_url      <chr> "https://pbs.twimg.com/profile_banners/574364…
## $ profile_background_url  <chr> "http://abs.twimg.com/images/themes/theme1/bg…
## $ profile_image_url       <chr> "http://pbs.twimg.com/profile_images/11457361…

The result has many fields.

There are many other functions in the rtweet package. We will learn more in the Twitter networks tutorial but you can already start looking at them in the documentation of rtweet.
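
The advice above to keep tweet ids as strings can be illustrated with a short sketch. R stores ordinary numbers as doubles, which represent every integer exactly only up to 2^53, and tweet ids are larger than that. The second id below is hypothetical and exists only to show the effect.

```r
id_a <- "1344740446069809152"  # the tweet id used in this tutorial
id_b <- "1344740446069809153"  # hypothetical id, one larger, for illustration only

# As strings, the two ids are different.
id_a == id_b                          # FALSE

# As doubles, both are rounded to the same value and can no longer be told apart.
as.numeric(id_a) == as.numeric(id_b)  # TRUE

# Doubles represent every integer exactly only up to 2^53.
2^53 + 1 == 2^53                      # TRUE
```

When you read ids from a file with read.csv(), setting colClasses = "character" keeps them as strings.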

Appendix: connecting as developer

Using the Twitter API as a developer is more convenient because it allows you to run R code automatically and it provides more stable and easy-to-track permissions.

The rtweet package provides an excellent vignette explaining how to connect like this. You will have to fill out a form for Twitter to give you access as a developer. In this form, they will ask you what you want to do with the API. In the part “tell us how this app will be used”, explain in your own words that this app is for a Social Data Science course (mention that you are enrolled if that is the case) and say that you won’t interact with users in any way. This is important because Twitter wants to prevent developers from creating malicious bots that automatically interact with real users. If you don’t say this or are not specific enough, your application might take a long time to be approved.

Running the following code will take you to the vignette. Follow the steps in the vignette to set up your access token:

vignette("auth", package = "rtweet")

Once you have created the token, it will be saved and can be loaded automatically in future R sessions. Other tutorials and exercises in this course will use this functionality, for example the Twitter network data tutorial. If you run the code in an interactive R session, you can still follow the tutorial even if you don’t have a developer account yet.
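
When you follow the vignette, you will handle secret keys from your developer account. One common practice, which is an assumption here and not something this tutorial prescribes, is to keep those keys in environment variables instead of typing them into scripts. The sketch below reads them with base R; the variable names and the helper read_credentials() are hypothetical.

```r
# Hypothetical environment variable names; use whatever names you set yourself.
credential_vars <- c("TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET",
                     "TWITTER_ACCESS_TOKEN", "TWITTER_ACCESS_SECRET")

# Hypothetical helper: read the credentials and stop early if any is missing.
read_credentials <- function(vars = credential_vars) {
  values <- Sys.getenv(vars, unset = NA)
  if (anyNA(values)) {
    stop("Missing environment variables: ",
         paste(vars[is.na(values)], collapse = ", "))
  }
  values
}

# Usage (commented out; the vignette describes how to turn these values into
# a token, which in the rtweet versions of 2021 was done with create_token()):
# creds <- read_credentials()
```

Check the vignette of your installed rtweet version for the exact function that creates the token, since the package adapts to changes in the Twitter API.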