Collect Twitter user timeline

This post is a static and abbreviated version of this interactive tutorial on using R for social data analytics.

What is the limit for collecting timeline data?

According to this Twitter API document, you can retrieve up to 3,200 of a user's most recent tweets, and the user's retweets count toward that total.
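The 3,200-tweet ceiling also determines how many requests a full timeline takes. Assuming the v1.1 user timeline endpoint returns at most 200 tweets per request (its maximum `count`), the arithmetic can be sketched as follows. `requests_needed()` is a hypothetical helper for illustration, not part of rtweet, which pages through results for you.

```r
# Hypothetical helper: number of requests needed to page through a timeline,
# assuming at most `per_request` tweets per call and a hard history cap of `cap`
requests_needed <- function(n_tweets, per_request = 200, cap = 3200) {
  ceiling(min(n_tweets, cap) / per_request)
}

requests_needed(3200)  # 16
requests_needed(5000)  # 16: tweets beyond the 3,200 cap are unreachable
requests_needed(450)   # 3
```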

Collecting tweets from someone's timeline is as easy as running the code below. We will get the 200 most recent tweets from Elizabeth Warren, a U.S. Senator from Massachusetts (@SenWarren).

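library(rtweet)
library(readr)

# Authenticate with your Twitter app's credentials (fill in the empty strings)
mytoken <- create_token(
  app = "",
  consumer_key = "",
  consumer_secret = "",
  access_token = "",
  access_secret = "")

# Get the 200 most recent tweets from @SenWarren and print them
timeline1 <- get_timelines("SenWarren", n = 200, token = mytoken)
timeline1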

To collect tweets from multiple accounts, pass a vector of screen names to get_timelines():

# Collect the 50 most recent tweets from each of Donald Trump, the GOP, and the Democratic National Committee

timeline2 <- get_timelines(c("realdonaldtrump", "gop", "dnc"), n = 50, token = mytoken)
timeline2
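Because get_timelines() stacks every account's tweets into a single data frame, it can help to check how many tweets each account contributed and to pull out one account at a time. The sketch below uses a small made-up data frame, `timeline2_mock`, in place of a real API result, and assumes only that the result has a `screen_name` column, as rtweet's output does.

```r
# Hypothetical stand-in for timeline2: one row per tweet, with screen_name
# identifying the account that posted it (all values are made up)
timeline2_mock <- data.frame(
  screen_name = c("realDonaldTrump", "realDonaldTrump", "GOP", "DNC", "DNC", "DNC"),
  text = paste("tweet", 1:6),
  stringsAsFactors = FALSE
)

# Number of tweets returned per account
counts <- table(timeline2_mock$screen_name)
counts

# Keep only the rows posted by one account
dnc_tweets <- timeline2_mock[timeline2_mock$screen_name == "DNC", ]
nrow(dnc_tweets)
```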

Remember the data frame trick from the second tutorial? You can split a data frame based on a matching criterion. For example, we can split tweets from @realdonaldtrump into retweets (content retweeted from other accounts) and non-retweets (original content).

dt_timeline <- get_timelines("realdonaldtrump", n = 200, token = mytoken)
# is_retweet is a logical column, so it can be used directly as a row filter
dt_rt <- dt_timeline[dt_timeline$is_retweet, ]      # retweets
dt_nonrt <- dt_timeline[!dt_timeline$is_retweet, ]  # original tweets
dt_nonrt
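Base R's split() performs the same two-way split in one call, returning a list of data frames named by the values of the grouping column. The sketch below uses a made-up `dt_mock` data frame with only `text` and `is_retweet` columns in place of the real `dt_timeline`.

```r
# Hypothetical stand-in for dt_timeline with only the columns used here
dt_mock <- data.frame(
  text = c("original 1", "RT @someone: ...", "original 2", "RT @other: ..."),
  is_retweet = c(FALSE, TRUE, FALSE, TRUE)
)

# split() on a logical column yields list elements named "FALSE" and "TRUE"
parts <- split(dt_mock, dt_mock$is_retweet)
dt_rt_mock <- parts[["TRUE"]]      # retweets
dt_nonrt_mock <- parts[["FALSE"]]  # original tweets

# Share of the timeline that consists of retweets (TRUE counts as 1)
retweet_share <- mean(dt_mock$is_retweet)
retweet_share
```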

Here is a challenge for you: can you extract Trump's original tweets (non-retweets) that mention other Twitter users? Hint: the mentioned screen names are stored in the mentions_screen_name column, which is NA for tweets that mention no one. One solution is:

# Keep non-retweets whose mentions_screen_name entry is not NA
mentions <- dt_timeline[!dt_timeline$is_retweet & !is.na(dt_timeline$mentions_screen_name), ]
mentions
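Once the mentioning tweets are isolated, a natural next step is to count which accounts are mentioned most often. The sketch below assumes, as in rtweet's output, that `mentions_screen_name` is a list column holding a character vector of screen names per tweet, or NA when there are none; `dt_mock` and its values are made up for illustration.

```r
# Hypothetical stand-in for dt_timeline; mentions_screen_name is a list column
dt_mock <- data.frame(text = c("a", "b", "c", "d"))
dt_mock$is_retweet <- c(FALSE, FALSE, TRUE, FALSE)
dt_mock$mentions_screen_name <- list(c("GOP", "FoxNews"), NA, "DNC", "GOP")

# is.na() on a list is TRUE only for elements that are a single NA
mentions_mock <- dt_mock[!dt_mock$is_retweet & !is.na(dt_mock$mentions_screen_name), ]

# Flatten the list column and count how often each account is mentioned
mention_counts <- table(unlist(mentions_mock$mentions_screen_name))
sort(mention_counts, decreasing = TRUE)
```

Here the retweet (row 3) and the tweet with no mentions (row 2) are dropped, so "DNC" does not appear in the counts.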
Weiai Wayne Xu
Associate Professor in Computational Communication
