This post is a static and abbreviated version of this interactive tutorial on using R for social data analytics.
What is the rate limit for collecting timeline data?
According to this Twitter API document, you can retrieve up to 3,200 of a user's most recent tweets, and retweets count toward that total.
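To see what the 3,200-tweet cap means in terms of API calls, here is a small arithmetic sketch. It assumes the v1.1 user_timeline endpoint's limit of 200 tweets per request. `requests_needed()` is a hypothetical helper written for illustration, not an rtweet function.

```r
# Hypothetical helper (not part of rtweet): estimate how many
# user_timeline requests a timeline pull needs, assuming at most
# 200 tweets per request and at most 3,200 tweets per user overall.
requests_needed <- function(n_tweets, per_request = 200, cap = 3200) {
  n <- min(n_tweets, cap)   # the API never returns more than `cap` tweets
  ceiling(n / per_request)  # each request returns at most `per_request` tweets
}

requests_needed(200)   # 1
requests_needed(3200)  # 16
requests_needed(5000)  # 16, since tweets beyond the cap are unavailable
```

Asking for more than 3,200 tweets does not cost extra requests under these assumptions; it simply cannot return more.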
Scraping tweets from someone's timeline is as easy as running the code below. We will get the 200 most recent tweets from Elizabeth Warren, a U.S. Senator from Massachusetts (@SenWarren).
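library(rtweet)
library(readr)
# Authenticate with your own Twitter app credentials (fill in the empty strings)
mytoken <- create_token(
  app = "",
  consumer_key = "",
  consumer_secret = "",
  access_token = "",
  access_secret = "")
# Fetch the 200 most recent tweets from @SenWarren's timeline
timeline1 <- get_timelines("SenWarren", n = 200, token = mytoken)
timeline1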
We can also collect timeline tweets from multiple accounts in a single call:
# Collect up to 50 recent tweets from each of Donald Trump, the GOP, and the Democratic Party (n applies per account)
timeline2 <- get_timelines(c("realdonaldtrump", "gop", "dnc"), n = 50, token = mytoken)
timeline2
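Because get_timelines() stacks all accounts into one data frame, the screen_name column tells you which account each row came from, and table() counts rows per account. The sketch below uses a small made-up data frame (toy_timeline) in place of real API output so it runs without credentials; with real data you would tabulate timeline2$screen_name instead.

```r
# Toy stand-in for get_timelines() output: only the screen_name
# column is mimicked, and the values are invented for illustration.
toy_timeline <- data.frame(
  screen_name = c("realdonaldtrump", "realdonaldtrump", "gop", "dnc", "dnc", "dnc"),
  stringsAsFactors = FALSE
)

# One row per tweet, so tabulating screen_name gives tweets per account
tweet_counts <- table(toy_timeline$screen_name)
tweet_counts
```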
Remember the data frame subsetting trick from the second tutorial? You can split a data frame into parts based on a matching condition. For example, we can split tweets from @realdonaldtrump into retweets (content reposted from other accounts) and non-retweets (original content).
# Fetch 200 recent tweets, then subset rows using the logical is_retweet column
dt_timeline <- get_timelines("realdonaldtrump", n = 200, token = mytoken)
dt_rt <- dt_timeline[dt_timeline$is_retweet, ]      # retweets
dt_nonrt <- dt_timeline[!dt_timeline$is_retweet, ]  # original tweets
dt_nonrt
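Base R's split() can produce both subsets in one call, returning a list of data frames keyed by the values of is_retweet. This is an alternative to the two bracket subsets above, shown on a made-up data frame (toy) that has only the columns it needs; it is not necessarily the approach from the second tutorial.

```r
# Toy stand-in for a timeline: only the two columns used below are
# mimicked, and the values are invented for illustration.
toy <- data.frame(
  text = c("RT @someone: hello", "Original tweet", "Another original"),
  is_retweet = c(TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# split() returns a list of data frames, one per value of is_retweet;
# the list names are the factor levels "FALSE" and "TRUE"
parts <- split(toy, toy$is_retweet)
toy_rt    <- parts[["TRUE"]]
toy_nonrt <- parts[["FALSE"]]

nrow(toy_rt)     # 1
nrow(toy_nonrt)  # 2
```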
Here is a challenge for you: can you extract Trump's original tweets (non-retweets) that mention other Twitter users? Hint: the mentioned screen names are stored in the mentions_screen_name column, which is NA for tweets without mentions. Add the following to your code.
# Keep original tweets (not retweets) whose mentions_screen_name is not NA
mentions <- dt_timeline[!dt_timeline$is_retweet & !is.na(dt_timeline$mentions_screen_name), ]
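As a possible next step, you could count which accounts are mentioned most often. In rtweet's output, mentions_screen_name is a list column because one tweet can mention several users, so unlist() flattens it before table() counts it. The toy data below is invented and mimics only the two columns involved; with real data you would start from mentions$mentions_screen_name.

```r
# Toy stand-in for rtweet output: mentions_screen_name is a list column
# holding NA when a tweet has no mentions. Values are invented.
toy <- data.frame(is_retweet = c(FALSE, FALSE, TRUE, FALSE))
toy$mentions_screen_name <- list(c("alice", "bob"), NA, "carol", "alice")

# Original tweets that mention at least one user (same filter as above)
toy_mentions <- toy[!toy$is_retweet & !is.na(toy$mentions_screen_name), ]

# Flatten the list column and count how often each account is mentioned
mention_counts <- sort(table(unlist(toy_mentions$mentions_screen_name)),
                       decreasing = TRUE)
mention_counts
```

Here the retweet mentioning "carol" and the tweet with no mentions are both filtered out before counting.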