Collect tweets by keywords/hashtags

This post is a static and abbreviated version of this interactive tutorial on using R for social data analytics.

What Twitter Data are Available?

From the previous post (https://curiositybits.cc/post/r_analytics2/), you have learned that to collect data from the Twitter API, you must first obtain permission. You have probably also noticed that you cannot collect as many tweets as you would like, because Twitter imposes a rate limit on each API call. As you play around with this tutorial, ask yourself: Should Twitter make more data available to the public? Or has Twitter already revealed too much data to developers?

Which R library will we be using?

The most essential library we will use is rtweet (https://rtweet.info), developed by Michael W. Kearney, a professor of journalism at the University of Missouri. In previous tutorials, I used the twitteR library. twitteR came out earlier and is thus more widely used, but in terms of ease of use and functionality, I find rtweet the better choice.

Search API vs. Stream API

Start with the Twitter Search API. Find 50 tweets (excluding retweets) that contain #breakingnews. We will put the tweets into a data frame called tweets1.

library(rtweet)
library(readr)

mytoken <- create_token(
  app = "", 
  consumer_key = "", 
  consumer_secret = "", 
  access_token = "", 
  access_secret = "") 

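# include_rts = FALSE drops retweets, matching the "non-retweets" goal above
tweets1 <- search_tweets("#breakingnews", n = 50, include_rts = FALSE, token = mytoken)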
tweets1

Twitter rate limits cap the number of search results at 18,000 every 15 minutes. To request more than that, set _retryonratelimit = TRUE_ and rtweet will wait for the rate limit to reset for you. See the example below.

tweets1 <- search_tweets("#breakingnews", n = 18000, token=mytoken, retryonratelimit = TRUE)
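
To get a feel for what the 18,000-per-15-minutes cap means for larger collections, here is a rough back-of-the-envelope sketch. The helper estimate_wait_minutes() is hypothetical (it is not part of rtweet), and it assumes every rate-limit window is filled completely, so treat the result as an approximation rather than a guarantee.

```r
# Rough estimate only: assumes the 18,000-tweets-per-15-minutes cap quoted above
# and that each rate-limit window is used up completely.
# estimate_wait_minutes() is a hypothetical helper, not an rtweet function.
estimate_wait_minutes <- function(n, cap = 18000, window_minutes = 15) {
  windows_needed <- ceiling(n / cap)            # rate-limit windows required
  max(windows_needed - 1, 0) * window_minutes   # waiting happens between windows
}

estimate_wait_minutes(18000)  # 0: fits in a single window
estimate_wait_minutes(50000)  # 30: three windows, so two resets to wait for
```

In other words, a request for 50,000 tweets with retryonratelimit = TRUE would spend roughly half an hour just waiting for resets, on top of the download time itself.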

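Next, we will collect tweets from the Twitter Stream API. Notice how the Stream API differs from the Search API: the Search API returns tweets that have already been posted, while the Stream API collects tweets as they are posted. The returned tweets will be put in a data frame named tweets2.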

library(rtweet)
library(readr)

mytoken <- create_token(
  app = "", 
  consumer_key = "", 
  consumer_secret = "", 
  access_token = "", 
  access_secret = "") 

tweets2 <- stream_tweets("", timeout = 10, token=mytoken)
tweets2

By setting timeout = 10, we ask rtweet to keep streaming tweets for 10 seconds. You can set a higher value on your machine. By leaving the query empty, we ask rtweet to randomly sample (approximately 1%) from the live stream of all tweets.
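
Since the goal of this post is to collect tweets by keywords or hashtags, the stream can also be filtered instead of sampled at random. The sketch below assumes the rtweet version used in this post, where stream_tweets() accepts a comma-separated string of keywords as its query. The helpers build_track_query() and stream_by_keywords(), and the data frame name tweets3, are hypothetical names introduced for illustration.

```r
# Hypothetical helper: join keywords into the comma-separated string
# that stream_tweets() expects for keyword filtering.
build_track_query <- function(keywords) {
  paste(keywords, collapse = ",")
}

# Hypothetical wrapper: stream tweets matching any of the keywords
# for the given number of seconds.
stream_by_keywords <- function(keywords, seconds = 30, token = NULL) {
  rtweet::stream_tweets(
    q = build_track_query(keywords),
    timeout = seconds,
    token = token
  )
}

# Usage (requires a valid token such as mytoken, so it is left commented out):
# tweets3 <- stream_by_keywords(c("#breakingnews", "#news"), seconds = 30, token = mytoken)
```

How many tweets come back depends on how actively the keywords are being used during the streaming window, so a quiet hashtag may return very few rows.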

Weiai Wayne Xu
Associate Professor in Computational Communication
