Collect YouTube Data

Tufekci (2014) wrote that “Twitter has become to social media scholars what the fruit fly is to biologists—a model organism.” But Twitter is only one of many web platforms worth studying. Arguably, Facebook offers far richer insights than Twitter, given its larger user base and higher penetration rate around the world. Unfortunately, Facebook has shut down much of its API, making our previous tutorials on Facebook-based data mining obsolete.

We can take comfort in the fact that Google’s API remains largely open. Using Google’s API, we can collect metadata from YouTube (e.g., video statistics and comments).

What are the required libraries for this task? We need a library called tuber. It is not one you can install simply by running install.packages(). You first need to install a library called devtools and then run the following code to install the development version of tuber from GitHub.

devtools::install_github("soodoku/tuber", build_vignettes = TRUE)
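After installing, load tuber so its functions are available in your session:

#load the package
library(tuber)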

What should I know about YouTube’s API? Unlike Twitter, which requires access through a vetted developer account, Google’s API (note: YouTube is owned by Google) is open to all Google users. You can review the setup steps in the later part of the slides.

You can set up Google API access using your own Google account.

#connect to YouTube's API
yt_oauth("enter Client ID here", "enter Client secret here", token = '')

#get video stats
videostats <- get_stats(video_id = "0JMkzakXgIY")
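get_stats() returns the video’s count fields as a list. As a sketch of inspecting it (field names such as viewCount and commentCount follow the YouTube Data API, and the counts typically come back as character strings, so convert before doing arithmetic):

#inspect selected fields; viewCount and commentCount are field names assumed from the API
videostats$viewCount
as.numeric(videostats$commentCount)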

#get information about a video
videodetails <- get_video_details(video_id = "0JMkzakXgIY")

#search videos
video_search <- yt_search("Nick Sandmann")
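yt_search() returns its results as a data frame, one row per match. A quick way to preview it (the title column name mirrors the API’s snippet fields and is an assumption here):

#preview the search results
nrow(video_search)
head(video_search$title)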

#get all the comments, including replies
comments <- get_all_comments(video_id = "0JMkzakXgIY")
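get_all_comments() also returns a data frame, one row per comment or reply. A minimal sketch of inspecting it (column names such as textDisplay and authorDisplayName follow the API’s commentThreads resource and may differ across tuber versions):

#count the comments and preview their text
nrow(comments)
head(comments$textDisplay)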

#get captions of a video (still under testing)
captions <- list_caption_tracks(part = "snippet", video_id = "q7Eb4KVw4nE")
get_captions(id = "lvnNItHaLK1QZwMrO67nEelmK37ml7Fh")


Tufekci, Z. (2014). Big questions for social media big data: Representativeness, validity and other methodological pitfalls. arXiv:1403.7400 [cs.SI].

Weiai Wayne Xu
Associate Professor in Computational Communication