R

I tried to turn a Raspberry Pi desktop into a productivity powerhouse

This post also appears on Medium. There are eleven computing devices in my home office: two work laptops, three spare Linux laptops, a Raspberry Pi desktop, a smartphone, two iPads (one old, one new), a Chromebook, and a Kindle. They are all vying for the home WiFi and for my attention. As a tech aficionado, I work and live among many screens, browser tabs, and cloud servers. Digital distraction is real, and it feels like enslavement.

How we built an election tracker

I am currently working with colleagues at UMass and the Australian National University to build a social media tracker for the upcoming 2019 Philippine General Election. The Shiny app is now in beta testing; you can access it from the links below.

Dashboard showing how candidates use Twitter
Dashboard showing Twitter conversation networks

Text mining: Topic models

This post is a static and abbreviated version of this interactive tutorial on using R for social data analytics. What is a topic model? Have you dreamed of a day when algorithms can quickly scan through your textbooks and give you a bullet-point summary? How convenient! No more tedious reading! In fact, there are algorithms that automatically summarize large-scale corpora. They are called topic models. In building a topic model, we essentially ask the computer to discover a set of abstract topics from the text.
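As a minimal sketch of the idea (not the tutorial's exact code), the snippet below fits an LDA topic model on a quanteda document-feature matrix; review_corpus, the number of topics k = 5, and the seed are placeholder assumptions.

library(quanteda)
library(topicmodels)

# build a document-feature matrix from a hypothetical corpus
reviews_dfm <- dfm(tokens(review_corpus, remove_punct = TRUE))
reviews_dfm <- dfm_remove(reviews_dfm, stopwords("en"))

# convert to the format topicmodels expects and fit a 5-topic LDA model
reviews_dtm <- convert(reviews_dfm, to = "topicmodels")
lda_model <- LDA(reviews_dtm, k = 5, control = list(seed = 1234))
terms(lda_model, 10)  # top 10 terms for each discovered topic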

Text mining: Semantic network

This post is a static and abbreviated version of this interactive tutorial on using R for social data analytics. To understand what a semantic network looks like, go ahead and run the code below.

library(quanteda)
library(ggplot2)

reviews_tok <- tokens(review_corpus, remove_punct = TRUE, remove_numbers = TRUE,
                      remove_symbols = TRUE, remove_twitter = TRUE, remove_url = TRUE)
reviews_tok <- tokens_select(reviews_tok, pattern = stopwords('en'), selection = 'remove')
reviews_tok <- tokens_select(reviews_tok, min_nchar = 3, selection = 'keep')
reviews_dfm <- dfm(reviews_tok)

# create a feature co-occurrence matrix (FCM)
review_fcm <- fcm(reviews_dfm)

# extract the top 50 most frequent terms from the FCM object
feat <- names(topfeatures(review_fcm, 50))

# trim the old FCM object into one that contains only the 50 frequent terms
fcm_select <- fcm_select(review_fcm, pattern = feat)
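As a hedged sketch of how an FCM like this is typically turned into a semantic network plot (not necessarily the tutorial's exact code), quanteda's companion plotting package can draw the co-occurrence graph directly:

library(quanteda.textplots)  # textplot_network() lives here in recent quanteda releases

set.seed(100)  # make the network layout reproducible
textplot_network(fcm_select, min_freq = 0.8)  # draw edges only for stronger co-occurrences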

Text mining: Discover insights

This post is a static and abbreviated version of this interactive tutorial on using R for social data analytics. Now you are ready to try some basic text mining techniques to extract insights from textual data. In this tutorial, we will try four techniques: simple word frequency, word clouds, n-grams, and keyness. Simple word frequency: suppose we want to see how often the word “noisy” appears in Airbnb reviews from each of the three cities.
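As a minimal illustration of that first technique (with placeholder object names rather than the tutorial's exact code), you could count “noisy” per city from a document-feature matrix whose documents carry a city docvar:

library(quanteda)
library(quanteda.textstats)

# term frequencies broken down by the (assumed) "city" document variable
freq_by_city <- textstat_frequency(reviews_dfm, groups = city)
subset(freq_by_city, feature == "noisy")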

Clean messy text

This post is a static and abbreviated version of this interactive tutorial on using R for social data analytics. Why text cleaning? Textual data are always messy. The data may contain words that, if taken out of context, would be meaningless. You may also encounter a group of different words that convey the same meaning. Or you might have to convert slang and acronyms into standard English, or emojis into something a computer can recognize.
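A minimal cleaning sketch in quanteda might look like the following; raw_text and the slang mappings are placeholder assumptions, not the tutorial's actual data:

library(quanteda)

# tokenize a hypothetical character vector of raw posts, stripping noise
toks <- tokens(char_tolower(raw_text), remove_punct = TRUE,
               remove_numbers = TRUE, remove_symbols = TRUE, remove_url = TRUE)

# drop English stopwords
toks <- tokens_remove(toks, stopwords("en"))

# map a few example slang terms/acronyms onto standard English equivalents
toks <- tokens_replace(toks,
                       pattern = c("lol", "omg"),
                       replacement = c("laughing", "surprise"))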

From corpus to document-feature matrix

This post is a static and abbreviated version of this interactive tutorial on using R for social data analytics. There is a lot of interest in quantifying and visualizing textual data. Texts reveal our thoughts, our personality, and the pulse of a society. We broadly refer to the quantification of text as text mining. Thanks to developments in natural language processing and information retrieval, we now have a wide selection of easy-to-use R libraries for cleaning, transforming, quantifying, and visualizing text.
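As a minimal sketch of that pipeline (object names are placeholder assumptions, not the tutorial's exact code), quanteda takes you from raw text to a document-feature matrix in a few lines:

library(quanteda)

# build a corpus from a hypothetical data frame `reviews` with a `text` column
review_corpus <- corpus(reviews, text_field = "text")

# tokenize, drop punctuation, and tabulate tokens into a document-feature matrix
reviews_dfm <- dfm(tokens(review_corpus, remove_punct = TRUE))

topfeatures(reviews_dfm, 10)  # the 10 most frequent features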

Sentiment analysis

This post is a static and abbreviated version of this interactive tutorial on using R for social data analytics. During the 2012 US presidential election, Twitter, in partnership with several polling agencies, launched the Twitter Political Index. The idea was to track candidates’ popularity among voters based on the sentiment expressed in tweets. Back then, such an idea was a novelty. Nowadays, sentiment analysis of social media text is widely applied in marketing/PR, electoral forecasting, and sports analytics.
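As a hedged sketch of one common approach, dictionary-based scoring (with placeholder object names rather than the tutorial's exact code), quanteda ships the Lexicoder sentiment dictionary and can count positive and negative words per document:

library(quanteda)

# look up positive/negative words from the Lexicoder 2015 dictionary in each tweet
toks <- tokens(tweet_corpus, remove_punct = TRUE)  # tweet_corpus is a placeholder corpus
sent_toks <- tokens_lookup(toks, dictionary = data_dictionary_LSD2015[c("negative", "positive")])

# tally the sentiment counts into a small table, one row per document
convert(dfm(sent_toks), to = "data.frame")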

Visualizing virality

This post is a static and abbreviated version of this interactive tutorial on using R for social data analytics. We often wonder which users and what kinds of tweets go viral. In the divided United States of America, a question that may interest many of you is: which political party’s messages attract more attention and positive responses from the public? In the following example, we will analyze 3,197 tweets from @GOP and 2,337 tweets from @TheDemocrats since July 2017.
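One simple way to make that comparison, sketched here with an assumed data frame named tweets (columns party and retweet_count) rather than the tutorial's actual objects, is to compare the retweet distributions of the two accounts:

library(ggplot2)

# compare how far each party's tweets spread, using retweets as a proxy for virality
ggplot(tweets, aes(x = party, y = retweet_count + 1)) +  # add 1 so zero-retweet tweets survive the log scale
  geom_boxplot() +
  scale_y_log10() +   # retweet counts are heavily right-skewed
  labs(x = NULL, y = "Retweets (log scale)")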

Make Wordclouds

This post is a static and abbreviated version of this interactive tutorial on using R for social data analytics. Wordclouds are perhaps the most basic way of representing text data. You can simply use wordclouds to reveal important topics in a large body of tweets or to get a sense of user demographics based on keywords used in Twitter bio pages. Do I need new libraries? Yes, we will use quanteda for creating wordclouds.
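As a minimal sketch (placeholder object names; in recent quanteda releases the plotting function lives in the companion package quanteda.textplots), a wordcloud can be drawn straight from a document-feature matrix:

library(quanteda)
library(quanteda.textplots)

# build a dfm from a hypothetical corpus of tweets and strip stopwords
tweets_dfm <- dfm(tokens(tweet_corpus, remove_punct = TRUE))
tweets_dfm <- dfm_remove(tweets_dfm, stopwords("en"))

# plot the 100 most frequent words, scaled by frequency
textplot_wordcloud(tweets_dfm, max_words = 100)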