General Information

In this workshop you will learn several ways to collect Twitter data for your own research. The second main focus is how to analyze the data descriptively and how to conduct a network analysis with it.

We mainly use “ready-to-use” functions from existing packages, but if you conduct a research project later on, you will probably have to write customized functions to reach your goals. This may seem tough if you are new to R, but it’s the aspect that actually makes R unique compared to other statistical programs.

For some code chunks I’ve decided to use the opinionated tidyverse, a collection of R packages designed for data science. The packages share a common philosophy and work smoothly together. This approach includes the coding practice of piping, which makes R code easier to read. Some R practitioners consider the introduction of piping the most important innovation of recent years. I will point out the first few places where I use this coding principle in the code chunks. But don’t worry, you don’t need to use the pipe operators yourself to complete this workshop successfully.

The overall subject of the workshop is the Bundestagswahl 2017 (the 2017 German federal election), which I use to demonstrate the opportunities on a political science topic. I recommend that you go through the instructions in order, as some sections build on knowledge you acquire in previous sections. My second piece of advice is to try to understand the logic behind my code chunks; you don’t necessarily have to replicate them all, and in a few cases it won’t be possible. To check your knowledge, each section finishes with a small exercise.

Happy coding!

Packages

Make sure you have the latest versions of these packages installed and loaded:

# installs packages
install.packages(c("rtweet", "tidyverse", "ggplot2", "tm", "igraph", "data.table", 
    "stringr"), repos = "http://cran.us.r-project.org")

# loads packages
library("rtweet")
library("tidyverse")
library("ggplot2")
library("tm")
library("igraph")
library("data.table")
library("stringr")
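
If you are unsure which of these packages are already on your machine, a small check like the following sketch can help. The helper `missing_packages()` is my own illustrative name, not a function from any of the packages above; it only reports what is missing and does not install anything.

```r
# Hypothetical helper: returns the names of packages that cannot be found.
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE if the package is installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

workshop_pkgs <- c("rtweet", "tidyverse", "ggplot2", "tm", "igraph",
                   "data.table", "stringr")

# prints the packages you still have to install (if any)
print(missing_packages(workshop_pkgs))
```

You could then pass the returned names to install.packages() instead of reinstalling everything.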

Piping

Traditional R coding forces you to either wrap functions inside other functions or assign a lot of intermediate variables. Piping allows you to forward the output of one function to the next one, so you can read the sequence from left to right, which improves readability. This functionality comes with the package magrittr, which is part of the tidyverse.

Let me show you two code chunks that do the same thing. Decide for yourself which one you consider more intuitive and more readable.

Traditional approach:

# you have to read from the center outwards to understand what's going on
mean(mtcars[which(mtcars$mpg > 15), "mpg"])
## [1] 21.72308
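
The other traditional option, assigning intermediate variables, might look like the following sketch. It uses base R only, and the variable names are my own choice for illustration.

```r
# step 1: find the rows with an mpg value greater than 15
efficient_rows <- which(mtcars$mpg > 15)

# step 2: extract the mpg values of those rows
efficient_mpg <- mtcars[efficient_rows, "mpg"]

# step 3: compute the mean
mean_mpg <- mean(efficient_mpg)
mean_mpg
```

This reads top to bottom, but it leaves three extra objects in your workspace for a single result.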

Pipe approach:

# input data
mtcars %>%
  # keep each observation that has a value greater than 15 for mpg
  filter(mpg > 15) %>%
  # select the column mpg
  select(mpg) %>%
  # compute the mean of the remaining column
  colMeans
##      mpg 
## 21.72308

While the pipe operator %>% is the most important one, other operators exist for different uses as well. Consult the vignette and the documentation of the magrittr package for more information. I sometimes use . within a function. The dot explicitly states where the output of the previous function should be placed, which is necessary in a few cases. In other words:

# the following code snippets are the same
mtcars %>% filter(mpg > 15)
# and
mtcars %>% filter(., mpg > 15)
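
A case where the dot is actually needed is a function whose first argument is not the data. The following sketch uses lm() from base R as an example of my own choosing (it is not used elsewhere in this section): its first argument is a formula, so the data has to be passed explicitly with the dot.

```r
library(magrittr)

# without the dot, mtcars would be inserted as the first argument (the formula)
fit_piped <- mtcars %>% lm(mpg ~ wt, data = .)

# the same model without piping, for comparison
fit_plain <- lm(mpg ~ wt, data = mtcars)

# both approaches yield identical coefficients
all.equal(coef(fit_piped), coef(fit_plain))
```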

Data Collection

Twitter Access

  1. To retrieve data from Twitter you first have to register an account on twitter.com.

  2. You need to create an application at apps.twitter.com. The name, description, and website specifications do not matter, as you are not creating an actual application for other Twitter users. Fill in these fields however you like.

  3. You have to specify http://127.0.0.1:1410 as your callback URL to ensure that your access works smoothly with the R packages.