Computational Analysis of Digital Communication (S_C21C)
Seller: Vustudentt
Basics
<- is the assignment operator; for assignment it does the same as =
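A minimal sketch of assignment (the variable names are only illustrative): both lines store the value 5 under a name. `<-` is the conventional style in tidyverse code, while `=` is mostly used to name function arguments.

```r
# Both operators assign a value to a name
x <- 5
y = 5

# The two variables now hold the same value
identical(x, y)   # [1] TRUE
```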
Basic data types in R
Numeric – Numbers
Character – text
Factor – categorical data
Logical – true or false
A number can be stored as character (text), but text cannot be converted to a number
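The conversion rule above can be checked with the base R functions as.character() and as.numeric(). This is a small sketch; the values are arbitrary.

```r
# A number can be stored as text (character)...
num_text <- as.character(42)
num_text          # [1] "42"
class(num_text)   # [1] "character"

# ...and converted back to a number
as.numeric(num_text)   # [1] 42

# Text that is not a number cannot be converted: R returns NA.
# Without suppressWarnings(), R also warns "NAs introduced by coercion".
suppressWarnings(as.numeric("hello"))   # [1] NA
```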
Vector – sequence of one or more values of the same data type (= a variable)
- Can hold any type of data, but all values in a single vector share the same type
Data frame is a collection of vectors with the same length, tied together as columns
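A short sketch of the four basic data types, vectors and a data frame in base R; the names and values are made up for illustration.

```r
# Vectors: all elements of one vector share the same data type
age     <- c(21, 34, 29)                 # numeric
person  <- c("Anna", "Bas", "Carla")     # character
student <- c(TRUE, FALSE, TRUE)          # logical
country <- factor(c("NL", "BE", "NL"))   # factor: categorical, levels "BE" and "NL"

# Mixing types in c() converts everything to one common type (here: character)
mixed <- c(1, "two", TRUE)
class(mixed)   # [1] "character"

# A data frame ties vectors of the same length together as columns
people <- data.frame(person, age, student, country)
str(people)
```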
A function has the form: output <- function_name(argument1, argument2, ...)
- function_name is a name to indicate which function you want to use. It is followed by parentheses.
- arguments are the input of the function, and are inserted within the parentheses. Arguments can
be any R object, such as numbers, strings, vectors and data.frames. Multiple arguments can be
given, separated by commas.
- output is anything that is returned by the function, such as vectors, data.frames or the results of a
statistical analysis. Some functions do not have output, but produce a visualization or write data to
disk.
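The output <- function_name(argument1, argument2, ...) pattern, sketched with two base R functions (mean() and round()); the input numbers are arbitrary.

```r
heights <- c(176, 165, 172)

# One argument: mean() returns a single number as output
avg <- mean(heights)
avg       # [1] 171

# Two arguments, separated by a comma; the second one is named
rounded <- round(3.14159, digits = 2)
rounded   # [1] 3.14
```

A function such as hist(heights) would instead draw a histogram rather than return a value you need to store.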
The purpose of a function is to make it easy to perform a (large) set of (complex) operations. This is crucial,
because
- It makes code easier to understand. You don’t need to see the operations, just the name of the
function that performs them
- You don’t need to understand the operations, just how to use the function (its name and arguments)
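To illustrate how a function hides a set of operations behind a name, here is a sketch that defines one. `standardize` is a hypothetical helper written for this example (base R's scale() performs a similar calculation).

```r
# Hypothetical helper: turn values into z-scores
standardize <- function(x) {
  centered <- x - mean(x)   # subtract the mean
  centered / sd(x)          # divide by the standard deviation; the last value is the output
}

# The user only needs the name and the argument, not the formula
z <- standardize(c(176, 165, 172))
round(z, 2)
```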
Week 1 - Practical session 1
R Tidyverse - Data transformation & summarization
Introduction
The goal of this practical session is to get you acquainted with the Tidyverse and to learn how to transform
and summarize data. Tidyverse is a collection of packages that have been designed around a singular and
clearly defined set of principles about what data should look like and how we should work with it. It comes
with a nice introduction in the R for Data Science book, for which the digital version is available for free.
This tutorial deals with most of the material in chapter 5 of that book.
In this part of the tutorial, we’ll focus on working with data using the tidyverse package. This package
includes the dplyr (data-pliers) package, which contains most of the tools we’re using below, but it also
contains functions for reading, analyzing and visualizing data that will be explained later.
Installing tidyverse
As before, install.packages() is used to download and install the package (you only need to do this once on
your computer) and library() is used to make the functions from this package available for use (required
each session that you use the package).
install.packages("tidyverse")
library(tidyverse)
Tidyverse basics
As in most packages, the functionality in dplyr is offered through functions. In general, a function can be
seen as a command or instruction to the computer to do something and (generally) return the result. In
the tidyverse package dplyr, almost all functions primarily operate on data sets, for example for filtering and
sorting data.
By a data set we mean a rectangular data frame consisting of rows (often items or respondents) and
columns (often measurements of or data about these items). These data sets can be R data.frames, but
tidyverse has its own version of data frames called tibble, which is functionally (almost) equivalent to a
data frame but is more efficient and somewhat easier to use.
As a very simple example, the following code creates a tibble containing respondents, their gender, and
their height:
data <- tibble(resp = c(1, 2, 3),
               gender = c("M", "F", "F"),
               height = c(176, 165, 172))
data
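A tibble is the more recent and more powerful version of a data frame. When printed, it shows the data type
of each column below the column name, for example:
- <dbl> (double): numeric values
- <chr> (character): text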
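A sketch comparing a tibble with a base R data.frame, reusing the example data from above. as_tibble() converts an existing data frame; class() confirms how each column is stored.

```r
library(tidyverse)

data <- tibble(resp = c(1, 2, 3),
               gender = c("M", "F", "F"),
               height = c(176, 165, 172))

# Printing shows <dbl> and <chr> below the column names
data

# Converting a base R data.frame into a tibble
df <- data.frame(resp = c(1, 2, 3), gender = c("M", "F", "F"))
as_tibble(df)

# Check how individual columns are stored
class(data$height)   # [1] "numeric" (a double)
class(data$gender)   # [1] "character"
```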
Reading data
The example above manually created a data set, but in most cases you will start with data that you get
from elsewhere, such as a csv file (e.g. downloaded from an online dataset or exported from Excel) or an
SPSS or Stata data file.
Tidyverse contains a function read_csv that allows you to read a csv file directly into a tibble. You specify
the location of the file, either on your local drive (as we did in the last practical session) or directly from the
Internet!
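Reading a file from your local drive works the same way as reading from a URL: you pass the file location to read_csv(). In this sketch a temporary file created with write_csv() stands in for a real file on your drive; it assumes readr 2.0 or later, where show_col_types = FALSE hides the column-parsing message.

```r
library(tidyverse)

# Write a small tibble to a temporary csv file (stand-in for a file on your drive)
path <- tempfile(fileext = ".csv")
write_csv(tibble(resp = 1:3, height = c(176, 165, 172)), path)

# read_csv() takes the file location and returns a tibble
heights <- read_csv(path, show_col_types = FALSE)
heights
```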
The example below downloads an overview of gun polls from the data analytics site 538, and reads it into a
tibble using the read_csv function:
url <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/poll-quiz-guns/guns-polls.csv"
d <- read_csv(url)
d
(Note that you can safely ignore the (red) messages; they simply tell you how each column was parsed.)
The output shows the first ten rows of the data set; columns that don’t fit on the screen are not printed. The
number of remaining rows and the names of the remaining columns are listed at the bottom. For each column
the data type is also mentioned (<int> stands for integer and <dbl> for double, both numeric values; <chr> is
textual or character data). If you want to browse through your data, you can also click on the name of the
data frame (d) in the “Environment” tab of the top-right window, or run View(d).
Subsetting with filter()
The filter function can be used to select a subset of rows. In the guns data, the Question column specifies
which question was asked. We can select only those rows (polls) that asked whether the minimum
purchase age for guns should be raised to 21:
age21 <- filter(d, Question == 'age-21')
age21
Question == 'age-21' is an expression that is either TRUE or FALSE for each row; filter() keeps the rows where it is TRUE.
This call is typical for a tidyverse function: the first argument is the data to be used (d), and the remaining
argument(s) contain information on what should be done to the data.
Note the use of == for comparison: in R, = (like <-) means assignment and == means equals. Other comparisons
are e.g. > (greater than), <= (less than or equal to) and != (not equal to). You can also combine multiple
conditions with logical (boolean) operators: & (and), | (or) and ! (not), and you can use parentheses like in
mathematics.
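A sketch of the logical operators inside filter(). To keep it runnable offline, it uses a made-up toy tibble with the same column names as the guns polls; the values are invented and are not real poll results.

```r
library(tidyverse)

# Made-up toy data (NOT real poll results)
polls <- tibble(Question = c("age-21", "age-21", "other", "age-21"),
                Support  = c(82, 75, 41, 90))

# & (and): both conditions must be TRUE (keeps 2 rows)
filter(polls, Question == "age-21" & Support >= 80)

# | (or): at least one condition must be TRUE (keeps 2 rows)
filter(polls, Question == "other" | Support > 85)

# ! (not) with parentheses: drops age-21 polls below 80 (keeps 3 rows)
filter(polls, !(Question == "age-21" & Support < 80))
```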
So, we can find all surveys where support for raising the gun age was at least 80%:
filter(d, Question == 'age-21' & Support >= 80)
(Support >= 80 means a support of 80 or larger.)
Note that this command did not assign the result to an object, so the result is only displayed on the screen
but not remembered. This can be a great way to quickly inspect your data, but if you want to continue
analysing this subset you need to assign it to an object as above.
Selecting certain columns
Where filter selects specific rows, select allows you to select specific columns. Most simply, we can name the
columns that we want to retrieve, and they are returned in that particular order.
### Select specific columns
select(age21, Population, Support, Pollster)
You can also use some more versatile functions such as contains() or starts_with() within a select()
command:
select(age21, contains("Supp")) # Selects all variables that contain the stem "Supp" in their name
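A runnable sketch of the selection helpers contains() and starts_with(). The toy tibble only borrows some column names from the guns polls; its values are made up.

```r
library(tidyverse)

# Toy tibble with some of the guns-poll column names (values are made up)
age21 <- tibble(Pollster = c("A", "B"),
                Population = c("Adults", "Registered Voters"),
                Support = c(72, 82),
                `Republican Support` = c(61, 72),
                `Democratic Support` = c(86, 92))

select(age21, contains("Supp"))   # Support, Republican Support, Democratic Support
select(age21, starts_with("P"))   # Pollster, Population
```

Note that both helpers ignore upper/lower case by default.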
You can also specify a range of columns, for example all columns from Support to Democratic Support:
### Specify range of columns
select(age21, Support:`Democratic Support`)
Note the use of ‘backticks’ (reverse quotes) to specify the column name, as R does not normally allow
spaces in names.
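A sketch of selecting a range of columns and of referring to a column name with a space, using a toy tibble with made-up values. The range follows the column order of the data.

```r
library(tidyverse)

# Toy tibble (values are made up); backticks allow spaces in the column names
age21 <- tibble(Pollster = c("A", "B"),
                Support = c(72, 82),
                `Republican Support` = c(61, 72),
                `Democratic Support` = c(86, 92))

# All columns from Support up to and including Democratic Support
select(age21, Support:`Democratic Support`)

# Backticks are also needed outside select(), e.g. with $
age21$`Republican Support`   # [1] 61 72
```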
Select can also be used to rename columns when selecting them, for example to get rid of the spaces:
### Rename columns to get rid of spaces (e.g. in Republican Support and Democratic Support)
select(age21, Pollster, rep = `Republican Support`, dem = `Democratic Support`)
Note that select drops all columns not selected. If you only want to rename columns, you can use the
rename function:
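A sketch of rename() on a toy tibble with made-up values. As in select(), the new name goes on the left and the old name on the right, but rename() keeps all other columns.

```r
library(tidyverse)

# Toy tibble (values are made up)
age21 <- tibble(Pollster = c("A", "B"),
                Support = c(72, 82),
                `Republican Support` = c(61, 72),
                `Democratic Support` = c(86, 92))

# new_name = old_name; all columns are kept
rename(age21, rep = `Republican Support`, dem = `Democratic Support`)
```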
Finally, you can drop a variable by adding a minus sign in front of a name:
### Drop variable by adding - in front of a name
select(age21, -Question, -URL)
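A sketch of dropping columns with a minus sign on a toy tibble (made-up values). Putting the names in -c(...) drops the same columns in one go.

```r
library(tidyverse)

# Toy tibble (values and URLs are made up)
age21 <- tibble(Question = c("age-21", "age-21"),
                Pollster = c("A", "B"),
                Support = c(72, 82),
                URL = c("https://example.com/a", "https://example.com/b"))

select(age21, -Question, -URL)     # keeps Pollster and Support
select(age21, -c(Question, URL))   # same result
```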