Research Methods in Communication Science
2022/2023 | Vrije Universiteit Amsterdam
Table of contents
RESEARCH METHODS IN COMMUNICATION SCIENCE
COMMANDS IN RSTUDIO
TUTORIAL 2
MIDTERM 2
TUTORIAL 3
MIDTERM 3
TUTORIAL 4
TUTORIAL 5
MIDTERM 5
TUTORIAL 6
Commands in RStudio
Dataset1 = data set | x1, x2, x3 = variables
Creating a variable
Ong1$New_Var <- … R knows to make this a part of the data frame Ong1
New_Var <- … this stands on its own; it is not part of the data frame
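A minimal sketch of the difference between the two assignments. The data frame Demo, the name Loose_Var and all values are made up for illustration; they are not course data.

```r
# Hypothetical data frame for illustration only
Demo <- data.frame(x1 = c(1, 2, 3))

# Assigned with $: the new variable becomes a column of Demo
Demo$New_Var <- Demo$x1 * 2

# Assigned without a data frame: a separate object in the workspace
Loose_Var <- Demo$x1 * 2

names(Demo)                    # lists the columns of Demo, including New_Var
"Loose_Var" %in% names(Demo)   # FALSE: Loose_Var is not part of Demo
```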
Choose data set Windows: choose.files() | Apple: file.choose()
Working directory Set under Session. Default: the desktop on Apple, the C drive on Windows
Getting started File > New File > R Script
Install package install.packages("x1")
- Or: use the Packages tab in the pane on the right
Load package library(x1)
Open data set Dataset1 <- read_sav("Dataset1.sav")
- Or: File > Import Dataset > From SPSS
Descriptive statistics summary(Dataset1)
describe(Dataset1)
Missing values summary(Dataset1)
Mean of variables describe(Dataset1) or specific mean(Dataset1$x1)
- If NA comes up mean(Dataset1$x1, na.rm = T)
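As a small illustration of why na.rm is needed, the sketch below uses a made-up vector called scores (an assumption, not course data) with one missing value.

```r
# Hypothetical scores with one missing value
scores <- c(4, 8, NA, 6)

mean(scores)                 # NA, because one value is missing
mean(scores, na.rm = TRUE)   # 6, the mean of 4, 8 and 6
```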
Save a mean mean_predicted_y <- mean(Dataset1$y_predicted)
- To create the predicted values first, see Predicted value
Frequencies/mode of variables table(Dataset1$x1)
Median of variable describe(Dataset1) or specific median(Dataset1$x1)
Standard deviation describe(Dataset1)
sd(Dataset1$x1)
Variable names and value labels view_df(Dataset1)
Number of rows (observations) nrow(Dataset1)
Number of columns (variables) ncol(Dataset1)
Histograms hist(Dataset1$x1)
Making new variable Dataset1$NEWNAME <- ….
Standardizing Dataset1$x1standardized <- scale(Dataset1$x1)
Dummy/recoding Dataset1$x1dummy <- ifelse(Dataset1$x1 < median(Dataset1$x1, na.rm=T), 0, 1)
- Or: Dataset1$x1dummy <- ifelse(Dataset1$x1 < 0.5, 0, 1)
- ifelse(condition, outcome if yes, outcome if no)
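The median split above can be sketched on a tiny made-up data frame. Demo, cutoff and the values are assumptions chosen for illustration.

```r
# Hypothetical data with one missing value
Demo <- data.frame(x1 = c(2, 5, 7, NA, 9))

# Median of the observed values 2, 5, 7 and 9 is 6
cutoff <- median(Demo$x1, na.rm = TRUE)

# Below the median becomes 0, at or above the median becomes 1;
# a missing x1 stays missing in the dummy
Demo$x1dummy <- ifelse(Demo$x1 < cutoff, 0, 1)
Demo$x1dummy   # 0 0 1 NA 1
```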
Contingency table/cross-tabulation CrossTable(Dataset1$x1, Dataset1$x2)
- CrossTable(row variable, column variable)
Chi-squared test chisq.test(Dataset1$x1, Dataset1$x2)
- chisq.test(row variable, column variable)
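A minimal sketch of a cross-tabulation followed by a chi-squared test, using base R's table() in place of CrossTable() so that no extra package is needed. The data frame Demo and the variables group and answer are invented for illustration.

```r
# Hypothetical data: 100 respondents in two groups with a yes/no answer
Demo <- data.frame(
  group  = rep(c("a", "b"), each = 50),
  answer = c(rep("yes", 30), rep("no", 20), rep("yes", 20), rep("no", 30))
)

# Rows = first variable, columns = second variable
crosstab <- table(Demo$group, Demo$answer)
crosstab

# Same order of arguments as in the cross-tabulation
result <- chisq.test(Demo$group, Demo$answer)
result$statistic   # chi-squared value (with continuity correction for 2x2)
result$parameter   # degrees of freedom
result$p.value
```

For a 2x2 table chisq.test() applies the Yates continuity correction by default; correct = FALSE switches it off.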
Linear regression lm(x1 ~ x2, data = Dataset1)
- lm(dependent ~ independent, data = name data set)
- lm(y-axis ~ x-axis, data = name data set)
Save the regression first Model <- lm(x1 ~ x2, data = Dataset1)
Standardized coefficients lm.beta(Model)
Confidence intervals confint(Model)
Mean square regression summary(aov(Model))
- Or: summary.aov(Model)
Regression table summary(Model)
- Or, for the ANOVA table: anova(Model)
Explained variance summary(Model)
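The regression commands above can be tried on mtcars, a data set that ships with R. This is only a sketch: the choice of mpg and wt is an assumption for illustration, and lm.beta() is left out because it needs an extra package. Z-scoring both variables is used instead as a base R stand-in for the standardized coefficient.

```r
# mpg is the dependent and wt the independent variable in this example
Model <- lm(mpg ~ wt, data = mtcars)

summary(Model)                          # regression table, including R-squared
confint(Model)                          # 95% confidence intervals for a and b
r_squared <- summary(Model)$r.squared   # explained variance

# Standardized slope: regress z-scores on z-scores
z_mpg <- as.numeric(scale(mtcars$mpg))
z_wt  <- as.numeric(scale(mtcars$wt))
ModelStd <- lm(z_mpg ~ z_wt)
beta_wt <- unname(coef(ModelStd)[2])
```

With a single predictor the standardized slope equals the correlation between the two variables, which gives a quick check on the result.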
Filter filter(Dataset1, x1 < 80)
- filter(data set, condition)
Scatter plot plot(x1 ~ x2, data = Dataset1)
- plot(dependent ~ independent, data = name data set)
- plot(y-axis ~ x-axis, data = name data set)
Regression line in scatter plot plot(x1 ~ x2, data = Dataset1) and then abline(lm(x1 ~ x2, data = Dataset1))
New column Dataset1$NEWCOLUMNNAME <- …
- If you want a column that equals another column plus 10, for example:
o Dataset1$x2plus10 <- Dataset1$x2 + 10
Predicted value Dataset1$Ypredicted <- 9.09 - 0.2 * Dataset1$x2
- The part after <- is based on Y = a + bX (X is the independent variable, here x2)
Calculate residuals Dataset1$Residuals <- Dataset1$x1 - Dataset1$Ypredicted
- The part after <- is based on Y - predicted Y (Y is the dependent variable, here x1)
Mean of residuals mean(Dataset1$Residuals, na.rm=T)
Mean of squared residuals Dataset1$squaredresiduals <- Dataset1$Residuals^2
- To calculate the mean: mean(…, na.rm=T)
Sum of squared residuals (SSE) sum(Dataset1$squaredresiduals, na.rm=T)
Squared difference between…
Y and mean of Y
1. meanY <- mean(Dataset1$x1, na.rm=T)
2. Dataset1$squareddifference <- (Dataset1$x1 - meanY)^2
Predicted Y and mean of Y
1. For predicted Y see predicted value
2. Dataset1$squareddifferencepredy <- (Dataset1$Ypredicted - meanY)^2
Total sum of squares (TSS) sum(Dataset1$squareddifference, na.rm=T)
Regression sum of squares (RSS) sum(Dataset1$squareddifferencepredy, na.rm=T)
R squared (R2) R2 = RSS / TSS
RSS <- sum(Dataset1$squareddifferencepredy, na.rm=T)
TSS <- sum(Dataset1$squareddifference, na.rm=T)
R2 <- RSS / TSS
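The steps from predicted value to R squared can be sketched end to end and checked against the value that lm() reports. The example uses mtcars (shipped with R) with mpg as Y and wt as X. Both that choice and the name Demo are assumptions for illustration.

```r
# Fit the regression to obtain a (intercept) and b (slope)
Model <- lm(mpg ~ wt, data = mtcars)
a <- unname(coef(Model)[1])
b <- unname(coef(Model)[2])

Demo <- mtcars[, c("mpg", "wt")]

# Predicted value: Y = a + bX
Demo$Ypredicted <- a + b * Demo$wt

# Residual: observed Y minus predicted Y
Demo$Residuals <- Demo$mpg - Demo$Ypredicted

meanY <- mean(Demo$mpg, na.rm = TRUE)

SSE <- sum(Demo$Residuals^2, na.rm = TRUE)              # sum of squared residuals
TSS <- sum((Demo$mpg - meanY)^2, na.rm = TRUE)          # total sum of squares
RSS <- sum((Demo$Ypredicted - meanY)^2, na.rm = TRUE)   # regression sum of squares

R2 <- RSS / TSS
R2
summary(Model)$r.squared   # should match R2
```

In a regression with an intercept the total splits into the two parts, TSS = RSS + SSE, and the mean of the residuals is zero apart from rounding.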
Hierarchical linear regression BlockA <- lm(dependent ~ independentv1 + independentv2 + … , data =
Ong1)
See linearity plot(BlockA, 1)
- Shows residuals vs fitted plot
Homoscedasticity plot(BlockA, 3)
- Scale-Location plot: square root of the standardized residuals vs fitted values
Normality of residuals plot(BlockA, 2) or hist(BlockA$residuals)
- Normal Q-Q plot
Independence of residuals durbinWatsonTest(BlockA)
- detect presence of autocorrelation
Multicollinearity vif(BlockA)
- Variance inflation factor
No influential observations plot(BlockA, 4)
- Cook's distance per observation (combines leverage and residual size)
How are variables coded typeof(Dataset1$x1)
Test difference significance anova(BlockA, BlockB)
Comparing models anova(BlockA, BlockB)
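A minimal sketch of comparing two nested blocks with anova(). It borrows the names BlockA and BlockB from the notes but fits them on mtcars (shipped with R); the variables mpg, wt and hp are assumptions chosen for illustration.

```r
# Block A: one predictor; Block B: the same predictor plus one more
BlockA <- lm(mpg ~ wt, data = mtcars)
BlockB <- lm(mpg ~ wt + hp, data = mtcars)

# F-test of whether Block B explains significantly more variance than Block A
comparison <- anova(BlockA, BlockB)
comparison

# Change in explained variance between the blocks
r2_change <- summary(BlockB)$r.squared - summary(BlockA)$r.squared
r2_change
```

Both models must be fitted on the same cases, which is one reason to remove missing values before building the blocks.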
Delete missing values Dataset1 <- na.omit(Dataset1)
Mean center Dataset1$x1_c <- scale(Dataset1$x1, scale = FALSE)
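A small sketch of mean centering on made-up numbers (Demo and its values are assumptions). scale() returns a one-column matrix, so the sketch wraps it in as.numeric() to store a plain numeric column; that wrapper is an addition to the command above.

```r
# Hypothetical variable with mean 5
Demo <- data.frame(x1 = c(2, 4, 9))

# scale = FALSE subtracts the mean but does not divide by the standard deviation
Demo$x1_c <- as.numeric(scale(Demo$x1, scale = FALSE))

Demo$x1_c         # -3 -1 4
mean(Demo$x1_c)   # 0 after centering
```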
Process mediation process(data = YOUR_DATASET_NAME, y = "YOUR_Y_VARIABLE", x =
"YOUR_X_VARIABLE", m ="YOUR_MEDIATOR_VARIABLE", model = 4, total
=1, stand =1, normal = 1)
Process moderation process(data = YOUR_DATASET_NAME, y = "YOUR_Y_VARIABLE", x =
"YOUR_X_VARIABLE", w = "YOUR_MODERATOR_VARIABLE", model = 1, plot
= 1, center = 1, intprobe = 1, moments = 1, jn = 1, hc = 0)
Conditional effects sim_slopes(REGRESSION_OBJECT, pred = INDEPENDENT_VARIABLE, modx =
MODERATOR, johnson_neyman = FALSE)
Graph conditional effects GraphAggress <- sim_slopes(Reg2_c, pred = Vid_Game_c, modx = CaUnTs_c)
plot(GraphAggress)
Johnson-Neyman johnson_neyman(REGRESSION_OBJECT, pred = INDEPENDENT_VARIABLE,
modx = MODERATOR, alpha = .05)
Adding up YOUR_DATASET$S2 <- YOUR_DATASET$V11S2 + YOUR_DATASET$V12S2 + ...
+ YOUR_DATASET$V16S2
- Where the items run from V11 to V16, for each of S2, S4, S5 and S6
Taking columns/subset data_practicum5A_SUBSET <- select(data_practicum5A, ppnr,S2,S4,S5,S6)
Or: data_practicum5A_SUBSET <- subset(data_practicum5A, select =
c(ppnr,S2,S4,S5,S6))
From wide to long format data_practicum5A_LONG <-melt(data_practicum5A_SUBSET, id="ppnr",
value.name = "emotie", variable.name= "schilderij")
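To show what the wide-to-long step does, the sketch below reshapes a tiny made-up data set. Note the swap: melt() comes from an add-on package, so base R's reshape() is used here as a stand-in. The data frame wide and its values are assumptions; the column names ppnr, schilderij and emotie follow the notes.

```r
# Hypothetical wide data: one row per participant, one column per painting
wide <- data.frame(ppnr = 1:3,
                   S2 = c(5, 3, 4),
                   S4 = c(2, 6, 1))

# Long format: one row per participant per painting
long <- reshape(wide,
                direction = "long",
                idvar     = "ppnr",
                varying   = c("S2", "S4"),
                v.names   = "emotie",
                timevar   = "schilderij",
                times     = c("S2", "S4"))
rownames(long) <- NULL
long
```

Three participants with two paintings each give six rows in the long format.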
Run model repeated measures YOUR_MODEL1 <- aov_car(emotie ~ schilderij + Error(ppnr/schilderij),
data=data_practicum5A_LONG)
Estimated Marginal Means emmeans(YOUR_MODEL1, "schilderij")
Visualize Estimated Marginal Means emmip(YOUR_MODEL1, ~ schilderij, CIs = TRUE)
Post-hoc comparisons pairwise.t.test(data_practicum5A_LONG$emotie,
data_practicum5A_LONG$schilderij, p.adj = "bonf")
Coding as categories/factors 1. Test is.factor(data_practicum5B_LONG$stadium)
2. Perform YOUR_DATASET$YOUR_VARIABLE <-
as.factor(YOUR_DATASET$YOUR_VARIABLE)
Create subset t0 <- filter(data_practicum5B_LONG, stadium == "hcul0")
Descriptives different groups describeBy(YOUR_DATA, YOUR_DATA$INDEPENDENT_VARIABLE)
Boxplot ggplot(data_practicum6, aes(x=INDEPENDENT_VAR, fill=INDEPENDENT_VAR,
y=DEPENDENT_VAR1)) + geom_boxplot()
MANOVA Model1 <- manova(cbind(DEPENDENT_VAR1, DEPENDENT_VAR2,
DEPENDENT_VAR3) ~ INDEPENDENT_VAR, data = YOUR_DATA)
Wilks's Lambda summary(Model1, test="Wilks")
Hotelling’s Trace summary(Model1, test="Hotelling")
Roy’s largest Root summary(Model1, test="Roy")
Pillai’s trace summary(Model1)
ANOVA summary summary.aov(Model1)
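The MANOVA commands above can be tried on iris, a data set that ships with R. This is a sketch only: using Sepal.Length and Petal.Length as dependent variables and Species as the independent variable is an assumption for illustration.

```r
# Two dependent variables, one grouping variable
Model1 <- manova(cbind(Sepal.Length, Petal.Length) ~ Species, data = iris)

summary(Model1)                    # Pillai's trace (the default test)
wilks <- summary(Model1, test = "Wilks")
wilks

# Wilks's Lambda and its p-value for the Species effect
wilks_lambda <- wilks$stats["Species", "Wilks"]
wilks_p      <- wilks$stats["Species", "Pr(>F)"]

# Follow-up univariate ANOVAs, one per dependent variable
univariate <- summary.aov(Model1)
univariate
```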
Data frame format Dataset1 <- as.data.frame(Dataset1)
Box’s test box <- boxM(data_practicum6[, c("DEPENDENT_VAR1",
"DEPENDENT_VAR2", "DEPENDENT_VAR3")], data_practicum6[,
"INDEPENDENT_VAR"])
summary(box, cov=TRUE)
Covariance summary(box, cov=TRUE)
Cross-products SSCP.fn(YOUR_MODEL1)
Sum of Squares + Cross products crossproducts <-SSCP.fn(Model1)
crossproducts$SSCPR/crossproducts$SSCPE
crossproducts$SSCPR/crossproducts$SSCPE*100
Discriminant analysis candisc(Model1)
Correlation of the variates output_candisc = candisc(YOUR_MODEL1)
output_candisc$scores[2]
output_candisc$scores[3]
cor(output_candisc$scores[2], data_practicum6$salary)
cor(output_candisc$scores[3], data_practicum6$salary)
cor(output_candisc$scores[2], data_practicum6$work)
etc…
Raw coefficients candisc coef(output_candisc, type = c("raw"))
Standardized coefficients candisc coef(output_candisc, type = c("std"))
Plot candisc plot(output_candisc)
Tutorial 2
# Install packages
install.packages("ggplot2")
install.packages("QuantPsyc")
install.packages("car")
install.packages("lm.beta")
install.packages("haven")
install.packages("psych")
install.packages("gmodels")
install.packages("tidyverse")
install.packages("sjPlot")
install.packages("broom")
install.packages("jtools")
install.packages("huxtable")
install.packages("pwr")
# Load packages
library("ggplot2")
library("QuantPsyc")
library("car")
library("lm.beta")
library("haven")
library("psych")
library("gmodels")
library("tidyverse")
library("sjPlot")
library("broom")
library("jtools")
library("huxtable")
library("pwr")
Checking if regression assumptions are met:
1. linearity,
2. homoscedasticity,
3. independence of the residuals,
4. normal distribution of the residuals,
5. no multicollinearity, and
6. no influential observations
Hierarchical linear regressions
BlockA <- lm(FB_Status ~ Age + Gender, data = Ong1)
BlockB <- lm(FB_Status ~ Age + Gender + NEO_FFI, data = Ong1)
BlockC <- lm(FB_Status ~ Age + Gender + NEO_FFI + NPQC_R, data = Ong1)
Question 1 – check for linearity
plot(BlockC, 1)
A horizontal line without distinct patterns indicates a linear relationship.
Linearity is met when the plot of residuals vs fitted values shows no particular pattern of divergence from the
horizontal 0-line. Any such pattern would mean that residuals tend to be, for example, (highly) positive for some
fitted values and (highly) negative for other fitted values. Your evaluation of the plot can be assisted by the red
lowess line that is printed. This is a smoothed line of the conditional means of the residuals for all fitted