23 November 2017
tidyverse R package
Create a new project to keep files organised
Create a new notebook in RStudio.
Use # for headings.
Use ## and ### for subheadings.
Use * for italics.
Use ** for bold.
Type some text into the notebook and experiment with formatting.
You can use the notebook to keep notes for this course.
Code blocks start with ```{r} and end with ```.
```{r}
# fancy computations here
```
Add a code block to your notebook.
Variables store values (such as 5 or "dog").
ans <- 42
Add a variable assignment to your code chunk.
Execute current line of code:
Execute entire code block:
View(ans)
print values
print(ans)
1:5
logical(3)
letters
factor(letters)
mat <- matrix(1:12, ncol=3)
dim(mat)
mat[2, 3]
lst <- list(position=1:26, letter=letters)
lst[[2]][3]
lst$letter[3]
df <- data.frame(position=1:26, letter=letters)
df[3, 2]
df[-(1:10),]
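The indexing examples above can be run as a short sketch to see what each expression selects. The comments describe the behaviour of base R; nothing here goes beyond the objects already defined on the slide.

```r
# A matrix is filled column by column, so with 3 columns 1:12 forms 4 rows.
mat <- matrix(1:12, ncol = 3)
dim(mat)       # number of rows, then number of columns
mat[2, 3]      # row 2, column 3

lst <- list(position = 1:26, letter = letters)
lst[[2]][3]    # third element of the second list component
lst$letter[3]  # the same element, selected by name

df <- data.frame(position = 1:26, letter = letters)
df[3, 2]             # row 3, column 2
nrow(df[-(1:10), ])  # negative indices drop rows 1 to 10
```

Double brackets extract a single list component, while `$` does the same by name; both then allow ordinary vector indexing.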
Access help with the ? command (?which).
Look up the which function.
A lot of useful information is available online. Take a look at rseek.org.
package_name::function_name()
library(package_name)
function_name()
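As an illustration of the two calling styles, here is a minimal sketch using the tools package, which ships with R. The file name "analysis.Rmd" is a made-up example.

```r
# Call a function through its package name without attaching the package.
ext_qualified <- tools::file_ext("analysis.Rmd")

# Attach the package first, then call the function by its bare name.
library(tools)
ext_attached <- file_ext("analysis.Rmd")

# Both styles call the same function.
identical(ext_qualified, ext_attached)
```

The `::` form is convenient for one-off calls and makes it obvious where a function comes from; `library()` saves typing when a package is used repeatedly.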
We can manipulate variables in multiple ways
x <- c(ans, ans/2, ans/3, ans/7)
y <- x*x
z <- x + y
p1 <- z/x
eq <- x == p1 - 1
small_idx <- which(x < 20)
small <- x[small_idx]
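To follow what each of these variables holds, the same assignments are repeated here with comments. This is only an annotated copy of the code above, assuming `ans` is still 42.

```r
ans <- 42
x <- c(ans, ans/2, ans/3, ans/7)  # a vector of four numbers
y <- x * x                        # element-wise square
z <- x + y                        # element-wise sum
p1 <- z / x                       # (x + x^2) / x, which simplifies to x + 1
eq <- x == p1 - 1                 # element-wise comparison, one result per element
small_idx <- which(x < 20)        # positions of the elements below 20
small <- x[small_idx]             # the elements at those positions
```

All operations work on whole vectors at once, so no explicit loop is needed.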
+, -, /, * are arithmetic operators.
==, <, >, <=, >= are comparison operators.
c concatenates variables.
[ extracts subset of a vector.
Tidy data should have
It depends on the question we want to ask of the data!
tidyr package
Allows easy reformatting of data to make it tidy.
library(tidyverse)
gather(table4a, `1999`, `2000`,
key = "year", value = "cases")
spread(table2, type, count)
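In tidyr 1.0.0 and later, gather and spread are superseded by pivot_longer and pivot_wider. The sketch below shows what the two calls above might look like with the newer functions, assuming a sufficiently recent tidyr is installed; the variable names `long` and `wide` are chosen here for illustration.

```r
library(tidyr)

# Counterpart of gather(): collect the year columns into key/value pairs.
long <- pivot_longer(table4a, c(`1999`, `2000`),
                     names_to = "year", values_to = "cases")

# Counterpart of spread(): give each value of `type` its own column.
wide <- pivot_wider(table2, names_from = type, values_from = count)
```

gather and spread still work, so the rest of these slides can be followed unchanged.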
Tidy up the probly dataset by gathering all reported probabilities into a single column. Don't forget to load the tidyverse package.
First add a subject column.
probly <- add_column(probly, subject=1:nrow(probly), .before=1)
probly_long <- gather(probly, `Almost Certainly`:`Chances Are Slight`,
key="phrase", value="probability")
ggplot2
ggplot2 provides an easy way to combine these elements into plots.
ggplot(data) + geom_type(aes(aesthetics))
data: a data.frame with your data.
geom: Geometric object to use to visualise the data.
aesthetics: Mapping of variables to properties of the geom (location, colour, size, …)
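A minimal sketch of how the three parts fit together, using the mpg data set that ships with ggplot2 rather than the course data. The choice of columns is only an example.

```r
library(ggplot2)

# data: the mpg data frame that ships with ggplot2
# geom: points
# aesthetics: engine size on x, highway mileage on y, vehicle class as colour
p <- ggplot(mpg) + geom_point(aes(x = displ, y = hwy, colour = class))
p
```

Swapping `geom_point` for another geom changes the type of plot while the data and most aesthetic mappings can stay the same.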
Create a simple box plot of the data.
library(ggplot2)
Use geom_boxplot
ggplot(probly_long) + geom_boxplot(aes(x=phrase, y=probability))
Take a look at the help page for geom_boxplot; the examples may provide inspiration.
ggplot(probly_long) + geom_boxplot(aes(x=phrase, y=probability)) + coord_flip()
ggplot(probly_long) + geom_boxplot(aes(x=phrase, y=probability)) + theme(axis.text.x=element_text(angle=45,hjust=1))
How do we tell ggplot which order to use?
The forcats package supports reordering factor levels based on another variable.
library(forcats)
probly_long <- mutate(probly_long, phrase=fct_reorder(phrase, probability))
ggplot(probly_long) + geom_boxplot(aes(x=phrase,
y=probability)) + coord_flip()
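What fct_reorder does to the factor levels is easier to see on a small example. The data below are made up for illustration; by default fct_reorder sorts the levels by the median of the second argument within each level.

```r
library(forcats)

# Hypothetical toy data: three phrases with two ratings each.
phrase <- factor(c("b", "a", "c", "b", "a", "c"))
rating <- c(10, 30, 20, 12, 28, 22)

levels(phrase)                            # alphabetical by default
reordered <- fct_reorder(phrase, rating)  # sort levels by median rating
levels(reordered)
```

ggplot places categories along an axis in the order of the factor levels, which is why reordering the levels changes the plot.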
Search the help system for a way to change the axis label.
The xlab and ylab functions set the axis labels. Remember that we flipped the coordinate system.
ggplot(probly_long) + geom_boxplot(aes(x=phrase,
y=probability)) +
xlab("") + coord_flip()
Group the data by phrase with the group_by function.
Add the median probability for each phrase with the mutate function.
probly_long <- group_by(probly_long, phrase)
probly_long <- mutate(probly_long, avg=median(probability))
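The effect of combining group_by and mutate can be sketched on a small made-up table (not the probly data): the median is computed separately within each group and repeated on every row of that group.

```r
library(dplyr)

# Hypothetical toy data, not the probly dataset.
toy <- tibble(phrase = c("a", "a", "a", "b", "b", "b"),
              probability = c(10, 20, 90, 50, 60, 70))

toy <- group_by(toy, phrase)
toy <- mutate(toy, avg = median(probability))  # one median per phrase
toy$avg
```

Unlike summarise, mutate keeps all rows, so the per-group value can be used as an aesthetic alongside the individual observations.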
ggplot(probly_long) + geom_boxplot(aes(x=phrase,
y=probability,
fill=avg)) +
xlab("") + coord_flip()
ggplot chose a single colour gradient to visualise median probabilities.
This can be adjusted with the scale functions.
scale_aesthetic_type
scale_fill_gradient2(midpoint=50)
ggplot(probly_long) + geom_boxplot(aes(x=phrase,
y=probability, fill=avg)) +
xlab("") + coord_flip() +
scale_fill_gradient2(midpoint=50)
It may be helpful to inspect individual responses. Create a plot that facilitates comparison between individual and average responses for each phrase.
library(ggrepel)
phrase_count <- length(levels(probly_long$phrase))
probly_long <- mutate(probly_long, unexpected=sign(probability - 50) != sign(avg -50) &
probability != 50 & avg != 50)
ggplot(probly_long, aes(x=as.numeric(phrase),
y=probability)) +
geom_line(aes(colour=factor(subject))) +
geom_point(aes(colour=factor(subject), shape=unexpected, size=unexpected)) +
geom_smooth(method="loess", colour='black') +
geom_text_repel(data=filter(probly_long, unexpected),
aes(x=as.numeric(phrase),
y=probability,
label=subject)) +
scale_x_continuous(breaks=1:phrase_count,
labels=levels(probly_long$phrase),
minor_breaks = NULL,
expand = c(0, 0.4)) +
scale_size_manual(values=c(0, 2)) +
scale_shape_manual(values=c(0, 16)) +
guides(colour='none', shape='none', size='none') + xlab("") +
theme(axis.text.x=element_text(angle=45,hjust=1))
probly_clean <- filter(probly_long, subject != 15)
dplyr::summarise
prob_summary <- summarise(probly_clean, mean=mean(probability))
summarise can produce multiple summaries at once.
prob_summary <- summarise(probly_clean, mean=mean(probability),
sd=sd(probability),
'trimmed mean'=mean(probability, trim = 0.1),
median=median(probability), IQR=IQR(probability),
MAD=mad(probability))
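One reason to look at several summaries side by side is that some of them are much less sensitive to extreme responses than others. A small sketch with made-up values (nine moderate numbers and one extreme one) shows the difference using the same base R functions.

```r
# Hypothetical responses: nine moderate values and one extreme one.
x <- c(1:9, 100)

mean(x)              # pulled towards the extreme value
mean(x, trim = 0.1)  # drops the lowest and highest 10% of values first
median(x)
sd(x)                # inflated by the extreme value
IQR(x)
mad(x)               # median absolute deviation, scaled by 1.4826 by default
```

The trimmed mean, median, IQR and MAD barely react to the single extreme value, while the mean and standard deviation do.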
Investigate the difference between Better Than Even and Likely.
Use data in wide format (one column for each phrase).
A paired t-test should be useful here.
t.test(probly$Likely[-15], probly$`Better Than Even`[-15], paired=TRUE)
## 
##  Paired t-test
## 
## data:  probly$Likely[-15] and probly$`Better Than Even`[-15]
## t = 7.1021, df = 44, p-value = 8.102e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   9.802789 17.570544
## sample estimates:
## mean of the differences 
##                13.68667
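A paired t-test is equivalent to a one-sample t-test on the within-subject differences, which is why the wide format is convenient here. The sketch below checks this on made-up ratings from six hypothetical subjects; the numbers are not taken from the probly data.

```r
# Hypothetical ratings from six subjects for two phrases.
likely <- c(70, 65, 80, 75, 60, 85)
better_than_even <- c(55, 60, 65, 70, 50, 66)

paired <- t.test(likely, better_than_even, paired = TRUE)
one_sample <- t.test(likely - better_than_even)

# Both tests report the same t statistic and p-value.
paired$statistic
one_sample$statistic
```

The degrees of freedom are the number of pairs minus one, which matches df = 44 in the output above for the 45 remaining subjects.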
Learn about your favourite analysis techniques in R
lme4
nlme
eyetrackingR
lavaan
BayesFactor
These slides are available at https://humburg.github.io/Rintro/