23 November 2017
tidyverse R package
Create a new project to keep files organised
Create a new notebook in RStudio.
Use #, ## and ### for headings.
Use * for italics.
Use ** for bold.
Type some text into the notebook and experiment with formatting.
You can use the notebook to keep notes for this course.
```{r}
```
```{r}
# fancy computations here
```
Add a code block to your notebook.
Values (such as 5 or "dog") can be stored in variables.
ans <- 42
Add a variable assignment to your code chunk.
Execute current line of code:
Execute entire code block:
View(ans)
print values
print(ans)
1:5
logical(3)
letters
factor(letters)
mat <- matrix(1:12, ncol=3)
dim(mat)
mat[2, 3]
lst <- list(position=1:26, letter=letters)
lst[[2]][3]
lst$letter[3]
df <- data.frame(position=1:26, letter=letters)
df[3, 2]
df[-(1:10),]
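The indexing expressions above can be checked against concrete values. The following is a minimal sketch in base R; the variable names v, flags and rest are introduced here for illustration only.

```r
# Vectors: a sequence, an all-FALSE logical vector, and the built-in letters
v <- 1:5
flags <- logical(3)          # FALSE FALSE FALSE

# A matrix is filled column by column, so 1:12 with 3 columns has 4 rows
mat <- matrix(1:12, ncol = 3)
dim(mat)                     # 4 3
mat[2, 3]                    # row 2, column 3: 10

# Lists are indexed with [[ ]] or $, and the result can be indexed again
lst <- list(position = 1:26, letter = letters)
lst[[2]][3]                  # "c"
lst$letter[3]                # "c"

# Data frames use [row, column]; negative indices drop rows
df <- data.frame(position = 1:26, letter = letters)
df[3, 2]                     # third letter
rest <- df[-(1:10), ]        # everything except the first 10 rows
nrow(rest)                   # 16
```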
Use the ? command (?which) to view the help page for the which function.
A lot of useful information is available online. Take a look at rseek.org.
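As a small illustration of what the which help page describes, the sketch below applies which to a logical vector. The vector scores is made up for this example, and the help calls are commented out because they are meant for an interactive session.

```r
# ?which          # opens the help page for which
# help("which")   # equivalent call

# which() returns the positions at which a logical vector is TRUE
scores <- c(3, 8, 1, 9)
idx <- which(scores > 2)
idx               # 1 2 4
scores[idx]       # 3 8 9
```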
package_name::function_name()
library(package_name)
function_name()
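The two calling styles can be compared using the tools package, which ships with R but is not attached by default. This is only a sketch; the file name notes.Rmd is a made-up example.

```r
# Qualified call: works without attaching the package
ext_qualified <- tools::file_ext("notes.Rmd")

# Attach the package first, then call the function by its bare name
library(tools)
ext_attached <- file_ext("notes.Rmd")

ext_qualified   # "Rmd"
ext_attached    # "Rmd"
```

The qualified form makes it explicit where a function comes from, which can help when two attached packages export the same name.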
We can manipulate variables in multiple ways
x <- c(ans, ans/2, ans/3, ans/7)
y <- x*x
z <- x + y
p1 <- z/x
eq <- x == p1 - 1
small_idx <- which(x < 20)
small <- x[small_idx]
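Tracing the statements above with ans set to 42 shows what each variable holds. The values in the comments follow from the arithmetic: since z/x equals (x + x*x)/x, which is 1 + x, every element of eq is TRUE.

```r
ans <- 42
x <- c(ans, ans/2, ans/3, ans/7)   # 42 21 14 6
y <- x * x                         # 1764 441 196 36 (element-wise)
z <- x + y                         # 1806 462 210 42
p1 <- z / x                        # 43 22 15 7
eq <- x == p1 - 1                  # TRUE TRUE TRUE TRUE
small_idx <- which(x < 20)         # positions 3 and 4
small <- x[small_idx]              # 14 6
```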
+, -, /, * are arithmetic operators.
==, <, >, <=, >= are comparison operators.
c concatenates variables.
[ extracts a subset of a vector.
Tidy data should have
It depends on the question we want to ask of the data!
tidyr package
Allows easy reformatting of data to make it tidy.
library(tidyverse)
gather(table4a, `1999`, `2000`, key = "year", value = "cases")
spread(table2, type, count)
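To see what gather and spread do to the shape of a table, the sketch below uses a tiny made-up data frame (the name wide and its values are illustrative, not from the course data). Recent tidyr versions also offer pivot_longer and pivot_wider as successors, but gather and spread remain available.

```r
library(tidyr)

# One column per year: the years are values hidden in the column names
wide <- data.frame(country = c("A", "B"),
                   `1999` = c(10, 20),
                   `2000` = c(15, 25),
                   check.names = FALSE,
                   stringsAsFactors = FALSE)

# gather: stack the year columns into key/value pairs (2 rows become 4)
long <- gather(wide, `1999`, `2000`, key = "year", value = "cases")
long

# spread: the inverse, one column per distinct value of year
wide_again <- spread(long, year, cases)
wide_again
```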
Tidy up the probly dataset by gathering all reported probabilities into a single column. Don't forget to load the tidyverse package.
First add a subject column.
probly <- add_column(probly, subject=1:nrow(probly), .before=1)
probly_long <- gather(probly, `Almost Certainly`:`Chances Are Slight`, key="phrase", value="probability")
ggplot2
ggplot2 provides an easy way to combine these elements into plots.
ggplot(data) + geom_type(aes(aesthetics))
data: a data.frame with your data.
geom: Geometric object to use to visualise the data.
aesthetics: Mapping of variables to properties of the geom (location, colour, size, …)
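The three ingredients can be seen together in a minimal sketch that uses the mtcars dataset bundled with R (chosen here only for illustration): the data frame is the data, geom_point is the geom, and aes maps columns to position and colour.

```r
library(ggplot2)

# data: mtcars; geom: points; aesthetics: weight on x, fuel economy on y,
# number of cylinders as colour
p <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg, colour = factor(cyl)))
p
```

Swapping geom_point for another geom changes how the same mapping is drawn, without touching the data.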
Create a simple box plot of the data.
library(ggplot2)
Use geom_boxplot
ggplot(probly_long) + geom_boxplot(aes(x=phrase, y=probability))
Take a look at the help page for geom_boxplot; the examples may provide inspiration.
ggplot(probly_long) + geom_boxplot(aes(x=phrase, y=probability)) + coord_flip()
ggplot(probly_long) + geom_boxplot(aes(x=phrase, y=probability)) + theme(axis.text.x=element_text(angle=45,hjust=1))
How does ggplot know which order to use?
The forcats package supports reordering factor levels based on another variable.
library(forcats)
probly_long <- mutate(probly_long, phrase=fct_reorder(phrase, probability))
ggplot(probly_long) + geom_boxplot(aes(x=phrase, y=probability)) + coord_flip()
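The effect of fct_reorder on the level order can be inspected directly. This sketch uses a small made-up data frame (demo and its values are hypothetical); by default fct_reorder sorts the levels by the median of the second argument within each level.

```r
library(forcats)

demo <- data.frame(phrase = factor(c("high", "low", "mid", "high", "low", "mid")),
                   value = c(90, 10, 50, 80, 20, 40))

# Factor levels default to alphabetical order
levels_before <- levels(demo$phrase)   # "high" "low" "mid"

# Reorder by the median value per level: low (15), mid (45), high (85)
demo$phrase <- fct_reorder(demo$phrase, demo$value)
levels_after <- levels(demo$phrase)    # "low" "mid" "high"
```

Only the order of the levels changes; the data values themselves are untouched, which is why the plot axis changes but the boxes do not.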
Search the help system for a way to change the axis label.
The xlab and ylab functions set the axis labels. Remember that we flipped the coordinate system.
ggplot(probly_long) + geom_boxplot(aes(x=phrase, y=probability)) + xlab("") + coord_flip()
Group the data by phrase with the group_by function.
Add the median probability for each phrase with the mutate function.
probly_long <- group_by(probly_long, phrase)
probly_long <- mutate(probly_long, avg=median(probability))
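On a grouped data frame, mutate computes its expressions once per group and repeats the result for every row of that group. A minimal sketch with made-up values (demo is a hypothetical stand-in for probly_long):

```r
library(dplyr)

demo <- data.frame(phrase = c("a", "a", "b", "b", "b"),
                   probability = c(10, 30, 60, 70, 95),
                   stringsAsFactors = FALSE)

# Without grouping, median(probability) would be one value for all rows.
# With grouping, each row receives the median of its own phrase.
demo <- group_by(demo, phrase)
demo <- mutate(demo, avg = median(probability))
demo$avg   # 20 20 70 70 70
```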
ggplot(probly_long) + geom_boxplot(aes(x=phrase, y=probability, fill=avg)) + xlab("") + coord_flip()
ggplot chose a single colour gradient to visualise median probabilities.
This can be changed with the scale functions.
Scale functions are named scale_aesthetic_type, for example:
scale_fill_gradient2(midpoint=50)
ggplot(probly_long) + geom_boxplot(aes(x=phrase, y=probability, fill=avg)) + xlab("") + coord_flip() + scale_fill_gradient2(midpoint=50)
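The diverging scale can also be given explicit colours. The sketch below uses a small made-up data frame and arbitrarily chosen colours (blue, white, red); low, mid, high and midpoint are arguments of scale_fill_gradient2.

```r
library(ggplot2)

demo <- data.frame(item = c("a", "b", "c"),
                   value = c(10, 50, 90))

# Values below the midpoint shade towards `low`, values above towards `high`
p <- ggplot(demo) +
  geom_col(aes(x = item, y = value, fill = value)) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 50)
p
```

With midpoint=50, a bar at exactly 50 is drawn in the mid colour, which suits probabilities where 50 means "as likely as not".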
It may be helpful to inspect individual responses. Create a plot that facilitates comparison between individual and average responses for each phrase.
library(ggrepel)
phrase_count <- length(levels(probly_long$phrase))
probly_long <- mutate(probly_long,
                      unexpected=sign(probability - 50) != sign(avg - 50) &
                        probability != 50 & avg != 50)
ggplot(probly_long, aes(x=as.numeric(phrase), y=probability)) +
  geom_line(aes(colour=factor(subject))) +
  geom_point(aes(colour=factor(subject), shape=unexpected, size=unexpected)) +
  geom_smooth(method="loess", colour='black') +
  geom_text_repel(data=filter(probly_long, unexpected),
                  aes(x=as.numeric(phrase), y=probability, label=subject)) +
  scale_x_continuous(breaks=1:phrase_count, labels=levels(probly_long$phrase),
                     minor_breaks = NULL, expand = c(0, 0.4)) +
  scale_size_manual(values=c(0, 2)) +
  scale_shape_manual(values=c(0, 16)) +
  guides(colour='none', shape='none', size='none') +
  xlab("") +
  theme(axis.text.x=element_text(angle=45,hjust=1))
probly_clean <- filter(probly_long, subject != 15)
dplyr::summarise
prob_summary <- summarise(probly_clean, mean=mean(probability))
summarise can produce multiple summaries at once.
prob_summary <- summarise(probly_clean,
                          mean=mean(probability),
                          sd=sd(probability),
                          'trimmed mean'=mean(probability, trim = 0.1),
                          median=median(probability),
                          IQR=IQR(probability),
                          MAD=mad(probability))
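To see why it can be worth reporting several summaries, the sketch below applies the same base R functions to a made-up vector with one extreme value. The numbers are illustrative only and are not taken from the probly data.

```r
# Four unremarkable values and one outlier
x <- c(1, 2, 3, 4, 100)

m <- mean(x)                   # 22: pulled up by the outlier
m_trim <- mean(x, trim = 0.2)  # 3: drops one value from each end first
med <- median(x)               # 3
iqr <- IQR(x)                  # 2: distance between the quartiles 2 and 4
mad_x <- mad(x)                # 1.4826: scaled median absolute deviation
s <- sd(x)                     # much larger, dominated by the outlier
```

The mean and standard deviation react strongly to the single extreme value, while the trimmed mean, median, IQR and MAD barely notice it.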
Investigate the difference between Better Than Even and Likely.
Use data in wide format (one column for each phrase).
A paired t-test should be useful here.
t.test(probly$Likely[-15], probly$`Better Than Even`[-15], paired=TRUE)
##
##  Paired t-test
##
## data:  probly$Likely[-15] and probly$`Better Than Even`[-15]
## t = 7.1021, df = 44, p-value = 8.102e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   9.802789 17.570544
## sample estimates:
## mean of the differences
##                13.68667
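A paired t-test is equivalent to a one-sample t-test on the within-subject differences, which is why the output reports the mean of the differences. The sketch below demonstrates this with made-up ratings from six hypothetical subjects (first and second are illustrative names, not course data).

```r
# Two ratings per subject, in the same subject order
first  <- c(55, 60, 52, 70, 65, 58)
second <- c(60, 66, 55, 78, 70, 61)

# Paired test on the two columns
paired <- t.test(second, first, paired = TRUE)

# One-sample test on the per-subject differences
one_sample <- t.test(second - first)

# Both give the same t statistic, degrees of freedom and p-value
c(paired$statistic, one_sample$statistic)
c(paired$p.value, one_sample$p.value)
```

This only works when the rows line up by subject, which is why the wide format (one column per phrase) is convenient here.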
Learn about your favourite analysis techniques in R
lme4
nlme
eyetrackingR
lavaan
BayesFactor
These slides are available at https://humburg.github.io/Rintro/