BIOL 297: Schedule for week of April 27-May 1

Last updated on May 5, 2020 3 min read

Assignments:

Catch up on labs 10 and 11 if you haven’t finished
There will be a regular lab assignment on the normal distribution in R due by next Tuesday.
Friday, May 1: Data analysis plan for the final project

Readings:

Whitlock & Schluter, Chapter 12: Comparing two means
Interleaf 7: Which test should I use? Essential reading for your data analysis plan

Schedule

Note that on Tuesday, class will meet synchronously on Google Meet meet.google.com/foi-xwhi-ape. Class will be recorded and posted online if you are unable to attend. Remaining instruction will be asynchronous, meaning you can work at your own pace and desired time.

Tuesday, April 28 - synchronous

Read Chapter 12 and Interleaf 7 from textbook
Watch pre-recorded lecture by Dr. Yaniv Brandvain:

You only need to watch up until 27:14, the rest is optional, more advanced material
PDF of slides are posted on Laulima

We will go over the R script that you will submit with your final project
Recorded video lecture/code demo

Thursday, April 30 - lab

Lab on ANOVA in R (I will post by Wednesday night)
Join Google Meeting at 12: meet.google.com/vhq-qczv-dew

Code demo: two-sample t-test (unpaired)


# Demonstrate two-sample t-test (unpaired)

# Load libraries
library(dplyr)
library(ggplot2)

# EXAMPLE: question 15 in Chapter 12
# Beer group
Y_1 <- c(0.36, 0.46, 0.06, 0.18, 0.25, 0.18, -0.06, -0.14, 0.12, 0.39, 
         0.17, -0.16, -0.05, 0.19, 0.25, 0.31, 0.17, -0.03, 0.23, -0.03, 
         0.26, 0.30, 0.11, 0.13, 0.21)
# Water group
Y_2 <- c(0.04, 0, -0.08, -0.12, 0.201, -0.039, 0.10, 0.041, 0.02, 0.236, 0.05, 
         0.097, 0.122, -0.019, 0.021, -0.08, -0.165, -0.28)

ggplot(data.frame(
    Y = c(Y_1, Y_2), 
    group = rep(c("Y_1", "Y_2"), c(length(Y_1), length(Y_2)))
  ), aes(Y)) +
  facet_wrap(~ group, ncol = 1) +
  geom_histogram(fill = "tomato", binwidth = 0.1, color = "black") +
  xlab("Value") +
  ylab("Frequency") +
  theme_bw()

# 1. State H_0 and H_A ----

# H_0: The difference between the mean of Y_1 and Y_2 is mu_0
mu_0 <- 0 # null hypothesis is no difference between groups

# H_A: The difference between the mean of Y_1 and Y_2 is not mu_0

# 2. Calculate a test statistic ----

Ybar_1 <- mean(Y_1)
Ybar_2 <- mean(Y_2)
n_1 <- length(Y_1)
n_2 <- length(Y_2)
s2_1 <- var(Y_1)
s2_2 <- var(Y_2)
df_1 <- n_1 - 1
df_2 <- n_2 - 1

s2_p <- (df_1 * s2_1 + df_2 * s2_2) / (df_1 + df_2)
SE_dY <- sqrt(s2_p * (1 / n_1 + 1 / n_2))
test_stat <- (Ybar_1 - Ybar_2) / SE_dY

# 3. Generate the null distribution

sampling_dist <- data.frame(Y = seq(-4, 4, 0.1))
sampling_dist$probability <- dt(sampling_dist$Y, df = df_1 + df_2)

head(sampling_dist)

sampling_dist_plot <- ggplot(sampling_dist, aes(Y, probability)) +
  geom_area(fill = "tomato", alpha = 0.5) +
  geom_line() +
  xlab(expression(italic(t))) +
  ylab("Probability density") +
  theme_bw() +
  theme(
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 12)
  )

sampling_dist_plot

sampling_dist_plot +
  geom_vline(xintercept = test_stat, size = 2)

head(sampling_dist)

# 4. Find a critical value at specified alpha ----

alpha <- 0.05

critical_value <- qt(
  p = alpha / 2,
  df = df_1 + df_2,
  lower.tail = FALSE
)

critical_value

sampling_dist_plot + 
  geom_vline(xintercept = critical_value * c(-1, 1), size = 2)

# 5. Find the P-value

P_value <- 2 * pt(
  q = abs(test_stat),
  df = df_1 + df_2,
  lower.tail = FALSE
)

P_value

# Decide ----
P_value < alpha

# The quick way
t.test(Y_1, Y_2, var.equal = TRUE)
t.test(Y_1, Y_2, var.equal = FALSE) # safer