BIOL 297: Schedule for week of May 4-9
Announcements
Last week of class! Woohoo!
I will give you feedback on data analysis plan this week.
I will be sending out details on the Final Project written/oral presentations
Final Project presentations are scheduled for Thursday, May 14 @ 12 PM HST. You can join the Google Meeting using this link: meet.google.com/xxj-wayo-kxf
Assignments:
There will be a regular lab assignment on correlation in R
Complete all labs by Tuesday, May 12 @ 11:59 PM HST
Work on your Final Project!
Readings:
Whitlock & Schluter, Chapter 15: Comparing means of more than two groups
Whitlock & Schluter, Chapter 16: Correlation between numerial variables
Note: I recommend reading by Chapter 15 and 16 by the end of the semester. For this week, focus on whichever will be more useful for your final project.
Schedule
Note that on Tuesday, class will meet synchronously on Google Meet meet.google.com/foi-xwhi-ape. Class will be recorded and posted online if you are unable to attend.
Tuesday, May 5 - synchronous
Read Chapters 15 and/or 16 textbook
Watch pre-recorded lecture by Dr. Yaniv Brandvain on ANOVA:
- Note that this lecture is a bit longer (43 minutes) than previous ones
- PDF of slides are posted on Laulima
- We’ll go over background and code demo of ANOVA.
Thursday, May 7 - lab
Lab on Correlation in R (I will post by Wednesday night)
Join Google Meeting at 12: meet.google.com/vhq-qczv-dew
Code demo (ANOVA)
# Demo one-factor ANOVA
# Load libraries
library(dplyr)
library(ggplot2)
# Raw data (Section 15-1, pg. 465)
df <- data.frame(
treatment = factor(c(rep("Control", 8), rep("Knees", 7), rep("Eyes", 7)),
levels = c("Control", "Knees", "Eyes")),
Y = c(0.53, 0.36, 0.20, -0.37, -0.60, -0.64, -0.68, -1.27, 0.73, 0.31, 0.03,
-0.29, -0.56, -0.96, -1.61, -0.78, -0.86, -1.35, -1.48, -1.52, -2.04,
-2.83)
)
# Function to calculate standard error of the mean
se_mean <- function(x) sd(x) / sqrt(length(x))
# Plot data (Figure 15.1-1)
gp <- ggplot(df, aes(treatment, Y)) +
geom_point() +
stat_summary(
fun.data = "mean_se",
position = position_nudge(x = 0.1),
fill = "tomato", shape = 21, fatten = 5
) +
xlab("Treatment") +
ylab("Shift in circadian rhythm (h)") +
theme_bw()
gp
# 1. State H_0 and H_A ----
# H_0: All group means are the same, mu_1 = mu_2 = mu_3
# H_A: At least one mu_i is different from the others
# 2. Calculate a test statistic ----
# Summarized data.frame for calculation
grouped_df <- group_by(df, treatment)
summary_df <- summarize(
grouped_df,
ybar = mean(Y),
lower = mean(Y) - se_mean(Y),
upper = mean(Y) + se_mean(Y),
n = n()
)
summary_df$n_ybar <- summary_df$n * summary_df$ybar
grand_mean <- sum(summary_df$n_ybar) / sum(summary_df$n)
# Group SS
summary_df$s2 <- summary_df$n * (summary_df$ybar - grand_mean) ^ 2
SS_groups <- sum(summary_df$s2)
SS_groups
MS_groups <- SS_groups / (nlevels(df$treatment) - 1)
MS_groups
# Error SS
grouped_df <- mutate(grouped_df, error = (Y - mean(Y)) ^ 2)
SS_error <- sum(grouped_df$error)
SS_error
MS_error <- SS_error / (nrow(df) - nlevels(df$treatment))
MS_error
# F-statistic (test statistic!)
test_stat <- MS_groups / MS_error
# 3. Generate the null distribution ----
df1 <- nlevels(df$treatment) - 1
df2 <- nrow(df) - nlevels(df$treatment)
sampling_dist <- data.frame(Y = seq(0, 10, 0.01))
sampling_dist$probability <- df(sampling_dist$Y, df1 = df1, df2 = df2)
sampling_dist_plot <- ggplot(sampling_dist, aes(Y, probability)) +
geom_area(fill = "tomato", color = "black", alpha = 0.5) +
xlab(expression(italic(F) == frac(MS[groups], MS[error]))) +
ylab("Probability density") +
theme_bw() +
theme(
axis.text = element_text(size = 12),
axis.title = element_text(size = 14),
legend.text = element_text(size = 12),
legend.title = element_blank()
)
sampling_dist_plot +
geom_vline(xintercept = test_stat, size = 2)
# 4. Find a critical value at specified alpha ----
alpha <- 0.05
critical_value <- qf(
p = alpha,
df1 = df1,
df2 = df2,
lower.tail = FALSE
)
critical_value
# 5. Calculating our P-value ----
P_value <- pf(
q = test_stat,
df1 = df1,
df2 = df2,
lower.tail = FALSE
)
P_value
# 6. Decide
P_value < alpha
# Check answers using anova() function
fit <- lm(Y ~ treatment, data = df)
anova(fit)