BIOL 297: Schedule for week of May 4-9

Announcements

  • Last week of class! Woohoo!

  • I will give you feedback on data analysis plan this week.

  • I will be sending out details on the Final Project written/oral presentations

  • Final Project presentations are scheduled for Thursday, May 14 @ 12 PM HST. You can join the Google Meeting using this link: meet.google.com/xxj-wayo-kxf

Assignments:

  • There will be a regular lab assignment on correlation in R

  • Complete all labs by Tuesday, May 12 @ 11:59 PM HST

  • Work on your Final Project!

Readings:

  • Whitlock & Schluter, Chapter 15: Comparing means of more than two groups

  • Whitlock & Schluter, Chapter 16: Correlation between numerial variables

Note: I recommend reading by Chapter 15 and 16 by the end of the semester. For this week, focus on whichever will be more useful for your final project.

Schedule

Note that on Tuesday, class will meet synchronously on Google Meet meet.google.com/foi-xwhi-ape. Class will be recorded and posted online if you are unable to attend.

Tuesday, May 5 - synchronous

  • Read Chapters 15 and/or 16 textbook

  • Watch pre-recorded lecture by Dr. Yaniv Brandvain on ANOVA:

  • Note that this lecture is a bit longer (43 minutes) than previous ones
  • PDF of slides are posted on Laulima
  • We’ll go over background and code demo of ANOVA.

Thursday, May 7 - lab

Code demo (ANOVA)

Recorded video of code demo.


# Demo one-factor ANOVA

# Load libraries
library(dplyr)
library(ggplot2)

# Raw data (Section 15-1, pg. 465)
df <- data.frame(
  treatment = factor(c(rep("Control", 8), rep("Knees", 7), rep("Eyes", 7)), 
                     levels = c("Control", "Knees", "Eyes")),
  Y = c(0.53, 0.36, 0.20, -0.37, -0.60, -0.64, -0.68, -1.27, 0.73, 0.31, 0.03,
        -0.29, -0.56, -0.96, -1.61, -0.78, -0.86, -1.35, -1.48, -1.52, -2.04,
        -2.83)
)

# Function to calculate standard error of the mean
se_mean <- function(x) sd(x) / sqrt(length(x))

# Plot data (Figure 15.1-1)
gp <- ggplot(df, aes(treatment, Y)) +
  geom_point() +
  stat_summary(
    fun.data = "mean_se",
    position = position_nudge(x = 0.1),
    fill = "tomato", shape = 21, fatten = 5
  ) +
  xlab("Treatment") +
  ylab("Shift in circadian rhythm (h)") +
  theme_bw()

gp

# 1. State H_0 and H_A ----

# H_0: All group means are the same, mu_1 = mu_2 = mu_3
# H_A: At least one mu_i is different from the others

# 2. Calculate a test statistic ----

# Summarized data.frame for calculation
grouped_df <- group_by(df, treatment)
summary_df <- summarize(
  grouped_df,
  ybar = mean(Y),
  lower = mean(Y) - se_mean(Y),
  upper = mean(Y) + se_mean(Y),
  n = n()
)

summary_df$n_ybar <- summary_df$n * summary_df$ybar

grand_mean <- sum(summary_df$n_ybar) / sum(summary_df$n)

# Group SS
summary_df$s2 <- summary_df$n * (summary_df$ybar - grand_mean) ^ 2
SS_groups <- sum(summary_df$s2)

SS_groups

MS_groups <- SS_groups / (nlevels(df$treatment) - 1)

MS_groups

# Error SS
grouped_df <- mutate(grouped_df, error = (Y - mean(Y)) ^ 2)
SS_error <- sum(grouped_df$error)

SS_error

MS_error <- SS_error / (nrow(df) - nlevels(df$treatment))

MS_error

# F-statistic (test statistic!)
test_stat <- MS_groups / MS_error

# 3. Generate the null distribution ----

df1 <- nlevels(df$treatment) - 1
df2 <- nrow(df) - nlevels(df$treatment)

sampling_dist <- data.frame(Y = seq(0, 10, 0.01)) 

sampling_dist$probability <- df(sampling_dist$Y, df1 = df1, df2 = df2)

sampling_dist_plot <- ggplot(sampling_dist, aes(Y, probability)) +
  geom_area(fill = "tomato", color = "black", alpha = 0.5) +
  xlab(expression(italic(F) == frac(MS[groups], MS[error]))) +
  ylab("Probability density") +
  theme_bw() +
  theme(
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 14),
    legend.text = element_text(size = 12),
    legend.title = element_blank()
  )

sampling_dist_plot +
  geom_vline(xintercept = test_stat, size = 2)
  
# 4. Find a critical value at specified alpha ----

alpha <- 0.05

critical_value <- qf(
  p = alpha,
  df1 = df1,
  df2 = df2,
  lower.tail = FALSE
)

critical_value

# 5. Calculating our P-value ----

P_value <- pf(
  q = test_stat,
  df1 = df1,
  df2 = df2,
  lower.tail = FALSE
)

P_value

# 6. Decide
P_value < alpha

# Check answers using anova() function
fit <- lm(Y ~ treatment, data = df)
anova(fit)

Related