BIOL 297: Schedule for week of April 6-10

Assignments:

  • A regular lab assignment on analyzing frequency data in R is due by next Tuesday.

  • Final project: turn in a one-paragraph proposal stating your scientific question and hypothesis by Friday, April 10 @ 11:59 PM HST.

Readings:

  • Whitlock & Schluter, Chapter 9: Contingency analysis

Schedule

We will not meet synchronously this Tuesday. I will post materials later on Tuesday or, more likely, on Wednesday morning. You can work through them at your own pace and at whatever time suits you.

Tuesday, April 7

  • We will introduce contingency analysis using $\chi^2$ contingency tests (different from the $\chi^2$ test from last week!) and Fisher’s exact test.

  • I’ve updated the Class Project page with

  • Read Chapter 9 of the textbook.

  • Watch a pre-recorded lecture by a friend of mine, Dr. Yaniv Brandvain:

This lecture covers tests used in contingency analysis.

  • Watch the pre-recorded lecture with a code demo on $\chi^2$ contingency tests:

# Demonstrate the chi-squared contingency test

# Load libraries
library(dplyr)
library(forcats)
library(ggplot2)

# data from OpenNahele
fig2 <- read.csv("https://raw.githubusercontent.com/dylancraven/Hawaii_diversity/master/Data/Hawaii_Div_Macro_SIE.csv") %>%
  filter(species_group != "all") %>%
  rename(N_observed = SppN) %>%
  mutate(
    Island = fct_relevel(geo_entity2, c("Hawai'i Island", "Maui Nui", "O'ahu Island", "Kaua'i Island")),
    Status = fct_recode(species_group, Alien = "exotic", Native = "native", `Single Island Endemic` = "nSIE"),
    Status = fct_relevel(Status, c("Single Island Endemic", "Native", "Alien"))
  ) %>%
  select(Island, Status, N_observed) %>%
  arrange(Island, Status)

ggplot(fig2, aes(Island, N_observed, fill = Status)) +
  geom_col(position = position_dodge()) +
  ylab("Species Richness") +
  theme_bw()


# 1. State H_0 and H_A ----

# H_0: The relative frequencies of the groups (Endemic, Native, and Alien) are the same on every island,
# i.e., Status and Island are independent
# N_observed differs from N_expected only by chance

# H_A: The relative frequencies of the groups (Endemic, Native, and Alien) are NOT the same on every island,
# i.e., Status and Island are associated
# N_observed differs from N_expected by more than chance
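Written as formulas, $H_0$ says that Status and Island are independent. Under independence, the chance that a species falls in cell $(i, j)$ is the product of the two marginal proportions, which gives the expected counts and the statistic computed in step 2. The notation here is chosen for illustration: $O_{ij}$ and $E_{ij}$ are the observed and expected counts in row $i$ and column $j$, $R_i$ and $C_j$ are the row and column totals, $N$ is the grand total, and $r$ and $c$ are the numbers of rows and columns.

$$
E_{ij} = N \cdot \frac{R_i}{N} \cdot \frac{C_j}{N} = \frac{R_i \, C_j}{N},
\qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
df = (r - 1)(c - 1)
$$

For the Hawaiian data, $r = 3$ status groups and $c = 4$ islands, so $df = (3 - 1)(4 - 1) = 6$.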

# 2. Calculate a test statistic ----

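# Make into a table: matrix() fills values column by column, and fig2 is
# sorted by Island, then Status, so rows = Status (3) and columns = Island (4)
fig2_table1 <- matrix(fig2$N_observed, ncol = 4)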
# library(tidyr)
# fig2_table2 <- fig2 %>%
#   pivot_wider(names_from = Island, values_from = N_observed)
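Because `matrix()` fills its values column by column, `fig2_table1` is only correct when `fig2` is sorted by Island first and Status second. A minimal sketch on a small hypothetical data frame (the island names `A`/`B` and all counts are made up, not OpenNahele values) shows this, along with base R's `xtabs()` as an alternative that builds the table from the column names rather than from the row order:

```r
# Hypothetical long-format counts (made up for illustration), sorted by
# Island first and Status second, like fig2 after arrange(Island, Status)
toy <- data.frame(
  Island = rep(c("A", "B"), each = 3),
  Status = factor(rep(c("Endemic", "Native", "Alien"), times = 2),
                  levels = c("Endemic", "Native", "Alien")),
  N_observed = c(10, 20, 30, 5, 15, 40)
)

# matrix() fills column by column: each column is one island,
# each row is one status group
tab_matrix <- matrix(toy$N_observed, ncol = 2,
                     dimnames = list(Status = levels(toy$Status),
                                     Island = c("A", "B")))

# xtabs() sums N_observed for every Status x Island combination,
# so the result does not depend on how the rows are sorted
tab_xtabs <- xtabs(N_observed ~ Status + Island, data = toy)

tab_matrix
tab_xtabs

# Reversing the row order leaves the xtabs() table unchanged
all(xtabs(N_observed ~ Status + Island, data = toy[6:1, ]) == tab_xtabs)
```

Setting the factor levels of `Status` controls the row order of the `xtabs()` table; without it, the rows would be sorted alphabetically.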

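# Shortcut for calculating expected frequencies: each cell's expected count is
# (row total * column total) / grand total, and multiplying a column of row
# totals by a row of column totals forms every such product at once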
N_expected <- (matrix(rowSums(fig2_table1), ncol = 1) %*% 
    matrix(colSums(fig2_table1), nrow = 1)) / 
  sum(fig2_table1)

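# c() flattens the matrix column by column; this lines up with the rows of
# fig2 only because fig2 is sorted by Island, then Status
fig2$N_expected <- c(N_expected)

# Each cell's contribution to the chi-squared statistic, then their sum
fig2$chisq <- (fig2$N_expected - fig2$N_observed) ^ 2 / fig2$N_expected
test_stat <- sum(fig2$chisq)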
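The matrix-product shortcut above is equivalent to `outer()` applied to the row and column totals. A quick sketch on a hypothetical 3 x 2 table (made-up counts) computes the expected counts and the statistic by hand and checks both against base R's `chisq.test()`, which reports them:

```r
# Hypothetical 3 x 2 table of counts (made up; rows = status, columns = island)
obs <- matrix(c(10, 20, 30, 5, 15, 40), ncol = 2,
              dimnames = list(Status = c("Endemic", "Native", "Alien"),
                              Island = c("A", "B")))

# Expected count in each cell: row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Each cell's contribution, summed into the Pearson chi-squared statistic
chisq_stat <- sum((obs - expected)^2 / expected)

# chisq.test() reports the same expected counts and statistic; its continuity
# correction only applies to 2 x 2 tables, so it does not affect this table
check <- chisq.test(obs)
all.equal(unname(expected), unname(check$expected))
all.equal(chisq_stat, unname(check$statistic))
```

For real data, `chisq.test()` on `fig2_table1` is a convenient cross-check of the hand calculation.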

# 3. Generate the null distribution ----

# Degrees of freedom = (number of rows - 1) * (number of columns - 1)
df <- (nrow(fig2_table1) - 1) * (ncol(fig2_table1) - 1)

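# Grid of chi-squared values; dchisq() below returns the probability density
# (not a probability) of the null distribution at each value
sampling_dist <- data.frame(chisq = seq(0, 70, 0.1))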
sampling_dist$Probability <-
  dchisq(
    x = sampling_dist$chisq,
    df = df
  )

head(sampling_dist)

gp <- ggplot(sampling_dist, aes(chisq, Probability)) +
  geom_area(fill = "tomato", color = "black") +
  xlab(expression(chi^2)) +
  theme_bw()

gp

# linewidth replaces the deprecated size argument for lines in ggplot2 >= 3.4.0
gp + geom_vline(xintercept = test_stat, linewidth = 2)
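The theoretical $\chi^2$ curve is an approximation to the null distribution. One way to see where it comes from is to simulate $H_0$ directly: base R's `r2dtable()` draws random tables that keep the observed row and column totals, which is what independence predicts. This sketch uses a hypothetical 3 x 2 table (made-up counts), so its $df$ is 2, not the 6 of the demo:

```r
set.seed(1)

# Hypothetical 3 x 2 table of counts (rows = status, columns = island)
obs <- matrix(c(10, 20, 30, 5, 15, 40), ncol = 2)
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)  # (3 - 1) * (2 - 1) = 2

# 10,000 random tables with the same margins as obs, i.e., tables
# generated with no association between rows and columns
null_tables <- r2dtable(10000, rowSums(obs), colSums(obs))

# Chi-squared statistic for each simulated table
null_stats <- sapply(null_tables, function(tab) sum((tab - expected)^2 / expected))

# Simulated vs. theoretical 95th percentile (they should be close)
quantile(null_stats, 0.95)
qchisq(0.95, df = df)
```

A histogram of `null_stats` overlaid on `dchisq(x, df = 2)` shows the same agreement visually.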

# 4. Find a critical value at specified alpha ----

alpha <- 0.05

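# Save the critical value instead of retyping it (12.59159 when df = 6)
crit_value <- qchisq(
  p = alpha,
  df = df,
  lower.tail = FALSE
)

crit_value

gp + geom_vline(xintercept = crit_value, linewidth = 2)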
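`qchisq()` and `pchisq()` are inverses of each other, so comparing the statistic with the critical value and comparing the P-value with $\alpha$ always give the same decision. A small check using the demo's $df = 6$ and a few hypothetical statistic values (not results from the data):

```r
alpha <- 0.05
df <- 6  # (3 status groups - 1) * (4 islands - 1), as in the demo

# Critical value: the point with an upper-tail area of alpha
crit <- qchisq(alpha, df = df, lower.tail = FALSE)

# Round trip: the upper-tail area beyond the critical value is alpha
pchisq(crit, df = df, lower.tail = FALSE)

# Hypothetical test statistics
stats <- c(5, 12, 13, 20)
reject_by_crit <- stats >= crit
reject_by_p <- pchisq(stats, df = df, lower.tail = FALSE) <= alpha
reject_by_crit
reject_by_p
```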

# 5. Find the P-value ----

P_value <- pchisq(
  q = test_stat,
  df = df,
  lower.tail = FALSE
)

P_value
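The P-value is the area under the null density to the right of the observed statistic, i.e., the region beyond the vertical line in the step 3 plot. A short check with a hypothetical statistic (not the demo's value) confirms that numerically integrating `dchisq()` gives the same number as `pchisq(..., lower.tail = FALSE)`:

```r
df <- 6
test_stat <- 15  # hypothetical statistic, NOT the value from the demo

# Upper-tail probability from the cumulative distribution function
p_cdf <- pchisq(test_stat, df = df, lower.tail = FALSE)

# Same quantity as the area under the density to the right of the statistic
p_area <- integrate(dchisq, lower = test_stat, upper = Inf, df = df)$value

all.equal(p_cdf, p_area, tolerance = 1e-6)
```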

# 6. Decide: TRUE means reject H_0 at level alpha ----
P_value < alpha
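The $\chi^2$ approximation becomes unreliable when expected counts are small (Whitlock & Schluter's rule of thumb: no expected count below 1, and no more than 20% of cells with an expected count below 5). Fisher's exact test, the other test introduced this week, avoids the approximation. A sketch on a hypothetical 2 x 2 table of small counts, using base R's `fisher.test()`:

```r
# Hypothetical 2 x 2 table of small counts (made up for illustration)
small <- matrix(c(3, 1, 1, 6), nrow = 2,
                dimnames = list(Group = c("X", "Y"), Outcome = c("Yes", "No")))

# Every expected count is below 5, so the chi-squared approximation is
# doubtful here (chisq.test() would warn about this)
expected_small <- outer(rowSums(small), colSums(small)) / sum(small)
expected_small

# Fisher's exact test computes the P-value from the hypergeometric
# distribution instead of the chi-squared approximation
fisher_res <- fisher.test(small)
fisher_res$p.value
fisher_res$p.value < 0.05
```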

Thursday, April 9
