12 points total for the homework

Problem 1 - 3 pts

The p-value is the probability of obtaining the observed data, or data more extreme, assuming the null hypothesis is true.
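
As a rough illustration of that definition, the sketch below estimates a p-value by permutation. The data are simulated and hypothetical (the names x, y, observed, and null_stats are not part of the homework), and the absolute correlation is just one possible choice of test statistic.

```r
# Hypothetical simulated data (an assumption, not the homework dataset)
set.seed(1)
x <- rnorm(50)
y <- rnorm(50)

# Observed test statistic: absolute value of the correlation
observed <- abs(cor(x, y))

# Null distribution: shuffling y breaks any association with x,
# so each permuted statistic is what we would expect under the null
null_stats <- replicate(2000, abs(cor(x, sample(y))))

# p-value: proportion of null statistics at least as extreme as the observed one
p_value <- mean(null_stats >= observed)
p_value
```

The proportion in the last step is the "observed or more extreme" part of the definition, computed against statistics generated with the null hypothesis enforced by shuffling.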

Problem 2

part A - 2pts

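# read the dataset (tab/space delimited, first row holds column names)
foots <- read.table("http://hompal-stats.wabarr.com/datasets/american_feet.txt", header=TRUE)
library(ggplot2)

# scatterplot of foot length against height
thePlot <- qplot(height, 
            foot_length, 
            data=foots, 
            main="foot length by height in fake Americans", 
            ylab="foot length (cm)", 
            xlab="height (cm)")

# add a linear regression line with its confidence band
thePlot + 
  stat_smooth(method="lm") + 
  theme_bw(18)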

part B - 2pts

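# preallocate a vector to hold the permuted test statistics
cors <- numeric(10000)
for(each in 1:10000){
  # shuffle foot_length to break its pairing with height, then store |r|
  cors[each] <- abs(cor(foots$height, sample(foots$foot_length)))
}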

part C - 1pt

qplot(cors, 
      main="test statistics from 10,000 iterations", 
      xlab="absolute value of correlation coefficient") + 
  theme_bw(18)