## Problem # 1

Using this fake data set on lemur feeding rates do a two-factor ANOVA to determine whether feeding rate is influenced by sex and by the group each individual belongs to.

## A. Make a boxplot of feeding rate by sex

library(ggplot2)
theme_set(theme_bw(30))
qplot(x=sex, y=feedingrate, data=lemurs, geom="boxplot", fill=sex)

## B. Make a boxplot of feeding rate by group

qplot(x=group, y=feedingrate, data=lemurs, geom="boxplot", fill=group)

## C. Show the ANOVA table for the two way ANOVA with the interaction term.

anova(lm(feedingrate ~ group * sex, data=lemurs))
## Analysis of Variance Table
##
## Response: feedingrate
##            Df Sum Sq Mean Sq F value    Pr(>F)
## group       2  467.7  233.85  4.5898 0.0121018 *
## sex         1 3077.2 3077.20 60.3967 3.732e-12 ***
## group:sex   2 1011.9  505.97  9.9307 0.0001057 ***
## Residuals 114 5808.3   50.95
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## D. Explore the interaction between the two factors by creating a plot analogous to figure 10.3 (p 329) in the Gotelli book. Put the group on the X axis, and the mean feeding rate for each group on the Y axis (make this a line going across all factor levels). Color code the lines by sex.

library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
##     filter, lag
## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union
groupedLemurs <- lemurs %>% group_by(group, sex) %>% summarize(mean=mean(feedingrate))
qplot(x=group, y=mean, data=groupedLemurs, group=sex, geom="line", color=sex)

## E. Explain whether or not there is an interaction between the two factors.

Heck yeah there is. These lines are not parallel, so there is a non-additive interaction between sex and group.

# Write a function to simulate ANOVA type data……

doANOVA <- function(sampleSize=20, grandMean=50, errorSD=5, meanDiff=3) {
library(ggplot2)
myError <- rnorm(sampleSize, mean = 0, sd = errorSD)
groups <- rep(c("groupX", "groupY"), sampleSize/2)
y <- grandMean + c((meanDiff/2)*-1, (meanDiff/2)) + myError
print(qplot(x=groups, y=y, geom="boxplot"))
myModel <- lm(y~groups)
return(pf(summary(myModel)$fstatistic[1],summary(myModel)$fstatistic[2],summary(myModel)\$fstatistic[3], lower.tail = FALSE))
}

# what effect do these parameters have on the p value?

1. sample size - increasing n decreases the p value
2. grand mean - no effect
3. standard deviation of the error term - increasing error variance (sd) increases the p value