This is solely to help you prepare your R skills for the exam.

This is NOT a comprehensive exam review.

Theoretical topics are important too!

Subsetting, Indexing, Summary Stats

We will use the built-in mtcars dataset, which already exists in R (just type mtcars to see it).

It contains data on 32 vehicles, including the model (stored as the row names), miles per gallon (mpg), number of cylinders (cyl), engine displacement in cubic inches (disp), and weight in thousands of pounds (wt), along with other variables such as horsepower (hp).
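A quick way to look at the data before starting on the exercises. The vehicle model names are the row names of the data frame rather than a column, and ?mtcars documents every variable and its units.

```r
# Inspect the built-in mtcars data frame (no loading required)
str(mtcars)            # 32 rows, 11 numeric columns
head(mtcars)           # the first six vehicles
rownames(mtcars)[1:3]  # model names are stored as row names, not as a column
```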

  • Make a new data frame that includes only 6-cylinder vehicles (keep all columns)
  • How many vehicles have a displacement higher than 300?
  • What are the mean horsepower (hp) ratings for vehicles with 4, 6, and 8 cylinders (cyl)?
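One base-R way to approach these three tasks, as a sketch using logical indexing and tapply(). Other approaches, such as subset(mtcars, cyl == 6) or aggregate(hp ~ cyl, data = mtcars, FUN = mean), work equally well.

```r
# 1. Keep rows where cyl == 6; the empty column index keeps every column
six_cyl <- mtcars[mtcars$cyl == 6, ]

# 2. Count vehicles with displacement above 300; sum() treats TRUE as 1
n_big_disp <- sum(mtcars$disp > 300)

# 3. Mean horsepower within each cylinder group (4, 6, and 8)
mean_hp <- tapply(mtcars$hp, mtcars$cyl, mean)

six_cyl
n_big_disp
mean_hp
```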

For loops

Write a for loop to conduct a basic power analysis that answers the question: what is the statistical power of a simple t-test designed to detect whether two group means differ from one another?

To run the t-test, call t.test(x, y); you can extract the p-value from the result by name, e.g., t.test(x, y)$p.value.
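A single simulated comparison, showing the pieces that the loop will repeat. The exercise does not give a standard deviation, so this sketch assumes rnorm()'s default of sd = 1.

```r
# One simulated comparison: two samples of 7 (rnorm's default sd is 1)
x <- rnorm(7, mean = 10)
y <- rnorm(7, mean = 12)

result <- t.test(x, y)  # Welch two-sample t-test (var.equal = FALSE by default)
p <- result$p.value     # extract the p-value by name
p
```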

For each iteration of your for loop, simulate a vector x of 7 values drawn from a normal distribution with a mean of 10, then simulate a vector y of 7 values drawn from a normal distribution with a mean of 12. Use the t-test to calculate the p-value comparing x and y, and store it. Repeat this 1000 times.

Now calculate the false negative rate \(\beta\): the proportion of the 1000 t-tests that return a p-value greater than 0.05 (remember, you know the group means really are different, because you simulated them that way). Finally, calculate the statistical power of this t-test, for these sample sizes and this effect size, as \(1 - \beta\).
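A minimal sketch of the whole power simulation, assuming rnorm()'s default standard deviation of 1 since the exercise does not specify one. The seed value is arbitrary and only makes the result reproducible; a different seed will give a slightly different power estimate.

```r
set.seed(42)                 # arbitrary seed, for reproducibility only
n_sims <- 1000
p_values <- numeric(n_sims)  # preallocate storage for every p-value

for (i in seq_len(n_sims)) {
  x <- rnorm(7, mean = 10)   # group 1: n = 7, mean 10 (default sd = 1)
  y <- rnorm(7, mean = 12)   # group 2: n = 7, mean 12 (default sd = 1)
  p_values[i] <- t.test(x, y)$p.value
}

# beta: proportion of tests that FAIL to detect the real difference
beta <- sum(p_values > 0.05) / n_sims
stat_power <- 1 - beta
stat_power
```

Dividing the count by n_sims is what turns "how many tests missed" into a rate; mean(p_values > 0.05) is an equivalent one-liner.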

Regression

  • Read this dataset into R and save it to a variable called ungulates: http://hompal-stats.wabarr.com/datasets/ungs.txt
  • Do an OLS regression of the log of brain size as a function of the log of body mass
  • Be sure you can pick out and interpret the overall p-value, the \(r^2\) value, the F-statistic, the slope, the intercept, etc.
  • Make a scatter plot of these two variables and add the regression line to the plot. Color-code the points by the diet of each species, and include a legend.
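A minimal sketch of the regression workflow, written as a hypothetical helper function, fit_and_plot(), so it can run on any data frame. The column names brain, mass, and diet are assumed placeholders, not taken from ungs.txt: check names(ungulates) after reading the file and pass the real names. The usage line also assumes the file has a header row and whitespace-separated columns, which is what read.table(..., header = TRUE) expects.

```r
# HYPOTHETICAL helper: the default column names are placeholders, not the
# real names in ungs.txt; pass the actual names as arguments.
fit_and_plot <- function(df, brain = "brain", mass = "mass", diet = "diet") {
  log_brain <- log(df[[brain]])
  log_mass  <- log(df[[mass]])

  # OLS regression: log(brain size) as a function of log(body mass)
  fit <- lm(log_brain ~ log_mass)
  print(summary(fit))  # coefficients, R-squared, F-statistic, overall p-value

  # One color per diet category, taken from the default palette
  diets <- factor(df[[diet]])
  cols  <- seq_along(levels(diets))
  plot(log_mass, log_brain,
       col = cols[as.integer(diets)], pch = 19,
       xlab = "log(body mass)", ylab = "log(brain size)")
  abline(fit)  # add the fitted regression line
  legend("topleft", legend = levels(diets), col = cols, pch = 19)

  invisible(fit)
}

# Usage (assumes a header row and whitespace-separated columns):
# ungulates <- read.table("http://hompal-stats.wabarr.com/datasets/ungs.txt", header = TRUE)
# fit <- fit_and_plot(ungulates, brain = "...", mass = "...", diet = "...")
```

In the printed summary, the intercept and slope appear in the Coefficients table, \(r^2\) is the "Multiple R-squared" value, and the last line gives the F-statistic with the overall p-value. To pull single numbers out, summary(fit)$r.squared and coef(fit) are handy.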

Writing simple functions

  • Write a simple function that accepts a single numeric vector argument called X.
    • The function should return a count of how many values in X are greater than the mean value of X

Example: if you run your function on the vector c(16.86350, 17.27193, 16.29412, 21.59090, 14.93115, 20.33967, 22.47104), you should get a return value of 3, because the mean of that vector is about 18.54 and 3 values are greater than this mean.
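One short way to write this, as a sketch. The function name count_above_mean is hypothetical, since the exercise only specifies the argument X. Comparing a vector to a single number produces a logical vector, and sum() counts its TRUE values.

```r
# Hypothetical name: the exercise only specifies the argument X
count_above_mean <- function(X) {
  # X > mean(X) is a logical vector; sum() counts the TRUE values
  sum(X > mean(X))
}

count_above_mean(c(16.86350, 17.27193, 16.29412, 21.59090, 14.93115, 20.33967, 22.47104))
# returns 3
```

If X can contain missing values, mean(X) returns NA; passing na.rm = TRUE to both mean() and sum() is one way to handle that.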