Problem 1

  1. Load Howells’ craniometric dataset.
  2. Get rid of unwanted data. Select only the following columns: ID, Sex, Population, BNL, MDH, EKB, ZOR, BAA, NBA. Filter the dataset to only include individuals from the following populations: BUSHMAN, PERU, NORSE, ZULU. Use this filtered dataset for all remaining questions in this homework.
  3. Calculate the variance-covariance matrix for the 6 numeric variables (BNL, MDH, EKB, ZOR, BAA, NBA).
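
The steps above could be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the data frame below is filled with synthetic random numbers so that the code runs on its own, the extra population (EGYPT) and extra column (GOL) are there only to show the filtering, and the file name in the comment is hypothetical. The object names (howell, dat, vcov_mat) are likewise just placeholders.

```r
# Toy stand-in for the real data (synthetic values, NOT Howells' measurements).
# With the real file you would instead load it, for example:
#   howell <- read.csv("Howell.csv")   # file name is an assumption
set.seed(1)
n <- 50
howell <- data.frame(
  ID         = seq_len(n),
  Sex        = sample(c("M", "F"), n, replace = TRUE),
  Population = sample(c("BUSHMAN", "PERU", "NORSE", "ZULU", "EGYPT"),
                      n, replace = TRUE),
  GOL = rnorm(n, 180, 8),   # an extra column that should be dropped
  BNL = rnorm(n, 100, 5), MDH = rnorm(n, 28, 3), EKB = rnorm(n, 97, 4),
  ZOR = rnorm(n, 80, 4),  BAA = rnorm(n, 40, 3), NBA = rnorm(n, 78, 4)
)

keep_cols <- c("ID", "Sex", "Population",
               "BNL", "MDH", "EKB", "ZOR", "BAA", "NBA")
keep_pops <- c("BUSHMAN", "PERU", "NORSE", "ZULU")
num_vars  <- c("BNL", "MDH", "EKB", "ZOR", "BAA", "NBA")

# Step 2: keep only the requested populations (rows) and columns
dat <- howell[howell$Population %in% keep_pops, keep_cols]

# Step 3: 6 x 6 variance-covariance matrix of the numeric variables
vcov_mat <- cov(dat[, num_vars])
vcov_mat
```

The diagonal of vcov_mat holds the variances of the six variables and the off-diagonal entries hold their pairwise covariances.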

Problem 2

  1. Compute a Euclidean distance matrix for the 6 numeric variables. Save this to a variable. DO NOT PRINT THIS TO THE SCREEN OR INCLUDE IT IN THE OUTPUT.
  2. Perform a hierarchical cluster analysis using the distance matrix you just computed. Use the hclust() function for this.
  3. Use the plot() function to plot the cluster analysis dendrogram.
  4. Interpret the plot visually. Which single specimen (identified on the plot by its data-frame row number) is most distinct from all other specimens?
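
A minimal sketch of the clustering workflow above, again on synthetic random numbers rather than the real measurements, so the dendrogram it draws says nothing about the actual specimens. The object names (dat, d, hc) are placeholders, and the plot arguments are optional cosmetic choices.

```r
# Synthetic stand-in for the filtered data (NOT real measurements)
set.seed(1)
n <- 40
num_vars <- c("BNL", "MDH", "EKB", "ZOR", "BAA", "NBA")
dat <- as.data.frame(matrix(rnorm(n * 6, mean = 80, sd = 5), ncol = 6,
                            dimnames = list(NULL, num_vars)))

# Step 1: Euclidean distance matrix, assigned to a variable and not printed
d <- dist(dat[, num_vars], method = "euclidean")

# Step 2: hierarchical clustering (hclust() uses complete linkage by default)
hc <- hclust(d)

# Step 3: dendrogram; each leaf is labelled with its data-frame row name
plot(hc, cex = 0.6, main = "Cluster dendrogram", xlab = "", sub = "")
```

When reading the dendrogram, the most distinct specimen is typically the leaf that joins the rest of the tree at the greatest height. If the data were filtered with base R subsetting, the leaf labels are the row names carried over from the unfiltered data frame.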

Problem 3

  1. Perform a Principal Components Analysis on the 6 numeric variables. Make sure to use scale=TRUE to scale your variables!
  2. Which single variable has the strongest loading on PC1?
  3. What is the cumulative proportion of variance explained by PC1 and PC2?
  4. Make a plot of the PC1 scores against the PC2 scores. Color-code the points by the Population variable.
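
The PCA steps above might look like the following sketch, assuming prcomp() is the function used (the assignment does not name one). The data are synthetic random numbers, so the loadings and variance proportions it produces are not the answers for the real dataset. The object names (dat, pca, top_var, cum_12) are placeholders.

```r
# Synthetic stand-in for the filtered data (NOT real measurements)
set.seed(1)
n <- 60
num_vars <- c("BNL", "MDH", "EKB", "ZOR", "BAA", "NBA")
dat <- data.frame(
  Population = factor(sample(c("BUSHMAN", "PERU", "NORSE", "ZULU"),
                             n, replace = TRUE)),
  matrix(rnorm(n * 6, mean = 80, sd = 5), ncol = 6,
         dimnames = list(NULL, num_vars))
)

# Step 1: PCA on standardized variables. prcomp() spells the argument
# `scale.`; scale = TRUE also reaches it through partial argument matching.
pca <- prcomp(dat[, num_vars], scale. = TRUE)

# Step 2: variable with the largest absolute loading on PC1
top_var <- names(which.max(abs(pca$rotation[, "PC1"])))
top_var

# Step 3: cumulative proportion of variance explained by PC1 and PC2
cum_12 <- summary(pca)$importance["Cumulative Proportion", "PC2"]
cum_12

# Step 4: PC1 vs PC2 scores, colored by population
plot(pca$x[, "PC1"], pca$x[, "PC2"],
     col = as.integer(dat$Population), pch = 19,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(dat$Population),
       col = seq_along(levels(dat$Population)), pch = 19)
```

Loadings are compared by absolute value because the sign of a principal component is arbitrary.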