This homework assignment asks students to:
1) Perform principal component analysis (PCA) and clustering on personality data and interpret the results.
2) Examine factor loadings and eigenvalues to determine the number of factors to retain from the PCA.
3) Compare k-means and hierarchical clustering results to interpret personality clusters.
4) Analyze and interpret the factor and cluster solutions to understand patterns in the personality data.
This homework assignment asks students to:
1) Perform principal component analysis (PCA) and clustering on personality data and interpret the results.
2) Examine factor loadings and eigenvalues to determine the number of factors to retain from the PCA.
3) Compare k-means and hierarchical clustering results to interpret personality clusters.
4) Analyze and interpret the factor and cluster solutions to understand patterns in the personality data.
This homework assignment asks students to:
1) Perform principal component analysis (PCA) and clustering on personality data and interpret the results.
2) Examine factor loadings and eigenvalues to determine the number of factors to retain from the PCA.
3) Compare k-means and hierarchical clustering results to interpret personality clusters.
4) Analyze and interpret the factor and cluster solutions to understand patterns in the personality data.
Introduction to Computational Statistics (DSSH 6301),
Homework 11, Assignments
For all problems, please show all your work. As described in the Homework Guidelines, use RMarkdown to write up your work as a .Rmd file, knit the result to a PDF file, and submit that PDF file to Blackboard. (You can also knit to an HTML or Word document and save that as a PDF, as decribed in the Homework Guidelines. Be sure to use R code for all your calculations, and the LATEX equation format to write up any math. See the Homework Guidelines in Course Resources on Blackboard for more formatting details. NOTES: include question number and text in your solutions, provide solutions in the same order as given in the assignment some problems may have multiple solutions. If you did not have points taken o for a question, we considered your solution correct be concise use minimum necessary number of signigficant digits; round() and signif() are your friends follow Googles R Style Guide regex sandbox R: r-project R-intro short-intro refcard formatR RStudio RMarkdown RevolutionR swirl LATEX: latex-project wiki sheet sandbox hostmath knitr::kable tables generator detexify tcolorbox Please perform a principal component analysis (PCA) and clustering using bfi data set in the psych package (25 personality items thought to boil down to a few core personality types) and interpret the results. You can load the data using, data(bfi) after loading the psych package; you may need to clean it a bit first with na.omit() to remove the observations with NA items, or else impute those missing items. It might also help to use scale() on your dataset before analysis. scale() takes all your variables (columns) and rescales them to have a mean of 0 and a sd of 1, so that you can more easily compare all your principal components or clusters to see which are larger or smaller. 1. Examine the factor eigenvalues or variances (or the sdev or standard deviations as reported by prcomp or princomp, which you then need to square to get the variances). Plot these in a scree plot and use the elbow test to guess how many factors should be retained. What proportion of the total variance does your subset of variables explain? 2. Examine the loadings of the PCs on the variables (sometimes called the rotation in the function output) - ie, the projection of the principal components on the variables - focusing on just the first one or two PCs. Sort the variables by their loadings, and try to interpret what the first one or two PCs factors mean. This may require looking more carefully into the dataset to understand exactly what each of the variables were measuring. You can find more about the data in the psych package using ?psych or visiting http://personality-project.org 3. First use k-means and examine the centers of the first two or three clusters. How are they similar to and dierent from the factor loadings of the first couple factors? 4. Next use hierarchical clustering. Print the dendrogram, and use that to guide your choice of the number of clusters. Use cutree to generate a list of which clusters each observation belongs to. Aggregate the data by cluster and then examine those centers (the aggregate means) as you did in (3). Can you interpret all of them meaningfully using the methods from (3) to look at the centers? 5. From the factor and cluster analysis, what can you say more generally about what you have learned about your data?