
Principal Component Analysis (PCA) is a statistical tool for analyzing complex datasets. I had to learn it recently for a project the risk team is working on, and had trouble finding the right materials to teach myself. Everything I could find was either way over my head and assumed a lot of complex linear algebra knowledge, or too simplistic to give me confidence that I really understood what was going on “under the hood.” So…I created my own guide. Just in case anyone is interested in learning PCA (or the basics of linear algebra and matrix math), this is a good guide that assumes very little prior math training.


Principal Component Analysis: An Introduction

Ari Paul
The University of Chicago, Office of Investments
April 15th, 2015

Overview

What is Principal Component Analysis (PCA)?

PCA is a tool for analyzing a complex data set and re-expressing it in simpler terms. The textbook definition

is: a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly

correlated variables into a set of values of linearly uncorrelated variables called principal components. It

replaces a set of observations with a new dataset that may better explain the underlying dynamics of the

system with less data.

Before diving into the meat of PCA, I'll provide 4 slides on Linear Algebra, a necessary toolkit for

understanding and performing PCA.

Outline:

Linear Algebra
- Terminology and Basic Matrix Math (Slide 3)
- Matrices (Slide 4)
- Determinants and Characteristic Equation (Slide 5)
- Eigenvectors and Eigenvalues (Slide 6)

PCA
- Introduction (Slide 7)
- Step 1: Centering and Scaling (Slide 8)
- Step 2: Covariance Matrix (Slide 8)
- Step 3: Calculating Eigenvectors (Slide 9)
- Step 3 (cont.): Eigenvector Explanation (Slide 10)
- Step 4: Re-express the Data (Slide 11)
- Interpretation (Slide 12)
- Assumptions and Pitfalls (Slide 13)

Appendices
- Appendix: Algebraic Support for Eigenvectors (Slide 14)

[Figure: example data re-expressed from the x-axis and y-axis to Principal Components 1 and 2. This delineation better captures the dynamic of high variance along the PC 1 axis, and low variance along the PC 2 axis.]

Terminology and Basic Matrix Math

Matrix

A single real number is a scalar (e.g. 7). A scalar can be thought of as a matrix with one row and one

column. A vector is a matrix with one row and multiple columns, or one column and multiple rows. [4 0 2

1] is a vector with one row and four columns. This can be referred to as a 1x4 matrix, or m x n matrix more

broadly, where m is the # of rows, and n is the # of columns. We can refer to a specific number within a

matrix using an i subscript to refer to the row and j to refer to the column. In the vector example given

above, V1,3 refers to the 2.
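To make the indexing notation concrete, here is a small sketch in Matlab (the variable name V is just for illustration):

V = [4 0 2 1]     % a 1x4 matrix (one row, four columns)
V(1,3)            % returns 2: the entry in row i = 1, column j = 3
size(V)           % returns [1 4], i.e. m = 1 and n = 4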

Matrix and Vector Math

Adding and subtracting matrices is easy, but they must be of equal dimensions. With matrices of equal

dimensions, you simply add or subtract the number in the same position in each matrix.

Multiplying a matrix by a matrix is a bit more complicated. The matrices can be different sizes, but the

number of columns of the first matrix must equal the number of rows of the second matrix. The resulting

matrix always has the number of rows of the first matrix and the number of columns of the second matrix.

Each entry of the product is the dot product of the corresponding row of the first matrix with the corresponding column of the second. Matrix multiplication is NOT commutative: A x B ≠ B x A. Multiplying a vector by a matrix works the same way, since vectors are just matrices with either a single row or column.
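A short Matlab sketch of these rules (the matrices are made up for illustration):

A = [1 2 3; 4 5 6]      % 2x3 matrix
B = [1 0; 0 1; 2 2]     % 3x2 matrix: A's 3 columns match B's 3 rows
A * B                   % valid; the result is 2x2 (rows of A by columns of B)
B * A                   % also valid here, but the result is 3x3, so A*B and B*A differ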

This is a special matrix called the identity matrix. It has 1s along the diagonal and zeroes

everywhere else. It is square, but can be of any dimension. It has the interesting property that any matrix A

multiplied by it, will equal itself. A x I = A.

Calculating inverses (reciprocals) of matrices is tricky, but all we need to know for basic PCA is that a matrix multiplied by its inverse will always equal the identity matrix, I. A x A⁻¹ = I. For the inverse of a matrix to exist, the matrix must be square (m = n), and it must have a non-zero determinant.
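A quick Matlab check of both properties (any invertible square matrix will do; this one is an arbitrary example):

A = [2 1; 1 2]    % square, with non-zero determinant (det = 3)
A * eye(2)        % multiplying by the 2x2 identity matrix returns A unchanged
A * inv(A)        % returns the 2x2 identity matrix (up to rounding error)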

Determinants and Characteristic Equation

Calculating the determinant of a matrix is an important step in PCA. It's easy for 2x2 matrices, but for anything larger than 3x3, it's prohibitively time consuming to do by hand. For a 2x2 matrix with rows (a, b) and (c, d):

|A| = ad - bc

We use the determinant to identify eigenvalues. The importance of this step will be clear once we dive into the actual PCA, but first I wanted to introduce the math.

Let's start with matrix A. Subtracting λ from each diagonal entry and setting the determinant of the result equal to zero, |A - λI| = 0, gives us the characteristic equation of matrix A. The purpose of this exercise is to identify the polynomial whose zeroes are the eigenvalues of matrix A. We'll dive into eigenvalues next.
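The matrix from the original slide is not reproduced here, so as a hypothetical stand-in consider A = [2 1; 1 2]. Subtracting λ from the diagonal and taking the determinant gives (2 - λ)(2 - λ) - (1)(1) = λ² - 4λ + 3, so the characteristic equation is λ² - 4λ + 3 = 0. A Matlab check:

A = [2 1; 1 2]    % hypothetical example matrix (an assumption, not the slide's matrix)
eig(A)            % returns 1 and 3, the zeroes of lambda^2 - 4*lambda + 3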

Eigenvectors and Eigenvalues

Geometrically, an eigenvector is the direction of a transformation, and an eigenvalue is the

associated stretch. In the shear mapping of the Mona Lisa to the right, the blue line is

an eigenvector because its direction does not change. Its eigenvalue is 1 because its length

does not change. The red line is not an eigenvector, because its direction changes.

In Linear Algebra terminology, an eigenvector is a vector that points in a direction which is invariant under

the associated linear transformation.

The defining equation is A x v = λ x v. Here λ is a real number (scalar) known as the eigenvalue, v is the eigenvector, and A is the original matrix. This equation says that a vector v is an eigenvector of matrix A if the resulting product can be restated as a scalar multiple of v. The intuition here is that the vector v is being scaled or stretched by matrix A, but is not otherwise changing (i.e. its direction is unaffected). This will be clearer with an example.

Start with matrix A. Its characteristic equation |A - λI| = 0 has roots λ = 1 and λ = 3; these are A's eigenvalues. To find the eigenvectors, we use the roots to solve the equation (A - λI)v = 0.

For λ = 3:

By setting the determinant equal to zero and then finding the roots, we guarantee that the resulting eigenvectors will be orthogonal.
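Continuing the hypothetical stand-in A = [2 1; 1 2] from above (an assumption chosen because it shares the roots λ = 1 and λ = 3): for λ = 3, (A - 3I)v = 0 forces the two entries of v to be equal, giving v = (1, 1); for λ = 1 it forces them to be opposite, giving v = (1, -1). The two eigenvectors are perpendicular, as promised. In Matlab:

A = [2 1; 1 2]        % hypothetical example matrix
[V, D] = eig(A)       % columns of V are the eigenvectors, diag(D) the eigenvalues
V(:,1)' * V(:,2)      % the dot product is 0: the eigenvectors are orthogonal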

PCA - Introduction

Having established a mathematical framework, we can now work through the PCA example step by step.

Our Example:

I find PCA easiest to understand via example, so I'll walk through a calculation using a very simple data set. Imagine that we measure the height, neck circumference, and armspan of 4 individuals. Intuitively, these 3 variables are likely to be correlated (e.g. a taller person is more likely to have a longer armspan, etc.). Our goal is to re-express the dataset using fewer variables to better and more simply capture the system's underlying dynamics. All measurements are in inches.

1. Center and scale the data

2. Calculate the covariance matrix

3. Calculate the eigenvalues and eigenvectors of the covariance matrix, and sort them by eigenvalue

4. Multiply the standardized data by the eigenvectors

1. Center and Scale the Data

This simple step requires subtracting the mean of each column vector from each datapoint and then dividing

by the standard deviation. This results in column vectors with mean zero and standard deviation of 1, and

allows for an apples to apples comparison of variables.
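A sketch of this step in Matlab, using made-up measurements (not the slide's actual data; rows are the 4 individuals, columns are height, neck, and armspan in inches):

X  = [66 14 65; 72 16 74; 63 13 62; 70 15 71]   % hypothetical raw data
X2 = zscore(X)                                   % equivalent to (X - mean(X)) ./ std(X)
mean(X2)                                         % each column mean is now 0 (to rounding)
std(X2)                                          % each column standard deviation is now 1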

PCA Step 2

2. Calculate the Covariance Matrix

Now we identify the covariance matrix of our centered and scaled dataset. The covariance matrix can be calculated as 1/(n-1) x XX^T, where X is our centered/scaled data with variables as rows (with observations as rows, it is 1/(n-1) x X^TX). The diagonals reflect the variance of each variable and the off-diagonals reflect the covariances between the variables. Since we normalized the data, the correlation matrix and covariance matrix will be equal by definition. From this, we can immediately notice that all of the variables are highly correlated to one another; the weakest correlation is 71.66%.
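Continuing the hypothetical X2 from the previous sketch, the covariance matrix can be computed directly (here X2 has observations as rows, so the matching formula is 1/(n-1) x X2^T x X2):

C = (X2' * X2) / (size(X2,1) - 1)   % 3x3 covariance matrix of the standardized data
% cov(X2) and corrcoef(X2) return the same matrix, since the data is centered and scaled;
% the diagonals are 1 and the off-diagonals are the pairwise correlations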

PCA Step 3

3. Calculate Eigenvectors and Eigenvalues, and Sort by Associated

Eigenvalue

Realistically you'll be using a software package like Matlab to calculate the eigenvectors and eigenvalues,

because it is prohibitively difficult to do by hand for larger matrices, but this example is simple enough for us to

work through in detail. This follows the process laid out on slides 5 and 6.

The equation A x v = λ x v is rewritten as (A - λI) x v = 0, and we find the eigenvalues by setting the determinant of (A - λI) equal to zero.

The determinant of a 3x3 matrix is |B| = a(ei - fh) - b(di - fg) + c(dh - eg) = aei - afh - bdi + bfg + cdh - ceg.

Using the characteristic equation of our covariance matrix, this is:

(t-1)(t-1)(t-1) - (t-1)(0.7166)^2 - (0.9676)^2(t-1) - (0.8534)^2(t-1) - (0.9676)(0.7166)(0.8534) - (0.8534)(0.9676)(0.7166)

Which further reduces to: t^3 - 3t^2 + 0.8219t - 0.0054

This polynomial has the roots: 2.6959, 0.0067, and 0.2974. These are the eigenvalues of the covariance

matrix, and their relative sizes reflect their relative importance. The eigenvalues will always sum to the

number of variables (in this case 3). The eigenvalues tell us that the first eigenvector will contain 89.9% of the

variance of the entire dataset, the second will contain just 0.2%, and the third 9.9%. We then solve for v as

shown on slide 6 and get the eigenvectors: (0.6052, 0.7779, 0.1688), (0.5770, -0.5748, -0.5802), and (0.5484,

-0.2537, 0.7968) respectively. If you graphed these vectors on a 3-dimensional plot, they would all be

perpendicular to one another.

We then sort the eigenvectors in order of their associated eigenvalue (highest eigenvalue first) and arrange them as columns, which gives the following matrix:

0.6052   0.5484   0.5770
0.7779  -0.2537  -0.5748
0.1688   0.7968  -0.5802
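In Matlab, all of step 3 can be sketched in a few lines (C is the covariance matrix from the step 2 sketch):

[V, D] = eig(C)                           % columns of V are eigenvectors, diag(D) the eigenvalues
[lambda, idx] = sort(diag(D), 'descend')  % sort the eigenvalues, largest first
V = V(:, idx)                             % reorder the eigenvector columns to match
lambda / sum(lambda)                      % fraction of total variance captured by each component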

3. (Continued) - Purpose and Properties of Eigenvalues and

Eigenvectors

To understand the purpose of using eigenvectors, let's take a step back to the covariance matrix of our

centered/scaled dataset. This covariance matrix reflects that each of our variables has a non-zero correlation

with one another. We want to re-express this data with a set of orthogonal (i.e. zero correlation) variables. This

will let us get a look into the distinct pure forces driving the system. We need to convert a set of correlated

variables into a set of uncorrelated variables; another way of saying the same is that we need to convert an

original covariance matrix with non-zero covariances into a new covariance matrix with only zero covariances

(aka diagonalize the matrix.) The eigenvectors of the covariance matrix are specifically chosen so that they

diagonalize it.

By construction, the eigenvectors are both orthogonal (i.e. perpendicular and uncorrelated), and orthonormal

(unit length 1). Additionally, the signs on each eigenvector are arbitrary, and different software platforms will

output results with different signs. If we flip the sign of every number in a particular eigenvector, its key

characteristics are unchanged. e.g. the vector (1, -3) has the same properties as (-1, 3) in this context; both of

these vectors are perpendicular to the same set of other vectors, and they are of equal length.

The sorting of the eigenvalues is a simple but important step that gets to the heart of PCA. The eigenvector

with the highest eigenvalue has the highest variance, and therefore the most explanatory power for the

original dataset. If you could only pick one vector to describe your original matrix, you'd select the eigenvector

with the largest eigenvalue, and this would capture as much of the information in the original dataset as could

possibly be captured in a single vector.

The eigenvectors are sometimes referred to as loadings or principal component coefficients within a PCA

context.
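Both properties are easy to verify numerically. With V the sorted eigenvector matrix from step 3 (a sketch, not output from the slide's data):

V' * V             % returns the identity matrix: the eigenvectors are orthonormal
V(:,1) = -V(:,1)   % flip the sign of the first eigenvector...
V' * V             % ...and it is still orthonormal; lengths and angles are unchanged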


PCA Step 4

4. Multiply the Standardized Data by the Eigenvectors

The ultimate goal of PCA is to re-express the original dataset, and now we're finally ready to take that step and to generate the actual principal components. We simply multiply our scaled and centered dataset by the matrix of eigenvectors that we generated in step 3. As a matrix operation, it's as simple as A x V. To generate a particular datapoint for Principal Component 1, we're adding together the first observation of each variable from the scaled/centered dataset, with each one loaded by the corresponding entry of the first eigenvector. So, if PCA1 is the first Principal Component vector and V represents our eigenvector matrix, then PCA1(1) = x1(v1,1) + y1(v2,1) + z1(v3,1).

Principal Component 1 is uncorrelated to Principal Component 2 by construction. Conceptually, we're re-expressing the original dataset in terms of the eigenvectors. Since the eigenvectors are perpendicular to one

another, the principal components are uncorrelated.
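In Matlab terms, with X2 the scaled/centered data and V the sorted eigenvector matrix from step 3 (continuing the earlier hypothetical sketch):

PC = X2 * V        % each column of PC is one principal component
corrcoef(PC)       % off-diagonals are ~0: the components are uncorrelated by construction
% e.g. PC(1,1) = X2(1,1)*V(1,1) + X2(1,2)*V(2,1) + X2(1,3)*V(3,1)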

To interpret the transformed data, it's useful to look at the correlation to the original (unscaled and uncentered) data. In the table below, we can see that the first principal component (PC1) is extremely similar to

Height, but also captures most of the information in Neck and Armspan; as we learned from the

eigenvalues, PC1 captures 89.9% of the total information of the dataset. PC2 seems to relate primarily to

Neck and Armspan, and PC3 is not closely connected to any of the three variables and adds almost no

information. We could discard the second and third principal components and still retain most of the

information of the original dataset.


PCA Interpretation

Looking at the original data, we might suspect that one or more of the variables may be redundant, since intuition suggests that a person's height, neck circumference, and armspan are all positively correlated.

The sorted eigenvalues tell us that PC 1 captures 89.9% of the dataset's information with just a single variable, and 99.8% with 2 variables. PC 1 is extremely similar to height, but also captures a substantial portion of the information in neck and armspan. If all we care about is comparing the general size of a population, PC 1 may be sufficient. A high PC 1 score means a person is almost certainly tall, and probably also has a large neck and armspan. PC 2 functions as a differentiating factor for neck and armspan; it helps distinguish people who are tall but with unusually short armspan, etc.

To summarize, we've taken a dataset with 3 variables and generated a set of eigenvectors that are perpendicular to one another. Then we used those eigenvectors to generate a new dataset consisting of three principal components with zero correlation. We have re-expressed the data from Height, Neck, and Armspan to PC 1, PC 2, and PC 3, where the latter variables are orthogonal. And this is a key point: by construction, PC 1 contains as much of the variance in the entire dataset as possible, so we have the option of ignoring PC 3 with hardly any loss of information, and we may even choose to ignore PC 2.


Assumptions

Linearity: While most real-world data is non-linear, a linear assumption still works reasonably well in

most cases. For systems where non-linearity is a key feature, there are extensions such as kernel

PCA that are more appropriate.

High Variance = Important Information: By sorting by the largest eigenvalue and often ignoring the

lower variance principal components, we are assuming that high variance reflects more important

information.

Gaussian distribution: PCA uses mean and variance in its construction, which implicitly assume a

Gaussian distribution.

Scaling

Throughout this text I've assumed that we'll always center and scale our data before performing PCA. The scaling is not strictly required. If you don't scale the data, you'll simply be incorporating the scale of the variables into the PCA analysis. For example, if variables a and b contain nearly identical data, but a is in minutes and b is in hours, failing to scale the data will result in a overwhelming b, since a will contain much more variance.
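A quick way to see the effect in Matlab (hypothetical durations; a is in minutes and b is the same information in hours):

a = [30; 45; 60; 90; 120]   % minutes
b = a / 60                  % hours
cov([a b])                  % a's variance dwarfs b's, so unscaled PCA would be dominated by a
corrcoef([a b])             % once scale is removed, the two variables are identical (correlation 1)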

Another consideration is that if the data is not scaled, then the correlation and covariance matrices

of the data will not be the same, and so you will have to choose between calculating eigenvectors on

the covariance matrix or on the correlation matrix. Usually the correlation matrix is preferable since

it is not sensitive to arbitrary units and makes comparing the results of PCA of various datasets more

meaningful. The covariance matrix is sometimes chosen because it makes statistical inference from the results of PCA on a sample to a broader population more effective.

Interpretation

The biggest problem with Principal Component Analysis is interpretation. The principal components

themselves will often have no intrinsic meaning or intuition behind them. In analysis involving many

complex variables, the eigenvectors will be complex and the meaning often obscure. In our

example, PC 1 was very highly correlated to height, but PC 3 could not be intuitively interpreted.


Appendix: Algebraic Support for Eigenvectors

The goal of PCA is to replace original dataset X with a new dataset Y that contains uncorrelated variables, i.e. the covariance matrix of Y, S_Y, must be diagonalized. First, note that the covariance matrix of Y can be calculated as 1/(n-1)YY^T. In PCA, we're looking to find some orthonormal matrix P where Y = PX such that the covariance matrix of Y is diagonalized. The rows of P are the principal components of X.

We begin by rewriting S_Y in terms of our variable of choice, P, and a new matrix A = XX^T:

S_Y = 1/(n-1)YY^T
    = 1/(n-1)(PX)(PX)^T
    = 1/(n-1)PXX^TP^T
    = 1/(n-1)P(XX^T)P^T
S_Y = 1/(n-1)PAP^T

A = XX^T is symmetric, so it is diagonalized by an orthogonal matrix of its eigenvectors: A = EDE^T, where D is a diagonal matrix and E is a matrix of eigenvectors of A. Now we define P as E^T and find that A = P^TDP.

When we then continue evaluating S_Y, we can see that this choice of P diagonalizes S_Y (recall that PP^T = I, since P is orthonormal):

S_Y = 1/(n-1)PAP^T
    = 1/(n-1)PP^TDPP^T
S_Y = 1/(n-1)D

This section comes directly from Jon Shlens, Tutorial on Principal Component Analysis (2003), an

excellent resource.
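A numeric sanity check of the derivation (a sketch; following the appendix convention of variables as rows):

X = randn(3, 100)        % 3 variables, 100 observations, variables as rows
A = X * X'               % the symmetric matrix A defined above
[E, D] = eig(A)          % A = E*D*E'
P = E'                   % choose P as the transpose of the eigenvector matrix
S_Y = P * A * P' / 99    % equals D/(n-1): diagonal, so this P diagonalizes the covariance of Y = PX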


Appendix: PCA in Matlab

Start with dataset X with variables as columns and observations as rows. Center and scale the data with the zscore function:

X2 = zscore(X)

The most direct way to perform PCA is with Matlab's aptly named pca() function.

[coeff,score,latent] = pca(X2)

This will output the eigenvectors of the covariance matrix of X2 as coeff sorted by eigenvalue, and

the principal components as score. The eigenvalues will be exported as latent.

You can break the steps of pca() down by using either the eig() or svd() functions. eig() operates directly on the covariance matrix, not the original data.

[V, D] = eig(cov(X2))

This will output the eigenvectors as V and the eigenvalues as D. The eigenvectors are unsorted relative to the pca() output (eig() returns eigenvalues in ascending order), and their signs may differ.

[U,S,V] = svd(X2)

will also output the eigenvectors as V within a singular value decomposition framework (the eigenvalues can be recovered from the squared singular values, divided by n-1).

You can manually generate pca()'s score output by simply multiplying the centered and scaled data by the matrix of eigenvectors:

X2 x coeff = score
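One more cross-check tying the three routes together (assuming X2 is the centered and scaled data from above):

[coeff, score, latent] = pca(X2)
[U, S, V] = svd(X2, 'econ')
n = size(X2, 1)
diag(S).^2 / (n - 1)     % equals latent: squared singular values over n-1 are the eigenvalues
% X2 * coeff reproduces score, and the columns of V match coeff up to sign and ordering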

Credits:

Tutorial on Principal Component Analysis (2003) by Jon Shlens

https://www.mathsisfun.com/ sections on linear algebra.

