
Assignment # 1

MULTIVARIATE STATISTICAL INFERENCE


(STAT-706)
SUBMITTED TO: SIR KASHIF
SUBMITTED BY: SUHAIL ASHRAF
EJAAZ RAFAQAT
NAZAKAT ALI
MPHIL STATISTICS
1st SEMESTER
Principal Component Analysis (PCA)
It is a dimension-reduction tool that can be used to reduce a large set of correlated variables to a small set of uncorrelated linear combinations, the principal components, that still contains most of the information (variation) in the large set.
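A minimal sketch of the computation behind this definition, assuming NumPy is available and that the components are taken from the sample covariance matrix S: the principal components are the eigenvectors of S, and their variances are the corresponding eigenvalues. The function name `pca_covariance` and the simulated data are illustrative only.

```python
import numpy as np


def pca_covariance(X):
    """Principal components of the sample covariance matrix of X (n rows, p columns).

    Returns the eigenvalues (component variances) in decreasing order and the
    matching unit-length eigenvectors as the columns of a p x p matrix.
    """
    X = np.asarray(X, dtype=float)
    S = np.cov(X, rowvar=False)           # p x p sample covariance matrix (divisor n - 1)
    eigvals, eigvecs = np.linalg.eigh(S)  # eigh: S is symmetric; eigenvalues come out ascending
    order = np.argsort(eigvals)[::-1]     # reorder so that lambda_1 >= lambda_2 >= ...
    return eigvals[order], eigvecs[:, order]


# Made-up data (NOT the census table): 100 observations on 3 variables,
# the first two driven by a common factor z.
rng = np.random.default_rng(0)
z = rng.normal(size=(100, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(100, 1)),
               2 * z + 0.1 * rng.normal(size=(100, 1)),
               rng.normal(size=(100, 1))])
lam, E = pca_covariance(X)
print(lam / lam.sum())  # proportion of total sample variance explained by each component
```

Because the first two simulated variables share the factor z, most of the total variance ends up on the first component.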

Objectives of principal component analysis


• PCA reduces attribute space from a larger number of variables to a smaller number of components and as such is a "non-dependent" procedure (that is, it does not assume a dependent variable is specified).

• PCA is a dimensionality reduction or data compression method. The goal is dimension reduction, and there is no guarantee that the dimensions are interpretable (a fact often not appreciated by (amateur) statisticians).

• PCA can be used to select a subset of variables from a larger set, keeping those original variables that have the highest correlations with the principal components.
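One way to carry out the third objective, sketched under the assumption that "highest correlation" means the largest absolute sample correlation between a component's scores and each original variable. `top_variable_per_component` is a hypothetical helper written for illustration, not a library function, and the data are simulated.

```python
import numpy as np


def top_variable_per_component(X, names, k):
    """For each of the first k covariance-based principal components, return the
    name of the original variable whose sample correlation with the component's
    scores is largest in absolute value."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                              # centre each column
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]      # largest-variance component first
    scores = Xc @ eigvecs[:, :k]                         # n x k matrix of component scores
    chosen = []
    for i in range(k):
        # sample correlation between component i and every original variable
        r = [np.corrcoef(scores[:, i], X[:, j])[0, 1] for j in range(X.shape[1])]
        chosen.append(names[int(np.argmax(np.abs(r)))])
    return chosen


# Made-up data (NOT the census table): three independent variables whose
# standard deviations are 1, 10 and 3.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 3.0])
print(top_variable_per_component(X, ["a", "b", "c"], k=2))
```

With independent variables, the first component essentially follows the highest-variance variable "b" and the second follows "c", so those two would be the ones kept.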

Example
A census provided information, by tract, on five socioeconomic variables for the Madison,
Wisconsin, area. The data from 61 tracts are listed in the table below. Can the sample variation be
summarized by one or two principal components?

Total pop.    Professional   Employed age   Government       Median home
(thousands)   degree (%)     over 16 (%)    employment (%)   value ($100,000)
2.67          5.71           69.0           30.3             1.48
2.25          4.37           72.98          43.3             1.44
3.12          10.27          64.94          32.0             2.11
5.14          7.44           71.29          24.5             1.85
5.54          9.25           74.94          31.0             2.23
5.04          4.84           53.61          48.2             1.60
3.14          4.82           67.00          37.6             1.52
2.43          2.40           67.20          36.8             1.40
5.38          4.30           83.03          19.7             2.07
7.34          2.73           72.60          24.5             1.42
4.94          4.66           64.32          27.7             1.42
4.82          4.26           82.64          20.3             1.46
5.02          4.17           84.25          20.6             1.42
3.37          1.00           69.93          16.4             1.17
3.63          6.40           70.31          29.0             2.00
7.43          6.00           70.53          37.7             1.44
2.20          10.59          69.85          41.7             2.01
7.16          4.71           79.44          33.0             1.55
6.33          2.88           66.24          38.1             1.73
2.57          1.85           67.25          33.4             1.18
6.38          1.56           63.00          18.2             0.93
5.34          3.41           72.57          20.1             1.66
4.87          5.20           75.13          16.5             3.64
2.04          4.83           67.78          17.4             1.49
5.48          1.34           77.43          21.6             1.32
7.77          5.32           58.57          31.2             3.21
6.29          2.60           64.32          27.4             1.78
6.38          3.71           78.61          34.1             1.30
5.76          4.06           83.77          31.4             1.52
6.03          3.10           76.04          25.0             1.08
5.09          1.85           74.65          24.1             0.97
4.36          1.67           65.43          23.7             1.07
3.07          2.00           68.03          26.2             1.19
1.82          1.13           49.50          21.9             1.62
3.31          0.94           74.75          26.5             1.12
3.45          0.72           65.99          22.0             1.20
1.74          0.97           60.24          22.0             1.17
1.81          1.54           70.05          24.4             1.00
5.59          1.66           77.96          17.1             1.30
3.72          1.69           82.40          16.3             1.52
3.39          1.24           67.17          27.7             1.03
2.25          2.80           70.81          23.4             1.14
3.31          1.30           71.30          19.2             1.21
5.27          1.20           73.08          30.3             1.35
3.26          1.02           74.36          16.5             1.23
6.76          1.53           78.37          22.6             1.33
2.92          4.42           58.50          68.5             2.25
1.64          16.70          64.61          49.4             3.13
1.36          14.26          66.42          22.5             2.80
3.58          3.38           65.57          26.1             1.31
3.38          2.17           66.10          22.6             1.44
7.25          1.16           78.52          23.6             1.50
5.44          2.93           73.59          22.3             1.65
5.83          4.47           77.33          26.2             2.16
3.74          2.26           79.70          20.2             1.58
9.21          2.36           74.58          21.8             1.72
2.14          6.30           86.54          17.4             2.80
6.62          4.79           78.84          20.0             2.33
4.24          5.82           71.39          27.1             1.69
4.72          4.71           78.01          20.6             1.55
6.48          4.93           74.23          20.9             1.98

Procedure
1. Choose Stat > Multivariate > Principal Components.
2. In Variables, enter C1-C5.
3. Under Type of Matrix, choose Covariance.
4. Click Graphs and select Scree plot and Biplot.
5. Click OK in each dialog box.
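Minitab can extract principal components from either the covariance matrix or the correlation matrix, and the two choices generally give different components. The variances reported in the output below sum to about 158, so they are eigenvalues of the sample covariance matrix; the eigenvalues of a correlation matrix of five variables would sum to 5. The sketch below, assuming NumPy and a hypothetical helper `component_variances`, shows the two options side by side on made-up data.

```python
import numpy as np


def component_variances(X, matrix="covariance"):
    """Eigenvalues (component variances), largest first, from either the sample
    covariance matrix or the sample correlation matrix of X."""
    X = np.asarray(X, dtype=float)
    if matrix == "covariance":
        M = np.cov(X, rowvar=False)
    elif matrix == "correlation":
        M = np.corrcoef(X, rowvar=False)
    else:
        raise ValueError("matrix must be 'covariance' or 'correlation'")
    return np.sort(np.linalg.eigvalsh(M))[::-1]


# Made-up data on very different scales (NOT the census table).
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3)) * np.array([1.0, 5.0, 20.0])
print(component_variances(X, "covariance").sum())   # sum of the sample variances
print(component_variances(X, "correlation").sum())  # p = 3, the number of variables
```

With the covariance matrix, variables measured on large scales (here the third, and in the census data government employment) tend to dominate the first component; the correlation matrix puts all variables on an equal footing.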
Coefficients for the principal components

(Correlation coefficients in parentheses)

Variable                        e1 (r(ŷ1, xk))   e2 (r(ŷ2, xk))   e3          e4          e5
Total population (thousands)    -0.039 (-0.22)   0.0711 (0.24)    0.187893    0.977135    -0.057700
Professional degree (%)         0.105 (0.35)     0.1298 (0.26)    -0.960996   0.17135     -0.138554
Employed age over 16 (%)        -0.492 (-0.68)   0.8644 (0.73)    0.045797    -0.091044   0.004966
Government employment (%)       0.863 (0.95)     0.4803 (0.32)    0.153185    -0.029686   0.006692
Median home value ($100,000)    0.009 (0.16)     0.0147 (0.17)    -0.124981   0.081701    0.988637

Variance (eigenvalue)           107.015          39.6721          8.370866    2.867874    0.154693
Cumulative % of total variance  67.7             92.8             98.1        99.9        100.0
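The correlations in parentheses are not separate estimates; they follow from the coefficients and variances by the standard result below, where ŷi is the i-th sample principal component, λ̂i its variance (the Variance row) and skk the sample variance of variable xk, which this output does not list. Because √λ̂i and √skk are positive, each correlation has the same sign as its coefficient, as every entry in the e1 and e2 columns confirms.

```latex
\hat{y}_i = \hat{\mathbf{e}}_i'(\mathbf{x} - \bar{\mathbf{x}}), \qquad
\widehat{\operatorname{Var}}(\hat{y}_i) = \hat{\lambda}_i, \qquad
r_{\hat{y}_i,\,x_k} = \frac{\hat{e}_{ik}\sqrt{\hat{\lambda}_i}}{\sqrt{s_{kk}}},
\qquad i, k = 1, \dots, 5.
```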

Interpretation
The first principal component explains 67.7% of the total sample variance, and the first two
principal components together explain 92.8%. Consequently, the sample variation is summarized
very well by two principal components, and reducing the data from 61 observations on 5 variables
to 61 observations on 2 principal components is reasonable.
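The percentages quoted above follow directly from the variances in the table: each eigenvalue is divided by their total (158.080533) and the shares are accumulated. A short NumPy sketch of that arithmetic, using the reported variances; the 90% cut-off used to pick the number of components is an assumed illustration, not a rule stated in this assignment.

```python
import numpy as np

# Component variances (eigenvalues) as reported in the Minitab output.
variances = np.array([107.015, 39.6721, 8.370866, 2.867874, 0.154693])

proportion = variances / variances.sum()   # share of total sample variance per component
cumulative = 100 * np.cumsum(proportion)   # cumulative percentage of total variance

# Smallest number of components whose cumulative share reaches the threshold.
# ASSUMPTION: the 90% threshold is illustrative only.
threshold = 90.0
k = int(np.argmax(cumulative >= threshold)) + 1

print(np.round(cumulative, 1))
print(k)
```

Rounded to one decimal, the cumulative percentages are 67.7, 92.8, 98.1, 99.9 and 100, matching the table, and k comes out as 2.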
Scree Plot

[Figure: Scree Plot of Total pop.(thousands), ..., Median home value($100000), showing the eigenvalue (vertical axis) against component number 1 to 5 (horizontal axis).]

Interpretation
An elbow occurs in the plot at about i = 3. That is, the eigenvalues after λ̂2 are all relatively small
and about the same size, which supports retaining two components.
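A scree plot is simply the eigenvalues plotted against component number. As a dependency-free sketch in plain Python using the reported eigenvalues, the text bars below stand in for the plotted points, and the successive drops make the elbow explicit: 67.34 and 31.30 for the first two steps, then only 5.50 and 2.71. Reading the elbow from these drops is an informal aid assumed here for illustration, not a formal test; `text_scree` is a hypothetical helper.

```python
# Reported eigenvalues, largest first.
eigenvalues = [107.015, 39.6721, 8.370866, 2.867874, 0.154693]


def text_scree(values, width=40):
    """Return one line per component: its number, a bar proportional to the
    eigenvalue, and the eigenvalue itself (a text stand-in for a scree plot)."""
    largest = max(values)
    lines = []
    for i, v in enumerate(values, start=1):
        bar = "#" * round(width * v / largest)
        lines.append(f"{i} {bar:<{width}} {v:.2f}")
    return lines


for line in text_scree(eigenvalues):
    print(line)

# Drop from each eigenvalue to the next; the drops become small once past the second eigenvalue.
drops = [a - b for a, b in zip(eigenvalues, eigenvalues[1:])]
print([round(d, 2) for d in drops])
```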
Biplot

[Figure: Biplot of Total pop.(thousands), ..., Median home value($100000), plotting the second component (vertical axis) against the first component (horizontal axis), with vectors labelled Employed age over 16(%), Govt. employment(%), Professional degree(%), Total pop.(thousands) and Median home value($100000).]

Interpretation
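In this plot, the first two principal components explain 92.8% of the total sample variance, so the biplot displays nearly all of the sample variation. Government employment and employed age over 16 have by far the largest coefficients on the first two components (0.863 and 0.480; -0.492 and 0.864), so these two variables dominate the first two components.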
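A biplot overlays the observations' scores on the first two components with a vector for each original variable. The sketch below, assuming NumPy and using made-up data of the same shape as the census table (61 × 5), computes both sets of coordinates from the covariance matrix; drawing them (for example with matplotlib) is left out, and Minitab's own scaling of the vectors may differ. `biplot_coordinates` is a hypothetical helper.

```python
import numpy as np


def biplot_coordinates(X):
    """Scores of the observations and coefficients of the variables on the first
    two covariance-based principal components (the raw material for a biplot)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                            # centre each column
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    E2 = eigvecs[:, np.argsort(eigvals)[::-1][:2]]     # coefficient vectors e1, e2 (p x 2)
    scores = Xc @ E2                                   # one point per observation (n x 2)
    return scores, E2


# Made-up data (NOT the census table) with 61 rows and 5 columns.
rng = np.random.default_rng(3)
X = rng.normal(size=(61, 5)) * np.array([2.0, 3.0, 7.0, 9.0, 0.5])
scores, E2 = biplot_coordinates(X)
print(scores.shape, E2.shape)
```

Each row of `scores` is a point in the plot, and each row of `E2` gives the direction of one variable's vector; the score columns are uncorrelated, with variances equal to the first two eigenvalues.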
