
INDUSTRIALIZATION IN 15 DISTRICTS
Using Principal Component Analysis

Nirmana Fiqra Q. 12111068


Abd. Salam. M 22114030
Hafidz Mabruri 22114032

Survey Data from Manufacturing Industry


 #   District   X1      X2      X3     X4
 1   A           9.75    6.5    1.61   0.65
 2   B          10.5    10.25   2      0.75
 3   C          11.25   11.9    2.5    0.9
 4   D          12.6    11.75   2.7    1.15
 5   E          11.9    11      2.25   0.95
 6   F          15.2    13.5    3.25   1.75
 7   G          12.25   12      2.9    1.05
 8   H          12.9    12.6    3      1
 9   I          14.3    13.2    3.1    1.7
10   J          13.25   12.9    3.05   1.25
11   K          15.3    14      3.25   1.8
12   L           8.9     9.25   1.9    0.6
13   M          10.6    10.5    1.95   0.5
14   N          17.25   15      3.5    2
15   O          16.9    14.9    3.4    1.95

X1 = Contribution to domestic product (%)
X2 = Share of workers in this industry (%)
X3 = Productivity per worker (Rp/worker)
X4 = Investment per worker (Rp/worker)

Principal Component Analysis: the Basics

The relationship between two parameters is measured by their correlation.
The correlation of a parameter with itself is 1, meaning perfectly correlated.
A correlation of zero means there is no linear relationship between the 2 parameters.
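$$\mathrm{Corr}(X,Y) = C(X,Y) = \frac{\mathrm{cov}(X,Y)}{\sigma_X \sigma_Y}$$

$$C = \begin{pmatrix} C_{XX} & C_{XY} \\ C_{YX} & C_{YY} \end{pmatrix} = \begin{pmatrix} 1 & C_{XY} \\ C_{YX} & 1 \end{pmatrix}$$

$$\mathrm{Corr}(X,X) = 1, \qquad \mathrm{Corr}(Y,Y) = 1$$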
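
The definition above can be checked numerically. The following is a minimal sketch in plain Python, using the X1 and X2 columns of the survey table; the function name `corr` and the variable names are illustrative choices, not part of the original slides. It should give a value close to the 0.9092 reported for X1 and X2 in Step 2, and 1 for a parameter against itself.

```python
import math

def corr(x, y):
    """Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and standard deviations (divisor n - 1).
    # The divisor cancels in the ratio, so dividing by n gives the same result.
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov_xy / (sd_x * sd_y)

# X1 and X2 columns of the survey table (districts A..O).
x1 = [9.75, 10.5, 11.25, 12.6, 11.9, 15.2, 12.25, 12.9,
      14.3, 13.25, 15.3, 8.9, 10.6, 17.25, 16.9]
x2 = [6.5, 10.25, 11.9, 11.75, 11, 13.5, 12, 12.6,
      13.2, 12.9, 14, 9.25, 10.5, 15, 14.9]

r_12 = corr(x1, x2)  # correlation between two different parameters
r_11 = corr(x1, x1)  # correlation of a parameter with itself
print(round(r_12, 4), round(r_11, 4))
```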

Principal Component of Correlation

The principal components of the correlation matrix are given by its eigenvectors,
which produce orthogonal components with zero correlation between them.
So the matrix of eigenvectors is the transformation matrix from
the non-principal axes to the principal axes of the data.

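$$C x = \lambda x, \qquad (C - \lambda I)\,x = 0$$

$$\begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix} x = \lambda x$$

$$\begin{pmatrix} C_{11} - \lambda & C_{12} \\ C_{21} & C_{22} - \lambda \end{pmatrix} x = 0$$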

Eigenvalues and Eigenvectors

Solve det(C − λI) = 0 to get the eigenvalues λ.
Solve the linear equation (C − λI)x = 0 to get the eigenvector x for each λ.

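$$\det \begin{pmatrix} C_{11} - \lambda & C_{12} \\ C_{21} & C_{22} - \lambda \end{pmatrix} = 0 \quad\Rightarrow\quad \text{find } \lambda_n$$

$$\begin{pmatrix} C_{11} - \lambda_n & C_{12} \\ C_{21} & C_{22} - \lambda_n \end{pmatrix} \begin{pmatrix} x_{n1} \\ x_{n2} \end{pmatrix} = 0 \quad\Rightarrow\quad \text{find eigenvector } n$$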
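
For the 2x2 case the two steps can be carried out in closed form. The sketch below is illustrative only: the helper name `eigen_2x2` is hypothetical, and the value r = 0.9092 is borrowed from the X1/X2 correlation in Step 2 purely as an example input. It solves the characteristic polynomial with the quadratic formula, then reads each eigenvector off the first row of (C − λI)x = 0.

```python
import math

def eigen_2x2(c11, c12, c21, c22):
    """Eigenvalues and unit eigenvectors of a symmetric 2x2 matrix.

    Assumes c12 == c21 and c12 != 0 (true for a correlation matrix
    of two correlated parameters).
    """
    # det(C - lambda*I) = lambda^2 - trace*lambda + det = 0
    trace = c11 + c22
    det = c11 * c22 - c12 * c21
    disc = math.sqrt(trace * trace - 4.0 * det)
    eigenvalues = [(trace + disc) / 2.0, (trace - disc) / 2.0]

    eigenvectors = []
    for lam in eigenvalues:
        # First row of (C - lambda*I) x = 0:
        #   (c11 - lam) * x1 + c12 * x2 = 0  ->  x = (c12, lam - c11)
        x1, x2 = c12, lam - c11
        norm = math.hypot(x1, x2)
        eigenvectors.append((x1 / norm, x2 / norm))
    return eigenvalues, eigenvectors

r = 0.9092  # example correlation between two parameters
values, vectors = eigen_2x2(1.0, r, r, 1.0)
print(values)   # eigenvalues 1 + r and 1 - r
print(vectors)  # directions (1, 1)/sqrt(2) and (1, -1)/sqrt(2)
```

For a 2x2 correlation matrix the eigenvalues are always 1 + r and 1 − r, so a strong correlation concentrates almost all variance in the first principal axis.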

PCA, Step 1: Normalization

Normalization of the data: rescale each variable to
zero mean and a standard deviation of 1.
DISTRICT   X1        X2        X3        X4
A         -1.2305   -2.4299   -1.7615   -1.0713
B         -0.9335   -0.7579   -1.1258   -0.8765
C         -0.6364   -0.0223   -0.3108   -0.5843
D         -0.1017   -0.0892    0.0152   -0.0974
E         -0.3789   -0.4236   -0.7183   -0.4870
F          0.9282    0.6911    0.9117    1.0713
G         -0.2403    0.0223    0.3412   -0.2922
H          0.0172    0.2898    0.5042   -0.3896
I          0.5717    0.5573    0.6672    0.9739
J          0.1558    0.4236    0.5857    0.0974
K          0.9678    0.9140    0.9117    1.1687
L         -1.5672   -1.2038   -1.2888   -1.1687
M         -0.8939   -0.6465   -1.2073   -1.3635
N          1.7402    1.3598    1.3192    1.5583
O          1.6015    1.3152    1.1562    1.4609

$$X(i,j) = \frac{x(i,j) - \bar{x}(j)}{\sigma(j)}$$
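
The normalization formula can be sketched in a few lines of Python. This is an illustration rather than the authors' own procedure: the function name `normalize` is hypothetical, and the use of the sample standard deviation (divisor n − 1) is an assumption, chosen because it appears to reproduce the tabulated values (for example about −1.2305 for district A on X1).

```python
import statistics

def normalize(column):
    """Rescale a column to zero mean and unit standard deviation."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)  # sample standard deviation (n - 1), an assumption
    return [(value - mean) / sd for value in column]

# X1 column of the survey table (districts A..O).
x1 = [9.75, 10.5, 11.25, 12.6, 11.9, 15.2, 12.25, 12.9,
      14.3, 13.25, 15.3, 8.9, 10.6, 17.25, 16.9]

z1 = normalize(x1)
for district, value in zip("ABCDEFGHIJKLMNO", z1):
    print(district, round(value, 4))
```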

PCA, Step 2: Correlation Matrix

Perfect correlation, corr(X,Y) = 1, means the 2 variables are
exactly linearly related (Y = mX + c, with m > 0).
No correlation, corr(X,Y) = 0, means there is no linear
relationship between the 2 variables.
Corr(i,j)   X1       X2       X3       X4
X1          1.0000   0.9092   0.9329   0.9690
X2          0.9092   1.0000   0.9519   0.8641
X3          0.9329   0.9519   1.0000   0.9110
X4          0.9690   0.8641   0.9110   1.0000

$$\mathrm{Corr}(X,Y) = \frac{\mathrm{cov}(X,Y)}{\sigma_X \sigma_Y}, \qquad \mathrm{Corr}(X,X) = 1, \qquad \mathrm{Corr}(Y,Y) = 1$$
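
A correlation matrix of this kind can be built directly from the survey data. The sketch below assumes the raw columns from the survey table and uses illustrative names (`data`, `zscores`, `corr_matrix`). It normalizes each column and then averages the products of z-scores with divisor n − 1, which is equivalent to cov(X,Y)/(σX σY).

```python
import statistics

# Raw survey columns for districts A..O.
data = {
    "X1": [9.75, 10.5, 11.25, 12.6, 11.9, 15.2, 12.25, 12.9,
           14.3, 13.25, 15.3, 8.9, 10.6, 17.25, 16.9],
    "X2": [6.5, 10.25, 11.9, 11.75, 11, 13.5, 12, 12.6,
           13.2, 12.9, 14, 9.25, 10.5, 15, 14.9],
    "X3": [1.61, 2, 2.5, 2.7, 2.25, 3.25, 2.9, 3,
           3.1, 3.05, 3.25, 1.9, 1.95, 3.5, 3.4],
    "X4": [0.65, 0.75, 0.9, 1.15, 0.95, 1.75, 1.05, 1,
           1.7, 1.25, 1.8, 0.6, 0.5, 2, 1.95],
}

def zscores(column):
    """Zero-mean, unit-variance version of a column (sample std, n - 1)."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(v - mean) / sd for v in column]

def corr_matrix(columns):
    """Pearson correlation matrix as a list of rows."""
    z = [zscores(c) for c in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z]
            for zi in z]

C = corr_matrix(list(data.values()))
for name, row in zip(data, C):
    print(name, [round(v, 4) for v in row])
```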

PCA, Step 3: Eigenvalues and Eigenvectors of the Correlation Matrix

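      Eigenvalue   X1 (PDRB)   X2 (LB)   X3 (P)    X4 (I)
Z1    3.7694        0.5056      0.4941    0.5035    0.4967
Z2    0.1625        0.3398     -0.6393   -0.3179    0.6122
Z3    0.0442        0.3568      0.5065   -0.7814   -0.0749
Z4    0.0239       -0.7082      0.3012   -0.1867    0.6107

$$C x = \lambda x, \qquad (C - \lambda I)\,x = 0$$

$$\begin{pmatrix}
C_{11} - \lambda_n & C_{12} & C_{13} & C_{14} \\
C_{21} & C_{22} - \lambda_n & C_{23} & C_{24} \\
C_{31} & C_{32} & C_{33} - \lambda_n & C_{34} \\
C_{41} & C_{42} & C_{43} & C_{44} - \lambda_n
\end{pmatrix}
\begin{pmatrix} x_{n1} \\ x_{n2} \\ x_{n3} \\ x_{n4} \end{pmatrix} = 0$$

$$Z_n = x_{n1} X_1 + x_{n2} X_2 + x_{n3} X_3 + x_{n4} X_4$$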
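
One way to obtain eigenvalues and eigenvectors of a 4x4 correlation matrix without external libraries is power iteration with deflation. This is a swapped-in technique for illustration, not necessarily how the values in the slides were computed; the function names are hypothetical and the matrix is the rounded one from Step 2, so results should match the table only to about three decimals. Eigenvector signs are arbitrary and may differ from the table.

```python
import math

# Correlation matrix from Step 2 (rounded to 4 decimals).
C = [
    [1.0000, 0.9092, 0.9329, 0.9690],
    [0.9092, 1.0000, 0.9519, 0.8641],
    [0.9329, 0.9519, 1.0000, 0.9110],
    [0.9690, 0.8641, 0.9110, 1.0000],
]

def mat_vec(m, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dominant_eigenpair(m, iterations=2000):
    """Power iteration: largest-magnitude eigenvalue and unit eigenvector."""
    v = [float(i + 1) for i in range(len(m))]  # arbitrary start vector
    for _ in range(iterations):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(a * b for a, b in zip(v, mat_vec(m, v)))  # Rayleigh quotient
    return lam, v

def eigen_by_deflation(m):
    """All eigenpairs of a symmetric matrix, largest eigenvalue first."""
    m = [row[:] for row in m]  # work on a copy
    pairs = []
    for _ in range(len(m)):
        lam, v = dominant_eigenpair(m)
        pairs.append((lam, v))
        # Deflate: remove the component just found, m <- m - lam * v v^T
        for i in range(len(m)):
            for j in range(len(m)):
                m[i][j] -= lam * v[i] * v[j]
    return pairs

pairs = eigen_by_deflation(C)
for n, (lam, vec) in enumerate(pairs, start=1):
    print(f"Z{n}", round(lam, 4), [round(x, 4) for x in vec])
```

Because the eigenvalues of a correlation matrix sum to the number of variables (here 4), the sum of the computed eigenvalues is a quick sanity check.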

PCA, Step 4 : Principal Component Value


DIST      Z1        Z2        Z3        Z4
A        -3.2418    1.0393   -0.2132   -0.1856
B        -1.8487   -0.0114    0.2283    0.1078
C        -0.7795   -0.4610    0.0482    0.1452
D        -0.1362   -0.0420   -0.0860   -0.0172
E        -1.0044    0.0722    0.2480   -0.0224
F         1.8019    0.2397   -0.1114    0.0347
G        -0.0838   -0.3833   -0.3192   -0.0652
H         0.2123   -0.5782   -0.2119   -0.2569
I         1.3841    0.2221   -0.1080    0.2331
J         0.6313   -0.3444   -0.1948   -0.0327
K         1.9805    0.1703    0.0083    0.1333
L        -2.6166   -0.0688   -0.0744    0.2743
M        -2.0565   -0.3414    0.3991   -0.1688
N         2.9900    0.2567    0.1622   -0.1176
O         2.7674    0.2303    0.2248   -0.0619
mean      0.0000    0.0000    0.0000    0.0000
stdevs    1.9415    0.4032    0.2103    0.1545

The coefficient vectors a are the eigenvectors x:

$$\begin{aligned}
Z_1 &= a_{11} X_1 + a_{12} X_2 + \dots + a_{1p} X_p \\
Z_2 &= a_{21} X_1 + a_{22} X_2 + \dots + a_{2p} X_p \\
&\;\;\vdots \\
Z_p &= a_{p1} X_1 + a_{p2} X_2 + \dots + a_{pp} X_p
\end{aligned}$$

$$\begin{pmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_p \end{pmatrix} =
\begin{pmatrix}
a_{11} & a_{12} & \dots & a_{1p} \\
a_{21} & a_{22} & \dots & a_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
a_{p1} & a_{p2} & \dots & a_{pp}
\end{pmatrix}
\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix}$$
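
As a worked instance of the linear combination, the sketch below computes the four principal component values for district A from its normalized values (Step 1) and the eigenvectors (Step 3). The names `A`, `x_A` and `scores` are illustrative. Since the inputs are rounded to four decimals, the results should agree with the first row of the table (about −3.2418, 1.0393, −0.2132, −0.1856) only to roughly three decimals.

```python
# Rows are the eigenvectors a_1..a_4 from Step 3 (coefficients of X1..X4).
A = [
    [0.5056, 0.4941, 0.5035, 0.4967],    # Z1
    [0.3398, -0.6393, -0.3179, 0.6122],  # Z2
    [0.3568, 0.5065, -0.7814, -0.0749],  # Z3
    [-0.7082, 0.3012, -0.1867, 0.6107],  # Z4
]

# Normalized X1..X4 for district A from Step 1.
x_A = [-1.2305, -2.4299, -1.7615, -1.0713]

def scores(a, x):
    """Z_n = a_n1*X1 + a_n2*X2 + ... for every component n."""
    return [sum(a_nj * x_j for a_nj, x_j in zip(row, x)) for row in a]

z_A = scores(A, x_A)
print([round(z, 4) for z in z_A])
```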

PCA : 1 Component Analysis (Z1)


[Figure: Z1 score of each district]

First principal component, with roughly equal weights (about 0.5) on all 4 parameters (94.2% of
sample variance).
Limits are set by the square root of the eigenvalue of Z1, which is the standard deviation of Z1.
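
The variance share and the limit follow directly from the eigenvalues. A minimal sketch, using the eigenvalues from Step 3 (variable names are illustrative): each share is an eigenvalue divided by their sum, and each component's standard deviation is the square root of its eigenvalue.

```python
import math

# Eigenvalues of the correlation matrix from Step 3 (Z1..Z4).
eigenvalues = [3.7694, 0.1625, 0.0442, 0.0239]

total = sum(eigenvalues)  # close to 4, the number of variables
share = [lam / total for lam in eigenvalues]        # fraction of sample variance
std_dev = [math.sqrt(lam) for lam in eigenvalues]   # standard deviation of each Z

for n, (s, sd) in enumerate(zip(share, std_dev), start=1):
    print(f"Z{n}: {100 * s:.2f}% of variance, std dev {sd:.4f}")
```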

PCA : 2 Component Analysis (Z1,Z2)


[Figure: scatter plot of Z2 against Z1; labeled points include districts A, O, N, G and H]

First principal component vs second component (Z1: 94.2% of sample variance).
The second component reflects high investment but low productivity (4.06% of sample variance).
Limits: Z1 by its standard deviation, 1.942; Z2 by its standard deviation, 0.403.

PCA : 3 Component Analysis (Z1,Z2,Z3)


[Figure: 3-D scatter plot with axes Z1, Z2 and Z3; labeled points include districts M, C, I and G]

3-component analysis.
The third component reflects low investment and low productivity (1.1% of sample variance).
Because Z1 alone represents 94% of the variance, conclusions drawn from the other Z components
may be anomalous or wrong.

Conclusion
Highly industrialized districts, with enough workers and high productivity: K, O and N.
Moderately industrialized districts, with low productivity: C and H.
Fairly industrialized districts: B, E, D, G, J, I and F.
Least industrialized districts: A, M and L.
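
The grouping can be reproduced from the Z1 and Z2 values of Step 4 under one possible reading of the limits. The rule below is an assumption, not stated in the slides: "high" and "low" are districts beyond plus or minus one standard deviation of Z1, and "middle" are the remaining districts whose Z2 falls below minus one standard deviation of Z2. The names `scores`, `classify` and `groups` are illustrative.

```python
# (Z1, Z2) for each district, from the Step 4 table.
scores = {
    "A": (-3.2418, 1.0393), "B": (-1.8487, -0.0114), "C": (-0.7795, -0.4610),
    "D": (-0.1362, -0.0420), "E": (-1.0044, 0.0722), "F": (1.8019, 0.2397),
    "G": (-0.0838, -0.3833), "H": (0.2123, -0.5782), "I": (1.3841, 0.2221),
    "J": (0.6313, -0.3444), "K": (1.9805, 0.1703), "L": (-2.6166, -0.0688),
    "M": (-2.0565, -0.3414), "N": (2.9900, 0.2567), "O": (2.7674, 0.2303),
}

SD_Z1 = 1.9415  # standard deviation of Z1 (square root of its eigenvalue)
SD_Z2 = 0.4032  # standard deviation of Z2

def classify(z1, z2):
    """Assumed rule: Z1 limits first, then the lower Z2 limit."""
    if z1 > SD_Z1:
        return "high"
    if z1 < -SD_Z1:
        return "low"
    if z2 < -SD_Z2:
        return "middle"
    return "fair"

groups = {"high": [], "middle": [], "fair": [], "low": []}
for district, (z1, z2) in scores.items():
    groups[classify(z1, z2)].append(district)

for label, members in groups.items():
    print(label, members)
```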