# INDUSTRIALIZATION IN 15 DISTRICTS
Using Principal Component Analysis

- Nirmana Fiqra Q. (12111068)
- Abd. Salam. M (22114030)
- Hafidz Mabruri (22114032)

## Survey Data from Manufacturing Industry

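| # | District | X1 | X2 | X3 | X4 |
|---|---|---|---|---|---|
| 1 | A | 9.75 | 6.5 | 1.61 | 0.65 |
| 2 | B | 10.5 | 10.25 | 2 | 0.75 |
| 3 | C | 11.25 | 11.9 | 2.5 | 0.9 |
| 4 | D | 12.6 | 11.75 | 2.7 | 1.15 |
| 5 | E | 11.9 | 11 | 2.25 | 0.95 |
| 6 | F | 15.2 | 13.5 | 3.25 | 1.75 |
| 7 | G | 12.25 | 12 | 2.9 | 1.05 |
| 8 | H | 12.9 | 12.6 | 3 | 1 |
| 9 | I | 14.3 | 13.2 | 3.1 | 1.7 |
| 10 | J | 13.25 | 12.9 | 3.05 | 1.25 |
| 11 | K | 15.3 | 14 | 3.25 | 1.8 |
| 12 | L | 8.9 | 9.25 | 1.9 | 0.6 |
| 13 | M | 10.6 | 10.5 | 1.95 | 0.5 |
| 14 | N | 17.25 | 15 | 3.5 | 2 |
| 15 | O | 16.9 | 14.9 | 3.4 | 1.95 |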

- X1 = Contribution to Domestic Product (%)
- X2 = Percentage of workers in this industry (%)
- X3 = Productivity per worker (Rp/worker)
- X4 = Investment per worker (Rp/worker)

## Principal Component Analysis: the Basics

The relationship between two parameters is measured by their correlation. The correlation of two identical parameters is 1, meaning they are perfectly correlated, whereas a correlation of zero means there is no (linear) correlation between the two parameters.
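$$\mathrm{Corr}(X,Y) = C(X,Y) = \frac{\mathrm{cov}(X,Y)}{\sigma_X\,\sigma_Y}$$

$$C = \begin{bmatrix} C_{XX} & C_{XY} \\ C_{YX} & C_{YY} \end{bmatrix} = \begin{bmatrix} 1 & C_{XY} \\ C_{YX} & 1 \end{bmatrix}, \qquad \mathrm{Corr}(X,X) = 1, \quad \mathrm{Corr}(Y,Y) = 1$$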
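
As a quick numerical check of the definition, the sketch below computes the correlation directly as cov(X, Y) / (σX σY) for X1 and X2 from the survey table. The helper name `corr` is hypothetical and NumPy is assumed to be available; `np.corrcoef` is used only as a cross-check.

```python
import numpy as np

def corr(x, y):
    """Hypothetical helper: Pearson correlation from its definition
    cov(X, Y) / (sigma_X * sigma_Y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Sample covariance (n - 1 denominator).
    cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)
    # Sample standard deviations use the same n - 1 denominator, so it cancels.
    return cov_xy / (x.std(ddof=1) * y.std(ddof=1))

# X1 (contribution to domestic product) and X2 (percent workers), districts A..O.
x1 = [9.75, 10.5, 11.25, 12.6, 11.9, 15.2, 12.25, 12.9, 14.3, 13.25,
      15.3, 8.9, 10.6, 17.25, 16.9]
x2 = [6.5, 10.25, 11.9, 11.75, 11.0, 13.5, 12.0, 12.6, 13.2, 12.9,
      14.0, 9.25, 10.5, 15.0, 14.9]

print(f"Corr(X1, X1) = {corr(x1, x1):.4f}")  # identical parameters
print(f"Corr(X1, X2) = {corr(x1, x2):.4f}")
print(f"np.corrcoef  = {np.corrcoef(x1, x2)[0, 1]:.4f}")
```

Corr(X1, X1) comes out as 1, and Corr(X1, X2) should agree with the 0.9092 entry in the Step 2 correlation table.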

## Principal Component of Correlation

The principal components of the correlation matrix are given by its eigenvectors (its eigenvector matrix). They produce components that are orthogonal and have zero correlation with each other. The eigenvector matrix is therefore the transformation matrix from the original (non-principal) axes to the principal axes of the data.

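$$C\mathbf{x} = \lambda\mathbf{x} \quad\Longleftrightarrow\quad (C - \lambda I)\,\mathbf{x} = \mathbf{0}$$

$$\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}\mathbf{x} = \lambda\,\mathbf{x} \quad\Longleftrightarrow\quad \begin{bmatrix} C_{11}-\lambda & C_{12} \\ C_{21} & C_{22}-\lambda \end{bmatrix}\mathbf{x} = \mathbf{0}$$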
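
The claim that the eigenvectors move the data onto uncorrelated principal axes can be checked numerically. This sketch assumes NumPy and, for simplicity, uses only X1 and X2 from the survey table: it standardizes them, takes the eigenvectors of the 2 × 2 correlation matrix with `np.linalg.eigh`, rotates the data onto those axes, and shows that the covariance of the rotated components is diagonal, with the eigenvalues on the diagonal.

```python
import numpy as np

x1 = np.array([9.75, 10.5, 11.25, 12.6, 11.9, 15.2, 12.25, 12.9, 14.3, 13.25,
               15.3, 8.9, 10.6, 17.25, 16.9])
x2 = np.array([6.5, 10.25, 11.9, 11.75, 11.0, 13.5, 12.0, 12.6, 13.2, 12.9,
               14.0, 9.25, 10.5, 15.0, 14.9])

# Standardize each variable: zero mean, unit sample standard deviation.
X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# For standardized data the covariance matrix equals the correlation matrix C.
C = np.cov(X, rowvar=False)

# Solve C x = lambda x. eigh is meant for symmetric matrices; it returns the
# eigenvalues in ascending order and the eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(C)

# Transform the data from the original axes to the principal axes.
Z = X @ eigvecs

# Covariance of the transformed components: the off-diagonal terms vanish.
cov_Z = np.cov(Z, rowvar=False)
print(np.round(C, 4))
print(np.round(cov_Z, 6))
print(np.round(eigvals, 4))
```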

## Eigenvalue and Eigenvector

Solve $\det(C - \lambda I) = 0$ to get the eigenvalues $\lambda$.

Solve the linear equation $(C - \lambda I)\,\mathbf{x} = \mathbf{0}$ to get the eigenvector $\mathbf{x}$.

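$$\det\begin{bmatrix} C_{11}-\lambda_n & C_{12} \\ C_{21} & C_{22}-\lambda_n \end{bmatrix} = 0 \quad\Rightarrow\quad \text{find } \lambda_n$$

$$\begin{bmatrix} C_{11}-\lambda_n & C_{12} \\ C_{21} & C_{22}-\lambda_n \end{bmatrix}\begin{bmatrix} x_{n1} \\ x_{n2} \end{bmatrix} = \mathbf{0} \quad\Rightarrow\quad \text{find the eigenvector } \mathbf{x}_n \text{ of } \lambda_n$$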
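
For a 2 × 2 matrix both steps can be done by hand: expanding the determinant gives the quadratic λ² − (C11 + C22)λ + (C11C22 − C12C21) = 0, and the first row of (C − λI)x = 0 then fixes the direction of x. The plain-Python sketch below follows exactly those two steps. The function name `eig_2x2` is hypothetical; it assumes real eigenvalues and C12 ≠ 0, and it takes the X1-X2 correlation 0.9092 from the Step 2 correlation table as input.

```python
import math

def eig_2x2(c11, c12, c21, c22):
    """Hypothetical helper: eigenvalues and unit eigenvectors of a 2x2 matrix
    from det(C - lambda*I) = 0 and (C - lambda*I) x = 0.
    Assumes real eigenvalues and c12 != 0."""
    # det(C - lambda I) = lambda^2 - (c11 + c22) lambda + (c11 c22 - c12 c21)
    trace = c11 + c22
    det = c11 * c22 - c12 * c21
    disc = math.sqrt(trace * trace - 4.0 * det)
    results = []
    for lam in ((trace + disc) / 2.0, (trace - disc) / 2.0):
        # First row of (C - lambda I) x = 0: (c11 - lambda) x1 + c12 x2 = 0,
        # so x = (c12, lambda - c11) solves it; scale it to unit length.
        v1, v2 = c12, lam - c11
        norm = math.hypot(v1, v2)
        results.append((lam, (v1 / norm, v2 / norm)))
    return results

# Correlation matrix of X1 and X2: [[1, r], [r, 1]].
r = 0.9092
for lam, vec in eig_2x2(1.0, r, r, 1.0):
    print(f"lambda = {lam:.4f}, x = ({vec[0]:.4f}, {vec[1]:.4f})")
```

For any 2 × 2 correlation matrix [[1, r], [r, 1]] the result is λ = 1 ± r with eigenvectors along (1, 1)/√2 and (1, −1)/√2, so here λ is 1.9092 and 0.0908.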

## PCA, Step 1 : Normalization

Normalization of the data: rescale each variable to zero mean and a standard deviation of 1.
| District | X1 | X2 | X3 | X4 |
|---|---|---|---|---|
| A | -1.2305 | -2.4299 | -1.7615 | -1.0713 |
| B | -0.9335 | -0.7579 | -1.1258 | -0.8765 |
| C | -0.6364 | -0.0223 | -0.3108 | -0.5843 |
| D | -0.1017 | -0.0892 | 0.0152 | -0.0974 |
| E | -0.3789 | -0.4236 | -0.7183 | -0.4870 |
| F | 0.9282 | 0.6911 | 0.9117 | 1.0713 |
| G | -0.2403 | 0.0223 | 0.3412 | -0.2922 |
| H | 0.0172 | 0.2898 | 0.5042 | -0.3896 |
| I | 0.5717 | 0.5573 | 0.6672 | 0.9739 |
| J | 0.1558 | 0.4236 | 0.5857 | 0.0974 |
| K | 0.9678 | 0.9140 | 0.9117 | 1.1687 |
| L | -1.5672 | -1.2038 | -1.2888 | -1.1687 |
| M | -0.8939 | -0.6465 | -1.2073 | -1.3635 |
| N | 1.7402 | 1.3598 | 1.3192 | 1.5583 |
| O | 1.6015 | 1.3152 | 1.1562 | 1.4609 |

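$$X(i,j) = \frac{x(i,j) - \bar{x}(j)}{\sigma(j)}$$

where $x(i,j)$ is the raw value of variable $j$ in district $i$, $\bar{x}(j)$ is the mean of variable $j$, and $\sigma(j)$ is its sample standard deviation ($n - 1$ denominator).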
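
Step 1 in code: a minimal NumPy sketch, assuming the survey table as input. `DATA` and `normalize` are illustrative names, not part of the original analysis; `ddof=1` selects the sample standard deviation, which is what reproduces the normalized table.

```python
import numpy as np

# Illustrative name: survey data, rows = districts A..O, columns = X1..X4.
DATA = np.array([
    [9.75, 6.5, 1.61, 0.65],
    [10.5, 10.25, 2.0, 0.75],
    [11.25, 11.9, 2.5, 0.9],
    [12.6, 11.75, 2.7, 1.15],
    [11.9, 11.0, 2.25, 0.95],
    [15.2, 13.5, 3.25, 1.75],
    [12.25, 12.0, 2.9, 1.05],
    [12.9, 12.6, 3.0, 1.0],
    [14.3, 13.2, 3.1, 1.7],
    [13.25, 12.9, 3.05, 1.25],
    [15.3, 14.0, 3.25, 1.8],
    [8.9, 9.25, 1.9, 0.6],
    [10.6, 10.5, 1.95, 0.5],
    [17.25, 15.0, 3.5, 2.0],
    [16.9, 14.9, 3.4, 1.95],
])
DISTRICTS = "ABCDEFGHIJKLMNO"

def normalize(x):
    """Hypothetical helper: z-score each column by subtracting its mean and
    dividing by its sample standard deviation (n - 1 denominator)."""
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

X = normalize(DATA)
for name, row in zip(DISTRICTS, X):
    print(name, np.round(row, 4))
```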

## PCA, Step 2 : Correlation Matrix

High correlation, corr(X, Y) = 1, means the two variables are completely linearly correlated: Y = mX + c with m > 0.

No correlation, corr(X, Y) = 0, means the two variables have no linear correlation at all.
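| Corr(i, j) | X1 | X2 | X3 | X4 |
|---|---|---|---|---|
| X1 | 1.0000 | 0.9092 | 0.9329 | 0.9690 |
| X2 | 0.9092 | 1.0000 | 0.9519 | 0.8641 |
| X3 | 0.9329 | 0.9519 | 1.0000 | 0.9110 |
| X4 | 0.9690 | 0.8641 | 0.9110 | 1.0000 |

$$\mathrm{Corr}(X,Y) = \frac{\mathrm{cov}(X,Y)}{\sigma_X\,\sigma_Y}, \qquad \mathrm{Corr}(X,X) = \mathrm{Corr}(Y,Y) = 1$$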
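
Once the data are normalized as in Step 1, the correlation matrix is simply XᵀX / (n − 1). A minimal NumPy sketch, assuming the survey table as input (`DATA` is an illustrative name); `np.corrcoef` serves as an independent cross-check.

```python
import numpy as np

# Illustrative name: survey data, rows = districts A..O, columns = X1..X4.
DATA = np.array([
    [9.75, 6.5, 1.61, 0.65], [10.5, 10.25, 2.0, 0.75],
    [11.25, 11.9, 2.5, 0.9], [12.6, 11.75, 2.7, 1.15],
    [11.9, 11.0, 2.25, 0.95], [15.2, 13.5, 3.25, 1.75],
    [12.25, 12.0, 2.9, 1.05], [12.9, 12.6, 3.0, 1.0],
    [14.3, 13.2, 3.1, 1.7], [13.25, 12.9, 3.05, 1.25],
    [15.3, 14.0, 3.25, 1.8], [8.9, 9.25, 1.9, 0.6],
    [10.6, 10.5, 1.95, 0.5], [17.25, 15.0, 3.5, 2.0],
    [16.9, 14.9, 3.4, 1.95],
])

# Step 1: normalize each column (zero mean, unit sample standard deviation).
X = (DATA - DATA.mean(axis=0)) / DATA.std(axis=0, ddof=1)
n = X.shape[0]

# For standardized columns, the correlation matrix is X^T X / (n - 1).
R = X.T @ X / (n - 1)
print(np.round(R, 4))
```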

## PCA, Step 3 : Eigenvalues and Eigenvectors of the Correlation Matrix
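| Component | Eigenvalue | X1 (PDRB) | X2 (LB) | X3 (P) | X4 (I) |
|---|---|---|---|---|---|
| Z1 | 3.7694 | 0.5056 | 0.4941 | 0.5035 | 0.4967 |
| Z2 | 0.1625 | 0.3398 | -0.6393 | -0.3179 | 0.6122 |
| Z3 | 0.0442 | 0.3568 | 0.5065 | -0.7814 | -0.0749 |
| Z4 | 0.0239 | -0.7082 | 0.3012 | -0.1867 | 0.6107 |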

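$$C\mathbf{x} = \lambda\mathbf{x} \quad\Longleftrightarrow\quad (C - \lambda I)\,\mathbf{x} = \mathbf{0}$$

$$\begin{bmatrix} C_{11}-\lambda_n & C_{12} & C_{13} & C_{14} \\ C_{21} & C_{22}-\lambda_n & C_{23} & C_{24} \\ C_{31} & C_{32} & C_{33}-\lambda_n & C_{34} \\ C_{41} & C_{42} & C_{43} & C_{44}-\lambda_n \end{bmatrix}\begin{bmatrix} x_{n1} \\ x_{n2} \\ x_{n3} \\ x_{n4} \end{bmatrix} = \mathbf{0}$$

$$Z_n = x_{n1}X_1 + x_{n2}X_2 + x_{n3}X_3 + x_{n4}X_4$$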
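
Step 3 can be reproduced in a few lines. This sketch assumes NumPy; it builds the correlation matrix from the survey table, calls `np.linalg.eigh` (NumPy's eigen-solver for symmetric matrices), and sorts the eigenpairs from largest to smallest. An eigenvector is only defined up to sign, so the sketch flips each one to give a positive X1 loading; that convention is an assumption, and a column may therefore differ in sign from the Step 3 table while describing the same axis.

```python
import numpy as np

# Illustrative name: survey data, rows = districts A..O, columns = X1..X4.
DATA = np.array([
    [9.75, 6.5, 1.61, 0.65], [10.5, 10.25, 2.0, 0.75],
    [11.25, 11.9, 2.5, 0.9], [12.6, 11.75, 2.7, 1.15],
    [11.9, 11.0, 2.25, 0.95], [15.2, 13.5, 3.25, 1.75],
    [12.25, 12.0, 2.9, 1.05], [12.9, 12.6, 3.0, 1.0],
    [14.3, 13.2, 3.1, 1.7], [13.25, 12.9, 3.05, 1.25],
    [15.3, 14.0, 3.25, 1.8], [8.9, 9.25, 1.9, 0.6],
    [10.6, 10.5, 1.95, 0.5], [17.25, 15.0, 3.5, 2.0],
    [16.9, 14.9, 3.4, 1.95],
])

# Correlation matrix of the four variables (Step 2).
R = np.corrcoef(DATA, rowvar=False)

# Solve R x = lambda x; eigh returns ascending eigenvalues, eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(R)

# Reorder from largest to smallest eigenvalue so that column 0 is Z1.
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

# Assumed sign convention: flip each eigenvector so its X1 loading is positive.
eigvecs = eigvecs * np.sign(eigvecs[0, :])

print("eigenvalues:", np.round(eigvals, 4))
for k in range(4):
    print(f"Z{k + 1}:", np.round(eigvecs[:, k], 4))
```

For a correlation matrix the eigenvalues always sum to the number of variables, here 4.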

## PCA, Step 4 : Principal Component Values

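| District | Z1 | Z2 | Z3 | Z4 |
|---|---|---|---|---|
| A | -3.2418 | 1.0393 | -0.2132 | -0.1856 |
| B | -1.8487 | -0.0114 | 0.2283 | 0.1078 |
| C | -0.7795 | -0.4610 | 0.0482 | 0.1452 |
| D | -0.1362 | -0.0420 | -0.0860 | -0.0172 |
| E | -1.0044 | 0.0722 | 0.2480 | -0.0224 |
| F | 1.8019 | 0.2397 | -0.1114 | 0.0347 |
| G | -0.0838 | -0.3833 | -0.3192 | -0.0652 |
| H | 0.2123 | -0.5782 | -0.2119 | -0.2569 |
| I | 1.3841 | 0.2221 | -0.1080 | 0.2331 |
| J | 0.6313 | -0.3444 | -0.1948 | -0.0327 |
| K | 1.9805 | 0.1703 | 0.0083 | 0.1333 |
| L | -2.6166 | -0.0688 | -0.0744 | 0.2743 |
| M | -2.0565 | -0.3414 | 0.3991 | -0.1688 |
| N | 2.9900 | 0.2567 | 0.1622 | -0.1176 |
| O | 2.7674 | 0.2303 | 0.2248 | -0.0619 |
| mean | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| stdev | 1.9415 | 0.4032 | 0.2103 | 0.1545 |

The coefficient vector **a** of each component is the eigenvector **x** from Step 3:

$$\begin{aligned} Z_1 &= a_{11}X_1 + a_{12}X_2 + \dots + a_{1p}X_p \\ Z_2 &= a_{21}X_1 + a_{22}X_2 + \dots + a_{2p}X_p \\ &\;\;\vdots \\ Z_p &= a_{p1}X_1 + a_{p2}X_2 + \dots + a_{pp}X_p \end{aligned}$$

$$\begin{bmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_p \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pp} \end{bmatrix}\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{bmatrix}$$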
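
Putting Steps 1 to 4 together, the score matrix for all districts is X Aᵀ, where each row of A is one eigenvector. The NumPy sketch below is self-contained and uses illustrative names (`DATA`, `A`, `Z`); it assumes the sign convention of a positive X1 loading for every eigenvector, which matches the Z1, Z2 and Z3 columns of the table, while any column may legitimately come out with the opposite sign under another convention.

```python
import numpy as np

# Illustrative name: survey data, rows = districts A..O, columns = X1..X4.
DATA = np.array([
    [9.75, 6.5, 1.61, 0.65], [10.5, 10.25, 2.0, 0.75],
    [11.25, 11.9, 2.5, 0.9], [12.6, 11.75, 2.7, 1.15],
    [11.9, 11.0, 2.25, 0.95], [15.2, 13.5, 3.25, 1.75],
    [12.25, 12.0, 2.9, 1.05], [12.9, 12.6, 3.0, 1.0],
    [14.3, 13.2, 3.1, 1.7], [13.25, 12.9, 3.05, 1.25],
    [15.3, 14.0, 3.25, 1.8], [8.9, 9.25, 1.9, 0.6],
    [10.6, 10.5, 1.95, 0.5], [17.25, 15.0, 3.5, 2.0],
    [16.9, 14.9, 3.4, 1.95],
])

# Step 1: normalize (zero mean, unit sample standard deviation).
X = (DATA - DATA.mean(axis=0)) / DATA.std(axis=0, ddof=1)

# Steps 2-3: correlation matrix and its eigenvectors, largest eigenvalue first.
R = np.corrcoef(DATA, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
eigvecs = eigvecs * np.sign(eigvecs[0, :])  # assumed sign convention

# Step 4: A has one eigenvector per row (a_k1 ... a_kp), so Z = A X for one
# district, or X @ A.T for all districts at once.
A = eigvecs.T
Z = X @ A.T

for name, row in zip("ABCDEFGHIJKLMNO", Z):
    print(name, np.round(row, 4))
print("mean ", np.round(Z.mean(axis=0), 4))
print("stdev", np.round(Z.std(axis=0, ddof=1), 4))
print("sqrt(lambda)", np.round(np.sqrt(eigvals), 4))
```

The last two lines illustrate why the stdev row of the table equals the square root of each eigenvalue: the variance of Zk is aₖᵀ R aₖ = λk.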

[Figure: Z1 score of each district]

1st principal component, with roughly equal weights on all four parameters (94.2% of sample variance). The limits are set at the square root of the eigenvalue of Z1, i.e. the standard deviation of Z1.
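
The limit rule can be applied directly to the Z1 column of the Step 4 table. In this plain-Python sketch, `Z1` and `sigma1` are illustrative names, and districts are sorted into "above +σ", "below −σ" and "within" groups, assuming the limits are ±√λ1 as described in the caption.

```python
import math

# Z1 scores copied from the Step 4 table.
Z1 = {"A": -3.2418, "B": -1.8487, "C": -0.7795, "D": -0.1362, "E": -1.0044,
      "F": 1.8019, "G": -0.0838, "H": 0.2123, "I": 1.3841, "J": 0.6313,
      "K": 1.9805, "L": -2.6166, "M": -2.0565, "N": 2.9900, "O": 2.7674}

# Standard deviation of Z1 = square root of its eigenvalue (Step 3).
sigma1 = math.sqrt(3.7694)

high = sorted(d for d, z in Z1.items() if z > sigma1)
low = sorted(d for d, z in Z1.items() if z < -sigma1)
within = sorted(d for d, z in Z1.items() if -sigma1 <= z <= sigma1)

print("above +sigma:", high)
print("below -sigma:", low)
print("within limits:", within)
```

This puts K, N and O above the upper limit and A, L and M below the lower limit, with the other nine districts in between.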

[Figure: scatter plot of Z2 against Z1 for each district]

1st principal component (94.2% of sample variance) vs 2nd component (4.06% of sample variance). The 2nd component represents high investment with a lack of productivity. Z1 is limited by the standard deviation of Z1 (1.942) and Z2 by the standard deviation of Z2 (0.403).
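
The Z2 limit can be applied in the same way. The rule below is an assumption, not stated in the source: among districts that stay inside the Z1 limits, flag those whose Z2 score falls outside ± one standard deviation of Z2. `SCORES`, `sigma1`, `sigma2` and `flagged` are illustrative names; the scores are copied from the Step 4 table.

```python
import math

# (Z1, Z2) scores copied from the Step 4 table.
SCORES = {
    "A": (-3.2418, 1.0393), "B": (-1.8487, -0.0114), "C": (-0.7795, -0.4610),
    "D": (-0.1362, -0.0420), "E": (-1.0044, 0.0722), "F": (1.8019, 0.2397),
    "G": (-0.0838, -0.3833), "H": (0.2123, -0.5782), "I": (1.3841, 0.2221),
    "J": (0.6313, -0.3444), "K": (1.9805, 0.1703), "L": (-2.6166, -0.0688),
    "M": (-2.0565, -0.3414), "N": (2.9900, 0.2567), "O": (2.7674, 0.2303),
}
sigma1 = math.sqrt(3.7694)  # standard deviation of Z1
sigma2 = math.sqrt(0.1625)  # standard deviation of Z2

# Assumed rule: inside the Z1 limits, but outside the Z2 limits.
flagged = sorted(
    d for d, (z1, z2) in SCORES.items()
    if abs(z1) <= sigma1 and abs(z2) > sigma2
)
print(f"sigma1 = {sigma1:.4f}, sigma2 = {sigma2:.4f}")
print("outside the Z2 limits:", flagged)
```

Under this assumed rule only C and H are flagged, both with negative Z2 scores.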

## PCA : 3 Component Analysis (Z1, Z2, Z3)

[Figure: 3D scatter plot of Z1, Z2 and Z3 for each district]

3-component analysis. The 3rd component represents low investment and low productivity (1.1% of sample variance). Because Z1 alone represents 94% of the variance, drawing conclusions from the other components will lead to anomalies or wrong conclusions.
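
The variance shares quoted in the captions follow from the Step 3 eigenvalues: for a correlation matrix the eigenvalues sum to the number of variables, so each eigenvalue divided by that total is the component's share of the standardized sample variance. A plain-Python sketch with illustrative names:

```python
# Eigenvalues from the Step 3 table (Z1..Z4).
eigenvalues = [3.7694, 0.1625, 0.0442, 0.0239]

# Their sum equals the number of variables (4) for a correlation matrix.
total = sum(eigenvalues)
shares = [100.0 * lam / total for lam in eigenvalues]

cumulative = 0.0
for k, share in enumerate(shares, start=1):
    cumulative += share
    print(f"Z{k}: {share:6.2f}% of variance, cumulative {cumulative:6.2f}%")
```

Z1 alone covers about 94.2% of the variance, and Z1 plus Z2 together about 98.3%.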

## Conclusion

- High industrialization, enough workers and highly productive: K, O and N.
- Middle industrialization, low productivity: C and H.
- Fair industrialization: B, E, D, G, J, I and F.
- Low industrialization: A, M and L.