
PRINCIPAL COMPONENT ANALYSIS

Dinesh M.D. (2nd year M.Sc. Fin Eco) Prabakar R (2nd year M.Sc. Fin Eco)

INTRODUCTION:

Principal component analysis (PCA) can be performed for a variety of reasons, the most common arguably being dimension reduction. PCA can be thought of as a data-analytic method that provides a specific set of projections representing a given data set in fewer dimensions. This has obvious advantages when the dimensionality can be reduced to two or three, because visualisation becomes straightforward, but reduced dimensionality has advantages beyond visualisation. In this analysis six stocks were selected from the NSE. Three of these stocks are from the telecommunication (telecom) industry (Bharti Airtel, Reliance Communications and Idea Cellular) and the other three are from the oil industry (Indian Oil Corporation, Bharat Petroleum and Hindustan Petroleum). PCA is performed on the daily returns of these stocks from 20 July 2007 to 19 July 2010.

RESULTS:

. pca airtel idea rcom ioc bpcl hp

Principal components/correlation        Number of obs   =     737
                                        Number of comp. =       6
                                        Trace           =       6
Rotation: (unrotated = principal)       Rho             =  1.0000

Component   Eigenvalue   Difference   Proportion   Cumulative
Comp1       3.2377       1.93556      0.5396       0.5396
Comp2       1.30214      .758382      0.2170       0.7566
Comp3       .543758      .12856       0.0906       0.8473
Comp4       .415198      .0699091     0.0692       0.9165
Comp5       .345289      .189374      0.0575       0.9740
Comp6       .155915      .            0.0260       1.0000

Principal components (eigenvectors)

Variable     Comp1     Comp2     Comp3     Comp4     Comp5     Comp6   Unexplained
airtel      0.3386    0.4424    0.8279    0.0394    0.0295   -0.0417   0
idea        0.3779    0.4607   -0.4065   -0.2978    0.6236    0.0463   0
rcom        0.3936    0.4254   -0.3789    0.2731   -0.6672   -0.0294   0
ioc         0.4173   -0.3496   -0.0230    0.7496    0.3358    0.1684   0
bpcl        0.4506   -0.3643    0.0720   -0.4600   -0.2235    0.6305   0
hp          0.4588   -0.3952   -0.0009   -0.2482   -0.0491   -0.7546   0

INFERENCE:
The first and second components together account for about 75.66% of the total variance, and their eigenvalues (3.2377 and 1.30214) are both greater than 1. In this case, the first two components could replace the original six variables while retaining roughly three quarters of the total variance.
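The figures quoted above can be reproduced from the eigenvalues alone. The following is a minimal Python sketch (it is not part of the original Stata session, and the variable names are our own) that divides each eigenvalue by the trace to obtain the proportion of variance and lists the components whose eigenvalue exceeds 1.

```python
# Eigenvalues of the correlation matrix, copied from the pca output above.
eigenvalues = [3.2377, 1.30214, 0.543758, 0.415198, 0.345289, 0.155915]

# For a correlation matrix the trace equals the number of variables (6 here).
trace = sum(eigenvalues)

# Share of the total variance carried by each component.
proportions = [ev / trace for ev in eigenvalues]

# Running total of the proportions (the "Cumulative" column).
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

# Components whose eigenvalue exceeds 1, numbered from 1.
retained = [i + 1 for i, ev in enumerate(eigenvalues) if ev > 1]

for i, (p, c) in enumerate(zip(proportions, cumulative), start=1):
    print(f"Comp{i}: proportion={p:.4f} cumulative={c:.4f}")
print("Components with eigenvalue > 1:", retained)
```

The eigenvalue-greater-than-1 rule is only a rule of thumb; it happens to agree here with the 75.66% cumulative share of the first two components.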

[Figure: Scree plot of eigenvalues after pca. Vertical axis: Eigenvalues; horizontal axis: Number.]

From the scree plot we can determine the appropriate number of components to retain. The elbow in the scree plot indicates how many components should be retained. In the above plot the elbow occurs at i = 2. Thus the first two sample principal components effectively summarise the total sample variance.
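Since the plot itself is a matter of visual judgement, it can help to look at the drops between successive eigenvalues, which correspond to the Difference column of the pca output. The sketch below is an illustrative Python check with our own variable names; it is not a formal elbow test.

```python
# Eigenvalues of the correlation matrix, copied from the pca output.
eigenvalues = [3.2377, 1.30214, 0.543758, 0.415198, 0.345289, 0.155915]

# Drop from each eigenvalue to the next one (the "Difference" column).
drops = [a - b for a, b in zip(eigenvalues, eigenvalues[1:])]

for i, d in enumerate(drops, start=1):
    print(f"Comp{i} -> Comp{i + 1}: drop = {d:.6f}")
```

The first two drops are large and every later drop is below 0.2, so the curve is comparatively flat from the third component onwards. This is consistent with retaining two components.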
              a    airtel      idea      rcom       ioc      bpcl        hp
a        1.0000
airtel   0.6092    1.0000
idea     0.6800    0.4979    1.0000
rcom     0.7082    0.5038    0.6428    1.0000
ioc      0.7509    0.2603    0.2868    0.3498    1.0000
bpcl     0.8107    0.3026    0.3301    0.3540    0.6212    1.0000
hp       0.8255    0.2752    0.3391    0.3525    0.6970    0.8337    1.0000
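As a cross-check, the first eigenvalue and eigenvector reported in the pca output can be recovered from the correlations among the six stocks shown above. The sketch below uses power iteration in plain Python; this is our own illustrative choice and not the algorithm Stata uses. Because the correlations are rounded to four decimals, the result should agree with the output only approximately.

```python
import math

names = ["airtel", "idea", "rcom", "ioc", "bpcl", "hp"]

# Lower triangle of the correlation matrix among the six stocks,
# copied from the table above (the row and column for "a" are left out).
lower = [
    [1.0000],
    [0.4979, 1.0000],
    [0.5038, 0.6428, 1.0000],
    [0.2603, 0.2868, 0.3498, 1.0000],
    [0.3026, 0.3301, 0.3540, 0.6212, 1.0000],
    [0.2752, 0.3391, 0.3525, 0.6970, 0.8337, 1.0000],
]
n = len(names)

# Fill in the full symmetric matrix from the lower triangle.
R = [[lower[max(i, j)][min(i, j)] for j in range(n)] for i in range(n)]


def matvec(m, v):
    """Multiply matrix m by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def normalise(v):
    """Scale v to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


# Power iteration: repeated multiplication converges to the leading eigenvector.
v = normalise([1.0] * n)
for _ in range(200):
    v = normalise(matvec(R, v))

# Rayleigh quotient gives the matching eigenvalue (v has unit length).
eigenvalue = sum(a * b for a, b in zip(v, matvec(R, v)))

print(f"Leading eigenvalue: {eigenvalue:.4f}")
for name, weight in zip(names, v):
    print(f"{name}: {weight:.4f}")
```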

The stock of Hindustan Petroleum, with a coefficient of 0.4588, receives the greatest weight in the first principal component (denoted a in the correlation table above). It also has the highest correlation with a, 0.8255. The stock of Bharat Petroleum has a weight of 0.4506, so Bharat Petroleum and Hindustan Petroleum are almost equally important to the first principal component. The first component is roughly an equally weighted sum, or index, of the six stocks, and may be called the market component. The oil stocks have somewhat higher weights than the telecom stocks. One possible reason is that the telecom companies are fairly new to the markets compared with the oil companies, and telecom is a capital-intensive sector, so it may take a long time for the telecom companies to give returns on par with the oil companies. The oil companies have been present in the markets for a long time and have highly depreciated assets. Moreover, these oil companies are listed in the Fortune 500 list. It is therefore plausible that the oil companies have higher weights than the telecom companies. The second principal component contrasts the oil stocks with the telecom stocks: the telecom stocks have positive weights and the oil stocks have negative weights. It may be called the industry component.
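The weights and the correlations with the first component are linked by a standard identity: when PCA is run on the correlation matrix, the correlation between a variable and a component equals the variable's weight multiplied by the square root of the component's eigenvalue. The sketch below is an illustrative Python check with our own variable names; small differences in the fourth decimal are expected because the inputs are rounded.

```python
import math

# First eigenvalue and first-component weights, copied from the pca output.
eigenvalue_1 = 3.2377
loadings_1 = {
    "airtel": 0.3386,
    "idea": 0.3779,
    "rcom": 0.3936,
    "ioc": 0.4173,
    "bpcl": 0.4506,
    "hp": 0.4588,
}

# Correlation of each stock with the first component:
# weight times the square root of the eigenvalue.
correlations = {
    name: weight * math.sqrt(eigenvalue_1)
    for name, weight in loadings_1.items()
}

for name, r in correlations.items():
    print(f"{name}: {r:.4f}")
```

The stock with the largest weight is therefore always the one with the highest correlation with the component, which is why Hindustan Petroleum leads on both measures.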

The correlation between the principal components is zero, as expected. The geometrical interpretation of the sample principal components is shown in the above diagram. They lie along the axes of the ellipse, in the perpendicular directions of maximum sample variance. The last few sample principal components can often be ignored, and the data can be adequately approximated by their representations in the space of the retained components. The red line in the diagram represents principal component 1 and the green line represents principal component 2. Each line is drawn over the range between the minimum and maximum coefficient values of its component. The red and green lines are perpendicular to each other, so that together they cover the maximum area. Thus the red line spans the range 0.3386 to 0.4588 (horizontally), and the green line spans the range -0.3952 to 0.4607 (vertically). When the covariance matrix is used instead of the correlation matrix, the results are very similar (see the covariance output below).
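The zero correlation between components, and the perpendicular lines in the diagram, both follow from the eigenvectors being orthogonal unit vectors. The sketch below is an illustrative Python check on the first two eigenvectors from the correlation-based output; the helper name dot is our own. The results are only approximately 0 and 1 because the coefficients are rounded to four decimals.

```python
# First two eigenvectors from the correlation-based pca output,
# in the order airtel, idea, rcom, ioc, bpcl, hp.
comp1 = [0.3386, 0.3779, 0.3936, 0.4173, 0.4506, 0.4588]
comp2 = [0.4424, 0.4607, 0.4254, -0.3496, -0.3643, -0.3952]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Orthogonal directions have an inner product close to zero.
print(f"comp1 . comp2 = {dot(comp1, comp2):.5f}")

# Each eigenvector has (approximately) unit length.
print(f"|comp1|^2 = {dot(comp1, comp1):.5f}")
print(f"|comp2|^2 = {dot(comp2, comp2):.5f}")
```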

. pca airtel idea rcom ioc bpcl hp, covariance

Principal components/covariance         Number of obs   =      737
                                        Number of comp. =        6
                                        Trace           = .0060533
Rotation: (unrotated = principal)       Rho             =   1.0000

Component   Eigenvalue    Difference     Proportion   Cumulative
Comp1       .0032321      .00191799      0.5339       0.5339
Comp2       .00131411     .000717856     0.2171       0.7510
Comp3       .000596257    .000177714     0.0985       0.8495
Comp4       .000418542    .0000565917    0.0691       0.9187
Comp5       .000361951    .000231654     0.0598       0.9785
Comp6       .000130297    .              0.0215       1.0000

Principal components (eigenvectors)

Variable     Comp1     Comp2     Comp3     Comp4     Comp5     Comp6   Unexplained
airtel      0.3758   -0.4063    0.8243   -0.1019   -0.0512   -0.0351   0
idea        0.3800   -0.3660   -0.2562    0.4323    0.6840    0.0365   0
rcom        0.4796   -0.4316   -0.4993   -0.2926   -0.4985   -0.0183   0
ioc         0.4322    0.4656   -0.0171   -0.6462    0.4046    0.1222   0
bpcl        0.3744    0.3589    0.0701    0.4419   -0.3044    0.6620   0
hp          0.3967    0.4112    0.0206    0.3230   -0.1574   -0.7375   0

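There is no material difference between using the correlation matrix (R) and the covariance matrix (S). The first two components explain 75.10% of the total variance under S, against 75.66% under R, and the weights are similar. The signs of the second component are reversed, which does not matter because an eigenvector is determined only up to sign.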
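One informal way to see how close the two sets of results are is to compare the directions of the matching eigenvectors. The sketch below, in Python with our own helper cosine, computes the cosine of the angle between the correlation-based and covariance-based weights for the first two components. Treating the two sets of weights as directly comparable is an assumption made for illustration, since one analysis uses standardised returns and the other raw returns.

```python
import math

# Weights in the order airtel, idea, rcom, ioc, bpcl, hp.
corr_comp1 = [0.3386, 0.3779, 0.3936, 0.4173, 0.4506, 0.4588]
corr_comp2 = [0.4424, 0.4607, 0.4254, -0.3496, -0.3643, -0.3952]
cov_comp1 = [0.3758, 0.3800, 0.4796, 0.4322, 0.3744, 0.3967]
cov_comp2 = [-0.4063, -0.3660, -0.4316, 0.4656, 0.3589, 0.4112]


def cosine(u, v):
    """Cosine of the angle between u and v (1 = same direction, -1 = opposite)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


cos1 = cosine(corr_comp1, cov_comp1)
cos2 = cosine(corr_comp2, cov_comp2)

print(f"Comp1: cosine = {cos1:.4f}")
# A negative cosine near -1 means the same axis with the sign flipped.
print(f"Comp2: cosine = {cos2:.4f}")
```

Both cosines are close to 1 in absolute value (roughly 0.99), so the two analyses point along nearly the same axes, although the weights are not identical.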
