
Principal Component Analysis and

Extreme Value Theory in Financial


Applications
by
Julian Rachlin
A Thesis submitted to the faculty of Princeton
University in partial fulfillment of the
requirements for the degree of Bachelor of Arts
Department of Physics
April 27, 2006
Thesis Adviser: Gyan Bhanot (IAS)
Departmental Representative: Professor Chiara Nappi
Second Reader: Professor Robert Vanderbei
This paper represents my own work in accordance with university
regulations
Abstract
This paper examines the potential of applying two mathematical methods already in use in astrophysics, Extreme Value Theory and Principal Component Analysis, to the field of finance. Following the work of Cecilia Muldoon, Extreme Value Theory will be used to create a quantitative stock trading strategy, the merits of which will be judged by backtesting against S&P 500 returns from January 1985 to December 2004. These same returns will then be subjected to scrutiny using principal component analysis, with the objective of discovering underlying market structure or useful trading information.
Acknowledgements
Sincere thanks to all those who lent their time and support to this project.
First and foremost among this group is my advisor Gyan Bhanot. Thank
you for your guidance, patience, and limitless enthusiasm for the project.
Thank you also for taking the time to meet with me weekly and encouraging
me to continue to work to expand the scope of the investigation throughout
the year.
Further recognition belongs to my departmental advisor Chiara Nappi, my second reader Robert Vanderbei, and finally Professor Michael Strauss for his help regarding the astrophysical aspects of this project.
Finally, I'd like to thank my family for all their support throughout my career at Princeton.
Contents

1 Introduction
2 Extreme Value Theory
  2.1 Introduction
  2.2 Method
  2.3 Results
  2.4 Conclusion
3 Principal Component Analysis
  3.1 Introduction
  3.2 Mathematical Description of Basic Method
    3.2.1 Geometric Interpretation
    3.2.2 An Alternative View: The Karhunen-Loève Transform
  3.3 PCA in Astrophysics
    3.3.1 Dimensionality Reduction
    3.3.2 A Natural Basis
  3.4 Financial Application
    3.4.1 A Simple Example
    3.4.2 The Application
    3.4.3 Data Reduction & Eigenvalue Analysis
    3.4.4 Eigenvector Analysis
    3.4.5 Trading Strategy
    3.4.6 Market Memory
4 Conclusion
Chapter 1
Introduction
Conceptually, physics and finance seem to have much in common. Both market analyst and physicist speak daily about the forces that affect the movement of their worlds. They each search tirelessly to understand how these forces arise, the exact effect they have, and how they act. Though the force of supply and demand may not seem analogous to the force of gravity, they affect the world in similar ways. Consider a physical system that has been disturbed from its equilibrium position. The physicist's goal of understanding how equilibrium will be recaptured is identical to the market analyst's task of understanding how financial news will disturb today's market equilibrium and how the market will change to recapture it. Both the financial market and the physical world change constantly in this way in response to various forces. Are these similarities superficial, or do they hint at a deeper overlap still to be explored?
The purpose of this paper is two-fold. It seeks to highlight the significant conceptual overlap between physics and finance and to demonstrate the possibility of advancing financial research through the application of mathematical methods currently widely used in physics. An additional goal is to encourage the further pursuit of financial physics, which may lead to more creative, descriptive, and effective financial models. This paper focuses on two mathematical processes already used in astrophysics. The first is Extreme Value Theory and the second is Principal Component Analysis.
In the next chapter, we will explore the possibility of using Extreme Value Theory to create a stock trading strategy. This strategy is outlined in a past paper by Cecilia Muldoon. Here we shall carry through this strategy and examine its performance against a standard benchmark, the S&P 500. The following chapter will focus on Principal Component Analysis. The effectiveness of this statistical method in decomposing a financial time series will be evaluated. The goal of these applications is to discover to what extent they can successfully be applied to financial investments and to indicate whether the search for further overlap between these two fields is advantageous.
Chapter 2
Extreme Value Theory
2.1 Introduction
Extreme Value Theory is a branch of statistical mathematics that studies extreme deviations from the median of probability distributions [17]. These values are of practical importance because they often represent times of greatest loss or gain. Catastrophic earthquakes, floods, and market crashes are all examples of real-world phenomena modeled by EVT. The basis of the theory was established by Gumbel in 1958. Gumbel found that the distribution of the statistically smallest or largest extremes of a distribution tends asymptotically to an analytic form for a general class of parent distributions [2]. The general form of the extreme value cumulative distribution function was found to be

$$F(x) = e^{-e^{-(x - x_0)}} \tag{2.1}$$

for maxima and, for minima,

$$F(x) = 1 - e^{-e^{(x - x_0)}} \tag{2.2}$$

This distribution, along with the corresponding probability distributions, is pictured in figure 2.1.
Figure 2.1: Gumbel Cumulative & Probability Distribution Functions [17]
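The asymptotic behavior Gumbel described is easy to demonstrate numerically. The following sketch (in Python; the thesis's own computations were done in MATLAB, so this is an illustration rather than its code) draws block maxima from a Gaussian parent distribution and fits them with the Gumbel form of equation 2.1, here with an additional scale parameter:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    # 5000 sub-samples of size 100, each drawn from a standard normal parent
    maxima = rng.normal(size=(5000, 100)).max(axis=1)

    # Fit F(x) = exp(-exp(-(x - x0)/b)); scipy calls x0 "loc" and b "scale"
    x0, b = stats.gumbel_r.fit(maxima)
    ks = stats.kstest(maxima, "gumbel_r", args=(x0, b))
    print(f"x0 = {x0:.3f}, scale = {b:.3f}, KS p-value = {ks.pvalue:.3f}")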
To date, extreme value theory is used in a variety of applications. In engineering it is widely used to model component failure risk, and in the insurance industry it is used to model the risk of large payoffs from catastrophic events. Other fields of application include hydrogeology and meteorology. In an unpublished junior paper entitled "Extreme Value Theory: Bright Cluster Galaxies and the Stock Market", Cecilia Muldoon discusses recent efforts by Bhavsar to use extreme value theory to determine whether bright cluster galaxies are a special class of galaxy or just the extremes of a normal distribution. His method compared how well empirical evidence matched extreme value distribution models, normal distribution models, and combinations of the two. Inspired by her knowledge of extreme value theory, Muldoon then proposed a financial trading strategy based on the theory as formulated by adviser Gyan Bhanot. In the remaining portion of this chapter we shall carry out the proposed strategy and examine its performance.
2.2 Method
The trading strategy is founded on the principle that by basing stock selection on the constituents of the market's extreme value distribution, a portfolio can be created that consistently outperforms the market. To form this portfolio, Muldoon suggests using the past 12 months of returns to build an extreme value distribution both for losses and for gains at each time step, as shown in figure 2.2. For this project the time step is one month. The stocks of these distributions will then be used to form the portfolio's short and long positions, respectively, for the coming month. To build these distributions, the whole population of stock returns of the past 12 months will be isolated. Then, from this group, sub-populations of S random returns will be selected. The maxima (maximum and minimum) will be chosen from each sub-population and retained for addition to the extreme value distribution. These sub-populations will be chosen a few thousand times, and their maxima will define the extreme value distribution of the past 12 months' returns. From the two resulting distributions we form a portfolio whose performance we can then track by recording the performance of these stocks the following month.
Figure 2.2: Example of Extreme Losses/Gains Distributions
To accomplish this task we will examine returns from the S&P 500, because it represents a homogeneous population but is diverse enough to be representative of the market as a whole. The data we use contain monthly stock returns for the S&P 500 constituents from January 1984 to December 2004.
As a practical matter we introduce a few more logical constraints. The first constraint is a pre-determined portfolio size: our portfolio will contain no more than 30 stocks, equally weighted. Secondly, we introduce a step to eliminate the intersection of the two extreme value distributions. If a stock is volatile it may appear in both distributions; in an attempt to eliminate potential volatility from portfolio performance, we ignore these stocks. Finally, we ignore dividends and transaction costs to simplify calculations.
This leaves only three variables with which to adjust portfolio performance: sample size, number of samples, and long/short ratio. The first two refer to the process used to build the extreme value distributions. The sample size is the number of stocks, S, that form the sub-population groups from which maxima are selected. S, in this sense, is inversely proportional to the diversity of stocks in the extreme value distributions. There is a delicate balance between identifying stocks that consistently outperform and identifying the few stocks that happen to have extraordinary returns in some given month. The number of samples, N, is the number of times sub-populations are formed and maxima are selected. The product N × S divided by the whole population size gives the expected number of times an individual stock is drawn into a sub-population; this product should be many times greater than the whole population size, so that every stock has ample chance of selection. Finally, we are free to choose a long/short ratio, which determines the weight of long and short positions in the overall portfolio. As a practical matter, long positions are usually more heavily weighted within a typical portfolio, since the market trends upwards.
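A minimal sketch of this selection procedure follows, under the assumption that `returns` is an array of the trailing 12 months of returns (one row per stock) and `tickers` is the matching list of names; the function and variable names are hypothetical, and this is one reading of the procedure rather than the thesis code:

    import numpy as np

    def extreme_value_portfolio(returns, tickers, S=100, N=2000, n_stocks=30):
        rng = np.random.default_rng()
        pool = returns.ravel()                    # whole population of returns
        stock_of = np.repeat(np.arange(returns.shape[0]), returns.shape[1])
        gains = np.zeros(returns.shape[0], dtype=int)
        losses = np.zeros(returns.shape[0], dtype=int)
        for _ in range(N):                        # N sub-populations of size S
            idx = rng.choice(pool.size, size=S, replace=False)
            gains[stock_of[idx[pool[idx].argmax()]]] += 1   # extreme gain
            losses[stock_of[idx[pool[idx].argmin()]]] += 1  # extreme loss
        both = (gains > 0) & (losses > 0)         # volatile stocks land in both tails
        gains[both], losses[both] = 0, 0          # ...and are ignored
        longs = np.argsort(gains)[::-1][:n_stocks]
        shorts = np.argsort(losses)[::-1][:n_stocks]
        return [tickers[i] for i in longs], [tickers[i] for i in shorts]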
2.3 Results
Generally, the returns produced by this strategy are tied to the adjustment of two of the variables discussed before: sample size and long/short ratio. To begin our investigation we examine the effects of sample size while keeping the other parameters constant. An initially low sample size of 30 will allow a great diversity of stocks into the portfolio but still indicate whether the strategy is profitable.
Figure 2.3: Sample Size 30; Number of Samples 2000; Long/Short Ratio 9. (Axes: returns vs. time in months.)
Figure 2.3 shows how the trading strategy performs over several trials. Clearly the strategy is profitable, but how it performs in comparison to the S&P index has yet to be seen. Not surprisingly, different trials of the strategy yield different results. This is especially true for lower sample size values, because low sample sizes allow a greater diversity of stocks into the portfolio. Though profitable, the wide variation in expected returns is enough to deter investors. To remedy this we increase the sample size, narrowing the range of possible stocks that may comprise our portfolio. The results are seen below in figures 2.4, 2.5 & 2.6.
As expected, the variation in expected returns shrinks as a result. Expected returns also increase.
Figure 2.4: Sample Size 50; Number of Samples 2000; Long/Short Ratio 9. (Axes: returns vs. time in months.)
Figure 2.5: Sample Size 70; Number of Samples 2000; Long/Short Ratio 9. (Axes: returns vs. time in months.)
Figure 2.6: Sample Size 100; Number of Samples 2000; Long/Short Ratio 9. (Axes: returns vs. time in months.)
The increased sample size has decreased the randomness of the portfolio constituents and made the portfolio more reliant on the true extreme value distribution. The increase in returns suggests three things. The first is that higher sample sizes yield higher returns. The second is that higher sample sizes yield lower variation in expected returns. And finally, this trading strategy is significantly better than a random portfolio of stocks. However, the opposite extreme for sample size is displayed in figure 2.7, where the sample size is 300. The result is a significant drop in returns. We conclude that at either extreme of sample size, returns decline; thus there is some sample size within this range that maximizes returns. According to our trials, this optimal sample size is 100.
Next we investigate the effect of changing the long/short ratio. As a general rule, more weight is typically given to long positions, based on the reasoning that stock prices normally trend upwards. In the previous exercise the long/short ratio was 9; that is, 90% of the portfolio was invested in long positions and the remaining 10% in short positions.
Figure 2.7: Sample Size 300; Number of Samples 2000; Long/Short Ratio 9. (Axes: returns vs. time in months.)
Figure 2.8 demonstrates that shifting the long/short ratio towards the short positions decreases returns. We deduce either that this trading strategy works less well for stocks experiencing extreme losses, or that, as a general rule, short positions do not help form a good long-term portfolio.
Finally, having roughly determined the optimal parameter values for both long/short ratio and sample size, we compare the performance of this trading strategy to owning the S&P 500 index. Figure 2.9 compares the performance of both portfolios; the dotted red line represents the S&P index and the two blue lines represent two trials of the extreme value theory portfolio. For this comparison there is no short position. In brief, the performances are comparable. Our trading strategy does not outpace the S&P index, but neither does it fall short of the S&P returns. This means that, with added transaction costs and the higher volatility of the extreme value portfolio, a wise investor would probably prefer to own the S&P index.
Figure 2.8: Sample Size 100; Number of Samples 2000; Long/Short Ratio varied over the multiples 9, 3, 3/2, and 1. (Axes: returns vs. time in months.)
Figure 2.9: Extreme Value Theory vs. S&P 500. Sample Size 100; Number of Samples 2000; Long/Short Ratio: all long. (Axes: returns vs. time in months.)
2.4 Conclusion
The performance of our optimal extreme value portfolio on average matches the performance of the S&P. Considering that transaction costs were ignored and that the extreme value portfolio carries higher volatility, it seems that owning the S&P index is a better alternative to this active trading strategy. On the other hand, we cannot yet disregard the strategy completely. Given its competitive performance, it could be useful as a technique to highlight potential stock investments that could then be evaluated qualitatively as part of a more comprehensive trading strategy. Furthermore, with additional computational power, a more rigorous effort to optimize the important parameter of sample size might yield stronger returns.
Chapter 3
Principal Component Analysis
3.1 Introduction
Principal component analysis, abbreviated as PCA, is a method of statistical analysis useful in the data reduction and interpretation of multivariate data sets [9]. The earliest defining application of principal component analysis was in the social sciences. In 1904 Charles Spearman published "General Intelligence, Objectively Determined and Measured" in the American Journal of Psychology. In the paper, he examined human intellectual ability as it related to various subject matters: mathematics, writing, critical reasoning, etc. His analysis using PCA suggested an underlying intelligence factor that determined intellectual ability regardless of subject matter. This factor became widely known as IQ. The consequence of the study for PCA was recognition as a statistical method essential in analyzing large data sets. PCA has now spread to a large number of scientific fields and is used in a variety of different applications where the analysis of inter-object correlations is the focus. In particular, the past decades have seen growing use of PCA in cosmology as the field confronts rising statistical challenges.
A general statement of the problem solved by PCA is the following: analyze the relationship between the m parameters of n objects, given an m × n data set. The central step of PCA is the redefinition of this data set in terms of a new set of variables which are mutually orthogonal linear combinations of the original variables. The new variables define coordinate axes in multivariate data space that form a natural classification of the data. The motivation behind this redefinition is the dimensionality reduction achieved by an orthogonal basis. PCA condenses correlations in the data into single variables, finding the true dimensionality of the data set. Provided a willingness to sacrifice some accuracy for economy of description, there is the potential for significant reduction in data dimensionality. This makes PCA a powerful and versatile analysis method.

In this chapter we shall explore the basic mathematical description of the process and discuss the benefits of such an analysis procedure in the context of astrophysics. Then we will apply this analysis to a financial time series as a preliminary search for market structure and trading strategies.
3.2 Mathematical Description of Basic Method
3.2.1 Geometric Interpretation
In the multi-dimensional space of the original data set, defined by its column and row vectors, PCA seeks new axes that best summarize the data. To that end, PCA chooses the axis of best summarization first, followed by the next best summarization, and so forth until the last bit of data variability has been accounted for. This first axis, or best-fit axis, is defined as the axis that minimizes the sum of the squared Euclidean distances between itself and the original data points. This is visually represented in figure 3.1. So this first new axis is just a best-fit line, as found in a normal regression. The next best, or second new axis, will form in conjunction with the first axis a best-fit plane that minimizes the sum of the squared Euclidean distances to it. The third axis will create a best-fit subspace in conjunction with the first two axes. This process continues until all data are accounted for by some axis. Consider figure 3.1 once again. Minimizing the sum of squared Euclidean distances is evidently equivalent to maximizing the sum of squared projections onto the new axis, which in turn is equivalent to maximizing the data variance accounted for. We conclude that each new axis is best-fit in the sense that it explains the maximum amount of residual variability in the data set. Hence PCA replaces the original data axes with a new set of orthogonal best-fit axes that successively account for as much residual variability as possible.
Figure 3.1: Simple Visual Representation of PCA
The mathematics of this process is well laid out by Murtagh & Heck (1987), which shall guide the following explanation. If u is the first new axis of the PCA process, then the projection of the data in the original data set, X, onto this axis is given by Xu. As prescribed, our axis u must maximize the squared projections, given by

$$(Xu)^{\top}(Xu)$$

This quadratic form is unbounded unless a suitable restriction is put on u, so let u be of unit length such that $u^{\top}u = 1$. Now letting $S = X^{\top}X$ and given the constraint $u^{\top}u = 1$, we maximize

$$u^{\top}Su$$

To solve, introduce a Lagrange multiplier $\lambda$:

$$u^{\top}Su - \lambda(u^{\top}u - 1)$$

Differentiating with respect to u yields

$$2Su - 2\lambda u = 0$$

which reduces to an ordinary eigenvalue problem,

$$Su = \lambda u \tag{3.1}$$
This result suggests that the first new axis is equivalent to the first eigenvector $u_1$ of the matrix S, corresponding to the largest eigenvalue $\lambda_1$. We may then repeat this procedure to find the second axis, now introducing the additional constraint of orthogonality with the first axis. This yields a second solution to equation 3.1, with eigenvector $u_2$ corresponding to the second largest eigenvalue $\lambda_2$. Hence each successive eigenvector is also the next new PCA axis. Furthermore, because the eigenvalues determine the relative merit of the eigenvectors, the eigenvalue $\lambda_k$ represents the relative amount of data variability accounted for by the corresponding eigenvector $u_k$.
Analysis of this result proves difficult when $S = X^{\top}X$. This is because, as in astrophysical data sets, the parameters in X are often measured in different units. For example, distance and luminosity are common parameters in these data sets and are measured on very different scales. The result is an apparent overemphasis on certain observations. The standard solution to this problem is to normalize the data set by re-expressing S as a correlation matrix. A correlation matrix holds the correlation coefficients of all variable combinations, such that cell $S_{ij}$ holds the correlation coefficient of variable i with variable j. This correlation coefficient is mathematically defined as
$$\mathrm{corr}(X, Y) = \frac{\displaystyle\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\left[\displaystyle\sum_{i=1}^{n}(x_i - \bar{x})^2 \displaystyle\sum_{i=1}^{n}(y_i - \bar{y})^2\right]^{1/2}} \tag{3.2}$$
where the numerator is the covariance between the two variables and the denominator is the product of the variables' standard deviations. In words, correlation is a measure of the extent to which variables are related. A correlation coefficient represents the strength of this relationship by assigning it a value between −1 and 1. A value near 1 indicates high correlation, meaning the variables are highly related; 0 means not correlated; and −1 represents a strong inverse relation. In what follows, S is most frequently a correlation matrix, and otherwise a covariance matrix.
Continuing on, the solution to equation 3.1 is well known. In general, a matrix S can be reduced to a diagonal matrix L by premultiplying and postmultiplying it by a particular orthonormal matrix U such that

$$U^{\top}SU = L \tag{3.3}$$

The matrix L is a diagonal matrix whose elements are the eigenvalues $\lambda_k$ of S. The columns that make up U are its eigenvectors $u_k$. Equation 3.1 reduces to the characteristic equation

$$|S - \lambda I| = 0 \tag{3.4}$$

where I is the identity matrix. Equation 3.4 leads to a $k$th-degree polynomial with k roots. These roots are the eigenvalues $\lambda_k$ that fill the diagonal of L. Plugging each of these values into the equation

$$[S - \lambda_k I]\,u_k = 0 \tag{3.5}$$

yields the k corresponding eigenvectors $u_k$.
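The whole procedure is compact in code. Below is a minimal sketch (Python/numpy, not part of the original thesis) that standardizes via a correlation matrix and solves equation 3.1; X is assumed to be an n × m array of n observations on m variables:

    import numpy as np

    def pca(X):
        S = np.corrcoef(X, rowvar=False)    # correlation matrix, equation 3.2
        lam, U = np.linalg.eigh(S)          # solves S u = lambda u, equation 3.1
        order = np.argsort(lam)[::-1]       # sort axes by variance explained
        return lam[order], U[:, order]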
3.2.2 An Alternative View: The Karhunen-Loève Transform

We have already seen a geometric representation of PCA. Here we take time to examine an alternative view of PCA as an expansion of terms. The expansion is commonly named the Karhunen-Loève transform, and it is this transform that is often used in astrophysics.
Complementary principal components $v_k$ may be found by performing PCA in the dual space of X; that is, both left and right eigenvectors may be found. Again following the discussion of section 3.2 and Murtagh & Heck (1987), we maximize

$$(X^{\top}v)^{\top}(X^{\top}v)$$

subject to

$$v^{\top}v = 1$$

As before this produces

$$XX^{\top}v_1 = \lambda_1 v_1$$

Premultiplying $Su_1 = \lambda_1 u_1$ by X gives

$$(XX^{\top})(Xu_1) = \lambda_1(Xu_1)$$

so the two eigenvalue problems are identical and share the eigenvalue $\lambda_1$, and we find

$$v_1 = \frac{1}{\sqrt{\lambda_1}}\, X u_1$$

More generally,

$$v_k = \frac{1}{\sqrt{\lambda_k}}\, X u_k, \qquad u_k = \frac{1}{\sqrt{\lambda_k}}\, X^{\top} v_k \tag{3.6}$$

Now taking the relationship $Xu_k = \sqrt{\lambda_k}\, v_k$, postmultiplying by $u_k^{\top}$, and summing gives

$$X \sum_{k=1}^{n} u_k u_k^{\top} = \sum_{k=1}^{n} \sqrt{\lambda_k}\, v_k u_k^{\top}$$

which, given the orthonormality of the vectors, reduces to

$$X = \sum_{k=1}^{n} \sqrt{\lambda_k}\, v_k u_k^{\top} \tag{3.7}$$
The result is an expansion of the original data in terms of its orthogonal basis. This expansion is called the Karhunen-Loève expansion. As with any expansion of terms, a suitable approximation of X may be given by the first few terms in the expansion.
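In practice the expansion in equation 3.7 coincides with the singular value decomposition, which is how it would typically be computed. A short illustrative sketch (Python; an assumption of this write-up, not the thesis's code) truncates the expansion and measures the reconstruction error:

    import numpy as np

    X = np.random.default_rng(1).normal(size=(50, 8))
    # numpy's left factor holds the v_k of equation 3.7 and the right factor
    # the u_k^T; the singular values s_k play the role of sqrt(lambda_k)
    V, s, Ut = np.linalg.svd(X, full_matrices=False)

    p = 3                                          # keep the first p terms
    X_p = (V[:, :p] * s[:p]) @ Ut[:p, :]
    err = np.linalg.norm(X - X_p) / np.linalg.norm(X)
    print(f"relative error keeping {p} of {len(s)} terms: {err:.3f}")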
3.3 PCA in Astrophysics
Our objective in this section is to understand the main benefits of performing PCA by examining the function it serves in a few astrophysical applications. We shall focus on two main benefits: reduction of dimensionality and orthogonal variables.
3.3.1 Dimensionality Reduction
Only decades ago, astrophysicists struggled to extract the greatest amount of information possible from relatively small data samples. But, thanks mostly to large digital sky surveys, a field that once lacked data has become inundated with astronomical data sets of unprecedented size and complexity. The trend of increasing data, driven by large-scale surveys such as 2MASS (Two Micron All Sky Survey), SDSS (Sloan Digital Sky Survey), and 2dFGRS (Two-degree-Field Galaxy Redshift Survey), has transformed the problems of data analysis dramatically. These surveys compile observations on millions of objects and potentially dozens of parameters. The information from these surveys could yield answers to basic astronomical questions and help measure the global cosmological parameters to within a few percent. But to accomplish this, these large data sets need to be properly and comprehensively analyzed.
The analysis of such data sets is an inherently multivariate statistical problem. PCA shows great promise in this role because it is an efficient and objective statistical method of determining physically interesting multivariate correlations. As demonstrated, PCA condenses correlated variables into new uncorrelated variables, thereby gaining an economy of description with little data loss. Reducing the data to its true dimensionality means capturing the underlying trends present in the original data set.
In 1973, Brosche applied PCA to describe the statistical properties of galaxies. Using a small data set of 31 objects with 7 measured parameters, he found that two independent variables were largely responsible for the whole amount of data variation. Later, in 1981, Balkowski, Guibert & Bujarrabal confirmed this bidimensionality in another paper examining a larger data set. The two axes grouped the parameters such that diameter, HI mass, indicative mass, and luminosity were contained by an axis labeled "size", and morphological type and color index by a second labeled "aspect". This reduction in dimensionality revealed underlying parameter correlations, which is a primary benefit of PCA.
But finding the true dimensionality also provides a method for data reduction. The two axes Brosche found captured 83% of the original data variation, implying little loss of information from ignoring the remaining eigenvectors. The data reduction provided by PCA allows the possibility of analyzing larger data sets. Currently, astronomical data sets increase by an order of magnitude in size each generation. This rate outpaces the growth of computational speed as described by Moore's Law, so standard analysis methods will soon become infeasible for next-generation data sets. PCA provides the data compression necessary to analyze these larger, more complex data sets.
In light of this, Tegmark, Taylor & Heavens (1997) consider the possibility of PCA as a standard step for linear data compression. Following this paper, we consider astronomical observations to be a random variable, x, with probability distribution $L(x; \Theta)$ dependent on a vector of parameters

$$\Theta = (\theta_1, \theta_2, \ldots, \theta_m)$$

The typical procedure for estimating a particular parameter $\theta_i$ is to maximize its likelihood function, where the likelihood function of $\theta_i$ is a conditional probability function of $\theta_i$ holding all other arguments fixed. This maximum likelihood estimation procedure seeks the most likely value of the parameter $\theta_i$. Important in this procedure is the Fisher information matrix, defined by

$$F_{ij} \equiv \left\langle \frac{\partial^2(-\ln L)}{\partial\theta_i\,\partial\theta_j} \right\rangle \tag{3.8}$$

This matrix defines our ability to estimate a specific set of parameters and is a measure of the information content of the observations relative to a particular parameter. Mathematically, the Fisher information matrix is the variance of the score, where, statistically speaking, the score is the partial derivative, with respect to $\theta_i$, of the natural log of the likelihood function. Visually, this is a measure of the sharpness of the support curve near the maximum likelihood estimate. By the Cramér-Rao inequality, for any unbiased estimator the minimum error of this estimation procedure is given by

$$\Delta\theta_i \geq 1/F_{ii}^{1/2}$$

Furthermore, the covariance matrix C is related to the Fisher matrix as

$$C^{-1}_{ij} \equiv \frac{\partial^2(-\ln L)}{\partial\theta_i\,\partial\theta_j}$$

For the Gaussian case a well-known identity is

$$F_{ij} = \frac{1}{2}\,\mathrm{Tr}\!\left(A_i A_j + C^{-1} M_{ij}\right) \tag{3.9}$$

where $A_i \equiv (\ln C)_{,i}$ and $M_{ij} \equiv D_{,ij}$, using standard comma notation for derivatives.
Now, as in Tegmark, Taylor & Heavens (1997), consider a general linear compression

$$y = Bx$$

where y is the new compressed data set. Substituting into equation 3.9 and assuming $B = b^{t}$, that is, that B is a single row-vector transformation, we find that equation 3.9 reduces to

$$F_{ii} = \frac{1}{2}\left(\frac{b^{t} C_{,i}\, b}{b^{t} C b}\right)^{2} + \frac{(b^{t} \mu_{,i})^{2}}{b^{t} C b} \tag{3.10}$$
So our task is to find the transformation that maximizes this value. When the mean $\mu$ is independent of the parameter, the second term in equation 3.10 vanishes and we seek to maximize

$$(2F_{ii})^{1/2} = \frac{|b^{t} C_{,i}\, b|}{b^{t} C b}$$

Since the denominator is always positive, because the covariance matrix is always positive, the search reduces to finding an extremum of the numerator. Normalizing b, we again arrive at the Lagrangian problem

$$b^{t} C_{,i}\, b - \lambda\, b^{t} C b$$

which, as we have seen, reduces to an ordinary eigenvalue problem. Thus they have shown that the optimal linear compression is the Karhunen-Loève transform. Because of this result, Tegmark, Taylor & Heavens (1997) suggest that PCA become a standard method of data compression for the increasingly large cosmological data sets available.
In conclusion, the main benefit of PCA, the reduction of data sets to their true dimensionality, serves two important functions in astrophysical statistical problems. The first is the capture of underlying trends by identifying parameter correlations. The second is data compression, which is important when analyzing large data sets.
3.3.2 A Natural Basis
The use of PCA to analyze redshift surveys has been advocated by Vogeley & Szalay in several papers. If current models are correct, then large-scale structure in the universe is the result of gravitational instability acting on an initially Gaussian density field. For this reason, describing the large-scale structure of the universe may lead to a deeper understanding of the early universe. Thus, finding a suitable way to analyze information from redshift surveys is of critical importance to current astronomical research.

The preferred way to characterize this structure is the power spectrum, which is typically estimated by directly summing the plane-wave contributions from each galaxy. However, such a Fourier expansion has several weaknesses. The most significant is the non-orthonormality of the basis in samples of complex geometry, such as pencil-beam surveys and deep slice surveys. Thus, precisely because of its ability to form an orthonormal basis for any data set, Vogeley & Szalay (1996) suggest an alternative expansion in terms of the Karhunen-Loève transform. The standard PCA procedure is used to define a new orthonormal basis within the original data space.
Such an analysis has been shown to yield results comparable to those obtained using traditional methods, and a significant amount of data reduction has been shown to be possible. This not only shows the benefit of the K-L transform in analyzing redshift surveys of irregular geometry but also demonstrates the usefulness of PCA in defining a new basis within the original data space. That is, PCA allows the data to suggest the best basis for data analysis. This is useful, as shown, when traditional expansions function poorly. But also for statistical data mining, where little a priori knowledge of the results is available, PCA is an ideal representation of the data.
3.4 Financial Application
The benefits of principal component analysis seem especially appealing when examining the movement of the stock market. Every year thousands of companies spend millions processing the dizzying amount of information available about the market. They hope to understand why events occurred and how to predict if and when they will occur again. If PCA can in any way simplify this analysis or give some insight into the market, this will be a worthwhile effort.
3.4.1 A Simple Example
To further solidify comprehension of the technique, a simple example is introduced here. To foreshadow the financial application in this paper, stock returns are used as observations. The mathematical process should become clearer, and a concrete example now will serve to introduce the specific method used later on. This example considers three stocks and their returns over the past three months. The three stocks are Acme Corp (A), Bells Corp (B), and Cornerstone Corp (C). The matrix below holds the monthly returns data for the past three months for each of these three stocks.
            A         B         C
Month 1   .43269   .130435   .319149
Month 2   .01342   .00962    .06452
Month 3   .06452   .178808   .05085
The first task will be to use these returns to form a correlation matrix using the definition of correlation in equation 3.2. This requires the standard deviation and mean of each stock's data:

μ_A = .0496, σ_A = .2622
μ_B = .0692, σ_B = .1007
μ_C = .0553, σ_C = .1795

This process is made quite easy with the use of computers and mathematical software. Plugging in and solving gives
Correlation Matrix =
[ 1       .4400   .9106 ]
[ .4400   1       .3220 ]
[ .9106   .3220   1     ]
Next it is possible to use a computer to quickly calculate the eigenvalues of this matrix and their associated eigenvectors. The eigenvalues are:

diag(λ₃, λ₂, λ₁) =
[ .080225   0         0      ]
[ 0         .75843    0      ]
[ 0         0         2.1613 ]
The trace of this eigenvalue matrix is 3, as expected. Knowing that the eigenvalues reflect the relative importance of their associated eigenvectors, the first eigenvector is expected to be the most telling principal component of the three. In fact, it accounts for 2.1613/3 ≈ 72% of the data variation. The last eigenvector, on the other hand, is inconsequential, accounting for about 3% of the variation. Thus PCA has reduced a 3-variable problem to a 2-variable problem. The associated eigenvectors are:
(Eigenvector 3, Eigenvector 2, Eigenvector 1) =
[ −0.7252    0.21829    0.65302 ]
[  0.10889  −0.90012    0.42181 ]
[  0.67987   0.377      0.629   ]
Such a simple example does not lend itself to firm interpretation, since these returns exist as a minute subset of a much larger stock market. However, imagine for the moment that these three stocks were of interest. It would be simple to establish that the first eigenvector demonstrates a strong general correlation among the three stocks, because all of its coefficients are positive. Stocks A and C have the highest-magnitude coefficients in this first eigenvector, indicating that the correlation it describes is most strongly felt by these two stocks. The second eigenvector describes a weaker trend of anti-correlation of stock B with stocks A and C, as shown by the negative sign of stock B's coefficient in the second eigenvector.
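The arithmetic of this example can be checked in a few lines (Python/numpy; a verification sketch, not part of the original analysis):

    import numpy as np

    C = np.array([[1.0000, 0.4400, 0.9106],
                  [0.4400, 1.0000, 0.3220],
                  [0.9106, 0.3220, 1.0000]])
    lam, U = np.linalg.eigh(C)       # eigenvalues in ascending order
    print(lam)                       # ~ [0.0802, 0.7584, 2.1613]; trace = 3
    print(lam[-1] / lam.sum())       # first component: ~72% of the variation
    print(U[:, -1])                  # first eigenvector: all coefficients share a sign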
Having found these eigenvectors and eigenvalues finishes the mathematical task required for the principal component process. The next task is to expand this process to analyze a financial time series comprised of a much larger universe of stocks, the S&P 500.
3.4.2 The Application
As demonstrated in section 3.4.1, stocks can be analyzed by PCA like any other correlated variables. For such an analysis, either returns or prices can be used as observations. The process outlined in section 3.4.1 is exactly the process used here; the main exception is the scale. The universe of this application is the S&P 500, because it is a good representation of the market as a whole. The data set used is the same as in chapter 2: twenty years of monthly S&P 500 stock returns beginning in January 1985 and ending in December 2004. The result of this expansion in scope is that there are now 500 original variables to track, which when analyzed using PCA produce 500 eigenvalues and 500 eigenvectors monthly. There are 240 months in this time series, and so this means calculating 240 × 500 eigenvectors in total. To find each month's eigenvectors, a correlation matrix is formed using the previous year's returns; the eigenvalues and eigenvectors of this correlation matrix are then found. Each month this process is repeated, revealing new results. The time-dependent nature of the data set, as well as this dramatic increase in scope, makes the process of eigenvector interpretation much more complicated than in section 3.4.1. This endeavor is intimidating, especially given the exploratory nature of this exercise. Fortunately, our project is guided by the search for three main results (a sketch of the monthly procedure follows the discussion of these results below):
1. Signals
2. Structure
3. Trading Strategies
Signals are anything that might be useful as an indicator of future events. For example, if a unique eigenvector composition is consistently observed before a market crash or a period of market growth, then this composition is a signal that can be responded to in the future. Provided with a signal of future events, one could reduce risk and increase gains.
Market structure is the organization and hierarchy of stocks. For instance, the market is organized by industry, such that industry trends contribute to the overall market trend. Additionally, certain stocks within the market carry relatively more weight, or influence, in determining market movement. These types of divisions and classifications define market structure. Our analysis might suggest an underlying structure to the market: by reducing stock correlations, a new, more fundamental market structure might be revealed that may yield deeper insight into the market and its movement.
Finally, a more all-encompassing objective is to find any additional information that could be put to use to build a sound and successful trading strategy.
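The monthly procedure itself is straightforward to sketch (Python; a hypothetical helper, assuming `R` is a 240 × 500 array of monthly returns):

    import numpy as np

    def rolling_pca(R, window=12):
        """Each month, eigendecompose the correlation matrix of the trailing year."""
        eigvals, eigvecs = [], []
        for t0 in range(window, R.shape[0]):
            S = np.corrcoef(R[t0 - window:t0], rowvar=False)   # 500 x 500
            # with a 12-month window S has rank at most 11, so only the
            # leading eigenvalues are nonzero
            lam, U = np.linalg.eigh(S)
            eigvals.append(lam[::-1])        # store in descending order
            eigvecs.append(U[:, ::-1])
        return np.array(eigvals), np.array(eigvecs)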
3.4.3 Data Reduction & Eigenvalue Analysis
The numerical calculations, though lengthy and time-consuming, are not the challenge of the analysis. The challenge is understanding the results of these calculations and determining how to extract meaningful information. The time dependency of the data presents a formidable obstacle here, making the difficult job of interpretation even more substantial. The composition of the eigenvectors is determined entirely by the data and thus changes with time. Inherent variation in the data can cause considerable changes in eigenvector composition, but so can other factors, like the deletion or addition of a stock to the S&P 500. Furthermore, even if the analysis is capable of revealing market structure, it would be unreasonable to expect this structure to remain unchanged over the last twenty years. As the economy transforms and market forces change, we expect the eigenvectors to dynamically conform to describe the new situation. While successive calculations allow for the search for patterns and trends, the fact that these may come and go as time passes makes their identification a delicate process. Fortunately, there is hope. As promised, PCA distills the enormous amount of information regarding these 500 variables into only a few uncorrelated variables. In fact, for our purposes it turns out that only the first 10 of 500 eigenvectors are needed to sufficiently describe the data, and beyond that only the first few contain meaningful trends.
Using only a small subset of the total available eigenvectors raises an important question: how many, and which, to use? There are a few conventions to follow. The SCREE test is based on a visual graph: the eigenvalues are graphed from greatest to least, producing a sharply concave scatter plot, and a cutoff is determined visually by judging after which eigenvalue only a remainder of essentially numerically indistinguishable eigenvalues exists. Another method prescribes keeping all the eigenvectors with eigenvalues greater than the eigenvalue mean. The last method, the one used here, is based on the fact that the amount of variation accounted for by j eigenvectors can be determined exactly by summing the eigenvalues of those j eigenvectors and dividing by the trace of the eigenvalue matrix. If one chooses an acceptable numerical cutoff, an acceptable percent of the total variation accounted for, this can be the basis of the stopping rule. Mathematically,
$$\frac{\displaystyle\sum_{i=1}^{j} \lambda_i}{\displaystyle\sum_{i=1}^{N} \lambda_i} \geq \text{Cutoff Fraction}$$

where N is the total number of eigenvalues. Graphing this quantity for the first few principal components yields figure 3.2.
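In code, the stopping rule is nearly a one-liner (a sketch, assuming `lam` holds one month's eigenvalues sorted in descending order):

    import numpy as np

    def n_components(lam, cutoff=0.95):
        frac = np.cumsum(lam) / lam.sum()   # variation captured by the first j vectors
        return int(np.searchsorted(frac, cutoff) + 1)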
Figure 3.2 suggests that the first ten eigenvectors provide sufficient information to describe the data set. For these first ten eigenvectors, the lower bound of the cutoff fraction over time is 95%. So retaining only the top ten eigenvectors means that dimensionality has been reduced by 98% while 95% of the original information is retained.
Figure 3.2: Cumulative percent of variation accounted for by successive components over the time series. The first 10 eigenvectors are shown, demonstrating the high amount of variation these first few eigenvectors account for. (Axes: percent of total variation vs. time in months.)
This incredible reduction demonstrates the strength of using PCA on financial data sets. The majority of the market has been captured by these few new variables, and PCA has narrowed our field of investigation to these few significant, possibly interpretable variables.
Eigenvalues highlight the important eigenvectors, but remembering that they represent the relative importance of an eigenvector, much more can be learned from them. For example, examining figure 3.2 reveals that the first eigenvector remains dominant in relative importance over time, but the last nine eigenvectors, instead of being of markedly decaying importance, are of relatively equal importance to each other. This suggests that the market is dominated by a single trend but comprised of many additional trends of lesser importance. Also consider the nature of the first eigenvalue over time. Clearly, it changes from month to month; this means the fraction of market variation captured by the first eigenvector changes. Often these changes are gradual, but sometimes they are marked by drastic increases. An example of this occurs between t = 22 and t = 35. This indicates that over that time period, in response to certain market forces or events, a distinct and broad market trend is being experienced, and thus a larger portion of market variability is being captured by the first eigenvector. An interesting feature of these drastic changes is that the amount of time they persist is generally quite uniform. As demonstrated by the arrow widths in figure 3.2, the market enters these abnormal trend periods abruptly and then remains in such a state for approximately one year before relaxing back to a normal mode. In trading this could be very useful information: if a trader observes the market entering this abnormal mode, or knows how long the market has been in the mode, he can then estimate how long prevailing trends will last until a normal market is regained.
Another potentially interesting comparison is between the fraction of variation accounted for by each principal component and the movement of the S&P 500. As demonstrated by the previous exercise, this fraction can change quite substantially over time, and it might be informative to observe these changes alongside S&P 500 movements to see if there is any connection. Two benefits come from this comparison. The first is that these eigenvalue changes might be explained by S&P 500 movements. For example, if an abnormal mode begins the same month as a crash, then there is a clear causal relationship that might in turn suggest something about the nature of the crash or the subsequent recovery. The second is that eigenvalue changes or magnitudes might serve as a good signal of future S&P performance. In this case, we observe whether any of the top ten eigenvectors display indicative and consistent patterns prior to a major event.
Figure 3.3: Comparison of first eigenvalue movement with market movement. (Top panel: S&P returns vs. time in months; bottom panel: first principal component eigenvalue, in percent, vs. time in months.)
There are a few conclusions that come from this analysis. The first interesting observation is that market crashes generally occur when the first eigenvalue is relatively high. Specifically, the first eigenvalue is 40% or greater only when a crash occurs or is imminent. A market crash will produce high first eigenvalues, since the first eigenvector will capture this broad market trend. However, high first eigenvalues seem occasionally to precede market crashes as well. And while high eigenvalues can be explained by market crashes, market crashes may also be explained by high eigenvalues. A high eigenvalue indicates that the market is driven largely by a singular trend, which could warn of market instability; an appropriate analogy is the stability of a table supported by four legs compared to just one. It could be that high first eigenvalues signal large market downswings. Of course, there are not enough crashes to say with confidence that this is a trend; further investigation will be required. More supporting evidence comes from a second observation of low first eigenvalues during strong market growth. Specifically, over the 1990s (t = 80 to 150), regarded as a classic example of a bull market, first eigenvalues are at an all-time low.
To investigate more closely the possibility of market signals in the first eigenvalue movements, consider figure 3.4. The most striking feature of the plot is the rapid increase in eigenvalue during the 1987 stock market crash. This rapid increase is indicative of the general recovery trend of stocks following the crash. An abnormal mode follows the crash; this mode is the result of the crash and not a signal of the crash, as the figure clearly shows. However, it may be informative for further investigation to repeat this exercise using daily returns. If signals exist, they may be more visible on that time scale.
This same analysis could be performed for the next nine principal components; however, the remaining nine eigenvalue plots follow approximately the same pattern. This suggests that they are reacting to the first eigenvalue and change very little independently. If this is true, then these plots have little to reveal.
Figure 3.4: Closer comparison of first eigenvalue movement with market movement at the 1987 market crash. (Top panel: S&P returns; bottom panel: first principal component eigenvalue, in percent; both vs. time in months.)
In conclusion, it seems that there is a loose anti-correlation between market returns and first eigenvalues: low eigenvalues indicate market stability and growth, while high eigenvalues foreshadow market downswings.
3.4.4 Eigenvector Analysis
Now we move past the information provided by the eigenvalue matrix and on to an analysis of the eigenvectors themselves. The eigenvectors are just lists of coefficients, each corresponding to a stock in the S&P 500. These coefficients indicate the weight of each stock within that particular eigenvector; higher-magnitude coefficients indicate greater importance. The distribution of these coefficients among the stocks defines each eigenvector. From these coefficients one can examine which stocks form the core of the eigenvector by finding all those with the highest-magnitude coefficients. In other words, one can find which group of stocks best represents the eigenvector as a whole. Furthermore, by comparing the stocks in this core group to each other, one may determine a connection between them and understand what larger group they are part of. This could lead to a suitable approach for eigenvector interpretation. For a static eigenvector of 500 coefficients this is already challenging, and unfortunately, due to the time-dependent nature of this data set, eigenvector composition changes through time. Thus taking advantage of this interpretive approach requires creative methods.
As a preliminary measure we look at the eigenvectors' performance. To do this, at each new time step $t_0$ new eigenvectors are found using the past year's data. Then stock is bought in amounts weighted by the eigenvector coefficients: the magnitude of a coefficient determines how heavily each eigenportfolio invests in a given stock, and the sign of the coefficient determines whether that stock is bought or sold (shorted). Portfolio performance is then measured by the average of the product of the portfolio stock coefficients and their returns over month $t_0$. By examining the trends of these eigenvectors' performance, it may be possible to understand the eigenvectors and use them without directly sifting through their composition.
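The bookkeeping for one eigenportfolio is as simple as the description suggests. A sketch (hypothetical helper, not the thesis code):

    import numpy as np

    def eigenportfolio_return(u_k, next_month_returns):
        """Weight each stock by its coefficient in eigenvector u_k.

        Positive coefficients are long positions, negative ones short;
        performance is the average coefficient-weighted return."""
        return float(np.mean(u_k * next_month_returns))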
The next step is to define sectors. In finance, traders often break the S&P 500 universe into 10 different sectors. This decomposition allows for a more detailed explanation of the whole market's movement. These sectors are defined by industry; the utility of such a grouping strategy lies in the high correlation between stock returns of like companies. By forming sectors, analysts hope to understand the composite trends that move the market. In essence, the current PCA analysis has done the same thing. Within each eigenvector is a grouping of like stocks based on correlation, defined by the individual stock coefficients within the eigenvector. So the ten eigenvectors have essentially redefined the ten S&P sectors, this time not in terms of industry but purely in terms of correlation of stock movement. However, because each eigenvector is comprised of all S&P 500 stocks weighted to varying degrees, which stocks to include in these new sectors becomes important. As before, the coefficients within a particular eigenvector define the importance of each stock within it. Thus taking the stocks with the highest (absolute-valued) coefficients should suitably define these new sectors. The composition of the S&P sectors ranges from 10 stocks to 90 stocks, so to define these new PCA sectors a judgement will have to be made as to how many stocks to include in each. Unlike the industry breakdown, these sectors are not mutually exclusive, and stocks may appear in several sectors. The most important criterion in choosing how many stocks to select is how well the group approximates the entire eigenvector's performance. To make this judgement, observe how closely sectors comprised of the first 30 (blue line), 50 (green line), 75 (black line), and 100 (orange line) stocks match the movement of the whole eigenvector (red line) in figure 3.5. Then choose the smallest number of stocks that still approximates the eigenvector reasonably well to form the new sectors.
The results of this analysis are shown below in figures 3.5 through 3.14.
Figure 3.5: First Component Potential Sectors
Figure 3.6: Second Component Potential Sectors
Figure 3.7: Third Component Potential Sectors
Figure 3.8: Fourth Component Potential Sectors
Figure 3.9: Fifth Component Potential Sectors
Figure 3.10: Sixth Component Potential Sectors
Figure 3.11: Seventh Component Potential Sectors
Figure 3.12: Eighth Component Potential Sectors
Figure 3.13: Ninth Component Potential Sectors
Figure 3.14: Tenth Component Potential Sectors
If none of the stock counts approximated the eigenvector very well, then the best was chosen; and if larger counts improved the approximation only marginally, then the smaller count was chosen.
Sector    Stocks in Sector
1         30
2         100
3         50
4         100
5         100
6         30
7         100
8         30
9         75
10        100
The results are actually slightly disappointing and indicate that only a few eigenvectors, perhaps just the first, can be represented by a smaller subset of stocks. However, examining the eigenvector trends that demonstrate distinct correlation, anti-correlation, and neutrality with market movements does suggest the possibility of creating a hedging strategy to gain excess returns. Furthermore, one very interesting result of this investigation is shown in figure 3.15. The figure shows that a trading strategy formed solely on the premise of always holding the first eigensector will, over a twenty-year period, tend to outperform the S&P. This is certainly a remarkable feat, and further investigation should be dedicated to determining whether this is a feasible trading strategy or whether transaction costs will destroy this potential strategy for excess returns.
Figure 3.15: First Sector (blue line) returns against S&P returns (red line). (Axes: returns vs. time in months.)
3.4.5 Trading Strategy
Having chosen sectors, we now attempt to use these sectors to gain excess
returns. We've already seen that buying the first sector (the first 30 stocks
of the first eigenvector) tends to outperform the S&P. This might be a decent
trading strategy in itself; however, unless executed at large scale,
transaction costs will likely undercut its potential for excess returns.
Most other eigensectors have somewhat erratic return trends, which makes them
unsuitable for this trading strategy. However, the first four sectors, along
with the eighth, have remarkably predictable trends and may provide a way to
realize significant returns. The first sector produces the highest returns
but is susceptible to market crashes. The second sector produces much smaller
returns but seems more resilient to crashes and downward swings. These two
naturally form the long position. The short position shall be the combination
of the 4th and 8th eigensectors, both of which are anti-correlated with the
market. If the sacrifice in overall returns from a linear combination of
these four sectors is sufficiently compensated by lower volatility, then the
combination is a better portfolio. Figure 3.16 shows the results of a few
combinations. The dotted red line is the S&P performance. Above it, in black,
is the first eigensector alone. Below, in green, is an equal-weighted
combination of all four eigensectors. Finally, in blue, is a combination with
80% invested in the first sector and the remainder distributed among the
other three sectors. The third sector could also be included in this
strategy, though that possibility will not be explored here; its volatility
does suggest that in a more advanced and comprehensive strategy the third
eigensector might be exploited using derivatives, with a straddle being an
appropriate tool for this function.
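A minimal Python sketch of these combinations, under stated assumptions
(`s1`, `s2`, `s4`, `s8` are hypothetical monthly return arrays for the four
eigensectors; the even split of the remaining 20% in the tilted portfolio is
an assumption, since the text only says the remainder is distributed among
the other three sectors):

    # Long eigensectors 1 and 2, short eigensectors 4 and 8, with weights w.
    import numpy as np

    def combo_returns(s1, s2, s4, s8, w):
        """Monthly returns of the weighted long/short eigensector portfolio."""
        w1, w2, w4, w8 = w
        return w1 * s1 + w2 * s2 - w4 * s4 - w8 * s8

    # The two combinations plotted in figure 3.16 (weights sum to 1):
    equal_weight = (0.25, 0.25, 0.25, 0.25)
    tilted       = (0.80, 0.20 / 3, 0.20 / 3, 0.20 / 3)  # even split assumed

    # Compounded portfolio value, for comparison against the S&P curve:
    # curve = np.cumprod(1.0 + combo_returns(s1, s2, s4, s8, tilted))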
The overall result of the exercise is that lower volatility can be achieved,
but not without sacrificing too much return, as demonstrated by the green
line. Furthermore, the blue line shows that a more modest sacrifice of returns
does little to eliminate volatility. The inability to find a linear
combination of stocks that produces excess returns, or that decreases
volatility with only a slight drop in returns, is disappointing. It is
largely the result of an eigenvector structure dominated by a first
eigensector that generates far superior returns to any other eigensector. The
simplicity of the market structure represented by this picture is itself
surprising.
However, enough information has been found to suggest an alternative trading
strategy revolving solely around the first eigensector, which has already
been demonstrated to gain excess returns. Since the returns of the market can
be approximated reasonably well by the top 30 stocks in the first
eigenvector, one might examine the prospects of these firms before investing
to determine an average projection of returns over the next month. Provided
this average is positive, investments in the market, and in these 30 stocks
in particular, should be made. If, on the other hand, the overall picture
delivered by these 30 stocks seems weak, one should stay out of the market to
avoid potential losses. In this way PCA identifies the few stocks that
provide the most accurate composite picture of future market movement.
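A minimal sketch of this signal rule, assuming hypothetical inputs
(`projections` is a length-30 array of forecast next-month returns for the
top stocks; how the forecasts are produced is left open by the text;
`signals` and `market_r` are hypothetical month-by-month series):

    # Invest only in months where the composite projection is positive.
    import numpy as np

    def invest_signal(projections, threshold=0.0):
        """True if the average projected next-month return is positive."""
        return float(np.mean(projections)) > threshold

    def timed_growth(signals, market_r):
        """Growth curve of holding the market only when the signal is True."""
        held = np.where(np.asarray(signals, dtype=bool),
                        np.asarray(market_r), 0.0)
        return np.cumprod(1.0 + held)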
3.4.6 Market Memory
The fluctuation of eigenvector composition through time demonstrates an
expected temporal variation in market structure. Eigenvector composition
adjusts to best describe current market trends, which are themselves
temporary and changing. A parameter of interest to financial analysts is the
duration of these trends, so it may be informative to examine the extent to
which eigenvectors change as a function of time. Such an examination could
shed light on the financial concept of market memory, which defines how long
past trends and events persist in affecting today's market. Market memory is
an important concept for traders, who need to judge when to cash out. If they
were to have a deeper understanding of how long
Figure 3.16: Portfolio returns for linear eigensector combinations; Returns
plotted against Time (Months).
trends persist in general, then they may be in a better position to judge
when current trends will subside. The dynamics of the eigenvectors may
provide an objective method of estimating market memory. Two related tests
are pursued below.
The first test measures, at each time period, how many months must pass
before less than X percent of the original eigensector remains. That is, at
time $t_0$ the first eigensector, $E_0$, is found. Then the first eigensector
is found for times $t_1$, $t_2$, $t_3$, \ldots until

    \frac{|E_0 \cap E_n|}{30} < X,

where 30 is the total number of stocks in the sector and n is the value
for market memory, i.e., the number of months passed. For such a test the
first eigensector, which captures the majority of all trends, appears to be
the strongest candidate for investigation. If this test were executed using
lesser eigensectors, defined by weaker trends, a progressively lower value
for market memory should be expected. The result of the test is the
relationship
between the percent of original eigensector stocks remaining, X, and the time
passed, n.
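A minimal sketch of this first test, under the assumption that the
month-by-month eigensectors have already been computed (`sectors` is a
hypothetical list of 30-stock index sets, one per month):

    # Count how many of the original eigensector stocks remain n months later,
    # and find the smallest n for which the overlap fraction falls below X.
    def overlap_fraction(sectors, t0, n):
        """Fraction of the t0 eigensector still present n months later."""
        return len(sectors[t0] & sectors[t0 + n]) / 30.0

    def months_until_overlap_below(sectors, t0, X):
        """Smallest n with |E_0 ∩ E_n| / 30 < X, the market-memory value."""
        n = 1
        while t0 + n < len(sectors):
            if overlap_fraction(sectors, t0, n) < X:
                return n
            n += 1
        return None  # overlap never fell below X in the available data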
Figure 3.17: Test 1 - Percent of the first eigensector that remains as a
function of time; Time (Months) plotted against Percent of Original Stocks
that Remain in the First Sector.
We note several features of the result. The first is that the overlap
decreases exponentially as a function of time, not linearly as might be
expected. Furthermore, there is significant overlap from month to month: on
average, 85% residual overlap remains in the first eigensector. After
approximately six months, however, the overlap is only 25%.
The second test measures the Euclidean distance from a $t_0$ eigenvector to
the $n$ subsequent eigenvectors. The Euclidean distance between two
eigenvectors is defined as

    \|E_0 - E_n\|.

Here the distance serves as a measure of past correlation, with large
distances indicating small eigenvector overlap. Distance is expected to grow
as time moves forward, reflecting fading market memory, and should show how
quickly and at what rate market memory fades.
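A minimal sketch of this second test (an assumed reconstruction; `eigvecs` is
a hypothetical months-by-stocks array of unit-norm first eigenvectors, and
the sign alignment is an added assumption, since an eigenvector's overall
sign is arbitrary):

    # Euclidean distance between the first eigenvector at t0 and at each of
    # the following months; requires t0 + horizon < len(eigvecs).
    import numpy as np

    def distances_from(eigvecs, t0, horizon):
        base = eigvecs[t0]
        out = []
        for n in range(1, horizon + 1):
            v = eigvecs[t0 + n]
            if np.dot(base, v) < 0:  # align the arbitrary eigenvector sign
                v = -v
            out.append(np.linalg.norm(base - v))
        return np.array(out)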
Figure 3.18: Test 2 - Euclidean distance between first eigenvectors at
different times; Euclidean Distance plotted against Time (Months).
The result reflects a quick, nearly linear increase in Euclidean distance
over a period of six months. This begins with a sharp increase in distance
within the first month, corresponding to the drop in eigenvector overlap.
Nearly 30% of the maximum Euclidean distance is accumulated during this first
period, indicating that some trends are destroyed within a single month. This
result matches the previous test, which also indicates a nearly 30% drop in
overlap within the first month.
After the six-month mark, Euclidean distance begins to plateau. This feature
indicates two things: residual trends that persist over a period of roughly
two years, and intermediate trends that are destroyed after six months. These
features are not as readily observed in the first test.
In conclusion, these results support the classic division of trends into
three types: short-, intermediate-, and long-term. Intermediate trends last a
maximum of six months, and long-term trends can last several years. The
investigation also indicates that short-term trends last approximately
a month; however, since one month is the minimum time step of this
investigation, future work could use weekly or even daily data to probe the
eigenvector overlap within the first month. This is expected to follow the
same exponential trend indicated by these results.
Chapter 4
Conclusion
In this paper we demonstrate potential areas of overlap between the fields of
physics and finance. The motivation for seeking such an overlap is that the
rigorous mathematical methods in use in physics may be useful for financial
analysis. The field of astrophysics provides a suitable starting place for
such a search, because the challenges of statistical analysis in that field
are similar to those in finance. The application to finance of extreme value
theory, which has been used to determine the nature of Bright Cluster
Galaxies, and of principal component analysis, which is used extensively to
analyze large astrophysical data sets, has met with mixed results. Though the
results of this paper are not overwhelming, they are encouraging enough to
suggest that continued investigation into this potential overlap is a worthy
effort.
Notable successes of this investigation have been the ability to quantify
market memory using PCA and the remarkable ability of the first principal
component to describe and outpace the S&P 500. Also, the demonstrated
potential of an EVT-based trading strategy to produce excess returns means
that, though the strategy is erratic, further refinement may yet provide a
path to beating the S&P index.
However, to our disappointment, PCA failed to reveal any firm market signals
or demonstrate the existence of deep market structure. Furthermore, the
erratic returns of the EVT trading strategy remain worrisome.
The investigation has also uncovered many avenues for future work. Future
studies might focus on a more rigorous analysis of the EVT trading strategy,
aimed at finding optimal parameter values; on a new trading strategy based on
the performance of the first principal component; or on eigenvalue movement
on a shorter time scale just prior to market crashes, to further investigate
the potential existence of market signals.
In conclusion, mathematical methods in use in physics may provide new and
important tools for financial analysis. Our application of extreme value
theory and principal component analysis has met with encouraging but not
overwhelming results. The investigation nevertheless suggests that further
work should be pursued, though without expectation of dramatic insight. The
next step in the melding of these two fields should be the implementation of
mathematical methods in finance based not only on an overlap of mathematical
challenges, but inspired by a more fundamental conceptual overlap, using
analogy to model the financial market on physical phenomena.