
CFA Level 1 - Quantitative Methods


2.1 - Introduction
The Quantitative Methods section of the CFA curriculum has traditionally been placed second in the sequence
of study topics, following the Ethics and Professional Standards review. It's an interesting progression: the
ethics discussions and case studies will invariably make most candidates feel very positive, very high-minded
about the road on which they have embarked. Then, without warning, they are smacked with a smorgasbord of
formulas, graphs, Greek letters, and challenging terminology. We know – it's easy to become overwhelmed. At
the same time, the topics covered in this section – time value of money, performance measurement, statistics
and probability basics, sampling and hypothesis testing, correlation and linear regression analysis – provide the
candidate with a variety of highly essential analytical tools and are a crucial prerequisite for the subsequent
material on fixed income, equities, and portfolio management. In short, mastering the material in this section
will make the CFA's entire Body of Knowledge that
much easier to handle.

The list of topics within Quantitative Methods may appear intimidating at first, but rest assured that one does
not need a PhD in mathematics or require exceptional
numerical aptitude to understand and relate to the
quantitative approaches at CFA Level 1. Still, some
people will tend to absorb quantitative material better
than others do. What we've tried to do in this study guide
is present the full list of topics in a manner that
summarizes and attempts to tone down the degree of
technical detail that is characteristic of academic
textbooks. At the same time, we want our presentation to
be sufficiently deep that the guide can be effectively utilized as a candidate's primary study resource. For those
who have already purchased and read the textbook, and for those who already clearly understand the material,
this guide should allow for a relatively speedy refresher in those hectic days and weeks prior to exam day.
Along the way, we'll provide tips (primarily drawn from personal experience) on how to approach the CFA
Level 1 exam and help give you the best chance of earning a passing grade.
2.2 - What Is The Time Value Of Money?

The principle of time value of money – the notion that a given sum of money is more valuable the sooner it is received, due to its capacity to earn interest – is the foundation for numerous applications in investment finance.

Central to the time value principle is the concept of interest rates. A borrower who receives money today for consumption must pay back the principal plus an interest rate that compensates the lender. Interest rates are set in the marketplace by the forces of supply and demand, and they allow equivalence relationships to be established between sums of money at different points in time. In other words, in an environment where the market-determined rate is 10%, we would say that borrowing (or lending) $1,000 today is equivalent to paying back (or receiving) $1,100 a year from now. Stated another way: enough borrowers are out there who demand $1,000 today and are willing to pay back $1,100 in a year, and enough investors are out there willing to supply $1,000 now and who will require $1,100 in a year, so that market equivalence on rates is reached.

Exam Tips and Tricks

The CFA exam question authors frequently test knowledge of FV, PV and annuity cash flow streams within questions on mortgage loans or planning for college tuition or retirement savings. Problems with uneven cash flows will eliminate the use of the annuity factor formula and require that the present value of each cash flow be calculated individually, and the resulting values added together.
2.3 - The Five Components Of Interest Rates
CFA Institute's LOS 5.a requires an understanding of the
components of interest rates from an economic (i.e. non-
quantitative) perspective. In this exercise, think of the
total interest rate as a sum of five smaller parts, with each
part determined by its own set of factors.

1. Real Risk-Free Rate – This assumes no risk or uncertainty, simply reflecting differences in timing: the preference to spend now/pay back later versus lend now/collect later.
2. Expected Inflation - The market expects
aggregate prices to rise, and the currency's
purchasing power is reduced by a rate known as the inflation rate. Inflation makes real dollars less
valuable in the future and is factored into determining the nominal interest rate (from the economics
material: nominal rate = real rate + inflation rate).
3. Default-Risk Premium - What is the chance that the borrower won't make payments on time, or will be
unable to pay what is owed? This component will be high or low depending on the creditworthiness of
the person or entity involved.
4. Liquidity Premium- Some investments are highly liquid, meaning they are easily exchanged for cash
(U.S. Treasury debt, for example). Other securities are less liquid, and there may be a certain loss
expected if it's an issue that trades infrequently. Holding other factors equal, a less liquid security must
compensate the holder by offering a higher interest rate.

5. Maturity Premium - All else being equal, a bond obligation will be more sensitive to interest rate fluctuations the longer its time to maturity.

2.4 - Time Value Of Money Calculations
Here we will discuss the effective annual rate, time value of money problems, PV of a perpetuity, an ordinary
annuity, annuity due, a single cash flow and a series of uneven cash flows. For each, you should know how to
both interpret and solve the problems on your approved calculator. These concepts cover LOS 5.b and 5.c.
The Effective Annual Rate
CFA Institute's LOS 5.b is explained within this section. We'll start by defining the terms, and then presenting
the formula.

The stated annual rate, or quoted rate, is the interest rate on an investment if an institution were to pay interest only once a year. In practice, institutions compound interest more frequently – quarterly, monthly, daily or even continuously.
However, stating a rate for those small periods would involve
quoting in small fractions and wouldn't be meaningful or allow easy comparisons to other investment vehicles;
as a result, there is a need for a standard convention for quoting rates on an annual basis.

The effective annual rate (EAR) represents the actual rate of return, reflecting all of the compounding periods during the year. The EAR can be computed given the stated rate and the frequency of compounding. We'll discuss how to make this computation next.

Formula 2.1
Effective annual rate (EAR) = (1 + periodic interest rate)^m – 1

Where: m = number of compounding periods in one year, and periodic interest rate = (stated interest rate) / m

Example: Effective Annual Rate


Suppose we are given a stated interest rate of 9%, compounded monthly, here is what we get for EAR:

EAR = (1 + (0.09/12))^12 – 1 = (1.0075)^12 – 1 = (1.093807) – 1 = 0.093807, or 9.38%

Keep in mind that the effective annual rate will always be higher than the stated rate if there is more than one
compounding period (m > 1 in our formula), and the more frequent the compounding, the higher the EAR.
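For those who like to verify the arithmetic programmatically, here is a minimal Python sketch of the EAR calculation (the function name effective_annual_rate is our own label, not a term from the curriculum):

def effective_annual_rate(stated_rate, m):
    """EAR = (1 + periodic rate)^m - 1, where periodic rate = stated_rate / m."""
    return (1 + stated_rate / m) ** m - 1

# Reproduce the example: 9% stated rate, compounded monthly
print(round(effective_annual_rate(0.09, 12), 6))   # 0.093807 -> 9.38%
# More frequent compounding gives a higher EAR
print(round(effective_annual_rate(0.09, 4), 6))    # quarterly: 0.093083
print(round(effective_annual_rate(0.09, 365), 6))  # daily: 0.094162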

Solving Time Value of Money Problems


Approach these problems by first converting both the rate r and the time period N to the same units as the
compounding frequency. In other words, if the problem specifies quarterly compounding (i.e. four
compounding periods in a year), with time given in years and the interest rate given as an annual figure, start by dividing
the rate by 4, and multiplying the time N by 4. Then, use the resulting r and N in the standard PV and FV
formulas.

Example: Compounding Periods


Assume we want the future value of $10,000 invested for five years at 8%. With quarterly compounding, the quarterly rate is r = 8%/4 = 0.02, and the number of periods is N = 4*5 = 20 quarters.

FV = PV * (1 + r)^N = ($10,000)*(1.02)^20 = ($10,000)*(1.485947) = $14,859.47

Assuming monthly compounding, where r = 8%/12 = 0.0066667, and N = 12*5 = 60.

FV = PV * (1 + r)^N = ($10,000)*(1.0066667)^60 = ($10,000)*(1.489846) = $14,898.46


Compare these results to the figure with annual compounding ($14,693.28, calculated in the single-sum example below) to see the benefit of additional compounding periods.
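A quick Python sketch (with an assumed helper name future_value) shows how scaling r and N to the compounding frequency reproduces the figures above:

def future_value(pv, annual_rate, years, periods_per_year=1):
    """FV of a single sum, with the rate and period count scaled to the compounding frequency."""
    r = annual_rate / periods_per_year
    n = years * periods_per_year
    return pv * (1 + r) ** n

for m in (1, 4, 12):
    print(m, round(future_value(10_000, 0.08, 5, m), 2))
# 1  14693.28  (annual)
# 4  14859.47  (quarterly)
# 12 14898.46  (monthly)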

Exam Tips and Tricks

On PV and FV problems, switching the time units - either by calling for quarterly or
monthly compounding or by expressing time in months and the interest rate in years - is
an often-used tactic to trip up test takers who are trying to go too fast. Remember to
make sure the units agree for r and N, and are consistent with the frequency of
compounding, prior to solving.

Present Value of a Perpetuity


A perpetuity starts as an ordinary annuity (first cash flow is one period from today) but has no end and
continues indefinitely with level, sequential payments. Perpetuities are more a product of the CFA world than
the real world – what entity would obligate itself to making payments that will never end? However, some
securities (such as preferred stocks) do come close to satisfying the assumptions of a perpetuity, and the formula
for PV of a perpetuity is used as a starting point to value these types of securities.

The formula for the PV of a perpetuity is derived from the PV of an ordinary annuity, which at N = infinity, and
assuming interest rates are positive, simplifies to:

Formula 2.2

PV of a perpetuity = annuity payment (A) / interest rate (r), or PV = A/r

Therefore, a perpetuity paying $1,000 annually at an interest rate of 8% would be worth:

PV = A/r = ($1000)/0.08 = $12,500
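As a tiny Python sketch of the same computation (perpetuity_pv is simply our label for the formula):

def perpetuity_pv(payment, rate):
    """PV of a level perpetuity whose first payment is one period from now."""
    return payment / rate

print(perpetuity_pv(1_000, 0.08))  # 12500.0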

FV and PV of a SINGLE SUM OF MONEY


If we assume an annual compounding of interest, these problems can be solved with the following formulas:

Formula 2.3

(1) FV = PV * (1 + r)^N

(2) PV = FV * [1 / (1 + r)^N]

Where: FV = future value of a single sum of money, PV = present value of a single sum of money, r = annual interest rate, and N = number of years
Example: Future Value
At an interest rate of 8%, we calculate that $10,000 five years from now will be:

FV = PV * (1 + r)^N = ($10,000)*(1.08)^5 = ($10,000)*(1.469328)

FV = $14,693.28

At an interest rate of 8%, we calculate today's value that will grow to $10,000 in five years:

PV = FV * (1/(1 + r)^N) = ($10,000)*(1/(1.08)^5) = ($10,000)*(1/(1.469328))

PV = ($10,000)*(0.680583) = $6,805.83

Example: Present Value


An investor wants to have $1 million when she retires in 20 years. If she can earn a 10% annual return,
compounded annually, on her investments, the lump-sum amount she would need to invest today to reach her
goal is closest to:

A. $100,000
B. $117,459
C. $148,644
D. $161,506

Answer:
The problem asks for a value today (PV). It provides the future sum of money (FV) = $1,000,000; an interest
rate (r) = 10% or 0.1; yearly time periods (N) = 20, and it indicates annual compounding. Using the PV formula
listed above, we get the following:

PV = FV * [1/(1 + r)^N] = ($1,000,000)*(1/(1.10)^20) = $1,000,000 * (1/6.7275) = $1,000,000 * 0.148644 = $148,644

Using a calculator with financial functions can save time when solving PV and FV problems. At the same time,
the CFA exam is written so that financial calculators aren't required. Typical PV and FV problems will test the
ability to recognize and apply concepts and avoid tricks, not the ability to use a financial calculator. The
experience gained by working through more examples and problems will increase your efficiency much more than a calculator will.
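If you prefer to check single-sum problems in code rather than on a calculator, a short Python sketch of Formula 2.3 might look like this (fv_single and pv_single are our own names):

def fv_single(pv, r, n):
    """Future value of a single sum: FV = PV * (1 + r)^N."""
    return pv * (1 + r) ** n

def pv_single(fv, r, n):
    """Present value of a single sum: PV = FV / (1 + r)^N."""
    return fv / (1 + r) ** n

print(round(fv_single(10_000, 0.08, 5), 2))      # 14693.28
print(round(pv_single(10_000, 0.08, 5), 2))      # 6805.83
print(round(pv_single(1_000_000, 0.10, 20), 0))  # 148644.0 (choice C)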
FV and PV of an Ordinary Annuity and an Annuity Due
To solve annuity problems, you must know the formulas for the future value annuity factor and the present
value annuity factor.

Formula 2.4

Future Value Annuity Factor = ((1 + r)^N – 1) / r

Formula 2.5

Present Value Annuity Factor = (1 – 1/(1 + r)^N) / r

Where r = interest rate and N = number of payments

FV Annuity Factor
The FV annuity factor formula gives the future total dollar amount of a series of $1 payments, but in problems
there will likely be a periodic cash flow amount given (sometimes called the annuity amount and denoted by A).
Simply multiply A by the FV annuity factor to find the future value of the annuity. Likewise for PV of an
annuity: the formula listed above shows today's value of a series of $1 payments to be received in the future. To
calculate the PV of an annuity, multiply the annuity amount A by the present value annuity factor.

The FV and PV annuity factor formulas work with an ordinary annuity, one that assumes the first cash flow is
one period from now, or t = 1 if drawing a timeline. The annuity due is distinguished by a first cash flow
starting immediately, or t = 0 on a timeline. Since the annuity due is basically an ordinary annuity plus a lump
sum (today's cash flow), and since it can also be treated as an ordinary annuity that started one period earlier, we
can use the ordinary annuity formulas as long as we keep track of the timing of cash flows. The guiding
principle: make sure, before using the formula, that the annuity fits the definition of an ordinary annuity with
the first cash flow one period away.

Example: FV and PV of ordinary annuity and annuity due


An individual deposits $10,000 at the beginning of each of the next 10 years, starting today, into an account
paying 9% interest compounded annually. The amount of money in the account at the end of 10 years will be
closest to:

A. $109,000
B. $143,200
C. $151,900
D. $165,600

Answer:
The problem gives the annuity amount A = $10,000, the interest rate r = 0.09, and time periods N = 10. Time
units are all annual (compounded annually) so there is no need to convert the units on either r or N. However,
the phrase "starting today" introduces a wrinkle. The annuity being described is an annuity due, not an ordinary annuity,
so to use the FV annuity factor, we will need to change our perspective to fit the definition of an ordinary
annuity.
Drawing a timeline should help visualize what needs to be done:

Figure 2.1: Cashflow Timeline


The definition of an ordinary annuity is a cash flow stream whose first payment is one period away, so the annuity described in the problem can be treated as an ordinary annuity that began one period ago, with 10 cash flows at t0 through t9. Using the FV
annuity factor formula, we have the following:

FV annuity factor = ((1 + r)^N – 1)/r = ((1.09)^10 – 1)/0.09 = (1.3673636)/0.09 = 15.19293

Multiplying this amount by the annuity amount of $10,000, we have the future value at time period 9. FV =
($10,000)*(15.19293) = $151,929. To finish the problem, we need the value at t10. To calculate, we use the
future value of a lump sum, FV = PV*(1 + r)^N, with N = 1, PV = the annuity value after 9 periods, and r = 0.09.

FV = PV*(1 + r)^N = ($151,929)*(1.09) = $165,603.

The correct answer is "D".

Notice that choice "C" in the problem ($151,900) agrees with the preliminary result of the value of the annuity
at t = 9. It's also the result if we were to forget the distinction between ordinary annuity and annuity due, and go
forth and solve the problem with the ordinary annuity formula and the given parameters. On the CFA exam,
problems like this one will get plenty of takers for choice "C" – mostly the people trying to go too fast!!
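Here is a small Python sketch of the annuity-due adjustment just described, using our own helper names; it values the ordinary annuity at t = 9 and then rolls the result forward one period:

def fv_annuity_factor(r, n):
    """Future value factor for an ordinary annuity of $1: ((1 + r)^N - 1) / r."""
    return ((1 + r) ** n - 1) / r

def pv_annuity_factor(r, n):
    """Present value factor for an ordinary annuity of $1: (1 - 1/(1 + r)^N) / r."""
    return (1 - 1 / (1 + r) ** n) / r

# Ordinary annuity: 10 deposits of $10,000, valued at the date of the last deposit (t = 9)
fv_t9 = 10_000 * fv_annuity_factor(0.09, 10)
# Annuity due: shift the whole value forward one more period to t = 10
fv_t10 = fv_t9 * 1.09
print(round(fv_t9), round(fv_t10))  # 151929 165603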

PV and FV of Uneven Cash Flows


The FV and PV annuity formulas assume level and sequential cash flows, but if a problem breaks this
assumption, the annuity formulas no longer apply. To solve problems with uneven cash flows, each cash flow
must be discounted back to the present (for PV problems) or compounded to a future date (for FV problems);
then the sum of the present (or future) values of all cash flows is taken. In practice, particularly if there are
many cash flows, this exercise is usually completed by using a spreadsheet. On the CFA exam, the ability to
handle this concept may be tested with just a few future cash flows, given the time constraints.

It helps to set up this problem as if it were on a spreadsheet, to keep track of the cash flows and to make sure
that the proper inputs are used to either discount or compound each cash flow. For example, assume that we are
to receive a sequence of uneven cash flows from an annuity and we're asked for the present value of the annuity
at a discount rate of 8%. Scratch out a table similar to the one below, with periods in the first column, cash
flows in the second, formulas in the third column and computations in the fourth.

Time Period   Cash Flow   Present Value Formula   Result of Computation
1             $1,000      $1,000/(1.08)^1         $925.93
2             $1,500      $1,500/(1.08)^2         $1,286.01
3             $2,000      $2,000/(1.08)^3         $1,587.66
4             $500        $500/(1.08)^4           $367.51
5             $3,000      $3,000/(1.08)^5         $2,041.75

Taking the sum of the results in column 4, we have a PV = $6,208.86.

Suppose we are required to find the future value of this same sequence of cash flows after period 5. Here's the
same approach using a table with future value formulas rather than present value, as in the table above:

Time Period   Cash Flow   Future Value Formula   Result of Computation
1             $1,000      $1,000*(1.08)^4        $1,360.49
2             $1,500      $1,500*(1.08)^3        $1,889.57
3             $2,000      $2,000*(1.08)^2        $2,332.80
4             $500        $500*(1.08)^1          $540.00
5             $3,000      $3,000*(1.08)^0        $3,000.00

Taking the sum of the results in column 4, we have FV (period 5) = $9,122.86.

Check the present value of $9,122.86, discounted at the 8% rate for five years:

PV = ($9,122.86)/(1.08)^5 = $6,208.86. In other words, the principle of equivalence applies even in examples
where the cash flows are unequal.
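The same uneven-cash-flow exercise can be scripted in a few lines of Python (a sketch, not something the exam requires), which also verifies the equivalence check above:

cash_flows = {1: 1_000, 2: 1_500, 3: 2_000, 4: 500, 5: 3_000}
r = 0.08

# Discount each cash flow back to t = 0, then sum
pv = sum(cf / (1 + r) ** t for t, cf in cash_flows.items())

# Compound each cash flow forward to the end of period 5, then sum
fv = sum(cf * (1 + r) ** (5 - t) for t, cf in cash_flows.items())

print(round(pv, 2), round(fv, 2))   # 6208.86 9122.86
print(round(fv / (1 + r) ** 5, 2))  # 6208.86 -- equivalence check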
2.5 - Time Value Of Money Applications

I. MORTGAGES
Most of the problems from the time value material are likely to ask for either PV or FV and will provide the
other variables. However, on a test with hundreds of problems, the CFA exam will look for unique and creative
methods to test command of the material. A problem might provide both FV and PV and then ask you to solve
for an unknown variable, either the interest rate (r), the number of periods (N) or the amount of the annuity (A).
In most of these cases, a quick use of freshman-level algebra is
all that's required. We'll cover two real-world applications –
each was the subject of an example in the resource textbook, so
either one may have a reasonable chance of ending up on an
exam problem.

Annualized Growth Rates


The first application is annualized growth rates. Taking the formula for FV of a single sum of money and
solving for r produces a formula that can also be viewed as the growth rate, or the rate at which that sum of
money grew from PV to FV in N periods.

Formula 2.6

Growth rate (g) = (FV/PV)^(1/N) – 1

For example, if a company's earnings were $100 million five years ago, and are $200 million today, the
annualized five-year growth rate could be found by:

growth rate (g) = (FV/PV)^(1/N) – 1 = (200,000,000/100,000,000)^(1/5) – 1 = (2)^(1/5) – 1 = (1.1486984) – 1 = 14.87%
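As a one-line Python check of Formula 2.6 (annualized_growth is our own label):

def annualized_growth(pv, fv, n):
    """g = (FV/PV)^(1/N) - 1."""
    return (fv / pv) ** (1 / n) - 1

print(round(annualized_growth(100_000_000, 200_000_000, 5), 4))  # 0.1487 -> 14.87%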

Monthly Mortgage Payments


The second application involves calculating monthly mortgage payments. Periodic mortgage payments fit the
definition of an annuity payment (A), where the PV of the annuity is equal to the amount borrowed. (Note that if the
loan is needed for a $300,000 home and they tell you that the down payment is $50,000, make sure to reduce
the amount borrowed, or PV, to $250,000! Plenty of folks will just grab the $300,000 number and plug it into
the financial calculator.) Because mortgage payments are typically made monthly with interest compounded
monthly, expect to adjust the annual interest rate (r) by dividing by 12, and to multiply the time periods by 12 if
the mortgage loan period is expressed in years.

Since PV of an annuity = (annuity payment)*(PV annuity factor), we solve for annuity payment (A), which will
be the monthly payment:

Formula 2.7

Monthly mortgage payment = (Amount of the loan)/(PV annuity factor)

Example: Monthly Mortgage Payments


Assuming a 30-year loan with monthly compounding (so N = 30*12 = 360 months), and a rate of 6% (so r = 0.06/12 = 0.005), we first calculate the PV annuity factor:

PV annuity factor = (1 – 1/(1 + r)^N)/r = (1 – 1/(1.005)^360)/0.005 = 166.7916

With a loan of $250,000, the monthly payment in this example would be $250,000/166.7916, or $1,498.88 a
month.
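A short Python sketch of the mortgage-payment calculation (monthly_mortgage_payment is our own helper name) reproduces the $1,498.88 figure:

def monthly_mortgage_payment(amount_borrowed, annual_rate, years):
    """Payment = loan amount / PV annuity factor, with r and N scaled to months."""
    r = annual_rate / 12
    n = years * 12
    pv_annuity_factor = (1 - 1 / (1 + r) ** n) / r
    return amount_borrowed / pv_annuity_factor

print(round(monthly_mortgage_payment(250_000, 0.06, 30), 2))  # 1498.88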
Exam Tips and Tricks

Higher-level math functions usually don't end up on the test, partly because they give an
unfair advantage to those with higher-function calculators and because questions must be
solved in an average of one to two minutes each at Level I. Don't get bogged down with
understanding natural logs or transcendental numbers.

II. RETIREMENT SAVINGS


Savings and retirement planning are sometimes more complicated, as there are various life-cycle stages that
result in assumptions for uneven cash inflows and outflows. Problems of this nature often involve more than
one computation of the basic time value formulas; thus the emphasis on drawing a timeline is sound advice, and
a worthwhile habit to adopt even when solving problems that appear to be relatively simple.

Example: Retirement Savings


To illustrate, we take a hypothetical example of a client, 35 years old, who would like to retire at age 65 (30
years from today). Her goal is to have enough in her retirement account to provide an income of $75,000 a year,
starting a year after retirement or year 31, for 25 years thereafter. She had a late start on saving for retirement,
with a current balance of $10,000. To catch up, she is now committed to saving $5,000 a year, with the first
contribution a year from now. A single parent with two children, both of whom will be attending college
starting in five years, she won't be able to increase the annual $5,000 commitment until after the kids have
graduated. Once the children are finished with college, she will have extra disposable income, but is worried
about just how much of an increase it will take to meet her ultimate retirement goals. To help her meet this goal,
estimate how much she will need to save every year, starting 10 years from now, when the kids are out of
college. Assume an average annual 8% return in the retirement account.

Answer:
To organize and summarize this information, we will need her three cash inflows to be the equivalent of her one
cash outflow.

1. The money already in the account is the first inflow.
2. The money to be saved during the next 10 years is the second inflow.
3. The money to be saved between years 11 and 30 is the third inflow.
4. The money to be taken as income from years 31 to 55 is the one outflow.

All amounts are given to calculate inflows 1 and 2 and the outflow. The third inflow has an unknown annuity
amount that will need to be determined using the other amounts. We start by drawing a timeline and specifying
that all amounts be indexed at t = 30, or her retirement day.

Next, calculate the three amounts for which we have all the necessary information, and index to t = 30.

(inflow 1) FV (single sum) = PV*(1 + r)^N = ($10,000)*(1.08)^30 = $100,627

(inflow 2) FV annuity factor = ((1 + r)^N – 1)/r = ((1.08)^10 – 1)/0.08 = 14.48656

With a $5000 payment, FV (annuity) = ($5000)*(14.48656) = $72,433

This amount is what is accumulated at t = 10; we need to index it to t = 30.

FV (single sum) = PV*(1 + r)^N = ($72,433)*(1.08)^20 = $337,606

(cash outflow) PV annuity factor = (1 – 1/(1 + r)^N)/r = (1 – 1/(1.08)^25)/0.08 = 10.674776

With payment of $75,000, PV (annuity) = ($75,000)*(10.674776) = $800,608.

Since the three cash inflows = cash outflow, we have ($100,627) + ($337,606) + X = $800,608, or X =
$362,375 at t = 30. In other words, the money she saves from years 11 through 30 will need to be equal to
$362,375 in order for her to meet retirement goals.

FV annuity factor = ((1 + r)^N – 1)/r = ((1.08)^20 – 1)/0.08 = 45.76196

A = FV/FV annuity factor = (362,375)/45.76196 = $7,919

We find that by increasing the annual savings from $5,000 to $7,919 starting in year 11 and continuing to year
30, she will be successful in accumulating enough income for retirement.
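Because multi-stage problems like this are easy to fumble on a calculator, here is a Python sketch that indexes every amount to t = 30 exactly as the solution above does (the helper names are ours):

r = 0.08

def fv_single(pv, n):        # value of a lump sum n periods later
    return pv * (1 + r) ** n

def fv_annuity_factor(n):    # ordinary annuity, valued at the date of the last payment
    return ((1 + r) ** n - 1) / r

def pv_annuity_factor(n):    # ordinary annuity, valued one period before the first payment
    return (1 - 1 / (1 + r) ** n) / r

inflow1 = fv_single(10_000, 30)                          # current balance grown to t = 30
inflow2 = fv_single(5_000 * fv_annuity_factor(10), 20)   # years 1-10 savings, re-indexed to t = 30
outflow = 75_000 * pv_annuity_factor(25)                 # 25 retirement payments, valued at t = 30

needed = outflow - inflow1 - inflow2                     # gap the years 11-30 annuity must fill
payment = needed / fv_annuity_factor(20)
print(round(payment))                                    # 7919, matching the $7,919 above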

How are Present Values, Future Value and Cash Flows connected?
The cash flow additivity principle allows us to add amounts of money together, provided they are indexed to the
same period. The last example on retirement savings illustrates cash flow additivity: we were planning to
accumulate a sum of money from three separate sources and we needed to determine what the total amount
would be so that the accumulated sum could be compared with the client's retirement cash outflow requirement.
Our example involved uneven cash flows from two separate annuity streams and one single lump sum that has
already accumulated. Comparing these inputs requires each amount to be indexed first, prior to adding them
together. In the last example, the annuity we were planning to accumulate in years 11 to 30 was projected to
reach $362,375 by year 30. The current savings initiative of $5,000 a year projects to $72,433 by year 10. Right
now, time 0, we have $10,000. In other words, we have three amounts at three different points in time.

According to the cash flow additivity principle, these amounts could not be added together until they were either discounted back to a common date, or compounded
ahead to a common date. We chose t = 30 in the example
because it made the calculations the simplest, but any
point in time could have been chosen. The most common
date chosen to apply cash flow additivity is t = 0 (i.e.
discount all expected inflows and outflows to the present
time). This principle is frequently tested on the CFA
exam, which is why the technique of drawing timelines
and choosing an appropriate time to index has
been emphasized here.
2.6 - Net Present Value and the Internal Rate of Return
This section applies the techniques and formulas first
presented in the time value of money material toward
real-world situations faced by financial analysts. Three
topics are emphasized: (1) capital budgeting decisions,
(2) performance measurement and (3) U.S. Treasury-bill
yields.

Net Present Value


NPV and IRR are two methods for making capital-budget
decisions, or choosing between alternate projects and
investments when the goal is to increase the value of the
enterprise and maximize shareholder wealth. Defining
the NPV method is simple: the present value of cash
inflows minus the present value of cash outflows, which
arrives at a dollar amount that is the net benefit to the
organization.

To compute NPV and apply the NPV rule, the authors of the reference textbook define a five-step process to be used in solving problems:

1. Identify all cash inflows and cash outflows.
2. Determine an appropriate discount rate (r).
3. Use the discount rate to find the present value of all cash inflows and outflows.
4. Add together all present values. (From the section on cash flow additivity, we know that this action is appropriate since the cash flows have been indexed to t = 0.)
5. Make a decision on the project or investment using the NPV rule: say yes to a project if the NPV is positive; say no if the NPV is negative. As a tool for choosing among alternatives, the NPV rule prefers the investment with the higher positive NPV.
Companies often use the weighted average cost of capital, or WACC, as the appropriate discount rate for capital
projects. The WACC is a function of a firm's capital structure (common and preferred stock and long-term debt)
and the required rates of return for these securities. CFA exam problems will either give the discount rate, or
they may give a WACC.

Example:
To illustrate, assume we are asked to use the NPV approach to choose between two projects, and our company's
weighted average cost of capital (WACC) is 8%. Project A costs $7 million in upfront costs, and will generate
$3 million in annual income starting three years from now and continuing for a five-year period (i.e. years 3 to
7). Project B costs $2.5 million upfront and $2 million in each of the next three years (years 1 to 3). It generates
no annual income but will be sold six years from now for a sales price of $16 million.

For each project, find NPV = (PV inflows) – (PV outflows).

Project A: The present value of the outflows is equal to the current cost of $7 million. The inflows can be
viewed as an annuity with the first payment in three years, or an ordinary annuity at t = 2 since ordinary
annuities always start the first cash flow one period away.

PV annuity factor for r = 0.08, N = 5: (1 – 1/(1 + r)^N)/r = (1 – 1/(1.08)^5)/0.08 = (1 – 1/1.469328)/0.08 = (0.319417)/0.08 = 3.99271

Multiplying by the annuity payment of $3 million, the value of the inflows at t = 2 is ($3 million)*(3.99271) =
$11.978 million.

Discounting back two periods, PV inflows = ($11.978 million)/(1.08)^2 = $10.269 million.

NPV (Project A) = ($10.269 million) – ($7 million) = $3.269 million.

Project B: The inflow is the present value of a lump sum, the sales price in six years discounted to the present:
$16 million/(1.08)^6 = $10.083 million.

Cash outflow is the sum of the upfront cost and the discounted costs from years 1 to 3. We first solve for the
costs in years 1 to 3, which fit the definition of an annuity.

PV annuity factor for r = 0.08, N = 3: (1 – 1/(1.08)^3)/0.08 = (1 – 1/1.259712)/0.08 = (0.206168)/0.08 = 2.577097. PV of the annuity = ($2 million)*(2.577097) = $5.154 million.

PV of outflows = ($2.5 million) + ($5.154 million) = $7.654 million.

NPV of Project B = ($10.083 million) – ($7.654 million) = $2.429 million.


Applying the NPV rule, we choose Project A, which has the larger NPV: $3.269 million versus $2.429
million.
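A small Python sketch of the five-step NPV process applied to both projects (the npv helper and the cash-flow dictionaries are our own construction) confirms the figures above:

def npv(rate, cash_flows):
    """NPV of cash flows indexed by period (t = 0 is today); outflows are negative."""
    return sum(cf / (1 + rate) ** t for t, cf in cash_flows.items())

wacc = 0.08

# Project A: $7M today, $3M per year in years 3-7
project_a = {0: -7_000_000, **{t: 3_000_000 for t in range(3, 8)}}
# Project B: $2.5M today, $2M in each of years 1-3, sold for $16M in year 6
project_b = {0: -2_500_000, 1: -2_000_000, 2: -2_000_000, 3: -2_000_000, 6: 16_000_000}

print(round(npv(wacc, project_a)))  # ~3.27 million
print(round(npv(wacc, project_b)))  # ~2.43 million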

Exam Tips and Tricks

Problems on the CFA exam are frequently set up so that it is tempting to pick the choice that seems intuitively better (i.e. the one a person guessing would pick), even though it is wrong by the NPV rule. In the case we used, Project B had lower upfront costs ($2.5 million versus $7 million) with a payoff of $16 million, which is more than the combined $15 million payoff of Project A. Don't rely on what feels better; use the process to make the decision!
The Internal Rate of Return
The IRR, or internal rate of return, is defined as the discount rate that makes NPV = 0. Like the NPV process, it
starts by identifying all cash inflows and outflows. However, instead of relying on external data (i.e. a discount
rate), the IRR is purely a function of the inflows and outflows of that project. The IRR rule states that projects
or investments are accepted when the project's IRR exceeds a hurdle rate. Depending on the application, the
hurdle rate may be defined as the weighted average cost of capital.

Example:
Suppose that a project costs $10 million today and will provide a $15 million payoff three years from now. We use the FV of a single-sum formula and solve for r to compute the IRR.

IRR = (FV/PV)^(1/N) – 1 = ($15 million/$10 million)^(1/3) – 1 = (1.5)^(1/3) – 1 = (1.1447) – 1 = 0.1447, or 14.47%

In this case, as long as our hurdle rate is less than 14.47%, we green light the project.
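In code, the single-payoff IRR is just the growth-rate formula from earlier (irr_single_payoff is our own name):

def irr_single_payoff(cost, payoff, years):
    """IRR when there is one outflow today and one inflow at maturity: (FV/PV)^(1/N) - 1."""
    return (payoff / cost) ** (1 / years) - 1

print(round(irr_single_payoff(10_000_000, 15_000_000, 3), 4))  # 0.1447 -> 14.47%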

NPV vs. IRR


Each of the two rules used for making capital-budgeting decisions has its strengths and weaknesses. The NPV
rule chooses a project in terms of net dollars or net financial impact on the company, so it can be easier to use
when allocating capital.

However, it requires an assumed discount rate, and also assumes that this percentage rate will be stable over the
life of the project, and that cash inflows can be reinvested at the same discount rate. In the real world, those
assumptions can break down, particularly in periods when interest rates are fluctuating. The appeal of the IRR
rule is that a discount rate need not be assumed, as the worthiness of the investment is purely a function of the
internal inflows and outflows of that particular investment. However, IRR does not assess the financial impact
on a firm; it only requires meeting a minimum return rate.

The NPV and IRR methods can rank two projects differently, depending on the size of the investment. Consider the case presented below, using a 6% discount rate to compute NPV:

Project   Initial outflow   Payoff after one year   IRR   NPV
A         $250,000          $280,000                12%   +$14,151
B         $50,000           $60,000                 20%   +$6,604

By the NPV rule we choose Project A, and by the IRR rule we prefer B. How do we resolve the conflict if we
must choose one or the other? The convention is to use the NPV rule when the two methods are inconsistent, as
it better reflects our primary goal: to grow the financial wealth of the company.

Consequences of the IRR Method


In the previous section we demonstrated how smaller projects can have higher IRRs but will have less of a
financial impact. Timing of cash flows also affects the IRR method. Consider the example below, on which
initial investments are identical. Project A has a smaller payout and less of a financial impact (lower NPV), but
since it is received sooner, it has a higher IRR. When inconsistencies arise, NPV is the preferred method.
Assessing the financial impact is a more meaningful indicator for a capital-budgeting decision.
Project   Investment   Income in future periods (t1 / t2 / t3 / t4 / t5)   IRR     NPV
A         $100k        $125k / $0 / $0 / $0 / $0                           25.0%   $17,925
B         $100k        $0 / $0 / $0 / $0 / $200k                           14.9%   $49,452


2.7 - Money Vs. Time-Weighted Return
Money-weighted and time-weighted rates of return are two methods of measuring performance, or the rate of
return on an investment portfolio. Each of these two approaches has particular instances where it is the preferred
method. Given the priority in today's environment on performance returns (particularly when comparing and
evaluating money managers), the CFA exam will be certain to test whether a candidate understands each
methodology.

Money-Weighted Rate of Return


A money-weighted rate of return is identical in concept to
an internal rate of return: it is the discount rate at which the NPV = 0, or the present value of inflows equals the present
value of outflows. Recall that for the IRR method, we
start by identifying all cash inflows and outflows. When
applied to an investment portfolio:

Outflows
1. The cost of any investment purchased
2. Reinvested dividends or interest
3. Withdrawals

Inflows
1.The proceeds from any investment sold
2.Dividends or interest received
3.Contributions
Example:
Each inflow or outflow must be discounted back to the present using a rate (r) that will make PV (inflows) = PV
(outflows). For example, take a case where we buy one share of a stock for $50 that pays an annual $2 dividend,
and sell it after two years for $65. Our money-weighted rate of return will be a rate that satisfies the following
equation:

PV outflows = PV inflows: $50 = $2/(1 + r) + $2/(1 + r)^2 + $65/(1 + r)^2

Solving for r using a spreadsheet or financial calculator, we have a money-weighted rate of return = 17.78%.
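Since solving for r generally requires iteration, here is a Python sketch that finds the money-weighted return by bisection rather than with a financial calculator (the function names and search bounds are our own choices):

def npv(rate, cash_flows):
    return sum(cf / (1 + rate) ** t for t, cf in cash_flows.items())

def money_weighted_return(cash_flows, lo=-0.99, hi=10.0, tol=1e-8):
    """Find the rate that sets NPV to zero using simple bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid, cash_flows) > 0:
            lo = mid   # NPV still positive: the rate must be higher
        else:
            hi = mid
    return (lo + hi) / 2

# Buy for $50 today, receive $2 dividends in years 1 and 2, sell for $65 in year 2
flows = {0: -50, 1: 2, 2: 2 + 65}
print(round(money_weighted_return(flows), 4))  # 0.1778 -> 17.78%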

Exam Tips and Tricks

Note that the exam will test knowledge of the concept of money-weighted return, but any
computations should not require use of a financial calculator

It's important to understand the main limitation of the money-weighted return as a tool for evaluating managers.
As defined earlier, the money-weighted rate of return factors all cash flows, including contributions and
withdrawals. Assuming a money-weighted return is calculated over many periods, the formula will tend to place
a greater weight on the performance in periods when the account size is highest (hence the label money-
weighted).

In practice, if a manager's best years occur when an account is small, and then (after the client deposits more
funds) market conditions become more unfavorable, the money-weighted measure doesn't treat the manager
fairly. Put another way: say the account has annual withdrawals to provide a retiree with income, and
the manager does relatively poorly in the early years (when the account is larger), but improves in later periods
after distributions have reduced the account's size. Should the manager be penalized for something beyond his
or her control? Deposits and withdrawals are usually outside of a manager's control; thus, a better performance
measurement tool is needed to judge a manager more fairly and allow for comparisons with peers – a
measurement tool that will isolate the investment actions, and not penalize for deposit/withdrawal activity.

Time-Weighted Rate of Return


The time-weighted rate of return is the preferred industry standard as it is not sensitive to contributions or
withdrawals. It is defined as the compounded growth rate of $1 over the period being measured. The time-
weighted formula is essentially a geometric mean of a number of holding-period returns that are linked together
or compounded over time (thus, time-weighted). The holding-period return, or HPR, (rate of return for one
period) is computed using this formula:

Formula 2.8

HPR = (MV1 – MV0 + D1 – CF1)/MV0

Where: MV0 = beginning market value, MV1 = ending market value, D1 = dividend/interest inflows, and CF1 = cash flow received at period end (deposits subtracted, withdrawals added back)

For time-weighted performance measurement, the total period to be measured is broken into many sub-periods,
with a sub-period ending (and portfolio priced) on any day with significant contribution or withdrawal activity,
or at the end of the month or quarter. Sub-periods can cover any length of time chosen by the manager and need
not be uniform. A holding-period return is computed using the above formula for all sub-periods. Linking (or
compounding) HPRs is done by
(a) adding 1 to each sub-period HPR, then
(b) multiplying all 1 + HPR terms together, then
(c) subtracting 1 from the product:

Compounded time-weighted rate of return, for N holding periods

= [(1 + HPR1)*(1 + HPR2)*(1 + HPR3) … *(1 + HPRN)] – 1.

The annualized rate of return takes the compounded time-weighted rate and standardizes it by computing a
geometric average of the linked holding-period returns.

Formula 2.9

Annualized rate of return = (1 + compounded rate)^(1/Y) – 1

Where: Y = total time in years


Example: Time-Weighted Portfolio Return
Consider the following example: A portfolio was priced at the following values for the quarter-end dates
indicated:

Date Market Value


Dec. 31, 2003 $200,000
March 31, 2004 $196,500
June 30, 2004 $200,000
Sept. 30, 2004 $243,000
Dec. 31, 2004 $250,000
On Dec. 31, 2003, the annual fee of $2,000 was deducted from the account. On July 30, 2004, the annual
contribution of $20,000 was received, which boosted the account value to $222,000 on July 30. How would we
calculate a time-weighted rate of return for 2004?

Answer:
For this example, the year is broken into four holding-period returns to be calculated for each quarter. Also,
since a significant contribution of $20,000 was received intra-period, we will need to calculate two holding-
period returns for the third quarter, June 30, 2004, to July 30, 2004, and July 30, 2004, to Sept 30, 2004. In total,
there are five HPRs that must be computed using the formula HPR = (MV1 – MV0 + D1 – CF1)/MV0. Note that since D1, or dividend payments, are already factored into the ending-period value, this term will not be needed for the computation. On a test problem, if dividends or interest are shown separately, simply add them to the ending-period value. The calculations are done below (dollar amounts in thousands):

Period 1 (Dec 31, 2003, to Mar 31, 2004):

HPR = (($196.5 – $200 – (–$2))/$200) = (–1.5)/200 = –0.75%.

Period 2 (Mar 31, 2004, to June 30, 2004):

HPR = (($200 – $196.5)/$196.5) = 3.5/196.5 = +1.78%.

Period 3 (June 30, 2004, to July 30, 2004):

HPR = (($222 – $200 – ($20))/$200) = 2/200 = +1.00%.

Period 4 (July 30, 2004, to Sept 30, 2004):

HPR = ($243 – $222)/$222 = 21/222 = +9.46%.

Period 5 (Sept 30, 2004, to Dec 31, 2004):

HPR = ($250 – $243)/$243 = 7/243 = +2.88%

Now we link the five periods together, by adding 1 to each HPR, multiplying all terms, and subtracting 1 from
the product, to find the compounded time-weighted rate of return:

2004 return = ((1 + (–.0075))*(1 + 0.0178)*(1 + 0.01)*(1 + 0.0946)*(1 + 0.0288)) – 1 =


((0.9925)*(1.0178)*(1.01)*(1.0946)*(1.0288)) – 1 = (1.148964) – 1 = 0.148964, or 14.90% (rounding to the
nearest 1/100 of a percent).

Annualizing: Because our compounded calculation was for one year, the annualized figure is the same: +14.90%. If the same portfolio had a 2003 return of 20%, the two-year compounded number would be ((1 + 0.20)*(1 + 0.1490)) – 1, or 37.88%. Annualize by adding 1, taking the result to the 1/Y power, and then subtracting 1: (1 + 0.3788)^(1/2) – 1 = 17.42%.

Note: The annualized number is the same as a geometric average, a concept covered in the statistics section.
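The linking and annualizing steps are easy to script; the following Python sketch simply multiplies the (1 + HPR) terms from the example and then annualizes the hypothetical two-year result:

# Sub-period holding-period returns from the example (fee and contribution already handled)
hprs = [-0.0075, 0.0178, 0.0100, 0.0946, 0.0288]

# Link the sub-periods: multiply the (1 + HPR) terms and subtract 1
compounded = 1.0
for hpr in hprs:
    compounded *= 1 + hpr
time_weighted = compounded - 1
print(round(time_weighted, 4))  # 0.1489 -> the 14.90% in the text (difference is HPR rounding)

# Annualize over Y years: (1 + compounded rate)^(1/Y) - 1
two_year = (1 + 0.20) * (1 + time_weighted) - 1
print(round((1 + two_year) ** 0.5 - 1, 4))  # 0.1742 -> 17.42%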

Example: Money Weighted Returns


Calculating money-weighted returns will usually require use of a financial calculator if there are cash flows
more than one period in the future. Earlier we presented a case where a money-weighted return for two periods
was equal to the IRR, where NPV = 0.

For money-weighted returns covering a single period, we know PV (inflows) – PV (outflows) = 0. If we pay $100 for a stock today, sell it one year later for $105, and collect a $2 dividend, we have a money-weighted return, or IRR, that solves ($105)/(1 + r) + ($2)/(1 + r) – $100 = $0, so r = ($105 + $2)/$100 – 1, or 7%.

Money-weighted return = time-weighted return for a single period where the cash flow is received at the end. If the period is any time frame other than one year, take (1 + the result), raise it to the 1/Y power, and subtract 1 to find the annualized return.
2.8 - Calculating Yield

Calculating Yield for a U.S. Treasury Bill


A U.S. Treasury bill is the classic example of a pure
discount instrument, where the interest the government
pays is the difference between the amount it promises to
pay back at maturity (the face value) and the amount it
borrowed when issuing the T-bill (the discount). T-bills
are short-term debt instruments (by definition, they have
less than one year to maturity), and there is zero default
risk with a U.S. government guarantee. After being
issued, T-bills are widely traded in the secondary market,
and are quoted based on the bank discount yield (i.e. the
approximate annualized return the buyer should expect if
holding until maturity). A bank discount yield (RBD) can be computed as follows:

Formula 2.10

RBD = D/F * 360/t

Where: D = dollar discount from face value, F = face value, t = days until maturity, and 360 = days in a year

By bank convention, years are 360 days long, not 365. If you recall the joke about banker's hours being shorter
than regular business hours, you should remember that banker's years are also shorter.

For example, if a T-bill has a face value of $50,000, a current market price of $49,700 and a maturity in 100
days, we have:
RBD = D/F * 360/t = ($50,000 – $49,700)/$50,000 * 360/100 = (300/50,000) * 3.6 = 2.16%

On the exam, you may be asked to compute the market price, given a quoted yield, which can be accomplished by
using the same formula and solving for D:

Formula 2.11

D = RBD*F * t/360
Example:
Using the previous example, if we have a bank discount yield of 2.16%, a face value of $50,000 and days to
maturity of 100, then we calculate D as follows:

D = (0.0216)*(50000)*(100/360) = 300

Market price = F – D = 50,000 – 300 = $49,700


Holding-Period Yield (HPY)
HPY refers to the un-annualized rate of return one receives for holding a debt instrument until maturity. The
formula is essentially the same as the concept of holding-period return needed to compute time-weighted
performance. The HPY computation provides for one cash distribution or interest payment to be made at the
time of maturity, a term that can be omitted for U.S. T-bills.

Formula 2.12

HPY = (P1 – P0 + D1)/P0

Where: P0 = purchase price, P1 = price at maturity, and D1 = cash distribution at maturity
Example:
Taking the data from the previous example, we illustrate the calculation of HPY:

HPY = (P1 – P0 + D1)/P0 = (50000 – 49700 + 0)/49700 = 300/49700 = 0.006036 or 0.6036%


Effective annual yield (EAY)
EAY takes the HPY and annualizes the number to facilitate comparability with other investments. It uses the
same logic presented earlier when describing how to annualize a compounded return number: (1) add 1 to the
HPY return, (2) compound forward to one year by carrying to the 365/t power, where t is days to maturity, and
(3) subtract 1.

Here it is expressed as a formula:

Formula 2.13

EAY = (1 + HPY)^(365/t) – 1

Example:
Continuing with our example T-bill, we have:
EAY = (1 + HPY)^(365/t) – 1 = (1 + 0.006036)^(365/100) – 1 = 2.22 percent.

Remember that EAY > bank discount yield, for three reasons: (a) yield is based on purchase price, not face
value, (b) it is annualized with compound interest (interest on interest), not simple interest, and (c) it is based on
a 365-day year rather than 360 days. Be prepared to compare these two measures of yield and use these three
reasons to explain why EAY is preferable.

The third measure of yield is the money market yield, also known as the CD equivalent yield, and is denoted by
rMM. This yield measure can be calculated in two ways:

1. When the HPY is given, rMM is the annualized yield based on a 360-day year:

Formula 2.14

rMM = (HPY)*(360/t)

Where: t = days to maturity

For our example, we computed HPY = 0.6036%, thus the money market yield is:

rMM = (HPY)*(360/t) = (0.6036)*(360/100) = 2.173%.

2. When bond price is unknown, bank discount yield can be used to compute the money market yield, using this
expression:

Formula 2.15

rMM = (360*rBD)/(360 – (t*rBD))

Using our case:

rMM = (360*rBD)/(360 – (t*rBD)) = (360*0.0216)/(360 – (100*0.0216)) = 2.173%, which is identical to the result at which we arrived using HPY.
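All four yield measures for the example T-bill can be generated from the same three inputs, as this Python sketch (with our own variable names) shows:

face, price, t = 50_000, 49_700, 100   # face value, market price, days to maturity
discount = face - price

r_bd = (discount / face) * (360 / t)   # bank discount yield
hpy = (face - price) / price           # holding-period yield (no coupon for a T-bill)
eay = (1 + hpy) ** (365 / t) - 1       # effective annual yield
r_mm = hpy * (360 / t)                 # money market (CD equivalent) yield

print(round(r_bd, 4), round(hpy, 6), round(eay, 4), round(r_mm, 4))
# 0.0216  0.006036  0.0222  0.0217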

Interpreting Yield
This involves essentially nothing more than algebra: solve for the unknown and plug in the known quantities.
You must be able to use these formulas to find yields expressed one way when the provided yield number is
expressed another way.

Since HPY is common to the two others (EAY and MM yield), know how to solve for HPY to answer a
question.

Effective Annual Yield: EAY = (1 + HPY)^(365/t) – 1, so HPY = (1 + EAY)^(t/365) – 1

Money Market Yield: rMM = (HPY)*(360/t), so HPY = rMM*(t/360)

Bond Equivalent Yield


The bond equivalent yield is simply the yield stated on a semiannual basis multiplied by 2. Thus, if you are
given a semiannual yield of 3% and asked for the bond equivalent yield, the answer is 6%.
2.9 - Statistical Concepts And Market Returns
The term statistics is very broad. In some contexts it is
used to refer to specific data. Statistics is also a branch of
mathematics, a field of study – essentially the analysis
tools and methods that are applied to data. Data, by itself,
is nothing more than quantities and numbers. With
statistics, data can be transformed into useful information
and can be the basis for understanding and making
intelligent comparisons and decisions.

Basics

• Descriptive Statistics - Descriptive statistics are tools used to summarize and consolidate large masses
of numbers and data so that analysts can get their hands around it, understand it and use it. The learning
outcomes in this section of the guide (i.e. the statistics section) are focused on descriptive statistics.

• Inferential Statistics - Inferential statistics are tools used to draw larger generalizations from observing
a smaller portion of data. In basic terms, descriptive statistics intend to describe. Inferential statistics
intend to draw inferences, the process of inferring. We will use inferential statistics in section D.
Probability Concepts, later in this chapter.

Population Vs. Sample


A population refers to every member of a group, while a sample is a small subset of the population. Sampling is
a method used when the task of observing the entire population is either impossible or impractical. Drawing a
sample is intended to produce a smaller group with the same or similar characteristics as the population, which
can then be used to learn more about the whole population.

Parameters and Sample Statistics


A parameter is any descriptive measure of a population. Mean, range and variance are all commonly used parameters that summarize and describe the population. A parameter describes the total
population. Determining the precise value of any parameter requires observing every single member of the
population. Since this exercise can be impossible or impractical, we use sampling techniques, which draw a
sample that (the analyst hopes) represents the population. Quantities taken from a sample to describe its
characteristics (e.g. mean, range and variance) are termed sample statistics.

Population → Parameter; Sample → Sample Statistic

Measurement Scales
Data is measured and assigned to specific points based on a chosen scale. A measurement scale can fall into one
of four categories:

1. Nominal - This is the weakest level as the only purpose is to categorize data but not rank it in any way.
For example, in a database of mutual funds, we can use a nominal scale for assigning a number to
identify fund style (e.g. 1 for large-cap value, 2 for large-cap growth, 3 for foreign blend, etc.). Nominal
scales don't lend themselves to descriptive tools – in the mutual fund example, we would not report the
average fund style as 5.6 with a standard deviation of 3.2. Such descriptions are meaningless for
nominal scales.

2. Ordinal - This category is considered stronger than nominal as the data is categorized according to some
rank that helps describe rankings or differences between the data. Examples of ordinal scales include the
mutual fund star rankings (Morningstar 1 through 5 stars), or assigning a fund a rating between 1 and 10
based on its five-year performance and its place within its category (e.g. 1 for the top 10%, 2 for funds
between 10% and 20% and so forth). An ordinal scale doesn't always fully describe relative differences
– in the example of ranking 1 to 10 by performance, there may be a wide performance gap between 1
and 2, but virtually nothing between 6, 7, and 8.

3. Interval - This is a step stronger than the ordinal scale, as the intervals between data points are equal,
and data can be added and subtracted together. Temperature is measured on interval scales (Celsius and
Fahrenheit), as the difference in temperature between 25 and 30 is the same as the difference between 85
and 90. However, interval scales have no zero point – zero degrees Celsius doesn't indicate no
temperature; it's simply the point at which water freezes. Without a zero point, ratios are meaningless –
for example, nine degrees is not three times as hot as three degrees.

4. Ratio - This category represents the strongest level of measurement, with all the features of interval
scales plus the zero point, giving meaning to ratios on the scale. Most measurement scales used by
financial analysts are ratios, including time (e.g. days-to-maturity for bonds), money (e.g. earnings per
share for a set of companies) and rates of return expressed as a percentage.

Frequency Distribution
A frequency distribution seeks to describe large data sets by doing four things:

(1) establishing a series of intervals as categories,


(2) assigning every data point in the population to one of the categories,
(3) counting the number of observations within each category and
(4) presenting the data with each assigned category, and the frequency of observations in each category.

Frequency distribution is one of the simplest methods employed to describe populations of data and can be used
for all four measurement scales – indeed, it is often the best and only way to describe data measured on a
nominal, ordinal or interval scale. Frequency distributions are sometimes used for equity index returns over a
long history – e.g. the S&P 500 annual or quarterly returns
grouped into a series of return intervals.
2.10 - Basic Statistical Calculations
Holding Period Return
The holding period return formula was introduced previously when discussing time-weighted return measurement. The same formula applies to the return data used in frequency distributions, with the notation changed slightly:

Formula 2.16

Rt = (Pt – Pt-1 + Dt)/Pt-1

Where: Rt = holding period return for time period t, Pt = price of asset at end of time period t, Pt-1 = price of asset at end of time period (t – 1), and Dt = cash distributions received during time period t

Relative and Cumulative Frequencies


Relative frequency is calculated by dividing the absolute frequency of a particular interval by the total number of observations. Cumulative relative frequency adds the relative frequencies together to show
the percentage of observations that fall at or below a certain point. For an illustration on calculating relative
frequency and cumulative relative frequency, refer to the following frequency distribution for quarterly returns
over the last 10 years for a mutual fund:

Quarterly return interval   Number of observations   Relative    Cumulative absolute   Cumulative relative
                            (absolute frequency)     frequency   frequency             frequency
–15% to –10%                2                        5.0%        2                     5.0%
–10% to –5%                 1                        2.5%        3                     7.5%
–5% to 0%                   5                        12.5%       8                     20.0%
0% to +5%                   17                       42.5%       25                    62.5%
+5% to +10%                 10                       25.0%       35                    87.5%
+10% to +15%                2                        5.0%        37                    92.5%
+15% to +20%                3                        7.5%        40                    100.0%

There are 40 observations in this distribution (last 10 years, four quarters per year), and the relative frequency is
found by dividing the number in the second column by 40. The cumulative absolute frequency (fourth column)
is constructed by adding the frequency of all observations at or below that point. So for the fifth interval, +5%
to +10%, we find the cumulative absolute frequency by adding the absolute frequency in the fifth interval and
all previous intervals: 2+1+5+17+10=35. The last column, cumulative relative frequency, takes the number in
the fourth column and divides by 40, the total number of observations.
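Given only the absolute frequencies, the remaining columns of such a table can be generated mechanically; the following Python sketch rebuilds the relative and cumulative columns from the counts above:

intervals = ["-15% to -10%", "-10% to -5%", "-5% to 0%", "0% to +5%",
             "+5% to +10%", "+10% to +15%", "+15% to +20%"]
absolute = [2, 1, 5, 17, 10, 2, 3]
total = sum(absolute)   # 40 observations

cumulative = 0
for interval, count in zip(intervals, absolute):
    cumulative += count
    print(f"{interval:>13}  rel: {count / total:6.1%}  cum abs: {cumulative:3d}  cum rel: {cumulative / total:6.1%}")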

Histograms and Frequency Polygons


A histogram is a frequency distribution presented as a bar chart, with number of observations on the Y axis and
intervals on the X.

The frequency distribution above is presented as a histogram in figure 2.2 below:

Figure 2.2: Histogram

A frequency polygon presents the same data as a line chart rather than a bar chart. Here is the data from the frequency distribution presented with a frequency polygon:

Figure 2.3: Frequency Polygon

Look Out!

You may be asked to describe the data presented for a histogram


or frequency polygon. Most likely this would involve evaluating
risk by indicating that there are two examples of the most
negative outcomes (i.e. quarters below –10%, category 1). Also
you may be asked how normally distributed the graph appears.
Normal distributions are detailed later in this study guide.

Central Tendency
The term "measures of central tendency" refers to the various methods used to describe where large groups of
data are centered in a population or a sample. Stated another way: if we were to pull one value or
observation from a population or sample, what would we typically expect the value to be? Various methods are
used to calculate central tendency. The most frequently used is the arithmetic mean, or the sum of observations
divided by the number of observations.

Example: Arithmetic Mean


For example, if we have 20 quarters of return data:

-1.5%   -2.5%   +5.6%   +10.7%
+0.8%   -7.7%   -10.1%  +2.2%
+12.0%  +10.9%  -2.6%   +0.2%
-1.9%   -6.2%   +17.1%  +4.8%
+9.1%   +3.0%   -0.2%   +1.8%

We find the arithmetic mean by adding the 20 observations together, then dividing by 20.

((-1.5%) + (-2.5%) + 5.6% + 10.7% + 0.8% + (-7.7%) + (-10.1%) + 2.2% + 12.0% + 10.9% + (-2.6%) + 0.2% +
(-1.9%) + (-6.2%) + 17.1% + 4.8% + 9.1% + 3.0% + (-0.2%) + 1.8%) = 45.5%

Arithmetic mean = 45.5%/20 = 2.275%


The mean is usually interpreted as the typical or expected value – the single number that represents the data most fairly.

The arithmetic mean formula is used to compute population mean (often denoted by the Greek symbol μ),
which is the arithmetic mean of the entire population. The population mean is an example of a parameter, and
by definition it must be unique. That is, a given population can have only one mean. The sample mean (denoted
by X or X-bar) is the arithmetic mean value of a sample. It is an example of a sample statistic, and will be
unique to a particular sample. In other words, five samples drawn from the same population may produce five
different sample means.

While the arithmetic mean is the most frequently used measure of central tendency, it does have shortcomings
that in some cases tend to make it misleading when describing a population or sample. In particular, the
arithmetic mean is sensitive to extreme values.

Example:
For example, let's say we have the following five observations: -9000, 1.4, 1.6, 2.4 and 3.7. The arithmetic
mean is –1798.2 [(-9000 + 1.4 + 1.6 + 2.4 + 3.7)/5], yet –1798.2 has little meaning in describing our data set.

The outlier (-9000) draws down the overall mean. Statisticians use a variety of methods to compensate for
outliers, such as eliminating the highest and lowest values before calculating the mean (a trimmed mean).

For example, by dropping –9000 and 3.7, the three remaining observations have a mean of 1.8, a more
meaningful description of the data. Another approach is to use either the median or mode, or both.
Weighted Average or Mean
The weighted average or weighted mean, when applied to a portfolio, takes the mean return of each asset class
and weights it by the allocation of each class.
Say a portfolio manager has the following allocation and mean annual performance returns achieved for each
class:

Asset Class                          Portfolio Weight    Mean Annual Return
U.S. Large Cap                       30%                 9.6%
U.S. Mid Cap                         15%                 11.2%
U.S. Small Cap                       10%                 7.4%
Foreign (Developed Mkts.)            15%                 8.8%
Emerging Markets                     8%                  14.1%
Fixed Income (short/intermediate)    12%                 4.1%
Fixed Income (long maturities)       7%                  6.6%
Cash/Money Market                    3%                  2.1%

The weighted mean is calculated by weighting the return on each class and summing:

Portfolio return = (0.30)*(0.096) + (0.15)*(0.112) + (0.10)*(0.074) + (0.15)*(0.088) + (0.08)*(0.141) +
(0.12)*(0.041) + (0.07)*(0.066) + (0.03)*(0.021) = 0.08765, or 8.765%
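
As a quick cross-check, here is a minimal Python sketch of the same weighted-mean computation; the weights and returns are the ones listed in the table above.

# Weighted mean (portfolio return): sum of weight * return for each asset class.
weights = [0.30, 0.15, 0.10, 0.15, 0.08, 0.12, 0.07, 0.03]
returns = [0.096, 0.112, 0.074, 0.088, 0.141, 0.041, 0.066, 0.021]

portfolio_return = sum(w * r for w, r in zip(weights, returns))
print(round(portfolio_return, 5))   # 0.08765, i.e. 8.765%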

Median
Median is defined as the middle value in a series that is sorted in either ascending or descending order. In the
example above with five observations, the median, or middle value, is 1.6 (i.e. two values below 1.6, and two
values above 1.6). In this case, the median is a much fairer indication of the data compared to the mean of –
1798.2.

Mode
Mode is defined as the particular value that is most frequently observed. In some applications, the mode is the
most meaningful description. Take a case with a portfolio of ten mutual funds and their respective ratings: 5, 4,
4, 4, 4, 4, 4, 3, 2 and 1. The arithmetic mean rating is 3.5 stars. However in this example, the modal rating of
four describes the majority of observations and might be seen as a fairer description.

Weighted Mean
Weighted mean is frequently seen in portfolio problems in which various asset classes are weighted within the
portfolio – for example, if stocks comprise 60% of a portfolio, then 0.6 is the weight. A weighted mean is
computed by multiplying each value (here, each asset class's return) by its weight, and then summing the products.

Take an example where stocks are weighted 60%, bonds 30% and cash 10%. Assume that the stock portion
returned 10%, bonds returned 6% and cash returned 2%. The portfolio's weighted mean return is:

Stocks (wtd) + Bonds (wtd) + Cash (wtd) = (0.6)*(0.1) + (0.3)*(0.06) + (0.1)*(0.02) = (0.06) + (0.018) +
(0.002) = 0.08, or 8%

Geometric Mean
We initially introduced the geometric mean earlier in the computations for time-weighted performance. It is
usually applied to data in percentages: rates of return over time, or growth rates. With a series of n observations
of statistic X, the geometric mean (G) is:

Formula 2.17
G = (X1*X2*X3*X4 … *Xn)^(1/n)

So if we have a four-year period in which a company's sales grew 4%, 5%, -3% and 10%, here is the calculation
of the geometric mean:

G = ((1.04)*(1.05)*(0.97)*(1.10))^(1/4) – 1 = 3.9%

It's important to gain experience with using geometric mean on percentages, which involves linking the data
together: (1) add 1 to each percentage, (2) multiply all terms together, (3) carry the product to the 1/n power and
(4) subtract 1 from the result.

Harmonic Mean
The harmonic mean is computed by the following steps:

1. Taking the reciprocal of each observation, or 1/X,

2. Adding these terms together,

3. Averaging the sum by dividing by n, or the total number of observations,

4. Taking the reciprocal of this result.

The harmonic mean is most associated with questions about dollar cost averaging, but its use is limited.
Arithmetic mean, weighted mean and geometric mean are the most frequently used measures and should be the
main emphasis of study.
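
The linking steps for the geometric mean and the reciprocal steps for the harmonic mean translate directly into code. Below is a minimal Python sketch; the growth rates are from the geometric-mean example above, while the share prices in the harmonic-mean portion are hypothetical, included only to illustrate the dollar cost averaging use.

# Geometric mean of growth rates: add 1, multiply, take the 1/n power, subtract 1.
growth = [0.04, 0.05, -0.03, 0.10]
product = 1.0
for g in growth:
    product *= (1 + g)
geometric_mean = product ** (1 / len(growth)) - 1
print(round(geometric_mean, 4))          # 0.0389, i.e. about 3.9%

# Harmonic mean: reciprocal of the average of reciprocals.
# Hypothetical prices paid on three equal-dollar purchases (dollar cost averaging).
prices = [20.0, 25.0, 40.0]
harmonic_mean = len(prices) / sum(1 / p for p in prices)
print(round(harmonic_mean, 2))           # 26.09, the average price paid per share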

Quartiles, Quintiles, Deciles, and Percentiles


These terms are most associated with cases where the point of central tendency is not the main goal of the
research study. For example, in a distribution of five-year performance returns for money managers, we may
not be interested in the mean performer (i.e. the manager at the 50% level), but rather in those in the top 10% or
top 20% of the distribution. Recall that the median essentially divides a distribution in half.

By the same process, quartiles are the result of a distribution being divided into four parts; quintiles refer to five
parts; deciles, 10 parts; and percentiles, 100 parts. A manager in the second quintile would be better than the bottom
60% of the distribution (the bottom three quintiles) and below the top 20% (the top quintile) – that is, somewhere
between the 20% and 40% marks in percentile terms. A manager at the 21st percentile has 20 percentile groups
above and 79 below.
2.11 - Standard Deviation And Variance

Range and Mean Absolute Deviation


The range is the simplest measure of dispersion, the
extent to which the data varies from its measure of central
tendency. Dispersion or variability is a concept covered
extensively in the CFA curriculum, as it emphasizes risk,
or the chances that an investment will not achieve its
expected outcome. If any investment has two dimensions
– one describing risk, one describing reward – then we
must measure and present both dimensions to gain an idea
of the true nature of the investment. Mean return
describes the expected reward, while the measures of
dispersion describe the risk.
Range
Range is simply the highest observation minus the lowest observation. For data that is sorted, it should be easy
to locate maximum/minimum values and compute the range. The appeal of range is that it is simple to interpret
and easy to calculate; the drawback is that by using just two values, it can be misleading if there are extreme
values that turn out to be very rare, and it may not fairly represent the entire distribution (all of the outcomes).

Mean Absolute Deviation (MAD)


MAD improves upon range as an indicator of dispersion by using all of the data. It is calculated by:

1. Taking the difference between each observed value and the mean, which is the deviation

2. Using the absolute value of each deviation, adding all deviations together

3. Dividing by n, the number of observations.


Example:
To illustrate, we take an example of six mid-cap mutual funds, on which the five-year annual returns are +10.1%,
+7.7%, +5.0%, +12.3%, +12.2% and +10.9%.

Answer:
Range = Maximum – Minimum = (+12.3%) – (+5.0%) = 7.3%

Mean absolute deviation starts by finding the mean: (10.1% + 7.7% + 5.0% + 12.3% + 12.2% + 10.9%)/6 =
9.7%.

Each of the six observations deviates from the 9.7% mean; the absolute deviation ignores the +/– sign.

1st: |10.1 – 9.7| = 0.4
2nd: |7.7 – 9.7| = 2.0
3rd: |5.0 – 9.7| = 4.7
4th: |12.3 – 9.7| = 2.6
5th: |12.2 – 9.7| = 2.5
6th: |10.9 – 9.7| = 1.2

Next, the absolute deviations are summed and divided by 6: (0.4 + 2.0 + 4.7 + 2.6 + 2.5 + 1.2)/6 = 13.4/6 =
2.2333, or rounded, 2.2%.
Variance
Variance (σ2) is a measure of dispersion that in practice can be easier to apply than mean absolute deviation
because it removes +/– signs by squaring the deviations.

Returning to the example of mid-cap mutual funds, we had six deviations. To compute variance, we take the
square of each deviation, add the terms together and divide by the number of observations.

Observation    Value      Deviation from +9.7%    Square of Deviation
1              +10.1%     +0.4                    0.16
2              +7.7%      –2.0                    4.00
3              +5.0%      –4.7                    22.09
4              +12.3%     +2.6                    6.76
5              +12.2%     +2.5                    6.25
6              +10.9%     +1.2                    1.44

Variance = (0.16 + 4.0 + 22.09 + 6.76 + 6.25 + 1.44)/6 = 6.7833. Variance is not in the same units as the
underlying data. In this case, it's expressed as 6.7833% squared – difficult to interpret unless you are a
mathematical expert (percent squared?).

Standard Deviation
Standard deviation (σ) is the square root of the variance, or (6.7833)^(1/2) = 2.60%. Standard deviation is expressed
in the same units as the data, which makes it easier to interpret. It is the most frequently used measure of
dispersion.

Our calculations above were done for a population of six mutual funds. In practice, an entire population is either
impossible or impractical to observe, and by using sampling techniques, we estimate the population variance
and standard deviation. The sample variance formula is very similar to the population variance, with one
exception: instead of dividing by n observations (where n = population size), we divide by (n – 1) degrees of
freedom, where n = sample size. So in our mutual fund example, if the problem were described as a sample of a
larger database of mid-cap funds, we would compute variance using n – 1 degrees of freedom.

Sample variance (s2) = (0.16 + 4.0 + 22.09 + 6.76 + 6.25 + 1.44)/(6 – 1) = 8.14

Sample Standard Deviation (s)


Sample standard deviation is the square root of sample variance:

(8.14)^(1/2) = 2.85%.

In fact, standard deviation is so widely used because, unlike variance, it is expressed in the same units as the
original data, so it is easy to interpret, and can be used on distribution graphs (e.g. the normal distribution).
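
To tie the dispersion measures together, here is a minimal Python sketch using the six mid-cap fund returns from the example above; it reproduces the range, the mean absolute deviation, and both the population and sample versions of variance and standard deviation.

# Dispersion measures for the six mid-cap fund returns (in %).
returns = [10.1, 7.7, 5.0, 12.3, 12.2, 10.9]
n = len(returns)
mean = sum(returns) / n                                      # 9.7

range_ = max(returns) - min(returns)                         # 12.3 - 5.0 = 7.3
mad = sum(abs(r - mean) for r in returns) / n                # about 2.23

pop_var = sum((r - mean) ** 2 for r in returns) / n          # about 6.78
pop_std = pop_var ** 0.5                                     # about 2.60

samp_var = sum((r - mean) ** 2 for r in returns) / (n - 1)   # about 8.14
samp_std = samp_var ** 0.5                                   # about 2.85

print(round(range_, 1), round(mad, 2), round(pop_std, 2), round(samp_std, 2))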

Semivariance and Target Semivariance


Semivariance is a risk measure that focuses on downside risk, and is defined as the average squared deviation
below the mean. Computing a semivariance starts by using only those observations below the mean, that is, any
observations at or above the mean are ignored. From there, the process is similar to computing variance. If a
return distribution is symmetric, semivariance is exactly half of the variance. If the distribution is negatively
skewed, semivariance can be higher. The idea behind semivariance is to focus on negative outcomes.

Target semivariance is a variation of this concept, considering only those squared deviations below a certain
target. For example, if a mutual fund has a mean quarterly return of +3.6%, we may wish to focus only on
quarters where the outcome is –5% or lower. Target semivariance eliminates all quarters above –5%. From
there, the process of computing target semivariance follows the same procedure as other variance measures.

Chebyshev's Inequality
Chebyshev's inequality states that the proportion of observations within k standard deviations of an arithmetic
mean is at least 1 – 1/k2, for all k > 1.

# of Standard Deviations from Mean (k)    Chebyshev's Inequality                  % of Observations
2                                         1 – 1/(2)^2, or 1 – 1/4, or 3/4         75 (0.75)
3                                         1 – 1/(3)^2, or 1 – 1/9, or 8/9         89 (0.8889)
4                                         1 – 1/(4)^2, or 1 – 1/16, or 15/16      94 (0.9375)
Given that at least 75% of observations fall within two standard deviations, if a distribution has an annual mean
return of 10% and a standard deviation of 5%, we can state that in at least 75% of the years the return will be
anywhere from 0% to 20%; in the remaining years (at most 25%), it will be either below 0% or above 20%.
Similarly, since at least 89% of observations fall within three standard deviations, in at least 89% of the years the
return will be within a range of –5% to +25%, and at most 11% of the time it won't.

Later we will learn that for so-called normal distributions, we expect about 95% of the observations to fall
within two standard deviations. Chebyshev's inequality is more general and doesn't assume a normal
distribution, that is, it applies to any shaped distribution.
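
The bounds in the table above are quick to verify; a minimal Python sketch:

# Chebyshev's inequality: at least 1 - 1/k^2 of observations lie within k standard deviations.
for k in (2, 3, 4):
    print(k, round(1 - 1 / k ** 2, 4))   # 2 -> 0.75, 3 -> 0.8889, 4 -> 0.9375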

Coefficient of Variation
The coefficient of variation (CV) helps the analyst interpret relative dispersion. In other words, a calculated
standard deviation value is just a number. Does this number indicate high or low dispersion? The coefficient of
variation helps describe standard deviation in terms of its proportion to its mean by this formula:

Formula 2.18

CV = s / X̄

Where: s = sample standard deviation, X̄ = sample mean

Sharpe Ratio
The Sharpe ratio is a measure of the risk-reward tradeoff of an investment security or portfolio. It starts by
defining excess return, or the percentage rate of return of a security above the risk-free rate. In this view, the
risk-free rate is a minimum rate that any security should earn. Higher rates are available provided one assumes
higher risk.

The Sharpe ratio is calculated as the ratio of excess return to the standard deviation of return.

Formula 2.19

Sharpe ratio = [(mean return) – (risk-free return)] / standard deviation of return

Example: Sharpe Ratio


If an emerging-markets fund has a historic mean return of 18.2% and a standard deviation of 12.1%, and the
return on three-month T-bills (our proxy for a risk-free rate) was 2.3%, the Sharpe ratio = (18.2 – 2.3)/12.1 =
1.31. In other words, for every 1% of additional risk (standard deviation) we accept by investing in this
emerging-markets fund, we are rewarded with an excess return of 1.31%. Part of the reason the Sharpe ratio has
become popular is that it is an easy-to-understand and appealing concept for both practitioners and investors.
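
Both ratios are simple enough to check in a few lines. Here is a minimal Python sketch using the emerging-markets fund numbers above; the coefficient-of-variation line reuses the same mean and standard deviation purely for illustration.

# Coefficient of variation and Sharpe ratio.
mean_return = 18.2      # %
std_dev = 12.1          # %
risk_free = 2.3         # % (three-month T-bill proxy)

cv = std_dev / mean_return                       # dispersion per unit of mean return
sharpe = (mean_return - risk_free) / std_dev     # excess return per unit of risk

print(round(cv, 2), round(sharpe, 2))            # 0.66, 1.31
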
2.12 - Skew And Kurtosis
Skew
Skew, or skewness, can be mathematically defined as the average cubed deviation from the mean divided by the
standard deviation cubed. If the result of the computation is greater than zero, the distribution is positively skewed;
if it is less than zero, the distribution is negatively skewed; and a result equal to zero means the distribution is
symmetric. For interpretation and analysis, focus on downside risk. Negatively skewed distributions have what
statisticians call a long left tail (see Figure 2.4 below), which for investors can mean a greater chance of extremely
negative outcomes. Positive skew would mean frequent small negative outcomes, with extremely bad scenarios
not as likely.

A nonsymmetrical or skewed distribution occurs when one side of the distribution does not mirror the other.
Applied to investment returns, nonsymmetrical distributions are generally described as being either positively
skewed (meaning frequent small losses and a few extreme gains) or negatively skewed (meaning frequent small
gains and a few extreme losses).

Figure 2.4: Positive Skew and Negative Skew

For positively skewed distributions, the mode (point at the top of the curve) is less than the median (the point
where 50% are above/50% below), which is less than the arithmetic mean (sum of observations/number of
observations). The opposite rules apply to negatively skewed distribution: mode is greater than median, which
is greater than arithmetic mean.

Positive: Mean > Median > Mode Negative: Mean < Median < Mode

Notice that, listed alphabetically, the order is mean → median → mode. For positive skew, they are separated by
greater-than signs; for negative skew, by less-than signs.

Kurtosis
Kurtosis refers to the degree of peak in a distribution. More peak than normal (leptokurtic) means that a
distribution also has fatter tails and that there is a greater chance of extreme outcomes compared to a normal
distribution.

The kurtosis formula measures the degree of peakedness. Kurtosis equals three for a normal distribution; excess
kurtosis expresses kurtosis above or below this benchmark (excess kurtosis = kurtosis – 3).

In figure 2.5 below, the solid line is the normal distribution; the dashed line is a leptokurtic distribution.
Figure 2.5: Kurtosis

Sample Skew and Kurtosis


For a calculated skew number (average cubed deviations divided by the cubed standard deviation), look at the
sign to evaluate whether a return is positively skewed (skew > 0), negatively skewed (skew < 0) or symmetric
(skew = 0). A kurtosis number (average deviations to the fourth power divided by the standard deviation to the
fourth power) is evaluated in relation to the normal distribution, on which kurtosis = 3. Since excess kurtosis =
kurtosis – 3, any positive number for excess kurtosis would mean the distribution is leptokurtic (meaning fatter
tails and greater risk of extreme outcomes).
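
Using the plain definitions given here (average cubed deviation over the cubed standard deviation, and average fourth-power deviation over the standard deviation to the fourth power), a minimal Python sketch is shown below. These are the simple population versions – the curriculum also presents sample versions with small-sample adjustment factors – and the return series is hypothetical.

# Skewness and excess kurtosis from their basic (population) definitions.
returns = [1.2, -0.5, 3.4, 0.8, -2.1, 4.0, -6.5, 2.2]   # hypothetical return data
n = len(returns)
mean = sum(returns) / n
std = (sum((r - mean) ** 2 for r in returns) / n) ** 0.5

skew = sum((r - mean) ** 3 for r in returns) / n / std ** 3
kurtosis = sum((r - mean) ** 4 for r in returns) / n / std ** 4
excess_kurtosis = kurtosis - 3

print(round(skew, 3), round(excess_kurtosis, 3))
# skew < 0 here: the single large negative return (-6.5) creates a longer left tail
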
2.13 - Basic Probability Concepts
To help make logical and consistent investment decisions, and help manage expectations in an environment of
risk, an analyst uses the concepts and tools found in probability theory. A probability refers to the chance that
something will happen, expressed on a scale from 0 (it is impossible) to 1 (it is certain to occur), with values in
between running from less likely to more likely. Probability concepts help define risk by quantifying the prospects for unintended
and negative outcomes; thus probability concepts are a major focus of the CFA curriculum.

I. Basics

Random Variable
A random variable refers to any quantity with uncertain
expected future values. For example, time is not a random
variable since we know that tomorrow will have 24 hours,
the month of January will have 31 days and so on.
However, the expected rate of return on a mutual fund
and the expected standard deviation of those returns are
random variables. We attempt to forecast these random
variables based on past history and on our forecast for the
economy and interest rates, but we cannot say for certain
what the variables will be in the future – all we have are
forecasts or expectations.

Outcome
Outcome refers to any possible value that a random
variable can take. For expected rate of return, the range of
outcomes naturally depends on the particular investment or proposition. Lottery players have a near-certain
probability of losing all of their investment (–100% return), with a very small chance of becoming a
multimillionaire (+1,000,000% return – or higher!). Thus for a lottery ticket, there are usually just two extreme
outcomes. Mutual funds that invest primarily in blue chip stocks will involve a much narrower series of
outcomes and a distribution of possibilities around a specific mean expectation. When a particular outcome or a
series of outcomes are defined, it is referred to as an event. If our goal for the blue chip mutual fund is to
produce a minimum 8% return every year on average, and we want to assess the chances that our goal will not
be met, our event is defined as average annual returns below 8%. We use probability concepts to ask what the
chances are that our event will take place.

Event
If a list of events is mutually exclusive, it means that only one of them can possibly take place. Exhaustive
events refer to the need to incorporate all potential outcomes in the defined events. For return expectations, if
we define our two events as annual returns equal to or greater than 8% and annual returns equal to or less than
8%, these two events would not meet the definition of mutually exclusive since a return of exactly 8% falls into
both categories. If our defined two events were annual returns less than 8% and annual returns greater than 8%,
we've covered all outcomes except for the possibility of an 8% return; thus our events are not exhaustive.

The Defining Properties of Probability


Probability has two defining properties:

1. The probability of any event is a number between 0 and 1, or 0 ≤ P(E) ≤ 1. A P followed by parentheses
is the probability of the event in the parentheses (event E) occurring. Probabilities fall on a scale between 0, or
0% (impossible), and 1, or 100% (certain). There is no such thing as a negative probability (less than
impossible?) or a probability greater than 1 (more certain than certain?).

2. The sum of the probabilities of all events equals 1, provided the events are both mutually exclusive and
exhaustive. If events are not mutually exclusive, the probabilities could add up to a number greater than
1, and if they are not exhaustive, the sum of probabilities would be less than 1. Thus, there is a need to
qualify this second property to ensure the events are properly defined (mutually exclusive, exhaustive).
On an exam question, if the probabilities in a research study add up to a number other than 1, you might
question whether this principle has been met.

The terms below refer to the particular approach an analyst has used to define the events and estimate their
probabilities (i.e. the likelihood of each event occurring). How exactly does the analyst arrive at these
probabilities? What exactly are the numbers based upon? The approach may be empirical, subjective or a priori.

Empirical Probabilities
Empirical probabilities are objectively drawn from historical data. If we assembled a return distribution based
on the past 20 years of data, and then used that same distribution to make forecasts, we have used an empirical
approach. Of course, we know that past performance does not guarantee future results, so a purely empirical
approach has its drawbacks.

Subjective Probabilities
Relationships must be stable for empirical probabilities to be accurate, and for investments and the economy,
relationships change. Thus, subjective probabilities are often used instead; these draw upon experience and judgment
to make forecasts or modify the probabilities indicated by a purely empirical approach. Of course, subjective
probabilities are unique to the person making them and depend on his or her talents – the investment world is
filled with people making incorrect subjective judgments.

A Priori Probabilities
A priori probabilities represent probabilities that are objective and based on deduction and reasoning about a
particular case. For example, if we forecast that a company is 70% likely to win a bid on a contract (based on
either an empirical or a subjective approach), and we know this firm has just one business competitor, then we can
also make an a priori forecast that there is a 30% probability that the bid will go to the competitor.

Exam Tips and Tricks

Know how to distinguish between the empirical, subjective and a priori probabilities listed above.

Stating the Probability of an Event as Odds "For" or "Against"

Given a probability P(E):

Odds "FOR" E = P(E)/[1 – P(E)]. A probability of 20% would be odds of "1 to 4" for the event.

Odds "AGAINST" E = [1 – P(E)]/P(E). A probability of 20% would be odds of "4 to 1" against the event.


Example:
Take an example of two financial-services companies, whose publicly traded share prices reflect a certain
probability that interest rates will fall. Both firms will receive an equal benefit from lower interest rates.
However, an analyst's research reveals that Firm A's shares, at current prices, reflect a 75% likelihood that rates
will fall, and Firm B's shares only suggest a 40% chance of lower rates. The analyst has discovered (in
probability terms) mutually inconsistent probabilities. In other words, rates can't be (simultaneously) both 75%
and 40% likely to fall. If the true probability of lower rates is 75% (i.e. the market has fairly priced this
probability into Firm A's shares), then as investors we could profit by buying Firm B's undervalued shares. If
the true probability is 40%, we could profit by short selling Firm A's overpriced shares. By taking both actions
(in a classic pairs arbitrage trade), we would theoretically profit no matter the actual probability since one stock
or the other eventually has to move. Many investment decisions are made based on an analyst's perception of
mutually inconsistent probabilities.
Unconditional Probability
Unconditional probability is the straightforward answer to this question: what is the probability of this one event
occurring? In probability notation, the unconditional probability of event A is P(A), which asks, what is the
probability of event A? If we believe that a stock is 70% likely to return 15% in the next year, then P(A) = 0.7,
which is that event's unconditional probability.

Conditional Probability
Conditional probability answers this question: what is the probability of this one event occurring, given that
another event has already taken place? A conditional probability has the notation P(A | B), which represents the
probability of event A, given B. If we believe that a stock is 70% likely to return 15% in the next year, as long
as GDP growth is at least 3%, then we have made our prediction conditional on a second event (GDP growth).
In other words, event A is the stock will rise 15% in the next year; event B is GDP growth is at least 3%; and
our conditional probability is P(A | B) = 0.7.
2.14 - Joint Probability
Joint probability is defined as the probability of both A and B taking place, and is denoted by P(AB).

Joint probability is not the same as conditional probability, though the two concepts are often confused.
Conditional probability assumes that one event has taken place or will take place, and then asks for the
probability of the other (A, given B). Joint probability does not
have such conditions; it simply asks for the chances of both
happening (A and B). In a problem, to help distinguish between
the two, look for qualifiers that one event is conditional on the
other (conditional) or whether they will happen concurrently (joint).

Probability definitions can find their way into CFA exam questions. Naturally, there may also be questions that
test the ability to calculate joint probabilities. Such computations require use of the multiplication rule, which
states that the joint probability of A and B is the product of the conditional probability of A given B, times the
probability of B. In probability notation:

Formula 2.20

Multiplication rule: P(AB) = P(A | B) * P(B)

Given a conditional probability P(A | B) = 40%, and a probability of B = 60%, the joint probability P(AB) =
0.4*0.6 = 0.24, or 24%, found by applying the multiplication rule.

The Addition Rule


The addition rule is used in situations where the probability of at least one of two given events - A and B - must
be found. This probability is equal to the probability of A, plus the probability of B, minus the joint probability
of A and B.

Formula 2.21

Addition Rule: P(A or B) = P(A) + P (B) – P(AB)

For example, if the probability of A = 0.4, and the probability of B = 0.45, and the joint probability of both is
0.2, then the probability of either A or B = 0.4 + 0.45 – 0.2 = 0.65.

Remembering to subtract the joint probability P(AB) is often the difficult part of applying this rule. Indeed, if
the addition rule is required to solve a probability problem on the exam, you can be sure that the wrong answers
will include P(A) + P(B), and P(A)*P(B). Just remember that the addition rule is asking for either A or B, so
you don't want to double count. Thus, the probability of both A and B, P(AB), is an intersection and needs to
be subtracted to arrive at the correct probability.
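
A minimal Python sketch of the two rules, using the numbers from the examples above:

# Multiplication rule: joint probability from a conditional probability.
p_a_given_b = 0.40
p_b = 0.60
p_ab = p_a_given_b * p_b              # 0.24

# Addition rule: P(A or B) = P(A) + P(B) - P(AB).
p_a = 0.40
p_b2 = 0.45
p_ab2 = 0.20
p_a_or_b = p_a + p_b2 - p_ab2         # 0.65

print(round(p_ab, 2), round(p_a_or_b, 2))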

Dependent and Independent Events


Two events are independent when the occurrence of one has no effect on the probability that the other will
occur. Earlier we established the definition of a conditional probability, or the probability of A given B, P(A |
B). If A is completely independent of B, then this conditional probability is the same as the unconditional
probability of A. Thus the definition of independent events states that two events - A and B - are independent of
each other, if, and only if, P(A | B) = P(A). By the same logic, B would be independent of A if, and only if, P(B
| A), which is the probability of B given that A has occurred, is equal to P(B).

Two events are not independent when the conditional probability of A given B is higher or lower than the
unconditional probability of A. In this case, A is dependent on B. Likewise, if P(B | A) is greater or less than
P(B), we know that B depends on A.

Calculating the Joint Probability of Two or More Independent Events


Recall that for calculating joint probabilities, we use the multiplication rule, stated in probability notation as
P(AB) = P(A | B) * P(B). For independent events, we've now established that P(A | B) = P(A), so by
substituting P(A) into the equation for P(A | B), we see that for independent events, the multiplication rule is
simply the product of the individual probabilities.

Formula 2.22

Multiplication rule, independent events: P(AB) = P(A) * P(B)

Moreover, the rule generalizes for more than two events provided they are all independent of one another, so the
joint probability of three events P(ABC) = P(A) * P(B) * P(C), again assuming independence.

The Total Probability Rule


The total probability rule explains an unconditional probability of an event, in terms of that event's conditional
probabilities in a series of mutually exclusive, exhaustive scenarios. For the simplest example, there are two
scenarios, S and the complement of S, or S^C, and P(S) + P(S^C) = 1, given the properties of being mutually
exclusive and exhaustive. How do these two scenarios affect event A? P(A | S) and P(A | S^C) are the conditional
probabilities that event A will occur in scenario S and in scenario S^C, respectively. If we know the conditional
probabilities, and we know the probability of the two scenarios, we can use the total probability rule formula to
find the probability of event A.

Formula 2.23

Total probability rule (two scenarios): P(A) = P(A | S)*P(S) + P(A | S^C)*P(S^C)

This rule is easiest to remember if you compare the formula to the weighted-mean calculation used to compute
rate of return on a portfolio. In that exercise, each asset class had an individual rate of return, weighted by its
allocation to compute the overall return. With the total probability rule, each scenario has a conditional
probability (i.e. the likelihood of event A, given that scenario), with each conditional probability weighted by
the probability of that scenario occurring.

Example: Total Probability


So if we define conditional probabilities of P(A | S) = 0.4 and P(A | S^C) = 0.25, and the scenario probabilities P(S)
and P(S^C) are 0.8 and 0.2 respectively, the probability of event A is:

P(A) = P(A | S)*P(S) + P(A | S^C)*P(S^C) = (0.4)*(0.8) + (0.25)*(0.2) = 0.37

The total probability rule applies to three or more scenarios provided they are mutually exclusive and
exhaustive. The formula is the sum of all weighted conditional probabilities (weighted by the probability of
each scenario occurring).
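
The same two-scenario computation as a minimal Python sketch; the small helper function generalizes to any number of mutually exclusive, exhaustive scenarios.

# Total probability rule: P(A) = sum over scenarios of P(A | scenario) * P(scenario).
def total_probability(cond_probs, scenario_probs):
    return sum(pa * ps for pa, ps in zip(cond_probs, scenario_probs))

# Two scenarios, S and its complement, from the example above.
p_a = total_probability([0.4, 0.25], [0.8, 0.2])
print(round(p_a, 2))   # 0.37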

Using Probability and Conditional Expectations in Making Investment Decisions


Investment decisions involve making future predictions based upon all information that we believe is relevant to
our forecast. However, these forecasts are dynamic; they are always subject to change based on new
information being made public. In many cases this new information causes us to modify our forecasts and either
raise or lower our opinion on an investment. In other words, our expected values are conditional on changing
real-world events and thus should not be treated as unconditional. In fact, a random variable's
expected value is the weighted average of its conditional expected values, weighted by the probability of each scenario
(where scenarios are mutually exclusive and exhaustive). The total probability rule applies to determining
expected values.

Expected Value Methodology


An expected value of a random variable is calculated by assigning a probability to each possible outcome and
then taking a probability-weighted average of the outcomes.

Example: Expected Value


Assume that an analyst writes a report on a company and, based on the research, assigns the following
probabilities to next year's sales:

Scenario    Probability    Sales ($ Millions)
1           0.10           $16
2           0.30           $15
3           0.30           $14
4           0.30           $13

Answer:
The analyst's expected value for next year's sales is (0.1)*(16.0) + (0.3)*(15.0) + (0.3)*(14.0) + (0.3)*(13.0) =
$14.2 million.

The total probability rule for finding the expected value of variable X is given by E(X) = E(X | S)*P(S) + E(X |
S^C)*P(S^C) for the simplest case: two scenarios, S and S^C, that are mutually exclusive and exhaustive. If we refer
to them as Scenario 1 and Scenario 2, then E(X | S) is the expected value of X in Scenario 1, and E(X | S^C) is the
expected value of X in Scenario 2.

Tree Diagram
The total probability rule can be easier to visualize if the information is presented in a tree diagram. Take a case
where we have forecasted company sales to be anywhere in a range from $13 to $16 million, based on
conditional probabilities.

This company is dependent on the overall economy and on Wal-Mart's same-store sales growth, leading to the
conditional probability scenarios demonstrated in figure 2.7 below:

In a good economy, our expected sales would be 25% likely to be $16 million, and 75% likely to be $15
million, depending on Wal-Mart's growth number. In a bad economy, we would be equally likely to generate
$13 million if Wal-Mart sales drop more than 2% or $14 million (if the growth number falls between –2% and
+1.9%).

Expected sales (good economy) = (0.25)*(16) + (0.75)*(15) = 15.25 million.


Expected sales (bad economy) = (0.5)*(13) + (0.5)*(14) = 13.5 million.

We predict that a good economy is 40% likely, and a bad economy 60% likely, leading to our expected value
for sales: (0.4)*(15.25) + (0.6)*(13.5) = 14.2 million.
2.15 - Advanced Probability Concepts
Covariance
Covariance is a measure of the relationship between two random variables, designed to show the degree of co-
movement between them. Covariance is calculated based on the probability-weighted average of the cross-
products of each random variable's deviation from its own
expected value. A positive number indicates co-
movement (i.e. the variables tend to move in the same
direction); a value of 0 indicates no relationship, and a
negative covariance shows that the variables move in the
opposite direction.

The process for actually computing covariance values is complicated and time-consuming, and it is not likely to be
covered in depth on a CFA exam question. Although the detailed formulas and examples of computations are
presented in the reference text, for most people, spending too much valuable study time absorbing such detail will
leave you bogged down in material that is unlikely to be tested.

Correlation
Correlation is a concept related to covariance, as it also gives an indication of the degree to which two random
variables are related, and (like covariance) the sign shows the direction of this relationship (positive (+) means
that the variables move together; negative (-) means they are inversely related). A correlation of 0 means that
there is no linear relationship one way or the other, and the two variables are said to be uncorrelated.

A correlation number is much easier to interpret than covariance because a correlation value will always be
between –1 and +1.

• –1 indicates a perfectly inverse relationship (a unit change in one means that the other will have a unit
change in the opposite direction)
• +1 means a perfectly positive linear relationship (unit changes in one always bring the same unit
changes in the other).

Moreover, there is a uniform scale from –1 to +1, so that as correlation values move closer to +1 or –1 (i.e. as the
absolute value approaches 1), the two variables are more strongly linearly related. By contrast, a covariance value
between two variables could be very large and indicate little actual relationship, or look very small when there is
actually a strong linear correlation.

Correlation is defined as the ratio of the covariance between two random variables and the product of their two
standard deviations, as presented in the following formula:

Formula 2.24

Correlation (A, B) = Covariance (A, B) / [Standard Deviation (A) * Standard Deviation (B)]

As a result: Covariance (A, B) = Correlation (A, B) * Standard Deviation (A) * Standard Deviation (B)

With these formulas, you are likely to be asked to calculate either correlation or covariance when the other
terms are provided. Such an exercise simply requires remembering the relationship and substituting the terms
given. For example, if a covariance between two variables of 30 is given, and their standard deviations are 5 and
15, the correlation would be 30/((5)*(15)) = 0.40. If you are given a correlation of 0.40 and standard deviations of
5 and 15, the covariance would be (0.4)*(5)*(15), or 30.

Expected Return, Variance and Standard Deviation of a Portfolio


Expected return is calculated as the weighted average of the expected returns of the assets in the portfolio,
weighted by the portfolio weight of each asset. For a simple portfolio of two mutual funds, one investing in
stocks and the other in bonds, if we expect the stock fund to return 10% and the bond fund to return 6%, and our
allocation is 50% to each asset class, we have:

Expected return (portfolio) = (0.1)*(0.5) + (0.06)*(0.5) = 0.08, or 8%

Variance (σ2) is computed by finding the probability-weighted average of squared deviations from the expected
value.

Example: Variance
In our previous example on making a sales forecast, we found that the expected value was $14.2 million.
Calculating variance starts by computing the deviations from $14.2 million, then squaring:

Scenario    Probability    Deviation from Expected Value    Squared Deviation
1           0.10           (16.0 – 14.2) = 1.8              3.24
2           0.30           (15.0 – 14.2) = 0.8              0.64
3           0.30           (14.0 – 14.2) = –0.2             0.04
4           0.30           (13.0 – 14.2) = –1.2             1.44

Answer:
Variance weights each squared deviation by its probability: (0.1)*(3.24) + (0.3)*(0.64) + (0.3)*(0.04) +
(0.3)*(1.44) = 0.96
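
Both the expected value and the variance of the sales forecast follow the same probability-weighting pattern. A minimal Python sketch using the four scenarios above:

# Expected value and variance of a discrete random variable (sales forecast, $ millions).
probs = [0.10, 0.30, 0.30, 0.30]
sales = [16.0, 15.0, 14.0, 13.0]

expected = sum(p * s for p, s in zip(probs, sales))                     # 14.2
variance = sum(p * (s - expected) ** 2 for p, s in zip(probs, sales))   # 0.96
std_dev = variance ** 0.5                                               # about 0.98

print(round(expected, 2), round(variance, 2), round(std_dev, 2))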

The variance of return is a function of the variance of the component assets as well as the covariance between
each of them. In modern portfolio theory, a low or negative correlation between asset classes will reduce overall
portfolio variance. The formula for portfolio variance in the simple case of a two–asset portfolio is given by:

Formula 2.25

Portfolio Variance = wA^2*σ^2(RA) + wB^2*σ^2(RB) + 2*(wA)*(wB)*Cov(RA, RB)

Where: wA and wB are the portfolio weights, σ^2(RA) and σ^2(RB) are the variances, and Cov(RA, RB) is the covariance

Example: Portfolio Variance


Data on both variance and covariance may be displayed in a covariance matrix. Assume the following
covariance matrix for our two–asset case:

           Stock    Bond
Stock      350      80
Bond       80       150

From this matrix, we know that the variance on stocks is 350 (the covariance of any asset to itself equals its
variance), the variance on bonds is 150 and the covariance between stocks and bonds is 80. Given our portfolio
weights of 0.5 for both stocks and bonds, we have all the terms needed to solve for portfolio variance.

Answer:
Portfolio variance = wA^2*σ^2(RA) + wB^2*σ^2(RB) + 2*(wA)*(wB)*Cov(RA, RB) = (0.5)^2*(350) + (0.5)^2*(150) +
2*(0.5)*(0.5)*(80) = 87.5 + 37.5 + 40 = 165.

Standard deviation (σ), as was defined earlier in the statistics discussion, is the positive square root of the
variance. In the sales forecast example above, σ = (0.96)^(1/2), or approximately $0.98 million.

For the two-asset portfolio, standard deviation is found by taking the square root of the portfolio variance:

(165)^(1/2) = 12.85%.

A two–asset portfolio was used to illustrate this principle; most portfolios contain far more than two assets, and
the formula for variance becomes more complicated for multi-asset portfolios (all terms in a covariance matrix
need to be added to the calculation).
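
For the two-asset case, the formula drops straight into a few lines of Python. A minimal sketch with the weights, variances and covariance from the example above:

# Two-asset portfolio variance: wA^2*varA + wB^2*varB + 2*wA*wB*cov.
w_stock, w_bond = 0.5, 0.5
var_stock, var_bond = 350.0, 150.0
cov_stock_bond = 80.0

port_var = ((w_stock ** 2) * var_stock + (w_bond ** 2) * var_bond
            + 2 * w_stock * w_bond * cov_stock_bond)
port_std = port_var ** 0.5

print(port_var, round(port_std, 2))   # 165.0, 12.85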

Joint Probability Functions and Covariance


Let's now apply the joint probability function to calculating covariance:

Example: Covariance from a Joint Probability Function


To illustrate this calculation, let's take an example where we have estimated the year-over-year sales growth for
GM and Ford in three industry environments: strong (30% probability), average (40%) and weak (30%). Our
estimates are indicated in the following joint-probability function:

                  F Sales +6%     F Sales +3%     F Sales –1%
GM Sales +10%     Strong (0.3)    -               -
GM Sales +4%      -               Avg. (0.4)      -
GM Sales –4%      -               -               Weak (0.3)

Answer:
To calculate covariance, we start by finding the probability-weighted sales estimate (expected value) for each
company:

GM = (0.3)*(10) + (0.4)*(4) + (0.3)*(–4) = 3 + 1.6 – 1.2 = 3.4%

Ford = (0.3)*(6) + (0.4)*(3) + (0.3)*(–1) = 1.8 + 1.2 – 0.3 = 2.7%

In the following table, we compute covariance by taking the deviations from each expected value in each
market environment, multiplying the deviations together (the cross-products) and then weighting the
cross-products by the probability of each scenario:

Environment    GM deviation        F deviation        Cross-product        Prob.    Prob-wtd.
Strong         10 – 3.4 = 6.6      6 – 2.7 = 3.3      6.6*3.3 = 21.78      0.3      6.534
Average        4 – 3.4 = 0.6       3 – 2.7 = 0.3      0.6*0.3 = 0.18       0.4      0.072
Weak           –4 – 3.4 = –7.4     –1 – 2.7 = –3.7    –7.4*–3.7 = 27.38    0.3      8.214

The last column (prob-wtd.) was found by multiplying the cross product (column 4) by the probability of that
scenario (column 5).

The covariance is found by adding the values in the last column: 6.534+0.072+8.214 = 14.82.
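
The same calculation as a minimal Python sketch, using the GM and Ford estimates above:

# Covariance from a joint probability function: probability-weighted cross-products
# of each variable's deviation from its own expected value.
probs = [0.3, 0.4, 0.3]           # strong, average, weak
gm = [10.0, 4.0, -4.0]            # GM sales growth estimates (%)
ford = [6.0, 3.0, -1.0]           # Ford sales growth estimates (%)

e_gm = sum(p * x for p, x in zip(probs, gm))       # 3.4
e_ford = sum(p * y for p, y in zip(probs, ford))   # 2.7

cov = sum(p * (x - e_gm) * (y - e_ford) for p, x, y in zip(probs, gm, ford))
print(round(cov, 2))   # 14.82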

Bayes' Formula
We all know intuitively of the principle that we learn from experience. For an analyst, learning from experience
takes the form of adjusting expectations (and probability estimates) based on new information. Bayes' formula
essentially takes this principle and applies it to the probability concepts we have already learned, by showing
how to calculate an updated probability – that is, the probability of an event given new information:

Bayes' Formula:

Updated probability = [Conditional probability of the new info. given the event * Prior probability of the event] /
Unconditional probability of the new info.

Formula 2.26

P(E | I) = [P(I | E) * P(E)] / P(I)

Where: E = event, I = new information
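
A minimal Python sketch of the updating step. The numbers are hypothetical: suppose the prior probability of the event is 40%, the new information is 70% likely to be observed if the event is true, and the unconditional probability of observing the new information is 50%.

# Bayes' formula: P(E | I) = P(I | E) * P(E) / P(I).
def bayes(p_i_given_e, p_e, p_i):
    return p_i_given_e * p_e / p_i

# Hypothetical inputs: P(I | E) = 0.70, prior P(E) = 0.40, unconditional P(I) = 0.50.
updated = bayes(p_i_given_e=0.70, p_e=0.40, p_i=0.50)
print(round(updated, 2))   # 0.56 -> the new information raises the probability of the event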

The Multiplication Rule of Counting


The multiplication rule of counting states that if the specified number of tasks is given by k and n1, n2, n3, … nk
are variables used for the number of ways each of these tasks can be done, then the total number of ways to
perform k tasks is found by multiplying all of the n1, n2, n3, … nk variables together.

Take a process with four steps:

Step    Number of ways this step can be done
1       6
2       3
3       1
4       5

This process can be done in a total of (6)*(3)*(1)*(5) = 90 ways.

Factorial Notation
n! = n*(n – 1)*(n – 2) … *1. In other words, 5!, or 5 factorial is equal to (5)*(4)*(3)*(2)*(1) = 120. In counting
problems, it is used when there is a given group of size n, and the exercise is to assign the group to n slots; then
the number of ways these assignments could be made is given by n!. If we were managing five employees and
had five job functions, the number of possible assignments is 5! = 120.

Combination Notation
Combination notation refers to the number of ways that we can choose r objects from a total of n objects, when
the order in which the r objects is listed does not matter.

In shorthand notation:
Formula 2.27

nCr = n!/[(n – r)!*r!]

Thus if we had our five employees and we needed to choose three of them to team up on a new project, where
they will be equal members (i.e. the order in which we choose them isn't important), the formula tells us that there
are 5!/[(5 – 3)!*3!] = 120/((2)*(6)) = 120/12, or 10 possible combinations.

Permutation notation
Permutation notation takes the same case (choosing r objects from a group of n) but assumes that the order in
which the r objects are listed matters. It is given by this notation:

Formula 2.28

nPr = n!/(n – r)!

Returning to our example, if we not only wanted to choose three employees for our project, but wanted to
establish a hierarchy (leader, second-in-command, subordinate), by using the permutation formula, we would
have 5!/(5 – 3)! = 120/2 = 60 possible ways.

Now, let's consider how to calculate problems asking the number of ways to choose r objects from a total of
n objects when the order in which the r objects are listed matters, and when the order does not matter.

• The combination formula is used if the order of r does not matter. For choosing three objects from a
total of five objects, we found 5!/[(5 – 3)!*3!], or 10 ways.
• The permutation formula is used if the order of r does matter. For choosing three objects from a total of
five objects, we found 5!/(5 – 3)!, or 60 ways.
Method         When appropriate?
Factorial      Assigning a group of size n to n slots
Combination    Choosing r objects (in any order) from a group of n
Permutation    Choosing r objects (in a particular order) from a group of n
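
Python's standard library covers all three counting tools directly. A minimal sketch with the five-employee examples from above:

import math

# Factorial: assigning 5 employees to 5 distinct job functions.
print(math.factorial(5))    # 120

# Combination: choosing 3 of 5 employees when order does not matter.
print(math.comb(5, 3))      # 10

# Permutation: choosing 3 of 5 employees when order (the hierarchy) matters.
print(math.perm(5, 3))      # 60
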
2.16 - Common Probability Distributions
The topics in this section provide a number of the quantitative building blocks useful in analyzing and
predicting random variables such as future sales and earnings, growth rates, market index returns and returns on
individual asset classes and specific securities. All of these variables have uncertain outcomes; thus there is a risk
that downside surprises will have a material impact. By understanding the mechanics of
probability distributions, such risks can be understood and analyzed, and measures taken to hedge or reduce
their impact.

Probability Distribution
A probability distribution gathers together all possible outcomes of a random variable (i.e. any quantity for
which more than one value is possible), and summarizes these outcomes by indicating the probability of each of
them. While a probability distribution is often associated with the bell-shaped curve, recognize that such a curve
is only indicative of one specific type of probability, the so-called normal probability distribution. The CFA
curriculum does focus on normal distributions since they frequently apply to financial and investment variables,
and are used in hypothesis testing. However, in real life, a probability distribution can take any shape, size and
form.

Example: Probability Distribution


For example, say we wanted to choose a day at random in the
future on which to schedule an event, and we wanted to know the
probability that this day would fall on a Sunday (which we need to avoid). With seven
days in a week, the probability that a random day would happen to be a Sunday is one-seventh,
or about 14.29%. Of course, the same 14.29% probability applies to any of the other six days.

In this case, we would have a uniform probability distribution: the chances that our random day would fall on
any particular day are the same, and the graph of our probability distribution would be a straight line.

Figure 2.8: Probability Distribution

Probability distributions can be simple to understand as in this example, or they can be very complex and
require sophisticated techniques (e.g., option pricing models, Monte Carlo simulations) to help describe all
possible outcomes.
Discrete Random Variables
Discrete random variables can take on a finite or countable number of possible outcomes. The previous example
asking for a day of the week is an example of a discrete variable, since it can only take seven possible values.
Monetary variables expressed in dollars and cents are always discrete, since money is rounded to the nearest
$0.01. In other words, we may have a formula that suggests a stock worth $15.75 today will be $17.1675 after it
grows 9%, but you can’t give or receive three-quarters of a penny, so our formula would round the outcome of
9% growth to an amount of $17.17.

Continuous Random Variables


A continuous random variable has infinite possible outcomes. A rate of return (e.g. growth rate) is continuous:

• a stock can grow by 9% next year or by 10%, and in between this range it could grow by 9.3%, 9.4%,
9.5%
• in between 9.3% and 9.4% the rate could be 9.31%, 9.32%, 9.33%, and in between 9.32% and 9.33% it
could grow 9.32478941%
• clearly there is no end to how precise the outcomes could be broken down; thus it’s described as a
continuous variable.

Outcomes in Discrete vs. Continuous Variables


The rule of thumb is that a discrete variable can have all possibilities listed out, while a continuous variable
must be expressed in terms of its upper and lower limits, and greater-than or less-than indicators. Of course,
listing out a very large set of possible outcomes (which is typically the case for money variables) is impractical
– thus money variables will usually have outcomes expressed as if they were continuous.

Rates of return can theoretically range from –100% to positive infinity. Time is bound on the lower side by 0.
Market price of a security will also have a lower limit of $0, while its upper limit will depend on the security –
stocks have no upper limit (thus a stock price’s outcome > $0), but bond prices are more complicated, bound by
factors such as time-to-maturity and embedded call options. If a face value of a bond is $1,000, there’s an upper
limit (somewhere above $1,000) above which the price of the bond will not go, but pinpointing the upper value
of that set is imprecise.

Probability Function
A probability function gives the probabilities that a random variable will take on a given list of specific values.
For a discrete variable, if (x1, x2, x3, x4 …) are the complete set of possible outcomes, p(x) indicates the chances
that X will be equal to x. Each x in the list for a discrete variable will have a p(x). For a continuous variable, a
probability function is expressed as f(x).

The two key properties of a probability function, p(x) (or f(x) for continuous), are the following:

1. 0 ≤ p(x) ≤ 1, since a probability must always be between 0 and 1.


2. Add up all probabilities of all distinct possible outcomes of a random variable, and the sum must equal
1.

A violation of the first property should be easy to spot, since we know that
probabilities always lie between 0 and 1; in other words, p(x) could never be 1.4 or –0.2. To illustrate the
second property, say we are given a set of three possibilities for X: (1, 2, 3) and a set of three for Y: (6, 7, 8),
and given the probability functions f(x) and g(y).
x f(x) y g(y)
1 0.31 6 0.32
2 0.43 7 0.40
3 0.26 8 0.23

For all possibilities of f(x), the sum is 0.31+0.43+0.26=1, so we know it is a valid probability function. For all
possibilities of g(y), the sum is 0.32+0.40+0.23 = 0.95, which violates our second principle. Either the given
probabilities for g(y) are wrong, or there is a fourth possibility for y where g(y) = 0.05. Either way it needs to
sum to 1.
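
A minimal sketch of checking the two properties for a discrete probability function, using the f(x) and g(y) values above:

# Check that a discrete probability function is valid:
# every p(x) is between 0 and 1, and the probabilities sum to 1.
def is_valid(probs, tol=1e-9):
    in_range = all(0 <= p <= 1 for p in probs)
    sums_to_one = abs(sum(probs) - 1) < tol
    return in_range and sums_to_one

print(is_valid([0.31, 0.43, 0.26]))   # True  -> f(x) is a valid probability function
print(is_valid([0.32, 0.40, 0.23]))   # False -> g(y) sums to 0.95, not 1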

Probability Density Function


A probability density function (or pdf) describes a probability function in the case of a continuous random
variable. Also known as simply the “density”, a probability density function is denoted by “f(x)”. Since a pdf
refers to a continuous random variable, its probabilities would be expressed as ranges of variables rather than
probabilities assigned to individual values as is done for a discrete variable. For example, if a stock has a 20%
chance of a negative return, the pdf in its simplest terms could be expressed as:

x f(x)
<0 0.2
>0 0.8
2.17 - Common Probability Distribution Calculations

Cumulative Distribution Functions


A cumulative distribution function, or CDF, expresses a probability function in order from lowest to highest
value, by giving the probability that a random variable X is less than or equal to a particular value x. Expressed
in shorthand, the cumulative distribution function is P(X ≤ x). A
cumulative distribution function is constructed by summing up,
or cumulating, all values in the probability function that are less
than or equal to x. The concept is similar to the cumulative
relative frequency covered earlier in this study guide, which
cumulates values below a certain point in a frequency distribution.

Example: Cumulative Distribution Function


For example, the following probability distribution includes the cumulative function.

X = x          P(X = x)    P(X ≤ x), or cdf
< –12          0.15        0.15
–12 to –3      0.15        0.30
–3 to 4        0.25        0.55
4 to 10        0.25        0.80
> 10           0.2         1.0

From the table, we find that the probability that x is less than or equal to 4 is 0.55, the summed probabilities of
the first three P(X = x) terms, or the number found in the cdf column for the third row, where x ≤ 4. Sometimes a
question might ask for the probability of x being greater than 4, which for this problem is 1 – P(X ≤ 4) = 1 –
0.55 = 0.45. This is a question most people should get – but one that will still trip up many people, who answer
0.55 because they weren't paying attention to the "greater than".

Discrete Uniform Random Variable


A discrete uniform random variable is one that fulfills the definition of "discrete", where there are a finite and
countable number of terms, along with the definition of "uniform", where there is an equally likely probability
that the random variable X will take any of its possible values x. If there are n possible values for a discrete
uniform random variable, the probability of a specific outcome is 1/n.

Example: Discrete Uniform Random Variable


Earlier we provided an example of a discrete uniform random variable: a random day is one-seventh likely to
fall on a Sunday. To illustrate some examples on how probabilities are calculated, take the following discrete
uniform distribution with n = 5.

X = x    P(X = x)    P(X ≤ x)
2        0.2         0.2
4        0.2         0.4
6        0.2         0.6
8        0.2         0.8
10       0.2         1.0

According to the distribution above, we have the probability of x = 8 as 0.2. The probability of x = 2 is the
same, 0.2.

Suppose that the question called for P(4 ≤ X ≤ 8). The answer would be the sum of P(4) + P(6) + P(8) = 0.2 +
0.2 + 0.2 = 0.6.

Suppose the question called for P(4 < X < 8). In this case, the answer would omit P(4) and P(8) since it’s less
than, NOT less than or equal to, and the correct answer would be P(6) = 0.2. The CFA exam writers love to test
whether you are paying attention to details and will try to trick you – the probability of such tactics is pretty
much a 1.0!

Binomial Random Variable


Binomial probability distributions are used when the context calls for assessing two outcomes, such as
"success/failure", or "price moved up/price moved down". In such situations where the possible outcomes are
binary, we can develop an estimate of a binomial random variable by holding a number of repeating trials (also
known as "Bernoulli trials"). In a Bernoulli trial, p is the probability of success, (1 – p) is the probability of
failure. Suppose that a number of Bernoulli trials are held, with the number denoted by n. A binomial random
variable X is defined as the number of successes in n Bernoulli trials, given two simplifying assumptions: (1) the
probability p of success is the same for all trials and (2) the trials are independent of each other.

Thus, a binomial random variable is described by two parameters: p (the probability of success of one trial) and
n (the number of trials). A binomial probability distribution with p = 0.50 (equal chance of success or failure)
and n = 4 would appear as:

x (# of successes)    p(x)      cdf, P(X ≤ x)
0                     0.0625    0.0625
1                     0.25      0.3125
2                     0.375     0.6875
3                     0.25      0.9375
4                     0.0625    1.0000

The reference text demonstrates how to construct a binomial probability distribution by using the formula p(x) =
[n!/((n – x)!*x!)] * p^x * (1 – p)^(n–x). We used this formula to assemble the above data, though the exam would
probably not expect you to create each p(x); it would probably provide you with the table, and ask for an
interpretation. For this table, the probability of exactly one success is 0.25; the probability of three or fewer
successes is 0.9375 (the cdf value in the row where x = 3); and the probability of at least one success is 1 – P(0) =
1 – 0.0625 = 0.9375.

Calculations
The expected value of a binomial random variable is given by the formula n*p. In the example above, with n =
4 and p = 0.5, the expected value would be 4*0.5, or 2.

The variance of a binomial random variable is calculated by the formula n*p*(1 – p). Using the same example,
we have variance of 4*0.5*0.5 = 1.

If our binomial random variable still had n = 4 but with greater predictability in each trial, say p = 0.9, our
variance would fall to 4*0.9*0.1 = 0.36. As the number of trials n increases, both mean and variance
increase, but the standard deviation grows more slowly than the mean – thus, the higher the n, the more
predictable the proportion of successes becomes.
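
A minimal Python sketch that rebuilds the n = 4, p = 0.5 table above and the related summary statistics:

import math

# Binomial probability: p(x) = C(n, x) * p^x * (1 - p)^(n - x).
def binomial_pmf(x, n, p):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 4, 0.5
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
print(pmf)                          # [0.0625, 0.25, 0.375, 0.25, 0.0625]
print(sum(pmf[:4]))                 # P(X <= 3) = 0.9375
print(n * p, n * p * (1 - p))       # mean = 2.0, variance = 1.0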

Creating a Binomial Tree


The binomial tree is essentially a diagram showing that the future value of a stock is the product of a series of
up or down movements leading to a growing number of possible outcomes. Each possible value is called a
node.

Figure 2.9: Binomial Tree


Continuous Uniform Distribution
A continuous uniform distribution describes a range of outcomes, usually bound with an upper and lower limit,
where any point in the range is a possibility. Since it is a range, there are infinite possibilities within the range.
In addition, all outcomes are all equally likely (i.e. they are spread uniformly throughout the range).

To calculate probabilities, find the area under the pdf curve. Suppose a continuous uniform distribution ranges from
0 to 5, so that its pdf has a constant height of 0.2 (the total area under the curve must equal 1). In this example, what
is the probability that the random variable will be between 1 and 3? The area would be a rectangle with a width of
2 (the distance between 1 and 3) and a height of 0.2: 2*0.2 = 0.4.

What is the probability that x is less than 3? The rectangle would have a width of 3 (the distance from 0 to 3) and
the same height of 0.2: 3*0.2 = 0.6.
2.18 - Common Probability Distribution Properties
Normal Distribution
The normal distribution is a continuous probability distribution that, when graphed as a probability density,
takes the form of the so-called bell-shaped curve. The bell shape results from the fact that, while the range of
possible outcomes is infinite (negative infinity to positive infinity), most of the potential outcomes tend to be
clustered relatively close to the distribution’s mean value. Just how close they are clustered is given by the
standard deviation. In other words, a normal distribution is described completely by two parameters: its mean
(μ) and its standard deviation (σ).

Here are other defining characteristics of the normal distribution:


• It is symmetric: the mean value divides the distribution in half and one side is the exact mirror image of the
other – that is, skewness = 0. Symmetry also requires that mean = median = mode.
• Its kurtosis (measure of peakedness) is 3, so its excess kurtosis (kurtosis – 3) equals 0.
• A linear combination of two or more normally distributed random variables is also normally distributed.

While any normal distribution will share these defining characteristics, the mean and standard deviation will be
unique to the random variable, and these differences will affect the shape of the distribution. On the following
page are two normal distributions, each with the same mean, but the distribution with the dotted line has a
higher standard deviation.

Univariate vs. Multivariate Distributions


A univariate distribution specifies probabilities for a single random variable, while a multivariate distribution
combines the outcomes of a group of random variables and summarizes probabilities for the group. For
example, a stock will have a distribution of possible return outcomes; those outcomes when summarized would
be in a univariate distribution. A portfolio of 20 stocks could have return outcomes described in terms of 20
separate univariate distributions, or as one multivariate distribution.
Earlier we indicated that a normal distribution is completely described by two parameters: its mean and standard
deviation. This statement is true of a univariate distribution. For models of multivariate returns, the mean and
standard deviation of each variable do not completely describe the multivariate set. A third parameter is
required, the correlation, or co-movement, between each pair of variables in the set. For example, if a
multivariate return distribution was being assembled for a portfolio of stocks, and a number of pairs were found
to be inversely related (i.e. one increases at the same time the other decreases), then we must consider the
overall effect on portfolio variance. For a group of assets that are not completely positively related, there is the
opportunity to reduce overall risk (variance) as a result of the interrelationships.

For a portfolio distribution with n stocks, the multivariate distribution is completely described by the n mean
returns, the n standard deviations and the n*(n – 1)/2 correlations. For a 20-stock portfolio, that's 20 mean
returns, 20 standard deviations and 20*19/2, or 190, correlations.
2.19 - Confidence Intervals
While a normally-distributed random variable can have
many potential outcomes, the shape of its distribution
gives us confidence that the vast majority of these
outcomes will fall relatively close to its mean. In fact, we
can quantify just how confident we are. By
using confidence intervals - ranges that are a function of
the properties of a normal bell-shaped curve - we can
define ranges of probabilities.

The diagram below has a number of percentages – these numbers (which are approximations, rounded off)
indicate the probability that a random outcome will fall into that particular section under the curve.

In other words, assuming a normal distribution, we are 68% confident that a variable will fall within one
standard deviation of the mean. Within two standard deviations, our confidence grows to 95%; within three
standard deviations, 99%. Take the example of a distribution of returns on a security with a mean of 10% and a
standard deviation of 5%:

• 68% of the returns will be between 5% and 15% (within 1 standard deviation: 10 ± 5).
• 95% of the returns will be between 0% and 20% (within 2 std. devs.: 10 ± 2*5).
• 99% of the returns will be between –5% and 25% (within 3 std. devs.: 10 ± 3*5).
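
The sketch below simply recomputes these three ranges from the mean and standard deviation; the 1/2/3 multipliers are the rounded rule-of-thumb factors, not exact z-values.

mean, sd = 10.0, 5.0    # mean return and standard deviation, in percent

for k, conf in [(1, "68%"), (2, "95%"), (3, "99%")]:
    print(f"{conf}: {mean - k * sd:.0f}% to {mean + k * sd:.0f}%")
# 68%: 5% to 15%, 95%: 0% to 20%, 99%: -5% to 25%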

Standard Normal Distribution


The standard normal distribution is defined as a normal distribution with mean = 0 and standard deviation = 1.
Probability numbers derived from the standard normal distribution can be applied to any normal random
variable once it has been standardized – i.e. expressed in terms of how many standard deviations it lies from its
mean.

Standardizing a random variable X is done by subtracting the mean value (μ) from X, and then dividing the
result by the standard deviation (σ). The result is a standard normal random variable, denoted by the letter Z.

Formula 2.31

Z = (X – μ)/σ

Example 1:
If a distribution has a mean of 10 and standard deviation of 5, and a random observation X is –2, we would
standardize our random variable with the equation for Z.

Z = (X – μ)/ σ = (–2 – 10)/5 = –12/5 = –2.4

The standard normal random variable Z tells us how many standard deviations the observation is from the
mean. In this case, the observation of –2 lies 2.4 standard deviations below the mean of 10.

Example 2:
You are considering an investment portfolio with an expected return of 10% and a standard deviation of 8%.
The portfolio's returns are normally distributed. What is the probability of earning a return less than 2%?

Again, we'd start by standardizing the random variable X, which in this case is the 2% return:

Z = (X – μ)/σ = (2 – 10)/8 = –8/8 = –1.0

Next, one would often consult a Z-table of cumulative probabilities for the standard normal distribution in order
to determine the probability. In this case, for Z = –1, P(Z ≤ –1) = 0.158655, or about 16%.

Therefore, there is a 16% probability of earning a return of less than 2%.

Keep in mind that your upcoming exam will not provide Z-tables, so, how would you solve this problem on test
day?

The answer is that you need to remember that 68% of observations fall within ±1 standard deviation of the mean
on a normal curve, which means that 32% lie outside that band. This question essentially asked for the
probability of falling more than one standard deviation below the mean, or 32%/2 = 16%. Study the earlier
diagram that shows specific percentages for certain standard deviation intervals on a normal curve – in
particular, remember 68% for ±1 standard deviation and 95% for ±2.
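
Away from the exam, the calculation can be checked with the standard library's NormalDist (a sketch, not a required technique): standardize the 2% return and look up the cumulative probability.

from statistics import NormalDist

mu, sigma, x = 10.0, 8.0, 2.0
z = (x - mu) / sigma              # (2 - 10) / 8 = -1.0
p = NormalDist().cdf(z)           # P(Z <= -1) ~= 0.1587 under a standard normal

print(z, round(p, 4))             # -1.0 0.1587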

Shortfall Risk
Shortfall risk is essentially a refinement of mean-variance analysis – that is, the
idea that one must focus on both risk and return rather than on return alone. Risk is typically measured by
standard deviation, which measures all deviations – i.e. both positive and negative. In other words, positive
deviations are treated as if they were equal to negative deviations. In the real world, of course, negative
surprises are far more important to quantify and predict with clarity if one is to accurately define risk. Two
mutual funds could have the same risk if measured by standard deviation, but if one of those funds tends to have
more extreme negative outcomes, while the other had a high standard deviation due to a preponderance of
extreme positive surprises, then the actual risk profiles of those funds would be quite different. Shortfall risk
defines a minimum acceptable level, and then focuses on whether a portfolio will fall below that level over a
given time period.

Roy's Safety-First Ratio


An optimal portfolio is one that minimizes the probability that the portfolio's return will fall below a threshold
level. In probability notation, if RP is the return on the portfolio, and RL is the threshold (the minimum
acceptable return), then the portfolio for which P(RP < RL) is minimized will be the optimal portfolio according
to Roy's safety-first criterion. The safety-first ratio helps compute this level by giving the number of standard
deviations between the expected level and the minimum acceptable level, with the higher number considered
safer.

Formula 2.32

SFRatio = (E(RP) – RL)/ σP

Example: Roy's Safety First Ratio


Let's say our minimum threshold is –2%, and we have the following expectations for portfolios A and B:

Portfolio A Portfolio B
Expected Annual Return 8% 12%
Standard Deviation 10% 16%

Answer:
The SFRatio for portfolio A is (8 – (–2))/10 = 1.0
The SFRatio for portfolio B is (12 – (–2))/16 = 0.875

In other words, the minimum threshold is one standard deviation away in Portfolio A, and just 0.875 away in
Portfolio B, so by safety-first rules we opt for Portfolio A.
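
A small sketch of the same comparison in Python, using the figures from the table above:

def sf_ratio(expected_return, std_dev, threshold):
    """Roy's safety-first ratio: standard deviations between E(R) and the threshold."""
    return (expected_return - threshold) / std_dev

r_l = -2.0                                   # minimum acceptable return, in percent
portfolios = {"A": (8.0, 10.0), "B": (12.0, 16.0)}

for name, (er, sd) in portfolios.items():
    print(name, sf_ratio(er, sd, r_l))       # A: 1.0, B: 0.875 -> prefer A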

Lognormal Distributions
A lognormal distribution has two distinct properties: it is always positive (bounded on the left by zero), and it is
skewed to the right. Prices for stocks and many other financial assets (anything which by definition can never
be negative) are often found to be lognormally distributed. Also, the lognormal and normal distributions are
related: if a random variable X is lognormally distributed, then its natural log, ln(X) is normally distributed.
(Thus the term “lognormal” – the log is normal.) Figure 2.11 below demonstrates a typical lognormal
distribution.
2.20 - Discrete and Continuous Compounding
With discrete compounded rates of return, time moves forward in increments, with each increment having a
holding-period return equal to (ending price / beginning price) – 1. Of course, the more frequent the
compounding, the higher the effective rate of return. Take a security that is expected to return 12% annually:

• With annual holding periods, 12% compounded once = (1.12)^1 – 1 = 12%.
• With quarterly holding periods, 3% compounded 4 times = (1.03)^4 – 1 = 12.55%.
• With monthly holding periods, 1% compounded 12 times = (1.01)^12 – 1 = 12.68%.
• With daily holding periods, (12%/365) compounded 365 times = 12.7475%.
• With hourly holding periods, (12%/(365*24)) compounded (365*24) times = 12.7496%.

With greater frequency of compounding (i.e. as holding periods become smaller and smaller) the effective rate
gradually increases but in smaller and smaller amounts. Extending this further, we can reduce holding periods
so that they are sliced smaller and smaller so they approach zero, at which point we have the continuously
compounded rate of return. Discrete compounding relates to measurable holding periods and a finite number of
holding periods. Continuous compounding relates to holding periods so small they cannot be measured, with
frequency of compounding so large it goes to infinity.

The continuous rate associated with a holding period is found by taking the natural log of (1 + holding-period
return). Say the holding period is one year and the holding-period return is 12%:

ln(1.12) = 11.33% (approx.)

In other words, if 11.33% were continuously compounded, its effective rate of return would be about 12%.

Earlier we found that 12% compounded hourly comes to about 12.7496%. In fact, e (the transcendental number)
raised to the 0.12 power, less 1, yields 12.7497% (approximately).
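
The following sketch recomputes the effective annual rates above for each compounding frequency and compares them with the continuously compounded limit; only the standard library's math module is used.

from math import exp, log

nominal = 0.12
for label, m in [("annual", 1), ("quarterly", 4), ("monthly", 12),
                 ("daily", 365), ("hourly", 365 * 24)]:
    effective = (1 + nominal / m) ** m - 1        # discrete compounding m times per year
    print(f"{label:>9}: {effective:.4%}")

print(f"continuous: {exp(nominal) - 1:.4%}")      # ~12.7497%, the limiting case
print(f"ln(1.12)  = {log(1.12):.4%}")             # ~11.33% continuous rate for a 12% holding-period return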

As we've stated previously, actual calculations of natural logs are not likely for answering a question as they
give an unfair advantage to those with higher function calculators. At the same time, an exam problem can test
knowledge of a relationship without requiring the calculation. For example, a question could ask:

Q. A portfolio returned 5% over one year. If continuously compounded, this is equivalent to ____?

A. ln 5
B. ln 1.05
C. e^5
D. e^1.05

The answer would be B based on the definition of continuous compounding. A financial function calculator or
spreadsheet could yield the actual percentage of 4.879%, but wouldn't be necessary to answer the question
correctly on the exam.

Monte Carlo Simulation


A Monte Carlo simulation refers to a computer-generated series of trials in which the probabilities for both risk
and reward are tested repeatedly in an effort to help define these parameters. These simulations are
characterized by large numbers of trials – typically hundreds or even thousands of iterations, which is why they
are described as "computer generated". Note also that Monte Carlo simulations rely on random numbers to
generate each series of samples.

Monte Carlo simulations are used in a number of applications, often as a complement to other risk-assessment
techniques in an effort to further define potential risk. For example, a pension-benefit administrator in charge of
managing assets and liabilities for a large plan may use computer software with Monte Carlo simulation to help
understand any potential downside risk over time, and how changes in investment policy (e.g. higher or lower
allocations to certain asset classes, or the introduction of a new manager) may affect the plan. While traditional
analysis focuses on returns, variances and correlations between assets, a Monte Carlo simulation can help
introduce other pertinent economic variables (e.g. interest rates, GDP growth and foreign exchange rates) into
the simulation.

Monte Carlo simulations are also important in pricing derivative securities for which no convenient analytical
methods exist. Path-dependent instruments such as Asian-style options are priced with Monte Carlo methods, as
are certain mortgage-backed securities for which the embedded options (e.g. prepayment assumptions) are very
complex.

A general outline for developing a Monte Carlo simulation involves the following steps (note that we are
oversimplifying a process that is often highly technical); a small illustrative sketch follows the list:

1. Identify all variables about which we are interested, the time horizon of the analysis and the distribution
of all risk factors associated with each variable.
2. Draw K random numbers using a spreadsheet generator. Each random variable would then be
standardized so we have Z1, Z2, Z3… ZK.
3. Simulate the possible values of the random variable by calculating its observed value with Z1, Z2, Z3…
ZK.
4. Following a large number of iterations, estimate each variable and quantity of interest to complete one
trial. Go back and complete additional trials to develop more accurate estimates.
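
As a toy illustration of the outline above (and only an illustration – real applications are far more elaborate), the sketch below simulates one-year portfolio returns under an assumed normal distribution with a 10% mean and 8% standard deviation, and estimates the probability of a shortfall below 2%.

import random

random.seed(42)                    # fixed seed so the run is reproducible
mu, sigma, trials = 0.10, 0.08, 100_000

shortfalls = 0
for _ in range(trials):
    simulated_return = random.gauss(mu, sigma)   # one random draw per trial
    if simulated_return < 0.02:
        shortfalls += 1

print(shortfalls / trials)         # ~0.16, close to the analytical P(Z <= -1)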

Historical Simulation
Historical simulation, or back simulation, follows a similar process for large numbers of iterations, with
historical simulation drawing from the previous record of that variable (e.g. past returns for a mutual fund).
While both of these methods are very useful in developing a more meaningful and in-depth analysis of a
complex system, it's important to recognize that they are basically statistical estimates; that is, they are not as
analytical as (for example) the use of a correlation matrix to understand portfolio returns. Such simulations tend
to work best when the input risk parameters are well defined.
2.21 - Sampling and Estimation
A data sample, or subset of a larger population, is used to help understand the behavior and characteristics of
the entire population. In the investing world, for example, all of the familiar stock market averages are samples
designed to represent the broader stock market and indicate its performance. For the domestic publicly-traded
stock market, populated by 10,000 or more companies, the Dow Jones Industrial Average (DJIA) has just 30
representatives; the S&P 500 has 500. Yet these samples are taken as valid
indicators of the broader population. It's important to understand the mechanics of sampling and estimating,
particularly as they apply to financial variables, and have the insight to critique the quality of research derived
from sampling efforts.

BASICS

Simple Random Sampling


To begin the process of drawing samples from a larger population, an analyst must craft a sampling plan, which
indicates exactly how the sample was selected. With a large population, different samples will yield different
results, and the idea is to create a consistent and unbiased approach. Simple random sampling is the most basic
approach to the problem. It draws a representative sample with the principle that every member of the
population must have an equal chance of being selected. The key to simple random sampling is assuring
randomness when drawing the sample. This requirement is achieved a number of ways, most rigorously by first
coding every member of the population with a number, and then using a random number generator to choose a
subset.

Sometimes it is impractical or impossible to label every single member of an entire population, in which case
systematic sampling methods are used. For example, take a case where we wanted to research whether the S&P
500 companies were adding or laying off employees, but we didn't have the time or resources to contact all 500
human resources departments. We do have the time and resources for an in-depth study of a 25-company
sample. A systematic sampling approach would be to take an alphabetical list of the S&P 500 and contact every
20th company on the list, i.e. companies #20, #40, #60, etc., up to #500. This way we end up with 25
companies, and it was done under a system that's approximately random and didn't favor a particular company or
industry.
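
A sketch of this systematic draw in Python, using hypothetical placeholder company names in place of the actual S&P 500 constituents:

population = [f"Company_{i:03d}" for i in range(1, 501)]   # stand-in for the alphabetical S&P 500 list

step = len(population) // 25                               # 500 / 25 = 20
sample = population[step - 1::step]                        # companies #20, #40, ..., #500

print(len(sample))       # 25
print(sample[:3])        # ['Company_020', 'Company_040', 'Company_060']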

Sampling Error
Suppose we polled our 25 companies and came away with a
conclusion that the typical S&P 500 firm will be adding
approximately 5% to their work force this fiscal year, and, as a result, we are optimistic about the health of the
economy. However, the daily news continues to indicate a fair number of layoffs at some companies and hiring
freezes at other firms, and we wonder whether this research has actually done its job. In other words, we suspect
sampling error: the difference between the statistic from our sample (5% job growth) and the population
parameter we were estimating (actual job growth).

Sampling Distribution
A sampling distribution is analogous to a population distribution: it describes the range of all possible values
that the sample statistic can take. In assessing the quality of a sample, the approach usually involves
comparing the distribution of the sample itself to the population distribution. We expect the sample to show a
pattern similar to the population – that is, if a population is normally distributed, the sample should also be
roughly normally distributed. If the sample is skewed when we were expecting a normal pattern with most of the
observations centered around the mean, it indicates potential problems with the sample and/or the methodology.

Stratified Random Sampling


In a stratified random approach, a population is first divided into subpopulations or strata, based upon one or
more classification criteria. Within each stratum, a simple random sample is taken from those members (the
members of the subpopulation). The number to be sampled from each stratum depends on its size relative to the
population – that is, if a classification system results in three subgroups or strata, and Group A has 50% of the
population, and Group B and Group C have 25% each, the sample we draw must conform to the same relative
sizes (half of the sample from A, a quarter each from B and C). The samples taken from each strata are then
pooled together to form the overall sample.

The table below illustrates a stratified approach to improving our economic research on current hiring
expectations. In our earlier approach that randomly drew from all 500 companies, we may have accidentally
drawn too heavily from a sector doing well, and under-represented other areas. In stratified random sampling,
each of the 500 companies in the S&P 500 index is assigned to one of 12 sectors. Thus we have 12 strata, and
our sample of 25 companies is based on drawing from each of the 12 strata, in proportions relative to the
industry weights within the index. The S&P sector weightings broadly reflect the domestic economy, which
is why financial services and health care (which are relatively more important sectors in today's economy) are
more heavily weighted than utilities. Within each sector, a random approach is used – for example, if there are
120 financial services companies and we need five financial companies for our research study, those five would
be selected via a random draw, or by a systematic approach (i.e. every 24th company on an alphabetical list of
the subgroup).

Sector             % of S&P 500    Companies to sample

Business Svcs      3.8%            1
Consumer Goods     9.4%            2
Consumer Svcs      8.2%            2
Energy             8.5%            2
Financial Svcs     20.1%           5
Hardware           9.4%            2
Health Care        13.6%           4
Idstrl Mtls.       12.7%           3
Media              3.7%            1
Software           3.9%            1
Telecomm           3.2%            1
Utilities          3.4%            1
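
The proportional allocation in the table can be reproduced with a largest-remainder scheme, sketched below; the sector weights are the approximate figures from the table, not live index data.

from math import floor

weights = {
    "Business Svcs": 3.8, "Consumer Goods": 9.4, "Consumer Svcs": 8.2,
    "Energy": 8.5, "Financial Svcs": 20.1, "Hardware": 9.4,
    "Health Care": 13.6, "Idstrl Mtls.": 12.7, "Media": 3.7,
    "Software": 3.9, "Telecomm": 3.2, "Utilities": 3.4,
}
sample_size = 25

quotas = {s: sample_size * w / 100 for s, w in weights.items()}
alloc = {s: floor(q) for s, q in quotas.items()}              # whole companies first
leftover = sample_size - sum(alloc.values())                  # slots still to hand out
for s in sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)[:leftover]:
    alloc[s] += 1                                             # largest fractional remainders get the extras

print(alloc)      # e.g. Financial Svcs: 5, Health Care: 4, ... (totals 25)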

Time-Series Data
Time-series data refers to one variable taken over discrete, equally spaced periods of time. The distinguishing
feature of a time series is that it draws back on history to show how one variable has changed. Common
examples include historical quarterly returns on a stock or mutual fund for the last five years, earnings per share
on a stock each quarter for the last ten years or fluctuations in the market-to-book ratio on a stock over a 20-
year period. In every case, past time periods are examined.

Cross-Sectional Data
Cross section data typically focuses on one period of time and measures a particular variable across several
companies or industries. A cross-sectional study could focus on quarterly returns for all large-cap value mutual
funds in the first quarter of 2005, or this quarter's earnings-per-share estimates for all pharmaceutical firms, or
differences in the current market-to-book ratio for the largest 100 firms traded on the NYSE. We can see that
the actual variables being examined may be similar to a time-series analysis, with the difference being that a
single time period is the focus, and several companies, funds, etc. are involved in the study. The earlier example
of analyzing hiring plans at S&P 500 companies is a good example of cross-sectional research.

The Central Limit Theorem


The central limit theorem states that, for a population distribution with mean = μ and a finite variance σ², the
sampling distribution of the sample mean will take on three important characteristics as the sample size becomes
large:

1. The sample mean will be approximately normally distributed.
2. The mean of the sampling distribution (i.e. the expected value of the sample mean) will be equal to the
population mean (μ).
3. The variance of the sampling distribution will be equal to the population variance (σ²) divided by the size of
the sample (n).

The first property – that the sampling distribution will be approximately normal – holds regardless of the
distribution of the underlying population. Thus the central limit theorem can help make probability estimates for
a sample drawn from a non-normal population (e.g. skewed, lognormal), based on the fact that the distribution of
the sample mean for large sample sizes will be approximately normal. This tendency toward normally distributed
sample means for large samples gives the central limit theorem its most powerful attribute: it enables samples to
be used in constructing confidence intervals and in testing hypotheses, as we will find when covering those
subjects.

Exactly how large is large in terms of creating a large sample? Remember the number 30. According to the
reference text, that's the minimum number a sample must be before we can assume it is normally distributed.
Don't be surprised if a question asks how large a sample should be – should it be 20, 30, 40, or 50? It's an easy
way to test whether you've read the textbook, and if you remember 30, you score an easy correct answer.
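
To see the theorem at work, the sketch below draws repeated samples of n = 30 from a deliberately skewed (exponential) population and looks at the resulting sample means; the choice of an exponential population is purely illustrative.

import random
from statistics import mean, stdev

random.seed(1)
n, num_samples = 30, 5_000      # sample size and number of repeated samples

# An exponential with rate 1 has mean 1 and standard deviation 1 (and is right-skewed).
sample_means = [mean(random.expovariate(1.0) for _ in range(n))
                for _ in range(num_samples)]

print(round(mean(sample_means), 3))    # ~1.0  (close to the population mean)
print(round(stdev(sample_means), 3))   # ~0.18 (close to sigma / sqrt(n) = 1/sqrt(30) ~ 0.183)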

Standard Error
The standard error is the standard deviation of the sample statistic. Earlier, we indicated that the variance of the
sample mean is the population variance divided by n (sample size). The formula for standard error is derived by
taking the positive square root of that variance.

If the population standard deviation is given, standard error is calculated as population standard deviation /
square root of sample size, or σ/√n. If the population standard deviation is unknown, the sample standard
deviation (s) is used to estimate it, and standard error = s/√n. Note that the "n" in the denominator means that
the standard error becomes smaller as the sample size becomes larger, an important property to remember.

Point Estimate vs. Confidence Interval Population Parameters


A point estimate is one particular value that is used to estimate the underlying population parameter. For
example, the sample mean is essentially a point estimate of a population mean. However, because of the
presence of sampling error, sometimes it is more useful to start with this point estimate, and then establish a
range of values both above and below the point estimate. Next, by using the probability-numbers characteristic
of normally distributed variables, we can state the level of confidence we have that the actual population mean
will fall somewhere in our range. This process is known as "constructing a confidence interval".

The level of confidence we want to establish is given by the number α, or alpha – the probability that the
confidence interval will fail to capture the actual population parameter. The lower the alpha, the more confident
we want to be – e.g. an alpha of 5% indicates we want to be 95% confident; 1% alpha indicates 99% confidence.

Properties of an Estimator
The three desirable properties of an estimator are unbiasedness, efficiency and consistency:
1. Unbiasedness - The expected value (mean) of the estimator's sampling distribution is equal to the underlying
population parameter; that is, there is no upward or downward bias.
2. Efficiency - Among the many unbiased estimators of the same parameter, the most efficient is the one whose
sampling distribution has the smallest variance.
3. Consistency - Larger sample sizes tend to produce more accurate estimates; that is, the sample statistic
converges on the population parameter.

Constructing Confidence Intervals


The general structure for a (1 – α) confidence interval is given by:

Formula 2.33

Confidence interval = (point estimate) ± (reliability factor) * (standard error)

Where: the reliability factor increases as a function of an increasing confidence level.

In other words, if we want to be 99% confident that a parameter will fall within a range, we need to make
that interval wider than we would if we wanted to be only 90% confident. The actual reliability factors
used are derived from the standard normal distribution, or Z value, at probabilities of alpha/2 since the
interval is two-tailed, or above and below a point.

Degrees of Freedom
Degrees of freedom are used for determining the reliability-factor portion of the confidence interval with the
t-distribution. In finding sample variance, for any sample size n, degrees of freedom = n – 1. Thus for a sample
size of 8, degrees of freedom are 7. For a sample size of 58, degrees of freedom are 57.
The concept of degrees of freedom is taken from the fact that a sample variance is based on a series of
observations, not all of which can be independently selected if we are to arrive at the true parameter. One
observation essentially depends on all the other observations. In other words, if the sample size is 58, think of
that sample of 58 in two parts: (a) 57 independent observations and (b) one dependent observation, whose value
is essentially a residual based on the other observations. Taken together, we have our estimates
for mean and variance. If degrees of freedom is 57, it means that we would be "free" to choose any 57
observations (i.e. sample size – 1), since there is always that 58th value that will result in a particular sample
mean for the entire group.

Characteristic of the t-distribution is that additional degrees of freedom reduce the range of the confidence
interval, and produce a more reliable estimate. Increasing degrees of freedom is done by increasing sample size.
For larger sample sizes, use of the z-statistic is an acceptable alternative to the t-distribution – this is true since
the z-statistic is based on the standard normal distribution, and the t-distribution moves closer to the standard
normal at higher degrees of freedom.

Student's t-distribution
Student's t-distribution is a series of symmetrical distributions, each distribution defined by its degrees of
freedom. All of the t-distributions appear similar in shape to a standard normal distribution, except that,
compared to a standard normal curve, the t-distributions are less peaked and have fatter tails. With each increase
in degrees of freedom, two properties change: (1) the distribution's peak increases (i.e. the probability that the
estimate will be closer to the mean increases), and (2) the tails (in other words, the parts of the curve far away
from the mean estimate) approach zero more quickly – i.e. there is a reduced probability of extreme values as
we increase degrees of freedom. As degrees of freedom become very large – as they approach infinity – the t-
distribution approximates the standard normal distribution.
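
Assuming SciPy is available, the short sketch below shows the two-tailed 95% reliability factors from Student's t converging toward the standard normal value of about 1.96 as degrees of freedom increase.

from scipy.stats import norm, t

print(round(norm.ppf(0.975), 3))              # 1.96  (z reliability factor at 95%, two-tailed)
for df in (5, 15, 30, 100, 1000):
    print(df, round(t.ppf(0.975, df), 3))     # 2.571, 2.131, 2.042, 1.984, 1.962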

Figure 2.12: Student's t-distribution

2.22 - Sampling Considerations


Sample Size
Increasing sample size benefits a research study by narrowing the confidence interval and, as a result, increasing
the precision with which the population parameter can be estimated. Other choices also affect how wide or how
narrow a confidence interval will be: the choice of statistic, with t producing wider/more conservative intervals
than z, and the degree of confidence, with higher degrees such as 99% resulting in wider/more conservative
intervals than 90%. An increase in sample size tends to have an even more meaningful effect: because standard
error equals sample standard deviation / √(sample size), standard error – and with it the interval width – varies
inversely with the square root of sample size. As a result, more observations in the sample (all other
factors equal) improve the quality of a research study.

At the same time, two other factors tend to make larger sample sizes less desirable. The first consideration,
which primarily affects time-series data, is that
population parameters have a tendency to change over
time. For example, suppose we are studying a mutual fund
using five years of quarterly returns in our analysis (i.e.
a sample size of 20: 5 years x 4 quarters a year). The
resulting confidence interval appears too wide, so in an
effort to increase precision, we use 20 years of data (80
observations). However, when we reach back into the
1980s to study this fund, it had a different fund manager,
plus it was buying more small-cap value companies,
whereas today it is a blend of growth and value, with mid
to large market caps. In addition, the factors affecting
today's stock market (and mutual fund returns) are much
different compared to back in the 1980s. In short, the
population parameters have changed over time, and data
from 20 years ago shouldn't be mixed with data from the most recent five years.

The other consideration is that increasing sample size can involve additional expenses. Take the example of
researching hiring plans at S&P 500 firms (cross-sectional research). A sample size of 25 was suggested, which
would involve contacting the human resources department of 25 firms. By increasing the sample size to 100, or
200 or higher, we do achieve stronger precision in making our conclusions, but at what cost? In many cross-
sectional studies, particularly in the real world, where each sample takes time and costs money, it's sufficient to
leave sample size at a certain lower level, as the additional precision isn't worth the additional cost.

Data Mining Bias


Data mining is the practice of searching through historical data in an effort to find significant patterns, with
which researchers can build a model and make conclusions on how this population will behave in the future.
For example, the so-called January effect, where stock market returns tend to be stronger in the month of
January, is a product of data mining: monthly returns on indexes going back 50 to 70 years were sorted and
compared against one another, and the patterns for the month of January were noted. Another well-known
conclusion from data mining is the 'Dogs of the Dow' strategy: each January, among the 30 companies in the
Dow industrials, buy the 10 with the highest dividend yields. Historically, such a strategy was found to have
outperformed the market over the long run.

Bookshelves are filled with hundreds of such models that "guarantee" a winning investment strategy. Of course,
to borrow a common industry phrase, "past performance does not guarantee future results". Data-mining bias
refers to the errors that result from relying too heavily on data-mining practices. In other words, while some
patterns discovered in data mining are potentially useful, many others might just be coincidental and are not
likely to be repeated in the future - particularly in an "efficient" market. For example, we may not be able to
continue to profit from the January effect going forward, given that this phenomenon is so widely recognized.
As a result, stocks are bid higher in November and December by market participants anticipating the
January effect, so that by the start of January, the effect is priced into stocks and one can no longer take
advantage of the model. Intergenerational data mining refers to the continued use of information already put
forth in prior financial research as a guide for testing the same patterns and overstating the same conclusions.

Distinguishing between valid models and valid conclusions, and those ideas that are purely coincidental and the
product of data mining, presents a significant challenge as data mining is often not easy to discover. A good
start to investigate for its presence is to conduct an out-of-sample test - in other words, researching whether the
model actually works for periods that do not overlap the time frame of the study. A valid model should continue
to be statistically significant even when out-of-sample tests are conducted. For research that is the product of
data mining, a test outside of the model's time frame can often reveal its true nature. Other warning signs
involve the number of patterns or variables examined in the research - that is, did this study simply search
enough variables until something (anything) was finally discovered? Most academic research won't disclose the
number of variables or patterns tested in the study, but oftentimes there are verbal hints that can reveal the
presence of excessive data mining.

Above all, it helps when there is an economic rationale to explain why a pattern exists, as opposed to simply
pointing out that a pattern is there. For example, years ago a research study discovered that the market tended to
have positive returns in years that the NFC wins the Super Bowl, yet it would perform relatively poorly when
the AFC representative triumphs. However, there's no economic rationale for explaining why this pattern exists
- do people spend more, or companies build more, or investors invest more, based on the winner of a football
game? Yet the story is out there every Super Bowl week. Patterns discovered as a result of data mining may
make for interesting reading, but in the process of making decisions, care must be taken to ensure that mined
patterns not be blindly overused.

Sample Selection Bias


Many additional biases can adversely affect the quality and the usefulness of financial research. Sample-
selection bias refers to the tendency to exclude a certain part of a population simply because the data is not
available. As a result, we cannot state that the sample we've drawn is completely random - it is random only
within the subset on which historic data could be obtained.

Survivorship Bias
A common form of sample-selection bias in financial databases is survivorship bias, or the tendency for
financial and accounting databases to exclude information on companies, mutual funds, etc. that are no longer
in existence. As a result, certain conclusions can be made that may in fact be overstated were one to remove this
bias and include all members of the population. For example, many studies have pointed out the tendency of
companies with low price-to-book-value ratios to outperform those firms with higher P/BVs. However, these
studies most likely aren't going to include those firms that have failed; thus data is not available and there is
sample-selection bias. In the case of low and high P/BV, it stands to reason that companies in the midst of
declining and failing will probably be relatively low on the P/BV scale yet, based on the research, we would be
guided to buy these very same firms due to the historical pattern. It's likely that the gap between returns on low-
priced (value) stocks and high-priced (growth) stocks has been systematically overestimated as a result of
survivorship bias. Indeed, the investment industry has developed a number of growth and value indexes.
However, in terms of defining for certain which strategy (growth or value) is superior, the actual evidence is
mixed.

Sample selection bias extends to newer asset classes such as hedge funds, a heterogeneous group that is
somewhat more removed from regulation, and where public disclosure of performance is much more
discretionary compared to that of mutual funds or registered advisors of separately managed accounts. One
suspects that hedge funds will disclose only the data that makes the fund look good (self-selection bias),
compared to a more developed industry of mutual funds where the underperformers are still bound by certain
disclosure requirements.

Look-Ahead Bias
Research is guilty of look-ahead bias if it makes use of information that was not actually available on a
particular date, yet the researchers assume it was. Let's return to the example of buying low price-to-book-
value companies; the research may assume that we buy our low P/BV portfolio on Jan 1 of a given year, and
then (compared to a high P/BV portfolio) hold it throughout the year. Unfortunately, while a firm's current stock
price is immediately available, the book value of the firm is generally not available until months after the start
of the year, when the firm files its official 10-K. To overcome this bias, one could construct P/BV ratios using
current price divided by the previous year's book value, or (as is done by Russell's indexes) wait until midyear
to rebalance after data is reported.

Time-Period Bias
This type of bias refers to an investment study that may appear to work over a specific time frame but may not
last in future time periods. For example, any research done in 1999 or 2000 that covered a trailing five-year
period may have touted the outperformance of high-risk growth strategies, while pointing to the mediocre
results of more conservative approaches. When these same studies are conducted today for a trailing 10-year
period, the conclusions might be quite different. Certain anomalies can persist for a period of several quarters or
even years, but research should ideally be tested in a number of different business cycles and market
environments in order to ensure that the conclusions aren't specific to one unique period or environment.
2.23 - Calculating Confidence Intervals
When the population variance (σ²) is known, the z-statistic can be used to calculate a reliability factor. Relative to
the t-distribution, it will result in tighter confidence intervals and more reliable estimates of mean and standard
deviation. Z-values are based on the standard normal distribution.

For establishing confidence intervals when the population


variance is known, the interval is constructed with this formula:

Formula 2.34

For an alpha of 5% (i.e. a 95% confidence interval), the reliability factor (Zα/2) is 1.96, but for a CFA exam
problem, it is usually sufficient to round to an even 2 to solve the problem. (Remember that the z-value at 95%
confidence is approximately 2, as tables for z-values are sometimes not provided!) Given a sample size of 16, a
sample mean of 20 and a population standard deviation of 25, a 95% confidence interval would be 20 ± 2*(25/√16)
= 20 ± 2*(25/4) = 20 ± 12.5. In short, for this sample size and for these sample statistics, we would be 95%
confident that the actual population mean would fall in a range from 7.5 to 32.5.

Suppose that this 7.5-to-32.5 range was deemed too broad for our purposes. Reducing the confidence interval is
accomplished in two ways: (1) increasing sample size, and (2) decreasing our allowable level of confidence.

1. Increasing sample size from 16 to 100 - Our 95% confidence interval is now equal to 20 ± 2*(25/√100) = 20 ±
2*(25/10) = 20 ± 5. In other words, increasing the sample size to 100 narrows the 95% confidence range: min
15 to max 25.

2. Using 90% confidence - Our interval is now equal to 20 ± 1.65*(25/√100) = 20 ± 1.65*(25/10) = 20 ±
4.125. In other words, decreasing the percentage confidence to 90% reduces the range: min 15.875 to max
24.125.

When population variance is unknown, we will need to use the t-distribution to establish confidence intervals.
The t-statistic is more conservative; that is, it results in broader intervals. Assume the following sample
statistics: sample size = 16, sample mean = 20, sample standard deviation = 25.

To use the t-distribution, we must first calculate degrees of freedom, which for sample size 16 is equal to n – 1
= 15. Using an alpha of 5% (95% confidence interval), our confidence interval is 20 ± (2.131)*(25/√16),
which gives a range minimum of 6.68 and a range maximum of 33.32.

As before, we can reduce this range with (1) larger samples and/or (2) reducing allowable degree of confidence:

1. Increase sample size from 16 to 100 - The range is now equal to 20 ± 2*(25/10) → minimum 15 and
maximum 25 (for large sample sizes the t-distribution is sufficiently close to the z-value that it becomes an
acceptable alternative).

2. Reduce confidence from 95% to 90% - The range is now equal to 20 ± 1.65*(25/10) → minimum 15.875 and
maximum 24.125.
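
Assuming SciPy is available, the sketch below reproduces both intervals with exact reliability factors (1.96 rather than the rounded 2 used above for the z case).

from math import sqrt
from scipy.stats import norm, t

mean, sd, n, alpha = 20, 25, 16, 0.05
se = sd / sqrt(n)                         # standard error = 25 / 4 = 6.25

z = norm.ppf(1 - alpha / 2)               # 1.96 (the guide rounds this to 2)
print(mean - z * se, mean + z * se)       # ~7.75 to ~32.25 when sigma is known

df = n - 1
t_factor = t.ppf(1 - alpha / 2, df)       # 2.131 for 15 degrees of freedom
print(mean - t_factor * se, mean + t_factor * se)   # ~6.68 to ~33.32 when sigma is unknown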

Large Sample Size


In our earlier discussion on the central limit theorem, we stated that large samples will tend to be normally
distributed even when the underlying population is non-normal. Moreover, at sufficiently large samples, where
there are enough degrees of freedom, the z and t statistics will provide approximately the same reliability factor
so we can default to the standard normal distribution and the z-statistic. The structure for the confidence interval
is similar to our previous examples.

For a 95% confidence interval, if sample size = 100, sample standard deviation = 10 and our point estimate is
15, the confidence interval is 15 ± 2*(10/√100), or 15 ± 2. We are 95% confident that the population mean will
fall between 13 and 17.

Suppose we wanted to construct a 99% confidence interval. The reliability factor now becomes 2.58 and we have
15 ± 2.58*(10/√100), or 15 ± 2.58 – a minimum of 12.42 and a maximum of 17.58.

The table below summarizes the statistics used in constructing confidence intervals, given various situations:

Distribution    Population Variance    Sample Size    Appropriate Statistic

Normal          Known                  Small          z
Normal          Known                  Large          z
Normal          Unknown                Small          t
Normal          Unknown                Large          t or z
Non-normal      Known                  Small          unavailable
Non-normal      Known                  Large          z
Non-normal      Unknown                Small          unavailable
Non-normal      Unknown                Large          t or z

Exam Tips and Tricks

While these calculations don't seem difficult, it's true that this material seems at times to run
together, particularly if a CFA candidate has never used it or hasn't studied it in some time.
While not likely to be a major point of emphasis, expect at least a few questions on confidence
intervals and in particular, a case study that will test basic knowledge of definitions, or that
will compare/contrast the two statistics presented (t-distribution and z-value) to make sure you
know which is useful in a given application. More than anything, the idea is to introduce
confidence intervals and how they are constructed as a prerequisite for hypothesis testing.
2.24 - Hypothesis Testing
Hypothesis testing provides a basis for taking ideas or theories that someone initially develops about the
economy or investing or markets, and then deciding whether these ideas are true or false. More precisely,
hypothesis testing helps decide whether the tested ideas are probably true or probably false, since conclusions
made with the hypothesis-testing process are never made with 100% confidence – as we found in the sampling
and estimating process, we have degrees of confidence (e.g. 95% or 99%) but not absolute certainty.
Hypothesis testing is often associated with the procedure for acquiring and developing knowledge known as the
scientific method. As such, it relates the fields of
investment and economic research (i.e., business topics)
to other traditional branches of science (mathematics,
physics, medicine, etc.)

Hypothesis testing is similar in some respects to the


estimation processes presented in the previous section.
Indeed, the field of statistical inference, where
conclusions on a population are drawn from observing
subsets of the larger group, is generally divided into two
groups: estimation and hypothesis testing. With
estimation, the focus was on answering (with a degree of
confidence) the value of a parameter, or else a range
within which the parameter most likely falls. Think of estimating as working from general to specific. With
hypothesis testing, the focus is shifted: we start by making a statement about the parameter's value, and then the
question becomes whether the statement is true or not true. In other words, it starts with a specific value and
works the other way to make a general statement.

What is a Hypothesis?
A hypothesis is a statement made about a population parameter. These are typical hypotheses: "the mean annual
return of this mutual fund is greater than 12%", and "the mean return is greater than the average return for the
category". Stating the hypothesis is the initial step in a defined seven-step process for hypothesis testing – a
process developed based on the scientific method. We indicate each step below. In the remainder of this section
of the study guide, we develop a detailed explanation for how to answer each step's question.

Hypothesis testing seeks to answer seven questions:

1. What are the null hypothesis and the alternative hypothesis?


2. Which test statistic is appropriate, and what is the probability distribution?
3. What is the required level of significance?
4. What is the decision rule?
5. Based on the sample data, what is the value of the test statistic?
6. Do we reject or fail to reject the null hypothesis?
7. Based on our rejection or inability to reject, what is our investment or economic decision?

Null Hypothesis
Step #1 in our process involves stating the null and alternate hypothesis. The null hypothesis is the statement
that will be tested. The null hypothesis is usually denoted with "H0". For investment and economic research
applications, and as it relates to the CFA exam, the null hypothesis will be a statement on the value of a
population parameter, usually the mean value if a question relates to return, or the standard deviation if it relates
to risk. It can also refer to the value of any random variable (e.g. sales at company XYZ are at least $10 million
this quarter). In hypothesis testing, the null hypothesis is initially regarded to be true, until (based on our
process) we gather enough proof to either reject the null hypothesis, or fail to reject the null hypothesis.

Alternative Hypothesis
The alternative hypothesis is a statement that will be accepted as a result of the null hypothesis being rejected.
The alternative hypothesis is usually denoted "Ha". In hypothesis testing, we do not directly test the worthiness
of the alternate hypothesis, as our testing focus is on the null. Think of the alternative hypothesis as the residual
of the null – for example, if the null hypothesis states that sales at company XYZ are at least $10 million this
quarter, the alternative hypothesis to this null is that sales will fail to reach the $10 million mark. Between the
null and the alternative, it is necessary to account for all possible values of a parameter. In other words, if we
gather evidence to reject this null hypothesis, then we must necessarily accept the alternative. If we fail to reject
the null, then we are rejecting the alternative.

One-Tailed Test
The labels "one-tailed" and "two-tailed" refer to the standard normal distribution (as well as all of the t-
distributions). The key words for identifying a one-tailed test are "greater than" or "less than". For example, if
our hypothesis is that the annual return on this mutual fund will be greater than 8%, it's a one-tailed test: the null
(that the return is less than or equal to 8%) will be rejected only on finding sufficiently extreme observations in
the right tail.

Figure 2.13 below illustrates a one-tailed test for "greater than" (rejection in the right tail). (A one-tailed test for
"less than" would look similar, with the rejection region in the left tail rather than the right.)
Two-Tailed Test
Characterized by the words "equal to or not equal to". For example, if our hypothesis were that the return on a
mutual fund is equal to 8%, we could reject it based on observations in either tail (sufficiently higher than 8% or
sufficiently lower than 8%).

Choosing the null and the alternate hypothesis:


If θ (theta) is the actual value of a population parameter (e.g. mean or standard deviation), and θ0 (theta subzero)
is the hypothesized value of theta, the null and alternative hypotheses can be formed in three different ways:

1. H0: θ = θ0 versus Ha: θ ≠ θ0 (a two-tailed test)
2. H0: θ ≤ θ0 versus Ha: θ > θ0 (a one-tailed test)
3. H0: θ ≥ θ0 versus Ha: θ < θ0 (a one-tailed test)

Choosing what will be the null and what will be the alternative depends on the case and what it is we wish to
prove. We usually have two different approaches to what we could make the null and alternative, but in most
cases, it's preferable to make the null what we believe we can reject, and then attempt to reject it. For example,
in our case of a one-tailed test with the return hypothesized to be greater than 8%, we could make the greater-
than case the null (alternative being less than), or we could make the greater-than case the alternative (with less
than the null). Which should we choose? A hypothesis test is typically designed to look for evidence that may
possibly reject the null. So in this case, we would make the null hypothesis "the return is less than or equal to
8%", which means we are looking for observations in the left tail. If we reject the null, then the alternative is
true, and we conclude the fund is likely to return at least 8%.

Test Statistic
Step #2 in our seven-step process involves identifying an appropriate test statistic. In hypothesis testing, a test
statistic is defined as a quantity taken from a sample that is used as the basis for testing the null hypothesis
(rejecting or failing to reject the null).
Calculating a test statistic will vary based upon the case and our choice of probability distribution (for example,
t-test, z-value). The general format of the calculation is:

Formula 2.36

Test statistic = [(sample statistic) – (value of the parameter according to the null)] / (standard error of the sample
statistic)

Type I and Type II Errors


Step #3 in hypothesis testing involves specifying the significance level of our hypothesis test. The significance
level is similar in concept to the confidence level associated with estimating a parameter – both involve
choosing the probability of making an error (denoted by α, or alpha), with lower alphas reducing the percentage
probability of error. In the case of estimators, the tradeoff of reducing this error was to accept a wider (less
precise) confidence interval. In the case of hypothesis testing, choosing lower alphas also involves a tradeoff –
in this case, increasing a second type of error.

Errors in hypothesis testing come in two forms: Type I and Type II. A type I error is defined as rejecting the
null hypothesis when it is true. A type II error is defined as not rejecting the null hypothesis when it is false. As
the table below indicates, these errors represent two of the four possible outcomes of a hypothesis test:

                            Null is true        Null is false
Reject the null             Type I error        Correct decision
Fail to reject the null     Correct decision    Type II error

The reason for separating type I and type II errors is that, depending on the case, a type I error may carry serious
consequences, while in other cases it is the type II error that must be avoided; it is important to understand
which type is more important to avoid in a given situation.

Significance Level
Denoted by α, or alpha, the significance level is the probability of making a type I error, or the probability that
we will reject the null hypothesis when it is true. So if we choose a significance level of 0.05, it means there is a
5% chance of making a type I error. A 0.01 significance level means there is just a 1% chance of making a type
I error. As a rule, the significance level is specified prior to calculating the test statistic; otherwise, the analyst
conducting the research might use the result of the calculation to influence the choice of significance level
(prompting a shift to a higher or lower significance), which would take away from the objectivity of the test.

While any level of alpha is permissible, in practice there is likely to be one of three possibilities for significance
level: 0.10 (semi-strong evidence for rejecting the null hypothesis), 0.05 (strong evidence), and 0.01 (very
strong evidence). Why wouldn't we always opt for 0.01 or even lower probabilities of type I errors – isn't the
idea to reduce and eliminate errors? In hypothesis testing, we have to control two types of errors, with a tradeoff
that when one type is reduced, the other type is increased. In other words, by lowering the chances of a type I
error, we must reject the null less frequently – including when it is false (a type II error). Actually quantifying
this tradeoff is impossible because the probability of a type II error (denoted by β, or beta) is not easy to define
(i.e. it changes for each value of θ). Only by increasing sample size can we reduce the probability of both types
of errors.

Decision Rule
Step #4 in the hypothesis-testing process requires stating a decision rule. This rule is crafted by comparing two
values: (1) the result of the calculated value of the test statistic, which we will complete in step #5 and (2) a
rejection point, or critical value (or values) that is (are) the function of our significance level and the probability
distribution being used in the test. If the calculated value of the test statistic is as extreme (or more extreme)
than the rejection point, then we reject the null hypothesis, and state that the result is statistically significant.
Otherwise, if the test statistic does not reach the rejection point, then we cannot reject the null hypothesis and
we state that the result is not statistically significant. A rejection point depends on the probability distribution,
on the chosen alpha, and on whether the test is one-tailed or two-tailed.

For example, if in our case we are able to use the standard normal distribution (the z-value), if we choose an
alpha of 0.05, and we have a two-tailed test (i.e. reject the null hypothesis when the test statistic is either above
or below), the two rejection points are taken from the z-values for standard normal distributions: below -1.96
and above +1.96. Thus if the calculated test statistic is in these two rejection ranges, the decision would be to
reject the null hypothesis. Otherwise, we fail to reject the null hypothesis.
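
A minimal sketch of this decision rule, using hypothetical sample figures (sample mean 10.5%, hypothesized mean 10%, standard error 0.4%) rather than numbers from the text:

sample_mean, hypothesized_mean, std_error = 10.5, 10.0, 0.4
critical = 1.96                                                   # z rejection points at alpha = 0.05, two-tailed

test_statistic = (sample_mean - hypothesized_mean) / std_error    # Formula 2.36
if abs(test_statistic) > critical:
    print(f"z = {test_statistic:.2f}: reject the null (statistically significant)")
else:
    print(f"z = {test_statistic:.2f}: fail to reject the null")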

Look Out!

Traditionally, it was said that we "accepted" the null hypothesis; however, the authors discourage use of
the word "accept" in connection with the null hypothesis, as it implies a greater degree of conviction about
the null than is warranted. Since the reference text makes this distinction, do not be surprised if this subtle
change (which seems inconsequential on the surface) finds its way onto the CFA exam: answer "accept the
null hypothesis" and you get the question wrong; answer "fail to reject the null hypothesis" and you score
the points.

Power of a Test
The power of a hypothesis test refers to the probability of correctly rejecting the null hypothesis. There are two
possible outcomes when the null hypothesis is false: either we (1) reject it (as we correctly should) or (2) we fail
to reject it – and make a type II error. Thus the power of a test is equivalent to 1 minus beta (β), the probability
of a type II error. Since beta isn't easily quantified, neither is the power of a test. For hypothesis tests, it is
sufficient to specify significance level, or alpha. However, given a choice between more than one test statistic
(for example, z-test, t-test), we will always choose the test that increases a test's power, all other factors equal.

Confidence Intervals vs. Hypothesis Tests


Confidence intervals, as a basis for estimating population parameters, were constructed as a function of the
"number of standard deviations away from the mean". For example, for 95% confidence that our interval will
include the population mean (μ), when we use the standard normal distribution (z-statistic), the interval is
(sample mean) ± 1.96*(standard error), or, equivalently,
(sample mean) – 1.96*(standard error) < μ < (sample mean) + 1.96*(standard error).

Hypothesis tests, as a basis for testing the value of population parameters, are also set up to reject or not reject
based on the "number of standard deviations away from the mean". The basic structure for testing the null
hypothesis at the 5% significance level, again using the standard normal, is -1.96 < [(sample mean –
hypothesized population mean) / standard error] < +1.96, or, equivalently, -1.96*(std. error) < (sample mean) –
(hypo. pop. mean) < +1.96*(std. error).

In hypothesis testing, we essentially create an interval within which the null will not be rejected, and we are
95% confident in this interval (i.e. there's a 5% chance of a type I error). By slightly rearranging terms, the
structure for a confidence interval and the structure for rejecting/not rejecting a null hypothesis appear very
similar – an indication of the relationship between the concepts.

Making a Statistical Decision


Step #6 in hypothesis testing involves making the statistical decision, which actually compares the test statistic
to the value computed as the rejection point; that is, it carries out the decision rule created in step #4. For
example, with a significance level of 0.05, using the standard normal distribution, on a two-tailed test (i.e. null
is "equal to"; alternative is not equal to), we have rejection points below –1.96 and above +1.96. If our
calculated test statistic
[(sample mean – hypothesized mean) / standard error] = 0.6, then we cannot reject the null hypothesis. If the
calculated value is 3.6, we reject the null hypothesis and accept the alternative.

The final step, or step #7, involves making the investment or economic decision (i.e. the real-world decision). In
this context, the statistical decision is but one of many considerations. For example, take a case where we
created a hypothesis test to determine whether a mutual fund outperformed its peers in a statistically significant
manner. For this test, the null hypothesis was that the fund's mean annual return was less than or equal to a
category average; the alternative was that it was greater than the average. Assume that at a significance level of
0.05, we were able to establish statistical significance and reject the null hypothesis, thus accepting the
alternative. In other words, our statistical decision was that this fund would outperform peers, but what is the
investment decision? The investment decision would likely take into account (for example) the risk tolerance of
the client and the volatility (risk) measures of the fund, and it would assess whether transaction costs and tax
implications make the investment decision worth making. In other words, rejecting/not rejecting a null
hypothesis does not automatically require that a decision be carried out; thus there is the need to assess the
statistical decision and the economic or investment decision in two separate steps.
2.25 - Interpreting Statistical Results
Results Where Data is Normally Distributed and Variance is Known or Unknown

1. Whenever the variance of a population (σ²) is known, the z-test is the preferred choice for testing a
hypothesis about the population mean (μ). To compute the test statistic, the standard error is equal to the
population standard deviation divided by the square root of the sample size. For example, with a population
variance of 64 and a sample size of 25, the standard error is equal to (64)^(1/2)/(25)^(1/2) = 8/5, or 1.6.

Example: Test Statistic


Suppose that in this same case we have constructed a hypothesis test that the mean annual return is equal to 12%;
that is, we have a two-tailed test, where the null hypothesis is that the population mean = 12, and the alternative
is that it is not equal to 12. Using a 0.05 significance level (0.025 for each tail), our rule is to reject the null
when the test statistic is either below -1.96 or above +1.96 (at p = 0.025, z = 1.96). Suppose the sample mean = 10.6.

Answer:
Test statistic = (10.6 – 12)/1.6 = -1.4/1.6 = -0.875. This value does not fall in either rejection region (it is
neither below -1.96 nor above +1.96), so we cannot reject the null hypothesis. (Both this z-test and the t-test
example that follows are reproduced in the short sketch after this list.)
2. When we are making hypothesis tests on a population mean, it is relatively likely that the population
variance will be unknown. In these cases, we use the sample standard deviation when computing the standard
error, and the t-statistic for the decision rule (i.e. as the source of our rejection points). Compared to the z,
or standard normal, the t-statistic is more conservative (i.e. it has higher rejection points for rejecting the null
hypothesis). In cases with large sample sizes (at least 30), the z-statistic may be substituted.

Example:
Take a case where sample size is 16. In this case, the t-stat is the only appropriate choice. For the t-
distribution, degrees of freedom are calculated as (sample size – 1), df = 15 in this example. In this case,
assume we are testing a hypothesis that a population mean is greater than 8, so this will be a one-tailed
test (right tail): null hypothesis is μ < 8, and the alternative is that μ > 8. Our required significance level
is 0.05. Using the table for Student's t-distribution for df = 15 and p = 0.05, the critical value (rejection
point) is 1.753. In other words, if our calculated test statistic is greater than 1.753, we reject the null
hypothesis.

Answer:
Moving to step 5 of the hypothesis-testing process, we take a sample where the mean is 8.3 and the
standard deviation is 6.1. For this sample, standard error = s/n^(1/2) = 6.1/(16)^(1/2) = 6.1/4 = 1.53. The test
statistic is (8.3 – 8.0)/1.53 = 0.3/1.53, or 0.196. Comparing 0.196 to our rejection point of 1.753, we are
unable to reject the null hypothesis.

Note that in this case our sample mean of 8.3 was actually greater than 8; however, the hypothesis test
is set up to require statistical significance, not simply to compare a sample mean to the hypothesized value. In
other words, the decisions made in hypothesis testing are also a function of the sample size (which, at 16, is
low), the standard deviation, the required level of significance and the t-distribution. Our interpretation
in this example is that the sample mean of 8.3, while nominally higher than 8, simply isn't
significantly higher than 8 – at least not to the point where we could draw a definitive
conclusion about the population mean being greater than 8.
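
Both of the examples above can be checked with a short Python sketch (SciPy is our own illustrative choice, not something the exam assumes); it reproduces the z-test with known variance and the one-sample t-test with unknown variance:

```python
# A sketch of the two examples above: (1) z-test with known population variance,
# (2) one-sample t-test with unknown variance.
from math import sqrt
from scipy.stats import norm, t

# Example 1: population variance 64, n = 25, H0: mu = 12, sample mean 10.6
std_error = sqrt(64) / sqrt(25)                 # 1.6
z_stat = (10.6 - 12) / std_error                # -0.875
z_crit = norm.ppf(0.975)                        # ~1.96, two-tailed test at alpha = 0.05
print(round(z_stat, 3), abs(z_stat) > z_crit)   # -0.875 False -> cannot reject H0

# Example 2: unknown variance, n = 16, H0: mu <= 8, sample mean 8.3, sample std dev 6.1
n = 16
std_error = 6.1 / sqrt(n)                  # 1.525
t_stat = (8.3 - 8.0) / std_error           # ~0.197 (the text rounds to 0.196)
t_crit = t.ppf(0.95, df=n - 1)             # ~1.753, right-tailed test at alpha = 0.05
print(round(t_stat, 3), t_stat > t_crit)   # 0.197 False -> cannot reject H0
```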

Equality of the Population Means of Two Normally Distributed Populations, Based on Independent Random
Samples, Where Variances Are Assumed Equal or Unequal
For the case where the population variances of the two groups can be assumed to be equal, a pooled estimate of the
population variance (sp²) is computed from the sample data by the following formula (which assumes two
independent random samples):

Formula 2.37

sp² = [(n1 – 1)*s1² + (n2 – 1)*s2²] / (n1 + n2 – 2)

Where: n1, n2 are the sample sizes, and s1², s2² are the sample variances.
Degrees of freedom = n1 + n2 – 2

For testing the equality of two population means (i.e. μ1 = μ2), the test statistic is the difference in sample
means (X1 – X2) divided by the standard error: the square root of (sp²/n1 + sp²/n2).

Example: Population Means


Assume that the pooled estimate of variance (sp²) is 40 and the sample size for each group is 20. Standard error
= (40/20 + 40/20)^(1/2) = (4)^(1/2) = 2.

Answer:
If the sample means are 8.6 and 8.9, then t = (8.6 – 8.9)/2 = -0.3/2 = -0.15. Tests of equality/inequality are two-
sided tests. With df = 38 (sum of the sample sizes – 2) and a 0.05 significance level (p = 0.025 in each tail), the
rejection levels are t < -2.024 and t > +2.024. Since our computed test statistic is –0.15, we cannot reject the null
hypothesis that these population means are equal.
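
The pooled-variance example above can be verified with a brief sketch (again using SciPy purely as an illustration):

```python
# A sketch of the pooled-variance two-sample t-test, using the example's summary
# statistics (pooled variance 40, n1 = n2 = 20, sample means 8.6 and 8.9).
from math import sqrt
from scipy.stats import t

n1 = n2 = 20
pooled_var = 40.0
std_error = sqrt(pooled_var / n1 + pooled_var / n2)   # 2.0
t_stat = (8.6 - 8.9) / std_error                      # -0.15
df = n1 + n2 - 2                                      # 38
t_crit = t.ppf(0.975, df)                             # ~2.024, two-tailed at alpha = 0.05
print(round(t_stat, 2), abs(t_stat) > t_crit)         # -0.15 False -> cannot reject H0
```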
For hypothesis tests of equal population means where the variances cannot be assumed to be equal, the
appropriate test statistic is still the t-statistic, but we can no longer pool an estimate of the variance; the
standard error becomes the square root of [(s1²/n1) + (s2²/n2)]. The null hypothesis remains μ1 = μ2, and the
test statistic is calculated as in the previous example (i.e. difference in sample means / standard error). The
degrees of freedom are approximated by a more involved formula, which is not reproduced here.

Look Out!

Note: Don't spend time memorizing that degrees-of-freedom approximation; it won't be
required for the exam. Focus instead on the steps of hypothesis testing and on interpreting
the results.

The Paired-Comparisons Test


The previous example tested the equality or inequality of two population means, with a key assumption that the
two populations were independent of each other. In a paired-comparisons test, the two populations have some
degree of correlation or co-movement, and the calculation of test statistic takes account of this correlation.

Take a case where we are comparing two mutual funds that are both classified as large-cap growth, and we
are testing whether the returns of one are significantly above the other's (in a statistically significant sense). The
paired-comparisons test is appropriate since we assume some degree of correlation: the returns of both funds
depend on the market. To calculate the t-statistic, we first find the sample mean difference, denoted d̄ (d-bar):

d̄ = (1/n)(d1 + d2 + d3 + … + dn), where n is the number of paired observations (in our example, the number of
quarters for which we have quarterly returns), and each di is the difference between a pair of observations in the
sample. Next, the sample variance, the sum of the squared deviations from d̄ divided by (n – 1), is calculated, with
the standard deviation (sd) being the positive square root of the variance. Standard error = sd/(n)^(1/2).

For our mutual fund example, suppose we have mean returns for 10 years (40 quarters of data), a sample mean
difference of 2.58 and a sample standard deviation of 5.32. Our test statistic is computed as (2.58)/((5.32)/
(40)^(1/2)), or 3.067. At 39 degrees of freedom with a 0.05 significance level, the rejection point is approximately
2.02. Thus we reject the null hypothesis and state that there is a statistically significant difference in returns
between these funds.
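
With the raw quarterly return differences in hand, scipy.stats.ttest_rel would run the whole paired test; using only the summary statistics quoted above, a minimal sketch looks like this (SciPy is our own assumption, not the curriculum's):

```python
# A sketch of the paired-comparisons test using the example's summary statistics:
# n = 40 quarterly differences, mean difference 2.58, standard deviation 5.32.
from math import sqrt
from scipy.stats import t

n = 40
mean_diff, sd_diff = 2.58, 5.32
std_error = sd_diff / sqrt(n)              # ~0.841
t_stat = mean_diff / std_error             # ~3.07
t_crit = t.ppf(0.975, df=n - 1)            # ~2.02, two-tailed at alpha = 0.05
print(round(t_stat, 3), t_stat > t_crit)   # 3.067 True -> reject H0
```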

Hypothesis Tests on the Variance of a Normally Distributed Population


Hypothesis tests concerning the value of a variance (σ2) start by formulating the null and alternative hypotheses.

In hypothesis tests for the variance on a single normally distributed population, the appropriate test statistic is
known as a “chi-square”, denoted by χ2. Unlike the distributions we have been using previously, the chi-square
is asymmetrical as it is bound on the left by zero. (This must be true since variance is always a positive
number.) The chi-square is actually a family of distributions similar to the t-distributions, with different degrees
of freedom resulting in a different chi-square distribution.

Formula 2.38

The test statistic is χ² = (n – 1)*s² / σ0²

Where: n = sample size, s² = sample variance, σ0² = hypothesized value of the population variance

Sample variance s² is the sum of the squared deviations between the observed values and the sample mean,
divided by the degrees of freedom, n – 1.

Example: Hypothesis Testing w/ Chi Squared Statistic


To illustrate a hypothesis test using the chi-square statistic, take the example of a fund that we believe has been
very volatile relative to the market, where we wish to show that its level of risk (as measured by the quarterly
standard deviation of returns) is greater than the market's average. For our test, we assume the market's quarterly
standard deviation is 10%.

Our test will examine quarterly returns over the past five years, so n = 20 and degrees of freedom = 19. Our test
is a greater-than test with a null hypothesis of σ² ≤ (10)², or 100, and an alternative hypothesis of σ² > 100.
Using a 0.05 level of significance, our rejection point, from the chi-square tables with df = 19 and p = 0.05 in
the right tail, is 30.144. Thus if our calculated test statistic is greater than 30.144, we reject the null hypothesis
at the 5% level of significance.

Answer:
Examining the quarterly returns for this period, we find that our sample variance (s²) is 135. With n = 20 and σ0² =
100, we have all the data required to calculate the test statistic.

χ² = ((n – 1)*s²)/σ0² = ((20 – 1)*135)/100 = 2565/100, or 25.65.

Since 25.65 is less than our critical value of 30.144, we do not have enough evidence to reject the null
hypothesis. While this fund may indeed be quite volatile, its volatility is not statistically significantly greater than
the market average for the period.
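
For those who want to reproduce the numbers, a minimal sketch (SciPy assumed) of the chi-square variance test follows:

```python
# A sketch of the chi-square test of a single variance, using n = 20,
# sample variance 135 and hypothesized variance 100 from the example.
from scipy.stats import chi2

n, sample_var, hypo_var = 20, 135.0, 100.0
chi_stat = (n - 1) * sample_var / hypo_var       # 25.65
chi_crit = chi2.ppf(0.95, df=n - 1)              # ~30.14, right tail at alpha = 0.05
print(round(chi_stat, 2), chi_stat > chi_crit)   # 25.65 False -> cannot reject H0
```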

Hypothesis Tests Relating to the Equality of the Variances of Two Normally Distributed Populations,
Where Both Samples Are Random and Independent
For hypothesis tests concerning the relative values of the variances of two populations – whether σ1² (variance of
the first population) and σ2² (variance of the second) are equal, not equal, greater than or less than one another – we can
construct the hypotheses in one of three ways.

When a hypothesis test compares variances from two populations and we can assume that random samples from
the populations are independent (uncorrelated), the appropriate test is the F-test, which is based on the ratio of the
sample variances. As with the chi-square, the F-distribution is a family of asymmetrical distributions (bound on
the left by zero). The F-family of distributions is defined by two values for degrees of freedom: the numerator
(df1) and the denominator (df2). Each of the degrees of freedom is taken from the corresponding sample size (each
sample size – 1).
The F-statistic computed from the sample data can be either s1²/s2² or s2²/s1², with the convention being to use whichever
ratio produces the larger number. This way, the F-test need only be concerned with values greater than 1, since
one of the two ratios will always be a number of at least 1.

Example: Hypothesis Testing w/ Ratio of Sample Variances


To illustrate, take the case of two mutual funds. Fund A has enjoyed greater performance returns than Fund B
(which we've owned, unfortunately). Our hypothesis is that the level of risk of these two funds is actually quite
similar, which would mean Fund A has superior risk-adjusted results. We test the hypothesis using the past five years of
quarterly data (df is 19 for both numerator and denominator). Using 0.05 significance for this two-tailed test of
equality (0.025 in the upper tail), our critical value from the F-tables is 2.51. Assume from the five-year sample
that quarterly standard deviations have been 8.5 for Fund A and 6.3 for Fund B.

Answer:
Our F-statistic is (8.5)²/(6.3)² = 72.25/39.69 = 1.82.

Since 1.82 does not reach the rejection level of 2.51, we cannot reject the null hypothesis, and we state that the
risk levels of these two funds are not significantly different.
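
The same F-test can be sketched in a few lines (SciPy again assumed; its lookup may differ slightly from a printed table at the last decimal):

```python
# A sketch of the F-test on the ratio of two sample variances, df = 19 and 19.
from scipy.stats import f

s_a, s_b = 8.5, 6.3
f_stat = max(s_a, s_b) ** 2 / min(s_a, s_b) ** 2   # larger variance on top: ~1.82
f_crit = f.ppf(0.975, dfn=19, dfd=19)              # ~2.5 (two-tailed at alpha = 0.05; the text's table gives 2.51)
print(round(f_stat, 2), f_stat > f_crit)           # 1.82 False -> cannot reject H0
```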

Concepts from the hypothesis-testing section are unlikely to be tested by rigorous exercises in number
crunching but rather in identifying the unique attributes of a given statistic. For example, a typical question
might ask, “In hypothesis testing, which test statistic is defined by two degrees of freedom, the numerator and
the denominator?”, giving you these choices: A. t-test, B. z-test, C. chi-square, or D. F-test. Of course, the
answer would be D. Another question might ask, “Which distribution is NOT symmetrical?”, and then give you
these choices: A. t, B. z, C. chi-square, D. normal. Here the answer would be C. Focus on the defining
characteristics, as they are the most likely source of exam questions.

Parametric and Nonparametric Tests


All of the hypothesis tests described thus far have been designed, in one way or another, to test the predicted
value of one or more parameters – unknown quantities such as the mean and variance that characterize a population
and whose observed values are assumed to be distributed in a certain way. Indeed, these specific assumptions are
mandatory and very important: most of the commonly applied tests are built on the assumption that the
underlying population is normally distributed, and if that is not true, the conclusions reached are invalid. The less
normal the population (i.e. the more skewed the data), the less these parametric tests or procedures should be
relied upon.

Nonparametric hypothesis tests are designed for cases where either (a) fewer or different assumptions about
the population data are appropriate, or (b) where the hypothesis test is not concerned with a population
parameter.

In many cases, we are curious about a set of data but believe that the required assumptions (for example,
normally distributed data) do not apply to this example, or else the sample size is too small to comfortably make
such an assumption. A number of nonparametric alternatives have been developed to use in such cases. The
table below indicates a few examples that are analogous to common parametric tests.

Concern of hypothesis          Parametric test                    Nonparametric test

Single mean                    t-test, z-test                     Wilcoxon signed-rank test

Differences between means      t-test (or approximate t-test)     Mann-Whitney U-test

Paired comparisons             t-test                             Sign test, or Wilcoxon signed-rank test
Source: DeFusco, McLeavey, Pinto, Runkle, Quantitative Methods for Investment Analysis, 2nd edition, Chapter 7, p. 357.

A number of these tests are constructed by first converting the data into ranks (first, second, third, etc.) and then
fitting the ranks into the test. One such test, applied to testing correlation (the degree to which two variables are
related to each other), is the Spearman rank correlation coefficient. The Spearman test is useful in cases where a
normal distribution cannot be assumed – usually when a variable is bound by zero (always positive), or where
the range of values is limited. For the Spearman test, each observation of the two variables is ranked from
largest to smallest, and the differences between the ranks are measured. The ranks are then used to find the
test statistic rs = 1 – [6*(sum of squared rank differences) / (n*(n² – 1))]. This result is compared to a rejection point
(based on the Spearman rank correlation distribution) to determine whether to reject or not reject the null hypothesis.
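
As an illustration only (the data below are hypothetical, and SciPy is not part of the curriculum), the Spearman test is one line in Python:

```python
# A minimal sketch of the Spearman rank correlation on two hypothetical data series.
# scipy.stats.spearmanr ranks the observations and returns r_s with a p-value.
from scipy.stats import spearmanr

x = [1.2, 3.4, 2.2, 5.1, 4.0, 0.8]   # hypothetical observations of variable X
y = [0.9, 2.8, 2.5, 4.9, 3.6, 1.1]   # hypothetical observations of variable Y

r_s, p_value = spearmanr(x, y)
print(round(r_s, 3), round(p_value, 4))   # reject H0 of zero correlation only if p_value < alpha
```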

Another situation requiring a nonparametric approach is to answer a question about something other than a
parameter. For example, analysts often wish to address whether a sample is truly random or whether the data
have a pattern indicating that it is not random (tested with the so-called “runs test”). Tests such as Kolmogorov-
Smirnov find whether a sample comes from a population that is distributed a certain way. Most of these
nonparametric examples are specialized and unlikely to be tested in any detail on the CFA Level I exam.
2.26 - Correlation and Regression
Financial variables are often analyzed for their correlation with other variables and/or market averages. The
relative degree of co-movement can serve as a powerful predictor of the future behavior of a variable. A sample
covariance and a correlation coefficient are tools used to indicate the strength of the relationship between two
variables, while a linear regression is a technique designed to quantify that relationship and to test whether one
variable helps explain, or depends on, the other. When you are analyzing a security, if returns are found to be
significantly dependent on a market index or some other independent source, then both return and risk can be better
explained and understood.

Scatter Plots
A scatter plot is designed to show a relationship between two
variables by graphing a series of observations on a two-
dimensional graph – one variable on the X-axis, the other on the
Y-axis.

Figure 2.15: Scatter Plot


Sample Covariance
To quantify a linear relationship between two variables, we start by finding the covariance of a sample of
paired observations. A sample covariance between two random variables X and Y is the average value of the
cross-product of all observed deviations from each respective sample mean. A cross-product, for the ith
observation in a sample, is found by this calculation: (ith observation of X – sample mean of X) * (ith
observation of Y – sample mean of Y). The covariance is the sum of all cross-products, divided by (n – 1).

To illustrate, take a sample of five paired observations of annual returns for two mutual funds, which we will
label X and Y:

Year      X return   Y return   Cross-Product: (Xi – Xmean)*(Yi – Ymean)

1st       +15.5      +9.6       (15.5 – 6.6)*(9.6 – 7.3) = 20.47

2nd       +10.2      +4.5       (10.2 – 6.6)*(4.5 – 7.3) = -10.08

3rd       -5.2       +0.2       (-5.2 – 6.6)*(0.2 – 7.3) = 83.78

4th       -6.3       -1.1       (-6.3 – 6.6)*(-1.1 – 7.3) = 108.36

5th       +18.7      +23.5      (18.7 – 6.6)*(23.5 – 7.3) = 196.02

Sum       32.9       36.7       398.55

Average   6.6        7.3        398.55/(n – 1) = 99.64 = Cov(X,Y)

Average X and Y returns were found by dividing each sum by n, or 5, while the average of the cross-products is
computed by dividing their sum by n – 1, or 4. The use of n – 1 for the covariance is done by statisticians to ensure
an unbiased estimate.

Interpreting a covariance number is difficult for those who are not statistical experts. The 99.64 we computed
for this example is in units of "returns squared", since the inputs were percentage returns, and a return
squared is not an intuitive concept. The fact that Cov(X,Y) of 99.64 is greater than 0 does indicate a positive
linear relationship between X and Y. Had the covariance been a negative number, it would imply an inverse
relationship, while 0 would mean no linear relationship. Thus 99.64 indicates that the returns have positive co-movement
(when one moves higher, so does the other), but it doesn't offer any information on the extent of the co-movement.

Sample Correlation Coefficient


By calculating a correlation coefficient, we essentially convert a raw covariance number into a standard format
that can be more easily interpreted to determine the extent of the relationship between two variables. The
formula for calculating a sample correlation coefficient (r) between two random variables X and Y is the following:

Formula 2.39

r = (covariance between X and Y) / [(sample standard deviation of X) * (sample standard deviation of Y)]

Example: Correlation Coefficient


Return to our example from the previous section, where the covariance was found to be 99.64. To find the
correlation coefficient, we must also compute the sample variances, as illustrated in the table below.

Year      X return   Y return   Squared X deviations      Squared Y deviations

1st       +15.5      +9.6       (15.5 – 6.6)² = 79.21     (9.6 – 7.3)² = 5.29

2nd       +10.2      +4.5       (10.2 – 6.6)² = 12.96     (4.5 – 7.3)² = 7.84

3rd       -5.2       +0.2       (-5.2 – 6.6)² = 139.24    (0.2 – 7.3)² = 50.41

4th       -6.3       -1.1       (-6.3 – 6.6)² = 166.41    (-1.1 – 7.3)² = 70.56

5th       +18.7      +23.5      (18.7 – 6.6)² = 146.41    (23.5 – 7.3)² = 262.44

Sum       32.9       36.7       544.23                    396.54

Average   6.6        7.3        136.06 = X variance       99.14 = Y variance

Answer:
As with the sample covariance, we use (n – 1) as the denominator in calculating sample variance (with the sum of
squared deviations as the numerator) – thus in the above example, each sum was divided by 4 to find the variance.
Standard deviation is the positive square root of variance: in this example, the sample standard deviation of X is
(136.06)^(1/2), or 11.66, and the sample standard deviation of Y is (99.14)^(1/2), or 9.96.

Therefore, the correlation coefficient is 99.64/(11.66*9.96) = 0.858. A correlation coefficient is a value between
–1 (perfect inverse relationship) and +1 (perfect positive linear relationship) – the closer it is to +1 or –1, the
stronger the relationship. This example computed a value of 0.858, which suggests a strong positive linear relationship.
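
Both tables can be reproduced in a few lines of Python (NumPy is our own choice of tool, not the curriculum's); note that NumPy works with the unrounded sample means, so the results agree with the tables to rounding:

```python
# A sketch verifying the sample covariance and correlation from the tables above.
import numpy as np

x = np.array([15.5, 10.2, -5.2, -6.3, 18.7])   # fund X annual returns (%)
y = np.array([9.6, 4.5, 0.2, -1.1, 23.5])      # fund Y annual returns (%)

cov_xy = np.cov(x, y)[0, 1]                    # ~99.64 (n - 1 in the denominator)
r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))   # ~0.858
print(round(cov_xy, 2), round(r, 3))
print(round(np.corrcoef(x, y)[0, 1], 3))       # same correlation from NumPy directly
```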

Hypothesis Testing: Determining Whether a Positive or Inverse Relationship Exists Between Two
Random Variables
A hypothesis-testing procedure can be used to determine whether there is a positive relationship or an inverse
relationship between two random variables. This test uses each step of the hypothesis-testing procedure,
outlined earlier in this study guide. For this particular test, the null hypothesis, or H0, is that the correlation in
the population is equal to 0. The alternative hypothesis, Ha, is that the correlation is different from 0. The t-test
is the appropriate test statistic. Given a sample correlation coefficient r, and sample size n, the formula for the
test statistic is this:

t = r*(n – 2)^(1/2) / (1 – r²)^(1/2), with degrees of freedom = n – 2, since we have two variables.


Testing whether a correlation coefficient is equal or not equal to 0 is a two-tailed test. In our earlier example with a
sample of 5, degrees of freedom = 5 – 2 = 3, and our rejection point from the t-distribution, at a significance
level of 0.05, would be 3.182 (p = 0.025 for each tail).

Using our computed sample r of 0.858, t = r*(n – 2)^(1/2)/(1 – r²)^(1/2) = (0.858)*(3)^(1/2)/(1 – (0.858)²)^(1/2) =
1.486/0.514 = 2.891. Comparing 2.891 to our rejection point of 3.182, we do not have enough evidence to reject the
null hypothesis that the population correlation coefficient is 0. In this case, while it does appear that there is a
strong linear relationship between our two variables (and thus we may well be risking a type II error), the results
of the hypothesis test show the effects of a small sample size; that is, we had just three degrees of freedom,
which required a high rejection level for the test statistic in order to reject the null hypothesis. Had there been
one more observation in our sample (i.e. degrees of freedom = 4), the rejection point would have been
2.776 and we would have rejected the null, accepting that the population correlation is likely to be significantly
different from 0. The level of significance also plays a role in this hypothesis test: in this particular
example, we would reject the null hypothesis at a 0.1 level of significance, where the rejection level is
any test statistic higher than 2.353.
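
A short sketch of this significance test (SciPy assumed) makes the small-sample effect easy to see:

```python
# A sketch of the t-test on a sample correlation coefficient: r = 0.858, n = 5.
from math import sqrt
from scipy.stats import t

r, n = 0.858, 5
df = n - 2
t_stat = r * sqrt(df) / sqrt(1 - r ** 2)        # ~2.89 (the text rounds to 2.891)
t_crit = t.ppf(0.975, df)                       # ~3.182, two-tailed at alpha = 0.05
print(round(t_stat, 3), abs(t_stat) > t_crit)   # 2.893 False -> cannot reject H0
```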

Of course, a hypothesis test is designed around the particular example and the assumptions required before the
test statistic is calculated, so it stands that the null could not be rejected in this case. The hypothesis-testing
exercise gives us a tool for attaching statistical significance to a sample correlation coefficient while taking the
sample size into account. Thus, even though 0.858 feels close to 1, it is not close enough to draw conclusions
about the correlation of the underlying populations – with the small sample size probably a factor in the test.

2.27 - Regression Analysis

Linear Regression
A linear regression is constructed by fitting a line through
a scatter plot of paired observations between two
variables. The sketch below illustrates an example of a
linear regression line drawn through a series of (X, Y)
observations:

Figure 2.16: Linear Regression


A linear regression line is usually determined quantitatively by a best-fit procedure such as least squares (i.e.
the distance between the regression line and every observation is minimized). In linear regression, one variable
is plotted on the X axis and the other on the Y. The X variable is said to be the independent variable, and the Y
is said to be the dependent variable. When analyzing two random variables, you must choose which variable is
independent and which is dependent. The choice of independent and dependent follows from the hypothesis –
for many examples, this distinction should be intuitive. The most popular use of regression analysis is on
investment returns, where the market index is independent while the individual security or mutual fund is
dependent on the market. In essence, regression analysis formulates a hypothesis that the movement in one
variable (Y) depends on the movement in the other (X).

Regression Equation
The regression equation describes the relationship between two variables and is given by the general format:

Formula 2.40

Y = a + bX + ε

Where: Y = dependent variable; X = independent variable;
a = intercept of the regression line; b = slope of the regression line;
ε = error term

In this format, given that Y is dependent on X, the slope b indicates the unit change in Y for every unit
change in X. If b = 0.66, it means that every time X increases (or decreases) by a certain amount, Y
increases (or decreases) by 0.66 times that amount. The intercept a indicates the value of Y at the point where
X = 0. Thus if X indicated market returns, the intercept would show how the dependent variable
performs when the market has a flat quarter where returns are 0. In investment parlance, a manager is said to have
a positive alpha when a linear regression between the manager's performance and the performance of
the market has an intercept a greater than 0.
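
As a purely illustrative sketch (the return series below are hypothetical, and SciPy is not assumed by the exam), a regression of fund returns on market returns can be fitted as follows; the intercept estimates alpha and the slope estimates beta:

```python
# A minimal sketch fitting Y = a + bX with scipy.stats.linregress on hypothetical data.
from scipy.stats import linregress

market = [2.1, -1.3, 4.0, 0.5, 3.2, -2.4, 1.8, 2.9]   # hypothetical market returns (X, %)
fund = [2.6, -1.0, 4.9, 0.2, 3.9, -3.1, 2.2, 3.4]     # hypothetical fund returns (Y, %)

result = linregress(market, fund)
print(round(result.intercept, 3))    # estimate of a (the fund's alpha)
print(round(result.slope, 3))        # estimate of b (the fund's beta)
print(round(result.rvalue ** 2, 3))  # R-squared of the regression
```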

Linear Regression - Assumptions


Drawing conclusions about the dependent variable requires that we make six assumptions, the classic
assumptions in relation to the linear regression model:

1. The relationship between the dependent variable Y and the independent variable X is linear in the slope
and intercept parameters a and b. This requirement means that neither regression parameter can be
multiplied or divided by another regression parameter (e.g. a/b), and that both parameters are raised to
the first power only. In other words, we can't construct a linear model where the equation is Y = a +
b²X + ε, as unit changes in X would then have a b² effect on Y, and the relation would be nonlinear.
2. The independent variable X is not random.
3. The expected value of the error term "ε" is 0. Assumptions #2 and #3 allow the linear regression model
to produce estimates for slope b and intercept a.
4. The variance of the error term is constant for all observations. Assumption #4 is known as the
"homoskedasticity assumption". When a linear regression is heteroskedastic, its error terms do not have constant
variance and the model may not be useful in predicting values of the dependent variable.
5. The error term ε is uncorrelated across observations; in other words, the covariance between the error
term of one observation and the error term of the other is assumed to be 0. This assumption is necessary
to estimate the variances of the parameters.
6. The distribution of the error terms is normal. Assumption #6 allows hypothesis-testing methods to be
applied to linear-regression models.

Standard Error of Estimate


Abbreviated SEE, this measure gives an indication of how well a linear regression model is working. It
compares actual values in the dependent variable Y to the predicted values that would have resulted had Y
followed exactly from the linear regression. For example, take a case where a company's financial analyst has
developed a regression model relating annual GDP growth to company sales growth by the equation Y = 1.4 +
0.8X.

Assume the following experience over a five-year period; the predicted data is a function of the
model and GDP, and the "actual" data indicates what happened at the company:

Year   GDP growth (Xi)   Predicted co. growth (Ŷi)   Actual co. growth (Yi)   Residual (Yi – Ŷi)   Squared residual

1      5.1               5.5                         5.2                      -0.3                 0.09

2      2.1               3.1                         2.7                      -0.4                 0.16

3      -0.9              0.7                         1.5                      0.8                  0.64

4      0.2               1.6                         3.1                      1.5                  2.25

5      6.4               6.5                         6.3                      -0.2                 0.04

To find the standard error of the estimate, we take the sum of all squared residual terms, divide by (n – 2),
and then take the square root of the result. In this case, the sum of the squared residuals is
0.09 + 0.16 + 0.64 + 2.25 + 0.04 = 3.18. With five observations, n – 2 = 3, and SEE = (3.18/3)^(1/2) = 1.03%.
The computation for the standard error is relatively similar to that of the standard deviation of a sample (n – 2 is used
instead of n – 1). It gives some indication of the predictive quality of a regression model, with lower SEE
numbers indicating that more accurate predictions are possible. However, the standard-error measure doesn't
indicate the extent to which the independent variable explains variations in the dependent variable.
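
The SEE calculation above can be checked with a short sketch (NumPy assumed; the table's fitted values are rounded to one decimal, which is why the text arrives at 1.03):

```python
# A sketch computing the standard error of estimate for the model Y = 1.4 + 0.8X.
import numpy as np

actual = np.array([5.2, 2.7, 1.5, 3.1, 6.3])      # actual company growth (Y)
predicted = np.array([5.5, 3.1, 0.7, 1.6, 6.5])   # fitted values, rounded as in the table
residuals = actual - predicted                    # -0.3, -0.4, 0.8, 1.5, -0.2
see = np.sqrt((residuals ** 2).sum() / (len(actual) - 2))   # sqrt(3.18 / 3) ~ 1.03
print(round(see, 2))
```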

Coefficient of Determination
Like the standard error, this statistic gives an indication of how well a linear-regression model serves as an
estimator of values for the dependent variable. It works by measuring the fraction of total variation in the
dependent variable that can be explained by variation in the independent variable.

In this context, total variation is made up of two pieces:

total variation = explained variation + unexplained variation

Dividing through by total variation gives: 1 = (explained variation / total variation) + (unexplained variation / total variation).

The coefficient of determination is the first of these two ratios – explained variation as a percentage of total
variation. It is sometimes expressed as 1 – (unexplained variation / total variation).

For a simple linear regression with one independent variable, the simple method for computing the coefficient
of determination is squaring the correlation coefficient between the dependent and independent variables. Since
the correlation coefficient is given by r, the coefficient of determination is popularly known as "R², or R-
squared". For example, if the correlation coefficient is 0.76, the R-squared is (0.76)² = 0.578. R-squared terms
are usually expressed as percentages; thus 0.578 would be 57.8%. A second method of computing this number
would be to find the total variation in the dependent variable Y as the sum of the squared deviations from the
sample mean. Next, calculate the standard error of the estimate following the process outlined in the previous
section. The coefficient of determination is then computed by (total variation in Y – unexplained variation in Y)
/ total variation in Y. This second method is necessary for multiple regressions, where there is more than one
independent variable, but for our context we will be provided the r (correlation coefficient) to calculate an R-
squared.

What R² tells us is the proportion of the change in the dependent variable Y that is explained by changes in the
independent variable X. An R² of 57.8% tells us that 57.8% of the variation in Y is explained by X; it also means
that 1 – 57.8%, or 42.2%, of the variation in Y is unexplained by X and is the result of other factors. So the higher
the R-squared, the better the predictive power of the linear-regression model.

Regression Coefficients
For either regression coefficient (intercept a, or slope b), a confidence interval can be determined with the
following information:

1. An estimated parameter value from a sample


2. Standard error of the estimate (SEE)
3. Significance level for the t-distribution
4. Degrees of freedom (which is sample size – 2)

For a slope coefficient, the confidence interval is given by b ± tc*sb, where tc is the critical t-value at our chosen
significance level and sb is the standard error of the slope coefficient (itself derived from the SEE).

To illustrate, take a linear regression with a mutual fund's returns as the dependent variable and the S&P 500
index as the independent variable. For five years of quarterly returns, the slope coefficient b is found to be 1.18,
with a standard error of the coefficient of 0.147. Student's t-distribution for 18 degrees of freedom (20 quarters –
2) at a 0.05 significance level gives a critical value of 2.101. This gives us a confidence interval of 1.18 ± (0.147)*(2.101), or a
range of 0.87 to 1.49. Our interpretation is that there is only a 5% chance that the slope of the population is
either less than 0.87 or greater than 1.49 – we are 95% confident that this fund is at least 87% as volatile as the
S&P 500, but no more than 149% as volatile, based on our five-year sample.
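
The interval can be reproduced in a few lines (SciPy assumed):

```python
# A sketch of the 95% confidence interval for the slope: b = 1.18, standard error 0.147, df = 18.
from scipy.stats import t

b, se_b, df = 1.18, 0.147, 18
t_crit = t.ppf(0.975, df)                 # ~2.101
lower, upper = b - t_crit * se_b, b + t_crit * se_b
print(round(lower, 2), round(upper, 2))   # ~0.87 to ~1.49
```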

Hypothesis testing and Regression Coefficients


Regression coefficients are frequently tested using the hypothesis-testing procedure. Depending on what the
analyst intends to prove, we can test a slope coefficient to determine whether it explains changes in the
dependent variable, and the extent to which it explains those changes. Betas (slope coefficients) can be tested to
determine whether they are above or below 1 (more volatile or less volatile than the market). Alphas (the intercept coefficient) can
be tested on a regression between a mutual fund and the relevant market index to determine whether there is
evidence of a sufficiently positive alpha (suggesting value added by the fund manager).

The mechanics of hypothesis testing are similar to the examples we have used previously. A null hypothesis is
chosen based on a not-equal-to, greater-than or less-than case, with the alternative covering all values not
included in the null. Suppose that in our previous example, where we regressed a mutual fund's returns on the
S&P 500 for 20 quarters, our hypothesis is that this mutual fund is more volatile than the market. A fund equal
in volatility to the market will have a slope b of 1.0, so for this hypothesis test we state the null hypothesis (H0) as
the case where the slope is less than or equal to 1.0 (i.e. H0: b ≤ 1.0). The alternative hypothesis Ha is b > 1.0. We
know that this is a greater-than case (i.e. one-tailed); if we assume a 0.05 significance level, the critical t is 1.734
at degrees of freedom = n – 2 = 18.

Example: Interpreting a Hypothesis Test


From our sample, we had an estimated b of 1.18 and a standard error of 0.147. Our test statistic is computed with
this formula: t = (estimated coefficient – hypothesized coefficient) / standard error = (1.18 – 1.0)/0.147 = 0.18/0.147,
or t = 1.224.

For this example, our calculated test statistic is below the rejection level of 1.734, so we are not able to reject
the null hypothesis; we cannot conclude that the fund is more volatile than the market.

Interpretation: proving that b > 1 for this fund with statistical significance would probably require more
observations (degrees of freedom). Also, with 1.18 only slightly above 1.0, it is quite possible that this fund
is actually no more volatile than the market, and we were correct not to reject the null hypothesis.
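
The same one-tailed test, sketched in Python (SciPy assumed):

```python
# A sketch of the test of H0: b <= 1.0 against Ha: b > 1.0 for the slope coefficient.
from scipy.stats import t

b_hat, b_0, se_b, df = 1.18, 1.0, 0.147, 18
t_stat = (b_hat - b_0) / se_b              # ~1.22
t_crit = t.ppf(0.95, df)                   # ~1.734, right tail at alpha = 0.05
print(round(t_stat, 3), t_stat > t_crit)   # 1.224 False -> cannot reject H0
```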

Example: Interpreting a regression coefficient.


The CFA exam is likely to give the summary statistics of a linear regression and ask for interpretation. To
illustrate, assume the following statistics for a regression between a small-cap growth fund and the Russell 2000
index:

Correlation coefficient 0.864


Intercept -0.417
Slope 1.317

What do each of these numbers tell us?

1. About 75% of the variation in the fund is explained by changes in the Russell 2000 index. This is true because
the square of the correlation coefficient, (0.864)² = 0.746, gives us the coefficient of determination, or R-
squared.
2. The fund will slightly underperform the index when index returns are flat. This results from the value of
the intercept being –0.417. When X = 0 in the regression equation, the dependent variable is equal to the
intercept.
3. The fund will on average be more volatile than the index. This fact follows from the slope of the
regression line of 1.317 (i.e. for every 1% change in the index, we expect the fund's return to change by
1.317%).
4. The fund will outperform in strong market periods, and underperform in weak markets. This fact follows
from the regression. Additional risk is compensated with additional reward, with the reverse being true
in down markets. Predicted values of the fund's return, given a return for the market, can be found by
solving for Y = -0.417 + 1.317X (X = Russell 2000 return).

Analysis of Variance (ANOVA)


Analysis of variance, or ANOVA, is a procedure in which the total variability of a random variable is
subdivided into components so that it can be better understood, or attributed to each of the various sources that
cause the number to vary.

Applied to regression parameters, ANOVA techniques are used to determine the usefulness of a regression
model and the degree to which changes in an independent variable X can be used to explain changes in a
dependent variable Y. For example, we can conduct a hypothesis-testing procedure to determine whether the slope
coefficient is equal to zero (i.e. the variables are unrelated), or whether there is statistical meaning to the
relationship (i.e. the slope b is different from zero). An F-test can be used for this process.

F-Test
The formula for F-statistic in a regression with one independent variable is given by the following:

Formula 2.41

F = mean regression sum of squares / mean squared error

= (RSS/1) / [SSE/(n – 2)]

The two abbreviations to understand are RSS and SSE:

1. RSS, or the regression sum of squares, is the amount of total variation in the dependent variable Y that
is explained in the regression equation. The RSS is calculated by computing each deviation between a
predicted Y value and the mean Y value, squaring the deviation and adding up all terms. If an
independent variable explains none of the variations in a dependent variable, then the predicted values of
Y are equal to the average value, and RSS = 0.
2. SSE, or the sum of squared error of residuals, is calculated by finding the deviation between a predicted
Y and an actual Y, squaring the result and adding up all terms.

TSS, or total variation, is the sum of RSS and SSE. In other words, this ANOVA process breaks variance into
two parts: one that is explained by the model and one that is not. Essentially, for a regression equation to have
high predictive quality, we need to see a high RSS and a low SSE, which will make the ratio (RSS/1)/[SSE/(n –
2)] high and (based on a comparison with a critical F-value) statistically meaningful. The critical value is taken
from the F-distribution and is based on degrees of freedom.

For example, with 20 observations, the degrees of freedom would be 1 for the numerator and n – 2, or 18, for the
denominator, resulting in a critical value (from the F-table at a 0.05 significance level) of about 4.41. If RSS were
2.5 and SSE were 1.8, the computed test statistic would be F = (2.5/1)/(1.8/18) = 25, which is above the critical
value and indicates that the regression equation has predictive quality (b is different from 0).
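
A brief sketch of the F-test arithmetic (SciPy assumed for the critical value):

```python
# A sketch of the ANOVA F-test for a one-variable regression: RSS = 2.5, SSE = 1.8, n = 20.
from scipy.stats import f

rss, sse, n = 2.5, 1.8, 20
f_stat = (rss / 1) / (sse / (n - 2))       # 25.0
f_crit = f.ppf(0.95, dfn=1, dfd=n - 2)     # ~4.41 at alpha = 0.05
print(round(f_stat, 1), f_stat > f_crit)   # 25.0 True -> slope differs significantly from 0
```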

Estimating Economic Statistics with Regression Models


Regression models are frequently used to estimate economic statistics such as inflation and GDP growth.
Assume the following regression is made between estimated annual inflation (X, or independent variable) and
the actual number (Y, or dependent variable):

Y = 0.154 + 0.917X

Using this model, the predicted inflation number would be calculated based on the model for the following
inflation scenarios:

Inflation estimate Inflation based on model

-1.1% -0.85%

+1.4% +1.43%

+4.7% +4.46%

The predictions based on this model seem to work best for typical inflation estimates, and they suggest that extreme
estimates tend to overstate inflation – e.g. a predicted actual inflation of just 4.46% when the estimate was 4.7%. The model
does seem to suggest that estimates are highly predictive, though to evaluate it properly we would need
to see the standard error and the number of observations on which it is based. If we knew the true values of the
regression parameters (slope and intercept), the variance of any predicted Y value would be equal to the square
of the standard error.

In practice, we must estimate the regression parameters; thus our predicted value for Y is an estimate based on
an estimated model. How confident can we be in such a process? In order to determine a prediction interval,
employ the following steps:

1. Predict the value of the dependent variable Y based on independent observation X.

2. Compute the variance of the prediction error, using the following equation:

Formula 2.42

sf² = s² * [1 + 1/n + (X – X̄)² / ((n – 1)*sx²)]

Where: s² is the squared standard error of the estimate, n is the number of observations, X is the value of the
independent variable used to make the prediction, X̄ is the mean of the independent
variable, and sx² is the variance of X.

3. Choose a significance level α for the confidence interval.

4. Construct an interval at (1 – α) percent confidence, using the structure Y ± tc*sf.

Here's another case where the material becomes much more technical than necessary, and one can get
bogged down in preparation when, in reality, the formula for the variance of a prediction error isn't likely to be
covered. Prioritize – don't squander precious study hours memorizing it. If the concept is tested at all,
you'll likely be given the answer to Part 2. Simply know how to use the structure in Part 4 to answer a
question.

For example, if the predicted X observation is 2 for the regression Y = 1.5 + 2.5X, we would have a
predicted Y of 1.5 + 2.5*(2), or 6.5. Our confidence interval is 6.5 ± tc*sf. The t-stat is based on the chosen
confidence level and degrees of freedom, while sf is the square root of the variance of the prediction error
(the equation above). If these numbers are tc = 2.10 for 95% confidence and sf = 0.443, the interval is
6.5 ± (2.1)*(0.443), or 5.57 to 7.43.
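
Taking tc and sf as given (computing sf itself requires Formula 2.42), the interval in Part 4 is simple arithmetic; a minimal sketch:

```python
# A sketch of the prediction interval Y_hat +/- t_c * s_f, with t_c and s_f taken as given.
t_c, s_f = 2.10, 0.443
x = 2.0
y_hat = 1.5 + 2.5 * x                       # 6.5
lower, upper = y_hat - t_c * s_f, y_hat + t_c * s_f
print(y_hat, round(lower, 2), round(upper, 2))   # 6.5 5.57 7.43
```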

Limitations of Regression Analysis

Focus on three main limitations:

1. Parameter Instability - This is the tendency for relationships between variables to change over time
due to changes in the economy or the markets, among other uncertainties. If a mutual fund produced a
return history in a market where technology was a leadership sector, the model may not work when
foreign and small-cap markets are leaders.

2. Public Dissemination of the Relationship - In an efficient market, this can limit the effectiveness of that
relationship in future periods. For example, the discovery that low price-to-book-value stocks outperform
high price-to-book-value stocks means that these stocks may be bid higher, and value-based investment
approaches will not retain the same relationship as in the past.

3. Violation of Regression Relationships - Earlier we summarized the six classic assumptions of a linear
regression. In the real world these assumptions are often unrealistic – e.g. assuming the independent
variable X is not random.