
Statistics Cheat Sheets

Descriptive Statistics:

Example data used throughout: {1, 16, 1, 3, 9}

Sort: Sort values in increasing order. Example: {1, 1, 3, 9, 16}

Mean: Average. Population formula: μ = (X₁ + X₂ + ... + X_N)/N = (1/N)·ΣXᵢ. Sample formula: X̄ = (1/n)·ΣXᵢ. Example: (1 + 16 + 1 + 3 + 9)/5 = 6

Median: The middle value: half the values are below and half are above. Example: 3

Mode: The value with the most appearances. Example: 1

Variance: The average of the squared deviations between the values and the mean. Population formula: σ² = (1/N)·Σ(Xᵢ − μ)². Sample formula: s² = Σ(Xᵢ − X̄)²/(n − 1). Example (population): [(1−6)² + (1−6)² + (3−6)² + (9−6)² + (16−6)²] divided by 5 values = 168/5 = 33.6

Standard Deviation: The square root of the Variance, thought of as the average deviation from the mean. Population: σ = √σ². Sample: s = √s². Example: square root of 33.6 = 5.7966

Coefficient of Variation: The variation relative to the value of the mean. Population: CV = σ/μ. Sample: CV = s/X̄. Example: 5.7966 divided by 6 = 0.9661

Minimum: The minimum value. Example: 1

Maximum: The maximum value. Example: 16

Range: Maximum minus Minimum. Example: 16 − 1 = 15
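The example column above can be reproduced in a short Python sketch (variable names are illustrative; the variance uses the population formula, dividing by N, as in the example):

```python
# Population descriptive statistics for the example data {1, 16, 1, 3, 9}
data = [1, 16, 1, 3, 9]

sorted_data = sorted(data)                          # [1, 1, 3, 9, 16]
n = len(data)
mean = sum(data) / n                                # 6.0
median = sorted_data[n // 2]                        # middle of 5 sorted values: 3
mode = max(set(data), key=data.count)               # most frequent value: 1
variance = sum((x - mean) ** 2 for x in data) / n   # 168/5 = 33.6 (population)
std_dev = variance ** 0.5                           # sqrt(33.6)
cv = std_dev / mean                                 # coefficient of variation
value_range = max(data) - min(data)                 # 16 - 1 = 15
```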

Probability Terms:

(Example for each term*: roll a fair die once; let A be the event that an even number appears, and B the event that a 1, 2, or 5 appears.)

Probability: For any event A, probability lies within 0 ≤ P(A) ≤ 1. Notation: P(·). Example: 0.5

Random Experiment: A process leading to at least 2 possible outcomes, with uncertainty as to which will occur. Example: rolling a die.

Event: A subset of all possible outcomes of an experiment. Example: events A and B.

Intersection of Events: Let A and B be two events. The intersection A∩B is the event that both A and B occur (logical AND). Example: the event that a 2 appears.

Union of Events: The union A∪B is the event that A or B (or both) occurs (logical OR). Example: the event that a 1, 2, 4, 5, or 6 appears.

Complement: Let A be an event. The complement Ā is the event that A does not occur (logical NOT). Example: the event that an odd number appears.

Mutually Exclusive Events: A and B are mutually exclusive if at most one of the events A and B can occur. Example: A and B are not mutually exclusive, because if a 2 appears, both A and B occur.

Collectively Exhaustive Events: A and B are collectively exhaustive if at least one of the events A or B must occur. Example: A and B are not collectively exhaustive, because if a 3 appears, neither A nor B occurs.

Basic Outcomes: The simple, indecomposable possible results of an experiment. One and exactly one of these outcomes must occur; the set of basic outcomes is mutually exclusive and collectively exhaustive. Example: the basic outcomes 1, 2, 3, 4, 5, and 6.

Sample Space: The totality of basic outcomes of an experiment. Example: {1, 2, 3, 4, 5, 6}

* Roll a fair die once. Let A be the event an even number appears; let B be the event a 1, 2, or 5 appears.

Yoavi Liedersdorf (MBA03)


Probability Rules:

If events A and B are mutually exclusive:
  P(A)    =  P(A)
  P(Ā)    =  1 − P(A)
  P(A∩B)  =  0
  P(A∪B)  =  P(A) + P(B)

If events A and B are NOT mutually exclusive:
  P(A)    =  P(A)
  P(Ā)    =  1 − P(A)
  P(A∩B)  =  P(A) · P(B)   (only if A and B are independent)
  P(A∪B)  =  P(A) + P(B) − P(A∩B)
  P(A|B)  =  P(A∩B) / P(B)   [Bayes' Law: P(A holds given that B holds)]

General probability rules:

1) If P(A|B) = P(A), then A and B are independent events (for example, rolling dice one after the other).

2) If there are n possible outcomes which are equally likely to occur:
   P(outcome i occurs) = 1/n for each i ∈ [1, 2, ..., n]
   Example: shuffle a deck of cards and pick one at random. P(the chosen card is the 10 of a given suit) = 1/52.

3) If event A is composed of n equally likely basic outcomes:
   P(A) = (number of basic outcomes in A) / (total number of basic outcomes)
   Example: suppose we toss two dice, and let A denote the event that the sum of the two dice is 9. P(A) = 4/36 = 1/9, because 4 of the 36 basic outcomes sum to 9.

Multiplication rule:
   P(A∩B) = P(A|B) · P(B)
   P(A∩B) = P(B|A) · P(A)

Total probability:
   P(A) = P(A∩B) + P(A∩B̄) = P(A|B)·P(B) + P(A|B̄)·P(B̄)
   Example: take a deck of 52 cards and take out 2 cards sequentially, without looking at the first. The probability that the second card is a heart is the probability of a heart after a heart (event B), plus the probability of a heart after a non-heart, which equals (12/51)(13/52) + (13/51)(39/52) = 1/4 = 0.25.
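The total-probability example can be verified exactly with rational arithmetic (a quick check, using the heart-drawing numbers above):

```python
from fractions import Fraction

# P(second card is a heart) = P(heart | first was a heart) * P(first heart)
#                           + P(heart | first not a heart) * P(first not heart)
p = Fraction(12, 51) * Fraction(13, 52) + Fraction(13, 51) * Fraction(39, 52)
```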


Random Variables and Distributions:

To calculate the Expected Value E[X] = Σ x·P(X = x), use the following table:

Event                    Payoff                          Probability                               Weighted Payoff
[name of first event]    [payoff of first event in $]    [probability of first event, 0 ≤ P ≤ 1]   [product of Payoff × Probability]
[name of second event]   [payoff of second event in $]   [probability of second event, 0 ≤ P ≤ 1]  [product of Payoff × Probability]
[name of third event]    [payoff of third event in $]    [probability of third event, 0 ≤ P ≤ 1]   [product of Payoff × Probability]
Total (Expected Payoff): [total of all Weighted Payoffs above]

* See example in BOOK 1 page 54

To calculate the Variance Var(X) = Σ (x − E[X])²·P(X = x) and Standard Deviation σ_X = √Var(X), use:

Event        Payoff         Expected Payoff      Error                                (Error)²              Probability               Weighted (Error)²
[1st event]  [1st payoff]   [Total from above]   [1st payoff minus Expected Payoff]   [1st Error squared]   [1st event's probability]  [1st (Error)² × 1st event's probability]
[2nd event]  [2nd payoff]   [Total from above]   [2nd payoff minus Expected Payoff]   [2nd Error squared]   [2nd event's probability]  [2nd (Error)² × 2nd event's probability]
[3rd event]  [3rd payoff]   [Total from above]   [3rd payoff minus Expected Payoff]   [3rd Error squared]   [3rd event's probability]  [3rd (Error)² × 3rd event's probability]
Variance: [total of the Weighted (Error)² column]
Std. Deviation: [square root of Variance]
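The two tables above translate directly into code. A sketch with hypothetical payoffs and probabilities (the event names and numbers are illustrative, not from the text):

```python
# Hypothetical three-event payoff distribution: (name, payoff in $, probability)
events = [("boom", 100.0, 0.3), ("flat", 20.0, 0.5), ("bust", -50.0, 0.2)]

# Expected value: sum of payoff * probability (the "Weighted Payoff" column)
expected = sum(payoff * prob for _, payoff, prob in events)

# Variance: sum of probability * (payoff - E[X])^2 (the "Weighted (Error)^2" column)
variance = sum(prob * (payoff - expected) ** 2 for _, payoff, prob in events)
std_dev = variance ** 0.5
```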

Counting Rules:

Basic Counting Rule: The number of ways to pick x things out of a set of n (with no regard to order) is
  C(n, x) = n! / (x!·(n − x)!)
The probability of any one specific selection is 1 divided by this count.
Example: the number of ways to pick 4 specific cards out of a deck of 52 is 52!/(4!·48!) = 270,725, and the probability is 1/270,725 = 0.000003694.

Bernoulli Process: For a sequence of n trials, each with an outcome of either success or failure, and each with probability p of success, the probability of getting exactly x successes is the Basic Counting Rule formula (above) times pˣ(1 − p)ⁿ⁻ˣ:
  P(X = x | n, p) = [n! / (x!·(n − x)!)] · pˣ(1 − p)ⁿ⁻ˣ
Example: if an airline takes 20 reservations, and there is a 0.9 probability that each passenger will show up, then the probability that exactly 16 passengers will show is
  [20! / (16!·4!)] · (0.9)¹⁶(0.1)⁴ = 0.08978

Bernoulli Expected Value: The expected value of a Bernoulli Process, given n trials and probability p: E(X) = np. Example: in the example above, the number of people expected to show is (20)(0.9) = 18.

Bernoulli Variance: The variance of a Bernoulli Process: Var(X) = np(1 − p). Example: (20)(0.9)(0.1) = 1.8.

Bernoulli Standard Deviation: The standard deviation of a Bernoulli Process: σ(X) = √(np(1 − p)). Example: √1.8 = 1.34.

Linear Transformation Rule: If X is random and Y = aX + b, then the following formulas apply:
  E(Y) = a·E(X) + b
  Var(Y) = a²·Var(X)
  σ(Y) = |a|·σ(X)
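Both worked examples above (the card count and the airline overbooking probability) can be checked with a few lines of Python:

```python
from math import comb, sqrt

# Basic Counting Rule: ways to pick 4 cards out of 52
ways = comb(52, 4)                              # 52!/(4! * 48!) = 270,725

# Bernoulli Process: P(exactly 16 of 20 passengers show, p = 0.9)
n, x, p = 20, 16, 0.9
prob = comb(n, x) * p**x * (1 - p)**(n - x)     # about 0.08978

mean = n * p                                    # expected shows: 18
var = n * p * (1 - p)                           # 1.8
sd = sqrt(var)                                  # about 1.34
```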

Uniform Distribution (on [a, b]):

Expected Value: μ = (a + b)/2
Variance: σ_X² = (b − a)²/12
Standard Deviation: σ_X = (b − a)/√12
Probability that X falls between c and d: P(c ≤ X ≤ d) = (d − c)/(b − a)
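A minimal sketch of the four uniform formulas, using assumed endpoints a = 0 and b = 10 for illustration:

```python
# Uniform distribution on [a, b] (endpoints are illustrative)
a, b = 0.0, 10.0

mean = (a + b) / 2                 # 5.0
variance = (b - a) ** 2 / 12       # 100/12
std_dev = (b - a) / 12 ** 0.5

c, d = 2.0, 6.0
prob = (d - c) / (b - a)           # P(2 <= X <= 6) = 0.4
```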

Normal Distribution:

Probability Density Function:

  f_X(x) = (1 / (σ·√(2π))) · e^(−(1/2)·((x − μ)/σ)²)

where π ≈ 3.1416 and e ≈ 2.7183.

Standard Normal Table (entries give P(0 ≤ Z ≤ z); row = z to one decimal, column = second decimal):

z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0  .0000  .0040  .0080  .0120  .0160  .0199  .0239  .0279  .0319  .0359
0.1  .0398  .0438  .0478  .0517  .0557  .0596  .0636  .0675  .0714  .0753
0.2  .0793  .0832  .0871  .0910  .0948  .0987  .1026  .1064  .1103  .1141
0.3  .1179  .1217  .1255  .1293  .1331  .1368  .1406  .1443  .1480  .1517
0.4  .1554  .1591  .1628  .1664  .1700  .1736  .1772  .1808  .1844  .1879
0.5  .1915  .1950  .1985  .2019  .2054  .2088  .2123  .2157  .2190  .2224
0.6  .2257  .2291  .2324  .2357  .2389  .2422  .2454  .2486  .2517  .2549
0.7  .2580  .2611  .2642  .2673  .2704  .2734  .2764  .2794  .2823  .2852
0.8  .2881  .2910  .2939  .2967  .2995  .3023  .3051  .3078  .3106  .3133
0.9  .3159  .3186  .3212  .3238  .3264  .3289  .3315  .3340  .3365  .3389
1.0  .3413  .3438  .3461  .3485  .3508  .3531  .3554  .3577  .3599  .3621
1.1  .3643  .3665  .3686  .3708  .3729  .3749  .3770  .3790  .3810  .3830

Standard Deviations away from the mean:

(X and Z are interchangeable via standardization.) P(a ≤ X ≤ b) is the area under f_X(x) between a and b:

  P(a ≤ X ≤ b) = P((a − μ)/σ ≤ Z ≤ (b − μ)/σ)

Standard Normal Table, continued:

z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
1.2  .3849  .3869  .3888  .3907  .3925  .3944  .3962  .3980  .3997  .4015
1.3  .4032  .4049  .4066  .4082  .4099  .4115  .4131  .4147  .4162  .4177
1.4  .4192  .4207  .4222  .4236  .4251  .4265  .4279  .4292  .4306  .4319
1.5  .4332  .4345  .4357  .4370  .4382  .4394  .4406  .4418  .4429  .4441
1.6  .4452  .4463  .4474  .4484  .4495  .4505  .4515  .4525  .4535  .4545
1.7  .4554  .4564  .4573  .4582  .4591  .4599  .4608  .4616  .4625  .4633
1.8  .4641  .4649  .4656  .4664  .4671  .4678  .4686  .4693  .4699  .4706
1.9  .4713  .4719  .4726  .4732  .4738  .4744  .4750  .4756  .4761  .4767
2.0  .4772  .4778  .4783  .4788  .4793  .4798  .4803  .4808  .4812  .4817
2.1  .4821  .4826  .4830  .4834  .4838  .4842  .4846  .4850  .4854  .4857
2.2  .4861  .4864  .4868  .4871  .4875  .4878  .4881  .4884  .4887  .4890
2.3  .4893  .4896  .4898  .4901  .4904  .4906  .4909  .4911  .4913  .4916
2.4  .4918  .4920  .4922  .4925  .4927  .4929  .4931  .4932  .4934  .4936
2.5  .4938  .4940  .4941  .4943  .4945  .4946  .4948  .4949  .4951  .4952
2.6  .4953  .4955  .4956  .4957  .4959  .4960  .4961  .4962  .4963  .4964
2.7  .4965  .4966  .4967  .4968  .4969  .4970  .4971  .4972  .4973  .4974
2.8  .4974  .4975  .4976  .4977  .4977  .4978  .4979  .4979  .4980  .4981
2.9  .4981  .4982  .4982  .4983  .4984  .4984  .4985  .4985  .4986  .4986
3.0  .4987  .4987  .4987  .4988  .4988  .4989  .4989  .4989  .4990  .4990
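Table entries can be spot-checked numerically: the tabulated quantity P(0 ≤ Z ≤ z) equals half the error function of z/√2 (a quick sketch using the standard library):

```python
from math import erf, sqrt

def phi0(z):
    """P(0 <= Z <= z) for the standard normal -- the quantity tabulated above."""
    return 0.5 * erf(z / sqrt(2))

# Spot-check a few table entries
check_1_00 = round(phi0(1.00), 4)   # .3413
check_1_96 = round(phi0(1.96), 4)   # .4750
```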

Correlation:

If X and Y are two different sets of data, their correlation is represented by Corr(XY), r_XY, or ρ_XY (rho).
If Y increases as X increases, 0 < ρ_XY < 1. If Y decreases as X increases, −1 < ρ_XY < 0.
The extremes ρ_XY = 1 and ρ_XY = −1 indicate perfect correlation: information about one yields an exact prediction of the other.
If X and Y are completely uncorrelated, ρ_XY = 0.
The Covariance of X and Y, Cov(XY), has the same sign as ρ_XY, has unusual units, and is usually a means to find ρ_XY.

Correlation: Corr(XY) = Cov(XY) / (σ_X·σ_Y). Used with the Covariance formulas below.

Covariance (2 formulas):
  Cov(XY) = E[(X − μ_X)(Y − μ_Y)]: sum of the products of all sample pairs' distances from their respective means, multiplied by their respective probabilities (difficult to calculate).
  Cov(XY) = E[XY] − μ_X·μ_Y: sum of the products of all sample pairs multiplied by their respective probabilities, minus the product of both means.

Finding Covariance given Correlation: Cov(XY) = σ_X·σ_Y·Corr(XY)
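The second covariance formula, Cov(XY) = E[XY] − μ_X·μ_Y, can be sketched for equally likely paired outcomes (the data below are hypothetical and chosen to be perfectly correlated):

```python
# Hypothetical paired data, each pair equally likely
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # ys = 2 * xs, so correlation should be 1

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n

cov = sum(x * y for x, y in zip(xs, ys)) / n - mx * my   # E[XY] - mu_X * mu_Y
sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5         # population sigma_X
sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5         # population sigma_Y
corr = cov / (sx * sy)                                   # Corr(XY)
```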

Portfolio Analysis:

Mean of any Portfolio S = aX + bY: μ_S = a·μ_X + b·μ_Y. Example*: μ_S = (3/4)(8.0%) + (1/4)(11.0%) = 8.75%

Uncorrelated:
  Portfolio Variance: σ_S² = a²σ_X² + b²σ_Y². Example: σ_S² = (3/4)²(0.5)² + (1/4)²(6.0)² = 2.3906
  Portfolio Standard Deviation: σ_S = √(a²σ_X² + b²σ_Y²). Example: σ_S = 1.5462

Correlated:
  Portfolio Variance: σ²(aX + bY) = a²σ_X² + b²σ_Y² + 2ab·Cov(XY)
  Portfolio Standard Deviation: σ(aX + bY) = √(a²σ_X² + b²σ_Y² + 2ab·Cov(XY))

* Portfolio S composed of Stock A (mean return: 8.0%, standard deviation: 0.5%) and Stock B (11.0%, 6.0% respectively), with weights 3/4 and 1/4 as implied by the example arithmetic.
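The uncorrelated case of the example can be reproduced directly (weights 3/4 and 1/4 as recovered from the arithmetic above):

```python
# Portfolio S = a*A + b*B, uncorrelated stocks, example numbers from above
a, b = 0.75, 0.25
mean_a, sd_a = 8.0, 0.5     # Stock A: 8.0% mean return, 0.5% std dev
mean_b, sd_b = 11.0, 6.0    # Stock B: 11.0% mean return, 6.0% std dev

mean_s = a * mean_a + b * mean_b              # 8.75
var_s = a**2 * sd_a**2 + b**2 * sd_b**2       # 2.3906
sd_s = var_s ** 0.5                           # 1.5462
```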

The Central Limit Theorem:

The normal distribution can be used to approximate binomials of more than 30 trials (n ≥ 30):
  Mean: E(X) = np
  Variance: Var(X) = np(1 − p)
  Standard Deviation: σ(X) = √(np(1 − p))

Continuity Correction:

Unlike continuous (normal) distributions (e.g. $, time), a discrete binomial distribution of integers (e.g. # of people) must be corrected:
  Old cutoff    New cutoff
  P(X > 20)     P(X > 20.5)
  P(X < 20)     P(X < 19.5)
  P(X ≥ 20)     P(X ≥ 19.5)
  P(X ≤ 20)     P(X ≤ 20.5)
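A sketch of the normal approximation with continuity correction, using an assumed Bin(36, 0.5) so the numbers come out cleanly:

```python
from math import erf, sqrt

# Approximate P(X >= 20) for X ~ Binomial(n = 36, p = 0.5)
n, p = 36, 0.5
mu = n * p                       # 18.0
sigma = sqrt(n * p * (1 - p))    # 3.0

# Continuity correction: P(X >= 20) becomes P(X >= 19.5)
z = (19.5 - mu) / sigma          # 0.5
prob = 0.5 - 0.5 * erf(z / sqrt(2))   # P(Z >= 0.5), about 0.3085
```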

Sampling Distribution of the Mean:

If the Xᵢ's are normally distributed (or n ≥ 30), then X̄ is normally distributed with:
  Mean: μ_X̄ = μ
  Standard Error of the Mean: σ_X̄ = σ/√n

Sampling Distribution of a Proportion:

If, for a proportion, n ≥ 30, then p̂ is normally distributed with:
  Mean: p
  Standard Deviation: σ_p̂ = √(p(1 − p)/n)
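The two standard-error formulas can be sketched with assumed illustrative numbers:

```python
from math import sqrt

# Standard error of the mean: sigma / sqrt(n)
sigma, n = 12.0, 36
se_mean = sigma / sqrt(n)            # 2.0

# Standard deviation of a sample proportion: sqrt(p(1-p)/n)
p = 0.4
se_prop = sqrt(p * (1 - p) / n)      # about 0.0816
```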

Confidence Intervals:

Parameter    Usage                                              Confidence Interval
μ            σ known; normal population or large sample         X̄ ± z(α/2)·σ/√n
μ            σ unknown; large sample                            X̄ ± z(α/2)·s/√n
μ            σ unknown; normal population, small sample         X̄ ± t(n−1, α/2)·s/√n
p            binomial; large sample                             p̂ ± z(α/2)·√(p̂(1 − p̂)/n)
μ_X − μ_Y    σ's known; independent samples, normal             (X̄ − Ȳ) ± z(α/2)·√(σ_X²/n_X + σ_Y²/n_Y)
μ_X − μ_Y    σ's unknown; large independent samples             (X̄ − Ȳ) ± z(α/2)·√(s_X²/n_X + s_Y²/n_Y)
μ_D          matched pairs; normal                              D̄ ± z(α/2)·s_D/√n
p_X − p_Y    binomial; large samples                            (p̂_X − p̂_Y) ± z(α/2)·√(p̂_X(1 − p̂_X)/n_X + p̂_Y(1 − p̂_Y)/n_Y)

Formulae Guide: choose the row by asking, in order: (1) is the sample statistic a Mean or a Proportion? (2) a single parameter or a Difference? (3) for a difference of means, are the samples Matched or Independent? (4) is σ Known or Unknown? (5) is the sample Large/Normal or Small?

Confidence Level to Z-Value Guide:
  Confidence Level   α = 1 − c   Z(α/2) (2-Tail)   Z(α) (1-Tail)
  80%                20%         1.28              0.84
  90%                10%         1.645             1.28
  95%                 5%         1.96              1.645
  99%                 1%         2.575             2.325

Determining the Appropriate Sample Size (for a margin of error ± e, at 95% confidence):
  Normal Distribution Formula: n = 1.96²·σ² / e²
  Proportion Formula: n = 1.96² / (4e²)

t-table (entries are t(d.f., α)):
  d.f.   0.100   0.050   0.025    0.010    0.005
  1      3.078   6.314   12.706   31.821   63.656
  2      1.886   2.920    4.303    6.965    9.925
  3      1.638   2.353    3.182    4.541    5.841
  4      1.533   2.132    2.776    3.747    4.604
  5      1.476   2.015    2.571    3.365    4.032
  6      1.440   1.943    2.447    3.143    3.707
  7      1.415   1.895    2.365    2.998    3.499
  8      1.397   1.860    2.306    2.896    3.355
  9      1.383   1.833    2.262    2.821    3.250
  10     1.372   1.812    2.228    2.764    3.169
  11     1.363   1.796    2.201    2.718    3.106
  12     1.356   1.782    2.179    2.681    3.055
  13     1.350   1.771    2.160    2.650    3.012
  14     1.345   1.761    2.145    2.624    2.977
  15     1.341   1.753    2.131    2.602    2.947
  16     1.337   1.746    2.120    2.583    2.921
  17     1.333   1.740    2.110    2.567    2.898
  18     1.330   1.734    2.101    2.552    2.878
  19     1.328   1.729    2.093    2.539    2.861
  20     1.325   1.725    2.086    2.528    2.845
  21     1.323   1.721    2.080    2.518    2.831
  22     1.321   1.717    2.074    2.508    2.819
  23     1.319   1.714    2.069    2.500    2.807
  24     1.318   1.711    2.064    2.492    2.797
  25     1.316   1.708    2.060    2.485    2.787
  26     1.315   1.706    2.056    2.479    2.779
  27     1.314   1.703    2.052    2.473    2.771
  28     1.313   1.701    2.048    2.467    2.763
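A minimal confidence-interval sketch for the most common row (σ unknown, large sample, 95% so z = 1.96; the sample numbers are assumed for illustration):

```python
from math import sqrt

# 95% CI for a mean: x_bar +/- z * s / sqrt(n)
x_bar, s, n = 50.0, 10.0, 100
z = 1.96                                  # z(alpha/2) for 95% confidence

half_width = z * s / sqrt(n)              # 1.96
ci = (x_bar - half_width, x_bar + half_width)   # (48.04, 51.96)
```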


Hypothesis Testing:

Test statistics and alternative hypotheses (critical values: two-tailed ±z(α/2) or ±t(n−1, α/2); lower-tail −z(α) or −t(n−1, α); upper-tail z(α) or t(n−1, α)):

Single μ (n ≥ 30):  z₀ = (X̄ − μ₀) / (s/√n).  Ha: μ ≠ μ₀, μ < μ₀, or μ > μ₀.

Single μ (n < 30):  t₀ = (X̄ − μ₀) / (s/√n).  Ha: μ ≠ μ₀, μ < μ₀, or μ > μ₀.

Single p (n ≥ 30):  z₀ = (p̂ − p₀) / √(p₀(1 − p₀)/n).  Ha: p ≠ p₀, p < p₀, or p > p₀.

Difference between two μ's:  z₀ = ((X̄ − Ȳ) − D₀) / √(s_X²/n_X + s_Y²/n_Y).  Ha: μ_X − μ_Y ≠ D₀, < D₀, or > D₀.

Difference between two p's:  z₀ = (p̂_X − p̂_Y) / √(p̂(1 − p̂)(1/n_X + 1/n_Y)), where p̂ = (n_X·p̂_X + n_Y·p̂_Y)/(n_X + n_Y) is the pooled proportion.  Ha: p_X − p_Y ≠ 0, < 0, or > 0.

Classic Hypothesis Testing Procedure:

Step 1. Formulate Two Hypotheses. The hypotheses ought to be mutually exclusive and collectively exhaustive. The hypothesis to be tested (the null hypothesis) always contains an equals sign, referring to some proposed value of a population parameter. The alternative hypothesis never contains an equals sign, but can be either a one-sided or two-sided inequality. Example: H₀: μ = 0; H_A: μ < 0.

Step 2. Select a Test Statistic. The test statistic is a standardized estimate of the difference between our sample and some hypothesized population parameter. It answers the question: if the null hypothesis were true, how many standard deviations is our sample away from where we expected it to be? Example: z₀ = (X̄ − μ₀) / (s/√n).

Step 3. Derive a Decision Rule. The decision rule consists of regions of rejection and non-rejection, defined by critical values of the test statistic. It is used to establish the probable truth or falsity of the null hypothesis. Example: we reject H₀ if (X̄ − μ₀) / (σ/√n) < −z(α).

Step 4. Calculate the Value of the Test Statistic; Invoke the Decision Rule in light of the Test Statistic. Either reject the null hypothesis (if the test statistic falls into the rejection region) or do not reject it (if the test statistic does not fall into the rejection region). Example: z₀ = (X̄ − μ₀) / (s/√n) = (0.21 − 0) / (s/√50) = 0.80, which does not fall in the lower-tail rejection region, so H₀ is not rejected.
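The four-step procedure can be sketched in code. The numbers 0.21, 50, and the z₀ of 0.80 come from the worked example; the sample standard deviation s is assumed, since it was lost in the original:

```python
from math import sqrt

# Lower-tail z-test: H0: mu = 0 vs HA: mu < 0, alpha = 0.05
x_bar, mu0, n = 0.21, 0.0, 50
s = 1.86                              # ASSUMED sample std dev (not in the source)

z0 = (x_bar - mu0) / (s / sqrt(n))    # about 0.80
z_crit = -1.645                       # lower-tail critical value at alpha = 0.05
reject = z0 < z_crit                  # False: fail to reject H0
```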


Regression:

Symbols:
  X₁, ..., X_k: Independent Variables
  Y: Dependent Variable (a random variable)
  Yᵢ: Dependent Variable (an individual observation among the sample)
  Ȳ = (1/n)·ΣYᵢ: Dependent Variable (sample mean of n observations)
  Ŷᵢ = b₀ + b₁x₁ᵢ + b₂x₂ᵢ + ... + b_k·x_kᵢ: Dependent Variable (estimated value for a given vector of independent variables)
  β₀: Intercept (or constant); an unknown population parameter
  b₀: Estimated intercept; an estimate of β₀
  β₁: Slope (or coefficient) for Independent Variable 1 (unknown)
  b₁: Estimated slope for Independent Variable 1; an estimate of β₁
  εᵢ = Yᵢ − Ŷᵢ: Error for observation i; the unexplained difference between the actual value of Yᵢ and the prediction for Yᵢ based on our regression model

Example regression output:

  Regression Statistics
  Multiple R          0.9568
  R Square            0.9155
  Adjusted R Square   0.9015
  Standard Error      6.6220
  Observations        15

  ANOVA         df    SS          MS          F         Significance F
  Regression     2    5704.0273   2852.0137   65.0391   0.0000
  Residual      12     526.2087     43.8507
  Total         14    6230.2360

                         Coefficients   Standard Error   t Stat    P-value
  Intercept              -20.3722        9.8139          -2.0758   0.0601
  Size (100 sq ft)         4.3117        0.4104          10.5059   0.0000
  Lot Size (1000 sq ft)    4.7177        0.7646           6.1705   0.0000

Statistics (mapped to the output above):
  TSS (or SST), Total Sum of Squares = Σ(Yᵢ − Ȳ)²  →  6230.2360
  SSE, Sum of Squares due to Error = Σ(Yᵢ − Ŷᵢ)²  →  526.2087
  MSE, Mean Squares due to Error = SSE / (n − k − 1)  →  43.8507
  SSR, Sum of Squares due to Regression = Σ(Ŷᵢ − Ȳ)² = SST − SSE  →  5704.0273
  MSR, Mean Squares due to Regression = SSR / k  →  2852.0137
  F = MSR / MSE = [R²/k] / [(1 − R²)/(n − k − 1)]  →  65.0391
  R² (Coefficient of Determination) = 1 − SSE/TSS  →  0.9155
  Multiple R (Coefficient of Multiple Correlation) = √R²  →  0.9568
  Adjusted R² = 1 − [SSE/(n − k − 1)] / [SST/(n − 1)]  →  0.9015
  Standard Error (a.k.a. Standard Error of the Estimate) = √MSE = √(SSE/(n − k − 1))  →  6.6220
  t-statistic for testing H₀: β₁ = 0 vs. H_A: β₁ ≠ 0:  t₀ = (b₁ − 0) / s(b₁)  →  e.g. −2.0758 for the intercept
  p-value for the same test: P(|T| ≥ |t₀|)  →  0.0601
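The summary statistics in the example output are internally consistent and can be recomputed from the ANOVA sums of squares alone (a quick check using n = 15 observations and k = 2 independent variables):

```python
from math import sqrt

# Sums of squares from the ANOVA table above
sse, sst = 526.2087, 6230.2360
n, k = 15, 2

ssr = sst - sse                                        # 5704.0273
mse = sse / (n - k - 1)                                # 43.8507
msr = ssr / k                                          # 2852.0137
f_stat = msr / mse                                     # 65.039
r_sq = 1 - sse / sst                                   # 0.9155
adj_r_sq = 1 - (sse / (n - k - 1)) / (sst / (n - 1))   # 0.9015
std_err = sqrt(mse)                                    # 6.622
```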
