Part 1 Probability and Distribution Theory

Statistical Inference and Regression Analysis:
Stat-GB.3302.30, Stat-UB.0015.01
Professor William Greene
Stern School of Business
IOMS Department
Department of Economics

1 Probability

Sample Space
Random outcomes: The result of a process
• Sequence of events,
• Number of events,
• Measurement of a length of time, space, etc.
Outcomes, experiments and sample spaces

[Figure: collage of example plots (scatterplot, marginal plot, histogram, boxplot, probability plot and empirical CDF of Listing vs IncomePC; pie chart of Percent vs Type)]


Consumer Choice: 4 possible ways a randomly chosen traveler might travel between Sydney and Melbourne

$\Omega = \{\text{Air}, \text{Train}, \text{Bus}, \text{Car}\}$



Market Behavior: Fair Isaac's credit card service to major vendors

$\Omega = \{\text{Reject}, \text{Accept}\}$

Measurement of Lifetimes

A box of light bulbs states "Average life is 1500 hours"

Outcome = length of time until failure (lifetime) of a randomly chosen light bulb

$\Omega = \{\text{lifetime} \mid \text{lifetime} > 0\}$


Events

Events are defined as
• Subsets of the sample space, such as the empty set $\emptyset$
• Intersections of related events
• Complements, such as A and not A
• Disjoint sets, such as (Train, Bus) and (Air, Car)

Any subset, including $\Omega$, is a disjoint union of subsets:

$\Omega = (\text{Air}, \text{Train}) \cup (\text{Bus}, \text{Car})$
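The set operations listed on this slide can be tried directly with Python sets. This is only a minimal sketch: the event names `ground` and `public` are hypothetical labels chosen for illustration, not terms from the lecture.

```python
# Sample space for the travel-mode example.
omega = frozenset({"Air", "Train", "Bus", "Car"})

# Two illustrative events (hypothetical names): subsets of the sample space.
ground = frozenset({"Train", "Bus", "Car"})
public = frozenset({"Air", "Train", "Bus"})


def complement(event, space=omega):
    """Return 'not event' relative to the sample space."""
    return space - event


intersection = ground & public      # outcomes that belong to both events
not_ground = complement(ground)     # the complement of 'ground'

# Disjoint events share no outcomes; these two also union to omega.
part_1 = frozenset({"Air", "Train"})
part_2 = frozenset({"Bus", "Car"})
is_disjoint = part_1.isdisjoint(part_2)
is_partition = is_disjoint and (part_1 | part_2) == omega

print(sorted(intersection), sorted(not_ground), is_partition)
```

Frozen sets are used so that events can themselves be collected into a set of events later on.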



Probability is a Measure

The events defined on the sample space form a $\sigma$-field:
• Contains at least one nonempty subset (event)
• Is closed under complementation
• Is closed under countable union
Probability is a measure defined on all subsets of $\Omega$
Axioms of Probability
• $P(\Omega) = 1$
• $A \subset \Omega \Rightarrow P(A) \ge 0$
• If $A \cap B = \{\}$, $P(A \cup B) = P(A) + P(B)$
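For a finite sample space the closure properties and the axioms can be checked by brute force over the power set. The sketch below assumes made-up outcome probabilities (the `weights` table is purely illustrative and not from the lecture); with finitely many outcomes, countable unions reduce to finite unions.

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset({"Air", "Train", "Bus", "Car"})

# Hypothetical outcome probabilities, for illustration only.
weights = {
    "Air": Fraction(1, 2),
    "Train": Fraction(1, 4),
    "Bus": Fraction(1, 8),
    "Car": Fraction(1, 8),
}


def power_set(space):
    """Return every subset of a finite space as a set of frozensets."""
    items = sorted(space)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}


def prob(event):
    """Probability of an event = sum of the weights of its outcomes."""
    return sum((weights[o] for o in event), Fraction(0))


events = power_set(omega)

# Closure properties of the collection of events.
closed_complement = all((omega - a) in events for a in events)
closed_union = all((a | b) in events for a in events for b in events)

# The three axioms.
axiom_total = prob(omega) == 1
axiom_nonneg = all(prob(a) >= 0 for a in events)
axiom_additive = all(
    prob(a | b) == prob(a) + prob(b)
    for a in events
    for b in events
    if a.isdisjoint(b)
)

print(len(events), closed_complement, closed_union,
      axiom_total, axiom_nonneg, axiom_additive)
```

Exact fractions avoid floating-point round-off in the additivity check.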


Implications of the Axioms


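$P({\sim}A) = 1 - P(A)$ as $A \cup {\sim}A = \Omega$
$P(\emptyset) = 0$ as $\emptyset = {\sim}\Omega$ and $P(\Omega) = 1$
$A \subset B \Rightarrow P(A) \le P(B)$ as $B = A \cup ({\sim}A \cap B)$, a disjoint union
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$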
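The last implication follows from the additivity axiom in two steps. One way to write the argument out, using only the axioms stated earlier, is:

```latex
\begin{aligned}
A \cup B &= A \cup ({\sim}A \cap B) \quad\text{(disjoint)}
  &&\Rightarrow\; P(A \cup B) = P(A) + P({\sim}A \cap B), \\
B &= (A \cap B) \cup ({\sim}A \cap B) \quad\text{(disjoint)}
  &&\Rightarrow\; P(B) = P(A \cap B) + P({\sim}A \cap B), \\
&&&\Rightarrow\; P(A \cup B) = P(A) + P(B) - P(A \cap B).
\end{aligned}
```

The third line substitutes $P({\sim}A \cap B) = P(B) - P(A \cap B)$ from the second line into the first.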


Probability

Assigning probability: Size of an event relative to size of sample space.
Counting rules for equally likely discrete outcomes
• Using combinations and permutations to count elements
• Example: Discrete uniform, poker hands
• Example: Hypergeometric, the super committee (House 242R, 193D; Senate 49R, 51D&I)
Measurement for continuous outcomes


Applications:
Games of Chance; Poker
In a 5 card hand from a deck of 52, there are (52*51*50*49*48)/(5*4*3*2*1) = 2,598,960 different possible hands. (Order doesn't matter.)
How many of these hands have 4 aces? 48: the 4 aces plus any one of the remaining 48 cards.
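The two counts above can be reproduced with the standard-library binomial coefficient. A minimal sketch (it needs Python 3.8 or later for `math.comb`):

```python
from fractions import Fraction
from math import comb

# Number of 5-card hands when order does not matter: C(52, 5).
total_hands = comb(52, 5)

# Hands with all four aces: choose the 4 aces, then 1 of the other 48 cards.
four_aces_hands = comb(4, 4) * comb(48, 1)

# Equally likely hands, so probability = favorable count / total count.
p_four_aces = Fraction(four_aces_hands, total_hands)

print(total_hands, four_aces_hands, float(p_four_aces))
```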

Some Poker Hands


Full House: 3 of one kind, 2 of another. (Also called a boat.)
Royal Flush: Top 5 cards in a suit
Flush: 5 cards in a suit, not sequential
Straight Flush: 5 sequential cards in the same suit
Straight: 5 cards in a numerical row, not the same suit
4 of a Kind: 4 cards of one kind plus any other card
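The counting rules give the number of hands of each type. The sketch below assumes the usual convention that an ace can be high or low in a straight (10 possible sequences), and it counts royal flushes separately from the other straight flushes; both are stated assumptions rather than details given on the slide.

```python
from math import comb

TOTAL = comb(52, 5)

royal_flush = 4                               # one per suit
straight_flush = 10 * 4 - royal_flush         # 10 sequences per suit, royals excluded
four_of_a_kind = 13 * 48                      # rank of the four, then any fifth card
full_house = 13 * comb(4, 3) * 12 * comb(4, 2)
flush = 4 * comb(13, 5) - 40                  # same suit, minus the 40 sequential hands
straight = 10 * 4**5 - 40                     # sequential, minus the 40 single-suit hands

counts = {
    "Royal Flush": royal_flush,
    "Straight Flush": straight_flush,
    "4 of a Kind": four_of_a_kind,
    "Full House": full_house,
    "Flush": flush,
    "Straight": straight,
}

for name, n in counts.items():
    print(f"{name:15s} {n:6d} {n / TOTAL:.8f}")
```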


5 Card Poker Hands


The Dead Man's Hand

The dead man's hand is 5 cards: 2 aces, 2 8s and some other 5th card. (Wild Bill Hickok was holding this hand when he was shot in the back and killed in 1876.) The number of hands with two aces and two 8s is

$\binom{4}{2}\binom{4}{2} \cdot 44 = 1{,}584$

The rest of the story claims that Hickok held all black cards (the bullets). The probability for this hand falls to only 44/2598960. (The four cards in the picture and one of the remaining 44.)
Some claims have been made about the 5th card, but no one is sure; there is no record.

http://en.wikipedia.org/wiki/Dead_man's_hand
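Both counts on this slide follow from the same combination rule. A minimal sketch that reproduces them:

```python
from fractions import Fraction
from math import comb

total_hands = comb(52, 5)

# Two of the 4 aces, two of the 4 eights, and one of the 44 cards
# that are neither an ace nor an eight.
two_aces_two_eights = comb(4, 2) * comb(4, 2) * 44

# All-black version: the two black aces and two black eights are fixed,
# so only the fifth card (one of 44) is free.
all_black = 1 * 1 * 44

p_any = Fraction(two_aces_two_eights, total_hands)
p_black = Fraction(all_black, total_hands)

print(two_aces_two_eights, float(p_any), float(p_black))
```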


Budget Supercommittee

How many possible committees? About 9.3E20

Senate (D, R, I):
\[ \binom{57}{3}\binom{41}{3}\binom{2}{0} = \frac{57(56)(55)}{3(2)(1)} \cdot \frac{41(40)(39)}{3(2)(1)} \cdot \frac{2(1)}{2(1)} \]

House (D, R, I):
\[ \binom{193}{3}\binom{242}{3}\binom{0}{0} = \frac{193(192)(191)}{3(2)(1)} \cdot \frac{242(241)(240)}{3(2)(1)} \cdot \frac{1}{1} \]
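The committee count is a product of binomial coefficients, one per party group in each chamber. The sketch below uses a hypothetical helper, `committee_count`, and evaluates it twice: with the counts displayed on this slide the product comes to roughly 8.6E20, while a Senate split of 51 D, 47 R, 2 I (an assumption, not stated on this slide) gives roughly 9.3E20.

```python
from math import comb


def committee_count(senate, house, seats=(3, 3, 0)):
    """Count committees; senate/house are (D, R, I) head counts and
    seats are the number of members picked from each group per chamber."""
    total = 1
    for chamber in (senate, house):
        for members, picks in zip(chamber, seats):
            total *= comb(members, picks)
    return total


# Counts as displayed on the slide.
displayed = committee_count(senate=(57, 41, 2), house=(193, 242, 0))

# Alternative Senate split (assumption for comparison only).
alternative = committee_count(senate=(51, 47, 2), house=(193, 242, 0))

print(f"{displayed:.3e}", f"{alternative:.3e}")
```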


Conditional Probability
$P(A \mid B) = P(A,B)/P(B)$
= Size of A relative to a subset, B, of $\Omega$
Basic result: $P(A,B) = P(A \mid B)\,P(B)$ (follows from the definition)
Bayes' theorem
• Applications: mammography, drug testing, lie detector test, PSA test.
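With equally likely outcomes, the definition reduces to counting. A minimal sketch on a 52-card deck, where the events "king" and "face card" are illustrative choices, not examples from the lecture:

```python
from fractions import Fraction
from itertools import product

ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))    # 52 equally likely (rank, suit) outcomes


def prob(event):
    """Size of the event relative to the size of the sample space."""
    return Fraction(len(event), len(deck))


def cond_prob(a, b):
    """P(A|B) = P(A,B) / P(B)."""
    return prob(a & b) / prob(b)


kings = {card for card in deck if card[0] == "K"}
face_cards = {card for card in deck if card[0] in {"J", "Q", "K"}}

p_king_given_face = cond_prob(kings, face_cards)

# The basic result P(A,B) = P(A|B) P(B) holds by construction.
product_rule_holds = prob(kings & face_cards) == p_king_given_face * prob(face_cards)

print(p_king_given_face, product_rule_holds)
```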


Using Conditional Probabilities: Bayes' Theorem

\[
\begin{aligned}
\underbrace{P(A \mid B)}_{\text{Target}}
  &= \frac{P(A,B)}{P(B)} && \text{(Definition)} \\
  &= \frac{P(B \mid A)\,P(A)}{P(B)} && \text{(Theorem)} \\
  &= \frac{P(B \mid A)\,P(A)}{P(A,B) + P(\text{not}A,B)} \\
  &= \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \text{not}A)\,P(\text{not}A)} && \text{(Computation)}
\end{aligned}
\]


Drug Testing

Data
• P(Test correctly indicates disease) = .98 (Sensitivity)
• P(Test correctly indicates absence) = .95 (Specificity)
• P(Disease) = .005 (Fairly rare)
Notation
• + = test indicates disease, - = test indicates no disease
• D = presence of disease, N = absence of disease
Data:
• P(D) = .005 (Incidence of the disease)
• P(+|D) = .98 (Correct detection of the disease)
• P(-|N) = .95 (Correct failure to detect the disease)

What are P(D|+) and P(N|-)?
Note, P(D|+) = the probability that a patient actually has the disease when the test says they do.


More Information
Deduce: Since P(+|D) = .98, we know P(-|D) = .02 because P(-|D) + P(+|D) = 1. [P(-|D) is the P(False negative).]
Deduce: Since P(-|N) = .95, we know P(+|N) = .05 because P(-|N) + P(+|N) = 1. [P(+|N) is the P(False positive).]
Deduce: Since P(D) = .005, P(N) = .995 because P(D) + P(N) = 1.


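Now, Use Bayes' Theorem

We have P(+|D) = .98. What is P(D|+)?

\[ P(D \mid +) = \frac{P(D \text{ and } +)}{P(+)} = \frac{P(+ \mid D)\,P(D)}{P(+)} \quad \text{(By Bayes' Theorem)} \]

\[ P(+) = P(D \text{ and } +) + P(N \text{ and } +) = P(+ \mid D)\,P(D) + P(+ \mid N)\,P(N), \text{ so} \]

\[ P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+)} = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid N)\,P(N)} = \frac{.98(.005)}{.98(.005) + .05(.995)} = 0.08966 \ (!!) \]

Using the same approach, P(N|-) = 0.999894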
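The computation can be checked numerically. This sketch uses a hypothetical helper, `posterior`, that applies the computational form of Bayes' theorem to the drug-testing data given earlier (sensitivity .98, specificity .95, incidence .005):

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Bayes' theorem: P(A|B) from P(A), P(B|A) and P(B|not A)."""
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence


p_d = 0.005          # P(D), incidence of the disease
sensitivity = 0.98   # P(+|D)
specificity = 0.95   # P(-|N)

# P(D|+): the false positive rate is P(+|N) = 1 - specificity.
p_d_given_pos = posterior(p_d, sensitivity, 1 - specificity)

# P(N|-): the false negative rate is P(-|D) = 1 - sensitivity.
p_n_given_neg = posterior(1 - p_d, specificity, 1 - sensitivity)

print(round(p_d_given_pos, 5), round(p_n_given_neg, 6))
```

The low value of P(D|+) reflects how rare the disease is: most positive tests come from the large healthy group.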


Independent events
Definition: P(A|B) = P(A)
Multiplication rule: P(A,B) = P(A)P(B)
Application: Infectious disease transmission
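The multiplication rule gives a direct test for independence. The sketch below checks it by enumeration on a 52-card deck; the card events are illustrative choices, not examples from the lecture.

```python
from fractions import Fraction
from itertools import product

ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))    # 52 equally likely outcomes


def prob(event):
    return Fraction(len(event), len(deck))


def independent(a, b):
    """A and B are independent when P(A,B) = P(A) P(B)."""
    return prob(a & b) == prob(a) * prob(b)


aces = {card for card in deck if card[0] == "A"}
spades = {card for card in deck if card[1] == "spades"}
kings = {card for card in deck if card[0] == "K"}
face_cards = {card for card in deck if card[0] in {"J", "Q", "K"}}

ace_vs_spade = independent(aces, spades)      # rank tells nothing about suit
king_vs_face = independent(kings, face_cards)  # a king is always a face card

print(ace_vs_spade, king_vs_face)
```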


2 Random Variables


Random Variable

Definition: Maps elements of the sample space to a single variable; assigns a number to each outcome.
- Discrete: Payoff to poker hands
- Continuous: Lightbulb lifetimes
- Mixed: Ticket sales with capacity constraints (censoring)
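A small simulation sketch of the mixed (censored) case. The demand distribution (exponential with mean 800) and the capacity (1000) are hypothetical values chosen only for illustration, and demand is treated as continuous for simplicity.

```python
import math
import random

CAPACITY = 1000.0      # hypothetical venue capacity (assumption)
MEAN_DEMAND = 800.0    # hypothetical mean of exponential demand (assumption)

def simulate_sales(n_draws, seed=0):
    """Draw ticket sales = min(demand, capacity) for n_draws events."""
    rng = random.Random(seed)
    return [min(rng.expovariate(1.0 / MEAN_DEMAND), CAPACITY) for _ in range(n_draws)]

sales = simulate_sales(100_000)

# Discrete part: a point mass at the capacity (sell-outs).
share_censored = sum(1 for s in sales if s == CAPACITY) / len(sales)
# Theoretical mass at the capacity: Prob(demand >= capacity).
expected_share = math.exp(-CAPACITY / MEAN_DEMAND)

print(round(share_censored, 3), round(expected_share, 3))
```

Below the capacity the variable behaves like a continuous one; at the capacity it carries positive probability, which is what makes it mixed.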


25/108

Market Behavior: Fair Isaacs credit card service to major vendors

Sample space = {Reject, Accept}
X = 0 if reject, 1 if accept

26/108

Caribbean Stud Poker

[Figure: Caribbean Stud Poker table; labels: Sample Space, Variable, Probability]

27/108

Features of Random Variables

Probability distribution
- Mass function: Prob(X = x) = f(x)
- Density function: f(x), x = ...
Cumulative probabilities; CDF
- Prob(X ≤ x) = F(x)
Quantiles: x such that F(x) = Q
- Median: x = median, Q = 0.5.
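The mass function, CDF, and quantile can be sketched for a simple discrete case. The fair die is a hypothetical example, and because F(x) = Q may have no exact solution for a discrete variable, the code assumes the common convention "smallest x with F(x) ≥ Q".

```python
from fractions import Fraction
from itertools import accumulate

# Hypothetical discrete random variable: the face of a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
pmf = {x: Fraction(1, 6) for x in values}           # f(x) = Prob(X = x)

# CDF: F(x) = Prob(X <= x), the running sum of the mass function.
cdf = dict(zip(values, accumulate(pmf[x] for x in values)))

def quantile(q):
    """Smallest x with F(x) >= q (one convention for discrete variables)."""
    for x in values:
        if cdf[x] >= q:
            return x
    raise ValueError("q must be in (0, 1]")

print(cdf[3])           # Prob(X <= 3)
print(quantile(0.5))    # median under this convention
```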

28/108

Discrete Random Variables


Elemental building block
- Bernoulli: Credit card applications
- Discrete uniform: Die toss
Counting rules
- Binomial: Family composition
- Hypergeometric: House/Senate Supercommittee
Models
- Poisson: Diabetes incidence, accidents, etc.


29/108

Market Behavior: Fair Isaacs credit card service to major vendors

X = 0 if reject, 1 if accept

$$\text{Prob}(X=x) = (1-p)^{1-x}\,p^{x}, \quad x = 0, 1$$

30/108

Binomial

$$\text{Prob}(X=x) = \binom{n}{x}\,p^{x}\,(1-p)^{n-x}$$

Sum of n Bernoulli trials
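A sketch of why the binomial is the sum of n Bernoulli trials: the closed-form probability is compared with a brute-force sum over every sequence of n independent 0/1 outcomes. The values n = 5 and p = 0.3 are illustrative assumptions.

```python
from itertools import product
from math import comb

def binomial_pmf(x, n, p):
    """Prob(X = x) for n trials with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def pmf_by_enumeration(x, n, p):
    """Same probability, summed over every sequence of n Bernoulli outcomes."""
    total = 0.0
    for trials in product([0, 1], repeat=n):
        if sum(trials) == x:
            prob_sequence = 1.0
            for t in trials:
                prob_sequence *= p if t == 1 else (1 - p)
            total += prob_sequence
    return total

n, p = 5, 0.3   # illustrative values, not from the slides
for x in range(n + 1):
    print(x, round(binomial_pmf(x, n, p), 6), round(pmf_by_enumeration(x, n, p), 6))
```

The binomial coefficient counts the sequences with exactly x successes, each of which has probability p^x (1-p)^(n-x).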


31/108

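Examples

Family has 4 children.

$$\text{Prob}[4 \text{ daughters}] = \binom{4}{4}\left(\tfrac{1}{2}\right)^{4}\left(\tfrac{1}{2}\right)^{0} = \frac{1}{16}$$

Weapons system has 20 components. It will fail if 2 or more break down. The probability that any one component fails is 0.2.

$$\text{Prob(system breaks down)} = \text{Prob}[X \ge 2] = \sum_{x=2}^{20} \binom{20}{x} (0.2)^{x} (0.8)^{20-x} = 1 - \text{Prob}[X < 2]$$

$$\text{Prob}[X < 2] = \text{Prob}(X=0) + \text{Prob}(X=1) = (1)(0.2)^{0}(0.8)^{20} + (20)(0.2)^{1}(0.8)^{19} = (0.8)^{19}\,\bigl(0.8 + 20(0.2)\bigr) = 0.069175$$

so Prob(system breaks down) = 1 - 0.069175 = 0.930825.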
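The two examples can be recomputed directly from the binomial formula. This is a sketch that assumes independent trials, with each child a daughter with probability 1/2 and each component failing with probability 0.2.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Prob(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Family with 4 children, assuming each child is a daughter with probability 1/2.
p_four_daughters = binomial_pmf(4, 4, 0.5)

# Weapons system: 20 components, each failing with probability 0.2;
# the system breaks down when 2 or more components fail.
p_fewer_than_two = binomial_pmf(0, 20, 0.2) + binomial_pmf(1, 20, 0.2)
p_system_fails = 1 - p_fewer_than_two

print(p_four_daughters)
print(round(p_fewer_than_two, 6), round(p_system_fails, 6))
```

Working through the complement needs only two terms, where the direct sum over x = 2, ..., 20 needs nineteen.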


32/108

Poisson
- Approximation to binomial
- General model for a type of process

33/108

Poisson Approximation to Binomial

$$\text{Prob}(X=x \mid n,p) = \binom{n}{x}\,p^{x}\,(1-p)^{n-x} = \frac{n!}{x!\,(n-x)!}\left(\frac{\lambda}{n}\right)^{x}\left(1-\frac{\lambda}{n}\right)^{n-x}, \quad \lambda = np$$

$$= \frac{n!}{x!\,n^{x}\,(n-x)!}\,\lambda^{x}\left(1-\frac{\lambda}{n}\right)^{n-x} \;\rightarrow\; \frac{\exp(-\lambda)\,\lambda^{x}}{x!}, \quad x = 0,\ldots,n$$

Approximation improves as $n \rightarrow \infty$, $p \rightarrow 0$, with $\lambda = np$.
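The limiting argument can be checked numerically. The sketch below holds the product n·p fixed at an illustrative value of 2 (an assumption, not a number from the slides) and reports the largest gap between the binomial and Poisson probabilities as n grows.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

def max_abs_gap(n, lam, x_max=20):
    """Largest |binomial - Poisson| over x = 0..x_max, holding lam = n * p fixed."""
    p = lam / n
    return max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam))
               for x in range(min(x_max, n) + 1))

lam = 2.0   # illustrative value of n * p (an assumption)
gaps = {n: max_abs_gap(n, lam) for n in (10, 100, 1000, 10000)}
for n, gap in gaps.items():
    print(n, gap)
```

The gap shrinks roughly in proportion to 1/n, which is the sense in which the approximation improves as n grows and p falls.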

34/108

Diabetes Incidence per 1000

$$\text{Prob}(D=d) \approx \frac{\exp(-7)\,7^{d}}{d!}, \quad d = 0, 1, \ldots, 1000.$$

http://www.cdc.gov/diabetes/statistics/incidence/fig2.htm


35/108

Poisson Distribution of Disease Cases in 1000 Draws with λ = 7

[Figure: Poisson Probabilities for Diabetes Cases. Vertical axis: Poisson probability, 0.00 to 0.16; horizontal axis: Cases.]
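The probabilities plotted for the diabetes example can be tabulated with a few lines of code. This is a sketch using the Poisson formula with a mean of 7 cases per 1000; the text bar chart and the range 0 to 16 are presentation choices made here, not taken from the slide.

```python
from math import exp, factorial

LAMBDA = 7.0   # expected cases per 1000 people, as in the diabetes example

def poisson_pmf(d, lam=LAMBDA):
    """Prob(D = d) = exp(-lam) * lam**d / d!"""
    return exp(-lam) * lam**d / factorial(d)

# Tabulate the probabilities over the range where most of the mass lies.
table = {d: poisson_pmf(d) for d in range(0, 17)}
for d, prob in table.items():
    print(f"{d:2d}  {prob:.4f}  " + "#" * round(prob * 200))

mode = max(table, key=table.get)
print("most likely count:", mode)
```

Because the mean is an integer, the counts 6 and 7 are equally likely in exact arithmetic, so either may be reported as the most likely count.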

36/108

Poisson Process: Doctor visits in the survey year by people in a sample of 27,326. λ = .8

Poisson probability model is a description of this process, not an approximation.


37/108

Continuous RV
- Density function, f(x)
- Probability measure P(event) obtained using the density.
- Application: Lightbulb lifetimes?

38/108

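Probability Density Function; PDF

Definition: Density: $f(x) \ge 0$, $\int_{-\infty}^{\infty} f(x)\,dx = 1$

Definition: CDF $= \text{Prob}(X \le x) = F(x) = \int_{-\infty}^{x} f(t)\,dt$
('Cumulative Distribution Function' or 'Distribution Function')

Probability: $P(\text{event}) = \int_{\text{event}} f(x)\,dx$

In range a to b: $\text{Prob}(a < X < b) = \text{Prob}(a \le X \le b) = \int_{a}^{b} f(x)\,dx$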


39/108

CDF and Quantiles
- pth quantile; 0 < p < 1
- Quantile = $x_p$ such that $F(x_p) = p$.
- $x_p = F^{-1}(p)$.
- For p = .5, $x_p$ = median
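When the CDF can be inverted in closed form, the quantile is a one-line computation. The sketch below assumes, purely for illustration, an exponential CDF with a hypothetical mean of 1500; neither the distribution nor the value comes from this slide.

```python
import math

THETA = 1500.0   # hypothetical mean lifetime (an assumption for illustration)

def cdf(x, theta=THETA):
    """Exponential CDF: F(x) = 1 - exp(-x / theta) for x >= 0."""
    return 1.0 - math.exp(-x / theta) if x >= 0 else 0.0

def quantile(p, theta=THETA):
    """Inverse CDF: x_p = F^{-1}(p) = -theta * ln(1 - p), for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must satisfy 0 < p < 1")
    return -theta * math.log(1.0 - p)

median = quantile(0.5)
print(median, cdf(median))   # applying F to x_p recovers p
```

For this model the median is the mean times ln 2, so it lies below the mean, as expected for a right-skewed distribution.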

40/108

Model for Light Bulb Lifetimes

This is the exponential model for lifetimes. The model is $f(\text{time}) = (1/\theta)\,e^{-\text{time}/\theta}$, where $\theta$ is the mean lifetime.
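A numerical sketch of two properties of this density: the total area under the curve is 1, and the mean lifetime equals the scale parameter. The value 1500 is a hypothetical choice; the slide does not state the parameter.

```python
import math

def density(t, theta):
    """Exponential density f(t) = (1/theta) * exp(-t/theta) for t >= 0."""
    return math.exp(-t / theta) / theta if t >= 0 else 0.0

def integrate(f, lower, upper, steps=200_000):
    """Midpoint-rule approximation to the integral of f from lower to upper."""
    width = (upper - lower) / steps
    return width * sum(f(lower + (i + 0.5) * width) for i in range(steps))

theta = 1500.0   # hypothetical mean lifetime (assumption; not given on the slide)

# Area under (nearly) the whole curve: the upper limit 40*theta leaves a tail of exp(-40).
total_area = integrate(lambda t: density(t, theta), 0.0, 40 * theta)
# Mean lifetime: the integral of t * f(t), which equals theta for this model.
mean_life = integrate(lambda t: t * density(t, theta), 0.0, 40 * theta)

print(round(total_area, 6), round(mean_life, 3))
```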


41/108

Model for Light Bulb Lifetimes


The area under the entire curve is 1.0.


42/108

Continuous Distribution

The probability associated with an interval such as 1000 < LIFETIME < 2000 equals the area under the curve from the lower limit to the upper.

A partial area will be between 0.0 and 1.0, and will produce a probability.


43/108

Probability of a Single Value Is Zero

The probability associated with a single point, such as LIFETIME=2000, equals 0.0.


44/108

Probabilities via the CDF

$$\text{Prob}(a \le X \le b) = \int_{a}^{b} f(x)\,dx = F(b) - F(a)$$


45/108

Probability for a Range of Values Based on CDF

Prob(Life < 2000) = .7364
minus
Prob(Life < 1000) = .4866
equals
Prob(1000 < Life < 2000) = .2498
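The subtraction of two CDF values can be reproduced numerically. The sketch below assumes, purely for illustration, that Life is exponentially distributed with a mean of 1500; the slide does not name the distribution at this point, but that assumption happens to reproduce the three quoted values. The names `exp_cdf` and `MEAN_LIFE` are hypothetical.

```python
import math

MEAN_LIFE = 1500.0  # assumed mean lifetime (an illustration, not stated on the slide)

def exp_cdf(x, mean=MEAN_LIFE):
    """CDF of an exponential random variable with the given mean."""
    return 1.0 - math.exp(-x / mean) if x > 0 else 0.0

p_below_2000 = exp_cdf(2000.0)           # F(2000)
p_below_1000 = exp_cdf(1000.0)           # F(1000)
p_between = p_below_2000 - p_below_1000  # Prob(1000 < Life < 2000) = F(2000) - F(1000)

print(round(p_below_2000, 4), round(p_below_1000, 4), round(p_between, 4))
```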


46/108

Common Continuous RVs


Continuous random variables are all models; they do not occur in nature. The model builder's toolkit:
• Continuous uniform
• Exponential
• Normal
• Lognormal
• Gamma
• Beta
Defined for specific types of outcomes


47/108

Continuous Uniform
$f(x) = 1/(b - a), \quad a < x < b$

$F(x) = (x - a)/(b - a), \quad a < x < b.$
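A minimal sketch of the continuous uniform density and CDF, assuming the CDF is the area under the flat density from the lower endpoint up to x. The endpoints 1 and 5 and the function names are hypothetical choices for illustration.

```python
def uniform_pdf(x, a, b):
    """Density of a continuous uniform on (a, b): constant height 1/(b - a)."""
    return 1.0 / (b - a) if a < x < b else 0.0

def uniform_cdf(x, a, b):
    """CDF of a continuous uniform on (a, b): area under the density up to x."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

a, b = 1.0, 5.0  # hypothetical endpoints
prob = uniform_cdf(3.0, a, b) - uniform_cdf(2.0, a, b)  # Prob(2 < X < 3)
print(prob)  # 0.25
```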


48/108

Exponential
$f(x) = \lambda \exp(-\lambda x)$, $x > 0$, 0 otherwise

$F(x) = 1 - \exp(-\lambda x)$, $x > 0$
Median: $F(M) = .5$
$1 - \exp(-\lambda M) = .5$
$\exp(-\lambda M) = .5$
$-\lambda M = \ln .5$
$M = -(\ln .5)/\lambda$
$= (\ln 2)/\lambda$
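The median derivation can be checked numerically. In this sketch the rate parameter is called `lam`, and its value 0.5 is an arbitrary assumption; the check is that the CDF evaluated at (ln 2)/rate equals one half.

```python
import math

def exp_cdf(x, lam):
    """CDF of an exponential random variable with rate lam."""
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0

def exp_median(lam):
    """Median from solving F(M) = .5, which gives M = ln(2) / lam."""
    return math.log(2.0) / lam

lam = 0.5  # hypothetical rate
median = exp_median(lam)
print(median, exp_cdf(median, lam))
```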


49/108


50/108

Gamma Density Uses the Gamma Function

Definition:
$$\Gamma(P) = \int_0^\infty x^{P-1} e^{-x}\,dx$$ (Note the limits of integration.)

Extremely Useful Recurrence (Prove by integration by parts):
$$\Gamma(P) = (P-1)\,\Gamma(P-1)$$
Implication for positive integers: $\Gamma(P) = (P-1)!$ and $\Gamma(1) = 0! = 1$
Also useful special case: $\Gamma(1/2) = \sqrt{\pi}$
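The gamma function properties can be spot-checked with Python's standard `math.gamma`. This sketch also approximates the defining integral with a simple midpoint rule; the helper `gamma_integral`, the upper limit of 60, and the test value P = 3.5 are illustrative assumptions.

```python
import math

def gamma_integral(P, upper=60.0, steps=200_000):
    """Midpoint-rule approximation of the integral of x**(P-1) * exp(-x) on (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** (P - 1) * math.exp(-x)
    return total * h

P = 3.5  # hypothetical argument
numeric = gamma_integral(P)                                     # definition
recurrence_gap = math.gamma(P) - (P - 1) * math.gamma(P - 1)    # recurrence
factorial_gap = math.gamma(5) - math.factorial(4)               # Gamma(5) = 4!
half_gap = math.gamma(0.5) - math.sqrt(math.pi)                 # Gamma(1/2) = sqrt(pi)
print(numeric, math.gamma(P))
```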


51/108

Gamma Distributed Random Variable


$$f(x \mid \lambda, P) = \frac{\lambda^P x^{P-1} e^{-\lambda x}}{\Gamma(P)}, \quad x \ge 0,\ \lambda > 0,\ P > 0.$$
Used to model nonnegative random variables, e.g., survival of people and electronic components
Two special cases:
• P = 1 is the exponential distribution
• P = 1/2 and λ = 1/2 is the chi squared with one degree of freedom
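The two special cases can be verified pointwise. This sketch assumes the rate form of the gamma density (rate `lam`, shape `P`) and compares it with the exponential density and with the chi squared density with one degree of freedom, written here as exp(-x/2)/sqrt(2πx). The function names and evaluation points are hypothetical.

```python
import math

def gamma_pdf(x, lam, P):
    """Gamma density with rate lam and shape P; returns 0 outside x > 0."""
    if x <= 0:
        return 0.0  # for P < 1 the density is unbounded at 0, so 0 itself is excluded
    return lam ** P * x ** (P - 1) * math.exp(-lam * x) / math.gamma(P)

def exponential_pdf(x, lam):
    return lam * math.exp(-lam * x) if x > 0 else 0.0

def chi2_1_pdf(x):
    """Chi squared density with one degree of freedom."""
    return math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi * x) if x > 0 else 0.0

points = [0.3, 1.0, 2.7]  # arbitrary evaluation points
exp_gaps = [abs(gamma_pdf(x, 2.0, 1.0) - exponential_pdf(x, 2.0)) for x in points]
chi_gaps = [abs(gamma_pdf(x, 0.5, 0.5) - chi2_1_pdf(x)) for x in points]
print(max(exp_gaps), max(chi_gaps))
```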


52/108

Beta Uses Beta Integrals


Beta density is used to model a random variable that ranges from 0 to 1, such as a proportion.

$$B(a, b) = \int_0^1 x^{a-1} (1 - x)^{b-1}\,dx = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a + b)}$$

The beta density is
$$f(x \mid a, b) = \frac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)}\, x^{a-1} (1 - x)^{b-1}, \quad 0 \le x \le 1,\ a > 0,\ b > 0.$$

Useful special case: a = 1 and b = 1 is the Uniform(0,1)
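A short sketch of the beta integral and density built from `math.gamma`. It checks one closed-form value, B(2,3) = 1/12, confirms numerically that the density integrates to one, and confirms that a = b = 1 gives a flat density. The helper names and parameter values are assumptions made for illustration.

```python
import math

def beta_function(a, b):
    """Beta integral expressed through gamma functions."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_pdf(x, a, b):
    """Beta density on the open interval (0, 1).

    The endpoints carry zero probability and are excluded so that
    0 ** (negative power) is never evaluated when a < 1 or b < 1.
    """
    if not 0.0 < x < 1.0:
        return 0.0
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_function(a, b)

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule numerical integral of f over (lo, hi)."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

area = integrate(lambda x: beta_pdf(x, 2.0, 3.0), 0.0, 1.0)
print(beta_function(2.0, 3.0), area, beta_pdf(0.37, 1.0, 1.0))
```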


53/108

Normal Density The Model


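$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right], \quad -\infty < x < +\infty$$

Mean = $\mu$, standard deviation = $\sigma$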
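The normal density has no closed-form CDF, but the standard library's error function gives it directly. This sketch, with hypothetical values mean 10 and standard deviation 2, evaluates the density at its peak and the probability of landing within one standard deviation of the mean.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    """Normal CDF written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma = 10.0, 2.0  # hypothetical parameters
peak = normal_pdf(mu, mu, sigma)
within_one_sd = normal_cdf(mu + sigma, mu, sigma) - normal_cdf(mu - sigma, mu, sigma)
print(peak, within_one_sd)
```

The one-standard-deviation probability does not depend on the chosen mean or standard deviation, which is one way to see that only location and scale change while the shape stays the same.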


54/108

Normal Distributions

The scale and location (on the horizontal axis) depend on μ and σ. The shape of the distribution is always the same. (Bell curve)


55/108


56/108

Standard Normal Density (0,1)


57/108

Lognormal Distribution
$$f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}} \exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right], \quad x > 0.$$
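A sketch of the lognormal density and CDF, assuming ln x is normal with mean `mu` and standard deviation `sigma` (both values below are hypothetical). It checks that exp(mu) is the median and that the density matches a numerical derivative of the CDF.

```python
import math

def lognormal_pdf(x, mu, sigma):
    """Lognormal density: ln(x) is normal with mean mu and standard deviation sigma."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def lognormal_cdf(x, mu, sigma):
    """Lognormal CDF: the normal CDF evaluated at ln(x)."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

mu, sigma = 1.0, 0.5  # hypothetical parameters
median = math.exp(mu)

# The density should equal the slope of the CDF (central difference).
x0, h = 2.0, 1e-5
slope = (lognormal_cdf(x0 + h, mu, sigma) - lognormal_cdf(x0 - h, mu, sigma)) / (2 * h)
print(lognormal_cdf(median, mu, sigma), slope, lognormal_pdf(x0, mu, sigma))
```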


58/108

Censoring and Truncation


Censoring
• Observation mechanism: Values above or below a certain value are assigned the boundary value
• Applications: ticket market (demand vs. sales given capacity constraints); top coded income data
Truncation
• Observation mechanism: The relevant distribution only applies in a restricted range of the random variable
• Application: On site survey for recreation visits. Truncated Poisson
• Incidental truncation: Income is observed only for those whose wealth (not income) exceeds $100,000.


59/108

Truncated Random Variable


Untruncated variable has density f(x)

Truncated variable has density f(x)/Prob(x is in range)

Truncated Normal:
$$f(x \mid x \ge a) = \frac{f(x)}{\text{Prob}(x \ge a)} = \frac{(1/\sigma)\,\phi\left(\frac{x-\mu}{\sigma}\right)}{1 - \Phi\left(\frac{a-\mu}{\sigma}\right)}, \quad x \ge a$$
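Dividing by the probability of the retained range is what makes a truncated density integrate to one. The sketch below illustrates this for a normal variable truncated from below at a point `a`; the parameter values, the helper names, and the midpoint integrator are assumptions for illustration.

```python
import math

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncated_normal_pdf(x, mu, sigma, a):
    """Density of a normal(mu, sigma) variable conditional on x >= a."""
    if x < a:
        return 0.0
    tail_prob = 1.0 - std_normal_cdf((a - mu) / sigma)  # Prob(x >= a)
    return std_normal_pdf((x - mu) / sigma) / sigma / tail_prob

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule numerical integral of f over (lo, hi)."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

mu, sigma, a = 0.0, 1.0, 0.5  # hypothetical parameters and truncation point
area = integrate(lambda x: truncated_normal_pdf(x, mu, sigma, a), a, mu + 12.0 * sigma)
print(area)
```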


60/108

Truncated Normal:
f(x|x>a) = f(x)/Prob(x>a)

F(x | x > XL )


61/108

Truncated Poisson
$f(x) = \exp(-\lambda)\,\lambda^x / \Gamma(x+1)$

$f(x \mid x > 0) = f(x)/\text{Prob}(x > 0)$
$= f(x) / [1 - \text{Prob}(x = 0)]$
$= \{\exp(-\lambda)\,\lambda^x / \Gamma(x+1)\} / \{1 - \exp(-\lambda)\}$
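A sketch of the zero-truncated Poisson probabilities, assuming a hypothetical mean parameter of 2.5 (called `lam` here). It checks that the truncated probabilities over x = 1, 2, ... sum to one and that the truncated mean equals lam / (1 - exp(-lam)), a standard result that follows because the x = 0 term contributes nothing to the untruncated mean.

```python
import math

def poisson_pmf(x, lam):
    """Poisson probability, written with Gamma(x + 1) in place of x!."""
    return math.exp(-lam) * lam ** x / math.gamma(x + 1)

def truncated_poisson_pmf(x, lam):
    """Poisson probability conditional on x > 0."""
    if x < 1:
        return 0.0
    return poisson_pmf(x, lam) / (1.0 - math.exp(-lam))

lam = 2.5  # hypothetical Poisson parameter
support = range(1, 60)  # probabilities beyond 59 are negligible for this lam
total = sum(truncated_poisson_pmf(x, lam) for x in support)
mean = sum(x * truncated_poisson_pmf(x, lam) for x in support)
print(total, mean)
```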


62/108

Representations of a Continuous
Random Variable
Representations
• Density, f(x)
• CDF, F(x) = Prob(X < x)
• Survival, S(x) = Prob(X > x) = 1 - F(x)
• Hazard function, h(x) = -d ln S(x)/dx
Representations are one to one: each uniquely determines the distribution of the random variable


63/108

Application: A Memoryless Process


Hazard Rate
$$h(t) = \lim_{\Delta t \to 0} \frac{\text{Prob}(t < T < t + \Delta t \mid T \ge t)}{\Delta t}$$
= Probability (per unit of time) of failure in the next Δt interval given that failure has not occurred up to t.
In a memoryless process, the hazard rate does not depend on how long the process has been ongoing.
$h(t) = \lambda$ (a constant)
$$h(t) = -\frac{d \ln S(t)}{dt} = \lambda.$$
General solution: $S(t) = K\exp(-\lambda t)$
Initial condition: $S(0) = 1$, so $K = 1$ and $S(t) = \exp(-\lambda t)$

$F(t) = 1 - S(t) = 1 - \exp(-\lambda t)$ = the CDF of the exponential random variable
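The memoryless property can be illustrated numerically. In this sketch the constant hazard is called `lam` and set to a hypothetical 0.8; the code checks that the hazard recovered from the survival function is the same at different times, and that the chance of surviving a further t units does not depend on the time s already survived.

```python
import math

def survival(t, lam):
    """Exponential survival function S(t) = exp(-lam * t)."""
    return math.exp(-lam * t)

def hazard(t, lam, dt=1e-6):
    """Numerical h(t) = -d ln S(t) / dt, using a central difference."""
    upper = math.log(survival(t + dt, lam))
    lower = math.log(survival(t - dt, lam))
    return -(upper - lower) / (2.0 * dt)

lam = 0.8          # hypothetical constant hazard
s, t = 3.0, 1.5    # time already survived, and additional time

conditional = survival(s + t, lam) / survival(s, lam)  # Prob(T > s + t | T > s)
unconditional = survival(t, lam)                       # Prob(T > t)
print(hazard(0.2, lam), hazard(5.0, lam), conditional, unconditional)
```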


64/108

A Change of Variable
Theorem: x = a continuous RV with continuous density $f_x(x)$. y = g(x) is a monotonic function over the range of x. Then the density of y is
$$f_y(y) = f_x(x(y))\,\left|\frac{dx(y)}{dy}\right| = f_x(g^{-1}(y))\,\left|\frac{dg^{-1}(y)}{dy}\right|$$
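As a worked instance of the theorem (a sketch that connects it to the lognormal density given earlier; the subscripts on f are used here only to tell the two densities apart): let x be normal with mean $\mu$ and standard deviation $\sigma$, and let $y = g(x) = e^x$, so that $x(y) = \ln y$ and $dx(y)/dy = 1/y$.

```latex
\begin{aligned}
f_y(y) &= f_x(\ln y)\,\left|\frac{d \ln y}{dy}\right| \\
       &= \frac{1}{\sigma\sqrt{2\pi}}
          \exp\left[-\frac{(\ln y - \mu)^2}{2\sigma^2}\right] \cdot \frac{1}{y},
          \qquad y > 0.
\end{aligned}
```

The factor 1/y is the Jacobian term; omitting it would leave a function that does not integrate to one.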


65/108

Change of Variable Applications


Standardized normal
Lognormal to normal
Fundamental probability transform

66/108

Standardized Normal
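$X \sim N[\mu, \sigma^2]$
$\text{Prob}[X < a] = F(a)$
$\text{Prob}[X < a] = \text{Prob}[(X - \mu)/\sigma < (a - \mu)/\sigma]$
$y = (x - \mu)/\sigma$
- $J = dx(y)/dy = \sigma$
- $f(y) = \sigma f(\sigma y + \mu) = [1/\sqrt{2\pi}] \exp(-y^2/2)$
Only a table for the standard normal is needed.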
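The Jacobian step can be checked numerically. This is a sketch only: the values of mu, sigma and a are illustrative assumptions, and `statistics.NormalDist` from the Python standard library stands in for the printed table.

```python
from statistics import NormalDist

mu, sigma = 10.0, 3.0        # illustrative parameter values (assumptions)
X = NormalDist(mu, sigma)    # X ~ N[mu, sigma^2]; the second argument is sigma
Z = NormalDist(0.0, 1.0)     # standard normal

def density_of_y(y):
    """f(y) = J * f(sigma*y + mu), with Jacobian J = dx/dy = sigma."""
    return sigma * X.pdf(sigma * y + mu)

for y in (-2.0, -0.5, 0.0, 1.0, 2.5):
    # The transformed density should match the standard normal density.
    print(y, density_of_y(y), Z.pdf(y))

a = 12.0
# Prob[X < a] computed directly and via the standardized value (a - mu)/sigma.
print(X.cdf(a), Z.cdf((a - mu) / sigma))
```

Both comparisons should agree to rounding error, which is why a single standard normal table is enough.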

67/108

Textbooks Provide Tables of Areas for the Standard Normal

Econometric Analysis, WHG, 2008, Appendix G, page 1093; Rice, Table 2.
Note that values are given only for z ranging from 0.00 to 3.99. No values are given for negative z.

68/108

Computing Probabilities
Standard normal tables give probabilities when $\mu = 0$ and $\sigma = 1$.
For other cases, do we need another table?
Probabilities for other cases are obtained by standardizing.
- The standardized variable is $z = (x - \mu)/\sigma$.
- z has mean 0 and standard deviation 1.

69/108

Standard Normal Density

70/108

Standard Normal Distribution Facts


The random variable z runs from $-\infty$ to $+\infty$.
$\phi(z) > 0$ for all z, but for $|z| > 4$ it is essentially 0.
The total area under the curve equals 1.0.
The curve is symmetric around 0. (The normal distribution generally is symmetric around $\mu$.)

71/108

Only Half the Table Is Needed

The area to the left of 0.0 is exactly 0.5.

72/108

Only Half the Table Is Needed

The area left of 1.60 is exactly 0.5 plus the area between 0.0 and 1.60.

73/108

Areas Left of Negative Z

Area left of -1.6 equals area right of +1.6.

Area right of +1.6 equals 1 minus the area to the left of +1.6.
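The symmetry argument can be written as a small lookup routine. This is a sketch under assumptions: `table_area` is a hypothetical stand-in for a printed table that has entries only for non-negative z, and it is backed here by `statistics.NormalDist` rather than by tabulated values.

```python
from statistics import NormalDist

_std_normal = NormalDist()  # stands in for the printed table

def table_area(z):
    """Area to the left of z, defined only for z >= 0 as in a printed table."""
    if z < 0:
        raise ValueError("the table has no entries for negative z")
    return _std_normal.cdf(z)

def area_left_of(z):
    """Area to the left of any z, using only non-negative table entries."""
    if z >= 0:
        return table_area(z)
    # Symmetry: area left of -|z| = area right of +|z| = 1 - area left of +|z|.
    return 1.0 - table_area(-z)

print(area_left_of(0.0))
print(area_left_of(1.6))
print(area_left_of(-1.6))
```

The negative-z branch never consults the table below zero, which mirrors how the half table is used by hand.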

74/108

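Computing Probabilities by Standardizing: Example
$$
\begin{aligned}
P[4.5 \le x \le 8 \mid \mu = 3.5,\ \sigma = 2.0]
&= P\left[\frac{4.5 - \mu}{\sigma} \le \frac{x - \mu}{\sigma} \le \frac{8 - \mu}{\sigma}\right] \\
&= P\left[\frac{4.5 - 3.5}{2.0} \le \frac{x - 3.5}{2.0} \le \frac{8 - 3.5}{2.0}\right] \\
&= P[0.5 \le z \le 2.25] \\
&= P[z \le 2.25] - P[z \le 0.5] \\
&= 0.9878 - 0.6915 \\
&= 0.2963
\end{aligned}
$$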
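The worked example can be reproduced in a few lines. This is a sketch that uses `statistics.NormalDist` from the Python standard library in place of the table lookup; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 3.5, 2.0
lower, upper = 4.5, 8.0

z_lower = (lower - mu) / sigma   # standardized lower limit, 0.5
z_upper = (upper - mu) / sigma   # standardized upper limit, 2.25

std_normal = NormalDist()
# P[z <= 2.25] - P[z <= 0.5]
prob = std_normal.cdf(z_upper) - std_normal.cdf(z_lower)
print(round(prob, 4))  # 0.2963
```

The same number comes out of `NormalDist(mu, sigma).cdf(upper) - NormalDist(mu, sigma).cdf(lower)`, without standardizing first.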

75/108

Lognormal Distribution
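$$f(x) = \frac{1}{\sigma x \sqrt{2\pi}} \exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right], \quad x > 0.$$

$y = \ln x$, $x = \exp(y)$, $dx/dy = \exp(y)$

$$f(y) = \frac{1}{\sigma \exp(y) \sqrt{2\pi}} \exp\left[-\frac{(y - \mu)^2}{2\sigma^2}\right] \exp(y) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[-\frac{(y - \mu)^2}{2\sigma^2}\right].$$

If x has a lognormal distribution, ln x has a normal distribution.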
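A short check of this result, under illustrative assumptions: the values of mu and sigma, the seed, and the sample size are arbitrary choices, not taken from the slides. The first part compares the transformed density with the normal density point by point; the second part simulates lognormal draws and looks at the mean and standard deviation of their logs.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

mu, sigma = 1.0, 0.5   # illustrative parameters of ln x (assumptions)

def lognormal_pdf(x):
    """Lognormal density f(x) for x > 0."""
    exponent = -((math.log(x) - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * x * math.sqrt(2 * math.pi))

def density_of_log(y):
    """Change of variable y = ln x: f(y) = f(exp(y)) * |dx/dy|, dx/dy = exp(y)."""
    return lognormal_pdf(math.exp(y)) * math.exp(y)

normal = NormalDist(mu, sigma)
for y in (0.0, 0.5, 1.0, 2.0):
    # The two densities should coincide at every point.
    print(y, density_of_log(y), normal.pdf(y))

rng = random.Random(12345)
logs = [math.log(rng.lognormvariate(mu, sigma)) for _ in range(50_000)]
print(fmean(logs), stdev(logs))  # should be close to mu and sigma
```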

76/108

Lognormal Distribution of Monthly Wages in NLS

77/108

Log of Lognormal Variable

78/108

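Fundamental Probability Transformation
x ~ f(x). F(x) is the cdf of x.
What is the distribution of y = F(x), $0 \le y \le 1$?
The cdf of y is Prob(Y ≤ y):
$$
\begin{aligned}
\text{Prob}(Y \le y) &= \text{Prob}(F(x) \le y) \\
&= \text{Prob}(x \le F^{-1}(y)) \\
&= F(F^{-1}(y)) \\
&= y
\end{aligned}
$$
This is the uniform distribution on [0, 1].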
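A simulation sketch of the transformation, under assumptions not in the slides: x is drawn from an exponential distribution with rate theta = 2, and the seed and sample size are arbitrary. If the result holds, the share of transformed values below any point y should be close to y.

```python
import math
import random
from statistics import fmean

theta = 2.0                   # illustrative rate parameter (an assumption)
rng = random.Random(2024)

def F(x):
    """cdf of the exponential distribution with rate theta."""
    return 1.0 - math.exp(-theta * x)

x_draws = [rng.expovariate(theta) for _ in range(100_000)]
y_draws = [F(x) for x in x_draws]   # y = F(x)

def empirical_cdf(values, point):
    """Share of values that are <= point."""
    return sum(v <= point for v in values) / len(values)

for point in (0.1, 0.25, 0.5, 0.9):
    # For a uniform variable, Prob(Y <= y) = y.
    print(point, empirical_cdf(y_draws, point))
print(fmean(y_draws))  # should be close to 0.5
```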

79/108

Random Number Generation
The CDF is a monotonic function of x.
If $u = F(x)$, then $x = F^{-1}(u)$.
We can generate u with a computer.
- Example: Exponential
- Example: Normal

80/108

Generating Random Samples


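Exponential, with rate parameter $\theta$
- $u = F(x) = 1 - \exp(-\theta x)$
- $1 - u = \exp(-\theta x)$
- $x = (-1/\theta) \ln(1 - u)$
Normal $(\mu, \sigma)$
- $u = \Phi(z)$
- $z = \Phi^{-1}(u)$
- $x = \sigma z + \mu = \sigma \Phi^{-1}(u) + \mu$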
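The two recipes above can be sketched as follows. The function names, seed, sample sizes, and parameter values are illustrative assumptions; `NormalDist().inv_cdf` from the Python standard library plays the role of the inverse standard normal cdf.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

rng = random.Random(7)
std_normal = NormalDist()

def uniform_open():
    """Draw u from U(0,1), excluding exactly 0 so the inverse cdf is defined."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

def draw_exponential(theta):
    """Exponential draw: x = (-1/theta) ln(1 - u)."""
    return -math.log(1.0 - uniform_open()) / theta

def draw_normal(mu, sigma):
    """Normal draw: x = sigma * (inverse standard normal cdf of u) + mu."""
    return sigma * std_normal.inv_cdf(uniform_open()) + mu

exp_sample = [draw_exponential(2.0) for _ in range(50_000)]
norm_sample = [draw_normal(3.5, 2.0) for _ in range(50_000)]
print(fmean(exp_sample))                        # should be near 1/theta = 0.5
print(fmean(norm_sample), stdev(norm_sample))   # should be near 3.5 and 2.0
```

The only random input in either routine is the uniform draw u; everything else is a deterministic inversion of the cdf.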

81/108

U[0,1] Generation
Linear congruential generator
$x(n) = (a\,x(n-1) + b) \bmod m$
Properties of RNGs
- Replicability: they are not RANDOM
- Period
- Randomness tests
The Mersenne twister: Current state of the art (of pseudo-random number generation)
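A minimal sketch of a linear congruential generator that illustrates replicability and period. The default constants a, b and m are a commonly cited illustrative choice and are an assumption, not parameters from the slides; the helper names are hypothetical. For practical work, Python's own `random` module is based on the Mersenne Twister.

```python
def lcg(seed, a=1664525, b=1013904223, m=2**32):
    """Yield U[0,1) draws from x(n) = (a*x(n-1) + b) mod m.

    The default constants are illustrative, not taken from the slides.
    """
    x = seed % m
    while True:
        x = (a * x + b) % m
        yield x / m

def first_draws(seed, count, **params):
    """Return the first `count` draws for a given seed."""
    gen = lcg(seed, **params)
    return [next(gen) for _ in range(count)]

def period(seed, a, b, m):
    """Length of the cycle the generator eventually enters."""
    seen = {}
    x, step = seed % m, 0
    while x not in seen:
        seen[x] = step
        x = (a * x + b) % m
        step += 1
    return step - seen[x]

# Replicability: the same seed reproduces the same "random" sequence.
print(first_draws(42, 3) == first_draws(42, 3))
# Period: a tiny generator with m = 16 cannot run longer than 16 draws
# before repeating.
print(period(0, a=5, b=3, m=16))
```

The period can never exceed m, which is one reason small moduli are unsuitable and generators with very long periods are preferred.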

3 Joint Distributions

83/108

Jointly Distributed Random Variables
Usually there is some kind of association between the variables, e.g., two different financial assets.
Joint cdf for two random variables:
F(x, y) = Prob(X < x, Y < y)

84/108

Probability of a Rectangle
$$\text{Prob}[a_1 < x < b_1,\ a_2 < y < b_2] = F(b_1, b_2) - F(a_1, b_2) - F(b_1, a_2) + F(a_1, a_2)$$
[Figure: a rectangle in the (x, y) plane bounded by a1 and b1 on one axis and a2 and b2 on the other, with labels F(a1,a2), F(a1,b2), F(b1,a2), and F(b1,b2).]
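The rectangle probability can be computed from the joint cdf at the four corners, F(b1,b2) - F(a1,b2) - F(b1,a2) + F(a1,a2). The sketch below assumes, purely for illustration, that x and y are independent standard normals, so the joint cdf is the product of the two marginal cdfs; that assumption also gives an independent way to check the answer.

```python
from statistics import NormalDist

phi = NormalDist()

def F(x, y):
    """Joint cdf of two independent standard normals (an illustrative assumption)."""
    return phi.cdf(x) * phi.cdf(y)

def rectangle_prob(a1, b1, a2, b2):
    """Prob[a1 < x < b1, a2 < y < b2] from the joint cdf at the four corners."""
    return F(b1, b2) - F(a1, b2) - F(b1, a2) + F(a1, a2)

a1, b1, a2, b2 = -1.0, 0.5, 0.0, 2.0
from_corners = rectangle_prob(a1, b1, a2, b2)
# Under independence the same probability factors into two marginal pieces.
from_marginals = (phi.cdf(b1) - phi.cdf(a1)) * (phi.cdf(b2) - phi.cdf(a2))
print(from_corners, from_marginals)
```

The corner formula itself does not rely on independence; only the cross-check in the last lines does.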

85/108

Joint Distributions
Discrete: Multinomial for R kinds of success in N independent trials
Continuous: Bi- and multivariate normal
Mixed: Conditional regression models

86/108

Multinomial Distribution

N independent trials, R different types of 'success'
$X_1, \ldots, X_R$ = number of successes of each type
Outcome: $x_1, \ldots, x_R$ such that $\sum_{r=1}^{R} x_r = N$

\[
\text{Prob}[X_1 = x_1, \ldots, X_R = x_R] = \frac{N!}{x_1! \cdots x_R!}\; p_1^{x_1} \cdots p_R^{x_R}
\]

Example: Mix of suits in a 13-card bridge hand (treating each card's suit as an independent draw with probability 1/4)

\[
\text{Prob}[X_S, X_D, X_C, X_H] = \frac{13!}{X_S!\, X_D!\, X_C!\, X_H!}
\left(\frac{1}{4}\right)^{X_S} \left(\frac{1}{4}\right)^{X_D} \left(\frac{1}{4}\right)^{X_C} \left(\frac{1}{4}\right)^{X_H}
\]
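
As a rough illustration of the multinomial formula, the Python sketch below evaluates the probability of a given vector of counts. The function name `multinomial_pmf` and the 4-3-3-3 suit split are illustrative choices, not part of the slides. The bridge example follows the slide in using probability 1/4 per suit on every draw, which ignores the fact that cards are dealt without replacement.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Prob[X_1 = x_1, ..., X_R = x_R] for N = sum(counts) independent trials."""
    n = sum(counts)
    coef = factorial(n)
    for x in counts:
        coef //= factorial(x)          # N! / (x_1! ... x_R!), kept as an exact integer
    return coef * prod(p ** x for p, x in zip(probs, counts))

# Bridge-hand illustration: 13 cards, four suits, p = 1/4 for each suit.
# The 4-3-3-3 split is a hypothetical outcome chosen for the example.
even_split = multinomial_pmf([4, 3, 3, 3], [0.25] * 4)
print(even_split)
```

Summing `multinomial_pmf` over every count vector that adds up to N should give 1, which is a quick sanity check on any implementation.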


Probabilities: Inherited Color Blindness

Inherited color blindness has different incidence rates in men and women. Women usually carry the defective gene and men usually inherit it.
Pick an individual at random from the population.
B = 1 if the individual has inherited color blindness, B = 0 if not color blind
G = 0 if male, G = 1 if female
Marginal:    P(B=1) = 2.75%
Conditional: P(B=1|G=0) = 5.0% (1 in 20 men)
             P(B=1|G=1) = 0.5% (1 in 200 women)
Joint:       P(B=1 and G=0) = 2.5%
             P(B=1 and G=1) = 0.25%
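
The three kinds of probability are tied together by P(B and G) = P(B | G) P(G) and by summing the joint probabilities over G. A minimal Python sketch, assuming men and women each make up half of the population (the split that reproduces the 2.5% and 0.25% joint probabilities); the variable names are illustrative:

```python
# P(G = g): assumed 50/50 split between men (0) and women (1)
p_gender = {0: 0.50, 1: 0.50}
# P(B = 1 | G = g): conditional incidence rates from the slide
p_blind_given_gender = {0: 0.05, 1: 0.005}

# Joint: P(B = 1 and G = g) = P(B = 1 | G = g) * P(G = g)
p_joint = {g: p_blind_given_gender[g] * p_gender[g] for g in p_gender}

# Marginal: P(B = 1) = sum over g of P(B = 1 and G = g)
p_blind = sum(p_joint.values())

print(p_joint, p_blind)
```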


Marginal Distributions

$\text{Prob}[X=x] = \sum_y \text{Prob}[X=x, Y=y]$

              Color Blind
Gender     B=0      B=1      Total
G=0        .475     .025     0.50
G=1        .4975    .0025    0.50
Total      .9725    .0275    1.00

Prob[G=0] = Prob[G=0,B=0] + Prob[G=0,B=1]
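
The marginal probabilities are the row and column totals of the joint table. The sketch below recomputes them in Python; the dictionary layout and the helper name `marginal` are illustrative assumptions, not part of the slides.

```python
# Joint probabilities Prob[G = g, B = b] from the table, keyed by (g, b)
joint = {(0, 0): 0.475, (0, 1): 0.025,
         (1, 0): 0.4975, (1, 1): 0.0025}

def marginal(joint, axis):
    """Sum the joint table over the other variable; axis 0 = G, axis 1 = B."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out

p_gender = marginal(joint, 0)   # row totals: Prob[G = g]
p_blind = marginal(joint, 1)    # column totals: Prob[B = b]
print(p_gender, p_blind)
```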


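Joint Continuous Distribution

Probability associated with a region A:
\[
\text{Prob}((X,Y) \in A) = \iint_{A} f(x,y)\, dy\, dx
\]
f(x,y) is the joint density function

Joint CDF
For a region $A = \{(X,Y) \text{ with } X \le x,\ Y \le y\}$:
\[
F(x,y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(s,t)\, dt\, ds
\]

Joint density
\[
f(x,y) = \frac{\partial^2 F(x,y)}{\partial x\, \partial y}
\]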
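
To make the double integral concrete, the sketch below approximates the probability of a rectangle with a midpoint rule. The density f(x, y) = x + y on the unit square is a hypothetical example chosen because it integrates to 1 and has the simple joint CDF F(x, y) = (x²y + xy²)/2 on the square; it does not come from the slides.

```python
def joint_density(x, y):
    """Illustrative density f(x, y) = x + y on the unit square, 0 elsewhere."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def prob_rectangle(f, x_lo, x_hi, y_lo, y_hi, n=200):
    """Midpoint-rule approximation of the double integral of f over a rectangle."""
    dx = (x_hi - x_lo) / n
    dy = (y_hi - y_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * dx
        for j in range(n):
            y = y_lo + (j + 0.5) * dy
            total += f(x, y)
    return total * dx * dy

total_mass = prob_rectangle(joint_density, 0.0, 1.0, 0.0, 1.0)  # should be close to 1
cdf_half = prob_rectangle(joint_density, 0.0, 0.5, 0.0, 0.5)    # F(0.5, 0.5); exact value is 0.125
print(total_mass, cdf_half)
```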


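Marginal Distributions

If f(x,y) is the joint density, $x \in A$, $y \in B$, then
\[
f(y) = \int_{x \in A} f(x,y)\, dx
\]

Example: $f(x,y) = (x)\,e^{-(x)y}$, $y \ge 0$, $0 \le x \le 1$.

\[
f(y) = \int_0^1 (x)\,e^{-(x)y}\, dx . \quad \text{Make the change of variable to } t = x:
\]
\[
f(y) = \int_0^1 t\,e^{-ty}\, dt = \int_0^1 t\,e^{(-y)t}\, dt . \quad \text{Now let } a = -y:
\]
\[
f(y) = \int_0^1 t\,e^{at}\, dt = \left[ \left( \frac{t}{a} - \frac{1}{a^2} \right) e^{at} \right]_0^1 \quad \text{where } a = -y
\]
\[
f(y) = \frac{1}{y^2} \left[ 1 - y\,e^{-y} - e^{-y} \right]
\]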
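
A quick numerical check of this kind of derivation can catch sign or limit errors. The sketch below reads the example density as f(x, y) = x e^(-xy) on 0 ≤ x ≤ 1, y ≥ 0 (an assumption about the extracted formula) and compares a midpoint-rule integral over x with the closed form (1/y²)(1 - y e^(-y) - e^(-y)). The function names and the evaluation point y = 2 are illustrative.

```python
from math import exp

def joint_density(x, y):
    """f(x, y) = x * exp(-x * y) for y >= 0 and 0 <= x <= 1 (assumed reading)."""
    return x * exp(-x * y)

def marginal_y_numeric(y, n=10000):
    """Midpoint-rule approximation of the integral of f(x, y) over x in [0, 1]."""
    dx = 1.0 / n
    return sum(joint_density((i + 0.5) * dx, y) for i in range(n)) * dx

def marginal_y_closed_form(y):
    """f(y) = (1 / y^2) * (1 - y * exp(-y) - exp(-y)), valid for y > 0."""
    return (1.0 - y * exp(-y) - exp(-y)) / (y * y)

y0 = 2.0
numeric = marginal_y_numeric(y0)
closed = marginal_y_closed_form(y0)
print(numeric, closed)
```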


Two Leading Applications

Copula Function - Application in Finance
Bivariate Normal Distribution


The Bivariate Normal Distribution

(x, y) distributed as bivariate normal if
(1) Infinite range: $-\infty < x, y < +\infty$
(2) Normal marginal distributions:
\[
f(x) = \frac{1}{\sigma_x \sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu_x}{\sigma_x} \right)^2 \right], \quad
f(y) = \frac{1}{\sigma_y \sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{y - \mu_y}{\sigma_y} \right)^2 \right]
\]
(3) Joint density:
\[
f(x,y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1 - \rho^2}}
\exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left( \frac{x - \mu_x}{\sigma_x} \right)^2
- 2\rho \left( \frac{x - \mu_x}{\sigma_x} \right) \left( \frac{y - \mu_y}{\sigma_y} \right)
+ \left( \frac{y - \mu_y}{\sigma_y} \right)^2 \right] \right\}
\]
(4) New parameter is $\rho$, the correlation between x and y.


Independent Random Variables

\[
F(x,y) = \text{Prob}(X \le x,\ Y \le y) = \text{Prob}(X \le x)\, \text{Prob}(Y \le y) = F_X(x)\, F_Y(y)
\]
\[
f(x,y) = \frac{\partial^2 F(x,y)}{\partial x\, \partial y} = f(x)\, f(y)
\]
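
In the discrete case the same factorization must hold cell by cell: Prob(X = x, Y = y) = Prob(X = x) Prob(Y = y). The sketch below applies that test to the color blindness table from the earlier slides; the helper name `is_independent` and the tolerance are illustrative assumptions.

```python
# Joint probabilities from the color blindness table, keyed by (G, B)
joint = {(0, 0): 0.475, (0, 1): 0.025,
         (1, 0): 0.4975, (1, 1): 0.0025}

# Marginals: sum the joint table over the other variable
p_g = {g: sum(p for (gg, _), p in joint.items() if gg == g) for g in (0, 1)}
p_b = {b: sum(p for (_, bb), p in joint.items() if bb == b) for b in (0, 1)}

def is_independent(joint, p_g, p_b, tol=1e-12):
    """True only if every joint cell equals the product of its marginals."""
    return all(abs(p - p_g[g] * p_b[b]) <= tol for (g, b), p in joint.items())

independent = is_independent(joint, p_g, p_b)
print(independent)
```

Here the check fails, for example Prob(G=0, B=1) = .025 while Prob(G=0) Prob(B=1) = .50 × .0275 = .01375, so color blindness and gender are not independent.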


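Independent Normals

(1) (x, y) distributed as bivariate normal:
\[
f(x,y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1 - \rho^2}}
\exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left( \frac{x - \mu_x}{\sigma_x} \right)^2
- 2\rho \left( \frac{x - \mu_x}{\sigma_x} \right) \left( \frac{y - \mu_y}{\sigma_y} \right)
+ \left( \frac{y - \mu_y}{\sigma_y} \right)^2 \right] \right\}
\]
(2) Uncorrelated: $\rho = 0$
\[
f(x,y) = \frac{1}{2\pi \sigma_x \sigma_y}
\exp\left\{ -\frac{1}{2} \left[ \left( \frac{x - \mu_x}{\sigma_x} \right)^2 + \left( \frac{y - \mu_y}{\sigma_y} \right)^2 \right] \right\}
\]
\[
= \frac{1}{\sigma_x \sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu_x}{\sigma_x} \right)^2 \right]
\cdot \frac{1}{\sigma_y \sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{y - \mu_y}{\sigma_y} \right)^2 \right]
= f(x)\, f(y)
\]
In the bivariate normal case (not generally), correlation = 0 implies independence.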
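
The factorization at zero correlation can be checked numerically. The sketch below codes the univariate and bivariate normal densities directly from their formulas and compares f(x, y) with f(x) f(y) at one point. The parameter values and the evaluation point are hypothetical, chosen only for illustration.

```python
from math import exp, pi, sqrt

def normal_pdf(v, mu, sigma):
    """Univariate normal density."""
    z = (v - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2.0 * pi))

def bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Bivariate normal density with correlation rho, |rho| < 1."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho * rho)
    return exp(-0.5 * q) / (2.0 * pi * sigma_x * sigma_y * sqrt(1.0 - rho * rho))

# Hypothetical parameter values and evaluation point
params = dict(mu_x=1.0, mu_y=-2.0, sigma_x=1.5, sigma_y=0.5)
x0, y0 = 0.3, -1.6

product = normal_pdf(x0, 1.0, 1.5) * normal_pdf(y0, -2.0, 0.5)
joint_uncorrelated = bivariate_normal_pdf(x0, y0, rho=0.0, **params)
joint_correlated = bivariate_normal_pdf(x0, y0, rho=0.6, **params)
print(product, joint_uncorrelated, joint_correlated)
```

With rho = 0 the joint density matches the product of the marginals up to rounding; with rho = 0.6 it does not.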


Conditional Distributions

Conditional probability:
\[
\text{Prob}(X \in A \mid Y \in B) = \frac{\text{Prob}(X \in A \text{ and } Y \in B)}{\text{Prob}(Y \in B)}
\]
In the discrete case:
\[
\text{Prob}(X = x \mid Y = y) = \frac{\text{Prob}(X = x \text{ and } Y = y)}{\text{Prob}(Y = y)}
\]

               Color Blind
Gender      B=0 (No)   B=1 (Yes)   Total
G=0 (M)     .475       .025        0.50
G=1 (F)     .4975      .0025       0.50
Total       .9725      .0275       1.00

Prob(Not color blind given male):
Prob(B=0|G=0) = Prob(B=0,G=0) / Prob(G=0) = .475 / .50 = .950
Prob(B=1|G=0) = .025 / .50 = .05
Prob(B=1|G=0) + Prob(B=0|G=0) = 1
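
The conditional probabilities are each joint cell divided by its row total. A minimal Python sketch of that calculation on the color blindness table; the function name `conditional_b_given_g` is an illustrative choice.

```python
# Joint probabilities from the table, keyed by (G, B)
joint = {(0, 0): 0.475, (0, 1): 0.025,
         (1, 0): 0.4975, (1, 1): 0.0025}

def conditional_b_given_g(joint, g):
    """Prob(B = b | G = g) = Prob(B = b, G = g) / Prob(G = g) for b = 0, 1."""
    p_g = sum(p for (gg, _), p in joint.items() if gg == g)
    return {b: joint[(g, b)] / p_g for b in (0, 1)}

given_male = conditional_b_given_g(joint, 0)
given_female = conditional_b_given_g(joint, 1)
print(given_male, given_female)
```

Each conditional distribution sums to 1 over b, which mirrors the last line of the slide.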


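Conditional Distribution
Continuous Normal

\[
f(y \mid x) = \frac{f(y,x)}{f(x)}
\]
(x, y) distributed as bivariate normal:
\[
f(x,y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1 - \rho^2}}
\exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left( \frac{x - \mu_x}{\sigma_x} \right)^2
- 2\rho \left( \frac{x - \mu_x}{\sigma_x} \right) \left( \frac{y - \mu_y}{\sigma_y} \right)
+ \left( \frac{y - \mu_y}{\sigma_y} \right)^2 \right] \right\}
\]
\[
f(x) = \frac{1}{\sigma_x \sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu_x}{\sigma_x} \right)^2 \right]
\]
\[
f(y \mid x) = \frac{1}{\sigma_y \sqrt{2\pi} \sqrt{1 - \rho^2}}
\exp\left\{ -\frac{1}{2} \left[ \frac{(y - \mu_y) - \frac{\rho \sigma_y}{\sigma_x}(x - \mu_x)}{\sigma_y \sqrt{1 - \rho^2}} \right]^2 \right\}
= \frac{1}{\sigma_{y|x} \sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{y - \mu_{y|x}}{\sigma_{y|x}} \right)^2 \right]
\]
where $\mu_{y|x} = \mu_y + \frac{\rho \sigma_y}{\sigma_x}(x - \mu_x)$ and $\sigma_{y|x} = \sigma_y \sqrt{1 - \rho^2}$.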


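Bivariate Normal

Joint distribution is bivariate normal
Marginal distributions are normal
Conditional distributions are normal
\[
\mu_{y|x} = \mu_Y + \frac{\rho \sigma_y}{\sigma_x} (x - \mu_x)
\]
= the conditional mean function
= the regression function of y on x.
Note: $\mu_{y|x} = \alpha + \beta x$, with $\beta = \rho \sigma_y / \sigma_x$ and $\alpha = \mu_Y - \beta \mu_x$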
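
Expanding the conditional mean gives an intercept and slope, β = ρσ_y/σ_x and α = μ_y - βμ_x, and the conditional standard deviation of a bivariate normal is σ_y√(1 - ρ²). The sketch below computes these quantities. The function name and the parameter values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from math import sqrt

def conditional_normal(mu_x, mu_y, sigma_x, sigma_y, rho):
    """Return (alpha, beta, sigma_y_given_x) for y | x in a bivariate normal."""
    beta = rho * sigma_y / sigma_x            # slope of the regression of y on x
    alpha = mu_y - beta * mu_x                # intercept
    sigma_y_given_x = sigma_y * sqrt(1.0 - rho * rho)
    return alpha, beta, sigma_y_given_x

# Hypothetical parameter values, for illustration only
alpha, beta, s = conditional_normal(mu_x=10.0, mu_y=5.0, sigma_x=2.0, sigma_y=4.0, rho=0.5)
mean_at_12 = alpha + beta * 12.0   # conditional mean of y at x = 12
print(alpha, beta, s, mean_at_12)
```

Note that the conditional standard deviation does not depend on x, while the conditional mean is linear in x.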


[Figure: Y and Y|X]


Model Building


Typically f(y|x) is of interest.
x is generated by a separate process, f(x).
Joint distribution is f(y,x) = f(y|x)f(x).
Ex: demographic
  y = log(household income|family size)
  x = family size
  y|x ~ Normal(μy|x, σy|x)
  x ~ Poisson(λ)
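
For the demographic example, the factorization of the joint distribution can be written out term by term. This is only a sketch under stated assumptions: the symbols \mu_{y|x} and \sigma_{y|x} (conditional mean and standard deviation of y given x) and \lambda (Poisson mean) are notation assumed here, and reading the second Normal argument as a standard deviation rather than a variance is also an assumption.

```latex
f(y, x) \;=\; f(y \mid x)\, f(x)
\;=\; \frac{1}{\sigma_{y|x}\sqrt{2\pi}}
      \exp\!\left[-\frac{\left(y-\mu_{y|x}\right)^{2}}{2\,\sigma_{y|x}^{2}}\right]
      \cdot \frac{e^{-\lambda}\,\lambda^{x}}{x!},
\qquad x = 0, 1, 2, \ldots
```

The first factor is the Normal density of y given x, and the second is the Poisson probability of x, so the joint distribution mixes a continuous variable with a discrete one.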


108/108

[Figure: panels labeled X=4, X=3, X=2, X=1]

y|x ~ Normal[20 + 3x, 42], x = 1,2,3,4; x ~ Poisson
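
The two-step structure of this model (draw x from f(x), then draw y from f(y|x)) can be illustrated with a small simulation. This is a minimal sketch, not the slide's own computation, and it rests on several assumptions: the second Normal argument, printed as 42, is read as 4 squared (standard deviation 4); the Poisson mean is not given on the slide, so `lam=2.0` is a placeholder; and the Poisson probabilities are renormalized over x = 1,2,3,4. The function names are hypothetical.

```python
import math
import random
from statistics import fmean

SUPPORT = [1, 2, 3, 4]  # values of x shown on the slide


def truncated_poisson_pmf(lam):
    """Poisson(lam) probabilities restricted to SUPPORT and renormalized (assumption)."""
    raw = [math.exp(-lam) * lam**k / math.factorial(k) for k in SUPPORT]
    total = sum(raw)
    return [p / total for p in raw]


def simulate_joint(n, lam=2.0, sd=4.0, seed=0):
    """Draw n pairs (x, y): first x from f(x), then y from f(y|x).

    lam is a placeholder value; sd=4.0 assumes the slide's '42' means 4 squared.
    """
    rng = random.Random(seed)
    xs = rng.choices(SUPPORT, weights=truncated_poisson_pmf(lam), k=n)
    # Conditional mean 20 + 3x is taken from the slide.
    ys = [rng.gauss(20.0 + 3.0 * x, sd) for x in xs]
    return xs, ys


def conditional_means(xs, ys):
    """Sample mean of y within each value of x."""
    return {k: fmean(y for x, y in zip(xs, ys) if x == k) for k in SUPPORT}


if __name__ == "__main__":
    xs, ys = simulate_joint(100_000)
    for k, m in conditional_means(xs, ys).items():
        print(f"x={k}: sample mean of y = {m:.2f}, model mean = {20 + 3 * k}")
```

With a large sample, the within-group means of y should sit close to 20 + 3x, while the pooled sample of y mixes the four conditional distributions with weights given by f(x).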
