Topics Outline
Probability, Probability Distributions, Describing Distributions
Decision Analysis
Statistical Inference
Simple Linear Regression
Terms and Concepts
1. Probability of events
2. Probability rules
4. Conditional probability
5. Independent events
19. Histogram
, variance
of a
known)
unknown)
37. Likelihoods
41. EVPI (Expected value of perfect information)
42. EVSI (Expected value of sample information)
Example 1
Cooper realty is a small real estate company located in Albany, New York, specializing
primarily in residential listings. They recently became interested in determining the
likelihood of one of their listings being sold within a certain number of days.
An analysis of company sales of 800 homes in previous years produced the following data.
Initial Asking Price      Total
Under $150,000              100
$150,000 - $199,999         250
$200,000 - $250,000         400
Over $250,000                50
Total                       800
(a) If A is defined as the event that a home is listed for more than 90 days before being sold,
estimate the probability of A.
The joint probability table is:
Initial Asking Price      Total
Under $150,000            0.125
$150,000 - $199,999       0.3125
$200,000 - $250,000       0.5
Over $250,000             0.0625
Total                     1
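Each entry in the Total column is the corresponding row count divided by the 800 homes sold. A minimal Python sketch of that step, using only the row totals shown above (the dictionary layout is just one convenient choice, not part of the original analysis):

```python
# Number of homes sold in each initial-asking-price category (row totals)
counts = {
    "Under $150,000": 100,
    "$150,000 - $199,999": 250,
    "$200,000 - $250,000": 400,
    "Over $250,000": 50,
}

total = sum(counts.values())  # 800 homes in all

# Marginal probability of each price category: row total / grand total
marginals = {price: n / total for price, n in counts.items()}

for price, p in marginals.items():
    print(f"{price:<22}{p:.4f}")
```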
B?
or
P(A
(d) Assuming that a contract just signed to list a home has an initial asking price of less than
$150,000, what is the probability that the home will take Cooper realty more than 90 days to sell?
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
P(A)P(B) = 0.0313
Example 2
Nine percent of undergraduate students carry credit card balances greater than $7000.
Suppose 10 undergraduate students are selected randomly to be interviewed about credit
card usage.
(a) Is the selection of 10 students a binomial experiment? Explain.
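Yes. Each selected student either has a balance greater than $7000 (success) or does not (failure), and the number of trials is fixed at n = 10. Since the students are selected randomly, p is the same from trial to trial and the trials
are independent. We have a binomial experiment with n = 10 and p = 0.09.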
$$f(x) = \frac{10!}{x!\,(10 - x)!}\,(0.09)^x\,(0.91)^{10 - x}$$
(b) What is the probability that two of the students will have a credit card balance greater
than $7000?
$$P(X = 2) = f(2) = \binom{10}{2}(0.09)^2(1 - 0.09)^{10 - 2} = \frac{10!}{2!\,(10 - 2)!}(0.09)^2(0.91)^8$$
$$= (45)(0.0081)(0.4703) = 0.1714$$
(c) How many students would you expect to have a credit card balance greater than $7000
in the sample of 10 students?
E(X) = np = (10)(0.09) = 0.9
(e) What is the standard deviation of the number of students with credit card balances
greater than $7000?
$$\sigma = \sqrt{np(1 - p)} = \sqrt{(10)(0.09)(0.91)} = \sqrt{0.819} = 0.905$$
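The binomial calculations in parts (b), (c), and (e) can be checked with a short Python sketch. It assumes only the standard library (`math.comb` needs Python 3.8 or later); the helper name `binom_pmf` is illustrative, not from the original:

```python
from math import comb, sqrt

n, p = 10, 0.09  # 10 students; 9% carry balances over $7000


def binom_pmf(x: int) -> float:
    """P(X = x) for a binomial(n, p) random variable."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


prob_two = binom_pmf(2)        # (b) about 0.1714
mean = n * p                   # (c) E(X) = np = 0.9
sd = sqrt(n * p * (1 - p))     # (e) sqrt(np(1 - p)), about 0.905

print(f"P(X = 2) = {prob_two:.4f}")
print(f"E(X) = {mean:.1f}, sd = {sd:.3f}")
```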
Example 3
The time customers spend in a record store is uniformly distributed between 3 and 12
minutes.
(a) What is the probability that a customer will spend less than 5 minutes in the store?
The density is f(x) = 1/(12 − 3) = 1/9 for 3 ≤ x ≤ 12, so P(X < 5) = (5 − 3)(1/9) = 2/9 = 0.2222
(b) What is the probability that a customer will spend exactly 5 minutes in the store?
P(X = 5) = 0
(c) What is the probability that a customer will spend between 5 and 15 minutes in the store?
Since no customer spends more than 12 minutes, P(5 ≤ X ≤ 15) = P(5 ≤ X ≤ 12) = (12 − 5)(1/9) = 7/9 = 0.7778
(d) Determine the expected time customers spend in the store.
$$E(X) = \frac{a + b}{2} = \frac{3 + 12}{2} = \frac{15}{2} = 7.50$$
(e) Compute the standard deviation for the time customers spend in the store.
$$\sigma^2 = \frac{(a - b)^2}{12} = \frac{(3 - 12)^2}{12} = \frac{9^2}{12} = \frac{81}{12} = 6.75$$
$$\sigma = \sqrt{6.75} = 2.5981$$
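A small Python sketch of the same uniform-distribution calculations. The helper `prob_between` is a hypothetical name; it clips the interval to [3, 12] before multiplying by the density 1/9, which is why the probability for 5 to 15 minutes equals the probability for 5 to 12 minutes:

```python
from math import sqrt

a, b = 3.0, 12.0        # minutes: X ~ Uniform(3, 12)
density = 1 / (b - a)   # f(x) = 1/9 on [3, 12], 0 elsewhere


def prob_between(lo: float, hi: float) -> float:
    """P(lo <= X <= hi) for the uniform distribution on [a, b]."""
    lo, hi = max(lo, a), min(hi, b)   # outside [a, b] the density is 0
    return max(hi - lo, 0.0) * density


p_less_5 = prob_between(a, 5)       # (a) 2/9
p_exactly_5 = prob_between(5, 5)    # (b) a single point has probability 0
p_5_to_15 = prob_between(5, 15)     # (c) 7/9
mean = (a + b) / 2                  # (d) 7.5 minutes
variance = (b - a) ** 2 / 12        # 6.75
sd = sqrt(variance)                 # (e) about 2.5981 minutes

print(f"{p_less_5:.4f} {p_exactly_5} {p_5_to_15:.4f} {mean} {sd:.4f}")
```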
Example 4
The scores of adults on an IQ test are approximately normal with mean 100 and standard
deviation 15.
μ = 100, σ = 15
(a) Corinne scores 118 on such a test. She scores higher than what percent of all adults?
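$$z = \frac{118 - 100}{15} = 1.2$$
The area to the left of z = 1.2 is 0.8849, so Corinne scores higher than about 88.49% of all adults.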
$$z = \frac{130 - 100}{15} = 2$$
$$z_{90} = \frac{90 - 100}{15} = -0.67 \qquad z_{120} = \frac{120 - 100}{15} = 1.33$$
area between 90 and 120 = (area to the left of 1.33) − (area to the left of −0.67)
= 0.9082 − 0.2514
= 0.6568
65.68% of all adults score between 90 and 120.
(d) What percent of all adults score within one standard deviation of the mean?
The area within one standard deviation from the mean of a normal distribution with
mean μ and standard deviation σ is equal to the area between −1 and +1 on the
standard normal distribution.
area between 85 and 115 = (area to the left of z = +1) − (area to the left of z = −1)
= 0.8413 − 0.1587
= 0.6826
68.26% of all adults score within one standard deviation from the mean.
(e) What percent of all adults score within two standard deviations from the mean?
area between 70 and 130 = (area to the left of z = +2) − (area to the left of z = −2)
= 0.9772 − 0.0228
= 0.9544
95.44% of all adults score within two standard deviations from the mean.
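The areas in parts (a), (c), (d), and (e) can be cross-checked with Python's standard-library `statistics.NormalDist` (Python 3.8+). This is a sketch, not the method used above: it evaluates the exact normal CDF instead of rounding z to two decimals, so its results may differ from the table-based answers in the third or fourth decimal place (for example, about 0.656 rather than 0.6568 for part (c)).

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # IQ scores ~ N(100, 15)

below_118 = iq.cdf(118)                       # (a) share of adults scoring below 118
between_90_120 = iq.cdf(120) - iq.cdf(90)     # (c)
within_1_sd = iq.cdf(115) - iq.cdf(85)        # (d) within one sd of the mean
within_2_sd = iq.cdf(130) - iq.cdf(70)        # (e) within two sd of the mean

for label, area in [("below 118", below_118), ("90 to 120", between_90_120),
                    ("85 to 115", within_1_sd), ("70 to 130", within_2_sd)]:
    print(f"{label}: {area:.4f}")
```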
(f) What IQ scores would place Corinne in the bottom 30% of all adults?
The area of 0.3015 is to the left of z = −0.52.
Solving
$$-0.52 = \frac{x - 100}{15}$$
gives x = 92.20 ≈ 92.
Scores below 92 would place Corinne in the bottom 30% of all adults.
(g) How well must Corinne do in order to place in the top 20% of all adults?
The cut-off point for top 20% is equal to the cut-off point for bottom 80%.
The area of 0.7995 is to the left of z = 0.84.
Solving
$$0.84 = \frac{x - 100}{15}$$
gives x = 112.6 ≈ 113.
Corinne must score 113 or better to place in the top 20% of all adults.
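Parts (f) and (g) run the table lookup in reverse. A hedged sketch using `NormalDist.inv_cdf` from the Python standard library; it returns the exact percentiles rather than using the rounded table z-values (−0.52 and 0.84), so the cutoffs come out near 92.1 and 112.6 and round to the same whole scores:

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

bottom_30_cutoff = iq.inv_cdf(0.30)   # (f) score with 30% of adults below it
top_20_cutoff = iq.inv_cdf(0.80)      # (g) top 20% starts at the 80th percentile

print(f"bottom 30%: below {bottom_30_cutoff:.2f}")
print(f"top 20%: at least {top_20_cutoff:.2f}")
```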
Example 5
Marketing a New Product at Acme (See Acme_MarketingDecisions.xlsx)
The Acme Company is trying to decide whether to market a new product. Acme believes that it
might be wise to introduce the product in a regional test market before introducing it nationally.
Acme estimates that the net cost of the test market is $100,000. Based on the results of the test
market, it can then decide whether to market the product nationally, in which case it will incur a fixed
cost of $7 million. Acme's unit margin (the difference between the anticipated selling price and the
known unit cost of the product) is $18. We assume this is relevant only for the national market.
Acme classifies the results in either the test market or the national market as great, fair, or awful.
Let NG, NF, and NA represent great, fair, and awful national-market results, respectively,
and TG, TF, and TA represent similar events for the test market.
In the absence of any test market information, Acme estimates that the probabilities of the three
national market outcomes are 0.45, 0.35, and 0.20, respectively. Each of the results in the
national market is accompanied by a forecast of total units sold. These sales volumes (in
1000s of units) are 600 (great), 300 (fair), and 90 (awful). In addition, Acme has the following
historical data from products that were introduced into both test markets and national markets.
Of the products that eventually did great in the national market, 64% did great in the test market,
26% did fair in the test market, and 10% did awful in the test market.
Of the products that eventually did fair in the national market, 18% did great in the test market,
57% did fair in the test market, and 25% did awful in the test market.
Of the products that eventually did awful in the national market, 9% did great in the test market,
48% did fair in the test market, and 43% did awful in the test market.
(a) What are Acme's possible strategies?
Acme must first decide whether to run a test market.
Then it must decide whether to introduce the product nationally.
(b) What are the prior probabilities of national-market results?
P(NG) = 0.45
P(NF) = 0.35
P(NA) = 0.20
(c) What are the likelihoods of fair test-market results, given national-market results?
According to the historical percentages,
P(TF|NG) = 0.26
P(TF|NF) = 0.57
P(TF|NA) = 0.48
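(e) What is the probability of a great national result, given a fair test result?
Using Bayes' rule gives
$$P(NG \mid TF) = \frac{P(TF \mid NG)P(NG)}{P(TF \mid NG)P(NG) + P(TF \mid NF)P(NF) + P(TF \mid NA)P(NA)} = \frac{(0.26)(0.45)}{(0.26)(0.45) + (0.57)(0.35) + (0.48)(0.20)} = \frac{0.117}{0.4125} = 0.2836$$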
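The same Bayes' rule computation as a minimal Python sketch, using the priors from part (b) and the fair-test likelihoods from part (c). It also produces the posteriors for fair and awful national results; the dictionary keys simply reuse the NG/NF/NA labels from the text:

```python
# Prior probabilities of the national-market results
prior = {"NG": 0.45, "NF": 0.35, "NA": 0.20}
# Likelihoods of a fair test-market result, given each national result
lik_tf = {"NG": 0.26, "NF": 0.57, "NA": 0.48}

# Law of total probability: P(TF)
p_tf = sum(prior[k] * lik_tf[k] for k in prior)

# Bayes' rule: P(national result | TF) for each national result
posterior = {k: prior[k] * lik_tf[k] / p_tf for k in prior}

print(f"P(TF) = {p_tf:.4f}")
for k, v in posterior.items():
    print(f"P({k} | TF) = {v:.4f}")
```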
To interpret this tree, recall that each value just below each node name is an EMV.
(These are colored red or green in Excel.)
For example, the 796.76 in cell B41 is the EMV for the entire decision problem.
It means that Acme's best EMV from acting optimally is $796,760.
As another example, the 74 in cell D35 means that if Acme ever gets to that point
(no test market, with the product marketed nationally), the EMV is $74,000.
Actually, this is the expected selling profit minus the $7 million fixed cost, so the expected
selling profit, given that no information from a test market has been obtained, is $7,074,000.
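The $74,000 and $7,074,000 figures follow from the prior probabilities and the sales forecasts (600, 300, and 90 thousand units). A short sketch of that arithmetic, assuming the $18 unit margin and $7 million fixed cost stated at the start of the example:

```python
prior = {"NG": 0.45, "NF": 0.35, "NA": 0.20}          # P(great), P(fair), P(awful)
units = {"NG": 600_000, "NF": 300_000, "NA": 90_000}  # forecast national sales
unit_margin = 18           # $ per unit sold
fixed_cost = 7_000_000     # $ to market nationally

expected_units = sum(prior[k] * units[k] for k in prior)      # 393,000 units
expected_selling_profit = unit_margin * expected_units         # $7,074,000
emv_market_nationally = expected_selling_profit - fixed_cost   # $74,000

# Without test-market information, Acme compares marketing nationally with abandoning ($0)
emv_no_info = max(emv_market_nationally, 0)
print(f"EMV with no information: ${emv_no_info:,.0f}")
```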
(h) Use the decision tree to find Acme's optimal strategy.
Acme's optimal strategy is apparent by following the TRUE branches from left to right: run the test market, then market nationally only if the test-market result is great.
[Figure: Expected Value versus unit margin ($5 to $30) for the Yes (test market) and No (no test market) decisions; a second panel plots Probability from 0% to 70%.]
Similarly, for large unit margins, it is also best not to run a test market.
Again, the top line is 100 (that is, $100,000) above the bottom line. However, the reasoning now is different.
For large unit margins, the company should market nationally regardless of test-market
results, so there is no reason to spend money on a test market.
Finally, for intermediate unit margins, as in the original model, the chart shows that it
is best to run a test market.
(l) Currently, the fixed cost of the test market is $100,000. How much is this test market really worth?
In other words, what is the EVSI (expected value of sample information)?
EVSI = EMV with (free) sample information − EMV without information
The EMV from test marketing is $796,760.
This EMV is already net of the $100,000 cost of the test market.
Therefore, if this test market were free, the expected profit would be $896,760.
On the other hand, the EMV from not running a test market is $74,000 (see cell C31 in the tree).
The difference is EVSI:
EVSI = $896,760 − $74,000 = $822,760
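The EVSI arithmetic as a tiny Python sketch; the three dollar amounts are taken directly from the tree values quoted above rather than recomputed:

```python
emv_with_test = 796_760     # EMV of the optimal strategy (tree), net of the test cost
test_cost = 100_000         # cost of running the test market
emv_without_info = 74_000   # EMV with no test market (market nationally)

emv_with_free_test = emv_with_test + test_cost   # $896,760
evsi = emv_with_free_test - emv_without_info     # $822,760

print(f"EVSI = ${evsi:,}")
print("Test market worth more than it costs:", evsi > test_cost)
```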
Intuitively, running the test market is worth something because it changes the optimal decision.
With no test-market information, the best decision is to market nationally (see the top part of the tree).
However, with the test-market information, the ultimate decision depends on the test-market results.
Specifically, Acme should market nationally only if the test-market result is great.
This is what makes information worth something: its outcome affects the optimal decision.
(m) In general, Acme might have many sources of information that would help it make its
national decision; the test market is just one of them. How much could such information be worth?
This is answered by EVPI, the expected value of perfect information. Imagine that Acme
could purchase an envelope that has the true national-market result (great, fair, or awful)
written inside. EVPI is what this envelope is worth.
If the envelope reveals that the national market result will be great, then Acme will have a
profit of $3,800,000 (600,000 units sold times $18 per unit, minus the fixed cost of $7 million).
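If the contents of the envelope reveal that the national market will be fair or awful, Acme
should abandon the product right away (that is, the profit will be $0), because marketing nationally
would lose money: 300,000 × $18 − $7,000,000 = −$1,600,000 for a fair result and
90,000 × $18 − $7,000,000 = −$5,380,000 for an awful one.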
The probabilities for great, fair, and awful national market are 0.45, 0.35, and 0.20, respectively.
Therefore, if the envelope (perfect information) is free
EMV with (free) perfect information = 0.45($3,800,000) + 0.35($0) + 0.20($0) = $1,710,000
If there is no information, the EMV is $74,000. Therefore,
EVPI = EMV with (free) perfect information − EMV without information
= $1,710,000 − $74,000 = $1,636,000
No sample information, test market or otherwise, could possibly be worth more than this.
So if some hotshot market analyst offers to provide extremely reliable market information
to Acme for, say, $1.8 million, Acme knows this information cannot be worth its cost.
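A sketch of the EVPI calculation that recomputes both EMVs from the problem data instead of reading them off the tree. The `max(..., 0)` encodes the assumption, stated above, that Acme abandons the product (profit $0) whenever marketing nationally would lose money:

```python
prior = {"NG": 0.45, "NF": 0.35, "NA": 0.20}
units = {"NG": 600_000, "NF": 300_000, "NA": 90_000}
unit_margin = 18
fixed_cost = 7_000_000

# Profit of marketing nationally under each national result
profit_if_marketed = {k: unit_margin * units[k] - fixed_cost for k in prior}

# With the envelope: market only when the revealed result is profitable
profit_if_known = {k: max(profit_if_marketed[k], 0) for k in prior}
emv_perfect_info = sum(prior[k] * profit_if_known[k] for k in prior)   # $1,710,000

# Without information: one decision for all outcomes (market, or abandon for $0)
emv_no_info = max(sum(prior[k] * profit_if_marketed[k] for k in prior), 0)  # $74,000

evpi = emv_perfect_info - emv_no_info   # $1,636,000
print(f"EVPI = ${evpi:,.0f}")
```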
Example 6
To estimate the mean height μ of male students on your campus, you measure a simple random
sample of 25 students. You know from government data that the heights of young men vary
according to the normal distribution with mean μ = 70 inches and standard deviation σ = 2.8 inches.
(a) If you choose one student at random, what is the probability that he is between 69 and 71 inches tall?
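For the height X of an individual student,
$$P(69 < X < 71) = P\left(\frac{69 - 70}{2.8} < Z < \frac{71 - 70}{2.8}\right) = P(-0.36 < Z < 0.36) = 0.6406 - 0.3594 = 0.2812$$
For the mean height x̄ of a simple random sample of 25 students, the standard deviation is σ/√n = 2.8/√25 = 0.56, so
$$P(69 < \bar{x} < 71) = P\left(\frac{69 - 70}{2.8/\sqrt{25}} < Z < \frac{71 - 70}{2.8/\sqrt{25}}\right) = P(-1.79 < Z < 1.79) = 0.9633 - 0.0367 = 0.9266$$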
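A hedged check of both probabilities with `statistics.NormalDist`; the second distribution uses the standard deviation σ/√n = 0.56 of the sample mean. Because the exact CDF does not round z to two decimals, the results come out near 0.279 and 0.926, slightly below the table values 0.2812 and 0.9266:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 70.0, 2.8, 25   # inches; sample size

height = NormalDist(mu, sigma)                  # one randomly chosen student
mean_height = NormalDist(mu, sigma / sqrt(n))   # sampling distribution of x-bar (sd 0.56)

p_one = height.cdf(71) - height.cdf(69)
p_mean = mean_height.cdf(71) - mean_height.cdf(69)

print(f"P(69 < X < 71)     = {p_one:.4f}")
print(f"P(69 < x-bar < 71) = {p_mean:.4f}")
```

The comparison shows why averaging helps: the sample mean is far more likely than a single height to land within an inch of μ.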
Example 7
A manager of an insurance company wanted to see how well one of his sales
representatives was doing, so he randomly selected 30 matured policies that had been
sold by the sales rep and computed the net profit (premium charged minus paid claims),
for each of the 30 policies:
Profit (in $) from 30 policies
 222.80    463.35   2089.40   1756.23    -66.20
2692.75   1100.85     57.90   2495.70   3340.66
 833.95   2172.70   1006.50   1390.70   3249.65
 445.50   2447.50   -397.10   3255.60   1847.50
-397.31   3701.85    865.40    186.25   -803.35
1415.65    590.85   3865.90   2756.94    578.95
(a) Are the necessary conditions for statistical inference satisfied?
The sample was selected randomly from the matured policies sold by the sales representative.
The sample appears to be unimodal and fairly symmetric without strong skewness or outliers.
The sample size is fairly large, so using the t distribution with df = n − 1 = 30 − 1 = 29 is safe.
[Figure: histogram of Profit, from about −$500 to $3,500, with Count on the vertical axis.]
(b) Construct a 95% confidence interval for the mean profit of policies sold by the sales rep.
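We calculate the sample mean x̄ = $1,438.90 and the sample standard deviation s = $1,329.60.
The critical value for the t distribution with df = n − 1 = 30 − 1 = 29 (from Table C) for 95%
confidence is t* = 2.045.
$$\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 1438.90 \pm 2.045 \times \frac{1329.60}{\sqrt{30}} = 1438.90 \pm 496.42$$
= $942.48 to $1,935.32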
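The interval can be reproduced from the raw data with the standard-library `statistics` module. This sketch hard-codes the table value t* = 2.045 (df = 29) because the standard library has no t quantile function; `statistics.stdev` uses the n − 1 divisor, matching s above:

```python
from math import sqrt
from statistics import mean, stdev

# Net profit ($) of the 30 sampled policies
profits = [
    222.80, 463.35, 2089.40, 1756.23, -66.20,
    2692.75, 1100.85, 57.90, 2495.70, 3340.66,
    833.95, 2172.70, 1006.50, 1390.70, 3249.65,
    445.50, 2447.50, -397.10, 3255.60, 1847.50,
    -397.31, 3701.85, 865.40, 186.25, -803.35,
    1415.65, 590.85, 3865.90, 2756.94, 578.95,
]

n = len(profits)
x_bar = mean(profits)
s = stdev(profits)          # sample standard deviation (divides by n - 1)
t_star = 2.045              # t critical value for 95% confidence, df = 29 (Table C)

margin = t_star * s / sqrt(n)
ci = (x_bar - margin, x_bar + margin)
print(f"x-bar = {x_bar:.2f}, s = {s:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```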
(c) Interpret the confidence interval in the proper context.
From our analysis of the selected policies, we are 95% confident that the true mean profit of
all policies sold by this sales rep is contained in the interval from $942.48 to $1,935.32.
Note: Insurance losses are notoriously subject to outliers. One very large loss could influence
the average profit substantially. However, there were no such cases in this data set.
(d) Is there evidence that the mean profit of policies sold by this sales representative is less than
$1,500?
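To test the hypotheses
H0: μ = 1500
Ha: μ < 1500
we compute the test statistic
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{1438.90 - 1500}{1329.60/\sqrt{30}} = \frac{-61.10}{242.75} = -0.2517$$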
The P-value is the probability of observing a sample mean as small as $1,438.90 (or smaller)
if the true mean were $1,500, as the null hypothesis states.
Using the t table with df = 29, we find P-value > 0.25.
If the mean were $1,500, we would expect a sample of size 30 to have a mean this low more
than 25% of the time. Therefore, the result we obtained from the sampled policies is not
surprising, and we conclude that there is not enough evidence in this sample of policies to
indicate that the true mean is below $1,500.
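The test statistic can be reproduced from the summary statistics in a few lines. This sketch stops at t because the standard library has no t distribution; if SciPy is installed, `scipy.stats.t.cdf(t_stat, df=n - 1)` would give the one-sided P-value directly (an optional dependency, not part of the original analysis):

```python
from math import sqrt

x_bar, s, n = 1438.90, 1329.60, 30   # summary statistics from part (b)
mu_0 = 1500.0                         # mean under H0

se = s / sqrt(n)                      # standard error, about 242.75
t_stat = (x_bar - mu_0) / se          # about -0.2517
df = n - 1

print(f"t = {t_stat:.4f} with df = {df}")
```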
Example 8
European GDP growth
Is economic growth in Europe related to growth in the United States? Here's a regression
output for the average growth in 25 European countries (in % of Gross Domestic Product)
versus the growth in the United States. Each point represents one of the years from 1970 to 2007.
[Figure: scatterplot of European growth versus U.S. growth (%), with fitted line y = 0.3616x + 1.3297 and R² = 0.2965.]
(a) Describe the relationship between the economic growth in the United States and the
economic growth in Europe.
The scatterplot shows a positive linear association, with one or two possible outliers.
The correlation is r = √0.2965 = 0.545, indicating a moderate linear relationship.
(b) Economists speculate that the growth rate of the United States can help predict the growth
rate of the 25 European countries. Do you think the data confirm the economists' speculation?
r² = 0.2965
About 30% of the variation in the growth rates of the 25 European countries is accounted for
by the growth rates of the United States.
The growth rates of the United States can be used to give (very rough) estimates of the growth
rates of the 25 European countries. The spread is so wide that the estimates would not be
very reliable.
(c) In 2007, the United States experienced a 3.20% growth, while Europe grew at a rate of 2.16%.
Is this more or less than you would have predicted?
The predicted value of y at an x value of 3.2% is:
Predicted Growth(25 European Countries) = 1.330 + 0.3616 × Growth(United States)
= 1.330 + 0.3616(3.2)
= 2.48712%, or about 2.49%
The predicted value from the linear model is higher than the actual percentage.
Europe grew less than expected:
residual = observed y − predicted y = 2.16 − 2.49 = −0.33%
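A quick Python check of the correlation, prediction, and residual. It uses the unrounded intercept 1.3297 from the regression output, so the prediction is about 2.487 rather than 2.48712; both round to 2.49% and give a residual of about −0.33%. The correlation is taken as the positive square root of R² because the slope is positive:

```python
from math import sqrt

b0, b1 = 1.3297, 0.3616   # intercept and slope from the regression output
r_squared = 0.2965

r = sqrt(r_squared)        # positive root, since the slope is positive (about 0.545)

x_2007, y_2007 = 3.20, 2.16       # U.S. and European growth in 2007 (%)
y_hat = b0 + b1 * x_2007          # predicted European growth, about 2.49%
residual = y_2007 - y_hat         # observed minus predicted, about -0.33%

print(f"r = {r:.3f}, predicted = {y_hat:.2f}%, residual = {residual:.2f}%")
```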
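(d) Would your prediction be better if the outlier (x, y) = (0.2167, 4.3748) had been removed?
After removing the outlier, the summary statistics are
x̄ = 3.1487, ȳ = 2.3882, s_x = 2.0174, s_y = 1.3382, r = 0.6347
The slope is
$$b_1 = r\,\frac{s_y}{s_x} = 0.6347 \times \frac{1.3382}{2.0174} = (0.6347)(0.6633) = 0.421$$
The intercept is
$$b_0 = \bar{y} - b_1\bar{x} = 2.3882 - 0.421(3.1487) = 1.0626$$
so the new regression line is ŷ = 1.0626 + 0.421x.
At x = 3.2%, the new prediction is ŷ = 1.0626 + 0.421(3.2) = 2.41%, which is closer to the observed 2.16% (residual = 2.16 − 2.41 = −0.25%), so the prediction is better.
(e) How influential is the outlier for the correlation?
The correlation rises from 0.545 to 0.635.
Removing the outlier makes the linear association stronger and so moves r closer to 1.
(f) How influential is the outlier for the coefficient of determination?
The new coefficient of determination is r² = (0.6347)² = 0.4028, or 40.28%, up from 29.65% with the outlier included.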
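The refitted line, the new prediction, and the new coefficient of determination can be recomputed from the summary statistics with the standard formulas b1 = r·sy/sx and b0 = ȳ − b1·x̄. A minimal sketch:

```python
x_bar, y_bar = 3.1487, 2.3882   # means after removing the outlier
s_x, s_y = 2.0174, 1.3382       # standard deviations after removing the outlier
r = 0.6347                      # correlation after removing the outlier

b1 = r * s_y / s_x              # slope, about 0.421
b0 = y_bar - b1 * x_bar         # intercept, about 1.0626
y_hat_2007 = b0 + b1 * 3.20     # new prediction for 2007, about 2.41%
r_squared = r ** 2              # about 0.4028

print(f"y-hat = {b0:.4f} + {b1:.3f}x; prediction = {y_hat_2007:.2f}%; r^2 = {r_squared:.4f}")
```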