
Using Real Data

For Decision Analysis

Decision Theory & Analysis


David M. Dilts
Which class would you take?
Next semester, two professors will be teaching
Decision Theory.

VS.

Dr. Pieceofcake
Dr. Uragonnafail
• According to RateMyProfessor.com, these are the grades posted for each professor over the past five years:
Dr. Pieceofcake: mean = 1770.16 (out of 2000 points)
Dr. Uragonnafail: mean = 1750.82

Which professor would you sign up for?


Finding the Distribution

Dr. Pieceofcake: mean = 1770.16 (out of 2000 points) vs. Dr. Uragonnafail: mean = 1750.82

[Figure: two histograms with x-axis values in thousands. Dr. Pieceofcake GPA distribution: 0.2% of values at X <= 1750.00 and 99.7% at X <= 1790.00. Dr. Uragonnafail GPA distribution: 0.1% of values at X <= 1550.0 and 99.9% at X <= 1950.0.]

After a quick graphical analysis, you see that both distributions are approximately normal, with roughly the same shape.

Which professor would you still sign up for?


Standardizing the axis
[Figure: Dr. Pieceofcake GPA distribution and Dr. Uragonnafail GPA distribution, both plotted on the same x-axis from 1.5 to 1.955 (values in thousands)]

Dr. Pieceofcake: mean = 1770.16 (out of 2000 points), var = 7.139664
Dr. Uragonnafail: mean = 1750.82, var = 63.8635

Which professor would you sign up for now?


A matter of perspective
• Which professor would you want if you were a 4.0 student?

• What if you were a slacker?

• Remember: it is not just the central tendency that is important; the dispersion is also critical.
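
The question of perspective can be made concrete with a rough calculation. The sketch below takes the two means from the slides and models each class as a normal distribution. It rests on two assumptions: the spread figures labeled "var" on the slides are treated as standard deviations, and the cutoffs of 1800 and 1700 points are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

# Means come from the slides. The slides label the spread figures "var";
# treating them as standard deviations is an assumption made here.
pieceofcake = NormalDist(mu=1770.16, sigma=7.139664)
uragonnafail = NormalDist(mu=1750.82, sigma=63.8635)

# Hypothetical cutoffs (not from the slides): 1800 points for a top grade,
# 1700 points as the line a struggling student wants to stay above.
TOP_CUTOFF = 1800
LOW_CUTOFF = 1700


def chance_above(dist, cutoff):
    """Probability that a score drawn from dist is at or above cutoff."""
    return 1.0 - dist.cdf(cutoff)


for name, dist in [("Dr. Pieceofcake", pieceofcake),
                   ("Dr. Uragonnafail", uragonnafail)]:
    p_top = chance_above(dist, TOP_CUTOFF)
    p_low = dist.cdf(LOW_CUTOFF)
    print(f"{name}: P(score >= {TOP_CUTOFF}) = {p_top:.4f}, "
          f"P(score < {LOW_CUTOFF}) = {p_low:.4f}")
```

Under these assumptions, both a very high score and a very low score are far more likely with Dr. Uragonnafail, even though the two means are close. The student chasing a top score and the student trying to avoid a low one would read the same data differently.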
There are a variety of probability models
that can be used to help make decisions…
• Binomial distribution
• Poisson distribution
• Normal distribution
• Exponential distribution
• Beta distribution
• etc.
• It is always important to use the correct distribution to explain your data
– BUT
• More importantly, it is essential to always consider the context in which you are making the decision
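
To illustrate why the choice of distribution matters, here is a small sketch comparing two models of the same quantity. The numbers (mean 10, standard deviation 10, threshold 30) are hypothetical. Both models share the same mean and standard deviation, yet they give different answers for the chance of an extreme value.

```python
import math
from statistics import NormalDist

# Hypothetical numbers (not from the slides): a quantity with mean 10
# and standard deviation 10, and a threshold of interest at 30.
MEAN = 10.0
THRESHOLD = 30.0

# Model 1: exponential with mean 10 (its standard deviation is also 10).
# For an exponential distribution, P(X > t) = exp(-t / mean).
p_exponential = math.exp(-THRESHOLD / MEAN)

# Model 2: normal with the same mean and standard deviation.
p_normal = 1.0 - NormalDist(mu=MEAN, sigma=MEAN).cdf(THRESHOLD)

print(f"P(X > {THRESHOLD}) under the exponential model: {p_exponential:.4f}")
print(f"P(X > {THRESHOLD}) under the normal model:      {p_normal:.4f}")
```

In this made-up case the exponential model puts roughly twice as much probability on the extreme outcome as the normal model does, so a decision that hinges on rare events would change with the model.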
Caveats in using real data to make decisions
• Fallacy of Averages
• Assumptions of normality
• Errors in estimations
• Impact of outliers
• Residual Values
Fallacy of averages
[Figure: frequency of developed drugs (y-axis, 0 to 0.45) versus duration of drug development (x-axis, 0 to 20), with the mean / average marked]


Fallacy of Averages
[Figure: frequency of developed drugs (y-axis, 0 to 0.45) versus duration of drug development (x-axis, 0 to 20), with the mean duration of the “typical” process and the mean duration of “complex” processes marked separately]

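A simple example of the fallacy of averages is the height of a mixed population. Men's and women's heights are each approximately normally distributed, but with different means. The combined population is a mixture of the two groups, and its overall average is not the typical value for either one.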
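
A minimal sketch of the same idea, using made-up numbers in the spirit of the drug development chart: assume 70% of projects are "typical" and 30% are "complex", each with roughly normal durations. The weights, means, and spreads below are hypothetical, not values read from the figure.

```python
from statistics import NormalDist

# Hypothetical subgroups (not from the slides): 70% "typical" projects and
# 30% "complex" projects, each with roughly normal durations in years.
groups = [
    (0.7, NormalDist(mu=4.0, sigma=1.0)),   # typical
    (0.3, NormalDist(mu=14.0, sigma=2.0)),  # complex
]

# The overall average is the weighted average of the group means.
overall_mean = sum(weight * dist.mean for weight, dist in groups)


def share_within(center, half_width=1.0):
    """Share of the combined population within half_width of center."""
    return sum(
        weight * (dist.cdf(center + half_width) - dist.cdf(center - half_width))
        for weight, dist in groups
    )


near_overall = share_within(overall_mean)
near_typical = share_within(4.0)

print(f"Overall mean duration: {overall_mean:.1f} years")
print(f"Share within 1 year of the overall mean: {near_overall:.1%}")
print(f"Share within 1 year of the typical mean: {near_typical:.1%}")
```

With these assumed groups, very few projects actually take about as long as the overall average, which is why planning around "the mean" can mislead when the data contain distinct subgroups.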
Assumptions of Normality
• Normal distributions do not always look “bell shaped”; how they appear depends on the spread and on the axis scale
Requirements for a normal distribution:
– symmetrical
– follows the “68, 95, 99.7 rule”

• Not all distributions are normal, and few are perfectly normal.

• Much of decision analysis relies on the assumption of a normal curve to make calculations easier. It is important to understand the limitations of the normal curve when basing your decisions on it.
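
One rough way to check the “68, 95, 99.7 rule” against a dataset is to count how many values fall within 1, 2, and 3 standard deviations of the mean. The sketch below does this for two simulated samples, one normal and one skewed (exponential). The samples, sample size, and seed are assumptions for illustration; with real data you would pass in your own values.

```python
import random
from statistics import fmean, pstdev


def coverage(sample):
    """Share of values within 1, 2, and 3 standard deviations of the mean."""
    mean, sd = fmean(sample), pstdev(sample)
    return [
        sum(abs(x - mean) <= k * sd for x in sample) / len(sample)
        for k in (1, 2, 3)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
skewed_sample = [rng.expovariate(1.0) for _ in range(20_000)]

normal_cov = coverage(normal_sample)
skewed_cov = coverage(skewed_sample)

print("normal sample:", [f"{share:.3f}" for share in normal_cov])
print("skewed sample:", [f"{share:.3f}" for share in skewed_cov])
```

The normal sample should land close to 0.68, 0.95, and 0.997, while the skewed sample should not. This is only a quick screen, not a formal test of normality.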
Errors in Estimations
Some linear regressions force the line through the “0-0” point (meaning: x-intercept = 0 and y-intercept = 0).

When fitting a regression to the data, understand the range over which your estimate is reasonable.

Would it make sense to force the regression line through the “0-0” point in this case?
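
The effect of forcing the line through the origin can be sketched with a small made-up dataset. The values below are hypothetical and cover only x between 10 and 20. The code fits one least-squares line with a free intercept and one forced through “0-0”, then compares their squared errors.

```python
# Hypothetical data (not from the slides): x observed only between 10 and 20.
xs = [10, 12, 14, 16, 18, 20]
ys = [71, 73, 79, 82, 85, 91]


def fit_with_intercept(xs, ys):
    """Ordinary least squares: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_through_origin(xs, ys):
    """Least squares with the intercept fixed at zero: returns the slope."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def sum_squared_error(xs, ys, slope, intercept):
    """Sum of squared residuals for the line y = slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


slope, intercept = fit_with_intercept(xs, ys)
origin_slope = fit_through_origin(xs, ys)

sse_free = sum_squared_error(xs, ys, slope, intercept)
sse_origin = sum_squared_error(xs, ys, origin_slope, 0.0)

print(f"free intercept: y = {slope:.2f}x + {intercept:.2f}, SSE = {sse_free:.1f}")
print(f"through origin: y = {origin_slope:.2f}x, SSE = {sse_origin:.1f}")
```

For data like these, the forced line fits the observed range much worse. Neither line says anything reliable about x near 0, because no observations were taken there.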
Impact of Outliers
• Outliers can bias the regression estimate to accommodate the extreme data point(s).
• What happens if you add Maudie Hopkins (19) and William Cantrell (86) [the last Civil War widow], and Anna Nicole Smith (26) and J. Howard Marshall II (89)? Or, more recently, Demi Moore (42) and Ashton Kutcher (27)?
Impact of Outliers

Outliers can also give a false sense of the correlation between two variables. In a correlation test, the strength of the relationship between the husband’s age and the wife’s age would be incorrectly accentuated. The result would give a misleading impression of how compact the dataset is.
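
A small sketch of how much a few extreme points can move a correlation. The baseline couples below are invented for illustration; the three extra couples use the ages quoted on the slide. The Pearson correlation is computed by hand so the example needs no external library.

```python
import math


def pearson_r(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical baseline couples as (wife age, husband age); not from the slides.
baseline = [(25, 27), (30, 33), (35, 36), (40, 43),
            (45, 46), (50, 53), (55, 56), (60, 63)]

# Couples named on the slide, as (wife age, husband age).
outliers = [(19, 86), (26, 89), (42, 27)]

r_baseline = pearson_r(baseline)
r_with_outliers = pearson_r(baseline + outliers)

print(f"r without the outliers: {r_baseline:.3f}")
print(f"r with the outliers:    {r_with_outliers:.3f}")
```

In this invented dataset, three added points are enough to change the measured relationship dramatically. The direction and size of the change depend on where the outliers sit relative to the rest of the data.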
Residual value

Two datasets can give identical regression estimates. The graph on the right has larger residuals, so there is more error in the fit. In other words, x is a weaker predictor of y.
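
The point can be reproduced with two made-up datasets built to share the same fitted line. In the sketch below, the underlying line y = 2x + 1 and the noise pattern are hypothetical. The pattern is chosen to sum to zero and to be uncorrelated with x, so scaling it changes the residuals but not the fitted line.

```python
xs = [1, 2, 3, 4, 5, 6]
pattern = [1, -2, 1, 1, -2, 1]  # sums to zero and is uncorrelated with xs

# Same underlying line, small versus large scatter (hypothetical values).
tight_ys = [2 * x + 1 + 0.2 * e for x, e in zip(xs, pattern)]
loose_ys = [2 * x + 1 + 2.0 * e for x, e in zip(xs, pattern)]


def fit_line(xs, ys):
    """Ordinary least squares: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_quality(xs, ys):
    """Returns (slope, intercept, sum of squared residuals, R squared)."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, sse, 1 - sse / sst


tight = fit_quality(xs, tight_ys)
loose = fit_quality(xs, loose_ys)

for name, (slope, intercept, sse, r2) in [("tight", tight), ("loose", loose)]:
    print(f"{name}: y = {slope:.2f}x + {intercept:.2f}, "
          f"SSE = {sse:.2f}, R^2 = {r2:.3f}")
```

Both fits report the same slope and intercept, so the regression equation alone cannot tell the datasets apart. The residuals can.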
Things to remember about real data
• It is messy!
• It is not always memoryless
– i.e., the recent past really is a better indicator of future performance
• It can have many outliers that are:
– Important indicators of new trends, or
– Oddities that should be eliminated
• There is a major difference between statistical significance and practical significance
– Don’t just look at the statistical results, look at the data itself!
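
The gap between statistical and practical significance can be sketched with a two-sample z-test on made-up numbers. Everything below is hypothetical: two very large groups whose mean scores differ by half a point on a 2000-point scale, each with a standard deviation of 60 points.

```python
import math
from statistics import NormalDist

# Hypothetical inputs (not from the slides).
n_per_group = 1_000_000   # observations in each group
sd = 60.0                 # standard deviation of scores in each group
mean_difference = 0.5     # difference in mean score, in points

# Two-sample z-test for a difference in means with equal group sizes.
standard_error = sd * math.sqrt(2 / n_per_group)
z = mean_difference / standard_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Standardized effect size: the difference measured in standard deviations.
effect_size = mean_difference / sd

print(f"z = {z:.2f}, p-value = {p_value:.2e}")
print(f"effect size = {effect_size:.4f} standard deviations")
```

With a sample this large, the test flags the difference as highly significant, yet half a point out of 2000 would rarely change a real decision. Looking at the size of the effect, and at the data itself, guards against that trap.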
