Using Real Data For Decision Analysis

Decision Theory & Analysis

David M. Dilts
Which class would you take?
Next semester, two professors will be teaching
Decision Theory.
Dr. Pieceofcake vs. Dr. Uragonnafail
• Based on RateMyProfessor.com, the grades from the past five years are posted.
Dr. Pieceofcake: mean = 1770.16 (out of 2000 points)
Dr. Uragonnafail: mean = 1750.82 (out of 2000 points)
Which professor would you sign up for?

Finding the Distribution
[Figure: histogram of the Dr. Pieceofcake GPA distribution, x-axis from 1.75 to 1.79 (values in thousands); markers at X <= 1750.00 (0.2%) and X <= 1790.00 (99.7%)]
[Figure: histogram of the Dr. Uragonnafail GPA distribution, x-axis from 1.55 to 1.95 (values in thousands); markers at X <= 1550.0 (0.1%) and X <= 1950.0 (99.9%)]
After a quick graphical analysis, you realize that both distributions are
approximately normal, with roughly the same shape.
Which professor would you still sign up for?

Standardizing the axis
[Figure: the Dr. Pieceofcake and Dr. Uragonnafail GPA distributions replotted on a common x-axis, from 1.5 to 1.955 (values in thousands)]

Dr. Pieceofcake: var = 7.139664
Dr. Uragonnafail: var = 63.8635
Which professor would you sign up for now?

A matter of perspective
• Which professor would you want if you were a 4.0 student?
• What if you were a slacker?
• Remember: it is not just the central tendency that is important; the dispersion is also critical.
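The point about dispersion can be made concrete with a small sketch. It assumes each professor's grades are roughly normal with the means quoted earlier, and it reads the two reported spread figures (7.139664 and 63.8635) as standard deviations, which is an assumption made purely for illustration. The 1800 and 1700 cut-offs are hypothetical stand-ins for the score a 4.0 student is hoping for and the score a slacker wants to stay above.

```python
from statistics import NormalDist

# Means come from the slides. Treating the spread figures as standard
# deviations is an assumption made for this sketch.
pieceofcake = NormalDist(mu=1770.16, sigma=7.139664)
uragonnafail = NormalDist(mu=1750.82, sigma=63.8635)

HIGH_CUTOFF = 1800  # hypothetical target for a 4.0 student
LOW_CUTOFF = 1700   # hypothetical floor a slacker wants to stay above


def chance_above(dist, cutoff):
    """Probability of a score above the cutoff."""
    return 1.0 - dist.cdf(cutoff)


def chance_below(dist, cutoff):
    """Probability of a score below the cutoff."""
    return dist.cdf(cutoff)


for name, dist in [("Dr. Pieceofcake", pieceofcake),
                   ("Dr. Uragonnafail", uragonnafail)]:
    print(f"{name}: "
          f"P(score > {HIGH_CUTOFF}) = {chance_above(dist, HIGH_CUTOFF):.4f}, "
          f"P(score < {LOW_CUTOFF}) = {chance_below(dist, LOW_CUTOFF):.4f}")
```

Under these assumptions the wider distribution carries both more upside and more downside, even though its mean is lower. Which one you prefer depends on which tail you care about.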
There are a variety of probability models
that can be used to help make decisions…
• Binomial distribution
• Poisson distribution
• Normal distribution
• Exponential distribution
• Beta distribution
• etc.
• It is always important to use the correct distribution to explain your data
– BUT
• More importantly, it is essential to always consider the context in which you are making the decision
Caveats in using real data to make decisions
• Fallacy of Averages
• Assumptions of normality
• Errors in estimations
• Impact of outliers
• Residual Values
Fallacy of averages
[Figure: frequency of developed drugs (0 to 0.45) versus duration of drug development (0 to 20), with the mean / average marked]

Fallacy of Averages
[Figure: frequency of developed drugs (0 to 0.45) versus duration of drug development (0 to 20), with the mean duration of the “typical” process and the mean duration of “complex” processes marked separately]
A simple case of the fallacy of averages is the overall height of a specific
population. Men and women each have their own roughly normal height distribution,
with different means; combining them gives a single average that describes
neither group well.
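A minimal sketch of the same effect, using made-up numbers: the two lists below are hypothetical durations for a “typical” group and a “complex” group, not data from the slides. The overall mean lands in a gap where no observation actually sits.

```python
from statistics import mean

# Hypothetical durations, invented for illustration only.
typical = [3, 4, 4, 5, 4, 3, 5, 4]
complex_cases = [11, 12, 13, 12]
combined = typical + complex_cases

overall_mean = mean(combined)

# Observations that fall within one unit of the overall mean.
near_mean = [d for d in combined if abs(d - overall_mean) <= 1]

print(f"typical mean  = {mean(typical):.2f}")
print(f"complex mean  = {mean(complex_cases):.2f}")
print(f"overall mean  = {overall_mean:.2f}")
print(f"observations within 1 of the overall mean: {len(near_mean)}")
```

In this invented sample the overall average sits between the two groups and matches no individual case, which is why the group means are reported separately.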
Assumptions of Normality
• Normal distributions do not always look “bell shaped”
• Requirements for a normal distribution:
– symmetrical
– follows the “68, 95, 99.7 rule”
• Not all distributions are normal, and few are perfectly normally distributed
• Much of decision analysis is based on assumptions of the normal curve to make calculations easier. It is important to understand the limitations of the normal curve when basing your decisions on it.
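One rough way to test the normality assumption is to compare the share of observations within 1, 2, and 3 standard deviations of the mean against the theoretical shares for a normal curve. The sketch below does that; the `skewed` sample is hypothetical and chosen only to show a mismatch.

```python
import math
from statistics import mean, pstdev


def normal_share(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def empirical_share(data, k):
    """Share of the observations within k standard deviations of their mean."""
    m, s = mean(data), pstdev(data)
    return sum(abs(x - m) <= k * s for x in data) / len(data)


# Hypothetical, strongly skewed sample (invented for illustration).
skewed = [1, 1, 1, 2, 2, 2, 3, 3, 4, 20]

for k in (1, 2, 3):
    print(f"k={k}: normal {normal_share(k):.4f}, "
          f"sample {empirical_share(skewed, k):.4f}")
```

A large gap between the two columns is a hint, not a formal test, that normal-curve calculations may mislead for that dataset.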
Errors in Estimations
Some linear regressions force the line through the “0-0” point
(meaning: x-intercept = 0 and y-intercept = 0).
When fitting a regression to the data, understand the range over which your
estimate is reasonable.
Would it make sense to force the regression line through the “0-0” point
in this case?
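The effect of forcing the line through the origin can be sketched by fitting the same points both ways. The data below are hypothetical: the x values only cover 10 to 20, and y was generated from y = 50 + 2x, so nothing is observed anywhere near zero.

```python
# Hypothetical data, invented for illustration: y = 50 + 2x on x in [10, 20].
xs = [10, 12, 14, 16, 18, 20]
ys = [70, 74, 78, 82, 86, 90]


def fit_with_intercept(xs, ys):
    """Ordinary least squares: returns (slope, intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_through_origin(xs, ys):
    """Least squares with the intercept forced to zero: returns the slope."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def sse(xs, ys, slope, intercept):
    """Sum of squared residuals for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


slope, intercept = fit_with_intercept(xs, ys)
slope0 = fit_through_origin(xs, ys)

print(f"free intercept : y = {intercept:.2f} + {slope:.2f}x, "
      f"SSE = {sse(xs, ys, slope, intercept):.2f}")
print(f"through origin : y = {slope0:.2f}x, "
      f"SSE = {sse(xs, ys, slope0, 0.0):.2f}")
```

In this made-up case the forced fit needs a much steeper slope and leaves larger residuals, because the observed range gives no support for the line passing through zero.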
Impact of Outliers
• Outliers can bias the regression estimate as it accommodates the extreme data point(s).
• What happens if you add Maudie Hopkins (19) and William Cantrell (86) [last Civil
War widow], or Anna Nicole Smith (26) and J. Howard Marshall II (89)? Or, more
recently, Demi Moore (42) and Ashton Kutcher (27)?
Impact of Outliers
Outliers can also give a false sense of correlation between two variables. In a
correlation test, the strength of the relationship between the husband’s age and the
wife’s age would be incorrectly accentuated, giving a misleading picture of how
compact the dataset is.
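Both outlier effects can be sketched with a few lines of arithmetic. The ages of the three named couples come from the slides; every other number below is hypothetical and invented for illustration. The first part shows extreme couples dragging the fitted slope, the second shows a single extreme point inflating a correlation that is otherwise weak.

```python
import math


def slope(xs, ys):
    """Least-squares slope of y on x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Part 1: hypothetical "ordinary" couples (husband age, wife age).
husbands = [25, 30, 35, 40, 45, 50, 55, 60]
wives = [23, 29, 32, 39, 43, 47, 54, 57]
slope_clean = slope(husbands, wives)

# Couples named on the slide: Cantrell/Hopkins, Marshall/Smith, Kutcher/Moore.
slope_all = slope(husbands + [86, 89, 27], wives + [19, 26, 42])
print(f"slope without outliers: {slope_clean:.2f}, with outliers: {slope_all:.2f}")

# Part 2: hypothetical weakly related cloud, then one extreme couple added.
cloud_h = [30, 32, 34, 36, 38, 40]
cloud_w = [35, 29, 36, 30, 34, 31]
r_cloud = pearson(cloud_h, cloud_w)
r_with_extreme = pearson(cloud_h + [90], cloud_w + [88])
print(f"r for the cloud: {r_cloud:.2f}, with one extreme couple: {r_with_extreme:.2f}")
```

Plotting the points before trusting either statistic is the simplest guard against both distortions.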
Residual value
Two datasets can give identical regression estimates. The graph on the
right has larger residuals, so there is greater error. In other words,
x is not as strong a predictor of y.
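As a sketch of this point, the two hypothetical datasets below are built so that least squares returns the same line, y = 1 + 2x, for both, while the residual sum of squares differs widely. The noise pattern is chosen by hand so that it does not shift the fitted line; none of these numbers come from the slides.

```python
# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5, 6]
pattern = [1, -2, 1, 0, 0, 0]  # sums to zero and is uncorrelated with xs

tight = [1 + 2 * x + 0.1 * e for x, e in zip(xs, pattern)]
loose = [1 + 2 * x + 2.0 * e for x, e in zip(xs, pattern)]


def fit(xs, ys):
    """Ordinary least squares: returns (slope, intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return b, my - b * mx


def residual_ss(xs, ys):
    """Sum of squared residuals around the fitted line."""
    b, a = fit(xs, ys)
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


for name, ys in [("tight", tight), ("loose", loose)]:
    b, a = fit(xs, ys)
    print(f"{name}: y = {a:.2f} + {b:.2f}x, residual SS = {residual_ss(xs, ys):.2f}")
```

The fitted equations match, so the residuals are the only place where the difference in predictive strength shows up.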
Things to remember about real data
• It is messy!
• It is not always memoryless
– i.e., the recent past really is a better indicator of future performance
• It can have many outliers that are:
– Important indications of new trends, or
– Oddities that should be eliminated
• There is a major difference between statistical significance and practical significance
– Don’t just look at the statistical results, look at the data itself!
