Dr. Pieceofcake
VS.
Dr. Uragonnafail
The grades from the past five years are posted on RateMyProfessor.com.
mean = 1770.16 (out of 2000 points) VS. mean = 1750.82
[Figure: two histograms of the posted grades, one per professor. The first horizontal axis runs from 1.75 to 1.79, the second from 1.55 to 1.95.]
After a quick graphical analysis, you realize that both distributions are
approximately normal, with roughly the same shape.
[Figure: the same two histograms redrawn on a common horizontal axis from 1.5 to 1.955.]
Mean / Average
[Figure: two panels, each plotting "Frequency of Developed Drugs" against a horizontal axis from 0 to 20. The second panel is annotated "Mean Duration of “Complex” Processes".]
Would it make sense to force the regression line through the “0-0” point
in this case?
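As a rough illustration of what forcing the line through the origin does, the sketch below fits the same points twice: once with a free intercept and once with the intercept fixed at 0. The data and the function names are hypothetical and are not taken from the slides. Forcing the fit is only reasonable when y really must be 0 whenever x is 0.

```python
def fit_with_intercept(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_through_origin(xs, ys):
    """Least squares for y = b*x, i.e. the line is forced through (0, 0)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Hypothetical data that follow y = 5 + 2x exactly, so the true intercept is not 0.
xs = [1, 2, 3, 4, 5]
ys = [7, 9, 11, 13, 15]

intercept, slope = fit_with_intercept(xs, ys)
slope_origin = fit_through_origin(xs, ys)

print("free intercept:", intercept, slope)
print("forced through origin, slope:", slope_origin)
```

With these made-up points the forced fit has to steepen its slope to make up for the missing intercept, so it misrepresents how y changes with x.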
Impact of Outliers
Outliers can bias the regression estimation to
accommodate the extreme data point(s).
What happens if you add Maudie Hopkins (19) and William Cantrell (86) [the last Civil
War widow], and Anna Nicole Smith (26) and J. Howard Marshall II (89)? Or, more
recently, Demi Moore (42) and Ashton Kutcher (27)?
Impact of Outliers
Outliers can also give a false sense of correlation between two variables. In a
correlation test, the strength of the relationship between the husband’s age and the
wife’s age would be incorrectly accentuated. This would give a misleading picture
of how compact the dataset is.
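A minimal sketch of this effect, using hypothetical numbers rather than the age data: five points with no linear relationship, followed by the same points plus one extreme value. The function name pearson_r is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical cloud of points with no linear relationship.
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 1, 3]
r_without = pearson_r(xs, ys)

# The same cloud plus a single point far away in both variables.
r_with = pearson_r(xs + [50], ys + [50])

print("r without the outlier:", r_without)
print("r with the outlier:   ", r_with)
```

The single distant point dominates both sums of squares, so the coefficient reports a strong relationship that the other five points do not support.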
Residual value
Two datasets can give identical regression estimates. The graph on the
right has larger residuals, so there is greater error. In other words,
x does not predict y as strongly.
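The sketch below builds two hypothetical datasets that share the fitted line y = 1 + 2x but scatter around it by different amounts, then compares their residual sum of squares and R². All values and helper names are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual_summary(xs, ys):
    """Return (intercept, slope, residual sum of squares, R squared)."""
    a, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return a, b, sse, 1 - sse / sst


xs = [1, 2, 3, 4, 5]
# Both datasets are y = 1 + 2x plus a scaled copy of the same deviation pattern.
ys_tight = [3.1, 4.8, 7.0, 9.2, 10.9]   # small deviations from the line
ys_loose = [5, 1, 7, 13, 9]             # large deviations from the line

a_t, b_t, sse_tight, r2_tight = residual_summary(xs, ys_tight)
a_l, b_l, sse_loose, r2_loose = residual_summary(xs, ys_loose)

print("tight:", a_t, b_t, sse_tight, r2_tight)
print("loose:", a_l, b_l, sse_loose, r2_loose)
```

Both fits return the same intercept and slope, so the regression line alone cannot tell the two datasets apart. The residuals can.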
Things to remember about real data
It is messy!
It is not always memoryless
– i.e., the recent past really is a better indicator of
future performance
It can have many outliers that are:
– Important indications of new trends, or
– Oddities that should be eliminated
There is a major difference between
statistical significance and practical
significance
– Don’t just look at the statistical results, look at
the data itself!
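To make the last point concrete, the sketch below runs a two-sample z-test on the two grade means quoted earlier (1770.16 and 1750.82 out of 2000 points). The standard deviation of 100 points and the sample size of 1000 per professor are assumptions made up for illustration. The slides do not give them.

```python
import math

# Means quoted on the slides (out of 2000 points).
mean_a = 1770.16
mean_b = 1750.82
total_points = 2000

# ASSUMED values, not from the slides: spread and sample size per professor.
assumed_sd = 100.0
assumed_n = 1000

# Two-sample z statistic for the difference between the means.
difference = mean_a - mean_b
standard_error = math.sqrt(assumed_sd ** 2 / assumed_n + assumed_sd ** 2 / assumed_n)
z = difference / standard_error

# Two-sided p-value under the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

# Practical size of the gap, as a percentage of the total points available.
percent_of_total = 100 * difference / total_points

print("z statistic:", z)
print("two-sided p-value:", p_value)
print("difference as % of total points:", percent_of_total)
```

Under these assumed inputs the test flags the difference as highly significant, yet the gap is under 1% of the available points. Whether that matters to a student is a practical judgment that the p-value does not answer.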