Use standard deviation (not mad about MAD) | Win-Vector Blog
21/7/2015
the first box are nine zeros and one $5 payoff. We are going to use a
general measure of model goodness called a loss function[2] or
loss and ignore any issues of parametric modeling, incorporating
prior knowledge or distributional summaries.
Suppose we use mean absolute deviation as our measure of model
quality. Then the loss (or badness) of a value V is loss(V) = 9*|V-0| +
1*|V-5| (the total absolute deviation, which has the same minimizer as
the mean), and it is minimized at V=$0. That is, it says the best model
under mean absolute error is that all the lottery tickets are
worthless. I personally feel that way about lotteries, but the mean
absolute deviation is missing a lot of what is going on. In fact, if we
have nine tickets with zero payoff and a single ticket with a nonzero
payoff, the mean absolute deviation is minimized at V=0 for
any positive payoff on the last ticket: raising V moves it away from
nine tickets and toward only one. The mean absolute deviation
says the best model for a lottery ticket given nine non-payoffs and one
$1,000,000 payoff is that tickets are worth $0. So we may
not always want to think in terms of the mean absolute deviation
summary.
Here is some R[3]-code demonstrating what models (values of V)
total absolute deviation prefers (for our original problem):
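library(ggplot2)
# candidate ticket valuations V from -$5 to $10
d <- data.frame(V=seq(-5,10,by=0.1))
# total absolute deviation: nine $0 tickets and one $5 ticket
f <- function(V) { 9*abs(V-0) + 1*abs(V-5) }
d$loss <- f(d$V)
# plot loss against V; the minimum is at V=0
ggplot(data=d,aes(x=V,y=loss)) + geom_line()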
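The claim that V=0 wins for any positive payoff on the last ticket can also be checked numerically. The following is a minimal sketch, not part of the original post: the helper names box, abs_loss and best_abs are made up for illustration, and the grid of candidate values is an arbitrary choice.

```r
# Hypothetical helper: nine losing tickets and one winner with the given payoff.
box <- function(payoff) c(rep(0, 9), payoff)

# Total absolute deviation of a single valuation V against every ticket in x.
abs_loss <- function(V, x) sum(abs(V - x))

# Evaluate the loss on a grid of candidate valuations and return the best one.
best_abs <- function(x, grid) grid[which.min(sapply(grid, abs_loss, x = x))]

grid <- seq(0, 10, by = 0.1)   # assumed search grid, in dollars

best_abs(box(5), grid)     # the original problem: one $5 ticket
best_abs(box(1e6), grid)   # one $1,000,000 ticket
median(box(1e6))           # the median is a minimizer of absolute loss
```

All three expressions evaluate to 0, since the loss on [0, payoff] is 8*V + payoff and so grows as V moves away from zero. The size of the single payoff never enters the answer, which is exactly the information absolute deviation throws away.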
seem to be worth about $0.50 each while they cost $1 each (typical of
lotteries). Also notice that 10*V equals $5, the actual total value
of all of the tickets in the first box of lottery tickets. This is a key
advantage of RMSE: it gets group totals and averages right even
when it doesn't know how to value individual tickets. You want this
property.
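A quick numeric check of the totals property, as a sketch under stated assumptions: the passage defining the square loss is not in this excerpt, so the form sum((V - x)^2) below is assumed (it has the same minimizer as RMSE), and the names x, sq_loss and V_hat are illustrative only.

```r
x <- c(rep(0, 9), 5)                     # the first box: nine $0 tickets, one $5 ticket

# Assumed total square loss of a single valuation V (same minimizer as RMSE).
sq_loss <- function(V) sum((V - x)^2)

# One-dimensional numeric minimization over the same range as the earlier plot.
fit <- optimize(sq_loss, interval = c(-5, 10))
V_hat <- fit$minimum

V_hat               # approximately 0.5
mean(x)             # the sample mean, 0.5
length(x) * V_hat   # approximately 5
sum(x)              # the true total payoff, 5
```

Setting the derivative of the square loss to zero gives sum(V - x) = 0, so the minimizer is the mean and 10*V reproduces the box total. That is why the two pairs of numbers agree up to optimizer tolerance.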
How can we design loss functions that get totals correct? What we
want is a loss function that when we optimize to minimize loss we
Links
1. http://www.edge.org/response-detail/25401
2. http://en.wikipedia.org/wiki/Loss_function
3. http://cran.r-project.org/
4. http://en.wikipedia.org/wiki/Median_absolute_deviation
5. http://en.wikipedia.org/wiki/Stationary_point
6. http://www.win-vector.com/blog/2011/09/the-simpler-derivation-of-logistic-regression/
7. http://www.win-vector.com/blog/2013/05/bayesian-and-frequentist-approaches-ask-the-right-question/
8. http://www.win-vector.com/blog/2009/04/the-data-enrichment-method/
9. http://en.wikipedia.org/wiki/Deviance_(statistics)