
Misapplications Reviews: Rain Men

Author(s): Arnold Barnett and Harvey Tress


Source: Interfaces, Vol. 20, No. 2 (Mar. - Apr., 1990), pp. 42-47
Published by: INFORMS
Stable URL: http://www.jstor.org/stable/25061331
Accessed: 19/09/2014 04:30


This content downloaded from 193.255.76.113 on Fri, 19 Sep 2014 04:30:56 AM


All use subject to JSTOR Terms and Conditions
Misapplications Reviews: Rain Men

Arnold Barnett
Massachusetts Institute of Technology
50 Memorial Drive
Cambridge, Massachusetts 02139

Harvey Tress
New York State Energy Office
Two Rockefeller Plaza
Albany, New York 12222


Copyright © 1990, The Institute of Management Sciences
0091-2102/90/2002/0042$01.25
INTERFACES 20: 2 March-April 1990 (pp. 42-47)
STATISTICS

The following four stories concern rain men, although, since only two involve precipitation and the last has a female protagonist, we're using the term a bit broadly. What the stories have in common is that all took place under bleak statistical skies in which clouds of confusion and imprecision blotted out the sunlight of reason. Let us quickly don our mathematical raincoats and visit the scenes of the crimes.

At the Water Table

A hydrologist for a large American state was asked whether water consumption in a growing county was lowering its water table, in which case its system of wells might soon be useless. He had measurements on the height of the water table (in number of feet above sea level) at the end of each recent year. To determine whether the table was dropping over time, he performed a least-squares linear regression analysis on sequential data:

$$H = A + Bt + e \qquad (1)$$

where H = height of water table, t is time measured from the start of the study period, e is a zero-mean normal "random noise" term, and A and B are numerical constants to be estimated in the regression analysis.

It turned out that $\hat{B}$, the estimated value of the slope B, was slightly negative but not significantly lower than zero. Under usual standards, therefore, the hypothesis that the water table was stable over time was in harmony with the empirical evidence. Of course, the data were also consistent with a modest but steady drop over time. But to public officials battered by problems above the ground, the analysis might have seemed a sufficient
basis for not worrying about problems below.

When the hydrologist presented his work at a meeting of state planners, he was asked why the amount of recent precipitation was not an explanatory variable in the regression. He answered that he was only interested in long-term trends and not in the transient effects of year-to-year fluctuations in rainfall. To the extent that such fluctuations affected H, he suggested, they were adequately represented in the model by the zero-mean noise term, e.

This explanation for omitting rainfall from the model was not preposterous. But that is not to say that it was viable. There is reason to fear that his omission distorted the meaning of the parameter $\hat{B}$ and thus that his conclusions about the water table...er...might not hold water.

Even if we grant that, between Ice Ages, there is no net correlation at a given spot between annual rainfall and time, there can still be correlation between time and rainfall over a given short period. Perhaps the rainier years will fall by chance towards the start of the period, or towards the end. Should that happen, a straight-line depiction of the observed rainfall/time relation would have a positive (or negative) slope rather than the slope of zero that would prevail in the longer run.

Suppose that the water table H at the end of a year varies positively with the amount of rain R that fell during that year. Then if R is correlated with time over some period, H would similarly be correlated, all other factors being equal.

The upshot is that $\hat{B}$ estimated under (1) might reveal not simply the long-term trend in H but rather the sum of that trend and the short-term effects of variations over time in recent rainfall.

To gauge the importance of this last possibility, one can replace (1) with the more general model:

$$H = A + Bt + CR + e \qquad (2)$$

where R = rainfall over the last year, and A, B, H, t, and e are defined as in (1). When an analyst calibrated (2) with the hydrologist's data, he found that $\hat{C}$ was positive and highly significant, while the new $\hat{B}$ was both significant and far more negative than its counterpart in (1). And under the criterion of mean-squared error, the new model was far more compatible with the data than was its predecessor. When H was subsequently allowed to depend not just on R but on rainfall in the last few years, $\hat{B}$ became even more negative and significant. In other words, the improved models portrayed a water table with a strong underlying propensity to fall.

A post-mortem revealed that, because rain had been unusually abundant near the end of the hydrologist's study period, the water table had not dropped very much over the full period. It was that circumstance that had led to the only slightly negative $\hat{B}$ in (1). But the very fact that the water table failed to rise during several wet years was indirect evidence of a strong downward trend, a trend that became quite visible once H was adjusted for the effects of recent rainfall.

The moral of this drenching is that, even if a variable should have no long-run
influence on the process of interest, analysts should beware of the assumption that either the noise-level e or the constant term A satisfactorily incorporates that variable into a regression model. A more prudent course might be to include that variable directly in the analysis and trust that, if it indeed has no systematic effect within the data set at hand, near-zero estimates of the appropriate coefficients will testify to that fact.

Enigma

It must be dull to forecast the weather in Los Angeles; conditions there are almost always sunny and pleasant. Perhaps that explains why the Los Angeles Times offers detailed forecasts for numerous other cities and presents its predictions in the form of a cryptogram.

On April 30, 1989, for example, the Times made a forecast for Denver for the period 4/30 to 5/5/89. The high temperature was listed as 65 degrees Fahrenheit and the low as 33 degrees. The "statistical probability of rain" was estimated as 29 percent. No other information was provided, which means that the forecast could serve as the embryo of an exciting game of Twenty Questions.

What, for example, did the Times mean when it predicted a high of 65 degrees? Was that the very highest temperature to be reached over the period, or rather the average of the five daily highs? Similarly, was 33 degrees the all-period low or the average daily low? It's tempting to treat the difference between 65 and 33 as a typical daily range of temperatures, but is it a reasonable inference? Not if 65 and 33 are the extreme statistics for the full period; it seems unlikely, after all, that the warmest and coldest moments of a five-day span will occur during the same calendar day.

The phrase "statistical probability of rain" raises an interesting question about how the forecast was developed. Was it based on historical data for Denver for the period 4/30 to 5/5, or was it tied to the specific conditions prevailing on 4/29/89 when the forecast was made? In more formal terms, was the "statistical" probability of rain a conditional or an unconditional one?

And what does it mean to speak of a 29 percent chance of rain in 4/30 to 5/5? Is that the probability that it will rain at least once over the five-day period? Or is it instead an average daily probability, corresponding to the statement that the expected number of rainy days over the period is 1.45? Obviously, the two interpretations are not equivalent: if the cumulative chance of rain is 29 percent, then the daily chance is seven percent under an independence approximation. ($(1-p)^5$ is closer to .71 for p = .07 than for p = .06.)

Admittedly, the Los Angeles Times can't accompany each weather forecast with a variance/covariance matrix. But it could probably dispel a good deal of confusion with a few modest changes in wording. If "statistical" probability means historical probability, for example, then the latter adjective could replace the former. If "high" means average daily high, then
simply adding the word "average" before high would make clear the correct interpretation. Angelenos are horrified enough by the weather they encounter in other cities; at least their hometown paper could try a bit harder to reduce the element of surprise.

Fire and Rain

During World War II, raindrops were not the only things falling from the skies over Europe. In some German cities, the frequency of bombing raids literally exceeded that of rain showers. Beyond any moral issues arising from the deliberate bombardment of civilians, the efficacy of the policy of bombing German cities was a subject of fierce debate in British war councils.

Perhaps the policy's leading scientific proponent was Lord Cherwell (professor of experimental philosophy at Oxford), who in 1942 drew optimistic inferences from data about the early German bombings of England. A government research agency had estimated the destructiveness of these raids by calculating the ratio of the amount of housing destroyed to the number of tons of bombs dropped. Making linear extrapolations based on this "houses per ton" statistic, he projected that the tonnage of bombs available through mid-1943 to the Royal Air Force would suffice to "dehouse" the "great majority" of inhabitants of the 58 largest German cities. Contending that such massive dislocation would thoroughly demoralize the Germans, he argued that, over the next year, attacking urban centers should be the primary mission of RAF bombers.

Other scientists sharply disputed Lord Cherwell's calculations. Among the prominent critics was Lord Blackett (director of operational research for the Admiralty), who noted that the RAF's bombings were not terribly precise; if a bomb were dropped on (say) Hamburg, at most one could say that it would probably land somewhere within the city. Thus, the greater the fraction of housing stock already destroyed, the greater the chance that the next bomb dropped would simply "make the rubble jump" rather than cause new destruction. Given such a law of diminishing returns, the bombing would get progressively less effective as it went on; hence the degree of urban destruction would fall far below that envisioned by Lord Cherwell. Lord Blackett urged instead that the bombers be diverted to North Atlantic U-boats.

A simple equation implements Lord Blackett's hypothesis. Let f be the fraction of a city's housing stock destroyed by time t, and let A be the time required fully to destroy that city's housing under Lord Cherwell's linear model. Then, assuming bombardment at some fixed rate over time, f would roughly follow a differential equation of the form:

$$\frac{df}{dt} = \frac{1-f}{A} \qquad (2)$$

Under (2), the time dt required to raise the fraction of houses destroyed from f to f + df would follow:

$$dt = \frac{A\,df}{1-f}$$
while the time t(F) needed to destroy fraction F of the housing would follow:

$$t(F) = \int_0^F \frac{A\,df}{1-f} = -A\ln(1-F) \qquad (3)$$

Note that t(1) is infinite under (3), which means that totally dehousing an urban population is an unattainable goal. For F = 0.8 (a level of destruction compatible with Lord Cherwell's "great majority" of residents dehoused), t(F) would be 1.61A under (3), more than twice Lord Cherwell's corresponding estimate of 0.8A.

History records that the heavy strategic bombing of German cities continued throughout the war. And if the newsreels of Germany in 1945 are any guide, the degree of destruction in its cities, if not literally 100 percent, did not fall short by much. Does that circumstance hint that Lord Cherwell had the better of the argument with Lord Blackett?

Not necessarily, because the tonnage of bombs dropped by 1945 was far greater than the amount that Lord Cherwell assumed in his calculations. On the other hand, Lord Cherwell's accuracy may have benefitted from a phenomenon that neither man explicitly considered. When many bombs are dropped in close proximity and quick succession, the result can be a conflagration that destroys large numbers of houses that had not been hit directly. (Such a firestorm caused staggering damage in Dresden from only one raid.) Hence the marginal effectiveness of more bombing sometimes grows faster than linearly rather than slower. It is therefore possible that Lord Cherwell's model (which was neutral between concavity and convexity) was more accurate in the aggregate than Lord Blackett's, which so strongly cast its vote for concavity.

We should note, however, that Lord Cherwell's ultimate premise (that dehousing millions of Germans would pulverize the nation's morale) was not supported by the unfolding of events. The bombing did little to reduce German war production and, by many accounts, did more to stiffen German resistance than to suppress it. Lord Cherwell, therefore, may have won the mathematical argument but lost the deeper one.

Reign of Terror

The Atlanta Constitution of May 9, 1989, published a reader's letter that offered a grim syllogism:

(1) Atlanta has the highest crime rate in the US;
(2) The US has the highest crime rate in the world; and therefore
(3) Atlanta has the highest crime rate in the world.

This argument sounds quite natural as it glides off the tongue, like the assertion that if A > B and B > C, then A > C. But does (3) represent a logical deduction from (1) and (2), or is it instead a non sequitur?

The question becomes easier to answer if we first consider another. Suppose that x and y are normally distributed random variables and that u(x), the mean of x, exceeds u(y), the mean of y. Is the 95th percentile of x greater than the 95th percentile of y?

Not necessarily. A distribution's 95th percentile depends on its variance as well as its mean. If s(x) and s(y) are, respectively, the standard deviations of x and y, then the real issue is whether
$u(x) + 1.64\,s(x)$ exceeds $u(y) + 1.64\,s(y)$. We can readily compute that, if $s(y) - s(x) > (u(x) - u(y))/1.64 \approx 0.61\,(u(x) - u(y))$, then y will have the larger 95th percentile. These formulas reflect the intuitively clear notion that a distribution with large dispersion around a low mean can have a higher right tail than another distribution with small dispersion around a higher mean.

Before we leave x and y, we should notice another point. Suppose that m independent selections are made from the x distribution and n from the y distribution. The probability that the largest observed x value exceeds the largest observed y value depends, of course, on u(x), s(x), u(y) and s(y). But it also depends on n and m: the largest of 100 samples can exceed a distribution's mean by far more than does the largest of six.

Returning to Atlanta, we can see by analogy why (1) and (2) do not imply (3). Some nation that has a lower average crime rate than the US but greater heterogeneity could contain a city with a higher crime rate than Atlanta's. Alternatively, a nation with the same mean and standard deviation of urban crime rates as the US but a larger number of cities could easily have a city worse than Atlanta in crime. It isn't certain that either of these scenarios obtains, but the very possibility that they do renders the letter writer's argument insufficient.

And there is more good news for Atlanta residents: statements (1) and (2) are themselves of dubious accuracy. The US almost surely does not have the highest crime rate in the world. (The country that does probably doesn't keep serious records.) And only under highly specialized criteria is Atlanta America's crime capital: it has a lower per capita murder rate than Detroit or Washington and a substantially lower robbery rate than New York. Thus, not only does (3) not follow from (1) and (2), but (1) and (2) do not themselves follow clearly from the data they purport to summarize.

Final Remark

The weather is the classic process that everyone talks about but no one changes. Are mathematical tempests likewise beyond our locus of control? Not at all; if we see people flying towards intellectual typhoons, we can and should issue urgent warnings to change course. And we can watch with greater vigilance for the statistical storms that always swirl around our own work, waiting to blow us away if we let down our guard for a second. The musical Hair ends with an exhortation to "let the sun shine in!" That sounds like a marvelous goal for us in the new decade and the subsequent millennium.

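The omitted-variable problem in "At the Water Table" can be reproduced in a few lines of Python. This is only an illustration under stated assumptions. The ten years of data, the rainfall series and the coefficients of the noise-free process H = 100 - 0.5t + 0.15R are all invented, not taken from the hydrologist's study. The rainfall series is flat at first and then rises near the end of the period, as in his case. The sketch fits model (1) and model (2) by ordinary least squares through the normal equations. It uses a small hand-written solver (the helpers `solve` and `ols` are hypothetical names), so no third-party library is needed.

```python
def solve(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix, so each row operation acts on both sides of the system.
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: years 0..9, rainfall flat and then unusually high at the end.
t = list(range(10))
R = [30, 30, 30, 30, 30, 30, 40, 45, 50, 55]
# Assumed noise-free process: H = 100 - 0.5 t + 0.15 R.
H = [100 - 0.5 * ti + 0.15 * ri for ti, ri in zip(t, R)]

# Model (1): H = A + B t + e  (rainfall omitted)
A1, B1 = ols([[1.0, ti] for ti in t], H)
# Model (2): H = A + B t + C R + e
A2, B2, C2 = ols([[1.0, ti, ri] for ti, ri in zip(t, R)], H)

print(f"model (1): B-hat = {B1:.4f}")
print(f"model (2): B-hat = {B2:.4f}, C-hat = {C2:.4f}")
```

With these invented numbers, model (1) reports a slope of roughly -0.07 per year, only slightly negative, much like the hydrologist's estimate. Model (2) recovers the assumed trend of -0.5 and the rainfall coefficient of 0.15. In real work one would use a statistics package that also reports standard errors; the hand-rolled solver is a deliberate simplification.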

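The two readings of the Denver forecast in "Enigma" can be compared directly. This minimal sketch assumes, as the article does, that rain on different days is independent; the variable names are illustrative only.

```python
# Two readings of a "29 percent chance of rain" over a five-day period.
DAYS = 5
quoted = 0.29

# Reading A: 29% is the chance of at least one rainy day in five.
# Under the (assumed) independence approximation, 1 - (1 - p)^5 = 0.29.
daily_if_cumulative = 1 - (1 - quoted) ** (1 / DAYS)

# Reading B: 29% is an average daily probability, so the expected
# number of rainy days over the period is 5 * 0.29.
expected_rainy_days = DAYS * quoted

# The article's check: (1 - p)^5 is closer to .71 at p = .07 than at p = .06.
gap_07 = abs((1 - 0.07) ** DAYS - 0.71)
gap_06 = abs((1 - 0.06) ** DAYS - 0.71)

print(f"daily chance under reading A: {daily_if_cumulative:.4f}")
print(f"expected rainy days under reading B: {expected_rainy_days:.2f}")
print(f"|0.93^5 - .71| = {gap_07:.4f}, |0.94^5 - .71| = {gap_06:.4f}")
```

Under the cumulative reading, the implied daily chance comes out near 6.6 percent, which rounds to the article's seven percent. The daily-average reading instead implies 1.45 expected rainy days.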

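The arithmetic behind Lord Blackett's objection in "Fire and Rain" can be checked numerically. This sketch measures time in units of A (the hypothetical default `A=1.0`). It compares Lord Cherwell's linear model t = AF with the closed form (3). As a cross-check, it also integrates the differential equation df/dt = (1 - f)/A with a simple Euler step; the step size `dt=1e-4` is an arbitrary assumption.

```python
import math


def cherwell_time(F, A=1.0):
    """Linear model: destruction proceeds at a constant rate, so t = A * F."""
    return A * F


def blackett_time(F, A=1.0):
    """Closed form of (3): t(F) = -A ln(1 - F), which diverges as F approaches 1."""
    return -A * math.log(1 - F)


def blackett_time_euler(F, A=1.0, dt=1e-4):
    """Step df/dt = (1 - f)/A forward with Euler's method until f reaches F."""
    f, t = 0.0, 0.0
    while f < F:
        f += dt * (1 - f) / A
        t += dt
    return t


F = 0.8
print(f"Cherwell (linear):     {cherwell_time(F):.2f} A")
print(f"Blackett, closed form: {blackett_time(F):.4f} A")
print(f"Blackett, Euler check: {blackett_time_euler(F):.4f} A")
```

For F = 0.8, the closed form gives about 1.61A, roughly twice the 0.8A of the linear model, and the Euler estimate agrees to within the step size. Calling `blackett_time(1.0)` raises a math domain error, the numerical counterpart of t(1) being infinite.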

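The two distributional points in "Reign of Terror" can be illustrated with Python's standard-library `statistics.NormalDist`. The means and standard deviations are hypothetical, chosen so that x has the higher mean but y the wider spread. For the "largest of n draws" point, the sketch uses the median of the maximum, which has the closed form F^{-1}(0.5^{1/n}), rather than the expected maximum, which does not. That substitution is a choice made here for simplicity, and the helper `median_of_max` is a hypothetical name.

```python
from statistics import NormalDist

# Hypothetical distributions: x has the higher mean, y the larger spread.
x = NormalDist(mu=10.0, sigma=1.0)
y = NormalDist(mu=8.0, sigma=3.0)

# 95th percentile = mean + 1.645 * sd (the article rounds the z-value to 1.64).
x95 = x.inv_cdf(0.95)
y95 = y.inv_cdf(0.95)
print(f"95th percentiles: x = {x95:.2f}, y = {y95:.2f}")


def median_of_max(dist, n):
    """Median m of the largest of n independent draws, from P(max <= m) = F(m)^n = 1/2."""
    return dist.inv_cdf(0.5 ** (1 / n))


# Same distribution, different numbers of "cities".
z = NormalDist(mu=0.0, sigma=1.0)
print(f"median of max of 6 draws:   {median_of_max(z, 6):.2f}")
print(f"median of max of 100 draws: {median_of_max(z, 100):.2f}")
```

Here y's 95th percentile (about 12.93) exceeds x's (about 11.64) even though x has the higher mean. The median maximum of 100 standard-normal draws (about 2.46) sits well above that of six draws (about 1.23).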