
Basic Statistics

–An old but good book on the pitfalls of
statistics is Huff, How to Lie with
Statistics (1954).
–The arithmetic mean of a series of equivalent
observations is one of the most commonly
used statistical tools.
–Students will most often want to know what
the mean, or average, was on an
examination in order to determine where
their performance fits relative to other
students.
–In this case the observations (test scores)
are all very nearly random and the mean
examination score has meaning.
–Specifically, the mean is a predictor of values. As
an example, if I have 200 student test scores and
I randomly select a single test score from
the 200, can I predict the
most likely value that I will pick?
–Using this example, the mean is the most likely
value that I will pick.
–Since the mean is a predictor, it cannot improve
the accuracy of the values. Thus, if the test
scores have an accuracy of ±1%, I cannot
report a mean test score with any better
accuracy.
–The standard deviation provides a measure
of the reliability of the mean. It does so by
determining the width of the peak that occurs
at the mean.
–The usual measure, for normally distributed
values, is that approximately
68% of all values fall within 1σ on either side of the
mean, 95% fall within 2σ, and 99.7% of all
observations fall within 3σ of the mean.
–This is the basis for grade distributions.
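As a quick check of these percentages, the fraction of a normal distribution lying within a given number of standard deviations of the mean can be computed from the error function. This is a minimal sketch that assumes the values are exactly normally distributed; the helper name `fraction_within` is illustrative only.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.1%}")
# within 1 standard deviation(s): 68.3%
# within 2 standard deviation(s): 95.4%
# within 3 standard deviation(s): 99.7%
```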
–One σ on either side of the mean determines
the location of the bottom of a C (1σ below) and the
bottom of a B (1σ above).
–Two σ on either side of the mean determines
the location of the bottom of a D (2σ below) and the
bottom of an A (2σ above).
–Thus a C encompasses about 68% of all grades,
with the remaining 32% falling EQUALLY
above and below a C, about 16% on each side.
–Of the 16% below a C, approximately 2.5% will
be an F, and of the 16% above a C
approximately 2.5% will be an A.
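The share of grades in each band can be worked out directly from the normal distribution. The sketch below assumes grades are exactly normally distributed and that the band edges sit at 1 and 2 standard deviations from the mean, as described above; the names `fraction_between`, `c_share`, `b_share` and `a_share` are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def fraction_between(low_sigma, high_sigma):
    """Fraction of a normal distribution between two cutoffs, in units of sigma."""
    return standard_normal.cdf(high_sigma) - standard_normal.cdf(low_sigma)

c_share = fraction_between(-1, 1)     # C: within 1 sigma of the mean
b_share = fraction_between(1, 2)      # B: 1 to 2 sigma above (D mirrors it below)
a_share = 1 - standard_normal.cdf(2)  # A: more than 2 sigma above (F mirrors it below)

print(f"C: {c_share:.1%}  B or D: {b_share:.1%} each  A or F: {a_share:.1%} each")
# C: 68.3%  B or D: 13.6% each  A or F: 2.3% each
```

Real score distributions are rarely this tidy, so these shares are a guide rather than a guarantee.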
• One approach is to apply a single,
predetermined distribution of letter
grades to each test.
• For example, one might decide to
assign about 10% A's, 30% B's, 45%
C's, 10% D's, and 5% F's on every test
or assignment.
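One way to apply such a predetermined distribution is to rank the scores and hand out grades from the top down until each quota is filled. The sketch below is only an illustration: the function name `grades_by_quota` and the test scores are made up, and tied scores are not treated specially (a tie can straddle a grade boundary).

```python
# Quotas from the example above, in percent, from best grade to worst.
QUOTAS = [("A", 10), ("B", 30), ("C", 45), ("D", 10), ("F", 5)]

def grades_by_quota(scores, quotas=QUOTAS):
    """Assign letter grades by rank so each grade gets roughly its quota."""
    n = len(scores)
    # Indices of the scores, ordered from highest score to lowest.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    grades = [None] * n
    start = 0
    cumulative = 0
    for letter, percent in quotas:
        cumulative += percent
        end = round(cumulative * n / 100)
        for i in order[start:end]:
            grades[i] = letter
        start = end
    return grades

# Hypothetical scores for 20 students.
scores = [55, 91, 78, 64, 70, 83, 47, 74, 69, 88,
          72, 66, 95, 59, 77, 81, 62, 73, 68, 85]
grades = grades_by_quota(scores)
for letter, _ in QUOTAS:
    print(letter, grades.count(letter))
```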
Grade  Range, in standard deviations from the mean
A      At least 1.33 above
A-     Between 1 (inclusive) and 1.33 (exclusive) above
B+     Between 0.67 (inclusive) and 1 (exclusive) above
B      Between 0.33 (inclusive) and 0.67 (exclusive) above
B-     Between 0 (inclusive) and 0.33 (exclusive) above
C+     Between 0.33 (inclusive) and 0 (exclusive) below
C      Between 0.67 (inclusive) and 0.33 (exclusive) below
C-     Between 1 (inclusive) and 0.67 (exclusive) below
D      Between 2 (inclusive) and 1 (exclusive) below
F      At least 2 below
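The cutoffs in the table above can be turned into a small lookup. This is a sketch, not a grading policy: the function name `letter_grade` is illustrative, and because the table lists exactly 2 standard deviations below the mean under both D and F, the code assumes a D at that boundary.

```python
# Lower bound of each grade in standard deviations from the mean, best grade first.
CUTOFFS = [
    (1.33, "A"), (1.0, "A-"), (0.67, "B+"), (0.33, "B"), (0.0, "B-"),
    (-0.33, "C+"), (-0.67, "C"), (-1.0, "C-"), (-2.0, "D"),
]

def letter_grade(score, mean, std_dev):
    """Letter grade for a score, given the class mean and standard deviation."""
    z = (score - mean) / std_dev  # distance from the mean in standard deviations
    for lower_bound, grade in CUTOFFS:
        if z >= lower_bound:
            return grade
    return "F"  # assumption: F only when strictly more than 2 below the mean

# Example with a hypothetical class mean of 70 and standard deviation of 10.
for score in (85, 80, 70, 65, 50, 45):
    print(score, letter_grade(score, 70, 10))
```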

The observations taken at any observing
station are of fundamental importance to
meteorology.
How a large number of these observations are
examined and interpreted is of similar
importance.
For climatological data amassed at central
institutions, such as the National Climatic
Data Center (NCDC), examining individual
observations is rather unwieldy.
In this case statistics are used to digest the
observations to make them more amenable
to discussion and discovery.
In physics, statistics are used to eliminate
accidental errors in a single observation
and to obtain higher accuracy.
Meteorologists have an intermediate case.
We will have a large number of observations
that will need to be interpreted, but we are
also interested in obtaining the best possible
value for the accuracy of a forecast.
It should be borne in mind that statistics
always require interpretation and never add
information.
Properly applied, statistics can help clarify an
issue, but misapplied they will mislead.
One factor that is particularly important
is that it must be ascertained ahead of
time whether or not the tool or technique
being applied to a data set is
appropriate.
Many statistical tools are valid only if the
data are random or independent, a
condition that is often not met in
meteorology.

Mean and Standard Deviation

The arithmetic mean of a series of equivalent
observations is one of the most commonly used
statistical tools.
In the case of temperature values, the observed
temperature is not random and there is a
relationship between temperature samples.
In this case the mean temperature does not
contain as much useful information as does, say,
the mean of a series of student test scores.
The mean of a series of observations can be
found from:

\[
\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + x_3 + x_4 + \cdots + x_N}{N}
\]
• Given that the mean is a predictor, how reliable is
this predictor?
• Consider the following example. The table
below gives 10 days of high temperatures at two
stations. Both stations have a mean high
temperature of 44 °F.
• How reliable is the mean for the two stations?

Table 1. Daily high temperatures (°F) at stations A and B over 10 days.

Station  Daily high temperature (°F)
A        44.5  43.1  43.0  44.0  43.8  44.9  44.2  44.0  45.0  43.5
B        47.9  42.5  41.2  45.5  40.1  41.0  46.8  48.6  47.0  39.4

• In the case of station A, the daily high
temperatures are all closely grouped around
the station mean and do not deviate from it by more
than 1°.
• On the other hand, the values for station B are
scattered around the mean value and deviate
from it by as much as 4.6°.
• We can say that there is some physical
reason why the values are closely grouped at
station A and scattered at station B, and that
the mean is less reliable for station B.
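The comparison of the two stations can be reproduced in a few lines. The sketch below uses the temperatures from the table above; the helper names `mean` and `largest_deviation` are illustrative.

```python
station_a = [44.5, 43.1, 43.0, 44.0, 43.8, 44.9, 44.2, 44.0, 45.0, 43.5]
station_b = [47.9, 42.5, 41.2, 45.5, 40.1, 41.0, 46.8, 48.6, 47.0, 39.4]

def mean(values):
    """Arithmetic mean: the sum of the values divided by their number."""
    return sum(values) / len(values)

def largest_deviation(values):
    """Largest absolute difference between any single value and the mean."""
    m = mean(values)
    return max(abs(v - m) for v in values)

for name, temps in (("A", station_a), ("B", station_b)):
    print(f"Station {name}: mean {mean(temps):.1f}, "
          f"largest deviation {largest_deviation(temps):.1f}")
# Station A: mean 44.0, largest deviation 1.0
# Station B: mean 44.0, largest deviation 4.6
```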
• One approach to determining how the values are
distributed around the mean is the standard
deviation.
• The standard deviation is defined as the square root
of the mean of the squared deviations of single
values from the mean, i.e.

\[
\sigma = \left[ \frac{\sum_{i=1}^{N} \left( x_i - \bar{x} \right)^2}{N} \right]^{0.5}
\]
• The standard deviation provides a
measure of the reliability of the mean.
• It does so by determining the width of
the peak that occurs at the mean. In Fig.
1b, the standard deviation is very small
and there are very few values away
from the mean value.
• On the other hand, in Fig. 1c the standard
deviation is very large and there are many
values far away from the mean.
• The usual measure, for normally distributed
values, is that approximately
68% of all values fall within 1σ on either side of the
mean, 95% fall within 2σ, and 99.7% of all
observations fall within 3σ of the mean.
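As a sketch, the standard deviation defined earlier (dividing by N, not N - 1) can be computed for the two stations' daily high temperatures, along with the share of days that fall within one standard deviation of the mean. The function names `std_dev` and `fraction_within_one_sigma` are illustrative only.

```python
import math

station_a = [44.5, 43.1, 43.0, 44.0, 43.8, 44.9, 44.2, 44.0, 45.0, 43.5]
station_b = [47.9, 42.5, 41.2, 45.5, 40.1, 41.0, 46.8, 48.6, 47.0, 39.4]

def std_dev(values):
    """Square root of the mean squared deviation from the mean (divides by N)."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((v - m) ** 2 for v in values) / n)

def fraction_within_one_sigma(values):
    """Share of values no more than one standard deviation from the mean."""
    m = sum(values) / len(values)
    s = std_dev(values)
    return sum(1 for v in values if abs(v - m) <= s) / len(values)

for name, temps in (("A", station_a), ("B", station_b)):
    print(f"Station {name}: standard deviation {std_dev(temps):.2f}, "
          f"within 1 sigma {fraction_within_one_sigma(temps):.0%}")
# Station A: standard deviation 0.65, within 1 sigma 60%
# Station B: standard deviation 3.33, within 1 sigma 60%
```

With only 10 values per station, a share of 60% is about as close to the nominal figure as the sample allows.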
• Using Table 1, the standard deviation of
station A's observations is small, but the
standard deviation of station B's observations
is large.
• Thus by providing a measure of the spread of
the distribution curve I can get an estimate of
the quality of the mean value.
• Unfortunately this isn't enough to provide a
complete measure of the quality of the mean.
• We have assumed that the distribution of
values is random and in no way biased.
• As discussed earlier, the mean is not a
complete description of a series of values,
while the standard deviation provides a measure
of how the values are distributed about the mean.
• The assumption was that the values do not
have any bias.
• An estimate of this bias can be made by
examining the median and mode of the sample.
• The mode is simply the most commonly
occurring value.
• The name comes from French: the most fashionable value.
• Although there are many ways that the mode
can be determined, one of the most telling
is to create a frequency diagram.
• In this case the mean is subtracted from each
observation.
• Then the observations are binned.
• The frequency is the number of times a value
occurs within an interval.
• The class interval is the width of the bin the
observations are placed in.
• The size of the bin is chosen according to
convenience.
• For a presentation of temperature, one may select,
e.g., the number of hours during which the temperature was
within certain limits at a given place, and use 5°
intervals.
• In the case of daily high temperatures a 1° interval
may be a better choice.
• The next figure shows the frequency of
deviations of annual temperatures from
the long-period mean for 1871-1930
at Philadelphia, Pa.
• In the table the chosen temperature
deviation marks the middle of the class
for which the frequency was computed.
– Thus +3 marks deviations from +2.5 to
+3.4, and +2 marks deviations from +1.5 to +2.4.
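The binning can be sketched as follows: each deviation is assigned to the whole-number class whose middle it is nearest, so that +3 covers +2.5 to +3.4. The deviations listed here are made-up values for illustration, not the Philadelphia record, and the names `class_mark` and `frequency_table` are hypothetical.

```python
import math
from collections import Counter

def class_mark(deviation):
    """Whole-number class for a deviation: +3 covers +2.5 to +3.4, and so on."""
    return math.floor(deviation + 0.5)

def frequency_table(deviations):
    """Count how many deviations fall into each class."""
    return Counter(class_mark(d) for d in deviations)

# Made-up deviations from a long-period mean, for illustration only.
deviations = [2.5, 3.4, 1.5, 2.4, 0.2, -0.4, 0.9, 1.1, -1.6, 0.0]
table = frequency_table(deviations)
for mark in sorted(table):
    print(f"{mark:+d}: {table[mark]}")
```

A different class width, such as 5°, would need the deviation divided by the width before rounding.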
• The mode is roughly the tallest peak on the
histogram.
• In our case the mode is +1, because the
most commonly occurring value is +1.
• Since the mean is 54.4 °F, the mode is
55.4 °F.
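Reading the mode off a frequency table amounts to picking the class with the largest count and adding that deviation back to the mean. In the sketch below the counts are hypothetical, chosen only so that the tallest class is +1 as in the text; the mean of 54.4 is taken from the text.

```python
# Hypothetical frequency table: class mark (deviation from the mean) -> count.
frequencies = {-2: 3, -1: 9, 0: 12, 1: 15, 2: 8, 3: 2}

def mode_class(freq):
    """Class mark with the highest frequency (the tallest bar of the histogram)."""
    return max(freq, key=freq.get)

long_period_mean = 54.4  # degrees F, from the text
mode_deviation = mode_class(frequencies)
mode_temperature = long_period_mean + mode_deviation

print(f"mode deviation {mode_deviation:+d}, mode temperature {mode_temperature:.1f} F")
# mode deviation +1, mode temperature 55.4 F
```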
• A third measure of the mean's reliability
is the median, which is simply the value
that divides the list of values in half.
• That is, half the observations are less
than the median and half are
more than the median.
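For a plain list of values the median can be found by sorting and taking the middle. The sketch below applies this to the station B high temperatures given earlier; averaging the two middle values when the count is even is a common convention, assumed here.

```python
def median(values):
    """Middle value of the sorted list (mean of the two middle values if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

station_b = [47.9, 42.5, 41.2, 45.5, 40.1, 41.0, 46.8, 48.6, 47.0, 39.4]
print(median(station_b))  # 44.0: five values lie below and five above
```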
• As with the mode, the median is found most
easily by creating a frequency-deviation
diagram.
• The median is then the point where half
of the deviations are above and half below.
• If the mean were a true predictor of the
observations, this point would be the 0
deviation.
• Using the figure again, the median is 0 because
there are 19 values less than 0 and 24
values above 0.
• With the class interval chosen, this is as close
as can be expected.
• The last measure is the amount of skewness
in the values.
• For a symmetrical distribution of values the
mean, mode and median would all be the
same value.
• In the real world symmetrical distributions are
rare.
• The direction in which the distribution is skewed can often
provide an insight into the physical reasons
for the character of the values.
• Negative skewness is defined as mean <
median < mode, and positive skewness is
defined as mean > median > mode.
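This ordering rule is easy to apply once the three measures are known. The sketch below uses Python's `statistics` module on made-up samples; the function name `skew_direction` is illustrative, and the rule is the ordering definition given above, not a computed skewness coefficient.

```python
from statistics import mean, median, mode

def skew_direction(values):
    """Classify skew by the ordering of mean, median and mode."""
    m, med, mo = mean(values), median(values), mode(values)
    if m < med < mo:
        return "negative"
    if m > med > mo:
        return "positive"
    return "no clear skew by this rule"

# Made-up samples for illustration.
print(skew_direction([1, 1, 1, 2, 3, 4, 9]))  # positive: mean 3 > median 2 > mode 1
print(skew_direction([1, 6, 7, 8, 9, 9, 9]))  # negative: mean 7 < median 8 < mode 9
print(skew_direction([1, 2, 2, 3]))           # no clear skew by this rule
```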
