Abstract-Extreme value theory is discussed in a manner suitable for scientists working in the air pollution
area. The method of application of the theory is described by means of an example analysis on an ozone data
set and the theory is applied to several data sets collected in Brisbane, Queensland, Australia. The theory is
used to predict the number of violations of WHO and U.S. standards expected in the year following data
collection. Comparison of these predictions with the relevant observations shows that the theory does quite
well. Ways of using extreme value theory as an air quality management tool are suggested.
Key word index: Extreme value theory, air pollution episodes, air pollution standards.
[Figure 1 plot. Axes: probability scale; horizontal scale from -2 to 7; vertical scale from 40 to 340.]
Fig. 1. Observed and theoretical largest 1-h ozone concentration per month, Brisbane, across three sites.
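A plot of the kind shown in Fig. 1 can be sketched in a few lines. The sketch below is illustrative only: Equations (5) to (7) are not reproduced in this section, so the plotting position 1 - r/(n + 1), the reduced variate y = -ln(-ln p) and the ordinary least-squares fit of x on y are assumptions, and the function name `gumbel_plot_fit` and the synthetic data are hypothetical rather than the Brisbane record.

```python
import math


def gumbel_plot_fit(maxima):
    """Fit the line x = u + y / alpha to ranked maxima on Gumbel axes.

    Assumed forms (not taken from the paper): plotting position
    p = 1 - r / (n + 1) for descending rank r, reduced variate
    y = -ln(-ln p), and a least-squares regression of x on y.
    Returns (alpha, u, r_squared).
    """
    n = len(maxima)
    xs = sorted(maxima, reverse=True)  # rank 1 is the largest value
    probs = [1.0 - r / (n + 1.0) for r in range(1, n + 1)]
    ys = [-math.log(-math.log(p)) for p in probs]

    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)

    slope = sxy / syy              # slope of x on y equals 1 / alpha
    u = x_mean - slope * y_mean    # intercept: value of x at y = 0
    r_squared = sxy ** 2 / (sxx * syy)
    return 1.0 / slope, u, r_squared


# Synthetic monthly maxima lying exactly on a known line (hypothetical values).
n_months = 96
true_alpha, true_u = 0.05, 100.0
synthetic = [
    true_u - math.log(-math.log(1.0 - r / (n_months + 1.0))) / true_alpha
    for r in range(1, n_months + 1)
]

alpha, u, r_squared = gumbel_plot_fit(synthetic)
print(f"alpha = {alpha:.4f}, u = {u:.1f}, r^2 = {r_squared:.3f}")
```

Because the synthetic points are generated on the line itself, the fit recovers the chosen parameters; real monthly maxima would scatter about the line and give a coefficient of determination below one.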
Table 1. 95% confidence intervals about the line (y vs x) representing G_{1,n}(x)

Rank of observed maximum value, r    1           2           3           4
Δx                                   ±3.07/α̂    ±1.78/α̂    ±1.35/α̂    ±1.17/α̂

Adapted from Roberts, 1979b.

Note that points that fall outside the confidence bands need not necessarily be erroneous, but in order that the extreme value theory may represent the data they are excluded (Roberts, 1979b).

3. APPLICATION

It is obvious that no matter how long the data record analysed, there is a certain probability that longer observation will produce a value larger than the existing observed maximum. However, a compromise must be made between sample size, time available for the experiment, available funds and other factors, and the confidence with which one desires to make predictions from the data. The objective is to establish the total number of observations, T, such that for a given confidence of (1 - p), fewer than i values in the next z observations will be greater than the maximum observed in the sample of T. Roberts (1979b) shows that

$$ T = \frac{z}{t}, \quad (i = 1) \tag{9} $$

This is interpreted as, for example, if a confidence of 90% is required that no values greater than the observed maximum should be observed in the following year, 9 years of data should be analysed. If one is less stringent and is content with the one value in z future trials being greater than the observed maximum of the experimental record then

$$ T = \frac{z}{t + \sqrt{t^{2} + t}}, \quad (i = 2) \tag{10} $$

where t = p/(1 - p) (Roberts, 1979b). Hence for a 90% confidence that only one value in the following year will exceed the maximum of the record, 2.2 years of data should be analysed.

The formulae of (9) and (10) are meant as guidelines for the experimental design. They simply reinforce the concept that the longer the data record the more confidence one has in the predictions that are made from it. This is especially important in the air quality area, where the data record should reflect the variety of meteorological conditions which are experienced in an area and hence will include those conditions which lead to air pollution episodes. It should be noted, however, that the longer the data record the more likely that the emissions may exhibit a trend, either due to more emission sources or, conversely, due to tighter emission control. Most developed countries have instigated continuous air monitoring programmes and hence records of good length and quality are becoming available.

(i) The data

In Queensland such a programme was established in Brisbane at the inner city Fortitude Valley site in 1975; at the SW suburban site of Rocklea in 1977; and at the near coastal site of Eagle Farm in 1978. Of the data collected at these sites, the following were analysed: Fortitude Valley, 1976-1983, NO2, CO, O3; Rocklea, 1977-1983, O3, 1978-1983, NO2; Eagle Farm, 1978-1983, O3, 1979-1983, NO2. In addition the
1846 P. G. SURMAN et al.
highest O3 concentration across these three Brisbane sites for 1976-1983 was analysed. Note that from Equation (9) we can say that for 8, 7 and 6 years of data we are 89, 88 and 86%, respectively, confident that a maximum greater than that of the data record will not occur in the following year. Perusal of the 1984 data shows that no daily maximum 1-h concentration was greater than the maximum of the respective record up to the end of 1983.

At the sites, half-hourly averages of ambient air pollutant concentrations are recorded continuously, i.e. there are 17,520 observations for each pollutant at each site each year. The 1-h averages are computed as the arithmetic averages of pairs of 1/2-h concentrations. An assumption of independence of such 1-h observations will not be valid; however, if we consider the initial random variable, X, as the maximum 1-h average in a day, then the assumption of independence is strengthened. We will confine our analysis to this definition of X.

(ii) Example analysis: ozone, Brisbane

If we consider that within any month, i, there are, on average, 30 observations of the daily maximum 1-h O3 concentration, then these can be ordered from highest to lowest in the form x_{i,1} > x_{i,2} > ... > x_{i,30}. Repeating the process for each month of the 8 years of data available results in 96 such ordered sequences, the first of these being denoted as x_{1,1}, x_{1,2}, ..., x_{1,30} and the last as x_{96,1}, x_{96,2}, ..., x_{96,30}. By selecting the highest value, x_{i,1}, from each sequence we derive another sequence of 96 observations which represent realizations of the random variable X, i.e. the maximum 1-h concentration in a day. (As most standards of interest are written in terms of maximum 1-h concentrations, other derived sequences are not analysed here.) These final data were obtained from the Annual Reports of the Air Quality Council of Queensland. Missing values in the X_1 data set were replaced by the relevant monthly mean, as graphing of the data showed that there was a seasonal effect but no annual trend. For example, the missing observation for March 1976 for NO2 at the Fortitude Valley site was replaced by the average of the remaining March observations of NO2 at this site.

The 96 observations are ranked from highest to lowest so that a new ordered sequence x_1(r); r = 1, ..., 96 results. The probability of a given x_1 value having rank r out of a sample of 96 is calculated using Equation (5) and the corresponding asymptotic variate y_1(r) is calculated from Equation (6). If a value is repeated j times it is assigned j successive ranks from r to r + j - 1 for the purpose of fitting the theoretical line. For the purpose of plotting, the points corresponding to the first and last ranks of the repeated value are shown connected by a line. The point shown on the interval is the value of y_1 corresponding to the rank \sqrt{r(r + j - 1)} (Roberts, 1979b). See for example the value of 118 µg m^-3 in Fig. 1.

The parameters of the theoretical line are determined using Equation (7). It can be seen from Fig. 1 that the asymptotic theory of extremes fits the monthly maximum well (coefficient of determination = 0.995).

(iii) Results

Using Table 1, confidence intervals were placed about the four highest ranked observations of each data set. It was found that the largest value for NO2 at Fortitude Valley of 642 µg m^-3 did not lie within the confidence interval for the highest ranked value (195 µg m^-3 < x(1) < 565 µg m^-3). The plot of these data is shown in Fig. 2. For the application of the
[Figure 2 plot. Axes: probability scale; horizontal scale from -2 to 7; vertical scale from 160 to 640.]
Fig. 2. Observed and theoretical largest 1-h nitrogen dioxide concentration per month, Fortitude Valley site.
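The screening of the highest ranked values against the Table 1 bands can be sketched as follows. This is a minimal illustration, not the authors' program: it assumes the band is symmetric about the fitted line, the function names are hypothetical, and the line value and 1/α̂ used here are back-calculated from the interval quoted in the text (195 to 565 µg m^-3) rather than taken from Table 2.

```python
# 95% half-width coefficients for the four highest ranks, as listed in Table 1.
BAND_COEFF = {1: 3.07, 2: 1.78, 3: 1.35, 4: 1.17}


def confidence_band(x_line, inv_alpha, rank):
    """Return (lower, upper) limits about the fitted-line value x_line.

    inv_alpha is 1/alpha, the reciprocal of the fitted scale parameter.
    """
    half_width = BAND_COEFF[rank] * inv_alpha
    return x_line - half_width, x_line + half_width


def is_discordant(x_obs, x_line, inv_alpha, rank):
    """True when the observed value falls outside the band for its rank."""
    lower, upper = confidence_band(x_line, inv_alpha, rank)
    return not (lower < x_obs < upper)


# Illustrative inputs implied by the quoted interval 195 < x(1) < 565
# (assumption: the interval is centred on the fitted line).
x_line_rank1 = (195.0 + 565.0) / 2.0          # midpoint of the interval
inv_alpha = (565.0 - 195.0) / 2.0 / 3.07      # 1/alpha implied by the half-width

lower, upper = confidence_band(x_line_rank1, inv_alpha, rank=1)
flag = is_discordant(642.0, x_line_rank1, inv_alpha, rank=1)
print(f"{lower:.0f} < x(1) < {upper:.0f}; 642 discordant: {flag}")
```

With these inputs the observed maximum of 642 µg m^-3 lies above the upper limit, which is the situation described for NO2 at Fortitude Valley.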
Frequency of air pollution episodes using extreme value theory 1847
extreme value theory this data point was considered discordant with the rest of the data set and was replaced with the relevant monthly mean and the analysis was repeated.

The equations for the theoretical lines for each data set, together with the coefficients of determination, appear in Table 2. Note the improvement in the coefficient of determination from 0.792 to 0.979 for NO2 at Fortitude Valley when the discordant value was replaced.

The return period for any concentration can be read directly from the appropriate graph or calculated as in the following example. Consider the Australian National Health and Medical Research Council (NHMRC) long-term goal for O3, i.e. the daily maximum 1-h concentration must not exceed 240 µg m^-3 on more than one occasion per year. For Brisbane, using the appropriate Equation (4) from Table 2, y_1(240) = 2.71828 and using Equation (3), G_{1,30} = 0.936142, which from Equation (8) yields a return period, R(240) = 15.66 months, i.e. it is expected that the value of 240 µg m^-3 for the daily maximum 1-h O3 concentration will be equalled or exceeded once in 16 months. This shows that this standard would be expected to be met in the following year. Consider now the World Health Organization (WHO) long-term goal of a 1-h concentration of 120 µg m^-3. For the Brisbane data set, its return period is 1.31 months, i.e. in the coming year we would expect this goal to be equalled or exceeded as the daily maximum 1-h concentration on 10 occasions. Note that the WHO goal specifies only a 1-h concentration and not the daily maximum 1-h concentration and hence the prediction is a lower bound on the number of violations expected. Table 3 presents the relevant standards, their return periods and the number of violations expected and observed in the coming year for each pollutant and site analysed.

(iv) Air quality management

The maximum expected for a certain return period can be used in source reduction management problems
Table 2 column headings: Data set | Equation of asymptote | Coefficient of determination

Table 3. Air quality goals/standards, return periods and expected and observed number of violations
* As defined in text.
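The return-period arithmetic behind entries of this kind can be sketched as below. Equations (3) and (8) are not reproduced in this section, so the double-exponential form G = exp(-exp(-y)) and the relation R = 1/(1 - G) are assumptions; they do reproduce the values quoted in the text for the NHMRC ozone goal (0.936142 and 15.66 months). The function names are hypothetical.

```python
import math


def gumbel_cdf(y):
    """Assumed form of Equation (3): probability that the monthly maximum
    does not exceed the concentration whose reduced variate is y."""
    return math.exp(-math.exp(-y))


def return_period(y):
    """Assumed form of Equation (8): mean number of months between
    occasions on which the concentration is equalled or exceeded."""
    return 1.0 / (1.0 - gumbel_cdf(y))


# Reduced variate quoted in the text for 240 ug m^-3 (Brisbane ozone).
y_240 = 2.71828

prob = gumbel_cdf(y_240)
period_months = return_period(y_240)
expected_per_year = 12.0 / period_months  # mean occurrences in 12 months

print(f"G = {prob:.6f}, R = {period_months:.2f} months, "
      f"expected per year = {expected_per_year:.2f}")
```

A return period longer than 12 months corresponds to fewer than one expected occurrence per year, which is why the goal is expected to be met in the following year.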
where attainment of the standard may not be immediately feasible. For example, over 97% of CO entering the Brisbane airshed results from motor vehicle emissions. Examination of Table 3 shows that the maximum 1-h concentration of CO at the Fortitude Valley site will equal or exceed the U.S. standard on 3 days per year (compared with 23 such occurrences in the 8 years prior to 1984). Of course there may be several violations of the standard on any one of those days. The value expected to be equalled or exceeded once per year is 49 mg m^-3. Hence, on average, a reduction of 9 mg m^-3 needs to be achieved at this site. Using the information contained in an emission inventory for the airshed, this can be translated into either a required reduction in the number of vehicles passing the site or an efficiency factor for use with emission control devices for motor vehicles. Most large cities have emission inventories. If an immediate reduction of 9 mg m^-3 is not feasible, then different strategies for various levels of reduction can be evaluated by comparing the return periods of the concentrations expected. A similar approach can be used for NO2 emissions at sites near roadways.

Extreme value theory may also be useful for the prediction of (and hence forward planning for) severe air pollution episodes (analogous to the 10 or 50 year flood heights in hydrology). Given the long lead time required to enact legislation, and devise and install emission controls in industry and motor vehicles, such time spans are not irrelevant. The concept of the 50-year concentration is also important in that it makes workers in the air pollution area aware that very large concentrations can still occur despite stringent control measures.

Unlike the usual empirical models used in the air pollution area, the extreme value theory model does not rely on the assumption of a specific frequency distribution of the initial data. It is a statistical model and is predictive into the future. The confidence one has in such predictions is increasing as longer records become available from continuous air monitoring networks. As the experimental record increases in length, diverse meteorological conditions will be reflected in the data. These meteorological conditions are controlling variables of the stochastic process which results in air pollution concentrations. The other controlling variables in the process are the levels of emissions. Obviously these will vary through time due to either tighter control or increased sources of emissions in a given area. The resulting effect may be a trend to either lower or higher concentrations in the record, which should be removed from the data prior to the analysis.

The model's main use is in the prediction of the number of violations expected for a standard in a given period or the return period for any given value. It is also useful as an air management tool for the presentation of scenarios to aid decision makers to determine needed source reductions and for predicting severe air pollution episodes.

Acknowledgements-The authors express their appreciation to Chris Czarkowski for writing the computer program for the analysis and to the staff of the Division of Air Pollution Control, Queensland, Australia for their assistance with the data.

REFERENCES

Air Pollution Council of Queensland, Annual Reports (19X-1984) Government Printer, Brisbane, Australia.
Barlow R. E. (1972) Averaging time and maxima for air pollution concentrations. Proc. 6th Berkeley Symp. on Mathematical Statistics. University of California, Berkeley, CA.
Barlow R. E. and Singpurwalla N. (1974) Averaging time and maxima for dependent observations. Proc. Symp. on Statistical Aspects of Air Quality Data. Report No. EPA 650/4-74-038. U.S. Environmental Protection Agency, Research Triangle Park, NC.
Berger A., Melice J. L. and Demuth C. L. (1982) Statistical distributions of daily and high atmospheric SO2 concentrations. Atmospheric Environment 16, 2563.
Gumbel E. J. (1954) Statistical Theory of Extreme Values and Some Practical Applications. National Bureau of Standards, Applied Mathematics Series 33, U.S. Govt. Printing Office.
Gumbel E. J. (1958) Statistics of Extremes. Columbia University Press, Columbia.
Horowitz J. and Barakat S. (1979) Statistical analysis of the maximum concentration of an air pollutant: effects of autocorrelation and non-stationarity. Atmospheric Environment 13, 811-818.
Larsen R. I. (1969) A new mathematical model of air pollutant concentration averaging time and frequency. J. Air Pollut. Control Ass. 19, 24.
Roberts E. M. (1979a) Review of statistics of extreme values with applications to air quality data-I. Review. J. Air Pollut. Control Ass. 29, 632.
Roberts E. M. (1979b) Review of statistics of extreme values with applications to air quality data-II. Applications. J. Air Pollut. Control Ass. 29, 733.
Singpurwalla N. (1972) Extreme values from a log normal law with applications to air pollution problems. Technometrics 14, 703.