Professional Documents
Culture Documents
of Water Sciences, Key Laboratory for Water and Sediment Sciences Ministry of Education,
Beijing Normal University, 19 Xinjiekouwai Street, Beijing, 100875, China
2 College of Mathematic Sciences, Beijing Normal University, 19 Xinjiekouwai Street, Beijing, 100875, China
Correspondence to: C. Wang (chengw@knights.ucf.edu)
Received: 20 March 2014 Published in Nonlin. Processes Geophys. Discuss.: 30 April 2014
Revised: 6 October 2014 Accepted: 13 October 2014 Published: 1 December 2014
ARIMA model
Introduction
Published by Copernicus Publications on behalf of the European Geosciences Union & the American Geophysical Union.
1160
2.1
ARIMA model
For a stationary time series, an auto regressive moving average model (ARMA) (p, q) model is defined as follows:
yt = 1 yt1 + 2 yt2 + . . . + p ytp + ut 1 ut1
(1)
2 ut2 . . . q utq ,
where p denotes the autoregressive (AR) parameters, q represents the moving average (MA) parameters, the real parameters 1 , 2 , . . . , and p are called autoregressive coefficients, the real parameters j (j = 1, 2, . . . , q) are moving
average coefficients, and ut is an independent white noise sequence, i.e., ut N (0, 2 ). Usually the mean of {yt } is zero;
if not, yt0 = yt is used in the model.
Lag operator (B) is then introduced; thus
(B) = 1 1 B 2 B 2 . . . p B p ,
2
(B) = 1 1 B 2 B . . . q B ,
(2)
(3)
(4)
(5)
where d is the number of regular differencing. Then the corresponding ARIMA (p, d, q) model for yt can be built (Box,
1997), where d is the number of differencing passes by which
the nonstationary time series might be described as a stationary ARMA process.
2.2
If a time series is identified as nonstationary, differencing is usually made to stationarize the time series. In the
differencing method, the correct amount of differencing
is normally the lowest order of differencing that yields
a time series, which fluctuates around a well-defined
mean value, and whose autocorrelation function (ACF)
plot decays fairly rapidly to zero, either from above or
below. The time series is often transformed for stabilizing its variance through proper transformation, e.g.,
logarithmic transformation. Although logarithmic transformation is commonly used to stabilize the variance of
a time series rather than directly stationarize a time series, the reduction in the variance of a time series is usually helpful to reduce the order of difference in order to
make it stationary.
2. Identification of the order of ARIMA model: after a time
series has been stationarized, the next step is to determine the order terms of its ARIMA model, i.e., the order
of differencing, d for nonstationary time series, the order of auto-regression, p, the order of moving average,
q, and the seasonal terms if the data series show seasonality. While one could just try some different combinations of terms and see what works best strictly, the more
systematic and common way is to tentatively identify
the orders of the ARIMA model by looking at the autocorrelation function (ACF) and partial autocorrelation
(PACF) plots of the stationarized time series. The ACF
plot is merely a bar chart of the coefficients of correlation between a time series and lags of itself, and the
PACF plot presents a plot of the partial correlation coefficients between the series and lags of itself. The detailed guidelines for identifying ARIMA model parameters based on ACF and PACF, can be found elsewhere,
e.g., Pankratz (1983) and Shumway and Stoffer (2005).
It should be noted that, to be strict, the ARIMA model
built in this step is actually an ARMA model with if the
time series is stationary, which is in fact a special case
of ARIMA model with d = 0.
3. Estimation of ARIMA model parameters: while least
square methods (linear or nonlinear) are often used for
the parameter estimation, we use the maximum likelihood method (Mcleod and Sales, 1983; Melard, 1984)
in this paper. A t test is also performed to test the statistical significance.
4. White noise test for residual sequence: it is necessary to
evaluate the established ARIMA model with estimated
parameters before using it to make forecasting. We use
www.nonlin-processes-geophys.net/21/1159/2014/
1161
k0 , t 0
The autocorrelation of the data series is measured by the
autocorrelation coefficient which is defined as
n
P
rk =
et etk
t=k+1
n
P
(k = 1, 2, . . ., m),
(7)
et2
t=1
m
X
rk2
.
nk
k=1
(8)
(9)
Seasonal ARIMA models apply for time series which arranges in order with a certain time interval or step, e.g., a
month. However, in this case, while the seasonal ARIMA
model is capable of dealing with the interannual variation
of each monthly of a monthly data series, the information of
inter-monthly variation of the time series may be lost. For example, after an order of 12 of seasonal differencing (term S in
a general seasonal ARIMA model) of a monthly time series,
the original monthly series has been migrated to a new time
series without seasonality. A nonseasonal ARIMA model is
then fitted to the new time series where the inter-monthly
variation of original monthly time series has also migrated to
Nonlin. Processes Geophys., 21, 11591168, 2014
1162
Figure 2. Prediction steps of a ARIMA model based on clustering and regressive analysis.
(10)
www.nonlin-processes-geophys.net/21/1159/2014/
1163
Estimated
value
Standard
deviation
t test
Tail
probability
1
2
0.16379
0.93434
0.03959
0.02117
4.14
44.14
< .0001
< .0001
Figure 3. Monthly precipitation at the Lanzhou precipitation station. Upper panel: observation (19512001); middle panel: observation (19912000); lower panel: after power transformation
(19512001).
Case study
In this section, we are presenting an application of the proposed improved ARIMA model to the precipitation forecasting of the Lanzhou precipitation station in Lanzhou, China.
Lanzhou is located in the upper basin of Yellow River. It
has a continental climate of mid-temperate zone, with an
average precipitation of 360 mm and mean temperature of
10 C. In general, rainfall seasons are May through September, while drought occurs in spring and winter. The Lanzhou
www.nonlin-processes-geophys.net/21/1159/2014/
1164
2
statistic
Degree of
freedom
0.770
6.910
13.400
16.810
20.650
28.100
30.900
52.940
4
10
16
22
28
34
40
46
Autocorrelations of residue
Tail
probability
0.943
0.734
0.643
0.774
0.840
0.752
0.849
0.224
0.000
0.013
0.092
0.042
0.050
0.045
0.057
0.012
0.007
0.014
0.014
0.007
0.031
0.018
0.015
0.040
0.018
0.012
0.031
0.022
0.048
0.064
0.019
0.022
0.021
0.043
0.004
0.026
0.003
0.044
0.023
0.032
0.007
0.086
0.021
0.032
0.018
0.036
0.006
0.079
0.020
0.019
0.020
0.039
0.008
0.044
0.001
0.156
Autocorrelations of residue for lag 1 through lag 48, 6 lags per row from column 5 through 10.
Figure 4. Autocorrelation and partial correlation plots of data series. Upper panel: autocorrelation; lower panel: partial correlation.
Stationary identification, stationary treatment, model identification, parameter estimation and residual test are performed
for the 12 groups of data. The ACF and PACF plots after seasonal differencing are presented in Fig. 5. A total of
12 ARIMA models are built and the estimated parameters
are shown in Table 3.
4.3
1165
www.nonlin-processes-geophys.net/21/1159/2014/
1166
Model
ML parameter estimation
1
2
3
4
5
6
7
8
9
10
11
12
(1 B) yt = (1 B) ut
(1 B 2 ) yt = ut
yt = (1 B) ut
yt = (1 1 B 2 B 2 ) ut
yt = (1 B 2 ) ut
yt = (1 B) ut
yt = (1 B 2 ) ut
(1 B) yt = (1 B) ut
(1 B) yt = (1 B) ut
yt = (1 B) ut
(1 B) yt = (1 B) ut
(1 B) yt = (1 B) ut
= 0.95, = 0.97
= 0.49
= 0.38
1 = 0.27, 2 = 0.22
= 0.30
= 0.32
= 0.3349
= 0.182, = 0.0528
= 0.956, = 0.469
= 0.32
= 0.681, = 0.741
= 0.650, = 0.766
di
ai
1
2
11
12
0.16
0.21
0.54
0.16
0.09
0.12
0.30
0.27
0.39
1.21
1.51
0.89
0.23
0.14
0.62
0.53
3
4
10
1.92
0.39
1.53
0.50
0.57
1.07
0.46
2.33
0.21
0.53
0.62
0.09
5
6
7
8
9
2.17
0.19
0.22
2.11
0.35
0.41
0.22
0.27
1.07
0.72
0.22
1.49
1.05
0.24
2.01
0.98
0.35
0.35
0.05
0.33
Class
ci
bi
difference from that of AR(36) model; neither models perform as good as the improved ARIMA model or the conventional ARIMA model. However, the AR models give a better
prediction for September precipitation of 2001 than the other
two models.
While the improved ARIMA model catches the correct
trend overall and predicts the monthly precipitation in most
months with high accuracy, it predicts highly accurately for
the dry seasons, such as January, February, March, November, and December. However, it overestimates the precipitation of July and October and underestimates the September precipitation significantly. After a closer look at the data,
we find that the mean precipitations of July and October are
63.8 and 23.48 mm over the period of 1951 through 2000,
respectively, whereas the observation precipitations of both
www.nonlin-processes-geophys.net/21/1159/2014/
1167
Characteristic
variable
ARIMA model
ML parameter
estimating
Standard deviation
estimating
maximum
mean
minimum
(1 B)(1 B) yt = ut
(1 B) yt = (1 B) ut
(1 B)2 yt = (1 B)2 ut
0.56
0.92
0.84
0.13
0.07
0.09
< 0.0001
< 0.0001
< 0.0001
maximum
mean
minimum
(1 B) yt = (1 B)2 ut
(1 B 2 )(1 B)2 yt = ut
(1 B 2 )(1 B)2 yt = ut
0.30
0.52
0.64
0.14
0.12
0.11
0.00311
< 0.0001
< 0.001
maximum
mean
minimum
(1 B 2 )(1 B)2 yt = ut
(1 B)2 (1 B)2 yt = (1 B 4 ) ut
(1 B)2 (1 B)2 yt = (1 B 4 ) ut
0.45
0.82 0.81
0.81 0.80
0.20
0.12
0.16
0.17
0.13
< 0.0001
< 0.0001
Value of P
0.0006
1
2
3
4
5
6
7
8
9
10
11
12
Observation
(mm)
2.8
1.9
0
22.2
11.1
33
39.5
69.8
82
5.2
1.9
0.9
Prediction by
improved ARIMA
model (mm)
Prediction by
conventional ARMA
model (mm)
Prediction by 12
seasonal ARIMA
models (mm)
Prediction by AR(24)
model (mm)
Prediction by AR(36)
model (mm)
prediction
residual
prediction
residual
prediction
residual
prediction
residual
prediction
residual
2.54
1.897
0.099
12.32
12.61
33.58
60.26
72.92
32.5
32.03
1.532
0.898
0.25
0.003
0.099
9.871
1.515
0.582
20.76
3.12
49.5
26.83
0.368
0.002
0
0
5.38
11.99
31.26
41.28
64.88
71.82
37.98
20.15
0
0
2.8
1.9
5.38
10.21
20.16
8.28
25.38
2.02
44.02
14.95
1.9
0.9
1.14
3.58
12.10
12.32
33.17
38.16
47.19
84.12
35.17
24.37
2.68
0.94
1.66
1.68
12.10
9.88
22.07
5.16
7.69
14.32
46.83
19.17
0.78
0.04
0.27
6.4
4.89
5.81
6.49
77.86
22.55
110.5
65.89
55.45
3.9
0
2.53
4.5
4.89
16.3
4.61
44.86
16.9
40.72
16.11
50.25
2
0.9
0.57
6.4
5.24
7.25
12.05
79.75
20.09
114.5
63.2
58.78
3.79
0
2.23
4.5
5.24
14.9
0.95
46.75
19.4
44.73
18.8
53.58
1.89
0.9
9.41
11.49
www.nonlin-processes-geophys.net/21/1159/2014/
11.78
17.05
17.82
1168
References
Ahmad, S., Khan, I. H., and Parida, B. P.: Performance of stochastic
approaches for forecasting river water quality, Water Res., 35,
42614266, 2001.
Box, G. E. P.: Models for forecasting seasonal and nonseasonal time
series, in: Spectral Analysis of Time Series, edited by: Harris, B.,
Wiley, New York, 271311, 1967.
Box, G. E. P.: Time Series Analysis Forecasting And Control, China
Statistics Press, Beijing, China, 1997.
Ip, W. C., Wong, H., and Wang, S. G.: A GIC rule for assessing data
transformation in regression, Statist. Probab. Lett., 68, 105110,
2004.
Jin, J. L., Ding, J., and Wei, Y. M.: Threshold autoregressive model
based on genetic algorithm and its application to forecasting the
shallow groundwater level, Hydraul. Eng., 27, 5155, 1999.
Kohn, R.: Estimation, prediction, and interpolation for ARIMA
models with missing data, J. Am. Stat. Assoc., 81, 751761,
1986.
Lehmann, A. and Rode, M.: Long-term behaviour and crosscorrelation water quality analysis of the River Elbe, Germany,
Water Res., 35, 21532160, 2001.
Mcleod, A. I. and Sales, P. R. H.: An algorithm for approximate
likelihood calculation of ARMA and seasonal ARMA models,
Appl. Stat., 32, 211223, 1983.
Melard, G.: A fast algorithm for the exact likelihood of
autoregressive-moving average models, Appl. Stat., 33, 104119,
1984.
Meloun, M., Snka, M., and Nemec, P.: The analysis of soil cores
polluted with certain metals using the BoxCox transformation,
Environ. Pollut., 137, 273280, 2005.
Niua, X. F., Edmiston, H. L., and Bailey, G. O.: Time series models
for salinity and other environmental factors in the Apalachicola
estuarine system. Estuarine, Coast. Shelf Sci., 46, 549563,
1998.
Pankratz, A.: Forecasting with Univariate BoxJenkins Models:
Concepts and Cases, Wiley, New York, 1983.
Qi, W. and Zhen, M. P.: Winters and ARIMA model analysis of the
lake level of salt Lake Zabuye, Tibetan Plateau, J. Lake Sci., 18,
2128, 2006.
Shumway, R. H. and Stoffer, D. S.: Time Series Analysis and Its Applications Springer-Verlag New York, Inc., Secaucus, NJ, USA,
2005.
Sun, L. H., Zhao, Z. G., and Xu, L.: Study of summer rain pattern in
monsoon region of east China and its circulation cause, J. Appl.
Meteorol. Sci., 16, 5762, 2005.
Thyer, M., Kuczera, G., and Wang, Q. J.: Quantifying parameter
uncertainty in stochastic models using the BoxCox transformation, J. Hydrol., 265, 246257, 2002.
Toth, E. A. and Montanari, B. A.: Real-time flood forecasting via
combined use of conceptual and stochastic models, Phys. Chem.
Earth B, 24, 793798, 1999.
Wang, H. R., Ye, L. T., and Liu, C. M.: Problems in wavelet analysis of hydrologic series and some suggestions on improvement,
Prog. Nat. Sci., 17, 8086, 2007.
Wang, Y., Yin, L. Z., and Zhang, Y.: Judging model of fuzzy optimal dividing based on improved objective function clustering
method, Math. Pract. Theory, 35, 142147, 2005.
www.nonlin-processes-geophys.net/21/1159/2014/