Contents
1. Introduction to Time Series
2. Common elements and Decomposition of Time-Series
Trend, Cyclical, Seasonal and Error (Random) components
3. Methods of aggregating, interpolating and smoothing data and trend analysis
4. Introduction to Regression methods (Auto Regression and moving averages)
5. Case Study (Airlines)
6. Box and Jenkins (ARIMA) methodology using a case study
Stationarity, Autocorrelation, Differencing
7. Case Study – Retail Sales
8. Intervention Analysis
9. ARCH and GARCH Models
Time Series Analysis
Time series analysis comprises methods that attempt to understand such time series,
often either to understand the underlying context of the data points (Where did they come
from? What generated them?), or to make forecasts (predictions).
Obtaining an understanding of the underlying forces and structure that produced the
observed data.
Fitting a model and proceeding to forecasting, monitoring, or even feedback and
feed-forward control.
Assumptions of Time Series Analysis
As in most other analyses, in time series analysis it is assumed that the data consist of a
systematic pattern (usually a set of identifiable components) and random noise (error).
Key Assumptions:
The pattern of the past will continue into the future (otherwise the series represents a
random walk, which cannot be modelled or forecast).
Discrete time series data should be equally spaced over time. There should be no missing
values in the training data set (or they should be handled using appropriate missing-value
treatment in the data preparation process).
Time series analysis cannot be used to predict the effect of random events (e.g., terrorist
attacks, or natural disasters such as a tsunami).
Decomposition of Time Series
The decomposition of time series is a statistical method that deconstructs a time series into notional
components. There are two principal types of decomposition, additive and multiplicative, which are
outlined below.
Hence, Y = f (T, S, C, I)
where Y denotes the result of the four elements; T = trend; S = seasonal component; C = cyclical
component; I = irregular component.
Additive model: Y = T + S + C + I
Multiplicative model: Y = T * S * C * I
Exponential smoothing: whereas in a simple moving average the past observations are weighted
equally, exponential smoothing assigns exponentially decreasing weights as the observations get older.
The simplest form of exponential smoothing is given by the formula:
s(t) = α*x(t) + (1 − α)*s(t−1)
where α is the smoothing factor, and 0 < α < 1. In other words, the smoothed statistic s(t) is a simple
weighted average of the latest observation x(t) and the previous smoothed statistic s(t−1). Simple
exponential smoothing is easily applied, and it produces a smoothed statistic as soon as two
observations are available.
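As a concrete illustration, the update rule above can be implemented in a few lines of Python (the function name and sample data here are illustrative, not from the original deck):

```python
def exp_smooth(x, alpha):
    """Simple exponential smoothing: s(t) = alpha*x(t) + (1 - alpha)*s(t-1)."""
    s = [float(x[0])]  # initialise with the first observation
    for xt in x[1:]:
        s.append(alpha * xt + (1 - alpha) * s[-1])
    return s

smoothed = exp_smooth([10, 12, 11, 15], alpha=0.5)
```

Each smoothed value blends the newest observation with the previous smoothed statistic; a larger α makes the smoothed series track the raw data more closely.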
Smoothing (Contd.)
By direct substitution of the defining equation for simple exponential smoothing back into itself we find
that
s(t) = α*x(t) + α(1−α)*x(t−1) + α(1−α)²*x(t−2) + ... + (1−α)^t * s(0)
In other words, as time passes the smoothed statistic s(t) becomes the weighted average of a greater and
greater number of the past observations x(t−n), and the weights assigned to previous observations are in
general proportional to the terms of the geometric progression 1, (1−α), (1−α)², (1−α)³, ... A geometric
progression is the discrete version of an exponential function, which is where the name of this
smoothing method originated.
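This equivalence can be checked numerically: unrolling the recursion gives exactly the same result as the explicit geometrically weighted sum (the toy numbers below are illustrative):

```python
alpha = 0.3
x = [5.0, 8.0, 6.0, 9.0, 7.0]

# Recursive form, with s(0) taken as the first observation
s = x[0]
for xt in x[1:]:
    s = alpha * xt + (1 - alpha) * s

# Explicit form: weights proportional to 1, (1-alpha), (1-alpha)^2, ...
t = len(x) - 1
expanded = alpha * sum((1 - alpha) ** n * x[t - n] for n in range(t))
expanded += (1 - alpha) ** t * x[0]  # residual weight on the initial value
```

The two forms agree to machine precision, which is the point of the derivation above.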
Seasonal and Non-seasonal Models With or Without Trend
The concept of simple exponential smoothing introduced the basic procedure for identifying a smoothing
parameter.
In addition to simple exponential smoothing, more complex models have been developed to
accommodate time series with seasonal and trend components.
The general idea here is that forecasts are not only computed from consecutive previous observations
(as in simple exponential smoothing), but an independent (smoothed) trend and seasonal component
can be added.
In general the one-step-ahead forecasts are computed as follows (for models with no trend; for linear
and exponential trend models a trend component is added; see below):
Additive model:
Forecast(t) = S(t) + I(t−p)
Multiplicative model:
Forecast(t) = S(t)*I(t−p)
In these formulas, S(t) stands for the (simple) exponentially smoothed value of the series at time t, and
I(t−p) stands for the smoothed seasonal factor at time t minus p (the length of the season). Thus,
compared to simple exponential smoothing, the forecast is "enhanced" by adding or multiplying the
simple smoothed value by the predicted seasonal component.
We can extend the previous example to illustrate the additive and multiplicative trend-cycle components.
In terms of our toy example, a "fashion" trend may produce a steady increase in sales (e.g., a trend
towards more educational toys in general); as with the seasonal component, this trend may be additive
(sales increase by 3 million dollars per year) or multiplicative (sales increase by 30%, or by a factor of
1.3, annually) in nature.
In addition, cyclical components may impact sales; to reiterate, a cyclical component differs from a
seasonal component in that it is usually of longer duration and occurs at irregular intervals. For
example, a particular toy may be especially "hot" during a summer season (e.g., a particular doll that
is tied to the release of a major children's movie and is promoted with extensive advertising). Again,
such a cyclical component can affect sales in an additive or a multiplicative manner.
Airlines Case Study
This classic case comprises monthly international airline passenger totals (measured in thousands)
for twelve consecutive years, 1949 to 1960.
Year 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960
Jan 112 115 145 171 196 204 242 284 315 340 360 417
Feb 118 126 150 180 196 188 233 277 301 318 342 391
Mar 132 141 178 193 236 235 267 317 356 362 406 419
Apr 129 135 163 181 235 227 269 313 348 348 396 461
May 121 125 172 183 229 234 270 318 355 363 420 472
Jun 135 149 178 218 243 264 315 374 422 435 472 535
Jul 148 170 199 230 264 302 364 413 465 491 548 622
Aug 148 170 199 242 272 293 347 405 467 505 559 606
Sep 136 158 184 209 237 259 312 355 404 404 463 508
Oct 119 133 162 191 211 229 274 306 347 359 407 461
Nov 104 114 146 172 180 203 237 271 305 310 362 390
Dec 118 140 166 194 201 229 278 306 336 337 405 432
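A quick numerical look at the first and last year of the table already reveals the two dominant features of this series, an upward trend and a summer seasonal peak (a sketch assuming NumPy; the values are copied from the table above):

```python
import numpy as np

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
y1949 = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118])
y1960 = np.array([417, 391, 419, 461, 472, 535, 622, 606, 508, 461, 390, 432])

growth = y1960.sum() / y1949.sum()         # traffic roughly quadrupled
peak_1949 = months[int(np.argmax(y1949))]  # seasonal peak month in 1949
peak_1960 = months[int(np.argmax(y1960))]  # same peak month twelve years later
```

The seasonal peak falls in the same month in both years, while total traffic grows by a large factor: a trend plus a stable seasonal pattern, exactly the structure the decomposition methods above are designed for.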
Introduction to ARIMA
The modeling and forecasting procedures discussed so far assumed knowledge of the mathematical
model of the process. However, in real-life research and practice, patterns in the data are often
unclear, individual observations involve considerable error, and we still need not only to
uncover the hidden patterns in the data but also to generate forecasts.
The ARIMA methodology developed by Box and Jenkins allows us to do just that; it has
gained enormous popularity in many areas and research practice confirms its power and
flexibility. However, because of its power and flexibility, ARIMA is a complex technique; it is
not easy to use, it requires a great deal of experience, and although it often produces
satisfactory results, those results depend on the researcher's level of expertise. The
following sections will introduce the basic ideas of this methodology.
Two Common Processes
Autoregressive process. Most time series consist of elements that are serially dependent
in the sense that one can estimate a coefficient or a set of coefficients that describe
consecutive elements of the series from specific, time-lagged (previous) elements. This can
be summarized in the equation:
x(t) = ξ + φ1*x(t−1) + φ2*x(t−2) + φ3*x(t−3) + ... + ε(t)
Where:
ξ is a constant (intercept), and
φ1, φ2, φ3 are the autoregressive model parameters.
Put in words, each observation is made up of a random error component (random shock, ε)
and a linear combination of prior observations.
– Stationarity requirement. Note that an autoregressive process will only be stable if the
parameters are within a certain range; for example, if there is only one autoregressive
parameter then φ1 must fall within the interval −1 < φ1 < 1. Otherwise, past effects would
accumulate and the values of successive x(t)'s would move towards infinity, that is, the
series would not be stationary. If there is more than one autoregressive parameter, similar
restrictions on the parameter values can be defined.
Moving average process. Independent from the autoregressive process, each element in
the series can also be affected by the past error (or random shock) that cannot be
accounted for by the autoregressive component, that is:
x(t) = µ + ε(t) − θ1*ε(t−1) − θ2*ε(t−2) − θ3*ε(t−3) − ...
Where:
µ is a constant,
and θ1, θ2, θ3 are the moving average model parameters.
Put in words, each observation is made up of a random error component (random shock, ε)
and a linear combination of prior random shocks.
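The two processes are easy to simulate; the snippet below (an illustrative sketch using NumPy, not part of the original deck) generates an AR(1) series with φ1 = 0.7 and an MA(1) series with θ1 = 0.5, and checks their lag-1 autocorrelations against the theoretical values:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
eps = rng.standard_normal(n)  # the random shocks

# AR(1): x(t) = phi*x(t-1) + eps(t), stationary because |phi| < 1
phi = 0.7
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = phi * ar[t - 1] + eps[t]

# MA(1): x(t) = eps(t) - theta*eps(t-1)
theta = 0.5
ma = eps.copy()
ma[1:] -= theta * eps[:-1]

acf_ar = np.corrcoef(ar[:-1], ar[1:])[0, 1]  # approx. phi = 0.7
acf_ma = np.corrcoef(ma[:-1], ma[1:])[0, 1]  # approx. -theta/(1+theta**2) = -0.4
```

The AR series carries its dependence forward indefinitely (each value feeds the next), while the MA series only "remembers" one shock back, which is why its autocorrelation cuts off after lag 1.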
ARIMA Methodology
For example, the model (0,1,2)(0,1,1) describes a model that includes no autoregressive
parameters, 2 regular moving average parameters, and 1 seasonal moving average
parameter; these parameters are computed for the series after it has been differenced
once with lag 1 and once seasonally.
The seasonal lag used for the seasonal parameters is usually determined during the
identification phase and must be explicitly specified.
The general recommendations concerning the selection of parameters to be estimated
(based on ACF and PACF) also apply to seasonal models.
The main difference is that in seasonal series, the ACF and PACF will show sizable
coefficients at multiples of the seasonal lag (in addition to their overall patterns reflecting
the non-seasonal components of the series).
Step 1: Identification
Identification
– As mentioned earlier, the input series for ARIMA needs to be stationary, that is, it should
have a constant mean, variance, and autocorrelation through time.
– Therefore, usually the series first needs to be differenced until it is stationary (this also
often requires log transforming the data to stabilize the variance).
– The number of times the series needs to be differenced to achieve stationarity is reflected
in the d parameter.
– In order to determine the necessary level of differencing, one should examine the plot of
the data and autocorrelogram.
– Significant changes in level (strong upward or downward changes) usually require first-
order non-seasonal (lag = 1) differencing; strong changes of slope usually require second-
order non-seasonal differencing.
– At this stage we also need to decide how many autoregressive (p) and moving average
(q) parameters are necessary to yield an effective but still parsimonious model of the
process (parsimonious means that it has the fewest parameters and greatest number of
degrees of freedom among all models that fit the data). In practice, the p or q parameters
very rarely need to be greater than 2.
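Differencing itself is a one-line operation. In the toy series below (illustrative, with a linear trend plus a period-4 seasonal pattern), one regular difference removes the trend and one seasonal difference removes the seasonality, leaving a flat series:

```python
import numpy as np

season = np.array([0.0, 5.0, 2.0, -3.0])      # period-4 seasonal pattern
x = 2.0 * np.arange(24) + np.tile(season, 6)  # linear trend + seasonality

d1 = np.diff(x)          # first-order non-seasonal differencing (lag = 1)
d1s = d1[4:] - d1[:-4]   # seasonal differencing of the result (lag = 4)
# both the trend and the seasonal pattern are now removed
```

On real data the result is never exactly flat, but the same two operations (with lag 12 for monthly data) are the standard route to stationarity in the identification step.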
Examining Correlograms
Autocorrelation occurs when residual error terms from observations of the same variable at
different times are correlated. Autocorrelation function (ACF) describes the strength of the
relationships between different points in the series.
While examining correlograms one should keep in mind that autocorrelations for
consecutive lags are formally dependent. Consider the following example. If the first
element is closely related to the second, and the second to the third, then the first element
must also be somewhat related to the third one, etc. This implies that the pattern of serial
dependencies can change considerably after removing the first-order autocorrelation (i.e.,
after differencing the series with a lag of 1).
Autocorrelation Function: Passengers series
(Standard errors are white-noise estimates)
Lag Corr. S.E. Q p
1 +.948 .0825 132.1 0.000
2 +.876 .0822 245.6 0.000
3 +.807 .0819 342.7 0.000
4 +.753 .0816 427.7 0.000
5 +.714 .0813 504.8 0.000
6 +.682 .0810 575.6 0.000
7 +.663 .0807 643.0 0.000
8 +.656 .0804 709.5 0.000
9 +.671 .0801 779.6 0.000
10 +.703 .0798 857.1 0.000
11 +.743 .0795 944.4 0.000
12 +.760 .0792 1036. 0.000
13 +.713 .0789 1118. 0.000
14 +.646 .0786 1186. 0.000
15 +.586 .0783 1242. 0.000
(Correlogram bar plot omitted.)
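Autocorrelations of this kind can be computed with a short function; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def acf(x, lag):
    """Sample autocorrelation at the given lag (deviations from the series mean)."""
    x = np.asarray(x, dtype=float)
    xm = x - x.mean()
    return float(np.dot(xm[:-lag], xm[lag:]) / np.dot(xm, xm))
```

For a trending series such as the passenger totals, the lag-1 autocorrelation is close to +1 (compare +.948 in the table above), which is itself a symptom of non-stationarity; for white noise, all autocorrelations hover near zero.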
Partial autocorrelations. Another useful method to examine serial dependencies is to
examine the partial autocorrelation function (PACF) - an extension of autocorrelation,
where the dependence on the intermediate elements (those within the lag) is removed. In
other words the partial autocorrelation is similar to autocorrelation, except that when
calculating it, the (auto) correlations with all the elements within the lag are partialled out. If
a lag of 1 is specified (i.e., there are no intermediate elements within the lag), then the
partial autocorrelation is equivalent to autocorrelation. In a sense, the partial
autocorrelation provides a "cleaner" picture of serial dependencies for individual lags (not
confounded by other serial dependencies).
– One autoregressive (p) parameter: ACF - exponential decay; PACF - spike at lag 1, no
correlation for other lags.
– One moving average (q) parameter: ACF - spike at lag 1, no correlation for other lags;
PACF - damps out exponentially.
– Two moving average (q) parameters: ACF - spikes at lags 1 and 2, no correlation for other
lags; PACF - a sine-wave shape pattern or a set of exponential decays.
– One autoregressive (p) and one moving average (q) parameter: ACF - exponential decay
starting at lag 1; PACF - exponential decay starting at lag 1.
Step 2: Estimation and Forecasting
At the next step the parameters are estimated (using function minimization procedures), so
that the sum of squared residuals is minimized.
– The estimates of the parameters are used in the last stage (Forecasting) to calculate new
values of the series (beyond those included in the input data set) and confidence intervals
for those predicted values.
– The estimation process is performed on transformed (differenced) data; before the
forecasts are generated, the series needs to be integrated (integration is the inverse of
differencing) so that the forecasts are expressed in values compatible with the input data.
– This automatic integration feature is represented by the letter I in the name of the
methodology (ARIMA = Auto-Regressive Integrated Moving Average).
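The differencing/integration round trip can be seen in two lines (illustrative values): cumulative summation, anchored at the first observation, exactly undoes `np.diff`.

```python
import numpy as np

x = np.array([3.0, 5.0, 4.0, 8.0, 6.0])
d = np.diff(x)                                          # differencing
x_back = np.concatenate(([x[0]], x[0] + np.cumsum(d)))  # integration (inverse)
```

This is why forecasts produced on the differenced scale can always be mapped back to the original units, provided the starting value is kept.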
The constant in ARIMA models
If the series is differenced, then the constant represents the mean or intercept of the
differenced series.
For example, if the series is differenced once, and there are no autoregressive
parameters in the model, then the constant represents the mean of the differenced
series, and therefore the linear trend slope of the un-differenced series.
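This can be verified on a synthetic series (the slope and noise below are illustrative): after one difference, the mean of the differenced series recovers the trend slope of the original.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(200)
x = 2.5 * t + rng.standard_normal(200)  # linear trend, slope 2.5, plus noise

d = np.diff(x)
slope_estimate = d.mean()  # mean of the differenced series = trend slope
```

So a significant constant in a d = 1 model is evidence of a deterministic drift in the un-differenced series.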
Parameter Estimation
There are several different methods for estimating the parameters. All of them should
produce very similar estimates, but may be more or less efficient for any given model.
In general, during the parameter estimation phase a function minimization algorithm is used
(the so-called quasi-Newton method) to maximize the likelihood (probability) of the
observed series, given the parameter values.
In practice, this requires the calculation of the conditional sums of squares (SS) of the
residuals, given the respective parameters.
Different methods have been proposed to compute the SS for the residuals:
– the approximate maximum likelihood method
– the approximate maximum likelihood method with backcasting
– the exact maximum likelihood method
Comparison of methods
– In general, all methods should yield very similar parameter estimates. Also, all methods
are about equally efficient in most real-world time series applications.
– Melard's exact maximum likelihood method (number 3 above) may become inefficient
when used to estimate parameters for seasonal models with long seasonal lags (e.g., with
yearly lags of 365 days).
– In practice, you should always use the approximate maximum likelihood method
first in order to establish initial parameter estimates that are very close to the actual final
values; thus, usually only a few iterations with the exact maximum likelihood method (3,
above) are necessary to finalize the parameter estimates.
Parameter standard errors
– For all parameter estimates, so-called asymptotic standard errors are computed. These
are computed from the matrix of second-order partial derivatives that is approximated via
finite differencing.
Penalty value
– As mentioned above, the estimation procedure requires that the (conditional) sums of
squares of the ARIMA residuals be minimized
– If the model is inappropriate, it may happen during the iterative estimation process that
the parameter estimates become very large and, in fact, invalid. In that case, the
estimation procedure will assign a very large value (a so-called penalty value) to the SS.
– This usually "entices" the iteration process to move the parameters away from invalid
ranges. However, in some cases even this strategy fails, and you may see on the screen
(during the estimation procedure) very large values for the SS in consecutive iterations.
– In that case, carefully evaluate the appropriateness of your model.
– If your model contains many parameters, and perhaps an intervention component, you
may try again with different parameter start values.
Step 3: Evaluation of the Model
Parameter estimates
– The output reports approximate t values, computed from the parameter standard errors.
If a parameter is not significant, it can in most cases be dropped from the model
without substantially affecting the overall fit of the model.
Other quality criteria
– Another straightforward and common measure of the reliability of the model is the
accuracy of its forecasts generated based on partial data so that the forecasts can be
compared with known (original) observations.
However, a good model should not only provide sufficiently accurate forecasts, it
should also produce statistically independent residuals that contain only noise and no
systematic components (e.g., the correlogram of residuals should not reveal any serial
dependencies). A good test of the model is
– to plot the residuals and inspect them for any systematic trends, and
– to examine the autocorrelogram of the residuals for any remaining serial dependencies.
The most straightforward way of evaluating the accuracy of the forecasts based on a
particular value is to simply plot the observed values and the one-step-ahead forecasts.
This plot can also include the residuals (scaled against the right Y-axis), so that regions of
better or worse fit can easily be identified.
This visual check of the accuracy of forecasts is often the most powerful method for
determining whether or not the current exponential smoothing model fits the data. In
addition, besides the ex post MSE criterion, there are other statistical measures of error
that can be used to determine the optimum smoothing parameter.
Mean error
– The mean error (ME) value is simply computed as the average error value (average of
observed minus one-step-ahead forecast):
ME = (1/n) * Σ (X(t) − F(t))
where X(t) is the observed value at time t, and F(t) is the forecast (smoothed value).
Obviously, a drawback of this measure is that positive and negative error values can
cancel each other out, so it is not a very good indicator of overall fit.
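The cancellation drawback is easy to demonstrate with toy numbers (illustrative): the forecasts below miss every observation by 5, yet the mean error is exactly zero.

```python
observed = [100, 110, 105, 120]
forecast = [105, 105, 110, 115]

errors = [x - f for x, f in zip(observed, forecast)]     # [-5, 5, -5, 5]
me = sum(errors) / len(errors)                           # 0.0 despite a poor fit
mae = sum(abs(e) for e in errors) / len(errors)          # 5.0, a more honest measure
```

This is why absolute or squared error measures (MAE, MSE) are preferred when comparing models or tuning the smoothing parameter.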
– A final issue that we have neglected up to this point is the problem of the initial value, that
is, how to start the smoothing process. If you look back at the formula above, it is evident
that one needs an s(0) value in order to compute the smoothed value (forecast) for the first
observation in the series. Depending on the choice of α (i.e., when α is close to zero), the
initial value for the smoothing process can affect the quality of the forecasts for many
observations. As with most other aspects of exponential smoothing, it is recommended to
choose the initial value that produces the best forecasts. On the other hand, in practice,
when there are many leading observations prior to a crucial actual forecast, the initial
value will not affect that forecast by much, since its effect will have long "faded" from the
smoothed series (due to the exponentially decreasing weights, the older an observation is,
the less it will influence the forecast).
Limitations of ARIMA
– The ARIMA method is appropriate only for a time series that is stationary (i.e., its mean,
variance, and autocorrelation should be approximately constant through time) and it is
recommended that there are at least 50 observations in the input data. It is also assumed
that the values of the estimated parameters are constant throughout the series.
Intervention Analysis
An intervention indicator is used to examine the impact of an event in the time series.
The presence or absence of a deterministic event is coded as a step input, and its impact is
modelled as a response function.
Innovative interventions represent shocks at a certain time that have lingering effects on
the data at subsequent time points.
Case Study
Interpret the ARIMA model for the airlines example.
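A minimal identification sketch for this exercise, assuming only NumPy (the data are transcribed from the table above; the classic published result for this series is the "airline model", ARIMA(0,1,1)(0,1,1)12, which a library such as statsmodels' SARIMAX could then estimate):

```python
import numpy as np

# Monthly passenger totals (thousands), Jan-Dec per row, 1949-1960
data = [
    [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118],  # 1949
    [115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140],  # 1950
    [145, 150, 178, 163, 172, 178, 199, 199, 184, 162, 146, 166],  # 1951
    [171, 180, 193, 181, 183, 218, 230, 242, 209, 191, 172, 194],  # 1952
    [196, 196, 236, 235, 229, 243, 264, 272, 237, 211, 180, 201],  # 1953
    [204, 188, 235, 227, 234, 264, 302, 293, 259, 229, 203, 229],  # 1954
    [242, 233, 267, 269, 270, 315, 364, 347, 312, 274, 237, 278],  # 1955
    [284, 277, 317, 313, 318, 374, 413, 405, 355, 306, 271, 306],  # 1956
    [315, 301, 356, 348, 355, 422, 465, 467, 404, 347, 305, 336],  # 1957
    [340, 318, 362, 348, 363, 435, 491, 505, 404, 359, 310, 337],  # 1958
    [360, 342, 406, 396, 420, 472, 548, 559, 463, 407, 362, 405],  # 1959
    [417, 391, 419, 461, 472, 535, 622, 606, 508, 461, 390, 432],  # 1960
]
x = np.array([v for year in data for v in year], dtype=float)

logx = np.log(x)         # log transform stabilises the growing seasonal swings
d1 = np.diff(logx)       # lag-1 differencing removes the trend (d = 1)
dd = d1[12:] - d1[:-12]  # lag-12 seasonal differencing removes the seasonality

# Lag-1 autocorrelation of the now-stationary series; a single negative
# spike here is the classic signature of a non-seasonal MA(1) term (q = 1)
xm = dd - dd.mean()
acf1 = float(np.dot(xm[:-1], xm[1:]) / np.dot(xm, xm))
```

A negative spike at lag 12 of the same correlogram would likewise point to the seasonal MA(1) term, completing the (0,1,1)(0,1,1) identification described in the Box and Jenkins section above.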