
Advanced Econometrics, G. Talmain, Topic 1, Univariate Time Series

1 Univariate Time Series

1.1 Introduction

Time series econometrics is concerned with the analysis of variables that are observed in sequence over time. Some examples of economic time series variables are GNP, inflation, the interest rate, the unemployment rate, and so on. The next figure shows U.S. real GNP per capita observed annually from 1909 to 2004:
[Figure 1: US Real GNP per capita, Base Year 2000. Vertical axis: Real GNP per capita, $0 to $40,000; horizontal axis: Year.]

These observations can be thought of as a realisation of a time series process whose properties are unknown. The questions we want to address here are: can we find a model that describes such a time series process in a simple and parsimonious way? Can we use it to forecast future realisations of the process?
A time series process is a stochastic process, that is, a collection of random variables observed through time, denoted by
$$Y = \{y_t\}_{t=-\infty}^{+\infty} = \{\ldots, y_{-1}, y_0, y_1, \ldots, y_t, y_{t+1}, \ldots\}.$$
A realisation of $T$ observations of such a process is denoted by $\{y_t\}_{t=1}^{T} = \{y_1, \ldots, y_T\}$. Stochastic means that the exact point where future realisations will lie is uncertain. Suppose we know that the process of interest can be described by the following simple time series model:
$$y_t = \mu + \varepsilon_t + \theta \varepsilon_{t-1}, \tag{1}$$

where $\mu$ and $\theta$ are two non-random parameters, and $\{\varepsilon_t\}_{t=1}^{T}$ is a series of independent random variables with zero mean and constant variance. A set of realisations of $\varepsilon_t$, for $t = 1, \ldots, T$, then defines a set of realisations of $y_t$, for $t = 1, \ldots, T$. $N$ different realisations of $\varepsilon_t$ then define $N$ different realisations of $y_t$. This in turn implies that (1) defines a joint distribution for the random variables $\{y_1, \ldots, y_T\}$. Knowledge of the properties of such a distribution is crucial for predicting or forecasting future realisations of $y_t$. In practice, we often aim at learning about the first two moments only: the mean, the autocovariances, and the autocorrelations of the process.
Figure 2 depicts two different draws of 200 i.i.d. shocks, $\{\varepsilon_t\}_{t=1}^{200}$, from a normal law with mean zero and constant variance 1, the blue draw and the pink draw.
[Figure 2: Two draws, T = 200, of normally distributed shocks, N(0, 1). Vertical axis: shock; horizontal axis: time; series N001 and N002.]
Figure 3 depicts the corresponding values taken by the previous process (1) for $\mu = 0.1$ and $\theta = 0.99$.
[Figure 3: Two draws of process (1) with $\mu = 0.1$ and $\theta = 0.99$. Vertical axis: $y_t$; horizontal axis: time; series Y001 and Y002.]
Note that we have a problem for $y_0$. For instance, for the blue process, we have $\varepsilon_0 = -0.078$ and $\varepsilon_1 = -0.97$, which yields $y_1 = \mu + \varepsilon_1 + \theta\varepsilon_0 = 0.1 - 0.97 - 0.99 \times 0.078 = -0.95$. Likewise, $y_0 = \mu + \varepsilon_0 + \theta\varepsilon_{-1}$, which means that we need the value of $\varepsilon_{-1}$ to calculate $y_0$. A standard practice for this initialisation problem is to set $\varepsilon_{-1}$ to its expected value $E[\varepsilon_{-1}] = 0$, in which case $y_0 = 0.1 - 0.078 = 0.022$.
Suppose that, unknown to us, the true value of $\varepsilon_{-1}$ for that realisation was different from 0. Then clearly we would not have found the correct value for $y_0$. Would our picture in Figure 3 be much affected?
The properties of a time series process can be examined in the time domain or in the frequency domain. In the frequency domain, time series processes are represented as sums of periodic functions and the focus is on the contributions made by such periodic components to the series. Here, we will examine the time-domain properties of time series processes only, where the focus is on the properties of the joint distribution of $\{y_1, \ldots, y_T\}$ at time $t$ and at time $t + \tau$ (for further reference on the frequency domain, see Harvey, ch. 6, and Hamilton, ch. 6).

1.2 Models of stationary time series

1.2.1 Some definitions

Mean Consider an (absolutely continuous) random process $Y_t$, of which we observe a realisation over time $t = 1, \ldots, T$, $\{y_1, \ldots, y_T\}$. The mean of the process at time $t$ is defined as
$$\mu_t \equiv E(Y_t), \tag{2}$$
where $E(Y_t) = \int y_t f_{Y_t}(y_t)\,dy_t$ is the expected value of $Y_t$ and $f_{Y_t}$ is the probability density of $Y_t$. Note that the expectation may change with time: the mean can be represented by the (deterministic with respect to the information set under consideration) function
$$\mu : t \mapsto \mu(t) \equiv E(Y_t).$$
We will study many processes which have the property of mean-stationarity, that is, the function $\mu(t)$ is constant over time: it is time-invariant. In this case, there exists a real number $\mu$ such that
$$\forall t, \quad \mu(t) = \mu.$$
For our process (1), the unconditional expected value at time $t$ is calculated as
$$\begin{aligned}
E[y_t] &= E[\mu + \varepsilon_t + \theta\varepsilon_{t-1}] \\
&= E[\mu] + E[\varepsilon_t] + \theta E[\varepsilon_{t-1}] \\
&= \mu + 0 + 0 \\
&= \mu.
\end{aligned}$$
Hence, this process is, in fact, mean stationary. Note that this unconditional expectation is independent of the realisation of the process: both the series generated by the blue shocks and the series generated by the pink shocks have the same unconditional expectation $\mu = 0.1$.
There is another concept of expectation that does come into play: conditional expectation. For illustration, on the blue series of shocks, we had $\varepsilon_9 = 1.16$ and $\varepsilon_{10} = -1.35$, which meant that $y_{10} = -0.10$. Does the knowledge of this information affect the expected value of, say, $y_{11}$? The answer is "it does". Let $I_{10} = \{\varepsilon_9 = 1.16, \varepsilon_{10} = -1.35\}$ be the information at time $t = 10$. Then
$$\begin{aligned}
E[y_{11} \mid I_{10}] &= E[\mu \mid I_{10}] + E[\varepsilon_{11} \mid I_{10}] + \theta E[\varepsilon_{10} \mid I_{10}] \\
&= \mu + 0 + \theta E[\varepsilon_{10} \mid I_{10}] \\
&= 0.1 + 0.99 \times (-1.35) \\
&= -1.24,
\end{aligned}$$
which is clearly different from $\mu$. More generally, when $I_t = \{\varepsilon_t\}$, we would find
$$E[y_{t+1} \mid I_t] = \mu + \theta\varepsilon_t. \tag{3}$$
Note, first, that the knowledge of $\varepsilon_9$ did not matter for the calculation of $E[y_{t+1} \mid I_t]$, and, second, that the conditional expectation of $y_{t+1}$ depends on the realisation. For the pink process, we had in fact $\varepsilon_{10} = -2.49$, which meant that, in the world of the pink process, $E[y_{11} \mid I_{10}] = 0.1 + 0.99 \times (-2.49) = -2.36 \neq -1.24$.
It is easy enough to read the value of $\varepsilon_{10}$ when one has just constructed the process. In practice, one has to draw a distinction between what is observable, typically the values of $y_t$, and what is unobservable, the underlying shocks $\{\varepsilon_t\}$. The observable information set $I_{10}^{obs}$ will then consist of the observed values $I_{10}^{obs} = \{y_1, \ldots, y_{10}\}$, and the variable $E[\varepsilon_{10} \mid I_{10}^{obs}]$, needed to calculate the conditional expected value $E[y_{11} \mid I_{10}^{obs}]$, will have to be inferred from this information. Typically, the conditional expectation $E[y_{11} \mid I_{10}^{obs}] \neq E[y_{11} \mid I_{10}]$: the information set matters when calculating the conditional expectation.
Variance The variance is defined as
$$\mathrm{Var}(Y_t) \equiv E\!\left[(Y_t - \mu_t)^2\right]. \tag{4}$$
Note the variance is never negative. Again, the variance can be thought of as a (deterministic with respect to the information set under consideration) function of time
$$\mathrm{Var} : t \mapsto \mathrm{Var}(Y_t) \equiv E\!\left[(Y_t - \mu_t)^2\right].$$
Again, one has to distinguish between conditional and unconditional variances, where the conditional variance at some future time $t + \tau$ is predicated on the information available at that time
$$\mathrm{Var}(Y_{t+\tau} \mid I_t) \equiv E\!\left[(Y_{t+\tau} - E[Y_{t+\tau} \mid I_t])^2 \mid I_t\right]. \tag{5}$$
Furthermore, we will also be interested in processes for which the variance is constant over time, i.e. there exists a positive number $V$ such that $\forall t$, $\mathrm{Var}(Y_t) = V$.

For our process (1), recall that $\varepsilon_t \overset{i.i.d.}{\sim} N(0, \sigma^2)$ where $\sigma = 1$. The unconditional variance at time $t$ is given by
$$\begin{aligned}
\mathrm{Var}(y_t) &= E\!\left[(y_t - \mu_t)^2\right] \\
&= E\!\left[\left((\mu + \varepsilon_t + \theta\varepsilon_{t-1}) - \mu\right)^2\right] \\
&= E\!\left[(\varepsilon_t + \theta\varepsilon_{t-1})^2\right] \\
&= E\!\left[\varepsilon_t^2 + 2\theta\varepsilon_t\varepsilon_{t-1} + \theta^2\varepsilon_{t-1}^2\right] \\
&= E\!\left[\varepsilon_t^2\right] + 2\theta E[\varepsilon_t\varepsilon_{t-1}] + \theta^2 E\!\left[\varepsilon_{t-1}^2\right] \\
&= E\!\left[\varepsilon_t^2\right] + 2\theta E[\varepsilon_t]E[\varepsilon_{t-1}] + \theta^2 E\!\left[\varepsilon_{t-1}^2\right] \\
&= \sigma^2 + 0 + \theta^2\sigma^2 \\
&= \left(1 + \theta^2\right)\sigma^2,
\end{aligned}$$
where we used independence to go from the fifth to the sixth line. Again, both the blue and the pink process have the same unconditional variance $\left(1 + \theta^2\right)\sigma^2$.
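A quick Monte Carlo check of these two unconditional moments, sketched in Python (NumPy assumed; the sample sizes and seed are arbitrary): averaging across many independent realisations at a fixed date $t$ should give roughly $\mu = 0.1$ and $(1 + \theta^2)\sigma^2 = 1.9801$.

```python
import numpy as np

mu, theta, sigma, T, N = 0.1, 0.99, 1.0, 200, 100_000
rng = np.random.default_rng(0)

eps = sigma * rng.standard_normal((N, T))                         # N independent realisations
eps_prev = np.concatenate((np.zeros((N, 1)), eps[:, :-1]), axis=1)
y = mu + eps + theta * eps_prev                                    # process (1), eps_{-1} = 0

t = 50                                                             # any t >= 1
print(y[:, t].mean())   # ~ 0.1       (unconditional mean mu)
print(y[:, t].var())    # ~ 1.9801    (unconditional variance (1 + theta**2) * sigma**2)
```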


The conditional variance at time $t + 1$, given the information $I_t = \{\varepsilon_t\}$, is
$$\begin{aligned}
\mathrm{Var}(y_{t+1} \mid I_t) &= E\!\left[(y_{t+1} - E[y_{t+1} \mid I_t])^2 \mid I_t\right] \\
&= E\!\left[\left((\mu + \varepsilon_{t+1} + \theta\varepsilon_t) - (\mu + \theta\varepsilon_t)\right)^2 \mid I_t\right] \\
&= E\!\left[(\varepsilon_{t+1})^2 \mid I_t\right] \\
&= E\!\left[(\varepsilon_{t+1})^2\right] \\
&= \sigma^2,
\end{aligned}$$
where we used (3) to go from the first to the second line, and the i.i.d. property to go from the third to the fourth. Both the conditional and the unconditional variance are time stationary. Also note that the conditional variance, given our information set, is independent of the realisation of $\varepsilon_t$: in contrast with the conditional expectation of the process, which depended on the realisation of the shock, the conditional variance under consideration is completely deterministic. The blue process, the pink process and any other process we could draw would have the same conditional variance $\mathrm{Var}(y_{t+1} \mid I_t)$ given $I_t = \{\varepsilon_t\}$.
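A tiny simulation check of this result, holding $\varepsilon_t$ fixed at one realised value (the number below is just the blue value used earlier) and redrawing only $\varepsilon_{t+1}$:

```python
import numpy as np

mu, theta, eps_t = 0.1, 0.99, -1.35          # eps_t held fixed at its realised value
rng = np.random.default_rng(4)
eps_next = rng.standard_normal(1_000_000)    # redraw eps_{t+1} ~ N(0, 1)
y_next = mu + eps_next + theta * eps_t       # y_{t+1} conditional on I_t = {eps_t}

print(y_next.mean())   # ~ mu + theta * eps_t   (conditional mean, depends on eps_t)
print(y_next.var())    # ~ sigma^2 = 1          (conditional variance, does not)
```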
Autocovariance The (unconditional) autocovariance at lag $\tau$, $\gamma_{\tau,t}$, is given by
$$\gamma_{\tau,t} \equiv E\!\left[(Y_{t+\tau} - \mu_{t+\tau})(Y_t - \mu_t)\right]. \tag{6}$$
Note that the autocovariance at lag $\tau = 0$ is also the variance:
$$\gamma_{0,t} \equiv E\!\left[(Y_{t+0} - \mu_{t+0})(Y_t - \mu_t)\right] = \mathrm{Var}(Y_t).$$
The series $\{\gamma_{\tau,t}\}_{\tau=-\infty}^{+\infty} = \{\ldots, \gamma_{-1,t}, \gamma_{0,t}, \gamma_{1,t}, \ldots\}$ is called the autocovariance function. It associates to each lag $\tau$ the autocovariance at this lag, $\gamma_{\tau,t}$. As before, we can write this relationship as the function
$$\gamma_t : \tau \mapsto \gamma_t(\tau) \equiv E\!\left[(Y_{t+\tau} - \mu_{t+\tau})(Y_t - \mu_t)\right].$$
Note that this function is symmetric, i.e.
$$\gamma_t(\tau) = \gamma_t(-\tau),$$
which means that, in practice, we are only interested in positive lags $\tau \geq 0$. Also note that the autocovariance function is a function of time $t$. Again, a particular case of interest will be when this function does not depend on time, in which case we write $\gamma_t(\tau) = \gamma(\tau)$. One can also define the concept of conditional autocovariance.
For our process (1), the unconditional autocovariance at time $t$ is given by
$$\begin{aligned}
\gamma_{\tau,t} &= E\!\left[(y_{t+\tau} - \mu_{t+\tau})(y_t - \mu_t)\right] \\
&= E\!\left[(\varepsilon_{t+\tau} + \theta\varepsilon_{t+\tau-1})(\varepsilon_t + \theta\varepsilon_{t-1})\right] \\
&= E[\varepsilon_{t+\tau}\varepsilon_t] + \theta\left(E[\varepsilon_{t+\tau-1}\varepsilon_t] + E[\varepsilon_{t+\tau}\varepsilon_{t-1}]\right) + \theta^2 E[\varepsilon_{t+\tau-1}\varepsilon_{t-1}].
\end{aligned} \tag{7}$$
We simplify our task by recalling that: (i) we only need to calculate $\gamma_{\tau,t}$ for positive values of $\tau$, and (ii) we already know that $\gamma_t(0) = \left(1 + \theta^2\right)\sigma^2$.


For $\tau = 1$, (7) becomes
$$\begin{aligned}
\gamma_{1,t} &= E[\varepsilon_{t+1}\varepsilon_t] + \theta\left(E[\varepsilon_t\varepsilon_t] + E[\varepsilon_{t+1}\varepsilon_{t-1}]\right) + \theta^2 E[\varepsilon_t\varepsilon_{t-1}] \\
&= 0 + \theta\left(\sigma^2 + 0\right) + \theta^2 \times 0 \\
&= \theta\sigma^2,
\end{aligned}$$
where we exploit independence to go from the first to the second line.


For $\tau = 2$, (7) becomes
$$\gamma_{2,t} = E[\varepsilon_{t+2}\varepsilon_t] + \theta\left(E[\varepsilon_{t+1}\varepsilon_t] + E[\varepsilon_{t+2}\varepsilon_{t-1}]\right) + \theta^2 E[\varepsilon_{t+1}\varepsilon_{t-1}] = 0.$$
At lag 1, the observed variables $y_{t+1}$ and $y_t$ have one shock in common, $\varepsilon_t$, which yielded the autocovariance $\theta\sigma^2$; at lag 2, the variables $y_{t+2}$ and $y_t$ have no shock in common and their autocovariance is zero. You should convince yourselves, by staring for a while at the indices in (7), that the same applies to lags 3, 4 and beyond. Hence, the autocovariance function of this process is time-invariant. Its graph is depicted in Figure 4.
[Figure 4: Autocovariance function for all $t$. Vertical axis: autocovariance; horizontal axis: lag, from $-5$ to $5$.]
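These theoretical values are easy to confirm by simulation. A minimal sketch (Python with NumPy assumed; sample sizes and seed arbitrary) estimates $\gamma_{\tau,t}$ at a fixed $t$ by averaging across many independent realisations:

```python
import numpy as np

mu, theta, T, N = 0.1, 0.99, 200, 200_000
rng = np.random.default_rng(1)

eps = rng.standard_normal((N, T))
eps_prev = np.concatenate((np.zeros((N, 1)), eps[:, :-1]), axis=1)
y = mu + eps + theta * eps_prev                     # N realisations of process (1)

t = 100
for tau in (0, 1, 2, 3):
    gamma_hat = np.mean((y[:, t + tau] - mu) * (y[:, t] - mu))   # average across realisations
    print(tau, gamma_hat)
# theory: gamma_0 = 1 + theta**2 = 1.9801, gamma_1 = theta = 0.99, gamma_tau = 0 for tau >= 2
```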


For processes whose autocovariance function is not time-invariant, one would have to draw a graph like
figure 4 for each time t. We would then have a family of curves, family which would depict the movement
over time of the autocovariance function.
As an exercise, try to state and calculate the conditional autocovariance function at time t + 1 of our
process given the information set It = {t }: t+1 ( | It ).
Autocorrelation The unconditional autocorrelation at lag $\tau$, $\rho_{\tau,t}$, is given by
$$\rho_{\tau,t} \equiv \frac{\gamma_{\tau,t}}{\sqrt{\mathrm{Var}(Y_{t+\tau})\,\mathrm{Var}(Y_t)}} = \frac{\gamma_{\tau,t}}{\sqrt{\gamma_{0,t+\tau}\,\gamma_{0,t}}}. \tag{8}$$
As for the autocovariances, which defined the (possibly time-varying) autocovariance function, these autocorrelations define a (possibly time-varying) autocorrelation function
$$\rho_t : \tau \mapsto \rho_t(\tau) \equiv \frac{E\!\left[(Y_{t+\tau} - \mu_{t+\tau})(Y_t - \mu_t)\right]}{\sqrt{E\!\left[(Y_{t+\tau} - \mu_{t+\tau})^2\right] E\!\left[(Y_t - \mu_t)^2\right]}}.$$
At lag $\tau = 0$, the autocorrelation function is always 1. At other lags, the autocorrelation is bounded between $-1$ and $1$: $-1 \leq \rho_{\tau,t} \leq 1$ (Hölder inequality). Hence, the autocorrelation function will always have a peak of $\rho = 1$ at lag $\tau = 0$.
If the variance of the process is constant over time and equal to some constant $\gamma_0 \geq 0$, $\forall t$, $\mathrm{Var}(Y_t) = \gamma_{0,t} = \gamma_0$, then the autocorrelation can be written more simply as
$$\rho_{\tau,t} = \frac{\gamma_{\tau,t}}{\gamma_0}.$$
In this case, the symmetry of the autocovariance function $\gamma_{\tau,t}$ transfers to the autocorrelation function:
$$\rho_{\tau,t} = \rho_{-\tau,t}.$$

If, in addition, the autocovariance function is also time-invariant, the autocorrelation function is time-invariant as well and can be written more simply as
$$\rho_\tau = \frac{\gamma_\tau}{\gamma_0}.$$
One can also define the concept of conditional autocorrelation.


It would be a painful mistake to try to calculate the autocorrelation of process (1) directly from (8). In fact, very little algebra is needed at this stage since: (i) we know both the variance and the autocovariance are time-invariant, and (ii) we have already calculated $\gamma_0 = \left(1 + \theta^2\right)\sigma^2$, and, for $\tau = 1$, $\gamma_1 = \theta\sigma^2$, while for $\tau > 1$, $\gamma_\tau = 0$. Hence
$$\rho_\tau = \begin{cases} 1 & \text{for } \tau = 0, \\ \theta / \left(1 + \theta^2\right) & \text{for } \tau = 1 \text{ or } -1, \\ 0 & \text{for all other } \tau, \end{cases}$$
whose graph is depicted in Figure 5.

[Figure 5: Autocorrelation function. Vertical axis: autocorrelation, from 0 to 0.9; horizontal axis: lag, from $-5$ to $5$.]
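As a quick numerical check, with the value $\theta = 0.99$ used throughout,
$$\rho_1 = \frac{\gamma_1}{\gamma_0} = \frac{\theta\sigma^2}{\left(1 + \theta^2\right)\sigma^2} = \frac{0.99}{1.9801} \approx 0.50,$$
which matches the height of the spikes at lags $\pm 1$ in Figure 5.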


Weak stationarity Suppose a large number $m$ of realisations, such as the blue and the pink paths of $y$, were available. Then we could get a sensible description of what the true distribution of $Y_t$ looks like, and we would be able to make inferences about it. The type of inference we could draw would depend on how much we know a priori, or are willing to assume, about the underlying process generating $y$.
As an example, suppose we do not know anything about how the paths of $y$ have been generated; in particular, we do not know (1), but we have a lot of observations of these paths, starting at time $t = 0$. One possible strategy would be as follows (a code sketch of this strategy is given after the list).
For t = 0 Divide the real line into a collection of bins such as $B = \{[-0.05, 0.05), [0.05, 0.15), \ldots\}$, i.e. bins around the values $\{0, 0.1, \ldots\}$. Construct a histogram of the observations of $y_0$ across the realisations. Use the frequencies of $y_0$ falling into each bin to estimate the probability distribution $f_{y_0}$.


For t = 1 Consider, first, only the paths for which $y_0$ fell into the first bin $[-0.05, 0.05)$. Use the same collection $B$ to construct the frequencies of $y_1$ falling within each interval $[-0.05, 0.05), [0.05, 0.15), \ldots$: this provides an estimate of the conditional probability distribution $f_{y_1 \mid y_0 = 0}$. Then move on to the next bin, $[0.05, 0.15)$, to estimate, in the same way, the conditional probability distribution $f_{y_1 \mid y_0 = 0.1}$, and so on.
For further t Iterate the previous procedure to calculate the conditional probability distributions, for example $f_{y_t \mid \{y_{t-1} = 0.1,\, y_{t-2} = 1.2,\, \ldots\}}$.
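A minimal sketch of the binning strategy in Python (NumPy assumed). The number of paths $m$, the bin grid, and the fact that the paths are generated from (1) are all illustrative choices; in the thought experiment the paths would simply be given:

```python
import numpy as np

# m realisations of the first few observations, stacked in an array of shape (m, T)
mu, theta, T, m = 0.1, 0.99, 5, 1_000_000
rng = np.random.default_rng(2)
eps = rng.standard_normal((m, T))
eps_prev = np.concatenate((np.zeros((m, 1)), eps[:, :-1]), axis=1)
y = mu + eps + theta * eps_prev

bins = np.arange(-6.05, 6.15, 0.1)           # bins around the values 0, +-0.1, +-0.2, ...

# step t = 0: histogram of y_0 across realisations -> estimate of f_{y_0}
f_y0, _ = np.histogram(y[:, 0], bins=bins, density=True)

# step t = 1: keep only the paths with y_0 in the bin around 0, histogram their y_1
# -> estimate of the conditional density f_{y_1 | y_0 = 0}
sel = (y[:, 0] >= -0.05) & (y[:, 0] < 0.05)
f_y1_given_y0, _ = np.histogram(y[sel, 1], bins=bins, density=True)
```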
It is clear that, even if we start with a very large, but finite, number of observations of paths, we are likely to have very few observations per bin as $t$ becomes large. An alternative strategy would be to try to guess the functional form of the process generating these paths $y$ (remember, we do not know they have been generated by (1)). Let us suppose we guess something like
$$y_t = \mu + \varepsilon_t, \quad \text{with } \varepsilon_t \overset{i.i.d.}{\sim} N(0, \sigma^2). \tag{9}$$

We would have to elaborate a strategy to (i) estimate the parameters $\{\mu, \sigma\}$, and (ii) test for the congruence of our modelling with the data: if the $y$'s were really generated by the process (9), how likely would we be to observe all of these paths? Here (9) implies that there should be no autocorrelation at lag 1 (since the variability in $y_{t+1}$ depends only on $\varepsilon_{t+1}$ and the variability in $y_t$ only on $\varepsilon_t$, and $\varepsilon_{t+1}$ and $\varepsilon_t$ are independent). From all the paths we have, we can easily estimate $\rho_{\tau,t}$ for many $\tau$'s and $t$'s. If we find, say, that $\hat{\rho}_{1,1} = 0.5$, then we may suspect our model does not fit the data very well. We can try to derive the probability distribution of $\hat{\rho}_{1,1}$ associated with the process (9), either by algebra or by simulating it on a computer for the number of paths we have at our disposal. If we were to find, say, that $\mathrm{Prob}\left(|\hat{\rho}_{1,1}| \leq 0.1\right) = 95\%$, we would conclude that $\hat{\rho}_{1,1} = 0.5$ is deep into the rejection region and we would have to conclude that our guess was quite likely not a good one. We would then have to try to postulate another functional form and repeat the process of (i) inference and (ii) testing. Should we be satisfied with our testing, we would still be interested in the precision of our inference: how far are our estimates likely to be from their actual values? This would be crucial for the next step, which is likely to be forecasting.
