
Urban Studies (1985) 22, 83-90

1985 Urban Studies

Notes and Comments

Population Forecasting at the City Level: An Econometric Approach
Herman J. Bierens and Roy Hoever

[First received, April 1983; in final form, December 1983]

1. Introduction

The usual method of forecasting the population of cities and countries is cohort analysis (Wunsch and Termote, 1978). However, applying this method to growing cities and new towns, where a relatively large part of the population growth is due to immigration, one faces the problem of how to forecast the future number and age distribution of the migrants, as the forecast of the future population size and age distribution depends heavily upon present and future migration, especially if the forecast period is long. In this paper we therefore propose an alternative and practical approach to long term population forecasting at the city level, based on an econometric model, which does not need explicit forecasts of future migration. This approach is appropriate if the city growth is planned in the sense that the number and type of houses to be built will be decided by the municipal authorities (which is common practice in the Netherlands). Our approach is based on a six-equation regression model relating the number of inhabitants in each of six age groups, relative to the total stock of dwellings, to the age distribution of the stock of dwellings, the share of owner occupied dwellings, the proportion of dwellings in apartments, the mean size of the dwellings and the mean income per income earner. The parameters of this model have been estimated using cross-sectional data of 131 Dutch cities and villages. The idea behind the model is that it represents an equilibrium in the housing-market. In particular it is assumed that the age-distribution and size of households determine their demand for housing, so that when the housing-market is in equilibrium the exogenous supply of various types of dwellings matches the types of households for which these dwellings are suitable.

We shall also report on a long term forecasting experiment. On the basis of the size and age distribution of the population and various characteristics of the stock of dwellings in 1971, and detailed information on house-building in the period 1971-1980, in the city of Huizen, annual population forecasts up to 1980 were made. These forecasts prove to be reasonably close to the actual values.

In section 2 the model and the estimation results are described. In section 3 we show how the model can be used for forecasting future population size and age distribution, and we present there the above mentioned forecasting experiment for Huizen. Finally, in the appendix the estimation procedure and the results of a specification test are described.

2. The Model

In constructing the model we used 1977 data for 131
Dutch cities and villages with population sizes
ranging from 2,200 to 739,000. The majority of these
cities and villages, i.e. 80 per cent, had a population
size of less than 70,000. The variables considered are
listed in Tables 1 and 2 below.
The choice of the explanatory variables in Table 2
is based on the following considerations:
Young families are often not in the position to buy
a house. Only after some years when family income
has increased sufficiently can they afford to buy one.

The authors are on the staff of the Foundation for Economic Research of the University of Amsterdam.

Table 1
Dependent variables

Notation   Description
$y_1$      Number of inhabitants in the age group 0-4 years, relative to the total stock of dwellings.
$y_2$      Idem in the age group 5-9 years.
$y_3$      Idem in the age group 10-14 years.
$y_4$      Idem in the age group 15-19 years.
$y_5$      Idem in the age group 20-64 years.
$y_6$      Idem in the age group 65 years and older.

Table 2
Explanatory variables

Notation   Description
$x_1$      Number of owner occupied dwellings, relative to the total stock of dwellings.
$x_2$      Number of apartments, relative to the total stock of dwellings.
$x_3$      Mean number of rooms, including the kitchen, per dwelling.
$x_4$      Mean income per income earner.
$x_5$      Number of dwellings in the age group 0-4 years, relative to the total stock of dwellings.
$x_6$      Idem in the age group 5-11 years.
$x_7$      Idem in the age group 12-16 years.
$x_8$      Idem in the age group 17-31 years.
$x_9$      Idem in the age group 32-64 years.

Thus families living in an owner occupied dwelling will probably be more established than families living in a rented dwelling, and therefore we may expect that their children will be somewhat older.

Families with children usually prefer to live in a one family house. Therefore we may expect to find fewer children in apartments than in one family houses, ceteris paribus.

It is obvious that one will find more children in large dwellings than in smaller ones and that wealthy people usually live in a larger dwelling than less wealthy people. Thus given the size of the dwelling one will find fewer children the wealthier the residents are, and given the wealth of the residents one will find more children the larger the dwellings are.

The age distribution of the total stock of dwellings represented by the variables $x_5$ to $x_9$ corresponds with the age distribution of the residents in that the families grow older together with the dwelling in which they live. This is, of course, not a causal relationship but merely a distinct parallelism.

Apart from the direct impact of the variables $x_1$ to $x_9$ on each of the dependent variables $y_i$ there may be some mutual effects between the dependent variables. For instance the variables $y_1$ to $y_4$ will be related to $y_5$ since children generally belong to families with two parents in the age group 20-64 years. Moreover, the number of young children is negatively correlated with the number of elderly.
In view of the above considerations we specify our model as a linear simultaneous equation system, i.e.

$$ y = By + \Gamma x + \delta + \mu \qquad (1) $$

where $y$ is a six component vector of dependent variables, $x$ is a nine component vector of the variables $x_1$ to $x_9$, $B$ is a $6 \times 6$ matrix of coefficients representing the mutual relationships between the dependent variables, $\Gamma$ is a $6 \times 9$ matrix of coefficients representing the direct impact of the explanatory variables on the dependent variables, $\delta$ is a six component vector of constant terms and $\mu$ is a six component vector of disturbances with zero mean vector and finite covariance matrix. Assuming that the matrix $I - B$ (where $I$ is the identity matrix) is nonsingular we can write the reduced form of (1) as

$$ y = c_0 + Cx + v \qquad (2) $$

where

$$ c_0 = (I - B)^{-1}\delta, \qquad C = (I - B)^{-1}\Gamma \qquad \text{and} \qquad v = (I - B)^{-1}\mu. $$
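The step from the structural form (1) to the reduced form (2) can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical two-equation system with made-up coefficients (the actual model has six equations and nine regressors, and the structural matrices are not estimated in this paper). The helper names `solve_linear` and `reduced_form` are illustrative only.

```python
def solve_linear(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b]; copying keeps the inputs untouched.
    m = [list(a[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def reduced_form(B, Gamma, delta):
    """Return (c0, C) with c0 = (I-B)^{-1} delta and C = (I-B)^{-1} Gamma."""
    n = len(B)
    i_minus_b = [[(1.0 if r == c else 0.0) - B[r][c] for c in range(n)]
                 for r in range(n)]
    c0 = solve_linear(i_minus_b, delta)
    n_x = len(Gamma[0])
    # Solve column by column, then transpose back into an n x n_x matrix.
    cols = [solve_linear(i_minus_b, [Gamma[r][k] for r in range(n)])
            for k in range(n_x)]
    C = [[cols[k][r] for k in range(n_x)] for r in range(n)]
    return c0, C


# Hypothetical structural coefficients, for illustration only.
B = [[0.0, 0.5], [0.2, 0.0]]
Gamma = [[1.0, 0.0], [0.0, 2.0]]
delta = [1.0, -1.0]
c0, C = reduced_form(B, Gamma, delta)

# With the disturbances set to zero, the reduced form response y = c0 + C x
# should satisfy the structural equations y = B y + Gamma x + delta.
x = [3.0, 4.0]
y = [c0[r] + sum(C[r][k] * x[k] for k in range(2)) for r in range(2)]
```

The check at the end mirrors the algebra: substituting the reduced form response back into (1) reproduces it exactly whenever $I - B$ is nonsingular.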
Thus each equation of the system (2) has the form

$$ y_i = c_{0,i} + \sum_{l=1}^{9} c_{l,i} x_l + v_i, \qquad i = 1, 2, \ldots, 6. \qquad (3) $$

The reduced form coefficients $c_{l,i}$ ($l = 0, 1, \ldots, 9$; $i = 1, \ldots, 6$) have been estimated by using the robust M-estimation method of Bierens (1981, Section 3.2) (see also the appendix). The results are reported in Table 3 below.

The numbers in brackets are the standard errors of the coefficients involved. $R^2$ is the squared multiple correlation coefficient. The statistic $\hat{\sigma}^2$ is the estimate of the variance of the error $v_i$ and F(9, 121) is the F-statistic with 9 and 121 degrees of freedom corresponding with the test of the hypothesis that the regression coefficients $c_1$ to $c_9$ are all zero. In all cases this hypothesis is rejected at a significance level of less than one per cent. We also applied a specification test, namely test one of Bierens (1982). The null hypothesis to be tested is that the reduced form responses represent the mathematical expectations of the dependent variables conditional on the regressors. It is well-known that if this null hypothesis is true the reduced form model is merely the forecasting scheme which minimizes the forecast error variance. In other words: the forecasts obtained from the reduced form model involved are optimal with respect to a quadratic loss function. Consequently there is no need to estimate the structural model (1) as this model cannot give better forecasts than the reduced form model (3). For all six equations the null hypotheses involved could not be rejected, hence the data provide evidence that the reduced form model (3) is indeed the optimal forecasting scheme.

[Table 3: estimation results for the reduced form model (3); the table contents are not legible in the source.]

Since the results in Table 3 involve reduced form coefficients, it is difficult, if not impossible, to interpret the relative magnitude and the signs of these coefficients, as they not only represent the direct impact of the variables $x_1$ to $x_9$ but also the intercorrelation among the dependent variables. The plausibility of the model should therefore not be judged by interpreting the estimation results but by its forecasting power.
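The degrees of freedom of the F-statistic mentioned with Table 3 follow from the sample size: 131 observations, 9 regressors and a constant term leave 131 - 9 - 1 = 121. A minimal sketch of the textbook relation between $R^2$ and the F-statistic for a least squares regression is given below. It is an assumption that this relation carries over unchanged to the robust M-estimates reported in the paper, and the value of `r_squared` is hypothetical, since Table 3 is not available here.

```python
def f_statistic(r_squared, n_obs, n_regressors):
    """F-statistic for the hypothesis that all slope coefficients are zero.

    Uses the standard least squares identity
    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)).
    """
    df1 = n_regressors
    df2 = n_obs - n_regressors - 1
    f_value = (r_squared / df1) / ((1.0 - r_squared) / df2)
    return f_value, df1, df2


# Hypothetical R^2; n_obs and n_regressors are taken from the text.
f_value, df1, df2 = f_statistic(r_squared=0.5, n_obs=131, n_regressors=9)
```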

3. Forecasting

In order to use this model for forecasting the population and age distribution of a particular city over a certain period we need the values of the explanatory variables $x_1$ to $x_9$ for each year of this period. Most of the Dutch cities keep statistics of their stock of dwellings, and since city growth in the Netherlands is planned by the municipal authorities sufficient data on the future stock of dwellings are also available. The only problem is to predict the future values of the variable $x_4$, the mean income (in constant prices) per income earner. In practical applications of our model (Bierens et al. (1982a, 1982b, 1983)) we have always assumed that the impact of this variable will not vary much in time, hence that we may treat this variable as a constant term.

Now suppose we want to forecast the population and age distribution over the period 1982-1990, say, in a particular city and that 1981 is the last year for which the actual values of the dependent variables are available. We may then shift the constant term (including the terms $c_{4,i} x_4$) of the model such that the model fits perfectly for 1981. Fixing these constant terms on their new level we then plug in the values of the explanatory variables $x_1$ to $x_3$ and $x_5$ to $x_9$ for each year of the forecasting period involved to obtain 'raw' forecasts. Obviously these raw forecasts do not take into account the possibility of a change of the birth rate. Since the birth rate is changing over time we have to make some adjustment. The correction procedure we have applied is the following. Let $b(t)$ be the forecast of the birth rate in year $t$. The raw forecast of the variable $y_1$ for, say, 1985 is now corrected by multiplying it by

$$ \sum_{t=1980}^{1984} b(t) \Big/ \sum_{t=1976}^{1980} b(t) $$

for, at the beginning of 1985, the children in the age group 0-4 years will have been born in the period 1980-1984. The birth rates in these years are thus compared with the birth rates of the period 1976-1980 in which the children in the age group 0-4 years, at the beginning of 1981, were born. More generally, let $t_0$ be the year in which the constant terms have been fixed and let $\tilde{y}_1(t), \ldots, \tilde{y}_4(t)$ be the raw forecasts of the variables $y_1$ to $y_4$ for $t > t_0$. Then the corrected forecasts are:

$$ \hat{y}_1(t) = \frac{\sum_{j=t-5}^{t-1} b(j)}{\sum_{j=t_0-5}^{t_0-1} b(j)} \, \tilde{y}_1(t), $$

$$ \hat{y}_2(t) = \frac{\sum_{j=t-10}^{t-6} b(j)}{\sum_{j=t_0-10}^{t_0-6} b(j)} \, \tilde{y}_2(t), $$

$$ \hat{y}_3(t) = \frac{\sum_{j=t-15}^{t-11} b(j)}{\sum_{j=t_0-15}^{t_0-11} b(j)} \, \tilde{y}_3(t), $$

$$ \hat{y}_4(t) = \frac{\sum_{j=t-20}^{t-16} b(j)}{\sum_{j=t_0-20}^{t_0-16} b(j)} \, \tilde{y}_4(t). $$
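The forecasting steps described above (shifting the constant term to fit the base year, computing a raw forecast, and applying the birth rate correction) can be sketched as follows. This is a minimal illustration under stated assumptions: the coefficients, explanatory variables and birth rates are made-up numbers, only two explanatory variables are used, and the term involving $x_4$ is assumed to be already absorbed in the constant. The function names are hypothetical.

```python
def shift_constant(c0, coeffs, x_base, y_base):
    """Return the constant term shifted so the equation fits the base year exactly."""
    fitted = c0 + sum(c * x for c, x in zip(coeffs, x_base))
    return c0 + (y_base - fitted)


def raw_forecast(c0_shifted, coeffs, x_future):
    """Raw forecast: shifted constant plus the effect of the explanatory variables."""
    return c0_shifted + sum(c * x for c, x in zip(coeffs, x_future))


def birth_rate_factor(b, t, t0, youngest, oldest):
    """Summed birth rates over the birth years of the group aged youngest..oldest
    at the start of year t, divided by the same sum for the base year t0."""
    num = sum(b[j] for j in range(t - oldest - 1, t - youngest))
    den = sum(b[j] for j in range(t0 - oldest - 1, t0 - youngest))
    return num / den


# Hypothetical inputs for one equation (age group 0-4), for illustration only.
coeffs = [0.10, -0.20]                  # made-up reduced form coefficients
x_1981, x_1985 = [0.5, 0.3], [0.6, 0.3]
y_1981 = 0.30                           # inhabitants aged 0-4 per dwelling in 1981
b = {year: 60.0 for year in range(1976, 1981)}        # made-up birth rates
b.update({year: 50.0 for year in range(1981, 1985)})

c0_new = shift_constant(0.0, coeffs, x_1981, y_1981)
y_raw = raw_forecast(c0_new, coeffs, x_1985)
# Age group 0-4 in 1985: birth years 1980-1984 against 1976-1980.
factor = birth_rate_factor(b, t=1985, t0=1981, youngest=0, oldest=4)
y_corrected = factor * y_raw
```

For the other age groups the same function applies with `youngest` and `oldest` set to 5 and 9, 10 and 14, or 15 and 19, which reproduces the summation limits in the four formulas. Since the dependent variables are measured per dwelling, a forecast of the number of inhabitants would additionally require multiplying by the planned stock of dwellings.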

As a test of the predictive ability of our model we applied the above procedure to a medium size Dutch city, namely Huizen. On the basis of the actual situation in 1971 we predicted the population size in the six age groups over the period 1972-1980 and we
compared these predictions with the corresponding actual data. The results are given in Table 4.

Table 4
Long Term Population Forecasts for Huizen on Basis of the Situation in 1971

                  Number of inhabitants in the age group:
Year  Row          0-4    5-9  10-14  15-19  20-64    65+  Total  Stock of dwellings
1972  Actual      2400   2600   2340   2020  12910   1930  24200  6878
      Forecast    2360   2670   2350   1960  12940   1900  24190
      Error (%)   -1.6    2.8    0.7    3.1    0.2   -1.7   -0.1
1973  Actual      2380   2630   2380   2080  13120   2020  24600  7005
      Forecast    2270   2650   2390   2000  12880   1960  24150
      Error (%)   -4.7    0.8    0.6   -3.8   -1.8   -2.8   -1.9
1974  Actual      2170   2620   2500   2130  13100   2140  24640  7150
      Forecast    2140   2650   2430   2050  12850   2000  24100
      Error (%)   -1.5    1.4   -3.0   -3.8   -1.9   -6.6   -2.2
1975  Actual      1970   2650   2520   2210  12980   2210  24540  7161
      Forecast    1970   2620   2440   2090  12610   2000  23720
      Error (%)    0.0   -1.2   -3.5   -5.2   -2.8   -9.5   -3.3
1976  Actual      1740   2700   2690   2250  13690   2270  25600  7740
      Forecast    2080   2860   2660   2320  13620   2030  25570
      Error (%)   19.5    6.2   -1.3    3.4   -2.4  -10.5   -0.1
1977  Actual      2050   2850   2810   2300  15210   2380  27600  8481
      Forecast    2150   3080   2940   2670  14890   2360  28090
      Error (%)    5.0    8.1    4.7   16.0   -2.1   -1.3    1.8
1978  Actual      1930   2820   2780   2360  15360   2440  27690  8657
      Forecast    2190   3140   3040   2760  15070   2190  28390
      Error (%)   13.2   11.2    9.4   16.9   -1.9  -10.3    2.5
1979  Actual      2080   2760   2910   2510  16170   2540  28980  9175
      Forecast    2290   3090   3110   2800  15820   2240  29350
      Error (%)   10.7   11.8    6.9   11.6   -2.2  -11.8    1.3
1980  Actual      2040   2650   2960   2620  16500   2570  29340  9239
      Forecast    2240   2880   3070   2770  15900   2260  29120
      Error (%)    9.8    8.7    3.7    5.7   -3.6  -12.1   -0.7

Actual = actual number of inhabitants; Forecast = forecast; Error (%) = percentage forecast error.
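The percentage forecast errors in Table 4 can be recomputed from the first two rows of each year. The sketch below assumes the error is defined as (forecast - actual) / actual, expressed in per cent; this sign convention is inferred from the table, where forecasts below the actual value carry a negative error, and is not stated explicitly in the text. Because the published counts appear to be rounded to tens, some recomputed entries may differ slightly from the printed ones.

```python
def pct_error(actual, forecast):
    """Percentage forecast error relative to the actual value (assumed convention)."""
    return 100.0 * (forecast - actual) / actual


# Total population rows of Table 4: year -> (actual, forecast).
totals = {1975: (24540, 23720), 1977: (27600, 28090), 1980: (29340, 29120)}
errors = {year: round(pct_error(a, f), 1) for year, (a, f) in totals.items()}
```

For these three years the recomputed values agree with the "Total" column of Table 4.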
The birth rate sequence we used to correct the raw forecasts covered the number of live births per 1000 women in the age group 15-49 years in the Netherlands, as reported by the Central Bureau of Statistics (CBS (1976)). It would be better, of course, to use the birth rate in Huizen itself, but this series was not available. Nevertheless the forecasting results provide evidence that our econometric approach is a useful alternative to cohort analysis.
Furthermore we apply another correction to the raw forecasts to take into account a social change in the Netherlands. In the beginning of the 1970s it was still not uncommon for maids, servants and 'children' in the age group 20-64 years to live or stay living with their master or parents. Since then the situation has changed considerably. Drawing on experience we assume that the percentage of people in the age group 20-64 years living with master or parents will linearly decrease from approximately 13 per cent in 1971 (CBS (1971)) to zero in 1977.
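The assumed linear decline can be written out year by year. The sketch below only computes the assumed percentage itself; how this percentage enters the correction of the raw forecast for the age group 20-64 years is not spelled out in the text, so that step is left out. The function name and the treatment of years before 1971 are assumptions made for illustration.

```python
def share_living_in(year, start_year=1971, end_year=1977, start_share=13.0):
    """Assumed percentage of people aged 20-64 living with master or parents.

    Declines linearly from start_share in start_year to zero in end_year.
    Years before start_year are held at start_share (an assumption).
    """
    if year <= start_year:
        return start_share
    if year >= end_year:
        return 0.0
    return start_share * (end_year - year) / (end_year - start_year)


shares = {year: share_living_in(year) for year in range(1971, 1981)}
```

Under this schedule the share is halved by 1974 and stays at zero from 1977 onward.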
The forecasting power of the model could be further improved by correcting the raw forecasts for changes in the age-specific death rates and for changes in vacancy rates. The death rates, however, do not change substantially in the Netherlands, so ignoring them will probably not affect the forecasting performance too greatly. The vacancy rates in the period 1970-1980 have also not changed very much so that they can safely be ignored in the above forecasting experiment. However, since 1980 the vacancy rate has substantially increased, due to the economic recession. For future applications of our model we therefore advocate that the raw forecasts be corrected for changes in the vacancy rate. This might be done analogously with the correction for changes in the birth rate, if sufficiently reliable forecasts of these vacancy rates are available.

Another point concerns the extent to which the model contains an implicit migration component. Migration may be distinguished into housing-market induced migration and labour-market induced migration. Since our model describes the equilibrium of the (endogenous) demand for housing of the public and the (exogenous) supply of housing determined by the municipal authorities, we feel that housing-market induced migration is fairly well covered by the model. The growth of the city of Huizen is mainly due to housing-market induced migration, as Huizen lies within commuting distance of Amsterdam. The same applies to the other medium-sized cities (i.e. Hoorn, Almere, Lelystad) to which the model has been applied. On the other hand, labour-market induced migration is obviously not covered by the model, as none of the explanatory variables is directly related to regional or urban economic factors. If sufficient data are available the model can be augmented with such economic variables in order to capture labour-market induced migration, but this is beyond the scope of the present study.

Appendix

1. Specification testing

The reduced form model (3) consists of linear regression equations. In order that these linear equations represent the mathematical expectations of the dependent variables conditional on the explanatory variables, they should satisfy the usual regression assumption that the expectation of the error term $v_i$ conditional on the vector $x$ of regressors equals zero with probability 1, i.e.

$$ H_0: E(v_i \mid x) = 0 \ \text{a.s. for } i = 1, 2, \ldots, 6 $$

and all observations.

In Bierens (1982) two consistent tests have been proposed for testing the above hypothesis. In the research under review we have applied one of these tests, i.e. test 1, to each of the six equations. In particular we have applied the same procedure as in the numerical example in Section 9 of Bierens (1982). The statistic of model specification test 1 is a random variable $\hat{\eta}(\varepsilon)$, say, depending on a test parameter $0 < \varepsilon < \infty$, satisfying

$$ \limsup_{n \to \infty} P(\hat{\eta}(\varepsilon) \le \alpha) \le \alpha \quad \text{for any } \varepsilon > 0 \text{ and any } \alpha > 0 \text{ if the null hypothesis is true} $$

and

$$ \lim_{n \to \infty} P(\hat{\eta}(\varepsilon) \le \alpha) = 1 \quad \text{for any } \varepsilon > 0 \text{ and any } \alpha > 0 \text{ if the null hypothesis is false} $$

where $n$ is the number of observations. Thus conducting the test at, say, the five per cent significance level we do not reject the null if for given $\varepsilon > 0$, $\hat{\eta}(\varepsilon) > 0.05$, and we reject the null if not. In table A1 we present the values of this test-statistic $\hat{\eta}(\varepsilon)$, for $\varepsilon$ = 1, 2, 3, 4 and 5, for each of the six equations. Clearly these results show that there is no evidence that the linear specification is false.

Table A1
Values of the Test-Statistic of Bierens' Specification Test 1

Equation   $\varepsilon = 1$   $\varepsilon = 2$   $\varepsilon = 3$   $\varepsilon = 4$   $\varepsilon = 5$
1          0.74                0.95                1.02                1.00                1.00
2          0.50                0.82                1.03                1.02                1.00
3          0.73                1.00                1.07                1.03                1.01
4          0.99                0.98                0.98                0.99                0.99
5          0.94                1.06                1.02                1.01                1.00
6          0.61                0.81                0.98                1.00                1.00

2. Estimation

The usual method for estimating linear relationships is the Ordinary Least Squares (OLS) estimation method. Writing a particular equation of our model in matrix form as

$$ y = X\beta + v $$

where $y' = (y_1, \ldots, y_n)$, $v' = (v_1, \ldots, v_n)$, $\beta' = (\beta_1, \ldots, \beta_k)$ and

$$ X = \begin{pmatrix} x_{1,1} & \cdots & x_{1,k} \\ \vdots & & \vdots \\ x_{n,1} & \cdots & x_{n,k} \end{pmatrix} $$

with $n$ the number of observations and $k$ the number of regressors (including the constant term), the OLS estimator of the parameter vector $\beta$ is defined as

$$ \hat{\beta} = (X'X)^{-1} X'y. $$

Under suitable regularity conditions we have

$$ \sqrt{n}(\hat{\beta} - \beta) \to N_k\left[0, \ \sigma^2 \left(\operatorname{plim} \tfrac{1}{n} X'X\right)^{-1}\right] \ \text{in distribution,} $$

where $\sigma^2$ is the variance of the $v_j$'s, i.e.

$$ \sigma^2 = E v_j^2. $$

Moreover, $\sigma^2$ can be estimated consistently by the variance of the OLS residuals

$$ \hat{\sigma}^2 = \frac{1}{n} \sum_{j=1}^{n} \hat{u}_j^2. $$

However, in Bierens (1981, Chapter 3) it has been shown that if the distribution of the errors $v_j$ has a positive kurtosis then the robust M-estimation method is more efficient. The robust M-estimation method of Bierens is a two-stage estimation procedure where an estimator $\tilde{\beta}(\gamma)$ is obtained by maximizing an objective function of the form

$$ R(\beta \mid \gamma) = \sum_{j=1}^{n} \rho\left( \frac{y_j - \sum_{i=1}^{k} \beta_i x_{j,i}}{\gamma} \right) $$

where $\rho$ is the density of the standard normal distribution and $\gamma$ is a positive scale parameter. Under some regularity conditions we have

$$ \sqrt{n}(\tilde{\beta}(\gamma) - \beta) \to N_k\left[0, \ h(\gamma) \left(\operatorname{plim} \tfrac{1}{n} X'X\right)^{-1}\right] \ \text{in distr.} $$

Now if the kurtosis of the error distribution, defined by

$$ \kappa = \frac{E v_j^4}{(E v_j^2)^2} - 3, $$

is positive then there exists a $\gamma > 0$ such that

$$ h(\gamma) < \sigma^2, $$

hence in this case $\tilde{\beta}(\gamma)$ is more efficient than the OLS estimator $\hat{\beta}$. Note that a positive kurtosis implies that the tails of the error distribution are heavier than those of the normal distribution.

The function $h(\gamma)$ can be estimated uniformly consistently by a statistic $\hat{h}(\gamma)$ [formula not legible in the source] computed from the residuals

$$ \hat{u}_j = y_j - \sum_{i=1}^{k} \beta_i^* x_{j,i} $$

with the $\beta_i^*$ consistent initial estimates of the $\beta_i$, for example the OLS estimate or a robust M estimate with some fixed $\gamma_1 > 0$. It can be shown that if $\hat{h}(\gamma)$ takes a minimum at $\hat{\gamma}$ on an interval $[\gamma_*, \infty)$ with $\gamma_* > 0$ and if $h(\gamma)$ takes a minimum at $\gamma_0$ on the same interval, then

$$ \operatorname{plim} \hat{\gamma} = \gamma_0, \qquad \operatorname{plim} \hat{h}(\hat{\gamma}) = h(\gamma_0) $$

and

$$ \sqrt{n}(\tilde{\beta}(\hat{\gamma}) - \beta) \to N_k\left[0, \ h(\gamma_0) \left(\operatorname{plim} \tfrac{1}{n} X'X\right)^{-1}\right] \ \text{in distr.} $$

Thus $\tilde{\beta}(\hat{\gamma})$ is the efficient robust M-estimator of $\beta$.

All six equations of our model have been estimated by using OLS as well as the above robust M-estimation method. From table A2 below we see that in all cases the robust M-estimation method performs better, for $\hat{h}(\hat{\gamma})$ is always smaller than $\hat{\sigma}^2$. Only for equation 6, the equation concerning the elderly, the difference between $\hat{h}(\hat{\gamma})$ and $\hat{\sigma}^2$ is very small, so that for this equation the OLS estimator is nearly as efficient as the robust M-estimator. These results correspond with those in table A3, for a positive kurtosis implies that robust M-estimation is more efficient than OLS estimation.

Table A2
A Comparison of the Efficiency of OLS and Robust M-estimates

Equation   $\hat{\sigma}^2$   $\hat{h}(\hat{\gamma})$
1          0.001694           0.001560
2          0.001401           0.001305
3          0.001280           0.001088
4          0.001895           0.001135
5          0.013950           0.011660
6          0.002580           0.002578

Table A3
Estimated Kurtosis* of the Error Distribution

           Estimated kurtosis of:
Equation   OLS residuals   Robust M-residuals
1          1.556           1.777
2          1.348           1.430
3          2.259           3.933
4          6.711           9.709
5          4.315           7.474
6          0.093           0.102

*Defined as $\left( \frac{1}{n} \sum_{j=1}^{n} \hat{u}_j^4 \right) \Big/ \left( \frac{1}{n} \sum_{j=1}^{n} \hat{u}_j^2 \right)^2 - 3$, where the $\hat{u}_j$ are the mean-corrected residuals.
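The contrast between OLS and the robust M-estimator can be illustrated on a small example. The following is a rough sketch, not the authors' implementation: it fits a single regressor plus a constant, keeps the scale parameter `gamma` fixed by hand instead of choosing it by minimizing $\hat{h}(\gamma)$, and locates a stationary point of the objective function by iteratively reweighted least squares, a technique swapped in here for convenience. The weights follow from the first-order condition of the objective with $\rho$ the standard normal density. The data are made up, and the kurtosis function follows the definition in the footnote of table A3.

```python
import math


def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = a + b x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def m_estimate(x, y, gamma, n_iter=200, tol=1e-12):
    """Stationary point of sum_j rho((y_j - a - b x_j) / gamma), started at OLS.

    Setting the gradient to zero gives weighted normal equations with
    weights exp(-r_j^2 / (2 gamma^2)), which are iterated to a fixed point.
    The result may be a local rather than a global maximum.
    """
    a, b = weighted_line_fit(x, y, [1.0] * len(x))
    for _ in range(n_iter):
        w = [math.exp(-0.5 * ((yi - a - b * xi) / gamma) ** 2)
             for xi, yi in zip(x, y)]
        a_new, b_new = weighted_line_fit(x, y, w)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b


def excess_kurtosis(residuals):
    """Fourth moment over squared second moment of mean-corrected residuals, minus 3."""
    n = len(residuals)
    mean = sum(residuals) / n
    centred = [r - mean for r in residuals]
    m2 = sum(v ** 2 for v in centred) / n
    m4 = sum(v ** 4 for v in centred) / n
    return m4 / m2 ** 2 - 3.0


# Made-up data: y = 1 + 2 x exactly, except for one gross outlier at x = 5.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0, 31.0]

a_ols, b_ols = weighted_line_fit(x, y, [1.0] * len(x))
a_m, b_m = m_estimate(x, y, gamma=1.0)
```

In this example the outlier pulls the OLS slope well above 2, while the M-estimate with `gamma=1.0` downweights it and returns a line very close to intercept 1 and slope 2. A very small `gamma` combined with a poor starting value can drive all weights to zero, so the sketch is only meant for well-behaved illustrations.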

References

BIERENS, H. J. (1981). Robust Methods and Asymptotic Theory in Nonlinear Econometrics, Lecture Notes in Economics and Mathematical Systems, Vol. 192. Springer-Verlag: Heidelberg.
BIERENS, H. J. (1982). Consistent Model Specification Tests. Journal of Econometrics, Vol. 20: 105-134.
BIERENS, H. J., GIEBELS, R., HOEVER, R. and DE RUYTER, W. (1982a). Groei, Voorzieningen en Financiën van de Gemeente Hoorn in de Periode 1981-2000, (Growth, Provisions and Finance of the Municipality of Hoorn in the Period 1981-2000). Stichting voor Economisch Onderzoek der Universiteit van Amsterdam: Amsterdam.
BIERENS, H. J., GIEBELS, R. and TEULINGS, C. (1982b). Financiële Perspectieven van de Gemeente Lelystad tot het Jaar 2000, (Financial Perspectives of the Municipality of Lelystad till the Year 2000). Stichting voor Economisch Onderzoek der Universiteit van Amsterdam: Amsterdam.
BIERENS, H. J. and DIDERICH, A. E. M. (1983). Financiën van Almere op lange termijn, (Long-term finance of Almere). Stichting voor Economisch Onderzoek der Universiteit van Amsterdam: Amsterdam.
CBS. (1971). 14e Algemene Volkstelling, (14th National Census). Centraal Bureau voor de Statistiek: 's-Gravenhage.
CBS. (1976). De Toekomstige Demografische Ontwikkeling in Nederland na 1975, (The Future Demographic Development in the Netherlands after 1975). Centraal Bureau voor de Statistiek: 's-Gravenhage.
WUNSCH, G. J. and TERMOTE, M. G. (1978). Introduction to Demographic Analysis: Principles and Methods. Plenum Press: New York.
