Computational Laboratory For Economics

GABRIELE CANTALUPPI
COMPUTATIONAL LABORATORY
FOR ECONOMICS
Notes for the students
GABRIELE CANTALUPPI
COMPUTATIONAL LABORATORY
FOR ECONOMICS
Notes for the students
Milano 2013
2012-2013 EDUCatt - Ente per il Diritto allo Studio Universitario dell'Universit Cattolica
2012-2013 Largo Gemelli 1, 20123 Milano - tel. 02.7234.22.35 - fax 02.80.53.215
2012-2013 e-mail: editoriale.dsu@educatt.it (produzione); librario.dsu@educatt.it (distribuzione)
2012-2013 web: www.educatt.it/libri
2012-2013 ISBN edizione cartacea: 978-88-6780-021-6
2012-2013 ISBN edizione elettronica: 978-88-6780-022-3
Ledizione cartacea di questo volume stata stampata nel mese di settembre 2013
presso la Litografia Solari (Peschiera Borromeo - Milano)
CONTENTS
Preface
1 Some Elements of Statistical Inference
1.1 On the Properties of the Sample Mean
1.1.1 The Normal Distribution Case
1.1.2 The Central Limit Theorem
2 An Introduction to Linear Regression
2.1 Example: Individual wages (2.1.2)
2.1.1 Data Reading and summary statistics
2.1.2 Some graphical representations and grouping statistics
2.1.3 Simple Linear Regression
2.1.4 Confidence intervals (Section 2.5.2)
2.2 Multiple Linear Regression (Section 2.5.5)
2.2.1 Parameter estimation
2.2.2 ANOVA to compare the two models (Section 2.5.5)
2.3 CAPM example (Section 2.7)
2.3.1 CAPM regressions (without intercept) (Table 2.3)
2.3.2 Testing an hypothesis on 1
2.3.3 CAPM regressions (with intercept) (Table 2.4)
2.3.4 CAPM regressions (with intercept and January dummy) (Table
2.5)
2.4 The Worlds Largest Hedge Found (Section 2.7.3)
2.5 Dummy Variables Treatment and Multicollinearity (Section 2.8.1)
2.6 Missing Data, Outliers and Influential Observations
2.7 How to check the form of the distribution
2.7.1 Data histogram with the theoretical density function
2.7.2 The 2 goodness-of-fit test
2.7.3 The Kolmorogov-Smirnov test
2.7.4 The PP-plot and the QQ-plot
2.7.5 Use of the function fit.cont
2.8 Two tests for assessing normality
2.8.1 The Jarque-Bera test
2.8.2 The Shapiro-Wilk test
2.9 Some further comments on the QQ-plot
ix
1
1
1
5
9
9
9
11
14
16
17
17
18
18
19
23
25
26
26
28
31
31
32
32
34
37
40
41
41
43
43
2.9.1
2.9.2
2.9.3
2.9.4
Positively skewed distributions

Negatively skewed distributions
Leptokurtic distributions
Platikurtic distributions
3 Interpreting and comparing Linear Regression Models

3.1 Explaining House Prices (Section 3.4)
3.1.1 Testing the functional form: construction of the RESET test
3.1.2 Testing the functional form: A direct function to perform the
RESET test
3.1.3 Testing the functional form: the RESET test for the extended
model
3.1.4 Testing the functional form: the interaction term
3.1.5 Prediction
3.1.6 Model with price instead of log(price) as dependent variable
and lotsize instead log(lotsize) among the predictors
3.1.7 The PE test to compare a loglinear specification with the linear
specification
3.2 Selection procedures: Predicting Stock Index Returns (Section 3.5)
3.2.1 The full model
2 criterion
3.2.2 The max R
3.2.3 Stepwise
3.2.4 An algorithm to perform a stepwise backward elimination of
regressors
3.2.5 AIC
3.2.6 BIC
3.2.7 A better output to compare the results
3.2.8 Some remarks on the AIC and BIC values
3.2.9 Out of sample forecasting performance (Table 3.5)
3.3 Explaining Individual Wages (Section 3.6)
3.3.1 Linear Models (Section 3.6.1)
3.3.2 Loglinear Models (Section 3.6.2)
3.3.3 The Effects of Gender (Section 3.6.3)
4 Heteroscedasticity and Autocorrelation
4.1 Explaining Labour Demand (Section 4.5)
4.1.1 Linear Model
4.1.2 Breusch-Pagan test - construction
4.1.3 Breusch-Pagan test - direct function
4.1.4 Loglinear model
4.1.5 White Heteroscedasticity test
4.1.6 Heteroscedasticity consistent covariance matrix
4.1.7 Estimated Generalized Least Squares
4.1.8 Types of Heteroscedasticity consistent covariance matrices
4.2 The Demand for Ice Cream (Section 4.8)
4.2.1 The Durbin-Watson statistic - construction
44
45
46
47
55
55
57
60
60
62
63
63
64
66
67
68
72
75
76
78
79
80
82
83
83
85
90
93
93
93
94
95
95
96
98
99
103
108
111
4.2.2
4.2.3
4.2.4
4.3
The Durbin-Watson statistic - direct function

Estimation of the first-order autocorrelation coefficient
The Breusch-Godfrey test to test the presence of autocorrelation - construction
4.2.5 The Breusch-Godfrey test to test the presence of autocorrelation - direct function
4.2.6 Some remarks on the procedure presented by Verbeek on page
113
4.2.7 The EGLS (iterative Cochrane-Orcutt) procedure
4.2.8 The model with the lagged temperature
Risk Premia in Foreign Exchange Markets (Section 4.11)
4.3.1 Tests for Risk Premia in the 1 month Market
4.3.2 Tests for Risk Premia using Overlapping Samples
112
113
115
117
117
118
120
122
124
129
5 Endogeneity, Instrumental Variables and GMM

5.1 Estimating the Returns to Schooling (Section 5.4)
5.2 Example of an application of the Generalized Method of Moments
5.3 Estimating Intertemporal Asset Pricing Models (Section 5.7)
137
137
142
144
6 Maximum Likelihood Estimation and Specification Tests

6.1 Normal distribution
6.2 Bernoulli distribution
6.3 Exponential distribution
6.4 Poisson distribution
6.5 Linear model
6.6 Individual wages (Section 2.5.5)
151
152
162
166
172
177
182
7 Models with Limited Dependent Variables

187
7.1 The Impact of Unemployment Benefits on Recipiency (Section 7.1.6) 187
7.1.1 Estimation of the linear probability model
188
7.1.2 Estimation of the Logit model
189
7.1.3 Estimation of the Probit model
191
7.1.4 A unique table for comparing model estimates
192
7.1.5 Some additional goodness of fit measures
195
7.2 Some remarks on the interpretation of a parameter in a logit model 198
7.3 Explaining Firms Credit Ratings (Section 7.2.1)
200
7.4 Willingness to Pay for Natural Areas (Section 7.2.4)
204
7.5 Patent and R&D Expenditures (Section 7.3.2)
211
7.6 Expenditures on Alcohool and Tobacco (Part 1) (Section 7.4.3)
224
7.7 Expenditures on Alcohool and Tobacco (Part 2) (Section 7.5.4)
228
8 Univariate Time Series Models
8.1 Some examples of stochastic processes
8.1.1 The Gaussian White Noise
8.1.2 The Autoregressive Process
8.1.3 The Moving Average Process
8.1.4 Simulation of a realization from an AR(1) process with drift
239
239
239
240
244
247
8.2
Autocorrelation, Partial autocorrelation functions and ARMA model

identification
8.2.1 Autocorrelation and Partial autocorrelation functions for an
AR(1) process with drift
8.2.2 Autocorrelation and Partial autocorrelation functions for some
AR(p) processes with drift
8.2.3 Autocorrelation and Partial autocorrelation functions for a
MA(1) process
8.2.4 Autocorrelation and Partial autocorrelation functions for some
MA(p) processes
8.2.5 Autocorrelation and Partial autocorrelation functions for an
ARMA(1,1) process
8.2.6 Problems in identifying an ARMA model for a time series
8.3 On the bias of the OLS estimator of the autoregressive coefficient for
an AR(1) process with AR(1) errors
8.3.1 Some remarks on the use of the function curve
8.4 Estimation of ARIMA Models with the function arima
8.4.1 No unit roots in the characteristic equation p (z) = 0
8.4.2 1 unit root in the characteristic equation p+1 (z) = 0
8.4.3 2 unit roots in the characteristic equation p+2 (z) = 0
8.5 Some other R functions for ARMA model parameter estimation
8.5.1 The arima function
8.5.2 The sarima function in the package astsa
8.5.3 The Arima function in the package forecast
8.5.4 The armaFit function
8.5.5 The FitARMA function
8.5.6 The ar function
8.5.7 The arima function in the package TSA
8.6 R functions for predicting with ARMA models
8.7 Stock Prices and Earnings (Section 8.4.4)
8.7.1 Dickey-Fuller test - construction
8.7.2 Dickey-Fuller test - direct function
8.7.3 How to produce the Dickey-Fuller statistic for different lags
8.7.4 Other tests for unit roots detection
8.7.5 Testing for multiple unitary roots
8.8 Some remarks on the function ur.df
8.8.1 The Dickey-Fuller test for a unit root, type "none"
8.8.2 Dickey-Fuller test for a unit root, type drift
8.8.3 Dickey-Fuller test for a unit root, type trend
8.8.4 Example
8.8.5 Exercise
8.8.6 Exercise
8.9 Long-run Purchasing Power Parity (Part 1) (Section 8.5)
8.10 The Persistence of Inflation (Section 8.8)
8.10.1 AR estimation
8.10.2 The Ljung-Box statistic - construction
249
249
254
262
262
270
270
275
280
280
283
286
291
295
296
297
299
299
300
301
302
302
306
307
308
315
315
317
322
322
323
323
324
333
336
338
345
350
351
8.10.3 The Ljung-Box statistic - direct function

8.10.4 AR estimation via Maximum Likelihood
8.10.5 AR(4) estimation
8.10.6 ARMA estimation
8.10.7 AR(6) estimation
8.10.8 Non complete models
8.11 The Expectations Theory of the Term Structure (Section 8.10)
8.12 Autoregressive Conditional Heteroscedasticity
8.12.1 A Brief Presentation of ARCH Processes
8.12.2 A First Example
8.13 Volatility in Daily Exchange Rates (Section 8.11.3)
351
352
352
353
354
356
358
363
363
365
378
9 Multivariate Time Series Models

9.1 Spurious Regression (Section 9.2.1)
9.2 Long-run Purchasing Power Parity (Part 2) (Section 9.3)
9.3 Long-run Purchasing Power Parity (Part 3) (Section 9.5.4)
9.4 Money Demand and Inflation (Section 9.6)
391
391
393
397
399
10 Models based on panel data

10.1 Explaining Individual Wages (Section 10.3)
10.2 Explaining Capital Structure (Section 10.5)
409
409
419
11
431
431
References
A Some useful R functions
A.1 How to Install R
A.2 How to Install and Update Packages
A.3 Data Reading
A.3.1 zip files
A.3.2 Reading from a text file
A.3.3 Reading from a Stata file
A.3.4 Reading from an EViews file
A.3.5 Reading from a Microsoft Excel file
A.4 formula{stats}
A.5 linear model
A.6 Deducer
435
436
436
436
436
437
438
438
438
439
441
444
B Addendum 3rd edition

B.1 Annual Price/Earnings Ratio (Section 8.4.4 third edition)
B.1.1 Dickey-Fuller test
B.1.2 Testing for multiple unitary roots
B.2 Modelling the Price/Earnings Ratio (Section 8.7.5 third edition)
B.2.1 AR estimation
B.2.2 The Ljung-Box statistic
B.2.3 AR estimation via Maximum Likelihood
B.2.4 MA estimation
449
449
449
452
452
454
456
457
457
B.3
B.4
B.5
B.6
B.2.5 Non complete models

Volatility in Daily Exchange Rates
Long-run Purchasing Power Parity
(Section
(Part 1)
(Part 2)
(Part 3)
8.10.3 third edition)

(Section 8.5 third edition)
(Section 9.3)
(Section 9.5.4)
458
463
471
477
480
PREFACE
These Lecture Notes refer to the examples and illustrations proposed in the book A
Guide to Modern Econometrics by Marno Verbeek (4th and 3rd editions).
The source codes here described are written in the R language (R Development
Core Team 2012) (R version 3.0.1 was used).
Subjects are presented in the course Computational Laboratory for Economics
held at Universit`
a Cattolica del Sacro Cuore, Graduate Program Economics. The
course runs in parallel with the course Empirical Economics where the methodological
background is assessed.
Attention was paid in order to obtain results first according to their mathematical
structure, and then by using appropriate built-in R functions, anyway searching for
an efficient and elegant programming style.
The reader is assumed to possess the basic knowledge of R. An introduction to R by
Longhow Lam, available on http://www.splusbook.com/RIntro/RCourse.pdf may
represent a good reference.
Chapters from 2 to 10 recall the contents of Verbeeks Guide. Appendix A1 describes
how to read data from text, Stata and EViews files, which are the formats used by
Verbeek on his book website, where data sets are available. Appendix B contains
results for examples which were present on the 3rd edition of Verbeeks Guide.
Some companion materials to these Lecture Notes can be downloaded from the
booksite www.educatt.it/libri/materiali.
I warmly thank Diego Zappa and Giuseppe Boari for having read parts of the
manuscript. I wish to thank Stefano Iacus for his short course on an efficient and
advanced use of R, and Achim Zeileis, Giovanni Millo and Yves Croissant for having
improved their packages lmtest and plm in order to properly fit some problems here
presented.
1
Some Elements of Statistical
Inference
1.1
On the Properties of the Sample Mean
Consider a random variable X with mean E(X) = and variance V ar(X) = 2 .

Let (x1 , . . . xn ) be a realization of the n-dimensional random variable X1 , . . . Xn ,
whose components are identically and independently distributed as X.
The random variable sample mean
n
has the properties:

and
1.1.1
X
= 1
Xi
X
n i=1
(1.1)
=
E(X)
(1.2)
= 2 /n.
V ar(X)
(1.3)
The Normal Distribution Case
We consider, as an example giving evidence of the properties of the sample mean,

the empirical distribution of the sample means for k = 100 replications of samples of
size n = 5 of pseudo-random numbers from a Normal distribution, with mean = 4
and variance 2 = 2. We remind that, since the sum of normally distributed random
will be
variables is a normally distributed random variable, also the sample mean X
normally distributed.
By means of the following code it is possible to create an array x whose elements are
the sample means evaluated for the k replications of n pseudo-random numbers from
X N ( = 4, 2 = 2).
>
>
>
>
>
set.seed(1000)
k <- 100
n <- 5
mean <- 4
sigma2 <- 2
Some Elements of Statistical Inference

Table 1.1 Four samples of size 5 from
X N ( = 4, 2 = 1)
and their sample mean
1
2
3
4
x1
x2
x3
x4
x5
3.37
3.45
2.61
4.24
2.29
3.33
3.22
4.22
4.06
5.02
4.17
4.04
4.90
3.97
3.83
1.11
2.89
2.06
2.11
4.30
3.50
3.57
3.19
3.58
> sd <- sigma2^0.5

> x <- replicate(k, mean(rnorm(n, mean, sd = sigma2^0.5)))
By initializing the seed with the instruction set.seed(), we fix the starting point
of the pseudo-random number generation routine; so, when needed, it is possible, by
invoking this instruction, to reproduce the same sequence of pseudo-random numbers.
The instruction rnorm(n,mean,sd) generates n pseudo-random numbers from a
normal random variable with mean = mean and standard deviation = sd.
Note that the third argument of the instruction makes reference to the standard
deviation and not to the variance 2 of the distribution. So in the preceding code
the command rnorm may also be stated as: rnorm(n, mean, sd), which is equivalent
to rnorm(n=n, mean=mean, sd=sd), having respected the ordering of the arguments.
The instruction replicate(n,expr) repeats for the number of times n, specified as
first argument, the expression expr (possibly a function) defined as second argument.
The result is a vector with the returns of expr in the n replications.
Table 1.1 reports 4 realizations of X1 , . . . , X5 (Xi N ( = 4, 2 = 2)) and in the
last column the corresponding value of the sample mean. We remark that each
replication differs, with probability 1, from the other ones; it follows the stochastic
nature also of the sample mean.
Table 1.1 may be reproduced by using the following code:
> set.seed(1000)
> sampletable <- matrix(rnorm(20, mean, sd), 4, 5,
byrow = TRUE)
> xbar <- rowMeans(sampletable)
> sampletable <- cbind(sampletable, xbar)
Figure 1.1, obtained with the code hist(x,breaks=15), reports the histogram of the
generated above.
k = 100 realizations of the random variable sample mean, X,
We can study in more depth the behaviour in presence of k = 50, 100, 500, 1000
replications for each of the sample sizes n = 9, 25, 64, 100 from the same Normal
distribution N ( = 4, 2 = 2), see Fig. 1.2, and we can observe that, according to
relationship (1.3), the dispersion of the sample mean estimator reduces when n
increases, while for large values of k the histograms resemble better the behaviour
of a Normal density function.
Fig. 1.2 may be obtained by means of the following code.
Frequency
10
15
2.5
3.0
3.5
4.0
4.5
5.0
5.5
Figure 1.1 Distribution of the sample mean from X N (4, 2);

sample size: n = 5, number of replications: k = 100
>
>
>
>
>
>
>
>
>
>
set.seed(1000)
kvals <- c(50, 100, 500, 1000)
nvals <- c(9, 25, 64, 100)
X <- data.frame(k = NA, n = NA, xbar = NA)
for (k in kvals) {
for (n in nvals) {
set.seed(1000)
X <- rbind(X, cbind(k = k, n = n, xbar = replicate(k,
mean(rnorm(n, 4, 1)))))
}
}
X <- X[-1, ]
X$k <- factor(X$k)
X$n <- factor(X$n)
library(lattice)
histogram(~xbar | k:n, data = X, breaks = seq(from = min(X$xbar),
to = max(X$xbar), length = 25), type = "density",

as.table = TRUE, xlab = paste("n = ", paste(nvals,
collapse = ", ")), ylab = paste("k = ", paste(rev(kvals),
collapse = ", ")))
>
>
>
>
>
>
>
>
>
>
set.seed(1000)
kvals <- c(50, 100, 500, 1000)
nvals <- c(9, 25, 64, 100)
X <- data.frame(k = NA, n = NA, xbar = NA)
for (k in kvals) {
for (n in nvals) {
set.seed(1000)
X <- rbind(X, cbind(k = k, n = n, xbar = replicate(k,
mean(rnorm(n, 4, 1)))))
}
}
X <- X[-1, ]
X$k <- factor(X$k)
X$n <- factor(X$n)
library(lattice)
histogram(~xbar | k:n, data = X, breaks = seq(from = min(X$xbar),
to = max(X$xbar), length = 25), type = "density",
as.table = TRUE, xlab = paste("n = ", paste(nvals,
collapse = ", ")), ylab = paste("k = ", paste(rev(kvals),
collapse = ", ")))
kvals and nvals are arrays containing respectively the values of the variables k and
n in the 16 situations depicted in Fig. 1.2.
X<-data.frame(k=NA,n=NA,xbar=NA) defines a data.frame X, with three columns
named k, n and xbar; the rows of X will contain the number (k) of replications and
the sample size (n) of the experiment, which the sample mean (xbar) refers to.
The sample means are evaluated for the k = 50, 100, 500, 1000 replications of n =
9, 25, 64, 100 pseudo-random numbers from X N ( = 4, 2 = 1). The construction
of X, which may seem a bit clumsy, will simplify the production of the graphs in Fig.
1.2 by means of the function histogram in the package lattice.
The assignment of the rows of X is obtained by using a double for cycle.
Observe that the pseudo-random numbers are generated, by the function replicate,
in blocks (arrays) of dimension k.
cbind binds column/matrix elements in a single matrix: in the present case blocks are
constructed, which contain in the first and second column the k and n identifiers and
in the third column the values of the sample means. All the blocks are subsequently
stacked in the X matrix by means of the function rbind.
By initializing the seed for each n, the first generated samples do not vary when we
increase the number k of replications.
Variables k and n in the data.frame X are then assigned the nature (class) of factors,
that is categorical variables, to simplify the graphical representation by means of the
3.0 3.5 4.0 4.5 5.0
3.0 3.5 4.0 4.5 5.0
50:9
50:25
50:64
50:100
100:9
100:25
100:64
100:100
4
3
2
1
0
k = 1000, 500, 100, 50
4
3
2
1
0
500:9
500:25
500:64
500:100
1000:9
1000:25
1000:64
1000:100
4
3
2
1
0
4
3
2
1
0
3.0 3.5 4.0 4.5 5.0
3.0 3.5 4.0 4.5 5.0
n = 9, 25, 64, 100
Figure 1.2 Distribution of the sample mean from X N (4, 1);

n: sample size, k: number of replications
function histogram available in the package lattice.

The function histogram is applied to represent the values of the sample means (xbar)
classified according to the different levels of the interaction of k and n, see the R help
?lattice::histogram for more information on the function.
1.1.2
The Central Limit Theorem
We now consider what happens when X, random variable with E(X) = and variance
V ar(X) = 2 , is not Normally distributed.
is the sample mean from (x1 , . . . xn ), realization of the n-dimensional random
If X
variable X1 , . . . Xn , whose components are identically and independently distributed
as X, by invoking the central limit theorem we have asymptotically that:
N (, 2 /n).
(1.4)
We remark that in this instance we have not required X to be normally distributed.
To give evidence to the central limit theorem result let X1 , . . . Xn be identically

and independently distributed as a Uniform(0,1) or an Exponential( = 4) random
variable, whose density functions are respectively:

1
0<y<1
Y U (0, 1) :
f (y) =
0
elsewhere

ew
0<w<
W Exp() :
f (w; ) =
0
elsewhere
with expected values:
E(Y ) =
and variances:
1
2
and
E(W ) =
1
1
and
V ar(W ) = 2 .
12
If we study the behaviour of the sample mean in presence of k = 50, 100, 500, 1000
replications for the sample sizes n = 9, 25, 64, 100 from the above distributions X
U (0, 1) and X Exp(), we can observe that, according to relationship (1.4), the
dispersion of the sample mean estimator reduces when n increases; while when k
gets larger the distribution of the sample mean is approximated by a Normal random
variable.
Figures 1.3 and 1.4 give evidence of the result and can be obtained by using the
same code producing Fig. 1.2, after having substituted the instruction
X<- rbind(X,cbind("k"=k,"n"=n,"xbar"=replicate(k,mean(rnorm(n,4,1)))))
with the code:
X<- rbind(X,cbind("k"=k,"n"=n,"xbar"=replicate(k,mean(runif(n)))))
for the uniform case and
X<- rbind(X,cbind("k"=k,"n"=n,"xbar"=replicate(k,mean(rexp(n,4)))))
for the exponential case.
V ar(Y ) =
0.2 0.3 0.4 0.5 0.6 0.7 0.8
0.2 0.3 0.4 0.5 0.6 0.7 0.8
50:9
50:25
50:64
50:100
100:9
100:25
100:64
100:100
10
5
0
k = 1000, 500, 100, 50
10
5
0
500:9
500:25
500:64
500:100
1000:9
1000:25
1000:64
1000:100
10
5
0
10
5
0
0.2 0.3 0.4 0.5 0.6 0.7 0.8
0.2 0.3 0.4 0.5 0.6 0.7 0.8
n = 9, 25, 64, 100
Figure 1.3 Distribution of the sample mean from Y U (0, 1);

0.1 0.2 0.3 0.4 0.5 0.6
0.1 0.2 0.3 0.4 0.5 0.6
50:9
50:25
50:64
50:100
100:9
100:25
100:64
100:100
15
10
5
0
k = 1000, 500, 100, 50
15
10
5
0
500:9
500:25
500:64
500:100
1000:9
1000:25
1000:64
1000:100
15
10
5
0
15
10
5
0
0.1 0.2 0.3 0.4 0.5 0.6
0.1 0.2 0.3 0.4 0.5 0.6
n = 9, 25, 64, 100
Figure 1.4 Distribution of the sample mean from W Exp(4);

2
An Introduction to Linear
Regression
2.1
Example: Individual wages (2.1.2)
We have first to read the data, available in the file wages1.dat, included in the
compressed file ch02.zip.
2.1.1
Data Reading and summary statistics
The function read.table allows one to read from a text data set file, where data
have been stored in text format, and create a data.frame, see Appendix A.3. The
data set file is assumed to be in a tabular form with one or more spaces or a tab as
field separator. The function unzip extracts a file from a compressed archive.
> wages1 <- read.table(unzip("ch02.zip", "Chapter 2/wages1.dat"),
header = T)
The description of the variables is provided in the file wages1.txt:
exper: experience in years
male: 1 if male, 0 if female
school: years of schooling
wage: wage (in 1980 $) per hour
To explore the initial and the final part of a data frame use the functions head and
tail.
> head(wages1)
EXPER MALE SCHOOL
1
9
0
13
2
12
0
12
3
11
0
11
4
9
0
14
5
8
0
14
6
9
0
14
WAGE
6.315296
5.479770
3.642170
4.593337
2.418157
2.094058
10
An Introduction to Linear Regression
> tail(wages1)
3289
3290
3291
3292
3293
3294
EXPER MALE SCHOOL

WAGE
5
1
8 5.512004
6
1
9 4.287114
5
1
9 7.145190
6
1
9 4.538784
10
1
8 2.909113
7
1
7 4.153974
The function summary produces some statistics summarizing the columns (variables)
of the data frame. The results may be compared with the sample statistics provided
by Verbeek in the file wages1.txt.
> summary(wages1)
EXPER
Min.
: 1.000
1st Qu.: 7.000
Median : 8.000
Mean
: 8.043
3rd Qu.: 9.000
Max.
:18.000
WAGE
Min.
: 0.07656
1st Qu.: 3.62157
Median : 5.20578
Mean
: 5.75759
3rd Qu.: 7.30451
Max.
:39.80892
MALE
Min.
:0.0000
1st Qu.:0.0000
Median :1.0000
Mean
:0.5237
3rd Qu.:1.0000
Max.
:1.0000
SCHOOL
Min.
: 3.00
1st Qu.:11.00
Median :12.00
Mean
:11.63
3rd Qu.:12.00
Max.
:16.00
If you want all the sample statistics provided in the file wages1.txt you can use the
function vsummary defined by the following code1 :
> vsummary0 <- function(x) c(Obs = length(x), Mean = mean(x),
Std.Dev. = sd(x), Min = min(x), Max = max(x),
na = sum(is.na(x)))
> vsummary <- function(x) t(apply(x, 2, vsummary0))
> vsummary(wages1)
EXPER
MALE
SCHOOL
WAGE
Obs
Mean Std.Dev.
3294 8.0434123 2.2906610
3294 0.5236794 0.4995148
3294 11.6305404 1.6575447
3294 5.7575850 3.2691858
Min
Max na
1.00000000 18.00000 0
0.00000000 1.00000 0
3.00000000 16.00000 0
0.07655561 39.80892 0
1 We add the information regarding the possible presence of missing values. The function is.na
returns the logical value TRUE if its argument is identified as not available (NA), otherwise FALSE.
40
11
30
20
females
males
10
Figure 2.1
2.1.2
Box & Whiskers plot of wages for males and females
Some graphical representations and grouping statistics
Lets compare the wages for males and females. A useful graphical representation is
the Box & Whiskers plot, see Fig. 2.1. Recall that the levels of the three lines defining
the box correspond respectively to the first, the second and the third Quartile of the
data (the second Quartile is the median). The values placed outside the two whiskers
may be considered anomalous with respect to the other data, see Chambers et. al.
(1983).
We can obtain the graph by having recourse to the function boxplot. The first
argument in this function is a formula, see Appendix A.4, establishing that we are
studying the WAGE as a function (~) of the gender (dummy variable MALE). The second
argument is the name of the data.frame containing the involved variables. By using
the third argument we attribute proper names to the values 0 and 1, that are assumed
by the variable MALE, which will appear on the graph.
> boxplot(WAGE ~ MALE, data = wages1, names = c("females",
"males"))
We can also represent the wage as a function of the years of experience, see Fig. 2.2
12
40
20
10
WAGE
30
10
15
40
EXPER
30
20
10
Figure 2.2
experience
9 10
12
14
16
18
Scatterplot and Box & Whiskers plot of wages by the number of years of
> layout(1:2)
> plot(WAGE ~ EXPER, data = wages1)
> boxplot(WAGE ~ EXPER, data = wages1)
The function plot results in a scatter plot diagram of the involved variables. The
function layout(matrix) creates a multifigure environment, the numbers in the
matrix (in our instance a column vector) define the pointer sequence specifying the
order the different graphs will appear.
We may desire to produce different graphs, for males and females, representing the
wage as a function of the years of experience, see Fig. 2.3. It is preferable to first
recode the dummy variable MALE in a categorical one, e.g. GENDER, that is a factor,
whose levels are f and m.
> wages1$gender <- as.factor(wages1$MALE)
> levels(wages1$gender) <- c("f", "m")
Finally we can produce the boxplot by studying the wage as a function of the
interaction (:) between experience and gender, that is as a function of the set of
40
13
30
20
10
9 10
40
12
14
16
18
30
20
10
1.f
4.f
7.f
10.f 13.f 16.f 1.m
4.m
7.m
11.m
15.m
Figure 2.3 Scatterplot and Box & Whiskers plot of wages by gender and the number of
years of experience
pairs of levels assumed by the two variables.

> layout(1:2)
> boxplot(WAGE ~ EXPER, data = wages1)
> boxplot(WAGE ~ EXPER:gender, data = wages1)
An easy way to obtain summary results for the variables in the data.frame, separately
for females and males is by means of the instruction by.
The first argument is an array or a data.frame or a matrix on whose columns
the function specified as third argument will be applied. The second argument is a
grouping variable whose length must be equal to the number of rows of the object
given as first argument.
We omit from the analysis the categorical variable gender (fifth column in the
data.frame wages1).
> by(wages1[, -5], wages1$MALE, summary)
wages1$MALE: 0
EXPER
MALE
SCHOOL
WAGE
14
Min.
: 1.000
Min.
:0
Min.
: 5.00
Min.
: 0.07656
1st Qu.: 6.000
1st Qu.:0
1st Qu.:11.00
1st Qu.: 3.17564
Median : 8.000
Median :0
Median :12.00
Median : 4.69326
Mean
: 7.732
Mean
:0
Mean
:11.84
Mean
: 5.14692
3rd Qu.: 9.000
3rd Qu.:0
3rd Qu.:13.00
3rd Qu.: 6.53275
Max.
:16.000
Max.
:0
Max.
:16.00
Max.
:32.49740
-----------------------------------------------wages1$MALE: 1
EXPER
MALE
SCHOOL
WAGE
Min.
: 2.000
Min.
:1
Min.
: 3.00
Min.
: 0.1535
1st Qu.: 7.000
1st Qu.:1
1st Qu.:10.00
1st Qu.: 4.0290
Median : 8.000
Median :1
Median :12.00
Median : 5.6543
Mean
: 8.326
Mean
:1
Mean
:11.44
Mean
: 6.3130
3rd Qu.:10.000
3rd Qu.:1
3rd Qu.:12.00
3rd Qu.: 7.8913
Max.
:18.000
Max.
:1
Max.
:16.00
Max.
:39.8089
2.1.3
Simple Linear Regression
Lets study by a linear regression model how the mean level of the variable WAGE
changes as a function of the gender: we can regress the variable WAGE on the dummy
variable MALE, which assumes value 1 when the subject is male and 0 when she is
female. We make use of the function linear model (lm); the first argument is the
regression formula, where the ~ symbol separates the dependent variable from the
independent one. The intercept is enclosed by default. The data argument specifies
the name of the data.frame containing the data.
We are thus studying the model
WAGE = 1 + 2 MALE + ERROR
(2.1)
whose parameter estimates are reported in Verbeeks Table 2.1.

> regr2.1 <- lm(WAGE ~ MALE, data = wages1)
With the function summary it is possible to summarize the results contained in the
lm object regr2.1 produced by the linear model estimation.
> summary(regr2.1)
Call:
lm(formula = WAGE ~ MALE, data = wages1)
Residuals:
Min
1Q Median
-6.160 -2.102 -0.554
3Q
Max
1.487 33.496
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.14692
0.08122
63.37
<2e-16 ***
15
MALE
1.16610
0.11224
10.39
<2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 3.217 on 3292 degrees of freedom
Multiple R-squared: 0.03175,
Adjusted R-squared: 0.03145
F-statistic: 107.9 on 1 and 3292 DF, p-value: < 2.2e-16
Recall that, in R, every result is an object and that the instructions names and str2
allow one to discover respectively the element names and the structure of any object.
> names(regr2.1)
[1] "coefficients"
[4] "rank"
[7] "qr"
[10] "call"
"residuals"
"fitted.values"
"df.residual"
"terms"
"effects"
"assign"
"xlevels"
"model"
Thus the object regr2.1 is a list containing 12 elements. If we want to extract one of
its elements, e.g. the coefficients, we may invoke one of the 3 following commands:
> regr2.1$coefficients
(Intercept)
MALE
5.146924
1.166097
> regr2.1["coefficients"]
$coefficients
(Intercept)
MALE
5.146924
1.166097
> regr2.1[["coefficients"]]
(Intercept)
MALE
5.146924
1.166097
obtaining respectively a vector, a list and again a vector.
Pay attention! The command3
> regr2.1["coefficients"] %*% c(1,2)
returns an Error, since the result of regr2.1["coefficients"] is a list and not a
vector and cannot be used as an argument of a matrix product. See Chapter 2 of
Longhow Lam (2010) for the definition of the Data Objects: list and vector.
Remember to use always double square brackets to extract elements in form of vectors
from a list object. The following instructions are correct:
> regr2.1[["coefficients"]] %*% c(1, 2)
2 We
omit to report the call and the result of the function str(regr2.1).
the help ?Arithmetic to have information on arithmetic operators in R: here %*% stands for
the matrix product.
3 See
16
[,1]
[1,] 7.479118
> regr2.1$coefficients %*% c(1, 2)
[,1]
[1,] 7.479118
Other useful statistics resulting from a regression analysis are available in the
object obtained by applying the function summary to the result of lm; so
names(regr2.1) and names(summary(regr2.1)) give different information. The
result of summary(regr2.1) is itself a list containing 11 elements.
> output <- summary(regr2.1)
> names(output)
[1] "call"
"terms"
[4] "coefficients" "aliased"
[7] "df"
"r.squared"
[10] "fstatistic"
"cov.unscaled"
> output$fstatistic
value
numdf
dendf
107.9338
1.0000 3292.0000
2.1.4
"residuals"
"sigma"
"adj.r.squared"
Confidence intervals (Section 2.5.2)
To test whether the parameter 2 is zero, that is to test the null hypothesis
H0 : 2 = 0, we can construct a confidence interval at level (1 ).
We have first to recall the coefficient estimates, their standard errors and the
degrees of freedom, we must establish a value for and determine the corresponding
percentage points for the t random variable.
As we have just recalled regr2.1$coefficients and regr2.1$df extract

respectively the coefficients and the degrees of freedom from the object regr2.1.
output$cov.unscaled extracts from the object output the matrix (X 0 X)1 .
The instruction is equivalent to summary(regr2.1)$cov.unscaled, remembering that we have assigned to the object output the result of summary(regr2.1).
Finally the function diag extracts the main diagonal from a matrix and by
means of qt(p,df) it is possible to define the p quantile of a t distribution
with df degrees of freedom.
> regr2.1$coefficients
(Intercept)
MALE
5.146924
1.166097
> coefse <- output$sigma * diag(output$cov.unscaled)^0.5
> coefse
(Intercept)
MALE
0.08122482 0.11224216
17
> regr2.1$df
[1] 3292
> alpha <- 0.05
> qt(1 - alpha/2, regr2.1$df)
[1] 1.960685
The lower and upper bounds of the MALE coefficient result respectively:
> regr2.1$coefficients[2] + c(-1, 1) * qt(1 - alpha/2,
regr2.1$df) * output$sigma * output$cov.unscaled[2,
2]^0.5
[1] 0.946 1.386
The confidence intervals, based on the t distribution, may also be obtained directly
for all parameter estimates, by using the function confint:
> confint(regr2.1, level = 1 - alpha)
2.5 % 97.5 %
(Intercept) 4.988 5.306
MALE
0.946 1.386
2.2
Multiple Linear Regression (Section 2.5.5)
2.2.1
Parameter estimation
We want to obtain the parameter estimates of the following linear model:

WAGE = 1 + 2 MALE + 3 SCHOOL + 4 EXPER + ERROR
(2.2)
The function lm allows us to perform also a linear regression with more variables as
regressors.
As we have already stated, the symbol ~ separates in a formula the dependent
variable from the independent ones and the + symbol, preceding a variable, indicates
the presence of that variable in the model. The intercept is enclosed by default. See
Appendix A.4.
With the following syntax we declare we desire to study, by making use of a linear
model (lm), the relationship between the variable WAGE and the set of independent
variables MALE, SCHOOL and EXPER for the data.frame wages1.
> regr2.2 <- lm(WAGE ~ MALE + SCHOOL + EXPER, data = wages1)
> summary(regr2.2)
Call:
lm(formula = WAGE ~ MALE + SCHOOL + EXPER, data = wages1)
Residuals:
Min
1Q Median
-7.654 -1.967 -0.457
3Q
Max
1.444 34.194
18
Coefficients:
Estimate Std. Error t value
(Intercept) -3.38002
0.46498 -7.269
MALE
1.34437
0.10768 12.485
SCHOOL
0.63880
0.03280 19.478
EXPER
0.12483
0.02376
5.253
--Signif. codes: 0 "***" 0.001 "**" 0.01
Pr(>|t|)
4.50e-13
< 2e-16
< 2e-16
1.59e-07
***
***
***
***
"*" 0.05 "." 0.1 " " 1

2.2.2
ANOVA to compare the two models (Section 2.5.5)
To establish if the variables SCHOOL and EXPER add a significant joint effect to the
variable MALE for explaining the dependent variable WAGE, we can compare the latter
model we have estimated (2.2) with (2.1) by using the function anova which performs
an analysis of variance in presence of nested models, see Verbeek p. 27. The first
argument of anova is the object resulting from lm applied to the simpler model, the
second argument is the lm object from the estimation of the more complex model.
> anova(regr2.1, regr2.2)
Analysis of Variance Table
Model 1: WAGE ~ MALE
Model 2: WAGE ~ MALE + SCHOOL + EXPER
Res.Df
RSS Df Sum of Sq
F
Pr(>F)
1
3292 34077
2
3290 30528 2
3549 191.24 < 2.2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
2.3
CAPM example (Section 2.7)
We can import as we made in Section 2.1 the data from the data set capm.dat.
> capm <- read.table(unzip("ch02.zip", "Chapter 2/capm.dat"),
header = T)
We remind that by using the functions head(), tail() and summary() applied to
the data.frame capm it is possible to explore the beginning and the final sections of
the data.frame and to obtain the summary statistics for all the variables included
in capm.
19
The data set contains information on stock market data, see the file capm.dat. Data,
pertaining the following variables, were collected from January 1960 to December
2006.
foodrf: excess returns food industry
durblrf: excess returns durables industry
constrrf: excess returns construction industry
rmrf: excess returns market portfolio
rf: risk free return
jan: dummy for January
smb: excess return on the Fama-French size (small minus big) factor
hml: excess return on the Fama-French value (high minus low) factor
2.3.1
CAPM regressions (without intercept) (Table 2.3)
Verbeek first considers the parameter estimation of the following three linear
regression models where the intercept is not included.
foodrf = 1 rmrf + ERROR
(2.3)
durblrf = 1 rmrf + ERROR
(2.4)
constrrf = 1 rmrf + ERROR
(2.5)
Observe the presence of the element -1 in the following formulae, first arguments of
the call to lm. It drops the intercept from the list of the regressors. See Appendix A.4.
> regr2.3f <- lm(foodrf ~ -1 + rmrf, data = capm)
> regr2.3d <- lm(durblrf ~ -1 + rmrf, data = capm)
> regr2.3c <- lm(constrrf ~ -1 + rmrf, data = capm)
Food
> summary(regr2.3f)
Call:
lm(formula = foodrf ~ -1 + rmrf, data = capm)
Residuals:
Min
1Q
-13.539 -1.026
Median
0.141
3Q
1.745
Max
15.924
Coefficients:
rmrf 0.75774
0.02579
29.39
<2e-16 ***
---
20
Signif. codes:
0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Durables
> summary(regr2.3d)
Call:
lm(formula = durblrf ~ -1 + rmrf, data = capm)
Residuals:
Min
1Q Median
-9.6504 -1.9420 -0.3069
3Q
Max
1.7332 17.8871
Coefficients:
rmrf 1.04736
0.02775
37.74
<2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Adjusted R-squared:
F-statistic: 1424 on 1 and 609 DF, p-value: < 2.2e-16
0.7
Construction
> summary(regr2.3c)
Call:
lm(formula = constrrf ~ -1 + rmrf, data = capm)
Residuals:
Min
1Q
-12.9414 -1.7193
Median
-0.1866
3Q
1.4458
Max
11.6551
Coefficients:
rmrf 1.16662
0.02535
46.01
<2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
21
How to produce results more appealing to read

The three preceding outputs are useful to separately interpret the three models we
had to estimate, regarding respectively the food, durables and construction industries.
We can present the results in a more efficient way to compare the three models,
by making use of the function mtable that is available in the package memisc.
The arguments to pass to mtable are the three objects we obtained applying the
instruction linear model lm to the food, durables and construction industries.
> library(memisc)
> mtable(regr2.3f, regr2.3d, regr2.3c)
Calls:
regr2.3f: lm(formula = foodrf ~ -1 + rmrf, data = capm)
regr2.3d: lm(formula = durblrf ~ -1 + rmrf, data = capm)
regr2.3c: lm(formula = constrrf ~ -1 + rmrf, data = capm)
=============================================
regr2.3f regr2.3d regr2.3c
--------------------------------------------rmrf
0.758*** 1.047*** 1.167***
(0.026)
(0.028)
(0.025)
--------------------------------------------R-squared
0.586
0.700
0.777
adj. R-squared
0.586
0.700
0.776
sigma
2.884
3.105
2.836
F
863.524 1424.100 2117.287
p
0.000
0.000
0.000
Log-likelihood -1511.236 -1556.104 -1500.924
Deviance
5066.744 5869.713 4898.298
AIC
3026.472 3116.207 3005.847
BIC
3035.299 3125.034 3014.674
N
610
610
610
=============================================
22
We can change the title and the labels in the preceding table, specify which statistics
have to appear in the final part of the table, and also relabel the name of the
independent variable rmrf:
> mtable2.3fdc <- mtable(Food = regr2.3f, Durables = regr2.3d,
Construction = regr2.3c, summary.stats = c("R-squared",
"sigma"))
> mtable2.3fdc <- relabel(mtable2.3fdc, rmrf = "excess market return")
> mtable2.3fdc
Calls:
Food: lm(formula = foodrf ~ -1 + rmrf, data = capm)
Durables: lm(formula = durblrf ~ -1 + rmrf, data = capm)
Construction: lm(formula = constrrf ~ -1 + rmrf, data = capm)
============================================================
Food
Durables
Construction
-----------------------------------------------------------excess market return
0.758***
1.047***
1.167***
(0.026)
(0.028)
(0.025)
-----------------------------------------------------------R-squared
0.586
0.700
0.777
sigma
2.884
3.105
2.836
============================================================
Evaluation of the uncentered R2 s
According to relationship (2.43) in Verbeek the uncentered R2 s is to be evaluated
when a linear model has no intercept. The uncentered R2 s are automatically produced
by R for the three models and figure in the previous output as R-squared (the R
software takes into account the information that the models are constrained).
> 1
[1]
> 1
[1]
> 1
[1]
- sum(regr2.3f$residuals^2)/sum(capm$foodrf^2)
0.5864245
- sum(regr2.3d$residuals^2)/sum(capm$durblrf^2)
0.7004574
- sum(regr2.3c$residuals^2)/sum(capm$constrrf^2)
0.7766193
2.3.2
23
Testing an hypothesis on 1
To test if the coefficients 1 in the linear models (2.3)-(2.5) can be assumed different
from 1 we have to evaluate the statistic:
1 1
.
se(1 )
The estimate of the variance of 1 may be obtained by using the instruction vcov,
which returns the covariance matrix of the parameter estimates. The matrix reduces
in the present case to a scalar, since we are considering a linear model with only one
predictor and without the constant term.
> vcov(regr2.3f)
rmrf
rmrf 0.0006649123
We can thus evaluate the above statistic for the three situations:
> sampletf <- (regr2.3f$coefficients[[1]] - 1)/vcov(regr2.3f)^0.5
> sampletd <- (regr2.3d$coefficients[[1]] - 1)/vcov(regr2.3d)^0.5
> sampletc <- (regr2.3c$coefficients[[1]] - 1)/vcov(regr2.3c)^0.5
and by using the code:
> paste("(Food) statistic: ", round(sampletf, 4),
"
p-value: ", round(2 * (1 - pt(abs(sampletf),
regr2.3f$df)), 4))
> paste("(Durables) statistic: ", round(sampletd,
4), "
p-value: ", round(2 * (1 - pt(abs(sampletd),
regr2.3d$df)), 4))
> paste("(Construction) statistic: ", round(sampletc,
4), "
p-value: ", round(2 * (1 - pt(abs(sampletc),
regr2.3c$df)), 4))
we obtain
[1] "(Food) statistic:
-9.3951
p-value: 0"
[1] "(Durables) statistic:
1.7065
p-value: 0.0884"
[1] "(Construction) statistic:
6.5719
p-value: 0"
The function linearHypothesis in the package car performs directly an F test. The
first argument is the lm object and the second one specifies the hypothesis to be tested
in matrix or symbolic form (see the help ?car::linearHypothesis).
Observe that the values of the statistic F are equal to the squared values of the t
statistics obtained above, while the p-values do coincide, since the proposed tests are
similar.
> library(car)
> linearHypothesis(regr2.3f, "rmrf=1")
24
Linear hypothesis test

Hypothesis:
rmrf = 1
Model 1: restricted model
Model 2: foodrf ~ -1 + rmrf
Res.Df
RSS Df Sum of Sq
F
Pr(>F)
1
610 5801.1
2
609 5066.7 1
734.37 88.268 < 2.2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
> linearHypothesis(regr2.3d, "rmrf=1")
Hypothesis:
rmrf = 1
Model 2: durblrf ~ -1 + rmrf
Res.Df
RSS Df Sum of Sq
F Pr(>F)
1
610 5897.8
2
609 5869.7 1
28.067 2.912 0.08843 .
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
> linearHypothesis(regr2.3c, "rmrf=1")
Hypothesis:
rmrf = 1
Model 2: constrrf ~ -1 + rmrf
Res.Df
RSS Df Sum of Sq
F
Pr(>F)
1
610 5245.7
2
609 4898.3 1
347.39 43.19 1.068e-10 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
2.3.3
25
CAPM regressions (with intercept) (Table 2.4)
In Verbeek it is then considered the parameter estimation of the following three linear
regression models:
foodrf = 1 + 2 rmrf + ERROR
(2.6)
durblrf = 1 + 2 rmrf + ERROR
(2.7)
constrrf = 1 + 2 rmrf + ERROR
(2.8)
>
>
>
>
>
regr2.4f <- lm(foodrf ~ rmrf, data = capm)

regr2.4d <- lm(durblrf ~ rmrf, data = capm)
regr2.4c <- lm(constrrf ~ rmrf, data = capm)
library(memisc)
mtable2.4 <- mtable(Food = regr2.4f, Durables = regr2.4d,
"sigma"))
> mtable2.4 <- relabel(mtable2.4, "(Intercept)" = "constant",
rmrf = "excess market return")
> mtable2.4
Calls:
Food: lm(formula = foodrf ~ rmrf, data = capm)
Durables: lm(formula = durblrf ~ rmrf, data = capm)
Construction: lm(formula = constrrf ~ rmrf, data = capm)
============================================================
Food
Durables
Construction
-----------------------------------------------------------constant
0.325**
-0.131
-0.073
(0.117)
(0.126)
(0.115)
excess market return
0.751***
1.050***
1.168***
(0.026)
(0.028)
(0.025)
-----------------------------------------------------------R-squared
0.583
0.700
0.776
sigma
2.869
3.104
2.837
============================================================
26
2.3.4
CAPM regressions (with intercept and January dummy) (Table

2.5)
The following models are considered to verify the presence of the January effect:
foodrf = 1 + 2 jan + 3 rmrf + ERROR
(2.9)
durblrf = 1 + 2 jan + 3 rmrf + ERROR
(2.10)
constrrf = 1 + 2 jan + 3 rmrf + ERROR
(2.11)
>
>
>
>
>
regr2.5f <- lm(foodrf ~ jan + rmrf, data = capm)

regr2.5d <- lm(durblrf ~ jan + rmrf, data = capm)
regr2.5c <- lm(constrrf ~ jan + rmrf, data = capm)
library(memisc)
mtable2.5 <- mtable(Food = regr2.5f, Durables = regr2.5d,
"sigma"))
rmrf = "excess market return", jan = "January dummy")
> mtable2.5
Calls:
Food: lm(formula = foodrf ~ jan + rmrf, data = capm)
Durables: lm(formula = durblrf ~ jan + rmrf, data = capm)
Construction: lm(formula = constrrf ~ jan + rmrf, data = capm)
============================================================
Food
Durables
Construction
-----------------------------------------------------------constant
0.397**
-0.143
-0.122
(0.121)
(0.132)
(0.120)
January dummy
-0.878*
0.139
0.604
(0.419)
(0.455)
(0.415)
excess market return
0.753***
1.050***
1.167***
(0.026)
(0.028)
(0.025)
-----------------------------------------------------------R-squared
0.586
0.700
0.776
sigma
2.861
3.107
2.835
============================================================
2.4
The Worlds Largest Hedge Found (Section 2.7.3)
Data are available in the file madoff.dat in the zip file ch02.zip.
> madoff <- read.table(unzip("ch02.zip", "Chapter 2/madoff.dat"),
header = T)
27
The following variables are included:
fsl: return (in %) on Fairfield Sentry
fslrf: excess returns
rf: risk free rate
rmrf: excess return on the market portfolio
hml: excess return on the Fama-French value (high minus low) factor
smb: excess return on the Fama-French size (small minus big) factor
Verbeek observes that a simple inspection of the return series produces some
suspiciuos results that are evident by considering some summary statistics:
the mean and the standard deviation that can be obtained by using the functions
mean and sd
> mean(madoff$fsl)
[1] 0.8422326
> sd(madoff$fsl)
[1] 0.7086928
the number of months with a negative return computed by summing up the

elements of the logical variable resulting from madoff$fsl<0 (which results
TRUE = 1 when the return is negative)
> sum(madoff$fsl < 0)
[1] 16
and the fraction of months with a negative return over the whole considered
periods, that is the ratio between the last result we obtained and the length of
the series (number of periods)
> sum(madoff$fsl < 0)/length(madoff$fsl)
[1] 0.0744186
A CAPM analysis is then performed, see Verbeeks Table 2.6, by considering the
following linear model
fslrf = 1 + 2 rmrf + ERROR
> regr2.6 <- lm(fslrf ~ rmrf, data = madoff)

> summary(regr2.6)
28
Call:
lm(formula = fslrf ~ rmrf, data = madoff)
Residuals:
Min
1Q
Median
-1.34773 -0.48005 -0.08337
3Q
0.38865
Max
2.97276
Coefficients:
(Intercept) 0.50495
0.04570 11.049 < 2e-16 ***
rmrf
0.04089
0.01072
3.813 0.00018 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
F-statistic: 14.54 on 1 and 213 DF, p-value: 0.0001801
2.5
Dummy Variables Treatment and Multicollinearity

(Section 2.8.1)
With regard to the data set Wages in the USA it is now considered the parameter
estimation of the following three equivalent linear regression models4 :
4 Remind
WAGE = const + M MALE + ERROR
(2.12)
WAGE = const + F FEMALE + ERROR
(2.13)
WAGE = M MALE + F FEMALE + ERROR
(2.14)
that the parameters in the model

WAGE = const + M MALE + F FEMALE + ERROR,
where MALE is a dummy variable with values 0 and 1 and FEMALE satisfies FEMALE = 1 - MALE, are
not identified, since there is exact collinearity among the constant and the dummy variables MALE
and FEMALE; so one of the variables has to be omitted from the model.
In (2.12) the substitution FEMALE = 1 - MALE has been performed, so dropping the variable FEMALE:
(const + F ) + (M F ) MALE = const + M MALE,
In (2.13) the substitution MALE = 1 - FEMALE has been performed, so dropping the variable MALE
(const + M ) + (F M ) FEMALE = const + F FEMALE,
In (2.14) the identity FEMALE + MALE = 1 has been taken into account, it follows that
WAGE = const (MALE + FEMALE) + M MALE + F FEMALE + ERROR
and:
(const + M ) MALE + (const + F ) FEMALE = M MALE + F FEMALE
Finally we have:
const = F
and
const = M .
29
to which correspond the following three model formulae.
WAGE ~ MALE
WAGE ~ I(1 - MALE)
WAGE ~ -1 + MALE + I(1 - MALE)
Remember that the dummy variable MALE assumes value 1 when the statistical unit
is male and 0 when she is female; so we can define a new dummy variable FEMALE as
1 - MALE.
To write the formula for the second regression model we have to use WAGE ~ I(1
- MALE), unless we do explicit define the new variable FEMALE <- 1 - MALE and use
it in the formula: WAGE ~ FEMALE, but this can be avoided.
Observe that with the function as is I() we specify to R that the difference sign
(-) is to be interpreted in the arithmetic sense and not in the formula sense, which
would drop the variable MALE from the model.
The function lm(WAGE ~ 1 - MALE, data = wages1) would namely result in a
model containing only the intercept, since the minus sign indicates to drop the variable
MALE from the model.
In specifying the third model the presence of the term -1 in the formula excludes
the intercept.
> wages1 <- read.table(unzip("ch02.zip", "Chapter 2/wages1.dat"),
header = T)
Regression 2.7A
> regr2.7A <- lm(WAGE ~ MALE, data = wages1)
> summary(regr2.7A)
Call:
lm(formula = WAGE ~ MALE, data = wages1)
Residuals:
Min
1Q Median
-6.160 -2.102 -0.554
3Q
Max
1.487 33.496
Coefficients:
(Intercept) 5.14692
0.08122
63.37
<2e-16 ***
MALE
1.16610
0.11224
10.39
<2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Regression 2.7B
30
> regr2.7B <- lm(WAGE ~ I(1 - MALE), data = wages1)

> summary(regr2.7B)
Call:
lm(formula = WAGE ~ I(1 - MALE), data = wages1)
Residuals:
Min
1Q Median
-6.160 -2.102 -0.554
3Q
Max
1.487 33.496
Coefficients:
(Intercept) 6.31302
0.07747
81.50
<2e-16 ***
I(1 - MALE) -1.16610
0.11224 -10.39
<2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Regression 2.7C
> regr2.7C <- lm(WAGE ~ -1 + MALE + I(1 - MALE), data = wages1)
> summary(regr2.7C)
Call:
lm(formula = WAGE ~ -1 + MALE + I(1 - MALE), data = wages1)
Residuals:
Min
1Q Median
-6.160 -2.102 -0.554
3Q
Max
1.487 33.496
Coefficients:
MALE
6.31302
0.07747
81.50
<2e-16 ***
I(1 - MALE) 5.14692
0.08122
63.37
<2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
31
Presenting the results in a nicer way

As we have already recalled in Section 2.3.1 we can produce an output to compare
the results for the three models, like Verbeeks Table 2.7, by having recourse to the
function mtable in the package memisc.
> library(memisc)
> mtable2.7 <- mtable(A = regr2.7A, B = regr2.7B, C = regr2.7C,
summary.stats = c("R-squared", "sigma"))
MALE = "male", "I(1 - MALE)" = "female")
> mtable2.7
Calls:
A: lm(formula = WAGE ~ MALE, data = wages1)
B: lm(formula = WAGE ~ I(1 - MALE), data = wages1)
C: lm(formula = WAGE ~ -1 + MALE + I(1 - MALE), data = wages1)
========================================
A
B
C
---------------------------------------constant
5.147*** 6.313***
(0.081)
(0.077)
male
1.166***
6.313***
(0.112)
(0.077)
female
-1.166*** 5.147***
(0.112)
(0.081)
---------------------------------------R-squared
0.032
0.032
0.764
sigma
3.217
3.217
3.217
========================================
2.6
Missing Data, Outliers and Influential Observations
See Section 4.1.8.

The Least Absolute Deviation approach to parameter estimation has been
implemented in R by Koenker in the package quantreg, see Koenker (2012).
2.7
How to check the form of the distribution
In statistical analyses is important to check if data follow some distribution. For

example the classical assumptions on the linear model require that errors are
distributed according to a Normal random variable. Thus, after having estimated
a linear model, one has to check if this distributional hypothesis is not rejected for
the residuals. The same issue is present in the analysis of time series when e.g. a
distributional assumption on white noise is made, see Chapter 8.
32
Later on let data be a series with elements x1 , . . . , xn . For the sake of simplicity
we work with data simulated from a normal distribution with mean equal to 50 and
unitary variance.
> set.seed(123)
> data <- rnorm(100) + 50
We first consider a graphical inspection of the distribution by plotting an histogram
of data with the theoretical density function of a Normal random variable; then the
2 goodness-of-fit test is introduced. The discussion will proceed by comparing the
empirical cumulative distribution function of data with the theoretical cumulative
distribution function of a Normal random variable; the Kolmogorov-Smirnov test is
based on this comparison. Two graphical tools, the QQ-plot and the PP-plot, will
be derived from the comparison of the empirical and the theoretical distribution
functions. All reasoning applies also in case we want to test some distributional
assumptions different than the Normal one.
2.7.1
Data histogram with the theoretical density function
We can obtain the histogram of data by using the function hist; pay attention to set
the argument freq = FALSE; in this way relative frequencies and relative densities,
in case of equal and different length intervals respectively, will be plotted
> data.hist <- hist(data, freq = FALSE)
We can add the density of a Normal distribution, by setting the mean and the standard
deviation arguments equal to the sample mean and the sample standard deviation
values of data, see Fig. 2.4.
> curve(dnorm(x, mean = mean(data), sd = sd(data)),
add = TRUE)
2.7.2
The 2 goodness-of-fit test
The object data.hist contains all information necessary to create the histogram.
Namely data.hist$breaks gives the limits of the intervals (classes) in the histogram,
and data.hist$counts the count corresponding to each class.
> data.hist$breaks
[1] 47.5 48.0 48.5 49.0 49.5 50.0 50.5 51.0 51.5 52.0 52.5
> data.hist$counts
[1] 1 3 10 11 23 22 13 9 5 3
We can thus build the following table by considering the same classes of the histogram
(the lowest and highest bounds of the histogram are replaced with and +
respectively)
33
0.2
0.0
0.1
Density
0.3
0.4
Histogram of data
48
49
50
51
52
data
Figure 2.4 Histogram of data with the theoretical density function under the hypothesis
of normality
> data.hist$breaks[1] <- -Inf

> data.hist$breaks[length(data.hist$breaks)] <- Inf
> table <- cbind(inf = data.hist$breaks[-length(data.hist$breaks)],
sup=data.hist$breaks[-1],"observed count"=data.hist$counts,
"theoretical count" = sum(data.hist$counts) *
diff(pnorm(data.hist$breaks, mean = mean(data),
sd = sd(data))))
> table
inf sup observed count theoretical count
[1,] -Inf 48.0
1
1.100883
[2,] 48.0 48.5
3
2.971850
[3,] 48.5 49.0
10
7.540375
[4,] 49.0 49.5
11
14.275082
[5,] 49.5 50.0
23
20.167108
[6,] 50.0 50.5
22
21.262835
[7,] 50.5 51.0
13
16.730787
34
[8,] 51.0 51.5

[9,] 51.5 52.0
[10,] 52.0 Inf
9
5
3
9.824401
4.304671
1.822008
The first two columns of table contain the class bounds zj1 and zj . The third column
contains the observed frequencies and the fourth column the theoretical frequencies
under the assumption of normality. These theoretical frequencies are obtained as n
pj ,
where the probabilities pj are defined as

zj1 x
zj x
pj =
s
s
, s2 the sample mean and the sample variance.
being zj1 and zj the class limits, and x
For testing the null hyphotesis of Normality we can have recourse to the 2
goodness-of-fit test, see Mood, Graybill and Boes (1974), which is based on the
statistic
k+1
X (nj n
pj )2
Q0k =
n
pj
j=1
where k + 1 is the number of the classes.
Q0k is distributed according to a 2k random variable with k degrees of freedom. With
reference to data we have
> (qstat <- sum((table[, 3] - table[, 4])^2/table[,
4]))
[1] 3.761746
with a corresponding p-value equal to
> 1 - pchisq(qstat, nrow(table) - 1)
[1] 0.9263825
so we will not reject the null hypothesis that the elements of data are distributed
according to a Normal random variable.
2.7.3
The Kolmorogov-Smirnov test
Let
#xi x
n
be the empirical cumulative distribution function (cdf) of data and F0 (x) a theoretical
cumulative distribution function, see Fig. 2.5 where the empirical cdf is the step
function and the theoretical cdf is the continuous one.
The Kolmogorov-Smirnov statistic to test the null hypothesis X F0 (), where
F0 () is some completely specified continuous cumulative distribution function is
Fn (x) =
Kn =
sup
<x<
|Fn (x) F0 (x)|.
(2.15)
0.6
0.4
0.0
0.2
Fn(x) and F0(x)
0.8
1.0
47
48
49
50
51
52
35
53
Figure 2.5 Empirical cumulative distribution function (the step function) and the
theoretical distribution function under the null hypothesis of normality
This test can also be used to check if the observations in two data sets (x1 , . . . , xnx )
and (y1 , . . . , yny ) come from the same distribution; in this case F0 (x) is replaced with
the empirical cdf calculated on (y1 , . . . , yny ).
The Kolmogorov-Smirnov statistic is based on the maximum absolute distance
between the empirical cdf Fn () and the theoretical one F0 (), see Fig. 2.6.
> plot(ecdf(data), xlim = c(47, 53), cex = 0.5, main = "",
ylab = expression(F[n](x)~~and~~F[0](x)))
> curve(pnorm(x, mean = mean(data), sd = sd(data)),
add = TRUE)
> x <- sort(data)
> curve(ecdf(data)(x) - pnorm(x, mean = mean(data),
sd = sd(data)), n = 10000, xlim = c(47, 53),
ylim = c(-0.06, 0.06), ylab = "distance")
> abline(h = 0)
36
0.00
0.06
0.04
0.02
distance
0.02
0.04
0.06
47
48
49
50
51
52
53
Figure 2.6 Distance between the empirical cumulative distribution function and the
theoretical distribution function under the null hypothesis of normality
The Kolmogorov-Smirnov test can be performed by having recourse to the function

ks.test, whose arguments are: x the data whose distribution we want to test; y
either a numeric vector of data values (in case one wants to compare y to x), or
a character string naming a cdf given by the user or one of the cdfs available in
R such as pnorm (only continuous cdfs are valid); ... are additional arguments
specifying the parameters of the distribution given (as a character string) by y;
alternative indicates the alternative hypothesis and must be one of "two.sided"
(default), "less", or "greater"; exact, which is NULL by default, can be a logical
indicating whether an exact p-value should be computed.
Relationship (2.15) makes reference to the two.sided alternative hypothesis. By
setting the option alternative = "less" the null hypothesis is specified as FX ()
FY (), that is X Y i.e. X is stochastically smaller than Y ; while if we set the option
alternative = "greater" the null hypothesis is specified as FX () FY (), that is
X Y i.e. X is stochastically greater than Y .
The corresponding Kolmogorov-Smirnov statistics are respectively
D = sup<x< {FY (x) FX (x)} for the set of hypotheses

H0 : X Y i.e. FX (x) FY (x)
alternative
H1 : X > Y i.e. FX (x) < FY (x)
D+ = sup<x< {FX (x) FY (x)} for the set of hypotheses

H0 : X Y i.e. FX (x) FY (x)
alternative =
H1 : X < Y i.e. FX (x) > FY (x)
37
"less",
"greater",
max(D , D+ ) = sup<x< |FY (x) FX (x)| for the set of hypotheses

H0 : X = Y i.e. FX (x) = FY (x)
alternative = "two.sided".
H1 : X 6= Y i.e. FX (x) 6= FY (x)
By applying the function ks.test to our data we have

> ks.test(data, "pnorm", mean = mean(data), sd = sd(data))
One-sample Kolmogorov-Smirnov test
data: data
D = 0.0581, p-value = 0.8884
alternative hypothesis: two-sided
and according to this result we will not reject the null hypothesis of normality.
2.7.4
The PP-plot and the QQ-plot
We now focus our attention, following an idea suggested by Diego Zappa, on the
comparison of the empirical cdf with the theoretical cdf to obtain the so-called PPplot and QQ-plot, see Zappa, Bramante and Nai Ruscone (2012). Figure 2.7 shows a
zoom of the graph in Fig. 2.5.
Figure 2.7 can be obtained with the following code.
> xlim <- c(47, 50)
> xtextshift <- 0.15
> plot(ecdf(data), xlim = xlim, ylim = ecdf(data)(xlim),
xaxs = "i", yaxs = "i", main = "",
ylab = expression(F[n](x)~~and~~F[0](x)), cex = 0.5)
> curve(pnorm(x, mean = mean(data), sd = sd(data)),
add = TRUE, xlim = xlim)
> point <- data[which.max(abs(ecdf(data)(data) - pnorm(data,
mean = mean(data), sd = sd(data))))]
> arrows(point, 0, point, ecdf(data)(point), length = 0.1,
angle = 22)
> arrows(point, 0, point, pnorm(point, mean = mean(data),
sd = sd(data)), length = 0.1, angle = 22)
38
> arrows(point, ecdf(data)(point), xlim[1], ecdf(data)(point),

length = 0.1, angle = 22)
> arrows(point, pnorm(point, mean = mean(data), sd = sd(data)),
xlim[1], pnorm(point, mean = mean(data), sd = sd(data)),
> text(point + 0.05, ecdf(data)(xlim[1]) + 0.01, expression(x[p]))
> text(xlim[1] + xtextshift, ecdf(data)(point) + 0.01,
expression(F[n](x[p])))
> text(xlim[1] + xtextshift, pnorm(point, mean = mean(data),
sd = sd(data)) + 0.01, expression(F[0](x[p])))
> point <- which(ecdf(data)(sort(data)) == 0.2)
> point <- (sort(data))[point - 6]
> fpoint <- ecdf(data)(point)
> arrows(xlim[1], fpoint, qnorm(fpoint, mean = mean(data),
sd = sd(data)), fpoint, length = 0.1, angle = 22)
> arrows(xlim[1], fpoint, point, fpoint, length = 0.1,
angle = 22)
> arrows(point, fpoint, point, ecdf(data)(xlim[1]),
> arrows(qnorm(fpoint, mean = mean(data), sd = sd(data)),
fpoint, qnorm(fpoint, mean = mean(data), sd = sd(data)),
ecdf(data)(xlim[1]), length = 0.1, angle = 22)
> text(xlim[1] + xtextshift/2, fpoint + 0.01, expression(tilde(p)))
> text(point - xtextshift, ecdf(data)(xlim[1]) + 0.01,
expression(x[tilde(p)]))
> text(qnorm(fpoint, mean = mean(data), sd = sd(data)) +
xtextshift, ecdf(data)(xlim[1]) + 0.01,
expression(x[0][tilde(p)]))
We have seen above that the Kolmogorov-Smirnov statistics are defined as a function
of the largest absolute, positive or negative difference between the two functions on
varying x.
For each xp in the ordered data set let p = Fn (xp ) be the value assumed by the
empirical cdf, see Fig. 2.7: xp is the p percentage point in the data.
Let now p = F0 (xp ) be the value assumed by the theoretical cdf in xp .
One way to compare the empirical cdf with the theoretical cdf is to obtain a scatter
plot diagram representing the pairs (p , p). This graphical representation is named
the probability-probability plot (PP-plot), see Fig. 2.8.
> p.orders <- 1:length(data)/length(data)
> plot(pnorm(sort(data), mean = mean(data), sd = sd(data)),
p.orders,pch=16, xlab = "p* (theoretical probabilities)",
ylab = "p (sample probabilities)")
> abline(0, 1)
Observe that if points are displaced on the straight line (0,0), (1,1) then F0 could
represent the data generating model. The PP-plot is particularly effective in detecting
39
Fn(xp)
0.4
F0(xp)
0.3
0.2
Fn(x) and F0(x)
~
p
0.1
0.0
47.0
47.5
48.0
x0p~
x~p
48.5
49.0
xp
49.5
50.0
Figure 2.7 Empirical cumulative distribution function and theoretical cumulative

distribution function: introduction to the PP-plot and QQ-plot graphical representations
deviatons from F0 in regions of high probability density (typically in the middle of

the distribution), see Section 2.9.
A dual way to compare the empirical cdf with the theoretical cdf is to start from a
generic value p assumed by the empirical cdf, see Fig. 2.7. We have two inverse images
of p: the value xp whose image through the empirical cdf, Fn (x), is p and the value
x0p which has image p by using the theoretical cdf F0 . The scatter plot diagram of
the pairs (x0p, xp) is named Quantile-Quantile Plot (QQ-plot), see Fig. 2.9.
> plot(qnorm(p.orders, mean = mean(data), sd = sd(data)),
sort(data), pch = 16,
xlab = expression(x[0][tilde(p)]~~(theoretical~quantiles)),
ylab = expression(x[tilde(p)]~~(sample~quantiles)))
> abline(0, 1)
The same graph can be obtained by applying to data the function qqnormAlso in
this case if points are on a straight line then F0 could represent the data generating
model. The QQ-plot is particularly effective in detecting deviations from F0 on the
tails of the distribution, see Section 2.9.
40
0.6
0.4
0.0
0.2
p (sample probabilities)
0.8
1.0
0.0
0.2
0.4
0.6
0.8
1.0
p* (theoretical probabilities)
Figure 2.8
2.7.5
PP-plot
Use of the function fit.cont
The function fit.cont, available in the package rriskDistributions, gives

several goodness-of-fit statistics (loglikelihood, AIC, BIC, Chi-squared, AndersonDarling and Kolmogorov-Smirnov) to check if the data follow some theoretical
cdf. The Beta, Cauchy, chi-square, non-central chi-square, exponential, F, gamma,
Gompertz, hypergeometric, lognormal, logistic, negative binomial, Normal, pert,
Poisson, Students t, truncated normal, triangular, uniform and Weibull model are
implemented.
A theoretical cdf appears in the output only when the procedure succeeds in
estimating its parameters otherwise a warning message is returned. Have a look at
the help system for more information.
The function fit.cont also produces the histogram with the theoretical density,
the QQ-plot, the empirical and theoretical cdfs and the PP-plot, see Fig. 2.10.
We observe that other statistical softwares draw the PP- and QQ-plots by switching
the x and y axes, so theoretical probabilities and theoretical quantiles will appear on
the y axis.
41
52
49
50
x~p (sample quantiles)
51
48
48
49
50
51
52
x0p~ (theoretical quantiles)
Figure 2.9
2.8
QQ-plot
Two tests for assessing normality
We consider two tests for assessing the normality distributional assumption.
2.8.1
The Jarque-Bera test
The Jarque-Bera test, see Jarque and Bera (1987), is obtained as a Lagrange
Multiplier statistic, see Verbeeks Chapter 6, and has the following forms:
in case of a sample of n observations (x1 , . . . , xn ) the Jarque-Bera statistic is

defined as:

(b1 )2
(b2 3)2
+
JB = n
6
24
where:
b1 =
,
3/2
b2 =
4
,
22
j =
1X
(xi x
)j
n i=1
and x
=
1X
xi .
n i=1
42
Figure 2.10
Fitting a continuous distribution by using the function fit.cont
Observe that b1 and b2 are respectively the skewness and kurtosis sample
coefficients, which are null under the normality assumption.
43
in case of a sample of n OLS residuals (e1 , . . . , en ) the Jarque-Bera statistic is

defined as:
"
2 #

2
1
23
4
3
1
3
1
+
3
JB = n
+
n
32
2
6
24
22
2
22
where:
n
j =
1X j
e .
n i=1 i
When the linear model includes a constant the residuals have zero mean, that
is
1 = 0, and the Jarque-Bera statistics reduces to the former definition.
In both cases the Jarque-Bera statistic is distributed as a 22 random variable with 2
degrees of freedom.
In the package tseries the function jarque.bera.test is available to perform the
Jarque-Bera test on a set of observations. By applying it on data we obtain
> library(tseries)
> jarque.bera.test(data)
Jarque Bera Test
data: data
X-squared = 0.1691, df = 2, p-value = 0.9189
and the null hypothesis of normality will not be rejected.
2.8.2
The Shapiro-Wilk test
The Shapiro-Wilk normality test, see Shapiro and Wilk (1965), is implemented in the
function shapiro.test; applying this function to data we obtain
> shapiro.test(data)
Shapiro-Wilk normality test
data: data
W = 0.9939, p-value = 0.9349
which does not reject the null hypothesis of normality.
2.9
Some further comments on the QQ-plot
We now consider the behaviour of the QQ-plot (and of the PP-plot), under the null
hypothesis of normality, in presence of data characterized by skewness, leptokurtic
and platikurtic behaviour.
44
2.9.1
Positively skewed distributions
Let X be distributed according to a Gamma distribution. The density function is

f (x; , ) =
1 1 x
x
e
I(0,) (x),
()
> 0, > 0
and we have E(X) = and V ar(X) = 2 .

Figure 2.11 shows the density functions and the cdfs of a Gamma random variable,
X, with parameters = 4 and = 2 and of a Normal random variable, Y , with mean
4
4
2 = 2 and variance 22 = 1
>
>
>
>
>
>
>
>
>
>
>
>
layout(1:2)
par(mai = c(0.5, 0.82, 0.1, 0.42))
alpha = 4
lambda = 2
curve(dgamma(x, alpha, lambda), xlim = c(-2, 6),
ylab = expression(f[X](x)~~and~~f[Y](x)))
curve(dnorm(x, mean = alpha/lambda), add = TRUE)
text(0.75, 0.4, expression(f[X](x)), cex = 0.75)
text(3, 0.35, expression(f[Y](x)), cex = 0.75)
curve(pgamma(x, alpha, lambda), xlim = c(-2, 6),
ylab = expression(F[X](x)~~and~~F[Y](x)))
curve(pnorm(x, mean = alpha/lambda), add = TRUE)
text(2, 0.75, expression(F[X](x)), cex = 0.75)
text(2, 0.35, expression(F[Y](x)), cex = 0.75)
We can establish the behaviour of the PP- and QQ-plots by considering the cumulative
distribution functions as was shown in Section 2.7.4
>
>
>
>
>
>
>
>
>
>
layout(1:2)
par(mai = c(0.9, 0.82, 0.1, 0.42))
x <- seq(-2, 6, length = 500)
plot(pnorm(x, mean = alpha/lambda), pgamma(x, alpha,
lambda), type = "l", xaxs = "i", yaxs = "i",
xlab = "theoretical probabilities",
ylab = "sample probabilities",
ylim = c(0, 1))
abline(0, 1)
x <- seq(0, 1, length = 1000)
plot(qnorm(x, mean = alpha/lambda), qgamma(x, alpha,
lambda), xlim = c(-2, 6), ylim = c(-2, 6), type = "l",
xlab = "theoretical quantiles", ylab = "sample quantiles")
abline(0, 1)
text(-0.75, 1.5, "left tail thinner than the normal tail",
cex = 0.75)
text(3, 5.5, "right tail fatter than the normal tail",
cex = 0.75)
45
In this situation the left tail of X is thinner than that of Y while the right tail of X
is fatter than that of Y . Thus the quantiles on the tails of the two distributions will
have the following behaviour: for any given p (close to 0 or to 1) the quantiles of X
are larger than those of Y . The behaviour is evident by examining the QQ plot.
The PP-plot clearly detects a different behaviour of the two distributions in the
middle of the domain.
We now apply the function fit.cont to some simulated data, see Fig. 2.15.
> set.seed(123)
> skew.data <- rgamma(100, alpha, lambda)
> library(rriskDistributions)
> fit.cont(skew.data)
2.9.2
Negatively skewed distributions
Figure 2.12 shows the density functions and the cdfs of X = W , being W the Gamma
random variable with parameters = 4 and = 2 considered in the previous section,
and of a Normal random variable, Y , with mean 24 = 2 and variance 242 = 1
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
layout(1:2)
par(mai = c(0.5, 0.82, 0.1, 0.42))
alpha = 4
lambda = 2
curve(dgamma(-x, alpha, lambda), xlim = c(-6, 2),
curve(dnorm(x, mean = -alpha/lambda), add = TRUE)
text(-0.75, 0.4, expression(f[X](x)), cex = 0.75)
text(-3, 0.35, expression(f[Y](x)), cex = 0.75)
curve(1 - pgamma(-x, alpha, lambda), xlim = c(-6,
2), ylab = expression(F[X](x)~~and~~F[Y](x)))
curve(pnorm(x, mean = -alpha/lambda), add = TRUE)
text(-1.75, 0.75, expression(F[X](x)), cex = 0.75)
text(-1.75, 0.35, expression(F[Y](x)), cex = 0.75)
layout(1:2)
par(mai = c(0.9, 0.82, 0.1, 0.42))
x <- seq(-6, 2, length = 500)
plot(pnorm(x, mean = -alpha/lambda), 1 - pgamma(-x,
alpha, lambda), type = "l", xaxs = "i", yaxs = "i",
ylab = "sample probabilities",
ylim = c(0, 1))
> abline(0, 1)
> x <- seq(0, 1, length = 1000)
> plot(qnorm(x, mean = -alpha/lambda), -qgamma(1 x, alpha, lambda), xlim = c(-6, 2), ylim = c(-6,
46
2), type = "l", xlab = "theoretical quantiles",

ylab = "sample quantiles")
> abline(0, 1)
> text(-2.5, -5, "left tail fatter than the normal tail",
cex = 0.75)
> text(0.75, -1.75, "right tail thinner than the normal tail",
cex = 0.75)
In this situation the left tail of X is fatter than that of Y while the right tail of X is
thinner than that of Y . Thus the quantiles on the tails of the two distributions will
have the following behaviour: for any given p (close to 0 or to 1) the quantiles of X
are smaller than those of Y . The behaviour is evident by examining the QQ plot.
As above the PP-plot clearly detects a different behaviour of the two distributions
in the middle of the domain.
We apply the function fit.cont to some simulated data, see Fig. 2.16.
> set.seed(123)
> skew.data <- -rgamma(100, alpha, lambda)
> fit.cont(skew.data)
2.9.3
Leptokurtic distributions
Let X be distributed according to a tk distribution with k degrees of freedom. We

k
.
have E(X) = 0 and V ar(X) = k2
The t distributions is used in finance since it is able to capture the fatter tails which
characterize the residuals distribution.
Figure 2.13 shows the density functions and the cdfs of a t random variable with
k = 4 degrees of freedom and of a Normal random variable, Y , with mean 0 and
variance 64 = 1.5
>
>
>
>
>
>
>
>
>
>
>
layout(1:2)
par(mai = c(0.5, 0.82, 0.1, 0.42))
k = 4
curve(dt(x, k), xlim = c(-8, 8),
curve(dnorm(x, mean = 0, sd = (k/(k - 2))^0.5), add = TRUE)
text(0.75, 0.35, expression(t[4]), cex = 0.75)
text(0, 0.24, "normal", cex = 0.75)
curve(pt(x, k), xlim = c(-8, 8),
curve(pnorm(x, mean = 0, sd = (k/(k - 2))^0.5), add = TRUE)
text(0, 0.2, expression(F[X](x)), cex = 0.75)
text(1.5, 0.7, expression(F[Y](x)), cex = 0.75)
>
>
>
>
>
>
>
>
>
>
47
layout(1:2)
par(mai = c(0.9, 0.82, 0.1, 0.42))
x <- seq(-8, 8, length = 500)
plot(pnorm(x, mean = 0, sd = (k/(k - 2))^0.5), pt(x,
k), type = "l", xaxs = "i", yaxs = "i",
ylab = "sample probabilities", ylim = c(0, 1))
abline(0, 1)
x <- seq(0, 1, length = 1000)
plot(qnorm(x, mean = 0, sd = (k/(k - 2))^0.5), qt(x,
k), xlim = c(-8, 8), ylim = c(-8, 8), type = "l",
abline(0, 1)
text(-1.25, -7.5, "left tail fatter than the normal tail",
cex = 0.75)
text(1, 7.5, "right tail fatter than the normal tail",
cex = 0.75)
In this situation the tails of X are fatter than those of Y . Thus the quantiles on the
tails of the two distributions will have the following behaviour: for any given p close
to 0 the quantiles of X are smaller than those of Y ; for any given p close to 1 the
quantiles of X are larger than those of Y . The behaviour is evident by examining the
QQ plot.
The density functions are now symmetric and thus the PP-plot intersect the 0-1 line
at the center of the distributions; however it always can detect the different behaviour
of the two distributions in the middle of their domain.
> set.seed(123)
> leptokurtic.data <- rt(100, k)
> fit.cont(leptokurtic.data)
2.9.4
Platikurtic distributions
Let W be distributed according to a uniform distribution. We have E(X) = 0.5 and

1
.
V ar(X) = 12
Figure 2.14 shows the density functions and the cdfs of X and of a Normal random
1
variable, Y , with mean 0.5 and variance 12
> layout(1:2)
> par(mai = c(0.5, 0.82, 0.1, 0.42))
> curve(dunif(x), xlim = c(-1, 2), ylim = c(0, 1.5),
> curve(dnorm(x, mean = 0.5, sd = 1/12^0.5), add = TRUE)
> text(-0.1, 1, expression(f[X](x)), cex = 0.75)
> text(0.75, 1.25, expression(f[Y](x)), cex = 0.75)
48
> curve(punif(x), xlim = c(-1, 2),

> curve(pnorm(x, mean = 0.5, sd = 1/12^0.5), add = TRUE)
> text(0.9, 0.75, expression(F[X](x)), cex = 0.75)
> text(0.4, 0.2, expression(F[Y](x)), cex = 0.75)
>
>
>
>
>
>
>
>
>
>
layout(1:2)
par(mai = c(0.9, 0.82, 0.1, 0.42))
x <- seq(-1, 2, length = 500)
plot(pnorm(x, mean = 0.5, sd = 1/12^0.5), punif(x),
type = "l", xaxs = "i", yaxs = "i",
ylab = "sample probabilities", ylim = c(0, 1))
abline(0, 1)
x <- seq(0, 1, length = 1000)
plot(qnorm(x, mean = 0.5, sd = 1/12^0.5), qunif(x),
xlim = c(-1, 2), ylim = c(-1, 2), type = "l",
abline(0, 1)
text(-0.5, 0.5, "left tail thinner than the normal tail",
cex = 0.75)
text(1.5, 0.5, "right tail thinner than the normal tail",
cex = 0.75)
In this situation the tails of Y are fatter than those of X. Thus the quantiles on the
tails of the two distributions will have the following behaviour: for any given p close
to 0 the quantiles of X are larger than those of Y ; for any given p close to 1 the
quantiles of X are smaller than those of Y . The behaviour is evident by examining
the QQ plot.
As above the density functions are now symmetric and thus the PP-plot intersect
the 0-1 line at the center of the distributions; however it can detect the different
behaviour of the two distributions in the middle of their domain.
> set.seed(123)
> platikurtic.data <- runif(100)
> fit.cont(platikurtic.data)
49
0.4
fX(x)
0.3
0.2
0.0
0.1
fX(x) and fY(x)
fY(x)
1.0
0.8
0.4
0.6
FX(x)
FY(x)
0.0
0.2
FX(x) and FY(x)
0.8
0.6
0.4
0.2
0.0
sample probabilities
1.0
0.2
0.4
0.6
0.8
theoretical probabilities
4
2
0
left tail thinner than the normal tail
sample quantiles
right tail fatter than the normal tail
theoretical quantiles
Figure 2.11 Density and cumulative distribution functions of a positively skewed

distribution (Gamma( = 4, = 2)) and of a Normal random variable. Theoretical PP-plot
and QQ-plot for the comparison of the two distributions
50
0.4
fX(x)
0.3
0.2
0.0
0.1
fX(x) and fY(x)
fY(x)
1.0
0.8
0.4
0.6
FX(x)
FY(x)
0.0
0.2
FX(x) and FY(x)
0.8
0.6
0.4
0.2
0.0
1.0
0.2
0.4
0.6
0.8
0
2
right tail thinner than the normal tail
sample quantiles
left tail fatter than the normal tail
Figure 2.12 Density and cumulative distribution functions of a negatively skewed

distribution and of a Normal random variable. Theoretical PP-plot and QQ-plot for the
comparison of the two distributions
51
0.1
0.2
normal
0.0
fX(x) and fY(x)
0.3
t4
1.0
0.4
0.6
FY(x)
FX(x)
0.0
0.2
FX(x) and FY(x)
0.8
0.8
0.6
0.4
0.2
0.0
1.0
0.0
0.2
0.4
0.6
0.8
1.0
0
5
sample quantiles
right tail fatter than the normal tail
left tail fatter than the normal tail
Figure 2.13 Density and cumulative distribution functions of a t4 random variable with
4 degrees of freedom (leptokurtic distribution) and a Normal random variable. Theoretical
PP-plot and QQ-plot for the comparison of the two distributions
52
1.5
0.5
1.0
fX(x)
0.0
fX(x) and fY(x)
fY(x)
0.5
0.0
0.5
1.0
1.0
1.5
2.0
1.5
2.0
0.8
0.4
0.6
FX(x)
FY(x)
0.0
0.2
FX(x) and FY(x)
1.0
0.5
0.0
0.5
1.0
0.8
0.6
0.4
0.2
0.0
1.0
1.0
0.2
0.4
0.6
0.8
0.0 0.5 1.0 1.5 2.0
right tail thinner than the normal tail
left tail thinner than the normal tail
1.0
sample quantiles
1.0
0.5
0.0
0.5
1.0
1.5
2.0
Figure 2.14 Density and cumulative distribution functions of a uniform random variable
(platikurtic distribution) and a Normal random variable. Theoretical PP-plot and QQ-plot
for the comparison of the two distributions
53
Figure 2.15
fit.cont
Fitting positively skewed data, see Section 2.9.1, by using the function
Figure 2.16
fit.cont
Fitting
negatively skewed data, see Section 2.9.2, by using the function
54
Figure 2.17
Fitting leptokurtic data, see Section 2.9.3, by using the function fit.cont
Figure 2.18
Fitting platikurtic data, see Section 2.9.4, by using the function fit.cont
3
Interpreting and comparing Linear
Regression Models
3.1
Explaining House Prices (Section 3.4)
The variable names in the dat file housing.dat provided in the zip file ch03.zip are
not stored only in the first line (the last two names are reported on the second line
of the text file), so if one desires to read data with the command read.table, see
Section 2.1, he has to first settle, by using a text editor, the variable names in the
housing.dat file only on the first line.
Here, we import data from the file housing.dta, which is saved in the Stata format.
We have first to invoke the package foreign and next the command read.dta.
Remember that the function unzip extracts a file from a compressed archive.
> library(foreign)
> housing <- read.dta(unzip("ch03.zip", "Chapter 3/housing.dta"))
Recall that it is possible to explore the beginning section and the final section of the
data.frame and to obtain summary statistics for all the variables included in the
data-frame by using the functions head(), tail() and summary().
The first linear model proposed by Verbeek studies the interpretation of
log(price) as a function of the log(lotsize), the number of bedrooms, the number
of bathrooms and the presence of air conditioning. The corresponding parameter
estimates may be obtained by using the command lm; the log transformation on
the specified variables can also be performed without applying the function as is
I(log()) (see Section 2.5) since, for the logarithm case, there is no ambiguity
between the use of mathematical and symbolic operators proper of the formula
function. See Appendix A.4.
> regr3.1 <- lm(log(price) ~ log(lotsize) + bedrooms +
bathrms + airco, data = housing)
> summary(regr3.1)
Call:
lm(formula = log(price) ~ log(lotsize) + bedrooms + bathrms +
airco, data = housing)
56
Interpreting and comparing Linear Regression Models
Residuals:
Min
1Q
-0.81782 -0.15562
Median
0.00778
3Q
0.16468
Max
0.84143
Coefficients:
(Intercept)
7.09378
0.23155 30.636 < 2e-16 ***
log(lotsize) 0.40042
0.02781 14.397 < 2e-16 ***
bedrooms
0.07770
0.01549
5.017 7.11e-07 ***
bathrms
0.21583
0.02300
9.386 < 2e-16 ***
airco
0.21167
0.02372
8.923 < 2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The estimate of s = 0.2456 (Residual standard error) is also available by invoking
the instruction summary(regr3.1)$sigma:
The expected log(price) of an house with specific characteristic may be obtained
by applying the function predict: the first argument is the lm object containing
the parameter estimates to use in the prediction; the second argument specifies the
values of the regressors, for which the prediction of the response is desired, in form
of data.frame.
> predict(regr3.1, data.frame(lotsize = 5000, bedrooms = 4,
bathrms = 1, airco = 0))
1
11.03088
One may obtain1 the prediction of the price by calculating the exp of the preceding
value or directly by means of the following expression:
> exp(predict(regr3.1, data.frame(lotsize = 5000, bedrooms = 4,
bathrms = 1, airco = 0)))
1
61751.63
To include one half of the residual variance s2 in the prediction use:
> exp(predict(regr3.1, data.frame(lotsize = 5000, bedrooms = 4,
bathrms = 1, airco = 0)) + summary(regr3.1)$sigma^2/2)
1
63641.78
1 Values
different from those proposed by Verbeek are due to approximations.
57
Verbeek observes how the average of predicted prices, which can be extracted from
the lm object regr3.1 as regr3.1$fitted
> mean(exp(regr3.1$fitted))
[1] 66152.74
underestimates the sample average of observed prices
> mean(housing$price)
[1] 68121.6
concluding that the bias can be reduced by adding the half-variance term
> mean(exp(regr3.1$fitted + summary(regr3.1)$sigma^2/2))
[1] 68177.61
3.1.1
Testing the functional form: construction of the RESET test
According to the RESET test procedure for checking the functional form of the
preceding model, one has to include, as predictors in the model specification, the
first Q powers, e.g. the second and the third ones, of the values of y estimated by the
model. The fitted y values may be obtained with the instruction:
> yhat <- predict(regr3.1)
Observe that when no value of the regressors is given as argument of predict, the
prediction is made for the data set used to estimate the parameters in the linear
model: in this way the fitted y values are obtained.
Two ways exist to include additional terms into a formula defining a linear model:2
by invoking lm and including the specified terms in the formula:

> regr3.1RESET2 <- lm(log(price) ~ log(lotsize) + bedrooms +
bathrms + airco + I(yhat^2) + I(yhat^3), data = housing)
we can modify the formula in the basic model by using the function update.
In the sequel we will follow this latter solution.
The function update has three arguments. The first is the object to update, which is
an object of class lm, that is the result of a preceding linear model call. The second one
is the updating formula: the dot stands for the same elements, so .~. means both
members of the invoked model in their original form. The additional term squared
predicted values is then included in the formula. The third optional3 argument is the
data.frame the variables involved in the linear model refer to.
2 Remind to use the as.is I() operator to specify the powers of the regressors, since is a symbol
proper of the formula method, see Appendix A.4.
3 In this way it is possible to update an existing formula and apply it to a new data.frame
58
> regr3.1RESET2 <- update(regr3.1, . ~ . + I(yhat^2))

> summary(regr3.1RESET2)
Call:
airco + I(yhat^2), data = housing)
Residuals:
Min
1Q
-0.81468 -0.15694
Median
0.00836
3Q
0.16274
Max
0.84243
Coefficients:
(Intercept)
5.00888
4.05883
1.234
0.218
log(lotsize) -0.13381
1.03870 -0.129
0.898
bedrooms
-0.02570
0.20157 -0.128
0.899
bathrms
-0.07774
0.57105 -0.136
0.892
airco
-0.07225
0.55235 -0.131
0.896
I(yhat^2)
0.06032
0.11724
0.515
0.607
From the p-value of I(yhat^2)) one can observe that the coefficient of the squared
predicted values is not significantly different from 0.
We have now to include also the third power of the predicted values: we update the
latter model regr3.1RESET2.
> regr3.1RESET3 <- update(regr3.1RESET2, . ~ . + I(yhat^3))
> summary(regr3.1RESET3)
Call:
airco + I(yhat^2) + I(yhat^3), data = housing)
Residuals:
Min
1Q
-0.81241 -0.15526
Median
0.00843
3Q
0.15948
Max
0.84892
Coefficients:
300.6983 -0.913
0.362
log(lotsize) -33.4090
35.8094 -0.933
0.351
bedrooms
-6.4829
6.9490 -0.933
0.351
bathrms
-18.0151
19.3038 -0.933
0.351
airco
-17.6684
18.9363 -0.933
0.351
I(yhat^2)
7.4812
7.9835
0.937
0.349
I(yhat^3)
-0.2207
0.2375 -0.930
0.353
59

To check if the joint effect of the second and third powers of the predicted values is
significant, one can perform an F test by comparing the original simpler linear model
with the current RESET model specification.
> anova(regr3.1, regr3.1RESET3)
Model 1: log(price) ~ log(lotsize) + bedrooms + bathrms + airco
Model 2: log(price) ~ log(lotsize) + bedrooms + bathrms + airco +
I(yhat^2) + I(yhat^3)
Res.Df
RSS Df Sum of Sq
F Pr(>F)
1
541 32.622
2
539 32.554 2 0.068179 0.5644 0.569
It is also possible to perform the Wald test, available in the package lmtest: the
formula is similar to the one used for anova, but with the presence of an argument
specifying the kind of test to be used: the exact F test assuming that the distribution
of the errors is normal or the asymptotic result 2 based upon the application of a
likelihood ratio test.
> library(lmtest)
> waldtest(regr3.1, regr3.1RESET3, test = "F")
Wald test
Res.Df Df
F Pr(>F)
1
541
2
539 2 0.5644 0.569
> waldtest(regr3.1, regr3.1RESET3, test = "Chisq")
Wald test
Res.Df Df Chisq Pr(>Chisq)
1
541
2
539 2 1.1288
0.5687
The first call gives results analogous to anova.
60
3.1.2
Testing the functional form: A direct function to perform the

RESET test
In the package lmtest it is available also the resettest function, which performs
directly the RESET test, by specifying the following arguments:
the linear model to be tested,
the powers of the additional terms to include in the RESET specification,
the kind of terms to include (in our case the fitted values, that is the predicted
values).
> library(lmtest)
> resettest(regr3.1, power = 2, type = "fitted")
RESET test
data: regr3.1
RESET = 0.2647, df1 = 1, df2 = 540, p-value = 0.6071
> resettest(regr3.1, power = 2:3, type = "fitted")
RESET test
data: regr3.1
The RESET statistics correspond to the F statistics in an ANOVA test comparing the
two models; in the first instance the RESET value is equal to the squared t statistic
calculated above to test the significance of I(yhat^2), namely 0.51452 = 0.2647 (the
significance level is the same for the two proposed tests, which are similar, that is
equivalent, since Tk2 = F1,k ). The second RESET value coincides with the F statistic
in the ANOVA analysis.
3.1.3
Testing the functional form: the RESET test for the extended
model
Since prices may also depend on other characteristics, all the variables available in the
data set are included in the preceding model specification: we have to update model
regr3.1:
> regr3.2 <- update(regr3.1, . ~ . + driveway + recroom +
fullbase + gashw + garagepl + prefarea + stories)
> summary(regr3.2)
Call:
airco + driveway + recroom + fullbase + gashw + garagepl +
prefarea + stories, data = housing)
Residuals:
Min
1Q
-0.68355 -0.12247
Median
0.00802
3Q
0.12780
61
Max
0.67564
Coefficients:
(Intercept)
7.74509
0.21634 35.801 < 2e-16 ***
log(lotsize) 0.30313
0.02669 11.356 < 2e-16 ***
bedrooms
0.03440
0.01427
2.410 0.016294 *
bathrms
0.16576
0.02033
8.154 2.52e-15 ***
airco
0.16642
0.02134
7.799 3.29e-14 ***
driveway
0.11020
0.02823
3.904 0.000107 ***
recroom
0.05797
0.02605
2.225 0.026482 *
fullbase
0.10449
0.02169
4.817 1.90e-06 ***
gashw
0.17902
0.04389
4.079 5.22e-05 ***
garagepl
0.04795
0.01148
4.178 3.43e-05 ***
prefarea
0.13185
0.02267
5.816 1.04e-08 ***
stories
0.09169
0.01261
7.268 1.30e-12 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
the estimate of sigma can be extracted as usual with:
> summary(regr3.2)$sigma
[1] 0.2103959
An F test is performed to compare the present model with the previous one:
> anova(regr3.1, regr3.2)
driveway + recroom + fullbase+gashw+garagepl+prefarea+stories
Res.Df
RSS Df Sum of Sq
F
Pr(>F)
1
541 32.622
2
534 23.638 7
8.9839 28.993 < 2.2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The following results show that according to the RESET tests the hypothesis of linear
model should not be rejected.
> resettest(regr3.2, power = 2, type = "fitted")
62
RESET test
data: regr3.2
> resettest(regr3.2, power = 2:3, type = "fitted")
RESET test
data: regr3.2
The 0.0033 in the first RESET output is the square of the rounded value (0.06) present
in Verbeek at p. 75, which is referred to the t-test formulation of the RESET test.
3.1.4
Testing the functional form: the interaction term
To establish in the formula the presence of the interaction term among prefarea and
bedrooms we can follow two methods, see Appendix A.4:
define the new term in the formula as the product between the involved
variables: I(prefarea*bedrooms).
define the new term in the formula by making use of the : operator, which in
the formula algebra stands for interaction, as prefarea:bedrooms.
We will use the second option.

> regr3.2interact <- update(regr3.2, . ~ . + prefarea:bedrooms)
> summary(regr3.2interact)
Call:
airco + driveway + recroom + fullbase + gashw + garagepl +
prefarea + stories + bedrooms:prefarea, data = housing)
Residuals:
Min
1Q
-0.68341 -0.12302
Median
0.00793
3Q
0.12909
Max
0.67559
Coefficients:
(Intercept)
log(lotsize)
bedrooms
bathrms
airco
driveway
recroom
fullbase
gashw

7.743235
0.216995 35.684 < 2e-16 ***
0.303096
0.026719 11.344 < 2e-16 ***
0.035086
0.015213
2.306 0.021477 *
0.166005
0.020429
8.126 3.12e-15 ***
0.166688
0.021453
7.770 4.06e-14 ***
0.110282
0.028259
3.903 0.000107 ***
0.057990
0.026077
2.224 0.026581 *
0.104573
0.021721
4.814 1.93e-06 ***
0.178903
0.043943
4.071 5.39e-05 ***
garagepl
0.047961
prefarea
0.146040
stories
0.091473
bedrooms:prefarea -0.004675
--Signif. codes: 0 "***" 0.001
0.011487
0.110285
0.012729
0.035556
4.175
1.324
7.186
-0.131
63
3.48e-05 ***
0.186003
2.26e-12 ***
0.895454
"**" 0.01 "*" 0.05 "." 0.1 " " 1

3.1.5
Prediction
The expected log sale price for an arbitrary house in Windsor, with the characteristics
specified in Verbeek can be obtained as:
> predictregr3.2 <- predict(regr3.2, data.frame(lotsize = 10000,
bedrooms = 4, bathrms = 1, airco = 1, driveway = 1,
recroom = 1, fullbase = 1, gashw = 1, garagepl = 2,
prefarea = 1, stories = 2))
> predictregr3.2
1
11.86959
> exp(predictregr3.2)
1
142855.1
and the prediction corrected by considering the half variance factor:
> exp(predictregr3.2 + summary(regr3.2)$sigma^2/2)
1
146052.2
3.1.6
Model with price instead of log(price) as dependent variable and

lotsize instead log(lotsize) among the predictors
Prices are then modelled instead of log prices:

> regr3.3 <- update(regr3.2, price ~ lotsize + . log(lotsize))
> summary(regr3.3)
Call:
lm(formula = price ~ lotsize + bedrooms + bathrms + airco + driveway +
recroom + fullbase + gashw + garagepl + prefarea + stories,
data = housing)
64
Residuals:
Min
1Q Median
-41389 -9307
-591
3Q
7353
Max
74875
Coefficients:
(Intercept) -4038.3504 3409.4713 -1.184 0.236762
lotsize
3.5463
0.3503 10.124 < 2e-16 ***
bedrooms
1832.0035 1047.0002
1.750 0.080733 .
bathrms
14335.5585 1489.9209
9.622 < 2e-16 ***
airco
12632.8904 1555.0211
8.124 3.15e-15 ***
driveway
6687.7789 2045.2458
3.270 0.001145 **
recroom
4511.2838 1899.9577
2.374 0.017929 *
fullbase
5452.3855 1588.0239
3.433 0.000642 ***
gashw
12831.4063 3217.5971
3.988 7.60e-05 ***
garagepl
4244.8290
840.5442
5.050 6.07e-07 ***
prefarea
9369.5132 1669.0907
5.614 3.19e-08 ***
stories
6556.9457
925.2899
7.086 4.37e-12 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 15420 on 534 degrees of freedom
3.1.7
The PE test to compare a loglinear specification with the linear

specification
To choose between a linear and a loglinear functional form Verbeek, p. 69, suggests
to make recourse to the PE test procedure4 .
We consider first the construction step by step of the test. We have first to obtain
the predictors in the linear and in the loglinear specifications.
> predlin <- predict(regr3.3)
> predloglin <- predict(regr3.2)
Then we have to consider the estimation of the augmented linear model by adding
the proper term, see Verbeek, and perform an ANOVA to compare the augmented
model with the initial one.
> linaugm <- update(regr3.3, . ~ . + I(log(predlin) predloglin))
> anova(linaugm, regr3.3)
4 At Verbeeks pp. 67-68 the encompassing procedure is presented to compare two non-nested
linear models. This is implemented in the R function encomptest, available in the package lmtest.
See the help ?lmtest::encomptest for more information on this function.
65

Model 1: price ~ lotsize + bedrooms + bathrms + airco + driveway +
recroom + fullbase + gashw + garagepl + prefarea + stories +
I(log(predlin) - predloglin)
recroom + fullbase + gashw + garagepl + prefarea + stories
Res.Df
RSS Df
Sum of Sq
F
Pr(>F)
1
533 1.1849e+11
2
534 1.2703e+11 -1 -8534724490 38.391 1.159e-09 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Finally we have to estimate the augmented loglinear model by adding the proper
term, see again Verbeek, and perform an ANOVA to compare the augmented model
with the initial one.
> loglinaugm <- update(regr3.2, . ~ . + I(predlin exp(predloglin)))
> anova(loglinaugm, regr3.2)
driveway + recroom + fullbase + gashw + garagepl + prefarea
stories + I(predlin - exp(predloglin))
driveway + recroom + fullbase + gashw + garagepl + prefarea
stories
Res.Df
RSS Df Sum of Sq
F Pr(>F)
1
533 23.624
2
534 23.638 -1 -0.014342 0.3236 0.5697
+
+
+
+
However, in the package lmtest it is available the function petest that performs the
two previous tests by adding the proper augmentation terms to the linear and loglinear
models and returns the parameter estimates and the t statistics of the augmented
terms in the augmented models.
> library(lmtest)
> petest(regr3.3, regr3.2)
PE test
recroom + fullbase + gashw + garagepl + prefarea + stories
driveway + recroom + fullbase + gashw + garagepl + prefarea +
stories
66
M1 + log(fit(M1))-fit(M2)
-74774
12068 -6.1961 1.159e-09 ***
M2 + fit(M1)-exp(fit(M2))
0
0 -0.5688
0.5697
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Observe that the square of the t values correspond to the F statistic obtained before.
3.2
Selection procedures: Predicting Stock Index Returns

(Section 3.5)
In Section 3.2.2 Verbeek presents some criteria apt to perform regressor selection. In
2 , the stepwise,
Section 3.5 Verbeek compares the models corresponding to the max R
the min AIC and the min BIC criteria with a general unrestricted model explaining
the excess return on S&P 500 index, EXRET, conditional on the full set of regressors
consisting of:
CS_1: credit spread (yield on Moodys Aaa minus BBa debt), lagged one month,
DY_1: dividend yield S&P 500 index, lagged one month (in % per month),
I12_1: 12-month interest rate, lagged one month,
I12_2: 12-month interest rate, lagged two months,
I3_1: 1-month interest rate, lagged one month,
I3_2: 1-month interest rate, lagged two months,
INF_2: inflation, lagged two months,
IP_2: change industrial production, lagged two months,
MB_2: change in monetary base, lagged two months,
PE_1: price earnings ratio (S&P500), lagged one month,
TS_1: term spread, lagged one month,
WINTER: dummy, 1 in November to April, 0 otherwise,
First we read the data set available in the file predictsp.dat.

> pred <- read.table(unzip("ch03.zip", "Chapter 3/predictsp.dat"),
header = T)
One, as usual, may use the instructions head(pred), tail(pred) and
summary(pred) to explore the data and check the main summary statistics.
To check the time extension of the data in the data.frame, we can derive from the
variable OBS, which has the following text form: yyyyMmm, (e.g. 1985M07) the variables
year and month by extracting respectively the substrings consisting of the first 4 and
the last 2 characters:
> year <- substr(pred$OBS, 1, 4)
> month <- substr(pred$OBS, 6, 7)
67
We next construct a contingency table month by year. By looking at this table possibly
missing time records may be found:
> table(year, month)
month
year
01 02 03 04 05 06 07 08 09 10 11 12
1966 1 1 1 1 1 1 1 1 1 1 1 1
1967 1 1 1 1 1 1 1 1 1 1 1 1
1968 1 1 1 1 1 1 1 1 1 1 1 1
omitted
month
year
01 02 03 04 05 06 07 08 09 10 11 12
2003 1 1 1 1 1 1 1 1 1 1 1 1
2004 1 1 1 1 1 1 1 1 1 1 1 1
2005 1 1 1 1 1 1 1 1 1 1 1 1
To check if the time series is complete (presence of no missing data) one may control
if some entry in the preceding table exists, whose value is zero.
> prod(table(year, month))
[1] 1
In our case the time series is complete since the product of the entries in the table is
different from 0.
The regression analyses are proposed on the time window beginning at January 1966
and ending at December 1995, so it is possible to define a time series object and a
temporary variable to perform regressions.
> pred <- ts(data = pred, start = c(1966, 1), frequency = 12)
> predtmp <- window(pred, start = c(1966, 1), end = c(1995,
12))
To compare the goodness of the models, the out of sample forecasting performance
will also be evaluated on the time window starting at January 1996 and ending at
December 2005, see below Section 3.2.9.
> predctrl <- window(pred, start = c(1996, 1))
Pay attention to the definition of the variable EXRET, the excess return, which is
expressed as percentage; so in the following analyses it has to be divided by 100.
We now present the methods to implement the four main procedures described by
Verbeek to perform a model selection.
3.2.1
The full model
The full model consists simply in the estimation of the general unrestricted linear
model.
68
> regr3.4f <- lm(EXRET/100 ~ PE_1 + DY_1 + INF_2 +

IP_2 + I3_1 + I3_2 + I12_1 + I12_2 + MB_2 + CS_1 +
WINTER, data = predtmp)
> summary(regr3.4f)
Call:
lm(formula = EXRET/100 ~ PE_1 + DY_1 + INF_2 + IP_2 + I3_1 +
I3_2 + I12_1 + I12_2 + MB_2 + CS_1 + WINTER, data = predtmp)
Residuals:
Min
1Q
-0.194908 -0.023106
Median
0.001566
3Q
0.026069
Max
0.138232
Coefficients:
(Intercept) 0.020743
0.040228
0.516 0.60644
PE_1
-0.119712
0.129367 -0.925 0.35542
DY_1
0.126504
0.082880
1.526 0.12783
INF_2
-0.163318
0.076788 -2.127 0.03413 *
IP_2
-0.059783
0.061097 -0.978 0.32851
I3_1
0.268687
0.124505
2.158 0.03161 *
I3_2
-0.222916
0.121074 -1.841 0.06645 .
I12_1
-0.505236
0.123478 -4.092 5.33e-05 ***
I12_2
0.388662
0.127934
3.038 0.00256 **
MB_2
-0.043959
0.083836 -0.524 0.60037
CS_1
0.175387
0.109343
1.604 0.10962
WINTER
0.006249
0.004405
1.419 0.15693
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
F-statistic: 6.47 on 11 and 348 DF, p-value: 8.4e-10
3.2.2
2 criterion
The max R
2 criterion, denoted in R by all/best subsets regression may be performed

The max R
by invoking the command regsubsets available in the package leaps. The first
argument is the model; in the present case EXRET/100~. includes as regressors all
the variables in the data set specified as third argument, the fourth argument is the
maximum number of subset models to consider and the fifth one an array specifying
the columns to exclude.
The result is an object listing the models ordered by their adjusted R2 values.
> library(leaps)
> regr3.4mr <- regsubsets(EXRET/100 ~ ., data = predtmp,
nvmax = 12, force.out = c(1, 12))
> summary(regr3.4mr)
Subset selection object
Call: regsubsets.formula(EXRET/100
force.out = c(1, 12))
13 Variables (and intercept)
Forced in Forced out
CS_1
FALSE
TRUE
DY_1
FALSE
FALSE
I12_1
FALSE
FALSE
I12_2
FALSE
FALSE
I3_1
FALSE
FALSE
I3_2
FALSE
FALSE
INF_2
FALSE
FALSE
IP_2
FALSE
FALSE
MB_2
FALSE
FALSE
PE_1
FALSE
FALSE
WINTER
FALSE
FALSE
OBS
FALSE
TRUE
TS_1
FALSE
FALSE
1 subsets of each size up to 12
Selection Algorithm: exhaustive
CS_1 DY_1 I12_1 I12_2 I3_1 I3_2
1 "*" " " " "
" "
" " " "
2 " " " " "*"
"*"
" " " "
3 "*" " " "*"
"*"
" " " "
4 "*" "*" "*"
"*"
" " " "
5 "*" "*" "*"
"*"
" " " "
6 "*" "*" "*"
"*"
"*" "*"
7 "*" "*" "*"
"*"
"*" "*"
8 "*" "*" "*"
"*"
"*" "*"
9 "*" "*" "*"
"*"
"*" "*"
10 "*" "*" "*"
"*"
"*" "*"
11 "*" "*" "*"
"*"
"*" "*"
69
~ ., data = predtmp, nvmax = 12,
INF_2
" "
" "
" "
" "
" "
" "
"*"
"*"
"*"
"*"
"*"
IP_2
" "
" "
" "
" "
" "
" "
" "
" "
" "
"*"
"*"
MB_2
" "
" "
" "
" "
" "
" "
" "
"*"
"*"
" "
"*"
PE_1
" "
" "
" "
" "
" "
" "
" "
" "
" "
"*"
"*"
WINTER
" "
" "
" "
" "
"*"
" "
" "
" "
"*"
"*"
"*"
OBS
" "
" "
" "
" "
" "
" "
" "
" "
" "
" "
" "
TS_1
" "
" "
" "
" "
" "
" "
" "
" "
" "
" "
" "
2 , that is the adjusted R2 , for the

With the following instructions we can print R
selected models and obtain a graphical representation, see Fig. 3.1, representing the
2 s.
structure of the models, which have been considered, with their R
> cbind(model = 1:11, "Adjusted R-squared" = summary(regr3.4mr)$adjr2)
model Adjusted R-squared
[1,]
1
0.01899891
[2,]
2
0.08149153
[3,]
3
0.11350781
[4,]
4
0.12806215
[5,]
5
0.13212041
[6,]
6
0.13601132
[7,]
7
0.14031904
70
0.15
0.14
0.14
0.14
adjr2
0.14
0.14
0.13
0.13
0.11
0.081
Figure 3.1
TS_1
OBS
WINTER
PE_1
MB_2
IP_2
I3_2
INF_2
I3_1
I12_2
I12_1
DY_1
CS_1
(Intercept)
0.019
adjusted R squared values for subset regression models
[8,]
8
0.14310437
[9,]
9
0.14496314
[10,]
10
0.14532321
[11,]
11
0.14354387
> plot(regr3.4mr, scale = "adjr2")
We can extract the coefficients5 for the model with max R2
> coef(regr3.4mr, 10)
(Intercept)
CS_1
DY_1
I12_1
I12_2
0.031193369 0.158311342 0.102823200 -0.503855027 0.397589862
I3_1
I3_2
INF_2
IP_2
PE_1
0.258969875 -0.223074248 -0.156767570 -0.068582462 -0.157694683
WINTER
0.006415849
5 This
can also be made with the command coef(regr3.4mr,which.max(summary(regr3.4mr)$adjr2))
71
To compare the parameter estimates of this model with those pertaining the other
selection criteria we have to obtain an lm object with the estimation results.
This may be obtained by performing the following procedure.
1. The names of the variables, included as regressors in the selected model, correspond
to the names of the coefficients we have just obtained but the first element:
> anames <- names(coef(regr3.4mr, 10))[-1]
> anames
[1] "CS_1"
"DY_1"
"I12_1" "I12_2" "I3_1"
[7] "INF_2" "IP_2"
"PE_1"
"WINTER"
"I3_2"
2. The function match returns a vector of the positions of (first) matches of its first
argument in its second.
We can use this function to get the column indices in the data.frame predtmp
(specified as second argument) that match to the vector of elements consisting of
the dependent variable name, EXRET, and the independent variable names, anames,
(first argument).
> a <- match(c("EXRET", anames), colnames(predtmp))
> a
[1] 4 2 3 5 6 7 8 9 10 12 14
3. Finally the data.frames necessary to perform the linear regression and the
out of sample performance evaluation can be defined by selecting the involved
columns/variables in predtmp and predctrl, and the regr3.4mr is obtained as an
lm object.
> predtmpmr <- data.frame(predtmp[, a])
> predctrlmr <- data.frame(predctrl[, a])
> regr3.4mr <- lm(EXRET/100 ~ ., data = predtmpmr)
> summary(regr3.4mr)
Call:
lm(formula = EXRET/100 ~ ., data = predtmpmr)
Residuals:
Min
1Q
-0.195037 -0.023523
Median
0.001951
3Q
0.026865
Max
0.138763
Coefficients:
0.034907
0.894 0.37214
CS_1
0.158311
0.104272
1.518 0.12986
DY_1
0.102823
0.069422
1.481 0.13947
I12_1
-0.503855
0.123322 -4.086 5.46e-05 ***
I12_2
0.397590
0.126664
3.139 0.00184 **
I3_1
0.258970
0.122990
2.106 0.03595 *
I3_2
-0.223074
0.120947 -1.844 0.06597 .
72
INF_2
-0.156768
IP_2
-0.068582
PE_1
-0.157695
WINTER
0.006416
--Signif. codes: 0 "***"
0.075686
0.058686
0.107072
0.004389
-2.071
-1.169
-1.473
1.462
0.03907 *
0.24335
0.14171
0.14470
0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

3.2.3
Stepwise
In the package MASS the instruction dropterm is available, which returns some
statistics for testing the presence of every term appearing in a linear regression model.
An initial model, which we assume to be a general unrestricted model, may then be
recursively improved, by alternating the functions dropterm and update, until no
regressor needs to be excluded. In the next section we report an algorithm to perform
a stepwise backward selection procedure in an automatic way.
The syntax of dropterm consists of two arguments: the first one is an lm object,
that is an object resulting from a linear model estimation; the latter is the test to be
performed, in our case an ANOVA consisting in an F test. We recall that the function
update has two main arguments: the first is the lm object to update; the second one
is the updating formula.
We first present the sequence of steps for the model selection in the current case
study. The variable with the lowest (non significant) F statistic will be excluded at
each step.
> library(MASS)
> dropterm(regr3.4f, test = "F")
Single term deletions
Model:
EXRET/100
I12_2
Df
<none>
PE_1
1
DY_1
1
INF_2
1
IP_2
1
I3_1
1
I3_2
1
I12_1
1
I12_2
1
MB_2
1
~ PE_1 + DY_1 + INF_2 + IP_2 + I3_1 + I3_2 + I12_1 +

+ MB_2 + CS_1 + WINTER
Sum of Sq
RSS
AIC F Value
Pr(F)
0.56287 -2301.9
0.0013850 0.56425 -2303.0 0.8563 0.355416
0.0037682 0.56663 -2301.5 2.3297 0.127832
0.0073167 0.57018 -2299.2 4.5236 0.034133 *
0.0015486 0.56441 -2302.9 0.9574 0.328514
0.0075326 0.57040 -2299.1 4.6571 0.031608 *
0.0054829 0.56835 -2300.4 3.3899 0.066449 .
0.0270789 0.58994 -2287.0 16.7420 5.326e-05 ***
0.0149278 0.57779 -2294.5 9.2293 0.002562 **
0.0004447 0.56331 -2303.6 0.2749 0.600375
CS_1
1 0.0041614 0.56703 -2301.2 2.5729 0.109619
WINTER 1 0.0032546 0.56612 -2301.8 2.0122 0.156931
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
> step_1 <- update(regr3.4f, . ~ . - MB_2)
> dropterm(step_1, test = "F")
Model:
EXRET/100 ~ PE_1 + DY_1 + INF_2 + IP_2 + I3_1 + I3_2 + I12_1 +
I12_2 + CS_1 + WINTER
Df Sum of Sq
RSS
AIC F Value
Pr(F)
<none>
0.56331 -2303.6
PE_1
1 0.0035011 0.56681 -2303.4 2.1691
0.14171
DY_1
1 0.0035409 0.56685 -2303.3 2.1938
0.13947
INF_2
1 0.0069248 0.57024 -2301.2 4.2903
0.03907 *
IP_2
1 0.0022044 0.56551 -2304.2 1.3657
0.24335
I3_1
1 0.0071561 0.57047 -2301.1 4.4336
0.03595 *
I3_2
1 0.0054907 0.56880 -2302.1 3.4018
0.06597 .
I12_1
1 0.0269434 0.59025 -2288.8 16.6928 5.456e-05 ***
I12_2
1 0.0159032 0.57921 -2295.6 9.8528
0.00184 **
CS_1
1 0.0037206 0.56703 -2303.2 2.3051
0.12986
WINTER 1 0.0034489 0.56676 -2303.4 2.1368
0.14470
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
> step_2 <- update(step_1, . ~ . - IP_2)
Model:
EXRET/100 ~ PE_1 + DY_1 + INF_2 + I3_1 + I3_2 + I12_1 + I12_2 +
CS_1 + WINTER
Df Sum of Sq
RSS
AIC F Value
Pr(F)
<none>
0.56551 -2304.2
PE_1
1 0.0027553 0.56827 -2304.4 1.7053 0.192459
DY_1
1 0.0042573 0.56977 -2303.5 2.6349 0.105441
INF_2
1 0.0056055 0.57112 -2302.7 3.4693 0.063355 .
I3_1
1 0.0078989 0.57341 -2301.2 4.8886 0.027680 *
I3_2
1 0.0052475 0.57076 -2302.9 3.2477 0.072385 .
I12_1
1 0.0286570 0.59417 -2288.4 17.7360 3.231e-05 ***
I12_2
1 0.0153555 0.58087 -2296.6 9.5036 0.002213 **
CS_1
1 0.0117877 0.57730 -2298.8 7.2954 0.007249 **
WINTER 1 0.0029908 0.56851 -2304.3 1.8510 0.174540
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
73
74
> step_3 <- update(step_2, . ~ . - PE_1)

Model:
EXRET/100 ~ DY_1 + INF_2 + I3_1 + I3_2 + I12_1 + I12_2 + CS_1 +
WINTER
Df Sum of Sq
RSS
AIC F Value
Pr(F)
<none>
0.56827 -2304.4
DY_1
1 0.0146810 0.58295 -2297.3 9.0679 0.002790 **
INF_2
1 0.0037588 0.57203 -2304.1 2.3217 0.128485
I3_1
1 0.0086626 0.57693 -2301.0 5.3506 0.021293 *
I3_2
1 0.0060113 0.57428 -2302.7 3.7130 0.054798 .
I12_1
1 0.0295562 0.59783 -2288.2 18.2558 2.491e-05 ***
I12_2
1 0.0168036 0.58507 -2296.0 10.3790 0.001394 **
CS_1
1 0.0111500 0.57942 -2299.5 6.8870 0.009062 **
WINTER 1 0.0032091 0.57148 -2304.4 1.9822 0.160048
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
> step_4 <- update(step_3, . ~ . - WINTER)
Model:
EXRET/100 ~ DY_1 + INF_2 + I3_1 + I3_2 + I12_1 + I12_2 + CS_1
Df Sum of Sq
RSS
AIC F Value
Pr(F)
<none>
0.57148 -2304.4
DY_1
1 0.016450 0.58793 -2296.2 10.1323 0.001587 **
INF_2
1 0.004495 0.57597 -2303.6 2.7688 0.097007 .
I3_1
1 0.009860 0.58134 -2300.3 6.0733 0.014201 *
I3_2
1 0.005593 0.57707 -2302.9 3.4452 0.064270 .
I12_1
1 0.031646 0.60312 -2287.0 19.4919 1.346e-05 ***
I12_2
1 0.015937 0.58742 -2296.5 9.8162 0.001875 **
CS_1
1 0.012315 0.58379 -2298.8 7.5852 0.006191 **
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
> step_5 <- update(step_4, . ~ . - INF_2)
Model:
EXRET/100
Df
<none>
DY_1
1
I3_1
1
~ DY_1 + I3_1 + I3_2 + I12_1 + I12_2 + CS_1

Sum of Sq
RSS
AIC F Value
Pr(F)
0.57597 -2303.6
0.0119841 0.58796 -2298.2 7.3447 0.0070543 **
0.0083165 0.58429 -2300.4 5.0970 0.0245780 *
I3_2
I12_1
I12_2
CS_1
--Signif.
1
1
1
1
0.0072565
0.0303013
0.0201211
0.0129920
codes:
0.58323
0.60628
0.59610
0.58897
75
-2301.1 4.4473 0.0356604 *

-2287.2 18.5709 2.126e-05 ***
-2293.2 12.3317 0.0005031 ***
-2297.6 7.9624 0.0050461 **
0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The final model can be estimated by applying the function lm to the formula defined
at step 5.
> regr3.4sw <- lm(step_5)
> summary(regr3.4sw)
Call:
lm(formula = step_5)
Residuals:
Min
1Q
-0.208944 -0.025168
Median
0.000683
3Q
0.027499
Max
0.128369
Coefficients:
0.009379 -1.405 0.160847
DY_1
0.130442
0.048132
2.710 0.007054 **
I3_1
0.273531
0.121158
2.258 0.024578 *
I3_2
-0.252307
0.119641 -2.109 0.035660 *
I12_1
-0.528130
0.122553 -4.309 2.13e-05 ***
I12_2
0.435283
0.123954
3.512 0.000503 ***
CS_1
0.239039
0.084712
2.822 0.005046 **
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
3.2.4
>
>
>
>
An algorithm to perform a stepwise backward elimination of

regressors
check <- 0
step <- 0
regr3.4sw <- regr3.4f
while (check == 0) {
step <- step + 1
a <- dropterm(regr3.4sw, test = "F")
ind <- which.min(as.numeric(a[[5]]))
76
ifelse(a[[5]][ind] <= 1.96^2, {

vartodrop <- row.names(a[5])[ind]
print(paste("at step ", step, " variable ",
vartodrop, " is excluded", collapse = ""))
regr3.4sw <- update(regr3.4sw, as.formula(paste(".~.-",
vartodrop, collapse = "")))
}, check <- 1)
}
[1] "at step 1 variable MB_2 is excluded"
[1] "at step 2 variable IP_2 is excluded"
[1] "at step 3 variable PE_1 is excluded"
[1] "at step 4 variable WINTER is excluded"
[1] "at step 5 variable INF_2 is excluded"
> summary(regr3.4sw)
Call:
lm(formula = EXRET/100 ~ DY_1 + I3_1 + I3_2 + I12_1 + I12_2 +
CS_1, data = predtmp)
Residuals:
Min
1Q
-0.208944 -0.025168
Median
0.000683
3Q
0.027499
Max
0.128369
Coefficients:
0.009379 -1.405 0.160847
DY_1
0.130442
0.048132
2.710 0.007054 **
I3_1
0.273531
0.121158
2.258 0.024578 *
I3_2
-0.252307
0.119641 -2.109 0.035660 *
I12_1
-0.528130
0.122553 -4.309 2.13e-05 ***
I12_2
0.435283
0.123954
3.512 0.000503 ***
CS_1
0.239039
0.084712
2.822 0.005046 **
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
3.2.5
AIC
The command stepAIC, available in the package MASS, performs a stepwise model
selection by the Akaike Information Criterion. The first argument is the general
unrestricted linear model the procedure will be applied to; the option trace=0
suppresses the output of the procedure. See the help ?stepAIC for more information
on this function.
> library(MASS)
> regr3.4aic <- stepAIC(regr3.4f, trace = 0)
> regr3.4aic$anova
Stepwise Model Path
Analysis of Deviance Table
Initial Model:
I12_2 + MB_2 + CS_1 + WINTER
Final Model:
EXRET/100 ~ DY_1 + INF_2 + I3_1 + I3_2 + I12_1 + I12_2 + CS_1 +
WINTER
Step Df
Deviance Resid. Df Resid. Dev
AIC
1
348 0.5628657 -2301.895
2 - MB_2 1 0.0004446865
349 0.5633103 -2303.610
3 - IP_2 1 0.0022043540
350 0.5655147 -2304.204
4 - PE_1 1 0.0027552908
351 0.5682700 -2304.455
> summary(regr3.4aic)
Call:
lm(formula = EXRET/100 ~ DY_1 + INF_2 + I3_1 + I3_2 + I12_1 +
I12_2 + CS_1 + WINTER, data = predtmp)
Residuals:
Min
1Q
-0.202784 -0.023496
Median
0.002058
3Q
0.026805
Max
0.136163
Coefficients:
0.010225 -2.133 0.03360 *
DY_1
0.166210
0.055195
3.011 0.00279 **
INF_2
-0.106879
0.070144 -1.524 0.12848
I3_1
0.283111
0.122393
2.313 0.02129 *
I3_2
-0.232290
0.120551 -1.927 0.05480 .
I12_1
-0.524830
0.122834 -4.273 2.49e-05 ***
I12_2
0.406251
0.126100
3.222 0.00139 **
CS_1
0.222507
0.084787
2.624 0.00906 **
WINTER
0.006159
0.004375
1.408 0.16005
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
77
78
3.2.6
BIC
By considering in the command stepAIC, available in the package MASS, the argument
k=log(n) specifying the penalty parameter6 , where n is the length of the data set, it is
possible to perform a stepwise model selection by the Bayesian Information Criterion
(BIC) or Schwarz Bayesian Criterion (SBC). The other argument in stepAIC, we
recall, is the linear model the procedure applies to (the option trace=0 suppresses
the output of the procedure).
See the help ?stepAIC for more information on this function.
> regr3.4bic <- stepAIC(regr3.4f, k = log(length(regr3.4f$res)),
trace = 0)
> regr3.4bic$anova
Stepwise Model Path
Analysis of Deviance Table
Initial Model:
I12_2 + MB_2 + CS_1 + WINTER
Final Model:
EXRET/100 ~ DY_1 + I12_1 + I12_2 + CS_1
Step Df
Deviance Resid. Df Resid. Dev
1
348 0.5628657
2
- MB_2 1 0.0004446865
349 0.5633103
3
- IP_2 1 0.0022043540
350 0.5655147
4
- PE_1 1 0.0027552908
351 0.5682700
5 - WINTER 1 0.0032091205
352 0.5714791
6 - INF_2 1 0.0044952452
353 0.5759744
7
- I3_2 1 0.0072564641
354 0.5832308
8
- I3_1 1 0.0013361484
355 0.5845670
> summary(regr3.4bic)
Call:
lm(formula = EXRET/100 ~ DY_1 + I12_1 + I12_2 +
Residuals:
Min
1Q
-0.214771 -0.025708
Median
0.001165
3Q
0.027578
AIC
-2255.261
-2260.863
-2265.343
-2269.480
-2273.338
-2276.404
-2277.783
-2282.845
Max
0.132124
Coefficients:
0.00941 -1.308 0.19187
DY_1
0.12378
0.04698
2.635 0.00879 **
6k
is by default set equal to 2 giving the AIC criterion.
I12_1
-0.27563
I12_2
0.20603
CS_1
0.22539
0.05083
0.05298
0.08369
79
-5.423 1.09e-07 ***

3.889 0.00012 ***
2.693 0.00742 **
0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

3.2.7
A better output to compare the results
To compare the parameter estimates in the previous models, we can make use of the
function mtable available7 in the package memisc.
> library(memisc)
> mtable3.4 <- mtable(full = regr3.4f, "max adj R2" = regr3.4mr,
stepwise = regr3.4sw, "min AIC" = regr3.4aic,
"min BIC" = regr3.4bic)
PE_1 = "pe_{t-1}", DY_1 = "dy_{t-1}", INF_2 = "infl_{t-1}",
IP_2 = "ip_{t-2}", I3_1 = "i3_{t-1}", I3_2 = "i3_{t-2}",
I12_1 = "i12_{t-1}", I12_2 = "i12_{t-2}", MB_2 = "mb_{t-2}",
CS_1 = "cs_{t-1}", WINTER = "winter_t")
> mtable3.4
Calls:
full: lm(formula = EXRET/100 ~ PE_1 + DY_1 + INF_2 + IP_2 + I3_1 +
I3_2 + I12_1 + I12_2 + MB_2 + CS_1 + WINTER, data = predtmp)
max adj R2: lm(formula = EXRET/100 ~ ., data = predtmpmr)
stepwise: lm(formula = EXRET/100 ~ DY_1 + I3_1 + I3_2 + I12_1 + I12_2+
min AIC: lm(formula = EXRET/100 ~ DY_1 + INF_2 + I3_1 + I3_2 + I12_1 +
I12_2 + CS_1 + WINTER, data = predtmp)
min BIC: lm(formula = EXRET/100 ~ DY_1+I12_1+I12_2+CS_1,data=predtmp)
======================================================================
full
max adj R2 stepwise
min AIC
min BIC
---------------------------------------------------------------------constant
0.021
0.031
-0.013
-0.022*
-0.012
(0.040)
(0.035)
(0.009)
(0.010)
(0.009)
pe_{t-1}
-0.120
-0.158
(0.129)
(0.107)
dy_{t-1}
0.127
0.103
0.130**
0.166**
0.124**
7 Observe the use of the double quotes in the mtable call to specify the names of the lm objects
in the output: they are needed only when spaces are present in the name assigned to the lm object.
80
(0.083)
(0.069)
(0.048)
(0.055)
(0.047)
-0.163*
-0.157*
-0.107
(0.077)
(0.076)
(0.070)
ip_{t-2}
-0.060
-0.069
(0.061)
(0.059)
i3_{t-1}
0.269*
0.259*
0.274*
0.283*
(0.125)
(0.123)
(0.121)
(0.122)
i3_{t-2}
-0.223
-0.223
-0.252*
-0.232
(0.121)
(0.121)
(0.120)
(0.121)
i12_{t-1}
-0.505*** -0.504*** -0.528*** -0.525*** -0.276***
(0.123)
(0.123)
(0.123)
(0.123)
(0.051)
i12_{t-2}
0.389**
0.398**
0.435***
0.406**
0.206***
(0.128)
(0.127)
(0.124)
(0.126)
(0.053)
mb_{t-2}
-0.044
(0.084)
cs_{t-1}
0.175
0.158
0.239**
0.223**
0.225**
(0.109)
(0.104)
(0.085)
(0.085)
(0.084)
winter_t
0.006
0.006
0.006
(0.004)
(0.004)
(0.004)
---------------------------------------------------------------------R-squared
0.170
0.169
0.150
0.162
0.138
adj. R-squared
0.144
0.145
0.136
0.143
0.128
sigma
0.040
0.040
0.040
0.040
0.041
F
6.470
7.104
10.419
8.470
14.182
p
0.000
0.000
0.000
0.000
0.000
Log-likelihood
652.129
651.987
647.985
650.409
645.320
Deviance
0.563
0.563
0.576
0.568
0.585
AIC
-1278.259 -1279.975 -1279.971 -1280.819 -1278.640
BIC
-1227.740 -1233.341 -1248.882 -1241.958 -1255.323
N
360
360
360
360
360
======================================================================
infl_{t-1}
3.2.8
Some remarks on the AIC and BIC values
The values of the AIC statistics given by R for the estimated models in section 3.2.5
differ from those reported in the mtable output and also from the ones in Verbeek.
In Verbeek formula (3.17) AIC is defined as:

AIC = log
N
1 X 2 2k
e +
N i=1 i
N
where N is the dimension of the data set and k is the number of the unknown
parameters in the model.
In R the function extractAIC is available, which computes the AIC as:

AIC = 2 log(L) + 2 k
(3.1)
81
where L is the likelihood, but in case of a linear model with unknown scale
parameter, if RSS denotes the residual sum of squares then extractAIC uses
N log(RSS/N ) for 2 log(L), then we have:
AIC = N log
N
1 X 2
e + 2k
N i=1 i
(3.2)
Thus (3.2) results N times the value obtained according to Verbeeks

relationship (3.17).
In the report obtained with the function mtable the AIC and BIC are computed
by using the function AIC, according to relationship (3.1) by considering the
estimate of the logLikelihood given by the function logLik, (k is multiplied by
2 in the case of AIC and by log(N ) in the case of BIC).
Observe that if k is the number of the parameters in a linear regression model, when
the maximum likelihood estimation method is applied to a linear regression model,
k is substituted by k + 1 since the maximum likelihood estimation method considers
also the variance of the error as a parameter to estimate.
So with regard to the computation of AIC, e.g., for the full model we have that the
value given by mtable may be obtained as:
> -2 * logLik(regr3.4f) + 2 * (12 + 1)
'log Lik.' -1278.259 (df=13)
or, simply, as:
> AIC(regr3.4f)
[1] -1278.259
which divided by N = 360 is quite close to the value reported in Verbeeks Table 3.4
at page 73 (The exact correspondence can be created by substituting 2 * 12 instead
of 2 * (12 + 1))
> AIC(regr3.4f)/360
[1] -3.550719
The value given by extractAIC, computed according to (3.2), is
> extractAIC(regr3.4f)
[1]
12.000 -2301.895
and is equivalent to
> 360 * log(sum(residuals(regr3.4f)^2)/360) + 2 * 12
[1] -2301.895
and dividing by 360 we obtain the AIC value according to Verbeeks formula (3.17)
> log(sum(residuals(regr3.4f)^2)/360) + 2 * 12/360
[1] -6.394152
82
3.2.9
Out of sample forecasting performance (Table 3.5)
Verbeek presents in Table 3.5 the results of the out of sample forecasting performance,
which may be obtained by applying proper functions, here coded in the following
function outsamplefit, to the actual excess returns and the predicted ones8 .
> actual <- predctrl[, "EXRET"]/100
> outsamplefit <- function(actual, predict, name = "") {
mad <- sum(abs(predict - actual))/length(predict)
mape <- sum(abs(predict - actual)/actual)/length(predict)
rmse <- (sum((predict - actual)^2)/length(predict))^0.5
r2os1 <- 1 - sum((predict - actual)^2)/sum((mean(predtmp[,
"EXRET"]/100) - actual)^2)
r2os2 <- (cor(predict, actual))^2
hit <- sum(sign(predict) == sign(actual))/length(predict)
output <- rbind(RMSE = rmse, MAD = mad, MAPE = mape,
r2os1 = r2os1, r2os2 = r2os2, hit = hit)
colnames(output) <- name
output
}
> pr_f <- predict(regr3.4f, predctrl)
> pr_mr <- predict(regr3.4mr, predctrlmr)
> pr_sw <- predict(regr3.4sw, predctrl)
> pr_aic <- predict(regr3.4aic, predctrl)
> pr_bic <- predict(regr3.4bic, predctrl)
> outofsamplefit <- cbind(full = outsamplefit(actual,
predict=pr_f,name="full"),"max adj R2"=outsamplefit(actual,
predict=pr_mr,name="max adj R2"),stepwise=outsamplefit(actual,
predict=pr_sw,name="stepwise"),"min AIC"=outsamplefit(actual,
predict=pr_aic,name="min AIC"),"min BIC"=outsamplefit(actual,
predict=pr_bic,name="min BIC"))
> outofsamplefit[c(1:2, 6), ] <- 100 * outofsamplefit[c(1:2,
6), ]
> round(outofsamplefit, 4)
full max adj R2 stepwise min AIC min BIC
RMSE
4.8332
4.9362
4.8421 4.8843 4.7903
MAD
3.7913
3.8994
3.8040 3.8519 3.7480
MAPE
0.6998
0.6634
0.7736 0.9517 0.4830
r2os1 -0.1583
-0.2082 -0.1626 -0.1830 -0.1379
r2os2 0.0094
0.0105
0.0003 0.0003 0.0000
hit
50.0000
49.1667 48.3333 46.6667 47.5000
Pay attention to the different meaning of the values in the last output: results in the
1st, 2nd and 6th rows are expressed as percentages.
8 Observe that the predicted values for the max adjusted R2 model are based on the data.frame
predctrlmr, defined at the step 4. of the procedure described in Section 3.2.2
3.3
83
Explaining Individual Wages (Section 3.6)
Data may be read by means of the function read.table, having extracted the file
bwages.dat from the compressed file ch03.zip.
The summary statistics mean and standard deviation can be obtained for a single
variable, say WAGE, EDUC or EXPER, conditional on the levels of the variable MALE by
using the function tapply. The first argument is the variable we want to study (a
column of a data.frame); the second argument is a conditioning variable; the third
argument is the function used to study the variable in the first argument.
> indwages <- read.table(unzip("ch03.zip", "Chapter 3/bwages.dat"),
header = T)
> tapply(indwages$WAGE, indwages$MALE, mean)
0
1
10.26154 11.56223
We can combine the use of sapply to choose which variables the mean and standard
deviations are computed for
> means <- sapply(c(1, 3, 4), function(i) tapply(indwages[,
i], indwages$MALE, mean))
> stdevs <- sapply(c(1, 3, 4), function(i) tapply(indwages[,
i], indwages$MALE, sd))
> meanandstd <- array(c(means = means, stdevs = stdevs),
c(2, 3, 2))
> dimnames(meanandstd) <- list(c("females", "males"),
names(indwages)[c(1, 3, 4)], c("means", "stdevs"))
> meanandstd
, , means
WAGE
EDUC
EXPER
females 10.26154 3.587219 15.20380
males
11.56223 3.243001 18.52296
, , stdevs
WAGE
EDUC
EXPER
females 3.808585 1.086521 9.704987
males
4.753789 1.257386 10.251041
We have omitted from the summary analysis the variables MALE, LNWAGE, LNEXPER
and LNEDUC.
3.3.1
Linear Models (Section 3.6.1)
We estimate the parameter in the linear model, Table 3.7 in Verbeek

WAGE = 1 + 2 MALE + 3 EDUC + 4 EXPER + ERROR
84
> indwages3.7 <- lm(WAGE ~ MALE + EDUC + EXPER, data = indwages)

> summary(indwages3.7)
Call:
lm(formula = WAGE ~ MALE + EDUC + EXPER, data = indwages)
Residuals:
Min
1Q
-13.5294 -1.9686
Median
-0.3124
3Q
1.5679
Max
30.7015
Coefficients:
0.386895
0.552
0.581
MALE
1.346144
0.192736
6.984 4.32e-12 ***
EDUC
1.986090
0.080640 24.629 < 2e-16 ***
EXPER
0.192275
0.009583 20.064 < 2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
F-statistic:
282 on 3 and 1468 DF, p-value: < 2.2e-16
To include the effect of the squared number of years of experience, see Table 3.8:
WAGE = 1 + 2 MALE + 3 EDUC + 4 EXPER + 5 EXPER2 + ERROR
use
> indwages3.8 <- lm(WAGE ~ MALE + EDUC + EXPER + I(EXPER^2),
data = indwages)
Call:
lm(formula = WAGE ~ MALE + EDUC + EXPER + I(EXPER^2), data = indwages)
Residuals:
Min
1Q
-12.7246 -1.9519
Median
-0.3107
3Q
1.5117
Max
30.5951
Coefficients:
(Intercept) -0.8924849 0.4329127 -2.062
0.0394
MALE
1.3336935 0.1908668
6.988 4.23e-12
EDUC
1.9881267 0.0798526 24.897 < 2e-16
EXPER
0.3579993 0.0316566 11.309 < 2e-16
I(EXPER^2) -0.0043692 0.0007962 -5.487 4.80e-08
---
*
***
***
***
***
Signif. codes:
85
0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

In presence of a linear model object lm we can automatically produce 4 plots, see
Figure 3.2, by means of the function plot using as first argument the object resulting
from a linear model and specifying as the second, which, a number from 1 to 4, see
Faraway (2002) Chapter 7:
1. if which = 1 we obtain the plot of residuals versus fitted values;
2. if which = 2 we obtain the normal Q-Q plot of the standardized residuals;
3. if which = 3 we obtain the plot of the square root of absolute standardized
residuals versus fitted values;
4. if which = 4 we obtain the plot of the Cook statistic to identify influential
observations, see Faraway (2002), p. 78;
The function layout(matrix) creates a multifigure environment, the numbers in the
matrix (in our instance a 2 2 matrix) define the pointer sequence specifying the
order the different graphs will appear.
> a <- matrix(1:4, 2, 2)
> a
[,1] [,2]
[1,]
1
3
[2,]
2
4
> layout(a)
> for (i in 1:4) plot(indwages3.8, which = i)
3.3.2
Loglinear Models (Section 3.6.2)
We have to estimate the model:

log(WAGE) = 1 + 2 MALE + 3 log(EDUC) + 4 log(EXPER) + 5 (log(EXPER))2 + ERROR
Observe that when a value of EXPER is null, the corresponding value of log(EXPER) is
not defined: Verbeek has constructed in the dataset wages in Belgium the variable
LNEXPER as log(EXPER+1) to avoid this situation.
> indwages3.9 <- lm(LNWAGE ~ MALE + LNEDUC + LNEXPER +
I(LNEXPER^2), data = indwages)
Call:
lm(formula = LNWAGE ~ MALE + LNEDUC + LNEXPER + I(LNEXPER^2),
data = indwages)
86
10
3.0
ScaleLocation
1446
2.0
15
Fitted values
0.08
264
1404
0.04
1446
0.00
Cook's distance
0 2 4 6 8
4
Standardized residuals
1165
1404
15
Cook's distance
1446
10
Fitted values
Normal QQ
1165
1404
1.0
1165
1404
0.0
1446
Standardized residuals
10 20 30
10
Residuals
Residuals vs Fitted
Theoretical Quantiles
500
1000
1500
Obs. number
Figure 3.2 Graphs that can be obtain directly from the lm object indwages3.8 related to
the linear model
Residuals:
Min
1Q
-1.75085 -0.15921
Median
0.00618
3Q
0.17145
Max
1.10533
Coefficients:
(Intercept)
1.26271
0.06634 19.033 < 2e-16 ***
MALE
0.11794
0.01557
7.574 6.35e-14 ***
LNEDUC
0.44218
0.01819 24.306 < 2e-16 ***
LNEXPER
0.10982
0.05438
2.019
0.0436 *
I(LNEXPER^2) 0.02601
0.01148
2.266
0.0236 *
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
87
Residuals vs Fitted
1.0
0.5
1.0
0.5
0.0
Residuals
1.5
462
312
2.0
677
1.4
1.6
1.8
2.0
2.2
2.4
2.6
2.8
Fitted values
lm(LNWAGE ~ MALE + LNEDUC + LNEXPER + I(LNEXPER^2))
Figure 3.3
Residuals against fitted values, loglinear model
F-statistic: 223.1 on 4 and 1467 DF,
p-value: < 2.2e-16
Figure 3.3 shows that heteroscedasticity is much less pronounced than for the additive
model. It may be obtained by means of the following instruction.
> plot(indwages3.9, which = 1)
Model without the effect of LNEXPER and of LNEXPER2
To check the joint effect of log(EXPER) and (log(EXPER))2 , we consider the parameter
estimation of the model:
log(WAGE) = 1 + 2 MALE + 3 log(EDUC) + ERROR
> indwages3.9restr <- update(indwages3.9, . ~ . - LNEXPER I(LNEXPER^2))

> anova(indwages3.9restr, indwages3.9)
88
Model 1: LNWAGE
Model 2: LNWAGE
Res.Df
RSS
1
1469 158.57
2
1467 120.20
--Signif. codes:
~ MALE + LNEDUC
~ MALE + LNEDUC + LNEXPER + I(LNEXPER^2)
Df Sum of Sq
F
Pr(>F)
2
38.362 234.09 < 2.2e-16 ***
0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Model do not including the effect of LNEXPER2

We consider the parameter estimation of the model:
log(WAGE) = 1 + 2 MALE + 3 log(EDUC) + 4 log(EXPER) + ERROR
> indwages3.10 <- update(indwages3.9, . ~ . - I(LNEXPER^2))

Call:
lm(formula = LNWAGE ~ MALE + LNEDUC + LNEXPER, data = indwages)
Residuals:
Min
1Q
-1.70476 -0.15862
Median
0.00485
3Q
0.17366
Max
1.11815
Coefficients:
(Intercept) 1.14473
0.04118 27.798 < 2e-16 ***
MALE
0.12008
0.01556
7.715 2.22e-14 ***
LNEDUC
0.43662
0.01805 24.188 < 2e-16 ***
LNEXPER
0.23065
0.01073 21.488 < 2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
F-statistic:
295 on 3 and 1468 DF, p-value: < 2.2e-16
Model with education considered as a factor
We can now consider the education as a factor to study the effects of the different
education levels against the first level of education.
The model matrix (hereafter p1), that is the matrix X containing the values of the
regressors, some of which are dummy variables recoding the factor education, can
be obtained by using the function model.matrix whose first argument is the second
member of the formula defining the linear model, while its second argument is the
data.frame containing the variables involved in the linear model (some rows of p1
are reported).
> indwages$EDUC <- as.factor(indwages$EDUC)

> p1 <- model.matrix(~MALE + EDUC + LNEXPER,
> p1[c(1:2, 100:101, 365:366, 785:786), ]
(Intercept) MALE EDUC2 EDUC3 EDUC4 EDUC5
1
1
1
0
0
0
0
2
1
0
0
0
0
0
100
1
1
1
0
0
0
101
1
1
1
0
0
0
365
1
1
0
1
0
0
366
1
1
0
1
0
0
785
1
1
0
0
1
0
786
1
0
0
0
1
0
89
data = indwages)
LNEXPER
3.178054
2.772589
2.890372
3.218876
3.663562
1.791759
2.772589
1.609438
In defining the formula for the linear model we can express log(indwages$WAGE) as a
function of the model matrix p1 we have just obtained; we have to remember to drop
the constant since it already appears in the model matrix p1.
> indwages3.11 <- lm(log(indwages$WAGE) ~ -1 + p1)
Call:
lm(formula = log(indwages$WAGE) ~ -1 + p1)
Residuals:
Min
1Q
-1.65548 -0.15138
Median
0.01324
3Q
0.16998
Max
1.11684
Coefficients:
p1(Intercept) 1.27189
0.04483 28.369 < 2e-16 ***
p1MALE
0.11762
0.01546
7.610 4.88e-14 ***
p1EDUC2
0.14364
0.03336
4.306 1.77e-05 ***
p1EDUC3
0.30487
0.03202
9.521 < 2e-16 ***
p1EDUC4
0.47428
0.03301 14.366 < 2e-16 ***
p1EDUC5
0.63910
0.03322 19.237 < 2e-16 ***
p1LNEXPER
0.23022
0.01056 21.804 < 2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
F-statistic: 1.455e+04 on 7 and 1465 DF, p-value: < 2.2e-16
Note that since EDUC was recoded as a factor the above result may be also obtained
with
> indwages3.11 <- lm(log(indwages$WAGE) ~ MALE + EDUC +
LNEXPER, data = indwages)
90
Call:
lm(formula = log(indwages$WAGE)~MALE+EDUC+LNEXPER, data=indwages)
Residuals:
Min
1Q
-1.65548 -0.15138
Median
0.01324
3Q
0.16998
Max
1.11684
Coefficients:
(Intercept) 1.27189
0.04483 28.369
MALE
0.11762
0.01546
7.610
EDUC2
0.14364
0.03336
4.306
EDUC3
0.30487
0.03202
9.521
EDUC4
0.47428
0.03301 14.366
EDUC5
0.63910
0.03322 19.237
LNEXPER
0.23022
0.01056 21.804
--Signif. codes: 0 "***" 0.001 "**" 0.01
Pr(>|t|)
< 2e-16
4.88e-14
1.77e-05
< 2e-16
< 2e-16
< 2e-16
< 2e-16
***
***
***
***
***
***
***
"*" 0.05 "." 0.1 " " 1

without having to drop the constant. The recourse to the model.matrix function
allows different parametrizations for the factor variable to be considered (via the
contrasts argument).
3.3.3
The Effects of Gender (Section 3.6.3)
To study the effects of gender we have to include the interactions MALE:EDUC and
MALE:LNEXPER. Remember that with the (*) the direct effects of the arguments
it applies and their interactions (which are usually defined by the : operator) are
included. So we have two ways for defining the model matrix:
model.matrix(~MALE+EDUC+LNEXPER+MALE:EDUC+MALE:LNEXPER,indwages)
model.matrix(~MALE*EDUC+MALE*LNEXPER,indwages)
We will use the second method which is more compact, directly within the lm formula.
> indwages3.12 <- lm(log(indwages$WAGE) ~ MALE * EDUC +
MALE * LNEXPER, data = indwages)
Call:
lm(formula = log(indwages$WAGE) ~ MALE * EDUC + MALE * LNEXPER,
data = indwages)
Residuals:
Min
1Q
-1.63955 -0.15328
Median
0.01225
3Q
0.16647
91
Max
1.11698
Coefficients:
(Intercept)
1.21584
0.07768 15.652 < 2e-16 ***
MALE
0.15375
0.09522
1.615 0.106595
EDUC2
0.22411
0.06758
3.316 0.000935 ***
EDUC3
0.43319
0.06323
6.851 1.08e-11 ***
EDUC4
0.60191
0.06280
9.585 < 2e-16 ***
EDUC5
0.75491
0.06467 11.673 < 2e-16 ***
LNEXPER
0.20744
0.01655 12.535 < 2e-16 ***
MALE:EDUC2
-0.09651
0.07770 -1.242 0.214381
MALE:EDUC3
-0.16677
0.07340 -2.272 0.023215 *
MALE:EDUC4
-0.17236
0.07440 -2.317 0.020663 *
MALE:EDUC5
-0.14616
0.07551 -1.935 0.053123 .
MALE:LNEXPER 0.04063
0.02149
1.891 0.058875 .
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
We can perform an ANOVA analysis to evaluate if the joint effect of the interactions
MALE:EDUC and MALE:LNEXPER is significant.
> anova(indwages3.11, indwages3.12)
Model 1: log(indwages$WAGE) ~ MALE + EDUC +
Model 2: log(indwages$WAGE) ~ MALE * EDUC +
Res.Df
RSS Df Sum of Sq
F Pr(>F)
1
1465 116.47
2
1460 115.37 5
1.0957 2.7732 0.01683
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*"
LNEXPER
MALE * LNEXPER
*
0.05 "." 0.1 " " 1
A Model without the interaction between male and education

As before we can define the proper model matrix which will be used in the linear
model call.
> indwages3.13 <- lm(log(indwages$WAGE) ~ MALE + EDUC *
LNEXPER, data = indwages)
Call:
lm(formula = log(indwages$WAGE) ~ MALE + EDUC * LNEXPER, data = indwages)
92
Residuals:
Min
1Q
-1.63623 -0.15046
Median
0.00831
3Q
0.16713
Max
1.12415
Coefficients:
(Intercept)
1.48891
0.21203
7.022 3.34e-12 ***
MALE
0.11597
0.01548
7.493 1.16e-13 ***
EDUC2
0.06727
0.22628
0.297
0.7663
EDUC3
0.13525
0.21889
0.618
0.5367
EDUC4
0.20495
0.21946
0.934
0.3505
EDUC5
0.34130
0.21808
1.565
0.1178
LNEXPER
0.16312
0.06539
2.494
0.0127 *
EDUC2:LNEXPER 0.01933
0.07049
0.274
0.7839
0.06821
0.731
0.4647
0.06877
1.277
0.2017
0.06822
1.465
0.1430
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
4
Heteroscedasticity and
Autocorrelation
4.1
Explaining Labour Demand (Section 4.5)
We import data from the file labour2.wf1, which is a work file of EViews.
We have first to invoke the package hexView and next the command readEViews.
The function unzip extracts a file from a compressed archive.
> library(hexView)
> labour <- readEViews(unzip("ch04.zip", "Chapter 4/labour2.wf1"))
Skipping boilerplate variable
Recall that by using the functions head(labour), tail(labour) and summary(labour) it is possible to explore the beginning, the final section and to obtain
summary statistics for all the variables included in the data-frame.
4.1.1
Linear Model
In Verbeek, see Table 4.1, it is first proposed the estimation of the following linear
model:
LABOR = 1 + 2 WAGE + 3 OUTPUT + 4 CAPITAL + ERROR
This can be made by using the funtion lm, see Appendix A.5.
> labour4.1 <- lm(LABOR ~ WAGE + OUTPUT + CAPITAL,
data = labour)
> summary(labour4.1)
Call:
lm(formula = LABOR ~ WAGE + OUTPUT + CAPITAL, data = labour)
Residuals:
Min
1Q
-1267.04
-54.11
Median
-14.02
3Q
37.20
Max
1560.48
94
Heteroscedasticity and Autocorrelation
Coefficients:
19.6418
14.65
<2e-16 ***
WAGE
-6.7419
0.5014 -13.45
<2e-16 ***
OUTPUT
15.4005
0.3556
43.30
<2e-16 ***
CAPITAL
-4.5905
0.2690 -17.07
<2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
4.1.2
Breusch-Pagan test - construction
The Breusch-Pagan test (Table 4.2) can be used to check for the presence of
heteroscedasticity. To perform the test we have first to regress the squared residuals
of the preceding regression on the predictors present in the model.
RES2 = 1 + 2 WAGE + 3 OUTPUT + 4 CAPITAL + ERROR
The residuals can be extracted from the object labour4.1 by means of the instruction
labour4.1$res. Observe that to define the squared residuals in the first member of
the model formula, we do not have to make recourse to the function as.is I(), which
instead must necessarily be invoked when a squared variable is included as regressor,
see Appendix A.4.
> labour4.2 <- lm(labour4.1$res^2 ~ WAGE + OUTPUT +
CAPITAL, data = labour)
Call:
lm(formula = labour4.1$res^2 ~ WAGE + OUTPUT + CAPITAL, data = labour)
Residuals:
Min
1Q
-500023 -12448
Median
2722
3Q
Max
13354 1193685
Coefficients:
11838.9 -1.919
0.0555 .
WAGE
228.9
302.2
0.757
0.4492
OUTPUT
5362.2
214.4 25.016
<2e-16 ***
CAPITAL
-3543.5
162.1 -21.858
<2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
95
Residual standard error: 94180 on 565 degrees of freedom

F-statistic:
262 on 3 and 565 DF, p-value: < 2.2e-16
We can perform the Breusch-Pagan test by multiplying the R2 of the preceding
regression, available by using the code summary(labour4.2)$r.squared, by N = 569,
the number of observations, which may be obtained with length(labour4.2)$res.
Under the null hypothesis of homoscedasticity the distribution of the statistic
is a 2 with 3 degrees of freedom. The number of d.o.f. is equal to the one
in the numerator of the F statistic, which appears in the regression output; it
may be automatically obtained by extracting the second element from the object
summary(labour4.2)$fstatistic.
> bpstat <- length(labour4.2$res) * summary(labour4.2)$r.squared
> df <- summary(labour4.2)$fstatistic[2]
> paste("Breusch-Pagan stat = ", round(bpstat, 4),
" df = ", df, " p-value = ", 1 - pchisq(bpstat,
df), collapse = "")
[1] "Breusch-Pagan stat = 331.0653
df = 3
p-value = 0"
4.1.3
Breusch-Pagan test - direct function
We can also obtain the Breusch-Pagan result, without performing the preceding
regression, by directly invoking the function bptest available in the package lmtest.
> library(lmtest)
> bptest(labour4.1)
studentized Breusch-Pagan test
data: labour4.1
BP = 331.0653, df = 3, p-value < 2.2e-16
4.1.4
Loglinear model
Verbeeks Table 4.3 reports the OLS estimation results for the loglinear model:
log(LABOR) = 1 + 2 log(WAGE) + 3 log(OUTPUT) + 4 log(CAPITAL) + ERROR (4.1)
which may be obtained as:
> labour4.3 <- lm(log(LABOR) ~ log(WAGE) + log(OUTPUT) +
log(CAPITAL), data = labour)
96
Call:
lm(formula = log(LABOR) ~ log(WAGE) + log(OUTPUT) + log(CAPITAL),
data = labour)
Residuals:
Min
1Q
-3.9744 -0.1641
Median
0.1079
3Q
0.2595
Max
1.9466
Coefficients:
(Intercept)
6.177290
0.246211 25.089
<2e-16 ***
log(WAGE)
-0.927764
0.071405 -12.993
<2e-16 ***
log(OUTPUT)
0.990047
0.026410 37.487
<2e-16 ***
log(CAPITAL) -0.003697
0.018770 -0.197
0.844
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The corresponding Breusch-Pagan statistic is:
> bptest(labour4.3)
data: labour4.3
BP = 7.7269, df = 3, p-value = 0.05201
4.1.5
White Heteroscedasticity test
To perform the White Heteroscedasticity test and obtain the results in Verbeeks
Table 4.4 we have to consider the estimation of the following linear model:
RES2 = 1 + 2 log(WAGE) + 3 log(OUTPUT) + 4 log(CAPITAL) +
+ 22 log(WAGE) + 33 log(OUTPUT) + 44 log(CAPITAL) +
+ 23 log(WAGE) log(OUTPUT) + 24 log(WAGE log(CAPITAL) +
+ 34 log(OUTPUT) log(CAPITAL) + ERROR
(4.2)
To write this formula, have first a check to the variables that result as regressors by
applying the following regression statement1 , see A.4 for the instruction lm.
1 The function coeftest in the package lmtest performs the t tests on the estimated coefficients.
In the present situation it is used to have a look at which variables are present in the model.
97
> temp <- lm(labour4.3$res^2 ~ (log(WAGE) + log(OUTPUT) +

log(CAPITAL))^2, data = labour)
> library(lmtest)
> coeftest(temp)
t test of coefficients:
(Intercept)
0.416498
0.816906 0.5098 0.61036
log(WAGE)
-0.039661
0.230421 -0.1721 0.86340
log(OUTPUT)
-0.916128
0.476087 -1.9243 0.05482
log(CAPITAL)
0.777534
0.368707 2.1088 0.03540
log(WAGE):log(OUTPUT)
0.247144
0.130601 1.8924 0.05896
log(WAGE):log(CAPITAL)
-0.239506
0.101826 -2.3521 0.01901
log(OUTPUT):log(CAPITAL) 0.019062
0.013475 1.4146 0.15774
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
.
*
.
*
We observe that the interaction terms appear in the model specification (4.2), but
the squared predictors are not included; so we have to adjust the regression formula
in the following way:
> labour4.4 <- lm(labour4.3$res^2 ~ (log(WAGE) + log(OUTPUT) +
log(CAPITAL))^2 + I(log(WAGE)^2) + I(log(OUTPUT)^2) +
I(log(CAPITAL)^2), data = labour)
Call:
lm(formula = labour4.3$res^2~(log(WAGE)+log(OUTPUT)+log(CAPITAL))^2+
I(log(WAGE)^2)+I(log(OUTPUT)^2)+I(log(CAPITAL)^2),data=labour)
Residuals:
Min
1Q Median
-2.2664 -0.1650 -0.0724
3Q
Max
0.0212 15.2247
Coefficients:
(Intercept)
log(WAGE)
log(OUTPUT)
log(CAPITAL)
I(log(WAGE)^2)
I(log(OUTPUT)^2)
I(log(CAPITAL)^2)
log(WAGE):log(OUTPUT)
log(WAGE):log(CAPITAL)
log(OUTPUT):log(CAPITAL)

2.54460
3.00278
0.847 0.397126
-1.29900
1.75274 -0.741 0.458929
-0.90372
0.55985 -1.614 0.107045
1.14205
0.37582
3.039 0.002486 **
0.19274
0.25895
0.744 0.457003
0.13820
0.03565
3.877 0.000118 ***
0.08954
0.01399
6.401 3.27e-10 ***
0.13804
0.16256
0.849 0.396168
-0.25178
0.10497 -2.399 0.016782 *
-0.19160
0.03687 -5.197 2.84e-07 ***
98
--Signif. codes:
0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

The White statistic is defined by N R2 , that is N times the Multiple R-squared, and
the corresponding p-value may be obtained by considering the probability that a 2
random variable with 9 degrees of freedom (number of predictors in the preceding
regression) exceeds the White statistic.
> whitestat <- length(labour4.2$res) * summary(labour4.4)$r.squared
> df <- summary(labour4.4)$fstatistic[2]
> paste("White stat = ", round(whitestat, 4), " df = ",
df, " p-value = ", round(1 - pchisq(whitestat,
df), 4), collapse = "")
[1] "White stat = 58.5443
df = 9
p-value = 0"
The hypothesis of homoscedasticity is rejected.
The result can also be obtained by means of the function bptest; we have to specify
the auxiliary regression statement in the varformula argument, and the data.frame
in the data argument.
> length(labour4.2$res) * summary(labour4.4)$r.squared
[1] 58.54435
> bptest(labour4.3, varformula = ~(log(WAGE) + log(OUTPUT) +
log(CAPITAL))^2 + I(log(WAGE)^2) + I(log(OUTPUT)^2) +
data: labour4.3
BP = 58.5443, df = 9, p-value = 2.555e-09
4.1.6
Heteroscedasticity consistent covariance matrix
We can obtain the Heteroscedasticity consistent covariance matrix2 of the parameter

estimates, by invoking the command vcovHC(x, type) in the package sandwich. The
first argument is a fitted model object, the second one specifies the estimation type
for the covariance matrix, with "HC1" the corrected White estimate is obtained.
> library(sandwich)
> round(vcovHC(labour4.3, type = "HC1"), 4)
2 The command vcovHC provides 5 estimators of the covariance matrix, denoted in the literature
as HC0 to HC4, see next Section 4.1.8 and Zeileis (2004).
(Intercept)
log(WAGE)
log(OUTPUT)
log(CAPITAL)
99
(Intercept) log(WAGE) log(OUTPUT) log(CAPITAL)

0.0864
-0.0250
0.0004
0.0025
-0.0250
0.0075
-0.0008
-0.0004
0.0004
-0.0008
0.0022
-0.0015
0.0025
-0.0004
-0.0015
0.0014
We may then perform the heteroscedasticity consistent parameter inference, by having

recourse to the function coeftest, whose first argument is the lm object; by using
the second argument it is possible to specify the type of heteroscedasticity consistent
matrix to be considered in computing the parameter standard errors.
> library(lmtest)
> labour4.5 <- coeftest(labour4.3, vcov = vcovHC(labour4.3,
type = "HC1"))
> labour4.5
(Intercept)
6.1772896 0.2938869 21.0193
<2e-16
log(WAGE)
-0.9277642 0.0866604 -10.7057
<2e-16
log(OUTPUT)
0.9900474 0.0467902 21.1593
<2e-16
log(CAPITAL) -0.0036975 0.0378770 -0.0976
0.9223
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "."
***
***
***
0.1 " " 1
2 assume
The standard error of the regression (residual standard error), R2 and R
the same values as in the preceding regression, see Section 4.1.5. To obtain the value
of the F test adjusted for the presence of heteroscedasticity it is possible to use the
function waldtest, available in the package lmtest by specifying as first argument
the lm object labour4.3 resulting from the preceding call to linear model, and as
second argument the object containing the estimate of the White covariance matrix.
> waldtest(labour4.3, vcov = vcovHC(labour4.3, type = "HC1"))
Wald test
Model 1: log(LABOR) ~ log(WAGE) + log(OUTPUT) + log(CAPITAL)
Model 2: log(LABOR) ~ 1
Res.Df Df
F
Pr(>F)
1
565
2
568 -3 544.73 < 2.2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
4.1.7
Estimated Generalized Least Squares
When some assumptions on the form of heteroscedasticity can be made, it is possible

to have recourse to the Estimated Generalized Least Squares (EGLS). In Verbeeks
100
(4.36) it is considered the multiplicative form (with zi = xi ). With regard to the

present example and to the multiplicative heteroscedasticity hypothesis, we have to
estimate the parameters in the following model:
log(e2i ) = 0 + 1 log(WAGE) + 2 log(OUTPUT) + 3 log(CAPITAL) + ERROR
(4.3)
where ei are the residuals from model (4.1).

> labour4.6 <- lm(log(labour4.3$res^2) ~ log(WAGE) +
log(OUTPUT) + log(CAPITAL), data = labour)
Call:
lm(formula = log(labour4.3$res^2) ~ log(WAGE) + log(OUTPUT) +
log(CAPITAL), data = labour)
Residuals:
Min
1Q
-11.7445 -0.7645
Median
0.3281
3Q
1.1430
Max
6.7871
Coefficients:
1.18545 -2.745 0.006247 **
log(WAGE)
-0.06105
0.34380 -0.178 0.859112
log(OUTPUT)
0.26695
0.12716
2.099 0.036231 *
0.09037 -3.659 0.000277 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Adjusted R-squared:
0.01931
Verbeek checks if the preceding form of heteroscedasticity is not too restrictive by

estimating a version of the model including also the three squared terms:
log(e2i ) = 0 + 1 log(WAGE) + 2 log(OUTPUT) + 3 log(CAPITAL)
+ 4 (log(WAGE))2 + 5 (log OUTPUT))2 + 6 (log(CAPITAL))2 + ERROR
> labour4.6sq <- update(labour4.6, . ~ . + I(log(WAGE)^2) +

I(log(OUTPUT)^2) + I(log(CAPITAL)^2))
> summary(labour4.6sq)
Call:
lm(formula = log(labour4.3$res^2) ~ log(WAGE) + log(OUTPUT) +
log(CAPITAL) + I(log(WAGE)^2) + I(log(OUTPUT)^2) +
Residuals:
Min
1Q
-11.6861 -0.8002
Median
0.3633
3Q
1.1849
101
Max
6.6993
Coefficients:
(Intercept)
5.819683
6.530195
0.891 0.373205
log(WAGE)
-4.942304
3.561094 -1.388 0.165729
log(OUTPUT)
0.187647
0.188814
0.994 0.320738
log(CAPITAL)
-0.331626
0.090318 -3.672 0.000264 ***
I(log(WAGE)^2)
0.653428
0.486332
1.344 0.179625
I(log(OUTPUT)^2)
0.001372
0.047232
0.029 0.976834
I(log(CAPITAL)^2) 0.030694
0.026799
1.145 0.252569
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Adjusted R-squared:
0.02372
An analysis of variance confirms that the initial model for heteroscedasticity, see (4.3),
cannot be rejected.
> anova(labour4.6, labour4.6sq)
Model 1: log(labour4.3$res^2) ~ log(WAGE) + log(OUTPUT) +
log(CAPITAL)
Model 2: log(labour4.3$res^2) ~ log(WAGE) + log(OUTPUT) +
log(CAPITAL) + I(log(WAGE)^2) + I(log(OUTPUT)^2) +
I(log(CAPITAL)^2)
Res.Df
RSS Df Sum of Sq
F Pr(>F)
1
565 2836.1
2
562 2808.3 3
27.756 1.8515 0.1367
To obtain EGLS we have to transform the variables, by considering also the constant,
and perform the initial regression on the transformed variables. This can be made
directly in the linear model formula.
> hhat <- exp(fitted(labour4.6))
Observe that hhat contains a possible estimate for the variances of the error related
to each statistical unit, V ar(i |xi ) see Verbeeks relationship (4.36); that is of the
elements on the diagonal of 2 , the covariance matrix of the errors. appears in
the generalized least squares (GLS) estimator of (), see Verbeeks relationship (4.9):
= (X 0 1 X)1 X 0 1 y.
102
In the present case is assumed a diagonal matrix, that is the errors are assumed to
be uncorrelated.
> labour4.7 <- lm(log(LABOR)/hhat^0.5 ~ -1 + I(1/hhat^0.5) +
I(log(WAGE)/hhat^0.5) + I(log(OUTPUT)/hhat^0.5) +
I(log(CAPITAL)/hhat^0.5), data = labour)
Call:
lm(formula = log(LABOR)/hhat^0.5 ~ -1 + I(1/hhat^0.5) +
I(log(WAGE)/hhat^0.5) + I(log(OUTPUT)/hhat^0.5) +
I(log(CAPITAL)/hhat^0.5), data = labour)
Residuals:
Min
1Q
-29.1086 -0.7875
Median
0.4852
3Q
1.2394
Max
10.4219
Coefficients:
I(1/hhat^0.5)
5.89536
0.24764 23.806 < 2e-16 ***
I(log(WAGE)/hhat^0.5)
-0.85558
0.07188 -11.903 < 2e-16 ***
I(log(OUTPUT)/hhat^0.5)
1.03461
0.02731 37.890 < 2e-16 ***
I(log(CAPITAL)/hhat^0.5) -0.05686
0.02158 -2.636 0.00863 **
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
which reproduces Verbeeks Table 4.7. Note that it is also possible to specify, in a
simpler way, the latter model, by including the weights option in the function lm,
thus performing weighted least squares.
> labour4.7 <- lm(log(LABOR) ~ log(WAGE) + log(OUTPUT) +
log(CAPITAL), data = labour, weights = hhat^-1)
Call:
lm(formula = log(LABOR) ~ log(WAGE) + log(OUTPUT) + log(CAPITAL),
data = labour, weights = hhat^-1)
Weighted Residuals:
Min
1Q
Median
-29.1086 -0.7875
0.4852
3Q
1.2394
Max
10.4219
Coefficients:
103
(Intercept)
5.89536
0.24764 23.806 < 2e-16 ***
log(WAGE)
-0.85558
0.07188 -11.903 < 2e-16 ***
log(OUTPUT)
1.03461
0.02731 37.890 < 2e-16 ***
0.02158 -2.636 0.00863 **
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The goodness of fit statistics reported in the last output differ from those in the
preceding one.
To obtain, as suggested by Verbeek, R2 = corr2 (yi , yi ) use the code
> cor(log(labour$LABOR), fitted(labour4.7))^2
[1] 0.8404098
4.1.8
Types of Heteroscedasticity consistent covariance matrices
The command vcovHC(x, type) provides 5 estimators3 for the Heteroscedasticity

consistent covariance matrix to make consistent inference on the parameters of a
linear model summarized in an lm object x. The types are denoted in the literature
as HC0 to HC4, see Zeileis (2004), and correspond to substitute to e2i in Verbeeks
relationship (4.30):
e2i with HC or HC0,
n
2
nk ei
e2i
1hi
with HC1,
with HC2,
e2i
(1hi )2
e2i
(1hi )i
with HC3,
with HC4,
where hi is the ith element on the main diagonal of the so-called Hat matrix
H = X(X 0 X)1 X 0 . The estimator HC0 was proposed by White (1980), HC1-HC3 by
MacKinnon and White (1985) to improve the performance in small samples. Long and
Ervin (2000) suggested HC3. The estimator HC4 by Cribari-Neto (2004) should improve
the small sample performance, especially in the presence of influential observations.
An observation is defined influential when its presence considerably alters the
parameter estimates.
Let consider the estimation of the linear model
y = 1 + 2 x +
in presence of an artificial dataset.
3 By setting type="const" the usual homoscedastic estimator for the covariance matrix of the
parameter estimates is selected.
104
> set.seed(123456)
> x <- runif(49, 10, 20)
> y <- 10 + 3 * x + rnorm(49)
The data have been simulated by considering for the variable x a sample, with
dimension 49, from a uniform distribution with values in the interval 10, 20; by setting
1 = 10, 2 = 3 and by considering for the errors realizations from a standardized
Normal random variable.
The OLS estimator may be obtained as:
> lm49 <- lm(y ~ x)
> summary(lm49)
Call:
lm(formula = y ~ x)
Residuals:
Min
1Q
Median
-2.69955 -0.70519 -0.04837
3Q
0.63405
Max
2.19068
Coefficients:
0.83438
12.06 5.36e-16 ***
x
2.98002
0.05264
56.61 < 2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Adjusted R-squared:
0.9852
The parameter estimates are quite close to their theoretical values.

Let now assume that a new observation is present in our data with values 26 for x
and 60 for y. See the scatterplot in Fig. 4.1: the new observation is the point isolated
on the right side.
> x <- c(x, 26)
> y <- c(y, 60)
The parameter of the linear model can be re-estimated in presence of the new
observation and the p-values, computed without taking into account the presence
of any type of heteroscedasticity, give evidence that the estimates are significantly
different from 0.
> lm50 = lm(y ~ x)
> summary(lm50)
Call:
lm(formula = y ~ x)
Residuals:
Min
1Q
-21.6949 -1.4477
Median
0.9419
3Q
1.8599
105
Max
4.2664
Coefficients:
(Intercept) 17.6932
2.5530
6.93 9.4e-09 ***
x
2.4616
0.1584
15.54 < 2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Adjusted R-squared:
0.8307
We cannot definitively trust in previous results, due to the presence of the anomalous
case we added to the initial data: this observation has an influence on the parameter
estimates: both on the intercept and on the slope of the linear model.
Figure 4.1 shows the data and the regression lines we have estimated, the dotted one
is referred to the 50 data (with the anomalous presence).
> plot(y ~ x)
> abline(lm49)
> abline(lm50, lty = 2)
The presence of influential data may be detected by performing a Heteroscedasticity
consistent covariance inference by setting the option type="HC4". We have:
>
>
>
t
library(sandwich)
library(lmtest)
coeftest(lm50, vcov = vcovHC(lm50, type = "HC4"))
test of coefficients:

(Intercept) 17.6932
9.7824 1.8087 0.0767688 .
x
2.4616
0.6639 3.7078 0.0005416 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
In this case the t value for the intercept has changed, being the intercept no more
significantly different from 0. It is possible to analize the data set to detect which data
are influential: we can check the values on the diagonal of the so-called hat matrix:
X(X 0 X)1 X 0 .
The matrix X of the regressors may be extracted by applying the function
model.matrix to the lm object lm50 and the diagonal of the hat matrix may be
obtained by using the function hat.
106
70
65
60
55
50
45
15
20
25
Figure 4.1 Scatter plot diagram of the data and regression lines without considering the
anomalous case (plain line) and considering the anomalous case (dotted line)
> X = model.matrix(lm50)
> round(hat(X), 3)
[1] 0.029 0.026 0.026 0.030
[11] 0.029 0.020 0.040 0.037
[21] 0.052 0.065 0.056 0.050
[31] 0.021 0.036 0.042 0.059
[41] 0.043 0.025 0.039 0.035
0.029
0.052
0.022
0.031
0.021
0.046
0.039
0.023
0.066
0.024
0.020
0.037
0.037
0.046
0.020
0.063
0.047
0.037
0.036
0.025
0.051
0.031
0.034
0.023
0.022
0.051
0.027
0.051
0.020
0.212
The values are plotted in Fig 4.2.

> plot(hat(X))
The last value gives evidence that the new observation is influential with respect to
other data. Observe that the problem, due to the presence of the anomalous datum,
cannot be detected so clearly by using the other types of heteroscedasticity treatment
available with the instruction vcovHC.
> coeftest(lm50, vcov = vcovHC(lm50, type = "HC0"))
0.10
hat(X)
0.15
0.20
0.05
20
10
30
40
50
Index
Figure 4.2
Influential observations detection for the artificial data example
6.15375 2.8752 0.006003 **
x
2.46161
0.41681 5.9059 3.49e-07 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
(Intercept) 17.6932
6.2806 2.8171 0.007015 **
x
2.4616
0.4254 5.7865 5.303e-07 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
107
108
6.90622 2.5619
0.0136 *
x
2.46161
0.46802 5.2596 3.312e-06 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
7.75586 2.2813
0.02701 *
x
2.46161
0.52584 4.6813 2.365e-05 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Also with reference to the Labour Demand Example it is possible to detect the
presence of anomalous data. See Fig. 4.3.
> X = model.matrix(labour4.3)
> plot(hat(X))
4.2
The Demand for Ice Cream (Section 4.8)
Data can be read by means of the function read.table, having exctracted the
file icecream.dat from the compressed archive ch04.zip. As usual it is possible
to check for the consistency of the data with the information contained in the file
icecream.txt available in the zip file by means of the functions summary, head and
tail.
> icecream <- read.table(unzip("ch04.zip", "Chapter 4/icecream.dat"),
header = TRUE)
The following variables, with four-weekly observations from March 18, 1951 to July
11, 1953, (30 observations) are present:
cons: consumption of ice cream per head (in pints)4 ;
income: average family income per week (in US Dollars);
price: price of ice cream (per pint);
temp: average temperature (in Fahrenheit);
time: index from 1 to 30.
Figure 4.4 shows the evolution of the time series Consumption, Temperature/100
and Price (cfr. Verbeeks Fig. 4.3). It may be obtained by first transforming the
4 Expenditures
on ice cream, not actual consumption are reported.
109
0.06
0.05
0.03
hat(X)
0.04
0.00
0.01
0.02
100
200
300
400
500
Index
Figure 4.3
Influential observations detection for Verbeek firm example
data.frame icecream in the multiple time series icecream1. One has then to
rescale the temperature by 0.01. Remember that a time series object is not a
data.frame, so the values of the temperature cannot be extracted with the command
icecream1$temp: the code icecream1[,4] will return the temperature values since
the variable temp is the fourth time series in icecream1 (the variable is stored in the
fourth column of the object). A time series object can thus be treated like a matrix.
> icecream1 <- ts(icecream)
> icecream1[, 4] <- icecream1[, 4]/100
To assign proper names to the time series in icecream1 have first a check to the
structure of the object by means of the function str.
> str(icecream1)
mts [1:30,1:5] 0.386 0.374 0.393 0.425 0.406 0.344 0.327 0.288 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:5] "cons" "income" "price" "temp" ...
110
- attr(*, "tsp")= num [1:3] 1 30 1

- attr(*, "class")= chr [1:3] "mts" "ts" "matrix"
The names of the time series are stored as the second element of the attribute
dimnames of the mts object icecream1. Since dimnames is a list it is possible to
make use of the following code to change the names:
> dimnames(icecream1)[[2]] <- c("Consumption", "Income",
"Price", "Temperature", "time")
The graph is finally produced by using the function xyplot of the package lattice.
The first argument is the multiple time series object containing the series to be plotted;
the option superpose allows one to plot the time series in an unique graph (TRUE) or
in separate subgraphs (FALSE which is the default); the graphical parameters type and
pch define respectively the type of line and the symbols to be used: in the present case
with type="o" a solid line is produced for all the series and the lines are decorated
with the symbols specified in the list associated to pch, see the help on ?par and
?lattice::xyplot for more details.
With the auto.key=list(points=TRUE,lines=FALSE) option it is possible to insert
a legend in the graph associating a symbol to the name of each series5 .
> library(lattice)
> xyplot(icecream1[, c(1, 3, 4)], superpose = TRUE,
type = "o", pch = list(21, 24, 22), auto.key=list(points=TRUE,
lines = FALSE))
In Verbeeks Table 4.9 the OLS estimation results for the following model are reported
const = 1 + 2 pricet + 3 incomet + 4 tempt + t
(4.4)
Observe that no lagged variables are present among the regressors.

The parameter estimates may be obtained by using the function lm.
> icecream4.9 <- lm(cons ~ price + income + temp, data = icecream)
> summary(icecream4.9)
Call:
lm(formula = cons ~ price + income + temp, data = icecream)
Residuals:
Min
1Q
-0.065302 -0.011873
Median
0.002737
3Q
0.015953
Max
0.078986
Coefficients:
(Intercept) 0.1973151 0.2702162
0.730 0.47179
price
-1.0444140 0.8343573 -1.252 0.22180
5 With the option auto.key=list(points=FALSE,lines=TRUE), which defines the default label and
can thus be omitted, each series is identified by a coloured segment.
Consumption
Price
Temperature
111
0.6
0.7
0.5
0.4
0.3
10
15
20
25
30
Time
Figure 4.4
Evolution of the time series Consumption, Temperature/100 and Price
income
0.0033078 0.0011714
2.824 0.00899 **
temp
0.0034584 0.0004455
7.762 3.1e-08 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
4.2.1
The Durbin-Watson statistic - construction
The Durbin-Watson statistic to test for the presence of the first-order autocorrelation
in the residual series may be obtained by implementing Verbeeks relationship (4.51):
PT
(et et1 )2
dW = i=2PT
.
2
i=1 et
112
> dwstat <- sum(diff(icecream4.9$res)^2)/sum(icecream4.9$res^2)

> dwstat
[1] 1.02117
Note that the function diff applied to a vector x = [x1 , x2 , . . . , xn ] of length n results
in the vector [x2 x1 , . . . , xn xn1 ] of length n 1.
4.2.2
The Durbin-Watson statistic - direct function
The Durbin-Watson statistic can be also directly obtained by making use of the
function dwtest available in the package lmtest which produces also the significance
level of the test. In the present case the hypothesis of no first-order autocorrelation
of the errors has to be rejected.
> library(lmtest)
> dwtest(icecream4.9, alternative = "greater")
Durbin-Watson test
data: icecream4.9
DW = 1.0212, p-value = 0.0003024
alternative hypothesis: true autocorrelation is greater than 0
By specifying the argument alternative in the function dwtest it is possible to
define the direction of the test; by default alternative is set to "greater", that is
the autocorrelation of first order is greater than 0.
> dwtest(icecream4.9, alternative = "two.sided")
Durbin-Watson test
data: icecream4.9
DW = 1.0212, p-value = 0.0006048
alternative hypothesis: true autocorrelation is not 0
> dwtest(icecream4.9, alternative = "less")
Durbin-Watson test
data: icecream4.9
DW = 1.0212, p-value = 0.9997
alternative hypothesis: true autocorrelation is less than 0
Figure 4.5 shows the actual and fitted values for the ice cream consumption giving
evidence of the presence of a pattern in the residual behaviour. To obtain this graph
we define first a time series object icecream1 containing the consumption fitted values
from the regression model (4.4) and the real ice cream consumption values. The fitted
values may be derived by applying the function fitted to the lm object icecream4.9.
> icecream1 <- ts(cbind("Fitted Consumption" = fitted(icecream4.9),
Consumption = icecream$cons))
0.55
113
0.50
0.45
0.40
0.35
Consumption
0.30
0.25
10
15
20
25
30
Time
Figure 4.5
Actual and fitted values for the ice cream consumption
We may then apply the function xyplot to the multiple time series
> xyplot(icecream1, type = list("l", "p"), pch = 19,
xlab = "Time", ylab = "Consumption", superpose = TRUE,
auto.key = FALSE)
The list("l","p") in the argument type specifies that the first time series is plotted
with a line "l", while the second one with points "p"; use pch=19 for plain bullets.
xlab and ylab specify the axis labels and auto.key=FALSE suppresses the legend.
4.2.3
Estimation of the first-order autocorrelation coefficient
The first-order autocorrelation coefficient of the residuals, corresponding to the

parameter in the linear model (without intercept)
t = t1 + vt
may be estimated by applying the function lm to the series of the residuals.
Two different options may be adopted.
114
A time window starting at time 2 and ending at time 30 (the length of the
residual series) can be considered: in this way the dependent variable in the
linear model is given by the series of the residuals without its first element,
while the independent one is the series of the residuals without its last element.
The alternative is to consider the complete series of the residuals as dependent

variable and the lagged series of the residuals completed by setting to 0 the only
one missing pre-sample value as the independent one.
Let resautocorr and resautocorr0 denote the lm objects obtained according to the
first and second option respectively.
> resautocorr <- lm(icecream4.9$res[-1] ~ -1 + icecream4.9$res[-30])
> resautocorr0 <- lm(icecream4.9$res ~ -1 + c(0, icecream4.9$res[-30]))
> summary(resautocorr)
Call:
lm(formula = icecream4.9$res[-1] ~ -1 + icecream4.9$res[-30])
Residuals:
Min
1Q
Median
-0.063581 -0.014006 -0.000714
3Q
0.009123
Max
0.080090
Coefficients:
icecream4.9$res[-30]
0.4006
0.1774
2.258
0.0319 *
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
> summary(resautocorr0)
Call:
lm(formula = icecream4.9$res ~ -1 + c(0, icecream4.9$res[-30]))
Residuals:
Min
1Q
Median
-0.063581 -0.013547 -0.000351
3Q
0.012530
Max
0.080090
Coefficients:
c(0, icecream4.9$res[-30])
0.4006
0.1907
2.101
0.0444 *
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Adjusted R-squared:
p-value: 0.04444
115
0.1022
The autocorrelation estimate is the same for the two models. In both outputs the
Multiple R-squared is produced as uncentered R-squared since an intercept term is
not present in the proposed models.
> 1
[1]
> 1
[1]
- sum(resautocorr$res^2)/sum(residuals(icecream4.9)[-1]^2)
0.1540577
- sum(resautocorr0$res^2)/sum(icecream4.9$res^2)
0.1321176
For the resautocorr0 model the centered and uncentered R-squared do coincide
since we have considered as dependent variable the complete series of residuals from
equation (4.4), which has zero mean. In Verbeek the first version of the model
(resautocorr) is considered but the centered R-squared has been computed. The
difference with the value reported here is due to the fact that the residual series
without its first element hasnt zero mean.
> (VerRsq<-1-sum(resautocorr$res^2)/sum((residuals(icecream4.9)[-1]mean(residuals(icecream4.9)[-1]))^2))
[1] 0.1491856
or
> (VerRsq<-1-sum(resautocorr$res^2)/(length(residuals(icecream4.9)[-1])1)/var(residuals(icecream4.9)[-1]))
[1] 0.1491856
The asymptotic test on
suggests to reject the hypothesis of no first-order autocorrelation

> (test<-length(icecream4.9$res)^0.5*as.numeric(coef(resautocorr)))
[1] 2.194355
> (pvalue <- 1 - pnorm(test))
[1] 0.01410495
4.2.4
The Breusch-Godfrey test to test the presence of autocorrelation

- construction
The assumption we have used in the second model formulation (resatocorr0) in

the preceding section to consider the complete residual series as dependent variable
and to set to zero any presample values of the residuals is the one usually adopted for
defining the auxiliary regression necessary to compute the Breusch-Godfrey statistic,
according to what proposed by Davidson and MacKinnon (1993).
116
The Breusch-Godfrey statistic (for testing the first-order correlation) is obtained by

multiplying the length of the time series by the Multiple R-squared of the following
auxiliary regression:
et = x0t + et1 + vt = 1 + 2 price + 3 income + 4 temp + et1 + vt .
where et are the residuals from the linear model yt = x0t , in our case model (4.4).
The auxiliary regression thus contains the regressors of the reference model and the
lagged residual series (terms corresponding to the residual series lagged until order p
also appear as regressors in the auxiliary regression when autocorrelations of order 1
to p are jointly tested).
The auxiliary regression takes into account also the fact that et may be correlated
with some lagged regressors, possibly also the lagged yt .
> shift.res <- c(0, residuals(icecream4.9)[-30])
> resautocorrint <- update(icecream4.9, icecream4.9$res ~
. + shift.res)
> summary(resautocorrint)
Call:
lm(formula = icecream4.9$res ~ price + income + temp + shift.res,
data = icecream)
Residuals:
Min
1Q
Median
-0.064280 -0.014474 -0.002083
3Q
0.008912
Max
0.081859
Coefficients:
(Intercept) 0.0615530 0.2571651
0.239
0.8128
price
-0.1476412 0.7918621 -0.186
0.8536
income
-0.0001158 0.0011085 -0.104
0.9176
temp
-0.0002033 0.0004328 -0.470
0.6426
shift.res
0.4282815 0.2112149
2.028
0.0534 .
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
> length(icecream4.9$res) * summary(resautocorrint)$r.squared
[1] 4.237064
The test statistic must be compared with the quantile of a Chi-squared random
variable with p degrees of freedom. In the present example, since only the presence of
the first-order autocorrelation is tested, we have p = 1 and 21,0.95 = 3.84.
4.2.5
117
The Breusch-Godfrey test to test the presence of autocorrelation

- direct function
The Breusch-Godfrey test with its significance level may be directly obtained by
having recourse to the function bgtest available in the package lmtest.
> library(lmtest)
> bgtest(icecream4.9)
Breusch-Godfrey test for serial correlation of order up to 1
data: icecream4.9
LM test = 4.2371, df = 1, p-value = 0.03955
Observe that by applying the function coeftest to the object resulting from bgtest
the coefficients from the auxiliary regression including lagged residuals may be
obtained.
> coeftest(bgtest(icecream4.9))
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept)
0.06155297 0.25716506 0.2394 0.81083
price
-0.14764118 0.79186210 -0.1864 0.85209
income
-0.00011579 0.00110852 -0.1045 0.91681
temp
-0.00020333 0.00043284 -0.4698 0.63852
lag(resid)_1 0.42828155 0.21121490 2.0277 0.04259 *
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
You will observe that inference on parameter estimates is made through the Normal
distribution (z values and not t values are reported). This because OLS assumptions
are not satisfied and only a normal limiting distribution may be derived for the OLS
parameter estimators, see Johnston and Di Nardo (1997) and Mann and Wald (1943).
4.2.6
Some remarks on the procedure presented by Verbeek on page 113
Observe that there are no lagged regressors in equation (4.4) and the residuals have
been assumed to be uncorrelated with the regressors in the model, so Verbeek defines
the auxiliary regression for computing the Breusch-Godfrey statistic by including only
the intercept term and the lagged values of the residuals, see Verbeek p. 120, omitting
any other regressors present in equation (4.4).
In consequence of this particular formulation of the model, the value of the statistic
will be different from the one (usually provided by standard software), which we
reported above and which takes into account the possible presence of autocorrelation
between yt and the lagged regressor components of xt , see Johnston and Di Nardo
(1997) p. 191 (6.54).
In Verbeek the estimation of the following auxiliary regression has been considered.
t = const. + t1 + vt
118
We have:
> resautocorrint <- lm(icecream4.9$res ~ c(0, icecream4.9$res[-30]))
> length(icecream4.9$res) * summary(resautocorrint)$r.squared
[1] 3.992121
which is equivalent to invoke the function bgtest for a linear model where the series
et of the residuals depends only on the constant term.
> bgtest(lm(icecream4.9$res ~ 1))
data: lm(icecream4.9$res ~ 1)
LM test = 3.9921, df = 1, p-value = 0.04571
In Verbeek the statistic is actually computed by considering the Multiple R-squared
from model (4.4)
> (length(icecream4.9$res) - 1) * VerRsq
[1] 4.326382
Both formulations of the Breusch-Godfrey statistic reject the hypothesis of no firstorder autocorrelation presence, since the 0.95 quantile of a Chi-squared distribution
with 1 degree of freedom is 3.84.
4.2.7
The EGLS (iterative Cochrane-Orcutt) procedure
The estimation by means of the iterative Cochrane-Orcutt procedure may be obtained

by applying the following algorithm:
>
>
>
>
>
>
>
temp <- lm(cons ~ price + income + temp, data = icecream)

pred <- model.matrix(temp)
temp0 <- temp$coef
resid <- temp$residuals
T <- length(resid)
k <- 0
while (k == 0) {
rhoest <- lm(resid ~ -1 + c(0, resid[-T]))
rho <- as.numeric(rhoest$coef)
pred1 <- pred[-1, ] - rho * pred[-T, ]
temp1 <- lm(icecream$cons[-1] - rho * icecream$cons[-T] ~
-1 + pred1)
ctrl <- sum((temp1$coef - temp0)^2)
resid <- icecream$cons - model.matrix(temp) %*% temp1$coef
temp0 <- temp1$coef
if (ctrl < 1e-09)
k <- 1
}
> summary(temp1)
119
Call:
lm(formula = icecream$cons[-1] - rho * icecream$cons[-T] ~ -1 +
pred1)
Residuals:
Min
1Q
Median
-0.061510 -0.013400 -0.000524
3Q
0.013603
Max
0.082052
Coefficients:
pred1(Intercept) 0.1571429 0.2896285
0.543
0.5922
pred1price
-0.8923919 0.8108501 -1.101
0.2816
pred1income
0.0032028 0.0015460
2.072
0.0488 *
pred1temp
0.0035584 0.0005547
6.415 1.02e-06 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
F-statistic:
347 on 4 and 25 DF, p-value: < 2.2e-16
> summary(rhoest)
Call:
lm(formula = resid ~ -1 + c(0, resid[-T]))
Residuals:
Min
1Q
-0.061510 -0.013163
Median
0.001124
3Q
0.014793
Max
0.082052
Coefficients:
c(0, resid[-T])
0.4009
0.1923
2.085
0.046 *
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The results are reported in Verbeeks Table 4.10. We observe some differences with
Verbeeks standard errors. In this case, as remarked by Verbeek, the Durbin-Watson
statistic is not appropriate since it would be referred to the transformed model.
120
4.2.8
The model with the lagged temperature
Verbeeks Table 4.11 reports the estimation results for the following model, including
the lagged value of the temperature, and the corresponding Durbin-Watson statistic.
const = 1 + 2 pricet + 3 incomet + 4 tempt + 4 tempt1 + t .
(4.5)
The parameter estimates may be obtained by applying the functions lm and then the
function dwtest to the lm object resulting from the following instruction.
> icecream4.11 <- lm(cons[-1] ~ price[-1] + income[-1] + temp[-1] +
temp[-T], data = icecream)
Call:
lm(formula = cons[-1] ~ price[-1] + income[-1] + temp[-1] + temp[-T],
data = icecream)
Residuals:
Min
1Q
Median
-0.049070 -0.015391 -0.006745
3Q
0.014766
Max
0.080892
Coefficients:
(Intercept) 0.1894822 0.2323170
0.816 0.42274
price[-1]
-0.8383023 0.6880209 -1.218 0.23490
income[-1]
0.0028673 0.0010533
2.722 0.01189 *
temp[-1]
0.0053321 0.0006704
7.953 3.5e-08 ***
temp[-T]
-0.0022039 0.0007307 -3.016 0.00597 **
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
We can check for the sign of a possible first order autocorrelation, see Section 4.2.3:
> signcheck <- lm(icecream4.11$res ~ -1 + c(0, icecream4.11$res[-29]))
> summary(signcheck)
Call:
lm(formula = icecream4.11$res ~ -1 + c(0, icecream4.11$res[-29]))
Residuals:
Min
1Q
Median
-0.045799 -0.014388 -0.007997
3Q
0.014110
Max
0.082036
Coefficients:
c(0, icecream4.11$res[-29])
0.07432
0.22636
0.328

Adjusted R-squared:
The asymptotic test on
121
0.745
-0.03174
suggests to accept the hypothesis of no first-order autocorrelation.

> (test <- length(icecream4.11$res)^0.5 * as.numeric(coef(signcheck)))
[1] 0.4002376
> (pvalue <- 1 - pnorm(test))
[1] 0.3444907
So we can perform a two sided Durbin-Watson test
> dwtest(icecream4.11, alternative = "two.sided")
Durbin-Watson test
data: icecream4.11
DW = 1.5822, p-value = 0.05751
alternative hypothesis: true autocorrelation is not 0
and conclude for the absence of autocorrelation. In Verbeek it is observed that the
value of the Durbin-Watson statistic is in the inconclusive region; according to its
p-value we conclude that the statistic is on the boundary of the acceptance region.
Observe that the last model can be formulated in a easier manner by having recourse
to the operators available in the package dynlm (Dynamic Linear Regression): in
particular by means of the function L(x,k) (equivalent to the function lag(x,lag=k) available in the base system) the series x is lagged of k time units (by default k is
set to 1).
To estimate the linear model parameters we have then to invoke the function dynlm,
which is characterized by the same structure of the function lm, we have used in
the preceding Sections for the estimation of linear models. Note that the data set
is required to have the structure of a multiple time series: the function as.ts is
used here to convert the data.frame in an undated time series.
> library(dynlm)
> icecream4.11 <- dynlm(cons ~ price + income + temp +
L(temp), data = as.ts(icecream))
Time series regression with "ts" data:
Start = 2, End = 30
Call:
dynlm(formula=cons~price+income+temp+L(temp),data=as.ts(icecream))
122
Residuals:
Min
1Q
Median
-0.049070 -0.015391 -0.006745
3Q
0.014766
Max
0.080892
Coefficients:
(Intercept) 0.1894822 0.2323170
0.816 0.42274
price
-0.8383023 0.6880209 -1.218 0.23490
income
0.0028673 0.0010533
2.722 0.01189 *
temp
0.0053321 0.0006704
7.953 3.5e-08 ***
L(temp)
-0.0022039 0.0007307 -3.016 0.00597 **
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
4.3
Risk Premia in Foreign Exchange Markets (Section 4.11)
Data can be read by means of the function read.table, having exctracted the file
forward2c.dat from the compressed archive ch04.zip. Use the functions summary,
head and tail to check for the consistency of the imported data with the information
contained in the file forward2c.txt available in the zip archive.
> riskpremia <- read.table(unzip("ch04.zip", "Chapter 4/forward2.dat"),
header = TRUE)
The data.frame contains 276 observations from January 1979 to December 2001
taken from DATASTREAM on the following variables6 .
EXUSBP: exchange rate USDollar/British Pound Sterling
EXUSEUR: exchange rate USDollar/Euro
EXEURBP: exchange rate Euro/Pound
F1USBP: 1 month forward rate USD/Pound
F1USEUR: 1 month forward rate USD/Euro
F1EURBP: 1 month forward rate Euro/Pound
F3USBP: 3 month forward rate USD/Pound
F3USEUR: 3 month forward rate USD/Euro
F3EURBP: 3 month forward rate Euro/Pound.
6 Verbeek observes that none of the variables is expressed in logs and that pre-Euro rates are
based on exchange rates against the German mark.
123
1.5
2.0
2.5
EXUSBP
0.6
0.8
1.0
1.2
1.4
EXUSEUR
1980
1985
1990
1995
2000
Time
Figure 4.6
Evolution of the US$/BP and US$/EUR exchange rates
A multiple time series object may be defined by using the information that data were
collected with a monthly frequence starting from 1979, January.
> riskpremia <- ts(data = riskpremia, start = c(1979,
1), frequency = 12)
Graphical representations may be obtained by using the function xyplot available in
the package lattice (cfr. ?lattice::lattice and Longhow Lam (2010) for more
information), with separate graphs or a unique graph for each of the time series, see
Figures 4.6 and 4.7.
> library(lattice)
> xyplot(riskpremia[, 1:2])
> xyplot(riskpremia[, 1:2], superpose = TRUE)
Figure 4.8 shows the evolution of the forward discounts obtained as the difference
between the logarithms of the spot rates and of the forward rates (1 month): the
computed series are first combined in the matrix rp, which preserves the multiple
time series, mts, attribute, and the column names are also specified. The matrix is
then plotted by means of the function xyplot.
124
0.5
1.0
1.5
2.0
2.5
EXUSBP
EXUSEUR
1980
1985
1990
1995
2000
Time
Figure 4.7
Evolution of the US$/BP and US$/EUR exchange rates
> rp <- cbind("US$/GBP" = log(riskpremia[, 1]) - log(riskpremia[,

4]), "US$/EUR" = log(riskpremia[, 2]) - log(riskpremia[,
5]))
> xyplot(rp, superpose = TRUE)
4.3.1
Tests for Risk Premia in the 1 month Market
We have to estimate the parameters in the following model, cfr. equation (4.70) in
Verbeek:
st ft1 = 1 + 2 (st1 ft1 ) + et .
log(EXUSBPt ) log(F1USBPt1 ) = 1 + 2 [log(EXUSBPt1 ) log(F1USBPt1 )] + et
(4.6)
To define lagged variables within the formula of a linear model it is possible to make
use of the operators available in the package dynlm (Dynamic Linear Regression): in
particular by means of the function L(x,k) (equivalent to the function lag(x,lag=k) available in the base system) the series x is lagged of k time units, by default k is
125
0.015
0.010
0.005
0.000
0.005
US$/GBP
US$/EUR
1980
1985
1990
1995
2000
Time
Figure 4.8
Forward discount, US$/BP and US$/EUR
set equal to 1.
To estimate the linear model parameters we have then to invoke the function dynlm,
which is characterized by the same structure of the function lm, we have used in the
preceding Sections for the estimation of linear models.
> library(dynlm)
> riskpremia4.12 <- dynlm(log(EXUSBP) - log(L(F1USBP)) ~
L(log(EXUSBP) - log(F1USBP)), data = riskpremia)
> summary(riskpremia4.12)
Start = 1979(2), End = 2001(12)
Call:
dynlm(formula = log(EXUSBP) - log(L(F1USBP)) ~ L(log(EXUSBP) log(F1USBP)), data = riskpremia)
126
Residuals:
Min
1Q
-0.14766 -0.01909
Median
0.00073
3Q
0.02082
Max
0.12527
Coefficients:
(Intercept)
-0.005112
0.002365 -2.162 0.031514 *
L(log(EXUSBP) - log(F1USBP)) 3.212170
0.817474
3.929 0.000108 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The Breusch-Godfrey statistic may be computed with regard to the two tests for the
presence of first and up to twelfth-order autocorrelation. We can invoke the function
bgtest, we presented in Section 4.2.5.
> library(lmtest)
> bgtest(riskpremia4.12)
Breusch-Godfrey test for serial correlation of order up
to 1
data: riskpremia4.12
LM test = 0.2179, df = 1, p-value = 0.6406
> bgtest(riskpremia4.12, order = 12)
to 12
LM test = 10.2603, df = 12, p-value = 0.5931
Both tests reject the null hypotheses of no serial correlation of the residuals. We
remind that the 0.95 quantiles of the two Chi-squared distributions with 1 and 12
degrees of freedom may be obtained as:
> qchisq(0.95, c(1, 12))
[1] 3.841459 21.026070
An ANOVA test may be performed to verify if the intercept and the coefficient in the
linear model (4.6) may jointly be assumed both equal to 0. It is necessary to produce
an lm object related to the simpler model, the one under the null hyphotesis that
1 = 2 = 0. This model contains no regressors, so in defining the formula we have
to exclude the intercept.
> riskpremia4.12anov <- dynlm(log(EXUSBP) - log(L(F1USBP)) ~
-1, data = riskpremia)
> anova(riskpremia4.12anov, riskpremia4.12)
127

Model 1: log(EXUSBP) - log(L(F1USBP)) ~ -1
Model 2: log(EXUSBP) - log(L(F1USBP)) ~ L(log(EXUSBP) - log(F1USBP))
Res.Df
RSS Df Sum of Sq
F
Pr(>F)
1
275 0.28699
2
273 0.27159 2 0.015406 7.7433 0.0005359 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The F statistic gives evidence that the hypothesis 1 = 2 = 0, equivalent to no risk
premium, has to be rejected.
The t tests presented in the regression output are asymptotically valid only if the
errors t are not serially correlated and heteroscedasticity is not present. To check
the latter hypothesis the Breusch-Pagan test can be performed, which rejects the
hypothesis of homoscedastic errors.
> bptest(riskpremia4.12)
BP = 7.2569, df = 1, p-value = 0.007063
The significance of the coefficients has thus to be checked by making recourse to a
heteroscedasticity consistent parameter inference.
> coeftest(riskpremia4.12, vcov = vcovHC(riskpremia4.12, type = "HC1"))
(Intercept)
-0.0051118 0.0021386 -2.3903
L(log(EXUSBP) - log(F1USBP)) 3.2121699 0.9826770 3.2688
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 "
Pr(>|t|)
0.017513 *
0.001218 **
" 1
USD/Euro currency ratio

The above analysis is proposed for the USD/Euro currency ratio.
> riskpremiauseur <- dynlm(log(EXUSEUR) - log(L(F1USEUR)) ~
L(log(EXUSEUR) - log(F1USEUR)), data = riskpremia)
> summary(riskpremiauseur)
Start = 1979(2), End = 2001(12)
128
Call:
dynlm(formula = log(EXUSEUR) - log(L(F1USEUR)) ~ L(log(EXUSEUR) log(F1USEUR)), data = riskpremia)
Residuals:
Min
1Q
Median
-0.103024 -0.021487 -0.000015
3Q
0.020975
Max
0.088699
Coefficients:
(Intercept)
-0.002280
0.003149 -0.724
0.470
L(log(EXUSEUR) - log(F1USEUR)) 0.484791
0.766435
0.633
0.528
Adjusted R-squared: -0.002194
> bgtest(riskpremiauseur)
to 1
data: riskpremiauseur
LM test = 0.1176, df = 1, p-value = 0.7316
> bgtest(riskpremiauseur, order = 12)
to 12
LM test = 14.1237, df = 12, p-value = 0.2929
As observed by Verbeek no risk premium is found for the USD/Euro rate, namely
both the regression coefficients are not significantly different from zero; furthermore
the hyphoteses of no first-order and up to the twelfth-order autocorrelation are not
rejected. The Breusch-Pagan test gives evidence of the presence of heteroscedasticity
but the inference based upon heteroscedasticity consistent standard errors confirms
the previous conclusions.
> bptest(riskpremiauseur)
BP = 3.965, df = 1, p-value = 0.04646
> coeftest(riskpremiauseur,vcov=vcovHC(riskpremiauseur,type="HC1"))
129
(Intercept)
-0.0022795 0.0030643 -0.7439
0.4576
L(log(EXUSEUR)-log(F1USEUR)) 0.4847906 0.8420818 0.5757
0.5653
4.3.2
Tests for Risk Premia using Overlapping Samples
The parameters in the following model, cfr. Verbeek equation (4.72), have to be
estimated:
3
3
st ft3
= 1 + 2 (st3 ft3
) + et
log(EXUSBPt ) log(F3USBPt3 ) = 1 + 2 (log(EXUSBPt3 ) log(F3USBPt3 )) + et
(4.7)
The function dynlm may be invoked to estimate a linear model in presence of lagged
variables. Remember that lagged variables can be introduced in the model formula
with the operator L(.).
> library(dynlm)
> riskpremiaoverlUSBP <- dynlm(log(EXUSBP) - log(L(F3USBP,
3)) ~ L(log(EXUSBP) - log(F3USBP), 3), data = riskpremia)
> summary(riskpremiaoverlUSBP)
Start = 1979(4), End = 2001(12)
Call:
dynlm(formula = log(EXUSBP) - log(L(F3USBP, 3)) ~ L(log(EXUSBP) log(F3USBP), 3), data = riskpremia)
Residuals:
Min
1Q
-0.285511 -0.025561
Median
0.001782
3Q
0.029698
Max
0.176615
Coefficients:
(Intercept)
-0.013566
0.004216 -3.218 0.00145 **
L(log(EXUSBP)-log(F3USBP),3) 3.135215
0.529277
5.924 9.53e-09 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The Breusch-Godfrey statistic is then computed to check for the presence of serially
correlated errors; in particular there is evidence of a strong autocorrelation with
130
reference to the first- and the twelfth-order, but as Verbeek observes the conclusions
are incorrect due to the fact that monthly data for 3 months contracts are considered
and, though see Verbeek relationship (4.73) t may be assumed to be uncorrelated
with xt3 , t may be possibly correlated with t1 and with t2 .
> bgtest(riskpremiaoverlUSBP)
data: riskpremiaoverlUSBP
LM test = 119.6924, df = 1, p-value < 2.2e-16
> bgtest(riskpremiaoverlUSBP, order = 12)
data: riskpremiaoverlUSBP
The Breusch-Godfrey statistic must then be computed only with reference to the
autocorrelations of order 3,4, until 12. The auxiliary equation referred to (4.7) is:
3
et = 1 + 2 (st3 ft3
) + 3 et3 + + 12 et12
According to what suggested by Davidson and MacKinnon (1993) we have to set to 0

any presample value of the residuals to compute the Breusch-Godfrey statistic. This
can be obtained in the following way by creating a proper matrix for the regressors.
>
>
>
>
>
>
>
reUSBP <- residuals(riskpremiaoverlUSBP)

re <- lag(reUSBP)
for (o in 3:12) re <- cbind(re, lag(reUSBP, -o))
re <- window(re, start = c(1979, 4), end = c(2001, 12))
re[is.na(re)] <- 0
re <- re[, -1]
dimnames(re)[[2]] <- paste("USBPlag", 3:12, sep = "")
The matrix re is an mts (multiple time series) object obtained by binding the time
series of the residuals and its proper lagged versions and preserves the initial start of
the series, but adds information at the end of the series have a look at the object
re once You have created it. The correct time window is to be considered and this is
obtained by means of the function window. Any presample value (identified as a NA
not available case in the matrix) is set to zero and the series of the order 1 lagged
errors is dropped. Proper names are finally assigned to the elements (columns) of the
multiple time series object.
> check <- dynlm(reUSBP ~ L(log(EXUSBP) - log(F3USBP), 3) + re,
data = riskpremia)
The results of the auxiliary regression estimation are reported here only for
completeness, thus the following instruction may be dropped.
> summary(check)
131

Start = 1979(4), End = 2001(12)
Call:
dynlm(formula = reUSBP ~ L(log(EXUSBP) - log(F3USBP), 3) + re,
data = riskpremia)
Residuals:
Min
1Q
-0.277077 -0.027074
Median
0.003567
3Q
0.033847
Max
0.162478
Coefficients:
(Intercept)
-0.0003768 0.0042542 -0.089
0.9295
L(log(EXUSBP)-log(F3USBP),3) 0.0680321 0.5378860
0.126
0.8994
reUSBPlag3
-0.0372339 0.0970268 -0.384
0.7015
reUSBPlag4
-0.0582779 0.1313641 -0.444
0.6577
reUSBPlag5
0.0615611 0.1312624
0.469
0.6395
reUSBPlag6
-0.1456368 0.1402550 -1.038
0.3001
reUSBPlag7
-0.0228997 0.1470880 -0.156
0.8764
reUSBPlag8
0.1280666 0.1471511
0.870
0.3849
reUSBPlag9
-0.0768684 0.1408519 -0.546
0.5857
reUSBPlag10
-0.0840098 0.1323110 -0.635
0.5260
reUSBPlag11
0.2226356 0.1325330
1.680
0.0942 .
reUSBPlag12
-0.1622903 0.0973472 -1.667
0.0967 .
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The Breusch-Godfrey statistic can finally be computed by multiplying the multiple
R-squared by the number of observations in the auxiliary regression.
> T <- length(reUSBP)
> summary(check)$r.squared * T
[1] 7.170992
> qchisq(0.95, 10)
[1] 18.30704
To obtain the results of Verbeek the above assumption (consisting in setting to 0 any
presample residual values) must not be made, and T must be substituted by T 12:
it then follows a slightly different formulation, though with the same final conclusion.
132
> reUSBP <- residuals(riskpremiaoverlUSBP)

> check <- dynlm(reUSBP ~ L(log(EXUSBP) - log(F3USBP),
3) + L(reUSBP, 3:12), data = riskpremia)
The results of the auxiliary regression estimation are reported only for completeness,
thus the following instruction may be dropped. Observe that the starting and final
times for the following regression are diffferent from the ones appearing in the
preceding output.
> summary(check)
Start = 1980(4), End = 2001(12)
Call:
dynlm(formula = reUSBP ~ L(log(EXUSBP) - log(F3USBP), 3) + L(reUSBP,
3:12), data = riskpremia)
Residuals:
Min
1Q
-0.276828 -0.027926
Median
0.004025
3Q
0.033975
Max
0.164022
Coefficients:
(Intercept)
-0.002471
0.004304 -0.574
0.5663
L(log(EXUSBP)-log(F3USBP),3) 0.166460
0.535840
0.311
0.7563
L(reUSBP, 3:12)3
-0.016621
0.098831 -0.168
0.8666
L(reUSBP, 3:12)4
-0.060702
0.133137 -0.456
0.6488
L(reUSBP, 3:12)5
0.037894
0.131270
0.289
0.7731
L(reUSBP, 3:12)6
-0.141032
0.141052 -1.000
0.3183
L(reUSBP, 3:12)7
-0.023177
0.146965 -0.158
0.8748
L(reUSBP, 3:12)8
0.122700
0.146858
0.835
0.4042
L(reUSBP, 3:12)9
-0.079221
0.140486 -0.564
0.5733
L(reUSBP, 3:12)10
-0.085884
0.131756 -0.652
0.5151
L(reUSBP, 3:12)11
0.223343
0.131967
1.692
0.0918 .
L(reUSBP, 3:12)12
-0.163520
0.096852 -1.688
0.0926 .
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
It follows the value of the Breusch-Godfrey statistic
> T <- length(reUSBP)
> summary(check)$r.squared * (T - 12)
[1] 7.852361
133
Heteroskedasticity and Autocorrelation Consistent (HAC) Covariance

Matrix Estimation (Newey-West)
To take into account the presence of autocorrelation for the parameter inference it
is possible to make recourse to OLS but by computing the corrected (Newey-West)
standard errors, that is the Heteroskedasticity and Autocorrelation Consistent (HAC)
standard errors.
> library(sandwich)
Verbeek suggests to use H=3 in relationships (4.62) and (4.63). Pay attention: with
regard to the instruction NeweyWest, see also Zeileis (2004), the argument lag is the
maximum lag with positive correlation, so here is lag=H-1; Verbeek does not suggest
to use the adjusted estimates for finite samples so adjust=FALSE. The argument
prewhite must be set to FALSE to obtain Newey-West standard errors according to
(4.62) and (4.63).
> coeftest(riskpremiaoverlUSBP, vcov = NeweyWest(riskpremiaoverlUSBP,
lag = 2, adjust = FALSE, sandwich = TRUE, prewhite = FALSE))
(Intercept)
-0.0135664 0.0053729 -2.5250
L(log(EXUSBP)-log(F3USBP),3) 3.1352149 1.0560150 2.9689
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 "
Pr(>|t|)
0.012142 *
0.003256 **
" 1
There seem to be some misprints in the standard errors values reported on Verbeeks
p. 133; according to Verbeeks formulae (4.62) and (4.63) we perform the following
check.
> a <- ts(cbind(model.matrix(riskpremiaoverlUSBP),
riskpremiaoverlUSBP$res), start = c(1979, 4),
frequency = 12)
> dimnames(a)[[2]] <- c("int", "x1", "e")
> first.term.4.70 <- t(a[, 1:2]) %*% (a[, 1:2] * a[,
3]^2)/T
> H <- 3
> cum <- 0
> for (j in 1:(H - 1)) {
w_j <- (1 - j/H)
as <- window(a, start = c(1979, 4 + j), end = c(2001,
12))
aj <- window(lag(a, -j), start = c(1979, 4 +
j), end = c(2001, 12))
cum <- cum + w_j * (t(as[, 1:2] * as[, 3]) %*%
(aj[, 1:2] * aj[, 3]) + t(aj[, 1:2] * aj[,
3]) %*% (as[, 1:2] * as[, 3]))
134
}
second.term.4.70 <- cum/T
Sstar <- first.term.4.70 + second.term.4.70
ext.term.4.69 <- solve(t(a[, 1:2]) %*% a[, 1:2])
vcovbeta <- ext.term.4.69 %*% (T * Sstar) %*% ext.term.4.69
diag(vcovbeta)^0.5
int
x1
0.005372888 1.056015009
>
>
>
>
>
USD/Euro currency ratio

For the USD/Euro currency ratio we have analogous code:
> riskpremiaoverlUSEUR <- dynlm(log(EXUSEUR) - log(L(F3USEUR, 3)) ~
L(log(EXUSEUR) - log(F3USEUR), 3), data = riskpremia)
> summary(riskpremiaoverlUSEUR)
Start = 1979(4), End = 2001(12)
Call:
dynlm(formula = log(EXUSEUR) - log(L(F3USEUR, 3)) ~ L(log(EXUSEUR) log(F3USEUR), 3), data = riskpremia)
Residuals:
Min
1Q
Median
-0.15097 -0.04598 -0.00358
3Q
0.04268
Max
0.15541
Coefficients:
(Intercept)
-0.010506
0.005983 -1.756
0.0802 .
L(log(EXUSEUR)-log(F3USEUR),3) 0.006050
0.534784
0.011
0.9910
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Multiple R-squared: 4.722e-07,
Adjusted R-squared:
-0.00369
> bgtest(riskpremiaoverlUSEUR)
to 1
data: riskpremiaoverlUSEUR
> bgtest(riskpremiaoverlUSEUR, order = 12)
135

to 12
data: riskpremiaoverlUSEUR
> qchisq(0.95, c(1, 12))
[1] 3.841459 21.026070
> reUSEUR <- residuals(riskpremiaoverlUSEUR)
> check <- dynlm(reUSEUR ~ L(log(EXUSEUR) - log(F3USEUR),
3) + L(reUSEUR, 3:12), data = riskpremia)
> summary(check)$r.squared * (T - 12)
[1] 9.040125
> coeftest(riskpremiaoverlUSEUR, vcov = NeweyWest(riskpremiaoverlUSEUR,
lag = 2, adjust = FALSE, sandwich = TRUE, prewhite = FALSE))
(Intercept)
-0.0105060 0.0082893 -1.2674
0.2061
L(log(EXUSEUR)-log(F3USEUR),3) 0.0060495 0.7667389 0.0079
0.9937
5
Endogeneity, Instrumental
Variables and GMM
5.1
Estimating the Returns to Schooling (Section 5.4)
Data are available in the file schooling.wf1, which is a work file of EViews.
First invoke the package hexView and next the command readEViews to read data.
The function unzip extracts the file from the compressed archive ch05.zip.
> library(hexView)
> schooling <- readEViews(unzip("ch05.zip", "Chapter 5/schooling.wf1"))
Recall that by using the functions head(), tail() and summary() it is possible to
explore the beginning, the final section and to obtain summary statistics for all the
variables included in the data-frame.
The files schooling contain data taken from the National Longitudinal Survey of
Young Men (NLSYM) concerning the United States. The analysis focuses on 1976
but uses some variables that date back to earlier years. The following variables (many
are dummy variables) are present:
SMSA66 1 if lived in smsa in 1966
SMSA76 1 if lived in smsa in 1976
NEARC2 grew up near 2-year college
NEARC4 grew up near 4-year college
NEARC4A grew up near 4-year public college
NEARC4B grew up near 4-year private college
ED76 education in 1976
ED66 education in 1966
AGE76 age in 1976
DADED dads education (imputed avg if missing)
NODADED 1 if dads education imputed
138
Endogeneity, Instrumental Variables and GMM
MOMED mothers education
NOMOMED 1 if moms education imputed
MOMDAD14 1 if lived with mom and dad at age 14
SINMOM14 1 if single mom at age 14
STEP14 1 if step parent at age 14
SOUTH66 1 if lived in south in 1966
SOUTH76 1 if lived in south in 1976
LWAGE76 log wage in 1976 (outliers trimmed)
FAMED mom-dad education class (1-9)
BLACK 1 if black
WAGE76 wage in 1976 (raw, cents per hour)
ENROLL76 1 if enrolled in 1976
KWW the kww score
IQSCORE a normed IQ score
MAR76 marital status in 1976 (1 if married)
LIBCRD14 1 if library card in home at age 14
EXP76 experience in 1976
EXP762 EXP76 squared
The estimation of a linear model explaining the log wage in 1976 is proposed:
LWAGE76 = 1 + 2 ED76 + 3 EXP76 + 4 EXP762 + 5 BLACK + 6 SMSA76 + 7 SOUTH76 + ERROR
The parameter estimates appearing in Verbeeks Table 5.1 may be obtained by using
the function lm.
> schooling5.1 <- lm(LWAGE76 ~ ED76 + EXP76 + EXP762 +
BLACK + SMSA76 + SOUTH76, data = schooling)
> summary(schooling5.1)
Call:
lm(formula = LWAGE76 ~ ED76 + EXP76 + EXP762 + BLACK + SMSA76 +
SOUTH76, data = schooling)
Residuals:
Min
1Q
-1.59297 -0.22315
Median
0.01893
3Q
0.24223
Max
1.33190
Coefficients:
(Intercept)
ED76
EXP76

4.7336644 0.0676026 70.022 < 2e-16 ***
0.0740090 0.0035054 21.113 < 2e-16 ***
0.0835958 0.0066478 12.575 < 2e-16 ***
EXP762
-0.0022409
BLACK
-0.1896315
SMSA76
0.1614230
SOUTH76
-0.1248615
139
0.0003178 -7.050 2.21e-12 ***

0.0176266 -10.758 < 2e-16 ***
0.0155733 10.365 < 2e-16 ***
0.0151182 -8.259 < 2e-16 ***
0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

The average return on wage of schooling is approximately equal to 7.4% per year of
instruction. As Verbeek observes since schooling is endogenous also the experience and
its square have to be considered endogenous; so we need at least three instruments.
Age and its square are chosen as instruments for the corresponding experience
variables; live near a college may be chosen as instrument for schooling if this variable
affects schooling conditional on the other variables in the initial model (Verbeek
underlines this is a necessary but not sufficient condition for the variable to be a
valid instrument).
The estimation of the following reduced form model is then performed to check for
the relevance of the proposed instruments:
ED76 = 0 + 1 AGE76 + 2 AGE762 + 3 BLACK + 4 SMSA76 + 5 SOUTH76 + 6 NEARC4 + ERROR
> schooling5.2 <- lm(ED76 ~ AGE76 + I(AGE76^2) + BLACK +

SMSA76 + SOUTH76 + NEARC4, data = schooling)
Call:
lm(formula = ED76 ~ AGE76 + I(AGE76^2) + BLACK + SMSA76 + SOUTH76 +
NEARC4, data = schooling)
Residuals:
Min
1Q
-12.511 -1.722
Median
-0.296
3Q
1.876
Max
7.199
Coefficients:
4.298357 -0.435 0.663638
AGE76
1.061441
0.301398
3.522 0.000435 ***
I(AGE76^2) -0.018760
0.005231 -3.586 0.000341 ***
BLACK
-1.468367
0.115443 -12.719 < 2e-16 ***
SMSA76
0.835403
0.109252
7.647 2.76e-14 ***
SOUTH76
-0.459700
0.102434 -4.488 7.47e-06 ***
NEARC4
0.347105
0.106997
3.244 0.001191 **
---
140
Signif. codes:
0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

We can now obtain instrumental variables (IV) estimates by invoking the function
tsls, available in the package sem, which performs the two stage least squares
estimation. The function tsls has three arguments: the first is the linear model
formula, the second one is the same formula with the endogenous variables replaced
by their respective instruments. Observe that both formulae are expressed according
to the lm convention, see Appendix A.4. The third argument is the data.frame where
the involved variables are present.
> library(sem)
> schooling5.3 <- tsls(LWAGE76 ~ ED76 + EXP76 + EXP762 +
BLACK + SMSA76 + SOUTH76, ~NEARC4 + AGE76 + I(AGE76^2) +
BLACK + SMSA76 + SOUTH76, data = schooling)
2SLS Estimates
Model Formula: LWAGE76 ~ ED76+EXP76+EXP762+BLACK+SMSA76+SOUTH76
Instruments: ~NEARC4 + AGE76 + I(AGE76^2) + BLACK+SMSA76+SOUTH76
Residuals:
Min. 1st Qu.
-1.82400 -0.25250
Median
0.02286
Mean
0.00000
3rd Qu.
0.26350
Estimate
Std. Error
(Intercept) 4.0656681454 0.6084960487
ED76
0.1329471988 0.0513793955
EXP76
0.0559613854 0.0259944248
EXP762
-0.0007956595 0.0013403005
BLACK
-0.1031403726 0.0773729097
SMSA76
0.1079848759 0.0497398928
SOUTH76
-0.0981751843 0.0287645065
--Signif. codes: 0 "***" 0.001 "**" 0.01
Max.
1.31600
t value
6.68150
2.58756
2.15282
-0.59364
-1.33303
2.17099
-3.41307
Pr(>|t|)
2.8078e-11
0.00971243
0.03141204
0.55279589
0.18262324
0.03000991
0.00065087
***
**
*
*
***
"*" 0.05 "." 0.1 " " 1

The same result may also be obtained by implementing the formula
= (Z 0 X)1 Z 0 Y,
see Verbeeks Section 5.3.4.
141
>
>
>
>
Y <- schooling$LWAGE76
X <- model.matrix(schooling5.1)
Z <- X
Z[, 2:4] <- cbind(schooling$NEARC4, schooling$AGE76,
schooling$AGE76^2)
> solve(t(Z) %*% X) %*% t(Z) %*% Y
[,1]
(Intercept) 4.0656681823
ED76
0.1329471956
EXP76
0.0559613864
EXP762
-0.0007956596
BLACK
-0.1031403771
SMSA76
0.1079848787
SOUTH76
-0.0981751857
model.matrix(schooling5.1) returns the matrix of the regressors in the first model.
Z[,2:4]<-cbind(schooling$NEARC4,schooling$AGE76,schooling$AGE762 ) replaces
the values of the endogenous variables in the matrix Z with the values of their instruments (note that the matrix Z was initially set equal to X).
%*% is the operator performing matrix multiplication.
solve() returns the inverse of a matrix, when applied to a single element, or in the
following form gives the solution of a linear system of equations, here (Z 0 X) = (Z 0 Y ):
> solve(t(Z) %*% X, t(Z) %*% Y)
[,1]
(Intercept) 4.0656681824
ED76
0.1329471956
EXP76
0.0559613864
EXP762
-0.0007956596
BLACK
-0.1031403771
SMSA76
0.1079848787
SOUTH76
-0.0981751857
Observe that the latter code is computationally more efficient than the former for
solving a linear system of equations.
The function tsls may also be invoked by using the following four arguments: the
response, the matrix of independent variables, the matrix containing the instruments,
and a vector of weights to be used in the fitting process. Here we consider unitary
weights
> a <- tsls(Y, X, Z, w = 1)
The coefficients and their standard errors may be obtained by applying the function
summary to the object a, or by extracting from a the elements coefficients and
their covariance matrix V, (see the structure of a: str(a)):
> a$coefficients
142
[1] 4.0656681454 0.1329471988 0.0559613854 -0.0007956595

[5] -0.1031403726 0.1079848759 -0.0981751843
> diag(a$V)^0.5
[1] 0.608496049 0.051379396 0.025994425 0.001340301 0.077372910
[6] 0.049739893 0.028764507
5.2
Example of an application of the Generalized Method of

Moments
The following example by Dieter Rozenich is taken from the R help system of the
function gmm, (Chausse 2010).
For the two parameters of a normal distribution N (, 2 ) we have the following
three moment conditions:
E(X) = 0
E[(X )2 ] 2 = 0
E(X 3 ) (2 + 3 2 ) = 0
The first two moment conditions are directly obtained by the definition of N (, 2 ).
The third moment condition may be derived from the third derivative of the moment
generating function (MGF)

2 t2
MX (t) = E exp t +
2
evaluated at t = 0.
Note that, as is usual in GMM, we have more equations (3) than unknown
parameters (2).
A function, say g, is first defined, in order to establish the moment conditions which
depend on the unknown parameters, collected in a vector, say theta = [1 = , 2 =
2 ] and, of course, on the data x = [x1 , x2 , . . . , xn ].
> g <- function(theta, x) {
m1 <- x - theta[1]
m2 <- (x - theta[1])^2 - theta[2]
m3 <- x^3 - theta[1] * (theta[1]^2 + 3 * theta[2])
f <- cbind(m1, m2, m3)
return(f)
}
In presence of a vector of observations x = [x1 , x2 , . . . , xn ]:
m1 results in an n 1 vector with generic element m1i = xi 1 , i = 1, 2, . . . , n.

n
1 0
1X
1X
1n m1 =
m1i =
(xi 1 ) = 0,
n
n i=1
n i=1
where 1n is the n 1 unitary vector, corresponds to the first moment condition.
143
the generic element of the vector m2 is m2i = (xi 1 )2 2 , i = 1, 2, . . . , n.

n
1 0
1X
1X
1n m2 =
m2i =
(xi 1 )2 2 = 0
n
n i=1
n i=1
corresponds to the second moment condition.
analogously for m3,

n
1X
1X 3
1 0
1n m3 =
m3i =
x 1 (12 + 32 ) = 0
n
n i=1
n i=1 i
corresponds to the third moment condition.
f = [m1, m2, m3] is a n 3 matrix resuming the 3 moment conditions.
Let us generate a vector x of 100 pseudo random numbers, distributed according to

a Normal random variable with mean = 3 and variance 2 = 25.
> set.seed(123)
> x <- rnorm(100, mean = 3, sd = 5)
We now invoke the function gmm, available in the package gmm, whose main arguments
are the function, g, defining the moment conditions1 , the data and the starting values
to be passed to the numerical procedure for obtaining the parameter estimates.
> library(gmm)
> gmm(g, x, c(10, 10))
Method
twoStep
Objective function value:
Theta[1]
3.4615
0.0009022213
Theta[2]
20.7162
Convergence code =
The Jacobian related to the moment conditions can also be passed to the function gmm
to define the gradient, possibly improving the efficiency of the minimization algorithm
to solve the GMM problem. In the present case the Jacobian results:
1
0
J = 2 2E(X) 1
32 3 2 3
The function Dg is created to define the Jacobian.
1 The function g can also correspond to a formula when the model is linear (see the R-help
?gmm::gmm).
144
> Dg <- function(theta, x) {

jacobian <- matrix(c(-1, 2 * (theta[1] - mean(x)),
-3 * theta[1]^2 - 3 * theta[2], 0, -1, -3 *
theta[1]), 3, 2)
}
Pay attention: the Jacobian is the expected value of the Jacobian matrix, so in writing
its second row, first column, we had to specify mean(x) and not x, as we made in the
code defining the moment conditions by a matrix with typical element gj (, xi ).
The function gmm can be now invoked by taking into account also the argument
grad=Dg in order to specify the Jacobian.
> (estimation <- gmm(g, x, c(10, 10), grad = Dg))
Method
twoStep
Objective function value:
Theta[1]
3.4615
0.0009022213
Theta[2]
20.7162
Convergence code =
The covariance matrix of the parameter estimates can be obtained by means of the
function vcov.gmm.
> vcov.gmm(estimation)
Theta[1]
Theta[2]
Theta[1] 0.20798058 0.05594737
Theta[2] 0.05594737 7.88828621
5.3
Estimating Intertemporal Asset Pricing Models (Section

5.7)
The GMM framework is used to estimate an asset pricing model.

Data are available in the file pricing.dat, which is a text file, and may be read by
using the function read.table.
The function unzip extracts the file pricing.dat from the compressed archive
ch05.zip.
> pricing <- read.table(unzip("ch05.zip", "Chapter 5/pricing.dat"),
header = TRUE)
The data set contains monthly observations from February 1959 to November 1993
(T=418) on the returns on 10 size-based portfolios, the risk free rate and the
consumption growth.
The following variables are present:
145
r1: monthly return on portfolio 1 (small firms)
r2: monthly return on portfolio 2
...
r10: monthly return on portfolio 10 (large firms)
rf: risk free rate (return on 3-month T-bill)
cons: real per capita consumption growth based on total US personal

consumption expenditures (nondurables and services)
The portfolios are composed by the Center for Research in Security Prices (CRSP)
and contain stocks listed at NYSE, divided into 10 size-based deciles. For instance,
portfolio 1 contains the 10% smallest firms listed at NYSE.
Observe that the values of cons are relative values; that is they are obtained by
the ratio of total US personal consumption expenditures at times t and t 1.
We can transform the data.frame in a multiple time series. This is useful since it
allows us to work with data collected in a matrix object.
> pricing <- ts(data = pricing, start = c(1959, 2),
frequency = 12)
To apply GMM estimation we have first to recall the moment conditions, see Verbeeks
relationships (5.78) and (5.79):

(1
+
r
)
1=0
E CCt+1
f,t+1
t

(r
r
)
= 0, j = 1, . . . , J.
E CCt+1
j,t+1
f,t+1
t
We define g, a function of the parameters, resumed by the vector theta = [, ], and
of the data, here represented by the matrix x. The function g returns a n q matrix
with typical element gi (, xt ) for i = 1, . . . , q and t = 1, . . . , n. The columns of this
matrix are then used to build the q sample moment conditions.
> g <- function(theta,
e1 <- theta[1]
11]) - 1
e2 <- theta[1]
x[, 11])
f <- cbind(e1,
return(f)
}
x) {
* x[, 12]^(-theta[2]) * (1 + x[,
* x[, 12]^(-theta[2]) * (x[, 1:10] e2)
e1 contains the elements necessary to the function gmm to define the first moment
condition; the twelfth column, x[,12], of x is assumed to contain the ratio
values of US consumption expenditures (indeed they are stored as the twelfth
variable in the data.frame pricing); the eleventh column, x[,11], of x is
146
assumed to contain the risk free rates.

It consists of a vector with generic element:
2
1 x
12(i) (1 + x11(i) ) 1
Note that
1X
2
1 x
12(i) (1 + x11(i) ) 1 = 0
n i=1
is the empirical counterpart of the first moment condition.
e2 defines a matrix whose columns contain the elements necessary for defining
the other moment conditions; the columns of x[,1:10]-x[,11] contain the
differences between the monthly returns on portfolios 1-10 and the risk free rate
(Note that the recycling rule for vector differences has been applied).
So the elements in the generic jth column of e2 are of the type:
2
1 x
12(i) (xj(i) x11(i) ),
j = 1, . . . , 10;
by taking the sample averages we can obtain the empirical counterparts of the
remaining 10 moment conditions, j = 1, . . . , 10.
We now invoke the function gmm to estimate the two parameters using the GMM.
Two-step GMM
> library(gmm)
> pricing5.4_two <- gmm(g, pricing, c(0, 0), type = "twoStep",
wmatrix = "ident")
> summary(pricing5.4_two)
Call:
gmm(g=g,x=pricing,t0=c(0,0),type="twoStep",wmatrix="ident")
Method:
twoStep
Kernel:
Quadratic Spectral
Coefficients:
Estimate
Theta[1] 7.0043e-01
Theta[2] 9.1209e+01
Std. Error
1.4694e-01
3.9654e+01
t value
4.7666e+00
2.3001e+00
Pr(>|t|)
1.8732e-06
2.1442e-02
J-Test: degrees of freedom is 9

J-test
P-value
Test E(g)=0:
7.21050 0.61521
#############
Information related to the numerical optimization
Convergence code = 0
147
Function eval. = 129

Gradian eval. = NA
Iterated GMM
> pricing5.4_iter <- gmm(g, pricing, c(0, 0), type = "iterative",
vcov = "iid")
> summary(pricing5.4_iter)
Call:
gmm(g = g, x = pricing, t0 = c(0, 0), type = "iterative", vcov = "iid")
Method:
iterative
Kernel:
Quadratic Spectral
Coefficients:
Estimate
Theta[1] 8.2736e-01
Theta[2] 5.7394e+01
Std. Error
1.1616e-01
3.4221e+01
t value
7.1228e+00
1.6772e+00
Pr(>|t|)
1.0576e-12
9.3508e-02
J-Test: degrees of freedom is 9

J-test
P-value
Test E(g)=0:
5.76334 0.76335
Initial values of the coefficients
Theta[1] Theta[2]
0.700429 91.209310
#############
Information related to the numerical optimization
Convergence code = 0
Function eval. = 63
Gradian eval. = NA
One-step GMM The code to obtain the one-step estimates for general cases seems
not to be present in the package gmm, the method being used only in presence of a
linear model. However it is not difficult to implement it. We have to search for the
values of the parameters and minimizing the following function

2

2

Pn
Pn
Ct+1
Ct+1
1
1
(1 + rf,t+1 ) 1 + n t=1 Ct
(rj,t+1 rf,t+1 )
.
t=1
n
Ct
Let us first evaluate the moment conditions for a trial guess of the parameters:
> colMeans(g(c(0.5, 10), x = pricing))
e1
-0.485104512
148
e2.(x[, 1:10] - x[, 11]).x[, 1:10].r1

0.004238430
e2.(x[, 1:10] - x[, 11]).x[, 1:10].r2
0.003682385
e2.(x[, 1:10] - x[, 11]).x[, 1:10].r3
0.003324212
e2.(x[, 1:10] - x[, 11]).x[, 1:10].r4
0.003350064
e2.(x[, 1:10] - x[, 11]).x[, 1:10].r5
0.002922254
e2.(x[, 1:10] - x[, 11]).x[, 1:10].r6
0.003062154
e2.(x[, 1:10] - x[, 11]).x[, 1:10].r7
0.002735428
e2.(x[, 1:10] - x[, 11]).x[, 1:10].r8
0.002861830
e2.(x[, 1:10] - x[, 11]).x[, 1:10].r9
0.002327386
e2.(x[, 1:10] - x[, 11]).x[, 1:10].r10
0.001604280
The objective function to be minimized is
> obj.f <- function(theta) sum(colMeans(g(theta, x = pricing))^2)
We can now call the R optimizer optim to search for the values minimizing the
objective function.
> output <- optim(c(0.5, 10), obj.f)
The arguments of the function optim are: par the starting values for the parameters
to be optimized over. fn the function to be minimized (or maximized), with first
argument the vector of parameters over which minimization is to take place. It should
return a scalar result. gr A function to return the gradient for the BFGS, CG
and L-BFGS-B methods. If it is NULL, a finite-difference approximation will be
used. method The method to be used for minimization. lower, upper Bounds on
the variables for the L-BFGS-B method, or bounds in which to search for method
Brent. control A list of control parameters. hessian a logical value specifying if a
numerically differentiated Hessian matrix should be returned. See the R help ?optim
for more information on this function.
The numerical result of the optimization problem can be obtained with
> output$par
[1] 0.6993077 91.4828214
Verbeeks Figure 5.1, see Fig. 5.1, may be obtained by by using the following code:
> it_mrs <- output$par[1] * pricing[, 12]^(-output$par[2])
> pred.mer <- as.numeric(-cov(it_mrs, pricing[, 1:10] -
0.08
0.06
0.04
0.02
0.00
Mean excess return
0.10
0.12
0.14
0.00
0.02
0.04
0.06
0.08
0.10
0.12
0.14
Predicted mean excess return
Figure 5.1
Actual versus predicted mean excess returns of size-based portfolios
pricing[, 11])/mean(it_mrs))
mer <- colMeans(pricing[, 1:10] - pricing[, 11])
pred.mer <- exp(12 * pred.mer) - 1
mer <- exp(12 * mer) - 1
plot(mer ~ pred.mer, xlim = c(0, 0.14), ylim = c(0,
0.14), pch = 17, xlab = "Predicted mean excess return",
ylab = "Mean excess return")
> abline(0, 1)
>
>
>
>
149
6
Maximum Likelihood Estimation
and Specification Tests
In this Chapter the Maximum Likelihood method is applied to obtain the parameter
estimates characterizing some statistical distributions: we will take into consideration
the Normal, the Bernoulli, the Exponential and the Poisson ones. The method will
then be applied to estimate the parameters of a linear model with Gaussian errors.
Let (x1 , . . . , xn ) be a sample from a random variable X, discrete or continuous,
with probability density function
f (x; ).
(6.1)
The n-dimensional random variable, (X1 , . . . , Xn ), associated to (x1 , . . . , xn ), when
the hyphothesis of independence and identical distribution of the components of
(X1 , . . . , Xn ) can be assumed, has the following distribution function.
L(x1 , . . . , xn ; ) = fX1 ,...,Xn (x1 , . . . , xn ; ) =
n
Y
f (xi ; ).
i=1
By definition the Maximum Likelihood estimate of is given, in presence of a sample

(x1 , . . . , xn ), by the value which maximizes
L(|x1 , . . . , xn ) =
n
Y
f (xi ; ).
i=1
Usually one works with the log-Likelihood function:

l(x1 , . . . , xn ; ) = log [L(x1 , . . . , xn ; )] =
n
X
log[f (xi ; )].
i=1
since
argmax L(x1 , . . . , xn ; ) = argmax log [L(x1 , . . . , xn ; )] .
Sometimes the solution to the above maximization problem cannot be derived

algebraically; so one has to rely on numerical solutions. Observe that different
algorithms of optimization may give non-identical numerical solutions to the problem,
possibly depending also on the starting values assigned to the numerical procedure,
expecially when the problem is ill-posed from a computational point of view.
152
0.2
0.0
0.1
density
0.3
0.4
Maximum Likelihood Estimation and Specification Tests
10
10
Figure 6.1
6.1
Density function for X N (, 2 ), {5, 0, 5}, 2 = 1
Normal distribution
The density function of a Normal random variable is

(x )2
1
2
exp
f (x; , ) =
2 2
2
and has a bell-shaped behaviour. and are respectively location and scale
parameters corresponding to the mean and the standard deviation of the distribution.
Figure 6.1 and 6.2 show the behaviour of f for various and .
> plot(function(x) dnorm(x), xlim = c(-10, 10), ylab = "density")
> sapply(c(-5, 5), function(i) curve(dnorm(x, mean = i,
sd = 1), add = TRUE))
> plot(function(x) dnorm(x), xlim = c(-10, 10), ylab = "density")
> sapply(1:5, function(i) curve(dnorm(x, sd = i), add = TRUE))
153
0.2
0.0
0.1
density
0.3
0.4
10
10
Figure 6.2
Density function for X N (, 2 ), = 0, 2 {1, 4, 9, 16, 25}
Lets now generate a sample of n = 100 pseudo-random numbers from a Normal

random variable X N ( = 4, 2 = 9).
By using the function set.seed it is possible to replicate the same realization of the
sample and the reader can reproduce the same results presented below.
The function rnorm(n,mean,sd) generates n pseudo-random numbers from a Normal
random variable with = mean and = sd.
>
>
>
>
>
set.seed(1000)
n <- 100
mean <- 4
sd <- 3
x <- rnorm(n, mean = mean, sd = sd)
We now construct
log (L(x1 , . . . , xn ; )) =
n
X
i=1

ln

1
(xi )2
exp
2 2
(2 2 )1/2
154
that is the opposite of the log-Likelihood function1 , under the assumption that the
observations in the sample are i.i.d. X N (, 2 ) with and 2 unknown parameters.
With the function dnorm(x,mean,sd,log) we can obtain, for the value x, the density
function of the Normal distribution with = mean and = sd when log=FALSE and
the log-density when log=TRUE. Observe that by default the argument log is FALSE.
> ll <- function(theta) -sum(dnorm(x, mean = theta[1],
sd = theta[2]^0.5, log = TRUE))
Here theta is a vector with 2 elements: respectively the mean and the standard
deviation of a Normal distribution.
To invoke the minimization algorithm we need to specify the starting values for the
parameters upon which the objective function depends. Observe that in ill-posed
problems the solution of the minimization algorithm might depend highly on the
choice of the starting parameters.
We can use a Newton type algorithm as minimization algorithm, which is available
in the function nlm. The main arguments of nlm are the function, which has to be
minimized, and the starting values. One can also require to produce the hessian, which
will be used to construct I(, 2 ), the Fisher Information Matrix, and the covariance
matrix of the parameter estimates as the inverse of I(, 2 ). (See the help ?nlm for
more information on the function nlm).
Usually one has to try with different starting values to evaluate the sensitivity of the
solution. Here we propose the following two options:
the median and one half the interquartile range for the location and the scale
parameters respectively;
>
>
>
>
theta.start <- c(median(x), IQR(x)/2)

out <- nlm(ll, theta.start, hessian = TRUE)
theta.hat <- out$estimate
theta.hat
[1] 4.049135 9.027908

> fish <- out$hessian
> fish
[,1]
[,2]
[1,] 11.076763955 -0.000245457
[2,] -0.000245457 0.613226159
> solve(fish)
[,1]
[,2]
[1,] 9.027908e-02 3.613614e-05
[2,] 3.613614e-05 1.630720e+00
the values 0 and 1 respectively for and .
1 This
because R internal optimization routines provide for the solution to minimization problems.
>
>
>
>
155
theta.start <- c(0, 1)

out <- nlm(ll, theta.start, hessian = TRUE)
theta.hat <- out$estimate
theta.hat
[1] 4.049131 9.027900

> fish
[,1]
[,2]
[1,] 11.0767726456 -0.0002398594
[2,] -0.0002398594 0.6132280859
> solve(fish)
[,1]
[,2]
[1,] 9.027900e-02 3.531194e-05
[2,] 3.531194e-05 1.630715e+00
In this case the problem is well posed, so the solution does not depend on the initial
starting values. There are only negligible differences.
The behaviour of the logLikelihood function is shown in Figure 6.3 and Figure 6.4
that can be obtained with the code:
>
>
>
>
xx <- seq(-4, 15, l = 50)

yy <- seq(1, 20, l = 50)
grid <- expand.grid(xx, yy)
z <- sapply(1:nrow(grid), function(i) -ll(c(grid[i,
1], grid[i, 2])))
> z <- matrix(z, nrow = length(xx), ncol = length(yy))
> persp(xx, yy, z, theta = 45, phi = 45, shade = 0.45,
xlab = expression(mu), ylab = expression(sigma^2),
zlab = "logLikelihood")
> contour(xx, yy, z, nlevels = 250, xlab = expression(mu),
ylab = expression(sigma^2))
To define the surface of the graph we could not have recourse to the function outer,
see the help ?persp or Longhow Lam (2010), p. 91, since mean and sd are scalar
arguments for the function dnorm.
The object grid obtained by means of the function expand.grid contains the
elements of the cartesian product betwen the two sets xx and yy defining respectively
the supports for the mean and the variance.
The values, z, of the density function are then evaluated for each element in the
product set xx*yy by using sapply.
z is finally re-arranged as a matrix defining the surface to be plotted by the functions
156
logL
sig
m
a^
od
iho
ikel
u
m
Figure 6.3 Estimating the parameters of a Normal distribution.

Loglikelihood function ln(, 2 |x1 , . . . , xn ): perspective plot
persp and contour.

From the analysis of the contour plot it is evident that the estimate for the mean will
be more precise than that for the variance.
The packages stat4 and bbmle have functions devoted to maximum likelihood
estimation, which automatically produce the standard errors for the parameter
estimates. The function mle is present in stat4; the function mle2 in bbmle. In
particular mle2 returns also asymptotic tests.
In both cases the likelihood function must be defined by specifying the arguments as
scalars without collecting them in an array as made above with theta for nlm. So the
starting values have to be collected in a list.
> ll <- function(mi1, s2) -sum(dnorm(x, mean = mi1,
sd = s2^0.5, log = TRUE))
> theta.start <- list(mi1 = 0, s2 = 1)
> library(stats4)
> out <- mle(ll, start = theta.start)
80
290
0
7
6
00
60
8
0
8
2
80
60
60
40
60
8
40
13
20
50
0
14
72
98
208
0
0
0
0
0
0
02
20
2
1
0
28
1 0
48
1
0
46
40
0
40
1
0
76
8 10
1
30
1
0
0
66
220
0
9
40
1
3
0
16
6 78
40
60
00
1
20
4
380
29
15
22
0 1120
0
0
20 2400
000000000
98
86
88
90
92
94
96
72
74
76
78
80
82
84
64
66
68
70
540
62
86
17
334
4
4
4
4
4
4
4
4
4
0
300
80
3
3 20
60
28
4
00
60
80
80
20
38
0
18
20
5
20
10
20
60
7
00
40
6
4
80
26
7
40
9
40
1
8
80
10
0
1
00
0
56
20
10
15
20
00
00
10
15

Loglikelihood function ln(, 2 |x1 , . . . , xn ): contour plot
> summary(out)
Maximum likelihood estimation
Call:
mle(minuslogl = ll, start = theta.start)
Coefficients:
Estimate Std. Error
mi1 4.048892 0.3004372
s2 9.026253 1.2762730
-2 log L: 503.8196
> library(bbmle)
> out <- mle2(ll, start = theta.start)
> summary(out)
157
158

Call:
mle2(minuslogl = ll, start = theta.start)
Coefficients:
Estimate Std. Error z value
Pr(z)
mi1 4.04889
0.30044 13.4767 < 2.2e-16 ***
s2
9.02625
1.27627 7.0724 1.523e-12 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
-2 log L: 503.8196
By plotting the result of the function profile applied to an mle2 object it is possible
to investigate the behaviour of the objective function near the solution and obtain
graphical confidence intervals, see Figure 6.5.
> plot(profile(out))
We now consider the behaviour of the Likelihood function

n
Y
(xi )2
1
exp
L(, 2 |x1 , . . . , xn ) =
2 2
(2 2 )1/2
i=1
for different sample sizes. Let generate a sample x of length 250 from X N ( =
4, 2 = 25) and define the function llplot to obtain the perspective and contour
graphs of the Likelihood function for the n initial elements of x.
The function pdf redirects the graph on a pdf (file) device.
dev.cur() returns the name of the current device R is working on.
dev.set() defines to R the device to work on.
dev.off() closes the current open device.
Before invoking the following function llplot two devices are opened to which are
respectively redirected the perspectives and contours plots.
> set.seed(1000)
> x <- rnorm(250, mean = 4, sd = 3)
> llplot <- function(n) {
x <- x[1:n]
xx <- seq(3, 5, l = 50)
yy <- seq(6, 12, l = 50)
ll <- function(theta) prod(dnorm(x, mean = theta[1],
sd = theta[2]^0.5))
z <- sapply(1:nrow(grid), function(i) ll(c(grid[i,
1], grid[i, 2])))
z <- matrix(z, nrow = length(xx), ncol = length(yy))
dev.set(dev1)
Likelihood profile: mi1
Likelihood profile: s2
2.5
99%
2.0
2.0
2.5
99%
95%
95%
1.5
90%
1.5
90%
1.0
80%
1.0
80%
0.0
0.0
0.5
50%
0.5
50%
3.5
4.0
mi1
4.5
10
12
s2

Likelihood function profiles
persp(xx, yy, z, theta = 45, phi = 45, shade = 0.15,

xlab = expression(mu), ylab = expression(sigma^2),
zlab = "Likelihood", sub = paste("n: ", n,
", Sample mean: ", round(mean(x), 2),
", Sample variance: ", round(var(x),
2), sep = ""))
dev.set(dev2)
contour(xx, yy, z, nlevels = 25, xlab = expression(mu),
ylab = expression(sigma^2), sub = paste("n: ",
n, ", Sample mean: ", round(mean(x),
2), ", Sample variance: ", round(var(x),
2), sep = ""))
}
> pdf("Chapter06-normallikelihoodpersp.pdf")
> dev1 <- dev.cur()
> layout(matrix(1:4, 2, 2, byrow = TRUE))
159
160
a^
2
sig
m
sig
m
u
m
u
m
a^
2
od
liho
o
liho
Like
Like
n: 100, Sample mean: 4.05, Sample variance: 9.12
Like
a^
2
sig
m
sig
m
u
m
u
m
a^
2
od
od
liho
liho
Like

Likelihood function L(, 2 |x1 , . . . , xn ), perspectives (n = 100, 150, 200, 250)
>
>
>
>
>
>
pdf("Chapter06-normallikelihoodcontour.pdf")
dev2 <- dev.cur()
layout(matrix(1:4, 2, 2, byrow = TRUE))
sapply(c(100, 150, 200, 250), function(i) llplot(i))
dev.off(dev1)
dev.off(dev2)
Figures 6.6 and 6.7 report respectively the Likelihood function behaviour via
perspective and contour plots of the surface for in the interval [3, 5] and 2 in
the interval [6, 12]. Observe that perspective plots do not have the same scale for the
Likelihood. It is evident that the Likelihood gets more and more concentrated on the
true values = 4, 2 = 9 when the sample size increases. There is a larger uncertainty
in estimating the mean than the variance.
Observe that with R version 2.15.1 64-bit, run on a Windows 7 system, contour plots
are not produced for n 150. The code works with the 32-bit version.
12
11
10
8
2e163
5e164
8e111
2e111
3.0
3.5
4.0
4.5
5.0
3.0
3.5
4.0
4.5
5.0
11
10
70
2
6e
2
15
215
1e
2e
216
215
e
1.6
8e
4e270
10
2e216
6e216
26
9
3e
269
2.6e269
1.2e
26
1.2e215
1.4e
4e216
8e270
11
12
12
7e163
5.
5e
1
3e
63
1
63
0
11
4e
2.
1
1
1.4
3.5e163
11
10
9
1
1
63
1
63
1
4e
0
11
8e
1
11
63
5e
2e
1.
4e
1e163
2.5e1
1.
6e
1
11
10 1
1e1
.2
e
10
11
1
e
0
1.6
10
1
6e
2.
12
2e270
3.0
3.5
4.0
4.5
5.0
3.0
3.5
4.0
4.5
5.0

Likelihood function L(, 2 |x1 , . . . , xn ), contour plots (n = 100, 150, 200, 250)
161
162
For a dynamic graph representation allowing a three-dimensional graph to be

examined from different perspectives, by changing the theta angle, use2 :
>
>
>
>
>
>
set.seed(1000)
x <- rnorm(250, mean = 4, sd = 3)
xx <- seq(3, 5, l = 50)
yy <- seq(6, 12, l = 50)
ll <- function(theta) prod(dnorm(x, mean = theta[1],
sd = theta[2]^0.5))
> z <- sapply(1:nrow(grid), function(i) ll(c(grid[i,
1], grid[i, 2])))
> z <- matrix(z, nrow = length(xx), ncol = length(yy))
> sapply(1:200, function(i) persp(xx, yy, z, theta = i,
phi = 45, shade = 0.15, xlab = expression(mu),
ylab = expression(sigma^2), zlab = "Likelihood"))
6.2
Bernoulli distribution
Lets generate a sample of n = 100 pseudo random numbers from a Bernoulli

distribution with p = 0.7.
Remind that the Bernoulli distribution has the following probability distribution
function
P (X = x) = px (1 p)1x , x = 0, 1
and may be considered a special case of the Binomial distribution whose probability
distribution function is:

n x
P (X = x) =
p (1 p)nx , x = 0, 1, . . . , n.
x
The Bernoulli is obtained for n = 1.
Use the function rbinom to generate a sample of a specified dimension from a
binomial distribution with parameters size = n and prob = p; the arguments of the
function are the dimension, size and probability.
>
>
>
>
set.seed(1000)
n <- 100
p <- 0.7
x <- rbinom(n, size = 1, prob = p)
The opposite of the log-Likelihood function

log [L(x1 , . . . , xn ; )] =
n
X

ln pxi (1 p)1xi
i=1
can be obtained by using the function dbinom, by specifying log=TRUE.

2 To
stop the sequence of pictures press the Escape key
163
> ll <- function(theta) -sum(dbinom(x, size = 1, prob = theta,

log = TRUE))
To start the minimization algorithm, available by invoking nlm, we consider here the
situation of complete uncertainty about p by establishing the starting value p = 0.5.
> theta.start <- 0.5
> out <- nlm(ll, theta.start, hessian = TRUE)
> theta.hat <- out$estimate
> theta.hat
[1] 0.72
> fish
[,1]
[1,] 496.2484
> solve(fish)
[,1]
[1,] 0.00201512
The behaviour of the logLikelihood function is shown in Figure 6.8 that can be
obtained with the code:
> xx <- seq(0, 1, l = 500)
> yy <- sapply(1:length(xx), function(i) -ll(xx[i]))
> plot(xx, yy, type = "l", xlab = "p", ylab = "logLikelihood")
The estimation may also be performed by having recourse to the packages stat4 or
bbmle. For the sake of clarity we change the name to the parameter to estimate, to
remark that the starting value has to be specified as an element of a list.
> ll <- function(phat) -sum(dbinom(x, size = 1, prob = phat,
log = TRUE))
> theta.start <- list(phat = 0.5)
> library(stats4)
> summary(out)
Call:
Coefficients:
Estimate Std. Error
phat 0.7200001 0.04489945
-2 log L: 118.5907
164
200
300
400
logLikelihood
100
0.0
0.2
0.4
0.6
0.8
1.0
Figure 6.8 Estimating the parameter of a Bernoulli distribution.

Loglikelihood function ln(p|x1 , . . . , xn )
165
> library(bbmle)
> summary(out)
Call:
Coefficients:
Pr(z)
phat
0.7200
0.0449 16.036 < 2.2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
-2 log L: 118.5907
Here we did not encounter any problem of convergence thought the parameter space
was a subset of <. In case a constrained optimization were necessary, one can make
recourse to the functions nlminb or optim or constrOptim (see the R help system
for more information).
By plotting the result of the function profile applied to an mle2 object it is
possible to investigate the behaviour of the objective function near the solution and
obtain graphical confidence intervals, see Figure 6.9.
n
Y
pxi (1 p)1xi
L(|x1 , . . . , xn ) =
i=1
for different sample sizes. Let generate a sample x of length 250 from X Be(p = .7)
and define the function llplot to obtain the graph of the Likelihood function for the
n initial elements of x.
> set.seed(1000)
> x <- rbinom(250, size = 1, prob = 0.7)
x <- x[1:n]
xx <- seq(0, 1, l = 500)
ll <- function(theta) prod(dbinom(x, size = 1,
prob = theta))
yy <- sapply(1:length(xx), function(i) ll(xx[i]))
plot(xx, yy, type = "l", xlab = "p", ylab = "Likelihood",
sub = paste("n: ", n, ", Sample mean: ",
round(mean(x), 2), sep = ""))
}
> pdf("Chapter06-bernoullilikelihood.pdf")
166
Likelihood profile: phat
2.0
2.5
99%
95%
1.5
90%
1.0
80%
0.0
0.5
50%
0.60
0.65
0.70
0.75
0.80
phat

Profile of the Likelihood function
> sapply(c(100, 150, 200, 250), function(i) llplot(i))

> dev.off()
Figure 6.10 reports the Likelihood behaviour for p (0, 1). Observe that the graphs
do not have the same scale for the Likelihood. The Likelihood gets more and more
concentrated by increasing the sample size.
6.3
Exponential distribution
The density function of an Exponential distribution is:

f (x; ) = ex ,
x 0.
We have E(X) = 1 and V ar(X) = 12 .

Figure 6.11 shows the behaviour of f for various .
> plot(function(x) dexp(x, rate = 3), xlim = c(0, 10),
ylab = "density")
0e+00
0.0
0.2
0.4
0.6
0.8
1.0
0.0
0.2
0.4
0.6
0.8
1.0
0e+00
0.0e+00
4e55
1.0e68
p
n: 150, Sample mean: 0.71
Likelihood
p
Likelihood
167
2e40
Likelihood
1.0e26
0.0e+00
Likelihood
0.0
0.2
0.4
0.6
0.8
1.0
0.0
p
0.2
0.4
0.6
0.8
1.0
p

Likelihood function L(p|x1 , . . . , xn ), perspective plots (n = 100, 150, 200, 250)
> sapply(1/c(2, 4, 5), function(i) curve(dexp(x, rate = i),

add = TRUE))
Lets generate a sample of n = 100 pseudo random numbers from an Exponential
distribution with = 4. The function rexp may be invoked.
>
>
>
>
set.seed(1000)
n <- 100
lambda <- 4
x <- rexp(n, rate = lambda)

log [L(x1 , . . . , xn ; )] =
n
X

ln exi
i=1
can be obtained by using the function dexp with the argument log=TRUE.
168
1.5
0.0
0.5
1.0
density
2.0
2.5
3.0
10
Figure 6.11
Density function for X Exp(), {1, 0.5, 0.25, 0.2}
> ll <- function(theta) -sum(dexp(x, rate = theta,

log = TRUE))
As starting value we use
and invoke nlm to obtain the parameter estimates
> theta.hat
[1] 4.250685
> fish
[,1]
[1,] 5.53344
> solve(fish)
169
200
400
300
logLikelihood
100
10
Figure 6.12 Estimating the parameter of an Exponential distribution.

Loglikelihood function ln(|x1 , . . . , xn )
[,1]
[1,] 0.1807194
> xx <- seq(0, 10, l = 500)
> plot(xx, yy, type = "l", xlab = expression(lambda),
ylab = "logLikelihood")
> ll <- function(lambdahat) -sum(dexp(x, rate = lambdahat,
log = TRUE))
170
> theta.start <- list(lambdahat = 0.5)

> library(stats4)
> summary(out)
Call:
Coefficients:
Estimate Std. Error
lambdahat 4.250685 0.4250685
-2 log L: -89.41605
> library(bbmle)
> summary(out)
Call:
Coefficients:
Pr(z)
lambdahat 4.25069
0.42507
10 < 2.2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
-2 log L: -89.41605
L(|x1 , . . . , xn ; ) =
n
Y
exi
i=1
for different sample sizes. Let generate a sample x of length 250 from X Exp( = 4)
and define the function llplot to obtain the graph of the Likelihood function for the
n initial elements of x. The function pdf opens a pdf file as graphical device to save
the graph.
> set.seed(1000)
> x <- rexp(250, rate = 4)
Likelihood profile: lambdahat
2.0
2.5
99%
95%
1.5
90%
1.0
80%
0.0
0.5
50%
3.5
4.0
4.5
5.0
5.5
lambdahat


x <- x[1:n]
xx <- seq(0, 10, l = 500)
ll <- function(theta) prod(dexp(x, rate = theta))
plot(xx, yy, type = "l", xlab = expression(lambda),
ylab = "Likelihood", sub = paste("n: ", n,
sep = ""))
}
> pdf("Chapter06-exponentiallikelihood.pdf")

> dev.off()
171
Likelihood
1.0e+19
0e+00
2e+34
2.0e+19
4e+34
0.0e+00
Likelihood
172
10
10
4e+58
Likelihood
0e+00
2e+58
6.0e+45
0.0e+00
Likelihood
1.2e+46
10
10

Likelihood function L(|x1 , . . . , xn ), perspective plots (n = 100, 150, 200, 250)
Figure 6.14 reports the Likelihood behaviour for (0, 10). Observe that the graphs
6.4
Poisson distribution
The distribution function of a Poisson distribution is:

p(x; ) =
x e
,
x!
x = 0, 1, 2, . . . , .
We have E(X) = V ar(X) = .

Figure 6.15 shows the behaviour of p(x; ) for various .
> layout(matrix(1:4, 2, 2))
> sapply(1:4, function(i) plot(0:15, dpois(0:15, lambda = i),
ylim=c(0,0.4),type="h",xlab="x",ylab=expression(paste(lambda),
"=", i, sep = ""), lwd = 2))
0.4
0.3
0.1
0.0
0
10
15
10
15
10
15
0.3
0.1
0.0
0.0
0.1
0.2
0.2
0.3
0.4
0.4
173
0.2
0.2
0.0
0.1
0.3
0.4
10
15
Figure 6.15
Probability distribution function for X P oisson(), {1, 2, 3, 4}
Lets generate a sample of n = 100 pseudo random numbers from a Poisson

distribution with = 4. The function rpois may be invoked.
>
>
>
>
set.seed(1000)
n <- 100
lambda <- 4
x <- rpois(n, lambda = lambda)

n
X
xi e
log [L(x1 , . . . , xn ; )] =
ln
xi !
i=1

can be obtained by using the function dpois with the argument log=TRUE.
> ll <- function(theta) -sum(dpois(x, lambda = theta,
log = TRUE))
As starting value we use
174

and invoke nlm to obtain the parameter estimates
> theta.hat
[1] 3.789998
> fish
[,1]
[1,] 26.37997
> solve(fish)
[,1]
[1,] 0.03790754
> xx <- seq(0, 10, l = 500)
> plot(xx, yy, type = "l", xlab = expression(lambda),
ylab = "logLikelihood")
> ll <- function(lambdahat) -sum(dpois(x, lambda = lambdahat,
log = TRUE))
> theta.start <- list(lambdahat = 0.5)
> library(stats4)
> summary(out)
Call:
Coefficients:
Estimate Std. Error
lambdahat
3.79 0.1946792
-2 log L: 405.4604
> library(bbmle)
> summary(out)
1000
1500
logLikelihood
500
10
Figure 6.16 Estimating the parameter of a Poisson distribution.

Loglikelihood function ln(|x1 , . . . , xn )
175
176

Call:
Coefficients:
Pr(z)
lambdahat 3.79000
0.19468 19.468 < 2.2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
-2 log L: 405.4604
L(|x1 , . . . , xn ; ) =
n
Y
xi e
i=1
xi !
for different sample sizes. Let generate a sample x of length 250 from X
P oisson( = 4) and define the function llplot to obtain the graph of the Likelihood
function for the n initial elements of x.
> set.seed(1000)
> x <- rpois(250, lambda = 4)
x <- x[1:n]
xx <- seq(0, 10, l = 500)
ll <- function(theta) prod(dpois(x, lambda = theta))
plot(xx, yy, type = "l", xlab = expression(lambda),
ylab = "Likelihood", sub = paste("n: ", n,
sep = ""))
}
> pdf("Chapter06-poissonlikelihood.pdf")
> dev.off()
Figure 6.18 reports the Likelihood behaviour for (0, 10). Observe that the graphs
177
Likelihood profile: lambdahat
2.0
2.5
99%
95%
1.5
90%
1.0
80%
0.0
0.5
50%
3.4
3.6
3.8
4.0
4.2
lambdahat
Figure 6.17 Estimating the parameter of a Poisson distribution.

6.5
Linear model
Lets generate the data according to a linear model formulation

yi = 1 + 2 x1i + 3 x2i + i
with i iid realizations of a Normal random variable with = 0 and 2 = 4.
The elements of X1 and X2 are obtained as iid realizations from a Normal random
variable with = 2 and 2 = 1.
The parameters are set to 1 = 10, 2 = 2 and 3 = 3.
>
>
>
>
>
set.seed(1000)
n <- 100
beta <- c(10, 2:3)
E <- rnorm(n, mean = 0, sd = 2)
W <- matrix(rnorm(n * (length(beta) - 1), mean = 2,
sd = 1), nrow = n, byrow = TRUE)
> X <- cbind(1, W)
> Y <- X %*% beta + E
178
1.5e134
Likelihood
0.0e+00
4e89
0e+00
Likelihood
8e89
10
10
Likelihood
0e+00
2e181
4e225
4e181
8e225
0e+00
Likelihood
10
10
Figure 6.18 Estimating the parameter of an Poisson distribution.

Likelihood function L(|x1 , . . . , xn ), perspective plots (n = 100, 150, 200, 250)
The log-Likelihood function, which depend on the parameters 1 , 2 , 3 and 2 ,

collected in the vector = [1 , 2 , 3 , 4 ] is

n
X
(yi (1 + 2 x1i + 3 x2i ))2
1
exp
ln
log (L(x1 , . . . , xn ; )) =
24
(24 )1/2
i=1
and its opposite can be formalized in the following way
> ll <- function(theta) -sum(dnorm(Y - X %*% theta[1:length(beta)],
mean = 0, sd = theta[length(beta) + 1]^0.5, log = TRUE))
Starting values for the minimization algorithm, via nlm, are defined randomly for the
linear model parameters, by only ensuring that the starting value pertaining to the
variance of the error is positive.
>
>
>
>
beta.start <- c(rnorm(length(beta)), runif(1) * 10)

out <- nlm(ll, beta.start, hessian = TRUE)
beta.hat <- out$estimate
beta.hat
179
[1] 9.958192 1.841883 3.207483 3.953898

> fish
[,1]
[,2]
[,3]
[,4]
[1,] 25.29149416 52.850482554 49.364456806 -0.003152660
[2,] 52.85048255 133.812630879 103.181750523 -0.003085456
[3,] 49.36445681 103.181750523 117.183959136 -0.004704955
[4,] -0.00315266 -0.003085456 -0.004704955 3.197022477
> solve(fish)
[,1]
[,2]
[,3]
[,4]
[1,] 0.4087678112 -8.929471e-02 -9.357096e-02 1.792117e-04
[2,] -0.0892947085 4.278377e-02 -5.563977e-05 -4.684676e-05
[3,] -0.0935709619 -5.563977e-05 4.799992e-02 -2.168630e-05
[4,] 0.0001792117 -4.684676e-05 -2.168630e-05 3.127911e-01
bbmle. Observe that the minus log-likelihood must be expressed as a function of the
single parameters which are collected in a list to define the starting values.
> ll <- function(beta1, beta2, beta3, sigma2) -sum(dnorm(Y X %*% c(beta1, beta2, beta3), mean = 0, sd = sigma2^0.5,
log = TRUE))
> beta.start <- list(beta1 = rnorm(1), beta2 = rnorm(1),
beta3 = rnorm(1), sigma2 = runif(1) * 10)
> library(stats4)
> out <- mle(ll, start = beta.start)
> summary(out)
Call:
mle(minuslogl = ll, start = beta.start)
Coefficients:
Estimate Std. Error
beta1 9.957560 0.6394234
beta2 1.841974 0.2068663
beta3 3.207276 0.2191142
sigma2 3.954813 0.5594242
-2 log L: 421.258
> library(bbmle)
> out <- mle2(ll, start = beta.start)
> summary(out)
Call:
180
mle2(minuslogl = ll, start = beta.start)

Coefficients:
beta1
9.95756
0.63942 15.5727
beta2
1.84197
0.20687 8.9042
beta3
3.20728
0.21911 14.6375
sigma2 3.95481
0.55942 7.0694
--Signif. codes: 0 "***" 0.001 "**"
Pr(z)
< 2.2e-16
< 2.2e-16
< 2.2e-16
1.556e-12
***
***
***
***
0.01 "*" 0.05 "." 0.1 " " 1
-2 log L: 421.258
The parameter estimation of the linear model can be obtained also via OLS, by means
of the function lm.
> summary(lm(Y ~ W))
Call:
lm(formula = Y ~ W)
Residuals:
Min
1Q
-4.5902 -1.2625
Median
0.1035
3Q
1.0177
Max
5.2219
Coefficients:
(Intercept)
9.9582
0.6492
15.34 < 2e-16 ***
W1
1.8419
0.2100
8.77 6.04e-14 ***
W2
3.2075
0.2225
14.42 < 2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Adjusted R-squared:
0.7409
Observe that the parameter standard errors and their p-values provided by Maximum
Likelihood are based on the asymptotic covariance matrix and on the normality
assumption for the asymptotic distributions of the parameter estimators, while those
obtained in the linear model via OLS are based on an unbiased estimate for the
variance of the residuals and on the t distribution (always under the assumption of
normality of the errors).
The estimator for the residual standard error can be obtained, for Maximum
Likelihood, as:
> out@coef[length(beta) + 1]^0.5
sigma2
1.988671
Likelihood profile: beta1
2.5
99%
2.5
99%
2.0
95%
2.0
95%
90%
80%
0.5
50%
0.0
0.0
0.5
50%
8.5 9.0 9.5
10.5
11.5
1.4
1.6
1.8
2.0
2.2
2.4
99%
2.5
Likelihood profile: sigma2

99%
95%
2.0

2.5
beta2
2.0
beta1
95%
90%
80%
1.0
80%
1.5
1.5
90%
1.0
181
1.0
80%
1.5
1.5
90%
1.0
0.5
50%
0.0
0.0
0.5
50%
2.6
2.8
3.0
3.2
3.4
3.6
3.8
beta3
3.0
3.5
4.0
4.5
5.0
5.5
6.0
sigma2
Figure 6.19
This is a biased estimate; the unbiased estimate that coincides with the OLS one is
> (out@coef[length(beta) + 1] * n/(n - length(beta)))^0.5
sigma2
2.01919
Observe that the function mle in the package bbmle produces as a result an object
of class S4. By executing the instruction str(out), you will notice that it is not a
traditional list, but some of its values are identified with the symbol @, which was
also used above to extract the coefficients.
By plotting the result of the function profile applied to an mle2 object it is
possible to investigate the behaviour of the objective function near the solution and
obtain graphical confidence intervals, see Figure 6.19.
182
6.6
Individual wages (Section 2.5.5)
The preceding code is finally applied to the estimation of the linear model (2.2), we
considered in Chapter 2.
WAGE = 1 + 2 MALE + 3 SCHOOL + 4 EXPER + ERROR
> wages <- read.table(unzip("wages_in_the_USA.zip",
"wages1.dat"), header = TRUE)
> regr <- lm(WAGE ~ MALE + SCHOOL + EXPER, data = wages)
> summary(regr)
Call:
lm(formula = WAGE ~ MALE + SCHOOL + EXPER, data = wages)
Residuals:
Min
1Q Median
-7.654 -1.967 -0.457
3Q
Max
1.444 34.194
Coefficients:
0.46498 -7.269
MALE
1.34437
0.10768 12.485
SCHOOL
0.63880
0.03280 19.478
EXPER
0.12483
0.02376
5.253
--Signif. codes: 0 "***" 0.001 "**" 0.01
Pr(>|t|)
4.50e-13
< 2e-16
< 2e-16
1.59e-07
***
***
***
***
"*" 0.05 "." 0.1 " " 1

> attach(wages)
> ll <- function(beta) -sum(dnorm(WAGE - model.matrix(regr) %*%
beta[1:4], mean = 0, sd = beta[5]^0.5, log = TRUE))
> beta.start <- c(rnorm(dim(model.matrix(regr))[2]),
10)
> out <- nlm(ll, beta.start, hessian = TRUE)
> beta.hat <- out$estimate
> beta.hat
[1] -3.3797358 1.3443640 0.6387780 0.1248189 9.2677087
> round(fish, 4)
[,1]
[,2]
[,3]
[,4]
[,5]
[1,] 355.4274 186.1301 4133.816 2858.8511 -0.0019
[2,] 186.1301 186.1301 2129.761 1549.7899 -0.0013
[3,] 4133.8157 2129.7606 49054.736 32988.8446 -0.2620
183
[4,] 2858.8511 1549.7899 32988.845 24859.3271 -0.1328

[5,]
-0.0019
-0.0013
-0.262
-0.1328 19.1680
> round(solve(fish), 4)
[,1]
[,2]
[,3]
[,4]
[,5]
[1,] 0.2160 -0.0078 -0.0138 -6e-03 -0.0002
[2,] -0.0078 0.0116 0.0003 -3e-04 0.0000
[3,] -0.0138 0.0003 0.0011 1e-04 0.0000
[4,] -0.0060 -0.0003 0.0001 6e-04 0.0000
[5,] -0.0002 0.0000 0.0000 0e+00 0.0522
> ll <- function(beta) -sum(dnorm(WAGE - cbind(1, MALE,
SCHOOL, EXPER) %*% beta[1:4], mean = 0, sd = beta[5]^0.5,
log = TRUE))
> beta.start <- c(rnorm(4), 10)
> out <- nlm(ll, beta.start, hessian = TRUE)
> beta.hat <- out$estimate
> beta.hat
[1] -3.3797292 1.3443662 0.6387777 0.1248184 9.2677158
> round(fish, 4)
[,1]
[,2]
[,3]
[,4]
[,5]
[1,] 355.4274 186.1300 4133.812 2858.8491 -0.0018
[2,] 186.1300 186.1300 2129.759 1549.7887 -0.0013
[3,] 4133.8124 2129.7589 49054.698 32988.8189 -0.2620
[4,] 2858.8491 1549.7887 32988.819 24859.3078 -0.1327
[5,]
-0.0018
-0.0013
-0.262
-0.1327 19.1679
> round(solve(fish), 4)
[,1]
[,2]
[,3]
[,4]
[,5]
[1,] 0.2159 -0.0078 -0.0138 -6e-03 -0.0002
[2,] -0.0078 0.0116 0.0003 -3e-04 0.0000
[3,] -0.0138 0.0003 0.0011 1e-04 0.0000
[4,] -0.0060 -0.0003 0.0001 6e-04 0.0000
[5,] -0.0002 0.0000 0.0000 0e+00 0.0522
> ll <- function(beta1, beta2, beta3, beta4, sigma2) -sum(dnorm(WAGE cbind(1, MALE, SCHOOL, EXPER) %*% c(beta1, beta2,
beta3, beta4), mean = 0, sd = sigma2^0.5,
log = TRUE))
> beta.start <- list(beta1 = 1, beta2 = 2, beta3 = 3,
beta4 = 4, sigma2 = 4)
> library(stats4)
> out <- mle(ll, start = beta.start)
> summary(out)
Call:
mle(minuslogl = ll, start = beta.start)
184
Coefficients:
Estimate
beta1 -3.3800069
beta2
1.3443683
beta3
0.6387972
beta4
0.1248248
sigma2 9.2677234
Std. Error
0.46469418
0.10761051
0.03277593
0.02374833
0.22836335
-2 log L: 16682.18
> library(bbmle)
> out <- mle2(ll, start = beta.start)
> summary(out)
Call:
mle2(minuslogl = ll, start = beta.start)
Coefficients:
Pr(z)
beta1 -3.380007
0.464694 -7.2736 3.500e-13 ***
beta2
1.344368
0.107611 12.4929 < 2.2e-16 ***
beta3
0.638797
0.032776 19.4898 < 2.2e-16 ***
beta4
0.124825
0.023748 5.2562 1.471e-07 ***
sigma2 9.267723
0.228363 40.5832 < 2.2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
-2 log L: 16682.18
> logLik(out)
'log Lik.' -8341.091 (df=5)
> round(vcov(out), 4)
beta1
beta2
beta3 beta4 sigma2
beta1
0.2159 -0.0078 -0.0138 -6e-03 0.0000
beta2 -0.0078 0.0116 0.0003 -3e-04 0.0000
beta3 -0.0138 0.0003 0.0011 1e-04 0.0000
beta4 -0.0060 -0.0003 0.0001 6e-04 0.0000
sigma2 0.0000 0.0000 0.0000 0e+00 0.0521
2.5
99%
2.5
99%
2.5
99%
2.0
95%
2.0
95%
2.0
95%
1.5
1.5
beta2
Likelihood profile: sigma2
99%
99%
95%
95%
0.60
0.65
0.70
beta3
1.5
80%
1.0
1.0
80%
0.55
90%
1.5
90%
50%
0.5
1.3
2.5
1.1
2.0
2.5
beta1
2.5
3.5
2.0
4.5
80%
0.0
0.5
50%
0.0
0.0
0.5
50%
90%
1.0
80%
1.0
1.0
80%
1.5
90%
1.5
90%
0.5
50%
0.0
0.0
0.5
50%
0.06
0.10
0.14
beta4
0.18
8.8
9.2
9.6
sigma2
Figure 6.20
185
7
Models with Limited Dependent
Variables
7.1
The Impact of Unemployment Benefits on Recipiency

(Section 7.1.6)
Data are available in the file BENEFITS.WF1, which is a work file of EViews.
First invoke the package hexView and next the command readEViews to read data.
The file is extracted from the compressed archive ch07.zip with the function unzip.
> library(hexView)
> benefits <- readEViews(unzip("ch07.zip", "Chapter 7/benefits.wf1"))
The files BENEFITS contain a sample of 4877 blue collar workers that got unemployed
in the USA between 1982 and 1991. The following variables (many are dummy
variables) are present:
STATEUR state unemployment rate (in %)
STATEMB state maximum benefit level
STATE state of residence code
AGE age in years
AGE2 age squared and then divided by 10
TENURE years of tenure in job lost
SLACK1 if job lost due to slack work
ABOL 1 if job lost because position abolished
SEASONAL 1 if job lost becasue seasonal job ended
NWHITE 1 if non white
SCHOOL12 1 if more than 12 years of school
MALE 1 if male
188
Models with Limited Dependent Variables
BLUECOL 1 if blue collar worker
SMSA 1 if live in a metropolitan area
MARRIED 1 if married
DKIDS 1 if kids
DYKIDS 1 if young kids (0-5 yrs)
YRDISPL year of job displacement (1982=1,..., 1991=10)
RR replacement rate
RR2 RR squared
HEAD 1 if head of household
Y 1 if applied for (and received) UI benefits
Verbeek proposes the estimation of three different models to explain the

unemployment workers choice to apply for unemployment benefits:
a linear probability model
a Logit model
a Probit model
7.1.1
Estimation of the linear probability model
The linear probability model can be estimated (without making any attempt to
constrain the implied probabilities between 0 and 1) with the function lm, we used
for linear models, see Chapter 2.
> lpmfit <- lm(Y ~ RR + RR2 + AGE + AGE2 + TENURE +
SLACK + ABOL + SEASONAL + HEAD + MARRIED + DKIDS +
DYKIDS + SMSA + NWHITE + YRDISPL + SCHOOL12 +
MALE + STATEMB + STATEUR, data = benefits)
> summary(lpmfit)
Call:
lm(formula = Y ~ RR + RR2 + AGE + AGE2 + TENURE + SLACK + ABOL +
SEASONAL + HEAD + MARRIED + DKIDS + DYKIDS + SMSA + NWHITE +
YRDISPL + SCHOOL12 + MALE + STATEMB + STATEUR, data = benefits)
Residuals:
Min
1Q
-0.9706 -0.5374
Median
0.2231
3Q
0.3347
Max
0.6770
Coefficients:
(Intercept) -0.0768689 0.1220560 -0.630 0.52887
RR
0.6288584 0.3842068
1.637 0.10174
RR2
-1.0190587 0.4809550 -2.119 0.03416 *
AGE
0.0157489
AGE2
-0.0014595
TENURE
0.0056531
SLACK
0.1281283
ABOL
-0.0065206
SEASONAL
0.0578745
HEAD
-0.0437490
MARRIED
0.0485952
DKIDS
-0.0305088
DYKIDS
0.0429115
SMSA
-0.0351950
NWHITE
0.0165889
YRDISPL
-0.0133149
SCHOOL12
-0.0140365
MALE
-0.0363176
STATEMB
0.0012394
STATEUR
0.0181479
0.0047841
0.0006016
0.0012152
0.0142249
0.0248281
0.0357985
0.0166430
0.0161348
0.0174321
0.0197563
0.0140138
0.0187109
0.0030686
0.0168433
0.0178142
0.0002039
0.0030843
3.292
-2.426
4.652
9.007
-0.263
1.617
-2.629
3.012
-1.750
2.172
-2.511
0.887
-4.339
-0.833
-2.039
6.078
5.884
0.00100
0.01530
3.37e-06
< 2e-16
0.79285
0.10601
0.00860
0.00261
0.08016
0.02990
0.01206
0.37534
1.46e-05
0.40468
0.04154
1.31e-09
4.28e-09
189
**
*
***
***
**
**
.
*
*
***
*
***
***
0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

The predicted probability for each worker to apply for unemployment benefits can be
extracted from lpmfit with lpmfit$fitted.
7.1.2
Estimation of the Logit model
The parameters in the Logit model can be estimated with the function glm which is
used to fit generalized linear models. These models are specified by giving a symbolic
description of (see Hardin and Hilbe (2007)):
1. the probability distribution function (belonging to the exponential family) of
the dependent variable (response).
2. the linear systematic component relating the predictor, = X, to the product
of the matrix X containing the explanatory variables with the parameters .
3. the link function relating the mean of the response to the linear predictor.
This is made with the two main arguments of the function glm:
a model formula for the linear systematic component with the same structure
used for defining linear models,
the family which is a description of the probability distribution of the

dependent variable, with the specification of the link function.
In case of a logit model we have a binomial distribution Bin(1, p) (Bernoulli

distribution) for the dependent variable and the link can be specified as "logit".
190
Finally the data.frame containing the variables in the model may be specified with
the argument data.
By default the estimation method used by R for generalized linear models consists
in iteratively reweighted least squares (IWLS). See the help ?glm for more information
on the features of the glm function and Hardin and Hilbe (2007) for a detailed
presentation of generalized linear models.
> logitfit <- glm(Y ~ RR + RR2 + AGE + AGE2 + TENURE +
MALE + STATEMB + STATEUR, family = binomial(link = "logit"),
data = benefits)
> summary(logitfit)
Call:
glm(formula = Y ~ RR + RR2 + AGE + AGE2 + TENURE + SLACK + ABOL +
YRDISPL+SCHOOL12+MALE+STATEMB+STATEUR, family = binomial(link =
"logit"), data = benefits)
Deviance Residuals:
Min
1Q
Median
-2.2024 -1.2216
0.6959
3Q
0.8844
Max
1.6015
Coefficients:
0.604168 -4.635 3.56e-06 ***
RR
3.068078
1.868226
1.642 0.10054
RR2
-4.890616
2.333522 -2.096 0.03610 *
AGE
0.067697
0.023910
2.831 0.00463 **
AGE2
-0.005968
0.003038 -1.964 0.04950 *
TENURE
0.031249
0.006644
4.703 2.56e-06 ***
SLACK
0.624822
0.070639
8.845 < 2e-16 ***
ABOL
-0.036175
0.117808 -0.307 0.75879
SEASONAL
0.270874
0.171171
1.582 0.11354
HEAD
-0.210682
0.081226 -2.594 0.00949 **
MARRIED
0.242266
0.079410
3.051 0.00228 **
DKIDS
-0.157927
0.086218 -1.832 0.06699 .
DYKIDS
0.205894
0.097492
2.112 0.03470 *
SMSA
-0.170354
0.069781 -2.441 0.01464 *
NWHITE
0.074070
0.092956
0.797 0.42555
YRDISPL
-0.063700
0.014997 -4.247 2.16e-05 ***
SCHOOL12
-0.065258
0.082413 -0.792 0.42845
MALE
-0.179829
0.087535 -2.054 0.03994 *
STATEMB
0.006027
0.001009
5.973 2.33e-09 ***
STATEUR
0.095620
0.015912
6.009 1.86e-09 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
191
(Dispersion parameter for binomial family taken to be 1)

Null deviance: 6086.1
Residual deviance: 5746.4
AIC: 5786.4
on 4876
on 4857
degrees of freedom
degrees of freedom
Number of Fisher Scoring iterations: 4

The predicted probabilities for each worker to apply for unemployment benefits are
stored in the object logitfit as fitted and can be extracted with logitfit$fitted.
7.1.3
Estimation of the Probit model
The parameters in a Probit model can be estimated by using the same function glm,
that allowed the Logit estimates to be obtained.
We only need to specify the "probit" link.
> probitfit <- glm(Y ~ RR + RR2 + AGE + AGE2 + TENURE +
MALE + STATEMB + STATEUR, family = binomial(link = "probit"),
data = benefits)
> summary(probitfit)
Call:
glm(formula = Y ~ RR + RR2 + AGE + AGE2 + TENURE + SLACK + ABOL +
YRDISPL+SCHOOL12+MALE+STATEMB+STATEUR, family = binomial(link =
"probit"), data = benefits)
Deviance Residuals:
Min
1Q
Median
-2.2247 -1.2269
0.6988
3Q
0.8884
Max
1.5834
Coefficients:
(Intercept)
RR
RR2
AGE
AGE2
TENURE
SLACK
ABOL
SEASONAL
HEAD
MARRIED

-1.6999888 0.3629508 -4.684 2.82e-06 ***
1.8634727 1.1293243
1.650 0.09893 .
-2.9804349 1.4119428 -2.111 0.03478 *
0.0422140 0.0143142
2.949 0.00319 **
-0.0037741 0.0018118 -2.083 0.03724 *
0.0176943 0.0038476
4.599 4.25e-06 ***
0.3754931 0.0423881
8.858 < 2e-16 ***
-0.0223136 0.0718629 -0.311 0.75618
0.1612070 0.1040951
1.549 0.12147
-0.1247463 0.0490620 -2.543 0.01100 *
0.1454763 0.0478152
3.042 0.00235 **
192
DKIDS
-0.0965778
DYKIDS
0.1236097
SMSA
-0.1001520
NWHITE
0.0517937
YRDISPL
-0.0384797
SCHOOL12
-0.0415517
MALE
-0.1067168
STATEMB
0.0036399
STATEUR
0.0568271
0.0518420
0.0586377
0.0418419
0.0558335
0.0090509
0.0497219
0.0527404
0.0006065
0.0094328
-1.863 0.06247 .
2.108 0.03503 *
-2.394 0.01668 *
0.928 0.35359
-4.251 2.12e-05 ***
-0.836 0.40333
-2.023 0.04303 *
6.002 1.95e-09 ***
6.024 1.70e-09 ***
0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
(Dispersion parameter for binomial family taken to be 1)

AIC: 5788.1
on 4876
on 4857
degrees of freedom
degrees of freedom

To obtain the predicted probabilities of applying for unemployment benefits use
probitfit$fitted like for the Logit model.
7.1.4
A unique table for comparing model estimates
To compare the parameter estimates in the preceding model specifications we can use
the function mtable available in the package memisc. See p. 21 for the use of mtable.
> library(memisc)
> table7.2 <- mtable(LPM = lpmfit, Logit = logitfit,
Probit = probitfit)
> table7.2 <- relabel(table7.2, "(Intercept)" = "constant",
RR = "replacement rate", RR2 = "replacement rate^2",
AGE = "age", AGE2 = "age^2/10", TENURE = "tenure",
SLACK = "slack work", ABOL = "abolished position",
SEASONAL = "seasonal work", HEAD = "head of household",
MARRIED = "married", DKIDS="children",DYKIDS="youngchildren",
SMSA = "live in SMSA", NWHITE = "non white",
YRDISPL = "year of displacement", SCHOOL12=">12yearsofschool",
MALE = "male", STATEMB = "state max. befefits",
STATEUR = "state unempl. befefits")
> table7.2
Calls:
LPM: lm(formula = Y ~ RR + RR2 + AGE + AGE2 + TENURE + SLACK + ABOL +
YRDISPL + SCHOOL12 + MALE + STATEMB + STATEUR, data = benefits)
Logit: glm(formula = Y ~ RR + RR2 + AGE + AGE2 + TENURE + SLACK + ABOL+
193

YRDISPL + SCHOOL12 + MALE + STATEMB + STATEUR, family =
binomial(link = "logit"), data = benefits)
Probit: glm(formula = Y ~ RR + RR2 + AGE + AGE2 + TENURE + SLACK + ABOL+
YRDISPL + SCHOOL12 + MALE + STATEMB + STATEUR, family =
binomial(link = "probit"), data = benefits)
=====================================================
LPM
Logit
Probit
----------------------------------------------------constant
-0.077
-2.800*** -1.700***
(0.122)
(0.604)
(0.363)
replacement rate
0.629
3.068
1.863
(0.384)
(1.868)
(1.129)
replacement rate^2
-1.019*
-4.891*
-2.980*
(0.481)
(2.334)
(1.412)
age
0.016**
0.068**
0.042**
(0.005)
(0.024)
(0.014)
age^2/10
-0.001*
-0.006*
-0.004*
(0.001)
(0.003)
(0.002)
tenure
0.006*** 0.031*** 0.018***
(0.001)
(0.007)
(0.004)
slack work
0.128*** 0.625*** 0.375***
(0.014)
(0.071)
(0.042)
abolished position
-0.007
-0.036
-0.022
(0.025)
(0.118)
(0.072)
seasonal work
0.058
0.271
0.161
(0.036)
(0.171)
(0.104)
head of household
-0.044** -0.211** -0.125*
(0.017)
(0.081)
(0.049)
married
0.049**
0.242**
0.145**
(0.016)
(0.079)
(0.048)
children
-0.031
-0.158
-0.097
(0.017)
(0.086)
(0.052)
young children
0.043*
0.206*
0.124*
(0.020)
(0.097)
(0.059)
live in SMSA
-0.035*
-0.170*
-0.100*
(0.014)
(0.070)
(0.042)
non white
0.017
0.074
0.052
(0.019)
(0.093)
(0.056)
year of displacement
-0.013*** -0.064*** -0.038***
(0.003)
(0.015)
(0.009)
> 12 years of school
-0.014
-0.065
-0.042
(0.017)
(0.082)
(0.050)
male
-0.036*
-0.180*
-0.107*
194
(0.018)
(0.088)
(0.053)
0.001*** 0.006*** 0.004***
(0.000)
(0.001)
(0.001)
state unempl. befefits
0.018*** 0.096*** 0.057***
(0.003)
(0.016)
(0.009)
----------------------------------------------------R-squared
0.067
adj. R-squared
0.063
sigma
0.450
F
18.331
p
0.000
0.000
0.000
Log-likelihood
-3016.708 -2873.197 -2874.071
Deviance
983.900 5746.393 5748.142
AIC
6075.415 5786.393 5788.142
BIC
6211.753 5916.239 5917.987
N
4877
4877
4877
Aldrich-Nelson R-sq.
0.065
0.065
McFadden R-sq.
0.056
0.056
Cox-Snell R-sq.
0.067
0.067
Nagelkerke R-sq.
0.094
0.094
phi
1.000
1.000
Likelihood-ratio
339.663
337.914
=====================================================
state max. befefits
Observe that the R-squared, adj. R-squared, sigma and F final statistics do not
have any statistical relevance for a linear probability model; so they have not to be
considered.
The estimated marginal effect for TENURE, evaluated at the sample average of the
regressors, can be obtained as:
> xlevels <- apply(logitfit$model[, -1], 2, mean)
> avefitlogit <- c(1, xlevels) %*% logitfit$coef
> exp(avefitlogit)/(1 + exp(avefitlogit))^2 * coef(logitfit)["TENURE"]
[,1]
[1,] 0.00659471
for the logit model and
> avefitprobit <- c(1, xlevels) %*% probitfit$coef
> dnorm(avefitprobit) * coef(probitfit)["TENURE"]
[,1]
[1,] 0.006203453
for the probit model. The estimated marginal effect of being married for the average
person is:
> exp(avefitlogit)/(1 + exp(avefitlogit))^2 * coef(logitfit)["MARRIED"]
195
[,1]
[1,] 0.05112677
> dnorm(avefitprobit) * coef(probitfit)["MARRIED"]
[,1]
[1,] 0.05100272
in the two specifications.
7.1.5
The
Some additional goodness of fit measures
Rp2 ,
see Verbeeks relationship (7.21) is defined as:

Rp2 = 1
wr1
wr0
where wr1 and wr0 are the proportions of incorrect predictions respectively for the
considered model and for a model containing only an intercept.
Rp2 (and the HM index, see Verbeeks Section 7.1.5) can be obtained by defining a
function R2p:
> R2p <- function(y, estobject, cutoff = 0.5) {
a <- table(y, (estobject$fitted > cutoff) * 1)
wr_1 <- 1 - sum(diag(a))/sum(a)
phat <- sum(y)/length(y)
wr_0 <- (1 - phat) * (phat > cutoff) + phat *
(phat <= cutoff)
pa <- prop.table(a, 1)
return(list("Cross-tabulation of actual and predicted outcomes" =
a, Rsq_p = round(1 - wr_1/wr_0, 4), HM = round(pa[1,
1] + pa[2, 2], 4)))
}
estobject can be an object of clas lm or glm. Observe that $fitted extracts
two different type of predicted values; in particular for glm they correspond to the
transformation through the Logit and Probit functions of the predicted values from
a linear specification, see Verbeeks relationship (7.15). cutoff is a threshold value:
when the estimated probality is larger than cutoff the fitted response is set equal to
1, otherwise the response is set to 0. Verbeek assumes cutoff = 0.5.
The Rp2 may then be obtained for the preceding models by invoking the function
R2p, specifying the actual outcomes for the y argument, benefits$Y, and the objects
(lpmfit, logitfit and probitfit) resulting respectively from the linear probability,
Logit and Probit model estimation procedures.
> R2p(y = benefits$Y, estobject = lpmfit, cutoff = 0.5)
$"Cross-tabulation of actual and predicted outcomes"
y
0
0
1
184 1358
196
130 3205
$Rsq_p
[1] 0.035
$HM
[1] 1.0803
> R2p(benefits$Y, logitfit)
y
0
1
0
1
242 1300
171 3164
$Rsq_p
[1] 0.046
$HM
[1] 1.1057
> R2p(benefits$Y, probitfit)
y
0
1
0
1
231 1311
162 3173
$Rsq_p
[1] 0.0447
$HM
[1] 1.1012
With the function pR2, available in the package pscl the following pseudo-R2
measures may be produced in presence of a glm object, like the Logit and Probit
ones, that results applying the glm function, see Hardin and Hilbe (2007).
llh: The log-likelihood from the fitted model
llhNull: The log-likelihood from the intercept-only restricted model
G2: Minus two times the difference in the log-likelihoods
McFadden: McFaddens pseudo r-squared
r2ML: Maximum likelihood pseudo r-squared
r2CU: Cragg and Uhlers pseudo r-squared
See the help ?pscl::pR2 and Hardin and Hilbe (2007) for more information.
> library(pscl)
> pR2(logitfit)
llh
llhNull
-2.873197e+03 -3.043028e+03
r2ML
r2CU
6.727594e-02 9.436996e-02
> pR2(probitfit)
llh
llhNull
-2.874071e+03 -3.043028e+03
r2ML
r2CU
6.694146e-02 9.390077e-02
G2
3.396629e+02
McFadden
5.581002e-02
G2
3.379143e+02
McFadden
5.552271e-02
197
A cross-tabulation (like the one returned by the function R2p) of actual outcomes
against predicted outcomes for discrete data models, with summary statistics such as
the percentage of correctly predicted under fitted and null models may be obtained
by applying the function hitmiss available in the package pscl to a glm object.
The user can also specify a classification threshold different from 0.5 for the predicted
probabilities by changing the default argument k=0.5.
> hitmiss(logitfit)
Classification Threshold = 0.5
y=0 y=1
yhat=0 242 171
yhat=1 1300 3164
Percent Correctly Predicted = 69.84%
Percent Correctly Predicted = 15.69%,
Null Model Correctly Predicts 68.38%
[1] 69.83802 15.69390 94.87256
> hitmiss(probitfit)
Classification Threshold = 0.5
y=0 y=1
yhat=0 231 162
yhat=1 1311 3173
Percent Correctly Predicted = 14.98%,
Null Model Correctly Predicts 68.38%
[1] 69.79701 14.98054 95.14243
for y = 0
for y = 1
for y = 0
for y = 1
Observe that the log-likelihood for the naive model, where the probability of applying
for benefits is constant, can be obtained, e.g. for the Logit model from the crosstabulation of actual and predicted outcomes, see Verbeeks relationship (7.19):
> a <- R2p(benefits$Y, logitfit)[[1]]
> log0 <- sum(a[2, ]) * log(sum(a[2, ])/sum(a)) + sum(a[1,
]) * log(sum(a[1, ])/sum(a))
> log0
198
[1] -3043.028
The function logLik extracts the log-likelihood from a glm object; so the Mc Fadden
pseudo R-squared (returned by the function pR2) can be also computed with:
> 1 - logLik(logitfit)[1]/log0
[1] 0.05581002
7.2
Some remarks on the interpretation of a parameter in

a logit model
According to the logistic link function we have

p=
It follows

log
that is
or
p
1p
exp(X 0 )
.
1 + exp(X 0 )

= 0 + 1 X1 + + p Xp
p
= exp{0 + 1 X1 + + p Xp }
1p
p
= e0 e1 X1 ep Xp .
1p
If x1 (x1 + 1) then, ceteris paribus, p p1 :

p
p
p1
e1 =
1p
1p
1 p1
Lets now study for various p and 1 the relationship between p1 and p, see 7.1
>
>
>
>
>
>
>
>
>
p <- 0:25/25
p <- p[-c(1, length(p))]
odds <- p/(1 - p)
beta <- 0:16/4 - 2
expbeta <- exp(beta)
table <- outer(odds, expbeta, "*")
rownames(table) <- p
colnames(table) <- round(beta, 2)
p1 <- round(table/(1 + table), 3)
If p = 0.4 and 1 = 1.25 when X1 increases of 1 we have p1 = 0.16. Were 1 = 1.25

we would have p1 = 0.70.
The dummy variable case
If X1 is a dummy variable the consequence on p is the same, when considering the
effect for category 1 with respect to category 0.
0.04
0.08
0.12
0.16
0.2
0.24
0.28
0.32
0.36
0.4
0.44
0.48
0.52
0.56
0.6
0.64
0.68
0.72
0.76
0.8
0.84
0.88
0.92
0.96
-1.75
0.01
0.01
0.02
0.03
0.04
0.05
0.06
0.08
0.09
0.10
0.12
0.14
0.16
0.18
0.21
0.24
0.27
0.31
0.35
0.41
0.48
0.56
0.67
0.81
0.01
0.01
0.02
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.10
0.11
0.13
0.15
0.17
0.19
0.22
0.26
0.30
0.35
0.41
0.50
0.61
0.76
0.01
0.02
0.03
0.04
0.05
0.07
0.08
0.10
0.11
0.13
0.15
0.17
0.20
0.22
0.25
0.28
0.32
0.36
0.41
0.47
0.54
0.62
0.72
0.84
-1.5
0.01
0.02
0.04
0.05
0.07
0.08
0.10
0.12
0.14
0.16
0.18
0.21
0.24
0.27
0.30
0.34
0.38
0.42
0.48
0.53
0.60
0.68
0.77
0.87
-1.25
0.01
0.03
0.05
0.06
0.08
0.10
0.12
0.15
0.17
0.20
0.22
0.25
0.28
0.32
0.36
0.40
0.44
0.49
0.54
0.59
0.66
0.73
0.81
0.90
-1
0.02
0.04
0.06
0.08
0.11
0.13
0.15
0.18
0.21
0.24
0.27
0.30
0.34
0.38
0.41
0.46
0.50
0.55
0.60
0.65
0.71
0.78
0.84
0.92
-0.75
0.02
0.05
0.08
0.10
0.13
0.16
0.19
0.22
0.25
0.29
0.32
0.36
0.40
0.44
0.48
0.52
0.56
0.61
0.66
0.71
0.76
0.82
0.88
0.94
-0.5
0.03
0.06
0.10
0.13
0.16
0.20
0.23
0.27
0.30
0.34
0.38
0.42
0.46
0.50
0.54
0.58
0.62
0.67
0.71
0.76
0.80
0.85
0.90
0.95
-0.25
0.04
0.08
0.12
0.16
0.20
0.24
0.28
0.32
0.36
0.40
0.44
0.48
0.52
0.56
0.60
0.64
0.68
0.72
0.76
0.80
0.84
0.88
0.92
0.96
0
0.05
0.10
0.15
0.20
0.24
0.29
0.33
0.38
0.42
0.46
0.50
0.54
0.58
0.62
0.66
0.69
0.73
0.77
0.80
0.84
0.87
0.90
0.94
0.97
0.25
0.06
0.12
0.18
0.24
0.29
0.34
0.39
0.44
0.48
0.52
0.56
0.60
0.64
0.68
0.71
0.75
0.78
0.81
0.84
0.87
0.90
0.92
0.95
0.97
0.5
0.08
0.15
0.22
0.29
0.35
0.40
0.45
0.50
0.54
0.58
0.62
0.66
0.70
0.73
0.76
0.79
0.82
0.84
0.87
0.89
0.92
0.94
0.96
0.98
0.75
0.10
0.19
0.27
0.34
0.40
0.46
0.51
0.56
0.60
0.64
0.68
0.71
0.75
0.78
0.80
0.83
0.85
0.88
0.90
0.92
0.94
0.95
0.97
0.98
1
0.13
0.23
0.32
0.40
0.47
0.52
0.58
0.62
0.66
0.70
0.73
0.76
0.79
0.82
0.84
0.86
0.88
0.90
0.92
0.93
0.95
0.96
0.98
0.99
1.25
0.16
0.28
0.38
0.46
0.53
0.59
0.64
0.68
0.72
0.75
0.78
0.80
0.83
0.85
0.87
0.89
0.90
0.92
0.93
0.95
0.96
0.97
0.98
0.99
1.5
0.19
0.33
0.44
0.52
0.59
0.64
0.69
0.73
0.76
0.79
0.82
0.84
0.86
0.88
0.90
0.91
0.92
0.94
0.95
0.96
0.97
0.98
0.98
0.99
1.75
2
0.23
0.39
0.50
0.58
0.65
0.70
0.74
0.78
0.81
0.83
0.85
0.87
0.89
0.90
0.92
0.93
0.94
0.95
0.96
0.97
0.97
0.98
0.99
0.99
on the rows p and on the columns ; the entries in the table are the new values of p for a unitary variation of x
-2
Table 7.1

199
200
The logarithm case

When X1 is expressed as a logarithm, say 1 log(X1 ) appears in the model, if
X1 1.01X1 , that is X1 increases 1%, we have:
log(X1 ) log(1.01X1 ) = log(1.01) + log(X1 )
[' 0.01 + log(X1 )]
and
p
1p
p
1p

e
1 log(1.01)
p1
=
1 p1

p
p1
0.011
'
e
=
1p
1 p1
Exercise
Produce a table similar to Table 7.1 for the logarithm case. Comment the results.
7.3
Explaining Firms Credit Ratings (Section 7.2.1)
The parameter estimation of an ordered response model is considered and the results
are compared with those pertaining a Logit framework.
Data are available in the file credit.dta, which is in the Stata format and is available
in the compressed archive ch07.zip. To import data, we have first to invoke the
package foreign and next the command read.dta.
> library(foreign)
> credit <- read.dta(unzip("ch07.zip", "Chapter 7/credit.dta"))
The data base contains 921 observations for 2005 for US firms credit ratings, including
a set of firm characteristics. The data are taken from Compustat.
The following variables are available:
booklev ratio of book value of debt / assets
marklev ratio of market value of debt / assets
ebit earnings before interest and taxes / total assets
invgrade dummy, 1 if investment grade, 0 if speculative grade
logsales log firm sales
rating credit rating, from 1 (low) to 7 (high)
reta retained earnings / total assets
wka working capital / total assets
Some summary statistics are first obtained, see Verbeeks Table 7.4, which may be
reproduced with the following code:
> t(apply(credit, 2, function(x) c(mean = mean(x),
median = median(x), min = min(x), max = max(x))))
or
201
> t(sapply(credit, function(x) c(mean = mean(x), median = median(x),

min = min(x), max = max(x))))
mean
median
min
max
booklev 0.29318678 0.2639547 0.0000000 0.9992067
ebit
0.09389208 0.0904365 -0.3841692 0.6515085
invgrade 0.47231270 0.0000000 0.0000000 1.0000000
logsales 7.99575383 7.8841529 1.1002775 12.7014179
marklev 0.25472868 0.2111638 0.0000000 0.9648595
rating
3.49945711 3.0000000 1.0000000 7.0000000
reta
0.15699418 0.1804797 -0.9958922 0.9799219
wka
0.14041423 0.1228412 -0.4120839 0.7480223
A binary Logit model is then used to describe the probability to obtain an investment
grade rating, which occurs when the credit rating is 4 or more. To this aim we use
the function glm. The independent variables in the model are the book leverage and
ebit ratios, the log firm sales, the retained earnings over total assets and the working
capital over total assets. We are considering the binomial family with the "logit"
link function.
> logitfit <- glm(invgrade ~ booklev + ebit + logsales +
reta + wka, family = binomial(link = "logit"),
data = credit)
Use summary(logitfit) to produce the output of the estimation procedure.
The ordered logit model may be estimated by having recourse to the function polr
available in the package MASS.
polr fits a logistic or probit regression model to an ordered factor response. So we
have first to recode the variable rating as an ordered factor.
> credit$rating <- ordered(credit$rating)
The default logistic case is proportional odds logistic regression, after which the
function is named. It is however possible to change the link function by fixing the
argument method to one of the following options: "logistic", "probit", "cloglog",
"cauchit".
The main arguments of the function polr are thus:
a model formula with the same structure used for defining linear models, and
the method specification.
> library(MASS)
> orderedlogitfit <- polr(rating ~ booklev + ebit +
logsales + reta + wka, data = credit, method = "logistic")
The results can be resumed in a unique output with the function mtable, available
in the package memisc.
202
> library(memisc)
> mtable(logitfit, orderedlogitfit)
Calls:
logitfit: glm(formula = invgrade ~ booklev + ebit + logsales + reta +
wka, family = binomial(link = "logit"), data = credit)
orderedlogitfit: polr(formula = rating ~ booklev + ebit + logsales +
reta + wka, data = credit, method = "logistic")
=====================================================
logitfit
orderedlogitfit
----------------------------------------------------(Intercept)
-8.214***
(0.867)
booklev
-4.427***
-2.752***
(0.771)
(0.477)
ebit
4.355**
4.731***
(1.440)
(0.945)
logsales
1.082***
0.941***
(0.096)
(0.059)
reta
4.116***
3.560***
(0.489)
(0.302)
wka
-4.012***
-2.580***
(0.748)
(0.483)
1|2
-0.370
(0.633)
2|3
4.881***
(0.521)
3|4
7.626***
(0.551)
4|5
9.885***
(0.592)
5|6
12.883***
(0.673)
6|7
14.783***
(0.784)
----------------------------------------------------Aldrich-Nelson R-sq.
0.391
0.484
McFadden R-sq.
0.465
0.309
Cox-Snell R-sq.
0.474
0.608
Nagelkerke R-sq.
0.633
0.639
phi
1.000
Likelihood-ratio
591.796
862.873
p
0.000
0.000
Log-likelihood
-341.078
-965.307
Deviance
682.155
1930.614
AIC
694.155
1952.614
203
BIC
723.108
2005.694
N
921
921
=====================================================
Observe that the likelihood ratio test may also be performed by having recourse to
the function lrtest in the package lmtest.
Additional pseudo-R2 measures may be obtained with the function pR2 in the package
pscl.
> library(pscl)
> pR2(logitfit)
llh
llhNull
G2
McFadden
r2ML
-341.0775772 -636.9757787 591.7964028
0.4645360
0.4740549
r2CU
0.6327213
> pR2(orderedlogitfit)
llh
llhNull
G2
McFadden
-965.3071623 -1396.7436930
862.8730614
0.3088874
r2ML
r2CU
0.6081543
0.6389289
To compute the probabilities for the average firm to obtain an investment grade,
having a book leverage of .25 and .75, see Verbeek p. 225, we first obtain the linear
estimates x0i corresponding to the first and third quantiles of booklev by considering
the sample average levels for the other variables:
> xlevels <- apply(credit[, c("ebit", "logsales", "reta",
"wka")], 2, mean)
> avefit <- c(1, quantile(credit$booklev, 0.25), xlevels) %*%
logitfit$coef
> avefit1 <- c(1, quantile(credit$booklev, 0.75), xlevels) %*%
logitfit$coef
and then apply the logistic transformation exp(x0i )/(1 + exp(x0i )), namely:
P {yi 0|xi } = P {i 1 x0i |xi } = P {i 1 + x0i |xi } = F (1 + x0i ) =
=
exp(1 + x0i )
=
1 + exp(1 + x0i )
> 1/(1 + exp(-avefit))

[,1]
[1,] 0.5433236
> 1/(1 + exp(-avefit1))
[,1]
[1,] 0.3119182
1
1
exp(1 +x0i )
+1
1
exp(1 x0i ) + 1
204
For the ordered logit by using the logistic transformation:

P {yi 3 |xi } = P {i 3 x0i |xi } = P {i 3 + x0i |xi } = F (3 + x0i ) =
=
exp(3 + x0i )
=
1 + exp(3 + x0i )
1
1
exp(3 +x0i )
+1
1
exp(3 x0i ) + 1
we have
> avefit <- c(quantile(credit$booklev, 0.25), xlevels) %*%
orderedlogitfit$coef
> avefit1 <- c(quantile(credit$booklev, 0.75), xlevels) %*%
orderedlogitfit$coef
> 1/(1 + exp(7.626 - avefit))
[,1]
[1,] 0.5169951
> 1/(1 + exp(7.626 - avefit1))
[,1]
[1,] 0.3701199
According to both models the probability of obtaining an investment grade decreases
when the booking leverage increases.
7.4
Willingness to Pay for Natural Areas (Section 7.2.4)
As an illustration of the ordered probit model in presence of fixed thresholds Verbeek

considers the analysis of how the value of a good that is not traded may be determined
in public economics. Data are available in the file wtp.dat, a text file that may be
read with the command read.table.
> wtp <- read.table(unzip("ch07.zip", "Chapter 7/wtp.dat"),
header = TRUE)
The dataset wtp contains 312 observations with survey results concerning willingness
to pay for the preservation of the Alentejo Natural Park in Portugal. The survey was
held in 1997. The original amounts in escudos were transformed into euros by dividing
through 200. The following variables (some are dummy variables) are present:
BID1 initial bid, in euro
BIDH higher bid
BIDL lower bid
NN dummy, 1 if answers are no,no
NY dummy, 1 if no,yes
YN dummy, 1 if yes,no
YY dummy, 1 if yes,yes
DEPVAR dependent variable from 1 to 4 (NN,NY,YN,YY)
AGE age in 6 classes
FEMALE dummy, 1 if female
INCOME income in 8 classes
205
Two ordered probit models are considered for explaining the willingness to pay: the
first one including only an intercept, while the age class, the gender and the income
class are included as explanatory variables for the second model.
To obtain the maximum likelihood parameter estimates we have first to build the
corresponding likelihood functions, see Verbeeks relationships (7.33)-(7.34). Recall
from Chapter 6 that we have to define the opposite of the Log-likelihood function,
since internal optimization routines provide for minimization.
For both models we define a variable regr including the regressors (for the first
model only a vector of ones). In the former model the likelihood depends on b1 and
on sigma. In the latter one it depends also on b2, b3 and b4.
> llI <- function(b1, sigma) {
regr <- as.matrix(rep(1, nrow(wtp)))
s = sigma
-sum(wtp$NN * log(pnorm((wtp$BIDL - regr * b1)/s)) +
wtp$NY * log(pnorm((wtp$BID1 - regr * b1)/s) pnorm((wtp$BIDL - regr * b1)/s)) + wtp$YN *
log(pnorm((wtp$BIDH - regr * b1)/s) - pnorm((wtp$BID1 regr * b1)/s)) + wtp$YY * log(1 - pnorm((wtp$BIDH regr * b1)/s)))
}
> llII <- function(b1, b2, b3, b4, sigma) {
regr <- cbind(1, wtp$AGE, wtp$FEMALE, wtp$INCOME)
s <- sigma
b <- c(b1, b2, b3, b4)
-sum(wtp$NN * log(pnorm((wtp$BIDL - regr %*%
b)/s)) + wtp$NY * log(pnorm((wtp$BID1 - regr %*%
b)/s) - pnorm((wtp$BIDL - regr %*% b)/s)) +
wtp$YN * log(pnorm((wtp$BIDH - regr %*% b)/s) pnorm((wtp$BID1 - regr %*% b)/s)) + wtp$YY *
log(1 - pnorm((wtp$BIDH - regr %*% b)/s)))
}
We can now use the function mle2 in the package bbmle to obtain the parameter
estimates, having defined a list of starting values for the two models.
> library(bbmle)
> b.start <- list(b1 = 10, sigma = 15)
> out <- mle2(llI, start = b.start)
> summary(out)
206
Call:
mle2(minuslogl = llI, start = b.start)
Coefficients:
Pr(z)
b1
18.7391
2.4969 7.5049 6.148e-14 ***
sigma 38.6122
2.9332 13.1637 < 2.2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
-2 log L: 818.009
> b.start <- list(b1 = 1, b2 = 1, b3 = 1, b4 = 5, sigma = 30)
> out <- mle2(llII, start = b.start)
> summary(out)
Call:
mle2(minuslogl = llII, start = b.start)
Coefficients:
Pr(z)
b1
30.0730
8.2788 3.6325 0.0002806
b2
-6.9309
1.6656 -4.1613 3.164e-05
b3
-5.1561
4.7135 -1.0939 0.2739901
b4
4.8940
1.9114 2.5604 0.0104549
sigma 36.4774
2.7488 13.2701 < 2.2e-16
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*"
***
***
*
***
0.05 "." 0.1 " " 1
-2 log L: 782.8182
The parameter estimates for the second model and the asymptotic standard errors
for the parameter estimates in both models are similar to those obtained by Verbeek
but not equal. Let us check what happens by changing the starting values in the
minimization procedure. The function coef allows only the paramater estimates to
be extracted from the object resulting by applying mle2.
> b.start <- list(b1 =
> coef(mle2(llI, start
b1
sigma
18.73886 38.61274
b1
sigma
18.73940 38.61309
1, sigma = 15)
= b.start))
1, sigma = 20)
= b.start))
4, sigma = 50)
= b.start))
207
b1
sigma
18.73680 38.61727
> coef(mle2(llI, start = b.start))
b1
sigma
18.73239 38.61679
The minimization procedure applied to the first model seems to be quite robust with
respect to different sets of initial starting values.
For the second model we have:
> b.start <- list(b1 = 1, b2 = 1, b3 = 1, b4 = 6,
> coef(mle2(llII, start = b.start))
b1
b2
b3
b4
sigma
30.132584 -6.939768 -5.186392 4.887521 36.489069
> b.start <- list(b1 = 1, b2 = 1, b3 = 1, b4 = 5,
b1
b2
b3
b4
sigma
29.978269 -6.916038 -5.156463 4.908729 36.469106
> b.start <- list(b1 = 1, b2 = 2, b3 = 3, b4 = 4,
b1
b2
b3
b4
sigma
30.058567 -6.925511 -5.157301 4.893616 36.462669
> b.start <- list(b1 = 30, b2 = -7, b3 = 10, b4 =
sigma = 30)
b1
b2
b3
b4
sigma
30.110749 -6.933783 -5.184297 4.886962 36.481068
sigma = 30)
sigma = 14)
sigma = 14)
4,
The shape of the likelihood function seems to be too flat around its minimum; this
issue renders the optimization problem somewhat difficult and the estimation result
is unstable depending on initial starting values. To overcome this situation we have to
specify some parameters to control for the minimization algorithm; in particular we
fix the relative convergence tolerance to reltol = 1015 and the maximum number
of iterations to maxit = 10000. For the first model we have:
> b.start <- list(b1 = 1, sigma =
> coef(mle2(llI, start = b.start,
maxit = 10000)))
b1
sigma
18.73884 38.61273
> b.start <- list(b1 = 1, sigma =
> coef(mle2(llI, start = b.start,
maxit = 10000)))
b1
sigma
18.73884 38.61272
15)
control = list(reltol = 1e-15,
20)
control = list(reltol = 1e-15,
208

> coef(mle2(llI, start = b.start, control = list(reltol = 1e-15,
maxit = 10000)))
b1
sigma
18.73884 38.61272
> coef(mle2(llI, start = b.start, control = list(reltol = 1e-15,
maxit = 10000)))
b1
sigma
18.73884 38.61272
and for the second model we have:
> coef(mle2(llII, start = b.start, control = list(reltol = 1e-15,
maxit = 10000)))
b1
b2
b3
b4
sigma
30.109369 -6.933365 -5.184575 4.887612 36.477856
maxit = 10000)))
b1
b2
b3
b4
sigma
30.109394 -6.933368 -5.184583 4.887607 36.477853
maxit = 10000)))
b1
b2
b3
b4
sigma
30.109367 -6.933365 -5.184581 4.887613 36.477855
maxit = 10000)))
b1
b2
b3
b4
sigma
30.109366 -6.933364 -5.184582 4.887613 36.477855
> b.start <- list(b1 = 30, b2 = -7, b3 = 10, b4 = 4,
sigma = 30)
maxit = 10000)))
b1
b2
b3
b4
sigma
30.109386 -6.933367 -5.184587 4.887609 36.477855
> b.start <- list(b1 = 5, b2 = -20, b3 = 30, b4 = -10,
sigma = 60)
maxit = 10000)))
b1
b2
b3
b4
sigma
30.109372 -6.933365 -5.184584 4.887612 36.477854
209
The estimation results, though a bit different from those proposed by Verbeek for the
second model, do no more depend in a significant manner on initial starting values.
So we have the following final output:
> outi <- mle2(llI, start = b.start, control = list(reltol = 1e-15,
maxit = 10000))
> summary(outi)
Call:
mle2(minuslogl = llI, start = b.start, control = list(reltol = 1e-15,
maxit = 10000))
Coefficients:
Pr(z)
b1
18.7388
2.4970 7.5047 6.158e-14 ***
sigma 38.6127
2.9333 13.1635 < 2.2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
-2 log L: 818.009
> b.start <- list(b1 = 30, b2 = -7, b3 = 10, b4 = 4,
sigma = 30)
> outc <- mle2(llII, start = b.start, control = list(reltol = 1e-15,
maxit = 10000))
> summary(outc)
Call:
mle2(minuslogl = llII, start = b.start, control = list(reltol = 1e-15,
maxit = 10000))
Coefficients:
Pr(z)
b1
30.1094
8.2791 3.6368 0.000276
b2
-6.9334
1.6657 -4.1625 3.148e-05
b3
-5.1846
4.7138 -1.0999 0.271390
b4
4.8876
1.9113 2.5572 0.010552
sigma 36.4779
2.7489 13.2699 < 2.2e-16
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*"
***
***
*
***
0.05 "." 0.1 " " 1
-2 log L: 782.8181
From the model with only the intercept the proportion of population with a negative
willingness to pay is:
210
> mu <- coef(outi)["b1"]

> sigma <- coef(outi)["sigma"]
> (pneg <- pnorm(-mu/sigma))
b1
0.313731
Reinterpreting the latent variable as desired willingness to pay, that is assuming that
the actual willingness to pay is the maximum between 0 and the desired amount, we
have that the expected value for the truncated normal distribution is:
> (expectpos <- mu + sigma * dnorm(mu/sigma)/pnorm(mu/sigma))
b1
38.69165
Then the estimate for expected willingness to pay over the entire sample will be:
> (expect <- 0 * pneg + (1 - pneg) * expectpos)
b1
26.55288
with a total willingness to pay of about
> expect * 3 * 10^6
b1
79658625
With reference to the model with characteristics we have that the expected willingness
to pay for a male in income class 1, aged between 20 and 29 (1st age class) is:
> sum(coef(outc)[c("b1", "b2", "b4")])
[1] 28.06363
and taking into account the censoring we have
> mu <- sum(coef(outc)[c("b1", "b2", "b4")])
> sigma <- coef(outc)["sigma"]
> (pneg <- pnorm(-mu/sigma))
sigma
0.2208477
> (expectpos <- mu + sigma * dnorm(mu/sigma)/pnorm(mu/sigma))
sigma
41.95654
> (expect <- 0 * pneg + (1 - pneg) * expectpos)
sigma
32.69053
7.5
211
Patent and R&D Expenditures (Section 7.3.2)
As an illustration for models in presence of count data Verbeek considers the analysis
of the relationship between the number of patents obtained by some firms and their
Research and Development Expenditures.
Data are available in the file patents.dat, a text file that may be read with the
command read.table.
> patents <- read.table(unzip("ch07.zip", "Chapter 7/patents.dat"),
header = TRUE)
The file patents contain data on 181 international manufacturing firms, with their
R&D expenditures, number of patents, industry, etc. for 1990 and 1991. The following
variables are available:
P91 number of patents in 1991
P90 number of patents in 1990
LR91 log R&D expenditures in 1991
LR90 log R&D expenditures in 1990
AEROSP dummy for aerospace industry
CHEMIST dummy for chemistry industry
COMPUTER dummy for computer industry (hardware and software)
MACHINES dummy for machinery and instruments industry
VEHICLES dummy for motor vehicles industry
JAPAN dummy for Japan
US dummy for USA
The relationship between the number of patents (a count variable) and the
expenditures in Research and Development are first analyzed by means of a Poisson
regression model.
The maximum likelihood parameter estimates may be obtained by using the function
glm and specifying poisson as family and "log" as link function.
> poissonfit <- glm(P91 ~ LR91 + AEROSP + CHEMIST +
COMPUTER + MACHINES + VEHICLES + JAPAN + US,
family = poisson(link = "log"), data = patents)
> summary(poissonfit)
Call:
glm(formula = P91 ~ LR91 + AEROSP + CHEMIST + COMPUTER + MACHINES +
VEHICLES + JAPAN + US, family = poisson(link="log"),data=patents)
Deviance Residuals:
Min
1Q
Median
-27.979
-5.246
-1.572
3Q
2.352
Max
29.246
212
Coefficients:
0.065868 -13.27 < 2e-16 ***
LR91
0.854525
0.008387 101.89 < 2e-16 ***
AEROSP
-1.421850
0.095640 -14.87 < 2e-16 ***
CHEMIST
0.636267
0.025527
24.93 < 2e-16 ***
COMPUTER
0.595343
0.023338
25.51 < 2e-16 ***
MACHINES
0.688953
0.038346
17.97 < 2e-16 ***
VEHICLES
-1.529653
0.041864 -36.54 < 2e-16 ***
JAPAN
0.222222
0.027502
8.08 6.46e-16 ***
US
-0.299507
0.025300 -11.84 < 2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
(Dispersion parameter for poisson family taken to be 1)
AIC: 9919.6
on 180
on 172
degrees of freedom
degrees of freedom

Goodness of fit statistics may be produced by means of the function pR2 available in
the package pscl. The pseudo R2 in Verbeek Table 7.7 can be found here as McFadden
pseudo R2 ; while the likelihood ratio appears as G2, see Section 7.1.5.
> library(pscl)
> pR2(poissonfit)
llh
llhNull
-4.950789e+03 -1.524456e+04
r2ML
r2CU
1.000000e+00 1.000000e+00
G2
2.058754e+04
McFadden
6.752422e-01
The same results (except1 for the McFadden pseudo R2 ) together with the parameter
estimates may also be obtained by applying the function mtable of the package memisc
to the glm object poissonfit.
> library(memisc)
> mtable(poissonfit)
1 mtable
computes McFadden R2 as
1
deviance(model)
deviance(Saturated M odel)
and for the present case may be obtained with R as

1-poissonfit$deviance/poissonfit$null.deviance
The definition coincides with Verbeeks (and also McFaddens) only in presence of a Generalized
Linear Model with a dichotomous response (Bernoulli, that is Binomial with n = 1) and a logit or
probit link function. So it is better to make reference to the value returned by the function pR2.
213
Calls:
poissonfit: glm(formula = P91 ~ LR91 + AEROSP + CHEMIST + COMPUTER +
MACHINES + VEHICLES + JAPAN + US, family = poisson(link = "log"),
data = patents)
===============================
(Intercept)
-0.874***
(0.066)
LR91
0.855***
(0.008)
AEROSP
-1.422***
(0.096)
CHEMIST
0.636***
(0.026)
COMPUTER
0.595***
(0.023)
MACHINES
0.689***
(0.038)
VEHICLES
-1.530***
(0.042)
JAPAN
0.222***
(0.028)
US
-0.300***
(0.025)
------------------------------Aldrich-Nelson R-sq.
0.991
McFadden R-sq.
0.694
Cox-Snell R-sq.
1.000
Nagelkerke R-sq.
1.000
phi
1.000
Likelihood-ratio
20587.541
p
0.000
Log-likelihood
-4950.789
Deviance
9081.901
AIC
9919.578
BIC
9948.364
N
181
===============================
The function coeftest in the package lmtest produces a table of the coefficients with
their robust standard errors, the z statistics and the statistical significance. Robust
standard errors are obtained by using the function vcovHC, available in the package
sandwich, see Section 4.1.6.
> library(sandwich)
> library(lmtest)
> coeftest(poissonfit, vcovHC(poissonfit, type = "HC"))
214
z test of coefficients:
0.742962 -1.1760 0.239591
LR91
0.854525
0.093695 9.1203 < 2.2e-16 ***
AEROSP
-1.421850
0.380168 -3.7401 0.000184 ***
CHEMIST
0.636267
0.225359 2.8233 0.004753 **
COMPUTER
0.595343
0.300803 1.9792 0.047796 *
MACHINES
0.688953
0.414664 1.6615 0.096619 .
VEHICLES
-1.529653
0.280693 -5.4496 5.049e-08 ***
JAPAN
0.222222
0.352840 0.6298 0.528819
US
-0.299507
0.273621 -1.0946 0.273689
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Verbeek performs a Wald test to check for the joint effect of the explanatory variables
included in the model.
We have first to estimate the base glm model including only an intercept and
then call the function waldtest, available in the package lmtest. The first and
second arguments are respectively the baseline model and the complete one; with
the argument test it is possible to specify the type of test ("Chisq" or "F") to be
performed and with the argument vcov the covariance matrix, in the present case a
robust estimate of the covariance matrix.
> poissonfit0 <- glm(P91 ~ 1, family = poisson(link = "log"),
data = patents)
> lmtest::waldtest(poissonfit0, poissonfit, test = "Chisq",
vcov = vcovHC(poissonfit, type = "HC"))
Wald test
Model 1: P91 ~ 1
Model 2: P91 ~ LR91 + AEROSP + CHEMIST + COMPUTER + MACHINES +VEHICLES+
JAPAN + US
Res.Df Df Chisq Pr(>Chisq)
1
180
2
172 8 339.97 < 2.2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The Wald test rejects the hypothesis that the conditional mean is constant and
independent of the explanatory variables.
With regard to the interpretation of the coefficients Verbeek observes that b2 = 0.85,
pertaining the logarithm of the Research & Development expenditures, is to be
interpreted as an elasticity.
We obtain the percentage difference for each industry, ceteris paribus, in the number
of patents with respect to the reference industries (food, fuel, metal and others) by
transforming the parameters b3 , . . . , b9 as 100[exp(bi ) 1], i = 3, . . . , 9
> round(100 * (exp(poissonfit$coef[3:9]) - 1), 1)

AEROSP CHEMIST COMPUTER MACHINES VEHICLES
JAPAN
-75.9
88.9
81.4
99.2
-78.3
24.9
215
US
-25.9
The function dispersiontest is available in the package AER to test if we can expect a
dispersion parameter different from 1. The first argument is a Poisson model estimated
with glm. By default the argument alternative is set to "greater" thus testing for
a situation of overdispersion implying a value for the conditional variance larger than
the value of the conditional mean:
2 = dispersion ,
with dispersion > 1
> library(AER)
> dispersiontest(poissonfit)
Overdispersion test
data: poissonfit
z = 3.5861, p-value = 0.0001678
alternative hypothesis: true dispersion is greater than 1
sample estimates:
dispersion
57.36236
A sample estimate of the dispersion parameter is also given. By using the argument
trafo it is possible to specify to 1 or 2 the power k in the following expression
2 = + k
corresponding to the formulation of the variance in the Negative Binomial I and
II models that will be used below, see Kleiber and Zeileis (2008) for more detailed
information. The estimate of the dispersion parameter will be given for .
Note that for k = 1 we have dispersion = (1 + ).
> dispersiontest(poissonfit, trafo = 1)
Overdispersion test
data: poissonfit
z = 3.5861, p-value = 0.0001678
alternative hypothesis: true alpha is greater than 0
sample estimates:
alpha
56.36236
> dispersiontest(poissonfit, trafo = 2)
Overdispersion test
216
data: poissonfit
z = 3.8271, p-value = 6.482e-05
alternative hypothesis: true alpha is greater than 0
sample estimates:
alpha
0.4278121
To overcome the problem of overdispersion of the sigma parameter Verbeek considers
the estimation of the two models: NegBinI and NegBinII, both based on a
Negative Binomial distribution for the number of patents but depending on different
specifications for the conditional variance, see e.g. Johnson et. al. (2005) and Rigby
and Stasinopoulos (2009).
Observe that in the statistical literature the name of the distribution functions
pertaining these two models are exchanged. So the model NBI will be used to estimate
what in the econometric literature is known as NegBinII and vice versa; later on the
econometric convention is used for naming model objects.
Negative Binomial II - 1st estimation
The function glm provides also the family negative.binomial(1) that is used to
estimate the NegBinII model. The corresponding call and output are:
> NegBinII <- glm(P91 ~ LR91 + AEROSP + CHEMIST + COMPUTER +
MACHINES + VEHICLES + JAPAN + US, family = negative.binomial(1),
data = patents)
> summary(NegBinII)
Call:
glm(formula = P91 ~ LR91 + AEROSP + CHEMIST + COMPUTER + MACHINES +
VEHICLES + JAPAN + US, family = negative.binomial(1), data = patents)
Deviance Residuals:
Min
1Q
Median
-2.9205 -1.1676 -0.3058
3Q
0.3976
Max
2.9594
Coefficients:
0.55216 -0.590
LR91
0.83129
0.07861 10.575
AEROSP
-1.49826
0.37399 -4.006
CHEMIST
0.48779
0.26256
1.858
COMPUTER
-0.16953
0.27553 -0.615
MACHINES
0.05990
0.27957
0.214
VEHICLES
-1.53392
0.36098 -4.249
JAPAN
0.25361
0.40086
0.633
US
-0.58792
0.28228 -2.083
--Signif. codes: 0 "***" 0.001 "**" 0.01
Pr(>|t|)
0.5560
< 2e-16
9.17e-05
0.0649
0.5392
0.8306
3.51e-05
0.5278
0.0388
***
***
.
***
*
"*" 0.05 "." 0.1 " " 1
(Dispersion parameter for Negative Binomial(1) family taken to be 1.241)

AIC: 1663.7
on 180
on 172
217
degrees of freedom
degrees of freedom

> pR2(NegBinII)
llh
llhNull
G2
-822.8364665 -960.2437462 274.8145593
r2CU
0.7809380
McFadden
0.1430963
r2ML
0.7809187
Observe that the estimates and their standard errors differ somewhat from those
provided in Verbeeks Table 7.8, but we have to remind that glm uses an estimation
method based upon iteratively reweighted least squares and not on maximum
likelihood.
Verbeeks coefficient estimates2 for the NegBinII model may be reproduced by
applying the function glm.nb available in the package MASS.
Negative Binomial II - 2nd estimation
> library(MASS)
> NegBinII.glm.nb <- glm.nb(P91 ~ LR91 + AEROSP + CHEMIST +
data = patents)
> summary(NegBinII.glm.nb)
Call:
glm.nb(formula = P91 ~ LR91 + AEROSP + CHEMIST + COMPUTER + MACHINES +
VEHICLES + JAPAN + US, data = patents, init.theta = 0.7686768238,
link = log)
Deviance Residuals:
Min
1Q
Median
-2.6373 -1.0264 -0.2694
3Q
0.3438
Max
2.5966
Coefficients:
0.56234 -0.577
0.5638
LR91
0.83148
0.08006 10.386 < 2e-16 ***
AEROSP
-1.49746
0.37691 -3.973 7.10e-05 ***
CHEMIST
0.48861
0.26788
1.824
0.0682 .
COMPUTER
-0.17355
0.28086 -0.618
0.5366
MACHINES
0.05926
0.28429
0.208
0.8349
VEHICLES
-1.53065
0.36852 -4.153 3.27e-05 ***
JAPAN
0.25222
0.40983
0.615
0.5383
US
-0.59050
0.28834 -2.048
0.0406 *
2 Also
the procedures presented below are not based on maximum likelihood.
218
--Signif. codes:
0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
(Dispersion parameter for Negative Binomial(0.7687) family taken to be 1)

AIC: 1659.2
on 180
on 172
degrees of freedom
degrees of freedom
Theta:
Std. Err.:
2 x log-likelihood:
0.7687
0.0812
-1639.1910
The estimate for 2 may be retrieved as:

> 1/NegBinII.glm.nb$theta
[1] 1.300937
The package gamlss, (http://gamlss.org), provide tools for estimating Generalized
Additive Models for Location Scale and Shape of which the NegBinI and NegBinII
are special cases.
Negative Binomial I - 1st estimation
> library(gamlss)
> NegBinI <- gamlss(P91 ~ LR91 + AEROSP + CHEMIST +
~1, family = NBII, data = patents)
GAMLSS-RS iteration 1: Global Deviance = 1730.599
> summary(NegBinI)
*******************************************************************
Family: c("NBII", "Negative Binomial type II")
219
Call:
gamlss(formula = P91 ~ LR91 + AEROSP + CHEMIST + COMPUTER + MACHINES +
VEHICLES + JAPAN + US, sigma.formula = ~1, family = NBII,
data = patents)
Fitting method: RS()
------------------------------------------------------------------Mu link function: log
Mu Coefficients:
Pr(>|t|)
(Intercept)
0.6962
0.50450
1.3799 1.694e-01
LR91
0.5779
0.06712
8.6095 4.652e-15
AEROSP
-0.7873
0.33707 -2.3359 2.066e-02
CHEMIST
0.7321
0.18525
3.9518 1.133e-04
COMPUTER
0.1440
0.20644
0.6978 4.863e-01
MACHINES
0.1549
0.25490
0.6076 5.443e-01
VEHICLES
-0.8177
0.26869 -3.0433 2.709e-03
JAPAN
0.4012
0.25742
1.5584 1.210e-01
US
0.1581
0.19856
0.7964 4.269e-01
------------------------------------------------------------------Sigma link function: log
Sigma Coefficients:
Estimate Std. Error
t value
Pr(>|t|)
4.560e+00
1.470e-01
3.103e+01
3.854e-72
------------------------------------------------------------------No. of observations in the fit: 181
Degrees of Freedom for the fit: 10
Residual Deg. of Freedom: 171
at cycle: 13
Global Deviance:
1696.391
AIC:
1716.391
SBC:
1748.376
*******************************************************************
> exp(NegBinI$sigma.coef)
(Intercept)
95.56499
> pR2(NegBinI)
220

llh
llhNull
G2
McFadden
-848.1955684 -892.4697439
88.5483510
0.0496086
Negative Binomial II 3rd estimation
> NegBinII <- gamlss(P91 ~ LR91 + AEROSP + CHEMIST +
sigma.formula = ~1, family = NBI, data = patents)
> summary(NegBinII)
*******************************************************************
Family: c("NBI", "Negative Binomial type I")
Call:
gamlss(formula = P91 ~ LR91 + AEROSP + CHEMIST + COMPUTER + MACHINES +
VEHICLES + JAPAN + US, sigma.formula = ~1, family = NBI,
data = patents)
Fitting method: RS()
------------------------------------------------------------------Mu link function: log
Mu Coefficients:
Pr(>|t|)
0.49878 -0.6524 5.150e-01
LR91
0.83160
0.07681 10.8272 3.893e-21
AEROSP
-1.49745
0.37730 -3.9689 1.061e-04
CHEMIST
0.48875
0.25683
1.9030 5.872e-02
COMPUTER
-0.17365
0.29888 -0.5810 5.620e-01
MACHINES
0.05944
0.27936
0.2128 8.318e-01
VEHICLES
-1.53081
0.37397 -4.0934 6.542e-05
JAPAN
0.25251
0.42654
0.5920 5.546e-01
US
-0.59036
0.27883 -2.1173 3.568e-02
------------------------------------------------------------------Sigma link function: log
Sigma Coefficients:
Estimate Std. Error
0.26311
0.10568
t value
2.48971
221
Pr(>|t|)
0.01374
------------------------------------------------------------------No. of observations in the fit: 181

Degrees of Freedom for the fit: 10
Residual Deg. of Freedom: 171
at cycle: 3
Global Deviance:
1639.191
AIC:
1659.191
SBC:
1691.176
*******************************************************************
> exp(NegBinII$sigma.coef)
(Intercept)
1.300965
> pR2(NegBinII)
llh
llhNull
G2
McFadden
-819.59573868 -892.46969695 145.74791654
0.08165427
Parameter estimates and their standard errors are again close but not equal to the
values provided by Verbeek.
Negative Binomial I and II (maximum likelihood)
We now build the likelihood functions for the NegBinI and NegBinII models by having
recourse respectively to the distribution functions NBII and NBI that are available
in the package gamlss.dist.
> library(gamlss.dist)
> regr <- cbind(1, patents$LR91, patents$AEROSP, patents$CHEMIST,
patents$COMPUTER, patents$MACHINES, patents$VEHICLES,
patents$JAPAN, patents$US)
> llI <- function(b1, b2, b3, b4, b5, b6, b7, b8, b9,
d2) {
b <- c(b1, b2, b3, b4, b5, b6, b7, b8, b9)
mu <- exp(regr %*% b)
-sum(dNBII(patents$P91, mu = mu, sigma = d2,
log = TRUE))
}
> llII <- function(b1, b2, b3, b4, b5, b6, b7, b8,
b9, d2) {
b <- c(b1, b2, b3, b4, b5, b6, b7, b8, b9)
mu <- exp(regr %*% b)
-sum(dNBI(patents$P91, mu = mu, sigma = d2, log = TRUE))
}
222
We can use the function mle2 in the package bbmle to obtain the maximum likelihood
parameter estimates with their standard errors and significance information.
> library(bbmle)
> b.start <- as.list(c(NegBinI$mu.coef, exp(NegBinI$sigma.coef)))
> names(b.start) <- c(paste("b", 1:9, sep = ""), "d2")
> NegBinIout <- mle2(llI, start = b.start)
> summary(NegBinIout)
Call:
mle2(minuslogl = llI, start = b.start)
Coefficients:
Estimate Std. Error
b1 0.695898
0.507413
b2 0.577782
0.067676
b3 -0.786540
0.336884
b4 0.732261
0.185290
b5 0.144167
0.206440
b6 0.154857
0.255039
b7 -0.816659
0.268684
b8 0.400487
0.257415
b9 0.158445
0.198506
d2 95.564819 14.100465
z value
Pr(z)
1.3715
0.17023
8.5375 < 2.2e-16 ***
-2.3348
0.01956 *
3.9520 7.751e-05 ***
0.6984
0.48496
0.6072
0.54372
-3.0395
0.00237 **
1.5558
0.11976
0.7982
0.42476
6.7774 1.223e-11 ***
0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
-2 log L: 1696.391
By changing the relative tolerance value and incrementing the maximum number of
iterations we have:
> NegBinIout <- mle2(llI, start = b.start, control=list(reltol = 1e-15,
maxit = 50000))
> summary(NegBinIout)
Call:
mle2(minuslogl = llI, start = b.start, control = list(reltol = 1e-15,
maxit = 50000))
Coefficients:
Pr(z)
b1 0.690189
0.506968 1.3614 0.173385
b2 0.578394
0.067628 8.5525 < 2.2e-16 ***
b3 -0.786539
0.336789 -2.3354 0.019522 *
223
b4 0.733320
0.185161 3.9604 7.481e-05 ***
b5 0.144998
0.206314 0.7028 0.482179
b6 0.155770
0.254981 0.6109 0.541259
b7 -0.817559
0.268611 -3.0437 0.002337 **
b8 0.400543
0.257280 1.5568 0.119508
b9 0.158789
0.198397 0.8004 0.423502
d2 95.243438 14.006343 6.8000 1.046e-11 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
-2 log L: 1696.39
>
>
>
>
b.start<-as.list(c(round(NegBinI$mu.coef,2),exp(NegBinII$sigma.coef)))
b.start <- as.list(c(NegBinII.glm.nb$coef, 1/NegBinII.glm.nb$theta))
names(b.start) <- c(paste("b", 1:9, sep = ""), "d2")
NegBinIIout <- mle2(llII,start=b.start,control=list(reltol=1e-15,
maxit = 50000))
> summary(NegBinIIout)
Call:
mle2(minuslogl = llII, start = b.start, control = list(reltol = 1e-15,
maxit = 50000))
Coefficients:
Estimate Std. Error
b1 -0.324623
0.498168
b2 0.831479
0.076595
b3 -1.497458
0.377230
b4 0.488611
0.256769
b5 -0.173551
0.298809
b6 0.059264
0.279293
b7 -1.530649
0.373899
b8 0.252223
0.426426
b9 -0.590497
0.278778
d2 1.300937
0.137459
z value
-0.6516
10.8555
-3.9696
1.9029
-0.5808
0.2122
-4.0937
0.5915
-2.1182
9.4641
Pr(z)
0.51464
< 2.2e-16
7.199e-05
0.05705
0.56137
0.83196
4.245e-05
0.55420
0.03416
< 2.2e-16
***
***
.
***
*
***
0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
-2 log L: 1639.191
Parameter estimates and their standard errors are now closer to the values in
Verbeeks Table 7.8.
224
7.6
Expenditures on Alcohool and Tobacco (Part 1) (Section

7.4.3)
Data are available in the file TOBACCO.WF1, which is a work file of EViews.
The function unzip allows to extract the file from the compressed archive ch07.zip.
> library(hexView)
> at <- readEViews(unzip("ch07.zip", "Chapter 7/tobacco.wf1"))
The file tobacco contains information about 2724 Belgian households, taken from
the Belgian household budget survey of 1995/96. The data are kindly supplied by the
National Institute of Statistics (NIS), Belgium. The following variables are present
(some are dummy variables):
BLUECOL dummy, 1 if head is blue collar worker (1)
WHITECOL dummy, 1 if head is white collar worker (1)
FLANDERS dummy, 1 if living in Flanders (2)
WALLOON dummy, 1 if living in Wallonie (2)
NKIDS number of children > 2 years old
NKIDS2 number of children <= 2 years old
NADULTS number of adults in household
LNX log total expenditures
SHARE2 budgetshare tobacco
SHARE1 budgetshare alcohol
NADLNX nadults x lnx
AGELNX age x lnx
AGE age in brackets (0-4)
D1 dummy, 1 if share1>0
D2 dummy, 1 if share2>0
W1 budgetshare alcohol ,if >0, missing otherwise
W2 budgetshare tobacco ,if >0, missing otherwise
LNX2 lnx squared
AGE2 age squared
The shares of families having zero expenditures on alcohol and tobaccoes may be
determined as:
> sum(at$SHARE1 == 0)/length(at$SHARE1)
[1] 0.171072
225
> sum(at$SHARE2 == 0)/length(at$SHARE2)

[1] 0.6196769
and the average budget shares over the corresponding subsamples are
> mean(at$SHARE1[at$SHARE1 > 0])
[1] 0.0215073
> mean(at$SHARE2[at$SHARE2 > 0])
[1] 0.03219081
The Engel curves are estimated separately for alcohol and tobacco expenditures3 .
To estimate a Tobit model, see Verbeeks Table 7.9, we can have recourse to the
tobit function available in the package AER, see Kleiber and Zeileis (2008). The first
argument is the model specification and the second one the data.frame containing
the variables in the model. To obtain the estimation output apply summary to the
tobit object obtained with the function tobit.
Observe that the tobit function returns the standard error and the z statistic for the
estimate of the logarithm of the scale parameter . In the output the estimate of the
scale parameter is also given. The Wald statistic is approximately equal to the one
provided by Verbeek.
> library(AER)
> at7.9a <- tobit(SHARE1 ~ AGE + NADULTS + NKIDS +
NKIDS2 + LNX + AGE:LNX + NADULTS:LNX, data = at)
> summary(at7.9a)
Call:
tobit(formula = SHARE1 ~ AGE + NADULTS + NKIDS + NKIDS2 + LNX +
AGE:LNX + NADULTS:LNX, data = at)
Observations:
Total
2724
Left-censored
466
Uncensored Right-censored
2258
0
Coefficients:
(Intercept)
AGE
NADULTS
NKIDS
NKIDS2
LNX
AGE:LNX
NADULTS:LNX
Log(scale)

-0.1591526 0.0437780
-3.635
0.0134936 0.0108824
1.240
0.0291900 0.0169468
1.722
-0.0026408 0.0006049
-4.366
-0.0038789 0.0023835
-1.627
0.0126678 0.0032156
3.939
-0.0008093 0.0008006
-1.011
-0.0022484 0.0012232
-1.838
-3.7123577 0.0153342 -242.096
3 Since AGE is in brackets, we should transform it in a factor:

at$AGE<-as.factor(at$AGE)
Pr(>|z|)
0.000277
0.214993
0.084988
1.27e-05
0.103652
8.17e-05
0.312072
0.066051
< 2e-16
***
.
***
***
.
***
226
--Signif. codes:
0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Scale: 0.02442
Gaussian distribution
Number of Newton-Raphson Iterations: 3
Log-likelihood: 4755 on 9 Df
Wald-statistic: 118.8 on 7 Df, p-value: < 2.22e-16
> at7.9t <- tobit(SHARE2 ~ AGE + NADULTS + NKIDS +
NKIDS2 + LNX + AGE:LNX + NADULTS:LNX, data = at)
> summary(at7.9t)
Call:
tobit(formula = SHARE2 ~ AGE + NADULTS + NKIDS + NKIDS2 + LNX +
AGE:LNX + NADULTS:LNX, data = at)
Observations:
Total
2724
Left-censored
1688
Uncensored Right-censored
1036
0
Coefficients:
(Intercept) 0.5899802 0.0934269
6.315 2.70e-10 ***
AGE
-0.1258530 0.0241783
-5.205 1.94e-07 ***
NADULTS
0.0153697 0.0380475
0.404 0.68624
NKIDS
0.0042697 0.0013247
3.223 0.00127 **
NKIDS2
-0.0099719 0.0054713
-1.823 0.06837 .
LNX
-0.0444314 0.0068893
-6.449 1.12e-10 ***
AGE:LNX
0.0088221 0.0017832
4.947 7.52e-07 ***
NADULTS:LNX -0.0006007 0.0027501
-0.218 0.82709
Log(scale) -3.0366568 0.0246517 -123.183 < 2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Scale: 0.048
Gaussian distribution
Number of Newton-Raphson Iterations: 3
Log-likelihood: 758.7 on 9 Df
Wald-statistic: 171.3 on 7 Df, p-value: < 2.22e-16
To obtain the total expenditure elasticities evaluated at the sample averages of
those households that have positive expenditures, we have first to adapt Verbeeks
relationship (7.72) to the present context. The budget share for good j
wj =
pj qj
x
227
is modeled according to the following relationship:

wj (x) = 1j + 2j AGE + 3j NADULTS + 4j NKIDS + 5j NKIDS2 + 6j ln(x)
+ 7j AGE ln(x) + 8j NADULTS ln(x)
From the budget share we have:

qj = x
wj
wj (x)
=x
pj
pj
Deriving with respect to x we obtain:

wj (x)
dqj
1 dwj (x)
=
+x
dx
pj
pj dx

wj
1
1
x
1
=
+
6j + 7j AGE + 8j NADULTS
pj
pj
x
x
x
=
wj
1
+ (6j + 7j AGE + 8j NADULTS)
pj
pj
The elasticity follows multiplying by (x/qj )

ej =
dqj x
wj x
1 x
=
+
(6j + 7j AGE + 8j NADULTS)
dx qj
pj qj
pj qj
=1+
6j + 7j AGE + 8j NADULTS
.
wj
Let averages be the vector containing the average levels of the independent variables
in the model we have estimated for alcohool
> (averages <- apply(model.matrix(at7.9a)[at$SHARE1 >
0, ], 2, mean))
(Intercept)
AGE
NADULTS
NKIDS
NKIDS2
1.00000000 2.43489814 2.00177148 0.56864482 0.04384411
LNX
AGE:LNX NADULTS:LNX
13.77622903 33.42541096 27.74124174
The function model.matrix extracts the matrix of the regressors (including the
constant) of model at7.9a.
[at$SHARE1>0,] selects the observations with a positive expenditure in alcohol.
The column averages are obtained by means of the function apply(,2,mean).
The elements 2 and 3 of averages correspond respectively to the averages of the
variables AGE and NADULTS and coef(at7.9a)[6:8] are the 6j , 7j and 8j coefficient
estimates for alcohool.
The elasticity results:
228
> w_j <- mean(at$SHARE1[at$SHARE1 > 0])

> 1 + sum(c(1, averages[2:3]) * coef(at7.9a)[6:8])/w_j
[1] 1.288113
For tobacco we have:
> averages <- apply(model.matrix(at7.9t)[at$SHARE2 >
0, ], 2, mean)
> 1 + sum(c(1, averages[2:3]) * coef(at7.9t)[6:8])/w_j
[1] 0.1795591
7.7
Expenditures on Alcohool and Tobacco (Part 2) (Section

7.5.4)
For data reading and description see the preceding Section 7.6.
Verbeek suggests to estimate the Engel curve only for statistical units who have a
positive budget share, by means of OLS4 .
To this aim we can use the function lm by using the argument subset.
> at7.10a <- lm(SHARE1 ~ AGE + NADULTS + NKIDS + NKIDS2 +
LNX + AGE:LNX + NADULTS:LNX, data = at, sub = which(SHARE1 >
0))
> at7.10t <- lm(SHARE2 ~ AGE + NADULTS + NKIDS + NKIDS2 +
LNX + AGE:LNX + NADULTS:LNX, data = at, sub = which(SHARE2 >
0))
> library(memisc)
> mtable(at7.10a, at7.10t)
Calls:
at7.10a: lm(formula = SHARE1 ~ AGE + NADULTS + NKIDS + NKIDS2 + LNX +
AGE:LNX + NADULTS:LNX, data = at, subset = which(SHARE1 >
0))
at7.10t: lm(formula = SHARE2 ~ AGE + NADULTS + NKIDS + NKIDS2 + LNX +
AGE:LNX + NADULTS:LNX, data = at, subset = which(SHARE2 >
0))
=====================================
at7.10a
at7.10t
------------------------------------(Intercept)
0.053
0.490***
(0.044)
(0.074)
AGE
0.008
-0.031
(0.011)
(0.021)
4 As observed above since AGE is in brackets, we should transform it in a factor:
at$AGE<-as.factor(at$AGE)
229
NADULTS
-0.013
-0.013
(0.016)
(0.032)
NKIDS
-0.002***
0.001
(0.001)
(0.001)
NKIDS2
-0.002
-0.003
(0.002)
(0.005)
LNX
-0.002
-0.034***
(0.003)
(0.005)
AGE x LNX
-0.000
0.002
(0.001)
(0.002)
NADULTS x LNX
0.001
0.001
(0.001)
(0.002)
------------------------------------R-squared
0.051
0.154
adj. R-squared
0.048
0.148
sigma
0.022
0.029
F
17.270
26.732
p
0.000
0.000
Log-likelihood
5467.424 2200.044
Deviance
1.043
0.868
AIC
-10916.849 -4382.088
BIC
-10865.348 -4337.599
N
2258
1036
=====================================
Detailed results about single models can be obtained with summary(at7.10a) and
summary(at7.10t).
To obtain the total expenditure elasticities evaluated at the sample averages of those
households that have positive expenditures, we have to follow a procedure similar to
that presented above.
> averages <- apply(model.matrix(at7.10a), 2, mean)
[1] 0.922836
> averages <- apply(model.matrix(at7.10t), 2, mean)
[1] 0.1765833
Verbeek suggests then the estimation of two probit models.
> at7.11a <- glm(sign(SHARE1) ~ AGE + NADULTS + NKIDS +
NKIDS2 + LNX + AGE:LNX + NADULTS:LNX + BLUECOL +
WHITECOL, family = binomial(link = probit), data = at)
> at7.11t <- glm(sign(SHARE2) ~ AGE + NADULTS + NKIDS +
NKIDS2 + LNX + AGE:LNX + NADULTS:LNX + BLUECOL +
230
WHITECOL, family = binomial(link = probit), data = at)

> library(memisc)
> mtable(at7.11a, at7.11t)
Calls:
at7.11a: glm(formula = sign(SHARE1) ~ AGE + NADULTS + NKIDS + NKIDS2 +
LNX+AGE:LNX+NADULTS:LNX+BLUECOL+WHITECOL,family=binomial(link=probit),
data = at)
at7.11t: glm(formula = sign(SHARE2) ~ AGE + NADULTS + NKIDS + NKIDS2 +
LNX+AGE:LNX+NADULTS:LNX+BLUECOL+WHITECOL,family=binomial(link=probit),
data = at)
===========================================
at7.11a
at7.11t
------------------------------------------(Intercept)
-15.882*** 8.244***
(2.570)
(2.213)
AGE
0.668
-2.483***
(0.652)
(0.560)
NADULTS
2.255*
0.485
(1.020)
(0.872)
NKIDS
-0.077*
0.081**
(0.037)
(0.031)
NKIDS2
-0.186
-0.212
(0.140)
(0.123)
LNX
1.236*** -0.632***
(0.191)
(0.163)
BLUECOL
-0.061
0.206*
(0.098)
(0.084)
WHITECOL
0.051
0.022
(0.085)
(0.070)
AGE x LNX
-0.045
0.175***
(0.049)
(0.041)
NADULTS x LNX
-0.169*
-0.025
(0.074)
(0.063)
------------------------------------------Aldrich-Nelson R-sq.
0.060
0.038
McFadden R-sq.
0.069
0.030
Cox-Snell R-sq.
0.062
0.039
Nagelkerke R-sq.
0.103
0.053
phi
1.000
1.000
Likelihood-ratio
173.176
108.910
p
0.000
0.000
Log-likelihood
-1159.865 -1754.886
Deviance
2319.730
3509.772
AIC
2339.730
3529.772
BIC
2398.828
3588.871
231
N
2724
2724
===========================================
Detailed results pertaining single models can be obtained with summary(at7.11a)
and summary(at7.11t) and the goodness of fit statistics with pR2(at7.11a) and
pR2(at7.11t).
In presence of an household consisting of two adults, the head being a 35-year-old
(belonging to the second AGE class) blue-collar worker and two children older than
2, with total expenditures equal to the overall sample average, the implied estimated
probabilities of a positive budget share of alcohol and tobacco are respectively
(assuming the total expenditures to increase 10%).
> pnorm(predict(at7.11a, data.frame(AGE = 2, NADULTS = 2,
NKIDS = 2, NKIDS2 = 0, LNX = mean(at$LNX), BLUECOL = 1,
WHITECOL = 0)))
1
0.800741
> pnorm(predict(at7.11a, data.frame(AGE = 2, NADULTS = 2,
NKIDS = 2, NKIDS2 = 0, LNX = mean(at$LNX) + log(1.1),
BLUECOL = 1, WHITECOL = 0)))
1
0.8215568
> pnorm(predict(at7.11t, data.frame(AGE = 2, NADULTS = 2,
NKIDS = 2, NKIDS2 = 0, LNX = mean(at$LNX), BLUECOL = 1,
WHITECOL = 0)))
1
0.5171916
> pnorm(predict(at7.11t, data.frame(AGE = 2, NADULTS = 2,
NKIDS = 2, NKIDS2 = 0, LNX = mean(at$LNX) + log(1.1),
BLUECOL = 1, WHITECOL = 0)))
1
0.5045241
The estimated probabilities for alcohoolic beverages are slightly different from those
by Verbeek.
Observe that the standard errors of the parameter estimates obtained with glm
are somewhat different from those obtained by Verbeek since glm uses iteratively
reweighted least squares and not maximum likelihood as estimation method. The
probit function in the package sampleSelection provides maximum likelihood
estimates and standard errors.
> library(sampleSelection)
> at7.11aa <- probit(sign(SHARE1) ~ AGE + NADULTS +
NKIDS + NKIDS2 + LNX + AGE:LNX + NADULTS:LNX +
BLUECOL + WHITECOL, data = at)
> at7.11tt <- probit(sign(SHARE2) ~ AGE + NADULTS +
232
NKIDS + NKIDS2 + LNX + AGE:LNX + NADULTS:LNX +

BLUECOL + WHITECOL, data = at)
> summary(at7.11aa)
-------------------------------------------Probit binary choice model/Maximum Likelihood estimation
Newton-Raphson maximisation, 5 iterations
Return code 1: gradient close to zero
Log-Likelihood: -1159.865
Model: Y == '1' in contrary to '0'
2724 observations (466 'negative' and 2258 'positive')
and 10 free parameters (df = 2714)
Estimates:
Estimate Std. error t value
Pr(> t)
2.573933 -6.1704 6.810e-10 ***
AGE
0.667851
0.652002 1.0243
0.30569
NADULTS
2.255389
1.024527 2.2014
0.02771 *
NKIDS
-0.077049
0.037246 -2.0686
0.03858 *
NKIDS2
-0.185719
0.140827 -1.3188
0.18725
LNX
1.235531
0.191295 6.4588 1.056e-10 ***
BLUECOL
-0.061166
0.097771 -0.6256
0.53157
WHITECOL
0.050564
0.084713 0.5969
0.55058
AGE:LNX
-0.044801
0.048540 -0.9230
0.35602
NADULTS:LNX -0.168792
0.074232 -2.2738
0.02298 *
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Significance test:
chi2(9) = 173.1762 (p=1.344063e-32)
-------------------------------------------> summary(at7.11tt)
-------------------------------------------Probit binary choice model/Maximum Likelihood estimation
Newton-Raphson maximisation, 4 iterations
Return code 1: gradient close to zero
Log-Likelihood: -1754.886
Model: Y == '1' in contrary to '0'
2724 observations (1688 'negative' and 1036 'positive')
and 10 free parameters (df = 2714)
Estimates:
Estimate Std. error t value
Pr(> t)
2.211077 3.7287 0.0001925 ***
AGE
-2.483000
0.559601 -4.4371 9.118e-06 ***
NADULTS
0.485200
0.871738 0.5566 0.5778085
NKIDS
0.081283
0.030830 2.6365 0.0083766 **
NKIDS2
-0.211662
0.123050 -1.7201 0.0854076 .
LNX
-0.632080
0.163195 -3.8731 0.0001074 ***
BLUECOL
0.206420
0.083432 2.4741 0.0133566 *
233
WHITECOL
0.021534
0.069428 0.3102 0.7564324
AGE:LNX
0.174736
0.041305 4.2303 2.334e-05 ***
0.062923 -0.4027 0.6871532
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Significance test:
chi2(9) = 108.9097 (p=2.450886e-19)
-------------------------------------------Note that the functions mtable and predict cannot (at the moment) be applied to
objects resulting from sampleSelection::probit.
The Engel curves are finally re-estimated by Verbeek with the two-step estimation
procedure proposed by Heckmann. The function selection, available in the package
sampleSelection can be used. The function selection depends on 4 main
arguments: selection: a formula specifying the (probit) selection model; outcome
a formula relating the outcome to its explanatory variables; data, a data.frame
containing the data to analyze; the method specifying the estimation method, in our
case "2step".
> library(sampleSelection)
> at7.12a <- selection(selection = sign(SHARE1) ~ AGE +
NADULTS + NKIDS + NKIDS2 + LNX + AGE:LNX + NADULTS:LNX +
BLUECOL + WHITECOL, outcome = SHARE1 ~ AGE +
NADULTS + NKIDS + NKIDS2 + LNX + AGE:LNX + NADULTS:LNX,
data = at, method = "2step")
> at7.12t <- selection(selection = sign(SHARE2) ~ AGE +
NADULTS + NKIDS + NKIDS2 + LNX + AGE:LNX + NADULTS:LNX +
BLUECOL + WHITECOL, outcome = SHARE2 ~ AGE +
NADULTS + NKIDS + NKIDS2 + LNX + AGE:LNX + NADULTS:LNX,
data = at, method = "2step")
> summary(at7.12a)
-------------------------------------------Tobit 2 model (sample selection model)
2-step Heckman / heckit estimation
2724 observations (466 censored and 2258 observed)
21 free parameters (df = 2704)
Probit selection equation:
2.57393 -6.170 7.83e-10 ***
AGE
0.66785
0.65200
1.024
0.3058
NADULTS
2.25539
1.02453
2.201
0.0278 *
NKIDS
-0.07705
0.03725 -2.069
0.0387 *
NKIDS2
-0.18572
0.14083 -1.319
0.1874
LNX
1.23553
0.19130
6.459 1.25e-10 ***
BLUECOL
-0.06117
0.09777 -0.626
0.5316
WHITECOL
0.05056
0.08471
0.597
0.5506
234
AGE:LNX
-0.04480
0.04854 -0.923
0.3561
0.07423 -2.274
0.0231 *
Outcome equation:
(Intercept) 0.0542675 0.1329935
0.408 0.68327
AGE
0.0077095 0.0130468
0.591 0.55463
NADULTS
-0.0133444 0.0247045 -0.540 0.58913
NKIDS
-0.0020244 0.0007637 -2.651 0.00808 **
NKIDS2
-0.0024127 0.0025715 -0.938 0.34821
LNX
-0.0024288 0.0093674 -0.259 0.79544
AGE:LNX
-0.0004044 0.0009420 -0.429 0.66773
NADULTS:LNX 0.0008461 0.0018047
0.469 0.63922
Multiple R-Squared:0.051,
Adjusted R-Squared:0.0476
Error terms:
invMillsRatio -0.0002045 0.0165285 -0.012
0.99
sigma
0.0214876
NA
NA
NA
rho
-0.0095160
NA
NA
NA
-------------------------------------------> summary(at7.12t)
-------------------------------------------Tobit 2 model (sample selection model)
2-step Heckman / heckit estimation
2724 observations (1688 censored and 1036 observed)
21 free parameters (df = 2704)
(Intercept) 8.24447
2.21108
3.729 0.000196 ***
AGE
-2.48300
0.55960 -4.437 9.48e-06 ***
NADULTS
0.48520
0.87174
0.557 0.577855
NKIDS
0.08128
0.03083
2.637 0.008425 **
NKIDS2
-0.21166
0.12305 -1.720 0.085522 .
LNX
-0.63208
0.16320 -3.873 0.000110 ***
BLUECOL
0.20642
0.08343
2.474 0.013418 *
WHITECOL
0.02153
0.06943
0.310 0.756456
AGE:LNX
0.17474
0.04131
4.230 2.41e-05 ***
0.06292 -0.403 0.687185
Outcome equation:
(Intercept) 0.4515813 0.1086284
4.157 3.32e-05 ***
AGE
-0.0172991 0.0358591 -0.482 0.629547
NADULTS
-0.0174378 0.0339635 -0.513 0.607693
NKIDS
0.0007643 0.0015130
0.505 0.613471
NKIDS2
-0.0020755 0.0053883 -0.385 0.700128
LNX
-0.0301094 0.0090459 -3.329 0.000885 ***
AGE:LNX
0.0012243 0.0025454
0.481 0.630568
235
NADULTS:LNX 0.0013650 0.0024238

0.563 0.573371
Multiple R-Squared:0.1542,
Adjusted R-Squared:0.1476
Error terms:
invMillsRatio -0.009018
0.018589 -0.485
0.628
sigma
0.029881
NA
NA
NA
rho
-0.301793
NA
NA
NA
-------------------------------------------To obtain the total expenditure elasticities evaluated at the sample averages of those
households that have positive expenditures, we have to follow the same procedure
adopted for the preceding models.
> averages <- apply(model.matrix(at7.12a), 2, mean,
na.rm = TRUE)
[1] 0.9200393
> averages <- apply(model.matrix(at7.12t), 2, mean,
na.rm = TRUE)
[1] 0.2334157
Robust Heckitt estimates can be obtained by having recourse to the function
heckitrob available in the package ssmrob
> library(ssmrob)
> with(at, {
at7.12arob <- heckitrob(selection = sign(SHARE1) ~
AGE + NADULTS + NKIDS + NKIDS2 + LNX + AGE:LNX +
NADULTS:LNX + BLUECOL + WHITECOL, outcome = SHARE1 ~
NADULTS:LNX,
control = heckitrob.control(weights.x1 = "robCov"))
summary(at7.12arob)
})
------------------------------------------------------------Robust 2-step Heckman / heckit M-estimation
Estimate Std.Error t-value p-value
XS(Intercept) -15.95924691 2.68714692 -5.9390 2.87e-09 ***
XSAGE
0.69340378 0.68917714 1.0060 3.14e-01
XSNADULTS
2.28871511 1.06754439 2.1440 3.20e-02
*
XSNKIDS
-0.09667342 0.03852380 -2.5090 1.21e-02
*
XSNKIDS2
-0.22720078 0.14400781 -1.5780 1.15e-01
XSLNX
1.24363619 0.19999064 6.2180 5.02e-10 ***
236
XSBLUECOL
-0.06540005 0.10183231 -0.6422 5.21e-01
XSWHITECOL
0.01981345 0.08890862 0.2229 8.24e-01
XSAGE:LNX
-0.04739224 0.05139110 -0.9222 3.56e-01
XSNADULTS:LNX -0.17079467 0.07741752 -2.2060 2.74e-02
*
Outcome equation:
Estimate
Std.Error t-value p-value
XO(Intercept) 0.0481414810 0.506033190 0.09514
0.924
XOAGE
-0.0056858851 0.030390752 -0.18710
0.852
XONADULTS
-0.0044571146 0.075028273 -0.05941
0.953
XONKIDS
-0.0013210916 0.002505419 -0.52730
0.598
XONKIDS2
-0.0018052433 0.005900382 -0.30600
0.760
XOLNX
-0.0022514209 0.035468795 -0.06348
0.949
XOAGE:LNX
0.0005240257 0.002130696 0.24590
0.806
XONADULTS:LNX 0.0002636993 0.005524299 0.04773
0.962
imrData$IMR1 -0.0011748054 0.067667883 -0.01736
0.986
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
------------------------------------------------------------> with(at, {
at7.12trob <- heckitrob(selection = sign(SHARE2) ~
NADULTS:LNX + BLUECOL + WHITECOL, outcome = SHARE2 ~
NADULTS:LNX,
control = heckitrob.control(weights.x1 = "robCov"))
summary(at7.12trob)
})
------------------------------------------------------------Robust 2-step Heckman / heckit M-estimation
Estimate Std.Error t-value p-value
XS(Intercept) 8.28126218 2.22147554 3.7280 1.93e-04 ***
XSAGE
-2.48452150 0.56216375 -4.4200 9.89e-06 ***
XSNADULTS
0.46546383 0.87382559 0.5327 5.94e-01
XSNKIDS
0.08116955 0.03087415 2.6290 8.56e-03 **
XSNKIDS2
-0.19818390 0.12341443 -1.6060 1.08e-01
XSLNX
-0.63408431 0.16394126 -3.8680 1.10e-04 ***
XSBLUECOL
0.20400053 0.08368888 2.4380 1.48e-02
*
XSWHITECOL
0.01799770 0.06979220 0.2579 7.97e-01
XSAGE:LNX
0.17500330 0.04148927 4.2180 2.46e-05 ***
XSNADULTS:LNX -0.02425829 0.06306736 -0.3846 7.01e-01
Outcome equation:
Estimate
Std.Error t-value p-value
XO(Intercept) 0.3979558830 0.162089777 2.4550 0.0141 *
XOAGE
-0.0323871073 0.060120904 -0.5387 0.5900
XONADULTS
-0.0117157109 0.036582449 -0.3203 0.7490
XONKIDS
0.0003799716 0.002269950 0.1674 0.8670
XONKIDS2
-0.0024720481 0.006241712 -0.3961 0.6920
XOLNX
-0.0264461249 0.014363553 -1.8410 0.0656 .
XOAGE:LNX
0.0022926188 0.004209442 0.5446 0.5860
XONADULTS:LNX 0.0009147453 0.002477952 0.3692 0.7120
imrData$IMR1 -0.0074295332 0.035604993 -0.2087 0.8350
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
-------------------------------------------------------------
237
8
Univariate Time Series Models
8.1
Some examples of stochastic processes
We first consider the behaviour of some weakly stationary stochastic processes by

simulating realizations from a Gaussian White Noise, some autoregressive and some
moving average processes.1
8.1.1
The Gaussian White Noise
The Gaussian White noise consists of a sequence of identically and independently

distributed (i.i.d.) Normal random variables with mean 0 and the same variance
i.i.d.
t N ( = 0, 2 ).
To create a time series E with 100 Normal pseudo-random numbers with mean = 0
and variance 2 = 3 use the code
> E <- ts(data = rnorm(n = 500, mean = 0, sd = 3^0.5))
The function ts is used to create time series objects; its main arguments are the
data consisting in a numeric vector or matrix, start defining the time of the first
observation, which can be a single number or a vector of two integers (a natural
time unit and the number of samples into the time unit, e.g. c(2012,2) for 2012
February in a monthly series), end defining the time of the last observation, specified
in the same way as start, and one of the two options: frequency, the number of
observations per unit of time, or deltat, the fraction of the sampling period between
successive observations; e.g., 1/12 for monthly data.
By default both frequency and deltat are set to 1.
The graphical representation of the stochastic process {t } shows no regular
pattern. See Figure 8.1 that can be obtained with the code:
> plot(E)
Observe that {t } is a White Noise process when it consists of a sequence of
uncorrelated random variables with mean 0 and the same variance.
The Gaussian White Noise is an example of White noise.
1 We address readers interested in the financial implementations of stochastic processes to the
R-metrics site (https://www.rmetrics.org). The e-book by W
urtz et. al. (2009) is a good reference
presenting an overall introduction to the type of classes used by R for dealing with time series.
240
100
200
300
400
500
Time
Figure 8.1
8.1.2
Graphical representation of a Gaussian White Noise process
The Autoregressive Process
Simulation of a realization from an AR(1) Process

Lets consider the following stochastic finite difference equation2 , where {t } is a
White Noise process and |1 | < 1
yt = 1 yt1 + t .
(8.1)
If we assume {t } assigned, the unique, asymptotic, weakly stationary solution of (8.1)

is a stochastic process {yt } named AutoRegressive process of order 1 and denoted with
AR(1).
To simulate a realization y from an AR(1) stochastic process we need the
corresponding realization E of a White Noise process {t } and a first initial condition
2 Verbeeks notation is used: {y } is a 0 mean stochastic process, possibly obtained from {Y },
t
t
stochastic process whose components have common mean , as yt = Yt .
241
for {yt }: y0 the value of yt at t = 0

y[1] = 1 y0 + E[1]
y[2] = 1 y[1] + E[2]
(8.2)
...
y[n] = 1 y[n-1] + E[n]
Initial data need to be dropped to make their memory effect vanish. So we simulate
a realization longer than n.
> theta1 <- 0.9
> n <- 500
> E <- ts(rnorm(n + 100, mean = 0, sd = 1))
We can now define the initial condition y0 for {yt }, initialize a variable y with the
same length as E and obtain y[1]. Relationships (8.2) are then implemented by means
of a for cycle.
>
>
>
>
y0 <- 0
y <- E * 0
y[1] <- theta1 * y0 + E[1]
for (t in 2:(n + 100)) {
y[t] <- theta1 * y[t - 1] + E[t]
}
> y <- y[(length(y) - n + 1):length(y)]
Observe that if, as we assumed, |1 | < 1 we can obtain, by recursive substitutions, the
causal representation of {yt } as a linear filter based on the generating process {t }
yt =
1i ti ;
i=0
this relationship can be implemented by means of the function filter, providing as

arguments: the generating time series E, the vector of the autoregressive coefficients
and by specifying method="recursive"
> y <- ts(filter(E, filter = theta1, method = "recursive")[(length(E) n + 1):length(E)])
Figure 8.2 represents the behaviour of an AR(1) stochastic process which is
characterized by large deviations from the mean, here 0, to be followed by large
deviations from the mean and by a general custering also of small deviations from
the mean.
It can be obtained with:
> plot(y, ylab = paste("Yt AR(1), theta1= ", theta1))
To simulate a realization from an AR(1) process it is also possible to make direct use
of the function arima whose main arguments are the model, consisting of a list with
2
0
2
6
Yt AR(1), theta1= 0.9
242
100
200
300
400
500
Time
Figure 8.2
Graphical representation of an AR(1) process
components ar and/or ma giving the AR and MA coefficients respectively (an empty

list gives an ARIMA(0, 0, 0) model, that is a White Noise); n is the length of the
output series. See the help ?arima.sim for more information on the function and see
also ?arima for the definition of the ARIMA process used by the related R functions.
> y <- arima.sim(model = list(ar = theta1), n = 100)
The reader is invited to observe what happens by changing the theta1 parameter,
e.g. by considering 1 {0.8, 0.6, 0.4, 0.2, 0, 0.2, 0.4, 0.6, 0.8, 1, 1.2}.
She will find that arima.sim returns an error for |1 | 1, since the stochastic process
{yt } becomes no more weakly stationary.
Simulation from an AR(2) process
The stochastic process {yt } named AutoRegressive process of order 2 and denoted
with AR(2) is the unique, asymptotic, weakly stationary solution of the stochastic
243
finite difference equation

yt = 1 yt1 + 2 yt2 + t
(8.3)
3
where {t } is a White Noise process and the roots of the characteristic equation
1 1 z 2 z 2 = 0
lie outside the unit circle.
To simulate a realization from an AR(2) stochastic process we need the realizations
of a White Noise process {t } and 2 initial conditions for {yt }, say y_1 and y0, the
values of {yt } at times t = 1 and 0
y[1] = 1 y0 + 2 y_1 + E[1]
y[2] = 1 y[1] + 2 y0 + E[2]
y[3] = 1 y[2] + 2 y[1] + E[3]
y[4] = 1 y[3] + 2 y[2] + E[4]
(8.4)
...
y[n] = 1 y[n-1] + 2 y[n-2] + E[n]
Initial data are dropped to make their memory effect vanish
>
>
>
>
>
>
>
>
>
theta1 <- 0.5

theta2 <- 0.2
y_1 <- y0 <- 0
n <- 500
E <- ts(rnorm(n + 100, mean = 0, sd = 1))
y <- E * 0
y[1] <- theta1 * y0 + theta2 * y_1 + E[1]
y[2] <- theta1 * y[1] + theta2 * y0 + E[2]
for (i in 3:(n + 100)) {
y[i] <- theta1 * y[i - 1] + theta2 * y[i - 2] +
E[i]
}
> y <- ts(y[(length(y) - n + 1):length(y)])
The realization can also be obtained by using the linear filter representation of the
process:
X
yt =
i ti ,
i=0
where the i do satisfy:

(1 1 z 2 z 2 )(1 + 1 z + 2 z 2 + . . .) = 1.
3 Observe
that sometimes the stationarity condition is expressed by means of the auxiliary equation
z 2 1 z 2 = 0
whose roots must lie inside the unit circle for the process {yt } to be stationary.
244
Observe that to define the filter we do not have to compute the coefficients i ,
which could also be obtained by recursive substitutions of relationship (8.3): we only
need to specify, as for the AR(1) process, the autoregressive coefficients and state
"recursive" as method.
> y <- ts(filter(E, filter = c(theta1, theta2), method =
"recursive")[(length(E) - n + 1):length(E)])
We can also make direct use of the function arima.sim
> y <- arima.sim(model = list(ar = c(theta1, theta2)),
n = 500)
8.1.3
The Moving Average Process
Simulation of a realization from a MA(1) Process

Let {t } be a White Noise process and consider the following stochastic finite
difference equation
yt = t + 1 t1 .
The unique, asymptotic, weakly stationary solution is a stochastic process {yt } named
Moving Average process of order 1 denoted with MA(1), which is invertible say {yt }
can be expressed as a linear filter of its past values only if |1 | < 1.
To simulate a realization from a MA(1) stochastic process we need the realizations
of a White Noise process {t } and also an initial condition, that is e0, the value of
{t } at time 0:
y[1] = e[1] + 1 e0
y[2] = e[2] + 1 e[1]
...
(8.5)
y[n] = e[n] + 1 e[n-1]
>
>
>
>
>
>
>
alpha1 <- 0.7

n <- 500
e0 <- rnorm(1, 0, 1)
E <- ts(rnorm(n, mean = 0, sd = 1))
y <- E * 0
y[1] <- E[1] + alpha1 * e0
for (t in 2:n) {
y[t] <- E[t] + alpha1 * E[t - 1]
}
Observe that the for cycle can be substituted with the following vector instruction:
245
> y <- E * 0
> y[1] <- E[1] + alpha1 * e0
> y[-1] <- E[-1] + alpha1 * E[-length(y)]
Regarding the linear filter representation observe that
yt =
i ti ,
with 0 = 1 and i = 0, i > 1.
i=0
This can be implemented with the function filter by specifying the moving average
coefficient, "convolution" as method and4 sides=1.
> y <- filter(E, filter = c(1, alpha1), method = "convolution",
sides = 1)
The series obtained with filter differs from the one built with the for cycle only in
the first observation y[1], since filter uses this observation as the initial condition.
Observe that initial conditions do not have any effect on the evolution of a moving
average process.
We can also make direct use of the function arima.sim:
> y <- arima.sim(model = list(ma = alpha1), n = 100)
The reader can observe what happens by varying 1 in the set {0.8, 0.6, . . . , 0.6, 0.8}.
Simulation from a MA(2) Process
The unique, asymptotic, weakly stationary solution of the following stochastic finite
difference equation, where {t } is a White Noise,
yt = t + 1 t1 + 2 t2
is a stochastic process {yt } named Moving Average process of order 2 denoted with
MA(2), which is invertible if the roots of the characteristic equation
1 + 1 z + 2 z 2 = 0
To simulate a realization from a MA(2) stochastic process we need the realizations of
4 With
sides=2 the following non-causal linear filter would be applied

Yt =
k
X
i ti ,
with 0 = 1,
i=k
the option filter defining the vector

[k , k+1 , . . . , k1 , k ].
1
0
1
3
Yt MA(1), alpha1= 0.7
246
100
200
300
400
500
Time
Figure 8.3
Graphical representation of a MA(1) process
a White Noise process {t } and two initial conditions, say e0 and e_1, for the values
0 and 1
y[1] = e[1] + 1 e0 + 2 e_1
y[2] = e[2] + 1 e[1] + 2 e0
y[3] = e[3] + 1 e[2] + 2 e[1]
y[4] = e[4] + 1 e[3] + 2 e[2]
...
y[n] = e[n] + 1 e[n-1] + 2 e[n-2]
> alpha1 <- 0.5

> alpha2 <- 0.2
> n <- 100
(8.6)
>
>
>
>
>
>
>
247
E <- ts(rnorm(n, mean = 0, sd = 1))

e0 <- rnorm(1, 0, 1)
e_1 <- rnorm(1, 0, 1)
y <- E * 0
y[1] <- E[1] + alpha1 * e0 + alpha2 * e_1
y[2] <- E[2] + alpha1 * E[1] + alpha2 * e0
y[-(1:2)] <- E[-(1:2)] + alpha1 * E[-c(1, length(y))] +
alpha2 * E[-c(length(y) - 1, length(y))]
We can also use the filter function, with method="convolution" and obtain the
same series except for the 2 initial values.
> y <- filter(E, filter = c(1, alpha1, alpha2), method = "convolution",
sides = 1)
The function arima.sim can also be used
> y <- arima.sim(model = list(ma = c(alpha1, alpha2)),
n = 100)
8.1.4
Simulation of a realization from an AR(1) process with drift
Let us now simulate a realization from the stochastic process {Yt } defined by the
following stochastic finite difference equation:
Yt = + 1 Yt1 + t
with = 2 and 1 = 0.8; {t } is an assigned Gaussian White Noise process.
{Yt } is an autoregressive process of order 1 with the presence of a drift.
To simulate a realization from this process we can use the following code:
>
>
>
>
n <- 500
alpha <- 2
theta1 <- 0.8
yt <- arima.sim(model = list(ar = theta1), n = n,
rand.gen = function(n) {
alpha + rnorm(n)
})
The argument rand.gen in arima.sim specifies the generating model for {t }. Here
we considered a sequence of i.i.d. normal pseudo-random values shifted by the constant
alpha, which is equivalent to specify a sequence of normal pseudo-random values with
mean alpha:
> yt <- arima.sim(model = list(ar = theta1), n = n,
rnorm(n, mean = alpha)
})
12
10
8
6
Yt AR(1), theta1= 0.8 drift= 2
14
248
100
200
300
400
500
Time
Figure 8.4
Graphical representation of an AR(1) process with the presence of a drift
To obtain the plot of the time series {Yt }, see Fig. 8.4, use
> plot(yt, ylab = paste("Yt AR(1), theta1= ", theta1,
" drift= ", alpha))
We can compute the mean and the variance of Yt , which can be compared with their
theoretical values
1
= 10 and V ar(Yt ) =
= 2.778.
E(Yt ) =
1 1
1 12
> mean(yt)
[1] 10.10053
> var(yt)
[1] 2.179227
Finally we repeat the procedure k = 200 times and obtain summary statistics for the
mean and the variance and plot an histogram for the estimates. See Figures 8.5 and
8.6.
249
> k <- 200

> simula <- function(n = 500, alpha = 2, theta1 = 0.8) {
yt <- arima.sim(model = list(ar = theta1), n = n,
rnorm(n, mean = alpha)
})
return(c(mean(yt), var(yt)))
}
> a <- replicate(k, simula())
> rowMeans(a)
[1] 10.011119 2.734807
> apply(a, 1, var)
[1] 0.04990404 0.11620690
> hist(a[1, ], freq = FALSE)
> curve(dnorm(x, mean = mean(a[1, ]), sd = sd(a[1,
])), add = TRUE)
> hist(a[1, ], freq
xlim = c(9,
> par(new = TRUE)
> plot(density(a[1,
xlim = c(9,
8.2
= FALSE, main = "", xlab = "",

11), ylim = c(0, 2))
]), main = "histogram with kernel density",
11), ylim = c(0, 2))
Autocorrelation, Partial autocorrelation functions and

ARMA model identification
We now consider the study of the autocorrelation function and of the partial
autocorrelation function for a time series with the aim of identifying the order of
an autoregressive or of a moving average model.
8.2.1
Autocorrelation and Partial autocorrelation functions for an

AR(1) process with drift
Let us consider a realization of length n = 500 from an autoregressive stochastic

process of order 1 with drift:
Yt = const + 1 Yt1 + t
where const = 2, 1 = 0.8 and {t } a Gaussian White Noise
> n <- 500
> const <- 2
> theta <- 0.8
(8.7)
250
1.0
0.0
0.5
Density
1.5
2.0
Histogram of a[1, ]
9.4
9.6
9.8
10.0
10.2
10.4
10.6
a[1, ]
Figure 8.5
Mean estimates distribution for 200 replications of the simulation of {yt }
> yt <- arima.sim(model = list(ar = theta), n = n,

rnorm(n, mean = const)
})
The plot of the autocorrelation function (correlogram) of {Yt }, see Fig. 8.7, can be
obtained by using the function acf which also returns the values of the autocorrelation
function.
> (ytcorrelogram <- acf(yt))
Autocorrelations of series
Syt
S, by lag
0
1
2
3
4
5
6
7
8
1.000 0.801 0.621 0.480 0.383 0.324 0.251 0.191 0.133
9
10
11
12
13
14
15
16
17
0.071 0.034 -0.001 -0.044 -0.051 -0.075 -0.090 -0.110 -0.140
18
19
20
21
22
23
24
25
26
-0.146 -0.163 -0.158 -0.141 -0.126 -0.120 -0.112 -0.095 -0.091
251
0.0
0.5
1.0
Density
1.5
2.0
histogram with kernel density
9.0
9.5
10.0
10.5
11.0
N = 200 Bandwidth = 0.0569
Figure 8.6 Mean estimates distribution for 200 replications of the simulation of {yt } and
kernel estimate of the density
We can observe that the autocorrelation function has a quite slow decay; namely for
the AR(1) process defined by relationship (8.7) we have that:
Cor(Yt , Yt1 ) = Cor(yt , yt1 ) = 1 ,
Cor(Yt , Yt2 ) = Cor(yt , yt2 ) = 12 ,
...,
Cor(Yt , Ytk ) = Cor(yt , ytk ) = 1k ,
thus the autocorrelation function for an AR(1) process shows a decay of an exponential
type which is not very slow for larger than 0.7.
The plot of the partial autocorrelation function, see Fig. 8.8, can be obtained with
> (ytpacf <- pacf(yt))
Partial autocorrelations of series
Syt
S, by lag
1
252
0.4
0.2
0.0
0.2
ACF
0.6
0.8
1.0
Series yt
10
15
20
25
Lag
Figure 8.7
Autocorrelation function of {yt }, AR(1) process with drift ( = 0.8, drif t = 2)
0.801 -0.058 -0.001

10
11
12
0.013 -0.031 -0.060
19
20
21
-0.051 0.019 0.011
0.034 0.047 -0.066 0.001 -0.033 -0.055

13
14
15
16
17
18
0.058 -0.065 -0.008 -0.037 -0.053 0.008
22
23
24
25
26
0.005 -0.038 0.009 0.019 -0.053
From a theoretical point of view the partial autocorrelation function of an AR(1)

process cuts off at lag 1.
As observed by Shumway and Stoffer (2011) different scales are used by acf and
pacf when plotting the autocorrelation and partial autocorrelation functions: in
particular with reference to the autocorrelation function also the non informative
autocorrelation at lag 0 is produced, which always results equal to 1. The function
acf2 in the package astsa returns the autocorrelation function and the partial
autocorrelation function plots with the same scale, on a unique device.
> library(astsa)
> t(acf2(yt, 20))
253
0.4
0.0
0.2
Partial ACF
0.6
0.8
Series yt
10
15
20
25
Lag
Figure 8.8
Partial autocorrelation for {yt }, AR(1) process with drift ( = 0.8, drif t = 2)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
0.8 0.62 0.48 0.38 0.32 0.25 0.19 0.13 0.07 0.03
0.8 -0.06 0.00 0.03 0.05 -0.07 0.00 -0.03 -0.06 0.01
[,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF
0.00 -0.04 -0.05 -0.08 -0.09 -0.11 -0.14 -0.15 -0.16 -0.16
PACF -0.03 -0.06 0.06 -0.07 -0.01 -0.04 -0.05 0.01 -0.05 0.02
ACF
PACF
Since the values of the autocorrelation and partial autocorrelation functions are
returned as columns of a matrix, we prefer to use, for typographical convenience, the
transpose operator when invoking the acf2 function. The second argument in acf2
establishes the number of lags to be considered when producing the correlogram.
Observe that it is also possible to use the function PacfPlot available in the package
FitAR, which returns 95% confidence intervals for the partial autocorrelations. See
Fig. 8.10.
> library(FitAR)
> PacfPlot(yt)
254
0.2
0.2
ACF
0.6
1.0
Series: yt
10
LAG
15
20
10
LAG
15
20
0.2
0.2
PACF
0.6
1.0
Figure 8.9 Autocorrelation and Partial autocorrelations, via acf2, for {yt }, AR(1) process
with drift ( = 0.8, drif t = 2)
As already observed the autocorrelation function shows a slow decay, while the partial
autocorrelation function cuts off at lag 1; so we can conclude that an autoregressive
model of order 1 can fit the data.
8.2.2
Autocorrelation and Partial autocorrelation functions for some

AR(p) processes with drift
We consider the simulation of a realization from the following stochastic process:

Yt = const + 1 Yt1 + 2 Yt2 + 3 Yt3 + 4 Yt4 + 5 Yt5 + t
with const = 2, {t } a Gaussian White Noise and 1 = 0.5, 2 = 0.1, 3 = 0.2, 4 =
0, 5 = 0.2.
> n <- 500
> const <- 2
255
0.0
1.0
0.5
pacf
0.5
1.0
10
12
14
lag
95% confidence intervals for pacf
Figure 8.10 Partial autocorrelation for {yt }, AR(1) process with drift ( = 0.8, drif t = 2)
(95% confidence intervals)
Try with
> yt <- arima.sim(model = list(ar = c(0.5,0.1,0.2,0,.2)), n = n,
rand.gen = function(n) {rnorm(n, mean = const)})
To understand why you did not succeed in the generation of a realization from this
process, we can check if the roots of the characteristic equation allow for a stationary
solution of the stochastic difference equation. This can be made with the function
InvertibleQ in the package FitAR which checks if the roots of the characteristic
equation, here
1 0.5z 0.1z 2 0.2z 3 0.2z 5 = 0,
> library(FitAR)
> InvertibleQ(c(0.5, 0.1, 0.2, 0, 0.2))
[1] FALSE
256
Let us now simulate a realization from the following stochastic processes

Yt = const + 1 Yt1 + 2 Yt2 + 3 Yt3 + 4 Yt4 + 5 Yt5 + t
with const = 2, {t } a Gaussian White Noise and:
Process
Process
Process
Process
1
2
3
4
1
1
1
1
= 0.5
=0
= 0.1
= 0.2
2
2
2
2
= 0.1
= 0.1
= 0.5
= 0.1
3
3
3
3
= 0.2
= 0.2
=0
=0
4
4
4
4
=0
=0
= 0.2
= 0.5
5
5
5
5
=0
= 0.5
=0
=0
(8.8)
The autocorrelation function, the partial autocorrelation function and the confidence
intervals for the partial autocorrelation function will also be plotted.
We can define a function for simulating the four processes.
> n <- 500
> const <- 2
> genera <- function(thetas) {
yt <<- arima.sim(model = thetas, n = n, rand.gen = function(n) {
rnorm(n, mean = const)
})
print("theta parameters: ")
print(paste("th", 1:length(thetas[[1]]), "=",
thetas[[1]], sep = "", collapse = ","))
print(t(acf2(yt, 20)))
}
See Figures 8.11, 8.12, 8.13, 8.14.
For these processes it is more complicated to derive the theoretical behaviour of
the autocorrelation function analitically. A simpler way (the reader is invited to try
with this method) is to simulate a very long realization from the processes (e.g. with
n = 105 ) and check for the behaviour of the estimated autocorrelation and partial
autocorrelation functions, which will be very close to their theoretical counterparts.
We can, however, observe that the partial autocorrelation function can help us
identifying the order of the autoregressive model apt to describe the involved time
series. It cuts off at the maximum autoregressive lag.
257
ACF
0.0 0.2 0.4 0.6 0.8 1.0
Series: yt
10
LAG
15
20
10
LAG
15
20
PACF
0.0 0.2 0.4 0.6 0.8 1.0
Figure 8.11 Autocorrelation and Partial autocorrelation plots for a realization from
Process 1: AR with drif t = 2, the autoregressive coefficients are 0.5, 0.1, 0.2, 0, 0
> genera(list(ar = c(0.5, 0.1, 0.2, 0, 0)))

[1] "theta parameters: "
[1] "th1=0.5,th2=0.1,th3=0.2,th4=0,th5=0"
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ACF 0.68 0.57 0.55 0.47 0.38 0.36 0.31 0.26 0.19 0.18
PACF 0.68 0.20 0.19 0.02 -0.06 0.06 -0.02 -0.01 -0.08 0.04
[,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF
0.18
0.2 0.20 0.24 0.22 0.25 0.24 0.26 0.29 0.32
PACF 0.06
0.1 0.05 0.09 -0.01 0.07 -0.01 0.04 0.06 0.09
258
10
LAG
15
20
10
LAG
15
20
0.2
0.2
PACF
0.6
1.0
0.2
0.2
ACF
0.6
1.0
Series: yt
Process 2: AR with drif t = 2, the autoregressive coefficients are 0, 0.1, 0.2, 0, 0.5
> genera(list(ar = c(0, 0.1, 0.2, 0, 0.5)))

[1] "th1=0,th2=0.1,th3=0.2,th4=0,th5=0.5"
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ACF -0.1 0.21 0.29 -0.09 0.64 -0.09 0.13 0.23 -0.09 0.42
PACF -0.1 0.20 0.35 -0.07 0.57 -0.05 -0.04 -0.07 -0.01 0.02
[,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF -0.13 0.08 0.16 -0.13 0.26 -0.15 0.03
0.1 -0.12 0.18
PACF -0.09 -0.01 -0.02 -0.05 -0.03 -0.04 0.00
0.0 0.05 0.05
259
ACF
0.0 0.2 0.4 0.6 0.8 1.0
Series: yt
10
LAG
15
20
10
LAG
15
20
PACF
0.0 0.2 0.4 0.6 0.8 1.0
Process 3: AR with drif t = 2, the autoregressive coefficients are 0.1, 0.5, 0, 0.2, 0
> genera(list(ar = c(0.1, 0.5, 0, 0.2, 0)))

[1] "th1=0.1,th2=0.5,th3=0,th4=0.2,th5=0"
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
ACF 0.13 0.59 0.09 0.51 0.11 0.36 0.09 0.23 0.08 0.18 0.09
PACF 0.13 0.58 -0.03 0.25 0.04 -0.03 0.01 -0.08 0.00 0.01 0.04
[,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF
0.18 0.10 0.12 0.12 0.06 0.08 0.05 0.10 0.06
PACF 0.11 0.03 -0.03 0.05 -0.11 -0.05 0.02 0.03 0.07
260
ACF
0.0 0.2 0.4 0.6 0.8 1.0
Series: yt
10
LAG
15
20
10
LAG
15
20
PACF
0.0 0.2 0.4 0.6 0.8 1.0
Process 4: AR with drif t = 2, the autoregressive coefficients are 0.2, 0.1, 0, 0.5, 0
> genera(list(ar = c(0.2, 0.1, 0, 0.5, 0)))

[1] "th1=0.2,th2=0.1,th3=0,th4=0.5,th5=0"
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
ACF 0.39 0.29 0.27 0.60 0.36 0.27 0.24 0.40 0.27 0.21 0.21
PACF 0.39 0.16 0.14 0.53 0.00 0.01 0.05 0.03 -0.05 -0.01 0.03
[,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF
0.26
0.2 0.17 0.17 0.16 0.11 0.14 0.18 0.12
PACF -0.05
0.0 0.01 -0.01 -0.03 -0.05 0.05 0.08 -0.04
320
210
320
200
320
200
BIC
BIC
310
310
300
190
190
180
290
180
270
250
270
240
270
240
260
230
BIC
BIC
(Intercept)
Ylag1
Ylag2
Ylag3
Ylag4
Ylag5
errorlag2
errorlag3
errorlag4
errorlag5
errorlag1
180
300
(Intercept)
Ylag1
Ylag2
Ylag3
Ylag4
Ylag5
errorlag1
errorlag2
errorlag3
errorlag4
errorlag5
300
260
261
(Intercept)
Ylag1
Ylag2
Ylag3
Ylag4
Ylag5
errorlag1
errorlag2
errorlag3
errorlag4
errorlag5
(Intercept)
Ylag1
Ylag2
Ylag3
Ylag4
Ylag5
errorlag2
errorlag3
errorlag4
errorlag5
errorlag1
230
220
250
220
250
210
240
210
Figure 8.15 Identification by means of the BIC criterion of the time series simulated with
relationships (8.8) from some autoregressive
processes. The correspondence between graphs
Process 1 Process 3
and processes is:
.
Process 2 Process 4
The function armasubsets available in the package TSA can also be used to establish
the order of the autoregressive process, that is for ARMA model selection. The
selection algorithm orders different models, chosen following a method proposed by
Hannan and Rissanen (1982), according to their BIC (Bayesian Information Criterion)
value. See Fig. 8.15.
> library(TSA)
> sapply(1:4, function(i) plot(armasubsets(data[, i],
nar = 5, nma = 5)))
> detach("package:TSA")
We detach the package TSA since it re-defines the functions acf and arima.
262
8.2.3
Autocorrelation and Partial autocorrelation functions for a

MA(1) process
Let us now simulate a realization from the following stochastic process:

yt = t + 1 t1
with 1 = 0.8 and {t } a Gaussian White Noise.
> n <- 500
> alpha <- 0.8
> yt <- arima.sim(model = list(ma = alpha), n = n)
and plot the autocorrelation function (correlogram) and the partial autocorrelation
function of {yt }, see Fig. 8.16.
> source("acf2.r")
> t(acf2(yt, 20, ma.test = TRUE))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ACF 0.52 -0.01 -0.11 -0.08 -0.04 -0.07 -0.05 0.04 0.06 0.01
PACF 0.52 -0.38 0.16 -0.14 0.08 -0.15 0.11 0.00 0.00 -0.02
[,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF -0.04 -0.10 -0.07 0.02 0.06 0.02 -0.06 -0.07 -0.01 -0.01
PACF -0.03 -0.09 0.04 0.03 0.02 -0.05 -0.06 0.01 0.00 -0.02
We can observe, as expected from theoretical results pertaining a MA(1) process,
that the autocorrelation function cuts off at lag 1 while the partial autocorrelation
function shows an exponential decay.
By plotting the autocorrelation function two standard error bounds for k based
on the estimated variance 1 + 2
21 + . . . + 2
2k1 have been included, see Verbeeks
relationship (8.64). Observe that this estimate of the variance holds under the
hypothesis that q = 0 for q > k, that is the process is a Moving Average process of
order k. On the booksite www.educatt.it/libri/materiali a version of the function
acf2 is present, which includes also the logical argument ma.test that can be set to
FALSE or TRUE, according respectively to the white noise or the moving average
hypotheses. Namely, if we want check if a series of residuals is white noise we will
leave ma.test=FALSE.
8.2.4
Autocorrelation and Partial autocorrelation functions for some

MA(p) processes
Let us simulate a realization from the following stochastic process:

yt = t + 1 t1 + 2 t2 + 3 t3 + 4 t4 + 5 t5
(8.9)
with {t } a Gaussian White Noise and 1 = 0.5, 2 = 0.1, 3 = 0.2, 4 = 0, 5 = 0.2.
263
0.4
0.0
ACF
0.4
0.8
Series: yt
10
LAG
15
20
10
LAG
15
20
0.4
0.0
PACF
0.4
0.8
Figure 8.16
( = 0.8)
Autocorrelation and partial autocorrelation functions of {yt }, MA(1) process
> n <- 500

> yt <- arima.sim(model = list(ma = c(0.5, 0.1, 0.2,
0, 0.2)), n = n)
To establish if the process is stationary and invertible, we check if the roots of the
characteristic equation
1 0.5z 0.1z 2 0.2z 3 0.2z 5 = 0,
allow for an invertible solution of the stochastic difference equation (8.9).
> library(FitAR)
> InvertibleQ(c(0.5, 0.1, 0.2, 0, 0.2))
[1] FALSE
264
Let us now plot the autocorrelation function and the partial autocorrelation function
for the following parameter configurations:
Process
Process
Process
Process
1
2
3
4
1
1
1
1
= 0.5
=0
= 0.1
= 0.2
2
2
2
2
= 0.1
= 0.1
= 0.5
= 0.1
3
3
3
3
= 0.2
= 0.2
=0
=0
4
4
4
4
=0
=0
= 0.2
= 0.5
5
5
5
5
=0
= 0.5
=0
=0
(8.10)
> n <- 500

> genera <- function(alphas, ma.test = TRUE) {
yt <<- arima.sim(model = alphas, n = n)
print("alpha parameters: ")
print(paste("al", 1:length(alphas[[1]]), "=",
alphas[[1]], sep = "", collapse = ","))
source("acf2.r")
print(t(acf2(yt, 20, ma.test = ma.test)))
}
See Figures 8.17, 8.18, 8.19, 8.20.
As before to study the theoretical behaviour of the autocorrelation and partial
autocorrelation functions the reader can simulate a very long realization from the
processes (e.g. with n = 105 ).
Observe that the autocorrelation function (correlogram) can help us identifying the
order of the moving average model apt to describe the involved time series. It cuts
off at the maximum moving average lag.
265
ACF
0.0 0.2 0.4 0.6 0.8 1.0
Series: yt
10
LAG
15
20
10
LAG
15
20
PACF
0.0 0.2 0.4 0.6 0.8 1.0
Process 1: MA with moving average parameters 0.5, 0.1, 0.2, 0, 0
> genera(list(ma = c(0.5, 0.1, 0.2, 0, 0)), ma.test = TRUE)

[1] "alpha parameters: "
[1] "al1=0.5,al2=0.1,al3=0.2,al4=0,al5=0"
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
ACF 0.52 0.2 0.12 -0.02 0.02 0.08 0.09 0.02 0.02 0.01 0.01
PACF 0.52 -0.1 0.07 -0.13 0.13 0.02 0.05 -0.09 0.06 -0.02 0.03
[,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF
0.03 0.07 0.07 0.00 -0.02 -0.03 -0.06 0.02 0.04
PACF -0.01 0.08 0.00 -0.06 -0.01 -0.01 -0.03 0.08 -0.04
266
0.2
0.2
ACF
0.6
1.0
Series: yt
10
LAG
15
20
10
LAG
15
20
0.2
0.2
PACF
0.6
1.0
Process 2: MA with moving average parameters 0, 0.1, 0.2, 0, 0.5
> genera(list(ma = c(0, 0.1, 0.2, 0, 0.5)), ma.test = TRUE)

[1] "al1=0,al2=0.1,al3=0.2,al4=0,al5=0.5"
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ACF 0.01 0.17 0.25 -0.02 0.37 -0.02 -0.05 0.05 -0.05 0.00
PACF 0.01 0.17 0.25 -0.04 0.31 -0.07 -0.16 -0.12 0.01 -0.08
[,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF -0.04 -0.04 0.04 -0.02 -0.01 -0.04 -0.07 0.04 0.00 -0.04
PACF 0.01 0.07 0.10 -0.01 0.01 -0.08 -0.11 -0.01 0.06 0.00
267
ACF
0.0 0.2 0.4 0.6 0.8 1.0
Series: yt
10
LAG
15
20
10
LAG
15
20
PACF
0.0 0.2 0.4 0.6 0.8 1.0
Process 3: MA with moving average parameters 0.1, 0.5, 0, 0.2, 0
> genera(list(ma = c(0.1, 0.5, 0, 0.2, 0)), ma.test = TRUE)

[1] "al1=0.1,al2=0.5,al3=0,al4=0.2,al5=0"
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ACF 0.09 0.47 -0.03 0.18 -0.04 0.03 0.03 0.00 0.00 0.00
PACF 0.09 0.46 -0.11 -0.03 0.02 -0.06 0.08 0.01 -0.06 0.03
[,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF
0.04 0.07 0.07 0.11 0.05 0.07 0.06 0.05 0.11 0.08
PACF 0.06 0.07 0.04 0.05 -0.02 0.00 0.07 0.01 0.08 0.06
268
0.2
0.2
ACF
0.6
1.0
Series: yt
10
LAG
15
20
10
LAG
15
20
0.2
0.2
PACF
0.6
1.0
Process 4: MA with moving average parameters 0.2, 0.1, 0, 0.5, 0
> genera(list(ma = c(0.2, 0.1, 0, 0.5, 0)), ma.test = TRUE)

[1] "al1=0.2,al2=0.1,al3=0,al4=0.5,al5=0"
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ACF 0.16 0.13 0.03 0.4 0.04 0.04 -0.06 -0.02 0.04 0.01
PACF 0.16 0.10 0.00 0.4 -0.09 -0.03 -0.05 -0.20 0.13 0.00
[,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF -0.03 -0.07 -0.03 -0.02 0.00 -0.07 0.00 0.01 0.04 -0.04
PACF 0.00 0.03 -0.12 0.00 0.02 -0.07 0.11 0.01 0.01 0.00
110
150
110
150
110
140
100
94
130
88
130
82
100
110
100
110
99
100
95
98
BIC
BIC
(Intercept)
Ylag1
Ylag2
Ylag3
Ylag4
Ylag5
errorlag1
errorlag2
errorlag3
errorlag4
errorlag5
140
89
269
(Intercept)
Ylag1
Ylag2
Ylag3
Ylag4
Ylag5
errorlag3
errorlag4
errorlag5
errorlag1
errorlag2
110
150
BIC
150
(Intercept)
Ylag1
Ylag2
Ylag3
Ylag4
Ylag5
errorlag1
errorlag2
errorlag3
errorlag4
errorlag5
BIC
(Intercept)
Ylag1
Ylag2
Ylag3
Ylag4
Ylag5
errorlag1
errorlag2
errorlag3
errorlag4
errorlag5
97
84
94
80
89
78
83
Figure 8.21 Identification by means of the BIC criterion of the time series simulated with
relationships (8.8)from some moving average
processes. The correspondence between graphs
Process 1 Process 3
and processes is:
.
Process 2 Process 4
The function armasubsets, see Section 8.2.2, available in the package TSA can also
be used for model selection. See Figures 8.21.
> library(TSA)
> sapply(1:4, function(i) plot(armasubsets(data[, i],
nar = 5, nma = 5)))
We can observe that the methods, described above, are not always definitively
resolving for the identification of ARMA models fitting the 4 time series. For Process
3, the autocorrelation function suggests the correct moving average order of the
generating process, while both the partial autocorrelation function and Bayesian
Information Criterion give hints about the presence of an autoregressive model. We
consider again the issue in Section 8.2.6.
270
8.2.5
Autocorrelation and Partial autocorrelation functions for an

ARMA(1,1) process
We simulate a realization from the following stochastic process:

yt = 1 yt1 + t + 1 t1
with 1 = 0.8, 1 = 0.5 and {t } a Gaussian White Noise.
>
>
>
>
>
n <- 500
theta <- 0.8
alpha <- 0.5
set.seed(1234)
yt <- arima.sim(model = list(ar = theta, ma = alpha),
n = n)
and plot the autocorrelation function (correlogram) and the partial autocorrelation
function of {yt }. See Fig. 8.22.
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
ACF 0.89 0.70 0.54 0.41 0.3 0.21 0.15 0.10 0.04 0.00 -0.04
PACF 0.89 -0.43 0.20 -0.14 0.0 0.02 0.01 -0.11 0.01 -0.01 -0.05
[,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF -0.07 -0.1 -0.12 -0.11 -0.08 -0.04 0.00 0.02 0.04
PACF -0.03
0.0 0.00 0.06 0.05 0.03 0.01 -0.04 0.04
We can observe that the autocorrelation and the partial autocorrelation functions die
out very slowly.
A large order is necessary to identify both an AR model and a MA model to the
simulated time series. So the recourse to a more parsimonious ARMA model can help
relieving model complexity.
Fig. 8.23 shows the identification by means of the BIC criterion using the function
armasubsets
> library(TSA)
> plot(armasubsets(yt, nar = 5, nma = 5))
8.2.6
Problems in identifying an ARMA model for a time series
We simulate a realization from the ARMA(4,0,1) stochastic process:

yt = 0.2yt1 + 0.7yt4 + t + 0.4t1
with 1 = 0.8 and {t } a Gaussian White Noise.
271
0.4
0.0
ACF
0.4
0.8
Series: yt
10
LAG
15
20
10
LAG
15
20
0.4
0.0
PACF
0.4
0.8
Figure 8.22 Estimate of the autocorrelation and partial autocorrelation functions for a
realization from {yt }, ARMA(1,1) process with parameters = 0.8, = 0.5
> n <- 100

> set.seed(12345)
> yt <- arima.sim(model = list(ar = c(0.2, 0, 0, 0.7),
ma = 0.4), n = n)
and check the autocorrelation function (correlogram) and the partial autocorrelation
function of {yt } for identifying the order of an ARMA model. See Fig. 8.24.
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ACF 0.36 -0.20 0.04 0.61 0.37 -0.12 -0.17 0.27 0.31 0.03
PACF 0.36 -0.37 0.36 0.52 -0.14 0.04 -0.15 0.04 -0.02 0.11
[,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF -0.10 0.17 0.22 0.08 -0.04 0.11 0.16 0.05 -0.09 0.00
PACF 0.13 0.09 -0.11 0.04 -0.06 0.02 0.05 -0.06 -0.03 -0.11
errorlag5
errorlag4
errorlag3
errorlag2
errorlag1
Ylag5
Ylag4
Ylag3
Ylag2
Ylag1
(Intercept)
272
840
840
840
BIC
830
830
820
820
720
Figure 8.23 Identification of an ARMA model by means of the BIC criterion for a
realization from {yt }, ARMA(1,1) process with parameters = 0.8, = 0.5
We consider the behaviour of the armasubsets function for different values of the
arguments nar and nma, see Fig. 8.25.
> library(TSA)
> plot(armasubsets(yt, nar = 5,
Reordering variables and trying
nma = 5))
again:
nma = 5))
again:
nma = 8))
again:
nma = 8))
again:
273
0.4
0.0
ACF
0.4
0.8
Series: yt
10
LAG
15
20
10
LAG
15
20
0.4
0.0
PACF
0.4
0.8
Figure 8.24 Autocorrelation and partial autocorrelation functions of {yt }, AR(1) process
with parameters = 0.8
We can conclude that identification methods based on the use of the autocorrelation
function and the partial autocorrelation function or on the use of information criterion,
like the BIC, can help the researcher but cannot definitively solve the problem.
In particular the function armasubsets has to be called for different combinations
of the maximum AR and MA orders to check for the stability of the proposed
identification solution. Here the model suggested by armasubsets depends clearly
on the maximum order of nar and nma the researcher has chosen.
Another solution is to use the tools available in the package forecast, which
perform automatic model selection and forecasting with reference to Exponential
smoothing and ARIMA methods, see Hyndman and Khandakar (2008). In particular
the function auto.arima deals also with seasonal and integrated5 time series providing
the possible orders of differentiation by having recourse to the KPSS and CanovaHansen tests. See ?auto.arima for more information.
5 See Section 8.4 for Integrated ARMA (ARIMA) models. Seasonal ARMA models are not treated
here.
54
53
54
52
51
49
50
BIC
54
47
48
44
40
40
38
38
33
36
56
54
54
54
52
51
51
50
BIC
BIC
(Intercept)
Ylag1
Ylag2
Ylag3
Ylag4
Ylag5
Ylag6
Ylag7
Ylag8
errorlag5
errorlag6
errorlag7
errorlag8
errorlag1
errorlag2
errorlag3
errorlag4
42
(Intercept)
Ylag1
Ylag2
Ylag3
Ylag4
Ylag5
Ylag6
Ylag7
Ylag8
errorlag5
errorlag1
errorlag2
errorlag3
errorlag4
BIC
(Intercept)
Ylag1
Ylag2
Ylag3
Ylag4
Ylag5
errorlag2
errorlag3
errorlag4
errorlag5
errorlag6
errorlag7
errorlag8
errorlag1
(Intercept)
Ylag1
Ylag2
Ylag3
Ylag4
Ylag5
errorlag2
errorlag3
errorlag4
errorlag5
errorlag1
274
47
48
42
44
40
40
38
38
33
36
Figure 8.25 Identification for diffferent choices of nar and nma for {yt }, ARMA(1,0)
process with parameter = 0.8
> library(forecast)
> auto.arima(yt, max.p = 10, max.q = 10)
Series: yt
ARIMA(3,1,2)
Coefficients:
ar1
-0.7519
s.e.
0.0914
ar2
-0.6415
0.0884
ar3
-0.7543
0.0674
ma1
0.3568
0.1375
ma2
-0.2349
0.1319
sigma^2 estimated as 0.8383: log likelihood=-133.06

AIC=278.12
AICc=279.03
BIC=293.69
Observe that AIC and BIC criteria are computed by armasubsets and auto.arima
with different formulae. The reader will note that the proposed solution is again
different from those obtained with armasubsets. We have encountered a realization
275
giving serious problems in its parameter identification.

Sometimes the frequency of data may help the researcher solving the identification
problem: e.g. in presence of daily data an AR(7) component may be present, or an
AR(5) if data were collected only on working days.
8.3
On the bias of the OLS estimator of the autoregressive

coefficient for an AR(1) process with AR(1) errors
Let {yt } be an autoregressive stochastic process of order 1 whose evolution does

not depend upon a White Noise generating mechanism for the error but on an
autoregressive process of order 1, say:
yt = yt1 + ut
ut = ut1 + t
with = 0.5 and = 0.5 and {t } a Gaussian White Noise. We consider the
estimation of the parameter in the first relationship by means of Ordinary Least
Squares without taking into account the autoregressive nature of the error {ut }.
Let us simulate a realization of length n = 100:
>
>
>
>
>
n <- 100
beta <- 0.5
rho <- 0.5
ut <- arima.sim(model = list(ar = rho), n = n)
yt <- arima.sim(model = list(ar = beta), n = n, innov = ut)
The argument innov in the function arima.sim defines the sequence to be used as
error in the generating process.
The estimate of results:
> lm(yt[-1] ~ -1 + yt[-n])$coef
yt[-n]
0.7504439
which is quite different from the theoretical value 0.5.
We can expect the following theoretical value for the bias (according to asymptotic
results):
2
) = (1 ) = 0.3.
plim(E()
1 +
Let now perform the preceding task k = 1000 times, obtain summary statistics and
plot an histogram for the bias of the estimates of the coefficient (beta).
To this purpose we can create a function
> simularar <- function(n = 100, beta = 0.5, rho = 0.5) {
ut <- arima.sim(model = list(ar = rho), n = n)
yt <- arima.sim(model = list(ar = beta), n = n,
innov = ut)
lm(yt[-1] ~ -1 + yt[-n])$coef
}
276
50
Frequency
100
150
Histogram of betahat beta
0.10
0.15
0.20
0.25
0.30
0.35
0.40
betahat beta
Figure 8.26
Distribution of the parameter estimate bias
To replicate k times the preceding function and collect the results, we can use the
function replicate
> k <- 1000
> betahat <- replicate(k, simularar())
> summary(betahat - beta)
Min. 1st Qu. Median
Mean 3rd Qu.
0.1159 0.2587 0.2953 0.2901 0.3234
Max.
0.4174
Figures 8.26 and 8.27 show the histogram and the density estimate for the distribution
of the bias of the parameter estimates. The graphs can be obtained with the code
> hist(betahat - beta)
> plot(density(betahat - beta))
To obtain in a unique graph, see Fig. 8.28, the histogram, the kernel density and the
normal density use the code.
277
4
0
Density
density.default(x = betahat beta)
0.1
0.2
0.3
0.4
N = 1000 Bandwidth = 0.01092
Figure 8.27
Density estimate of the distribution of the parameter estimate bias
> hist(betahat - beta, freq = FALSE, breaks = 11, xlim = c(0.1,

0.45), ylim = c(0, 10), xlab = "", main = "")
> par(new = TRUE)
> plot(density(betahat - beta), xlim = c(0.1, 0.45),
ylim = c(0, 10), xlab = "", main = "")
> curve(dnorm(x, mean(betahat - beta), sd(betahat beta)), lwd = 3, add = TRUE)
Pay attention the latter code has a disadvantage, in the sense that we have to specify
the ylim.
It is possible to use a code that makes use of the package lattice, see Fig. 8.29 top.
The general instruction is in this case histogram; the successive panels establish
what effectively will appear on the graph: in this case a normal density (with the
same mean and variance of the data) and the kernel estimate of the density function.
With the parameter n, a smootheed plot of the density like in Fig. 8.27 is produced.
The advantage of using the function histogram is that the axis limits can be defined
only once.
Density
10
278
0.10
0.15
0.20
0.25
0.30
0.35
0.40
0.45
Figure 8.28 Histogram, Kernel density estimate, and Normal distribution approximation
of the distribution of the parameter estimate bias
See Sarkar (2008) for a detailed presentation of the package lattice and at
demo(lattice) and ?lattice::lattice for its main features.
> library(lattice)
> tp1 <- histogram(~(betahat - beta), type = "density",
breaks = 11, panel = function(x, ...) {
panel.histogram(x, ...)
panel.mathdensity(dmath = dnorm, col = "black",
lwd = 3, args = list(mean = mean(x),
sd = sd(x)), n = 101)
panel.densityplot(x, col = "black", lwd = 1,
n = 101, ...)
})
Package lattice defines the axis limits by internal algorithms, see Fig. 8.29 top,
being always possible to define their values when results from automatic procedures
are not satisfying, see Fig. 8.29 bottom.
279
Density
0
0.1
0.2
0.3
0.4
(betahat beta)
Density
8
6
4
2
0.1
0.2
0.3
0.4
(betahat beta)
Figure 8.29 Histogram, Normal distribution approximation and Kernel density estimate
of the distribution of the parameter estimate bias by means of the package lattice
> tp2 <- histogram(~(betahat - beta), type = "density",

breaks = 11, ylim = c(0, 10), panel = function(x,
...) {
lwd = 3, args = list(mean = mean(x),
sd = sd(x)), n = 101)
panel.densityplot(x, col = "black", lwd = 1,
n = 101, ...)
})
> plot(tp1, split = c(1, 1, 1, 2))
> plot(tp2, split = c(1, 2, 1, 2), newpage = FALSE)
280
8.3.1
Some remarks on the use of the function curve
Let y be a vector of 1000 pseudo-random numbers from a Normal distribution, with

mean 10 and variance 4.
To produce the histogram of y use the function freq, by specifying freq=FALSE to
obtain densities:
> y <- rnorm(1000, mean = 10, sd = 2)
> hist(y, freq = FALSE)
If you want to add to the graph the density function of the normal distribution with
the mean and s.d. estimated by your data, you can use the function code:
> curve(dnorm(x, mean = mean(y), sd = sd(y)), add = TRUE)
The variable x, you see in the function dnorm is an internal variable defining the
domain of the density function. The parameters mean and sd of the function dnorm
must not refer to x. So your vector of random numbers cannot be named x:
Try the following code, which does not work correctly:
> x <- rnorm(1000, mean = 10, sd = 2)
> hist(x, freq = FALSE)
> curve(dnorm(x, mean = mean(x), sd = sd(x)), add = TRUE)
and the following one, which works correctly.
>
>
>
>
x <- rnorm(1000, mean = 10, sd = 2)

y <- x
hist(x, freq = FALSE)
curve(dnorm(x, mean = mean(y), sd = sd(y)), add = TRUE)
The problem does not ensue when you make recourse to the function histogram of
the package lattice; the general call is histogram applied to the data x (remember
to use the ~ simbol); the panel function specifies which curves have to be plotted.
> library(lattice)
> histogram(~x, type = "density", panel = function(x,
...) {
lwd = 3, args = list(mean = mean(x), sd = sd(x)),
n = 101)
})
8.4
Estimation of ARIMA Models with the function arima
To estimate6 an ARIMA model it is possible to make use of the function arima,

available in the package stats, which is automatically loaded when R starts.
6 In this section it is assumed the possible presence of unit roots in the characteristic equation
pertaining the autoregressive part of the model, so the more general ARIMA (AutoRegressive
Integrated Moving Average) models (see Verbeeks Section 8.3) are considered. Tests for detecting
the presence of unit root are presented in Section 8.7 following Verbeeks example 8.4.4.
281
The main arguments of the function arima are:
the time series x to be modeled;
the order = c(p, d, q) where p and q are respectively the order of the
autoregressive and of the moving average parts, and the integer d is the possible
order of differentiation to render the time series x weakly stationary;
xreg is a vector or a matrix of external regressors describing a deterministic

trend for the series x;
the argument include.mean allows a mean term to be included in the model;

it is by default set to TRUE while it is ignored when d > 0;
Moreover, but the subject is not treated here, the argument seasonal
= list(order = c(0, 0, 0), period = NA) is available for dealing with
seasonal time series, see Hyndman and Khandakar (2008).
The general form for a stationary ARMA(p, q) process, as considered by R, see also
Brockwell and Davis (1991), is:
Yt = 1 (Yt1 ) + . . . + p (Ytp ) + t + 1 t1 + . . . + q tq
(8.11)
where = E(Yt ) is the mean value common to the components of the stochastic
process {Yt }.
The stochastic difference equation may be also referred to the de-meaned process
yt = Yt , or to a de-trended process yt = Yt g(t, Xt ), where g(t, Xt ) is a function
of the time and/or of some other stochastic process {Xt }:
yt = 1 yt1 + 2 yt2 + . . . + p ytp + t + 1 t1 + 2 t2 + . . . + q tq ,
for which it follows that = E(yt ) = 0, t. The above relationships can be re-written
by means of the polynomials in the backward operator B : Byt = yt1 :
p (B) = 1 1 B 2 B 2 . . . p B p ,
q (B) = 1 + 1 B + 2 B 2 + . . . + q B q ,
as follows:
p (B)(Yt ) = q (B)t
or
p (B)yt = q (B)t .
Remind that process {Yt } is stationary if the roots of p (z) = 0 lie outside the unit
circle and it is invertible if the roots of q (z) = 0 lie outside the unit circle. In case
some unit roots, say d, were present in p+d (z) = 0 then {Yt } must be differenced d
times, that is an ARIMA(p, d, q) model has to be fitted.
Different configurations for an ARIMA model are now considered: a stationary
ARMA process and two ARIMA processes with integration orders 1 and 2, for the
case without and with the presence of a drift7 .
7 In
case of an ARMA(p,q) model a drift is present when 6= 0 and from (8.11) we have:
Yt = (1 1 . . . p ) + 1 Yt1 + . . . + p Ytp + t + 1 t1 + . . . + q tq
that is
Yt = drif t + 1 Yt1 + . . . + p Ytp + t + 1 t1 + . . . + q tq .
282
Table 8.1 Summary of the code for estimating ARIMA(p, d, q) models with the arima
function and corresponding code for prediction k steps-ahead.
The sarima function in the package astsa can also be used when d 1, see Section 8.5
(B)yt = drif t + (B)t
typical
behaviour
code for parameter estimation;

interpretation of intercept/xreg;
code for predicting (k steps-ahead).
no unit roots
no drift
Fig. 8.30
no unit roots
with drift
Fig. 8.31
1 unit root
no drift
Fig. 8.32
1 unit root
with drift
Fig. 8.33
2 unit roots
no drift
Fig. 8.34
2 unit roots
with drift
Fig. 8.35
arima(x,c(p,0,q),include.mean=FALSE)
not present
predict(obj,n.ahead=k)
arima(x,c(p,1,q))
mean of {Yt }
arima(x,c(p,1,q))
not present
arima(x,c(p,1,q),xreg=1:length(x))
linear deterministic trend slope
predict(obj,k,newxreg=1:k+length(x))
arima(x,c(p,2,q))
not present
arima(x,c(p,1,q),xreg=(1:length(x))2)
quadratic deterministic trend coefficient
predict(obj,k,newxreg=(1:k+length(x))2)
We will assume that no roots of p (z) = 0 are inside the unit circle and that p (z) = 0
and q (z) = 0 do not have any common roots. As an example, the code is reported
for estimating the parameters of a time series simulated with the presence of only an
autoregressive coefficient of order 1.
Remember that among the parameters, that have to be estimated, are also p and q,
the orders respectively of the autoregressive and moving average parts of the model.
We assume here that the order has already been identified e.g. by examining the
autocorrelation and partial autocorrelation functions and/or using automatic criteria
like TSA::arimasubsets, or forecast::auto.arima, see Section 8.2.6.
See Section 8.10.8 for the estimation of non complete (subset) models, which can be
performed by using the argument fixed in the arima function.
In Section 8.5 other R functions are presented for the parameter estimation of ARIMA
models.
Table 8.1 summarizes the use of arima and also shows how to use the function
predict for obtaining k steps-ahead forecasts.
8.4.1
283
No unit roots in the characteristic equation p (z) = 0
No drift presence
The time series x is stationary; the estimation of
p (B)yt = q (B)t
can be performed with:
> arima(x, c(p, 0, q), include.mean = FALSE)
Example: Estimation of an ARMA(1,0,0) without drift. See Fig. 8.30.
>
>
>
>
n <- 2000
drift <- 0
set.seed(123)
y <- arima.sim(model = list(ar = 0.8), n = n, rand.gen = function(n)
drift + rnorm(n))
If we do not specify include.mean=FALSE, that is we use the code for the case with
the mean of the series different from 0 (a drift is present), we obtain:
> arima(y, c(1, 0, 0))
Series: y
ARIMA(1,0,0) with non-zero mean
Coefficients:
ar1 intercept
0.7699
0.1464
s.e. 0.0143
0.0969
AIC=5677.94
AICc=5677.95
BIC=5694.74
we can first observe that the coefficient named intercept in the output is the
estimate for the mean
of {Yt }. Namely the average value for the realization is
mean(y)=0.1507.
As expected the estimate for the mean is not significantly different from zero
(0.1464/0.0969 < 2 ' 1.96 and also the estimate for the drift will not be significant);
so we can proceed to estimate an autoregressive model with 0 mean, that is without
drift, by setting the argument include.mean=FALSE.
> (output <- arima(y, c(1, 0, 0), include.mean = FALSE))
Series: y
ARIMA(1,0,0) with zero mean
Coefficients:
ar1
0.7719
0
4
y[1:400]
284
100
200
300
400
Index
Figure 8.30 ARIMA(1,0,0) no drift: an autoregressive behaviour with respect to the mean
0 can be observed. A mean reverting is also present, that is the process tends to come back
to its mean value in the short run
s.e.
0.0142

AIC=5678.2
AICc=5678.2
BIC=5689.4
Prediction 5 steps-ahead
> predict(output, n.ahead = 5)
$pred
Time Series:
Start = 2001
End = 2005
Frequency = 1
[1] -0.009089261 -0.007016222 -0.005415992 -0.004180736
[5] -0.003227212
285
$se
Time Series:
Start = 2001
End = 2005
Frequency = 1
[1] 0.9993847 1.2624990 1.3958956 1.4696365 1.5118672
With drift
The time series is again stationary with mean =
drif t
p (0) .
The estimation of
p (B)(Yt ) = q (B)t
can be performed with
> arima(x, c(p, 0, q))
Estimation of an ARMA(1,0,0) with drift. See Fig. 8.31.
>
>
>
>
n <- 2000
drift <- 2
set.seed(123)
y <- arima.sim(model = list(ar = 0.8), n = n, rand.gen = function(n)
drift + rnorm(n))
> (output <- arima(y, c(1, 0, 0)))
Series: y
Coefficients:
ar1 intercept
0.7699
10.1463
s.e. 0.0143
0.0969
AIC=5677.97
AICc=5677.98
BIC=5694.77
Remind that intercept is the estimate for the mean of {Yt }: the average value in
the realization results mean(y)=10.1507.
$pred
Time Series:
Start = 2001
End = 2005
Frequency = 1
[1] 10.02461 10.05261 10.07418 10.09078 10.10356
286
10
6
y[1:400]
12
14
100
200
300
400
Index
Figure 8.31 ARIMA(1,0,0) with drift: an autoregressive behaviour with respect to the
t
mean ( drif
) can be observed
1
$se
Time Series:
Start = 2001
End = 2005
Frequency = 1
[1] 0.9988302 1.2605381 1.3926252 1.4653016 1.5067219
8.4.2
1 unit root in the characteristic equation p+1 (z) = 0
If p+1 (z) = 0 has 1 unit root it follows p+1 (B) = p (B)(1 B) = p (B) = 0.
Here it is essential to distinguish if a drift characterizes the differenced series.
287
No drift presence
When no drift is present, the model
p (B)Yt = q (B)t
can be estimated with:
Observe that the intercept, that is the estimate of the mean, will not be produced.
Estimation of an ARMA(1,1,0) no drift. See Fig. 8.32.
>
>
>
>
n <- 2000
drift <- 0
set.seed(123)
y <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.8),
n = n, rand.gen = function(n) drift + rnorm(n))
> (output <- arima(y, c(1, 1, 0)))
Series: y
ARIMA(1,1,0)
Coefficients:
ar1
0.7719
s.e. 0.0142
AIC=5678.2
AICc=5678.2
BIC=5689.4
which should correspond to
> arima(diff(y), c(1, 0, 0))
Series: diff(y)
Coefficients:
ar1 intercept
0.7699
0.1464
s.e. 0.0143
0.0969
AIC=5677.94
AICc=5677.95
BIC=5694.74
here the intercept, that is the estimate of the mean of {yt } is produced since
no integration was required in the model, but it is not significant, thus we have to
proceed again to estimate a model without the mean:
> arima(diff(y), c(1, 0, 0), include.mean = FALSE)
20
0
y[1:400]
40
60
288
100
200
300
400
Index
Figure 8.32 ARIMA(1,1,0) no drift: an autoregressive behaviour with respect to random

mean levels can be observed. There is no mean reverting
Series: diff(y)
Coefficients:
ar1
0.7719
s.e. 0.0142
AIC=5678.2
AICc=5678.2
BIC=5689.4
which is equivalent to the first estimation result.
$pred
Time Series:
289
Start = 2002
End = 2006
Frequency = 1
[1] 301.3907 301.3837 301.3783 301.3741 301.3709
$se
Time Series:
Start = 2002
End = 2006
Frequency = 1
[1] 0.9993847 2.0333772 3.1199618 4.2095757 5.2762076
With drift
When the drift is present for the differenced series, it corresponds to the presence of
a linear (deterministic) trend in {Yt } and the model
p (B)(Yt bt) = q (B)t
> arima(x, c(p, 1, q), xreg = 1:length(x))
where xreg is an external regressor for {Yt }, here the time sequence.
The coefficient corresponding to xreg is the estimate for the linear (deterministic)
slope b, while the estimate for the drift is p (1) xreg = (1 1 . . . p ) xreg.
Namely in case of an ARIMA(1,1,0) we have:
(Yt bt) = (Yt1 b(t 1)) + t
Yt b = [Yt1 b] + t
Yt = (1 )b + Yt1 + t
the drift (1 )b corresponding to a linear (deterministic) trend with slope b.
>
>
>
>
n <- 2000
drift <- 0.2
set.seed(123)
> (output <- arima(y, c(1, 1, 0), xreg = 1:length(y)))
Series: y
ARIMA(1,1,0)
Coefficients:
ar1 1:length(y)
290
s.e.
0.7700
0.0143
1.1158
0.0971

AIC=5678.04
AICc=5678.05
BIC=5694.85
The coefficient corresponding to xreg is an estimate for the slope in a linear model
without the intercept describing y as a linear function of the time:
> lm(y ~ -1 + c(1:length(y)))
Call:
lm(formula = y ~ -1 + c(1:length(y)))
Coefficients:
c(1:length(y))
1.114
The estimate of the drift is:
> (1 - output$coef[1]) * output$coef[2]
ar1
0.2565846
> predict(output, n.ahead = 5, newxreg = 1:5 + length(y))
$pred
Time Series:
Start = 2002
End = 2006
Frequency = 1
[1] 2302.410 2303.450 2304.507 2305.578 2306.660
$se
Time Series:
Start = 2002
End = 2006
Frequency = 1
[1] 0.9988481 2.0306506 3.1136020 4.1983926 5.2592863
291
200
0
100
y[1:400]
300
400
100
200
300
400
Index
Figure 8.33 ARIMA(1,1,0) with drift: an autoregressive behaviour characterized by an

unitary root with respect to a linear deterministic trend is present. Deviations from the
trend were represented in Fig. 8.32, they can be modelled as an ARIMA(1,1,0) process
without drift
8.4.3
2 unit roots in the characteristic equation p+2 (z) = 0
If 2 unit roots are present in p+2 (B) = 0 we have p+2 (B) = p (B)(1 B)2 =
p (B)2 = 0.
Also here we have to distinguish if a drift characterizes the differenced series.
No drift presence
When no drift is present, the model
p (B)2 Yt = (B)t
292
Estimation of an ARMA(1,2,0) no drift. See Fig. 8.34.

>
>
>
>
n <- 2000
drift <- 0
set.seed(123)
> (output <- arima(y, c(1, 2, 0)))
Series: y
ARIMA(1,2,0)
Coefficients:
ar1
0.7719
s.e. 0.0142
AIC=5678.2
AICc=5678.2
BIC=5689.4
$pred
Time Series:
Start = 2003
End = 2007
Frequency = 1
[1] 219352.7 219654.1 219955.5 220256.8 220558.2
$se
Time Series:
Start = 2003
End = 2007
Frequency = 1
[1] 0.9993847
2.9449759
5.9209014
9.9226820 14.9209790
With drift
When the drift is present for the differenced series, it corresponds to the presence of
a quadratic (deterministic) trend and the model
p (B)2 (Yt ct2 ) = q (B)t
> arima(x, c(p, 2, q), xreg = (1:length(x))^2)
293
4000
0
2000
y[1:400]
6000
8000
100
200
300
400
Index
Figure 8.34 ARIMA(1,2,0) no drift: an autoregressive behaviour with respect to local

linear trends with random slopes can be observed
The coefficient corresponding to xreg is the estimate for the coefficient c of the
quadratic deterministic trend; the estimate for the drift results (1) 2 xreg =
(1 1 . . . p ) 2 xreg.
Namely in case of an ARIMA(1,2,0) we have:
2 (Yt ct2 ) = 2 (Yt1 c(t 1)2 ) + t
and since 2 ct2 = [ct2 c(t 1)2 ] = (2ct c) = 2c
2 Yt 2c = [2 Yt1 2c] + t
2 Yt = (1 )2c + 2 Yt1 + t ;
the drift (1 )2c corresponds to a deterministic quadratic trend with coefficient c.
> n <- 2000
> drift <- 0.2
294
> set.seed(123)
> y <- arima.sim(model = list(order = c(1, 2, 0), ar = 0.8),
> (output <- arima(y, c(1, 2, 0), xreg = (1:length(y))^2))
Series: y
ARIMA(1,2,0)
Coefficients:
ar1 (1:length(y))^2
0.7701
0.5493
s.e. 0.0143
0.0490
AIC=5678.19
AICc=5678.2
BIC=5694.99
The coefficient corresponding to xreg is an estimate for the coefficient in a linear
model without the intercept describing y as a quadratic function of the time:
> lm(y ~ -1 + I((1:length(y))^2))
Call:
lm(formula = y ~ -1 + I((1:length(y))^2))
Coefficients:
I((1:length(y))^2)
0.5493
The estimate of the drift is:
> (1 - output$coef[1]) * 2 * output$coef[2]
ar1
0.2525803
> predict(output, n.ahead = 5, newxreg = (1:5 + length(y))^2)
$pred
Time Series:
Start = 2003
End = 2007
Frequency = 1
[1] 2222338 2224642 2226946 2229252 2231558
$se
Time Series:
Start = 2003
End = 2007
Frequency = 1
[1] 0.9988837
2.9417787
5.9114554
9.9022919 14.8840825
295
40000
0
20000
y[1:400]
60000
80000
100
200
300
400
Index
Figure 8.35 ARIMA(1,2,0) with drift: an autoregressive behaviour characterized by two

unitary roots with respect to a quadratic deterministic trend is present (thought we cannot
see it). Deviations from the trend can be modelled as an ARIMA(1,2,0) process without drift
8.5
Some other R functions for ARMA model parameter

estimation
Some other functions are present in the packages of the R system to obtain parameter
estimates for an ARIMA model. See the R help for more information.
Results may differ since numerical methods are adopted within different estimation
techniques, they may depend also on the assumptions made on the starting values of
t , when the order q of the moving average part of the model is greater than 1.
We simulate an ARIMA(2,1,0) process yt = {Yt } and apply some of the available
functions to the estimation of the parameters of an ARMA(2,0,0) model to Dyt = Yt .
> n <- 2000
> drift <- 0.2
> set.seed(123456)
296
> yt <- arima.sim(model = list(order = c(2, 1, 0),

ar = c(0.4, 0.2)), n = n, rand.gen = function(n) drift +
rnorm(n))
> Dyt <- diff(yt)
8.5.1
The arima function
drif t
Pay attention, the quantity called intercept is the estimate for the mean 1
=
1 2
0.5; moreover the covariance matrix of the estimates is found from the Hessian of the
log-likelihood, and so may be only a rough guide.
> arima(Dyt, c(2, 0, 0))

Series: Dyt
Coefficients:
ar1
ar2
0.4322 0.1971
s.e. 0.0219 0.0219
intercept
0.5043
0.0605
sigma^2 estimated as 1.008:

AIC=5700.41
AICc=5700.43
log likelihood=-2846.2
BIC=5722.81
Remember that if we apply an ARIMA(2,1,0) to the undifferenced series yt by means

of the arima function, we have to use also the argument xreg to take into account
the presence of the drift. The argument xreg must be used in arima by using the
time as external regressor.
> arima(yt, c(2, 1, 0), xreg = 1:length(yt))
Series: yt
ARIMA(2,1,0)
Coefficients:
ar1
ar2
0.4322 0.1971
s.e. 0.0219 0.0219
1:length(yt)
0.5038
0.0605

AIC=5700.41
AICc=5700.43
BIC=5722.81
Small differences in the parameter estimates may be present since numerical

procedures are considered to find roots maximizing the likelihood function. Maximum
likelihood estimators are consistent and the differences vanish for longer series.
The mean of Dyt = Yt corresponds to the average difference between consecutive
observations of Yt that is, in presence of a unit root, to the slope of a linear
deterministic trend describing the evolution of Yt , see Section 8.4.2.
8.5.2
297
The sarima function in the package astsa
If there is no integration or if the order of integration8 is 1, the function sarima in

the package astsa which wraps the use of xreg can be used.
For fitting an ARIMA(p,d,q) model (d=0 or 1) to a time series x the call is
sarima(x,p,d,q), see http://www.stat.pitt.edu/stoffer/tsa2/Examples.htm
for more details. It also return values of the AIC and BIC on the same scale as
Verbeeks formulae (8.68), (8.69).
> library(astsa)
> sarima(Dyt, 2, 0, 0, details = FALSE)
$fit
Series: xdata
Coefficients:
ar1
ar2
0.4322 0.1971
s.e. 0.0219 0.0219
xmean
0.5043
0.0605

AIC=5700.41
AICc=5700.43
BIC=5722.81
$AIC
[1] 1.011115
$AICc
[1] 1.012125
$BIC
[1] 0.01951616
The result is consistent with:
> sarima(yt, 2, 1, 0, details = FALSE)
$fit
Series: xdata
Coefficients:
ar1
ar2
0.4322 0.1971
s.e. 0.0219 0.0219
constant
0.5038
0.0605

AIC=5700.41
AICc=5700.43
8 For
BIC=5722.81
practical purposes the order d=1 is the one most frequently occurring.
298
Standardized Residuals
500
1000
1500
2000
Time
ACF of Residuals
Normal QQ Plot of Std Residuals

4
2
0
4
ACF
0.1
0.1
0.3
Sample Quantiles
10
20
30
40
50
LAG
0.4
p value
0.8
p values for LjungBox statistic
0.0
10
15
20
lag
Figure 8.36 Diagnostic tools from sarima for the model ARIMA(4,0,0) applied to the
differenced time series
$AIC
[1] 1.011113
$AICc
[1] 1.012123
$BIC
[1] 0.01951124
Observe that the constant is the estimate for the coefficient defining the linear
deterministic trend.
Some diagnostic graphs for the residuals are also produced, see Figures 8.36.
8.5.3
299
The Arima function in the package forecast
It is available in the package forecast. Like sarima it is essentially a wrapper for

arima. Also in this case the quantity in the output, which is named intercept
corresponds to the estimate for the mean. Some error measures for the training set
are returned.
> library(forecast)
> ar4 <- Arima(Dyt, c(2, 0, 0))
> summary(ar4)
Series: Dyt
Coefficients:
ar1
ar2
0.4322 0.1971
s.e. 0.0219 0.0219
intercept
0.5043
0.0605

AIC=5700.41
AICc=5700.43
BIC=5722.81
Training set error measures:

ME
RMSE
MAE
-7.814393e-04 1.004066e+00 8.052378e-01
MAPE
MASE
2.395949e+02 8.727309e-01
8.5.4
MPE
6.975735e+01
The armaFit function
For ARIMA models (also fractionally differenced). Pay attention: the mean of the
time series is first estimated, according also to Brockwell and Davis (1991), then other
parameters are estimated for the de-meaned series (without drift). Also in this case
the quantity in the output, which is named intercept corresponds to the estimate for
the mean. With the function armaFit in the package fArma the estimation methods
"MLE" Maximum Likelihood and "ols" Ordinary Least Squares may be used.
> library(fArma)
> ar2 <- armaFit(Dyt ~ ar(2), data = Dyt, method = "ols")
> summary(ar2)
Title:
ARIMA Modelling
Call:
armaFit(formula = Dyt ~ ar(2), data = Dyt, method = "ols")
Model:
300
AR(2) with method: ols

Coefficient(s):
ar1
ar2
0.4323
0.1971
Residuals:
Min
1Q
-3.75165 -0.70435
intercept
0.5034
Median
0.01726
3Q
0.69478
Max
3.68815
Moments:
Skewness Kurtosis
0.012757 -0.004689
Coefficient(s):
Estimate Std. Error t
ar1
0.43233
0.02193
ar2
0.19710
0.02192
intercept
0.50344
0.02246
--Signif. codes: 0 "***" 0.001 "**"
sigma^2 estimated as:
AIC Criterion:
value Pr(>|t|)
19.71
<2e-16 ***
8.99
<2e-16 ***
22.41
<2e-16 ***
0.01 "*" 0.05 "." 0.1 " " 1
NULL
0
Description:
Fri May 24 17:10:56 2013 by user: gabriele.cantaluppi
8.5.5
The FitARMA function
For ARIMA models (also fractionally differenced). It uses a Fast Maximum Likelihood
estimation method as proposed by McLeod and Zhang (2008). Here the parameter
mean is indicated correctly.
> library(FitARMA)
> a <- FitARMA(Dyt, c(2, 0, 0))
> summary(a)
ARIMA(2,0,0)
length of series = 2000 , number of parameters = 3
loglikelihood = -8.33 , aic = 22.7 , bic = 39.5
> coef(a)
MLE
sd
Z-ratio
phi(1) 0.4321938 0.02192204 19.7150305
phi(2) 0.1970989 0.02192204 8.9909012
mu
0.5034386 1.18871170 0.4235161
301
The Ljung-Box statistics, see Section 8.10.2, can also be extracted

> a$Ljun[1:12, ]
m
Qm
pvalue
1 0.03 0.8669619
2 0.07 0.7984524
3 1.50 0.2203072
4 3.00 0.2227739
5 3.09 0.3774278
6 4.68 0.3212406
7 5.80 0.3257347
8 7.07 0.3145735
9 7.60 0.3690962
10 9.00 0.3420269
11 9.03 0.4348361
12 9.34 0.5005130
Observe that in case of estimation of a Moving Average model by means of FitARMA
the coefficients have signes opposite to those obtained with arima:
> arima(Dyt, c(0, 0, 2))
Series: Dyt
Coefficients:
ma1
ma2
0.4229 0.3051
s.e. 0.0218 0.0191
intercept
0.5036
0.0399

AIC=5809.82
AICc=5809.84
BIC=5832.23
> library(FitARMA)
> a <- FitARMA(Dyt, c(0, 0, 2))
> coef(a)
MLE
sd
Z-ratio
theta(1) -0.4228582 0.02129473 -19.85741
theta(2) -0.3050715 0.02129473 -14.32615
mu
0.5034386 0.02307363 21.81878
8.5.6
The ar function
The function ar fits an autoregressive time series model to the data, by default
selecting the complexity of the model by the Akaike Information Criterion.
> ar(Dyt, order.max = 8, aic = TRUE)
302

Table 8.2
Functions for forecasting with ARMA models
package
function to estimate
function for predicting
stats
arima
sarima by Stoffer
armaFit
FitAR
predict
sarima.for
predict
predict
fArma
FitAR
Call:
ar(x = Dyt, aic = TRUE, order.max = 8)
Coefficients:
1
2
0.4320 0.1972
Order selected 2
8.5.7
sigma^2 estimated as
1.01
The arima function in the package TSA
For ARIMA models with also the presence of exogenous variables affecting the
response through a transfer function, the function arima in the package TSA can
be used.
> library(TSA)
> arima(Dyt, c(2, 0, 0))
Series: x
Coefficients:
ar1
ar2
0.4322 0.1971
s.e. 0.0219 0.0219
intercept
0.5043
0.0605

AIC=5698.41
AICc=5698.43
8.6
BIC=5720.81
R functions for predicting with ARMA models
According to the function one uses to obtain the parameter estimates of an ARIMA
model there exist corresponding functions to forecast future values of the time series,
see Table 8.2.
Remember to use new.xreg with stats::arima when integrated ARIMA models
are considered.
303
Examples of code
library: stats, function: predict.

The first argument of predict is an object obtained with arima, the second
argument is the number of steps-ahead to consider for the forecast.
> arimaest <- arima(Dyt, c(2, 0, 0))
> predict(arimaest, 4)
$pred
Time Series:
Start = 2002
End = 2005
Frequency = 1
[1] 0.1758992 0.2301188 0.3210887 0.3710921
$se
Time Series:
Start = 2002
End = 2005
Frequency = 1
[1] 1.004066 1.093829 1.159756 1.186845
In presence of an integrated process one has to remember to include the time
corresponding to the external regressor xreg used with arima; the argument in
predict is named newxreg and corresponds here to periods n+1, n+2, n+3, n+4.
> n <- length(yt)
> a <- arima(yt, c(2, 1, 0), xreg = 1:n)
> predict(a, 4, newxreg = 1:4 + n)
$pred
Time Series:
Start = 2002
End = 2005
Frequency = 1
[1] 1007.053 1007.283 1007.603 1007.974
$se
Time Series:
Start = 2002
End = 2005
Frequency = 1
[1] 1.004066 1.753864 2.530035 3.272484
function sarima.for in the package astsa.

The first argument is the time series to be modeled. The second argument is
304
the number of steps-ahead to consider for the forecast; the three successive
arguments specify the order of the ARIMA model.
> sarima.for(Dyt, 4, 2, 0, 0)
$pred
Time Series:
Start = 2002
End = 2005
Frequency = 1
[1] 0.1758992 0.2301188 0.3210887 0.3710921
$se
Time Series:
Start = 2002
End = 2005
Frequency = 1
[1] 1.004066 1.093829 1.159756 1.186845
sarima.for produces consistent forecasts also for the original time series
without having to include the new.xreg argument corresponding to the external
regressor.
> sarima.for(yt, 4, 2, 1, 0)
$pred
Time Series:
Start = 2002
End = 2005
Frequency = 1
[1] 1007.053 1007.283 1007.603 1007.974
$se
Time Series:
Start = 2002
End = 2005
Frequency = 1
[1] 1.004066 1.753864 2.530035 3.272484
The functions predict in the packages fArma and FitAR have the same structure
of the function predict in the package stats.
> library(fArma)
> ar4 <- armaFit(Dyt ~ ar(2), data = Dyt, method = "ols")
> predict(ar4, 4)
$pred
Time Series:
Dyt
1900
1920
1940
1960
1980
2000
Time
Figure 8.37
Forecasts and their confidence intervals obtained with sarima.for
Start = 2001
End = 2004
Frequency = 1
[1] 0.1746484 0.2283741 0.3188913 0.3686140
$se
Time Series:
Start = 2001
End = 2004
Frequency = 1
[1] 1.004065 1.093883 1.159847 1.186961
$out
Time Series:
Start = 2001
End = 2004
Frequency = 1
305
306
2001
2002
2003
2004
Low 95
-1.7933
-1.9156
-1.9544
-1.9578
Low 80 Forecast High 80 High 95

-1.1121
0.1746 1.4614 2.1426
-1.1735
0.2284 1.6302 2.3723
-1.1675
0.3189 1.8053 2.5921
-1.1525
0.3686 1.8898 2.6950
> library(FitAR)
> a <- FitAR(Dyt, c(2))
> predict(a, 4)
$Forecasts
1
2
3
4
2000 0.1755649 0.2296387 0.3204801 0.3703991
$SDForecasts
1
2
3
4
2000 1.003959 1.093712 1.159633 1.186718
sarima.for and the method for fArma produce also graphs with the forecasts and
their confidence intervals, see Fig. 8.37.
8.7
Stock Prices and Earnings (Section 8.4.4)
Data on the ratio of the S&P composite stock price index and S&P composite earnings
over the period 18712009 (T = 139) are considered; they can be read by means of
the function readEViews, having extracted the file priceearnings.wf1 from the
compressed archive ch08.zip.
Last line (140) has to be dropped, since the corresponding observation does not exist.
The function ts(object, start, frequency) creates a multiple time series from
the columns of a table; in this case there is no need to specify the frequency since
data are annual.
> library(hexView)
> pe <- readEViews(unzip("ch08.zip", "Chapter 8/priceearnings.wf1"))
> pe <- pe[-140, ]
> pe <- ts(pe, start = 1871)
lne log earnings
lnp log price
lnpe log price to earnings
To obtain a plot of the log of the stock price and of the earnings series use the function
xyplot available in the package lattice. Remind that a multiple time series, mts,
307
LNE
LNP
1900
1950
2000
Time
Figure 8.38
Log stock price and earnings, 18712009
object can be treated like a matrix so it is possible to make reference to the appropriate
columns of the object pe
> library(lattice)
> xyplot(pe[, 1:2], superpose = TRUE)
8.7.1
Dickey-Fuller test - construction
As Verbeek observes it is clear that both the log price and log earnings series are not
weakly stationary, he suggests to test if the non stationarity is due to the presence of
a stationary trend or of one or more deterministic roots.
To test for the presence of a unit root we have to consider the standard Dickey-Fuller
regression, see Verbeeks equation (8.58):
Yt = + t + 1 Yt1 + et
(8.12)
Let y be the log price series pe[,2]. The estimates of the parameters in relationship
(8.12) can be obtained by making use of the function dynlm as:
308
> y <- pe[, 2]

> library(dynlm)
> summary(dynlm(d(y) ~ c(1:138) + L(y)))
Start = 1872, End = 2009
Call:
dynlm(formula = d(y) ~ c(1:138) + L(y))
Residuals:
Min
1Q
-0.56867 -0.10424
Median
0.01858
3Q
0.11953
Max
0.34743
Coefficients:
(Intercept) 0.4370623 0.1647873
2.652 0.00895 **
c(1:138)
0.0017627 0.0007406
2.380 0.01870 *
L(y)
-0.0984286 0.0375499 -2.621 0.00977 **
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The Dickey-Fuller statistic is given by the t statistic for the lagged variable coefficient
(2.621): the coefficient must be different from 0 for refusing the presence of a
unit root; its t statistic has to be compared with proper critical values to assess
a conclusion.
8.7.2
Dickey-Fuller test - direct function
The result can be obtained directly by using the function ur.df available in the
package urca. The function ur.df has four parameters. The first is a time series.
With the parameter type it is possible to specify if only a constant has to be included
in model (8.12) (type="drift") or both a trend and the drift (type="trend") or
neither the drift nor the trend (type="none") have to be included. The parameter
lags specifies a number of lags for Yt to include in the regression (8.12); selectlags,
which by default is equal to "fixed", may be set to "AIC" or "BIC" for obtaining an
automatic lag selection according to the Akaike or the Bayesian Information criteria,
within the maximum number of lags specified by lags.
> library(urca)
> summary(ur.df(y, type = "trend", lags = 0))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
309
###############################################
Test regression trend
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt)
Residuals:
Min
1Q
-0.56867 -0.10424
Median
0.01858
3Q
0.11953
Max
0.34743
Coefficients:
(Intercept) 0.4370623 0.1647873
2.652 0.00895 **
z.lag.1
-0.0984286 0.0375499 -2.621 0.00977 **
tt
0.0017627 0.0007406
2.380 0.01870 *
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Value of test-statistic is: -2.6213 2.8316 3.465

Critical values for test statistics:
1pct 5pct 10pct
tau3 -3.99 -3.43 -3.13
phi2 6.22 4.75 4.07
phi3 8.43 6.49 5.47
The resulting statistic (the first value in the section Value of test-statistic is:)
is the augmented Dickey-Fuller Test for unit root.
Verbeek provides tests for up to six additional lags of Yt . The ADF(1)-ADF(6)
statistics can be obtained by using the function ur.df specifying the value of lags
from 1 to 6.
###############################################
###############################################
310
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
Residuals:
Min
1Q
-0.54915 -0.10718
Median
0.01243
3Q
0.11533
Max
0.32751
Coefficients:
0.169427
2.767 0.00647 **
z.lag.1
-0.106162
0.038692 -2.744 0.00691 **
tt
0.001897
0.000760
2.496 0.01376 *
z.diff.lag
0.077647
0.086853
0.894 0.37293
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

1pct 5pct 10pct
tau3 -3.99 -3.43 -3.13
phi2 6.22 4.75 4.07
phi3 8.43 6.49 5.47
###############################################
###############################################
Call:
Residuals:
Min
1Q
-0.57092 -0.10709
Median
0.01747
3Q
0.12584
Max
0.38166
Coefficients:
(Intercept)

0.4050820 0.1746234
2.320
0.0219 *
z.lag.1
-0.0907927
tt
0.0016462
z.diff.lag1 0.0766940
z.diff.lag2 -0.1370071
0.0399401
0.0007765
0.0867307
0.0906890
-2.273
2.120
0.884
-1.511
311
0.0246 *
0.0359 *
0.3782
0.1333
0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1


1pct 5pct 10pct
tau3 -3.99 -3.43 -3.13
phi2 6.22 4.75 4.07
phi3 8.43 6.49 5.47
###############################################
###############################################
Call:
Residuals:
Min
1Q
-0.61682 -0.11229
Median
0.01993
3Q
0.11150
Max
0.39296
Coefficients:
(Intercept) 0.4710862 0.1780298
2.646 0.00916 **
z.lag.1
-0.1067164 0.0407667 -2.618 0.00991 **
tt
0.0018926 0.0007863
2.407 0.01750 *
z.diff.lag1 0.1131589 0.0887265
1.275 0.20447
z.diff.lag2 -0.1370854 0.0902885 -1.518 0.13138
z.diff.lag3 0.1613740 0.0910693
1.772 0.07876 .
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
312

Adjusted R-squared:
p-value: 0.02625
0.05765

1pct 5pct 10pct
tau3 -3.99 -3.43 -3.13
phi2 6.22 4.75 4.07
phi3 8.43 6.49 5.47
###############################################
###############################################
Call:
Residuals:
Min
1Q
-0.59392 -0.10608
Median
0.02363
3Q
0.10998
Max
0.35148
Coefficients:
(Intercept) 0.4221223 0.1843159
2.290
0.0237 *
z.lag.1
-0.0952906 0.0422614 -2.255
0.0259 *
tt
0.0017278 0.0008067
2.142
0.0341 *
z.diff.lag1 0.1168995 0.0891027
1.312
0.1919
z.diff.lag2 -0.1617383 0.0935295 -1.729
0.0862 .
z.diff.lag3 0.1631230 0.0913932
1.785
0.0767 .
z.diff.lag4 -0.0994732 0.0926253 -1.074
0.2849
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

313
1pct 5pct 10pct

tau3 -3.99 -3.43 -3.13
phi2 6.22 4.75 4.07
phi3 8.43 6.49 5.47
###############################################
###############################################
Call:
Residuals:
Min
1Q
-0.59366 -0.10540
Median
0.02747
3Q
0.11002
Max
0.34461
Coefficients:
(Intercept) 0.4220206 0.1890821
2.232
0.0274 *
z.lag.1
-0.0934907 0.0434077 -2.154
0.0332 *
tt
0.0016215 0.0008219
1.973
0.0507 .
z.diff.lag1 0.1141232 0.0910061
1.254
0.2122
z.diff.lag2 -0.1582504 0.0937244 -1.688
0.0938 .
z.diff.lag3 0.1553505 0.0946890
1.641
0.1034
z.diff.lag4 -0.0971210 0.0927473 -1.047
0.2970
z.diff.lag5 -0.0172506 0.0931058 -0.185
0.8533
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

1pct 5pct 10pct
tau3 -3.99 -3.43 -3.13
phi2 6.22 4.75 4.07
phi3 8.43 6.49 5.47
314
###############################################
###############################################
Call:
Residuals:
Min
1Q
-0.60505 -0.09813
Median
0.02635
3Q
0.10937
Max
0.36751
Coefficients:
(Intercept) 0.4665094 0.1942144
2.402
0.0178 *
z.lag.1
-0.1045930 0.0446025 -2.345
0.0206 *
tt
0.0018028 0.0008365
2.155
0.0331 *
z.diff.lag1 0.1306554 0.0923672
1.415
0.1597
z.diff.lag2 -0.1370080 0.0958230 -1.430
0.1553
z.diff.lag3 0.1501264 0.0949723
1.581
0.1165
z.diff.lag4 -0.0682406 0.0960788 -0.710
0.4789
z.diff.lag5 -0.0208425 0.0933334 -0.223
0.8237
z.diff.lag6 0.1097285 0.0933387
1.176
0.2420
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

1pct 5pct 10pct
tau3 -3.99 -3.43 -3.13
phi2 6.22 4.75 4.07
phi3 8.43 6.49 5.47
Observe that the function adf.test is also available in the package tseries; but this
function incorporates both the drift and the trend in the regression, by default. Thus
tseries::adf.test(y,k=k) is equivalent to ur.df(y,type="trend",lags=k).
> library(tseries)
> adf.test(y, k = 0)
315
Augmented Dickey-Fuller Test

data: y
Dickey-Fuller = -2.6213, Lag order = 0, p-value = 0.3179
alternative hypothesis: stationary
8.7.3
How to produce the Dickey-Fuller statistic for different lags
It is possible to create a function that simplifies the code writing, without repeating
7 times the same command. Observe that the function ur.df uses classes of type S4,
so one has to use the @, and not the $ sign, to extract a slot9 from an object of class
S4 produced by ur.df, see str(ur.df(y,type="drift",lags=6)).
> f <- function(x) {
urtest <- ur.df(y, type = "trend", lags = x)
c(stat = urtest@teststat[1], "5% crit. value" = urtest@cval[1,
2])
}
> a <- 0:6
> names(a) <- c("DF", paste("ADF(", 1:6, ")", sep = ""))
> round(sapply(a, f), 3)
DF ADF(1) ADF(2) ADF(3) ADF(4) ADF(5) ADF(6)
stat
-2.621 -2.744 -2.273 -2.618 -2.255 -2.154 -2.345
5% crit. value -3.430 -3.430 -3.430 -3.430 -3.430 -3.430 -3.430
The function f with argument x is defined, which extracts from the object resulting
from ur.df applied to the time series y, with type="trend" and lags=x, the first
element of the slot teststat, which is the Dickey-Fuller statistic, and the element in
the first row, second column of the slot cval, which is the 5% critical value (see also
the Critical values for test statistics section in the preceding outputs).
The variable a contains the desired lags for the unit root test.
The names DF and ADF(1) to ADF(6) are assigned to the elements of a.
The function sapply is finally used to call the function f for the different values of
the lags in the array a.
8.7.4
Other tests for unit roots detection
None of the preceding tests implies a rejection of the null hypothesis of unit root.
Verbeek suggests to use the Phillips-Perron and the KPSS tests10 for unit root, the
tests can be obtained with the functions ur.pp and ur.kpss available in the package
urca.
The arguments of the function ur.pp are the time series x to be tested for a unit root,
the type, which can be "Z-alpha" or "Z-tau"; the model, with values "constant"
9 Elements
of objects belonging to an S4 class are named slots.

that the null hypothesis fo the KPSS test is stationarity or trend stationarity.
10 Remember
316
or "trend", determining the deterministic part in the test regression, and lags,
specifying the lags used for correction of error term, which can be "short" or "long";
an exact number of lags can be specified with the argument use.lag. The output has
a structure similar to that of ur.df. See the help ?ur.pp for more information.
The arguments of the function kpss.pp are the time series x to be tested for a unit
root, the type, which can be "mu" or "tau"; lags, specicying the maximum number
of lags used for correction of error term, which can be "short", "long", or "nil".
An exact number of lags can be specified with the argument use.lag. The output
has a structure similar to that of ur.df. Only the version with Bartlett weights is
implemented. See the help ?ur.kpss for more information.
> summary(ur.pp(y, type = "Z-tau", model = "trend",
use.lag = 6))
##################################
# Phillips-Perron Unit Root Test #
##################################
Test regression with intercept and trend
Call:
lm(formula = y ~ y.l1 + trend)
Residuals:
Min
1Q
-0.56867 -0.10424
Median
0.01858
3Q
0.11953
Max
0.34743
Coefficients:
(Intercept) 0.5586889 0.2065375
2.705 0.00771 **
y.l1
0.9015714 0.0375499 24.010 < 2e-16 ***
trend
0.0017627 0.0007406
2.380 0.01870 *
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Value of test-statistic, type: Z-tau
Z-tau-mu
Z-tau-beta
aux. Z statistics
2.5749
2.4219
Critical values for Z statistics:
is: -2.6634
317
1pct
5pct
10pct
critical values -4.02682 -3.442804 -3.14582
> summary(ur.kpss(y, type = "tau", use.lag = 6))
#######################
# KPSS Unit Root Test #
#######################
Test is of type: tau with 6 lags.
Value of test-statistic is: 0.2233
Critical value for a significance level of:
10pct 5pct 2.5pct 1pct
critical values 0.119 0.146 0.176 0.216
The KPSS statistic 0.2233 is larger than the critical value 0.146 thus rejecting trend
stationarity in favour of a unit root.
The KPSS statistic can also be obtained by means of the function kpss.test available
in the package tseries but the function does not allow an exact lag to be specified.
8.7.5
Testing for multiple unitary roots
By imposing a first unit root it is possible to test for the presence of a second unit
root with regressions of the form, see Verbeek p. 298:
2 Yt = + Yt1 + c1 2 Yt1 + . . . + t
(8.13)
the null hypothesis corresponds to = 0.

We can define a function g computing the Dickey-Fuller statistic with type drift
since as Verbeek observes it seems unlikely that stock returns exhibit a deterministic
trend for the differenced series diff(y) and then use the function sapply to obtain
the results of the function g applied to the different values of the lags in the array a,
defined above. (The 5% critical value is the same as before).
> g <- function(x) summary(ur.df(diff(y), type = "drift",
lags = x))@teststat[1]
> (adf <- round(sapply(a, g), 3))
-11.289 -9.602 -6.558 -6.522 -5.997 -4.921 -4.106
The ADF(6) value strongly rejects the unit root bypothesis. The KPSS test with
Bartlett weights results.
> summary(ur.kpss(diff(y), type = "mu", use.lag = 6))
#######################
#######################
318
Test is of type: mu with 6 lags.

critical values 0.347 0.463 0.574 0.739
The KPSS statistics 0.054 is lower that the 5% critical value 0.463 also indicating
that the first-differenced price series is likely to be stationary.
With regard to the log earnings series we have for the ADF(6) statistic:
> y <- pe[, 1]
###############################################
###############################################
Call:
Residuals:
Min
1Q
-0.99802 -0.10972
Median
0.03045
3Q
0.13696
Max
0.50098
Coefficients:
0.177367
2.835 0.00536 **
z.lag.1
-0.269302
0.096935 -2.778 0.00632 **
tt
0.004042
0.001498
2.698 0.00796 **
z.diff.lag1 0.057546
0.115829
0.497 0.62021
z.diff.lag2 -0.122193
0.109751 -1.113 0.26772
z.diff.lag3 -0.034512
0.105715 -0.326 0.74463
z.diff.lag4 -0.085099
0.102001 -0.834 0.40573
z.diff.lag5 -0.267248
0.095763 -2.791 0.00610 **
z.diff.lag6 0.049989
0.097278
0.514 0.60826
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
319

1pct 5pct 10pct
tau3 -3.99 -3.43 -3.13
phi2 6.22 4.75 4.07
phi3 8.43 6.49 5.47
the value 2.7782 does not reject the presence of a unit root; for the KPSS(6) statistic
we have:
> summary(ur.kpss(y, type = "tau", use.lag = 6))
#######################
#######################
Test is of type: tau with 6 lags.
critical values 0.119 0.146 0.176 0.216
marginally rejecting the trend stationarity at the 5% level. The Phillips Perron
statistic results 4.908 clearly rejecting the unit root hypothesis:
> summary(ur.pp(y, type = "Z-tau", model = "trend",
use.lag = 6))
##################################
# Phillips-Perron Unit Root Test #
##################################
Test regression with intercept and trend
Call:
lm(formula = y ~ y.l1 + trend)
Residuals:
Min
1Q
-1.04265 -0.11320
Median
0.04181
3Q
0.13670
Max
0.54002
Coefficients:
0.180139
5.157 8.72e-07 ***
y.l1
0.676123
0.063411 10.662 < 2e-16 ***
320
trend
0.004739
0.001050
4.515 1.37e-05 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Value of test-statistic, type: Z-tau
Z-tau-mu
Z-tau-beta
is: -4.908
aux. Z statistics
5.9746
4.3123
Critical values for Z statistics:

1pct
5pct
10pct
critical values -4.02682 -3.442804 -3.14582
Verbeek then analises the log of the price/earnings ratio. To obtain a graphical
representation of the series, see Fig. 8.39, use:
> library(lattice)
> xyplot(pe[, 3])
Verbeek observes that the series seems to fluctuate around a long-run average but the
mean reverting seems to need several yars to happen. For this series we have ADF(0)
and ADF(6) statistics without trend respectively of 4.424 and 2.208 the first one
rejecting the presence of a unit root.
> y <- pe[, 3]
> ur.df(y, type = "drift", lags = 0)
###############################################################
# Augmented Dickey-Fuller Test Unit Root / Cointegration Test #
###############################################################
The value of the test statistic is: -4.424 9.8065
> ur.df(y, type = "drift", lags = 6)
###############################################################
###############################################################
Verbeek observes that by considering the presence of a trend the situation does not
make much difference; namely we have:
> ur.df(y, type = "trend", lags = 0)
321
2.0
2.5
3.0
3.5
1900
1950
2000
Time
Figure 8.39
Log price/earnings ratio, 18712009
###############################################################
###############################################################
The value of the test statistic is: -4.6422 7.2047 10.7863
> ur.df(y, type = "trend", lags = 6)
###############################################################
###############################################################
The KPSS(6) statistic with error correction using the Bartlett kernel results 0.331
and does not reject the null of no unit root:
> summary(ur.kpss(y, type = "mu", use.lag = 6))
#######################
322
#######################
critical values 0.347 0.463 0.574 0.739
The conclusion is that data are not sufficiently informative to distinguish between the
two hypotheses; and mean reversion, if present, is very slow.
8.8
Some remarks on the function ur.df
As the reader has surely observed, the function ur.df (with the options none, drift
and trend) produces respectively from 1 to 3 statistics named tau1/tau2/tau3,
phi1/phi2 and phi3. We now consider the hypotheses underlying those test statistics.
No autocorrelations for the differenced series in the augmented Dickey-Fuller test are
taken into account for simplicity.
8.8.1
The Dickey-Fuller test for a unit root, type "none"
The following process is considered:

Yt ut
ut = ut1 + t ,
(8.14)
by substituting the first relationship in the latter one we have an AR(1) process.
Note that if = 1 we have a random walk
Yt = Yt1 + t .
The augmented Dickey-Fuller test consists in testing the null 1 = 0 in the following
model
Yt = ( 1)Yt1 + t .
In the ur.df output will appear only one statistic: tau1, which corresponds to the
hypothesis:

H0 : random walk without drift
H1 : stationary AR(1) without drift
1%, 5% and 10% critical values for the tau1 statistic are reported in the output. The
null hypothesis of unit root will be rejected at a given critical value if the DickeyFuller statistic is lower than the corresponding critical value. This happens when
the statistic is negative and very far from 0 which gives evidence that the process is
stationary.
8.8.2
323
Dickey-Fuller test for a unit root, type drift

Yt = + ut
.
ut = ut1 + t
(8.15)
Substitute ut = Yt in the second relationship to obtain an AR(1) process with

drift:
Yt = (1 ) + Yt1 + t .
(8.16)
Note that if = 1 the last relationship becomes
Yt = Yt1 + t
that is an ARIMA(0,1,0) without drift (a random walk).
model
Yt = a + ( 1)Yt1 + t
(8.17)
The test is named tau2 in the ur.df output and corresponds to the following set of
hypotheses

H0 : random walk, see (8.16)
H1 : stationary AR(1) with drift, see (8.15) and (8.16).
null hypothesis of unit root will be rejected at a given critical value if the DickeyFuller statistic is lower than the corresponding critical value. This happens when
the statistic is negative and very far from 0 which gives evidence that the process is
stationary.
Another test, named phi1, regards the null H0 (a = ( 1) = 0) (that is if Yt is a
random walk without drift), so corresponding to the following hypothesis

H0 : random walk without drift
H1 : stationary AR(1) without drift or random walk with drift;
the second option under H1 is described by the presence of a linear trend and clearly
requires a unit root test with type trend; the behaviour of such a time series can be
clearly detected from its graphical representation, see Section 8.4.2.
8.8.3
Dickey-Fuller test for a unit root, type trend

Yt = + t + ut
.
ut = ut1 + t
(8.18)
that is we have a linear deterministic trend with the presence of an autocorrelated

error; if < 1 {Yt } is said trend stationary. Note that {Yt } is not an ARIMA process.
Substitute ut = Yt t in the second relationship to obtain:
Yt = [ (1 ) + ] + [ (1 )]t + Yt1 + t .
(8.19)
324
Note that if = 1 the last relationship becomes

Yt = + Yt1 + t
(8.20)
that is an ARIMA(0,1,0) (a random walk) with drift, and {Yt } is said difference
stationary; namely, it results
Yt = + t
model
Yt = a + bt + ( 1)Yt1 + t
The test is named tau3 in the ur.df output and corresponds to the following set of
hypotheses

H0 : difference stationary (random walk with drift)
H1 : trend stationary
null hypothesis of unit root will be rejected at a given critical value if the DickeyFuller statistic is lower than the corresponding critical value. This happens when the
statistic is negative and very far from 0 which gives evidence that the process is trend
stationary.
Other tests, say phi2 and phi3, regard the null hypotheses:
(a = b = ( 1) = 0) (Yt is a random walk without drift rather trend stationary)
(b = ( 1) = 0) (Yt is a random walk with drift)
A parabolic behaviour can be also described under the alternative hypotheses

(when b 6= 0 and 1 = 0) but this would be clearly visible from the graphical
representation of the time series, requiring to verify the presence of a second unit
root, see Section 8.4.3.
8.8.4
Example
We simulate a realization from process (8.18), with n = 1000, = 0.4, = 0.5 and
= 0.8.
>
>
>
>
>
>
>
>
>
set.seed(456)
n <- 1000
alpha <- 0.4
gamma <- 0.5
theta <- 0.8
epsilon <- rnorm(n)
ut <- arima.sim(model <- list(ar = theta), n, innov = epsilon)
t <- 1:n
yt <- alpha + gamma * t + ut
Figure 8.40 shows the first 200 observations of the simulated time series.
325
20
40
60
80
100
50
100
150
200
Time
Figure 8.40
Simulation of a trend stationary series, see (8.18)
> library(lattice)
> xyplot(ts(cbind(yt[1:200], seq(1, 100, length = 200))),
superpose = TRUE, auto.key = FALSE)
Figure 8.41 shows 200 observations from the simulation of a difference stationary time
series.
> library(lattice)
> xyplot(ts(cbind(cumsum(gamma + epsilon[1:200]), seq(1,
100, length = 200))), superpose = TRUE, auto.key = FALSE)
We now estimate model (8.19) by making use of dynlm
Yt = 0 + 0 t + Yt1 + t
> library(dynlm)
> summary(dynlm(yt ~ t + L(yt, 1)))
20
40
60
80
100
326
50
100
150
200
Time
Figure 8.41
Simulation of a difference stationary series, see (8.20)

Start = 2, End = 1000
Call:
dynlm(formula = yt ~ t + L(yt, 1))
Residuals:
Min
1Q
-3.0157 -0.6895
Median
0.0007
3Q
0.7159
Max
3.0052
Coefficients:
0.062490
9.048
<2e-16 ***
t
0.099609
0.009481 10.507
<2e-16 ***
L(yt, 1)
0.800671
0.018970 42.207
<2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
327

Multiple R-squared:
1,
Adjusted R-squared:
1
> alpha * (1 - theta) + theta * gamma
[1] 0.48
> gamma * (1 - theta)
[1] 0.1
> theta
[1] 0.8
Let us now check the presence of a unit root by using the augmented Dickey-Fuller
test, by introducing, only for completeness, also an autocorrelation lag of order 1.
Only for checking the behaviour of the function ur.df the options "none" and drift
are considered. So in these instances conclusions on the unit root presence can be
misleading since the series clearly shows the presence of a trend, see Fig. 8.40.
none
We can only test for the presence of a unit root
> library(dynlm)
> summary(dynlm(d(yt) ~ -1 + L(yt, 1) + L(d(yt), 1)))
Start = 3, End = 1000
Call:
dynlm(formula = d(yt) ~ -1 + L(yt, 1) + L(d(yt), 1))
Residuals:
Min
1Q
-3.0465 -0.6226
Median
0.1114
3Q
0.8558
Max
3.0957
Coefficients:
L(yt, 1)
0.0015126 0.0001259 12.015
<2e-16 ***
L(d(yt), 1) -0.0056115 0.0317110 -0.177
0.86
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The Dickey-Fuller statistic corresponds to the t-value pertaining L(yt, 1):
328
> library(urca)
> summary(ur.df(yt, type = "none", lags = 1))
###############################################
###############################################
Test regression none
Call:
lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
Residuals:
Min
1Q
-3.0465 -0.6226
Median
0.1114
3Q
0.8558
Max
3.0957
Coefficients:
z.lag.1
0.0015126 0.0001259 12.015
<2e-16 ***
z.diff.lag -0.0056115 0.0317110 -0.177
0.86
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

1pct 5pct 10pct
tau1 -2.58 -1.95 -1.62
drift
We first test for the presence of a unit root
> summary(dynlm(d(yt) ~ L(yt, 1) + L(d(yt), 1)))
Start = 3, End = 1000
Call:
dynlm(formula = d(yt) ~ L(yt, 1) + L(d(yt), 1))
Residuals:
Min
1Q
-3.1111 -0.7205
Median
0.0281
3Q
0.7169
329
Max
3.1238
Coefficients:
(Intercept) 5.400e-01 6.752e-02
7.997 3.52e-15 ***
L(yt, 1)
-1.666e-05 2.269e-04 -0.073
0.9415
L(d(yt), 1) -6.498e-02 3.164e-02 -2.054
0.0403 *
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Adjusted R-squared:
0.002226
The Dickey-Fuller statistic corresponds to the t-value pertaining L(yt, 1) and

appears (tau2) at the first position in the ur.df output at position Value of the
test statistic is:.
Now we test the null hypothesis = 1 = 0
> anova(dynlm(d(yt) ~ L(yt, 1) + L(d(yt), 1)), dynlm(d(yt) ~
-1 + L(d(yt), 1)))
Model 1: d(yt) ~ L(yt, 1) + L(d(yt), 1)
Model 2: d(yt) ~ -1 + L(d(yt), 1)
Res.Df
RSS Df Sum of Sq
F
Pr(>F)
1
995 1059.6
2
997 1291.2 -2
-231.54 108.71 < 2.2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The F test appears (phi1) at the second position in the ur.df output at position
Value of the test statistic is:
> ur.df(yt, type = "drift", lags = 1)
###############################################################
###############################################################
> summary(ur.df(yt, type = "drift", lags = 1))
###############################################
###############################################
330
Test regression drift
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
Residuals:
Min
1Q
-3.1111 -0.7205
Median
0.0281
3Q
0.7169
Max
3.1238
Coefficients:
(Intercept) 5.400e-01 6.752e-02
7.997 3.52e-15 ***
z.lag.1
-1.666e-05 2.269e-04 -0.073
0.9415
z.diff.lag -6.498e-02 3.164e-02 -2.054
0.0403 *
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Adjusted R-squared:
Value of test-statistic is: -0.0734 108.7121

1pct 5pct 10pct
tau2 -3.43 -2.86 -2.57
phi1 6.43 4.59 3.78
trend
We first test for the presence of a unit root
> t <- t[-n]
> summary(dynlm(d(yt) ~ t + L(yt, 1) + L(d(yt), 1)))
Start = 3, End = 1000
Call:
dynlm(formula = d(yt) ~ t + L(yt, 1) + L(d(yt), 1))
Residuals:
Min
1Q
-2.99426 -0.68188
Median
0.00284
3Q
0.71159
Max
3.03239
0.002226
331
Coefficients:
0.065077
9.979
<2e-16 ***
t
0.103120
0.009998 10.314
<2e-16 ***
L(yt, 1)
-0.206347
0.020006 -10.314
<2e-16 ***
L(d(yt), 1) 0.037754
0.031690
1.191
0.234
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The Dickey-Fuller statistic corresponds to the t-value pertaining L(yt, 1) and
appears (tau3) at the first position in the ur.df output at position Value of the
test statistic is:
Now we test the null hypothesis = = 1 = 0
> anova(dynlm(d(yt) ~ t + L(yt, 1) + L(d(yt), 1)),
dynlm(d(yt) ~ -1 + L(d(yt), 1)))
Model 1: d(yt) ~ t + L(yt, 1) + L(d(yt), 1)
Model 2: d(yt) ~ -1 + L(d(yt), 1)
Res.Df
RSS Df Sum of Sq
F
Pr(>F)
1
994 957.17
2
997 1291.15 -3
-333.98 115.61 < 2.2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The F test will appear (phi2) at the second position in the ur.df output at position
Value of the test statistic is:.
We finally test the null hypothesis = 1 = 0
dynlm(d(yt) ~ 1 + L(d(yt), 1)))
Model 1: d(yt) ~ t + L(yt, 1)
Model 2: d(yt) ~ 1 + L(d(yt),
Res.Df
RSS Df Sum of Sq
1
994 957.17
2
996 1059.61 -2
-102.44
--Signif. codes: 0 "***" 0.001
+ L(d(yt), 1)
1)
F
Pr(>F)
53.193 < 2.2e-16 ***
"**" 0.01 "*" 0.05 "." 0.1 " " 1
The F test appears (phi3) at the third position in the ur.df output at position Value
of the test statistic is:.
332
> ur.df(yt, type = "trend", lags = 1)

###############################################################
###############################################################
> summary(ur.df(yt, type = "trend", lags = 1))

###############################################
###############################################
Call:
Residuals:
Min
1Q
-2.99426 -0.68188
Median
0.00284
3Q
0.71159
Max
3.03239
Coefficients:
0.065077
9.979
<2e-16 ***
z.lag.1
-0.206347
0.020006 -10.314
<2e-16 ***
tt
0.103120
0.009998 10.314
<2e-16 ***
z.diff.lag
0.037754
0.031690
1.191
0.234
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

1pct 5pct 10pct
tau3 -3.96 -3.41 -3.12
phi2 6.09 4.68 4.03
phi3 8.27 6.25 5.34
8.8.5
Exercise
Lets now consider an AR(1) with a unit root

>
>
>
>
>
>
set.seed(456)
n <- 1000
alpha <- 0.4
gamma <- 0.5
theta1 <- 0.8
ut <- arima.sim(model <- list(order = c(1, 1, 0),
ar = theta1), n)
> t <- ts(0:n)
> yt <- alpha + gamma * t + ut
Testing for a unit root
> summary(dynlm(d(yt) ~ t + L(yt, 1) + L(d(yt), 1)))
Start = 3, End = 1001
Call:
dynlm(formula = d(yt) ~ t + L(yt, 1) + L(d(yt), 1))
Residuals:
Min
1Q
Median
-2.91494 -0.68226 -0.00539
3Q
0.70942
Max
3.09277
Coefficients:
(Intercept) 0.1487382 0.0652815
2.278
0.0229 *
t
0.0008830 0.0005898
1.497
0.1347
L(yt, 1)
-0.0011549 0.0007256 -1.592
0.1118
L(d(yt), 1) 0.7975308 0.0191505 41.645
<2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Testing for = = 1 = 0
dynlm(d(yt) ~ -1 + L(d(yt), 1)))
Model 2: d(yt) ~ -1 + L(d(yt), 1)
333
334
Res.Df
RSS Df Sum of Sq
F
Pr(>F)
1
995 956.70
2
998 977.21 -3
-20.511 7.1106 9.956e-05 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Testing for = 1 = 0
dynlm(d(yt) ~ 1 + L(d(yt), 1)))
Model 2: d(yt) ~ 1 + L(d(yt), 1)
Res.Df
RSS Df Sum of Sq
F Pr(>F)
1
995 956.70
2
997 959.27 -2
-2.5686 1.3357 0.2634
###############################################
###############################################
Call:
Residuals:
Min
1Q
Median
-2.91494 -0.68226 -0.00539
3Q
0.70942
Max
3.09277
Coefficients:
(Intercept) 0.1487382 0.0652815
2.278
0.0229 *
z.lag.1
-0.0011549 0.0007256 -1.592
0.1118
tt
0.0008830 0.0005898
1.497
0.1347
z.diff.lag
0.7975308 0.0191505 41.645
<2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

1pct 5pct 10pct
tau3 -3.96 -3.41 -3.12
phi2 6.09 4.68 4.03
phi3 8.27 6.25 5.34
What happens if we add another lag?
> summary(dynlm(d(yt) ~ t + L(yt, 1) + L(d(yt), 1:2)))
Start = 4, End = 1001
Call:
dynlm(formula = d(yt) ~ t + L(yt, 1) + L(d(yt), 1:2))
Residuals:
Min
1Q
Median
-2.90554 -0.69296 -0.00589
3Q
0.70502
Max
3.10600
Coefficients:
(Intercept)
0.1575230 0.0656766
2.398
0.0166 *
t
0.0008424 0.0005911
1.425
0.1544
L(yt, 1)
-0.0011135 0.0007270 -1.532
0.1259
L(d(yt), 1:2)1 0.8165655 0.0316832 25.773
<2e-16 ***
L(d(yt), 1:2)2 -0.0233147 0.0317436 -0.734
0.4628
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
###############################################
###############################################
Call:
Residuals:
Min
1Q
Median
-2.90554 -0.69296 -0.00589
3Q
0.70502
Max
3.10600
335
336
Coefficients:
(Intercept) 0.1575230 0.0656766
2.398
0.0166 *
z.lag.1
-0.0011135 0.0007270 -1.532
0.1259
tt
0.0008424 0.0005911
1.425
0.1544
z.diff.lag1 0.8165655 0.0316832 25.773
<2e-16 ***
z.diff.lag2 -0.0233147 0.0317436 -0.734
0.4628
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

1pct 5pct 10pct
tau3 -3.96 -3.41 -3.12
phi2 6.09 4.68 4.03
phi3 8.27 6.25 5.34
8.8.6
Exercise
Lets now consider a Random Walk

> set.seed(456)
> n <- 1000
> yt <- ts(cumsum(rnorm(n)))
> library(dynlm)
> summary(dynlm(d(yt) ~ -1 + L(yt, 1) + L(d(yt), 1)))
Start = 3, End = 1000
Call:
dynlm(formula = d(yt) ~ -1 + L(yt, 1) + L(d(yt), 1))
Residuals:
Min
1Q
-2.99524 -0.65714
Median
0.02312
3Q
0.73171
Max
3.03899
Coefficients:
L(yt, 1)
0.0007409 0.0008270
0.896
0.371
L(d(yt), 1) 0.0333952
0.0317223
1.053
337
0.293

Adjusted R-squared:
###############################################
###############################################
1.943e-05
Call:
lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
Residuals:
Min
1Q
-2.99524 -0.65714
Median
0.02312
3Q
0.73171
Max
3.03899
Coefficients:
z.lag.1
0.0007409 0.0008270
0.896
0.371
z.diff.lag 0.0333952 0.0317223
1.053
0.293
Adjusted R-squared:

1pct 5pct 10pct
tau1 -2.58 -1.95 -1.62
###############################################
###############################################
Call:
lm(formula = z.diff ~ z.lag.1 - 1)
1.943e-05
338
Residuals:
Min
1Q
-3.00751 -0.66268
Median
0.02014
3Q
0.73729
Max
3.01637
Coefficients:
z.lag.1 0.0007877 0.0008256
0.954
0.34
Adjusted R-squared:
-8.988e-05

1pct 5pct 10pct
tau1 -2.58 -1.95 -1.62
8.9
Long-run Purchasing Power Parity (Part 1) (Section 8.5)
To import the data from the file ppp2.wf1, which is a work file of EViews, call first
the package hexView and next the command readEViews.
> library(hexView)
> ppp <- readEViews(unzip("ch08.zip", "Chapter 8/ppp2.wf1"))
Observations from January 1988 to December 2010 (T=276) on price indices and
exhange rates for the United Kingdom, the United States and the Euro area are
available11 .
cpieuro: price index Euro area
cpiuk: price index United Kingdom
cpius: price index United States
fxde: exchange rate dollar/euro
fxdp: exchange rate dollar/pound
fxep: exchange rate euro/pound
logcpieuro: log price index Euro area (Jan 1988=100)
logcpiuk: log price index United Kingdom (Jan 1988=100)
11 As Verbeek observes in Sections 8.5, 9.3 and 9.5.4, the exchange rates are accidentally not
converted to logs when producing the results. Here they are converted to logs for obtaining results.
339
4.6
4.8
5.0
5.2
LOGCPIEURO
LOGCPIUK
1990
1995
2000
2005
2010
Time
Figure 8.42
Log consumer price index UK and Euro area, Jan 1988Dec 2010
logcpius: log price index United States (Jan 1988=100)
By using the function ts(object, start, frequency) it is possible to create a

multiple time series from the columns of a table or a matrix; here we have to specify
freq=12 since data have a monthly frequency.
> ppp <- ts(ppp, start = c(1988, 1), freq = 12)
Figure 8.42 represents the log consumer price index for UK and the Euro area.
> library(lattice)
> xyplot(ppp[, 7:8], lty = c(1, 2), superpose = TRUE)
Before analysing if the purhasing power parity condition, see Verbeeks relationship
(8.60), referred to log variables:
st = pt pt
we have to analyse, as a first necessary step, the properties of the involved variables.
The Dickey-Fuller test statistic can be obtained for the log Euro series, obviously in
the version with the trend and the drift, see Fig. 8.42, by using the following code.
340
> library(urca)
> y <- ppp[, 7]
###############################################
###############################################
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt)
Residuals:
Min
1Q
Median
-0.0098260 -0.0013041 -0.0000486
3Q
0.0013773
Max
0.0092253
Coefficients:
(Intercept) 1.010e-01 3.480e-02
2.902 0.00401 **
z.lag.1
-2.106e-02 7.467e-03 -2.821 0.00514 **
tt
3.306e-05 1.399e-05
2.363 0.01882 *
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

1pct 5pct 10pct
tau3 -3.98 -3.42 -3.13
phi2 6.15 4.71 4.05
phi3 8.34 6.30 5.36
From the preceding outputs the 5% critical values can be recovered: 3.42 for the
test with the drift and the trend.
It is possible to create a function to simultaneously run a range of Augmented DickeyFuller tests and reproduce Verbeeks Table 8.2.
By finding results for Euro and UK separately we also consider a function f1 which
returns 1 if the corresponding (Augmented) Dickey-Fuller Test concludes for accepting
the presence of a 1 root:
341

wt <- summary(ur.df(y, type = "trend", lags = x))@teststat[1]
}
> f1 <- function(x) {
wt <- summary(ur.df(y, type = "trend", lags = x))
wt <- (wt@teststat[1] >= wt@cval[1, 2])
}
> print("Euro")
[1] "Euro"
> y <- ppp[, 7]
> a <- c(0:12, 24, 36)
> names(a) <- c("DF", paste("ADF(", a[-1], ")", sep = ""))
> out <- sapply(a, f)
> out1 <- sapply(a, f1)
> print(round(out, 3))
DF ADF(1) ADF(2) ADF(3) ADF(4) ADF(5) ADF(6) ADF(7)
-2.821 -2.810 -2.912 -3.029 -3.241 -3.402 -3.173 -3.368
ADF(8) ADF(9) ADF(10) ADF(11) ADF(12) ADF(24) ADF(36)
-3.518 -3.600 -3.704 -3.730 -3.506 -3.098 -3.361
> print(out1)
TRUE
TRUE
TRUE
TRUE
TRUE
TRUE
TRUE
TRUE
FALSE
FALSE
FALSE
FALSE
FALSE
TRUE
TRUE
For the UK consumer price index we have:
> print("UK")
[1] "UK"
> y <- ppp[, 8]
-3.587 -3.535 -3.697 -3.706 -3.785 -3.936 -3.316
-3.401 -3.763 -3.816 -3.840 -3.678 -4.068 -2.262
> print(out1)
FALSE
FALSE
FALSE
FALSE
FALSE
FALSE
TRUE
TRUE
FALSE
FALSE
FALSE
FALSE
FALSE
TRUE
And for Euro and UK simultaneously
ADF(7)
-3.439
ADF(7)
FALSE
342
> for (ind in 7:8) {

y <- ppp[, ind]
a <- c(0:12, 24, 36)
names(a) <- c("DF", paste("ADF(", a[-1], ")",
sep = ""))
print(switch(ind - 6, "Euro", "UK"))
out <- sapply(a, f)
print(round(out, 3))
}
[1] "Euro"
-2.821 -2.810 -2.912 -3.029 -3.241 -3.402 -3.173
-3.518 -3.600 -3.704 -3.730 -3.506 -3.098 -3.361
[1] "UK"
-3.587 -3.535 -3.697 -3.706 -3.785 -3.936 -3.316
-3.401 -3.763 -3.816 -3.840 -3.678 -4.068 -2.262
ADF(7)
-3.368
ADF(7)
-3.439
We can observe how for the UK consumer price index the null hypothesis of unit root
is not rejected at 5% only for lags 6, 8 and 36; while for the other lags it is accepted
the hypothesis of stationarity at a confidence level of 5%. If we consider a confidence
level of 1%, to which correspond the critical value 3.98, the null of unit root is never
rejected. A larger negative evidence would be necessary for the Dickey-Fuller statistic
to reject the null of unit root and accept the trend stationarity.
With reference to the log exchange rate Euro/UK we have the following unit root
test results
nt <- summary(ur.df(y, type = "drift", lags = x))@teststat[1]
wt <- summary(ur.df(y, type = "trend", lags = x))@teststat[1]
return(c(nt, wt))
}
> f1 <- function(x) {
nt <- summary(ur.df(y, type = "drift", lags = x))
nt <- (nt@teststat[1] >= nt@cval[1, 2])
wt <- summary(ur.df(y, type = "trend", lags = x))
wt <- (wt@teststat[1] >= wt@cval[1, 2])
return(c(nt, wt))
}
> a <- 0:6
> y <- log(ppp[, 6])
> rownames(out) <- rownames(out1) <- c("Without trend",
343
"With trend")
Without trend -1.264 -1.215 -1.243 -1.340 -1.525 -1.404 -1.284
With trend
-1.249 -1.195 -1.222 -1.319 -1.505 -1.375 -1.248
> print(out1)
Without trend TRUE
TRUE
TRUE
TRUE
TRUE
TRUE
TRUE
With trend
TRUE
TRUE
TRUE
TRUE
TRUE
TRUE
TRUE
we have no rejection of the null hypothesis of unit root.
With reference to the log real exchange rate between Euro area and UK:
rst = st (pt pt )
see Fig. 8.43 obtained with the code
> xyplot(ts(log(ppp[, 6]) - (ppp[, 7] - ppp[, 8]),
start = c(1988, 1), freq = 12))
we have the following unit root test results
> y <- ts(log(ppp[, 6]) - (ppp[, 7] - ppp[, 8]), start = c(1988,
1), freq = 12)
> a <- c(0:6, 12)
> rownames(out) <- rownames(out1) <- c("Without trend",
"With trend")
Without trend -1.492 -1.473 -1.427 -1.476 -1.627 -1.520 -1.389
With trend
-1.490 -1.469 -1.418 -1.466 -1.616 -1.504 -1.367
ADF(12)
Without trend -1.993
With trend
-1.966
> print(out1)
Without trend TRUE
TRUE
TRUE
TRUE
TRUE
TRUE
TRUE
With trend
TRUE
TRUE
TRUE
TRUE
TRUE
TRUE
TRUE
ADF(12)
Without trend
TRUE
With trend
TRUE
the null hypothesis of a unit root in rst cannot be rejected.
The KPSS test (Bartlett weights) with a lag length of 6 results:
0.2
0.3
0.4
0.5
0.6
344
1990
1995
2000
2005
2010
Time
Figure 8.43
Log real exchange rate Euro area-UK, 1988:12010:12
> summary(ur.kpss(y, type = "mu", use.lag = 6))

#######################
#######################
critical values 0.347 0.463 0.574 0.739
Based on the results from the Standard Dickey-Fuller regression (without a trend) we
obtain an autocorrelation coefficient equal to:
> library(dynlm)
> (DFreg <- summary(dynlm(y ~ L(y))))
345

Start = 1988(2), End = 2010(12)
Call:
dynlm(formula = y ~ L(y))
Residuals:
Min
1Q
Median
-0.091770 -0.009943 -0.000087
3Q
0.014394
Max
0.053296
Coefficients:
0.004856
1.388
0.166
L(y)
0.981693
0.012267 80.024
<2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Verbeek observes that according to this result a proportion of 0.982 of any shock in
the real exchange rate will still remain after one month. Thus the shock after two
months is
> DFreg$coef[2]^2
[1] 0.9637211
and the half life of a shock, describing how long it would take for the effect of a shock
to die out 50%, results
> log(0.5)/log(DFreg$coef[2])
[1] 37.51477
8.10
The Persistence of Inflation (Section 8.8)
Data are available in the file inflation.dat.

The last value in the file is missing and the first column contains time references, so
we drop the pertaining rows/columns.
The files inflation contains the data series of the quarterly U.S. inflation rate,
after seasonal adjustment. The inflation rate is expressed in % per year. Source:
Bureau of Labor Statistics.
> infl <- read.table(unzip("ch08.zip", "Chapter 8/inflation.dat"),
header = TRUE)
> infl <- infl[-nrow(infl), -1]
> infl <- ts(infl, start = c(1960, 1), freq = 4)
10
10
15
346
1960
1970
1980
1990
2000
2010
Time
Figure 8.44
Quarterly inflation in the United States, 19602010
To investigate persistence of inflation, that is how strongly a current shock to inflation

affects future inflation rates, and how long the inflation rate needs to return to its
previous level Verbeek investigates the dynamic properties of the quarterly inflation
rate in the US.
See Fig. 8.44, for a graphical representation of the series
> library(lattice)
> xyplot(infl)
The ADF test for unit roots with two and four lags results 3.145 and 2.811
respectively rejecting the presence of a unit root at 5% level (in case of 4 lags we
are on the boundary of the rejecting region):
> library(urca)
> summary(ur.df(infl, type = "drift", lags = 2))
###############################################
###############################################
Call:
Residuals:
Min
1Q
-16.6299 -0.9669
Median
0.0608
3Q
1.1217
Max
7.3940
Coefficients:
(Intercept) 0.76124
0.28860
2.638
z.lag.1
-0.18593
0.05911 -3.145
z.diff.lag1 -0.52310
0.07622 -6.863
0.06921 -4.308
--Signif. codes: 0 "***" 0.001 "**" 0.01
Pr(>|t|)
0.00901
0.00192
8.57e-11
2.60e-05
**
**
***
***
"*" 0.05 "." 0.1 " " 1


1pct 5pct 10pct
tau2 -3.46 -2.88 -2.57
phi1 6.52 4.63 3.81
> summary(ur.df(infl, type = "drift", lags = 4))
###############################################
###############################################
Call:
Residuals:
Min
1Q
-16.2318 -1.0063
Coefficients:
Median
0.0886
3Q
1.0133
Max
7.2619
347
348

(Intercept) 0.72058
0.30166
2.389
z.lag.1
-0.17547
0.06242 -2.811
0.08459 -6.031
0.09210 -3.363
0.08857 -0.249
0.07281 -1.366
--Signif. codes: 0 "***" 0.001 "**" 0.01
Pr(>|t|)
0.017873
0.005447
8.16e-09
0.000929
0.803994
0.173639
*
**
***
***
"*" 0.05 "." 0.1 " " 1


1pct 5pct 10pct
tau2 -3.46 -2.88 -2.57
phi1 6.52 4.63 3.81
By considering more lags (from 0 to 12) Verbeek observes that it becomes increasingly
less likely to reject the null hypothesis of unit root:
> a <- sapply(0:12, function(lag) ur.df(infl, type = "drift",
lags = lag)@teststat[1])
> names(a) <- c("DF", paste("ADF(", 1:12, ")", sep = ""))
> round(a, 3)
-7.044 -4.388 -3.145 -3.133 -2.811 -2.489 -2.680 -3.168
ADF(8) ADF(9) ADF(10) ADF(11) ADF(12)
-2.897 -2.591 -2.277 -2.222 -2.403
Figure 8.45 reports the autocorrelation and partial autocorrelation functions.
Verbeek by plotting the autocorrelation function includes two standard error
bounds for k based on the estimated variance 1 + 2
21 + . . . + 2
2k1 . Observe that
this estimate of the variance holds under the hypothesis that q = 0 for q > k,
that is the process is a Moving Average process of order k. On the booksite
www.educatt.it/libri/materiali a version of the function acf2 is present, which
includes also the logical argument ma.test that can be set to FALSE or TRUE,
according respectively to the white noise or moving average hypotheses.
> source("acf2.r")
> t(acf2(infl, max.lag = 30, ma.test = TRUE))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
ACF 0.61 0.59 0.60 0.46 0.49 0.49 0.36 0.29 0.30 0.26 0.26
349
0.4
0.0
ACF
0.4
0.8
Series: infl
4
LAG
4
LAG
0.4
0.0
PACF
0.4
0.8
Figure 8.45 Sample autocorrelation and partial autocorrelation functions of inflation rate.
The scale for lags is years. The 2 is at the eight-th position since we have quarterly data.
PACF 0.61 0.35 0.28 -0.04 0.09 0.12 -0.11 -0.18 0.02 0.08 0.07
[,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21]
ACF
0.26 0.19 0.27 0.22 0.23 0.29 0.23 0.23 0.27 0.25
PACF 0.03 -0.05 0.21 -0.04 0.00 0.07 0.01 -0.02 0.02 0.04
[,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
ACF
0.29 0.21 0.21 0.16 0.13 0.16 0.14 0.06 0.06
PACF 0.12 -0.19 -0.02 -0.06 -0.05 0.05 0.01 -0.08 0.02
The autocorrelation function confirms the high persistence of inflation. Verbeek
initially assumes that there are no unit roots and proposes the estimation of an
AR(3) model
Yt = + 1 Yt1 + 2 Yt2 + 3 Yt3 + t
since the partial autocorrelation function cuts off at lag 3.
350
8.10.1
AR estimation
We can use the function dynlm in the package dynlm to obtain the estimate of the
AR(3) model by OLS:
> library(dynlm)
> ar3regr <- dynlm(infl ~ L(infl) + L(infl, 2) + L(infl,
3))
> summary(ar3regr)
Start = 1960(4), End = 2010(4)
Call:
dynlm(formula = infl ~ L(infl) + L(infl, 2) + L(infl, 3))
Residuals:
Min
1Q
-16.6299 -0.9669
Median
0.0608
3Q
1.1217
Max
7.3940
Coefficients:
(Intercept) 0.76124
0.28860
2.638
L(infl)
0.29097
0.06823
4.264
L(infl, 2)
0.22492
0.06931
3.245
L(infl, 3)
0.29818
0.06921
4.308
--Signif. codes: 0 "***" 0.001 "**" 0.01
Pr(>|t|)
0.00901
3.11e-05
0.00138
2.60e-05
**
***
**
***
"*" 0.05 "." 0.1 " " 1

The code dynlm(infl L(infl,1:3)) gives the same result.
The estimate of the mean of the process can be obtained as:
> mean(infl)
[1] 3.952869
or in an indirect way by using the estimate of the intercept and of the autoregressive
parameters:
> ar3regr$coef[[1]]/(1 - sum(ar3regr$coef[-1]))
[1] 4.094233
Estimation via maximum likelihood can be obtained with the function arima, see
Section 8.5 also for some other R functions performing ARMA model parameter
estimation.
8.10.2
351
The Ljung-Box statistic - construction
The Ljung-Box statistics for the first 6 autocorrelations can be obtained by applying
Verbeeks relationship (8.67) to the residuals of the regression
QK = T (T + 2)
K
X
k=1
where
rk2
1
r2 .
T k k
is the square of the autocorrelation of order k. With regard to Q6 :
> T <- length(ar3regr$res)

> T * (T + 2) * sum((1/(T - 1:6)) * acf(ar3regr$res)$acf[2:7]^2)
[1] 10.67575
8.10.3
The Ljung-Box statistic - direct function
It is possible to make direct use of the function Box.test(x, lag, type, fitdf),
where x is a univariate time series object; lag specifies the number of lags to consider;
type the statistic to compute "Box-Pierce" or "Ljung-Box"; fitdf is the number of
degrees of freedom to be subtracted to lag, if x is a series of residuals, usually fitdf
= p+q (where p and q are the orders respectively of the AR and MA parts of an
ARMA model describing the process level) so the degrees fo freedom are lag (p + q),
provided of course that lag > fitdf.
The Ljung-Box statistics for the first 6 and 12 autocorrelations result:
> Box.test(residuals(ar3regr), lag = 6, type = "Ljung-Box",
fitdf = 3)
Box-Ljung test
data: residuals(ar3regr)
fitdf = 3)
Box-Ljung test
To obtain AIC and BIC according to Verbeek formulae (8.68) and (8.69) use12
> T = length(infl)
> log(summary(ar3regr)$sigma^2) + 2 * (3 + 1)/T
[1] 1.767248
> log(summary(ar3regr)$sigma^2) + (3 + 1)/T * log(T)
[1] 1.83231
12 In the output Verbeek reports AIC and BIC computed based on the log likelihood, say
AIC = 2l/T + 2(p + q)/T where l is the log likelihood, and not on the variance of the residuals.
The function arima, see the next subsection, uses 2l + 2(p + q) for AIC.
352
8.10.4
AR estimation via Maximum Likelihood
We can use the function arima to obtain the estimate of the AR(3) model:
> (ar3est <- arima(infl, c(3, 0, 0)))
Series: infl
Coefficients:
ar1
ar2
0.2925 0.2278
s.e. 0.0669 0.0681
ar3
0.2970
0.0681
intercept
3.7845
0.8628

AIC=937.26
AICc=937.56
BIC=953.85
where intercept is an estimate for the mean of the process.
8.10.5
AR(4) estimation
Verbeek extends the model by adding an additional autoregressive term; the estimate
of the model with OLS is:
> ar4regr <- dynlm(infl ~ L(infl) + L(infl, 2) + L(infl,
3) + L(infl, 4))
> summary(ar4regr)
Start = 1961(1), End = 2010(4)
Call:
dynlm(formula = infl ~ L(infl) + L(infl, 2) + L(infl, 3) + L(infl,
4))
Residuals:
Min
1Q
-16.5054 -1.0348
Median
0.1046
3Q
1.0461
Max
7.3404
Coefficients:
(Intercept) 0.77308
0.29583
2.613
L(infl)
0.30700
0.07179
4.277
L(infl, 2)
0.23416
0.07170
3.266
L(infl, 3)
0.31272
0.07258
4.309
L(infl, 4) -0.04466
0.07266 -0.615
--Signif. codes: 0 "***" 0.001 "**" 0.01
Pr(>|t|)
0.00967
2.97e-05
0.00129
2.60e-05
0.53950
**
***
**
***
"*" 0.05 "." 0.1 " " 1

Adjusted R-squared:
353
0.4823
while by using maximum likelihood we have

Series: infl
Coefficients:
ar1
ar2
0.3055 0.2381
s.e. 0.0701 0.0701
ar3
0.3097
0.0711
ar4
-0.0440
0.0712
intercept
3.8046
0.8301

AIC=938.88
AICc=939.31
BIC=958.79
The estimate for the fourth autoregressive lag is not significant. The Ljung-Box
statistics for the first 6 and 12 autocorrelations result:
fitdf = 4)
Box-Ljung test
fitdf = 4)
Box-Ljung test
8.10.6
ARMA estimation
Verbeeks adds a moving average term to the AR(3) model. The model:
Yt = + 1 Yt1 + 2 Yt2 + 3 Yt3 + t + 1 t1
can be estimated with (Observe that the mean is returned named as intercept):
> (maest <- arima(infl, c(3, 0, 1)))
Series: infl
354
Coefficients:
ar1
ar2
0.1047 0.3029
s.e. 0.2078 0.1055
ar3
0.3607
0.0871
ma1
0.2069
0.2221
intercept
3.8094
0.8235

AIC=938.63
AICc=939.05
BIC=958.54
and the Ljung-Box tests for the first 6 and 12 autocorrelations result
> sapply(c(6, 12), function(i) Box.test(residuals(maest),
lag = i, type = "Ljung-Box", fitdf = 4))
[,1]
[,2]
statistic 10.83081
17.11687
parameter 2
8
p.value
0.004447542
0.02891488
method
"Box-Ljung test"
"Box-Ljung test"
data.name "residuals(maest)" "residuals(maest)"
8.10.7
AR(6) estimation
Since the three estimated models still exhibit some residual serial correlation Verbeek
suggests to inspect the residual autocorrelation and partial autocorrelation functions,
see Fig. 8.46,
> library(astsa)
> t(acf2(ar3regr$res, max.lag = 30))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ACF 0.02 -0.02 -0.05 -0.08 0.09 0.18 -0.03 -0.16 -0.04 -0.01
PACF 0.02 -0.02 -0.05 -0.08 0.09 0.18 -0.04 -0.16 -0.01 0.01
[,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF
0.00 0.04 -0.14 0.07 -0.05 0.00 0.11 -0.02 -0.05 0.07
PACF -0.05 -0.01 -0.12 0.14 -0.06 -0.03 0.10 0.00 -0.05 0.05
[,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
ACF
0.06 0.17 0.01 0.04 -0.08 -0.04 0.08 0.09 -0.06 -0.04
PACF 0.08 0.20 -0.07 0.06 0.00 -0.07 0.06 0.04 -0.04 0.03
According to Verbeek the inclusion of a sixth lag seems to be appropriate. We have:
> ar6regr <- dynlm(infl ~ L(infl, 1:6))
> summary(ar6regr)
Start = 1961(3), End = 2010(4)
Call:
dynlm(formula = infl ~ L(infl, 1:6))
355
0.2
0.2
ACF
0.6
1.0
Series: ar3regr$res
4
LAG
4
LAG
0.2
0.2
PACF
0.6
1.0
Figure 8.46
Sample autocorrelation and partial autocorrelation functions of the residuals.
Residuals:
Min
1Q
-16.089 -1.049
Median
0.150
3Q
1.043
Max
7.286
Coefficients:
(Intercept)
0.66109
0.30552
2.164 0.03172 *
L(infl, 1:6)1 0.30011
0.07196
4.170 4.61e-05 ***
L(infl, 1:6)2 0.21447
0.07514
2.854 0.00479 **
L(infl, 1:6)3 0.24636
0.07775
3.168 0.00178 **
L(infl, 1:6)4 -0.10612
0.07759 -1.368 0.17301
L(infl, 1:6)5 0.05719
0.07591
0.753 0.45215
L(infl, 1:6)6 0.13007
0.07300
1.782 0.07636 .
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
356

Adjusted R-squared:
Series: infl
Coefficients:
ar1
ar2
0.2991 0.2186
s.e. 0.0695 0.0727
ar3
0.2474
0.0751
ar4
-0.1038
0.0753
ar5
0.0595
0.0736
ar6
0.1287
0.0707
0.4855
intercept
3.6914
1.0147

AIC=937.67
AICc=938.41
BIC=964.22
> sapply(c(6, 12), function(i) Box.test(residuals(ar4est),
[,1]
[,2]
statistic 2.92888
13.39927
parameter 0
6
p.value
0
0.03711593
method
"Box-Ljung test"
"Box-Ljung test"
data.name "residuals(ar4est)" "residuals(ar4est)"
The three additional lags estimates are not significantly different from 0, we have the
lowest AIC but the BIC is better with AR(3).
8.10.8
Non complete models
Finally Verbeek suggests the estimation of a non-complete model, including for the
AR the first three and the sixth lags. OLS parameter estimates can be obtained by
using the function dynlm.
> summary(ar6regrrestr <- dynlm(infl ~ L(infl, c(1:3,
6))))
Start = 1961(3), End = 2010(4)
Call:
dynlm(formula = infl ~ L(infl, c(1:3, 6)))
Residuals:
Min
1Q
-16.4714 -0.9931
Median
0.0571
3Q
1.1065
Max
7.4546
Coefficients:
(Intercept)

0.64992
0.30139
2.156 0.032291 *
L(infl,
L(infl,
L(infl,
L(infl,
--Signif.
c(1:3,
c(1:3,
c(1:3,
c(1:3,
codes:
6))1
6))2
6))3
6))6
0.27279
0.21183
0.23985
0.12020
0.06928
0.06983
0.07591
0.06657
3.937
3.034
3.160
1.806
0.000115
0.002750
0.001835
0.072532
357
***
**
**
.
0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

The model can also be estimated with maximum likelihood: the estimation of subsets
ARMA models can be obtained by using the argument fixed in the arima function.
Only the parameter corresponding to NA in the vector, which must have the same
length of the number of the parameters, will be estimated. Ljung-Box statistics
are also reported for lags 6 and 12 (The number of degrees of freedom is named
parameter).
> (ar6restr <- arima(infl, c(6, 0, 0), fixed = c(NA,
NA, NA, 0, 0, NA, NA)))
Series: infl
Coefficients:
ar1
ar2
0.2723 0.2170
s.e. 0.0672 0.0677
ar3
0.2414
0.0737
ar4
0
0
ar5
0
0
ar6
0.1206
0.0649
intercept
3.6820
1.0307

AIC=935.84
AICc=936.58
BIC=962.39
> sapply(c(6, 12), function(i) Box.test(residuals(ar6restr),
[,1]
[,2]
statistic 5.246681
14.90244
parameter 0
6
p.value
0
0.02102923
method
"Box-Ljung test"
"Box-Ljung test"
data.name "residuals(ar6restr)" "residuals(ar6restr)"
Verbeek proposes three scalar measures for evaluating the persistence of inflation.
the sum of the coefficients in the autoregressive specification, which for the
AR(3) and the non complete AR(6) models results:
> sum(coef(ar3regr)[-1])
[1] 0.8140705
358
> sum(coef(ar6regrrestr)[-1])
[1] 0.8446656
The maximum root in the auxiliary equations:

z 3 1 z 2 2 z 3 = 0
z 6 1 z 5 2 z 4 3 z 3 6 = 0
which must be inside the unit circle. The roots can be obtained with the function
polyroot, whose argument is the vector of the coefficients in the preceding
polynomial equations ordered according to the power of z. The function abs
returns the modulus.
> coefs <- rev(coef(ar3regr)[-1])
> abs(polyroot(c(-coefs, 1)))
[1] 0.5742232 0.5742232 0.9043114
> coefs <- rev(coef(ar6regrrestr)[-1])
> abs(polyroot(c(-coefs[1], 0, 0, -coefs[-1], 1)))
[1] 0.6130880 0.6344583 0.7332198 0.6130880 0.7332198 0.9375408
The estimates of the half-life shocks

h=
log(0.5)
P

p
log
j=1 j
> log(0.5)/log(sum(coef(ar3regr)[-1]))
[1] 3.369564
> log(0.5)/log(sum(coef(ar6regrrestr)[-1]))
[1] 4.105969
8.11
The Expectations Theory of the Term Structure

(Section 8.10)
Data can be read by means of the function read.table, having extracted the
file irates.dat from the compressed archive ch08.zip. The function ts(object,
start, frequency) creates a multiple time series from the columns of a table; in
this case we have a monthly frequency so frequency=12.
> irates <- read.table(unzip("ch08.zip", "Chapter 8/irates.dat"),
header = TRUE)
> irates <- ts(irates, start = c(1946, 12), frequency = 12)
359
10
15
r1
r60
1970
1975
1980
1985
1990
Time
Figure 8.47
1 month and 5 year interes rates, 1970:11991:2
The file irates contains monthly interest rates for the United States taken from
McCulloch and Kwon (1993). The series start in December 1946 and ends in February
1991.
All interest rates are expressed in % per year. The variables are coded as:
ri interest rate for a maturity of i months (i = 1, 2, 3, 5, 6, 11, 12, 36, 60, 120).
Verbeek observes that in the text a subsample is used starting in January 1970.
> irates1m <- window(irates[, 1], start = c(1970, 1),
end = c(1991, 2))
To obtain the plot for the 1 month and 5 year interes rates use, as usual, the function
xyplot in the package lattice, see Fig. 8.47.
> library(lattice)
> xyplot(window(irates[, c(1, 9)], start = c(1970,
1), end = c(1991, 2)), superpose = TRUE)
360
The OLS estimate of an AR(1) model for the 1 month interest rate can be obtained,
by making use of the function dynlm.
> library(dynlm)
> irates1moutput <- dynlm(irates1m ~ L(irates1m))
> summary(irates1moutput)
Start = 1970(2), End = 1991(2)
Call:
dynlm(formula = irates1m ~ L(irates1m))
Residuals:
Min
1Q
-4.2955 -0.3207
Median
0.0160
3Q
0.2993
Max
2.9569
Coefficients:
(Intercept) 0.34902
0.15246
2.289
0.0229 *
L(irates1m) 0.95120
0.01963 48.466
<2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
From which follows the estimate for the mean
> (uncmean <- as.numeric(irates1moutput$coef[1]/(1 irates1moutput$coef[2])))
[1] 7.151415
The sample average results:
> mean(irates1m)
[1] 7.302512
The Dickey-Fuller statistics, in presence of drift, to test for the presence of a unit root
can be obtained by means of the function ur.df available in the package urca.
> library(urca)
> summary(ur.df(irates1m, lags = 0, type = "drift"))
###############################################
###############################################
361
Call:
lm(formula = z.diff ~ z.lag.1 + 1)
Residuals:
Min
1Q
-4.2955 -0.3207
Median
0.0160
3Q
0.2993
Max
2.9569
Coefficients:
(Intercept) 0.34902
0.15246
2.289
0.0229 *
z.lag.1
-0.04880
0.01963 -2.487
0.0135 *
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

1pct 5pct 10pct
tau2 -3.44 -2.87 -2.57
phi1 6.47 4.61 3.79
To obtain the augmented Dickey-Fuller tests with one, three and six additional lags
included use:
> library(urca)
> sapply(c(1, 3, 6), function(i) summary(ur.df(irates1m,
type = "drift", lags = i))@teststat[1])
[1] -2.619786 -2.280889 -1.885723
The plot of the autocorrelation function of the residuals can be obtained by using
the function acf2, see Fig. 8.48. Observe the scale on the x axis which follows from
the monthly frequency of the time series: lag 12 corresponds to 1 (year) and lag 6 (6
months) corresponds to a half (0.5 years).
> library(astsa)
> t(acf2(irates1moutput$residuals, 20))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ACF 0.06 0.01 -0.11 -0.09 0.00 -0.04 -0.05 0.15 0.03 0.08
PACF 0.06 0.00 -0.11 -0.07 0.01 -0.05 -0.06 0.16 0.01 0.06
[,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF
0.02 0.01 -0.06 0.09 -0.07 0.05 0.09 0.09 0.05 -0.16
PACF 0.04 0.03 -0.06 0.13 -0.07 0.04 0.11 0.06 0.02 -0.15
362
0.2
0.2
ACF
0.6
1.0
Series: irates1moutput$residuals
0.5
1.0
1.5
1.0
1.5
0.2
0.2
PACF
0.6
1.0
LAG
0.5
LAG
Figure 8.48
Residual autocorrelation function, AR(1) model, 1970:11991:2
Verbeek proposes the following regressions, to study the expectations hypothesis, by

regressing a long interest rate on the short rate:
rnt = 1 + 2 r1t + ut
with n = 3, 12, 60. Let irates3m, irates12m and irates60m be the involved
dependent variables over the proper time window. We have:
> irates3m <- window(irates[, 3], start = c(1970, 1),
end = c(1991, 2))
> irates12m <- window(irates[, 7], start = c(1970,
1), end = c(1991, 2))
> irates60m <- window(irates[, 9], start = c(1970,
1), end = c(1991, 2))
> n3 <- lm(irates3m ~ irates1m)
> library(memisc)
> mtable(n3, n12, n60, summary.stats = c("R-squared"))
363
Calls:
n3: lm(formula = irates3m ~ irates1m)
==========================================
n3
n12
n60
-----------------------------------------(Intercept)
0.321*** 1.292*** 3.352***
(0.066)
(0.128)
(0.217)
irates1m
1.009*** 0.947*** 0.739***
(0.009)
(0.017)
(0.028)
-----------------------------------------R-squared
0.982
0.929
0.735
==========================================
The values for n , see Verbeeks relationship (8.86), with = 0.95 result for the three
series:
> (1 - 0.95^3)/(1 - 0.95)/3
[1] 0.9508333
> (1 - 0.95^12)/(1 - 0.95)/12
[1] 0.7660665
> (1 - 0.95^60)/(1 - 0.95)/60
[1] 0.3179767
Forecasts under the hypothesis of random walk corresponds to the last observation:
> irates1m[length(irates1m)]
[1] 5.677
Using the estimate = 0.95 the 10 and 120 periods ahead forecasts result:
> uncmean + 0.95^10 * (irates1m[length(irates1m)] uncmean)
[1] 6.268628
> uncmean + 0.95^120 * (irates1m[length(irates1m)] uncmean)
[1] 7.148285
being the latter forecast very close to the unconditional mean.
8.12
8.12.1
Autoregressive Conditional Heteroscedasticity

A Brief Presentation of ARCH Processes
When a white noise time series {t } (e.g. the residuals from an ARMA model) shows
volatility clustering it can be modeled by means of an ARCH model.
364
The following relationship defines an ARCH process of order p

t = t ht
ht = V ar(t |t ) = 0 + 1 2t1 + 2 2t2 + . . . + p 2tp ,
(8.21)
where 0 > 0, 1 0, . . . , p 0; the stochastic process t is a white noise, usually

a sequence of i.i.d. standard Normal random variables, but sometimes other random
variables are also considered, like the Students t for financial time series to take
into account a leptokurtic behaviour; ht is the conditional variance based on the
information set available at time t.
Let t be the deviations of the squared process 2t from the conditional variance
t = 2t ht = t2 ht ht = (t2 1)ht
t = 2t ht = 2t 0 1 2t1 . . . p 2tp .
(8.22)
We can obtain the autoregressive specification for the squared process by solving the
second relationship for 2t
2t = 0 + 1 2t1 + . . . + p 2tp + t .
(8.23)
We have always to remember that, according to relationship (8.21), the genesis of an

ARCH process is of a multiplicative type; however the autoregressive specification
(8.23) can be used to identify the order of the model and to obtain provisional
estimates of the s parameters which can serve as starting values to apply numerical
procedures related to maximum likelihood.
We remind that the process {t } is weakly stationary if 1 + . . . + p < 1.
The presence of conditional heteroscedasticity can be graphically detected by
examining the autocorrelation function of the squared process and by considering
a test consisting in T times the R2 of the regression of 2t on an intercept and p
lagged values of 2t , which is distributed as 2p .
The order p of an ARCH model can be also determined by examining the behaviour
of the partial autocorrelation function of the squared process. If p is too high a more
parsimonious GARCH(p,q) model can be adopted.
ht = V ar(t |t ) = 0 + 1 2t1 + . . . + p 2tp + 1 ht1 + . . . + q htq
where 0 > 0, i 0 and i 0. The process {t } is weakly stationary if 1 + . . . +
p + 1 + . . . + q < 1.
We can re-express 2t in a fashion similar to (8.22)
t = 2t ht
= 2t 0 1 2t1 . . . p 2tp 1 (2t1 t1 ) . . . q (2tq tq );
we have
2t = 0 + 1 2t1 + . . . + p 2tp + t + 1 (2t1 t1 ) + . . . + q (2tq tq )
and
2t = 0 + 1 2t1 + . . . + p 2tp + 1 2t1 + . . . + q 2tq + t 1 t1 . . . q tq
365
Observe how the latter specification for the squared process resembles that of an
ARMA(max(p, q), q), but always remind the multiplicative nature of the process {t }.
Once the model has been estimated we have to check if the standardized residuals
pt
t
h
are white noise and if they follow the distribution assumed for t .
To check the white noise assumption the autocorrelation and partial autocorrelation
functions can be examined and tests both on the autocorrelations of the standardized
residuals and of squared standardized residuals performed.
The distributional assumption can be graphically tested by means of the QQ plot
and by having recourse to the Jarque-Bera and Shapiro-Wilk tests in case a normal
distribution was assumed, see Section 2.8.
8.12.2
A First Example
Identify a proper ARMA model for the time series ats stored in the workspace
exercise.RData, available on the booksite www.educatt.it/libri/materiali.
To load the R data set use the function load. By examining the behaviour of the time
series, see Fig. 8.49, and of its autocorrelation and partial autocorrelation functions,
see Fig. 8.50, one can conclude that an AR(2) model could fit the data:
> load("exercise.RData")
> library(TSA)
> plot(as.ts(ats))
> t(acf2(ats, 20, ma.test = TRUE))
we can also use, for completeness, the function armasubsets in the package TSA to
select the model with the lowest BIC statistic
> library(TSA)
> plot(armasubsets(ats, nar = 5, nma = 5))
Figure 8.51 shows that the best model is an AR(2) with a drift presence. It can be
estimated by using the function dynlm.
> library(dynlm)
> regr <- dynlm(ats ~ L(ats, 1) + L(ats, 2))
> summary(regr)
Time series regression with "zoo" data:
Start = 2012-01-10, End = 2013-05-21
Call:
dynlm(formula = ats ~ L(ats, 1) + L(ats, 2))
366
0
15
10
as.ts(ats)
10
15
20
100
200
300
400
500
Time
Figure 8.49
Residuals:
Min
1Q
-12.6518 -1.1904
Median
-0.0193
Graph of the ats series
3Q
1.2209
Max
13.6909
Coefficients:
(Intercept) 0.66923
0.14675
4.560
L(ats, 1)
0.51766
0.04423 11.704
L(ats, 2)
0.18209
0.04423
4.117
--Signif. codes: 0 "***" 0.001 "**" 0.01
Pr(>|t|)
6.44e-06 ***
< 2e-16 ***
4.50e-05 ***
"*" 0.05 "." 0.1 " " 1

By having a look at the residual time series, see Fig. 8.52, it is evident the presence
of heteroscedasticity.
367
0.2
0.2
ACF
0.6
1.0
Series: ats
10
LAG
15
20
10
LAG
15
20
0.2
0.2
PACF
0.6
1.0
Figure 8.50
series
Autocorrelation function and partial autocorrelation function for the ats
> plot(regr$res)
By checking the behaviour of the autocorrelation function of the squared residuals, see
Fig. 8.53, the presence of autoregressive conditional heteroscedasticity is confirmed
and an ARCH(2) model is suggested.
> acf2(regr$res^2)
Tests for ARCH effects are performed by regressing the squared residuals on a
constant and p lagged squared residuals series.
> (etsumm <- summary(dynlm(regr$res^2 ~ L(I(regr$res^2),
1))))
Time series regression with "zoo" data:
Start = 2012-01-11, End = 2013-05-21
Call:
dynlm(formula = regr$res^2 ~ L(I(regr$res^2), 1))
368
errorlag5
errorlag4
errorlag3
errorlag2
errorlag1
Ylag5
Ylag4
Ylag3
Ylag2
Ylag1
(Intercept)
200
200
190
BIC
190
190
180
180
170
Figure 8.51
Residuals:
Min
1Q
-43.154 -5.175
Median
-4.319
BIC comparison for the best models
3Q
Max
-0.740 175.485
Coefficients:
(Intercept)
4.99773
0.86551
5.774 1.37e-08 ***
L(I(regr$res^2), 1) 0.29382
0.04297
6.838 2.37e-11 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Adjusted R-squared:
0.08446
T times the R2 gives the test statistic which is distributed according to a 2 random
variable with p degrees of freedom.
369
0
10
regr$res
10
15400
15500
15600
15700
15800
Index
Figure 8.52
Residuals from the AR model
> etsumm$r.squared * (length(regr$res) - 1)

[1] 42.89536
> qchisq(0.95, 1)
[1] 3.841459
In the package FinTS the function ArchTest is available, which allows an LM test for
the presence of Autoregressive Conditional Heteroscedasticity in the residual series
to be performed:
> library(FinTS)
> ArchTest(regr$res, lags = 1, demean = FALSE)
ARCH LM-test; Null hypothesis: no ARCH effects
data: regr$res
Chi-squared = 42.8954, df = 1, p-value = 5.775e-11
> lags = 1:5
> fun <- function(i) ArchTest(regr$res, lags = i, demean = FALSE)
> sapply(lags, fun)
370
ACF
0.0 0.2 0.4 0.6 0.8 1.0
Series: regr$res^2
10
15
20
25
30
20
25
30
PACF
0.0 0.2 0.4 0.6 0.8 1.0
LAG
10
15
LAG
Figure 8.53 Autocorrelation and partial autocorrelation functions of the residuals from
the AR model
statistic
parameter
p.value
method
data.name
statistic
parameter
p.value
method
data.name
statistic
parameter
p.value
[,1]
42.89536
1
5.774747e-11
"ARCH LM-test;
"regr$res"
[,2]
56.07503
2
6.660228e-13
"ARCH LM-test;
"regr$res"
[,3]
55.92459
3
4.359513e-12
Null hypothesis:
no ARCH effects"
Null hypothesis:
no ARCH effects"
method
"ARCH LM-test;
data.name "regr$res"
[,4]
statistic 56.12527
parameter 4
p.value
1.887501e-11
method
"ARCH LM-test;
[,5]
statistic 56.11349
parameter 5
p.value
7.700796e-11
method
"ARCH LM-test;
Null hypothesis:
no ARCH effects"
Null hypothesis:
no ARCH effects"
Null hypothesis:
no ARCH effects"
371
It is possible to estimate jointly the parameters in the ARMA-ARCH specification,

by having recourse to the function garchFit in the package fGarch, and considering
different model specifications.
The main arguments13 of the function garchFit are:
formula a formula object describing the mean and variance equation of the
ARMA-GARCH/APARCH model. A pure GARCH(1,1) model is selected
when e.g. formula=~garch(1,1). To specify for example an ARMA(2,1)APARCH(1,1) use formula = ~arma(2,1)+apaarch(1,1).
data an optional timeSeries or data frame object containing the variables in the
model. If not found in data, the variables are taken from environment(formula),
typically the environment from which armaFit is called. If data is an univariate
series, then the series is converted into a numeric vector and the name of the
response in the formula will be neglected.
skew a numeric value, the skewness parameter of the conditional distribution.
shape a numeric value, the shape parameter of the conditional distribution.
cond.dist a character string naming the desired conditional distribution. Valid

values are "dnorm", "dged", "dstd", "dsnorm", "dsged", "dsstd" and "QMLE".
The default value is the normal distribution.
include.mean this flag determines if the parameter for the mean will be
estimated or not. If include.mean=TRUE this will be the case, otherwise the
parameter will be kept fixed during the process of parameter optimization.
include.shape logical flag which determines if the parameter for the shape of
the conditional distribution will be estimated or not. If include.shape=FALSE
then the shape parameter will be kept fixed during the process of parameter
optimization.
trace a logical flag. Should the optimization process of fitting the model
parameters be printed? By default trace=TRUE.
13 From
the R-help system. Have a look at ?fGarch::garchFit for more information.
372
APARCH models and skew distributions for the errors are also implemented.
We start by considering the larger model Yt AR(4) and t ARCH(4)
> garchFit(formula = ~arma(4, 0) + garch(4, 0), data = ats,
trace = FALSE)
Title:
GARCH Modelling
Call:
garchFit(formula = ~arma(4, 0) + garch(4, 0), data = ats,
trace = FALSE)
Mean and Variance Equation:
data ~ arma(4, 0) + garch(4, 0)
<environment: 0x00000000144b8728>
[data = ats]
Conditional Distribution:
norm
Coefficient(s):
mu
ar1
0.60499579
0.51475833
omega
alpha1
1.79948380
0.78609851
ar2
0.18565747
alpha2
0.01896518
ar3
-0.01889278
alpha3
0.00000001
ar4
0.01062030
alpha4
0.00000001
Std. Errors:
based on Hessian
Error Analysis:
mu
6.050e-01
8.374e-02
7.225 5.02e-13 ***
ar1
5.148e-01
4.313e-02
11.934 < 2e-16 ***
ar2
1.857e-01
3.694e-02
5.026 5.01e-07 ***
ar3
-1.889e-02
3.484e-02
-0.542
0.588
ar4
1.062e-02
3.047e-02
0.349
0.727
omega
1.799e+00
2.198e-01
8.188 2.22e-16 ***
alpha1 7.861e-01
1.071e-01
7.338 2.16e-13 ***
alpha2 1.897e-02
3.607e-02
0.526
0.599
alpha3 1.000e-08
4.095e-02
0.000
1.000
alpha4 1.000e-08
NA
NA
NA
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Log Likelihood:
-1079.044
normalized:
-2.158088
373
Description:
From the preceding results we observe that the coeffients pertaining the 3rd and 4th
autoregressive lags as well as 3 and 4 are not significantly different from 0, so we
can proceed estimating the model Yt AR(2) and t ARCH(1)
By using the function summary applied to the object resulting from garchFit
diagnostic results pertaining standardized residuals ht are also produced
t
> archmodel <- garchFit(formula = ~arma(2, 0) + garch(1,

0), data = ats, trace = FALSE)
> summary(archmodel)
Title:
GARCH Modelling
Call:
trace = FALSE)
<environment: 0x0000000015507c70>
[data = ats]
norm
Coefficient(s):
mu
ar1
0.60017 0.52472
ar2
0.17563
omega
1.78951
alpha1
0.82856
Std. Errors:
based on Hessian
Error Analysis:
mu
0.60017
0.08381
7.161 8.00e-13 ***
ar1
0.52472
0.03971
13.215 < 2e-16 ***
ar2
0.17563
0.03493
5.029 4.94e-07 ***
omega
1.78951
0.20572
8.699 < 2e-16 ***
alpha1
0.82856
0.10820
7.658 1.89e-14 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Log Likelihood:
-1077.762
normalized:
-2.155525
374
Description:
Standardised Residuals Tests:

Jarque-Bera Test
Shapiro-Wilk Test
Ljung-Box Test
Ljung-Box Test
Ljung-Box Test
Ljung-Box Test
Ljung-Box Test
Ljung-Box Test
LM Arch Test
R
R
R
R
R
R^2
R^2
R^2
R
Chi^2
W
Q(10)
Q(15)
Q(20)
Q(10)
Q(15)
Q(20)
TR^2
Statistic
1.325538
0.9976004
3.376161
8.124189
11.41074
6.433391
7.772612
14.90346
7.920425
p-Value
0.5154222
0.6966819
0.9711369
0.9187073
0.9348674
0.777633
0.9325749
0.7819045
0.7913176
Information Criterion Statistics:

AIC
BIC
SIC
HQIC
4.331050 4.373196 4.330852 4.347588
To assess if the standardized residuals are white noise the Ljung-Box tests up to order
10, 15, 20 are reported both for standardized residuals and their squared values. There
is no evidence of autocorrelation nor of conditional heteroscedasticity presence.
To assess the normality distributional assumption the Jarque-Bera statistic and
the Shapiro-Wilk test are reported; both tests do not reject the null hypothesis of
normality.
The function plot applied to an fGARCH object deriving from a garchFit estimate,
produces the graphs in Fig. 8.54 and 8.55.
> plot(archmodel, which = 1:8)
> plot(archmodel, which = 9:13)
In Fig. 8.54 are reported, by column:
1. the graphical representation of the time series yt , in the present case ats
2. the graphical representation of the conditional standard deviations ht

3. the graphical representation
of the time series yt with the 2 conditional standard
deviations 2 ht superposed
4. the autocorrelation function of the time series yt
5. the autocorrelation function of the squared series 2t
6. the cross correlation function between yt2 and yt
7. the graphical representation of the residuals t
375
ACF of Squared Observations
200
300
400
500
10
15
20
Conditional SD
Cross Correlation
ACF
25
0.3
Lags
0.1
0
100
200
300
400
500
20
10
10
Lags
Series with 2 Conditional SD Superimposed
Residuals
20
res
100
200
300
400
500
100
200
300
Index
ACF of Observations
Conditional SD's
xcsd
500
400
500
0.0
400
6 10
Index
0.6
10 0
10
20
Index
15
Index
100
6 10
ACF
0.6
ACF
0.0
5
15
20
Time Series
10
15
20
25
100
200
Lags
300
Index
Figure 8.54
fGARCH plots
8. the graphical representation of the conditional standard deviations
ht
while in Fig. 8.55 are reported

9. the graphical representation of the standardized residuals
10. the autocorrelation function of the standardized residuals
t
ht
t
ht
11. the partial autocorrelation function of the standardized residuals

12. the cross correlation function between
2t
ht
and
13. the QQ plot for the standardized residuals
t
ht
t
ht
t
ht
A single graph can be extracted by using plot(archmodel,which=number), where

number is in the set {1, . . . , 13}.
To obtain forecasts the function predict can be used
> predict(archmodel, 4)
meanForecast meanError standardDeviation
1
2.691235 1.381014
1.381014
376
Cross Correlation
0.15
0.05
0.05
ACF
0 1 2
2
sres
100
200
300
400
500
10
qnorm QQ Plot
0.8
0 1 2
ACF of Standardized Residuals
0.4
4
0
10
Lags
0.0
ACF
20
Index
Sample Quantiles
10
15
20
25
20
Lags
ACF
0.0
0.4
0.8
ACF of Squared Standardized Residuals
10
15
20
25
Lags
Figure 8.55
2
3
4
2.544390
2.407925
2.310529
1.973537
2.428413
2.763893
fGARCH plots
1.835684
2.140452
2.363383
Simulation of a GARCH process

The following code was used to generate the ats series.
> library(fGarch)
> set.seed(123456)
> spec = garchSpec(model = list(mu = 0.5, ar = c(0.5,
0.2), omega = 2, alpha = c(0.75), beta = 0))
> ats <- garchSim(spec, n = 500)
The reader can check the estimation result in presence of longer time series.
> ats <- garchSim(spec, n = 10000)
> summary(garchFit(formula = ~arma(2, 0) + garch(1, 0),
data = ats, trace = FALSE))
Title:
GARCH Modelling
Call:
trace=FALSE)
<environment: 0x0000000019fff210>
[data = ats]
norm
Coefficient(s):
mu
ar1
0.53781 0.49843
ar2
0.19361
omega
1.94839
alpha1
0.77242
Std. Errors:
based on Hessian
Error Analysis:
mu
0.537809
0.018706
28.75
<2e-16 ***
ar1
0.498429
0.008842
56.37
<2e-16 ***
ar2
0.193607
0.007809
24.79
<2e-16 ***
omega
1.948387
0.047459
41.05
<2e-16 ***
alpha1 0.772418
0.023185
33.32
<2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Log Likelihood:
-21554.46
normalized:
-2.155446
Description:

Jarque-Bera Test
Shapiro-Wilk Test
Ljung-Box Test
Ljung-Box Test
Ljung-Box Test
R
R
R
R
R
Chi^2
W
Q(10)
Q(15)
Q(20)
Statistic
4.09366
NA
12.31859
15.11452
22.38218
p-Value
0.1291436
NA
0.2643003
0.4431998
0.3201377
377
378
Ljung-Box Test
Ljung-Box Test
Ljung-Box Test
LM Arch Test
R^2
R^2
R^2
R
Q(10)
Q(15)
Q(20)
TR^2
17.56984
21.53238
22.21691
19.71963
0.06266804
0.1206646
0.3288577
0.07257902

AIC
BIC
SIC
HQIC
4.311891 4.315496 4.311891 4.313112
8.13
Volatility in Daily Exchange Rates (Section 8.11.3)
Data can be read by means of the function read.table, having extracted the file
usd.dat from the compressed archive ch08.zip.
> crates <- read.table(unzip("ch08.zip", "Chapter 8/usd.dat"),
header = TRUE)
The file usd contains daily exchange rate changes from 5 January 1999 to 28 February
2011 (T=3108), without gaps.
dlogusd100 change in log exchange rate dollar/euro (x 100)
dayoftheweek 1=monday, 2=tuesday, etc.
monday dummy for monday
tuesday dummy for tuesday
wednesday dummy for wednesday
thursday dummy for thursday
Since data are irregular (5 days in a week) it is preferable to create an undated time
series. The differenced US$/Euro exchange rate series is analyzed.
> yt <- as.ts(crates[, 3])
The graphical representation can be obtained as:
> library(lattice)
> xyplot(yt)
By regressing the series yt on a constant the residuals et are obtained that can
be modelled by means of an ARCH specification if conditional heteroscedasticity is
present.
> library(dynlm)
> et <- dynlm(yt ~ 1)$res
Tests for ARCH effects can be obtained by means of the function ArchTest in the
package FinTS. Verbeek considers 1 and 6 lags.
379
1000
2000
3000
Time
Figure 8.56
Daily change in log exchange rate US$/DM, 2 January 1980-21 May 1987
> library(FinTS)
> ArchTest(et, lags = 1, demean = FALSE)
data: et
Chi-squared = 136.3154, df = 1, p-value < 2.2e-16
> ArchTest(et, lags = 6, demean = FALSE)
data: et
Chi-squared = 208.243, df = 6, p-value < 2.2e-16
Both tests reject the hypothesis of homoscedasticity.
The following four models are estimated: an ARCH(6), a GARCH(1,1), an
EGARCH(1,1) and a GARCH(1,1) model with t-distributed errors. The parameter
estimates of the first two and the fourth models can be obtained by using the function
garchFit available in the package fGarch.
380
> library(fGarch)
> arch6 <- garchFit(formula = ~garch(6, 0), data = as.vector(et),
include.mean = FALSE, trace = FALSE)
> summary(arch6)
Title:
GARCH Modelling
Call:
garchFit(formula = ~garch(6, 0), data = as.vector(et),
data ~ garch(6, 0)
<environment: 0x0000000016613a18>
[data = as.vector(et)]
norm
Coefficient(s):
omega
alpha1
0.237534 0.072566
alpha6
0.078275
alpha2
0.026908
alpha3
0.084783
alpha4
0.112259
alpha5
0.095242
Std. Errors:
based on Hessian
Error Analysis:
omega
0.23753
0.01524
15.590 < 2e-16 ***
alpha1
0.07257
0.02194
3.308 0.000939 ***
alpha2
0.02691
0.02074
1.298 0.194433
alpha3
0.08478
0.02270
3.736 0.000187 ***
alpha4
0.11226
0.02451
4.579 4.67e-06 ***
alpha5
0.09524
0.02284
4.169 3.05e-05 ***
alpha6
0.07827
0.01945
4.025 5.69e-05 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Log Likelihood:
-3045.473
normalized:
-0.979882
Description:
381

Jarque-Bera Test
Shapiro-Wilk Test
Ljung-Box Test
Ljung-Box Test
Ljung-Box Test
Ljung-Box Test
Ljung-Box Test
Ljung-Box Test
LM Arch Test
R
R
R
R
R
R^2
R^2
R^2
R
Chi^2
W
Q(10)
Q(15)
Q(20)
Q(10)
Q(15)
Q(20)
TR^2
Statistic
203.2454
0.992023
5.577922
18.3471
21.25551
9.355185
23.09638
43.9503
11.90132
p-Value
0
4.006602e-12
0.8493912
0.2448572
0.382233
0.4987593
0.08211484
0.001528166
0.4536374

AIC
BIC
SIC
HQIC
1.964269 1.977876 1.964258 1.969154
The estimate of the parameter 2 is not significantly different from 0, but we have
to observe that the current version of the function garchFit does not allow the
estimation of non complete models.
The function plot applied to an fGARCH object deriving from a garchFit estimate,
produces the graphs in Fig. 8.57 and 8.58.
> plot(arch6, which = 1:8)
> plot(arch6, which = 9:13)
Tests for the autocorrelation of standardized residuals do no reject the null of white
noise, while the tests for autocorrelation of squared residuals give evidence of the
presence of conditional heteroscedasticity up to lag 20; this behaviour is not so evident
from the analysis of the autocorrelation function, see Fig. 8.58.
The Jarque-Bera and Shapiro-Wilk statistics reject the null of a normal distribution
for the standardized residuals. The QQ plot gives evidence that the tails are fatter
than those of a normal distribution, see Fig. 8.59.
> plot(arch6, which = 13)
The function predict can be used to obtain forecasts
> predict(arch6, 4)
meanForecast meanError standardDeviation
1
0 0.5944364
0.5944364
2
0 0.5804800
0.5804800
3
0 0.5721766
0.5721766
4
0 0.5828789
0.5828789
The code for estimating a GARCH(1,1) model is:
382
ACF of Squared Observations
0.0
1000
1500
2000
2500
3000
500
15
20
25
Cross Correlation
1000
1500
2000
2500
3000
30
20
10
10
35
Lags
Series with 2 Conditional SD Superimposed
Residuals
20
30
0
4
res
Index
30
0.06 0.04
Conditional SD
500
1000
1500
2000
2500
3000
500
1000
1500
2000
Index
Index
ACF of Observations
Conditional SD's
2500
3000
2500
3000
0.5
0.0
xcsd
0.6
1.5
10
Lags
ACF
0
Index
0.5
500
1.5
ACF
0.6
ACF
0
4
Time Series
10
15
20
25
30
35
500
Lags
1000
1500
2000
Index
Figure 8.57
fGARCH plots
> garch1_1 <- garchFit(formula = ~garch(1, 1), data = as.vector(et),

> summary(garch1_1)
Title:
GARCH Modelling
Call:
garchFit(formula = ~garch(1, 1), data = as.vector(et),
data ~ garch(1, 1)
<environment: 0x00000000170f00c8>
norm
Cross Correlation
ACF
0.00
0.06
0
4
sres
0.06
500
1000
1500
2000
2500
3000
30
20
10
10
Index
Lags
ACF of Standardized Residuals
qnorm QQ Plot
20
30
ACF
0.0
0.4
0.8
Sample Quantiles
10
15
20
25
30
35
Lags
0.0
ACF
0.4
0.8
ACF of Squared Standardized Residuals
10
15
20
25
30
35
Lags
Figure 8.58
Coefficient(s):
omega
alpha1
0.0016224 0.0308557
fGARCH plots
beta1
0.9658062
Std. Errors:
based on Hessian
Error Analysis:
omega 0.0016224
0.0006968
2.328
0.0199 *
alpha1 0.0308557
0.0041378
7.457 8.86e-14 ***
beta1 0.9658062
0.0044704 216.047 < 2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Log Likelihood:
-2978.583
normalized:
-0.9583602
383
384
qnorm QQ Plot
2
0
2
Sample Quantiles
Figure 8.59 QQ plot for the standardardized residuals from a GARCH(1,1) model under
the normality assumption
Description:

Jarque-Bera Test
Shapiro-Wilk Test
Ljung-Box Test
Ljung-Box Test
Ljung-Box Test
Ljung-Box Test
Ljung-Box Test
Ljung-Box Test
LM Arch Test
R
R
R
R
R
R^2
R^2
R^2
R
Chi^2
W
Q(10)
Q(15)
Q(20)
Q(10)
Q(15)
Q(20)
TR^2
Statistic
135.6136
0.993768
4.401897
15.36537
18.42054
3.850639
6.46133
11.21603
4.583747
p-Value
0
2.814197e-10
0.9274011
0.4254333
0.5597264
0.9538334
0.970914
0.9404265
0.9704592
385
qnorm QQ Plot
0
2
Sample Quantiles
Figure 8.60 QQ plot for the standardardized residuals from a GARCH(1,1) model under
the normality assumption

AIC
BIC
SIC
HQIC
1.918651 1.924483 1.918649 1.920745
Also in this case the tests for the autocorrelation of standardized residuals do no
reject the null of white noise, and tests for autocorrelation of squared residuals do not
give evidence of the presence of conditional heteroscedasticity.
for the standardized residuals and the QQ plot gives evidence that the tails are fatter
than those of a normal distribution, see Fig. 8.60.
> plot(garch1_1, which = 13)
The code for estimating a GARCH(1,1) model with t distributed errors is:
> garch1_1t <- garchFit(formula = ~garch(1, 1), data = as.vector(et),
cond.dist = "std", include.mean = FALSE, trace = FALSE)
> summary(garch1_1t)
386
Title:
GARCH Modelling
Call:
garchFit(formula = ~garch(1, 1), data=as.vector(et), cond.dist =
"std", include.mean = FALSE, trace = FALSE)
data ~ garch(1, 1)
<environment: 0x00000000179a6cc0>
std
Coefficient(s):
omega
alpha1
0.0018883
0.0313289
beta1
0.9648050
shape
10.0000000
Std. Errors:
based on Hessian
Error Analysis:
omega 1.888e-03
8.235e-04
2.293
0.0218 *
alpha1 3.133e-02
4.817e-03
6.504 7.84e-11 ***
beta1 9.648e-01
5.246e-03 183.899 < 2e-16 ***
shape 1.000e+01
1.497e+00
6.681 2.38e-11 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Log Likelihood:
-2952.596
normalized:
-0.9499987
Description:

Jarque-Bera Test
Shapiro-Wilk Test
Ljung-Box Test
Ljung-Box Test
Ljung-Box Test
Ljung-Box Test
R
R
R
R
R
R^2
Chi^2
W
Q(10)
Q(15)
Q(20)
Q(10)
Statistic
136.7413
0.9937392
4.436802
15.43505
18.49223
3.906787
p-Value
0
2.608938e-10
0.9254986
0.4205569
0.5550169
0.951454
387
qstd QQ Plot
0
2
Sample Quantiles
Figure 8.61 QQ plot for the standardardized residuals from a GARCH(1,1) model with
t distributed errors
Ljung-Box Test
Ljung-Box Test
LM Arch Test
R^2
R^2
R
Q(15)
Q(20)
TR^2
6.532642
11.22761
4.622786
0.9693467
0.9401049
0.9694076

AIC
BIC
SIC
HQIC
1.902571 1.910347 1.902568 1.905363
Also in this case the tests for the autocorrelation of standardized residuals do no
reject the null of white noise, and tests for autocorrelation of squared residuals do not
give evidence of the presence of conditional heteroscedasticity.
for the standardized residuals and the QQ plot shows how the t distribution can better
capture the behaviour on the tails of standardized residuals, see Fig. 8.61.
> plot(garch1_1t, which = 13)
Also the Information Criterion Statistics give evidence of a better fit.
388
To variance specification of an EGARCH(1,1) process is

)
(
|t1 |
t1
+ p
ht = exp + log(ht1 ) + p
ht1
ht1
to estimate an EGARCH(1,1) model we have to construct the likelihood function
> "Exponential GARCH likelihood"
[1] "Exponential GARCH likelihood"
> T <- length(et)
> egarchllik <- function(par) {
omega <- par[1]
beta <- par[2]
gamma <- par[3]
alpha <- par[4]
sigma2 <- rep(var(et), T)
for (i in 2:T) {
sigma2[i] <- exp(omega + beta * log(sigma2[i 1]) + gamma * et[i - 1]/sigma2[i - 1]^0.5 +
alpha * abs(et[i - 1])/sigma2[i - 1]^0.5)
}
llik <- -sum(dnorm(et, mean = mean(et), sd = sigma2^0.5,
log = TRUE))
llik
}
and use a numerical procedure to obtain estimates and their standard errors
> out <- optim(c(0, 0.5, 0, 0.5), egarchllik,
control = list(reltol = 1e-15), hessian = TRUE)
which can be extracted from the object out as
> out$par
[1] -0.062172568 0.994893276 -0.008454532 0.074520295
> diag(solve(out$hessian))^0.5
[1] 0.007948912 0.001939568 0.005554193 0.009590379
To extract conditional standard deviations use
>
>
>
>
>
>
omegahat <- out$par[1]

betahat <- out$par[2]
gammahat <- out$par[3]
alphahat <- out$par[4]
sigma2hat <- rep(var(et), T)
for (i in 2:T) sigma2hat[i] <- exp(omegahat + betahat *
log(sigma2hat[i - 1]) + gammahat * et[i - 1]/sigma2hat[i 1]^0.5 + alphahat * abs(et[i - 1])/sigma2hat[i 1]^0.5)
389
Figure 8.62 shows the conditional standard deviations implied by the EGARCH model
and the QQ plot for checking the normality distribution of the standardized residuals.
> xyplot(as.ts(sigma2hat^0.5))
> stdres <- et/sigma2hat^0.5
> qqnorm(stdres)
The function egarch in the package egarch is also available for estimating the
parameters of an EGARCH(1,1) model, but it refers to the following EGARCH model
specification:

#)
(
"

t1
t1
t1
2
2
+ 1 p
log(t ) = 0 + 1 log(t1 ) + 1 p
E p

ht1
ht1
ht1
different from that proposed by Verbeek, see ?egarch::egarch.
390
0.4
0.6
0.8
1.0
1.2
1000
2000
3000
Time
Normal QQ Plot
0
2
4
Sample Quantiles
9
Multivariate Time Series Models
9.1
Spurious Regression (Section 9.2.1)
Data on two stochastic processes Xt and Yt , generated by two independent random

walks, see Verbeeks relationships (9.15) and (9.16):
Yt = Yt1 + 1t ,
1t IID(0, 12 )
Xt = Xt1 + 2t ,
2t IID(0, 22 )
where 1t and 2t mutually independent

are available in the data set spurious.dat. This can be read by means of the function
read.table, having extracted the file spurious.dat from the compressed archive
ch09.zip.
> spurreg <- read.table(unzip("ch09.zip", "Chapter 9/spurious.dat"),
header = TRUE)
The regression model:
Yt = + Xt + t
is estimated to show the spurious regression results. The estimates in Verbeeks Table
9.1 can be obtained by means of the following code, having computed the DurbinWatson statistic by using the function dwtest in the package lmtest, see Section
4.2.2.
> tab9.1 <- lm(Y ~ X, data = spurreg)
> summary(tab9.1)
Call:
lm(formula = Y ~ X, data = spurreg)
Residuals:
Min
1Q Median
-6.4625 -2.5223 -0.0414
3Q
2.1179
Max
8.2879
Coefficients:
392
(Intercept) 3.90971
0.24618
15.88
<2e-16 ***
X
-0.44348
0.04733
-9.37
<2e-16 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Adjusted R-squared:
> library(lmtest)
> dwtest(tab9.1)
Durbin-Watson test
0.3037
data: tab9.1
DW = 0.1331, p-value < 2.2e-16
alternative hypothesis: true autocorrelation is greater than 0
As Verbeek remarks, the usual t and F tests may be misleading in situations like the
present one, where we can observe a quite reasonable R-squared and a low DurbinWatson statistic1 , which gives information about the possible presence of a unit
root in the residuals giving also evidence about the non-existence of a cointegrating
relationship between {Xt } and {Yt } as would appear by considering the spurious
relationship. He suggests to include lagged values of both the dependent and the
independent variables in the regression to avoid the spurious regression problem.
Lets perform 1000 Monte Carlo replications of the above spurious regression
experiment to show the large variability level that characterizes parameter estimates,
since there is no actual presence of a relationship between {Xt } and {Yt }.
> set.seed(12345)
> library(lmtest)
> sim <- function(n = 200) {
X <- c(0, cumsum(rnorm(n - 1)))
Y <- c(0, cumsum(rnorm(n - 1)))
res <- lm(Y ~ X)
res1 <- list(int = res$coef[1], slope = res$coef[2],
int.pv=summary(res)$coeff[1,4],slope.pv=summary(res)$coeff[2,
4], r2 = summary(res)$r.squared, dw = dwtest(res)$statistic)
}
> a <- replicate(1000, sim(n = 200))
The function sim generates the data for the present experiment2 ; it returns the
list res1 with elements: the intercept and the slope estimates, the corresponing tstatistics, the multiple R-squared and the Durbin-Watson statistic.
1 We remind that dw 2 2, so approximately 0 dw 4, being dw = 0 when = 1 that is in
presence of a positive unit root, dw = 2 when = 0, and dw = 4 when = 1 that is in presence of
a negative unit root.
2 Observe that to generate a random walk the following for loop can also be used:
X<-rep(0,n)
for (i in 2:n) X[i]<-X[i-1]+rnorm(1)
but it is less efficient and slower; check the two versions of the code for n=1000000.
393
The function replicate (a shortcut for sapply) returns the matrix a, which
contains the results for the 1000 replications of the experiment; in the rows of a are the
values of the intercept, the slope, their significances, the R2 and the Durbin-Watson
statistic, obtained for each replication of the experiment.
By computing the following summary statistics we can observe that 89.5%
replications present a significant3 estimate for the intercept, 82.5% a significant
estimate for the slope, 73.8% significant estimates for both the intercept and the slope,
33.7% a multiple R-squared larger than 0.3 and 98.8% a value for the Durbin-Watson
statistic lower than 0.25.
> sum((a[3, ] < 0.05))/1000
[1] 0.895
> sum((a[4, ] < 0.05))/1000
[1] 0.825
> sum((a[3, ] < 0.05) * (a[4, ] < 0.05))/1000
[1] 0.738
> sum(a[5, ] > 0.3)/1000
[1] 0.337
> sum(a[6, ] < 0.25)/1000
[1] 0.988
Summary statistics for the intercept and slope give evidence that their estimates are
quite different from simulation to simulation, see also Fig. 9.1, giving evidence of the
presence of spurious relationships. Namely {Xt } and {Yt } do not have any reciprocal
relationships.
> summary(as.numeric(a[1, ]))
Min. 1st Qu.
Median
Mean 3rd Qu.
Max.
-21.1800 -4.5880 -0.2781 -0.2166
3.7530 27.2800
> summary(as.numeric(a[2, ]))
Min.
1st Qu.
Median
Mean
3rd Qu.
-2.5020000 -0.3608000 0.0091750 0.0004262 0.3686000
Max.
2.7430000
> hist(as.numeric(a[1, ]), freq = FALSE, main = "",
xlab = "intercept")
> hist(as.numeric(a[2, ]), freq = FALSE, main = "",
xlab = "slope")
9.2
Long-run Purchasing Power Parity (Part 2) (Section 9.3)
The analyisis of Long-run Purchasing Power Parity, started in Section 8.9 (Verbeeks
Section 8.5), is continued. To import the data from the file ppp2.wf1, which is a work
3 At
the 5% level.
394
0.0
0.00
0.1
0.01
0.2
0.3
Density
0.03
0.02
Density
0.4
0.04
0.5
0.05
0.6
20
10
intercept
Figure 9.1
20
30
3 2 1
slope
Histograms for the intercept and slope estimates
file of EViews, invoke first the package hexView and next the command readEViews.
> library(hexView)
> ppp <- readEViews(unzip("ch08.zip", "Chapter 8/ppp2.wf1"))
Verbeek observes that the relationship st = pt pt , where st , pt and pt are
respectively the log of the spot exchange rate, the log of domestic prices and that
of foreign prices, may be interpreted as an equilibrium (long-run) or cointegrating
relationship.
In the example, observations for Euro area and the UK from January 1988 until
December 2010 are considered.
See Section 8.9 for the analysis to detect the non-stationarity of the real exchange
rate rst = st pt + pt .
Verbeek suggests to test if the cointegrating relationship involving the log exchange
rate st and the log of price ratio pt pt can be established. The results in Section
395
8.9 suggested st as I(1).

Augmented Dickey Fuller tests can be performed to establish if also the ratio pt pt
is I(1), by having recourse to the function ur.df available in the package urca. So
Verbeeks Table 9.4 can be reproduced with the following code4 .
>
>
>
>
>
library(urca)
x <- ppp$LOGCPIEURO - ppp$LOGCPIUK
a <- c(0:6, 12, 24, 36)
names(a) <- c("DF", paste("ADF(", a[-1], ")", sep = ""))
f <- function(k) {
nt <- summary(ur.df(x, type = "drift", lags = k))@teststat[1]
wt <- summary(ur.df(x, type = "trend", lags = k))@teststat[1]
return(c(nt, wt))
}
> rownames(out) <- c("Without trend", "With trend")
> print(t(round(out, 3)))
Without trend With trend
DF
-2.487
-2.564
ADF(1)
-2.533
-2.622
ADF(2)
-2.518
-2.639
ADF(3)
-2.137
-2.288
ADF(4)
-2.070
-2.229
ADF(5)
-2.037
-2.213
ADF(6)
-2.103
-2.227
ADF(12)
-2.989
-3.041
ADF(24)
-3.131
-3.424
ADF(36)
-2.027
-1.975
Remembering that the 5% critical values for the Dickey Fuller statistics are 2.88,
3.43 respectively for the situations with only a drift and with both a drift and
the trend, the hypothesis of non-stationarity cannot be rejected. The ADF(24) is
marginally significant.
The parameters in the cointegrating regression (see Verbeeks Table 9.5)
st = + (pt pt ) + t
can be estimated by having, as usual, recourse to the function lm.
> a <- lm(log(FXEP) ~ I(LOGCPIEURO - LOGCPIUK), data = ppp)
> summary(a)
Call:
lm(formula = log(FXEP) ~ I(LOGCPIEURO - LOGCPIUK), data = ppp)
Residuals:
4 As
Verbeek observes in Sections 8.5, 9.3 and 9.5.4, the exchange rates are accidentally not
converted to logs when producing the results. Here they are converted to logs for obtaining results.
396
Min
1Q
-0.24561 -0.08619
Median
0.01368
3Q
0.06477
Max
0.22179
Coefficients:
(Intercept)
0.38246
0.01806 21.172 < 2e-16 ***
I(LOGCPIEURO - LOGCPIUK) 1.01657
0.28125
3.614 0.000358 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Tests on the residuals for establishing the possible presence of nonstationarity can
also be performed.
The cointegrating regression Durbin-Watson (CRDW) statistic can be obtained with
the function dwtest available in the package lmtest; pay attention to consider only
the value of the statistic and not its p-value, which is computed against the null of
no autocorrelation presence and not for no cointegration. See Verbeeks Table 9.3 for
the 5% critical values for the CRDW test for no cointegration.
> library(lmtest)
> dwtest(a)$stat
DW
0.04120717
The Augmented Dickey Fuller cointegration test is also performed on the residuals,
see Verbeeks Table 9.6. We can have recourse again to the function f constructed
above, by considering only the first row of the resulting output, which is referred to
the situation with the presence of only the drift.
>
>
>
>
x <- a$residuals
a <- 0:6
out <- sapply(a, f)
print(t(round(out[1, ], 3)))
[,1]
[,2]
[,3]
[,4]
[,5]
[,6]
[,7]
[1,] -1.497 -1.478 -1.431 -1.479 -1.628 -1.522 -1.392
Also in this case only the value of the statistic can be considered. See Verbeeks Table
9.2 for the 1%, 5% and 10% asymptotic critical values residual-based unit root ADF
test for no cointegration (with constant term). In the present case the 5% critical
value is 3.34 since two variables were considered in the cointegration relationship.
So the null hypothesis of a unit root cannot be rejected.
Verbeek then suggests to consider a more general cointegrating relationships between
the three variables st , pt and pt , by estimating the parameters in the model
st = + pt + pt + t ,
397
see Verbeeks Table 9.7, and performing the corresponding tests on the residuals as
made above, see Verbeeks Table 9.8.
> a <- lm(log(FXEP) ~ LOGCPIEURO + LOGCPIUK, data = ppp)
> summary(a)
Call:
lm(formula = log(FXEP) ~ LOGCPIEURO + LOGCPIUK, data = ppp)
Residuals:
Min
1Q
-0.24406 -0.08568
Median
0.01416
3Q
0.06590
Max
0.22198
Coefficients:
(Intercept)
0.4193
0.2082
2.014
LOGCPIEURO
1.0076
0.2863
3.520
LOGCPIUK
-1.0151
0.2819 -3.601
--Signif. codes: 0 "***" 0.001 "**" 0.01
Pr(>|t|)
0.045027 *
0.000506 ***
0.000376 ***
"*" 0.05 "." 0.1 " " 1

> library(lmtest)
> dwtest(a)$stat
DW
0.04120736
> x <- a$residuals
> a <- 0:6
> print(t(round(out[1, ], 3)))
[,1]
[,2]
[,3]
[,4]
[,5]
[,6]
[,7]
[1,] -1.508 -1.489 -1.439 -1.485 -1.633 -1.528 -1.399
The 5% critical value for the residual-based unit root ADF test for no cointegration
(with constant term) is, from Verbeeks Table 9.2, 3.74 and again, the null hypothesis
of no cointegrating relationship presence cannot be rejected. Verbeek concludes that
the sample period is just not long enough to find sufficient evidence for a cointegrating
relationship.
9.3
Long-run Purchasing Power Parity (Part 3) (Section

9.5.4)
The same data set of the preceding Section is considered.

The Johansens technique is applied to analyse the existence of one or more
cointegrating relationships between the three variables st , pt and pt .
398
Verbeek suggests to temporary consider p = 3 as the maximum order of the lags in

the autoregressive representation he gives in relationship (9.42), that is:
~ t = + 1 Y
~t1 + 2 Y
~t2 + 3 Y
~t3 + ~t
Y
and to include the presence of unrestricted intercepts. He remarks that the first step
of the Johansens technique consists in determining the order p.
Later we will obtain maximum eigenvalue tests for cointegration can be obtained by
means of the function ca.jo available in the package urca, see Section 9.4, which
does not deal with unrestricted intercepts.
In order to perform the current analysis we can have recourse to the function
johansen available in the package oekfinm by Wolfgang Scherrer. The package was
built under R version 2.13.1, so You need that version of R to apply the johansen
function (R version 2.13.1 can be installed in a new folder). The arguments of the
johansen function are y: the data to analyse; r an integer specifying the cointegration
rank; p the order of the VAR; det the deterministic component, which can be "none"
or "const" or "linear"; and restricted which is a logical variable specifying if
restrictions have to be imposed on the deterministic components.
Eigenvalues and trace tests are the elements lambda and tracestats in the list
produced by the function johansen.
>
>
>
>
>
sppstar <- ppp[, c("FXEP", "LOGCPIEURO", "LOGCPIUK")]

sppstar[, 1] <- log(sppstar[, 1])
library(oekfinm)
tab9.10<-johansen(sppstar,r=0,p=3,det="const",restr=FALSE)
tab9.10$tracestat
stat
90%
95%
99%
r=0 37.520270 26.70 29.38 34.87
r=1 10.761479 13.31 15.34 19.69
r=2 4.053898 2.71 3.84 6.64
> tab9.10$lambda
r=1
r=2
r=3
0.09336701 0.02427051 0.01473973
Results are consistent with those reported by Verbeek. Verbeek observes that the
test results may be sensitive to the number of lags that are included and shows what
happens by including a lag of p = 13, see Verbeeks Table 9.11.
> tab9.11<-johansen(sppstar,r=0,p=13,det="const",restr=FALSE)
> tab9.11$tracestat
stat
90%
95%
99%
r=0 25.386442 26.70 29.38 34.87
r=1 8.444536 13.31 15.34 19.69
r=2 3.358870 2.71 3.84 6.64
> tab9.11$lambda
r=1
r=2
r=3
0.06238690 0.01915137 0.01269016
The test in this case does not reject the null of presence of 0 cointegrating vectors.
9.4
399
Money Demand and Inflation (Section 9.6)
Verbeek considers an empirical example to study cointegration in a five-dimensional

vector process.
Quarterly data are considered for the United States from 1954:1 to 1994:4 (T = 164)
for the following variables:
cpr commercial paper rate (in % per year)
infl quarterly inflation rate (change in log prices), % per year
m log of real M1 money stock
quarter quarter of observation (yyyyqq)
tbr treasury bill rate
y log real GDP (in billions of 1987 dollars).
They can be read by means of the function readEViews, available in the package
hexView, having extracted the work file of EViews money.wf1 from the compressed
archive ch09.zip. We can create a multiple time series from the columns of a table
or a data.frame with the function ts(object, start, frequency); in this case we
have to specify frequency=4 since we are dealing with quarterly data.
We drop the column with the time reference.
> library(hexView)
> money <- readEViews(unzip("ch09.zip", "Chapter 9/money.wf1"))
> money <- money[, -4]
> money <- ts(money, start = c(1954, 1), frequency = 4)
Verbeek considers first three theoretical relationships governing the long-run
behaviour of these variables, that can be assumed as theoretical cointegrating
relationships. Verbeek performs three separate OLS estimates. The parameter
estimates for the equations describing the money demand, the inflation rate and
the commercial paper rate:
mt = 1 + 14 yt + 15 tbrt + 1t
inf lt = 2 + 25 tbrt + 2t
cprt = 3 + 35 tbrt + 3t
are obtained by means of the function dynlm available in the package dynlm and the
results organized by the function mtable available in the package memisc.
>
>
>
>
>
library(dynlm)
library(lmtest)
library(urca)
demols <- dynlm(M ~ Y + TBR, data = money)
inflols <- dynlm(INFL ~ TBR, data = money)
400
> commpapols <- dynlm(CPR ~ TBR, data = money)

> library(memisc)
> mtable(demols, inflols, commpapols, summary.stats = "R-squared")
Calls:
demols: dynlm(formula = M ~ Y + TBR, data = money)
inflols: dynlm(formula = INFL ~ TBR, data = money)
commpapols: dynlm(formula = CPR ~ TBR, data = money)
=============================================
demols
inflols
commpapols
--------------------------------------------(Intercept)
3.189***
1.158***
0.461***
(0.122)
(0.333)
(0.065)
Y
0.423***
(0.016)
TBR
-0.031***
0.558***
1.038***
(0.002)
(0.053)
(0.010)
--------------------------------------------R-squared
0.815
0.409
0.984
=============================================
The cointegrating regression Durbin-Watson (CRDW) statistic and the residual-based
unit root ADF statistic for no cointegration (with constant term) can be computed
by using the functions dwtest and ur.df; remember to check Verbeeks Tables 9.2
and 9.3 for the proper critical values.
> dwtest(demols)$stat
DW
0.1989882
> ur.df(demols$res, type = "drift", lags = 6)@teststat[1]
[1] -3.163297
> dwtest(inflols)$stat
DW
0.7841199
> ur.df(inflols$res, type = "drift", lags = 6)@teststat[1]
[1] -1.915916
> dwtest(commpapols)$stat
DW
0.704969
> ur.df(commpapols$res, type = "drift", lags = 6)@teststat[1]
[1] -4.0456
The Durbin-Watson statistics, according to the critical values from Verbeeks Table
9.3, which are based on the assumption of random walks possibly not satisfied
by the money supply and GDP series , suggest to reject the null hypothesis of
401
no cointegration at the 5% level for the last two equations, while, according to the
residual-based unit root ADF statistic for no cointegration (with constant term), the
hypothesis of no cointegration does not hold only for the third equation, describing
risk premium5 .
Verbeek suggests to perform a multivariate vector analysis to have stronger evidence
for the existence of cointegrating relationships between the five variables. He does
also perform a graphical analysis of the residuals of the three equations to check for
stationarity, see Figures 9.2, 9.3 and 9.4.
> plot(demols$res, type = "l")
> abline(h = 0)
> plot(inflols$res, type = "l")
> abline(h = 0)
> plot(commpapols$res, type = "l")
> abline(h = 0)
To perform the Johansen procedure, the maximum length p in the vector
autoregressive model has to be chosen. Verbeek suggests the orders 5 and 6 and
reports in Table 9.14 the trace and maximum eigenvalue tests for cointegration.
The maximum eigenvalue tests for cointegration can be obtained by means of the
function ca.jo available in the package urca, see Verbeeks Table 9.106 . The main
arguments of the function ca.jo are: x, the data matrix to be investigated for
cointegration; type, the test to be conducted, either "eigen" or "trace"; ecdet,
which can be set to "none" for no intercept in cointegration, "const" for constant
term in cointegration and "trend" for trend variable in cointegration; K, the lag
order of the series (levels) in the VAR, spec which determines the specification of
the VECM and can be "longrun" or "transitory". See the help ?urca::ca.jo for
more information.
> allvar <- money[, c("M", "INFL", "CPR", "Y", "TBR")]
> library(urca)
> summary(ca.jo(allvar, ecdet = "const", type = "trace", K = 5,
spec = "longrun"))
######################
# Johansen-Procedure #
5 Remember that both the CRDW and ADF statistics are null if a unit root is present, that is
under the null hypothesis of no cointegration. Appropriate 5% critical values from Verbeeks Table
9.2 for the ADF test are 3.34 for possible cointegrating relationships involving 2 variables and
3.74 for possible cointegrating relationships involving 3 variables; while with regard to the CRDW
test from Verbeeks Table 9.3 the 5% critical values are 0.20 for possible cointegrating relationships
involving 2 variables and 0.25 for possible cointegrating relationships involving 3 variables (number
of observations: 200).
6 Critical values for the max-eigenvalue test are taken from Osterwald-Lenum (1992). Though quite
similar, they differ somewhat from the critical values reported by Verbeek in Table 9.9. Observe that
tests are reversed with respect to Verbeeks output.
402
0.00
0.05
0.10
0.15
demols$res
0.05
0.10
0.15
1960
1970
1980
1990
Time
Figure 9.2
Residuals of money demand regression
######################
Test type: trace statistic,
without linear trend and constant in cointegration
Eigenvalues (lambda):
[1] 2.4582e-01 1.7317e-01 1.0279e-01 5.1003e-02 2.2397e-02 7.2499e-16
Values of teststatistic and critical values of test:
r
r
r
r
r
<= 4
<= 3
<= 2
<= 1
= 0
test
|
3.60
| 11.93
| 29.17
| 59.41
| 104.26
10pct
7.52
17.85
32.00
49.65
71.86
5pct
9.24
19.96
34.91
53.12
76.07
1pct
12.97
24.60
41.07
60.16
84.45
403
2
0
2
4
inflols$res
1960
1970
1980
1990
Time
Figure 9.3
Residuals of Fisher regression
Eigenvectors, normalised to first column:

(These are the cointegration relations)
M.l5
INFL.l5
CPR.l5
Y.l5
TBR.l5 constant
M.l5
1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
INFL.l5
0.027280 0.033616 0.018488 0.262027 -0.199582 -0.019474
CPR.l5
-0.070161 -1.144488 0.016785 -0.303908 -0.069297 -0.032074
Y.l5
-0.432566 -0.624159 -0.565694 -0.669171 -1.419789 -0.122782
TBR.l5
0.099270 1.210553 0.021598 0.046574 0.212904 0.045646
constant -3.282767 -0.923727 -2.172837 -0.326824 4.972280 -5.362733
Weights W:
(This is the loading matrix)
M.l5
INFL.l5
CPR.l5
Y.l5
TBR.l5
M.d
-0.009147 0.0047624 -0.0262606 -0.0013178 -0.00076057
INFL.d -2.819053 -1.3318367 -2.7246923 0.1724486 0.26087439
constant
4.9488e-14
8.2817e-12
404
1.0
0.5
0.0
0.5
commpapols$res
1.5
2.0
2.5
1960
1970
1980
1990
Time
Figure 9.4
Residuals of risk premium regression
CPR.d
2.351476 0.2971658 -1.5680125 0.2378696 -0.01334883 -2.3114e-12
Y.d
-0.053129 0.0022981 -0.0055104 0.0017462 -0.00051361 8.3207e-14
TBR.d
1.749904 -0.0049994 -1.5903476 0.2031231 -0.06967642 -7.2022e-13
> summary(ca.jo(allvar, ecdet = "const", type = "trace", K = 6,
spec = "longrun"))
######################
######################
Test type: trace statistic,
[1] 2.6902e-01 1.9711e-01 1.2999e-01 7.2538e-02 1.7324e-02 2.4472e-16
r
r
r
r
r
<= 4
<= 3
<= 2
<= 1
= 0
test
|
2.76
| 14.66
| 36.66
| 71.35
| 120.86
10pct
7.52
17.85
32.00
49.65
71.86
5pct
9.24
19.96
34.91
53.12
76.07
405
1pct
12.97
24.60
41.07
60.16
84.45

M.l6
INFL.l6
CPR.l6
Y.l6
TBR.l6
constant
M.l6
1.000000 1.0000000 1.000000 1.000000 1.000000 1.0000000
INFL.l6
0.027522 0.0048977 0.017315 0.619718 -0.098270 -0.0049827
CPR.l6
-0.128709 0.4810277 -0.067558 -0.077271 -0.072337 -0.1438921
Y.l6
-0.439317 -0.3650629 -0.537060 -1.773107 -0.716382 0.1387489
TBR.l6
0.158646 -0.4620892 0.107309 -0.419051 0.130337 0.1247853
constant -3.188704 -4.0767044 -2.332719 8.273357 -0.491810 -7.2161580
Weights W:
M.l6
INFL.l6
CPR.l6
Y.l6
TBR.l6
constant
M.d
-0.0033322 -0.018747 -0.0138332 -0.00072754 -0.00139569 2.9591e-14
INFL.d -3.3139036 1.641475 -6.2992151 0.07074960 0.49103886 7.1952e-12
CPR.d
3.2451187 -0.653724 -1.0908952 0.13929680 0.01714385 -1.9459e-12
Y.d
-0.0536283 -0.012044 0.0023838 0.00107612 0.00019001 4.2005e-14
TBR.d
1.8023190 -0.224897 -1.6813019 0.12665321 -0.08587212 -1.1704e-13
> summary(ca.jo(allvar, ecdet = "const", type = "eigen", K = 5,
spec = "longrun"))
######################
######################
Test type: maximal eigenvalue statistic (lambda max),
[1] 2.4582e-01 1.7317e-01 1.0279e-01 5.1003e-02 2.2397e-02 7.2499e-16
test 10pct 5pct 1pct
r <= 4 | 3.60 7.52 9.24 12.97
r <= 3 | 8.32 13.75 15.67 20.20
r <= 2 | 17.25 19.77 22.00 26.81
406
r <= 1 | 30.24 25.56 28.14 33.24

r = 0 | 44.86 31.66 34.40 39.79
M.l5
INFL.l5
CPR.l5
Y.l5
TBR.l5 constant
M.l5
1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
INFL.l5
0.027280 0.033616 0.018488 0.262027 -0.199582 -0.019474
CPR.l5
-0.070161 -1.144488 0.016785 -0.303908 -0.069297 -0.032074
Y.l5
-0.432566 -0.624159 -0.565694 -0.669171 -1.419789 -0.122782
TBR.l5
0.099270 1.210553 0.021598 0.046574 0.212904 0.045646
constant -3.282767 -0.923727 -2.172837 -0.326824 4.972280 -5.362733
Weights W:
M.l5
INFL.l5
CPR.l5
Y.l5
TBR.l5
constant
M.d
-0.009147 0.0047624 -0.0262606 -0.0013178 -0.00076057 4.9488e-14
INFL.d -2.819053 -1.3318367 -2.7246923 0.1724486 0.26087439 8.2817e-12
CPR.d
2.351476 0.2971658 -1.5680125 0.2378696 -0.01334883 -2.3114e-12
Y.d
-0.053129 0.0022981 -0.0055104 0.0017462 -0.00051361 8.3207e-14
TBR.d
1.749904 -0.0049994 -1.5903476 0.2031231 -0.06967642 -7.2022e-13
> summary(ca.jo(allvar, ecdet = "const", type = "eigen", K = 6,
spec = "longrun"))
######################
######################
Test type: maximal eigenvalue statistic (lambda max),
[1] 2.6902e-01 1.9711e-01 1.2999e-01 7.2538e-02 1.7324e-02 2.4472e-16
r
r
r
r
r
<= 4
<= 3
<= 2
<= 1
= 0
|
|
|
|
|
test
2.76
11.90
22.00
34.69
49.51
10pct
7.52
13.75
19.77
25.56
31.66
5pct
9.24
15.67
22.00
28.14
34.40
1pct
12.97
20.20
26.81
33.24
39.79
407

M.l6
INFL.l6
CPR.l6
Y.l6
TBR.l6
constant
M.l6
1.000000 1.0000000 1.000000 1.000000 1.000000 1.0000000
INFL.l6
0.027522 0.0048977 0.017315 0.619718 -0.098270 -0.0049827
CPR.l6
-0.128709 0.4810277 -0.067558 -0.077271 -0.072337 -0.1438921
Y.l6
-0.439317 -0.3650629 -0.537060 -1.773107 -0.716382 0.1387489
TBR.l6
0.158646 -0.4620892 0.107309 -0.419051 0.130337 0.1247853
constant -3.188704 -4.0767044 -2.332719 8.273357 -0.491810 -7.2161580
Weights W:
M.l6
INFL.l6
CPR.l6
Y.l6
TBR.l6
constant
M.d
-0.0033322 -0.018747 -0.0138332 -0.00072754 -0.00139569 2.9591e-14
INFL.d -3.3139036 1.641475 -6.2992151 0.07074960 0.49103886 7.1952e-12
CPR.d
3.2451187 -0.653724 -1.0908952 0.13929680 0.01714385 -1.9459e-12
Y.d
-0.0536283 -0.012044 0.0023838 0.00107612 0.00019001 4.2005e-14
TBR.d
1.8023190 -0.224897 -1.6813019 0.12665321 -0.08587212 -1.1704e-13
Verbeek suggests to restrict the rank of the long-run cointegrating matrix to be equal
to 2 and obtains maximum likelihood estimates of the cointegrating vectors and
the error correction model. In the package urca the function cajorls is available
which performs ordinary least squares regression of a restricted VECM. To obtain
the restricted model in Verbeeks Table 9.15 we have first to change the order of
the columns in allvar since restrictions will be referred to the first 2 variables, reapply the function ca.jo assigning the results to an object, and then use the function
cajorls with the argument r=2 establishing the number of restrictions.
> allvar <- allvar[, c(1, 3, 2, 4, 5)]
> restr.coint <- ca.jo(allvar, ecdet = "const", type = "eigen",
K = 6, spec = "longrun")
> restr.coint1 <- cajorls(restr.coint, r = 2)
The element beta in restr.coint1, see str(restr.coint1), contains the OLS
estimates of cointegrating vectors (after normalization) in a restricted VECM.
Observe that summary(cajools(restr.coint)) and summary(restr.coint1$rlm)
return the OLS regressions respectively of the unrestricted and restricted VECM.
> round(restr.coint1$beta, 3)
ect1
ect2
M.l6
1.000 0.000
CPR.l6
0.000 1.000
INFL.l6
0.023 -0.037
Y.l6
-0.424 0.122
TBR.l6
0.028 -1.018
constant -3.376 -1.456
408
The matrix of adjustment coefficients can be obtained with

> restr.coint1$rlm$coef[1:2, ]
M.d
CPR.d INFL.d
Y.d
TBR.d
ect1 -0.0220795 2.59139 -1.6724 -0.0656728 1.57742
ect2 -0.0085891 -0.73214 1.2161 0.0011087 -0.34016
Observe that signs are opposite to Verbeekss.
To obtain standard errors we can use:
> coef <- sapply(1:5, function(i) coeftest(cajorls(restr.coint,
r = 2, reg = i)$rlm)[1:2, 1])
> tstat <- sapply(1:5, function(i) coeftest(cajorls(restr.coint,
r = 2, reg = i)$rlm)[1:2, 2])
> colnames(coef) <- colnames(tstat) <- paste(colnames(allvar),
".d", sep = "")
> coef
M.d
CPR.d INFL.d
Y.d
TBR.d
ect1 -0.0220795 2.59139 -1.6724 -0.0656728 1.57742
ect2 -0.0085891 -0.73214 1.2161 0.0011087 -0.34016
> tstat
M.d
CPR.d INFL.d
Y.d
TBR.d
ect1 0.0109021 1.19861 2.36702 0.0126428 1.08157
ect2 0.0025669 0.28221 0.55731 0.0029767 0.25465
10
Models based on panel data
10.1
Explaining Individual Wages (Section 10.3)
Verbeek considers the application of the Between, the Fixed effects, the OLS and of
the Random effects estimators to deal with a panel data linear model for an individual
wage equation. The data are saved in the file males.dta, which is in the Stata format
and is available in the compressed archive ch10.zip. To read data we have first to
invoke the package foreign and next the command read.dta.
> library(foreign)
> wages <- read.dta(unzip("ch10.zip", "Chapter 10/males.dta"))
Data are taken from the Youth Sample of the National Longitudinal Survey held in
the USA, and comprise a sample of 545 full-time working males who completed their
schooling by 1980 and were then followed over the period 1980-1987. The males in the
sample are young, with an age in 1980 ranging from 17 to 23, and entered the labour
market fairly recently, with an average of 3 years of experience at the beginning of
the sample period.
NR: Observations number
YEAR: Year of observation
School: Years of schooling
Exper: Age-6-School
Exper2: Experience Squared
LogExper: Log(1+Experience)
Union: Wage set by collective bargaining
Mar: Married
Black: Black
Hisp: Hispanic
Health: Has health disability
Rural: Lives in rural area
NE: Lives in North East
410
NC: Lives in Northern Central
S: Lives in South
Wage: Log of hourly wage
12 Dummy variables for the different industries
9 Dummy variables for the occupational status
Verbeek supposes that log wages are explained by years of schooling, years of
experience and its square, dummy variables for being a union member, working in
the public sector and being married and two racial dummies.
The package plm can be used to deal with models based on panel data; the plm
procedures are thoroughly described in Croissant and Millo (2008).
A data.frame containing panel data must be characterized by the presence of two
variables defining respectively individual and time indices. The first two columns in
the data.frame wages contain such information.
> wages[1:5, 1:2]
NR YEAR
1 13 1982
2 13 1981
3 13 1986
4 13 1983
5 13 1984
Repeated measurements on each statistical unit are not ordered according to time;
the following code can be used to reorder the data frame when needed.
> i <- order(wages[, 1], wages[, 2])
> wages <- wages[i, ]
> wages[1:5, 1:2]
NR YEAR
7 13 1980
2 13 1981
1 13 1982
4 13 1983
5 13 1984
The panel model may be estimated with the function plm which requires the following
arguments:
formula: the usual formula for a linear model.

Instrumental variables estimation may be obtained by using a two-part formula,
the second part indicating the instrumental variables. To specify that y depends
on x1, x2 and x3, with x1 and x2 endogenous and z1 and z2 external
instruments, use:
formula = y ~ x1 + x2 + x3 | x3 + z1 + z2,
or
formula = y ~ x1 + x2 + x3 | . - x1 - x2 + z1 + z2;
411
data, subset and na.action specify respectively the data.frame, a possible

subset to analyze and the missing value treatment;
effect specifies the kind of effect to introduce in the model; the argument may
assume one of the values individual, time or twoways;
model specifies the kind of model: within, random, ht, between, pooling or
fd; the options are referred respectively to the within or fixed effects estimator,
to the random effects estimator, to the Hausman Taylor estimator, the between
effects estimator, to the pooling estimator which is equivalent to OLS and to
the first-difference estimator.
random.method defines the method of estimation of the variance components in

the random effects model; the options are swar, walhus, amemiya and nerlove;
inst.method defines the instrumental variable transformation (bvk or

baltagi),
index defines the indices for the individual and the time; it needs to be used
whenever these two variables are not placed in the first two columns of the
data.frame we are analysing.
See the help ?plm::plm for more information.

With the following instructions the parameter estimates are obtained for the models
presented by Verbeeks in Table 10.2, that is a between, an OLS and a random effects
model.
> library(plm)
> between <- plm(WAGE ~ SCHOOL + EXPER + EXPER2 + UNION +
MAR + BLACK + HISP + PUB, data = wages, model = "between")
> fixed <- plm(WAGE ~ SCHOOL + EXPER + EXPER2 + UNION +
MAR + BLACK + HISP + PUB, data = wages, model = "within")
> ols <- plm(WAGE ~ SCHOOL + EXPER + EXPER2 + UNION +
MAR + BLACK + HISP + PUB, data = wages, model = "pooling")
> random <- plm(WAGE ~ SCHOOL + EXPER + EXPER2 + UNION +
MAR + BLACK + HISP + PUB, data = wages, model = "random")
We can inspect the output for each different model separately by using the function
summary.
The functions pvcovHC or vcovHC.plm, vcovBK and vcovSCC can be used to obtain
robust parameter covariance matrix estimators for the fixed effects, random effects
and pooling models. The arguments of pvcovHC which computes robust covariance
matrices a la White are x: an object of class "plm" which should be the result of
a random effects or a within model or a model of class "pgmm"; method: one of
"arellano","white1","white2"; type: one of "HC0", "HC1", "HC2", "HC3", "HC4";
cluster: one of "group" or "time". vcovBK and vcovSCC respectively compute
unconditional robust parameter covariance matrix estimators a la Beck and Katz for
panel models and nonparametric robust covariance matrix estimators a la Driscoll
and Kraay for panel models with cross-sectional and serial correlation. See the R
help system for more information.
412
Output for the between effects estimation method

> summary(between)
Oneway (individual) effect Between Model
Call:
plm(formula = WAGE ~ SCHOOL + EXPER + EXPER2 + UNION + MAR +
BLACK + HISP + PUB, data = wages, model = "between")
Balanced Panel: n=545, T=8, N=4360
Residuals :
Min. 1st Qu.
-1.130 -0.240
Median 3rd Qu.

0.028
0.232
Max.
1.740
Coefficients :
Estimate Std. Error t-value Pr(>|t|)
(Intercept) 0.4903902 0.2211917 2.2170 0.0270394 *
SCHOOL
0.0947911 0.0109178 8.6822 < 2.2e-16 ***
EXPER
-0.0502077 0.0503689 -0.9968 0.3193120
EXPER2
0.0051068 0.0032142 1.5888 0.1126871
UNION
0.2743194 0.0471273 5.8208 1.009e-08 ***
MAR
0.1445897 0.0412654 3.5039 0.0004968 ***
BLACK
-0.1391368 0.0489084 -2.8448 0.0046132 **
HISP
0.0054832 0.0427436 0.1283 0.8979738
PUB
-0.0563215 0.1090691 -0.5164 0.6057992
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Total Sum of Squares:
83.06
Residual Sum of Squares: 64.819
R-Squared
: 0.2196
Adj. R-Squared : 0.21598
Output for the fixed effects or within estimation method
> summary(fixed)
Oneway (individual) effect Within Model
Call:
BLACK + HISP + PUB, data = wages, model = "within")
Residuals :
Min. 1st Qu.

-4.17000 -0.12600
Median
0.00992
3rd Qu.
0.15900
Max.
1.47000
Coefficients :
EXPER
0.11645699 0.00843090 13.8131 < 2.2e-16
EXPER2 -0.00428857 0.00060544 -7.0834 1.668e-12
UNION
0.08120303 0.01931592 4.2039 2.683e-05
MAR
0.04510613 0.01831141 2.4633
0.01381
PUB
0.03492672 0.03860819 0.9046
0.36571
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05
***
***
***
*
"." 0.1 " " 1

572.05
R-Squared
: 0.17822
> library(lmtest)
> coeftest(fixed, vcov = pvcovHC(fixed))
EXPER
0.11645699 0.01070551 10.8782 < 2.2e-16
EXPER2 -0.00428857 0.00068517 -6.2591 4.301e-10
UNION
0.08120303 0.02270999 3.5757 0.0003537
MAR
0.04510613 0.02096824 2.1512 0.0315259
PUB
0.03492672 0.03762350 0.9283 0.3532994
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05
***
***
***
*
"." 0.1 " " 1
Output for the OLS estimation method

> summary(ols)
Oneway (individual) effect Pooling Model
Call:
BLACK + HISP + PUB, data = wages, model = "pooling")
Residuals :
Min. 1st Qu.
-5.2700 -0.2490
Coefficients :
Median 3rd Qu.

0.0332 0.2960
Max.
2.5600
413
414
Estimate Std. Error t-value

(Intercept) -0.03437245 0.06467230 -0.5315
SCHOOL
0.09936782 0.00468289 21.2194
EXPER
0.08913805 0.01012149 8.8068
EXPER2
-0.00284682 0.00070771 -4.0226
UNION
0.17990427 0.01721460 10.4507
MAR
0.10762116 0.01570528 6.8525
BLACK
-0.14382268 0.02356305 -6.1037
HISP
0.01565030 0.02081966 0.7517
PUB
0.00354615 0.03747396 0.0946
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*"
Pr(>|t|)
0.5951
< 2.2e-16
< 2.2e-16
5.854e-05
< 2.2e-16
8.271e-12
1.126e-09
0.4523
0.9246
***
***
***
***
***
***
0.05 "." 0.1 " " 1

1236.5
R-Squared
: 0.18659
> coeftest(ols, vcov = pvcovHC(ols))
(Intercept) -0.0343724 0.1201077 -0.2862 0.774754
SCHOOL
0.0993678 0.0092085 10.7909 < 2.2e-16 ***
EXPER
0.0891380 0.0124250 7.1741 8.514e-13 ***
EXPER2
-0.0028468 0.0008687 -3.2771 0.001057 **
UNION
0.1799043 0.0274501 6.5539 6.259e-11 ***
MAR
0.1076212 0.0260702 4.1281 3.726e-05 ***
BLACK
-0.1438227 0.0500258 -2.8750 0.004060 **
HISP
0.0156503 0.0391447 0.3998 0.689319
PUB
0.0035462 0.0501168 0.0708 0.943594
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Output for the random effects estimation method
> summary(random)
Oneway (individual) effect Random Effect Model
(Swamy-Arora's transformation)
Call:
BLACK + HISP + PUB, data = wages, model = "random")
Effects:
415
var std.dev share

idiosyncratic 0.1234 0.3513 0.539
individual
0.1055 0.3248 0.461
theta: 0.6429
Residuals :
Min. 1st Qu.
-4.5800 -0.1450
Median 3rd Qu.

0.0234 0.1860
Max.
1.5400
Coefficients :
Estimate Std. Error t-value
(Intercept) -0.10431133 0.11083404 -0.9411
SCHOOL
0.10102372 0.00892187 11.3232
EXPER
0.11178514 0.00827093 13.5154
EXPER2
-0.00405745 0.00059198 -6.8540
UNION
0.10641339 0.01786690 5.9559
MAR
0.06254646 0.01677617 3.7283
BLACK
-0.14400263 0.04764392 -3.0225
HISP
0.01972690 0.04263026 0.4627
PUB
0.03015546 0.03646707 0.8269
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*"
Pr(>|t|)
0.3466808
< 2.2e-16
< 2.2e-16
8.189e-12
2.791e-09
0.0001952
0.0025218
0.6435709
0.4083261
***
***
***
***
***
**
0.05 "." 0.1 " " 1

656.8
R-Squared
: 0.17837
> coeftest(random, vcov = pvcovHC(random))
(Intercept) -0.10431133 0.11498178 -0.9072
SCHOOL
0.10102372 0.00888409 11.3713
EXPER
0.11178514 0.01052801 10.6179
EXPER2
-0.00405745 0.00067325 -6.0266
UNION
0.10641339 0.02080771 5.1141
MAR
0.06254646 0.01896620 3.2978
BLACK
-0.14400263 0.05018606 -2.8694
HISP
0.01972690 0.03987080 0.4948
PUB
0.03015546 0.03378064 0.8927
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*"
Pr(>|t|)
0.3643519
< 2.2e-16
< 2.2e-16
1.813e-09
3.287e-07
0.0009823
0.0041327
0.6207870
0.3720755
***
***
***
***
***
**
0.05 "." 0.1 " " 1
To obtain a unique summary output for the results of different objects of class plm
class, we have to invoke the package tonymisc which allows the function mtable in
416
the package memisc to work also in presence of objects of class plm.

> library(tonymisc)
> library(memisc)
> mtable(between, fixed, ols, random)
Calls:
between: plm(formula = WAGE ~ SCHOOL + EXPER + EXPER2 + UNION + MAR +
BLACK + HISP + PUB, data = wages, model = "between")
fixed: plm(formula = WAGE ~ SCHOOL + EXPER + EXPER2 + UNION + MAR +
BLACK + HISP + PUB, data = wages, model = "within")
ols: plm(formula = WAGE ~ SCHOOL + EXPER + EXPER2 + UNION + MAR +
BLACK + HISP + PUB, data = wages, model = "pooling")
random: plm(formula = WAGE ~ SCHOOL + EXPER + EXPER2 + UNION + MAR +
BLACK + HISP + PUB, data = wages, model = "random")
========================================================
between
fixed
ols
random
-------------------------------------------------------(Intercept)
0.490*
-0.034
-0.104
(0.221)
(0.065)
(0.111)
SCHOOL
0.095***
0.099*** 0.101***
(0.011)
(0.005)
(0.009)
EXPER
-0.050
0.116*** 0.089*** 0.112***
(0.050)
(0.008)
(0.010)
(0.008)
EXPER2
0.005
-0.004*** -0.003*** -0.004***
(0.003)
(0.001)
(0.001)
(0.001)
UNION
0.274*** 0.081*** 0.180*** 0.106***
(0.047)
(0.019)
(0.017)
(0.018)
MAR
0.145*** 0.045*
0.108*** 0.063***
(0.041)
(0.018)
(0.016)
(0.017)
BLACK
-0.139**
-0.144*** -0.144**
(0.049)
(0.024)
(0.048)
HISP
0.005
0.016
0.020
(0.043)
(0.021)
(0.043)
PUB
-0.056
0.035
0.004
0.030
(0.109)
(0.039)
(0.037)
(0.036)
-------------------------------------------------------R-squared
0.220
0.178
0.187
0.178
adj. R-squared
0.216
0.156
0.186
0.178
F (omnibus)
18.854
165.256
124.759
118.070
p-val (omnibus)
0.000
0.000
0.000
0.000
N
545
4360
4360
4360
========================================================
The function ercomp returns the variance components from a random effects panel
model and returns the estimate of here named theta, see Verbeek p. 382.
417
> ercomp(random)
var std.dev share
idiosyncratic 0.1234 0.3513 0.539
individual
0.1055 0.3248 0.461
theta: 0.6429
The Hausman test can be performed by having recourse to the function phtest
> phtest(fixed, random)
Hausman Test
data: WAGE~SCHOOL + EXPER + EXPER2 + UNION + MAR + BLACK + HISP + PUB
chisq = 31.7531, df = 5, p-value = 6.649e-06
alternative hypothesis: one model is inconsistent
Observe that the function plm returns only:
the between R2 for the Between estimator,
the within R2 for the Fixed effects and the Random Effects estimators,
the overall R2 for the OLS estimator.
It is possible to compute the three goodness of fit statistics for all the estimators by
having recourse to Verbeeks relationships (10.29)-(10.31).

FE

2
Rwithin
F E = corr2 yit
yiF E , yit yi
FE
where yit
yiF E = (xit x
i )0 F E )

2
Rbetween
B = corr2 yiB , yi
where yiB = x
0i B
2
= corr2 {
Roverall
()
yit , yit }
where yit = x
0it b With regard to the between estimator we first need the complete
data for the variables involved in the between model (that is the model matrix for
the OLS estimator), say XdataOLS:
> XdataOLS <- model.matrix(ols)
then we have
>
>
>
>
yit.hat <- XdataOLS %*% coef(between)

yi.hat <- tapply(yit.hat, wages$NR, mean)
yi.bar <- tapply(wages$WAGE, wages$NR, mean)
yiB.hat <- model.matrix(between) %*% coef(between)
Observe that in this case we have a regular panel (that is complete time data for each
statistical unit are available). The goodness of fit statistics result:
418
> (Between.withinR2 <- cor((yit.hat - rep(yi.hat, each = 8)),

(wages$WAGE - rep(yi.bar, each = 8)))^2)
[,1]
[1,] 0.04698327
> (Between.betweenR2 <- cor(yiB.hat, yi.bar)^2)
[,1]
[1,] 0.2196042
> (Between.overallR2 <- cor(yit.hat, wages$WAGE)^2)
[,1]
[1,] 0.1371153
With regard to the fixed effects estimator let Xdata be the complete data for the
variables involved in the corresponding model (which is a subset of the columns of
the model matrix for the OLS estimator):
> Xdata <- model.matrix(ols)[, c("EXPER", "EXPER2",
"UNION", "MAR", "PUB")]
and
>
>
>
>
>
yit.hat <- Xdata %*% coef(fixed)

ff <- function(i) tapply(Xdata[, i], wages$NR, mean)
modelmatrix <- sapply(1:dim(model.matrix(fixed))[2],
ff)
> yiB.hat <- modelmatrix %*% coef(fixed)
The corresponding goodness of fit statistics result:
> (Fixed.withinR2 <- cor((yit.hat - rep(yi.hat, each = 8)),
[,1]
[1,] 0.1782206
> (Fixed.betweenR2 <- cor(yiB.hat, yi.bar)^2)
[,1]
[1,] 0.0005952516
> (Fixed.overallR2 <- cor(yit.hat, wages$WAGE)^2)
[,1]
[1,] 0.06416925
For the OLS estimator we have:
>
>
>
>
yit.hat <- XdataOLS %*% coef(ols)

ff <- function(i) tapply(model.matrix(ols)[, i],
419
wages$NR, mean)
> modelmatrix <- sapply(1:dim(model.matrix(ols))[2],
ff)
> yiB.hat <- modelmatrix %*% coef(ols)
with the corresponding goodness of fit statistics
> (OLS.withinR2 <- cor((yit.hat - rep(yi.hat, each = 8)),
[,1]
[1,] 0.1679288
> (OLS.betweenR2 <- cor(yiB.hat, yi.bar)^2)
[,1]
[1,] 0.2026535
> (OLS.overallR2 <- cor(yit.hat, wages$WAGE)^2)
[,1]
[1,] 0.1865882
And finally, for the random effects estimator:
>
>
>
>
yit.hat <- XdataOLS %*% coef(random)

ff <- function(i) tapply(model.matrix(random)[, i],
wages$NR, mean)
> modelmatrix <- sapply(1:dim(model.matrix(random))[2],
ff)
> yiB.hat <- modelmatrix %*% coef(random)
with the corresponding goodness of fit statistics
> (random.withinR2 <- cor((yit.hat - rep(yi.hat, each = 8)),
[,1]
[1,] 0.1776096
> (random.betweenR2 <- cor(yiB.hat, yi.bar)^2)
[,1]
[1,] 0.183495
> (random.overallR2 <- cor(yit.hat, wages$WAGE)^2)
[,1]
[1,] 0.1807709
10.2
Explaining Capital Structure (Section 10.5)
Verbeek applies a dynamic linear panel model to the Flannery and Rangan (2006)
theory for explaining adjustments performed by firms to reach their target capital
structure.
420
Data are available in the file debtratio, a Stata file containing information on
US firms over the years 1987-2001. The panel is unbalanced. Data are taken from
Compustat.
> library(foreign)
> debtratio <- read.dta(unzip("ch10.zip", "Chapter 10/debtratio.dta"))
We can check the structure of the panel data with the function pdim available in the
package plm
> library(plm)
> pdim(debtratio)
Unbalanced Panel: n=5449, T=1-16, N=27762
The following variables are available. (Except for bdr and mdr all variables are already
lagged and refer to the (end of the) previous year).
gvkey: firm identification number
yeara: year of observation
mdr: market debt ratio
bdr: book debt ratio
lagebit_ta: earnings before interest and taxes / total assets (lagged)
lagmb: ratio of market value to book value of assets
lagdep_ta: depreciation expenses / total assets
laglnta: log of total assets
lagfa_ta: fixed assets / total assets
lagrd_dum: dummy, 1 if rd_da is missing
lagrd_ta: R&D expenditures / total assets
lagindmedian: industry median debt ratio
lagrated: dummy, 1 if firm has public debt rating
In Verbeeks Table 10.3 results pertaining the OLS, the within effects and the firstdifference estimators are reported. Robust standard errors have been computed.
Output for the OLS estimation method
> ols <- plm(mdr ~ lag(mdr, 1) + lagebit_ta + lagmb +
lagdep_ta + laglnta + lagfa_ta + lagrd_dum +
lagrd_ta + lagindmedian + lagrated, model = "pooling",
data = debtratio)
> summary(ols)
Call:
421
plm(formula = mdr ~ lag(mdr, 1) + lagebit_ta + lagmb + lagdep_ta +

laglnta + lagfa_ta + lagrd_dum + lagrd_ta + lagindmedian +
lagrated, data = debtratio, model = "pooling")
Residuals :
Min. 1st Qu. Median 3rd Qu.
-0.7840 -0.0593 -0.0225 0.0558
Max.
0.7690
Coefficients :
(Intercept)
0.05818177 0.01089409
5.3407 9.364e-08 ***
lag(mdr, 1)
0.88350360 0.00455677 193.8880 < 2.2e-16 ***
lagebit_ta
-0.03233775 0.00570742 -5.6659 1.483e-08 ***
lagmb
0.00164320 0.00078139
2.1029 0.0354844 *
lagdep_ta
-0.26051795 0.03346611 -7.7845 7.344e-15 ***
laglnta
-0.00067042 0.00060575 -1.1068 0.2684048
lagfa_ta
0.02012146 0.00514792
3.9087 9.312e-05 ***
lagrd_dum
0.00688957 0.00202285
3.4059 0.0006609 ***
lagrd_ta
-0.12020508 0.01423761 -8.4428 < 2.2e-16 ***
lagindmedian 0.03212249 0.00910841
3.5267 0.0004218 ***
lagrated
0.00713406 0.00291144
2.4504 0.0142803 *
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
1184.8
R-Squared
: 0.74093
> library(lmtest)
> coeftest(ols, vcov = pvcovHC)
(Intercept)
lag(mdr, 1)
lagebit_ta
lagmb
lagdep_ta
laglnta
lagfa_ta
lagrd_dum
lagrd_ta
lagindmedian
Estimate
0.05818177
0.88350360
-0.03233775
0.00164320
-0.26051795
-0.00067042
0.02012146
0.00688957
-0.12020508
0.03212249
Std. Error t value Pr(>|t|)

0.01073874
5.4179 6.100e-08 ***
0.00530071 166.6764 < 2.2e-16 ***
0.00667448 -4.8450 1.276e-06 ***
0.00067849
2.4219 0.0154502 *
0.03511707 -7.4186 1.232e-13 ***
0.00060327 -1.1113 0.2664423
0.00556306
3.6170 0.0002988 ***
0.00215206
3.2014 0.0013699 **
0.01293281 -9.2946 < 2.2e-16 ***
0.00975079
3.2943 0.0009883 ***
422
lagrated
0.00713406 0.00279809
2.5496 0.0107916 *
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Output for the Within estimation method
> within <- plm(mdr ~ lag(mdr, 1) + lagebit_ta + lagmb +
lagrd_ta + lagindmedian + lagrated, model = "within",
data = debtratio)
> summary(within)
Oneway (individual) effect Within Model
Call:
plm(formula = mdr ~ lag(mdr, 1) + lagebit_ta + lagmb + lagdep_ta +
lagrated, data = debtratio, model = "within")
Residuals :
Min. 1st Qu.
Median
-0.61700 -0.04940 -0.00208
3rd Qu.
0.04310
Max.
0.57600
Coefficients :
lag(mdr, 1)
5.3498e-01 7.6646e-03 69.7992 < 2.2e-16 ***
lagebit_ta
-5.0033e-02 8.0860e-03 -6.1876 6.260e-10 ***
lagmb
2.2776e-03 1.1358e-03 2.0052
0.04495 *
lagdep_ta
-1.2395e-01 5.7544e-02 -2.1541
0.03125 *
laglnta
3.8030e-02 2.0593e-03 18.4678 < 2.2e-16 ***
lagfa_ta
5.9344e-02 1.2635e-02 4.6969 2.664e-06 ***
lagrd_dum
5.9768e-05 5.8840e-03 0.0102
0.99190
lagrd_ta
-6.5676e-02 2.7093e-02 -2.4241
0.01536 *
lagindmedian 1.6722e-01 1.8959e-02 8.8201 < 2.2e-16 ***
lagrated
2.0590e-02 4.6521e-03 4.4259 9.670e-06 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
307.38
R-Squared
: 0.3404
> coeftest(within, vcov = pvcovHC)
423
lag(mdr, 1)
5.3498e-01 1.1903e-02 44.9438 < 2.2e-16 ***
lagebit_ta
-5.0033e-02 1.1097e-02 -4.5085 6.575e-06 ***
lagmb
2.2776e-03 1.0083e-03 2.2589 0.0239022 *
lagdep_ta
-1.2395e-01 7.0913e-02 -1.7480 0.0804852 .
laglnta
3.8030e-02 3.0676e-03 12.3974 < 2.2e-16 ***
lagfa_ta
5.9344e-02 1.7073e-02 3.4759 0.0005104 ***
lagrd_dum
5.9768e-05 8.0735e-03 0.0074 0.9940935
lagrd_ta
-6.5676e-02 2.6391e-02 -2.4886 0.0128350 *
lagindmedian 1.6722e-01 2.2355e-02 7.4800 7.823e-14 ***
lagrated
2.0590e-02 5.8272e-03 3.5334 0.0004114 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
To obtain the first-difference we need to have recourse to the following trick by
applying OLS to the differenced series since the argument model="fd" doesnt work
correctly, with the current version (1.3-1) of plm, on unbalanced data with holes and
the current data frame has some holes1 .
Output for the First-difference estimation method
> fdmod01 <- plm(diff(mdr) ~ diff(lag(mdr, 1)) + diff(lagebit_ta) +
diff(lagmb) + diff(lagdep_ta) + diff(laglnta) +
diff(lagfa_ta) + diff(lagrd_dum) + diff(lagrd_ta) +
diff(lagindmedian) + diff(lagrated), model = "pooling",
data = debtratio)
> summary(fdmod01)
Call:
plm(formula = diff(mdr) ~ diff(lag(mdr, 1)) + diff(lagebit_ta) +
diff(lagmb) + diff(lagdep_ta) + diff(laglnta) + diff(lagfa_ta) +
diff(lagrd_dum)+diff(lagrd_ta)+diff(lagindmedian)+diff(lagrated),
data = debtratio, model = "pooling")
Residuals :
Min. 1st Qu.
Median
-0.83700 -0.05340 -0.00987
3rd Qu.
0.05090
Max.
0.76400
1 The code should be

> fd <- plm(mdr
lag(mdr, 1) + lagebit_ta + lagmb + lagdep_ta + laglnta + lagfa_ta +
lagrd_dum + lagrd_ta + lagindmedian + lagrated, model = "fd", data = debtratio)
Results are consistent with those present in the third edition of Verbeeks book.
424
Coefficients :
(Intercept)
0.0088779 0.0010578
8.3927 < 2.2e-16 ***
diff(lag(mdr, 1)) -0.1138871 0.0093735 -12.1499 < 2.2e-16 ***
diff(lagebit_ta)
-0.0451704 0.0076222 -5.9261 3.169e-09 ***
diff(lagmb)
0.0027903 0.0011550
2.4157
0.01572 *
diff(lagdep_ta)
0.1095609 0.0659193
1.6620
0.09652 .
diff(laglnta)
0.0644041 0.0039349 16.3672 < 2.2e-16 ***
diff(lagfa_ta)
0.1055631 0.0157632
6.6968 2.206e-11 ***
diff(lagrd_dum)
-0.0170642 0.0078069 -2.1858
0.02885 *
diff(lagrd_ta)
-0.0592139 0.0278048 -2.1296
0.03322 *
diff(lagindmedian) 0.1815726 0.0250867
7.2378 4.781e-13 ***
diff(lagrated)
0.0094495 0.0063567
1.4865
0.13716
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
238.04
R-Squared
: 0.02813
> coeftest(fdmod01, vcov = pvcovHC)
(Intercept)
0.00887786 0.00094621 9.3825 < 2.2e-16
diff(lag(mdr, 1)) -0.11388713 0.01198604 -9.5016 < 2.2e-16
diff(lagebit_ta)
-0.04517037 0.01012320 -4.4621 8.176e-06
diff(lagmb)
0.00279029 0.00110282 2.5301
0.01141
diff(lagdep_ta)
0.10956092 0.07875898 1.3911
0.16422
diff(laglnta)
0.06440411 0.00510375 12.6190 < 2.2e-16
diff(lagfa_ta)
0.10556310 0.01796875 5.8748 4.323e-09
diff(lagrd_dum)
-0.01706422 0.00907665 -1.8800
0.06013
diff(lagrd_ta)
-0.05921391 0.02865177 -2.0667
0.03878
diff(lagindmedian) 0.18157256 0.02607573 6.9633 3.462e-12
diff(lagrated)
0.00944946 0.00656023 1.4404
0.14977
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " "
***
***
***
*
***
***
.
*
***
By having invoked the packages memisc and tonymisc the function mtable can be
used to collect results in a unique output
> mtable(ols, within)
Calls:
ols: plm(formula = mdr ~ lag(mdr, 1) + lagebit_ta + lagmb + lagdep_ta +
lagrated, data = debtratio, model = "pooling")
425
within: plm(formula = mdr~lag(mdr,1)+lagebit_ta+lagmb+lagdep_ta+

lagrated, data = debtratio, model = "within")
====================================
ols
within
-----------------------------------(Intercept)
0.058***
(0.011)
lag(mdr, 1)
0.884*** 0.535***
(0.005)
(0.008)
lagebit_ta
-0.032*** -0.050***
(0.006)
(0.008)
lagmb
0.002*
0.002*
(0.001)
(0.001)
lagdep_ta
-0.261*** -0.124*
(0.033)
(0.058)
laglnta
-0.001
0.038***
(0.001)
(0.002)
lagfa_ta
0.020*** 0.059***
(0.005)
(0.013)
lagrd_dum
0.007*** 0.000
(0.002)
(0.006)
lagrd_ta
-0.120*** -0.066*
(0.014)
(0.027)
lagindmedian
0.032*** 0.167***
(0.009)
(0.019)
lagrated
0.007*
0.021***
(0.003)
(0.005)
-----------------------------------R-squared
0.741
0.340
adj. R-squared
0.741
0.275
F (omnibus)
5594.769
814.653
p-val (omnibus)
0.000
0.000
N
19573
19573
====================================
Finally Verbeek obtains estimates for the current dynamic panel data model by using
the Anderson-Hiaso instrumental varaibles estimators and the Arellano-Bond GMM
estimators, which can be obtained by making use of the following code
Anderson-Hsiao instrumental variables2
instrumented diff(lag(mdr,1)), instrument diff(lag(mdr, 2))
> fd <- plm(diff(mdr) ~ diff(lag(mdr, 1)) + diff(lagebit_ta) +
2 Results
are consistent with those present in the third edition of Verbeeks book.
426

diff(lagindmedian) + diff(lagrated) | . - diff(lag(mdr,
1)) + diff(lag(mdr, 2)), model = "pooling", data = debtratio)
> summary(fd)
Instrumental variable estimation
(Balestra-Varadharajan-Krishnakumar's transformation)
Call:
diff(lagrd_dum)+diff(lagrd_ta)+diff(lagindmedian)+diff(lagrated) |
. - diff(lag(mdr, 1)) + diff(lag(mdr, 2)), data = debtratio,
model = "pooling")
Residuals :
Min. 1st Qu.
-4.2000 -0.3410
Median 3rd Qu.

0.0378 0.3680
Max.
7.1200
Coefficients :
(Intercept)
diff(lag(mdr, 1))
diff(lagebit_ta)
diff(lagmb)
diff(lagdep_ta)
diff(laglnta)
diff(lagfa_ta)
diff(lagrd_dum)
diff(lagrd_ta)
diff(lagindmedian)
diff(lagrated)
Estimate Std. Error

-0.016634
0.018670
7.033029
5.494342
1.207597
0.970555
0.244267
0.185376
-1.858345
1.577202
-0.521408
0.455800
-1.091279
0.927732
-0.023127
0.056904
0.881936
0.759792
-3.377779
2.746221
-0.272466
0.222713
t-value Pr(>|t|)
-0.8910
0.3730
1.2800
0.2006
1.2442
0.2134
1.3177
0.1876
-1.1783
0.2387
-1.1439
0.2527
-1.1763
0.2395
-0.4064
0.6844
1.1608
0.2458
-1.2300
0.2187
-1.2234
0.2212

175.88
R-Squared
: 0.0080057
F-statistic: -1142.01 on 10 and 11721 DF, p-value: 1
> coeftest(fd, vcov = pvcovHC)
(Intercept)
diff(lag(mdr, 1))
Estimate Std. Error

-0.0166345 0.0058748
7.0330294 0.2525266
t value Pr(>|t|)
-2.8315 0.004641 **
27.8507 < 2.2e-16 ***
427
diff(lagebit_ta)
1.2075966 0.1054279 11.4542 < 2.2e-16 ***
diff(lagmb)
0.2442670 0.0183950 13.2790 < 2.2e-16 ***
diff(lagdep_ta)
-1.8583446 0.7100345 -2.6173 0.008875 **
diff(laglnta)
-0.5214084 0.0537632 -9.6982 < 2.2e-16 ***
diff(lagfa_ta)
-1.0912794 0.1782544 -6.1220 9.534e-10 ***
diff(lagrd_dum)
-0.0231265 0.0790461 -0.2926 0.769856
diff(lagrd_ta)
0.8819365 0.2833939
3.1121 0.001862 **
diff(lagindmedian) -3.3777791 0.2218749 -15.2238 < 2.2e-16 ***
diff(lagrated)
-0.2724663 0.0514317 -5.2976 1.194e-07 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
with instrumental variables, with constant3
> fd <- plm(diff(mdr) ~ diff(lag(mdr, 1)) + diff(lagebit_ta) +
diff(lagindmedian) + diff(lagrated) | . - diff(lag(mdr,
1)) + lag(mdr, 2), model = "pooling", data = debtratio)
> summary(fd)
Instrumental variable estimation
(Balestra-Varadharajan-Krishnakumar's transformation)
Call:
diff(lagrd_dum)+diff(lagrd_ta)+diff(lagindmedian)+diff(lagrated) |
. - diff(lag(mdr,1))+lag(mdr,2),data=debtratio,model="pooling")
Residuals :
Min. 1st Qu.
-1.5400 -0.0875
Median 3rd Qu.

0.0069 0.0937
Max.
1.3400
Coefficients :
(Intercept)
diff(lag(mdr, 1))
diff(lagebit_ta)
diff(lagmb)
diff(lagdep_ta)
diff(laglnta)
diff(lagfa_ta)
3 Results
Estimate Std. Error

0.0015331 0.0018349
1.3581587 0.1294915
0.2027082 0.0249465
0.0468071 0.0042789
-0.2268575 0.1110867
-0.0532186 0.0121026
-0.1658858 0.0349078
t-value
0.8355
10.4884
8.1257
10.9391
-2.0422
-4.3973
-4.7521
Pr(>|t|)
0.403434
< 2.2e-16
4.791e-16
< 2.2e-16
0.041152
1.104e-05
2.032e-06
are consistent with those present in the third edition of Verbeeks book.
***
***
***
*
***
***
428
diff(lagrd_dum)
-0.0211894 0.0126926 -1.6694 0.095052 .
diff(lagrd_ta)
0.1265905 0.0480137 2.6365 0.008384 **
diff(lagindmedian) -0.5839094 0.0783180 -7.4556 9.432e-14 ***
diff(lagrated)
-0.0524240 0.0116591 -4.4964 6.963e-06 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
238.04
R-Squared
: 0.0065943
F-statistic: -917.329 on 10 and 15028 DF, p-value: 1
> coeftest(fd, vcov = pvcovHC)
(Intercept)
0.0015331 0.0011090
1.3825
0.16684
diff(lag(mdr, 1))
1.3581587 0.0484357 28.0404 < 2.2e-16 ***
diff(lagebit_ta)
0.2027082 0.0212571
9.5360 < 2.2e-16 ***
diff(lagmb)
0.0468071 0.0032282 14.4996 < 2.2e-16 ***
diff(lagdep_ta)
-0.2268575 0.1495174 -1.5173
0.12922
diff(laglnta)
-0.0532186 0.0099653 -5.3404 9.410e-08 ***
diff(lagfa_ta)
-0.1658858 0.0360364 -4.6033 4.193e-06 ***
diff(lagrd_dum)
-0.0211894 0.0163588 -1.2953
0.19524
diff(lagrd_ta)
0.1265905 0.0495030
2.5572
0.01056 *
diff(lagindmedian) -0.5839094 0.0469868 -12.4271 < 2.2e-16 ***
diff(lagrated)
-0.0524240 0.0116359 -4.5054 6.675e-06 ***
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Arellano Bond one-step
The function pgmm can be used to perform generalized method of moments estimation
for static or dynamic models with panel data. The main argument are: formula: a
symbolic description for the model to be estimated. The prefered interface is now
to indicate a multi-part formula, the first two parts describing the covariates and
the gmm instruments and, if any, the third part the normal instruments; data: a
data.frame; effect: the effects introduced in the model, one of "twoways" (the
default) or "individual"; model one of "onestep" (the default) or "twosteps";
transformation the kind of transformation to apply to the model: either "d" (the
default value) for the difference GMM model or "ld" for the system GMM; fsm: the
matrix for the one step estimator: one of "I" (identity matrix) or "G" (= D0 D where
D is the first-difference operator) if transformation="d", one of "GI" or "full" if
transformation="ld".
> gmm <- pgmm(mdr ~ lag(mdr) + lagebit_ta + lagmb +
429
lagrd_ta + lagindmedian + lagrated | lag(mdr,

2:99), data = debtratio, effect = "individual",
fsm = "I")
> summary(gmm)
Oneway (individual) effect One step model
Call:
pgmm(formula = mdr ~ lag(mdr) + lagebit_ta + lagmb + lagdep_ta +
lagrated | lag(mdr, 2:99), data = debtratio, effect = "individual",
fsm = "I")
Number of Observations Used:
Residuals
Min. 1st Qu.
-0.96820 0.00000
Median
0.00000
15039
Mean
0.00106
3rd Qu.
0.00000
Max.
0.86950
Coefficients
Estimate Std. Error
lag(mdr)
0.4538191 0.0505589
lagebit_ta
0.0490861 0.0151092
lagmb
0.0206848 0.0022631
lagdep_ta
-0.0227627 0.0957791
laglnta
0.0271979 0.0065750
lagfa_ta
-0.0057013 0.0235412
lagrd_dum
-0.0178886 0.0105722
lagrd_ta
0.0209480 0.0324596
lagindmedian 0.1221109 0.0383102
lagrated
-0.0089387 0.0077516
--Signif. codes: 0 "***" 0.001 "**"
z-value Pr(>|z|)
8.9761 < 2.2e-16 ***
3.2488 0.001159 **
9.1402 < 2.2e-16 ***
-0.2377 0.812146
4.1366 3.526e-05 ***
-0.2422 0.808639
-1.6920 0.090640 .
0.6454 0.518697
3.1874 0.001435 **
-1.1531 0.248851
0.01 "*" 0.05 "." 0.1 " " 1
Sargan Test: chisq(104) = 434.3268 (p.value=< 2.22e-16)

Autocorrelation test (1): normal = -9.836012 (p.value=< 2.22e-16)
Autocorrelation test (2): normal = -3.352629 (p.value=0.00040024)
Wald test for coefficients: chisq(10) = 451.9441 (p.value=< 2.22e-16)
Arellano Bond two-steps
> gmm <- pgmm(mdr ~ lag(mdr) + lagebit_ta + lagmb +
lagrd_ta + lagindmedian + lagrated | lag(mdr,
2:99), data = debtratio, effect = "individual",
430
model = "twosteps", fsm = "I")

> summary(gmm)
Oneway (individual) effect Two steps model
Call:
pgmm(formula = mdr ~ lag(mdr) + lagebit_ta + lagmb + lagdep_ta +
lagrated | lag(mdr, 2:99), data = debtratio, effect = "individual",
model = "twosteps", fsm = "I")
Number of Observations Used:
Residuals
Min.
-0.945700
1st Qu.
0.000000
Median
0.000000
15039
Mean
0.001129
3rd Qu.
0.000000
Max.
0.860300
Coefficients
Estimate Std. Error
lag(mdr)
0.3868871 0.0725827
lagebit_ta
0.0371411 0.0174741
lagmb
0.0155606 0.0026849
lagdep_ta
0.0784848 0.1090149
laglnta
0.0298020 0.0082703
lagfa_ta
0.0187859 0.0279833
lagrd_dum
-0.0191112 0.0117298
lagrd_ta
-0.0044324 0.0352264
lagindmedian 0.0878274 0.0439521
lagrated
-0.0087387 0.0098295
--Signif. codes: 0 "***" 0.001 "**"
z-value Pr(>|z|)
5.3303 9.805e-08 ***
2.1255 0.033545 *
5.7955 6.812e-09 ***
0.7199 0.471559
3.6035 0.000314 ***
0.6713 0.502013
-1.6293 0.103253
-0.1258 0.899870
1.9983 0.045689 *
-0.8890 0.373987
0.01 "*" 0.05 "." 0.1 " " 1
Sargan Test: chisq(104) = 418.3848 (p.value=< 2.22e-16)

Autocorrelation test (1): normal = -6.258917 (p.value=1.9383e-10)
Autocorrelation test (2): normal = -3.619932 (p.value=0.00014734)
Wald test for coefficients: chisq(10) = 218.1505 (p.value=< 2.22e-16)
11
References
Belgorodski N, Greiner M, Tolksdorf K and Schueller K 2012 rriskDistributions: Fitting
distributions to given data or known quantiles. R package version 1.8. http://CRAN.Rproject.org/package=rriskDistributions
Bolker B and R Development Core Team 2012 bbmle: Tools for general maximum likelihood
estimation. R package version 1.0.5.2. http://CRAN.R-project.org/package=bbmle
Brockwell PJ and Davis RA 1991 Time Series: Theory and Methods, Springer Verlag.
Chambers JM, Cleveland WS, Kleiner B and Tukey PA 1983 Graphical Methods for Data Analysis,
Wadsworth & Brooks/Cole.
Chan K-S, Ripley B 2012 TSA: Time Series Analysis. R package version 1.01. http://CRAN.Rproject.org/package=TSA
Chausse P 2010 Computing Generalized Method of Moments and Generalized Empirical Likelihood
with R. Journal of Statistical Software 34(11), 135, http://www.jstatsoft.org/v34/i11/.
Cookson JA 2012 tonymisc: Functions for Econometrics Output. R package version 1.1.1.
http://CRAN.R-project.org/package=tonymisc
Cribari-Neto F 2004 Asymptotic Inference Under Heteroskedasticity of Unknown Form.
Computational Statistics & Data Analysis 45, 215233.
Croissant Y and Millo G 2008 Panel Data Econometrics in R: The plm Package. Journal of Statistical
Software 27(2), 143, http://www.jstatsoft.org/v27/i02/.
Davidson R and MacKinnon JG 1993 Estimation and Inference in Econometrics, Oxford University
Press.
Elff M 2013 memisc: Tools for Management of Survey Data, Graphics, Programming, Statistics, and
Simulation. R package version 0.96-4. http://CRAN.R-project.org/package=memisc
Faraway
JJ
2002
Practical
Regression
and
Anova
using
R,
July
2002,
http://stat.ethz.ch/CRAN/doc/contrib/Faraway-PRA.pdf.
Flannery MJ and Rangan KP 2006 Partial Adjustment toward Target Capital. Journal of Financial
Economics 41, 4173.
Fox J and Weisberg S 2011 An R Companion to Applied Regression, Second Edition. Thousand
Oaks. CA: Sage. http://socserv.socsci.mcmaster.ca/jfox/Books/Companion
Fox J, Nie Z and Byrnes J 2013 sem: Structural Equation Models. R package version 3.1-3.
http://CRAN.R-project.org/package=sem
Graves S 2012 FinTS: Companion to Tsay (2005) Analysis of Financial Time Series. R package
version 0.4-4. http://CRAN.R-project.org/package=FinTS
Hannan EJ Rissanen J 1982 Recursive Estimation of Mixed Autoregressive-Moving Average Order.
Biometrika 69(1), 8194.
Hardin JW Hilbe JM 2007 Generalized Linear Models and Extensions, Stata Press.
Hyndman RJ, Khandakar Y 2008 Automatic Time Series Forecasting: The forecast Package for R.
Journal of Statistical Software 27(3), 122, http://www.jstatsoft.org/v27/i3/.
Hyndman RJ with contributions from G Athanasopoulos, S Razbash, D Schmidt, Z Zhou and Y
Khan 2013 forecast: Forecasting functions for time series and linear models. R package version
4.03. http://CRAN.R-project.org/package=forecast
Jackman S 2012 pscl: Classes and Methods for R Developed in the Political Science Computational
Laboratory, Stanford University. Department of Political Science, Stanford University. Stanford,
California. R package version 1.04.4. URL http://pscl.stanford.edu/
Jarque CM, Bera A 1987 A Test for Normality of Observations and Regression Residuals.
International Statistical Review 55(2), 163172.
Johnson NL, Kemp AW, Kotz S 2005 Univariate Discrete Distributions. Wiley.
Johnston J, Di Nardo J 1997 Econometric Methods, 4th edn. McGraw-Hill.
432
Kleiber C, Zeileis A 2008 Applied Econometrics with R, Springer-Verlag http://CRAN.Rproject.org/package=AER.

Koenker R 2012 Quantile Regression in R: a vignette. 120, http://cran.rproject.org/web/packages/quantreg/vignettes/rq.pdf.
Koenker R 2013 quantreg: Quantile Regression. R package version 4.97. http://CRAN.Rproject.org/package=quantreg
Konnerth K 2010 egarch: EGARCH simulation and fitting. R package version 1.0.0. http://CRAN.Rproject.org/package=egarch
Long JS, Ervin LH 2000 Using Heteroskedasticity Consistent Standard Errors in the Linear
Regression Model. The American Statistician 54, 217224.
Longhow
Lam
2010
An
introduction
to
R,
January
2010,
http://www.splusbook.com/RIntro/RCourse.pdf.
Lumley T using Fortran code by Alan Miller 2009 leaps: regression subset selection. R package
version 2.9. http://CRAN.R-project.org/package=leaps
MacKinnon JG, White H 1985 Some Heteroskedasticity-Consistent Covariance Matrix Estimators
with Improved Finite Sample Properties. Journal of Econometrics 29, 305325.
Mann HB, Wald A 1943 On the Statistical Treatment of Linear Stochastic Difference Equations.
Econometrica 11, 173220.
McCulloch JH and Kwon HC 1993 U.S. Term Structure Data, 1947-1991. Ohio State working paper
93-6, Ohio State University.
McLeod AI, Zhang Y 2007 Faster ARMA maximum likelihood estimation Computational Statistics
Data Analysis 52(4) http://dx.doi.org/10.1016/j.csda.2007.07.020
McLeod AI, Zhang Y 2008 Improved Subset Autoregression: With R Package. Journal of Statistical
Software 28(2 http://www.jstatsoft.org/v28/i02/.
Mood AM, Graybill F, Boes DC 1974 Introduction to the Theory of Statistics, McGraw-Hill.
Murrell P 2010 hexView: Viewing Binary Files. R package version 0.3-2. http://CRAN.Rproject.org/package=hexView
Osterwald-Lenum M 1992 A Note with Quantiles of the Asymptotic Distribution of the Maximum
Likelihood Cointegration Rank Test Statistics. Oxford Bulletin of Economics and Statistics 55(3),
461472.
Pfaff B 2008 Analysis of Integrated and Cointegrated Time Series with R. Second Edition. Springer,
New York.
Rigby RA and Stasinopoulos DM 2005 Generalized additive models for location, scale and
shape,(with discussion), Appl. Statist., 54, part 3, pp 507-554.
Rigby
B
and
Stasinopoulos
M
2010
The
gamlss.family
distributions,
http://finzi.psych.upenn.edu/R/library/gamlss.dist/doc/Distributions-2010.pdf.
R Development Core Team 2013 R: A language and environment for statistical computing, R
Foundation for Statistical Computing, Vienna, Austria http://www.R-project.org.
R Core Team 2013 foreign: Read Data Stored by Minitab, S, SAS, SPSS, Stata, Systat, dBase, ....
R package version 0.8-53. http://CRAN.R-project.org/package=foreign
Sarkar D 2008 Lattice: Multivariate Data Visualization with R, Springer.
Shapiro SS, Wilk MB 1965 An analysis of variance test for normality (complete samples). Biometrika
52(3 and 4), 591611.
Shumway RH and Stoffer DS 2011 Time Series Analysis and Its Applications. With R Examples,
Springer Verlag.
Stasinopoulos M, Rigby B with contributions from C Akantziliotou, G Heller, R Ospina, N Motpan,
F McElduff, V Voudouris and M Djennad. 2012 gamlss.dist: Distributions to be used for GAMLSS
modelling.. R package version 4.2-0. http://CRAN.R-project.org/package=gamlss.dist
Stoffer D 2012 astsa: Applied Statistical Time Series Analysis. R package version 1.1. http://CRAN.Rproject.org/package=astsa
Toomet O, Henningsen A 2008 Sample Selection Models in R: Package sampleSelection. Journal of
Statistical Software 27(7 http://www.jstatsoft.org/v27/i07/.
Trapletti A and Hornik K 2012 tseries: Time Series Analysis and Computational Finance. R package
version 0.10-30.
Venables WN Ripley BD 2002 Modern Applied Statistics with S. Fourth Edition. Springer, New
York.
Verbeek M 2008 A guide to modern econometrics, 3rd edn. John Wiley.
Verbeek M 2012 A guide to modern econometrics, 4th edn. John Wiley.
White H 1980 A Heteroskedasticity-Consistent Covariance Matrix and a Direct Test for
Heteroskedasticity. Econometrica 48, 817838.
W
urtz D, Chalabi Y and Ellis A 2009 A Discussion of Time Series Objects for R in Finance,
https://www.rmetrics.org/ebooks-tseries.
433
Wuertz D, Chalabi Y with contribution from M Miklovic, C Boudt, P Chausse and others 2012
fGarch: Rmetrics - Autoregressive Conditional Heteroskedastic Modelling. R package version
2150.81. http://CRAN.R-project.org/package=fGarch
Wuertz D, many others and see the SOURCE file 2012 fArma: ARMA Time Series Modelling. R
package version 2160.78. http://CRAN.R-project.org/package=fArma
Zappa D, Bramante R, Nai Ruscone M 2012 Appunti di Metodi Statistici per la Finanza e le
Assicurazioni, Educatt.
Zeileis A 2004 Econometric Computing with HC and HAC Covariance Matrix Estimators. Journal
of Statistical Software 11(10), 117, http://www.jstatsoft.org/v11/i10/.
Zeileis A 2006 Object-oriented Computation of Sandwich Estimators. Journal of Statistical Software
16(9), 1-16. URL http://www.jstatsoft.org/v16/i09/.
Zeileis A 2011 dynlm: Dynamic Linear Regression. R package version 0.3-1. http://CRAN.Rproject.org/package=dynlm
Zeileis A, Hothorn T 2002 Diagnostic Checking in Regression Relationships. R News 2(3), 7-10.
http://CRAN.R-project.org/doc/Rnews/
Zhelonkin M, Genton MG, Ronchetti E 2013 ssmrob: Robust estimation and inference in sample
selection models. R package version 0.2. http://CRAN.R-project.org/package=ssmrob
A
Some useful R functions
This Appendix includes some excerpts from the documentation available on the help
of the R system and some advice regarding the creation of graphs.
The topics regard:
how to install R
how to install and update packages
functions useful to read/import data
how to write a formula to define a model
how to estimate the parameters of a linear model
the package Deducer
436
A.1
How to Install R
You can find the R installation files on http://www.R-project.org.
A.2
How to Install and Update Packages
If a package is not present on your R installation, You can download it by using the
option Install package(s) from the menu Packages in the R Console.
It is also possible to use the function install.packages whose main argument is
pkgs a character vector with the names of the packages whose current versions should
be downloaded from the repositories, e.g.
install.packages("lmtest")
You can update the packages available on Your system by using
update.packages(ask=FALSE)
The following code (do not execute if not really needed!) will install all the packages
available on the CRAN site which are not present on Your system (more than 4,000
packages requiring more than 4 GigaBytes of space).
a<-new.packages()
if (length(a)>0) install.packages(a)
The latter code can be useful when R is used without an Internet connection.
??"keyword1 keyword2"
will search keyword1 and keyword2 on the help documentation related to all the
installed packages.
A.3
Data Reading
On Verbeeks site data are saved in the text, Stata and EViews formats, and are
compressed in zip files. We describe the procedures to uncompress a zip file and read
data.
Once having read data in R it is possible to check the consistency of the imported
data with the information contained in the txt file available in the zip file by using
the functions summary, head and tail.
A.3.1
zip files
To see the content of a zip file use the function unzip, available in the utils library,
which is automatically loaded when R starts.
unzip(zipfile, files = NULL, list = FALSE, overwrite = TRUE,
junkpaths = FALSE, exdir = ".", unzip = "internal", setTimes = FALSE)
zipfile: The pathname of the zip file: tilde expansion (see path.expand) will
be performed.
437
files: A character vector of recorded filepaths to be extracted: the default is

to extract all files.
list: If TRUE, list the files and extract none.
See the R help for the remaining options and more information.
A.3.2
Reading from a text file
To read from a text file use the command read.table, available in the utils library.
It reads a file in table format and creates a data frame from it, with cases
corresponding to lines and variables to fields in the file.
read.table(file, header = FALSE, sep = "", quote = "\"'",
dec = ".", row.names, col.names,
as.is = !stringsAsFactors,
na.strings = "NA", colClasses = NA, nrows = -1,
skip = 0, check.names = TRUE, fill = !blank.lines.skip,
strip.white = FALSE, blank.lines.skip = TRUE,
comment.char = "#",
allowEscapes = FALSE, flush = FALSE,
stringsAsFactors = default.stringsAsFactors(),
fileEncoding = "", encoding = "unknown", text)
file: the name of the file which the data are to be read from. Each row of the
table appears as one line of the file.
file can also be a complete URL. (For the supported URL schemes, see the
URLs section of the help for url.)
header: a logical value indicating whether the file contains the names of the
variables as its first line. If missing, the value is determined from the file format:
header is set to TRUE if and only if the first row contains one fewer field than
the number of columns.
sep: the field separator character. Values on each line of the file are separated
by this character. If sep = (the default for read.table) the separator is white
space, that is one or more spaces, tabs, newlines or carriage returns.
dec: the character used in the file for decimal points.
row.names: a vector of row names. This can be a vector giving the actual row
names, or a single number giving the column of the table which contains the
row names, or character string giving the name of the table column containing
the row names.
If there is a header and the first row contains one fewer field than the number
of columns, the first column in the input is used for the row names. Otherwise
if row.names is missing, the rows are numbered.
col.names: a vector of optional names for the variables. The default is to use
V followed by the column number.
438
as.is: the default behavior of read.table is to convert character variables (which

are not converted to logical, numeric or complex) to factors. The variable as.is
controls the conversion of columns not otherwise specified by colClasses. Its
value is either a vector of logicals (values are recycled if necessary), or a vector
of numeric or character indices which specify which columns should not be
converted to factors.
na.strings: a character vector of strings which are to be interpreted as NA

values. Blank fields are also considered to be missing values in logical, integer,
numeric and complex fields.
See the R help for the remaining arguments and more information.
A.3.3
Reading from a Stata file
To read a Stata file use the function read.dta which reads a file in Stata version 5-11
binary format into a data frame. The function is available in the package foreign.
read.dta(file, convert.dates = TRUE, convert.factors = TRUE,
missing.type = FALSE,
convert.underscore = FALSE, warn.missing.labels = TRUE)
See the R help for more information.
A.3.4
Reading from an EViews file
To read an EViews file use the function readEViews, which is available in the package
foreign.
readEViews(filename, as.data.frame = TRUE)
The messages Skipping boilerplate variable will be returned which warn about
the fact that the 2 variables c and resid, which are always created by default by
EViews, and thus are present in the file you are converting, are not read.
See the R help for more information.
A.3.5
Reading from a Microsoft Excel file
See Appendix A.6.
A.4
439
formula{stats}
Description. The generic function formula1 and its specific methods provide a way
of extracting formulae which have been included in other objects.
Usage formula(x, ...)
Arguments x an R object. ... further arguments passed to or from other methods.
Details The models fit by, e.g., the lm and glm functions are specified in a compact
symbolic form. The ~ operator is basic in the formation of such models.
An expression of the form y ~ model is interpreted as a specification that the
response y is modelled by a linear predictor specified symbolically by model. Such a
model consists of a series of terms separated by + operators.
The terms themselves consist of variable and factor names separated by the
interaction : operator.
Such a term is interpreted as the interaction of all the variables and factors
appearing in the term.
In addition to + and :, a number of other operators are useful in model formulae.
The * operator denotes factor crossing:
a*b interpreted as a+b+a:b.
The ^ operator indicates crossing to the specified degree.
(a+b+c)^2 is identical to (a+b+c)*(a+b+c) which in turn expands to a formula
containing the main effects for a, b and c together with their second-order interactions.
The %in% operator indicates that the terms on its left are nested within those on the
right.
a + b %in% a expands to the formula a + a:b.
The - operator removes the specified terms. (a+b+c)^2 - a:b is identical to a + b
+ c + b:c + a:c.
It can also used to remove the intercept term: y ~ x - 1 is a line through the
origin.
A model with no intercept can be also specified as y ~ x + 0 or y ~ 0 + x.
While formulae usually involve just variable and factor names, they can also involve
arithmetic expressions. The formula log(y) ~ a + log(x) is quite legal. When such
arithmetic expressions involve operators which are also used symbolically in model
formulae, there can be confusion between arithmetic and symbolic operator use.
To avoid this confusion, the function I() can be used to bracket those portions of a
model formula where the operators are used in their arithmetic sense.
For example, in the formula y ~ a + I(b+c), the term b+c is to be interpreted as
the sum of b and c.
There are two special interpretations of . in a formula.
1 The
function is available in the package stats which is automatically loaded when R starts.
440
The usual one is in the context of a data argument of model fitting functions and
means all columns not otherwise in the formula: see terms.formula.
In the context of update.formula, only, it means what was previously in this part
of the formula.
References
Chambers, J. M. and Hastie, T. J. (1992) Statistical models. Chapter 2 of Statistical
Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
Examples
In Section 3.1.4 we used the function

update(regr3.2, . ~ . + prefarea:bedrooms, data = housing)
to add the interaction term between prefarea and bedrooms to the linear model
regr3.2 resulting from:
regr3.1<-lm(log(price) ~ log(lotsize) + bedrooms + bathrms +
airco, data=housing)
regr3.2<-update(regr3.1, . ~ . + driveway + recroom + fullbase +
gashw + garagepl + prefarea + stories, data = housing)
In Section 3.3.3 we used the syntax

model.matrix(~MALE * EDUC + MALE * LNEXPER, indwages)
to define a model matrix containing the main effects of MALE, EDUC and
LNEXPER and their interactions, that is (in a cumbersome mode):
model.matrix(~MALE + EDUC + LNEXPER + MALE:EDUC + MALE:LNEXPER,
indwages)
and then the syntax
model.matrix(~MALE + EDUC * LNEXPER)
for
model.matrix(~MALE + EDUC + LNEXPER + EDUC:LNEXPER)
441
linear model
A.5
The function lm is used to fit linear models. It can be used to carry out regression,
single stratum analysis of variance and analysis of covariance (although aov may
provide a more convenient interface for these). It is available in the package stats
which is automatically loaded when R starts.
Usage
lm(formula, data, subset, weights, na.action,
method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE,
singular.ok = TRUE, contrasts = NULL, offset, ...)
Arguments
formula: an object of class formula (or one that can be coerced to that class): a
symbolic description of the model to be fitted. The details of model specification
are given under Details.
data: an optional data frame, list or environment (or object coercible by

as.data.frame to a data frame) containing the variables in the model. If not
found in data, the variables are taken from environment(formula), typically
the environment from which lm is called.
subset: an optional vector specifying a subset of observations to be used in the

fitting process.
weights: an optional vector of weights to be used in the fitting process. Should

be NULL or a numeric vector. If non-NULL, weighted least squares is used
with weights weights (that is, minimizing sum(w*e^2)); otherwise ordinary
least squares is used. See also Details,
na.action: a function which indicates what should happen when the data
contain NAs. The default is set by the na.action setting of options, and is
na.fail if that is unset. The factory-fresh default is na.omit. Another possible
value is NULL, no action. Value na.exclude can be useful.
method: the method to be used; for fitting, currently only method = "qr" is
supported; method = "model.frame" returns the model frame (the same as
with model = TRUE, see below).
model, x, y, qr: logicals. If TRUE the corresponding components of the fit

(the model frame, the model matrix, the response, the QR decomposition) are
returned.
singular.ok: logical. If FALSE (the default in S but not in R) a singular fit is

an error.
contrasts: an optional list. See the contrasts.arg of model.matrix.default.
offset: this can be used to specify an a priori known component to be included

in the linear predictor during fitting. This should be NULL or a numeric vector of
length equal to the number of cases. One or more offset terms can be included
in the formula instead or as well, and if more than one are specified their sum
is used. See model.offset.
442
...: additional arguments to be passed to the low level regression fitting

functions (see below).
Details
Models for lm are specified symbolically. See Section A.4.
If the formula includes an offset, this is evaluated and subtracted from the response.
If response is a matrix a linear model is fitted separately by least-squares to each
column of the matrix.
See model.matrix for some further details. The terms in the formula will be reordered so that main effects come first, followed by the interactions, all second-order,
all third-order and so on: to avoid this pass a terms object as the formula (see aov
and demo(glm.vr) for an example).
A formula has an implied intercept term. To remove this use either y { x - 1 or y {
0 + x. See formula for more details of allowed formulae.
Non-NULL weights can be used to indicate that different observations have different
variances (with the values in weights being inversely proportional to the variances); or
equivalently, when the elements of weights are positive integers wi , that each response
yi is the mean of wi unit-weight observations (including the case that there are wi
observations equal to yi and the data have been summarized).
lm calls the lower level functions lm.fit, etc, see below, for the actual numerical
computations. For programming only, you may consider doing likewise.
All of weights, subset and offset are evaluated in the same way as variables in formula,
that is first in data and then in the environment of formula.
Value
lm returns an object of class "lm" or for multiple responses of class c("mlm", "lm").
The functions summary and anova are used to obtain and print a summary and
analysis of variance table of the results. The generic accessor functions coefficients,
effects, fitted.values and residuals extract various useful features of the value returned
by lm.
An object of class "lm" is a list containing at least the following components:
coefficients: a named vector of coefficients
residuals: the residuals, that is response minus fitted values.
fitted.values: the fitted mean values.
rank: the numeric rank of the fitted linear model.
weights: (only for weighted fits) the specified weights.
df.residual: the residual degrees of freedom.
call: the matched call.
terms: the terms object used.
contrasts: (only where relevant) the contrasts used.
xlevels: (only where relevant) a record of the levels of the factors used in
fitting.
offset: the offset used (missing if none were used).
443
y: if requested, the response used.
x: if requested, the model matrix used.
model: if requested (the default), the model frame used.
na.action: (where relevant) information returned by model.frame on the special

handling of NAs.
In addition, non-null fits will have components assign, effects and (unless not
requested) qr relating to the linear fit, for use by extractor functions such as summary
and effects.
Using time series
Considerable care is needed when using lm with time series.
Unless na.action = NULL, the time series attributes are stripped from the variables
before the regression is done. (This is necessary as omitting NAs would invalidate the
time series attributes, and if NAs are omitted in the middle of the series the result
would no longer be a regular time series.)
Even if the time series attributes are retained, they are not used to line up series, so
that the time shift of a lagged or differenced regressor would be ignored. It is good
practice to prepare a data argument by ts.intersect(..., dframe = TRUE), then
apply a suitable na.action to that data frame and call lm with na.action = NULL so
that residuals and fitted values are time series.
Note
Offsets specified by offset will not be included in predictions by predict.lm, whereas
those specified by an offset term in the formula will be.
Author(s)
The design was inspired by the S function of the same name described in Chambers
(1992). The implementation of model formula by Ross Ihaka was based on Wilkinson
& Rogers (1973).
References
Chambers, J. M. (1992) Linear models. Chapter 4 of Statistical Models in S eds J.
M. Chambers and T. J. Hastie, Wadsworth Brooks/Cole.
Wilkinson, G. N. and Rogers, C. E. (1973) Symbolic descriptions of factorial models
for analysis of variance. Applied Statistics, 22, 392-9.
We observe that with time series it is possible to use the function dynlm available
in the package dynlm. See the R help system
444
A.6
Deducer
The package Deducer contains an intuitive, cross-platform graphical data analysis

system. It uses menus and dialogs to guide the user efficiently through the data
manipulation and analysis process, and has an Excel like spreadsheet for easy data
frame visualization and editing.
After having called library(Deducer) the R GUI menus are enriched with the
options Deducer, Data, Analysis and Plots.
In particular the option Open Data in the menu Deducer allows SPSS, SAS,
DBase, Stata, Systat, ARFF, Epiinfo, Minitab and Excel files to be imported in
R as data.frames.
Estratto da G. Cantaluppi, Computational Laboratory for Economics. Notes for the students, Educatt, 2013
426
A.6
Deducer
427
The library Deducer contains an intuitive, cross-platform graphical data analysis

system. It uses menus and dialogs to guide the user eciently through the data
manipulation and analysis process, and has an Excel like spreadsheet for easy data
frame visualization and editing.
After having called library(Deducer) the R GUI menus are enriched with the
options Deducer, Data, Analysis and Plots.
In particular the option Open Data in the menu Deducer allows SPSS, SAS,
DBase, Stata, Systat, ARFF, Epiinfo, Minitab and Excel les to be imported in
R as data.frames.
The menu Data contains tools for data manipulation and the menu Analysis tools
for statistical analyses. Figures A.1 and A.2-A.3 report the interface to dene a linear
model and the output2 .
The menus can be further enriched by calling the library DeducerExtras, with tools
for inferential statistics and multivariate statistical analysis, DeducerPlugInScaling
with tools for reliability and factor analysis, DeducerSpatial with tools for spatial
statistics, DeducerSurvival with tools for survival analysis, and DeducerText with
tools for the analysis of textual data.
Have a look at http://www.deducer.org for more information.
Figure A.1
Interface to dene a linear model with Deducer
2 The corresponding R code is

indwages$EDUC <- as.factor(indwages$EDUC)
summary(lm(LNWAGE ~ MALE + EDUC + LNEXPER, data = indwages))
The menu Data contains tools for data manipulation and the menu Analysis tools
for statistical analyses. Figures A.1 and A.2-A.3 report the interface to define a linear
model and the output2 .
The menus can be further enriched by calling the package DeducerExtras, with
tools for inferential statistics and multivariate statistical analysis, DeducerPlugInScaling
with tools for reliability and factor analysis, DeducerSpatial with tools for spatial
statistics, DeducerSurvival with tools for survival analysis, and DeducerText with
tools for the analysis of textual data.
Have a look at http://www.deducer.org for more information.
2 The corresponding R code is

indwages$EDUC <- as.factor(indwages$EDUC)
summary(lm(LNWAGE ~ MALE + EDUC + LNEXPER, data = indwages))
Figure A.1
Interface to define a linear model with Deducer
445
446
Figure A.2
Linear model output and diagnostics
Figure A.3
Linear model component + residual and added-variable plots
447
B
Addendum 3rd edition
This Chapter can be downloaded from the booksite www.educatt.it/libri/materiali.
EDUCatt - Ente per il Diritto allo Studio Universitario dellUniversit Cattolica

Largo Gemelli 1, 20123 Milano - tel. 02.7234.22.35 - fax 02.80.53.215
e-mail: editoriale.dsu@educatt.it (produzione); librario.dsu@educatt.it (distribuzione)
web: www.educatt.it/libri
Euro 23,00

Computational Laboratory For Economics

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Computational Laboratory For Economics

Uploaded by

Copyright:

Available Formats

GABRIELE CANTALUPPI

Positively skewed distributions

3 Interpreting and comparing Linear Regression Models

The Durbin-Watson statistic - direct function

5 Endogeneity, Instrumental Variables and GMM

6 Maximum Likelihood Estimation and Specification Tests

7 Models with Limited Dependent Variables

Autocorrelation, Partial autocorrelation functions and ARMA model

8.10.3 The Ljung-Box statistic - direct function

9 Multivariate Time Series Models

10 Models based on panel data

B Addendum 3rd edition

B.2.5 Non complete models

8.10.3 third edition)

On the Properties of the Sample Mean

Consider a random variable X with mean E(X) = and variance V ar(X) = 2 .

has the properties:

The Normal Distribution Case

We consider, as an example giving evidence of the properties of the sample mean,

Some Elements of Statistical Inference

> sd <- sigma2^0.5

Some Elements of Statistical Inference

Figure 1.1 Distribution of the sample mean from X N (4, 2);

Some Elements of Statistical Inference

to = max(X$xbar), length = 25), type = "density",

Some Elements of Statistical Inference

3.0 3.5 4.0 4.5 5.0

3.0 3.5 4.0 4.5 5.0

k = 1000, 500, 100, 50

3.0 3.5 4.0 4.5 5.0

n = 9, 25, 64, 100

Figure 1.2 Distribution of the sample mean from X N (4, 1);

function histogram available in the package lattice.

The Central Limit Theorem

We remark that in this instance we have not required X to be normally distributed.

Some Elements of Statistical Inference

To give evidence to the central limit theorem result let X1 , . . . Xn be identically

Some Elements of Statistical Inference

0.2 0.3 0.4 0.5 0.6 0.7 0.8

0.2 0.3 0.4 0.5 0.6 0.7 0.8

k = 1000, 500, 100, 50

0.2 0.3 0.4 0.5 0.6 0.7 0.8

n = 9, 25, 64, 100

Figure 1.3 Distribution of the sample mean from Y U (0, 1);

Some Elements of Statistical Inference

0.1 0.2 0.3 0.4 0.5 0.6

0.1 0.2 0.3 0.4 0.5 0.6

k = 1000, 500, 100, 50

0.1 0.2 0.3 0.4 0.5 0.6

n = 9, 25, 64, 100

Figure 1.4 Distribution of the sample mean from W Exp(4);

Example: Individual wages (2.1.2)

Data Reading and summary statistics

exper: experience in years

male: 1 if male, 0 if female

school: years of schooling

wage: wage (in 1980 $) per hour

An Introduction to Linear Regression

EXPER MALE SCHOOL

An Introduction to Linear Regression

Box & Whiskers plot of wages for males and females

Some graphical representations and grouping statistics

An Introduction to Linear Regression