COMPUTATIONAL LABORATORY
FOR ECONOMICS
Notes for the students
GABRIELE CANTALUPPI
Milano 2013
© 2012-2013 EDUCatt - Ente per il Diritto allo Studio Universitario dell'Università Cattolica
Largo Gemelli 1, 20123 Milano - tel. 02.7234.22.35 - fax 02.80.53.215
e-mail: editoriale.dsu@educatt.it (production); librario.dsu@educatt.it (distribution)
web: www.educatt.it/libri
ISBN (print edition): 978-88-6780-021-6
ISBN (electronic edition): 978-88-6780-022-3
The print edition of this volume was printed in September 2013 by Litografia Solari (Peschiera Borromeo - Milano)
CONTENTS

Preface

1 Some Elements of Statistical Inference
1.1 On the Properties of the Sample Mean
1.1.1 The Normal Distribution Case
1.1.2 The Central Limit Theorem

2 An Introduction to Linear Regression
2.1 Example: Individual Wages (Section 2.1.2)
2.1.1 Data Reading and summary statistics
2.1.2 Some graphical representations and grouping statistics
2.1.3 Simple Linear Regression
2.1.4 Confidence intervals (Section 2.5.2)
2.2 Multiple Linear Regression (Section 2.5.5)
2.2.1 Parameter estimation
2.2.2 ANOVA to compare the two models (Section 2.5.5)
2.3 CAPM example (Section 2.7)
2.3.1 CAPM regressions (without intercept) (Table 2.3)
2.3.2 Testing a hypothesis on β1
2.3.3 CAPM regressions (with intercept) (Table 2.4)
2.3.4 CAPM regressions (with intercept and January dummy) (Table 2.5)
2.4 The World's Largest Hedge Fund (Section 2.7.3)
2.5 Dummy Variables Treatment and Multicollinearity (Section 2.8.1)
2.6 Missing Data, Outliers and Influential Observations
2.7 How to check the form of the distribution
2.7.1 Data histogram with the theoretical density function
2.7.2 The χ² goodness-of-fit test
2.7.3 The Kolmogorov-Smirnov test
2.7.4 The PP-plot and the QQ-plot
2.7.5 Use of the function fit.cont
2.8 Two tests for assessing normality
2.8.1 The Jarque-Bera test
2.8.2 The Shapiro-Wilk test
2.9 Some further comments on the QQ-plot
2.9.1 Positively skewed distributions
2.9.2 Negatively skewed distributions
2.9.3 Leptokurtic distributions
2.9.4 Platykurtic distributions
References
A Some useful R functions
A.1 How to Install R
A.2 How to Install and Update Packages
A.3 Data Reading
A.3.1 zip files
A.3.2 Reading from a text file
A.3.3 Reading from a Stata file
A.3.4 Reading from an EViews file
A.3.5 Reading from a Microsoft Excel file
A.4 formula{stats}
A.5 linear model
A.6 Deducer
B Results for examples from the 3rd edition of Verbeek's Guide
PREFACE

These Lecture Notes refer to the examples and illustrations proposed in the book A Guide to Modern Econometrics by Marno Verbeek (4th and 3rd editions).
The source codes described here are written in the R language (R Development Core Team 2012); R version 3.0.1 was used.
Subjects are presented in the course Computational Laboratory for Economics held at Università Cattolica del Sacro Cuore, Graduate Program in Economics. The course runs in parallel with the course Empirical Economics, where the methodological background is assessed.
Care was taken to obtain results first according to their mathematical structure, and then by using appropriate built-in R functions, in both cases searching for an efficient and elegant programming style.
The reader is assumed to possess a basic knowledge of R. An Introduction to R by Longhow Lam, available at http://www.splusbook.com/RIntro/RCourse.pdf, may represent a good reference.
Chapters 2 to 10 recall the contents of Verbeek's Guide. Appendix A describes how to read data from text, Stata and EViews files, which are the formats used by Verbeek on his book's website, where the data sets are available. Appendix B contains results for examples which were present in the 3rd edition of Verbeek's Guide.
Some companion materials to these Lecture Notes can be downloaded from the book site www.educatt.it/libri/materiali.
I warmly thank Diego Zappa and Giuseppe Boari for having read parts of the manuscript. I wish to thank Stefano Iacus for his short course on an efficient and advanced use of R, and Achim Zeileis, Giovanni Millo and Yves Croissant for having improved their packages lmtest and plm so that they properly fit some of the problems presented here.
1
Some Elements of Statistical
Inference
1.1 On the Properties of the Sample Mean

1.1.1 The Normal Distribution Case

Let X1, . . . , Xn be independently and identically distributed as X, with E(X) = μ and Var(X) = σ². The sample mean

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad (1.1)$$

satisfies

$$E(\bar{X}) = \mu \qquad (1.2)$$

and

$$Var(\bar{X}) = \sigma^2/n. \qquad (1.3)$$
set.seed(1000)
k <- 100
n <- 5
mean <- 4
sigma2 <- 2
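The commands generating the k samples and the histogram of Fig. 1.1 did not survive extraction; a minimal sketch consistent with the parameters just set (the object names x and xbar are assumptions):

> x <- t(replicate(k, rnorm(n, mean, sqrt(sigma2))))
> colnames(x) <- paste0("x", 1:n)
> xbar <- rowMeans(x)
> round(head(cbind(x, xbar), 4), 2)
> hist(xbar)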
    x1   x2   x3   x4   x5 xbar
1 3.37 2.29 4.06 4.90 2.89 3.50
2 3.45 3.33 5.02 3.97 2.06 3.57
3 2.61 3.22 4.17 3.83 2.11 3.19
4 4.24 4.22 4.04 1.11 4.30 3.58
Figure 1.1 Histogram of the k = 100 sample means
> set.seed(1000)
> kvals <- c(50, 100, 500, 1000)
> nvals <- c(9, 25, 64, 100)
> X <- data.frame(k = NA, n = NA, xbar = NA)
> for (k in kvals) {
      for (n in nvals) {
          set.seed(1000)
          X <- rbind(X, cbind(k = k, n = n, xbar = replicate(k,
              mean(rnorm(n, 4, 1)))))
      }
  }
> X <- X[-1, ]
> X$k <- factor(X$k)
> X$n <- factor(X$n)
> library(lattice)
> histogram(~xbar | k:n, data = X, breaks = seq(from = min(X$xbar),
      to = max(X$xbar), length = 25), type = "density",
      as.table = TRUE, xlab = paste("n = ", paste(nvals,
      collapse = ", ")), ylab = paste("k = ", paste(rev(kvals),
      collapse = ", ")))
kvals and nvals are arrays containing respectively the values of the variables k and n in the 16 situations depicted in Fig. 1.2.
X <- data.frame(k = NA, n = NA, xbar = NA) defines a data.frame X with three columns named k, n and xbar; the rows of X will contain the number (k) of replications and the sample size (n) of the experiment which the sample mean (xbar) refers to.
The sample means are evaluated for k = 50, 100, 500, 1000 replications of n = 9, 25, 64, 100 pseudo-random numbers from X ~ N(μ = 4, σ² = 1). The construction of X, which may seem a bit clumsy, will simplify the production of the graphs in Fig. 1.2 by means of the function histogram in the package lattice.
The assignment of the rows of X is obtained by using a double for loop.
Observe that the pseudo-random numbers are generated, by the function replicate, in blocks (arrays) of dimension k.
cbind binds column/matrix elements into a single matrix: in the present case blocks are constructed which contain in the first and second columns the k and n identifiers and in the third column the values of the sample means. All the blocks are subsequently stacked in the X matrix by means of the function rbind.
By initializing the seed for each n, the first generated samples do not vary when we increase the number k of replications.
The variables k and n in the data.frame X are then assigned the nature (class) of factors, that is categorical variables, to simplify the graphical representation by means of the function histogram.
Figure 1.2 Histograms of the sample means for k = 50, 100, 500, 1000 replications and sample sizes n = 9, 25, 64, 100 in the Normal case
1.1.2 The Central Limit Theorem

We now consider what happens when X, a random variable with E(X) = μ and variance Var(X) = σ², is not Normally distributed.
If X̄ is the sample mean from (x1, . . . , xn), a realization of the n-dimensional random variable X1, . . . , Xn, whose components are identically and independently distributed as X, by invoking the central limit theorem we have asymptotically that:

$$\bar{X} \sim N(\mu, \sigma^2/n). \qquad (1.4)$$
If $Y \sim U(0, 1)$, then

$$E(Y) = \frac{1}{2} \quad\text{and}\quad Var(Y) = \frac{1}{12},$$

and if $W \sim Exp(\lambda)$, then

$$E(W) = \frac{1}{\lambda} \quad\text{and}\quad Var(W) = \frac{1}{\lambda^2}.$$
If we study the behaviour of the sample mean in the presence of k = 50, 100, 500, 1000 replications for the sample sizes n = 9, 25, 64, 100 from the above distributions X ~ U(0, 1) and X ~ Exp(λ), we can observe that, according to relationship (1.4), the dispersion of the sample mean estimator reduces when n increases, while when k gets larger the distribution of the sample mean is approximated by a Normal random variable.
Figures 1.3 and 1.4 give evidence of the result and can be obtained by using the same code producing Fig. 1.2, after having substituted the instruction
X <- rbind(X, cbind("k" = k, "n" = n, "xbar" = replicate(k, mean(rnorm(n, 4, 1)))))
with the code:
X <- rbind(X, cbind("k" = k, "n" = n, "xbar" = replicate(k, mean(runif(n)))))
for the uniform case and
X <- rbind(X, cbind("k" = k, "n" = n, "xbar" = replicate(k, mean(rexp(n, 4)))))
for the exponential case.
Figure 1.3 Histograms of the sample means in the uniform case
Figure 1.4 Histograms of the sample means in the exponential case
2
An Introduction to Linear
Regression
2.1 Example: Individual Wages
We have first to read the data, available in the file wages1.dat, included in the
compressed file ch02.zip.
2.1.1 Data Reading and summary statistics
The function read.table allows one to read from a text data set file, where data
have been stored in text format, and create a data.frame, see Appendix A.3. The
data set file is assumed to be in a tabular form with one or more spaces or a tab as
field separator. The function unzip extracts a file from a compressed archive.
> wages1 <- read.table(unzip("ch02.zip", "Chapter 2/wages1.dat"),
header = T)
The description of the variables is provided in the file wages1.txt.
To explore the initial and the final part of a data frame use the functions head and
tail.
> head(wages1)
  EXPER MALE SCHOOL     WAGE
1     9    0     13 6.315296
2    12    0     12 5.479770
3    11    0     11 3.642170
4     9    0     14 4.593337
5     8    0     14 2.418157
6     9    0     14 2.094058
> tail(wages1)
The function summary produces some statistics summarizing the columns (variables)
of the data frame. The results may be compared with the sample statistics provided
by Verbeek in the file wages1.txt.
> summary(wages1)
     EXPER             MALE            SCHOOL           WAGE
 Min.   : 1.000   Min.   :0.0000   Min.   : 3.00   Min.   : 0.07656
 1st Qu.: 7.000   1st Qu.:0.0000   1st Qu.:11.00   1st Qu.: 3.62157
 Median : 8.000   Median :1.0000   Median :12.00   Median : 5.20578
 Mean   : 8.043   Mean   :0.5237   Mean   :11.63   Mean   : 5.75759
 3rd Qu.: 9.000   3rd Qu.:1.0000   3rd Qu.:12.00   3rd Qu.: 7.30451
 Max.   :18.000   Max.   :1.0000   Max.   :16.00   Max.   :39.80892
If you want all the sample statistics provided in the file wages1.txt you can use the function vsummary defined by the following code¹:

> vsummary0 <- function(x) c(Obs = length(x), Mean = mean(x),
      Std.Dev. = sd(x), Min = min(x), Max = max(x),
      na = sum(is.na(x)))
> vsummary <- function(x) t(apply(x, 2, vsummary0))
> vsummary(wages1)
        Obs       Mean  Std.Dev.        Min      Max na
EXPER  3294  8.0434123 2.2906610 1.00000000 18.00000  0
MALE   3294  0.5236794 0.4995148 0.00000000  1.00000  0
SCHOOL 3294 11.6305404 1.6575447 3.00000000 16.00000  0
WAGE   3294  5.7575850 3.2691858 0.07655561 39.80892  0
¹ We add the information regarding the possible presence of missing values. The function is.na returns the logical value TRUE if its argument is identified as not available (NA), otherwise FALSE.
Figure 2.1 Box & Whiskers plot of wages by gender
2.1.2 Some graphical representations and grouping statistics
Let's compare the wages for males and females. A useful graphical representation is the Box & Whiskers plot, see Fig. 2.1. Recall that the levels of the three lines defining the box correspond respectively to the first, the second and the third quartile of the data (the second quartile is the median). The values placed outside the two whiskers may be considered anomalous with respect to the other data, see Chambers et al. (1983).
We can obtain the graph by having recourse to the function boxplot. The first argument of this function is a formula, see Appendix A.4, establishing that we are studying WAGE as a function (~) of gender (the dummy variable MALE). The second argument is the name of the data.frame containing the involved variables. With the third argument we attribute proper names, which will appear on the graph, to the values 0 and 1 taken by the variable MALE.

> boxplot(WAGE ~ MALE, data = wages1, names = c("females",
      "males"))
We can also represent the wage as a function of the years of experience, see Fig. 2.2
Figure 2.2 Scatterplot and Box & Whiskers plot of wages by the number of years of experience
> layout(1:2)
> plot(WAGE ~ EXPER, data = wages1)
> boxplot(WAGE ~ EXPER, data = wages1)
The function plot results in a scatter plot of the involved variables. The function layout(matrix) creates a multi-figure environment; the numbers in the matrix (in our instance a column vector) define the order in which the different graphs will appear.
We may desire to produce different graphs, for males and females, representing the wage as a function of the years of experience, see Fig. 2.3. It is preferable to first recode the dummy variable MALE as a categorical one, e.g. gender, that is a factor whose levels are f and m.
> wages1$gender <- as.factor(wages1$MALE)
> levels(wages1$gender) <- c("f", "m")
Finally we can produce the boxplot by studying the wage as a function of the interaction (:) between experience and gender, that is as a function of the set of combinations of the levels of the two variables; a plausible call is sketched below.
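The call producing Fig. 2.3 did not survive extraction; a minimal sketch consistent with the group labels shown in the figure (1.f, 4.m, ...), assuming the interaction formula:

> boxplot(WAGE ~ EXPER:gender, data = wages1)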
Figure 2.3 Scatterplot and Box & Whiskers plot of wages by gender and the number of years of experience
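The command producing the grouped summaries below was also lost in extraction; a sketch of a call along these lines (not necessarily the author's exact one) yields this output format:

> by(wages1[, c("EXPER", "MALE", "SCHOOL", "WAGE")], wages1$MALE, summary)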
wages1$MALE: 0
     EXPER            MALE       SCHOOL          WAGE
 Min.   : 1.000   Min.   :0   Min.   : 5.00   Min.   : 0.07656
 1st Qu.: 6.000   1st Qu.:0   1st Qu.:11.00   1st Qu.: 3.17564
 Median : 8.000   Median :0   Median :12.00   Median : 4.69326
 Mean   : 7.732   Mean   :0   Mean   :11.84   Mean   : 5.14692
 3rd Qu.: 9.000   3rd Qu.:0   3rd Qu.:13.00   3rd Qu.: 6.53275
 Max.   :16.000   Max.   :0   Max.   :16.00   Max.   :32.49740
------------------------------------------------
wages1$MALE: 1
     EXPER            MALE       SCHOOL          WAGE
 Min.   : 2.000   Min.   :1   Min.   : 3.00   Min.   : 0.1535
 1st Qu.: 7.000   1st Qu.:1   1st Qu.:10.00   1st Qu.: 4.0290
 Median : 8.000   Median :1   Median :12.00   Median : 5.6543
 Mean   : 8.326   Mean   :1   Mean   :11.44   Mean   : 6.3130
 3rd Qu.:10.000   3rd Qu.:1   3rd Qu.:12.00   3rd Qu.: 7.8913
 Max.   :18.000   Max.   :1   Max.   :16.00   Max.   :39.8089
2.1.3 Simple Linear Regression
Let's study, by a linear regression model, how the mean level of the variable WAGE changes as a function of gender: we can regress the variable WAGE on the dummy variable MALE, which assumes value 1 when the subject is male and 0 when she is female. We make use of the function linear model (lm); the first argument is the regression formula, where the ~ symbol separates the dependent variable from the independent one. The intercept is included by default. The data argument specifies the name of the data.frame containing the data.
We are thus studying the model

WAGE = β1 + β2 MALE + ERROR     (2.1)
> regr2.1 <- lm(WAGE ~ MALE, data = wages1)
> summary(regr2.1)

Call:
lm(formula = WAGE ~ MALE, data = wages1)

Residuals:
   Min     1Q Median     3Q    Max
-6.160 -2.102 -0.554  1.487 33.496

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  5.14692    0.08122   63.37   <2e-16 ***
MALE         1.16610    0.11224   10.39   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 3.217 on 3292 degrees of freedom
Multiple R-squared: 0.03175,    Adjusted R-squared: 0.03145
F-statistic: 107.9 on 1 and 3292 DF, p-value: < 2.2e-16
Recall that, in R, every result is an object and that the functions names and str² allow one to discover respectively the element names and the structure of any object.
> names(regr2.1)
 [1] "coefficients"  "residuals"     "effects"
 [4] "rank"          "fitted.values" "assign"
 [7] "qr"            "df.residual"   "xlevels"
[10] "call"          "terms"         "model"
Thus the object regr2.1 is a list containing 12 elements. If we want to extract one of
its elements, e.g. the coefficients, we may invoke one of the 3 following commands:
> regr2.1$coefficients
(Intercept)        MALE
   5.146924    1.166097
> regr2.1["coefficients"]
$coefficients
(Intercept)        MALE
   5.146924    1.166097
> regr2.1[["coefficients"]]
(Intercept)        MALE
   5.146924    1.166097
obtaining respectively a vector, a list and again a vector.
Pay attention! The command³
> regr2.1["coefficients"] %*% c(1,2)
returns an Error, since the result of regr2.1["coefficients"] is a list and not a
vector and cannot be used as an argument of a matrix product. See Chapter 2 of
Longhow Lam (2010) for the definition of the Data Objects: list and vector.
Always remember to use double square brackets to extract elements in the form of vectors from a list object. The following instructions are correct:
> regr2.1[["coefficients"]] %*% c(1, 2)
² We omit to report the call and the result of the function str(regr2.1).
³ See the help ?Arithmetic to have information on arithmetic operators in R: here %*% stands for the matrix product.
     [,1]
[1,] 7.479118
> regr2.1$coefficients %*% c(1, 2)
     [,1]
[1,] 7.479118
Other useful statistics resulting from a regression analysis are available in the
object obtained by applying the function summary to the result of lm; so
names(regr2.1) and names(summary(regr2.1)) give different information. The
result of summary(regr2.1) is itself a list containing 11 elements.
> output <- summary(regr2.1)
> names(output)
 [1] "call"          "terms"         "residuals"
 [4] "coefficients"  "aliased"       "sigma"
 [7] "df"            "r.squared"     "adj.r.squared"
[10] "fstatistic"    "cov.unscaled"
> output$fstatistic
    value     numdf     dendf
 107.9338    1.0000 3292.0000

2.1.4 Confidence intervals
To test whether the parameter β2 is zero, that is to test the null hypothesis H0: β2 = 0, we can construct a confidence interval at level (1 − α).
We have first to recall the coefficient estimates, their standard errors and the degrees of freedom; we must establish a value for α and determine the corresponding percentage points of the t random variable.
> regr2.1$coefficients
(Intercept)        MALE
   5.146924    1.166097
> coefse <- output$sigma * diag(output$cov.unscaled)^0.5
> coefse
(Intercept)        MALE
 0.08122482  0.11224216
> regr2.1$df
[1] 3292
> alpha <- 0.05
> qt(1 - alpha/2, regr2.1$df)
[1] 1.960685
The lower and upper bounds of the confidence interval for the MALE coefficient result respectively:
> regr2.1$coefficients[2] + c(-1, 1) * qt(1 - alpha/2,
regr2.1$df) * output$sigma * output$cov.unscaled[2,
2]^0.5
[1] 0.946 1.386
The confidence intervals, based on the t distribution, may also be obtained directly
for all parameter estimates, by using the function confint:
> confint(regr2.1, level = 1 - alpha)
            2.5 % 97.5 %
(Intercept) 4.988  5.306
MALE        0.946  1.386
2.2 Multiple Linear Regression

2.2.1 Parameter estimation

We now consider the model

WAGE = β1 + β2 MALE + β3 SCHOOL + β4 EXPER + ERROR     (2.2)
The function lm also allows us to perform a linear regression with more variables as regressors.
As we have already stated, the ~ symbol separates in a formula the dependent variable from the independent ones, and the + symbol, preceding a variable, indicates the presence of that variable in the model. The intercept is included by default. See Appendix A.4.
With the following syntax we declare that we want to study, by making use of a linear model (lm), the relationship between the variable WAGE and the set of independent variables MALE, SCHOOL and EXPER in the data.frame wages1.
> regr2.2 <- lm(WAGE ~ MALE + SCHOOL + EXPER, data = wages1)
> summary(regr2.2)
Call:
lm(formula = WAGE ~ MALE + SCHOOL + EXPER, data = wages1)
Residuals:
   Min     1Q Median     3Q    Max
-7.654 -1.967 -0.457  1.444 34.194

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.38002    0.46498  -7.269 4.50e-13 ***
MALE         1.34437    0.10768  12.485  < 2e-16 ***
SCHOOL       0.63880    0.03280  19.478  < 2e-16 ***
EXPER        0.12483    0.02376   5.253 1.59e-07 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
2.2.2 ANOVA to compare the two models
To establish if the variables SCHOOL and EXPER add a significant joint effect to the
variable MALE for explaining the dependent variable WAGE, we can compare the latter
model we have estimated (2.2) with (2.1) by using the function anova which performs
an analysis of variance in presence of nested models, see Verbeek p. 27. The first
argument of anova is the object resulting from lm applied to the simpler model, the
second argument is the lm object from the estimation of the more complex model.
> anova(regr2.1, regr2.2)
Analysis of Variance Table

Model 1: WAGE ~ MALE
Model 2: WAGE ~ MALE + SCHOOL + EXPER
  Res.Df   RSS Df Sum of Sq      F    Pr(>F)
1   3292 34077
2   3290 30528  2      3549 191.24 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
2.3 CAPM example

We can import the data from the file capm.dat, as we did in Section 2.1.
> capm <- read.table(unzip("ch02.zip", "Chapter 2/capm.dat"),
header = T)
Recall that by applying the functions head(), tail() and summary() to the data.frame capm it is possible to explore the beginning and final sections of the data.frame and to obtain the summary statistics for all the variables included in capm.
The data set contains information on stock market data, see the file capm.dat. Data pertaining to the following variables were collected from January 1960 to December 2006:
smb: excess return on the Fama-French size (small minus big) factor
hml: excess return on the Fama-French value (high minus low) factor
2.3.1 CAPM regressions (without intercept)
Verbeek first considers the parameter estimation of the following three linear regression models, where the intercept is not included:

foodrf = β1 rmrf + ERROR     (2.3)
durblrf = β1 rmrf + ERROR     (2.4)
constrrf = β1 rmrf + ERROR     (2.5)
Observe the presence of the element -1 in the following formulae, the first arguments of the calls to lm: it drops the intercept from the list of the regressors. See Appendix A.4.
> regr2.3f <- lm(foodrf ~ -1 + rmrf, data = capm)
> regr2.3d <- lm(durblrf ~ -1 + rmrf, data = capm)
> regr2.3c <- lm(constrrf ~ -1 + rmrf, data = capm)
Food
> summary(regr2.3f)
Call:
lm(formula = foodrf ~ -1 + rmrf, data = capm)
Residuals:
     Min       1Q   Median       3Q      Max
-13.539   -1.026    0.141    1.745   15.924

Coefficients:
     Estimate Std. Error t value Pr(>|t|)
rmrf  0.75774    0.02579   29.39   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Durables

> summary(regr2.3d)

Call:
lm(formula = durblrf ~ -1 + rmrf, data = capm)

Residuals:
     3Q     Max
 1.7332 17.8871

Coefficients:
     Estimate Std. Error t value Pr(>|t|)
rmrf  1.04736    0.02775   37.74   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 3.105 on 609 degrees of freedom
Multiple R-squared: 0.7005,    Adjusted R-squared: 0.7
F-statistic: 1424 on 1 and 609 DF, p-value: < 2.2e-16
Construction
> summary(regr2.3c)
Call:
lm(formula = constrrf ~ -1 + rmrf, data = capm)
Residuals:
     Min       1Q   Median       3Q      Max
-12.9414  -1.7193  -0.1866   1.4458  11.6551
Coefficients:
     Estimate Std. Error t value Pr(>|t|)
rmrf  1.16662    0.02535   46.01   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 2.836 on 609 degrees of freedom
Multiple R-squared: 0.7766,
Adjusted R-squared: 0.7763
F-statistic: 2117 on 1 and 609 DF, p-value: < 2.2e-16
We can change the title and the labels in the preceding table (produced with the function mtable in the package memisc), specify which statistics have to appear in the final part of the table, and also relabel the name of the independent variable rmrf:
> mtable2.3fdc <- mtable(Food = regr2.3f, Durables = regr2.3d,
Construction = regr2.3c, summary.stats = c("R-squared",
"sigma"))
> mtable2.3fdc <- relabel(mtable2.3fdc, rmrf = "excess market return")
> mtable2.3fdc
Calls:
Food: lm(formula = foodrf ~ -1 + rmrf, data = capm)
Durables: lm(formula = durblrf ~ -1 + rmrf, data = capm)
Construction: lm(formula = constrrf ~ -1 + rmrf, data = capm)
============================================================
                        Food      Durables  Construction
------------------------------------------------------------
excess market return    0.758***  1.047***  1.167***
                       (0.026)   (0.028)   (0.025)
------------------------------------------------------------
R-squared               0.586     0.700     0.777
sigma                   2.884     3.105     2.836
============================================================
Evaluation of the uncentered R²s

According to relationship (2.43) in Verbeek, the uncentered R² is to be evaluated when a linear model has no intercept. The uncentered R²s are automatically produced by R for the three models and figure in the previous output as R-squared (the R software takes into account the information that the models are constrained).
> 1 - sum(regr2.3f$residuals^2)/sum(capm$foodrf^2)
[1] 0.5864245
> 1 - sum(regr2.3d$residuals^2)/sum(capm$durblrf^2)
[1] 0.7004574
> 1 - sum(regr2.3c$residuals^2)/sum(capm$constrrf^2)
[1] 0.7766193
2.3.2 Testing a hypothesis on β1

To test whether the coefficients β1 in the linear models (2.3)-(2.5) can be assumed different from 1, we have to evaluate the statistic:

$$\frac{\hat\beta_1 - 1}{se(\hat\beta_1)}.$$
The estimate of the variance of 1 may be obtained by using the instruction vcov,
which returns the covariance matrix of the parameter estimates. The matrix reduces
in the present case to a scalar, since we are considering a linear model with only one
predictor and without the constant term.
> vcov(regr2.3f)
rmrf
rmrf 0.0006649123
We can thus evaluate the above statistic for the three situations:
> sampletf <- (regr2.3f$coefficients[[1]] - 1)/vcov(regr2.3f)^0.5
> sampletd <- (regr2.3d$coefficients[[1]] - 1)/vcov(regr2.3d)^0.5
> sampletc <- (regr2.3c$coefficients[[1]] - 1)/vcov(regr2.3c)^0.5
and by using the code:
> paste("(Food) statistic: ", round(sampletf, 4),
      "   p-value: ", round(2 * (1 - pt(abs(sampletf), regr2.3f$df)), 4))
> paste("(Durables) statistic: ", round(sampletd, 4),
      "   p-value: ", round(2 * (1 - pt(abs(sampletd), regr2.3d$df)), 4))
> paste("(Construction) statistic: ", round(sampletc, 4),
      "   p-value: ", round(2 * (1 - pt(abs(sampletc), regr2.3c$df)), 4))
we obtain
[1] "(Food) statistic:
-9.3951
p-value: 0"
[1] "(Durables) statistic:
1.7065
p-value: 0.0884"
[1] "(Construction) statistic:
6.5719
p-value: 0"
The function linearHypothesis in the package car performs directly an F test. The
first argument is the lm object and the second one specifies the hypothesis to be tested
in matrix or symbolic form (see the help ?car::linearHypothesis).
Observe that the values of the statistic F are equal to the squared values of the t
statistics obtained above, while the p-values do coincide, since the proposed tests are
similar.
> library(car)
> linearHypothesis(regr2.3f, "rmrf=1")
2.3.3 CAPM regressions (with intercept)

Verbeek then considers the parameter estimation of the following three linear regression models:

foodrf = β1 + β2 rmrf + ERROR     (2.6)
durblrf = β1 + β2 rmrf + ERROR     (2.7)
constrrf = β1 + β2 rmrf + ERROR     (2.8)
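The commands estimating models (2.6)-(2.8) did not survive extraction; a minimal sketch following the pattern of Section 2.3.1, now keeping the intercept (the object names are assumptions):

> regr2.6f <- lm(foodrf ~ rmrf, data = capm)
> regr2.6d <- lm(durblrf ~ rmrf, data = capm)
> regr2.6c <- lm(constrrf ~ rmrf, data = capm)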
2.3.4 CAPM regressions (with intercept and January dummy)

The following models are considered to verify the presence of the January effect:

foodrf = β1 + β2 jan + β3 rmrf + ERROR     (2.9)
durblrf = β1 + β2 jan + β3 rmrf + ERROR     (2.10)
constrrf = β1 + β2 jan + β3 rmrf + ERROR     (2.11)
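Here too the estimation commands were lost; a sketch under the assumption that a January dummy jan is available in (or added to) capm:

> # assuming monthly data starting in January, a January indicator can be built as
> capm$jan <- rep(c(1, rep(0, 11)), length.out = nrow(capm))
> regr2.9f <- lm(foodrf ~ jan + rmrf, data = capm)
> regr2.9d <- lm(durblrf ~ jan + rmrf, data = capm)
> regr2.9c <- lm(constrrf ~ jan + rmrf, data = capm)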
2.4 The World's Largest Hedge Fund
Data are available in the file madoff.dat in the zip file ch02.zip.
> madoff <- read.table(unzip("ch02.zip", "Chapter 2/madoff.dat"),
header = T)
hml: excess return on the Fama-French value (high minus low) factor
smb: excess return on the Fama-French size (small minus big) factor
Verbeek observes that a simple inspection of the return series produces some suspicious results, which are evident by considering some summary statistics: the mean and the standard deviation, which can be obtained by using the functions mean and sd,
> mean(madoff$fsl)
[1] 0.8422326
> sd(madoff$fsl)
[1] 0.7086928
and the fraction of months with a negative return over the whole considered period, that is the ratio between the number of months with negative return and the length of the series (number of periods):
> sum(madoff$fsl < 0)/length(madoff$fsl)
[1] 0.0744186
A CAPM analysis is then performed, see Verbeek's Table 2.6, by considering the following linear model:

fslrf = β1 + β2 rmrf + ERROR
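The estimation command itself was lost in extraction; consistently with the Call shown below, it was presumably of the form:

> summary(lm(fslrf ~ rmrf, data = madoff))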
Call:
lm(formula = fslrf ~ rmrf, data = madoff)
Residuals:
     Min       1Q   Median       3Q      Max
-1.34773 -0.48005 -0.08337  0.38865  2.97276
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.50495    0.04570  11.049  < 2e-16 ***
rmrf         0.04089    0.01072   3.813  0.00018 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.6658 on 213 degrees of freedom
Multiple R-squared: 0.06388,
Adjusted R-squared: 0.05949
F-statistic: 14.54 on 1 and 213 DF, p-value: 0.0001801
2.5 Dummy Variables Treatment and Multicollinearity

With regard to the data set on individual wages in the USA, we now consider the parameter estimation of the following three equivalent linear regression models⁴:

WAGE = β1 + β2 MALE + ERROR     (2.12)
WAGE = γ1 + γ2 FEMALE + ERROR     (2.13)
WAGE = δM MALE + δF FEMALE + ERROR     (2.14)

⁴ Remind that models of the type WAGE = βconst + βM MALE + βF FEMALE + ERROR, where MALE is a dummy variable with values 0 and 1 and FEMALE satisfies FEMALE = 1 - MALE, are not identified, since there is exact collinearity among the constant and the dummy variables MALE and FEMALE; so one of the variables has to be omitted from the model.
In (2.12) the substitution FEMALE = 1 - MALE has been performed, so dropping the variable FEMALE: the systematic part becomes (βconst + βF) + (βM - βF) MALE, that is β1 = βconst + βF and β2 = βM - βF.
In (2.13) the substitution MALE = 1 - FEMALE has been performed, so dropping the variable MALE: the systematic part becomes (βconst + βM) + (βF - βM) FEMALE, that is γ1 = βconst + βM and γ2 = βF - βM.
In (2.14) the identity FEMALE + MALE = 1 has been taken into account; it follows that
WAGE = βconst (MALE + FEMALE) + βM MALE + βF FEMALE + ERROR = (βconst + βM) MALE + (βconst + βF) FEMALE + ERROR.
Finally we have: δM = βconst + βM and δF = βconst + βF.
The formula for the first regression model is WAGE ~ MALE. Remember that the dummy variable MALE assumes value 1 when the statistical unit is male and 0 when she is female; so we can define a new dummy variable FEMALE as 1 - MALE.
To write the formula for the second regression model we have to use WAGE ~ I(1 - MALE), unless we explicitly define the new variable FEMALE <- 1 - MALE and use it in the formula WAGE ~ FEMALE; but this can be avoided.
Observe that with the function as is I() we specify to R that the difference sign
(-) is to be interpreted in the arithmetic sense and not in the formula sense, which
would drop the variable MALE from the model.
A call lm(WAGE ~ 1 - MALE, data = wages1) would instead result in a model containing only the intercept, since the minus sign indicates to drop the variable MALE from the model.
In specifying the third model the presence of the term -1 in the formula excludes
the intercept.
> wages1 <- read.table(unzip("ch02.zip", "Chapter 2/wages1.dat"),
header = T)
Regression 2.7A
> regr2.7A <- lm(WAGE ~ MALE, data = wages1)
> summary(regr2.7A)
Call:
lm(formula = WAGE ~ MALE, data = wages1)
Residuals:
   Min     1Q Median     3Q    Max
-6.160 -2.102 -0.554  1.487 33.496

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  5.14692    0.08122   63.37   <2e-16 ***
MALE         1.16610    0.11224   10.39   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 3.217 on 3292 degrees of freedom
Multiple R-squared: 0.03175,
Adjusted R-squared: 0.03145
F-statistic: 107.9 on 1 and 3292 DF, p-value: < 2.2e-16
Regression 2.7B

> regr2.7B <- lm(WAGE ~ I(1 - MALE), data = wages1)
> summary(regr2.7B)

Call:
lm(formula = WAGE ~ I(1 - MALE), data = wages1)

Residuals:
   Min     1Q Median     3Q    Max
-6.160 -2.102 -0.554  1.487 33.496

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  6.31302    0.07747   81.50   <2e-16 ***
I(1 - MALE) -1.16610    0.11224  -10.39   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 3.217 on 3292 degrees of freedom
Multiple R-squared: 0.03175,    Adjusted R-squared: 0.03145
F-statistic: 107.9 on 1 and 3292 DF, p-value: < 2.2e-16
Regression 2.7C
> regr2.7C <- lm(WAGE ~ -1 + MALE + I(1 - MALE), data = wages1)
> summary(regr2.7C)
Call:
lm(formula = WAGE ~ -1 + MALE + I(1 - MALE), data = wages1)
Residuals:
   Min     1Q Median     3Q    Max
-6.160 -2.102 -0.554  1.487 33.496

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
MALE         6.31302    0.07747   81.50   <2e-16 ***
I(1 - MALE)  5.14692    0.08122   63.37   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 3.217 on 3292 degrees of freedom
Multiple R-squared: 0.764,
Adjusted R-squared: 0.7638
F-statistic: 5328 on 2 and 3292 DF, p-value: < 2.2e-16
2.6 Missing Data, Outliers and Influential Observations

2.7 How to check the form of the distribution
In what follows let data be a series with elements x1, . . . , xn. For the sake of simplicity we work with data simulated from a Normal distribution with mean equal to 50 and unit variance.
> set.seed(123)
> data <- rnorm(100) + 50
We first consider a graphical inspection of the distribution by plotting a histogram of data together with the theoretical density function of a Normal random variable; then the χ² goodness-of-fit test is introduced. The discussion will proceed by comparing the empirical cumulative distribution function of data with the theoretical cumulative distribution function of a Normal random variable; the Kolmogorov-Smirnov test is based on this comparison. Two graphical tools, the QQ-plot and the PP-plot, will be derived from the comparison of the empirical and the theoretical distribution functions. All reasoning also applies in case we want to test distributional assumptions different from the Normal one.
2.7.1 Data histogram with the theoretical density function
We can obtain the histogram of data by using the function hist; pay attention to set the argument freq = FALSE: in this way densities (relative frequencies divided by the class widths) are plotted, which makes the histogram comparable with a theoretical density.
> data.hist <- hist(data, freq = FALSE)
We can add the density of a Normal distribution, by setting the mean and the standard
deviation arguments equal to the sample mean and the sample standard deviation
values of data, see Fig. 2.4.
> curve(dnorm(x, mean = mean(data), sd = sd(data)),
add = TRUE)
2.7.2 The χ² goodness-of-fit test
The object data.hist contains all information necessary to create the histogram.
Namely data.hist$breaks gives the limits of the intervals (classes) in the histogram,
and data.hist$counts the count corresponding to each class.
> data.hist$breaks
[1] 47.5 48.0 48.5 49.0 49.5 50.0 50.5 51.0 51.5 52.0 52.5
> data.hist$counts
[1] 1 3 10 11 23 22 13 9 5 3
We can thus build the following table by considering the same classes as the histogram (the lowest and highest bounds of the histogram are replaced with −∞ and +∞ respectively).
Figure 2.4 Histogram of data with the theoretical density function under the hypothesis of normality
  zj-1    zj  nj      n*pj
  ...    ...  ..       ...
  51.0  51.5   9  9.824401
  51.5  52.0   5  4.304671
  52.0  +Inf   3  1.822008
The first two columns of table contain the class bounds $z_{j-1}$ and $z_j$. The third column contains the observed frequencies and the fourth column the theoretical frequencies under the assumption of normality. These theoretical frequencies are obtained as $n\hat{p}_j$, where the probabilities $\hat{p}_j$ are defined as

$$\hat{p}_j = \Phi\left(\frac{z_j - \bar{x}}{s}\right) - \Phi\left(\frac{z_{j-1} - \bar{x}}{s}\right),$$

$z_{j-1}$ and $z_j$ being the class limits, $\Phi$ the standard Normal cdf, and $\bar{x}$, $s^2$ the sample mean and the sample variance.
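The code that builds the object table was lost in extraction; a minimal sketch consistent with the later use of table[, 3] and table[, 4] (the column layout is an assumption):

> breaks <- data.hist$breaks
> breaks[1] <- -Inf
> breaks[length(breaks)] <- Inf
> phat <- diff(pnorm(breaks, mean = mean(data), sd = sd(data)))
> table <- cbind(lower = breaks[-length(breaks)], upper = breaks[-1],
      nj = data.hist$counts, npj = length(data) * phat)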
For testing the null hypothesis of Normality we can have recourse to the χ² goodness-of-fit test, see Mood, Graybill and Boes (1974), which is based on the statistic

$$Q_k' = \sum_{j=1}^{k+1} \frac{(n_j - n\hat{p}_j)^2}{n\hat{p}_j}$$

where k + 1 is the number of the classes.
$Q_k'$ is distributed according to a $\chi^2_k$ random variable with k degrees of freedom. With reference to data we have
> (qstat <- sum((table[, 3] - table[, 4])^2/table[, 4]))
[1] 3.761746
with a corresponding p-value equal to
> 1 - pchisq(qstat, nrow(table) - 1)
[1] 0.9263825
so we will not reject the null hypothesis that the elements of data are distributed
according to a Normal random variable.
2.7.3 The Kolmogorov-Smirnov test
Let

$$F_n(x) = \frac{\#\{x_i \le x\}}{n}$$

be the empirical cumulative distribution function (cdf) of data and $F_0(x)$ a theoretical cumulative distribution function, see Fig. 2.5, where the empirical cdf is the step function and the theoretical cdf is the continuous one.
The Kolmogorov-Smirnov statistic to test the null hypothesis $X \sim F_0(\cdot)$, where $F_0(\cdot)$ is some completely specified continuous cumulative distribution function, is

$$K_n = \sup_{-\infty < x < \infty} |F_n(x) - F_0(x)|. \qquad (2.15)$$
Figure 2.5 Empirical cumulative distribution function (the step function) and the
theoretical distribution function under the null hypothesis of normality
This test can also be used to check if the observations in two data sets (x1 , . . . , xnx )
and (y1 , . . . , yny ) come from the same distribution; in this case F0 (x) is replaced with
the empirical cdf calculated on (y1 , . . . , yny ).
The Kolmogorov-Smirnov statistic is based on the maximum absolute distance
between the empirical cdf Fn () and the theoretical one F0 (), see Fig. 2.6.
> plot(ecdf(data), xlim = c(47, 53), cex = 0.5, main = "",
ylab = expression(F[n](x)~~and~~F[0](x)))
> curve(pnorm(x, mean = mean(data), sd = sd(data)),
add = TRUE)
> x <- sort(data)
> curve(ecdf(data)(x) - pnorm(x, mean = mean(data),
sd = sd(data)), n = 10000, xlim = c(47, 53),
ylim = c(-0.06, 0.06), ylab = "distance")
> abline(h = 0)
Figure 2.6 Distance between the empirical cumulative distribution function and the
theoretical distribution function under the null hypothesis of normality
2.7.4 The PP-plot and the QQ-plot
We now focus our attention, following an idea suggested by Diego Zappa, on the comparison of the empirical cdf with the theoretical cdf to obtain the so-called PP-plot and QQ-plot, see Zappa, Bramante and Nai Ruscone (2012). Figure 2.7 shows a zoom of the graph in Fig. 2.5.
Figure 2.7 can be obtained with the following code.
> xlim <- c(47, 50)
> xtextshift <- 0.15
> plot(ecdf(data), xlim = xlim, ylim = ecdf(data)(xlim),
xaxs = "i", yaxs = "i", main = "",
ylab = expression(F[n](x)~~and~~F[0](x)), cex = 0.5)
> curve(pnorm(x, mean = mean(data), sd = sd(data)),
add = TRUE, xlim = xlim)
> point <- data[which.max(abs(ecdf(data)(data) - pnorm(data,
mean = mean(data), sd = sd(data))))]
> arrows(point, 0, point, ecdf(data)(point), length = 0.1,
angle = 22)
> arrows(point, 0, point, pnorm(point, mean = mean(data),
sd = sd(data)), length = 0.1, angle = 22)
Figure 2.7 Zoom of the graph in Fig. 2.5
Figure 2.8 PP-plot

2.7.5 Use of the function fit.cont
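The body of this section did not survive extraction; as Figs. 2.15-2.18 below suggest, fit.cont from the package rriskDistributions is applied directly to a data vector. A minimal usage sketch:

> library(rriskDistributions)
> fit.cont(data)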
Figure 2.9 QQ-plot

2.8 Two tests for assessing normality

2.8.1 The Jarque-Bera test
The Jarque-Bera test, see Jarque and Bera (1987), is obtained as a Lagrange Multiplier statistic, see Verbeek's Chapter 6, and has the following forms:

$$JB = n\left(\frac{b_1^2}{6} + \frac{b_2^2}{24}\right), \qquad
b_1 = \frac{\hat\mu_3}{\hat\mu_2^{3/2}}, \quad
b_2 = \frac{\hat\mu_4}{\hat\mu_2^2} - 3,$$

where

$$\hat\mu_j = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^j
\quad\text{and}\quad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

Figure 2.10

Observe that b1 and b2 are respectively the skewness and the (excess) kurtosis sample coefficients, which are null under the normality assumption.
When the test is applied to the residuals e1, . . . , en of a linear model it takes the form

$$JB = n\left[\frac{\hat\mu_3^2}{6\,\hat\mu_2^3} + \frac{1}{24}\left(\frac{\hat\mu_4}{\hat\mu_2^2} - 3\right)^2\right],$$

where:

$$\hat\mu_j = \frac{1}{n}\sum_{i=1}^{n} e_i^j.$$

When the linear model includes a constant the residuals have zero mean, that is $\hat\mu_1 = 0$, and the Jarque-Bera statistic reduces to the former definition.
In both cases the Jarque-Bera statistic is distributed as a $\chi^2_2$ random variable with 2 degrees of freedom.
In the package tseries the function jarque.bera.test is available to perform the Jarque-Bera test on a set of observations. Applying it to data we obtain
> library(tseries)
> jarque.bera.test(data)
Jarque Bera Test
data: data
X-squared = 0.1691, df = 2, p-value = 0.9189
and the null hypothesis of normality will not be rejected.
2.8.2 The Shapiro-Wilk test
The Shapiro-Wilk normality test, see Shapiro and Wilk (1965), is implemented in the
function shapiro.test; applying this function to data we obtain
> shapiro.test(data)
Shapiro-Wilk normality test
data: data
W = 0.9939, p-value = 0.9349
which does not reject the null hypothesis of normality.
2.9 Some further comments on the QQ-plot
We now consider the behaviour of the QQ-plot (and of the PP-plot), under the null hypothesis of normality, in the presence of data characterized by skewed, leptokurtic and platykurtic behaviour.
2.9.1 Positively skewed distributions

Consider a Gamma random variable X with density

$$f_X(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x}\, I_{(0,\infty)}(x), \qquad \alpha > 0, \; \lambda > 0,$$

and a Normal random variable Y with the same mean α/λ; their densities and cdfs are compared in Fig. 2.11.
> layout(1:2)
> par(mai = c(0.5, 0.82, 0.1, 0.42))
> alpha = 4
> lambda = 2
> curve(dgamma(x, alpha, lambda), xlim = c(-2, 6),
      ylab = expression(f[X](x)~~and~~f[Y](x)))
> curve(dnorm(x, mean = alpha/lambda), add = TRUE)
> text(0.75, 0.4, expression(f[X](x)), cex = 0.75)
> text(3, 0.35, expression(f[Y](x)), cex = 0.75)
> curve(pgamma(x, alpha, lambda), xlim = c(-2, 6),
      ylab = expression(F[X](x)~~and~~F[Y](x)))
> curve(pnorm(x, mean = alpha/lambda), add = TRUE)
> text(2, 0.75, expression(F[X](x)), cex = 0.75)
> text(2, 0.35, expression(F[Y](x)), cex = 0.75)
We can establish the behaviour of the PP- and QQ-plots by considering the cumulative
distribution functions as was shown in Section 2.7.4
> layout(1:2)
> par(mai = c(0.9, 0.82, 0.1, 0.42))
> x <- seq(-2, 6, length = 500)
> plot(pnorm(x, mean = alpha/lambda), pgamma(x, alpha,
      lambda), type = "l", xaxs = "i", yaxs = "i",
      xlab = "theoretical probabilities",
      ylab = "sample probabilities", ylim = c(0, 1))
> abline(0, 1)
> x <- seq(0, 1, length = 1000)
> plot(qnorm(x, mean = alpha/lambda), qgamma(x, alpha,
      lambda), xlim = c(-2, 6), ylim = c(-2, 6), type = "l",
      xlab = "theoretical quantiles", ylab = "sample quantiles")
> abline(0, 1)
> text(-0.75, 1.5, "left tail thinner than the normal tail",
      cex = 0.75)
> text(3, 5.5, "right tail fatter than the normal tail",
      cex = 0.75)
In this situation the left tail of X is thinner than that of Y, while the right tail of X is fatter than that of Y. Thus the quantiles on the tails of the two distributions behave as follows: for any given p (close to 0 or to 1) the quantiles of X are larger than those of Y. This behaviour is evident from the QQ-plot.
The PP-plot clearly detects a different behaviour of the two distributions in the
middle of the domain.
We now apply the function fit.cont to some simulated data, see Fig. 2.15.
> set.seed(123)
> skew.data <- rgamma(100, alpha, lambda)
> library(rriskDistributions)
> fit.cont(skew.data)
2.9.2 Negatively skewed distributions
Figure 2.12 shows the density functions and the cdfs of X = −W, W being the Gamma random variable with parameters α = 4 and λ = 2 considered in the previous section, and of a Normal random variable Y with mean −α/λ = −2 and variance α/λ² = 1.
> layout(1:2)
> par(mai = c(0.5, 0.82, 0.1, 0.42))
> alpha = 4
> lambda = 2
> curve(dgamma(-x, alpha, lambda), xlim = c(-6, 2),
      ylab = expression(f[X](x)~~and~~f[Y](x)))
> curve(dnorm(x, mean = -alpha/lambda), add = TRUE)
> text(-0.75, 0.4, expression(f[X](x)), cex = 0.75)
> text(-3, 0.35, expression(f[Y](x)), cex = 0.75)
> curve(1 - pgamma(-x, alpha, lambda), xlim = c(-6, 2),
      ylab = expression(F[X](x)~~and~~F[Y](x)))
> curve(pnorm(x, mean = -alpha/lambda), add = TRUE)
> text(-1.75, 0.75, expression(F[X](x)), cex = 0.75)
> text(-1.75, 0.35, expression(F[Y](x)), cex = 0.75)

> layout(1:2)
> par(mai = c(0.9, 0.82, 0.1, 0.42))
> x <- seq(-6, 2, length = 500)
> plot(pnorm(x, mean = -alpha/lambda), 1 - pgamma(-x,
      alpha, lambda), type = "l", xaxs = "i", yaxs = "i",
      xlab = "theoretical probabilities",
      ylab = "sample probabilities", ylim = c(0, 1))
> abline(0, 1)
> x <- seq(0, 1, length = 1000)
> plot(qnorm(x, mean = -alpha/lambda), -qgamma(1 - x, alpha,
      lambda), xlim = c(-6, 2), ylim = c(-6, 2), type = "l",
      xlab = "theoretical quantiles", ylab = "sample quantiles")
> abline(0, 1)
2.9.3 Leptokurtic distributions
> layout(1:2)
> par(mai = c(0.5, 0.82, 0.1, 0.42))
> k = 4
> curve(dt(x, k), xlim = c(-8, 8),
      ylab = expression(f[X](x)~~and~~f[Y](x)))
> curve(dnorm(x, mean = 0, sd = (k/(k - 2))^0.5), add = TRUE)
> text(0.75, 0.35, expression(t[4]), cex = 0.75)
> text(0, 0.24, "normal", cex = 0.75)
> curve(pt(x, k), xlim = c(-8, 8),
      ylab = expression(F[X](x)~~and~~F[Y](x)))
> curve(pnorm(x, mean = 0, sd = (k/(k - 2))^0.5), add = TRUE)
> text(0, 0.2, expression(F[X](x)), cex = 0.75)
> text(1.5, 0.7, expression(F[Y](x)), cex = 0.75)

> layout(1:2)
> par(mai = c(0.9, 0.82, 0.1, 0.42))
> x <- seq(-8, 8, length = 500)
> plot(pnorm(x, mean = 0, sd = (k/(k - 2))^0.5), pt(x,
      k), type = "l", xaxs = "i", yaxs = "i",
      xlab = "theoretical probabilities",
      ylab = "sample probabilities", ylim = c(0, 1))
> abline(0, 1)
> x <- seq(0, 1, length = 1000)
> plot(qnorm(x, mean = 0, sd = (k/(k - 2))^0.5), qt(x,
      k), xlim = c(-8, 8), ylim = c(-8, 8), type = "l",
      xlab = "theoretical quantiles", ylab = "sample quantiles")
> abline(0, 1)
> text(-1.25, -7.5, "left tail fatter than the normal tail",
      cex = 0.75)
> text(1, 7.5, "right tail fatter than the normal tail",
      cex = 0.75)
In this situation the tails of X are fatter than those of Y. Thus the quantiles on the tails of the two distributions behave as follows: for any given p close to 0 the quantiles of X are smaller than those of Y; for any given p close to 1 the quantiles of X are larger than those of Y. This behaviour is evident from the QQ-plot.
The density functions are now symmetric and thus the PP-plot intersects the 0-1 line at the center of the distributions; however, it can still detect the different behaviour of the two distributions in the middle of their domain.
We apply the function fit.cont to some simulated data, see Fig. 2.17.
We apply the function fit.cont to some simulated data, see Fig. 2.17.
> set.seed(123)
> leptokurtic.data <- rt(100, k)
> library(rriskDistributions)
> fit.cont(leptokurtic.data)
2.9.4 Platykurtic distributions
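The block producing the density and cdf panels of Fig. 2.14 did not survive extraction; a sketch by analogy with the previous sections, comparing U(0, 1) with a Normal having the same mean 1/2 and variance 1/12:

> layout(1:2)
> par(mai = c(0.5, 0.82, 0.1, 0.42))
> curve(dunif(x), xlim = c(-1, 2),
      ylab = expression(f[X](x)~~and~~f[Y](x)))
> curve(dnorm(x, mean = 0.5, sd = 1/12^0.5), add = TRUE)
> curve(punif(x), xlim = c(-1, 2),
      ylab = expression(F[X](x)~~and~~F[Y](x)))
> curve(pnorm(x, mean = 0.5, sd = 1/12^0.5), add = TRUE)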
> layout(1:2)
> par(mai = c(0.9, 0.82, 0.1, 0.42))
> x <- seq(-1, 2, length = 500)
> plot(pnorm(x, mean = 0.5, sd = 1/12^0.5), punif(x),
      type = "l", xaxs = "i", yaxs = "i",
      xlab = "theoretical probabilities",
      ylab = "sample probabilities", ylim = c(0, 1))
> abline(0, 1)
> x <- seq(0, 1, length = 1000)
> plot(qnorm(x, mean = 0.5, sd = 1/12^0.5), qunif(x),
      xlim = c(-1, 2), ylim = c(-1, 2), type = "l",
      xlab = "theoretical quantiles", ylab = "sample quantiles")
> abline(0, 1)
> text(-0.5, 0.5, "left tail thinner than the normal tail",
      cex = 0.75)
> text(1.5, 0.5, "right tail thinner than the normal tail",
      cex = 0.75)
In this situation the tails of Y are fatter than those of X. Thus the quantiles on the tails of the two distributions behave as follows: for any given p close to 0 the quantiles of X are larger than those of Y; for any given p close to 1 the quantiles of X are smaller than those of Y. This behaviour is evident from the QQ-plot.
As above, the density functions are symmetric and thus the PP-plot intersects the 0-1 line at the center of the distributions; however, it can detect the different behaviour of the two distributions in the middle of their domain.
We apply the function fit.cont to some simulated data, see Fig. 2.18.
> set.seed(123)
> platikurtic.data <- runif(100)
> library(rriskDistributions)
> fit.cont(platikurtic.data)
Figure 2.11 Density and cumulative distribution functions of a Gamma random variable with α = 4 and λ = 2 (positively skewed distribution) and a Normal random variable. Theoretical PP-plot and QQ-plot for the comparison of the two distributions
Figure 2.12 Density and cumulative distribution functions of the negative of a Gamma random variable (negatively skewed distribution) and a Normal random variable. Theoretical PP-plot and QQ-plot for the comparison of the two distributions
Figure 2.13 Density and cumulative distribution functions of a t4 random variable with
4 degrees of freedom (leptokurtic distribution) and a Normal random variable. Theoretical
PP-plot and QQ-plot for the comparison of the two distributions
Figure 2.14 Density and cumulative distribution functions of a uniform random variable
(platikurtic distribution) and a Normal random variable. Theoretical PP-plot and QQ-plot
for the comparison of the two distributions
Figure 2.15 Fitting positively skewed data, see Section 2.9.1, by using the function fit.cont

Figure 2.16 Fitting negatively skewed data, see Section 2.9.2, by using the function fit.cont

Figure 2.17 Fitting leptokurtic data, see Section 2.9.3, by using the function fit.cont

Figure 2.18 Fitting platykurtic data, see Section 2.9.4, by using the function fit.cont
3
Interpreting and comparing Linear
Regression Models
3.1
The variable names in the file housing.dat provided in the zip file ch03.zip are not stored only on the first line (the last two names are reported on the second line of the text file), so if one desires to read the data with the command read.table, see Section 2.1, one first has to arrange, by using a text editor, all the variable names on the first line of housing.dat.
Here, we import data from the file housing.dta, which is saved in the Stata format.
We have first to invoke the package foreign and next the command read.dta.
Remember that the function unzip extracts a file from a compressed archive.
> library(foreign)
> housing <- read.dta(unzip("ch03.zip", "Chapter 3/housing.dta"))
Recall that it is possible to explore the beginning section and the final section of the
data.frame and to obtain summary statistics for all the variables included in the
data-frame by using the functions head(), tail() and summary().
The first linear model proposed by Verbeek studies the interpretation of log(price) as a function of log(lotsize), the number of bedrooms, the number of bathrooms and the presence of air conditioning. The corresponding parameter estimates may be obtained by using the command lm; the log transformation of the specified variables can be performed without applying the as is function I(log()) (see Section 2.5) since, for the logarithm case, there is no ambiguity between the use of mathematical operators and the symbolic operators proper of the formula function. See Appendix A.4.
> regr3.1 <- lm(log(price) ~ log(lotsize) + bedrooms +
bathrms + airco, data = housing)
> summary(regr3.1)
Call:
lm(formula = log(price) ~ log(lotsize) + bedrooms + bathrms +
airco, data = housing)
56
Residuals:
     Min       1Q   Median       3Q      Max
-0.81782 -0.15562  0.00778  0.16468  0.84143

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   7.09378    0.23155  30.636  < 2e-16 ***
log(lotsize)  0.40042    0.02781  14.397  < 2e-16 ***
bedrooms      0.07770    0.01549   5.017 7.11e-07 ***
bathrms       0.21583    0.02300   9.386  < 2e-16 ***
airco         0.21167    0.02372   8.923  < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.2456 on 541 degrees of freedom
Multiple R-squared: 0.5674,
Adjusted R-squared: 0.5642
F-statistic: 177.4 on 4 and 541 DF, p-value: < 2.2e-16
The estimate of s = 0.2456 (Residual standard error) is also available by invoking the instruction summary(regr3.1)$sigma.
The expected log(price) of a house with specific characteristics may be obtained by applying the function predict: the first argument is the lm object containing the parameter estimates to use in the prediction; the second argument specifies, in the form of a data.frame, the values of the regressors for which the prediction of the response is desired.
> predict(regr3.1, data.frame(lotsize = 5000, bedrooms = 4,
bathrms = 1, airco = 0))
1
11.03088
One may obtain the prediction of the price by calculating the exp of the preceding value, or directly by means of the following expression:
> exp(predict(regr3.1, data.frame(lotsize = 5000, bedrooms = 4,
bathrms = 1, airco = 0)))
1
61751.63
To include one half of the residual variance s2 in the prediction use:
> exp(predict(regr3.1, data.frame(lotsize = 5000, bedrooms = 4,
bathrms = 1, airco = 0)) + summary(regr3.1)$sigma^2/2)
1
63641.78
Verbeek observes how the average of predicted prices, which can be extracted from
the lm object regr3.1 as regr3.1$fitted
> mean(exp(regr3.1$fitted))
[1] 66152.74
underestimates the sample average of observed prices
> mean(housing$price)
[1] 68121.6
concluding that the bias can be reduced by adding the half-variance term
> mean(exp(regr3.1$fitted + summary(regr3.1)$sigma^2/2))
[1] 68177.61
3.1.1 Testing the functional form: the RESET test
According to the RESET test procedure for checking the functional form of the
preceding model, one has to include, as predictors in the model specification, the
first Q powers, e.g. the second and the third ones, of the values of y estimated by the
model. The fitted y values may be obtained with the instruction:
> yhat <- predict(regr3.1)
Observe that when no value of the regressors is given as argument of predict, the
prediction is made for the data set used to estimate the parameters in the linear
model: in this way the fitted y values are obtained.
Two ways exist to include additional terms into a formula defining a linear model:² we can write the whole formula anew, or we can modify the formula of the basic model by using the function update. In the sequel we will follow the latter solution.
The function update has three arguments. The first is the object to update, which is
an object of class lm, that is the result of a preceding linear model call. The second one
is the updating formula: the dot stands for the same elements, so .~. means both
members of the invoked model in their original form. The additional term squared
predicted values is then included in the formula. The third optional3 argument is the
data.frame the variables involved in the linear model refer to.
² Remind to use the as.is I() operator to specify the powers of the regressors, since ^ is a symbol proper of the formula method, see Appendix A.4.
³ In this way it is possible to update an existing formula and apply it to a new data.frame.
> regr3.1RESET2 <- update(regr3.1, . ~ . + I(yhat^2))
> summary(regr3.1RESET2)

Call:
lm(formula = log(price) ~ log(lotsize) + bedrooms + bathrms +
    airco + I(yhat^2), data = housing)

Residuals:
  Median       3Q      Max
 0.00836  0.16274  0.84243
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    5.00888    4.05883   1.234    0.218
log(lotsize)  -0.13381    1.03870  -0.129    0.898
bedrooms      -0.02570    0.20157  -0.128    0.899
bathrms       -0.07774    0.57105  -0.136    0.892
airco         -0.07225    0.55235  -0.131    0.896
I(yhat^2)      0.06032    0.11724   0.515    0.607

Residual standard error: 0.2457 on 540 degrees of freedom
Multiple R-squared: 0.5676,    Adjusted R-squared: 0.5636
F-statistic: 141.8 on 5 and 540 DF, p-value: < 2.2e-16
From the p-value of I(yhat^2) one can observe that the coefficient of the squared predicted values is not significantly different from 0.
We have now to include also the third power of the predicted values: we update the
latter model regr3.1RESET2.
> regr3.1RESET3 <- update(regr3.1RESET2, . ~ . + I(yhat^3))
> summary(regr3.1RESET3)
Call:
lm(formula = log(price) ~ log(lotsize) + bedrooms + bathrms +
airco + I(yhat^2) + I(yhat^3), data = housing)
Residuals:
     Min       1Q   Median       3Q      Max
-0.81241 -0.15526  0.00843  0.15948  0.84892
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  -274.5008   300.6983  -0.913    0.362
log(lotsize)  -33.4090    35.8094  -0.933    0.351
bedrooms       -6.4829     6.9490  -0.933    0.351
bathrms       -18.0151    19.3038  -0.933    0.351
airco         -17.6684    18.9363  -0.933    0.351
I(yhat^2)       7.4812     7.9835   0.937    0.349
I(yhat^3)      -0.2207     0.2375  -0.930    0.353
3.1.2
The package lmtest also provides the resettest function, which performs the RESET test directly; among its arguments one specifies the powers to include and the kind of terms to use (in our case the fitted values, that is the predicted values).
> library(lmtest)
> resettest(regr3.1, power = 2, type = "fitted")
RESET test
data: regr3.1
RESET = 0.2647, df1 = 1, df2 = 540, p-value = 0.6071
> resettest(regr3.1, power = 2:3, type = "fitted")
RESET test
data: regr3.1
RESET = 0.5644, df1 = 2, df2 = 539, p-value = 0.569
The RESET statistics correspond to the F statistics in an ANOVA test comparing the two models; in the first instance the RESET value is equal to the squared t statistic calculated above to test the significance of I(yhat^2), namely 0.5145² = 0.2647 (the significance level is the same for the two proposed tests, which are similar, that is equivalent, since T²k = F1,k). The second RESET value coincides with the F statistic in the ANOVA analysis.
3.1.3 Testing the functional form: the RESET test for the extended model
Since prices may also depend on other characteristics, all the variables available in the
data set are included in the preceding model specification: we have to update model
regr3.1:
> regr3.2 <- update(regr3.1, . ~ . + driveway + recroom +
fullbase + gashw + garagepl + prefarea + stories)
> summary(regr3.2)
Call:
lm(formula = log(price) ~ log(lotsize) + bedrooms + bathrms +
airco + driveway + recroom + fullbase + gashw + garagepl +
prefarea + stories, data = housing)
Residuals:
     Min       1Q   Median       3Q      Max
-0.68355 -0.12247  0.00802  0.12780  0.67564
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   7.74509    0.21634  35.801  < 2e-16 ***
log(lotsize)  0.30313    0.02669  11.356  < 2e-16 ***
bedrooms      0.03440    0.01427   2.410 0.016294 *
bathrms       0.16576    0.02033   8.154 2.52e-15 ***
airco         0.16642    0.02134   7.799 3.29e-14 ***
driveway      0.11020    0.02823   3.904 0.000107 ***
recroom       0.05797    0.02605   2.225 0.026482 *
fullbase      0.10449    0.02169   4.817 1.90e-06 ***
gashw         0.17902    0.04389   4.079 5.22e-05 ***
garagepl      0.04795    0.01148   4.178 3.43e-05 ***
prefarea      0.13185    0.02267   5.816 1.04e-08 ***
stories       0.09169    0.01261   7.268 1.30e-12 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.2104 on 534 degrees of freedom
Multiple R-squared: 0.6865,
Adjusted R-squared: 0.6801
F-statistic: 106.3 on 11 and 534 DF, p-value: < 2.2e-16
The estimate of sigma can be extracted as usual with:
> summary(regr3.2)$sigma
[1] 0.2103959
An F test is performed to compare the present model with the previous one:
> anova(regr3.1, regr3.2)
Analysis of Variance Table

Model 1: log(price) ~ log(lotsize) + bedrooms + bathrms + airco
Model 2: log(price) ~ log(lotsize) + bedrooms + bathrms + airco +
    driveway + recroom + fullbase + gashw + garagepl + prefarea + stories
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1    541 32.622
2    534 23.638  7    8.9839 28.993 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The following results show that, according to the RESET tests, the hypothesis of a
correctly specified linear model should not be rejected.
> resettest(regr3.2, power = 2, type = "fitted")
RESET test
data: regr3.2
RESET = 0.0033, df1 = 1, df2 = 533, p-value = 0.9539
> resettest(regr3.2, power = 2:3, type = "fitted")
RESET test
data: regr3.2
RESET = 0.0391, df1 = 2, df2 = 532, p-value = 0.9616
The 0.0033 in the first RESET output is the square of the rounded value (0.06) reported
in Verbeek at p. 75, which refers to the t-test formulation of the RESET test.
3.1.4
To include in the formula the interaction term between prefarea and
bedrooms we can follow two methods, see Appendix A.4:
- define the new term in the formula as the product of the involved
variables: I(prefarea*bedrooms);
- define the new term in the formula by making use of the : operator, which in
the formula algebra stands for interaction, as prefarea:bedrooms.
Residuals:
  Median       3Q      Max
 0.00793  0.12909  0.67559
Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
garagepl           0.047961   0.011487   4.175 3.48e-05 ***
prefarea           0.146040   0.110285   1.324 0.186003
stories            0.091473   0.012729   7.186 2.26e-12 ***
bedrooms:prefarea -0.004675   0.035556  -0.131 0.895454
(the rows for (Intercept), log(lotsize), bedrooms, bathrms, airco, driveway,
recroom, fullbase and gashw are not legible in the source)
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
3.1.5
Prediction
The expected log sale price for an arbitrary house in Windsor, with the characteristics
specified in Verbeek can be obtained as:
> predictregr3.2 <- predict(regr3.2, data.frame(lotsize = 10000,
bedrooms = 4, bathrms = 1, airco = 1, driveway = 1,
recroom = 1, fullbase = 1, gashw = 1, garagepl = 2,
prefarea = 1, stories = 2))
> predictregr3.2
1
11.86959
> exp(predictregr3.2)
1
142855.1
and the prediction corrected by considering the half variance factor:
> exp(predictregr3.2 + summary(regr3.2)$sigma^2/2)
1
146052.2
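The half-variance correction follows from the lognormal distribution: if log Y is
N(mu, sigma²), then E[Y] = exp(mu + sigma²/2). A quick check of this fact with the
values obtained above (a sketch, not part of the original code):
> mu <- predictregr3.2; sigma <- summary(regr3.2)$sigma
> exp(mu + sigma^2/2)                  # 146052.2, as above
> mean(exp(rnorm(1e6, mu, sigma)))     # Monte Carlo approximation of E[Y]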
3.1.6
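The estimation command for the model summarized below is not legible in the source;
judging from the petest output in Section 3.1.7, it is presumably the specification
in levels:
> regr3.3 <- lm(price ~ lotsize + bedrooms + bathrms + airco +
      driveway + recroom + fullbase + gashw + garagepl +
      prefarea + stories, data = housing)
> summary(regr3.3)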
Residuals:
   Min     1Q Median     3Q    Max
-41389  -9307   -591   7353  74875
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -4038.3504  3409.4713  -1.184 0.236762
lotsize         3.5463     0.3503  10.124  < 2e-16 ***
bedrooms     1832.0035  1047.0002   1.750 0.080733 .
bathrms     14335.5585  1489.9209   9.622  < 2e-16 ***
airco       12632.8904  1555.0211   8.124 3.15e-15 ***
driveway     6687.7789  2045.2458   3.270 0.001145 **
recroom      4511.2838  1899.9577   2.374 0.017929 *
fullbase     5452.3855  1588.0239   3.433 0.000642 ***
gashw       12831.4063  3217.5971   3.988 7.60e-05 ***
garagepl     4244.8290   840.5442   5.050 6.07e-07 ***
prefarea     9369.5132  1669.0907   5.614 3.19e-08 ***
stories      6556.9457   925.2899   7.086 4.37e-12 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 15420 on 534 degrees of freedom
Multiple R-squared: 0.6731,
Adjusted R-squared: 0.6664
F-statistic: 99.97 on 11 and 534 DF, p-value: < 2.2e-16
3.1.7
To choose between a linear and a loglinear functional form, Verbeek, p. 69, suggests
resorting to the PE test procedure.4
We first consider the step-by-step construction of the test. We have first to obtain
the predictors in the linear and in the loglinear specifications.
> predlin <- predict(regr3.3)
> predloglin <- predict(regr3.2)
Then we have to consider the estimation of the augmented linear model by adding
the proper term, see Verbeek, and perform an ANOVA to compare the augmented
model with the initial one.
> linaugm <- update(regr3.3, . ~ . + I(log(predlin) - predloglin))
> anova(linaugm, regr3.3)
4 At Verbeek's pp. 67-68 the encompassing procedure is presented to compare two non-nested
linear models. This is implemented in the R function encomptest, available in the package lmtest.
See the help ?lmtest::encomptest for more information on this function.
However, the package lmtest provides the function petest, which performs the
two previous tests by adding the proper augmentation terms to the linear and loglinear
models and returns the parameter estimates and the t statistics of the augmentation
terms in the augmented models.
> library(lmtest)
> petest(regr3.3, regr3.2)
PE test
Model 1: price ~ lotsize + bedrooms + bathrms + airco + driveway +
recroom + fullbase + gashw + garagepl + prefarea + stories
Model 2: log(price) ~ log(lotsize) + bedrooms + bathrms + airco +
driveway + recroom + fullbase + gashw + garagepl + prefarea +
stories
                          Estimate Std. Error t value  Pr(>|t|)
M1 + log(fit(M1))-fit(M2)   -74774      12068 -6.1961 1.159e-09 ***
M2 + fit(M1)-exp(fit(M2))        0          0 -0.5688    0.5697
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Observe that the squares of the t values correspond to the F statistics obtained before.
3.2
In Section 3.2.2 Verbeek presents some criteria for performing regressor selection. In
Section 3.5 Verbeek compares the models corresponding to the max adjusted R2, the
stepwise, the min AIC and the min BIC criteria with a general unrestricted model
explaining the excess return on the S&P 500 index, EXRET, conditional on the full set
of regressors consisting of:
CS_1: credit spread (yield on Moody's Aaa minus BBa debt), lagged one month,
DY_1: dividend yield S&P 500 index, lagged one month (in % per month),
We next construct a contingency table of month by year. By looking at this table,
possible missing time records may be spotted:
> table(year, month)
month
year
01 02 03 04 05 06 07 08 09 10 11 12
1966 1 1 1 1 1 1 1 1 1 1 1 1
1967 1 1 1 1 1 1 1 1 1 1 1 1
1968 1 1 1 1 1 1 1 1 1 1 1 1
omitted
month
year
01 02 03 04 05 06 07 08 09 10 11 12
2003 1 1 1 1 1 1 1 1 1 1 1 1
2004 1 1 1 1 1 1 1 1 1 1 1 1
2005 1 1 1 1 1 1 1 1 1 1 1 1
To check whether the time series is complete (no missing records), one may check
whether any entry of the preceding table equals zero.
> prod(table(year, month))
[1] 1
In our case the time series is complete since the product of the entries in the table is
different from 0.
The regression analyses are proposed on the time window beginning at January 1966
and ending at December 1995, so it is possible to define a time series object and a
temporary variable to perform regressions.
> pred <- ts(data = pred, start = c(1966, 1), frequency = 12)
> predtmp <- window(pred, start = c(1966, 1), end = c(1995,
12))
To compare the goodness of the models, the out of sample forecasting performance
will also be evaluated on the time window starting at January 1996 and ending at
December 2005, see below Section 3.2.9.
> predctrl <- window(pred, start = c(1996, 1))
Pay attention to the definition of the variable EXRET, the excess return, which is
expressed as a percentage; in the following analyses it therefore has to be divided by 100.
We now present the methods to implement the four main procedures described by
Verbeek to perform a model selection.
3.2.1
The full model consists simply of the estimation of the general unrestricted linear
model.
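The estimation command is not legible in the source; given the Call reported by
mtable in Section 3.2.7, it is presumably:
> regr3.4f <- lm(EXRET/100 ~ PE_1 + DY_1 + INF_2 + IP_2 + I3_1 +
      I3_2 + I12_1 + I12_2 + MB_2 + CS_1 + WINTER, data = predtmp)
> summary(regr3.4f)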
Residuals:
  Median       3Q      Max
0.001566 0.026069 0.138232
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.020743   0.040228   0.516  0.60644
PE_1        -0.119712   0.129367  -0.925  0.35542
DY_1         0.126504   0.082880   1.526  0.12783
INF_2       -0.163318   0.076788  -2.127  0.03413 *
IP_2        -0.059783   0.061097  -0.978  0.32851
I3_1         0.268687   0.124505   2.158  0.03161 *
I3_2        -0.222916   0.121074  -1.841  0.06645 .
I12_1       -0.505236   0.123478  -4.092 5.33e-05 ***
I12_2        0.388662   0.127934   3.038  0.00256 **
MB_2        -0.043959   0.083836  -0.524  0.60037
CS_1         0.175387   0.109343   1.604  0.10962
WINTER       0.006249   0.004405   1.419  0.15693
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.04022 on 348 degrees of freedom
Multiple R-squared: 0.1698,    Adjusted R-squared: 0.1435
F-statistic: 6.47 on 11 and 348 DF, p-value: 8.4e-10
3.2.2
The max adjusted R2 criterion
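The command creating regr3.4mr is not shown at this point of the source; judging from
the Call line below, it is presumably a regsubsets call from the package leaps, along
the lines of the following sketch (nvmax = 12 is an assumption):
> library(leaps)
> regr3.4mr <- regsubsets(EXRET/100 ~ ., data = data.frame(predtmp),
      nvmax = 12, force.out = c(1, 12))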
> summary(regr3.4mr)
Subset selection object
Call: regsubsets.formula(EXRET/100 ...,
    force.out = c(1, 12))
13 Variables (and intercept)
       Forced in Forced out
CS_1       FALSE       TRUE
DY_1       FALSE      FALSE
I12_1      FALSE      FALSE
I12_2      FALSE      FALSE
I3_1       FALSE      FALSE
I3_2       FALSE      FALSE
INF_2      FALSE      FALSE
IP_2       FALSE      FALSE
MB_2       FALSE      FALSE
PE_1       FALSE      FALSE
WINTER     FALSE      FALSE
OBS        FALSE       TRUE
TS_1       FALSE      FALSE
1 subsets of each size up to 12
Selection Algorithm: exhaustive
   CS_1 DY_1 I12_1 I12_2 I3_1 I3_2 INF_2 IP_2 MB_2 PE_1 WINTER OBS TS_1
1   "*"  " "  " "   " "  " "  " "  " "   " "  " "  " "   " "   " " " "
2   " "  " "  "*"   "*"  " "  " "  " "   " "  " "  " "   " "   " " " "
3   "*"  " "  "*"   "*"  " "  " "  " "   " "  " "  " "   " "   " " " "
4   "*"  "*"  "*"   "*"  " "  " "  " "   " "  " "  " "   " "   " " " "
5   "*"  "*"  "*"   "*"  " "  " "  " "   " "  " "  " "   "*"   " " " "
6   "*"  "*"  "*"   "*"  "*"  "*"  " "   " "  " "  " "   " "   " " " "
7   "*"  "*"  "*"   "*"  "*"  "*"  "*"   " "  " "  " "   " "   " " " "
8   "*"  "*"  "*"   "*"  "*"  "*"  "*"   " "  "*"  " "   " "   " " " "
9   "*"  "*"  "*"   "*"  "*"  "*"  "*"   " "  "*"  " "   "*"   " " " "
10  "*"  "*"  "*"   "*"  "*"  "*"  "*"   "*"  " "  "*"   "*"   " " " "
11  "*"  "*"  "*"   "*"  "*"  "*"  "*"   "*"  "*"  "*"   "*"   " " " "
Figure 3.1  Models ordered according to the adjusted R-squared (values ranging from
0.019 to 0.15), with the candidate regressors and the intercept on the horizontal axis;
the plot is produced by the command below.
The adjusted R-squared values of the best subsets, for the last sizes, are:
 [8,]  8 0.14310437
 [9,]  9 0.14496314
[10,] 10 0.14532321
[11,] 11 0.14354387
> plot(regr3.4mr, scale = "adjr2")
We can extract the coefficients for the model with max adjusted R2:
> coef(regr3.4mr, 10)
 (Intercept)         CS_1         DY_1        I12_1        I12_2
 0.031193369  0.158311342  0.102823200 -0.503855027  0.397589862
        I3_1         I3_2        INF_2         IP_2         PE_1
 0.258969875 -0.223074248 -0.156767570 -0.068582462 -0.157694683
      WINTER
 0.006415849
To compare the parameter estimates of this model with those pertaining to the other
selection criteria, we have to obtain an lm object with the estimation results.
This may be done with the following procedure.
1. The names of the variables included as regressors in the selected model correspond
to the names of the coefficients we have just obtained, except the first element:
> anames <- names(coef(regr3.4mr, 10))[-1]
> anames
 [1] "CS_1"   "DY_1"   "I12_1"  "I12_2"  "I3_1"   "I3_2"
 [7] "INF_2"  "IP_2"   "PE_1"   "WINTER"
2. The function match returns a vector of the positions of (first) matches of its first
argument in its second.
We can use this function to get the column indices in the data.frame predtmp
(specified as second argument) that match the vector consisting of
the dependent variable name, EXRET, and the independent variable names, anames
(first argument).
> a <- match(c("EXRET", anames), colnames(predtmp))
> a
[1] 4 2 3 5 6 7 8 9 10 12 14
3. Finally, the data.frames needed to perform the linear regression and the
out-of-sample performance evaluation can be defined by selecting the involved
columns/variables in predtmp and predctrl, and regr3.4mr is re-estimated as an
lm object.
> predtmpmr <- data.frame(predtmp[, a])
> predctrlmr <- data.frame(predctrl[, a])
> regr3.4mr <- lm(EXRET/100 ~ ., data = predtmpmr)
> summary(regr3.4mr)
Call:
lm(formula = EXRET/100 ~ ., data = predtmpmr)
Residuals:
      Min        1Q    Median        3Q       Max
-0.195037 -0.023523  0.001951  0.026865  0.138763
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.031193   0.034907   0.894  0.37214
CS_1         0.158311   0.104272   1.518  0.12986
DY_1         0.102823   0.069422   1.481  0.13947
I12_1       -0.503855   0.123322  -4.086 5.46e-05 ***
I12_2        0.397590   0.126664   3.139  0.00184 **
I3_1         0.258970   0.122990   2.106  0.03595 *
I3_2        -0.223074   0.120947  -1.844  0.06597 .
INF_2       -0.156768   0.075686  -2.071  0.03907 *
IP_2        -0.068582   0.058686  -1.169  0.24335
PE_1        -0.157695   0.107072  -1.473  0.14171
WINTER       0.006416   0.004389   1.462  0.14470
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
3.2.3
Stepwise
The package MASS provides the function dropterm, which returns test
statistics for the exclusion of each term appearing in a linear regression model.
An initial model, which we assume to be a general unrestricted model, may then be
recursively improved, by alternating the functions dropterm and update, until no
regressor needs to be excluded. In the next section we report an algorithm to perform
this stepwise backward selection procedure in an automatic way.
The syntax of dropterm consists of two arguments: the first one is an lm object,
that is an object resulting from a linear model estimation; the second is the test to be
performed, in our case an ANOVA-type F test. We recall that the function
update has two main arguments: the first is the lm object to update; the second one
is the updating formula.
We first present the sequence of steps for the model selection in the current case
study. The variable with the lowest (non-significant) F statistic will be excluded at
each step.
> library(MASS)
> dropterm(regr3.4f, test = "F")
Single term deletions
Model:
EXRET/100 ~ PE_1 + DY_1 + INF_2 + IP_2 + I3_1 + I3_2 + I12_1 +
    I12_2 + MB_2 + CS_1 + WINTER
       Df Sum of Sq     RSS     AIC F Value    Pr(F)
<none>
PE_1    1
DY_1    1
INF_2   1
IP_2    1
I3_1    1
I3_2    1
I12_1   1
I12_2   1
MB_2    1
CS_1    1 0.0041614 0.56703 -2301.2  2.5729 0.109619
WINTER  1 0.0032546 0.56612 -2301.8  2.0122 0.156931
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
> step_1 <- update(regr3.4f, . ~ . - MB_2)
> dropterm(step_1, test = "F")
Single term deletions
Model:
EXRET/100 ~ PE_1 + DY_1 + INF_2 + IP_2 + I3_1 + I3_2 + I12_1 +
    I12_2 + CS_1 + WINTER
       Df Sum of Sq     RSS     AIC F Value     Pr(F)
<none>              0.56331 -2303.6
PE_1    1 0.0035011 0.56681 -2303.4  2.1691   0.14171
DY_1    1 0.0035409 0.56685 -2303.3  2.1938   0.13947
INF_2   1 0.0069248 0.57024 -2301.2  4.2903   0.03907 *
IP_2    1 0.0022044 0.56551 -2304.2  1.3657   0.24335
I3_1    1 0.0071561 0.57047 -2301.1  4.4336   0.03595 *
I3_2    1 0.0054907 0.56880 -2302.1  3.4018   0.06597 .
I12_1   1 0.0269434 0.59025 -2288.8 16.6928 5.456e-05 ***
I12_2   1 0.0159032 0.57921 -2295.6  9.8528   0.00184 **
CS_1    1 0.0037206 0.56703 -2303.2  2.3051   0.12986
WINTER  1 0.0034489 0.56676 -2303.4  2.1368   0.14470
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
> step_2 <- update(step_1, . ~ . - IP_2)
> dropterm(step_2, test = "F")
Single term deletions
Model:
EXRET/100 ~ PE_1 + DY_1 + INF_2 + I3_1 + I3_2 + I12_1 + I12_2 +
    CS_1 + WINTER
       Df Sum of Sq     RSS     AIC F Value     Pr(F)
<none>              0.56551 -2304.2
PE_1    1 0.0027553 0.56827 -2304.4  1.7053  0.192459
DY_1    1 0.0042573 0.56977 -2303.5  2.6349  0.105441
INF_2   1 0.0056055 0.57112 -2302.7  3.4693  0.063355 .
I3_1    1 0.0078989 0.57341 -2301.2  4.8886  0.027680 *
I3_2    1 0.0052475 0.57076 -2302.9  3.2477  0.072385 .
I12_1   1 0.0286570 0.59417 -2288.4 17.7360 3.231e-05 ***
I12_2   1 0.0153555 0.58087 -2296.6  9.5036  0.002213 **
CS_1    1 0.0117877 0.57730 -2298.8  7.2954  0.007249 **
WINTER  1 0.0029908 0.56851 -2304.3  1.8510  0.174540
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
(from a later step, only partially legible in the source:)
I3_2    1 0.0072565 0.58323
I12_1   1 0.0303013 0.60628
I12_2   1 0.0201211 0.59610
CS_1    1 0.0129920 0.58897
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The final model can be estimated by applying the function lm to the formula defined
at step 5.
> regr3.4sw <- lm(step_5)
> summary(regr3.4sw)
Call:
lm(formula = step_5)
Residuals:
      Min        1Q    Median        3Q       Max
-0.208944 -0.025168  0.000683  0.027499  0.128369
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.013180   0.009379  -1.405 0.160847
DY_1         0.130442   0.048132   2.710 0.007054 **
I3_1         0.273531   0.121158   2.258 0.024578 *
I3_2        -0.252307   0.119641  -2.109 0.035660 *
I12_1       -0.528130   0.122553  -4.309 2.13e-05 ***
I12_2        0.435283   0.123954   3.512 0.000503 ***
CS_1         0.239039   0.084712   2.822 0.005046 **
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.04039 on 353 degrees of freedom
Multiple R-squared: 0.1505,
Adjusted R-squared: 0.136
F-statistic: 10.42 on 6 and 353 DF, p-value: 1.213e-10
3.2.4
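The code of this subsection is not legible in the source; Section 3.2.3 announces an
algorithm performing the backward selection automatically, so a minimal sketch
consistent with the output below (the loop logic and the 5% threshold are assumptions)
is:
> library(MASS)
> step_model <- regr3.4f
> repeat {
      dt <- dropterm(step_model, test = "F")
      pvals <- dt[["Pr(F)"]][-1]   # skip the <none> row
      if (all(pvals < 0.05, na.rm = TRUE)) break
      worst <- rownames(dt)[-1][which.max(pvals)]
      step_model <- update(step_model, as.formula(paste(". ~ . -", worst)))
  }
> summary(step_model)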
Residuals:
   Median        3Q       Max
 0.000683  0.027499  0.128369
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.013180   0.009379  -1.405 0.160847
DY_1         0.130442   0.048132   2.710 0.007054 **
I3_1         0.273531   0.121158   2.258 0.024578 *
I3_2        -0.252307   0.119641  -2.109 0.035660 *
I12_1       -0.528130   0.122553  -4.309 2.13e-05 ***
I12_2        0.435283   0.123954   3.512 0.000503 ***
CS_1         0.239039   0.084712   2.822 0.005046 **
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.04039 on 353 degrees of freedom
Multiple R-squared: 0.1505,
Adjusted R-squared: 0.136
F-statistic: 10.42 on 6 and 353 DF, p-value: 1.213e-10
3.2.5
AIC
The command stepAIC, available in the package MASS, performs a stepwise model
selection by the Akaike Information Criterion. The first argument is the general
unrestricted linear model the procedure will be applied to; the option trace=0
suppresses the output of the procedure. See the help ?stepAIC for more information
on this function.
> library(MASS)
> regr3.4aic <- stepAIC(regr3.4f, trace = 0)
> regr3.4aic$anova
Stepwise Model Path
Analysis of Deviance Table
Initial Model:
EXRET/100 ~ PE_1 + DY_1 + INF_2 + IP_2 + I3_1 + I3_2 + I12_1 +
I12_2 + MB_2 + CS_1 + WINTER
Final Model:
EXRET/100 ~ DY_1 + INF_2 + I3_1 + I3_2 + I12_1 + I12_2 + CS_1 +
WINTER
    Step Df     Deviance Resid. Df Resid. Dev       AIC
1                              348  0.5628657 -2301.895
2 - MB_2  1 0.0004446865       349  0.5633103 -2303.610
3 - IP_2  1 0.0022043540       350  0.5655147 -2304.204
4 - PE_1  1 0.0027552908       351  0.5682700 -2304.455
> summary(regr3.4aic)
Call:
lm(formula = EXRET/100 ~ DY_1 + INF_2 + I3_1 + I3_2 + I12_1 +
I12_2 + CS_1 + WINTER, data = predtmp)
Residuals:
      Min        1Q    Median        3Q       Max
-0.202784 -0.023496  0.002058  0.026805  0.136163
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.021813   0.010225  -2.133  0.03360 *
DY_1         0.166210   0.055195   3.011  0.00279 **
INF_2       -0.106879   0.070144  -1.524  0.12848
I3_1         0.283111   0.122393   2.313  0.02129 *
I3_2        -0.232290   0.120551  -1.927  0.05480 .
I12_1       -0.524830   0.122834  -4.273 2.49e-05 ***
I12_2        0.406251   0.126100   3.222  0.00139 **
CS_1         0.222507   0.084787   2.624  0.00906 **
WINTER       0.006159   0.004375   1.408  0.16005
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.04024 on 351 degrees of freedom
Multiple R-squared: 0.1618,
Adjusted R-squared: 0.1427
F-statistic: 8.47 on 8 and 351 DF, p-value: 1.518e-10
3.2.6
BIC
By specifying in the command stepAIC, available in the package MASS, the argument
k = log(n), where n is the length of the data set (this sets the penalty parameter),
it is possible to perform a stepwise model selection by the Bayesian Information
Criterion (BIC), or Schwarz Bayesian Criterion (SBC). The other argument in stepAIC,
we recall, is the linear model the procedure applies to (the option trace=0 suppresses
the output of the procedure).
See the help ?stepAIC for more information on this function.
> regr3.4bic <- stepAIC(regr3.4f, k = log(length(regr3.4f$res)),
trace = 0)
> regr3.4bic$anova
Stepwise Model Path
Analysis of Deviance Table
Initial Model:
EXRET/100 ~ PE_1 + DY_1 + INF_2 + IP_2 + I3_1 + I3_2 + I12_1 +
I12_2 + MB_2 + CS_1 + WINTER
Final Model:
EXRET/100 ~ DY_1 + I12_1 + I12_2 + CS_1
      Step Df     Deviance Resid. Df Resid. Dev       AIC
1                                348  0.5628657 -2255.261
2   - MB_2  1 0.0004446865       349  0.5633103 -2260.863
3   - IP_2  1 0.0022043540       350  0.5655147 -2265.343
4   - PE_1  1 0.0027552908       351  0.5682700 -2269.480
5 - WINTER  1 0.0032091205       352  0.5714791 -2273.338
6  - INF_2  1 0.0044952452       353  0.5759744 -2276.404
7   - I3_2  1 0.0072564641       354  0.5832308 -2277.783
8   - I3_1  1 0.0013361484       355  0.5845670 -2282.845
> summary(regr3.4bic)
Call:
lm(formula = EXRET/100 ~ DY_1 + I12_1 + I12_2 + CS_1, data = predtmp)
Residuals:
      Min        1Q    Median        3Q       Max
-0.214771 -0.025708  0.001165  0.027578  0.132124
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.01230    0.00941  -1.308  0.19187
DY_1         0.12378    0.04698   2.635  0.00879 **
I12_1       -0.27563    0.05083
I12_2        0.20603    0.05298
CS_1         0.22539    0.08369
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
3.2.7
To compare the parameter estimates in the previous models, we can make use of the
function mtable, available in the package memisc.7
> library(memisc)
> mtable3.4 <- mtable(full = regr3.4f, "max adj R2" = regr3.4mr,
stepwise = regr3.4sw, "min AIC" = regr3.4aic,
"min BIC" = regr3.4bic)
> mtable3.4 <- relabel(mtable3.4, "(Intercept)" = "constant",
PE_1 = "pe_{t-1}", DY_1 = "dy_{t-1}", INF_2 = "infl_{t-1}",
IP_2 = "ip_{t-2}", I3_1 = "i3_{t-1}", I3_2 = "i3_{t-2}",
I12_1 = "i12_{t-1}", I12_2 = "i12_{t-2}", MB_2 = "mb_{t-2}",
CS_1 = "cs_{t-1}", WINTER = "winter_t")
> mtable3.4
Calls:
full: lm(formula = EXRET/100 ~ PE_1 + DY_1 + INF_2 + IP_2 + I3_1 +
I3_2 + I12_1 + I12_2 + MB_2 + CS_1 + WINTER, data = predtmp)
max adj R2: lm(formula = EXRET/100 ~ ., data = predtmpmr)
stepwise: lm(formula = EXRET/100 ~ DY_1 + I3_1 + I3_2 + I12_1 + I12_2+
CS_1, data = predtmp)
min AIC: lm(formula = EXRET/100 ~ DY_1 + INF_2 + I3_1 + I3_2 + I12_1 +
I12_2 + CS_1 + WINTER, data = predtmp)
min BIC: lm(formula = EXRET/100 ~ DY_1+I12_1+I12_2+CS_1,data=predtmp)
======================================================================
                   full    max adj R2  stepwise   min AIC    min BIC
----------------------------------------------------------------------
constant          0.021      0.031     -0.013     -0.022*    -0.012
                 (0.040)    (0.035)    (0.009)    (0.010)    (0.009)
pe_{t-1}         -0.120     -0.158
                 (0.129)    (0.107)
dy_{t-1}          0.127      0.103      0.130**    0.166**    0.124**
                 (0.083)    (0.069)    (0.048)    (0.055)    (0.047)
infl_{t-1}       -0.163*    -0.157*               -0.107
                 (0.077)    (0.076)               (0.070)
ip_{t-2}         -0.060     -0.069
                 (0.061)    (0.059)
i3_{t-1}          0.269*     0.259*     0.274*     0.283*
                 (0.125)    (0.123)    (0.121)    (0.122)
i3_{t-2}         -0.223     -0.223     -0.252*    -0.232
                 (0.121)    (0.121)    (0.120)    (0.121)
i12_{t-1}        -0.505***  -0.504***  -0.528***  -0.525***  -0.276***
                 (0.123)    (0.123)    (0.123)    (0.123)    (0.051)
i12_{t-2}         0.389**    0.398**    0.435***   0.406**    0.206***
                 (0.128)    (0.127)    (0.124)    (0.126)    (0.053)
mb_{t-2}         -0.044
                 (0.084)
cs_{t-1}          0.175      0.158      0.239**    0.223**    0.225**
                 (0.109)    (0.104)    (0.085)    (0.085)    (0.084)
winter_t          0.006      0.006                 0.006
                 (0.004)    (0.004)               (0.004)
----------------------------------------------------------------------
R-squared         0.170      0.169      0.150      0.162      0.138
adj. R-squared    0.144      0.145      0.136      0.143      0.128
sigma             0.040      0.040      0.040      0.040      0.041
F                 6.470      7.104     10.419      8.470     14.182
p                 0.000      0.000      0.000      0.000      0.000
Log-likelihood  652.129    651.987    647.985    650.409    645.320
Deviance          0.563      0.563      0.576      0.568      0.585
AIC           -1278.259  -1279.975  -1279.971  -1280.819  -1278.640
BIC           -1227.740  -1233.341  -1248.882  -1241.958  -1255.323
N               360        360        360        360        360
======================================================================
7 Observe the use of the double quotes in the mtable call to specify the names of the lm objects
in the output: they are needed only when spaces are present in the name assigned to the lm object.
3.2.8
The values of the AIC statistics given by R for the estimated models in Section 3.2.5
differ from those reported in the mtable output and also from the ones in Verbeek,
who considers

    AIC = log( (1/N) Sum_{i=1..N} e_i² ) + 2k/N

where N is the dimension of the data set and k is the number of the unknown
parameters in the model.
The function AIC computes

    AIC = -2 log(L) + 2k                                  (3.1)

where L is the likelihood; but in the case of a linear model with unknown scale
parameter, if RSS denotes the residual sum of squares, then extractAIC uses
N log(RSS/N) for -2 log(L), so we have:

    AIC = N log( (1/N) Sum_{i=1..N} e_i² ) + 2k           (3.2)
In the report obtained with the function mtable the AIC and BIC are computed
by using the function AIC, according to relationship (3.1) by considering the
estimate of the logLikelihood given by the function logLik, (k is multiplied by
2 in the case of AIC and by log(N ) in the case of BIC).
Observe that if k is the number of parameters in a linear regression model, under
maximum likelihood estimation k is replaced by k + 1, since the maximum likelihood
method also treats the variance of the error as a parameter to estimate.
So with regard to the computation of AIC, e.g., for the full model we have that the
value given by mtable may be obtained as:
> -2 * logLik(regr3.4f) + 2 * (12 + 1)
'log Lik.' -1278.259 (df=13)
or, simply, as:
> AIC(regr3.4f)
[1] -1278.259
which divided by N = 360 is quite close to the value reported in Verbeek's Table 3.4
at page 73 (the exact correspondence is obtained by substituting 2 * 12 for
2 * (12 + 1)):
> AIC(regr3.4f)/360
[1] -3.550719
The value given by extractAIC, computed according to (3.2), is
> extractAIC(regr3.4f)
[1]
12.000 -2301.895
and is equivalent to
> 360 * log(sum(residuals(regr3.4f)^2)/360) + 2 * 12
[1] -2301.895
and dividing by 360 we obtain the AIC value according to Verbeek's formula (3.17):
> log(sum(residuals(regr3.4f)^2)/360) + 2 * 12/360
[1] -6.394152
3.2.9
Verbeek presents in Table 3.5 the results of the out-of-sample forecasting performance,
which may be obtained by applying proper measures, coded here in the following
function outsamplefit, to the actual excess returns and the predicted ones.8
> actual <- predctrl[, "EXRET"]/100
> outsamplefit <- function(actual, predict, name = "") {
mad <- sum(abs(predict - actual))/length(predict)
mape <- sum(abs(predict - actual)/actual)/length(predict)
rmse <- (sum((predict - actual)^2)/length(predict))^0.5
r2os1 <- 1 - sum((predict - actual)^2)/sum((mean(predtmp[,
"EXRET"]/100) - actual)^2)
r2os2 <- (cor(predict, actual))^2
hit <- sum(sign(predict) == sign(actual))/length(predict)
output <- rbind(RMSE = rmse, MAD = mad, MAPE = mape,
r2os1 = r2os1, r2os2 = r2os2, hit = hit)
colnames(output) <- name
output
}
> pr_f <- predict(regr3.4f, predctrl)
> pr_mr <- predict(regr3.4mr, predctrlmr)
> pr_sw <- predict(regr3.4sw, predctrl)
> pr_aic <- predict(regr3.4aic, predctrl)
> pr_bic <- predict(regr3.4bic, predctrl)
> outofsamplefit <- cbind(full = outsamplefit(actual,
predict=pr_f,name="full"),"max adj R2"=outsamplefit(actual,
predict=pr_mr,name="max adj R2"),stepwise=outsamplefit(actual,
predict=pr_sw,name="stepwise"),"min AIC"=outsamplefit(actual,
predict=pr_aic,name="min AIC"),"min BIC"=outsamplefit(actual,
predict=pr_bic,name="min BIC"))
> outofsamplefit[c(1:2, 6), ] <- 100 * outofsamplefit[c(1:2,
6), ]
> round(outofsamplefit, 4)
          full max adj R2 stepwise min AIC min BIC
RMSE    4.8332     4.9362   4.8421  4.8843  4.7903
MAD     3.7913     3.8994   3.8040  3.8519  3.7480
MAPE    0.6998     0.6634   0.7736  0.9517  0.4830
r2os1  -0.1583    -0.2082  -0.1626 -0.1830 -0.1379
r2os2   0.0094     0.0105   0.0003  0.0003  0.0000
hit    50.0000    49.1667  48.3333 46.6667 47.5000
Pay attention to the different meaning of the values in the last output: results in the
1st, 2nd and 6th rows are expressed as percentages.
8 Observe that the predicted values for the max adjusted R2 model are based on the data.frame
predctrlmr, defined at step 3 of the procedure described in Section 3.2.2.
3.3
Data may be read by means of the function read.table, having extracted the file
bwages.dat from the compressed file ch03.zip.
The summary statistics mean and standard deviation can be obtained for a single
variable, say WAGE, EDUC or EXPER, conditional on the levels of the variable MALE by
using the function tapply. The first argument is the variable we want to study (a
column of a data.frame); the second argument is a conditioning variable; the third
argument is the function used to study the variable in the first argument.
> indwages <- read.table(unzip("ch03.zip", "Chapter 3/bwages.dat"),
header = T)
> tapply(indwages$WAGE, indwages$MALE, mean)
       0        1
10.26154 11.56223
We can combine this with sapply to choose the variables for which the means and
standard deviations are computed:
> means <- sapply(c(1, 3, 4), function(i) tapply(indwages[,
i], indwages$MALE, mean))
> stdevs <- sapply(c(1, 3, 4), function(i) tapply(indwages[,
i], indwages$MALE, sd))
> meanandstd <- array(c(means = means, stdevs = stdevs),
c(2, 3, 2))
> dimnames(meanandstd) <- list(c("females", "males"),
names(indwages)[c(1, 3, 4)], c("means", "stdevs"))
> meanandstd
, , means
            WAGE     EDUC    EXPER
females 10.26154 3.587219 15.20380
males   11.56223 3.243001 18.52296
, , stdevs
            WAGE     EDUC     EXPER
females 3.808585 1.086521  9.704987
males   4.753789 1.257386 10.251041
We have omitted from the summary analysis the variables MALE, LNWAGE, LNEXPER
and LNEDUC.
3.3.1
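The estimation command is not legible in the source; given the coefficients below and
Verbeek's Table 3.7, it is presumably (the object name indwages3.7 is hypothetical):
> indwages3.7 <- lm(WAGE ~ MALE + EDUC + EXPER, data = indwages)
> summary(indwages3.7)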
Residuals:
 Median      3Q     Max
-0.3124  1.5679 30.7015
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.213692   0.386895   0.552    0.581
MALE        1.346144   0.192736   6.984 4.32e-12 ***
EDUC        1.986090   0.080640  24.629  < 2e-16 ***
EXPER       0.192275   0.009583  20.064  < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 3.548 on 1468 degrees of freedom
Multiple R-squared: 0.3656,    Adjusted R-squared: 0.3643
F-statistic: 282 on 3 and 1468 DF, p-value: < 2.2e-16
To include the effect of the squared number of years of experience, see Table 3.8:

    WAGE = β1 + β2 MALE + β3 EDUC + β4 EXPER + β5 EXPER² + ERROR

use
> indwages3.8 <- lm(WAGE ~ MALE + EDUC + EXPER + I(EXPER^2),
data = indwages)
> summary(indwages3.8)
Call:
lm(formula = WAGE ~ MALE + EDUC + EXPER + I(EXPER^2), data = indwages)
Residuals:
     Min       1Q   Median       3Q      Max
-12.7246  -1.9519  -0.3107   1.5117  30.5951
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.8924849  0.4329127  -2.062   0.0394 *
MALE         1.3336935  0.1908668   6.988 4.23e-12 ***
EDUC         1.9881267  0.0798526  24.897  < 2e-16 ***
EXPER        0.3579993  0.0316566  11.309  < 2e-16 ***
I(EXPER^2)  -0.0043692  0.0007962  -5.487 4.80e-08 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
3.3.2
86
Figure 3.2  Graphs that can be obtained directly from the lm object indwages3.8 related to
the linear model: Residuals vs Fitted, Normal Q-Q, Scale-Location and Cook's distance
plots (observations 264, 1165, 1404 and 1446 are flagged)
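Neither the plot command producing Figure 3.2 nor the estimation of the loglinear
model summarized next is legible in the source; presumably they resemble the
following sketch (the par() setting and the selection which = 1:4 are assumptions):
> par(mfrow = c(2, 2))
> plot(indwages3.8, which = 1:4)    # the four diagnostic plots of Figure 3.2
> indwages3.9 <- lm(LNWAGE ~ MALE + LNEDUC + LNEXPER + I(LNEXPER^2),
      data = indwages)
> summary(indwages3.9)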
Residuals:
     Min       1Q   Median       3Q      Max
-1.75085 -0.15921  0.00618  0.17145  1.10533
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.26271    0.06634  19.033  < 2e-16 ***
MALE          0.11794    0.01557   7.574 6.35e-14 ***
LNEDUC        0.44218    0.01819  24.306  < 2e-16 ***
LNEXPER       0.10982    0.05438   2.019   0.0436 *
I(LNEXPER^2)  0.02601    0.01148   2.266   0.0236 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.2862 on 1467 degrees of freedom
Multiple R-squared: 0.3783,    Adjusted R-squared: 0.3766
Figure 3.3  Residuals vs Fitted plot for lm(LNWAGE ~ MALE + LNEDUC + LNEXPER +
I(LNEXPER^2)); observations 312, 462 and 677 are flagged
Figure 3.3 shows that heteroscedasticity is much less pronounced than for the additive
model. It may be obtained by means of the following instruction.
> plot(indwages3.9, which = 1)
Model without the effect of LNEXPER and of LNEXPER2
To check the joint effect of log(EXPER) and (log(EXPER))², we consider the parameter
estimation of the model:

    log(WAGE) = β1 + β2 MALE + β3 log(EDUC) + ERROR
Model 1: LNWAGE ~ MALE + LNEDUC
Model 2: LNWAGE ~ MALE + LNEDUC + LNEXPER + I(LNEXPER^2)
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1   1469 158.57
2   1467 120.20  2     38.37
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residuals:
  Median       3Q      Max
 0.00485  0.17366  1.11815
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.14473    0.04118  27.798  < 2e-16 ***
MALE         0.12008    0.01556   7.715 2.22e-14 ***
LNEDUC       0.43662    0.01805  24.188  < 2e-16 ***
LNEXPER      0.23065    0.01073  21.488  < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.2867 on 1468 degrees of freedom
Multiple R-squared: 0.3761,
Adjusted R-squared: 0.3748
F-statistic:
295 on 3 and 1468 DF, p-value: < 2.2e-16
Model with education considered as a factor
We can now consider the education as a factor to study the effects of the different
education levels against the first level of education.
The model matrix (hereafter p1), that is the matrix X containing the values of the
regressors, some of which are dummy variables recoding the factor education, can
be obtained by using the function model.matrix: its first argument is the
right-hand side of the formula defining the linear model, while its second argument is the
data.frame containing the variables involved in the linear model (some rows of p1
are reported below).
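The commands recoding education and building the model matrix p1 are not legible in
the source; a sketch consistent with the surrounding text (the use of factor() and
the choice of rows to print are assumptions):
> indwages$EDUC <- factor(indwages$EDUC)
> p1 <- model.matrix(~ MALE + EDUC + LNEXPER, data = indwages)
> head(p1)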
In defining the formula for the linear model we can express log(indwages$WAGE) as a
function of the model matrix p1 we have just obtained; we have to remember to drop
the constant since it already appears in the model matrix p1.
> indwages3.11 <- lm(log(indwages$WAGE) ~ -1 + p1)
> summary(indwages3.11)
Call:
lm(formula = log(indwages$WAGE) ~ -1 + p1)
Residuals:
     Min       1Q   Median       3Q      Max
-1.65548 -0.15138  0.01324  0.16998  1.11684
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
p1(Intercept)  1.27189    0.04483  28.369  < 2e-16 ***
p1MALE         0.11762    0.01546   7.610 4.88e-14 ***
p1EDUC2        0.14364    0.03336   4.306 1.77e-05 ***
p1EDUC3        0.30487    0.03202   9.521  < 2e-16 ***
p1EDUC4        0.47428    0.03301  14.366  < 2e-16 ***
p1EDUC5        0.63910    0.03322  19.237  < 2e-16 ***
p1LNEXPER      0.23022    0.01056  21.804  < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.282 on 1465 degrees of freedom
Multiple R-squared: 0.9858,    Adjusted R-squared: 0.9858
F-statistic: 1.455e+04 on 7 and 1465 DF, p-value: < 2.2e-16
Note that since EDUC was recoded as a factor the above result may be also obtained
with
> indwages3.11 <- lm(log(indwages$WAGE) ~ MALE + EDUC +
LNEXPER, data = indwages)
> summary(indwages3.11)
Call:
lm(formula = log(indwages$WAGE) ~ MALE + EDUC + LNEXPER, data = indwages)
Residuals:
     Min       1Q   Median       3Q      Max
-1.65548 -0.15138  0.01324  0.16998  1.11684
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.27189    0.04483  28.369  < 2e-16 ***
MALE         0.11762    0.01546   7.610 4.88e-14 ***
EDUC2        0.14364    0.03336   4.306 1.77e-05 ***
EDUC3        0.30487    0.03202   9.521  < 2e-16 ***
EDUC4        0.47428    0.03301  14.366  < 2e-16 ***
EDUC5        0.63910    0.03322  19.237  < 2e-16 ***
LNEXPER      0.23022    0.01056  21.804  < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
3.3.3
To study the effects of gender we have to include the interactions MALE:EDUC and
MALE:LNEXPER. Remember that with the * operator both the direct effects of its
arguments and their interaction (which is usually defined by the : operator) are
included. So we have two ways of defining the model matrix:
model.matrix(~MALE+EDUC+LNEXPER+MALE:EDUC+MALE:LNEXPER,indwages)
model.matrix(~MALE*EDUC+MALE*LNEXPER,indwages)
We will use the second method, which is more compact, directly within the lm formula.
> indwages3.12 <- lm(log(indwages$WAGE) ~ MALE * EDUC +
MALE * LNEXPER, data = indwages)
> summary(indwages3.12)
Call:
lm(formula = log(indwages$WAGE) ~ MALE * EDUC + MALE * LNEXPER,
data = indwages)
Residuals:
     Min       1Q   Median       3Q      Max
-1.63955 -0.15328  0.01225  0.16647  1.11698
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.21584    0.07768  15.652  < 2e-16 ***
MALE          0.15375    0.09522   1.615 0.106595
EDUC2         0.22411    0.06758   3.316 0.000935 ***
EDUC3         0.43319    0.06323   6.851 1.08e-11 ***
EDUC4         0.60191    0.06280   9.585  < 2e-16 ***
EDUC5         0.75491    0.06467  11.673  < 2e-16 ***
LNEXPER       0.20744    0.01655  12.535  < 2e-16 ***
MALE:EDUC2   -0.09651    0.07770  -1.242 0.214381
MALE:EDUC3   -0.16677    0.07340  -2.272 0.023215 *
MALE:EDUC4   -0.17236    0.07440  -2.317 0.020663 *
MALE:EDUC5   -0.14616    0.07551  -1.935 0.053123 .
MALE:LNEXPER  0.04063    0.02149   1.891 0.058875 .
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.2811 on 1460 degrees of freedom
Multiple R-squared: 0.4032,
Adjusted R-squared: 0.3988
F-statistic: 89.69 on 11 and 1460 DF, p-value: < 2.2e-16
We can perform an ANOVA analysis to evaluate if the joint effect of the interactions
MALE:EDUC and MALE:LNEXPER is significant.
> anova(indwages3.11, indwages3.12)
Analysis of Variance Table
Model 1: log(indwages$WAGE) ~ MALE + EDUC + LNEXPER
Model 2: log(indwages$WAGE) ~ MALE * EDUC + MALE * LNEXPER
  Res.Df    RSS Df Sum of Sq      F  Pr(>F)
1   1465 116.47
2   1460 115.37  5    1.0957 2.7732 0.01683 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
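The estimation command of the next model, which interacts education with log
experience, is not legible in the source; given the coefficient names below, it is
presumably something like (the object name is hypothetical):
> indwages3.13 <- lm(log(WAGE) ~ MALE + EDUC * LNEXPER, data = indwages)
> summary(indwages3.13)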
Residuals:
     Min       1Q   Median       3Q      Max
-1.63623 -0.15046  0.00831  0.16713  1.12415
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    1.48891    0.21203   7.022 3.34e-12 ***
MALE           0.11597    0.01548   7.493 1.16e-13 ***
EDUC2          0.06727    0.22628   0.297   0.7663
EDUC3          0.13525    0.21889   0.618   0.5367
EDUC4          0.20495    0.21946   0.934   0.3505
EDUC5          0.34130    0.21808   1.565   0.1178
LNEXPER        0.16312    0.06539   2.494   0.0127 *
EDUC2:LNEXPER  0.01933    0.07049   0.274   0.7839
EDUC3:LNEXPER  0.04988    0.06821   0.731   0.4647
EDUC4:LNEXPER  0.08784    0.06877   1.277   0.2017
EDUC5:LNEXPER  0.09996    0.06822   1.465   0.1430
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.2815 on 1461 degrees of freedom
Multiple R-squared: 0.4012,    Adjusted R-squared: 0.3971
F-statistic: 97.9 on 10 and 1461 DF, p-value: < 2.2e-16
4
Heteroscedasticity and
Autocorrelation
4.1
We import data from the file labour2.wf1, which is a work file of EViews.
We have first to invoke the package hexView and next the command readEViews.
The function unzip extracts a file from a compressed archive.
> library(hexView)
> labour <- readEViews(unzip("ch04.zip", "Chapter 4/labour2.wf1"))
Skipping boilerplate variable
Skipping boilerplate variable
Recall that by using the functions head(labour), tail(labour) and summary(labour)
it is possible to explore the beginning and the end of the data set and to obtain
summary statistics for all the variables included in the data.frame.
4.1.1
Linear Model
In Verbeek, see Table 4.1, the estimation of the following linear model is first
proposed:

    LABOR = β1 + β2 WAGE + β3 OUTPUT + β4 CAPITAL + ERROR

This can be done by using the function lm, see Appendix A.5.
> labour4.1 <- lm(LABOR ~ WAGE + OUTPUT + CAPITAL,
data = labour)
> summary(labour4.1)
Call:
lm(formula = LABOR ~ WAGE + OUTPUT + CAPITAL, data = labour)
Residuals:
     Min       1Q   Median       3Q      Max
-1267.04   -54.11   -14.02    37.20  1560.48
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 287.7186    19.6418   14.65   <2e-16 ***
WAGE         -6.7419     0.5014  -13.45   <2e-16 ***
OUTPUT       15.4005     0.3556   43.30   <2e-16 ***
CAPITAL      -4.5905     0.2690  -17.07   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 156.3 on 565 degrees of freedom
Multiple R-squared: 0.9352,
Adjusted R-squared: 0.9348
F-statistic: 2716 on 3 and 565 DF, p-value: < 2.2e-16
4.1.2
The Breusch-Pagan test (Table 4.2) can be used to check for the presence of
heteroscedasticity. To perform the test we first regress the squared residuals
of the preceding regression on the predictors present in the model:

    RES² = α1 + α2 WAGE + α3 OUTPUT + α4 CAPITAL + ERROR

The residuals can be extracted from the object labour4.1 by means of the instruction
labour4.1$res. Observe that to define the squared residuals on the left-hand side of
the model formula we do not need to resort to the as-is function I(), which
instead must necessarily be invoked when a squared variable is included as a regressor,
see Appendix A.4.
> labour4.2 <- lm(labour4.1$res^2 ~ WAGE + OUTPUT +
CAPITAL, data = labour)
> summary(labour4.2)
Call:
lm(formula = labour4.1$res^2 ~ WAGE + OUTPUT + CAPITAL, data = labour)
Residuals:
    Min      1Q  Median      3Q     Max
-500023  -12448    2722   13354 1193685
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -22719.5    11838.9  -1.919   0.0555 .
WAGE           228.9      302.2    0.757   0.4492
OUTPUT        5362.2      214.4   25.016   <2e-16 ***
CAPITAL      -3543.5      162.1  -21.858   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
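The Breusch-Pagan statistic reported in the next subsection equals N times the
R-squared of this auxiliary regression (assuming bptest computes its default,
studentized version); a sketch of the check:
> nrow(labour) * summary(labour4.2)$r.squared   # should match BP below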
4.1.3
We can also obtain the Breusch-Pagan result, without performing the preceding
regression, by directly invoking the function bptest available in the package lmtest.
> library(lmtest)
> bptest(labour4.1)
studentized Breusch-Pagan test
data: labour4.1
BP = 331.0653, df = 3, p-value < 2.2e-16
4.1.4
Loglinear model
Verbeeks Table 4.3 reports the OLS estimation results for the loglinear model:
log(LABOR) = β1 + β2 log(WAGE) + β3 log(OUTPUT) + β4 log(CAPITAL) + ERROR  (4.1)
which may be obtained as:
> labour4.3 <- lm(log(LABOR) ~ log(WAGE) + log(OUTPUT) +
log(CAPITAL), data = labour)
> summary(labour4.3)
Call:
lm(formula = log(LABOR) ~ log(WAGE) + log(OUTPUT) + log(CAPITAL),
data = labour)
Residuals:
    Min      1Q  Median      3Q     Max
-3.9744 -0.1641  0.1079  0.2595  1.9466
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   6.177290   0.246211  25.089   <2e-16 ***
log(WAGE)    -0.927764   0.071405 -12.993   <2e-16 ***
log(OUTPUT)   0.990047   0.026410  37.487   <2e-16 ***
log(CAPITAL) -0.003697   0.018770  -0.197    0.844
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.4653 on 565 degrees of freedom
Multiple R-squared: 0.843,
Adjusted R-squared: 0.8421
F-statistic: 1011 on 3 and 565 DF, p-value: < 2.2e-16
The corresponding Breusch-Pagan statistic is:
> bptest(labour4.3)
studentized Breusch-Pagan test
data: labour4.3
BP = 7.7269, df = 3, p-value = 0.05201
4.1.5
To perform the White Heteroscedasticity test and obtain the results in Verbeeks
Table 4.4 we have to consider the estimation of the following linear model:
RES² = β1 + β2 log(WAGE) + β3 log(OUTPUT) + β4 log(CAPITAL) +
     + β22 log²(WAGE) + β33 log²(OUTPUT) + β44 log²(CAPITAL) +
     + β23 log(WAGE) log(OUTPUT) + β24 log(WAGE) log(CAPITAL) +
     + β34 log(OUTPUT) log(CAPITAL) + ERROR
(4.2)
To write this formula, first check which variables result as regressors by
applying the following regression statement,1 see Appendix A.4 for the instruction lm.
1 The function coeftest in the package lmtest performs the t tests on the estimated coefficients.
In the present situation it is used to have a look at which variables are present in the model.
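The statement itself is not legible in the source; presumably something along the
lines of:
> coeftest(lm(labour4.3$res^2 ~ (log(WAGE) + log(OUTPUT) +
      log(CAPITAL))^2, data = labour))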
We observe that the interaction terms of (4.2) appear in this model specification,
but the squared predictors are not included; so we have to adjust the regression
formula in the following way:
> labour4.4 <- lm(labour4.3$res^2 ~ (log(WAGE) + log(OUTPUT) +
log(CAPITAL))^2 + I(log(WAGE)^2) + I(log(OUTPUT)^2) +
I(log(CAPITAL)^2), data = labour)
> summary(labour4.4)
Call:
lm(formula = labour4.3$res^2~(log(WAGE)+log(OUTPUT)+log(CAPITAL))^2+
I(log(WAGE)^2)+I(log(OUTPUT)^2)+I(log(CAPITAL)^2),data=labour)
Residuals:
    Min      1Q  Median      3Q     Max
-2.2664 -0.1650 -0.0724  0.0212 15.2247
Coefficients: (the numerical estimates are not legible in the source)
(Intercept), log(WAGE), log(OUTPUT), log(CAPITAL), I(log(WAGE)^2),
I(log(OUTPUT)^2), I(log(CAPITAL)^2), log(WAGE):log(OUTPUT),
log(WAGE):log(CAPITAL), log(OUTPUT):log(CAPITAL)
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
4.1.6
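The output opening this subsection is not legible in the source; it is presumably a
coeftest call using a White covariance matrix estimate, e.g.:
> library(sandwich)
> coeftest(labour4.3, vcov = vcovHC(labour4.3, type = "HC0"))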
The standard error of the regression (residual standard error), R² and adjusted R²
assume the same values as in the preceding regression, see Section 4.1.5. To obtain
the value of the F test adjusted for the presence of heteroscedasticity it is
possible to use the function waldtest, available in the package lmtest, by specifying
as first argument the lm object labour4.3 resulting from the preceding call to the
linear model, and as second argument the object containing the estimate of the
White covariance matrix.
> waldtest(labour4.3, vcov = vcovHC(labour4.3, type = "HC1"))
Wald test
Model 1: log(LABOR) ~ log(WAGE) + log(OUTPUT) + log(CAPITAL)
Model 2: log(LABOR) ~ 1
  Res.Df Df      F    Pr(>F)
1    565
2    568 -3 544.73 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
4.1.7
The auxiliary regression for multiplicative heteroscedasticity is

    log(RES²) = α1 + α2 log(WAGE) + α3 log(OUTPUT) + α4 log(CAPITAL) + ERROR  (4.3)
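Its estimation command is not legible in the source; from the anova call further
below it is presumably:
> labour4.6 <- lm(log(labour4.3$res^2) ~ log(WAGE) + log(OUTPUT) +
      log(CAPITAL), data = labour)
> summary(labour4.6)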
Residuals:
 Median      3Q     Max
 0.3281  1.1430  6.7871
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -3.25382    1.18545  -2.745 0.006247 **
log(WAGE)    -0.06105    0.34380  -0.178 0.859112
log(OUTPUT)   0.26695    0.12716   2.099 0.036231 *
log(CAPITAL) -0.33069    0.09037  -3.659 0.000277 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 2.24 on 565 degrees of freedom
Multiple R-squared: 0.02449,    Adjusted R-squared: 0.01931
F-statistic: 4.728 on 3 and 565 DF, p-value: 0.002876
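The next model adds the squared regressors; its estimation command is not legible
in the source, but from the anova call below it is presumably:
> labour4.6sq <- update(labour4.6, . ~ . + I(log(WAGE)^2) +
      I(log(OUTPUT)^2) + I(log(CAPITAL)^2))
> summary(labour4.6sq)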
Residuals:
     Min       1Q   Median       3Q      Max
-11.6861  -0.8002   0.3633   1.1849   6.6993
Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)        5.819683   6.530195   0.891 0.373205
log(WAGE)         -4.942304   3.561094  -1.388 0.165729
log(OUTPUT)        0.187647   0.188814   0.994 0.320738
log(CAPITAL)      -0.331626   0.090318  -3.672 0.000264 ***
I(log(WAGE)^2)     0.653428   0.486332   1.344 0.179625
I(log(OUTPUT)^2)   0.001372   0.047232   0.029 0.976834
I(log(CAPITAL)^2)  0.030694   0.026799   1.145 0.252569
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 2.235 on 562 degrees of freedom
Multiple R-squared: 0.03404,    Adjusted R-squared: 0.02372
F-statistic: 3.301 on 6 and 562 DF, p-value: 0.003355
An analysis of variance confirms that the initial model for heteroscedasticity, see (4.3),
cannot be rejected.
> anova(labour4.6, labour4.6sq)
Analysis of Variance Table
Model 1: log(labour4.3$res^2) ~ log(WAGE) + log(OUTPUT) + log(CAPITAL)
Model 2: log(labour4.3$res^2) ~ log(WAGE) + log(OUTPUT) + log(CAPITAL) +
    I(log(WAGE)^2) + I(log(OUTPUT)^2) + I(log(CAPITAL)^2)
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    565 2836.1
2    562 2808.3  3    27.756 1.8515 0.1367
To obtain EGLS we have to transform the variables, considering also the constant,
and perform the initial regression on the transformed variables. This can be done
directly in the linear model formula.
> hhat <- exp(fitted(labour4.6))
Observe that hhat contains a possible estimate for the variances of the errors
pertaining to each statistical unit, Var(ε_i | x_i), see Verbeek's relationship
(4.36); that is, of the elements on the diagonal of σ²Ψ, the covariance matrix of
the errors. Ψ appears in the generalized least squares (GLS) estimator of β, see
Verbeek's relationship (4.9):

    β̂ = (X′ Ψ⁻¹ X)⁻¹ X′ Ψ⁻¹ y.

In the present case Ψ is assumed to be a diagonal matrix, that is the errors are
assumed to be uncorrelated.
> labour4.7 <- lm(log(LABOR)/hhat^0.5 ~ -1 + I(1/hhat^0.5) +
I(log(WAGE)/hhat^0.5) + I(log(OUTPUT)/hhat^0.5) +
I(log(CAPITAL)/hhat^0.5), data = labour)
> summary(labour4.7)
Call:
lm(formula = log(LABOR)/hhat^0.5 ~ -1 + I(1/hhat^0.5) +
I(log(WAGE)/hhat^0.5) + I(log(OUTPUT)/hhat^0.5) +
I(log(CAPITAL)/hhat^0.5), data = labour)
Residuals:
     Min       1Q   Median       3Q      Max
-29.1086  -0.7875   0.4852   1.2394  10.4219
Coefficients:
                          Estimate Std. Error t value Pr(>|t|)
I(1/hhat^0.5)              5.89536    0.24764  23.806  < 2e-16 ***
I(log(WAGE)/hhat^0.5)     -0.85558    0.07188 -11.903  < 2e-16 ***
I(log(OUTPUT)/hhat^0.5)    1.03461    0.02731  37.890  < 2e-16 ***
I(log(CAPITAL)/hhat^0.5)  -0.05686    0.02158  -2.636  0.00863 **
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 2.509 on 565 degrees of freedom
Multiple R-squared: 0.9903,
Adjusted R-squared: 0.9902
F-statistic: 1.44e+04 on 4 and 565 DF, p-value: < 2.2e-16
which reproduces Verbeek's Table 4.7. Note that it is also possible to specify the
latter model in a simpler way, by including the weights option in the function lm,
thus performing weighted least squares.
> labour4.7 <- lm(log(LABOR) ~ log(WAGE) + log(OUTPUT) +
log(CAPITAL), data = labour, weights = hhat^-1)
> summary(labour4.7)
Call:
lm(formula = log(LABOR) ~ log(WAGE) + log(OUTPUT) + log(CAPITAL),
data = labour, weights = hhat^-1)
Weighted Residuals:
     Min       1Q   Median       3Q      Max
-29.1086  -0.7875   0.4852   1.2394  10.4219
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.89536    0.24764  23.806  < 2e-16 ***
log(WAGE)    -0.85558    0.07188 -11.903  < 2e-16 ***
log(OUTPUT)   1.03461    0.02731  37.890  < 2e-16 ***
log(CAPITAL) -0.05686    0.02158  -2.636  0.00863 **
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 2.509 on 565 degrees of freedom
Multiple R-squared: 0.8509,
Adjusted R-squared: 0.8501
F-statistic: 1074 on 3 and 565 DF, p-value: < 2.2e-16
The goodness-of-fit statistics reported in the last output differ from those in the
preceding one.
To obtain, as suggested by Verbeek, R² = corr²(y_i, ŷ_i), use the code
> cor(log(labour$LABOR), fitted(labour4.7))^2
[1] 0.8404098
4.1.8
    (n/(n − k)) e_i²       with HC1,
    e_i²/(1 − h_i)         with HC2,
    e_i²/(1 − h_i)²        with HC3,
    e_i²/(1 − h_i)^δ_i     with HC4,
where h_i is the ith element on the main diagonal of the so-called hat matrix
H = X(X′X)⁻¹X′. The estimator HC0 was proposed by White (1980), HC1-HC3 by
MacKinnon and White (1985) to improve the performance in small samples. Long and
Ervin (2000) suggested HC3. The estimator HC4 by Cribari-Neto (2004) should improve
the small sample performance, especially in the presence of influential observations.
An observation is defined influential when its presence considerably alters the
parameter estimates.
Let us consider the estimation of the linear model

    y = β1 + β2 x + ε

on an artificial dataset.
3 By setting type="const" the usual homoscedastic estimator for the covariance matrix of the
parameter estimates is selected.
104
> set.seed(123456)
> x <- runif(49, 10, 20)
> y <- 10 + 3 * x + rnorm(49)
The data have been simulated by considering for the variable x a sample of size 49
from a uniform distribution on the interval (10, 20), by setting β1 = 10, β2 = 3,
and by drawing the errors from a standard Normal random variable.
The OLS estimator may be obtained as:
> lm49 <- lm(y ~ x)
> summary(lm49)
Call:
lm(formula = y ~ x)
Residuals:
     Min       1Q   Median       3Q      Max
-2.69955 -0.70519 -0.04837  0.63405  2.19068
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.06620    0.83438   12.06 5.36e-16 ***
x            2.98002    0.05264   56.61  < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 1.1 on 47 degrees of freedom
Multiple R-squared: 0.9855,    Adjusted R-squared: 0.9852
F-statistic: 3205 on 1 and 47 DF, p-value: < 2.2e-16
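The code adding the anomalous 50th observation is not legible in the source; the
values below are hypothetical, chosen only to roughly reproduce the reported
estimates and the leverage 0.212 shown in Figure 4.2:
> x <- c(x, 25)     # hypothetical anomalous x value
> y <- c(y, 57)     # hypothetical anomalous y value
> lm50 <- lm(y ~ x)
> summary(lm50)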
Residuals:
     Min       1Q   Median       3Q      Max
-21.6949  -1.4477   0.9419   1.8599   4.2664
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  17.6932     2.5530    6.93  9.4e-09 ***
x             2.4616     0.1584   15.54  < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 3.693 on 48 degrees of freedom
Multiple R-squared: 0.8342,    Adjusted R-squared: 0.8307
F-statistic: 241.5 on 1 and 48 DF, p-value: < 2.2e-16
We cannot fully trust the previous results, due to the anomalous case we added to the
initial data: this observation influences the parameter estimates, both the intercept
and the slope of the linear model.
Figure 4.1 shows the data and the regression lines we have estimated; the dotted one
refers to the 50 observations (including the anomalous one).
> plot(y ~ x)
> abline(lm49)
> abline(lm50, lty = 2)
The presence of influential data may be detected by performing heteroscedasticity-
consistent covariance inference, setting the option type="HC4". We have:
> library(sandwich)
> library(lmtest)
> coeftest(lm50, vcov = vcovHC(lm50, type = "HC4"))
Figure 4.1  Scatter plot diagram of the data and regression lines without considering the
anomalous case (plain line) and considering the anomalous case (dotted line)
> X = model.matrix(lm50)
> round(hat(X), 3)
(the leverages of the 49 regular observations range between 0.020 and 0.066,
while the anomalous observation has leverage 0.212)
Figure 4.2  Diagonal elements of the hat matrix, hat(X), plotted against the
observation index
t test of coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.69316    6.15375  2.8752 0.006003 **
x            2.46161    0.41681  5.9059 3.49e-07 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
> coeftest(lm50, vcov = vcovHC(lm50, type = "HC1"))
t test of coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  17.6932     6.2806  2.8171  0.007015 **
x             2.4616     0.4254  5.7865 5.303e-07 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
> coeftest(lm50, vcov = vcovHC(lm50, type = "HC2"))
t test of coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.69316    6.90622  2.5619    0.0136 *
x            2.46161    0.46802  5.2596 3.312e-06 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
> coeftest(lm50, vcov = vcovHC(lm50, type = "HC3"))
t test of coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.69316    7.75586  2.2813   0.02701 *
x            2.46161    0.52584  4.6813 2.365e-05 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Also with reference to the Labour Demand Example it is possible to detect the
presence of anomalous data. See Fig. 4.3.
> X = model.matrix(labour4.3)
> plot(hat(X))
4.2
Data can be read by means of the function read.table, having extracted the
file icecream.dat from the compressed archive ch04.zip. As usual it is possible
to check the consistency of the data with the information contained in the file
icecream.txt, available in the zip file, by means of the functions summary, head and
tail.
> icecream <- read.table(unzip("ch04.zip", "Chapter 4/icecream.dat"),
header = TRUE)
The following variables, with four-weekly observations from March 18, 1951 to July
11, 1953 (30 observations), are present:
Figure 4.4 shows the evolution of the time series Consumption, Temperature/100
and Price (cfr. Verbeeks Fig. 4.3). It may be obtained by first transforming the
Figure 4.3  Diagonal elements of the hat matrix, hat(X), for the labour demand model,
plotted against the observation index
data.frame icecream in the multiple time series icecream1. One has then to
rescale the temperature by 0.01. Remember that a time series object is not a
data.frame, so the values of the temperature cannot be extracted with the command
icecream1$temp: the code icecream1[,4] will return the temperature values since
the variable temp is the fourth time series in icecream1 (the variable is stored in the
fourth column of the object). A time series object can thus be treated like a matrix.
> icecream1 <- ts(icecream)
> icecream1[, 4] <- icecream1[, 4]/100
To assign proper names to the time series in icecream1 have first a check to the
structure of the object by means of the function str.
> str(icecream1)
mts [1:30,1:5] 0.386 0.374 0.393 0.425 0.406 0.344 0.327 0.288 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:5] "cons" "income" "price" "temp" ...
    cons = β1 + β2 price + β3 income + β4 temp + ERROR    (4.4)
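The estimation command for model (4.4) is not legible in the source; it is presumably:
> icecream4.9 <- lm(cons ~ price + income + temp, data = icecream)
> summary(icecream4.9)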
Residuals:
  Median       3Q      Max
0.002737 0.015953 0.078986
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.1973151  0.2702162   0.730  0.47179
price       -1.0444140  0.8343573  -1.252  0.22180
income       0.0033078  0.0011714   2.824  0.00899 **
temp         0.0034584  0.0004455   7.762  3.1e-08 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
5 With the option auto.key=list(points=FALSE,lines=TRUE), which defines the default label and
can thus be omitted, each series is identified by a coloured segment.
Figure 4.4  The time series Consumption, Price and Temperature/100 (cfr. Verbeek's
Fig. 4.3)
Residual standard error: 0.03683 on 26 degrees of freedom
Multiple R-squared: 0.719,
Adjusted R-squared: 0.6866
F-statistic: 22.17 on 3 and 26 DF, p-value: 2.451e-07
4.2.1
The Durbin-Watson statistic to test for the presence of first-order autocorrelation
in the residual series may be obtained by implementing Verbeek's relationship (4.51):

    dW = Sum_{t=2..T} (e_t − e_{t−1})² / Sum_{t=1..T} e_t².
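A direct implementation of this formula on the residuals of model (4.4) is, for
instance:
> e <- residuals(icecream4.9)
> sum(diff(e)^2)/sum(e^2)   # diff(e) gives e_t - e_(t-1);
                            # equals the DW value 1.0212 reported below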
4.2.2
The Durbin-Watson statistic can also be obtained directly by making use of the
function dwtest, available in the package lmtest, which also produces the significance
level of the test. In the present case the hypothesis of no first-order autocorrelation
of the errors has to be rejected.
> library(lmtest)
> dwtest(icecream4.9, alternative = "greater")
Durbin-Watson test
data: icecream4.9
DW = 1.0212, p-value = 0.0003024
alternative hypothesis: true autocorrelation is greater than 0
By specifying the argument alternative in the function dwtest it is possible to
define the direction of the test; by default alternative is set to "greater", that is
the autocorrelation of first order is greater than 0.
> dwtest(icecream4.9, alternative = "two.sided")
Durbin-Watson test
data: icecream4.9
DW = 1.0212, p-value = 0.0006048
alternative hypothesis: true autocorrelation is not 0
> dwtest(icecream4.9, alternative = "less")
Durbin-Watson test
data: icecream4.9
DW = 1.0212, p-value = 0.9997
alternative hypothesis: true autocorrelation is less than 0
Figure 4.5 shows the actual and fitted values for the ice cream consumption, giving
evidence of a pattern in the residual behaviour. To obtain this graph
we first define a time series object icecream1 containing the consumption fitted values
from the regression model (4.4) and the actual ice cream consumption values. The fitted
values may be derived by applying the function fitted to the lm object icecream4.9.
> icecream1 <- ts(cbind("Fitted Consumption" = fitted(icecream4.9),
Consumption = icecream$cons))
Figure 4.5  Fitted ice cream consumption (line) and actual consumption (points)
over time
We may then apply the function xyplot to the multiple time series
> xyplot(icecream1, type = list("l", "p"), pch = 19,
xlab = "Time", ylab = "Consumption", superpose = TRUE,
auto.key = FALSE)
The list("l","p") in the argument type specifies that the first time series is plotted
with a line "l", while the second one with points "p"; use pch=19 for plain bullets.
xlab and ylab specify the axis labels and auto.key=FALSE suppresses the legend.
4.2.3
A time window starting at time 2 and ending at time 30 (the length of the residual series) can be considered: in this way the dependent variable in the linear model is the series of the residuals without its first element, while the independent one is the series of the residuals without its last element. Let resautocorr and resautocorr0 denote the lm objects obtained according to the first and the second option, respectively.
> resautocorr <- lm(icecream4.9$res[-1] ~ -1 + icecream4.9$res[-30])
> resautocorr0 <- lm(icecream4.9$res ~ -1 + c(0, icecream4.9$res[-30]))
> summary(resautocorr)
Call:
lm(formula = icecream4.9$res[-1] ~ -1 + icecream4.9$res[-30])
Residuals:
      Min        1Q    Median        3Q       Max
-0.063581 -0.014006 -0.000714  0.009123  0.080090

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
icecream4.9$res[-30]   0.4006     0.1774   2.258   0.0319 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.03023 on 28 degrees of freedom
Multiple R-squared: 0.1541,    Adjusted R-squared: 0.1238
F-statistic: 5.099 on 1 and 28 DF, p-value: 0.03192
> summary(resautocorr0)
Call:
lm(formula = icecream4.9$res ~ -1 + c(0, icecream4.9$res[-30]))
Residuals:
      Min        1Q    Median        3Q       Max
-0.063581 -0.013547 -0.000351  0.012530  0.080090

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)
c(0, icecream4.9$res[-30])   0.4006     0.1907   2.101   0.0444 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.03249 on 29 degrees of freedom
Multiple R-squared: 0.1321,    Adjusted R-squared: 0.1022
F-statistic: 4.414 on 1 and 29 DF, p-value: 0.04444
The autocorrelation estimate is the same for the two models. In both outputs the
Multiple R-squared is produced as uncentered R-squared since an intercept term is
not present in the proposed models.
> 1 - sum(resautocorr$res^2)/sum(residuals(icecream4.9)[-1]^2)
[1] 0.1540577
> 1 - sum(resautocorr0$res^2)/sum(icecream4.9$res^2)
[1] 0.1321176
For the resautocorr0 model the centered and uncentered R-squared coincide, since we have considered as dependent variable the complete series of residuals from equation (4.4), which has zero mean. In Verbeek the first version of the model (resautocorr) is considered, but the centered R-squared has been computed. The difference with the value reported here is due to the fact that the residual series without its first element does not have zero mean.
> (VerRsq <- 1 - sum(resautocorr$res^2)/sum((residuals(icecream4.9)[-1] -
      mean(residuals(icecream4.9)[-1]))^2))
[1] 0.1491856
or
> (VerRsq <- 1 - sum(resautocorr$res^2)/(length(residuals(icecream4.9)[-1]) -
      1)/var(residuals(icecream4.9)[-1]))
[1] 0.1491856
The asymptotic test on the first-order autocorrelation coefficient can also be considered.
4.2.4
Residuals:
        3Q       Max
  0.008912  0.081859

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.0615530  0.2571651   0.239   0.8128
price       -0.1476412  0.7918621  -0.186   0.8536
income      -0.0001158  0.0011085  -0.104   0.9176
temp        -0.0002033  0.0004328  -0.470   0.6426
shift.res    0.4282815  0.2112149   2.028   0.0534 .
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.03481 on 25 degrees of freedom
Multiple R-squared: 0.1412,    Adjusted R-squared: 0.003833
F-statistic: 1.028 on 4 and 25 DF, p-value: 0.4123
> length(icecream4.9$res) * summary(resautocorrint)$r.squared
[1] 4.237064
The test statistic must be compared with the quantile of a Chi-squared random variable with p degrees of freedom. In the present example, since only the presence of first-order autocorrelation is tested, we have p = 1 and $\chi^2_{1,\,0.95} = 3.84$.
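The quantile can be checked directly (a one-line verification, not part of the original instructions):

> qchisq(0.95, 1)
[1] 3.841459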
4.2.5
The Breusch-Godfrey test with its significance level may be directly obtained by
having recourse to the function bgtest available in the package lmtest.
> library(lmtest)
> bgtest(icecream4.9)
Breusch-Godfrey test for serial correlation of order up to 1
data: icecream4.9
LM test = 4.2371, df = 1, p-value = 0.03955
Observe that by applying the function coeftest to the object resulting from bgtest
the coefficients from the auxiliary regression including lagged residuals may be
obtained.
> coeftest(bgtest(icecream4.9))
z test of coefficients:
              Estimate  Std. Error z value Pr(>|z|)
(Intercept)   0.06155297  0.25716506  0.2394  0.81083
price        -0.14764118  0.79186210 -0.1864  0.85209
income       -0.00011579  0.00110852 -0.1045  0.91681
temp         -0.00020333  0.00043284 -0.4698  0.63852
lag(resid)_1  0.42828155  0.21121490  2.0277  0.04259 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
You will observe that inference on the parameter estimates is made through the Normal distribution (z values and not t values are reported). This is because the OLS assumptions are not satisfied and only a normal limiting distribution may be derived for the OLS parameter estimators, see Johnston and Di Nardo (1997) and Mann and Wald (1943).
4.2.6
Observe that there are no lagged regressors in equation (4.4) and the residuals have
been assumed to be uncorrelated with the regressors in the model, so Verbeek defines
the auxiliary regression for computing the Breusch-Godfrey statistic by including only
the intercept term and the lagged values of the residuals, see Verbeek p. 120, omitting
any other regressors present in equation (4.4).
As a consequence of this particular formulation of the model, the value of the statistic will differ from the one usually provided by standard software, which we reported above and which takes into account the possible presence of autocorrelation between $y_t$ and the lagged regressor components of $x_t$, see Johnston and Di Nardo (1997) p. 191 (6.54).
In Verbeek the estimation of the following auxiliary regression has been considered:
$$ e_t = \text{const.} + \gamma\, e_{t-1} + v_t. $$
We have:
> resautocorrint <- lm(icecream4.9$res ~ c(0, icecream4.9$res[-30]))
> length(icecream4.9$res) * summary(resautocorrint)$r.squared
[1] 3.992121
which is equivalent to invoking the function bgtest for a linear model where the series $e_t$ of the residuals depends only on the constant term.
> bgtest(lm(icecream4.9$res ~ 1))
Breusch-Godfrey test for serial correlation of order up to 1
data: lm(icecream4.9$res ~ 1)
LM test = 3.9921, df = 1, p-value = 0.04571
In Verbeek the statistic is actually computed by considering the Multiple R-squared
from model (4.4)
> (length(icecream4.9$res) - 1) * VerRsq
[1] 4.326382
Both formulations of the Breusch-Godfrey statistic reject the hypothesis of no first-order autocorrelation, since the 0.95 quantile of a Chi-squared distribution with 1 degree of freedom is 3.84.
4.2.7
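The output below comes from a quasi-differencing (EGLS) step; a minimal sketch, under the assumption that rho holds the estimated residual autocorrelation (the value shown in the rhoest output further down) and that pred1 collects the transformed regressors:

> T <- 30
> rho <- 0.4009                       # assumption: taken from the rhoest output below
> X <- model.matrix(icecream4.9)
> pred1 <- X[-1, ] - rho * X[-T, ]    # quasi-differenced regressor matrix
> egls <- lm(icecream$cons[-1] - rho * icecream$cons[-T] ~ -1 + pred1)
> summary(egls)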
Call:
lm(formula = icecream$cons[-1] - rho * icecream$cons[-T] ~ -1 +
    pred1)

Residuals:
      Min        1Q    Median        3Q       Max
-0.061510 -0.013400 -0.000524  0.013603  0.082052

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
pred1(Intercept)  0.1571429  0.2896285   0.543   0.5922
pred1price       -0.8923919  0.8108501  -1.101   0.2816
pred1income       0.0032028  0.0015460   2.072   0.0488 *
pred1temp         0.0035584  0.0005547   6.415 1.02e-06 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.03191 on 25 degrees of freedom
Multiple R-squared: 0.9823,    Adjusted R-squared: 0.9795
F-statistic: 347 on 4 and 25 DF, p-value: < 2.2e-16
> summary(rhoest)

Call:
lm(formula = resid ~ -1 + c(0, resid[-T]))

Residuals:
      Min        1Q    Median        3Q       Max
-0.061510 -0.013163  0.001124  0.014793  0.082052

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
c(0, resid[-T])   0.4009     0.1923   2.085    0.046 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.03266 on 29 degrees of freedom
Multiple R-squared: 0.1304,    Adjusted R-squared: 0.1004
F-statistic: 4.348 on 1 and 29 DF, p-value: 0.04597
The results are reported in Verbeek's Table 4.10. We observe some differences with Verbeek's standard errors. In this case, as remarked by Verbeek, the Durbin-Watson statistic is not appropriate, since it would refer to the transformed model.
4.2.8
Verbeek's Table 4.11 reports the estimation results for the following model, including the lagged value of the temperature, and the corresponding Durbin-Watson statistic:
$$ cons_t = \beta_1 + \beta_2\, price_t + \beta_3\, income_t + \beta_4\, temp_t + \beta_5\, temp_{t-1} + \varepsilon_t. \qquad (4.5) $$
The parameter estimates may be obtained by applying the functions lm and then the
function dwtest to the lm object resulting from the following instruction.
> icecream4.11 <- lm(cons[-1] ~ price[-1] + income[-1] + temp[-1] +
temp[-T], data = icecream)
> summary(icecream4.11)
Call:
lm(formula = cons[-1] ~ price[-1] + income[-1] + temp[-1] + temp[-T],
    data = icecream)

Residuals:
      Min        1Q    Median        3Q       Max
-0.049070 -0.015391 -0.006745  0.014766  0.080892

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.1894822  0.2323170   0.816  0.42274
price[-1]   -0.8383023  0.6880209  -1.218  0.23490
income[-1]   0.0028673  0.0010533   2.722  0.01189 *
temp[-1]     0.0053321  0.0006704   7.953  3.5e-08 ***
temp[-T]    -0.0022039  0.0007307  -3.016  0.00597 **
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.02987 on 24 degrees of freedom
Multiple R-squared: 0.8285,    Adjusted R-squared: 0.7999
F-statistic: 28.98 on 4 and 24 DF, p-value: 7.1e-09
We can check the sign of a possible first-order autocorrelation, see Section 4.2.3:
> signcheck <- lm(icecream4.11$res ~ -1 + c(0, icecream4.11$res[-29]))
> summary(signcheck)
Call:
lm(formula = icecream4.11$res ~ -1 + c(0, icecream4.11$res[-29]))
Residuals:
      Min        1Q    Median        3Q       Max
-0.045799 -0.014388 -0.007997  0.014110  0.082036

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)
c(0, icecream4.11$res[-29])  0.07432    0.22636   0.328    0.745

Adjusted R-squared: -0.03174
Residuals:
      Min        1Q    Median        3Q       Max
-0.049070 -0.015391 -0.006745  0.014766  0.080892

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.1894822  0.2323170   0.816  0.42274
price       -0.8383023  0.6880209  -1.218  0.23490
income       0.0028673  0.0010533   2.722  0.01189 *
temp         0.0053321  0.0006704   7.953  3.5e-08 ***
L(temp)     -0.0022039  0.0007307  -3.016  0.00597 **
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.02987 on 24 degrees of freedom
Multiple R-squared: 0.8285,    Adjusted R-squared: 0.7999
F-statistic: 28.98 on 4 and 24 DF, p-value: 7.1e-09
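The variable names in this last output (price, income, temp, L(temp)) come from a dynlm formulation; a sketch of the call, assuming the data.frame is first converted to a time series object (the name icecream.ts is hypothetical):

> library(dynlm)
> icecream.ts <- ts(icecream)
> icecream4.11d <- dynlm(cons ~ price + income + temp + L(temp),
      data = icecream.ts)
> summary(icecream4.11d)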
4.3
Data can be read by means of the function read.table, having extracted the file forward2c.dat from the compressed archive ch04.zip. Use the functions summary, head and tail to check the consistency of the imported data with the information contained in the file forward2c.txt available in the zip archive.
> riskpremia <- read.table(unzip("ch04.zip", "Chapter 4/forward2.dat"),
header = TRUE)
The data.frame contains 276 observations from January 1979 to December 2001
taken from DATASTREAM on the following variables6 .
6 Verbeek observes that none of the variables is expressed in logs and that pre-Euro rates are
based on exchange rates against the German mark.
[Figure 4.6: time series plots of EXUSBP and EXUSEUR in separate panels.]
A multiple time series object may be defined by using the information that data were collected with a monthly frequency starting from January 1979.
> riskpremia <- ts(data = riskpremia, start = c(1979,
1), frequency = 12)
Graphical representations may be obtained by using the function xyplot available in the package lattice (cfr. ?lattice::lattice and Longhow Lam (2010) for more information), with separate panels or a single panel for the time series, see Figures 4.6 and 4.7.
> library(lattice)
> xyplot(riskpremia[, 1:2])
> xyplot(riskpremia[, 1:2], superpose = TRUE)
Figure 4.8 shows the evolution of the forward discounts obtained as the difference
between the logarithms of the spot rates and of the forward rates (1 month): the
computed series are first combined in the matrix rp, which preserves the multiple
time series, mts, attribute, and the column names are also specified. The matrix is
then plotted by means of the function xyplot.
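A sketch of the construction just described; the column names and the use of the 1-month forward series are assumptions consistent with the series labels in Figure 4.8:

> rp <- cbind("US$/GBP" = log(riskpremia[, "EXUSBP"]) - log(riskpremia[, "F1USBP"]),
      "US$/EUR" = log(riskpremia[, "EXUSEUR"]) - log(riskpremia[, "F1USEUR"]))
> xyplot(rp, superpose = TRUE)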
[Figure 4.7: EXUSBP and EXUSEUR superposed in a single panel.]
4.3.1
We have to estimate the parameters in the following model, cfr. equation (4.70) in
Verbeek:
$$ s_t - f_{t-1} = \beta_1 + \beta_2 (s_{t-1} - f_{t-1}) + e_t, $$
that is
$$ \log(EXUSBP_t) - \log(F1USBP_{t-1}) = \beta_1 + \beta_2 \left[\log(EXUSBP_{t-1}) - \log(F1USBP_{t-1})\right] + e_t. \qquad (4.6) $$
To define lagged variables within the formula of a linear model it is possible to make use of the operators available in the package dynlm (Dynamic Linear Regression): in particular, by means of the function L(x,k) (equivalent to the function lag(x,lag=-k) available in the base system) the series x is lagged by k time units; by default k is set equal to 1.
[Figure 4.8: forward discounts for US$/GBP and US$/EUR plotted against Time.]
To estimate the linear model parameters we then invoke the function dynlm, which has the same structure as the function lm used in the preceding Sections for the estimation of linear models.
> library(dynlm)
> riskpremia4.12 <- dynlm(log(EXUSBP) - log(L(F1USBP)) ~
L(log(EXUSBP) - log(F1USBP)), data = riskpremia)
> summary(riskpremia4.12)
Time series regression with "ts" data:
Start = 1979(2), End = 2001(12)
Call:
dynlm(formula = log(EXUSBP) - log(L(F1USBP)) ~ L(log(EXUSBP) -
    log(F1USBP)), data = riskpremia)
Residuals:
     Min       1Q   Median       3Q      Max
-0.14766 -0.01909  0.00073  0.02082  0.12527

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                  -0.005112   0.002365  -2.162 0.031514 *
L(log(EXUSBP) - log(F1USBP))  3.212170   0.817474   3.929 0.000108 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.03154 on 273 degrees of freedom
Multiple R-squared: 0.05353,    Adjusted R-squared: 0.05006
F-statistic: 15.44 on 1 and 273 DF, p-value: 0.000108
The Breusch-Godfrey statistic may be computed with regard to the two tests for the presence of first- and of up to twelfth-order autocorrelation, by invoking the function bgtest presented in Section 4.2.5.
> library(lmtest)
> bgtest(riskpremia4.12)
Breusch-Godfrey test for serial correlation of order up
to 1
data: riskpremia4.12
LM test = 0.2179, df = 1, p-value = 0.6406
> bgtest(riskpremia4.12, order = 12)
Breusch-Godfrey test for serial correlation of order up
to 12
data: riskpremia4.12
LM test = 10.2603, df = 12, p-value = 0.5931
Neither test rejects the null hypothesis of no serial correlation of the residuals. Recall that the 0.95 quantiles of the two Chi-squared distributions with 1 and 12 degrees of freedom may be obtained as:
> qchisq(0.95, c(1, 12))
[1] 3.841459 21.026070
An ANOVA test may be performed to verify whether the intercept and the slope coefficient in the linear model (4.6) may jointly be assumed equal to 0. It is necessary to produce an lm object for the simpler model, the one under the null hypothesis that $\beta_1 = \beta_2 = 0$. This model contains no regressors, so in defining the formula we have to exclude the intercept.
> riskpremia4.12anov <- dynlm(log(EXUSBP) - log(L(F1USBP)) ~
-1, data = riskpremia)
> anova(riskpremia4.12anov, riskpremia4.12)
Call:
dynlm(formula = log(EXUSEUR) - log(L(F1USEUR)) ~ L(log(EXUSEUR) -
    log(F1USEUR)), data = riskpremia)

Residuals:
      Min        1Q    Median        3Q       Max
-0.103024 -0.021487 -0.000015  0.020975  0.088699

Coefficients:
                                Estimate Std. Error t value Pr(>|t|)
(Intercept)                    -0.002280   0.003149  -0.724    0.470
L(log(EXUSEUR) - log(F1USEUR))  0.484791   0.766435   0.633    0.528

Residual standard error: 0.03368 on 273 degrees of freedom
Multiple R-squared: 0.001463,    Adjusted R-squared: -0.002194
F-statistic: 0.4001 on 1 and 273 DF, p-value: 0.5276
> bgtest(riskpremiauseur)
Breusch-Godfrey test for serial correlation of order up
to 1
data: riskpremiauseur
LM test = 0.1176, df = 1, p-value = 0.7316
> bgtest(riskpremiauseur, order = 12)
Breusch-Godfrey test for serial correlation of order up
to 12
data: riskpremiauseur
LM test = 14.1237, df = 12, p-value = 0.2929
As observed by Verbeek, no risk premium is found for the USD/EUR rate: both regression coefficients are not significantly different from zero; furthermore, the hypotheses of no first-order and of no up to twelfth-order autocorrelation are not rejected. The Breusch-Pagan test gives evidence of the presence of heteroscedasticity, but the inference based upon heteroscedasticity-consistent standard errors confirms the previous conclusions.
> bptest(riskpremiauseur)
studentized Breusch-Pagan test
data: riskpremiauseur
BP = 3.965, df = 1, p-value = 0.04646
> coeftest(riskpremiauseur,vcov=vcovHC(riskpremiauseur,type="HC1"))
t test of coefficients:
                               Estimate Std. Error t value Pr(>|t|)
(Intercept)                  -0.0022795  0.0030643 -0.7439   0.4576
L(log(EXUSEUR)-log(F1USEUR))  0.4847906  0.8420818  0.5757   0.5653
4.3.2
The parameters in the following model, cfr. Verbeek equation (4.72), have to be
estimated:
$$ s_t - f^3_{t-3} = \beta_1 + \beta_2 (s_{t-3} - f^3_{t-3}) + e_t, $$
that is
$$ \log(EXUSBP_t) - \log(F3USBP_{t-3}) = \beta_1 + \beta_2 \left[\log(EXUSBP_{t-3}) - \log(F3USBP_{t-3})\right] + e_t. \qquad (4.7) $$
The function dynlm may be invoked to estimate a linear model in the presence of lagged variables. Remember that lagged variables can be introduced in the model formula with the operator L(.).
> library(dynlm)
> riskpremiaoverlUSBP <- dynlm(log(EXUSBP) - log(L(F3USBP,
3)) ~ L(log(EXUSBP) - log(F3USBP), 3), data = riskpremia)
> summary(riskpremiaoverlUSBP)
Time series regression with "ts" data:
Start = 1979(4), End = 2001(12)
Call:
dynlm(formula = log(EXUSBP) - log(L(F3USBP, 3)) ~ L(log(EXUSBP) -
    log(F3USBP), 3), data = riskpremia)

Residuals:
      Min        1Q    Median        3Q       Max
-0.285511 -0.025561  0.001782  0.029698  0.176615

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                  -0.013566   0.004216  -3.218  0.00145 **
L(log(EXUSBP)-log(F3USBP),3)  3.135215   0.529277   5.924 9.53e-09 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.05647 on 271 degrees of freedom
Multiple R-squared: 0.1146,    Adjusted R-squared: 0.1114
F-statistic: 35.09 on 1 and 271 DF, p-value: 9.525e-09
The Breusch-Godfrey statistic is then computed to check for the presence of serially correlated errors; in particular, there is evidence of strong autocorrelation with reference to the first and the twelfth order. As Verbeek observes, however, these conclusions are incorrect, since monthly data on 3-month contracts are considered: although $\varepsilon_t$ may be assumed to be uncorrelated with $x_{t-3}$ (see Verbeek's relationship (4.73)), $\varepsilon_t$ may possibly be correlated with $\varepsilon_{t-1}$ and with $\varepsilon_{t-2}$.
> bgtest(riskpremiaoverlUSBP)
Breusch-Godfrey test for serial correlation of order up to 1
data: riskpremiaoverlUSBP
LM test = 119.6924, df = 1, p-value < 2.2e-16
> bgtest(riskpremiaoverlUSBP, order = 12)
Breusch-Godfrey test for serial correlation of order up to 12
data: riskpremiaoverlUSBP
LM test = 173.672, df = 12, p-value < 2.2e-16
The Breusch-Godfrey statistic must then be computed only with reference to the autocorrelations of order 3, 4, ..., 12. The auxiliary equation referred to (4.7) is:
$$ e_t = \beta_1 + \beta_2 (s_{t-3} - f^3_{t-3}) + \gamma_3 e_{t-3} + \cdots + \gamma_{12} e_{t-12}. $$
The matrix re is an mts (multiple time series) object obtained by binding the time series of the residuals with its lagged versions; binding preserves the initial start of the series but adds information at the end of the series (have a look at the object re once you have created it). The correct time window has to be considered, and this is obtained by means of the function window. Any presample value (identified as an NA, not-available, case in the matrix) is set to zero, and the series of the order-1 lagged errors is dropped. Proper names are finally assigned to the elements (columns) of the multiple time series object.
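A sketch consistent with the description above and with the column names reUSBPlag3-reUSBPlag12 appearing in the output below (the exact construction is an assumption):

> reUSBP <- riskpremiaoverlUSBP$res
> re <- do.call(cbind, lapply(3:12, function(k) lag(reUSBP, -k)))
> re <- window(re, start = start(reUSBP), end = end(reUSBP))
> re[is.na(re)] <- 0                      # presample values set to zero
> dimnames(re)[[2]] <- paste0("reUSBPlag", 3:12)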
> check <- dynlm(reUSBP ~ L(log(EXUSBP) - log(F3USBP), 3) + re,
data = riskpremia)
The results of the auxiliary regression estimation are reported here only for
completeness, thus the following instruction may be dropped.
> summary(check)
Residuals:
   Median        3Q       Max
 0.003567  0.033847  0.162478

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)
(Intercept)                  -0.0003768  0.0042542  -0.089   0.9295
L(log(EXUSBP)-log(F3USBP),3)  0.0680321  0.5378860   0.126   0.8994
reUSBPlag3                   -0.0372339  0.0970268  -0.384   0.7015
reUSBPlag4                   -0.0582779  0.1313641  -0.444   0.6577
reUSBPlag5                    0.0615611  0.1312624   0.469   0.6395
reUSBPlag6                   -0.1456368  0.1402550  -1.038   0.3001
reUSBPlag7                   -0.0228997  0.1470880  -0.156   0.8764
reUSBPlag8                    0.1280666  0.1471511   0.870   0.3849
reUSBPlag9                   -0.0768684  0.1408519  -0.546   0.5857
reUSBPlag10                  -0.0840098  0.1323110  -0.635   0.5260
reUSBPlag11                   0.2226356  0.1325330   1.680   0.0942 .
reUSBPlag12                  -0.1622903  0.0973472  -1.667   0.0967 .
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.05678 on 261 degrees of freedom
Multiple R-squared: 0.02627,    Adjusted R-squared: -0.01477
F-statistic: 0.6401 on 11 and 261 DF, p-value: 0.7937
The Breusch-Godfrey statistic can finally be computed by multiplying the multiple
R-squared by the number of observations in the auxiliary regression.
> T <- length(reUSBP)
> summary(check)$r.squared * T
[1] 7.170992
> qchisq(0.95, 10)
[1] 18.30704
To obtain the results of Verbeek the above assumption (setting any presample residual values to 0) must not be made, and T must be substituted by T − 12: a slightly different value of the statistic then follows, though with the same final conclusion.
Residuals:
   Median        3Q       Max
 0.004025  0.033975  0.164022

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                  -0.002471   0.004304  -0.574   0.5663
L(log(EXUSBP)-log(F3USBP),3)  0.166460   0.535840   0.311   0.7563
L(reUSBP, 3:12)3             -0.016621   0.098831  -0.168   0.8666
L(reUSBP, 3:12)4             -0.060702   0.133137  -0.456   0.6488
L(reUSBP, 3:12)5              0.037894   0.131270   0.289   0.7731
L(reUSBP, 3:12)6             -0.141032   0.141052  -1.000   0.3183
L(reUSBP, 3:12)7             -0.023177   0.146965  -0.158   0.8748
L(reUSBP, 3:12)8              0.122700   0.146858   0.835   0.4042
L(reUSBP, 3:12)9             -0.079221   0.140486  -0.564   0.5733
L(reUSBP, 3:12)10            -0.085884   0.131756  -0.652   0.5151
L(reUSBP, 3:12)11             0.223343   0.131967   1.692   0.0918 .
L(reUSBP, 3:12)12            -0.163520   0.096852  -1.688   0.0926 .
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.05642 on 249 degrees of freedom
Multiple R-squared: 0.03009,    Adjusted R-squared: -0.01276
F-statistic: 0.7022 on 11 and 249 DF, p-value: 0.7361
The value of the Breusch-Godfrey statistic follows:
> T <- length(reUSBP)
> summary(check)$r.squared * (T - 12)
[1] 7.852361
There seem to be some misprints in the standard error values reported on Verbeek's p. 133; according to Verbeek's formulae (4.62) and (4.63) we perform the following check.
> a <- ts(cbind(model.matrix(riskpremiaoverlUSBP),
      riskpremiaoverlUSBP$res), start = c(1979, 4),
      frequency = 12)
> dimnames(a)[[2]] <- c("int", "x1", "e")
> first.term.4.70 <- t(a[, 1:2]) %*% (a[, 1:2] * a[, 3]^2)/T
> H <- 3
> cum <- 0
> for (j in 1:(H - 1)) {
      w_j <- (1 - j/H)
      as <- window(a, start = c(1979, 4 + j), end = c(2001, 12))
      aj <- window(lag(a, -j), start = c(1979, 4 + j), end = c(2001, 12))
      cum <- cum + w_j * (t(as[, 1:2] * as[, 3]) %*% (aj[, 1:2] * aj[, 3]) +
          t(aj[, 1:2] * aj[, 3]) %*% (as[, 1:2] * as[, 3]))
  }
> second.term.4.70 <- cum/T
> Sstar <- first.term.4.70 + second.term.4.70
> ext.term.4.69 <- solve(t(a[, 1:2]) %*% a[, 1:2])
> vcovbeta <- ext.term.4.69 %*% (T * Sstar) %*% ext.term.4.69
> diag(vcovbeta)^0.5
        int          x1
0.005372888 1.056015009
Residuals:
       3Q      Max
  0.04268  0.15541

Coefficients:
                                Estimate Std. Error t value Pr(>|t|)
(Intercept)                    -0.010506   0.005983  -1.756   0.0802 .
L(log(EXUSEUR)-log(F3USEUR),3)  0.006050   0.534784   0.011   0.9910
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.06059 on 271 degrees of freedom
Multiple R-squared: 4.722e-07,    Adjusted R-squared: -0.00369
F-statistic: 0.000128 on 1 and 271 DF, p-value: 0.991
> bgtest(riskpremiaoverlUSEUR)
Breusch-Godfrey test for serial correlation of order up
to 1
data: riskpremiaoverlUSEUR
LM test = 130.1647, df = 1, p-value < 2.2e-16
> bgtest(riskpremiaoverlUSEUR, order = 12)
5 Endogeneity, Instrumental Variables and GMM
5.1
Data are available in the file schooling.wf1, which is a work file of EViews.
First invoke the package hexView and next the command readEViews to read data.
The function unzip extracts the file from the compressed archive ch05.zip.
> library(hexView)
> schooling <- readEViews(unzip("ch05.zip", "Chapter 5/schooling.wf1"))
Skipping boilerplate variable
Skipping boilerplate variable
Recall that by using the functions head(), tail() and summary() it is possible to explore the beginning and the final section of the data and to obtain summary statistics for all the variables included in the data.frame.
The file schooling contains data taken from the National Longitudinal Survey of Young Men (NLSYM) concerning the United States. The analysis focuses on 1976 but uses some variables that date back to earlier years. The following variables (many are dummy variables) are present:
BLACK 1 if black
The estimation of a linear model explaining the log wage in 1976 is proposed:
$$ LWAGE76 = \beta_1 + \beta_2\, ED76 + \beta_3\, EXP76 + \beta_4\, EXP762 + \beta_5\, BLACK + \beta_6\, SMSA76 + \beta_7\, SOUTH76 + ERROR $$
The parameter estimates appearing in Verbeek's Table 5.1 may be obtained by using
the function lm.
> schooling5.1 <- lm(LWAGE76 ~ ED76 + EXP76 + EXP762 +
BLACK + SMSA76 + SOUTH76, data = schooling)
> summary(schooling5.1)
Call:
lm(formula = LWAGE76 ~ ED76 + EXP76 + EXP762 + BLACK + SMSA76 +
SOUTH76, data = schooling)
Residuals:
     Min       1Q   Median       3Q      Max
-1.59297 -0.22315  0.01893  0.24223  1.33190

Coefficients:
              Estimate
(Intercept)
ED76
EXP76
EXP762      -0.0022409
BLACK       -0.1896315
SMSA76       0.1614230
SOUTH76     -0.1248615
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residuals:
  Median      3Q     Max
  -0.296   1.876   7.199

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.869524   4.298357  -0.435 0.663638
AGE76        1.061441   0.301398   3.522 0.000435 ***
I(AGE76^2)  -0.018760   0.005231  -3.586 0.000341 ***
BLACK       -1.468367   0.115443 -12.719  < 2e-16 ***
SMSA76       0.835403   0.109252   7.647 2.76e-14 ***
SOUTH76     -0.459700   0.102434  -4.488 7.47e-06 ***
NEARC4       0.347105   0.106997   3.244 0.001191 **
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
   Median      Mean   3rd Qu.      Max.
  0.02286   0.00000   0.26350   1.31600

                Estimate    Std. Error   t value    Pr(>|t|)
(Intercept)  4.0656681454  0.6084960487  6.68150  2.8078e-11 ***
ED76         0.1329471988  0.0513793955  2.58756  0.00971243 **
EXP76        0.0559613854  0.0259944248  2.15282  0.03141204 *
EXP762      -0.0007956595  0.0013403005 -0.59364  0.55279589
BLACK       -0.1031403726  0.0773729097 -1.33303  0.18262324
SMSA76       0.1079848759  0.0497398928  2.17099  0.03000991 *
SOUTH76     -0.0981751843  0.0287645065 -3.41307  0.00065087 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
> Y <- schooling$LWAGE76
> X <- model.matrix(schooling5.1)
> Z <- X
> Z[, 2:4] <- cbind(schooling$NEARC4, schooling$AGE76,
      schooling$AGE76^2)
> solve(t(Z) %*% X) %*% t(Z) %*% Y
                     [,1]
(Intercept)  4.0656681823
ED76         0.1329471956
EXP76        0.0559613864
EXP762      -0.0007956596
BLACK       -0.1031403771
SMSA76       0.1079848787
SOUTH76     -0.0981751857
model.matrix(schooling5.1) returns the matrix of the regressors in the first model.
Z[,2:4] <- cbind(schooling$NEARC4, schooling$AGE76, schooling$AGE76^2) replaces the values of the endogenous variables in the matrix Z with the values of their instruments (note that the matrix Z was initially set equal to X).
%*% is the operator performing matrix multiplication.
solve() returns the inverse of a matrix when applied to a single argument, or, in the following form, gives the solution of the linear system of equations $(Z'X)\,\hat\beta = Z'Y$:
> solve(t(Z) %*% X, t(Z) %*% Y)
                     [,1]
(Intercept)  4.0656681824
ED76         0.1329471956
EXP76        0.0559613864
EXP762      -0.0007956596
BLACK       -0.1031403771
SMSA76       0.1079848787
SOUTH76     -0.0981751857
Observe that the latter code is computationally more efficient than the former for
solving a linear system of equations.
The function tsls may also be invoked by using the following four arguments: the
response, the matrix of independent variables, the matrix containing the instruments,
and a vector of weights to be used in the fitting process. Here we consider unitary
weights
> a <- tsls(Y, X, Z, w = 1)
The coefficients and their standard errors may be obtained by applying the function
summary to the object a, or by extracting from a the elements coefficients and
their covariance matrix V, (see the structure of a: str(a)):
> a$coefficients
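The standard errors can be extracted in the same spirit; a sketch assuming, as stated above, that the element V of a holds the coefficient covariance matrix:

> sqrt(diag(a$V))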
5.2
The following example by Dieter Rozenich is taken from the R help system of the
function gmm, (Chausse 2010).
For the two parameters of a normal distribution $N(\mu, \sigma^2)$ we have the following three moment conditions:
$$ E(X) - \mu = 0 $$
$$ E\left[(X - \mu)^2\right] - \sigma^2 = 0 $$
$$ E(X^3) - \mu(\mu^2 + 3\sigma^2) = 0 $$
The first two moment conditions are directly obtained from the definition of $N(\mu, \sigma^2)$. The third moment condition may be derived from the third derivative of the moment generating function (MGF)
$$ M_X(t) = E\left[\exp(tX)\right] = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right) $$
evaluated at t = 0.
Note that, as is usual in GMM, we have more equations (3) than unknown parameters (2).
A function, say g, is first defined in order to establish the moment conditions, which depend on the unknown parameters, collected in a vector, say theta $= [\theta_1 = \mu,\ \theta_2 = \sigma^2]$, and, of course, on the data $x = [x_1, x_2, \ldots, x_n]$.
> g <- function(theta, x) {
m1 <- x - theta[1]
m2 <- (x - theta[1])^2 - theta[2]
m3 <- x^3 - theta[1] * (theta[1]^2 + 3 * theta[2])
f <- cbind(m1, m2, m3)
return(f)
}
In the presence of a vector of observations $x = [x_1, x_2, \ldots, x_n]$:
$$ \frac{1}{n}\,\mathbf{1}_n' m_1 = \frac{1}{n}\sum_{i=1}^{n} m_{1i} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \theta_1) = 0, $$
where $\mathbf{1}_n$ is the $n \times 1$ unit vector, corresponds to the first moment condition;
$$ \frac{1}{n}\,\mathbf{1}_n' m_2 = \frac{1}{n}\sum_{i=1}^{n} m_{2i} = \frac{1}{n}\sum_{i=1}^{n} \left[(x_i - \theta_1)^2 - \theta_2\right] = 0 $$
corresponds to the second moment condition;
$$ \frac{1}{n}\,\mathbf{1}_n' m_3 = \frac{1}{n}\sum_{i=1}^{n} m_{3i} = \frac{1}{n}\sum_{i=1}^{n} \left[x_i^3 - \theta_1(\theta_1^2 + 3\theta_2)\right] = 0 $$
corresponds to the third moment condition.
Theta[1]  0.0009022213
Theta[2]  20.7162
Convergence code =
The Jacobian related to the moment conditions can also be passed to the function gmm to define the gradient, possibly improving the efficiency of the minimization algorithm that solves the GMM problem. In the present case the Jacobian is:
$$ J = \begin{pmatrix} -1 & 0 \\ 2\theta_1 - 2E(X) & -1 \\ -3\theta_1^2 - 3\theta_2 & -3\theta_1 \end{pmatrix} $$
The function Dg is created to define the Jacobian.
The function Dg is created to define the Jacobian.
1 The function g can also correspond to a formula when the model is linear (see the R-help
?gmm::gmm).
Theta[1]  0.0009022213
Theta[2]  20.7162
Convergence code =
The covariance matrix of the parameter estimates can be obtained by means of the
function vcov.gmm.
> vcov.gmm(estimation)
Theta[1]
Theta[2]
Theta[1] 0.20798058 0.05594737
Theta[2] 0.05594737 7.88828621
5.3
...
The portfolios are composed by the Center for Research in Security Prices (CRSP) and contain stocks listed on the NYSE, divided into 10 size-based deciles. For instance, portfolio 1 contains the 10% smallest firms listed on the NYSE.
Observe that the values of cons are relative values; that is, they are obtained as the ratio of total US personal consumption expenditures at times t and t − 1.
We can transform the data.frame into a multiple time series. This is useful since it allows us to work with data collected in a matrix object.
> pricing <- ts(data = pricing, start = c(1959, 2),
frequency = 12)
To apply GMM estimation we first have to recall the moment conditions, see Verbeek's relationships (5.78) and (5.79):
$$ E\left[\delta \left(\frac{C_{t+1}}{C_t}\right)^{-\gamma} (1 + r_{f,t+1})\right] - 1 = 0 $$
$$ E\left[\delta \left(\frac{C_{t+1}}{C_t}\right)^{-\gamma} (r_{j,t+1} - r_{f,t+1})\right] = 0, \qquad j = 1, \ldots, J. $$
We define g, a function of the parameters, collected in the vector theta $= [\delta, \gamma]$, and of the data, here represented by the matrix x. The function g returns an $n \times q$ matrix with typical element $g_i(\theta, x_t)$ for $i = 1, \ldots, q$ and $t = 1, \ldots, n$. The columns of this matrix are then used to build the q sample moment conditions.
> g <- function(theta, x) {
      e1 <- theta[1] * x[, 12]^(-theta[2]) * (1 + x[, 11]) - 1
      e2 <- theta[1] * x[, 12]^(-theta[2]) * (x[, 1:10] - x[, 11])
      f <- cbind(e1, e2)
      return(f)
  }
e1 contains the elements necessary for the function gmm to define the first moment condition; the twelfth column, x[,12], of x is assumed to contain the ratio values of US consumption expenditures (indeed they are stored as the twelfth variable in the data.frame pricing); the eleventh column, x[,11], of x is assumed to contain the risk-free rate.
Note that
$$ \frac{1}{n}\sum_{i=1}^{n} \left[\theta_1\, x_{12(i)}^{-\theta_2} (1 + x_{11(i)}) - 1\right] = 0 $$
is the empirical counterpart of the first moment condition.
e2 defines a matrix whose columns contain the elements necessary for defining
the other moment conditions; the columns of x[,1:10]-x[,11] contain the
differences between the monthly returns on portfolios 1-10 and the risk free rate
(Note that the recycling rule for vector differences has been applied).
So the elements in the generic jth column of e2 are of the type
$$ \theta_1\, x_{12(i)}^{-\theta_2}\, (x_{j(i)} - x_{11(i)}), \qquad j = 1, \ldots, 10; $$
by taking the sample averages we can obtain the empirical counterparts of the remaining 10 moment conditions.
We now invoke the function gmm to estimate the two parameters using the GMM.
Two-step GMM
> library(gmm)
> pricing5.4_two <- gmm(g, pricing, c(0, 0), type = "twoStep",
wmatrix = "ident")
> summary(pricing5.4_two)
Call:
gmm(g=g,x=pricing,t0=c(0,0),type="twoStep",wmatrix="ident")
Method:
twoStep
Kernel:
Quadratic Spectral
Coefficients:
            Estimate    Std. Error  t value     Pr(>|t|)
Theta[1]  7.0043e-01  1.4694e-01  4.7666e+00  1.8732e-06
Theta[2]  9.1209e+01  3.9654e+01  2.3001e+00  2.1442e-02
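The output below refers to iterative GMM; a sketch of the corresponding call (the object name is an assumption):

> pricing5.4_iter <- gmm(g, pricing, c(0, 0), type = "iterative",
      wmatrix = "ident")
> summary(pricing5.4_iter)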
Method:
iterative
Kernel:
Quadratic Spectral
Coefficients:
            Estimate    Std. Error  t value     Pr(>|t|)
Theta[1]  8.2736e-01  1.1616e-01  7.1228e+00  1.0576e-12
Theta[2]  5.7394e+01  3.4221e+01  1.6772e+00  9.3508e-02
[Figure 5.1: mean excess returns plotted against predicted mean excess returns.]
pricing[, 11])/mean(it_mrs))
> mer <- colMeans(pricing[, 1:10] - pricing[, 11])
> pred.mer <- exp(12 * pred.mer) - 1
> mer <- exp(12 * mer) - 1
> plot(mer ~ pred.mer, xlim = c(0, 0.14), ylim = c(0, 0.14),
      pch = 17, xlab = "Predicted mean excess return",
      ylab = "Mean excess return")
> abline(0, 1)
6 Maximum Likelihood Estimation and Specification Tests
In this Chapter the Maximum Likelihood method is applied to obtain the parameter
estimates characterizing some statistical distributions: we will take into consideration
the Normal, the Bernoulli, the Exponential and the Poisson ones. The method will
then be applied to estimate the parameters of a linear model with Gaussian errors.
Let $(x_1, \ldots, x_n)$ be a sample from a random variable X, discrete or continuous, with probability density function
$$ f(x; \theta). \qquad (6.1) $$
The n-dimensional random variable $(X_1, \ldots, X_n)$ associated with $(x_1, \ldots, x_n)$, when the hypothesis of independence and identical distribution of the components of $(X_1, \ldots, X_n)$ can be assumed, has the following distribution function:
$$ L(x_1, \ldots, x_n; \theta) = f_{X_1, \ldots, X_n}(x_1, \ldots, x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta). $$
The estimates can equivalently be obtained by maximizing the log-Likelihood
$$ \log L(x_1, \ldots, x_n; \theta) = \sum_{i=1}^{n} \ln f(x_i; \theta), $$
since
$$ \operatorname{argmax}_{\theta}\; L(x_1, \ldots, x_n; \theta) = \operatorname{argmax}_{\theta}\; \log\left[L(x_1, \ldots, x_n; \theta)\right]. $$
[Figure 6.1: density plot.]
6.1
Normal distribution
[Figure 6.2: density plot.]
set.seed(1000)
n <- 100
mean <- 4
sd <- 3
x <- rnorm(n, mean = mean, sd = sd)
We now construct
$$ -\log\left(L(x_1, \ldots, x_n; \theta)\right) = -\sum_{i=1}^{n} \ln\left[\frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)\right], $$
that is, the opposite of the log-Likelihood function¹, under the assumption that the observations in the sample are i.i.d. $X \sim N(\mu, \sigma^2)$ with $\mu$ and $\sigma^2$ unknown parameters.
With the function dnorm(x,mean,sd,log) we can obtain, for the value x, the density of the Normal distribution with $\mu$ = mean and $\sigma$ = sd when log=FALSE, and the log-density when log=TRUE. Observe that by default the argument log is FALSE.
> ll <- function(theta) -sum(dnorm(x, mean = theta[1],
sd = theta[2]^0.5, log = TRUE))
Here theta is a vector with 2 elements: respectively the mean and the variance of a Normal distribution (note that sd = theta[2]^0.5).
To invoke the minimization algorithm we need to specify the starting values for the
parameters upon which the objective function depends. Observe that in ill-posed
problems the solution of the minimization algorithm might depend highly on the
choice of the starting parameters.
We can use a Newton-type minimization algorithm, which is available in the function nlm. The main arguments of nlm are the function to be minimized and the starting values. One can also require the hessian, which will be used to construct $I(\mu, \sigma^2)$, the Fisher Information Matrix, and the covariance matrix of the parameter estimates as the inverse of $I(\mu, \sigma^2)$. (See the help ?nlm for more information on the function nlm.)
Usually one has to try with different starting values to evaluate the sensitivity of the
solution. Here we propose the following two options:
the median and one half the interquartile range for the location and the scale
parameters respectively;
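A sketch of this first option and of the corresponding nlm call (the names theta.start and out.nlm are assumptions; since theta[2] is the variance, the half interquartile range is squared):

> theta.start <- c(median(x), (IQR(x)/2)^2)
> out.nlm <- nlm(ll, theta.start, hessian = TRUE)
> solve(out.nlm$hessian)    # covariance matrix of the estimates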
1 This is because R's internal optimization routines solve minimization problems.
[Perspective plot of the logLikelihood surface as a function of mu and sigma^2.]
> summary(out)
Maximum likelihood estimation
Call:
mle(minuslogl = ll, start = theta.start)
Coefficients:
Estimate Std. Error
mi1 4.048892 0.3004372
s2 9.026253 1.2762730
-2 log L: 503.8196
> library(bbmle)
> out <- mle2(ll, start = theta.start)
> summary(out)
We now consider the behaviour of the Likelihood function
$$ L(\mu, \sigma^2 \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right) $$
for different sample sizes. Let us generate a sample x of length 250 from $X \sim N(\mu = 4, \sigma^2 = 9)$ and define the function llplot to obtain the perspective and contour graphs of the Likelihood function for the n initial elements of x.
The function pdf redirects the graph to a pdf (file) device.
dev.cur() returns the name of the current device R is working on.
dev.set() tells R which device to work on.
dev.off() closes the current open device.
Before invoking the following function llplot, two devices are opened, to which the perspective and contour plots are respectively redirected.
> set.seed(1000)
> x <- rnorm(250, mean = 4, sd = 3)
> llplot <- function(n) {
x <- x[1:n]
xx <- seq(3, 5, l = 50)
yy <- seq(6, 12, l = 50)
grid <- expand.grid(xx, yy)
ll <- function(theta) prod(dnorm(x, mean = theta[1],
sd = theta[2]^0.5))
z <- sapply(1:nrow(grid), function(i) ll(c(grid[i,
1], grid[i, 2])))
z <- matrix(z, nrow = length(xx), ncol = length(yy))
dev.set(dev1)
[Likelihood profile plots for mi1 and s2, with 50% to 99% confidence levels.]
[Perspective plots of the Likelihood for increasing sample sizes: n: 100, Sample mean: 4.05, Sample variance: 9.12; n: 200, Sample mean: 4.18, Sample variance: 8.21.]
> pdf("Chapter06-normallikelihoodcontour.pdf")
> dev2 <- dev.cur()
> layout(matrix(1:4, 2, 2, byrow = TRUE))
> sapply(c(100, 150, 200, 250), function(i) llplot(i))
> dev.off(dev1)
> dev.off(dev2)
Figures 6.6 and 6.7 report the Likelihood function behaviour via perspective and contour plots of the surface for $\mu$ in the interval [3, 5] and $\sigma^2$ in the interval [6, 12]. Observe that the perspective plots do not have the same scale for the Likelihood. It is evident that the Likelihood gets more and more concentrated on the true values $\mu = 4$, $\sigma^2 = 9$ as the sample size increases. There is a larger uncertainty in estimating the mean than the variance.
Observe that with R version 2.15.1 64-bit, run on a Windows 7 system, contour plots are not produced for n ≥ 150. The code works with the 32-bit version.
[Figure 6.7: contour plots of the Likelihood for n = 100, 150, 200, 250.]
set.seed(1000)
x <- rnorm(250, mean = 4, sd = 3)
xx <- seq(3, 5, l = 50)
yy <- seq(6, 12, l = 50)
grid <- expand.grid(xx, yy)
ll <- function(theta) prod(dnorm(x, mean = theta[1],
sd = theta[2]^0.5))
> z <- sapply(1:nrow(grid), function(i) ll(c(grid[i,
1], grid[i, 2])))
> z <- matrix(z, nrow = length(xx), ncol = length(yy))
> sapply(1:200, function(i) persp(xx, yy, z, theta = i,
phi = 45, shade = 0.15, xlab = expression(mu),
ylab = expression(sigma^2), zlab = "Likelihood"))
6.2
Bernoulli distribution
set.seed(1000)
n <- 100
p <- 0.7
x <- rbinom(n, size = 1, prob = p)
The opposite of the log-Likelihood
$$ \sum_{i=1}^{n} \ln\left[p^{x_i} (1 - p)^{1 - x_i}\right] $$
can be obtained by using the function dbinom with the argument log=TRUE.
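A sketch of the corresponding R function, mirroring the Normal case above (the parameter name phat matches the mle2 output below; the starting value is an assumption):

> ll <- function(phat) -sum(dbinom(x, size = 1, prob = phat,
      log = TRUE))
> theta.start <- list(phat = 0.5)   # assumption: any value in (0, 1)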
[Plot of the logLikelihood objective as a function of p.]
> library(bbmle)
> out <- mle2(ll, start = theta.start)
> summary(out)
Maximum likelihood estimation
Call:
mle2(minuslogl = ll, start = theta.start)
Coefficients:
     Estimate Std. Error z value     Pr(z)
phat   0.7200     0.0449  16.036 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

-2 log L: 118.5907
Here we did not encounter any convergence problem, though the parameter space is a subset of $\mathbb{R}$. In case a constrained optimization were necessary, one can have recourse to the functions nlminb, optim or constrOptim (see the R help system for more information).
By plotting the result of the function profile applied to an mle2 object it is
possible to investigate the behaviour of the objective function near the solution and
obtain graphical confidence intervals, see Figure 6.9.
> plot(profile(out))
We now consider the behaviour of the Likelihood function
$$ L(p \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} p^{x_i} (1 - p)^{1 - x_i} $$
for different sample sizes. Let us generate a sample x of length 250 from $X \sim Be(p = 0.7)$ and define the function llplot to obtain the graph of the Likelihood function for the n initial elements of x.
> set.seed(1000)
> x <- rbinom(250, size = 1, prob = 0.7)
> llplot <- function(n) {
x <- x[1:n]
xx <- seq(0, 1, l = 500)
ll <- function(theta) prod(dbinom(x, size = 1,
prob = theta))
yy <- sapply(1:length(xx), function(i) ll(xx[i]))
plot(xx, yy, type = "l", xlab = "p", ylab = "Likelihood",
sub = paste("n: ", n, ", Sample mean: ",
round(mean(x), 2), sep = ""))
}
> pdf("Chapter06-bernoullilikelihood.pdf")
> layout(matrix(1:4, 2, 2, byrow = TRUE))
[Likelihood profile plot for phat, with 50% to 99% confidence levels.]
6.3
Exponential distribution
$$ f(x; \lambda) = \lambda e^{-\lambda x}, \qquad x \ge 0. $$

[Figure: Likelihood of the Bernoulli sample for n = 100 (Sample mean: 0.72), n = 150 (0.71), n = 200 (0.68) and n = 250 (0.68).]
set.seed(1000)
n <- 100
lambda <- 4
x <- rexp(n, rate = lambda)
The log-Likelihood
$$ \sum_{i=1}^{n} \ln\left(\lambda e^{-\lambda x_i}\right) $$
can be obtained by using the function dexp with the argument log=TRUE.
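A sketch of the corresponding negative log-Likelihood function, mirroring the previous sections (the later code renames the parameter to lambdahat):

> ll <- function(theta) -sum(dexp(x, rate = theta, log = TRUE))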
[Figure 6.11: density plot.]
[Figure 6.12: logLikelihood as a function of $\lambda$.]

         [,1]
[1,] 0.1807194
The behaviour of the logLikelihood function is shown in Figure 6.12 that can be
obtained with the code:
> xx <- seq(0, 10, l = 500)
> yy <- sapply(1:length(xx), function(i) -ll(xx[i]))
> plot(xx, yy, type = "l", xlab = expression(lambda),
ylab = "logLikelihood")
The estimation may also be performed by having recourse to the packages stats4 or bbmle. For the sake of clarity we change the name of the parameter to estimate, to remark that the starting value has to be specified as an element of a list.
> ll <- function(lambdahat) -sum(dexp(x, rate = lambdahat,
log = TRUE))
We now consider the behaviour of the Likelihood function
$$ L(\lambda \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} $$
for different sample sizes. Let us generate a sample x of length 250 from $X \sim Exp(\lambda = 4)$ and define the function llplot to obtain the graph of the Likelihood function for the n initial elements of x. The function pdf opens a pdf file as graphical device to save the graph.
> set.seed(1000)
> x <- rexp(250, rate = 4)
[Likelihood profile plot for lambdahat, with 50% to 99% confidence levels.]
[Figure 6.14: Likelihood of the exponential sample for increasing sample sizes.]
Figure 6.14 reports the Likelihood behaviour for $\lambda \in (0, 10)$. Observe that the graphs do not have the same scale for the Likelihood. The Likelihood gets more and more concentrated as the sample size increases.
6.4
Poisson distribution
$$ f(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots $$
[Figure 6.15: Poisson probability function plots.]
set.seed(1000)
n <- 100
lambda <- 4
x <- rpois(n, lambda = lambda)
The log-Likelihood
$$ \log\left[L(x_1, \ldots, x_n; \lambda)\right] = \sum_{i=1}^{n} \ln \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} $$
can be obtained by using the function dpois with the argument log=TRUE.
> ll <- function(theta) -sum(dpois(x, lambda = theta,
log = TRUE))
As starting value we use the sample mean.
[Plot of the logLikelihood as a function of $\lambda$.]
We now consider the behaviour of the Likelihood function
$$ L(\lambda \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} $$
for different sample sizes. Let us generate a sample x of length 250 from $X \sim Poisson(\lambda = 4)$ and define the function llplot to obtain the graph of the Likelihood function for the n initial elements of x.
> set.seed(1000)
> x <- rpois(250, lambda = 4)
> llplot <- function(n) {
x <- x[1:n]
xx <- seq(0, 10, l = 500)
ll <- function(theta) prod(dpois(x, lambda = theta))
yy <- sapply(1:length(xx), function(i) ll(xx[i]))
plot(xx, yy, type = "l", xlab = expression(lambda),
ylab = "Likelihood", sub = paste("n: ", n,
", Sample mean: ", round(mean(x), 2),
sep = ""))
}
> pdf("Chapter06-poissonlikelihood.pdf")
> layout(matrix(1:4, 2, 2, byrow = TRUE))
> sapply(c(100, 150, 200, 250), function(i) llplot(i))
> dev.off()
Figure 6.18 reports the Likelihood behaviour for $\lambda \in (0, 10)$. Observe that the graphs do not have the same scale for the Likelihood. The Likelihood gets more and more concentrated as the sample size increases.
[Likelihood profile plot for lambdahat, with 50% to 99% confidence levels.]
6.5
Linear model
set.seed(1000)
n <- 100
beta <- c(10, 2:3)
E <- rnorm(n, mean = 0, sd = 2)
W <- matrix(rnorm(n * (length(beta) - 1), mean = 2,
sd = 1), nrow = n, byrow = TRUE)
> X <- cbind(1, W)
> Y <- X %*% beta + E
[Figure 6.18: Likelihood plots for increasing sample sizes.]
The log-Likelihood is
$$ \log\left(L(\theta)\right) = \sum_{i=1}^{n} \ln\left[\frac{1}{(2\pi\theta_4)^{1/2}} \exp\left(-\frac{(y_i - x_i'\beta)^2}{2\theta_4}\right)\right] $$
and its opposite can be formalized in the following way
> ll <- function(theta) -sum(dnorm(Y - X %*% theta[1:length(beta)],
mean = 0, sd = theta[length(beta) + 1]^0.5, log = TRUE))
Starting values for the minimization algorithm, via nlm, are defined randomly for the
linear model parameters, by only ensuring that the starting value pertaining to the
variance of the error is positive.
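A sketch of such random starting values (the names are hypothetical; only the positivity of the variance starting value matters):

> theta.start <- c(rnorm(length(beta)), abs(rnorm(1)) + 1)
> out.nlm <- nlm(ll, theta.start, hessian = TRUE)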
            Pr(z)
beta1   < 2.2e-16 ***
beta2   < 2.2e-16 ***
beta3   < 2.2e-16 ***
sigma2  1.556e-12 ***

-2 log L: 421.258
The parameter estimation of the linear model can also be obtained via OLS, by means of the function lm.
> summary(lm(Y ~ W))

Call:
lm(formula = Y ~ W)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5902 -1.2625  0.1035  1.0177  5.2219

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   9.9582     0.6492   15.34  < 2e-16 ***
W1            1.8419     0.2100    8.77 6.04e-14 ***
W2            3.2075     0.2225   14.42  < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 2.019 on 97 degrees of freedom
Multiple R-squared: 0.7462,    Adjusted R-squared: 0.7409
F-statistic: 142.6 on 2 and 97 DF, p-value: < 2.2e-16
Observe that the parameter standard errors and their p-values provided by Maximum
Likelihood are based on the asymptotic covariance matrix and on the normality
assumption for the asymptotic distributions of the parameter estimators, while those
obtained in the linear model via OLS are based on an unbiased estimate for the
variance of the residuals and on the t distribution (always under the assumption of
normality of the errors).
The estimator for the residual standard error can be obtained, for Maximum
Likelihood, as:
> out@coef[length(beta) + 1]^0.5
sigma2
1.988671
[Figure 6.19: Likelihood profile plots for beta1, beta2, beta3 and sigma2, with 50% to 99% confidence levels.]
This is a biased estimate; the unbiased estimate that coincides with the OLS one is
> (out@coef[length(beta) + 1] * n/(n - length(beta)))^0.5
sigma2
2.01919
Observe that the function mle2 in the package bbmle produces as a result an object of class S4. By executing the instruction str(out), you will notice that it is not a traditional list: some of its values are identified with the symbol @, which was also used above to extract the coefficients.
By plotting the result of the function profile applied to an mle2 object it is
possible to investigate the behaviour of the objective function near the solution and
obtain graphical confidence intervals, see Figure 6.19.
> plot(profile(out))
6.6
The preceding code is finally applied to the estimation of the linear model (2.2), we
considered in Chapter 2.
$$ WAGE = \beta_1 + \beta_2\, MALE + \beta_3\, SCHOOL + \beta_4\, EXPER + ERROR $$
> wages <- read.table(unzip("wages_in_the_USA.zip",
"wages1.dat"), header = TRUE)
> regr <- lm(WAGE ~ MALE + SCHOOL + EXPER, data = wages)
> summary(regr)
Call:
lm(formula = WAGE ~ MALE + SCHOOL + EXPER, data = wages)
Residuals:
   Min     1Q Median     3Q    Max
-7.654 -1.967 -0.457  1.444 34.194

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.38002    0.46498  -7.269 4.50e-13 ***
MALE         1.34437    0.10768  12.485  < 2e-16 ***
SCHOOL       0.63880    0.03280  19.478  < 2e-16 ***
EXPER        0.12483    0.02376   5.253 1.59e-07 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Coefficients:
         Estimate  Std. Error
beta1  -3.3800069  0.46469418
beta2   1.3443683  0.10761051
beta3   0.6387972  0.03277593
beta4   0.1248248  0.02374833
sigma2  9.2677234  0.22836335

-2 log L: 16682.18
> library(bbmle)
> out <- mle2(ll, start = beta.start)
> summary(out)
Maximum likelihood estimation
Call:
mle2(minuslogl = ll, start = beta.start)
Coefficients:
        Estimate Std. Error  z value     Pr(z)
beta1  -3.380007   0.464694  -7.2736 3.500e-13 ***
beta2   1.344368   0.107611  12.4929 < 2.2e-16 ***
beta3   0.638797   0.032776  19.4898 < 2.2e-16 ***
beta4   0.124825   0.023748   5.2562 1.471e-07 ***
sigma2  9.267723   0.228363  40.5832 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

-2 log L: 16682.18
> logLik(out)
'log Lik.' -8341.091 (df=5)
> round(vcov(out), 4)
         beta1   beta2   beta3  beta4 sigma2
beta1   0.2159 -0.0078 -0.0138 -6e-03 0.0000
beta2  -0.0078  0.0116  0.0003 -3e-04 0.0000
beta3  -0.0138  0.0003  0.0011  1e-04 0.0000
beta4  -0.0060 -0.0003  0.0001  6e-04 0.0000
sigma2  0.0000  0.0000  0.0000  0e+00 0.0521
By plotting the result of the function profile applied to an mle2 object it is possible
to investigate the behaviour of the objective function near the solution and obtain
graphical confidence intervals, see Figure 6.20.
> plot(profile(out))
[Figure 6.20: Likelihood profile plots for beta1, beta2, beta3, beta4 and sigma2, with 50% to 99% confidence levels.]
7 Models with Limited Dependent Variables
7.1
Data are available in the file BENEFITS.WF1, which is a work file of EViews.
First invoke the package hexView and next the command readEViews to read data.
The file is extracted from the compressed archive ch07.zip with the function unzip.
> library(hexView)
> benefits <- readEViews(unzip("ch07.zip", "Chapter 7/benefits.wf1"))
Skipping boilerplate variable
Skipping boilerplate variable
The file BENEFITS contains a sample of 4877 blue-collar workers who became unemployed in the USA between 1982 and 1991. The following variables (many are dummy variables) are present:
MALE 1 if male
MARRIED 1 if married
DKIDS 1 if kids
RR replacement rate
RR2 RR squared
a Logit model
a Probit model
7.1.1
The linear probability model can be estimated (without making any attempt to constrain the implied probabilities between 0 and 1) with the function lm used for linear models, see Chapter 2.
> lpmfit <- lm(Y ~ RR + RR2 + AGE + AGE2 + TENURE +
SLACK + ABOL + SEASONAL + HEAD + MARRIED + DKIDS +
DYKIDS + SMSA + NWHITE + YRDISPL + SCHOOL12 +
MALE + STATEMB + STATEUR, data = benefits)
> summary(lpmfit)
Call:
lm(formula = Y ~ RR + RR2 + AGE + AGE2 + TENURE + SLACK + ABOL +
SEASONAL + HEAD + MARRIED + DKIDS + DYKIDS + SMSA + NWHITE +
YRDISPL + SCHOOL12 + MALE + STATEMB + STATEUR, data = benefits)
Residuals:
    Min      1Q  Median      3Q     Max
-0.9706 -0.5374  0.2231  0.3347  0.6770
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0768689  0.1220560  -0.630  0.52887
RR           0.6288584  0.3842068   1.637  0.10174
RR2         -1.0190587  0.4809550  -2.119  0.03416 *
AGE          0.0157489  0.0047841   3.292  0.00100 **
AGE2        -0.0014595  0.0006016  -2.426  0.01530 *
TENURE       0.0056531  0.0012152   4.652 3.37e-06 ***
SLACK        0.1281283  0.0142249   9.007  < 2e-16 ***
ABOL        -0.0065206  0.0248281  -0.263  0.79285
SEASONAL     0.0578745  0.0357985   1.617  0.10601
HEAD        -0.0437490  0.0166430  -2.629  0.00860 **
MARRIED      0.0485952  0.0161348   3.012  0.00261 **
DKIDS       -0.0305088  0.0174321  -1.750  0.08016 .
DYKIDS       0.0429115  0.0197563   2.172  0.02990 *
SMSA        -0.0351950  0.0140138  -2.511  0.01206 *
NWHITE       0.0165889  0.0187109   0.887  0.37534
YRDISPL     -0.0133149  0.0030686  -4.339 1.46e-05 ***
SCHOOL12    -0.0140365  0.0168433  -0.833  0.40468
MALE        -0.0363176  0.0178142  -2.039  0.04154 *
STATEMB      0.0012394  0.0002039   6.078 1.31e-09 ***
STATEUR      0.0181479  0.0030843   5.884 4.28e-09 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
7.1.2
The parameters in the Logit model can be estimated with the function glm, which is used to fit generalized linear models. These models are specified by giving a symbolic description of (see Hardin and Hilbe (2007)):
1. the probability distribution function (belonging to the exponential family) of the dependent variable (response);
2. the linear systematic component relating the predictor $\eta = X\beta$ to the product of the matrix X containing the explanatory variables with the parameters $\beta$;
3. the link function relating the mean of the response to the linear predictor.
This is done with the two main arguments of the function glm:
a model formula for the linear systematic component, with the same structure used for defining linear models,
Finally, the data.frame containing the variables in the model may be specified with the argument data.
By default, the estimation method used by R for generalized linear models is iteratively reweighted least squares (IWLS). See the help ?glm for more information on the features of the glm function and Hardin and Hilbe (2007) for a detailed presentation of generalized linear models.
> logitfit <- glm(Y ~ RR + RR2 + AGE + AGE2 + TENURE +
SLACK + ABOL + SEASONAL + HEAD + MARRIED + DKIDS +
DYKIDS + SMSA + NWHITE + YRDISPL + SCHOOL12 +
MALE + STATEMB + STATEUR, family = binomial(link = "logit"),
data = benefits)
> summary(logitfit)
Call:
glm(formula = Y ~ RR + RR2 + AGE + AGE2 + TENURE + SLACK + ABOL +
SEASONAL + HEAD + MARRIED + DKIDS + DYKIDS + SMSA + NWHITE +
YRDISPL+SCHOOL12+MALE+STATEMB+STATEUR, family = binomial(link =
"logit"), data = benefits)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.2024  -1.2216   0.6959   0.8844   1.6015
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.800498   0.604168  -4.635 3.56e-06 ***
RR           3.068078   1.868226   1.642  0.10054
RR2         -4.890616   2.333522  -2.096  0.03610 *
AGE          0.067697   0.023910   2.831  0.00463 **
AGE2        -0.005968   0.003038  -1.964  0.04950 *
TENURE       0.031249   0.006644   4.703 2.56e-06 ***
SLACK        0.624822   0.070639   8.845  < 2e-16 ***
ABOL        -0.036175   0.117808  -0.307  0.75879
SEASONAL     0.270874   0.171171   1.582  0.11354
HEAD        -0.210682   0.081226  -2.594  0.00949 **
MARRIED      0.242266   0.079410   3.051  0.00228 **
DKIDS       -0.157927   0.086218  -1.832  0.06699 .
DYKIDS       0.205894   0.097492   2.112  0.03470 *
SMSA        -0.170354   0.069781  -2.441  0.01464 *
NWHITE       0.074070   0.092956   0.797  0.42555
YRDISPL     -0.063700   0.014997  -4.247 2.16e-05 ***
SCHOOL12    -0.065258   0.082413  -0.792  0.42845
MALE        -0.179829   0.087535  -2.054  0.03994 *
STATEMB      0.006027   0.001009   5.973 2.33e-09 ***
STATEUR      0.095620   0.015912   6.009 1.86e-09 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
    Null deviance: 6086.1  on 4876  degrees of freedom
Residual deviance: 5746.4  on 4857  degrees of freedom
7.1.3
The parameters in a Probit model can be estimated by using the same function glm,
that allowed the Logit estimates to be obtained.
We only need to specify the "probit" link.
> probitfit <- glm(Y ~ RR + RR2 + AGE + AGE2 + TENURE +
SLACK + ABOL + SEASONAL + HEAD + MARRIED + DKIDS +
DYKIDS + SMSA + NWHITE + YRDISPL + SCHOOL12 +
MALE + STATEMB + STATEUR, family = binomial(link = "probit"),
data = benefits)
> summary(probitfit)
Call:
glm(formula = Y ~ RR + RR2 + AGE + AGE2 + TENURE + SLACK + ABOL +
SEASONAL + HEAD + MARRIED + DKIDS + DYKIDS + SMSA + NWHITE +
YRDISPL+SCHOOL12+MALE+STATEMB+STATEUR, family = binomial(link =
"probit"), data = benefits)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.2247  -1.2269   0.6988   0.8884   1.5834
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)
RR
RR2
AGE
AGE2
TENURE
SLACK
ABOL
SEASONAL
HEAD
MARRIED
DKIDS       -0.0965778  0.0518420  -1.863  0.06247 .
DYKIDS       0.1236097  0.0586377   2.108  0.03503 *
SMSA        -0.1001520  0.0418419  -2.394  0.01668 *
NWHITE       0.0517937  0.0558335   0.928  0.35359
YRDISPL     -0.0384797  0.0090509  -4.251 2.12e-05 ***
SCHOOL12    -0.0415517  0.0497219  -0.836  0.40333
MALE        -0.1067168  0.0527404  -2.023  0.04303 *
STATEMB      0.0036399  0.0006065   6.002 1.95e-09 ***
STATEUR      0.0568271  0.0094328   6.024 1.70e-09 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
on 4876
on 4857
degrees of freedom
degrees of freedom
7.1.4 Comparing the Three Models
To compare the parameter estimates in the preceding model specifications we can use
the function mtable available in the package memisc. See p. 21 for the use of mtable.
> library(memisc)
> table7.2 <- mtable(LPM = lpmfit, Logit = logitfit,
Probit = probitfit)
> table7.2 <- relabel(table7.2, "(Intercept)" = "constant",
RR = "replacement rate", RR2 = "replacement rate^2",
AGE = "age", AGE2 = "age^2/10", TENURE = "tenure",
SLACK = "slack work", ABOL = "abolished position",
SEASONAL = "seasonal work", HEAD = "head of household",
MARRIED = "married", DKIDS = "children", DYKIDS = "young children",
SMSA = "live in SMSA", NWHITE = "non white",
YRDISPL = "year of displacement", SCHOOL12 = "> 12 years of school",
MALE = "male", STATEMB = "state max. benefits",
STATEUR = "state unempl. benefits")
> table7.2
Calls:
LPM: lm(formula = Y ~ RR + RR2 + AGE + AGE2 + TENURE + SLACK + ABOL +
    SEASONAL + HEAD + MARRIED + DKIDS + DYKIDS + SMSA + NWHITE +
    YRDISPL + SCHOOL12 + MALE + STATEMB + STATEUR, data = benefits)
Logit: glm(formula = Y ~ RR + RR2 + AGE + AGE2 + TENURE + SLACK + ABOL +
    SEASONAL + HEAD + MARRIED + DKIDS + DYKIDS + SMSA + NWHITE +
    YRDISPL + SCHOOL12 + MALE + STATEMB + STATEUR,
    family = binomial(link = "logit"), data = benefits)
Probit: glm(formula = Y ~ RR + RR2 + AGE + AGE2 + TENURE + SLACK + ABOL +
    SEASONAL + HEAD + MARRIED + DKIDS + DYKIDS + SMSA + NWHITE +
    YRDISPL + SCHOOL12 + MALE + STATEMB + STATEUR,
    family = binomial(link = "probit"), data = benefits)

=====================================================
                            LPM      Logit     Probit
-----------------------------------------------------
  ...                       ...        ...        ...
male                        ...    -0.180*    -0.107*
                        (0.018)    (0.088)    (0.053)
state max. benefits    0.001***   0.006***   0.004***
                        (0.000)    (0.001)    (0.001)
state unempl. benefits 0.018***   0.096***   0.057***
                        (0.003)    (0.016)    (0.009)
-----------------------------------------------------
R-squared                 0.067
adj. R-squared            0.063
sigma                     0.450
F                        18.331
p                         0.000      0.000      0.000
Log-likelihood        -3016.708  -2873.197  -2874.071
Deviance                983.900   5746.393   5748.142
AIC                    6075.415   5786.393   5788.142
BIC                    6211.753   5916.239   5917.987
N                          4877       4877       4877
Aldrich-Nelson R-sq.                 0.065      0.065
McFadden R-sq.                       0.056      0.056
Cox-Snell R-sq.                      0.067      0.067
Nagelkerke R-sq.                     0.094      0.094
phi                                  1.000      1.000
Likelihood-ratio                   339.663    337.914
=====================================================
Observe that the R-squared, adj. R-squared, sigma and F final statistics have no statistical relevance for a linear probability model, so they should be disregarded.
The estimated marginal effect for TENURE, evaluated at the sample average of the
regressors, can be obtained as:
> xlevels <- apply(logitfit$model[, -1], 2, mean)
> avefitlogit <- c(1, xlevels) %*% logitfit$coef
> exp(avefitlogit)/(1 + exp(avefitlogit))^2 * coef(logitfit)["TENURE"]
[,1]
[1,] 0.00659471
for the logit model and
> avefitprobit <- c(1, xlevels) %*% probitfit$coef
> dnorm(avefitprobit) * coef(probitfit)["TENURE"]
[,1]
[1,] 0.006203453
for the probit model. The estimated marginal effect of being married for the average
person is:
> exp(avefitlogit)/(1 + exp(avefitlogit))^2 * coef(logitfit)["MARRIED"]
[,1]
[1,] 0.05112677
> dnorm(avefitprobit) * coef(probitfit)["MARRIED"]
[,1]
[1,] 0.05100272
in the two specifications.
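These expressions implement the generic marginal-effect formulas for the two models (with $\Lambda$ the logistic distribution function and $\phi$ the standard normal density):
$$ \frac{\partial p}{\partial x_k} = \frac{e^{x'\beta}}{(1+e^{x'\beta})^2}\,\beta_k = \Lambda(x'\beta)\{1-\Lambda(x'\beta)\}\,\beta_k \quad\text{(Logit)}, \qquad \frac{\partial p}{\partial x_k} = \phi(x'\beta)\,\beta_k \quad\text{(Probit)}. $$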
7.1.5 Goodness of Fit

The $R_p^2$ goodness-of-fit measure is defined as
$$ R_p^2 = 1 - \frac{wr_1}{wr_0}, $$
where $wr_1$ and $wr_0$ are the proportions of incorrect predictions, respectively, for the considered model and for a model containing only an intercept.
$R_p^2$ (and the HM index, see Verbeek's Section 7.1.5) can be obtained by defining a function R2p:
> R2p <- function(y, estobject, cutoff = 0.5) {
      a <- table(y, (estobject$fitted > cutoff) * 1)
      wr_1 <- 1 - sum(diag(a))/sum(a)
      phat <- sum(y)/length(y)
      wr_0 <- (1 - phat) * (phat > cutoff) + phat * (phat <= cutoff)
      pa <- prop.table(a, 1)
      return(list("Cross-tabulation of actual and predicted outcomes" = a,
          Rsq_p = round(1 - wr_1/wr_0, 4),
          HM = round(pa[1, 1] + pa[2, 2], 4)))
  }
estobject can be an object of class lm or glm. Observe that $fitted extracts two different types of predicted values; in particular, for glm objects they correspond to the transformation, through the Logit or Probit distribution functions, of the predicted values from the linear specification, see Verbeek's relationship (7.15). cutoff is a threshold value: when the estimated probability is larger than cutoff the fitted response is set equal to 1, otherwise the response is set to 0. Verbeek assumes cutoff = 0.5.
The Rp2 may then be obtained for the preceding models by invoking the function
R2p, specifying the actual outcomes for the y argument, benefits$Y, and the objects
(lpmfit, logitfit and probitfit) resulting respectively from the linear probability,
Logit and Probit model estimation procedures.
> R2p(y = benefits$Y, estobject = lpmfit, cutoff = 0.5)
$"Cross-tabulation of actual and predicted outcomes"
y       0    1
  0   184 1358
  1   130 3205

$Rsq_p
[1] 0.035

$HM
[1] 1.0803
> R2p(benefits$Y, logitfit)
$"Cross-tabulation of actual and predicted outcomes"
y       0    1
  0   242 1300
  1   171 3164

$Rsq_p
[1] 0.046

$HM
[1] 1.1057
> R2p(benefits$Y, probitfit)
$"Cross-tabulation of actual and predicted outcomes"
y       0    1
  0   231 1311
  1   162 3173

$Rsq_p
[1] 0.0447

$HM
[1] 1.1012
The function pR2, available in the package pscl, produces the following pseudo-R2 measures for a glm object, such as the Logit and Probit fits above; see the help ?pscl::pR2 and Hardin and Hilbe (2007) for more information.
> library(pscl)
> pR2(logitfit)
          llh       llhNull            G2      McFadden          r2ML          r2CU
-2.873197e+03 -3.043028e+03  3.396629e+02  5.581002e-02  6.727594e-02  9.436996e-02
> pR2(probitfit)
          llh       llhNull            G2      McFadden          r2ML          r2CU
-2.874071e+03 -3.043028e+03  3.379143e+02  5.552271e-02  6.694146e-02  9.390077e-02
A cross-tabulation (like the one returned by the function R2p) of actual outcomes
against predicted outcomes for discrete data models, with summary statistics such as
the percentage of correctly predicted under fitted and null models may be obtained
by applying the function hitmiss available in the package pscl to a glm object.
The user can also specify a classification threshold different from 0.5 for the predicted
probabilities by changing the default argument k=0.5.
> hitmiss(logitfit)
Classification Threshold = 0.5
y=0 y=1
yhat=0 242 171
yhat=1 1300 3164
Percent Correctly Predicted = 69.84%
Percent Correctly Predicted = 15.69%, for y = 0
Percent Correctly Predicted = 94.87%  for y = 1
Null Model Correctly Predicts 68.38%
[1] 69.83802 15.69390 94.87256
> hitmiss(probitfit)
Classification Threshold = 0.5
        y=0  y=1
yhat=0  231  162
yhat=1 1311 3173
Percent Correctly Predicted = 69.8%
Percent Correctly Predicted = 14.98%, for y = 0
Percent Correctly Predicted = 95.14%  for y = 1
Null Model Correctly Predicts 68.38%
[1] 69.79701 14.98054 95.14243
Observe that the log-likelihood for the naive model, where the probability of applying for benefits is constant, can be obtained, e.g. for the Logit model, from the cross-tabulation of actual and predicted outcomes, see Verbeek's relationship (7.19):
> a <- R2p(benefits$Y, logitfit)[[1]]
> log0 <- sum(a[2, ]) * log(sum(a[2, ])/sum(a)) + sum(a[1,
]) * log(sum(a[1, ])/sum(a))
> log0
[1] -3043.028
The function logLik extracts the log-likelihood from a glm object; so the McFadden pseudo R-squared (returned by the function pR2) can also be computed with:
> 1 - logLik(logitfit)[1]/log0
[1] 0.05581002
7.2 The Odds Ratio in the Logit Model

In the Logit model the probability of success is
$$ p = \frac{\exp(X'\beta)}{1+\exp(X'\beta)}, $$
so that
$$ \log\frac{p}{1-p} = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p, $$
that is
$$ \frac{p}{1-p} = \exp\{\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p\} $$
or
$$ \frac{p}{1-p} = e^{\beta_0}\, e^{\beta_1 X_1} \cdots e^{\beta_p X_p}. $$
When $X_1$ increases by one unit, ceteris paribus, the odds $p/(1-p)$ are multiplied by $e^{\beta_1}$; denoting by $p_1$ the new probability,
$$ e^{\beta_1} = \frac{p_1}{1-p_1}\,\frac{1-p}{p}. $$
Let us now study, for various values of $p$ and $\beta_1$, the relationship between $p_1$ and $p$; see Table 7.1.
> p <- 0:25/25
> p <- p[-c(1, length(p))]
> odds <- p/(1 - p)
> beta <- 0:16/4 - 2
> expbeta <- exp(beta)
> table <- outer(odds, expbeta, "*")
> rownames(table) <- p
> colnames(table) <- round(beta, 2)
> p1 <- round(table/(1 + table), 3)
Table 7.1  On the rows p (0.04, 0.08, ..., 0.96), on the columns beta1 (-2, -1.75, ..., 2); the entries in the table are the new values of p for a unitary variation of x (the matrix p1 produced by the code above).
If the model includes $\log X_1$ rather than $X_1$, increasing $X_1$ by 1% changes $\log X_1$ by $\log(1.01) \simeq 0.01$, so the odds change according to
$$ \frac{p_1}{1-p_1} = e^{\beta_1 \log(1.01)}\,\frac{p}{1-p} \simeq e^{0.01\,\beta_1}\,\frac{p}{1-p}. $$
Exercise
Produce a table similar to Table 7.1 for the logarithm case. Comment on the results.
7.3 An Ordered Response Model: Credit Ratings

The parameter estimation of an ordered response model is considered and the results are compared with those pertaining to a Logit framework.
Data are available in the file credit.dta, which is in the Stata format and is contained in the compressed archive ch07.zip. To import the data we first invoke the package foreign and then use the command read.dta.
> library(foreign)
> credit <- read.dta(unzip("ch07.zip", "Chapter 7/credit.dta"))
The data set contains 921 observations, for 2005, on US firms' credit ratings, including a set of firm characteristics. The data are taken from Compustat.
Some summary statistics are first obtained, see Verbeek's Table 7.4, which may be reproduced with the following code:
> t(apply(credit, 2, function(x) c(mean = mean(x),
      median = median(x), min = min(x), max = max(x))))
A binary Logit model for the investment-grade indicator invgrade is estimated first (the call is reproduced in the mtable output below):
> logitfit <- glm(invgrade ~ booklev + ebit + logsales + reta + wka,
      family = binomial(link = "logit"), data = credit)
The ordered Logit model can then be estimated with the function polr in the package MASS; its arguments are a model formula with the same structure used for defining linear models, the data and the estimation method:
> library(MASS)
> orderedlogitfit <- polr(rating ~ booklev + ebit +
logsales + reta + wka, data = credit, method = "logistic")
The results can be summarized in a single output with the function mtable, available in the package memisc.
> library(memisc)
> mtable(logitfit, orderedlogitfit)
Calls:
logitfit: glm(formula = invgrade ~ booklev + ebit + logsales + reta +
wka, family = binomial(link = "logit"), data = credit)
orderedlogitfit: polr(formula = rating ~ booklev + ebit + logsales +
reta + wka, data = credit, method = "logistic")
=====================================================
                      logitfit     orderedlogitfit
-----------------------------------------------------
(Intercept)           -8.214***
                       (0.867)
booklev               -4.427***      -2.752***
                       (0.771)        (0.477)
ebit                   4.355**        4.731***
                       (1.440)        (0.945)
logsales               1.082***       0.941***
                       (0.096)        (0.059)
reta                   4.116***       3.560***
                       (0.489)        (0.302)
wka                   -4.012***      -2.580***
                       (0.748)        (0.483)
1|2                                   -0.370
                                      (0.633)
2|3                                    4.881***
                                      (0.521)
3|4                                    7.626***
                                      (0.551)
4|5                                    9.885***
                                      (0.592)
5|6                                   12.883***
                                      (0.673)
6|7                                   14.783***
                                      (0.784)
-----------------------------------------------------
Aldrich-Nelson R-sq.     0.391          0.484
McFadden R-sq.           0.465          0.309
Cox-Snell R-sq.          0.474          0.608
Nagelkerke R-sq.         0.633          0.639
phi                      1.000
Likelihood-ratio       591.796        862.873
p                        0.000          0.000
Log-likelihood        -341.078       -965.307
Deviance               682.155       1930.614
AIC                    694.155       1952.614
BIC                    723.108       2005.694
N                          921            921
=====================================================
Observe that the likelihood ratio test may also be performed by using the function lrtest in the package lmtest.
Additional pseudo-R2 measures may be obtained with the function pR2 in the package
pscl.
> library(pscl)
> pR2(logitfit)
         llh      llhNull           G2     McFadden         r2ML         r2CU
-341.0775772 -636.9757787  591.7964028    0.4645360    0.4740549    0.6327213
> pR2(orderedlogitfit)
          llh       llhNull            G2      McFadden          r2ML          r2CU
 -965.3071623 -1396.7436930   862.8730614     0.3088874     0.6081543     0.6389289
To compute the probabilities for the average firm to obtain an investment grade when book leverage is .25 and .75, see Verbeek p. 225, we first obtain the linear predictors $x_i'\beta$ corresponding to the first and third quartiles of booklev, setting the other variables at their sample average levels:
> xlevels <- apply(credit[, c("ebit", "logsales", "reta",
"wka")], 2, mean)
> avefit <- c(1, quantile(credit$booklev, 0.25), xlevels) %*%
logitfit$coef
> avefit1 <- c(1, quantile(credit$booklev, 0.75), xlevels) %*%
logitfit$coef
and then apply the logistic transformation $\exp(x_i'\beta)/(1+\exp(x_i'\beta))$.
For the ordered Logit model an investment grade corresponds to a rating above class 3, so that, denoting by $\gamma_3$ the 3|4 threshold,
$$ P\{\text{invgrade}_i = 1 \mid x_i\} = 1 - F(\gamma_3 - x_i'\beta) = 1 - \frac{\exp(\gamma_3 - x_i'\beta)}{1+\exp(\gamma_3 - x_i'\beta)} = \frac{1}{1+\exp(\gamma_3 - x_i'\beta)}. $$
Using the estimated threshold $\hat\gamma_3 = 7.626$ we have:
> avefit <- c(quantile(credit$booklev, 0.25), xlevels) %*%
orderedlogitfit$coef
> avefit1 <- c(quantile(credit$booklev, 0.75), xlevels) %*%
orderedlogitfit$coef
> 1/(1 + exp(7.626 - avefit))
[,1]
[1,] 0.5169951
> 1/(1 + exp(7.626 - avefit1))
[,1]
[1,] 0.3701199
According to both models the probability of obtaining an investment grade decreases when the book leverage increases.
7.4
NY dummy, 1 if no,yes
YN dummy, 1 if yes,no
YY dummy, 1 if yes,yes
205
Two ordered probit models are considered for explaining the willingness to pay: the
first one including only an intercept, while the age class, the gender and the income
class are included as explanatory variables for the second model.
To obtain the maximum likelihood parameter estimates we have first to build the corresponding likelihood functions, see Verbeek's relationships (7.33)-(7.34). Recall from Chapter 6 that we have to define the opposite of the log-likelihood function, since the internal optimization routines perform minimization.
For both models we define a variable regr including the regressors (for the first
model only a vector of ones). In the former model the likelihood depends on b1 and
on sigma. In the latter one it depends also on b2, b3 and b4.
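Both functions return the negative of the log-likelihood implied by the four possible answer patterns; written out (a direct transcription of the code below, with $\Phi$ the standard normal distribution function and $x_i'\beta$ the mean of the latent willingness to pay):
$$ \log L = \sum_i \Big[ NN_i \log \Phi\Big(\tfrac{BIDL_i - x_i'\beta}{\sigma}\Big) + NY_i \log\Big\{\Phi\Big(\tfrac{BID1_i - x_i'\beta}{\sigma}\Big) - \Phi\Big(\tfrac{BIDL_i - x_i'\beta}{\sigma}\Big)\Big\} + YN_i \log\Big\{\Phi\Big(\tfrac{BIDH_i - x_i'\beta}{\sigma}\Big) - \Phi\Big(\tfrac{BID1_i - x_i'\beta}{\sigma}\Big)\Big\} + YY_i \log\Big\{1 - \Phi\Big(\tfrac{BIDH_i - x_i'\beta}{\sigma}\Big)\Big\} \Big]. $$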
> llI <- function(b1, sigma) {
      regr <- as.matrix(rep(1, nrow(wtp)))
      s <- sigma
      -sum(wtp$NN * log(pnorm((wtp$BIDL - regr * b1)/s)) +
          wtp$NY * log(pnorm((wtp$BID1 - regr * b1)/s) -
              pnorm((wtp$BIDL - regr * b1)/s)) +
          wtp$YN * log(pnorm((wtp$BIDH - regr * b1)/s) -
              pnorm((wtp$BID1 - regr * b1)/s)) +
          wtp$YY * log(1 - pnorm((wtp$BIDH - regr * b1)/s)))
  }
> llII <- function(b1, b2, b3, b4, sigma) {
      regr <- cbind(1, wtp$AGE, wtp$FEMALE, wtp$INCOME)
      s <- sigma
      b <- c(b1, b2, b3, b4)
      -sum(wtp$NN * log(pnorm((wtp$BIDL - regr %*% b)/s)) +
          wtp$NY * log(pnorm((wtp$BID1 - regr %*% b)/s) -
              pnorm((wtp$BIDL - regr %*% b)/s)) +
          wtp$YN * log(pnorm((wtp$BIDH - regr %*% b)/s) -
              pnorm((wtp$BID1 - regr %*% b)/s)) +
          wtp$YY * log(1 - pnorm((wtp$BIDH - regr %*% b)/s)))
  }
We can now use the function mle2 in the package bbmle to obtain the parameter
estimates, having defined a list of starting values for the two models.
> library(bbmle)
> b.start <- list(b1 = 10, sigma = 15)
> out <- mle2(llI, start = b.start)
> summary(out)
Maximum likelihood estimation
Call:
mle2(minuslogl = llI, start = b.start)
Coefficients:
      Estimate Std. Error z value     Pr(z)
b1     18.7391     2.4969  7.5049 6.148e-14 ***
sigma  38.6122     2.9332 13.1637 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

-2 log L: 818.009
> b.start <- list(b1 = 1, b2 = 1, b3 = 1, b4 = 5, sigma = 30)
> out <- mle2(llII, start = b.start)
> summary(out)
Maximum likelihood estimation
Call:
mle2(minuslogl = llII, start = b.start)
Coefficients:
      Estimate Std. Error z value     Pr(z)
b1     30.0730     8.2788  3.6325 0.0002806 ***
b2     -6.9309     1.6656 -4.1613 3.164e-05 ***
b3     -5.1561     4.7135 -1.0939 0.2739901
b4      4.8940     1.9114  2.5604 0.0104549 *
sigma  36.4774     2.7488 13.2701 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

-2 log L: 782.8182
The parameter estimates for the second model and the asymptotic standard errors
for the parameter estimates in both models are similar to those obtained by Verbeek
but not equal. Let us check what happens by changing the starting values in the
minimization procedure. The function coef allows only the parameter estimates to be extracted from the object resulting from mle2.
> b.start <- list(b1 = 1, sigma = 15)
> coef(mle2(llI, start = b.start))
      b1    sigma
18.73886 38.61274
> b.start <- list(b1 = 1, sigma = 20)
> coef(mle2(llI, start = b.start))
      b1    sigma
18.73940 38.61309
> b.start <- list(b1 = 4, sigma = 50)
> coef(mle2(llI, start = b.start))
      b1    sigma
18.73680 38.61727
> b.start <- list(b1 = 40, sigma = 100)
> coef(mle2(llI, start = b.start))
      b1    sigma
18.73239 38.61679
The minimization procedure applied to the first model seems to be quite robust with
respect to different sets of initial starting values.
For the second model we have:
> b.start <- list(b1 = 1, b2 = 1, b3 = 1, b4 = 6, sigma = 30)
> coef(mle2(llII, start = b.start))
       b1        b2        b3        b4     sigma
30.132584 -6.939768 -5.186392  4.887521 36.489069
> b.start <- list(b1 = 1, b2 = 1, b3 = 1, b4 = 5, sigma = 14)
> coef(mle2(llII, start = b.start))
       b1        b2        b3        b4     sigma
29.978269 -6.916038 -5.156463  4.908729 36.469106
> b.start <- list(b1 = 1, b2 = 2, b3 = 3, b4 = 4, sigma = 14)
> coef(mle2(llII, start = b.start))
       b1        b2        b3        b4     sigma
30.058567 -6.925511 -5.157301  4.893616 36.462669
> b.start <- list(b1 = 30, b2 = -7, b3 = 10, b4 = 4, sigma = 30)
> coef(mle2(llII, start = b.start))
       b1        b2        b3        b4     sigma
30.110749 -6.933783 -5.184297  4.886962 36.481068
The shape of the likelihood function seems to be too flat around its minimum; this
issue renders the optimization problem somewhat difficult and the estimation result
is unstable depending on initial starting values. To overcome this situation we have to
specify some parameters to control for the minimization algorithm; in particular we
fix the relative convergence tolerance to reltol = 1e-15 and the maximum number
of iterations to maxit = 10000. For the first model we have:
> b.start <- list(b1 = 1, sigma = 15)
> coef(mle2(llI, start = b.start, control = list(reltol = 1e-15,
      maxit = 10000)))
      b1    sigma
18.73884 38.61273
> b.start <- list(b1 = 1, sigma = 20)
> coef(mle2(llI, start = b.start, control = list(reltol = 1e-15,
      maxit = 10000)))
      b1    sigma
18.73884 38.61272
The estimation results, though a bit different from those proposed by Verbeek for the second model, no longer depend significantly on the starting values.
So we have the following final output:
> b.start <- list(b1 = 1, sigma = 15)
> outi <- mle2(llI, start = b.start, control = list(reltol = 1e-15,
maxit = 10000))
> summary(outi)
Maximum likelihood estimation
Call:
mle2(minuslogl = llI, start = b.start, control = list(reltol = 1e-15,
maxit = 10000))
Coefficients:
      Estimate Std. Error z value     Pr(z)
b1     18.7388     2.4970  7.5047 6.158e-14 ***
sigma  38.6127     2.9333 13.1635 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
-2 log L: 818.009
> b.start <- list(b1 = 30, b2 = -7, b3 = 10, b4 = 4,
sigma = 30)
> outc <- mle2(llII, start = b.start, control = list(reltol = 1e-15,
maxit = 10000))
> summary(outc)
Maximum likelihood estimation
Call:
mle2(minuslogl = llII, start = b.start, control = list(reltol = 1e-15,
maxit = 10000))
Coefficients:
      Estimate Std. Error z value     Pr(z)
b1     30.1094     8.2791  3.6368  0.000276 ***
b2     -6.9334     1.6657 -4.1625 3.148e-05 ***
b3     -5.1846     4.7138 -1.0999  0.271390
b4      4.8876     1.9113  2.5572  0.010552 *
sigma  36.4779     2.7489 13.2699 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
-2 log L: 782.8181
From the model with only the intercept the proportion of population with a negative
willingness to pay is:
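A minimal sketch of this computation, using the estimates in outi and the fact that the first model implies WTP ~ N(b1, sigma^2):

> est <- coef(outi)
> # P(WTP < 0) = pnorm(0; b1, sigma)
> pnorm(0, mean = est["b1"], sd = est["sigma"])

which gives a proportion of about 0.31.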
7.5 Count Data: Patents and R&D Expenditures
As an illustration of models for count data, Verbeek considers the analysis of the relationship between the number of patents obtained by a set of firms and their Research and Development expenditures.
Data are available in the file patents.dat, a text file that may be read with the
command read.table.
> patents <- read.table(unzip("ch07.zip", "Chapter 7/patents.dat"),
header = TRUE)
The file patents contains data on 181 international manufacturing firms (R&D expenditures, number of patents, industry, etc.) for 1990 and 1991.
The relationship between the number of patents (a count variable) and the expenditures in Research and Development is first analyzed by means of a Poisson regression model.
The maximum likelihood parameter estimates may be obtained by using the function
glm and specifying poisson as family and "log" as link function.
> poissonfit <- glm(P91 ~ LR91 + AEROSP + CHEMIST +
COMPUTER + MACHINES + VEHICLES + JAPAN + US,
family = poisson(link = "log"), data = patents)
> summary(poissonfit)
Call:
glm(formula = P91 ~ LR91 + AEROSP + CHEMIST + COMPUTER + MACHINES +
VEHICLES + JAPAN + US, family = poisson(link="log"),data=patents)
Deviance Residuals:
    Min      1Q  Median      3Q     Max
-27.979  -5.246  -1.572   2.352  29.246
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.873731   0.065868  -13.27  < 2e-16 ***
LR91         0.854525   0.008387  101.89  < 2e-16 ***
AEROSP      -1.421850   0.095640  -14.87  < 2e-16 ***
CHEMIST      0.636267   0.025527   24.93  < 2e-16 ***
COMPUTER     0.595343   0.023338   25.51  < 2e-16 ***
MACHINES     0.688953   0.038346   17.97  < 2e-16 ***
VEHICLES    -1.529653   0.041864  -36.54  < 2e-16 ***
JAPAN        0.222222   0.027502    8.08 6.46e-16 ***
US          -0.299507   0.025300  -11.84  < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 29669.4  on 180  degrees of freedom
Residual deviance:  9081.9  on 172  degrees of freedom
AIC: 9919.6

The function pR2 applied to poissonfit returns, among the other measures, G2 = 2.058754e+04 and a McFadden pseudo-R2 of 6.752422e-01.
The same results (except1 for the McFadden pseudo R2) together with the parameter estimates may also be obtained by applying the function mtable of the package memisc to the glm object poissonfit.
> library(memisc)
> mtable(poissonfit)

1 mtable computes the McFadden R2 as $1 - \dfrac{\mathrm{deviance}(\text{model})}{\mathrm{deviance}(\text{null model})}$.
Calls:
poissonfit: glm(formula = P91 ~ LR91 + AEROSP + CHEMIST + COMPUTER +
    MACHINES + VEHICLES + JAPAN + US, family = poisson(link = "log"),
    data = patents)

===============================
(Intercept)          -0.874***
                      (0.066)
LR91                  0.855***
                      (0.008)
AEROSP               -1.422***
                      (0.096)
CHEMIST               0.636***
                      (0.026)
COMPUTER              0.595***
                      (0.023)
MACHINES              0.689***
                      (0.038)
VEHICLES             -1.530***
                      (0.042)
JAPAN                 0.222***
                      (0.028)
US                   -0.300***
                      (0.025)
-------------------------------
Aldrich-Nelson R-sq.     0.991
McFadden R-sq.           0.694
Cox-Snell R-sq.          1.000
Nagelkerke R-sq.         1.000
phi                      1.000
Likelihood-ratio     20587.541
p                        0.000
Log-likelihood       -4950.789
Deviance              9081.901
AIC                   9919.578
BIC                   9948.364
N                          181
===============================
The function coeftest in the package lmtest produces a table of the coefficients with
their robust standard errors, the z statistics and the statistical significance. Robust
standard errors are obtained by using the function vcovHC, available in the package
sandwich, see Section 4.1.6.
> library(sandwich)
> library(lmtest)
> coeftest(poissonfit, vcovHC(poissonfit, type = "HC"))
z test of coefficients:

             Estimate Std. Error z value  Pr(>|z|)
(Intercept) -0.873731   0.742962 -1.1760  0.239591
LR91         0.854525   0.093695  9.1203 < 2.2e-16 ***
AEROSP      -1.421850   0.380168 -3.7401  0.000184 ***
CHEMIST      0.636267   0.225359  2.8233  0.004753 **
COMPUTER     0.595343   0.300803  1.9792  0.047796 *
MACHINES     0.688953   0.414664  1.6615  0.096619 .
VEHICLES    -1.529653   0.280693 -5.4496 5.049e-08 ***
JAPAN        0.222222   0.352840  0.6298  0.528819
US          -0.299507   0.273621 -1.0946  0.273689
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Verbeek performs a Wald test to check for the joint effect of the explanatory variables
included in the model.
We have first to estimate the base glm model including only an intercept and
then call the function waldtest, available in the package lmtest. The first and
second arguments are respectively the baseline model and the complete one; with
the argument test it is possible to specify the type of test ("Chisq" or "F") to be
performed and with the argument vcov the covariance matrix, in the present case a
robust estimate of the covariance matrix.
> poissonfit0 <- glm(P91 ~ 1, family = poisson(link = "log"),
data = patents)
> lmtest::waldtest(poissonfit0, poissonfit, test = "Chisq",
vcov = vcovHC(poissonfit, type = "HC"))
Wald test

Model 1: P91 ~ 1
Model 2: P91 ~ LR91 + AEROSP + CHEMIST + COMPUTER + MACHINES + VEHICLES +
    JAPAN + US
  Res.Df Df  Chisq Pr(>Chisq)
1    180
2    172  8 339.97  < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The Wald test rejects the hypothesis that the conditional mean is constant and
independent of the explanatory variables.
With regard to the interpretation of the coefficients Verbeek observes that b2 = 0.85, pertaining to the logarithm of the Research & Development expenditures, is to be interpreted as an elasticity.
We obtain the percentage difference for each industry, ceteris paribus, in the number of patents with respect to the reference industries (food, fuel, metal and others) by transforming the parameters $b_3, \dots, b_9$ as $100\,[\exp(b_i) - 1]$, $i = 3, \dots, 9$; for the US dummy, for instance, the effect is -25.9 per cent.
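A minimal sketch of this transformation (positions 3 to 9 of coef(poissonfit) are the industry and country dummies):

> round(100 * (exp(coef(poissonfit)[3:9]) - 1), 1)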
The function dispersiontest, available in the package AER, tests whether the dispersion parameter differs from 1. Its first argument is a Poisson model estimated with glm. By default the argument alternative is set to "greater", thus testing for overdispersion, i.e. a conditional variance larger than the conditional mean:
$$ \sigma^2 = \text{dispersion} \cdot \mu. $$
> library(AER)
> dispersiontest(poissonfit)
Overdispersion test
data: poissonfit
z = 3.5861, p-value = 0.0001678
alternative hypothesis: true dispersion is greater than 1
sample estimates:
dispersion
57.36236
A sample estimate of the dispersion parameter is also given. By using the argument trafo it is possible to set to 1 or 2 the power k in the expression
$$ \sigma^2 = \mu + \alpha\,\mu^k, $$
corresponding to the formulation of the variance in the Negative Binomial I and II models that will be used below; see Kleiber and Zeileis (2008) for more detailed information. The estimate of the dispersion parameter will then be given for $\alpha$.
Note that for k = 1 we have dispersion = $(1 + \alpha)$.
> dispersiontest(poissonfit, trafo = 1)
Overdispersion test
data: poissonfit
z = 3.5861, p-value = 0.0001678
alternative hypothesis: true alpha is greater than 0
sample estimates:
alpha
56.36236
> dispersiontest(poissonfit, trafo = 2)
Overdispersion test
data: poissonfit
z = 3.8271, p-value = 6.482e-05
alternative hypothesis: true alpha is greater than 0
sample estimates:
alpha
0.4278121
To overcome the overdispersion problem Verbeek considers the estimation of two models, NegBinI and NegBinII, both based on a Negative Binomial distribution for the number of patents but depending on different specifications for the conditional variance, see e.g. Johnson et al. (2005) and Rigby and Stasinopoulos (2009).
Observe that in the statistical literature the names of the distribution functions pertaining to these two models are interchanged. So the model NBI will be used to estimate what in the econometric literature is known as NegBinII, and vice versa; later on the econometric convention is used for naming model objects.
Negative Binomial II - 1st estimation
The function glm also provides the family negative.binomial(1), which is used to estimate the NegBinII model. The corresponding call and output are:
> NegBinII <- glm(P91 ~ LR91 + AEROSP + CHEMIST + COMPUTER +
MACHINES + VEHICLES + JAPAN + US, family = negative.binomial(1),
data = patents)
> summary(NegBinII)
Call:
glm(formula = P91 ~ LR91 + AEROSP + CHEMIST + COMPUTER + MACHINES +
VEHICLES + JAPAN + US, family = negative.binomial(1), data = patents)
Deviance Residuals:
    Min      1Q  Median      3Q     Max
-2.9205 -1.1676 -0.3058  0.3976  2.9594
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.32576    0.55216  -0.590   0.5560
LR91         0.83129    0.07861  10.575  < 2e-16 ***
AEROSP      -1.49826    0.37399  -4.006 9.17e-05 ***
CHEMIST      0.48779    0.26256   1.858   0.0649 .
COMPUTER    -0.16953    0.27553  -0.615   0.5392
MACHINES     0.05990    0.27957   0.214   0.8306
VEHICLES    -1.53392    0.36098  -4.249 3.51e-05 ***
JAPAN        0.25361    0.40086   0.633   0.5278
US          -0.58792    0.28228  -2.083   0.0388 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

    Null deviance: ...  on 180  degrees of freedom
Residual deviance: ...  on 172  degrees of freedom

The function pR2 returns for this model, among the other measures, a McFadden pseudo-R2 of 0.1430963 and r2ML = 0.7809187.
Observe that the estimates and their standard errors differ somewhat from those provided in Verbeek's Table 7.8; we have to bear in mind that glm uses an estimation method based upon iteratively reweighted least squares and not maximum likelihood.
Verbeek's coefficient estimates for the NegBinII model may be reproduced by applying the function glm.nb available in the package MASS.
Negative Binomial II - 2nd estimation
> library(MASS)
> NegBinII.glm.nb <- glm.nb(P91 ~ LR91 + AEROSP + CHEMIST +
COMPUTER + MACHINES + VEHICLES + JAPAN + US,
data = patents)
> summary(NegBinII.glm.nb)
Call:
glm.nb(formula = P91 ~ LR91 + AEROSP + CHEMIST + COMPUTER + MACHINES +
VEHICLES + JAPAN + US, data = patents, init.theta = 0.7686768238,
link = log)
Deviance Residuals:
    Min      1Q  Median      3Q     Max
-2.6373 -1.0264 -0.2694  0.3438  2.5966

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.32462    0.56234  -0.577   0.5638
LR91         0.83148    0.08006  10.386  < 2e-16 ***
AEROSP      -1.49746    0.37691  -3.973 7.10e-05 ***
CHEMIST      0.48861    0.26788   1.824   0.0682 .
COMPUTER    -0.17355    0.28086  -0.618   0.5366
MACHINES     0.05926    0.28429   0.208   0.8349
VEHICLES    -1.53065    0.36852  -4.153 3.27e-05 ***
JAPAN        0.25222    0.40983   0.615   0.5383
US          -0.59050    0.28834  -2.048   0.0406 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

    Null deviance: ...  on 180  degrees of freedom
Residual deviance: ...  on 172  degrees of freedom

              Theta:  0.7687
          Std. Err.:  0.0812
 2 x log-likelihood:  -1639.1910
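Negative Binomial I
The NegBinI model is estimated with the function gamlss of the package gamlss, specifying the NBII family (recall the naming interchange noted above); the call, as reproduced in the output below, is:

> library(gamlss)
> NegBinI <- gamlss(P91 ~ LR91 + AEROSP + CHEMIST + COMPUTER + MACHINES +
      VEHICLES + JAPAN + US, sigma.formula = ~1, family = NBII,
      data = patents)
> summary(NegBinI)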
Call:
gamlss(formula = P91 ~ LR91 + AEROSP + CHEMIST + COMPUTER + MACHINES +
    VEHICLES + JAPAN + US, sigma.formula = ~1, family = NBII,
    data = patents)

Fitting method: RS()

-------------------------------------------------------------------
Mu link function:  log
Mu Coefficients:
            Estimate Std. Error t value  Pr(>|t|)
(Intercept)   0.6962    0.50450  1.3799 1.694e-01
LR91          0.5779    0.06712  8.6095 4.652e-15
AEROSP       -0.7873    0.33707 -2.3359 2.066e-02
CHEMIST       0.7321    0.18525  3.9518 1.133e-04
COMPUTER      0.1440    0.20644  0.6978 4.863e-01
MACHINES      0.1549    0.25490  0.6076 5.443e-01
VEHICLES     -0.8177    0.26869 -3.0433 2.709e-03
JAPAN         0.4012    0.25742  1.5584 1.210e-01
US            0.1581    0.19856  0.7964 4.269e-01
-------------------------------------------------------------------
Sigma link function:  log
Sigma Coefficients:
             Estimate Std. Error   t value  Pr(>|t|)
(Intercept) 4.560e+00  1.470e-01 3.103e+01 3.854e-72
-------------------------------------------------------------------
No. of observations in the fit: 181
Degrees of Freedom for the fit: 10
      Residual Deg. of Freedom: 171
                      at cycle: 13

Global Deviance:     1696.391
            AIC:     1716.391
            SBC:     1748.376
*******************************************************************
> exp(NegBinI$sigma.coef)
(Intercept)
   95.56499
> pR2(NegBinI)
GAMLSS-RS iteration 1: Global Deviance = 1790.816
GAMLSS-RS iteration 2: Global Deviance = 1787.876
GAMLSS-RS iteration 3: Global Deviance = 1786.311
GAMLSS-RS iteration 4: Global Deviance = 1785.551
...
The analogous gamlss fit with the NBI family (the econometric NegBinII) yields for the dispersion:

Sigma Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.26311    0.10568 2.48971  0.01374
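The definition of the likelihood function llI used below is not given explicitly; a possible sketch, assuming a Negative Binomial log-likelihood with mean exp(X b) and NegBinI variance mu*(1 + d2), is:

> X <- model.matrix(poissonfit)    # regressors, including the intercept
> llI <- function(b1, b2, b3, b4, b5, b6, b7, b8, b9, d2) {
      mu <- exp(X %*% c(b1, b2, b3, b4, b5, b6, b7, b8, b9))
      # dnbinom with size = mu/d2 and prob = 1/(1 + d2) has mean mu
      # and variance mu * (1 + d2), the NegBinI specification
      -sum(dnbinom(patents$P91, size = mu/d2, prob = 1/(1 + d2), log = TRUE))
  }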
We can use the function mle2 in the package bbmle to obtain the maximum likelihood
parameter estimates with their standard errors and significance information.
> library(bbmle)
> b.start <- as.list(c(NegBinI$mu.coef, exp(NegBinI$sigma.coef)))
> names(b.start) <- c(paste("b", 1:9, sep = ""), "d2")
> NegBinIout <- mle2(llI, start = b.start)
> summary(NegBinIout)
Maximum likelihood estimation
Call:
mle2(minuslogl = llI, start = b.start)
Coefficients:
     Estimate Std. Error z value     Pr(z)
b1   0.695898   0.507413  1.3715   0.17023
b2   0.577782   0.067676  8.5375 < 2.2e-16 ***
b3  -0.786540   0.336884 -2.3348   0.01956 *
b4   0.732261   0.185290  3.9520 7.751e-05 ***
b5   0.144167   0.206440  0.6984   0.48496
b6   0.154857   0.255039  0.6072   0.54372
b7  -0.816659   0.268684 -3.0395   0.00237 **
b8   0.400487   0.257415  1.5558   0.11976
b9   0.158445   0.198506  0.7982   0.42476
d2  95.564819  14.100465  6.7774 1.223e-11 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

-2 log L: 1696.391
By changing the relative tolerance value and incrementing the maximum number of
iterations we have:
> NegBinIout <- mle2(llI, start = b.start, control=list(reltol = 1e-15,
maxit = 50000))
> summary(NegBinIout)
Maximum likelihood estimation
Call:
mle2(minuslogl = llI, start = b.start, control = list(reltol = 1e-15,
maxit = 50000))
Coefficients:
     Estimate Std. Error z value     Pr(z)
b1   0.690189   0.506968  1.3614  0.173385
b2   0.578394   0.067628  8.5525 < 2.2e-16 ***
b3  -0.786539   0.336789 -2.3354  0.019522 *
b4   0.733320   0.185161  3.9604 7.481e-05 ***
b5   0.144998   0.206314  0.7028  0.482179
b6   0.155770   0.254981  0.6109  0.541259
b7  -0.817559   0.268611 -3.0437  0.002337 **
b8   0.400543   0.257280  1.5568  0.119508
b9   0.158789   0.198397  0.8004  0.423502
d2  95.243438  14.006343  6.8000 1.046e-11 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

-2 log L: 1696.39
> b.start <- as.list(c(round(NegBinI$mu.coef, 2), exp(NegBinII$sigma.coef)))
> b.start <- as.list(c(NegBinII.glm.nb$coef, 1/NegBinII.glm.nb$theta))
> names(b.start) <- c(paste("b", 1:9, sep = ""), "d2")
> NegBinIIout <- mle2(llII, start = b.start, control = list(reltol = 1e-15,
      maxit = 50000))
> summary(NegBinIIout)
> summary(NegBinIIout)
Maximum likelihood estimation
Call:
mle2(minuslogl = llII, start = b.start, control = list(reltol = 1e-15,
maxit = 50000))
Coefficients:
     Estimate Std. Error z value     Pr(z)
b1  -0.324623   0.498168 -0.6516   0.51464
b2   0.831479   0.076595 10.8555 < 2.2e-16 ***
b3  -1.497458   0.377230 -3.9696 7.199e-05 ***
b4   0.488611   0.256769  1.9029   0.05705 .
b5  -0.173551   0.298809 -0.5808   0.56137
b6   0.059264   0.279293  0.2122   0.83196
b7  -1.530649   0.373899 -4.0937 4.245e-05 ***
b8   0.252223   0.426426  0.5915   0.55420
b9  -0.590497   0.278778 -2.1182   0.03416 *
d2   1.300937   0.137459  9.4641 < 2.2e-16 ***

-2 log L: 1639.191
Parameter estimates and their standard errors are now closer to the values in Verbeek's Table 7.8.
7.6 Tobit Models: Expenditures on Alcohol and Tobacco

Data are available in the file TOBACCO.WF1, which is an EViews work file. The function unzip extracts the file from the compressed archive ch07.zip.
> library(hexView)
> at <- readEViews(unzip("ch07.zip", "Chapter 7/tobacco.wf1"))
Skipping boilerplate variable
Skipping boilerplate variable
The file tobacco contains information about 2724 Belgian households, taken from the Belgian household budget survey of 1995/96. The data were kindly supplied by the National Institute of Statistics (NIS), Belgium. The following variables are present (some are dummy variables):
D1 dummy, 1 if share1>0
D2 dummy, 1 if share2>0
The shares of families having zero expenditures on alcohol and tobacco may be determined as:
> sum(at$SHARE1 == 0)/length(at$SHARE1)
[1] 0.171072
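The Tobit models are estimated with the function tobit, available in the package AER; the call for the alcohol share (whose output follows) mirrors the tobacco call reported further below:

> library(AER)
> at7.9a <- tobit(SHARE1 ~ AGE + NADULTS + NKIDS + NKIDS2 + LNX +
      AGE:LNX + NADULTS:LNX, data = at)
> summary(at7.9a)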
Observations:
         Total  Left-censored     Uncensored Right-censored
          2724            466           2258              0

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)      ...        ...     ...  0.000277 ***
AGE              ...        ...     ...  0.214993
NADULTS          ...        ...     ...  0.084988 .
NKIDS            ...        ...     ...  1.27e-05 ***
NKIDS2           ...        ...     ...  0.103652
LNX              ...        ...     ...  8.17e-05 ***
AGE:LNX          ...        ...     ...  0.312072
NADULTS:LNX      ...        ...     ...  0.066051 .
Log(scale)       ...        ...     ...   < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Scale: 0.02442

Gaussian distribution
Number of Newton-Raphson Iterations: 3
Log-likelihood: 4755 on 9 Df
Wald-statistic: 118.8 on 7 Df, p-value: < 2.22e-16
> at7.9t <- tobit(SHARE2 ~ AGE + NADULTS + NKIDS +
NKIDS2 + LNX + AGE:LNX + NADULTS:LNX, data = at)
> summary(at7.9t)
Call:
tobit(formula = SHARE2 ~ AGE + NADULTS + NKIDS + NKIDS2 + LNX +
AGE:LNX + NADULTS:LNX, data = at)
Observations:
         Total  Left-censored     Uncensored Right-censored
          2724           1688           1036              0
Coefficients:
              Estimate Std. Error  z value Pr(>|z|)
(Intercept)  0.5899802  0.0934269    6.315 2.70e-10 ***
AGE         -0.1258530  0.0241783   -5.205 1.94e-07 ***
NADULTS      0.0153697  0.0380475    0.404  0.68624
NKIDS        0.0042697  0.0013247    3.223  0.00127 **
NKIDS2      -0.0099719  0.0054713   -1.823  0.06837 .
LNX         -0.0444314  0.0068893   -6.449 1.12e-10 ***
AGE:LNX      0.0088221  0.0017832    4.947 7.52e-07 ***
NADULTS:LNX -0.0006007  0.0027501   -0.218  0.82709
Log(scale)  -3.0366568  0.0246517 -123.183  < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
--Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Scale: 0.048
Gaussian distribution
Number of Newton-Raphson Iterations: 3
Log-likelihood: 758.7 on 9 Df
Wald-statistic: 171.3 on 7 Df, p-value: < 2.22e-16
To obtain the total expenditure elasticities evaluated at the sample averages of those households that have positive expenditures, we first have to adapt Verbeek's relationship (7.72) to the present context. The budget share for good j is
$$ w_j = \frac{p_j q_j}{x}, $$
so that $q_j = w_j(x)\,x/p_j$ and, since in the estimated model $\partial w_j/\partial x = (\beta_{6j} + \beta_{7j}\,\mathrm{AGE} + \beta_{8j}\,\mathrm{NADULTS})/x$,
$$ \frac{dq_j}{dx} = \frac{w_j}{p_j} + \frac{1}{p_j}\,(\beta_{6j} + \beta_{7j}\,\mathrm{AGE} + \beta_{8j}\,\mathrm{NADULTS}). $$
The elasticity is then
$$ \frac{dq_j}{dx}\,\frac{x}{q_j} = \frac{w_j\,x}{p_j q_j} + \frac{x}{p_j q_j}\,(\beta_{6j} + \beta_{7j}\,\mathrm{AGE} + \beta_{8j}\,\mathrm{NADULTS}) = 1 + \frac{\beta_{6j} + \beta_{7j}\,\mathrm{AGE} + \beta_{8j}\,\mathrm{NADULTS}}{w_j}. $$
Let averages be the vector containing the average levels of the independent variables in the model we have estimated for alcohol:
> (averages <- apply(model.matrix(at7.9a)[at$SHARE1 > 0, ], 2, mean))
(Intercept)         AGE     NADULTS       NKIDS      NKIDS2
 1.00000000  2.43489814  2.00177148  0.56864482  0.04384411
        LNX     AGE:LNX NADULTS:LNX
13.77622903 33.42541096 27.74124174
The function model.matrix extracts the matrix of the regressors (including the
constant) of model at7.9a.
[at$SHARE1>0,] selects the observations with a positive expenditure in alcohol.
The column averages are obtained by means of the function apply(,2,mean).
The elements 2 and 3 of averages correspond respectively to the averages of the variables AGE and NADULTS, and coef(at7.9a)[6:8] are the $\beta_{6j}$, $\beta_{7j}$ and $\beta_{8j}$ coefficient estimates for alcohol.
The elasticity results:
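A minimal sketch of the computation, mirroring the one used for the OLS fits in the next section (the elasticity formula derived above evaluated at the averages):

> w_j <- mean(at$SHARE1[at$SHARE1 > 0])
> 1 + sum(c(1, averages[2:3]) * coef(at7.9a)[6:8])/w_j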
7.7 Sample Selection: the Heckman Procedure

For data reading and description see the preceding Section 7.6.
Verbeek suggests estimating the Engel curve, by means of OLS, only for the statistical units that have a positive budget share. (As observed above, since AGE is recorded in classes, we should transform it into a factor: at$AGE <- as.factor(at$AGE).) To this aim we can use the function lm with the argument subset.
> at7.10a <- lm(SHARE1 ~ AGE + NADULTS + NKIDS + NKIDS2 +
LNX + AGE:LNX + NADULTS:LNX, data = at, sub = which(SHARE1 >
0))
> at7.10t <- lm(SHARE2 ~ AGE + NADULTS + NKIDS + NKIDS2 +
LNX + AGE:LNX + NADULTS:LNX, data = at, sub = which(SHARE2 >
0))
> library(memisc)
> mtable(at7.10a, at7.10t)
Calls:
at7.10a: lm(formula = SHARE1 ~ AGE + NADULTS + NKIDS + NKIDS2 + LNX +
    AGE:LNX + NADULTS:LNX, data = at, subset = which(SHARE1 > 0))
at7.10t: lm(formula = SHARE2 ~ AGE + NADULTS + NKIDS + NKIDS2 + LNX +
    AGE:LNX + NADULTS:LNX, data = at, subset = which(SHARE2 > 0))

=====================================
                  at7.10a    at7.10t
-------------------------------------
(Intercept)        0.053      0.490***
                  (0.044)    (0.074)
AGE                0.008     -0.031
                  (0.011)    (0.021)
NADULTS           -0.013     -0.013
                  (0.016)    (0.032)
NKIDS             -0.002***   0.001
                  (0.001)    (0.001)
NKIDS2            -0.002     -0.003
                  (0.002)    (0.005)
LNX               -0.002     -0.034***
                  (0.003)    (0.005)
AGE x LNX         -0.000      0.002
                  (0.001)    (0.002)
NADULTS x LNX      0.001      0.001
                  (0.001)    (0.002)
-------------------------------------
R-squared          0.051      0.154
adj. R-squared     0.048      0.148
sigma              0.022      0.029
F                 17.270     26.732
p                  0.000      0.000
Log-likelihood  5467.424   2200.044
Deviance           1.043      0.868
AIC           -10916.849  -4382.088
BIC           -10865.348  -4337.599
N                   2258       1036
=====================================
Detailed results about single models can be obtained with summary(at7.10a) and
summary(at7.10t).
To obtain the total expenditure elasticities evaluated at the sample averages of those
households that have positive expenditures, we have to follow a procedure similar to
that presented above.
> averages <- apply(model.matrix(at7.10a), 2, mean)
> w_j <- mean(at$SHARE1[at$SHARE1 > 0])
> 1 + sum(c(1, averages[2:3]) * coef(at7.10a)[6:8])/w_j
[1] 0.922836
> averages <- apply(model.matrix(at7.10t), 2, mean)
> w_j <- mean(at$SHARE2[at$SHARE2 > 0])
> 1 + sum(c(1, averages[2:3]) * coef(at7.10t)[6:8])/w_j
[1] 0.1765833
Verbeek then suggests the estimation of two probit models.
> at7.11a <- glm(sign(SHARE1) ~ AGE + NADULTS + NKIDS +
      NKIDS2 + LNX + AGE:LNX + NADULTS:LNX + BLUECOL +
      WHITECOL, family = binomial(link = probit), data = at)
> at7.11t <- glm(sign(SHARE2) ~ AGE + NADULTS + NKIDS +
      NKIDS2 + LNX + AGE:LNX + NADULTS:LNX + BLUECOL +
      WHITECOL, family = binomial(link = probit), data = at)
> mtable(at7.11a, at7.11t)
...
N                        2724       2724
===========================================
Detailed results pertaining to the single models can be obtained with summary(at7.11a) and summary(at7.11t), and the goodness-of-fit statistics with pR2(at7.11a) and pR2(at7.11t).
Consider a household consisting of two adults, the head being a 35-year-old (belonging to the second AGE class) blue-collar worker, with two children older than 2 and total expenditures equal to the overall sample average. The implied estimated probabilities of a positive budget share of alcohol and tobacco, at the average expenditure level and assuming total expenditures increase by 10%, are respectively:
> pnorm(predict(at7.11a, data.frame(AGE = 2, NADULTS = 2,
NKIDS = 2, NKIDS2 = 0, LNX = mean(at$LNX), BLUECOL = 1,
WHITECOL = 0)))
1
0.800741
> pnorm(predict(at7.11a, data.frame(AGE = 2, NADULTS = 2,
NKIDS = 2, NKIDS2 = 0, LNX = mean(at$LNX) + log(1.1),
BLUECOL = 1, WHITECOL = 0)))
1
0.8215568
> pnorm(predict(at7.11t, data.frame(AGE = 2, NADULTS = 2,
NKIDS = 2, NKIDS2 = 0, LNX = mean(at$LNX), BLUECOL = 1,
WHITECOL = 0)))
1
0.5171916
> pnorm(predict(at7.11t, data.frame(AGE = 2, NADULTS = 2,
NKIDS = 2, NKIDS2 = 0, LNX = mean(at$LNX) + log(1.1),
BLUECOL = 1, WHITECOL = 0)))
1
0.5045241
The estimated probabilities for alcoholic beverages are slightly different from those obtained by Verbeek.
Observe that the standard errors of the parameter estimates obtained with glm
are somewhat different from those obtained by Verbeek since glm uses iteratively
reweighted least squares and not maximum likelihood as estimation method. The
probit function in the package sampleSelection provides maximum likelihood
estimates and standard errors.
> library(sampleSelection)
> at7.11aa <- probit(sign(SHARE1) ~ AGE + NADULTS +
NKIDS + NKIDS2 + LNX + AGE:LNX + NADULTS:LNX +
BLUECOL + WHITECOL, data = at)
> at7.11tt <- probit(sign(SHARE2) ~ AGE + NADULTS +
      NKIDS + NKIDS2 + LNX + AGE:LNX + NADULTS:LNX +
      BLUECOL + WHITECOL, data = at)
...
WHITECOL      0.021534   0.069428  0.3102 0.7564324
AGE:LNX       0.174736   0.041305  4.2303 2.334e-05 ***
NADULTS:LNX  -0.025340   0.062923 -0.4027 0.6871532
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Significance test:
chi2(9) = 108.9097 (p=2.450886e-19)
--------------------------------------------
Note that the functions mtable and predict cannot (at the moment) be applied to
objects resulting from sampleSelection::probit.
The Engel curves are finally re-estimated by Verbeek with the two-step estimation procedure proposed by Heckman. The function selection, available in the package sampleSelection, can be used. It depends on four main arguments: selection, a formula specifying the (probit) selection model; outcome, a formula relating the outcome to its explanatory variables; data, a data.frame containing the data to analyze; and method, specifying the estimation method, in our case "2step".
> library(sampleSelection)
> at7.12a <- selection(selection = sign(SHARE1) ~ AGE +
NADULTS + NKIDS + NKIDS2 + LNX + AGE:LNX + NADULTS:LNX +
BLUECOL + WHITECOL, outcome = SHARE1 ~ AGE +
NADULTS + NKIDS + NKIDS2 + LNX + AGE:LNX + NADULTS:LNX,
data = at, method = "2step")
> at7.12t <- selection(selection = sign(SHARE2) ~ AGE +
NADULTS + NKIDS + NKIDS2 + LNX + AGE:LNX + NADULTS:LNX +
BLUECOL + WHITECOL, outcome = SHARE2 ~ AGE +
NADULTS + NKIDS + NKIDS2 + LNX + AGE:LNX + NADULTS:LNX,
data = at, method = "2step")
> summary(at7.12a)
--------------------------------------------
Tobit 2 model (sample selection model)
2-step Heckman / heckit estimation
2724 observations (466 censored and 2258 observed)
21 free parameters (df = 2704)
Probit selection equation:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -15.88231    2.57393  -6.170 7.83e-10 ***
AGE           0.66785    0.65200   1.024   0.3058
NADULTS       2.25539    1.02453   2.201   0.0278 *
NKIDS        -0.07705    0.03725  -2.069   0.0387 *
NKIDS2       -0.18572    0.14083  -1.319   0.1874
LNX           1.23553    0.19130   6.459 1.25e-10 ***
BLUECOL      -0.06117    0.09777  -0.626   0.5316
WHITECOL      0.05056    0.08471   0.597   0.5506
AGE:LNX      -0.04480    0.04854  -0.923   0.3561
NADULTS:LNX  -0.16879    0.07423  -2.274   0.0231 *
Outcome equation:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.0542675  0.1329935   0.408  0.68327
AGE          0.0077095  0.0130468   0.591  0.55463
NADULTS     -0.0133444  0.0247045  -0.540  0.58913
NKIDS       -0.0020244  0.0007637  -2.651  0.00808 **
NKIDS2      -0.0024127  0.0025715  -0.938  0.34821
LNX         -0.0024288  0.0093674  -0.259  0.79544
AGE:LNX     -0.0004044  0.0009420  -0.429  0.66773
NADULTS:LNX  0.0008461  0.0018047   0.469  0.63922
Multiple R-Squared: 0.051,  Adjusted R-Squared: 0.0476
Error terms:
               Estimate Std. Error t value Pr(>|t|)
invMillsRatio -0.0002045  0.0165285  -0.012     0.99
sigma          0.0214876         NA      NA       NA
rho           -0.0095160         NA      NA       NA
--------------------------------------------
> summary(at7.12t)
--------------------------------------------
Tobit 2 model (sample selection model)
2-step Heckman / heckit estimation
2724 observations (1688 censored and 1036 observed)
21 free parameters (df = 2704)
Probit selection equation:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   8.24447    2.21108   3.729 0.000196 ***
AGE          -2.48300    0.55960  -4.437 9.48e-06 ***
NADULTS       0.48520    0.87174   0.557 0.577855
NKIDS         0.08128    0.03083   2.637 0.008425 **
NKIDS2       -0.21166    0.12305  -1.720 0.085522 .
LNX          -0.63208    0.16320  -3.873 0.000110 ***
BLUECOL       0.20642    0.08343   2.474 0.013418 *
WHITECOL      0.02153    0.06943   0.310 0.756456
AGE:LNX       0.17474    0.04131   4.230 2.41e-05 ***
NADULTS:LNX  -0.02534    0.06292  -0.403 0.687185
Outcome equation:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.4515813  0.1086284   4.157 3.32e-05 ***
AGE         -0.0172991  0.0358591  -0.482 0.629547
NADULTS     -0.0174378  0.0339635  -0.513 0.607693
NKIDS        0.0007643  0.0015130   0.505 0.613471
NKIDS2      -0.0020755  0.0053883  -0.385 0.700128
LNX         -0.0301094  0.0090459  -3.329 0.000885 ***
AGE:LNX      0.0012243  0.0025454   0.481 0.630568
...
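A robust version of the two-step procedure is provided by the function heckitrob. The robust re-estimation of the alcohol equation can be sketched by mirroring the tobacco call reported further below (the object name at7.12arob is illustrative):

> with(at, {
      at7.12arob <- heckitrob(selection = sign(SHARE1) ~ AGE + NADULTS +
          NKIDS + NKIDS2 + LNX + AGE:LNX + NADULTS:LNX + BLUECOL +
          WHITECOL, outcome = SHARE1 ~ AGE + NADULTS + NKIDS + NKIDS2 +
          LNX + AGE:LNX + NADULTS:LNX,
          control = heckitrob.control(weights.x1 = "robCov"))
      summary(at7.12arob)
  })

The tail of the corresponding output is: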
XSBLUECOL     -0.06540005 0.10183231 -0.6422 5.21e-01
XSWHITECOL     0.01981345 0.08890862  0.2229 8.24e-01
XSAGE:LNX     -0.04739224 0.05139110 -0.9222 3.56e-01
XSNADULTS:LNX -0.17079467 0.07741752 -2.2060 2.74e-02 *
Outcome equation:
                   Estimate   Std.Error  t-value p-value
XO(Intercept)  0.0481414810 0.506033190  0.09514   0.924
XOAGE         -0.0056858851 0.030390752 -0.18710   0.852
XONADULTS     -0.0044571146 0.075028273 -0.05941   0.953
XONKIDS       -0.0013210916 0.002505419 -0.52730   0.598
XONKIDS2      -0.0018052433 0.005900382 -0.30600   0.760
XOLNX         -0.0022514209 0.035468795 -0.06348   0.949
XOAGE:LNX      0.0005240257 0.002130696  0.24590   0.806
XONADULTS:LNX  0.0002636993 0.005524299  0.04773   0.962
imrData$IMR1  -0.0011748054 0.067667883 -0.01736   0.986
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
-------------------------------------------------------------
> with(at, {
      at7.12trob <- heckitrob(selection = sign(SHARE2) ~
          AGE + NADULTS + NKIDS + NKIDS2 + LNX + AGE:LNX +
          NADULTS:LNX + BLUECOL + WHITECOL, outcome = SHARE2 ~
          AGE + NADULTS + NKIDS + NKIDS2 + LNX + AGE:LNX +
          NADULTS:LNX,
          control = heckitrob.control(weights.x1 = "robCov"))
      summary(at7.12trob)
  })
-------------------------------------------------------------
Robust 2-step Heckman / heckit M-estimation
Probit selection equation:
                 Estimate  Std.Error t-value  p-value
XS(Intercept)  8.28126218 2.22147554  3.7280 1.93e-04 ***
XSAGE         -2.48452150 0.56216375 -4.4200 9.89e-06 ***
XSNADULTS      0.46546383 0.87382559  0.5327 5.94e-01
XSNKIDS        0.08116955 0.03087415  2.6290 8.56e-03 **
XSNKIDS2      -0.19818390 0.12341443 -1.6060 1.08e-01
XSLNX         -0.63408431 0.16394126 -3.8680 1.10e-04 ***
XSBLUECOL      0.20400053 0.08368888  2.4380 1.48e-02 *
XSWHITECOL     0.01799770 0.06979220  0.2579 7.97e-01
XSAGE:LNX      0.17500330 0.04148927  4.2180 2.46e-05 ***
XSNADULTS:LNX -0.02425829 0.06306736 -0.3846 7.01e-01
Outcome equation:
                   Estimate   Std.Error t-value p-value
XO(Intercept)  0.3979558830 0.162089777  2.4550  0.0141 *
XOAGE         -0.0323871073 0.060120904 -0.5387  0.5900
XONADULTS     -0.0117157109 0.036582449 -0.3203  0.7490
XONKIDS        0.0003799716 0.002269950  0.1674  0.8670
XONKIDS2      -0.0024720481 0.006241712 -0.3961  0.6920
XOLNX         -0.0264461249 0.014363553 -1.8410  0.0656 .
XOAGE:LNX      0.0022926188 0.004209442  0.5446  0.5860
XONADULTS:LNX  0.0009147453 0.002477952  0.3692  0.7120
imrData$IMR1  -0.0074295332 0.035604993 -0.2087  0.8350
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
-------------------------------------------------------------
8
Univariate Time Series Models
8.1 Simulation of Some Stochastic Processes
8.1.1 White Noise
Let {εt} be a Gaussian White Noise process1, i.e. a sequence of i.i.d. random variables $\epsilon_t \sim N(\mu = 0, \sigma^2)$.
To create a time series E with 500 Normal pseudo-random numbers with mean $\mu = 0$ and variance $\sigma^2 = 3$ use the code
> E <- ts(data = rnorm(n = 500, mean = 0, sd = 3^0.5))
The function ts is used to create time series objects; its main arguments are data, consisting of a numeric vector or matrix, start, defining the time of the first
observation, which can be a single number or a vector of two integers (a natural
time unit and the number of samples into the time unit, e.g. c(2012,2) for 2012
February in a monthly series), end defining the time of the last observation, specified
in the same way as start, and one of the two options: frequency, the number of
observations per unit of time, or deltat, the fraction of the sampling period between
successive observations; e.g., 1/12 for monthly data.
By default both frequency and deltat are set to 1.
The graphical representation of the stochastic process {εt} shows no regular pattern. See Figure 8.1, which can be obtained with the code:
> plot(E)
Observe that {εt} is a White Noise process when it consists of a sequence of uncorrelated random variables with mean 0 and the same variance. The Gaussian White Noise is an example of White Noise.
1 We address readers interested in the financial implementations of stochastic processes to the Rmetrics site (https://www.rmetrics.org). The e-book by Würtz et al. (2009) is a good reference presenting an overall introduction to the type of classes used by R for dealing with time series.
Figure 8.1  Plot of the simulated Gaussian White Noise series.
8.1.2 Autoregressive Processes

Consider the stochastic finite difference equation
$$ y_t = \theta_1 y_{t-1} + \epsilon_t, \qquad (8.1) $$
where {εt} is a White Noise process. A realization can be generated through the recursions
y[1] = θ1 y0 + E[1]
y[2] = θ1 y[1] + E[2]
...
y[n] = θ1 y[n-1] + E[n]          (8.2)
Initial data need to be dropped to make their memory effect vanish. So we simulate
a realization longer than n.
> theta1 <- 0.9
> n <- 500
> E <- ts(rnorm(n + 100, mean = 0, sd = 1))
We can now define the initial condition y0 for {yt }, initialize a variable y with the
same length as E and obtain y[1]. Relationships (8.2) are then implemented by means
of a for cycle.
> y0 <- 0
> y <- E * 0
> y[1] <- theta1 * y0 + E[1]
> for (t in 2:(n + 100)) {
      y[t] <- theta1 * y[t - 1] + E[t]
  }
> y <- y[(length(y) - n + 1):length(y)]
Observe that if, as we assumed, |θ1| < 1, we can obtain, by recursive substitutions, the causal representation of {yt} as a linear filter based on the generating process {εt}:
$$ y_t = \sum_{i=0}^{\infty} \theta_1^i\, \epsilon_{t-i}. $$

Figure 8.2  Plot of the simulated AR(1) series.
An autoregressive process of order 2, AR(2), is the asymptotically, weakly stationary solution of the stochastic finite difference equation
$$ y_t = \theta_1 y_{t-1} + \theta_2 y_{t-2} + \epsilon_t, \qquad (8.3) $$
where {εt} is a White Noise process and the roots of the characteristic equation
$$ 1 - \theta_1 z - \theta_2 z^2 = 0 $$
lie outside the unit circle3.
To simulate a realization from an AR(2) stochastic process we need the realizations of a White Noise process {εt} and 2 initial conditions for {yt}, say y_1 and y0, the values of {yt} at times t = -1 and 0:
y[1] = θ1 y0 + θ2 y_1 + E[1]
y[2] = θ1 y[1] + θ2 y0 + E[2]
y[3] = θ1 y[2] + θ2 y[1] + E[3]
y[4] = θ1 y[3] + θ2 y[2] + E[4]
...
y[n] = θ1 y[n-1] + θ2 y[n-2] + E[n]          (8.4)
Initial data are dropped to make their memory effect vanish
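A sketch of the simulation, with illustrative values of theta1 and theta2 (any pair satisfying the stationarity condition will do), following the same scheme used for the AR(1) process:

> theta1 <- 0.6        # illustrative coefficient values
> theta2 <- 0.3
> n <- 500
> E <- ts(rnorm(n + 100))
> y_1 <- 0
> y0 <- 0
> y <- E * 0
> y[1] <- theta1 * y0 + theta2 * y_1 + E[1]
> y[2] <- theta1 * y[1] + theta2 * y0 + E[2]
> for (t in 3:(n + 100)) {
      y[t] <- theta1 * y[t - 1] + theta2 * y[t - 2] + E[t]
  }
> y <- y[(length(y) - n + 1):length(y)]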
By recursive substitutions one obtains again a causal linear filter representation
$$ y_t = \sum_{i=0}^{\infty} \psi_i\, \epsilon_{t-i}. $$

3 Observe that sometimes the stationarity condition is expressed by means of the auxiliary equation $z^2 - \theta_1 z - \theta_2 = 0$, whose roots must lie inside the unit circle for the process {yt} to be stationary.
Observe that to define the filter we do not have to compute the coefficients ψi,
which could also be obtained by recursive substitutions of relationship (8.3): we only
need to specify, as for the AR(1) process, the autoregressive coefficients and state
"recursive" as method.
> y <- ts(filter(E, filter = c(theta1, theta2), method =
"recursive")[(length(E) - n + 1):length(E)])
We can also make direct use of the function arima.sim
> y <- arima.sim(model = list(ar = c(theta1, theta2)),
n = 500)
8.1.3 Moving Average Processes

The weakly stationary solution of the stochastic difference equation
$$ y_t = \epsilon_t + \alpha_1 \epsilon_{t-1}, \qquad (8.5) $$
where {εt} is a White Noise process, is a Moving Average process of order 1, MA(1). To simulate a realization we need the realizations of a White Noise process and an initial condition e0 for the value of {εt} at time 0; the values of {yt} can then be obtained with a for cycle.
Observe that the for cycle can be substituted with the following vector instruction:
> y <- E * 0
> y[1] <- E[1] + alpha1 * e0
> y[-1] <- E[-1] + alpha1 * E[-length(y)]
Regarding the linear filter representation observe that
$$ y_t = \sum_{i=0}^{1} \alpha_i\, \epsilon_{t-i}, \qquad \alpha_0 = 1. $$
This can be implemented with the function filter by specifying the moving average coefficients, "convolution" as method and4 sides = 1.
> y <- filter(E, filter = c(1, alpha1), method = "convolution",
sides = 1)
The series obtained with filter differs from the one built with the for cycle only in
the first observation y[1], since filter uses this observation as the initial condition.
Observe that initial conditions do not have any effect on the evolution of a moving
average process.
We can also make direct use of the function arima.sim:
> y <- arima.sim(model = list(ma = alpha1), n = 100)
The reader can observe what happens by varying α1 in the set {-0.8, -0.6, ..., 0.6, 0.8}.
Simulation from a MA(2) Process
The unique, asymptotic, weakly stationary solution of the following stochastic finite difference equation, where {εt} is a White Noise,
$$ y_t = \epsilon_t + \alpha_1 \epsilon_{t-1} + \alpha_2 \epsilon_{t-2}, $$
is a stochastic process {yt} named Moving Average process of order 2, denoted MA(2), which is invertible if the roots of the characteristic equation
$$ 1 + \alpha_1 z + \alpha_2 z^2 = 0 $$
lie outside the unit circle.
4 With sides = 2 (the default) the convolution filter is centred around lag 0, computing $y_t = \sum_{i=-k}^{k} \alpha_i \epsilon_{t-i}$, with $\alpha_0 = 1$.

Figure 8.3  Plot of the simulated MA series.

To simulate a realization from a MA(2) stochastic process we need the realizations of a White Noise process {εt} and two initial conditions, say e0 and e_1, for the values of {εt} at times 0 and -1:
y[1] = e[1] + α1 e0 + α2 e_1
y[2] = e[2] + α1 e[1] + α2 e0
y[3] = e[3] + α1 e[2] + α2 e[1]
y[4] = e[4] + α1 e[3] + α2 e[2]
...
y[n] = e[n] + α1 e[n-1] + α2 e[n-2]          (8.6)
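A sketch of the simulation (the values of alpha1 and alpha2 are illustrative):

> alpha1 <- 0.5
> alpha2 <- 0.3
> E <- ts(rnorm(100))
> e0 <- 0
> e_1 <- 0
> y <- E * 0
> y[1] <- E[1] + alpha1 * e0 + alpha2 * e_1
> y[2] <- E[2] + alpha1 * E[1] + alpha2 * e0
> for (t in 3:length(E)) {
      y[t] <- E[t] + alpha1 * E[t - 1] + alpha2 * E[t - 2]
  }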
We can also use the filter function, with method="convolution" and obtain the
same series except for the 2 initial values.
> y <- filter(E, filter = c(1, alpha1, alpha2), method = "convolution",
sides = 1)
The function arima.sim can also be used
> y <- arima.sim(model = list(ma = c(alpha1, alpha2)),
n = 100)
8.1.4 An AR(1) Process with Drift

Let us now simulate a realization from the stochastic process {Yt} defined by the following stochastic finite difference equation:
$$ Y_t = \alpha + \theta_1 Y_{t-1} + \epsilon_t $$
with α = 2 and θ1 = 0.8; {εt} is an assigned Gaussian White Noise process.
{Yt} is an autoregressive process of order 1 with the presence of a drift.
To simulate a realization from this process we can use the following code:
> n <- 500
> alpha <- 2
> theta1 <- 0.8
> yt <- arima.sim(model = list(ar = theta1), n = n,
      rand.gen = function(n) {
          alpha + rnorm(n)
      })
The argument rand.gen in arima.sim specifies the generating model for {εt}. Here we considered a sequence of i.i.d. normal pseudo-random values shifted by the constant alpha, which is equivalent to specifying a sequence of normal pseudo-random values with mean alpha:
> yt <- arima.sim(model = list(ar = theta1), n = n,
rand.gen = function(n) {
rnorm(n, mean = alpha)
})
Figure 8.4  Plot of the simulated AR(1) series with drift.
To obtain the plot of the time series {Yt }, see Fig. 8.4, use
> plot(yt, ylab = paste("Yt AR(1), theta1= ", theta1,
" drift= ", alpha))
We can compute the mean and the variance of Yt, which can be compared with their theoretical values
$$ E(Y_t) = \frac{\alpha}{1-\theta_1} = 10 \qquad\text{and}\qquad Var(Y_t) = \frac{\sigma^2}{1-\theta_1^2} = 2.778. $$
> mean(yt)
[1] 10.10053
> var(yt)
[1] 2.179227
Finally we repeat the procedure k = 200 times, obtain summary statistics for the mean and the variance, and plot a histogram of the estimates. See Figures 8.5 and 8.6.
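A minimal sketch of this replication exercise, consistent with the label a[1, ] appearing in Figure 8.5 (means stored in row 1 of a matrix a, variances in row 2):

> k <- 200
> a <- matrix(0, 2, k)
> for (i in 1:k) {
      yt <- arima.sim(model = list(ar = theta1), n = n,
          rand.gen = function(n) rnorm(n, mean = alpha))
      a[, i] <- c(mean(yt), var(yt))
  }
> summary(a[1, ]); summary(a[2, ])
> hist(a[1, ], freq = FALSE)    # Figure 8.5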
8.2 The Autocorrelation and Partial Autocorrelation Functions
We now consider the study of the autocorrelation function and of the partial
autocorrelation function for a time series with the aim of identifying the order of
an autoregressive or of a moving average model.
8.2.1 The AR(1) Case

Consider again the realization of the AR(1) process with drift generated above,
$$ Y_t = \alpha + \theta_1 Y_{t-1} + \epsilon_t, \qquad (8.7) $$
with θ1 = 0.8 and α = 2.
Figure 8.5  Histogram of a[1, ] (density scale).
Figure 8.6  Mean estimates distribution for 200 replications of the simulation of {yt} and kernel estimate of the density.
We can observe that the autocorrelation function has a quite slow decay; namely, for
the AR(1) process defined by relationship (8.7) we have:
$$Cor(Y_t, Y_{t-1}) = Cor(y_t, y_{t-1}) = \theta_1,$$
$$Cor(Y_t, Y_{t-2}) = Cor(y_t, y_{t-2}) = \theta_1^2,$$
$$\ldots, \qquad Cor(Y_t, Y_{t-k}) = Cor(y_t, y_{t-k}) = \theta_1^k,$$
thus the autocorrelation function of an AR(1) process shows an exponential decay,
which is not very fast for $\theta_1$ larger than 0.7.
The plot of the partial autocorrelation function, see Fig. 8.8, can be obtained with
> (ytpacf <- pacf(yt))

Partial autocorrelations of series 'yt', by lag
Figure 8.7 Autocorrelation function for the series yt
Figure 8.8 Partial autocorrelation for {yt}, AR(1) process with drift (θ1 = 0.8, drift = 2)
> t(acf2(yt, 20))
      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
ACF   0.80  0.62  0.48  0.38  0.32  0.25  0.19  0.13  0.07  0.03
PACF  0.80 -0.06  0.00  0.03  0.05 -0.07  0.00 -0.03 -0.06  0.01
     [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF   0.00 -0.04 -0.05 -0.08 -0.09 -0.11 -0.14 -0.15 -0.16 -0.16
PACF -0.03 -0.06  0.06 -0.07 -0.01 -0.04 -0.05  0.01 -0.05  0.02
Since the values of the autocorrelation and partial autocorrelation functions are
returned as columns of a matrix, we prefer, for typographical convenience, to use the
transpose operator when invoking the acf2 function (available in the package astsa).
The second argument of acf2 sets the number of lags to be considered when producing
the correlogram.
Observe that it is also possible to use the function PacfPlot available in the package
FitAR, which returns 95% confidence intervals for the partial autocorrelations. See
Fig. 8.10.
> library(FitAR)
> PacfPlot(yt)
Figure 8.9 Autocorrelation and Partial autocorrelation, via acf2, for {yt}, AR(1) process
with drift (θ1 = 0.8, drift = 2)
As already observed the autocorrelation function shows a slow decay, while the partial
autocorrelation function cuts off at lag 1; so we can conclude that an autoregressive
model of order 1 can fit the data.
8.2.2
Figure 8.10 Partial autocorrelation for {yt}, AR(1) process with drift (θ1 = 0.8, drift = 2)
(95% confidence intervals)
Try with
> yt <- arima.sim(model = list(ar = c(0.5,0.1,0.2,0,.2)), n = n,
rand.gen = function(n) {rnorm(n, mean = const)})
To understand why you did not succeed in the generation of a realization from this
process, we can check whether the roots of the characteristic equation allow for a
stationary solution of the stochastic difference equation. This can be done with the
function InvertibleQ in the package FitAR, which checks if the roots of the
characteristic equation, here
$$1 - 0.5z - 0.1z^2 - 0.2z^3 - 0.2z^5 = 0,$$
lie outside the unit circle.
> library(FitAR)
> InvertibleQ(c(0.5, 0.1, 0.2, 0, 0.2))
[1] FALSE
Let us now simulate realizations from autoregressive processes of order 5, with
drift = 2, for the following parameter configurations:

Process 1: θ1 = 0.5,  θ2 = 0.1,  θ3 = 0.2,  θ4 = 0,    θ5 = 0
Process 2: θ1 = 0,    θ2 = 0.1,  θ3 = 0.2,  θ4 = 0,    θ5 = 0.5
Process 3: θ1 = 0.1,  θ2 = 0.5,  θ3 = 0,    θ4 = 0.2,  θ5 = 0
Process 4: θ1 = 0.2,  θ2 = 0.1,  θ3 = 0,    θ4 = 0.5,  θ5 = 0        (8.8)
The autocorrelation function, the partial autocorrelation function and the confidence
intervals for the partial autocorrelation function will also be plotted.
We can define a function for simulating the four processes (the acf2 function used
below is provided by the package astsa).
> library(astsa)
> n <- 500
> const <- 2
> genera <- function(thetas) {
yt <<- arima.sim(model = thetas, n = n, rand.gen = function(n) {
rnorm(n, mean = const)
})
print("theta parameters: ")
print(paste("th", 1:length(thetas[[1]]), "=",
thetas[[1]], sep = "", collapse = ","))
print(t(acf2(yt, 20)))
}
See Figures 8.11, 8.12, 8.13, 8.14.
For these processes it is more complicated to derive the theoretical behaviour of
the autocorrelation function analytically. A simpler way (the reader is invited to try
this method) is to simulate a very long realization from the processes (e.g. with
n = 10^5) and check the behaviour of the estimated autocorrelation and partial
autocorrelation functions, which will be very close to their theoretical counterparts.
We can, however, observe that the partial autocorrelation function can help us
identify the order of the autoregressive model apt to describe the involved time
series: it cuts off at the maximum autoregressive lag.
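For instance, for Process 1 the check could be carried out as follows (a sketch; the trailing zero coefficients are omitted and the number of lags is arbitrary):
> ylong <- arima.sim(model = list(ar = c(0.5, 0.1, 0.2)), n = 1e5,
      rand.gen = function(n) rnorm(n, mean = const))
> t(acf2(ylong, 20))   # estimates very close to the theoretical ACF/PACF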
Figure 8.11 Autocorrelation and Partial autocorrelation plots for a realization from
Process 1: AR with drift = 2, autoregressive coefficients 0.5, 0.1, 0.2, 0, 0
Figure 8.12 Autocorrelation and Partial autocorrelation plots for a realization from
Process 2: AR with drift = 2, autoregressive coefficients 0, 0.1, 0.2, 0, 0.5
Figure 8.13 Autocorrelation and Partial autocorrelation plots for a realization from
Process 3: AR with drift = 2, autoregressive coefficients 0.1, 0.5, 0, 0.2, 0
Figure 8.14 Autocorrelation and Partial autocorrelation plots for a realization from
Process 4: AR with drift = 2, autoregressive coefficients 0.2, 0.1, 0, 0.5, 0
Figure 8.15 Identification by means of the BIC criterion of the time series simulated with
relationships (8.8) from some autoregressive processes. The correspondence between graphs
and processes is: top-left Process 1, bottom-left Process 2, top-right Process 3,
bottom-right Process 4.
The function armasubsets available in the package TSA can also be used to establish
the order of the autoregressive process, that is for ARMA model selection. The
selection algorithm orders different models, chosen following a method proposed by
Hannan and Rissanen (1982), according to their BIC (Bayesian Information Criterion)
value. See Fig. 8.15.
> layout(matrix(1:4, 2, 2))
> library(TSA)
> sapply(1:4, function(i) plot(armasubsets(data[, i],
      nar = 5, nma = 5)))
> detach("package:TSA")
Here data is assumed to be a matrix collecting the four simulated series, one per column.
We detach the package TSA since it re-defines the functions acf and arima.
8.2.3
8.2.4
(8.9)
Figure 8.16 Autocorrelation and partial autocorrelation functions (parameter = 0.8)
Let us now plot the autocorrelation function and the partial autocorrelation function
for the following parameter configurations:
Process 1: α1 = 0.5,  α2 = 0.1,  α3 = 0.2,  α4 = 0,    α5 = 0
Process 2: α1 = 0,    α2 = 0.1,  α3 = 0.2,  α4 = 0,    α5 = 0.5
Process 3: α1 = 0.1,  α2 = 0.5,  α3 = 0,    α4 = 0.2,  α5 = 0
Process 4: α1 = 0.2,  α2 = 0.1,  α3 = 0,    α4 = 0.5,  α5 = 0        (8.10)
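The four moving average processes can be simulated, for instance, by reusing the genera function of Section 8.2.2, now with ma components (a sketch; the list below simply mirrors the configurations in (8.10)):
> mas <- list(list(ma = c(0.5, 0.1, 0.2, 0, 0)),
      list(ma = c(0, 0.1, 0.2, 0, 0.5)),
      list(ma = c(0.1, 0.5, 0, 0.2, 0)),
      list(ma = c(0.2, 0.1, 0, 0.5, 0)))
> invisible(lapply(mas, genera))   # simulates and prints the parameters and the ACF/PACF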
Figure 8.17 Autocorrelation and Partial autocorrelation plots for a realization from
Process 1: MA with moving average parameters 0.5, 0.1, 0.2, 0, 0
Figure 8.18 Autocorrelation and Partial autocorrelation plots for a realization from
Process 2: MA with moving average parameters 0, 0.1, 0.2, 0, 0.5
Figure 8.19 Autocorrelation and Partial autocorrelation plots for a realization from
Process 3: MA with moving average parameters 0.1, 0.5, 0, 0.2, 0
Figure 8.20 Autocorrelation and Partial autocorrelation plots for a realization from
Process 4: MA with moving average parameters 0.2, 0.1, 0, 0.5, 0
Figure 8.21 Identification by means of the BIC criterion of the time series simulated with
relationships (8.10) from some moving average processes. The correspondence between graphs
and processes is: top-left Process 1, bottom-left Process 2, top-right Process 3,
bottom-right Process 4.
The function armasubsets, see Section 8.2.2, available in the package TSA can also
be used for model selection. See Figure 8.21.
> library(TSA)
> sapply(1:4, function(i) plot(armasubsets(data[, i],
nar = 5, nma = 5)))
> detach("package:TSA")
We can observe that the methods described above do not always definitively resolve
the identification of ARMA models fitting the 4 time series. For Process 3, the
autocorrelation function suggests the correct moving average order of the generating
process, while both the partial autocorrelation function and the Bayesian Information
Criterion give hints about the presence of an autoregressive model. We consider the
issue again in Section 8.2.6.
270
8.2.5
We now simulate a realization from an ARMA(1,1) process:
> n <- 500
> theta <- 0.8
> alpha <- 0.5
> set.seed(1234)
> yt <- arima.sim(model = list(ar = theta, ma = alpha),
      n = n)
and plot the autocorrelation function (correlogram) and the partial autocorrelation
function of {yt }. See Fig. 8.22.
> t(acf2(yt, 20, ma.test = TRUE))
      [,1]  [,2]  [,3]  [,4] [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11]
ACF   0.89  0.70  0.54  0.41  0.3  0.21  0.15  0.10  0.04  0.00 -0.04
PACF  0.89 -0.43  0.20 -0.14  0.0  0.02  0.01 -0.11  0.01 -0.01 -0.05
     [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF  -0.07  -0.1 -0.12 -0.11 -0.08 -0.04  0.00  0.02  0.04
PACF -0.03   0.0  0.00  0.06  0.05  0.03  0.01 -0.04  0.04
We can observe that the autocorrelation and the partial autocorrelation functions die
out very slowly. A large order would be necessary to fit either a pure AR or a pure MA
model to the simulated time series, so recourse to a more parsimonious ARMA model can
help relieve model complexity.
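For instance, a parsimonious ARMA(1,1) can be fitted directly to the simulated series (a sketch):
> (fit <- arima(yt, order = c(1, 0, 1), include.mean = FALSE))
With two coefficients this specification matches the generating mechanism, whereas a pure AR or pure MA representation would require many more terms.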
Fig. 8.23 shows the identification by means of the BIC criterion using the function
armasubsets
> library(TSA)
> plot(armasubsets(yt, nar = 5, nma = 5))
> detach("package:TSA")
8.2.6
Figure 8.22 Estimate of the autocorrelation and partial autocorrelation functions for a
realization from {yt}, ARMA(1,1) process with parameters θ = 0.8, α = 0.5
Figure 8.23 Identification of an ARMA model by means of the BIC criterion for a
realization from {yt}, ARMA(1,1) process with parameters θ = 0.8, α = 0.5
We consider the behaviour of the armasubsets function for different values of the
arguments nar and nma, see Fig. 8.25.
> library(TSA)
> layout(matrix(1:4, 2, 2))
> plot(armasubsets(yt, nar = 5, nma = 5))
Reordering variables and trying again:
> plot(armasubsets(yt, nar = 8, nma = 5))
Reordering variables and trying again:
> plot(armasubsets(yt, nar = 5, nma = 8))
Reordering variables and trying again:
> plot(armasubsets(yt, nar = 8, nma = 8))
Reordering variables and trying again:
> detach("package:TSA")
Figure 8.24 Autocorrelation and partial autocorrelation functions of {yt}, AR(1) process
with parameter θ = 0.8
We can conclude that identification methods based on the use of the autocorrelation
and partial autocorrelation functions, or on the use of an information criterion like
the BIC, can help the researcher but cannot definitively solve the problem.
In particular, the function armasubsets has to be called for different combinations
of the maximum AR and MA orders to check the stability of the proposed identification;
here the model suggested by armasubsets clearly depends on the maximum orders nar and
nma the researcher has chosen.
Another solution is to use the tools available in the package forecast, which performs
automatic model selection and forecasting with Exponential smoothing and ARIMA methods,
see Hyndman and Khandakar (2008). In particular the function auto.arima also deals with
seasonal and integrated5 time series, choosing the possible orders of differencing by
recourse to the KPSS and Canova-Hansen tests. See ?auto.arima for more information.
5 See Section 8.4 for Integrated ARMA (ARIMA) models. Seasonal ARMA models are not treated
here.
Figure 8.25 Identification for different choices of nar and nma for {yt}, ARMA(1,0)
process with parameter θ = 0.8
> library(forecast)
> auto.arima(yt, max.p = 10, max.q = 10)
Series: yt
ARIMA(3,1,2)

Coefficients:
          ar1      ar2      ar3     ma1      ma2
      -0.7519  -0.6415  -0.7543  0.3568  -0.2349
s.e.   0.0914   0.0884   0.0674  0.1375   0.1319
8.3
> n <- 100
> beta <- 0.5
> rho <- 0.5
> ut <- arima.sim(model = list(ar = rho), n = n)
> yt <- arima.sim(model = list(ar = beta), n = n, innov = ut)
The argument innov in the function arima.sim defines the sequence to be used as
error in the generating process.
The estimate of β results in:
> lm(yt[-1] ~ -1 + yt[-n])$coef
   yt[-n]
0.7504439
which is quite different from the theoretical value 0.5.
We can expect the following theoretical value for the bias (according to asymptotic
results):
$$\text{plim}(\hat\beta) - \beta = \frac{\rho\,(1-\beta^2)}{1+\beta\rho} = 0.3.$$
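The value 0.3 can be checked directly in R:
> beta <- 0.5
> rho <- 0.5
> rho * (1 - beta^2) / (1 + beta * rho)
[1] 0.3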
Let us now perform the preceding task k = 1000 times, obtain summary statistics and
plot a histogram for the bias of the estimates of the coefficient β (beta).
To this purpose we can create a function
> simularar <- function(n = 100, beta = 0.5, rho = 0.5) {
ut <- arima.sim(model = list(ar = rho), n = n)
yt <- arima.sim(model = list(ar = beta), n = n,
innov = ut)
lm(yt[-1] ~ -1 + yt[-n])$coef
}
Figure 8.26 Histogram of betahat - beta
To replicate k times the preceding function and collect the results, we can use the
function replicate
> k <- 1000
> betahat <- replicate(k, simularar())
> summary(betahat - beta)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.1159  0.2587  0.2953  0.2901  0.3234  0.4174
Figures 8.26 and 8.27 show the histogram and the density estimate for the distribution
of the bias of the parameter estimates. The graphs can be obtained with the code
> hist(betahat - beta)
> plot(density(betahat - beta))
To obtain the histogram, the kernel density and the normal density in a unique graph,
see Fig. 8.28, use the following code.
277
4
0
Density
0.1
0.2
0.3
0.4
Figure 8.27
Density
10
278
0.10
0.15
0.20
0.25
0.30
0.35
0.40
0.45
Figure 8.28 Histogram, Kernel density estimate, and Normal distribution approximation
of the distribution of the parameter estimate bias
See Sarkar (2008) for a detailed presentation of the package lattice, and
demo(lattice) and ?lattice::lattice for its main features.
> library(lattice)
> tp1 <- histogram(~(betahat - beta), type = "density",
breaks = 11, panel = function(x, ...) {
panel.histogram(x, ...)
panel.mathdensity(dmath = dnorm, col = "black",
lwd = 3, args = list(mean = mean(x),
sd = sd(x)), n = 101)
panel.densityplot(x, col = "black", lwd = 1,
n = 101, ...)
})
Package lattice defines the axis limits by internal algorithms, see Fig. 8.29 top;
it is always possible to define their values when the results of the automatic
procedures are not satisfying, see Fig. 8.29 bottom.
Figure 8.29 Histogram, Normal distribution approximation and Kernel density estimate
of the distribution of the parameter estimate bias by means of the package lattice
8.3.1
The problem does not ensue when you make recourse to the function histogram of
the package lattice; the general call is histogram applied to the data x (remember
to use the ~ symbol); the panel function specifies which curves have to be plotted.
> library(lattice)
> histogram(~x, type = "density", panel = function(x,
...) {
panel.histogram(x, ...)
panel.mathdensity(dmath = dnorm, col = "black",
lwd = 3, args = list(mean = mean(x), sd = sd(x)),
n = 101)
})
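For comparison, a base-R sketch producing the same kind of overlay is reported below; here x is assumed to hold the data, and freq = FALSE puts the histogram on the density scale so that the curves are comparable.
> m <- mean(x)
> s <- sd(x)
> hist(x, freq = FALSE)
> curve(dnorm(x, mean = m, sd = s), add = TRUE, lwd = 3)   # normal density
> lines(density(x), lwd = 1)                               # kernel density estimate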
8.4
The function arima in the package stats requires the time series x and the argument
order = c(p, d, q), where p and q are respectively the orders of the autoregressive
and of the moving average parts, and the integer d is the possible order of
differencing needed to render the time series x weakly stationary.
Moreover, though the subject is not treated here, the argument seasonal
= list(order = c(0, 0, 0), period = NA) is available for dealing with
seasonal time series, see Hyndman and Khandakar (2008).
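For instance, a minimal call fitting an ARIMA(1,1,1) to a generic series x could be (a sketch):
> fit <- arima(x, order = c(1, 1, 1))
> fit$coef   # estimated autoregressive and moving average coefficients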
The general form for a stationary ARMA(p, q) process, as considered by R, see also
Brockwell and Davis (1991), is:
$$Y_t - \mu = \theta_1 (Y_{t-1} - \mu) + \ldots + \theta_p (Y_{t-p} - \mu) + \varepsilon_t + \alpha_1 \varepsilon_{t-1} + \ldots + \alpha_q \varepsilon_{t-q} \qquad (8.11)$$
where $\mu = E(Y_t)$ is the mean value common to the components of the stochastic
process $\{Y_t\}$.
The stochastic difference equation may also be referred to the de-meaned process
$y_t = Y_t - \mu$, or to a de-trended process $y_t = Y_t - g(t, X_t)$, where $g(t, X_t)$ is a function
of time and/or of some other stochastic process $\{X_t\}$:
$$y_t = \theta_1 y_{t-1} + \theta_2 y_{t-2} + \ldots + \theta_p y_{t-p} + \varepsilon_t + \alpha_1 \varepsilon_{t-1} + \alpha_2 \varepsilon_{t-2} + \ldots + \alpha_q \varepsilon_{t-q},$$
for which it follows that $\mu = E(y_t) = 0$ for all $t$. The above relationships can be re-written
by means of polynomials in the backward operator $B$, $By_t = y_{t-1}$:
$$\theta_p(B) = 1 - \theta_1 B - \theta_2 B^2 - \ldots - \theta_p B^p,$$
$$\alpha_q(B) = 1 + \alpha_1 B + \alpha_2 B^2 + \ldots + \alpha_q B^q,$$
as follows:
$$\theta_p(B)(Y_t - \mu) = \alpha_q(B)\varepsilon_t \qquad \text{or} \qquad \theta_p(B)\, y_t = \alpha_q(B)\varepsilon_t.$$
Remind that the process $\{Y_t\}$ is stationary if the roots of $\theta_p(z) = 0$ lie outside the unit
circle and that it is invertible if the roots of $\alpha_q(z) = 0$ lie outside the unit circle. In case
some unit roots, say $d$, are present in $\theta_{p+d}(z) = 0$, then $\{Y_t\}$ must be differenced $d$
times, that is an ARIMA(p, d, q) model has to be fitted.
Different configurations for an ARIMA model are now considered: a stationary
ARMA process and two ARIMA processes with integration orders 1 and 2, for the
cases without and with the presence of a drift7.
7 In case of an ARMA(p,q) model a drift is present when $\mu \neq 0$; from (8.11) we have
$$Y_t = \mu(1 - \theta_1 - \ldots - \theta_p) + \theta_1 Y_{t-1} + \ldots + \theta_p Y_{t-p} + \varepsilon_t + \alpha_1 \varepsilon_{t-1} + \ldots + \alpha_q \varepsilon_{t-q},$$
that is
$$Y_t = drift + \theta_1 Y_{t-1} + \ldots + \theta_p Y_{t-p} + \varepsilon_t + \alpha_1 \varepsilon_{t-1} + \ldots + \alpha_q \varepsilon_{t-q}.$$
Table 8.1 Summary of the code for estimating ARIMA(p, d, q) models with the arima
function and the corresponding code for prediction k steps-ahead. The sarima function
in the package astsa can also be used when d >= 1, see Section 8.5. The reference model
is $\theta(B)\Delta^d y_t = drift + \alpha(B)\varepsilon_t$.

- no unit roots, no drift (typical behaviour: Fig. 8.30):
  estimation: arima(x, c(p,0,q), include.mean=FALSE); drift: not present;
  prediction: predict(obj, n.ahead=k)
- no unit roots, with drift (Fig. 8.31):
  estimation: arima(x, c(p,0,q)); intercept: mean of {Yt};
  prediction: predict(obj, n.ahead=k)
- 1 unit root, no drift (Fig. 8.32):
  estimation: arima(x, c(p,1,q)); drift: not present;
  prediction: predict(obj, n.ahead=k)
- 1 unit root, with drift (Fig. 8.33):
  estimation: arima(x, c(p,1,q), xreg=1:length(x)); xreg coefficient: linear deterministic trend slope;
  prediction: predict(obj, k, newxreg=1:k+length(x))
- 2 unit roots, no drift (Fig. 8.34):
  estimation: arima(x, c(p,2,q)); drift: not present;
  prediction: predict(obj, n.ahead=k)
- 2 unit roots, with drift (Fig. 8.35):
  estimation: arima(x, c(p,2,q), xreg=(1:length(x))^2); xreg coefficient: quadratic deterministic trend coefficient;
  prediction: predict(obj, k, newxreg=(1:k+length(x))^2)
We will assume that no roots of $\theta_p(z) = 0$ are inside the unit circle and that $\theta_p(z) = 0$
and $\alpha_q(z) = 0$ do not have any common roots. As an example, the code is reported
for estimating the parameters of a time series simulated with only an autoregressive
coefficient of order 1.
Remember that among the parameters to be estimated are also p and q, the orders
of the autoregressive and moving average parts of the model. We assume here that
the order has already been identified, e.g. by examining the autocorrelation and
partial autocorrelation functions and/or using automatic criteria like
TSA::armasubsets or forecast::auto.arima, see Section 8.2.6.
See Section 8.10.8 for the estimation of non-complete (subset) models, which can be
performed by using the argument fixed in the arima function.
In Section 8.5 other R functions are presented for the parameter estimation of ARIMA
models.
Table 8.1 summarizes the use of arima and also shows how to use the function
predict for obtaining k steps-ahead forecasts.
8.4.1
No drift presence
The time series x is stationary; the estimation of
$$\theta_p(B)\, y_t = \alpha_q(B)\varepsilon_t$$
can be performed with:
> arima(x, c(p, 0, q), include.mean = FALSE)
Example: Estimation of an ARIMA(1,0,0) without drift. See Fig. 8.30.
> n <- 2000
> drift <- 0
> set.seed(123)
> y <- arima.sim(model = list(ar = 0.8), n = n, rand.gen = function(n)
      drift + rnorm(n))
If we do not specify include.mean=FALSE, that is we use the code for the case with
the mean of the series different from 0 (a drift is present), we obtain:
> arima(y, c(1, 0, 0))
Series: y
ARIMA(1,0,0) with non-zero mean

Coefficients:
         ar1  intercept
      0.7699     0.1464
s.e.  0.0143     0.0969

sigma^2 estimated as 0.9976: log likelihood=-2835.97
AIC=5677.94  AICc=5677.95  BIC=5694.74
We can first observe that the coefficient named intercept in the output is the
estimate for the mean $\mu$ of $\{Y_t\}$. Namely, the average value for the realization is
mean(y)=0.1507.
As expected the estimate for the mean is not significantly different from zero
(0.1464/0.0969 < 1.96, and the estimate for the drift will likewise not be significant);
so we can proceed to estimate an autoregressive model with zero mean, that is without
drift, by setting the argument include.mean=FALSE.
> (output <- arima(y, c(1, 0, 0), include.mean = FALSE))
Series: y
ARIMA(1,0,0) with zero mean

Coefficients:
         ar1
      0.7719
s.e.  0.0142

Figure 8.30 ARIMA(1,0,0) no drift: an autoregressive behaviour with respect to the mean
0 can be observed. Mean reversion is also present, that is the process tends to come back
to its mean value in the short run
$se
Time Series:
Start = 2001
End = 2005
Frequency = 1
[1] 0.9993847 1.2624990 1.3958956 1.4696365 1.5118672
With drift
The time series is again stationary, with mean $\mu = drift/\theta_p(1)$.
The estimation of
$$\theta_p(B)(Y_t - \mu) = \alpha_q(B)\varepsilon_t$$
can be performed with
> arima(x, c(p, 0, q))
Estimation of an ARIMA(1,0,0) with drift. See Fig. 8.31.
> n <- 2000
> drift <- 2
> set.seed(123)
> y <- arima.sim(model = list(ar = 0.8), n = n, rand.gen = function(n)
      drift + rnorm(n))
> (output <- arima(y, c(1, 0, 0)))
Series: y
ARIMA(1,0,0) with non-zero mean

Coefficients:
         ar1  intercept
      0.7699    10.1463
s.e.  0.0143     0.0969

sigma^2 estimated as 0.9977: log likelihood=-2835.99
AIC=5677.97  AICc=5677.98  BIC=5694.77
Remind that intercept is the estimate for the mean of {Yt}: the average value of
the realization is mean(y)=10.1507.
Prediction 5 steps-ahead
> predict(output, n.ahead = 5)
$pred
Time Series:
Start = 2001
End = 2005
Frequency = 1
[1] 10.02461 10.05261 10.07418 10.09078 10.10356
Figure 8.31 ARIMA(1,0,0) with drift: an autoregressive behaviour with respect to the
mean (drift/(1 - θ1)) can be observed
$se
Time Series:
Start = 2001
End = 2005
Frequency = 1
[1] 0.9988302 1.2605381 1.3926252 1.4653016 1.5067219
8.4.2
If $\theta_{p+1}(z) = 0$ has 1 unit root, it follows that $\theta_{p+1}(B) = \theta_p(B)(1 - B) = \theta_p(B)\Delta$.
Here it is essential to distinguish whether a drift characterizes the differenced series.
No drift presence
When no drift is present, the model
$$\theta_p(B)\,\Delta Y_t = \alpha_q(B)\varepsilon_t$$
can be estimated with:
> arima(x, c(p, 1, q))
Observe that the intercept, that is the estimate of the mean, will not be produced.
Estimation of an ARIMA(1,1,0) without drift. See Fig. 8.32.
> n <- 2000
> drift <- 0
> set.seed(123)
> y <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.8),
      n = n, rand.gen = function(n) drift + rnorm(n))
> (output <- arima(y, c(1, 1, 0)))
Series: y
ARIMA(1,1,0)

Coefficients:
         ar1
      0.7719
s.e.  0.0142

sigma^2 estimated as 0.9988: log likelihood=-2837.1
AIC=5678.2  AICc=5678.2  BIC=5689.4
which should correspond to
> arima(diff(y), c(1, 0, 0))
Series: diff(y)
ARIMA(1,0,0) with non-zero mean

Coefficients:
         ar1  intercept
      0.7699     0.1464
s.e.  0.0143     0.0969

sigma^2 estimated as 0.9976: log likelihood=-2835.97
AIC=5677.94  AICc=5677.95  BIC=5694.74
Here the intercept, that is the estimate of the mean of the differenced series, is
produced since no integration was required in the model; but it is not significant,
so we have to proceed again to estimate a model without the mean:
> arima(diff(y), c(1, 0, 0), include.mean = FALSE)
Figure 8.32 ARIMA(1,1,0) without drift: simulated realization
Series: diff(y)
ARIMA(1,0,0) with zero mean

Coefficients:
         ar1
      0.7719
s.e.  0.0142

sigma^2 estimated as 0.9988: log likelihood=-2837.1
AIC=5678.2  AICc=5678.2  BIC=5689.4
which is equivalent to the first estimation result.
Prediction 5 steps-ahead
> predict(output, n.ahead = 5)
$pred
Time Series:
Start = 2002
End = 2006
Frequency = 1
[1] 301.3907 301.3837 301.3783 301.3741 301.3709

$se
Time Series:
Start = 2002
End = 2006
Frequency = 1
[1] 0.9993847 2.0333772 3.1199618 4.2095757 5.2762076
With drift
When the drift is present for the differenced series, it corresponds to the presence of
a linear (deterministic) trend in $\{Y_t\}$, and the model
$$\theta_p(B)\,\Delta(Y_t - bt) = \alpha_q(B)\varepsilon_t$$
can be estimated with:
> arima(x, c(p, 1, q), xreg = 1:length(x))
where xreg is an external regressor for $\{Y_t\}$, here the time sequence.
The coefficient corresponding to xreg is the estimate of the linear (deterministic)
slope $b$, while the estimate of the drift is $\theta_p(1)$ times the xreg coefficient, that is
$(1 - \theta_1 - \ldots - \theta_p)$ times the xreg coefficient.
Namely, in case of an ARIMA(1,1,0) we have:
$$\Delta(Y_t - bt) = \theta\,\Delta(Y_{t-1} - b(t-1)) + \varepsilon_t$$
$$\Delta Y_t - b = \theta\,[\Delta Y_{t-1} - b] + \varepsilon_t$$
$$\Delta Y_t = (1-\theta)b + \theta\,\Delta Y_{t-1} + \varepsilon_t,$$
the drift $(1-\theta)b$ corresponding to a linear (deterministic) trend with slope $b$.
Estimation of an ARIMA(1,1,0) with drift. See Fig. 8.33.
> n <- 2000
> drift <- 0.2
> set.seed(123)
> y <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.8),
      n = n, rand.gen = function(n) drift + rnorm(n))
> (output <- arima(y, c(1, 1, 0), xreg = 1:length(y)))
Series: y
ARIMA(1,1,0)

Coefficients:
         ar1  1:length(y)
      0.7700       1.1158
s.e.  0.0143       0.0971
Figure 8.33 ARIMA(1,1,0) with drift: simulated realization, showing a linear trend behaviour
8.4.3
If 2 unit roots are present in $\theta_{p+2}(z) = 0$, we have $\theta_{p+2}(B) = \theta_p(B)(1 - B)^2 = \theta_p(B)\Delta^2$.
Also here we have to distinguish whether a drift characterizes the differenced series.
No drift presence
When no drift is present, the model
$$\theta_p(B)\,\Delta^2 Y_t = \alpha_q(B)\varepsilon_t$$
can be estimated with:
> arima(x, c(p, 2, q))
Estimation of an ARIMA(1,2,0) without drift. See Fig. 8.34.
> n <- 2000
> drift <- 0
> set.seed(123)
> y <- arima.sim(model = list(order = c(1, 2, 0), ar = 0.8),
      n = n, rand.gen = function(n) drift + rnorm(n))
> (output <- arima(y, c(1, 2, 0)))
Series: y
ARIMA(1,2,0)

Coefficients:
         ar1
      0.7719
s.e.  0.0142

sigma^2 estimated as 0.9988: log likelihood=-2837.1
AIC=5678.2  AICc=5678.2  BIC=5689.4
Prediction 5 steps-ahead
> predict(output, n.ahead = 5)
$pred
Time Series:
Start = 2003
End = 2007
Frequency = 1
[1] 219352.7 219654.1 219955.5 220256.8 220558.2

$se
Time Series:
Start = 2003
End = 2007
Frequency = 1
[1]  0.9993847  2.9449759  5.9209014  9.9226820 14.9209790
With drift
When the drift is present for the differenced series, it corresponds to the presence of
a quadratic (deterministic) trend, and the model
$$\theta_p(B)\,\Delta^2(Y_t - ct^2) = \alpha_q(B)\varepsilon_t$$
can be estimated with:
> arima(x, c(p, 2, q), xreg = (1:length(x))^2)
Figure 8.34 ARIMA(1,2,0) without drift: simulated realization
The coefficient corresponding to xreg is the estimate of the coefficient $c$ of the
quadratic deterministic trend; the estimate of the drift results $\theta_p(1) \cdot 2$ times the
xreg coefficient, that is $(1 - \theta_1 - \ldots - \theta_p) \cdot 2$ times the xreg coefficient.
Namely, in case of an ARIMA(1,2,0) we have:
$$\Delta^2(Y_t - ct^2) = \theta\,\Delta^2(Y_{t-1} - c(t-1)^2) + \varepsilon_t$$
and since $\Delta^2 ct^2 = \Delta[ct^2 - c(t-1)^2] = \Delta(2ct - c) = 2c$,
$$\Delta^2 Y_t - 2c = \theta\,[\Delta^2 Y_{t-1} - 2c] + \varepsilon_t$$
$$\Delta^2 Y_t = (1-\theta)2c + \theta\,\Delta^2 Y_{t-1} + \varepsilon_t;$$
the drift $(1-\theta)2c$ corresponds to a deterministic quadratic trend with coefficient $c$.
Estimation of an ARIMA(1,2,0) with drift. See Fig. 8.35.
> n <- 2000
> drift <- 0.2
> set.seed(123)
> y <- arima.sim(model = list(order = c(1, 2, 0), ar = 0.8),
n = n, rand.gen = function(n) drift + rnorm(n))
> (output <- arima(y, c(1, 2, 0), xreg = (1:length(y))^2))
Series: y
ARIMA(1,2,0)

Coefficients:
         ar1  (1:length(y))^2
      0.7701           0.5493
s.e.  0.0143           0.0490

sigma^2 estimated as 0.9978: log likelihood=-2836.09
AIC=5678.19  AICc=5678.2  BIC=5694.99
The coefficient corresponding to xreg is an estimate of the coefficient in a linear
model without intercept describing y as a quadratic function of time:
> lm(y ~ -1 + I((1:length(y))^2))
Call:
lm(formula = y ~ -1 + I((1:length(y))^2))
Coefficients:
I((1:length(y))^2)
0.5493
The estimate of the drift is:
> (1 - output$coef[1]) * 2 * output$coef[2]
ar1
0.2525803
Prediction 5 steps-ahead
> predict(output, n.ahead = 5, newxreg = (1:5 + length(y))^2)
$pred
Time Series:
Start = 2003
End = 2007
Frequency = 1
[1] 2222338 2224642 2226946 2229252 2231558
$se
Time Series:
Start = 2003
End = 2007
Frequency = 1
[1]  0.9988837  2.9417787  5.9114554  9.9022919 14.8840825
Figure 8.35 ARIMA(1,2,0) with drift: simulated realization, showing a quadratic trend behaviour
8.5
Some other functions are available in the packages of the R system to obtain parameter
estimates for an ARIMA model; see the R help for more information.
Results may differ since different estimation techniques adopt different numerical
methods; they may also depend on the assumptions made about the starting values of
$\varepsilon_t$ when the order q of the moving average part of the model is greater than 1.
We simulate an ARIMA(2,1,0) process yt = {Yt} and apply some of the available
functions to the estimation of an ARMA(2,0,0) model for the differenced series Dyt.
> n <- 2000
> drift <- 0.2
> set.seed(123456)
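The simulation can be completed, for instance, as follows; the AR coefficients 0.4 and 0.2 are assumptions, chosen to be consistent with the estimates reported below.
> yt <- arima.sim(model = list(order = c(2, 1, 0), ar = c(0.4, 0.2)),
      n = n, rand.gen = function(n) drift + rnorm(n))
> Dyt <- diff(yt)   # the differenced series used in the following subsections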
8.5.1
Pay attention: the quantity called intercept is the estimate for the mean
drift/(1 - θ1 - θ2) = 0.5; moreover, the covariance matrix of the estimates is found
from the Hessian of the log-likelihood, and so may be only a rough guide.

      intercept
         0.5043
s.e.     0.0605
log likelihood=-2846.2
BIC=5722.81

      1:length(yt)
            0.5038
s.e.        0.0605
log likelihood=-2846.2
BIC=5722.81
8.5.2
xmean
0.5043
0.0605
log likelihood=-2846.2
BIC=5722.81
$AIC
[1] 1.011115
$AICc
[1] 1.012125
$BIC
[1] 0.01951616
The result is consistent with:
> sarima(yt, 2, 1, 0, details = FALSE)
$fit
Series: xdata
ARIMA(2,1,0) with non-zero mean

Coefficients:
         ar1     ar2  constant
      0.4322  0.1971    0.5038
s.e.  0.0219  0.0219    0.0605

log likelihood=-2846.2
BIC=5722.81
For practical purposes the order d = 1 is the one most frequently occurring.
Figure 8.36 Diagnostic tools from sarima for the model ARIMA(4,0,0) applied to the
differenced time series
$AIC
[1] 1.011113
$AICc
[1] 1.012123
$BIC
[1] 0.01951124
Observe that the constant is the estimate of the coefficient defining the linear
deterministic trend.
Some diagnostic graphs for the residuals are also produced, see Figure 8.36.
8.5.3
intercept
0.5043
0.0605
log likelihood=-2846.2
BIC=5722.81
8.5.4
MPE
6.975735e+01
For ARIMA models (also fractionally differenced). Pay attention: the mean of the
time series is first estimated, following Brockwell and Davis (1991), and then the other
parameters are estimated for the de-meaned series (without drift). Also in this case
the quantity named intercept in the output corresponds to the estimate for the mean.
With the function armaFit in the package fArma the estimation methods "MLE"
(Maximum Likelihood) and "ols" (Ordinary Least Squares) may be used.
> library(fArma)
> ar2 <- armaFit(Dyt ~ ar(2), data = Dyt, method = "ols")
> summary(ar2)
Title:
 ARIMA Modelling

Call:
 armaFit(formula = Dyt ~ ar(2), data = Dyt, method = "ols")

Residuals:
  Median       3Q      Max
 0.01726  0.69478  3.68815

Moments:
 Skewness  Kurtosis
 0.012757 -0.004689

Coefficient(s):
           Estimate  Std. Error  t value  Pr(>|t|)
ar1         0.43233     0.02193    19.71    <2e-16 ***
ar2         0.19710     0.02192     8.99    <2e-16 ***
intercept   0.50344     0.02246    22.41    <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

sigma^2 estimated as: NULL
AIC Criterion: 0

Description:
 Fri May 24 17:10:56 2013 by user: gabriele.cantaluppi
8.5.5
For ARIMA models (also fractionally differenced). It uses a fast Maximum Likelihood
estimation method as proposed by McLeod and Zhang (2008). Here the parameter
mean is indicated correctly.
> library(FitARMA)
> a <- FitARMA(Dyt, c(2, 0, 0))
> summary(a)
ARIMA(2,0,0)
length of series = 2000 , number of parameters = 3
loglikelihood = -8.33 , aic = 22.7 , bic = 39.5
> coef(a)
             MLE         sd    Z-ratio
phi(1) 0.4321938 0.02192204 19.7150305
phi(2) 0.1970989 0.02192204  8.9909012
mu     0.5034386 1.18871170  0.4235161
intercept
0.5036
0.0399
8.5.6
The ar function
The function ar fits an autoregressive time series model to the data, by default
selecting the complexity of the model by the Akaike Information Criterion.
> ar(Dyt, order.max = 8, aic = TRUE)
Call:
ar(x = Dyt, aic = TRUE, order.max = 8)

Coefficients:
     1       2
0.4320  0.1972

Order selected 2  sigma^2 estimated as  1.01

Table 8.2 Functions to estimate ARIMA models and the corresponding forecast functions
package                     function to estimate   function to forecast
stats                       arima                  predict
sarima by Stoffer (astsa)   sarima                 sarima.for
fArma                       armaFit                predict
FitAR                       FitAR                  predict

8.5.7
For ARIMA models with the presence of exogenous variables affecting the
response through a transfer function, the function arima in the package TSA can
be used.
> library(TSA)
> arima(Dyt, c(2, 0, 0))
Series: x
ARIMA(2,0,0) with non-zero mean

Coefficients:
         ar1     ar2  intercept
      0.4322  0.1971     0.5043
s.e.  0.0219  0.0219     0.0605
log likelihood=-2846.2
BIC=5720.81

8.6
According to the function one uses to obtain the parameter estimates of an ARIMA
model, there exist corresponding functions to forecast future values of the time series,
see Table 8.2.
Remember to use the newxreg argument of predict when forecasting from integrated
ARIMA models estimated by stats::arima with an external regressor.
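For instance (a sketch; x is a placeholder series assumed to follow an ARIMA(1,1,0) with a linear deterministic trend):
> fit <- arima(x, order = c(1, 1, 0), xreg = 1:length(x))
> predict(fit, n.ahead = 4, newxreg = length(x) + 1:4)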
Examples of code
In the function sarima.for the second argument is the number of steps-ahead to
consider for the forecast; the three successive arguments specify the order of the
ARIMA model.
> sarima.for(Dyt, 4, 2, 0, 0)
$pred
Time Series:
Start = 2002
End = 2005
Frequency = 1
[1] 0.1758992 0.2301188 0.3210887 0.3710921
$se
Time Series:
Start = 2002
End = 2005
Frequency = 1
[1] 1.004066 1.093829 1.159756 1.186845
sarima.for produces consistent forecasts also for the original time series,
without having to include the newxreg argument corresponding to the external
regressor.
> sarima.for(yt, 4, 2, 1, 0)
$pred
Time Series:
Start = 2002
End = 2005
Frequency = 1
[1] 1007.053 1007.283 1007.603 1007.974
$se
Time Series:
Start = 2002
End = 2005
Frequency = 1
[1] 1.004066 1.753864 2.530035 3.272484
The functions predict in the packages fArma and FitAR have the same structure
as the function predict in the package stats.
> library(fArma)
> ar4 <- armaFit(Dyt ~ ar(2), data = Dyt, method = "ols")
> predict(ar4, 4)
Figure 8.37 Forecasts, with confidence intervals, for Dyt

$pred
Time Series:
Start = 2001
End = 2004
Frequency = 1
[1] 0.1746484 0.2283741 0.3188913 0.3686140
$se
Time Series:
Start = 2001
End = 2004
Frequency = 1
[1] 1.004065 1.093883 1.159847 1.186961
$out
Time Series:
Start = 2001
End = 2004
Frequency = 1
     Low 95
2001 -1.7933
2002 -1.9156
2003 -1.9544
2004 -1.9578
> library(FitAR)
> a <- FitAR(Dyt, c(2))
> predict(a, 4)
$Forecasts
             1         2         3         4
2000 0.1755649 0.2296387 0.3204801 0.3703991

$SDForecasts
            1        2        3        4
2000 1.003959 1.093712 1.159633 1.186718
sarima.for and the predict method of fArma also produce graphs with the forecasts
and their confidence intervals, see Fig. 8.37.
8.7
Data on the ratio of the S&P composite stock price index and S&P composite earnings
over the period 1871-2009 (T = 139) are considered; they can be read by means of
the function readEViews, having extracted the file priceearnings.wf1 from the
compressed archive ch08.zip.
The last line (140) has to be dropped, since the corresponding observation does not exist.
The function ts(object, start, frequency) creates a multiple time series from
the columns of a table; in this case there is no need to specify the frequency since
data are annual.
> library(hexView)
> pe <- readEViews(unzip("ch08.zip", "Chapter 8/priceearnings.wf1"))
Skipping boilerplate variable
Skipping boilerplate variable
> pe <- pe[-140, ]
> pe <- ts(pe, start = 1871)
The following variables are available:
To obtain a plot of the log of the stock price and of the earnings series use the function
xyplot available in the package lattice. Remind that a multiple time series, mts,
Figure 8.38 Log price (LNP) and log earnings (LNE) series
object can be treated like a matrix so it is possible to make reference to the appropriate
columns of the object pe
> library(lattice)
> xyplot(pe[, 1:2], superpose = TRUE)
8.7.1
As Verbeek observes, it is clear that both the log price and the log earnings series are
not weakly stationary; he suggests testing whether the non-stationarity is due to the
presence of a deterministic trend or of one or more unit roots.
To test for the presence of a unit root we have to consider the standard Dickey-Fuller
regression, see Verbeek's equation (8.58):
$$\Delta Y_t = \delta + \gamma t + \pi Y_{t-1} + e_t \qquad (8.12)$$
Let y be the log price series pe[,2]. The estimates of the parameters in relationship
(8.12) can be obtained by making use of the function dynlm as:
> library(dynlm)
> summary(dynlm(d(y) ~ c(1:138) + L(y)))

Residuals:
     Min       1Q   Median       3Q      Max
-0.56867 -0.10424  0.01858  0.11953  0.34743
Coefficients:
              Estimate  Std. Error t value Pr(>|t|)
(Intercept)  0.4370623   0.1647873   2.652  0.00895 **
c(1:138)     0.0017627   0.0007406   2.380  0.01870 *
L(y)        -0.0984286   0.0375499  -2.621  0.00977 **
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.1763 on 135 degrees of freedom
Multiple R-squared: 0.04883,  Adjusted R-squared: 0.03474
F-statistic: 3.465 on 2 and 135 DF, p-value: 0.03408
The Dickey-Fuller statistic is given by the t statistic of the lagged variable
coefficient (-2.621): the coefficient must be different from 0 in order to reject the
presence of a unit root, and its t statistic has to be compared with proper critical
values to reach a conclusion.
8.7.2
The result can be obtained directly by using the function ur.df available in the
package urca. The function ur.df has four parameters. The first is a time series.
With the parameter type it is possible to specify if only a constant has to be included
in model (8.12) (type="drift") or both a trend and the drift (type="trend") or
neither the drift nor the trend (type="none") have to be included. The parameter
lags specifies a number of lags for Yt to include in the regression (8.12); selectlags,
which by default is equal to "fixed", may be set to "AIC" or "BIC" for obtaining an
automatic lag selection according to the Akaike or the Bayesian Information criteria,
within the maximum number of lags specified by lags.
> library(urca)
> summary(ur.df(y, type = "trend", lags = 0))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression trend
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt)
Residuals:
     Min       1Q   Median       3Q      Max
-0.56867 -0.10424  0.01858  0.11953  0.34743
Coefficients:
              Estimate  Std. Error t value Pr(>|t|)
(Intercept)  0.4370623   0.1647873   2.652  0.00895 **
z.lag.1     -0.0984286   0.0375499  -2.621  0.00977 **
tt           0.0017627   0.0007406   2.380  0.01870 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.1763 on 135 degrees of freedom
Multiple R-squared: 0.04883,  Adjusted R-squared: 0.03474
F-statistic: 3.465 on 2 and 135 DF, p-value: 0.03408
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
Residuals:
     Min       1Q   Median       3Q      Max
-0.54915 -0.10718  0.01243  0.11533  0.32751
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.468720   0.169427   2.767  0.00647 **
z.lag.1     -0.106162   0.038692  -2.744  0.00691 **
tt           0.001897   0.000760   2.496  0.01376 *
z.diff.lag   0.077647   0.086853   0.894  0.37293
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.1771 on 133 degrees of freedom
Multiple R-squared: 0.05456,  Adjusted R-squared: 0.03324
F-statistic: 2.559 on 3 and 133 DF, p-value: 0.05781
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)

Residuals:
     Min       1Q   Median       3Q      Max
-0.57092 -0.10709  0.01747  0.12584  0.38166

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
z.lag.1     -0.0907927  0.0399401  -2.273   0.0246 *
tt           0.0016462  0.0007765   2.120   0.0359 *
z.diff.lag1  0.0766940  0.0867307   0.884   0.3782
z.diff.lag2 -0.1370071  0.0906890  -1.511   0.1333
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)

Residuals:
     Min       1Q   Median       3Q      Max
-0.61682 -0.11229  0.01993  0.11150  0.39296

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.4710862  0.1780298   2.646  0.00916 **
z.lag.1     -0.1067164  0.0407667  -2.618  0.00991 **
tt           0.0018926  0.0007863   2.407  0.01750 *
z.diff.lag1  0.1131589  0.0887265   1.275  0.20447
z.diff.lag2 -0.1370854  0.0902885  -1.518  0.13138
z.diff.lag3  0.1613740  0.0910693   1.772  0.07876 .
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.1761 on 129 degrees of freedom
Adjusted R-squared: 0.05765
p-value: 0.02625
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)

Residuals:
     Min       1Q   Median       3Q      Max
-0.59392 -0.10608  0.02363  0.10998  0.35148

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.4221223  0.1843159   2.290   0.0237 *
z.lag.1     -0.0952906  0.0422614  -2.255   0.0259 *
tt           0.0017278  0.0008067   2.142   0.0341 *
z.diff.lag1  0.1168995  0.0891027   1.312   0.1919
z.diff.lag2 -0.1617383  0.0935295  -1.729   0.0862 .
z.diff.lag3  0.1631230  0.0913932   1.785   0.0767 .
z.diff.lag4 -0.0994732  0.0926253  -1.074   0.2849
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.1766 on 127 degrees of freedom
Multiple R-squared: 0.1009,  Adjusted R-squared: 0.05844
F-statistic: 2.376 on 6 and 127 DF, p-value: 0.03292
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)

Residuals:
     Min       1Q   Median       3Q      Max
-0.59366 -0.10540  0.02747  0.11002  0.34461

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.4220206  0.1890821   2.232   0.0274 *
z.lag.1     -0.0934907  0.0434077  -2.154   0.0332 *
tt           0.0016215  0.0008219   1.973   0.0507 .
z.diff.lag1  0.1141232  0.0910061   1.254   0.2122
z.diff.lag2 -0.1582504  0.0937244  -1.688   0.0938 .
z.diff.lag3  0.1553505  0.0946890   1.641   0.1034
z.diff.lag4 -0.0971210  0.0927473  -1.047   0.2970
z.diff.lag5 -0.0172506  0.0931058  -0.185   0.8533
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.1767 on 125 degrees of freedom
Multiple R-squared: 0.1005,  Adjusted R-squared: 0.05009
F-statistic: 1.994 on 7 and 125 DF, p-value: 0.06091
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################

Test regression trend

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)

Residuals:
     Min       1Q   Median       3Q      Max
-0.60505 -0.09813  0.02635  0.10937  0.36751

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.4665094  0.1942144   2.402   0.0178 *
z.lag.1     -0.1045930  0.0446025  -2.345   0.0206 *
tt           0.0018028  0.0008365   2.155   0.0331 *
z.diff.lag1  0.1306554  0.0923672   1.415   0.1597
z.diff.lag2 -0.1370080  0.0958230  -1.430   0.1553
z.diff.lag3  0.1501264  0.0949723   1.581   0.1165
z.diff.lag4 -0.0682406  0.0960788  -0.710   0.4789
z.diff.lag5 -0.0208425  0.0933334  -0.223   0.8237
z.diff.lag6  0.1097285  0.0933387   1.176   0.2420
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.1771 on 123 degrees of freedom
Multiple R-squared: 0.1107,  Adjusted R-squared: 0.05282
F-statistic: 1.913 on 8 and 123 DF, p-value: 0.06374
8.7.3
It is possible to create a function that simplifies the code writing, avoiding the
repetition of the same command 7 times. Observe that the function ur.df uses classes
of type S4, so one has to use the @ sign, and not the $ sign, to extract a slot9 from
an object of class S4 produced by ur.df; see str(ur.df(y, type = "drift", lags = 6)).
> f <- function(x) {
urtest <- ur.df(y, type = "trend", lags = x)
c(stat = urtest@teststat[1], "5% crit. value" = urtest@cval[1,
2])
}
> a <- 0:6
> names(a) <- c("DF", paste("ADF(", 1:6, ")", sep = ""))
> round(sapply(a, f), 3)
                    DF ADF(1) ADF(2) ADF(3) ADF(4) ADF(5) ADF(6)
stat            -2.621 -2.744 -2.273 -2.618 -2.255 -2.154 -2.345
5% crit. value  -3.430 -3.430 -3.430 -3.430 -3.430 -3.430 -3.430
The function f with argument x is defined, which extracts from the object resulting
from ur.df applied to the time series y, with type="trend" and lags=x, the first
element of the slot teststat, which is the Dickey-Fuller statistic, and the element in
the first row, second column of the slot cval, which is the 5% critical value (see also
the Critical values for test statistics section in the preceding outputs).
The variable a contains the desired lags for the unit root test.
The names DF and ADF(1) to ADF(6) are assigned to the elements of a.
The function sapply is finally used to call the function f for the different values of
the lags in the array a.
8.7.4
None of the preceding tests implies a rejection of the null hypothesis of a unit root.
Verbeek suggests using the Phillips-Perron and the KPSS tests10 for unit roots; the
tests can be obtained with the functions ur.pp and ur.kpss available in the package
urca.
The arguments of the function ur.pp are the time series x to be tested for a unit root,
the type, which can be "Z-alpha" or "Z-tau"; the model, with values "constant"
9 Elements of an object of class S4 are called slots.
or "trend", determining the deterministic part in the test regression; and lags,
specifying the lags used for the correction of the error term, which can be "short"
or "long". An exact number of lags can be specified with the argument use.lag. The
output has a structure similar to that of ur.df. See the help ?ur.pp for more
information.
The arguments of the function ur.kpss are the time series x to be tested for a unit
root; the type, which can be "mu" or "tau"; and lags, specifying the maximum number
of lags used for the correction of the error term, which can be "short", "long", or
"nil". An exact number of lags can be specified with the argument use.lag. The output
has a structure similar to that of ur.df. Only the version with Bartlett weights is
implemented. See the help ?ur.kpss for more information.
> summary(ur.pp(y, type = "Z-tau", model = "trend",
use.lag = 6))
##################################
# Phillips-Perron Unit Root Test #
##################################
Test regression with intercept and trend
Call:
lm(formula = y ~ y.l1 + trend)
Residuals:
     Min       1Q   Median       3Q      Max
-0.56867 -0.10424  0.01858  0.11953  0.34743
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.5586889  0.2065375   2.705  0.00771 **
y.l1        0.9015714  0.0375499  24.010  < 2e-16 ***
trend       0.0017627  0.0007406   2.380  0.01870 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 0.1763 on 135 degrees of freedom
Multiple R-squared: 0.9512,  Adjusted R-squared: 0.9504
F-statistic: 1315 on 2 and 135 DF, p-value: < 2.2e-16
Value of test-statistic, type: Z-tau  is: -2.6634

           aux. Z statistics
Z-tau-mu              2.5749
Z-tau-beta            2.4219

                    1pct      5pct     10pct
critical values -4.02682 -3.442804 -3.14582
> summary(ur.kpss(y, type = "tau", use.lag = 6))
#######################
# KPSS Unit Root Test #
#######################
Test is of type: tau with 6 lags.
Value of test-statistic is: 0.2233
Critical value for a significance level of:
10pct 5pct 2.5pct 1pct
critical values 0.119 0.146 0.176 0.216
The KPSS statistic 0.2233 is larger than the critical value 0.146 thus rejecting trend
stationarity in favour of a unit root.
The KPSS statistic can also be obtained by means of the function kpss.test available
in the package tseries but the function does not allow an exact lag to be specified.
8.7.5
By imposing a first unit root it is possible to test for the presence of a second unit
root with regressions of the form, see Verbeek p. 298:
$$\Delta^2 Y_t = \delta + \pi\,\Delta Y_{t-1} + c_1\,\Delta^2 Y_{t-1} + \ldots + e_t \qquad (8.13)$$
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)

Residuals:
     Min       1Q   Median       3Q      Max
-0.99802 -0.10972  0.03045  0.13696  0.50098

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.502842   0.177367   2.835  0.00536 **
z.lag.1     -0.269302   0.096935  -2.778  0.00632 **
tt           0.004042   0.001498   2.698  0.00796 **
z.diff.lag1  0.057546   0.115829   0.497  0.62021
z.diff.lag2 -0.122193   0.109751  -1.113  0.26772
z.diff.lag3 -0.034512   0.105715  -0.326  0.74463
z.diff.lag4 -0.085099   0.102001  -0.834  0.40573
z.diff.lag5 -0.267248   0.095763  -2.791  0.00610 **
z.diff.lag6  0.049989   0.097278   0.514  0.60826
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.2292 on 123 degrees of freedom
Multiple R-squared: 0.2382,  Adjusted R-squared: 0.1887
F-statistic: 4.808 on 8 and 123 DF, p-value: 3.644e-05
Call:
lm(formula = y ~ y.l1 + trend)

Residuals:
     Min       1Q   Median       3Q      Max
-1.04265 -0.11320  0.04181  0.13670  0.54002

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.929064   0.180139   5.157 8.72e-07 ***
y.l1        0.676123   0.063411  10.662  < 2e-16 ***
trend       0.004739   0.001050   4.515 1.37e-05 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.2319 on 135 degrees of freedom
Multiple R-squared: 0.8791,  Adjusted R-squared: 0.8773
F-statistic: 490.9 on 2 and 135 DF, p-value: < 2.2e-16
Value of test-statistic, type: Z-tau  is: -4.908

           aux. Z statistics
Z-tau-mu              5.9746
Z-tau-beta            4.3123

Figure 8.39
###############################################################
# Augmented Dickey-Fuller Test Unit Root / Cointegration Test #
###############################################################
The value of the test statistic is: -4.6422 7.2047 10.7863
> ur.df(y, type = "trend", lags = 6)
###############################################################
# Augmented Dickey-Fuller Test Unit Root / Cointegration Test #
###############################################################
The value of the test statistic is: -2.3615 1.9641 2.8168
The KPSS(6) statistic with error correction using the Bartlett kernel results in 0.331
and does not reject the null of no unit root:
> summary(ur.kpss(y, type = "mu", use.lag = 6))
#######################
# KPSS Unit Root Test #
#######################
Test is of type: mu with 6 lags.
Value of test-statistic is: 0.3308
Critical value for a significance level of:
10pct 5pct 2.5pct 1pct
critical values 0.347 0.463 0.574 0.739
The conclusion is that data are not sufficiently informative to distinguish between the
two hypotheses; and mean reversion, if present, is very slow.
8.8
As the reader has surely observed, the function ur.df (with the options none, drift
and trend) produces from 1 to 3 statistics: tau1 (none), tau2 and phi1 (drift), and
tau3, phi2 and phi3 (trend). We now consider the hypotheses underlying those test
statistics. For simplicity, no autocorrelations of the differenced series are taken
into account in the augmented Dickey-Fuller test.
8.8.1
$$Y_t = u_t, \qquad u_t = \theta u_{t-1} + \varepsilon_t \qquad (8.14)$$
By substituting the first relationship into the latter we obtain an AR(1) process.
Note that if $\theta = 1$ we have a random walk
$$Y_t = Y_{t-1} + \varepsilon_t.$$
The augmented Dickey-Fuller test consists in testing the null $\theta - 1 = 0$ in the
following model:
$$\Delta Y_t = (\theta - 1)\,Y_{t-1} + \varepsilon_t.$$
In the ur.df output only one statistic will appear: tau1, which corresponds to the
hypotheses:
H0: random walk without drift
H1: stationary AR(1) without drift
1%, 5% and 10% critical values for the tau1 statistic are reported in the output. The
null hypothesis of a unit root will be rejected at a given level if the Dickey-Fuller
statistic is lower than the corresponding critical value. This happens when the
statistic is negative and very far from 0, which gives evidence that the process is
stationary.
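As an illustration, a sketch with an arbitrary seed: applying the test to a simulated random walk without drift, the tau1 statistic should typically not be lower than the reported critical values, so the unit root is not rejected.
> library(urca)
> set.seed(1)
> rw <- ts(cumsum(rnorm(500)))   # random walk without drift
> summary(ur.df(rw, type = "none", lags = 0))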
8.8.2
(8.15)
8.8.3
Consider the process
$$Y_t = \alpha + \gamma t + u_t, \qquad u_t = \theta u_{t-1} + \varepsilon_t, \qquad (8.18)$$
which can be re-written as
$$Y_t = \alpha_0 + \gamma_0 t + \theta Y_{t-1} + \varepsilon_t, \qquad (8.19)$$
with $\alpha_0 = \alpha(1-\theta) + \theta\gamma$ and $\gamma_0 = \gamma(1-\theta)$. If $\theta = 1$ the relationship reduces to
$$Y_t = \gamma + Y_{t-1} + \varepsilon_t, \qquad (8.20)$$
that is an ARIMA(0,1,0) (a random walk) with drift, and $\{Y_t\}$ is said to be difference
stationary; namely, it results
$$\Delta Y_t = \gamma + \varepsilon_t.$$
The augmented Dickey-Fuller test consists in testing the null $\theta - 1 = 0$ in the
following model:
$$\Delta Y_t = a + bt + (\theta - 1)\,Y_{t-1} + \varepsilon_t.$$
The test is named tau3 in the ur.df output and corresponds to the following set of
hypotheses:
H0: difference stationary (random walk with drift)
H1: trend stationary
1%, 5% and 10% critical values for the tau3 statistic are reported in the output. The
null hypothesis of a unit root will be rejected at a given level if the Dickey-Fuller
statistic is lower than the corresponding critical value. This happens when the
statistic is negative and very far from 0, which gives evidence that the process is
trend stationary.
The other tests, phi2 and phi3, regard the following null hypotheses:
8.8.4
Example
We simulate a realization from process (8.18), with n = 1000, α = 0.4, γ = 0.5 and
θ = 0.8.
> set.seed(456)
> n <- 1000
> alpha <- 0.4
> gamma <- 0.5
> theta <- 0.8
> epsilon <- rnorm(n)
> ut <- arima.sim(model = list(ar = theta), n, innov = epsilon)
> t <- 1:n
> yt <- alpha + gamma * t + ut
Figure 8.40 shows the first 200 observations of the simulated time series.
Figure 8.40
> library(lattice)
> xyplot(ts(cbind(yt[1:200], seq(1, 100, length = 200))),
superpose = TRUE, auto.key = FALSE)
Figure 8.41 shows 200 observations from the simulation of a difference stationary time
series.
> library(lattice)
> xyplot(ts(cbind(cumsum(gamma + epsilon[1:200]), seq(1,
100, length = 200))), superpose = TRUE, auto.key = FALSE)
We now estimate model (8.19) by making use of dynlm:
$$Y_t = \alpha_0 + \gamma_0 t + \theta Y_{t-1} + \varepsilon_t$$
> library(dynlm)
> summary(dynlm(yt ~ t + L(yt, 1)))
Figure 8.41
Residuals:
  Median      3Q     Max
  0.0007  0.7159  3.0052

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.565388   0.062490   9.048   <2e-16 ***
t           0.099609   0.009481  10.507   <2e-16 ***
L(yt, 1)    0.800671   0.018970  42.207   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residuals:
    Min      1Q  Median      3Q     Max
-3.0465 -0.6226  0.1114  0.8558  3.0957

Coefficients:
              Estimate  Std. Error t value Pr(>|t|)
L(yt, 1)     0.0015126   0.0001259  12.015   <2e-16 ***
L(d(yt), 1) -0.0056115   0.0317110  -0.177     0.86
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 1.064 on 996 degrees of freedom
Multiple R-squared: 0.1435,  Adjusted R-squared: 0.1418
F-statistic: 83.46 on 2 and 996 DF, p-value: < 2.2e-16
The Dickey-Fuller statistic corresponds to the t-value pertaining to L(yt, 1):
> library(urca)
> summary(ur.df(yt, type = "none", lags = 1))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression none
Call:
lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
Residuals:
    Min      1Q  Median      3Q     Max
-3.0465 -0.6226  0.1114  0.8558  3.0957

Coefficients:
             Estimate  Std. Error t value Pr(>|t|)
z.lag.1     0.0015126   0.0001259  12.015   <2e-16 ***
z.diff.lag -0.0056115   0.0317110  -0.177     0.86
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 1.064 on 996 degrees of freedom
Multiple R-squared: 0.1435,  Adjusted R-squared: 0.1418
F-statistic: 83.46 on 2 and 996 DF, p-value: < 2.2e-16
drift
We first test for the presence of a unit root
> summary(dynlm(d(yt) ~ L(yt, 1) + L(d(yt), 1)))
Time series regression with "ts" data:
Start = 3, End = 1000
Call:
dynlm(formula = d(yt) ~ L(yt, 1) + L(d(yt), 1))
Residuals:
    Min      1Q  Median      3Q     Max
-3.1111 -0.7205  0.0281  0.7169  3.1238

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  5.400e-01  6.752e-02   7.997 3.52e-15 ***
L(yt, 1)    -1.666e-05  2.269e-04  -0.073   0.9415
L(d(yt), 1) -6.498e-02  3.164e-02  -2.054   0.0403 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 1.032 on 995 degrees of freedom
Multiple R-squared: 0.004227,  Adjusted R-squared: 0.002226
F-statistic: 2.112 on 2 and 995 DF, p-value: 0.1215
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)

Residuals:
    Min      1Q  Median      3Q     Max
-3.1111 -0.7205  0.0281  0.7169  3.1238

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  5.400e-01  6.752e-02   7.997 3.52e-15 ***
z.lag.1     -1.666e-05  2.269e-04  -0.073   0.9415
z.diff.lag  -6.498e-02  3.164e-02  -2.054   0.0403 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 1.032 on 995 degrees of freedom
Multiple R-squared: 0.004227,  Adjusted R-squared: 0.002226
F-statistic: 2.112 on 2 and 995 DF, p-value: 0.1215
trend
We first test for the presence of a unit root
> t <- t[-n]
> summary(dynlm(d(yt) ~ t + L(yt, 1) + L(d(yt), 1)))
Time series regression with "ts" data:
Start = 3, End = 1000
Call:
dynlm(formula = d(yt) ~ t + L(yt, 1) + L(d(yt), 1))
Residuals:
     Min       1Q   Median       3Q      Max
-2.99426 -0.68188  0.00284  0.71159  3.03239
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.649385   0.065077   9.979   <2e-16 ***
t            0.103120   0.009998  10.314   <2e-16 ***
L(yt, 1)    -0.206347   0.020006 -10.314   <2e-16 ***
L(d(yt), 1)  0.037754   0.031690   1.191    0.234
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.9813 on 994 degrees of freedom
Multiple R-squared: 0.1005, Adjusted R-squared: 0.09778
F-statistic: 37.02 on 3 and 994 DF, p-value: < 2.2e-16
The Dickey-Fuller statistic corresponds to the t-value pertaining to L(yt, 1); it appears (tau3) in the first position of the ur.df output, after Value of the test statistic is:.
Now we test the null hypothesis δ = γ = θ − 1 = 0
> anova(dynlm(d(yt) ~ t + L(yt, 1) + L(d(yt), 1)),
dynlm(d(yt) ~ -1 + L(d(yt), 1)))
Analysis of Variance Table

Model 1: d(yt) ~ t + L(yt, 1) + L(d(yt), 1)
Model 2: d(yt) ~ -1 + L(d(yt), 1)
  Res.Df     RSS Df Sum of Sq      F    Pr(>F)
1    994  957.17
2    997 1291.15 -3   -333.98 115.61 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The F test appears (phi2) in the second position of the ur.df output, after Value of the test statistic is:.
We finally test the null hypothesis γ = θ − 1 = 0
> anova(dynlm(d(yt) ~ t + L(yt, 1) + L(d(yt), 1)),
dynlm(d(yt) ~ 1 + L(d(yt), 1)))
Analysis of Variance Table

Model 1: d(yt) ~ t + L(yt, 1) + L(d(yt), 1)
Model 2: d(yt) ~ 1 + L(d(yt), 1)
  Res.Df     RSS Df Sum of Sq      F    Pr(>F)
1    994  957.17
2    996 1059.61 -2   -102.44 53.193 < 2.2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
The F test appears (phi3) in the third position of the ur.df output, after Value of the test statistic is:.
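For reference, the test statistics and critical values can also be extracted programmatically from the summary of the ur.df object through its slots; a minimal sketch (slot names as defined by the package urca, object name illustrative):

> trendcase <- summary(ur.df(yt, type = "trend", lags = 1))
> trendcase@teststat   # tau3, phi2, phi3 in this order
> trendcase@cval       # the corresponding 1%, 5% and 10% critical values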
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
Residuals:
     Min       1Q   Median       3Q      Max
-2.99426 -0.68188  0.00284  0.71159  3.03239

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.649385   0.065077   9.979   <2e-16 ***
z.lag.1     -0.206347   0.020006 -10.314   <2e-16 ***
tt           0.103120   0.009998  10.314   <2e-16 ***
z.diff.lag   0.037754   0.031690   1.191    0.234
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.9813 on 994 degrees of freedom
Multiple R-squared: 0.1005, Adjusted R-squared: 0.09778
F-statistic: 37.02 on 3 and 994 DF, p-value: < 2.2e-16
8.8.5 Exercise
> set.seed(456)
> n <- 1000
> alpha <- 0.4
> gamma <- 0.5
> theta1 <- 0.8
> ut <- arima.sim(model = list(order = c(1, 1, 0), ar = theta1), n)
> t <- ts(0:n)
> yt <- alpha + gamma * t + ut
Testing for a unit root
> summary(dynlm(d(yt) ~ t + L(yt, 1) + L(d(yt), 1)))
Time series regression with "ts" data:
Start = 3, End = 1001
Call:
dynlm(formula = d(yt) ~ t + L(yt, 1) + L(d(yt), 1))
Residuals:
     Min       1Q   Median       3Q      Max
-2.91494 -0.68226 -0.00539  0.70942  3.09277

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.1487382  0.0652815   2.278   0.0229 *
t            0.0008830  0.0005898   1.497   0.1347
L(yt, 1)    -0.0011549  0.0007256  -1.592   0.1118
L(d(yt), 1)  0.7975308  0.0191505  41.645   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.9806 on 995 degrees of freedom
Multiple R-squared: 0.6359, Adjusted R-squared: 0.6348
F-statistic: 579.3 on 3 and 995 DF, p-value: < 2.2e-16
Testing for δ = γ = θ − 1 = 0
> anova(dynlm(d(yt) ~ t + L(yt, 1) + L(d(yt), 1)),
dynlm(d(yt) ~ -1 + L(d(yt), 1)))
Analysis of Variance Table
Model 1: d(yt) ~ t + L(yt, 1) + L(d(yt), 1)
Model 2: d(yt) ~ -1 + L(d(yt), 1)
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1    995 956.70
2    998 977.21 -3   -20.511 7.1106 9.956e-05 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Testing for γ = θ − 1 = 0
> anova(dynlm(d(yt) ~ t + L(yt, 1) + L(d(yt), 1)),
dynlm(d(yt) ~ 1 + L(d(yt), 1)))
Analysis of Variance Table
Model 1: d(yt) ~ t + L(yt, 1) + L(d(yt), 1)
Model 2: d(yt) ~ 1 + L(d(yt), 1)
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    995 956.70
2    997 959.27 -2   -2.5686 1.3357 0.2634
> summary(ur.df(yt, type = "trend", lags = 1))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression trend
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
Residuals:
     Min       1Q   Median       3Q      Max
-2.91494 -0.68226 -0.00539  0.70942  3.09277
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.1487382  0.0652815   2.278   0.0229 *
z.lag.1     -0.0011549  0.0007256  -1.592   0.1118
tt           0.0008830  0.0005898   1.497   0.1347
z.diff.lag   0.7975308  0.0191505  41.645   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.9806 on 995 degrees of freedom
Multiple R-squared: 0.6359, Adjusted R-squared: 0.6348
F-statistic: 579.3 on 3 and 995 DF, p-value: < 2.2e-16
Adding a second lagged difference to the test regression yields:

     3Q      Max
0.70502  3.10600

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     0.1575230  0.0656766   2.398   0.0166 *
t               0.0008424  0.0005911   1.425   0.1544
L(yt, 1)       -0.0011135  0.0007270  -1.532   0.1259
L(d(yt), 1:2)1  0.8165655  0.0316832  25.773   <2e-16 ***
L(d(yt), 1:2)2 -0.0233147  0.0317436  -0.734   0.4628
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.9806 on 993 degrees of freedom
Multiple R-squared: 0.6366, Adjusted R-squared: 0.6351
F-statistic: 434.9 on 4 and 993 DF, p-value: < 2.2e-16
> summary(ur.df(yt, type = "trend", lags = 2))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression trend
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
Residuals:
     Min       1Q   Median       3Q      Max
-2.90554 -0.69296 -0.00589  0.70502  3.10600
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.1575230  0.0656766   2.398   0.0166 *
z.lag.1     -0.0011135  0.0007270  -1.532   0.1259
tt           0.0008424  0.0005911   1.425   0.1544
z.diff.lag1  0.8165655  0.0316832  25.773   <2e-16 ***
z.diff.lag2 -0.0233147  0.0317436  -0.734   0.4628
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.9806 on 993 degrees of freedom
Multiple R-squared: 0.6366, Adjusted R-squared: 0.6351
F-statistic: 434.9 on 4 and 993 DF, p-value: < 2.2e-16
8.8.6 Exercise

Testing for a unit root without drift and trend gives:

 Median      3Q     Max
0.02312 0.73171 3.03899

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
L(yt, 1)    0.0007409  0.0008270   0.896    0.371
L(d(yt), 1) 0.0333952  0.0317223   1.053    0.293
Call:
lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)

Residuals:
     Min       1Q   Median       3Q      Max
-2.99524 -0.65714  0.02312  0.73171  3.03899

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
z.lag.1    0.0007409  0.0008270   0.896    0.371
z.diff.lag 0.0333952  0.0317223   1.053    0.293

Residual standard error: 0.9818 on 996 degrees of freedom
Multiple R-squared: 0.002023, Adjusted R-squared: 1.943e-05
F-statistic: 1.01 on 2 and 996 DF, p-value: 0.3647

Call:
lm(formula = z.diff ~ z.lag.1 - 1)
Residuals:
     Min       1Q   Median       3Q      Max
-3.00751 -0.66268  0.02014  0.73729  3.01637

Coefficients:
         Estimate Std. Error t value Pr(>|t|)
z.lag.1 0.0007877  0.0008256   0.954     0.34

Residual standard error: 0.9816 on 998 degrees of freedom
Multiple R-squared: 0.0009112, Adjusted R-squared: -8.988e-05
F-statistic: 0.9102 on 1 and 998 DF, p-value: 0.3403
8.9
To import the data from the file ppp2.wf1, which is an EViews work file, first load the package hexView and then call the function readEViews.
> library(hexView)
> ppp <- readEViews(unzip("ch08.zip", "Chapter 8/ppp2.wf1"))
Skipping boilerplate variable
Skipping boilerplate variable
Observations from January 1988 to December 2010 (T = 276) on price indices and exchange rates for the United Kingdom, the United States and the Euro area are available11.
11 As Verbeek observes in Sections 8.5, 9.3 and 9.5.4, the exchange rates are accidentally not converted to logs when producing the results. Here they are converted to logs before the results are obtained.
339
4.6
4.8
5.0
5.2
LOGCPIEURO
LOGCPIUK
1990
1995
2000
2005
2010
Time
Figure 8.42
Log consumer price index UK and Euro area, Jan 1988Dec 2010
> library(urca)
> y <- ppp[, 7]
> summary(ur.df(y, type = "trend", lags = 0))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression trend
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt)
Residuals:
        Min          1Q      Median          3Q         Max
-0.0098260  -0.0013041  -0.0000486   0.0013773   0.0092253

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.010e-01  3.480e-02   2.902  0.00401 **
z.lag.1     -2.106e-02  7.467e-03  -2.821  0.00514 **
tt           3.306e-05  1.399e-05   2.363  0.01882 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.002523 on 272 degrees of freedom
Multiple R-squared: 0.06164, Adjusted R-squared: 0.05474
F-statistic: 8.934 on 2 and 272 DF, p-value: 0.0001746
ADF(7)  -3.368
ADF(7)  -3.439
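These lag-by-lag statistics can be tabulated with a loop of the following kind; a minimal sketch (assuming y holds the UK log consumer price index, ppp[, 7], as above; the exact lag set, extending up to 36, is an assumption):

> adflags <- c(0:8, 12, 24, 36)
> # ADF t-statistic, trend case, for each lag length
> adfstat <- sapply(adflags, function(k)
      summary(ur.df(y, type = "trend", lags = k))@teststat[1])
> names(adfstat) <- c("DF", paste("ADF(", adflags[-1], ")", sep = ""))
> round(adfstat, 3)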
We can observe that for the UK consumer price index the null hypothesis of a unit root is not rejected at the 5% level only for lags 6, 8 and 36, while for the other lags the hypothesis of trend stationarity is accepted at the 5% significance level. If we consider the 1% level, to which corresponds the critical value -3.98, the null of a unit root is never rejected. A more negative value of the Dickey-Fuller statistic would be necessary to reject the null of a unit root and accept trend stationarity.
With reference to the log exchange rate Euro/UK we have the following unit root
test results
> f <- function(x) {
      # Dickey-Fuller t-statistics with drift and with trend, for lag x
      nt <- summary(ur.df(y, type = "drift", lags = x))@teststat[1]
      wt <- summary(ur.df(y, type = "trend", lags = x))@teststat[1]
      return(c(nt, wt))
  }
> f1 <- function(x) {
      # TRUE when the statistic does not fall below the 5% critical value,
      # i.e. when the null of a unit root is not rejected
      nt <- summary(ur.df(y, type = "drift", lags = x))
      nt <- (nt@teststat[1] >= nt@cval[1, 2])
      wt <- summary(ur.df(y, type = "trend", lags = x))
      wt <- (wt@teststat[1] >= wt@cval[1, 2])
      return(c(nt, wt))
  }
> a <- 0:6
> names(a) <- c("DF", paste("ADF(", a[-1], ")", sep = ""))
> y <- log(ppp[, 6])
> out <- sapply(a, f)
> out1 <- sapply(a, f1)
> rownames(out) <- rownames(out1) <- c("Without trend",
343
"With trend")
> print(round(out, 3))
                  DF ADF(1) ADF(2) ADF(3) ADF(4) ADF(5) ADF(6)
Without trend -1.264 -1.215 -1.243 -1.340 -1.525 -1.404 -1.284
With trend    -1.249 -1.195 -1.222 -1.319 -1.505 -1.375 -1.248
> print(out1)
                DF ADF(1) ADF(2) ADF(3) ADF(4) ADF(5) ADF(6)
Without trend TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE
With trend    TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE

In no case is the null hypothesis of a unit root rejected.
With reference to the log real exchange rate between the Euro area and the UK,

rs_t = s_t − (p_t − p*_t),
see Fig. 8.43 obtained with the code
> xyplot(ts(log(ppp[, 6]) - (ppp[, 7] - ppp[, 8]),
start = c(1988, 1), freq = 12))
we have the following unit root test results
> y <- ts(log(ppp[, 6]) - (ppp[, 7] - ppp[, 8]), start = c(1988,
1), freq = 12)
> a <- c(0:6, 12)
> names(a) <- c("DF", paste("ADF(", a[-1], ")", sep = ""))
> out <- sapply(a, f)
> out1 <- sapply(a, f1)
> rownames(out) <- rownames(out1) <- c("Without trend",
"With trend")
> print(round(out, 3))
                  DF ADF(1) ADF(2) ADF(3) ADF(4) ADF(5) ADF(6) ADF(12)
Without trend -1.492 -1.473 -1.427 -1.476 -1.627 -1.520 -1.389  -1.993
With trend    -1.490 -1.469 -1.418 -1.466 -1.616 -1.504 -1.367  -1.966
> print(out1)
                DF ADF(1) ADF(2) ADF(3) ADF(4) ADF(5) ADF(6) ADF(12)
Without trend TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE    TRUE
With trend    TRUE   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE    TRUE

The null hypothesis of a unit root in rs_t cannot be rejected.
The KPSS test (Bartlett weights) with a lag length of 6 can also be applied.
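A minimal sketch of how the statistic can be computed with the function ur.kpss in the package urca (assuming the level-stationarity version of the test and 6 lags for the Bartlett window):

> library(urca)
> summary(ur.kpss(y, type = "mu", use.lag = 6))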
Figure 8.43 Log real exchange rate between the Euro area and the UK
      3Q      Max
0.014394 0.053296

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.006740   0.004856   1.388    0.166
L(y)        0.981693   0.012267  80.024   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.02116 on 273 degrees of freedom
Multiple R-squared: 0.9591, Adjusted R-squared: 0.959
F-statistic: 6404 on 1 and 273 DF, p-value: < 2.2e-16
Verbeek observes that, according to this result, a proportion 0.982 of any shock to the real exchange rate will still remain after one month. Thus the proportion remaining after two months is
> DFreg$coef[2]^2
[1] 0.9637211
and the half-life of a shock, describing how long it takes for half of the effect of a shock to die out, results
> log(0.5)/log(DFreg$coef[2])
[1] 37.51477
8.10
Figure 8.44
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
Residuals:
     Min       1Q   Median       3Q      Max
-16.6299  -0.9669   0.0608   1.1217   7.3940

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.76124    0.28860   2.638  0.00901 **
z.lag.1     -0.18593    0.05911  -3.145  0.00192 **
z.diff.lag1 -0.52310    0.07622  -6.863 8.57e-11 ***
z.diff.lag2 -0.29818    0.06921  -4.308 2.60e-05 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)

Residuals:
     Min       1Q   Median       3Q      Max
-16.2318  -1.0063   0.0886   1.0133   7.2619

Coefficients:
  Pr(>|t|)
  0.017873 *
  0.005447 **
  8.16e-09 ***
  0.000929 ***
  0.803994
  0.173639
Figure 8.45 Sample autocorrelation and partial autocorrelation functions of the inflation rate. The scale for lags is in years; lag 2 appears at the eighth position since we have quarterly data.
PACF  0.61  0.35  0.28 -0.04  0.09  0.12 -0.11 -0.18  0.02  0.08  0.07
     [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21]
ACF   0.26  0.19  0.27  0.22  0.23  0.29  0.23  0.23  0.27  0.25
PACF  0.03 -0.05  0.21 -0.04  0.00  0.07  0.01 -0.02  0.02  0.04
     [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
ACF   0.29  0.21  0.21  0.16  0.13  0.16  0.14  0.06  0.06
PACF  0.12 -0.19 -0.02 -0.06 -0.05  0.05  0.01 -0.08  0.02
The autocorrelation function confirms the high persistence of inflation. Verbeek initially assumes that there are no unit roots and proposes the estimation of an AR(3) model

Y_t = δ + θ_1 Y_{t−1} + θ_2 Y_{t−2} + θ_3 Y_{t−3} + ε_t

since the partial autocorrelation function cuts off at lag 3.
8.10.1 AR estimation
We can use the function dynlm in the package dynlm to obtain the estimate of the
AR(3) model by OLS:
> library(dynlm)
> ar3regr <- dynlm(infl ~ L(infl) + L(infl, 2) + L(infl,
3))
> summary(ar3regr)
Time series regression with "ts" data:
Start = 1960(4), End = 2010(4)
Call:
dynlm(formula = infl ~ L(infl) + L(infl, 2) + L(infl, 3))
Residuals:
     Min       1Q   Median       3Q      Max
-16.6299  -0.9669   0.0608   1.1217   7.3940

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.76124    0.28860   2.638  0.00901 **
L(infl)      0.29097    0.06823   4.264 3.11e-05 ***
L(infl, 2)   0.22492    0.06931   3.245  0.00138 **
L(infl, 3)   0.29818    0.06921   4.308 2.60e-05 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
8.10.2
The Ljung-Box statistics for the first 6 autocorrelations can be obtained by applying
Verbeeks relationship (8.67) to the residuals of the regression
Q_K = T (T + 2) Σ_{k=1}^{K} r_k² / (T − k),

where r_k denotes the k-th sample autocorrelation of the residuals.
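A minimal sketch of this computation for the first K = 6 autocorrelations (object names are illustrative):

> e <- as.numeric(residuals(ar3regr))
> T <- length(e)
> r <- acf(e, lag.max = 6, plot = FALSE)$acf[-1]  # r_1, ..., r_6
> T * (T + 2) * sum(r^2/(T - 1:6))                # Ljung-Box statistic Q_6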
8.10.3
It is possible to make direct use of the function Box.test(x, lag, type, fitdf), where x is a univariate time series object; lag specifies the number of lags to consider; type the statistic to compute, "Box-Pierce" or "Ljung-Box"; fitdf is the number of degrees of freedom to be subtracted from lag when x is a series of residuals, usually fitdf = p + q (where p and q are respectively the orders of the AR and MA parts of an ARMA model describing the level of the process), so that the degrees of freedom are lag − (p + q), provided of course that lag > fitdf.
The Ljung-Box statistics for the first 6 and 12 autocorrelations result:
> Box.test(residuals(ar3regr), lag = 6, type = "Ljung-Box",
fitdf = 3)
Box-Ljung test
data: residuals(ar3regr)
X-squared = 10.6758, df = 3, p-value = 0.01361
> Box.test(residuals(ar3regr), lag = 12, type = "Ljung-Box",
fitdf = 3)
Box-Ljung test
data: residuals(ar3regr)
X-squared = 16.8408, df = 9, p-value = 0.05127
To obtain AIC and BIC according to Verbeek's formulae (8.68) and (8.69) use12
> T = length(infl)
> log(summary(ar3regr)$sigma^2) + 2 * (3 + 1)/T
[1] 1.767248
> log(summary(ar3regr)$sigma^2) + (3 + 1)/T * log(T)
[1] 1.83231
12 In the output Verbeek reports AIC and BIC computed from the log likelihood, say AIC = −2l/T + 2(p + q)/T where l is the log likelihood, and not from the variance of the residuals. The function arima, see the next subsection, uses −2l + 2(p + q) for AIC.
8.10.4

We can use the function arima to obtain the estimate of the AR(3) model:
> (ar3est <- arima(infl, c(3, 0, 0)))
Series: infl
ARIMA(3,0,0) with non-zero mean

Coefficients:
         ar1     ar2     ar3  intercept
      0.2925  0.2278  0.2970     3.7845
s.e.  0.0669  0.0681  0.0681     0.8628
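The log-likelihood-based criteria mentioned in footnote 12 can also be obtained directly from the arima fit with the generic functions AIC and BIC (a sketch):

> AIC(ar3est)
> BIC(ar3est)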
8.10.5 AR(4) estimation
Verbeek extends the model by adding an additional autoregressive term; the estimate
of the model with OLS is:
> ar4regr <- dynlm(infl ~ L(infl) + L(infl, 2) + L(infl,
3) + L(infl, 4))
> summary(ar4regr)
Time series regression with "ts" data:
Start = 1961(1), End = 2010(4)
Call:
dynlm(formula = infl ~ L(infl) + L(infl, 2) + L(infl, 3) + L(infl,
4))
Residuals:
     Min       1Q   Median       3Q      Max
-16.5054  -1.0348   0.1046   1.0461   7.3404

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.77308    0.29583   2.613  0.00967 **
L(infl)      0.30700    0.07179   4.277 2.97e-05 ***
L(infl, 2)   0.23416    0.07170   3.266  0.00129 **
L(infl, 3)   0.31272    0.07258   4.309 2.60e-05 ***
L(infl, 4)  -0.04466    0.07266  -0.615  0.53950
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
         ar3      ar4  intercept
      0.3097  -0.0440     3.8046
s.e.  0.0711   0.0712     0.8301
8.10.6 ARMA estimation

Verbeek adds a moving average term to the AR(3) model. The model

Y_t = δ + θ_1 Y_{t−1} + θ_2 Y_{t−2} + θ_3 Y_{t−3} + ε_t + α_1 ε_{t−1}

can be estimated with arima (observe that the mean is returned under the name intercept):
> (maest <- arima(infl, c(3, 0, 1)))
Series: infl
ARIMA(3,0,1) with non-zero mean

Coefficients:
         ar1     ar2     ar3     ma1  intercept
      0.1047  0.3029  0.3607  0.2069     3.8094
s.e.  0.2078  0.1055  0.0871  0.2221     0.8235
8.10.7 AR(6) estimation

Since the three estimated models still exhibit some residual serial correlation, Verbeek suggests inspecting the residual autocorrelation and partial autocorrelation functions, see Fig. 8.46:
> library(astsa)
> t(acf2(ar3regr$res, max.lag = 30))
      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
ACF   0.02 -0.02 -0.05 -0.08  0.09  0.18 -0.03 -0.16 -0.04 -0.01
PACF  0.02 -0.02 -0.05 -0.08  0.09  0.18 -0.04 -0.16 -0.01  0.01
     [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
ACF   0.00  0.04 -0.14  0.07 -0.05  0.00  0.11 -0.02 -0.05  0.07
PACF -0.05 -0.01 -0.12  0.14 -0.06 -0.03  0.10  0.00 -0.05  0.05
     [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
ACF   0.06  0.17  0.01  0.04 -0.08 -0.04  0.08  0.09 -0.06 -0.04
PACF  0.08  0.20 -0.07  0.06  0.00 -0.07  0.06  0.04 -0.04  0.03
According to Verbeek the inclusion of a sixth lag seems to be appropriate. We have:
> ar6regr <- dynlm(infl ~ L(infl, 1:6))
> summary(ar6regr)
Time series regression with "ts" data:
Start = 1961(3), End = 2010(4)
Call:
dynlm(formula = infl ~ L(infl, 1:6))
Figure 8.46 Autocorrelation and partial autocorrelation functions of ar3regr$res
Residuals:
    Min      1Q  Median      3Q     Max
-16.089  -1.049   0.150   1.043   7.286

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.66109    0.30552   2.164  0.03172 *
L(infl, 1:6)1  0.30011    0.07196   4.170 4.61e-05 ***
L(infl, 1:6)2  0.21447    0.07514   2.854  0.00479 **
L(infl, 1:6)3  0.24636    0.07775   3.168  0.00178 **
L(infl, 1:6)4 -0.10612    0.07759  -1.368  0.17301
L(infl, 1:6)5  0.05719    0.07591   0.753  0.45215
L(infl, 1:6)6  0.13007    0.07300   1.782  0.07636 .
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Residual standard error: 2.367 on 191 degrees of freedom
         ar3      ar4     ar5     ar6  intercept
      0.2474  -0.1038  0.0595  0.1287     3.6914
s.e.  0.0751   0.0753  0.0736  0.0707     1.0147
8.10.8

Finally Verbeek suggests the estimation of a restricted model, including for the AR part the first three and the sixth lags. OLS parameter estimates can be obtained by using the function dynlm.
> summary(ar6regrrestr <- dynlm(infl ~ L(infl, c(1:3,
6))))
Time series regression with "ts" data:
Start = 1961(3), End = 2010(4)
Call:
dynlm(formula = infl ~ L(infl, c(1:3, 6)))
Residuals:
     Min       1Q   Median       3Q      Max
-16.4714  -0.9931   0.0571   1.1065   7.4546
Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)
L(infl, c(1:3, 6))1   0.27279    0.06928   3.937 0.000115 ***
L(infl, c(1:3, 6))2   0.21183    0.06983   3.034 0.002750 **
L(infl, c(1:3, 6))3   0.23985    0.07591   3.160 0.001835 **
L(infl, c(1:3, 6))6   0.12020    0.06657   1.806 0.072532 .
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
         ar3  ar4  ar5     ar6  intercept
      0.2414    0    0  0.1206     3.6820
s.e.  0.0737    0    0  0.0649     1.0307
The sum of the coefficients in the autoregressive specification, for the AR(3) and the restricted AR(6) models, results:
> sum(coef(ar3regr)[-1])
[1] 0.8140705
> sum(coef(ar6regrrestr)[-1])
[1] 0.8446656
The half-life of a shock can then be computed as log(0.5) / log(Σ_{j=1}^{p} θ_j):
> log(0.5)/log(sum(coef(ar3regr)[-1]))
[1] 3.369564
> log(0.5)/log(sum(coef(ar6regrrestr)[-1]))
[1] 4.105969
8.11
Data can be read by means of the function read.table, having extracted the
file irates.dat from the compressed archive ch08.zip. The function ts(object,
start, frequency) creates a multiple time series from the columns of a table; in
this case we have a monthly frequency so frequency=12.
> irates <- read.table(unzip("ch08.zip", "Chapter 8/irates.dat"),
header = TRUE)
> irates <- ts(irates, start = c(1946, 12), frequency = 12)
Figure 8.47 One-month (r1) and five-year (r60) interest rates
The file irates contains monthly interest rates for the United States taken from McCulloch and Kwon (1993). The series starts in December 1946 and ends in February 1991.
All interest rates are expressed in % per year. The variables are coded as:
ri interest rate for a maturity of i months (i = 1, 2, 3, 5, 6, 11, 12, 36, 60, 120).
Verbeek observes that in the text a subsample is used starting in January 1970.
> irates1m <- window(irates[, 1], start = c(1970, 1),
end = c(1991, 2))
To obtain the plot for the 1-month and 5-year interest rates use, as usual, the function xyplot in the package lattice, see Fig. 8.47.
> library(lattice)
> xyplot(window(irates[, c(1, 9)], start = c(1970,
1), end = c(1991, 2)), superpose = TRUE)
The OLS estimate of an AR(1) model for the 1-month interest rate can be obtained by making use of the function dynlm.
> library(dynlm)
> irates1moutput <- dynlm(irates1m ~ L(irates1m))
> summary(irates1moutput)
Time series regression with "ts" data:
Start = 1970(2), End = 1991(2)
Call:
dynlm(formula = irates1m ~ L(irates1m))
Residuals:
    Min      1Q  Median      3Q     Max
-4.2955 -0.3207  0.0160  0.2993  2.9569

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.34902    0.15246   2.289   0.0229 *
L(irates1m)  0.95120    0.01963  48.466   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.8215 on 251 degrees of freedom
Multiple R-squared: 0.9035, Adjusted R-squared: 0.9031
F-statistic: 2349 on 1 and 251 DF, p-value: < 2.2e-16
From this the estimate of the unconditional mean follows:
> (uncmean <- as.numeric(irates1moutput$coef[1]/(1 - irates1moutput$coef[2])))
[1] 7.151415
The sample average results:
> mean(irates1m)
[1] 7.302512
The Dickey-Fuller statistic, in the presence of a drift, to test for a unit root can be obtained by means of the function ur.df available in the package urca.
> library(urca)
> summary(ur.df(irates1m, lags = 0, type = "drift"))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression drift
361
Call:
lm(formula = z.diff ~ z.lag.1 + 1)
Residuals:
    Min      1Q  Median      3Q     Max
-4.2955 -0.3207  0.0160  0.2993  2.9569

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.34902    0.15246   2.289   0.0229 *
z.lag.1     -0.04880    0.01963  -2.487   0.0135 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.8215 on 251 degrees of freedom
Multiple R-squared: 0.02404, Adjusted R-squared: 0.02016
F-statistic: 6.184 on 1 and 251 DF, p-value: 0.01354
Figure 8.48 Autocorrelation and partial autocorrelation functions of irates1moutput$residuals
Calls:
n3: lm(formula = irates3m ~ irates1m)
n12: lm(formula = irates12m ~ irates1m)
n60: lm(formula = irates60m ~ irates1m)

==========================================
                 n3        n12       n60
------------------------------------------
(Intercept)   0.321***  1.292***  3.352***
             (0.066)   (0.128)   (0.217)
irates1m      1.009***  0.947***  0.739***
             (0.009)   (0.017)   (0.028)
------------------------------------------
R-squared     0.982     0.929     0.735
==========================================
The values of the coefficient for maturity n, see Verbeek's relationship (8.86), computed with θ = 0.95, result for the three series:
> (1 - 0.95^3)/(1 - 0.95)/3
[1] 0.9508333
> (1 - 0.95^12)/(1 - 0.95)/12
[1] 0.7660665
> (1 - 0.95^60)/(1 - 0.95)/60
[1] 0.3179767
Forecasts under the hypothesis of a random walk correspond to the last observation:
> irates1m[length(irates1m)]
[1] 5.677
Using the estimate θ = 0.95, the 10- and 120-period-ahead forecasts result:
> uncmean + 0.95^10 * (irates1m[length(irates1m)] - uncmean)
[1] 6.268628
> uncmean + 0.95^120 * (irates1m[length(irates1m)] - uncmean)
[1] 7.148285
The latter forecast is very close to the unconditional mean.
8.12
8.12.1
When a white noise time series {ε_t} (e.g. the residuals from an ARMA model) shows volatility clustering, it can be modelled by means of an ARCH model.
ε_t = ν_t √h_t,                                                          (8.21)
h_t = Var(ε_t | I_{t−1}) = ω + α_1 ε²_{t−1} + α_2 ε²_{t−2} + … + α_p ε²_{t−p}.   (8.22)

We can obtain the autoregressive specification for the squared process by solving the second relationship for ε²_t:

ε²_t = ω + α_1 ε²_{t−1} + … + α_p ε²_{t−p} + v_t.                        (8.23)
Observe how the latter specification for the squared process resembles that of an ARMA(max(p, q), q), but bear in mind the multiplicative nature of the process {ε_t}.
Once the model has been estimated we have to check whether the standardized residuals ε̂_t/√ĥ_t are white noise and whether they follow the distribution assumed for ν_t.
To check the white noise assumption, the autocorrelation and partial autocorrelation functions can be examined, and tests performed on the autocorrelations of both the standardized residuals and the squared standardized residuals.
The distributional assumption can be checked graphically by means of the QQ plot and, in case a normal distribution was assumed, with the Jarque-Bera and Shapiro-Wilk tests, see Section 2.8.
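A minimal sketch of these diagnostic checks for a model fitted with garchFit (the object name fit is illustrative):

> library(fGarch)
> sres <- residuals(fit, standardize = TRUE)     # standardized residuals
> Box.test(sres, lag = 10, type = "Ljung-Box")   # white noise check
> Box.test(sres^2, lag = 10, type = "Ljung-Box") # remaining ARCH effects
> qqnorm(sres); qqline(sres)                     # distributional check
> shapiro.test(as.vector(sres))                  # normality test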
8.12.2
A First Example
Identify a proper ARMA model for the time series ats stored in the workspace
exercise.RData, available on the booksite www.educatt.it/libri/materiali.
To load the R data set use the function load. By examining the behaviour of the time
series, see Fig. 8.49, and of its autocorrelation and partial autocorrelation functions,
see Fig. 8.50, one can conclude that an AR(2) model could fit the data:
> load("exercise.RData")
> library(TSA)
> plot(as.ts(ats))
> t(acf2(ats, 20, ma.test = TRUE))
we can also use, for completeness, the function armasubsets in the package TSA to
select the model with the lowest BIC statistic
> library(TSA)
> plot(armasubsets(ats, nar = 5, nma = 5))
> detach("package:TSA")
Figure 8.51 shows that the best model is an AR(2) with a drift. It can be estimated by using the function dynlm.
> library(dynlm)
> regr <- dynlm(ats ~ L(ats, 1) + L(ats, 2))
> summary(regr)
Time series regression with "zoo" data:
Start = 2012-01-10, End = 2013-05-21
Call:
dynlm(formula = ats ~ L(ats, 1) + L(ats, 2))
Figure 8.49 The time series ats
Residuals:
     Min       1Q   Median       3Q      Max
-12.6518  -1.1904  -0.0193   1.2209  13.6909

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.66923    0.14675   4.560 6.44e-06 ***
L(ats, 1)    0.51766    0.04423  11.704  < 2e-16 ***
L(ats, 2)    0.18209    0.04423   4.117 4.50e-05 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Figure 8.50 Autocorrelation and partial autocorrelation functions of the series ats
The residual series, see Fig. 8.52, can be plotted with
> plot(regr$res)
By checking the behaviour of the autocorrelation function of the squared residuals, see
Fig. 8.53, the presence of autoregressive conditional heteroscedasticity is confirmed
and an ARCH(2) model is suggested.
> acf2(regr$res^2)
Tests for ARCH effects are performed by regressing the squared residuals on a
constant and p lagged squared residuals series.
> (etsumm <- summary(dynlm(regr$res^2 ~ L(I(regr$res^2),
1))))
Time series regression with "zoo" data:
Start = 2012-01-11, End = 2013-05-21
Call:
dynlm(formula = regr$res^2 ~ L(I(regr$res^2), 1))
Figure 8.51 armasubsets model-selection plot (BIC)
Residuals:
    Min      1Q  Median      3Q     Max
-43.154  -5.175  -4.319  -0.740 175.485

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)          4.99773    0.86551   5.774 1.37e-08 ***
L(I(regr$res^2), 1)  0.29382    0.04297   6.838 2.37e-11 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 18.06 on 495 degrees of freedom
Multiple R-squared: 0.08631, Adjusted R-squared: 0.08446
F-statistic: 46.76 on 1 and 495 DF, p-value: 2.374e-11
T times the R² gives the test statistic, which is distributed as a χ² random variable with p degrees of freedom.
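A minimal sketch of this computation for the auxiliary regression estimated above (p = 1; T is approximated here by the length of the residual series):

> LM <- length(regr$res) * etsumm$r.squared  # T * R^2
> LM
> 1 - pchisq(LM, df = 1)                     # p-value from the chi-squared(1)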
369
0
10
regr$res
10
15400
15500
15600
15700
15800
Index
Figure 8.52
Figure 8.53 Autocorrelation and partial autocorrelation functions of the squared residuals from the AR model
ARCH LM-test; Null hypothesis: no ARCH effects; data: regr$res

     statistic parameter      p.value
[,1]  42.89536         1 5.774747e-11
[,2]  56.07503         2 6.660228e-13
[,3]  55.92459         3 4.359513e-12
[,4]  56.12527         4 1.887501e-11
[,5]  56.11349         5 7.700796e-11
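The table above can be reproduced with a call of the following kind; a sketch (the choice of 1 to 5 lags matches the five tests shown above):

> library(FinTS)
> sapply(1:5, function(k) ArchTest(regr$res, lags = k, demean = FALSE))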
formula a formula object describing the mean and variance equation of the ARMA-GARCH/APARCH model. A pure GARCH(1,1) model is selected when e.g. formula = ~garch(1,1). To specify for example an ARMA(2,1)-APARCH(1,1) use formula = ~arma(2,1)+aparch(1,1).
data an optional timeSeries or data frame object containing the variables in the
model. If not found in data, the variables are taken from environment(formula),
typically the environment from which armaFit is called. If data is a univariate
series, then the series is converted into a numeric vector and the name of the
response in the formula will be neglected.
include.mean this flag determines if the parameter for the mean will be
estimated or not. If include.mean=TRUE this will be the case, otherwise the
parameter will be kept fixed during the process of parameter optimization.
include.shape logical flag which determines if the parameter for the shape of
the conditional distribution will be estimated or not. If include.shape=FALSE
then the shape parameter will be kept fixed during the process of parameter
optimization.
trace a logical flag. Should the optimization process of fitting the model
parameters be printed? By default trace=TRUE.
13 From the documentation of the function garchFit in the package fGarch.
APARCH models and skew distributions for the errors are also implemented.
We start by considering the larger model: Y_t ∼ AR(4) and ε_t ∼ ARCH(4).
> garchFit(formula = ~arma(4, 0) + garch(4, 0), data = ats,
trace = FALSE)
Title:
GARCH Modelling
Call:
garchFit(formula = ~arma(4, 0) + garch(4, 0), data = ats,
trace = FALSE)
Mean and Variance Equation:
data ~ arma(4, 0) + garch(4, 0)
<environment: 0x00000000144b8728>
[data = ats]
Conditional Distribution:
norm
Coefficient(s):
          mu          ar1          ar2          ar3          ar4
  0.60499579   0.51475833   0.18565747  -0.01889278   0.01062030
       omega       alpha1       alpha2       alpha3       alpha4
  1.79948380   0.78609851   0.01896518   0.00000001   0.00000001
Std. Errors:
based on Hessian

Error Analysis:
        Estimate Std. Error t value Pr(>|t|)
mu     6.050e-01  8.374e-02   7.225 5.02e-13 ***
ar1    5.148e-01  4.313e-02  11.934  < 2e-16 ***
ar2    1.857e-01  3.694e-02   5.026 5.01e-07 ***
ar3   -1.889e-02  3.484e-02  -0.542    0.588
ar4    1.062e-02  3.047e-02   0.349    0.727
omega  1.799e+00  2.198e-01   8.188 2.22e-16 ***
alpha1 7.861e-01  1.071e-01   7.338 2.16e-13 ***
alpha2 1.897e-02  3.607e-02   0.526    0.599
alpha3 1.000e-08  4.095e-02   0.000    1.000
alpha4 1.000e-08         NA      NA       NA
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Log Likelihood:
-1079.044   normalized: -2.158088
Description:
Fri May 24 17:11:13 2013 by user: gabriele.cantaluppi
From the preceding results we observe that the coefficients pertaining to the 3rd and 4th autoregressive lags, as well as α_3 and α_4, are not significantly different from 0, so we can proceed by estimating the model Y_t ∼ AR(2), ε_t ∼ ARCH(1).
By applying the function summary to the object returned by garchFit, diagnostic results pertaining to the standardized residuals ε̂_t/√ĥ_t are also produced:
Coefficient(s):
       mu      ar1      ar2    omega   alpha1
  0.60017  0.52472  0.17563  1.78951  0.82856
Std. Errors:
based on Hessian

Error Analysis:
       Estimate Std. Error t value Pr(>|t|)
mu      0.60017    0.08381   7.161 8.00e-13 ***
ar1     0.52472    0.03971  13.215  < 2e-16 ***
ar2     0.17563    0.03493   5.029 4.94e-07 ***
omega   1.78951    0.20572   8.699  < 2e-16 ***
alpha1  0.82856    0.10820   7.658 1.89e-14 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Log Likelihood:
-1077.762   normalized: -2.155525
Description:
Fri May 24 17:11:13 2013 by user: gabriele.cantaluppi
Standardised Residuals Tests:
                                Statistic    p-Value
 Jarque-Bera Test   R    Chi^2   1.325538  0.5154222
 Shapiro-Wilk Test  R    W       0.9976004 0.6966819
 Ljung-Box Test     R    Q(10)   3.376161  0.9711369
 Ljung-Box Test     R    Q(15)   8.124189  0.9187073
 Ljung-Box Test     R    Q(20)  11.41074   0.9348674
 Ljung-Box Test     R^2  Q(10)   6.433391  0.777633
 Ljung-Box Test     R^2  Q(15)   7.772612  0.9325749
 Ljung-Box Test     R^2  Q(20)  14.90346   0.7819045
 LM Arch Test       R    TR^2    7.920425  0.7913176
The function plot applied to the fitted object produces, among others:
3. the time series with ±2 conditional standard deviations √ĥ_t superposed
4. the autocorrelation function of the time series y_t
5. the autocorrelation function of the squared series y²_t
6. the cross correlation function between y²_t and y_t
7. the graphical representation of the residuals ε̂_t
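A minimal sketch of how these plots can be requested from a fitted object (the name fit is illustrative; plot numbers follow the fGarch plot menu):

> plot(fit, which = 3)      # e.g. the series with 2 conditional SDs superimposed
> plot(fit, which = "ask")  # interactive menu listing all the available plots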
Figure 8.54 fGARCH plots
Figure 8.55 fGARCH plots
Title:
GARCH Modelling
Call:
garchFit(formula = ~arma(2, 0) + garch(1, 0), data = ats,
trace=FALSE)
Mean and Variance Equation:
data ~ arma(2, 0) + garch(1, 0)
<environment: 0x0000000019fff210>
[data = ats]
Conditional Distribution:
norm
Coefficient(s):
       mu      ar1      ar2    omega   alpha1
  0.53781  0.49843  0.19361  1.94839  0.77242
Std. Errors:
based on Hessian

Error Analysis:
       Estimate Std. Error t value Pr(>|t|)
mu     0.537809   0.018706   28.75   <2e-16 ***
ar1    0.498429   0.008842   56.37   <2e-16 ***
ar2    0.193607   0.007809   24.79   <2e-16 ***
omega  1.948387   0.047459   41.05   <2e-16 ***
alpha1 0.772418   0.023185   33.32   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Log Likelihood:
-21554.46   normalized: -2.155446

Description:
Fri May 24 17:11:16 2013 by user: gabriele.cantaluppi
Standardised Residuals Tests:
                                Statistic     p-Value
 Jarque-Bera Test   R    Chi^2   4.09366   0.1291436
 Shapiro-Wilk Test  R    W            NA          NA
 Ljung-Box Test     R    Q(10)  12.31859   0.2643003
 Ljung-Box Test     R    Q(15)  15.11452   0.4431998
 Ljung-Box Test     R    Q(20)  22.38218   0.3201377
 Ljung-Box Test     R^2  Q(10)  17.56984   0.06266804
 Ljung-Box Test     R^2  Q(15)  21.53238   0.1206646
 Ljung-Box Test     R^2  Q(20)  22.21691   0.3288577
 LM Arch Test       R    TR^2   19.71963   0.07257902
8.13
Data can be read by means of the function read.table, having extracted the file
usd.dat from the compressed archive ch08.zip.
> crates <- read.table(unzip("ch08.zip", "Chapter 8/usd.dat"),
header = TRUE)
The file usd contains daily exchange rate changes from 5 January 1999 to 28 February
2011 (T=3108), without gaps.
The following variables are available:
Since data are irregular (5 days a week), it is preferable to create an undated time series. The differenced US$/Euro exchange rate series is analyzed.
> yt <- as.ts(crates[, 3])
The graphical representation can be obtained as:
> library(lattice)
> xyplot(yt)
By regressing the series yt on a constant, the residuals et are obtained, which can be modelled by means of an ARCH specification if conditional heteroscedasticity is present.
> library(dynlm)
> et <- dynlm(yt ~ 1)$res
Tests for ARCH effects can be obtained by means of the function ArchTest in the
package FinTS. Verbeek considers 1 and 6 lags.
Figure 8.56 Daily change in the log exchange rate US$/Euro, 5 January 1999 – 28 February 2011
> library(FinTS)
> ArchTest(et, lags = 1, demean = FALSE)
ARCH LM-test; Null hypothesis: no ARCH effects
data: et
Chi-squared = 136.3154, df = 1, p-value < 2.2e-16
> ArchTest(et, lags = 6, demean = FALSE)
ARCH LM-test; Null hypothesis: no ARCH effects
data: et
Chi-squared = 208.243, df = 6, p-value < 2.2e-16
Both tests reject the hypothesis of homoscedasticity.
The following four models are estimated: an ARCH(6), a GARCH(1,1), an
EGARCH(1,1) and a GARCH(1,1) model with t-distributed errors. The parameter
estimates of the first two and the fourth models can be obtained by using the function
garchFit available in the package fGarch.
> library(fGarch)
> arch6 <- garchFit(formula = ~garch(6, 0), data = as.vector(et),
include.mean = FALSE, trace = FALSE)
> summary(arch6)
Title:
GARCH Modelling
Call:
garchFit(formula = ~garch(6, 0), data = as.vector(et),
include.mean = FALSE, trace = FALSE)
Mean and Variance Equation:
data ~ garch(6, 0)
<environment: 0x0000000016613a18>
[data = as.vector(et)]
Conditional Distribution:
norm
Coefficient(s):
     omega    alpha1    alpha2    alpha3    alpha4    alpha5    alpha6
  0.237534  0.072566  0.026908  0.084783  0.112259  0.095242  0.078275
Std. Errors:
based on Hessian

Error Analysis:
       Estimate Std. Error t value Pr(>|t|)
omega   0.23753    0.01524  15.590  < 2e-16 ***
alpha1  0.07257    0.02194   3.308 0.000939 ***
alpha2  0.02691    0.02074   1.298 0.194433
alpha3  0.08478    0.02270   3.736 0.000187 ***
alpha4  0.11226    0.02451   4.579 4.67e-06 ***
alpha5  0.09524    0.02284   4.169 3.05e-05 ***
alpha6  0.07827    0.01945   4.025 5.69e-05 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Log Likelihood:
-3045.473   normalized: -0.979882
Description:
Fri May 24 17:11:19 2013 by user: gabriele.cantaluppi
Standardised Residuals Tests:
                                Statistic      p-Value
 Jarque-Bera Test   R    Chi^2  203.2454     0
 Shapiro-Wilk Test  R    W        0.992023   4.006602e-12
 Ljung-Box Test     R    Q(10)    5.577922   0.8493912
 Ljung-Box Test     R    Q(15)   18.3471     0.2448572
 Ljung-Box Test     R    Q(20)   21.25551    0.382233
 Ljung-Box Test     R^2  Q(10)    9.355185   0.4987593
 Ljung-Box Test     R^2  Q(15)   23.09638    0.08211484
 Ljung-Box Test     R^2  Q(20)   43.9503     0.001528166
 LM Arch Test       R    TR^2    11.90132    0.4536374
Figure 8.57 fGARCH plots
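The GARCH(1,1) estimates reported below can be obtained with a call of the following kind; a sketch (the object name garch11 is illustrative):

> garch11 <- garchFit(formula = ~garch(1, 1), data = as.vector(et),
      include.mean = FALSE, trace = FALSE)
> summary(garch11)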
Figure 8.58 fGARCH plots

Coefficient(s):
     omega     alpha1      beta1
 0.0016224  0.0308557  0.9658062
Std. Errors:
based on Hessian

Error Analysis:
        Estimate Std. Error t value Pr(>|t|)
omega  0.0016224  0.0006968   2.328   0.0199 *
alpha1 0.0308557  0.0041378   7.457 8.86e-14 ***
beta1  0.9658062  0.0044704 216.047  < 2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Log Likelihood:
-2978.583   normalized: -0.9583602
Figure 8.59 QQ plot of the standardized residuals from the GARCH(1,1) model under the normality assumption
Description:
Fri May 24 17:11:19 2013 by user: gabriele.cantaluppi
Standardised Residuals Tests:
                                Statistic     p-Value
 Jarque-Bera Test   R    Chi^2  135.6136    0
 Shapiro-Wilk Test  R    W        0.993768  2.814197e-10
 Ljung-Box Test     R    Q(10)    4.401897  0.9274011
 Ljung-Box Test     R    Q(15)   15.36537   0.4254333
 Ljung-Box Test     R    Q(20)   18.42054   0.5597264
 Ljung-Box Test     R^2  Q(10)    3.850639  0.9538334
 Ljung-Box Test     R^2  Q(15)    6.46133   0.970914
 Ljung-Box Test     R^2  Q(20)   11.21603   0.9404265
 LM Arch Test       R    TR^2     4.583747  0.9704592
Figure 8.60 QQ plot of the standardized residuals from the GARCH(1,1) model under the normality assumption
Title:
GARCH Modelling
Call:
garchFit(formula = ~garch(1, 1), data=as.vector(et), cond.dist =
"std", include.mean = FALSE, trace = FALSE)
Mean and Variance Equation:
data ~ garch(1, 1)
<environment: 0x00000000179a6cc0>
[data = as.vector(et)]
Conditional Distribution:
std
Coefficient(s):
      omega      alpha1       beta1       shape
  0.0018883   0.0313289   0.9648050  10.0000000
Std. Errors:
based on Hessian

Error Analysis:
        Estimate Std. Error t value Pr(>|t|)
omega  1.888e-03  8.235e-04   2.293   0.0218 *
alpha1 3.133e-02  4.817e-03   6.504 7.84e-11 ***
beta1  9.648e-01  5.246e-03 183.899  < 2e-16 ***
shape  1.000e+01  1.497e+00   6.681 2.38e-11 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Log Likelihood:
-2952.596   normalized: -0.9499987

Description:
Fri May 24 17:11:20 2013 by user: gabriele.cantaluppi
Standardised Residuals Tests:
                                Statistic     p-Value
 Jarque-Bera Test   R    Chi^2  136.7413    0
 Shapiro-Wilk Test  R    W        0.9937392 2.608938e-10
 Ljung-Box Test     R    Q(10)    4.436802  0.9254986
 Ljung-Box Test     R    Q(15)   15.43505   0.4205569
 Ljung-Box Test     R    Q(20)   18.49223   0.5550169
 Ljung-Box Test     R^2  Q(10)    3.906787  0.951454
Figure 8.61 QQ plot of the standardized residuals from the GARCH(1,1) model with t-distributed errors
 Ljung-Box Test     R^2  Q(15)    6.532642  0.9693467
 Ljung-Box Test     R^2  Q(20)   11.22761   0.9401049
 LM Arch Test       R    TR^2     4.622786  0.9694076
Figure 8.62 shows the conditional standard deviations implied by the EGARCH model and the QQ plot for checking the normality of the standardized residuals.
> xyplot(as.ts(sigma2hat^0.5))
> stdres <- et/sigma2hat^0.5
> qqnorm(stdres)
The function egarch in the package egarch is also available for estimating the parameters of an EGARCH(1,1) model, but it refers to the following EGARCH specification:

log(h_t) = α_0 + α_1 log(h_{t−1}) + β_1 [ |ε_{t−1}|/√h_{t−1} − E(|ε_{t−1}|/√h_{t−1}) ] + γ_1 ε_{t−1}/√h_{t−1},

different from that proposed by Verbeek; see ?egarch::egarch.
Figure 8.62 Conditional standard deviations implied by the EGARCH model and normal QQ plot of the standardized residuals
9
Multivariate Time Series Models
9.1
Y_t = Y_{t−1} + ε_{1t},   ε_{1t} ∼ IID(0, σ₁²)
X_t = X_{t−1} + ε_{2t},   ε_{2t} ∼ IID(0, σ₂²)
The OLS output of the spurious regression (tab9.1) results:

     3Q     Max
 2.1179  8.2879

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.90971    0.24618   15.88   <2e-16 ***
X           -0.44348    0.04733   -9.37   <2e-16 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 3.27 on 198 degrees of freedom
Multiple R-squared: 0.3072, Adjusted R-squared: 0.3037
F-statistic: 87.8 on 1 and 198 DF, p-value: < 2.2e-16

> library(lmtest)
> dwtest(tab9.1)

        Durbin-Watson test

data: tab9.1
DW = 0.1331, p-value < 2.2e-16
alternative hypothesis: true autocorrelation is greater than 0
As Verbeek remarks, the usual t and F tests may be misleading in situations like the present one, where we observe a quite reasonable R-squared together with a low Durbin-Watson statistic1. The latter signals the possible presence of a unit root in the residuals, and hence the non-existence of a cointegrating relationship between {X_t} and {Y_t}, contrary to what the spurious regression would suggest. He suggests including lagged values of both the dependent and the independent variables in the regression to avoid the spurious regression problem.
Let's perform 1000 Monte Carlo replications of the above spurious regression experiment to show the large variability of the parameter estimates, given that there is no actual relationship between {X_t} and {Y_t}.
> set.seed(12345)
> library(lmtest)
> sim <- function(n = 200) {
      X <- c(0, cumsum(rnorm(n - 1)))
      Y <- c(0, cumsum(rnorm(n - 1)))
      res <- lm(Y ~ X)
      res1 <- list(int = res$coef[1], slope = res$coef[2],
          int.pv = summary(res)$coeff[1, 4],
          slope.pv = summary(res)$coeff[2, 4],
          r2 = summary(res)$r.squared,
          dw = dwtest(res)$statistic)
  }
> a <- replicate(1000, sim(n = 200))
The function sim generates the data for the present experiment2; it returns the list res1 with elements: the intercept and the slope estimates, the corresponding p-values, the multiple R-squared and the Durbin-Watson statistic.
1 We remind that dw ≈ 2 − 2ρ, so approximately 0 ≤ dw ≤ 4, with dw = 0 when ρ = 1, that is in the presence of a positive unit root, dw = 2 when ρ = 0, and dw = 4 when ρ = −1, that is in the presence of a negative unit root.
2 Observe that to generate a random walk the following for loop can also be used:
X <- rep(0, n)
for (i in 2:n) X[i] <- X[i-1] + rnorm(1)
but it is less efficient and slower; check the two versions of the code for n = 1000000.
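A quick way to compare the two generators is the following sketch:

> n <- 1000000
> system.time(X <- c(0, cumsum(rnorm(n - 1))))      # vectorized version
> system.time({X <- rep(0, n)
      for (i in 2:n) X[i] <- X[i - 1] + rnorm(1)})  # for-loop version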
The function replicate (a shortcut for sapply) returns the matrix a, which contains the results for the 1000 replications of the experiment; the rows of a hold the values of the intercept, the slope, their p-values, the R² and the Durbin-Watson statistic obtained in each replication.
By computing the following summary statistics we can observe that 89.5% of the replications present a significant3 estimate for the intercept, 82.5% a significant estimate for the slope, 73.8% significant estimates for both the intercept and the slope, 33.7% a multiple R-squared larger than 0.3 and 98.8% a value of the Durbin-Watson statistic lower than 0.25.
> sum((a[3, ] < 0.05))/1000
[1] 0.895
> sum((a[4, ] < 0.05))/1000
[1] 0.825
> sum((a[3, ] < 0.05) * (a[4, ] < 0.05))/1000
[1] 0.738
> sum(a[5, ] > 0.3)/1000
[1] 0.337
> sum(a[6, ] < 0.25)/1000
[1] 0.988
Summary statistics for the intercept and slope show that their estimates vary widely from simulation to simulation (see also Fig. 9.1), evidence of spurious relationships: {X_t} and {Y_t} have no actual reciprocal relationship.
> summary(as.numeric(a[1, ]))
     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-21.1800  -4.5880  -0.2781  -0.2166   3.7530  27.2800
> summary(as.numeric(a[2, ]))
       Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
-2.5020000 -0.3608000  0.0091750  0.0004262  0.3686000  2.7430000
> layout(matrix(1:2, 1, 2))
> hist(as.numeric(a[1, ]), freq = FALSE, main = "",
xlab = "intercept")
> hist(as.numeric(a[2, ]), freq = FALSE, main = "",
xlab = "slope")
9.2
The analysis of Long-run Purchasing Power Parity, started in Section 8.9 (Verbeek's Section 8.5), is continued.
3 At the 5% level.
Figure 9.1 Histograms of the intercept and slope estimates over the 1000 replications
To import the data from the file ppp2.wf1, which is a work file of EViews, invoke first the package hexView and next the command readEViews.
> library(hexView)
> ppp <- readEViews(unzip("ch08.zip", "Chapter 8/ppp2.wf1"))
Skipping boilerplate variable
Skipping boilerplate variable
Verbeek observes that the relationship s_t = p_t − p*_t, where s_t, p_t and p*_t are respectively the log of the spot exchange rate, the log of domestic prices and the log of foreign prices, may be interpreted as an equilibrium (long-run) or cointegrating relationship.
In the example, observations for the Euro area and the UK from January 1988 until December 2010 are considered.
See Section 8.9 for the analysis to detect the non-stationarity of the real exchange rate rs_t = s_t − p_t + p*_t.
4 As Verbeek observes in Sections 8.5, 9.3 and 9.5.4, the exchange rates are accidentally not converted to logs when producing the results. Here they are converted to logs.
Verbeek suggests testing whether the cointegrating relationship involving the log exchange rate s_t and the log price ratio p_t − p*_t can be established.
> library(urca)
> x <- ppp$LOGCPIEURO - ppp$LOGCPIUK
> a <- c(0:6, 12, 24, 36)
> names(a) <- c("DF", paste("ADF(", a[-1], ")", sep = ""))
> f <- function(k) {
      nt <- summary(ur.df(x, type = "drift", lags = k))@teststat[1]
      wt <- summary(ur.df(x, type = "trend", lags = k))@teststat[1]
      return(c(nt, wt))
  }
> out <- sapply(a, f)
> rownames(out) <- c("Without trend", "With trend")
> print(t(round(out, 3)))
        Without trend With trend
DF             -2.487     -2.564
ADF(1)         -2.533     -2.622
ADF(2)         -2.518     -2.639
ADF(3)         -2.137     -2.288
ADF(4)         -2.070     -2.229
ADF(5)         -2.037     -2.213
ADF(6)         -2.103     -2.227
ADF(12)        -2.989     -3.041
ADF(24)        -3.131     -3.424
ADF(36)        -2.027     -1.975
Remembering that the 5% critical values for the Dickey-Fuller statistic are −2.88 and −3.43, respectively for the situation with only a drift and for that with both a drift and a trend, the hypothesis of non-stationarity cannot be rejected. The ADF(24) statistic is marginally significant.
The parameters in the cointegrating regression (see Verbeek's Table 9.5)

s_t = α + β(p_t − p*_t) + ε_t

can be estimated by having, as usual, recourse to the function lm.
> a <- lm(log(FXEP) ~ I(LOGCPIEURO - LOGCPIUK), data = ppp)
> summary(a)
Call:
lm(formula = log(FXEP) ~ I(LOGCPIEURO - LOGCPIUK), data = ppp)
Residuals:
     Min       1Q   Median       3Q      Max
-0.24561 -0.08619  0.01368  0.06477  0.22179
Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)               0.38246    0.01806  21.172  < 2e-16 ***
I(LOGCPIEURO - LOGCPIUK)  1.01657    0.28125   3.614 0.000358 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Residual standard error: 0.1045 on 274 degrees of freedom
Multiple R-squared: 0.04551, Adjusted R-squared: 0.04203
F-statistic: 13.06 on 1 and 274 DF, p-value: 0.0003581
Tests on the residuals for establishing the possible presence of nonstationarity can
also be performed.
The cointegrating regression Durbin-Watson (CRDW) statistic can be obtained with the function dwtest available in the package lmtest; note that only the value of the statistic should be considered, not its p-value, which is computed against the null of no autocorrelation, not against the null of no cointegration. See Verbeek's Table 9.3 for the 5% critical values of the CRDW test for no cointegration.
> library(lmtest)
> dwtest(a)$stat
DW
0.04120717
The Augmented Dickey-Fuller cointegration test is also performed on the residuals, see Verbeek's Table 9.6. We can have recourse again to the function f constructed above, considering only the first row of the resulting output, which refers to the case with only a drift.
> x <- a$residuals
> a <- 0:6
> out <- sapply(a, f)
> print(t(round(out[1, ], 3)))
       [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]
[1,] -1.497 -1.478 -1.431 -1.479 -1.628 -1.522 -1.392
Also in this case only the value of the statistic should be considered. See Verbeek's Table 9.2 for the 1%, 5% and 10% asymptotic critical values of the residual-based unit root ADF test for no cointegration (with constant term). In the present case the 5% critical value is −3.34, since two variables were considered in the cointegrating relationship. So the null hypothesis of a unit root cannot be rejected.
Verbeek then suggests considering a more general cointegrating relationship between the three variables s_t, p_t and p*_t, by estimating the parameters in the model

s_t = α + β₁ p_t + β₂ p*_t + ε_t,
see Verbeek's Table 9.7, and performing the corresponding tests on the residuals as done above, see Verbeek's Table 9.8.
> a <- lm(log(FXEP) ~ LOGCPIEURO + LOGCPIUK, data = ppp)
> summary(a)
Call:
lm(formula = log(FXEP) ~ LOGCPIEURO + LOGCPIUK, data = ppp)
Residuals:
     Min       1Q   Median       3Q      Max
-0.24406 -0.08568  0.01416  0.06590  0.22198

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.4193     0.2082   2.014 0.045027 *
LOGCPIEURO    1.0076     0.2863   3.520 0.000506 ***
LOGCPIUK     -1.0151     0.2819  -3.601 0.000376 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
9.3
Results are consistent with those reported by Verbeek. Verbeek observes that the test results may be sensitive to the number of lags included and shows what happens with a lag length of p = 13, see Verbeek's Table 9.11.
> tab9.11 <- johansen(sppstar, r = 0, p = 13, det = "const", restr = FALSE)
> tab9.11$tracestat
          stat   90%   95%   99%
r=0  25.386442 26.70 29.38 34.87
r=1   8.444536 13.31 15.34 19.69
r=2   3.358870  2.71  3.84  6.64
> tab9.11$lambda
       r=1        r=2        r=3
0.06238690 0.01915137 0.01269016
The test in this case does not reject the null of zero cointegrating vectors.
9.4
The data can be read by means of the function readEViews, available in the package hexView, having extracted the EViews work file money.wf1 from the compressed archive ch09.zip. We can create a multiple time series from the columns of a table or a data.frame with the function ts(object, start, frequency); in this case we have to specify frequency = 4 since we are dealing with quarterly data.
We drop the column with the time reference.
> library(hexView)
> money <- readEViews(unzip("ch09.zip", "Chapter 9/money.wf1"))
Skipping boilerplate variable
Skipping boilerplate variable
> money <- money[, -4]
> money <- ts(money, start = c(1954, 1), frequency = 4)
Verbeek considers first three theoretical relationships governing the long-run behaviour of these variables, which can be assumed as theoretical cointegrating relationships, and performs three separate OLS estimations. The parameter estimates for the equations describing the money demand, the inflation rate and the commercial paper rate,

m_t = δ₁ + β₁₄ y_t + β₁₅ tbr_t + ε₁t
infl_t = δ₂ + β₂₅ tbr_t + ε₂t
cpr_t = δ₃ + β₃₅ tbr_t + ε₃t

are obtained by means of the function dynlm available in the package dynlm, and the results organized by the function mtable available in the package memisc.
> library(dynlm)
> library(lmtest)
> library(urca)
> demols <- dynlm(M ~ Y + TBR, data = money)
> inflols <- dynlm(INFL ~ TBR, data = money)
> commpapols <- dynlm(CPR ~ TBR, data = money)
no cointegration at the 5% level for the last two equations, while, according to the residual-based unit root ADF statistic for no cointegration (with constant term), the hypothesis of no cointegration is rejected only for the third equation, describing the risk premium5.
Verbeek suggests performing a multivariate vector analysis to obtain stronger evidence on the existence of cointegrating relationships between the five variables. He also performs a graphical analysis of the residuals of the three equations to check for stationarity, see Figures 9.2, 9.3 and 9.4.
> plot(demols$res, type = "l")
> abline(h = 0)
> plot(inflols$res, type = "l")
> abline(h = 0)
> plot(commpapols$res, type = "l")
> abline(h = 0)
To perform the Johansen procedure, the maximum lag length p in the vector autoregressive model has to be chosen. Verbeek suggests the orders 5 and 6 and reports in Table 9.14 the trace and maximum eigenvalue tests for cointegration.
The maximum eigenvalue tests for cointegration can be obtained by means of the function ca.jo available in the package urca, see Verbeek's Table 9.106. The main arguments of the function ca.jo are: x, the data matrix to be investigated for cointegration; type, the test to be conducted, either "eigen" or "trace"; ecdet, which can be set to "none" for no intercept in the cointegration, "const" for a constant term in the cointegration and "trend" for a trend variable in the cointegration; K, the lag order of the series (levels) in the VAR; spec, which determines the specification of the VECM and can be "longrun" or "transitory". See the help ?urca::ca.jo for more information.
> allvar <- money[, c("M", "INFL", "CPR", "Y", "TBR")]
> library(urca)
> summary(ca.jo(allvar, ecdet = "const", type = "trace", K = 5,
spec = "longrun"))
######################
# Johansen-Procedure #
5 Remember that both the CRDW and ADF statistics refer to the null hypothesis of a unit root in the residuals, that is to the null hypothesis of no cointegration. Appropriate 5% critical values from Verbeek's Table 9.2 for the ADF test are −3.34 for possible cointegrating relationships involving 2 variables and −3.74 for possible cointegrating relationships involving 3 variables; with regard to the CRDW test, from Verbeek's Table 9.3 the 5% critical values are 0.20 for possible cointegrating relationships involving 2 variables and 0.25 for possible cointegrating relationships involving 3 variables (number of observations: 200).
6 Critical values for the max-eigenvalue test are taken from Osterwald-Lenum (1992). Though quite similar, they differ somewhat from the critical values reported by Verbeek in Table 9.9. Observe that the tests are listed in reverse order with respect to Verbeek's output.
Figure 9.2 Residuals of the money demand equation (demols$res)
######################
Test type: trace statistic,
without linear trend and constant in cointegration
Eigenvalues (lambda):
[1] 2.4582e-01 1.7317e-01 1.0279e-01 5.1003e-02 2.2397e-02 7.2499e-16
Values of teststatistic and critical values of test:

          test 10pct  5pct  1pct
r <= 4 |  3.60  7.52  9.24 12.97
r <= 3 | 11.93 17.85 19.96 24.60
r <= 2 | 29.17 32.00 34.91 41.07
r <= 1 | 59.41 49.65 53.12 60.16
r = 0  | 104.26 71.86 76.07 84.45
Figure 9.3: inflols$res plotted against Time (1960–1990)
Figure 9.4: commpapols$res plotted against Time (1960–1990)
Weights W:
(This is the loading matrix)
CPR.d   2.351476  0.2971658 -1.5680125 0.2378696 -0.01334883 -2.3114e-12
Y.d    -0.053129  0.0022981 -0.0055104 0.0017462 -0.00051361  8.3207e-14
TBR.d   1.749904 -0.0049994 -1.5903476 0.2031231 -0.06967642 -7.2022e-13
> summary(ca.jo(allvar, ecdet = "const", type = "trace", K = 6,
spec = "longrun"))
######################
# Johansen-Procedure #
######################
Test type: trace statistic,
without linear trend and constant in cointegration
Eigenvalues (lambda):
[1] 2.6902e-01 1.9711e-01 1.2999e-01 7.2538e-02 1.7324e-02 2.4472e-16
Values of teststatistic and critical values of test:

           test 10pct  5pct  1pct
r <= 4 |   2.76  7.52  9.24 12.97
r <= 3 |  14.66 17.85 19.96 24.60
r <= 2 |  36.66 32.00 34.91 41.07
r <= 1 |  71.35 49.65 53.12 60.16
r = 0  | 120.86 71.86 76.07 84.45
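The maximum eigenvalue statistics reported below are consistent with a call of the following form (a sketch, assuming the same data matrix and arguments used for the trace test):

> summary(ca.jo(allvar, ecdet = "const", type = "eigen", K = 5,
      spec = "longrun"))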
Eigenvalues (lambda):
[1] 2.4582e-01 1.7317e-01 1.0279e-01 5.1003e-02 2.2397e-02 7.2499e-16
Values of teststatistic and critical values of test:
test 10pct 5pct 1pct
r <= 4 | 3.60 7.52 9.24 12.97
r <= 3 | 8.32 13.75 15.67 20.20
r <= 2 | 17.25 19.77 22.00 26.81
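The analogous output for lag order 6 presumably comes from (again a sketch):

> summary(ca.jo(allvar, ecdet = "const", type = "eigen", K = 6,
      spec = "longrun"))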
Eigenvalues (lambda):
[1] 2.6902e-01 1.9711e-01 1.2999e-01 7.2538e-02 1.7324e-02 2.4472e-16
Values of teststatistic and critical values of test:

          test 10pct  5pct  1pct
r <= 4 |  2.76  7.52  9.24 12.97
r <= 3 | 11.90 13.75 15.67 20.20
r <= 2 | 22.00 19.77 22.00 26.81
r <= 1 | 34.69 25.56 28.14 33.24
r = 0  | 49.51 31.66 34.40 39.79
10
Models based on panel data
10.1
Verbeek considers the application of the Between, the Fixed effects, the OLS (pooling)
and the Random effects estimators to a panel data linear model for an individual
wage equation. The data are saved in the file males.dta, which is in the Stata format
and is available in the compressed archive ch10.zip. To read the data we first have to
invoke the package foreign and then use the command read.dta.
> library(foreign)
> wages <- read.dta(unzip("ch10.zip", "Chapter 10/males.dta"))
Data are taken from the Youth Sample of the National Longitudinal Survey held in
the USA, and comprise a sample of 545 full-time working males who completed their
schooling by 1980 and were then followed over the period 1980-1987. The males in the
sample are young, with an age in 1980 ranging from 17 to 23, and entered the labour
market fairly recently, with an average of 3 years of experience at the beginning of
the sample period.
The following variables are available:
Exper: Age-6-School
LogExper: Log(1+Experience)
Mar: Married
Black: Black
Hisp: Hispanic
S: Lives in South
Verbeek supposes that log wages are explained by years of schooling, years of
experience and its square, dummy variables for being a union member, working in
the public sector and being married, and two racial dummies.
The package plm can be used to deal with models based on panel data; the plm
procedures are thoroughly described in Croissant and Millo (2008).
A data.frame containing panel data must be characterized by the presence of two
variables defining respectively individual and time indices. The first two columns in
the data.frame wages contain such information.
> wages[1:5, 1:2]
NR YEAR
1 13 1982
2 13 1981
3 13 1986
4 13 1983
5 13 1984
Repeated measurements on each statistical unit are not ordered according to time;
the following code can be used to reorder the data frame when needed.
> i <- order(wages[, 1], wages[, 2])
> wages <- wages[i, ]
> wages[1:5, 1:2]
NR YEAR
7 13 1980
2 13 1981
1 13 1982
4 13 1983
5 13 1984
The panel model may be estimated with the function plm, which requires the following
arguments (a sketch of the corresponding calls follows the list):
effect specifies the kind of effect to introduce in the model; the argument may assume one of the values individual, time or twoways;
model specifies the kind of model: within, random, ht, between, pooling or fd; the options refer respectively to the within or fixed effects estimator, the random effects estimator, the Hausman-Taylor estimator, the between effects estimator, the pooling estimator (which is equivalent to OLS) and the first-difference estimator;
index defines the indices for the individual and the time; it needs to be used whenever these two variables are not placed in the first two columns of the data.frame we are analysing.
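The estimation calls can be sketched as follows (a minimal sketch: the formula is the one appearing in the within output below, and the object names ols, between, fixed and random match those used in the remainder of this section):

> library(plm)
> ols <- plm(WAGE ~ SCHOOL + EXPER + EXPER2 + UNION + MAR +
      BLACK + HISP + PUB, data = wages, model = "pooling")
> between <- plm(WAGE ~ SCHOOL + EXPER + EXPER2 + UNION + MAR +
      BLACK + HISP + PUB, data = wages, model = "between")
> fixed <- plm(WAGE ~ SCHOOL + EXPER + EXPER2 + UNION + MAR +
      BLACK + HISP + PUB, data = wages, model = "within")
> random <- plm(WAGE ~ SCHOOL + EXPER + EXPER2 + UNION + MAR +
      BLACK + HISP + PUB, data = wages, model = "random")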
Output for the between estimation method

> summary(between)

Coefficients :
              Estimate Std. Error t-value  Pr(>|t|)
(Intercept)  0.4903902  0.2211917  2.2170 0.0270394 *
SCHOOL       0.0947911  0.0109178  8.6822 < 2.2e-16 ***
EXPER       -0.0502077  0.0503689 -0.9968 0.3193120
EXPER2       0.0051068  0.0032142  1.5888 0.1126871
UNION        0.2743194  0.0471273  5.8208 1.009e-08 ***
MAR          0.1445897  0.0412654  3.5039 0.0004968 ***
BLACK       -0.1391368  0.0489084 -2.8448 0.0046132 **
HISP         0.0054832  0.0427436  0.1283 0.8979738
PUB         -0.0563215  0.1090691 -0.5164 0.6057992
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

Total Sum of Squares:    83.06
Residual Sum of Squares: 64.819
R-Squared      : 0.2196
Adj. R-Squared : 0.21598
F-statistic: 18.8539 on 8 and 536 DF, p-value: < 2.22e-16
Output for the fixed effects or within estimation method
> summary(fixed)
Oneway (individual) effect Within Model
Call:
plm(formula = WAGE ~ SCHOOL + EXPER + EXPER2 + UNION + MAR +
BLACK + HISP + PUB, data = wages, model = "within")
Balanced Panel: n=545, T=8, N=4360
Residuals :
  Median  3rd Qu.     Max.
 0.00992  0.15900  1.47000

Coefficients :
           Estimate  Std. Error t-value  Pr(>|t|)
EXPER    0.11645699  0.00843090 13.8131 < 2.2e-16 ***
EXPER2  -0.00428857  0.00060544 -7.0834 1.668e-12 ***
UNION    0.08120303  0.01931592  4.2039 2.683e-05 ***
MAR      0.04510613  0.01831141  2.4633   0.01381 *
PUB      0.03492672  0.03860819  0.9046   0.36571
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Output for the random effects estimation method

> summary(random)

Coefficients :
               Estimate  Std. Error t-value  Pr(>|t|)
(Intercept) -0.10431133  0.11083404 -0.9411 0.3466808
SCHOOL       0.10102372  0.00892187 11.3232 < 2.2e-16 ***
EXPER        0.11178514  0.00827093 13.5154 < 2.2e-16 ***
EXPER2      -0.00405745  0.00059198 -6.8540 8.189e-12 ***
UNION        0.10641339  0.01786690  5.9559 2.791e-09 ***
MAR          0.06254646  0.01677617  3.7283 0.0001952 ***
BLACK       -0.14400263  0.04764392 -3.0225 0.0025218 **
HISP         0.01972690  0.04263026  0.4627 0.6435709
PUB          0.03015546  0.03646707  0.8269 0.4083261
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
To obtain a unique summary output for the results of different objects of class plm,
we have to invoke the package tonymisc, which allows the function mtable of the
package memisc to deal with plm objects.
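A call along the following lines then collects the estimates in a single table (a sketch, assuming the estimator objects sketched above):

> library(memisc)
> library(tonymisc)
> mtable(ols, between, fixed, random)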
> ercomp(random)
                 var std.dev share
idiosyncratic 0.1234  0.3513 0.539
individual    0.1055  0.3248 0.461
theta:  0.6429
The Hausman test can be performed by having recourse to the function phtest
> phtest(fixed, random)
Hausman Test
data: WAGE~SCHOOL + EXPER + EXPER2 + UNION + MAR + BLACK + HISP + PUB
chisq = 31.7531, df = 5, p-value = 6.649e-06
alternative hypothesis: one model is inconsistent
Observe that the function plm returns only the within R² for the Fixed effects and
the Random effects estimators. It is possible to compute the three goodness-of-fit
statistics (within, between and overall) for all the estimators by having recourse to
Verbeek's relationships (10.29)-(10.31).
R^2_{within,FE} = corr^2\{\hat{y}_{it}^{FE} - \bar{\hat{y}}_i^{FE},\ y_{it} - \bar{y}_i\}, where \hat{y}_{it}^{FE} - \bar{\hat{y}}_i^{FE} = (x_{it} - \bar{x}_i)'\hat{\beta}_{FE}

R^2_{between,B} = corr^2\{\bar{\hat{y}}_i^{B},\ \bar{y}_i\}, where \bar{\hat{y}}_i^{B} = \bar{x}_i'\hat{\beta}_B

R^2_{overall} = corr^2\{\hat{y}_{it},\ y_{it}\}, where \hat{y}_{it} = x_{it}'b

With regard to the between estimator we first need the complete data for the variables
involved in the between model (that is, the model matrix for the OLS estimator),
say XdataOLS:
> XdataOLS <- model.matrix(ols)
then we have
> yit.hat <- XdataOLS %*% coef(ols)
> yi.bar <- tapply(wages$WAGE, wages$NR, mean)
> yi.hat <- tapply(yit.hat, wages$NR, mean)
Observe that in this case we have a regular panel (that is complete time data for each
statistical unit are available). The goodness of fit statistics result:
> ff <- function(i) tapply(XdataOLS[, i], wages$NR, mean)
> modelmatrix <- sapply(1:dim(model.matrix(ols))[2],
      ff)
> yiB.hat <- modelmatrix %*% coef(ols)
with the corresponding goodness of fit statistics
> (OLS.withinR2 <- cor((yit.hat - rep(yi.hat, each = 8)),
(wages$WAGE - rep(yi.bar, each = 8)))^2)
[,1]
[1,] 0.1679288
> (OLS.betweenR2 <- cor(yiB.hat, yi.bar)^2)
[,1]
[1,] 0.2026535
> (OLS.overallR2 <- cor(yit.hat, wages$WAGE)^2)
[,1]
[1,] 0.1865882
And finally, for the random effects estimator:
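Following the same pattern as for the OLS estimator, the computations can be sketched as below (the names yitRE.hat, yiRE.hat, RE.withinR2, RE.betweenR2 and RE.overallR2 are hypothetical):

> yitRE.hat <- XdataOLS %*% coef(random)  # fitted values with the RE coefficients
> yiRE.hat <- tapply(yitRE.hat, wages$NR, mean)  # individual means of the fits
> (RE.withinR2 <- cor((yitRE.hat - rep(yiRE.hat, each = 8)),
      (wages$WAGE - rep(yi.bar, each = 8)))^2)
> (RE.betweenR2 <- cor(modelmatrix %*% coef(random), yi.bar)^2)
> (RE.overallR2 <- cor(yitRE.hat, wages$WAGE)^2)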
10.2
Verbeek applies a dynamic linear panel model to the theory of Flannery and Rangan
(2006), which explains the adjustments performed by firms to reach their target
capital structure.
Data are available in the file debtratio.dta, a Stata file containing information on
US firms over the years 1987-2001. The panel is unbalanced. Data are taken from
Compustat.
> library(foreign)
> debtratio <- read.dta(unzip("ch10.zip", "Chapter 10/debtratio.dta"))
We can check the structure of the panel data with the function pdim available in the
package plm
> library(plm)
> pdim(debtratio)
Unbalanced Panel: n=5449, T=1-16, N=27762
The following variables are available (except for bdr and mdr, all variables are already
lagged and refer to the (end of the) previous year).
In Verbeek's Table 10.3, results pertaining to the OLS, the within (fixed effects) and
the first-difference estimators are reported. Robust standard errors have been computed.
Output for the OLS estimation method
> ols <- plm(mdr ~ lag(mdr, 1) + lagebit_ta + lagmb +
lagdep_ta + laglnta + lagfa_ta + lagrd_dum +
lagrd_ta + lagindmedian + lagrated, model = "pooling",
data = debtratio)
> summary(ols)
Oneway (individual) effect Pooling Model
Call:
plm(formula = mdr ~ lag(mdr, 1) + lagebit_ta + lagmb + lagdep_ta +
    laglnta + lagfa_ta + lagrd_dum + lagrd_ta + lagindmedian +
    lagrated, data = debtratio, model = "pooling")
Coefficients :
                 Estimate  Std. Error  t-value  Pr(>|t|)
(Intercept)    0.05818177  0.01089409   5.3407 9.364e-08 ***
lag(mdr, 1)    0.88350360  0.00455677 193.8880 < 2.2e-16 ***
lagebit_ta    -0.03233775  0.00570742  -5.6659 1.483e-08 ***
lagmb          0.00164320  0.00078139   2.1029 0.0354844 *
lagdep_ta     -0.26051795  0.03346611  -7.7845 7.344e-15 ***
laglnta       -0.00067042  0.00060575  -1.1068 0.2684048
lagfa_ta       0.02012146  0.00514792   3.9087 9.312e-05 ***
lagrd_dum      0.00688957  0.00202285   3.4059 0.0006609 ***
lagrd_ta      -0.12020508  0.01423761  -8.4428 < 2.2e-16 ***
lagindmedian   0.03212249  0.00910841   3.5267 0.0004218 ***
lagrated       0.00713406  0.00291144   2.4504 0.0142803 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Total Sum of Squares:    1184.8
Residual Sum of Squares: 306.93
R-Squared      : 0.74093
Adj. R-Squared : 0.74052
F-statistic: 5594.77 on 10 and 19562 DF, p-value: < 2.22e-16
> library(lmtest)
> coeftest(ols, vcov = pvcovHC)
t test of coefficients:

                 Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.05818177
lag(mdr, 1)    0.88350360
lagebit_ta    -0.03233775
lagmb          0.00164320
lagdep_ta     -0.26051795
laglnta       -0.00067042
lagfa_ta       0.02012146
lagrd_dum      0.00688957
lagrd_ta      -0.12020508
lagindmedian   0.03212249
lagrated       0.00713406 0.00279809  2.5496 0.0107916 *
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Output for the Within estimation method
> within <- plm(mdr ~ lag(mdr, 1) + lagebit_ta + lagmb +
lagdep_ta + laglnta + lagfa_ta + lagrd_dum +
lagrd_ta + lagindmedian + lagrated, model = "within",
data = debtratio)
> summary(within)
Oneway (individual) effect Within Model
Call:
plm(formula = mdr ~ lag(mdr, 1) + lagebit_ta + lagmb + lagdep_ta +
laglnta + lagfa_ta + lagrd_dum + lagrd_ta + lagindmedian +
lagrated, data = debtratio, model = "within")
Unbalanced Panel: n=3777, T=1-15, N=19573
Residuals :
     Min.  1st Qu.   Median  3rd Qu.     Max.
 -0.61700 -0.04940 -0.00208  0.04310  0.57600
Coefficients :
                 Estimate  Std. Error t-value  Pr(>|t|)
lag(mdr, 1)    5.3498e-01  7.6646e-03 69.7992 < 2.2e-16 ***
lagebit_ta    -5.0033e-02  8.0860e-03 -6.1876 6.260e-10 ***
lagmb          2.2776e-03  1.1358e-03  2.0052   0.04495 *
lagdep_ta     -1.2395e-01  5.7544e-02 -2.1541   0.03125 *
laglnta        3.8030e-02  2.0593e-03 18.4678 < 2.2e-16 ***
lagfa_ta       5.9344e-02  1.2635e-02  4.6969 2.664e-06 ***
lagrd_dum      5.9768e-05  5.8840e-03  0.0102   0.99190
lagrd_ta      -6.5676e-02  2.7093e-02 -2.4241   0.01536 *
lagindmedian   1.6722e-01  1.8959e-02  8.8201 < 2.2e-16 ***
lagrated       2.0590e-02  4.6521e-03  4.4259 9.670e-06 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Total Sum of Squares:    307.38
Residual Sum of Squares: 202.75
R-Squared      : 0.3404
Adj. R-Squared : 0.27454
F-statistic: 814.653 on 10 and 15786 DF, p-value: < 2.22e-16
> coeftest(within, vcov = pvcovHC)
t test of coefficients:

                Estimate  Std. Error t value  Pr(>|t|)
lag(mdr, 1)   5.3498e-01  1.1903e-02 44.9438 < 2.2e-16 ***
lagebit_ta   -5.0033e-02  1.1097e-02 -4.5085 6.575e-06 ***
lagmb         2.2776e-03  1.0083e-03  2.2589 0.0239022 *
lagdep_ta    -1.2395e-01  7.0913e-02 -1.7480 0.0804852 .
laglnta       3.8030e-02  3.0676e-03 12.3974 < 2.2e-16 ***
lagfa_ta      5.9344e-02  1.7073e-02  3.4759 0.0005104 ***
lagrd_dum     5.9768e-05  8.0735e-03  0.0074 0.9940935
lagrd_ta     -6.5676e-02  2.6391e-02 -2.4886 0.0128350 *
lagindmedian  1.6722e-01  2.2355e-02  7.4800 7.823e-14 ***
lagrated      2.0590e-02  5.8272e-03  3.5334 0.0004114 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
To obtain the first-difference estimates we need to have recourse to the following trick,
applying OLS to the differenced series, since the argument model = "fd" does not work
correctly, with the current version (1.3-1) of plm, on unbalanced data with holes, and
the current data frame has some holes¹.
Output for the First-difference estimation method
> fdmod01 <- plm(diff(mdr) ~ diff(lag(mdr, 1)) + diff(lagebit_ta) +
diff(lagmb) + diff(lagdep_ta) + diff(laglnta) +
diff(lagfa_ta) + diff(lagrd_dum) + diff(lagrd_ta) +
diff(lagindmedian) + diff(lagrated), model = "pooling",
data = debtratio)
> summary(fdmod01)
Oneway (individual) effect Pooling Model
Call:
plm(formula = diff(mdr) ~ diff(lag(mdr, 1)) + diff(lagebit_ta) +
diff(lagmb) + diff(lagdep_ta) + diff(laglnta) + diff(lagfa_ta) +
diff(lagrd_dum)+diff(lagrd_ta)+diff(lagindmedian)+diff(lagrated),
data = debtratio, model = "pooling")
Unbalanced Panel: n=2996, T=1-14, N=15039
Residuals :
     Min.  1st Qu.   Median  3rd Qu.     Max.
 -0.83700 -0.05340 -0.00987  0.05090  0.76400
Coefficients :
                     Estimate Std. Error  t-value  Pr(>|t|)
(Intercept)         0.0088779  0.0010578   8.3927 < 2.2e-16 ***
diff(lag(mdr, 1))  -0.1138871  0.0093735 -12.1499 < 2.2e-16 ***
diff(lagebit_ta)   -0.0451704  0.0076222  -5.9261 3.169e-09 ***
diff(lagmb)         0.0027903  0.0011550   2.4157   0.01572 *
diff(lagdep_ta)     0.1095609  0.0659193   1.6620   0.09652 .
diff(laglnta)       0.0644041  0.0039349  16.3672 < 2.2e-16 ***
diff(lagfa_ta)      0.1055631  0.0157632   6.6968 2.206e-11 ***
diff(lagrd_dum)    -0.0170642  0.0078069  -2.1858   0.02885 *
diff(lagrd_ta)     -0.0592139  0.0278048  -2.1296   0.03322 *
diff(lagindmedian)  0.1815726  0.0250867   7.2378 4.781e-13 ***
diff(lagrated)      0.0094495  0.0063567   1.4865   0.13716
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Total Sum of Squares:    238.04
Residual Sum of Squares: 231.35
R-Squared      : 0.02813
Adj. R-Squared : 0.028109
F-statistic: 43.4973 on 10 and 15028 DF, p-value: < 2.22e-16
> coeftest(fdmod01, vcov = pvcovHC)
t test of coefficients:

                      Estimate  Std. Error t value  Pr(>|t|)
(Intercept)         0.00887786  0.00094621  9.3825 < 2.2e-16 ***
diff(lag(mdr, 1))  -0.11388713  0.01198604 -9.5016 < 2.2e-16 ***
diff(lagebit_ta)   -0.04517037  0.01012320 -4.4621 8.176e-06 ***
diff(lagmb)         0.00279029  0.00110282  2.5301   0.01141 *
diff(lagdep_ta)     0.10956092  0.07875898  1.3911   0.16422
diff(laglnta)       0.06440411  0.00510375 12.6190 < 2.2e-16 ***
diff(lagfa_ta)      0.10556310  0.01796875  5.8748 4.323e-09 ***
diff(lagrd_dum)    -0.01706422  0.00907665 -1.8800   0.06013 .
diff(lagrd_ta)     -0.05921391  0.02865177 -2.0667   0.03878 *
diff(lagindmedian)  0.18157256  0.02607573  6.9633 3.462e-12 ***
diff(lagrated)      0.00944946  0.00656023  1.4404   0.14977
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Having invoked the packages memisc and tonymisc, the function mtable can be
used to collect the results in a unique output:
> mtable(ols, within)
Calls:
ols: plm(formula = mdr ~ lag(mdr, 1) + lagebit_ta + lagmb + lagdep_ta +
laglnta + lagfa_ta + lagrd_dum + lagrd_ta + lagindmedian +
lagrated, data = debtratio, model = "pooling")
Results are consistent with those present in the third edition of Verbeek's book.
Coefficients :
                    t-value Pr(>|t|)
(Intercept)         -0.8910   0.3730
diff(lag(mdr, 1))    1.2800   0.2006
diff(lagebit_ta)     1.2442   0.2134
diff(lagmb)          1.3177   0.1876
diff(lagdep_ta)     -1.1783   0.2387
diff(laglnta)       -1.1439   0.2527
diff(lagfa_ta)      -1.1763   0.2395
diff(lagrd_dum)     -0.4064   0.6844
diff(lagrd_ta)       1.1608   0.2458
diff(lagindmedian)  -1.2300   0.2187
diff(lagrated)      -1.2234   0.2212
                     Estimate Std. Error  t value  Pr(>|t|)
(Intercept)                                -2.8315  0.004641 **
diff(lag(mdr, 1))                          27.8507 < 2.2e-16 ***
diff(lagebit_ta)    1.2075966  0.1054279  11.4542 < 2.2e-16 ***
diff(lagmb)         0.2442670  0.0183950  13.2790 < 2.2e-16 ***
diff(lagdep_ta)    -1.8583446  0.7100345  -2.6173  0.008875 **
diff(laglnta)      -0.5214084  0.0537632  -9.6982 < 2.2e-16 ***
diff(lagfa_ta)     -1.0912794  0.1782544  -6.1220 9.534e-10 ***
diff(lagrd_dum)    -0.0231265  0.0790461  -0.2926  0.769856
diff(lagrd_ta)      0.8819365  0.2833939   3.1121  0.001862 **
diff(lagindmedian) -3.3777791  0.2218749 -15.2238 < 2.2e-16 ***
diff(lagrated)     -0.2724663  0.0514317  -5.2976 1.194e-07 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Output for the First-difference estimation method with instrumental variables, with constant³
> fd <- plm(diff(mdr) ~ diff(lag(mdr, 1)) + diff(lagebit_ta) +
diff(lagmb) + diff(lagdep_ta) + diff(laglnta) +
diff(lagfa_ta) + diff(lagrd_dum) + diff(lagrd_ta) +
diff(lagindmedian) + diff(lagrated) | . - diff(lag(mdr,
1)) + lag(mdr, 2), model = "pooling", data = debtratio)
> summary(fd)
Oneway (individual) effect Pooling Model
Instrumental variable estimation
(Balestra-Varadharajan-Krishnakumar's transformation)
Call:
plm(formula = diff(mdr) ~ diff(lag(mdr, 1)) + diff(lagebit_ta) +
diff(lagmb) + diff(lagdep_ta) + diff(laglnta) + diff(lagfa_ta) +
diff(lagrd_dum)+diff(lagrd_ta)+diff(lagindmedian)+diff(lagrated) |
. - diff(lag(mdr,1))+lag(mdr,2),data=debtratio,model="pooling")
Unbalanced Panel: n=2996, T=1-14, N=15039
Residuals :
    Min. 1st Qu.    Max.
 -1.5400 -0.0875  1.3400
Coefficients :
                     Estimate Std. Error t-value  Pr(>|t|)
(Intercept)                               0.8355  0.403434
diff(lag(mdr, 1))                        10.4884 < 2.2e-16 ***
diff(lagebit_ta)                          8.1257 4.791e-16 ***
diff(lagmb)                              10.9391 < 2.2e-16 ***
diff(lagdep_ta)                          -2.0422  0.041152 *
diff(laglnta)                            -4.3973 1.104e-05 ***
diff(lagfa_ta)                           -4.7521 2.032e-06 ***
diff(lagrd_dum)    -0.0211894 0.0126926  -1.6694  0.095052 .
diff(lagrd_ta)      0.1265905 0.0480137   2.6365  0.008384 **
diff(lagindmedian) -0.5839094 0.0783180  -7.4556 9.432e-14 ***
diff(lagrated)     -0.0524240 0.0116591  -4.4964 6.963e-06 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1

3 Results are consistent with those present in the third edition of Verbeek's book.
Total Sum of Squares:    238.04
Residual Sum of Squares: 611.01
R-Squared      : 0.0065943
Adj. R-Squared : 0.0065894
F-statistic: -917.329 on 10 and 15028 DF, p-value: 1
> coeftest(fd, vcov = pvcovHC)
t test of coefficients:

                     Estimate Std. Error  t value  Pr(>|t|)
(Intercept)         0.0015331  0.0011090   1.3825   0.16684
diff(lag(mdr, 1))   1.3581587  0.0484357  28.0404 < 2.2e-16 ***
diff(lagebit_ta)    0.2027082  0.0212571   9.5360 < 2.2e-16 ***
diff(lagmb)         0.0468071  0.0032282  14.4996 < 2.2e-16 ***
diff(lagdep_ta)    -0.2268575  0.1495174  -1.5173   0.12922
diff(laglnta)      -0.0532186  0.0099653  -5.3404 9.410e-08 ***
diff(lagfa_ta)     -0.1658858  0.0360364  -4.6033 4.193e-06 ***
diff(lagrd_dum)    -0.0211894  0.0163588  -1.2953   0.19524
diff(lagrd_ta)      0.1265905  0.0495030   2.5572   0.01056 *
diff(lagindmedian) -0.5839094  0.0469868 -12.4271 < 2.2e-16 ***
diff(lagrated)     -0.0524240  0.0116359  -4.5054 6.675e-06 ***
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Arellano-Bond one-step
The function pgmm can be used to perform generalized method of moments estimation
for static or dynamic models with panel data. The main arguments are:
formula: a symbolic description of the model to be estimated. The preferred interface is now to indicate a multi-part formula, the first two parts describing the covariates and the GMM instruments and, if any, the third part the normal instruments;
data: a data.frame;
effect: the effects introduced in the model, one of "twoways" (the default) or "individual";
model: one of "onestep" (the default) or "twosteps";
transformation: the kind of transformation to apply to the model: either "d" (the default value) for the difference GMM model or "ld" for the system GMM;
fsm: the matrix for the one-step estimator: one of "I" (identity matrix) or "G" (= D'D, where D is the first-difference operator) if transformation = "d", one of "GI" or "full" if transformation = "ld".
> gmm <- pgmm(mdr ~ lag(mdr) + lagebit_ta + lagmb +
      lagdep_ta + laglnta + lagfa_ta + lagrd_dum +
      lagrd_ta + lagindmedian + lagrated | lag(mdr, 2:99),
      data = debtratio, model = "onestep")
> summary(gmm)
Number of Observations Used: 15039

Residuals :
  Median     Mean  3rd Qu.     Max.
 0.00000  0.00106  0.00000  0.86950

Coefficients :
                Estimate Std. Error z-value  Pr(>|z|)
lag(mdr)       0.4538191  0.0505589  8.9761 < 2.2e-16 ***
lagebit_ta     0.0490861  0.0151092  3.2488  0.001159 **
lagmb          0.0206848  0.0022631  9.1402 < 2.2e-16 ***
lagdep_ta     -0.0227627  0.0957791 -0.2377  0.812146
laglnta        0.0271979  0.0065750  4.1366 3.526e-05 ***
lagfa_ta      -0.0057013  0.0235412 -0.2422  0.808639
lagrd_dum     -0.0178886  0.0105722 -1.6920  0.090640 .
lagrd_ta       0.0209480  0.0324596  0.6454  0.518697
lagindmedian   0.1221109  0.0383102  3.1874  0.001435 **
lagrated      -0.0089387  0.0077516 -1.1531  0.248851
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
Output for the two-steps estimation method

Number of Observations Used: 15039

Residuals :
  1st Qu.    Median      Mean   3rd Qu.      Max.
 0.000000  0.000000  0.001129  0.000000  0.860300

Coefficients :
                Estimate Std. Error z-value  Pr(>|z|)
lag(mdr)       0.3868871  0.0725827  5.3303 9.805e-08 ***
lagebit_ta     0.0371411  0.0174741  2.1255  0.033545 *
lagmb          0.0155606  0.0026849  5.7955 6.812e-09 ***
lagdep_ta      0.0784848  0.1090149  0.7199  0.471559
laglnta        0.0298020  0.0082703  3.6035  0.000314 ***
lagfa_ta       0.0187859  0.0279833  0.6713  0.502013
lagrd_dum     -0.0191112  0.0117298 -1.6293  0.103253
lagrd_ta      -0.0044324  0.0352264 -0.1258  0.899870
lagindmedian   0.0878274  0.0439521  1.9983  0.045689 *
lagrated      -0.0087387  0.0098295 -0.8890  0.373987
---
Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
References
Belgorodski N, Greiner M, Tolksdorf K and Schueller K 2012 rriskDistributions: Fitting
distributions to given data or known quantiles. R package version 1.8. http://CRAN.R-project.org/package=rriskDistributions
Bolker B and R Development Core Team 2012 bbmle: Tools for general maximum likelihood
estimation. R package version 1.0.5.2. http://CRAN.R-project.org/package=bbmle
Brockwell PJ and Davis RA 1991 Time Series: Theory and Methods, Springer Verlag.
Chambers JM, Cleveland WS, Kleiner B and Tukey PA 1983 Graphical Methods for Data Analysis,
Wadsworth & Brooks/Cole.
Chan K-S, Ripley B 2012 TSA: Time Series Analysis. R package version 1.01. http://CRAN.R-project.org/package=TSA
Chausse P 2010 Computing Generalized Method of Moments and Generalized Empirical Likelihood
with R. Journal of Statistical Software 34(11), 1–35, http://www.jstatsoft.org/v34/i11/.
Cookson JA 2012 tonymisc: Functions for Econometrics Output. R package version 1.1.1.
http://CRAN.R-project.org/package=tonymisc
Cribari-Neto F 2004 Asymptotic Inference Under Heteroskedasticity of Unknown Form.
Computational Statistics & Data Analysis 45, 215–233.
Croissant Y and Millo G 2008 Panel Data Econometrics in R: The plm Package. Journal of Statistical
Software 27(2), 1–43, http://www.jstatsoft.org/v27/i02/.
Davidson R and MacKinnon JG 1993 Estimation and Inference in Econometrics, Oxford University
Press.
Elff M 2013 memisc: Tools for Management of Survey Data, Graphics, Programming, Statistics, and
Simulation. R package version 0.96-4. http://CRAN.R-project.org/package=memisc
Faraway JJ 2002 Practical Regression and Anova using R, July 2002, http://stat.ethz.ch/CRAN/doc/contrib/Faraway-PRA.pdf.
Flannery MJ and Rangan KP 2006 Partial Adjustment toward Target Capital Structures. Journal of
Financial Economics 79, 469–506.
Fox J and Weisberg S 2011 An R Companion to Applied Regression, Second Edition. Thousand
Oaks. CA: Sage. http://socserv.socsci.mcmaster.ca/jfox/Books/Companion
Fox J, Nie Z and Byrnes J 2013 sem: Structural Equation Models. R package version 3.1-3.
http://CRAN.R-project.org/package=sem
Graves S 2012 FinTS: Companion to Tsay (2005) Analysis of Financial Time Series. R package
version 0.4-4. http://CRAN.R-project.org/package=FinTS
Hannan EJ, Rissanen J 1982 Recursive Estimation of Mixed Autoregressive-Moving Average Order.
Biometrika 69(1), 81–94.
Hardin JW, Hilbe JM 2007 Generalized Linear Models and Extensions, Stata Press.
Hyndman RJ, Khandakar Y 2008 Automatic Time Series Forecasting: The forecast Package for R.
Journal of Statistical Software 27(3), 1–22, http://www.jstatsoft.org/v27/i3/.
Hyndman RJ with contributions from G Athanasopoulos, S Razbash, D Schmidt, Z Zhou and Y
Khan 2013 forecast: Forecasting functions for time series and linear models. R package version
4.03. http://CRAN.R-project.org/package=forecast
Jackman S 2012 pscl: Classes and Methods for R Developed in the Political Science Computational
Laboratory, Stanford University. Department of Political Science, Stanford University. Stanford,
California. R package version 1.04.4. URL http://pscl.stanford.edu/
Jarque CM, Bera A 1987 A Test for Normality of Observations and Regression Residuals.
International Statistical Review 55(2), 163–172.
Johnson NL, Kemp AW, Kotz S 2005 Univariate Discrete Distributions. Wiley.
Johnston J, Di Nardo J 1997 Econometric Methods, 4th edn. McGraw-Hill.
Wuertz D, Chalabi Y with contribution from M Miklovic, C Boudt, P Chausse and others 2012
fGarch: Rmetrics - Autoregressive Conditional Heteroskedastic Modelling. R package version
2150.81. http://CRAN.R-project.org/package=fGarch
Wuertz D, many others and see the SOURCE file 2012 fArma: ARMA Time Series Modelling. R
package version 2160.78. http://CRAN.R-project.org/package=fArma
Zappa D, Bramante R, Nai Ruscone M 2012 Appunti di Metodi Statistici per la Finanza e le
Assicurazioni, Educatt.
Zeileis A 2004 Econometric Computing with HC and HAC Covariance Matrix Estimators. Journal
of Statistical Software 11(10), 1–17, http://www.jstatsoft.org/v11/i10/.
Zeileis A 2006 Object-oriented Computation of Sandwich Estimators. Journal of Statistical Software
16(9), 1-16. URL http://www.jstatsoft.org/v16/i09/.
Zeileis A 2011 dynlm: Dynamic Linear Regression. R package version 0.3-1. http://CRAN.R-project.org/package=dynlm
Zeileis A, Hothorn T 2002 Diagnostic Checking in Regression Relationships. R News 2(3), 7-10.
http://CRAN.R-project.org/doc/Rnews/
Zhelonkin M, Genton MG, Ronchetti E 2013 ssmrob: Robust estimation and inference in sample
selection models. R package version 0.2. http://CRAN.R-project.org/package=ssmrob
A
Some useful R functions
This Appendix includes some excerpts from the documentation available in the help
system of R and some advice regarding the creation of graphs.
The topics regard:
how to install R
A.1
How to Install R
A.2
How to Install and Update Packages
If a package is not present in your R installation, you can download it by using the
option Install package(s) from the menu Packages in the R Console.
It is also possible to use the function install.packages whose main argument is
pkgs a character vector with the names of the packages whose current versions should
be downloaded from the repositories, e.g.
install.packages("lmtest")
You can update the packages available on your system by using
update.packages(ask=FALSE)
The following code (do not execute it if not really needed!) will install all the packages
available on the CRAN site which are not present on your system (more than 4,000
packages, requiring more than 4 gigabytes of disk space).
a <- new.packages()
if (length(a) > 0) install.packages(a)
The latter code can be useful when R will later be used without an Internet connection.
The command
??"keyword1 keyword2"
will search keyword1 and keyword2 in the help documentation of all the
installed packages.
A.3
Data Reading
On Verbeeks site data are saved in the text, Stata and EViews formats, and are
compressed in zip files. We describe the procedures to uncompress a zip file and read
data.
Once the data have been read into R, it is possible to check the consistency of the
imported data with the information contained in the txt file available in the zip file
by using the functions summary, head and tail.
A.3.1
zip files
To see the content of a zip file use the function unzip, available in the utils library,
which is automatically loaded when R starts.
unzip(zipfile, files = NULL, list = FALSE, overwrite = TRUE,
junkpaths = FALSE, exdir = ".", unzip = "internal", setTimes = FALSE)
zipfile: The pathname of the zip file: tilde expansion (see path.expand) will
be performed.
See the R help for the remaining options and more information.
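For example, assuming the archive ch10.zip used in Chapter 10 sits in the working directory, its contents can be listed without extracting them:

unzip("ch10.zip", list = TRUE)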
A.3.2
Reading from a text file
To read from a text file use the command read.table, available in the utils library.
It reads a file in table format and creates a data frame from it, with cases
corresponding to lines and variables to fields in the file.
read.table(file, header = FALSE, sep = "", quote = "\"'",
dec = ".", row.names, col.names,
as.is = !stringsAsFactors,
na.strings = "NA", colClasses = NA, nrows = -1,
skip = 0, check.names = TRUE, fill = !blank.lines.skip,
strip.white = FALSE, blank.lines.skip = TRUE,
comment.char = "#",
allowEscapes = FALSE, flush = FALSE,
stringsAsFactors = default.stringsAsFactors(),
fileEncoding = "", encoding = "unknown", text)
file: the name of the file which the data are to be read from. Each row of the
table appears as one line of the file.
file can also be a complete URL. (For the supported URL schemes, see the
URLs section of the help for url.)
header: a logical value indicating whether the file contains the names of the
variables as its first line. If missing, the value is determined from the file format:
header is set to TRUE if and only if the first row contains one fewer field than
the number of columns.
sep: the field separator character. Values on each line of the file are separated
by this character. If sep = "" (the default for read.table) the separator is white
space, that is one or more spaces, tabs, newlines or carriage returns.
row.names: a vector of row names. This can be a vector giving the actual row
names, or a single number giving the column of the table which contains the
row names, or character string giving the name of the table column containing
the row names.
If there is a header and the first row contains one fewer field than the number
of columns, the first column in the input is used for the row names. Otherwise
if row.names is missing, the rows are numbered.
col.names: a vector of optional names for the variables. The default is to use
V followed by the column number.
See the R help for the remaining arguments and more information.
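For instance, a hypothetical whitespace-separated file wages.txt, with the variable names in its first line, could be read as:

mydata <- read.table("wages.txt", header = TRUE)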
A.3.3
Reading from a Stata file
To read a Stata file use the function read.dta which reads a file in Stata version 5-11
binary format into a data frame. The function is available in the package foreign.
read.dta(file, convert.dates = TRUE, convert.factors = TRUE,
missing.type = FALSE,
convert.underscore = FALSE, warn.missing.labels = TRUE)
See the R help for more information.
A.3.4
Reading from an EViews file
To read an EViews file use the function readEViews, which is available in the package
foreign.
readEViews(filename, as.data.frame = TRUE)
The messages Skipping boilerplate variable will be returned; they warn that the
two variables c and resid, which are always created by default by EViews and thus
are present in the file being converted, are not read.
See the R help for more information.
A.3.5
Reading from a Microsoft Excel file
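A minimal sketch, assuming the package gdata is installed (its function read.xls also requires a Perl interpreter) and a hypothetical file wages.xls:

library(gdata)
mydata <- read.xls("wages.xls", sheet = 1)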
A.4
formula{stats}
Description. The generic function formula¹ and its specific methods provide a way
of extracting formulae which have been included in other objects.
Usage formula(x, ...)
Arguments x an R object. ... further arguments passed to or from other methods.
Details The models fit by, e.g., the lm and glm functions are specified in a compact
symbolic form. The ~ operator is basic in the formation of such models.
An expression of the form y ~ model is interpreted as a specification that the
response y is modelled by a linear predictor specified symbolically by model. Such a
model consists of a series of terms separated by + operators.
The terms themselves consist of variable and factor names separated by the
interaction : operator.
Such a term is interpreted as the interaction of all the variables and factors
appearing in the term.
In addition to + and :, a number of other operators are useful in model formulae.
The * operator denotes factor crossing:
a*b interpreted as a+b+a:b.
The ^ operator indicates crossing to the specified degree.
(a+b+c)^2 is identical to (a+b+c)*(a+b+c) which in turn expands to a formula
containing the main effects for a, b and c together with their second-order interactions.
The %in% operator indicates that the terms on its left are nested within those on the
right.
a + b %in% a expands to the formula a + a:b.
The - operator removes the specified terms. (a+b+c)^2 - a:b is identical to a + b
+ c + b:c + a:c.
It can also be used to remove the intercept term: y ~ x - 1 is a line through the
origin.
A model with no intercept can also be specified as y ~ x + 0 or y ~ 0 + x.
While formulae usually involve just variable and factor names, they can also involve
arithmetic expressions. The formula log(y) ~ a + log(x) is quite legal. When such
arithmetic expressions involve operators which are also used symbolically in model
formulae, there can be confusion between arithmetic and symbolic operator use.
To avoid this confusion, the function I() can be used to bracket those portions of a
model formula where the operators are used in their arithmetic sense.
For example, in the formula y ~ a + I(b+c), the term b+c is to be interpreted as
the sum of b and c.
There are two special interpretations of . in a formula.
1 The function is available in the package stats, which is automatically loaded when R starts.
440
The usual one is in the context of a data argument of model fitting functions and
means all columns not otherwise in the formula: see terms.formula.
In the context of update.formula, only, it means what was previously in this part
of the formula.
References
Chambers, J. M. and Hastie, T. J. (1992) Statistical models. Chapter 2 of Statistical
Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
Examples
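A few illustrations of the operators just described (a sketch; the variable names y, a, b and x are hypothetical):

y ~ a * b            # expands to y ~ a + b + a:b
y ~ (a + b)^2        # same expansion as above
y ~ a + I(b + x)     # I() forces b + x to be read arithmetically
log(y) ~ a + log(x)  # arithmetic expressions are allowed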
A.5
linear model
The function lm is used to fit linear models. It can be used to carry out regression,
single stratum analysis of variance and analysis of covariance (although aov may
provide a more convenient interface for these). It is available in the package stats
which is automatically loaded when R starts.
Usage
lm(formula, data, subset, weights, na.action,
method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE,
singular.ok = TRUE, contrasts = NULL, offset, ...)
Arguments
formula: an object of class formula (or one that can be coerced to that class): a
symbolic description of the model to be fitted. The details of model specification
are given under Details.
na.action: a function which indicates what should happen when the data
contain NAs. The default is set by the na.action setting of options, and is
na.fail if that is unset. The factory-fresh default is na.omit. Another possible
value is NULL, no action. Value na.exclude can be useful.
method: the method to be used; for fitting, currently only method = "qr" is
supported; method = "model.frame" returns the model frame (the same as
with model = TRUE, see below).
442
Details
Models for lm are specified symbolically. See Section A.4.
If the formula includes an offset, this is evaluated and subtracted from the response.
If response is a matrix a linear model is fitted separately by least-squares to each
column of the matrix.
See model.matrix for some further details. The terms in the formula will be reordered so that main effects come first, followed by the interactions, all second-order,
all third-order and so on: to avoid this pass a terms object as the formula (see aov
and demo(glm.vr) for an example).
A formula has an implied intercept term. To remove this use either y ~ x - 1 or y ~
0 + x. See formula for more details of allowed formulae.
Non-NULL weights can be used to indicate that different observations have different
variances (with the values in weights being inversely proportional to the variances); or
equivalently, when the elements of weights are positive integers wi , that each response
yi is the mean of wi unit-weight observations (including the case that there are wi
observations equal to yi and the data have been summarized).
lm calls the lower level functions lm.fit, etc, see below, for the actual numerical
computations. For programming only, you may consider doing likewise.
All of weights, subset and offset are evaluated in the same way as variables in formula,
that is first in data and then in the environment of formula.
Value
lm returns an object of class "lm" or for multiple responses of class c("mlm", "lm").
The functions summary and anova are used to obtain and print a summary and
analysis of variance table of the results. The generic accessor functions coefficients,
effects, fitted.values and residuals extract various useful features of the value returned
by lm.
An object of class "lm" is a list containing at least the following components:
xlevels: (only where relevant) a record of the levels of the factors used in
fitting.
443
In addition, non-null fits will have components assign, effects and (unless not
requested) qr relating to the linear fit, for use by extractor functions such as summary
and effects.
Using time series
Considerable care is needed when using lm with time series.
Unless na.action = NULL, the time series attributes are stripped from the variables
before the regression is done. (This is necessary as omitting NAs would invalidate the
time series attributes, and if NAs are omitted in the middle of the series the result
would no longer be a regular time series.)
Even if the time series attributes are retained, they are not used to line up series, so
that the time shift of a lagged or differenced regressor would be ignored. It is good
practice to prepare a data argument by ts.intersect(..., dframe = TRUE), then
apply a suitable na.action to that data frame and call lm with na.action = NULL so
that residuals and fitted values are time series.
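A minimal sketch of this recipe, using the built-in series lh and a hypothetical lagged regressor ylag:

## align the series with its first lag, keeping the time-series attributes
dat <- ts.intersect(y = lh, ylag = lag(lh, -1), dframe = TRUE)
## na.action = NULL so that residuals and fitted values remain time series
fit <- lm(y ~ ylag, data = dat, na.action = NULL)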
Note
Offsets specified by offset will not be included in predictions by predict.lm, whereas
those specified by an offset term in the formula will be.
Author(s)
The design was inspired by the S function of the same name described in Chambers
(1992). The implementation of model formula by Ross Ihaka was based on Wilkinson
& Rogers (1973).
References
Chambers, J. M. (1992) Linear models. Chapter 4 of Statistical Models in S eds J.
M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
Wilkinson, G. N. and Rogers, C. E. (1973) Symbolic descriptions of factorial models
for analysis of variance. Applied Statistics, 22, 392-9.
We observe that with time series it is also possible to use the function dynlm available
in the package dynlm; see the R help system for details.
444
A.6
Deducer
The menu Data contains tools for data manipulation and the menu Analysis tools
for statistical analyses. Figures A.1 and A.2-A.3 report the interface to define a linear
model and the output².
The menus can be further enriched by calling the package DeducerExtras, with tools
for inferential statistics and multivariate statistical analysis, DeducerPlugInScaling,
with tools for reliability and factor analysis, DeducerSpatial, with tools for spatial
statistics, DeducerSurvival, with tools for survival analysis, and DeducerText, with
tools for the analysis of textual data.
Have a look at http://www.deducer.org for more information.
Figure A.1
Figure A.2
Figure A.3
B
Addendum 3rd edition
This appendix can be downloaded from the book site www.educatt.it/libri/materiali.