Sunil
30 August 2017
Contents
1 A Model of the Interest Rate Spread: An Illustration of Box-Jenkins Methodology in R
1.1 Introduction
1.2 Model specification
1.3 Estimation: First model
1.4 Diagnostic Checking: AR(7)
1.5 Second Attempt: AR(6)
1.6 Third Attempt: AR(2)
1.7 Fourth Model: AR(1,2,7)
1.8 Fifth Model: ARMA(1,1)
1.9 Sixth Attempt: ARMA(2,1)
1.10 Seventh Model: ARMA(2,(1,7))
2 Assignments
1.1 Introduction
This example is a reproduction from Walter Enders, Applied Econometric Time Series (4th edition), Chapter 2, Section 10 (pp. 88). The estimated values may differ slightly.
Data: quarterly data from 1960Q1 to 2012Q4 on the interest rate spread, the difference between the interest rate on 5-year U.S. government bonds (R5) and the rate on 3-month Treasury bills (TBILL).
R5 and TBILL are given in the data file named QUARTERLY. (The file has a .sas7bdat extension; use read.sas7bdat() to load it.)
The graph of the spread is given below. Always examine the time series visually before starting the analysis.
library("sas7bdat")
library("ggplot2")
data1 <- read.sas7bdat("quarterly.sas7bdat")
# the spread: R5 minus TBILL (adjust the column names to match the file)
sp <- ts(data1$R5 - data1$TBILL, start = c(1960, 1), frequency = 4)
mu <- mean(sp)
df <- data.frame(date = as.numeric(time(sp)), Y = as.numeric(sp))
tssp <- ggplot(data = df, mapping = aes(x = date, y = Y)) + geom_line() +
    theme_classic() + ggtitle("Figure 1: The interest rate spread") +
    xlab("") + ylab("") + geom_hline(yintercept = mu)
1.2 Model specification
The next step is to identify a tentative model for the process. We can make an initial guess by examining the ACF and PACF. Plots of the ACF and PACF are given in Figures 2 and 3 (the 95% confidence interval is shown by red dashed lines). The 95% confidence bounds are:
2/sqrt(length(sp))
## [1] 0.1373606
-2/sqrt(length(sp))
## [1] -0.1373606
The joint significance of the autocorrelations can be examined using the Ljung-Box Q-statistic. For example, the following code tests the joint significance of the first two autocorrelations.
Box.test(sp, lag = 2, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: sp
## X-squared = 256.67, df = 2, p-value < 2.2e-16
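Running the test lag by lag gives a whole row of Q-statistics at once. The helper below is a sketch (the function name `ljung_table` is ours, not from the text); applied to the spread it reproduces the Q-stat and p-value rows reported in Table 1. It is demonstrated on simulated white noise so the snippet is self-contained:

```r
# Ljung-Box Q-statistics and p-values for lags 1..max.lag
# (ljung_table is a hypothetical helper name, not from the text)
ljung_table <- function(x, max.lag = 12) {
    out <- sapply(seq_len(max.lag), function(k) {
        bt <- Box.test(x, lag = k, type = "Ljung-Box")
        c(Qstat = unname(bt$statistic), pval = bt$p.value)
    })
    round(out, 2)
}

# demo on simulated white noise: no real autocorrelation,
# so the p-values should mostly be large
set.seed(1)
ljung_table(rnorm(200))
```

For the spread itself, `ljung_table(sp)` returns the same numbers as the repeated `Box.test()` calls used throughout this tutorial.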
The first 12 Q statistics along with the numerical values of ACF and PACF are given in Table 1.
source("acfpacf.R")
acf.graph1(sp, 12) + ggtitle("Figure 2: ACF of interest rate spread")
[Figure 2: ACF of the interest rate spread]
[Figure 3: PACF of the interest rate spread]
library("knitr")
# ACF (acf() includes lag 0, hence the 2:13 indexing below)
sacf <- acf(sp, lag.max = 12, plot = FALSE)
# PACF
spacf <- pacf(sp, lag.max = 12, plot = FALSE)
df1 <- as.matrix(rbind(ACF = sacf$acf[2:13], PACF = spacf$acf))
Table 1: ACF and PACF of Interest Rate Spread.
1 2 3 4 5 6 7 8 9 10 11 12
ACF 0.86 0.68 0.55 0.41 0.28 0.15 0.07 0.04 -0.03 -0.13 -0.20 -0.22
PACF 0.86 -0.21 0.11 -0.18 -0.01 -0.14 0.14 0.01 -0.18 -0.12 -0.04 0.08
Q-stat 157.69 256.67 322.03 358.66 375.68 380.57 381.69 382.04 382.27 385.90 394.72 405.84
pval 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
The ACF and PACF converge to zero quickly enough that we do not have to worry about a time-varying
mean.
Recall that the theoretical ACF of a pure MA(q) process cuts off to zero at lag q. The plot of the ACF does not exhibit such behavior, so the possibility of a pure MA(q) process can be eliminated.
The theoretical ACF of an AR(1) model decays geometrically. Examination of the numerical values of the ACF rules out this possibility too.
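The geometric-decay claim is easy to check with base R's `ARMAacf()`. Using a1 = 0.86 (the value of the first sample autocorrelation) as an illustrative AR(1) coefficient, the theoretical ACF at lag k is exactly 0.86^k, which at lag 6 would still be about 0.40, far above the sample value of 0.15 in Table 1:

```r
# theoretical ACF of an AR(1) with coefficient 0.86 (lags 1..6)
theo <- ARMAacf(ar = 0.86, lag.max = 6)[-1]  # drop lag 0
round(theo, 2)          # 0.86, 0.74, 0.64, 0.55, 0.47, 0.40
# identical to the geometric sequence 0.86^k
round(0.86^(1:6), 2)
```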
The estimated values of the PACF are φ11 = 0.86, φ22 = −0.21, φ33 = 0.11, and φ44 = −0.18. Although φ55 is close to zero, φ66 = −0.14 and φ77 = 0.14. It is clear from the figure that φ11, φ22, φ44, φ66, and φ77 are statistically different from zero. In a pure AR(p) model, the PACF cuts to zero after lag p. Hence, if the series follows a pure AR(p) process, the value of p could be as high as six or seven.
There appears to be an oscillating pattern in the PACF in that the first seven values alternate in sign. Oscillating decay of the PACF is characteristic of a positive MA coefficient (cycles?).
Given the number of small and marginally significant coefficients, the ACF and PACF of the spread are somewhat ambiguous.
1.3 Estimation: First model
Examining the PACF and ACF, we can think of an AR(7) or an ARMA(1,1) process. Let us start with the AR(7).
library(forecast)
library(latex2exp)
mar7 <- Arima(sp, order = c(7, 0, 0))
# specify the order in the order argument: the first value is the
# number of AR terms, the second the order of integration, and the
# third the number of MA terms
mar7
## Series: sp
## ARIMA(7,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ar6 ar7 mean
## 1.1057 -0.4441 0.3891 -0.2899 0.2156 -0.2928 0.1328 1.1970
## s.e. 0.0680 0.1003 0.1038 0.1050 0.1031 0.0994 0.0675 0.1689
##
## sigma^2 estimated as 0.2169: log likelihood=-135.66
## AIC=289.32 AICc=290.21 BIC=319.53
# p-values of the ARIMA coefficients (two-sided z-test)
p_mar7 <- 2 * (1 - pnorm(abs(mar7$coef)/sqrt(diag(mar7$var.coef))))
# presenting the results with p-values
df_mar7 <- as.matrix(rbind(mar7$coef, p_mar7))
[Figure 4: Inverse of the AR roots]
The inverses of the roots of the lag polynomial (the eigenvalues) are shown in the figure. The model is stable and stationary since all the eigenvalues lie inside the unit circle.
All the coefficients are statistically significant at the 5% level.
R reports the unconditional mean (mu) in the estimation output. You can easily recover the intercept using the formula a0 = mu(1 − a1 − a2 − ... − a7).
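Using the reported estimates, this calculation is a one-liner (coefficient values copied from the AR(7) output above):

```r
# AR coefficients and unconditional mean from the AR(7) output above
ar_coefs <- c(1.1057, -0.4441, 0.3891, -0.2899, 0.2156, -0.2928, 0.1328)
mu_hat <- 1.1970

# intercept: a0 = mu * (1 - a1 - ... - a7)
a0 <- mu_hat * (1 - sum(ar_coefs))
round(a0, 3)  # about 0.22
```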
1.4 Diagnostic Checking: AR(7)
Next, we need to check the properties of the residuals using the ACF, PACF, and Q-statistics. Plots of the ACF and PACF and estimates of the Q-statistics for the residuals are given below.
res_ar7 <- mar7$residuals
acf.graph1(res_ar7, 12) + ggtitle("Figure 5: ACF of residuals from AR(7)")
[Figure 5: ACF and PACF of residuals from AR(7)]
Ljung-Box Q-statistics for the residuals of the AR(7):
Q-stat 0.00 0.18 0.19 0.2 0.74 0.83 1.52 5.40 6.22 6.22 13.02 13.11
pval 0.99 0.92 0.98 1.0 0.98 0.99 0.98 0.71 0.72 0.80 0.29 0.36
The plots of the ACF and PACF of the residuals hint at autocorrelation at lag 11. To check this we can examine the Q-statistics. The significance levels of the Q-statistics indicate no remaining autocorrelation in the residuals, hence there is no need to account for the autocorrelation at lag 11.
1.5 Second Attempt: AR(6)
Although the AR(7) model has some desirable attributes, one reasonable estimation strategy is to eliminate the seventh lag and estimate an AR(6) model over the same sample period.
Note that the data set begins in 1960Q1, so that with seven lags the estimation of the AR(7) begins in 1961Q4. For comparable results we estimate the AR(6) over the same sample.
sp1 <- window(sp, start = c(1960, 2))
mar6 <- Arima(sp1, order = c(6, 0, 0))
# p-values of the ARIMA coefficients (two-sided z-test)
p_mar6 <- 2 * (1 - pnorm(abs(mar6$coef)/sqrt(diag(mar6$var.coef))))
# presenting the results with p-values
df_mar6 <- as.matrix(rbind(mar6$coef, p_mar6))
[Figure 7: Inverse of the AR roots]
Ljung-Box Q-statistics for the residuals of the AR(6):
Q-stat 0.13 0.13 0.13 0.31 0.34 2.08 2.73 10.73 12.03 12.14 16.03 16.39
pval 0.72 0.94 0.99 0.99 1.00 0.91 0.91 0.22 0.21 0.28 0.14 0.17
[Figure 8: ACF and PACF of residuals from AR(6)]
Although there are autocorrelations in the residuals at some lags, the Q-statistics fail to reject the null of no autocorrelation at all lags.
Although a5 appears to be statistically insignificant, it is generally not a good idea to use t-statistics to eliminate intermediate lags. As such, most researchers would not eliminate the fifth lag and estimate a model with lags 1 through 4 and lag 6. Recall that the appropriate use of a t-statistic requires that the regressor in question be uncorrelated with the other regressors. (But you can try; if you get a better model, you can use it.)
The overall result of the diagnostic checks is that the AR(6) model is adequate. We can compare the two models using the AIC and SBC criteria.
AIC for AR(7)
mar7$aic
## [1] 289.3214
AIC for AR(6)
mar6$aic
## [1] 289.5437
SBC for AR(7)
mar7$bic
## [1] 319.5307
SBC for AR(6)
mar6$bic
## [1] 316.3586
Thus the AIC selects the AR(7) model, whereas the SBC selects the more parsimonious AR(6) model.
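To make the comparison concrete, the reported criteria can be collected in a small table (values copied from the outputs above; the data frame `ic` is our own construction):

```r
# AIC and SBC (BIC) values reported above for the two candidate models
ic <- data.frame(
    model = c("AR(7)", "AR(6)"),
    AIC = c(289.3214, 289.5437),
    SBC = c(319.5307, 316.3586)
)
# the criterion-minimizing model differs across the two criteria
ic$model[which.min(ic$AIC)]  # AR(7)
ic$model[which.min(ic$SBC)]  # AR(6)
```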
1.6 Third Attempt: AR(2)
Suppose that you try a very parsimonious model and estimate an AR(2).
mar2 <- Arima(sp, order = c(2, 0, 0))
# p-values of the ARIMA coefficients (two-sided z-test)
p_mar2 <- 2 * (1 - pnorm(abs(mar2$coef)/sqrt(diag(mar2$var.coef))))
# presenting the results with p-values
df_mar2 <- as.matrix(rbind(mar2$coef, p_mar2))
AR1 AR2 mu
Coefficients 1.04 -0.22 1.21
P Value 0.00 0.00 0.00
[Figure 10: Inverse of the AR roots]
The roots are within the unit circle and AR(2) satisfies the stability condition.
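The stability check can also be done numerically with base R's `polyroot()`: the model is stationary when the roots of the AR lag polynomial 1 - a1*z - a2*z^2 all lie outside the unit circle (equivalently, the inverse roots shown in the figure lie inside it). A sketch using the AR(2) estimates from the table above:

```r
# AR(2) estimates from the coefficient table above
a1 <- 1.04
a2 <- -0.22

# roots of the lag polynomial 1 - a1*z - a2*z^2
# (coefficients passed in increasing powers of z)
roots <- polyroot(c(1, -a1, -a2))
Mod(roots)           # both moduli exceed 1
all(Mod(roots) > 1)  # TRUE: the AR(2) is stationary
```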
# Ljung-Box Q-statistics for the AR(2) residuals
res_ar2 <- mar2$residuals
Mjb <- matrix(nrow = 2, ncol = 12)
for (i in 1:12) {
    jb <- Box.test(res_ar2, lag = i, type = "Ljung-Box")
    Mjb[1, i] <- round(jb$statistic, 2)
    Mjb[2, i] <- round(jb$p.value, 2)
}
Ljung-Box Q-statistics for the residuals of the AR(2):
Q-stat 0.19 3.79 9.17 9.22 10.53 12.87 16.86 21.84 22.61 23.68 29.34 29.34
pval 0.67 0.15 0.03 0.06 0.06 0.05 0.02 0.01 0.01 0.01 0.00 0.00
[Figure 11: ACF and PACF of residuals from AR(2)]
AIC for AR(2)
mar2$aic
## [1] 296.4717
SBC for AR(2)
mar2$bic
## [1] 309.8791
Comparing the AR(7) and the AR(2), we see that the AIC selects the AR(7) model, but the SBC selects the AR(2) model.
However, the residual autocorrelations from the AR(2) are problematic: the Q-statistics from the AR(2) model indicate significant autocorrelation in the residuals at the shorter lags. As such, the AR(2) should be eliminated from further consideration.
1.7 Fourth Model: AR(1,2,7)
If you examine the AR(7) carefully, you might notice that AR3 almost offsets AR4 and that AR5 almost offsets AR6 (since AR3 + AR4 ≈ 0 and AR5 + AR6 ≈ 0). We can try another model by re-estimating the model without AR3, AR4, AR5, and AR6.
# fixed: one entry per parameter (ar1..ar7, mean); NA = estimate, 0 = restrict
mar127 <- Arima(sp, order = c(7, 0, 0), fixed = c(NA, NA, 0,
    0, 0, 0, NA, NA))
# p-values of the ARIMA coefficients (two-sided z-test)
p_mar127 <- 2 * (1 - pnorm(abs(mar127$coef)/sqrt(diag(mar127$var.coef))))
# presenting the results with p-values
df_mar127 <- as.matrix(rbind(mar127$coef[-(3:6)], p_mar127[-(3:6)]))
Since AR7 is now statistically insignificant, it might seem preferable to use the AR(2) instead. Yet the AR(2) has already been shown to be inadequate relative to the AR(7) and AR(6) models.
# Ljung-Box Q-statistics for the AR(1,2,7) residuals
res_ar127 <- mar127$residuals
Mjb <- matrix(nrow = 2, ncol = 12)
for (i in 1:12) {
    jb <- Box.test(res_ar127, lag = i, type = "Ljung-Box")
    Mjb[1, i] <- round(jb$statistic, 2)
    Mjb[2, i] <- round(jb$p.value, 2)
}
Ljung-Box Q-statistics for the residuals of the AR(1,2,7):
Q-stat 0.19 4.39 8.41 8.42 9.45 12.47 15.35 22.18 23.68 24.27 28.68 28.79
pval 0.66 0.11 0.04 0.08 0.09 0.05 0.03 0.00 0.00 0.01 0.00 0.00
[Figure 13: ACF and PACF of residuals from AR(1,2,7)]
This model also exhibits autocorrelation in the residuals (see the Q-statistics).
Even though the AR(6) and AR(7) models perform relatively well, they are not necessarily the best
forecasting models.
There are several possible alternatives since the patterns of the ACF and PACF are not immediately
clear.
1.8 Fifth Model: ARMA(1,1)
As mentioned earlier, the decaying nature of the ACF and PACF can be taken as an indication of an ARMA(1,1) model.
mar11 <- Arima(sp, order = c(1, 0, 1))
# p-values of the ARIMA coefficients (two-sided z-test)
p_mar11 <- 2 * (1 - pnorm(abs(mar11$coef)/sqrt(diag(mar11$var.coef))))
# presenting the results with p-values
df_mar11 <- as.matrix(rbind(mar11$coef, p_mar11))
colnames(df_mar11) <- c("AR1", "MA1", "mu")
AR1 MA1 mu
Coefficients 0.76 0.38 1.21
P Value 0.00 0.00 0.00
Ljung-Box Q-statistics for the residuals of the ARMA(1,1):
Q-stat 0.29 0.59 6.49 6.57 8.30 9.90 13.49 18.16 18.27 18.67 23.91 23.91
pval 0.59 0.74 0.09 0.16 0.14 0.13 0.06 0.02 0.03 0.04 0.01 0.02
[Figure 16: ACF and PACF of residuals from ARMA(1,1)]
# AIC
mar11$aic
## [1] 289.5957
# SBC
mar11$bic
## [1] 302.9072
The AR and MA coefficients are less than one in absolute value, implying a stationary and invertible process.
The SBC from the ARMA(1,1) is smaller than that of the AR(7) and the AR(6). Nevertheless, the ARMA(1,1) specification is inadequate because of remaining serial correlation in the residuals: the Ljung-Box Q-statistics indicate that the residuals from this model exhibit substantial serial autocorrelation. As such, we must eliminate the ARMA(1,1) model from consideration.
1.9 Sixth Attempt: ARMA(2,1)
Since the ACF decays and the PACF seems to oscillate beginning with lag 2, it seems plausible to estimate an ARMA(2,1) model.
mar21 <- Arima(sp, order = c(2, 0, 1))
[Table: ARMA(2,1) coefficient estimates and p-values (AR1, AR2, MA1, mu)]
[Figure: Inverse of the AR roots]
The roots are within the unit circle, so the AR part satisfies the stability condition. The process is invertible as the MA coefficient is less than one in absolute value.
# Ljung-Box Q-statistics for the ARMA(2,1) residuals
res_ar21 <- mar21$residuals
Mjb <- matrix(nrow = 2, ncol = 12)
for (i in 1:12) {
    jb <- Box.test(res_ar21, lag = i, type = "Ljung-Box")
    Mjb[1, i] <- round(jb$statistic, 2)
    Mjb[2, i] <- round(jb$p.value, 2)
}
Table 13: Ljung Box Q statistics for residuals of ARMA(2,1)
Q-stat 0.00 0.05 1.09 1.22 1.42 2.96 7.36 12.04 12.08 12.28 18.72 18.72
pval 0.99 0.97 0.78 0.87 0.92 0.81 0.39 0.15 0.21 0.27 0.07 0.10
[Figure: ACF and PACF of residuals from ARMA(2,1)]
# AIC
mar21$aic
## [1] 287.1041
# SBC
mar21$bic
## [1] 303.7677
This model is an improvement over the ARMA(1, 1) specification.
All the coefficients are statistically significant.
The AIC selects the ARMA(2,1) model over the AR(6), and the SBC selects the ARMA(2,1) over both the AR(6) and the AR(7).
The values of the Q-statistics indicate that the autocorrelations of the residuals are not statistically significant at the 5% level.
1.10 Seventh Model: ARMA(2,(1,7))
In order to account for the serial correlation at lag 7, it might seem plausible to add an MA term to the model at lag 7.
mar27 <- Arima(sp, order = c(2, 0, 7), fixed = c(NA, NA, NA,
0, 0, 0, 0, 0, NA, NA))
# p-values of the ARIMA coefficients (two-sided z-test)
p_mar27 <- 2 * (1 - pnorm(abs(mar27$coef)/sqrt(diag(mar27$var.coef))))
# presenting the results with p-values
df_mar27 <- as.matrix(rbind(mar27$coef[-(4:8)], p_mar27[-(4:8)]))
Ljung-Box Q-statistics for the residuals of the ARMA(2,(1,7)):
Q-stat 0.01 0.18 0.58 0.83 0.83 2.06 2.34 2.80 4.71 6.51 11.08 11.24
pval 0.93 0.91 0.90 0.93 0.97 0.91 0.94 0.95 0.86 0.77 0.44 0.51
[Figure 20: ACF and PACF of residuals from ARMA(2,(1,7))]
Some remaining issues:
1. The series has a number of sharp jumps, indicating that the assumption of normality might be violated.
2. Possibility of heteroskedasticity in the variance.
3. Incorporation of trends.
Note: Forecasting will be covered in the next tutorial.
2 Assignments
Given the coefficients, we can compute the theoretical ACF and PACF for the models above using ARMAacf(). Plot the theoretical ACF and PACF of all the models and compare them with the sample ACF of the spread.
Compute and plot the IRF.
Attempt Questions 9, 11, and 13 from Enders (4th edition), Chapter 2, Exercises, pp. 114-115.